Skip to content

Wider numeric field format support#44

Open
flywire wants to merge 5 commits intomarlanperumal:developfrom
flywire:patch-3
Open

Wider numeric field format support#44
flywire wants to merge 5 commits intomarlanperumal:developfrom
flywire:patch-3

Conversation

@flywire
Copy link
Contributor

@flywire flywire commented Jun 20, 2021

Use regular expression operations on numeric fields to discard all characters except numbers and decimal separator, and identify negative values by a leading or trailing '-' or case insensitive 'DR'. See #33.

flywire added 2 commits June 20, 2021 16:16
Use regular expression operations on numeric fields to discard all characters except numbers and decimal separator, and identify negative values by a leading or trailing '-' or case insensitive 'DR'.
def clean_numeric(df, config):
numeric_cols = [config["columns"][col] for col in config["cleaning"]["numeric"]]

def format_negatives(s):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I haven't been good yet about writing tests for the project, but can you please add some tests to show that the format_currency_number method actually does what it's supposed to.

I have a natural aversion to regex, mostly because it generally results in the opposite of what python generally gives us - clean human readable code that's easy to easily visually grep what's going on. Python's built in string methods usually do a better job at this. In cases like this however with multiple complex rules, I'll concede that using regex may result in cleaner code. I'd only include it if there were tests against it though. Just reading it now, I'm not certain what all the side effects and edge cases might be.

I'll update with a sample test on the original format_negatives method that you can use as a basis.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the format_negatives method outside of the clean_numeric method so that it could be tested but that's now caused a merge conflict with your branch. Should be easy enough to fix though.

Looking at that old code I wrote in parse.py there's certainly a lot to be desired in terms of it being robust to the different content that might come out of a pdf statement file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants