Floating Point issues in percentiles #2

@DragonWarrior15

For numeric variables with only a few distinct values (say 2-3), the percentile value is the same across many percentile points. Because of floating-point error, values that should be equal can differ slightly, so the condition in np.where(arr > arr[start]) can match too early and return the wrong next percentile index, leaving the program stuck in the while loop.

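import numpy as np

def get_next_range(arr, group_range, start):
    # Next boundary would reach or pass the last percentile.
    if group_range + start >= 100:
        return 100
    # Next boundary lands within half a group of 100.
    elif (100 - group_range / 2) < start + group_range:
        return 100
    # Last percentile equals the current one, so nothing larger remains.
    elif arr[-1] == arr[start]:
        return 100
    # Value is flat across the group (or negative): jump to the first index
    # whose value exceeds arr[start], but not before the first non-negative one.
    elif (arr[start + group_range] == arr[start]) or (arr[start] < 0):
        return np.max([np.min(np.where(arr > arr[start])), np.min(np.where(arr >= 0))])
    else:
        return group_range + start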

To fix this, the percentile values should be rounded to a fixed number of decimal places right after they are calculated. Something like the following should fix the issue:
percentiles = np.around(np.array([np.percentile(df1[var], p) for p in range(0, 100)]), decimals=5)
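
A minimal sketch of why rounding helps, assuming a small hand-built array stands in for the computed percentiles. The names raw, rounded and first_index_above are hypothetical and not part of the original code. The second entry carries a typical floating-point error, so the strict comparison treats it as a larger value until the array is rounded:

```python
import numpy as np

def first_index_above(arr, start):
    """Index of the first element strictly greater than arr[start]."""
    return int(np.min(np.where(arr > arr[start])))

# Hypothetical percentile array: entries 0-2 are nominally all 0.3,
# but entry 1 carries a rounding error (0.1 + 0.2 is slightly above 0.3).
raw = np.array([0.3, 0.1 + 0.2, 0.3, 0.7])

# Rounding to a fixed number of decimals collapses the near-equal values.
rounded = np.around(raw, decimals=5)

idx_raw = first_index_above(raw, 0)          # matches the noisy entry
idx_rounded = first_index_above(rounded, 0)  # matches the genuinely larger value

print(idx_raw, idx_rounded)
```

On the raw array the search stops at the noisy entry, while on the rounded array it skips ahead to the entry that is actually larger. The choice of 5 decimals follows the suggestion above; a different precision may suit data on a different scale.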
