Skip to content

fix(data/filter): apply sort_index result in SeriesDFilter#2240

Open
genisis0x wants to merge 1 commit into
microsoft:mainfrom
genisis0x:fix/filter-sort-index-discarded
Open

fix(data/filter): apply sort_index result in SeriesDFilter#2240
genisis0x wants to merge 1 commit into
microsoft:mainfrom
genisis0x:fix/filter-sort-index-discarded

Conversation

@genisis0x
Copy link
Copy Markdown

Description

Closes #2165.

In qlib/data/filter.py, SeriesDFilter._getFilterSeries sorts the timestamp series but discards the result:

# sort the timestamp_series according to the timestamps
timestamp_series.sort_index()

pandas.Series.sort_index() returns a new sorted Series and does not sort in place (that would require inplace=True). The series is therefore iterated unsorted in the loop that builds the (timestamp, timestamp) ranges, which can yield incorrect filter boundaries when the input index is not already ordered.

Change

Assign the sorted result back: timestamp_series = timestamp_series.sort_index(). This matches the intent of the existing comment and the in-order assumption of the boundary scan that follows.

Testing

With an out-of-order input index, the boundary scan now iterates ascending timestamps and produces correctly ordered ranges. Already-sorted inputs are unaffected.

_getFilterSeries called timestamp_series.sort_index() and discarded the
return value. pandas Series.sort_index() returns a new sorted Series and
does not sort in place, so the subsequent iteration ran over the
unsorted series and could produce incorrect (timestamp, timestamp)
filter ranges. Assign the sorted result back so the boundary scan sees
timestamps in order, matching the intent of the existing comment. Closes microsoft#2165.
@genisis0x
Copy link
Copy Markdown
Author

@SunsetWolf a review when convenient would be great. One-liner in SeriesDFilter: sort_index() return value was discarded (pandas doesn't sort in place), so the timestamp boundary scan ran on an unsorted series; now assigned back. CLA signed, CI green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: sort_index() result discarded in filter.py — timestamp series may be unsorted

1 participant