Fix ExpressionDFilter treating missing data as passing filter#2119

Open
Ayush10 wants to merge 1 commit into microsoft:main from Ayush10:fix/issue-2084-filter-nan-handling

Ayush10 commented Jan 31, 2026

Summary

  • Fix NaN values in filter expressions being incorrectly treated as True (passing the filter)
  • Add fillna(False) before astype("bool") in SeriesDFilter._filterSeries

Root Cause

In qlib/data/filter.py line 145, filter_series.astype("bool") converts NaN to True: NaN is a non-zero float, and the cast maps every non-zero value to True. As a result, instruments with missing feature data (e.g., during trading halts) incorrectly pass expression filters such as $close > 2000.
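
The casting behavior can be reproduced outside qlib. The snippet below is a minimal sketch, assuming a float series with one missing entry; the sample values are made up for illustration and are not taken from the qlib code base.

```python
import numpy as np
import pandas as pd

# Hypothetical filter result: 1.0 = passes, 0.0 = fails, NaN = no data that day.
filter_series = pd.Series([1.0, np.nan, 0.0])

# NaN is a non-zero float, so a plain cast to bool turns it into True.
cast_only = filter_series.astype("bool")

# Filling missing entries with False first makes them fail the filter.
filled_then_cast = filter_series.fillna(False).astype("bool")

print(cast_only.tolist())         # [True, True, False]
print(filled_then_cast.tolist())  # [True, False, False]
```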

Fix

# Before (NaN → True)
filter_series = filter_series.astype("bool")

# After (NaN → False)
filter_series = filter_series.fillna(False).astype("bool")

Test plan

  • ExpressionDFilter(rule_expression='$close>2000') should exclude instruments on dates where $close is missing
  • The fix is consistent with the existing NaN handling in _toTimestamp (lines 171-172)
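
The first check could be sketched as a unit test that does not need a qlib data bundle. Everything here is an assumption for illustration: `to_bool_mask` is a hypothetical stand-in that mirrors the fixed line rather than a qlib function, and the dated series imitates what evaluating '$close>2000' might return for one instrument.

```python
import numpy as np
import pandas as pd


def to_bool_mask(filter_series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the fixed cast; not part of qlib."""
    return filter_series.fillna(False).astype("bool")


def test_missing_close_fails_filter() -> None:
    dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
    # Assumed expression result: 1.0 = pass, NaN = $close missing, 0.0 = fail.
    expr_result = pd.Series([1.0, np.nan, 0.0], index=dates)

    mask = to_bool_mask(expr_result)

    # The date with missing data must not pass the filter.
    assert mask.tolist() == [True, False, False]
    # Only the first date survives the filter.
    assert list(mask[mask].index) == [pd.Timestamp("2020-01-02")]


test_missing_close_fails_filter()
```

A test against the real ExpressionDFilter would additionally need an initialized qlib data provider containing an instrument with a gap in $close.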

Fixes #2084

Add fillna(False) before astype("bool") in _filterSeries so that NaN
values from missing feature data (e.g. during trading halts) are
treated as failing the filter instead of incorrectly passing it.

pandas astype("bool") converts NaN to True, which caused instruments
with missing data to be included as if they satisfied the filter
expression.

Fixes microsoft#2084

Development

Successfully merging this pull request may close these issues.

ExpressionDFilter result not as expected