fix: LightGBM 4.0+ compatibility for early_stopping_rounds=None #2232
Closed
Olcmyk wants to merge 4 commits into
…aset chain

The RestrictedUnpickler safelist introduced by the recent security hardening (microsoft#2099 / microsoft#2076 / microsoft#2153) only covered the abstract ``DataHandler`` / ``DataHandlerLP`` classes plus ``StaticDataLoader``. Any rolling workflow that pickles a real Dataset (the default for ``Rolling._train_rolling_tasks``) walks into one of the contrib stock handlers and now crashes on reload (issue microsoft#2130):

UnpicklingError: Forbidden class: qlib.contrib.data.handler.Alpha158. Only whitelisted classes are allowed for security reasons. ...

Non-rolling workflows happened to use a path that did not go through the restricted loader, which is why downgrading to 0.9.7 hid the issue.

Extend ``SAFE_PICKLE_CLASSES`` with the qlib-internal classes that sit on the standard recorder pickle graph:
* The four shipped contrib handlers: ``Alpha158``, ``Alpha158vwap``, ``Alpha360``, ``Alpha360vwap``.
* The dataset wrappers (``Dataset``, ``DatasetH``, ``TSDatasetH``) and the additional concrete loaders (``DataLoader``, ``DLWParser``, ``QlibDataLoader``, ``NestedDataLoader``, ``DataLoaderDH``).
* Every concrete ``Processor`` defined in ``qlib.data.dataset.processor`` -- they show up in every realistic ``learn_processors`` / ``infer_processors`` chain.

These are all classes already shipped inside qlib itself, so adding them does not weaken the threat model the safelist was designed against (arbitrary code execution through external pickle payloads).

Add regression tests pinning each added entry plus an end-to-end check that ``RestrictedUnpickler.find_class`` actually resolves ``Alpha158`` and that other unknown classes are still rejected.

Fixes microsoft#2130
PR microsoft#2213 added Alpha158/Alpha360 handlers to the pickle whitelist but missed qlib.utils.data.zscore, which is also required by the DDG-DA workflow. Without this, DDG-DA fails with:

UnpicklingError: Forbidden class: qlib.utils.data.zscore

This commit adds zscore to the whitelist and includes a test to prevent regression.

Fixes microsoft#2130 (supplement to PR microsoft#2213)
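The safelist mechanism these two commits extend can be illustrated with a minimal sketch built on the standard library's `pickle.Unpickler.find_class` hook. This is not qlib's actual `RestrictedUnpickler`. The names `SAFE_CLASSES`, `SafelistUnpickler`, and `safe_loads` are hypothetical, and the single safelisted entry is chosen only for illustration.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical safelist for illustration only; qlib's real
# SAFE_PICKLE_CLASSES has different contents.
SAFE_CLASSES = {("collections", "OrderedDict")}


class SafelistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only classes listed in SAFE_CLASSES."""

    def find_class(self, module, name):
        # find_class is called for every global the payload references.
        if (module, name) in SAFE_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")


def safe_loads(payload: bytes):
    """Load a pickle payload, rejecting any class outside the safelist."""
    return SafelistUnpickler(io.BytesIO(payload)).load()


# A safelisted class round-trips; anything else is rejected on load.
restored = safe_loads(pickle.dumps(OrderedDict(a=1)))
```

Under this pattern, every class reachable from a pickled object must be listed, which is why a missing entry such as a handler or a helper function surfaces as an `UnpicklingError` only when a workflow reloads that object.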
LightGBM 4.0+ no longer accepts None for the early_stopping_rounds parameter. This commit modifies the code to create the early_stopping callback only when early_stopping_rounds is not None.

Changes:
- qlib/contrib/model/gbdt.py: Build the callbacks list dynamically, only add the early_stopping callback when rounds is not None
- qlib/contrib/model/highfreq_gdbt_model.py: Apply the same pattern for consistency and robustness

This fix allows the DDG-DA workflow to proceed past the LightGBM TypeError when early stopping is disabled by setting early_stopping_rounds=None.

Fixes the error:
TypeError: early_stopping_round should be an integer. Got 'NoneType'

Note: This fix requires PR microsoft#2230 (zscore whitelist) to be applied first, otherwise the zscore unpickling error will occur before this issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sues
This commit fixes three issues that prevented the DDG-DA workflow from running:
1. **Unhashable list type error in InternalData.setup()**
- Problem: data_key was a list [start_date, end_date], which cannot be used
as dictionary keys or DataFrame column names
- Fix: Convert list to tuple to make it hashable (line 99-100)
2. **Incorrect pandas indexing in _calc_perf()**
- Problem: The df.loc(axis=0)[:, "pred"] selection was wrong, and group_keys=False
dropped the datetime index, which led to a droplevel error
- Fix: Remove group_keys=False and use df.xs("label", level=1) to select correctly
from the MultiIndex (lines 112-114)
3. **Missing InternalData in pickle whitelist**
- Problem: InternalData class was not whitelisted, causing UnpicklingError
- Fix: Add InternalData to SAFE_PICKLE_CLASSES (pickle_utils.py line 91)
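The first item above comes down to Python's hashability rules, which a short standalone sketch can show. The dates and the `scores` dictionary are made-up illustrations, not values from `InternalData.setup()`.

```python
# Hypothetical [start_date, end_date] pair standing in for data_key.
data_key = ["2020-01-01", "2020-12-31"]
scores = {}

# A list is mutable and therefore unhashable, so it cannot be a dict key.
try:
    scores[data_key] = 0.1
    list_error = None
except TypeError as err:
    list_error = str(err)

# Converting to a tuple gives an immutable, hashable key with the same content.
hashable_key = tuple(data_key)
scores[hashable_key] = 0.1
```

The same restriction applies wherever pandas needs a hashable label, which is why the commit converts the key before using it as a column name.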
Changes:
- qlib/contrib/meta/data_selection/dataset.py:
* Convert list to tuple for hashable dictionary keys
* Fix _calc_perf to use correct pandas MultiIndex selection
- qlib/utils/pickle_utils.py:
* Add InternalData to pickle whitelist
Testing:
✅ DDG-DA workflow now runs successfully to completion
✅ All 154 training tasks complete without errors
✅ Meta-learning data selection works correctly
✅ Final backtest results generated successfully
This is a WORKING VERSION - DDG-DA workflow runs end-to-end!
Related issues:
- Depends on PR microsoft#2230 (zscore whitelist)
- Depends on LightGBM 4.0+ compatibility fix
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
This PR fixes a LightGBM 4.0+ compatibility issue where `TypeError: early_stopping_round should be an integer. Got 'NoneType'` is raised when `early_stopping_rounds=None` is passed to the model.

Starting from LightGBM 4.0, the `lgb.early_stopping()` function no longer accepts `None` as a parameter and requires an integer value. This PR modifies the code to create the early stopping callback only when `early_stopping_rounds` is not `None`.

Changes:
- `qlib/contrib/model/gbdt.py`: Build the callbacks list dynamically, only add the early_stopping callback when rounds is not None
- `qlib/contrib/model/highfreq_gdbt_model.py`: Apply the same pattern for consistency and robustness

Motivation and Context
Related Issues:
Problem:
After applying PR #2213 and PR #2230 (which fix the pickle whitelist issues), the DDG-DA workflow fails with `TypeError: early_stopping_round should be an integer. Got 'NoneType'`.
Why this change is required:
- `early_stopping_rounds=None` is set to disable early stopping
- The existing code passes that `None` to `lgb.early_stopping()`, which is no longer allowed in LightGBM 4.0+

Root Cause:
In `qlib/contrib/model/gbdt.py` (lines 71-73), the code unconditionally creates an early stopping callback. When the value is `None`, LightGBM 4.0+ raises a TypeError.

Solution:
Only create the early stopping callback when the value is not `None`. This pattern is already used in `qlib/contrib/model/double_ensemble.py` (lines 110-111).

How Has This Been Tested?
`pytest qlib/tests/test_all_pipeline.py` under upper directory of `qlib`.

Additional Testing:
- `early_stopping_rounds=None` (no early stopping)

Test Environment:
Test Command:
cd examples/benchmarks_dynamic/DDG-DA
rm -rf mlruns
python workflow.py run

Screenshots of Test Results (if appropriate):
Before the fix:
After the fix:
The LightGBM TypeError is fixed and training completes all 154 tasks successfully.
Note: After fixing this issue, the DDG-DA workflow encounters another bug (`TypeError: unhashable type: 'slice'`), which is a separate issue in the data selection module and will be addressed in a follow-up PR.

Types of changes
Additional Notes
Backward Compatibility:
This fix is fully backward compatible and works with both LightGBM < 4.0 and >= 4.0.
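The guard described under Solution can be sketched as follows. This is an illustration of the pattern, not the code in `gbdt.py`. The helper name `build_callbacks` and its parameters are hypothetical, and the early stopping factory is passed in as an argument so the sketch runs without LightGBM installed.

```python
from typing import Any, Callable, Iterable, List, Optional


def build_callbacks(
    early_stopping_rounds: Optional[int],
    early_stopping_factory: Callable[[int], Any],
    other_callbacks: Iterable[Any] = (),
) -> List[Any]:
    """Build a callbacks list, adding early stopping only when it is enabled.

    Hypothetical helper: the factory is never called with None, so a library
    that insists on an integer is never handed an invalid value.
    """
    callbacks: List[Any] = []
    if early_stopping_rounds is not None:
        callbacks.append(early_stopping_factory(early_stopping_rounds))
    callbacks.extend(other_callbacks)
    return callbacks


# Assumed usage with LightGBM (not executed here):
#   import lightgbm as lgb
#   callbacks = build_callbacks(
#       rounds,
#       lgb.early_stopping,
#       [lgb.log_evaluation(period=20), lgb.record_evaluation(evals_result)],
#   )
#   model = lgb.train(params, dtrain, callbacks=callbacks)
```

Because the guard only skips a call that older versions treated as a no-op, the same list-building logic should behave the same on either side of the 4.0 boundary.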
Dependencies:
This PR should be merged after:
Without these PRs, the zscore unpickling error will occur before reaching this LightGBM issue.
References: