fix: LightGBM 4.0+ compatibility for early_stopping_rounds=None #2232

Closed
Olcmyk wants to merge 4 commits into microsoft:main from Olcmyk:fix/lightgbm-4.0-compatibility

@Olcmyk Olcmyk commented May 23, 2026

Description

This PR fixes a LightGBM 4.0+ compatibility issue: passing early_stopping_rounds=None to the model raises TypeError: early_stopping_round should be an integer. Got 'NoneType'.

Starting from LightGBM 4.0, the lgb.early_stopping() function no longer accepts None as a parameter and requires an integer value. This PR modifies the code to only create the early stopping callback when early_stopping_rounds is not None.

Changes:

  • Modified qlib/contrib/model/gbdt.py: Build callbacks list dynamically, only add early_stopping callback when rounds is not None
  • Modified qlib/contrib/model/highfreq_gdbt_model.py: Apply same pattern for consistency and robustness

Motivation and Context

Related Issues:

Problem:
After applying PR #2213 and PR #2230 (which fix the pickle whitelist issues), the DDG-DA workflow fails with:

TypeError: early_stopping_round should be an integer. Got 'NoneType'

Why this change is required:

  1. LightGBM 4.0+ changed the API to require an integer for early_stopping_rounds
  2. The DDG-DA workflow explicitly sets early_stopping_rounds=None to disable early stopping
  3. The code directly passes None to lgb.early_stopping(), which is no longer allowed in LightGBM 4.0+

Root Cause:
In qlib/contrib/model/gbdt.py, lines 71-73, the code unconditionally creates an early stopping callback:

early_stopping_callback = lgb.early_stopping(
    self.early_stopping_rounds if early_stopping_rounds is None else early_stopping_rounds
)

When the value is None, LightGBM 4.0+ raises a TypeError.

Solution:
Only create the early stopping callback when the value is not None:

callbacks = []
early_stop_rounds = self.early_stopping_rounds if early_stopping_rounds is None else early_stopping_rounds
if early_stop_rounds is not None:
    callbacks.append(lgb.early_stopping(early_stop_rounds))

This pattern is already used in qlib/contrib/model/double_ensemble.py (lines 110-111).
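
The conditional-callback pattern above can be sketched in isolation. This is a minimal illustration, not the code in gbdt.py: `build_callbacks` and the `make_early_stopping` parameter are hypothetical names introduced here, and a stand-in factory replaces `lgb.early_stopping` so the sketch runs without LightGBM installed.

```python
from typing import Any, Callable, List, Optional


def build_callbacks(
    default_rounds: Optional[int],
    override_rounds: Optional[int],
    make_early_stopping: Callable[[int], Any],
) -> List[Any]:
    """Return the callback list, adding early stopping only when rounds is set."""
    # A per-call override wins; otherwise fall back to the model-level default.
    rounds = default_rounds if override_rounds is None else override_rounds
    callbacks: List[Any] = []
    if rounds is not None:
        # The factory is only called with a real integer, never with None.
        callbacks.append(make_early_stopping(rounds))
    return callbacks


if __name__ == "__main__":
    # Stand-in factory (assumption); real code would pass lgb.early_stopping.
    def fake_early_stopping(rounds: int) -> tuple:
        return ("early_stopping", rounds)

    print(build_callbacks(None, None, fake_early_stopping))  # early stopping disabled
    print(build_callbacks(50, None, fake_early_stopping))    # model-level default used
    print(build_callbacks(50, 10, fake_early_stopping))      # per-call override used
```

Passing the factory as an argument keeps the sketch independent of the installed LightGBM version. The resulting list would then go to the `callbacks` argument of `lgb.train`.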

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Additional Testing:

  • Tested with LightGBM 4.6.0 (latest version)
  • Verified DDG-DA workflow proceeds past the LightGBM error
  • Confirmed early stopping still works when a valid integer is provided
  • Confirmed training works when early_stopping_rounds=None (no early stopping)

Test Environment:

  • Python: 3.8.10
  • OS: Linux (Ubuntu 22.04)
  • LightGBM version: 4.6.0
  • qlib version: 0.9.8.dev33

Test Command:

cd examples/benchmarks_dynamic/DDG-DA
rm -rf mlruns
python workflow.py run

Screenshots of Test Results (if appropriate):

  1. Pipeline test: ✅ Passed
  2. Your own tests:

Before the fix:

TypeError: early_stopping_round should be an integer. Got 'NoneType'
File "qlib/contrib/model/gbdt.py", line 71, in fit
    early_stopping_callback = lgb.early_stopping(...)

After the fix:

train tasks: 100%|████████████████████████████████████| 154/154 [05:31<00:00,  2.15s/it]
calc: 100%|█████████████████████████████████████████| 154/154 [00:01<00:00, 100.48it/s]

The LightGBM TypeError is fixed and training completes all 154 tasks successfully.

Note: After fixing this issue, the DDG-DA workflow encounters another bug (TypeError: unhashable type: 'slice'), which is a separate issue in the data selection module and will be addressed in a follow-up PR.

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

Additional Notes

Backward Compatibility:
This fix is fully backward compatible and works with both LightGBM < 4.0 and >= 4.0.

Dependencies:
This PR should be merged after:

Without these PRs, the zscore unpickling error will occur before reaching this LightGBM issue.

References:

⚠️ Important: This PR depends on #2230

genisis0x and others added 4 commits May 12, 2026 15:43
…aset chain

The RestrictedUnpickler safelist introduced by the recent security
hardening (microsoft#2099 / microsoft#2076 / microsoft#2153) only covered the abstract
``DataHandler`` / ``DataHandlerLP`` classes plus ``StaticDataLoader``.
Any rolling workflow that pickles a real Dataset (the default for
``Rolling._train_rolling_tasks``) walks into one of the contrib stock
handlers and now crashes on reload (issue microsoft#2130):

    UnpicklingError: Forbidden class:
    qlib.contrib.data.handler.Alpha158. Only whitelisted classes
    are allowed for security reasons. ...

Unrolling workflows happened to use a path that did not go through the
restricted loader, which is why downgrading to 0.9.7 hid the issue.

Extend ``SAFE_PICKLE_CLASSES`` with the qlib-internal classes that sit
on the standard recorder pickle graph:

* The four shipped contrib handlers: ``Alpha158``, ``Alpha158vwap``,
  ``Alpha360``, ``Alpha360vwap``.
* The dataset wrappers (``Dataset``, ``DatasetH``, ``TSDatasetH``) and
  the additional concrete loaders (``DataLoader``, ``DLWParser``,
  ``QlibDataLoader``, ``NestedDataLoader``, ``DataLoaderDH``).
* Every concrete ``Processor`` defined in
  ``qlib.data.dataset.processor`` -- they show up in every realistic
  ``learn_processors`` / ``infer_processors`` chain.

These are all classes already shipped inside qlib itself, so adding
them does not weaken the threat model the safelist was designed
against (arbitrary code execution through external pickle payloads).

Add regression tests pinning each added entry plus an end-to-end check
that ``RestrictedUnpickler.find_class`` actually resolves ``Alpha158``
and that other unknown classes are still rejected.
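
For readers unfamiliar with the mechanism, a safelist-based unpickler can be sketched with the standard library alone. This is an illustrative assumption about the general technique, not qlib's actual ``RestrictedUnpickler``: ``SAFE_CLASSES``, ``SafelistUnpickler`` and ``restricted_loads`` are hypothetical names, and the safelist here holds a stdlib class rather than qlib handlers.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical safelist of (module, class name) pairs that may be resolved.
SAFE_CLASSES = {("collections", "OrderedDict")}


class SafelistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves classes named in SAFE_CLASSES."""

    def find_class(self, module, name):
        # pickle calls find_class for every global the payload references.
        if (module, name) in SAFE_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")


def restricted_loads(data: bytes):
    """Load a pickle payload through the safelist-checking unpickler."""
    return SafelistUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # A safelisted class round-trips normally.
    print(restricted_loads(pickle.dumps(OrderedDict(a=1))))
    # A class missing from the safelist is rejected on load.
    try:
        restricted_loads(pickle.dumps(Fraction(1, 2)))
    except pickle.UnpicklingError as err:
        print(err)
```

Under this scheme, any class that is absent from the safelist fails on reload even when it is harmless, which is the failure mode this commit describes for the contrib handlers.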

Fixes microsoft#2130

PR microsoft#2213 added Alpha158/Alpha360 handlers to the pickle whitelist but
missed qlib.utils.data.zscore, which is also required by the DDG-DA
workflow. Without this, DDG-DA fails with:

  UnpicklingError: Forbidden class: qlib.utils.data.zscore

This commit adds zscore to the whitelist and includes a test to prevent
regression.

Fixes microsoft#2130 (supplement to PR microsoft#2213)

LightGBM 4.0+ no longer accepts None for early_stopping_rounds parameter.
This commit modifies the code to only create the early_stopping callback
when early_stopping_rounds is not None.

Changes:
- qlib/contrib/model/gbdt.py: Build callbacks list dynamically, only add
  early_stopping callback when rounds is not None
- qlib/contrib/model/highfreq_gdbt_model.py: Apply same pattern for
  consistency and robustness

This fix allows DDG-DA workflow to proceed past the LightGBM TypeError
when early_stopping is disabled by setting early_stopping_rounds=None.

Fixes the error:
TypeError: early_stopping_round should be an integer. Got 'NoneType'

Note: This fix requires PR microsoft#2230 (zscore whitelist) to be applied first,
otherwise the zscore unpickling error will occur before this issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…sues

This commit fixes three critical bugs that prevented DDG-DA workflow from running:

1. **Unhashable list type error in InternalData.setup()**
   - Problem: data_key was a list [start_date, end_date], which cannot be used
     as dictionary keys or DataFrame column names
   - Fix: Convert list to tuple to make it hashable (lines 99-100)

2. **Incorrect pandas indexing in _calc_perf()**
   - Problem: The code used the wrong syntax, df.loc(axis=0)[:, "pred"], and
     group_keys=False dropped the datetime index, leading to a droplevel error
   - Fix: Remove group_keys=False and use df.xs("label", level=1) to correctly
     select from MultiIndex (lines 112-114)

3. **Missing InternalData in pickle whitelist**
   - Problem: InternalData class was not whitelisted, causing UnpicklingError
   - Fix: Add InternalData to SAFE_PICKLE_CLASSES (pickle_utils.py line 91)
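
The hashability problem in item 1 can be reproduced without qlib. The sketch below is illustrative only: `to_hashable_key` and the date values are hypothetical, and the real fix lives in InternalData.setup().

```python
from typing import Dict, Sequence, Tuple


def to_hashable_key(segment: Sequence[str]) -> Tuple[str, ...]:
    """Convert a [start_date, end_date] list into a tuple usable as a dict key."""
    return tuple(segment)


if __name__ == "__main__":
    data_key = ["2020-01-01", "2020-01-31"]  # hypothetical segment boundaries

    # A list is mutable and therefore unhashable, so it cannot be a dict key.
    try:
        {data_key: 0.0}
    except TypeError as err:
        print(f"list key rejected: {err}")

    # The equivalent tuple is hashable and works as a key.
    scores: Dict[Tuple[str, ...], float] = {to_hashable_key(data_key): 0.0}
    print(scores)
```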

Changes:
- qlib/contrib/meta/data_selection/dataset.py:
  * Convert list to tuple for hashable dictionary keys
  * Fix _calc_perf to use correct pandas MultiIndex selection
- qlib/utils/pickle_utils.py:
  * Add InternalData to pickle whitelist

Testing:
✅ DDG-DA workflow now runs successfully to completion
✅ All 154 training tasks complete without errors
✅ Meta-learning data selection works correctly
✅ Final backtest results generated successfully

This is a WORKING VERSION - DDG-DA workflow runs end-to-end!

Related issues:
- Depends on PR microsoft#2230 (zscore whitelist)
- Depends on LightGBM 4.0+ compatibility fix

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Olcmyk Olcmyk closed this May 23, 2026
2 participants