DDG-DA workflow fails with chain of sequential bugs (LightGBM 4.0+, unhashable list, pandas indexing) #2233

@Olcmyk

🐛 Bug Description

The DDG-DA (Data Distribution Generation for Predictable Concept Drift Adaptation) workflow is completely broken by a chain of three sequential bugs. Each bug masks the next, so none of them can be discovered until the previous one is fixed.

To Reproduce

Important Prerequisites:
This bug chain can only be reproduced after PR #2230 is merged. Without PR #2230, you will encounter the zscore unpickling error first, which masks all subsequent bugs.

Steps to reproduce:

  1. Apply PR #2230 ("Fix/pickle whitelist zscore"), which adds zscore and InternalData to the pickle whitelist:

    # Wait for PR #2230 to be merged, or apply it locally
    git fetch origin pull/2230/head:test-ddgda
    git checkout test-ddgda
    pip install -e .
  2. Run DDG-DA workflow:

    cd examples/benchmarks_dynamic/DDG-DA
    rm -rf mlruns
    python workflow.py run
  3. Observe the bug chain (each bug is revealed after fixing the previous one)

The Bug Chain

Bug 1: LightGBM 4.0+ Compatibility Issue ⚠️ Blocks all subsequent bugs

Error:

TypeError: early_stopping_round should be an integer. Got 'NoneType'

Location: qlib/contrib/model/gbdt.py:71-73

Root Cause:

  • In LightGBM 4.0+, lgb.early_stopping() raises a TypeError when its stopping-rounds argument is None
  • The DDG-DA workflow explicitly sets early_stopping_rounds=None to disable early stopping
  • The code unconditionally passes that value to lgb.early_stopping(), which triggers the TypeError

Impact: DDG-DA workflow fails immediately after completing meta-model training tasks


Bug 2: Unhashable List Type Error ⚠️ Revealed after fixing Bug 1

Error:

TypeError: unhashable type: 'list'

Location: qlib/contrib/meta/data_selection/dataset.py:97-102

Root Cause:

data_key = task["dataset"]["kwargs"]["segments"]["train"]  # Returns a list: [start_date, end_date]
key_l.append(data_key)
# ...
self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))  # ❌ Lists cannot be dict keys

Impact: InternalData.setup() fails when trying to create DataFrame with list as column keys
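
The failure can be reproduced without Qlib. The snippet below is a minimal sketch: the segment dates and the IC value are made-up placeholders, not values taken from the workflow. It shows that a list cannot be used as a dict key, while the same segment converted to a tuple can.

```python
# Hypothetical train segment, shaped like task["dataset"]["kwargs"]["segments"]["train"]
segment = ["2008-01-01", "2014-12-31"]
ic_value = 0.05  # placeholder for the per-segment IC series

# A list is mutable and therefore unhashable, so it cannot be a dict key.
try:
    dict(zip([segment], [ic_value]))
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# Converting the list to a tuple makes the key hashable.
key = tuple(segment) if isinstance(segment, list) else segment
ic_by_segment = dict(zip([key], [ic_value]))

print(list_key_error)
print(ic_by_segment)
```
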


Bug 3: Incorrect Pandas MultiIndex Selection ⚠️ Revealed after fixing Bug 2

Error:

ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.

Location: qlib/contrib/meta/data_selection/dataset.py:110-114

Root Cause:

def _calc_perf(self, pred, label):
    df = pd.DataFrame({"pred": pred, "label": label})
    df = df.groupby("datetime", group_keys=False).corr(method="spearman")
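    corr = df.loc(axis=0)[:, "pred"]["label"].droplevel(axis=0, level=-1)  # ❌ Assumes a two-level index

Problems:

  1. df.loc(axis=0)[:, "pred"] is valid pandas (the callable form of .loc only fixes the axis), but it assumes a two-level (datetime, variable) index
  2. group_keys=False causes loss of the datetime index level, so that assumption no longer holds
  3. droplevel is then attempted on an index with only 1 level, which raises the ValueError above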

Impact: _calc_perf() fails during correlation calculation


Expected Behavior

DDG-DA workflow should run successfully from start to finish:

  1. ✅ Train 154 meta-models
  2. ✅ Calculate data similarity matrix
  3. ✅ Train meta-learning model with data selection
  4. ✅ Generate final predictions and backtest results

Environment

  • Qlib version: 0.9.8.dev33 (main branch)
  • Python version: 3.8.10
  • OS: Linux (Ubuntu 22.04)
  • LightGBM version: 4.6.0 (affects 4.0+)

Why These Bugs Form a Chain

  1. Bug 1 (LightGBM) occurs first and prevents any further execution
  2. Bug 2 (unhashable list) is only reached after Bug 1 is fixed
  3. Bug 3 (pandas indexing) is only reached after Bug 2 is fixed

This is why they were not discovered earlier - each bug completely blocks the workflow, masking all subsequent bugs.

Proposed Solution

All three bugs must be fixed together for DDG-DA to work. The fixes are:

Fix 1: LightGBM 4.0+ Compatibility

Files: qlib/contrib/model/gbdt.py, qlib/contrib/model/highfreq_gdbt_model.py

# Build callbacks list dynamically
callbacks = []

# Only add early_stopping callback if rounds is not None (LightGBM 4.0+ compatibility)
early_stop_rounds = self.early_stopping_rounds if early_stopping_rounds is None else early_stopping_rounds
if early_stop_rounds is not None:
    callbacks.append(lgb.early_stopping(early_stop_rounds))

callbacks.append(lgb.log_evaluation(period=verbose_eval))
callbacks.append(lgb.record_evaluation(evals_result))

self.model = lgb.train(..., callbacks=callbacks, ...)
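
The callback logic can be exercised on its own. The following is a minimal sketch, not the code in gbdt.py: build_callbacks is a hypothetical helper, and fake_lgb is a stand-in namespace so the example runs without LightGBM installed. In real use, the lightgbm module itself would be passed in, since it exposes early_stopping, log_evaluation and record_evaluation.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def build_callbacks(
    lgb: Any,
    rounds: Optional[int],
    default_rounds: Optional[int],
    verbose_eval: int,
    evals_result: Dict,
) -> List[Any]:
    """Hypothetical helper: add early stopping only when rounds are configured."""
    # A per-call value wins; otherwise fall back to the model-level default.
    effective = default_rounds if rounds is None else rounds
    callbacks = []
    if effective is not None:
        callbacks.append(lgb.early_stopping(effective))
    callbacks.append(lgb.log_evaluation(period=verbose_eval))
    callbacks.append(lgb.record_evaluation(evals_result))
    return callbacks


# Stand-in for the lightgbm module (assumption: used for illustration only).
fake_lgb = SimpleNamespace(
    early_stopping=lambda stopping_rounds: ("early_stopping", stopping_rounds),
    log_evaluation=lambda period=1: ("log_evaluation", period),
    record_evaluation=lambda eval_result: ("record_evaluation", eval_result),
)

no_stop = build_callbacks(fake_lgb, None, None, 20, {})
with_stop = build_callbacks(fake_lgb, None, 50, 20, {})
print([name for name, _ in no_stop])
print([name for name, _ in with_stop])
```

With both values set to None, as in the DDG-DA configuration, the early-stopping callback is never constructed, so the TypeError cannot be raised.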

Fix 2: Convert List to Tuple

File: qlib/contrib/meta/data_selection/dataset.py:97-100

data_key = task["dataset"]["kwargs"]["segments"]["train"]
# Convert list to tuple to make it hashable
if isinstance(data_key, list):
    data_key = tuple(data_key)
key_l.append(data_key)

Fix 3: Fix Pandas MultiIndex Selection

File: qlib/contrib/meta/data_selection/dataset.py:110-114

def _calc_perf(self, pred, label):
    df = pd.DataFrame({"pred": pred, "label": label})
    df = df.groupby("datetime").corr(method="spearman")  # Remove group_keys=False
    # Use xs to select 'label' from the second level of MultiIndex
    corr = df.xs("label", level=1)["pred"]
    return corr
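
The proposed selection can be checked on toy data. This is a minimal sketch that assumes pred and label are Series indexed by (datetime, instrument), as Qlib predictions usually are. The dates, instruments and values are invented for illustration, and calc_perf is a standalone copy of the method body above.

```python
import pandas as pd


def calc_perf(pred: pd.Series, label: pd.Series) -> pd.Series:
    """Standalone copy of the proposed fix: per-day Spearman correlation."""
    df = pd.DataFrame({"pred": pred, "label": label})
    # Default group_keys keeps "datetime" as the first index level.
    df = df.groupby("datetime").corr(method="spearman")
    # Select the "label" row of each per-day 2x2 matrix, then the "pred" column.
    return df.xs("label", level=1)["pred"]


# Toy data (assumption): two days, three instruments.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
pred = pd.Series([0.1, 0.2, 0.3, 0.3, 0.2, 0.1], index=idx)
label = pd.Series([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], index=idx)

ic = calc_perf(pred, label)
print(ic)
```

On this data the ranks agree perfectly on the first day and are reversed on the second, so the result is a Series with a single-level datetime index holding one correlation per day.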

Testing

After applying all three fixes:

cd examples/benchmarks_dynamic/DDG-DA
rm -rf mlruns
python workflow.py run

Expected output:

train tasks: 100%|████████████████████████████████| 154/154 [05:31<00:00,  2.15s/it]
calc: 100%|█████████████████████████████████████| 154/154 [00:01<00:00, 100.48it/s]
...
[Final backtest results displayed successfully]

Additional Notes

Why submit as one issue?

  • These bugs form a true dependency chain, not an artificial grouping
  • Each bug completely blocks discovery of the next
  • All must be fixed for DDG-DA to work
  • Splitting into separate issues would create confusion about reproduction steps

Impact:

  • 🔴 Critical: DDG-DA workflow is completely non-functional
  • Affects all users trying to use DDG-DA for meta-learning
  • Affects LightGBM 4.0+ users across the codebase (not just DDG-DA)
