DDG-DA workflow fails with chain of sequential bugs (LightGBM 4.0+, unhashable list, pandas indexing)

## 🐛 Bug Description

DDG-DA (Data Distribution Generation for Predictable Concept Drift Adaptation) workflow is completely broken by a chain of three sequential bugs. Each bug masks the next, so each one only surfaces after the previous one has been fixed.

## To Reproduce

**Important Prerequisites:**
This bug chain can only be reproduced after PR #2230 is merged. Without PR #2230, you will encounter the zscore unpickling error first, which masks all subsequent bugs.

### Steps to reproduce:

1. **Apply PR #2230** (zscore and InternalData pickle whitelist):
   ```bash
   # Wait for PR #2230 to be merged, or apply it locally
   git fetch origin pull/2230/head:test-ddgda
   git checkout test-ddgda
   pip install -e .
   ```

2. **Run DDG-DA workflow:**
   ```bash
   cd examples/benchmarks_dynamic/DDG-DA
   rm -rf mlruns
   python workflow.py run
   ```

3. **Observe the bug chain** (each bug is revealed after fixing the previous one)

## The Bug Chain

### Bug 1: LightGBM 4.0+ Compatibility Issue ⚠️ **Blocks all subsequent bugs**

**Error:**
```
TypeError: early_stopping_round should be an integer. Got 'NoneType'
```

**Location:** `qlib/contrib/model/gbdt.py:71-73`

**Root Cause:**
- In LightGBM 4.0+, `lgb.early_stopping()` no longer accepts `None` for its `stopping_rounds` argument
- DDG-DA workflow explicitly sets `early_stopping_rounds=None` to disable early stopping
- The code passes that value to `lgb.early_stopping()` unconditionally, which raises the `TypeError`

**Impact:** DDG-DA workflow fails immediately after completing meta-model training tasks

---

### Bug 2: Unhashable List Type Error ⚠️ **Revealed after fixing Bug 1**

**Error:**
```
TypeError: unhashable type: 'list'
```

**Location:** `qlib/contrib/meta/data_selection/dataset.py:97-102`

**Root Cause:**
```python
data_key = task["dataset"]["kwargs"]["segments"]["train"]  # Returns a list: [start_date, end_date]
key_l.append(data_key)
# ...
self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))  # ❌ Lists cannot be dict keys
```

**Impact:** `InternalData.setup()` fails when building a DataFrame whose column keys are lists
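
The failure can be reproduced without Qlib. The sketch below is a minimal illustration only: the segment dates and IC values are made up, and the variable names are not taken from `dataset.py`.

```python
# Hypothetical train segments, shaped like task["dataset"]["kwargs"]["segments"]["train"]
segments = [["2008-01-01", "2008-12-31"], ["2009-01-01", "2009-12-31"]]
ic_values = [0.03, 0.05]  # made-up placeholders for the per-task IC series

# Lists are unhashable, so they cannot be used as dict keys.
try:
    dict(zip(segments, ic_values))
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# Tuples with the same contents are hashable and work as keys.
tuple_keyed = dict(zip((tuple(s) for s in segments), ic_values))

print(list_error)
print(list(tuple_keyed))
```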

---

### Bug 3: Incorrect Pandas MultiIndex Selection ⚠️ **Revealed after fixing Bug 2**

**Error:**
```
ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.
```

**Location:** `qlib/contrib/meta/data_selection/dataset.py:110-114`

**Root Cause:**
```python
def _calc_perf(self, pred, label):
    df = pd.DataFrame({"pred": pred, "label": label})
    df = df.groupby("datetime", group_keys=False).corr(method="spearman")
    corr = df.loc(axis=0)[:, "pred"]["label"].droplevel(axis=0, level=-1)  # ❌ Assumes a 2-level row index
```

**Problems:**
1. `df.loc(axis=0)[:, "pred"]` is valid pandas syntax, but it assumes the row index has two levels (`datetime` plus the column name)
2. `group_keys=False` causes the `datetime` level to be lost from the result of `groupby(...).corr()`
3. `droplevel` then fails because only 1 level exists

**Impact:** `_calc_perf()` fails during correlation calculation

---

## Expected Behavior

DDG-DA workflow should run successfully from start to finish:
1. ✅ Train 154 meta-models
2. ✅ Calculate data similarity matrix
3. ✅ Train meta-learning model with data selection
4. ✅ Generate final predictions and backtest results

## Environment

- Qlib version: `0.9.8.dev33` (main branch)
- Python version: `3.8.10`
- OS: `Linux` (Ubuntu 22.04)
- LightGBM version: `4.6.0` (affects 4.0+)

## Why These Bugs Form a Chain

1. **Bug 1 (LightGBM)** occurs first and prevents any further execution
2. **Bug 2 (unhashable list)** is only reached after Bug 1 is fixed
3. **Bug 3 (pandas indexing)** is only reached after Bug 2 is fixed

This is why they were not discovered earlier: each bug completely blocks the workflow and masks all subsequent bugs.

## Proposed Solution

All three bugs must be fixed together for DDG-DA to work. The fixes are:

### Fix 1: LightGBM 4.0+ Compatibility
**Files:** `qlib/contrib/model/gbdt.py`, `qlib/contrib/model/highfreq_gdbt_model.py`

```python
# Build callbacks list dynamically
callbacks = []

# Only add early_stopping callback if rounds is not None (LightGBM 4.0+ compatibility)
early_stop_rounds = self.early_stopping_rounds if early_stopping_rounds is None else early_stopping_rounds
if early_stop_rounds is not None:
    callbacks.append(lgb.early_stopping(early_stop_rounds))

callbacks.append(lgb.log_evaluation(period=verbose_eval))
callbacks.append(lgb.record_evaluation(evals_result))

self.model = lgb.train(..., callbacks=callbacks)  # remaining arguments unchanged
```
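
To check the conditional logic without LightGBM installed, the callback construction could be factored into a small helper that receives the module as an argument. This is a sketch under assumptions: the helper name `build_callbacks`, its `lgb_module` parameter, and the `fake_lgb` stand-in are hypothetical and not part of Qlib or LightGBM. Only the three callback names mirror the ones used in the fix above.

```python
from types import SimpleNamespace


def build_callbacks(lgb_module, early_stopping_rounds, verbose_eval, evals_result):
    """Hypothetical helper: build the callback list, skipping early stopping when disabled."""
    callbacks = []
    # Only register early stopping when a round count was actually given.
    if early_stopping_rounds is not None:
        callbacks.append(lgb_module.early_stopping(early_stopping_rounds))
    callbacks.append(lgb_module.log_evaluation(period=verbose_eval))
    callbacks.append(lgb_module.record_evaluation(evals_result))
    return callbacks


# Stand-in for the lightgbm module so the logic can be exercised in isolation.
fake_lgb = SimpleNamespace(
    early_stopping=lambda rounds: ("early_stopping", rounds),
    log_evaluation=lambda period: ("log_evaluation", period),
    record_evaluation=lambda result: ("record_evaluation", result),
)

without_stop = build_callbacks(fake_lgb, None, 20, {})
with_stop = build_callbacks(fake_lgb, 50, 20, {})
print([name for name, _ in without_stop])
print([name for name, _ in with_stop])
```

In real use the first argument would be the imported `lightgbm` module, and the returned list would be passed as `callbacks=` to `lgb.train`.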

### Fix 2: Convert List to Tuple
**File:** `qlib/contrib/meta/data_selection/dataset.py:97-100`

```python
data_key = task["dataset"]["kwargs"]["segments"]["train"]
# Convert list to tuple to make it hashable
if isinstance(data_key, list):
    data_key = tuple(data_key)
key_l.append(data_key)
```

### Fix 3: Fix Pandas MultiIndex Selection
**File:** `qlib/contrib/meta/data_selection/dataset.py:110-114`

```python
def _calc_perf(self, pred, label):
    df = pd.DataFrame({"pred": pred, "label": label})
    df = df.groupby("datetime").corr(method="spearman")  # Remove group_keys=False
    # Use xs to select 'label' from the second level of MultiIndex
    corr = df.xs("label", level=1)["pred"]
    return corr
```
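
The index shape that this fix relies on can be checked on a toy frame. The data below is invented for illustration (two dates, three instruments), and the index level names are an assumption about how `pred` and `label` are indexed in the workflow.

```python
import numpy as np
import pandas as pd

# Toy (datetime, instrument) index; values are made up.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
pred = pd.Series([0.1, 0.2, 0.3, 0.3, 0.2, 0.1], index=idx)
label = pd.Series([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], index=idx)

df = pd.DataFrame({"pred": pred, "label": label})

# With the default group_keys, the result is indexed by (datetime, column name).
per_day = df.groupby("datetime").corr(method="spearman")
print(per_day.index.nlevels)

# Take the "label" row of each daily 2x2 matrix, then its "pred" column.
corr = per_day.xs("label", level=1)["pred"]
print(corr)
```

Because each daily correlation matrix is symmetric, selecting row `"label"` and column `"pred"` gives the same value as the original row `"pred"`, column `"label"` selection.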

## Testing

After applying all three fixes:

```bash
cd examples/benchmarks_dynamic/DDG-DA
rm -rf mlruns
python workflow.py run
```

**Expected output:**
```
train tasks: 100%|████████████████████████████████| 154/154 [05:31<00:00,  2.15s/it]
calc: 100%|█████████████████████████████████████| 154/154 [00:01<00:00, 100.48it/s]
...
[Final backtest results displayed successfully]
```

## Additional Notes

**Why submit as one issue?**
- These bugs form a true dependency chain, not an artificial grouping
- Each bug completely blocks discovery of the next
- All must be fixed for DDG-DA to work
- Splitting into separate issues would create confusion about reproduction steps

**Dependencies:**
- ⚠️ **Requires PR #2230 to be merged first** (zscore and InternalData pickle whitelist)
- Without PR #2230, the workflow fails earlier with UnpicklingError

**Impact:**
- 🔴 **Critical**: DDG-DA workflow is completely non-functional
- Affects all users trying to use DDG-DA for meta-learning
- Bug 1 affects any LightGBM 4.0+ user who passes `early_stopping_rounds=None`, not just DDG-DA

**Related:**
- Issue #2130: Original UnpicklingError report
- PR #2213: Alpha158/Alpha360 handlers whitelist
- PR #2230: zscore and InternalData whitelist (prerequisite)

