QuantumNoLab · George930502 · Apr 2, 2026 · Apr 1, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -257,7 +257,7 @@ qvartools/
 │   │       └── lucj_sampler.py       # LUCJSampler (Qiskit + ffsim LUCJ circuit)
 │   │
 │   ├── molecules/                # Molecular system registry
-│   │   └── registry.py           # MOLECULE_REGISTRY (24 molecules: 12 full-space + 12 CAS), get_molecule, list_molecules
+│   │   └── registry.py           # MOLECULE_REGISTRY (26 molecules: 12 full-space + 14 CAS), get_molecule, list_molecules
 │   │
 │   ├── _ext/                     # Experimental GPU extensions
 │   │   ├── __init__.py
@@ -268,8 +268,9 @@ qvartools/
 │   │   └── nqs/
 │   │       ├── nqs_sqd.py        # NQSSQDConfig, run_nqs_sqd
 │   │       ├── nqs_skqd.py       # NQSSKQDConfig, run_nqs_skqd
-│   │       ├── hi_nqs_sqd.py     # HINQSSQDConfig, run_hi_nqs_sqd (initial_basis warm-start)
-│   │       └── hi_nqs_skqd.py    # HINQSSKQDConfig, run_hi_nqs_skqd (initial_basis warm-start)
+│   │       ├── hi_nqs_sqd.py     # HINQSSQDConfig, run_hi_nqs_sqd (initial_basis, PT2 selection)
+│   │       ├── hi_nqs_skqd.py    # HINQSSKQDConfig, run_hi_nqs_skqd (initial_basis warm-start)
+│   │       └── _pt2_helpers.py   # compute_pt2_scores, evict_by_coefficient, compute_temperature
 │   │
 │   └── _utils/                   # Internal utilities
 │       ├── scaling/
@@ -454,7 +455,7 @@ Stage 1: Train Flow + NQS          Stage 2: Basis Selection         Stage 3: Sub
 
 ## 4. Molecule Registry
 
-24 pre-configured molecular benchmarks (12 full-space + 12 CAS active-space) accessible via `get_molecule(name)`:
+26 pre-configured molecular benchmarks (12 full-space + 14 CAS active-space) accessible via `get_molecule(name)`:
 
 **Full-space molecules (4--28 qubits)**
 
@@ -473,7 +474,7 @@ Stage 1: Train Flow + NQS          Stage 2: Basis Selection         Stage 3: Sub
 | H2S | 26 | sto-3g | bent |
 | C2H4 | 28 | sto-3g | planar |
 
-**CAS active-space molecules (24--58 qubits)**
+**CAS active-space molecules (24--72 qubits)**
 
 | Name | Qubits | Basis Set | Active Space |
 |------|--------|-----------|--------------|
@@ -489,6 +490,8 @@ Stage 1: Train Flow + NQS          Stage 2: Basis Selection         Stage 3: Sub
 | Cr2-CAS(12,26) | 52 | cc-pvdz | 12e, 26o |
 | Cr2-CAS(12,28) | 56 | cc-pvdz | 12e, 28o |
 | Cr2-CAS(12,29) | 58 | cc-pvdz | 12e, 29o |
+| Cr2-CAS(12,32) | 64 | cc-pvdz | 12e, 32o |
+| Cr2-CAS(12,36) | 72 | cc-pvdz | 12e, 36o |
 
 ---
 
@@ -763,16 +766,27 @@ The `_ext/` subpackage is **experimental and optional**. `sbd_subprocess` requir
 
 When `SQDConfig.use_cartesian_product=True` (default), SQD splits sampled configs into alpha/beta spin strings via `split_spin_strings()`, then enumerates all alpha×beta pairs via `cartesian_product_configs()`. This dramatically improves basis coverage for molecular Hamiltonians.
 
+### IBM `solve_fermion` Energy Convention
+
+IBM's `qiskit_addon_sqd.fermion.solve_fermion` returns **electronic energy only** (no nuclear repulsion). Always add `hamiltonian.integrals.nuclear_repulsion` to the result. Its `sci_state.amplitudes` is **2D** (n_alpha_strs × n_beta_strs), not 1D — use α/β marginals for NQS teacher weights.
+
+### S-CORE is for Quantum Hardware Only
+
+`recover_configurations` (S-CORE) in `qiskit_addon_sqd` is a noise-recovery technique for samples from noisy quantum hardware. **Do not use it for classical NQS samples**: it adds massive overhead (removing it cut an NH₃ run from 1.5 hr to 5 s) and gives no accuracy benefit on clean samples.
+
 ---
 
 ## 11. CI/CD
 
 ### GitHub Actions
 
 **CI Pipeline** (`.github/workflows/ci.yml`):
-- **Lint job:** `ruff format --check` + `ruff check` on Python 3.11
-- **Test job:** `pytest` on Python 3.10, 3.11, 3.12 with `[dev,pyscf]` extras
-- Excludes `gpu` marker tests
+- **Lint job:** `ruff format --check` + `ruff check` on Python 3.11, pip cached
+- **Typecheck job:** `mypy` on core modules (informational)
+- **Smoke job:** Verifies that at least 26 molecules are registered and that all public modules are importable
+- **Test job:** `pytest` on Python 3.10, 3.11, 3.12 with `[dev,pyscf]` extras; coverage only on 3.11 (`--cov-fail-under=40`); excludes `gpu` marker
+- **Docs job:** Sphinx build check on PRs (warns but doesn't block)
+- **Global:** `concurrency: cancel-in-progress` cancels superseded runs; `fail-fast: false`
 
 **Docs Pipeline** (`.github/workflows/docs.yml`):
 - Sphinx build on push to main
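
The `solve_fermion` convention added to AGENTS.md above can be illustrated with a minimal sketch. It uses plain Python lists and made-up numbers instead of calling `qiskit_addon_sqd`; `total_energy` and `spin_marginals` are hypothetical helper names, not part of qvartools or the addon, and the layout `amplitudes[a][b]` (alpha index first) is assumed from the `n_alpha_strs × n_beta_strs` shape stated in the note.

```python
def total_energy(electronic_energy, nuclear_repulsion):
    """Add the constant nuclear repulsion term to an electronic-only energy."""
    return electronic_energy + nuclear_repulsion


def spin_marginals(amplitudes):
    """Return (alpha, beta) marginal probabilities from a 2D amplitude grid.

    amplitudes[a][b] is assumed to be the CI amplitude for alpha string a
    and beta string b.
    """
    probs = [[abs(c) ** 2 for c in row] for row in amplitudes]
    norm = sum(sum(row) for row in probs)
    # Sum over beta (columns) for the alpha marginal, over alpha (rows) for beta.
    alpha = [sum(row) / norm for row in probs]
    beta = [sum(row[b] for row in probs) / norm for b in range(len(probs[0]))]
    return alpha, beta


# Illustrative values only (not taken from any real calculation).
amps = [[0.8, 0.0], [0.0, 0.6]]
alpha_marg, beta_marg = spin_marginals(amps)
energy = total_energy(-1.85, 0.71)
```

In real use the electronic energy would come from the solver and the repulsion term from `hamiltonian.integrals.nuclear_repulsion`; the marginals sum to one by construction.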

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -18,12 +18,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - `compute_molecular_integrals` now accepts `cas` and `casci` parameters for CAS active-space reduction
-- 12 new CAS molecules in registry: N₂-CAS(10,12/15/17/20/26), Cr₂ + variants, Benzene CAS(6,15)
+- 14 new CAS molecules in registry (26 total): N₂-CAS(10,12/15/17/20/26), Cr₂ + variants up to 72Q, Benzene CAS(6,15)
+- IBM `solve_fermion` auto-enabled when `qiskit_addon_sqd` is installed (α×β Cartesian product, dramatically better accuracy)
+- `_train_nqs_teacher` raises `ValueError` when `energy_weight > 0` without `hamiltonian`
 - `_compute_cas_integrals` helper with auto-CASCI fallback for large active spaces
 - `MolecularHamiltonian.build_sparse_hamiltonian()` for O(nnz) sparse H construction
 - Sparse eigenvalue dispatch in `gpu_solve_fermion` for basis > 8K configs
 - CAS-aware `FCISolver` using active-space integrals directly (no full molecule rebuild)
 - FCI-free pipeline support: 25 experiment scripts gracefully handle `exact_energy=None`
+- PT2 configuration selection for HI+NQS+SQD (`use_pt2_selection=True`, ADR-005)
+- `_pt2_helpers.py`: EN-PT2 scoring, ASCI coefficient eviction, temperature annealing
+- 3-term NQS teacher loss (teacher KL + energy REINFORCE + entropy)
+- CIPSI sparse fallback for basis > 10K via `build_sparse_hamiltonian`
 - `TransformerAsNQS` adapter: enables `AutoregressiveTransformer` in NF training pipeline
 - `NQSWithSampling` adapter: enables any `NeuralQuantumState` in HI training pipeline
 - `qvartools._logging` module with `configure_logging()` and `get_logger()`
@@ -38,10 +44,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ADR-002 decision record (deferred: torch/numpy roundtrip not a bottleneck)
 - ADR-003 decision record (GPU-native SBD integration via r-ccs-cms/sbd)
 
+### Removed
+- S-CORE (`recover_configurations`) from HI-NQS-SQD IBM path — designed for quantum hardware noise, not needed for classical NQS samples (NH₃ 1.5 hr → 5 s)
+
 ### Fixed
 - `TransformerNFSampler._build_nqs()` used wrong parameter name `hidden_dim` instead of `hidden_dims`
 - `hi_nqs_sqd.py` passed tensors instead of numpy arrays to `vectorized_dedup`
 - Groups 07/08 pipelines discarded NF+DCI basis when calling iterative NQS solvers (Issue #10)
+- IBM `solve_fermion` returns electronic energy only; now correctly adds `nuclear_repulsion`
+- CIPSI sparse path: now uses `h_matrix.detach().cpu().numpy()` instead of `np.asarray`, which fails on CUDA tensors
 
 ## [0.0.0] - 2026-03-26
 

diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@ qvartools consolidates normalizing-flow-guided neural quantum states (NF-NQS), s
 - **Unified solver interface** covering FCI, CCSD, SQD, SKQD, and iterative NF variants -- all returning a common `SolverResult`
 - **Automatic system-size scaling** that adapts network architectures and sampling budgets to the Hilbert-space dimension
 - **YAML-based experiment configuration** with CLI overrides for reproducible experiments
-- **Molecule registry** with pre-configured benchmarks from H2 (4 qubits) to C2H4 (28 qubits)
+- **Molecule registry** with 26 pre-configured benchmarks from H₂ (4 qubits) to Cr₂-CAS(12,36) (72 qubits)
 
 ## Installation
 
@@ -169,6 +169,8 @@ Each subpackage is self-contained with a clean public API. Lower-level modules h
 | Cr2-CAS(12,26) | 52 | cc-pvdz | 12e, 26 orb |
 | Cr2-CAS(12,28) | 56 | cc-pvdz | 12e, 28 orb |
 | Cr2-CAS(12,29) | 58 | cc-pvdz | 12e, 29 orb |
+| Cr2-CAS(12,32) | 64 | cc-pvdz | 12e, 32 orb |
+| Cr2-CAS(12,36) | 72 | cc-pvdz | 12e, 36 orb |
 
 ## Documentation
 

diff --git a/docs/api_reference.md b/docs/api_reference.md
@@ -762,7 +762,21 @@ Fast integer hash for a single configuration tensor.
 
 ### `run_hi_nqs_sqd(hamiltonian, mol_info, config=None, *, initial_basis=None)`
 
-Iterative HI+NQS+SQD pipeline with self-consistent eigenvector feedback. Config: `HINQSSQDConfig`. The `initial_basis` kwarg accepts a `torch.Tensor` of shape `(n_configs, n_qubits)` to warm-start the cumulative basis.
+Iterative HI+NQS+SQD pipeline with self-consistent eigenvector feedback. Config: `HINQSSQDConfig`. The `initial_basis` kwarg accepts a `torch.Tensor` of shape `(n_configs, n_qubits)` to warm-start the cumulative basis. Auto-enables IBM `solve_fermion` (α×β Cartesian product) when `qiskit_addon_sqd` is installed.
+
+### `_pt2_helpers` (Internal PT2 Selection Helpers)
+
+#### `compute_pt2_scores(candidates, basis, coeffs, hamiltonian, e0) -> np.ndarray`
+
+Score candidate configs by Epstein-Nesbet PT2 importance: `score(x) = |⟨x|H|Φ₀⟩|² / |E₀ - H_xx|`. Returns non-negative scores, shape `(n_cand,)`.
+
+#### `evict_by_coefficient(basis, coeffs, max_size) -> tuple[Tensor, ndarray]`
+
+Keep only the highest-|c_i|² configs (ASCI pattern). Returns trimmed basis and coefficients.
+
+#### `compute_temperature(iteration, max_iterations, t_init, t_final) -> float`
+
+Linear temperature annealing from `t_init` to `t_final` over iterations.
 
 ### `run_hi_nqs_skqd(hamiltonian, mol_info, config=None, *, initial_basis=None)`
 
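As a rough illustration of the Epstein-Nesbet score documented for `compute_pt2_scores`, the sketch below evaluates `|⟨x|H|Φ₀⟩|² / |E₀ - H_xx|` on a small dense matrix. It is not the qvartools implementation: according to ADR-005 the real helper works on bitstring configurations through `get_connections`, whereas here the Hamiltonian is a nested list indexed by integers. The names `en_pt2_scores` and `top_k_candidates`, and the `eps` guard against a vanishing denominator, are assumptions made for the example.

```python
def en_pt2_scores(h, basis_idx, coeffs, cand_idx, e0, eps=1e-12):
    """Epstein-Nesbet score for each candidate x outside the current basis."""
    scores = []
    for x in cand_idx:
        # <x|H|Phi0> = sum_i H[x][i] * c_i over the current basis configurations.
        coupling = sum(h[x][i] * c for i, c in zip(basis_idx, coeffs))
        # Assumed safeguard: clamp the denominator away from zero.
        denom = max(abs(e0 - h[x][x]), eps)
        scores.append(abs(coupling) ** 2 / denom)
    return scores


def top_k_candidates(cand_idx, scores, k):
    """Return the k candidates with the largest scores."""
    ranked = sorted(zip(scores, cand_idx), reverse=True)
    return [x for _, x in ranked[:k]]


# Toy 3x3 Hamiltonian: state 0 is the current basis, states 1 and 2 are candidates.
h = [[-1.0, 0.2, 0.0],
     [0.2, 0.5, 0.1],
     [0.0, 0.1, 1.0]]
scores = en_pt2_scores(h, basis_idx=[0], coeffs=[1.0], cand_idx=[1, 2], e0=-1.0)
selected = top_k_candidates([1, 2], scores, k=1)
```

State 2 has no direct coupling to the reference, so its score is zero and only state 1 survives a top-1 cut, which mirrors the role of `pt2_top_k` in the config.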

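The eviction and annealing helpers documented in the api_reference hunk above can be sketched in a few lines. This is a hedged illustration in plain Python, not the library code: `evict_by_weight` and `linear_temperature` are hypothetical names, the sketch does not renormalise the surviving coefficients, and it assumes the final temperature is reached at iteration index `max_iterations - 1` (the source does not state the endpoint convention).

```python
def evict_by_weight(basis, coeffs, max_size):
    """Keep the max_size configurations with the largest |c_i|^2 (ASCI pattern)."""
    if len(basis) <= max_size:
        return list(basis), list(coeffs)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]) ** 2, reverse=True)
    keep = sorted(order[:max_size])  # survivors stay in their original order
    return [basis[i] for i in keep], [coeffs[i] for i in keep]


def linear_temperature(iteration, max_iterations, t_init, t_final):
    """Linearly interpolate from t_init (first iteration) to t_final (last)."""
    if max_iterations <= 1:
        return t_final
    frac = min(max(iteration / (max_iterations - 1), 0.0), 1.0)
    return t_init + frac * (t_final - t_init)


# Three toy configurations (occupation tuples) with their CI coefficients.
basis = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0)]
coeffs = [0.1, -0.9, 0.4]
kept_basis, kept_coeffs = evict_by_weight(basis, coeffs, max_size=2)
t_start = linear_temperature(0, 10, 1.0, 0.3)
t_end = linear_temperature(9, 10, 1.0, 0.3)
```

Ranking by `|c_i|²` rather than by the signed coefficient is what keeps the configuration with coefficient `-0.9` here.
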
diff --git a/docs/decisions/005-pt2-selection-hi-nqs-v3.md b/docs/decisions/005-pt2-selection-hi-nqs-v3.md
@@ -0,0 +1,226 @@
+# ADR-005: PT2 Configuration Selection for HI-NQS v3
+
+- **Status**: Proposed
+- **Date**: 2026-04-02
+- **Authors**: George Chang, Jen-Yu Chang
+- **Relates to**: Issue #25 (adaptive sampling RFC), PR #30 (original proposal)
+
+---
+
+## Context
+
+The current `run_hi_nqs_sqd` adds ALL unique NQS samples to the
+cumulative basis each iteration, relying on random sampling to find
+important configurations. At 40+ qubits, NQS sampling covers < 0.01%
+of the Hilbert space, and most samples are uninformative.
+
+PR #30 (leo07010) proposed PT2-based perturbative selection to filter
+NQS samples before they are added to the basis. The algorithmic
+concept is sound, but the implementation has critical issues: 3 API
+crash bugs, hard imports of optional dependencies, removal of existing
+features (`initial_basis`, CAS FCI, logging), and a mean-field
+approximation in the teacher signal that loses correlation information.
+
+This ADR documents the design decisions for a correct reimplementation.
+
+---
+
+## Decisions
+
+### D1: PT2 scoring formula — Epstein-Nesbet
+
+**Options:** Epstein-Nesbet (EN), Møller-Plesset (MP), Heat-Bath CI (HCI)
+
+**Choice: Epstein-Nesbet**
+
+```
+score(x) = |⟨x|H|Φ₀⟩|² / |E₀ - H_xx|
+```
+
+- EN uses the actual diagonal element `H_xx`, which naturally captures
+  correlation effects in the denominator
+- MP uses orbital energy sums, which requires a Fock operator (not
+  always available in our framework)
+- HCI uses `max_i |H_{xi} c_i|` without a denominator — simpler but
+  gives no PT2 energy correction estimate
+- EN is the standard in CIPSI/Quantum Package and our existing
+  `SelectedCIExpander`
+
+**Source:** Quantum Package docs, QMCPACK Selected CI docs, Holmes et
+al. JCTC 2016.
+
+### D2: NQS teacher signal — full |c_x|² joint distribution
+
+**Options:** Full `|c_x|²`, α/β marginal product, uniform
+
+**Choice: Full |c_x|²**
+
+PR #30 used `alpha_marginal[a] × beta_marginal[b]` as teacher weights.
+This is a mean-field approximation that loses alpha-beta correlation —
+for strongly correlated molecules (Cr₂, bond-breaking), the joint
+distribution `|c_{ab}|²` has off-diagonal structure that the product
+approximation misses entirely.
+
+The original `_train_nqs_teacher` used full `|c_x|²`, which is correct.
+We preserve this approach.
+
+**Source:** Lanczos-NQS paper (arXiv:2502.01264), Thompson & Gunlycke
+(arXiv:2603.24728).
+
+### D3: Basis eviction — coefficient-based (ASCI pattern)
+
+**Options:** PT2 score from insertion time, |c_i|² after diag, random
+
+**Choice: |c_i|² after each diagonalisation**
+
+PR #30 stored PT2 scores from the iteration when each config was added
+and used these for eviction. This is methodologically flawed: scores
+from different iterations use different eigenvectors, making cross-
+iteration comparison meaningless.
+
+The Adaptive Sampling CI (ASCI) method by Tubman et al. uses the
+correct approach: after each diagonalisation, keep the configs with
+largest `|c_i|²` (CI coefficient magnitude). This naturally discards
+configs that the eigenvector no longer considers important.
+
+**Source:** Tubman et al. JCTC 2020, Quantum Package CIPSI truncation.
+
+### D4: Diag backend — gpu_solve_fermion (preserve optional dep guard)
+
+**Options:** `qiskit_addon_sqd.solve_fermion` (hard import), `gpu_solve_fermion` (guarded)
+
+**Choice: gpu_solve_fermion**
+
+PR #30 hard-imported `solve_fermion` from `qiskit_addon_sqd`, which
+broke CI (the package is optional). We preserve the existing pattern:
+`gpu_solve_fermion` with the try/except guard for `qiskit_addon_sqd`.
+
+### D5: Backward compatibility — extend, don't replace
+
+PR #30 deleted `initial_basis` (PR #20), CAS FCI support (PR #24),
+logging, `__all__`, frozen dataclass, and NumPy-style docstrings.
+
+**Choice:** Preserve ALL existing API. Add PT2 selection as new
+parameters on the existing `HINQSSQDConfig`:
+
+```python
+@dataclass(frozen=True)
+class HINQSSQDConfig:
+    # ... existing fields preserved ...
+    # New PT2 selection fields
+    use_pt2_selection: bool = False       # opt-in, backward compatible
+    pt2_top_k: int = 2000                 # configs kept per iteration
+    max_basis_size: int = 10_000          # eviction threshold
+    convergence_window: int = 3           # consecutive converged iters
+    initial_temperature: float = 1.0      # annealing start
+    final_temperature: float = 0.3        # annealing end
+```
+
+When `use_pt2_selection=False` (default), behavior is identical to
+current code. This ensures all existing tests pass unchanged.
+
+---
+
+## Implementation Plan (TDD)
+
+### P0: Config fields (backward compatible)
+
+Add new fields to frozen `HINQSSQDConfig` with defaults that preserve
+existing behavior (`use_pt2_selection=False`).  Also add 3-term loss
+weights (`teacher_weight`, `energy_weight`, `entropy_weight`).
+
+### P1: Standalone helpers in `methods/nqs/_pt2_helpers.py`
+
+Three pure functions (no NQS dependency, independently testable):
+
+1. `compute_pt2_scores(candidates, basis, coeffs, hamiltonian, e0)`:
+   EN-PT2 scoring via `get_connections` (NOT `get_connections_vectorized_batch`).
+   Uses existing `bitstring_format` utilities (NOT local reimplementation).
+2. `evict_by_coefficient(basis, coeffs, max_size)`: keep highest |c_i|²
+   (ASCI pattern).
+3. `compute_temperature(iteration, max_iterations, t_init, t_final)`: linear
+   interpolation.
+
+### P2: Enhance `_train_nqs_teacher` with 3-term loss
+
+Add energy (REINFORCE with diagonal advantage) and entropy terms.
+Use full `|c_x|²` teacher (NOT α/β marginal product — loses correlation).
+Correctly call `nqs.log_prob(alpha, beta)` with 2 args (split at n_orb).
+Keep original behavior when `energy_weight=0` and `entropy_weight=0`.
+
+### P3: Integration into `run_hi_nqs_sqd`
+
+Gate on `use_pt2_selection`:
+- `True`: PT2 filter → eviction → temperature anneal → convergence window
+- `False`: zero change to existing behavior
+
+Preserve: `initial_basis`, CAS compat, logging, `__all__`, docstrings.
+
+### P4: CIPSI sparse fallback (independent)
+
+When `n_basis > 10K`, use `hamiltonian.build_sparse_hamiltonian(basis)` +
+`scipy.sparse.linalg.eigsh`.  Uses OUR API (NOT PR #30's nonexistent
+`get_sparse_matrix_elements`).
+
+### Dependency graph
+
+```
+P0 (config) ──┬── P1 (helpers)  ──┐
+              ├── P2 (3-term loss) ├── P3 (integration) ── P5 (review) ── P6 (PR)
+              └── P4 (CIPSI sparse, independent) ──────────┘
+```
+
+P1, P2, P4 are independent after P0.  P3 depends on P1 + P2.
+
+### Scope: SQD only, SKQD deferred
+
+PT2 selection is added to `run_hi_nqs_sqd` only.  `run_hi_nqs_skqd`
+already has Krylov expansion which serves a similar basis-enrichment
+role.  Extending PT2 to SKQD is a future enhancement.
+
+---
+
+## Consequences
+
+### Positive
+
+- More accurate basis selection at scale (PT2-ranked candidates instead of every unique random sample)
+- Basis eviction prevents unbounded memory growth
+- Temperature annealing improves exploration→exploitation transition
+- Fully backward compatible (opt-in via `use_pt2_selection`)
+
+### Negative
+
+- PT2 scoring adds O(n_candidates × n_connections) work per iteration
+- Eviction adds one eigenvector sort per iteration (negligible)
+- More config parameters to tune
+
+### Risks
+
+- PT2 scoring is CPU-bound Python (get_connections loop) — may be slow at 40Q
+- Coefficient-based eviction may discard configs that become important later
+- Temperature annealing schedule may need per-system tuning
+
+### Validation Results (2026-04-02)
+
+HI-NQS IBM (5K samples/iter) vs SCI (CIPSI, natural convergence, Numba):
+
+| System | HI-NQS Energy | SCI Energy | Diff | HI-NQS Time | SCI Time |
+|--------|--------------|------------|------|-------------|----------|
+| C2H2 24Q | **-76.02457** | -76.02453 | HI-NQS wins 0.04 mHa | **456s** | 1,088s |
+| N2 40Q | -109.1844 | **-109.2132** | SCI wins 28.8 mHa | **20 min** | 3h45m |
+
+Conclusion: at 24Q, HI-NQS reaches a slightly lower energy than SCI in less than half
+the time; at 40Q, systematic H-connection expansion (Issue #35 Tier 1) is needed to close the 28.8 mHa gap.
+
+---
+
+## References
+
+- PR #30 (leo07010): original HI-NQS v3 proposal
+- Quantum Package CIPSI: EN-PT2 standard
+- Holmes et al. JCTC 2016: Heat-Bath CI comparison
+- Tubman et al. JCTC 2020: ASCI coefficient-based selection
+- arXiv:2503.06292: HI-VQE iteration strategy
+- arXiv:2603.24728: Auto-regressive NQS for Selected CI
+- arXiv:2502.01264: Lanczos-NQS (KL vs MSE for teacher)
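
Decision D2 in the ADR above (full `|c_x|²` teacher instead of an α/β marginal product) can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not code from the repository: `normalise` and `product_of_marginals` are hypothetical helpers, and the 2×2 weight grid is invented to represent a perfectly correlated state in which only the (α₀, β₀) and (α₁, β₁) pairs are populated.

```python
def normalise(weights):
    """Scale a 2D grid of non-negative weights so that it sums to one."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def product_of_marginals(joint):
    """Mean-field approximation: p(a, b) ~ p_alpha(a) * p_beta(b)."""
    alpha = [sum(row) for row in joint]
    beta = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    return [[a * b for b in beta] for a in alpha]


# Toy joint distribution |c_ab|^2 with weight only on the diagonal pairs.
joint = normalise([[0.5, 0.0], [0.0, 0.5]])
mean_field = product_of_marginals(joint)

# Probability mass the product approximation assigns to pairs whose
# true amplitude is exactly zero.
leaked = mean_field[0][1] + mean_field[1][0]
```

Both marginals are uniform here, so the product spreads the weight evenly over all four pairs and half of the teacher mass lands on configurations that the eigenvector excludes. That is the loss of alpha-beta correlation D2 refers to.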