
[QDP] feat: add credit card fraud benchmark + amplitude encoding optimizations#1106

Open
rich7420 wants to merge 4 commits into apache:main from rich7420:credit-card

Conversation

rich7420 (Contributor) commented Mar 2, 2026

Changes

  • New benchmark: encoding_benchmarks/qdp_pipeline/creditcardfraud_amplitude.py — 5-qubit
    amplitude VQC on Credit Card Fraud data, aligned with PennyLane baseline (same circuit, loss,
    optimizer). Closes the QDP vs baseline training time gap from ~22% slower to <1% gap.
  • New baseline: encoding_benchmarks/pennylane_baseline/creditcardfraud_amplitude.py
    PennyLane reference implementation with AUPRC/F1 metrics for imbalanced data.
  • QuantumDataLoader API: added source_array(X) (in-memory, no temp file),
    as_torch(device), and as_numpy() for ergonomic batch output format.
  • Rust PipelineIterator: added new_from_array() constructor; InMemory next_batch now
    passes &data[start..end] slice directly (no per-batch to_vec()).
  • amplitude.rs: moved D2H norm validation to after encode kernel + device.synchronize(),
    eliminating a mid-pipeline GPU→CPU roundtrip in encode_batch.
  • Bug fixes (iris + creditcard benchmarks): requires_grad=False on all data arrays to
    prevent AdamOptimizer from computing unnecessary gradients through state vectors;
    AmplitudeEmbedding(normalize=False) in place of StatePrep; .real extraction after
    torch.from_dlpack() to convert complex128 DLPack output to float64.

Motivation

The existing QDP benchmark suite only covers the Iris dataset (100 samples, 2 qubits), which is too small to surface real-world data-loading and encoding bottlenecks. Credit Card Fraud (284,807 transactions, 5 qubits) is a standard imbalanced-classification benchmark from Kaggle/OpenML that stresses the full QDP pipeline — batch iteration, GPU encoding, and training — at realistic scale.

Adding this benchmark also serves a second purpose:

  • Expands loader API coverage: source_array(X), as_torch(), and as_numpy() are exercised end-to-end by the new benchmark and tests, catching integration issues across the Python → PyO3 → Rust → CUDA boundary.
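For context on why the baseline reports AUPRC/F1 rather than accuracy: the public Kaggle dataset contains 492 frauds among 284,807 transactions (about 0.17% positives), so accuracy is nearly meaningless. A back-of-the-envelope check, in plain Python:

```python
# Class counts from the public Kaggle Credit Card Fraud dataset description.
total, frauds = 284_807, 492
negatives = total - frauds

# A useless classifier that predicts "not fraud" for everything:
accuracy = negatives / total
print(f"accuracy = {accuracy:.4%}")   # ~99.83% while catching zero fraud

# Its recall on the fraud class is 0, so F1 is 0, and a random-ranking
# baseline AUPRC is only about the positive rate -- these metrics expose
# the failure that accuracy hides.
positive_rate = frauds / total
print(f"baseline AUPRC is roughly {positive_rate:.4%}")
```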

Checklist

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other
  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes

rich7420 commented Mar 2, 2026

This PR ended up a bit larger than intended. Sorry about that.

viiccwen (Contributor) left a comment

Thanks for contributing! 🙌
I'll look deeper tomorrow. I think we should add tests covering the new loader APIs, especially since the new behavior crosses the Python, PyO3, Rust, and CUDA boundaries.

Comment on lines +356 to +358
elif kind == "numpy":
    for qt in raw_iter:
        yield _torch.from_dlpack(qt).cpu().numpy()

as_torch() validates that torch is installed, but as_numpy() does not. _wrap_iterator() then calls _torch.from_dlpack(...) on the "numpy" path.

Doesn't that mean as_numpy() can succeed at configuration time and then fail during iteration with an unclear runtime error if PyTorch is not installed? 🤔

rich7420 (Contributor, author) replied:

Oh, nice catch! You're right.

ryankert01 (Member) commented:

amplitude.rs: moved D2H norm validation to after encode kernel + device.synchronize(),
eliminating a mid-pipeline GPU→CPU roundtrip in encode_batch.

nice


400Ping commented Mar 3, 2026

Please resolve the merge conflicts.


rich7420 commented Mar 6, 2026

Please take a look and test when you have time; no hurry.

400Ping self-assigned this Mar 8, 2026
ryankert01 (Member) left a comment

The PennyLane baseline trains on CPU, whereas the QDP pipeline supports both CPU and GPU. We could simplify both to always train on the GPU.

rich7420 (Contributor, author) replied:

No problem!


400Ping commented Mar 19, 2026

Please resolve the merge conflicts.


400Ping commented Mar 19, 2026

Can you also add some more context to the PR description about why this is needed?

400Ping (Member) left a comment

The array-loader optimization claim does not match the implementation in this PR.
create_array_loader() says batching uses slices without per-batch to_vec(), but
PipelineIterator::take_batch_from_source() still clones each in-memory batch with
data[start..end].to_vec().

next_batch() already handles InMemory via zero-copy &data[start..end].
Remove the dead-code .to_vec() clone path that contradicted the
documented optimization claim.

Addresses review comment from 400Ping.
rich7420 (Contributor, author) replied:

The array-loader optimization claim does not match the implementation in this PR. create_array_loader() says batching uses slices without per-batch to_vec(), but PipelineIterator::take_batch_from_source() still clones each in-memory batch with data[start..end].to_vec().

Great catch! You are right that the code in take_batch_from_source() was misleading.
Actually, next_batch() already handles the InMemory variant directly using the zero-copy data[start..end] slice logic and never calls take_batch_from_source() for it. To avoid confusion and make the implementation match the documentation claims, I've replaced the dead-code InMemory arm in take_batch_from_source() with unreachable!().
I've also rebased the branch onto main and resolved all the merge conflicts. Thanks for the review!
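The zero-copy distinction at the heart of this thread has a simple Python-side analogue. This sketch is illustrative, not the Rust code: a NumPy basic slice is a borrowed view into the source buffer (like `&data[start..end]`), while an explicit copy allocates a fresh buffer per batch (like `data[start..end].to_vec()`).

```python
import numpy as np

data = np.arange(10, dtype=np.float64)
start, end = 2, 6

view = data[start:end]          # like &data[start..end]: a borrow, no allocation
clone = data[start:end].copy()  # like data[start..end].to_vec(): per-batch copy

print(np.shares_memory(data, view))   # True  -- zero-copy view
print(np.shares_memory(data, clone))  # False -- duplicated buffer
```

Iterating a large in-memory dataset with views instead of copies avoids one allocation and memcpy per batch, which is the same win the Rust `next_batch()` slice path delivers.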
