
Add QDP backend detection and pure-PyTorch reference implementations#1189

Open
ryankert01 wants to merge 1 commit into apache:main from ryankert01:add-pytorch-reference

Conversation

@ryankert01
Member

Closes #1177

Summary

This PR adds a pure-PyTorch reference backend to the QDP Python package, to compare against our GPU-kernel implementation.

Benchmark Results

All runs: 100 batches x 64 vectors (except 18-qubit: 50 batches x 64), median of 3 trials.

Amplitude Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 462,882 | 1,390,998 | 567,499 | 0.4x |
| 10 | end-to-end | 73,699 | 86,939 | 234,170 | 2.7x |
| 14 | encode-only | 118,620 | 789,862 | 151,713 | 0.2x |
| 14 | end-to-end | 5,514 | 6,356 | 25,603 | 4.0x |
| 16 | encode-only | 5,458 | 228,525 | 65,358 | 0.3x |
| 16 | end-to-end | 721 | 964 | 6,336 | 6.6x |
| 18 | encode-only | 1,313 | 58,761 | 15,876 | 0.3x |
| 18 | end-to-end | 194 | 237 | 1,529 | 6.5x |

Angle Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 14 | encode-only | 1,086 | 59,864 | 45,332 | 0.8x |
| 14 | end-to-end | 1,114 | 56,456 | 50,919 | 0.9x |
| 16 | encode-only | 262 | 10,254 | 8,032 | 0.8x |
| 16 | end-to-end | 252 | 10,093 | 11,496 | 1.1x |

IQP Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 15,381 | 74,239 | 484,071 | 6.5x |
| 14 | encode-only | 1,258 | 25,597 | 55,304 | 2.2x |

Analysis

  • Amplitude encode-only: PyTorch GPU wins 2-4x because its vectorized L2-norm + pad is very efficient, while Mahout's encode path still pays per-batch GPU output allocation + D2H norm validation sync overhead.
  • Amplitude end-to-end: Mahout wins 2.7-6.6x, with the advantage growing at higher qubit counts. Rust data generation + integrated pipeline dominates Python generate_batch_data + torch.tensor + H2D transfer.
  • Angle: Near-parity in both modes (0.8-1.1x). The tensor-product encoding is compute-bound and both implementations are similarly efficient.
  • IQP encode-only: Mahout wins decisively (2.2-6.5x). Mahout's CUDA kernel for IQP (Walsh-Hadamard + phase computation) is significantly faster than PyTorch's Python-level loop over butterfly stages.
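The vectorized normalize-and-pad path that makes PyTorch competitive on amplitude encode-only can be sketched in a few lines (an illustrative sketch, not the actual torch_ref.py code):

```python
import torch

def amplitude_encode(batch: torch.Tensor, num_qubits: int) -> torch.Tensor:
    """L2-normalize each row and zero-pad it to the 2**num_qubits
    state-vector length (illustrative sketch)."""
    dim = 1 << num_qubits
    norms = batch.norm(dim=1, keepdim=True)        # per-sample L2 norm
    normalized = batch / norms.clamp_min(1e-12)    # guard against zero rows
    return torch.nn.functional.pad(normalized, (0, dim - batch.shape[1]))
```

Everything here is a single batched tensor op, which is why the GPU path needs no per-batch allocation or host synchronization.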

Known Limitations

  • Basis encode-only: Rust engine.encode expects per-sample basis indices; batch input format differs from PyTorch. Requires Rust API change to support.
  • IQP end-to-end: Rust pipeline uses 1 << num_qubits as sample_size regardless of encoding method, causing a mismatch for IQP (which expects n + n*(n-1)/2). Pre-existing Rust pipeline bug.
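The IQP sample-size mismatch is easy to see numerically (a quick illustration; the helper names here are hypothetical, not from the codebase):

```python
def iqp_feature_count(n: int) -> int:
    # IQP expects one feature per qubit plus one per qubit pair
    return n + n * (n - 1) // 2

def pipeline_sample_size(n: int) -> int:
    # what the Rust pipeline currently assumes for every encoding
    return 1 << n
```

For 14 qubits the pipeline generates 16,384-element samples while IQP expects only 105 features, hence the mismatch.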

Changes

New files (3)

  • qdp/qdp-python/qumat_qdp/_backend.py — Backend detection (RUST_CUDA / PYTORCH / NONE) with auto-selection, force_backend() for testing
  • testing/qdp_python/test_torch_ref.py — 396 lines of tests for the PyTorch reference encoders
  • testing/qdp_python/test_fallback.py — 306 lines of tests for the fallback path (loader, API)
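The detection logic in _backend.py presumably boils down to a try-import cascade along these lines (a sketch under assumptions; the real module likely also checks CUDA device availability, and the exact import path is assumed):

```python
from enum import Enum

class Backend(Enum):
    RUST_CUDA = "rust_cuda"   # Rust extension with CUDA
    PYTORCH = "pytorch"       # pure-PyTorch reference encoders
    NONE = "none"             # neither available

def detect_backend() -> Backend:
    """Pick the fastest available backend (sketch; import paths assumed)."""
    try:
        from qumat_qdp import _qdp  # noqa: F401  Rust extension
        return Backend.RUST_CUDA
    except ImportError:
        pass
    try:
        import torch  # noqa: F401
        return Backend.PYTORch if False else Backend.PYTORCH
    except ImportError:
        return Backend.NONE
```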

Modified files (4)

  • qdp/qdp-python/qumat_qdp/__init__.py — Graceful degradation when _qdp Rust extension is unavailable; exports Backend, BACKEND
  • qdp/qdp-python/qumat_qdp/api.py — QdpBenchmark now supports .backend("rust" | "pytorch" | "auto") with PyTorch throughput/latency implementations
  • qdp/qdp-python/qumat_qdp/loader.py — QuantumDataLoader falls back to PyTorch encoding when _qdp is missing; supports synthetic data and .npy/.pt files
  • testing/conftest.py — test_torch_ref.py and test_fallback.py are allowed to run without the Rust extension

Benchmark file (1)

  • qdp/qdp-python/benchmark/benchmark_pytorch_ref.py — Added --mode flag (encode-only default, end-to-end), two new functions (run_mahout_encode_only, run_pytorch_end_to_end), updated banner with mode info and footnotes

How --mode works

| | encode-only (default) | end-to-end |
|---|---|---|
| PyTorch | Pre-gen on GPU → time encoding only | CPU data gen + GPU transfer + encode (all timed) |
| Mahout | Pre-gen on GPU → QdpEngine.encode(cuda_tensor) | run_throughput_pipeline_py (gen + H2D + encode) |
| Fair? | Yes | Yes |
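The difference between the two modes is just where the timer starts; schematically (a stdlib-only sketch, not the benchmark script itself):

```python
import time

def bench(gen, encode, batches: int, mode: str = "encode-only") -> float:
    """Return elapsed seconds. encode-only excludes data generation from
    the timed region; end-to-end times generation + encoding together."""
    if mode == "encode-only":
        data = [gen() for _ in range(batches)]   # pre-generate, untimed
        start = time.perf_counter()
        for batch in data:
            encode(batch)
    else:
        start = time.perf_counter()
        for _ in range(batches):
            encode(gen())
    return time.perf_counter() - start
```

Both backends use the same timed region in each mode, which is what makes the comparison fair.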

Key design decisions

  1. No Rust changes needed — QdpEngine.encode() already accepts CUDA tensors via DLPack zero-copy, enabling the encode-only Mahout benchmark in pure Python
  2. Graceful degradation — The entire qumat_qdp package now works without _qdp compiled, falling back to PyTorch. This makes the package installable/testable on machines without CUDA
  3. Backend auto-detection — _backend.py provides a single source of truth for which backend is available, with force_backend() for testing
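The DLPack zero-copy hand-off behind decision 1 works like this in plain PyTorch (a CPU tensor is shown for illustration; the PR relies on the same mechanism for CUDA tensors):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4, dtype=torch.float32)
capsule = to_dlpack(t)        # export as a DLPack capsule, no copy
view = from_dlpack(capsule)   # re-import: shares the same storage
view[0] = 42.0                # mutation is visible through t
```

Because the capsule only carries a pointer and metadata, QdpEngine.encode() can consume a PyTorch CUDA tensor without any host round-trip.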

- Implemented backend detection and selection logic in _backend.py, prioritizing Rust+CUDA, PyTorch, and fallback to None.
- Added pure-PyTorch reference implementations for quantum data encoding methods in torch_ref.py, including amplitude, angle, basis, and IQP encoding.
- Created comprehensive tests for fallback mechanisms and pure-PyTorch encodings in test_fallback.py and test_torch_ref.py, ensuring functionality without the Rust extension.
- Enhanced error handling and validation across encoding methods to ensure robustness.
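As an illustration of the tensor-product angle encoding mentioned above, under one common convention where qubit i is prepared as RY(x_i)|0⟩ (a sketch; the actual torch_ref.py implementation is vectorized):

```python
import torch

def angle_encode(x: torch.Tensor) -> torch.Tensor:
    """Angle encoding sketch: qubit i is [cos(x_i/2), sin(x_i/2)];
    the full state is the Kronecker product over all qubits."""
    state = torch.ones(1, dtype=x.dtype)
    for xi in x:
        qubit = torch.stack((torch.cos(xi / 2), torch.sin(xi / 2)))
        state = torch.kron(state, qubit)
    return state
```

Each factor is already normalized, so the product state has unit norm by construction.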
