
Add QDP backend detection and pure-PyTorch reference implementations#1189

Open
ryankert01 wants to merge 1 commit into apache:main from ryankert01:add-pytorch-reference

Conversation

@ryankert01
Member

Closes #1177

Summary

This PR adds a pure-PyTorch reference backend to the QDP Python package, to compare against our GPU-kernel implementation.

Benchmark Results

All runs: 100 batches x 64 vectors (except 18-qubit: 50 batches x 64), median of 3 trials.

Amplitude Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 462,882 | 1,390,998 | 567,499 | 0.4x |
| 10 | end-to-end | 73,699 | 86,939 | 234,170 | 2.7x |
| 14 | encode-only | 118,620 | 789,862 | 151,713 | 0.2x |
| 14 | end-to-end | 5,514 | 6,356 | 25,603 | 4.0x |
| 16 | encode-only | 5,458 | 228,525 | 65,358 | 0.3x |
| 16 | end-to-end | 721 | 964 | 6,336 | 6.6x |
| 18 | encode-only | 1,313 | 58,761 | 15,876 | 0.3x |
| 18 | end-to-end | 194 | 237 | 1,529 | 6.5x |

Angle Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 14 | encode-only | 1,086 | 59,864 | 45,332 | 0.8x |
| 14 | end-to-end | 1,114 | 56,456 | 50,919 | 0.9x |
| 16 | encode-only | 262 | 10,254 | 8,032 | 0.8x |
| 16 | end-to-end | 252 | 10,093 | 11,496 | 1.1x |

IQP Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 15,381 | 74,239 | 484,071 | 6.5x |
| 14 | encode-only | 1,258 | 25,597 | 55,304 | 2.2x |

Analysis

  • Amplitude encode-only: PyTorch GPU wins 2-4x because its vectorized L2-norm + pad is very efficient, while Mahout's encode path still pays per-batch GPU output allocation + D2H norm validation sync overhead.
  • Amplitude end-to-end: Mahout wins 2.7-6.6x, with the advantage growing at higher qubit counts. Rust data generation + integrated pipeline dominates Python generate_batch_data + torch.tensor + H2D transfer.
  • Angle: Near-parity in both modes (0.8-1.1x). The tensor-product encoding is compute-bound and both implementations are similarly efficient.
  • IQP encode-only: Mahout wins decisively (2.2-6.5x). Mahout's CUDA kernel for IQP (Walsh-Hadamard + phase computation) is significantly faster than PyTorch's Python-level loop over butterfly stages.
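The vectorized normalize-and-pad path that makes PyTorch competitive on amplitude encode-only can be sketched in a few lines (an illustrative sketch, not the actual torch_ref.py code):

```python
import torch

def amplitude_encode(batch: torch.Tensor, num_qubits: int) -> torch.Tensor:
    """L2-normalize each row and zero-pad it to the 2**num_qubits
    state-vector length (illustrative sketch)."""
    dim = 1 << num_qubits
    norms = batch.norm(dim=1, keepdim=True)        # per-sample L2 norm
    normalized = batch / norms.clamp_min(1e-12)    # guard against zero rows
    return torch.nn.functional.pad(normalized, (0, dim - batch.shape[1]))
```

Everything here is a single batched tensor op, which is why the GPU path needs no per-batch allocation or host synchronization.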

Known Limitations

  • Basis encode-only: Rust engine.encode expects per-sample basis indices; batch input format differs from PyTorch. Requires Rust API change to support.
  • IQP end-to-end: Rust pipeline uses 1 << num_qubits as sample_size regardless of encoding method, causing a mismatch for IQP (which expects n + n*(n-1)/2). Pre-existing Rust pipeline bug.
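The IQP sample-size mismatch is easy to see numerically (a quick illustration; the helper names here are hypothetical, not from the codebase):

```python
def iqp_feature_count(n: int) -> int:
    # IQP expects one feature per qubit plus one per qubit pair
    return n + n * (n - 1) // 2

def pipeline_sample_size(n: int) -> int:
    # what the Rust pipeline currently assumes for every encoding
    return 1 << n
```

For 14 qubits the pipeline generates 16,384-element samples while IQP expects only 105 features, hence the mismatch.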

Changes

New files (3)

  • qdp/qdp-python/qumat_qdp/_backend.py — Backend detection (RUST_CUDA / PYTORCH / NONE) with auto-selection, force_backend() for testing
  • testing/qdp_python/test_torch_ref.py — 396 lines of tests for the PyTorch reference encoders
  • testing/qdp_python/test_fallback.py — 306 lines of tests for the fallback path (loader, API)
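The detection logic in _backend.py presumably boils down to a try-import cascade along these lines (a sketch under assumptions; the real module likely also checks CUDA device availability, and the exact import path is assumed):

```python
from enum import Enum

class Backend(Enum):
    RUST_CUDA = "rust_cuda"   # Rust extension with CUDA
    PYTORCH = "pytorch"       # pure-PyTorch reference encoders
    NONE = "none"             # neither available

def detect_backend() -> Backend:
    """Pick the fastest available backend (sketch; import paths assumed)."""
    try:
        from qumat_qdp import _qdp  # noqa: F401  Rust extension
        return Backend.RUST_CUDA
    except ImportError:
        pass
    try:
        import torch  # noqa: F401
        return Backend.PYTORch if False else Backend.PYTORCH
    except ImportError:
        return Backend.NONE
```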

Modified files (4)

  • qdp/qdp-python/qumat_qdp/__init__.py — Graceful degradation when _qdp Rust extension is unavailable; exports Backend, BACKEND
  • qdp/qdp-python/qumat_qdp/api.py — QdpBenchmark now supports .backend("rust" | "pytorch" | "auto") with PyTorch throughput/latency implementations
  • qdp/qdp-python/qumat_qdp/loader.py — QuantumDataLoader falls back to PyTorch encoding when _qdp is missing; supports synthetic data and .npy/.pt files
  • testing/conftest.py — test_torch_ref.py and test_fallback.py are allowed to run without the Rust extension

Benchmark file (1)

  • qdp/qdp-python/benchmark/benchmark_pytorch_ref.py — Added --mode flag (encode-only default, end-to-end), two new functions (run_mahout_encode_only, run_pytorch_end_to_end), updated banner with mode info and footnotes

How --mode works

| | encode-only (default) | end-to-end |
|---|---|---|
| PyTorch | Pre-gen on GPU → time encoding only | CPU data gen + GPU transfer + encode (all timed) |
| Mahout | Pre-gen on GPU → QdpEngine.encode(cuda_tensor) | run_throughput_pipeline_py (gen + H2D + encode) |
| Fair? | Yes | Yes |
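The difference between the two modes is just where the timer starts; schematically (a stdlib-only sketch, not the benchmark script itself):

```python
import time

def bench(gen, encode, batches: int, mode: str = "encode-only") -> float:
    """Return elapsed seconds. encode-only excludes data generation from
    the timed region; end-to-end times generation + encoding together."""
    if mode == "encode-only":
        data = [gen() for _ in range(batches)]   # pre-generate, untimed
        start = time.perf_counter()
        for batch in data:
            encode(batch)
    else:
        start = time.perf_counter()
        for _ in range(batches):
            encode(gen())
    return time.perf_counter() - start
```

Both backends use the same timed region in each mode, which is what makes the comparison fair.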

Key design decisions

  1. No Rust changes needed — QdpEngine.encode() already accepts CUDA tensors via DLPack zero-copy, enabling the encode-only Mahout benchmark in pure Python
  2. Graceful degradation — The entire qumat_qdp package now works without _qdp compiled, falling back to PyTorch. This makes the package installable/testable on machines without CUDA
  3. Backend auto-detection — _backend.py provides a single source of truth for which backend is available, with force_backend() for testing
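The DLPack zero-copy hand-off behind decision 1 works like this in plain PyTorch (a CPU tensor is shown for illustration; the PR relies on the same mechanism for CUDA tensors):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4, dtype=torch.float32)
capsule = to_dlpack(t)        # export as a DLPack capsule, no copy
view = from_dlpack(capsule)   # re-import: shares the same storage
view[0] = 42.0                # mutation is visible through t
```

Because the capsule only carries a pointer and metadata, QdpEngine.encode() can consume a PyTorch CUDA tensor without any host round-trip.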

- Implemented backend detection and selection logic in _backend.py, prioritizing Rust+CUDA, PyTorch, and fallback to None.
- Added pure-PyTorch reference implementations for quantum data encoding methods in torch_ref.py, including amplitude, angle, basis, and IQP encoding.
- Created comprehensive tests for fallback mechanisms and pure-PyTorch encodings in test_fallback.py and test_torch_ref.py, ensuring functionality without the Rust extension.
- Enhanced error handling and validation across encoding methods to ensure robustness.
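As an illustration of the tensor-product angle encoding mentioned above, under one common convention where qubit i is prepared as RY(x_i)|0⟩ (a sketch; the actual torch_ref.py implementation is vectorized):

```python
import torch

def angle_encode(x: torch.Tensor) -> torch.Tensor:
    """Angle encoding sketch: qubit i is [cos(x_i/2), sin(x_i/2)];
    the full state is the Kronecker product over all qubits."""
    state = torch.ones(1, dtype=x.dtype)
    for xi in x:
        qubit = torch.stack((torch.cos(xi / 2), torch.sin(xi / 2)))
        state = torch.kron(state, qubit)
    return state
```

Each factor is already normalized, so the product state has unit norm by construction.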
