feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait by rich7420 · Pull Request #1310 · apache/mahout

rich7420 · 2026-05-12T02:53:59Z

Related Issues

Changes

Why

The single-sample f32 zero-copy entry point exists as a standalone pub unsafe fn on each encoder type (AmplitudeEncoder / AngleEncoder / BasisEncoder), while the batch sibling (encode_batch_from_gpu_ptr_f32) lives on the QuantumEncoder trait with a default NotImplemented body.

That asymmetry means adding a new encoder with f32 zero-copy support requires updating three separate touch points (inherent fn on the encoder, dispatcher in engine.rs, engine-level entry method) instead of one. The batch side already enforces the trait-override discipline; the single-sample side should too.

This is a pure structural refactor — no new functionality, no behaviour change, no kernel changes.

How

Trait change (qdp-core/src/gpu/encodings/mod.rs)
Add encode_from_gpu_ptr_f32 to QuantumEncoder with a default body returning MahoutError::NotImplemented. Signature mirrors the existing encode_batch_from_gpu_ptr_f32. Cfg-gated to Linux / CUDA, same as the batch sibling.

Encoder overrides

AmplitudeEncoder, AngleEncoder, BasisEncoder each gain a trait impl that delegates to the existing inherent _with_stream workhorse fn. Hot paths can keep calling the inherent fn directly without going through a vtable.
Encoders that don't have an f32 zero-copy path yet (Phase, IQP, IQP-Z) inherit the default NotImplemented body — same pattern as the batch sibling.

Amplitude alignment (qdp-core/src/gpu/encodings/amplitude.rs)
AmplitudeEncoder previously had no standalone encode_from_gpu_ptr_f32_with_stream — the L2-norm + kernel-launch sequence was inlined in QdpEngine::encode_from_gpu_ptr_f32_with_stream (lib.rs). Add the missing inherent fn to mirror the angle / basis pattern, then have QdpEngine delegate to it (−62 LoC inline, no behaviour change).

Tests (qdp-core/tests/gpu_ptr_encoding.rs)
Four new trait-dispatch tests route through Encoding::encoder() (the trait object) rather than the inherent helper, so they exercise the new override mechanism:

amplitude / angle / basis success via trait method
phase's default body returns NotImplemented with the expected message

Local verification

Rust unit + integration: all passing
GPU-required tests (gpu_ptr_encoding, gpu_fidelity, gpu_angle_encoding): all passing (incl. the 4 new ones)
Python pytest: all passing
Clippy / fmt / license-header: clean
Baseline throughput (16 qubits, batch_size = 64, RTX 2080 Ti) holds: amplitude 14,316 / angle 204,209 / basis 648,118 vec/s

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes

The single-sample f32 zero-copy entry point existed as a standalone `pub unsafe fn` on each encoder type, while the batch variant (`encode_batch_from_gpu_ptr_f32`) lived on the `QuantumEncoder` trait with a default `NotImplemented` body. That asymmetry meant adding a new encoder with f32 zero-copy required updating the dispatcher in three separate places. Changes: - Add `QuantumEncoder::encode_from_gpu_ptr_f32` with a default body returning `MahoutError::NotImplemented`, mirroring the existing batch variant exactly. Cfg-gated to Linux/CUDA same as the batch sibling. - Override on `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` — each delegate to the existing inherent `_with_stream` workhorse fn so hot paths can keep calling without a vtable. - Add `AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` as a standalone inherent fn matching the angle/basis pattern. The kernel-launch + L2-norm sequence was previously inlined in `QdpEngine::encode_from_gpu_ptr_f32_with_stream`; that engine method now delegates to the new encoder fn (-62 LoC inline, +0 net behaviour). - Add 4 trait-dispatch tests in `tests/gpu_ptr_encoding.rs`: amplitude/angle/basis success via trait method, plus phase's default body returning a `NotImplemented` with the expected message. Verified on RTX 2080 Ti: - 76 lib + 12 + 64 GPU + 6 type tests passing - Baseline throughput: amplitude 14,316 / angle 204,209 / basis 648,117 vec/s (no regression vs post-apache#1275 baseline)

rich7420 requested review from 400Ping, guan404ming and ryankert01 as code owners May 12, 2026 02:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310
rich7420 wants to merge 1 commit into
apache:mainfrom
rich7420:feat/encode-from-gpu-ptr-f32-on-trait

rich7420 commented May 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rich7420 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related Issues

Changes

Why

How

Local verification

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rich7420 commented May 12, 2026 •

edited

Loading