Skip to content

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310

Open
rich7420 wants to merge 1 commit into
apache:mainfrom
rich7420:feat/encode-from-gpu-ptr-f32-on-trait
Open

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310
rich7420 wants to merge 1 commit into
apache:mainfrom
rich7420:feat/encode-from-gpu-ptr-f32-on-trait

Conversation

@rich7420
Copy link
Copy Markdown
Contributor

@rich7420 rich7420 commented May 12, 2026

Related Issues

Changes

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Why

The single-sample f32 zero-copy entry point exists as a standalone pub unsafe fn on each encoder type (AmplitudeEncoder / AngleEncoder / BasisEncoder), while the batch sibling (encode_batch_from_gpu_ptr_f32) lives on the QuantumEncoder trait with a default NotImplemented body.

That asymmetry means adding a new encoder with f32 zero-copy support requires updating three separate touch points (inherent fn on the encoder, dispatcher in engine.rs, engine-level entry method) instead of one. The batch side already enforces the trait-override discipline; the single-sample side should too.

This is a pure structural refactor — no new functionality, no behaviour change, no kernel changes.

How

Trait change (qdp-core/src/gpu/encodings/mod.rs)
Add encode_from_gpu_ptr_f32 to QuantumEncoder with a default body returning MahoutError::NotImplemented. Signature mirrors the existing encode_batch_from_gpu_ptr_f32. Cfg-gated to Linux / CUDA, same as the batch sibling.

Encoder overrides

  • AmplitudeEncoder, AngleEncoder, BasisEncoder each gain a trait impl that delegates to the existing inherent _with_stream workhorse fn. Hot paths can keep calling the inherent fn directly without going through a vtable.
  • Encoders that don't have an f32 zero-copy path yet (Phase, IQP, IQP-Z) inherit the default NotImplemented body — same pattern as the batch sibling.

Amplitude alignment (qdp-core/src/gpu/encodings/amplitude.rs)
AmplitudeEncoder previously had no standalone encode_from_gpu_ptr_f32_with_stream — the L2-norm + kernel-launch sequence was inlined in QdpEngine::encode_from_gpu_ptr_f32_with_stream (lib.rs). Add the missing inherent fn to mirror the angle / basis pattern, then have QdpEngine delegate to it (−62 LoC inline, no behaviour change).

Tests (qdp-core/tests/gpu_ptr_encoding.rs)
Four new trait-dispatch tests route through Encoding::encoder() (the trait object) rather than the inherent helper, so they exercise the new override mechanism:

  • amplitude / angle / basis success via trait method
  • phase's default body returns NotImplemented with the expected message

Local verification

  • Rust unit + integration: all passing
  • GPU-required tests (gpu_ptr_encoding, gpu_fidelity, gpu_angle_encoding): all passing (incl. the 4 new ones)
  • Python pytest: all passing
  • Clippy / fmt / license-header: clean
  • Baseline throughput (16 qubits, batch_size = 64, RTX 2080 Ti) holds: amplitude 14,316 / angle 204,209 / basis 648,118 vec/s

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes

The single-sample f32 zero-copy entry point existed as a standalone
`pub unsafe fn` on each encoder type, while the batch variant
(`encode_batch_from_gpu_ptr_f32`) lived on the `QuantumEncoder` trait
with a default `NotImplemented` body. That asymmetry meant adding a new
encoder with f32 zero-copy required updating the dispatcher in three
separate places.

Changes:

- Add `QuantumEncoder::encode_from_gpu_ptr_f32` with a default body
  returning `MahoutError::NotImplemented`, mirroring the existing batch
  variant exactly. Cfg-gated to Linux/CUDA same as the batch sibling.
- Override on `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` —
  each delegate to the existing inherent `_with_stream` workhorse fn
  so hot paths can keep calling without a vtable.
- Add `AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` as a
  standalone inherent fn matching the angle/basis pattern. The
  kernel-launch + L2-norm sequence was previously inlined in
  `QdpEngine::encode_from_gpu_ptr_f32_with_stream`; that engine method
  now delegates to the new encoder fn (-62 LoC inline, +0 net behaviour).
- Add 4 trait-dispatch tests in `tests/gpu_ptr_encoding.rs`:
  amplitude/angle/basis success via trait method, plus phase's default
  body returning a `NotImplemented` with the expected message.

Verified on RTX 2080 Ti:
- 76 lib + 12 + 64 GPU + 6 type tests passing
- Baseline throughput: amplitude 14,316 / angle 204,209 /
  basis 648,117 vec/s (no regression vs post-apache#1275 baseline)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant