feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310
Open
rich7420 wants to merge 1 commit into
Open
feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310rich7420 wants to merge 1 commit into
rich7420 wants to merge 1 commit into
Conversation
The single-sample f32 zero-copy entry point existed as a standalone `pub unsafe fn` on each encoder type, while the batch variant (`encode_batch_from_gpu_ptr_f32`) lived on the `QuantumEncoder` trait with a default `NotImplemented` body. That asymmetry meant adding a new encoder with f32 zero-copy required updating the dispatcher in three separate places. Changes: - Add `QuantumEncoder::encode_from_gpu_ptr_f32` with a default body returning `MahoutError::NotImplemented`, mirroring the existing batch variant exactly. Cfg-gated to Linux/CUDA same as the batch sibling. - Override on `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` — each delegate to the existing inherent `_with_stream` workhorse fn so hot paths can keep calling without a vtable. - Add `AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` as a standalone inherent fn matching the angle/basis pattern. The kernel-launch + L2-norm sequence was previously inlined in `QdpEngine::encode_from_gpu_ptr_f32_with_stream`; that engine method now delegates to the new encoder fn (-62 LoC inline, +0 net behaviour). - Add 4 trait-dispatch tests in `tests/gpu_ptr_encoding.rs`: amplitude/angle/basis success via trait method, plus phase's default body returning a `NotImplemented` with the expected message. Verified on RTX 2080 Ti: - 76 lib + 12 + 64 GPU + 6 type tests passing - Baseline throughput: amplitude 14,316 / angle 204,209 / basis 648,117 vec/s (no regression vs post-apache#1275 baseline)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related Issues
Changes
Why
The single-sample f32 zero-copy entry point exists as a standalone
pub unsafe fnon each encoder type (AmplitudeEncoder/AngleEncoder/BasisEncoder), while the batch sibling (encode_batch_from_gpu_ptr_f32) lives on theQuantumEncodertrait with a defaultNotImplementedbody.That asymmetry means adding a new encoder with f32 zero-copy support requires updating three separate touch points (inherent fn on the encoder, dispatcher in
engine.rs, engine-level entry method) instead of one. The batch side already enforces the trait-override discipline; the single-sample side should too.This is a pure structural refactor — no new functionality, no behaviour change, no kernel changes.
How
Trait change (
qdp-core/src/gpu/encodings/mod.rs)Add
encode_from_gpu_ptr_f32toQuantumEncoderwith a default body returningMahoutError::NotImplemented. Signature mirrors the existingencode_batch_from_gpu_ptr_f32. Cfg-gated to Linux / CUDA, same as the batch sibling.Encoder overrides
AmplitudeEncoder,AngleEncoder,BasisEncodereach gain a traitimplthat delegates to the existing inherent_with_streamworkhorse fn. Hot paths can keep calling the inherent fn directly without going through a vtable.NotImplementedbody — same pattern as the batch sibling.Amplitude alignment (
qdp-core/src/gpu/encodings/amplitude.rs)AmplitudeEncoderpreviously had no standaloneencode_from_gpu_ptr_f32_with_stream— the L2-norm + kernel-launch sequence was inlined inQdpEngine::encode_from_gpu_ptr_f32_with_stream(lib.rs). Add the missing inherent fn to mirror the angle / basis pattern, then haveQdpEnginedelegate to it (−62 LoC inline, no behaviour change).Tests (
qdp-core/tests/gpu_ptr_encoding.rs)Four new trait-dispatch tests route through
Encoding::encoder()(the trait object) rather than the inherent helper, so they exercise the new override mechanism:NotImplementedwith the expected messageLocal verification
gpu_ptr_encoding,gpu_fidelity,gpu_angle_encoding): all passing (incl. the 4 new ones)Checklist