Skip to content

Rust PyO3 renderer port and native performance paths#2

Open
ThomAub wants to merge 35 commits into
mainfrom
rust-pyo3-port
Open

Rust PyO3 renderer port and native performance paths#2
ThomAub wants to merge 35 commits into
mainfrom
rust-pyo3-port

Conversation

@ThomAub
Copy link
Copy Markdown
Owner

@ThomAub ThomAub commented May 21, 2026

Summary

  • Add the Rust/PyO3 renderer port with native parity coverage across the supported families.
  • Add native runtime performance paths: prepared tools, sessions, NumPy/packed outputs, fast role/content inputs, tool text caching, and TokenPlanBuf batch dynamic encoding.
  • Update SGLang and vLLM examples to use native prepared tools and sessions when available while keeping token-ID engine contracts unchanged.

Verification

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --locked
  • cargo test --workspace
  • uv run ruff check examples/sglang/multiturn_generate_sglang.py examples/sglang/online_multiturn_sglang.py examples/vllm/multiturn_generate_vllm.py benchmarks/native_vs_python_qwen3.py tests/test_native_numpy.py
  • uv run ruff format --check examples/sglang/multiturn_generate_sglang.py examples/sglang/online_multiturn_sglang.py examples/vllm/multiturn_generate_vllm.py benchmarks/native_vs_python_qwen3.py tests/test_native_numpy.py
  • uv run maturin develop --manifest-path crates/renderers-py/Cargo.toml --release
  • uv run pytest -m parity tests/test_native_parity.py -q -rs
  • env RENDERERS_NATIVE=all uv run pytest tests/test_render_ids.py tests/test_bridge.py tests/test_roundtrip.py tests/test_message_indices.py tests/test_native_router.py tests/test_native_vision.py tests/test_native_numpy.py -q -rs

Benchmark Artifacts

  • /private/tmp/renderers-native-tokenplan-family-smoke-002.json
  • /private/tmp/renderers-native-tokenplan-family-smoke-002.md

@ThomAub
Copy link
Copy Markdown
Owner Author

ThomAub commented May 21, 2026

Completion audit passed for the native runtime performance pass.

Evidence:

  • TokenPlanBuf and encode_batch_no_special are in emit.rs / tokenizer.rs.
  • Long no-tool render_ids uses TokenPlanBuf in DeepSeek V3, Qwen35/Qwen36, MiniMax M2, and GLM.
  • vLLM and SGLang examples now use prepare_tools(...) plus new_session(...) when native APIs exist.
  • render_fast_ids(...) is exposed in PyO3 and documented for role/content serving loops.
  • Verification passed: clippy, cargo tests, full native parity, and native-forced suite.
  • Matched benchmark artifact: /private/tmp/renderers-native-tokenplan-family-smoke-002.json.

Measured long-history direct render_ids native list improvements versus the prior smoke artifact:

family prior native list current native list prior speedup vs Python current speedup vs Python
DeepSeek V3 907.148 us 123.459 us 1.57x 4.96x
Qwen35 823.732 us 232.878 us 2.71x 9.20x
Qwen36 824.496 us 252.372 us 2.81x 9.35x
MiniMax M2 818.452 us 218.835 us 2.24x 8.10x
GLM5 678.772 us 200.545 us 2.68x 8.88x
GLM5.1 665.796 us 222.906 us 3.02x 7.94x
GLM4.5 682.401 us 202.465 us 2.80x 9.43x

Focused family geomeans from the matched benchmark run:

family list geomean NumPy geomean
DeepSeek V3 3.28x 3.51x
Qwen35 3.85x 4.19x
Qwen36 3.82x 4.16x
MiniMax M2 6.16x 6.75x
GLM5 2.86x 3.11x
GLM5.1 2.85x 3.11x
GLM4.5 3.33x 3.65x

@ThomAub ThomAub force-pushed the rust-pyo3-port branch 2 times, most recently from f048690 to 23d82c3 Compare May 21, 2026 09:32
@ThomAub
Copy link
Copy Markdown
Owner Author

ThomAub commented May 28, 2026

Keep in mind PrimeIntellect-ai#70

@ThomAub
Copy link
Copy Markdown
Owner Author

ThomAub commented May 28, 2026

Bench:

  • render_batch_ids short_batch: native list 95.246us -> 87.370us, native np 92.104us -> 84.731us.
  • render_batch_ids short_batch_prepared_tools: native np 137.099us -> 130.805us.
  • session_render_ids long_history_gen_prompt: native list 228.157us -> 212.276us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant