What's Changed
- Simplify min_score selection logic, correct type hint for propose_suffix_draft_token_ids by @CptTZ in #195
- Add op_builder for jitting the kernels by @sfc-gh-reyazda in #193
- Update links in README for Shift Parallelism by @sfc-gh-mhidayetoglu in #196
- bump to v0.0.10 by @sfc-gh-jrasley in #194
- Move SwiftKV ops to JIT-build by @sfc-gh-yewang in #198
- Add @sfc-gh-reyazda as code owner by @sfc-gh-yewang in #199
- Explicitly initialize CUDA buffers for next tokens by @sfc-gh-yewang in #201
- Port suffix decoding to nanobind by @sfc-gh-aqiao in #206
- upgrade to vllm 0.10.1 by @sfc-gh-yewang in #162
- Suffix decoding: break out of speculate loop early by @sfc-gh-aqiao in #207
- Suffix decoding speculation optimization by @sfc-gh-aqiao in #211
- reshape_and_cache_flash fp4 kernel by @sfc-gh-yewang in #210
- More suffix decoding optimizations by @sfc-gh-aqiao in #212
- remove ulysses moe patch by @sfc-gh-mhidayetoglu in #213
- Bump version from 0.0.10 to 0.1.0 by @sfc-gh-jrasley in #214
New Contributors
- @sfc-gh-reyazda made their first contribution in #193
Full Changelog: v0.0.9...v0.1.0