
issue/356 - support qwen3_moe by naive modules#357

Open
pengcheng888 wants to merge 1 commit into main from issue/356


@pengcheng888
Collaborator

@pengcheng888 pengcheng888 commented May 8, 2026

Migrate qwen3moe from the temporary v0.1 branch and adapt it to v0.2.

NVIDIA platform tests
Single inference run

python examples/test_infer.py --device nvidia --model=/data-aisoft/mechdancer/models/Qwen3-30B-A3B --tp=1 --max-new-tokens=100 --enable-paged-attn --attn=paged-attn

(screenshot: nvidia single-GPU inference)

Server test

 python python/infinilm/server/inference_server.py \
--device nvidia \
--model=/data-aisoft/mechdancer/models/Qwen3-30B-A3B/ \
--temperature 1.0 \
--top-p 0.8 \
--top-k 1 \
--port 8103 \
--tp 2  \
--block-size 256 \
--max-new-tokens 256 \
--num-blocks 512 \
--max-batch-size 64 \
--attn paged-attn \
--enable-paged-attn
(screenshot: nvidia server test)
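
The server flags above imply a KV-cache budget that can be sanity-checked with quick arithmetic. This is a minimal sketch, assuming `--block-size` is the number of tokens per KV-cache block and `--num-blocks` is the total number of blocks, as is conventional for paged attention; the PR does not state these semantics, nor whether the count is per rank under `--tp 2`. `kv_cache_budget` is a hypothetical helper for illustration, not part of InfiniLM.

```python
def kv_cache_budget(block_size: int, num_blocks: int, max_batch_size: int) -> tuple[int, int]:
    """Return (total token slots, average token slots per sequence at full batch).

    Assumption: one block holds `block_size` tokens, so the cache holds
    block_size * num_blocks tokens in total.
    """
    total_slots = block_size * num_blocks
    per_sequence = total_slots // max_batch_size
    return total_slots, per_sequence


# Values taken from the server command in this PR.
total, per_seq = kv_cache_budget(block_size=256, num_blocks=512, max_batch_size=64)
print(total, per_seq)  # 131072 2048
```

Under these assumptions, a full batch of 64 sequences leaves about 2048 token slots per sequence, which covers `--max-new-tokens 256` with room left for the prompt.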

MetaX platform
Single inference run
python examples/test_infer.py --device metax --model=/data-aisoft/wangpengcheng_data/Qwen3-30B-A3B_small --tp=1 --max-new-tokens=10 --enable-paged-attn --attn=paged-attn

(screenshot: maca single inference run)

Server test

 python python/infinilm/server/inference_server.py \
--device metax \
--model=/data-aisoft/mechdancer/models/Qwen3-30B-A3B/ \
--temperature 1.0 \
--top-p 0.8 \
--top-k 1 \
--port 8000 \
--tp 2  \
--block-size 256 \
--max-new-tokens 256 \
--num-blocks 512 \
--max-batch-size 64 \
--attn paged-attn \
--enable-paged-attn
(screenshot: maca server test)

@pengcheng888 pengcheng888 requested review from a team and wooway777 May 8, 2026 07:40
@pengcheng888 pengcheng888 linked an issue May 8, 2026 that may be closed by this pull request
@pengcheng888
Collaborator Author

Depends on the infinicore PR.

@pengcheng888 pengcheng888 requested a review from Ceng23333 May 8, 2026 07:43

Development

Successfully merging this pull request may close these issues.

[DEV] Migrate qwen3moe from the temporary v0.1 branch and adapt it to v0.2
