[Feature]【Hackathon 10th No.50】add MiniCPM4/4.1-8B model support #6982

Open

cloudforge1 wants to merge 13 commits into PaddlePaddle:develop
Conversation
- MiniCPM4MLP: gate/up merged, SiLU activation, no bias
- MiniCPM4Attention: GQA with QKVParallelLinear(bias=False), neox rotary
- MiniCPM4DecoderLayer: μP residual scaling (scale_depth / sqrt(num_layers))
- MiniCPM4Model: μP embedding scaling (scale_emb), LongRoPE support
- MiniCPM4ForCausalLM: μP lm_head scaling (hidden_size / dim_model_base)
- Weight mapping: HF model. to FD minicpm4. prefix
- Architecture: MiniCPMForCausalLM (GQA, not MLA)
- Follows Qwen2 patterns adapted for MiniCPM4 μP parametrization
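The three μP scaling sites listed above can be sketched in isolation. This is an illustrative NumPy sketch using the field names from the HF config.json (scale_emb, scale_depth, dim_model_base); it is not FastDeploy's actual implementation, which operates on Paddle tensors inside the model classes.

```python
import math
import numpy as np

def mup_embedding(token_embeds: np.ndarray, scale_emb: float) -> np.ndarray:
    # Site 1: token embeddings are multiplied by scale_emb
    # (12 for MiniCPM4, per the HF config).
    return token_embeds * scale_emb

def mup_residual_add(residual: np.ndarray, block_out: np.ndarray,
                     scale_depth: float, num_hidden_layers: int) -> np.ndarray:
    # Site 2: each decoder block's output is damped by
    # scale_depth / sqrt(num_hidden_layers) before the residual add.
    return residual + block_out * (scale_depth / math.sqrt(num_hidden_layers))

def mup_logits(hidden: np.ndarray, lm_head_w: np.ndarray,
               hidden_size: int, dim_model_base: int) -> np.ndarray:
    # Site 3: hidden states are divided by hidden_size / dim_model_base
    # before the lm_head projection.
    return (hidden / (hidden_size / dim_model_base)) @ lm_head_w.T
```

With scale_emb=12, an all-ones embedding simply becomes all twelves; the residual and logit sites scale analogously.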
Thanks for your contribution!
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## develop #6982 +/- ##
==========================================
Coverage ? 73.67%
==========================================
Files ? 400
Lines ? 56129
Branches ? 8847
==========================================
Hits ? 41354
Misses ? 11855
Partials ? 2920
Flags with carried forward coverage won't be shown.
cloudforge1 force-pushed from 2e8de3e to 1cb8661

Contributor (Author)
A note on the current CI status:
If reviewers feel more lightweight structural tests are needed, I can add them. The current red light is essentially a mismatch between the coverage policy for new model files and this integration scenario, not a functional regression.
Motivation
This PR gives FastDeploy the ability to deploy the openbmb/MiniCPM4.1-8B model family at high performance, as required by Hackathon 10th No.50.
MiniCPM4.1-8B is a dense 8B-parameter model from OpenBMB with the following key features:

- GQA with num_key_value_heads=2
- HF architecture name MiniCPMForCausalLM

Modifications
Model Code (fastdeploy/model_executor/models/minicpm4.py)

New model file (516 lines) implementing:

- MiniCPM4MLP: gate/up merged projection with SiLU activation, no bias
- MiniCPM4Attention: GQA with QKVParallelLinear(with_bias=False), neox-style RoPE
- MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
- MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
- MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
- MiniCPM4PretrainedModel: tensor parallel mappings (no bias splits)

Documentation
- docs/best_practices/MiniCPM4-8B.md: usage guide with hardware requirements, deployment examples, and performance tuning
- docs/supported_models.md: added MiniCPM4 entry to the LLM model table

Design Decisions
- Model registered via the @ModelRegistry.register_model_class decorator: no manual imports needed
- μP hyperparameters (scale_emb, scale_depth, dim_model_base) read from HF config.json via ModelConfig auto-setattr

Usage or Command
See docs/best_practices/MiniCPM4-8B.md for full deployment guide.
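The decorator-based registration mentioned under Design Decisions can be sketched as follows. ModelRegistry here is a simplified stand-in for FastDeploy's actual registry class, written only to illustrate why no manual imports are needed: the decorator runs at module import time and keys the class by its HF architecture name.

```python
class ModelRegistry:
    """Simplified stand-in for FastDeploy's model registry (illustrative only)."""
    _models: dict = {}

    @classmethod
    def register_model_class(cls, model_cls):
        # Key on the architecture name so the "architectures" field of a
        # HF config.json can be used directly for lookup. MiniCPM4 reports
        # MiniCPMForCausalLM as its architecture.
        cls._models[model_cls.name()] = model_cls
        return model_cls

    @classmethod
    def get(cls, architecture: str):
        return cls._models[architecture]

@ModelRegistry.register_model_class
class MiniCPM4ForCausalLM:
    @classmethod
    def name(cls) -> str:
        return "MiniCPMForCausalLM"
```

A lookup such as `ModelRegistry.get("MiniCPMForCausalLM")` then returns the class without any caller having imported the model file by name.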
Accuracy Tests

Unit Tests (16/16 passed across 3 environments)

tests/model_executor/test_minicpm4.py (320 lines, 4 classes, 16 tests):

- TestMuPScaling (6 tests): validates all 3 μP scaling sites: embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
- TestWeightMapping (5 tests): verifies HF→FD weight name mapping (model. → minicpm4.), column/row parallel splits
- TestRegistration (4 tests): model registry, config auto-setattr, architecture name MiniCPMForCausalLM
- TestComputeLogits (1 test): end-to-end lm_head scaling with real Paddle tensors

AI Studio V100 GPU Validation
Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.
Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.
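The HF→FD prefix mapping that TestWeightMapping exercises amounts to a simple name rewrite. The sketch below uses a hypothetical helper name, map_weight_name, to illustrate the rule described in this PR (HF checkpoint keys under model. move to the minicpm4. prefix; other keys such as lm_head.weight are unchanged):

```python
def map_weight_name(hf_name: str) -> str:
    """Illustrative HF -> FastDeploy weight-name rewrite for MiniCPM4.

    Hypothetical helper, not FastDeploy's actual function name.
    """
    prefix = "model."
    if hf_name.startswith(prefix):
        # "model.layers.0..." -> "minicpm4.layers.0..."
        return "minicpm4." + hf_name[len(prefix):]
    # Keys outside the transformer body (e.g. lm_head.weight) keep their name.
    return hf_name
```

For example, `map_weight_name("model.embed_tokens.weight")` yields `"minicpm4.embed_tokens.weight"`.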
CI Coverage Job (H20 GPU)
All 16 tests passed in CI (run_tests_with_coverage job):

Local CPU Test Output
GPU Validation Note
Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.
Checklist

- Model registered via the @ModelRegistry.register_model_class decorator