[Feature]【Hackathon 10th No.50】add MiniCPM4/4.1-8B model support#6982

Open
cloudforge1 wants to merge 13 commits into PaddlePaddle:develop from cloudforge1:task/050-minicpm41-model

Conversation


cloudforge1 (Contributor) commented Mar 23, 2026

Motivation

Provide FastDeploy with the ability to deploy the high-performance openbmb/MiniCPM4.1-8B model family.

This PR adds support for deploying the openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.

MiniCPM4.1-8B is a dense 8B parameter model from OpenBMB with the following key features:

  • μP (Maximal Update Parametrization): Three scaling sites — embedding (×12), residual (×scale_depth/√num_layers), and lm_head (÷hidden_size/dim_model_base)
  • GQA: Grouped Query Attention with num_key_value_heads=2
  • LongRoPE: Extended position encoding supporting up to 65,536 tokens
  • Architecture registered as MiniCPMForCausalLM
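The three μP scaling sites above can be sketched in plain Python. This is a minimal sketch: scale_emb matches the ×12 stated above, but scale_depth, num_hidden_layers, hidden_size, and dim_model_base are illustrative placeholder values, not numbers read from the real openbmb/MiniCPM4.1-8B config.json.

```python
import math

# Illustrative config values (only scale_emb = 12 is stated in this PR;
# the rest are hypothetical placeholders).
scale_emb = 12
scale_depth = 1.4
num_hidden_layers = 32
hidden_size = 4096
dim_model_base = 256

def scale_embedding(x):
    # site 1: token embeddings are multiplied by scale_emb
    return x * scale_emb

def scale_residual(branch_out):
    # site 2: each residual branch is scaled by scale_depth / sqrt(num_layers)
    return branch_out * (scale_depth / math.sqrt(num_hidden_layers))

def scale_hidden_for_lm_head(h):
    # site 3: hidden states are divided by hidden_size / dim_model_base
    # before the lm_head projection
    return h / (hidden_size / dim_model_base)
```

With these placeholder values, the lm_head divisor works out to 4096 / 256 = 16.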

Modifications

Model Code (fastdeploy/model_executor/models/minicpm4.py)

New model file (516 lines) implementing:

  • MiniCPM4MLP: Gate/up merged projection with SiLU activation, no bias
  • MiniCPM4Attention: GQA with QKVParallelLinear(with_bias=False), neox-style RoPE
  • MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
  • MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
  • MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
  • MiniCPM4PretrainedModel: Tensor parallel mappings (no bias splits)

Documentation

  • docs/best_practices/MiniCPM4-8B.md: Usage guide with hardware requirements, deployment examples, and performance tuning
  • docs/supported_models.md: Added MiniCPM4 entry to LLM model table

Design Decisions

  • Followed Qwen2 model pattern (closest architecture in FastDeploy) with μP scaling additions
  • Auto-discovery via @ModelRegistry.register_model_class decorator — no manual imports needed
  • μP config values (scale_emb, scale_depth, dim_model_base) read from HF config.json via ModelConfig auto-setattr
  • Quantization support (WINT8/WINT4/FP8) through standard FastDeploy layers — no custom ops needed
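The decorator-based auto-discovery mentioned above follows a common registry pattern. The `ModelRegistry` below is a toy stand-in to illustrate the idea; FastDeploy's real `ModelRegistry.register_model_class` may have a different signature and keying scheme.

```python
class ModelRegistry:
    # toy registry keyed by the architecture string from HF config.json
    _models = {}

    @classmethod
    def register_model_class(cls, model_cls):
        # used as a decorator: registers the class and returns it unchanged,
        # so no manual import list is needed
        cls._models[model_cls.name()] = model_cls
        return model_cls

    @classmethod
    def get(cls, architecture: str):
        return cls._models[architecture]

@ModelRegistry.register_model_class
class MiniCPM4ForCausalLM:
    @classmethod
    def name(cls):
        # matches "architectures": ["MiniCPMForCausalLM"] in config.json
        return "MiniCPMForCausalLM"
```

At serving time the loader can then resolve the class from the checkpoint's architecture string alone.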

Usage or Command

# Deploy MiniCPM4.1-8B with WINT4 quantization
python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128

# Send a request
curl http://localhost:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM4.1-8B",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 512
  }'

See docs/best_practices/MiniCPM4-8B.md for full deployment guide.
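The curl call above can also be issued from Python. This sketch only builds the JSON payload (it assumes the server from the deployment command is listening on localhost:8180; the actual POST is left commented out so the snippet runs without a live server).

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    # mirrors the fields in the curl example above
    return {
        "model": "openbmb/MiniCPM4.1-8B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("What is the capital of France?")
body = json.dumps(payload)
# import requests
# resp = requests.post("http://localhost:8180/v1/chat/completions",
#                      headers={"Content-Type": "application/json"},
#                      data=body)
```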

Accuracy Tests

Unit Tests (16/16 passed across 3 environments)

  • Test file: tests/model_executor/test_minicpm4.py (320 lines, 4 classes, 16 tests)
  • TestMuPScaling (6 tests): Validates all 3 μP scaling sites — embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
  • TestWeightMapping (5 tests): Verifies HF→FD weight name mapping (model. → minicpm4.), column/row parallel splits
  • TestRegistration (4 tests): Model registry, config auto-setattr, architecture name MiniCPMForCausalLM
  • TestComputeLogits (1 test): End-to-end lm_head scaling with real Paddle tensors
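A hedged sketch of the kind of check TestComputeLogits performs: divide hidden states by hidden_size / dim_model_base before the lm_head matmul, then mask vocab entries beyond the real vocabulary (padding added for tensor-parallel divisibility) to -inf. Shapes and values are illustrative, using plain lists rather than real Paddle tensors.

```python
def compute_logits(hidden, lm_head_rows, real_vocab_size,
                   hidden_size=4096, dim_model_base=256):
    # muP lm_head scaling: divide hidden states before the projection
    scale = hidden_size / dim_model_base
    scaled = [h / scale for h in hidden]
    # naive matmul: one dot product per lm_head row (one row per vocab entry)
    logits = [sum(h * w for h, w in zip(scaled, row)) for row in lm_head_rows]
    # padded vocab entries beyond the real vocabulary are masked to -inf
    return [lg if i < real_vocab_size else float("-inf")
            for i, lg in enumerate(logits)]
```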

AI Studio V100 GPU Validation

Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.

Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.

CI Coverage Job (H20 GPU)

All 16 tests passed in CI (run_tests_with_coverage job):

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED

Local CPU Test Output

$ pytest tests/model_executor/test_minicpm4.py -v
========================= test session starts ==========================
platform linux -- Python 3.13.9, pytest-9.0.2, pluggy-1.5.0
collected 16 items

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED [  6%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED [ 12%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED [ 18%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED [ 25%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED [ 31%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED [ 37%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED [ 43%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED [ 50%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED [ 56%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED [ 62%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED [ 68%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED [ 75%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED [ 81%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED [ 87%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED [ 93%]
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED [100%]
======================== 16 passed, 1 warning in 0.55s =================

GPU Validation Note

Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.

Checklist

  • Model code follows existing FastDeploy patterns (Qwen2 reference)
  • All pre-commit checks pass (black, isort, flake8, ruff)
  • Model registered via @ModelRegistry.register_model_class decorator
  • Weight mapping supports HuggingFace torch format
  • Usage documentation provided
  • Supported models table updated
  • GPU validation (unit tests passed on V100)
  • Unit tests: 16/16 passed (CPU + GPU)

@paddle-bot

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 30.05181% with 135 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@0b4c1cb).

Files with missing lines | Patch % | Lines
fastdeploy/model_executor/models/minicpm4.py | 30.05% | 135 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6982   +/-   ##
==========================================
  Coverage           ?   73.67%           
==========================================
  Files              ?      400           
  Lines              ?    56129           
  Branches           ?     8847           
==========================================
  Hits               ?    41354           
  Misses             ?    11855           
  Partials           ?     2920           
Flag | Coverage Δ
GPU | 73.67% <30.05%> (?)


@cloudforge1
Contributor Author

A note on the current CI status:

  • Check PR Template has been fixed and is now green.
  • The red run_tests_with_coverage check comes from diff-cover --fail-under=80, not from unit-test failures. In that job TEST_EXIT_CODE=0 and all tests pass; the failure is that the diff coverage of the new model file fastdeploy/model_executor/models/minicpm4.py is 30%.
  • This is a new-model integration scenario, not an H10 unit-test coverage task. The coverage job does not download the 16GB model weights and does not perform real model loading or forward passes in CI, so the bulk of the model code cannot reach 80% in diff-cover the way a unit-test task could.
  • Structural unit tests have been added in tests/model_executor/test_minicpm4.py: 16/16 pass locally on CPU, 16/16 pass on AI Studio V100, and the same test file also passes in CI.

If reviewers feel more lightweight structural tests are needed, I am happy to add them; the current red check is essentially a mismatch between the coverage policy for new model files and the integration scenario, not a functional regression.
