[Feature]【Hackathon 10th No.50】add MiniCPM4/4.1-8B model support#6982

Open
cloudforge1 wants to merge 13 commits into PaddlePaddle:develop from cloudforge1:task/050-minicpm41-model

Conversation


cloudforge1 (Contributor) commented Mar 23, 2026

Motivation

Provide FastDeploy with the ability to deploy the high-performance openbmb/MiniCPM4.1-8B model family.

This PR adds support for deploying the openbmb/MiniCPM4.1-8B model family in FastDeploy, as required by Hackathon 10th Spring No.50.

MiniCPM4.1-8B is a dense 8B parameter model from OpenBMB with the following key features:

  • μP (Maximal Update Parametrization): Three scaling sites — embedding (×12), residual (×scale_depth/√num_layers), and lm_head (÷hidden_size/dim_model_base)
  • GQA: Grouped Query Attention with num_key_value_heads=2
  • LongRoPE: Extended position encoding supporting up to 65,536 tokens
  • Architecture registered as MiniCPMForCausalLM
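The three μP scaling sites above can be sketched in plain Python. This is a minimal sketch: scale_emb matches the ×12 stated above, but scale_depth, num_hidden_layers, hidden_size, and dim_model_base are illustrative placeholder values, not numbers read from the real openbmb/MiniCPM4.1-8B config.json.

```python
import math

# Illustrative config values (only scale_emb = 12 is stated in this PR;
# the rest are hypothetical placeholders).
scale_emb = 12
scale_depth = 1.4
num_hidden_layers = 32
hidden_size = 4096
dim_model_base = 256

def scale_embedding(x):
    # site 1: token embeddings are multiplied by scale_emb
    return x * scale_emb

def scale_residual(branch_out):
    # site 2: each residual branch is scaled by scale_depth / sqrt(num_layers)
    return branch_out * (scale_depth / math.sqrt(num_hidden_layers))

def scale_hidden_for_lm_head(h):
    # site 3: hidden states are divided by hidden_size / dim_model_base
    # before the lm_head projection
    return h / (hidden_size / dim_model_base)
```

With these placeholder values, the lm_head divisor works out to 4096 / 256 = 16.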

Modifications

Model Code (fastdeploy/model_executor/models/minicpm4.py)

New model file (516 lines) implementing:

  • MiniCPM4MLP: Gate/up merged projection with SiLU activation, no bias
  • MiniCPM4Attention: GQA with QKVParallelLinear(with_bias=False), neox-style RoPE
  • MiniCPM4DecoderLayer: μP residual scaling (scale_depth / √num_hidden_layers)
  • MiniCPM4Model: μP embedding scaling (scale_emb), graph optimization support
  • MiniCPM4ForCausalLM: μP lm_head scaling, weight mapping (HF model. → FD minicpm4.), registered as MiniCPMForCausalLM
  • MiniCPM4PretrainedModel: Tensor parallel mappings (no bias splits)

Documentation

  • docs/best_practices/MiniCPM4-8B.md: Usage guide with hardware requirements, deployment examples, and performance tuning
  • docs/supported_models.md: Added MiniCPM4 entry to LLM model table

Design Decisions

  • Followed Qwen2 model pattern (closest architecture in FastDeploy) with μP scaling additions
  • Auto-discovery via @ModelRegistry.register_model_class decorator — no manual imports needed
  • μP config values (scale_emb, scale_depth, dim_model_base) read from HF config.json via ModelConfig auto-setattr
  • Quantization support (WINT8/WINT4/FP8) through standard FastDeploy layers — no custom ops needed
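The decorator-based auto-discovery mentioned above follows a common registry pattern. The `ModelRegistry` below is a toy stand-in to illustrate the idea; FastDeploy's real `ModelRegistry.register_model_class` may have a different signature and keying scheme.

```python
class ModelRegistry:
    # toy registry keyed by the architecture string from HF config.json
    _models = {}

    @classmethod
    def register_model_class(cls, model_cls):
        # used as a decorator: registers the class and returns it unchanged,
        # so no manual import list is needed
        cls._models[model_cls.name()] = model_cls
        return model_cls

    @classmethod
    def get(cls, architecture: str):
        return cls._models[architecture]

@ModelRegistry.register_model_class
class MiniCPM4ForCausalLM:
    @classmethod
    def name(cls):
        # matches "architectures": ["MiniCPMForCausalLM"] in config.json
        return "MiniCPMForCausalLM"
```

At serving time the loader can then resolve the class from the checkpoint's architecture string alone.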

Usage or Command

# Deploy MiniCPM4.1-8B with WINT4 quantization
python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128

# Send a request
curl http://localhost:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM4.1-8B",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 512
  }'

See docs/best_practices/MiniCPM4-8B.md for full deployment guide.
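The curl call above can also be issued from Python. This sketch only builds the JSON payload (it assumes the server from the deployment command is listening on localhost:8180; the actual POST is left commented out so the snippet runs without a live server).

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    # mirrors the fields in the curl example above
    return {
        "model": "openbmb/MiniCPM4.1-8B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("What is the capital of France?")
body = json.dumps(payload)
# import requests
# resp = requests.post("http://localhost:8180/v1/chat/completions",
#                      headers={"Content-Type": "application/json"},
#                      data=body)
```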

Accuracy Tests

Unit Tests (16/16 passed across 3 environments)

  • Test file: tests/model_executor/test_minicpm4.py (320 lines, 4 classes, 16 tests)
  • TestMuPScaling (6 tests): Validates all 3 μP scaling sites — embedding (×12), residual (×scale_depth/√N), lm_head (÷hidden/base)
  • TestWeightMapping (5 tests): Verifies HF→FD weight name mapping (model. → minicpm4.), column/row parallel splits
  • TestRegistration (4 tests): Model registry, config auto-setattr, architecture name MiniCPMForCausalLM
  • TestComputeLogits (1 test): End-to-end lm_head scaling with real Paddle tensors
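A hedged sketch of the kind of check TestComputeLogits performs: divide hidden states by hidden_size / dim_model_base before the lm_head matmul, then mask vocab entries beyond the real vocabulary (padding added for tensor-parallel divisibility) to -inf. Shapes and values are illustrative, using plain lists rather than real Paddle tensors.

```python
def compute_logits(hidden, lm_head_rows, real_vocab_size,
                   hidden_size=4096, dim_model_base=256):
    # muP lm_head scaling: divide hidden states before the projection
    scale = hidden_size / dim_model_base
    scaled = [h / scale for h in hidden]
    # naive matmul: one dot product per lm_head row (one row per vocab entry)
    logits = [sum(h * w for h, w in zip(scaled, row)) for row in lm_head_rows]
    # padded vocab entries beyond the real vocabulary are masked to -inf
    return [lg if i < real_vocab_size else float("-inf")
            for i, lg in enumerate(logits)]
```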

AI Studio V100 GPU Validation

Tested on Baidu AI Studio V100 16GB — job logs: 16/16 passed in 0.09s.

Environment: Tesla V100-SXM2 16GB, CUDA 12.0, PaddlePaddle 3.3.0, Python 3.10.

CI Coverage Job (H20 GPU)

All 16 tests passed in CI (run_tests_with_coverage job):

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED

Local CPU Test Output

$ pytest tests/model_executor/test_minicpm4.py -v
========================= test session starts ==========================
platform linux -- Python 3.13.9, pytest-9.0.2, pluggy-1.5.0
collected 16 items

tests/model_executor/test_minicpm4.py::TestMuPScaling::test_embedding_scaling PASSED [  6%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_value PASSED [ 12%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scaling_applied PASSED [ 18%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scaling PASSED [ 25%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_lm_head_scale_fallback PASSED [ 31%]
tests/model_executor/test_minicpm4.py::TestMuPScaling::test_residual_scale_depth_default PASSED [ 37%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_hf_prefix_rename PASSED [ 43%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_qkv_stacking PASSED [ 50%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_gate_up_stacking PASSED [ 56%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_embed_and_lm_head_rename PASSED [ 62%]
tests/model_executor/test_minicpm4.py::TestWeightMapping::test_weight_name_replacement PASSED [ 68%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_architecture_string PASSED [ 75%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_module_name_is_minicpm4 PASSED [ 81%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_model_classes_exist PASSED [ 87%]
tests/model_executor/test_minicpm4.py::TestRegistration::test_no_qkv_bias PASSED [ 93%]
tests/model_executor/test_minicpm4.py::TestComputeLogits::test_lm_head_scaling_and_vocab_mask PASSED [100%]
======================== 16 passed, 1 warning in 0.55s =================

GPU Validation Note

Full model inference validation requires downloading the 16GB model weights, which exceeds CI test scope. The model architecture is structurally validated by the unit tests above. Full deployment validation can be performed using the commands in the Usage section.

Checklist

  • Model code follows existing FastDeploy patterns (Qwen2 reference)
  • All pre-commit checks pass (black, isort, flake8, ruff)
  • Model registered via @ModelRegistry.register_model_class decorator
  • Weight mapping supports HuggingFace torch format
  • Usage documentation provided
  • Supported models table updated
  • GPU validation (unit tests passed on V100)
  • Unit tests: 16/16 passed (CPU + GPU)

@paddle-bot

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 30.05181% with 135 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@0b4c1cb).

Files with missing lines | Patch % | Lines
fastdeploy/model_executor/models/minicpm4.py | 30.05% | 135 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6982   +/-   ##
==========================================
  Coverage           ?   73.67%           
==========================================
  Files              ?      400           
  Lines              ?    56129           
  Branches           ?     8847           
==========================================
  Hits               ?    41354           
  Misses             ?    11855           
  Partials           ?     2920           
Flag | Coverage Δ
GPU | 73.67% <30.05%> (?)


@cloudforge1
Contributor Author

A note on the current CI status:

  • Check PR Template has been fixed and is now green.
  • The red run_tests_with_coverage check comes from diff-cover --fail-under=80, not from unit-test failures. In that job TEST_EXIT_CODE=0 and all tests pass; the failure is that the diff coverage of the new model file fastdeploy/model_executor/models/minicpm4.py is 30%.
  • This is a new-model integration scenario, not an H10 unit-test coverage task. The coverage job does not download the 16GB model weights and does not perform real model loading or forward passes in CI, so the bulk of the model code cannot reach 80% in diff-cover the way a unit-test task could.
  • Structural unit tests have been added in tests/model_executor/test_minicpm4.py: 16/16 pass locally on CPU, 16/16 pass on AI Studio V100, and the same test file also passes in CI.

If reviewers feel more lightweight structural tests are needed, I am happy to add them; the current red check is essentially a mismatch between the coverage policy for new model files and the integration scenario, not a functional regression.
