Skip to content

[BugFix] Fix model loading error for 300B FP8 EP parallel test case#6382

Open
Echo-Nie wants to merge 4 commits intoPaddlePaddle:developfrom
Echo-Nie:fix_bug
Open

[BugFix] Fix model loading error for 300B FP8 EP parallel test case#6382
Echo-Nie wants to merge 4 commits intoPaddlePaddle:developfrom
Echo-Nie:fix_bug

Conversation

@Echo-Nie
Copy link
Contributor

@Echo-Nie Echo-Nie commented Feb 6, 2026

Motivation

Fix model loading error in 300B FP8 EP parallel test case: the error occurred when transposing weights (dimension mismatch due to missing weight adaptation before loading).

Modifications

  1. Added weight_adapter function to handle weight renaming.
  2. Replaced original weights_iterator with adapted_iterator (processed by weight_adapter) in both cache and direct loading paths.
  3. Fixed parameter count mismatch in get_padding_offset call.

Usage or Command

model_path=ERNIE-4.5-300B-A47B-FP8-Paddle
config_yaml=yaml/eb45-8k-fp8-tp1-dp8_ep.yaml
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ep_engine_ports="5378,5379,5380,5381,5382,5383,5384,5385"
metrics_ports="5110,5111,5112,5113,5114,5115,5116,5117"
ports="8188,8189,8190,8191,8192,8193,8194,8195"
cache_ports="9320,9321,9322,9323,9324,9325,9326,9327"
server_num=8
export FD_ENABLE_MULTI_API_SERVER=1
export KVCACHE_RDMA_NICS="mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_6,mlx5_7,mlx5_8,mlx5_9"
python -m fastdeploy.entrypoints.openai.multi_api_server --ports ${ports} --num-servers ${server_num} --metrics-ports ${metrics_ports} --args --model ${model_path} --cache-queue-port ${cache_ports} --engine-worker-queue-port ${ep_engine_ports} --config ${config_yaml} >server.log 2>&1 &

Accuracy Tests

Service Satrt

[INFO] Application startup complete.
[INFO] 127.0.0.1:25712 - "GET /v1/models HTTP/1.1" 200
[INFO] 127.0.0.1:25712 - "POST /v1/chat/completions HTTP/1.1" 200

Test

connecting: http://localhost:8188/v1 ...
model: /workspace/FirstBuildFD/bug2_3/bd
📝 Question: '1+1=?'
🤖 Response: The question "1 + 1 = ?" is a basic arithmetic problem. 
In standard arithmetic, when we add 1 and 1 together, the result is 2. 
So, 1 + 1 = 2.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@2ffcb3d). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...y/model_executor/model_loader/default_loader_v1.py 75.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6382   +/-   ##
==========================================
  Coverage           ?   68.36%           
==========================================
  Files              ?      391           
  Lines              ?    52250           
  Branches           ?     8148           
==========================================
  Hits               ?    35723           
  Misses             ?    13918           
  Partials           ?     2609           
Flag Coverage Δ
GPU 68.36% <75.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants