[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 #6997
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop
Conversation
Most single-GPU and small-model deployments do not need 64MB custom all-reduce buffers. Lowering the default to 8MB reduces unnecessary shared memory allocation. Tests that require larger buffers now explicitly set the value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR adjusts the default configuration of the custom all-reduce shared buffer: the FD_CUSTOM_AR_MAX_SIZE_MB default is lowered from 64MB to 8MB to reduce the shared memory footprint of most small-scale deployments, while the determinism-related e2e cases explicitly use a larger buffer to stay stable.
Changes:
- Change the FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 and update the corresponding comment
- Change the default fallback value of this environment variable in tests/e2e/4cards_cases/test_determinism_long.py to 64
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fastdeploy/envs.py | Lower the custom all-reduce buffer default and update the comment |
| tests/e2e/4cards_cases/test_determinism_long.py | Adjust the default fallback value of the buffer in the determinism long e2e case |
> # Custom all-reduce max buffer size in MB (default 8MB).
> # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
This comment is inconsistent with the actual behavior: when FD_DETERMINISTIC_MODE=1, an input tensor larger than max_size raises a RuntimeError directly (communication._ensure_deterministic_ready); it does not "fall back to NCCL". Suggest rewording the comment to say that oversized tensors raise an error and that this value must be increased to satisfy the max_size limit of deterministic all-reduce, to avoid misleading readers.
> - # Custom all-reduce max buffer size in MB (default 8MB).
> - # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
> + # Custom deterministic all-reduce max buffer size in MB (default 8MB).
> + # When FD_DETERMINISTIC_MODE=1, tensors larger than this limit will raise an error
> + # instead of falling back to NCCL. Increase this value to avoid max_size errors.
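The behavior the reviewer describes can be sketched in plain Python. The function name and signature below are assumptions modeled on the `_ensure_deterministic_ready` check mentioned in the comment, not FastDeploy's actual code:

```python
def ensure_deterministic_ready(tensor_bytes: int, max_size_mb: int) -> None:
    """Sketch: in deterministic mode an oversized tensor raises a RuntimeError
    instead of silently falling back to NCCL."""
    max_bytes = max_size_mb * 1024 * 1024
    if tensor_bytes > max_bytes:
        raise RuntimeError(
            f"Tensor of {tensor_bytes} bytes exceeds FD_CUSTOM_AR_MAX_SIZE_MB="
            f"{max_size_mb} ({max_bytes} bytes); increase the env var, since "
            "deterministic mode does not fall back to NCCL."
        )

# A 4MB tensor fits within the default 8MB buffer and passes silently:
ensure_deterministic_ready(4 * 1024 * 1024, 8)
# A 16MB tensor exceeds it and raises RuntimeError:
try:
    ensure_deterministic_ready(16 * 1024 * 1024, 8)
except RuntimeError as e:
    print("raised:", "exceeds" in str(e))
```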
>   # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
>   # E.g. FD_CUSTOM_AR_MAX_SIZE_MB=128 for 128MB.
> - "FD_CUSTOM_AR_MAX_SIZE_MB": lambda: int(os.getenv("FD_CUSTOM_AR_MAX_SIZE_MB", "64")),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": lambda: int(os.getenv("FD_CUSTOM_AR_MAX_SIZE_MB", "8")),
FD_CUSTOM_AR_MAX_SIZE_MB is read here with a bare int(os.getenv(...)) and no range check; if it is set to 0 or a negative value, CustomAllreduce computes an invalid size_in_bytes when allocating the shared buffer and fails at a lower level, which is hard to debug. Suggest adding a validator (similar to _validate_split_kv_size) that requires a positive integer (e.g. >= 1) and raises a clear error otherwise.
>   "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0,1,2,3"),
>   "FD_DETERMINISTIC_MODE": "1",
> - "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "57"),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64"),
The test here lets an external environment variable override FD_CUSTOM_AR_MAX_SIZE_MB; if CI or a local environment happens to set a smaller value (for example the new default of 8), the deterministic all-reduce may fail outright on large tensors and make the test flaky. Consider pinning it directly to "64", consistent with the other determinism e2e tests in this directory (e.g. _test_determinism_offline.py), so the test result does not depend on the outer environment.
> - "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64"),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": "64",
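The difference between the fallback form and the pinned form can be demonstrated in plain Python, independent of the test harness:

```python
import os

# Simulate CI having exported a small value before the test process starts.
os.environ["FD_CUSTOM_AR_MAX_SIZE_MB"] = "8"

# Fallback form: the outer environment leaks into the test configuration.
overridable = os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64")
# Pinned form: the test always runs with the intended 64MB buffer.
pinned = "64"

print(overridable)  # "8" -- inherited from the environment
print(pinned)       # "64"
```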
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (base `develop` vs `#6997`):

| | develop | #6997 |
|---|---|---|
| Coverage | ? | 73.85% |
| Files | ? | 399 |
| Lines | ? | 56045 |
| Branches | ? | 8849 |
| Hits | ? | 41392 |
| Misses | ? | 11727 |
| Partials | ? | 2926 |
Motivation
The default FD_CUSTOM_AR_MAX_SIZE_MB of 64MB is unnecessarily large for most single-GPU and small-model deployments. Reducing it to 8MB lowers shared memory allocation overhead. Multi-GPU or large-model scenarios that need bigger buffers can set the env var explicitly.
Modifications
- fastdeploy/envs.py: Change default value from 64 to 8, update comment accordingly.
- tests/e2e/4cards_cases/test_determinism_long.py: Explicitly set FD_CUSTOM_AR_MAX_SIZE_MB=64 (was using 57 as fallback; now aligned with other test files).
Usage or Command
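As an illustration only, a deployment that still needs the previous 64MB buffer can export the variable before launching; the serving command itself is deployment-specific and not shown here:

```shell
# Restore the old 64MB custom all-reduce buffer for this shell session.
export FD_CUSTOM_AR_MAX_SIZE_MB=64
echo "$FD_CUSTOM_AR_MAX_SIZE_MB"
```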
Checklist
- Run pre-commit before commit.

🤖 Generated with Claude Code