[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 #6997
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop
Conversation
Most single-GPU and small-model deployments do not need 64MB custom all-reduce buffers. Lowering the default to 8MB reduces unnecessary shared memory allocation. Tests that require larger buffers now explicitly set the value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR adjusts the default configuration of the custom all-reduce shared buffer: the FD_CUSTOM_AR_MAX_SIZE_MB default is lowered from 64MB to 8MB to reduce the shared memory footprint of most small-scale deployments, while the determinism-related e2e cases explicitly use a larger buffer to stay stable.
Changes:
- Change the FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 and update the corresponding comment
- Change the default fallback value of this environment variable in tests/e2e/4cards_cases/test_determinism_long.py to 64
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fastdeploy/envs.py | Lower the custom all-reduce buffer default and update the comment |
| tests/e2e/4cards_cases/test_determinism_long.py | Adjust the default fallback value of the buffer in the determinism long e2e case |
> # Custom all-reduce max buffer size in MB (default 8MB).
> # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
This comment is inconsistent with the actual behavior: when FD_DETERMINISTIC_MODE=1, an input tensor larger than max_size raises a RuntimeError directly (communication._ensure_deterministic_ready); it does not "fall back to NCCL". Suggest rewording the comment to say that oversized tensors raise an error and that this value must be increased to satisfy the max_size limit of deterministic all-reduce, to avoid misleading readers.
> - # Custom all-reduce max buffer size in MB (default 8MB).
> - # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
> + # Custom deterministic all-reduce max buffer size in MB (default 8MB).
> + # When FD_DETERMINISTIC_MODE=1, tensors larger than this limit will raise an error
> + # instead of falling back to NCCL. Increase this value to avoid max_size errors.
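The behavior the reviewer describes can be sketched in plain Python. The function name and signature below are assumptions modeled on the `_ensure_deterministic_ready` check mentioned in the comment, not FastDeploy's actual code:

```python
def ensure_deterministic_ready(tensor_bytes: int, max_size_mb: int) -> None:
    """Sketch: in deterministic mode an oversized tensor raises a RuntimeError
    instead of silently falling back to NCCL."""
    max_bytes = max_size_mb * 1024 * 1024
    if tensor_bytes > max_bytes:
        raise RuntimeError(
            f"Tensor of {tensor_bytes} bytes exceeds FD_CUSTOM_AR_MAX_SIZE_MB="
            f"{max_size_mb} ({max_bytes} bytes); increase the env var, since "
            "deterministic mode does not fall back to NCCL."
        )

# A 4MB tensor fits within the default 8MB buffer and passes silently:
ensure_deterministic_ready(4 * 1024 * 1024, 8)
# A 16MB tensor exceeds it and raises RuntimeError:
try:
    ensure_deterministic_ready(16 * 1024 * 1024, 8)
except RuntimeError as e:
    print("raised:", "exceeds" in str(e))
```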
>   # Increase this to avoid NCCL fallback for large tensors in deterministic mode.
>   # E.g. FD_CUSTOM_AR_MAX_SIZE_MB=128 for 128MB.
> - "FD_CUSTOM_AR_MAX_SIZE_MB": lambda: int(os.getenv("FD_CUSTOM_AR_MAX_SIZE_MB", "64")),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": lambda: int(os.getenv("FD_CUSTOM_AR_MAX_SIZE_MB", "8")),
FD_CUSTOM_AR_MAX_SIZE_MB is read here with a bare int(os.getenv(...)) and no range check; if it is set to 0 or a negative value, CustomAllreduce computes an invalid size_in_bytes when allocating the shared buffer and fails at a lower level, which is hard to debug. Suggest adding a validator (similar to _validate_split_kv_size) that requires a positive integer (e.g. >= 1) and raises a clear error otherwise.
>   "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0,1,2,3"),
>   "FD_DETERMINISTIC_MODE": "1",
> - "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "57"),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64"),
The test here lets an external environment variable override FD_CUSTOM_AR_MAX_SIZE_MB; if CI or a local environment happens to set a smaller value (for example the new default of 8), the deterministic all-reduce may fail outright on large tensors and make the test flaky. Consider pinning it directly to "64", consistent with the other determinism e2e tests in this directory (e.g. _test_determinism_offline.py), so the test result does not depend on the outer environment.
> - "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64"),
> + "FD_CUSTOM_AR_MAX_SIZE_MB": "64",
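The difference between the fallback form and the pinned form can be demonstrated in plain Python, independent of the test harness:

```python
import os

# Simulate CI having exported a small value before the test process starts.
os.environ["FD_CUSTOM_AR_MAX_SIZE_MB"] = "8"

# Fallback form: the outer environment leaks into the test configuration.
overridable = os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64")
# Pinned form: the test always runs with the intended 64MB buffer.
pinned = "64"

print(overridable)  # "8" -- inherited from the environment
print(pinned)       # "64"
```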
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (base `develop` vs `#6997`):

| | develop | #6997 |
|---|---|---|
| Coverage | ? | 73.85% |
| Files | ? | 399 |
| Lines | ? | 56045 |
| Branches | ? | 8849 |
| Hits | ? | 41392 |
| Misses | ? | 11727 |
| Partials | ? | 2926 |
Motivation
The default FD_CUSTOM_AR_MAX_SIZE_MB of 64MB is unnecessarily large for most single-GPU and small-model deployments. Reducing it to 8MB lowers shared memory allocation overhead. Multi-GPU or large-model scenarios that need bigger buffers can set the env var explicitly.
Modifications
- fastdeploy/envs.py: Change default value from 64 to 8, update comment accordingly.
- tests/e2e/4cards_cases/test_determinism_long.py: Explicitly set FD_CUSTOM_AR_MAX_SIZE_MB=64 (was using 57 as fallback; now aligned with other test files).
Usage or Command
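As an illustration only, a deployment that still needs the previous 64MB buffer can export the variable before launching; the serving command itself is deployment-specific and not shown here:

```shell
# Restore the old 64MB custom all-reduce buffer for this shell session.
export FD_CUSTOM_AR_MAX_SIZE_MB=64
echo "$FD_CUSTOM_AR_MAX_SIZE_MB"
```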
Checklist
- Run pre-commit before commit.

🤖 Generated with Claude Code