【TI-Consisent】Added Metric logits_stats to the ZMQ branch #6979
liuruyan wants to merge 7 commits into PaddlePaddle:develop
Thanks for your contribution!
…into logit_stat_dev
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6979 +/- ##
==========================================
Coverage ? 73.84%
==========================================
Files ? 399
Lines ? 56093
Branches ? 8853
==========================================
Hits ? 41421
Misses ? 11743
Partials ? 2929
…into logit_stat_dev
Pull request overview
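This PR adds a logits_stats metric (min/max/mean/std) to the logprobs output path on the ZMQ branch, for monitoring training-inference consistency and stability. A new switch, compute_logits_stats / --compute-logits-stats, controls whether the statistics are emitted.
Changes:
- Add the compute_logits_stats config option and CLI argument, and pass it through the engine→worker launch arguments.
- Extend LogprobsTensors/LogprobsLists to carry the logits statistics, and expose them in the OpenAI chat logprobs response as LogProbEntry.logits_stats.
- Update the related unit/E2E/CE tests for the new field (some cases strip logits_stats to keep their assertions stable).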
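As a rough illustration of the four statistics named above, the sketch below computes them for a single logits vector in plain Python. The function name `logits_stats`, the dictionary keys, and the use of the population standard deviation are assumptions made for this example; the PR text does not say which std variant or key names the implementation uses.

```python
import math
from typing import Dict, Sequence


def logits_stats(logits: Sequence[float]) -> Dict[str, float]:
    """Return min/max/mean/std for one logits vector (illustrative only)."""
    if not logits:
        raise ValueError("logits must be non-empty")
    n = len(logits)
    mean = math.fsum(logits) / n
    # Assumption: population variance (divide by n), not the sample variance.
    variance = math.fsum((x - mean) ** 2 for x in logits) / n
    return {
        "min": min(logits),
        "max": max(logits),
        "mean": mean,
        "std": math.sqrt(variance),
    }


stats = logits_stats([1.0, 2.0, 3.0, 4.0])
```

In the real pipeline these values would presumably be computed on device tensors rather than Python lists; the list version only shows what each field means.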
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/worker/test_gpu_prompt_logprobs.py | Adapt to the changed gather_logprobs return type (tuple → NamedTuple). |
| tests/output/test_token_processor.py | Add the compute_logits_stats field to the test config. |
| tests/output/test_process_batch_output.py | Adjust the expected field length after top_logprobs was extended. |
| tests/e2e/4cards_cases/test_ernie_21b_tp1_dp4_mtp.py | E2E: recursively strip logits_stats to keep the comparison stable. |
| tests/e2e/4cards_cases/test_ernie_21b_tp1_dp4.py | Same as above; adds the stripping helper and applies it before the assertions. |
| tests/ce/server/test_logprobs.py | CE case: strip logits_stats for compatibility with the new response field. |
| fastdeploy/worker/xpu_model_runner.py | Adapt field access to the new gather_logprobs return structure. |
| fastdeploy/worker/metax_model_runner.py | Same as above. |
| fastdeploy/worker/gpu_model_runner.py | Same as above. |
| fastdeploy/worker/worker_process.py | Add the --compute_logits_stats argument on the worker side. |
| fastdeploy/worker/output.py | Extend the Logprobs* structures to carry logits statistics. |
| fastdeploy/output/token_processor.py | Extract and populate outputs.logits_stats in the ZMQ output-processing path. |
| fastdeploy/entrypoints/openai/serving_completion.py | Adapt prompt_logprobs unpacking to the new fields. |
| fastdeploy/entrypoints/openai/serving_chat.py | Chat logprobs construction supports logits_stats and passes it through to the protocol layer. |
| fastdeploy/entrypoints/openai/protocol.py | Add the logits_stats field to the OpenAI protocol structure LogProbEntry. |
| fastdeploy/entrypoints/llm.py | Adapt prompt_logprobs unpacking to the new fields. |
| fastdeploy/engine/request.py | Add logits_stats to CompletionOutput and include it in serialization/printing. |
| fastdeploy/engine/engine.py | Pass the compute_logits_stats switch through when launching workers. |
| fastdeploy/engine/args_utils.py | Add the --compute-logits-stats argument on the engine side. |
| fastdeploy/config.py | Add the compute_logits_stats field to ModelConfig. |
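The protocol.py and serving_chat.py rows say that each LogProbEntry gains a logits_stats field. A client-side reader might look like the sketch below. The payload shape (choices → logprobs → content) follows the usual OpenAI chat logprobs layout, and the helper name `collect_logits_stats` and the stat keys are assumptions for illustration, not taken from the PR.

```python
from typing import Any, Dict, List, Optional


def collect_logits_stats(response: Dict[str, Any]) -> List[Optional[Dict[str, float]]]:
    """Collect the logits_stats value of every token entry (None if absent)."""
    collected: List[Optional[Dict[str, float]]] = []
    for choice in response.get("choices", []):
        logprobs = choice.get("logprobs") or {}
        for entry in logprobs.get("content") or []:
            collected.append(entry.get("logits_stats"))
    return collected


# Hypothetical payload: the second token carries no statistics.
sample = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {
                        "token": "Hi",
                        "logprob": -0.1,
                        "logits_stats": {"min": -8.0, "max": 12.5, "mean": 0.1, "std": 2.3},
                    },
                    {"token": "!", "logprob": -0.4},
                ]
            }
        }
    ]
}
per_token = collect_logits_stats(sample)
```

Returning None for entries without statistics keeps the result aligned with the token positions, which is convenient when the switch is off or only some tokens carry the field.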
| def _build_logprobs_response( | ||
| self, | ||
| request_logprobs: bool, | ||
| response_logprobs: Optional[LogprobsLists], | ||
| request_top_logprobs: int, | ||
| request_decode_flag: bool, | ||
| logits_stats: Optional[dict[str, float]] = None, | ||
| ) -> Optional[LogProbs]: |
The type annotation of _build_logprobs_response uses dict[str, float], but this file does not enable from __future__ import annotations. On Python 3.7/3.8 (setup.py declares support for >=3.7) this raises TypeError: 'type' object is not subscriptable at import time. Suggest changing it to Optional[Dict[str, float]] (importing Dict from typing), or adding from __future__ import annotations at the top of the file and then using built-in generics consistently.
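A minimal sketch of the first option from the comment above, using the typing generics so that the annotation also evaluates on Python 3.7/3.8. The function `describe_stats` is a hypothetical stand-in, not the real _build_logprobs_response.

```python
from typing import Dict, Optional


def describe_stats(logits_stats: Optional[Dict[str, float]] = None) -> str:
    """Format optional logits statistics; the annotation uses typing.Dict."""
    if logits_stats is None:
        return "logits_stats: n/a"
    # Sort the keys so the output is deterministic.
    parts = ", ".join(f"{key}={value}" for key, value in sorted(logits_stats.items()))
    return f"logits_stats: {parts}"


summary = describe_stats({"min": -1.0, "max": 2.0})
```

The alternative, from __future__ import annotations, defers evaluation of all annotations in the module, so it has to be the first statement of the file and affects every annotation in it.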
| logprobs: list[list[float]] | ||
| # [num_reqs] | ||
| sampled_token_ranks: list[int] | ||
| # Logits statistics for each sequence (optional) | ||
| logits_min: Optional[list[float]] = None # [num_reqs] | ||
| logits_max: Optional[list[float]] = None # [num_reqs] | ||
| logits_mean: Optional[list[float]] = None # [num_reqs] | ||
| logits_std: Optional[list[float]] = None # [num_reqs] |
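This file does not enable from __future__ import annotations, but the newly added built-in generic annotations such as Optional[list[float]] / list[list[int]] raise an exception at import time on Python 3.7/3.8, while setup.py still declares python_requires=">=3.7". Suggest either: 1) adding from __future__ import annotations at the top of the file; or 2) changing the new annotations to typing forms such as Optional[List[float]] and adding the imports, to stay compatible with the declared Python versions.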
| return LogprobsLists( | ||
| [row[start:end] for row in self.logprob_token_ids], | ||
| [row[start:end] for row in self.logprobs], | ||
| self.sampled_token_ranks, # unchanged | ||
| # [row[start:end] for row in self.logits_min], | ||
| # [row[start:end] for row in self.logits_max], | ||
| # [row[start:end] for row in self.logits_mean], | ||
| # [row[start:end] for row in self.logits_std], | ||
| self.logits_min, # unchanged | ||
| self.logits_max, # unchanged | ||
| self.logits_mean, # unchanged | ||
| self.logits_std, # unchanged |
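slice_columns still contains the commented-out logits_* slicing code (lines 62-65), while the current implementation passes these fields through "unchanged". This makes it easy to misread which fields are supposed to be sliced by column. Suggest deleting the commented-out code and stating the dimension semantics of logits_* in the docstring/comments (whether they are aligned per position/token or per request).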
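One way the cleaned-up method could look is sketched below, assuming (as the `# [num_reqs]` comments in the hunk suggest) that the logits_* fields hold one value per request and therefore are not sliced by column. The class name `LogprobsListsSketch` is hypothetical and the typing generics are used to match the compatibility comments above.

```python
from typing import List, NamedTuple, Optional


class LogprobsListsSketch(NamedTuple):
    logprob_token_ids: List[List[int]]  # one row per request, sliced by column
    logprobs: List[List[float]]  # one row per request, sliced by column
    sampled_token_ranks: List[int]  # [num_reqs]
    logits_min: Optional[List[float]] = None  # [num_reqs]
    logits_max: Optional[List[float]] = None  # [num_reqs]
    logits_mean: Optional[List[float]] = None  # [num_reqs]
    logits_std: Optional[List[float]] = None  # [num_reqs]

    def slice_columns(self, start: int, end: int) -> "LogprobsListsSketch":
        """Slice the column-aligned fields; per-request fields pass through.

        Assumption: logits_* are aligned per request, so they have no
        column dimension to slice.
        """
        return LogprobsListsSketch(
            [row[start:end] for row in self.logprob_token_ids],
            [row[start:end] for row in self.logprobs],
            self.sampled_token_ranks,
            self.logits_min,
            self.logits_max,
            self.logits_mean,
            self.logits_std,
        )


full = LogprobsListsSketch([[1, 2, 3]], [[-0.1, -0.2, -0.3]], [0], [-5.0], [5.0], [0.0], [1.0])
sliced = full.slice_columns(0, 2)
```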
| assert ( | ||
| logprobs_list.logits_min is not None | ||
| ), "logits_min is None when compute_logits_stats is enabled" | ||
| assert ( | ||
| logprobs_list.logits_max is not None | ||
| ), "logits_max is None when compute_logits_stats is enabled" | ||
| assert ( | ||
| logprobs_list.logits_mean is not None | ||
| ), "logits_mean is None when compute_logits_stats is enabled" | ||
| assert ( | ||
| logprobs_list.logits_std is not None | ||
| ), "logits_std is None when compute_logits_stats is enabled" |
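These assert ... is not None checks are used to guarantee that the logits_* fields exist:
1) Under python -O the asserts are skipped, which can lead to later errors such as float(None).
2) When an assert fires, the outer except swallows it and only logs a warning, so logits_stats ends up silently missing, which contradicts what --compute-logits-stats promises.
Suggest an explicit condition check instead: if a field is missing, log a clearer error and decide whether to fail outright or to degrade by turning off logits_stats output.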
| assert ( | |
| logprobs_list.logits_min is not None | |
| ), "logits_min is None when compute_logits_stats is enabled" | |
| assert ( | |
| logprobs_list.logits_max is not None | |
| ), "logits_max is None when compute_logits_stats is enabled" | |
| assert ( | |
| logprobs_list.logits_mean is not None | |
| ), "logits_mean is None when compute_logits_stats is enabled" | |
| assert ( | |
| logprobs_list.logits_std is not None | |
| ), "logits_std is None when compute_logits_stats is enabled" | |
| missing_fields = [] | |
| if logprobs_list.logits_min is None: | |
| missing_fields.append("logits_min") | |
| if logprobs_list.logits_max is None: | |
| missing_fields.append("logits_max") | |
| if logprobs_list.logits_mean is None: | |
| missing_fields.append("logits_mean") | |
| if logprobs_list.logits_std is None: | |
| missing_fields.append("logits_std") | |
| if missing_fields: | |
| # When compute_logits_stats is enabled, all logits_* fields must be present | |
| raise ValueError( | |
| "Missing logits stats fields when compute_logits_stats is enabled: " | |
| + ", ".join(missing_fields) | |
| ) |
| def _strip_logits_stats(obj): | ||
| """Recursively remove 'logits_stats' keys from logprobs response.""" | ||
| if isinstance(obj, dict): | ||
| obj.pop("logits_stats", None) | ||
| for v in obj.values(): | ||
| _strip_logits_stats(v) | ||
| elif isinstance(obj, list): | ||
| for item in obj: | ||
| _strip_logits_stats(item) | ||
|
|
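_strip_logits_stats appears with an identical implementation in several test files (this file and the other e2e/ce cases each add their own copy). Suggest moving it into a shared test utility module (for example tests/e2e/utils or tests/ce/server/core) and reusing it, to reduce the number of places that have to change together when fields change later.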
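A sketch of what the shared helper and its use in a test could look like. The public name `strip_logits_stats`, its placement in a common module, and the sample payload are assumptions for illustration; the body mirrors the helper shown in the hunk.

```python
from typing import Any


def strip_logits_stats(obj: Any) -> None:
    """Recursively remove 'logits_stats' keys in place from nested dicts/lists."""
    if isinstance(obj, dict):
        obj.pop("logits_stats", None)
        for value in obj.values():
            strip_logits_stats(value)
    elif isinstance(obj, list):
        for item in obj:
            strip_logits_stats(item)


# Hypothetical response fragment with the field at two nesting levels.
payload = {
    "logprobs": {
        "content": [
            {
                "token": "a",
                "logprob": -0.1,
                "logits_stats": {"min": -1.0, "max": 1.0},
                "top_logprobs": [{"token": "a", "logprob": -0.1, "logits_stats": None}],
            }
        ]
    }
}
strip_logits_stats(payload)
entry = payload["logprobs"]["content"][0]
```

Because the helper mutates its argument, a test that still needs the original response should pass in a deep copy.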
| default=EngineArgs.enable_logprob, | ||
| help="Enable output of token-level log probabilities.", | ||
| ) | ||
| model_group.add_argument( | ||
| "--compute-logits-stats", | ||
| action="store_true", | ||
| default=EngineArgs.compute_logits_stats, | ||
| help="Enable per-token logits statistics (min/max/mean/std) output.", | ||
| ) |
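The PR title is currently "【TI-Consisent】...", which does not follow the repository's required [CLASS]Title format (the template lists tags such as [Feature] / [BugFix]). Suggest changing the title to something like [Feature] Add logits_stats metric for ZMQ logprobs, and fixing the spelling of Consisent so that later searches and automated workflows can recognize it.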
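For reference, the behaviour of a store_true flag such as the --compute-logits-stats argument added in args_utils.py can be checked in isolation with argparse. The parser below is a standalone sketch: it hard-codes False as the default instead of reading EngineArgs, and the --enable-logprob definition is simplified.

```python
import argparse

parser = argparse.ArgumentParser()
model_group = parser.add_argument_group("Model")
model_group.add_argument(
    "--enable-logprob",
    action="store_true",
    default=False,
    help="Enable output of token-level log probabilities.",
)
model_group.add_argument(
    "--compute-logits-stats",
    action="store_true",
    default=False,
    help="Enable per-token logits statistics (min/max/mean/std) output.",
)

# argparse maps the dashes in the flag name to underscores in the attribute.
enabled = parser.parse_args(["--enable-logprob", "--compute-logits-stats"])
disabled = parser.parse_args([])
```

The resulting attribute name, compute_logits_stats, matches the key that engine.py forwards to the worker in the next hunk.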
| "use_internode_ll_two_stage": self.cfg.parallel_config.use_internode_ll_two_stage, | ||
| "disable_sequence_parallel_moe": self.cfg.parallel_config.disable_sequence_parallel_moe, | ||
| "enable_logprob": self.cfg.model_config.enable_logprob, | ||
| "compute_logits_stats": self.cfg.model_config.compute_logits_stats, |
| logits_min: Optional[list[float]] = None # [num_reqs] | ||
| logits_max: Optional[list[float]] = None # [num_reqs] | ||
| logits_mean: Optional[list[float]] = None # [num_reqs] | ||
| logits_std: Optional[list[float]] = None # [num_reqs] |
Motivation
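Background: to enrich the metrics used for checking training-inference consistency and to support long-term CI/CE monitoring, this PR adds logits_stats (min/max/mean/std) computed on the logits after sampling, to help verify determinism and stability.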
Modifications
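Data structures and interfaces: logprob and logits_stat are both important output metrics and are both computed from the logits, so for now logits_stat is stored in the LogprobsTensors data structure, and the interfaces along the logprob propagation path are upgraded to pass logits_stats through at the same time.
FLAG: adds a model_config option at the same level as enable_logprob: self.compute_logits_stats = False. It can be set at server startup with --compute-logits-stats.
Usage or Command
Start the server with --compute-logits-stats and --enable-logprob, and send the request with logprobs=True and top_logprobs=0:

    response = client.chat.completions.create(
        model="null",
        messages=[
            {"role": "system", "content": "I'm a helpful AI assistant."},
            # Prompt: "Rewrite Li Bai's 'Quiet Night Thoughts' as a modern poem"
            {"role": "user", "content": "把李白的静夜思改写为现代诗"},
        ],
        stream=True,  # False
        max_tokens=100,
        logprobs=True,
        top_logprobs=0,
    )

Accuracy Tests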
This PR does not involve any accuracy changes.
Checklist
- [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … pre-commit before commit.
- … release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.