【TI-Consisent】Added Metric logits_stats to the ZMQ branch #6979

Open
liuruyan wants to merge 7 commits into PaddlePaddle:develop from liuruyan:logit_stat_dev

@liuruyan liuruyan commented Mar 23, 2026

Motivation

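Background: To enrich the detection metrics for training-inference consistency and to support long-term CI/CE monitoring, this PR adds logits_stats (min/max/mean/std) computed on the post-sampling logits, to help verify determinism and stability.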

Modifications

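Data structures and interfaces: logprob and logits_stat are both important output metrics, and both are computed from the logits. For now, logits_stat is therefore stored in the LogprobsTensors data structure, and the interfaces along the logprob propagation path are upgraded so that logits_stats is passed through alongside it.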

class LogprobsTensors(NamedTuple):
    """ """

    # [num_reqs, max_num_logprobs + 1]
    logprob_token_ids: paddle.Tensor
    # [num_reqs, max_num_logprobs + 1]
    logprobs: paddle.Tensor
    # [num_reqs]
    selected_token_ranks: paddle.Tensor
    # Logits statistics for each sequence (optional)
    logits_min: Optional[paddle.Tensor] = None  # [num_reqs]
    logits_max: Optional[paddle.Tensor] = None  # [num_reqs]
    logits_mean: Optional[paddle.Tensor] = None  # [num_reqs]
    logits_std: Optional[paddle.Tensor] = None  # [num_reqs]
    ...
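
The four optional fields above hold one value per request. As a rough illustration of what such per-request statistics look like, here is a minimal sketch in plain Python rather than paddle tensors. The function name `logits_stats_for_row` and the choice of population standard deviation are assumptions made for this example; the PR description does not state how the values are computed.

```python
import statistics
from typing import Dict, List


def logits_stats_for_row(logits: List[float]) -> Dict[str, float]:
    """Summarize one request's logits as min/max/mean/std (hypothetical helper)."""
    return {
        "min": min(logits),
        "max": max(logits),
        "mean": statistics.fmean(logits),
        # Assumption: population standard deviation; the PR does not say which one is used.
        "std": statistics.pstdev(logits),
    }


# One row of logits per request, so the result has one stats dict per request.
batch = [[-2.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
per_request = [logits_stats_for_row(row) for row in batch]
```

Each entry of `per_request` corresponds to one position along the `[num_reqs]` axis noted in the field comments.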

FLAG: Adds a model_config option at the same level as enable_logprob, self.compute_logits_stats = False, which can be set at server startup with --compute-logits-stats.
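
A minimal, standalone sketch of how a boolean switch of this kind behaves with argparse. The `build_parser` function is hypothetical and is not FastDeploy's actual argument setup; only the two flag names come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with two off-by-default switches."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-logprob", action="store_true", default=False)
    parser.add_argument("--compute-logits-stats", action="store_true", default=False)
    return parser


parser = build_parser()
# Without the flag the option stays False; argparse maps dashes to underscores.
defaults = parser.parse_args([])
enabled = parser.parse_args(["--enable-logprob", "--compute-logits-stats"])
```

With `action="store_true"`, passing the flag on the command line is the only way to turn the option on, which matches a default of False.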

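Note: Because the returned fields changed, some unit tests no longer passed, so the unit test files were modified to strip the new field (logits_stats) from the returned values.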

Usage or Command

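  • This feature currently supports only ZMQ; both streaming and non-streaming tests return results correctly.
  • When starting the FD service, enable both --compute-logits-stats and --enable-logprob: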
export FD_USE_GET_SAVE_OUTPUT_V1=1 
python -m fastdeploy.entrypoints.openai.api_server \
       --enable-logprob \
       --compute-logits-stats \
       ...  # more setting
  • When sending a request, specify logprobs=True and top_logprobs=0:
response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "system", "content": "I'm a helpful AI assistant."},
        {"role": "user", "content": "Rewrite Li Bai's poem 静夜思 (Quiet Night Thoughts) as a modern poem"},
    ],
    stream=True,  # False
    max_tokens=100,
    logprobs=True,
    top_logprobs=0
)
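
A hedged sketch of pulling logits_stats back out of a non-streaming response once it has been converted to a plain dict. The response shape (choices → logprobs → content → entry) follows the usual OpenAI chat logprobs layout, and the keys inside `logits_stats` and all sample values are assumptions for illustration; `collect_logits_stats` is a hypothetical helper, not part of FastDeploy.

```python
from typing import Any, Dict, List


def collect_logits_stats(response: Dict[str, Any]) -> List[Any]:
    """Gather the logits_stats value of every logprob entry that has one."""
    collected = []
    for choice in response.get("choices", []):
        logprobs = choice.get("logprobs") or {}
        for entry in logprobs.get("content") or []:
            if entry.get("logits_stats") is not None:
                collected.append(entry["logits_stats"])
    return collected


# Made-up payload used only to exercise the helper; not real server output.
sample_response = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "a", "logprob": -0.1, "logits_stats": {"min": -1.0, "max": 2.0}},
                    {"token": "b", "logprob": -0.2},
                ]
            }
        }
    ]
}
stats = collect_logits_stats(sample_response)
```

Entries without the field are skipped, so the same helper also works against a server started without --compute-logits-stats.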

Accuracy Tests

This PR does not involve accuracy changes.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Mar 23, 2026

codecov-commenter commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 67.27273% with 18 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@6f5aa88). Learn more about missing BASE report.

Files with missing lines                        Patch %   Lines
fastdeploy/entrypoints/openai/serving_chat.py   51.72%    11 Missing and 3 partials ⚠️
fastdeploy/output/token_processor.py            42.85%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6979   +/-   ##
==========================================
  Coverage           ?   73.84%           
==========================================
  Files              ?      399           
  Lines              ?    56093           
  Branches           ?     8853           
==========================================
  Hits               ?    41421           
  Misses             ?    11743           
  Partials           ?     2929           
Flag Coverage Δ
GPU 73.84% <67.27%> (?)

Copilot AI left a comment

Pull request overview

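This PR adds a logits_stats metric (min/max/mean/std) to the logprobs output path on the ZMQ branch, for monitoring training-inference consistency and stability. A new switch, compute_logits_stats/--compute-logits-stats, controls whether it is output.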

Changes:

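  • Adds the compute_logits_stats config option and CLI argument, and passes it through in the engine→worker launch arguments.
  • Extends LogprobsTensors/LogprobsLists to carry logits statistics, and outputs them in the OpenAI chat logprobs response as LogProbEntry.logits_stats.
  • Updates the related unit/E2E/CE test cases for the new field (some cases strip logits_stats to keep their assertions stable).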

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.

Show a summary per file
File                                                  Description
tests/worker/test_gpu_prompt_logprobs.py              Adapts to the changed return structure of gather_logprobs (tuple to NamedTuple).
tests/output/test_token_processor.py                  Adds the compute_logits_stats field to the test config.
tests/output/test_process_batch_output.py             Adapts the expected field length after top_logprobs was extended.
tests/e2e/4cards_cases/test_ernie_21b_tp1_dp4_mtp.py  E2E case recursively strips logits_stats to keep comparisons stable.
tests/e2e/4cards_cases/test_ernie_21b_tp1_dp4.py      Same as above; adds the strip helper and applies it before assertions.
tests/ce/server/test_logprobs.py                      CE case strips logits_stats to stay compatible with the new return field.
fastdeploy/worker/xpu_model_runner.py                 Adapts field access to the new gather_logprobs return structure.
fastdeploy/worker/metax_model_runner.py               Same as above.
fastdeploy/worker/gpu_model_runner.py                 Same as above.
fastdeploy/worker/worker_process.py                   Adds the --compute_logits_stats argument on the worker side.
fastdeploy/worker/output.py                           Extends the Logprobs* structures to carry logits statistics.
fastdeploy/output/token_processor.py                  Extracts and fills outputs.logits_stats in the ZMQ output processing path.
fastdeploy/entrypoints/openai/serving_completion.py   Adapts prompt_logprobs unpacking to the new fields.
fastdeploy/entrypoints/openai/serving_chat.py         Chat logprobs construction supports logits_stats and passes it to the protocol layer.
fastdeploy/entrypoints/openai/protocol.py             Adds the logits_stats field to the OpenAI protocol structure LogProbEntry.
fastdeploy/entrypoints/llm.py                         Adapts prompt_logprobs unpacking to the new fields.
fastdeploy/engine/request.py                          Adds logits_stats to CompletionOutput and includes it in serialization/printing.
fastdeploy/engine/engine.py                           Passes the compute_logits_stats switch through when launching workers.
fastdeploy/engine/args_utils.py                       Adds the --compute-logits-stats argument on the engine side.
fastdeploy/config.py                                  Adds the compute_logits_stats field to ModelConfig.

Comment on lines 895 to 902
def _build_logprobs_response(
    self,
    request_logprobs: bool,
    response_logprobs: Optional[LogprobsLists],
    request_top_logprobs: int,
    request_decode_flag: bool,
    logits_stats: Optional[dict[str, float]] = None,
) -> Optional[LogProbs]:

Copilot AI Mar 25, 2026

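The type annotation of _build_logprobs_response uses dict[str, float], but this file does not enable from __future__ import annotations. On Python 3.7/3.8 (setup.py declares support for >=3.7) this raises TypeError: 'type' object is not subscriptable at import time. Consider changing it to Optional[Dict[str, float]] (importing Dict from typing), or adding from __future__ import annotations at the top of the file and then using built-in generics consistently.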
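
A minimal sketch of the typing-based form suggested here, which also imports cleanly on interpreters that predate built-in generics. The function `round_logits_stats` is a hypothetical stand-in used only to carry the annotation; it is not code from this PR.

```python
from typing import Dict, Optional


def round_logits_stats(
    logits_stats: Optional[Dict[str, float]] = None,
) -> Optional[Dict[str, float]]:
    """Hypothetical helper: round each statistic, passing None through unchanged."""
    if logits_stats is None:
        return None
    return {name: round(value, 4) for name, value in logits_stats.items()}


rounded = round_logits_stats({"min": -1.5, "max": 2.25})
```

Because `Dict` and `Optional` come from `typing`, the annotation is evaluated without subscripting the built-in `dict` type.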

Copilot uses AI. Check for mistakes.
Comment on lines 44 to +51
logprobs: list[list[float]]
# [num_reqs]
sampled_token_ranks: list[int]
# Logits statistics for each sequence (optional)
logits_min: Optional[list[float]] = None # [num_reqs]
logits_max: Optional[list[float]] = None # [num_reqs]
logits_mean: Optional[list[float]] = None # [num_reqs]
logits_std: Optional[list[float]] = None # [num_reqs]

Copilot AI Mar 25, 2026

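This file does not enable from __future__ import annotations, but the newly added built-in generic annotations such as Optional[list[float]] / list[list[int]] raise an exception at import time on Python 3.7/3.8, while setup.py still declares python_requires=">=3.7". Suggestions: 1) add from __future__ import annotations at the top of the file; or 2) change the new annotations to typing forms such as Optional[List[float]] and add the corresponding imports, to stay compatible with the declared Python versions.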

Comment on lines 58 to +69
return LogprobsLists(
    [row[start:end] for row in self.logprob_token_ids],
    [row[start:end] for row in self.logprobs],
    self.sampled_token_ranks,  # unchanged
    # [row[start:end] for row in self.logits_min],
    # [row[start:end] for row in self.logits_max],
    # [row[start:end] for row in self.logits_mean],
    # [row[start:end] for row in self.logits_std],
    self.logits_min,  # unchanged
    self.logits_max,  # unchanged
    self.logits_mean,  # unchanged
    self.logits_std,  # unchanged

Copilot AI Mar 25, 2026

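slice_columns still contains commented-out logits_* slicing code (lines 62-65), while the current implementation passes these fields through "unchanged". This makes it easy to misread which fields are supposed to be sliced by column. Consider deleting the commented-out code and stating the dimensional semantics of logits_* in a docstring/comment (aligned per position/token, or per request).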
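
A sketch of what the cleaned-up method could look like, assuming the logits_* fields are per-request values of shape [num_reqs], as the field comments in this PR indicate. The class below is a simplified stand-in for illustration and not the actual definition in fastdeploy/worker/output.py.

```python
from typing import List, NamedTuple, Optional


class LogprobsLists(NamedTuple):
    """Simplified stand-in for illustration only."""

    logprob_token_ids: List[List[int]]  # [num_reqs, num_columns]
    logprobs: List[List[float]]  # [num_reqs, num_columns]
    sampled_token_ranks: List[int]  # [num_reqs]
    logits_min: Optional[List[float]] = None  # [num_reqs]
    logits_max: Optional[List[float]] = None  # [num_reqs]
    logits_mean: Optional[List[float]] = None  # [num_reqs]
    logits_std: Optional[List[float]] = None  # [num_reqs]

    def slice_columns(self, start: int, end: int) -> "LogprobsLists":
        """Slice the per-column fields; per-request fields pass through as-is.

        Assumption: logits_* hold one value per request, so they have no
        column axis to slice.
        """
        return LogprobsLists(
            [row[start:end] for row in self.logprob_token_ids],
            [row[start:end] for row in self.logprobs],
            self.sampled_token_ranks,
            self.logits_min,
            self.logits_max,
            self.logits_mean,
            self.logits_std,
        )


lists = LogprobsLists(
    logprob_token_ids=[[1, 2, 3], [4, 5, 6]],
    logprobs=[[-0.1, -0.2, -0.3], [-0.4, -0.5, -0.6]],
    sampled_token_ranks=[0, 1],
    logits_min=[-3.0, -2.0],
)
sliced = lists.slice_columns(0, 2)
```

Stating the shapes next to each field makes it clear why only the first two fields are sliced.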

Comment on lines +356 to +367
assert (
    logprobs_list.logits_min is not None
), "logits_min is None when compute_logits_stats is enabled"
assert (
    logprobs_list.logits_max is not None
), "logits_max is None when compute_logits_stats is enabled"
assert (
    logprobs_list.logits_mean is not None
), "logits_mean is None when compute_logits_stats is enabled"
assert (
    logprobs_list.logits_std is not None
), "logits_std is None when compute_logits_stats is enabled"

Copilot AI Mar 25, 2026

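Here assert ... is not None is used to guarantee that the logits_* fields are present:
1) under python -O the asserts are skipped, which can lead to later exceptions such as float(None);
2) when an assert fires, it is swallowed by the outer except, which only logs a warning, so logits_stats ends up silently missing, contrary to what --compute-logits-stats promises.
Consider an explicit conditional check instead: if fields are missing, log a clearer error and decide whether to raise directly or to degrade by disabling logits_stats output.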

Suggested change
- assert (
-     logprobs_list.logits_min is not None
- ), "logits_min is None when compute_logits_stats is enabled"
- assert (
-     logprobs_list.logits_max is not None
- ), "logits_max is None when compute_logits_stats is enabled"
- assert (
-     logprobs_list.logits_mean is not None
- ), "logits_mean is None when compute_logits_stats is enabled"
- assert (
-     logprobs_list.logits_std is not None
- ), "logits_std is None when compute_logits_stats is enabled"
+ missing_fields = []
+ if logprobs_list.logits_min is None:
+     missing_fields.append("logits_min")
+ if logprobs_list.logits_max is None:
+     missing_fields.append("logits_max")
+ if logprobs_list.logits_mean is None:
+     missing_fields.append("logits_mean")
+ if logprobs_list.logits_std is None:
+     missing_fields.append("logits_std")
+ if missing_fields:
+     # When compute_logits_stats is enabled, all logits_* fields must be present
+     raise ValueError(
+         "Missing logits stats fields when compute_logits_stats is enabled: "
+         + ", ".join(missing_fields)
+     )
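
The same explicit check can be written more compactly by looping over the field names. This is a sketch: `require_logits_stats` and `LOGITS_STATS_FIELDS` are hypothetical names, and `SimpleNamespace` stands in for the real logprobs object.

```python
from types import SimpleNamespace

LOGITS_STATS_FIELDS = ("logits_min", "logits_max", "logits_mean", "logits_std")


def require_logits_stats(logprobs_list) -> None:
    """Raise ValueError naming every logits_* field that is missing or None."""
    missing = [
        name for name in LOGITS_STATS_FIELDS
        if getattr(logprobs_list, name, None) is None
    ]
    if missing:
        raise ValueError(
            "Missing logits stats fields when compute_logits_stats is enabled: "
            + ", ".join(missing)
        )


complete = SimpleNamespace(
    logits_min=[-1.0], logits_max=[1.0], logits_mean=[0.0], logits_std=[0.5]
)
require_logits_stats(complete)  # all four fields present, so nothing is raised
```

Unlike assert, this check still runs under python -O, and the error message lists all missing fields at once.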

Comment on lines +27 to +36
def _strip_logits_stats(obj):
    """Recursively remove 'logits_stats' keys from logprobs response."""
    if isinstance(obj, dict):
        obj.pop("logits_stats", None)
        for v in obj.values():
            _strip_logits_stats(v)
    elif isinstance(obj, list):
        for item in obj:
            _strip_logits_stats(item)

Copilot AI Mar 25, 2026

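_strip_logits_stats is duplicated with an identical implementation across several test files (this file and the other e2e/ce cases each add a copy). Consider moving it into a shared test utility module (for example tests/e2e/utils or tests/ce/server/core) and reusing it, to reduce the number of places that need updating when fields change later.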
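
A sketch of the helper as it might live in one shared module, followed by a usage example. The public name `strip_logits_stats` and the sample payload are assumptions for illustration; the body mirrors the helper shown in this PR.

```python
def strip_logits_stats(obj):
    """Recursively remove 'logits_stats' keys in place from nested dicts and lists."""
    if isinstance(obj, dict):
        obj.pop("logits_stats", None)
        for value in obj.values():
            strip_logits_stats(value)
    elif isinstance(obj, list):
        for item in obj:
            strip_logits_stats(item)


# Made-up response fragment used only to exercise the helper.
response = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "a", "logprob": -0.1, "logits_stats": {"min": -1.0}},
                ]
            }
        }
    ]
}
strip_logits_stats(response)
entry = response["choices"][0]["logprobs"]["content"][0]
```

Test files would then import this one function instead of each defining a private copy.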

Comment on lines 893 to +901
    default=EngineArgs.enable_logprob,
    help="Enable output of token-level log probabilities.",
)
model_group.add_argument(
    "--compute-logits-stats",
    action="store_true",
    default=EngineArgs.compute_logits_stats,
    help="Enable per-token logits statistics (min/max/mean/std) output.",
)

Copilot AI Mar 25, 2026

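The PR title is currently "【TI-Consisent】...", which does not follow the repository's required [CLASS]Title format (the template lists tags such as [Feature] / [BugFix]). Consider renaming it to something like [Feature] Add logits_stats metric for ZMQ logprobs, and fixing the spelling of Consisent so that later searches and automated workflows can recognize it.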

@ckl117 ckl117 left a comment

Please add unit tests so that the coverage check passes.

"use_internode_ll_two_stage": self.cfg.parallel_config.use_internode_ll_two_stage,
"disable_sequence_parallel_moe": self.cfg.parallel_config.disable_sequence_parallel_moe,
"enable_logprob": self.cfg.model_config.enable_logprob,
"compute_logits_stats": self.cfg.model_config.compute_logits_stats,

This parameter also needs to be added in common_engine.py.

Comment on lines +48 to +51
logits_min: Optional[list[float]] = None # [num_reqs]
logits_max: Optional[list[float]] = None # [num_reqs]
logits_mean: Optional[list[float]] = None # [num_reqs]
logits_std: Optional[list[float]] = None # [num_reqs]

The PR doesn't include the logic that computes these parameters?
