Pass gracefully if token_id not found in message #3862
Conversation
This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. This will:
See the contribution guide for more details.
from megatron.core.inference.contexts import BaseInferenceContext
from megatron.core.packed_seq_params import PackedSeqParams
from megatron.core.process_groups_config import ProcessGroupCollection
from megatron.core.transformer.cuda_graphs import _CudagraphGlobalRecord
Is this change still necessary? Same for the change in transformer_layer.py
I will rebase, this will go away.
Force-pushed fa102a0 to 3a6b0cf
/ok to test 3a6b0cf
@santhnm2, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
Force-pushed a4c4971 to fe1840d
Force-pushed adbe80a to 53e2731
Signed-off-by: rislam <rislam@nvidia.com>
Truncate token-id, routing index, hash, and numeric series fields in chat completion logging to reduce log volume and improve readability while preserving key diagnostics.
Improve qwen3 tool parsing for schema variants and coerce JSON-like structured arguments (arrays/objects) into correct payload types so tool calls are emitted in OpenAI-compatible format.
Apply post-parse guardrails to prevent destructive reservation calls when update tools are present and force consistent hold messaging when transferring to human agents.
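The log-truncation change described above can be sketched with a small helper. The function name and thresholds here are illustrative assumptions, not the PR's actual code:

```python
def truncate_series(values, head=4, tail=2, limit=16):
    """Shorten long numeric series (token ids, routing indices, hashes)
    before logging, keeping the head and tail for diagnostics.

    Hypothetical helper sketching the truncation described above.
    """
    if not isinstance(values, (list, tuple)) or len(values) <= limit:
        return values  # short series are logged unchanged
    omitted = len(values) - head - tail
    return list(values[:head]) + [f"...{omitted} omitted..."] + list(values[-tail:])
```

Applied to each series field of a chat-completion log record, this keeps enough of the sequence to correlate requests while cutting log volume.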
Force-pushed 53e2731 to 8e1f977
Shift object/array argument coercion from the qwen3 parser into chat_completions using request tool schemas, so normalization happens at a single request-level layer. Keep the parser's extraction behavior and retain its tool-schema lookup compatibility path (dict/object tool configs) to avoid schema resolution regressions.
Made-with: Cursor
Signed-off-by: rislam <rislam@nvidia.com>
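The request-level coercion described in this commit could look roughly like the sketch below, which parses JSON-like string arguments into arrays/objects when the tool schema asks for them. The function name and schema shape are assumptions; the real chat_completions code may differ:

```python
import json

def coerce_tool_args(args: dict, schema_props: dict) -> dict:
    """Coerce JSON-like string values into arrays/objects per the tool schema.

    Sketch of single-layer, request-level normalization; `schema_props`
    is assumed to be the JSON-Schema `properties` map of the tool.
    """
    out = {}
    for name, value in args.items():
        expected = schema_props.get(name, {}).get("type")
        if expected in ("array", "object") and isinstance(value, str):
            try:
                out[name] = json.loads(value)  # "[1, 2]" -> [1, 2]
            except json.JSONDecodeError:
                out[name] = value  # leave unparseable strings untouched
        else:
            out[name] = value
    return out
```

Because this runs once per request against the declared tool schemas, the parser itself can stay purely extractive.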
Force-pushed 8e1f977 to 5ccb0ed
santhnm2 left a comment
I think we need to do a follow-up PR to clean this up and generalize, but will approve for now so we can get it into main.
/ok to test a164b8b
/ok to test 416654b
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23578747653
What does this PR do?
Before: some requests had OpenAI-style content like:
The change in chat_completions.py fixes a hard failure from the strict assert in prefix-replacement logic. Before this patch, the code required:
If either key was missing (or not a list), request handling crashed with an assertion like:
AssertionError: Last assistant message must have prompt_token_ids and generation_token_ids ...
Now it:
Pre-checks that the last assistant message carries prompt_token_ids and generation_token_ids as lists, and passes gracefully instead of asserting when they are absent.
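The graceful pre-check described above can be sketched as follows. The field names come from the quoted assertion message; the surrounding request-handling code and the helper name are assumptions:

```python
def last_assistant_token_ids(messages):
    """Return (prompt_token_ids, generation_token_ids) from the last
    assistant message, or None if either field is missing or not a list.

    Sketch of the pre-check; the pre-patch code asserted instead of
    returning None, crashing the whole request.
    """
    for msg in reversed(messages):
        if msg.get("role") != "assistant":
            continue
        prompt_ids = msg.get("prompt_token_ids")
        gen_ids = msg.get("generation_token_ids")
        if isinstance(prompt_ids, list) and isinstance(gen_ids, list):
            return prompt_ids, gen_ids
        return None  # pass gracefully instead of raising AssertionError
    return None  # no assistant message at all
```

Callers can then skip the prefix-replacement path when this returns None instead of failing the request.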
Code review
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.
For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.