[None][fix] Cherry-pick PR13488 to fix ima issue #13903

Merged
lfr-0531 merged 2 commits into NVIDIA:feat/deepseek_v4 from heyuhhh:user/yuhangh/use-pr13488-overlap-ima
May 9, 2026

Conversation

@heyuhhh
Collaborator

@heyuhhh heyuhhh commented May 8, 2026

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

heyuhhh and others added 2 commits May 8, 2026 12:01
This partially reverts commit 5509617 by removing the _kv_cache.py CPU-side page-index publication guard. Keep the py_executor.py scheduling changes from that fix.

Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
V2's StorageManager._batched_migrate() launched GPU<->host copies on a TemporaryCudaStream that only waited on per-page/per-slot ready_events. Those events can be stale or NULL'd, so the copy can start before pending forward-pass work on the execution_stream is drained.

Record a fresh CachedCudaEvent on the execution_stream and add it to prior_events so TemporaryCudaStream waits for pending execution_stream work before starting the copy. This mirrors V1 syncWithBufferManager semantics.

(cherry picked from commit 133f507)
Signed-off-by: Yuhang He <58161490+heyuhhh@users.noreply.github.com>
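
The ordering problem described in the second commit message can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: `ToyStream`, `ToyEvent`, and `copy_may_start` are hypothetical names invented for illustration and are not the TensorRT-LLM `StorageManager`, `TemporaryCudaStream`, or `CachedCudaEvent` APIs. The model treats a stream as in-order work and an event as a marker for everything enqueued before it was recorded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyStream:
    """Hypothetical stand-in for a CUDA stream: work completes in enqueue order."""
    submitted: int = 0  # number of ops enqueued so far
    completed: int = 0  # number of ops that have finished

    def enqueue(self) -> None:
        self.submitted += 1

    def finish_all(self) -> None:
        self.completed = self.submitted

    def record_event(self) -> "ToyEvent":
        # An event covers all work enqueued on this stream up to now.
        return ToyEvent(self, self.submitted)


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a recorded CUDA event."""
    stream: ToyStream
    ticket: int

    def is_done(self) -> bool:
        return self.stream.completed >= self.ticket


def copy_may_start(prior_events: Sequence[Optional[ToyEvent]]) -> bool:
    # None models a NULL'd ready_event: there is nothing to wait on.
    return all(e is None or e.is_done() for e in prior_events)


execution_stream = ToyStream()
execution_stream.enqueue()                           # earlier write to a page
stale_ready_event = execution_stream.record_event()  # per-page ready_event
execution_stream.finish_all()                        # that write has completed
execution_stream.enqueue()                           # pending forward-pass work

# Before the fix: only stale or NULL'd events are waited on.
before_fix = copy_may_start([stale_ready_event, None])

# After the fix: a fresh event on the execution stream joins prior_events.
fresh_event = execution_stream.record_event()
after_fix = copy_may_start([stale_ready_event, None, fresh_event])

execution_stream.finish_all()                        # forward pass drains
after_drain = copy_may_start([stale_ready_event, None, fresh_event])

print(before_fix, after_fix, after_drain)
```

In this model the copy is allowed to start while forward-pass work is still pending when only the old events are checked, and it is held back until the stream drains once the fresh event is included, which is the behavior the commit message describes.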
@heyuhhh heyuhhh requested a review from a team as a code owner May 8, 2026 12:06
@heyuhhh heyuhhh requested a review from yizhang-nv May 8, 2026 12:07
@heyuhhh heyuhhh assigned jiaganc and unassigned heyuhhh and jiaganc May 8, 2026
@heyuhhh heyuhhh requested review from jiaganc and lfr-0531 May 8, 2026 12:07
@heyuhhh heyuhhh marked this pull request as draft May 8, 2026 12:13
@heyuhhh heyuhhh marked this pull request as ready for review May 8, 2026 13:34
@heyuhhh
Collaborator Author

heyuhhh commented May 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47409 [ run ] triggered by Bot. Commit: 2fa8313 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47409 [ run ] completed with state SUCCESS. Commit: 2fa8313
/LLM/main/L0_MergeRequest_PR pipeline #37336 completed with status: 'SUCCESS'

CI Report

Link to invocation

@lfr-0531 lfr-0531 merged commit e9f8376 into NVIDIA:feat/deepseek_v4 May 9, 2026
7 of 8 checks passed

5 participants