[https://nvbugs/6143599][fix] Re-apply proven fix from commit 295615d8bf (not present in HEAD): subtract 2× pr#13915

Open
tensorrt-cicd wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6143599

Conversation


@tensorrt-cicd tensorrt-cicd commented May 8, 2026

Summary

  • Root cause: The KV cache budget is sized from a small dummy profiling request (2.62 GiB of activations). That leaves no headroom for the MoE permute and DeepGEMM workspaces of real requests at batch size 1024 and above, so PyTorch allocations hit physical GPU OOM during inference.
  • Fix: Re-apply proven fix from commit 295615d (not present in HEAD): subtract 2× profiled activation_bytes from kv_cache_max_memory in configure_kv_cache_capacity, and point extract_stress_test_metrics() default artifacts_dir at os.getcwd()/artifacts to match where aiperf writes.
  • Automated fix generated by repair-bot
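
The headroom reservation described in the Fix bullet can be sketched as below. This is an illustration under stated assumptions, not the code in `configure_kv_cache_capacity`: the function name `reserve_activation_headroom`, the `multiplier` parameter, and the 100 GiB starting budget are hypothetical. Only `kv_cache_max_memory`, `activation_bytes`, the 2× factor, and the 2.62 GiB figure come from this PR; the clamp to zero and the `activation_bytes > 0` guard follow the CodeRabbit walkthrough in this thread.

```python
GIB = 1024 ** 3


def reserve_activation_headroom(kv_cache_max_memory: int,
                                activation_bytes: int,
                                multiplier: int = 2) -> int:
    """Return the KV-cache budget after reserving activation headroom.

    Hypothetical helper: subtracts `multiplier` times the profiled
    activation bytes and never returns a negative budget.
    """
    if activation_bytes <= 0:
        # Nothing was profiled, so leave the budget untouched.
        return kv_cache_max_memory
    reserved = multiplier * activation_bytes
    return max(0, kv_cache_max_memory - reserved)


# Illustrative numbers: the 100 GiB budget is an assumption,
# the 2.62 GiB activation figure is taken from the PR summary.
budget = 100 * GIB
activations = int(2.62 * GIB)
adjusted = reserve_activation_headroom(budget, activations)
print(f"reserved {2 * activations / GIB:.2f} GiB, "
      f"adjusted KV-cache budget {adjusted / GIB:.2f} GiB")
```

A fixed multiplier is a coarse heuristic: it trades some KV-cache capacity for safety instead of modelling how activations grow with batch size.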

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

Summary by CodeRabbit

  • Bug Fixes

    • Improved KV-cache memory management by reserving additional headroom for peak activations and workspace allocations, preventing potential out-of-memory errors.
  • Configuration

    • Updated artifact directory defaults for stress tests to use the current working directory for better consistency.
  • Tests

    • Updated stress test waivers to reflect current configurations.

…d fix stress test artifacts directory

The DeepSeek-V3 tp8 stress test on B200 was failing with CUDA OOM
during high-concurrency inference. Two issues:

1. KV cache budget calculation did not account for dynamic activation
   memory that scales with batch size. Profiling captures activations
   for a small dummy request (2.6 GiB), but runtime activations
   (MoE permute buffers, MLA KV projections) at full batch size are
   significantly larger. Reserve 2x the profiled activation memory
   from the KV cache budget to prevent OOM under sustained load.

2. extract_stress_test_metrics() looked for aiperf artifacts relative
   to the script file location, but aiperf writes them relative to
   the current working directory. Use os.getcwd() instead.
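
The difference between the two lookup locations in item 2 can be sketched as follows. Both helper names are hypothetical, and the `artifacts` subdirectory name is taken from the PR summary (`os.getcwd()/artifacts`); the actual signature of `extract_stress_test_metrics()` is not shown on this page, so this is not a copy of it.

```python
import os
from typing import Optional


def script_relative_artifacts_dir(script_file: str) -> str:
    """Old behaviour (as described): look next to the script file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, "artifacts")


def resolve_artifacts_dir(artifacts_dir: Optional[str] = None) -> str:
    """New behaviour (as described): default to <cwd>/artifacts.

    An explicit `artifacts_dir` still wins over the default.
    """
    if artifacts_dir is None:
        artifacts_dir = os.path.join(os.getcwd(), "artifacts")
    return os.path.abspath(artifacts_dir)


if __name__ == "__main__":
    # The two paths differ whenever the test is launched from a
    # directory other than the one containing the script.
    print("script-relative:", script_relative_artifacts_dir(__file__))
    print("cwd-relative:   ", resolve_artifacts_dir())
```

Resolving the default at call time rather than at import time matters here, because the working directory can change between importing the module and running the metrics extraction.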

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

coderabbitai Bot commented May 8, 2026

Walkthrough

KV cache memory allocation now reserves headroom for peak activation memory by subtracting twice the profiled activation bytes from the cache budget. Test infrastructure defaults artifact directories to the current working directory instead of script-relative paths, and stress test waivers are updated to remove the GUARANTEED_NO_EVICT variant for DeepSeek-V3.

Changes

KV Cache and Test Stability

| Layer / File(s) | Summary |
| --- | --- |
| KV Cache Memory Reservation: `tensorrt_llm/_torch/pyexecutor/_util.py` | `KvCacheCreator.configure_kv_cache_capacity` subtracts `2 * activation_bytes` from the estimated `kv_cache_max_memory` (when `activation_bytes > 0`), clamps to zero, and logs the reserved and adjusted KV-cache budget. |
| Test Artifact Path and Waivers: `tests/integration/defs/stress_test/stress_test.py`, `tests/integration/test_lists/waives.txt` | `extract_stress_test_metrics()` defaults `artifacts_dir` to the current working directory instead of a script-relative path; the waiver for DeepSeek-V3_tp8 with the GUARANTEED_NO_EVICT scheduler is removed while the MAX_UTILIZATION variant remains. |

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title is cut off mid-word ('subtract 2× pr' instead of complete phrase) and lacks the complete fix description, making it incomplete and unclear. | Complete the title with the full description, e.g., '[https://nvbugs/6143599][fix] Re-apply fix from commit 295615d: subtract 2× activation bytes from KV cache budget'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description adequately explains the root cause, the fix applied, and includes test coverage verification and relevant links, though the PR checklist is not completed. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/defs/stress_test/stress_test.py (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the NVIDIA copyright year on this modified file.

This file was modified, but the header still ends at 2024.

Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: “Include NVIDIA copyright header on all new files; update year on modified files” and “All C++, Python, and other source files must contain NVIDIA copyright header with current modification year”.
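
The year bump requested above could be automated along these lines. This is a rough sketch, not a project tool: the helper name `bump_copyright_year` and the regular expression are assumptions, written only against the header format shown in the suggested fix.

```python
import re

# Matches "Copyright (c) 2022-2024 NVIDIA CORPORATION" or a single year.
_HEADER = re.compile(
    r"(Copyright \(c\) )(\d{4})(?:-(\d{4}))?( NVIDIA CORPORATION)")


def bump_copyright_year(line: str, year: int) -> str:
    """Extend the year range in an NVIDIA header line up to `year`.

    Lines whose last year is already >= `year`, or that do not match
    the header pattern, are returned unchanged.
    """
    def _replace(match: "re.Match[str]") -> str:
        start = int(match.group(2))
        end = int(match.group(3) or start)
        if end >= year:
            return match.group(0)
        return f"{match.group(1)}{start}-{year}{match.group(4)}"

    return _HEADER.sub(_replace, line, count=1)


header = ("# SPDX-FileCopyrightText: Copyright (c) 2022-2024 "
          "NVIDIA CORPORATION & AFFILIATES. All rights reserved.")
print(bump_copyright_year(header, 2026))
```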

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/defs/stress_test/stress_test.py` at line 1, Update the
copyright header in tests/integration/defs/stress_test/stress_test.py to include
the current modification year (replace "2024" with the current year) so the file
header matches the project's copyright guidelines; locate the top-of-file SPDX
header line and update the year range accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 00837d04-0ed9-49bb-9a21-c8dc482cf9a4

📥 Commits

Reviewing files that changed from the base of the PR and between f8572ab and 27c853c.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tests/integration/defs/stress_test/stress_test.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt
