[#8542][feat] AutoDeploy: add Llama-3.1-8B FP8 perf-sanity test on H100 #14039
MrGeva wants to merge 1 commit into
Conversation
…0 single GPU

Mirrors the structure of `super_ad_blackwell-super_ad_ws1_1k1k` but targets single-GPU AutoDeploy on Hopper.

Changes:
- `tests/integration/defs/perf/test_perf_sanity.py`: add `llama_v3.1_8b_instruct_fp8` → `llama-3.1-model/Llama-3.1-8B-Instruct-FP8` to `MODEL_PATH_DICT` so the new config can resolve the model directory.
- `tests/scripts/perf-sanity/aggregated/llama3_1_8b_fp8_ad_hopper.yaml`: new perf-sanity config with one server config, `llama3_1_8b_ad_ws1_1k1k` (`backend: _autodeploy`, `world_size: 1`), pointing at `examples/auto_deploy/model_registry/configs/llama3_1_8b.yaml` (FP8 + trtllm attention + GEMM/RoPE/SiLU fusions). The client config matches the reference: concurrency 64, 10 iterations, ISL=OSL=1024, openai backend.
- `tests/integration/test_lists/test-db/l0_h100.yml`: enroll the new test in the pre-merge `backend: autodeploy` block on single-GPU H100.

The new test is discoverable as `perf/test_perf_sanity.py::test_e2e[aggr_upload-llama3_1_8b_fp8_ad_hopper-llama3_1_8b_ad_ws1_1k1k]` and maps to the `H100_PCIe-AutoDeploy-1` CI stage.

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
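The `MODEL_PATH_DICT` change described above can be sketched as follows. Only the new key/value pair comes from the PR; the `resolve_model_dir` helper and the `models_root` default are hypothetical illustrations of how such a mapping is typically consumed, not the actual lookup code in `test_perf_sanity.py`.

```python
# Sketch of the mapping added in tests/integration/defs/perf/test_perf_sanity.py.
# Only the new key/value pair is from the PR; the helper function and the
# models_root default are hypothetical, for illustration only.
MODEL_PATH_DICT = {
    # ...existing model entries elided...
    "llama_v3.1_8b_instruct_fp8": "llama-3.1-model/Llama-3.1-8B-Instruct-FP8",
}


def resolve_model_dir(model_label: str, models_root: str = "/models") -> str:
    """Resolve a perf-sanity model label to its on-disk model directory."""
    return f"{models_root}/{MODEL_PATH_DICT[model_label]}"


print(resolve_model_dir("llama_v3.1_8b_instruct_fp8"))
# → /models/llama-3.1-model/Llama-3.1-8B-Instruct-FP8
```

With the entry in place, the new YAML config can refer to the model by its `llama_v3.1_8b_instruct_fp8` label and have the path resolved at test time.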
`/bot run --extra-stage "H100_PCIe-AutoDeploy-1" --disable-fail-fast`
📝 Walkthrough

This pull request adds Llama v3.1 8B FP8 model support to the performance sanity testing infrastructure. It registers the model path, creates a test configuration for Hopper-class GPUs with the auto-deploy backend, and integrates the test into the H100 pre-merge test suite.

Changes: Llama 3.1 8B FP8 performance sanity testing.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks: ✅ 5 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/defs/perf/test_perf_sanity.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update copyright year in modified Python source header.
This file was modified, so the NVIDIA header year range should include 2026.
Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: “Include NVIDIA copyright header on all new files; update year on modified files.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/defs/perf/test_perf_sanity.py` at line 1, Update the SPDX copyright header in the modified Python source so the NVIDIA year range includes 2026; locate the top-of-file header comment (the SPDX/ copyright line) in test_perf_sanity.py and change the year range 2022-2025 to 2022-2026.
🧹 Nitpick comments (1)
tests/integration/test_lists/test-db/l0_h100.yml (1)
526-527: ⚡ Quick win

Add this test to the QA scheduled perf lists if it should run in the QA cadence.
The test is present in test-db (l0_h100.yml), but there is no corresponding entry in the QA scheduled list files (`tests/integration/test_lists/qa/llm_perf_*.yml`). If this test is intended for QA scheduled runs, add it to the appropriate file (likely `llm_perf_hopper.yml` or a similar single-node H100 perf list, depending on what list convention exists for H100).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_lists/test-db/l0_h100.yml` around lines 526 - 527, The test perf/test_perf_sanity.py::test_e2e[aggr_upload-llama3_1_8b_fp8_ad_hopper-llama3_1_8b_ad_ws1_1k1k] is present in tests/integration/test_lists/test-db/l0_h100.yml but missing from the QA scheduled perf lists; add this exact test identifier to the appropriate QA schedule file (e.g., tests/integration/test_lists/qa/llm_perf_hopper.yml or the single-node H100 perf list following existing naming conventions) so it runs in the QA cadence, placing it under the same section/grouping used for other H100 hopper tests.
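If the test is enrolled in a QA scheduled perf list as the review comment suggests, the entry would look roughly like the fragment below. The test identifier is taken verbatim from the PR; the surrounding structure (a top-level `tests` sequence) is an assumption for illustration, since the actual schema of the QA list files is not shown here.

```yaml
# Hypothetical sketch of a QA perf-list entry, e.g. in
# tests/integration/test_lists/qa/llm_perf_hopper.yml; the real file's
# schema and grouping conventions may differ.
tests:
  - perf/test_perf_sanity.py::test_e2e[aggr_upload-llama3_1_8b_fp8_ad_hopper-llama3_1_8b_ad_ws1_1k1k]
```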
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b5fd8fab-1d81-4279-8441-55a8bc368a69
📒 Files selected for processing (3)
- tests/integration/defs/perf/test_perf_sanity.py
- tests/integration/test_lists/test-db/l0_h100.yml
- tests/scripts/perf-sanity/aggregated/llama3_1_8b_fp8_ad_hopper.yaml
PR_Github #47934 [ run ] triggered by Bot. Commit:

PR_Github #47934 [ run ] completed with state
Mirrors the structure of `super_ad_blackwell-super_ad_ws1_1k1k` but targets single-GPU AutoDeploy on Hopper.

Changes:
- `tests/integration/defs/perf/test_perf_sanity.py`: add `llama_v3.1_8b_instruct_fp8` → `llama-3.1-model/Llama-3.1-8B-Instruct-FP8` to `MODEL_PATH_DICT` so the new config can resolve the model directory.
- `tests/scripts/perf-sanity/aggregated/llama3_1_8b_fp8_ad_hopper.yaml`: new perf-sanity config with one server config, `llama3_1_8b_ad_ws1_1k1k` (`backend: _autodeploy`, `world_size: 1`), pointing at `examples/auto_deploy/model_registry/configs/llama3_1_8b.yaml` (FP8 + trtllm attention + GEMM/RoPE/SiLU fusions). The client config matches the reference: concurrency 64, 10 iterations, ISL=OSL=1024, openai backend.
- `tests/integration/test_lists/test-db/l0_h100.yml`: enroll the new test in the pre-merge `backend: autodeploy` block on single-GPU H100.

The new test is discoverable as `perf/test_perf_sanity.py::test_e2e[aggr_upload-llama3_1_8b_fp8_ad_hopper-llama3_1_8b_ad_ws1_1k1k]` and maps to the `H100_PCIe-AutoDeploy-1` CI stage.

Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.