
Conversation

@hiworldwzj
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @hiworldwzj, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the maintainability and robustness of the unit test suite. It involves a substantial refactoring of test file locations and import paths to align with a more generalized common base model architecture. Additionally, it introduces mechanisms to gracefully handle missing dependencies and hardware-specific features, preventing unnecessary test failures. A notable change is the removal of several dedicated FP8 quantization tests, suggesting an evolution in how these features are validated. Overall, the changes aim to streamline the testing process and improve the clarity of internal data structures.

Highlights

  • Test File Restructuring and Renaming: Numerous unit test files for Llama and Deepseek2 models have been renamed and moved from model-specific directories (e.g., models/llama/triton_kernel) to a more generalized common base model structure (e.g., common/basemodel/triton_kernel), indicating a significant refactoring effort to centralize shared kernel tests.
  • Improved Test Robustness and Environment Handling: Several tests now include pytest.skip directives to conditionally bypass execution if specific dependencies (like lightllmKernel, deep_ep, deep_gemm, sgl_kernel) are not installed or if the hardware does not natively support FP8. This prevents test failures in non-standard environments.
  • Refactoring of InferStateInfo Parameters: Variable names within InferStateInfo objects have been updated for clarity and consistency, specifically renaming max_len_in_batch to max_kv_seq_len and b_start_loc to b_q_start_loc in several attention-related tests.
  • Removal of Specific FP8 Quantization Tests: A set of unit tests specifically designed for FP8 quantization in Llama and Deepseek2 models, including those using FlashAttention v3 and FlashInfer, have been removed. This might indicate a consolidation of FP8 testing or a shift in quantization strategy.
  • Test Environment Setup for Server Components: New fixtures have been added to test_req.py and test_shm_req_manager.py to explicitly set environment variables for server-side components, ensuring consistent and controlled test execution conditions.
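The dependency-gated skips described above share one pattern: probe for the optional package at import time and skip the whole module when it is absent. A minimal sketch of the probing half using only the standard library (the package names come from the PR summary; `missing_dependency` is a hypothetical helper, not a function in the repository):

```python
import importlib.util

# Optional kernel packages named in the PR summary.
OPTIONAL_KERNELS = ("lightllmKernel", "deep_ep", "deep_gemm", "sgl_kernel")

def missing_dependency(name: str) -> bool:
    """Return True when the optional top-level package cannot be imported."""
    return importlib.util.find_spec(name) is None

# In a test module, this probe would feed a module-level skip, e.g.:
#   import pytest
#   if missing_dependency("deep_ep"):
#       pytest.skip(reason="need install deep_ep", allow_module_level=True)
```

With `allow_module_level=True`, pytest abandons collection of the module entirely, so the import of the missing kernel package is never attempted.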


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request primarily focuses on fixing and refactoring unit tests. The changes include moving kernels to a common location and updating import paths, renaming variables for clarity, and simplifying function call APIs. Several tests are now correctly skipped if their dependencies (like custom kernels or FP8 support) are not met, which improves the robustness of the test suite. A critical fix was made to a benchmark test by adding torch.cuda.Event.synchronize() for accurate timing. Overall, these changes significantly improve the correctness and maintainability of the unit tests. I have added a couple of minor suggestions to improve consistency in the pytest.skip messages.
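The timing fix called out above matters because CUDA kernel launches are asynchronous: reading the elapsed time before the end event has completed measures little more than launch overhead. A hedged sketch of the corrected pattern (assumes a CUDA-capable PyTorch build; `time_kernel_ms` and `run_kernel` are illustrative names, not code from the PR):

```python
import torch

def time_kernel_ms(run_kernel, iters: int = 10) -> float:
    """Time a GPU callable with CUDA events, synchronizing before reading."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        run_kernel()
    end.record()
    end.synchronize()  # wait for the recorded GPU work to actually finish
    return start.elapsed_time(end) / iters  # elapsed_time reports milliseconds
```

Without the `end.synchronize()` call, `elapsed_time` may be queried while the kernels are still running, which is the inaccuracy the benchmark fix addresses.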

@@ -1,4 +1,7 @@
import pytest

pytest.skip(reason="need install lightllmkernel", allow_module_level=True)

Severity: medium

For consistency across the test suite, it's better to use a consistent name for the kernel in the skip reason. In test_ppl_int8kv_flash_decoding_diverse.py, the reason is 'need install lightllmKernel'. Please consider using the same capitalization here.

Suggested change
pytest.skip(reason="need install lightllmkernel", allow_module_level=True)
pytest.skip(reason="need install lightllmKernel", allow_module_level=True)



if not is_fp8_native_supported():
pytest.skip("not support fp8 in this gpu card", allow_module_level=True)

Severity: medium

For better readability and consistency, it's good practice to use the reason keyword argument for pytest.skip. Also, to align with other tests checking for FP8 support (e.g., in test_moe_silu_and_mul_mix_quant_ep.py), consider using a more descriptive and consistent reason like 'not support fp8 test in this gpu card'.

Suggested change
pytest.skip("not support fp8 in this gpu card", allow_module_level=True)
pytest.skip(reason="not support fp8 test in this gpu card", allow_module_level=True)

@hiworldwzj hiworldwzj merged commit 5d2f630 into main Jan 10, 2026
1 check passed
@hiworldwzj hiworldwzj deleted the wzj branch January 10, 2026 05:14