[ci] fix fail testcase and add generate testcase in pr test #4231
base: main
zhulinJulia24
commented
Dec 23, 2025
- Added --dump-res-length to the oc command to display output tokens, and extended the timeout from 3600s to 10800s.
- Refactored the workflows by dividing them into GPU-based tasks.
- Added generate test cases for A100, 3090, and 5080 to the daily tests and the PR test.
- Refactored stop_restful_api to use the terminate API instead of terminating processes directly, for improved stability.
- Made minor adjustments to improve test case stability.
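For readers unfamiliar with the pattern, here is a minimal sketch of "ask the server to stop through its API, fall back to signals only if that fails". It is not the code from this PR: the `/terminate` path, the GET method, and the names `request_terminate` and `stop_server` are assumptions made for illustration.

```python
import subprocess
import urllib.error
import urllib.request


def request_terminate(base_url: str, timeout: float = 5.0) -> bool:
    """Send a shutdown request; return True if the server answered 2xx.

    Assumption: the server exposes GET {base_url}/terminate.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/terminate", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, HTTP error status, timeout, and similar.
        return False


def stop_server(proc, base_url: str, wait: float = 10.0, requester=request_terminate) -> str:
    """Stop a server process, preferring the API over signals.

    Returns which path succeeded: "api", "terminate", or "kill".
    """
    if requester(base_url):
        try:
            proc.wait(timeout=wait)
            return "api"
        except subprocess.TimeoutExpired:
            pass  # The server acknowledged but did not exit; fall through.
    proc.terminate()
    try:
        proc.wait(timeout=wait)
        return "terminate"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return "kill"
```

Passing the requester as a parameter keeps the fallback logic testable without a running server.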
Pull request overview
This PR refactors the CI/CD test infrastructure to improve stability and expand test coverage. The main changes include implementing a terminate API for gracefully stopping RESTful servers, adding generate test cases for different GPU configurations (A100, 3090, 5080), refactoring parallel configuration handling, and reorganizing workflow structures to support GPU-based task division.
Key changes:
- Replaced direct process termination with a terminate API endpoint for improved stability
- Added `--dump-res-length` flag to OpenCompass commands and extended timeout from 3600s to 10800s
- Introduced parametrized test cases for multiple backends and models in restful interface tests
- Refactored evaluation workflows to support parallel execution across different GPU configurations
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| autotest/utils/run_restful_chat.py | Added terminate_restful_api function to use terminate API endpoint; added --allow-terminate-by-client flag and start_new_session=True to subprocess |
| autotest/utils/restful_return_check.py | Replaced get_repeat_times with has_repeated_fragment function that returns tuple (bool, dict) |
| autotest/utils/pipeline_chat.py | Added start_new_session=True to subprocess calls; removed comment lines |
| autotest/utils/evaluate_utils.py | Refactored to use parallel_config_str instead of tp_num; extracted get_parallel_config_str helper function |
| autotest/utils/constant.py | Added new config entries for 32k context lengths and lists of models/backends for restful tests |
| autotest/utils/config_utils.py | Added filtering logic to exclude 'w8a8' and 'gptq' models from benchmark lists |
| autotest/utils/common_utils.py | Fixed indentation issue with return statement; removed kill_process function and psutil dependency |
| autotest/utils/benchmark_utils.py | Adjusted longtext test parameters; added --session-len parameter; improved result handling |
| autotest/tools/restful/*.py | Updated fixtures to use terminate_restful_api in finally blocks |
| autotest/interface/restful/*.py | Added parametrization for backend and model; adjusted test assertions and thresholds |
| autotest/evaluate/*.py | Updated to use terminate_restful_api; adjusted session lengths |
| .github/workflows/*.yml | Added new test jobs for generate endpoint testing; restructured matrix configurations; updated container images |
| autotest/config*.yaml | Updated model lists and paths |
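Several rows above mention `start_new_session=True`. As a hedged illustration of why that flag helps test stability: on POSIX it puts the child in its own session and process group, so the whole process tree can be signalled at once without touching the test runner. The helper names and the `SLEEPER` placeholder command below are hypothetical and stand in for the real server launch.

```python
import os
import signal
import subprocess
import sys

# Placeholder command standing in for the real server launch (assumption).
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]


def launch_in_new_session(cmd):
    """Start cmd as the leader of a new session and process group (POSIX only)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_session(proc, grace: float = 10.0):
    """Signal the child's whole process group, escalating to SIGKILL if needed."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
    return proc.returncode
```

Because the child leads its own group, `os.killpg` also reaches any workers it spawned, which a plain `proc.terminate()` would leave behind.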
Comments suppressed due to low confidence (5)
autotest/interface/restful/test_restful_chat_completions_v1.py:237
- The function name has a typo: `has_repeated_fragment` should check if the function returns `(True, info_dict)` when repetitions are found, but the assertion on line 237 doesn't unpack the tuple. The function returns a tuple `(bool, dict)` but the assertion treats it as a single boolean value.
autotest/interface/restful/test_restful_chat_completions_v1.py:24
- The test class is parametrized with both `backend` and `model_case` but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at `BASE_URL` regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:82
- The test class is parametrized with both `backend` and `model_case` but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at `BASE_URL` regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:549
- The test class is parametrized with both `backend` and `model_case` but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at `BASE_URL` regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:255
- The function `has_repeated_fragment` returns a tuple `(bool, dict)`, but it's being used as a boolean assertion. Need to unpack the tuple properly: `has_repetition, _ = has_repeated_fragment(response)` and then assert on `has_repetition`.
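To make the tuple issue concrete: a non-empty tuple is always truthy in Python, so asserting directly on a `(bool, dict)` return value passes or fails regardless of the flag. The implementation below is a hypothetical stand-in, not the function from `restful_return_check.py`; only the `(bool, dict)` return shape is taken from the review, and the `min_len` and `min_repeats` parameters are assumptions.

```python
def has_repeated_fragment(text: str, min_len: int = 10, min_repeats: int = 3):
    """Hypothetical stand-in returning (found, info).

    info maps each substring of length min_len that occurs at least
    min_repeats times to its occurrence count.
    """
    counts = {}
    for i in range(len(text) - min_len + 1):
        frag = text[i:i + min_len]
        counts[frag] = counts.get(frag, 0) + 1
    repeated = {frag: n for frag, n in counts.items() if n >= min_repeats}
    return bool(repeated), repeated


clean = "The quick brown fox jumps over the lazy dog."

# Misleading: the tuple (False, {}) is non-empty and therefore truthy,
# so this check would report a repetition even for clean text.
tuple_is_truthy = bool(has_repeated_fragment(clean))

# Correct: unpack first, then assert on the flag only.
has_repetition, info = has_repeated_fragment(clean)
assert not has_repetition, f"unexpected repeated fragments: {info}"
```

Keeping the dict in the assertion message means a failing test shows which fragments repeated.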