@zhulinJulia24
Collaborator

  1. Added --dump-res-length to the oc command to display output token length, and extended the timeout from 3600s to 10800s.
  2. Refactored the workflows by dividing them into GPU-based tasks.
  3. Added generate test cases for A100, 3090, and 5080 to the daily tests and PR tests.
  4. Refactored stop_restful_api to use the terminate API instead of terminating processes directly, for improved stability.
  5. Made minor adjustments to improve test case stability.

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the CI/CD test infrastructure to improve stability and expand test coverage. The main changes include implementing a terminate API for gracefully stopping RESTful servers, adding generate test cases for different GPU configurations (A100, 3090, 5080), refactoring parallel configuration handling, and reorganizing workflow structures to support GPU-based task division.

Key changes:

  • Replaced direct process termination with a terminate API endpoint for improved stability
  • Added --dump-res-length flag to OpenCompass commands and extended timeout from 3600s to 10800s
  • Introduced parametrized test cases for multiple backends and models in restful interface tests
  • Refactored evaluation workflows to support parallel execution across different GPU configurations
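
The first key change, stopping a server through a terminate endpoint rather than killing its process, might look roughly like the sketch below. This is not the PR's terminate_restful_api: the function name stop_server, the /terminate path, the POST method, and the timeouts are all assumptions made for illustration. The fallback relies on the child having been launched with start_new_session=True, which the review summary mentions, and works on POSIX only.

```python
import http.client
import os
import signal
import subprocess
import urllib.request


def stop_server(proc, base_url, wait_s=30.0, terminate_path="/terminate"):
    """Ask the server to exit over HTTP, then fall back to killing its group.

    Returns "graceful" if the process exited on its own within wait_s
    seconds, and "killed" if the fallback had to be used.
    """
    try:
        # Hypothetical endpoint path and method; check the server's own docs.
        req = urllib.request.Request(base_url + terminate_path, data=b"", method="POST")
        urllib.request.urlopen(req, timeout=5).close()
    except (OSError, http.client.HTTPException):
        # Server unreachable or already gone; rely on the fallback below.
        pass

    try:
        proc.wait(timeout=wait_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        # start_new_session=True made the child a process-group leader, so
        # signalling the group also reaches any workers it spawned.
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
        return "killed"
```

Calling a helper like this from a fixture's finally block keeps cleanup running even when a test fails, and the process-group fallback avoids leaving orphaned workers behind if the endpoint never answers.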

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| autotest/utils/run_restful_chat.py | Added terminate_restful_api function to use terminate API endpoint; added --allow-terminate-by-client flag and start_new_session=True to subprocess |
| autotest/utils/restful_return_check.py | Replaced get_repeat_times with has_repeated_fragment function that returns tuple (bool, dict) |
| autotest/utils/pipeline_chat.py | Added start_new_session=True to subprocess calls; removed comment lines |
| autotest/utils/evaluate_utils.py | Refactored to use parallel_config_str instead of tp_num; extracted get_parallel_config_str helper function |
| autotest/utils/constant.py | Added new config entries for 32k context lengths and lists of models/backends for restful tests |
| autotest/utils/config_utils.py | Added filtering logic to exclude 'w8a8' and 'gptq' models from benchmark lists |
| autotest/utils/common_utils.py | Fixed indentation issue with return statement; removed kill_process function and psutil dependency |
| autotest/utils/benchmark_utils.py | Adjusted longtext test parameters; added --session-len parameter; improved result handling |
| autotest/tools/restful/*.py | Updated fixtures to use terminate_restful_api in finally blocks |
| autotest/interface/restful/*.py | Added parametrization for backend and model; adjusted test assertions and thresholds |
| autotest/evaluate/*.py | Updated to use terminate_restful_api; adjusted session lengths |
| .github/workflows/*.yml | Added new test jobs for generate endpoint testing; restructured matrix configurations; updated container images |
| autotest/config*.yaml | Updated model lists and paths |
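
The config_utils.py change, excluding 'w8a8' and 'gptq' models from benchmark lists, could be as small as the sketch below. The function name filter_benchmark_models, the case-insensitive substring match, and the model names in the usage lines are assumptions for illustration; the actual implementation is not shown in this review.

```python
# Substrings named in the file summary; matching on the lowercased model
# name is an assumption, not confirmed by the PR.
EXCLUDED_TAGS = ("w8a8", "gptq")


def filter_benchmark_models(models, excluded=EXCLUDED_TAGS):
    """Return the models whose names contain none of the excluded tags."""
    return [
        name for name in models
        if not any(tag in name.lower() for tag in excluded)
    ]


# Hypothetical model names, for illustration only.
candidates = ["org/model-7b", "org/model-7b-w8a8", "org/model-7b-GPTQ-Int4"]
kept = filter_benchmark_models(candidates)
```
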
Comments suppressed due to low confidence (5)

autotest/interface/restful/test_restful_chat_completions_v1.py:237

  • has_repeated_fragment returns a tuple (bool, dict), with (True, info_dict) when repetitions are found, but the assertion on line 237 does not unpack the tuple and treats the result as a single boolean value.

autotest/interface/restful/test_restful_chat_completions_v1.py:24

  • The test class is parametrized with both backend and model_case, but the parameters are not used to set up the matching test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values, so they will either run against the wrong backend/model combination or fail if no server is running for that combination.

autotest/interface/restful/test_restful_chat_completions_v1.py:82

  • The test class is parametrized with both backend and model_case, but the parameters are not used to set up the matching test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values, so they will either run against the wrong backend/model combination or fail if no server is running for that combination.

autotest/interface/restful/test_restful_chat_completions_v1.py:549

  • The test class is parametrized with both backend and model_case, but the parameters are not used to set up the matching test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values, so they will either run against the wrong backend/model combination or fail if no server is running for that combination.

autotest/interface/restful/test_restful_chat_completions_v1.py:255

  • has_repeated_fragment returns a tuple (bool, dict), but it is used directly as a boolean in the assertion. Unpack the tuple first, has_repetition, _ = has_repeated_fragment(response), and then assert on has_repetition.
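
The tuple-unpacking issue matters because a non-empty tuple is always truthy in Python, so asserting on the raw return value says nothing about whether repetition was found. The sketch below uses a stand-in has_repeated_fragment with the same (bool, dict) return shape; its detection logic, its fragment_len and min_repeats parameters, and the keys of the info dict are assumptions, since the real function body is not shown in this review.

```python
from collections import Counter


def has_repeated_fragment(text, fragment_len=10, min_repeats=3):
    """Stand-in detector: returns (found, info) like the reviewed function."""
    windows = (text[i:i + fragment_len] for i in range(len(text) - fragment_len + 1))
    counts = Counter(windows)
    if not counts:
        return False, {}
    fragment, count = counts.most_common(1)[0]
    if count >= min_repeats:
        return True, {"fragment": fragment, "count": count}
    return False, {}


clean = "The quick brown fox jumps over the lazy dog."
looping = "hello world " * 5

# Pitfall: the raw result is a 2-tuple, which is truthy even when the
# first element is False, so `assert not has_repeated_fragment(clean)`
# would fail on perfectly clean output.
raw_is_truthy = bool(has_repeated_fragment(clean))

# Correct usage: unpack first, then assert on the boolean.
has_repetition, info = has_repeated_fragment(clean)
loop_found, loop_info = has_repeated_fragment(looping)
```

Keeping the info dict in the unpacking, rather than discarding it, also allows it to be included in the assertion message so a failing test shows which fragment repeated.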
