[ci] fix fail testcase and add generate testcase in pr test #4231

zhulinJulia24 · 2025-12-23T06:36:06Z

Added `--dump-res-length` to the oc (OpenCompass) command to display output tokens, and extended the timeout from 3600s to 10800s.
Refactored the workflows by dividing them into GPU-based tasks.
Added generate test cases for A100, 3090, and 5080 in daily testing and in the PR test.
Refactored `stop_restful_api` to use the terminate API instead of terminating the processes directly, for improved stability.
Made minor adjustments to improve test case stability.
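The shutdown change above (ask the server to terminate itself instead of killing processes directly), together with launching servers via `start_new_session=True`, can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's `terminate_restful_api`: the terminate URL is a placeholder because the PR text does not show the route, the helper names `start_server`, `request_terminate` and `stop_server` are hypothetical, and the process-group fallback is POSIX-only.

```python
import os
import signal
import subprocess
import sys
import urllib.request


def start_server(cmd):
    """Launch a server command in a new session, hence its own process group."""
    # start_new_session=True runs setsid() in the child, so the server and any
    # workers it spawns share a process group whose id equals the child's pid.
    return subprocess.Popen(cmd, start_new_session=True)


def request_terminate(url, timeout=10):
    """POST to a (hypothetical) terminate URL; True only if the server answered 2xx."""
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def _signal_group(pid, sig):
    try:
        os.killpg(pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited


def stop_server(proc, terminate_url, wait_s=30):
    """Prefer the graceful terminate request; fall back to signalling the group."""
    if request_terminate(terminate_url):
        try:
            return proc.wait(timeout=wait_s)
        except subprocess.TimeoutExpired:
            pass
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=wait_s)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    # Stand-in "server": a Python child that just sleeps; nothing listens on port 9,
    # so the terminate request fails and the process-group fallback is used.
    server = start_server([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_server(server, "http://127.0.0.1:9/terminate", wait_s=10))
```

Signalling the process group rather than a single pid is what makes the fallback reach worker processes the server forked. The graceful request comes first so the server can release GPU memory and sockets on its own.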

Copilot

Pull request overview

This PR refactors the CI/CD test infrastructure to improve stability and expand test coverage. The main changes include implementing a terminate API for gracefully stopping RESTful servers, adding generate test cases for different GPU configurations (A100, 3090, 5080), refactoring parallel configuration handling, and reorganizing workflow structures to support GPU-based task division.

Key changes:

Replaced direct process termination with a terminate API endpoint for improved stability
Added --dump-res-length flag to OpenCompass commands and extended timeout from 3600s to 10800s
Introduced parametrized test cases for multiple backends and models in restful interface tests
Refactored evaluation workflows to support parallel execution across different GPU configurations
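The parametrized restful tests pair a backend with a model. The review comments on `test_restful_chat_completions_v1.py` point out that every pair still connects to one `BASE_URL`, so one way to keep each pair honest is to give it its own endpoint. The following is a hedged sketch: `build_cases`, the model names and the port scheme are hypothetical placeholders (23333 is used only as an example base port), and only the backend names `turbomind` and `pytorch` correspond to LMDeploy's two engines.

```python
import itertools

# Hypothetical lists for illustration; the real ones live in autotest/utils/constant.py.
BACKENDS = ["turbomind", "pytorch"]
MODEL_CASES = ["model-a", "model-b"]


def build_cases(backends, model_cases, base_port=23333):
    """Give every (backend, model) pair its own server URL instead of one BASE_URL."""
    cases = []
    pairs = itertools.product(backends, model_cases)
    for offset, (backend, model) in enumerate(pairs):
        cases.append({
            "backend": backend,
            "model": model,
            # One port per pair, so a test can only reach the server it was parametrized for.
            "base_url": f"http://127.0.0.1:{base_port + offset}",
        })
    return cases


for case in build_cases(BACKENDS, MODEL_CASES):
    print(case["backend"], case["model"], case["base_url"])
```

In a pytest suite the returned list could feed `pytest.mark.parametrize`, with a fixture that starts (or looks up) the server at `case["base_url"]` for that backend and model.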

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 12 comments.


| File | Description |
| --- | --- |
| `autotest/utils/run_restful_chat.py` | Added `terminate_restful_api` function to use terminate API endpoint; added `--allow-terminate-by-client` flag and `start_new_session=True` to subprocess |
| `autotest/utils/restful_return_check.py` | Replaced `get_repeat_times` with `has_repeated_fragment` function that returns tuple `(bool, dict)` |
| `autotest/utils/pipeline_chat.py` | Added `start_new_session=True` to subprocess calls; removed comment lines |
| `autotest/utils/evaluate_utils.py` | Refactored to use `parallel_config_str` instead of `tp_num`; extracted `get_parallel_config_str` helper function |
| `autotest/utils/constant.py` | Added new config entries for 32k context lengths and lists of models/backends for restful tests |
| `autotest/utils/config_utils.py` | Added filtering logic to exclude 'w8a8' and 'gptq' models from benchmark lists |
| `autotest/utils/common_utils.py` | Fixed indentation issue with return statement; removed `kill_process` function and psutil dependency |
| `autotest/utils/benchmark_utils.py` | Adjusted longtext test parameters; added `--session-len` parameter; improved result handling |
| `autotest/tools/restful/*.py` | Updated fixtures to use `terminate_restful_api` in finally blocks |
| `autotest/interface/restful/*.py` | Added parametrization for backend and model; adjusted test assertions and thresholds |
| `autotest/evaluate/*.py` | Updated to use `terminate_restful_api`; adjusted session lengths |
| `.github/workflows/*.yml` | Added new test jobs for generate endpoint testing; restructured matrix configurations; updated container images |
| `autotest/config*.yaml` | Updated model lists and paths |
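`restful_return_check.py` now exposes `has_repeated_fragment`, returning `(bool, dict)`. The PR text does not show its body, so here is a hedged, brute-force sketch of one way such a check can work: look for any fragment that repeats back-to-back at least `min_repeats` times. The thresholds and the keys of the info dict are assumptions for illustration, not the repository's implementation.

```python
def has_repeated_fragment(text, min_len=5, max_len=50, min_repeats=5):
    """Return (True, info) if some fragment repeats back-to-back min_repeats+ times.

    Brute force: for each fragment size, test every start position. Fine for
    checking single model responses, not meant for very long documents.
    """
    n = len(text)
    for size in range(min_len, max_len + 1):
        span = size * min_repeats
        for start in range(0, n - span + 1):
            fragment = text[start:start + size]
            if fragment * min_repeats == text[start:start + span]:
                # Keep extending to report how many consecutive copies there are.
                count = min_repeats
                pos = start + span
                while text[pos:pos + size] == fragment:
                    count += 1
                    pos += size
                return True, {"fragment": fragment, "count": count, "start": start}
    return False, {}


print(has_repeated_fragment("intro " + "abcde" * 6 + " end"))
print(has_repeated_fragment("The quick brown fox jumps over the lazy dog."))
```

Returning the info dict alongside the flag lets a failing assertion print which fragment looped, which is more useful in CI logs than a bare repeat count.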

Comments suppressed due to low confidence (5)

autotest/interface/restful/test_restful_chat_completions_v1.py:237

The assertion on line 237 does not unpack the result of has_repeated_fragment. The function returns a tuple (bool, dict), but the assertion treats it as a single boolean value.
autotest/interface/restful/test_restful_chat_completions_v1.py:24
The test class is parametrized with both backend and model_case but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:82
The test class is parametrized with both backend and model_case but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:549
The test class is parametrized with both backend and model_case but the parameters are not being used to set up the appropriate test environment. The tests always connect to a single server at BASE_URL regardless of the parameter values. This means tests will run with the wrong backend/model combination, or fail if no server is running for that combination.
autotest/interface/restful/test_restful_chat_completions_v1.py:255
The function has_repeated_fragment returns a tuple (bool, dict), but it is used directly in a boolean assertion. The tuple should be unpacked first, e.g. has_repetition, _ = has_repeated_fragment(response), and the assertion made on has_repetition.
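The two comments about line 237 and line 255 come down to Python truthiness: a non-empty tuple is always truthy, whatever its first element. The sketch below uses a hypothetical stand-in detector (`fake_detector`, not the repository's helper) with the same `(bool, dict)` return shape to show the pitfall and the unpacking fix.

```python
def fake_detector(text):
    """Hypothetical stand-in that mimics the (bool, dict) return shape."""
    found = "abcabcabcabc" in text
    return found, ({"fragment": "abc"} if found else {})


def check_no_repetition(response, detector):
    # Unpack first: the tuple itself is always truthy, even when found is False.
    has_repetition, info = detector(response)
    assert not has_repetition, f"repeated fragment found: {info}"


# The pitfall: both tuples are truthy, so `assert detector(x)` always passes
# and `assert not detector(x)` always fails, regardless of the response.
print(bool(fake_detector("clean answer")), bool(fake_detector("abcabcabcabc")))  # True True
check_no_repetition("clean answer", fake_detector)  # passes
```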


autotest/evaluate/test_api_evaluate.py

autotest/interface/restful/test_restful_generate.py

autotest/utils/run_restful_chat.py

.github/workflows/pr_ete_test.yml

autotest/interface/restful/test_restful_completions_v1.py

autotest/utils/run_restful_chat.py

autotest/interface/pipeline/test_pipeline_func.py

.github/workflows/daily_ete_test_5080.yml

.github/workflows/daily_ete_test.yml

.github/workflows/pr_ete_test.yml

lvhan028 and others added 30 commits November 29, 2025 21:35

bump version to v0.11.0

9717024

fix

9482800

fix sm70 sm75 compilation

c0623e4

update according to comments

cc38c0b

Merge branch 'main' into bump-version

cf5e478

update

dff7e52

update

833ff93

merge

780d3da

fix error

948f2b7

update

5794ded

merge main

cca9472

update

1adec3a

update

b64018b

update

5b986b8

Merge branch 'InternLM:main' into fix_fail_testcase

4d69675

merge main

09cb51e

update

0de5167

update

924057f

update

064a10f

update

db735fd

Merge branch 'InternLM:main' into fix_fail_testcase

56c620b

update

9c54a84

fix models and outdir

dd3e564

update

343b19b

update model coverage

f995ac4

Merge branch 'InternLM:main' into fix_fail_testcase

8751342

update

d87bbfa

update

b18aec0

fix pid

7ef6f7f

update

ba58758

zhulinJulia24 requested a review from Copilot December 23, 2025 06:36

Copilot started reviewing on behalf of zhulinJulia24 December 23, 2025 06:36

fix lint

0b20e5d

Copilot AI reviewed Dec 23, 2025


zhulinJulia24 and others added 7 commits December 23, 2025 14:54

fix lint

0f8da66

fix lint

c661772

fix benchmark return

8799166

Merge branch 'InternLM:main' into fix_fail_testcase

428eb99

add mount in pr_test

5acbe18

Update pr_ete_test.yml

5186d35

update

29a5f74
