[XPU] [CI] Fix xpu ci bug by plusNew001 · Pull Request #7014 · PaddlePaddle/FastDeploy

plusNew001 · 2026-03-25T11:42:07Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

💡 如若此PR是Cherry Pick，PR标题需遵循格式，在最开始加上[Cherry-Pick]标签，以及最后面加上原PR ID，例如[Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-03-25T11:42:16Z

Thanks for your contribution!

Copilot

Pull request overview

该 PR 旨在修复 XPU CI 场景下日志难以获取/排查的问题：在 pytest 执行过程中按用例归档关键日志，并在 GitHub Actions 中将归档目录作为 artifact 上传，便于失败后回溯。

Changes:

在 tests/xpu_ci/conftest.py 新增 pytest hook，将 log* 目录/文件与 server.log 归档到 case_logs/<case>/。
在 4 卡/8 卡 XPU CI workflow 中创建 case_logs、调整权限，并无条件上传 artifact。
workflow 中通过 set +e 捕获 pytest 退出码，保证日志归档/上传步骤仍会执行。

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
tests/xpu_ci/conftest.py	增加用例级日志归档逻辑与 pytest hook
.github/workflows/_xpu_8cards_case_test.yml	创建/授权 case_logs 并上传 8 卡用例日志 artifact
.github/workflows/_xpu_4cards_case_test.yml	创建/授权 case_logs 并上传 4 卡用例日志 artifact

Copilot · 2026-03-25T11:45:24Z

tests/xpu_ci/conftest.py

+    # 复制所有 log* 目录
+    for entry in glob.glob("log*"):
+        if os.path.isdir(entry):
+            shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True)


这里使用了 shutil.copytree(..., dirs_exist_ok=True)。该参数仅在 Python 3.8+ 可用，而仓库 python_requires 为 >=3.7，会导致在 Python 3.7 环境直接 TypeError。建议改为兼容实现（例如：若目标目录已存在则先删除/清空后再 copytree，或手动遍历复制），避免依赖 dirs_exist_ok。

Suggested change

shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True)

dest_path = os.path.join(dest_dir, entry)

if os.path.exists(dest_path):

shutil.rmtree(dest_path)

shutil.copytree(entry, dest_path)

Copilot · 2026-03-25T11:45:25Z

tests/xpu_ci/conftest.py

+@pytest.hookimpl(hookwrapper=True, trylast=True)
+def pytest_runtest_makereport(item, call):
+    """每个测试阶段结束后归档日志（仅在 call 阶段后执行）"""
+    outcome = yield
+    report = outcome.get_result()
+
+    if report.when == "call":
+        # 使用测试文件名（不含 .py）作为归档目录名
+        test_file = os.path.basename(item.fspath)
+        test_name = os.path.splitext(test_file)[0]
+        try:
+            _archive_case_logs(test_name)
+        except Exception as e:


当前 hook 在每个用例的 call 阶段都会归档日志，并且遍历打印归档目录下所有文件。XPU CI 用例数量较多时，这会显著增加 I/O 和 CI 日志输出量，导致执行变慢、artifact 体积膨胀。建议仅在失败时（report.failed）或通过环境变量/pytest 选项开关启用归档与文件清单输出。

Copilot · 2026-03-25T11:45:25Z

tests/xpu_ci/conftest.py

+        # 使用测试文件名（不含 .py）作为归档目录名
+        test_file = os.path.basename(item.fspath)
+        test_name = os.path.splitext(test_file)[0]
+        try:


归档目录名目前仅使用测试文件名（不含 .py）。如果同一文件内存在多个测试函数/参数化用例，日志会被反复覆盖或混合，定位问题会变困难。建议使用 item.nodeid（做路径/非法字符替换）或包含测试函数名的更细粒度标识作为目录名。

plusNew001 · 2026-03-25T11:50:45Z

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_gpu

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-26T03:15:02Z

tests/xpu_ci/conftest.py

+# ============ 日志归档 pytest hook ============
+
+
+def _archive_case_logs(test_name):


PR 描述目前基本是模板占位，缺少“为什么要改/解决了什么问题/如何验证”的信息。建议补充至少：触发的 XPU CI bug 现象、根因、改动点，以及如何在 CI/本地复现与验证（例如失败时需要哪些日志、上传 artifact 的预期）。

Copilot · 2026-03-26T03:15:02Z

tests/xpu_ci/conftest.py

+
+    if report.when == "call":
+        # 使用测试文件名（不含 .py）作为归档目录名
+        test_file = os.path.basename(item.fspath)


这里通过 item.fspath 获取测试文件名。item.fspath 在较新的 pytest 版本中已被弃用（趋向使用 item.path / pathlib.Path），未来升级 pytest 可能导致属性不存在，从而让日志归档 hook 失效。建议改用 pytest 推荐的路径属性（并在需要时兼容旧版本）。

Suggested change

test_file = os.path.basename(item.fspath)

# Prefer pytest's newer `item.path` API and fall back to `item.fspath` for older versions

test_path = getattr(item, "path", None)

if test_path is None:

test_path = getattr(item, "fspath", "")

test_file = os.path.basename(str(test_path))

Copilot · 2026-03-26T03:15:03Z

.github/workflows/_xpu_4cards_case_test.yml

+
+      - name: Upload case logs
+        if: always()
+        uses: actions/upload-artifact@v6


该 workflow 里使用 actions/upload-artifact@v6，但仓库内同类 XPU workflow（例如 _xpu_8cards_case_test.yml）使用的是 @v4。建议统一到同一主版本（并确认所选版本在当前 GitHub Actions 上可用），避免不同 workflow 行为不一致或因版本不可用导致上传步骤失败。

Suggested change

uses: actions/upload-artifact@v6

uses: actions/upload-artifact@v4

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot · 2026-03-26T06:36:34Z

tests/xpu_ci/conftest.py

+CASE_LOGS_DIR = os.path.join(os.getcwd(), "case_logs")
+


PR 描述目前基本是模板空白（Motivation/Modifications/Usage/Tests 未填写）。建议补充：该 CI bug 的具体表现、根因、修复点，以及如何在本地/CI 复现与验证（例如期望产出的 case_logs 结构与上传 artifact 名称）。

Copilot · 2026-03-26T06:36:35Z

.github/workflows/_xpu_4cards_case_test.yml

          else
              echo "============================4卡cases测试失败,请检查日志!============================"
-              exit $exit_code
+              exit 1


这里在捕获到 pytest 失败后固定 exit 1 会丢失真实的 pytest 退出码（例如用于区分超时/内部错误/用例失败等），也和 8cards workflow 的处理方式不一致。建议保留并 exit $exit_code，让上游能拿到准确失败原因。

Suggested change

exit 1

exit "$exit_code"

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-26T07:18:17Z

tests/xpu_ci/conftest.py

+    if report.when == "call":
+        # 使用测试文件名（不含 .py）作为归档目录名
+        test_file = os.path.basename(item.fspath)
+        test_name = os.path.splitext(test_file)[0]
+        try:
+            _archive_case_logs(test_name)


当前 hook 在每个用例 call 阶段都会归档日志（无论成功/失败），可能导致 CI 额外 I/O 和产物体积显著增加。若目的是排查失败，建议仅在 report.failed（或 report.outcome != "passed"）时再触发归档。

Copilot · 2026-03-26T07:18:17Z

tests/xpu_ci/conftest.py

+        try:
+            _archive_case_logs(test_name)
+        except Exception:
+            pass


这里捕获了所有异常后直接 pass，如果归档失败会被静默吞掉，后续排障很难定位原因。建议至少打印 warning/异常信息（或用 pytest 的 terminalreporter/logging 记录），并考虑只捕获预期异常类型。

Copilot · 2026-03-26T07:18:17Z

.github/workflows/_xpu_8cards_case_test.yml

          echo "============================开始运行pytest测试============================"
          export PYTHONPATH=/workspace/FastDeploy/
          export PYTHONPATH=$(pwd)/tests/xpu_ci:$PYTHONPATH
+          mkdir -p case_logs
+          set +e
          python -m pytest -v -s --tb=short tests/xpu_ci/8cards_cases/
          exit_code=$?


PR 描述目前基本未填写（Motivation/Modifications/Usage/Command/Tests 等为空），难以确认修复的具体问题、复现/验证方式以及为何需要这些改动。建议补充：1) 具体 CI bug 现象/日志；2) 本次改动如何修复；3) 如何在本地或 CI 中验证（命令/期望结果）。

fix xpu ci bug

641a90d

Copilot AI review requested due to automatic review settings March 25, 2026 11:42

plusNew001 had a problem deploying to Metax_ci March 25, 2026 11:42 — with GitHub Actions Error

paddle-bot bot added the contributor External developers label Mar 25, 2026

Copilot started reviewing on behalf of plusNew001 March 25, 2026 11:42 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

Remove unnecessary blank line in conftest.py

3fdcec0

plusNew001 temporarily deployed to Metax_ci March 25, 2026 11:52 — with GitHub Actions Inactive

Update upload-artifact action to version 6

1c3c73a

Copilot AI review requested due to automatic review settings March 26, 2026 03:10

plusNew001 had a problem deploying to Metax_ci March 26, 2026 03:10 — with GitHub Actions Error

Update _xpu_8cards_case_test.yml

cdbb863

plusNew001 temporarily deployed to Metax_ci March 26, 2026 03:11 — with GitHub Actions Inactive

Copilot started reviewing on behalf of plusNew001 March 26, 2026 03:11 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fix ci bug

e1f921a

plusNew001 temporarily deployed to Metax_ci March 26, 2026 03:32 — with GitHub Actions Inactive

Change exit code on test failure to 1

d73a8a9

Copilot AI review requested due to automatic review settings March 26, 2026 06:32

plusNew001 temporarily deployed to Metax_ci March 26, 2026 06:32 — with GitHub Actions Inactive

Copilot started reviewing on behalf of plusNew001 March 26, 2026 06:33 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fix ci bug

20e47c2

plusNew001 temporarily deployed to Metax_ci March 26, 2026 07:13 — with GitHub Actions Inactive

fix ci bug

b5e3f8f

Copilot AI review requested due to automatic review settings March 26, 2026 07:15

plusNew001 temporarily deployed to Metax_ci March 26, 2026 07:15 — with GitHub Actions Inactive

Copilot started reviewing on behalf of plusNew001 March 26, 2026 07:15 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fix ci bug

5ef57d0

plusNew001 temporarily deployed to Metax_ci March 26, 2026 08:45 — with GitHub Actions Inactive

-            shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True)
+            dest_path = os.path.join(dest_dir, entry)
+            if os.path.exists(dest_path):
+                shutil.rmtree(dest_path)
+            shutil.copytree(entry, dest_path)

		# ============ 日志归档 pytest hook ============


		def _archive_case_logs(test_name):

-        test_file = os.path.basename(item.fspath)
+        # Prefer pytest's newer `item.path` API and fall back to `item.fspath` for older versions
+        test_path = getattr(item, "path", None)
+        if test_path is None:
+            test_path = getattr(item, "fspath", "")
+        test_file = os.path.basename(str(test_path))

	uses: actions/upload-artifact@v6
	uses: actions/upload-artifact@v4

Conversation

plusNew001 commented Mar 25, 2026

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Mar 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

plusNew001 commented Mar 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants