[XPU] [CI] Fix xpu ci bug #7014
base: develop
Changes from all commits
641a90d
3fdcec0
1c3c73a
cdbb863
e1f921a
d73a8a9
20e47c2
b5e3f8f
5ef57d0
b112f1a
b142c22
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -182,8 +182,14 @@ jobs: | |
| echo "============================开始运行pytest测试============================" | ||
| export PYTHONPATH=/workspace/FastDeploy/ | ||
| export PYTHONPATH=$(pwd)/tests/xpu_ci:$PYTHONPATH | ||
| mkdir -p case_logs | ||
| set +e | ||
| python -m pytest -v -s --tb=short tests/xpu_ci/8cards_cases/ | ||
| exit_code=$? | ||
| Comment on lines 182 to 188 | ||
| set -e | ||
| | ||
| # Fix case_logs permissions so the runner user outside Docker can read and upload the logs | ||
| chmod -R a+rX case_logs/ 2>/dev/null || true | ||
| | ||
| if [ $exit_code -eq 0 ]; then | ||
| echo "============================8卡cases测试通过!============================" | ||
| | @@ -192,3 +198,12 @@ jobs: | |
| exit $exit_code | ||
| fi | ||
| ' | ||
| | ||
| - name: Upload case logs | ||
| if: always() | ||
| uses: actions/upload-artifact@v6 | ||
| with: | ||
| name: xpu-8cards-case-logs | ||
| path: FastDeploy/case_logs/ | ||
| retention-days: 7 | ||
| if-no-files-found: ignore | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -23,6 +23,7 @@ | |
| 4. Environment configuration - set XPU-related environment variables | ||
| """ | ||
| | ||
| import glob | ||
| import json | ||
| import os | ||
| import shutil | ||
| | @@ -31,6 +32,8 @@ | |
| | ||
| import pytest | ||
| | ||
| CASE_LOGS_DIR = os.path.join(os.getcwd(), "case_logs") | ||
| | ||
| Comment on lines +35 to +36 | ||
| | ||
| def get_xpu_id(): | ||
| """Get the XPU_ID environment variable""" | ||
| | @@ -457,3 +460,42 @@ def setup_logprobs_zmq_env(): | |
| os.environ[key] = value | ||
| print(f"设置环境变量: {key}={value}") | ||
| return original_values | ||
| | ||
| | ||
| # ============ Log archiving pytest hook ============ | ||
| | ||
| | ||
| def _archive_case_logs(test_name): | ||
| Comment on lines +465 to +468 | ||
| """ | ||
| Copy every directory whose name starts with "log", plus server.log, from the current working directory into case_logs/{test_name}/ | ||
| """ | ||
| dest_dir = os.path.join(CASE_LOGS_DIR, test_name) | ||
| os.makedirs(dest_dir, exist_ok=True) | ||
| | ||
| # Copy all log* directories | ||
| for entry in glob.glob("log*"): | ||
| if os.path.isdir(entry): | ||
| shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True) | ||
|
||||||||||||||||||||||||||||||||||||||
| shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True) | |
| dest_path = os.path.join(dest_dir, entry) | |
| if os.path.exists(dest_path): | |
| shutil.rmtree(dest_path) | |
| shutil.copytree(entry, dest_path) |
Copilot AI, Mar 26, 2026
The comment here, "Handle server.log and other files starting with 'log'", does not match the actual logic: this branch handles regular files matched by log* (for example workerlog), while server.log is copied by a separate branch below. Consider updating the comment so it does not mislead future maintainers.
| # Handle server.log and other files starting with "log" | |
| # Copy regular files starting with "log" (e.g. workerlog); server.log is handled separately below |
Copilot AI, Mar 26, 2026
This uses shutil.copytree(..., dirs_exist_ok=True), a parameter that is supported only on Python 3.8+. The repository's setup.py declares python_requires>=3.7, so running xpu_ci under Python 3.7 would raise a TypeError immediately. Consider a compatible approach (for example, delete or empty the destination directory before calling copytree, or copy recursively by hand to get a "merge" effect) so the code does not depend on dirs_exist_ok.
| # Copy all log* directories | |
| for entry in glob.glob("log*"): | |
| if os.path.isdir(entry): | |
| shutil.copytree(entry, os.path.join(dest_dir, entry), dirs_exist_ok=True) | |
| elif os.path.isfile(entry): | |
| # Handle server.log and other files starting with "log" | |
| shutil.copy2(entry, os.path.join(dest_dir, entry)) | |
| # Copy all log* directories or files | |
| for entry in glob.glob("log*"): | |
| dest_path = os.path.join(dest_dir, entry) | |
| if os.path.isdir(entry): | |
| # Python 3.7 does not support dirs_exist_ok, so handle existing dirs manually | |
| if os.path.exists(dest_path): | |
| shutil.rmtree(dest_path) | |
| shutil.copytree(entry, dest_path) | |
| elif os.path.isfile(entry): | |
| # Handle server.log and other files starting with "log" | |
| shutil.copy2(entry, dest_path) |
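The review comment also mentions copying recursively by hand to get a "merge" effect instead of deleting the destination first. A minimal sketch of that alternative follows, assuming plain files and directories only; `merge_copytree` is a hypothetical helper name, not something in the PR.

```python
import os
import shutil


def merge_copytree(src, dst):
    """Recursively copy src into dst, keeping files that already exist in dst.

    Hypothetical helper: avoids copytree's dirs_exist_ok (Python 3.8+).
    """
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        src_path = os.path.join(src, name)
        dst_path = os.path.join(dst, name)
        if os.path.isdir(src_path):
            # Descend into subdirectories and merge them the same way
            merge_copytree(src_path, dst_path)
        else:
            # copy2 overwrites a same-named file and preserves timestamps
            shutil.copy2(src_path, dst_path)
```

Unlike the rmtree-then-copytree suggestion, this keeps files already archived under the destination, which may or may not be what the hook wants.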
Copilot AI, Mar 26, 2026
The test file name is obtained here through item.fspath. item.fspath is deprecated in newer pytest versions (which are moving to item.path / pathlib.Path), so a future pytest upgrade could remove the attribute and break the log-archiving hook. Consider using the path attribute that pytest recommends, with a fallback for older versions where needed.
| test_file = os.path.basename(item.fspath) | |
| # Prefer pytest's newer `item.path` API and fall back to `item.fspath` for older versions | |
| test_path = getattr(item, "path", None) | |
| if test_path is None: | |
| test_path = getattr(item, "fspath", "") | |
| test_file = os.path.basename(str(test_path)) |
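Copilot AI, Mar 25, 2026
The archive directory name currently uses only the test file name (without .py). If one file contains several test functions or parametrized cases, their logs will repeatedly overwrite each other or get mixed together, which makes problems harder to trace. Consider using item.nodeid (with path separators and illegal characters replaced), or another finer-grained identifier that includes the test function name, as the directory name.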
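One possible way to derive a directory name from a node ID is sketched below. `nodeid_to_dirname` is a hypothetical helper, and the allowed character set is an assumption; the PR may prefer a different scheme.

```python
import re


def nodeid_to_dirname(nodeid):
    """Turn a pytest node ID into a name that is safe to use as a directory."""
    # Drop the ".py" suffix of the file part so names stay shorter
    name = nodeid.replace(".py::", "::")
    # Replace path separators, "::", brackets and other unusual characters with "_"
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return name.strip("_")


# Example with a made-up node ID
print(nodeid_to_dirname("tests/xpu_ci/8cards_cases/test_foo.py::test_bar[case-1]"))
# prints: tests_xpu_ci_8cards_cases_test_foo_test_bar_case-1
```

Because different characters collapse to the same "_", two distinct node IDs could in principle map to one name; appending a short hash of the node ID would avoid that if it matters.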
Copilot AI, Mar 26, 2026
The hook currently archives logs in the call phase of every test case, whether it passes or fails, which may noticeably increase CI I/O and artifact size. If the goal is to debug failures, consider triggering the archive only when report.failed (or report.outcome != "passed").
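The failure-only condition can be isolated in a small predicate. This is a sketch, not the PR's hook: it assumes the report object exposes `when` and `failed` as pytest's `TestReport` does, and it uses `SimpleNamespace` stand-ins so it runs without pytest. `should_archive` is a hypothetical name.

```python
from types import SimpleNamespace


def should_archive(report):
    """Return True only for a failed "call" phase report."""
    return report.when == "call" and report.failed


# Stand-ins for pytest TestReport objects, just to exercise the predicate
passed = SimpleNamespace(when="call", failed=False)
failed = SimpleNamespace(when="call", failed=True)
setup_error = SimpleNamespace(when="setup", failed=True)
print(should_archive(passed), should_archive(failed), should_archive(setup_error))
# prints: False True False
```

In a `pytest_runtest_makereport` hook wrapper the report would come from `outcome.get_result()`. Dropping the `when == "call"` check would also archive logs when a fixture fails during setup, which may be useful if the server is started in a fixture.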
Copilot AI, Mar 26, 2026
All exceptions are caught here and then simply passed, so an archiving failure is swallowed silently and is hard to diagnose later. Consider at least printing a warning with the exception details (or recording it through pytest's terminalreporter or logging), and catching only the expected exception types.
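A minimal sketch of the narrower handling suggested above, assuming filesystem errors are the only failures worth tolerating. `safe_archive` and the logger name are hypothetical; the archive function is passed in so the example stays self-contained.

```python
import logging

logger = logging.getLogger("xpu_ci.archive")  # hypothetical logger name


def safe_archive(archive_fn, test_name):
    """Run archive_fn(test_name); log a warning instead of failing the test run."""
    try:
        archive_fn(test_name)
        return True
    except OSError as exc:
        # OSError covers missing files, permission problems and shutil.Error;
        # any other exception type is left to propagate so bugs stay visible
        logger.warning("failed to archive logs for %s: %r", test_name, exc)
        return False
```

The hook could then call something like `safe_archive(_archive_case_logs, test_name)`, so a broken archive step shows up in the CI log without changing the test result.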
This workflow uses actions/upload-artifact@v6, while the comparable XPU workflow in the repository (for example _xpu_8cards_case_test.yml) uses @v4. Consider standardizing on one major version, and confirming that the chosen version is available on GitHub Actions, so that workflows behave consistently and the upload step does not fail because the version is unavailable.