Add GPU E2E stage to Linux VHD builder pipeline #8138
ganeshkumarashok wants to merge 1 commit into main
With the standalone GPU E2E pipeline triggers disabled in #8135, GPU E2E tests need to run as part of the VHD build pipeline. This adds a gpu_e2e stage that runs after the build stage, matching the configuration from the standalone e2e-gpu.yaml.
PR Title Lint Failed ❌
Your PR title doesn't follow the expected format. Please update it to follow the Conventional Commits format; the lint check will run again automatically.
Pull request overview
Adds GPU E2E coverage back into the Linux VHD build pipeline by introducing a dedicated gpu_e2e stage that runs after the build stage, in parallel with existing E2E stages—ensuring GPU scenarios continue to execute even with standalone GPU E2E pipeline triggers disabled.
Changes:
- Add a new gpu_e2e stage to .pipelines/.vsts-vhd-builder.yaml.
- Configure the stage to run GPU-tagged Linux scenarios (TAGS_TO_RUN=gpu=true) with a custom timeout and capacity-skip behavior.
- Run the stage in parallel with existing e2e and scriptless_cse_cmd_e2e stages (all depend on build).
    - stage: gpu_e2e
      dependsOn: build
      condition: and(succeeded(), ne(variables.SKIP_E2E_TESTS, 'true'))
      variables:
        VHD_BUILD_ID: $(Build.BuildId)
        TAGS_TO_RUN: "gpu=true"
        TAGS_TO_SKIP: "os=windows"
        SKIP_TESTS_WITH_SKU_CAPACITY_ISSUE: "true"
        E2E_GO_TEST_TIMEOUT: "75m"
      jobs:
        - template: ./templates/e2e-template.yaml
          parameters:
            name: Linux GPU Tests
            IgnoreScenariosWithMissingVhd: true
Adding gpu_e2e makes a 3rd parallel stage using templates/e2e-template.yaml. That template publishes an artifact named $(LOGGING_DIR), and .pipelines/scripts/e2e_run.sh sets LOGGING_DIR using date +%s (seconds resolution). Parallel jobs that start within the same second can end up with the same artifact name and cause intermittent pipeline failures when publishing artifacts. Consider making the log/artifact name deterministically unique per job/stage (e.g., include stage/job ID or use higher-resolution time) so concurrent E2E stages can’t collide.
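A minimal sketch of one way to make the name unique per stage and job, assuming the script runs on an Azure Pipelines agent where the predefined variables System.StageName, System.JobId and System.JobAttempt are exposed as the environment variables SYSTEM_STAGENAME, SYSTEM_JOBID and SYSTEM_JOBATTEMPT. The function name make_logging_dir_name and the scenario-logs prefix are hypothetical and are not taken from e2e_run.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: builds a log/artifact name that stays unique across
# parallel stages even when they start within the same second.
# The "scenario-logs" default prefix is an assumption for illustration.
make_logging_dir_name() {
  local prefix="${1:-scenario-logs}"
  # Fall back to local values so the script also works outside a pipeline run.
  local stage="${SYSTEM_STAGENAME:-local}"
  local job_id="${SYSTEM_JOBID:-$$}"
  local attempt="${SYSTEM_JOBATTEMPT:-1}"
  # The timestamp is kept for readability; uniqueness comes from stage and job.
  printf '%s-%s-%s-%s-%s\n' "$prefix" "$stage" "$job_id" "$attempt" "$(date +%s)"
}

LOGGING_DIR="$(make_logging_dir_name)"
echo "LOGGING_DIR=${LOGGING_DIR}"
```

With this shape, the e2e, scriptless_cse_cmd_e2e and gpu_e2e stages would publish differently named artifacts even if they start in the same second, and a retried job gets a new name through the attempt counter.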
      condition: and(succeeded(), ne(variables.SKIP_E2E_TESTS, 'true'))
      variables:
        VHD_BUILD_ID: $(Build.BuildId)
        TAGS_TO_RUN: "gpu=true"
Let's pick a few GPU cases to run, tag them with GPU_Basic or something like that, and run them as part of the PR.
Summary
- Adds a gpu_e2e stage to the Linux VHD builder pipeline (.vsts-vhd-builder.yaml) that runs GPU E2E tests after the VHD build completes
- Runs in parallel with the existing e2e and scriptless_cse_cmd_e2e stages, all depending on the build stage
Test plan
- Verify that the gpu_e2e stage runs after build succeeds