Add GPU E2E stage to Linux VHD builder pipeline #8138

Open
ganeshkumarashok wants to merge 1 commit into main from aganeshkumar/gpu-e2e-post-vhd-build

@ganeshkumarashok
Contributor

Summary

  • Adds a gpu_e2e stage to the Linux VHD builder pipeline (.vsts-vhd-builder.yaml) that runs GPU E2E tests after VHD build completes
  • With standalone GPU E2E pipeline triggers disabled in #8135 ("fix: disable automatic e2e pipeline triggers"), this ensures GPU E2E tests continue to run as part of the VHD build flow
  • The new stage runs in parallel with the existing e2e and scriptless_cse_cmd_e2e stages, all depending on the build stage

Test plan

  • Verify the VHD builder pipeline runs the new gpu_e2e stage after build succeeds
  • Confirm GPU-tagged E2E scenarios execute with the freshly built VHDs
  • Confirm non-GPU E2E stages are unaffected

With the standalone GPU E2E pipeline triggers disabled in #8135,
GPU E2E tests need to run as part of the VHD build pipeline.
This adds a gpu_e2e stage that runs after the build stage,
matching the configuration from the standalone e2e-gpu.yaml.
@github-actions
Contributor

PR Title Lint Failed ❌

Current Title: Add GPU E2E stage to Linux VHD builder pipeline

Your PR title doesn't follow the expected format. Please update your PR title to follow one of these patterns:

Conventional Commits Format:

  • feat: add new feature - for new features
  • fix: resolve bug in component - for bug fixes
  • docs: update README - for documentation changes
  • refactor: improve code structure - for refactoring
  • test: add unit tests - for test additions
  • chore: remove dead code - for maintenance tasks
  • chore(deps): update dependencies - for updating dependencies
  • ci: update build pipeline - for CI/CD changes

Guidelines:

  • Use lowercase for the type and description
  • Keep the description concise but descriptive
  • Use imperative mood (e.g., "add" not "adds" or "added")
  • Don't end with a period

Examples:

  • ✅ feat(windows): add secure TLS bootstrapping for Windows nodes
  • ✅ fix: resolve kubelet certificate rotation issue
  • ✅ docs: update installation guide
  • ❌ Added new feature
  • ❌ Fix bug.
  • ❌ Update docs

Please update your PR title and the lint check will run again automatically.
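
To check a title locally before pushing, the patterns listed above can be approximated with a regular expression. This is only a rough sketch: the rule the lint action actually applies is not shown in this thread, and the function name `is_conventional_title` and the accepted type list are assumptions taken from the examples in this comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical local check; NOT the lint action's real implementation.
# Accepts "type: description" or "type(scope): description", where the
# description starts with a lowercase letter and does not end with a period.
is_conventional_title() {
  local title="$1"
  local pattern='^(feat|fix|docs|refactor|test|chore|ci)(\([a-z0-9-]+\))?: [a-z].*[^.]$'
  [[ "$title" =~ $pattern ]]
}

# One possible retitle for this PR, using the "ci" type from the list above.
if is_conventional_title "ci: add GPU E2E stage to Linux VHD builder pipeline"; then
  echo "title ok"
else
  echo "title does not match"
fi
```

Under this sketch the current title, "Add GPU E2E stage to Linux VHD builder pipeline", is rejected because it has no type prefix.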

Contributor

Copilot AI left a comment

Pull request overview

Adds GPU E2E coverage back into the Linux VHD build pipeline by introducing a dedicated gpu_e2e stage that runs after the build stage, in parallel with existing E2E stages—ensuring GPU scenarios continue to execute even with standalone GPU E2E pipeline triggers disabled.

Changes:

  • Add a new gpu_e2e stage to .pipelines/.vsts-vhd-builder.yaml.
  • Configure the stage to run GPU-tagged Linux scenarios (TAGS_TO_RUN=gpu=true) with a custom timeout and capacity-skip behavior.
  • Run the stage in parallel with existing e2e and scriptless_cse_cmd_e2e stages (all depend on build).

Comment on lines +211 to +225
- stage: gpu_e2e
  dependsOn: build
  condition: and(succeeded(), ne(variables.SKIP_E2E_TESTS, 'true'))
  variables:
    VHD_BUILD_ID: $(Build.BuildId)
    TAGS_TO_RUN: "gpu=true"
    TAGS_TO_SKIP: "os=windows"
    SKIP_TESTS_WITH_SKU_CAPACITY_ISSUE: "true"
    E2E_GO_TEST_TIMEOUT: "75m"
  jobs:
    - template: ./templates/e2e-template.yaml
      parameters:
        name: Linux GPU Tests
        IgnoreScenariosWithMissingVhd: true

Copilot AI Mar 19, 2026

Adding gpu_e2e makes a 3rd parallel stage using templates/e2e-template.yaml. That template publishes an artifact named $(LOGGING_DIR), and .pipelines/scripts/e2e_run.sh sets LOGGING_DIR using date +%s (seconds resolution). Parallel jobs that start within the same second can end up with the same artifact name and cause intermittent pipeline failures when publishing artifacts. Consider making the log/artifact name deterministically unique per job/stage (e.g., include stage/job ID or use higher-resolution time) so concurrent E2E stages can’t collide.
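
One way to act on this suggestion is to fold the stage and job identity into the name, so that two jobs starting in the same second still get different artifact names. The sketch below is an assumption about how that could look, not the contents of `e2e_run.sh`: the helper name `make_logging_dir` and the `scenario-logs` prefix are hypothetical, and it relies on the Azure Pipelines predefined variables `System.StageName` and `System.JobId` being visible to scripts as `SYSTEM_STAGENAME` and `SYSTEM_JOBID`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: builds a log/artifact name that is unique per stage and job.
# SYSTEM_STAGENAME and SYSTEM_JOBID are assumed to be set by the pipeline agent;
# the fallbacks only exist so the script can also be run locally.
make_logging_dir() {
  local stage="${SYSTEM_STAGENAME:-local}"
  local job="${SYSTEM_JOBID:-$$}"
  # "scenario-logs" is an illustrative prefix, not necessarily the one the repo uses.
  printf 'scenario-logs-%s-%s-%s\n' "$(date +%s)" "$stage" "$job"
}

LOGGING_DIR="$(make_logging_dir)"
echo "LOGGING_DIR=${LOGGING_DIR}"
```

Keeping the timestamp preserves the chronological ordering of log directories, while the stage and job suffix removes the dependence on clock resolution.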

  condition: and(succeeded(), ne(variables.SKIP_E2E_TESTS, 'true'))
  variables:
    VHD_BUILD_ID: $(Build.BuildId)
    TAGS_TO_RUN: "gpu=true"
Contributor

Let's pick a few GPU cases to run, tag them with GPU_Basic or something like that, and run them as part of the PR.
