
@PeterSu92

This adds a couple of fixes to get more tests to pass. Right now, 5 tests still fail with the google-batch provider, but they also fail on main.

The following summary was generated with the help of Claude Code:

Overview

Branch: 0.5.2_dev
Provider: google-batch
Environment: Jupyter on Workbench (VPC-SC)

Results: 38 passed, 5 failed (out of 43 tests)


Failing Tests (5)

Important: All 5 failing tests also failed on the 0.5.1/main branch, so they are not regressions from the 0.5.2 changes.

| Test | Error | Root Cause | Also Failed on 0.5.1? |
| --- | --- | --- | --- |
| e2e_image.sh | ValueError: --use-private-address must specify a --image with a gcr.io or pkg.dev host | Uses Docker Hub images (bash:4.4, python:2-slim, python:3-slim) which can't be pulled with --use-private-address in VPC-SC environments | Yes - expected VPC-SC limitation |
| e2e_verify_failure_log.sh | dstat_output: unbound variable (line 46) | Test script has a missing variable declaration | Yes - pre-existing test bug |
| e2e_python_api.py | JobExecutionError | Python API test job failed during execution | Yes - documented in TEST_FAILURE_ANALYSIS.md |
| e2e_non_root.sh | JobExecutionError | Tests running as non-root user failed | Yes - documented in TEST_FAILURE_ANALYSIS.md |
| e2e_requester_pays_buckets.sh | JobExecutionError | Tests requester-pays bucket access | Yes - documented in TEST_FAILURE_ANALYSIS.md |

Test Fixes Applied in This Branch

1. USER Variable Fix

Several tests failed with USER: unbound variable in bash strict mode. Fixed by falling back to a default of jupyter when USER is unset:

Files modified:

  • test/integration/io_setup.sh - changed ${USER} to ${USER:-jupyter}
  • test/integration/io_tasks_setup.sh - changed ${USER} to ${USER:-jupyter}
  • test/integration/e2e_logging_paths.sh - changed ${USER} to ${USER:-jupyter}
  • test/integration/e2e_logging_paths_pattern_tasks.sh - changed ${USER} to ${USER:-jupyter}
  • test/integration/test_setup.sh - Added export USER="${USER:-jupyter}"
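
The failure mode and the fix can be reproduced in isolation. This is a minimal sketch, assuming "strict mode" means the scripts run with nounset enabled; the helper name resolve_user is hypothetical and is not part of the test suite.

```shell
#!/usr/bin/env bash
# -e: exit on error, -u: treat unset variables as errors (nounset).
set -eu

# Hypothetical helper: prints $USER, or "jupyter" when USER is unset or empty.
resolve_user() {
  echo "${USER:-jupyter}"
}

# Simulate an environment (such as a Jupyter kernel) where USER is not set.
unset USER
# A bare ${USER} at this point would abort with "USER: unbound variable".
echo "user when unset: $(resolve_user)"

# When USER is set, the default is ignored.
USER="alice"
echo "user when set: $(resolve_user)"
```

Note that the ":-" form also applies the default when USER is set but empty; "${USER-jupyter}" would only cover the unset case.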

2. Python Test Setup Fix

Python tests (e2e_*.py) were missing --service-account and had hardcoded network settings.

File modified: test/integration/test_setup_e2e.py

  • Added support for GPU_NETWORK, GPU_SUBNETWORK, LOCATION, PET_SA_EMAIL environment variables
  • Added --service-account flag when PET_SA_EMAIL is set

3. Shell Test Setup Fix

Shell tests had hardcoded network settings that didn't work in VPC-SC.

File modified: test/integration/test_setup.sh

  • Uses GPU_NETWORK, GPU_SUBNETWORK, PET_SA_EMAIL, LOCATION environment variables
  • Falls back to sensible defaults for non-VPC-SC environments
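
The environment-variable fallback and the conditional --service-account flag could look roughly like the sketch below. The helper name build_env_args, the default values ("default", "us-central1"), and the mapping of GPU_NETWORK, GPU_SUBNETWORK, and LOCATION onto --network, --subnetwork, and --location are assumptions for illustration; the actual defaults and wiring in test_setup.sh may differ.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints one command-line argument per line, built from
# the environment. Default values are placeholders, not the real defaults.
build_env_args() {
  # Start the argument list with network settings, using fallbacks when the
  # variables are unset or empty.
  set -- \
    --network "${GPU_NETWORK:-default}" \
    --subnetwork "${GPU_SUBNETWORK:-default}" \
    --location "${LOCATION:-us-central1}"

  # Only append --service-account when PET_SA_EMAIL is set and non-empty.
  if [ -n "${PET_SA_EMAIL:-}" ]; then
    set -- "$@" --service-account "${PET_SA_EMAIL}"
  fi

  printf '%s\n' "$@"
}

build_env_args
```

Building the list with "set --" keeps each value as a separate argument, so an empty or unset PET_SA_EMAIL never produces a dangling --service-account flag.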

VPC-SC Limitations

The following tests are expected to fail in VPC-SC environments:

  1. e2e_image.sh - Uses public Docker Hub images which cannot be pulled when --use-private-address is enabled. VPC-SC environments block access to public container registries.

  2. Any test using non-gcr.io/pkg.dev images - The --use-private-address flag requires images from Google Container Registry (gcr.io) or Artifact Registry (pkg.dev).
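
To make the image restriction concrete, here is a rough sketch of a pre-flight check a test could use to skip itself in VPC-SC. It is not dsub's actual validation; the function name image_allows_private_address and the example project and repository names are hypothetical, and the rule it encodes (registry host is gcr.io, pkg.dev, or a subdomain of either) is inferred from the error message above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical check: succeeds when the image's registry host is gcr.io,
# pkg.dev, or a subdomain of either (for example us-docker.pkg.dev).
image_allows_private_address() {
  ref="$1"
  case "${ref}" in
    */*) host="${ref%%/*}" ;;  # text before the first "/" is the candidate host
    *)   return 1 ;;           # no "/" means a Docker Hub short name like bash:4.4
  esac
  case "${host}" in
    gcr.io|*.gcr.io|pkg.dev|*.pkg.dev) return 0 ;;
    *) return 1 ;;
  esac
}

# Two Docker Hub images used by e2e_image.sh, plus two made-up mirrored names.
for candidate in \
  bash:4.4 \
  python:3-slim \
  gcr.io/my-project/bash:4.4 \
  us-docker.pkg.dev/my-project/my-repo/python:3-slim
do
  if image_allows_private_address "${candidate}"; then
    echo "${candidate}: usable with --use-private-address"
  else
    echo "${candidate}: rejected"
  fi
done
```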


Comparison to Previous Runs

| Run | Total | Passed | Failed | Notes |
| --- | --- | --- | --- | --- |
| After cleanup fix | 43 | 38 | 5 | Current state |

Suggested next steps for the failing tests:

  1. e2e_verify_failure_log.sh - Fix the unbound variable bug (pre-existing issue, separate PR)

  2. e2e_image.sh - Document as expected failure in VPC-SC, or mirror test images to GAR

  3. Job execution failures (e2e_python_api.py, e2e_non_root.sh, e2e_requester_pays_buckets.sh) - Investigate root cause, but these appear to be environmental rather than code issues

jupyter if unset. test_setup.sh updated to use env variables rather than hardcoded network settings

  echo " Checking user-id"
- util::dstat_yaml_assert_field_equal "${dstat_output}" "[0].user-id" "${USER}"
+ util::dstat_yaml_assert_field_equal "${dstat_output}" "[0].user-id" "${USER:-jupyter}"
Contributor


I think it would be preferable for io_setup.sh to set a variable, like JOB_USER in the other test scripts, and then reference it here.

Author


I updated that here as well as in io_tasks_setup.sh

…w integration test files and use that directly
@PeterSu92
Author

I also ran the tests in the AoU app. The only failures were the same 5 tests that failed outside of the perimeter, plus e2e_accelerator.google-batch.sh, which we expect to fail because it does not set a boot disk image with the GPU drivers (this case is covered by e2e_accelerator_vpc_sc.google-batch.sh).


3 participants