@cjac cjac commented Oct 7, 2025

This PR significantly enhances support for installing GPU drivers
in environments requiring an HTTP/S proxy for internet access.
The set_proxy function now comprehensively configures APT, DNF,
GPG, conda and Java to use the proxy defined in instance metadata.
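Here is a minimal POSIX shell sketch of the kind of APT and DNF configuration involved. It is not the code from this PR. The function names are hypothetical, and the proxy is assumed to already be a full URL such as http://10.0.0.2:3128 (how set_proxy reads it from metadata is not shown). Target paths are passed in so the sketch can be tried without touching /etc. The APT directives (Acquire::http::Proxy, Acquire::https::Proxy) and DNF's proxy= option in [main] are standard settings.

```shell
# Hypothetical helpers illustrating package-manager proxy configuration.
# They are NOT this PR's set_proxy; paths are parameters for safe testing.

# write_apt_proxy_conf PROXY_URL CONF_FILE
# Writes an apt.conf.d-style fragment that sends both http:// and https://
# repository traffic through PROXY_URL.
write_apt_proxy_conf() {
  proxy_url=$1
  conf_file=$2
  printf 'Acquire::http::Proxy "%s";\nAcquire::https::Proxy "%s";\n' \
    "$proxy_url" "$proxy_url" > "$conf_file"
}

# set_dnf_proxy PROXY_URL DNF_CONF
# Drops any existing proxy= line and appends a new one. Assumes [main] is
# the only (or last) section, the usual layout of /etc/dnf/dnf.conf.
set_dnf_proxy() {
  proxy_url=$1
  dnf_conf=$2
  tmp_file="${dnf_conf}.tmp"
  if [ -f "$dnf_conf" ]; then
    grep -v '^proxy=' "$dnf_conf" > "$tmp_file"
  else
    printf '[main]\n' > "$tmp_file"
  fi
  printf 'proxy=%s\n' "$proxy_url" >> "$tmp_file"
  mv "$tmp_file" "$dnf_conf"
}
```

On a real node the targets would be a file under /etc/apt/apt.conf.d/ (the fragment name is an assumption) and /etc/dnf/dnf.conf.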

If a proxy certificate is provided via http-proxy-pem-uri, it is
added to the OS and Java trust stores, and tooling is configured
to connect to the proxy over HTTPS. This enables driver downloads and
package manager operations in locked-down VPCs where all egress
goes through a Secure Web Proxy (SWP) or another proxy.
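Here is a minimal sketch of the trust-store step. It assumes the PEM has already been copied to a local file (how http-proxy-pem-uri is fetched is not shown) and that the OS family is known. The helper names, the proxy-ca file name and alias, and the JDK's default changeit keystore password are assumptions. update-ca-certificates, update-ca-trust and keytool -importcert are the standard Debian/Ubuntu, RHEL/Rocky and JDK tools for this job.

```shell
# Hypothetical sketch: trust an already-downloaded proxy CA (PEM file).

# ca_anchor_dir OS_FAMILY -> directory the OS trust tooling reads anchors from
ca_anchor_dir() {
  case "$1" in
    debian|ubuntu) echo /usr/local/share/ca-certificates ;;
    rocky|rhel|centos) echo /etc/pki/ca-trust/source/anchors ;;
    *) return 1 ;;
  esac
}

# install_proxy_ca PEM_FILE OS_FAMILY [JAVA_CACERTS]
# Requires root. update-ca-certificates only picks up files ending in .crt.
install_proxy_ca() {
  pem_file=$1
  os_family=$2
  java_cacerts=$3
  anchor_dir=$(ca_anchor_dir "$os_family") || return 1
  cp "$pem_file" "${anchor_dir}/proxy-ca.crt" || return 1
  if [ "$os_family" = debian ] || [ "$os_family" = ubuntu ]; then
    update-ca-certificates || return 1
  else
    update-ca-trust extract || return 1
  fi
  if [ -n "$java_cacerts" ]; then
    # "changeit" is the JDK default cacerts password (assumed unchanged).
    keytool -importcert -noprompt -alias proxy-ca -file "$pem_file" \
      -keystore "$java_cacerts" -storepass changeit
  fi
}
```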

Additional changes:

  • configure_dkms_certs is now more idempotent: it skips key
    generation if the key files already exist.

  • Spark RAPIDS versions and repository URL aligned with
    spark-rapids/spark-rapids.sh to work towards a single
    GPU/RAPIDS installation script.
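The idempotency change can be pictured as a guard placed in front of key generation. This is a hypothetical helper, not configure_dkms_certs itself. The file paths, key size, validity period and subject are placeholders. The openssl req invocation is the common recipe for a self-signed key pair whose DER certificate can be enrolled with mokutil.

```shell
# Hypothetical sketch of an idempotent signing-key step.
# ensure_signing_key KEY_FILE CERT_FILE
ensure_signing_key() {
  key_file=$1
  cert_file=$2
  # Both files present and non-empty: reuse them instead of regenerating.
  if [ -s "$key_file" ] && [ -s "$cert_file" ]; then
    echo "signing key already present; skipping generation"
    return 0
  fi
  # Unencrypted RSA key plus self-signed certificate in DER form.
  openssl req -new -x509 -newkey rsa:2048 -nodes -days 36500 \
    -subj "/CN=DKMS module signing key" \
    -keyout "$key_file" -outform DER -out "$cert_file"
}
```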

This work was validated in a development environment where all
internet egress was forced through a proxy.

@cjac cjac self-assigned this Oct 7, 2025

cjac commented Oct 7, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch 2 times, most recently from f80a3fe to 7390639 Compare October 8, 2025 02:33

cjac commented Oct 8, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch from 7390639 to 97ebf9d Compare October 10, 2025 01:36

cjac commented Oct 10, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch 2 times, most recently from a8c249d to 4b10ff1 Compare October 10, 2025 02:05

cjac commented Oct 10, 2025

/gcbrun

cjac commented Oct 10, 2025

/gcbrun

@cjac cjac requested a review from dilipgodhia October 10, 2025 07:05
@cjac cjac marked this pull request as ready for review October 10, 2025 15:46
@cjac cjac force-pushed the proxy-support-2025 branch from 4b10ff1 to f793009 Compare October 10, 2025 19:21

cjac commented Oct 10, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch from f793009 to c4e1237 Compare October 10, 2025 19:53

cjac commented Oct 10, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch from c4e1237 to b819581 Compare October 10, 2025 23:31

cjac commented Oct 10, 2025

I've tried to make the build a bit more reliable.


cjac commented Oct 10, 2025

/gcbrun

@cjac cjac force-pushed the proxy-support-2025 branch from b819581 to 09329ea Compare October 10, 2025 23:37

cjac commented Oct 10, 2025

/gcbrun

This PR introduces comprehensive HTTP/S proxy support for the GPU
driver installation script, enabling its use in environments with
restricted internet egress, such as those using Secure Web Proxy.

The `set_proxy` function, controlled by the `http-proxy` and new
`http-proxy-pem-uri` metadata attributes, now configures APT, GPG,
Java, pip, and Conda to route traffic through the specified proxy. If a
PEM certificate URI is provided, the certificate is installed into the
OS, Conda, and Java trust stores. The script now correctly handles
the proxy scheme (HTTP vs HTTPS) based on the presence of the
`http-proxy-pem-uri` metadata.
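The scheme rule described above (HTTPS to the proxy only when `http-proxy-pem-uri` is set) and a comma-joined `no_proxy` list can be sketched as follows. This sketch assumes the `http-proxy` value is `host:port`. Both function names are hypothetical, and the `no_proxy` entries in the usage comment are illustrative rather than the list the script actually builds.

```shell
# build_proxy_url HOST_PORT PEM_URI
# Uses https:// only when a PEM URI was supplied, http:// otherwise.
build_proxy_url() {
  if [ -n "$2" ]; then
    printf 'https://%s\n' "$1"
  else
    printf 'http://%s\n' "$1"
  fi
}

# join_no_proxy HOST...
# Runs in a subshell so changing IFS does not leak; "$*" joins the
# arguments with the first character of IFS (a comma here).
join_no_proxy() ( IFS=,; printf '%s\n' "$*" )

# Usage (values are placeholders):
#   export HTTPS_PROXY="$(build_proxy_url 10.0.0.2:3128 "$pem_uri")"
#   export NO_PROXY="$(join_no_proxy localhost 127.0.0.1 metadata.google.internal)"
```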

This change was validated in a development environment where all
internet access was routed through an explicit proxy.

Additional changes:

- `README.md` updated to document the new `http-proxy-pem-uri`
  metadata option and clarify `http-proxy` usage.
- GCS caching for the NVIDIA driver is checked earlier to avoid
  unnecessary HEAD requests to the NVIDIA CDN.
- `configure_dkms_certs` is now more idempotent: key generation is
  skipped if the key files already exist.
- Spark RAPIDS versions and repository URL aligned with
  `spark-rapids/spark-rapids.sh` as part of a move towards a unified
  GPU/RAPIDS installation script.
- Switched to reading `/sys/bus/pci/devices/*/uevent` for GPU detection,
  removing the dependency on `pciutils`.
- Moved `set_proxy` call earlier in `prepare_to_install`.
- Refactored `no_proxy` and `nvcc_gencode` list generation.
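Reading `uevent` files is enough to find NVIDIA GPUs without `lspci`. The kernel writes `PCI_ID` as `VENDOR:DEVICE` in hex and `PCI_CLASS` as the hex class code (for example `30200` for a 3D controller). Matching vendor `10DE` together with base class `03` therefore selects NVIDIA display/3D devices and skips their audio functions. This is a hypothetical sketch, and the script's actual matching logic may differ.

```shell
# count_nvidia_gpus [SYSFS_PCI_DEVICES_DIR]
# Counts PCI functions with NVIDIA's vendor ID and a display-class code.
count_nvidia_gpus() {
  devices_dir=${1:-/sys/bus/pci/devices}
  count=0
  for uevent in "$devices_dir"/*/uevent; do
    # An unmatched glob stays literal; skip anything unreadable.
    [ -r "$uevent" ] || continue
    if grep -q '^PCI_ID=10DE:' "$uevent" &&
       grep -q '^PCI_CLASS=3[0-9A-F]\{4\}$' "$uevent"; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

The directory argument exists only so the function can be exercised against a fake sysfs tree.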

fix(ci): Add retry logic to kubectl logs in presubmit

- Wrapped `kubectl logs` command in `run-presubmit-on-k8s.sh` with a
  retry loop to handle transient "No agent available" errors from GKE.
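The following is a generic retry wrapper of the sort described. The function name, attempt count and delay are hypothetical, and the loop in `run-presubmit-on-k8s.sh` may be structured differently.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD until it exits 0 or MAX_ATTEMPTS runs have failed.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (pod and namespace are placeholders):
#   retry 5 10 kubectl logs -f "$POD_NAME" -n "$NAMESPACE"
```

One consideration: re-running `kubectl logs` without `--since` or `--tail` prints the log from the beginning, so a retried stream can repeat lines.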
@cjac cjac force-pushed the proxy-support-2025 branch from 09329ea to fe3f54a Compare October 11, 2025 00:02
@cjac cjac merged commit 1d9d3e6 into GoogleCloudDataproc:main Oct 11, 2025
1 of 2 checks passed