Skip to content

Add workspace ID validation to storage proxy probe#1324

Merged
parthban-db merged 2 commits intomainfrom
parthban-db/stack/refactor-files-client-5
Mar 13, 2026
Merged

Add workspace ID validation to storage proxy probe#1324
parthban-db merged 2 commits intomainfrom
parthban-db/stack/refactor-files-client-5

Conversation

@parthban-db
Copy link
Contributor

@parthban-db parthban-db commented Mar 11, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Adds workspace ID validation to the storage proxy probe so the proxy is only used when it can serve the target workspace. Prevents false positives when a WorkspaceClient is configured to point at a different workspace than the one hosting the data plane.

Why

The previous probe (GET /DatabricksInternal/Probes/ping) only checked if the proxy was reachable. It didn't verify whether the proxy could serve the specific workspace the client is targeting. This meant a WorkspaceClient configured with a different workspace's host and token could incorrectly route uploads through the local proxy, which cannot serve that workspace.

What changed

Interface changes

None.

Behavioral changes

  • The probe now resolves the workspace ID via the SCIM Me endpoint (GET /api/2.0/preview/scim/v2/Me) and passes it as ?ew=<workspace_id> to the probe URL. The proxy returns 200 only if it can serve that workspace, 403 otherwise.
  • If the workspace ID cannot be resolved (e.g., auth fails on the target host), the probe is skipped and the SDK falls back to presigned URLs.

Internal changes

  • _resolve_workspace_id() — new method that calls SCIM Me to get the workspace ID from the X-Databricks-Org-Id response header.
  • _probe_storage_proxy() — updated to resolve workspace ID first, skip if unavailable, use HEAD method, and pass ew parameter.
  • Test mock updated to handle the SCIM Me endpoint and assert the ew parameter is present in the probe URL.

How is this tested?

  • Unit tests
  • E2E validated on 5 paths:
    • AWS proxy (on DP): SUCCESS — probe HEAD with ew returns 200, _StorageProxyRequestBuilder used, 200 MB hash verified.
    • GCP proxy (on DP): SUCCESS — same.
    • AWS presigned (off DP): SUCCESS — flag disabled, _PresignedUrlRequestBuilder used, 200 MB hash verified.
    • GCP presigned (off DP): SUCCESS — same.
    • Cross-workspace (AWS DP, GCP token/host): SUCCESS — SCIM Me resolves GCP workspace ID, probe sends ew=<gcp_id> to local AWS proxy, proxy returns 403, correctly falls back to presigned URLs.

NO_CHANGELOG=true

github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2026
…1295)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1295/files)
to review incremental changes.
-
[**stack/refactor-files-client-3**](#1295)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1295/files)]
-
[stack/refactor-files-client-4](#1319)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1319/files/9e504b191ad065bf8adbeb003fa493ba51bcc165..8218fc4ae84ff4b9ec2f838b26faa0c3a957f2e5)]
-
[stack/refactor-files-client-5](#1324)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1324/files/8218fc4ae84ff4b9ec2f838b26faa0c3a957f2e5..2c4f3e63287e86b0984a4dfb323547cdaacfa058)]

---------
## Summary

Extracts the presigned URL coordination logic (`create-upload-part-urls`
and `create-resumable-upload-url` API calls and response parsing) into a
dedicated `_PresignedUrlRequestBuilder` class, replacing inline code in
three upload methods.

## Why

The `create-upload-part-urls` API call, response validation, and header
parsing was duplicated across `_do_upload_one_part`,
`_perform_multipart_upload`, and `_perform_resumable_upload` (with a
similar pattern for `create-resumable-upload-url`). Each copy built the
request body, called `_api.do()`, validated the response structure, and
converted the headers list to a dict — all inline. This made the upload
methods longer than necessary and meant any change to the coordination
logic required updating multiple places.

`_PresignedUrlRequestBuilder` consolidates this into a single class. It
also prepares for storage-proxy routing (#1278), where a different
builder implementation can construct URLs directly instead of calling
the presigned URL APIs.

## What changed

### Interface changes

None. All changes are to private methods.

### Behavioral changes

- Malformed presigned URL responses (`ValueError`/`KeyError` from
response parsing) now trigger fallback to single-shot upload on the
first part, instead of propagating as hard errors. Previously the
`_api.do()` call was wrapped in `try/except` but the response parsing
was outside it. Now both are encapsulated in the builder, so parsing
errors are also caught. This is more resilient — a broken coordination
response should not abort the upload when a simpler path exists.

### Internal changes

- **`_PresignedUrl`** — new dataclass holding a resolved presigned URL
and its associated headers.
- **`_PresignedUrlRequestBuilder`** — new class with
`build_upload_part_urls()` and `build_resumable_upload_url()`.
Encapsulates the coordination API calls and response parsing.
- **`_do_upload_one_part`** — replaced inline `create-upload-part-urls`
call and response parsing with `builder.build_upload_part_urls(...,
count=1)[0]`.
- **`_perform_multipart_upload`** — replaced inline
`create-upload-part-urls` batch call and response parsing with
`builder.build_upload_part_urls(..., count=batch_size)`.
- **`_perform_resumable_upload`** — replaced inline
`create-resumable-upload-url` call and response parsing with
`builder.build_resumable_upload_url()`.

## How is this tested?

Unit tests.

NO_CHANGELOG=true
@parthban-db parthban-db force-pushed the parthban-db/stack/refactor-files-client-5 branch from 2c4f3e6 to 93372a6 Compare March 11, 2026 22:38
@parthban-db parthban-db force-pushed the parthban-db/stack/refactor-files-client-5 branch from 93372a6 to 14e1367 Compare March 12, 2026 22:20
@parthban-db parthban-db requested a review from rauchy March 13, 2026 09:44
@parthban-db parthban-db marked this pull request as ready for review March 13, 2026 09:45
github-merge-queue bot pushed a commit that referenced this pull request Mar 13, 2026
## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1319/files)
to review incremental changes.
-
[**stack/refactor-files-client-4**](#1319)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1319/files)]
-
[stack/refactor-files-client-5](#1324)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1324/files/85824ec07a796f348263077a05c5339a190f1131..14e13676c5e12adff8f68d5f30aca295a423acdd)]
-
[stack/refactor-files-client-6](#1329)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1329/files/14e13676c5e12adff8f68d5f30aca295a423acdd..80c211ffa59afea71f5d01697d95381e0d126865)]

---------
## Summary

Adds optional storage-proxy routing for file uploads, gated behind the
`experimental_files_ext_enable_storage_proxy` config flag. When enabled
and the proxy is reachable, uploads bypass the presigned URL
coordination APIs and go directly to the storage proxy.

## Why

When running inside a Databricks cluster or notebook, a storage proxy is
available on the data plane. It can handle file upload operations
directly.

The feature is disabled by default and marked experimental, so there is
no impact on existing users.

## What changed

### Interface changes

One new config field:
- **`experimental_files_ext_enable_storage_proxy: bool = False`** —
enables the storage proxy path.

### Behavioral changes

- When the flag is enabled and the probe succeeds, multipart uploads
(AWS/Azure) and resumable uploads (GCP) go directly to the proxy instead
of fetching presigned URLs from coordination APIs.
- The proxy requires SDK authentication on every request, so an
authenticated session (`session.auth` callback) is used instead of the
unauthenticated session used for presigned URL uploads.
- `build_abort_url` was added to `_PresignedUrlRequestBuilder`,
extracting inline abort URL logic from `_abort_multipart_upload`.

### Internal changes

- **`_StorageProxyRequestBuilder`** — new class with the same method
signatures as `_PresignedUrlRequestBuilder` (`build_upload_part_urls`,
`build_resumable_upload_url`, `build_abort_url`). Constructs URLs
directly to the proxy endpoint instead of calling coordination APIs.
- **`_create_request_builder()`** — factory method that returns the
proxy builder when the proxy is available, otherwise the presigned URL
builder.
- **`_probe_storage_proxy()`** — GET to the proxy ping endpoint with SDK
auth. Result is cached.
- **`_create_storage_proxy_session()`** — cached `requests.Session` with
`session.auth` callback for SDK credentials. Protected by a lock for
thread safety.
- **`_cloud_provider_session()`** — updated to return the authenticated
proxy session when the proxy is active. Session creation protected by a
lock for thread safety.
- **`_get_hostname()`** — updated to return the proxy hostname when the
flag is enabled and the probe succeeds.
- **`_abort_multipart_upload`** — now uses the builder instead of inline
`create-abort-upload-url` API call.
- Test infrastructure: `UploadTestCase` gains `use_storage_proxy`
parameter. Storage proxy test cases added to existing
`MultipartUploadTestCase` and `ResumableUploadTestCase` parametrized
lists. Presigned URL coordination handlers assert they are never called
when storage proxy is active.

## How is this tested?

- Unit tests.
- E2E validated on two cloud environments with 200 MB file uploads:
- **AWS** (AWS staging): multipart upload via proxy, 20 parallel parts,
hash verified.
- **GCP** (GCP staging): resumable upload via proxy, 20 sequential
chunks, hash verified.
- Both environments confirmed: `_StorageProxyRequestBuilder` used, no
`create-upload-part-urls`/`create-resumable-upload-url` calls,
`session.auth` callback active, uploaded and downloaded hashes match.


NO_CHANGELOG=true
@parthban-db parthban-db force-pushed the parthban-db/stack/refactor-files-client-5 branch from 14e1367 to 214368b Compare March 13, 2026 09:58
@github-actions
Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1324
  • Commit SHA: 214368bb27264d3f66156799b719bf75adac6911

Checks will be approved automatically on success.

@parthban-db parthban-db added this pull request to the merge queue Mar 13, 2026
Merged via the queue into main with commit fa94f57 Mar 13, 2026
17 of 18 checks passed
@parthban-db parthban-db deleted the parthban-db/stack/refactor-files-client-5 branch March 13, 2026 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants