
Track bundle deploy state file sizes in telemetry #5180

Draft
shreyas-goenka wants to merge 1 commit into databricks:main from shreyas-goenka:shreyas-goenka/state-size-telemetry


shreyas-goenka commented May 5, 2026

Adds a new typed BundleResourcesMetadata struct under BundleDeployExperimental that captures per-resource-type metadata for a bundle deploy:

  • count of resources of each type declared in the bundle configuration (replaces the deprecated resource_*_count fields)
  • maximum, mean, and median state size in bytes across resources of that type
  • whole state file size on disk
  • deployment engine ("direct" or "terraform")
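As a rough illustration of the size statistics listed above, the Go sketch below computes the maximum, mean, and median from the per-resource state sizes of a single resource type. ResourceTypeStats and computeStats are hypothetical names, not the PR's actual code. The sketch assumes the median of an even-length list is the average of the two middle values, and it leaves out the count, which the PR takes from the bundle configuration rather than from the state.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceTypeStats is a hypothetical shape for one per-type entry; the real
// BundleResourcesMetadata fields may be named and typed differently.
type ResourceTypeStats struct {
	MaxSizeBytes    int
	MeanSizeBytes   float64
	MedianSizeBytes float64
}

// computeStats summarizes the serialized state sizes (in bytes) of all
// resources of one type. An empty input yields the zero value.
func computeStats(sizes []int) ResourceTypeStats {
	n := len(sizes)
	if n == 0 {
		return ResourceTypeStats{}
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]int(nil), sizes...)
	sort.Ints(sorted)

	total := 0
	for _, s := range sorted {
		total += s
	}

	// Odd length: middle element. Even length: mean of the two middle elements.
	var median float64
	if n%2 == 1 {
		median = float64(sorted[n/2])
	} else {
		median = float64(sorted[n/2-1]+sorted[n/2]) / 2
	}

	return ResourceTypeStats{
		MaxSizeBytes:    sorted[n-1],
		MeanSizeBytes:   float64(total) / float64(n),
		MedianSizeBytes: median,
	}
}

func main() {
	stats := computeStats([]int{400, 100, 300, 200})
	fmt.Printf("%+v\n", stats)
}
```

Sorting a copy keeps the helper side-effect free; for the small per-type lists a bundle produces, the O(n log n) sort is unlikely to matter.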

For Terraform deploys, the tfstate is translated to the direct-engine representation before sizing so per-type stats are comparable across engines.

Companion proto PR: https://github.com/databricks-eng/universe/pull/1892380

The deprecated Resource*Count Go fields continue to be populated during the transition and are marked // Deprecated:. Measurement is isolated in bundle/phases/resources_metadata.go, which reads the on-disk state file once at telemetry-emission time, and the wiring is a one-line change in telemetry.go. To remove the feature: delete the new module, revert the call site, and revert the proto/Go field.

This pull request and its description were written by Isaac.

shreyas-goenka force-pushed the shreyas-goenka/state-size-telemetry branch 2 times, most recently from a81fff3 to 8a52a00, on May 5, 2026 06:17
Adds a new typed BundleResourcesMetadata struct under
BundleDeployExperimental, capturing per-resource-type metadata for a
bundle deploy:
- count of resources of each type declared in the bundle configuration
- max, mean, median state size in bytes across resources of that type
- whole state file size on disk
- deployment engine ("direct" or "terraform")

For Terraform deployments, the tfstate is translated to the
direct-engine representation (via the existing TerraformToGroupName map)
before sizing, so per-type stats are comparable across engines.
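The type-name half of that translation might look roughly like the sketch below. The map entries are an assumed, illustrative subset standing in for TerraformToGroupName (whose real contents are not shown in this PR), tfResource is a hypothetical simplified view of a tfstate entry, and converting each resource's state body into its direct-engine form is not covered.

```go
package main

import "fmt"

// terraformToGroupName is a hypothetical, illustrative subset standing in
// for the CLI's TerraformToGroupName map; the real map may differ.
var terraformToGroupName = map[string]string{
	"databricks_job":      "jobs",
	"databricks_pipeline": "pipelines",
}

// tfResource is a simplified, hypothetical view of one tfstate resource.
type tfResource struct {
	Type string
	Name string
}

// groupByDirectType buckets Terraform resources under direct-engine group
// names so per-type stats line up across engines. In this sketch, types
// without a mapping are skipped.
func groupByDirectType(resources []tfResource) map[string][]tfResource {
	out := make(map[string][]tfResource)
	for _, r := range resources {
		group, ok := terraformToGroupName[r.Type]
		if !ok {
			continue
		}
		out[group] = append(out[group], r)
	}
	return out
}

func main() {
	groups := groupByDirectType([]tfResource{
		{Type: "databricks_job", Name: "etl"},
		{Type: "databricks_job", Name: "report"},
		{Type: "databricks_pipeline", Name: "ingest"},
		{Type: "databricks_permissions", Name: "etl_perms"},
	})
	fmt.Println(len(groups["jobs"]), len(groups["pipelines"]))
}
```

Skipping unmapped types is a choice made for this sketch; a real implementation might instead record them under a catch-all key.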

The new count field replaces the deprecated
DatabricksBundleDeployEvent.resource_*_count fields; both are populated
during the transition. The Go mirror marks the deprecated
Resource*Count fields with a "// Deprecated:" comment.
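For reference, Go's convention for this is a doc-comment paragraph starting with "Deprecated:", which linters such as staticcheck and editors using gopls surface to callers. The struct and field names below are hypothetical and trimmed down; the sketch only shows the old and new count fields being filled from the same value during the transition.

```go
package main

import "fmt"

// TypeMetadata is a hypothetical per-type entry in the new metadata.
type TypeMetadata struct {
	Count int64
}

// DeployEvent is a hypothetical, trimmed-down mirror of the telemetry event;
// the field names are illustrative, not the generated Go types.
type DeployEvent struct {
	// Deprecated: use ResourcesMetadata["jobs"].Count instead.
	ResourceJobCount int64

	ResourcesMetadata map[string]TypeMetadata
}

// populate fills both the deprecated field and the new per-type count from
// the same source value, so consumers of either see consistent numbers.
func populate(jobCount int64) DeployEvent {
	return DeployEvent{
		ResourceJobCount: jobCount,
		ResourcesMetadata: map[string]TypeMetadata{
			"jobs": {Count: jobCount},
		},
	}
}

func main() {
	e := populate(3)
	fmt.Println(e.ResourceJobCount, e.ResourcesMetadata["jobs"].Count)
}
```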

Measurement is performed at telemetry-emission time by reading the
on-disk state file once, so this lands as a single isolated module
(bundle/phases/resources_metadata.go) with one new line at the call
site; there is no instrumentation in deploy mutators, state-management
code, or bundle.Metrics. To remove the feature: delete the new module
and revert one line in telemetry.go plus the proto/Go field.

Requires the new resources_metadata field on BundleDeployExperimental
from the universe PR. Lumberjack drops unknown fields, so the two PRs
can land in either order.
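Lumberjack's internals are not shown in this PR, but the forward-compatibility property it relies on can be illustrated with Go's encoding/json, which ignores unknown keys by default. In this sketch, oldEvent stands in for a hypothetical consumer that predates resources_metadata, and the JSON key names are assumptions; the actual telemetry schema is defined in proto, so this is an analogy rather than the real decoding path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// oldEvent models a hypothetical consumer that predates resources_metadata.
type oldEvent struct {
	ResourceJobCount int64 `json:"resource_job_count"`
}

// newPayload is what a newer producer might emit, including the new field.
const newPayload = `{"resource_job_count": 3, "resources_metadata": {"jobs": {"count": 3}}}`

// decodeLenient drops unknown fields: json.Unmarshal silently ignores keys
// with no matching struct field.
func decodeLenient(payload string) (oldEvent, error) {
	var e oldEvent
	err := json.Unmarshal([]byte(payload), &e)
	return e, err
}

// decodeStrict rejects unknown fields instead, which would make the order in
// which producer and consumer changes land matter.
func decodeStrict(payload string) (oldEvent, error) {
	var e oldEvent
	dec := json.NewDecoder(strings.NewReader(payload))
	dec.DisallowUnknownFields()
	err := dec.Decode(&e)
	return e, err
}

func main() {
	e, err := decodeLenient(newPayload)
	fmt.Println(e.ResourceJobCount, err)
	_, err = decodeStrict(newPayload)
	fmt.Println(err != nil)
}
```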
shreyas-goenka force-pushed the shreyas-goenka/state-size-telemetry branch from 8a52a00 to 2ddb2e5 on May 5, 2026 06:31

github-actions (bot) commented May 5, 2026

An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 5180
  • Commit SHA: 2ddb2e5a4ffe714958f97987228515a192f4da02

Checks will be approved automatically on success.
