Contributor

@tshtark tshtark commented Dec 11, 2025

Summary

Adds support for user-controlled build IDs via spec.workerOptions.customBuildID, enabling rolling updates for non-workflow code changes while preserving new deployment creation for workflow code changes.

Problem

With the PINNED versioning strategy and long-running workflows, any pod spec change (image tag, env vars, resources) generates a new build ID, and every new build ID creates another deployment that must stay alive until its pinned workflows finish.

Result: 10-15 active deployments running simultaneously, causing resource waste and operational complexity.

Solution

Allow users to set a stable build ID via spec.workerOptions.customBuildID. When the build ID is stable but pod spec changes, trigger a rolling update instead of creating a new deployment.

Key Changes

  • Add CustomBuildID field to WorkerOptions struct in API types
  • Update ComputeBuildID to use custom field when set
  • Implement hash-based drift detection using SHA256 of user-provided pod template spec
  • Store hash in temporal.io/pod-template-spec-hash annotation on deployments
  • Extract ApplyControllerPodSpecModifications as shared helper for code reuse
  • Only check for drift when customBuildID is explicitly set by user

Drift Detection

When spec.workerOptions.customBuildID is set, the controller detects spec drift by comparing a SHA256 hash of the user-provided pod template spec against the hash stored in the deployment annotation.

How it works:

  1. When a deployment is created, the controller computes a hash of the user-provided pod template spec (before controller modifications) and stores it in an annotation
  2. On each reconciliation, the controller computes the hash of the current spec and compares it to the stored hash
  3. If hashes differ, a rolling update is triggered
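
The steps above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the commit notes say the real ComputePodTemplateSpecHash() uses spew on the Kubernetes pod template type, whereas this sketch assumes simplified stand-in structs and JSON serialization so that it needs only the standard library. The names computeSpecHash and hasDrifted are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Container and PodTemplateSpec are simplified, hypothetical stand-ins for the
// Kubernetes corev1 types, so the sketch needs only the standard library.
type Container struct {
	Name  string            `json:"name"`
	Image string            `json:"image"`
	Env   map[string]string `json:"env,omitempty"`
}

type PodTemplateSpec struct {
	Containers []Container `json:"containers"`
}

// hashAnnotation is the annotation key named in the PR description.
const hashAnnotation = "temporal.io/pod-template-spec-hash"

// computeSpecHash returns a hex-encoded SHA256 of the user-provided spec.
// encoding/json emits struct fields in declaration order and sorts map keys,
// so equal specs always produce equal hashes.
func computeSpecHash(spec PodTemplateSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// hasDrifted compares the desired spec's hash with the hash stored on the
// existing deployment. A missing annotation (legacy deployment) is treated as
// "no drift", matching the backwards-compatibility rule in the description.
func hasDrifted(annotations map[string]string, desired PodTemplateSpec) (bool, error) {
	stored, ok := annotations[hashAnnotation]
	if !ok {
		return false, nil
	}
	current, err := computeSpecHash(desired)
	if err != nil {
		return false, err
	}
	return stored != current, nil
}

func main() {
	v1 := PodTemplateSpec{Containers: []Container{{Name: "worker", Image: "my-worker:v1.2.3"}}}
	v2 := PodTemplateSpec{Containers: []Container{{Name: "worker", Image: "my-worker:v1.2.4"}}}

	// Step 1: store the hash when the deployment is created.
	h, _ := computeSpecHash(v1)
	annotations := map[string]string{hashAnnotation: h}

	// Steps 2 and 3: recompute on reconciliation and compare.
	same, _ := hasDrifted(annotations, v1)
	changed, _ := hasDrifted(annotations, v2)
	fmt.Println(same, changed) // false true
}
```

Hashing the spec as the user wrote it, rather than the live Deployment object, is what keeps cluster-populated defaults out of the comparison.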

This approach:

  • Detects ALL changes to the pod template spec (images, env vars, commands, volumes, resources, etc.)
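  • Avoids false positives from cluster-provided values and timestamps, because the hash is computed from the user-provided spec rather than from the live deployment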
  • Maintains backwards compatibility (legacy deployments without the hash annotation are not affected)

Usage

apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
  name: my-worker
spec:
  workerOptions:
    connectionRef:
      name: my-connection
    temporalNamespace: default
    # Set this to your workflow code hash (e.g., from CI/CD)
    customBuildID: "wf-a1b2c3d4"
  template:
    spec:
      containers:
        - name: worker
          image: my-worker:v1.2.3  # Can change without new deployment

Behavior Matrix

Scenario                              | Build ID                         | Result
No customBuildID field                | Auto-generated from image + hash | Existing behavior
customBuildID set, first deploy       | Uses field value                 | New deployment
customBuildID unchanged, spec changed | Same build ID                    | Rolling update
customBuildID changed                 | New build ID                     | New deployment
Empty/invalid customBuildID           | Falls back to auto-generated     | Existing behavior
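
The matrix can be read as a small decision function. The sketch below is an illustration under stated assumptions, not the controller's implementation: resolveBuildID and decide are hypothetical names, the validation rule (label-like characters, at most 63 of them) is a guess because the PR does not spell out what counts as invalid, and the "no change" outcome is an implied case that the matrix does not list.

```go
package main

import (
	"fmt"
	"regexp"
)

// validBuildID is an ASSUMED validation rule (label-like characters, at most
// 63 of them); the operator's real rule may differ.
var validBuildID = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]{0,62}$`)

// resolveBuildID returns the custom build ID when it is set and valid, and
// otherwise falls back to the auto-generated one.
func resolveBuildID(custom, autoGenerated string) (id string, isCustom bool) {
	if custom != "" && validBuildID.MatchString(custom) {
		return custom, true
	}
	return autoGenerated, false
}

// Action is the outcome of one reconciliation.
type Action string

const (
	NewDeployment Action = "new deployment"
	RollingUpdate Action = "rolling update"
	NoChange      Action = "no change" // implied case, not listed in the matrix
)

// decide mirrors the rows of the behavior matrix. deployedIDs is the set of
// build IDs that already have a deployment; specDrifted is the result of the
// pod template spec hash comparison.
func decide(deployedIDs map[string]bool, buildID string, isCustom, specDrifted bool) Action {
	if !deployedIDs[buildID] {
		return NewDeployment
	}
	// Drift is only checked when the user set customBuildID explicitly.
	if isCustom && specDrifted {
		return RollingUpdate
	}
	return NoChange
}

func main() {
	deployed := map[string]bool{"wf-a1b2c3d4": true}

	// Same custom ID, pod spec changed.
	id, custom := resolveBuildID("wf-a1b2c3d4", "auto-1234")
	fmt.Println(id, "->", decide(deployed, id, custom, true)) // wf-a1b2c3d4 -> rolling update

	// Custom ID changed.
	id, custom = resolveBuildID("wf-e5f6a7b8", "auto-1234")
	fmt.Println(id, "->", decide(deployed, id, custom, false)) // wf-e5f6a7b8 -> new deployment

	// Invalid custom ID falls back to the auto-generated one.
	id, custom = resolveBuildID("not valid!", "auto-1234")
	fmt.Println(id, "->", decide(deployed, id, custom, false)) // auto-1234 -> new deployment
}
```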

Backwards Compatibility

  • Empty or invalid customBuildID values fall back to existing hash-based generation
  • No changes required for users who don't use this feature
  • Drift detection only runs when spec.workerOptions.customBuildID is explicitly set
  • Legacy deployments without the pod template spec hash annotation are not affected

Test plan

  • CustomBuildID spec field override (6 test cases)
  • Hash computation tests (determinism, different images/env vars/commands/volumes)
  • Drift detection with hash comparison (7 test cases)
  • Backwards compatibility for legacy deployments without hash annotation
  • Edge cases: empty values, invalid chars, long values
  • go test ./... passes

@tshtark tshtark requested review from a team and jlegrone as code owners December 11, 2025 11:44
@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 4f23d55 to 6362a09 Compare December 11, 2025 11:58
Collaborator

@carlydf carlydf left a comment

Hi there, sorry for the late review, just getting back to this after the holidays. I'm committed to turning this PR around quickly though and I think it's really close! I'll be keeping an eye on this tomorrow so we can iterate quickly.

@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 8ba2666 to 7aad356 Compare January 8, 2026 06:53
Collaborator

@carlydf carlydf left a comment

So close to approving! The main decision I still want to make together is how to name the new field.

My review was delayed today because I wanted to write an integration test for this new functionality, which you can look at in this commit. Confirmed it works :) In the early phases of this project, features that only had unit tests and no integration tests experienced regressions, so I want to make sure that doesn't happen here. Feel free to cherry-pick my test commit onto your fork (if that can be done with a fork), or I can merge my integration test right after your PR merges.

@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 2276b37 to cd68fc9 Compare January 10, 2026 13:56
@tshtark tshtark changed the title from "feat(api): add stable build ID override via spec.workerOptions.buildID" to "feat(api): add stable build ID override via spec.workerOptions.customBuildID" Jan 10, 2026
Collaborator

@carlydf carlydf left a comment

Approved! You'll need to run make fmt-imports to get the linter to pass.
I also changed the repo settings so the CI jobs should all run when you next push instead of requiring admin approval.

tshtark and others added 4 commits January 13, 2026 10:07
Adds support for user-controlled build IDs via spec.workerOptions.buildID,
enabling rolling updates for non-workflow code changes while preserving
new deployment creation for workflow code changes.

Key changes:
- Add BuildID field to WorkerOptions struct in API types
- Update ComputeBuildID to use spec field instead of annotation
- Implement drift detection by comparing deployed spec with desired spec
- Only check for drift when buildID is explicitly set by user

Drift detection currently monitors: replicas, minReadySeconds, container
images, container resources (limits/requests), and init container images.
Other fields (env vars, volumes, commands) are not monitored - this is
documented in the BuildID field comment.

Note: CRD regeneration includes some unrelated changes from controller-gen
(default values for name fields, x-kubernetes-map-type annotations). These
are standard regeneration artifacts and don't affect functionality.

This solves deployment proliferation for PINNED versioning strategy where
any pod spec change (image tag, env vars, resources) would generate a new
build ID and create unnecessary deployments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Replace per-field comparison with SHA256 hash of user-provided pod template
spec. This detects ALL changes (env vars, commands, volumes, etc.) rather
than a subset of fields.

Changes:
- Add ComputePodTemplateSpecHash() using spew for deterministic hashing
- Store hash in pod-template-spec-hash annotation on deployments
- Compare hashes instead of individual fields for drift detection
- Add backwards compatibility for legacy deployments without hash

Tests:
- 6 tests for hash computation (images, env vars, commands, volumes)
- Updated drift detection tests including env var change case
- Backwards compatibility test for deployments without hash annotation

Docs:
- Update BuildID field documentation to reflect hash-based detection

Address PR review feedback:

1. Extract shared helper ApplyControllerPodSpecModifications() to avoid
   duplicating pod spec modification code between NewDeploymentWithOwnerRef
   and updateDeploymentWithPodTemplateSpec.

2. Rename spec.workerOptions.buildID to spec.workerOptions.customBuildID
   to make it clear that providing your own build ID is optional and
   requires careful management to avoid NDEs.

Changes:
- Add ApplyControllerPodSpecModifications() in internal/k8s/deployments.go
- Update NewDeploymentWithOwnerRef to use the shared helper
- Update updateDeploymentWithPodTemplateSpec to use the shared helper
- Rename BuildID -> CustomBuildID in WorkerOptions struct
- Update all test references to use CustomBuildID
- Regenerate CRDs with new field name

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 7951e7d to bb19ed7 Compare January 13, 2026 08:07
Contributor Author

tshtark commented Jan 13, 2026

@carlydf Thanks!
But I see that the CI workflows are still waiting for approval to run.
Could you approve those?

@carlydf carlydf merged commit 5784f7a into temporalio:main Jan 13, 2026
11 checks passed