Skip to content

feat: adds pdbs to workbench sessions#67

Draft
amdove wants to merge 1 commit intomainfrom
session-pdbs
Draft

feat: adds pdbs to workbench sessions#67
amdove wants to merge 1 commit intomainfrom
session-pdbs

Conversation

@amdove
Copy link
Contributor

@amdove amdove commented Feb 3, 2026

Adds Pod Disruption Budgets (PDBs) for Workbench session pods to prevent user sessions from being evicted during node drains or cluster maintenance.

Problem

Previously, only Workbench server pods had PDB protection. Session pods (the user's R/Python sessions created by Launcher) had no eviction protection. During node drains or cluster scaling, Kubernetes could terminate active user sessions without warning.

Solution

Creates a second PDB for each Workbench spec that targets session pods:

  • Name: -workbench-sessions
  • Selector: launcher-instance-id: (label set by Launcher on all session pods)
  • Policy: maxUnavailable: 0 (no sessions can be evicted)

This ensures user sessions remain persistent and uninterruptible during cluster operations.

@claude
Copy link

claude bot commented Feb 3, 2026

Claude finished @amdove's task —— View job


Review Complete

  • Read review guidelines
  • Analyze PR diff and changed files
  • Review code changes
  • Submit review

I've submitted a review with feedback on the session PDB implementation. Key suggestions:

  1. Extract launcher-instance-id as a constant since it's an external contract
  2. Document the operational implications of maxUnavailable=0 blocking node drains
  3. Add test coverage for the new session PDB creation

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Session PDBs for Workbench

Summary

This PR refactors CreateOrUpdateDisruptionBudget to accept a pre-built PDB spec and adds a new session PDB for workbench pods. The change improves separation of concerns and adds protection for user sessions during node drains.

Positive Aspects

  • The refactored CreateOrUpdateDisruptionBudget signature is cleaner - separating PDB construction from persistence improves testability
  • The new DefineSessionDisruptionBudget function is well-documented and follows existing patterns
  • Consistent updates across all three product controllers (Connect, PackageManager, Workbench)

Suggestions

api/product/disruption_budget.go:61-63
Consider extracting the launcher-instance-id label key as a constant since it represents an external contract with the launcher:

const LauncherInstanceIDLabel = "launcher-instance-id"

api/product/disruption_budget.go:66-68
maxUnavailable=0 will block node drains indefinitely until sessions complete. This is likely intentional for session persistence, but worth documenting in operator docs that cluster admins need to ensure sessions have reasonable timeouts or plan for manual intervention during maintenance windows.

internal/controller/core/workbench.go:1006-1010
Per the controller review checklist: tests should exist for this change. Consider adding a test case to workbench_test.go to verify the session PDB is created with:

  • The expected name pattern ({component}-sessions)
  • The correct selector (launcher-instance-id: {component})
  • maxUnavailable=0

Checklist Review

  • Reconciliation appears idempotent (uses CreateOrUpdateResource)
  • Config flows correctly from Workbench
  • Tests for the new session PDB (suggested above)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant