Skip to content

Bootstrap fails silently when disk space is low #97

@bussyjd

Description

@bussyjd

Description

The bootstrap process times out with a generic "pods did not become ready" error when the host system has low disk space (>85% full). Kubernetes detects disk pressure and taints the node, preventing all pods from scheduling.

Current Behavior

When running obol bootstrap or obolup.sh on a system with low disk space:

  1. Cluster creates successfully
  2. Infrastructure deployment appears to complete
  3. Bootstrap waits for pods to become ready
  4. Times out after 5-10 minutes with: cluster readiness check failed: pods did not become ready within timeout
  5. All application pods stuck in Pending state with no clear indication of the root cause

Expected Behavior

The bootstrap process should detect disk pressure and provide a clear, actionable error message such as:

  • "Insufficient disk space detected. Please free up disk space and try again."
  • "Kubernetes node has disk pressure. X GB of free space required."

Steps to Reproduce

  1. Fill disk to >85% capacity
  2. Run ./obolup.sh or obol bootstrap
  3. Observe timeout error without clear indication of disk issue

Environment

  • macOS with Docker Desktop
  • Disk usage: 92% (72GB free out of 926GB)

Additional Context

The issue is not immediately visible because:

  • Docker prune can free up 80GB+ of space
  • kubectl describe node shows DiskPressure: True but bootstrap doesn't check this
  • Pod events show taint-related scheduling failures but aren't surfaced to user

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions