
RFC: Alembic Adoption for Database Migrations in ADK-Python #4387


Status: Draft

Author: @achilleasatha

Date: 2026-02-05

Related Issue: #3343 - Update database schema automatically


1. Summary

Adopt Alembic as the database migration framework for adk-python to enable automatic schema updates, migration-coordinated deployments with minimal service disruption, and proper rollback support for enterprise production environments.

2. Motivation

2.1 Current Problems

From GitHub issue #3343 and community feedback:

  1. Manual migrations are not viable for production: Users must manually execute migration scripts when upgrading ADK versions. For enterprise deployments with multiple environments and multiple ADK-based agents per environment, this translates into substantial manual work: SSHing into servers and running scripts.

  2. Breaking changes in minor versions: Schema changes in ADK 1.14.0, 1.17.0, and 1.19.0 broke deployments without clear migration paths or automation.

  3. No support for Kubernetes deployments: Cannot run migrations via Helm hooks or init containers. SSH access to every pod is not scalable or feasible.

  4. Race conditions: Multiple pods starting simultaneously can attempt migrations concurrently, causing failures. The current migration script has no locking mechanism to guard against this.

  5. No rollback capability: The current system does not support downgrade migrations, making failed deployments difficult to recover from.

  6. Permission constraints: Service accounts may lack DDL execution permissions, requiring manual intervention.

3. Proposal

3.1 Overview

Integrate Alembic as the primary database migration framework with:

  • Automatic migrations on startup (feature-flagged with ADK_AUTO_MIGRATE_DB)
  • Distributed locking to prevent race conditions across multiple pods
  • Kubernetes Helm hook support for pre-deployment migrations
  • Rollback capability via Alembic's downgrade functionality
  • Comprehensive testing across PostgreSQL, MySQL, and SQLite (plus any other databases adk-python chooses to support)

Responsibility model: ADK owns schema definition and migration logic; operators retain full control over when and how migrations are executed (via environment variables or Helm hooks).

3.2 Key Features

For Developers:

  • Generate migrations: adk migrate generate --message "add_column"
  • Alembic autogenerate compares SQLAlchemy models to database
  • Clear upgrade/downgrade paths with version tracking

For Operations:

  • Helm pre-install/pre-upgrade hooks run migrations before app deployment
  • Database locking (e.g., PostgreSQL advisory locks)

For Enterprise:

  • Migration-coordinated deployments with proper synchronization
  • Audit logging of all migration events
  • Support for Cloud SQL Proxy, Workload Identity, and secret management
  • Comprehensive documentation with Helm chart examples

4. Design Principles

  1. Safe by default: Auto-migration disabled by default (ADK_AUTO_MIGRATE_DB=false) to prevent unexpected schema changes
  2. Backward compatible: Phased rollout across releases with parallel support (e.g., deprecation warnings in minor and patch releases, destructive operations only in major releases)
  3. Well-tested: Integration tests with real databases (PostgreSQL, MySQL, SQLite)
  4. Production-ready: Distributed locking, timeout handling, error recovery
  5. Well-documented: Migration guides, Helm examples, troubleshooting docs

5. Technical Approach

5.1 Architecture

Application Startup
        ↓
DatabaseSessionService._prepare_tables()
        ↓
Check: ADK_AUTO_MIGRATE_DB?
        ↓ (true)
AlembicRunner.check_needs_migration()
        ↓ (yes)
Acquire DatabaseMigrationLock
        ↓
Run Alembic upgrade to "head"
        ↓
Alembic updates alembic_version table
        ↓
Migration script updates adk_internal_metadata.schema_version
        ↓
Release lock
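
The flow above can be sketched in a few lines of Python. This is only an illustration, not the proposed implementation: prepare_tables and its callable parameters are hypothetical stand-ins for DatabaseSessionService._prepare_tables(), the migration check, DatabaseMigrationLock, and the Alembic upgrade. The re-check under the lock is an added assumption that is not shown in the diagram; it avoids a redundant upgrade when another instance finished while this one was waiting.

```python
from contextlib import nullcontext
from typing import Callable, ContextManager


def prepare_tables(
    auto_migrate: bool,
    needs_migration: Callable[[], bool],
    migration_lock: Callable[[], ContextManager],
    upgrade_to_head: Callable[[], None],
) -> bool:
    """Run the startup flow; return True if this process ran the upgrade."""
    if not auto_migrate:
        return False  # operator runs migrations out of band (e.g. Helm hook)
    if not needs_migration():
        return False  # schema is already at head
    with migration_lock():
        # Assumption: re-check under the lock, because another instance
        # may have migrated while this one waited to acquire it.
        if not needs_migration():
            return False
        upgrade_to_head()
    return True


# Example wiring with stand-ins (no real database involved).
pending = [True]
ran = prepare_tables(
    auto_migrate=True,
    needs_migration=lambda: pending[0],
    migration_lock=nullcontext,  # a no-op lock for the demo
    upgrade_to_head=lambda: pending.__setitem__(0, False),
)
```

Passing the steps in as callables keeps the control flow testable without a database.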

5.2 Distributed Locking

Version Tracking: Alembic uses alembic_version table to track migration revisions. ADK additionally maintains adk_internal_metadata.schema_version as a higher-level compatibility layer for existing tooling and schema detection logic.

PostgreSQL: Advisory locks (non-blocking, session-scoped)

SELECT pg_try_advisory_lock(1234567890);
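
The key 1234567890 above is a literal. One option, sketched here as an assumption rather than as part of the proposal, is to derive the key from a lock name so it is stable across releases and unlikely to collide with other advisory-lock users of the same database. PostgreSQL advisory lock functions accept a bigint key, so the helper maps the name to a signed 64-bit integer. The lock name and the psycopg-style named placeholders are hypothetical.

```python
import hashlib


def advisory_lock_key(name: str) -> int:
    """Map a lock name to a signed 64-bit integer usable as a bigint key."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# Hypothetical lock name; any stable string would do.
MIGRATION_LOCK_KEY = advisory_lock_key("adk.session.migration")

# Statements to run on ONE connection. The lock is session-scoped, so the
# migration must use the same session that acquired the lock.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%(key)s)"
UNLOCK_SQL = "SELECT pg_advisory_unlock(%(key)s)"
```

Because the lock belongs to the session, it is released by PostgreSQL if the migrating process dies and its connection closes, which is why no expiry is needed on this path.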

MySQL/SQLite: Table-based locks with expiration

  • Lock expires after 5 minutes (default, configurable via ADK_MIGRATION_TIMEOUT)
  • Automatic cleanup of stale locks
  • Note: Best-effort locking to prevent accidental concurrent migrations. Not a strict consensus mechanism. Clock skew or stale locks may affect correctness in edge cases.
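
A minimal sketch of a table-based lock with expiration, shown against SQLite from the standard library so it runs anywhere. The table name adk_migration_lock, its columns, and the function names are hypothetical. The single-row constraint (id = 1) makes the INSERT fail while a live lock exists, and stale rows are purged before each attempt.

```python
import sqlite3
import time
from typing import Optional

LOCK_TTL_SECONDS = 300  # 5-minute default described above


def init_lock_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adk_migration_lock ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " owner TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(
    conn: sqlite3.Connection,
    owner: str,
    now: Optional[float] = None,
    ttl: float = LOCK_TTL_SECONDS,
) -> bool:
    """Try to take the lock; return False if a live lock is held."""
    now = time.time() if now is None else now
    with conn:  # one transaction: purge a stale lock, then try to insert
        conn.execute(
            "DELETE FROM adk_migration_lock WHERE expires_at <= ?", (now,)
        )
        try:
            conn.execute(
                "INSERT INTO adk_migration_lock (id, owner, expires_at)"
                " VALUES (1, ?, ?)",
                (owner, now + ttl),
            )
        except sqlite3.IntegrityError:
            return False  # another owner holds an unexpired lock
    return True


def release(conn: sqlite3.Connection, owner: str) -> None:
    """Release the lock, but only if this owner still holds it."""
    with conn:
        conn.execute(
            "DELETE FROM adk_migration_lock WHERE id = 1 AND owner = ?",
            (owner,),
        )
```

The expiry comparison uses the caller's clock, which is where the clock-skew caveat in the note above comes from; comparing against the database server's clock would be one way to narrow that gap.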

SQLite: Migrations assume single-writer deployments (e.g., local development)

Behavior:

  • Instance A acquires lock → runs migration
  • Instances B-E wait → poll for completion every 2 seconds
  • All instances verify schema after migration completes
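
The waiting behavior of instances B-E could look roughly like the loop below. The function name and parameters are hypothetical; the 2-second interval and 5-minute timeout mirror the values stated in this section. The clock and sleep functions are injectable so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_migration(
    is_complete: Callable[[], bool],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the schema is at head; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if is_complete():
            return True
        if clock() >= deadline:
            return False  # caller decides whether to fail startup
        sleep(poll_interval)
```

A waiting instance would still verify the schema itself after this returns True, as the last bullet above describes.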

When is locking used?

| Deployment Mode | Locking Needed? | Reason |
|---|---|---|
| Helm hook (K8s) | ❌ No | Single Job pod runs migration before app pods start - sequential by design |
| Auto-migration (ADK_AUTO_MIGRATE_DB=true) | ✅ Yes | Multiple app instances may start simultaneously and call _prepare_tables() |

Use cases requiring locks:

  • Cloud Run: Multiple instances scaling up simultaneously with shared Cloud SQL
  • Docker Compose: Multiple service replicas sharing a database
  • K8s without Helm hooks: Deploying with ADK_AUTO_MIGRATE_DB=true on multiple pods
  • Local development: Running multiple instances for testing

5.3 Kubernetes Integration

Migration Job (Helm pre-install/pre-upgrade hook):

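apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migration
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
spec:
  template:
    spec:
      restartPolicy: Never  # Job pods must use Never or OnFailure
      containers:
      - name: migration
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command: ["python", "-m", "google.adk.cli.migration_entrypoint"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: adk-db-secret
              key: url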

Application Deployment:

env:
- name: ADK_AUTO_MIGRATE_DB
  value: "false"  # Disabled when using migration Job

6. Migration Workflow

6.1 For Developers

Creating a new migration:

# 1. Generate migration template for session database schema
adk migrate generate --message "add_session_tags"

# 2. Review auto-generated migration file
# alembic/versions/004_add_session_tags_v2.py

# 3. Customize if needed (data transformations, etc.)

# 4. Test migration
pytest tests/unittests/sessions/migration/

# 5. Commit migration file
git add alembic/versions/004_add_session_tags_v2.py
git commit -m "feat(migration): add session tags column"

Applying migration:

# Local development (applies to session database schema)
adk migrate session

# Production (via Helm)
helm upgrade my-app ./chart --set image.tag=v1.26.0
# Migration runs automatically via pre-upgrade hook

6.2 For Operations

Migrations run inside the library; users do not write migration code or operate migration scripts. There are two deployment modes:

Mode 1: Helm Pre-Upgrade Hook (Recommended for Production/Kubernetes)

# Migration runs as separate Job BEFORE app pods start
# See section 5.3 for full example
# In application deployment, disable auto-migration
env:
  - name: ADK_AUTO_MIGRATE_DB
    value: "false"

Mode 2: Auto-Migration on Startup (Local dev, testing, simple deployments)

# Just set environment variable - no code needed
export ADK_AUTO_MIGRATE_DB=true

# Migration runs automatically when DatabaseSessionService initializes
# This happens in _prepare_tables() method inside the library
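
Inside the library, reading the flag might look like the sketch below. The helper name and the set of accepted truthy spellings are assumptions; the RFC only specifies the variable name and that the default is false.

```python
import os
from typing import Mapping

# Assumption: which spellings count as "enabled" is not specified above.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def auto_migrate_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when ADK_AUTO_MIGRATE_DB is explicitly enabled."""
    raw = environ.get("ADK_AUTO_MIGRATE_DB", "false")
    return raw.strip().lower() in _TRUE_VALUES
```

Treating every unrecognized value as false keeps the safe-by-default principle from section 4: a typo in the variable never triggers a schema change.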

Monitoring:

  • Track migration duration (e.g., Prometheus histogram)
  • Alert on migration failures
  • Monitor schema version lag across environments
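
As a sketch of the first two monitoring points, a context manager can time each migration and report the outcome. The logger name, the timed_migration helper, and the observe callback are hypothetical; in practice the callback could feed a Prometheus histogram or any other metrics client.

```python
import logging
import time
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger("adk.migration")  # hypothetical logger name


@contextmanager
def timed_migration(
    revision: str, observe: Callable[[str, float], None]
) -> Iterator[None]:
    """Time the wrapped block and report (outcome, seconds) to observe."""
    start = time.perf_counter()
    outcome = "failure"
    try:
        yield
        outcome = "success"  # only reached if the block did not raise
    finally:
        elapsed = time.perf_counter() - start
        observe(outcome, elapsed)
        logger.info(
            "migration to %s finished: %s in %.3fs", revision, outcome, elapsed
        )
```

Failures are still re-raised to the caller, so alerting can key off either the reported outcome or the exception itself.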

Rollback (automated via Helm pre-rollback hook):

⚠️ Warning: Downgrades are best-effort only. Not all schema or data migrations are reversible (e.g., dropping columns, destructive transformations). Operators must verify downgrade safety before relying on automated rollback in production.

# templates/migration-rollback-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migration-rollback
  annotations:
    "helm.sh/hook": pre-rollback
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never  # Job pods must use Never or OnFailure
      containers:
      - name: migration-rollback
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command:
          - python
          - -m
          - google.adk.cli.migration_entrypoint
          - --db-url
          - $(DATABASE_URL)
          - --downgrade
          - "1"  # Number of migrations to rollback (should auto-calculate from app version)
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: {{ .Values.database.secretName }}
              key: url
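
The hard-coded "1" in the Job above is marked as something that should be calculated. A minimal sketch of that calculation follows, assuming a linear revision history with no branches. The function name is hypothetical, and the revision list reuses the revision names that appear elsewhere in this RFC purely as sample data.

```python
from typing import Sequence


def downgrade_steps(revisions: Sequence[str], current: str, target: str) -> int:
    """Count the revisions to step back from current to target.

    revisions must be ordered oldest to newest (linear history assumed).
    """
    current_index = revisions.index(current)  # ValueError if unknown
    target_index = revisions.index(target)
    if target_index > current_index:
        raise ValueError("target is newer than current; use upgrade instead")
    return current_index - target_index


# Sample data only: ordering taken from the revision numbers in this RFC.
REVISIONS = [
    "002_baseline_v1",
    "003_v0_to_v1_migration",
    "004_add_session_tags_v2",
]
steps = downgrade_steps(
    REVISIONS, "004_add_session_tags_v2", "003_v0_to_v1_migration"
)
```

An alternative that avoids counting altogether is to pass the target revision identifier directly, since Alembic's downgrade accepts a revision as well as a relative step such as -1.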

Workflow:

# Deploy v1.26.0 (migration runs automatically)
helm upgrade my-app ./chart --set image.tag=v1.26.0

# Rollback to v1.25.0 (DB downgrade runs automatically)
helm rollback my-app
# ↑ Helm pre-rollback hook downgrades DB before rolling back app

7. Testing Strategy

7.1 Test Coverage

Unit Tests (tests/unittests/sessions/migration/):

  • AlembicMigrationRunner functionality
  • DatabaseMigrationLock (PostgreSQL + table-based)
  • Individual migration scripts (upgrade/downgrade)
  • Schema version detection and synchronization

Integration Tests (tests/integration/sessions/):

  • Full migration paths (V0 → V1 → V2)
  • Auto-migration on startup
  • Concurrent migrations
  • Rollback scenarios
  • Data preservation through migrations
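
A data-preservation test could follow the shape below. It is a sketch against in-memory SQLite with a made-up two-column sessions table; the upgrade and downgrade functions are stand-ins for a migration script's own functions, not ADK code. The downgrade rebuilds the table instead of dropping the column, since older SQLite versions lack DROP COLUMN.

```python
import sqlite3
from typing import List


def columns(conn: sqlite3.Connection) -> List[str]:
    return [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]


def upgrade(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE sessions ADD COLUMN tags TEXT")


def downgrade(conn: sqlite3.Connection) -> None:
    # Rebuild the table without the new column, copying rows across.
    conn.execute("CREATE TABLE sessions_old (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions_old SELECT id, state FROM sessions")
    conn.execute("DROP TABLE sessions")
    conn.execute("ALTER TABLE sessions_old RENAME TO sessions")


def test_roundtrip_preserves_rows() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions VALUES ('s1', '{\"k\": 1}')")
    before = conn.execute("SELECT id, state FROM sessions").fetchall()

    upgrade(conn)
    assert columns(conn) == ["id", "state", "tags"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before

    downgrade(conn)
    assert columns(conn) == ["id", "state"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before
```

The same pattern extends to PostgreSQL and MySQL by swapping the connection for one that points at the CI service container.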

GitHub Actions Workflow (.github/workflows/test-migrations.yml):

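strategy:
  matrix:
    python: ["3.10", "3.11", "3.12"]
    db:
      - image: postgres:15
        port: 5432
      - image: postgres:14
        port: 5432
      - image: mysql:8.0
        port: 3306

services:
  db:
    image: ${{ matrix.db.image }}
    env:
      POSTGRES_PASSWORD: testpass    # read by the postgres images
      MYSQL_ROOT_PASSWORD: testpass  # read by the mysql image
    ports:
      - ${{ matrix.db.port }}:${{ matrix.db.port }}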

8. Backward Compatibility

8.1 Phased Rollout

ADK 1.25.0 (Parallel Support):

  • Add Alembic support
  • Keep existing manual migration scripts
  • Document both approaches
  • CLI supports both: adk migrate session (Alembic) and adk migrate session --legacy (manual)

ADK 1.26.0 (Soft Deprecation):

  • Alembic becomes default
  • Deprecation warnings for manual migrations

ADK 1.27.0 (Hard Deprecation):

  • Remove manual migrations from main codebase
  • CLI only supports Alembic

8.2 Migration Path for Existing Users

V1 databases (without Alembic):

# Bootstrap Alembic - stamps baseline revision without running migrations
adk migrate session --bootstrap-alembic

This stamps the database with the 002_baseline_v1 revision, indicating that it is already at the V1 schema.

V0 databases (legacy pickle-based):

# Migrate V0 → V1 via Alembic
adk migrate session --upgrade

This runs the 003_v0_to_v1_migration.py script to convert pickle to JSON.

Automatic detection:
When ADK_AUTO_MIGRATE_DB=true, the system automatically detects schema version and bootstraps Alembic if needed (stamps appropriate baseline revision).
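
One way the detection step might be structured is sketched below against SQLite. The decision rules are assumptions made for illustration: the RFC names the alembic_version and adk_internal_metadata tables but does not say how V0 and V1 databases are told apart, so this sketch guesses that V0 databases have a sessions table and no adk_internal_metadata table. The function names and returned labels are hypothetical.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def detect_schema_state(conn: sqlite3.Connection) -> str:
    """Classify a database so the caller can pick a bootstrap action."""
    if table_exists(conn, "alembic_version"):
        return "alembic_managed"  # run a normal upgrade to head
    if table_exists(conn, "adk_internal_metadata"):
        return "v1_unstamped"     # stamp the baseline, then upgrade
    if table_exists(conn, "sessions"):
        return "v0_legacy"        # assumed V0 marker: convert first
    return "empty"                # fresh database: create everything
```

Checking for alembic_version first means a database that has already been bootstrapped is never stamped a second time.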

9. Documentation Plan

9.1 New Documentation

User Guides:

  • docs/migration_guide.md - Developer workflow for creating migrations
  • docs/helm_migration_guide.md - Kubernetes deployment patterns

Helm Chart Examples:

  1. Simple PostgreSQL migration job
  2. Cloud SQL with Workload Identity
  3. Migration with health check init containers

Migration Template:
Enhanced alembic/script.py.mako with ADK-specific fields:

  • Database schema version
  • Compatible ADK versions
  • Description and data migration notes
  • Rollback considerations

9.2 Updated Documentation

  • src/google/adk/sessions/migration/README.md - Add Alembic workflow
  • Release notes for each version with migration instructions

10. References
