PostgreSQL 18: OLD_DATABASES detection fails to find data in /var/lib/postgresql/data/18/docker/ causing silent data loss on container recreation #1400

@athom

Summary

The docker_setup_env() function in the PostgreSQL 18 entrypoint script fails to detect existing database data located
at /var/lib/postgresql/data/18/docker/ during container recreation, leading to silent data loss through creation of
a new database in an anonymous volume.

Environment

  • Image: postgres:18.1-alpine
  • PGDATA: /var/lib/postgresql/18/docker (default for PostgreSQL 18)
  • Volume Configuration:
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    

Root Cause

The glob pattern used to detect old databases is incomplete:

Current detection logic (docker-entrypoint.sh, lines 250-254):

  for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do
      if [ -s "$d/PG_VERSION" ]; then
          OLD_DATABASES+=( "$d" )
      fi
  done

Problem: The pattern /var/lib/postgresql/*/docker matches only two-level paths:

  • ✅ Matches: /var/lib/postgresql/18/docker
  • ❌ Misses: /var/lib/postgresql/data/18/docker (three-level path)
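The difference comes down to how the shell expands `*`: it matches exactly one path component and never crosses a `/`. A minimal sketch of that behavior, assuming a temporary sandbox directory stands in for /var/lib/postgresql (the `expand` helper and the sandbox layout are illustrative, not part of the entrypoint):

```shell
#!/bin/sh
# Sandbox: a temporary directory stands in for /var/lib/postgresql (assumption).
root=$(mktemp -d)
mkdir -p "$root/18/docker" "$root/data/18/docker"

# Hypothetical helper: print each argument relative to the sandbox root.
expand() {
    for p in "$@"; do
        rel=${p#"$root"}
        printf '%s\n' "$rel"
    done
}

# "*" stands for a single directory name, so */docker cannot reach data/18/docker.
one_star=$(expand "$root"/*/docker)
data_star=$(expand "$root"/data/*/docker)

echo "*/docker      -> $one_star"
echo "data/*/docker -> $data_star"

rm -rf "$root"
```

In this sandbox the first pattern expands only to /18/docker, while the second reaches /data/18/docker.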

Reproduction Steps

Initial Setup (First Container)

  # Create docker-compose.yml
  cat > docker-compose.yml <<EOF
  services:
    postgres:
      image: postgres:18.1-alpine
      volumes:
        - ./data/postgres:/var/lib/postgresql/data
      environment:
        POSTGRES_USER: testuser
        POSTGRES_PASSWORD: testpass
        POSTGRES_DB: testdb
  EOF

  # Start container
  docker compose up -d

  # Verify data location
  docker compose exec postgres ls -la /var/lib/postgresql/data/18/docker/
  # Output shows PG_VERSION exists

  # Insert test data
  docker compose exec postgres psql -U testuser -d testdb -c "CREATE TABLE test (id INT); INSERT INTO test VALUES (1);"
  docker compose exec postgres psql -U testuser -d testdb -c "SELECT * FROM test;"
  # Output: 1 row

Container Recreation (Triggers Bug)

  # Recreate container (e.g., due to config change)
  docker compose down
  docker compose up -d

  # Check data - LOST!
  docker compose exec postgres psql -U testuser -d testdb -c "SELECT * FROM test;"
  # ERROR: relation "test" does not exist

Why Data Loss Occurs

  1. First container: PostgreSQL initializes data at /var/lib/postgresql/data/18/docker/ (bind mount)
  2. Container recreation: New anonymous volume created for /var/lib/postgresql/
  3. Entrypoint check:

       # PGDATA = /var/lib/postgresql/18/docker (in anonymous volume)
       [ -s "$PGDATA/PG_VERSION" ]  # FALSE (anonymous volume is empty)

       # Check old database locations
       for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do
           # /var/lib/postgresql/18/docker - matches, but is new (empty)
           # /var/lib/postgresql/data/18/docker - NOT MATCHED by glob pattern!
           :
       done

  4. Result: OLD_DATABASES array remains empty → No error raised → initdb runs in anonymous volume → New database created
     → Old data silently abandoned
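The sequence above can be sketched as a simplified model. This is not the real entrypoint: `entrypoint_decision`, its return strings, and the sandbox layout are hypothetical, and the branch order only follows what this issue describes.

```shell
#!/bin/sh
# Simplified model of the decision described above; NOT the real entrypoint.
# $1 = PGDATA; remaining arguments = candidate old-database directories.
entrypoint_decision() {
    pgdata=$1
    shift
    if [ -s "$pgdata/PG_VERSION" ]; then
        echo "reuse"      # PGDATA is already initialized
        return
    fi
    for d in "$@"; do
        if [ -s "$d/PG_VERSION" ]; then
            echo "error"  # old data detected, refuse to initialize
            return
        fi
    done
    echo "initdb"         # nothing detected, a new cluster is created
}

# Sandbox standing in for /var/lib/postgresql after recreation (assumption):
# the bind mount under data/ holds a cluster, the new 18/docker is empty.
root=$(mktemp -d)
mkdir -p "$root/18/docker" "$root/data/18/docker"
echo 18 > "$root/data/18/docker/PG_VERSION"

current=$(entrypoint_decision "$root/18/docker" \
    "$root" "$root/data" "$root"/*/docker)
fixed=$(entrypoint_decision "$root/18/docker" \
    "$root" "$root/data" "$root"/*/docker "$root"/data/*/docker)

echo "current pattern:    $current"
echo "with data/*/docker: $fixed"

rm -rf "$root"
```

With the current candidate list the model falls through to initdb; adding the data/*/docker pattern makes it stop with an error instead.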

Verification

Test glob pattern behavior:

In a running container:

  docker compose exec postgres sh -c 'for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker; do if [ -s "$d/PG_VERSION" ]; then echo "Found: $d"; fi; done'
  # Output: Found: /var/lib/postgresql/18/docker
  # MISSING: /var/lib/postgresql/data/18/docker

With fixed pattern:

  docker compose exec postgres sh -c 'for d in /var/lib/postgresql /var/lib/postgresql/data /var/lib/postgresql/*/docker /var/lib/postgresql/data/*/docker; do if [ -s "$d/PG_VERSION" ]; then echo "Found: $d"; fi; done'
  # Output: Found: /var/lib/postgresql/18/docker
  #         Found: /var/lib/postgresql/data/18/docker ✓

Impact

  • Severity: Critical - Silent data loss
  • Scope: Any deployment where:
    • Volume mounted at /var/lib/postgresql/data (common legacy pattern)
    • Container recreation occurs (e.g., docker-compose config changes, docker compose up --force-recreate)
  • No warnings: Users receive no error messages - database appears to work but data is lost
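To check from the host whether a deployment has the layout described here, something like the following could be run against the bind-mounted directory. This is a rough sketch: `is_affected` is a hypothetical helper, and it assumes the directory passed in is the one mounted at /var/lib/postgresql/data.

```shell
#!/bin/sh
# is_affected HOST_DIR: succeed if HOST_DIR contains <name>/docker/PG_VERSION,
# i.e. data that would sit at /var/lib/postgresql/data/<name>/docker in the
# container. Hypothetical helper, not part of the image.
is_affected() {
    for f in "$1"/*/docker/PG_VERSION; do
        if [ -s "$f" ]; then
            echo "found: $f"
            return 0
        fi
    done
    return 1
}

# Demo in a sandbox; on a real host, pass the bind mount source instead,
# e.g. ./data/postgres from the compose file above.
host_dir=$(mktemp -d)
mkdir -p "$host_dir/18/docker"
echo 18 > "$host_dir/18/docker/PG_VERSION"

if is_affected "$host_dir" >/dev/null; then
    status="affected layout present"
else
    status="layout not found"
fi
echo "$status"

rm -rf "$host_dir"
```

A non-zero result only means this particular layout was not found; it says nothing about other data locations.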

Proposed Fix

Update the glob pattern in docker_setup_env() to include three-level paths:

  for d in /var/lib/postgresql \
           /var/lib/postgresql/data \
           /var/lib/postgresql/*/docker \
           /var/lib/postgresql/data/*/docker; do  # <-- ADD THIS LINE
      if [ -s "$d/PG_VERSION" ]; then
          OLD_DATABASES+=( "$d" )
      fi
  done

This ensures detection of databases in:

  • /var/lib/postgresql/18/docker (new default)
  • /var/lib/postgresql/data/18/docker (legacy bind mount + PostgreSQL 18 subdirectory)
  • /var/lib/postgresql/17/docker (old version upgrade scenarios)
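A quick way to sanity-check the three locations listed above without a container is to run the proposed candidate list against a sandbox. In this sketch `find_old_databases` is a hypothetical wrapper around the proposed loop, and the temporary directory stands in for /var/lib/postgresql:

```shell
#!/bin/sh
# find_old_databases ROOT: print (relative to ROOT) each directory that the
# proposed loop would detect. ROOT stands in for /var/lib/postgresql.
find_old_databases() {
    for d in "$1" "$1/data" "$1"/*/docker "$1"/data/*/docker; do
        if [ -s "$d/PG_VERSION" ]; then
            rel=${d#"$1"}
            printf '%s\n' "$rel"
        fi
    done
}

# Sandbox with one cluster in each of the three locations (assumption).
root=$(mktemp -d)
for v in 18/docker data/18/docker 17/docker; do
    mkdir -p "$root/$v"
    echo "placeholder" > "$root/$v/PG_VERSION"  # only non-emptiness is tested
done

detected=$(find_old_databases "$root")
echo "$detected"

rm -rf "$root"
```

All three sandbox directories are reported, including the one under data/ that the current pattern skips.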

Workaround

Option 1: Mount parent directory (PostgreSQL 18 recommendation)

  volumes:
    - ./data/postgres:/var/lib/postgresql  # <-- No /data suffix

Option 2: Set explicit PGDATA in bind mount

  environment:
    PGDATA: /var/lib/postgresql/data/pgdata
  volumes:
    - ./data/postgres:/var/lib/postgresql/data

Option 3: Manual recovery after recreation

  # Remove the container together with its anonymous volumes
  # (anonymous volumes have generated names, so a name=postgres filter would not match them)
  docker compose down -v
  # Restart - will use bind mount data
  docker compose up -d

Related Issues

Additional Context

This bug was discovered during a production deployment where a configuration change (adding extra_hosts to the Caddy
service) triggered container recreation via docker compose up -d. The PostgreSQL container was recreated despite no
changes to its own configuration, resulting in data loss for a production database. The bind mount contained 9.9 MB of
user data in /var/lib/postgresql/data/18/docker/, but the entrypoint script failed to detect it and created a fresh
database in an anonymous volume instead.
