3 changes: 3 additions & 0 deletions docker-compose/README.md
@@ -67,6 +67,9 @@ You can adjust BlockScout environment variables:
- for visualizer in `./envs/common-visualizer.env`
- for user-ops-indexer in `./envs/common-user-ops-indexer.env`

For production resource limits, health checks, log rotation, and monitoring
requirements, see [`production-hardening.md`](./production-hardening.md).

Descriptions of the ENVs are available

- for [backend](https://docs.blockscout.com/setup/env-variables)
83 changes: 83 additions & 0 deletions docker-compose/production-hardening.md
@@ -0,0 +1,83 @@
# Production Hardening

This checklist keeps the public Docker Compose files safe to publish (they hold
only non-secret defaults) while addressing the failure mode observed on the
Numbers mainnet explorer: long-running containers consumed nearly all VM memory,
after which the VM kept receiving packets but stopped sending responses.

## Resource Guards

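The compose service files read each memory limit from an environment variable
with a non-secret default that the production environment can override. Each
`*_MEMSWAP_LIMIT` equals the matching `*_MEM_LIMIT`, so the containers get no
swap: a service that outgrows its limit triggers the kernel OOM killer inside
its own container instead of exhausting VM memory. The defaults are: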

```sh
BACKEND_MEM_LIMIT=4g
BACKEND_MEMSWAP_LIMIT=4g
DB_MEM_LIMIT=6g
DB_MEMSWAP_LIMIT=6g
STATS_MEM_LIMIT=1g
STATS_MEMSWAP_LIMIT=1g
STATS_DB_MEM_LIMIT=1g
STATS_DB_MEMSWAP_LIMIT=1g
REDIS_MEM_LIMIT=512m
REDIS_MEMSWAP_LIMIT=512m
```

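Keep the total of the container limits below host memory so the OS, nginx,
Docker, and the monitoring agent have headroom. Services without a limit in these
files, such as the visualizer and user-ops-indexer, draw from the same headroom.
On a 16 GiB VM, reserve at least 3 GiB for the host. The defaults above total
12.5 GiB (4 + 6 + 1 + 1 + 0.5), which leaves 3.5 GiB.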
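The headroom rule can be checked before a deploy. The POSIX `sh` sketch below sums the five `*_MEM_LIMIT` values, using the same defaults as the compose files when a variable is unset, and compares what is left of a 16 GiB VM with the 3 GiB reserve. `HOST_MEM_MIB`, `HOST_RESERVE_MIB`, and `to_mib` are hypothetical names introduced for this example, and only the lowercase `m` and `g` suffixes used in this document are handled.

```sh
#!/bin/sh
# Sketch: verify that the container memory limits leave headroom for the host.
# HOST_MEM_MIB, HOST_RESERVE_MIB, and to_mib are hypothetical names; only the
# lowercase "m" and "g" suffixes used in this document are supported.
set -eu

# Convert a Docker-style size such as "512m" or "4g" to MiB.
to_mib() {
  case "$1" in
    *g) echo $(( ${1%g} * 1024 )) ;;
    *m) echo "${1%m}" ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

HOST_MEM_MIB=${HOST_MEM_MIB:-16384}        # 16 GiB VM
HOST_RESERVE_MIB=${HOST_RESERVE_MIB:-3072} # keep at least 3 GiB for the host

# Sum the per-service limits, falling back to the compose defaults.
total=0
for limit in "${BACKEND_MEM_LIMIT:-4g}" "${DB_MEM_LIMIT:-6g}" \
  "${STATS_MEM_LIMIT:-1g}" "${STATS_DB_MEM_LIMIT:-1g}" \
  "${REDIS_MEM_LIMIT:-512m}"; do
  mib=$(to_mib "$limit")
  total=$(( total + mib ))
done

headroom=$(( HOST_MEM_MIB - total ))
echo "containers: ${total} MiB, host headroom: ${headroom} MiB"
if [ "$headroom" -lt "$HOST_RESERVE_MIB" ]; then
  echo "FAIL: less than ${HOST_RESERVE_MIB} MiB left for the host" >&2
  exit 1
fi
echo "OK"
```

With no overrides it prints `containers: 12800 MiB, host headroom: 3584 MiB` and then `OK`. With `BACKEND_MEM_LIMIT=6g` exported, only 1536 MiB would remain, so it reports `FAIL` and exits with status 1.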

## Health Checks

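The backend container now defines a Docker healthcheck that requests the
following URL from inside the container every 30 s (10 s timeout, 3 retries,
2 min start period). In `backend.yml` the `$` is doubled to `$$` so Compose
passes `${PORT:-4000}` through unexpanded and the container shell resolves it:

```text
http://localhost:${PORT:-4000}/api/v2/main-page/indexing-status
```

Docker marks the container `unhealthy` after 3 consecutive failures but does not
restart it on its own, so that state has to be alerted on.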

Production monitoring should also check the public endpoint:

```sh
curl -fsS --max-time 10 \
https://mainnet.num.network/api/v2/main-page/indexing-status
```

Alert if this endpoint returns a non-200 status or exceeds the expected latency
on several consecutive checks.
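Turning "several consecutive checks" into an alert means remembering the failure streak between probes. The sketch below counts consecutive checks that are non-200 or slower than a latency budget and prints `ALERT` once the streak reaches 3, the count used in the Monitoring section. `MAX_LATENCY_S=2` is a placeholder for the explorer's real latency expectation, and `record_check`, `probe`, `RUN_PROBE`, `MAX_FAILURES`, and `INTERVAL_S` are hypothetical names; a production setup would normally express the same rule in its uptime-check or alerting tool.

```sh
#!/bin/sh
# Sketch: alert after several consecutive failed or slow health checks.
# MAX_FAILURES, MAX_LATENCY_S, INTERVAL_S, RUN_PROBE, record_check, and probe
# are hypothetical names; the 2-second latency budget is a placeholder.
set -u

URL=${URL:-https://mainnet.num.network/api/v2/main-page/indexing-status}
MAX_FAILURES=${MAX_FAILURES:-3}
MAX_LATENCY_S=${MAX_LATENCY_S:-2}
INTERVAL_S=${INTERVAL_S:-60}
failures=0

# Record one result: HTTP status code and total time in seconds.
# A check is good only if it returned 200 within MAX_LATENCY_S.
# Returns 1 (and prints ALERT) once MAX_FAILURES bad checks occur in a row.
record_check() {
  code=$1
  seconds=$2
  if [ "$code" = "200" ] &&
    awk -v t="$seconds" -v max="$MAX_LATENCY_S" 'BEGIN { exit !(t <= max) }'; then
    failures=0
    return 0
  fi
  failures=$(( failures + 1 ))
  if [ "$failures" -ge "$MAX_FAILURES" ]; then
    echo "ALERT: $failures consecutive bad checks (last: HTTP $code, ${seconds}s)"
    return 1
  fi
  return 0
}

# Probe once; curl reports status 000 when no HTTP response arrives.
probe() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code} %{time_total}' "$URL"
}

# Only loop when explicitly requested, so the functions can be reused.
if [ "${RUN_PROBE:-0}" = "1" ]; then
  while :; do
    result=$(probe) || true
    # Word splitting is intended: the result is "<code> <seconds>".
    record_check ${result:-000 0} || true # hook a real notifier in here
    sleep "$INTERVAL_S"
  done
fi
```

Start the loop with `RUN_PROBE=1 sh health-probe.sh` (hypothetical filename). Keeping the decision in `record_check`, separate from the network call, lets the streak logic be exercised with recorded status codes and timings.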

## Log Rotation

Docker `json-file` log rotation is enabled for the high-volume services. The
defaults are intentionally conservative and can be adjusted without changing the
compose files:

```sh
BACKEND_LOG_MAX_SIZE=50m
BACKEND_LOG_MAX_FILE=5
DB_LOG_MAX_SIZE=50m
DB_LOG_MAX_FILE=5
STATS_LOG_MAX_SIZE=50m
STATS_LOG_MAX_FILE=5
STATS_DB_LOG_MAX_SIZE=25m
STATS_DB_LOG_MAX_FILE=5
REDIS_LOG_MAX_SIZE=25m
REDIS_LOG_MAX_FILE=5
```
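Because `max-file` is the maximum number of log files kept, including the one being written, each container holds at most `max-size` × `max-file` of json-file logs. This sketch totals that bound for the five services from the same variables and defaults; `log_budget` and `total_m` are hypothetical names, and only the `m` suffix used above is handled.

```sh
#!/bin/sh
# Sketch: upper bound on json-file log storage per service (max-size x max-file).
# log_budget and total_m are hypothetical names; sizes must use the "m" suffix.
set -eu

total_m=0

# Print one service's bound and add it to the running total.
log_budget() {
  name=$1 size=$2 files=$3
  case "$size" in
    *m) size_m=${size%m} ;;
    *) echo "unsupported size: $size" >&2; return 1 ;;
  esac
  bound=$(( size_m * files ))
  total_m=$(( total_m + bound ))
  echo "$name: ${size_m}m x $files = ${bound}m"
}

log_budget backend  "${BACKEND_LOG_MAX_SIZE:-50m}"  "${BACKEND_LOG_MAX_FILE:-5}"
log_budget db       "${DB_LOG_MAX_SIZE:-50m}"       "${DB_LOG_MAX_FILE:-5}"
log_budget stats    "${STATS_LOG_MAX_SIZE:-50m}"    "${STATS_LOG_MAX_FILE:-5}"
log_budget stats-db "${STATS_DB_LOG_MAX_SIZE:-25m}" "${STATS_DB_LOG_MAX_FILE:-5}"
log_budget redis    "${REDIS_LOG_MAX_SIZE:-25m}"    "${REDIS_LOG_MAX_FILE:-5}"
echo "total: ${total_m}m"
```

With the defaults it reports 250m each for backend, db, and stats, 125m each for stats-db and redis, and `total: 1000m`. The bound covers only what Docker captures from stdout and stderr; files the backend writes into its bind-mounted `./logs/` directory are not rotated by these settings.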

## Monitoring

Required production alerts:

- VM memory used > 85% for 10 minutes.
- VM outbound bytes = 0 while inbound bytes > 0 for 5 minutes.
- Public explorer health endpoint returns non-200 for 3 consecutive checks.
- Backend container's Docker health status is `unhealthy`.
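The memory alert belongs in the monitoring system, but the same condition can be spot-checked on the VM. This sketch approximates memory used as `(MemTotal - MemAvailable) / MemTotal` from `/proc/meminfo`; how closely that tracks the agent's own memory-used figure is an assumption. It takes a single sample, so the 10-minute duration still has to come from the alerting policy. `mem_used_pct` and `MEM_ALERT_PCT` are hypothetical names.

```sh
#!/bin/sh
# Sketch: compare current memory use with the 85% alert threshold.
# mem_used_pct and MEM_ALERT_PCT are hypothetical names. One sample only; the
# "for 10 minutes" condition belongs in the alerting policy.
set -eu

MEM_ALERT_PCT=${MEM_ALERT_PCT:-85}

# Print used memory as a whole percentage: (MemTotal - MemAvailable) / MemTotal.
# Accepts a meminfo path so a saved snapshot can be read instead of /proc.
mem_used_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total == 0) exit 1
      printf "%d\n", (total - avail) * 100 / total
    }
  ' "${1:-/proc/meminfo}"
}

# Only sample the live system where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
  pct=$(mem_used_pct)
  if [ "$pct" -gt "$MEM_ALERT_PCT" ]; then
    echo "ALERT: memory used ${pct}% > ${MEM_ALERT_PCT}%"
  else
    echo "memory used ${pct}%"
  fi
fi
```

Passing a file path, as in `mem_used_pct ./meminfo-snapshot` (hypothetical file), lets the same function read a copy of `/proc/meminfo` saved during an incident.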

Enable process-level memory metrics on the VM. Without process RSS history, an
incident can prove memory exhaustion but cannot identify which process caused it.
Do not exclude these Ops Agent metrics in production; for example, keep them out
of any `exclude_metrics` processor's `metrics_pattern` list:

```yaml
- agent.googleapis.com/processes/*
- agent.googleapis.com/swap/*
```
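One way to catch an accidental exclusion is to search the Ops Agent configuration for either pattern. This sketch assumes the configuration lives at `/etc/google-cloud-ops-agent/config.yaml`, the agent's usual user configuration file on Linux, and treats any mention of the patterns as a likely exclusion. It is a crude text search rather than a YAML parser, and `check_ops_agent_config` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: warn if the Ops Agent config mentions the process or swap metric
# patterns, which would most likely mean they are being excluded.
# check_ops_agent_config is a hypothetical name; this is a text search, not a
# YAML parser, so review any hit by hand.
set -u

check_ops_agent_config() {
  config=${1:-/etc/google-cloud-ops-agent/config.yaml}
  [ -r "$config" ] || { echo "no readable config at $config"; return 0; }
  # Print matching lines with line numbers so they are easy to review.
  if grep -nE 'agent\.googleapis\.com/(processes|swap)/' "$config"; then
    echo "WARN: process or swap metrics may be excluded in $config"
    return 1
  fi
  echo "OK: no process or swap metric patterns in $config"
}

check_ops_agent_config "$@"
```

A hit is not proof of an exclusion; open the file and confirm whether the pattern sits in the `metrics_pattern` list of an `exclude_metrics` processor.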
15 changes: 14 additions & 1 deletion docker-compose/services/backend.yml
@@ -5,6 +5,8 @@ services:
image: blockscout/${DOCKER_REPO:-blockscout}:${DOCKER_TAG:-latest}
pull_policy: always
restart: always
mem_limit: ${BACKEND_MEM_LIMIT:-4g}
memswap_limit: ${BACKEND_MEMSWAP_LIMIT:-4g}
stop_grace_period: 5m
container_name: 'backend'
command: sh -c "bin/blockscout eval \"Elixir.Explorer.ReleaseTasks.create_and_migrate()\" && bin/blockscout start"
@@ -14,4 +16,15 @@ services:
- ../envs/common-blockscout.env
volumes:
- ./logs/:/app/logs/
- ./dets/:/app/dets/
healthcheck:
test: ["CMD-SHELL", "curl -fsS http://localhost:$${PORT:-4000}/api/v2/main-page/indexing-status >/dev/null || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 2m
logging:
driver: json-file
options:
max-size: ${BACKEND_LOG_MAX_SIZE:-50m}
max-file: ${BACKEND_LOG_MAX_FILE:-5}
7 changes: 7 additions & 0 deletions docker-compose/services/db.yml
@@ -16,6 +16,8 @@ services:
user: 2000:2000
shm_size: 256m
restart: always
mem_limit: ${DB_MEM_LIMIT:-6g}
memswap_limit: ${DB_MEMSWAP_LIMIT:-6g}
container_name: 'db'
command: postgres -c 'max_connections=200' -c 'client_connection_check_interval=60000'
environment:
@@ -33,3 +35,8 @@ services:
timeout: 5s
retries: 5
start_period: 10s
logging:
driver: json-file
options:
max-size: ${DB_LOG_MAX_SIZE:-50m}
max-file: ${DB_LOG_MAX_FILE:-5}
8 changes: 8 additions & 0 deletions docker-compose/services/redis.yml
@@ -5,5 +5,13 @@ services:
image: 'redis:alpine'
container_name: redis-db
command: redis-server
restart: always
mem_limit: ${REDIS_MEM_LIMIT:-512m}
memswap_limit: ${REDIS_MEMSWAP_LIMIT:-512m}
volumes:
- ./redis-data:/data
logging:
driver: json-file
options:
max-size: ${REDIS_LOG_MAX_SIZE:-25m}
max-file: ${REDIS_LOG_MAX_FILE:-5}
14 changes: 14 additions & 0 deletions docker-compose/services/stats.yml
@@ -16,6 +16,8 @@ services:
user: 2000:2000
shm_size: 256m
restart: always
mem_limit: ${STATS_DB_MEM_LIMIT:-1g}
memswap_limit: ${STATS_DB_MEMSWAP_LIMIT:-1g}
container_name: 'stats-db'
command: postgres -c 'max_connections=200'
environment:
@@ -33,12 +35,19 @@ services:
timeout: 5s
retries: 5
start_period: 10s
logging:
driver: json-file
options:
max-size: ${STATS_DB_LOG_MAX_SIZE:-25m}
max-file: ${STATS_DB_LOG_MAX_FILE:-5}

stats:
image: ghcr.io/blockscout/stats:${STATS_DOCKER_TAG:-latest}
pull_policy: always
platform: linux/amd64
restart: always
mem_limit: ${STATS_MEM_LIMIT:-1g}
memswap_limit: ${STATS_MEMSWAP_LIMIT:-1g}
container_name: 'stats'
extra_hosts:
- 'host.docker.internal:host-gateway'
@@ -49,3 +58,8 @@ services:
- STATS__BLOCKSCOUT_DB_URL=${STATS__BLOCKSCOUT_DB_URL:-postgresql://blockscout:ceWb1MeLBEeOIfk65gU8EjF8@db:5432/blockscout}
- STATS__CREATE_DATABASE=${STATS__CREATE_DATABASE:-true}
- STATS__RUN_MIGRATIONS=${STATS__RUN_MIGRATIONS:-true}
logging:
driver: json-file
options:
max-size: ${STATS_LOG_MAX_SIZE:-50m}
max-file: ${STATS_LOG_MAX_FILE:-5}