Add S3 storage backend for stateless deployment #6

@tac0turtle

Problem

The current design uses SQLite for persistent storage (headers, blobs, sync state, namespaces). This ties the indexer to local disk, making it stateful and harder to run in ephemeral environments (Kubernetes pods, spot instances, auto-scaling groups).

Proposal

Add an S3-compatible storage backend as an alternative to SQLite, enabling fully stateless deployments.

Approach

Implement the Store interface (pkg/store/store.go) with an S3-backed implementation (pkg/store/s3.go) that:

  • Blob/header storage: Stores blobs and headers as S3 objects keyed by namespace/height/commitment (or similar hierarchy)
  • Sync state: Stores the checkpoint as a single S3 object (or uses DynamoDB or a similar service for atomic updates)
  • Namespace config: Loaded from config file (no DB table needed)
  • Query support: Uses S3 list operations with prefix filtering for GetAll(namespace, height) style queries
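One possible key layout for the hierarchy above, as a minimal sketch. The function names, the `blobs/` and `sync/` path segments, hex encoding, and the 20-digit height padding are all assumptions for illustration, not part of the existing Store interface. Zero-padding the height matters because S3 list operations return keys in lexicographic order, so unpadded heights would sort 10 before 9.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// heightWidth is the zero-padded width for heights. 20 digits fits any
// uint64, so lexicographic key order matches numeric height order.
const heightWidth = 20

// HeightPrefix returns the key prefix shared by every blob stored for a
// namespace at one height. A prefix-filtered list on this value would back
// a GetAll(namespace, height) style query. (Hypothetical helper.)
func HeightPrefix(root string, namespace []byte, height uint64) string {
	return fmt.Sprintf("%sblobs/%s/%0*d/", root, hex.EncodeToString(namespace), heightWidth, height)
}

// BlobKey returns the full object key for a single blob. (Hypothetical helper.)
func BlobKey(root string, namespace []byte, height uint64, commitment []byte) string {
	return HeightPrefix(root, namespace, height) + hex.EncodeToString(commitment)
}

// CheckpointKey returns the key of the single sync-state object.
// (Hypothetical helper.)
func CheckpointKey(root string) string {
	return root + "sync/checkpoint"
}

func main() {
	ns := []byte{0x00, 0xab}
	fmt.Println(HeightPrefix("mainnet/", ns, 42))
	fmt.Println(BlobKey("mainnet/", ns, 42, []byte{0xde, 0xad}))
	fmt.Println(CheckpointKey("mainnet/"))
}
```

Here `root` corresponds to the `prefix` setting in the configuration, so several networks can share one bucket.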

Configuration

[storage]
type = "s3"  # or "sqlite" (default)
bucket = "apex-indexer"
prefix = "mainnet/"
region = "us-east-1"
# endpoint = "http://minio:9000"  # for S3-compatible stores (MinIO, R2, etc.)
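The settings above could map onto a small Go struct with defaulting and validation, sketched below. The `StorageConfig` type, its field names, and the `Normalize` method are hypothetical; the rule that `bucket` is required for `type = "s3"` and that a trailing slash is appended to `prefix` are assumptions, and decoding the TOML itself is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// StorageConfig mirrors the [storage] section. (Hypothetical type.)
type StorageConfig struct {
	Type     string // "sqlite" (default) or "s3"
	Bucket   string // required when Type is "s3" (assumption)
	Prefix   string // optional key prefix, e.g. "mainnet/"
	Region   string
	Endpoint string // optional, for S3-compatible stores
}

// Normalize applies defaults and rejects configurations that cannot work.
func (c *StorageConfig) Normalize() error {
	if c.Type == "" {
		c.Type = "sqlite"
	}
	switch c.Type {
	case "sqlite":
		return nil
	case "s3":
		if c.Bucket == "" {
			return errors.New(`storage: bucket is required when type is "s3"`)
		}
		// Keep prefixes slash-terminated so keys join cleanly.
		if c.Prefix != "" && !strings.HasSuffix(c.Prefix, "/") {
			c.Prefix += "/"
		}
		return nil
	default:
		return fmt.Errorf("storage: unknown type %q", c.Type)
	}
}

func main() {
	cfg := StorageConfig{Type: "s3", Bucket: "apex-indexer", Prefix: "mainnet", Region: "us-east-1"}
	if err := cfg.Normalize(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```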

Considerations

  • Read latency: Every S3 read is a network round trip, so reads are slower than local SQLite lookups. May need a local LRU cache for hot data (recent heights).
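  • Consistency: Amazon S3 has provided strong read-after-write and list consistency since December 2020, but S3-compatible stores may offer weaker guarantees. Sync state updates still need care: with plain writes, concurrent writers overwrite each other's checkpoint (last write wins).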
  • Cost: S3 bills per request, so many small objects can add up. Consider batching blobs per height into single objects to reduce request count.
  • Compatibility: Support S3-compatible APIs (MinIO, Cloudflare R2, DigitalOcean Spaces) via configurable endpoint.
  • Hybrid option: SQLite for sync state + S3 for blob data could be a pragmatic middle ground.
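The local LRU cache mentioned under read latency could look roughly like the sketch below. The `LRU` type and its `GetOrFetch` read-through method are hypothetical, the `fetch` callback stands in for an S3 GET, and the cache is not safe for concurrent use without an added mutex.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the value stored in each list element.
type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
// Not goroutine-safe; wrap it in a mutex for concurrent use.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the key as recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the oldest entry when over capacity.
func (c *LRU) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// GetOrFetch serves from the cache and calls fetch only on a miss.
func (c *LRU) GetOrFetch(key string, fetch func(string) ([]byte, error)) ([]byte, error) {
	if v, ok := c.Get(key); ok {
		return v, nil
	}
	v, err := fetch(key)
	if err != nil {
		return nil, err
	}
	c.Put(key, v)
	return v, nil
}

func main() {
	cache := NewLRU(2)
	// Stand-in for an S3 GET; a real backend would issue a network request.
	fetch := func(key string) ([]byte, error) { return []byte("blob:" + key), nil }
	v, _ := cache.GetOrFetch("height-42", fetch)
	fmt.Println(string(v))
}
```

Since indexed blobs and headers do not change once written, cached entries never need invalidation; only the sync checkpoint should bypass the cache.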

Benefits

  • Stateless pods: any instance can pick up where another left off
  • Simplified backups: S3 handles durability and replication
  • Horizontal scaling: multiple read-only API instances backed by the same bucket
  • Cost-effective for large datasets with infrequent access patterns

Alternatives

  • PostgreSQL/CockroachDB: Shared relational store, but heavier operational overhead
  • Redis + S3: Redis for hot data / sync state, S3 for cold blob storage
  • Embedded key-value store (Badger): Still local disk, doesn't solve statelessness
