Problem
The current design uses SQLite for persistent storage (headers, blobs, sync state, namespaces). This ties the indexer to local disk, making it stateful and harder to run in ephemeral environments (Kubernetes pods, spot instances, auto-scaling groups).
Proposal
Add an S3-compatible storage backend as an alternative to SQLite, enabling fully stateless deployments.
Approach
Implement the Store interface (pkg/store/store.go) with an S3-backed implementation (pkg/store/s3.go) that:
- Blob/header storage: Stores blobs and headers as S3 objects keyed by namespace/height/commitment (or a similar hierarchy)
- Sync state: Stores the checkpoint as a single S3 object (or uses DynamoDB/similar for atomic updates)
- Namespace config: Loaded from config file (no DB table needed)
- Query support: Uses S3 list operations with prefix filtering for GetAll(namespace, height)-style queries
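The key hierarchy above could be sketched roughly as follows. This is only an illustration, not the actual pkg/store/s3.go: the KeyLayout type, the blobs/, headers/ and sync/ path segments, and the checkpoint file name are all hypothetical, and namespaces and commitments are assumed to arrive already hex-encoded. Heights are zero-padded to 20 digits so that lexicographic key order matches numeric order for any uint64.

```go
package main

import (
	"fmt"
	"strings"
)

// KeyLayout is a hypothetical helper that maps store entities to S3 object keys.
type KeyLayout struct {
	prefix string // configured prefix, e.g. "mainnet/"
}

// NewKeyLayout ensures a non-empty prefix ends with exactly one "/".
func NewKeyLayout(prefix string) KeyLayout {
	if prefix != "" && !strings.HasSuffix(prefix, "/") {
		prefix += "/"
	}
	return KeyLayout{prefix: prefix}
}

// HeightPrefix is the prefix to pass to an S3 list call for a
// GetAll(namespace, height)-style query.
func (k KeyLayout) HeightPrefix(namespace string, height uint64) string {
	return fmt.Sprintf("%sblobs/%s/%020d/", k.prefix, namespace, height)
}

// BlobKey is the key of one blob: namespace/height/commitment under the prefix.
func (k KeyLayout) BlobKey(namespace string, height uint64, commitment string) string {
	return k.HeightPrefix(namespace, height) + commitment
}

// HeaderKey is the key of the header stored for a given height.
func (k KeyLayout) HeaderKey(height uint64) string {
	return fmt.Sprintf("%sheaders/%020d", k.prefix, height)
}

// CheckpointKey is the single object that holds the sync state.
func (k KeyLayout) CheckpointKey() string {
	return k.prefix + "sync/checkpoint.json"
}
```

With this layout, every key returned by BlobKey for a given namespace and height starts with the matching HeightPrefix, so one prefix-filtered list call returns exactly the blobs for that query.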
Configuration
[storage]
type = "s3" # or "sqlite" (default)
bucket = "apex-indexer"
prefix = "mainnet/"
region = "us-east-1"
# endpoint = "http://minio:9000" # for S3-compatible stores (MinIO, R2, etc.)
Considerations
- Read latency: S3 reads are slower than SQLite. May need a local LRU cache for hot data (recent heights).
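- Consistency: Amazon S3 has provided strong read-after-write consistency, including for list operations, since December 2020, but some S3-compatible stores may still be eventually consistent for list-after-write. S3 also has no multi-object transactions, so sync state updates need care, especially if more than one writer can run at a time.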
- Cost: Many small objects can add up. Consider batching blobs per height into single objects to reduce request count.
- Compatibility: Support S3-compatible APIs (MinIO, Cloudflare R2, DigitalOcean Spaces) via configurable endpoint.
- Hybrid option: SQLite for sync state + S3 for blob data could be a pragmatic middle ground.
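For the read-latency point, a local LRU cache in front of S3 might look like the sketch below. It is a minimal example under stated assumptions, not a proposed final design: the LRUCache type and its methods are hypothetical names, capacity is counted in entries rather than bytes, and stored blobs are assumed to be immutable once written, so no invalidation is needed.

```go
package main

import (
	"container/list"
	"sync"
)

// cacheEntry is the value stored in each list element.
type cacheEntry struct {
	key   string
	value []byte
}

// LRUCache is a hypothetical in-memory cache keyed by S3 object key.
type LRUCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // object key -> element in order
}

// NewLRUCache returns a cache holding at most capacity entries (minimum 1).
func NewLRUCache(capacity int) *LRUCache {
	if capacity < 1 {
		capacity = 1
	}
	return &LRUCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the key as most recently used.
func (c *LRUCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*cacheEntry).value, true
}

// Put inserts or updates a value and evicts the least recently used
// entry when the cache grows past its capacity.
func (c *LRUCache) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*cacheEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&cacheEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).key)
	}
}
```

An S3-backed Store would check Get before issuing a GetObject request and call Put after a successful fetch. Because recent heights are read most often, they naturally stay near the front of the list.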
Benefits
- Stateless pods: any instance can pick up where another left off
- Simplified backups: S3 handles durability and replication
- Horizontal scaling: multiple read-only API instances backed by the same bucket
- Cost-effective for large datasets with infrequent access patterns
Alternatives
- PostgreSQL/CockroachDB: Shared relational store, but heavier operational overhead
- Redis + S3: Redis for hot data / sync state, S3 for cold blob storage
- Embedded key-value store (Badger): Still local disk, doesn't solve statelessness