RSS-Lance can store all its data on S3 instead of local disk. Both the Python fetcher and Go server use LanceDB's native S3 support (via the Rust object_store crate) — no boto3, no AWS SDK, no extra dependencies.
- An S3 bucket (or an S3-compatible service such as MinIO or Cloudflare R2)
- AWS credentials configured via the standard AWS credential chain
RSS-Lance reads credentials the same way the AWS CLI does. If `aws s3 ls s3://your-bucket/` works from your terminal, RSS-Lance will work too.
| Method | How | Best for |
|---|---|---|
| Shared credentials file | `~/.aws/credentials` via `aws configure` | Local dev, personal machines |
| Named profile | `AWS_PROFILE=myprofile` | Multiple AWS accounts |
| Environment variables | `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | Docker, Lambda, CI |
| EC2 instance role | Automatic via IMDS | EC2 instances |
| ECS task role | Automatic via task metadata | ECS / Fargate |
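As a rough illustration, the chain can be sketched in a few lines of Python. This is a simplified model for intuition, not the actual `object_store` implementation; the IMDS step is stubbed out and the key values are made up:

```python
import configparser

def resolve_credentials(env, credentials_file, profile="default"):
    """Simplified model of the AWS credential chain:
    env vars -> shared credentials file -> IMDS (stubbed)."""
    # 1. Environment variables take precedence
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return ("env", env["AWS_ACCESS_KEY_ID"])
    # 2. Shared credentials file, honouring AWS_PROFILE
    profile = env.get("AWS_PROFILE", profile)
    ini = configparser.ConfigParser()
    ini.read_string(credentials_file)
    if ini.has_section(profile) and ini.has_option(profile, "aws_access_key_id"):
        return ("file", ini.get(profile, "aws_access_key_id"))
    # 3. Fall back to instance metadata (IMDS) on EC2/ECS - stubbed here
    return ("imds", None)

creds_file = """\
[default]
aws_access_key_id = AKIADEFAULT
aws_secret_access_key = example1

[work]
aws_access_key_id = AKIAWORK
aws_secret_access_key = example2
"""

print(resolve_credentials({}, creds_file))                       # ('file', 'AKIADEFAULT')
print(resolve_credentials({"AWS_PROFILE": "work"}, creds_file))  # ('file', 'AKIAWORK')
print(resolve_credentials({"AWS_ACCESS_KEY_ID": "AKIAENV",
                           "AWS_SECRET_ACCESS_KEY": "x"},
                          creds_file))                           # ('env', 'AKIAENV')
```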
Do not put access keys in config.toml. Use ~/.aws/credentials, environment variables, or IAM roles. This keeps secrets out of your repo and lets you rotate credentials without touching the app.
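For the file-based options, a hypothetical `~/.aws/credentials` with a second named profile looks like this (the key values are placeholders):

```ini
# ~/.aws/credentials  (picked up automatically;
# select a non-default profile with AWS_PROFILE=work)
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[work]
aws_access_key_id = AKIAYYYYYYYYYYYYYYYY
aws_secret_access_key = yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
```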
```bash
# Install AWS CLI (one-time)
# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

# Configure credentials - creates ~/.aws/credentials and ~/.aws/config
aws configure
# AWS Access Key ID: AKIA...
# AWS Secret Access Key: ****
# Default region name: us-east-1
# Default output format: json

# Verify it works
aws s3 ls s3://your-bucket/
```

On EC2 or ECS, skip `aws configure` entirely — attach an IAM role to your instance/task and credentials are provided automatically.
Edit `config.toml`:

```toml
[storage]
type = "s3"
path = "s3://my-rss-bucket/rss-lance"
# Only needed if not set in ~/.aws/config or AWS_DEFAULT_REGION
# s3_region = "us-east-1"
```

That's it. Start the fetcher and server as usual:
```bash
./run.sh fetch-once
./run.sh server
```

```bash
# 1. Create an S3 bucket
aws s3 mb s3://my-rss-feeds --region us-east-1

# 2. Clone and build
git clone https://github.com/sysadminmike/rss-lance rss-lance && cd rss-lance
./build.sh all

# 3. Configure for S3
cat > config.toml << 'EOF'
[storage]
type = "s3"
path = "s3://my-rss-feeds/data"

[fetcher]
interval_minutes = 30
max_concurrent = 5

[server]
host = "127.0.0.1"
port = 8080
EOF

# 4. Add feeds and start
./run.sh demo-data
./run.sh fetch-once
./run.sh server
```

Open http://127.0.0.1:8080 — your data is now on S3.
Because the data lives on S3, the fetcher and server don't need to be on the same machine:
```
Machine A (Linux server)         S3 bucket            Machine B (laptop)
┌──────────────────┐         ┌──────────────┐        ┌──────────────────┐
│  Feed Fetcher    │─writes─►│ s3://my-rss/ │◄─reads─│    Go Server     │
│ (cron / daemon)  │         │              │        │   (serves UI)    │
└──────────────────┘         └──────────────┘        └──────────────────┘
```
Both machines just need:
- The same `config.toml` pointing to `s3://my-rss/data`
- Valid AWS credentials (each machine can use its own IAM role)
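If Machine A drives the fetcher from cron rather than a daemon, the crontab entry might look like this (the install path, interval, and log location are assumptions):

```
# m    h  dom mon dow  command
*/30   *  *   *   *    cd /opt/rss-lance && ./run.sh fetch-once >> /var/log/rss-lance.log 2>&1
```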
```bash
# Machine A — fetcher only (daemon or cron)
./run.sh fetcher

# Machine B — server only
./run.sh server
```

Run the fetcher as a scheduled Lambda (zero infrastructure when not fetching), and the server on your local machine:
```bash
# Lambda fetcher uses an IAM execution role — no credentials to manage
# Local server uses ~/.aws/credentials
./run.sh server
```

For self-hosted S3-compatible storage:
```toml
[storage]
type = "s3"
path = "s3://my-bucket/rss-lance"
s3_endpoint = "http://localhost:9000"  # MinIO endpoint
s3_region = "us-east-1"                # required even for MinIO
```

Set MinIO credentials via environment variables:
```bash
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
./run.sh server
```

R2 uses S3-compatible API tokens:
```toml
[storage]
type = "s3"
path = "s3://my-rss-bucket/data"
s3_endpoint = "https://<account-id>.r2.cloudflarestorage.com"
s3_region = "auto"
```

```bash
export AWS_ACCESS_KEY_ID=<r2-access-key-id>
export AWS_SECRET_ACCESS_KEY=<r2-secret-access-key>
./run.sh server
```

Note: R2 does not support conditional PUT (`PUT-IF-NONE-MATCH`). With a single fetcher + single server (the normal setup), this is fine — Lance's optimistic concurrency handles it. For multi-writer setups, see database.md for DynamoDB as an external manifest store.
With S3, your security perimeter is your AWS IAM policy — not your application:
- Bucket policy controls who can read/write
- IAM roles grant access without long-lived keys
- Server-side encryption (SSE-S3 or SSE-KMS) encrypts data at rest
- VPC endpoints keep traffic off the public internet
- CloudTrail logs every access
No firewalls, no nginx, no reverse proxy, no application-level auth needed.
The fetcher and server need these S3 permissions on the bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-rss-bucket",
        "arn:aws:s3:::my-rss-bucket/*"
      ]
    }
  ]
}
```

For read-only server access (if the server never needs to mark articles as read/starred from this machine):
```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::my-rss-bucket",
    "arn:aws:s3:::my-rss-bucket/*"
  ]
}
```

Both the Python lancedb library and the Go lancedb-go SDK use the Rust `object_store` crate for storage. This is the same storage layer used by Apache Arrow, Delta Lake, and other data infrastructure tools.
When you pass `s3://bucket/path` as the data path:

- LanceDB resolves AWS credentials via the standard chain (env vars → `~/.aws/credentials` → IMDS)
- Reads use S3 `GET` with range requests (only fetches the bytes needed)
- Writes create new immutable files via S3 `PUT`
- MVCC commits use `PUT-IF-NONE-MATCH` (conditional PUT) to ensure exactly one writer wins a race
- DuckDB's Lance extension also reads from S3 via `httpfs` for SQL queries
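The commit step can be sketched with a toy in-memory store standing in for S3; the object names here are illustrative, not Lance's actual layout. It shows why create-if-absent semantics let exactly one of two racing writers claim a version:

```python
class ToyObjectStore:
    """In-memory stand-in for S3 with conditional PUT (If-None-Match: *)."""
    def __init__(self):
        self.objects = {}

    def put_if_none_match(self, key, data):
        # Atomic create-if-absent: succeeds only if the key doesn't exist yet.
        if key in self.objects:
            return False  # 412 Precondition Failed on real S3
        self.objects[key] = data
        return True

def commit(store, version, payload):
    """Try to commit the manifest for `version`; at most one writer wins."""
    return store.put_if_none_match(f"_versions/{version}.manifest", payload)

store = ToyObjectStore()
# Two writers race to commit version 7
writer_a = commit(store, 7, b"manifest from A")
writer_b = commit(store, 7, b"manifest from B")

print(writer_a, writer_b)                     # True False
print(store.objects["_versions/7.manifest"])  # b'manifest from A'
# The loser re-reads the latest version, rebases, and retries as version 8
print(commit(store, 8, b"manifest from B"))   # True
```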
No boto3 dependency. No AWS SDK. The Rust crate reads the same credential files that boto3/AWS CLI use, but it's a completely independent implementation.
S3 costs for a typical single-user RSS reader are minimal:
| Operation | Approximate cost |
|---|---|
| Storage (1000 articles) | ~$0.01/month |
| GET requests (browsing) | ~$0.01/month |
| PUT requests (each fetch cycle) | ~$0.01/month |
| Total | < $0.05/month |
The fetcher batches all writes into a single Lance append per cycle to minimise PUT costs.
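As a sanity check, the table roughly follows from published S3 Standard (us-east-1) rates; the workload numbers below are assumptions for a single-user reader, not measurements:

```python
# Back-of-the-envelope S3 cost estimate.
# Rates: S3 Standard, us-east-1, published prices at the time of writing.
STORAGE_PER_GB_MONTH = 0.023   # $/GB-month
PUT_PER_1000 = 0.005           # $ per 1,000 PUT requests
GET_PER_1000 = 0.0004          # $ per 1,000 GET requests

data_gb = 0.05                  # assumed: ~1000 text articles
cycles_per_month = 48 * 30      # 30-minute fetch interval
puts_per_cycle = 3              # assumed: one batched Lance append per cycle
gets_per_month = 20_000         # assumed: browsing + DuckDB range reads

storage = data_gb * STORAGE_PER_GB_MONTH
puts = cycles_per_month * puts_per_cycle / 1000 * PUT_PER_1000
gets = gets_per_month / 1000 * GET_PER_1000

print(f"storage ${storage:.4f}  puts ${puts:.4f}  gets ${gets:.4f}")
print(f"total   ${storage + puts + gets:.4f}/month")
```

Even with generous assumptions, the total stays well under $0.05/month; request charges, not storage, dominate, which is why batching writes matters.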
- Credentials not found: AWS credentials are not configured. Run `aws configure` or set the `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` environment variables.
- Access denied: the IAM user/role doesn't have permission to access the bucket. Check the bucket policy and IAM policy.
- Bucket not found: the bucket doesn't exist or is in a different region. Verify with `aws s3 ls s3://your-bucket/`.
DuckDB uses its own httpfs extension for S3 access. It reads credentials from the same environment variables. If the Go server was started in a different shell without AWS_* variables, DuckDB won't have them. Make sure credentials are available in the server's environment.
If costs are higher than expected, check compaction settings. Frequent small writes create many small files. The fetcher's batching and auto-compaction are designed to minimise this — make sure compaction thresholds are reasonable (see configuration.md).