Skip to content

A PostgreSQL flight recorder that captures waits, locks, query fingerprints, and replication state into a rewindable timeline for incident forensics.

License

Notifications You must be signed in to change notification settings

alfredeengreen/pgtrace

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pgtrace

Postgres Query Flight Recorder - an always-on, low-overhead, bounded "black box" for PostgreSQL incidents.

Overview

pgtrace continuously captures behavioral signals (waits, locks, query fingerprints, replication/WAL, markers) into a rewindable timeline, enabling deterministic incident forensics.

Requirements

System Requirements

  • PostgreSQL: 13+ (target), with feature detection per version
  • Rust: 1.70+ (for agent and CLI)
  • Elixir/OTP: 1.14+ / OTP 26+ (for Phoenix server)

Optional but Recommended

  • pg_stat_statements: PostgreSQL extension for query fingerprinting (highly recommended)
  • pg_cron: PostgreSQL extension for scheduled retention jobs (optional)

Operating System

  • Linux (primary target)

Quick Start

1. Install Prerequisites

Rust Toolchain

Install Rust via rustup (recommended):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

Or via package manager:

  • macOS: brew install rust
  • Ubuntu/Debian: apt install rustc cargo
  • Fedora: dnf install rust cargo

Verify installation:

rustc --version  # Should be 1.70+
cargo --version

Elixir/Mix

Install Elixir (includes Mix and Erlang/OTP):

  • macOS: brew install elixir
  • Ubuntu/Debian: apt install elixir
  • Fedora: dnf install elixir
  • Or use asdf version manager

Verify installation:

elixir --version  # Should be 1.14+
mix --version

2. Database Setup

pgtrace requires a PostgreSQL database to store collected data. You can use an existing PostgreSQL instance or set up a new one.

Option A: Use Existing PostgreSQL (Recommended)

If you already have PostgreSQL running, create a database for pgtrace:

# Connect to your PostgreSQL instance
psql -h your-postgres-host -U postgres

# Create database for pgtrace
CREATE DATABASE pgtrace_store;

# Run migrations
\c pgtrace_store
\i store/migrations/001_init.sql
\i store/migrations/002_rollup_tables.sql
\i store/migrations/003_roles_grants.sql
\i store/migrations/004_retention_helpers.sql

Or from terminal:

# Create database
createdb -h your-postgres-host -U postgres pgtrace_store

# Run migrations
for file in store/migrations/*.sql; do
  psql -h your-postgres-host -U postgres -d pgtrace_store -f "$file"
done

Important: The store database is separate from the database you're observing. The agent reads from your production database (source_db) and writes to the pgtrace_store database (store_db).

Option B: Set Up New PostgreSQL (Development Only)

For local development and testing, you can use Docker Compose:

# Start PostgreSQL container
docker-compose up -d postgres-store

# Run migrations
export PGPASSWORD=postgres
for file in store/migrations/*.sql; do
  psql -h localhost -p 5433 -U postgres -d pgtrace_store -f "$file"
done

3. Source Database Permissions

The agent needs read-only access to your production PostgreSQL database. Create a user with minimal permissions:

-- Connect to your production database
\c your_production_database

-- Create read-only user for agent
CREATE USER pgtrace_agent WITH PASSWORD 'your_secure_password';

-- Grant required permissions (read-only access to system views)
GRANT CONNECT ON DATABASE your_production_database TO pgtrace_agent;
GRANT USAGE ON SCHEMA public TO pgtrace_agent;

-- Grant access to system views (adjust schema if needed)
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO pgtrace_agent;
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO pgtrace_agent;

-- If using pg_stat_statements extension
GRANT SELECT ON pg_stat_statements TO pgtrace_agent;

The agent needs access to these system views:

  • pg_stat_activity
  • pg_locks
  • pg_stat_database
  • pg_stat_bgwriter
  • pg_stat_statements (recommended)
  • pg_stat_replication (primary only)
  • pg_stat_wal (version-dependent, PostgreSQL 14+)
  • pg_stat_wal_receiver (standby only)

These views are typically accessible to all users, but verify based on your PostgreSQL configuration.

4. Build Components

Build Agent

cd agent
cargo build --release

The binary will be at agent/target/release/pgtrace-agent.

Build CLI

cd cli
cargo build --release

The binary will be at cli/target/release/pgtrace.

Setup Phoenix Server

cd server
mix deps.get
mix compile

5. Configuration

Agent Configuration

Create agent/pgtrace-agent.yaml:

node_id: "prod-eu-1"  # Unique identifier for this database instance
source:
  dsn: "postgres://pgtrace_agent:your_password@your-prod-host:5432/your_production_database"
store:
  dsn: "postgres://postgres:password@your-postgres-host:5432/pgtrace_store"
intervals:
  waits_ms: 500
  locks_ms: 1000
  qstats_ms: 1000
  repl_ms: 1000
  wal_ms: 1000
limits:
  max_events_per_sec: 20000
  max_payload_bytes: 16384
  top_qstats_per_interval: 200
  max_lock_edges_per_snapshot: 50000
privacy:
  store_sql_samples: false
  store_user_app: false
  store_client_addr: false
features:
  pg_stat_statements: auto
  replication: auto
  wal_stats: auto
retention:
  events_hours: 24
  lock_edges_hours: 24
  repl_days: 7
  wal_days: 7
  markers_days: 30

Key points:

  • source.dsn: Connection to your production database (read-only)
  • store.dsn: Connection to pgtrace_store database (write access)
  • node_id: Unique identifier (use descriptive names like "prod-eu-1", "staging-us-west-2")
  • Environment variables override config file: PGTRACE_SOURCE_DSN, PGTRACE_STORE_DSN, PGTRACE_NODE_ID

Server Configuration

Edit server/config/config.exs or use environment variables:

export PGTRACE_DB_HOST=your-postgres-host
export PGTRACE_DB_NAME=pgtrace_store
export PGTRACE_DB_USER=pgtrace_api
export PGTRACE_DB_PASSWORD=your_password
export SECRET_KEY_BASE=$(mix phx.gen.secret)
export PHX_HOST=pgtrace.example.com
export PORT=4000

For production, see server/config/prod.exs and docs/DEPLOYMENT.md.

6. Run Components

Start Agent

cd agent
./target/release/pgtrace-agent run --config pgtrace-agent.yaml

The agent will:

  • Connect to your source database (read-only)
  • Poll system views at configured intervals
  • Write collected data to the store database
  • Run retention jobs periodically

Start Server

cd server
mix phx.server

Server will be available at http://localhost:4000

Use CLI

cd cli

# Set store database connection
export PGTRACE_STORE_DSN="postgres://postgres:password@your-postgres-host:5432/pgtrace_store"

# Run commands
./target/release/pgtrace doctor
./target/release/pgtrace tail --node prod-eu-1 --window 5m
./target/release/pgtrace incident --node prod-eu-1 --at "2026-01-13T10:00:00Z" --window 10m

Architecture

pgtrace uses a separate database architecture:

  • Source Database: Your production PostgreSQL (read-only access)
  • Store Database: Dedicated database for pgtrace data (separate PostgreSQL instance or database)

This separation:

  • Avoids adding write load to production
  • Allows independent scaling and maintenance
  • Provides isolation for observability data
  • Enables centralized monitoring of multiple databases

Multiple agents can write to the same store database, allowing you to monitor multiple PostgreSQL instances from one store.

Components

  1. Agent (Rust): Polls production Postgres, derives lock edges, inserts to store DB
  2. Store (PostgreSQL): Schema, migrations, retention logic
  3. Server (Elixir/Phoenix): REST API + LiveView realtime incident room UI
  4. CLI (Rust): Terminal interface for querying store DB

Database Roles

The migrations create three roles in the store database:

  • pgtrace_writer: Insert-only access (used by agent)
  • pgtrace_reader: Read-only access (used by API/CLI)
  • pgtrace_marker: Permission to insert markers only

See store/migrations/003_roles_grants.sql for details.

For production, create separate users:

-- In store database
CREATE USER pgtrace_agent WITH PASSWORD 'secure_password';
GRANT pgtrace_writer TO pgtrace_agent;

CREATE USER pgtrace_api WITH PASSWORD 'secure_password';
GRANT pgtrace_reader TO pgtrace_api;

Testing

See TESTING.md for comprehensive testing guide.

Quick test setup:

# Automated setup and basic tests
./scripts/test-setup.sh
./scripts/test-basic.sh

Production Deployment

See docs/DEPLOYMENT.md for production deployment guide.

Key production considerations:

  • Enable authentication (see docs/SECURITY.md)
  • Use TLS/SSL (reverse proxy recommended)
  • Configure secrets via environment variables
  • Use dedicated database users
  • Set up monitoring and backups

Development

Running Tests

# Rust tests
cd agent && cargo test
cd ../cli && cargo test

# Elixir tests
cd server && mix test

Code Formatting

# Rust
cd agent && cargo fmt
cd ../cli && cargo fmt

# Elixir
cd server && mix format

Documentation

Troubleshooting

Agent won't connect to source database

  • Verify source database DSN is correct
  • Check user has CONNECT permission on database
  • Verify user has SELECT on required system views
  • Test connection manually: psql "your_source_dsn" -c "SELECT 1"
  • Check firewall/network connectivity

Agent won't connect to store database

  • Verify store database exists and migrations are run
  • Check user has write permissions (pgtrace_writer role)
  • Test connection: psql "your_store_dsn" -c "SELECT 1"
  • Verify schema exists: psql "your_store_dsn" -c "\dn pgtrace"

Server won't start

  • Check database connection in server/config/config.exs or environment variables
  • Ensure migrations have been run in store database
  • Check user has read permissions (pgtrace_reader role)
  • Review logs: mix phx.server shows errors on startup

CLI connection errors

  • Set PGTRACE_STORE_DSN environment variable correctly
  • Verify store database is accessible
  • Check user has pgtrace_reader role or equivalent permissions
  • Test connection: psql "$PGTRACE_STORE_DSN" -c "SELECT 1"

Build errors

  • Ensure Rust toolchain is up to date: rustup update
  • Clean build: cargo clean && cargo build
  • Check dependency versions in Cargo.toml
  • For Elixir: mix deps.clean --all && mix deps.get

License

Apache-2.0

See LICENSE for details.

Limitations

See docs/LIMITATIONS.md for known limitations and what pgtrace cannot tell you.

Contributing

This is an open-source project. Contributions welcome. Please ensure:

  • Code follows project style guidelines
  • Tests pass
  • Documentation is updated
  • No secrets or testing credentials in commits

About

A PostgreSQL flight recorder that captures waits, locks, query fingerprints, and replication state into a rewindable timeline for incident forensics.

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published