Summary
Set up the PostgresPro Airlines demo database, generate a large (“big data”) instance with the official generator, and implement flight timetable queries (departures/arrivals by airport/date/route). Use these queries to establish a repeatable benchmark and compare PostgreSQL 18 performance and ergonomics with a Doublets implementation on equivalent operations. The environment must be containerized, with two modes: default (durable) and embedded-like (WAL-light).
Scope
- PostgreSQL side (Docker): data generation, schema inspection, timetable queries (with validity checks), basic indexes/constraints, timing runs in both modes.
- Doublets side: functionally equivalent data model and queries (or pipelines), plus timing runs.
- Deliver reproducible scripts, Docker assets, and result artifacts (logs/CSVs).
Deliverables
[docker]/docker-compose.yml pinned to postgres:18.
[docker]/compose.embedded.yml (override) for embedded-like mode (WAL-light).
[docker]/init/99_unlogged.sql (optional) to switch data tables to UNLOGGED in embedded-like runs.
[docs]/HOWTO.md — runbook for Docker, generation parameters, and how to run both benchmark modes.
[sql]/10_timetable_queries.sql — queries below.
[bench]/pg/run.sh — timings (EXPLAIN (ANALYZE, BUFFERS) + wall-clock), CSV output.
[bench]/doublets/run.* — equivalent Doublets operations, same CSV schema.
[bench]/schema-mapping.md — mapping Airlines entities/fields to Doublets structures.
- README update with a “Benchmark: Timetable” section linking to the above.
Prerequisites (PostgreSQL)
- PostgreSQL 18 (pinned) via Docker image
postgres:18.
- Disk space per chosen dataset size (3m/6m/1y/2y).
- Extensions typically used by the generator:
btree_gist, cube, earthdistance, dblink.
Containerized Environment
Durable (default) compose
# [docker]/docker-compose.yml
version: "3.9"
services:
pg:
image: postgres:18
container_name: airlines-pg18
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
ports: ["5432:5432"]
volumes:
- pgdata:/var/lib/postgresql/data
- ./init:/docker-entrypoint-initdb.d
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 20
volumes:
pgdata:
Embedded-like (WAL-light) override
This mode trades durability for speed to better match embedded DB behavior.
# [docker]/compose.embedded.yml
services:
pg:
command:
- "postgres"
- "-c"; "fsync=off"
- "-c"; "synchronous_commit=off"
- "-c"; "full_page_writes=off"
- "-c"; "wal_level=minimal"
Optional (strongly recommended for WAL-free data writes): switch large tables to UNLOGGED in embedded mode.
-- [docker]/init/99_unlogged.sql (only applied when you want embedded-like runs)
DO $$
DECLARE r record;
BEGIN
FOR r IN
SELECT format('%I.%I', n.nspname, c.relname) AS fqname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'bookings' AND c.relkind = 'r'
LOOP
EXECUTE 'ALTER TABLE '|| r.fqname || ' SET UNLOGGED';
END LOOP;
END$$;
Revert to durable: replace SET UNLOGGED with SET LOGGED.
Run:
# Durable
docker compose -f docker/docker-compose.yml up -d
# Embedded-like (WAL-light)
docker compose -f docker/docker-compose.yml -f docker/compose.embedded.yml up -d
Setup Options
Option A — Load a ready snapshot
gunzip -c demo-YYYYMMDD-1y.sql.gz \
| docker compose -f docker/docker-compose.yml exec -T pg psql -U postgres
Recreates demo inside the container.
Option B — Generate your own “big data” (recommended)
From psql in the container (any DB, not demo yet):
\i install; -- creates DB demo, schemas, extensions
-- Example: generate 1 year with 4 workers
CALL generate(now(), now() + interval '1 year', 4);
-- Progress
SELECT busy();
-- Post-checks
\i check.sql;
What to implement (PostgreSQL)
A. Timetable queries (using the provided view)
bookings.timetable already handles airport-local times and the temporal validity join.
-- Departures from SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE departure_airport = 'SVO'
AND (scheduled_departure AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_departure;
-- Arrivals to SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE arrival_airport = 'SVO'
AND (scheduled_arrival AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_arrival;
-- Next available flight after model-time “now” for a route
SELECT *
FROM bookings.timetable t
WHERE t.departure_airport = 'SVX'
AND t.arrival_airport = 'WUH'
AND t.scheduled_departure > bookings.now()
ORDER BY t.scheduled_departure
LIMIT 1;
B. Manual timetable (without the view, explicit validity check)
SELECT f.flight_id,
r.route_no,
r.departure_airport,
r.arrival_airport,
f.status,
f.scheduled_departure,
f.scheduled_arrival
FROM bookings.flights AS f
JOIN bookings.routes AS r
ON r.route_no = f.route_no
AND r.validity @> f.scheduled_departure -- validity at composition time
WHERE r.departure_airport = 'SVO'
AND f.scheduled_departure::date = DATE '2025-10-07'
ORDER BY f.scheduled_departure;
Benchmark plan
Metrics
- Wall-clock time (median over N runs).
EXPLAIN (ANALYZE, BUFFERS) for PostgreSQL queries (saved to logs).
- Result row counts and checksum (simple hash) for equivalence.
Datasets
- Two scales (e.g., 6 months and 1 year).
Procedure
- Warm-up: run each query once (discard).
- Execute each query 10 runs, record wall-clock ms.
- Store results in
bench/results/*.csv with columns:
system,durability_mode,dataset,query_id,run,rows,ms.
- Compute summary stats (
min/median/p95) and check consistency.
Durability modes
durable: default compose (baseline, production-like).
embedded: compose override + UNLOGGED tables to eliminate WAL where possible.
- Caveats: not crash-safe;
UNLOGGED tables are truncated after crash/restart; no replication.
Doublets parity
- Model the same entities/time attributes.
- Return the same logical result sets as the SQL above.
- Export CSV with the same columns (include
durability_mode = embedded).
Validation
SELECT bookings.now(); returns a timestamp (model time).
- Sanity: row counts by status, min/max scheduled times, airports present.
- Validity sanity: joining
routes to flights without the view must use r.validity @> f.scheduled_departure.
- Embedded mode check: verify
relpersistence='u' for bookings tables:
SELECT relname, relpersistence
FROM pg_class
WHERE relnamespace = 'bookings'::regnamespace AND relkind='r';
Acceptance Criteria
Nice to have (optional)
- Timing plots from CSVs.
- Notes on indexes that materially change plans.
References
Summary
Set up the PostgresPro Airlines demo database, generate a large (“big data”) instance with the official generator, and implement flight timetable queries (departures/arrivals by airport/date/route). Use these queries to establish a repeatable benchmark and compare PostgreSQL 18 performance and ergonomics with a Doublets implementation on equivalent operations. The environment must be containerized, with two modes: default (durable) and embedded-like (WAL-light).
Scope
Deliverables
[docker]/docker-compose.ymlpinned topostgres:18.[docker]/compose.embedded.yml(override) for embedded-like mode (WAL-light).[docker]/init/99_unlogged.sql(optional) to switch data tables to UNLOGGED in embedded-like runs.[docs]/HOWTO.md— runbook for Docker, generation parameters, and how to run both benchmark modes.[sql]/10_timetable_queries.sql— queries below.[bench]/pg/run.sh— timings (EXPLAIN (ANALYZE, BUFFERS)+ wall-clock), CSV output.[bench]/doublets/run.*— equivalent Doublets operations, same CSV schema.[bench]/schema-mapping.md— mapping Airlines entities/fields to Doublets structures.Prerequisites (PostgreSQL)
postgres:18.btree_gist,cube,earthdistance,dblink.Containerized Environment
Durable (default) compose
Embedded-like (WAL-light) override
Optional (strongly recommended for WAL-free data writes): switch large tables to UNLOGGED in embedded mode.
Revert to durable: replace
SET UNLOGGEDwithSET LOGGED.Run:
Setup Options
Option A — Load a ready snapshot
Option B — Generate your own “big data” (recommended)
From
psqlin the container (any DB, notdemoyet):What to implement (PostgreSQL)
A. Timetable queries (using the provided view)
B. Manual timetable (without the view, explicit validity check)
Benchmark plan
Metrics
EXPLAIN (ANALYZE, BUFFERS)for PostgreSQL queries (saved to logs).Datasets
Procedure
bench/results/*.csvwith columns:system,durability_mode,dataset,query_id,run,rows,ms.min/median/p95) and check consistency.Durability modes
durable: default compose (baseline, production-like).embedded: compose override + UNLOGGED tables to eliminate WAL where possible.UNLOGGEDtables are truncated after crash/restart; no replication.Doublets parity
durability_mode=embedded).Validation
SELECT bookings.now();returns a timestamp (model time).routestoflightswithout the view must user.validity @> f.scheduled_departure.relpersistence='u'forbookingstables:Acceptance Criteria
postgres:18) and reproducible startup.10_timetable_queries.sqlrunnable on the generated dataset.Nice to have (optional)
References