Describe the bug
ExtendDB provisions GSI (secondary-index) tables without an index on (base_pk, base_sk_b). The engine then issues DELETE FROM "<gsi-table>" WHERE base_pk = $1 AND base_sk_b = $2 to maintain the GSI on UpdateItem / DeleteItem of the base item. Because no index leads with those columns, every reverse-lookup DELETE falls through to a sequential scan of the entire GSI table.
In a Twisp benchmark with two GSI tables of ~66k rows each, this single statement accounted for 74.4% of all Postgres CPU time (270 s of 363 s total in a 5×500-tx run), at 12.76 ms mean per call.
Adding CREATE INDEX … (base_pk, base_sk_b) by hand dropped the same DELETE from 12.76 ms to 0.021 ms (≈610× faster per call, ≈250× fewer buffers) and lifted the batched-write throughput of the surrounding workload by ~40% (89.4 → 125.3 tx/s).
The schema generator should provision this index alongside the existing primary key and the (pk, sk_b, base_pk, base_sk_b) secondary, since the engine itself depends on (base_pk, base_sk_b) lookups for GSI maintenance.
To Reproduce
-
Start ExtendDB against Postgres in the default configuration; create a DynamoDB-style table with at least one GSI.
-
Populate it: enough rows to make a Seq Scan visible against an Index Scan (tens of thousands is enough; the regression is asymptotic in row count regardless).
-
Issue an UpdateItem against a base item in a way that triggers GSI maintenance (changes a GSI key, or deletes the item entirely).
-
Inspect the GSI table:
\d "<gsi-table-name>"
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM "<gsi-table-name>"
WHERE base_pk = 'nonexistent' AND base_sk_b = '\x00';
-
Observe a Seq Scan on the GSI table for the DELETE rather than an Index Scan.
Concrete observation from a local run (Postgres 16, table with 66,072 rows):
Delete on "_ddb_ecfa0b6e-…" (cost=0.00..6799.08 rows=0 width=0) (actual time=20.421..20.421 rows=0 loops=1)
Buffers: shared read=5808
-> Seq Scan on "_ddb_ecfa0b6e-…" (cost=0.00..6799.08 rows=1 width=6) (actual time=20.419..20.420 rows=0 loops=1)
Filter: ((base_pk = 'nonexistent'::text) AND (base_sk_b = '\x00'::bytea))
Rows Removed by Filter: 66072
Buffers: shared read=5808
Execution Time: 20.490 ms
After CREATE INDEX "…_base_pk_base_sk_b_idx" ON "<gsi-table>" (base_pk, base_sk_b);:
Delete on "_ddb_ecfa0b6e-…"
-> Index Scan using _ddb_ecfa0b6e_base_pk_base_sk_b_idx on "_ddb_ecfa0b6e-…"
Index Cond: ((base_pk = 'nonexistent'::text) AND (base_sk_b = '\x00'::bytea))
Buffers: shared read=3
Execution Time: 0.083 ms
Expected behavior
When ExtendDB provisions a GSI table, it provisions all indexes required by its own access patterns — including a leading-(base_pk, base_sk_b) index, since the engine itself issues DELETE … WHERE base_pk = $1 AND base_sk_b = $2 for GSI maintenance.
Actual behavior
GSI tables ship with only:
PRIMARY KEY (pk, base_pk, base_sk_b)
(pk, sk_b, base_pk, base_sk_b) secondary
Both indexes lead with pk. The maintenance DELETE does not filter by pk, so neither index is usable and every DELETE seq-scans the entire GSI table. Cost grows linearly with table size and is paid on every UpdateItem/DeleteItem that touches a GSI key.
Environment
- ExtendDB version: commit
b531241a74adfceccc700a361a96bb2e155ff0bf (pinned in twisp/core MODULE.bazel as @extenddb_src)
- Operating system: macOS 15.5 (host) /
postgres:16-alpine container (DB)
- Rust version: n/a — 1.92.0, edition 2024
- Client SDK/driver:
github.com/aws/aws-sdk-go-v2/service/dynamodb (talking to ExtendDB over its DynamoDB-protocol endpoint)
- Deployment method: container (Twisp's
bazel/services/local:local-server-arm64 image bundles the extenddb binary alongside Postgres 16)
Logs / stack trace
pg_stat_statements top-by-total_exec_time for one benchmark run (before adding the index):
calls | total_s | mean_ms | pct | query
--------+---------+---------+------+-----------------------------------------------------------------
21198 | 270.45 | 12.76 | 74.4 | DELETE FROM "_ddb_ecfa0b6e-…" WHERE base_pk = $1 AND base_sk_b = $2
128154 | 50.50 | 0.39 | 13.9 | SELECT item_data FROM "_ddb_65cec909-…" WHERE pk = $1 AND sk_b = $2 FOR UPDATE
127954 | 14.15 | 0.11 | 3.9 | INSERT INTO stream_records (sequence_number, shard_id, table_id, event_name, record_data) VALUES (…)
151 | 7.40 | 49.04 | 2.0 | SELECT record_data FROM stream_records WHERE shard_id = $1 AND sequence_number > $2 …
29843 | 5.04 | 0.17 | 1.4 | UPDATE "_ddb_65cec909-…" SET item_data = $3 WHERE pk = $1 AND sk_b = $2
98111 | 4.70 | 0.05 | 1.3 | INSERT INTO "_ddb_65cec909-…" (pk, sk_b, item_data) VALUES (…) ON CONFLICT (pk, sk_b) …
77070 | 2.31 | 0.03 | 0.6 | INSERT INTO "_ddb_ecfa0b6e-…" (pk, sk_b, base_pk, base_sk_b, item_data) VALUES (…)
Same workload after the manual index fix:
calls | total_s | mean_ms | pct | query
--------+---------+---------+------+-----------------------------------------------------------------
137909 | 22.75 | 0.165 | 36.8 | SELECT item_data FROM "_ddb_65cec909-…" WHERE pk = $1 AND sk_b = $2 FOR UPDATE
137709 | 14.86 | 0.108 | 24.0 | INSERT INTO stream_records …
36248 | 5.69 | 0.157 | 9.2 | UPDATE "_ddb_65cec909-…" SET item_data = $3 WHERE pk = $1 AND sk_b = $2
…
28201 | 0.58 | 0.021 | 0.9 | DELETE FROM "_ddb_ecfa0b6e-…" WHERE base_pk = $1 AND base_sk_b = $2
The previously dominant DELETE went from 74.4% → 0.9% of total DB time at the same workload.
Additional context
This was found while debugging unexpectedly low throughput on a Twisp benchmark against a local ExtendDB. Initial suspicion was lock contention or sequence serialization; pg_blocking_pids() showed no blocking and nextval() was 0.1% of DB time. pg_stat_statements made the missing-index regression obvious in a single query.
Workaround for anyone hitting this before a release lands:
CREATE INDEX
"<gsi-table>_base_pk_base_sk_b_idx"
ON "<gsi-table>" (base_pk, base_sk_b);
ANALYZE "<gsi-table>";
…repeated for every GSI table in the database. New GSI tables created after the index-generation code is fixed should ship with it by default.
Checklist
Describe the bug
ExtendDB provisions GSI (secondary-index) tables without an index on
(base_pk, base_sk_b). The engine then issuesDELETE FROM "<gsi-table>" WHERE base_pk = $1 AND base_sk_b = $2to maintain the GSI onUpdateItem/DeleteItemof the base item. Because no index leads with those columns, every reverse-lookup DELETE falls through to a sequential scan of the entire GSI table.In a Twisp benchmark with two GSI tables of ~66k rows each, this single statement accounted for 74.4% of all Postgres CPU time (270 s of 363 s total in a 5×500-tx run), at 12.76 ms mean per call.
Adding
CREATE INDEX … (base_pk, base_sk_b)by hand dropped the same DELETE from 12.76 ms to 0.021 ms (≈610× faster per call, ≈250× fewer buffers) and lifted the batched-write throughput of the surrounding workload by ~40% (89.4 → 125.3 tx/s).The schema generator should provision this index alongside the existing primary key and the
(pk, sk_b, base_pk, base_sk_b)secondary, since the engine itself depends on(base_pk, base_sk_b)lookups for GSI maintenance.To Reproduce
Start ExtendDB against Postgres in the default configuration; create a DynamoDB-style table with at least one GSI.
Populate it: enough rows to make a Seq Scan visible against an Index Scan (tens of thousands is enough; the regression is asymptotic in row count regardless).
Issue an
UpdateItemagainst a base item in a way that triggers GSI maintenance (changes a GSI key, or deletes the item entirely).Inspect the GSI table:
Observe a
Seq Scanon the GSI table for the DELETE rather than anIndex Scan.Concrete observation from a local run (Postgres 16, table with 66,072 rows):
After
CREATE INDEX "…_base_pk_base_sk_b_idx" ON "<gsi-table>" (base_pk, base_sk_b);:Expected behavior
When ExtendDB provisions a GSI table, it provisions all indexes required by its own access patterns — including a leading-
(base_pk, base_sk_b)index, since the engine itself issuesDELETE … WHERE base_pk = $1 AND base_sk_b = $2for GSI maintenance.Actual behavior
GSI tables ship with only:
PRIMARY KEY (pk, base_pk, base_sk_b)(pk, sk_b, base_pk, base_sk_b)secondaryBoth indexes lead with
pk. The maintenance DELETE does not filter bypk, so neither index is usable and every DELETE seq-scans the entire GSI table. Cost grows linearly with table size and is paid on every UpdateItem/DeleteItem that touches a GSI key.Environment
b531241a74adfceccc700a361a96bb2e155ff0bf(pinned in twisp/core MODULE.bazel as@extenddb_src)postgres:16-alpinecontainer (DB)github.com/aws/aws-sdk-go-v2/service/dynamodb(talking to ExtendDB over its DynamoDB-protocol endpoint)bazel/services/local:local-server-arm64image bundles the extenddb binary alongside Postgres 16)Logs / stack trace
pg_stat_statementstop-by-total_exec_timefor one benchmark run (before adding the index):Same workload after the manual index fix:
The previously dominant DELETE went from 74.4% → 0.9% of total DB time at the same workload.
Additional context
This was found while debugging unexpectedly low throughput on a Twisp benchmark against a local ExtendDB. Initial suspicion was lock contention or sequence serialization;
pg_blocking_pids()showed no blocking andnextval()was 0.1% of DB time.pg_stat_statementsmade the missing-index regression obvious in a single query.Workaround for anyone hitting this before a release lands:
…repeated for every GSI table in the database. New GSI tables created after the index-generation code is fixed should ship with it by default.
Checklist