feat: add index to speed up reindexFrom by andersju · Pull Request #1775 · libris/librisxl

andersju · 2026-05-26T11:17:02Z

Currently, reindexFrom at the end of each reindex takes about 20 minutes, even when there's almost nothing to reindex, because of very slow SQL queries. So let's add an index to speed them up. tl;dr: ~100x speedup, with 520 rows matched by the new index instead of ~20 million rows filtered (on an example query in QA).

For each collection we do a query like this:

```sql
SELECT id, data, created, modified, deleted
FROM lddb
WHERE GREATEST(modified, (data#>>'{@graph,0,generationDate}')::timestamptz) >= '2026-05-25 11:57:51+02'
  AND GREATEST(modified, (data#>>'{@graph,0,generationDate}')::timestamptz) <= 'infinity'
  AND collection = 'bib'
  AND deleted = false;
```
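The selection rule is: a record is picked up if the later of modified and its generationDate is at or after the cutoff. Here is a minimal in-memory sketch in Python. It is not the librisxl code, which runs this as SQL. Record is a hypothetical stand-in for an lddb row. As with PostgreSQL's GREATEST, a missing generationDate is ignored, so the record's effective timestamp is just modified.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical stand-in for an lddb row; generation_date mirrors
    # data#>>'{@graph,0,generationDate}' and may be absent (NULL).
    id: str
    collection: str
    deleted: bool
    modified: datetime
    generation_date: Optional[datetime] = None


def effective_timestamp(r: Record) -> datetime:
    # Like PostgreSQL's GREATEST, a NULL argument is ignored.
    if r.generation_date is None:
        return r.modified
    return max(r.modified, r.generation_date)


def changed_since(records: Iterable[Record], collection: str, since: datetime) -> List[str]:
    # The query's upper bound is 'infinity', so only the lower bound matters.
    return [
        r.id
        for r in records
        if r.collection == collection
        and not r.deleted
        and effective_timestamp(r) >= since
    ]


since = datetime(2026, 5, 25, 9, 57, 51, tzinfo=timezone.utc)  # = 11:57:51+02
old = datetime(2026, 1, 1, tzinfo=timezone.utc)
new = datetime(2026, 5, 26, tzinfo=timezone.utc)
rows = [
    Record("a", "bib", False, modified=new),
    Record("b", "bib", False, modified=old, generation_date=new),
    Record("c", "bib", False, modified=old),
    Record("d", "auth", False, modified=new),
    Record("e", "bib", True, modified=new),
]
print(changed_since(rows, "bib", since))  # ['a', 'b']
```

Record "b" shows why both columns take part: its modified is old, but its generationDate is after the cutoff.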

Previously on QA:

```
EXPLAIN ANALYZE SELECT id, data, created, modified, deleted
      FROM lddb
      WHERE GREATEST(modified, (data#>>'{@graph,0,generationDate}')::timestamptz) >= '2026-05-25 11:57:51+02'
        AND GREATEST(modified, (data#>>'{@graph,0,generationDate}')::timestamptz) <= 'infinity'
        AND collection = 'bib'
        AND deleted = false;
 Gather  (cost=221799.77..27499839.54 rows=47163 width=1229) (actual time=7681.396..262575.392 rows=244 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on lddb  (cost=220799.77..27494123.24 rows=19651 width=1229) (actual time=7660.604..262056.823 rows=81 loops=3)
         Recheck Cond: (collection = 'bib'::text)
         Filter: ((NOT deleted) AND (GREATEST(modified, ((data #>> '{@graph,0,generationDate}'::text[]))::timestamp with time zone) >= '2026-05-25 11:57:51+02'::timestamp with time zone) AND (GREATEST(modified, ((data #>> '{@graph,0,generationDate}'::text[]))::timestamp with time zone) <= 'infinity'::timestamp with time zone))
         Rows Removed by Filter: 6686232
         Heap Blocks: exact=2832009
         ->  Bitmap Index Scan on idx_lddb_collection  (cost=0.00..220787.98 rows=19408455 width=0) (actual time=3103.190..3103.191 rows=20079895 loops=1)
               Index Cond: (collection = 'bib'::text)
 Planning Time: 0.162 ms
 Execution Time: 262576.044 ms
```
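A note on reading the plan above. For the parallel Bitmap Heap Scan, EXPLAIN ANALYZE reports rows and Rows Removed by Filter as per-loop averages. Here loops=3 is the leader plus the two launched workers. Multiplying back out reproduces the totals behind the "20 million" figure. The result is approximate because the averages are rounded.

```python
# Per-loop averages copied from the Parallel Bitmap Heap Scan node above.
loops = 3                    # leader + 2 launched workers
rows_per_loop = 81
removed_per_loop = 6_686_232

total_rows = rows_per_loop * loops        # Gather reports 244; the average was rounded
total_removed = removed_per_loop * loops  # rows read from the heap only to be discarded

print(total_rows, total_removed)  # 243 20058696
```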

QA with the new index (note: it's no longer there; I only added it temporarily for testing):

```
EXPLAIN ANALYZE SELECT id, data, created, modified, deleted
      FROM lddb
      WHERE GREATEST(modified, totstz(data#>>'{@graph,0,generationDate}')) >= '2026-05-25 11:57:51+02'
        AND GREATEST(modified, totstz(data#>>'{@graph,0,generationDate}')) <= 'infinity'
        AND collection = 'bib'
        AND deleted = false;

 Bitmap Heap Scan on lddb  (cost=241772.62..677457.22 rows=48754 width=1229) (actual time=2557.584..2599.301 rows=244 loops=1)
   Recheck Cond: ((GREATEST(modified, totstz((data #>> '{@graph,0,generationDate}'::text[]))) >= '2026-05-25 11:57:51+02'::timestamp with time zone) AND (GREATEST(modified, totstz((data #>> '{@graph,0,generationDate}'::text[]))) <= 'infinity'::timestamp with time zone) AND (collection = 'bib'::text))
   Filter: (NOT deleted)
   Heap Blocks: exact=154
   ->  BitmapAnd  (cost=241772.62..241772.62 rows=100318 width=0) (actual time=2556.991..2556.993 rows=0 loops=1)
         ->  Bitmap Index Scan on idx__lddb_greatest_modified  (cost=0.00..16047.11 rows=774254 width=0) (actual time=0.104..0.104 rows=520 loops=1)
               Index Cond: ((GREATEST(modified, totstz((data #>> '{@graph,0,generationDate}'::text[]))) >= '2026-05-25 11:57:51+02'::timestamp with time zone) AND (GREATEST(modified, totstz((data #>> '{@graph,0,generationDate}'::text[]))) <= 'infinity'::timestamp with time zone))
         ->  Bitmap Index Scan on idx_lddb_collection  (cost=0.00..225700.89 rows=20063509 width=0) (actual time=2531.675..2531.675 rows=20079895 loops=1)
               Index Cond: (collection = 'bib'::text)
 Planning Time: 1.323 ms
 Execution Time: 2599.357 ms
```
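The arithmetic below puts the two runs side by side, using only numbers copied from the plans, and backs up the ~100x in the tl;dr. Judging from the actual times, it also shows where the remaining ~2.6 s goes. Most of it is spent building the idx_lddb_collection bitmap. The new index returned its 520 entries in about 0.1 ms.

```python
# Execution times (ms) copied from the two EXPLAIN ANALYZE runs above.
before_ms = 262_576.044           # old plan: parallel heap scan filtering ~20M rows
after_ms = 2_599.357              # new plan: BitmapAnd using idx__lddb_greatest_modified
collection_bitmap_ms = 2_531.675  # idx_lddb_collection child of the new plan

print(f"speedup: {before_ms / after_ms:.0f}x")                            # speedup: 101x
print(f"collection bitmap share: {collection_bitmap_ms / after_ms:.0%}")  # collection bitmap share: 97%
```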

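Note that I also switched from the ::timestamptz cast to our totstz() function. An index expression may only use IMMUTABLE functions. The text-to-timestamptz cast isn't IMMUTABLE, because its result can depend on session settings such as TimeZone, whereas totstz() is marked IMMUTABLE. The query has to use the same expression as the index for the planner to pick the index up, which is why the query changed as well.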
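The IMMUTABLE requirement can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL, as an illustration only. SQLite likewise refuses to build an expression index on a function that isn't registered as deterministic, which is roughly its counterpart of IMMUTABLE. Three names and substitutions are assumptions:

- to_ts is a hypothetical placeholder for totstz().
- The table and index names are made up.
- SQLite's multi-argument max() stands in for GREATEST. Unlike GREATEST, it returns NULL if any argument is NULL.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def to_ts(text: Optional[str]) -> Optional[float]:
    # Hypothetical stand-in for totstz(): ISO-8601 text -> epoch seconds.
    return None if text is None else datetime.fromisoformat(text).timestamp()


def make_db(deterministic: bool) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    # deterministic=True (Python 3.8+) is SQLite's rough counterpart of IMMUTABLE.
    con.create_function("to_ts", 1, to_ts, deterministic=deterministic)
    con.execute("CREATE TABLE lddb (id TEXT, modified REAL, generation_date TEXT)")
    return con


INDEX_DDL = (
    "CREATE INDEX idx_greatest_modified "
    "ON lddb (max(modified, to_ts(generation_date)))"
)

# Without the deterministic flag, SQLite refuses to index the expression.
try:
    make_db(deterministic=False).execute(INDEX_DDL)
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)

# With it, the index is built, and a query that repeats the indexed
# expression verbatim can use it.
con = make_db(deterministic=True)
con.execute(INDEX_DDL)
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM lddb "
    "WHERE max(modified, to_ts(generation_date)) >= ?",
    (0.0,),
).fetchall()
print(plan)
```

As in PostgreSQL, the planner only matches the index when the WHERE clause uses the same expression as the CREATE INDEX statement.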

jannistsiroyannis

I thought we already had an idx__lddb_greatest_modified, and it looks like we do:

```
"idx__lddb_greatest_modified" btree (GREATEST(modified, totstz(data #>> '{@graph,0,generationDate}'::text[])))
```

I may be mistaken. Need to look closer.

jannistsiroyannis · 2026-05-26T12:14:42Z

I think I was wrong; you might have added it to dev in advance?

jannistsiroyannis · 2026-05-26T12:17:27Z

Ah, my mistake: we had it, but only on the versions table.

jannistsiroyannis

The tradeoff (the size of the index) is perhaps worth the saved reindexing time, even if reindexing is only done occasionally. 🤷

andersju · 2026-05-26T13:23:29Z

The new index is about 3.2 GB (out of 59 GB of indices for the lddb table alone, and about 337 GB of indices in total), which I think is reasonable. An alternative is a UNION like this, which uses only pre-existing indices:

```
EXPLAIN ANALYZE
  SELECT id, data, created, modified, deleted FROM lddb
  WHERE modified >= '2026-05-25 11:57:51+02' AND modified <= 'infinity' AND collection = 'bib' AND deleted = false
  UNION
  SELECT id, data, created, modified, deleted FROM lddb
  WHERE totstz(data#>>'{@graph,0,generationDate}') >= '2026-05-25 11:57:51+02' AND totstz(data#>>'{@graph,0,generationDate}') <= 'infinity' AND collection = 'bib' AND deleted = false;

 HashAggregate  (cost=503819.92..503915.55 rows=9563 width=81) (actual time=6.439..6.556 rows=244 loops=1)
   Group Key: lddb.id, lddb.data, lddb.created, lddb.modified, lddb.deleted
   Batches: 1  Memory Usage: 665kB
   ->  Append  (cost=0.57..503700.38 rows=9563 width=81) (actual time=0.025..1.632 rows=244 loops=1)
         ->  Index Scan using idx_lddb_modified on lddb  (cost=0.57..261042.49 rows=4791 width=1229) (actual time=0.025..0.474 rows=244 loops=1)
               Index Cond: ((modified >= '2026-05-25 11:57:51+02'::timestamp with time zone) AND (modified <= 'infinity'::timestamp with time zone))
               Filter: ((NOT deleted) AND (collection = 'bib'::text))
               Rows Removed by Filter: 276
         ->  Index Scan using idx_lddb_generation_date on lddb lddb_1  (cost=0.57..242514.44 rows=4772 width=1229) (actual time=1.134..1.134 rows=0 loops=1)
               Index Cond: ((totstz((data #>> '{@graph,0,generationDate}'::text[])) >= '2026-05-25 11:57:51+02'::timestamp with time zone) AND (totstz((data #>> '{@graph,0,generationDate}'::text[])) <= 'infinity'::timestamp with time zone))
               Filter: ((NOT deleted) AND (collection = 'bib'::text))
               Rows Removed by Filter: 416
 Planning Time: 0.264 ms
 Execution Time: 6.652 ms
```

But it would require some changes to the code, so a new index is still the cleaner solution, I'd say. Probably. 🤔
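This is why the UNION alternative above returns the same rows as the GREATEST query.

- **Lower bound.** GREATEST(m, g) >= T holds exactly when m >= T or g >= T. Splitting the predicate into two separately indexed branches and deduplicating therefore gives the same set. Deduplication is what UNION does, as opposed to UNION ALL; it is the HashAggregate in the plan.
- **Upper bound.** This one does not split the same way, because GREATEST(m, g) <= U needs both m <= U and g <= U. The rewrite is exact only because the upper bound here is 'infinity'.

The small Python check below tests both claims. It assumes floats as simplified timestamps. NULL handling mirrors GREATEST, which ignores NULL arguments.

```python
import math
from itertools import product
from typing import Optional


def greatest(*args: Optional[float]) -> Optional[float]:
    # PostgreSQL's GREATEST ignores NULLs; the result is NULL only if all are NULL.
    vals = [a for a in args if a is not None]
    return max(vals) if vals else None


def between(x: Optional[float], lo: float, hi: float) -> bool:
    # A comparison with NULL is never true, so such a row is not selected.
    return x is not None and lo <= x <= hi


def greatest_query(m, g, lo, hi) -> bool:
    return between(greatest(m, g), lo, hi)


def union_query(m, g, lo, hi) -> bool:
    # Branch 1 is served by idx_lddb_modified, branch 2 by idx_lddb_generation_date;
    # with unique ids, UNION's deduplication amounts to a per-row OR.
    return between(m, lo, hi) or between(g, lo, hi)


values = [None, 0.0, 5.0, 10.0, 15.0, 20.0]
lo = 10.0


def agree(hi: float) -> bool:
    return all(
        greatest_query(m, g, lo, hi) == union_query(m, g, lo, hi)
        for m, g in product(values, repeat=2)
    )


print(agree(math.inf))  # True: equivalent when the upper bound is 'infinity'
print(agree(12.0))      # False: e.g. m=10, g=15 is kept by the UNION only
```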

feat: add index to speed up reindexFrom

b437f33

andersju requested review from jannistsiroyannis, kaipoykio, kwahlin, lrosenstrom and olovy May 26, 2026 11:17

jannistsiroyannis reviewed May 26, 2026


jannistsiroyannis approved these changes May 26, 2026


andersju wants to merge 1 commit into develop from feature/add-index-for-faster-reindexfrom
