feat: add index to speed up reindexFrom #1775
jannistsiroyannis left a comment:
I thought we already had an idx__lddb_greatest_modified index, and it looks like we do:
"idx__lddb_greatest_modified" btree (GREATEST(modified, totstz(data #>> '{@graph,0,generationDate}'::text[])))
I may be mistaken. Need to look closer.
I think I was wrong. Did you perhaps add it to dev in advance?
Ah, my mistake: we had it, but only on the versions table.
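An expression index only helps queries against the table it is defined on, and only when the query uses the same expression. Below is a minimal sketch of that effect, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table, the columns modified and generation_date (standing in for data #>> '{@graph,0,generationDate}'), and the index name are hypothetical, and SQLite's multi-argument max() plays the role of GREATEST. The planner goes from a full scan to an index search once the matching expression index exists.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical simplified schema; SQLite stands in for PostgreSQL here.
# SQLite's multi-argument max() plays the role of GREATEST(...).
EXPR = "max(modified, generation_date)"
QUERY = f"SELECT id FROM lddb WHERE {EXPR} > ?"


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> List[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans() -> Tuple[List[str], List[str]]:
    """Plan the same query before and after adding the expression index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lddb (id INTEGER PRIMARY KEY, modified TEXT, generation_date TEXT)"
    )
    params = ("2024-02-01",)
    # Without a matching expression index, every row must be examined.
    before = query_plan(conn, QUERY, params)
    # The indexed expression must match the WHERE expression to be usable.
    conn.execute(f"CREATE INDEX idx_lddb_greatest_modified ON lddb ({EXPR})")
    after = query_plan(conn, QUERY, params)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

Note that SQLite's max() returns NULL if any argument is NULL, while PostgreSQL's GREATEST ignores NULL arguments, so the stand-in only mirrors the planner behaviour, not the exact semantics.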
jannistsiroyannis left a comment:
The tradeoff (the size of the index) is perhaps worth the saved reindexing time, even if reindexing is only done occasionally. 🤷
The new index is about 3.2 GB, out of 59 GB of indices for the lddb table alone and about 337 GB of indices in total (roughly 5% and under 1%, respectively). I think that's probably reasonable. An alternative is a UNION like this, which would use only pre-existing indices:
But it'd require a little messing with the code, so a new index is still the cleaner solution, I'd say. Probably. 🤔
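The UNION alternative works because GREATEST(a, b) > t holds exactly when a > t or b > t, so each branch can be served by an ordinary single-column index. This is not the query from the PR, only a minimal sketch of the general rewrite, again with Python's sqlite3 standing in for PostgreSQL and with hypothetical table, column, and index names. Because SQLite's max(a, b) returns NULL if either argument is NULL (unlike GREATEST, which ignores NULLs), the sketch declares both columns NOT NULL.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical rows. ISO-8601 date strings compare correctly as text,
# so they stand in for timestamptz values.
ROWS = [
    (1, "2024-01-01", "2024-01-01"),  # nothing recent
    (2, "2024-01-01", "2024-03-01"),  # only generation_date is recent
    (3, "2024-03-02", "2024-01-01"),  # only modified is recent
    (4, "2024-03-05", "2024-03-06"),  # both are recent
]

# One predicate on the greater of the two timestamps.
GREATEST_QUERY = "SELECT id FROM lddb WHERE max(modified, generation_date) > ?"

# The same condition split into one branch per column, so each branch can use
# its own single-column index. UNION (not UNION ALL) drops rows matched twice.
UNION_QUERY = (
    "SELECT id FROM lddb WHERE modified > ? "
    "UNION "
    "SELECT id FROM lddb WHERE generation_date > ?"
)


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lddb (id INTEGER PRIMARY KEY, "
        "modified TEXT NOT NULL, generation_date TEXT NOT NULL)"
    )
    # Ordinary single-column indices, standing in for pre-existing ones.
    conn.execute("CREATE INDEX idx_modified ON lddb (modified)")
    conn.execute("CREATE INDEX idx_generation_date ON lddb (generation_date)")
    conn.executemany("INSERT INTO lddb VALUES (?, ?, ?)", ROWS)
    return conn


def ids_since(conn: sqlite3.Connection, since: str) -> Tuple[List[int], List[int]]:
    """Run both forms and return their sorted matching ids."""
    greatest = sorted(r[0] for r in conn.execute(GREATEST_QUERY, (since,)))
    union = sorted(r[0] for r in conn.execute(UNION_QUERY, (since, since)))
    return greatest, union


def union_plan(conn: sqlite3.Connection) -> str:
    """Join the EXPLAIN QUERY PLAN details for the UNION form."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + UNION_QUERY, ("", ""))
    return "\n".join(row[-1] for row in rows)


if __name__ == "__main__":
    conn = make_db()
    print(ids_since(conn, "2024-02-01"))  # ([2, 3, 4], [2, 3, 4])
    print(union_plan(conn))
```

Both forms return the same rows; the cost is that the calling code has to build and bind a two-branch query instead of a single predicate.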
Currently, reindexFrom at the end of each reindex takes about 20 minutes, even when there's almost nothing to reindex, because of very slow SQL queries. So let's add an index to speed them up.
tl;dr: 100x speedup, 520 rows scanned instead of 20 million (on a QA example query).
For each collection we do a query like this:
Previously on QA:
QA with the new index (note: it's no longer there, I added it temporarily for testing):
Note that I also changed from ::timestamptz to our totstz() function, because it's not possible to create an index using ::timestamptz: the cast is not marked IMMUTABLE (unlike totstz()).
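PostgreSQL only accepts IMMUTABLE functions in index expressions, since an indexed value must never change for a given row; the text-to-timestamptz cast is only STABLE because its result can depend on session settings such as TimeZone. SQLite has an analogous rule (functions in index expressions must be deterministic), and the sketch below uses it to illustrate the idea. It registers a hypothetical Python stand-in for totstz() (not the project's actual function) with and without the deterministic flag of sqlite3.Connection.create_function (Python 3.8+), and tries to build an index on it.

```python
import sqlite3
from datetime import datetime, timezone


def totstz_standin(text):
    """Hypothetical stand-in for totstz(): normalize an ISO-8601 timestamp
    that carries an explicit UTC offset to a UTC ISO string. The result
    depends only on the input, so declaring it deterministic is honest."""
    if text is None:
        return None
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # A naive timestamp would need a session time zone to interpret, the
        # kind of hidden input that makes a cast non-immutable; refuse it.
        raise ValueError(f"timestamp without UTC offset: {text!r}")
    return parsed.astimezone(timezone.utc).isoformat()


def can_index(deterministic: bool) -> bool:
    """Try to build an expression index that calls the stand-in function."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("totstz", 1, totstz_standin, deterministic=deterministic)
    conn.execute(
        "CREATE TABLE lddb (id INTEGER PRIMARY KEY, modified TEXT, generation_date TEXT)"
    )
    try:
        conn.execute(
            "CREATE INDEX idx_lddb_greatest_modified "
            "ON lddb (max(modified, totstz(generation_date)))"
        )
    except sqlite3.OperationalError as err:
        # SQLite refuses functions not flagged as deterministic here.
        print("rejected:", err)
        return False
    finally:
        conn.close()
    return True


if __name__ == "__main__":
    print(can_index(deterministic=False))  # False
    print(can_index(deterministic=True))   # True
    print(totstz_standin("2024-03-01T12:00:00+01:00"))  # 2024-03-01T11:00:00+00:00
```

As with IMMUTABLE in PostgreSQL, the flag is a promise made by whoever registers the function; the database does not verify it, which is why the stand-in rejects inputs whose meaning would depend on a session time zone.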