
Improve indexing command #2133

Draft
mascarpon3 wants to merge 3 commits into main from
improve-indexing-command


@mascarpon3 mascarpon3 commented Mar 25, 2026

Purpose

To index all documents, we need to be able to restart from the last successful batch after a crash. To do so, we need to be able to specify time bounds on the bulk index, so that indexing can resume from a checkpoint, and to delegate the work asynchronously to a Celery worker.
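A minimal sketch of the checkpoint idea, not the code in this PR. It assumes documents are processed in ascending updated_at order and that the checkpoint is written only after a batch succeeds. `Doc`, `index_in_batches`, `index_batch` and the `checkpoint` dict (standing in for the cache) are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document row."""

    id: int
    updated_at: datetime


def index_in_batches(docs, index_batch, checkpoint, batch_size=2):
    """Index docs in ascending updated_at order, saving a checkpoint per batch.

    `checkpoint` is a plain dict standing in for the cache (assumption).
    `index_batch` is any callable that indexes a list of docs and may raise.
    """
    last = checkpoint.get("last_updated_at")
    # Resume with >= so documents sharing the checkpoint timestamp are not
    # skipped; the boundary document is indexed again, which is assumed to
    # be harmless because indexing is idempotent.
    pending = sorted(
        (d for d in docs if last is None or d.updated_at >= last),
        key=lambda d: d.updated_at,
    )
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        index_batch(batch)  # if this raises, the checkpoint is left untouched
        checkpoint["last_updated_at"] = batch[-1].updated_at
```

Calling `index_in_batches` again with the same `checkpoint` after a failure restarts from the last batch that completed, rather than from the beginning.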

Proposal

  • ✨ cache the updated_at of the last successfully indexed document
  • ✨ add lower_time_bound and upper_time_bound args
  • ✨ run the existing batch_document_indexer_task asynchronously
  • ✨ make the command available in Django admin
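To illustrate the time-bound arguments, here is a rough sketch using only the standard library. The flag spellings (`--lower-time-bound`, `--upper-time-bound`), the ISO 8601 input format, the inclusive bounds, and the helper names `build_parser` and `in_bounds` are all assumptions for illustration; the actual command may differ.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a parser with two optional ISO 8601 time bounds (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Bulk index documents.")
    parser.add_argument(
        "--lower-time-bound",
        type=datetime.fromisoformat,
        default=None,
        help="Only index documents with updated_at at or after this time.",
    )
    parser.add_argument(
        "--upper-time-bound",
        type=datetime.fromisoformat,
        default=None,
        help="Only index documents with updated_at at or before this time.",
    )
    return parser


def in_bounds(updated_at, lower, upper):
    """Return True if updated_at lies within the bounds; None means unbounded."""
    if lower is not None and updated_at < lower:
        return False
    if upper is not None and updated_at > upper:
        return False
    return True
```

With bounds like these, recovering after a crash could amount to rerunning the command with the lower bound set to the cached updated_at.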

External contributions

Thank you for your contribution! 🎉

Please ensure the following items are checked before submitting your pull request:

  • I have read and followed the contributing guidelines
  • I have read and agreed to the Code of Conduct
  • I have signed off my commits with git commit --signoff (DCO compliance)
  • I have signed my commits with my SSH or GPG key (git commit -S)
  • My commit messages follow the required format: <gitmoji>(type) title description
  • I have added a changelog entry under ## [Unreleased] section (if noticeable change)
  • I have added corresponding tests for new features or bug fixes (if applicable)

I think some docstrings about a counter were outdated.
I added more details to help understand the logic.

Signed-off-by: charles <charles.englebert@protonmail.com>
Crash-save mode consists of indexing documents in ascending
updated_at order and saving the last document.updated_at.
This allows resuming indexing from the last successful batch
in case of a crash.

Signed-off-by: charles <charles.englebert@protonmail.com>
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch 3 times, most recently from d374d7c to a3acb66 Compare March 25, 2026 16:32
We need to be able to specify time bounds on the bulk index
to allow recovering from a checkpoint after a crash.

Signed-off-by: charles <charles.englebert@protonmail.com>
@mascarpon3 mascarpon3 force-pushed the improve-indexing-command branch from a3acb66 to 6d91fca Compare March 25, 2026 16:35
