fix: persist embedding state to S3 so stale vectors are deleted on re…#424

Merged
pulkit004 merged 1 commit into main from docs-ingestion
Mar 30, 2026

Conversation

@pulkit004
Contributor

…-ingestion

The state file (state/indexed-hashes.json) was written to the ephemeral CI runner's disk and lost when the runner terminated. Every subsequent run therefore started with an empty previousHashes set, which caused two problems:

  • Vectors deleted: 0 on every run, so vectors for removed content accumulated in Pinecone indefinitely
  • All 4109 embeddings were regenerated on every push (wasteful Bedrock API calls)
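
To make the failure mode concrete, here is a minimal sketch of a hash-based sync plan. It assumes the state file maps a chunk ID to a content hash; the real format of indexed-hashes.json is not shown in this PR, and `plan_sync` is a hypothetical name, not the repository's function.

```python
from __future__ import annotations


def plan_sync(
    previous_hashes: dict[str, str],
    current_hashes: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Return (ids_to_delete, ids_to_embed) for one sync run.

    Assumption: both arguments map a chunk ID to a content hash.
    """
    # IDs indexed last time but absent now are stale and should be deleted.
    to_delete = sorted(set(previous_hashes) - set(current_hashes))
    # IDs that are new, or whose hash changed, need a fresh embedding.
    to_embed = sorted(
        chunk_id
        for chunk_id, digest in current_hashes.items()
        if previous_hashes.get(chunk_id) != digest
    )
    return to_delete, to_embed
```

With an empty `previous_hashes`, `to_delete` is always empty and `to_embed` contains every current ID, which matches the two symptoms listed above.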

Fix: download the state file from S3 before the embedding sync and upload it afterwards. The state is stored at embed-state/indexed-hashes.json in the existing CONTENT_BUCKET_NAME bucket.
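
The download-then-upload flow could look roughly like the sketch below. It is not the code merged in this PR: `load_state` and `save_state` are hypothetical helpers, the JSON layout is assumed, and the `s3` argument is expected to behave like a boto3 S3 client (only `get_object` and `put_object` are used). A missing object is treated as a first run with empty state.

```python
import json

# Object key named in the PR description.
STATE_KEY = "embed-state/indexed-hashes.json"


def load_state(s3, bucket, key=STATE_KEY):
    """Fetch the previous hashes from S3; return {} if no state exists yet."""
    try:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    except Exception as err:
        # botocore's ClientError carries the error code on `.response`.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code in ("NoSuchKey", "404"):
            return {}  # first run: nothing has been indexed before
        raise  # any other failure should stop the sync, not reset state
    return json.loads(body)


def save_state(s3, bucket, state, key=STATE_KEY):
    """Write the hashes back to S3 after a successful sync."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(state, sort_keys=True).encode("utf-8"),
        ContentType="application/json",
    )
```

In a CI job this would typically be called with `boto3.client("s3")` and the bucket name read from configuration, for example an environment variable named CONTENT_BUCKET_NAME (an assumption; the PR only names the bucket setting). Re-raising on unexpected errors matters here: silently falling back to an empty state would reproduce the original bug.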

Co-Authored-By: claude-flow <ruv@ruv.net>
@github-actions

Checklist to merge a PR 🚀

To merge this pull request, please take time to complete the checklist.

What action did you perform?

Review the corresponding checklist items for the action you performed and mark them done.

Edit an existing content (MDX) page

Checklist

  • Review changes using the MDX preview option
  • If the content is longer than 15000 characters, use the Content preview portal to view changes
  • If a redirect to the existing page is needed, add a key-value pair in redirects.json

Edit an existing API reference page

Checklist


Add a new content (MDX) page

Checklist

  • Create a .mdx file with the path as its name in the content folder
  • Add frontmatter with all the metadata
  • Review the order of items in Sidebar using the Sidebar preview option
  • Review changes using the MDX preview option
  • If the content is longer than 15000 characters, use the Content preview portal to view changes
  • Create a folder with the same name if any children will be added to the page
  • Once all changes are done, update the menu items by using the Menu Items option
  • Add a key-value pair in redirects.json if you want a redirect to the new page

Add a new API reference page

Checklist

  • Create a .json file with the product path as its name
  • Create an api-reference.mdx file in the respective product folder inside content folder
  • Add frontmatter with all the metadata
  • Review the order of items in Sidebar using the Sidebar preview option
  • Add the API reference in JSON format (OpenAPI or Swagger) to the created .json file
  • Use the Content preview portal to view changes
  • Once all changes are done, update the menu items by using the Menu Items option

@pulkit004 pulkit004 merged commit 0b7f84c into main Mar 30, 2026
1 check passed

3 participants