fix(embedding): skip oversized single item during fallback #12
Merged
AperturePlus merged 1 commit into develop on Feb 22, 2026
Why: Indexing aborted entirely when a single text exceeded the model's token limit even at the minimum batch size.

What:
- Changed the fallback logic to skip permanently oversized items by inserting a zero-vector placeholder and continuing processing.
- Updated the embedding client docs and messages to reflect the new behavior.
- Added property test coverage to verify oversized-item isolation and preservation of output order.

Test:
- `uv run ruff check src tests` (pass)
- `uv run pytest tests/property/test_embedding_client_properties.py -q` (pass)
- `uv run mypy src --ignore-missing-imports --no-error-summary` (fails: existing repo-wide type issues unrelated to this change)
- `uv run pytest tests/ -v --tb=short -q --durations=10` (did not complete cleanly in this run; multiple failures were emitted early)
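The fallback described above can be sketched as a shrink-on-error loop. This is a minimal illustration under stated assumptions, not the actual `OpenAIEmbeddingClient._embed_with_fallback` code: `embed_with_fallback`, `embed_batch`, `batch_size`, and the local `BatchSizeError` class are hypothetical stand-ins, halving is an assumed shrink policy, and `min_batch_size` is assumed to be 1 so that a failure at the minimum isolates exactly one item.

```python
import logging
from collections.abc import Callable, Sequence

logger = logging.getLogger(__name__)


class BatchSizeError(Exception):
    """Hypothetical stand-in for the client's 'batch exceeds token limit' error."""


def embed_with_fallback(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    dimension: int,
    batch_size: int = 64,
    min_batch_size: int = 1,  # assumed; with 1, a failure at the minimum is a single item
) -> list[list[float]]:
    """Embed texts in input order, halving the batch on BatchSizeError.

    A batch that still fails at min_batch_size is treated as permanently
    oversized: each of its items gets a zero vector and processing continues.
    """
    vectors: list[list[float]] = []
    i = 0
    size = batch_size
    while i < len(texts):
        batch = texts[i : i + size]
        try:
            vectors.extend(embed_batch(batch))
        except BatchSizeError:
            if size > min_batch_size:
                # Retry the same position with a smaller batch.
                size = max(min_batch_size, size // 2)
                continue
            # Already at the minimum: skip instead of aborting the whole run.
            logger.warning("Skipping %d oversized item(s) at index %d", len(batch), i)
            vectors.extend([0.0] * dimension for _ in batch)
        i += len(batch)
    return vectors
```

One design choice worth noting: the sketch keeps the reduced batch size after a failure instead of growing it back, and the PR does not say which policy the real client uses.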
Motivation
Description
- Updated `OpenAIEmbeddingClient._embed_with_fallback` to log a warning and append a zero vector (`[0.0] * dimension`) when a `BatchSizeError` occurs at `min_batch_size`, then advance and continue processing.
- Added `test_oversized_single_item_is_skipped_with_zero_vector` to verify that the client continues processing, preserves output ordering, and places a zero vector at the oversized item's position.

Testing
- Ran `uv run ruff check src tests`; it passed.
- Ran `uv run pytest tests/property/test_embedding_client_properties.py -q`; the new test and the rest of the file passed.
- Ran `uv run pytest tests/ -v --tb=short -q --durations=10`; the full suite produced early failures unrelated to this change and did not complete cleanly in this run.
- Ran `uv run mypy src --ignore-missing-imports --no-error-summary`; it reported existing repo-wide type issues unrelated to this change.
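The properties the new property test is described as checking (processing continues past the oversized item, output order is preserved, and a zero vector sits at the oversized position) can be expressed as a standalone predicate. This is only a sketch: `oversized_isolation_holds`, `is_oversized`, and `embed_one` are hypothetical names, and the real test in `tests/property/test_embedding_client_properties.py` may be structured differently.

```python
from collections.abc import Callable, Sequence


def oversized_isolation_holds(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    is_oversized: Callable[[str], bool],      # hypothetical: marks texts over the token limit
    embed_one: Callable[[str], list[float]],  # hypothetical: deterministic fake embedding
    dimension: int,
) -> bool:
    """Return True if `vectors` satisfies the isolation and ordering properties.

    - exactly one output vector per input text (no item was dropped)
    - oversized texts map to an all-zero vector of length `dimension`
    - every other text maps to its own embedding at the same index
    """
    if len(vectors) != len(texts):
        return False
    zero = [0.0] * dimension
    for text, vec in zip(texts, vectors):
        expected = zero if is_oversized(text) else embed_one(text)
        # Exact comparison is fine here because the fake embedding is deterministic.
        if list(vec) != expected:
            return False
    return True
```

In a property-based setup, `texts` would come from a generator and `vectors` from the client under test running against a fake backend; the predicate itself only inspects the output.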