fix(embedding): skip oversized single item during fallback #12

Merged
AperturePlus merged 1 commit into develop from codex/fix-indexing-failure-for-oversized-items on Feb 22, 2026

@AperturePlus (Owner)

Motivation

  • Indexing aborted entirely when a single text exceeded the embedding model's token limit even at the minimum batch size, preventing the rest of the codebase from being indexed.
  • Design rationale: insert a zero-vector placeholder for each permanently oversized item and continue processing, which preserves output ordering and keeps indexing going. Alternatives considered were omitting the item, which breaks the positional mapping between input texts and output vectors, or failing the entire run, which reduces robustness. The tradeoff is that downstream consumers must treat zero vectors as placeholders.
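A minimal sketch of how a downstream consumer might honor that placeholder contract, assuming a placeholder is exactly an all-zero vector. The helper names is_placeholder, cosine_similarity, and filter_placeholders are hypothetical and not part of this codebase. Detecting placeholders explicitly matters because cosine similarity divides by the vector norm, which is zero for a placeholder.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_placeholder(vector: Sequence[float]) -> bool:
    """Hypothetical helper: True for an all-zero vector (the oversized-item placeholder)."""
    return all(x == 0.0 for x in vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> Optional[float]:
    """Cosine similarity, or None when either side is a placeholder (zero norm)."""
    if is_placeholder(a) or is_placeholder(b):
        return None
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_placeholders(
    texts: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[Tuple[str, Sequence[float]]]:
    """Drop (text, vector) pairs whose vector is a placeholder, keeping input order."""
    return [(t, v) for t, v in zip(texts, vectors) if not is_placeholder(v)]
```

Because placeholders keep their positions, texts and vectors can still be zipped index-for-index before filtering, which is what skipping the item outright would have broken.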

Description

  • Changed OpenAIEmbeddingClient._embed_with_fallback to log a warning and append a zero-vector ([0.0] * dimension) when a BatchSizeError occurs at min_batch_size, then advance past that item and continue processing.
  • Updated client docstrings and exception descriptions to reflect the new behavior for oversized single items.
  • Added a property test test_oversized_single_item_is_skipped_with_zero_vector to verify the client continues processing, preserves output ordering, and places a zero-vector at the oversized item's position.
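The fallback behavior described above can be sketched roughly as follows. This is not the actual OpenAIEmbeddingClient._embed_with_fallback code: the standalone function, the embed_batch callable, the halving strategy, and the local BatchSizeError class are assumptions made for illustration. It also assumes min_batch_size is 1, so a batch that still fails at the minimum size contains exactly the oversized item.

```python
import logging
from typing import Callable, List, Sequence

logger = logging.getLogger(__name__)


class BatchSizeError(Exception):
    """Hypothetical stand-in: raised when a batch exceeds the model's token limit."""


def embed_with_fallback(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int,
    dimension: int,
    min_batch_size: int = 1,
) -> List[List[float]]:
    """Embed texts in batches, shrinking on BatchSizeError and zero-filling oversized items."""
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be at least 1")
    vectors: List[List[float]] = []
    start = 0
    current = max(batch_size, min_batch_size)
    while start < len(texts):
        batch = texts[start:start + current]
        try:
            vectors.extend(embed_batch(batch))
        except BatchSizeError:
            if current > min_batch_size:
                # Assumption: halve the batch, but never go below min_batch_size.
                current = max(min_batch_size, current // 2)
                continue
            # Still failing at the minimum size (assumed to be 1, so the batch is
            # exactly the oversized item): warn, insert a zero-vector placeholder,
            # and move past the item instead of aborting the whole run.
            logger.warning("Skipping oversized item at index %d", start)
            vectors.append([0.0] * dimension)
            start += 1
            continue
        start += len(batch)
    return vectors
```

Because the placeholder is appended at the failing item's position, the output stays aligned index-for-index with the input texts, which is the ordering property the new test checks.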

Testing

  • Ran uv run ruff check src tests: passed.
  • Ran uv run pytest tests/property/test_embedding_client_properties.py -q: the new test and the rest of the file passed.
  • Ran uv run pytest tests/ -v --tb=short -q --durations=10: the full suite did not complete cleanly in this run and produced early failures unrelated to this change.
  • Ran uv run mypy src --ignore-missing-imports --no-error-summary: reported pre-existing repo-wide type issues unrelated to this change.

Codex Task

Why:
Indexing aborted entirely when one text exceeded model token limits at min batch size.

What:
Changed fallback logic to skip permanently oversized items by inserting a zero-vector placeholder and continuing processing.
Updated embedding client docs/messages to reflect the new behavior.
Added property test coverage to verify oversized-item isolation and output ordering preservation.

Test:
uv run ruff check src tests (pass)
uv run pytest tests/property/test_embedding_client_properties.py -q (pass)
uv run mypy src --ignore-missing-imports --no-error-summary (fails: existing repo-wide type issues unrelated to this change)
uv run pytest tests/ -v --tb=short -q --durations=10 (did not complete cleanly in this run; emitted multiple failures early)
AperturePlus merged commit 0e40760 into develop on Feb 22, 2026
1 check passed
AperturePlus deleted the codex/fix-indexing-failure-for-oversized-items branch on February 25, 2026 at 13:55