_commit_batch_streaming (conversation_base.py line 434) opens a transaction and calls _update_secondary_indexes_incremental inside it. That function does two expensive things:
-
Embedding generation — _update_message_index_incremental → message_index.add_messages() → text_location_index.add_text_locations() → _embedding_index.add_texts(). These are API calls to the embedding model, happening inside the DB transaction.
-
Fuzzy index term embeddings — _update_related_terms_incremental collects new terms from semantic refs and calls fuzzy_index.add_terms(), which generates an embedding per unique term. Many terms repeat across batches; the CachingEmbeddingModel helps but the per-batch overhead of collecting and checking is still there.
The two-stage pipeline already overlaps LLM extraction(N+1) with commit(N). But the commit phase itself is slower than necessary because of these embedding calls. Pre-computing embeddings alongside knowledge extraction (before the transaction opens) would keep the commit phase to pure DB writes. The fuzzy index could also be deferred to a single pass after all batches complete, since it's only needed for query-time — not for correctness of the ingested data.
_commit_batch_streaming(conversation_base.py line 434) opens a transaction and calls_update_secondary_indexes_incrementalinside it. That function does two expensive things:Embedding generation —
_update_message_index_incremental→message_index.add_messages()→text_location_index.add_text_locations()→_embedding_index.add_texts(). These are API calls to the embedding model, happening inside the DB transaction.Fuzzy index term embeddings —
_update_related_terms_incrementalcollects new terms from semantic refs and callsfuzzy_index.add_terms(), which generates an embedding per unique term. Many terms repeat across batches; theCachingEmbeddingModelhelps but the per-batch overhead of collecting and checking is still there.The two-stage pipeline already overlaps LLM extraction(N+1) with commit(N). But the commit phase itself is slower than necessary because of these embedding calls. Pre-computing embeddings alongside knowledge extraction (before the transaction opens) would keep the commit phase to pure DB writes. The fuzzy index could also be deferred to a single pass after all batches complete, since it's only needed for query-time — not for correctness of the ingested data.