feat: add fts support #408

Open
egolearner wants to merge 24 commits into alibaba:main from egolearner:feat/fts

@egolearner
Collaborator

Addresses #397.

egolearner added 20 commits May 22, 2026 15:43
Move the per-query filter check from the column-reader loop into the
Disjunction/Conjunction/Phrase iterators so filtered docs no longer pay
for block-max binary search, do_next alignment, or phase-2 position
verification ($POS CF reads). TermDocIterator inherits the base-class
default and stays unchanged.

block_max_info_for() now returns {score, last_doc} in one binary search
(with a small cache), so the standalone current_block_max_score(),
skip_to_next_block(), block_max_score_for() and block_max_last_doc_for()
methods have no live callers. Remove them from BitPackedPostingIterator,
the DocIterator base, and TermDocIterator, along with the now-dead
current_block_max_score_ member and its decode_block assignment. Tests
adjusted to query via block_max_info_for().
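
To illustrate the combined lookup described above, here is a minimal Python sketch. It is not the PR's C++ code: the `BlockMaxIndex` class, its fields, the single-block cache policy, and the `searches` counter are all assumptions made for illustration. Only the method name `block_max_info_for` and the `{score, last_doc}` result shape come from the commit message.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Sequence


class BlockMaxInfo(NamedTuple):
    score: float   # maximum score of any doc in the block
    last_doc: int  # last doc id stored in the block


class BlockMaxIndex:
    """Hypothetical per-posting-list block metadata (illustration only)."""

    def __init__(self, last_docs: Sequence[int], max_scores: Sequence[float]):
        assert len(last_docs) == len(max_scores)
        self._last_docs: List[int] = list(last_docs)      # ascending, one per block
        self._max_scores: List[float] = list(max_scores)
        self._cached_block = -1  # block that answered the previous lookup
        self.searches = 0        # binary searches performed (for demonstration)

    def block_max_info_for(self, doc_id: int) -> Optional[BlockMaxInfo]:
        # Cache hit: doc_id still falls inside the previously resolved block.
        b = self._cached_block
        if b >= 0:
            prev_last = self._last_docs[b - 1] if b > 0 else -1
            if prev_last < doc_id <= self._last_docs[b]:
                return BlockMaxInfo(self._max_scores[b], self._last_docs[b])

        # Cache miss: one binary search finds the first block whose
        # last doc id is >= doc_id, and both values are read from it.
        self.searches += 1
        b = bisect_left(self._last_docs, doc_id)
        if b == len(self._last_docs):
            return None  # doc_id lies past the final block
        self._cached_block = b
        return BlockMaxInfo(self._max_scores[b], self._last_docs[b])
```

Returning both values from one call means a caller that needs the block's score bound and its end boundary pays for a single search, and repeated lookups inside the same block are served from the cache.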

When an invert filter is highly selective compared to the FTS posting
size, posting-driven evaluation walks far more docs than necessary.
Mirror the existing vector_recall pattern: when invert match_count is
below fts_brute_force_by_keys_ratio * doc_count, extract the small id
set and AND it into the FTS root via a new CandidateDocIterator. The
candidate iterator becomes the lead by cost, turning the posting walk
into per-candidate advance() + matches() + score() and fully reusing
the existing AND / filter-pushdown / BM25 machinery.

- new CandidateDocIterator: ascending segment-local ids, lower_bound
  advance, zero score contribution
- FtsColumnIndexer::search wraps root_iter in Conjunction when
  FtsQueryParams.candidate_ids is non-empty
- new GlobalConfig::fts_brute_force_by_keys_ratio (default 0.05,
  independent from the vector knob because per-candidate FTS cost is
  higher due to phrase phase-2 IO), wired through C API + Python binding
- DocFilter::get_bf_by_keys_and_update now takes an explicit ratio so
  the two callers (vector vs FTS) pick the right knob; on the brute-
  force branch invert_filter_ is cleared so DocFilter never re-checks
  the same ids
- 9 iterator unit tests + 7 reader equivalence tests (Term / OR / AND /
  Phrase / Nested, coexistence with IndexFilter, empty-candidate
  fallback) + config default / validation asserts
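
The candidate-driven path described in this commit can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PR's C++ implementation: `use_candidate_path`, `ScoredPostings`, `conjunction`, and the `NO_MORE_DOCS` sentinel are hypothetical names, and the real iterator interface may differ. From the commit message it takes only the threshold rule (match count below ratio times doc count, default ratio 0.05), the ascending ids, the lower_bound-style advance, the zero score contribution, and the lead-by-cost behaviour.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

NO_MORE_DOCS = 2**31 - 1  # hypothetical end-of-iteration sentinel


def use_candidate_path(match_count: int, doc_count: int, ratio: float = 0.05) -> bool:
    """Take the brute-force-by-keys branch when the filter matches few docs."""
    return match_count < ratio * doc_count


class CandidateDocIterator:
    """Iterates a small set of segment-local ids in ascending order."""

    def __init__(self, ids: Iterable[int]):
        self._ids: List[int] = sorted(set(ids))
        self._pos = -1  # not positioned yet

    def cost(self) -> int:
        return len(self._ids)

    def doc_id(self) -> int:
        if self._pos < 0:
            return -1
        return self._ids[self._pos] if self._pos < len(self._ids) else NO_MORE_DOCS

    def advance(self, target: int) -> int:
        # lower_bound: first id >= target, never moving backwards.
        self._pos = bisect_left(self._ids, target, lo=max(self._pos, 0))
        return self.doc_id()

    def score(self) -> float:
        return 0.0  # candidates restrict the result set but add no score


class ScoredPostings(CandidateDocIterator):
    """Hypothetical term posting list: same traversal, plus a per-doc score."""

    def __init__(self, scores: Dict[int, float]):
        super().__init__(scores.keys())
        self._scores = scores

    def score(self) -> float:
        return self._scores[self.doc_id()]


def conjunction(iterators) -> List[Tuple[int, float]]:
    """AND the iterators together, letting the cheapest one lead."""
    iters = sorted(iterators, key=lambda it: it.cost())
    lead, others = iters[0], iters[1:]
    hits = []
    doc = lead.advance(0)
    while doc != NO_MORE_DOCS:
        for other in others:
            found = other.advance(doc)
            if found != doc:
                doc = lead.advance(found)  # skip the lead past the gap
                break
        else:
            hits.append((doc, sum(it.score() for it in iters)))
            doc = lead.advance(doc + 1)
    return hits
```

With three candidates against a five-entry posting list, the candidate iterator has the lower cost and leads, so the posting list is only probed once per candidate instead of being walked end to end.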
@egolearner egolearner requested a review from Cuiyus as a code owner May 22, 2026 07:43
@egolearner egolearner changed the title feat: add fts support in db layer feat: add fts support May 22, 2026

import pytest

from zvec.model.param.query import Fts, Query
Collaborator

@JalinWang JalinWang May 22, 2026

The name "Fts" is a bit too generic. Would it be more precise to name it after its underlying dependency, like _FtsQuery (binding) or FtsQueryParam (C++)?

2 participants