ref(search): Prefer substring and prefix matches in fzf scoring#111050

Closed
JonasBa wants to merge 1 commit into master from jb/ref/fzf


@JonasBa JonasBa commented Mar 18, 2026

The fzf v1 algorithm scores matches purely using word-boundary bonuses and gap
penalties. This causes a scattered subsequence match that happens to hit multiple
word boundaries to outscore a true contiguous substring match that starts mid-word.

In practice: searching "sco" in a Sentry member/assignee picker ranked
sentry.connect.ops@sentry.io (score 72, scattered — s at string start, c
after a dot, o consecutive) above discovery.channel@sentry.io or
francesco.novy@sentry.io (score 56, both contain "sco" as a real substring).
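
The distinction between a contiguous substring match and a scattered subsequence match can be sketched as follows. `classifyMatch` and `MatchKind` are hypothetical names used only for illustration; they are not part of this PR or of the fzf implementation, and the sketch does not compute any scores.

```typescript
type MatchKind = 'substring' | 'scattered' | 'none';

// Hypothetical helper (not from the PR): reports how `pattern` occurs in `target`.
function classifyMatch(pattern: string, target: string): MatchKind {
  const p = pattern.toLowerCase();
  const t = target.toLowerCase();

  // Contiguous occurrence: all pattern characters appear side by side.
  if (t.indexOf(p) !== -1) {
    return 'substring';
  }

  // Subsequence occurrence: characters appear in order, with gaps allowed.
  let matched = 0;
  for (let i = 0; i < t.length && matched < p.length; i++) {
    if (t[i] === p[matched]) {
      matched++;
    }
  }
  return matched === p.length ? 'scattered' : 'none';
}

console.log(classifyMatch('sco', 'sentry.connect.ops@sentry.io')); // scattered
console.log(classifyMatch('sco', 'discovery.channel@sentry.io')); // substring
console.log(classifyMatch('sco', 'francesco.novy@sentry.io')); // substring
```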

Changes

Two post-score bonuses are added after the existing exact-match boost:

Substring bonus (+24) — applied when matches.length === 1 (all pattern
characters form a single contiguous range). Ensures any substring match beats a
scattered boundary match regardless of where in the string it appears.

Prefix bonus (+8) — applied additionally when sidx === 0. Distinguishes a
match at the very start of the string from the same substring appearing after a
word separator mid-string. Without this, scott.morrison and aaron.scotton
tied at the same score because both s characters receive the same boundary bonus.
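
A minimal sketch of how the two bonuses could be applied, assuming a result object that exposes `score`, `sidx`, and `matches` as described above. The `FzfMatch` interface, the `applyPostScoreBonuses` name, and the `[start, end)` range convention are assumptions for illustration, not the actual code in this PR. The pre-bonus score of 80 for the two `scott` examples is inferred by subtracting the bonuses from the tier scores listed below.

```typescript
// Hypothetical result shape; only `sidx` and `matches` are named in the PR text.
interface FzfMatch {
  score: number; // score after the existing fzf v1 pass and exact-match boost
  sidx: number; // index in the target string where the match starts
  matches: Array<[number, number]>; // matched ranges, assumed to be [start, end)
}

const SUBSTRING_BONUS = 24;
const PREFIX_BONUS = 8;

// Hypothetical helper: adds the post-score bonuses to an existing score.
function applyPostScoreBonuses(result: FzfMatch): number {
  let score = result.score;
  // A single range means every pattern character is contiguous.
  if (result.matches.length === 1) {
    score += SUBSTRING_BONUS;
    // The prefix bonus stacks on top when the substring starts the string.
    if (result.sidx === 0) {
      score += PREFIX_BONUS;
    }
  }
  return score;
}

// Pattern "sco" against the examples from this PR.
const prefix = applyPostScoreBonuses({score: 80, sidx: 0, matches: [[0, 3]]}); // scott.morrison
const boundary = applyPostScoreBonuses({score: 80, sidx: 6, matches: [[6, 9]]}); // aaron.scotton
const midWord = applyPostScoreBonuses({score: 56, sidx: 2, matches: [[2, 5]]}); // discovery.channel
const scattered = applyPostScoreBonuses({score: 72, sidx: 0, matches: [[0, 1], [7, 9]]}); // sentry.connect.ops

console.log(prefix, boundary, midWord, scattered); // 112 104 80 72
```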

Resulting score tiers (pattern "sco")

| Tier | Example | Score |
| --- | --- | --- |
| Exact | "sco" | 96 |
| Prefix substring | scott.morrison@sentry.io | 112 |
| Boundary substring | aaron.scotton@sentry.io | 104 |
| Mid-word substring | discovery.channel@sentry.io | 80 |
| Scattered | sentry.connect.ops@sentry.io | 72 |

Tests cover all four tiers with a realistic Sentry org member dataset (labels +
emails) that mirrors the actual search UI, plus existing OTel attribute, full-name,
and email search suites.

The fzf v1 algorithm scores matches purely by word-boundary bonuses and
gap penalties. This causes a scattered subsequence match that happens to
hit multiple word boundaries (e.g. "s" at string start, "co" after a dot)
to outscore a true contiguous substring match that starts mid-word. In
practice this means searching "sco" in a member list ranks
"sentry.connect.ops@sentry.io" (scattered) above "discovery.channel@sentry.io"
or "francesco.novy@sentry.io" (both contain "sco" as a substring).

Two post-score bonuses are added to establish a clear preference hierarchy:

- Substring bonus (+24): applied when matches.length === 1, i.e. all
  pattern characters are contiguous. Ensures any substring match beats a
  scattered boundary match.

- Prefix bonus (+8): applied additionally when sidx === 0. Distinguishes
  "scott.morrison" (match at string start) from "aaron.scotton" (match
  starts a word component mid-string) — both previously tied at the same
  boundary bonus.

Resulting tiers for pattern "sco":
  prefix substring  (scott.morrison)         112
  boundary substring (aaron.scotton)          104
  mid-word substring (discovery.channel)       80
  scattered          (sentry.connect.ops)      72

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the Scope: Frontend label (automatically applied to PRs that change frontend components) Mar 18, 2026

it('ranks globex.io emails above others when searching "globex"', () => {
const results = search('globex');
const globexEmails = results.filter(r => r.email.includes('globex.io'));

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'globex.io' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

it('ranks globex.io emails above others when searching "globex"', () => {
const results = search('globex');
const globexEmails = results.filter(r => r.email.includes('globex.io'));
const nonGlobexEmails = results.filter(r => !r.email.includes('globex.io'));

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'globex.io' can be anywhere in the URL, and arbitrary hosts may come before or after it.
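
One way to avoid the substring check that CodeQL flags is to compare the domain part of the email exactly. The sketch below is an assumption about how the test filter could be written; `hasEmailDomain` is a hypothetical helper that does not exist in this PR, and whether it clears the alert would need to be confirmed by re-running the scan.

```typescript
// Hypothetical helper: true only when the part after the last "@" equals `domain`.
function hasEmailDomain(email: string, domain: string): boolean {
  const at = email.lastIndexOf('@');
  if (at === -1) {
    return false;
  }
  return email.slice(at + 1).toLowerCase() === domain.toLowerCase();
}

console.log(hasEmailDomain('hank@globex.io', 'globex.io')); // true
console.log(hasEmailDomain('hank@globex.io.example.com', 'globex.io')); // false
console.log(hasEmailDomain('globex.io@example.com', 'globex.io')); // false
```

In the test above, the filter could then read `results.filter(r => hasEmailDomain(r.email, 'globex.io'))`, which matches only addresses whose domain is exactly `globex.io`.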


@JonasBa JonasBa closed this Mar 19, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 4, 2026
