
feat: solve issues #9 and #10 (typo tolerance + neighbor-aware embeddings) #21

Merged: luigi-agosti merged 1 commit into main from feat/batch-issues-9-10 on Apr 11, 2026

@Djain912 (Collaborator) commented Apr 2, 2026

Summary

This PR completes issues #9 and #10 in one batch:

  1. Issue #9 (Typo tolerance via Levenshtein distance): typo tolerance for lexical matching
  • Adds a Levenshtein-based typo bonus for token pairs at edit distance 1.
  • Includes length guards and a bonus cap to prevent over-boosting.
  • Adds focused typo tests and a typo benchmark case.
  2. Issue #10 (Neighbor-aware embeddings for contextual disambiguation): neighbor-aware embeddings
  • Blends immediate-neighbor context into embedding vectors.
  • Introduces a configurable neighbor-context weight, clamped to [0, 1], with a default of 0.10.
  • Exposes a public constructor for setting the weight.
  • Adds disambiguation tests and a benchmark comparison against the context-disabled baseline.
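A minimal Go sketch of the typo bonus described for issue #9, assuming a plain Levenshtein distance over runes and a bonus only for distance-1 pairs. The helper names (`levenshtein`, `typoBonus`, `min3`) and the constants `minTypoTokenLen`, `typoBonusPerHit` and `maxTypoBonus` are hypothetical; the actual guard lengths and cap value in `internal/engine/lexical.go` are not stated in this PR.

```go
package main

import "fmt"

// Hypothetical tuning constants; the real values in lexical.go may differ.
const (
	minTypoTokenLen = 4    // length guard: short tokens are never treated as typos
	typoBonusPerHit = 0.1  // bonus per query token with a distance-1 match
	maxTypoBonus    = 0.25 // cap so typo matches cannot over-boost a score
)

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, where insertions,
// deletions and substitutions each cost 1. It keeps only two rows of the
// dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// typoBonus adds typoBonusPerHit for each sufficiently long query token that
// is exactly one edit away from some element token, then applies the cap.
func typoBonus(queryTokens, elemTokens []string) float64 {
	bonus := 0.0
	for _, q := range queryTokens {
		if len([]rune(q)) < minTypoTokenLen {
			continue
		}
		for _, e := range elemTokens {
			if len([]rune(e)) < minTypoTokenLen {
				continue
			}
			if levenshtein(q, e) == 1 {
				bonus += typoBonusPerHit
				break // count each query token at most once
			}
		}
	}
	if bonus > maxTypoBonus {
		bonus = maxTypoBonus
	}
	return bonus
}

func main() {
	fmt.Println(levenshtein("setings", "settings"))
	fmt.Printf("%.2f\n", typoBonus([]string{"acount", "setings"}, []string{"account", "settings"}))
}
```

Requiring a distance of exactly 1 means exact matches (distance 0) keep their normal lexical score and get no extra bonus, while the length guard stops short tokens such as "car" and "cat" from being treated as typos of each other.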

API

  • Added:
    • NewEmbeddingMatcherWithNeighborWeight(e Embedder, weight float64) ElementMatcher

Files

  • docs/reference/api.md
  • internal/engine/benchmark_test.go
  • internal/engine/embedding.go
  • internal/engine/embedding_test.go
  • internal/engine/lexical.go
  • internal/engine/lexical_test.go
  • semantic.go
  • semantic_test.go

Validation Performed

  • go test -count=1 ./...
  • go test ./internal/engine -run "TestLexicalScore_TypoTolerance_RealWorldSettings|TestEmbeddingMatcher_NeighborContextDisambiguatesRealWorldButtons|TestEmbeddingMatcher_SingleElement_WithNeighborWeight" -count=1
  • go test ./internal/engine -run ^$ -bench . -benchtime=1x
  • Manual CLI checks on real snapshots:
    • go run ./cmd/semantic find "login button" --snapshot testdata/snapshots/login-page.json --top-k 3 --format table
    • go run ./cmd/semantic find "add to cart" --snapshot testdata/snapshots/ecommerce-product.json --top-k 5 --format refs
    • go run ./cmd/semantic match "add to cart" e10 --snapshot testdata/snapshots/ecommerce-product.json
    • go run ./cmd/semantic classify "node is detached from document"

Environment Notes

  • go test -race ./... could not run in this Windows environment due to a local cgo/compiler limitation.
  • bash scripts/check.sh and bash scripts/e2e.sh could not run because WSL is not installed on this machine.

Closes #9
Closes #10

@luigi-agosti force-pushed the feat/batch-issues-9-10 branch from 97fce56 to afa2c25 on April 11, 2026 20:41
@luigi-agosti merged commit 62e0982 into main on Apr 11, 2026
7 checks passed
@luigi-agosti deleted the feat/batch-issues-9-10 branch on April 11, 2026 20:50


Linked issues: Neighbor-aware embeddings for contextual disambiguation (#10); Typo tolerance via Levenshtein distance (#9)
