feat(retriever): add level filter to find with retriever-level filtering, preserving recursive navigation#1988

Merged
qin-ctx merged 2 commits into
volcengine:mainfrom
sponge225:feat/level-filter
May 14, 2026

@sponge225
Collaborator

Description

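Adds level filtering to find, so that returned results can be restricted to L0 (abstract), L1 (overview), or L2 (original file).

Core design

The filter is applied at the two result-collection points inside HierarchicalRetriever, not at the vector search layer or as a post-filter:

  1. Global search pool admission: initial candidates from the global vector search are filtered by level before they enter the result pool.
  2. Recursive traversal pool admission: each result found during the recursive search is checked against level before it enters the pool.
    Key point: the recursive navigation logic ( dir_queue ) is unaffected by level. L0/L1 directories still enter the navigation queue as usual, so recursive search quality does not degrade.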
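
The two collection points described above can be illustrated with a small, self-contained sketch. This is not the HierarchicalRetriever code: the `Node` type, `toy_retrieve`, `level_allowed`, and the `toy://` URIs are hypothetical stand-ins, and the "global search" is simulated by a list of root nodes. It only shows the idea that the level check gates admission to the result pool while the navigation queue ignores it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical index entry; not the real OpenViking record type."""
    uri: str
    level: int  # 0 = abstract, 1 = overview, 2 = original file
    score: float
    is_dir: bool = False
    children: List["Node"] = field(default_factory=list)


def level_allowed(node_level: int, level: Optional[List[int]]) -> bool:
    # Assumption: level=None means "no filter", i.e. every level is kept.
    return level is None or node_level in level


def toy_retrieve(roots: List[Node], level: Optional[List[int]] = None) -> List[str]:
    collected: Dict[str, Node] = {}
    dir_queue: Deque[Node] = deque()

    # Collection point 1: initial candidates from the (simulated) global search.
    for node in roots:
        if level_allowed(node.level, level):
            collected[node.uri] = node
        if node.is_dir:
            dir_queue.append(node)  # navigation ignores the level filter

    # Collection point 2: results found while walking the directory queue.
    while dir_queue:
        current = dir_queue.popleft()
        for child in current.children:
            if level_allowed(child.level, level):
                collected[child.uri] = child
            if child.is_dir:
                dir_queue.append(child)  # still navigated even if filtered out

    ranked = sorted(collected.values(), key=lambda n: n.score, reverse=True)
    return [n.uri for n in ranked]


# Toy hierarchy: L0 directory -> L1 directory -> L2 file.
doc = Node("toy://root/sub/file", level=2, score=0.9)
sub = Node("toy://root/sub", level=1, score=0.7, is_dir=True, children=[doc])
root = Node("toy://root", level=0, score=0.8, is_dir=True, children=[sub])

l2_only = toy_retrieve([root], level=[2])
unfiltered = toy_retrieve([root])
```

With `level=[2]` the L0 and L1 directories are excluded from the pool, yet the L2 file beneath them is still reached because both directories were queued for navigation. Filtering before navigation would have dropped the directories and, with them, the path to the file.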

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

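Python layer

  • openviking/server/routers/search.py : FindRequest gains a level: Optional[List[int]] field; the route handler passes it through to the Service
  • openviking/service/search_service.py : SearchService.find gains a level parameter, passed through to VikingFS
  • openviking/storage/viking_fs.py : VikingFS.find gains a level parameter, passed to retriever.retrieve
  • openviking/retrieve/hierarchical_retriever.py :
    • retrieve gains a level parameter; the initial candidates from global search are filtered by level
    • _recursive_search gains a level parameter; both collection points are filtered by level (initial candidates + recursive traversal results)
    • Fixes a variable-shadowing bug: the debug-log loop variable level is renamed to result_level
    • Adds stagnation detection: the search terminates early when the result pool has not grown for several consecutive rounds, so that a level filter cannot prevent convergence (for example, requesting 600 L1 results when only 500 exist)
  • openviking/client/local.py , openviking/async_client.py , openviking/sync_client.py , openviking/server/mcp_endpoint.py : SDK clients pass the level parameter through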
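
The stagnation check mentioned in the list above might look roughly like the following. This is a minimal sketch under stated assumptions, not the retriever's actual convergence code: `collect_until_converged`, the `rounds` iterable, and the `max_stagnant_rounds` threshold of 3 are all hypothetical names and values chosen for illustration.

```python
from typing import Iterable, List, Set


def collect_until_converged(
    rounds: Iterable[List[str]],
    limit: int,
    max_stagnant_rounds: int = 3,  # hypothetical threshold, not from the PR
) -> Set[str]:
    """Collect URIs round by round; stop at `limit` or when the pool stalls."""
    pool: Set[str] = set()
    stagnant = 0
    for batch in rounds:
        before = len(pool)
        pool.update(batch)
        if len(pool) >= limit:
            break  # normal convergence: enough results collected
        if len(pool) == before:
            stagnant += 1
            if stagnant >= max_stagnant_rounds:
                break  # pool stopped growing: the limit is unreachable
        else:
            stagnant = 0  # any growth resets the counter
    return pool


# Only two matching results exist, but the caller asks for ten.
batches = [["a", "b"], ["b"], [], ["a"], ["c"]]
stalled = collect_until_converged(batches, limit=10)
```

Here the pool stops growing after the first round, so the loop exits after three stagnant rounds instead of continuing until the unreachable limit of ten; the fifth batch is never consumed.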

Rust CLI layer

  • crates/ov_cli/src/main.rs : the Find command gains a --level / -L argument ( Option<Vec<i32>> , value_delimiter=',' )
  • crates/ov_cli/src/handlers.rs , commands/search.rs , client.rs : pass the level parameter through; the JSON body carries it as an array
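
To make the CLI-to-HTTP mapping concrete, here is a Python sketch of what the Rust side is described as doing: a comma-delimited flag value becomes a list of integers, which is then sent as a JSON array. The function names `parse_level_flag` and `build_find_body` are hypothetical, and omitting the `level` key when no flag is given is an assumption; the actual Rust client may serialize it differently.

```python
import json
from typing import Any, Dict, List, Optional


def parse_level_flag(raw: Optional[str]) -> Optional[List[int]]:
    """Mimic a comma-delimited flag such as `--level 0,2`."""
    if raw is None:
        return None  # flag not given: no level filter
    return [int(part) for part in raw.split(",")]


def build_find_body(query: str, uri: str, level: Optional[List[int]]) -> str:
    body: Dict[str, Any] = {"query": query, "target_uri": uri}
    if level is not None:
        # Assumption: the key is left out entirely when no filter is requested.
        body["level"] = level
    return json.dumps(body, ensure_ascii=False)


levels = parse_level_flag("0,2")
payload = build_find_body("阿西莫夫", "viking://resources/xxx", levels)
```

The resulting payload has the same shape as the HTTP API examples in this description, with `level` encoded as an array rather than a comma-separated string.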

Tests

  • tests/server/test_api_search.py : adds 7 tests covering parameter passthrough, backward compatibility, filtering by each level, mixed-level filtering, and edge cases

Usage Examples

HTTP API

# Search L2 only (original files)
curl -X POST http://localhost:1977/api/v1/search/find \
  -H "Content-Type: application/json" \
  -d '{"query": "阿西莫夫", "target_uri": "viking://resources/xxx", "level": [2]}'

# Search L0 and L2
curl -X POST http://localhost:1977/api/v1/search/find \
  -H "Content-Type: application/json" \
  -d '{"query": "阿西莫夫", "target_uri": "viking://resources/xxx", "level": [0, 2]}'

CLI

# Search L2 only
ov find "阿西莫夫" --uri viking://resources/xxx --level 2

# Search L0 and L2
ov find "阿西莫夫" --uri viking://resources/xxx --level 0,2

# Short flag
ov find "阿西莫夫" --uri viking://resources/xxx -L 0,1,2

Python SDK

from openviking import OpenViking

client = OpenViking()

# Search L2 only
client.find("阿西莫夫", target_uri="viking://resources/xxx", level=[2])

# Search L0 and L2
client.find("阿西莫夫", target_uri="viking://resources/xxx", level=[0, 2])

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
    Test command:
OPENVIKING_CONFIG_FILE=/path/to/ov.conf uv run pytest tests/server/test_api_search.py -k "level" -xvs -o "addopts="

Test result: 8 passed

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 75
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

API Compatibility Breakage

The FindRequest.level field changed from Optional[Union[int, str, List[int]]] to Optional[List[int]], which breaks existing clients sending level as an integer or comma-separated string.

level: Optional[List[int]] = None

Missing Level Field in Request Body

The HttpClient::find method accepts a level parameter but the diff does not show it being added to the JSON request body. This would cause the Rust CLI to not send the level filter to the server.

let body = serde_json::json!({
    "query": query,
    "target_uri": uri,
    "limit": node_limit,
    "score_threshold": threshold,

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@sponge225 sponge225 force-pushed the feat/level-filter branch from 35d5255 to dc8bed0 Compare May 12, 2026 08:50
Comment thread openviking/retrieve/hierarchical_retriever.py Outdated
Comment thread openviking/server/routers/search.py
Comment thread openviking/server/routers/search.py Outdated
…ing, preserving recursive navigation

Add level: Optional[List[int]] parameter to find API/CLI/SDK to filter
results by L0 (abstract), L1 (overview), L2 (original file).

Key design: filter at two result collection points inside
HierarchicalRetriever (global search pool + recursive traversal pool),
NOT at vector search layer or post-filter layer. This preserves L0/L1
directory waypoints in dir_queue for recursive navigation, avoiding the
quality regression that merge_level_filter (PR volcengine#1980) causes.

Changes:
- Python: FindRequest, SearchService, VikingFS, LocalClient,
  AsyncOpenViking, SyncOpenViking, MCP endpoint all pass level through
- Retriever: initial_candidates and collected_by_uri filtered by level;
  dir_queue navigation unchanged
- Fix variable shadowing: rename debug loop 'level' to 'result_level'
- Add stagnation detection to convergence check when level filter
  prevents reaching limit
- CLI: --level / -L flag with Option<Vec<i32>> + value_delimiter
- Remove merge_level_filter from find route (breaks recursive navigation)
- Tests: 8 new test cases covering param passthrough, backward compat,
  single/mixed level filtering, and edge cases
@sponge225 sponge225 force-pushed the feat/level-filter branch from dc8bed0 to 08ef4b8 Compare May 13, 2026 08:31
- HTTP route layer: FindRequest.level changed to Union[int, str, List[int]]; the search route passes the level parameter through
- Service layer: SearchService.search gains a level parameter
- Storage layer: VikingFS.search gains a level parameter, passed to retriever.retrieve
- SDK layer: LocalClient/AsyncOpenViking/SyncOpenViking.search gain a level parameter
- MCP endpoint: the search tool gains a level parameter
- CLI layer: ov search --level changed from Option<String> to Option<Vec<i32>>, consistent with find
- Remove append_level_filter_params in favor of inline formatting
- The route layer uses _resolve_levels() to normalize the Union type to List[int]
- Add 9 tests: 7 search level tests + 2 find Union-type input tests
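
The commit above says the route layer normalizes the Union-typed input to List[int]. The body of _resolve_levels() is not shown in this conversation, so the following is only a sketch of how such a normalizer could work. The name `resolve_levels_sketch` is hypothetical, and treating an empty or whitespace-only string as "no filter" is an assumption.

```python
from typing import List, Optional, Union

LevelInput = Optional[Union[int, str, List[int]]]


def resolve_levels_sketch(level: LevelInput) -> Optional[List[int]]:
    """Normalize an int, a comma-separated string, or a list into List[int]."""
    if level is None:
        return None  # no filter requested
    if isinstance(level, int):
        return [level]  # single level, e.g. 2
    if isinstance(level, str):
        parts = [p.strip() for p in level.split(",") if p.strip()]
        # Assumption: an empty string means "no filter" rather than an error.
        return [int(p) for p in parts] or None
    return [int(v) for v in level]  # already a list, e.g. [0, 2]


single = resolve_levels_sketch(2)
from_string = resolve_levels_sketch("0, 2")
from_list = resolve_levels_sketch([0, 1])
```

Normalizing once at the route boundary would let the service, storage, and retriever layers accept only Optional[List[int]], while clients that still send an integer or a comma-separated string keep working.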
@qin-ctx qin-ctx merged commit 79389ee into volcengine:main May 14, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 14, 2026