Skip to content

feat(storage): optimize tree func#2372

Open
baojun-zhang wants to merge 1 commit into
volcengine:mainfrom
baojun-zhang:optimize-tree
Open

feat(storage): optimize tree func#2372
baojun-zhang wants to merge 1 commit into
volcengine:mainfrom
baojun-zhang:optimize-tree

Conversation

@baojun-zhang
Copy link
Copy Markdown
Collaborator

Description

Push the tree recursive traversal capability down from the Python layer into ragfs (Rust). A default implementation equivalent to Python's _tree_original is provided in core/filesystem.rs, and s3fs gets a high-performance override based on flat listing. The Python VikingFS layer now only handles view shaping, ACL, and agent enrichment, eliminating the duplicated DFS between _tree_original/_tree_agent. Zero semantic regression, with a major performance gain for wide S3 directories.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Rust Core: add TreeEntry (core/types.rs); add tree_directory() / tree_directory_internal() default recursive implementation to the FileSystem trait (core/filesystem.rs), preserving node_limit/level_limit/show_hidden/rel_path/DFS-order semantics identical to the original Python logic.
  • MountableFS: add tree_directory() routing and rewrite plugin-internal paths back to global AGFS absolute paths (core/mountable.rs).
  • StatsWrappedFS: add the tree_directory() proxy and FsOperation::TreeDir (core/stats_wrapper.rs, core/stats.rs), fixing the wrapper not delegating, which previously made the S3FS optimization ineffective.
  • S3FS: override tree_directory() and add build_tree_entries_from_flat_listing(), replacing per-directory recursive read_dir with a single/few paginated flat listings; client.rs adds list_tree_objects() that retains directory markers (plugins/s3fs/mod.rs, plugins/s3fs/client.rs).
  • Bindings: expose tree_directory() and TreeEntry → PyDict serialization (ragfs-python/src/lib.rs).
  • Python: async_client.py adds an async tree_directory(); viking_fs.py refactors _tree_original/_tree_agent into adapters/enrichment built on a shared _iter_visible_tree_entries(), adds _is_tree_entry_visible()/_ancestor_is_filtered(), and keeps the "apply node_limit after ACL filtering" semantics.

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Added/passing test coverage: Rust Core tree_directory default implementation (RUST-TREE-001018), S3FS override (RUST-S3-001012), MountableFS routing (RUST-MNT-001~004); Python _is_tree_entry_visible/_iter_visible_tree_entries/_tree_original/_tree_agent (PY-FLT/ITER/ORIG/AGENT, 31 cases, tests/storage/test_viking_fs_tree.py). 65 unit tests pass in total. Performance regression: at the same result scale (1000 nodes), tree dropped from ~31615ms/call to ~10726ms/call (~2.95x), p99 from 56835ms to 13035ms (~4.36x), and stability improved from 99/100 to 100/100.

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

  • Design reference: perf/s3/tree/tree-optimization-design.md; Chapter 16 documents the implementation retrospective (including the fix for the missing StatsWrappedFS proxy).
  • Semantic constraints: ACL is still enforced in the Python layer, and node_limit must be applied after ACL filtering (not pushed down to Rust); level_limit can be safely pushed down.
  • Unrelated workspace changes such as audit-design.md, sqlite/DESIGN.md, and setup.py are not part of this PR.

@baojun-zhang
Copy link
Copy Markdown
Collaborator Author

性能测试(LS)

前置

导入 1000 个资源,用于准备 ls 测试数据:

python3 ./OpenViking/load_test_add_resource_1000.py

测试命令

python3 perf/s3/ls/load_test_ls_100.py \
  --account-id tenant_d9f3c5b17f \
  --user-id user_b27e46a1cb \
  --api-key 'dGVuY...ZA' \
  --uri viking://resources \
  --iterations 100

结果汇总(100 次)

场景 iterations ok fail total_sec avg_ms p50_ms p95_ms p99_ms last_entry_count
优化前 100 99 1 3161.55 31615.41 37581.54 47593.37 56835.03 1000
优化后 100 100 0 1072.66 10726.53 10455.09 12912.64 13034.52 1000

结论

在相同结果规模(last_entry_count=1000)下,tree 从 ~31615ms/次 降到 ~10726ms/次,提升约 2.95 倍;同时稳定性提升:优化前有 1 次 500 失败(i=69),优化后 100 次全部成功(fail=0)。尾延迟改善尤为明显,p99 从 56835ms 降至 13035ms(约 4.36 倍)。

原始输出

优化前

progress: 20/100, ok=20, fail=0
progress: 40/100, ok=40, fail=0
progress: 60/100, ok=60, fail=0
ERR i=69 status=500 data=None text=
progress: 80/100, ok=79, fail=1
progress: 100/100, ok=99, fail=1

Summary:
  iterations: 100
  ok: 99
  fail: 1
  total_sec: 3161.5479
  avg_ms: 31615.41
  p50_ms: 37581.54
  p95_ms: 47593.37
  p99_ms: 56835.03
  last_entry_count: 1000

优化后

progress: 20/100, ok=20, fail=0
progress: 40/100, ok=40, fail=0
progress: 60/100, ok=60, fail=0
progress: 80/100, ok=80, fail=0
progress: 100/100, ok=100, fail=0

Summary:
  iterations: 100
  ok: 100
  fail: 0
  total_sec: 1072.6601
  avg_ms: 10726.53
  p50_ms: 10455.09
  p95_ms: 12912.64
  p99_ms: 13034.52
  last_entry_count: 1000

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 90
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant