[WIP] Testing in progress - WM v2 Memory Preprocessing #2131
Insert an ExtractionPreprocessor before the LLM in the WM v2 update path. It compresses full session messages into a compact packet using rule-based signal extraction and MMR-based span deduplication.

Architecture:
raw messages -> rule extraction (structured_facts) + MMR span selection -> compact packet -> LLM WM update

Safety nets:
- Short sessions (<600 tokens) -> session_too_short fallback
- Compact packet not sufficiently smaller than the full messages -> compact_not_smaller_enough fallback
- High-risk sessions -> auto-expand budget 1.5x or fallback
- Failed tool messages not selected -> fallback
- Full archive always preserved via ov_archive_search

Signal extraction covers: errors, corrections, preferences, dates, goals, open issues, paths, URLs, functions, plugins, recall, fallback, components

Truncation uses paragraph-aware extraction instead of naive head-truncation, preserving the most information-dense paragraphs within the char budget.

Config (all default-off): wm_v2_preprocess_enabled, wm_v2_preprocess_max_span_tokens, wm_v2_preprocess_fallback_ratio

Only wired into the WM v2 update path; creation and long-term extraction are unchanged.

Tests: 26 preprocessor unit tests + 19 fixture scenarios; 133 total passing when combined with the existing WM v2 guard/growth tests.

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
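To make the MMR span selection step concrete, here is a minimal sketch of greedy Maximal Marginal Relevance over text spans. It is not the PR's implementation: the function names (mmr_select, jaccard), the token-set Jaccard similarity, the lam trade-off weight, and the precomputed relevance scores are all assumptions made for illustration.

```python
import re


def _tokens(text):
    # Lowercased word tokens; a crude stand-in for a real tokenizer.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    # Token-set Jaccard similarity in [0, 1].
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mmr_select(spans, relevance, k, lam=0.7):
    # Greedy MMR: repeatedly pick the span with the best trade-off between
    # its relevance and its similarity to spans that are already selected.
    # `relevance` and `lam` are hypothetical inputs, not values from the PR.
    selected = []
    candidates = list(range(len(spans)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (jaccard(spans[i], spans[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [spans[i] for i in selected]


spans = [
    "error in parser module",
    "error in parser module",
    "user prefers dark mode",
]
picked = mmr_select(spans, relevance=[1.0, 0.9, 0.5], k=2)
```

In this example the duplicate error span is skipped in favor of the less relevant but non-redundant preference span, which is the deduplication effect the commit message describes.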
Force-pushed from f7bb425 to 7ce8372.
Change creation_span_budget * 2 to creation_span_budget in session.py, matching the UPDATE path's span budget (1200). The 2x multiplier was causing compact packets to be larger than the full messages on CREATION, preventing the preprocessor from ever triggering ACTIVE on first-time WM generation.

End-to-end verification with a 150-message real Codex session confirms:
- Preprocessor achieves 40% token savings (50K→30K) in ACTIVE mode
- Accuracy ON (45%) >= OFF (40%), no quality regression
- QA input tokens reduced 70%, total tokens reduced 74%

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
- Add _resolve_adaptive_options() to scale span budget and facts cap with session size (P0 optimization)
- Implement three-tier rendering format:
  - Tier 1 (<2K tokens): ultra-compact # WM-Compact with inline facts
  - Tier 2 (2K-8K tokens): moderate format with reduced headers
  - Tier 3 (>8K tokens): full format (existing behavior)
- Middle sessions (~5K tokens) now trigger ACTIVE instead of FALLBACK

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
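The tier boundaries above can be sketched as a small selection function. This is an illustration only: select_tier is a hypothetical name, "2K" and "8K" are assumed to mean 2000 and 8000 tokens, and the handling of the exact boundary values is a guess based on the "<2K" and ">8K" wording.

```python
def select_tier(session_tokens):
    # Map an estimated session size to a rendering tier.
    # Assumption: 2K = 2000 and 8K = 8000; both boundaries fall into Tier 2.
    if session_tokens < 2000:
        return 1  # ultra-compact, inline facts
    if session_tokens <= 8000:
        return 2  # moderate format, reduced headers
    return 3  # full format (existing behavior)


# A middle-sized session (~5K tokens) lands in Tier 2.
middle_tier = select_tier(5000)
```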
…rginal compacting
Sessions where compaction saves <500 tokens are now FALLBACK
("savings_too_small") even when the ratio check passes. Token estimation
uses a ceil(len/4) heuristic with inherent noise; small absolute savings
are within estimation error and don't justify compaction's information-loss
risk.
Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
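As a rough illustration of how the fallback checks described in these commits might fit together, the sketch below combines the ceil(len/4) token estimate with the three fallback reasons named so far. The function names, the order of the checks, and the default fallback_ratio of 0.8 are assumptions; only the 600-token and 500-token thresholds and the reason strings come from the commit messages.

```python
import math


def estimate_tokens(text):
    # ceil(len/4) heuristic from the commit message; noisy by design.
    return math.ceil(len(text) / 4)


def decide_mode(full_text, compact_text, fallback_ratio=0.8,
                min_session_tokens=600, min_savings_tokens=500):
    # Returns (mode, reason). fallback_ratio=0.8 is a hypothetical default;
    # the PR only names the wm_v2_preprocess_fallback_ratio config key.
    full = estimate_tokens(full_text)
    compact = estimate_tokens(compact_text)
    if full < min_session_tokens:
        return "FALLBACK", "session_too_short"
    if compact > full * fallback_ratio:
        return "FALLBACK", "compact_not_smaller_enough"
    if full - compact < min_savings_tokens:
        # The ratio check passed, but the absolute savings are within
        # the estimation error of the heuristic.
        return "FALLBACK", "savings_too_small"
    return "ACTIVE", None
```

For example, a 1000-token session compacted to 600 tokens passes a 0.8 ratio check but saves only 400 tokens, so under these assumptions it would still fall back with savings_too_small.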
Force-pushed from 7ce8372 to a0a8670.
benchmark/locomo/data/* have been removed from the PR after the latest force-push, so earlier auto-generated review notes referencing those files are now stale.
Persistent review updated to latest commit f34913b
Force-pushed from 4b5e8b6 to 7026535.
Force-pushed from 0df5462 to c55362a.
…ose savings threshold
Force-pushed from 0f9c079 to 524c8cc.
Force-pushed from 524c8cc to 8b310ad.
Description
Introduce WM v2 memory preprocessing so long session payloads can be compacted before the WM update step, then refine activation thresholds and rendering strategy to keep the behavior conservative when savings are marginal.
Related Issue
N/A
Type of Change
Changes Made
Add ExtractionPreprocessor to the WM v2 update path so full session messages can be distilled into a compact packet before the LLM update step.
Testing
Tested locally / referenced in branch commits:
LoCoMo small test cases
Averages aggregated by mode
LoCoMo sample0 test cases (in progress)
Checklist
Screenshots (if applicable)
N/A
Additional Notes
The preprocessing logic is intentionally conservative. If the token savings from compaction are too small to be confidently beneficial, the flow falls back to the full-message path instead of risking information loss for marginal gains.