[WIP] Testing in progress - WM v2 Memory Preprocessing #2131

Open
jcp0578 wants to merge 13 commits into volcengine:main from jcp0578:feat/wm-v2-token-distillation

Conversation

Contributor

@jcp0578 jcp0578 commented May 19, 2026

Description

Introduce WM v2 memory preprocessing so long session payloads can be compacted before the WM update step, then refine activation thresholds and rendering strategy to keep the behavior conservative when savings are marginal.

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Add ExtractionPreprocessor to the WM v2 update path so full session messages can be distilled into a compact packet before the LLM update step.
  • Extract structured signals and select spans with MMR-based deduplication to preserve high-value context while reducing token cost.
  • Add conservative fallback paths for short sessions, weak compaction, risky sessions, and failed tool-selection cases.
  • Add adaptive preprocessing parameters so span budget and fact caps scale with session size.
  • Introduce tiered compact rendering formats for small, medium, and large sessions to improve activation on mid-sized inputs.
  • Align CREATION and UPDATE span budgets so preprocessing can activate consistently during first-time WM generation.
  • Add a minimum absolute savings threshold so sessions with very small estimated gains fall back instead of compacting on noisy token estimates.
  • Keep the feature wired only into the WM v2 update path; WM creation flow outside the span-budget alignment and long-term extraction behavior remain unchanged.
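
The MMR-based span selection mentioned in the list above can be sketched as follows. This is a minimal illustration, not the code in extraction_preprocessor.py: the function name `mmr_select`, the bag-of-words cosine similarity, the precomputed `relevance` scores, and the `lam` trade-off value are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    # Lowercased word counts; a stand-in for whatever representation the PR uses.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(
    spans: List[str], relevance: List[float], k: int, lam: float = 0.7
) -> List[int]:
    """Return indices of up to k spans chosen by Maximal Marginal Relevance.

    Each step picks the candidate that maximizes
    lam * relevance - (1 - lam) * (max similarity to already selected spans).
    """
    vectors = [_bag_of_words(s) for s in spans]
    selected: List[int] = []
    candidates = list(range(len(spans)))
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            redundancy = max(
                (_cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical high-relevance spans and one distinct lower-relevance span, a budget of `k=2` keeps one copy of the duplicate plus the distinct span, which is the deduplication effect the description relies on.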

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Tested locally / referenced in branch commits:

LoCoMo small case test

| Mode | Run | Accuracy | Correct / Total | Gateway Ingest Tokens | Gateway QA Tokens | OV Ingest LLM Tokens | Combined Total Tokens | Tokens per Successful Task |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OFF | 1 | 88.57% | 31 / 35 | 70,402 | 759,712 | 114,015 | 944,129 | 30,455.77 |
| OFF | 2 | 85.71% | 30 / 35 | 70,349 | 755,764 | 106,295 | 932,408 | 31,080.27 |
| OFF | 3 | 91.43% | 32 / 35 | 70,472 | 772,242 | 96,306 | 939,020 | 29,344.38 |
| ON | 1 | 85.71% | 30 / 35 | 70,762 | 661,930 | 88,601 | 821,293 | 27,376.43 |
| ON | 2 | 97.14% | 34 / 35 | 70,259 | 774,898 | 96,563 | 941,720 | 27,697.65 |
| ON | 3 | 97.14% | 34 / 35 | 70,840 | 796,350 | 97,856 | 965,046 | 28,383.71 |

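Averages by mode

| Mode | Avg Accuracy | Avg Correct | Avg Gateway Ingest | Avg Gateway QA | Avg OV Ingest LLM | Avg Combined | Avg Tokens per Successful Task |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OFF | 88.57% | 31.00 | 70,407.67 | 762,572.67 | 105,538.67 | 938,519.00 | 30,293.47 |
| ON | 93.33% | 32.67 | 70,620.33 | 744,392.67 | 94,340.00 | 909,353.00 | 27,819.26 |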
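
For readers checking the numbers, the derived columns can be reproduced with a few lines. The sketch assumes that Combined Total Tokens is the sum of the three token columns and that the per-task figure is Combined divided by the number of correct answers; the reported rows are consistent with both assumptions. The helper names are made up for this example.

```python
def tokens_per_success(gateway_ingest: int, gateway_qa: int,
                       ov_ingest_llm: int, correct: int):
    """Return (combined tokens, combined tokens per correctly answered task)."""
    combined = gateway_ingest + gateway_qa + ov_ingest_llm
    return combined, round(combined / correct, 2)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional reduction of candidate relative to baseline."""
    return (baseline - candidate) / baseline


# OFF, run 1 from the table above.
combined, per_task = tokens_per_success(70_402, 759_712, 114_015, 31)

# Average tokens per successful task, OFF vs ON.
saving = relative_reduction(30_293.47, 27_819.26)
```

On the reported averages this works out to roughly an 8% reduction in tokens per successful task with preprocessing ON.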

LoCoMo sample0 case: testing in progress

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

The preprocessing logic is intentionally conservative. If compaction is too small to be confidently beneficial, the flow falls back to the full-message path instead of risking information loss for marginal token savings.

Insert an ExtractionPreprocessor before the LLM step in the WM v2 update path.
It compresses full session messages into a compact packet using rule-based
signal extraction and MMR-based span deduplication.

Architecture:
  raw messages -> rule extraction (structured_facts) + MMR span selection
  -> compact packet -> LLM WM update

Safety nets:
  - Short sessions (<600 tokens) -> session_too_short fallback
  - Compact packet not sufficiently smaller than full messages -> compact_not_smaller_enough fallback
  - High risk sessions -> auto expand budget 1.5x or fallback
  - Failed tool messages not selected -> fallback
  - Full archive always preserved via ov_archive_search
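
A rough sketch of how the safety nets above could be ordered as a single gate. The 600-token floor and the reason strings session_too_short and compact_not_smaller_enough come from this commit message; the function name `decide`, the `GateDecision` type, the default `fallback_ratio` of 0.8, and the reason string failed_tool_not_selected are assumptions for illustration only. The high-risk budget expansion is left out.

```python
from dataclasses import dataclass
from typing import Optional, Set

MIN_SESSION_TOKENS = 600  # short-session floor stated in the commit message


@dataclass
class GateDecision:
    mode: str                     # "ACTIVE" or "FALLBACK"
    reason: Optional[str] = None  # set only when mode == "FALLBACK"


def decide(
    full_tokens: int,
    compact_tokens: int,
    failed_tool_ids: Set[str],
    selected_ids: Set[str],
    fallback_ratio: float = 0.8,  # hypothetical default, not taken from the PR
) -> GateDecision:
    """Decide whether the compact packet may replace the full messages."""
    if full_tokens < MIN_SESSION_TOKENS:
        return GateDecision("FALLBACK", "session_too_short")
    # Every failed tool message must survive span selection.
    if not failed_tool_ids <= selected_ids:
        return GateDecision("FALLBACK", "failed_tool_not_selected")  # hypothetical name
    if compact_tokens > full_tokens * fallback_ratio:
        return GateDecision("FALLBACK", "compact_not_smaller_enough")
    return GateDecision("ACTIVE")
```

Checking the cheap conditions first means a short session never pays for span selection, and any fallback leaves the full-message path untouched.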

Signal extraction covers: errors, corrections, preferences, dates, goals,
  open issues, paths, URLs, functions, plugins, recall, fallback, components

Truncation uses paragraph-aware extraction instead of naive head-truncation,
preserving the most information-dense paragraphs within the char budget.
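
One way to picture paragraph-aware truncation, as opposed to cutting at the head of the text. This is only a sketch: the density heuristic (regex hits for paths, URLs, numbers, and a few keywords per character), the `_SIGNAL` pattern, and the function name `truncate_by_paragraph` are assumptions, and the PR's actual scoring may differ.

```python
import re
from typing import List

# Hypothetical "signal" pattern: URLs, paths, numbers, and a few keywords.
_SIGNAL = re.compile(
    r"https?://\S+|/[\w./-]+|\b\d[\w.:-]*\b|\b(?:error|failed|fix|must|prefer)\w*\b",
    re.IGNORECASE,
)


def _density(paragraph: str) -> float:
    # Signal hits per character; higher means more information-dense.
    return len(_SIGNAL.findall(paragraph)) / max(len(paragraph), 1)


def truncate_by_paragraph(text: str, char_budget: int) -> str:
    """Keep the densest paragraphs that fit the budget, in original order."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    ranked = sorted(
        range(len(paragraphs)), key=lambda i: _density(paragraphs[i]), reverse=True
    )
    kept: List[int] = []
    used = 0
    for i in ranked:
        cost = len(paragraphs[i]) + (2 if kept else 0)  # 2 chars for the separator
        if used + cost <= char_budget:
            kept.append(i)
            used += cost
    if not kept:
        # No whole paragraph fits: fall back to plain head truncation.
        return text[:char_budget]
    return "\n\n".join(paragraphs[i] for i in sorted(kept))
```

Restoring the original paragraph order after ranking keeps the excerpt readable even when the kept paragraphs were not adjacent.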

Config (all default-off):
  wm_v2_preprocess_enabled, wm_v2_preprocess_max_span_tokens,
  wm_v2_preprocess_fallback_ratio

Only wired into WM v2 update path; creation and long-term extraction unchanged.

Tests: 26 preprocessor unit tests + 19 fixture scenarios; 133 tests pass in
  total when the existing WM v2 guard/growth tests are included.

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit f34913b)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 80
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: WM v2 Memory Preprocessing

Relevant files:

  • openviking/session/extraction_preprocessor.py
  • openviking/session/session.py
  • openviking_cli/utils/config/memory_config.py
  • tests/unit/session/test_extraction_preprocessor.py
  • tests/unit/session/test_fixture_token_savings.py

Sub-PR theme: Auto-recall Improvements

Relevant files:

  • examples/openclaw-plugin/auto-recall.ts
  • examples/openclaw-plugin/context-engine.ts
  • examples/openclaw-plugin/process-manager.ts

⚡ Recommended focus areas for review

No-op change

The to_dict method was replaced with an identical line, which is a no-op. This might be a mistake or a whitespace-only change.

def to_dict(self) -> Dict[str, Any]:
    """Convert configuration to dictionary."""
    return self.model_dump()
Forced recall fallback

The code now forces a recall fallback even when ctx is available, which might change behavior for non-/v1/responses flows. This is noted as a diagnostic/workaround, but should be verified.

// entered but the transform-context recall path is never reached.
const forcedRecallFallback = await tryMainPathRecallFallback("main_force");
if (forcedRecallFallback) {
  return forcedRecallFallback;
}

@jcp0578 jcp0578 changed the title from "记忆预处理" (Memory Preprocessing) to "WM v2 Memory Preprocessing" May 19, 2026
@jcp0578 jcp0578 changed the title from "WM v2 Memory Preprocessing" to "[WIP]WM v2 Memory Preprocessing" May 19, 2026
@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Replace bare except with specific error handling

The bare except: pass silently swallows all exceptions, making debugging difficult.
Replace it with a specific exception handler that logs the error (even if continuing
execution) to improve visibility into issues.

benchmark/locomo/data/build_codex_benchmark.py [25-26]

-except:
-    pass
+except json.JSONDecodeError as e:
+    print(f"Warning: Failed to decode JSON line: {e}", file=sys.stderr)
+    continue
+except Exception as e:
+    print(f"Warning: Unexpected error processing line: {e}", file=sys.stderr)
+    continue
Suggestion importance[1-10]: 5

Why: The suggestion addresses a poor practice (bare except: pass) by adding specific error handling with logging. This improves debuggability without changing the core functionality, making it a moderate-impact improvement.

Impact: Low

@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch 3 times, most recently from f7bb425 to 7ce8372, May 19, 2026 15:32
jcp0578 and others added 3 commits May 19, 2026 23:32
Change creation_span_budget * 2 to creation_span_budget in session.py,
matching the UPDATE path's span budget (1200). The 2x multiplier was
causing compact packets to be larger than full messages on CREATION,
preventing the preprocessor from ever triggering ACTIVE on first-time
WM generation.

End-to-end verification with a 150-message real Codex session confirms:
- Preprocessor achieves 40% token savings (50K→30K) in ACTIVE mode
- Accuracy ON (45%) >= OFF (40%), no quality regression
- QA input tokens reduced 70%, total tokens reduced 74%

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
- Add _resolve_adaptive_options() to scale span budget and facts cap
  with session size (P0 optimization)
- Implement three-tier rendering format:
  Tier 1 (<2K tokens): ultra-compact # WM-Compact with inline facts
  Tier 2 (2K-8K tokens): moderate format with reduced headers
  Tier 3 (>8K tokens): full format (existing behavior)
- Middle sessions (~5K tokens) now trigger ACTIVE instead of FALLBACK

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
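
The tier boundaries in the commit above map to a small selector. The thresholds (under 2K, 2K to 8K, over 8K tokens) are taken from the commit message; the function name `select_render_tier` and the treatment of the exact boundary values (2,000 and 8,000 both land in Tier 2) are assumptions.

```python
TIER1_MAX_TOKENS = 2_000  # below this: ultra-compact "# WM-Compact" format
TIER2_MAX_TOKENS = 8_000  # up to this: moderate format with reduced headers


def select_render_tier(session_tokens: int) -> int:
    """Map an estimated session size to rendering tier 1, 2, or 3."""
    if session_tokens < TIER1_MAX_TOKENS:
        return 1
    if session_tokens <= TIER2_MAX_TOKENS:
        return 2
    return 3  # full format (the pre-existing behavior)
```

A session of about 5K tokens selects Tier 2, whose lighter formatting is what lets mid-sized sessions clear the fallback checks and reach ACTIVE.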
…rginal compacting

Sessions where compact saves <500 tokens are now FALLBACK
("savings_too_small") even when the ratio check passes. Token estimation
uses ceil(len/4) heuristic with inherent noise — small absolute savings
are within estimation error and don't justify compaction's information-loss
risk.

Co-Authored-By: deepseekV4-pro <noreply@deepseek.com>
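
The absolute-savings check from the commit above, sketched with the ceil(len/4) estimator it mentions. The 500-token threshold and the heuristic are from the commit message; the function names are hypothetical.

```python
import math

MIN_ABSOLUTE_SAVINGS = 500  # tokens, threshold stated in the commit message


def estimate_tokens(text: str) -> int:
    """Character-count heuristic: roughly four characters per token."""
    return math.ceil(len(text) / 4)


def savings_too_small(full_text: str, compact_text: str) -> bool:
    """True when the estimated saving is below the absolute threshold."""
    saved = estimate_tokens(full_text) - estimate_tokens(compact_text)
    return saved < MIN_ABSOLUTE_SAVINGS
```

For example, compacting 4,000 characters down to 2,400 is a 40% reduction but only about 400 estimated tokens, so it would fall back; the same ratio on a 40,000-character session saves about 4,000 tokens and would pass this check.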
@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch from 7ce8372 to a0a8670, May 19, 2026 15:33
Contributor Author

jcp0578 commented May 19, 2026

benchmark/locomo/data/* have been removed from the PR after the latest force-push, so earlier auto-generated review notes referencing those files are now stale.

@jcp0578 jcp0578 changed the title from "[WIP]WM v2 Memory Preprocessing" to "[WIP] Testing in progress - WM v2 Memory Preprocessing" May 20, 2026
@jcp0578 jcp0578 marked this pull request as ready for review May 20, 2026 10:09
@github-actions

Persistent review updated to latest commit f34913b

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch 2 times, most recently from 4b5e8b6 to 7026535, May 20, 2026 10:23
@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch from 0df5462 to c55362a, May 20, 2026 12:41
@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch from 0f9c079 to 524c8cc, May 21, 2026 15:53
@jcp0578 jcp0578 force-pushed the feat/wm-v2-token-distillation branch from 524c8cc to 8b310ad, May 21, 2026 16:11
Projects

Status: Backlog

1 participant