fix(session): extract meaningful abstract from WM v2 summaries#1868

Open
liuhang93 wants to merge 1 commit into volcengine:main from liuhang93:fix/wm-v2-abstract-extraction
@liuhang93

Summary

  • `_extract_abstract_from_summary` returned `# Working Memory` (the markdown heading) for all WM v2 archives because the v1 regex didn't match and the fallback simply grabbed the first line
  • Add a v2-aware regex that extracts the Session Title content line, and improve the final fallback to skip markdown headings

Context

PR #1782 introduced Working Memory v2 with a `# Working Memory` / `## Session Title` structure, but `_extract_abstract_from_summary` was not updated to handle the new format. As a result, `.abstract.md` for every WM v2 archive is always the literal string `# Working Memory`, losing all semantic value.

Before (bug)

# Working Memory       <-- .abstract.md content (useless)

After (fix)

Wang Fang Tencent Senior Backend Engineer Onboarding   <-- actual Session Title

Changes

openviking/session/session.py — 3-tier extraction:

  1. v1 format: `**One-sentence overview**: <text>` (unchanged, backward compatible)
  2. NEW v2 format: regex for `## Session Title` → skip optional italic description → capture content
  3. Improved fallback: iterate lines, skip empty and `#`-prefixed headings
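
The three tiers above can be sketched roughly as follows. This is a standalone illustration, not the code in `openviking/session/session.py`: the function name `extract_abstract`, the exact v1 regex, and the extra guard against capturing a heading as the title are assumptions made for this sketch. The v2 pattern is the one quoted in the review suggestion further down.

```python
import re

# Tier 1 (assumed pattern): the v1 "**One-sentence overview**: <text>" line.
V1_OVERVIEW = re.compile(r"\*\*One-sentence overview\*\*:\s*(.+)")

# Tier 2: "## Session Title", an optional italic description line, then the title.
V2_TITLE = re.compile(
    r"^##\s+Session\s+Title\s*\n"  # section header
    r"(?:_[^_]+_\s*\n)?"           # optional italic description
    r"(.+)",                       # title content
    re.MULTILINE,
)


def extract_abstract(summary: str) -> str:
    """Hypothetical stand-in for _extract_abstract_from_summary."""
    if not summary:
        return ""

    v1 = V1_OVERVIEW.search(summary)
    if v1:
        return v1.group(1).strip()

    v2 = V2_TITLE.search(summary)
    # Guard added for this sketch only: ignore a capture that is itself a heading,
    # which can happen when the Session Title section is empty.
    if v2 and not v2.group(1).lstrip().startswith("#"):
        return v2.group(1).strip()

    # Tier 3: first line that is neither empty nor a markdown heading.
    for line in summary.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""
```

With this sketch, a WM v2 document whose Session Title section holds `Wang Fang Tencent Senior Backend Engineer Onboarding` yields that line, and a document made up only of headings yields an empty string.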

tests/unit/session/test_wm_v2_guards.py — Added TestExtractAbstract class with 10 test cases covering v1, v2 (with/without italic descriptions), heading-skip fallback, and edge cases.

Test plan

  • v1 structured_summary format still works (backward compat)
  • v2 with italic section description (_A short and distinctive..._)
  • v2 without italic description
  • Real WM v2 output from production run
  • Never returns # Working Memory as abstract
  • Empty input returns empty
  • Fallback skips heading lines
  • Only-headings input returns empty

Made with Cursor

`_extract_abstract_from_summary` always returned `# Working Memory`
(the heading) for WM v2 documents because the v1 regex missed and the
fallback grabbed the first line. Add a v2-aware regex that extracts
the Session Title content, and improve the final fallback to skip
markdown headings.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

1782 - Fully compliant

Compliant requirements:

  • Fix _extract_abstract_from_summary to handle WM v2 format
  • Add v2-aware regex to extract Session Title
  • Improve fallback to skip markdown headings
  • Add comprehensive test cases
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 95
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

github-actions Bot commented May 6, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Allow underscores in italic description

Fix the optional italic description regex to allow underscores within the italic
text. The current `[^_]+` pattern stops at the first internal underscore, which would
fail to match lines like `_A_title_with_underscores_`.

openviking/session/session.py [1429-1435]

 title_match = re.search(
     r"^##\s+Session\s+Title\s*\n"  # section header
-    r"(?:_[^_]+_\s*\n)?"            # optional italic description
+    r"(?:_[^\n]+_\s*\n)?"           # optional italic description (allows underscores)
     r"(.+)",                         # title content
     summary,
     re.MULTILINE,
 )
Suggestion importance[1-10]: 7

__

Why: The original regex `[^_]+` would fail to match italic lines containing internal underscores (e.g., `_A_title_with_underscores_`). Changing it to `[^\n]+` allows underscores while ensuring the pattern doesn't span multiple lines, fixing a potential false negative in v2 format handling.

Impact: Medium
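
To make the difference concrete, here is a small comparison of the two patterns from the diff above on a made-up input. The sample document and the names `CURRENT`, `SUGGESTED`, and `doc` are assumptions for illustration. The practical effect of the failed italic match is that the optional group is skipped, so the description line itself is captured as the title.

```python
import re

HEADER = r"^##\s+Session\s+Title\s*\n"

# Pattern currently in the PR: the italic text may not contain "_".
CURRENT = re.compile(HEADER + r"(?:_[^_]+_\s*\n)?" + r"(.+)", re.MULTILINE)

# Suggested pattern: the italic text may contain anything except a newline.
SUGGESTED = re.compile(HEADER + r"(?:_[^\n]+_\s*\n)?" + r"(.+)", re.MULTILINE)

# Hypothetical v2 section whose italic description has internal underscores.
doc = "## Session Title\n_A_title_with_underscores_\nReal session title\n"

current_title = CURRENT.search(doc).group(1)
suggested_title = SUGGESTED.search(doc).group(1)

print(current_title)    # the italic description line is captured as the title
print(suggested_title)  # the line after the description is captured
```

For descriptions without internal underscores, both patterns skip the description and capture the same title line.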

@qin-ctx requested review from chenjw and qin-ctx May 7, 2026 13:12