fix(memory): filter structural noise from graph entity extraction by bug-ops · Pull Request #1920 · bug-ops/zeph

bug-ops · 2026-03-16T16:24:54Z

Summary

Fixes #1912. The zeph_graph_entities Qdrant collection was being polluted with structural tokens (TOML config keys, file paths, tool names such as read_file and wget, and generic terms such as go, type, and src/) extracted from tool result messages rather than meaningful semantic entities.

Root causes and fixes:

FIX-1: persist_message() now skips graph extraction entirely when the message contains ToolResult parts — tool outputs (TOML, JSON, command output) are structural data, not conversational content
FIX-2: The context window passed to the extraction LLM call now excludes Role::User messages with ToolResult parts
FIX-3: Added min_entity_name_bytes = 3 to MemoryWriteValidationConfig, enforced in both validate_graph_extraction and EntityResolver::resolve() via the MIN_ENTITY_NAME_BYTES constant; rejects tokens shorter than 3 bytes such as go and cd (a 4-byte token like type is not caught by this check)
FIX-4: Revised extraction prompt — entity types restricted to person, project, technology, organization, concept; explicit rules against extracting config keys, file paths, tool names, TOML/JSON keys, and short tokens
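The filtering described in FIX-1 and FIX-2 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Role`, `Part`, and `Message` types and the `should_extract_graph` and `extraction_context` functions are hypothetical stand-ins, and the real zeph types and `persist_message()` signature may differ.

```rust
// Hypothetical stand-ins for the crate's message types; the real
// definitions are not shown in this PR description.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),
    ToolResult(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    parts: Vec<Part>,
}

impl Message {
    /// True when any part of the message is a tool result.
    fn has_tool_result(&self) -> bool {
        self.parts.iter().any(|p| matches!(p, Part::ToolResult(_)))
    }
}

/// FIX-1 (sketch): skip graph extraction for messages carrying tool output.
fn should_extract_graph(msg: &Message) -> bool {
    !msg.has_tool_result()
}

/// FIX-2 (sketch): drop user messages with tool results from the
/// context window handed to the extraction LLM call.
fn extraction_context(history: &[Message]) -> Vec<&Message> {
    history
        .iter()
        .filter(|m| !(m.role == Role::User && m.has_tool_result()))
        .collect()
}
```

Keeping the two checks as separate functions reflects that they act at different points: one gates whether extraction runs at all for the message being persisted, the other trims what the extraction call sees as surrounding context.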

Tests: 3 new unit tests added (2569 → 6049 total pass after merge with main), covering:

context_filter_excludes_tool_result_messages
resolve_short_name_below_min_returns_error
resolve_name_at_min_length_passes
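The boundary behaviour that the two resolve_* tests exercise might look like the sketch below. It assumes the check compares the UTF-8 byte length of the name against the constant. The `ResolveError` type, the `check_entity_name` function, and the whitespace trimming are illustrative assumptions, not the actual `EntityResolver::resolve()` implementation.

```rust
/// Mirrors the default named in the PR description (min_entity_name_bytes = 3).
const MIN_ENTITY_NAME_BYTES: usize = 3;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NameTooShort { len: usize, min: usize },
}

/// Hypothetical free function standing in for the length check.
/// Trimming whitespace first is an assumption of this sketch.
fn check_entity_name(name: &str) -> Result<&str, ResolveError> {
    let trimmed = name.trim();
    // str::len() returns the length in bytes, not in characters.
    let len = trimmed.len();
    if len < MIN_ENTITY_NAME_BYTES {
        return Err(ResolveError::NameTooShort {
            len,
            min: MIN_ENTITY_NAME_BYTES,
        });
    }
    Ok(trimmed)
}
```

Under these assumptions a name of exactly three bytes passes and a two-byte name is rejected. Because the limit is counted in bytes, a single non-ASCII character that encodes to two bytes is also rejected.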

Test plan

cargo nextest run --config-file .github/nextest.toml -p zeph-memory -p zeph-core --lib passes
cargo clippy --workspace --features full -- -D warnings clean
cargo +nightly fmt --check clean
Live session: send a message referencing a config file, verify no config keys appear in zeph_graph_entities

Prevent TOML config keys, file paths, tool names, and short generic tokens from polluting zeph_graph_entities (closes #1912).

- Skip graph extraction for Role::User messages containing ToolResult parts; tool outputs are structural data, not conversational content
- Exclude ToolResult user messages from the LLM extraction context window
- Add min_entity_name_bytes = 3 to MemoryWriteValidationConfig and enforce it in validate_graph_extraction and EntityResolver::resolve()
- Restrict extraction prompt entity types to person/project/technology/organization/concept; add explicit rules against structural tokens, config keys, file paths, and raw command output

github-actions bot added the bug, size/L, documentation, memory, rust, and core labels and removed the bug label Mar 16, 2026

bug-ops enabled auto-merge (squash) March 16, 2026 16:25

github-actions bot added the bug label Mar 16, 2026

bug-ops force-pushed the 1912-graph-entity-extraction-noise branch from b9bda10 to 62115f5 March 16, 2026 16:46

bug-ops merged commit 5160aaa into main Mar 16, 2026
20 checks passed

bug-ops deleted the 1912-graph-entity-extraction-noise branch March 16, 2026 16:56

bug-ops merged 1 commit into main from 1912-graph-entity-extraction-noise
