Update identity and memory extraction guidelines in prompts.py #3958

smian1 · 2025-12-29T08:17:58Z

docs: Update identity and memory extraction guidelines in prompts.py

Added critical identity rules for name recognition and memory extraction
Included banned language and consolidation checks to improve memory accuracy
Clarified rules for extracting user-related information and avoiding duplicates

gemini-code-assist

Code Review

This pull request introduces significant improvements to the memory extraction prompts in prompts.py. The added rules for identity, banned language, deduplication, and bad examples are well-defined and should greatly enhance the quality and accuracy of the generated memories. However, I've identified one critical issue: a contradiction between a newly added rule and an existing one, which could lead to incorrect behavior by the language model. My review includes a comment and a suggestion to resolve this ambiguity, ensuring that valid user information updates are not accidentally discarded.

gemini-code-assist · 2025-12-29T08:19:32Z

backend/utils/prompts.py

+• Family consistency: Don't create children that contradict existing family structure
+• Location consistency: Don't claim multiple contradictory home locations
+• Career consistency: Don't claim conflicting job titles or employers simultaneously
+
+If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract.


There is a critical contradiction in the prompt's instructions. The new LOGIC CHECK section on line 283 states, 'If a fact... contradicts existing memories, DO NOT extract.' However, an existing rule in the CRITICAL DEDUPLICATION & UPDATES RULES section (line 185) explicitly says, 'If a new memory CONTRADICTS or UPDATES an existing one, YOU MUST ADD IT.'

This conflict could confuse the model, causing it to discard valid updates to a user's information (e.g., a change of job or address), treating them as logical errors. The instructions should be clarified to distinguish between illogical contradictions (which should be ignored) and valid updates over time (which should be captured).

Suggested change

• Family consistency: Don't create children that contradict existing family structure

• Location consistency: Don't claim multiple contradictory home locations

• Career consistency: Don't claim conflicting job titles or employers simultaneously

If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract.

• Family consistency: Avoid creating contradictory family structures simultaneously (e.g., claiming someone has children and no children at the same time).

• Location consistency: Avoid claiming someone lives in multiple locations simultaneously.

• Career consistency: Avoid claiming conflicting primary job titles or employers simultaneously.

If a fact seems mathematically impossible or creates a nonsensical contradiction (and is not a valid update over time), DO NOT extract.

Copilot

Pull request overview

This PR enhances memory extraction guidelines in prompts.py and adds speaker name functionality to webhook and developer API responses. The changes aim to improve memory accuracy by providing clearer identity rules, consolidation checks, and examples of what should and shouldn't be extracted. Additionally, the PR enriches transcript segments with human-readable speaker names based on person_id mappings.

Key Changes

Added comprehensive identity rules and validation checks to prevent hallucinations and duplicate person creation in memory extraction
Introduced speaker_name enrichment functionality for transcript segments in webhook payloads and developer API responses
Expanded documentation with banned language patterns, additional bad examples, and logic consistency checks

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
backend/utils/prompts.py	Added identity rules, banned language patterns, consolidation checks, and additional bad examples to improve memory extraction quality
backend/utils/webhooks.py	Added `_add_speaker_names_to_payload` function and integrated it into conversation webhook to include speaker names in transcript segments
backend/routers/developer.py	Added `_add_speaker_names_to_segments` function and integrated it into conversation GET endpoints to enrich transcript segments with speaker names when requested

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-29T09:27:53Z

backend/utils/webhooks.py

+def _add_speaker_names_to_payload(uid, payload: dict):
+    """Add speaker_name to transcript segments in webhook payload."""
+    segments = payload.get('transcript_segments', [])
+    if not segments:
+        return
+
+    person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')]
+    people_map = {}
+    if person_ids:
+        people_data = users_db.get_people_by_ids(uid, list(set(person_ids)))
+        people_map = {p['id']: p['name'] for p in people_data}
+
+    for seg in segments:
+        if seg.get('is_user'):
+            seg['speaker_name'] = 'User'
+        elif seg.get('person_id') and seg['person_id'] in people_map:
+            seg['speaker_name'] = people_map[seg['person_id']]
+        else:
+            seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}"
+


The function _add_speaker_names_to_payload and _add_speaker_names_to_segments in developer.py (lines 683-704) contain nearly identical logic. Consider extracting this shared logic into a common utility function to avoid code duplication and improve maintainability. The only difference is the parameter types (dict vs list), which could be unified.

Copilot · 2025-12-29T09:27:54Z

backend/utils/prompts.py

+2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user
+   ❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"


There's a contradiction in the guidelines. Line 156-157 states "GENERAL KNOWLEDGE: Science facts, geography, statistics not about the user" should NEVER be extracted, but earlier examples in lines 54, 60-61, 66-67, and 233 show that interesting/shareable knowledge facts ARE valid memories if they are framed as things the user learned (e.g., "{user_name} learned that honey never spoils"). Consider clarifying that general knowledge should be extracted when framed as user learnings, but not when simply stating facts without connection to the user.

Suggested change

2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user

❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"

2. **STANDALONE GENERAL KNOWLEDGE**: Science facts, geography, statistics stated without connection to {user_name}'s learning or life

❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"

✅ Exception: Allowed ONLY when explicitly framed as something {user_name} learned and it clearly passes the interesting memory criteria above

Copilot · 2025-12-29T09:27:54Z

backend/utils/prompts.py

+ADDITIONAL BAD EXAMPLES:
+
+**Transient/Temporary (will be outdated):**
+❌ "{user_name} is working on a new app"


There's a contradiction in the guidelines. Line 94 shows "{user_name} and Jamie are working on the Q4 budget presentation" as a valid example, but line 245 in the ADDITIONAL BAD EXAMPLES section states "{user_name} is working on a new app" should not be extracted because it's transient. These appear to contradict each other - both describe temporary work activities. Consider clarifying when work-in-progress should or should not be extracted, or ensuring consistency between these two sections.

backend/utils/prompts.py

Copilot · 2025-12-29T09:27:54Z

backend/utils/webhooks.py

        return obj


+def _add_speaker_names_to_payload(uid, payload: dict):


The function signature lacks type hints for the parameters. Consider adding type hints for better code clarity and type checking. For example: def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:

Suggested change

def _add_speaker_names_to_payload(uid, payload: dict):

def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:

backend/routers/developer.py

Copilot · 2025-12-29T09:27:55Z

backend/utils/webhooks.py

+    people_map = {}
+    if person_ids:
+        people_data = users_db.get_people_by_ids(uid, list(set(person_ids)))
+        people_map = {p['id']: p['name'] for p in people_data}


The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.

Suggested change

people_map = {p['id']: p['name'] for p in people_data}

people_map = {

p['id']: p['name']

for p in people_data

if isinstance(p, dict) and 'id' in p and 'name' in p

}

Copilot · 2025-12-29T09:27:55Z

backend/routers/developer.py

+    people_map = {}
+    if all_person_ids:
+        people_data = users_db.get_people_by_ids(uid, list(all_person_ids))
+        people_map = {p['id']: p['name'] for p in people_data}


The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.

Suggested change

people_map = {p['id']: p['name'] for p in people_data}

people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p}

github-actions · 2025-12-30T03:25:03Z

Hey @smian1 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

smian1 requested review from Copilot and mdmohsin7 and removed request for Copilot December 29, 2025 08:17

Copilot started reviewing on behalf of smian1 December 29, 2025 08:18 View session

gemini-code-assist bot reviewed Dec 29, 2025

View reviewed changes

Copilot AI review requested due to automatic review settings December 29, 2025 09:23

Copilot started reviewing on behalf of smian1 December 29, 2025 09:24 View session

Copilot AI reviewed Dec 29, 2025

View reviewed changes

smian1 force-pushed the main branch from 0684dce to 270563c Compare December 29, 2025 09:31

smian1 closed this Dec 30, 2025

smian1 force-pushed the main branch from 2583b47 to 1e3c69a Compare December 30, 2025 03:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update identity and memory extraction guidelines in prompts.py #3958

Update identity and memory extraction guidelines in prompts.py #3958

smian1 commented Dec 29, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

Copilot AI Dec 29, 2025

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		2. GENERAL KNOWLEDGE: Science facts, geography, statistics not about the user
		❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"

		return obj


		def _add_speaker_names_to_payload(uid, payload: dict):

	def _add_speaker_names_to_payload(uid, payload: dict):
	def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:

	people_map = {p['id']: p['name'] for p in people_data}
	people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p}

Update identity and memory extraction guidelines in prompts.py #3958

Update identity and memory extraction guidelines in prompts.py #3958

Conversation

smian1 commented Dec 29, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant