-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Update identity and memory extraction guidelines in prompts.py #3958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to the memory extraction prompts in prompts.py. The added rules for identity, banned language, deduplication, and bad examples are well-defined and should greatly enhance the quality and accuracy of the generated memories. However, I've identified one critical issue: a contradiction between a newly added rule and an existing one, which could lead to incorrect behavior by the language model. My review includes a comment and a suggestion to resolve this ambiguity, ensuring that valid user information updates are not accidentally discarded.
backend/utils/prompts.py
Outdated
| • Family consistency: Don't create children that contradict existing family structure | ||
| • Location consistency: Don't claim multiple contradictory home locations | ||
| • Career consistency: Don't claim conflicting job titles or employers simultaneously | ||
| If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a critical contradiction in the prompt's instructions. The new LOGIC CHECK section on line 283 states, 'If a fact... contradicts existing memories, DO NOT extract.' However, an existing rule in the CRITICAL DEDUPLICATION & UPDATES RULES section (line 185) explicitly says, 'If a new memory CONTRADICTS or UPDATES an existing one, YOU MUST ADD IT.'
This conflict could confuse the model, causing it to discard valid updates to a user's information (e.g., a change of job or address), treating them as logical errors. The instructions should be clarified to distinguish between illogical contradictions (which should be ignored) and valid updates over time (which should be captured).
| • Family consistency: Don't create children that contradict existing family structure | |
| • Location consistency: Don't claim multiple contradictory home locations | |
| • Career consistency: Don't claim conflicting job titles or employers simultaneously | |
| If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract. | |
| • Family consistency: Avoid creating contradictory family structures simultaneously (e.g., claiming someone has children and no children at the same time). | |
| • Location consistency: Avoid claiming someone lives in multiple locations simultaneously. | |
| • Career consistency: Avoid claiming conflicting primary job titles or employers simultaneously. | |
| If a fact seems mathematically impossible or creates a nonsensical contradiction (and is not a valid update over time), DO NOT extract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances memory extraction guidelines in prompts.py and adds speaker name functionality to webhook and developer API responses. The changes aim to improve memory accuracy by providing clearer identity rules, consolidation checks, and examples of what should and shouldn't be extracted. Additionally, the PR enriches transcript segments with human-readable speaker names based on person_id mappings.
Key Changes
- Added comprehensive identity rules and validation checks to prevent hallucinations and duplicate person creation in memory extraction
- Introduced speaker_name enrichment functionality for transcript segments in webhook payloads and developer API responses
- Expanded documentation with banned language patterns, additional bad examples, and logic consistency checks
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| backend/utils/prompts.py | Added identity rules, banned language patterns, consolidation checks, and additional bad examples to improve memory extraction quality |
| backend/utils/webhooks.py | Added _add_speaker_names_to_payload function and integrated it into conversation webhook to include speaker names in transcript segments |
| backend/routers/developer.py | Added _add_speaker_names_to_segments function and integrated it into conversation GET endpoints to enrich transcript segments with speaker names when requested |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
backend/utils/webhooks.py
Outdated
| def _add_speaker_names_to_payload(uid, payload: dict): | ||
| """Add speaker_name to transcript segments in webhook payload.""" | ||
| segments = payload.get('transcript_segments', []) | ||
| if not segments: | ||
| return | ||
|
|
||
| person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')] | ||
| people_map = {} | ||
| if person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(set(person_ids))) | ||
| people_map = {p['id']: p['name'] for p in people_data} | ||
|
|
||
| for seg in segments: | ||
| if seg.get('is_user'): | ||
| seg['speaker_name'] = 'User' | ||
| elif seg.get('person_id') and seg['person_id'] in people_map: | ||
| seg['speaker_name'] = people_map[seg['person_id']] | ||
| else: | ||
| seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}" | ||
|
|
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function _add_speaker_names_to_payload and _add_speaker_names_to_segments in developer.py (lines 683-704) contain nearly identical logic. Consider extracting this shared logic into a common utility function to avoid code duplication and improve maintainability. The only difference is the parameter types (dict vs list), which could be unified.
backend/utils/prompts.py
Outdated
| 2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user | ||
| ❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets" |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a contradiction in the guidelines. Line 156-157 states "GENERAL KNOWLEDGE: Science facts, geography, statistics not about the user" should NEVER be extracted, but earlier examples in lines 54, 60-61, 66-67, and 233 show that interesting/shareable knowledge facts ARE valid memories if they are framed as things the user learned (e.g., "{user_name} learned that honey never spoils"). Consider clarifying that general knowledge should be extracted when framed as user learnings, but not when simply stating facts without connection to the user.
| 2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user | |
| ❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets" | |
| 2. **STANDALONE GENERAL KNOWLEDGE**: Science facts, geography, statistics stated without connection to {user_name}'s learning or life | |
| ❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets" | |
| ✅ Exception: Allowed ONLY when explicitly framed as something {user_name} learned and it clearly passes the interesting memory criteria above |
backend/utils/prompts.py
Outdated
| ADDITIONAL BAD EXAMPLES: | ||
| **Transient/Temporary (will be outdated):** | ||
| ❌ "{user_name} is working on a new app" |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a contradiction in the guidelines. Line 94 shows "{user_name} and Jamie are working on the Q4 budget presentation" as a valid example, but line 245 in the ADDITIONAL BAD EXAMPLES section states "{user_name} is working on a new app" should not be extracted because it's transient. These appear to contradict each other - both describe temporary work activities. Consider clarifying when work-in-progress should or should not be extracted, or ensuring consistency between these two sections.
backend/utils/webhooks.py
Outdated
| return obj | ||
|
|
||
|
|
||
| def _add_speaker_names_to_payload(uid, payload: dict): |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature lacks type hints for the parameters. Consider adding type hints for better code clarity and type checking. For example: def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:
| def _add_speaker_names_to_payload(uid, payload: dict): | |
| def _add_speaker_names_to_payload(uid: str, payload: dict) -> None: |
backend/utils/webhooks.py
Outdated
| people_map = {} | ||
| if person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(set(person_ids))) | ||
| people_map = {p['id']: p['name'] for p in people_data} |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.
| people_map = {p['id']: p['name'] for p in people_data} | |
| people_map = { | |
| p['id']: p['name'] | |
| for p in people_data | |
| if isinstance(p, dict) and 'id' in p and 'name' in p | |
| } |
backend/routers/developer.py
Outdated
| people_map = {} | ||
| if all_person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(all_person_ids)) | ||
| people_map = {p['id']: p['name'] for p in people_data} |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.
| people_map = {p['id']: p['name'] for p in people_data} | |
| people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} |
|
Hey @smian1 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜 |
docs: Update identity and memory extraction guidelines in prompts.py