-
Notifications
You must be signed in to change notification settings - Fork 1.3k
feat: Add speaker name mapping to transcript segments in developer API and webhook #3962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…d webhook handlers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds the speaker_name to transcript segments in developer API responses and conversation_created webhooks. The implementation correctly fetches person data to map person_id to a name.
My main feedback is regarding code duplication. The logic for adding speaker names is implemented in two separate places (backend/routers/developer.py and backend/utils/webhooks.py). This is a maintainability concern. I've left a specific comment with a suggestion to refactor one of the implementations for better efficiency and highlighted the need to consolidate this logic into a shared utility function in a follow-up PR to avoid duplication, which aligns with our guidelines for handling large-scale refactoring in separate PRs.
| person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')] | ||
| people_map = {} | ||
| if person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(set(person_ids))) | ||
| people_map = {p['id']: p['name'] for p in people_data} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for collecting person_ids can be made more memory-efficient by building a set directly, rather than creating an intermediate list and then converting it to a set.
More importantly, this function _add_speaker_names_to_payload is nearly identical to _add_speaker_names_to_segments in backend/routers/developer.py. Code duplication increases maintenance overhead and can lead to inconsistencies. I strongly recommend extracting this logic into a single, shared utility function in a follow-up PR to improve maintainability. This approach aligns with our guideline that large-scale refactoring should be handled in a separate, dedicated pull request rather than being bundled with feature changes.
| person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')] | |
| people_map = {} | |
| if person_ids: | |
| people_data = users_db.get_people_by_ids(uid, list(set(person_ids))) | |
| people_map = {p['id']: p['name'] for p in people_data} | |
| person_ids = {seg['person_id'] for seg in segments if seg.get('person_id')} | |
| people_map = {} | |
| if person_ids: | |
| people_data = users_db.get_people_by_ids(uid, list(person_ids)) | |
| people_map = {p['id']: p['name'] for p in people_data} |
References
- Large-scale refactoring, such as converting synchronous I/O calls to asynchronous across multiple files, should be handled in a separate, dedicated pull request rather than being bundled with feature changes to maintain consistency with existing patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds speaker name mapping to transcript segments in both the developer API and webhook system. The feature resolves speaker names from person_id mappings stored in the database, providing more meaningful speaker identification in transcript segments.
Key Changes:
- Added
speaker_namefield to transcript segments with intelligent fallback logic: "User" for user segments, actual person names when person_id is mapped, or "Speaker {id}" as default - Implemented speaker name resolution using batched database queries to efficiently fetch person names from person_id mappings
- Integrated speaker name enrichment into both webhook payloads and developer API responses when transcripts are included
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
backend/utils/webhooks.py |
Added _add_speaker_names_to_payload() helper function and integrated it into the conversation_created_webhook() to enrich webhook payloads with speaker names |
backend/routers/developer.py |
Added _add_speaker_names_to_segments() helper function and integrated it into both the list and single conversation GET endpoints when include_transcript=True |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _add_speaker_names_to_segments(uid, conversations: list): | ||
| """Add speaker_name to transcript segments based on person_id mappings.""" | ||
| all_person_ids = set() | ||
| for conv in conversations: | ||
| for seg in conv.get('transcript_segments', []): | ||
| if seg.get('person_id'): | ||
| all_person_ids.add(seg['person_id']) | ||
|
|
||
| people_map = {} | ||
| if all_person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(all_person_ids)) | ||
| people_map = {p['id']: p['name'] for p in people_data} | ||
|
|
||
| for conv in conversations: | ||
| for seg in conv.get('transcript_segments', []): | ||
| if seg.get('is_user'): | ||
| seg['speaker_name'] = 'User' | ||
| elif seg.get('person_id') and seg['person_id'] in people_map: | ||
| seg['speaker_name'] = people_map[seg['person_id']] | ||
| else: | ||
| seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}" | ||
|
|
||
|
|
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is nearly identical to _add_speaker_names_to_payload() in backend/utils/webhooks.py (lines 35-53). The logic for mapping speaker names is duplicated in both places.
Consider extracting this common logic into a shared utility function (e.g., in backend/utils/ or backend/database/) that both files can import and use. This would:
- Reduce code duplication
- Make the logic easier to maintain and test
- Ensure consistent behavior across both the webhook and API endpoints
| def _add_speaker_names_to_segments(uid, conversations: list): | |
| """Add speaker_name to transcript segments based on person_id mappings.""" | |
| all_person_ids = set() | |
| for conv in conversations: | |
| for seg in conv.get('transcript_segments', []): | |
| if seg.get('person_id'): | |
| all_person_ids.add(seg['person_id']) | |
| people_map = {} | |
| if all_person_ids: | |
| people_data = users_db.get_people_by_ids(uid, list(all_person_ids)) | |
| people_map = {p['id']: p['name'] for p in people_data} | |
| for conv in conversations: | |
| for seg in conv.get('transcript_segments', []): | |
| if seg.get('is_user'): | |
| seg['speaker_name'] = 'User' | |
| elif seg.get('person_id') and seg['person_id'] in people_map: | |
| seg['speaker_name'] = people_map[seg['person_id']] | |
| else: | |
| seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}" | |
| def _annotate_speaker_names(uid, transcript_segments: list): | |
| """ | |
| Add speaker_name to transcript segments based on person_id mappings. | |
| This helper operates on a flat list of segments so it can be reused | |
| wherever transcript segments are available. | |
| """ | |
| all_person_ids = {seg['person_id'] for seg in transcript_segments if seg.get('person_id')} | |
| people_map = {} | |
| if all_person_ids: | |
| people_data = users_db.get_people_by_ids(uid, list(all_person_ids)) | |
| people_map = {p['id']: p['name'] for p in people_data} | |
| for seg in transcript_segments: | |
| if seg.get('is_user'): | |
| seg['speaker_name'] = 'User' | |
| elif seg.get('person_id') and seg['person_id'] in people_map: | |
| seg['speaker_name'] = people_map[seg['person_id']] | |
| else: | |
| seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}" | |
| def _add_speaker_names_to_segments(uid, conversations: list): | |
| """Add speaker_name to transcript segments in a list of conversations.""" | |
| # Flatten all transcript segments from conversations so we can annotate | |
| # them in a single pass. | |
| all_segments = [] | |
| for conv in conversations: | |
| segments = conv.get('transcript_segments', []) | |
| if segments: | |
| all_segments.extend(segments) | |
| if all_segments: | |
| _annotate_speaker_names(uid, all_segments) |
| webhook_url += f'?uid={uid}' | ||
| try: | ||
| payload = memory.as_dict_cleaned_dates() | ||
| _add_speaker_names_to_payload(uid, payload) |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new speaker name mapping functionality in the webhook lacks test coverage. Since the repository has established integration tests (e.g., backend/tests/integration/test_notifications_integration.py), consider adding tests to verify:
- Speaker name resolution from person_id mappings
- Correct fallback to "User" for user segments
- Correct fallback to "Speaker {id}" format for unknown speakers
- Handling of conversations with no transcript segments
- Behavior when
get_people_by_ids()returns empty results
This would help ensure the feature works correctly and prevent regressions.
| for conv in unlocked_conversations: | ||
| conv.pop('transcript_segments', None) | ||
| else: | ||
| _add_speaker_names_to_segments(uid, unlocked_conversations) |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new speaker name mapping functionality in the developer API lacks test coverage. Since the repository has established integration tests (e.g., backend/tests/integration/test_notifications_integration.py), consider adding tests to verify:
- Speaker name resolution from person_id mappings works correctly in both single and multi-conversation scenarios
- Correct fallback to "User" for user segments
- Correct fallback to "Speaker {id}" format for unknown speakers
- Behavior when
include_transcript=Truevsinclude_transcript=False - Handling of conversations with no transcript segments
- Behavior when
get_people_by_ids()returns empty or partial results
This would help ensure the feature works correctly and prevent regressions.
| def _add_speaker_names_to_payload(uid, payload: dict): | ||
| """Add speaker_name to transcript segments in webhook payload.""" | ||
| segments = payload.get('transcript_segments', []) | ||
| if not segments: | ||
| return | ||
|
|
||
| person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')] | ||
| people_map = {} | ||
| if person_ids: | ||
| people_data = users_db.get_people_by_ids(uid, list(set(person_ids))) | ||
| people_map = {p['id']: p['name'] for p in people_data} | ||
|
|
||
| for seg in segments: | ||
| if seg.get('is_user'): | ||
| seg['speaker_name'] = 'User' | ||
| elif seg.get('person_id') and seg['person_id'] in people_map: | ||
| seg['speaker_name'] = people_map[seg['person_id']] | ||
| else: | ||
| seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}" |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is nearly identical to _add_speaker_names_to_segments() in backend/routers/developer.py (lines 683-703). The logic for mapping speaker names is duplicated in both places with only minor differences in the conversations iteration structure.
Consider extracting this common logic into a shared utility function (e.g., in backend/utils/ or backend/database/) that both files can import and use. This would:
- Reduce code duplication
- Make the logic easier to maintain and test
- Ensure consistent behavior across both the webhook and API endpoints
Summary
speaker_namefield to transcript segments in developer API responsesspeaker_namefield to conversation_created webhook payloadChanges
backend/routers/developer.py: Added_add_speaker_names_to_segments()helper and integrated intoGET /v1/dev/user/conversationsendpointsbackend/utils/webhooks.py: Added_add_speaker_names_to_payload()helper and integrated intoconversation_created_webhook()Result
Each transcript segment now includes a
speaker_namefield:"User"if the segment is from the userperson_idis assigned"Speaker {id}"as fallback🤖 Generated with Claude Code