Ybr8u speaker indentification #3966

beastoin · 2025-12-29T13:31:21Z

No description provided.

… same speaker

gemini-code-assist

Code Review

This pull request introduces functionality for speaker identification by extracting and storing speech samples. It includes new database functions to manage speech samples, a complex background processing pipeline in pusher.py to extract samples from audio chunks, and significant refactoring in transcribe.py to improve timestamp handling for conversation segments. My review focuses on potential race conditions, performance, error handling, and a critical bug in timestamp logic. I've identified a critical race condition in the database logic that could violate sample limits, and a critical bug in timestamp handling that could lead to data corruption. I've also pointed out opportunities for performance improvement and more robust error logging.

gemini-code-assist · 2025-12-29T13:35:05Z

backend/database/users.py

+    person_ref = db.collection('users').document(uid).collection('people').document(person_id)
+    person_doc = person_ref.get()
+
+    if not person_doc.exists:
+        return False
+
+    person_data = person_doc.to_dict()
+    current_samples = person_data.get('speech_samples', [])
+
+    # Check if we've hit the limit
+    if len(current_samples) >= max_samples:
+        return False
+
+    person_ref.update({
+        'speech_samples': firestore.ArrayUnion([sample_path]),
+        'updated_at': datetime.now(timezone.utc),
+    })
+    return True


There is a race condition in this function. If two requests to add a sample arrive concurrently when the sample count is near the maximum, both could pass the len(current_samples) >= max_samples check before either one updates the database. This would result in more than max_samples being stored, violating the function's constraint. This read-check-update logic must be atomic. You can achieve this by using a Firestore transaction. You will need to import transactional from google.cloud.firestore_v1.

person_ref = db.collection('users').document(uid).collection('people').document(person_id) @transactional def _update_in_transaction(transaction): person_doc = person_ref.get(transaction=transaction) if not person_doc.exists: return False person_data = person_doc.to_dict() current_samples = person_data.get('speech_samples', []) if len(current_samples) >= max_samples: return False transaction.update(person_ref, { 'speech_samples': firestore.ArrayUnion([sample_path]), 'updated_at': datetime.now(timezone.utc), }) return True transaction = db.transaction() return _update_in_transaction(transaction)

gemini-code-assist · 2025-12-29T13:35:05Z

backend/routers/pusher.py

+                    {
+                        'data': bytes(private_cloud_sync_buffer),
+                        'conversation_id': current_conversation_id,
+                        'timestamp': private_cloud_chunk_start_time or time.time(),


The fallback or time.time() for the timestamp is incorrect. It uses the time of flushing, not a time related to when the audio was recorded. This can lead to incorrect timestamps for the last audio chunk, potentially causing audio ordering issues. Based on the logic, private_cloud_chunk_start_time should never be None if private_cloud_sync_buffer has data. Using time.time() hides a potential logic error with an incorrect value. The fallback should be removed. If this state is possible, the root cause should be investigated.

Suggested change

'timestamp': private_cloud_chunk_start_time or time.time(),

'timestamp': private_cloud_chunk_start_time,

gemini-code-assist · 2025-12-29T13:35:05Z

backend/routers/pusher.py

+    try:
+        # Check current sample count once
+        sample_count = users_db.get_person_speech_samples_count(uid, person_id)
+        if sample_count >= 5:


The maximum number of samples 5 is hardcoded. This value is also used on line 92 and is the default in users_db.add_person_speech_sample. To prevent inconsistencies and improve maintainability, this magic number should be defined as a constant at the module level (e.g., MAX_SPEAKER_SAMPLES = 5 with the other constants at the top of the file) and used in all these places.

gemini-code-assist · 2025-12-29T13:35:05Z

backend/routers/pusher.py

+            abs_end = started_at_ts + segment_end
+
+            # Find relevant chunks
+            sorted_chunks = sorted(chunks, key=lambda c: c['timestamp'])


The list_audio_chunks function already returns a list of chunks sorted by timestamp. Sorting it again here is redundant and adds unnecessary overhead, especially if the number of chunks is large.

Suggested change

sorted_chunks = sorted(chunks, key=lambda c: c['timestamp'])

sorted_chunks = chunks

gemini-code-assist · 2025-12-29T13:35:05Z

backend/routers/pusher.py

+    except Exception as e:
+        print(f"Error extracting speaker samples: {e}", uid, conversation_id)


Catching a broad Exception and only printing it to stdout is not ideal for background tasks, as it can hide important errors and makes debugging difficult. It's better to use the logging module to log the full exception with a stack trace. This will provide better observability. This pattern of except Exception as e: print(...) is repeated in process_private_cloud_queue (lines 312, 318) and process_speaker_sample_queue (line 365) and should be improved in all locations. You would need to import logging at the top of the file.

Suggested change

except Exception as e:

print(f"Error extracting speaker samples: {e}", uid, conversation_id)

except Exception:

logging.exception(f"Error extracting speaker samples for uid={uid} conversation_id={conversation_id}")

beastoin added 7 commits December 29, 2025 20:27

Implement speaker voice sample extraction for person profiles

e05e8ab

Improve audio timestamp accuracy in transcribe and pusher services

ed668ae

Simplify speaker sample request by sending only segment IDs

72176a3

Enhance speaker sample extraction by expanding adjacent segments with…

1fd1270

… same speaker

Simplify transcript time offset with unified approach

3e61af6

Simplify segment expansion logic in speaker sample extraction

edee143

Improve private cloud sync with queue-based processing and retry logic

ad752fe

gemini-code-assist bot reviewed Dec 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ybr8u speaker indentification #3966

Ybr8u speaker indentification #3966

beastoin commented Dec 29, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

gemini-code-assist bot Dec 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	'timestamp': private_cloud_chunk_start_time or time.time(),
	'timestamp': private_cloud_chunk_start_time,

	sorted_chunks = sorted(chunks, key=lambda c: c['timestamp'])
	sorted_chunks = chunks

		except Exception as e:
		print(f"Error extracting speaker samples: {e}", uid, conversation_id)

Ybr8u speaker indentification #3966

Are you sure you want to change the base?

Ybr8u speaker indentification #3966

Conversation

beastoin commented Dec 29, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants