fix(core): fix speech start time overridden by VAD SOS events #5670
chenghao-mou wants to merge 4 commits into main
Previously, the speech start time was overridden by VAD start-of-speech (SOS) events, so turn start timestamps in Insights were aligned to the start of the last speaking span rather than the start of the turn. This closes AGT-2840.
    with trace.use_span(self._ensure_user_turn_span()):
        self._hooks.on_end_of_speech(ev)

    self._vad_speech_started = False
clear_user_turn doesn't reset _turn_speech_started, causing stale _speech_start_time in subsequent turns
By making _turn_speech_started turn-scoped (no longer reset on VAD END_OF_SPEECH at the old line 998), clear_user_turn() at audio_recognition.py:650-660 must now explicitly reset _turn_speech_started (and _speech_start_time). Previously, even though clear_user_turn didn't reset _vad_speech_started, the next VAD END_OF_SPEECH event would reset it, allowing the subsequent START_OF_SPEECH to correctly set _speech_start_time. Now, if a user cancels a turn mid-speech (e.g., push-to-talk cancel_turn), _turn_speech_started remains True indefinitely (VAD EOS no longer resets it), so the next turn's first VAD START_OF_SPEECH skips updating _speech_start_time, leaving it stale from the previous turn. This causes incorrect started_speaking_at / stopped_speaking_at / end_of_turn_delay metrics in _EndOfTurnInfo passed to on_end_of_turn.
Prompt for agents
The removal of `self._vad_speech_started = False` from the VAD END_OF_SPEECH handler (old line 998) makes the flag turn-scoped. This is correct for the intended fix (preventing subsequent speech bursts within a turn from overwriting _speech_start_time). However, `clear_user_turn()` at line 650 now needs to also reset `_turn_speech_started` and `_speech_start_time` to ensure a clean slate for the next turn. Without this, cancelling a turn (e.g., push-to-talk cancel_turn) while the user is speaking leaves `_turn_speech_started = True`, so the next turn's VAD START_OF_SPEECH won't update `_speech_start_time`. Add `self._turn_speech_started = False` and `self._speech_start_time = None` to the `clear_user_turn` method body.
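The suggested reset can be sketched as follows. This is a minimal illustration, not the code in audio_recognition.py: the class `TurnTimingSketch` and its handler names are hypothetical stand-ins, and only the two attributes `_turn_speech_started` and `_speech_start_time` and the method name `clear_user_turn` are taken from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnTimingSketch:
    """Hypothetical stand-in for the recognizer's turn-timing state."""

    _turn_speech_started: bool = False
    _speech_start_time: Optional[float] = None

    def on_vad_start_of_speech(self, timestamp: float) -> None:
        # Only the first speech burst of a turn records the start time;
        # later bursts in the same turn must not overwrite it.
        if not self._turn_speech_started:
            self._turn_speech_started = True
            self._speech_start_time = timestamp

    def on_vad_end_of_speech(self) -> None:
        # Deliberately leaves the flag untouched: it is turn-scoped,
        # so a VAD end-of-speech event does not reset it.
        pass

    def clear_user_turn(self) -> None:
        # The reset proposed in the review: without these two lines a
        # cancelled turn leaves the flag True and the timestamp stale.
        self._turn_speech_started = False
        self._speech_start_time = None


state = TurnTimingSketch()
state.on_vad_start_of_speech(1.0)   # first burst of the turn
state.on_vad_end_of_speech()
state.on_vad_start_of_speech(2.5)   # second burst, same turn
first_turn_start = state._speech_start_time

state.clear_user_turn()             # e.g. a push-to-talk cancel
state.on_vad_start_of_speech(7.0)   # first burst of the next turn
next_turn_start = state._speech_start_time
```

Under these assumptions the first turn keeps the timestamp of its first burst, and the turn after the cancel picks up a fresh one. If the two resets in `clear_user_turn` are dropped, the last call is skipped by the flag check and the stale value from the earlier turn survives.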