Merged
livekit-agents/livekit/agents/voice/audio_recognition.py (12 changes: 12 additions, 0 deletions)
@@ -483,6 +483,18 @@ async def _on_stt_event(self, ev: stt.SpeechEvent) -> None:
with trace.use_span(self._ensure_user_turn_span()):
    self._hooks.on_end_of_speech(None)

# STT EOT changes user state from speaking to listening without updating VAD internal states
# VAD EOS will also skip updating user state from listening (STT enforced) to listening (VAD detected)
# and user state won't be updated until a new VAD SOS is received
# reset VAD so that incorrect end of turn from STT can be corrected by VAD interruption
Member commented:
Can you add a comment noting that this is related to the fact that the VAD waits for some amount of "speech" before triggering an event?

# if user is still speaking (an immediate VAD SOS will interrupt the agent)
if self._vad:
    if self._speaking:
        logger.warning(
            "stt end of speech received while user is speaking, resetting vad"
        )
        self.update_vad(self._vad)

self._speaking = False
self._user_turn_committed = True
if not self._vad or self._last_speaking_time is None:
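The state mismatch described in the comments above, and the delay the review comment points out, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LiveKit's VAD: `ToyVAD`, `min_speech_frames`, and `frames_until_interrupt` are hypothetical names. It assumes the VAD keeps an internal speaking flag, emits START_OF_SPEECH only on a not-speaking to speaking transition, and only after a minimum run of speech frames. "Resetting" is modeled as constructing a fresh `ToyVAD`; the diff does not show how `update_vad` actually performs the reset.

```python
from typing import Optional


class ToyVAD:
    """Hypothetical frame-based VAD (not LiveKit's implementation).

    START_OF_SPEECH fires only on a not-speaking -> speaking transition, and
    only after `min_speech_frames` consecutive speech frames, mimicking a VAD
    that waits for some amount of speech before triggering an event.
    This toy has no end-of-speech handling.
    """

    def __init__(self, min_speech_frames: int = 3) -> None:
        self.min_speech_frames = min_speech_frames
        self.speaking = False  # internal state that an STT end of turn does not touch
        self._run = 0  # consecutive speech frames seen while not speaking

    def push(self, is_speech: bool) -> Optional[str]:
        if self.speaking:
            return None  # still inside the same speech segment, so no new event
        self._run = self._run + 1 if is_speech else 0
        if self._run >= self.min_speech_frames:
            self.speaking = True
            self._run = 0
            return "START_OF_SPEECH"
        return None


def frames_until_interrupt(reset_on_stt_eot: bool, frames_after_eot: int = 10) -> Optional[int]:
    """User is speaking, STT declares end of turn mid-utterance, user keeps talking.

    Returns the index of the post-EOT frame on which the VAD emits
    START_OF_SPEECH (which would interrupt the agent), or None if it never does.
    """
    vad = ToyVAD()
    for _ in range(5):  # the VAD has already entered its speaking state
        vad.push(True)
    # STT end of turn happens here: it flips user state but not vad.speaking.
    if reset_on_stt_eot:
        vad = ToyVAD(vad.min_speech_frames)  # hypothetical stand-in for resetting the VAD
    for i in range(frames_after_eot):
        if vad.push(True) == "START_OF_SPEECH":
            return i
    return None


print(frames_until_interrupt(False))  # None
print(frames_until_interrupt(True))   # 2
```

Without the reset, the toy VAD still believes the user is mid-utterance, so continued speech never yields a new START_OF_SPEECH and nothing interrupts the agent. With the reset, the VAD must again see three speech frames (indices 0 to 2) before firing, which is why the resulting interruption is prompt but not literally immediate.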