feat(bot): add configurable TTS replies #65

Merged
grinev merged 2 commits into grinev:main from shanekunz:feat/tts-replies on Apr 2, 2026
Conversation

@shanekunz
Contributor

Description of changes

  • add configurable TTS replies with /tts toggle and persisted per-chat setting
  • synthesize audio replies from completed assistant responses using configurable TTS provider settings
  • add tests and docs for TTS config, command behavior, and audio reply flow

Closes issue (optional)

How it was tested

[image not preserved]

Checklist

  • PR title follows Conventional Commits: <type>(<scope>)?: <description>
  • This PR contains one logically complete change
  • Branch is rebased on the latest main
  • I ran npm run lint, npm run build, and npm test
  • If this PR is OS-sensitive, behavior/limitations for Linux/macOS/Windows are described

@grinev
Owner

grinev commented Apr 1, 2026

@shanekunz thanks for this PR, it's a great feature. I tested it locally and it works well. I looked through the code and have a few comments:

  1. When the TTS request fails, the bot stays silent. Please send a short user-facing message like “Failed to generate audio reply” so the behavior is clear and not confusing.
  2. Right now the implementation stores TTS as one global setting, but the command text and docs say “for this chat”. To avoid a larger refactor, I suggest updating the wording in i18n and README to remove the per-chat phrasing and describe it as a global bot setting.
  3. The TTS response mode is saved only after session.prompt finishes successfully. This can create a small race condition if the final response arrives before that state is stored. It would be safer to set the mode earlier and clear it only on prompt failure.
  4. Please expand the documentation a bit more: update README.md and PRODUCT.md, and add a small TTS configuration example block in the README, similar to the audio transcription section, so users can see a ready-to-copy setup example.
  5. Let's change the /tts command description to "Toggle audio replies" (for all languages); I think it reads more clearly. The command itself can stay as /tts.
  6. When the user runs /tts, the bot should check that TTS is fully configured before enabling it. If TTS is not configured, it should show a warning and keep TTS disabled instead of switching it on anyway.
  7. I would also remove the fallback from TTS credentials to STT credentials. This is implicit behavior and may confuse users. It is better to require explicit TTS_API_URL and TTS_API_KEY for TTS.
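
Points 6 and 7 together amount to a gate in front of the toggle. A minimal sketch of that gate follows, assuming the bot reads `TTS_API_URL` and `TTS_API_KEY` from its environment. The function names, the result shape, and the message strings are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical view of the TTS settings. Only TTS_API_URL and TTS_API_KEY
// are named in the review; the types and helpers below are illustrative.
interface TtsEnv {
  TTS_API_URL?: string;
  TTS_API_KEY?: string;
}

interface ToggleResult {
  enabled: boolean;
  message: string;
}

// True only when both TTS values are set explicitly.
// There is deliberately no fallback to STT credentials (point 7).
function isTtsConfigured(env: TtsEnv): boolean {
  return Boolean(env.TTS_API_URL?.trim()) && Boolean(env.TTS_API_KEY?.trim());
}

// Computes the new state for a /tts toggle (point 6).
function toggleTts(currentlyEnabled: boolean, env: TtsEnv): ToggleResult {
  if (currentlyEnabled) {
    // Turning TTS off never needs a configuration check.
    return { enabled: false, message: "Audio replies disabled" };
  }
  if (!isTtsConfigured(env)) {
    // Warn and keep TTS off instead of switching it on anyway.
    return {
      enabled: false,
      message: "TTS is not configured: set TTS_API_URL and TTS_API_KEY",
    };
  }
  return { enabled: true, message: "Audio replies enabled" };
}
```

Keeping the check in a pure function like this makes the "stay disabled when unconfigured" behavior easy to cover with a unit test, independent of the bot framework.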

Also, please rebase your branch and resolve the small conflict in the README. I will wait for your updates and won't make any changes in the meantime, to avoid further conflicts.

@shanekunz
Contributor Author

@grinev Thanks for the detailed review.

I addressed all points:

  1. Added a user-facing failure message when audio generation fails.
  2. Switched /tts wording/behavior to a global toggle instead of per-chat.
  3. Moved TTS response-mode setup earlier to avoid the prompt/response race.
  4. Expanded docs in README.md and PRODUCT.md, including a copyable TTS config example.
  5. Updated /tts command descriptions to “Toggle audio replies”.
  6. /tts now checks config before enabling and stays off if TTS is not configured.
  7. Removed the implicit fallback from TTS credentials to STT credentials.

I also verified that npm run lint, npm run build, and npm test all pass on the updated branch.
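
The ordering fix from point 3 and the failure message from point 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the in-memory `responseModes` map, both function names, and the callback signatures are hypothetical, and only the message text comes from the review.

```typescript
type ResponseMode = "text" | "audio";

// Hypothetical in-memory store; the PR's real state handling is not shown
// in this thread.
const responseModes = new Map<string, ResponseMode>();

// Point 3: record the mode before the prompt is sent, so a final response
// that arrives early already sees it. Clear it only if the prompt fails.
async function promptWithAudioReply(
  sessionId: string,
  sendPrompt: () => Promise<void>,
): Promise<void> {
  responseModes.set(sessionId, "audio");
  try {
    await sendPrompt();
  } catch (error) {
    responseModes.delete(sessionId);
    throw error;
  }
}

// Point 1: tell the user when synthesis fails instead of staying silent.
async function sendAudioReply(
  text: string,
  synthesize: (text: string) => Promise<Uint8Array>,
  sendAudio: (audio: Uint8Array) => Promise<void>,
  sendMessage: (message: string) => Promise<void>,
): Promise<void> {
  let audio: Uint8Array;
  try {
    audio = await synthesize(text);
  } catch {
    await sendMessage("Failed to generate audio reply");
    return;
  }
  await sendAudio(audio);
}
```

Setting the mode before awaiting the prompt removes the window in which a completed response could be handled without the audio flag, which is the race described in the review.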

@grinev
Owner

grinev commented Apr 2, 2026

@shanekunz you did a great job! Thanks for the contribution.

@grinev grinev merged commit 70a327b into grinev:main Apr 2, 2026
1 check passed
@shanekunz shanekunz deleted the feat/tts-replies branch April 2, 2026 21:25
Development

Successfully merging this pull request may close these issues.

[Feature]: optional TTS replies for voice/audio prompts

2 participants