feat: echo STT transcripts to thread before agent reply#571
feat: echo STT transcripts to thread before agent reply#571dogzzdogzz wants to merge 2 commits intoopenabdev:mainfrom
Conversation
7f74166 to
6bc70f6
Compare
🔃 Review — PR #571: feat: echo STT transcripts to thread before agent replyVerdict: 🟡 Minor changes requested — well-structured PR with one blocking doc issue. Four-Question Framework
🔴 SUGGESTED CHANGES1. PR description contradicts the code
The code is correct (opt-in is the right call). Please update the PR description to match — reviewers and future readers rely on it as source of truth. 🟡 NITs (non-blocking)
🟢 INFO (good stuff)
Review by 超渡法師 🪬 |
PR Review: #571Reviewed commit: Summary
Core Assessment
Findings🔴 Critical: Helm template default contradicts PR description and code defaultWhat: The PR description says "Opt-out via
The code and docs are internally consistent (default = false, opt-in), but the PR description is wrong — it says "default Where: PR description body, first bullet point. 🟡 Minor:
|
Re-review: #571 (deeper pass)Reviewed commit: After a more thorough second pass, here are additional findings I missed in the first review: 🔴 Critical: Multi-clip ordering inconsistency is now user-visibleWhat: When a user uploads multiple audio clips (A, B, C), the ordering diverges:
Before this PR, the reversal was invisible to users (only the agent saw it). Now that echo shows the transcripts, users will see Where:
The PR description already acknowledges this as out-of-scope, which is fair. But I'd suggest at minimum option 2 — make echo match what the agent actually sees, even if both are reversed from upload order. Otherwise the echo defeats its own purpose for multi-clip messages. Severity reassessment: Downgrading to 🟡 since multi-clip voice messages are rare in practice, and the PR description explicitly calls this out. But it's worth discussing. 🟡 Minor:
|
|
Hi @dogzzdogzz — just checking in. When you get a chance, could you update the PR description to fix the default value mention? The description currently says |
When STT transcribes a voice message, optionally post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Default is OFF — opt in via [stt] echo_transcript = true. - New config: [stt] echo_transcript (default false, opt-in) - New helper: stt::post_echo with platform-agnostic ChatAdapter handle — future LINE/Telegram/Teams adapters get echo for free - Format: > 🎤 <transcript> per clip, all in one thread message - Failure: > 🎤 (transcription failed) line +⚠️ reaction on the user msg - Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap - Docs: docs/stt.md and docs/config-reference.md updated Rebased on top of openabdev#567 (gateway config rendering). Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.
|
Rebased onto current
|
Maintainer Fix Pushed (dc33d62)Fixed two issues and pushed to your branch:
Verification
PR is now buildable. The |
✅ E2E Test — STT Echo Transcript VerifiedDeployed PR #571 (commit dc33d62) to a test pod and verified the full STT echo flow with real voice messages. Test Environment
Results
Observations
Remaining Manual Tests (from PR description)
LGTM from E2E perspective. The syntax fix (dc33d62) and this functional test confirm the feature works as designed. |
obrutjack
left a comment
There was a problem hiding this comment.
Chloe Second Review — PR #571
Summary
Both previous blockers are resolved. Code is solid, well-tested, and ready to merge.
Verification of Prior Review Findings
| Finding | Status |
|---|---|
PR description default mismatch (true vs false) |
✅ Fixed |
hasKey dead code in Helm template |
✅ Fixed (no else branch now — only renders when explicitly set) |
| Merge conflict | ✅ Resolved, mergeable |
| Multi-clip ordering | 🟡 Pre-existing, acknowledged in PR description, not blocking |
| Import style nit | 🟡 Cosmetic, not blocking |
Additional Findings
🟡 Minor: No transcript length cap
format_echo_message flattens newlines but does not truncate. A long voice memo could exceed Discord's 2000-char limit, causing send_message to fail silently (error logged and swallowed). The user gets no echo at all.
Suggestion: Truncate at ~1900 chars with …, or split using the existing split_message() utility. Not blocking — worth a follow-up issue.
🟢 Security surface is clean
- No injection risk (transcript goes through
format!+ newline flattening) - No mention leakage (blockquote format, no
@injection path) post_echois fire-and-forget — cannot block agent pipeline- No new dependencies
🟢 Backward compatibility confirmed
echo_transcripthas#[serde(default)]— existing config.toml files parse fine- Helm values.yaml adds key with
falsedefault — existing deployments unaffected - No breaking changes to
ChatAdaptertrait
CI Status
All 10/10 checks pass (check ×2, helm-unittest, smoke-test ×7).
Changes Since Last Review (6bc70f6 → dc33d62)
- PR description corrected (now says default
false, opt-in) - Helm template improved (hasKey guard without else branch — relies on Rust serde default)
- Codebase-wide
cargo fmtpass (~15 files, formatting only, no logic changes) - Rebased on main (conflict resolved)
Verdict: APPROVE ✅
Reviewed by Chloe (second review)
Note: Groq Whisper outputs Simplified ChineseDuring testing with Chinese voice messages, the echo transcript displays Simplified Chinese (简体) regardless of the speaker's dialect. This is a Whisper model behavior — the This is not a bug in this PR — the echo correctly displays the raw STT output as-is. |
Documentation suggestion: Groq API Key prerequisiteThe STT feature requires a Groq API key. Users need to:
Groq free tier includes Whisper STT at no cost. This should be documented in |
Summary
When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.
> 🎤 <transcript>per clip.> 🎤 (transcription failed)line +[stt] echo_transcript = true(defaultfalse, mirrored asstt.echoTranscriptin Helm values).Closes #570.
Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560
Architecture
stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig)is the platform-agnostic helper. Discord (src/discord.rs) and Slack (src/slack.rs) collect aVec<EchoEntry>while iterating audio attachments and call the helper before the dispatcher submit. Gateway-based platforms (LINE / Telegram / future Teams) intentionally not wired today — their protocol carries text only. The helper signature is unchanged when audio plumbing lands there later.Files changed
src/config.rs—SttConfig.echo_transcript: bool(defaultfalse).src/stt.rs—EchoEntryenum,format_echo_message,post_echowithMockAdapter-driven tests.src/discord.rs,src/slack.rs— wire echo into the audio attachment loop, callpost_echobeforedispatcher.submit.charts/openab/values.yaml,charts/openab/templates/configmap.yaml— exposeechoTranscript(defaultfalse,hasKeyguard renders only when explicitly set).docs/stt.md,docs/config-reference.md— documentecho_transcript.docs/superpowers/specs/anddocs/superpowers/plans/— design spec + TDD-style implementation plan that drove this work.Test plan
cargo test --bin openab— 133/133 pass (10 instt::testscover format, post_echo success, failure, mixed, disabled config, empty entries).cargo clippy --all-targets -- -D warnings— clean.helm lint charts/openab— clean.helm template ...with default values does NOT renderecho_transcript(relies on Rust serde default); with--set agents.kiro.stt.echoTranscript=truerendersecho_transcript = true.> 🎤 <transcript>before the agent's reply.Out of scope / follow-ups
extra_blocks.insert(0, …)reverses transcript order in the agent prompt whileecho_entries.push(…)preserves upload order. Pre-existing in the agent-prompt path; out of scope for this PR.🤖 Generated with Claude Code