
Feature Contribution: Add support for ElevenLabs TTS voices.#1073

Open
evetzyokozuna wants to merge 39 commits into agent0ai:main from evetzyokozuna:main


Summary

This PR adds and stabilizes an ElevenLabs-based voice output path in Agent Zero, alongside existing browser/Kokoro speech behavior.
It updates backend API handling, frontend speech routing, and settings UX so ElevenLabs can be enabled as an optional provider without regressing default behavior.

Files covered

  1. python/api/el11_tts.py
  2. webui/components/chat/speech/speech-store.js
  3. webui/components/settings/agent/speech.html
  4. requirements.txt

Problem Statement

Voice output through Kokoro works, but users who want a more human-like voice for their Agent Zero deployment currently cannot use custom ElevenLabs voices.

This PR implements an additional capability to use ElevenLabs voices.


What this PR changes

1) python/api/el11_tts.py — ElevenLabs proxy API endpoint

Purpose

Provide a server-side TTS proxy endpoint (/el11_tts) that:

  • accepts text input from the UI
  • resolves active voice profile configuration
  • calls ElevenLabs with server-side credentials
  • returns playable audio/mpeg data to the client

Behavior

  • Expects payload like:
    • text (required)
    • profile (optional, defaults to active profile)
  • Loads per-agent voice config from:
    • agents/<profile>/elevenlabs_voice.json
  • Uses environment key:
    • EL11_API_KEY
  • Returns:
    • audio stream bytes (MPEG) on success
    • structured JSON error payload on failure
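
The request flow described above could look roughly like the sketch below. It is not the code in python/api/el11_tts.py: the function names, the status codes, and the `{"error": ...}` payload shape are assumptions made for illustration. The upstream URL, the `xi-api-key` header, and the `model_id` / `voice_settings` body fields follow the public ElevenLabs text-to-speech API.

```python
import json
import os
import urllib.request
from pathlib import Path

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def load_voice_config(profile: str, agents_dir: Path = Path("agents")) -> dict:
    """Read agents/<profile>/elevenlabs_voice.json (hypothetical helper)."""
    path = agents_dir / profile / "elevenlabs_voice.json"
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_tts_request(text: str, voice: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, the upstream ElevenLabs request."""
    body = {
        "text": text,
        "model_id": voice["model"],
        # Only forward the tuning fields that the config actually sets.
        "voice_settings": {
            key: voice[key]
            for key in ("stability", "similarity_boost", "style")
            if key in voice
        },
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice["voice_id"]),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,  # stays on the server, never sent to the browser
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def json_error(status: int, message: str):
    """Assumed error shape: a JSON object with a single 'error' field."""
    return status, "application/json", json.dumps({"error": message}).encode("utf-8")


def handle_tts(payload: dict, default_profile: str = "default"):
    """Return (status, content_type, body_bytes) for a /el11_tts-style call."""
    text = (payload.get("text") or "").strip()
    if not text:
        return json_error(400, "text is required")
    api_key = os.environ.get("EL11_API_KEY")
    if not api_key:
        return json_error(500, "EL11_API_KEY is not configured")
    try:
        voice = load_voice_config(payload.get("profile") or default_profile)
        request = build_tts_request(text, voice, api_key)
        with urllib.request.urlopen(request, timeout=30) as response:
            return 200, "audio/mpeg", response.read()
    except (OSError, KeyError, ValueError) as exc:
        # Missing config file, malformed JSON, missing field, or upstream failure.
        return json_error(502, f"TTS request failed: {exc}")
```

Keeping request construction in its own function means the headers and body can be checked without a network call or a real key.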

Why this matters

  • Keeps API key off the browser
  • Enables profile-specific voice identity
  • Creates a clean TTS backend interface that can be reused for telephony paths later

2) webui/components/chat/speech/speech-store.js — speech provider routing + playback

Purpose

Add real speech routing support for ElevenLabs in the existing TTS flow.

Behavior added

  • provider gating checks for ElevenLabs mode (via local settings/toggle)
  • new ElevenLabs speech path that calls /el11_tts
  • robust audio playback for returned audio blobs
  • fallback behavior retained:
    • if ElevenLabs fails, existing Kokoro/browser behavior still works
  • existing stream/chunk speech flow remains intact

Why this matters

  • The UI can now actually use ElevenLabs audio, not just display a toggle
  • Preserves backward compatibility for users not enabling ElevenLabs

3) webui/components/settings/agent/speech.html — settings UX

Purpose

Expose a clear user-facing toggle for ElevenLabs proxy TTS in the Speech settings panel.

Behavior added

  • an explicit “Enable ElevenLabs TTS Proxy” control
  • UX text clarifying this uses the server proxy route and requires configured key/config

Why this matters

  • Provides discoverable, controllable behavior from UI
  • Aligns user intent with actual provider routing in speech-store

4) requirements.txt — dependency/runtime parity

Purpose

Align dependency set with runtime expectations for the ElevenLabs integration path and live environment stability.

Why this matters

  • Reduces “works in one environment but not another” drift
  • Supports reproducible deployments and clean runtime behavior

Configuration and Usage

Required env

  • EL11_API_KEY=<your_elevenlabs_key>

Required voice config

Place elevenlabs_voice.json in relevant agent directories, e.g.:

  • agents/agent0/elevenlabs_voice.json
  • agents/default/elevenlabs_voice.json
  • etc.

Example fields:

  • voice_id
  • model
  • stability
  • similarity_boost
  • style
  • optional quality-related settings as supported by endpoint
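
As an illustration of the fields listed above, the snippet below prints a sample elevenlabs_voice.json and runs a small sanity check on it. The values are placeholders, not the ones shipped with this PR. Treating `voice_id` and `model` as required and the three tuning fields as optional is an assumption; the 0.0 to 1.0 range matches how ElevenLabs documents these voice settings.

```python
import json

# Assumption: these two fields must be present for a request to be possible.
REQUIRED_FIELDS = ("voice_id", "model")
# Assumption: these are optional and fall back to upstream defaults if absent.
TUNING_FIELDS = ("stability", "similarity_boost", "style")

# Placeholder values for illustration only.
example_voice = {
    "voice_id": "YOUR_VOICE_ID",
    "model": "eleven_multilingual_v2",
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0.0,
}


def validate_voice_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = cfg.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} must be a non-empty string")
    for field in TUNING_FIELDS:
        if field not in cfg:
            continue
        value = cfg[field]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{field} must be a number between 0.0 and 1.0")
    return problems


if __name__ == "__main__":
    # The printed JSON is what would be saved as agents/<profile>/elevenlabs_voice.json.
    print(json.dumps(example_voice, indent=2))
    print(validate_voice_config(example_voice))
```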

Enable in UI

  1. Open Settings -> Agent -> Speech
  2. Enable ElevenLabs TTS Proxy
  3. Trigger any voice output path in chat

Backward Compatibility

  • Default speech behavior remains unchanged unless ElevenLabs mode is enabled.
  • Kokoro/browser fallback paths remain available.
  • Existing speech chunking and stream sequencing logic is preserved.

Security Considerations

  • ElevenLabs API key remains server-side (not exposed to browser code).
  • Frontend calls local authenticated endpoint (/el11_tts) rather than external API directly.
  • Profile-based config loading is constrained to expected agent config files.
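
One way to enforce the last point is to validate the profile name and confirm that the resolved path stays inside the agents directory. The sketch below shows that idea only; the helper name and the allowed-character rule are assumptions, and the actual check in el11_tts.py may differ.

```python
import re
from pathlib import Path

# Assumption: profile names are plain directory names such as "agent0" or "default".
PROFILE_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def resolve_voice_config_path(profile: str, agents_dir: Path) -> Path:
    """Map a profile name to its voice config path, rejecting path traversal."""
    if not PROFILE_PATTERN.fullmatch(profile):
        raise ValueError(f"invalid profile name: {profile!r}")
    base = agents_dir.resolve()
    candidate = (base / profile / "elevenlabs_voice.json").resolve()
    # Second line of defence: the resolved path must still sit under agents_dir.
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError(f"profile escapes agents directory: {profile!r}")
    return candidate
```

With this check, a request carrying `profile="../secrets"` is rejected before any file is opened, while `profile="agent0"` resolves to agents/agent0/elevenlabs_voice.json.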

Validation / Test Notes

Manual checks performed

  • endpoint registration and availability for /el11_tts
  • valid audio response path (content-type: audio/mpeg)
  • frontend served assets include ElevenLabs routing logic
  • settings toggle rendered and persisted in UI
  • fallback behavior sanity checked

Suggested reviewer checks

  • verify speech quality changes when ElevenLabs toggle is enabled
  • verify fallback when ElevenLabs key/config is missing
  • verify no regressions in browser/Kokoro modes
  • verify multi-agent profile voice switching behavior

Known Limitations / Follow-ups

  • provider mode is currently controlled by a toggle; a future refinement could consolidate it into a single tts_mode setting so the active provider is unambiguous.
  • telemetry around provider selection/fallback reason could be added for troubleshooting.
  • future telephony integration may reuse /el11_tts shape or move to provider abstraction layer.

Why this PR is valuable

This change turns ElevenLabs support from “partial wiring + config files” into a working, testable, user-selectable voice path in Agent Zero.
It is designed to preserve current behavior while enabling higher-quality voice output now and cleaner voice-provider extensibility going forward.

evetzyokozuna and others added 30 commits February 17, 2026 04:10
…ech tab, bound to localStorage speech.el11Server
… post method, request.json(), absolute paths)
…mods, speech UI, dashboard dir, flask/bak files)
EL11 TTS Proxy Implementation (feature/elevenlabs)
fix: Sync el11_tts.py & requirements.txt to live runtime (Flask hybrid)
Introduce configurable runtime safety caps for monologue loops and enforce hard stops for iteration count, runtime duration, consecutive misformats, and consecutive repairable errors to prevent runaway execution with backward-compatible defaults.
Introduce configurable controls for oversized tool args, code execution timeouts/output handling, subordinate depth/call/runtime limits, queue backpressure, and memory_load clamps so long-running sessions fail safely and remain tunable.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move history compression ratios and pass limits into settings, and add runtime budget knobs across monologue and subordinate execution paths to make long-running behavior tunable and bounded.

Co-authored-by: Cursor <cursoragent@cursor.com>
peretzrickett and others added 9 commits February 21, 2026 17:13
…erage.

This adds a policy knob to block/convert terminal heredoc writes into safe Python file writes, preserves resolved spilled tool args at execution time, and ensures every env profile exposes all known A0_SET keys with explicit model kwargs defaults.
This introduces a sectioned editor for guardrails and runtime knobs with typed controls and inline descriptions, making advanced tuning discoverable directly in the Settings UI.
…tion.

This introduces overview, knob reference, and testing guides for autonomy guardrails, and links them from the main docs index, usage guide, and env-example setup docs for PR-ready distribution.
This updates the autonomy documentation with an in-repo Settings panel image so the guide renders correctly in upstream GitHub and PR previews.
Add autonomy guardrails, fine-tuning controls, and integrated docs
- agents/health_advisor/: agent.json, role prompt, settings.json
- knowledge/health_advisor/: HealthLogRules.md for RAG
- skills/health-log-rules/: SKILL.md for log schema
- docs/setup/env-examples/profile_health_advisor.env

Phase 1 of HEALTH_ADVISOR_AGENT_IMPLEMENTATION_PLAN.
Deterministic tools (Phase 4) to follow.

Co-authored-by: Cursor <cursoragent@cursor.com>
- health_log_archive, supplement_list, macro_lookup, energy_delta
- health_log_new_day, section_write
- health_log_routine_build (named routine or freestyle)
- health_log_workout_submit (parse filled table, update Exercise Progression + section 06)
- health_log_exercise_append (single row append)
- Tool prompt documents all tools with examples

Co-authored-by: Cursor <cursoragent@cursor.com>
feat(health_advisor): Add Health Advisor agent profile with deterministic health log tools