2 changes: 2 additions & 0 deletions README.md
@@ -66,6 +66,8 @@ The Kubernetes/AWS workflow runs natively in Python — it shells out to `aws`,

`setup kubernetes aws` (no `--config`) opens an interactive Questionary wizard with curated dropdowns for region, Kubernetes version, and per-node-group instance type (engine GPU instances, API c5n family, general-purpose for control-plane and license-proxy). Each dropdown shows `(default)` next to the suggested value and offers an `Other (enter custom)` entry for off-list values. Required fields (Quay credentials, API key, existing-EFS ID, deployment-file path) are enforced with non-empty validators.

Deployment types: `STT`, `TTS`, and `VOICE_AGENT` (which runs STT + TTS + end-of-turn engines together). Picking `VOICE_AGENT` unlocks extra prompts for Aura-2 TTS, per-pool engine replica counts, and LLM provider K8s secret refs. See [Voice Agent docs](kubernetes/aws/README.md#voice-agent) for details.

After the wizard collects answers it shows a Rich summary table (Cluster / Node groups / Other) and a four-way menu:

- **Deploy** — write the config to disk and run the full deployment. When `Dry run` is on, this option is renamed to **Render artifacts (dry run)**.
50 changes: 41 additions & 9 deletions kubernetes/aws/README.md
@@ -45,16 +45,19 @@ The wizard collects, in order:
 1. Cluster name (text)
 2. AWS region (dropdown with `(default)` markers + `Other (enter custom)`)
 3. Kubernetes version (dropdown)
-4. Deployment type (`STT` or `TTS`)
-5. Service exposure type (`ClusterIP`, `LoadBalancer`, or `NodePort`)
-6. Per-node-group settings for **Control plane**, **Engine (GPU)**, and **API**:
-   - Instance type (curated dropdown — engine list is GPU instances; control-plane and API have their own curated lists; each supports `Other`)
+4. Deployment type (`STT`, `TTS`, or `VOICE_AGENT`)
+5. If `VOICE_AGENT`: Aura-2 TTS toggle + per-language opt-in (English / Spanish / Polyglot) with editable `t2cUuid` / `c2aUuid` / `cudaVisibleDevices` defaults
+6. Service exposure type (`ClusterIP`, `LoadBalancer`, or `NodePort`)
+7. Per-node-group settings for **Control plane**, **Engine**, and **API**:
+   - Instance type (curated dropdown — engine list is GPU instances; control-plane and API have their own curated lists; each supports `Other`). When `VOICE_AGENT` + Aura-2 is selected, the engine dropdown is filtered to multi-GPU instances (e.g. `g6.12xlarge`).
    - Min / desired / max size (integer validators)
-7. License Proxy enable + (if enabled) its node-group settings
-8. EFS storage mode — `Create new EFS` or `Use existing EFS` (existing requires a non-empty `fs-...` ID)
-9. Model URL source — `Enter URLs manually`, `Load from deployment .txt file`, or (only when EFS is existing) `Skip (use models already on EFS)`
-10. Kubernetes secrets mode — `Use external secret store` (default) or `Create in-cluster secrets`
-11. Dry run toggle
+8. If `VOICE_AGENT`: per-pool engine replica counts for `agent-speech-to-text`, `agent-text-to-speech`, `agent-end-of-turn`
+9. License Proxy enable + (if enabled) its node-group settings
+10. EFS storage mode — `Create new EFS` or `Use existing EFS` (existing requires a non-empty `fs-...` ID)
+11. Model URL source — `Enter URLs manually`, `Load from deployment .txt file`, or (only when EFS is existing) `Skip (use models already on EFS)`
+12. Kubernetes secrets mode — `Use external secret store` (default) or `Create in-cluster secrets`
+13. If `VOICE_AGENT`: LLM provider credentials — one entry per line in the form `provider=secret-ref` (provider one of `openai`, `anthropic`, `groq`, `elevenlabs`, `cartesia`, `xai`, `google`). When secrets mode is `create`, you'll also be prompted (password input) for each provider's API key.
+14. Dry run toggle

After the wizard collects answers, the summary screen renders three Rich tables (Cluster / Node groups / Other) and shows a four-way menu:

@@ -119,6 +122,35 @@ uv run dg-self-hosted setup kubernetes aws --config deployments/stt-2.yaml

Resolution order at deploy time: in-memory wizard input → env vars → values in the YAML. If none of those provide all three, the deploy aborts with a message naming the env vars.

For Voice Agent LLM provider credentials, the same in-memory / env / config precedence applies, per provider. Env vars:

```bash
export DG_OPENAI_API_KEY=...
export DG_ANTHROPIC_API_KEY=...
export DG_GROQ_API_KEY=...
export DG_ELEVENLABS_API_KEY=...
export DG_CARTESIA_API_KEY=...
export DG_XAI_API_KEY=...
export DG_GOOGLE_API_KEY=...
```
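
The per-provider precedence described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: `resolve_api_key` and its parameters are hypothetical names, and only two of the seven env vars are listed for brevity.

```python
import os
from typing import Mapping, Optional

# Subset of the env var names listed above.
LLM_PROVIDER_ENV_VARS = {
    "openai": "DG_OPENAI_API_KEY",
    "anthropic": "DG_ANTHROPIC_API_KEY",
}


def resolve_api_key(
    provider: str,
    in_memory: Optional[Mapping[str, Optional[str]]] = None,
    config_keys: Optional[Mapping[str, Optional[str]]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: the first non-empty value wins (memory, env, config)."""
    environ = os.environ if environ is None else environ
    env_var = LLM_PROVIDER_ENV_VARS.get(provider)
    candidates = (
        (in_memory or {}).get(provider),            # 1. wizard input held in memory
        environ.get(env_var) if env_var else None,  # 2. environment variable
        (config_keys or {}).get(provider),          # 3. value left in the YAML
    )
    return next((value for value in candidates if value), None)
```

A `None` result corresponds to the case where the deploy would have nothing to put in the provider's secret.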

## Voice Agent

When `deployment.type` is `VOICE_AGENT`, the wizard and renderer diverge from STT/TTS:

- `agent.enabled: true` in the rendered Helm values.
- `scaling.replicas.engine` becomes a dict with `agent-speech-to-text`, `agent-text-to-speech`, and `agent-end-of-turn` keys (matching Deepgram's [voice agent AWS chart sample](https://github.com/deepgram/self-hosted-resources/blob/main/charts/deepgram-self-hosted/samples/05-voice-agent-aws.values.yaml)).
- `cluster-autoscaler.enabled` is forced to `false`. Autoscaling is not yet supported for Voice Agent upstream.
- The wizard offers Aura-2 TTS per language (English, Spanish, Polyglot). UUID defaults are vendored from the chart sample above. If Deepgram rotates the UUIDs in a model release, refresh them with:

```bash
kubectl logs -l engine-type=agent-text-to-speech -n dg-self-hosted | head -100
```

Look for lines `Inserting model key=TtsKey { ... uuid: ... }` (→ `t2cUuid`) and `Inserting model key=Codes2AudioKey { ... uuid: ... }` (→ `c2aUuid`). Edit the values in the saved config and redeploy.

- LLM provider credentials are collected line-by-line. Each entry yields a corresponding `global.thirdPartyCredentials.*SecretRef` field in the rendered values. When `secrets.mode == create`, the wizard also collects each provider's API key (in memory only) and creates one generic K8s secret per provider at deploy time.
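
As a rough sketch of the last bullet, the `provider=secret-ref` lines could be parsed and mapped onto `global.thirdPartyCredentials` like this. The field names come from the chart mapping in `config.py`; the function names `parse_provider_lines` and `to_helm_values` are hypothetical, and the assumption that each `*SecretRef` value is a plain secret name is mine, not confirmed by the README.

```python
# Provider id -> Helm field under global.thirdPartyCredentials (from config.py).
LLM_PROVIDER_SECRET_REF_FIELDS = {
    "openai": "openAiSecretRef",
    "anthropic": "anthropicSecretRef",
    "groq": "groqSecretRef",
    "elevenlabs": "elevenLabsSecretRef",
    "cartesia": "cartesiaSecretRef",
    "xai": "xaiSecretRef",
    "google": "googleSecretRef",
}


def parse_provider_lines(text: str) -> dict[str, str]:
    """Hypothetical parser for one `provider=secret-ref` entry per line."""
    refs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        provider, sep, secret_ref = line.partition("=")
        provider, secret_ref = provider.strip().lower(), secret_ref.strip()
        if not sep or not secret_ref:
            raise ValueError(f"expected provider=secret-ref, got: {line!r}")
        if provider not in LLM_PROVIDER_SECRET_REF_FIELDS:
            raise ValueError(f"unknown LLM provider: {provider!r}")
        refs[provider] = secret_ref
    return refs


def to_helm_values(refs: dict[str, str]) -> dict:
    """Map provider ids to their *SecretRef fields (assumed plain secret names)."""
    fields = {LLM_PROVIDER_SECRET_REF_FIELDS[p]: ref for p, ref in refs.items()}
    return {"global": {"thirdPartyCredentials": fields}}
```

For example, the input `openai=dg-openai` would yield `global.thirdPartyCredentials.openAiSecretRef: dg-openai` under these assumptions.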

## Prompt Tips

- The instance-type dropdowns offer curated AWS instances for each role (engine = GPU; control-plane / license-proxy = general-purpose; API = c5n family). `Other (enter custom)` lets you type any instance type, including ones not in the list.
93 changes: 92 additions & 1 deletion src/deepgram_self_hosted/config.py
@@ -9,6 +9,61 @@

SECRET_KEYS: tuple[str, ...] = ("registry_username", "registry_password", "api_key")

# Known LLM providers supported by the Helm chart's `global.thirdPartyCredentials`.
# Key is the canonical provider id used in our config; value is the Helm-values
# field name under `global.thirdPartyCredentials`.
LLM_PROVIDER_SECRET_REF_FIELDS: dict[str, str] = {
    "openai": "openAiSecretRef",
    "anthropic": "anthropicSecretRef",
    "groq": "groqSecretRef",
    "elevenlabs": "elevenLabsSecretRef",
    "cartesia": "cartesiaSecretRef",
    "xai": "xaiSecretRef",
    "google": "googleSecretRef",
}

# Voice Agent engine "replicas" entries (Helm `scaling.replicas.engine` keys).
VOICE_AGENT_ENGINE_REPLICA_KEYS: tuple[str, ...] = (
    "agent-speech-to-text",
    "agent-text-to-speech",
    "agent-end-of-turn",
)

# Aura-2 UUIDs vendored from the upstream chart sample:
# https://github.com/deepgram/self-hosted-resources/blob/main/charts/deepgram-self-hosted/samples/05-voice-agent-aws.values.yaml
# If the chart updates these, fetch the latest values from the sample, or extract
# from a running pod with:
#   kubectl logs -l engine-type=agent-text-to-speech -n <namespace> | head -100
DEFAULT_AURA2_UUIDS: dict[str, dict[str, str]] = {
    "english": {
        "t2cUuid": "0ec06c9b-0aa0-44d0-a001-3ec57d32229e",
        "c2aUuid": "2e5096c7-7bf1-435e-bbdd-f673f88d0ebd",
        "cudaVisibleDevices": "0,1",
    },
    "spanish": {
        "t2cUuid": "c053c7a8-7317-4de8-8a50-7e01c54e7ba9",
        "c2aUuid": "04355c1e-8148-478d-9f6c-6a6c54ec3591",
        "cudaVisibleDevices": "2,3",
    },
    "polyglot": {
        "t2cUuid": "04975889-c601-4f80-a02f-0f2f9c22deaf",
        "c2aUuid": "9e94567e-11e7-4619-adbc-d28212194367",
        "cudaVisibleDevices": "2,3",
    },
}


def _default_aura2_language(language: str, enabled: bool = False) -> dict[str, Any]:
    uuids = DEFAULT_AURA2_UUIDS[language]
    return {
        "enabled": enabled,
        "maxBatchSize": 8,
        "t2cUuid": uuids["t2cUuid"],
        "c2aUuid": uuids["c2aUuid"],
        "cudaVisibleDevices": uuids["cudaVisibleDevices"],
    }
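
For illustration, the per-language defaults above could be combined into an `aura2` block from the languages a user opts into. This is a sketch only: `build_aura2_block` is a hypothetical helper that does not appear in the diff, and just two languages are reproduced to keep it short.

```python
from typing import Any

# Two of the vendored defaults shown above (UUIDs copied from the diff).
DEFAULT_AURA2_UUIDS: dict[str, dict[str, str]] = {
    "english": {
        "t2cUuid": "0ec06c9b-0aa0-44d0-a001-3ec57d32229e",
        "c2aUuid": "2e5096c7-7bf1-435e-bbdd-f673f88d0ebd",
        "cudaVisibleDevices": "0,1",
    },
    "spanish": {
        "t2cUuid": "c053c7a8-7317-4de8-8a50-7e01c54e7ba9",
        "c2aUuid": "04355c1e-8148-478d-9f6c-6a6c54ec3591",
        "cudaVisibleDevices": "2,3",
    },
}


def default_aura2_language(language: str, enabled: bool = False) -> dict[str, Any]:
    """Same shape as `_default_aura2_language` in the diff."""
    return {"enabled": enabled, "maxBatchSize": 8, **DEFAULT_AURA2_UUIDS[language]}


def build_aura2_block(selected: set[str]) -> dict[str, Any]:
    """Hypothetical: enable Aura-2 only for the languages in `selected`."""
    block: dict[str, Any] = {"enabled": bool(selected)}
    for language in DEFAULT_AURA2_UUIDS:
        block[language] = default_aura2_language(language, enabled=language in selected)
    return block
```

Each call builds fresh dictionaries, so editing a UUID in one rendered block does not change the vendored defaults.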


DEFAULT_EKS_CONFIG: dict[str, Any] = {
"target": "kubernetes/aws",
"cluster": {
@@ -22,7 +77,14 @@
     },
     "node_groups": {
         "control_plane": {"min": 1, "desired": 1, "max": 3, "instance_type": "t3.large"},
-        "engine": {"min": 1, "desired": 1, "max": 8, "instance_type": "g6.2xlarge"},
+        "engine": {
+            "min": 1,
+            "desired": 1,
+            "max": 8,
+            "instance_type": "g6.2xlarge",
+            # Voice Agent only: per-engine-pool replica counts.
+            "agent_replicas": {key: 1 for key in VOICE_AGENT_ENGINE_REPLICA_KEYS},
+        },
         "api": {"min": 1, "desired": 1, "max": 2, "instance_type": "c5n.xlarge"},
         "license_proxy": {"min": 0, "desired": 0, "max": 2, "instance_type": "t3.large"},
     },
@@ -39,13 +101,30 @@
        "registry_username": None,
        "registry_password": None,
        "api_key": None,
        # Per-provider API keys (Voice Agent only). Same redaction semantics as
        # the three top-level secret values: stripped to None before write when
        # secrets.mode == "create".
        "llm_provider_api_keys": {},
    },
    "license_proxy": {
        "enabled": False,
    },
    "cluster_autoscaler": {
        "enabled": True,
    },
    "agent": {
        "enabled": False,
    },
    "aura2": {
        "enabled": False,
        "english": _default_aura2_language("english", enabled=False),
        "spanish": _default_aura2_language("spanish", enabled=False),
        "polyglot": _default_aura2_language("polyglot", enabled=False),
    },
    # Voice Agent LLM provider secret refs. Keys are provider ids from
    # LLM_PROVIDER_SECRET_REF_FIELDS; values are K8s secret names. Empty by
    # default — populated by the wizard when type is VOICE_AGENT.
    "third_party_credentials": {},
    "actions": {
        "dry_run": False,
        "expanded_eksctl_dry_run": False,
@@ -82,6 +161,10 @@ def strip_secrets(config: dict[str, Any]) -> dict[str, Any]:
    for key in SECRET_KEYS:
        if key in secrets:
            secrets[key] = None
    # Per-provider LLM API keys (Voice Agent) — clear values, preserve provider keys.
    llm_keys = secrets.get("llm_provider_api_keys")
    if isinstance(llm_keys, dict):
        secrets["llm_provider_api_keys"] = {provider: None for provider in llm_keys}
    return sanitized
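
The redaction added in this hunk keeps the provider ids and clears only their values. A standalone sketch of that behavior follows; `redact_llm_keys` is a hypothetical name, and the deep copy is an assumption because the part of `strip_secrets` that creates `sanitized` is not shown in the hunk.

```python
import copy
from typing import Any


def redact_llm_keys(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the LLM-key part of strip_secrets."""
    sanitized = copy.deepcopy(config)  # assumption: the caller's config is not mutated
    secrets = sanitized.get("secrets")
    if isinstance(secrets, dict):
        llm_keys = secrets.get("llm_provider_api_keys")
        if isinstance(llm_keys, dict):
            # Keep the provider ids so a later run knows which keys to resolve.
            secrets["llm_provider_api_keys"] = {provider: None for provider in llm_keys}
    return sanitized
```

Keeping the provider ids in the saved file means a re-run can still tell which providers need a key from the environment.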


@@ -90,6 +173,14 @@ def extract_secrets(config: dict[str, Any]) -> dict[str, Any]:
    return {key: get_path(config, "secrets", key) for key in SECRET_KEYS}


def extract_llm_provider_api_keys(config: dict[str, Any]) -> dict[str, str | None]:
    """Return a mapping of provider id -> API key value from config (any may be None)."""
    keys = get_path(config, "secrets", "llm_provider_api_keys", default={}) or {}
    if not isinstance(keys, dict):
        return {}
    return {provider: value for provider, value in keys.items()}


def clone_eks_config(
source: dict[str, Any],
*,
91 changes: 89 additions & 2 deletions src/deepgram_self_hosted/providers/kubernetes_aws.py
@@ -9,7 +9,9 @@
from rich.console import Console

from deepgram_self_hosted.config import (
    LLM_PROVIDER_SECRET_REF_FIELDS,
    SECRET_KEYS,
    extract_llm_provider_api_keys,
    extract_secrets,
    get_path,
    load_config,
@@ -34,6 +36,16 @@
    "api_key": "DG_API_KEY",
}

LLM_PROVIDER_ENV_VARS: dict[str, str] = {
    "openai": "DG_OPENAI_API_KEY",
    "anthropic": "DG_ANTHROPIC_API_KEY",
    "groq": "DG_GROQ_API_KEY",
    "elevenlabs": "DG_ELEVENLABS_API_KEY",
    "cartesia": "DG_CARTESIA_API_KEY",
    "xai": "DG_XAI_API_KEY",
    "google": "DG_GOOGLE_API_KEY",
}

DEFAULT_NAMESPACE = "dg-self-hosted"
DEFAULT_RELEASE = "deepgram"
HELM_CHART = "deepgram/deepgram-self-hosted"
@@ -68,7 +80,12 @@ def setup(console: Console, *, config_path: Path | None = None) -> None:
     secrets_in_memory = (
         extract_secrets(config) if secrets_mode == "create" else None
     )
-    config_to_save = strip_secrets(config) if secrets_in_memory else config
+    llm_api_keys_in_memory = (
+        extract_llm_provider_api_keys(config) if secrets_mode == "create" else None
+    )
+    config_to_save = (
+        strip_secrets(config) if secrets_in_memory or llm_api_keys_in_memory else config
+    )
Comment on lines +83 to +88
🔧 Nit: Add in-memory notification for LLM API keys parallel to standard secrets notice

`setup()` prints a warning (lines 100–105) telling users which env vars to set for non-interactive re-runs when standard secrets are held in memory, but no equivalent message is printed when `llm_api_keys_in_memory` is non-empty. Users who complete the Voice Agent wizard with `mode=create` won't know to set `DG_OPENAI_API_KEY` etc. until they hit the deploy-time `ValueError` on a re-run.

In `src/deepgram_self_hosted/providers/kubernetes_aws.py`, after line 105 (the closing of the `if secrets_in_memory:` block), add a parallel notification for LLM API keys:

```python
if llm_api_keys_in_memory:
    console.print(
        "[yellow]LLM provider API keys kept in memory only; not written to disk.[/yellow] "
        "For non-interactive re-runs, set "
        f"{', '.join(LLM_PROVIDER_ENV_VARS[p] for p in llm_api_keys_in_memory if p in LLM_PROVIDER_ENV_VARS)}."
    )
```

This mirrors the existing block at lines 100–105 that warns about standard secrets.


     if on_disk_path is None:
         default_save = KUBERNETES_AWS_ARTIFACTS_DIR / "session.yaml"
@@ -90,7 +107,12 @@ def setup(console: Console, *, config_path: Path | None = None) -> None:
     if decision == "save":
         return

-    _run_native_setup(on_disk_path, console, secrets_override=secrets_in_memory)
+    _run_native_setup(
+        on_disk_path,
+        console,
+        secrets_override=secrets_in_memory,
+        llm_api_keys_override=llm_api_keys_in_memory,
+    )


def _summary_loop(config: dict[str, Any], console: Console) -> str:
@@ -129,6 +151,7 @@ def _run_native_setup(
    console: Console,
    *,
    secrets_override: dict[str, Any] | None = None,
    llm_api_keys_override: dict[str, str | None] | None = None,
) -> None:
    config = load_config(config_path)
    cluster_name = str(get_path(config, "cluster", "name", default="deepgram-self-hosted-cluster"))
@@ -143,6 +166,12 @@ def _run_native_setup(
    console.print(f"Wrote cluster config to [bold]{cluster_config_path}[/bold]")

    if get_path(config, "actions", "dry_run", default=False):
        # Render Helm values too so users can diff against the upstream chart
        # sample without doing a full deploy. EFS ID and role ARN remain at
        # their placeholder values (rendered offline).
        values_path = artifact_dir / "my-values.yaml"
        values_path.write_text(render_values(config, resolve_aws=False))
        console.print(f"Wrote Helm values to [bold]{values_path}[/bold]")
        if get_path(config, "actions", "expanded_eksctl_dry_run", default=False):
            expanded = artifact_dir / "eksctl-expanded-cluster-config.yaml"
            result = run(
@@ -204,6 +233,8 @@ def _run_native_setup(
    if secrets_mode == "create":
        creds = _resolve_secrets(config, secrets_override)
        _create_secrets(creds, DEFAULT_NAMESPACE, console)
        llm_creds = _resolve_llm_provider_secrets(config, llm_api_keys_override)
        _create_llm_provider_secrets(config, llm_creds, DEFAULT_NAMESPACE, console)
    else:
        console.print(
            f"[yellow]Skipping secret creation; ensure dg-regcred and "
@@ -247,6 +278,30 @@ def _resolve_secrets(
    }


def _resolve_llm_provider_secrets(
    config: dict[str, Any],
    llm_api_keys_override: dict[str, str | None] | None,
) -> dict[str, str | None]:
    """Resolve LLM provider API keys for each provider listed in third_party_credentials.

    Same precedence as _resolve_secrets: (1) in-memory override, (2) env var,
    (3) value remaining in config file.
    """
    providers = list((get_path(config, "third_party_credentials", default={}) or {}).keys())
    resolved: dict[str, str | None] = {}
    for provider in providers:
        if llm_api_keys_override and llm_api_keys_override.get(provider):
            resolved[provider] = llm_api_keys_override[provider]
            continue
        env_var = LLM_PROVIDER_ENV_VARS.get(provider)
        env_value = os.environ.get(env_var) if env_var else None
        config_value = get_path(
            config, "secrets", "llm_provider_api_keys", provider
        )
        resolved[provider] = env_value or config_value
    return resolved


def _create_secrets(
creds: dict[str, str | None],
namespace: str,
@@ -284,6 +339,38 @@ def _create_secrets(
)


def _create_llm_provider_secrets(
    config: dict[str, Any],
    creds: dict[str, str | None],
    namespace: str,
    console: Console,
) -> None:
    """Create one generic K8s secret per enabled LLM provider when mode=create."""
    refs = get_path(config, "third_party_credentials", default={}) or {}
    if not refs:
        return

    for provider, secret_ref in refs.items():
        if provider not in LLM_PROVIDER_SECRET_REF_FIELDS:
            continue
        api_key = creds.get(provider)
        if not api_key:
            env_var = LLM_PROVIDER_ENV_VARS.get(provider, "")
            raise ValueError(
                f"LLM provider `{provider}` is enabled but its API key is missing. "
                f"Re-run the wizard interactively or set {env_var} before deploying."
            )

        console.print(f"Creating LLM provider secret [bold]{secret_ref}[/bold]...")
        _kubectl_create_or_replace(
            [
                "kubectl", "create", "secret", "generic", str(secret_ref),
                f"--from-literal=API_KEY={api_key}",
                "--namespace", namespace,
            ],
        )


def _kubectl_create_or_replace(create_command: list[str]) -> None:
    rendered = subprocess.run(
        [*create_command, "--dry-run=client", "-o", "yaml"],