feat/kani-tts-2 support#12
Conversation
Integrate Kani TTS 2 (nineninesix/kani-tts-2-en) as a new engine alongside Chatterbox and Kokoro. Kani is a 400M-parameter LLM-based TTS model with voice cloning via speaker embeddings and HuggingFace auto-download support. New files: - src/core/kani.engine.ts: TypeScript engine implementing ITTSEngine - scripts/kani_runner.py: Python CLI runner with voice cloning support - scripts/requirements-kani.txt: Python dependencies Modified files: - tts-constants.ts: Add KANI_DEFAULTS and KANI_LIMITS - tts-engine.interface.ts: Add temperature, top_p, repetition_penalty, model to TTSOptions - tts-service.factory.ts: Register "kani" engine type - tools.ts: Add Kani params to MCP tool schema and pass-through - .env.example: Add Kani configuration section - tests: Add engine selection, capabilities, and validation tests Set TTS_ENGINE=kani to use. Models auto-download from HuggingFace on first run. https://claude.ai/code/session_01KNVsaVefDTsFt8Do5NeCu4
Review Summary by QodoAdd Kani TTS 2 engine support with voice cloning and HuggingFace integration
WalkthroughsDescription• Add Kani TTS 2 as third TTS engine option with voice cloning support • Implement KaniEngine class with Python runner for HuggingFace model integration • Extend TTSOptions interface with temperature, top_p, repetition_penalty, model parameters • Register Kani engine in factory and update MCP tool schema with new parameters • Add comprehensive validation tests for Kani-specific options and capabilities Diagramflowchart LR
A["TTSService Factory"] -->|"registers"| B["KaniEngine"]
B -->|"spawns"| C["kani_runner.py"]
C -->|"uses"| D["Kani TTS 2 Model"]
D -->|"auto-downloads from"| E["HuggingFace"]
B -->|"validates"| F["TTSOptions<br/>temperature, top_p,<br/>repetition_penalty"]
G["MCP Tools"] -->|"passes parameters"| B
H["Voice Cloning"] -->|"via speaker embeddings"| B
File Changes1. src/core/kani.engine.ts
|
📝 WalkthroughWalkthroughThis pull request introduces support for Kani TTS 2, a new text-to-speech engine. It includes environment configuration, a Python CLI orchestrator script, TypeScript engine wrapper, TTS system integration across constants, interfaces, and factory, MCP tools support, and comprehensive unit tests. Changes
Sequence DiagramsequenceDiagram
participant Client
participant KaniEngine as KaniEngine<br/>(TypeScript)
participant PythonRunner as kani_runner.py<br/>(Python CLI)
participant KaniTTS as Kani TTS 2<br/>Library
participant FileSystem as Output<br/>WAV File
Client->>KaniEngine: ensureReady()
KaniEngine->>KaniEngine: Verify Python path & script
KaniEngine->>PythonRunner: Check dependencies
PythonRunner-->>KaniEngine: Status
Note over KaniEngine: Install deps if needed
Client->>KaniEngine: synthesize(text, options)
KaniEngine->>KaniEngine: Validate options (temp, top_p, etc.)
KaniEngine->>PythonRunner: Invoke with args + model config
PythonRunner->>KaniTTS: Initialize KaniTTS(model, reference_audio)
KaniTTS->>KaniTTS: Extract speaker embedding
PythonRunner->>KaniTTS: Generate audio(text, kwargs)
KaniTTS-->>PythonRunner: Audio buffer
PythonRunner->>FileSystem: Write WAV
PythonRunner-->>KaniEngine: stdout/stderr + exit code
KaniEngine-->>Client: Resolved path to WAV file
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Summary of ChangesHello @soyboyscout, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces comprehensive support for Kani TTS 2, a new text-to-speech engine, into the system. This integration provides users with an additional powerful option for speech synthesis, featuring capabilities like voice cloning via speaker embeddings and the use of HuggingFace-hosted models. The changes involve adding a Python runner script and its dependencies, updating configuration examples, implementing a new TypeScript engine, and extending the TTS service factory and tool definitions to seamlessly incorporate Kani TTS 2. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThis pull request contains a version bump for the project, incrementing from version 0.1.0 to 0.3.0. This represents a minor version update following semantic versioning conventions. The change is isolated to the package-lock.json file, updating both the top-level version field and the corresponding package version entry within the packages section. No dependencies have been added, removed, or updated, and no other configuration changes are present. This is a straightforward release version update with no functional code changes. Changes
🔗 Cross-Repository Impact AnalysisEnable automatic detection of breaking changes across your dependent repositories. → Set up now Learn more about Cross-Repository AnalysisWhat It Does
How to Enable
Benefits
Note for WindsurfPlease change the default marketplace provider to the following in the windsurf settings:Marketplace Extension Gallery Service URL: https://marketplace.visualstudio.com/_apis/public/gallery Marketplace Gallery Item URL: https://marketplace.visualstudio.com/items Entelligence.ai can learn from your feedback. Simply add 👍 / 👎 emojis to teach it your preferences. More shortcuts belowEmoji Descriptions:
Interact with the Bot:
Also you can trigger various commands with the bot by doing The current supported commands are
More commands to be added soon. |
| if (referenceAudio) { | ||
| args.push("--reference_audio", referenceAudio); | ||
| } | ||
|
|
||
| if (languageTag) { | ||
| args.push("--language_tag", languageTag); | ||
| } | ||
|
|
||
| logger.log("[KaniEngine] Python path:", this.pythonPath); | ||
| logger.log("[KaniEngine] Script path:", this.scriptPath); | ||
| logger.log("[KaniEngine] Arguments:", args); | ||
|
|
||
| return new Promise<string>((resolve, reject) => { |
There was a problem hiding this comment.
Path Traversal (🔒 Security, 🔴 High) - The code uses external input "referenceAudio" to build a file path without validating it, allowing attackers to access files outside the allowed directory. View in Corgea ↗
More Details
🎟️Issue Explanation: The code uses external input "referenceAudio" to build a file path without validating it, allowing attackers to access files outside the allowed directory.
- "referenceAudio" can include path traversal characters like "../", letting attackers reach unauthorized files.
- The check only pushes this input to "args" without sanitizing or restricting the path location.
- This can expose sensitive files if the path resolves outside the restricted directory, risking data leaks or unauthorized access.
🪄Fix Explanation: The fix mitigates path traversal by resolving the input against a safe base directory, validating it stays within that directory, and confirming the target file exists and is a file before use.
The code sets "baseDir" to a fixed safe path ("os.tmpdir()"), ensuring resolution is relative to this directory.
It resolves "referenceAudio" via "path.resolve(baseDir, referenceAudio)", normalizing the path.
The relative path is computed with "path.relative(baseDir, resolvedRef)", and traversal attempts are detected by checking if it starts with ".." or is absolute.
Existence and file-type checks use "fs.existsSync(resolvedRef)" and "fs.statSync(resolvedRef).isFile()" to prevent invalid or malicious inputs.
Only after passing these validations is the resolved, safe path appended to "args", preventing injection of arbitrary file paths.
💡Important Instructions: Ensure
referenceAudio inputs come from trusted sources or include user input validation upstream to avoid legitimate path rejection.
| if (referenceAudio) { | |
| args.push("--reference_audio", referenceAudio); | |
| } | |
| if (languageTag) { | |
| args.push("--language_tag", languageTag); | |
| } | |
| logger.log("[KaniEngine] Python path:", this.pythonPath); | |
| logger.log("[KaniEngine] Script path:", this.scriptPath); | |
| logger.log("[KaniEngine] Arguments:", args); | |
| return new Promise<string>((resolve, reject) => { | |
| if (referenceAudio) { | |
| const baseDir = path.resolve(os.tmpdir()); | |
| const resolvedRef = path.resolve(baseDir, referenceAudio); | |
| const relativeRef = path.relative(baseDir, resolvedRef); | |
| if (relativeRef.startsWith("..") || path.isAbsolute(relativeRef)) { | |
| throw new Error("Invalid referenceAudio path: path traversal detected"); | |
| } | |
| if (!fs.existsSync(resolvedRef) || !fs.statSync(resolvedRef).isFile()) { | |
| throw new Error("Invalid referenceAudio path: file does not exist or is not a file"); | |
| } | |
| args.push("--reference_audio", resolvedRef); | |
| } |
|
🐕 Corgea found the following new SCA issues in the codebase:
Showing 5 out of 8 findings. See full results |
There was a problem hiding this comment.
Code Review
This pull request introduces support for the Kani TTS 2 engine, a significant feature addition. While the implementation includes a new Python runner, a TypeScript engine class, configuration updates, and tests, a critical security vulnerability exists regarding input validation for the new engine. Specifically, the referenceAudio and model parameters are passed to the underlying Python script without proper sanitization and path validation, potentially allowing an attacker to probe the server's filesystem. It is recommended to implement strict path validation and parameter sanitization in the KaniEngine class before merging. Additionally, for better maintainability, consider referencing the requirements-kani.txt file instead of hardcoding dependency lists in error messages.
| const referenceAudio = | ||
| options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || ""; |
There was a problem hiding this comment.
The KaniEngine implementation fails to validate the referenceAudio path provided in the TTSOptions. Unlike the ChatterboxEngine, which performs rigorous checks to prevent path traversal and restrict access to allowed directories, KaniEngine passes the user-supplied path directly to the underlying Python script. In src/mcp/tools.ts, the validation is explicitly skipped for this engine (line 126). This allows an attacker to specify arbitrary file paths on the server, potentially leading to information disclosure or unauthorized file access if the kani-tts-2 library or the runner script processes the file.
| const model = | ||
| options?.model ?? | ||
| (process.env.KANI_MODEL || KANI_DEFAULTS.MODEL); |
There was a problem hiding this comment.
The model parameter is taken directly from user input without any validation or sanitization. This parameter is passed as a command-line argument to the Python runner script. An attacker could provide a malicious model name or a local file path, which might lead to SSRF (if the library attempts to download a model from an untrusted source) or unauthorized local file access.
| print("Please install required packages:", file=sys.stderr) | ||
| print(" pip install kani-tts-2 transformers==4.56.0 soundfile numpy", file=sys.stderr) |
There was a problem hiding this comment.
For better maintainability, instead of hardcoding the installation command, it would be better to refer to the scripts/requirements-kani.txt file. This ensures that if the dependencies change in the requirements file, this error message remains accurate.
| print("Please install required packages:", file=sys.stderr) | |
| print(" pip install kani-tts-2 transformers==4.56.0 soundfile numpy", file=sys.stderr) | |
| print("Please install required packages from 'scripts/requirements-kani.txt':", file=sys.stderr) | |
| print(" pip install -r scripts/requirements-kani.txt", file=sys.stderr) |
| `Failed to install Kani dependencies automatically.\n` + | ||
| `Please install manually:\n` + | ||
| ` pip install kani-tts-2 transformers==4.56.0 soundfile numpy\n\n` + |
There was a problem hiding this comment.
To improve maintainability, it would be better to reference the scripts/requirements-kani.txt file in this error message instead of hardcoding the package list. This ensures the instructions remain correct if the requirements file is updated.
| `Failed to install Kani dependencies automatically.\n` + | |
| `Please install manually:\n` + | |
| ` pip install kani-tts-2 transformers==4.56.0 soundfile numpy\n\n` + | |
| `Failed to install Kani dependencies automatically.\n` + | |
| `Please install manually from 'scripts/requirements-kani.txt':\n` + | |
| ` pip install -r scripts/requirements-kani.txt\n\n` + |
| `Failed to run pip installer: ${error.message}\n` + | ||
| `Please install dependencies manually:\n` + | ||
| ` pip install kani-tts-2 transformers==4.56.0 soundfile numpy` |
There was a problem hiding this comment.
For consistency and maintainability, let's also refer to the scripts/requirements-kani.txt file here instead of hardcoding the dependency list.
| `Failed to run pip installer: ${error.message}\n` + | |
| `Please install dependencies manually:\n` + | |
| ` pip install kani-tts-2 transformers==4.56.0 soundfile numpy` | |
| `Failed to run pip installer: ${error.message}\n` + | |
| `Please install dependencies manually from 'scripts/requirements-kani.txt':\n` + | |
| ` pip install -r scripts/requirements-kani.txt` |
There was a problem hiding this comment.
🧪 PR Review is completed: Review of Kani TTS 2 integration. Identified critical robustness issues in configuration parsing (NaN handling), reference audio logic, and filename generation.
Skipped files
package-lock.json: Skipped file pattern
⬇️ Low Priority Suggestions (1)
scripts/kani_runner.py (1 suggestion)
Location:
scripts/kani_runner.py(Lines 34-34)🟡 Configuration
Issue: The sample rate is hardcoded to 22050Hz. If Kani TTS 2 uses a different sample rate (e.g., 24kHz), the output audio will have incorrect pitch and speed.
Fix: Verify the model's sample rate and update the constant if necessary. Adding a TODO for verification.
Impact: Audio quality assurance.
- SAMPLE_RATE = 22050 + SAMPLE_RATE = 22050 # TODO: Verify if Kani TTS 2 uses 24kHz or 22050Hz
| const maxCharacters = parseInt( | ||
| process.env.KANI_MAX_CHARACTERS || String(KANI_DEFAULTS.MAX_CHARACTERS) | ||
| ); | ||
| if (text.length > maxCharacters) { | ||
| throw new Error( | ||
| `Text exceeds maximum character limit of ${maxCharacters} characters. Current length: ${text.length}` | ||
| ); | ||
| } | ||
|
|
||
| const model = | ||
| options?.model ?? | ||
| (process.env.KANI_MODEL || KANI_DEFAULTS.MODEL); | ||
| const temperature = | ||
| options?.temperature ?? | ||
| parseFloat( | ||
| process.env.KANI_TEMPERATURE || String(KANI_DEFAULTS.TEMPERATURE) | ||
| ); | ||
| const topP = | ||
| options?.top_p ?? | ||
| parseFloat(process.env.KANI_TOP_P || String(KANI_DEFAULTS.TOP_P)); | ||
| const repetitionPenalty = | ||
| options?.repetition_penalty ?? | ||
| parseFloat( | ||
| process.env.KANI_REPETITION_PENALTY || | ||
| String(KANI_DEFAULTS.REPETITION_PENALTY) | ||
| ); | ||
| const referenceAudio = | ||
| options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || ""; | ||
| const languageTag = | ||
| options?.language ?? (process.env.KANI_LANGUAGE_TAG || ""); | ||
|
|
||
| const outputDir = | ||
| process.env.KANI_OUTPUT_DIR || | ||
| path.join(os.tmpdir(), COMMON_CONSTANTS.TEMP_DIR_NAME); | ||
| if (!fs.existsSync(outputDir)) { | ||
| fs.mkdirSync(outputDir, { recursive: true }); | ||
| } | ||
|
|
||
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`); |
There was a problem hiding this comment.
🟠 Logic Error & Robustness
Issue:
parseFloat/parseInton invalid environment variables returnsNaN, which propagates to the Python script and bypasses validation (sinceNaN < 0is false).- Using
||forreferenceAudioprevents the user from explicitly disabling it (passing"") if the environment variable is set. Date.now()for filename generation can cause collisions in high-concurrency scenarios.maxCharactersfallback logic fails for invalid non-empty strings.
Fix:
- Check for
isNaNwhen parsing environment variables. - Use nullish coalescing (
??) for optional parameters. - Add a random suffix to the output filename.
Impact: Prevents runtime errors, ensures configuration is respected, and avoids file overwrites.
| const maxCharacters = parseInt( | |
| process.env.KANI_MAX_CHARACTERS || String(KANI_DEFAULTS.MAX_CHARACTERS) | |
| ); | |
| if (text.length > maxCharacters) { | |
| throw new Error( | |
| `Text exceeds maximum character limit of ${maxCharacters} characters. Current length: ${text.length}` | |
| ); | |
| } | |
| const model = | |
| options?.model ?? | |
| (process.env.KANI_MODEL || KANI_DEFAULTS.MODEL); | |
| const temperature = | |
| options?.temperature ?? | |
| parseFloat( | |
| process.env.KANI_TEMPERATURE || String(KANI_DEFAULTS.TEMPERATURE) | |
| ); | |
| const topP = | |
| options?.top_p ?? | |
| parseFloat(process.env.KANI_TOP_P || String(KANI_DEFAULTS.TOP_P)); | |
| const repetitionPenalty = | |
| options?.repetition_penalty ?? | |
| parseFloat( | |
| process.env.KANI_REPETITION_PENALTY || | |
| String(KANI_DEFAULTS.REPETITION_PENALTY) | |
| ); | |
| const referenceAudio = | |
| options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || ""; | |
| const languageTag = | |
| options?.language ?? (process.env.KANI_LANGUAGE_TAG || ""); | |
| const outputDir = | |
| process.env.KANI_OUTPUT_DIR || | |
| path.join(os.tmpdir(), COMMON_CONSTANTS.TEMP_DIR_NAME); | |
| if (!fs.existsSync(outputDir)) { | |
| fs.mkdirSync(outputDir, { recursive: true }); | |
| } | |
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`); | |
| const maxCharacters = parseInt(process.env.KANI_MAX_CHARACTERS || "") || KANI_DEFAULTS.MAX_CHARACTERS; | |
| if (text.length > maxCharacters) { | |
| throw new Error( | |
| `Text exceeds maximum character limit of ${maxCharacters} characters. Current length: ${text.length}` | |
| ); | |
| } | |
| const model = | |
| options?.model ?? | |
| (process.env.KANI_MODEL || KANI_DEFAULTS.MODEL); | |
| const envTemp = parseFloat(process.env.KANI_TEMPERATURE || ""); | |
| const temperature = | |
| options?.temperature ?? | |
| (isNaN(envTemp) ? KANI_DEFAULTS.TEMPERATURE : envTemp); | |
| const envTopP = parseFloat(process.env.KANI_TOP_P || ""); | |
| const topP = | |
| options?.top_p ?? | |
| (isNaN(envTopP) ? KANI_DEFAULTS.TOP_P : envTopP); | |
| const envRepPenalty = parseFloat(process.env.KANI_REPETITION_PENALTY || ""); | |
| const repetitionPenalty = | |
| options?.repetition_penalty ?? | |
| (isNaN(envRepPenalty) ? KANI_DEFAULTS.REPETITION_PENALTY : envRepPenalty); | |
| const referenceAudio = | |
| options?.referenceAudio ?? process.env.KANI_REFERENCE_AUDIO ?? ""; | |
| const languageTag = | |
| options?.language ?? (process.env.KANI_LANGUAGE_TAG || ""); | |
| const outputDir = | |
| process.env.KANI_OUTPUT_DIR || | |
| path.join(os.tmpdir(), COMMON_CONSTANTS.TEMP_DIR_NAME); | |
| if (!fs.existsSync(outputDir)) { | |
| fs.mkdirSync(outputDir, { recursive: true }); | |
| } | |
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}-${Math.random().toString(36).slice(2)}.wav`); |
There was a problem hiding this comment.
3 issues found across 10 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/core/tts-service.factory.ts">
<violation number="1" location="src/core/tts-service.factory.ts:28">
P2: Engine selection only accepts "kani"; the README documents "kani-mlx" as the valid env value, so setting the documented value will default to chatterbox instead of selecting Kani.</violation>
</file>
<file name="src/core/kani.engine.ts">
<violation number="1" location="src/core/kani.engine.ts:224">
P1: Missing path validation for `referenceAudio`. The chatterbox engine validates reference audio paths with `validateReferenceAudioPath()` to prevent directory traversal, but the kani engine passes user-supplied paths directly to the subprocess. Add path validation (e.g., checking allowed extensions, ensuring the path is within expected directories, and verifying the file exists) before passing it to the Python script.</violation>
<violation number="2" location="src/core/kani.engine.ts:235">
P2: `Date.now()` alone is insufficient for unique filenames under concurrent requests. Two requests arriving in the same millisecond will collide. Use `crypto.randomUUID()` or append a random suffix (e.g., `crypto.randomBytes(4).toString('hex')`) to ensure uniqueness.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| String(KANI_DEFAULTS.REPETITION_PENALTY) | ||
| ); | ||
| const referenceAudio = | ||
| options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || ""; |
There was a problem hiding this comment.
P1: Missing path validation for referenceAudio. The chatterbox engine validates reference audio paths with validateReferenceAudioPath() to prevent directory traversal, but the kani engine passes user-supplied paths directly to the subprocess. Add path validation (e.g., checking allowed extensions, ensuring the path is within expected directories, and verifying the file exists) before passing it to the Python script.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/kani.engine.ts, line 224:
<comment>Missing path validation for `referenceAudio`. The chatterbox engine validates reference audio paths with `validateReferenceAudioPath()` to prevent directory traversal, but the kani engine passes user-supplied paths directly to the subprocess. Add path validation (e.g., checking allowed extensions, ensuring the path is within expected directories, and verifying the file exists) before passing it to the Python script.</comment>
<file context>
@@ -0,0 +1,376 @@
+ String(KANI_DEFAULTS.REPETITION_PENALTY)
+ );
+ const referenceAudio =
+ options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || "";
+ const languageTag =
+ options?.language ?? (process.env.KANI_LANGUAGE_TAG || "");
</file context>
| // Validate and set engine type | ||
| if (envEngine === "kokoro") { | ||
| this.engineType = "kokoro"; | ||
| } else if (envEngine === "kani") { |
There was a problem hiding this comment.
P2: Engine selection only accepts "kani"; the README documents "kani-mlx" as the valid env value, so setting the documented value will default to chatterbox instead of selecting Kani.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/tts-service.factory.ts, line 28:
<comment>Engine selection only accepts "kani"; the README documents "kani-mlx" as the valid env value, so setting the documented value will default to chatterbox instead of selecting Kani.</comment>
<file context>
@@ -24,6 +25,8 @@ export class TTSService {
// Validate and set engine type
if (envEngine === "kokoro") {
this.engineType = "kokoro";
+ } else if (envEngine === "kani") {
+ this.engineType = "kani";
} else if (envEngine && envEngine !== "chatterbox") {
</file context>
| } else if (envEngine === "kani") { | |
| } else if (envEngine === "kani" || envEngine === "kani-mlx") { |
| fs.mkdirSync(outputDir, { recursive: true }); | ||
| } | ||
|
|
||
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`); |
There was a problem hiding this comment.
P2: Date.now() alone is insufficient for unique filenames under concurrent requests. Two requests arriving in the same millisecond will collide. Use crypto.randomUUID() or append a random suffix (e.g., crypto.randomBytes(4).toString('hex')) to ensure uniqueness.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/core/kani.engine.ts, line 235:
<comment>`Date.now()` alone is insufficient for unique filenames under concurrent requests. Two requests arriving in the same millisecond will collide. Use `crypto.randomUUID()` or append a random suffix (e.g., `crypto.randomBytes(4).toString('hex')`) to ensure uniqueness.</comment>
<file context>
@@ -0,0 +1,376 @@
+ fs.mkdirSync(outputDir, { recursive: true });
+ }
+
+ const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`);
+
+ const args = [
</file context>
Code Review by Qodo
1. Logs include raw --text
|
| logger.log("[KaniEngine] Python path:", this.pythonPath); | ||
| logger.log("[KaniEngine] Script path:", this.scriptPath); | ||
| logger.log("[KaniEngine] Arguments:", args); | ||
|
|
There was a problem hiding this comment.
1. Logs include raw --text 📘 Rule violation ⛨ Security
The Kani engine logs the full spawned argument list, which includes the user-provided TTS text and potentially sensitive file paths. This can leak sensitive data into logs and violates secure logging requirements.
Agent Prompt
## Issue description
`KaniEngine` logs the full spawned argument list, which includes the raw user-provided `text` (and possibly reference audio paths). This can leak sensitive information into logs.
## Issue Context
Secure logging requires avoiding sensitive data in logs. TTS input text may contain PII/secrets, and file paths can also be sensitive.
## Fix Focus Areas
- src/core/kani.engine.ts[237-264]
- src/core/kani.engine.ts[271-285]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| except Exception as e: | ||
| print(f"Error: {e}", file=sys.stderr) | ||
| traceback.print_exc(file=sys.stderr) | ||
| sys.exit(1) |
There was a problem hiding this comment.
2. traceback.print_exc exposed to user 📘 Rule violation ⛨ Security
The CLI runner prints a full stack trace to stderr on any exception, exposing internal implementation details to the end user. This violates secure error handling requirements.
Agent Prompt
## Issue description
The runner prints a full stack trace (`traceback.print_exc`) to stderr, which is user-facing for a CLI and leaks internal implementation details.
## Issue Context
Compliance requires generic user-facing errors and detailed stack traces only in secure/internal logs.
## Fix Focus Areas
- scripts/kani_runner.py[150-153]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const maxCharacters = parseInt( | ||
| process.env.KANI_MAX_CHARACTERS || String(KANI_DEFAULTS.MAX_CHARACTERS) | ||
| ); | ||
| if (text.length > maxCharacters) { | ||
| throw new Error( | ||
| `Text exceeds maximum character limit of ${maxCharacters} characters. Current length: ${text.length}` | ||
| ); | ||
| } | ||
|
|
||
| const model = | ||
| options?.model ?? | ||
| (process.env.KANI_MODEL || KANI_DEFAULTS.MODEL); | ||
| const temperature = | ||
| options?.temperature ?? | ||
| parseFloat( | ||
| process.env.KANI_TEMPERATURE || String(KANI_DEFAULTS.TEMPERATURE) | ||
| ); | ||
| const topP = | ||
| options?.top_p ?? | ||
| parseFloat(process.env.KANI_TOP_P || String(KANI_DEFAULTS.TOP_P)); | ||
| const repetitionPenalty = | ||
| options?.repetition_penalty ?? | ||
| parseFloat( | ||
| process.env.KANI_REPETITION_PENALTY || | ||
| String(KANI_DEFAULTS.REPETITION_PENALTY) | ||
| ); | ||
| const referenceAudio = |
There was a problem hiding this comment.
3. Unvalidated env numeric parsing 📘 Rule violation ⛯ Reliability
Numeric values parsed from environment variables can become NaN and bypass length/range checks or be forwarded to the Python runner. This creates unreliable behavior and weak edge-case handling for external inputs.
Agent Prompt
## Issue description
`parseInt`/`parseFloat` results from env vars are not validated. If an env var is non-numeric, values can become `NaN`, causing length/range checks to fail open or invalid values to be passed downstream.
## Issue Context
These values are external inputs (env) and should be validated with explicit edge-case handling.
## Fix Focus Areas
- src/core/kani.engine.ts[197-223]
- src/core/kani.engine.ts[235-251]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Validate and set engine type | ||
| if (envEngine === "kokoro") { | ||
| this.engineType = "kokoro"; | ||
| } else if (envEngine === "kani") { | ||
| this.engineType = "kani"; | ||
| } else if (envEngine && envEngine !== "chatterbox") { | ||
| logger.warn( | ||
| `Unknown TTS engine: ${envEngine}. Defaulting to chatterbox.` |
There was a problem hiding this comment.
4. Kani limit mismatch 🐞 Bug ✓ Correctness
When TTS_ENGINE=kani, the HTTP server and MCP tool still enforce CHATTERBOX_MAX_CHARACTERS (or Kokoro’s), ignoring KANI_MAX_CHARACTERS introduced by this PR. This makes the Kani config misleading and can reject valid requests unexpectedly.
Agent Prompt
### Issue description
When `TTS_ENGINE=kani`, the request-layer (HTTP and MCP) max-length checks still default to Chatterbox’s env var. This makes `KANI_MAX_CHARACTERS` ineffective and creates inconsistent behavior (request rejected before reaching `KaniEngine`, or enforced by a different limit than the engine).
### Issue Context
- `KaniEngine` already enforces `KANI_MAX_CHARACTERS`.
- `.env.example` documents `KANI_MAX_CHARACTERS`.
- The HTTP and MCP layers should enforce the same engine-specific limit to provide consistent UX and predictable configuration.
### Fix Focus Areas
- src/server.ts[105-125]
- src/mcp/tools.ts[100-122]
- src/core/kani.engine.ts[197-204]
- src/core/tts-service.factory.ts[21-49]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const outputDir = | ||
| process.env.KANI_OUTPUT_DIR || | ||
| path.join(os.tmpdir(), COMMON_CONSTANTS.TEMP_DIR_NAME); | ||
| if (!fs.existsSync(outputDir)) { | ||
| fs.mkdirSync(outputDir, { recursive: true }); | ||
| } | ||
|
|
||
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`); | ||
|
|
There was a problem hiding this comment.
5. Output dir override breaks 🐞 Bug ⛯ Reliability
KANI_OUTPUT_DIR is supported and documented, but the HTTP and MCP layers enforce a strict security boundary requiring outputs under os.tmpdir()/local-voice-mcp. Setting KANI_OUTPUT_DIR outside that directory will cause synthesis to fail at the boundary check.
Agent Prompt
### Issue description
`KANI_OUTPUT_DIR` allows outputs outside the temp directory, but both HTTP and MCP layers reject such paths. This makes the config option effectively broken (or produces confusing runtime failures).
### Issue Context
The project intentionally restricts playable/servable audio files to a temp directory boundary for security. Any engine-level output directory override must adhere to that boundary.
### Fix Focus Areas
- src/core/kani.engine.ts[228-236]
- src/mcp/tools.ts[144-154]
- src/server.ts[134-143]
- .env.example[56-57]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mcp/tools.ts (1)
864-874:⚠️ Potential issue | 🟡 MinorTTSToolSchemas.synthesizeText is missing Kokoro-specific fields that exist in the method signature and inputSchema.
The schema lacks
speed,language,voice,model_path, andvoices_path—all of which are defined in theinputSchemaand accepted by thesynthesizeTextmethod. While the actual tool validation usesinputSchemarather thanTTSToolSchemas, this inconsistency should be resolved to keep the schema complete and to avoid confusion ifTTSToolSchemasis relied upon elsewhere. Additionally, test coverage for these fields is missing from the TTSToolSchemas test suite.
🤖 Fix all issues with AI agents
In `@src/core/kani.engine.ts`:
- Around line 265-317: Add a configurable timeout to the Promise that spawns the
Python TTS process so the caller cannot hang indefinitely: introduce a timeout
value (e.g., this.synthesisTimeout or a parameter) and start a timer after
spawning the process; if the timer fires before childProcess closes, call
childProcess.kill() (or childProcess.kill('SIGKILL') as appropriate), log the
timeout including stderrData/stdoutData, clear the timer on normal close, and
reject the Promise with a descriptive timeout Error instead of leaving it
unresolved; ensure you reference the existing childProcess variable, spawn call
(this.pythonPath, args), outputFile existence check, and clear the timer in the
childProcess.on("close") and childProcess.on("error") handlers.
🧹 Nitpick comments (6)
scripts/kani_runner.py (3)
100-113: Remove extraneousfprefixes from strings without placeholders.Lines 102 and 113 use f-strings with no interpolation. As flagged by Ruff (F541), drop the
fprefix.🔧 Proposed fix
- print(f"[MAIN] Initializing Kani TTS 2...", file=sys.stderr) + print("[MAIN] Initializing Kani TTS 2...", file=sys.stderr)- print(f"[MAIN] Speaker embedding extracted successfully", file=sys.stderr) + print("[MAIN] Speaker embedding extracted successfully", file=sys.stderr)
127-132: Unused variable and misleading truncation suffix.
processed_textis never used — prefix with_to signal intent.- The log on line 128 always appends
...even when the text is shorter than 100 characters.🔧 Proposed fix
- print(f"[MAIN] Generating speech for: '{args.text[:100]}...'", file=sys.stderr) + text_preview = args.text[:100] + ("..." if len(args.text) > 100 else "") + print(f"[MAIN] Generating speech for: '{text_preview}'", file=sys.stderr) print(f"[MAIN] Temperature: {args.temperature}, top_p: {args.top_p}, " f"rep_penalty: {args.repetition_penalty}", file=sys.stderr) - audio, processed_text = model(args.text, **gen_kwargs) + audio, _processed_text = model(args.text, **gen_kwargs)
29-34: HardcodedSAMPLE_RATEduplicates the value intts-constants.ts(AUDIO_CONFIG.SAMPLE_RATE).If the Kani model actually outputs at a different sample rate (e.g., 24 kHz or 44.1 kHz), writing the WAV with 22050 will produce pitched/sped-up audio. Consider sourcing this from the model output or making it a CLI argument so the TypeScript side can pass it through.
tests/core/tts-service.factory.test.ts (1)
151-184: Consider adding upper-bound validation tests fortop_pandrepetition_penalty.The current tests validate the lower-bound violations (
top_p: -0.1,repetition_penalty: 0.5) but don't test upper-bound violations (e.g.,top_p: 1.5,repetition_penalty: 2.5). Adding these would ensure the upper bounds defined inKANI_LIMITSare also enforced.🧪 Proposed additions after the existing invalid cases
// Invalid repetition_penalty should throw expect(() => { service["engine"].validateOptions({ repetition_penalty: 0.5, // Out of range (min is 1.0) }); }).toThrow(); + + // Invalid top_p upper bound should throw + expect(() => { + service["engine"].validateOptions({ + top_p: 1.5, // Out of range (max is 1.0) + }); + }).toThrow(); + + // Invalid repetition_penalty upper bound should throw + expect(() => { + service["engine"].validateOptions({ + repetition_penalty: 2.5, // Out of range (max is 2.0) + }); + }).toThrow(); });src/core/kani.engine.ts (2)
197-199: Pass explicit radix toparseInt.
parseIntwithout a radix can produce unexpected results for strings with leading zeros in some contexts. Always specify radix 10 explicitly.🔧 Proposed fix
- const maxCharacters = parseInt( - process.env.KANI_MAX_CHARACTERS || String(KANI_DEFAULTS.MAX_CHARACTERS) + const maxCharacters = parseInt( + process.env.KANI_MAX_CHARACTERS || String(KANI_DEFAULTS.MAX_CHARACTERS), + 10 );
371-375:shutdowndoesn't kill any in-flight synthesis child process.If
synthesize()is in progress whenshutdown()is called, the spawned Python process will continue running. Consider tracking the active child process and killing it on shutdown.
| return new Promise<string>((resolve, reject) => { | ||
| const childProcess = spawn(this.pythonPath, args); | ||
|
|
||
| let stderrData = ""; | ||
| let stdoutData = ""; | ||
|
|
||
| if (childProcess.stderr) { | ||
| childProcess.stderr.on("data", (data) => { | ||
| const chunk = data.toString(); | ||
| stderrData += chunk; | ||
| logger.log("[KaniEngine] stderr:", chunk.trim()); | ||
| }); | ||
| } | ||
|
|
||
| if (childProcess.stdout) { | ||
| childProcess.stdout.on("data", (data) => { | ||
| const chunk = data.toString(); | ||
| stdoutData += chunk; | ||
| logger.log("[KaniEngine] stdout:", chunk.trim()); | ||
| }); | ||
| } | ||
|
|
||
| childProcess.on("error", (error) => { | ||
| logger.error("[KaniEngine] process error:", error); | ||
| reject(error); | ||
| }); | ||
|
|
||
| const startTime = Date.now(); | ||
| logger.log( | ||
| "[KaniEngine] TTS process started. Waiting for completion..." | ||
| ); | ||
|
|
||
| childProcess.on("close", (code) => { | ||
| const duration = Date.now() - startTime; | ||
| logger.log( | ||
| `[KaniEngine] Process closed with code ${code} after ${duration}ms` | ||
| ); | ||
|
|
||
| if (code === 0 && fs.existsSync(outputFile)) { | ||
| logger.log( | ||
| `[KaniEngine] Synthesis completed successfully in ${duration}ms` | ||
| ); | ||
| resolve(outputFile); | ||
| } else { | ||
| logger.error( | ||
| `[KaniEngine] Synthesis failed after ${duration}ms with code ${code}` | ||
| ); | ||
| logger.error("[KaniEngine] Error output (stderr):", stderrData); | ||
| logger.error("[KaniEngine] Standard output (stdout):", stdoutData); | ||
| reject(new Error("Kani synthesis failed")); | ||
| } | ||
| }); | ||
| }); |
There was a problem hiding this comment.
No timeout on the synthesis child process — risk of indefinite hang.
If the Python process hangs (OOM, GPU deadlock, model download stall), this promise never settles, blocking the caller forever. Consider adding a configurable timeout that kills the child process.
⏱️ Proposed fix — add a timeout
return new Promise<string>((resolve, reject) => {
const childProcess = spawn(this.pythonPath, args);
+ const TIMEOUT_MS = 300_000; // 5 minutes
+ const timer = setTimeout(() => {
+ childProcess.kill("SIGTERM");
+ reject(new Error(`Kani synthesis timed out after ${TIMEOUT_MS / 1000}s`));
+ }, TIMEOUT_MS);
let stderrData = "";
let stdoutData = "";
// ... existing stream handlers ...
childProcess.on("close", (code) => {
+ clearTimeout(timer);
const duration = Date.now() - startTime;
// ... existing close handler ...
});
});🤖 Prompt for AI Agents
In `@src/core/kani.engine.ts` around lines 265 - 317, Add a configurable timeout
to the Promise that spawns the Python TTS process so the caller cannot hang
indefinitely: introduce a timeout value (e.g., this.synthesisTimeout or a
parameter) and start a timer after spawning the process; if the timer fires
before childProcess closes, call childProcess.kill() (or
childProcess.kill('SIGKILL') as appropriate), log the timeout including
stderrData/stdoutData, clear the timer on normal close, and reject the Promise
with a descriptive timeout Error instead of leaving it unresolved; ensure you
reference the existing childProcess variable, spawn call (this.pythonPath,
args), outputFile existence check, and clear the timer in the
childProcess.on("close") and childProcess.on("error") handlers.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d31b169f9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } else if (envEngine === "kani") { | ||
| this.engineType = "kani"; |
There was a problem hiding this comment.
Handle Kani limits in HTTP pre-validation
By adding envEngine === "kani" here, the service now exposes a third engine type, but the /tts HTTP handler still only branches on kokoro vs fallback (src/server.ts lines 109-114), so Kani requests are pre-validated against CHATTERBOX_MAX_CHARACTERS instead of KANI_MAX_CHARACTERS. In deployments where those limits differ, valid Kani input is rejected early (or error semantics become inconsistent), so the server-side limit selection needs a dedicated Kani branch.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This pull request adds support for Kani TTS 2, a new text-to-speech engine that uses HuggingFace-hosted models for voice synthesis with advanced features like voice cloning via speaker embeddings and fine-grained control over generation parameters.
Changes:
- Added complete Kani TTS 2 engine implementation with automatic dependency installation and HuggingFace model integration
- Extended the MCP tools API to support Kani-specific parameters (temperature, top_p, repetition_penalty, model)
- Added comprehensive test coverage for Kani engine selection, capabilities, and parameter validation
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/core/kani.engine.ts |
New Kani TTS 2 engine implementation with Python integration, dependency management, and synthesis capabilities |
src/core/tts-service.factory.ts |
Updated factory to include "kani" as a valid engine type and instantiate KaniEngine |
src/core/tts-engine.interface.ts |
Extended TTSOptions interface with Kani-specific parameters (temperature, top_p, repetition_penalty, model) |
src/core/tts-constants.ts |
Added KANI_DEFAULTS and KANI_LIMITS constants for configuration and validation |
src/mcp/tools.ts |
Extended synthesizeText tool to accept and pass through Kani parameters, updated tool description and schema |
scripts/kani_runner.py |
Python script that interfaces with kani-tts library for audio generation with validation and error handling |
scripts/requirements-kani.txt |
Python dependencies for Kani TTS 2 (kani-tts-2, transformers, soundfile, numpy) |
tests/core/tts-service.factory.test.ts |
Added comprehensive test cases for Kani engine selection, capabilities, and validation |
.env.example |
Added Kani TTS 2 configuration section with environment variables and documentation |
package-lock.json |
Version bump to 0.3.0 (should be 0.4.0) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const referenceAudio = | ||
| options?.referenceAudio || process.env.KANI_REFERENCE_AUDIO || ""; | ||
| const languageTag = | ||
| options?.language ?? (process.env.KANI_LANGUAGE_TAG || ""); | ||
|
|
||
| const outputDir = | ||
| process.env.KANI_OUTPUT_DIR || | ||
| path.join(os.tmpdir(), COMMON_CONSTANTS.TEMP_DIR_NAME); | ||
| if (!fs.existsSync(outputDir)) { | ||
| fs.mkdirSync(outputDir, { recursive: true }); | ||
| } | ||
|
|
||
| const outputFile = path.join(outputDir, `kani-tts-${Date.now()}.wav`); | ||
|
|
||
| const args = [ | ||
| this.scriptPath, | ||
| "--text", | ||
| text, | ||
| "--output", | ||
| outputFile, | ||
| "--model", | ||
| model, | ||
| "--temperature", | ||
| String(temperature), | ||
| "--top_p", | ||
| String(topP), | ||
| "--repetition_penalty", | ||
| String(repetitionPenalty), | ||
| ]; | ||
|
|
||
| if (referenceAudio) { | ||
| args.push("--reference_audio", referenceAudio); |
There was a problem hiding this comment.
The reference audio path should be validated before being passed to the Python script, similar to how ChatterboxEngine does it. Without validation, this could allow path traversal attacks, access to non-existent files, or unsupported file formats. Consider implementing a validateReferenceAudioPath method (similar to ChatterboxEngine lines 157-211) and calling it here when referenceAudio is provided.
|
|
||
| # TTS Engine Selection | ||
| # Options: "chatterbox" (default), "kokoro" | ||
| # Options: "chatterbox" (default), "kokoro", "kani" |
There was a problem hiding this comment.
The README.md should be updated to document the new Kani TTS 2 engine. Currently it states "Local Voice MCP supports two TTS engines" (line 27) but with this PR there are now three engines. A new section should be added describing Kani TTS 2's features, setup instructions, and configuration options, similar to the sections for Chatterbox and Kokoro engines.
| # Options: "chatterbox" (default), "kokoro", "kani" | |
| # Options: "chatterbox" (default), "kokoro", "kani" (Kani TTS 2) |
|
|
||
| # TTS Engine Selection | ||
| # Options: "chatterbox" (default), "kokoro" | ||
| # Options: "chatterbox" (default), "kokoro", "kani" |
There was a problem hiding this comment.
The CHANGELOG.md should be updated to document the addition of Kani TTS 2 support in this release. This is a significant new feature that should be documented in the "Added" section, including key capabilities like voice cloning via speaker embeddings, HuggingFace model integration, and the new parameters (temperature, top_p, repetition_penalty, model).
Summary by cubic
Add Kani TTS 2 as a third TTS engine with voice cloning and HuggingFace auto-download. Adds temperature/top_p/repetition_penalty controls and updates the MCP tool schema and validations.
New Features
Migration
Written for commit d31b169. Summary will update on new commits.
EntelligenceAI PR Summary
Version bump from 0.1.0 to 0.3.0 with no dependency or functional changes.
Summary by CodeRabbit
Release Notes
New Features
Tests