
fix(azure): nest <prosody> inside <voice>, not as direct child of <speak> #39

Merged
OwenMcGirr merged 1 commit into main from fix/issue-38-azure-ssml-prosody-nesting on Apr 11, 2026

Conversation

@OwenMcGirr
Collaborator

Problem

When rate, pitch, or volume are passed as options to synthToBytestream / synthToBytes, the generated SSML places <prosody> as a direct child of <speak> instead of inside <voice>:

<!-- Generated (invalid) -->
<speak ...>
  <prosody rate="fast" pitch="high" volume="80%">
    <voice name="en-US-JennyNeural">Hello world</voice>
  </prosody>
</speak>

Azure rejects this with HTTP 500:

Node [speak] with type [RootSpeak] should not contain node [prosody] with type [Others].

Root cause

In ensureAzureSSMLStructure, the this.properties branch (rate/pitch/volume set on the client) correctly extracts content from inside <voice>. The options branch (rate/pitch/volume passed per call), however, always extracted content from inside <speak>. Because <voice> has already been injected earlier in the same method by that point, the resulting <prosody> wrapped the <voice> element instead of sitting inside it.

Fix

Mirror the this.properties branch in the options branch: when <voice> is present, replace content inside <voice>; only fall back to <speak> when no <voice> element exists.

<!-- Generated after fix (valid) -->
<speak ...>
  <voice name="en-US-JennyNeural">
    <prosody rate="fast" pitch="high" volume="80%">Hello world</prosody>
  </voice>
</speak>
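
The branching described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual ensureAzureSSMLStructure code: the ProsodyOptions type, the helper names prosodyAttributes and wrapWithProsody, and the regular expressions are all assumptions made for the example.

```typescript
// Hypothetical option shape; the real client's option type may differ.
interface ProsodyOptions {
  rate?: string;
  pitch?: string;
  volume?: string;
}

// Build the attribute string for <prosody> from whichever options are set.
function prosodyAttributes(options: ProsodyOptions): string {
  const attrs: string[] = [];
  if (options.rate) attrs.push(`rate="${options.rate}"`);
  if (options.pitch) attrs.push(`pitch="${options.pitch}"`);
  if (options.volume) attrs.push(`volume="${options.volume}"`);
  return attrs.join(" ");
}

// Wrap the spoken content in <prosody>, preferring the inside of <voice>.
// Only when no <voice> element exists does it fall back to <speak>.
function wrapWithProsody(ssml: string, options: ProsodyOptions): string {
  const attrs = prosodyAttributes(options);
  if (!attrs) return ssml; // no prosody options, leave the SSML untouched

  const wrap = (_match: string, open: string, content: string, close: string): string =>
    `${open}<prosody ${attrs}>${content}</prosody>${close}`;

  // Assumed patterns: first <voice>...</voice> or <speak>...</speak> pair.
  const voicePattern = /(<voice[^>]*>)([\s\S]*?)(<\/voice>)/;
  if (voicePattern.test(ssml)) {
    return ssml.replace(voicePattern, wrap);
  }
  const speakPattern = /(<speak[^>]*>)([\s\S]*?)(<\/speak>)/;
  return ssml.replace(speakPattern, wrap);
}
```

Checking for <voice> first is the whole point of the fix: the order of the two branches decides whether <prosody> ends up as a child of <voice> or of <speak>.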

Test

Added a regression test in azure-mstts-namespace.test.ts that asserts <prosody> appears after <voice> in the generated SSML and that the output matches /<voice[^>]*>\s*<prosody[^>]*>/.
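
For illustration, the two assertions can be expressed as a small standalone check. This is a sketch only, not the test that was added to azure-mstts-namespace.test.ts: the function name prosodyIsInsideVoice and the constant name are hypothetical, and only the regular expression is taken from the description above.

```typescript
// Pattern quoted in the PR description: an opening <voice ...> tag followed,
// apart from optional whitespace, directly by an opening <prosody ...> tag.
const VOICE_THEN_PROSODY = /<voice[^>]*>\s*<prosody[^>]*>/;

// Hypothetical helper: true when <prosody> appears after <voice> and the
// SSML matches the nesting pattern.
function prosodyIsInsideVoice(ssml: string): boolean {
  const voiceIndex = ssml.indexOf("<voice");
  const prosodyIndex = ssml.indexOf("<prosody");
  return (
    voiceIndex !== -1 &&
    prosodyIndex > voiceIndex &&
    VOICE_THEN_PROSODY.test(ssml)
  );
}
```

Applied to the two SSML samples in this description, the check would reject the pre-fix output (where <prosody> precedes <voice>) and accept the post-fix output.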

Closes #38

…eak>

When rate, pitch, or volume were passed as options to ensureAzureSSMLStructure,
the prosody element was being extracted from inside <speak> rather than from
inside <voice>. This produced invalid SSML that Azure rejects with:

  "Node [speak] with type [RootSpeak] should not contain node [prosody]
   with type [Others]."

Fix: when a <voice> element is present, replace content inside <voice>
(mirroring the existing this.properties branch above). Add regression test.

Fixes #38
@OwenMcGirr OwenMcGirr merged commit 79f9ad6 into main Apr 11, 2026
6 checks passed
Development

Successfully merging this pull request may close these issues.

AzureTTSClient: invalid SSML when prosody options are passed (prosody outside voice element)