fix: do not speak ssml markup by rwaskiewicz · Pull Request #741 · mbta/screenplay

rwaskiewicz · 2026-05-29T20:42:09Z

Asana task: Bug: PA Messages with SSML readout markup

Description

Update the server so that it no longer HTML-escapes audio text when that text contains SSML.

In cases where we have PA message templates with predefined audio text
to send to Polly, we store the audio text as a single string. This
string may include SSML, e.g.

"""
To make space for other passengers and speed up boarding, please take off your backpack before entering the train and hold it at your side.

<lang xml:lang=\"es-US\"> Para dejar espacio para otros pasajeros y acelerar el embarque, por favor quítese la mochila y manténgala a su lado. </lang>
"""

The latter half of that message is meant to be spoken in Spanish by
Polly. However, we currently HTML-escape the entire string, which
transforms the XML tags from <lang> to &lt;lang&gt;. As a result,
Polly speaks "lang" as if it were a word rather than treating it as an
SSML instruction.

This is done instead of introducing an XML parser and escaping only the
text nodes in the parsed document, because we assume that predefined
audio text such as this is (1) valid XML and (2) immutable by the
client. This avoids having to handle edge cases in XML parsing.

This is also done instead of altering the data model to split audio
text into separate strings per language, to minimize the scope of this
effort.

Add a prosody rate of 90% and the drc effect to spoken text to bring it
into parity with RTS.
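
To illustrate the escaping issue described above, here is a minimal TypeScript sketch of the behavior. It is not the server implementation (the server code in this PR is Elixir); `escapeHtml` and `prepareAudioText` are hypothetical names, and the exact set of characters the real escaper handles is an assumption.

```typescript
// Hypothetical helper: a minimal HTML escape covering common markup characters.
// "&" is replaced first so that later replacements are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical: skip escaping only for predefined audio text flagged as SSML,
// which is assumed to be valid XML and not editable by the client.
function prepareAudioText(text: string, hasSsml: boolean): string {
  return hasSsml ? text : escapeHtml(text);
}

const template = '<lang xml:lang="es-US">Por favor quítese la mochila.</lang>';

// Escaped: the tag becomes literal text, so the markup itself would be spoken.
const escaped = prepareAudioText(template, false);

// Passed through: the <lang> tag survives as an SSML instruction.
const passthrough = prepareAudioText(template, true);
```

Free-form text entered by a user would still go through `escapeHtml` in this sketch. Only text flagged as SSML bypasses it.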

For features with a design/UX component, deployed branch to dev-green and let product know it's ready for review.

rwaskiewicz · 2026-05-29T20:45:12Z

  Fetches an audio file from Watts given a string.
  """
-  @callback fetch_tts(String.t()) :: {:ok, binary()} | :error
+  @callback fetch_tts(String.t(), boolean) :: {:ok, binary()} | :error


Asking for opinions here: boolean parameters feel like a code smell. I'm unsure whether that's Uncle Bob whispering in my ear, the ghost of Crystal Reports past, or something else. Would it be better (i.e. more idiomatic) to accept an opts argument and match on a has_ssml field in it?

rwaskiewicz · 2026-05-29T20:46:51Z

  const [phoneticText, setPhoneticText] = useState(
    defaultValues?.audio_text ?? "",
  );
+  const [phoneticTextHasSsml, setPhoneticTextHasSsml] = useState<boolean>(


I think that between here and MainForm I've covered the various cases where this state is required, but I'd like someone with a little more experience to take a second look and verify that through the UI.

rwaskiewicz · 2026-05-29T20:48:33Z

+      Jason.encode!(%{
+        text:
+          ~s(<speak><amazon:effect name="drc"><prosody rate="90%">#{text}</prosody></amazon:effect></speak>),
+        voice_id: "Matthew"


One callout: we hardcode "Matthew" here, whereas RTS uses "Mia" for Spanish. Any thoughts or concerns about this divergence?
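
For readers less familiar with Elixir sigils, the payload in the hunk above can be mirrored in a short TypeScript sketch. The function name `buildTtsPayload` and the `voiceId` parameter are hypothetical. The diff itself hardcodes "Matthew", and the parameter is shown only to make the divergence under discussion concrete.

```typescript
// Hypothetical TypeScript mirror of the Elixir payload in the diff above.
// Wraps the text in <speak>, the "drc" effect, and a 90% prosody rate,
// then serializes it with the voice identifier.
function buildTtsPayload(text: string, voiceId: string = "Matthew"): string {
  const ssml =
    `<speak><amazon:effect name="drc">` +
    `<prosody rate="90%">${text}</prosody>` +
    `</amazon:effect></speak>`;
  return JSON.stringify({ text: ssml, voice_id: voiceId });
}

// Default voice, matching the hardcoded value in the diff.
const defaultPayload = buildTtsPayload("Hello");

// Assumption for illustration only: a caller-supplied voice.
const miaPayload = buildTtsPayload("Hola", "Mia");
```

Note that in the diff a single `voice_id` applies to the whole request, so a mixed English and Spanish message like the one in the description would be read by one voice, with `<lang>` marking the Spanish portion.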

rwaskiewicz added 5 commits May 29, 2026 13:55

test: add basic watts client test

c43a552

Add a simple test to get a baseline of sanitization behavior before changing the client

wire up PaMessageForm to use ssml flag

e511a62

refactor: rename query string is_ssml -> has_ssml

fb18e30

feat: add drc, prosody rate to spoken text

a3fee67

Add prosody rate of 90% and drc effect to spoken text to bring spoken text into parity with RTS

rwaskiewicz wants to merge 5 commits into main from rw/ssml-markup-readout
