Add speaker embeddings support #3855

michalkulakowski · 2025-12-10T15:38:30Z

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

dtrawins · 2026-01-15T14:05:26Z

Can you include the unit tests?

dkalinowski · 2026-01-15T15:05:04Z

src/audio/text_to_speech/t2s_servable.hpp

-    TtsServable(const mediapipe::T2sCalculatorOptions& nodeOptions, const std::string& graphPath) {
-        auto fsModelsPath = std::filesystem::path(nodeOptions.models_path());
+    TtsServable(const std::string& modelDir, const std::string& targetDevice, const google::protobuf::RepeatedPtrField<mediapipe::T2sCalculatorOptions_SpeakerEmbeddings>& graphVoices, const std::string& pluginConfig, const std::string& graphPath) {
+        auto fsModelsPath = std::filesystem::path(modelDir);


Can you move this constructor to the .cpp file? That way you could move read_speaker_embedding to the .cpp file as well.

dkalinowski · 2026-01-15T15:06:56Z

src/audio/text_to_speech/t2s_servable.hpp

+    input.seekg(0, std::ios::beg);
+
+    // Check size is multiple of float
+    OPENVINO_ASSERT(buffer_size % sizeof(float) == 0, "File size is not a multiple of float size.");


I think this might be the first time we use OPENVINO_ASSERT.
Why are we introducing yet another way of reporting errors?
For now we have:

- ovms::Status
- absl::Status
- std::variant<Obj, std::string(error)>

and now this... :(

dkalinowski · 2026-01-15T15:07:06Z

src/audio/text_to_speech/t2s_servable.hpp

+        for (auto voice : graphVoices) {
+            if (!std::filesystem::exists(voice.path()))
+                throw std::runtime_error{"Requested voice speaker embeddings file does not exist."};
+            voices[voice.name()] = read_speaker_embedding(voice.path());


do we catch exceptions thrown inside read_speaker_embedding?
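
One possible way to address this question is to catch exceptions at the point where the voices are loaded and convert them into a status value. The following is only a sketch under stated assumptions: `InitStatus`, `Reader`, and `loadVoices` are hypothetical names invented for illustration, and `InitStatus` merely stands in for whichever status type the project prefers. It is not the code from this PR.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical status type standing in for ovms::Status or absl::Status.
struct InitStatus {
    bool ok;
    std::string message;
};

using Embedding = std::vector<float>;
using VoiceMap = std::unordered_map<std::string, Embedding>;
// Hypothetical reader signature; a function like read_speaker_embedding could be passed here.
using Reader = std::function<Embedding(const std::string&)>;

// Loads every configured voice. Any std::exception thrown by the reader is
// caught here and reported as a failed status instead of propagating.
InitStatus loadVoices(const std::vector<std::pair<std::string, std::string>>& namedPaths,
                      const Reader& reader, VoiceMap& voices) {
    try {
        VoiceMap loaded;
        for (const auto& [name, path] : namedPaths) {
            loaded[name] = reader(path);
        }
        // Commit only after every file was read successfully.
        voices = std::move(loaded);
        return {true, ""};
    } catch (const std::exception& e) {
        return {false, std::string("Failed to load speaker embeddings: ") + e.what()};
    }
}
```

Building the map in a local variable and moving it into place at the end means a failure halfway through leaves the previous contents untouched.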

dkalinowski · 2026-01-20T14:51:28Z

demos/audio/README.md

+
+Instead of generating speech with default model voice you can create speaker embeddings with [this script](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/speech_generation/create_speaker_embedding.py)
+```bash
+curl --output create_speaker_embedding.py "https://raw.githubusercontent.com/openvinotoolkit/openvino.genai/refs/heads/master/samples/python/speech_generation/create_speaker_embedding.py"


Yet another link that needs manual changes. It wouldn't even be hit by the sed command I use, because it points to genai and master.
Do we want to keep this reference to master and risk it changing and leaving us with outdated demos? Or do we want to hardcode it? Or maybe keep the script in our repo?

dkalinowski · 2026-01-20T14:52:51Z

prepare_llm_models.sh

+if [ -f "$1/$TTS_MODEL/$TOKENIZER_FILE" ]; then
+  echo "Model file $1/$TTS_MODEL/$TOKENIZER_FILE exists. Skipping downloading models."
+else
+  python3 demos/common/export_models/export_model.py text2speech --source_model "$TTS_MODEL" --weight-format int4 --model_repository_path $1 --vocoder microsoft/speecht5_hifigan


what about windows script?

dkalinowski · 2026-01-20T15:00:11Z

src/test/audio/text2speech_test.cpp

+            "input": "The quick brown fox jumped over the lazy dog."
+        }
+    )";
+    ASSERT_EQ(


tests check only "OK" and no errors, but what about actual output?

Copilot

Pull request overview

This PR adds speaker embeddings support to the text-to-speech (TTS) functionality, allowing users to customize the voice used for speech generation by providing speaker embedding files.

Changes:

- Added configuration and validation for speaker embeddings in TTS calculator
- Refactored test infrastructure to reuse common test base class across multiple test files
- Added comprehensive test coverage for TTS functionality including speaker embeddings

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| windows_prepare_llm_models.bat | Added TTS model download configuration with vocoder |
| src/test/test_http_utils.hpp | Extracted V3HttpTest base class for reuse across test files |
| src/test/reranknode_test.cpp | Refactored to use extracted V3HttpTest base class |
| src/test/embeddingsnode_test.cpp | Refactored to use extracted V3HttpTest base class |
| src/test/audio/text2speech_test.cpp | Added comprehensive tests for TTS functionality including speaker embeddings |
| src/test/audio/graph.pbtxt | Added test graph configuration for TTS with speaker embeddings |
| src/test/audio/config.json | Added test configuration for TTS mediapipe graph |
| src/mediapipe_internal/mediapipegraphdefinition.cpp | Added error handling for TTS servable initialization |
| src/audio/text_to_speech/t2s_servable.hpp | Refactored to use constructor parameters instead of inline initialization |
| src/audio/text_to_speech/t2s_servable.cpp | Implemented constructor with speaker embeddings loading logic |
| src/audio/text_to_speech/t2s_calculator.proto | Added SpeakerEmbeddings message definition |
| src/audio/text_to_speech/t2s_calculator.cc | Added voice parameter handling and speaker embeddings usage |
| src/audio/text_to_speech/BUILD | Updated build dependencies for new implementation |
| src/BUILD | Added text2speech_test to build targets |
| prepare_llm_models.sh | Added TTS model download script for Linux |
| demos/audio/README.md | Added documentation for speaker embeddings feature |

src/test/audio/text2speech_test.cpp

demos/audio/README.md

src/audio/text_to_speech/t2s_calculator.cc

src/audio/text_to_speech/t2s_servable.cpp

mzegla · 2026-01-21T09:13:38Z

src/audio/text_to_speech/t2s_calculator.cc

+                voice = voiceIt->value.GetString();
+            }
+            if (voice.has_value()) {
+                if (pipe->voices.find(voice.value()) == pipe->voices.end())


Are voices static at this point? Don't we need to lock them for thread-safe access as well?

@michalkulakowski what's the answer?

The answer is yes, they are static, loaded once during pipeline initialization.
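
The "loaded once, read-only afterwards" property could also be made explicit in the type, so that the compiler rejects any later modification. Concurrent calls to const member functions of standard containers do not race, so a map that is never written after construction needs no lock. The sketch below assumes a hypothetical `VoiceRegistry` class; the PR itself exposes the map as `pipe->voices`.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Embedding = std::vector<float>;
using VoiceMap = std::unordered_map<std::string, Embedding>;

// Hypothetical holder: the map is filled once in the constructor and is const
// afterwards, so request threads can only read it.
class VoiceRegistry {
public:
    explicit VoiceRegistry(VoiceMap loaded) : voices(std::move(loaded)) {}

    // Read-only lookup; returns nullptr when the voice is not configured.
    const Embedding* find(const std::string& name) const {
        auto it = voices.find(name);
        return it == voices.end() ? nullptr : &it->second;
    }

private:
    const VoiceMap voices;
};
```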

mzegla · 2026-01-21T09:18:08Z

src/audio/text_to_speech/t2s_servable.cpp

+        throw std::runtime_error("File size is not a multiple of float size.");
+    }
+    size_t num_floats = buffer_size / sizeof(float);
+    if (num_floats != 512) {


Why 512?
Can we have more context for this number for future reference?
Is it like embedding dim size or something?
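
One way to give the number context is a named constant with a comment. The sketch below assumes that 512 is the speaker embedding dimension the model expects, which this thread does not confirm. `SPEAKER_EMBEDDING_DIM` and `readSpeakerEmbeddingSketch` are hypothetical names, not the identifiers used in the PR.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: the model expects a speaker embedding of exactly 512 floats.
constexpr std::size_t SPEAKER_EMBEDDING_DIM = 512;

// Reads a raw binary file of floats and validates its size before loading.
std::vector<float> readSpeakerEmbeddingSketch(const std::filesystem::path& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream input(path, std::ios::binary | std::ios::ate);
    if (!input) {
        throw std::runtime_error("Failed to open speaker embedding file: " + path.string());
    }
    const auto size = static_cast<std::streamoff>(input.tellg());
    if (size < 0) {
        throw std::runtime_error("Failed to determine file size.");
    }
    const auto byteCount = static_cast<std::size_t>(size);
    if (byteCount % sizeof(float) != 0) {
        throw std::runtime_error("File size is not a multiple of float size.");
    }
    const std::size_t numFloats = byteCount / sizeof(float);
    if (numFloats != SPEAKER_EMBEDDING_DIM) {
        throw std::runtime_error("Speaker embedding must contain exactly " +
                                 std::to_string(SPEAKER_EMBEDDING_DIM) + " floats.");
    }
    input.seekg(0, std::ios::beg);
    std::vector<float> embedding(numFloats);
    if (!input.read(reinterpret_cast<char*>(embedding.data()),
                    static_cast<std::streamsize>(byteCount))) {
        throw std::runtime_error("Failed to read speaker embedding file.");
    }
    return embedding;
}
```

Putting the constant in the error message also tells users what a valid file looks like.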

mzegla · 2026-01-21T09:27:07Z

src/audio/text_to_speech/t2s_calculator.cc

            if (streamIt != payload.parsedJson->MemberEnd()) {
                return absl::InvalidArgumentError("streaming is not supported");
            }
+            std::optional<std::string> voice;


Suggested change

-            std::optional<std::string> voice;
+            std::optional<std::string> voiceName;

? To distinguish from pipe->voices elements

mzegla · 2026-01-21T09:29:39Z

src/audio/text_to_speech/t2s_calculator.cc

+            if (voiceIt != payload.parsedJson->MemberEnd() && voiceIt->value.IsString()) {
+                voice = voiceIt->value.GetString();
+            }
+            if (voice.has_value()) {


Do we need a separate condition here? Can we move the content of this block into the condition above?
`if (voiceIt != payload.parsedJson->MemberEnd() && voiceIt->value.IsString())` I suppose it's the same.
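
A sketch of what the merged condition might look like follows. To stay self-contained it replaces the rapidjson document with a plain string map, so the `IsString()` check has no counterpart here. `RequestFields`, `VoiceLookup`, and `lookupVoice` are hypothetical names introduced only for illustration.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: the real code reads from a rapidjson document and pipe->voices.
using RequestFields = std::unordered_map<std::string, std::string>;
using VoiceMap = std::unordered_map<std::string, std::vector<float>>;

enum class VoiceLookup { NotRequested, Found, Unknown };

// Single condition: the lookup happens in the same block that extracts the name,
// because voiceName can only have a value if that block was entered.
VoiceLookup lookupVoice(const RequestFields& request, const VoiceMap& voices,
                        std::optional<std::string>& voiceName) {
    auto voiceIt = request.find("voice");
    if (voiceIt != request.end()) {
        voiceName = voiceIt->second;
        if (voices.find(*voiceName) == voices.end()) {
            return VoiceLookup::Unknown;
        }
        return VoiceLookup::Found;
    }
    return VoiceLookup::NotRequested;
}
```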

michalkulakowski changed the title ~~Add speaker ambeddings support~~ Add speaker embeddings support Dec 11, 2025

michalkulakowski requested review from atobiszei and dkalinowski January 9, 2026 08:45

michalkulakowski force-pushed the mkulakow/support_voice_embeddings branch from ec35a1a to 4f45830 Compare January 12, 2026 12:32

dkalinowski reviewed Jan 15, 2026

michalkulakowski requested a review from dkalinowski January 20, 2026 08:04

michalkulakowski added 12 commits January 20, 2026 11:17

Add speaker ambeddings support

1d5fb31

fix

771aec8

fix

2999a32

style

af90182

fix

45302f3

fix

2971118

fix

9288149

missing file

a4350d6

fix

20040a2

uts refactor

6be6b58

UTs

e9350a6

style

aa8b340

michalkulakowski force-pushed the mkulakow/support_voice_embeddings branch from 0baece7 to aa8b340 Compare January 20, 2026 13:08

dkalinowski reviewed Jan 20, 2026

michalkulakowski added 3 commits January 20, 2026 16:37

fix

91e4a97

style

3dde288

style

ad2cbaa

michalkulakowski requested review from dkalinowski and mzegla January 20, 2026 15:47

dkalinowski approved these changes Jan 20, 2026

mzegla requested a review from Copilot January 21, 2026 09:09

Copilot AI reviewed Jan 21, 2026

src/test/audio/text2speech_test.cpp (outdated, resolved)

demos/audio/README.md (outdated, resolved)

src/audio/text_to_speech/t2s_calculator.cc (outdated, resolved)

src/audio/text_to_speech/t2s_servable.cpp (outdated, resolved)

michalkulakowski added 2 commits January 21, 2026 10:21

fix

9bd32a4

style

72f4b5b

mzegla reviewed Jan 21, 2026

michalkulakowski and others added 5 commits January 21, 2026 10:37

Update src/test/audio/text2speech_test.cpp

a09563f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update demos/audio/README.md

fda1859

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/audio/text_to_speech/t2s_calculator.cc

fe9a02d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/audio/text_to_speech/t2s_servable.cpp

ddf3de4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

review fix

3bd9d80

mzegla approved these changes Jan 21, 2026

fix

2452659

michalkulakowski force-pushed the mkulakow/support_voice_embeddings branch 2 times, most recently from a93b865 to 2452659 Compare January 21, 2026 12:21

fix

8161016

michalkulakowski merged commit c9a9610 into main Jan 21, 2026
1 check passed

5 participants