1 change: 1 addition & 0 deletions .vscode/cspell.json
@@ -883,6 +883,7 @@
"FILLER",
"foundry",
"FOUNDRY",
"Unpooled",
"viseme",
"VISEME",
"webrtc",
38 changes: 37 additions & 1 deletion sdk/ai/azure-ai-voicelive/CHANGELOG.md
@@ -4,8 +4,44 @@

### Features Added

- Added `AgentSessionConfig` class for configuring Azure AI Foundry agent sessions:
  - Constructor takes required `agentName` and `projectName` parameters
  - Fluent setters for optional parameters: `setAgentVersion()`, `setConversationId()`, `setAuthenticationIdentityClientId()`, `setFoundryResourceOverride()`
  - `toQueryParameters()` method for converting configuration to WebSocket query parameters
- Added new `startSession(AgentSessionConfig)` overload to `VoiceLiveAsyncClient` for connecting directly to Azure AI Foundry agents
- Added `startSession(AgentSessionConfig, VoiceLiveRequestOptions)` overload for agent sessions with custom request options
- Added `Scene` class for configuring avatar's zoom level, position, rotation and movement amplitude in the video frame
- Added `scene` property to `AvatarConfiguration` for avatar scene configuration
- Added `outputAuditAudio` property to `AvatarConfiguration` to enable audit audio forwarding via WebSocket for review/debugging purposes
- Added `ServerEventWarning` and `ServerEventWarningDetails` classes for non-interrupting warning events
- Added `ServerEventType.WARNING` enum value
- Added interim response configuration for handling latency and tool calls (replaces filler response):
  - `InterimResponseConfigBase` base class for interim response configurations
  - `StaticInterimResponseConfig` for static/random text interim responses
  - `LlmInterimResponseConfig` for LLM-generated context-aware interim responses
  - `InterimResponseConfigType` enum (static_interim_response, llm_interim_response)
  - `InterimResponseTrigger` enum for trigger conditions (latency, tool)
- Added `interimResponse` property to `VoiceLiveSessionOptions` and `VoiceLiveSessionResponse`

### Breaking Changes

- Changed token authentication scope from `https://cognitiveservices.azure.com/.default` to `https://ai.azure.com/.default`
- Removed `FoundryAgentTool` class - use `AgentSessionConfig` with `startSession(AgentSessionConfig)` for direct agent connections instead
- Removed `FoundryAgentContextType` enum
- Removed `ResponseFoundryAgentCallItem` class
- Removed Foundry agent call lifecycle server events: `ServerEventResponseFoundryAgentCallArgumentsDelta`, `ServerEventResponseFoundryAgentCallArgumentsDone`, `ServerEventResponseFoundryAgentCallInProgress`, `ServerEventResponseFoundryAgentCallCompleted`, `ServerEventResponseFoundryAgentCallFailed`
- Removed `ItemType.FOUNDRY_AGENT_CALL` enum value
- Removed `ToolType.FOUNDRY_AGENT` enum value
- Removed `ServerEventType.MCP_APPROVAL_REQUEST` and `ServerEventType.MCP_APPROVAL_RESPONSE` enum values
- Renamed filler response API to interim response:
  - `FillerResponseConfigBase` → `InterimResponseConfigBase`
  - `BasicFillerResponseConfig` → `StaticInterimResponseConfig`
  - `LlmFillerResponseConfig` → `LlmInterimResponseConfig`
  - `FillerResponseConfigType` → `InterimResponseConfigType`
  - `FillerTrigger` → `InterimResponseTrigger`
  - `VoiceLiveSessionOptions.getFillerResponse()`/`setFillerResponse()` → `getInterimResponse()`/`setInterimResponse()`
  - Type values changed: `static_filler` → `static_interim_response`, `llm_filler` → `llm_interim_response`
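
Callers that persisted the old discriminator strings (for example in stored session configuration) may need a small translation step after the rename listed above. The following is a minimal sketch under that assumption and is not part of the SDK: `InterimResponseTypeMigration` and `migrate` are hypothetical names, and only the two value pairs are taken from the changelog entry.

```java
import java.util.Map;

/** Hypothetical helper (not an SDK class): translates pre-rename type values. */
final class InterimResponseTypeMigration {

    // Old -> new type values, copied from the "Type values changed" entry above.
    private static final Map<String, String> RENAMED_TYPE_VALUES = Map.of(
        "static_filler", "static_interim_response",
        "llm_filler", "llm_interim_response");

    private InterimResponseTypeMigration() {
    }

    /** Returns the renamed value, or the input unchanged when it is not a legacy value. */
    static String migrate(String typeValue) {
        if (typeValue == null) {
            return null; // Map.of rejects null keys, so handle null explicitly.
        }
        return RENAMED_TYPE_VALUES.getOrDefault(typeValue, typeValue);
    }

    public static void main(String[] args) {
        System.out.println(migrate("static_filler"));        // prints "static_interim_response"
        System.out.println(migrate("llm_interim_response")); // already current, printed unchanged
    }
}
```

Values that are already current, or unknown, pass through untouched, so the helper can be applied to stored values without checking which version wrote them.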

### Bugs Fixed

### Other Changes
@@ -28,7 +64,7 @@
  - `ResponseFoundryAgentCallItem` for tracking Foundry agent call responses
  - Foundry agent call lifecycle events: `ServerEventResponseFoundryAgentCallArgumentsDelta`, `ServerEventResponseFoundryAgentCallArgumentsDone`, `ServerEventResponseFoundryAgentCallInProgress`, `ServerEventResponseFoundryAgentCallCompleted`, `ServerEventResponseFoundryAgentCallFailed`
  - `ItemType.FOUNDRY_AGENT_CALL` and `ToolType.FOUNDRY_AGENT` discriminator values
- - Added filler response configuration for handling latency and tool calls:
+ - Added filler response configuration for handling latency and tool calls (renamed to interim response in 1.0.0-beta.5):
  - `FillerResponseConfigBase` base class for filler response configurations
  - `BasicFillerResponseConfig` for static/random text filler responses
  - `LlmFillerResponseConfig` for LLM-generated context-aware filler responses
@@ -9,10 +9,13 @@
import java.util.Map;
import java.util.Objects;

import com.azure.ai.voicelive.models.AgentSessionConfig;
import com.azure.ai.voicelive.models.VoiceLiveRequestOptions;
import com.azure.core.annotation.ServiceClient;
import com.azure.core.credential.KeyCredential;
import com.azure.core.credential.TokenCredential;
import com.azure.core.http.HttpHeader;
import com.azure.core.http.HttpHeaderName;
import com.azure.core.http.HttpHeaders;
import com.azure.core.util.logging.ClientLogger;

@@ -158,12 +161,66 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(VoiceLiveRequestOptions re
}

/**
- * Gets the API version.
+ * Starts a new VoiceLiveSessionAsyncClient for real-time voice communication with an Azure AI Foundry agent.
*
- * @return The API version.
* <p>This method configures the session to connect directly to an Azure AI Foundry agent,
* using the agent configuration to set the appropriate query parameters.</p>
*
* @param agentConfig The agent session configuration containing the agent name, project name,
* and optional parameters.
* @return A Mono containing the connected VoiceLiveSessionAsyncClient.
* @throws NullPointerException if {@code agentConfig} is null.
*/
public Mono<VoiceLiveSessionAsyncClient> startSession(AgentSessionConfig agentConfig) {
Objects.requireNonNull(agentConfig, "'agentConfig' cannot be null");

return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, null, agentConfig.toQueryParameters()))
.flatMap(wsEndpoint -> {
VoiceLiveSessionAsyncClient session;
if (keyCredential != null) {
session = new VoiceLiveSessionAsyncClient(wsEndpoint, keyCredential);
} else {
session = new VoiceLiveSessionAsyncClient(wsEndpoint, tokenCredential);
}
return session.connect(additionalHeaders).thenReturn(session);
});
}

/**
* Starts a new VoiceLiveSessionAsyncClient for real-time voice communication with an Azure AI Foundry agent
* and custom request options.
*
* <p>This method configures the session to connect directly to an Azure AI Foundry agent,
* combining the agent configuration with additional custom options.</p>
*
* @param agentConfig The agent session configuration containing the agent name, project name,
* and optional parameters.
* @param requestOptions Custom query parameters and headers for the request.
* @return A Mono containing the connected VoiceLiveSessionAsyncClient.
* @throws NullPointerException if {@code agentConfig} or {@code requestOptions} is null.
*/
- String getApiVersion() {
- return apiVersion;
public Mono<VoiceLiveSessionAsyncClient> startSession(AgentSessionConfig agentConfig,
VoiceLiveRequestOptions requestOptions) {
Objects.requireNonNull(agentConfig, "'agentConfig' cannot be null");
Objects.requireNonNull(requestOptions, "'requestOptions' cannot be null");

// Merge agent config params with custom query params (custom params take precedence)
Map<String, String> mergedParams = new LinkedHashMap<>(agentConfig.toQueryParameters());
if (requestOptions.getCustomQueryParameters() != null) {
mergedParams.putAll(requestOptions.getCustomQueryParameters());
}

return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, null, mergedParams)).flatMap(wsEndpoint -> {
VoiceLiveSessionAsyncClient session;
if (keyCredential != null) {
session = new VoiceLiveSessionAsyncClient(wsEndpoint, keyCredential);
} else {
session = new VoiceLiveSessionAsyncClient(wsEndpoint, tokenCredential);
}
// Merge additional headers with custom headers from requestOptions
HttpHeaders mergedHeaders = mergeHeaders(additionalHeaders, requestOptions.getCustomHeaders());
return session.connect(mergedHeaders).thenReturn(session);
});
}
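
The merge order in this overload means a custom query parameter can silently replace an agent parameter such as `agent-name`. A minimal standalone sketch of that precedence, using only `java.util`: `QueryParamMergeSketch`, its `merge` method, and the `debug` parameter are hypothetical illustrations, not SDK names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in that mirrors the copy-then-putAll merge in the method above. */
final class QueryParamMergeSketch {

    private QueryParamMergeSketch() {
    }

    /** Copies the agent parameters, then overlays custom parameters (custom wins on key clashes). */
    static Map<String, String> merge(Map<String, String> agentParams, Map<String, String> customParams) {
        Map<String, String> merged = new LinkedHashMap<>(agentParams);
        if (customParams != null) {
            // Re-putting an existing key replaces its value but keeps its original position.
            merged.putAll(customParams);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> agentParams = new LinkedHashMap<>();
        agentParams.put("agent-name", "my-agent");
        agentParams.put("agent-project-name", "my-project");

        Map<String, String> customParams = new LinkedHashMap<>();
        customParams.put("agent-name", "override"); // clashes with an agent parameter
        customParams.put("debug", "true");          // hypothetical extra parameter

        // prints "{agent-name=override, agent-project-name=my-project, debug=true}"
        System.out.println(merge(agentParams, customParams));
    }
}
```

If overriding the required agent parameters is not intended, one option would be to reject or skip clashing keys before the `putAll`; whether that is desirable depends on how `VoiceLiveRequestOptions` is meant to be used.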

/**
@@ -176,10 +233,14 @@ String getApiVersion() {
private HttpHeaders mergeHeaders(HttpHeaders baseHeaders, HttpHeaders customHeaders) {
HttpHeaders merged = new HttpHeaders();
if (baseHeaders != null) {
- baseHeaders.forEach(header -> merged.set(header.getName(), header.getValue()));
+ for (HttpHeader header : baseHeaders) {
+ merged.set(HttpHeaderName.fromString(header.getName()), header.getValue());
+ }
}
if (customHeaders != null) {
- customHeaders.forEach(header -> merged.set(header.getName(), header.getValue()));
+ for (HttpHeader header : customHeaders) {
+ merged.set(HttpHeaderName.fromString(header.getName()), header.getValue());
+ }
}
return merged;
}
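
The "custom headers override base headers" behaviour of `mergeHeaders` can be illustrated without azure-core. This is a rough sketch under the assumption that header names are compared case-insensitively, as HTTP specifies; `HeaderMergeSketch` is a hypothetical stand-in built on a `TreeMap`, not the SDK's `HttpHeaders` type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the base-then-custom header merge shown above. */
final class HeaderMergeSketch {

    private HeaderMergeSketch() {
    }

    /** Applies base headers first, then custom headers, so custom values win on a name clash. */
    static Map<String, String> merge(Map<String, String> base, Map<String, String> custom) {
        // Case-insensitive keys: "X-Custom-Header" and "x-custom-header" are the same header.
        Map<String, String> merged = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        if (base != null) {
            for (Map.Entry<String, String> header : base.entrySet()) {
                merged.put(header.getKey(), header.getValue());
            }
        }
        if (custom != null) {
            for (Map.Entry<String, String> header : custom.entrySet()) {
                merged.put(header.getKey(), header.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("X-Custom-Header", "base");
        base.put("User-Agent", "sdk");

        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("x-custom-header", "override");

        // prints "{User-Agent=sdk, X-Custom-Header=override}"
        System.out.println(merge(base, custom));
    }
}
```

Either input may be null, matching the null checks in the method above, and the result is always a fresh map, so neither argument is mutated.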
@@ -78,7 +78,7 @@
*/
public final class VoiceLiveSessionAsyncClient implements AsyncCloseable, AutoCloseable {
private static final ClientLogger LOGGER = new ClientLogger(VoiceLiveSessionAsyncClient.class);
- private static final String COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default";
+ private static final String AZURE_AI_SCOPE = "https://ai.azure.com/.default";
private static final HttpHeaderName API_KEY = HttpHeaderName.fromString("api-key");

// WebSocket configuration constants
@@ -398,7 +398,7 @@ public Flux<SessionUpdate> receiveEvents() {
.flatMap(this::parseToSessionUpdate)
.doOnError(error -> LOGGER.error("Failed to parse session update", error))
.onErrorResume(error -> {
- LOGGER.warning("Skipping unparseable event due to error: {}", error.getMessage());
+ LOGGER.warning("Skipping unrecognized server event: {}", error.getMessage());
return Flux.empty();
});
}
@@ -880,7 +880,7 @@ private Mono<HttpHeaders> getAuthorizationHeaders() {
headers.set(API_KEY, keyCredential.getKey());
return Mono.just(headers);
} else if (tokenCredential != null) {
- TokenRequestContext tokenRequest = new TokenRequestContext().addScopes(COGNITIVE_SERVICES_SCOPE);
+ TokenRequestContext tokenRequest = new TokenRequestContext().addScopes(AZURE_AI_SCOPE);
return tokenCredential.getToken(tokenRequest).map(at -> {
headers.set(HttpHeaderName.AUTHORIZATION, "Bearer " + at.getToken());
return headers;
@@ -0,0 +1,201 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.ai.voicelive.models;

import com.azure.core.annotation.Fluent;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
* Configuration for connecting to an Azure AI Foundry agent session.
*
* <p>This class provides the necessary parameters to establish a connection with an
* Azure AI Foundry agent, including the agent name, project name, and optional
* parameters like agent version, conversation ID, and authentication settings.</p>
*
* <p><strong>Example usage:</strong></p>
* <pre>{@code
* AgentSessionConfig config = new AgentSessionConfig("my-agent", "my-project")
* .setAgentVersion("1.0")
* .setConversationId("conv-123");
*
* client.startSession(config).subscribe(session -> {
* // Use the session
* });
* }</pre>
*/
@Fluent
public final class AgentSessionConfig {

private final String agentName;
private final String projectName;
private String agentVersion;
private String authenticationIdentityClientId;
private String conversationId;
private String foundryResourceOverride;

/**
* Creates a new AgentSessionConfig with the required parameters.
*
* @param agentName The name of the agent. This is required.
* @param projectName The name of the project containing the agent. This is required.
* @throws NullPointerException if agentName or projectName is null.
* @throws IllegalArgumentException if agentName or projectName is empty.
*/
public AgentSessionConfig(String agentName, String projectName) {
Objects.requireNonNull(agentName, "'agentName' cannot be null");
Objects.requireNonNull(projectName, "'projectName' cannot be null");

if (agentName.isEmpty()) {
throw new IllegalArgumentException("'agentName' cannot be empty");
}
if (projectName.isEmpty()) {
throw new IllegalArgumentException("'projectName' cannot be empty");
}

this.agentName = agentName;
this.projectName = projectName;
}

/**
* Gets the agent name.
*
* @return The agent name.
*/
public String getAgentName() {
return agentName;
}

/**
* Gets the project name.
*
* @return The project name.
*/
public String getProjectName() {
return projectName;
}

/**
* Gets the agent version.
*
* @return The agent version, or null if not set.
*/
public String getAgentVersion() {
return agentVersion;
}

/**
* Sets the agent version.
*
* @param agentVersion The agent version.
* @return This AgentSessionConfig for chaining.
*/
public AgentSessionConfig setAgentVersion(String agentVersion) {
this.agentVersion = agentVersion;
return this;
}

/**
* Gets the authentication identity client ID.
*
* <p>This is used when the agent requires a specific managed identity for authentication.</p>
*
* @return The authentication identity client ID, or null if not set.
*/
public String getAuthenticationIdentityClientId() {
return authenticationIdentityClientId;
}

/**
* Sets the authentication identity client ID.
*
* <p>This is used when the agent requires a specific managed identity for authentication.</p>
*
* @param authenticationIdentityClientId The client ID of the managed identity to use.
* @return This AgentSessionConfig for chaining.
*/
public AgentSessionConfig setAuthenticationIdentityClientId(String authenticationIdentityClientId) {
this.authenticationIdentityClientId = authenticationIdentityClientId;
return this;
}

/**
* Gets the conversation ID.
*
* <p>This can be used to resume a previous conversation with the agent.</p>
*
* @return The conversation ID, or null if not set.
*/
public String getConversationId() {
return conversationId;
}

/**
* Sets the conversation ID.
*
* <p>This can be used to resume a previous conversation with the agent.</p>
*
* @param conversationId The conversation ID.
* @return This AgentSessionConfig for chaining.
*/
public AgentSessionConfig setConversationId(String conversationId) {
this.conversationId = conversationId;
return this;
}

/**
* Gets the Foundry resource override.
*
* <p>This can be used to specify a different Azure AI Foundry resource than the default.</p>
*
* @return The Foundry resource override, or null if not set.
*/
public String getFoundryResourceOverride() {
return foundryResourceOverride;
}

/**
* Sets the Foundry resource override.
*
* <p>This can be used to specify a different Azure AI Foundry resource than the default.</p>
*
* @param foundryResourceOverride The Foundry resource override.
* @return This AgentSessionConfig for chaining.
*/
public AgentSessionConfig setFoundryResourceOverride(String foundryResourceOverride) {
this.foundryResourceOverride = foundryResourceOverride;
return this;
}

/**
* Converts this configuration to query parameters for the WebSocket connection.
*
* @return A map of query parameter names to values.
*/
public Map<String, String> toQueryParameters() {
Map<String, String> params = new LinkedHashMap<>();

// Required parameters
params.put("agent-name", agentName);
params.put("agent-project-name", projectName);

// Optional parameters
if (agentVersion != null && !agentVersion.isEmpty()) {
params.put("agent-version", agentVersion);
}
if (conversationId != null && !conversationId.isEmpty()) {
params.put("conversation-id", conversationId);
}
if (authenticationIdentityClientId != null && !authenticationIdentityClientId.isEmpty()) {
params.put("agent-authentication-identity-client-id", authenticationIdentityClientId);
}
if (foundryResourceOverride != null && !foundryResourceOverride.isEmpty()) {
params.put("foundry-resource-override", foundryResourceOverride);
}

return params;
}
}
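
The map returned by `toQueryParameters()` still has to be encoded into the WebSocket URL. `convertToWebSocketEndpoint` is not shown in this diff, so the following is only a sketch of one way such a map could be turned into a query string; `QueryStringSketch` and `toQueryString` are hypothetical names, and the percent-encoding choice is an assumption rather than the SDK's documented behaviour.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: encodes a parameter map as a URI query string. */
final class QueryStringSketch {

    private QueryStringSketch() {
    }

    /** Joins entries as name=value pairs separated by '&', in the map's iteration order. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joiner.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String value) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A literal '+' in the input is already escaped as %2B, so this swap is safe.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("agent-name", "my agent");
        params.put("agent-project-name", "my-project");

        // prints "agent-name=my%20agent&agent-project-name=my-project"
        System.out.println(toQueryString(params));
    }
}
```

Because `toQueryParameters()` builds a `LinkedHashMap`, the required parameters come first in the resulting string, followed by whichever optional values were set.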