LCORE-861: Azure Entra ID token management #988
base: main
Conversation
Walkthrough

Adds Azure Entra ID authentication (token manager), integrates token refresh into startup and per-request flows (reloads the library client or updates provider data), introduces BYOK/RAG config enrichment and an entrypoint, and updates CI, Docker, Makefile, docs, examples, dependencies, and tests.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as API Client
    participant Endpoint as Query/Streaming Endpoint
    participant AEM as AzureEntraIDManager
    participant Holder as AsyncLlamaStackClientHolder
    participant Llama as Llama Stack
    Client->>Endpoint: POST /query
    Endpoint->>AEM: is_entra_id_configured? / is_token_expired?
    alt token expired
        Endpoint->>AEM: refresh_token()
        AEM->>AEM: fetch token via ClientSecretCredential
        AEM->>Endpoint: AZURE_API_KEY updated
        Endpoint->>Holder: reload_library_client() or update_provider_data()
        Holder->>Llama: reinitialize or update headers
    end
    Endpoint->>Llama: send request (with refreshed creds)
    Llama-->>Endpoint: response
    Endpoint-->>Client: response
```
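The alt branch above boils down to a guard that refreshes only when needed. A minimal illustrative sketch — `FakeTokenManager` and `ensure_fresh_token` are hypothetical names used here for demonstration, not the project's API:

```python
class FakeTokenManager:
    """Toy stand-in for AzureEntraIDManager, used only to illustrate the flow."""

    def __init__(self) -> None:
        self.is_entra_id_configured = True
        self.is_token_expired = True
        self.refresh_count = 0

    def refresh_token(self) -> None:
        # The real manager fetches a token via ClientSecretCredential;
        # here we just record that a refresh happened.
        self.refresh_count += 1
        self.is_token_expired = False


def ensure_fresh_token(manager: FakeTokenManager) -> bool:
    """Refresh only when Entra ID is configured and the token is stale.

    Returns True when the caller should reload the library client or
    update provider-data headers, mirroring the alt branch in the diagram.
    """
    if manager.is_entra_id_configured and manager.is_token_expired:
        manager.refresh_token()
        return True
    return False


manager = FakeTokenManager()
first = ensure_fresh_token(manager)   # token stale -> refresh, client reload needed
second = ensure_fresh_token(manager)  # token fresh -> no work
```

The key point the diagram makes is that the refresh is checked per request but performed only when the cached token has expired.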
```mermaid
sequenceDiagram
    participant Entrypoint as scripts/llama-stack-entrypoint.sh
    participant Enricher as src/llama_stack_configuration.py
    participant Env as .env
    participant Llama as Llama Stack
    Entrypoint->>Entrypoint: detect lightspeed config
    alt config present
        Entrypoint->>Enricher: generate_configuration(...)
        Enricher->>Enricher: setup_azure_entra_id_token() -> write AZURE_API_KEY to Env
        Enricher->>Enricher: enrich BYOK_RAG, write enriched YAML
        Entrypoint->>Env: source .env (if created)
        Entrypoint->>Llama: exec llama-stack with enriched config
    else
        Entrypoint->>Llama: exec llama-stack with original config
    end
```
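The fallback decision in this diagram can be sketched as a small shell function. `pick_config` and the paths below are hypothetical illustrations, not the repo's actual script:

```shell
# Use the enriched config only if the enricher actually produced it;
# otherwise fall back to the original config.
pick_config() {
  enriched="$1"
  original="$2"
  if [ -f "$enriched" ]; then
    echo "$enriched"
  else
    echo "$original"
  fi
}

# No enriched file exists at this made-up path, so the original wins.
pick_config /tmp/no-such-enriched-run.yaml /app/run.yaml
```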
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Force-pushed from 481a427 to def9221
```
echo "=== Recent commits ==="
git log --oneline -5
- name: Add Azure Entra ID config block to all test configs
```
The same set of LCS config files is shared across all the test jobs, so the Entra ID config block only needs to be added for the relevant test jobs.
```python
def _update_access_token(self, token: str, expires_on: int) -> None:
    """Update the token in env var and track expiration time."""
    self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
    os.environ["AZURE_API_KEY"] = token
```
The Azure token is stored only as an environment variable (single source of truth principle). Each uvicorn process has its own scope for environment variables, and the variable is used for injection into the LLS config.
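The expiry bookkeeping discussed here can be illustrated with a small self-contained sketch. `TokenExpiryTracker` and the leeway value are hypothetical stand-ins for the manager's internals, not the repo's actual code:

```python
import os
import time

TOKEN_EXPIRATION_LEEWAY = 300  # seconds; illustrative value, not the repo's constant


class TokenExpiryTracker:
    """Hypothetical stand-in for the manager's expiry bookkeeping."""

    def __init__(self) -> None:
        self._expires_on: int = 0  # epoch seconds; 0 means "expired"

    def update_access_token(self, token: str, expires_on: int) -> None:
        # Store the token in the environment (single source of truth)
        # and remember when to refresh, minus a safety leeway.
        self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
        os.environ["AZURE_API_KEY"] = token

    @property
    def is_token_expired(self) -> bool:
        return time.time() >= self._expires_on


tracker = TokenExpiryTracker()
tracker.update_access_token("dummy-token", int(time.time()) + 3600)
```

Subtracting a leeway from `expires_on` means the token is treated as stale slightly before Azure would reject it, so in-flight requests never race against a hard expiry.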
```python
def update_provider_data(self, updates: dict[str, str]) -> AsyncLlamaStackClient:
    """Update provider data headers for service client.

    For use with service mode only.
```
The service client follows a different principle than the library client. Re-injecting a new access token would require restarting the whole service, which is not feasible. Instead, the token is replaced in the providers' headers, which is the preferred approach.
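The header-swap idea can be shown with a tiny sketch. `merge_provider_data` is a hypothetical helper, not the repo's `update_provider_data`; it only demonstrates replacing the token in a header mapping without touching the running client:

```python
def merge_provider_data(current: dict[str, str], updates: dict[str, str]) -> dict[str, str]:
    """Return a new provider-data header mapping with refreshed values.

    Illustrative only: swaps the token in the headers instead of
    restarting the service client.
    """
    merged = dict(current)  # copy so the caller's mapping stays intact
    merged.update(updates)
    return merged


headers = {
    "azure_api_key": "expired-token",
    "azure_api_base": "https://example.openai.azure.com",
}
refreshed = merge_provider_data(headers, {"azure_api_key": "fresh-token"})
```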
```python
# configuration for Llama Stack
if args.generate_llama_stack_configuration:
    try:
        generate_configuration(
```
Moved to src/llama_stack_configuration.py
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/lightspeed_stack.py (2)
26-40: Docstring references removed CLI flags.

The docstring still documents the removed `-g/--generate-llama-stack-configuration`, `-i/--input-config-file`, and `-o/--output-config-file` flags (lines 33-36) that are no longer part of the argument parser.

📝 Suggested fix

```diff
 def create_argument_parser() -> ArgumentParser:
     """Create and configure argument parser object.

     The parser includes these options:
     - -v / --verbose: enable verbose output
     - -d / --dump-configuration: dump the loaded configuration to JSON and exit
     - -c / --config: path to the configuration file (default "lightspeed-stack.yaml")
-    - -g / --generate-llama-stack-configuration: generate a Llama Stack
-      configuration from the service configuration
-    - -i / --input-config-file: Llama Stack input configuration filename (default "run.yaml")
-    - -o / --output-config-file: Llama Stack output configuration filename (default "run_.yaml")

     Returns:
         Configured ArgumentParser for parsing the service CLI options.
     """
```
69-86: Docstring references removed functionality.

The `main()` docstring still references `--generate-llama-stack-configuration` behavior (lines 77-79) and mentions "Llama Stack generation fails" as a SystemExit condition (lines 84-85), but this functionality has been moved to `src/llama_stack_configuration.py`.

📝 Suggested fix

```diff
 def main() -> None:
     """Entry point to the web service.

     Start the Lightspeed Core Stack service process based on CLI flags and
     configuration. Parses command-line arguments, loads the configured
     settings, and then:
     - If --dump-configuration is provided, writes the active configuration
       to configuration.json and exits (exits with status 1 on failure).
-    - If --generate-llama-stack-configuration is provided, generates and stores
-      the Llama Stack configuration to the specified output file and exits
-      (exits with status 1 on failure).
     - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes,
       starts the quota scheduler, and starts the Uvicorn web service.

     Raises:
-        SystemExit: when configuration dumping or Llama Stack generation fails
-        (exits with status 1).
+        SystemExit: when configuration dumping fails (exits with status 1).
     """
```

tests/unit/test_client.py (2)
53-66: Missing `@pytest.mark.asyncio` decorator.

This async test function is missing the `@pytest.mark.asyncio` decorator, which could cause it to not execute properly as an async test.

Proposed fix

```diff
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_remote_client() -> None:
```
69-83: Missing `@pytest.mark.asyncio` decorator.

Same issue as above - this async test function needs the `@pytest.mark.asyncio` decorator.

Proposed fix

```diff
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_wrong_configuration() -> None:
```
🤖 Fix all issues with AI agents
In `@docs/providers.md`:
- Around line 87-89: The YAML example uses AZURE_TENANT_ID / AZURE_CLIENT_ID /
AZURE_CLIENT_SECRET but the bash export example uses TENANT_ID / CLIENT_ID /
CLIENT_SECRET; make them consistent by renaming the bash exports to
AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET (or alternately rename
the YAML keys to the non-prefixed names) and update all occurrences (including
the other instance noted) or add a brief note explaining the intentional
difference so readers aren’t confused.
In `@examples/azure-run.yaml`:
- Around line 71-77: The YAML uses an outdated Azure OpenAI api_version value;
update the api_version field under the inference -> config block (the entry with
provider_id: azure and provider_type: remote::azure) to a current release—e.g.,
set api_version to "2025-04-01-preview" for the data-plane or "2025-06-01" if
you need control-plane GA features—so the inference provider uses a supported
API version.
In `@scripts/llama-stack-entrypoint.sh`:
- Around line 7-10: The issue is that INPUT_CONFIG and ENRICHED_CONFIG can point
to the same file so the existence check for ENRICHED_CONFIG can be true even
when enrichment failed; update the script so ENRICHED_CONFIG uses a distinct
default path (e.g., a different filename) or capture the enrichment script's
exit status instead of suppressing errors with "|| true" and use that exit code
to decide whether to treat the output as enriched; refer to the variables
INPUT_CONFIG and ENRICHED_CONFIG and the enrichment invocation (the command
currently followed by "|| true") and the conditional "if [ -f
\"$ENRICHED_CONFIG\" ]" when making the change.
In `@src/app/endpoints/query.py`:
- Around line 319-339: The current code uses next(...) to find azure_config and
will raise StopIteration if no provider with provider_type == "remote::azure"
exists and may convert a missing api_base into the string "None"; change the
lookup to a safe search (e.g., use next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None) or
explicitly iterate and return None if not found), then check that azure_config
is not None before calling client_holder.update_provider_data (handle the
missing provider case appropriately), and when building azure_api_base use a
safe default instead of str(None) (e.g., use azure_config.get("api_base") or ""
only if not None) so you never store the literal "None" string.
In `@src/app/endpoints/streaming_query.py`:
- Around line 896-916: The next(...) call that looks for a provider with
provider_type == "remote::azure" can raise StopIteration; change the lookup to
use a safe lookup (e.g., azure_config = next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None)) and then
check azure_config is not None before using it in
client_holder.update_provider_data; if azure_config is None, handle gracefully
(log/raise a clear error or return an appropriate response) so
AzureEntraIDManager.refresh_token() path won't crash when the Azure provider is
missing.
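The two comments above recommend the same pattern: give `next()` a default so a missing Azure provider yields `None` instead of raising `StopIteration`. A small self-contained sketch (the `providers` list is made up for illustration):

```python
# A provider list with no "remote::azure" entry.
providers = [
    {"provider_type": "remote::openai", "config": {"api_base": "https://api.openai.com"}},
]

# next() with a default: returns None instead of raising StopIteration.
azure_config = next(
    (p["config"] for p in providers if p["provider_type"] == "remote::azure"),
    None,
)

if azure_config is None:
    api_base = ""  # safe default instead of the literal string "None"
else:
    api_base = azure_config.get("api_base") or ""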
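The two comments above recommend the same pattern: give `next()` a default so a missing Azure provider yields `None` instead of raising `StopIteration`. A small self-contained sketch (the `providers` list is made up for illustration):

```python
# A provider list with no "remote::azure" entry.
providers = [
    {"provider_type": "remote::openai", "config": {"api_base": "https://api.openai.com"}},
]

# next() with a default: returns None instead of raising StopIteration.
azure_config = next(
    (p["config"] for p in providers if p["provider_type"] == "remote::azure"),
    None,
)

if azure_config is None:
    api_base = ""  # safe default instead of the literal string "None"
else:
    api_base = azure_config.get("api_base") or ""
```

Without the second argument, `next()` raises `StopIteration` on an empty generator, which would crash the refresh path when the Azure provider is absent.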
In `@src/llama_stack_configuration.py`:
- Around line 106-108: The except block that catches ClientAuthenticationError
and CredentialUnavailableError (the Azure Entra ID token generation block that
currently calls logger.error("Azure Entra ID: Failed to generate token: %s", e))
must not silently swallow failures; after logging the error re-raise the caught
exception (raise) so callers can handle or fail fast. Locate the except handling
those two exceptions in the token generation function and append a re-raise (or
alternatively change the function to return an explicit failure status and
propagate that to callers) so authentication failures are observable to callers.
- Around line 146-148: The code builds provider_id using
brag.get("vector_db_id") which can be None and yield a malformed "byok_" value;
update the logic around the vector_db_id/provider_id assignment (where
"vector_db_id", "provider_id", "embedding_model" are set) to validate or provide
a safe default for vector_db_id (e.g., use empty-string-safe fallback or a
sentinel like "unknown") and then construct provider_id from that validated
value (or skip/omit provider_id when vector_db_id is absent) so provider_id is
never "byok_" with an empty suffix.
🧹 Nitpick comments (22)
pyproject.toml (1)

63-64: Add version constraints to Azure dependencies for consistency with other packages.

Both `azure-core` and `azure-identity` are missing version constraints, while every other dependency in the file specifies minimum or exact versions (e.g., `fastapi>=0.115.12`, `kubernetes>=30.1.0`). Consider using `azure-core>=1.37.0` and `azure-identity>=1.25.1` to maintain consistency and prevent unexpected behavior from major version updates.

src/app/main.py (1)
azure-coreandazure-identityare missing version constraints, while every other dependency in the file specifies minimum or exact versions (e.g.,fastapi>=0.115.12,kubernetes>=30.1.0). Consider usingazure-core>=1.37.0andazure-identity>=1.25.1to maintain consistency and prevent unexpected behavior from major version updates.src/app/main.py (1)
44-55: Redundant environment variable assignment.Lines 49-51 set
AZURE_API_KEYfromaccess_token, butAzureEntraIDManager.refresh_token()already setsos.environ["AZURE_API_KEY"]internally via_update_access_token(). The explicit assignment is unnecessary.♻️ Suggested simplification
azure_config = configuration.configuration.azure_entra_id if azure_config is not None: AzureEntraIDManager().set_config(azure_config) try: await AzureEntraIDManager().refresh_token() - os.environ["AZURE_API_KEY"] = ( - AzureEntraIDManager().access_token.get_secret_value() - ) logger.info("Azure Entra ID token set in environment") except ValueError as e: logger.error("Failed to refresh Azure token: %s", e)Additionally, consider whether a failed token refresh should prevent startup. Currently, a
ValueErroris logged but the service continues, which could lead to authentication failures at runtime.src/app/endpoints/query.py (1)
319-327: Consider storing the singleton reference for cleaner code.

`AzureEntraIDManager()` is instantiated multiple times (lines 321, 322, 324, and 336). While it's a singleton, storing a single reference improves readability.

♻️ Suggested refactor

```diff
 if provider_id == "azure":
+    azure_manager = AzureEntraIDManager()
     if (
-        AzureEntraIDManager().is_entra_id_configured
-        and AzureEntraIDManager().is_token_expired
+        azure_manager.is_entra_id_configured
+        and azure_manager.is_token_expired
     ):
-        await AzureEntraIDManager().refresh_token()
+        await azure_manager.refresh_token()
     if client_holder.is_library_client:
         client = await client_holder.reload_library_client()
     else:
         # ... provider lookup ...
         client = client_holder.update_provider_data(
             {
-                "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
+                "azure_api_key": azure_manager.access_token.get_secret_value(),
                 "azure_api_base": str(azure_config.get("api_base")),
             }
         )
```

scripts/llama-stack-entrypoint.sh (1)
15-19: Consider capturing enrichment failures for debugging.

The `|| true` suppresses all errors from the enrichment script without logging them. If enrichment fails, operators won't have visibility into why.

♻️ Suggested improvement

```diff
 echo "Enriching llama-stack config..."
-python3 /opt/app-root/llama_stack_configuration.py \
+if ! python3 /opt/app-root/llama_stack_configuration.py \
     -c "$LIGHTSPEED_CONFIG" \
     -i "$INPUT_CONFIG" \
     -o "$ENRICHED_CONFIG" \
-    -e "$ENV_FILE" 2>&1 || true
+    -e "$ENV_FILE" 2>&1; then
+    echo "Warning: Config enrichment failed, will use original config"
+fi
```

src/models/config.py (1)
1482-1494: Consider adding Field metadata for consistency.

The `tenant_id`, `client_id`, and `client_secret` fields lack the `title` and `description` metadata that other configuration fields in this module include. This affects API documentation and schema generation consistency.

♻️ Suggested addition of field metadata

```diff
 class AzureEntraIdConfiguration(ConfigurationBase):
     """Microsoft Entra ID authentication attributes for Azure."""

-    tenant_id: SecretStr
-    client_id: SecretStr
-    client_secret: SecretStr
+    tenant_id: SecretStr = Field(
+        ...,
+        title="Tenant ID",
+        description="Azure AD tenant ID for authentication.",
+    )
+    client_id: SecretStr = Field(
+        ...,
+        title="Client ID",
+        description="Azure AD application (client) ID.",
+    )
+    client_secret: SecretStr = Field(
+        ...,
+        title="Client Secret",
+        description="Azure AD client secret for authentication.",
+    )
     scope: str = Field(
         "https://cognitiveservices.azure.com/.default",
         title="Token scope",
         description="Azure Cognitive Services scope for token requests. "
         "Override only if using a different Azure service.",
     )
```

examples/lightspeed-stack-azure-entraid-service.yaml (2)
1-8: Consider adding a security note for `auth_enabled: false`.

This example has `auth_enabled: false` on line 5. While appropriate for development examples, consider adding a comment clarifying this should be enabled in production deployments.

9-17: Clarify the placeholder `api_key` value.

The `api_key: xyzzy` on line 17 is a placeholder. Since Azure Entra ID manages authentication dynamically, consider adding a comment explaining that this value is overridden at runtime by the token management system, or use a more descriptive placeholder like `"<replaced-at-runtime>"`.

Makefile (1)
20-23: Handle missing `.env` file gracefully and clarify in-place config modification.

Two observations:

1. Missing `.env` handling: If `.env` doesn't exist or `AZURE_API_KEY` isn't defined, `grep` will fail silently and `AZURE_API_KEY` will be empty, potentially causing runtime issues without clear error messages.
2. In-place modification: Using the same file for `-i` and `-o` (`$(LLAMA_STACK_CONFIG)`) overwrites the original. This may be intentional but could surprise users.

Suggested improvement for `.env` handling

```diff
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
+	@test -f .env || (echo "Error: .env file not found" && exit 1)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
-	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
+	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2- || echo "") \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)
```

src/app/endpoints/streaming_query.py (1)
897-901: Consider caching the `AzureEntraIDManager()` instance.

While `AzureEntraIDManager` is a singleton, calling it three times in this block is slightly verbose. Consider storing it in a local variable for readability.

Suggested improvement

```diff
 if provider_id == "azure":
+    azure_manager = AzureEntraIDManager()
     if (
-        AzureEntraIDManager().is_entra_id_configured
-        and AzureEntraIDManager().is_token_expired
+        azure_manager.is_entra_id_configured
+        and azure_manager.is_token_expired
     ):
-        await AzureEntraIDManager().refresh_token()
+        await azure_manager.refresh_token()
```

tests/unit/authorization/test_azure_token_manager.py (1)
122-143: Type annotation improvement and assertion placement.

- The `caplog` parameter should use `pytest.LogCaptureFixture` instead of `Any` for proper type checking.
- The assertion at line 143 is inside the `caplog.at_level` context manager - while this works, placing it outside is cleaner.

Suggested improvement

```diff
 @pytest.mark.asyncio
 async def test_refresh_token_failure_logs_error(
     self,
     token_manager: AzureEntraIDManager,
     dummy_config: AzureEntraIdConfiguration,
     mocker: MockerFixture,
-    caplog: Any,
+    caplog: pytest.LogCaptureFixture,
 ) -> None:
     """Log error when token retrieval fails due to ClientAuthenticationError."""
     token_manager.set_config(dummy_config)
     mock_credential_instance = mocker.Mock()
     mock_credential_instance.get_token.side_effect = ClientAuthenticationError(
         "fail"
     )
     mocker.patch(
         "authorization.azure_token_manager.ClientSecretCredential",
         return_value=mock_credential_instance,
     )
     with caplog.at_level("ERROR"):
         await token_manager.refresh_token()
-        assert "Failed to retrieve Azure access token" in caplog.text
+    assert "Failed to retrieve Azure access token" in caplog.text
```

tests/unit/test_llama_stack_configuration.py (1)
157-177: Multiple `type: ignore` comments indicate potential model validation issues.

The extensive use of `# type: ignore[call-arg]` suggests the Pydantic models have required fields not being provided. While this works at runtime, consider:

- Creating a test fixture with all required fields
- Using `model_construct()` to bypass validation for test-only scenarios

This would improve type safety and make the test more maintainable.

docker-compose.yaml (1)
11-14: Inconsistent SELinux volume mount labels.

Line 12 uses `:Z` (exclusive private label) for `run.yaml`, while line 14 uses `:z` (shared label) for `lightspeed-stack.yaml`. This inconsistency could cause issues:

- `:Z` relabels the content to be private to the container
- `:z` allows sharing between containers

If both services need to read `lightspeed-stack.yaml`, `:z` is correct for that file. Consider making the labels consistent based on actual sharing requirements.

src/authorization/azure_token_manager.py (2)
59-74: Consider making token retrieval truly async or documenting blocking behavior.

The `refresh_token` method is `async` but calls the synchronous `_retrieve_access_token`, which blocks on the Azure SDK's `get_token()`. This blocks the event loop during token retrieval.

Options:

1. Run the blocking call in an executor: `await asyncio.to_thread(self._retrieve_access_token)`
2. Document that brief blocking is expected (token refresh is infrequent)

For a token refresh that happens every ~hour, the brief blocking may be acceptable, but worth documenting.

Option 1: Use asyncio.to_thread

```diff
+import asyncio
+
 async def refresh_token(self) -> None:
     """Refresh the cached Azure access token.

     This is async to enforce proper ordering in the event loop -
     callers must await this before using the refreshed token.

     Raises:
         ValueError: If Entra ID configuration has not been set.
     """
     if self._entra_id_config is None:
         raise ValueError("Azure Entra ID configuration not set")
     logger.info("Refreshing Azure access token")
-    token_obj = self._retrieve_access_token()
+    token_obj = await asyncio.to_thread(self._retrieve_access_token)
     if token_obj:
         self._update_access_token(token_obj.token, token_obj.expires_on)
```
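The `asyncio.to_thread` pattern suggested above is easy to demonstrate in isolation. `blocking_get_token` is a made-up stand-in for the synchronous Azure SDK call:

```python
import asyncio
import time


def blocking_get_token() -> str:
    """Stand-in for the synchronous Azure SDK get_token() call."""
    time.sleep(0.01)  # simulate network latency
    return "fresh-token"


async def refresh() -> str:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free to serve other requests meanwhile.
    return await asyncio.to_thread(blocking_get_token)


token = asyncio.run(refresh())
```

`asyncio.to_thread` (Python 3.9+) wraps `loop.run_in_executor` with the default thread pool, which is why it is a one-line fix for occasional blocking calls like an hourly token refresh.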
90-100: Error logging lacks exception details for debugging.

When token retrieval fails, only a generic message is logged without the exception details. This makes troubleshooting authentication issues harder.

Include exception in log

```diff
 except (ClientAuthenticationError, CredentialUnavailableError):
-    logger.error("Failed to retrieve Azure access token")
+    logger.error("Failed to retrieve Azure access token", exc_info=True)
     return None
```

src/llama_stack_configuration.py (8)
82-100: Consider security implications of writing token to disk.

The Azure access token is written in plaintext to the `.env` file. While this enables the Llama Stack service to read it at startup, be aware that:

- Tokens persisted to disk may be accessed by other processes
- The file should have restrictive permissions (e.g., 0600)

Consider adding file permission restrictions after writing.

Proposed enhancement: Set restrictive file permissions

```diff
 with open(env_file, "w", encoding="utf-8") as f:
     f.writelines(lines)
+
+# Restrict file permissions to owner-only
+os.chmod(env_file, 0o600)
 logger.info(
```
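A related alternative to chmod-after-write, sketched here under the assumption of a POSIX filesystem: create the file with owner-only permissions from the start via `os.open`, so there is no window in which the file exists with default permissions. The path below is a temporary one used purely for demonstration:

```python
import os
import stat
import tempfile

# Create the .env file with mode 0600 atomically at creation time.
env_path = os.path.join(tempfile.mkdtemp(), ".env")
fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("AZURE_API_KEY=dummy-value\n")

mode = stat.S_IMODE(os.stat(env_path).st_mode)  # no group/other bits set
```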
126-127: Docstring type doesn't match signature.

The docstring references `list[ByokRag]` but the actual parameter type is `list[dict[str, Any]]`. Update the docstring to match the implementation.

Proposed fix

```diff
 Parameters:
     ls_config (dict[str, Any]): Existing Llama Stack configuration
         mapping used as the base; existing `vector_dbs` entries are
         preserved if present.
-    byok_rag (list[ByokRag]): List of BYOK RAG definitions to be added to
+    byok_rag (list[dict[str, Any]]): List of BYOK RAG definitions to be added to
         the `vector_dbs` section.
```
139-140: Potential mutation of input data via reference assignment.

Assigning `output = ls_config["vector_dbs"]` creates a reference to the original list. Subsequent `output.append()` calls will modify the original `ls_config["vector_dbs"]` in place. Per coding guidelines, avoid in-place parameter modification; return new data structures instead.

Proposed fix: Create a copy

```diff
 # fill-in existing vector_dbs entries
 if "vector_dbs" in ls_config:
-    output = ls_config["vector_dbs"]
+    output = list(ls_config["vector_dbs"])
```
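Why the copy matters can be shown in a few lines: appending through a reference mutates the caller's config, while `list()` makes an independent shallow copy (the dict contents here are invented for the demo):

```python
ls_config = {"vector_dbs": [{"vector_db_id": "existing"}]}

alias = ls_config["vector_dbs"]            # reference: shares the same list object
alias.append({"vector_db_id": "leaked"})   # caller's config silently grows

copy = list(ls_config["vector_dbs"])       # shallow copy: an independent list
copy.append({"vector_db_id": "safe"})      # caller's config is untouched

mutated_len = len(ls_config["vector_dbs"])
```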
173-174: Docstring type doesn't match signature.

Same issue as `construct_vector_dbs_section`: the docstring mentions `list[ByokRag]` but the actual type is `list[dict[str, Any]]`.

Proposed fix

```diff
-    byok_rag (list[ByokRag]): List of BYOK RAG specifications to convert
+    byok_rag (list[dict[str, Any]]): List of BYOK RAG specifications to convert
         into provider entries.
```
187-188: Same reference assignment issue - creates a mutation side effect.

As per coding guidelines, avoid in-place parameter modification.

Proposed fix

```diff
 # fill-in existing vector_io entries
 if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-    output = ls_config["providers"]["vector_io"]
+    output = list(ls_config["providers"]["vector_io"])
```
253-256: Missing error handling for input file operations.

If `input_file` doesn't exist or contains invalid YAML, the exception will propagate up unhandled. Consider wrapping in try-except for consistent error handling.

Proposed fix

```diff
 logger.info("Reading Llama Stack configuration from file %s", input_file)
-with open(input_file, "r", encoding="utf-8") as file:
-    ls_config = yaml.safe_load(file)
+try:
+    with open(input_file, "r", encoding="utf-8") as file:
+        ls_config = yaml.safe_load(file)
+except FileNotFoundError:
+    logger.error("Input Llama Stack config not found: %s", input_file)
+    raise
+except yaml.YAMLError as e:
+    logger.error("Invalid YAML in input file %s: %s", input_file, e)
+    raise
```
310-312: Consider using sys.exit for error conditions.

When the config file is not found, the function logs an error and returns. For CLI tools, it's more conventional to exit with a non-zero status code to signal failure to calling scripts or CI pipelines.

Proposed fix

```diff
+import sys
+
 ...
 except FileNotFoundError:
     logger.error("Config not found: %s", args.config)
-    return
+    sys.exit(1)
```
275-276: Docstring could be more descriptive.

Per coding guidelines, all functions require docstrings with brief descriptions. The current docstring "CLI entry point." is minimal. Consider expanding to describe what the CLI does.

Proposed enhancement

```diff
 def main() -> None:
-    """CLI entry point."""
+    """CLI entry point for Llama Stack configuration enrichment.
+
+    Reads Lightspeed configuration, applies Azure Entra ID token setup
+    and BYOK RAG enrichment, and writes the enriched Llama Stack config.
+    """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (38)
- .github/workflows/e2e_tests.yaml
- .github/workflows/e2e_tests_providers.yaml
- Makefile
- README.md
- docker-compose-library.yaml
- docker-compose.yaml
- docs/providers.md
- examples/azure-run.yaml
- examples/lightspeed-stack-azure-entraid-lib.yaml
- examples/lightspeed-stack-azure-entraid-service.yaml
- pyproject.toml
- scripts/llama-stack-entrypoint.sh
- src/app/endpoints/query.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/main.py
- src/authorization/azure_token_manager.py
- src/client.py
- src/configuration.py
- src/lightspeed_stack.py
- src/llama_stack_configuration.py
- src/models/config.py
- test.containerfile
- tests/e2e/configs/run-azure.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack.yaml
- tests/e2e/features/steps/info.py
- tests/unit/authentication/test_api_key_token.py
- tests/unit/authorization/test_azure_token_manager.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/test_client.py
- tests/unit/test_configuration.py
- tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
- src/app/endpoints/query_v2.py
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Use FastAPI dependencies: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using `typing_extensions.Self` for model validators
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional parameters
Use snake_case with descriptive, action-oriented function names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use ABC for abstract base classes with `@abstractmethod` decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run `uv run make format` to auto-format code with black and ruff before completion
Run `uv run make verify` to run all linters (black, pyl...
Files:
src/models/config.py, src/configuration.py, tests/unit/authentication/test_api_key_token.py, tests/unit/test_client.py, tests/e2e/features/steps/info.py, tests/unit/authorization/test_azure_token_manager.py, src/app/endpoints/streaming_query.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/unit/test_llama_stack_configuration.py, src/lightspeed_stack.py, src/app/endpoints/query.py, src/client.py, src/authorization/azure_token_manager.py, src/app/main.py, src/llama_stack_configuration.py
src/models/config.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/models/config.py`: All config uses Pydantic models extending `ConfigurationBase`
Configuration base class sets `extra="forbid"` to reject unknown fields
Use type hints: `Optional[FilePath]`, `PositiveInt`, `SecretStr` for configuration fields
Extend `ConfigurationBase` for configuration Pydantic models
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/models/**/*.py`: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Extend `BaseModel` for data Pydantic models
Use `@model_validator` and `@field_validator` for Pydantic model validation
Files:
src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Use pytest for all unit and integration tests, not unittest
Use `pytest-mock` for AsyncMock objects in tests
Use the `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests
Files:
tests/unit/authentication/test_api_key_token.py, tests/unit/test_client.py, tests/e2e/features/steps/info.py, tests/unit/authorization/test_azure_token_manager.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/unit/**/*.py`: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion
Files:
tests/unit/authentication/test_api_key_token.py, tests/unit/test_client.py, tests/unit/authorization/test_azure_token_manager.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/test_configuration.py, tests/unit/test_llama_stack_configuration.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define E2E test steps in the `tests/e2e/features/steps/` directory
Files:
tests/e2e/features/steps/info.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
src/app/main.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Check `pyproject.toml` for supported Python versions before development
Always check `pyproject.toml` for existing dependencies before adding new ones
Always verify current library versions in `pyproject.toml` rather than assuming versions
Use pylint with `source-roots = "src"` configuration
Files:
pyproject.toml
🧠 Learnings (13)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yaml
tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
src/models/config.py
src/configuration.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/**/*.py : Use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests
Applied to files:
tests/unit/authentication/test_api_key_token.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
tests/unit/test_client.py
test.containerfile
src/app/endpoints/streaming_query.py
src/lightspeed_stack.py
src/client.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
test.containerfile
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
examples/azure-run.yaml
src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling
Applied to files:
src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt`
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack.yaml
.github/workflows/e2e_tests.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to pyproject.toml : Always check `pyproject.toml` for existing dependencies before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data
Applied to files:
src/authorization/azure_token_manager.py
🧬 Code graph analysis (11)
src/models/config.py (1)
src/configuration.py (1)
`azure_entra_id` (345-349)
src/configuration.py (1)
src/models/config.py (1)
`AzureEntraIdConfiguration` (1482-1493)
tests/unit/test_client.py (2)
src/client.py (5)
`load` (32-40), `get_client` (101-115), `is_library_client` (28-30), `update_provider_data` (133-164), `reload_library_client` (117-131)
src/utils/types.py (1)
`Singleton` (43-57)
tests/unit/authorization/test_azure_token_manager.py (3)
src/authorization/azure_token_manager.py (7)
`AzureEntraIDManager` (22-100), `access_token` (55-57), `is_token_expired` (50-52), `is_entra_id_configured` (45-47), `set_config` (39-42), `refresh_token` (59-74), `_update_access_token` (76-83)
src/configuration.py (1)
`configuration` (95-106)
src/models/config.py (1)
`AzureEntraIdConfiguration` (1482-1493)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
`AzureEntraIDManager` (22-100), `is_entra_id_configured` (45-47), `is_token_expired` (50-52), `refresh_token` (59-74), `access_token` (55-57)
src/client.py (5)
`AsyncLlamaStackClientHolder` (21-164), `get_client` (101-115), `is_library_client` (28-30), `reload_library_client` (117-131), `update_provider_data` (133-164)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
`load_configuration` (65-74), `azure_entra_id` (345-349)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
`generate_configuration` (239-267), `construct_vector_dbs_section` (115-157), `construct_vector_io_providers_section` (160-210)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (4)
`is_entra_id_configured` (45-47), `is_token_expired` (50-52), `refresh_token` (59-74), `access_token` (55-57)
src/client.py (4)
`get_client` (101-115), `is_library_client` (28-30), `reload_library_client` (117-131), `update_provider_data` (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
`configuration` (95-106)
src/models/config.py (1)
`AzureEntraIdConfiguration` (1482-1493)
src/utils/types.py (1)
`Singleton` (43-57)
src/app/main.py (6)
src/authorization/azure_token_manager.py (4)
`AzureEntraIDManager` (22-100), `set_config` (39-42), `refresh_token` (59-74), `access_token` (55-57)
src/app/database.py (1)
`initialize_database` (102-136)
src/configuration.py (2)
`configuration` (95-106), `azure_entra_id` (345-349)
src/log.py (1)
`get_logger` (7-25)
src/a2a_storage/storage_factory.py (1)
`A2AStorageFactory` (20-185)
src/models/responses.py (1)
`InternalServerErrorResponse` (1787-1945)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
`config` (325-342), `config` (1306-1314), `dump` (1607-1619)
src/lightspeed_stack.py (1)
`main` (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (55)
tests/unit/authentication/test_api_key_token.py (1)
3-6: LGTM! The added blank line after the module docstring improves formatting consistency with PEP 8 style conventions.
tests/e2e/configs/run-azure.yaml (2)
79-82: Verify: Should this config use the new Azure Entra ID credentials? This e2e test configuration still uses `${env.AZURE_API_KEY}` for the Azure inference provider. Given that the PR introduces Azure Entra ID token management with `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET`, please confirm whether this configuration should be updated to use the new credential pattern or if `AZURE_API_KEY` is intentionally retained for backward-compatibility testing.
90-90: LGTM! The whitespace normalization (blank line after provider entry and trailing newline) improves file formatting consistency.
Also applies to: 156-156
tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)
19-20: LGTM! No functional changes; the file formatting is normalized with a proper trailing newline.
tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
18-19: LGTM! Trailing newline added for proper file formatting. No functional changes.
tests/e2e/features/steps/info.py (1)
113-116: Hardcoded value reduces test flexibility and creates inconsistency. The `check_shield_structure` function now uses a hardcoded `"openai/gpt-4o-mini"` for `provider_resource_id`, while `check_model_structure` (lines 49-58) dynamically uses `context.default_model` and `context.default_provider`. This inconsistency could cause test failures if the test environment configuration changes. Consider using the context variables for consistency:

♻️ Suggested fix

```diff
+ expected_model = f"{context.default_provider}/{context.default_model}"
- assert found_shield["provider_resource_id"] == "openai/gpt-4o-mini", (
-     f"provider_resource_id should be 'openai/gpt-4o-mini', "
+ assert found_shield["provider_resource_id"] == expected_model, (
+     f"provider_resource_id should be '{expected_model}', "
      f"but is '{found_shield['provider_resource_id']}'"
  )
```

Was this hardcoding intentional to stabilize tests during the Azure credential migration? If so, please add a comment explaining the rationale.
tests/unit/models/config/test_dump_configuration.py (7)
103-103: LGTM! The assertion correctly verifies that the new `azure_entra_id` field is present in the serialized configuration output.
205-206: LGTM! The expected JSON correctly includes `"azure_entra_id": None`, aligning with the new optional configuration field and maintaining consistency with the assertion on line 103.
415-415: LGTM! Consistent assertion for the new field presence in the quota limiters test.
532-533: LGTM! Expected JSON correctly updated for the quota limiters with different values test.
749-750: LGTM! Expected JSON correctly updated for the quota limiters with different scheduler values test.
940-941: LGTM! Expected JSON correctly updated for the BYOK RAG configuration test.
1117-1118: LGTM! Expected JSON correctly updated for the PostgreSQL namespace configuration test.
src/app/main.py (1)
12-12: LGTM! Import added for the new Azure Entra ID token management functionality.
docker-compose-library.yaml (1)
23-26: LGTM with a naming consideration. The migration from `AZURE_API_KEY` to separate Entra ID credentials (`TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET`) correctly supports dynamic token management. The comment on line 23 helpfully explains the new flow. Consider using more specific names like `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET` to avoid potential conflicts with other services or providers that might use similar generic names. Verify that the configuration loading code expects these exact variable names and update consistently if renamed.
README.md (1)
179-180: LGTM! Documentation updated to reflect Azure model compatibility with the new Entra ID authentication flow. The table formatting is consistent with other provider entries.
.github/workflows/e2e_tests.yaml (1)
63-63: LGTM! The whitespace adjustment and the updated configuration source message now correctly reflect the environment-specific run YAML selection (`run-${{ matrix.environment }}.yaml`), aligning with the existing config selection logic.

Also applies to: 103-103
examples/lightspeed-stack-azure-entraid-lib.yaml (1)
1-30: LGTM! This example configuration correctly demonstrates the Azure Entra ID integration for library mode, with proper environment variable references for credentials. The structure aligns with the new `AzureEntraIdConfiguration` model.
src/app/endpoints/query.py (2)
72-73: LGTM! The import of `AzureEntraIDManager` follows the absolute import pattern as per coding guidelines.
310-311: LGTM! The refactored client acquisition via `AsyncLlamaStackClientHolder` properly supports the dynamic token refresh workflow.
21-25: LGTM! The `.env` sourcing with `set -a` / `set +a` correctly exports all variables, and the shellcheck directive is appropriately applied.
1605-1605: LGTM! The `azure_entra_id` field is correctly added as an optional configuration section, consistent with other optional sections like `authorization` and `customization`. Based on learnings, all config uses Pydantic models extending `ConfigurationBase`, which is satisfied.
2-2: LGTM! Base image update is consistent with the project's image versioning approach.
6-11: LGTM! Adding the `azure-identity` dependency aligns with the Azure Entra ID token management feature. The directory setup and ownership configuration follow best practices for rootless container execution.
13-22: No action needed. The `llama_stack_configuration.py` script is invoked explicitly with the `python3` interpreter in the entrypoint script (`python3 /opt/app-root/llama_stack_configuration.py`), so a shebang line is not required. The `chmod +x` is unnecessary but harmless since the file is never executed directly as a standalone executable.
14-14: LGTM! Import follows the existing pattern for configuration model imports.
344-349: LGTM! The new `azure_entra_id` property follows the established pattern of other configuration accessors in this class. The `Optional` return type correctly reflects that Azure Entra ID configuration may not be present. The guard clause appropriately raises `LogicError` when configuration hasn't been loaded.
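The guarded-accessor pattern this comment describes can be sketched as follows. This is an illustrative stand-in, not the project's actual code: `AppConfig` and the `LogicError` body here are assumed shapes that only mirror the behavior being reviewed (fail loudly before load, return `None` for an absent optional section).

```python
# Minimal sketch of a guarded configuration accessor; names are illustrative.
from typing import Any, Optional


class LogicError(Exception):
    """Raised when configuration is accessed before being loaded."""


class AppConfig:
    """Holds loaded configuration and guards access to optional sections."""

    def __init__(self) -> None:
        self._configuration: Optional[dict[str, Any]] = None

    def load(self, data: dict[str, Any]) -> None:
        self._configuration = data

    @property
    def azure_entra_id(self) -> Optional[dict[str, Any]]:
        # Guard clause: fail loudly if nothing has been loaded yet
        if self._configuration is None:
            raise LogicError("logic error: configuration is not loaded")
        # The section is optional, so None is a valid return value
        return self._configuration.get("azure_entra_id")
```

Accessing the property before `load()` raises, while a loaded config without the optional section simply yields `None`.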
108-116: LGTM! The core startup flow (environment setup, quota scheduler, and Uvicorn) is clean and correct. The comment clarification on line 109 improves readability.
examples/lightspeed-stack-azure-entraid-service.yaml (1)
27-30: LGTM! The `azure_entra_id` section correctly wires credentials to environment variables using the `${env.VAR_NAME}` syntax, consistent with Llama Stack configuration patterns.
Makefile (1)
13-18: LGTM! Good use of `?=` for configuration variables, allowing command-line overrides. The `run` target update correctly passes the config file path.
tests/unit/test_client.py (3)
13-16: Good practice: Singleton reset fixture for test isolation. The autouse fixture properly resets singleton state between tests, preventing test pollution.
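The reset mechanism such a fixture relies on can be sketched like this; the `Singleton` metaclass and `ClientHolder` names are illustrative stand-ins, not the project's exact implementation.

```python
# Sketch of a Singleton metaclass with a test-reset hook (names assumed).
from typing import Any


class Singleton(type):
    """Metaclass caching one instance per class, with a reset hook for tests."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        if cls not in Singleton._instances:
            Singleton._instances[cls] = super().__call__(*args, **kwargs)
        return Singleton._instances[cls]

    @staticmethod
    def reset(target: type) -> None:
        """Drop the cached instance so the next call builds a fresh one."""
        Singleton._instances.pop(target, None)


class ClientHolder(metaclass=Singleton):
    def __init__(self) -> None:
        self.client = None
```

In a test module, an `@pytest.fixture(autouse=True)` that calls `Singleton.reset(ClientHolder)` before each test gives exactly the isolation praised here.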
86-131: Test coverage for `update_provider_data` looks good. The test properly validates:
- Pre-populating existing provider data
- Merging new updates while preserving existing fields
- Client replacement behavior
One note: accessing `_custom_headers` (line 102) is a private attribute access. Consider whether there's a public way to set initial headers, though for test setup this is often acceptable.
133-152: Test coverage for `reload_library_client` is adequate. The test verifies that:
- Library mode is correctly identified
- Reloading returns a new client instance
- The holder is updated with the new client
tests/unit/test_configuration.py (2)
812-846: Good test coverage for Azure Entra ID configuration. The test properly validates that all three required fields (`tenant_id`, `client_id`, `client_secret`) are loaded and accessible via `get_secret_value()`.
849-877: Good validation error test for incomplete Azure Entra ID config. This test ensures that partial configurations properly raise `ValidationError`, preventing misconfiguration at runtime.
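The all-or-nothing validation this test exercises can be mirrored in a few lines. The real project uses a Pydantic model with `SecretStr` fields; this stand-in (plain dataclass, hypothetical names) only reproduces the behavior that a partially filled section must raise.

```python
# Hedged sketch: partial Azure Entra ID config raises a validation error.
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when the Azure Entra ID section is only partially configured."""


@dataclass
class AzureEntraIdConfig:
    tenant_id: str = ""
    client_id: str = ""
    client_secret: str = ""

    def __post_init__(self) -> None:
        missing = [
            name
            for name in ("tenant_id", "client_id", "client_secret")
            if not getattr(self, name)
        ]
        if missing:
            raise ValidationError(f"missing Azure Entra ID fields: {missing}")
```

Constructing with all three fields succeeds; omitting any one fails fast instead of surfacing later as a broken token request.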
52-52: Import follows project conventions. The Azure token manager import aligns with the codebase's import style.
.github/workflows/e2e_tests_providers.yaml (2)
55-68: Azure Entra ID config injection step looks correct. The script properly:
- Only runs when `matrix.environment == 'azure'`
- Uses proper escaping (`\${env.VAR}`) to preserve YAML env var syntax
- Provides feedback on each modified file
190-192: Credential environment variables properly added. The `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET` environment variables are correctly propagated to both server and library mode runs.

Also applies to: 221-223
docs/providers.md (1)
68-177: Comprehensive Azure Entra ID documentation. The documentation covers the authentication flow thoroughly:
- Configuration requirements with clear attribute table
- Critical note about the `${env.AZURE_API_KEY}` placeholder requirement
- Token lifecycle for both library and service modes
- Security considerations
tests/unit/authorization/test_azure_token_manager.py (2)
21-41: Well-structured test fixtures. The fixtures properly handle singleton reset with `autouse=True`, ensuring test isolation. The `dummy_config` fixture provides a clean configuration object for testing.
145-158: Good dynamic expiration testing. The time mocking approach properly validates the `is_token_expired` property behavior across time boundaries.
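The time-mocking technique being praised can be sketched as follows. `TokenState` is a hypothetical stand-in for the real manager; the point is only that patching `time.time` lets a test drive the expiry property across the boundary deterministically.

```python
# Sketch: expiry property validated at both sides of the expiry boundary.
import time
from unittest import mock


class TokenState:
    def __init__(self) -> None:
        self._expires_on = 0  # epoch seconds; 0 means no token fetched yet

    @property
    def is_token_expired(self) -> bool:
        # Uninitialized state counts as expired, forcing an initial fetch
        return self._expires_on == 0 or time.time() >= self._expires_on


state = TokenState()
state._expires_on = 1_000_000  # direct assignment for illustration only

with mock.patch("time.time", return_value=999_999):
    still_valid = not state.is_token_expired  # one second before expiry

with mock.patch("time.time", return_value=1_000_001):
    expired = state.is_token_expired  # one second after expiry
```

Patching the clock rather than sleeping keeps the test fast and removes flakiness near the boundary.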
28-83: Comprehensive vector_dbs section tests. Tests properly cover empty input, preserving existing entries, adding new entries, default values, and merge behavior. Good coverage of the `construct_vector_dbs_section` function.
90-128: Good coverage for the vector_io providers section. Tests cover the key scenarios: empty config, preserving existing entries, adding new entries, and default provider type fallback.
src/client.py (4)
32-41: Clean separation of library vs service client initialization. The early-return pattern and clear branching between library and service modes is well structured.
117-131: Library client reload implementation looks correct. The `reload_library_client` properly reuses the stored `_config_path` to reinitialize the library client, ensuring env var changes (like refreshed tokens) are picked up.
133-164: Provider data update handles edge cases well. The `update_provider_data` method properly handles:
- Missing headers
- Invalid JSON in existing header
- Merging new values with existing provider data
This aligns with the Azure token refresh flow for service mode clients.
72-99: Use unique temporary filenames or document the library-mode single-worker requirement. The enriched config is written to a fixed path in the temp directory. If library mode is used with multiple workers (despite examples showing `workers: 1`), concurrent processes could overwrite each other's config files. Consider using `tempfile.NamedTemporaryFile(delete=False)` for unique filenames, or add validation to prevent library mode with `workers > 1` and document this constraint.
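One way to get the per-process uniqueness suggested above; `tempfile.mkstemp` gives the same guarantee as `NamedTemporaryFile(delete=False)` while keeping explicit control of the descriptor. The function name is illustrative.

```python
# Sketch: write each process's enriched config to a uniquely named temp file.
import os
import tempfile


def write_enriched_config(contents: str) -> str:
    """Write the enriched run config to a unique temp path and return it."""
    fd, path = tempfile.mkstemp(prefix="llama-stack-run-", suffix=".yaml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(contents)
    return path
```

Because the OS guarantees each `mkstemp` call yields a distinct path, two workers enriching their configs concurrently can no longer clobber each other.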
21-24: Azure Entra ID credential environment variables look correct. The switch from `AZURE_API_KEY` to individual credential components (`TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET`) properly supports the new token management flow where the API key is obtained dynamically via the client credentials grant.

Also applies to: 64-67
src/authorization/azure_token_manager.py (2)
34-43: Clean initialization and configuration setup. The initialization properly sets default state, and `set_config` correctly stores the configuration with debug logging.
44-57: Properties are well implemented. The expiration check correctly handles both the uninitialized state (`_expires_on == 0`) and actual expiry. Using `SecretStr` for the access token ensures it won't be accidentally logged.
114-136: Well-structured storage configuration. The storage section properly separates backends (kv_sqlite, sql_sqlite) from stores, with clear namespace isolation. The inference store configuration with `max_write_queue_size: 10000` and `num_writers: 4` provides reasonable defaults for handling concurrent writes.
101-110: LGTM! The registered resources section provides a clean starter configuration with the rag-runtime tool group properly registered. Empty lists allow users to add their own resources as needed.
src/llama_stack_configuration.py (3)
1-19: LGTM! Module setup follows coding guidelines. The imports use absolute paths, the logger is configured correctly using `logging.getLogger(__name__)`, and the module docstring explains both usage modes.
46-48: Add return type annotation. The function signature is missing the `-> None` return type annotation required by coding guidelines.

Proposed fix

```diff
 def setup_azure_entra_id_token(
     azure_config: dict[str, Any] | None, env_file: str
-) -> None:
+) -> None:
```

Actually, reviewing again: the return type is present. Disregard.
213-231: LGTM! In-place modification is documented. The docstring clearly states `ls_config` is "modified in place", making the mutation behavior explicit and intentional. This is acceptable when documented.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
tisnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good overall. I'll add some comments later, but LGTM.
```python
for brag in byok_rag:
    output.append(
        {
            "vector_db_id": brag.vector_db_id,
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Right now the registration of vector_stores from config is not working properly. AFAIK it should but it doesn't, since llama-stack picks up the registration of the vector store from the metadata store
```python
for brag in byok_rag:
    output.append(
        {
            "provider_id": "byok_" + brag.vector_db_id,
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has also changed in LLS 0.3.x
```yaml
vector_io:
  - config: # Define the storage backend for RAG
      persistence:
        namespace: vector_io::faiss
        backend: kv_default
    provider_id: faiss
    provider_type: inline::faiss
```
this is the config for vector_io, then you need to register a storage backend:
```yaml
storage:
  backends:
    # Define the storage backend type for RAG. In this case registry and RAG
    # are unified, i.e. information on registered resources (e.g. models,
    # vector_stores) is saved together with the RAG chunks.
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/rag/kv_store.db
```
| "urllib3==2.6.2", | ||
| # Used for agent card configuration | ||
| "PyYAML>=6.0.0", | ||
| "azure-core", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest adding a comment, as the other deps have.
LGTM overall, I have added some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)
191-204: The same `vector_db_id` validation issue applies here. The same validation gap exists in this function. If `vector_db_id` is missing or empty, the `provider_id` becomes `"byok_"` and `db_path` becomes `".llama/.db"`, both malformed. Apply consistent validation as suggested for `construct_vector_dbs_section`.
🤖 Fix all issues with AI agents
In `@src/llama_stack_configuration.py`:
- Around line 310-312: The except FileNotFoundError block currently logs the
missing config and returns, which doesn't signal failure to the CLI; replace the
silent return with a non-zero exit (e.g., call sys.exit(1) or raise
SystemExit(1)) so the process terminates with a failure status when args.config
is missing; ensure sys is imported if you choose sys.exit and keep the
logger.error("Config not found: %s", args.config) call.
♻️ Duplicate comments (2)
src/llama_stack_configuration.py (2)
106-108: Authentication failures are silently swallowed. When `ClientAuthenticationError` or `CredentialUnavailableError` occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing `AZURE_API_KEY` causes API calls to fail. Consider re-raising the exception after logging so callers can handle or fail fast.

Proposed fix: Re-raise after logging

```diff
 except (ClientAuthenticationError, CredentialUnavailableError) as e:
     logger.error("Azure Entra ID: Failed to generate token: %s", e)
+    raise
```
143-151: Missing `vector_db_id` validation results in a malformed `provider_id`. If `brag.get("vector_db_id")` returns `None` or an empty string, the `provider_id` will be `"byok_"` (empty suffix), which may cause issues downstream. Consider validating and skipping entries with a missing `vector_db_id`.

Proposed fix: Add validation

```diff
 # append new vector_dbs entries
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+        continue
     output.append(
         {
-            "vector_db_id": brag.get("vector_db_id", ""),
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
-            "embedding_model": brag.get("embedding_model", ""),
-            "embedding_dimension": brag.get("embedding_dimension"),
+            "vector_db_id": vector_db_id,
+            "provider_id": f"byok_{vector_db_id}",
+            "embedding_model": brag.get("embedding_model", ""),
+            "embedding_dimension": brag.get("embedding_dimension"),
         }
     )
```
🧹 Nitpick comments (8)
docs/providers.md (3)
106-106: Consider adding a note about the preview API version. The example shows `api_version: 2025-01-01-preview`. Since this is a preview API version, consider adding a brief note that users should verify the latest stable or preview API version from Azure's documentation, or explain that preview versions may have different availability/SLA characteristics.
122-129: Clarify the pre-startup script mechanism. The documentation mentions "runs a pre-startup script" but doesn't specify which script, where it's located, or how it's invoked. Consider adding more details about this mechanism or referencing the relevant code/configuration, as users deploying in service mode may need to understand or troubleshoot this step.
143-145: Consider specifying the token refresh safety margin. The documentation mentions that "Lightspeed refreshes tokens proactively before expiration (with a safety margin)" but doesn't specify what this margin is. If this is a configurable value or a specific time window (e.g., 5 minutes before expiration), documenting it would help users understand the refresh behavior and plan accordingly.
src/llama_stack_configuration.py (5)
126-127: Docstring type mismatch with function signature. The docstring states `byok_rag (list[ByokRag])` but the function signature uses `list[dict[str, Any]]`. Update the docstring to match the actual type.

Proposed fix

```diff
-        byok_rag (list[ByokRag]): List of BYOK RAG definitions to be added to
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG definitions to be added to
```
173-174: Docstring type mismatch with function signature. Similar to `construct_vector_dbs_section`, the docstring states `list[ByokRag]` but the signature uses `list[dict[str, Any]]`.

Proposed fix

```diff
-        byok_rag (list[ByokRag]): List of BYOK RAG specifications to convert
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG specifications to convert
```
213-231: In-place modification is documented, but consider returning a new dict. Per coding guidelines, in-place parameter modification should be avoided. While the docstring documents this behavior, returning a modified copy would be more explicit and testable.

Additionally, use the more Pythonic empty check `if not byok_rag:` instead of `if len(byok_rag) == 0:`.

Proposed minor improvement

```diff
-    if len(byok_rag) == 0:
+    if not byok_rag:
         logger.info("BYOK RAG is not configured: skipping")
         return
```
255-256: Missing error handling for file read operations. If the input file doesn't exist or contains invalid YAML, an unhandled exception will be raised. Consider adding try-except to handle `FileNotFoundError` and `yaml.YAMLError` with appropriate logging, similar to the error handling in `main()`.

Proposed fix

```diff
     logger.info("Reading Llama Stack configuration from file %s", input_file)
-    with open(input_file, "r", encoding="utf-8") as file:
-        ls_config = yaml.safe_load(file)
+    try:
+        with open(input_file, "r", encoding="utf-8") as file:
+            ls_config = yaml.safe_load(file)
+    except FileNotFoundError:
+        logger.error("Input file not found: %s", input_file)
+        raise
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML in input file %s: %s", input_file, e)
+        raise
```
275-276: Docstring is minimal for the CLI entry point. Per coding guidelines, all functions require docstrings with brief descriptions. Consider expanding the docstring to describe the CLI behavior, arguments, and exit conditions.

Proposed improvement

```diff
 def main() -> None:
-    """CLI entry point."""
+    """CLI entry point for Llama Stack configuration enrichment.
+
+    Parses command-line arguments for config paths, loads the Lightspeed
+    configuration, applies environment variable substitution, and generates
+    an enriched Llama Stack configuration with Azure Entra ID and BYOK RAG
+    settings.
+
+    Exits with status 1 if the configuration file is not found.
+    """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/providers.mdpyproject.tomlscripts/llama-stack-entrypoint.shsrc/app/main.pysrc/llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (3)
- scripts/llama-stack-entrypoint.sh
- src/app/main.py
- pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Use FastAPI dependencies: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using `typing_extensions.Self` for model validators
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use ABC for abstract base classes with `@abstractmethod` decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run `uv run make format` to auto-format code with black and ruff before completion
Run `uv run make verify` to run all linters (black, pyl...
Files:
src/llama_stack_configuration.py
🧠 Learnings (2)
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.
Applied to files:
docs/providers.md
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
src/llama_stack_configuration.py
🧬 Code graph analysis (1)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
config (325-342), config (1306-1314), dump (1607-1619)
src/lightspeed_stack.py (1)
main (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci
🔇 Additional comments (4)
docs/providers.md (1)
68-91: LGTM! Clear and well-structured documentation. The introduction and configuration requirements are well-documented with clear tables and examples. The environment variable naming is now consistent throughout the document.
src/llama_stack_configuration.py (3)
1-19: LGTM! Imports are well-organized, using absolute imports as per guidelines. The logger follows the `logging.getLogger(__name__)` pattern correctly.
22-38: LGTM! The custom YAML dumper properly enforces block-style indentation. The docstring is well-written and the pylint disable is appropriately scoped.
46-48: Add return type annotation (withdrawn on re-check). The function signature was initially flagged as missing the `-> None` return type annotation required by the coding guidelines, but line 48 of `setup_azure_entra_id_token` already ends with `) -> None:`, so the code is correct as written and no change is needed.
3e82d84 to afc72ca
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 63-65: Fix the typo in the dependency comment by changing
"managment" to "management" and add consistent version constraints for the two
entries "azure-core" and "azure-identity" to match the project's dependency
version pattern (e.g., pin or use the same range format used elsewhere); update
the quoted entries "azure-core" and "azure-identity" in the pyproject.toml
dependency list to include the appropriate version specifiers and ensure
formatting matches surrounding dependencies.
In `@src/app/main.py`:
- Around line 44-50: The startup silently continues when token fetch fails
because AzureEntraIDManager.refresh_token() swallows authentication errors from
_retrieve_access_token() and returns None; update refresh_token() to raise on
auth failures (re-raise ClientAuthenticationError/CredentialUnavailableError
instead of returning None) and/or after calling
AzureEntraIDManager().refresh_token() in main.py immediately verify
AzureEntraIDManager().access_token is non-empty and, if empty, log the error and
fail startup (raise/exit) so the service does not continue with an empty
AZURE_API_KEY.
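The fail-fast check described above can be sketched as follows. `manager` stands in for the `AzureEntraIDManager` singleton and the `access_token.get_secret_value()` attribute access mirrors the PR, but the helper itself is hypothetical:

```python
def verify_azure_startup_token(manager) -> None:
    """Raise at startup when Entra ID is configured but no token was obtained.

    Sketch only: `manager` stands in for AzureEntraIDManager; failing here
    stops the service instead of continuing with an empty AZURE_API_KEY.
    """
    if not manager.access_token.get_secret_value():
        raise RuntimeError("Azure Entra ID configured but token acquisition failed")
```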
♻️ Duplicate comments (3)
src/llama_stack_configuration.py (3)
106-107: Authentication failures are silently swallowed. When `ClientAuthenticationError` or `CredentialUnavailableError` occurs, the function logs an error but continues. Callers have no indication authentication failed, potentially leading to runtime failures when the missing `AZURE_API_KEY` causes Azure API calls to fail. Proposed fix: Re-raise after logging
 except (ClientAuthenticationError, CredentialUnavailableError) as e:
     logger.error("Azure Entra ID: Failed to generate token: %s", e)
+    raise
143-150: Missing `vector_db_id` results in malformed provider_id. If `brag.get("vector_db_id")` returns `None` or an empty string, the `provider_id` will be `"byok_"` (empty suffix). Consider validating and skipping invalid entries. Proposed fix: Add validation
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+        continue
     output.append(
         {
-            "vector_db_id": brag.get("vector_db_id", ""),
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "vector_db_id": vector_db_id,
+            "provider_id": f"byok_{vector_db_id}",
             "embedding_model": brag.get("embedding_model", ""),
             "embedding_dimension": brag.get("embedding_dimension"),
         }
     )
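Applied as a standalone function, the proposed validation would behave like the sketch below; the function name and dict shapes are illustrative, not the PR's exact code:

```python
import logging

logger = logging.getLogger(__name__)


def build_vector_dbs(byok_rag: list[dict]) -> list[dict]:
    """Build vector_dbs entries, skipping BYOK RAG items without a vector_db_id.

    Illustrative sketch of the proposed guard; without it an empty id would
    yield the malformed provider_id "byok_".
    """
    output: list[dict] = []
    for brag in byok_rag:
        vector_db_id = brag.get("vector_db_id")
        if not vector_db_id:
            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
            continue
        output.append(
            {
                "vector_db_id": vector_db_id,
                "provider_id": f"byok_{vector_db_id}",
                "embedding_model": brag.get("embedding_model", ""),
                "embedding_dimension": brag.get("embedding_dimension"),
            }
        )
    return output
```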
310-312: CLI should exit with non-zero status on error.When the config file is not found, the function logs an error and returns. For a CLI entry point, this should exit with a non-zero status code to indicate failure to shell scripts and CI pipelines.
Proposed fix
+import sys
+
 ...
     except FileNotFoundError:
         logger.error("Config not found: %s", args.config)
-        return
+        sys.exit(1)
🧹 Nitpick comments (4)
src/llama_stack_configuration.py (3)
139-140: Reference assignment mutates input config.
`output = ls_config["vector_dbs"]` creates a reference, not a copy. Subsequent `output.append()` calls modify `ls_config["vector_dbs"]` directly. This is inconsistent with the function's documented behavior of "building" a section and may cause unexpected side effects if callers reuse `ls_config`. Proposed fix: Use a copy

 # fill-in existing vector_dbs entries
 if "vector_dbs" in ls_config:
-    output = ls_config["vector_dbs"]
+    output = list(ls_config["vector_dbs"])
187-188: Reference assignment mutates input config. Same issue as `construct_vector_dbs_section`: `output = ls_config["providers"]["vector_io"]` is a reference, not a copy. Proposed fix:

 if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-    output = ls_config["providers"]["vector_io"]
+    output = list(ls_config["providers"]["vector_io"])
83-100: Security consideration: token written to plaintext .env file. Writing the Azure access token to a `.env` file leaves credentials in plaintext on disk. This may be acceptable for development/containerized environments, but consider:
- Setting restrictive file permissions (e.g., `0600`)
- Documenting this behavior and its security implications
Optional: Set restrictive permissions

+import stat
+
 with open(env_file, "w", encoding="utf-8") as f:
     f.writelines(lines)
+os.chmod(env_file, stat.S_IRUSR | stat.S_IWUSR)  # 0600

tests/unit/test_llama_stack_configuration.py (1)
132-132: The test configuration file `tests/configuration/run.yaml` exists in the repository. However, consider whether reducing external file dependencies would improve test maintainability. If desired, create a minimal test configuration inline or via a fixture instead of relying on the committed file.
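The permissions suggestion in the security note above can also be done atomically: passing a mode to `os.open` makes the file private from the moment of creation, avoiding the brief window in which `open()` followed by `os.chmod()` leaves it world-readable. A sketch (the helper name is illustrative, not the PR's code):

```python
import os


def write_env_file_private(path: str, lines: list[str]) -> None:
    """Write env-file lines with 0600 permissions from the moment of creation.

    Sketch only; the reviewed code uses open() followed by os.chmod() instead.
    """
    # The mode is applied at creation time, so the token is never briefly
    # readable by other users (subject to the process umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(lines)
```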
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- docs/providers.md
- pyproject.toml
- scripts/llama-stack-entrypoint.sh
- src/app/main.py
- src/llama_stack_configuration.py
- tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/llama-stack-entrypoint.sh
- docs/providers.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/llama_stack_configuration.pytests/unit/test_llama_stack_configuration.pysrc/app/main.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use `pytest-mock` for AsyncMock objects in tests
Use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests
Files:
tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion
Files:
tests/unit/test_llama_stack_configuration.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/main.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Check `pyproject.toml` for supported Python versions before development
Always check `pyproject.toml` for existing dependencies before adding new ones
Always verify current library versions in `pyproject.toml` rather than assuming versions
Use pylint with `source-roots = "src"` configuration
Files:
pyproject.toml
🧠 Learnings (2)
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution
Applied to files:
src/llama_stack_configuration.py
🧬 Code graph analysis (3)
src/llama_stack_configuration.py (1)
src/models/config.py (3)
config (325-342), config (1306-1314), dump (1607-1619)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
generate_configuration (239-267), construct_vector_dbs_section (115-157), construct_vector_io_providers_section (160-210)
src/app/main.py (2)
src/authorization/azure_token_manager.py (3)
AzureEntraIDManager (22-100), set_config (39-42), refresh_token (59-74)
src/configuration.py (2)
configuration (95-106), azure_entra_id (345-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci
🔇 Additional comments (4)
src/app/main.py (1)
12-12: LGTM! Import of `AzureEntraIDManager` is correctly added for the new Azure Entra ID integration.
tests/unit/test_llama_stack_configuration.py (3)
1-13: LGTM on test structure and imports. Tests are correctly structured with pytest and properly import the functions under test.
28-74: Good coverage for `construct_vector_dbs_section`. Tests cover empty input, preserving existing entries, adding new entries, and merging: all key scenarios.
81-110: Good coverage for `construct_vector_io_providers_section`. Tests cover empty input, preserving existing entries, and adding new entries.
afc72ca to 50bc95d
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/llama_stack_configuration.py (2)
142-154: In-place modification of input parameter. Line 143 assigns a reference to the existing list, so subsequent `append()` calls modify `ls_config["vector_dbs"]` in place. Per coding guidelines, avoid in-place parameter modification; return new data structures instead. Additionally, an empty `vector_db_id` produces a malformed `provider_id` of `"byok_"` (previously flagged). Proposed fix:

 output = []
 # fill-in existing vector_dbs entries
 if "vector_dbs" in ls_config:
-    output = ls_config["vector_dbs"]
+    output = list(ls_config["vector_dbs"])
 # append new vector_dbs entries
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+        continue
     output.append(
         {
-            "vector_db_id": brag.get("vector_db_id", ""),
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "vector_db_id": vector_db_id,
+            "provider_id": f"byok_{vector_db_id}",
             "embedding_model": brag.get("embedding_model", ""),
             "embedding_dimension": brag.get("embedding_dimension"),
         }
     )
189-207: Same in-place modification and validation issues. Similar to `construct_vector_dbs_section`: line 191 assigns a reference causing in-place modification, and an empty `vector_db_id` produces malformed paths like `".llama/.db"`. Proposed fix:

 output = []
 # fill-in existing vector_io entries
 if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-    output = ls_config["providers"]["vector_io"]
+    output = list(ls_config["providers"]["vector_io"])
 # append new vector_io entries
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        continue  # Already warned in construct_vector_dbs_section
     output.append(
         {
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "provider_id": f"byok_{vector_db_id}",
             "provider_type": brag.get("rag_type", "inline::faiss"),
             "config": {
                 "kvstore": {
-                    "db_path": ".llama/" + brag.get("vector_db_id", "") + ".db",
+                    "db_path": f".llama/{vector_db_id}.db",
                     "namespace": None,
                     "type": "sqlite",
                 }
             },
         }
     )
🤖 Fix all issues with AI agents
In `@src/app/endpoints/streaming_query.py`:
- Around line 895-916: After calling AzureEntraIDManager().refresh_token(),
verify the refresh succeeded and that AzureEntraIDManager().access_token
contains a non-empty/valid token before proceeding; if refresh failed or
access_token is empty, log/error and skip updating the client or raise so you
don't call AsyncLlamaStackClientHolder().update_provider_data with an invalid
token. Specifically, in the block where refresh_token() is invoked, check the
return value or AzureEntraIDManager().access_token.get_secret_value() for
presence/validity, only call
AsyncLlamaStackClientHolder().reload_library_client() or
update_provider_data(...) when valid, and handle/log the failure path (do not
continue to call providers.list() / update_provider_data with an empty token).
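The guard described above could look like this minimal sketch; `manager` and `holder` stand in for `AzureEntraIDManager` and `AsyncLlamaStackClientHolder`, and the provider-data key is an assumption:

```python
import logging

logger = logging.getLogger(__name__)


def apply_refreshed_token(manager, holder) -> bool:
    """Push a refreshed Azure token to the client holder only when non-empty.

    Returns True when the holder was updated; False when the refresh left an
    empty token and the previous client state was kept. Sketch only.
    """
    token = manager.access_token.get_secret_value()
    if not token:
        logger.error("Azure token refresh yielded an empty token; not updating client")
        return False
    holder.update_provider_data({"azure_api_key": token})
    return True
```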
In `@src/authorization/azure_token_manager.py`:
- Around line 59-71: The tests are awaiting a synchronous method: keep
refresh_token as a regular def on AzureTokenManager (refresh_token) and update
the tests to stop using await; specifically remove the await usages in
tests/unit/authorization/test_azure_token_manager.py (the calls at lines ~84,
~114, ~142) so they call token_manager.refresh_token() directly, or
alternatively convert refresh_token to async def and adjust all production call
sites (main.py, query.py, streaming_query.py) to await it and run the blocking
_retrieve_access_token()/ClientSecretCredential.get_token() in an executor —
pick one approach and apply consistently (recommended: keep refresh_token
synchronous and change the tests to remove await).
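If the async route were chosen instead, the blocking credential call could be pushed off the event loop with `asyncio.to_thread`; here `fetch_token_blocking` is a stand-in for the real `ClientSecretCredential.get_token()` call from azure-identity:

```python
import asyncio
import time


def fetch_token_blocking() -> str:
    """Stand-in for a blocking azure-identity call such as get_token()."""
    time.sleep(0.01)  # simulate the latency of the token endpoint
    return "example-token"


async def refresh_token_async() -> str:
    """Run the blocking fetch in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fetch_token_blocking)
```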
In `@src/llama_stack_configuration.py`:
- Around line 102-107: The code writes the Azure token to env_file but doesn't
set it in the running process, so calls like generate_configuration that rely on
os.environ won't see AZURE_API_KEY; after writing the file (the block that opens
env_file and calls logger.info) set os.environ["AZURE_API_KEY"] = <token_value>
(the same value you wrote into lines) so the current process environment matches
the .env on disk, and keep the logger.info call as-is to record the write.
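The fix above, keeping the running process environment in sync with the `.env` on disk, can be sketched like this (the helper name and single-entry file layout are illustrative):

```python
import os


def persist_azure_token(token_value: str, env_file: str) -> None:
    """Write AZURE_API_KEY to an env file and mirror it into os.environ.

    Sketch of the suggested fix: the same value written to disk is also set
    in the current process so later os.environ lookups see the token.
    """
    with open(env_file, "w", encoding="utf-8") as f:
        f.write(f"AZURE_API_KEY={token_value}\n")
    os.environ["AZURE_API_KEY"] = token_value
```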
♻️ Duplicate comments (4)
pyproject.toml (1)
63-65: Add version constraints for Azure dependencies. The `azure-core` and `azure-identity` packages lack version constraints, which is inconsistent with the project's dependency versioning pattern (most dependencies specify minimum versions like `>=x.y.z`). Suggested fix:

 # Used for Azure Entra ID token management
-"azure-core",
-"azure-identity",
+"azure-core>=1.30.0",
+"azure-identity>=1.15.0",

src/app/main.py (1)
44-48: Token retrieval failures are silently ignored during startup. If `refresh_token()` fails to obtain a token (e.g., due to `ClientAuthenticationError`), it silently returns without raising an exception, and the service continues with an empty `AZURE_API_KEY`. This will cause all subsequent Azure API calls to fail with authentication errors. Consider verifying the token was obtained after refresh. Suggested fix:

 azure_config = configuration.configuration.azure_entra_id
 if azure_config is not None:
     AzureEntraIDManager().set_config(azure_config)
     AzureEntraIDManager().refresh_token()
+    if not AzureEntraIDManager().access_token.get_secret_value():
+        logger.error("Failed to obtain Azure access token during startup")
+        raise RuntimeError("Azure Entra ID configured but token acquisition failed")

src/llama_stack_configuration.py (2)
109-111: Authentication failures are silently swallowed. This was previously flagged. When `ClientAuthenticationError` or `CredentialUnavailableError` occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing `AZURE_API_KEY` causes API calls to fail.
309-314: CLI should handle errors and exit with non-zero status. The CLI entry point lacks error handling for file operations. When config or input files are not found, the function should log the error and exit with a non-zero status code to signal failure to shell scripts and CI pipelines. This was previously flagged.
Proposed fix
+import sys
+
 ...
 def main() -> None:
     """CLI entry point."""
     ...
     args = parser.parse_args()
+    try:
-    with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-    config = replace_env_vars(config)
-
-    generate_configuration(args.input, args.output, config, args.env_file)
+        with open(args.config, "r", encoding="utf-8") as f:
+            config = yaml.safe_load(f)
+        config = replace_env_vars(config)
+        generate_configuration(args.input, args.output, config, args.env_file)
+    except FileNotFoundError as e:
+        logger.error("File not found: %s", e.filename)
+        sys.exit(1)
+    except Exception as e:
+        logger.error("Failed to generate configuration: %s", e)
+        sys.exit(1)
🧹 Nitpick comments (3)
scripts/llama-stack-entrypoint.sh (1)
7-10: Path collision risk remains between INPUT_CONFIG and ENRICHED_CONFIG. When `LLAMA_STACK_CONFIG` is not set, both `INPUT_CONFIG` and `ENRICHED_CONFIG` default to `/opt/app-root/run.yaml`. While the `ENRICHMENT_FAILED` flag on line 28 mitigates incorrect "enriched" reporting, it would be clearer to use distinct paths. Suggested fix:

 INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
-ENRICHED_CONFIG="/opt/app-root/run.yaml"
+ENRICHED_CONFIG="/opt/app-root/run-enriched.yaml"
 LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
 ENV_FILE="/opt/app-root/.env"

src/llama_stack_configuration.py (2)
223-225: Minor: Prefer Pythonic empty check. Use `if not byok_rag:` instead of `if len(byok_rag) == 0:` for checking empty lists.

-    if len(byok_rag) == 0:
+    if not byok_rag:
         logger.info("BYOK RAG is not configured: skipping")
         return
258-270: Missing error handling for file operations. The function does not handle `FileNotFoundError` for `input_file` or `PermissionError`/`OSError` for write operations. For a function intended to be called as a module API, consider adding error handling or documenting potential exceptions in the docstring. Proposed approach:

 def generate_configuration(
     input_file: str,
     output_file: str,
     config: dict[str, Any],
     env_file: str = ".env",
 ) -> None:
     """Generate enriched Llama Stack configuration for service/container mode.

     Args:
         input_file: Path to input Llama Stack config
         output_file: Path to write enriched config
         config: Lightspeed config dict (from YAML)
         env_file: Path to .env file
+
+    Raises:
+        FileNotFoundError: If input_file does not exist
+        OSError: If output_file cannot be written
     """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- docs/providers.md
- pyproject.toml
- scripts/llama-stack-entrypoint.sh
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/app/main.py
- src/authorization/azure_token_manager.py
- src/llama_stack_configuration.py
- tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/providers.md
- src/app/endpoints/query.py
- tests/unit/test_llama_stack_configuration.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/app/endpoints/streaming_query.pysrc/authorization/azure_token_manager.pysrc/llama_stack_configuration.pysrc/app/main.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query.pysrc/app/main.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Check `pyproject.toml` for supported Python versions before development
Always check `pyproject.toml` for existing dependencies before adding new ones
Always verify current library versions in `pyproject.toml` rather than assuming versions
Use pylint with `source-roots = "src"` configuration
Files:
pyproject.toml
🧠 Learnings (6)
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.
Applied to files:
src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to pyproject.toml : Always check `pyproject.toml` for existing dependencies before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data
Applied to files:
src/authorization/azure_token_manager.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution
Applied to files:
src/llama_stack_configuration.py
🧬 Code graph analysis (4)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
AzureEntraIDManager (22-97), is_entra_id_configured (45-47), is_token_expired (50-52), refresh_token (59-71), access_token (55-57)
src/client.py (4)
AsyncLlamaStackClientHolder (21-164), is_library_client (28-30), reload_library_client (117-131), update_provider_data (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
configuration (95-106)
src/models/config.py (1)
AzureEntraIdConfiguration (1482-1493)
src/utils/types.py (1)
Singleton (43-57)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
config (325-342), config (1306-1314), dump (1607-1619)
src/lightspeed_stack.py (1)
main (69-116)
src/app/main.py (3)
src/authorization/azure_token_manager.py (3)
AzureEntraIDManager (22-97), set_config (39-42), refresh_token (59-71)
tests/integration/test_rh_identity_integration.py (1)
client (17-46)
src/configuration.py (2)
configuration (95-106), azure_entra_id (345-349)
🪛 GitHub Actions: Unit tests
src/authorization/azure_token_manager.py
[error] 1-1: AzureEntraIDTokenManager.refresh_token is not awaitable / returns None, causing TypeError when awaited during pytest run. Ensure the method is defined as an async function or returns an awaitable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci
- GitHub Check: build-pr
🔇 Additional comments (5)
src/app/endpoints/streaming_query.py (1)
52-52: LGTM! Import follows the project's absolute import convention.
src/authorization/azure_token_manager.py (2)
22-42: LGTM! The singleton pattern is correctly implemented using the project's `Singleton` metaclass. Configuration handling is clean with proper type annotations and logging.
73-80: LGTM! Token update logic correctly applies the expiration leeway and provides clear logging of the refresh event.
src/llama_stack_configuration.py (2)
1-20: LGTM on imports and module structure. The module docstring clearly explains the dual usage modes (script vs module), and imports follow the coding guidelines with the `logger = logging.getLogger(__name__)` pattern.
23-39: LGTM! The custom YAML dumper properly enforces block-style indentation with clear documentation.
7f2bde9 to fefbaca
fefbaca to 22841de
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/lightspeed_stack.py (2)
26-40: Stale docstring references removed CLI options. The docstring documents `-g/--generate-llama-stack-configuration`, `-i/--input-config-file`, and `-o/--output-config-file` options (lines 33-36), but these have been removed from the parser. Update the docstring to match the actual arguments.
📝 Suggested docstring fix

 def create_argument_parser() -> ArgumentParser:
     """Create and configure argument parser object.

     The parser includes these options:
     - -v / --verbose: enable verbose output
     - -d / --dump-configuration: dump the loaded configuration to JSON and exit
     - -c / --config: path to the configuration file (default "lightspeed-stack.yaml")
-    - -g / --generate-llama-stack-configuration: generate a Llama Stack
-      configuration from the service configuration
-    - -i / --input-config-file: Llama Stack input configuration filename (default "run.yaml")
-    - -o / --output-config-file: Llama Stack output configuration filename (default "run_.yaml")

     Returns:
         Configured ArgumentParser for parsing the service CLI options.
     """
69-86: `main()` docstring references removed functionality.

The docstring mentions `--generate-llama-stack-configuration` (lines 77-79) and "Llama Stack generation fails" (line 84), but this feature has been moved to `src/llama_stack_configuration.py`. Update to reflect current behavior.

📝 Suggested docstring fix

```diff
 """Entry point to the web service.

 Start the Lightspeed Core Stack service process based on CLI flags and
 configuration. Parses command-line arguments, loads the configured settings,
 and then:
 - If --dump-configuration is provided, writes the active configuration to
   configuration.json and exits (exits with status 1 on failure).
-- If --generate-llama-stack-configuration is provided, generates and stores
-  the Llama Stack configuration to the specified output file and exits
-  (exits with status 1 on failure).
 - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes, starts
   the quota scheduler, and starts the Uvicorn web service.

 Raises:
-    SystemExit: when configuration dumping or Llama Stack generation fails
-    (exits with status 1).
+    SystemExit: when configuration dumping fails (exits with status 1).
 """
```
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query.py`:
- Around line 318-339: The token refresh path can proceed with a missing/empty
token and set "None" as a string for api_base; after calling
AzureEntraIDManager().refresh_token() verify the refresh succeeded and that
AzureEntraIDManager().access_token (or its secret value) is non-empty before
updating the client via AsyncLlamaStackClientHolder().update_provider_data; if
refresh failed, stop and surface/log/raise an error instead of updating the
provider data. Also sanitize azure_config.get("api_base") by checking for None
and only include it (or pass None/not set) rather than converting None to the
string "None" when building the dict for update_provider_data.
In `@test.containerfile`:
- Around line 13-17: The Dockerfile adds execute permission to the Python module
llama_stack_configuration.py even though it’s invoked via the interpreter;
update the RUN line so only enrich-entrypoint.sh is made executable (remove +x
for llama_stack_configuration.py) and still chown both files (or chown them in a
separate command); specifically modify the chmod/chown command that references
/opt/app-root/enrich-entrypoint.sh and
/opt/app-root/llama_stack_configuration.py so that chmod applies only to
enrich-entrypoint.sh while chown continues to set ownership for both files.
♻️ Duplicate comments (10)
src/app/endpoints/streaming_query.py (1)
895-916: Token refresh failure not handled; duplicated logic with query.py.

This block has the same issue as `query.py`: if `refresh_token()` fails silently, the code proceeds with a potentially empty token. Additionally, this logic is duplicated verbatim from `query.py` lines 318-339.

Consider:
- Validating the token after refresh (see suggestion for query.py)
- Extracting the token refresh logic into a shared utility to avoid duplication
♻️ Suggested refactor to shared utility
Create a helper in a shared module (e.g., `utils/azure_token.py`):

```python
async def refresh_azure_token_if_needed(
    provider_id: str,
    client: AsyncLlamaStackClient,
    client_holder: AsyncLlamaStackClientHolder,
) -> AsyncLlamaStackClient:
    """Refresh Azure token if expired and return updated client."""
    if (
        provider_id != "azure"
        or not AzureEntraIDManager().is_entra_id_configured
        or not AzureEntraIDManager().is_token_expired
    ):
        return client

    AzureEntraIDManager().refresh_token()
    if not AzureEntraIDManager().access_token.get_secret_value():
        raise ValueError("Failed to refresh Azure access token")

    if client_holder.is_library_client:
        return await client_holder.reload_library_client()

    azure_config = next(
        p.config
        for p in await client.providers.list()
        if p.provider_type == "remote::azure"
    )
    api_base = azure_config.get("api_base")
    if not api_base:
        raise ValueError("Azure provider config missing api_base")

    return client_holder.update_provider_data({
        "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
        "azure_api_base": str(api_base),
    })
```

src/app/main.py (1)
44-47: Token retrieval failures are silently ignored during startup.

The `refresh_token()` method catches `ClientAuthenticationError` and `CredentialUnavailableError` internally and returns `None`, allowing startup to continue with an empty `AZURE_API_KEY`. Subsequent Azure API calls will fail with confusing errors.

Consider validating the token after refresh and failing startup if authentication fails:

🔧 Suggested validation

```diff
 azure_config = configuration.configuration.azure_entra_id
 if azure_config is not None:
     AzureEntraIDManager().set_config(azure_config)
     AzureEntraIDManager().refresh_token()
+
+    if not AzureEntraIDManager().access_token.get_secret_value():
+        logger.error("Failed to obtain Azure Entra ID access token during startup")
+        raise RuntimeError(
+            "Azure Entra ID authentication failed. "
+            "Check tenant_id, client_id, and client_secret configuration."
+        )
```

src/authorization/azure_token_manager.py (1)
59-71: Missing failure signal prevents callers from handling token refresh errors.
`_retrieve_access_token()` logs a warning when authentication fails and returns `None`, but `refresh_token()` silently accepts this and returns without signaling to callers. All three call sites in `streaming_query.py`, `main.py`, and `query.py` do not (and cannot) check whether refresh succeeded, allowing the code to proceed with a potentially stale or missing token.

While error visibility exists via warning logs, the method design prevents callers from handling failures gracefully. Returning a success indicator or raising an exception on failure would allow appropriate error handling.

🔧 Option 1: Return success indicator

```diff
-    def refresh_token(self) -> None:
+    def refresh_token(self) -> bool:
         """Refresh the cached Azure access token.

+        Returns:
+            bool: True if token was successfully refreshed, False otherwise.
+
         Raises:
             ValueError: If Entra ID configuration has not been set.
         """
         if self._entra_id_config is None:
             raise ValueError("Azure Entra ID configuration not set")
         logger.info("Refreshing Azure access token")
         token_obj = self._retrieve_access_token()
         if token_obj:
             self._update_access_token(token_obj.token, token_obj.expires_on)
+            return True
+        return False
```

🔧 Option 2: Raise on failure

```diff
 token_obj = self._retrieve_access_token()
 if token_obj:
     self._update_access_token(token_obj.token, token_obj.expires_on)
+else:
+    raise RuntimeError("Failed to retrieve Azure access token")
```

tests/unit/test_llama_stack_configuration.py (1)
118-183: Add test coverage for untested functions to meet the 60% code coverage requirement.

The test file covers `construct_vector_dbs_section`, `construct_vector_io_providers_section`, and `generate_configuration`, but is missing tests for:

- `setup_azure_entra_id_token` - Azure token generation and .env file handling
- `enrich_byok_rag` - orchestration function for BYOK RAG enrichment
- `main()` - CLI entry point

Key scenarios to add:

- `setup_azure_entra_id_token`: skips when `AZURE_API_KEY` is set, skips when config is None, handles missing credentials, writes token to file, catches auth errors
- `enrich_byok_rag`: returns early when the list is empty, modifies config in place
- `main()`: argument parsing, config loading, `FileNotFoundError` handling

Based on learnings, unit tests require 60% code coverage.
examples/azure-run.yaml (1)
71-77: Update Azure OpenAI API version to a current release.

The inference provider correctly uses `${env.AZURE_API_KEY}` for credential injection, aligning with Azure Entra ID token management. However, `api_version: 2025-01-01-preview` is outdated. Consider updating to a more recent version like `2025-04-01-preview`.

src/llama_stack_configuration.py (4)
109-110: Authentication failures are silently swallowed.

When `ClientAuthenticationError` or `CredentialUnavailableError` occurs, the function logs an error but continues. Callers have no way to know authentication failed, which may cause API calls to fail later due to a missing `AZURE_API_KEY`.

Consider re-raising the exception or returning a status indicator. As per coding guidelines, use `logger.error()` for serious problems that prevented function execution - but the function should also propagate the failure.
102-107: Token written to file but not set in the current process environment.

After writing the token to `.env`, the `AZURE_API_KEY` won't be available via `os.environ` in the current process. If downstream code needs the token during the same execution, it won't find it.

♻️ Proposed fix

```diff
 with open(env_file, "w", encoding="utf-8") as f:
     f.writelines(lines)

+# Make token available to current process
+os.environ["AZURE_API_KEY"] = token.token
+
 logger.info(
     "Azure Entra ID: Access token set in env and written to %s", env_file
 )
```
146-152: Missing `vector_db_id` produces a malformed `provider_id`.

If `brag.get("vector_db_id")` returns `None` or an empty string, `provider_id` becomes `"byok_"`, which is likely invalid. Consider validating `vector_db_id` before appending.

♻️ Proposed fix with validation

```diff
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+        continue
     output.append(
         {
-            "vector_db_id": brag.get("vector_db_id", ""),
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "vector_db_id": vector_db_id,
+            "provider_id": f"byok_{vector_db_id}",
             "embedding_model": brag.get("embedding_model", ""),
             "embedding_dimension": brag.get("embedding_dimension"),
         }
     )
```
309-313: CLI lacks error handling for a missing config file.

The `main()` function opens `args.config` without a try-except block. If the file doesn't exist, it will raise an unhandled `FileNotFoundError`. For a CLI entry point, this should catch the exception and exit with a non-zero status code.

♻️ Proposed fix

```diff
+import sys
+
 ...
-    with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-    config = replace_env_vars(config)
+    try:
+        with open(args.config, "r", encoding="utf-8") as f:
+            config = yaml.safe_load(f)
+        config = replace_env_vars(config)
+    except FileNotFoundError:
+        logger.error("Config not found: %s", args.config)
+        sys.exit(1)
     generate_configuration(args.input, args.output, config, args.env_file)
```

src/client.py (1)
163-163: Remove the `copy()` method call; it does not exist on `AsyncLlamaStackClient`.

The code at line 163 calls `self._lsc.copy(set_default_headers=updated_headers)`, but `AsyncLlamaStackClient` v0.3.5 does not expose a `copy()` method. Use `with_options()` or create a new `AsyncLlamaStackClient` instance with the updated headers instead. The `# type: ignore` comment masks this as a type error rather than fixing the underlying issue.
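As an illustration of the "create a new client" route, here is a minimal sketch with a stand-in class (`FakeClient` and `rebuild_with_provider_data` are hypothetical names; the real fix would construct a fresh `AsyncLlamaStackClient` with the merged headers):

```python
import json


class FakeClient:
    """Stand-in for AsyncLlamaStackClient: only tracks default headers."""

    def __init__(self, base_url: str, default_headers: dict[str, str]) -> None:
        self.base_url = base_url
        self.default_headers = default_headers


def rebuild_with_provider_data(client: FakeClient, provider_data: dict) -> FakeClient:
    """Return a fresh client carrying the updated provider-data header."""
    headers = dict(client.default_headers)  # copy, never mutate the old client
    headers["X-LlamaStack-Provider-Data"] = json.dumps(provider_data)
    return FakeClient(client.base_url, headers)


old = FakeClient("http://localhost:8321", {"Authorization": "Bearer x"})
new = rebuild_with_provider_data(old, {"azure_api_key": "new_token"})
print(new.default_headers["X-LlamaStack-Provider-Data"])
```

Rebuilding rather than mutating keeps any in-flight requests on the old client unaffected by the header swap.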
🧹 Nitpick comments (10)
src/configuration.py (1)
344-349: LGTM! The property follows the established pattern in the class with a proper `LogicError` guard and correct return type. The implementation is consistent with other configuration accessors.

Optional: For full alignment with the Google Python docstring conventions referenced in the coding guidelines, consider expanding the docstring to include `Returns` and `Raises` sections, similar to other properties like `configuration` or `service_configuration`.

📝 Optional docstring enhancement

```diff
 @property
 def azure_entra_id(self) -> Optional[AzureEntraIdConfiguration]:
-    """Return Azure Entra ID configuration, or None if not provided."""
+    """Return Azure Entra ID configuration.
+
+    Returns:
+        Optional[AzureEntraIdConfiguration]: The Azure Entra ID
+        configuration if provided, otherwise None.
+
+    Raises:
+        LogicError: If the configuration has not been loaded.
+    """
     if self._configuration is None:
         raise LogicError("logic error: configuration is not loaded")
     return self._configuration.azure_entra_id
```

Makefile (1)
20-23: Consider adding validation for the `.env` file and `AZURE_API_KEY`.

The `run-llama-stack` target reads `AZURE_API_KEY` from `.env` silently. If the file doesn't exist or the key is missing, `AZURE_API_KEY` will be empty, which may cause the llama stack to fail with a confusing error.

Consider adding a check:

🛠️ Suggested improvement for robustness

```diff
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
+	@if [ ! -f .env ]; then echo "Error: .env file not found"; exit 1; fi && \
+	@if ! grep -q '^AZURE_API_KEY=' .env; then echo "Error: AZURE_API_KEY not found in .env"; exit 1; fi && \
 	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)
```

Alternatively, document the `.env` requirement in the target's help comment.

.github/workflows/e2e_tests_providers.yaml (1)
55-68: Echo-based YAML modification is safe in this context.

All test configuration files properly end with newlines, eliminating the concatenation risk mentioned. The code defensively adds a blank line before the `azure_entra_id:` block, providing additional safety. The approach is acceptable for these controlled test configurations.

For future robustness, or when handling untrusted YAML files, consider `yq` as an alternative, but this is not required here.

examples/lightspeed-stack-azure-entraid-service.yaml (1)
16-17: Consider using an environment variable for `api_key`.

The `api_key: xyzzy` appears to be a placeholder. For consistency with the Azure Entra ID credentials (which use `${env.VAR}` syntax), consider using an environment variable reference here as well to avoid accidentally committing real credentials.

💡 Suggested change

```diff
   url: http://localhost:8321
-  api_key: xyzzy
+  api_key: ${env.LLAMA_STACK_API_KEY:xyzzy}
```

tests/unit/test_configuration.py (1)
849-877: Test validates incomplete Azure Entra ID config correctly.

Consider adding a test case to `test_default_configuration()` for the `azure_entra_id` property to maintain consistency with other configuration properties.

♻️ Optional: Add consistency test

Add to `test_default_configuration()`:

```python
with pytest.raises(Exception, match="logic error: configuration is not loaded"):
    # try to read property
    _ = cfg.azure_entra_id  # pylint: disable=pointless-statement
```

tests/unit/test_client.py (2)
13-16: Fixture missing cleanup and type hint.

The fixture resets singleton state before tests but doesn't clean up after. Follow the pattern from `tests/unit/test_configuration.py`:

♻️ Suggested fix

```diff
+from typing import Generator
+
+
 @pytest.fixture(autouse=True)
-def reset_singleton() -> None:
+def reset_singleton() -> Generator:
     """Reset singleton state between tests."""
     Singleton._instances = {}
+    yield
+    Singleton._instances = {}
```
101-107: Test accesses the internal `_custom_headers` attribute.

Setting `_custom_headers` directly is fragile, as it depends on internal httpx/client implementation. The test works because this eventually gets merged into `default_headers`, but consider using a cleaner approach:

♻️ Alternative: Use update_provider_data to set initial state

```diff
-    # Pre-populate with existing provider data via headers
-    original_client._custom_headers["X-LlamaStack-Provider-Data"] = json.dumps(
-        {
-            "existing_field": "keep_this",
-            "azure_api_key": "old_token",
-        }
-    )
-
-    updated_client = holder.update_provider_data(
+    # First call to set initial provider data
+    holder.update_provider_data(
+        {
+            "existing_field": "keep_this",
+            "azure_api_key": "old_token",
+        }
+    )
+    intermediate_client = holder.get_client()
+
+    # Second call to update - this is what we're testing
+    updated_client = holder.update_provider_data(
         {
             "azure_api_key": "new_token",
             "azure_api_base": "https://new.example.com",
         }
     )
+
+    # Returns new client and updates holder
+    assert updated_client is not intermediate_client
```

src/authorization/azure_token_manager.py (1)
95-97: Exception details not logged.

The warning message doesn't include the exception details, making troubleshooting difficult. Consider logging the exception:

♻️ Log exception details

```diff
-    except (ClientAuthenticationError, CredentialUnavailableError):
-        logger.warning("Failed to retrieve Azure access token")
+    except (ClientAuthenticationError, CredentialUnavailableError) as e:
+        logger.warning("Failed to retrieve Azure access token: %s", e)
         return None
```

src/client.py (2)
88-89: Fixed temp file path may cause issues in concurrent deployments.

Using `tempfile.gettempdir()` with the fixed filename `llama_stack_enriched_config.yaml` could cause race conditions if multiple processes enrich configs simultaneously. Consider using `tempfile.NamedTemporaryFile` with `delete=False` for unique paths.

♻️ Proposed fix using unique temp file

```diff
-    enriched_path = os.path.join(
-        tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
-    )
+    # Use unique temp file to avoid race conditions
+    fd, enriched_path = tempfile.mkstemp(
+        suffix=".yaml", prefix="llama_stack_enriched_"
+    )
+    os.close(fd)  # Close fd, will reopen with encoding
```
79-84: Silent fallback on config read errors may mask issues.

When the YAML read fails, the function logs a warning and returns the original path, but the caller proceeds as if BYOK RAG was configured. This could lead to confusing runtime behavior where RAG features silently don't work.

Consider logging at `error` level or raising the exception, since BYOK RAG was explicitly configured but enrichment failed.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (38)
- .github/workflows/e2e_tests.yaml
- .github/workflows/e2e_tests_providers.yaml
- Makefile
- README.md
- docker-compose-library.yaml
- docker-compose.yaml
- docs/providers.md
- examples/azure-run.yaml
- examples/lightspeed-stack-azure-entraid-lib.yaml
- examples/lightspeed-stack-azure-entraid-service.yaml
- pyproject.toml
- scripts/llama-stack-entrypoint.sh
- src/app/endpoints/query.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/main.py
- src/authorization/azure_token_manager.py
- src/client.py
- src/configuration.py
- src/lightspeed_stack.py
- src/llama_stack_configuration.py
- src/models/config.py
- test.containerfile
- tests/e2e/configs/run-azure.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack.yaml
- tests/e2e/features/steps/info.py
- tests/unit/authentication/test_api_key_token.py
- tests/unit/authorization/test_azure_token_manager.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/test_client.py
- tests/unit/test_configuration.py
- tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
- src/app/endpoints/query_v2.py
✅ Files skipped from review due to trivial changes (1)
- tests/e2e/configs/run-azure.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
- scripts/llama-stack-entrypoint.sh
- tests/unit/authentication/test_api_key_token.py
- README.md
- docs/providers.md
- src/models/config.py
- tests/e2e/features/steps/info.py
- tests/unit/authorization/test_azure_token_manager.py
- tests/unit/models/config/test_dump_configuration.py
- .github/workflows/e2e_tests.yaml
- pyproject.toml
- tests/e2e/configuration/server-mode/lightspeed-stack.yaml
- docker-compose-library.yaml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Use FastAPI dependencies: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using `typing_extensions.Self` for model validators
Use union types with modern syntax: `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Use ABC for abstract base classes with `@abstractmethod` decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run `uv run make format` to auto-format code with black and ruff before completion
Run `uv run make verify` to run all linters (black, pyl...
Files:
`src/app/endpoints/query.py`, `src/app/endpoints/streaming_query.py`, `src/lightspeed_stack.py`, `src/client.py`, `tests/unit/test_llama_stack_configuration.py`, `src/app/main.py`, `src/configuration.py`, `tests/unit/test_client.py`, `src/authorization/azure_token_manager.py`, `tests/unit/test_configuration.py`, `src/llama_stack_configuration.py`
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
`src/app/endpoints/query.py`, `src/app/endpoints/streaming_query.py`, `src/app/main.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use `pytest-mock` for AsyncMock objects in tests
Use the `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests
Files:
`tests/unit/test_llama_stack_configuration.py`, `tests/unit/test_client.py`, `tests/unit/test_configuration.py`
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion
Files:
`tests/unit/test_llama_stack_configuration.py`, `tests/unit/test_client.py`, `tests/unit/test_configuration.py`
🧠 Learnings (13)
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.
Applied to files:
src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
`src/app/endpoints/streaming_query.py`, `src/lightspeed_stack.py`, `src/client.py`, `tests/unit/test_client.py`
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling
Applied to files:
src/lightspeed_stack.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
`src/lightspeed_stack.py`, `examples/azure-run.yaml`, `src/llama_stack_configuration.py`
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage
Applied to files:
tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion
Applied to files:
tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage
Applied to files:
tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
src/configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data
Applied to files:
src/authorization/azure_token_manager.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution
Applied to files:
src/llama_stack_configuration.py
🧬 Code graph analysis (9)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (5)
`AzureEntraIDManager` (22-97), `is_entra_id_configured` (45-47), `is_token_expired` (50-52), `refresh_token` (59-71), `access_token` (55-57)
src/client.py (3)
`is_library_client` (28-30), `reload_library_client` (117-131), `update_provider_data` (133-164)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
`AzureEntraIDManager` (22-97), `is_entra_id_configured` (45-47), `is_token_expired` (50-52), `refresh_token` (59-71), `access_token` (55-57)
src/client.py (4)
`AsyncLlamaStackClientHolder` (21-164), `is_library_client` (28-30), `reload_library_client` (117-131), `update_provider_data` (133-164)
src/client.py (2)
src/configuration.py (2)
`configuration` (95-106), `llama_stack_configuration` (123-134)
src/llama_stack_configuration.py (2)
`enrich_byok_rag` (216-234), `YamlDumper` (23-39)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
`generate_configuration` (242-270), `construct_vector_dbs_section` (118-160), `construct_vector_io_providers_section` (163-213)
src/app/main.py (2)
src/authorization/azure_token_manager.py (3)
`AzureEntraIDManager` (22-97), `set_config` (39-42), `refresh_token` (59-71)
src/configuration.py (2)
`configuration` (95-106), `azure_entra_id` (345-349)
src/configuration.py (1)
src/models/config.py (1)
`AzureEntraIdConfiguration` (1537-1548)
tests/unit/test_client.py (2)
src/client.py (6)
`AsyncLlamaStackClientHolder` (21-164), `load` (32-40), `get_client` (101-115), `is_library_client` (28-30), `update_provider_data` (133-164), `reload_library_client` (117-131)
src/utils/types.py (1)
`Singleton` (43-57)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
`configuration` (95-106)
src/models/config.py (1)
`AzureEntraIdConfiguration` (1537-1548)
src/utils/types.py (1)
`Singleton` (43-57)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
`load_configuration` (65-74), `azure_entra_id` (345-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
🔇 Additional comments (30)
tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
18-19: LGTM! The configuration maintains the `noop` authentication module as expected for library-mode e2e tests. Based on learnings, noop authentication tests use this default configuration, which is the correct setup.
14-14: LGTM! The import follows the absolute import pattern and is correctly placed with other model imports.
.github/workflows/e2e_tests_providers.yaml (2)
190-192: LGTM! The Azure Entra ID credentials are properly sourced from GitHub secrets and align with the transition from static API keys to dynamic token management.
221-223: LGTM! The library mode step correctly mirrors the server mode credential setup, maintaining consistency across deployment modes.
test.containerfile (2)
6-6: LGTM! Adding `azure-identity` is necessary for Azure Entra ID token acquisition. The package is appropriate for this authentication flow.
21-22: LGTM! The ENTRYPOINT correctly references the enrichment script that handles configuration enrichment and initial token generation before service startup, as clarified in past comments.
Makefile (2)
13-15: LGTM! Good use of conditional assignment (?=) allowing users to override configuration files via the command line (e.g., make run CONFIG=custom.yaml).

17-18: LGTM! The run target now properly accepts a configurable configuration file path.
src/app/endpoints/query.py (1)
72-73: LGTM! Import is correctly placed and follows the absolute import pattern per coding guidelines.
src/app/main.py (1)
12-12: LGTM! Import follows the absolute import pattern per coding guidelines.
src/app/endpoints/streaming_query.py (1)
52-52: LGTM! Import follows the absolute import pattern.
examples/lightspeed-stack-azure-entraid-lib.yaml (1)
1-25: LGTM! Configuration structure is clear and follows the expected format for library client mode. Environment variable references for secrets follow best practices.
examples/lightspeed-stack-azure-entraid-service.yaml (2)
27-30: Verify if scope is required in the Azure Entra ID configuration. Same as the library config: the scope field may be required by AzureEntraIdConfiguration.

1-25: LGTM! Configuration structure is correct for service mode with appropriate comments explaining the alternatives.
tests/unit/test_configuration.py (1)
812-846: LGTM! Good coverage for Azure Entra ID configuration. The test validates that all three credential fields (tenant_id, client_id, client_secret) are correctly loaded as SecretStr values. The pattern matches existing tests in the file.

tests/unit/test_client.py (2)

69-84: LGTM! Exception type now matches implementation. The change from Exception to ValueError correctly reflects the actual exception raised in src/client.py line 47.

133-152: LGTM! Good coverage for library client reload. The test properly validates that reload_library_client creates a new client instance and updates the holder.

docker-compose.yaml (2)
21-24: LGTM! Azure credentials properly configured for the Entra ID flow. The switch from AZURE_API_KEY to individual TENANT_ID, CLIENT_ID, and CLIENT_SECRET environment variables aligns with the new Azure Entra ID token management implementation.

14-14: Volume mount added for configuration file. The shared SELinux label (:z) is appropriate since both llama-stack and lightspeed-stack services need read access to lightspeed-stack.yaml.

src/authorization/azure_token_manager.py (4)
1-19: LGTM! Module setup follows coding guidelines. Imports, logging setup, and constant definition follow the project patterns. The 30-second leeway for token expiration is a reasonable safety margin.
22-42: Singleton token manager with proper initialization. The singleton pattern ensures consistent token state across the application. Configuration is properly typed with AzureEntraIdConfiguration.

54-57: access_token returns an empty string when not set. The property returns SecretStr("") when AZURE_API_KEY is unset. Callers should check is_entra_id_configured and is_token_expired before using the token value.
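The caller-side contract described here can be sketched with a tiny stand-in for pydantic's SecretStr (FakeSecretStr and the module-level access_token below are hypothetical illustrations of the property's behavior):

```python
import os


class FakeSecretStr:
    """Minimal stand-in for pydantic's SecretStr, for illustration only."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value


def access_token() -> FakeSecretStr:
    # Mirrors the reviewed property: empty secret when AZURE_API_KEY is unset.
    return FakeSecretStr(os.environ.get("AZURE_API_KEY", ""))


os.environ.pop("AZURE_API_KEY", None)
empty = access_token().get_secret_value()

os.environ["AZURE_API_KEY"] = "fresh-token"
current = access_token().get_secret_value()
```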
73-80: Token update implementation is correct. The leeway is properly applied to the expiration time, and the token is stored in the environment variable as designed (single source of truth for Llama Stack config injection).
tests/unit/test_llama_stack_configuration.py (2)
28-74: LGTM! Good coverage for construct_vector_dbs_section. Tests cover empty input, preserving existing entries, adding new entries, and merging behavior. Assertions validate the expected dict structure with vector_db_id, provider_id, embedding_model, and embedding_dimension.

81-110: LGTM! Adequate coverage for construct_vector_io_providers_section. Tests validate empty inputs, preserving existing entries, and adding new BYOK RAG entries with correct provider_id and provider_type values.

examples/azure-run.yaml (1)

1-137: LGTM! Configuration follows Llama Stack 0.3.x patterns. The configuration properly uses the 0.3.x storage backend structure with kv_default/sql_default backends, persistence namespaces, and registered_resources scaffolding. The vector_io provider uses the correct persistence.backend pattern per the Llama Stack 0.3.x format. Based on learnings, telemetry is correctly not configured under the providers section.

src/client.py (2)
117-131: LGTM! Reload mechanism for library client. The reload_library_client method properly reinitializes the library client from the stored config path, enabling token refresh for library mode. The guard for a missing _config_path provides clear error messaging.

53-53: The double .configuration access pattern is correct and intentional. The code correctly accesses a module-level singleton instance (configuration), which is an AppConfig object, then calls its configuration property to retrieve the loaded Configuration object, then accesses the byok_rag field. This is a standard pattern when using singleton classes.

src/llama_stack_configuration.py (2)
216-234: LGTM! Clean orchestration function for BYOK RAG enrichment. The enrich_byok_rag function properly guards against empty input, initializes the providers dict if missing, and delegates to the construct functions. The in-place modification is documented in the docstring.

242-270: LGTM! Well-structured configuration generation. The generate_configuration function properly orchestrates reading input, applying Azure Entra ID token setup, BYOK RAG enrichment, and writing output. The use of YamlDumper ensures consistent formatting.
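To make the enrichment step concrete, here is a simplified stdlib-only sketch of folding a BYOK RAG list into a config dict (the real function also builds the vector_io providers section and the CLI writes YAML; the names and the skip-malformed behavior here are illustrative):

```python
from typing import Any


def enrich_byok_rag_sketch(
    ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]
) -> dict[str, Any]:
    """Append one vector_dbs entry per valid BYOK RAG item."""
    if not byok_rag:
        return ls_config  # nothing configured: leave the config untouched
    dbs = ls_config.setdefault("vector_dbs", [])
    for brag in byok_rag:
        vector_db_id = brag.get("vector_db_id")
        if not vector_db_id:
            continue  # skip malformed entries instead of emitting "byok_"
        dbs.append(
            {"vector_db_id": vector_db_id, "provider_id": f"byok_{vector_db_id}"}
        )
    return ls_config


config: dict[str, Any] = {"version": 2}
enriched = enrich_byok_rag_sketch(config, [{"vector_db_id": "docs"}, {}])
```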
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
22841de to
b84740b
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)
194-207: Same vector_db_id validation issue as in construct_vector_dbs_section. Apply consistent validation logic to skip entries with missing vector_db_id to avoid malformed provider_id and db_path values.

Proposed fix: Add validation
```diff
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id in vector_io, skipping")
+        continue
     output.append(
         {
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "provider_id": f"byok_{vector_db_id}",
             "provider_type": brag.get("rag_type", "inline::faiss"),
             "config": {
                 "kvstore": {
-                    "db_path": ".llama/" + brag.get("vector_db_id", "") + ".db",
+                    "db_path": f".llama/{vector_db_id}.db",
                     "namespace": None,
                     "type": "sqlite",
                 }
             },
         }
     )
```
♻️ Duplicate comments (5)
test.containerfile (1)
13-17: Remove unnecessary execute permission from the Python file. The chmod +x applies to both scripts, but llama_stack_configuration.py is invoked via the python3 interpreter (not as a direct executable) and lacks a shebang. Only the shell script needs execute permission.

Proposed fix
```diff
-RUN chmod +x /opt/app-root/enrich-entrypoint.sh && \
-    chown 1001:0 /opt/app-root/enrich-entrypoint.sh /opt/app-root/llama_stack_configuration.py
+RUN chmod +x /opt/app-root/enrich-entrypoint.sh && \
+    chown 1001:0 /opt/app-root/enrich-entrypoint.sh && \
+    chown 1001:0 /opt/app-root/llama_stack_configuration.py
```

src/llama_stack_configuration.py (3)
109-110: Authentication failures are silently swallowed. When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues. Callers cannot detect the failure, leading to runtime errors later when AZURE_API_KEY is missing.

Proposed fix: Re-raise after logging
```diff
 except (ClientAuthenticationError, CredentialUnavailableError) as e:
     logger.error("Azure Entra ID: Failed to generate token: %s", e)
+    raise
```
102-107: Token written to file but not set in the current process environment. After writing the token to .env, downstream code in the same process won't have access to AZURE_API_KEY via os.environ. If generate_configuration or other code needs this token during the same execution, it won't be available.

Proposed fix: Set environment variable in current process
```diff
 with open(env_file, "w", encoding="utf-8") as f:
     f.writelines(lines)
+# Make token available to current process
+os.environ["AZURE_API_KEY"] = token.token
 logger.info(
     "Azure Entra ID: Access token set in env and written to %s", env_file
 )
```
146-154: Missing vector_db_id validation results in malformed entries. If brag.get("vector_db_id") returns None or an empty string, provider_id becomes "byok_" (empty suffix). Consider validating vector_db_id before constructing the entry.

Proposed fix: Add validation
```diff
 for brag in byok_rag:
+    vector_db_id = brag.get("vector_db_id")
+    if not vector_db_id:
+        logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+        continue
     output.append(
         {
-            "vector_db_id": brag.get("vector_db_id", ""),
-            "provider_id": "byok_" + brag.get("vector_db_id", ""),
+            "vector_db_id": vector_db_id,
+            "provider_id": f"byok_{vector_db_id}",
             "embedding_model": brag.get("embedding_model", ""),
             "embedding_dimension": brag.get("embedding_dimension"),
         }
     )
```

tests/unit/test_llama_stack_configuration.py (1)
118-183: Good test coverage for generate_configuration, but missing tests for other functions. The tests effectively cover generate_configuration scenarios. However, as noted in a past review, the module is missing tests for:

- setup_azure_entra_id_token - token generation, skip conditions, error handling
- enrich_byok_rag - empty list handling, config modification
- main() - CLI argument parsing
setup_azure_entra_id_token- token generation, skip conditions, error handlingenrich_byok_rag- empty list handling, config modificationmain()- CLI argument parsingPer retrieved learnings, unit tests require 60% code coverage.
Suggested test additions for setup_azure_entra_id_token
```python
import pytest


def test_setup_azure_entra_id_token_skips_when_env_set(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    """Test skips token generation when AZURE_API_KEY already set."""
    monkeypatch.setenv("AZURE_API_KEY", "existing_token")
    # Should return early without error
    from llama_stack_configuration import setup_azure_entra_id_token

    setup_azure_entra_id_token(
        {"tenant_id": "t", "client_id": "c", "client_secret": "s"}, ".env"
    )


def test_setup_azure_entra_id_token_skips_when_config_none() -> None:
    """Test skips when azure_config is None."""
    from llama_stack_configuration import setup_azure_entra_id_token

    setup_azure_entra_id_token(None, ".env")  # Should not raise


def test_setup_azure_entra_id_token_warns_on_missing_fields(
    caplog: pytest.LogCaptureFixture,
) -> None:
    """Test warns when required fields are missing."""
    from llama_stack_configuration import setup_azure_entra_id_token

    setup_azure_entra_id_token({"tenant_id": "t"}, ".env")
    assert "Missing required fields" in caplog.text
```
🧹 Nitpick comments (5)
src/authorization/azure_token_manager.py (1)
87-102: Consider logging the exception details for auth failures. The warning log on line 101 doesn't include the exception details, which could make debugging authentication issues harder in production.
Proposed improvement
```diff
 except (ClientAuthenticationError, CredentialUnavailableError):
-    logger.warning("Failed to retrieve Azure access token")
+    logger.warning("Failed to retrieve Azure access token", exc_info=True)
     return None
```

src/app/endpoints/query.py (1)
72-73: Import placement inconsistent with other imports. The AzureEntraIDManager import is placed after the imports from utils.types, breaking the import organization pattern. Consider moving it to group with the other authorization imports.

Suggested placement
```diff
 from authentication import get_auth_dependency
 from authentication.interface import AuthTuple
 from authorization.middleware import authorize
+from authorization.azure_token_manager import AzureEntraIDManager
 from client import AsyncLlamaStackClientHolder
```

src/app/endpoints/streaming_query.py (1)
894-915: Code duplication with query.py - consider extracting a shared helper. This Azure token refresh logic is nearly identical to lines 317-338 in query.py. Consider extracting it into a shared helper function to reduce duplication and ensure consistent behavior.

The same str(azure_config.get("api_base")) concern applies here as noted in the query.py review.

Suggested refactor: Extract shared helper
Create a helper in src/authorization/azure_token_manager.py or a shared utils module:

```python
async def refresh_azure_token_and_update_client(
    client: AsyncLlamaStackClient,
    client_holder: AsyncLlamaStackClientHolder,
) -> AsyncLlamaStackClient:
    """Refresh Azure token and update client if needed.

    Returns:
        Updated client instance.
    """
    if not (
        AzureEntraIDManager().is_entra_id_configured
        and AzureEntraIDManager().is_token_expired
        and AzureEntraIDManager().refresh_token()
    ):
        return client
    if client_holder.is_library_client:
        return await client_holder.reload_library_client()
    azure_config = next(
        p.config
        for p in await client.providers.list()
        if p.provider_type == "remote::azure"
    )
    return client_holder.update_provider_data(
        {
            "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
            "azure_api_base": str(azure_config.get("api_base", "")),
        }
    )
```

Then use it in both endpoints:
```python
if provider_id == "azure":
    client = await refresh_azure_token_and_update_client(
        client, AsyncLlamaStackClientHolder()
    )
```

src/llama_stack_configuration.py (1)
216-234: In-place modification is documented but contradicts coding guidelines. The function modifies ls_config in place, which is documented in the docstring. However, per coding guidelines, returning a new data structure is preferred. Consider either:
- Returning the modified config (preferred per guidelines)
- Adding an explicit note that this is intentional for performance/integration reasons
Optional: Return modified config instead
```diff
-def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> None:
-    """Enrich Llama Stack config with BYOK RAG settings.
-
-    Args:
-        ls_config: Llama Stack configuration dict (modified in place)
-        byok_rag: List of BYOK RAG configurations
-    """
+def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> dict[str, Any]:
+    """Enrich Llama Stack config with BYOK RAG settings.
+
+    Args:
+        ls_config: Llama Stack configuration dict
+        byok_rag: List of BYOK RAG configurations
+
+    Returns:
+        Enriched configuration dict
+    """
     if len(byok_rag) == 0:
         logger.info("BYOK RAG is not configured: skipping")
-        return
+        return ls_config
     logger.info("Enriching Llama Stack config with BYOK RAG")
     ls_config["vector_dbs"] = construct_vector_dbs_section(ls_config, byok_rag)
     if "providers" not in ls_config:
         ls_config["providers"] = {}
     ls_config["providers"]["vector_io"] = construct_vector_io_providers_section(
         ls_config, byok_rag
     )
+    return ls_config
```

tests/unit/test_llama_stack_configuration.py (1)
81-111: Consider adding a merge test for consistency. The construct_vector_dbs_section tests include a merge test (test_construct_vector_dbs_section_merge), but the construct_vector_io_providers_section tests don't have an equivalent. Consider adding one for consistency.

Suggested: Add merge test
```python
def test_construct_vector_io_providers_section_merge() -> None:
    """Test merges existing and new entries."""
    ls_config = {"providers": {"vector_io": [{"provider_id": "existing"}]}}
    byok_rag = [{"vector_db_id": "new_db"}]
    output = construct_vector_io_providers_section(ls_config, byok_rag)
    assert len(output) == 2
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- docs/providers.md
- pyproject.toml
- scripts/llama-stack-entrypoint.sh
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/app/main.py
- src/authorization/azure_token_manager.py
- src/llama_stack_configuration.py
- test.containerfile
- tests/unit/authorization/test_azure_token_manager.py
- tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/unit/authorization/test_azure_token_manager.py
- scripts/llama-stack-entrypoint.sh
- docs/providers.md
- pyproject.toml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use the logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...
Files:
src/app/endpoints/query.py
src/app/endpoints/streaming_query.py
src/llama_stack_configuration.py
src/authorization/azure_token_manager.py
src/app/main.py
tests/unit/test_llama_stack_configuration.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py
src/app/endpoints/streaming_query.py
src/app/main.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use the MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests
Files:
tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion
Files:
tests/unit/test_llama_stack_configuration.py
🧠 Learnings (8)
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.
Applied to files:
test.containerfile
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.
Applied to files:
src/app/endpoints/streaming_query.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data
Applied to files:
src/authorization/azure_token_manager.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage
Applied to files:
tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion
Applied to files:
tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage
Applied to files:
tests/unit/test_llama_stack_configuration.py
🧬 Code graph analysis (5)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (5)
AzureEntraIDManager (22-102), is_entra_id_configured (45-47), is_token_expired (50-52), refresh_token (59-76), access_token (55-57)
src/client.py (3)
is_library_client (28-30), reload_library_client (117-131), update_provider_data (133-164)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
AzureEntraIDManager (22-102), is_entra_id_configured (45-47), is_token_expired (50-52), refresh_token (59-76), access_token (55-57)
src/client.py (4)
AsyncLlamaStackClientHolder (21-164), is_library_client (28-30), reload_library_client (117-131), update_provider_data (133-164)
src/llama_stack_configuration.py (1)
src/models/config.py (3)
config (330-347), config (1361-1369), dump (1701-1713)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
configuration (95-106)
src/models/config.py (1)
AzureEntraIdConfiguration (1537-1548)
src/utils/types.py (1)
Singleton(43-57)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
generate_configuration (242-270), construct_vector_dbs_section (118-160), construct_vector_io_providers_section (163-213)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (18)
test.containerfile (2)
6-6: LGTM: Azure Identity dependency added. The azure-identity package is correctly added to support the new Azure Entra ID token management feature.

21-22: LGTM: Entrypoint correctly configured. The entrypoint is set to the enrichment script which will handle configuration enrichment and token generation before starting the Llama Stack service.
src/authorization/azure_token_manager.py (5)
1-20: LGTM: Module setup follows coding guidelines.
- Module docstring present
- Logger pattern logging.getLogger(__name__) used correctly
- Absolute imports used
- Type imports properly organized
- Constant TOKEN_EXPIRATION_LEEWAY appropriately defined at module level
22-47: LGTM: Singleton class with proper configuration management. The class correctly implements the singleton pattern via metaclass and provides clear configuration lifecycle management with set_config and is_entra_id_configured.
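The metaclass-based singleton lives in src/utils/types.py; the general pattern can be sketched as follows (TokenManager is a hypothetical stand-in for AzureEntraIDManager):

```python
from typing import Any


class Singleton(type):
    """Metaclass: every construction call returns the same cached instance."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class TokenManager(metaclass=Singleton):
    """Illustrative stand-in for the token manager."""

    def __init__(self) -> None:
        self.expires_on: float = 0.0


first = TokenManager()
second = TokenManager()
first.expires_on = 1234.0  # visible through every "instance"
```

Because __call__ caches the first instance per class, every call site (endpoints, startup hooks, tests) observes the same token state.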
49-57: Potential race condition when checking expiration and reading the token from different sources. is_token_expired checks _expires_on (instance state) while access_token reads from os.environ["AZURE_API_KEY"]. In concurrent scenarios, these could become inconsistent if another thread updates the env var between checking expiration and reading the token. Since this is a singleton used across requests in an async context, consider whether this inconsistency could cause issues. If library mode reloads the client (which re-reads env vars), this might be acceptable.
59-76: LGTM: Token refresh with boolean return. The method correctly returns True/False to indicate success, allowing callers to handle failures appropriately. This addresses the previous review concern about silent failures.

78-85: Token stored in environment variable - verify thread safety. Using os.environ["AZURE_API_KEY"] as storage works for the intended use case (env var injection for Llama Stack), but be aware that environment variable updates are not atomic. In high-concurrency scenarios with multiple simultaneous refreshes, this could theoretically cause brief inconsistencies. Based on learnings, the environment variable approach is intentional as the single source of truth for Azure token storage.
src/app/main.py (2)
12-12: LGTM: Import follows project conventions. Absolute import from authorization.azure_token_manager correctly added.
43-51: LGTM: Startup token refresh with graceful degradation. The implementation correctly:
- Checks if Azure Entra ID is configured before attempting initialization
- Uses set_config to provide configuration to the singleton
- Checks the refresh_token() return value and logs a warning on failure
- Allows the service to continue (will retry on next Azure request)
This addresses the previous review concern about silent failures at startup.
src/app/endpoints/streaming_query.py (1)
52-52: LGTM: Import correctly placed with authorization imports.

src/app/endpoints/query.py (1)
317-338: Code is correct; api_base is guaranteed by Llama Stack schema validation. The short-circuit pattern with and AzureEntraIDManager().refresh_token() is clean and prevents invalid token updates. Regarding the api_base concern: according to Lightspeed's documentation, the Azure provider config must include api_key, api_base, and api_version as required fields for Llama Stack schema validation; Llama Stack fails to start if any are missing. By the time line 335 executes, api_base is guaranteed to be present, so str(azure_config.get("api_base")) is safe and will not produce "None".

src/llama_stack_configuration.py (4)
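The short-circuit chain praised above can be demonstrated with stub objects (StubManager is hypothetical; the real manager talks to Azure):

```python
class StubManager:
    """Stand-in exposing the same three members the chain uses."""

    def __init__(self, configured: bool, expired: bool, refresh_ok: bool) -> None:
        self.is_entra_id_configured = configured
        self.is_token_expired = expired
        self._refresh_ok = refresh_ok
        self.refresh_calls = 0

    def refresh_token(self) -> bool:
        self.refresh_calls += 1
        return self._refresh_ok


def client_needs_update(mgr: StubManager) -> bool:
    # refresh_token() only runs when configured AND expired; the chain is
    # True only when the refresh actually succeeded.
    return (
        mgr.is_entra_id_configured
        and mgr.is_token_expired
        and mgr.refresh_token()
    )


fresh = StubManager(configured=True, expired=False, refresh_ok=True)
stale = StubManager(configured=True, expired=True, refresh_ok=True)
failing = StubManager(configured=True, expired=True, refresh_ok=False)
```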
1-20: LGTM! Module docstring is clear, imports are properly organized with standard library first, then third-party, and the logger follows the logger = logging.getLogger(__name__) pattern as per coding guidelines.
23-39: LGTM! The custom YAML dumper is well-documented with a clear docstring explaining its purpose and behavior.
242-271: LGTM! The function is well-structured with clear documentation. File errors propagate naturally, and the enrichment flow is logical.
278-317: LGTM! CLI entry point is well-structured with clear argument definitions and defaults. Unhandled exceptions (like FileNotFoundError) will naturally result in a non-zero exit status.

tests/unit/test_llama_stack_configuration.py (3)
9-13: LGTM! Imports are properly organized using absolute imports as per coding guidelines.
28-74: LGTM! Good test coverage for construct_vector_dbs_section with tests for empty inputs, preservation of existing entries, adding new entries, and merging behavior.

127-137: Test fixture file is properly committed. The file tests/configuration/run.yaml exists in the repository and is available for the test to use. The test code correctly references this fixture file.
Description
This PR brings the new functionality of Azure Entra ID authentication.
It also manages the automatic enrichment of the Llama Stack config with the BYOK section from the LCS config, which follows the same principle as the dynamic injection of access tokens.
Since Llama Stack needs a valid Azure API key already at service startup, mechanisms were added to generate the initial access token before the Llama Stack service starts. This works in both library and service mode.
Documentation is also included; it discusses the access-token workflow for both deployment modes in detail and provides example configs for local deployment.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.