fix(rlsapi): use chat completions API instead of Agent abstraction #969
base: main
Conversation
Hi @major. Thanks for your PR. I'm waiting for a lightspeed-core member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Walkthrough

Replaces the temporary-agent/stateless inference path in the RLS API v1 endpoint with a direct chat completions call (`client.chat.completions.create()`).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant Endpoint as RLSAPI v1 Endpoint
    participant SDK as LLM SDK Client
    participant LLM as Chat Completions Service

    Client->>Endpoint: POST /infer (payload)
    Endpoint->>Endpoint: build sys_msg, user_msg
    Endpoint->>SDK: client.chat.completions.create(messages=[sys_msg,user_msg])
    SDK->>LLM: relay chat completion request
    alt success
        LLM-->>SDK: response (choices -> message -> content)
        SDK-->>Endpoint: response
        Endpoint->>Client: 200 { content: "<raw string>", request_id, ... }
    else error or empty
        LLM-->>SDK: error or empty choices
        SDK-->>Endpoint: error/empty
        Endpoint->>Client: 200 { content: "", request_id, ... } or error handling
    end
```
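For orientation, a minimal sketch of the inference path the diagram describes might look like the following (function and parameter names are simplified from the review comments below; this is illustrative, not the PR's exact code):

```python
from llama_stack_client import AsyncLlamaStackClient


async def retrieve_simple_response(
    client: AsyncLlamaStackClient, model_id: str, system_prompt: str, query: str
) -> str:
    """Single-turn inference via the chat completions API (no agent/session)."""
    response = await client.chat.completions.create(
        model=model_id,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )

    # Mirror the defensive handling discussed in the review: an empty or
    # malformed response yields "" rather than raising.
    if not response.choices:
        return ""
    content = getattr(response.choices[0].message, "content", None)
    return str(content) if content is not None else ""
```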
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/rlsapi_v1.py (2)
82-128: Refactoring successfully avoids the Agent API compatibility issues, but integration tests are outdated.

The migration from the agent-based API to `client.chat.completions.create()` is an excellent solution that sidesteps the `initialize()` issues mentioned in the PR motivation. The `chat.completions` API is available in llama-stack-client 0.3.5 (confirmed by imports from `llama_stack_client.types.chat.completion_create_params`), and unit tests validate the new implementation.

However, the integration tests in `tests/integration/endpoints/test_rlsapi_v1_integration.py` are now outdated. They still mock `get_temp_agent` and the agent-based flow, while the actual endpoint code no longer uses the Agent API. These tests should be updated to mock `client.chat.completions.create()` instead to reflect the current implementation.
1-191: Update test fixtures to match the new chat.completions API refactoring.

The rlsapi_v1 endpoint was refactored to use the Llama Stack chat completions API directly instead of the agent-based approach. The test fixtures must be updated to reflect this:

- Remove the patch for `get_temp_agent` (not used in the refactored code)
- Configure `mock_client.chat.completions.create` to return a properly structured response with `response.choices[0].message.content` containing the LLM response text
- Update the `mock_empty_llm_response` fixture similarly to test empty/None content handling

The test structure should mock the chat completions call directly rather than the agent turn-based API; see the sketch below.
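A rough sketch of what such an updated fixture could look like, assuming the patch target and fixture name shown here (both illustrative, not the repository's actual test code):

```python
import pytest
from pytest_mock import MockerFixture


@pytest.fixture(name="mock_chat_completion")
def mock_chat_completion_fixture(mocker: MockerFixture):
    """Mock client.chat.completions.create() instead of get_temp_agent."""
    mock_response = mocker.Mock()
    mock_response.choices = [mocker.Mock()]
    mock_response.choices[0].message.content = "mocked LLM answer"

    mock_client = mocker.Mock()
    mock_client.chat.completions.create = mocker.AsyncMock(return_value=mock_response)

    # Patch target assumed from the review discussion above.
    holder = mocker.patch("app.endpoints.rlsapi_v1.AsyncLlamaStackClientHolder")
    holder.return_value.get_client.return_value = mock_client
    return mock_client
```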
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)
116-128: Consider simplifying defensive null checks.

The defensive pattern using `getattr()` is safe and explicit, which is appropriate given the library compatibility concerns mentioned in the PR. However, if you prefer more concise code, you could use exception handling instead:

```python
try:
    return str(response.choices[0].message.content)
except (IndexError, AttributeError):
    return ""
```

This is purely stylistic; the current implementation is clear and correct.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/app/endpoints/rlsapi_v1.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/**/*.py`: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/rlsapi_v1.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/rlsapi_v1.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/rlsapi_v1.py
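Taken together, the two guidelines above suggest a pattern along these lines (a hedged sketch assuming an OpenAI-style client; not the file's actual code):

```python
import logging

from fastapi import HTTPException, status
from llama_stack_client import APIConnectionError

logger = logging.getLogger(__name__)  # module logger pattern from the guidelines


async def complete_or_503(client, model_id: str, messages: list[dict]) -> str:
    """Run a chat completion, converting connection failures into HTTP 503."""
    try:
        response = await client.chat.completions.create(
            model=model_id, messages=messages
        )
    except APIConnectionError as exc:
        logger.error("Unable to reach Llama Stack: %s", exc)
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Unable to connect to Llama Stack",
        ) from exc
    return str(response.choices[0].message.content) if response.choices else ""
```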
🪛 GitHub Actions: Integration tests
src/app/endpoints/rlsapi_v1.py
[error] 1-1: AttributeError: module 'app.endpoints.rlsapi_v1' does not have the attribute 'get_temp_agent'. Tests patch this symbol during setup but it is missing in the module. Command: 'uv run pytest tests/integration --cov=src --cov=runner --cov-report term-missing'.
🪛 GitHub Actions: Unit tests
src/app/endpoints/rlsapi_v1.py
[error] 1-1: AttributeError: module app.endpoints.rlsapi_v1 does not have the attribute 'get_temp_agent'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
- GitHub Check: build-pr
🔇 Additional comments (3)
src/app/endpoints/rlsapi_v1.py (3)
8-15: LGTM! Import changes support the new chat completions approach.

The addition of the `MessageOpenAISystemMessageParam` and `MessageOpenAIUserMessageParam` types is appropriate for constructing properly-typed chat messages.
49-79: LGTM! Clean helper function with proper error handling.

The function correctly constructs the model identifier and handles missing configuration gracefully with appropriate error messages and status codes.
131-191: LGTM! Endpoint properly integrates the new chat completions flow.

The endpoint correctly handles authentication, authorization, error cases, and empty responses. The error handling for `APIConnectionError` follows the coding guidelines, converting it to an appropriate 503 `ServiceUnavailableResponse`.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (2)
45-78: Consider adding `spec` to mock objects for better test reliability.

The mock objects are created without `spec` or `spec_set` parameters. While this provides flexibility, it means attribute access typos won't be caught during testing. Consider using `spec` with the actual llama-stack-client types to ensure the mocks accurately reflect the real API surface.

Example: Adding spec to mocks

```python
from llama_stack_client.types import CompletionMessage, ChatCompletionMessageParam

# Create mock message with spec
mock_message = mocker.Mock(spec=CompletionMessage)
mock_message.content = "This is a test LLM response."
```

Note: You may need to import the appropriate types from `llama-stack-client` to use as specs.
81-123: Reduce duplication across mock fixtures.

The three mock fixtures (`mock_llm_response`, `mock_empty_llm_response`, `mock_api_connection_error`) share substantial setup code. Consider extracting the common client/chat/completions mock structure into a helper function that accepts the desired `create()` behavior as a parameter.

Example: Helper function to reduce duplication

```python
def _setup_chat_completions_mock(mocker: MockerFixture, create_behavior):
    """Helper to set up chat.completions mock with custom create() behavior."""
    mock_completions = mocker.Mock()
    mock_completions.create = create_behavior

    mock_chat = mocker.Mock()
    mock_chat.completions = mock_completions

    mock_client = mocker.Mock()
    mock_client.chat = mock_chat

    mock_client_holder = mocker.Mock()
    mock_client_holder.get_client.return_value = mock_client

    mocker.patch(
        "app.endpoints.rlsapi_v1.AsyncLlamaStackClientHolder",
        return_value=mock_client_holder,
    )
    return mock_client


@pytest.fixture(name="mock_llm_response")
def mock_llm_response_fixture(mocker: MockerFixture) -> None:
    mock_message = mocker.Mock()
    mock_message.content = "This is a test LLM response."

    mock_choice = mocker.Mock()
    mock_choice.message = mock_message

    mock_response = mocker.Mock()
    mock_response.choices = [mock_choice]

    _setup_chat_completions_mock(
        mocker, mocker.AsyncMock(return_value=mock_response)
    )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/unit/app/endpoints/test_rlsapi_v1.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/{unit,integration}/**/*.py`: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
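As a tiny illustration of this guideline (requires pytest-asyncio for the async test; the mocked callable is hypothetical):

```python
import pytest
from pytest_mock import MockerFixture


@pytest.mark.asyncio
async def test_async_create_is_mocked(mocker: MockerFixture) -> None:
    """Demonstrates pytest-mock's AsyncMock for awaitable call sites."""
    fake_create = mocker.AsyncMock(return_value="ok")
    assert await fake_create(model="m", messages=[]) == "ok"
    fake_create.assert_awaited_once()
```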
🧠 Learnings (2)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
tests/unit/app/endpoints/test_rlsapi_v1.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
tests/unit/app/endpoints/test_rlsapi_v1.py
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
src/client.py (1)
`get_client` (70-84)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
- GitHub Check: build-pr
🔇 Additional comments (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (2)
218-219: Good defensive assertions for request_id validation.

The added assertions ensure that `request_id` is always present and properly formatted using `check_suid`. This improves test coverage and validates the stateless request tracking behavior.
108-110: The `APIConnectionError(request=mocker.Mock())` instantiation is correct and matches the expected constructor signature for llama-stack-client 0.3.5. The `request` parameter is accepted by the constructor, and this pattern is used consistently across the test suite (e.g., in `test_info.py`, `test_query_v2.py`, `test_shields.py`, and others). Both `mocker.Mock()` and `None` are valid values for the `request` parameter in this codebase.
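For reference, the instantiation under discussion looks roughly like this (using a real `httpx.Request` as one alternative to `mocker.Mock()`; illustrative only):

```python
import httpx
from llama_stack_client import APIConnectionError

# The constructor's `request` parameter accepts a mock or a real request object.
err = APIConnectionError(
    request=httpx.Request("POST", "http://localhost/v1/chat/completions")
)
```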
Switch the /v1/infer endpoint from using the AsyncAgent abstraction to the direct chat.completions.create() API.

The AsyncAgent.create_turn() method internally calls self.initialize(), which does not exist in llama-stack-client 0.3.5, causing an AttributeError at runtime.

The chat completions API is a better fit for the stateless /infer endpoint:

- Simpler for single-turn inference (no session management)
- Avoids the broken Agent abstraction
- Fewer moving parts

Test fixtures updated with:

- spec=OpenAIChatCompletion/OpenAIChatCompletionChoice for type safety
- Extracted common mock setup into a helper to reduce duplication

Signed-off-by: Major Hayden <major@redhat.com>
Force-pushed fbdbcd7 to 4c46cea (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)
1-191: Critical: Integration tests failing – must be updated to match new implementation.The integration tests in
tests/integration/endpoints/test_rlsapi_v1_integration.pyattempt to patchget_temp_agentat lines 106 and 241, but this function is not used in the refactoredinfer_endpoint. The endpoint now usesAsyncLlamaStackClientHolder().get_client()with directchat.completions.create()calls instead of the agent-based approach.The test fixtures must be updated to mock the chat completions client and response structure rather than the agent and its turn output.
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)
102-128: Consider using direct attribute access instead of the getattr() pattern.

The defensive `getattr()` checks (lines 120, 124) and `str()` conversion (line 128) suggest uncertainty about the response structure. Consider:

- Import the `OpenAIChatCompletion` type (available in `llama_stack_client.types.chat.completion_create_response`)
- Add a type hint: `response: OpenAIChatCompletion = await client.chat.completions.create(...)`
- Replace `getattr()` with direct attribute access: `choice.message.content`
- Let typed access raise AttributeError if the structure is invalid, or handle specific exceptions explicitly

This improves type safety and makes the code more maintainable. The current pattern obscures whether we're guarding against None values, missing attributes, or API inconsistencies.
♻️ Proposed refactoring with proper types

Add the import at the top of the file:

```python
from llama_stack_client.types.chat.completion_create_response import (
    OpenAIChatCompletion,
)
```

Then refactor the response handling:

```diff
-    response = await client.chat.completions.create(
+    response: OpenAIChatCompletion = await client.chat.completions.create(
         model=model_id,
         messages=[sys_msg, user_msg],
     )

     if not response.choices:
         return ""

     choice = response.choices[0]
-    message = getattr(choice, "message", None)
-    if message is None:
+    if not hasattr(choice, "message") or choice.message is None:
         return ""
-    content = getattr(message, "content", None)
-    if content is None:
+    if not hasattr(choice.message, "content") or choice.message.content is None:
         return ""
-    return str(content)
+    return choice.message.content
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/app/endpoints/rlsapi_v1.py
tests/unit/app/endpoints/test_rlsapi_v1.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/**/*.py`: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/rlsapi_v1.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/rlsapi_v1.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/rlsapi_v1.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/{unit,integration}/**/*.py`: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
🧠 Learnings (2)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
tests/unit/app/endpoints/test_rlsapi_v1.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
tests/unit/app/endpoints/test_rlsapi_v1.py
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
src/client.py (1)
`get_client` (70-84)
🪛 GitHub Actions: Integration tests
src/app/endpoints/rlsapi_v1.py
[error] 1-1: AttributeError during tests: module 'app.endpoints.rlsapi_v1' does not have the attribute 'get_temp_agent'. This caused multiple integration tests (pytest) to fail when patching get_temp_agent. Command: 'uv run pytest tests/integration --cov=src --cov=runner --cov-report term-missing'. Ensure get_temp_agent is defined or correctly imported in rlsapi_v1.py.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
🔇 Additional comments (9)
src/app/endpoints/rlsapi_v1.py (2)
8-8: LGTM! Missing import now properly added.

The `Any` type is used in the existing `infer_responses` dictionary (line 38: `dict[int | str, dict[str, Any]]`) but was not previously imported.
12-15: LGTM! Proper type hints for message parameters.

The imports provide type safety for the system and user message construction used in the direct chat completions API call.
tests/unit/app/endpoints/test_rlsapi_v1.py (7)
6-14: LGTM! Proper imports for the new mocking strategy.

The imports support the refactored mocking approach:

- `Any` is used for the flexible `create_behavior` parameter (line 40)
- `OpenAIChatCompletion` and `OpenAIChatCompletionChoice` provide proper specs for mock objects
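A sketch of the spec-based construction this refers to (the import path is taken from the earlier review comment; treat the exact module, particularly for the choice type, as an assumption):

```python
from pytest_mock import MockerFixture
from llama_stack_client.types.chat.completion_create_response import (
    OpenAIChatCompletion,
    OpenAIChatCompletionChoice,
)


def build_specced_response(mocker: MockerFixture):
    """spec= makes attribute typos fail the test instead of passing silently."""
    choice = mocker.Mock(spec=OpenAIChatCompletionChoice)
    choice.message = mocker.Mock()
    choice.message.content = "This is a test LLM response."

    response = mocker.Mock(spec=OpenAIChatCompletion)
    response.choices = [choice]
    return response
```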
40-62: Consider adding a return type hint for completeness.

The helper function works correctly and properly mocks the entire chain. Consider adding an explicit `-> None` return type hint to match the coding guidelines requiring complete type annotations.

Actually, looking more carefully, line 40 already has `-> None`, so this is already complete. No change needed.
40-62: LGTM! Well-structured mocking helper.

The function correctly mocks the entire chain from `AsyncLlamaStackClientHolder` down to `chat.completions.create()`, with proper type hints and documentation. The use of a regular `Mock()` for synchronous intermediate objects while accepting a flexible `create_behavior` for the async `create()` method is appropriate.
76-87: LGTM! Fixture properly updated for the chat completions API.

The mock correctly constructs the response structure (`response.choices[0].message.content`) with appropriate specs for type safety. The removal of the `prepare_agent_mocks` parameter aligns with the elimination of the agent-based approach.
91-96: LGTM! Properly tests empty response handling.

The fixture correctly simulates an empty `choices` list to test the defensive check in `retrieve_simple_response`.
100-105: LGTM! Properly tests API connection error handling.

The fixture correctly simulates `APIConnectionError` through the chat completions mock, testing the error handling path that returns a 503 response.
200-200: LGTM! Good defensive assertion.

Adding an explicit check for `request_id is not None` before validating the SUID format improves test clarity and error messages.
Description
Switch the `/v1/infer` endpoint from using the `AsyncAgent` abstraction to the direct `chat.completions.create()` API.

The `AsyncAgent.create_turn()` method internally calls `self.initialize()`, which does not exist in llama-stack-client 0.3.5, causing an `AttributeError` at runtime.

The chat completions API is a better fit for the stateless `/infer` endpoint because:

- It is simpler for single-turn inference (no session management)
- It avoids the broken Agent abstraction
- It has fewer moving parts

Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
- Tested `/v1/infer` with the rh-identity auth header
- Confirmed no `AttributeError` at runtime

Example test:
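The original example snippet did not survive the page extraction; a hypothetical stand-in (URL, port, header contents, and payload shape are all assumptions) might look like:

```python
import base64
import json

import httpx

# Placeholder x-rh-identity header; real contents depend on the deployment.
identity = base64.b64encode(
    json.dumps({"identity": {"user": {"username": "test"}}}).encode()
).decode()

resp = httpx.post(
    "http://localhost:8080/v1/infer",
    headers={"x-rh-identity": identity},
    json={"query": "Hello"},
)
print(resp.status_code, resp.json())
```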
Summary by CodeRabbit
Refactor
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.