Add Codex OAuth provider #486
Conversation
Greptile Summary

This PR adds a `codex/` LLM provider that lets Strix use the local Codex CLI ChatGPT OAuth login instead of requiring an API key.
Confidence Score: 3/5

The new provider works end-to-end, but a first-time user who hasn't run `codex login` will sit through the full retry backoff before the auth error surfaces. The core OAuth and SSE logic in strix/llm/llm.py and strix/llm/codex_oauth.py is where the issues below were found.

Important Files Changed
Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
strix/llm/llm.py:300-328
**Permanent auth errors trigger full retry backlog**
`CodexOAuthError` has no `status_code` attribute, so `_should_retry` returns `True` for every instance (the `code is None` branch). This means errors like "Codex auth file not found. Run `codex login` first." or "Codex OAuth request was unauthorized. Run `codex login` again." will be retried up to `max_retries` (default 5) times with exponential backoff — producing roughly 62 seconds of silent waiting before the error is finally surfaced to the user. A first-time user who hasn't run `codex login` will sit through the full retry loop every time.
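A concise resolution (a sketch only, not the project's actual implementation) is to give `CodexOAuthError` a `status_code` and mark permanent auth failures as non-retryable. The `_should_retry` signature and the retryable-status list below are assumptions and may differ from what strix/llm/llm.py really does:

```python
# Sketch: carry an HTTP-style status code on CodexOAuthError so the retry logic
# can distinguish permanent auth failures from transient ones.
class CodexOAuthError(Exception):
    def __init__(self, message: str, status_code: int | None = None) -> None:
        super().__init__(message)
        self.status_code = status_code


def _should_retry(exc: Exception) -> bool:
    # Permanent auth problems (missing auth file, expired login) should surface
    # immediately instead of burning through max_retries with backoff.
    if isinstance(exc, CodexOAuthError) and exc.status_code in (401, 403):
        return False
    code = getattr(exc, "status_code", None)
    if code is None:
        return True  # unknown failures stay retryable, as before
    return code in (408, 429, 500, 502, 503, 504)  # illustrative retryable set
```

Raising the "auth file not found" and "unauthorized" cases with `status_code=401` would then surface them on the first attempt instead of after roughly 62 seconds of backoff.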
### Issue 2 of 3
strix/llm/llm.py:317-328
**Double yield emits full raw content before processing**
When `content` is non-empty, two `LLMResponse` objects are yielded back-to-back: first the complete raw response (including unstripped `<thinking>` blocks and raw tool-call XML), then the processed version. In the regular `_stream` path, intermediate yields are partial incremental chunks, so the "raw" intermediate content is always shorter than the final. Here the first yield is the full, unprocessed text, which means any consumer that renders each yielded response would flash the complete raw output (with thinking tags) and then immediately overwrite it with the cleaned version.
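A minimal fix (a sketch reusing the helper names from the diff, not necessarily the intended design) is to normalize first and yield only the processed response; whether an intermediate raw yield is still useful depends on how consumers render the stream:

```python
# Sketch: replacement for the tail of _stream_codex_oauth (llm.py lines 317-328).
# Strip thinking blocks and normalize tool calls before a single yield, so the
# raw content (with <thinking> tags and raw tool-call XML) is never emitted.
if content:
    content = _THINKING_BLOCK_RE.sub("", content)
    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(_truncate_to_first_function(content))

    yield LLMResponse(
        content=content,
        tool_invocations=parse_tool_invocations(content),
        thinking_blocks=None,
    )
```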
### Issue 3 of 3
strix/llm/codex_oauth.py:205-209
**Non-standard `version` request header**
The `"version": "strix-codex-oauth"` header is not a recognized HTTP header name. Standard practice for identifying a client is `User-Agent`. OpenAI's own SDKs use `User-Agent` for client versioning. Using an arbitrary lowercase key named `version` is unlikely to be read correctly by the server, and if the intent is to mark traffic for observability or routing, it may silently have no effect.
Reviews (1): Last reviewed commit: "Add Codex OAuth provider"
strix/llm/llm.py (lines 300-328):

```python
async def _stream_codex_oauth(
    self, messages: list[dict[str, Any]]
) -> AsyncIterator[LLMResponse]:
    self._total_stats.requests += 1
    model = self.config.codex_model or self.config.model_name
    content, usage = await asyncio.to_thread(
        complete_codex_oauth,
        model,
        messages,
        self._reasoning_effort,
        self.config.timeout,
    )

    if usage:
        self._total_stats.input_tokens += usage.get("input_tokens", 0)
        self._total_stats.output_tokens += usage.get("output_tokens", 0)

    if content:
        yield LLMResponse(content=content)

        content = _THINKING_BLOCK_RE.sub("", content)
        content = normalize_tool_format(content)
        content = fix_incomplete_tool_call(_truncate_to_first_function(content))

        yield LLMResponse(
            content=content,
            tool_invocations=parse_tool_invocations(content),
            thinking_blocks=None,
        )
```
**Permanent auth errors trigger full retry backlog**

`CodexOAuthError` has no `status_code` attribute, so `_should_retry` returns `True` for every instance (the `code is None` branch). This means errors like "Codex auth file not found. Run `codex login` first." or "Codex OAuth request was unauthorized. Run `codex login` again." will be retried up to `max_retries` (default 5) times with exponential backoff — producing roughly 62 seconds of silent waiting before the error is finally surfaced to the user. A first-time user who hasn't run `codex login` will sit through the full retry loop every time.
strix/llm/llm.py (lines 317-328):

```python
if content:
    yield LLMResponse(content=content)

    content = _THINKING_BLOCK_RE.sub("", content)
    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(_truncate_to_first_function(content))

    yield LLMResponse(
        content=content,
        tool_invocations=parse_tool_invocations(content),
        thinking_blocks=None,
    )
```
**Double yield emits full raw content before processing**

When `content` is non-empty, two `LLMResponse` objects are yielded back-to-back: first the complete raw response (including unstripped `<thinking>` blocks and raw tool-call XML), then the processed version. In the regular `_stream` path, intermediate yields are partial incremental chunks, so the "raw" intermediate content is always shorter than the final. Here the first yield is the full, unprocessed text, which means any consumer that renders each yielded response would flash the complete raw output (with thinking tags) and then immediately overwrite it with the cleaned version.
strix/llm/codex_oauth.py (lines 205-209):

```python
headers = {
    "Authorization": f"Bearer {credentials.access_token}",
    "Accept": "text/event-stream",
    "Content-Type": "application/json",
    "version": "strix-codex-oauth",
```
**Non-standard `version` request header**

The `"version": "strix-codex-oauth"` header is not a recognized HTTP header name. Standard practice for identifying a client is `User-Agent`. OpenAI's own SDKs use `User-Agent` for client versioning. Using an arbitrary lowercase key named `version` is unlikely to be read correctly by the server, and if the intent is to mark traffic for observability or routing, it may silently have no effect.
Summary

Adds a `codex/` LLM provider that lets Strix use the local Codex CLI ChatGPT OAuth login instead of requiring an API key. Users can run `codex login` and then point Strix at a `codex/` model.