
Add Codex OAuth provider #486

Open
JOBOYA wants to merge 2 commits into usestrix:main from JOBOYA:codex/codex-oauth-provider

Conversation


@JOBOYA JOBOYA commented May 12, 2026

Summary

Adds a codex/ LLM provider that lets Strix use the local Codex CLI ChatGPT OAuth login instead of requiring an API key.

Users can run:

```bash
codex login
export STRIX_LLM="codex/gpt-5.5"
```

Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR adds a codex/ LLM provider that authenticates via the local Codex CLI OAuth token stored in ~/.codex/auth.json, allowing users to run Strix without an API key by reusing their ChatGPT login. The implementation covers credential loading, automatic token refresh on 401, OpenAI Responses-API payload building, SSE event parsing, and a bypass of the existing litellm path in LLM._stream.

  • strix/llm/codex_oauth.py: New self-contained module handling all OAuth credential I/O, HTTP requests to chatgpt.com/backend-api/codex, and SSE/JSON response parsing; logic is solid with good error wrapping.
  • strix/llm/llm.py: Adds _stream_codex_oauth as an early-return branch in _stream; permanent auth failures unintentionally fall into the existing retry loop because CodexOAuthError has no status_code, causing ~62 s of backoff before the user sees a helpful message.
  • strix/interface/main.py / strix/llm/config.py: Environment validation and warm-up are cleanly updated to skip API-key requirements for the codex/ prefix.
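The refresh-on-401 flow described above might look roughly like this (a sketch under assumptions: a blocking HTTP client and a hypothetical `refresh_credentials` helper, not the PR's actual API):

```python
import requests


def post_with_refresh(credentials, url, payload):
    """Send one request, refreshing the OAuth token once on a 401."""

    def send(token):
        return requests.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
            stream=True,  # the endpoint answers with an SSE stream
        )

    response = send(credentials.access_token)
    if response.status_code == 401:
        # Access token expired: refresh, persist, and retry once.
        credentials = refresh_credentials(credentials)  # hypothetical helper
        response = send(credentials.access_token)
    return response
```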

Confidence Score: 3/5

The new provider works end-to-end, but a first-time user who hasn't run codex login will wait over a minute through retries before seeing an actionable error message.

The core OAuth and SSE logic in codex_oauth.py is well-written and well-tested. The integration point in llm.py has a concrete defect: CodexOAuthError carries no HTTP status code, so the inherited retry loop treats every error as transient and backs off five times before giving up. This affects the most common first-run failure scenario — forgetting to run codex login — turning an instant, clear error into a 62-second delay.

strix/llm/llm.py — specifically the _stream_codex_oauth method and its interaction with the retry loop in generate().

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/llm/codex_oauth.py | New module implementing Codex CLI OAuth credential loading, token refresh, payload building, and SSE response parsing; logic is sound with good error handling, minor issue with a non-standard `version` request header. |
| strix/llm/llm.py | Adds `_stream_codex_oauth` as a bypass path in `_stream`; permanent `CodexOAuthError` failures inherit the existing retry loop (unintended, ~62 s delay), and the double-yield pattern differs from the regular streaming path in a way that could confuse display consumers. |
| strix/llm/config.py | Cleanly detects the `codex/` prefix and populates `uses_codex_oauth` / `codex_model` on `LLMConfig`; no issues. |
| strix/interface/main.py | Skips API-key validation for Codex OAuth users and runs a warm-up call via the new provider; straightforward and correct. |
| tests/llm/test_codex_oauth.py | Good coverage of credential loading, token refresh persistence, payload conversion, and SSE parsing; all tests are well structured with clear assertions. |
| README.md | Adds documentation for the Codex OAuth login flow; accurate and concise. |

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
strix/llm/llm.py:300-328
**Permanent auth errors trigger full retry backlog**

`CodexOAuthError` has no `status_code` attribute, so `_should_retry` returns `True` for every instance (the `code is None` branch). This means errors like "Codex auth file not found. Run `codex login` first." or "Codex OAuth request was unauthorized. Run `codex login` again." will be retried up to `max_retries` (default 5) times with exponential backoff — producing roughly 62 seconds of silent waiting before the error is finally surfaced to the user. A first-time user who hasn't run `codex login` will sit through the full retry loop every time.
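One concise fix (a sketch, assuming `_should_retry` keys off a `status_code` attribute as described above; the constructor signature here is illustrative, not the PR's actual code):

```python
class CodexOAuthError(Exception):
    """Codex OAuth failure that needs user action (e.g. running `codex login`)."""

    def __init__(self, message: str, status_code: int = 401) -> None:
        super().__init__(message)
        # Carrying a concrete 4xx status lets _should_retry classify the
        # error as permanent instead of hitting its `code is None` branch.
        self.status_code = status_code
```

Alternatively, the retry loop in `generate()` could catch `CodexOAuthError` and re-raise it immediately, skipping the backoff entirely.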

### Issue 2 of 3
strix/llm/llm.py:317-328
**Double yield emits full raw content before processing**

When `content` is non-empty, two `LLMResponse` objects are yielded back-to-back: first the complete raw response (including unstripped `<thinking>` blocks and raw tool-call XML), then the processed version. In the regular `_stream` path, intermediate yields are partial incremental chunks, so the "raw" intermediate content is always shorter than the final. Here the first yield is the full, unprocessed text, which means any consumer that renders each yielded response would flash the complete raw output (with thinking tags) and then immediately overwrite it with the cleaned version.
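A minimal fix (a sketch reusing the names from the snippet under review) is to process before yielding, so only the cleaned response is ever emitted:

```python
if content:
    # Strip thinking blocks and normalize tool calls *before* the first
    # yield, so consumers never see raw <thinking> tags or tool-call XML.
    content = _THINKING_BLOCK_RE.sub("", content)
    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(_truncate_to_first_function(content))

yield LLMResponse(
    content=content,
    tool_invocations=parse_tool_invocations(content),
    thinking_blocks=None,
)
```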

### Issue 3 of 3
strix/llm/codex_oauth.py:205-209
**Non-standard `version` request header**

The `"version": "strix-codex-oauth"` header is not a recognized HTTP header name. Standard practice for identifying a client is `User-Agent`. OpenAI's own SDKs use `User-Agent` for client versioning. Using an arbitrary lowercase key named `version` is unlikely to be read correctly by the server, and if the intent is to mark traffic for observability or routing, it may silently have no effect.

Reviews (1): Last reviewed commit: "Add Codex OAuth provider"

Comment thread strix/llm/llm.py
Comment on lines +300 to +328
```python
async def _stream_codex_oauth(
    self, messages: list[dict[str, Any]]
) -> AsyncIterator[LLMResponse]:
    self._total_stats.requests += 1
    model = self.config.codex_model or self.config.model_name
    content, usage = await asyncio.to_thread(
        complete_codex_oauth,
        model,
        messages,
        self._reasoning_effort,
        self.config.timeout,
    )

    if usage:
        self._total_stats.input_tokens += usage.get("input_tokens", 0)
        self._total_stats.output_tokens += usage.get("output_tokens", 0)

    if content:
        yield LLMResponse(content=content)

    content = _THINKING_BLOCK_RE.sub("", content)
    content = normalize_tool_format(content)
    content = fix_incomplete_tool_call(_truncate_to_first_function(content))

    yield LLMResponse(
        content=content,
        tool_invocations=parse_tool_invocations(content),
        thinking_blocks=None,
    )
```

P1 Permanent auth errors trigger full retry backlog

`CodexOAuthError` has no `status_code` attribute, so `_should_retry` returns `True` for every instance (the `code is None` branch). This means errors like "Codex auth file not found. Run `codex login` first." or "Codex OAuth request was unauthorized. Run `codex login` again." will be retried up to `max_retries` (default 5) times with exponential backoff — producing roughly 62 seconds of silent waiting before the error is finally surfaced to the user. A first-time user who hasn't run `codex login` will sit through the full retry loop every time.


Comment thread strix/llm/llm.py Outdated
Comment on lines +317 to +328
```python
if content:
    yield LLMResponse(content=content)

content = _THINKING_BLOCK_RE.sub("", content)
content = normalize_tool_format(content)
content = fix_incomplete_tool_call(_truncate_to_first_function(content))

yield LLMResponse(
    content=content,
    tool_invocations=parse_tool_invocations(content),
    thinking_blocks=None,
)
```

P2 Double yield emits full raw content before processing

When `content` is non-empty, two `LLMResponse` objects are yielded back-to-back: first the complete raw response (including unstripped `<thinking>` blocks and raw tool-call XML), then the processed version. In the regular `_stream` path, intermediate yields are partial incremental chunks, so the "raw" intermediate content is always shorter than the final. Here the first yield is the full, unprocessed text, which means any consumer that renders each yielded response would flash the complete raw output (with thinking tags) and then immediately overwrite it with the cleaned version.


Comment thread strix/llm/codex_oauth.py Outdated
Comment on lines +205 to +209
```python
headers = {
    "Authorization": f"Bearer {credentials.access_token}",
    "Accept": "text/event-stream",
    "Content-Type": "application/json",
    "version": "strix-codex-oauth",
```

P2 Non-standard version request header

The "version": "strix-codex-oauth" header is not a recognized HTTP header name. Standard practice for identifying a client is User-Agent. OpenAI's own SDKs use User-Agent for client versioning. Using an arbitrary lowercase key named version is unlikely to be read correctly by the server, and if the intent is to mark traffic for observability or routing, it may silently have no effect.


