Add a dedicated OpenAI-compatible LLM adapter #1895

Open

jimmyzhuu wants to merge 10 commits into Zipstack:main from jimmyzhuu:codex/openai-compatible-llm-adapter

Conversation

@jimmyzhuu jimmyzhuu commented Apr 4, 2026

Summary

This PR adds a dedicated OpenAI Compatible LLM adapter for OpenAI-style chat completion endpoints that are not the official OpenAI service.

The implementation is intentionally small in scope:

  • adds a new OpenAI Compatible LLM adapter backed by LiteLLM's custom_openai path
  • keeps the existing OpenAI adapter unchanged
  • adds adapter registration and JSON schema
  • adds focused tests for registration, model normalization, schema loading, and usage recording
  • documents the new adapter in the README

Why

Users may already have access to OpenAI-compatible endpoints behind a private gateway or third-party provider, but the current OpenAI adapter is specifically shaped around official OpenAI semantics.

Using a separate adapter keeps those semantics explicit and avoids broadening the meaning of the existing OpenAI adapter.
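
For orientation, a minimal sketch of the routing this adapter relies on, assuming LiteLLM's custom_openai provider path as described in this PR; the endpoint URL and model name below are hypothetical, and this is not the adapter's actual code.

```python
# Minimal sketch: the adapter validates metadata and prefixes the configured
# model before handing off to LiteLLM roughly like this.
import litellm

response = litellm.completion(
    model="custom_openai/gateway-model",            # adapter prefixes the configured model name
    api_base="https://llm-gateway.example.com/v1",  # hypothetical OpenAI-compatible endpoint
    api_key=None,                                   # optional; blank keys are treated as unset
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```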

Refs #1894
Refs #856
Refs #1443

Scope

This PR is limited to:

  • LLM only
  • no embedding changes
  • no x2text / OCR changes
  • no changes to the existing OpenAI adapter behavior

Notes

LLM._record_usage now prefers provider-reported prompt_tokens when they are present in the usage payload.

If prompt_tokens are missing, _record_usage still falls back to LiteLLM token estimation. If that estimation raises, it now logs a warning and records 0 prompt tokens for usage audit instead of bubbling the exception up after a successful LLM call.

This behavior change is in the shared _record_usage path, so it applies to every SDK1 LLM adapter that uses it, not just custom_openai.

This keeps successful LLM calls from failing at the usage-audit / billing step while preserving the current pricing semantics in this PR.
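
To make the described order concrete, a few illustrative cases (values are hypothetical; the exact recording code lives in _record_usage and is quoted in the review threads below):

```python
# How the recorded prompt-token count is resolved under the new behavior
# (illustrative only; field names mirror the LiteLLM usage payload).
usage_reported = {"prompt_tokens": 42, "completion_tokens": 7, "total_tokens": 49}
# -> records 42: the provider-reported value is preferred.

usage_missing = {"completion_tokens": 7, "total_tokens": 7}
# -> records the token_counter(model=..., messages=...) estimate.

usage_none_value = {"prompt_tokens": None, "completion_tokens": 7, "total_tokens": 7}
# -> also falls back to estimation; if token_counter() itself raises, a warning
#    is logged and 0 prompt tokens are recorded instead of failing the call.
```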

Validation

  • UV_SKIP_WHEEL_FILENAME_CHECK=1 uv run pytest tests/test_openai_compatible_adapter.py
  • UV_SKIP_WHEEL_FILENAME_CHECK=1 uv run ruff check src/unstract/sdk1/adapters/base1.py src/unstract/sdk1/adapters/llm1/__init__.py src/unstract/sdk1/adapters/llm1/openai_compatible.py src/unstract/sdk1/llm.py tests/test_openai_compatible_adapter.py

CLAassistant commented Apr 4, 2026

CLA assistant check
All committers have signed the CLA.


coderabbitai Bot commented Apr 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR adds support for OpenAI-compatible LLM providers (e.g., via LiteLLM) by introducing a new adapter type, parameter validation with model normalization, improved usage token handling, and corresponding tests and configuration.

Changes

OpenAI-Compatible LLM Adapter Feature

  • Parameter Definition (unstract/sdk1/src/unstract/sdk1/adapters/base1.py): OpenAICompatibleLLMParameters adds api_key and api_base fields, with validate() and validate_model() methods that prefix models with custom_openai/ and normalize empty API keys to None.
  • BaseAdapter Enhancement (unstract/sdk1/src/unstract/sdk1/adapters/base1.py): BaseAdapter gains a SCHEMA_PATH class variable, and get_json_schema() now respects custom schema paths for adapters that override it.
  • Adapter Implementation (unstract/sdk1/src/unstract/sdk1/adapters/llm1/openai_compatible.py): OpenAICompatibleLLMAdapter subclasses the parameters class and BaseAdapter, exposing a fixed ID, metadata, the name "OpenAI Compatible", the provider "custom_openai", a description, and an icon through static methods.
  • Module Registration & Exports (unstract/sdk1/src/unstract/sdk1/adapters/llm1/__init__.py): OpenAICompatibleLLMAdapter is imported and added to __all__ for public exposure.
  • Usage Recording (unstract/sdk1/src/unstract/sdk1/llm.py): _record_usage() now prefers provider-supplied prompt_tokens from usage_data; it falls back to token_counter() if absent and logs a warning (recording zero tokens) if token counting fails.
  • Configuration & Schema (unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/openai_compatible.json): the JSON schema defines required fields (adapter_name, api_base) and optional fields (api_key, model, max_tokens, max_retries, timeout) with types and descriptions (an illustrative example follows this list).
  • Documentation & Tests (README.md, unstract/sdk1/tests/test_openai_compatible_adapter.py): the README now lists "OpenAI Compatible" as a supported provider; a comprehensive test suite validates adapter registration, parameter normalization, schema structure, metadata, and usage-token recording edge cases.
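
A sketch of adapter metadata shaped to match the schema fields listed above; the values are hypothetical, and only adapter_name and api_base are required.

```python
# Hypothetical configuration matching the described schema; the parameters
# class is expected to normalize it as noted in the comments.
adapter_metadata = {
    "adapter_name": "internal-gateway-llm",          # required
    "api_base": "https://llm.internal.example/v1",   # required
    "api_key": "",          # optional; blank keys are normalized to None
    "model": "my-model",    # optional; prefixed to "custom_openai/my-model" during validation
    "max_tokens": 1024,     # optional
    "max_retries": 3,       # optional
    "timeout": 60,          # optional
}
```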

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check (❓ Inconclusive): the PR description covers What, Why, and Scope comprehensively but lacks several required template sections: 'How' details, breaking-changes assessment, database migrations, env config, relevant docs, related-issues clarity, dependency versions, and testing notes. Resolution: fill in the missing template sections to give reviewers and maintainers complete context.

✅ Passed checks (3 passed)

  • Title check (✅ Passed): the title clearly and specifically describes the main change, adding a dedicated OpenAI-compatible LLM adapter, and directly matches the changeset's primary objective.
  • Linked Issues check (✅ Passed): skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): skipped because no linked issues were found for this pull request.


greptile-apps Bot commented Apr 4, 2026

Greptile Summary

This PR adds a dedicated OpenAI Compatible LLM adapter backed by LiteLLM's custom_openai path, keeping the existing OpenAI adapter untouched. It also makes a cross-cutting improvement to LLM._record_usage: the method now prefers provider-reported prompt_tokens over a local token_counter estimate, and gracefully catches estimation failures instead of bubbling exceptions after a successful LLM call.

  • New adapter (openai_compatible.py, openai_compatible.json): Implements all required abstract methods, registers correctly, and uses a SCHEMA_PATH override to point at the right JSON schema file rather than relying on the default naming convention.
  • _record_usage change (llm.py): Applies to every SDK1 LLM adapter; providers that already report prompt_tokens are unaffected, while adapters that don't (e.g. custom_openai) now get a fallback estimate instead of silently billing 0.
  • OpenAICompatibleLLMParameters.validate (base1.py): Mutates the caller's dict in-place before Pydantic validation, unlike OpenAILLMParameters.validate which copies first — this discrepancy was flagged in a prior review thread.

Confidence Score: 5/5

Safe to merge; the new adapter is well-isolated and the _record_usage change only improves resilience for providers that omit prompt_tokens.

The adapter is a thin, self-contained addition with no changes to existing adapter behavior. The shared _record_usage change is strictly additive (catches a previously uncaught exception and prefers reported token counts), with thorough test coverage for all three scenarios. Open items from prior review threads (validate dict mutation, model not in required) are pre-existing and non-blocking.

base1.py warrants a follow-up on the dict-mutation issue in OpenAICompatibleLLMParameters.validate, which was flagged in a prior review thread but not yet addressed in this PR.

Important Files Changed

  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/openai_compatible.py: new adapter class; cleanly inherits OpenAICompatibleLLMParameters and BaseAdapter, overrides SCHEMA_PATH, and implements all required abstract methods.
  • unstract/sdk1/src/unstract/sdk1/adapters/base1.py: adds OpenAICompatibleLLMParameters and a SCHEMA_PATH override mechanism (a sketch of the override idea follows this list); validate() mutates the caller's dict (flagged in a prior thread), unlike OpenAILLMParameters, which copies first.
  • unstract/sdk1/src/unstract/sdk1/llm.py: _record_usage now prefers provider-reported prompt_tokens and falls back to token_counter estimation with exception handling; the type annotation on the usage parameter is slightly inaccurate for the None-value case.
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/openai_compatible.json: JSON schema for the new adapter; model is not in required (flagged in a prior thread), and api_key is correctly typed as string.
  • unstract/sdk1/tests/test_openai_compatible_adapter.py: comprehensive test coverage for registration, model normalization, schema loading, and _record_usage behavior; the magic module stub is handled with a patch.dict context manager.
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/__init__.py: correctly registers OpenAICompatibleLLMAdapter alongside existing adapters and adds it to __all__.
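
A rough sketch of the SCHEMA_PATH override mentioned above, assuming a file-based default lookup; the real BaseAdapter.get_json_schema() in base1.py may differ in details.

```python
# Sketch only: illustrates the idea of a class-level schema-path override, not
# the actual implementation in base1.py.
import json
from pathlib import Path


class BaseAdapterSketch:
    SCHEMA_PATH: Path | None = None  # subclasses may point at a specific schema file

    @classmethod
    def get_json_schema(cls) -> dict:
        # Default: derive the schema filename from the class name unless overridden.
        path = cls.SCHEMA_PATH or Path(__file__).parent / "static" / f"{cls.__name__.lower()}.json"
        return json.loads(path.read_text())


class OpenAICompatibleLLMAdapterSketch(BaseAdapterSketch):
    # Override so the schema is loaded from openai_compatible.json instead of a
    # name derived from the class name.
    SCHEMA_PATH = Path(__file__).parent / "static" / "openai_compatible.json"
```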

Sequence Diagram

```mermaid
sequenceDiagram
    participant UI
    participant OpenAICompatibleLLMAdapter
    participant OpenAICompatibleLLMParameters
    participant LiteLLM
    participant LLM._record_usage
    participant Audit

    UI->>OpenAICompatibleLLMAdapter: validate(adapter_metadata)
    OpenAICompatibleLLMAdapter->>OpenAICompatibleLLMParameters: validate_model(metadata)
    Note over OpenAICompatibleLLMParameters: prefix with "custom_openai/"
    OpenAICompatibleLLMParameters-->>OpenAICompatibleLLMAdapter: "custom_openai/<model>"
    OpenAICompatibleLLMAdapter->>OpenAICompatibleLLMParameters: Pydantic model_dump()
    OpenAICompatibleLLMParameters-->>OpenAICompatibleLLMAdapter: validated dict

    UI->>LiteLLM: litellm.completion(model="custom_openai/...", api_base=...)
    LiteLLM-->>LLM._record_usage: response with usage (may include prompt_tokens)

    alt usage has prompt_tokens
        LLM._record_usage->>LLM._record_usage: use reported value
    else usage missing prompt_tokens
        LLM._record_usage->>LiteLLM: token_counter(model, messages)
        alt token_counter succeeds
            LiteLLM-->>LLM._record_usage: estimated count
        else token_counter fails
            LLM._record_usage->>LLM._record_usage: warn + use 0
        end
    end

    LLM._record_usage->>Audit: push_usage_data(prompt_tokens, ...)
```

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
unstract/sdk1/src/unstract/sdk1/llm.py:648-654
The `usage` parameter is annotated as `Mapping[str, int] | None`, but the new code explicitly handles `None` values within the mapping (e.g. `{"prompt_tokens": None, ...}`). The test `test_record_usage_uses_estimated_prompt_tokens_when_usage_has_none` demonstrates this real-world case. A stricter type checker would reject callers passing a `Mapping[str, int | None]` under this annotation.

```suggestion
    def _record_usage(
        self,
        model: str,
        messages: list[dict[str, str]],
        usage: Mapping[str, int | None] | None,
        llm_api: str,
    ) -> None:
```

Reviews (6): Last reviewed commit: "Refine OpenAI compatible adapter schema ..."

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
unstract/sdk1/src/unstract/sdk1/llm.py (1)

542-557: Avoid unconditional token estimation when usage already includes prompt tokens.

This currently computes token_counter() even when provider usage already has prompt tokens, which can create repeated warnings/noise for unmapped models without improving recorded usage.

♻️ Proposed refinement
```diff
-        try:
-            prompt_tokens = token_counter(model=model, messages=messages)
-        except Exception as e:
-            prompt_tokens = 0
-            logger.warning(
-                "[sdk1][LLM][%s][%s] Failed to estimate prompt tokens: %s",
-                model,
-                llm_api,
-                e,
-            )
         usage_data: Mapping[str, int] = usage or {}
+        prompt_tokens = usage_data.get("prompt_tokens")
+        if prompt_tokens is None:
+            try:
+                prompt_tokens = token_counter(model=model, messages=messages)
+            except Exception as e:
+                prompt_tokens = 0
+                logger.warning(
+                    "[sdk1][LLM][%s][%s] Failed to estimate prompt tokens: %s",
+                    model,
+                    llm_api,
+                    e,
+                )
         all_tokens = TokenCounterCompat(
-            prompt_tokens=usage_data.get("prompt_tokens", 0),
+            prompt_tokens=usage_data.get("prompt_tokens", prompt_tokens or 0),
             completion_tokens=usage_data.get("completion_tokens", 0),
             total_tokens=usage_data.get("total_tokens", 0),
         )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 542 - 557, The code
unconditionally calls token_counter(model, messages) even when usage already
contains prompt token counts; change the logic in the block around token_counter
and TokenCounterCompat so you first check usage (usage_data = usage or {}) and
if usage_data.get("prompt_tokens") is present use that value for prompt_tokens
instead of calling token_counter; only call token_counter(model, messages)
inside the try/except when usage_data lacks prompt_tokens, preserving the
existing exception handling and the logger.warning path, and then construct
TokenCounterCompat using the values from usage_data (falling back to the
estimated prompt_tokens when used).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/custom_openai.json`:
- Around line 15-20: The schema for the "api_key" property currently only allows
a string which fails when runtime metadata contains null; update the "api_key"
entry in the JSON schema (the "api_key" property in custom_openai.json) to
permit null values by changing its type to accept both string and null (or add a
nullable:true equivalent) so stored configs with null pass validation and
editing flows.

---

Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 542-557: The code unconditionally calls token_counter(model,
messages) even when usage already contains prompt token counts; change the logic
in the block around token_counter and TokenCounterCompat so you first check
usage (usage_data = usage or {}) and if usage_data.get("prompt_tokens") is
present use that value for prompt_tokens instead of calling token_counter; only
call token_counter(model, messages) inside the try/except when usage_data lacks
prompt_tokens, preserving the existing exception handling and the logger.warning
path, and then construct TokenCounterCompat using the values from usage_data
(falling back to the estimated prompt_tokens when used).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf841637-54b7-4802-9156-7f56e899ca54

📥 Commits

Reviewing files that changed from the base of the PR and between 7983c98 and 5090773.

📒 Files selected for processing (7)
  • README.md
  • unstract/sdk1/src/unstract/sdk1/adapters/base1.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/__init__.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/openai_compatible.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/custom_openai.json
  • unstract/sdk1/src/unstract/sdk1/llm.py
  • unstract/sdk1/tests/test_openai_compatible_adapter.py

@jimmyzhuu (Author)

Addressed the review follow-ups.

  • allowed api_key = null in the OpenAI-compatible schema
  • avoided redundant token_counter() calls when provider usage already includes prompt_tokens
  • tightened the tests around both cases
  • added ERNIE / Baidu Qianfan as schema examples while keeping the adapter generic

Validation re-run:

  • UV_SKIP_WHEEL_FILENAME_CHECK=1 uv run pytest tests/test_openai_compatible_adapter.py
  • UV_SKIP_WHEEL_FILENAME_CHECK=1 uv run ruff check src/unstract/sdk1/llm.py tests/test_openai_compatible_adapter.py

@jimmyzhuu (Author)

Gentle follow-up on this PR in case it slipped through the queue. When someone has bandwidth, I would really appreciate a review. Happy to make any follow-up changes quickly. Thanks!

Comment on lines +543 to +554
```python
prompt_tokens = usage_data.get("prompt_tokens")
if prompt_tokens is None:
    try:
        prompt_tokens = token_counter(model=model, messages=messages)
    except Exception as e:
        prompt_tokens = 0
        logger.warning(
            "[sdk1][LLM][%s][%s] Failed to estimate prompt tokens: %s",
            model,
            llm_api,
            e,
        )
```

Contributor:

@pk-zipstack @johnyrahul is this a safe change?

Author (jimmyzhuu):

Kept this scoped to usage accounting only. It still uses provider-reported prompt tokens when available, only estimates when they are missing, and the fallback paths are covered by tests now.

@jaseemjaskp jaseemjaskp self-requested a review April 23, 2026 09:01

@jaseemjaskp jaseemjaskp left a comment

PR Review Toolkit — consolidated findings

Automated review aggregating six specialist agents (code-reviewer, code-simplifier, silent-failure-hunter, type-design-analyzer, pr-test-analyzer, comment-analyzer). No blocking defects; the adapter follows existing sibling-adapter conventions and the scope is appropriately narrow.

High-signal items worth addressing before merge

  • llm.py:556 — redundant/confusing fallback: prompt_tokens is already resolved at line 543 with an explicit estimation branch, so the usage_data.get("prompt_tokens", prompt_tokens or 0) expression double-handles the default and, worse, silently coerces an explicit None from usage_data to 0 without logging.
  • llm.py:547 — broad except Exception combined with a warning that says "failed to estimate" but not "recording 0 tokens" means billing/audit can silently under-report. Narrow the exception and either use logger.exception or rewrite the message to name the consequence.
  • base1.py:232 / schema api_base — the Pydantic type is plain str; URL shape lives only in the JSON schema, so direct construction accepts garbage. Consider HttpUrl / a field_validator.
  • base1.py:239 — prefix logic is only invoked by the validate classmethod. AzureOpenAILLMParameters uses @model_validator(mode="before") which cannot be bypassed; mirroring that tightens the invariant.
  • custom_openai.json — api_base is listed as required yet ships with a placeholder default URL, which lets users save an unchanged form and hit 404s at request time instead of validation errors. Vendor-specific examples (ERNIE-4.0-8K (Baidu Qianfan), qianfan.baidubce.com) are prone to rot and should be generic.
  • test_openai_compatible_adapter.py — the "tolerates unmapped models" test only asserts push_usage_data.assert_called_once(); it does not verify prompt_tokens=0 was actually pushed, so a regression silently pushing None or crashing inside TokenCounterCompat would still pass.

Suggested follow-ups (non-blocking)

  • Add tests for: api_base missing (Pydantic ValidationError), usage=None, usage["prompt_tokens"]=None, and the success branch of token_counter.
  • Drop the unused @lru_cache on _load_llm_module and the dead _load_llm_class.
  • Replace the tautological get_description() / metadata description with user-facing copy that distinguishes this adapter from OpenAILLMAdapter.

Inline comments below flag each item at its exact line with a concrete fix suggestion.
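
For the api_base typing and validator items above, a hedged Pydantic sketch of what a tightened parameters class could look like; field names follow this PR, but the real base class in base1.py may differ, so treat this as illustrative rather than a drop-in change.

```python
# Sketch under assumptions: pydantic v2 is in use and the parameters class can
# adopt a before-mode model validator. This is not the code in base1.py.
from pydantic import BaseModel, HttpUrl, model_validator


class OpenAICompatibleLLMParametersSketch(BaseModel):
    api_base: HttpUrl               # rejects non-URL strings at construction time
    api_key: str | None = None
    model: str = ""

    @model_validator(mode="before")
    @classmethod
    def _normalize(cls, values: dict) -> dict:
        values = {**values}         # copy first; never mutate the caller's dict
        model = values.get("model") or ""
        if model and not model.startswith("custom_openai/"):
            values["model"] = f"custom_openai/{model}"
        if not values.get("api_key"):
            values["api_key"] = None  # treat blank keys as unset
        return values
```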

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
unstract/sdk1/tests/test_openai_compatible_adapter.py (3)

102-192: LGTM — good coverage of the three _record_usage branches.

Tests exercise: (a) provider-supplied prompt_tokens bypasses token_counter, (b) token_counter raising falls back to 0 with a warning, and (c) prompt_tokens=None triggers estimation. The use of __new__ to bypass __init__ and the targeted patch.object(llm_module, ...) are appropriate here. Nice to see the warning-message assertion on line 162 pinning the audit-visible text.

One small suggestion: assert mock_warning is called exactly once and with model/llm_api substituted into the format string, to catch regressions that accidentally change the log signature.
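
A hedged sketch of that stricter assertion, assuming the warning mock from the test described above; the helper name and expected strings are illustrative.

```python
# Sketch only: a helper the existing test could call after llm._record_usage(...).
from unittest.mock import MagicMock


def assert_warning_mentions(mock_warning: MagicMock, model: str, llm_api: str) -> None:
    mock_warning.assert_called_once()
    args, _ = mock_warning.call_args
    formatted = args[0] % tuple(args[1:])  # logger.warning uses %-style lazy formatting
    assert model in formatted and llm_api in formatted


# e.g. assert_warning_mentions(mock_warning, "custom_openai/gateway-model", "complete")
```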

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/tests/test_openai_compatible_adapter.py` around lines 102 -
192, Update the
test_record_usage_tolerates_unmapped_models_without_prompt_tokens test to assert
the warning logger was called exactly once and that the warning message includes
the model ("custom_openai/gateway-model") and llm_api ("complete") values;
locate the test function and the mock_warning (patched via
patch.object(llm_module.logger, "warning")) and after calling llm._record_usage
add assertions that mock_warning.assert_called_once() and that the call_args
contains both the model and llm_api strings in the formatted warning message to
catch signature regressions.

18-31: Minor: lru_cache around import_module is largely redundant.

sys.modules already caches modules after first import, so the lru_cache(maxsize=1) only saves the patch.dict context-manager overhead. Leaving it is harmless, but on second and subsequent calls the magic stub will not be re-installed (because the cached branch returns early), so any test that newly triggers import magic after the first call would see the real module. Not a problem today, but worth documenting with a short comment to prevent surprise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/tests/test_openai_compatible_adapter.py` around lines 18 - 31,
The `@lru_cache` on _load_llm_module() prevents the patch.dict stub for "magic"
from being re-applied on subsequent calls, which can lead to surprising behavior
if tests later import the real magic module; either remove the `@lru_cache`
decorator or (preferred) keep it but add a brief comment inside _load_llm_module
explaining that sys.modules already caches imports and that the cached result
means the "magic" stub will not be re-installed on later calls so tests should
call this once or manage stubbing themselves — reference the _load_llm_module
function and the patch.dict usage when adding the comment.

34-35: Dead helper: _load_llm_class is never called.

Every test uses _load_llm_module().LLM directly (e.g., lines 104, 135, 167). Consider removing _load_llm_class or using it in place of the inline llm_module.LLM lookups for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/tests/test_openai_compatible_adapter.py` around lines 34 - 35,
The helper function _load_llm_class is unused (dead code); either remove
_load_llm_class entirely or replace direct usages of _load_llm_module().LLM in
tests (e.g., the inline lookups at places that call llm_module.LLM) with calls
to _load_llm_class() for consistency. Locate the definition of _load_llm_class
and the test files referencing _load_llm_module().LLM and either delete the
unused _load_llm_class function or update those tests to call _load_llm_class()
instead, ensuring imports and type annotations still match.
unstract/sdk1/src/unstract/sdk1/adapters/base1.py (1)

234-242: Minor: validate() mutates the caller's dict.

Lines 236 and 241 write back into adapter_metadata (same pattern as OpenAILLMParameters, but unlike VertexAILLMParameters which copies first via {**adapter_metadata}). Given LLM.complete() calls self.adapter.validate({**self.kwargs, **kwargs}) (a fresh dict), there's no current bug — but if a future caller passes a long-lived dict, the api_key would be mutated in place and the model prefix would double-rewrite on a second call (the startswith("custom_openai/") guard in validate_model mitigates the latter).

Optional: copy first for defensive hygiene, matching the VertexAILLMParameters.validate() style.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/adapters/base1.py` around lines 234 - 242,
The validate() method currently mutates the incoming adapter_metadata dict (it
writes to adapter_metadata["model"] and adapter_metadata["api_key"]); to avoid
in-place side effects make a shallow copy first (e.g., metadata =
{**adapter_metadata}) and perform all modifications against that copy before
passing it to OpenAICompatibleLLMParameters.validate_model and constructing
OpenAICompatibleLLMParameters(**metadata).model_dump(); keep references to the
same symbols (validate, OpenAICompatibleLLMParameters.validate_model,
OpenAICompatibleLLMParameters, adapter_metadata) so the change is local and
preserves existing behavior while preventing caller dict mutation.
unstract/sdk1/src/unstract/sdk1/llm.py (1)

557-557: Minor: prompt_tokens or 0 also zeroes out legitimate 0 from provider.

If a provider ever reports usage.prompt_tokens == 0 (unusual, but possible for zero-content requests or certain gateways), the truthiness check collapses it the same as None. Given the prompt_tokens is None branch already assigns an int (or 0 on exception), this or 0 is only needed to satisfy the type checker. A more precise form:

Proposed tweak
```diff
-    all_tokens = TokenCounterCompat(
-        prompt_tokens=prompt_tokens or 0,
+    all_tokens = TokenCounterCompat(
+        prompt_tokens=prompt_tokens if prompt_tokens is not None else 0,
         completion_tokens=usage_data.get("completion_tokens", 0),
         total_tokens=usage_data.get("total_tokens", 0),
     )
```

Low-impact since 0 vs None ends up the same in the audit row, but semantically cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/llm.py` at line 557, Replace the truthiness
fallback that zeroes out legitimate zero values: instead of using
"prompt_tokens=prompt_tokens or 0" keep the explicit None-check so only None
becomes 0 (e.g., use a conditional expression that assigns prompt_tokens if
prompt_tokens is not None else 0). Locate the occurrence of the
"prompt_tokens=prompt_tokens or 0" assignment and change it to an explicit
None-check for the variable prompt_tokens so a reported 0 remains 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 543-560: The current catch in the prompt token estimation around
token_counter (used when building TokenCounterCompat) silently sets
prompt_tokens=0; update this to (1) narrow the except to only expected errors
from the estimator (e.g., KeyError/ValueError and litellm-specific exceptions
raised by token_counter) so unexpected errors still propagate, and (2) add a
sentinel field to the usage payload (e.g., prompt_tokens_source or
estimation_failed) before calling Audit().push_usage_data to mark that prompt
tokens were estimated/failed, and/or increment an ops metric/counter when the
fallback path occurs; reference the token_counter call, TokenCounterCompat
construction, Audit().push_usage_data, and the existing logger to emit a clear
warning and metric.

---

Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/adapters/base1.py`:
- Around line 234-242: The validate() method currently mutates the incoming
adapter_metadata dict (it writes to adapter_metadata["model"] and
adapter_metadata["api_key"]); to avoid in-place side effects make a shallow copy
first (e.g., metadata = {**adapter_metadata}) and perform all modifications
against that copy before passing it to
OpenAICompatibleLLMParameters.validate_model and constructing
OpenAICompatibleLLMParameters(**metadata).model_dump(); keep references to the
same symbols (validate, OpenAICompatibleLLMParameters.validate_model,
OpenAICompatibleLLMParameters, adapter_metadata) so the change is local and
preserves existing behavior while preventing caller dict mutation.

In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Line 557: Replace the truthiness fallback that zeroes out legitimate zero
values: instead of using "prompt_tokens=prompt_tokens or 0" keep the explicit
None-check so only None becomes 0 (e.g., use a conditional expression that
assigns prompt_tokens if prompt_tokens is not None else 0). Locate the
occurrence of the "prompt_tokens=prompt_tokens or 0" assignment and change it to
an explicit None-check for the variable prompt_tokens so a reported 0 remains 0.

In `@unstract/sdk1/tests/test_openai_compatible_adapter.py`:
- Around line 102-192: Update the
test_record_usage_tolerates_unmapped_models_without_prompt_tokens test to assert
the warning logger was called exactly once and that the warning message includes
the model ("custom_openai/gateway-model") and llm_api ("complete") values;
locate the test function and the mock_warning (patched via
patch.object(llm_module.logger, "warning")) and after calling llm._record_usage
add assertions that mock_warning.assert_called_once() and that the call_args
contains both the model and llm_api strings in the formatted warning message to
catch signature regressions.
- Around line 18-31: The `@lru_cache` on _load_llm_module() prevents the
patch.dict stub for "magic" from being re-applied on subsequent calls, which can
lead to surprising behavior if tests later import the real magic module; either
remove the `@lru_cache` decorator or (preferred) keep it but add a brief comment
inside _load_llm_module explaining that sys.modules already caches imports and
that the cached result means the "magic" stub will not be re-installed on later
calls so tests should call this once or manage stubbing themselves — reference
the _load_llm_module function and the patch.dict usage when adding the comment.
- Around line 34-35: The helper function _load_llm_class is unused (dead code);
either remove _load_llm_class entirely or replace direct usages of
_load_llm_module().LLM in tests (e.g., the inline lookups at places that call
llm_module.LLM) with calls to _load_llm_class() for consistency. Locate the
definition of _load_llm_class and the test files referencing
_load_llm_module().LLM and either delete the unused _load_llm_class function or
update those tests to call _load_llm_class() instead, ensuring imports and type
annotations still match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bb92694d-b745-40e9-8bd9-0bc1fa3628b1

📥 Commits

Reviewing files that changed from the base of the PR and between d3e1cad and f6a2a7d.

⛔ Files ignored due to path filters (1)
  • frontend/public/icons/adapter-icons/OpenAICompatible.png is excluded by !**/*.png
📒 Files selected for processing (5)
  • unstract/sdk1/src/unstract/sdk1/adapters/base1.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/openai_compatible.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/custom_openai.json
  • unstract/sdk1/src/unstract/sdk1/llm.py
  • unstract/sdk1/tests/test_openai_compatible_adapter.py
✅ Files skipped from review due to trivial changes (2)
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/openai_compatible.py
  • unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/custom_openai.json

Comment on lines +543 to 560
```diff
         prompt_tokens = usage_data.get("prompt_tokens")
         if prompt_tokens is None:
             try:
                 prompt_tokens = token_counter(model=model, messages=messages)
             except Exception as e:
                 prompt_tokens = 0
                 logger.warning(
                     "[sdk1][LLM][%s][%s] Failed to estimate prompt tokens; "
                     "recording 0 prompt tokens for usage audit: %s",
                     model,
                     llm_api,
                     e,
                 )
         all_tokens = TokenCounterCompat(
-            prompt_tokens=usage_data.get("prompt_tokens", 0),
+            prompt_tokens=prompt_tokens or 0,
             completion_tokens=usage_data.get("completion_tokens", 0),
             total_tokens=usage_data.get("total_tokens", 0),
         )
```

Contributor:

⚠️ Potential issue | 🟠 Major

Silent zero-token recording risks corrupting billing/usage audit data.

When token_counter raises (e.g., unmapped custom models in LiteLLM's metadata), the code records prompt_tokens=0 into Audit().push_usage_data. Per unstract/sdk1/src/unstract/sdk1/utils/common.py:114-145 and unstract/sdk1/src/unstract/sdk1/audit.py:85-98, that zero flows directly to the platform's usage record with no sentinel/flag distinguishing "unknown" from "actually zero." For long-running workloads against an OpenAI-compatible endpoint that doesn't return usage.prompt_tokens, this could silently understate prompt-token consumption in cost attribution and analytics.

Consider one of:

  • Tagging the audit payload with an estimation_failed / prompt_tokens_source flag so downstream consumers can distinguish missing data from genuinely zero usage.
  • Narrowing the except (e.g., except (KeyError, ValueError, litellm.exceptions.*)) so truly unexpected errors still propagate instead of being swallowed.
  • Emitting a metric/counter when this fallback triggers so ops can detect silent drift.

A warning log alone is easy to miss in aggregated usage reports. This answers the question raised in the prior review thread on this range.
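
A rough sketch of the first two mitigation ideas (a narrowed except plus a sentinel marking how the prompt-token value was obtained), assuming the source flag stays local since Audit().push_usage_data may not accept extra fields; this is not the code in llm.py.

```python
# Sketch only: returns how the prompt-token value was obtained so callers can
# decide whether to tag the audit payload or count the fallback in a metric.
import logging
from collections.abc import Callable, Mapping

logger = logging.getLogger(__name__)


def resolve_prompt_tokens(
    usage: Mapping[str, int | None] | None,
    model: str,
    messages: list[dict[str, str]],
    token_counter: Callable[..., int],
) -> tuple[int, str]:
    usage_data = usage or {}
    reported = usage_data.get("prompt_tokens")
    if reported is not None:
        return reported, "provider"
    try:
        return token_counter(model=model, messages=messages), "estimated"
    except (KeyError, ValueError) as e:  # narrowed; unexpected errors still propagate
        logger.warning("Failed to estimate prompt tokens for %s: %s", model, e)
        return 0, "estimation_failed"
```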

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 543 - 560, The current
catch in the prompt token estimation around token_counter (used when building
TokenCounterCompat) silently sets prompt_tokens=0; update this to (1) narrow the
except to only expected errors from the estimator (e.g., KeyError/ValueError and
litellm-specific exceptions raised by token_counter) so unexpected errors still
propagate, and (2) add a sentinel field to the usage payload (e.g.,
prompt_tokens_source or estimation_failed) before calling
Audit().push_usage_data to mark that prompt tokens were estimated/failed, and/or
increment an ops metric/counter when the fallback path occurs; reference the
token_counter call, TokenCounterCompat construction, Audit().push_usage_data,
and the existing logger to emit a clear warning and metric.

@jimmyzhuu jimmyzhuu closed this May 6, 2026

athul-rs commented May 6, 2026

Hi @jimmyzhuu - sorry we let this sit so long without a review, that's on us. The change looks reasonable in scope and the validation looks solid. If you're still interested, we'd be happy to have you reopen it and we'll get a maintainer on it this week. Either way, thanks for the careful work and the patience.
cc: @hari-kuriakose @ritwik-g @jaseemjaskp

@jimmyzhuu jimmyzhuu reopened this May 6, 2026
@jimmyzhuu (Author)

Absolutely — I’d be happy to work on this. Thanks for the suggestion!

@athul-rs athul-rs requested review from athul-rs and pk-zipstack May 6, 2026 08:34

@chandrasekharan-zipstack chandrasekharan-zipstack left a comment

LGTM for the most part, @pk-zipstack please help take a look as well

"model": {
"type": "string",
"title": "Model",
"default": "gpt-4o-mini",

Contributor:

Suggested change: remove the "default": "gpt-4o-mini", line.

NIT: Consider removing this default since it might eventually get deprecated

@pk-zipstack pk-zipstack left a comment

LGTM

```diff
@@ -0,0 +1,61 @@
+{
+  "title": "OpenAI Compatible LLM",
```

Contributor:

@jimmyzhuu This could be just OpenAI Compatible.

Contributor:

@jimmyzhuu Might be better to rename the file also to openai_compatible.json.

Author (jimmyzhuu):

@hari-kuriakose fixed. Renamed the schema file to openai_compatible.json and updated the adapter schema loading path accordingly.


@jaseemjaskp jaseemjaskp left a comment

LGTM. After running the PR review toolkit (code-reviewer, comment-analyzer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, code-simplifier), all clearly important findings (correctness, billing/audit data integrity, broken contracts) were already raised by prior reviewers (greptile, coderabbitai, jaseemjaskp) and addressed by the author in the latest commits. Remaining items from agent passes are minor or NIT-level (e.g. tightening api_base/model pydantic validators, schema title/filename naming, end-to-end LLM(init) coverage), so I'm not posting them as inline comments. Resolved my three previously-posted threads that the current code addresses (DESCRIPTION constant, dedicated icon, blank api_key coercion).


athul-rs commented May 6, 2026

@jimmyzhuu The one thing I'd note is that the exception-swallowing path now applies to every adapter, not just custom_openai. I think that's the right tradeoff (a successful LLM call shouldn't fail at the billing step), but flagging it for the merge commit.
Could you call out the _record_usage change explicitly in the PR description?
cc: @hari-kuriakose @pk-zipstack

@jimmyzhuu (Author)

> @jimmyzhuu The one thing I'd note is that the exception-swallowing path now applies to every adapter, not just custom_openai. I think that's the right tradeoff (a successful LLM call shouldn't fail at the billing step), but flagging it for the merge commit. Could you call out the _record_usage change explicitly in the PR description? cc: @hari-kuriakose @pk-zipstack

@athul-rs Updated the PR description.

sonarqubecloud Bot commented May 7, 2026
