feat: Async Journey: LiteLLM Removal from Async Engine #310

Open
eric-tramel wants to merge 1 commit into async/async-facade from async/litellm-removal

Conversation

@eric-tramel
Contributor

Summary

Adds a comprehensive analysis of removing the litellm dependency from Data Designer. This is a planning document — no code changes.

Key findings

  • LiteLLM is well-contained (12 production files, all in engine/models/ and engine/models_v2/)
  • DD underuses it — each ModelFacade creates a Router with a single deployment (no load balancing, no failover)
  • OpenAI and Anthropic SDKs handle retry/backoff natively; Bedrock does not (manual retry needed for throttling)
  • Anthropic adapter is HIGH risk due to structurally different response format (content blocks vs strings)
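The Anthropic risk above comes down to response shape: Anthropic returns a list of typed content blocks where OpenAI returns a plain string. A minimal sketch of the required translation, assuming a hypothetical helper name (the block shapes mirror the public SDKs' formats, but nothing here is from the DD codebase):

```python
# Hypothetical sketch: flattening an Anthropic-style response into the
# OpenAI-style string content that Data Designer treats as canonical.
def anthropic_content_to_text(content_blocks: list[dict]) -> str:
    """Join Anthropic "text" content blocks into a single string,
    skipping non-text blocks (e.g. tool_use)."""
    return "".join(
        block["text"] for block in content_blocks if block.get("type") == "text"
    )

# Anthropic returns a list of typed blocks where OpenAI returns a string:
anthropic_response = {
    "content": [
        {"type": "text", "text": "Hello, "},
        {"type": "text", "text": "world."},
    ]
}
print(anthropic_content_to_text(anthropic_response["content"]))  # Hello, world.
```

Tool-use responses interleave `text` and `tool_use` blocks, which is why the adapter is rated HIGH risk rather than a simple string join.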

Implementation plan (4 phases)

  1. Replace Router with ModelClient in models_v2/ — OpenAI SDK adapter, keep OpenAI response format as canonical type. models/ untouched as fallback.
  2. Validate — Benchmark, test suite, real inference with env var enabled.
  3. Additional provider adapters — Anthropic + Bedrock. models/ fallback still available.
  4. Consolidate and drop dependency — Delete models/, remove litellm from deps. Only after all adapters are proven.
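Phase 1's Router-to-ModelClient swap can be sketched as a small interface that ModelFacade depends on. The names ModelClient and CompletionResponse come from the plan; the exact fields and signatures are assumptions, and EchoModelClient is a dummy adapter standing in for the real OpenAI SDK adapter:

```python
# Illustrative sketch, not the DD implementation: a minimal async client
# protocol that each provider adapter (OpenAI, Anthropic, Bedrock) would satisfy.
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class CompletionResponse:
    content: str
    model: str

class ModelClient(Protocol):
    async def completion(
        self, messages: list[dict[str, Any]], **kwargs: Any
    ) -> CompletionResponse: ...

class EchoModelClient:
    """Dummy adapter: echoes the last message back, for interface demonstration."""
    async def completion(self, messages, **kwargs) -> CompletionResponse:
        return CompletionResponse(content=messages[-1]["content"], model="echo")

resp = asyncio.run(EchoModelClient().completion([{"role": "user", "content": "hi"}]))
print(resp.content)  # hi
```

Keeping the response type OpenAI-shaped (per the plan's canonical-format decision) means only the adapters change per provider; the facade and everything above it stay untouched.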

Reviewed by

10 independent code reviewers examined the report against the actual codebase. Corrections incorporated: expanded test blast radius (4 files, ~56 functions), upgraded Anthropic risk to HIGH, added MCP facade cross-layer caveat, corrected dependency impact analysis.

Comprehensive analysis of removing the litellm dependency from Data
Designer. Covers blast radius (per-phase), provider SDK research
(OpenAI, Anthropic, Bedrock), risk assessment, and a 4-phase
implementation plan using the models_v2/ parallel stack approach.

Co-Authored-By: Remi <noreply@anthropic.com>
@eric-tramel eric-tramel requested a review from a team as a code owner February 7, 2026 03:02
@eric-tramel eric-tramel changed the title from "docs: LiteLLM removal impact analysis and implementation plan" to "feat: Async Journey: LiteLLM Removal from Async Engine" on Feb 7, 2026

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

Adds comprehensive planning document for removing LiteLLM dependency from Data Designer. The document is thorough, well-structured, and demonstrates deep understanding of the codebase through 10 independent reviewer validations.

Key strengths:

  • Well-contained blast radius (12 production files, all in engine/models/ and engine/models_v2/)
  • Pragmatic 4-phase approach with parallel implementation in models_v2/ to maintain fallback
  • Accurate technical analysis confirmed against actual codebase (dependency version, file counts, test impact)
  • Honest risk assessment (Anthropic and Bedrock adapters marked HIGH risk due to response format incompatibilities)
  • Comprehensive test migration plan (~56 test functions identified)
  • Clear decision on response format (keep OpenAI structure as canonical to minimize cross-layer changes)

Implementation strategy:

  1. Phase 1: OpenAI adapter in models_v2/ (low risk)
  2. Phase 2: Validation via benchmarks and tests
  3. Phase 3: Anthropic + Bedrock adapters (high risk, structural differences)
  4. Phase 4: Remove models/ and drop dependency (only after validation)

The document correctly identifies that Data Designer underuses LiteLLM (a single-deployment Router instead of load balancing), making removal feasible. The parallel stack approach via the DATA_DESIGNER_ASYNC_ENGINE env var provides a safe rollback mechanism.
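The env-var rollback switch can be sketched as a one-branch factory. The DATA_DESIGNER_ASYNC_ENGINE name comes from the PR text; the registry factories below are hypothetical stand-ins for the real models/ and models_v2/ entry points:

```python
# Sketch under assumptions: _create_registry_legacy / _create_registry_v2
# are placeholders, not functions from the DD codebase.
import os

def _create_registry_legacy(model_configs):
    return ("legacy", model_configs)   # models/: litellm Router path

def _create_registry_v2(model_configs):
    return ("v2", model_configs)       # models_v2/: provider SDK path

def create_model_registry(model_configs):
    # Flip one env var to route through models_v2/; unset it to roll back.
    if os.environ.get("DATA_DESIGNER_ASYNC_ENGINE", "").lower() in ("1", "true"):
        return _create_registry_v2(model_configs)
    return _create_registry_legacy(model_configs)
```

Because both stacks coexist until Phase 4, a regression found in production only requires unsetting the variable, not reverting code.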

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it only adds planning documentation with no code changes
  • Documentation-only change with comprehensive technical analysis that has been validated by 10 independent reviewers. No production code modified, no breaking changes, no runtime impact. The planning document demonstrates thorough understanding of the codebase and provides clear implementation roadmap with appropriate risk assessment.
  • No files require special attention

Important Files Changed

Filename: LITELLM_REMOVAL_ANALYSIS.md
Overview: New comprehensive planning document analyzing LiteLLM removal strategy with 4-phase implementation plan

Sequence Diagram

sequenceDiagram
    participant Config as Config Layer
    participant Factory as models_v2/factory.py
    participant Facade as models_v2/facade.py
    participant Client as ModelClient (OpenAI/Anthropic/Bedrock)
    participant SDK as Provider SDK
    participant API as Provider API

    Note over Config,Factory: Phase 1: OpenAI Adapter
    Config->>Factory: create_model_registry(model_configs)
    Factory->>Factory: Construct OpenAIModelClient
    Factory->>Client: Initialize with api_key, base_url
    Factory->>Facade: ModelFacade(client=OpenAIModelClient)
    
    Note over Facade,API: Inference Request Flow
    Facade->>Facade: completion(messages, **params)
    Facade->>Client: client.completion(messages, **kwargs)
    Client->>Client: Translate DD params → SDK params
    Client->>SDK: await sdk.chat.completions.create(...)
    SDK->>SDK: Built-in retry/backoff
    SDK->>API: HTTPS POST /v1/chat/completions
    API-->>SDK: 200 OK with response
    SDK-->>Client: OpenAI response object
    Client->>Client: Extract content, tool_calls, usage
    Client-->>Facade: CompletionResponse
    Facade-->>Config: Generated text

    Note over Factory,Client: Phase 3: Multi-Provider
    Factory->>Factory: match provider_type
    alt provider_type == "openai"
        Factory->>Client: OpenAIModelClient
    else provider_type == "anthropic"
        Factory->>Client: AnthropicModelClient
        Note over Client: Translates content blocks → string
    else provider_type == "bedrock"
        Factory->>Client: BedrockModelClient
        Note over Client: Manual retry for throttling
    end

    Note over Facade,SDK: Error Handling
    SDK-->>Client: SDK-specific exception (e.g., RateLimitError)
    Client->>Client: Map to DD error types
    Client-->>Facade: ModelRateLimitError
    Facade-->>Config: Propagate with FormattedLLMErrorMessage
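The error-handling branch at the bottom of the diagram can be sketched as a mapping layer in each adapter. ModelRateLimitError appears in the diagram; matching on the exception's class name here is an illustrative shortcut that lets the sketch run without the openai package installed:

```python
# Hedged sketch: adapters map SDK-specific exceptions onto DD error
# types before they reach the facade. The real adapter would catch
# openai.RateLimitError (and provider equivalents) directly.
class ModelRateLimitError(Exception):
    """Provider-agnostic DD error raised on throttling."""

def map_provider_error(exc: Exception) -> Exception:
    if type(exc).__name__ == "RateLimitError":  # e.g. openai.RateLimitError
        return ModelRateLimitError(str(exc))
    return exc
```

Centralizing this mapping keeps the facade and config layers free of any per-SDK exception imports, which is what lets Phase 4 delete the litellm-based models/ stack cleanly.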

