
LocalCode Middleware Server

A lightweight OpenAI-compatible proxy for OpenCode that routes requests either to OpenCode Zen (free GLM 4.7) or to a local llama.cpp server. Powered by LiteLLM.

Features

  • OpenAI-Compatible API: /v1/chat/completions endpoint
  • Free Models: GLM 4.7, Big Pickle, Grok Code, Alpha GD4 via OpenCode Zen
  • Local Models: llama.cpp via local server
  • Auto-Retries: Built-in retry logic for rate limits (429)
  • Connection Pooling: Efficient httpx client reuse
  • Structured Logging: Request/response/event logging with category markers
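
The server speaks the standard OpenAI chat API, so any OpenAI-compatible client can point at it. A minimal sketch using the openai Python SDK, assuming the middleware is running on its default port (4242) and serving the glm-4.7-free model from the config below; the API key is a placeholder, since the proxy allows null-key auth:

from openai import OpenAI

# Point the standard OpenAI client at the local middleware.
client = OpenAI(base_url="http://localhost:4242/v1", api_key="public")

resp = client.chat.completions.create(
    model="glm-4.7-free",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)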

Installation

# Install with poetry
cd localcode
poetry install

# Or with pip
pip install litellm pyyaml

Configuration

Edit config.yaml to configure models and settings:

model_list:
  - model_name: glm-4.7-free
    litellm_params:
      model: glm-4.7-free
      api_base: https://opencode.ai/zen/v1
      api_key: "public"  # Free tier - no key needed
      max_retries: 3
      retry_after: 10  # Wait 10s on 429 before retry
      timeout: 300.0

litellm_settings:
  default_max_retries: 3
  default_retry_after: 10
  allow_auth_on_null_key: true
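
Because the file is plain YAML, a quick sanity check with pyyaml (already an install dependency) can catch indentation or key mistakes before the server starts; an illustrative snippet:

import yaml

# Load the proxy config and list each model with its upstream endpoint.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

for entry in cfg.get("model_list", []):
    params = entry["litellm_params"]
    print(f"{entry['model_name']} -> {params['api_base']}")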

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| LITELLM_CONFIG | config.yaml | Path to config file |
| LITELLM_HOST | 0.0.0.0 | Server host |
| LITELLM_PORT | 4242 | Server port |
| LITELLM_LOGLEVEL | INFO | Logging level |
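
The server presumably resolves these at startup; the snippet below shows equivalent lookups with the defaults from the table (illustrative only, not the project's actual main.py):

import os

# Fall back to the documented defaults when a variable is unset.
config_path = os.environ.get("LITELLM_CONFIG", "config.yaml")
host = os.environ.get("LITELLM_HOST", "0.0.0.0")
port = int(os.environ.get("LITELLM_PORT", "4242"))
log_level = os.environ.get("LITELLM_LOGLEVEL", "INFO")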

Running

# Start with default config
poetry run python main.py

# Or use litellm directly
litellm --config config.yaml

# Custom port
poetry run python main.py --port 8080

# Custom config
poetry run python main.py --config my-config.yaml
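
Once started, you can confirm the server is reachable with a quick health check; a small example using httpx (which the middleware already uses internally), assuming the default port:

import httpx

# Hit the proxy's health endpoint; a 200 means the server is up.
r = httpx.get("http://localhost:4242/health", timeout=5.0)
print(r.status_code, r.text)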

Switching Backends

Cloud (OpenCode Zen - Default)

model_list:
  - model_name: glm-4.7-free
    litellm_params:
      api_base: https://opencode.ai/zen/v1
      api_key: "public"

Local (llama.cpp)

model_list:
  - model_name: qwen3-coder:a3b
    litellm_params:
      api_base: http://localhost:8080/v1
      api_key: "no-key-required"
      max_retries: 0  # Local - no retries needed
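
Before switching, it can help to confirm the llama.cpp server is actually serving its OpenAI-compatible API on the configured port; a hedged check, assuming llama.cpp listens on localhost:8080 as above:

import httpx

# llama.cpp's built-in server exposes /v1/models when run in OpenAI-compatible mode.
r = httpx.get("http://localhost:8080/v1/models", timeout=5.0)
print(r.json())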

OpenCode Configuration

Add to your opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "localcode": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "localcode",
      "options": {
        "baseURL": "http://localhost:4242/v1"
      },
      "models": {
        "glm-4.7-free": {
          "name": "GLM-4.7 Free",
          "limit": {
            "context": 204800,
            "output": 131072
          }
        }
      }
    }
  }
}

API Endpoints

| Endpoint | Description |
| --- | --- |
| POST /v1/chat/completions | Chat completions (OpenAI-compatible) |
| GET /v1/models | List available models |
| GET /health | Health check |
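
The /v1/models endpoint is a quick way to verify which model_name entries from config.yaml the proxy is currently serving; a short example with the openai SDK, assuming the default port:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4242/v1", api_key="public")

# Each entry corresponds to a model_name in config.yaml.
for model in client.models.list():
    print(model.id)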

Log Output

The middleware prints structured logs:

Request

================================================================================
[REQUEST] 12:34:56
Model: glm-4.7-free
Stream: False
Messages count: 1
================================================================================

Response

--------------------------------------------------------------------------------
[RESPONSE] 12:34:58
Content: I'll help you edit the function...
Finish reason: stop
Usage - prompt: 150, completion: 45, total: 195
[Cached Tokens: 12]
--------------------------------------------------------------------------------

Tool Call

[Tool Call] edit
[Tool Call] read

Reasoning (GLM 4.7)

[STREAM CHUNK] 12:34:57 [REASONING] Let me think about this...
[STREAM CHUNK] 12:34:58 [REASONING] 1+1=2, so 2+2=4
[STREAM CHUNK] 12:34:59 4
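
These chunks are emitted while a client streams a completion; the [REASONING] markers come from the middleware's own logging, not from fields on the standard chunk objects. A sketch of driving a streaming request with the openai SDK, assuming the default port:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4242/v1", api_key="public")

stream = client.chat.completions.create(
    model="glm-4.7-free",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    stream=True,
)

# Print content tokens as they arrive; some chunks may carry no choices,
# so guard before indexing.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()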

Architecture

The middleware was refactored from a custom Python implementation to a LiteLLM-based proxy:

| Before | After |
| --- | --- |
| 6 modules, ~736 lines | 3 files, ~50 lines |
| Custom HTTP client | Built-in LiteLLM router |
| Custom SSE parsing | CustomStreamWrapper |
| No retry logic | max_retries, retry_after |
| No connection pooling | httpx client reuse |

Files

  • config.yaml - LiteLLM model configuration
  • main.py - Server entrypoint
  • logging_callbacks.py - Custom event logging
  • opencode.json - OpenCode provider config
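
logging_callbacks.py presumably hooks into LiteLLM's custom-callback mechanism to produce the log output shown above. A minimal sketch of that pattern; the class name and messages here are illustrative, not the project's actual code:

from litellm.integrations.custom_logger import CustomLogger


class RequestLogger(CustomLogger):
    # Called before the proxy forwards a request upstream.
    def log_pre_api_call(self, model, messages, kwargs):
        print(f"[REQUEST] model={model} messages={len(messages)}")

    # Called after a successful, non-streaming completion.
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        print(f"[RESPONSE] finish_reason={response_obj.choices[0].finish_reason}")


# An instance like this is referenced from config.yaml, e.g.:
#   litellm_settings:
#     callbacks: logging_callbacks.request_logger
request_logger = RequestLogger()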

License

Same as the parent OpenCode project.