A single normalized Go interface for LLM inference across multiple providers, using their official SDKs where available.
- Features at a glance
- Installation
- Quickstart
- Examples
- Provider configuration
- Supported providers
- Model presets
- Model capabilities and normalization
- HTTP debugging
- Notes
- Development
- License

Features at a glance:

- Single normalized interface via `ProviderSetAPI`
- Provider support today:
  - Anthropic Messages API via `github.com/anthropics/anthropic-sdk-go`
  - OpenAI Responses API via `github.com/openai/openai-go/v3`
  - OpenAI Chat Completions API via `github.com/openai/openai-go/v3`
  - Google Generate Content API via `google.golang.org/genai`
- Normalized request/response model in `spec/`:
  - text, image, and file input content
  - assistant/user/tool/reasoning content
  - function/custom/web-search tool definitions and tool calls
  - structured output and verbosity controls
  - reasoning/thinking controls
  - streaming events for text and thinking
  - usage accounting
  - cache-control normalization where supported
- Request normalization before provider calls:
  - capability-driven validation and safe dropping of unsupported features
  - warnings returned in `FetchCompletionResponse.Warnings`
  - per-model capability override support through `FetchCompletionOptions.CapabilityResolver`
- Streaming:
  - text streaming for supported providers
  - thinking/reasoning streaming where the provider exposes it
- Debugging:
  - pluggable `CompletionDebugger`
  - built-in HTTP debugger in `debugclient`

```shell
# Go 1.26+
go get github.com/flexigpt/inference-go
```

Basic flow:

- Create a `ProviderSetAPI`.
- Register one or more providers with `AddProvider`.
  - The easiest path is to use a predefined vendor-specific `modelpreset`, which contains provider connection defaults, model defaults, and per-provider/per-model capability overrides.
- Set each provider API key with `SetProviderAPIKey`.
- Call `FetchCompletion`.

Basic:

```go
ctx := context.Background()

ps, err := inference.NewProviderSetAPI()
if err != nil {
	return err
}

// Look up the provider and model presets.
providerPreset, err := modelpreset.Provider(modelpreset.ProviderOpenAIResponses)
if err != nil {
	return err
}
modelPreset, err := modelpreset.Model(modelpreset.ProviderOpenAIResponses, modelpreset.PresetOpenAIResponsesGPT5Mini)
if err != nil {
	return err
}

// Register the provider and set its API key.
if _, err := ps.AddProviderFromPreset(ctx, providerPreset.Name, providerPreset); err != nil {
	return err
}
if err := ps.SetProviderAPIKey(ctx, providerPreset.Name, os.Getenv("OPENAI_API_KEY")); err != nil {
	return err
}

// Build a capability resolver that applies the preset overrides for this model.
completionKey := string(modelPreset.ID)
resolver, err := ps.NewPresetCapabilityResolver(
	ctx,
	providerPreset.Name,
	providerPreset,
	modelPreset,
	completionKey,
)
if err != nil {
	return err
}

// Start from the preset's model defaults and adjust per call.
modelParam := modelPreset.ModelParam
modelParam.Stream = false
modelParam.MaxOutputLength = 2048
modelParam.SystemPrompt = "You are concise."

resp, err := ps.FetchCompletion(ctx, providerPreset.Name, &spec.FetchCompletionRequest{
	ModelParam: modelParam,
	Inputs: []spec.InputUnion{{
		Kind: spec.InputKindInputMessage,
		InputMessage: &spec.InputOutputContent{
			Role: spec.RoleUser,
			Contents: []spec.InputOutputContentItemUnion{{
				Kind:     spec.ContentItemKindText,
				TextItem: &spec.ContentItemText{Text: "Say hello in one sentence."},
			}},
		},
	}},
}, &spec.FetchCompletionOptions{
	CompletionKey:      completionKey,
	CapabilityResolver: resolver,
})
if err != nil {
	return err
}
_ = resp
```

Available repository examples:

- Anthropic
- OpenAI
- Google
- Capability override example (get provider caps, override per-model)

Providers are registered dynamically with ProviderSetAPI.AddProvider.

```go
type AddProviderConfig struct {
	SDKType                  spec.ProviderSDKType
	Origin                   string
	ChatCompletionPathPrefix string
	APIKeyHeaderKey          string
	DefaultHeaders           map[string]string
}
```

Fields:

- `SDKType`
  - `spec.ProviderSDKTypeAnthropic`
  - `spec.ProviderSDKTypeOpenAIChatCompletions`
  - `spec.ProviderSDKTypeOpenAIResponses`
  - `spec.ProviderSDKTypeGoogleGenerateContent`
- `Origin`
  - Required
  - Base origin for the provider or gateway/proxy
- `ChatCompletionPathPrefix`
  - Optional generic path prefix
  - Historical field name, reused across providers
  - Useful when routing through a gateway path prefix
  - Adapters trim built-in endpoint suffixes when needed:
    - Anthropic: trailing `v1/messages`
    - OpenAI Chat: trailing `chat/completions`
    - OpenAI Responses: trailing `responses`
- `APIKeyHeaderKey`
  - Optional override for non-standard gateway auth headers
- `DefaultHeaders`
  - Optional extra headers added to every request
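
The suffix trimming described for `ChatCompletionPathPrefix` can be pictured with a small standalone sketch. The helper name `trimEndpointSuffix` and its slash handling are assumptions made for illustration; the adapters' real logic may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// trimEndpointSuffix is a hypothetical helper, not part of the library.
// It removes a trailing built-in endpoint suffix (for example
// "chat/completions") from a configured path prefix, matching only on
// whole path segments, so the suffix is not appended twice.
func trimEndpointSuffix(prefix, suffix string) string {
	p := strings.TrimRight(prefix, "/")
	s := strings.Trim(suffix, "/")
	if s == "" {
		return p
	}
	if p == s || strings.HasSuffix(p, "/"+s) {
		p = strings.TrimSuffix(p, s)
	}
	return strings.TrimRight(p, "/")
}

func main() {
	// The gateway paths below are made-up examples.
	fmt.Println(trimEndpointSuffix("/gateway/openai/v1/chat/completions", "chat/completions"))
	fmt.Println(trimEndpointSuffix("/gateway/anthropic/v1/messages/", "v1/messages"))
	fmt.Println(trimEndpointSuffix("/gateway/openai/v1", "responses"))
}
```

With this approach a user can paste either the bare gateway prefix or the full endpoint URL path into the config and end up with the same effective prefix.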

Anthropic Messages API:

| Area | Support | Notes |
|---|---|---|
| Text input/output | yes | User/assistant messages normalized |
| Streaming text | yes | |
| Reasoning/thinking | yes | Signed thinking and redacted thinking supported |
| Streaming thinking | yes | Redacted thinking is not streamed |
| Output format | yes | text and jsonSchema |
| Output verbosity | yes | maps to Anthropic effort |
| Stop sequences | yes | maps to stop_sequences |
| Images input | yes | base64 or URL |
| Files input | partial | PDFs supported; plain-text file document mapping is still pending |
| Function/custom tools | yes | |
| Web search | yes | server-side web search tool and result blocks |
| Tool policy | yes | auto, any, tool, none |
| Cache control | partial | top-level, input/output content, tool choice, tool call, tool output |
| Citations | partial | URL citations normalized |
| Usage | yes | input/output/cached; no explicit reasoning token count from Anthropic |

Normalization notes:
- reasoning input history keeps Anthropic-compatible signed/redacted reasoning only
- if an interleaved tool-result turn requires Anthropic thinking to be enabled/disabled, the adapter applies the needed override
- tool-result ordering is normalized for Anthropic’s strict tool-use/tool-result turn rules

OpenAI Responses API:

| Area | Support | Notes |
|---|---|---|
| Text input/output | yes | |
| Streaming text | yes | |
| Reasoning/thinking | yes | config + reasoning output items |
| Streaming thinking | yes | reasoning summary and reasoning text deltas |
| Output format | yes | text and jsonSchema |
| Output verbosity | yes | |
| Stop sequences | no | dropped with warning by normalization |
| Images input | yes | base64 or URL |
| Files input | yes | base64 or URL |
| Function/custom tools | yes | custom tool definitions are currently emitted as function tools |
| Web search | yes | built-in web search tool |
| Tool policy | yes | auto, any, tool, none |
| Cache control | partial | top-level prompt cache only |
| Citations | yes | URL citations normalized |
| Usage | yes | input/output/cached/reasoning |

Normalization notes:
- reasoning input history is sanitized to OpenAI-compatible encrypted reasoning only
- if no encrypted reasoning input exists, reasoning history items are dropped
- stateful Responses features like `previous_response_id` and provider-side storage are intentionally not normalized

OpenAI Chat Completions API:

| Area | Support | Notes |
|---|---|---|
| Text input/output | yes | first choice only is surfaced |
| Streaming text | yes | |
| Reasoning config | yes | reasoning effort only |
| Streaming thinking | no | API does not expose separate reasoning stream |
| Reasoning message history | no | dropped by adapter |
| Output format | yes | text and jsonSchema |
| Output verbosity | yes | max maps to high |
| Stop sequences | yes | up to 4 |
| Images input | yes | base64 data URL or remote URL |
| Files input | partial | embedded file data only |
| Function/custom tools | yes | custom tool definitions are currently emitted as function tools |
| Web search | yes | via top-level web_search_options, not as a normal tool call |
| Tool policy | yes | auto, any, tool, none |
| Cache control | partial | top-level prompt cache only |
| Citations | yes | URL citations from annotations |
| Usage | yes | input/output/cached/reasoning |
| System prompt role | yes | sent as developer for o* / gpt-5* model families, else system |

Normalization notes:
- reasoning message inputs are dropped because Chat Completions does not support structured reasoning history
- tool outputs are normalized back in as text-only tool messages
- web search forcing semantics differ from function tools because Chat Completions exposes web search as top-level request options, not as a standard tool call
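
The "System prompt role" row in the Chat Completions table can be sketched as a small selection function. This is only an illustration: the helper name `systemPromptRole` is hypothetical, and reading "o*" as "the letter o followed by a digit" (for example `o3`) is an assumption, not the adapter's confirmed matching rule.

```go
package main

import (
	"fmt"
	"strings"
)

// systemPromptRole is a hypothetical helper, not part of the library.
// It returns "developer" for gpt-5* models and for names that start with
// "o" followed by a digit (an assumed reading of "o*"), else "system".
func systemPromptRole(model string) string {
	m := strings.ToLower(strings.TrimSpace(model))
	if strings.HasPrefix(m, "gpt-5") {
		return "developer"
	}
	if len(m) >= 2 && m[0] == 'o' && m[1] >= '0' && m[1] <= '9' {
		return "developer"
	}
	return "system"
}

func main() {
	for _, model := range []string{"gpt-5-mini", "o3", "gpt-4o"} {
		fmt.Printf("%s -> %s\n", model, systemPromptRole(model))
	}
}
```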

Google Generate Content API:

| Area | Support | Notes |
|---|---|---|
| Text input/output | yes | first candidate only is surfaced |
| Streaming text | yes | |
| Reasoning/thinking | yes | config + Google-native signed thought history; signatures on assistant text and function-tool-call parts are preserved for replay |
| Streaming thinking | yes | streams thought text when exposed by the API |
| Output format | partial | text and jsonSchema; currently only the raw schema payload is forwarded |
| Output verbosity | no | dropped with warning by normalization |
| Stop sequences | yes | normalized up to capability max |
| Images input | yes | inline bytes or URI |
| Files input | yes | inline bytes or URI |
| Function/custom tools | yes | custom tool definitions are emitted as function declarations |
| Web search | yes | Google Search grounding normalized as web-search call/output |
| Tool policy | partial | auto, any, tool, none for callable tools; web search cannot be forced as a callable tool |
| Cache control | no | dropped with warning by normalization |
| Citations | partial | grounding is normalized as web-search tool outputs, not attached to text citations yet |
| Usage | yes | input/output/cached/reasoning |

Normalization notes:
- reasoning input history keeps only valid Google-native signed thoughts
- non-Google reasoning history is sanitized out before request conversion
- assistant text/tool-call signatures emitted by Gemini are preserved and passed back on follow-up turns
- function tool output history is currently text-only
- `ToolPolicy.DisableParallel` is not currently normalized for Google Generate Content

Package modelpreset provides a runtime catalog of common providers and models.
It includes:
- provider names
- model preset IDs
- model names
- provider connection defaults
- model default `spec.ModelParam`
- provider-level capability overrides
- model-level capability overrides
- default provider/default model hints
Typical use:

```go
providerPreset, err := modelpreset.Provider(modelpreset.ProviderAnthropic)
modelPreset, err := modelpreset.Model(modelpreset.ProviderAnthropic, modelpreset.PresetAnthropicSonnet46)
_, err = ps.AddProviderFromPreset(ctx, providerPreset.Name, providerPreset)
```

- The returned presets are cloned. Callers may mutate/customize returned values safely as required.
- Apps that need persistence should treat `modelpreset` as immutable base data and store their own overlay/preference fields separately.

Capabilities are described by spec.ModelCapabilities in spec/capability.go.
Default provider capability profiles live in:
- Anthropic: `internal/anthropicsdk/capability.go`
- OpenAI Responses: `internal/openairesponsessdk/capability.go`
- OpenAI Chat: `internal/openaichatsdk/capability.go`
- Google Generate Content: `internal/googlegeneratecontentsdk/capability.go`

You can inspect the active provider-wide default via:
- `ProviderSetAPI.GetProviderCapability(ctx, providerName)`

Normalization behavior:
- unsupported contract-like features generally return an error
  - example: unsupported output format
- unsupported safe-to-drop features are removed and reported via `FetchCompletionResponse.Warnings`
  - example: unsupported verbosity or cache-control scope
- some provider-specific history items are sanitized before request conversion
  - Anthropic: only Anthropic-compatible reasoning history is retained
  - OpenAI Responses: only encrypted reasoning history is retained
  - OpenAI Chat: reasoning history is dropped
  - Google: only valid signed Google thought history is retained
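
The split between "error" and "drop with warning" can be illustrated with a toy normalizer. Everything in this sketch is hypothetical: `caps` and `request` are tiny stand-ins for `spec.ModelCapabilities` and the real request type, and truncating stop sequences to a maximum is an assumed policy chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// caps is a hypothetical, much smaller stand-in for spec.ModelCapabilities.
type caps struct {
	JSONSchemaOutput bool
	Verbosity        bool
	MaxStopSequences int
}

// request is a hypothetical stand-in for a normalized completion request.
type request struct {
	OutputFormat  string // "text" or "jsonSchema"
	Verbosity     string // empty means unset
	StopSequences []string
}

// normalize returns an error for unsupported contract-like features and
// drops unsupported safe-to-drop features, recording a warning for each.
func normalize(req request, c caps) (request, []string, error) {
	var warnings []string
	if req.OutputFormat == "jsonSchema" && !c.JSONSchemaOutput {
		// The caller depends on the output shape, so dropping would be unsafe.
		return req, nil, errors.New("output format jsonSchema is not supported")
	}
	if req.Verbosity != "" && !c.Verbosity {
		warnings = append(warnings, "verbosity is not supported and was dropped")
		req.Verbosity = ""
	}
	if len(req.StopSequences) > c.MaxStopSequences {
		warnings = append(warnings, fmt.Sprintf("stop sequences truncated to %d", c.MaxStopSequences))
		req.StopSequences = req.StopSequences[:c.MaxStopSequences]
	}
	return req, warnings, nil
}

func main() {
	c := caps{JSONSchemaOutput: false, Verbosity: false, MaxStopSequences: 1}

	out, warnings, err := normalize(request{
		OutputFormat:  "text",
		Verbosity:     "high",
		StopSequences: []string{"END", "STOP"},
	}, c)
	fmt.Println(out.Verbosity == "", len(out.StopSequences), warnings, err)

	_, _, err = normalize(request{OutputFormat: "jsonSchema"}, c)
	fmt.Println(err)
}
```

The design point is that a dropped feature still leaves a usable response, while a dropped output contract would silently break the caller's parsing, so the latter fails fast.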
For per-model capability differences, pass a custom spec.ModelCapabilityResolver in FetchCompletionOptions.
- For most model-preset based callers, use:
  - `ProviderSetAPI.NewPresetCapabilityResolver`
  - `capabilityoverride.DeriveModelCapabilities`
  - `capabilityoverride.NewCompletionKeyResolver`

Provider SDKs expose broad provider-level capabilities. Real models often differ:
- one model may not support files
- one model may only allow a subset of reasoning levels
- one gateway may use a different parameter dialect
- one model may require temperature to be omitted when reasoning is enabled
capabilityoverride.ModelCapabilitiesOverride is a patch-like form of spec.ModelCapabilities.
Layering order is:
- SDK/provider base capability profile
- provider preset override
- model preset override
- caller/user override, if any
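
The layering order can be read as successive patches over a base profile. The sketch below uses invented types (`capabilities`, `override`) rather than the real `spec.ModelCapabilities` and `capabilityoverride.ModelCapabilitiesOverride`, and assumes that a nil override field means "inherit from the layer below".

```go
package main

import "fmt"

// capabilities is a hypothetical stand-in for spec.ModelCapabilities.
type capabilities struct {
	Files            bool
	Verbosity        bool
	MaxStopSequences int
}

// override is a hypothetical patch-like form: nil fields inherit.
type override struct {
	Files            *bool
	Verbosity        *bool
	MaxStopSequences *int
}

// apply layers overrides over base in order; later layers win.
func apply(base capabilities, layers ...*override) capabilities {
	out := base
	for _, o := range layers {
		if o == nil {
			continue // a missing layer changes nothing
		}
		if o.Files != nil {
			out.Files = *o.Files
		}
		if o.Verbosity != nil {
			out.Verbosity = *o.Verbosity
		}
		if o.MaxStopSequences != nil {
			out.MaxStopSequences = *o.MaxStopSequences
		}
	}
	return out
}

// ptr returns a pointer to v, which keeps override literals short.
func ptr[T any](v T) *T { return &v }

func main() {
	sdkBase := capabilities{Files: true, Verbosity: true, MaxStopSequences: 4}
	providerPreset := &override{Verbosity: ptr(false)}
	modelPreset := &override{Files: ptr(false)}
	var caller *override // no caller override in this example

	fmt.Printf("%+v\n", apply(sdkBase, providerPreset, modelPreset, caller))
	// prints {Files:false Verbosity:false MaxStopSequences:4}
}
```

Pointer fields are what make the override "patch-like": a plain `false` would be indistinguishable from "not specified".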
Use ProviderSetAPI.NewPresetCapabilityResolver for the common case:

```go
resolver, err := ps.NewPresetCapabilityResolver(
	ctx,
	providerPreset.Name,
	providerPreset,
	modelPreset,
	string(modelPreset.ID),
)
```

Then pass it per completion:

```go
opts := &spec.FetchCompletionOptions{
	CompletionKey:      string(modelPreset.ID),
	CapabilityResolver: resolver,
}
```

`AddProviderFromPreset` only configures the provider connection. Capability overrides are applied per completion through `FetchCompletionOptions`, because the active model can differ from call to call.

The library exposes a pluggable CompletionDebugger:

```go
type CompletionDebugger interface {
	HTTPClient(base *http.Client) *http.Client
	StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
```

Package `debugclient` includes a ready-to-use implementation:
- wraps provider SDK HTTP clients
- captures scrubbed request/response metadata
- redacts secrets and sensitive content
- attaches structured debug data to `FetchCompletionResponse.DebugDetails`

Typical setup:

```go
dbg := debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
	LogToSlog: false,
})
ps, _ := inference.NewProviderSetAPI(
	inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
		return dbg
	}),
)
```

- Stateless focus
  - the SDK intentionally focuses on stateless request/response flows
  - provider-native conversation state, uploaded file IDs, stored responses, and similar stateful features are out of scope for the normalized interface
- Opaque provider-specific fields
  - many provider-native details remain available only through debug payloads, not the normalized response structs
- Prompt filtering
  - `ModelParam.MaxPromptLength` uses a heuristic tokenizer via `sdkutil.FilterMessagesByTokenCount`
  - it is approximate, not a provider tokenizer
- Choice/candidate handling
  - OpenAI Chat surfaces the first choice
  - Google Generate Content surfaces the first candidate
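
To make the "Prompt filtering" note concrete, here is a rough sketch of what a heuristic, tokenizer-free budget filter can look like. It is not the implementation of `sdkutil.FilterMessagesByTokenCount`: the four-characters-per-token estimate, the keep-newest-first policy, and both function names are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens is a hypothetical heuristic: roughly one token per four
// characters, rounded up. Real provider tokenizers will disagree.
func estimateTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}

// filterByTokenBudget keeps the newest messages whose estimated total
// stays within maxTokens, dropping older ones first (assumed policy).
func filterByTokenBudget(msgs []string, maxTokens int) []string {
	total := 0
	start := len(msgs)
	for i := len(msgs) - 1; i >= 0; i-- {
		t := estimateTokens(msgs[i])
		if total+t > maxTokens {
			break
		}
		total += t
		start = i
	}
	return msgs[start:]
}

func main() {
	history := []string{
		"An older and fairly long message that may not fit.",
		"A short reply.",
		"The latest question?",
	}
	for _, m := range filterByTokenBudget(history, 10) {
		fmt.Println(m)
	}
}
```

Because the estimate is approximate, a budget set this way is best treated as a soft limit with some headroom below the provider's real context window.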

- Formatting/linting uses the repository configuration in `.golangci.yml`
- Useful scripts are available in `taskfile.yml`
- PRs are welcome
  - keep the public surface small and provider-neutral
  - avoid leaking provider SDK types into `package inference` or `spec`

Copyright (c) 2026 - Present - Pankaj Pipada
All source code in this repository, unless otherwise noted, is licensed under the MIT License. See LICENSE for details.