A general-purpose LLM SDK for Rust with support for multiple LLM providers.
- Type-safe: Leverages Rust's type system for compile-time guarantees
- Async-first: Built with Tokio for high-performance async operations
- Ergonomic: Builder pattern for easy request construction
- Comprehensive error handling: Detailed error types with context
- Tool/Function Calling: Type-safe tool calling with automatic JSON Schema generation
- Claude support: Full Messages API implementation
- Gemini support: Google Gemini 3 Pro and Flash with reasoning capabilities
- Grok support: xAI and Zen (free) Grok integration with OpenAI-compatible API
- GLM support: Cerebras GLM models with OpenAI-compatible API
- Ollama support: Local models via Ollama's /api/chat endpoint
- llama.cpp support: Local models via OpenAI-compatible API
- Zen provider: Free access to select models during beta
- OpenAI support: GPT-5 models (GPT-5, GPT-5 mini, GPT-5 nano, GPT-5.1, GPT-5.1 Codex) via Responses API
- Voyage AI support: Text embeddings with multiple specialized models
- Multi-provider: Same models available from different providers
- Extensible: Designed for easy addition of other LLM providers
The SDK uses a trait-based architecture that provides a unified interface across all LLM providers while maintaining type safety and extensibility:
```rust
// Core trait that all providers implement
pub trait LlmClient: Send + Sync {
    type Response: LlmResponse;
    type MessageBuilder: MessageBuilder;

    fn message_builder(&self) -> Self::MessageBuilder;
    fn with_base_url(self, url: impl Into<String>) -> Self;
}

// Provider-specific implementations
impl LlmClient for ClaudeClient { /* ... */ }
impl LlmClient for GeminiClient { /* ... */ }
impl LlmClient for GrokClient { /* ... */ }
impl LlmClient for GlmClient { /* ... */ }
impl LlmClient for OpenAIClient { /* ... */ }
```

- Type Safety: Each provider has its own response types and message builders
- Unified API: Common operations work across all providers via the trait
- Extensibility: New providers can be added by implementing the trait
- Zero-Cost Abstractions: No runtime overhead from the trait system
- Provider-Specific Features: Access to unique capabilities of each provider
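To make the zero-cost claim concrete, here is a self-contained sketch of how code written against a trait like `LlmClient` stays generic over providers while being monomorphized at compile time. The `ToyClient`/`ToyBuilder` types are illustrative stand-ins, not the SDK's real types, and the synchronous `send` is a simplification of the SDK's async API:

```rust
// Illustrative sketch: provider-generic code over an LlmClient-style trait.
// ToyClient and ToyBuilder are stand-ins, not real SDK types.
trait MessageBuilder {
    fn user_message(self, content: &str) -> Self;
    fn send(self) -> String; // the real SDK returns an async Result<Response>
}

trait LlmClient {
    type Builder: MessageBuilder;
    fn message_builder(&self) -> Self::Builder;
}

struct ToyClient;
struct ToyBuilder {
    prompt: String,
}

impl MessageBuilder for ToyBuilder {
    fn user_message(mut self, content: &str) -> Self {
        self.prompt = content.to_string();
        self
    }
    fn send(self) -> String {
        format!("echo: {}", self.prompt)
    }
}

impl LlmClient for ToyClient {
    type Builder = ToyBuilder;
    fn message_builder(&self) -> ToyBuilder {
        ToyBuilder { prompt: String::new() }
    }
}

// Works with any provider implementing the trait; monomorphized at
// compile time, so there is no runtime dispatch overhead.
fn ask<C: LlmClient>(client: &C, question: &str) -> String {
    client.message_builder().user_message(question).send()
}

fn main() {
    let reply = ask(&ToyClient, "hello");
    assert_eq!(reply, "echo: hello");
    println!("{}", reply);
}
```

Because `ask` is generic rather than taking a trait object, each provider gets its own compiled copy with statically dispatched calls.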
Access the same models through different providers for flexibility in cost, performance, and availability:
| Model | Zen (Free) | xAI (Paid) | Cerebras (Paid) |
|---|---|---|---|
| Grok | grok-code | grok-code-fast-1 | - |
| GLM 4.6 | big-pickle | - | zai-glm-4.6 |
Add this to your Cargo.toml:
```toml
[dependencies]
llm-sdk = { path = "../llm-sdk" } # For local development
# OR
llm-sdk = "0.1"
```

```rust
use nocodo_llm_sdk::claude::ClaudeClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with your Anthropic API key
    let client = ClaudeClient::new("your-anthropic-api-key")?;

    // Build and send a message
    let response = client
        .message_builder()
        .model("claude-sonnet-4-5-20250929")
        .max_tokens(1024)
        .user_message("Hello, Claude! How are you today?")
        .send()
        .await?;

    println!("Response: {}", response.content[0].text);
    Ok(())
}
```

```rust
use nocodo_llm_sdk::gemini::GeminiClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with your Google API key
    let client = GeminiClient::new("your-google-api-key")?;

    // Build and send a message
    let response = client
        .message_builder()
        .model("gemini-3-pro-preview")
        .system("You are a helpful assistant")
        .user_message("Explain quantum entanglement briefly")
        .thinking_level("high") // Enable deep reasoning
        .temperature(1.0) // Recommended: keep at 1.0
        .max_output_tokens(1024)
        .send()
        .await?;

    // Extract and print the response
    for candidate in &response.candidates {
        for part in &candidate.content.parts {
            if let Some(text) = &part.text {
                println!("Gemini: {}", text);
            }
        }
    }

    // Print token usage
    if let Some(usage) = response.usage_metadata {
        println!(
            "Usage: {} input, {} output, {} total tokens",
            usage.prompt_token_count,
            usage.candidates_token_count,
            usage.total_token_count
        );
    }

    Ok(())
}
```

```rust
use nocodo_llm_sdk::grok::GrokClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with your xAI API key
    let client = GrokClient::new("your-xai-api-key")?;

    // Build and send a message
    let response = client
        .message_builder()
        .model("grok-code-fast-1")
        .max_tokens(1024)
        .user_message("Write a Rust function to reverse a string.")
        .send()
        .await?;

    println!("Grok: {}", response.choices[0].message.content);
    println!(
        "Usage: {} input tokens, {} output tokens",
        response.usage.prompt_tokens, response.usage.completion_tokens
    );
    Ok(())
}
```

```rust
use nocodo_llm_sdk::glm::GlmClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with your Cerebras API key
    let client = GlmClient::new("your-cerebras-api-key")?;

    // Build and send a message
    let response = client
        .message_builder()
        .model("zai-glm-4.6")
        .max_tokens(1024)
        .user_message("Explain quantum computing in simple terms.")
        .send()
        .await?;

    // GLM models may return both content and reasoning
    println!("GLM: {}", response.choices[0].message.get_text());
    println!(
        "Usage: {} input tokens, {} output tokens",
        response.usage.prompt_tokens, response.usage.completion_tokens
    );
    Ok(())
}
```

```rust
use nocodo_llm_sdk::ollama::OllamaClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Defaults to http://localhost:11434
    let client = OllamaClient::new()?;

    let response = client
        .message_builder()
        .model("llama3.1")
        .user_message("Hello from Ollama!")
        .send()
        .await?;

    println!("Ollama: {}", response.message.content);
    Ok(())
}
```

```rust
use nocodo_llm_sdk::llama_cpp::LlamaCppClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Defaults to http://localhost:8080
    let client = LlamaCppClient::new()?;

    let response = client
        .message_builder()
        .model("gpt-3.5-turbo")
        .user_message("Hello from llama.cpp!")
        .send()
        .await?;

    println!(
        "llama.cpp: {}",
        response.choices[0].message.content.clone().unwrap_or_default()
    );
    Ok(())
}
```

nocodo-llm-sdk supports accessing the same models via different providers, giving you flexibility in cost, performance, and availability.
Access Grok via different providers:
```rust
use nocodo_llm_sdk::grok::zen::ZenGrokClient;

// No API key required for the free model!
let client = ZenGrokClient::new()?;
let response = client
    .message_builder()
    .model("grok-code")
    .max_tokens(1024)
    .user_message("Hello, Grok!")
    .send()
    .await?;

println!("Response: {}", response.choices[0].message.content);
```

```rust
use nocodo_llm_sdk::grok::xai::XaiGrokClient;

let client = XaiGrokClient::new("your-xai-api-key")?;
let response = client
    .message_builder()
    .model("grok-code-fast-1")
    .max_tokens(1024)
    .user_message("Hello, Grok!")
    .send()
    .await?;

println!("Response: {}", response.choices[0].message.content);
```

Access GLM 4.6 via different providers:

```rust
use nocodo_llm_sdk::glm::cerebras::CerebrasGlmClient;

let client = CerebrasGlmClient::new("your-cerebras-api-key")?;
let response = client
    .message_builder()
    .model("zai-glm-4.6")
    .max_tokens(1024)
    .user_message("Hello, GLM!")
    .send()
    .await?;

println!("Response: {}", response.choices[0].message.get_text());
```

| Model | Zen (OpenCode) | Native Provider |
|---|---|---|
| Grok | grok-code (free) | grok-code-fast-1 (xAI, paid) |
| GLM 4.6 | big-pickle (free, limited time) | zai-glm-4.6 (Cerebras, paid) |
```rust
use nocodo_llm_sdk::openai::OpenAIClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client with your OpenAI API key
    let client = OpenAIClient::new("your-openai-api-key")?;

    // Build and send a response request
    let response = client
        .response_builder()
        .model("gpt-5-mini")
        .input("Write a Python function to check if a number is prime.")
        .send()
        .await?;

    // Extract text from response
    let text_content: String = response
        .output
        .iter()
        .filter(|item| item.item_type == "message")
        .filter_map(|item| item.content.as_ref())
        .flatten()
        .filter(|block| block.content_type == "output_text")
        .map(|block| block.text.clone())
        .collect();

    println!("GPT: {}", text_content);
    println!(
        "Usage: {} input tokens, {} output tokens",
        response.usage.input_tokens.unwrap_or(0),
        response.usage.output_tokens.unwrap_or(0)
    );

    Ok(())
}
```

Google's Gemini 3 models with reasoning capabilities and thinking level controls.
The most intelligent model for complex reasoning tasks.
```rust
use nocodo_llm_sdk::gemini::GeminiClient;
use nocodo_llm_sdk::models::gemini::GEMINI_3_PRO;

let client = GeminiClient::new("your-google-api-key")?;
let response = client
    .message_builder()
    .model(GEMINI_3_PRO)
    .system("You are a helpful assistant")
    .user_message("Write a Rust function to calculate fibonacci numbers")
    .thinking_level("high") // Enable deep reasoning (default)
    .temperature(1.0) // Recommended: keep at 1.0
    .max_output_tokens(1024)
    .send()
    .await?;

// Extract response text
for candidate in &response.candidates {
    for part in &candidate.content.parts {
        if let Some(text) = &part.text {
            println!("{}", text);
        }
    }
}
```

Pro-level intelligence at Flash speed for faster responses.
```rust
use nocodo_llm_sdk::models::gemini::GEMINI_3_FLASH;

let response = client
    .message_builder()
    .model(GEMINI_3_FLASH)
    .thinking_level("low") // Fast mode for quicker responses
    .user_message("What is a REST API?")
    .max_output_tokens(200)
    .send()
    .await?;
```

Gemini 3 supports dynamic thinking control to balance speed vs. reasoning depth:
Gemini 3 Pro:
- `low`: Faster responses with less reasoning
- `high` (default): Maximum reasoning capability
Gemini 3 Flash:
- `minimal`: Fastest responses
- `low`: Quick with basic reasoning
- `medium`: Balanced speed and reasoning
- `high` (default): Maximum reasoning
```rust
// Fast response mode
let response = client
    .message_builder()
    .model(GEMINI_3_FLASH)
    .thinking_level("minimal")
    .user_message("Quick question...")
    .send()
    .await?;

// Deep reasoning mode
let response = client
    .message_builder()
    .model(GEMINI_3_PRO)
    .thinking_level("high")
    .user_message("Complex reasoning task...")
    .send()
    .await?;
```

```rust
let response = client
    .message_builder()
    .model(GEMINI_3_PRO)
    .system("You are a helpful coding assistant")
    .user_message("What's the best way to handle errors in Rust?")
    .model_message("The best way is to use Result<T, E> for recoverable errors.")
    .user_message("Can you show me an example?")
    .send()
    .await?;
```

- ✅ 1M token context window - Process large documents
- ✅ 64k token output - Generate long-form content
- ✅ Tool/function calling - Integrate with external APIs
- ✅ Vision support - Analyze images (multimodal)
- ✅ Structured outputs - JSON response formatting
- ✅ Thinking level controls - Adjust reasoning depth
- ✅ Thought signature preservation - Maintains reasoning context for multi-step tasks
- Context: 1M input / 64k output tokens
- Knowledge cutoff: January 2025
- Thinking levels: `low`, `high` (default)
- Best for: Complex reasoning, autonomous coding, agentic workflows
- Pricing: $2/1M input tokens (<200k), $12/1M output tokens
- Context: 1M input / 64k output tokens
- Knowledge cutoff: January 2025
- Thinking levels: `minimal`, `low`, `medium`, `high` (default)
- Best for: Fast responses with Pro-level intelligence
- Pricing: $0.50/1M input tokens, $3/1M output tokens
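Using the list prices above, a rough cost estimate is just a per-million-token multiplication. The helper below is an illustrative sketch, not an SDK API, shown here with the Gemini 3 Pro rates for a prompt under the 200k threshold:

```rust
// Illustrative cost estimator, not part of the SDK.
// Rates are USD per 1M tokens; e.g. Gemini 3 Pro: $2 input (<200k), $12 output.
fn estimate_cost_usd(input_tokens: u64, output_tokens: u64, in_rate: f64, out_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * in_rate
        + (output_tokens as f64 / 1_000_000.0) * out_rate
}

fn main() {
    // 100k input + 10k output on Gemini 3 Pro:
    // 0.1 * $2 + 0.01 * $12 = $0.20 + $0.12 = $0.32
    let cost = estimate_cost_usd(100_000, 10_000, 2.0, 12.0);
    println!("estimated cost: ${:.2}", cost); // prints "estimated cost: $0.32"
}
```

The token counts to feed in come from the usage fields shown in the examples above (`prompt_token_count` / `candidates_token_count` for Gemini).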
Temperature: Gemini 3 documentation strongly recommends keeping temperature at 1.0 (default). Lowering temperature may cause looping or degraded performance. Use thinking level instead to control response quality/speed.
Thought Signatures: For advanced tool calling with multi-step reasoning, Gemini 3 uses encrypted "thought signatures" to maintain reasoning context. The SDK automatically preserves these signatures across API calls.
```rust
let response = client
    .message_builder()
    .model("claude-sonnet-4-5-20250929")
    .max_tokens(1024)
    .message("system", "You are a helpful assistant.")
    .message("user", "What's the capital of France?")
    .message("assistant", "The capital of France is Paris.")
    .message("user", "What's its population?")
    .send()
    .await?;
```

```rust
let response = client
    .message_builder()
    .model("claude-sonnet-4-5-20250929")
    .max_tokens(2048)
    .temperature(0.7)
    .top_p(0.9)
    .system("You are an expert programmer.")
    .user_message("Write a Rust function to calculate fibonacci numbers.")
    .send()
    .await?;
```

```rust
let response = client
    .message_builder()
    .model("grok-code-fast-1")
    .max_tokens(1024)
    .system_message("You are an expert Rust developer.")
    .user_message("What's the best way to handle errors in Rust?")
    .assistant_message("The best way is to use Result<T, E> for recoverable errors and panic! for unrecoverable errors.")
    .user_message("Can you show me an example?")
    .send()
    .await?;
```

```rust
match client
    .message_builder()
    .model("claude-sonnet-4-5-20250929")
    .max_tokens(100)
    .user_message("Hello")
    .send()
    .await
{
    Ok(response) => println!("Success: {}", response.content[0].text),
    Err(nocodo_llm_sdk::error::LlmError::AuthenticationError { message }) => {
        eprintln!("Authentication failed: {}", message);
    }
    Err(nocodo_llm_sdk::error::LlmError::RateLimitError { message, retry_after }) => {
        eprintln!("Rate limited: {} - retry after {:?}", message, retry_after);
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
```

Enable LLMs to call external functions with type-safe parameter extraction.
```rust
use nocodo_llm_sdk::{openai::OpenAIClient, tools::Tool};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// Define parameter schema
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct WeatherParams {
    location: String,
    unit: String,
}

// Create tool
let tool = Tool::from_type::<WeatherParams>()
    .name("get_weather")
    .description("Get weather for a location")
    .build();

// Use tool
let response = client
    .message_builder()
    .user_message("What's the weather in NYC?")
    .tool(tool)
    .send()
    .await?;

// Handle tool calls
if let Some(calls) = response.tool_calls() {
    for call in calls {
        let params: WeatherParams = call.parse_arguments()?;
        // Execute your function...
    }
}
```

```rust
// Define multiple tools
let search_tool = Tool::from_type::<SearchParams>()
    .name("search")
    .description("Search the knowledge base")
    .build();

let calc_tool = Tool::from_type::<CalculateParams>()
    .name("calculate")
    .description("Evaluate mathematical expressions")
    .build();

// Use multiple tools with parallel execution
let response = client
    .message_builder()
    .user_message("Search for 'Rust' and calculate 123 * 456")
    .tools(vec![search_tool, calc_tool])
    .tool_choice(ToolChoice::Auto)
    .parallel_tool_calls(true)
    .send()
    .await?;
```

See examples/tool_calling_*.rs for complete examples.
- `ToolChoice::Auto`: Let the model decide whether to use tools
- `ToolChoice::Required`: Force the model to use at least one tool
- `ToolChoice::None`: Disable tool use
- `ToolChoice::Specific { name }`: Force a specific tool by name
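The semantics of these variants can be illustrated with a self-contained sketch: a toy enum mirroring the variant names above (not the SDK's actual type) plus illustrative logic deciding which tool names a request would expose to the model:

```rust
// Toy mirror of the ToolChoice variants, for illustration only.
enum ToolChoice {
    Auto,
    Required,
    None,
    Specific { name: String },
}

// Which tools end up offered to the model under each policy (illustrative).
fn offered_tools(choice: &ToolChoice, tools: &[&str]) -> Vec<String> {
    match choice {
        // Auto and Required both expose every tool; Required additionally
        // forces the model to pick one (enforced by the API, not here).
        ToolChoice::Auto | ToolChoice::Required => {
            tools.iter().map(|t| t.to_string()).collect()
        }
        ToolChoice::None => Vec::new(),
        ToolChoice::Specific { name } => tools
            .iter()
            .filter(|t| **t == name.as_str())
            .map(|t| t.to_string())
            .collect(),
    }
}

fn main() {
    let tools = ["search", "calculate"];
    assert!(offered_tools(&ToolChoice::None, &tools).is_empty());
    assert_eq!(offered_tools(&ToolChoice::Auto, &tools).len(), 2);
    assert_eq!(
        offered_tools(&ToolChoice::Specific { name: "search".into() }, &tools),
        vec!["search".to_string()]
    );
    println!("ok");
}
```

In the real SDK the choice is simply passed to `tool_choice(...)` on the builder, as in the multi-tool example above; the filtering and forcing happen on the provider side.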
ClaudeClient:
- `new(api_key: impl Into<String>) -> Result<Self>`: Create a new client
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> ClaudeMessageBuilder`: Start building a message request

GeminiClient:
- `new(api_key: impl Into<String>) -> Result<Self>`: Create a new client
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> GeminiMessageBuilder`: Start building a message request

GrokClient:
- `new(api_key: impl Into<String>) -> Result<Self>`: Create a new client
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> GrokMessageBuilder`: Start building a message request

GlmClient:
- `new(api_key: impl Into<String>) -> Result<Self>`: Create a new client
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> GlmMessageBuilder`: Start building a message request

OpenAIClient:
- `new(api_key: impl Into<String>) -> Result<Self>`: Create a new client
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `response_builder() -> OpenAIResponseBuilder`: Start building a response request

OllamaClient:
- `new() -> Result<Self>`: Create a new client (no API key required)
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> OllamaMessageBuilder`: Start building a message request

LlamaCppClient:
- `new() -> Result<Self>`: Create a new client (no API key required)
- `with_api_key(api_key: impl Into<String>) -> Self`: Set optional API key
- `with_base_url(url: impl Into<String>) -> Self`: Set custom API base URL
- `message_builder() -> LlamaCppMessageBuilder`: Start building a message request

Message builder methods (Claude, Grok, GLM):
- `model(model: impl Into<String>) -> Self`: Set the model
- `max_tokens(tokens: u32) -> Self`: Set maximum tokens
- `message(role: impl Into<String>, content: impl Into<String>) -> Self`: Add a message
- `user_message(content: impl Into<String>) -> Self`: Add a user message
- `assistant_message(content: impl Into<String>) -> Self`: Add an assistant message
- `system_message(content: impl Into<String>) -> Self`: Add a system message
- `temperature(temp: f32) -> Self`: Set temperature (0.0-2.0)
- `top_p(top_p: f32) -> Self`: Set top-p sampling
- `stop_sequences(sequences: Vec<String>) -> Self`: Set stop sequences
- `send() -> Result<Response>`: Send the request

Note: Claude also supports `system()` for system prompts, while Grok uses `system_message()`.

GeminiMessageBuilder:
- `model(model: impl Into<String>) -> Self`: Set the model
- `user_message(content: impl Into<String>) -> Self`: Add a user message
- `model_message(content: impl Into<String>) -> Self`: Add a model (assistant) message
- `content(content: GeminiContent) -> Self`: Add a complete content object
- `system(text: impl Into<String>) -> Self`: Set system instruction
- `thinking_level(level: impl Into<String>) -> Self`: Set thinking level (minimal/low/medium/high)
- `temperature(temp: f32) -> Self`: Set temperature (recommended: 1.0)
- `max_output_tokens(tokens: u32) -> Self`: Set maximum output tokens
- `top_p(top_p: f32) -> Self`: Set top-p sampling
- `top_k(top_k: u32) -> Self`: Set top-k sampling
- `tool(tool: GeminiTool) -> Self`: Add a tool/function declaration
- `tool_config(config: GeminiToolConfig) -> Self`: Configure tool calling behavior
- `send() -> Result<GeminiGenerateContentResponse>`: Send the request
- `AuthenticationError`: Invalid API key
- `RateLimitError`: Rate limit exceeded (includes `retry_after` info)
- `InvalidRequestError`: Malformed request
- `ApiError`: API error with status code
- `NetworkError`: Network/connection issues
- `ParseError`: JSON parsing errors
- `InternalError`: Unexpected internal errors
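A common pattern with these variants is retrying on `RateLimitError` while failing fast on everything else. The sketch below is self-contained and illustrative: it uses a toy error enum mirroring the variant names above rather than the SDK's real `LlmError`, and a synchronous closure in place of an async API call:

```rust
use std::time::Duration;

// Toy mirror of the SDK's error variants, for illustration only.
#[derive(Debug)]
enum LlmError {
    AuthenticationError { message: String },
    RateLimitError { message: String, retry_after: Option<Duration> },
    NetworkError { message: String },
}

// Retry transient rate-limit errors up to `max_retries`; fail fast otherwise.
fn with_retries<F>(mut call: F, max_retries: u32) -> Result<String, LlmError>
where
    F: FnMut() -> Result<String, LlmError>,
{
    let mut attempt = 0;
    loop {
        match call() {
            Ok(resp) => return Ok(resp),
            Err(LlmError::RateLimitError { message, retry_after }) if attempt < max_retries => {
                attempt += 1;
                // In real code you would sleep for `retry_after` before retrying.
                let _ = (message, retry_after);
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Simulated call: fails twice with a rate limit, succeeds on the third try.
    let mut calls = 0;
    let result = with_retries(
        || {
            calls += 1;
            if calls < 3 {
                Err(LlmError::RateLimitError {
                    message: "slow down".into(),
                    retry_after: Some(Duration::from_secs(1)),
                })
            } else {
                Ok("hello".to_string())
            }
        },
        5,
    );
    assert_eq!(result.unwrap(), "hello");
    assert_eq!(calls, 3);
}
```

With the real SDK the same shape applies around `send().await`, matching on `nocodo_llm_sdk::error::LlmError` as in the error-handling example above.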
```shell
cargo test
```

Integration tests require valid API keys:

```shell
# Claude integration tests
ANTHROPIC_API_KEY=your-key-here cargo test --test claude_integration -- --ignored

# Gemini integration tests
GEMINI_API_KEY=your-key-here cargo test --test gemini_integration -- --ignored

# Grok integration tests
XAI_API_KEY=your-key-here cargo test --test grok_integration -- --ignored

# GLM integration tests
CEREBRAS_API_KEY=your-key-here cargo test --test glm_integration -- --ignored
```

See the examples/ directory for complete working examples:
- `simple_completion.rs`: Basic Claude usage
- `gemini_simple.rs`: Gemini 3 Pro example with reasoning
- `gemini_flash.rs`: Gemini 3 Flash fast response example
- `grok_completion.rs`: Basic Grok usage
- `glm_completion.rs`: Basic GLM usage
- `voyage_embeddings.rs`: Text embeddings with Voyage AI
- `tool_calling_weather.rs`: Simple tool calling example
- `tool_calling_agent.rs`: Multi-tool agent example
This SDK is in active development. v0.1 provides the providers listed below with a clean, standalone API.
- Claude (Anthropic): Full Messages API implementation
- Models: claude-sonnet-4-5, claude-opus-4-5, claude-haiku-4-5, etc.
- Gemini (Google): Gemini API with reasoning capabilities
- Models: gemini-3-pro-preview, gemini-3-flash-preview
- Features: Thinking level controls, 1M context, tool calling
- Grok (xAI): OpenAI-compatible API
- Models: grok-code-fast-1, grok-beta, grok-vision-beta
- GLM (Cerebras): OpenAI-compatible API
- Models: zai-glm-4.6, llama-3.3-70b
- Ollama (Local): `/api/chat` endpoint
- Models: local models installed in Ollama
- llama.cpp (Local): OpenAI-compatible API
- Models: local models served by llama-server
- OpenAI: Responses API
- Models: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.1, gpt-5.1-codex
- Voyage AI: Text embeddings
- Models: voyage-4, voyage-4-large, voyage-4-lite, specialized domain models
- Streaming responses
- Advanced features (vision, extended thinking)
- Multi-language bindings (Python, Node.js)
Licensed under the same terms as the nocodo project.
Contributions welcome! Please see the main nocodo repository for contribution guidelines.