LLM Providers

Rohas supports multiple LLM providers, allowing you to use different AI models based on your needs.

Table of Contents

  1. Supported Providers
  2. OpenAI
  3. Anthropic
  4. Ollama
  5. Local Models
  6. Provider Configuration
  7. Model Selection

Supported Providers

  • OpenAI - GPT-4, GPT-3.5, and other OpenAI models
  • Anthropic - Claude 3 Opus, Sonnet, Haiku
  • Ollama - Local models via Ollama
  • Local - Custom local model servers

OpenAI

Setup

  1. Get an API key from OpenAI
  2. Use the --provider openai flag with your API key

Usage

rohas run script.ro --provider openai --api-key YOUR_API_KEY

In Script

prompt "Hello, how are you?"
  model: "gpt-4"
  temperature: 0.7
  maxTokens: 100

Available Models

  • gpt-4 - Most capable model
  • gpt-4-turbo - Faster GPT-4
  • gpt-3.5-turbo - Fast and cost-effective
  • gpt-4o - Latest GPT-4 variant

Example

prompt "Explain quantum computing"
  model: "gpt-4"
  temperature: 0.7
  maxTokens: 500

Anthropic

Setup

  1. Get an API key from Anthropic
  2. Use the --provider anthropic flag with your API key

Usage

rohas run script.ro --provider anthropic --api-key YOUR_API_KEY

In Script

prompt "Hello, how are you?"
  model: "claude-3-opus-20240229"
  temperature: 0.7
  maxTokens: 100

Available Models

  • claude-3-opus-20240229 - Most capable
  • claude-3-sonnet-20240229 - Balanced performance
  • claude-3-haiku-20240307 - Fast and efficient

Example

prompt "Write a short story"
  model: "claude-3-opus-20240229"
  temperature: 0.8
  maxTokens: 1000

Ollama

Ollama allows you to run models locally on your machine.

Setup

  1. Install Ollama
  2. Pull a model: ollama pull llama2
  3. Start the Ollama server (by default it listens on http://localhost:11434; you can verify it is reachable as shown below)
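
Before running a script, it can help to confirm the server is up. Ollama exposes a tags endpoint that lists the models you have pulled (the same check appears under Troubleshooting below):

# Verify the Ollama server is reachable and list pulled models
curl http://localhost:11434/api/tags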

Usage

rohas run script.ro --provider ollama --base-url http://localhost:11434 --model llama2

In Script

prompt "Hello, how are you?"
  model: "llama2"
  temperature: 0.7
  maxTokens: 100

Available Models

Any model available in Ollama:

  • llama2
  • mistral
  • codellama
  • phi
  • And many more...

Example

prompt "Explain Rust programming"
  model: "llama2"
  temperature: 0.5
  maxTokens: 300

Local Models

Use the local provider for custom model servers such as vLLM or text-generation-inference.

Setup

  1. Set up your local model server
  2. Ensure it exposes an OpenAI-compatible API (or place an adapter in front of it); a quick check is shown below
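
Many OpenAI-compatible servers, including vLLM, expose a /v1/models endpoint that works as a quick smoke test. This is a minimal sketch; the port and path depend on how your server is configured:

# List models served by an OpenAI-compatible endpoint (port is an example)
curl http://localhost:8000/v1/models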

Usage

rohas run script.ro --provider local --base-url http://localhost:8000 --model your-model

In Script

prompt "Hello, how are you?"
  model: "your-model"
  temperature: 0.7
  maxTokens: 100

Example

prompt "Generate code"
  model: "local-model"
  temperature: 0.3
  maxTokens: 500

Provider Configuration

Command Line

# OpenAI
rohas run script.ro --provider openai --api-key KEY --model gpt-4

# Anthropic
rohas run script.ro --provider anthropic --api-key KEY --model claude-3-opus-20240229

# Ollama
rohas run script.ro --provider ollama --base-url http://localhost:11434 --model llama2

# Local
rohas run script.ro --provider local --base-url http://localhost:8000 --model custom-model

Environment Variables

You can also set provider configuration via environment variables:

export ROHAS_PROVIDER=openai
export ROHAS_API_KEY=your_key
export ROHAS_MODEL=gpt-4

rohas run script.ro

Default Provider

If no provider is specified on the command line, Rohas falls back to the environment variables above, and then to a local provider if one is available.
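
A minimal sketch of that fallback, assuming the usual convention that command-line flags take precedence over environment variables:

export ROHAS_PROVIDER=ollama
export ROHAS_MODEL=llama2

# Uses the provider and model from the environment
rohas run script.ro

# A flag on the command line overrides the environment (assumed precedence)
rohas run script.ro --model mistral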

Model Selection

Per-Prompt Model Selection

You can specify different models for different prompts:

prompt "Simple question"
  model: "gpt-3.5-turbo"  # Use cheaper model

prompt "Complex analysis"
  model: "gpt-4"          # Use more capable model

Default Model

Set a default model via command line:

rohas run script.ro --provider openai --api-key KEY --model gpt-4

All prompts will use gpt-4 unless overridden:

prompt "Uses gpt-4 (default)"
  # model not specified, uses default

prompt "Uses gpt-3.5-turbo"
  model: "gpt-3.5-turbo"  # Override default

Prompt Options

All providers support these prompt options:

Temperature

Controls randomness (0.0 to 2.0 for OpenAI-style APIs; Anthropic caps it at 1.0). Lower values produce more deterministic output:

prompt "Creative writing"
  temperature: 0.9  # More creative

prompt "Factual answer"
  temperature: 0.1  # More focused

Max Tokens

Maximum tokens in response:

prompt "Short answer"
  maxTokens: 50

prompt "Long explanation"
  maxTokens: 1000

Stream

Stream responses (if supported):

prompt "Long response"
  stream: true
  maxTokens: 1000

Provider-Specific Features

OpenAI

  • Function calling / tool use
  • Streaming
  • Multiple model variants

Anthropic

  • Long context windows
  • Structured outputs
  • System prompts

Ollama

  • Local execution
  • No API costs
  • Custom model support

Local

  • Full control
  • Custom APIs
  • Private deployment

Best Practices

  1. Choose the right model - Use cheaper models for simple tasks, powerful models for complex ones
  2. Set appropriate temperature - Lower for factual tasks, higher for creative tasks
  3. Limit max tokens - Set reasonable limits to control costs
  4. Use streaming - For long responses, use streaming for better UX
  5. Cache responses - Consider caching for repeated queries
  6. Handle rate limits - Implement retry logic for production use (a sketch follows below)
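
Rohas may not retry failed calls on its own, so one option is to wrap the invocation in a small retry loop with backoff. This is a generic shell sketch, not a built-in Rohas feature; the attempt count and delays are arbitrary:

# Retry `rohas run` up to 3 times with exponential backoff (sketch)
for attempt in 1 2 3; do
  rohas run script.ro --provider openai --api-key "$ROHAS_API_KEY" && break
  echo "Attempt $attempt failed; backing off..." >&2
  sleep $((2 ** attempt))
done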

Troubleshooting

API Key Issues

# Check API key is set
echo $ROHAS_API_KEY

# Or pass explicitly
rohas run script.ro --provider openai --api-key YOUR_KEY

Connection Issues

# Check base URL for local/Ollama
rohas run script.ro --provider ollama --base-url http://localhost:11434

# Test connection
curl http://localhost:11434/api/tags  # For Ollama

Model Not Found

# List available models (Ollama)
ollama list

# Use correct model name
rohas run script.ro --provider ollama --model llama2

Examples

See the examples/ directory for provider-specific examples:

  • hello-world.ro - Basic prompt
  • template-literal-example.ro - Template literals with prompts
  • functions-and-await.ro - Async prompts
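
Any of these can be run against whichever provider you have configured. For example, to run the basic prompt against the local Ollama setup from above:

rohas run examples/hello-world.ro --provider ollama --base-url http://localhost:11434 --model llama2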

Cost Considerations

  • OpenAI: Pay per token (varies by model)
  • Anthropic: Pay per token (varies by model)
  • Ollama: Free (runs locally)
  • Local: Free (your infrastructure)

Choose based on your needs:

  • Development/testing: Ollama or local models
  • Production: OpenAI or Anthropic for reliability
  • High volume: Consider local models or caching