```
 __                              __   __
|  .---.-.-----.-----.-----.----|  |_|  |--.---.-.
|  |  _  |     |  _  |  -__|   _|   _|     |  _  |
|__|___._|__|__|___  |_____|__| |____|__|__|___._|
---------------|_____|----------------------------
```
The clan of fierce vikings with axes and shields to AId your rAId
Langertha is a unified Perl interface for LLM APIs. One API, many providers. Supports chat, streaming, embeddings, transcription, MCP tool calling, autonomous agents, workflow orchestration, observability, and dynamic model discovery.
- Supported Providers
- Quick Start
- Architecture at a Glance
- Usage Examples
- MCP Tool Calling
- Raider — Autonomous Agent
- Raid — Workflow Orchestration
- Observability with Langfuse
- Wrapper Classes
- Testing
- Examples
| Provider | Chat | Streaming | Tools (MCP) | Embeddings | Images | Transcription | Models API |
|---|---|---|---|---|---|---|---|
| OpenAI 🇺🇸 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic 🇺🇸 | ✅ | ✅ | ✅ | ✅ | |||
| Gemini 🇺🇸 | ✅ | ✅ | ✅ | ✅ | |||
| Ollama | ✅ | ✅ | ✅ | ✅ | ✅ | ||
| Groq 🇺🇸 | ✅ | ✅ | ✅ | ✅ | ✅ | ||
| Mistral 🇪🇺 | ✅ | ✅ | ✅ | ✅ | |||
| DeepSeek 🇨🇳 | ✅ | ✅ | ✅ | ✅ | |||
| MiniMax 🇨🇳 | ✅ | ✅ | ✅ | ✅ | |||
| Perplexity 🇺🇸 | ✅ | ✅ | |||||
| Nous Research 🇺🇸 | ✅ | ✅ | ✅ | ✅ | |||
| Cerebras 🇺🇸 | ✅ | ✅ | ✅ | ||||
| OpenRouter 🇺🇸 | ✅ | ✅ | ✅ | ||||
| Replicate 🇺🇸 | ✅ | ✅ | ✅ | ||||
| HuggingFace 🇺🇸 | ✅ | ✅ | ✅ | ||||
| vLLM | ✅ | ✅ | ✅ | ||||
| SGLang | ✅ | ✅ | ✅ | ||||
| llama.cpp | ✅ | ✅ | ✅ | ✅ | |||
| LM Studio | ✅ | ✅ | ✅ | ||||
| AKI.IO 🇪🇺 | ✅ | ✅ | ✅ | ✅ | |||
| T-Systems AIFS 🇩🇪 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Scaleway 🇪🇺 | ✅ | ✅ | ✅ | ✅ | ✅ | ||
| Whisper | | | | | | ✅ | |
cpanm Langertha

use Langertha::Engine::OpenAI;
my $openai = Langertha::Engine::OpenAI->new(
api_key => $ENV{OPENAI_API_KEY},
model => 'gpt-4o-mini',
);
print $openai->simple_chat('Hello from Perl!');

```
Engine (provider API adapter)
-> low-level provider calls (chat, streaming, tools, embeddings, images, transcription)
Wrappers (task-focused facades)
-> Chat / Embedder / ImageGen with optional overrides + plugins
Raider (autonomous worker agent)
-> history, mission, tool loop, self-tools, plugins, continuation
Raid (workflow orchestration)
-> compose Raider + Raid nodes as Sequential / Parallel / Loop trees
-> shared RunContext + unified Result semantics
```
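A compact sketch of how these layers stack, using only classes and constructor arguments documented in the sections below:

```perl
use Future::AsyncAwait;
use Langertha::Engine::OpenAI;
use Langertha::Chat;
use Langertha::Raider;
use Langertha::Raid::Sequential;
use Langertha::RunContext;

# Engine: low-level provider adapter
my $engine = Langertha::Engine::OpenAI->new(
  api_key => $ENV{OPENAI_API_KEY},
  model   => 'gpt-4o-mini',
);

# Wrapper: task-focused facade over the same engine
my $chat = Langertha::Chat->new(engine => $engine, system_prompt => 'Answer briefly.');

# Raider: autonomous agent with history, mission and tool loop
my $raider = Langertha::Raider->new(engine => $engine, mission => 'Summarize the input.');

# Raid: orchestration tree composing runnable nodes
my $flow   = Langertha::Raid::Sequential->new(steps => [$raider]);
my $result = await $flow->run_f(Langertha::RunContext->new(input => 'Hello'));
```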
use Langertha::Engine::Anthropic;
my $claude = Langertha::Engine::Anthropic->new(
api_key => $ENV{ANTHROPIC_API_KEY},
model => 'claude-sonnet-4-6',
);
print $claude->simple_chat('Generate Perl Moose classes for GeoJSON.');

use Langertha::Engine::Gemini;
my $gemini = Langertha::Engine::Gemini->new(
api_key => $ENV{GEMINI_API_KEY},
model => 'gemini-2.5-flash',
);
print $gemini->simple_chat('Explain quantum computing.');

use Langertha::Engine::Ollama;
my $ollama = Langertha::Engine::Ollama->new(
url => 'http://localhost:11434',
model => 'llama3.3',
);
print $ollama->simple_chat('Do you wanna build a snowman?');

AKI.IO is a European AI model hub based in Germany. All inference runs on EU-based infrastructure, making it a strong choice for GDPR-compliant and data-sovereignty-sensitive applications. No data leaves the EU.
use Langertha::Engine::AKI;
my $aki = Langertha::Engine::AKI->new(
api_key => $ENV{AKI_API_KEY},
model => 'llama3_8b_chat',
);
print $aki->simple_chat('Hello!');
# OpenAI-compatible API for streaming & tool calling
# Note: native model names are not mapped automatically to /v1 names
my $aki_openai = $aki->openai(model => 'llama3-chat-8b');

T-Systems AI Foundation Services (formerly LLM Hub) is an OpenAI-compatible aggregator with 30+ models. T-Cloud models (Llama 3.3, Qwen 3, Mistral Small, Teuken, BGE-M3, Whisper, gpt-oss-120b) are processed exclusively in Germany; hyperscaler models (GPT 5/4.1/4o, Claude 4.5 Sonnet, Gemini 2.5/3) are processed in the EU.
use Langertha::Engine::TSystems;
my $tsi = Langertha::Engine::TSystems->new(
api_key => $ENV{LANGERTHA_TSYSTEMS_API_KEY},
model => 'gpt-oss-120b', # default — T-Cloud, reliable tools
);
print $tsi->simple_chat('Hello from AIFS!');
my $vector = $tsi->simple_embedding('embed me'); # default model: text-embedding-bge-m3

Trial API keys at apikey.llmhub.t-systems.net.
Scaleway Generative APIs is a fully-managed, EU-hosted, drop-in replacement for the OpenAI API. EU-Act compliant, priced per 1M tokens.
use Langertha::Engine::Scaleway;
my $scw = Langertha::Engine::Scaleway->new(
api_key => $ENV{LANGERTHA_SCALEWAY_API_KEY},
model => 'llama-3.1-8b-instruct', # default
);
print $scw->simple_chat('Hello from Scaleway!');
# Project-scoped URL (optional)
my $scw_proj = Langertha::Engine::Scaleway->new(
url => 'https://api.scaleway.ai/<PROJECT_ID>/v1',
api_key => $ENV{LANGERTHA_SCALEWAY_API_KEY},
);

Generate an IAM API key (only the Secret Key is needed) at console.scaleway.com → IAM & API keys.
use Langertha::Engine::vLLM;
my $vllm = Langertha::Engine::vLLM->new(
url => $ENV{VLLM_URL},
model => 'meta-llama/Llama-3.3-70B-Instruct',
);
print $vllm->simple_chat('Hello!');

use Langertha::Engine::LMStudio;
my $lmstudio = Langertha::Engine::LMStudio->new(
url => 'http://localhost:1234',
model => 'qwen2.5-7b-instruct',
);
print $lmstudio->simple_chat('Hello!');
# OpenAI-compatible wrapper (/v1)
my $lmstudio_oai = $lmstudio->openai;
print $lmstudio_oai->simple_chat('Hello via OpenAI-compatible API!');
# Equivalent explicit class:
use Langertha::Engine::LMStudioOpenAI;
my $lmstudio_oai2 = Langertha::Engine::LMStudioOpenAI->new(
url => 'http://localhost:1234/v1',
model => 'qwen2.5-7b-instruct',
);
# api_key defaults internally to "lmstudio"
# Anthropic-compatible wrapper (/v1/messages)
my $lmstudio_anth = $lmstudio->anthropic;
print $lmstudio_anth->simple_chat('Hello via Anthropic-compatible API!');
# Equivalent explicit class:
use Langertha::Engine::LMStudioAnthropic;
my $lmstudio_anth2 = Langertha::Engine::LMStudioAnthropic->new(
url => 'http://localhost:1234',
model => 'qwen2.5-7b-instruct',
);
# api_key defaults internally to "lmstudio"

Real-time token streaming with callbacks, iterators, or async/await:
# Callback
$engine->simple_chat_stream(sub {
print shift->content;
}, 'Write a poem about Perl');
# Iterator
my $iter = $engine->simple_chat_stream_iterator('Tell me a story');
while (my $chunk = $iter->next) {
print $chunk->content;
}
# Async/await with real-time streaming
use Future::AsyncAwait;
my ($content, $chunks) = await $engine->simple_chat_stream_realtime_f(
sub { print shift->content },
'Explain monads'
);

Langertha integrates with MCP (Model Context Protocol) servers via Net::Async::MCP. LLMs can discover and invoke tools automatically.
use IO::Async::Loop;
use Net::Async::MCP;
use Future::AsyncAwait;
my $loop = IO::Async::Loop->new;
# Connect to an MCP server (in-process, stdio, or HTTP)
my $mcp = Net::Async::MCP->new(
command => ['npx', '@anthropic/mcp-server-web-search'],
);
$loop->add($mcp);
await $mcp->initialize;
# Any engine, same API
my $engine = Langertha::Engine::Anthropic->new(
api_key => $ENV{ANTHROPIC_API_KEY},
model => 'claude-sonnet-4-6',
mcp_servers => [$mcp],
);
my $response = await $engine->chat_with_tools_f(
'Search the web for Perl MCP modules'
);
say $response;

The tool-calling loop runs automatically:
- Gathers available tools from all configured MCP servers
- Sends chat request with tool definitions to the LLM
- If the LLM returns tool calls, executes them via MCP
- Feeds tool results back to the LLM and repeats
- Returns the final text response
Works with all engines that support tool calling (see table above).
For models that support the Hermes tool calling format (via <tool_call> XML tags) but lack API-level tool support, engines compose Langertha::Role::HermesTools:
# NousResearch, AKI, and AKIOpenAI compose HermesTools out of the box
my $nous = Langertha::Engine::NousResearch->new(
api_key => $ENV{NOUSRESEARCH_API_KEY},
mcp_servers => [$mcp],
);
my $aki = Langertha::Engine::AKI->new(
api_key => $ENV{AKI_API_KEY},
mcp_servers => [$mcp],
);

Tools are injected into the system prompt and <tool_call> tags are parsed from the model's text output. The tool prompt template is customizable via hermes_tool_prompt.
Three things go into a tool/structured-output call: what the caller
passed (tools, tool_choice, response_format, mcp_servers), which
method was used (chat_f single-turn vs chat_with_tools_f multi-turn
loop), and what the engine actually supports on the wire. Langertha
auto-rewrites between forms when the wire reality demands it. Every
case lands as a Langertha::ToolCall on Response.tool_calls so
callers read the result the same way regardless of provider.
```mermaid
flowchart TD
Start([Caller invokes chat_f]) --> Q{What was passed?}
Q -->|tools only| T{Engine caps?}
Q -->|tools +<br>forced named<br>tool_choice| F{Engine caps?}
Q -->|response_format<br>json_object /<br>json_schema| R{Engine caps?}
Q -->|mcp_servers set| MCP[Use chat_with_tools_f<br>multi-turn loop:<br>tools from MCP,<br>auto-executed,<br>results fed back]
T -->|tools_native| TA[Forward via<br>Tool->to_PROVIDER]
T -->|only tools_hermes| TB[XML in prompt<br>via chat_with_tools_f]
F -->|tool_choice_named| FA[Native forced-name<br>on the wire]
F -->|only<br>response_format_json_schema<br>e.g. Perplexity| FB[AUTO-REWRITE<br>clear tools/choice,<br>set response_format=json_schema,<br>loose-parse content,<br>attach synthetic ToolCall]
R -->|response_format_json_*| RA[Native<br>OpenAI: native block<br>Gemini: responseSchema<br>Ollama: format param]
R -->|only tool_choice_named<br>Anthropic| RB[ENGINE-INTERNAL<br>synth tool + forced choice,<br>tool_use input lifted<br>into Response.content as JSON]
TA --> End([Response.tool_calls<br>ArrayRef of<br>Langertha::ToolCall])
TB --> End
FA --> End
FB --> End
RA --> End
RB --> End
MCP --> End
classDef rewrite fill:#fef3c7,stroke:#d97706,color:#000
classDef done fill:#d1fae5,stroke:#059669,color:#000
class FB,RB rewrite
class End done
```
| Provider family | Tools wire | tool_choice forms | response_format mechanism | Tool calls in response |
|---|---|---|---|---|
| OpenAIBase (OpenAI, DeepSeek, Groq, Mistral, Cerebras, MiniMax, OpenRouter, Replicate, HuggingFace, vLLM, SGLang, LlamaCpp, Ollama-OpenAI, LMStudioOpenAI, AKIOpenAI, TSystems, Scaleway) | tools=[{type=>'function',function=>{...}}] | string auto/required/none + {type=>'function',function=>{name=>X}} | native response_format block (json_object / json_schema) | choices[0].message.tool_calls |
| AnthropicBase (Anthropic, MiniMaxAnthropic, LMStudioAnthropic) | tools=[{name=>...,input_schema=>...}] | {type=>'auto'/'any'/'none'/'tool',name=>X} | engine-internal: synthesizes tool + forces it; lifts tool_use input into Response.content as JSON | content[*] blocks with type=>'tool_use' |
| Gemini | tools=[{functionDeclarations=>[...]}] | toolConfig.functionCallingConfig (mode + allowedFunctionNames for named) | generationConfig.responseSchema + responseMimeType='application/json' | candidates[0].content.parts[*].functionCall |
| Ollama (native) | OpenAI-shape tools natively | OpenAI-shape tool_choice | format='json' (json_object) or schema HashRef (json_schema) | message.tool_calls |
| Perplexity | NO tool calling on the wire | string auto/required/none only (named coerced to required) | native response_format=json_schema | (synthetic, via chat_f auto-rewrite) |
| Hermes engines (NousResearch, AKI, AKIOpenAI) | tools injected into system prompt as XML | (model decides via prompt) | (use response_format on AKIOpenAI / NousResearch where applicable) | <tool_call>...</tool_call> parsed from text |
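As a hedged illustration of the dispatch above: the response_format shape below follows the OpenAI-style block listed in the table, while the exact chat_f argument layout (prompt followed by named options) is an assumption here — ex/structured_output.pl shows the canonical form.

```perl
use Future::AsyncAwait;

# Sketch only: assumes chat_f accepts a prompt plus named options.
my $r = await $engine->chat_f(
  'Extract city and population as JSON.',
  response_format => {
    type        => 'json_schema',
    json_schema => {
      name   => 'extract',
      schema => {
        type       => 'object',
        properties => {
          city       => { type => 'string' },
          population => { type => 'integer' },
        },
        required => [ 'city', 'population' ],
      },
    },
  },
);

# On engines without native json_schema but with a named tool_choice
# (e.g. Anthropic), the request is rewritten engine-internally; either
# way the JSON ends up in the response content, and any synthetic tool
# call appears on $r->tool_calls with ->synthetic set.
say $r;   # stringifies to the content
```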
Single source of truth — same shape regardless of provider:
my $r = await $engine->chat_f(...);
if ($r->has_tool_calls) {
for my $tc (@{ $r->tool_calls }) {
say $tc->name; # tool name
say $tc->arguments; # decoded HashRef
say $tc->id; # provider call id (if any)
say $tc->synthetic; # 1 if synthesized via fallback
}
}
# Lookup helpers
my $tc = $r->tool_call; # first ToolCall (or undef)
my $tc = $r->tool_call('extract'); # by name
my $args = $r->tool_call_args('extract'); # ->arguments shortcut

$engine->supports('tool_choice_named') or warn "engine cannot force a named tool — chat_f will auto-rewrite if json_schema is available";
$engine->supports('response_format_json_schema') # safe to pass json_schema
$engine->supports('tools_native') # accepts tools array on the wire
$engine->supports('tools_hermes') # XML-prompt fallback path
$engine->supports('streaming') # chat_stream_request available

simple_chat returns Langertha::Response objects with full metadata — token usage, model, finish reason, timing. Backward-compatible: stringifies to the text content, so existing code works unchanged.
my $r = $engine->simple_chat('Hello!');
print $r; # prints the text (stringification)
say $r->model; # gpt-4o-mini
say $r->prompt_tokens; # 12
say $r->completion_tokens; # 8
say $r->total_tokens; # 20
say $r->finish_reason; # stop

Works across all engines. Each provider's token counts and metadata are normalized automatically.
Rate limit information from HTTP response headers is extracted automatically and normalized into Langertha::RateLimit objects. Available per-response and on the engine (always reflects the latest response):
my $r = $engine->simple_chat('Hello!');
# Per-response rate limit
if ($r->has_rate_limit) {
say $r->requests_remaining; # 499
say $r->tokens_remaining; # 29990
say $r->rate_limit->requests_reset; # "12s" or RFC 3339
say $r->rate_limit->raw; # all provider-specific headers
}
# Engine always has latest rate limit
say $engine->rate_limit->requests_remaining if $engine->has_rate_limit;

Supported providers: OpenAI, Groq, Cerebras, OpenRouter, Replicate, HuggingFace (x-ratelimit-*), Anthropic (anthropic-ratelimit-*). Local engines (Ollama, vLLM, llama.cpp) typically don't return rate limit headers.
Reasoning models produce chain-of-thought thinking alongside their answers. Langertha extracts this automatically — the response content is always clean, and thinking is available separately:
my $r = $engine->simple_chat('Solve this step by step...');
say $r; # clean answer
say $r->thinking; # chain-of-thought reasoning (if any)
say $r->has_thinking; # check if thinking was produced

Native API extraction works automatically for providers that return reasoning as a separate field:
| Provider | Reasoning Field | Models |
|---|---|---|
| DeepSeek | reasoning_content | deepseek-reasoner |
| Anthropic | thinking content blocks | claude with extended thinking |
| Gemini | thought parts | gemini-2.5-flash/pro |
| OpenAI | reasoning_content | o1, o3, o4-mini |
Think tag filtering handles open-source reasoning models that embed <think>...</think> tags inline (DeepSeek R1 via Ollama/vLLM, QwQ, Hermes with reasoning). The filter is enabled by default on all engines and strips tags automatically. Handles both closed and unclosed tags (when models stop mid-thought).
# NousResearch with reasoning enabled
my $nous = Langertha::Engine::NousResearch->new(
api_key => $ENV{NOUSRESEARCH_API_KEY},
model => 'DeepHermes-3-Mistral-24B-Preview',
reasoning => 1, # enables chain-of-thought system prompt
);
my $r = $nous->simple_chat('Explain why the sky is blue');
say $r; # clean answer
say $r->thinking; # <think> content extracted automatically
# Custom tag name for models using different tags
my $engine = Langertha::Engine::vLLM->new(
url => $vllm_url,
model => 'my-model',
think_tag => 'reasoning', # default: 'think'
);

Langertha::Raider is a stateful agent with conversation history and MCP tool calling. It maintains context across multiple interactions ("raids").
use Langertha::Raider;
my $raider = Langertha::Raider->new(
engine => $engine, # any engine with mcp_servers
mission => 'You are a code explorer.',
);
# First raid — tools are called automatically, history is saved
my $r1 = await $raider->raid_f('What files are in lib/?');
say $r1;
# Second raid — has context from the first conversation
my $r2 = await $raider->raid_f('Read the main module.');
say $r2;
# Metrics across all raids
say $raider->metrics->{tool_calls}; # cumulative
$raider->clear_history; # start fresh

Key features: persistent history, mission (system prompt), cumulative metrics (raids, iterations, tool_calls, time_ms), context compression, session history, Hermes tool calling support, plugin system.
Extend the Raider with plugins that hook into every stage of the raid lifecycle:
my $raider = Langertha::Raider->new(
engine => $engine,
plugins => ['Langfuse', 'MyApp::Guardrails'],
);

Plugins are Moose classes extending Langertha::Plugin with async hook methods:
- plugin_before_raid — transform input messages
- plugin_build_conversation — transform assembled conversation
- plugin_before_llm_call — transform conversation before each LLM call
- plugin_after_llm_response — inspect/transform LLM response
- plugin_before_tool_call — inspect/skip/transform tool calls
- plugin_after_tool_call — transform tool results
- plugin_after_raid — transform the final result
Short names are resolved to Langertha::Plugin::* or LangerthaX::Plugin::*. The built-in Langfuse plugin provides full observability as an alternative to engine-level Langfuse.
# Quick plugin with sugar
package MyApp::Guardrails;
use Langertha qw( Plugin );
around plugin_before_tool_call => async sub {
my ($orig, $self, $name, $input) = @_;
my @result = await $self->$orig($name, $input);
return if $name eq 'dangerous_tool'; # skip
return @result;
};
__PACKAGE__->meta->make_immutable;

For long-running agents, history can grow large. Enable auto-compression to keep token usage under control:
my $raider = Langertha::Raider->new(
engine => $engine,
mission => 'You are an assistant.',
max_context_tokens => 100_000, # enables auto-compression
context_compress_threshold => 0.75, # compress at 75% (default)
# compression_engine => $cheap_engine, # optional: use a cheaper model
);

When prompt tokens exceed the threshold, the working history is automatically summarized via LLM before the next raid. The summary replaces the history, keeping context compact while preserving key information.
The full session history (including tool calls and results) is archived in session_history — never auto-compressed, persisted across clear_history and reset:
# Register MCP tool so the LLM can query its own history
$raider->register_session_history_tool($mcp_server);
# Or inspect programmatically
my @all = @{$raider->session_history};

Feed additional context to the agent while it's working — it picks it up at the next iteration:
# From another async task, timer, or callback:
$raider->inject('Also check the test files');
$raider->inject({ role => 'user', content => 'Focus on .pm files' });
# Or use on_iteration for programmatic injection per iteration:
my $raider = Langertha::Raider->new(
engine => $engine,
on_iteration => sub {
my ($raider, $iteration) = @_;
return ['Check the error log'] if $iteration == 3;
return;
},
);

Injected messages are persisted in history so the agent remembers them across raids.
Langertha::Raid adds a lightweight orchestration layer on top of Langertha::Raider.
Use it to compose multiple runnable nodes into workflow trees without introducing a DSL.
Core building blocks:
- Langertha::Role::Runnable — shared execution contract (run_f($ctx))
- Langertha::RunContext — structured execution state (input, state, artifacts, metadata, trace)
- Langertha::Raid::Sequential — ordered step execution
- Langertha::Raid::Parallel — concurrent branch execution with context branching/merge
- Langertha::Raid::Loop — orchestration-level loop with max_loops/max_iterations
use Future::AsyncAwait;
use Langertha::Raider;
use Langertha::Raid::Sequential;
use Langertha::RunContext;
my $flow = Langertha::Raid::Sequential->new(
steps => [
Langertha::Raider->new(engine => $engine, mission => 'Collect facts'),
Langertha::Raider->new(engine => $engine, mission => 'Write concise summary'),
],
);
my $ctx = Langertha::RunContext->new(
input => 'Analyze lib/Langertha.pm and summarize key components.',
);
my $result = await $flow->run_f($ctx);
say $result->text if $result->is_final;

use Future::AsyncAwait;
use Langertha::Raider;
use Langertha::Raid::Sequential;
use Langertha::Raid::Parallel;
use Langertha::Raid::Loop;
use Langertha::RunContext;
my $raid = Langertha::Raid::Sequential->new(
steps => [
Langertha::Raider->new(engine => $engine_a, mission => 'Research'),
Langertha::Raid::Parallel->new(
steps => [
Langertha::Raider->new(engine => $engine_b, mission => 'Summarize'),
Langertha::Raider->new(engine => $engine_c, mission => 'Extract risks'),
],
),
Langertha::Raid::Loop->new(
steps => [
Langertha::Raider->new(engine => $engine_a, mission => 'Refine output'),
],
max_loops => 2,
),
],
);
my $ctx = Langertha::RunContext->new(
input => 'Analyze this repository and produce an action plan.',
metadata => { request_id => 'req-42' },
);
my $result = await $raid->run_f($ctx);
if ($result->is_final) {
say $result->text;
} elsif ($result->is_question) {
say "Needs input: " . $result->content;
} elsif ($result->is_pause) {
say "Paused: " . $result->content;
} elsif ($result->is_abort) {
die "Aborted: " . $result->content;
}

Parallel branches do not mutate one shared context object blindly. Each branch gets its own cloned context and merged snapshots are available in:
my $branches = $ctx->artifacts->{parallel_branches};

Result semantics are unified across Raider and Raid:
- final — completed successfully
- question — needs user input
- pause — execution paused/resumable
- abort — explicit stop/failure
Every engine has Langfuse observability built in. Just set env vars — zero code changes:
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_URL=http://localhost:3000 # optional, defaults to cloud

my $engine = Langertha::Engine::OpenAI->new(
api_key => $ENV{OPENAI_API_KEY},
);
$engine->simple_chat('Hello!'); # auto-traced
$engine->langfuse_flush; # send events to Langfuse

simple_chat calls are auto-instrumented with traces and generations (including token usage and timing). Raider raids create cascading traces with proper hierarchy:
Trace: "raid" (with userId, sessionId, tags)
├── Span: iteration-1
│ ├── Generation: llm-call (with usage, modelParameters)
│ ├── Span: tool: list_files (with input/output, timing)
│ └── Span: tool: read_file
├── Span: iteration-2
│ └── Generation: llm-call (final response)
└── [trace updated with output at end]
```
Customize Raider traces with user/session/tag metadata:
my $raider = Langertha::Raider->new(
engine => $engine,
langfuse_user_id => 'user-42',
langfuse_session_id => 'session-abc',
langfuse_tags => ['production', 'v2'],
);

Disabled by default — active only when both keys are set. A Kubernetes manifest for self-hosted Langfuse is included: kubectl apply -f ex/langfuse-k8s.yaml
Wrap an engine with optional overrides for specific use cases:
use Langertha::Chat;
use Langertha::Embedder;
use Langertha::ImageGen;
# Chat wrapper with custom system prompt and model
my $chat = Langertha::Chat->new(
engine => $openai,
system_prompt => 'You are a pirate.',
model => 'gpt-4o',
temperature => 0.9,
);
print $chat->simple_chat('Ahoy!');
# Embedding wrapper with specific model
my $embedder = Langertha::Embedder->new(
engine => $openai,
model => 'text-embedding-3-small',
);
my $vec = $embedder->simple_embedding('some text');
# Image generation wrapper
my $imagegen = Langertha::ImageGen->new(
engine => $openai,
model => 'gpt-image-1',
size => '1024x1024',
quality => 'high',
);
my $images = $imagegen->simple_image('A cat in space');

Wrappers support plugins via the same plugins attribute as Raider.
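For example, a minimal sketch attaching the built-in Langfuse plugin to a Chat wrapper (plugin names resolve as described for Raider):

```perl
my $traced_chat = Langertha::Chat->new(
  engine  => $openai,
  plugins => ['Langfuse'],   # short name resolves to Langertha::Plugin::Langfuse
);
print $traced_chat->simple_chat('Ahoy!');
```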
All operations have async variants via Future::AsyncAwait:
use Future::AsyncAwait;
async sub main {
my $response = await $engine->simple_chat_f('Hello!');
say $response;
}
main()->get;

use Langertha::Engine::OpenAI;
my $openai = Langertha::Engine::OpenAI->new(
api_key => $ENV{OPENAI_API_KEY},
);
my $embedding = $openai->simple_embedding('Some text to embed');
# Returns arrayref of floats

Also supported by Ollama (e.g. mxbai-embed-large).
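A minimal sketch of the Ollama variant; that the embedding model is set via the model attribute is an assumption here (a dedicated embedding-model attribute may exist for mixed chat/embedding use):

```perl
use Langertha::Engine::Ollama;

# Assumption: embedding model passed as the engine's model
my $ollama = Langertha::Engine::Ollama->new(
  url   => 'http://localhost:11434',
  model => 'mxbai-embed-large',
);
my $embedding = $ollama->simple_embedding('Some text to embed');
```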
use Langertha::Engine::OpenAI;
my $openai = Langertha::Engine::OpenAI->new(
api_key => $ENV{OPENAI_API_KEY},
);
my $images = $openai->simple_image('A viking with an axe in pixel art');
# Returns arrayref of image objects with url or b64_json

Default model is gpt-image-1. Pass size, quality, or n as extra arguments.
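For instance, assuming the extra arguments are passed as trailing named options:

```perl
my $images = $openai->simple_image(
  'A viking longship in pixel art',
  size    => '1024x1024',
  quality => 'high',
  n       => 2,   # number of images to generate
);
```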
Langertha::Engine::Whisper is a slim transcription-only engine (extends
Langertha::Engine::TranscriptionBase) for self-hosted Whisper-compatible
servers — no chat, tools, or embeddings on the same object:
use Langertha::Engine::Whisper;
my $whisper = Langertha::Engine::Whisper->new(
url => $ENV{WHISPER_URL},
);
print $whisper->simple_transcription('recording.ogg');

OpenAI and Groq also support transcription via their Whisper endpoints directly on the chat-side engine:
my $openai = Langertha::Engine::OpenAI->new(
api_key => $ENV{OPENAI_API_KEY},
);
print $openai->simple_transcription('recording.ogg');

For a focused transcription handle reusing the chat engine's credentials
(no need to restate api_key / url), use the whisper attribute on
Langertha::Engine::OpenAI:
my $text = $openai->whisper->simple_transcription('recording.ogg');
# $openai->whisper is a Langertha::Engine::TranscriptionBase bound to
# the parent's api_key/url, with transcription_model 'whisper-1'.

Every engine reports its capabilities so calling code can avoid sending parameters the wire cannot honour:
if ($engine->supports('tool_choice_named')) {
# safe to pass tool_choice => { type => 'tool', name => '...' }
}
if ($engine->supports('response_format_json_schema')) {
# safe to pass response_format => { type => 'json_schema', ... }
}
my $caps = $engine->engine_capabilities;
# { chat => 1, streaming => 1, tools_native => 1, tool_choice_named => 1,
#   response_format_json_schema => 1, embedding => 1, ... }

The flag set is derived from which capability roles the engine composes;
engines override (via around engine_capabilities) when the wire reality
differs from the role inventory. See Langertha::Role::Capabilities for
the full list.
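A hedged sketch of such an override in a hypothetical engine subclass (the flag name comes from the list above; the subclass itself is illustrative):

```perl
package MyApp::Engine::LockedDownOpenAI;
use Moose;
extends 'Langertha::Engine::OpenAI';

# Wire reality differs from the role inventory: this deployment cannot
# honour a forced named tool, so report the flag as unsupported.
around engine_capabilities => sub {
  my ( $orig, $self ) = @_;
  my $caps = { %{ $self->$orig() } };   # copy the role-derived flags
  $caps->{tool_choice_named} = 0;
  return $caps;
};

__PACKAGE__->meta->make_immutable;
```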
Query available models from any provider API:
my $models = $engine->list_models;
# Returns: ['gpt-4o', 'gpt-4o-mini', 'o1', ...]
my $models = $engine->list_models(full => 1); # Full metadata
my $models = $engine->list_models(force_refresh => 1); # Bypass cache

Results are cached for 1 hour (configurable via models_cache_ttl).
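Assuming models_cache_ttl is a constructor attribute taking seconds, a shorter cache could look like:

```perl
# Hypothetical tuning: cache the model list for 10 minutes instead of 1 hour
my $engine = Langertha::Engine::OpenAI->new(
  api_key          => $ENV{OPENAI_API_KEY},
  models_cache_ttl => 600,
);
```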
# Run all unit tests
prove -l t/
# Run mock tool calling tests (no API keys needed)
prove -l -It/lib t/64_tool_calling_ollama_mock.t
prove -l -It/lib t/66_tool_calling_hermes.t
# Run live integration tests
TEST_LANGERTHA_OPENAI_API_KEY=... \
TEST_LANGERTHA_ANTHROPIC_API_KEY=... \
TEST_LANGERTHA_GEMINI_API_KEY=... \
prove -l t/80_live_tool_calling.t
# Ollama with multiple models
TEST_LANGERTHA_OLLAMA_URL=http://localhost:11434 \
TEST_LANGERTHA_OLLAMA_MODELS=qwen3:8b,llama3.2:3b \
prove -l t/80_live_tool_calling.t
# NousResearch (Hermes-native tool calling via <tool_call> tags)
TEST_LANGERTHA_NOUSRESEARCH_API_KEY=... \
prove -l t/80_live_tool_calling.t
# vLLM (requires --enable-auto-tool-choice and --tool-call-parser on server)
TEST_LANGERTHA_VLLM_URL=http://localhost:8000/v1 \
TEST_LANGERTHA_VLLM_MODEL=Qwen/Qwen2.5-3B-Instruct \
TEST_LANGERTHA_VLLM_TOOL_CALL_PARSER=hermes \
prove -l t/80_live_tool_calling.t
# LM Studio live chat (native + OpenAI-compatible + Anthropic-compatible)
TEST_LANGERTHA_LMSTUDIO_URL=http://localhost:1234 \
TEST_LANGERTHA_LMSTUDIO_MODEL=qwen2.5-7b-instruct \
prove -l t/83_live_chat.t

See the ex/ directory for runnable examples:
| Example | Description |
|---|---|
| synopsis.pl | Basic usage with multiple engines |
| response.pl | Response metadata (tokens, model, timing) |
| streaming_callback.pl | Real-time streaming with callbacks |
| streaming_iterator.pl | Streaming with iterator pattern |
| streaming_future.pl | Async streaming with Futures |
| async_await.pl | Async/await patterns |
| mcp_inprocess.pl | MCP tool calling with in-process server |
| mcp_stdio.pl | MCP tool calling with stdio server |
| hermes_tools.pl | Hermes-native tool calling with NousResearch |
| raider.pl | Autonomous agent with MCP tools and history |
| raider_run.pl | Full Raider demo: self-tools, engine/MCP catalogs, bootstrapping |
| raider_plugin_sugar.pl | Raider with plugins using class sugar |
| raider_rag.pl | RAG (Retrieval-Augmented Generation) with Raider |
| langfuse.pl | Langfuse observability tracing |
| langfuse-k8s.yaml | Kubernetes manifest for self-hosted Langfuse |
| embedding.pl | Text embeddings |
| transcription.pl | Audio transcription with Whisper |
| structured_output.pl | Structured/JSON output |
- CPAN: Langertha on MetaCPAN
- GitHub: Getty/langertha - Issues & PRs welcome
- Discord: Join the community
- IRC: irc://irc.perl.org/ai
This is free software licensed under the same terms as Perl itself (Artistic License / GPL).
THIS API IS WORK IN PROGRESS