Close the loop: six prompt-optimization algorithms, any LLM, any metric.
Part of the Future AGI open-source platform for making AI agents reliable.
Try Cloud (Free) · Docs · Colab · Blog · Discord · Discussions
Prompts are how ambiguity sneaks into an agent. You can tweak one by hand. You can't tweak a hundred, and you definitely can't re-tweak them every time the model behind them changes. agent-opt does the tweaking for you: pick an algorithm, pick a metric, feed it a dataset, and it returns a prompt that beats the one you wrote.
Six algorithms, one API. Plug in any LLM via LiteLLM. Score against any of the 50+ metrics from ai-evaluation, or write your own. Production traces feed back in as training data.
- **Six real algorithms, one API.** Not one toy loop with six labels: Random Search, Bayesian (Optuna), ProTeGi (textual gradients), Meta-Prompt, PromptWizard (mutate-critique-refine), and GEPA (evolutionary Pareto). Pick by problem shape.
- **Any model, any metric.** LiteLLM under the hood, so OpenAI, Anthropic, Gemini, Bedrock, Azure, Groq, and self-hosted all just work. Score with BLEU, ROUGE, embedding similarity, LLM-as-judge, or any of the 50+ metrics in ai-evaluation.
- **Trace-driven.** Optimize against production traces captured by traceAI.
```bash
pip install agent-opt
```

Requirements: Python ≥ 3.10 · ai-evaluation ≥ 0.2.2 · litellm ≥ 1.80 · optuna ≥ 3.6 · gepa ≥ 0.0.17
Optimize a RAG prompt against BLEU in 60 seconds.
```python
from fi.opt.optimizers import BayesianSearchOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator
from fi.evals.metrics import BLEUScore

dataset = [
    {"context": "Paris is the capital of France.",
     "question": "What is the capital of France?", "answer": "Paris"},
    # ... more examples
]

evaluator = Evaluator(BLEUScore())
mapper = BasicDataMapper(key_map={
    "response": "generated_output",
    "expected_response": "answer",
})

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o",
    n_trials=10,
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=mapper,
    dataset=dataset,
    initial_prompts=["Given the context: {context}, answer: {question}"],
)

print(f"Best score: {result.final_score:.4f}")
print(f"Best prompt: {result.best_generator.get_prompt_template()}")
```

Full walkthrough: examples/FutureAGI_Agent_Optimizer.ipynb · Open in Colab
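Datasets are plain lists of dicts, so you can build one from whatever you already log. A minimal sketch parsing JSONL lines into that shape (the helper name and example file are ours, not part of the library):

```python
import json

def rows_from_jsonl_lines(lines) -> list[dict]:
    # One JSON object per line -> the list-of-dicts shape optimize() expects.
    return [json.loads(line) for line in lines if line.strip()]

# e.g. dataset = rows_from_jsonl_lines(open("rag_eval_set.jsonl", encoding="utf-8"))
# Each row should carry every field your prompt template and key_map reference:
# {"context": "...", "question": "...", "answer": "..."}
```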
Each algorithm is a drop-in optimize() call. Swap without touching your dataset, evaluator, or data mapper.
| Algorithm | Best for | Key idea |
|---|---|---|
| Random Search | Baselines and sanity checks | Random prompt variations around a seed |
| Bayesian Search | Few-shot example selection | Optuna TPE over example subsets and ordering |
| ProTeGi | Iterative refinement | Textual gradients from error analysis, beam-searched |
| Meta-Prompt | Teacher-model rewrites | Strong teacher analyzes failures, rewrites the prompt |
| PromptWizard | Multi-stage pipelines | Mutate → critique → refine, N rounds |
| GEPA | Complex solution spaces | Genetic Pareto evolution across multiple objectives |
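However the candidates are proposed, every row above closes the same loop: generate candidate prompts, score them against the dataset, keep the best. A toy sketch of that shape in plain Python (not the library's internals), with a trivial placeholder check standing in for a real metric:

```python
def score(prompt: str, dataset: list[dict]) -> float:
    # Stand-in metric: reward prompts that keep every placeholder the rows need.
    needed = {"{context}", "{question}"}
    hits = sum(all(p in prompt for p in needed) for _ in dataset)
    return hits / max(len(dataset), 1)

def search(seed_prompt: str, variations: list[str], dataset: list[dict]) -> str:
    # The skeleton every optimizer shares: evaluate candidates, keep the argmax.
    candidates = [seed_prompt] + variations
    return max(candidates, key=lambda p: score(p, dataset))

dataset = [{"context": "...", "question": "..."}]
best = search(
    "Answer: {question}",
    ["Given {context}, answer {question} concisely.", "{question}?"],
    dataset,
)
# best is the variation that preserves both placeholders
```

The real optimizers differ in how `variations` are produced (random mutation, Optuna trials, textual gradients, teacher rewrites, evolutionary crossover), not in this outer loop.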
Quick snippets for each
```python
from fi.opt.optimizers import (
    RandomSearchOptimizer, BayesianSearchOptimizer,
    ProTeGi, MetaPromptOptimizer,
    PromptWizardOptimizer, GEPAOptimizer,
)
from fi.opt.generators import LiteLLMGenerator

teacher = LiteLLMGenerator(model="gpt-4o", prompt_template="{prompt}")

# Random — fastest baseline
RandomSearchOptimizer(generator=teacher, teacher_model="gpt-4o", num_variations=5)

# Bayesian — few-shot selection via Optuna
BayesianSearchOptimizer(min_examples=2, max_examples=8, n_trials=20,
                        inference_model_name="gpt-4o-mini", teacher_model_name="gpt-4o")

# ProTeGi — textual gradient refinement
ProTeGi(teacher_generator=teacher, num_gradients=4, beam_size=4)

# Meta-Prompt — teacher-driven rewrites
MetaPromptOptimizer(teacher_generator=teacher, num_rounds=5)

# PromptWizard — mutate / critique / refine
PromptWizardOptimizer(teacher_generator=teacher, mutate_rounds=3, refine_iterations=2)

# GEPA — evolutionary Pareto
GEPAOptimizer(reflection_model="gpt-5", generator_model="gpt-4o-mini")
```

Execute a prompt, return a response. LiteLLMGenerator works with every LiteLLM-supported provider.
```python
from fi.opt.generators import LiteLLMGenerator

generator = LiteLLMGenerator(
    model="gpt-4o-mini",
    prompt_template="Summarize this text: {text}",
)
```

Score a generated output. Three flavors (heuristic, LLM-as-judge, and the Future AGI platform's pre-built templates), all behind one Evaluator API.
```python
from fi.opt.base.evaluator import Evaluator

# Heuristic
from fi.evals.metrics import BLEUScore
evaluator = Evaluator(BLEUScore())

# LLM-as-judge
from fi.evals.llm import LiteLLMProvider
from fi.evals.metrics import CustomLLMJudge

judge = CustomLLMJudge(
    provider=LiteLLMProvider(),
    config={
        "name": "correctness_judge",
        "grading_criteria": (
            "Score 1.0 if 'response' is semantically equivalent to "
            "'expected_response'. 0.0 if incorrect. Partial credit OK."
        ),
    },
    model="gemini/gemini-2.5-flash",
    temperature=0.4,
)
evaluator = Evaluator(metric=judge)

# Future AGI platform — 50+ pre-built templates
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="...", fi_secret_key="...",
)
```

Translate your dataset's shape into the keys the evaluator expects.
```python
from fi.opt.datamappers import BasicDataMapper

mapper = BasicDataMapper(key_map={
    "output": "generated_output",  # from the generator
    "input": "question",           # from the dataset row
    "ground_truth": "answer",      # from the dataset row
})
```

Need a metric the library doesn't ship? Subclass BaseMetric:

```python
from fi.evals.metrics.base_metric import BaseMetric

class ExactMatchWithNormalization(BaseMetric):
    @property
    def metric_name(self):
        return "exact_match_norm"

    def compute_one(self, inputs):
        return float(inputs["response"].strip().lower()
                     == inputs["expected_response"].strip().lower())
```

Control how few-shot examples are folded into the candidate prompt with a custom prompt builder:

```python
def builder(base_prompt: str, few_shot: list[str]) -> str:
    return f"{base_prompt}\n\nExamples:\n" + "\n\n".join(few_shot)

BayesianSearchOptimizer(prompt_builder=builder, ...)
```

Turn on logging to follow each trial:

```python
from fi.opt.utils import setup_logging
import logging

setup_logging(level=logging.INFO,
              log_to_console=True, log_to_file=True,
              log_file="optimization.log")
```

Provider keys go in environment variables:

```bash
export OPENAI_API_KEY="..."
export GEMINI_API_KEY="..."   # if using Gemini
export FI_API_KEY="..."       # for Future AGI platform evaluators
export FI_SECRET_KEY="..."
```

simulate → evaluate → control → monitor → optimize. This SDK is the optimize step.

- traceAI captures production traces of every LLM call.
- ai-evaluation scores them with 50+ metrics.
- agent-opt turns those scored traces into a better prompt.
- The Agent Command Center ships the new prompt behind an OpenAI-compatible endpoint.
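Closing the loop from production is mostly reshaping: pull the fields a scored trace carries into a dataset row. A sketch under a hypothetical trace schema (this is not the actual traceAI record format; map whatever your tracing records):

```python
def trace_to_row(trace: dict) -> dict:
    # Hypothetical trace fields -> the row shape optimize() consumes.
    return {
        "question": trace["inputs"]["question"],
        "context": trace["inputs"].get("context", ""),
        "answer": trace["expected"],  # e.g. a human label or gold reference
    }

# dataset = [trace_to_row(t) for t in scored_traces if t["score"] < 0.5]
# Keeping only low-scoring traces focuses optimization on current failure modes.
```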
Use one SDK or all of them. Each is independently packaged and Apache 2.0-licensed.
```
src/fi/opt/
├── base/        # Abstract base classes (Evaluator, Optimizer, …)
├── datamappers/ # Dataset-shape → evaluator-key translators
├── generators/  # LiteLLM-backed LLM callers
├── optimizers/  # Random, Bayesian, ProTeGi, Meta-Prompt, PromptWizard, GEPA
├── utils/       # Logging, IO, small helpers
└── types.py     # Shared type defs
```
Roadmap stages: Shipped · In progress · Coming up · Exploring.
Bug fixes, new algorithms, new metrics, docs, examples: all welcome.
- Browse the `good first issue` label.
- Read the main repo Contributing Guide: same CLA, same workflow.
- Say hi on Discord or Discussions.
| Channel | Use it for |
|---|---|
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 📝 Blog | Engineering & research posts |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure — see SECURITY.md |
Licensed under the Apache License 2.0. See LICENSE and NOTICE.
Part of the Future AGI open-source ecosystem.
Built by the Future AGI team and contributors.
If agent-opt helps you ship better agents, a ⭐ helps more teams find us.
🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com