
agent-opt

Close the loop: six prompt-optimization algorithms, any LLM, any metric.

Part of the Future AGI open-source platform for making AI agents reliable.


Try Cloud (Free) · Docs · Colab · Blog · Discord · Discussions


Why agent-opt?

Prompts are how ambiguity sneaks into an agent. You can tweak one by hand. You can't tweak a hundred, and you definitely can't re-tweak them every time the model behind them changes. agent-opt does the tweaking for you: pick an algorithm, pick a metric, feed it a dataset, and it returns a prompt that beats the one you wrote.

Six algorithms, one API. Plug in any LLM via LiteLLM. Score against any of the 50+ metrics from ai-evaluation, or write your own. Production traces feed back in as training data.

Six real algorithms

Not one toy loop with six labels. Random Search, Bayesian (Optuna), ProTeGi (textual gradients), Meta-Prompt, PromptWizard (mutate-critique-refine), and GEPA (evolutionary Pareto). Pick by problem shape.

Any model, any metric

LiteLLM under the hood, so OpenAI, Anthropic, Gemini, Bedrock, Azure, Groq, and self-hosted all just work. Score with BLEU, ROUGE, embedding similarity, LLM-as-judge, or any of 50+ ai-evaluation metrics. Or write your own.

Built for the Future AGI loop

Optimize against traces captured by traceAI. Score with ai-evaluation. Deploy the winning prompt through the Agent Command Center gateway. One loop, on your infrastructure.


Install

pip install agent-opt

Requirements: Python ≥ 3.10 · ai-evaluation ≥ 0.2.2 · litellm ≥ 1.80 · optuna ≥ 3.6 · gepa ≥ 0.0.17.


Quickstart

Optimize a RAG prompt against BLEU in 60 seconds.

from fi.opt.optimizers import BayesianSearchOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator
from fi.evals.metrics import BLEUScore

dataset = [
    {"context": "Paris is the capital of France.",
     "question": "What is the capital of France?", "answer": "Paris"},
    # ... more examples
]

evaluator = Evaluator(BLEUScore())
mapper = BasicDataMapper(key_map={
    "response": "generated_output",
    "expected_response": "answer",
})

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o",
    n_trials=10,
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=mapper,
    dataset=dataset,
    initial_prompts=["Given the context: {context}, answer: {question}"],
)

print(f"Best score:  {result.final_score:.4f}")
print(f"Best prompt: {result.best_generator.get_prompt_template()}")

Full walkthrough: examples/FutureAGI_Agent_Optimizer.ipynb · Open in Colab


The six algorithms

Each algorithm is a drop-in optimize() call. Swap without touching your dataset, evaluator, or data mapper.

| Algorithm       | Best for                    | Key idea                                            |
|-----------------|-----------------------------|-----------------------------------------------------|
| Random Search   | Baselines and sanity checks | Random prompt variations around a seed              |
| Bayesian Search | Few-shot example selection  | Optuna TPE over example subsets and ordering        |
| ProTeGi         | Iterative refinement        | Textual gradients from error analysis, beam-searched |
| Meta-Prompt     | Teacher-model rewrites      | Strong teacher analyzes failures, rewrites the prompt |
| PromptWizard    | Multi-stage pipelines       | Mutate → critique → refine, N rounds                |
| GEPA            | Complex solution spaces     | Genetic Pareto evolution across multiple objectives |
Quick snippets for each
from fi.opt.optimizers import (
    RandomSearchOptimizer, BayesianSearchOptimizer,
    ProTeGi, MetaPromptOptimizer,
    PromptWizardOptimizer, GEPAOptimizer,
)
from fi.opt.generators import LiteLLMGenerator

teacher = LiteLLMGenerator(model="gpt-4o", prompt_template="{prompt}")

# Random — fastest baseline
RandomSearchOptimizer(generator=teacher, teacher_model="gpt-4o", num_variations=5)

# Bayesian — few-shot selection via Optuna
BayesianSearchOptimizer(min_examples=2, max_examples=8, n_trials=20,
                        inference_model_name="gpt-4o-mini", teacher_model_name="gpt-4o")

# ProTeGi — textual gradient refinement
ProTeGi(teacher_generator=teacher, num_gradients=4, beam_size=4)

# Meta-Prompt — teacher-driven rewrites
MetaPromptOptimizer(teacher_generator=teacher, num_rounds=5)

# PromptWizard — mutate / critique / refine
PromptWizardOptimizer(teacher_generator=teacher, mutate_rounds=3, refine_iterations=2)

# GEPA — evolutionary Pareto
GEPAOptimizer(reflection_model="gpt-5", generator_model="gpt-4o-mini")

Core concepts

Generators

Execute a prompt, return a response. LiteLLMGenerator works with every LiteLLM-supported provider.

from fi.opt.generators import LiteLLMGenerator

generator = LiteLLMGenerator(
    model="gpt-4o-mini",
    prompt_template="Summarize this text: {text}",
)

Evaluators

Score a generated output. Three flavors (heuristic, LLM-as-judge, and the Future AGI platform's pre-built templates), all behind one Evaluator API.

# Heuristic
from fi.opt.base.evaluator import Evaluator
from fi.evals.metrics import BLEUScore

evaluator = Evaluator(BLEUScore())

# LLM-as-judge
from fi.evals.llm import LiteLLMProvider
from fi.evals.metrics import CustomLLMJudge

judge = CustomLLMJudge(
    provider=LiteLLMProvider(),
    config={
        "name": "correctness_judge",
        "grading_criteria": (
            "Score 1.0 if 'response' is semantically equivalent to "
            "'expected_response'. 0.0 if incorrect. Partial credit OK."
        ),
    },
    model="gemini/gemini-2.5-flash",
    temperature=0.4,
)
evaluator = Evaluator(metric=judge)

# Future AGI platform — 50+ pre-built templates
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="...", fi_secret_key="...",
)

Data mappers

Translate your dataset's shape into the keys the evaluator expects.

from fi.opt.datamappers import BasicDataMapper

mapper = BasicDataMapper(key_map={
    "output":       "generated_output",  # from the generator
    "input":        "question",          # from the dataset row
    "ground_truth": "answer",            # from the dataset row
})
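Conceptually, a key map is just a dict remap over the merged dataset row and generator output. A minimal pure-Python sketch of that idea (illustrative only, not the library's implementation; `remap_row` and the record keys are made up here):

```python
def remap_row(key_map: dict[str, str], record: dict) -> dict:
    """Build the evaluator's input dict by pulling each mapped source key
    from a merged dataset-row + generator-output record."""
    return {eval_key: record[source_key] for eval_key, source_key in key_map.items()}

# A dataset row merged with the generator's output (illustrative keys):
record = {"question": "2+2?", "answer": "4", "generated_output": "4"}
inputs = remap_row({"response": "generated_output", "expected_response": "answer"}, record)
# inputs == {"response": "4", "expected_response": "4"}
```

The evaluator then sees only the keys it expects ("response", "expected_response"), regardless of how your dataset names its columns.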

Advanced usage

Custom heuristic metric

from fi.evals.metrics.base_metric import BaseMetric

class ExactMatchWithNormalization(BaseMetric):
    @property
    def metric_name(self):
        return "exact_match_norm"

    def compute_one(self, inputs):
        return float(inputs["response"].strip().lower()
                     == inputs["expected_response"].strip().lower())

Custom prompt builder (few-shot composition)

def builder(base_prompt: str, few_shot: list[str]) -> str:
    return f"{base_prompt}\n\nExamples:\n" + "\n\n".join(few_shot)

BayesianSearchOptimizer(prompt_builder=builder, ...)
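To see the composed prompt the builder produces, the function above can be exercised standalone (the example strings are illustrative):

```python
def builder(base_prompt: str, few_shot: list[str]) -> str:
    # Append the selected few-shot examples under an "Examples:" header.
    return f"{base_prompt}\n\nExamples:\n" + "\n\n".join(few_shot)

print(builder("Answer concisely.", ["Q: 1+1? A: 2", "Q: 2+2? A: 4"]))
# Answer concisely.
#
# Examples:
# Q: 1+1? A: 2
#
# Q: 2+2? A: 4
```

During optimization, the Bayesian search varies which examples appear in `few_shot` and in what order; the builder controls only how they are stitched into the final prompt.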

Logging

from fi.opt.utils import setup_logging
import logging

setup_logging(level=logging.INFO,
              log_to_console=True, log_to_file=True,
              log_file="optimization.log")

Environment

export OPENAI_API_KEY="..."
export GEMINI_API_KEY="..."        # if using Gemini
export FI_API_KEY="..."            # for Future AGI platform evaluators
export FI_SECRET_KEY="..."

Where agent-opt fits in the Future AGI loop

simulate → evaluate → control → monitor → optimize. This SDK is the optimize step.

  • traceAI captures production traces of every LLM call.
  • ai-evaluation scores them with 50+ metrics.
  • agent-opt turns those scored traces into a better prompt.
  • The Agent Command Center ships the new prompt behind an OpenAI-compatible endpoint.

Use one SDK or all of them. Each is independently packaged and Apache 2.0-licensed.


Project structure

src/fi/opt/
├── base/              # Abstract base classes (Evaluator, Optimizer, …)
├── datamappers/       # Dataset-shape → evaluator-key translators
├── generators/        # LiteLLM-backed LLM callers
├── optimizers/        # Random, Bayesian, ProTeGi, Meta-Prompt, PromptWizard, GEPA
├── utils/             # Logging, IO, small helpers
└── types.py           # Shared type defs

Roadmap

Shipped
  • Six algorithms (RS, Bayesian, ProTeGi, Meta-Prompt, PromptWizard, GEPA)
  • LiteLLM generator
  • ai-evaluation integration (heuristic + LLM-judge + platform)
  • Early-stopping config
  • GEPA iteration history
  • Public OSS launch

In progress
  • Async optimization loop
  • Multi-objective result surface
  • Trace-ingestion connector (traceAI → dataset)

Coming up
  • Prompt version control with branches
  • Cost-aware optimization budgets
  • Resumable runs from checkpoint
  • CLI (agent-opt optimize …)

Exploring
  • Auto-tuned rubrics from human feedback
  • Multi-turn dialogue optimization
  • Voice-agent prompt optimization
  • Federated optimization across tenants

Contributing

Bug fixes, new algorithms, new metrics, docs, examples: all welcome.

  1. Browse good first issue
  2. Read the main repo Contributing Guide — same CLA, same workflow.
  3. Say hi on Discord or Discussions.

Community & support

💬 Discord Real-time help from the team and community
🗨️ GitHub Discussions Ideas, questions, roadmap input
📝 Blog Engineering & research posts
📧 support@futureagi.com Cloud account / billing
🔐 security@futureagi.com Private vulnerability disclosure — see SECURITY.md

License

Licensed under the Apache License 2.0. See LICENSE and NOTICE.

Part of the Future AGI open-source ecosystem.


Built by the Future AGI team and contributors.

If agent-opt helps you ship better agents, a ⭐ helps more teams find us.

🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com
