⚠️ Nightly release for early testing. Expect rough edges. Stable version coming out soon — please open an issue if you hit anything.
The open-source platform for making AI agents reliable — evaluations, tracing, simulations, guardrails, gateway, optimization. One loop, on your infrastructure.
Try Cloud (Free) · Self-Host · Docs · Blog · Discord · Discussions
AI agents don't fail at launch. They fail in production, and most teams fight that with a stitched-together stack of evals, observability, and guardrails that never close the loop. FutureAGI collapses all of it into one platform and one feedback loop. Simulate edge cases before launch, evaluate what happens in production, protect users in real time, and turn every trace into signal for the next version. The result: agents that don't just get monitored, they self-improve.
- No more stitching Langfuse + Braintrust + Helicone + Guardrails AI + a custom simulator. One platform covers the lifecycle: simulate → evaluate → protect → monitor → optimize, with data flowing back as a loop.
- Apache 2.0 core. Every evaluator, every prompt, every trace is inspectable — no black-box scoring. Self-host for data sovereignty or use our managed Cloud. Drop in your own stack at any layer via OTel / OpenAI-compatible HTTP.
- Go-based gateway with ~9.9 ns weighted routing, ~29 k req/s on t3.xlarge, P99 ≤ 21 ms with guardrails on. OpenTelemetry-native traces. 50+ framework instrumentors. Every claim reproducible via the committed benchmark harness.
Three ways, picked by how much you want to install:

**Cloud (fastest)**. No install. Free tier. SOC 2 Type II · HIPAA · data stays in your region.

```shell
# Sign up free: app.futureagi.com
pip install ai-evaluation
```

**Self-host (Docker)**. One command, full stack.

```shell
git clone https://github.com/future-agi/future-agi
cd future-agi
cp futureagi/.env.example futureagi/.env
docker compose up -d
```

Then open http://localhost:3031.

**Self-host (Kubernetes)**. Production-grade, HA. Helm chart v1 in progress; until then, use the plain kubectl manifests.

```shell
helm repo add futureagi <repo-url>
helm install fagi futureagi/future-agi
```
**Python**

```python
from fi_instrumentation import register
from traceai_openai import OpenAIInstrumentor

register(project_name="my-agent")
OpenAIInstrumentor().instrument()

# Your existing OpenAI code is now traced.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)
```

**TypeScript**

```typescript
import { register } from "@traceai/fi-core";
import { OpenAIInstrumentation } from "@traceai/openai";

register({ projectName: "my-agent" });
new OpenAIInstrumentation().instrument();

// Your existing OpenAI code is now traced.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: query }],
});
```
Full docs → · Cookbooks → · API reference →
Six pillars. Each one replaces a tool you probably have.
Run your agent against thousands of multi-turn conversations — realistic personas, adversarial inputs, domain edge cases. Text and voice (LiveKit, VAPI, Retell, Pipecat). Full audio + transcript capture. Scores fed straight into Evaluate.
Simulation docs → · Agent Playground → · Datasets →
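The mechanics are easy to picture. A toy sketch of persona-driven scenario generation in plain Python — the personas and seed intents below are invented for illustration, and this is not the simulate SDK's actual API:

```python
import itertools

# Hypothetical personas and seed intents, for illustration only.
PERSONAS = ["impatient customer", "non-native speaker", "prompt injector"]
SEEDS = ["cancel my subscription", "what's my account balance"]

def generate_scenarios(personas, seeds):
    """Cross personas with seed intents to produce multi-turn openers."""
    for persona, seed in itertools.product(personas, seeds):
        yield {
            "persona": persona,
            "opening_message": f"As a {persona}: {seed}",
            "max_turns": 10,
        }

scenarios = list(generate_scenarios(PERSONAS, SEEDS))
print(len(scenarios))  # 3 personas x 2 seeds = 6 scenarios
```

The real simulator drives these scenarios against your agent turn by turn (including audio for voice platforms) and scores every transcript.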
Groundedness · faithfulness · tool-use correctness · RAG context relevance · hallucination · answer completeness · PII · toxicity · bias · tone · custom rubrics. LLM-as-judge + heuristic + ML under one evaluate() call. Run on datasets, live traces, or in CI.
Evaluation docs → · Error Feeds → · Prompts →
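For intuition about the heuristic class of metrics, here is a toy answer-completeness score computed as token recall against a reference. This is an illustration only — the shipped evaluators are considerably richer than a set intersection:

```python
def answer_completeness(answer: str, reference: str) -> float:
    """Fraction of reference tokens that appear in the answer (0.0-1.0)."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 1.0  # empty reference: nothing to miss
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens)

score = answer_completeness(
    "Refunds are issued within 5 business days via the original payment method.",
    "refunds take 5 business days",
)
print(round(score, 2))  # 4 of 5 reference tokens found -> 0.8
```

LLM-as-judge metrics replace the token match with a rubric-scored model call, but the contract is the same: inputs in, a score out, runnable on datasets, live traces, or CI.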
18 built-in scanners (jailbreak, prompt injection, PII, secrets, code injection, content moderation, …) + 15 third-party vendor adapters (Lakera, Aporia, AWS Bedrock, Azure, Presidio, Llama Guard, Pangea, Enkrypt, Lasso, HiddenLayer, Gray Swan, DynamoAI, CrowdStrike, IBM, Zscaler). Inline in the gateway or standalone SDK.
Protect docs → · Gateway guardrails →
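Conceptually, each scanner is a predicate over a message. A minimal PII/secrets-style scanner in plain Python — the patterns are deliberately naive and the built-in scanners combine ML, heuristics, and vendor APIs rather than two regexes:

```python
import re

# Illustrative patterns only.
SCANNERS = {
    "email_pii": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text: str) -> list[str]:
    """Return the names of scanners that flag the text."""
    return [name for name, pattern in SCANNERS.items() if pattern.search(text)]

print(scan("Contact me at jane@example.com"))  # ['email_pii']
print(scan("totally harmless message"))        # []
```

In the gateway, a non-empty result like this is what triggers a block, redact, or log action inline on the request path.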
50+ framework instrumentors (LangChain, LlamaIndex, CrewAI, AutoGen, DSPy, Haystack…). Span graphs · latency · token cost · live dashboards. Export to Jaeger, Datadog, Grafana — or Future AGI. Zero-config.
One OpenAI-compatible endpoint, 100+ providers (hosted + self-hosted). 15 routing strategies (latency-aware · cost-opt · shadow · failover · circuit breaker…). Exact + semantic caching across 10 backends. Virtual keys, budgets, rate limits, MCP, A2A.
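Because the endpoint speaks the OpenAI HTTP schema, existing clients only need a base-URL swap. A sketch with curl — the localhost port is an assumption carried over from the Docker quick start, and the path follows the standard OpenAI convention; check the gateway docs for your deployment:

```shell
# Point any OpenAI-style client at the gateway instead of api.openai.com.
curl http://localhost:3031/v1/chat/completions \
  -H "Authorization: Bearer $FUTUREAGI_VIRTUAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```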
Benchmarks (single Mac, 4 vCPU container = t3.xlarge profile):
- ~9.9 ns weighted target selection (faster than Bifrost's published ~10 ns)
- ~28 900 req/s sustained at 100 % success, P99 ≤ 21 ms (~5.7× Bifrost's 5 k rps on the same resource profile)
- ~2.8 ms P95 end-to-end at ~1 k RPS (~2.9× faster than LiteLLM's 8 ms)
- +0.5 % throughput, +1.4 ms P95 to add 3 inline guardrails
Gateway docs → · Full benchmarks → · vs Portkey / Bifrost / LiteLLM / Helicone / Kong →
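For intuition about what "weighted target selection" means, here is the idea as a toy Python sketch (the production router is Go and heavily optimized; this only shows the algorithm, precomputing cumulative weights so each pick is one binary search):

```python
import bisect
import random

def make_selector(targets: dict[str, float]):
    """Precompute cumulative weights; each pick is a single binary search."""
    names = list(targets)
    cumulative = []
    total = 0.0
    for name in names:
        total += targets[name]
        cumulative.append(total)

    def pick(rng=random) -> str:
        return names[bisect.bisect_left(cumulative, rng.uniform(0.0, total))]

    return pick

pick = make_selector({"openai": 0.7, "anthropic": 0.2, "fallback": 0.1})
counts = {"openai": 0, "anthropic": 0, "fallback": 0}
for _ in range(10_000):
    counts[pick()] += 1
print(counts)  # roughly 7000 / 2000 / 1000
```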
Six algorithms: Random Search · Bayesian · ProTeGi · Meta-Prompt · PromptWizard · GEPA (evolutionary). Plug in any LLM via LiteLLM; optimize against any of the 50+ eval metrics. Production traces feed back as training data.
Optimization docs → · Knowledge Base →
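The simplest of the six, random search, fits in a few lines. A toy sketch against a stand-in metric — in agent-opt the scoring function would be a real eval over a dataset, and the candidate edits here are invented for illustration:

```python
import random

# Hypothetical prompt edits; a real run would mutate the prompt via an LLM.
CANDIDATE_EDITS = [
    "Answer concisely.",
    "Cite your sources.",
    "Think step by step.",
    "If unsure, say so.",
]

def score(prompt: str) -> float:
    """Stand-in metric: in practice, an eval metric over a dataset."""
    return sum(edit in prompt for edit in CANDIDATE_EDITS)

def random_search(base: str, trials: int = 50, seed: int = 0):
    rng = random.Random(seed)
    best_prompt, best_score = base, score(base)
    for _ in range(trials):
        candidate = base + " " + " ".join(
            rng.sample(CANDIDATE_EDITS, k=rng.randint(1, len(CANDIDATE_EDITS)))
        )
        if (s := score(candidate)) > best_score:
            best_prompt, best_score = candidate, s
    return best_prompt, best_score

prompt, best = random_search("You are a support agent.")
print(best)
```

The other five algorithms replace the blind sampling with smarter proposal strategies (Bayesian surrogates, textual gradients, evolutionary crossover), but keep this same propose-score-keep loop.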
| Target | Status | Notes |
|---|---|---|
| Docker Compose | ✅ | docker compose up -d from a fresh clone |
| Kubernetes | ✅ | Plain manifests today; Helm chart v1 in progress |
| AWS / GCP / Azure | ✅ | Runs on any container runtime — ECS · Cloud Run · AKS · EKS · GKE |
| AWS Marketplace | ⏳ | Coming soon |
| Air-gapped / on-prem | ✅ | No phone-home — contact sales |
Every arrow is an open, documented interface: OpenTelemetry OTLP for traces, OpenAI-compatible HTTP for the gateway, Postgres / ClickHouse SQL for storage. Drop in your own stack at any layer.
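Because the trace path is plain OTLP, redirecting spans to another backend should need nothing beyond the standard OpenTelemetry SDK environment variables — the endpoint value below is a placeholder, and the exact collector address depends on your deployment:

```shell
# Standard OTel SDK configuration; no vendor-specific settings.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://my-collector:4318"  # placeholder host
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="my-agent"
```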
Runtime: Python 3.11+ (Django 4.2 + Channels) · Go 1.23+ (gateway) · React 18 + Vite · Node 20+. Data: PostgreSQL (metadata) · ClickHouse (spans + time-series) · Redis (state) · RabbitMQ + Temporal (jobs).
Component breakdown (per-package)
| Layer | Component | Code |
|---|---|---|
| Edge | traceAI — OpenTelemetry instrumentation | future-agi/traceAI |
| Edge | Agent Command Center — OpenAI-compatible proxy | futureagi/agentcc-gateway/ |
| Platform | tracer — OTLP ingest, span graph | futureagi/tracer/ |
| Platform | agentic_eval — 50+ metrics, LLM-as-judge | futureagi/agentic_eval/ |
| Platform | simulate — persona-driven scenario generation | futureagi/simulate/ |
| Platform | model_hub — LLM routing, embeddings, datasets | futureagi/model_hub/ |
| Platform | accounts · usage · integrations — auth, orgs, metering, connectors | futureagi/accounts/ |
| Data | PostgreSQL · ClickHouse · Redis · RabbitMQ + Temporal | — |
Future AGI is an open-source ecosystem — each SDK is independently usable, independently packaged, Apache/MIT-licensed.
| Repo | Install | Languages | Purpose |
|---|---|---|---|
| traceAI | `pip install fi-instrumentation-otel` · `npm i @traceai/fi-core` | Python · TS · Java · C# | Zero-config OTel tracing for 50+ AI frameworks |
| ai-evaluation | `pip install ai-evaluation` · `npm i @future-agi/ai-evaluation` | Python · TS | 50+ evaluation metrics + guardrail scanners |
| futureagi | `pip install futureagi` | Python | Platform SDK — datasets, prompts, KB, experiments |
| agent-opt | `pip install agent-opt` | Python | 6 prompt-optimization algorithms (GEPA, PromptWizard, …) |
| simulate-sdk | `pip install agent-simulate` | Python | Voice-agent simulation via LiveKit + Silero VAD |
| agentcc | `pip install agentcc` · `npm i @agentcc/client` | Python · TS (+ LangChain · LlamaIndex · React · Vercel) | Gateway client SDKs |
| Category | Integrations |
|---|---|
| LLM providers (100+) | OpenAI · Anthropic · Google Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · Cohere · Together · Perplexity · OpenRouter · Fireworks · xAI · Replicate · HuggingFace · + self-hosted Ollama · vLLM · LM Studio · TGI · Llamafile |
| Agent frameworks | LangChain · LangGraph · LlamaIndex · CrewAI · AutoGen · Phidata · PydanticAI · Claude SDK · LiteLLM · Haystack · DSPy · Instructor · Smol-agents |
| Voice platforms | VAPI · Retell · LiveKit · Pipecat |
| Vector DBs | Pinecone · Weaviate · Chroma · Milvus · Qdrant · pgvector |
| Tools & infra | Vercel AI SDK · n8n · MongoDB · MCP · A2A · Guardrails AI · Langfuse · HuggingFace Smol-agents |
| | Future AGI | Langfuse | Phoenix | Braintrust | Helicone |
|---|---|---|---|---|---|
| Open source | ✅ Apache 2.0 | ✅ MIT | ✅ Elastic v2 | ❌ | ✅ Apache 2.0 |
| Self-host | ✅ | ✅ | ✅ | ❌ | ✅ |
| LLM tracing (OpenTelemetry) | ✅ | Partial | ✅ | ❌ | ❌ |
| Evaluation suites | ✅ 50+ metrics | ✅ | ✅ | ✅ | Limited |
| Agent simulation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Voice agent eval | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM gateway built in | ✅ 100+ providers | ❌ | ❌ | ❌ | ✅ |
| Guardrails built in | ✅ 18 + 15 adapters | ❌ | ❌ | ❌ | Basic |
| Prompt optimization | ✅ 6 algorithms | ❌ | ❌ | ❌ | ❌ |
| Prompt management | ✅ | ✅ | ✅ | ❌ | |
| Datasets & experiments | ✅ | ✅ | ✅ | ✅ | ❌ |
| No-code eval builder | ✅ | ❌ | | | |
Based on publicly documented features as of April 2026. Corrections welcome — open a PR.
- Customer Support: Ship support AI that customers actually trust
- Voice Agents: Test, evaluate, and improve voice AI end-to-end
- Internal Tools: AI copilots your whole org can rely on
- RAG & Search: Every answer grounded, every citation verified
- Autonomous Agents: Multi-step agents you can actually trust in production
- Computer-Use Agents (CUA): Agents that click with confidence
- Coding Agents: AI that writes code you can actually ship
Vote on the public roadmap → · GitHub Discussions · Releases · Changelog
| Recently shipped | In progress | Coming up | Exploring |
|---|---|---|---|
We love contributions — bug fixes, new evaluators, framework integrations, docs, examples, anything.
- Browse `good first issue` issues
- Read the Contributing Guide
- Say hi on Discord or Discussions
- Sign the CLA on your first PR (automatic bot)
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📺 YouTube | Walkthroughs & demos |
| 📊 Status | Cloud uptime + incident history |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (24h ack — see SECURITY.md) |
Self-hosted Future AGI phones home anonymous usage counts only (version, instance ID, feature flags used) so we can size our release testing. No trace data, no prompts, no API keys, ever. Opt out via `FUTURE_AGI_TELEMETRY_DISABLED=1`.
Future AGI is licensed under the Apache License 2.0. See LICENSE and NOTICE.
You own your evaluation logic and your data. Inspect every evaluator, every prompt, every trace — no black-box scoring, no vendor lock-in.
Built with ❤️ by the Future AGI team and contributors worldwide.
If Future AGI helps you ship better AI, a ⭐ helps more teams find us.
🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com · 📊 status.futureagi.com
