⚠️ Nightly release for early testing. Expect rough edges. Stable version coming out soon — please open an issue if you hit anything.
The open-source platform for making AI agents reliable — evaluations, tracing, simulations, guardrails, gateway, optimization. One loop, on your infrastructure.
Try Cloud (Free) · Self-Host · Docs · Blog · Discord · Discussions
AI agents don't fail at launch. They fail in production, and most teams fight that with a stitched-together stack of evals, observability, and guardrails that never close the loop. FutureAGI collapses all of it into one platform and one feedback loop. Simulate edge cases before launch, evaluate what happens in production, protect users in real time, and turn every trace into signal for the next version. The result: agents that don't just get monitored, they self-improve.
- No more stitching Langfuse + Braintrust + Helicone + Guardrails AI + a custom simulator. One platform covers the lifecycle: simulate → evaluate → protect → monitor → optimize, with data flowing back as a loop.
- Apache 2.0 core. Every evaluator, every prompt, every trace is inspectable — no black-box scoring. Self-host for data sovereignty or use our managed Cloud. Drop in your own stack at any layer via OTel / OpenAI-compatible HTTP.
- Go-based gateway with ~9.9 ns weighted routing, ~29 k req/s on t3.xlarge, P99 ≤ 21 ms with guardrails on. OpenTelemetry-native traces. 50+ framework instrumentors. Every claim reproducible via the committed benchmark harness.
Three ways, picked by how much you want to install:

**Cloud (fastest)**. No install. Free tier. SOC 2 Type II · HIPAA · data stays in your region.

```shell
# Sign up free: app.futureagi.com
pip install ai-evaluation
```

**Self-host (Docker)**. One command, full stack.

```shell
git clone https://github.com/future-agi/future-agi
cd future-agi
cp futureagi/.env.example futureagi/.env
docker compose up -d
```

Then open http://localhost:3031.

**Self-host (Kubernetes)**. Production-grade, HA. Helm chart v1 in progress; until then, use the plain kubectl manifests.

```shell
helm repo add futureagi <repo-url>
helm install fagi futureagi/future-agi
```
**Python**

```python
from fi_instrumentation import register
from traceai_openai import OpenAIInstrumentor

register(project_name="my-agent")
OpenAIInstrumentor().instrument()

# Your existing OpenAI code is now traced.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)
```

**TypeScript**

```typescript
import { register } from "@traceai/fi-core";
import { OpenAIInstrumentation } from "@traceai/openai";

register({ projectName: "my-agent" });
new OpenAIInstrumentation().instrument();

// Your existing OpenAI code is now traced.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: query }],
});
```
Full docs → · Cookbooks → · API reference →
Six pillars. Each one replaces a tool you probably have.
Run your agent against thousands of multi-turn conversations — realistic personas, adversarial inputs, domain edge cases. Text and voice (LiveKit, VAPI, Retell, Pipecat). Full audio + transcript capture. Scores fed straight into Evaluate.
Simulation docs → · Agent Playground → · Datasets →
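The mechanics are easy to picture. A toy sketch of persona-driven scenario generation in plain Python — the personas and seed intents below are invented for illustration, and this is not the simulate SDK's actual API:

```python
import itertools

# Hypothetical personas and seed intents, for illustration only.
PERSONAS = ["impatient customer", "non-native speaker", "prompt injector"]
SEEDS = ["cancel my subscription", "what's my account balance"]

def generate_scenarios(personas, seeds):
    """Cross personas with seed intents to produce multi-turn openers."""
    for persona, seed in itertools.product(personas, seeds):
        yield {
            "persona": persona,
            "opening_message": f"As a {persona}: {seed}",
            "max_turns": 10,
        }

scenarios = list(generate_scenarios(PERSONAS, SEEDS))
print(len(scenarios))  # 3 personas x 2 seeds = 6 scenarios
```

The real simulator drives these scenarios against your agent turn by turn (including audio for voice platforms) and scores every transcript.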
Groundedness · faithfulness · tool-use correctness · RAG context relevance · hallucination · answer completeness · PII · toxicity · bias · tone · custom rubrics. LLM-as-judge + heuristic + ML under one evaluate() call. Run on datasets, live traces, or in CI.
Evaluation docs → · Error Feeds → · Prompts →
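For intuition about the heuristic class of metrics, here is a toy answer-completeness score computed as token recall against a reference. This is an illustration only — the shipped evaluators are considerably richer than a set intersection:

```python
def answer_completeness(answer: str, reference: str) -> float:
    """Fraction of reference tokens that appear in the answer (0.0-1.0)."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 1.0  # empty reference: nothing to miss
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens)

score = answer_completeness(
    "Refunds are issued within 5 business days via the original payment method.",
    "refunds take 5 business days",
)
print(round(score, 2))  # 4 of 5 reference tokens found -> 0.8
```

LLM-as-judge metrics replace the token match with a rubric-scored model call, but the contract is the same: inputs in, a score out, runnable on datasets, live traces, or CI.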
18 built-in scanners (jailbreak, prompt injection, PII, secrets, code injection, content moderation, …) + 15 third-party vendor adapters (Lakera, Aporia, AWS Bedrock, Azure, Presidio, Llama Guard, Pangea, Enkrypt, Lasso, HiddenLayer, Gray Swan, DynamoAI, CrowdStrike, IBM, Zscaler). Inline in the gateway or standalone SDK.
Protect docs → · Gateway guardrails →
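Conceptually, each scanner is a predicate over a message. A minimal PII/secrets-style scanner in plain Python — the patterns are deliberately naive and the built-in scanners combine ML, heuristics, and vendor APIs rather than two regexes:

```python
import re

# Illustrative patterns only.
SCANNERS = {
    "email_pii": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text: str) -> list[str]:
    """Return the names of scanners that flag the text."""
    return [name for name, pattern in SCANNERS.items() if pattern.search(text)]

print(scan("Contact me at jane@example.com"))  # ['email_pii']
print(scan("totally harmless message"))        # []
```

In the gateway, a non-empty result like this is what triggers a block, redact, or log action inline on the request path.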
50+ framework instrumentors (LangChain, LlamaIndex, CrewAI, AutoGen, DSPy, Haystack…). Span graphs · latency · token cost · live dashboards. Export to Jaeger, Datadog, Grafana — or Future AGI. Zero-config.
One OpenAI-compatible endpoint, 100+ providers (hosted + self-hosted). 15 routing strategies (latency-aware · cost-opt · shadow · failover · circuit breaker…). Exact + semantic caching across 10 backends. Virtual keys, budgets, rate limits, MCP, A2A.
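Because the endpoint speaks the OpenAI HTTP schema, existing clients only need a base-URL swap. A sketch with curl — the localhost port is an assumption carried over from the Docker quick start, and the path follows the standard OpenAI convention; check the gateway docs for your deployment:

```shell
# Point any OpenAI-style client at the gateway instead of api.openai.com.
curl http://localhost:3031/v1/chat/completions \
  -H "Authorization: Bearer $FUTUREAGI_VIRTUAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```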
Benchmarks (single Mac, 4 vCPU container = t3.xlarge profile):
- ~9.9 ns weighted target selection (faster than Bifrost's published ~10 ns)
- ~28 900 req/s sustained at 100 % success, P99 ≤ 21 ms (~5.7× Bifrost's 5 k rps on the same resource profile)
- ~2.8 ms P95 end-to-end at ~1 k RPS (~2.9× faster than LiteLLM's 8 ms)
- +0.5 % throughput, +1.4 ms P95 to add 3 inline guardrails
Gateway docs → · Full benchmarks → · vs Portkey / Bifrost / LiteLLM / Helicone / Kong →
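For intuition about what "weighted target selection" means, here is the idea as a toy Python sketch (the production router is Go and heavily optimized; this only shows the algorithm, precomputing cumulative weights so each pick is one binary search):

```python
import bisect
import random

def make_selector(targets: dict[str, float]):
    """Precompute cumulative weights; each pick is a single binary search."""
    names = list(targets)
    cumulative = []
    total = 0.0
    for name in names:
        total += targets[name]
        cumulative.append(total)

    def pick(rng=random) -> str:
        return names[bisect.bisect_left(cumulative, rng.uniform(0.0, total))]

    return pick

pick = make_selector({"openai": 0.7, "anthropic": 0.2, "fallback": 0.1})
counts = {"openai": 0, "anthropic": 0, "fallback": 0}
for _ in range(10_000):
    counts[pick()] += 1
print(counts)  # roughly 7000 / 2000 / 1000
```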
Six algorithms: Random Search · Bayesian · ProTeGi · Meta-Prompt · PromptWizard · GEPA (evolutionary). Plug in any LLM via LiteLLM; optimize against any of the 50+ eval metrics. Production traces feed back as training data.
Optimization docs → · Knowledge Base →
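The simplest of the six, random search, fits in a few lines. A toy sketch against a stand-in metric — in agent-opt the scoring function would be a real eval over a dataset, and the candidate edits here are invented for illustration:

```python
import random

# Hypothetical prompt edits; a real run would mutate the prompt via an LLM.
CANDIDATE_EDITS = [
    "Answer concisely.",
    "Cite your sources.",
    "Think step by step.",
    "If unsure, say so.",
]

def score(prompt: str) -> float:
    """Stand-in metric: in practice, an eval metric over a dataset."""
    return sum(edit in prompt for edit in CANDIDATE_EDITS)

def random_search(base: str, trials: int = 50, seed: int = 0):
    rng = random.Random(seed)
    best_prompt, best_score = base, score(base)
    for _ in range(trials):
        candidate = base + " " + " ".join(
            rng.sample(CANDIDATE_EDITS, k=rng.randint(1, len(CANDIDATE_EDITS)))
        )
        if (s := score(candidate)) > best_score:
            best_prompt, best_score = candidate, s
    return best_prompt, best_score

prompt, best = random_search("You are a support agent.")
print(best)
```

The other five algorithms replace the blind sampling with smarter proposal strategies (Bayesian surrogates, textual gradients, evolutionary crossover), but keep this same propose-score-keep loop.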
| Target | Status | Notes |
|---|---|---|
| Docker Compose | ✅ | docker compose up -d from a fresh clone |
| Kubernetes | ✅ | Plain manifests today; Helm chart v1 in progress |
| AWS / GCP / Azure | ✅ | Runs on any container runtime — ECS · Cloud Run · AKS · EKS · GKE |
| AWS Marketplace | ⏳ | Coming soon |
| Air-gapped / on-prem | ✅ | No phone-home — contact sales |
Every arrow is an open, documented interface: OpenTelemetry OTLP for traces, OpenAI-compatible HTTP for the gateway, Postgres / ClickHouse SQL for storage. Drop in your own stack at any layer.
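Because the trace path is plain OTLP, redirecting spans to another backend should need nothing beyond the standard OpenTelemetry SDK environment variables — the endpoint value below is a placeholder, and the exact collector address depends on your deployment:

```shell
# Standard OTel SDK configuration; no vendor-specific settings.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://my-collector:4318"  # placeholder host
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="my-agent"
```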
Runtime: Python 3.11+ (Django 4.2 + Channels) · Go 1.23+ (gateway) · React 18 + Vite · Node 20+. Data: PostgreSQL (metadata) · ClickHouse (spans + time-series) · Redis (state) · RabbitMQ + Temporal (jobs).
Component breakdown (per-package)
| Layer | Component | Code |
|---|---|---|
| Edge | traceAI — OpenTelemetry instrumentation | future-agi/traceAI |
| Edge | Agent Command Center — OpenAI-compatible proxy | futureagi/agentcc-gateway/ |
| Platform | tracer — OTLP ingest, span graph | futureagi/tracer/ |
| Platform | agentic_eval — 50+ metrics, LLM-as-judge | futureagi/agentic_eval/ |
| Platform | simulate — persona-driven scenario generation | futureagi/simulate/ |
| Platform | model_hub — LLM routing, embeddings, datasets | futureagi/model_hub/ |
| Platform | accounts · usage · integrations — auth, orgs, metering, connectors | futureagi/accounts/ |
| Data | PostgreSQL · ClickHouse · Redis · RabbitMQ + Temporal | — |
Future AGI is an open-source ecosystem — each SDK is independently usable, independently packaged, Apache/MIT-licensed.
| Repo | Install | Languages | Purpose |
|---|---|---|---|
| traceAI | `pip install fi-instrumentation-otel` · `npm i @traceai/fi-core` | Python · TS · Java · C# | Zero-config OTel tracing for 50+ AI frameworks |
| ai-evaluation | `pip install ai-evaluation` · `npm i @future-agi/ai-evaluation` | Python · TS | 50+ evaluation metrics + guardrail scanners |
| futureagi | `pip install futureagi` | Python | Platform SDK — datasets, prompts, KB, experiments |
| agent-opt | `pip install agent-opt` | Python | 6 prompt-optimization algorithms (GEPA, PromptWizard, …) |
| simulate-sdk | `pip install agent-simulate` | Python | Voice-agent simulation via LiveKit + Silero VAD |
| agentcc | `pip install agentcc` · `npm i @agentcc/client` | Python · TS (+ LangChain · LlamaIndex · React · Vercel) | Gateway client SDKs |
| Category | Integrations |
|---|---|
| LLM providers (100+) | OpenAI · Anthropic · Google Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · Cohere · Together · Perplexity · OpenRouter · Fireworks · xAI · Replicate · HuggingFace · + self-hosted Ollama · vLLM · LM Studio · TGI · Llamafile |
| Agent frameworks | LangChain · LangGraph · LlamaIndex · CrewAI · AutoGen · Phidata · PydanticAI · Claude SDK · LiteLLM · Haystack · DSPy · Instructor · Smol-agents |
| Voice platforms | VAPI · Retell · LiveKit · Pipecat |
| Vector DBs | Pinecone · Weaviate · Chroma · Milvus · Qdrant · pgvector |
| Tools & infra | Vercel AI SDK · n8n · MongoDB · MCP · A2A · Guardrails AI · Langfuse · HuggingFace Smol-agents |
| | Future AGI | Langfuse | Phoenix | Braintrust | Helicone |
|---|---|---|---|---|---|
| Open source | ✅ Apache 2.0 | ✅ MIT | ✅ Elastic v2 | ❌ | ✅ Apache 2.0 |
| Self-host | ✅ | ✅ | ✅ | ❌ | ✅ |
| LLM tracing (OpenTelemetry) | ✅ | Partial | ✅ | ❌ | ❌ |
| Evaluation suites | ✅ 50+ metrics | ✅ | ✅ | ✅ | Limited |
| Agent simulation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Voice agent eval | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM gateway built in | ✅ 100+ providers | ❌ | ❌ | ❌ | ✅ |
| Guardrails built in | ✅ 18 + 15 adapters | ❌ | ❌ | ❌ | Basic |
| Prompt optimization | ✅ 6 algorithms | ❌ | ❌ | ❌ | ❌ |
| Prompt management | ✅ | ✅ | ✅ | ❌ | |
| Datasets & experiments | ✅ | ✅ | ✅ | ✅ | ❌ |
| No-code eval builder | ✅ | ❌ | | | |
Based on publicly documented features as of April 2026. Corrections welcome — open a PR.
- Customer Support: Ship support AI that customers actually trust
- Voice Agents: Test, evaluate, and improve voice AI end-to-end
- Internal Tools: AI copilots your whole org can rely on
- RAG & Search: Every answer grounded, every citation verified
- Autonomous Agents: Multi-step agents you can actually trust in production
- Computer-Use Agents (CUA): Agents that click with confidence
- Coding Agents: AI that writes code you can actually ship
Vote on the public roadmap → · GitHub Discussions · Releases · Changelog
| Recently shipped | In progress | Coming up | Exploring |
|---|---|---|---|
We love contributions — bug fixes, new evaluators, framework integrations, docs, examples, anything.
- Browse `good first issue` issues
- Read the Contributing Guide
- Say hi on Discord or Discussions
- Sign the CLA on your first PR (automatic bot)
| 💬 Discord | Real-time help from the team and community |
| 🗨️ GitHub Discussions | Ideas, questions, roadmap input |
| 🐦 Twitter / X | Release announcements |
| 📝 Blog | Engineering & research posts |
| 📺 YouTube | Walkthroughs & demos |
| 📊 Status | Cloud uptime + incident history |
| 📧 support@futureagi.com | Cloud account / billing |
| 🔐 security@futureagi.com | Private vulnerability disclosure (24h ack — see SECURITY.md) |
Self-hosted Future AGI phones home anonymous usage counts only (version, instance ID, feature flags used) so we can size our release testing. No trace data, no prompts, no API keys, ever. Opt out via `FUTURE_AGI_TELEMETRY_DISABLED=1`.
Future AGI is licensed under the Apache License 2.0. See LICENSE and NOTICE.
You own your evaluation logic and your data. Inspect every evaluator, every prompt, every trace — no black-box scoring, no vendor lock-in.
Built with ❤️ by the Future AGI team and contributors worldwide.
If Future AGI helps you ship better AI, a ⭐ helps more teams find us.
🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com · 📊 status.futureagi.com
