Skip to content

commitPirate/future-agi

 
 

Repository files navigation

⚠️ Nightly release for early testing. Expect rough edges. Stable version coming out soon — please open an issue if you hit anything.

Future AGI — make AI agents reliable

AI Agents hallucinate. Fix it faster.

The open-source platform for making AI agents reliable — evaluations, tracing, simulations, guardrails, gateway, optimization. One loop, on your infrastructure.

Apache 2.0 License GitHub stars Docker pulls PyPI npm Discord

Try Cloud (Free) · Self-Host · Docs · Blog · Discord · Discussions


Future AGI — trace an agent, run evals, simulate, and guardrail in one platform

Why Future AGI?

AI agents don't fail at launch. They fail in production, and most teams fight that with a stitched-together stack of evals, observability, and guardrails that never close the loop. FutureAGI collapses all of it into one platform and one feedback loop. Simulate edge cases before launch, evaluate what happens in production, protect users in real time, and turn every trace into signal for the next version. The result: agents that don't just get monitored, they self-improve.

All-in-one

No more stitching Langfuse + Braintrust + Helicone + Guardrails AI + a custom simulator. One platform covers the lifecycle: simulate → evaluate → protect → monitor → optimize, with data flowing back as a loop.

Open & self-hostable

Apache 2.0 core. Every evaluator, every prompt, every trace is inspectable — no black-box scoring. Self-host for data sovereignty or use our managed Cloud. Drop in your own stack at any layer via OTel / OpenAI-compatible HTTP.

Built for production

Go-based gateway with ~9.9 ns weighted routing, ~29 k req/s on t3.xlarge, P99 ≤ 21 ms with guardrails on. OpenTelemetry-native traces. 50+ framework instrumentors. Every claim reproducible via the committed benchmark harness.


🚀 Quickstart (60 seconds)

Three ways, picked by how much you want to install:

Cloud (fastest) Self-host (Docker) Self-host (Kubernetes)

No install. Free tier.

# Sign up free:
#   app.futureagi.com

pip install ai-evaluation

SOC 2 Type II · HIPAA · data stays in your region.

One command, full stack.

git clone \
  https://github.com/future-agi/future-agi
cd future-agi
cp futureagi/.env.example futureagi/.env
docker compose up -d

Open http://localhost:3031.

Production-grade, HA.

helm repo add futureagi \
helm install fagi futureagi/future-agi

Helm chart — v1 in progress. Until then, kubectl manifests in deploy/.

Instrument your first agent

Python

from fi_instrumentation import register
from traceai_openai import OpenAIInstrumentor

register(project_name="my-agent")
OpenAIInstrumentor().instrument()

# Your existing OpenAI code is now traced.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)

TypeScript

import { register } from "@traceai/fi-core";
import { OpenAIInstrumentation } from "@traceai/openai";

register({ projectName: "my-agent" });
new OpenAIInstrumentation().instrument();

// Your existing OpenAI code is now traced.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: query }],
});

Full docs → · Cookbooks → · API reference →


Core features

Six pillars. Each one replaces a tool you probably have.

1. Simulate — test agents at scale before users meet them

Run your agent against thousands of multi-turn conversations — realistic personas, adversarial inputs, domain edge cases. Text and voice (LiveKit, VAPI, Retell, Pipecat). Full audio + transcript capture. Scores fed straight into Evaluate.

Simulation docs → · Agent Playground → · Datasets →

2. Evaluate — 50+ metrics, unified evaluate() API

Groundedness · faithfulness · tool-use correctness · RAG context relevance · hallucination · answer completeness · PII · toxicity · bias · tone · custom rubrics. LLM-as-judge + heuristic + ML under one evaluate() call. Run on datasets, live traces, or in CI.

Evaluation docs → · Error Feeds → · Prompts →

3. Protect — real-time guardrails for production

18 built-in scanners (jailbreak, prompt injection, PII, secrets, code injection, content moderation, …) + 15 third-party vendor adapters (Lakera, Aporia, AWS Bedrock, Azure, Presidio, Llama Guard, Pangea, Enkrypt, Lasso, HiddenLayer, Gray Swan, DynamoAI, CrowdStrike, IBM, Zscaler). Inline in the gateway or standalone SDK.

Protect docs → · Gateway guardrails →

4. Monitor — OpenTelemetry-native tracing for every agent

50+ framework instrumentors (LangChain, LlamaIndex, CrewAI, AutoGen, DSPy, Haystack…). Span graphs · latency · token cost · live dashboards. Export to Jaeger, Datadog, Grafana — or Future AGI. Zero-config.

Observability docs →

5. Agent Command Center — the AI gateway built in

One OpenAI-compatible endpoint, 100+ providers (hosted + self-hosted). 15 routing strategies (latency-aware · cost-opt · shadow · failover · circuit breaker…). Exact + semantic caching across 10 backends. Virtual keys, budgets, rate limits, MCP, A2A.

Benchmarks (single Mac, 4 vCPU container = t3.xlarge profile):

  • ~9.9 ns weighted target selection (faster than Bifrost's published ~10 ns)
  • ~28 900 req/s sustained at 100 % success, P99 ≤ 21 ms (~5.7× Bifrost's 5 k rps on the same resource profile)
  • ~2.8 ms P95 end-to-end at ~1 k RPS (~2.9× faster than LiteLLM's 8 ms)
  • +0.5 % throughput, +1.4 ms P95 to add 3 inline guardrails

Gateway docs → · Full benchmarks → · vs Portkey / Bifrost / LiteLLM / Helicone / Kong →

6. Optimize — close the loop automatically

Six algorithms: Random Search · Bayesian · ProTeGi · Meta-Prompt · PromptWizard · GEPA (evolutionary). Plug in any LLM via LiteLLM; optimize against any of the 50+ eval metrics. Production traces feed back as training data.

Optimization docs → · Knowledge Base →


Deployment options

Target Status Notes
Docker Compose docker compose up -d from a fresh clone
Kubernetes Plain manifests today; Helm chart v1 in progress
AWS / GCP / Azure Runs on any container runtime — ECS · Cloud Run · AKS · EKS · GKE
AWS Marketplace Coming soon
Air-gapped / on-prem No phone-home — contact sales

Architecture

Every arrow is an open, documented interface: OpenTelemetry OTLP for traces, OpenAI-compatible HTTP for the gateway, Postgres / ClickHouse SQL for storage. Drop in your own stack at any layer.

Future AGI architecture — client SDKs → traceAI + Agent Command Center → Django platform → PostgreSQL, ClickHouse, Redis, RabbitMQ

Runtime: Python 3.11+ (Django 4.2 + Channels) · Go 1.23+ (gateway) · React 18 + Vite · Node 20+. Data: PostgreSQL (metadata) · ClickHouse (spans + time-series) · Redis (state) · RabbitMQ + Temporal (jobs).

Component breakdown (per-package)
Layer Component Code
Edge traceAI — OpenTelemetry instrumentation future-agi/traceAI
Edge Agent Command Center — OpenAI-compatible proxy futureagi/agentcc-gateway/
Platform tracer — OTLP ingest, span graph futureagi/tracer/
Platform agentic_eval — 50+ metrics, LLM-as-judge futureagi/agentic_eval/
Platform simulate — persona-driven scenario generation futureagi/simulate/
Platform model_hub — LLM routing, embeddings, datasets futureagi/model_hub/
Platform accounts · usage · integrations — auth, orgs, metering, connectors futureagi/accounts/
Data PostgreSQL · ClickHouse · Redis · RabbitMQ + Temporal

SDKs & integrations

Future AGI is an open-source ecosystem — each SDK is independently usable, independently packaged, Apache/MIT-licensed.

Client libraries

Repo Install Languages Purpose
traceAI pip install fi-instrumentation-otel
npm i @traceai/fi-core
Python · TS · Java · C# Zero-config OTel tracing for 50+ AI frameworks
ai-evaluation pip install ai-evaluation
npm i @future-agi/ai-evaluation
Python · TS 50+ evaluation metrics + guardrail scanners
futureagi pip install futureagi Python Platform SDK — datasets, prompts, KB, experiments
agent-opt pip install agent-opt Python 6 prompt-optimization algorithms (GEPA, PromptWizard, …)
simulate-sdk pip install agent-simulate Python Voice-agent simulation via LiveKit + Silero VAD
agentcc pip install agentcc
npm i @agentcc/client
Python · TS (+ LangChain · LlamaIndex · React · Vercel) Gateway client SDKs

Integrations

LLM providers (100+) OpenAI · Anthropic · Google Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · Cohere · Together · Perplexity · OpenRouter · Fireworks · xAI · Replicate · HuggingFace · + self-hosted Ollama · vLLM · LM Studio · TGI · Llamafile
Agent frameworks LangChain · LangGraph · LlamaIndex · CrewAI · AutoGen · Phidata · PydanticAI · Claude SDK · LiteLLM · Haystack · DSPy · Instructor · Smol-agents
Voice platforms VAPI · Retell · LiveKit · Pipecat
Vector DBs Pinecone · Weaviate · Chroma · Milvus · Qdrant · pgvector
Tools & infra Vercel AI SDK · n8n · MongoDB · MCP · A2A · Guardrails AI · Langfuse · HuggingFace Smol-agents

Full integrations catalog →


How Future AGI compares

Future AGI Langfuse Phoenix Braintrust Helicone
Open source
Apache 2.0

MIT

Elastic v2

Apache 2.0
Self-host
LLM tracing (OpenTelemetry)⚠️
Partial
Evaluation suites
50+ metrics
⚠️
Limited
Agent simulation
Voice agent eval
LLM gateway built in
100+ providers
Guardrails built in
18 + 15 adapters
⚠️
Basic
Prompt optimization
6 algorithms
Prompt management⚠️
Datasets & experiments
No-code eval builder⚠️⚠️⚠️

Based on publicly-documented features as of April 2026. Corrections welcome — open a PR.


Built for every kind of agent

  • Customer Support: Ship support AI that customers actually trust
  • Voice Agents: Test, evaluate, and improve voice AI end-to-end
  • Internal Tools: AI copilots your whole org can rely on
  • RAG & Search: Every answer grounded, every citation verified
  • Autonomous Agents: Multi-step agents you can actually trust in production
  • Computer-Use Agents (CUA): Agents that click with confidence
  • Coding Agents: AI that writes code you can actually ship

Roadmap

Vote on the public roadmap → · GitHub Discussions · Releases · Changelog

Recently shipped In progress Coming up Exploring
  • OpenTelemetry-native tracing
  • 50+ evaluation metrics
  • Agent Command Center (gateway)
  • Voice agent simulations
  • Agent IDE
  • Prompt optimization (6 algos)
  • ClickHouse analytics
  • MCP server support
  • Apache 2.0 licensing
  • Error-feed root-cause v2
  • Public OSS launch
  • Self-hosted Helm chart v1
  • AWS Marketplace listing
  • One-click cloud-deploy buttons
  • SSO (SAML + OIDC)
  • SCIM provisioning
  • Audit-log retention tiers
  • Distributed tracing across multi-agent graphs
  • Auto-tuned rubrics from human feedback
  • Per-tenant budgeting
  • Public benchmark harness + leaderboard
  • CUA eval suite
  • Emotion-aware voice scoring
  • On-device / air-gapped eval runtime
  • Prompt version control with branches
  • Federated evals across tenants
  • Marketplace for community evaluators

🤝 Contributing

We love contributions — bug fixes, new evaluators, framework integrations, docs, examples, anything.

  1. Browse good first issue
  2. Read the Contributing Guide
  3. Say hi on Discord or Discussions
  4. Sign the CLA on your first PR (automatic bot)

🌍 Community & support

💬 Discord Real-time help from the team and community
🗨️ GitHub Discussions Ideas, questions, roadmap input
🐦 Twitter / X Release announcements
📝 Blog Engineering & research posts
📺 YouTube Walkthroughs & demos
📊 Status Cloud uptime + incident history
📧 support@futureagi.com Cloud account / billing
🔐 security@futureagi.com Private vulnerability disclosure (24h ack — see SECURITY.md)

Telemetry

Self-hosted Future AGI phones home anonymous usage counts only (version, instance ID, feature flags used) so we can size our release testing. No trace data, no prompts, no API keys, ever. Opt out via FUTURE_AGI_TELEMETRY_DISABLED=1.


⭐ Star history

Star history

📄 License

Future AGI is licensed under the Apache License 2.0. See LICENSE and NOTICE.

You own your evaluation logic and your data. Inspect every evaluator, every prompt, every trace — no black-box scoring, no vendor lock-in.


Built with ❤️ by the Future AGI team and contributors worldwide.

If Future AGI helps you ship better AI, a ⭐ helps more teams find us.

🌐 futureagi.com · 📖 docs.futureagi.com · ☁️ app.futureagi.com · 📊 status.futureagi.com

About

Open-source, end-to-end platform for evaluating, observing, and improving LLM and AI agent applications. Tracing · Evals · Simulations · Datasets · Gateway · Guardrails. Self-hostable. Apache 2.0.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 48.5%
  • JavaScript 43.1%
  • Go 6.5%
  • Bru 1.2%
  • HTML 0.3%
  • Shell 0.3%
  • Other 0.1%