Your agent's context is your attack surface. Act accordingly.
Secure context engineering for production AI agents.
Content security. Integrity verification. Trust hierarchy. Context that improves itself.
Agents are getting compromised. Not theoretically — right now.
- EchoLeak (CVE-2025-32711, CVSS 9.3) — a single email triggered zero-click data exfiltration from Microsoft 365 Copilot [1]
- CrewAI + GPT-4o — researchers achieved a 65% exfiltration success rate against multi-agent systems (COLM 2025) [2]
- Drift chatbot cascade — one compromised chatbot integration cascaded into 700+ organizations via Salesforce, Google Workspace, Slack, S3, and Azure [3]
- OWASP Top 10 for Agentic Applications published December 2025 — memory and context manipulation is a top risk category [4]
Agent A's output is Agent B's instruction. Memory is the vector.
Every other memory layer trusts content by default. That is the vulnerability.
We audited the docs, repos, and changelogs of every major memory tool [5]. These protections do not exist anywhere else:
| Security Feature | mem0 | Zep | Letta | Aegis |
|---|---|---|---|---|
| Content injection detection | — | — | — | 4-stage pipeline |
| Memory integrity | — | — | — | HMAC-SHA256 |
| Agent identity binding | — | — | — | Cryptographic API key |
| Trust hierarchy | — | — | — | 4-tier OWASP model |
| Per-agent rate limiting | — | — | — | Sliding window |
| Security audit trail | — | — | — | Immutable event log |
| Sensitive data protection | — | — | — | Auto-detect + reject/redact/flag |
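To make the "sliding window" entry in the table concrete, here is a minimal per-agent limiter sketch. It is not the Aegis implementation: the `SlidingWindowLimiter` class, its parameters, and the in-memory storage are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per agent within the last `window_s` seconds.

    Hypothetical helper for illustration; not part of the aegis_memory package.
    """

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self._hits = defaultdict(deque)  # agent_id -> timestamps of accepted calls

    def allow(self, agent_id: str) -> bool:
        now = self.clock()
        hits = self._hits[agent_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keeping one deque per agent means a noisy or compromised agent exhausts only its own budget, which is the point of limiting per agent rather than per deployment.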
Aegis implements OWASP AI Agent Security recommendations natively. Six capabilities, none optional:
- 4-stage content security pipeline — input validation, sensitive data scanning, prompt injection detection, optional LLM-based injection classification. Every memory write. Not optional.
- HMAC-SHA256 integrity signing — tamper detection on store, verification on demand. You know if a memory was modified.
- OWASP 4-tier trust hierarchy — untrusted, internal, privileged, system. Agents get compromised. Aegis limits the blast radius.
- Cryptographic agent binding — API keys bound to agent identity. No more trusting a request body that says "I'm the admin agent."
- ACE loop — generation, reflection, curation. Agents that learn from their own mistakes and promote what works.
- Multi-agent coordination — scoped access control, cross-agent query, structured handoffs. Memory sharing with boundaries.
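As a rough illustration of the integrity-signing idea in the list above, the sketch below signs a memory record with HMAC-SHA256 using only the Python standard library. The record shape, the canonical-JSON encoding, and the function names `sign_memory` and `verify_memory` are assumptions; the source does not describe how Aegis serializes or stores signatures.

```python
import hashlib
import hmac
import json


def sign_memory(secret: bytes, memory: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the memory.

    Hypothetical helper: sorting keys makes the tag independent of dict order.
    """
    payload = json.dumps(memory, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_memory(secret: bytes, memory: dict, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_memory(secret, memory), tag)
```

Signing at store time and verifying on demand means any change to the stored fields produces a different tag, so a modified memory fails verification unless the attacker also holds the secret.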
```bash
git clone https://github.com/quantifylabs/aegis-memory.git
cd aegis-memory
export OPENAI_API_KEY=sk-...
docker compose up -d
curl http://localhost:8000/health
# {"status": "healthy"}
```

```bash
pip install aegis-memory
```

```python
from aegis_memory import AegisClient

client = AegisClient(api_key="dev-key", base_url="http://localhost:8000")

# Planner agent stores task breakdown
client.add(
    content="Task: Build login. Steps: 1) Form, 2) Validation, 3) API",
    agent_id="planner",
    scope="agent-shared",
    shared_with_agents=["executor"],
)

# Executor queries planner's memories
memories = client.query_cross_agent(
    query="current task",
    requesting_agent_id="executor",
    target_agent_ids=["planner"],
)
print(memories[0].content)
```

Aegis Memory is the first context layer with a complete ACE loop — the Generation → Reflection → Curation cycle from Stanford/SambaNova's research, engineered for production.
Your agent made the same mistake 5 times? ACE loop remembers the fix forever. Stale memories polluting retrieval? Curation auto-cleans your playbook.
```text
Generation          Execution          Reflection        Curation
|                   |                  |                 |
Query playbook  ->  Run task with  ->  Auto-vote on  ->  Promote effective
for strategies      tracked memories   used memories     Flag ineffective
                                       Auto-reflect      Consolidate duplicates
                                       on failures
```
```python
from aegis_memory import AegisClient

client = AegisClient(api_key="your-key")

# 1. GENERATION: Query agent-specific playbook
playbook = client.get_playbook_for_agent(
    "executor",
    query="API pagination task",
    task_type="api-integration",
)
memory_ids = [e.id for e in playbook.entries]

# 2. EXECUTION: Track which memories the agent uses
run = client.start_run(
    "task-42", "executor",
    task_type="api-integration",
    memory_ids_used=memory_ids,
)

# ... agent does its work ...

# 3. REFLECTION: Complete with outcome (auto-feedback!)
client.complete_run("task-42", success=True, evaluation={"score": 0.95})
# -> Auto-votes 'helpful' on every memory used
# -> On failure: auto-votes 'harmful' AND creates a reflection memory

# 4. CURATION: Periodically clean up
curation = client.curate(namespace="production")
# -> Promotes high-effectiveness entries
# -> Flags low-effectiveness for deprecation
# -> Identifies duplicate entries to consolidate
```

| Feature | ACE-Inspired | Aegis ACE-Engineered |
|---|---|---|
| Voting | Manual vote endpoints | Auto-voting tied to run outcomes |
| Reflection | Manual reflection creation | Auto-reflection on failure with error context |
| Curation | Not implemented | Full curation cycle with promote/flag/consolidate |
| Run tracking | Not tracked | First-class ace_runs table linking memories to outcomes |
| Agent-specific playbook | Generic query | Filtered by agent_id + task_type |
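To show how run-linked votes could feed a promote/flag/consolidate pass, here is a small local sketch. The `Entry` shape, the scoring formula, the thresholds, and the exact-match duplicate check are all assumptions for illustration; the source does not specify how Aegis computes effectiveness or detects duplicates.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical playbook entry with vote counters."""
    id: str
    content: str
    helpful: int = 0
    harmful: int = 0


def effectiveness(e: Entry) -> float:
    """Score in (-1, 1); the +1 in the denominator damps entries with few votes."""
    return (e.helpful - e.harmful) / (e.helpful + e.harmful + 1)


def curate(entries, promote_at=0.5, flag_at=-0.3):
    """Group entry ids into promoted, flagged, and duplicate pairs."""
    promoted = [e.id for e in entries if effectiveness(e) >= promote_at]
    flagged = [e.id for e in entries if effectiveness(e) <= flag_at]
    seen, duplicates = {}, []
    for e in entries:
        key = " ".join(e.content.lower().split())  # naive text normalization
        if key in seen:
            duplicates.append((seen[key], e.id))
        else:
            seen[key] = e.id
    return {"promoted": promoted, "flagged": flagged, "duplicates": duplicates}
```

A production curation pass would more plausibly compare embeddings than normalized strings; the string match here only keeps the sketch dependency-free.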
Different tools solve different problems. This comparison stays focused on capabilities clearly documented in public repos and docs [5].
| If you need... | Usually pick | Reason |
|---|---|---|
| Personalized assistant memory (user/profile facts) | mem0 | Designed around persistent user/agent memory for assistants |
| Personal/team "second brain" with ingestion | Supermemory | Knowledge-base style memory with connectors |
| Graph-native episodic memory over agent events | Graphiti / Zep | Focused on temporal + knowledge graph memory models |
| Stateful agent runtime + built-in memory blocks | Letta | Agent framework centered on durable state |
| Secure context engineering with built-in security, trust, and compliance | Aegis Memory | Only context layer with content security, integrity verification, and trust hierarchy |
| Multi-agent coordination with access boundaries | Aegis Memory | Scope-aware ACLs + cross-agent query APIs |
| Self-improving context loops (what worked / failed) | Aegis Memory | ACE patterns: vote, reflection, playbook |
| Capability | mem0 | Graphiti / Zep | Letta | Aegis Memory |
|---|---|---|---|---|
| Primary focus | Assistant personalization | Graph-based episodic memory | Stateful agents | Secure context engineering |
| Open source | Yes | Yes | Yes | Yes |
| Self-host posture | Available | Available | Available | Self-host-first |
| Content security pipeline | — | — | — | 4-stage (validation, PII, injection, LLM) |
| Memory integrity | — | — | — | HMAC-SHA256 |
| Trust hierarchy | — | — | — | 4-tier OWASP model |
| Multi-agent ACL/scopes | — | — | — | Yes |
| Cross-agent query | — | — | — | Yes |
| Handoff baton | — | — | — | Yes |
| ACE loop | — | — | — | Yes |
| Typed memory model | — | — | — | Yes |
| Temporal decay | — | Partial | — | Yes |
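The four trust tiers named earlier (untrusted, internal, privileged, system) can be pictured as an ordered enum with a minimum tier per operation. This is only a sketch: the operation names and their required tiers below are invented for illustration and are not taken from the Aegis docs.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    UNTRUSTED = 0
    INTERNAL = 1
    PRIVILEGED = 2
    SYSTEM = 3


# Hypothetical minimum tier per operation; these names are assumptions.
REQUIRED_TIER = {
    "read_shared": TrustTier.UNTRUSTED,
    "write_memory": TrustTier.INTERNAL,
    "curate": TrustTier.PRIVILEGED,
    "change_policy": TrustTier.SYSTEM,
}


def is_allowed(agent_tier: TrustTier, operation: str) -> bool:
    """Deny unknown operations; otherwise require the agent's tier to meet the minimum."""
    required = REQUIRED_TIER.get(operation)
    return required is not None and agent_tier >= required
```

Denying unknown operations by default keeps the blast radius small when a new operation is added but not yet classified.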
Pick Aegis Memory when most of these are true:
- You need content security — injection detection, integrity verification, sensitive data protection.
- You need multiple agents to share memory safely with explicit ACL/scopes.
- You need handoffs where one agent passes a reliable state bundle to another.
- You want ACE patterns (vote/reflection/playbook) to continuously improve memory quality.
- You prefer a self-host posture with operational control over storage and deployment.
- You need temporal decay so stale memories don't pollute retrieval over time.
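For the temporal decay point, one common approach is to scale each candidate's similarity by an exponential half-life before ranking. The sketch below assumes that approach; the 30-day half-life, the function names, and the tuple layout are illustrative choices, not documented Aegis behavior.

```python
import math


def decayed_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Halve the similarity score for every `half_life_days` of age."""
    return similarity * math.pow(0.5, age_days / half_life_days)


def rank(candidates, half_life_days: float = 30.0):
    """Sort (memory_id, similarity, age_days) tuples by decayed score, best first."""
    return sorted(
        candidates,
        key=lambda c: decayed_score(c[1], c[2], half_life_days),
        reverse=True,
    )
```

With these assumed numbers, a 90-day-old memory with similarity 0.9 scores about 0.11 and ranks below a 3-day-old memory with similarity 0.7, which is the "stale memories stop polluting retrieval" effect in miniature.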
Benchmarked on 8 vCPU / 7.6 GB RAM (Intel 13th Gen), 1000 memories, Docker Compose (PostgreSQL 16 + pgvector), concurrency=10. Queries include OpenAI embedding latency. Reproduce with `cd benchmarks && bash run_benchmark.sh`.
| Operation | p50 | p95 | p99 | Throughput |
|---|---|---|---|---|
| Sequential add | 72ms | 89ms | 97ms | 14.1 ops/s |
| Batch add (5x20) | 216ms | 292ms | 292ms | 4.6 ops/s |
| Concurrent add (c=10) | 100ms | 193ms | 511ms | 85.1 ops/s |
| Sequential query | 282ms | 411ms | 1502ms | 3.8 ops/s |
| Concurrent query (c=10) | 413ms | 1832ms | 1897ms | 18.6 ops/s |
| Cross-agent query | 304ms | 380ms | 380ms | 3.3 ops/s |
| Vote | 64ms | 176ms | 176ms | 14.1 ops/s |
| Deduplication | 75ms | 112ms | 112ms | 13.6 ops/s |
Query tail latency (p95/p99) is dominated by the external OpenAI embedding call, not Aegis or PostgreSQL. Single-item write and vote operations that skip embedding stay at or under 100 ms at p50 (sequential add 72 ms, concurrent add 100 ms, vote 64 ms).
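When reading the table, it helps to recall how percentiles are derived from raw samples. The sketch below uses the nearest-rank method, which is an assumption: the source does not say which estimator `run_benchmark.sh` uses, and the sample values in the usage lines are made up.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies for five calls, only to show the mechanics.
samples = [216, 200, 292, 210, 230]
summary = {p: percentile(samples, p) for p in (50, 95, 99)}
```

With only five samples, p95 and p99 both resolve to the slowest call under this method. That is one possible reason some rows above report identical p95 and p99 values, though it is not confirmed by the source.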
```bash
# Docker Compose
docker compose up -d
```

```bash
# Kubernetes
kubectl apply -f k8s/
```

| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://...` | PostgreSQL connection |
| `OPENAI_API_KEY` | — | For embeddings |
| `AEGIS_API_KEY` | `dev-key` | API authentication |
| `CONTENT_POLICY_INJECTION` | `flag` | reject / redact / flag / allow |
| `CONTENT_POLICY_SECRETS` | `reject` | reject / redact / flag / allow |
| `ENABLE_LLM_INJECTION_CLASSIFIER` | `false` | Enable Stage 4 LLM classifier |
| `INJECTION_CLASSIFIER_MODEL` | `gpt-4o-mini` | Model for injection classification |
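To illustrate what the four policy values (reject / redact / flag / allow) could mean for a write, here is a minimal sketch. The regex detector, the `apply_policy` function, and the result dictionary are hypothetical; only the variable name `CONTENT_POLICY_SECRETS`, its default of `reject`, and the four values come from the table above.

```python
import os
import re

# Hypothetical detector: strings shaped like "sk-" followed by 8+ alphanumerics.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")
POLICIES = {"reject", "redact", "flag", "allow"}


def policy_from_env() -> str:
    """Read the secrets policy, falling back to the documented default."""
    return os.environ.get("CONTENT_POLICY_SECRETS", "reject")


def apply_policy(content: str, policy: str) -> dict:
    """Decide what happens to a write whose content may contain a secret."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    found = SECRET_RE.search(content) is not None
    if not found or policy == "allow":
        return {"stored": True, "content": content, "flagged": False}
    if policy == "reject":
        return {"stored": False, "content": None, "flagged": True}
    if policy == "redact":
        redacted = SECRET_RE.sub("[REDACTED]", content)
        return {"stored": True, "content": redacted, "flagged": True}
    # policy == "flag": store unchanged but mark for review
    return {"stored": True, "content": content, "flagged": True}
```

A call such as `apply_policy(text, policy_from_env())` would then follow whatever the deployment has configured.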
docs.aegismemory.com — Full documentation
- Quickstart — Get running in 5 minutes
- Security Guide — Content security, integrity, trust hierarchy
- ACE Patterns — Self-improving agent patterns
- Smart Memory — Zero-config memory extraction
- Integrations — CrewAI, LangChain guides
- CLI Reference — Command-line tools
We welcome contributions! See CONTRIBUTING.md for guidelines.
```bash
# Run tests
pytest tests/ -v

# Run linting
ruff check server/
```

Apache 2.0 — Use it however you want. See LICENSE.
Built by engineers who read the OWASP reports and acted on them.
Footnotes
1. EchoLeak: Zero-click exfiltration from M365 Copilot. arxiv.org/html/2509.10540v1
2. Multi-agent exfiltration study (COLM 2025). openreview.net/pdf?id=DAozI4etUp
3. CVE-2025-32711 zero-click AI vulnerability analysis. socprime.com/blog/cve-2025-32711-zero-click-ai-vulnerability/
4. OWASP Top 10 for Agentic Applications (2026). genai.owasp.org
5. Security comparison based on public documentation and open-source repositories as of February 2026. Sources: mem0 docs | Zep docs | Letta repo | Aegis docs