Stop optimizing prompts. Start optimizing models.
Your agents should learn from experience, not just follow instructions.
Prefer Gen Z language? Check out README_GENZ.md for the same info but it hits different fr fr 🔥
DeltaLoop is an open-source continuous fine-tuning layer that automatically converts your AI agent's logs into training data and fine-tunes your model so it natively understands your domain.
Traditional AI agent optimization is stuck in an endless loop:
Agent fails → Check logs → Rewrite prompt → Deploy → Test → Repeat
This creates prompt bloat (1500+ tokens), requires manual labor (100+ hours), and the model never actually learns.
Turn logs you're already analyzing into training data instead of manually rewriting prompts.
With DeltaLoop:
Agent runs → Auto-collect logs → Fine-tune model → Deploy adapter → Improve
Automated improvement that compounds over time. No more endless prompt engineering.
```bash
pip install deltaloop
```

```python
from deltaloop.adapters.langchain import DeltaLoopCallback

agent = create_react_agent(
    llm=llm,
    tools=tools,
    callbacks=[DeltaLoopCallback()]  # That's it!
)

# Run normally - logs auto-save to data/raw_logs/
agent.run(task)
```

```bash
# Process logs into training data
deltaloop distill --input data/raw_logs/traces.jsonl --output train.jsonl

# Fine-tune your model (LoRA adapters, only ~17MB!)
deltaloop train --dataset train.jsonl --model mistral-7b --steps 500

# Evaluate improvement
deltaloop eval --adapter data/models/v1
```

Done! Your model is now specialized for your domain.
- Framework Agnostic - Works with LangChain, AutoGen, CrewAI, LlamaIndex, or custom agents
- Fully Automated - Logs → Training → Deployment in 3 commands
- Lightweight - LoRA adapters are only ~17MB (not full model weights)
- Open Source - Apache 2.0, no vendor lock-in
- Cost Effective - Reduce prompt costs by 80%+
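As a rough illustration of where the cost savings come from (the token counts, call volume, and per-token price below are illustrative assumptions, not measured DeltaLoop numbers): shrinking a 1,500-token system prompt to a 200-token one after fine-tuning removes most of the fixed per-call prompt spend.

```python
# Back-of-envelope prompt-cost estimate. All numbers here are
# illustrative assumptions, not benchmarks from DeltaLoop itself.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # hypothetical $/1K input tokens

def monthly_prompt_cost(prompt_tokens: int, calls_per_month: int) -> float:
    """Cost of the static prompt prefix alone, ignoring user input."""
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * calls_per_month

before = monthly_prompt_cost(prompt_tokens=1500, calls_per_month=100_000)
after = monthly_prompt_cost(prompt_tokens=200, calls_per_month=100_000)
savings = (before - after) / before
print(f"${before:.0f} -> ${after:.0f} per month ({savings:.0%} saved)")
# → $450 -> $60 per month (87% saved)
```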
```mermaid
graph LR
    A[Agent Logs] --> B[Distillation]
    B --> C[Fine-Tuning]
    C --> D[LoRA Adapter]
    D --> E[Production]
    E -.Feedback.-> A
```
- Adapters - Framework-specific log collectors (LangChain, AutoGen, etc.)
- Distillation - Convert logs into high-quality training datasets
- Training - Fine-tune with Unsloth, Transformers, or DPO
- Evaluation - Compare adapted model vs baseline
- Deployment - Load improved adapters into production
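The "~17MB" adapter figure is plausible because LoRA stores only low-rank update matrices, never full weights. A quick parameter count shows why; the hidden size, rank, and number of adapted matrices below are illustrative assumptions for a 7B-class model, not DeltaLoop defaults.

```python
# LoRA adds two small matrices A (d x r) and B (r x d) per adapted
# weight matrix, instead of touching the full d x d weights.
hidden = 4096           # hidden size of a 7B-class model (assumption)
rank = 8                # LoRA rank (assumption)
layers = 32             # transformer blocks
matrices_per_layer = 4  # e.g. q/k/v/o projections (assumption)

full_params = hidden * hidden * matrices_per_layer * layers
lora_params = 2 * hidden * rank * matrices_per_layer * layers

print(f"full: {full_params / 1e9:.2f}B params, lora: {lora_params / 1e6:.1f}M params")
# → full: 2.15B params, lora: 8.4M params
print(f"adapter size at fp16: ~{lora_params * 2 / 1e6:.0f} MB")
# → adapter size at fp16: ~17 MB
```

That is why shipping a new adapter version is closer to pushing a config file than redeploying a model.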
For programmatic workflows:
```python
from deltaloop import Pipeline, PipelineConfig

# One-shot: logs → adapter
pipeline = Pipeline(PipelineConfig(
    raw_logs="data/raw_logs/traces.jsonl",
    base_model="mistral-7b",
    output_dir="data/models/v1"
))
result = pipeline.run()
print(f"Improvement: {result.eval_summary.improvement_percent:.1f}%")
```

With the LangChain adapter, log collection is a one-line callback:

```python
from deltaloop.adapters.langchain import DeltaLoopCallback

agent = create_react_agent(callbacks=[DeltaLoopCallback()])
```

For custom or unsupported frameworks, use the generic logger:

```python
from deltaloop.adapters.generic import GenericLogger

logger = GenericLogger()

# Manually log each interaction
logger.log(
    prompt="Check order status for #12345",
    output="Order #12345 shipped on 2024-01-15",
    success=True,
    tool_calls=["check_order_status"]
)

logger.save("data/raw_logs/custom.jsonl")
```

Check out examples/customer_support_agent.py for a complete end-to-end example:
```bash
python examples/customer_support_agent.py --steps 100 --training-method sft
```

This demonstrates:
- E-commerce support scenarios
- Tool usage (order status, refunds, tickets)
- Policy adherence
- Before/after performance comparison
Current: Alpha (v0.1.0) - Production-ready core, expanding features
- ✅ Core distillation, training, and evaluation
- ✅ LangChain adapter + generic logger
- ✅ CLI with 4 commands
- ✅ Python API
- ✅ Comprehensive examples
- 🚧 Additional framework adapters (AutoGen, CrewAI)
- 🚧 Deployment automation
- 🚧 Advanced evaluation tasks
We welcome contributions! Priority areas:
- Framework Adapters - AutoGen, CrewAI, Haystack, Semantic Kernel
- Evaluation Tasks - Domain-specific benchmarks
- Examples - Real-world use cases
- Documentation - Tutorials, guides, videos
See CONTRIBUTING.md for details.
Apache 2.0 - See LICENSE for details.
Built for the open-source AI community.
Your agents should learn from experience, not just follow instructions.