🪞 CLEAR: Contrastive Learning of Experience via Agentic Reflection

LLM agents rely on effective model context to obtain task-relevant information for decision-making. Many existing context engineering approaches primarily rely on context generated from past experience and retrieval mechanisms that reuse it. However, the retrieved context may not be directly applicable to unseen future tasks.

CLEAR is a generative context augmentation framework that addresses this limitation. Instead of retrieving knowledge from the past, CLEAR trains a model to generate task-specific context that is better tailored to the current task. The pipeline works as follows:

🔍 A reflection agent performs contrastive analysis over past execution trajectories and summarizes useful context for each observed task.
📝 These summaries are used as supervised fine-tuning (SFT) data to train a Context Augmentation Model (CAM).
🎯 CAM is further optimized using reinforcement learning (RL), where the reward signal is obtained by running the task execution agent.

⚙️ Prerequisites

Python >= 3.12
LLaMA-Factory for SFT
veRL for RL training
AppWorld benchmark data
AWS credentials configured for S3, Bedrock and AgentCore access

🚀 Installation

uv venv --python 3.12
source .venv/bin/activate
uv pip install -e .

Set root path:

export CLEAR_ROOT=path/to/repo_root_dir

🔄 Pipeline

Step 1️⃣ Generate SFT Training Data

Use the reflection agent to analyze trajectories via contrastive learning and generate task-specific guidance. The agent compares multiple rollouts for each task to produce guidance that helps agents solve the task.

Trajectories are stored at training_data/appworld/replay/, organized by run (e.g., appworld_train_run0/, appworld_train_run1/).

cd $CLEAR_ROOT/src

python clear/main.py

This produces appworld_sft_data.jsonl containing (task_id, task_description, guidance) tuples for SFT training. For completeness, we already generated those data in training_data/appworld/sft if you don't want to reproduce it.

Step 2️⃣ Supervised Fine-Tuning (SFT) with LLaMA-Factory

Use the generated SFT data to fine-tune a base model (e.g., Qwen3-32B) with LLaMA-Factory.

Register the dataset from Step 1 in LLaMA-Factory's dataset_info.json.
Run SFT using the provided config:

llamafactory-cli train $CLEAR_ROOT/scripts/llamafactory_sft.yaml

The SFT checkpoint will be saved to saves/appworld-cam-qwen3-32b/full/sft.

Step 3️⃣ Reinforcement Learning with GRPO

Further improve the SFT model using Group Relative Policy Optimization (GRPO) via veRL.

Update scripts/verl_grpo.sh with your paths:
- TRAIN_DATASET: path to the RL training data (parquet format)
- actor_rollout_ref.model.path: path to the SFT model from Step 2
- custom_reward_function.path: path to your reward function
Run RL training:

bash $CLEAR_ROOT/scripts/verl_grpo.sh

Step 4️⃣ Deploy and Evaluate the Agent

Clone agentcore-rl-toolkit package and copy strands_appworld_agent into it:

cd $CLEAR_ROOT
git clone https://github.com/awslabs/agentcore-rl-toolkit.git
cp -rT strands_appworld_agent agentcore-rl-toolkit/examples/strands_appworld_agent

Follow the instructions in strands_appworld_agent/README.md to:

🖥️ Host the trained CAM via vLLM
🔧 Set up the Strands AppWorld agent
🐳 Run locally, in Docker, or deploy to Amazon Bedrock AgentCore
📈 Evaluate on the AppWorld benchmark

WebShop Agent Coming Soon

📁 Project Structure

.
├── src/
│   ├── clear/                      # 🪞 CLEAR: contrastive guidance generation
│   │   ├── main.py                 # Entry point for SFT data generation
│   │   ├── tools.py                # Restricted shell tool for trajectory analysis
│   │   ├── utils.py                # Utility functions
│   │   └── prompts/                # System and user prompts for the CLEAR agent
│   └── ace_appworld/               # AppWorld agent with playbook support
├── scripts/
│   ├── llamafactory_sft.yaml       # LLaMA-Factory SFT config
│   └── verl_grpo.sh                # veRL GRPO training script
├── strands_appworld_agent/         # 🤖 Strands-based AppWorld agent for deployment
├── training_data/                  # 📦 Trajectory and SFT data for training
└── pyproject.toml

📓 Note

This code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
scripts		scripts
src		src
strands_appworld_agent		strands_appworld_agent
training_data		training_data
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🪞 CLEAR: Contrastive Learning of Experience via Agentic Reflection

⚙️ Prerequisites

🚀 Installation

🔄 Pipeline

Step 1️⃣ Generate SFT Training Data

Step 2️⃣ Supervised Fine-Tuning (SFT) with LLaMA-Factory

Step 3️⃣ Reinforcement Learning with GRPO

Step 4️⃣ Deploy and Evaluate the Agent

WebShop Agent Coming Soon

📁 Project Structure

📓 Note

📄 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🪞 CLEAR: Contrastive Learning of Experience via Agentic Reflection

⚙️ Prerequisites

🚀 Installation

🔄 Pipeline

Step 1️⃣ Generate SFT Training Data

Step 2️⃣ Supervised Fine-Tuning (SFT) with LLaMA-Factory

Step 3️⃣ Reinforcement Learning with GRPO

Step 4️⃣ Deploy and Evaluate the Agent

WebShop Agent Coming Soon

📁 Project Structure

📓 Note

📄 Citation

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages