LLM agents rely on effective model context to obtain task-relevant information for decision-making. Most existing context engineering approaches build context from past experience and reuse it through retrieval. However, the retrieved context may not be directly applicable to unseen future tasks.
CLEAR is a generative context augmentation framework that addresses this limitation. Instead of retrieving knowledge from the past, CLEAR trains a model to generate task-specific context that is better tailored to the current task. The pipeline works as follows:
- 🔍 A reflection agent performs contrastive analysis over past execution trajectories and summarizes useful context for each observed task.
- 📝 These summaries are used as supervised fine-tuning (SFT) data to train a Context Augmentation Model (CAM).
- 🎯 CAM is further optimized using reinforcement learning (RL), where the reward signal is obtained by running the task execution agent.
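The RL reward described in the last step can be sketched as follows. This is a minimal illustration, not the repository's reward function: `TaskResult`, `context_reward`, `run_agent`, and the binary 0/1 scoring are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    """Outcome of one task-execution rollout (hypothetical structure)."""
    success: bool


def context_reward(
    task_description: str,
    generated_context: str,
    run_agent: Callable[[str], TaskResult],
) -> float:
    """Score a generated context by whether the execution agent succeeds with it.

    Assumption: the reward is binary. The actual reward may be graded.
    """
    # The generated context is placed in front of the task before execution.
    prompt = f"{generated_context}\n\nTask: {task_description}"
    result = run_agent(prompt)
    return 1.0 if result.success else 0.0


def fake_agent(prompt: str) -> TaskResult:
    """Stand-in agent that only succeeds when a specific hint is present."""
    return TaskResult(success="check the playlist API" in prompt)
```

The key design point is that the context generator is never scored on its text directly; it is scored only through the downstream agent's outcome.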
Prerequisites:
- Python >= 3.12
- LLaMA-Factory for SFT
- veRL for RL training
- AppWorld benchmark data
- AWS credentials configured for S3, Bedrock and AgentCore access
uv venv --python 3.12
source .venv/bin/activate
uv pip install -e .
Set root path:
export CLEAR_ROOT=path/to/repo_root_dir
Use the reflection agent to analyze trajectories via contrastive learning and generate task-specific guidance. The agent compares multiple rollouts for each task to produce guidance that helps agents solve the task.
Trajectories are stored at training_data/appworld/replay/, organized by run (e.g., appworld_train_run0/, appworld_train_run1/).
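One plausible way to prepare rollouts for per-task comparison is to group them by task across runs and keep the tasks that have both a successful and a failed attempt. The sketch below assumes each rollout is a dict with hypothetical `task_id`, `run`, and `success` keys; the actual trajectory schema and the comparison performed by the reflection agent may differ.

```python
from collections import defaultdict


def build_contrast_sets(rollouts):
    """Group rollouts by task and split them into successes and failures.

    Each rollout is assumed (hypothetically) to be a dict with the keys
    'task_id', 'run', and 'success'.
    """
    by_task = defaultdict(lambda: {"success": [], "failure": []})
    for rollout in rollouts:
        outcome = "success" if rollout["success"] else "failure"
        by_task[rollout["task_id"]][outcome].append(rollout["run"])
    return dict(by_task)


def contrastable_tasks(by_task):
    """Return task ids that have at least one success and one failure."""
    return sorted(
        task_id
        for task_id, groups in by_task.items()
        if groups["success"] and groups["failure"]
    )
```

Tasks where every rollout has the same outcome offer no success/failure contrast under this reading, although they may still be useful for other kinds of summarization.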
cd $CLEAR_ROOT/src
python clear/main.py
This produces appworld_sft_data.jsonl, which contains (task_id, task_description, guidance) tuples for SFT training. For convenience, the generated data is already provided in training_data/appworld/sft, so this step can be skipped.
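Before training, it can help to sanity-check the generated file. The sketch below assumes that each line of appworld_sft_data.jsonl is a JSON object whose keys match the tuple names above (`task_id`, `task_description`, `guidance`); the helper `load_sft_records` is illustrative and not part of the repository.

```python
import json
from pathlib import Path

# Assumed key names, taken from the tuple description above.
REQUIRED_FIELDS = ("task_id", "task_description", "guidance")


def load_sft_records(path):
    """Load a JSONL file and check that every record has the expected fields."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in REQUIRED_FIELDS if key not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            records.append(record)
    return records
```

For example, `len(load_sft_records("appworld_sft_data.jsonl"))` gives the number of SFT examples, and a `ValueError` points at the first malformed line.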
Use the generated SFT data to fine-tune a base model (e.g., Qwen3-32B) with LLaMA-Factory.
- Register the dataset from Step 1 in LLaMA-Factory's dataset_info.json.
- Run SFT using the provided config:
llamafactory-cli train $CLEAR_ROOT/scripts/llamafactory_sft.yaml
The SFT checkpoint will be saved to saves/appworld-cam-qwen3-32b/full/sft.
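Registering the dataset amounts to adding one entry to LLaMA-Factory's dataset_info.json. The sketch below writes such an entry using the alpaca-style `file_name` and `columns` keys. The dataset name `appworld_cam_sft` and the column mapping (task description as prompt, guidance as response) are assumptions; check them against the dataset name expected by llamafactory_sft.yaml before use.

```python
import json
from pathlib import Path


def register_dataset(dataset_info_path, name, file_name):
    """Add or update one dataset entry in a dataset_info.json file."""
    path = Path(dataset_info_path)
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[name] = {
        "file_name": file_name,
        # Assumed mapping from SFT record keys to LLaMA-Factory columns.
        "columns": {
            "prompt": "task_description",
            "response": "guidance",
        },
    }
    path.write_text(json.dumps(info, indent=2) + "\n", encoding="utf-8")
    return info[name]
```

A call such as `register_dataset("data/dataset_info.json", "appworld_cam_sft", "appworld_sft_data.jsonl")` leaves existing entries in the file untouched.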
Further improve the SFT model using Group Relative Policy Optimization (GRPO) via veRL.
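For intuition, GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than using a learned value function. The sketch below shows that group-relative normalization in plain Python. It is illustrative only and is not veRL's implementation; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With rewards `[1.0, 0.0, 0.0, 1.0]`, the successful rollouts get advantages close to +1 and the failed ones close to -1. When every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal.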
- Update scripts/verl_grpo.sh with your paths:
  - TRAIN_DATASET: path to the RL training data (parquet format)
  - actor_rollout_ref.model.path: path to the SFT model from Step 2
  - custom_reward_function.path: path to your reward function
- Run RL training:
bash $CLEAR_ROOT/scripts/verl_grpo.sh
Clone agentcore-rl-toolkit package and copy strands_appworld_agent into it:
cd $CLEAR_ROOT
git clone https://github.com/awslabs/agentcore-rl-toolkit.git
cp -rT strands_appworld_agent agentcore-rl-toolkit/examples/strands_appworld_agent
Follow the instructions in strands_appworld_agent/README.md to:
- 🖥️ Host the trained CAM via vLLM
- 🔧 Set up the Strands AppWorld agent
- 🐳 Run locally, in Docker, or deploy to Amazon Bedrock AgentCore
- 📈 Evaluate on the AppWorld benchmark
.
├── src/
│ ├── clear/ # 🪞 CLEAR: contrastive guidance generation
│ │ ├── main.py # Entry point for SFT data generation
│ │ ├── tools.py # Restricted shell tool for trajectory analysis
│ │ ├── utils.py # Utility functions
│ │ └── prompts/ # System and user prompts for the CLEAR agent
│ └── ace_appworld/ # AppWorld agent with playbook support
├── scripts/
│ ├── llamafactory_sft.yaml # LLaMA-Factory SFT config
│ └── verl_grpo.sh # veRL GRPO training script
├── strands_appworld_agent/ # 🤖 Strands-based AppWorld agent for deployment
├── training_data/ # 📦 Trajectory and SFT data for training
└── pyproject.toml
This code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper.