QACS — Query-Aware Capability Sampler for Agentic DAG Search

Research prototype for query-conditional, capability-aware sampling of multi-agent operator DAGs. Built on top of bingreeky/MaAS (ICML'25 Oral).

🏗️ Method Overview

Large language model-based multi-agent systems have demonstrated strong capabilities in mathematical reasoning and code generation, and their effectiveness is heavily influenced by the underlying agent architecture. However, manually designing effective architectures is labor-intensive and task-dependent, making automated multi-agent architecture generation an important research direction. Existing methods typically formulate this problem as a sequential, incremental construction process, which favors local architectural decisions over global planning and provides limited query-specific structural adaptation.

To address this issue, QACS formulates multi-agent architecture search as query-conditioned Directed Acyclic Graph (DAG) generation over a learned capability space. Specifically, QACS first pretrains a capability-aware representation that summarizes each agent operator's reasoning profile in a shared latent space; conditioned on this representation, a lightweight sampler then generates a tailored DAG of agent calls for each input query, allowing the resulting pipeline to adapt to per-query reasoning demands.
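As a rough illustration of the idea above, the sketch below samples a small DAG for one query in plain Python. It is not the repository's UnifiedDAGSampler: the function name `sample_dag`, the fixed `stop_prob` and `edge_prob` values, and the toy capability vectors are all assumptions made for this example. They stand in for the learned projections and the stop and edge heads. Edges are only allowed from an earlier node to a later one, so the sampled graph is acyclic by construction.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_dag(query_vec, capability, max_nodes=4, stop_prob=0.3,
               edge_prob=0.5, rng=None):
    """Sample (nodes, edges) for one query.

    `capability` maps an operator name to a toy capability vector
    (hypothetical values, not learned ones). `stop_prob` and `edge_prob`
    are fixed stand-ins for learned stop and edge heads.
    """
    rng = rng or random.Random()
    names = list(capability)
    # Score each operator by its dot product with the query vector.
    scores = [sum(q * c for q, c in zip(query_vec, capability[n]))
              for n in names]
    probs = softmax(scores)

    nodes = []
    while len(nodes) < max_nodes:
        nodes.append(rng.choices(names, weights=probs, k=1)[0])
        if rng.random() < stop_prob:
            break

    # Only edges i -> j with i < j are considered, so no cycle can form.
    edges = [(i, j)
             for j in range(len(nodes))
             for i in range(j)
             if rng.random() < edge_prob]
    return nodes, edges


capability = {
    "Generate": [1.0, 0.0],
    "Programmer": [0.0, 1.0],
    "ScEnsemble": [0.5, 0.5],
}
nodes, edges = sample_dag([0.2, 0.9], capability, rng=random.Random(0))
print(nodes, edges)
```

Because the operator scores depend on `query_vec`, two different queries produce different sampling distributions, which is the query-conditioned behaviour the method aims for.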

📁 Repository Layout

.
├── examples/maas/
│   ├── optimize.py              # legacy MaAS entry (prompt search)
│   └── dag_optimize.py          # QACS entry — this is what the README uses
├── maas/
│   ├── ext/maas/
│   │   ├── models/
│   │   │   ├── unified_dag_sampler.py   # UnifiedDAGSampler (W_Q/W_K/need/stop/edge)
│   │   │   ├── dag.py                   # DAGPlan / DAGNode classes
│   │   │   └── utils.py                 # SentenceEncoder, helpers
│   │   ├── scripts/
│   │   │   ├── pretrain_capability.py   # Stage 1 pretraining CLI
│   │   │   ├── dag_optimizer.py         # Stage 2 REINFORCE driver
│   │   │   ├── dag_evaluator.py         # Per-dataset evaluation driver
│   │   │   └── optimized/
│   │   │       ├── GSM8K/train/dag_graph.py
│   │   │       ├── MATH/train/dag_graph.py
│   │   │       └── HumanEval/train/dag_graph.py
│   │   ├── benchmark/experiment_configs.py   # operator pool per dataset
│   │   └── data/                             # *.jsonl benchmark files
│   ├── actions/  prompts/  tools/  utils/    # inherited from MetaGPT
│   └── configs/
├── config/
│   ├── config2.example.yaml   # template — copy this
│   └── config2.yaml           # YOUR keys, gitignored
├── requirements.txt
└── README.md

🛠️ Installation

Environment

Python 3.10 or newer
A CUDA-capable GPU (tested on CUDA 11.8 with torch==2.1.0+cu118)
sentence-transformers pulls down all-MiniLM-L6-v2 on first run

Setup

git clone git@github.com:Roderick-Stinson/QACS.git
cd QACS

# option A: pip
pip install -r requirements.txt

# option B: uv (faster, self-contained venv)
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

📊 Data Preparation

QACS uses the same JSONL layout as upstream MaAS. Place each dataset under maas/ext/maas/data/:

maas/ext/maas/data/
├── gsm8k_train.jsonl
├── gsm8k_test.jsonl
├── math_test.jsonl
├── humaneval_train.jsonl
├── humaneval_test.jsonl
└── humaneval_public_test.jsonl

Each line is an object with at least these fields:

| Dataset | Fields |
| --- | --- |
| GSM8K | `{ "question": str, "answer": str, "cot": str, "id": str }` |
| MATH | `{ "problem": str, "solution": str, "level": str, ... }` |
| HumanEval | `{ "prompt": str, "canonical_solution": str, "test": str, "task_id": str }` |
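A quick way to catch malformed records before training is to check each line against the fields listed above. The helper below is a minimal sketch and not part of the repository: `missing_fields` and `REQUIRED_FIELDS` are names invented for this example, and only the fields named in the table are checked.

```python
import json

# Required keys per dataset, taken from the table above.
REQUIRED_FIELDS = {
    "GSM8K": {"question", "answer", "cot", "id"},
    "MATH": {"problem", "solution", "level"},
    "HumanEval": {"prompt", "canonical_solution", "test", "task_id"},
}


def missing_fields(lines, dataset):
    """Return [(line_number, sorted missing keys)] for incomplete records."""
    required = REQUIRED_FIELDS[dataset]
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        missing = required - set(record)
        if missing:
            problems.append((lineno, sorted(missing)))
    return problems


sample = [
    '{"question": "q", "answer": "a", "cot": "c", "id": "1"}',
    '{"question": "q"}',
]
print(missing_fields(sample, "GSM8K"))
```

The function accepts any iterable of strings, so an open file handle for one of the `.jsonl` files can be passed in place of `sample`.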

🔑 Configuration

Copy the example and fill in your credentials (this file is gitignored so keys stay local):

cp config/config2.example.yaml config/config2.yaml
$EDITOR config/config2.yaml

Minimum required:

llm:
  api_type: "openai"
  model: "gpt-4o-mini"
  base_url: ""          # or your gateway URL
  api_key: "sk-..."     # your key

models:
  gpt-4o-mini:
    api_type: "openai"
    model: "gpt-4o-mini"
    base_url: ""
    api_key: "sk-..."

The values passed to --opt_model_name and --exec_model_name in the commands below are looked up by name in the models: block, so each one must match a key defined there.
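The lookup can be pictured as a dictionary access on the parsed config. The sketch below is illustrative only: `resolve_model` is a hypothetical helper, not the repository's loader, and it works on an already-parsed dict (for example the result of PyYAML's `yaml.safe_load`) rather than reading config/config2.yaml itself.

```python
def resolve_model(config, name):
    """Return the settings stored under models[name], or raise KeyError."""
    models = config.get("models") or {}
    if name not in models:
        known = ", ".join(sorted(models)) or "<none>"
        raise KeyError(
            f"model {name!r} not found under 'models:' (known: {known})"
        )
    return models[name]


# A dict mirroring the minimum config shown above (keys omitted on purpose).
config = {
    "llm": {"api_type": "openai", "model": "gpt-4o-mini"},
    "models": {
        "gpt-4o-mini": {"api_type": "openai", "model": "gpt-4o-mini"},
    },
}
settings = resolve_model(config, "gpt-4o-mini")
print(settings["model"])
```

Failing early with the list of known names makes a mistyped model name easier to spot than an error raised later during an API call.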

🏃 Quick Start

Stage 1 — Capability pretraining (one-off)

python -m maas.ext.maas.scripts.pretrain_capability \
    --k 8 \
    --epochs-stage1 50 \
    --epochs-stage2 20 \
    --output pretrained_sampler.pt

This writes pretrained_sampler.pt in the project root (containing W_Q / W_K / need_mlp / stop_net / W_edge). You only need to do this once; all three benchmarks reuse the same file.

Stage 2 — REINFORCE fine-tune (per dataset)

python -m examples.maas.dag_optimize \
    --dataset GSM8K --round 1 --sample 4 \
    --exec_model_name gpt-4o-mini \
    --lr 0.01 --max_nodes 8

Checkpoints, CSV results and per-query traces are written under maas/ext/maas/scripts/optimized/GSM8K/train/round_1/.
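For readers unfamiliar with REINFORCE, the following toy example shows the update rule on a single categorical choice. It is a sketch under simplifying assumptions, not the code in dag_optimizer.py: the policy is a bare list of logits, the reward function is made up, and `reinforce_step` is a name chosen for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.01, baseline=0.0, rng=None):
    """One REINFORCE update; returns (new_logits, action, reward)."""
    rng = rng or random.Random()
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    reward = reward_fn(action)
    advantage = reward - baseline
    # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
    new_logits = [
        x + lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, x in enumerate(logits)
    ]
    return new_logits, action, reward


def toy_reward(action):
    """Made-up reward: only action 2 is 'correct'."""
    return 1.0 if action == 2 else 0.0


logits = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _step in range(300):
    logits, _action, _reward = reinforce_step(logits, toy_reward, lr=0.5, rng=rng)
print(softmax(logits))
```

With this reward, only samples of action 2 change the logits, and each such update raises its probability, so the policy drifts toward the rewarded choice.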

Evaluation on the held-out split

python -m examples.maas.dag_optimize \
    --dataset GSM8K --round 1 --sample 4 \
    --exec_model_name gpt-4o-mini --is_test

Test-time traces land under .../optimized/GSM8K/test/round_1/traces/.

📈 Results

TBD. The Stage-2 training pipeline is currently under diagnostic review. Early rounds show signs of a query-invariant routing prior: the capability matrix C differs across operators but is nearly constant across queries, and stop_net tends to terminate the DAG after a single node. We are withholding numbers until the routing is validated, so that we do not publish misleading scores dominated by the fallback Programmer / Test verification operators.
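One way to quantify the symptom described above is to compare how much C varies across queries with how much it varies across operators. The sketch below assumes C has already been extracted as a nested list with one row per query and one column per operator. The function name `routing_variance` and the example values are hypothetical, and how C is read out of the sampler is not shown.

```python
from statistics import mean, pvariance


def routing_variance(C):
    """Summarise variation in C, where C[q][k] scores operator k on query q.

    Returns (across_queries, across_operators):
      across_queries   - variance over queries, averaged over operators
      across_operators - variance over operators of the per-operator mean
    """
    num_ops = len(C[0])
    columns = [[row[k] for row in C] for k in range(num_ops)]
    across_queries = mean([pvariance(col) for col in columns])
    across_operators = pvariance([mean(col) for col in columns])
    return across_queries, across_operators


# Made-up matrix showing the failure mode: identical rows for every query.
C = [
    [0.9, 0.1, 0.5],
    [0.9, 0.1, 0.5],
    [0.9, 0.1, 0.5],
]
across_queries, across_operators = routing_variance(C)
print(across_queries, across_operators)
```

A value of `across_queries` close to zero alongside a clearly positive `across_operators` matches the query-invariant pattern: operators are told apart, but every query is routed the same way.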

🔬 Operator Pool

Operators are defined per task type in maas/ext/maas/benchmark/experiment_configs.py:

| Task type | Operators available to the sampler |
| --- | --- |
| Math (GSM8K, MATH) | `Generate`, `GenerateCoT`, `MultiGenerateCoT`, `ScEnsemble`, `Programmer`, `SelfRefine` |
| Code (HumanEval) | `Generate`, `GenerateCoT`, `MultiGenerateCoT`, `ScEnsemble`, `Test`, `SelfRefine` |

EarlyStop is defined in the pool but filtered out by DAGOptimizer — stopping is now handled by stop_net instead.
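The filtering step amounts to dropping one name from the configured pool before sampling. The snippet below is a minimal sketch of that idea. `sampler_pool` and the `MATH_POOL` constant are hypothetical names for this example and do not reproduce the code in DAGOptimizer or experiment_configs.py.

```python
# Math pool as listed in the table above, plus EarlyStop as defined in the pool.
MATH_POOL = [
    "Generate",
    "GenerateCoT",
    "MultiGenerateCoT",
    "ScEnsemble",
    "Programmer",
    "SelfRefine",
    "EarlyStop",
]


def sampler_pool(operators, excluded=("EarlyStop",)):
    """Return the operators the sampler may pick, preserving their order."""
    return [op for op in operators if op not in excluded]


print(sampler_pool(MATH_POOL))
```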

🙏 Acknowledgements

QACS is a research fork of bingreeky/MaAS (ICML'25 Oral); the dataset pipelines, operator implementations, benchmark wrappers and most of the execution infrastructure are inherited directly from it. The capability sampler (UnifiedDAGSampler), the two-stage training (pretrain_capability.py, dag_optimizer.py) and the per-dataset DAGWorkflow wrappers are the contributions of this branch.

Like the upstream, QACS also uses prompts and operator designs adapted from ADAS, AgentSquare and AFLOW.
