zkHydra

A unified framework for running zero-knowledge circuit security analysis tools. zkHydra orchestrates multiple analysis tools (circomspect, circom_civer, Picus, EcneProject, zkFuzz) to detect vulnerabilities in Circom circuits.

Quick Start

Using Docker (Recommended)

# Pull the image
docker pull ghcr.io/zksecurity/zkhydra:latest

docker-compose run --rm zkhydra uv run python -m zkhydra.main analyze \
  --input examples/test_bug_2/circuits/circuit.circom \
  --tools circomspect,circom_civer,picus,zkfuzz

# or
docker-compose run --rm zkhydra /bin/bash

Analyze Your First Circuit

# Inside the container, analyze the example circuit
uv run python -m zkhydra.main analyze \
  --input examples/test_bug/circuits/circuit.circom \
  --tools circomspect

Mount Your Own Circuits

Edit docker-compose.yml to mount your circuit directory:

volumes:
  - ./output:/zkhydra/output
  - ./examples:/zkhydra/examples
  - ./my-circuits:/zkhydra/my-circuits  # Add this line

Then analyze:

docker-compose run --rm zkhydra uv run python -m zkhydra.main analyze \
  --input my-circuits/circuit.circom \
  --tools circomspect,circom_civer,picus

Running on zkbugs Dataset

The zkbugs dataset contains real-world Circom vulnerabilities. zkhydra does NOT vendor it as a submodule — clone it yourself outside the zkhydra tree and point --dataset at it. Each bug's entrypoint, input JSON, ptau, codebase path, and -l link flags are resolved from the runner contract (scripts/print_bug_vars.sh inside the zkbugs repo).

End-to-end workflow

The full flow is: clone → download sources → run tools → triage Undecided → print results. Commands below assume zkhydra is your cwd and zkbugs is cloned alongside at ../zkbugs.

Step 1 — Clone zkbugs and populate codebases

git clone https://github.com/zksecurity/zkbugs.git ../zkbugs

# Required for --zkbugs-mode original and for any direct-mode bug whose
# wrapper pulls files from the project codebase via -l. Run once; rerun
# with --force to refresh.
(cd ../zkbugs && ./scripts/download_sources.sh)

A small number of bugs reference private project codebases that download_sources.sh cannot pull. Those bugs will stay skipped in every run with reason "codebase not available locally (run scripts/download_sources.sh, or source is private)" — expected behavior; nothing to fix on the zkhydra side.

Step 2 — Run zkhydra on the dataset

Pick one of the three patterns below. All of them write per-bug outputs under <output>/<bug_name>/ plus a dataset-level summary.json.

# Docker (recommended): mount both your zkhydra source and the zkbugs clone.
docker-compose run --rm \
  -v $(pwd)/zkhydra:/zkhydra/zkhydra \
  -v $(pwd)/../zkbugs:/zkhydra/zkbugs \
  zkhydra uv run python -m zkhydra.main zkbugs \
    --dataset zkbugs/dataset/circom \
    --zkbugs-mode direct \
    --tools all \
    --jobs 4 \
    --timeout 600 \
    --log-file \
    --output output/zkbugs-run

# Local (no Docker, tools installed on the host):
uv run python -m zkhydra.main zkbugs \
  --dataset ../zkbugs/dataset/circom \
  --zkbugs-mode direct \
  --tools all \
  --jobs 4 \
  --timeout 600 \
  --output output/zkbugs-run

# Quick smoke — 6 random bugs, reproducible via --random-seed:
uv run python -m zkhydra.main zkbugs \
  --dataset ../zkbugs/dataset/circom \
  --tools circomspect,circom_civer \
  --jobs 4 --random-bugs 6 --random-seed 42 \
  --timeout 120 --output output/zkbugs-smoke

# Run both direct AND original for every bug (original is skipped when
# it would be identical to direct). Writes output/zkbugs-run/{direct,original}/.
uv run python -m zkhydra.main zkbugs \
  --dataset ../zkbugs/dataset/circom \
  --zkbugs-mode both \
  --tools all \
  --jobs 4 \
  --timeout 600 \
  --output output/zkbugs-run

What this does:

Walks --dataset for zkbugs_config.json files (excluding dataset/codebases/ and dataset/*/dependencies/).
Builds each bug's Input via scripts/print_bug_vars.sh (located by walking up from --dataset).
Skips bugs whose Compiled Direct=false (or Compiled Original=false in original mode), and bugs whose codebase is missing. Skipped rows land in summary.json with a reason.
Runs the requested tools with per-bug precompilation; per-worker detail logs land at <output>/<bug>/run.log.

Step 3 — Triage the `Undecided` verdicts

evaluate_zkbugs_ground_truth is conservative — anything that isn't a trivial match is reported Undecided. The triage-zkbugs-finding skill + driver script promote those to TruePositive / FalseNegative (or keep Undecided with a reason when evidence is genuinely thin).

# Dry-run: collect every Undecided case into triage_queue.json for inspection.
python3 scripts/triage_zkbugs_run.py output/zkbugs-run \
  --dataset ../zkbugs/dataset/circom

# Automated: invoke Claude headlessly per case, write triage.json alongside
# each evaluation.json, rewrite evaluation.json with the triaged verdict
# (preserving the original at evaluation.original.json), and patch
# summary.json with an evaluation_counts rollup.
python3 scripts/triage_zkbugs_run.py output/zkbugs-run \
  --dataset ../zkbugs/dataset/circom \
  --auto --jobs 4 \
  --update-evaluation --update-summary

Flags in brief:

--auto — shell out to claude -p per case; write <bug>/<tool>/triage.json and a top-level triage_summary.json.
--update-evaluation — additionally rewrite each evaluation.json with the triaged verdict, preserving the original at evaluation.original.json (written once; re-runs do not clobber it).
--update-summary — patch the dataset summary.json with an evaluation_counts rollup and a per-tool evaluation subsection. Implies --update-evaluation.
--tool <name> — triage only one tool's verdicts (e.g. --tool picus).

Prerequisites: the claude CLI must be on PATH for --auto. Dry-run (default) needs no CLI.

Step 4 — Print the final results

# Tabular summary: header, TP/FN/Undecided rollup, per-tool counts, and
# a per-(bug, tool) table sorted by verdict.
python3 scripts/print_zkbugs_summary.py output/zkbugs-run

# Drill down by verdict or tool:
python3 scripts/print_zkbugs_summary.py output/zkbugs-run --filter FalseNegative
python3 scripts/print_zkbugs_summary.py output/zkbugs-run --tool picus
python3 scripts/print_zkbugs_summary.py output/zkbugs-run --no-rows

Raw JSON is always available too:

jq '.evaluation_counts' output/zkbugs-run/summary.json
jq '.bugs[] | select(.status=="skipped") | {bug_name, reason}' \
  output/zkbugs-run/summary.json
jq '.' output/zkbugs-run/<some_bug>/circomspect/evaluation.json

Step 5 — Inspect individual cases

Each bug dir has the per-tool detail:

output/zkbugs-run/<bug_name>/
├── ground_truth.json           # expected vulnerability, location, refs
├── run.log                     # per-bug worker log (parallel runs only)
├── scratch/                    # precompile artifacts (.r1cs / .sym)
└── <tool>/
    ├── raw.txt                 # tool stdout/stderr
    ├── parsed.json             # tool-specific structured output
    ├── results.json            # unified findings
    ├── evaluation.json         # final verdict (triaged or not)
    ├── evaluation.original.json  # pre-triage verdict (only after --update-evaluation)
    └── triage.json             # skill response (only after --auto)

zkbugs modes

--zkbugs-mode direct (default) — run against each bug's isolated wrapper circuit.circom. Every bug supports this mode and it's the fastest path. circom link flags are still needed because the wrapper typically includes files from the codebase (e.g. include "circuits/...").
--zkbugs-mode original — run against the project's real entrypoint (Original Entrypoint in zkbugs_config.json). Requires dataset/codebases/ to be populated.
--zkbugs-mode both — run direct for every bug, then original only for bugs whose Original Entrypoint is non-empty (i.e. distinct from direct). Writes <output>/direct/ + <output>/original/ with their own per-mode summary.json, plus a combined <output>/summary.json that aggregates both. Scripts like triage_zkbugs_run.py and print_zkbugs_summary.py operate on each sub-dir independently.

Selecting a subset of bugs

--bugs <sel1>,<sel2>,... — comma-separated substrings matched against each bug's directory name or its --dataset-relative path.
--bugs-file <path> — one selector per line (lines starting with # are ignored).

Both flags combine as a union. Missing-match exits with an error.

# Single bug
uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom --tools circomspect \
  --bugs veridise_decoder_accepting_bogus_output_signal

# Path fragment (matches all bugs under darkforest-v0.3/)
uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom --tools all \
  --bugs darkforest-eth/darkforest-v0.3

# From a file
uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom --tools all \
  --bugs-file my-selectors.txt

Parallelism and random sampling

--jobs N (default 1) — dispatch one bug per worker process. Tools within a bug still run sequentially. Each worker writes its detailed log to <output>/<bug_name>/run.log; the top-level log stays a concise index.
--random-bugs N — after selector filtering, randomly pick N bugs. Handy for quick parallel smoke tests. Ignored if N exceeds the runnable set.
--random-seed <int> — make --random-bugs reproducible.

# 6 random bugs across 4 workers, reproducible
uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom --tools all \
  --jobs 4 --random-bugs 6 --random-seed 42 \
  --timeout 600 --output output/parallel-smoke

--jobs 1 is byte-identical to a serial run. summary.json adds an errors field and a jobs field; rows are sorted by (status, bug_name) so diffs between serial and parallel runs stay clean.

Supported Tools

circomspect - Static analyzer and linter
circom_civer - SMT-based verification with CVC5
Picus - Symbolic execution via Rosette
EcneProject - Julia-based circuit analysis
zkFuzz - Fuzzing-based bug detection

Usage Modes

1. Analyze Mode

Run tools on a single circuit without ground truth.

uv run python -m zkhydra.main analyze \
  --input circuit.circom \
  --tools circomspect,circom_civer \
  --timeout 600 \
  --output results/

Output: Raw findings from each tool in results/

2. Evaluate Mode

Compare tool results against known vulnerabilities (requires zkbugs format config).

uv run python -m zkhydra.main evaluate \
  --input bug/zkbugs_config.json \
  --tools all

Output: Ground truth comparison, True Positives, False Negatives

3. zkbugs Mode

Run tools against the refactored zkbugs dataset.

uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom \
  --zkbugs-mode direct \
  --tools all \
  --timeout 600

Output: Per-bug analysis results, per-bug ground_truth.json (with the full refactored config: codebase, direct_entrypoint, original_entrypoint, input, executed, compiled_direct, compiled_original), and a dataset-level summary.json that records processed and skipped bugs with reasons.

CLI Options

# analyze / evaluate
--input, -i        Circuit file (.circom) for analyze mode

# zkbugs
--dataset, -d      Path to <zkbugs>/dataset/circom
--zkbugs-mode      direct (default) | original
--bugs             Comma-separated bug selectors (substring match)
--bugs-file        File with one bug selector per line (# comments allowed)
--jobs, -j         Parallel workers (one bug per worker; default 1)
--random-bugs, -n  Randomly pick N bugs after selector filtering
--random-seed      Seed for --random-bugs

# shared
--tools, -t        Tools to run (comma-separated or 'all')
--output, -o       Output directory (default: output/)
--timeout          Timeout per tool in seconds (default: 1800)
--log-file         Enable file logging
--log-level        Logging verbosity (default: INFO)
--vanilla          Re-process existing raw output instead of running tools

Companion scripts

Script	Purpose
`scripts/triage_zkbugs_run.py <run>`	Walk a run dir and collect every `Undecided` verdict; with `--auto` invoke the `triage-zkbugs-finding` skill per case; with `--update-evaluation` / `--update-summary` write verdicts back into `evaluation.json` and `summary.json`.
`scripts/print_zkbugs_summary.py <run>`	Pretty-print a run's header, per-tool TP/FN/Undecided rollup, skipped/errored bugs, and the full per-(bug, tool) verdict table. Filters via `--tool` / `--filter`.

Examples

Single Tool Analysis

docker-compose run --rm zkhydra uv run python -m zkhydra.main analyze \
  --input examples/test_bug/circuits/circuit.circom \
  --tools circomspect

New-format smoke test (no zkbugs checkout required)

examples/zkbugs_new_format/ ships a self-contained toy bug that mirrors the refactored layout:

uv run python -m zkhydra.main zkbugs \
  --dataset examples/zkbugs_new_format/dataset/circom \
  --zkbugs-mode direct \
  --tools circomspect \
  --timeout 30 \
  --output output/zkbugs-new-format-smoke

Multiple Tools with Timeout

docker-compose run --rm zkhydra uv run python -m zkhydra.main analyze \
  --input examples/test_bug/circuits/circuit.circom \
  --tools circomspect,circom_civer,zkfuzz \
  --timeout 300

Full zkbugs Evaluation

# From host machine with zkbugs cloned locally
docker-compose run --rm zkhydra uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom \
  --zkbugs-mode direct \
  --tools all \
  --timeout 600 \
  --log-file \
  --output output/zkbugs-run

Single bug (fast feedback loop)

docker-compose run --rm zkhydra uv run python -m zkhydra.main zkbugs \
  --dataset zkbugs/dataset/circom \
  --tools circomspect,circom_civer \
  --bugs veridise_decoder_accepting_bogus_output_signal \
  --timeout 120

Output Structure

output/
└── analyze_YYYYMMDD_HHMMSS/
    ├── circomspect/
    │   ├── raw.txt          # Raw tool output
    │   ├── tool_output.json # Execution metadata
    │   ├── parsed.json      # Structured findings
    │   └── results.json     # Standardized format
    ├── circom_civer/
    │   └── ...
    └── summary.json         # Aggregated results

For zkbugs mode:

output/zkbugs-run/
├── <bug_name>/
│   ├── ground_truth.json        # includes new keys (codebase, entrypoints, compile flags, mode)
│   ├── scratch/                 # precompile artifacts (.r1cs / .sym / compile.log)
│   ├── circomspect/
│   │   ├── raw.txt
│   │   ├── results.json
│   │   └── evaluation.json      # TP/FN/Undecided
│   └── ...
└── summary.json                 # processed + skipped rows with reasons, per-mode totals

Existing zkbugs analysis

You can find a report of the zkbugs analysis in output/zkbugs-report.pdf and you can download a tar with all the results here: https://drive.google.com/file/d/1zTIrrVqy0MXMxC4tRiPoFMEfXv-2TLYY/view?usp=sharing.

Installation (Local Development)

Prerequisites

Ubuntu 24.04
Python 3.12+
Rust toolchain
Julia, Node.js, Racket

Build from Source

# Clone repository with submodules
git clone --recurse-submodules https://github.com/zksecurity/zkhydra.git
cd zkhydra

# Run setup script (installs all dependencies and builds tools)
./setup.sh

# Run zkHydra
uv run python -m zkhydra.main --help

Docker Build

# Build image locally (takes 30-60 minutes)
docker build -t zkhydra:latest .

# Or use docker-compose
docker-compose build

Development

Code Quality

# Format and lint (requires uv)
make all

# Or manually
uv run black zkhydra/
uv run isort zkhydra/ --profile black
uv run ruff check zkhydra/ --fix

Project Structure

zkhydra/
├── zkhydra/              # Python package
│   ├── tools/           # Tool wrappers (circomspect.py, etc.)
│   ├── cli.py           # CLI argument parsing
│   ├── core.py          # Execution orchestration
│   └── main.py          # Entry point
├── tools/               # Tool source repos (git submodules)
├── examples/            # Example circuits for testing
├── docker-compose.yml   # Docker configuration
└── Dockerfile          # Multi-stage build with all tools

Troubleshooting

Tools Timeout

Increase timeout for slow tools:

--timeout 1800  # 30 minutes

Out of Memory

For large circuits, run tools individually:

--tools circomspect  # Run one at a time

Docker Issues

# Pull latest image
docker pull ghcr.io/zksecurity/zkhydra:latest

# Rebuild locally if needed
docker-compose build --no-cache

Funding

This project was partially funded by an Ethereum Foundation grant.

License

See LICENSE file.

Resources

zkbugs Dataset: https://github.com/zksecurity/zkbugs
Issues: https://github.com/zksecurity/zkhydra/issues

Name		Name	Last commit message	Last commit date
Latest commit History 150 Commits
.claude/skills/triage-zkbugs-finding		.claude/skills/triage-zkbugs-finding
.github/workflows		.github/workflows
configs		configs
examples		examples
helpers		helpers
output		output
scripts		scripts
setup		setup
tools		tools
zkhydra		zkhydra
.dockerignore		.dockerignore
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
TODO.md		TODO.md
config.toml		config.toml
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
run.py		run.py
setup.sh		setup.sh
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

zkHydra

Quick Start

Using Docker (Recommended)

Analyze Your First Circuit

Mount Your Own Circuits

Running on zkbugs Dataset

End-to-end workflow

Step 1 — Clone zkbugs and populate codebases

Step 2 — Run zkhydra on the dataset

Step 3 — Triage the Undecided verdicts

Step 4 — Print the final results

Step 5 — Inspect individual cases

zkbugs modes

Selecting a subset of bugs

Parallelism and random sampling

Supported Tools

Usage Modes

1. Analyze Mode

2. Evaluate Mode

3. zkbugs Mode

CLI Options

Companion scripts

Examples

Single Tool Analysis

New-format smoke test (no zkbugs checkout required)

Multiple Tools with Timeout

Full zkbugs Evaluation

Single bug (fast feedback loop)

Output Structure

Existing zkbugs analysis

Installation (Local Development)

Prerequisites

Build from Source

Docker Build

Development

Code Quality

Project Structure

Troubleshooting

Tools Timeout

Out of Memory

Docker Issues

Funding

License

Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Step 3 — Triage the `Undecided` verdicts

Packages