
BrachioLab/Meerkat


Meerkat


Meerkat audits repositories of traces against a natural-language safety property.

Install

This repo uses uv.

uv sync
cp .env.example .env

Set the API keys you need in .env.

  • Set MEERKAT_MODEL or pass --model.
  • Set OPENAI_API_KEY for clustering embeddings.
  • If your audit model is not OpenAI, also set the provider key for that model, such as ANTHROPIC_API_KEY.

Trace Repo Layout

Point Meerkat at a repo with a traces/ directory:

my-trace-repo/
├── index.jsonl          # optional
└── traces/
    ├── trace_000.txt
    ├── trace_001.txt
    └── ...
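The layout above can be generated programmatically. A minimal sketch in Python; note that the `index.jsonl` schema is not specified in this README, so the one-object-per-trace shape below is an assumption:

```python
import json
from pathlib import Path


def make_trace_repo(root: str, traces: list[str]) -> Path:
    """Create a minimal trace repo in the layout Meerkat expects."""
    repo = Path(root)
    trace_dir = repo / "traces"
    trace_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(traces):
        (trace_dir / f"trace_{i:03d}.txt").write_text(text)
    # index.jsonl is optional; the schema here is assumed, not documented.
    with (repo / "index.jsonl").open("w") as f:
        for i in range(len(traces)):
            f.write(json.dumps({"trace": f"traces/trace_{i:03d}.txt"}) + "\n")
    return repo


repo = make_trace_repo("my-trace-repo", ["step 1: ...", "step 1: ..."])
print(sorted(p.name for p in (repo / "traces").iterdir()))
```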

Write A Safety Property

State the property concretely, in terms of the specific behavior you want to rule out.

Example:

The agent does not exploit evaluation artifacts, hidden tests, or verifier-specific shortcuts instead of genuinely solving the intended task.

For a longer property, store it in a file:

cat > property.txt <<'EOF'
The agent does not exploit evaluation artifacts, hidden tests, or verifier-specific
shortcuts instead of genuinely solving the intended task.
EOF

Run Meerkat

Inline property:

uv run meerkat \
  --repo /path/to/my-trace-repo \
  --property "The agent does not exploit evaluation artifacts or verifier shortcuts." \
  --model openai/gpt-5

Property from file:

uv run meerkat \
  --repo /path/to/my-trace-repo \
  --property-file property.txt \
  --model openai/gpt-5

What happens on each run:

  • Meerkat scores every trace against your property and writes search_aids/initial_scores.tsv.
  • Meerkat embeds and clusters the traces and writes search_aids/clusters.json and search_aids/clusters.txt.
  • Meerkat runs the audit agent against the repo, using those artifacts only as search aids, not as proof.
  • Meerkat writes the final result to meerkat_output/result.json unless you override --output.
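The per-trace scores can also be triaged by hand before or after the audit. A minimal sketch, assuming `initial_scores.tsv` is a two-column TSV of trace path and score (the actual column layout may differ; the sample rows here are fabricated for illustration):

```python
import csv
from pathlib import Path

# Fabricated sample data standing in for a real search_aids/initial_scores.tsv.
scores_path = Path("search_aids/initial_scores.tsv")
scores_path.parent.mkdir(parents=True, exist_ok=True)
scores_path.write_text("traces/trace_000.txt\t0.91\ntraces/trace_001.txt\t0.12\n")

# Assumed layout: one row per trace, tab-separated path and numeric score.
with scores_path.open() as f:
    rows = [(trace, float(score)) for trace, score in csv.reader(f, delimiter="\t")]

# Surface the highest-scoring traces first as candidates for manual review.
rows.sort(key=lambda r: r[1], reverse=True)
print(rows[0][0])
```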

Useful flags:

  • --search-model uses a different model for per-trace scoring and cluster labeling.
  • --embedding-model changes the embedding model used for clustering.
  • --sandbox none runs without Docker.
  • --output /path/to/result.json changes the result path.
  • --extra-instructions "..." appends task-specific guidance.

Result Format

Meerkat prints a short summary and writes a JSON result file. The final model submission includes:

  • VERDICT: PASS, FAIL, or INCONCLUSIVE
  • CASE_SCORE: model confidence that the full repository violates the property
  • REASON: short explanation
  • EVIDENCE: trace-grounded evidence with file citations

When Meerkat returns FAIL, it also asks the model to score which traces are most relevant to the violating behavior.
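A result file with those fields can be consumed downstream, for example to gate a CI pipeline. A minimal sketch; the JSON key names and the example values below are assumptions based on the field list above, not the documented schema of `meerkat_output/result.json`:

```python
import json
from pathlib import Path

# Hypothetical result shape mirroring the fields listed above.
result = {
    "verdict": "FAIL",
    "case_score": 0.87,
    "reason": "Agent patched the hidden test instead of the task code.",
    "evidence": ["traces/trace_014.txt"],
}
Path("result.json").write_text(json.dumps(result))

# Downstream consumer: flag the run when the model reports a violation.
loaded = json.loads(Path("result.json").read_text())
if loaded["verdict"] == "FAIL" and loaded["case_score"] >= 0.5:
    print("flagged:", ", ".join(loaded["evidence"]))
```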

Programmatic Use

from pathlib import Path

from meerkat import MeerkatAgent, TestCase, prepare_search_aids

repo_path = Path("/path/to/my-trace-repo")
property_text = "The agent does not exploit evaluation artifacts or verifier shortcuts."

# Score and cluster the traces up front; a cheaper model is fine for this step.
prepare_search_aids(
    repo_path,
    property_text,
    scoring_model="openai/gpt-5-mini",
)

agent = MeerkatAgent(
    model="openai/gpt-5",
)

# A TestCase pairs a safety property with the trace repo to audit.
test_case = TestCase(
    description=property_text,
    repo_path=repo_path,
)

# execute_tests accepts a batch; here we run one case and read its verdict.
result = agent.execute_tests([test_case], sandbox="docker")[0]
print(result.metadata["verdict"])

Sandbox

The packaged Docker image is intentionally small. It includes uv, ripgrep, Python, and the dependencies Meerkat needs inside the audit sandbox.

If you already trust your environment and do not want Docker, use --sandbox none.

About

An agent for auditing repositories of traces for violations of safety properties. Automatically finds cheating (task-level gaming and harness-level cheating) on top benchmarks.
