98 changes: 76 additions & 22 deletions README.md
@@ -12,31 +12,85 @@ JSON files committed here.

```
benchmarks/
├── ch3f/ # CH₃F small-molecule benchmark
│ ├── results/*.json # 100 optimizer runs (L-BFGS-B, Nelder-Mead, Optax, …)
│ └── forcefields/ # Optimized force fields per run
├── rh-enamide/ # Rh-enamide TS (Donoghue 2008), 9 molecules
│ ├── results/*.json # JaxOpt, Optax, and regularized runs
│ └── forcefields/
├── heck-relay/ # Heck relay (Rosales 2020), 23 molecules
│ ├── results/*.json
│ └── forcefields/
├── pd-allyl-amination/ # Pd allyl amination (Wahlers 2021), 9 molecules
│ ├── results/*.json
│ └── forcefields/
├── pd-1,4-conjugate-addition/ # Pd 1,4-conjugate addition (Wahlers 2021), 10 molecules
│ ├── results/*.json
│ └── forcefields/
└── rh-1,4-conjugate-addition/ # Rh 1,4-conjugate addition (Wahlers 2022), 10 molecules
    ├── results/*.json
    └── forcefields/

qfuerza-zenodo/ # QFUERZA paper validation data
├── ch3f/ # CH₃F small-molecule benchmark
│ ├── convergence/ # Ratio-gated end-to-end optimization (current pipeline)
│ ├── results/*.json # 100 optimizer runs (L-BFGS-B, Nelder-Mead, Optax, …) — full-matrix CLI output
│ └── forcefields/ # Optimized force fields per run from the matrix output
├── rh-enamide/ # Rh-enamide TS (Donoghue 2008), 9 molecules
│ └── convergence/ # Ratio-gated end-to-end optimization
├── heck-relay/ # Heck relay (Rosales 2020), 23 molecules
│ ├── convergence/ # Ratio-gated end-to-end optimization
│ └── diagnostic/ # Three-baseline diagnostic (q2mm#277 loader bug)
├── pd-allyl-amination/ # Pd allyl amination (Wahlers 2021), 21 molecules
│ └── convergence/
├── pd-1,4-conjugate-addition/ # Pd 1,4-conjugate addition (Wahlers 2021), 10 molecules
│ └── convergence/
└── rh-1,4-conjugate-addition/ # Rh 1,4-conjugate addition (Wahlers 2022), 10 molecules
    └── convergence/

qfuerza-zenodo/ # QFUERZA paper validation data (Farrugia 2025)
├── README.md
├── cisplatin/ # Cisplatin ground state (Farrugia 2025)
└── rh-enamide/ # Rh-enamide QFUERZA/FUERZA force fields
├── cisplatin/ # Cisplatin ground state
└── rh-enamide/ # Rh-enamide QFUERZA/FUERZA force fields
```

Two standard directory layouts:

- `convergence/` — output of `scripts/regenerate_convergence_results.py`
in q2mm, the canonical end-to-end ratio-gated optimization pipeline.
Every published-FF system has exactly one of these. Contains
`validation_results.json`, `paper_metrics.json`, and the optimized
`.fld` force field.
- `results/` + `forcefields/` — output of the legacy full-matrix
`q2mm-benchmark` CLI. Currently kept only for `ch3f/`, which is
the source of the optimizer-matrix table in
[`docs/systems/small-molecules.md`](https://github.com/ericchansen/q2mm/blob/master/docs/systems/small-molecules.md).
Do not add new `results/`/`forcefields/` directories for other
systems unless you also wire them into a docs page in the same PR
(see "Stewardship rule" below).
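
The `convergence/` contract described above can be checked mechanically. The following is a minimal sketch, not part of either repo: the helper names `check_convergence_dir` and `check_benchmarks` are hypothetical, and it assumes the `.fld` file sits directly inside `convergence/` alongside the two JSON files named above.

```python
from pathlib import Path

# File names come from the layout description; the helpers are hypothetical.
REQUIRED_JSON = ("validation_results.json", "paper_metrics.json")


def check_convergence_dir(convergence_dir: Path) -> list[str]:
    """Return a list of problems found in one convergence/ directory."""
    if not convergence_dir.is_dir():
        return [f"{convergence_dir}: directory is missing"]
    problems = []
    for name in REQUIRED_JSON:
        if not (convergence_dir / name).is_file():
            problems.append(f"{convergence_dir}: missing {name}")
    # Assumption: the optimized force field is a *.fld file at the top level.
    if not any(convergence_dir.glob("*.fld")):
        problems.append(f"{convergence_dir}: no .fld force field found")
    return problems


def check_benchmarks(root: Path) -> list[str]:
    """Check benchmarks/<system>/convergence/ for every system under root."""
    bench = root / "benchmarks"
    if not bench.is_dir():
        return [f"{bench}: directory is missing"]
    problems = []
    for system in sorted(p for p in bench.iterdir() if p.is_dir()):
        problems.extend(check_convergence_dir(system / "convergence"))
    return problems
```

Calling `check_benchmarks(Path("."))` from the q2mm-data repo root would return an empty list when every system satisfies the layout, and one message per missing item otherwise.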

## Stewardship rule — every committed file earns its place

This repo follows
[q2mm AGENTS.md §2](https://github.com/ericchansen/q2mm/blob/master/AGENTS.md):

> *"Every file earns its place. If you can't explain why a file exists
> and what would break without it, it probably shouldn't be there.
> No deprecated artifacts. If something is superseded, delete the old
> version in the same commit."*

For this repo specifically:

- **Before committing data**, identify the doc page or test fixture that
references it. Put the reference link in the PR description.
- **Before deleting code in q2mm that produced a directory layout here**,
open a paired cleanup PR in q2mm-data — never let the layout
references go stale on either side.
- **Don't commit speculative or exploratory output** — write it to a
local scratch dir. Only artifacts that back a public claim
(documentation, paper figure, regression test) belong here.

Run `scripts/audit-orphans.sh` (see below) periodically to catch any
directories that have lost their references.

## Auditing for orphaned data

```bash
# from the q2mm-data repo root, point at a checkout of ericchansen/q2mm:
scripts/audit-orphans.sh ../q2mm
```

The script walks every `benchmarks/<system>/<subdir>/` and searches the
q2mm checkout (`docs/`, `test/`, `q2mm/`, `scripts/`, `examples/`) for
references to that path. Any subdirectory without a reference is
reported as a candidate for deletion or wiring-in.
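
The contents of `scripts/audit-orphans.sh` are not reproduced here. As an illustration of the behaviour described above, here is a Python sketch of the same idea. It assumes that a "reference" means the literal relative path string (for example `benchmarks/ch3f/results`) appearing in any readable file under the searched directories; the real script may match differently. All function names are hypothetical.

```python
from pathlib import Path

# Directories of the q2mm checkout that the description says are searched.
SEARCH_DIRS = ("docs", "test", "q2mm", "scripts", "examples")


def iter_data_subdirs(data_root: Path):
    """Yield benchmarks/<system>/<subdir> paths relative to data_root."""
    bench = data_root / "benchmarks"
    if not bench.is_dir():
        return
    for system in sorted(p for p in bench.iterdir() if p.is_dir()):
        for subdir in sorted(p for p in system.iterdir() if p.is_dir()):
            yield subdir.relative_to(data_root).as_posix()


def is_referenced(rel_path: str, q2mm_root: Path) -> bool:
    """Assumption: a reference is the literal path string in any file."""
    for name in SEARCH_DIRS:
        base = q2mm_root / name
        if not base.is_dir():
            continue
        for file in base.rglob("*"):
            if not file.is_file():
                continue
            try:
                text = file.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the audit
            if rel_path in text:
                return True
    return False


def find_orphans(data_root: Path, q2mm_root: Path) -> list[str]:
    """Return every data subdirectory with no reference in the q2mm checkout."""
    return [p for p in iter_data_subdirs(data_root)
            if not is_referenced(p, q2mm_root)]
```

With this sketch, `find_orphans(Path("."), Path("../q2mm"))` mirrors the invocation shown above and returns the candidate paths as a list.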

History note: the legacy `results/` and `forcefields/` directories for every
system other than `ch3f/` were dropped in
[#7](https://github.com/ericchansen/q2mm-data/pull/7) after the audit
flagged them as orphans. The old full-matrix CLI had populated them
speculatively, and nothing referenced them.

## JSON result format

Each benchmark result JSON contains: