Reduce conformer bench preprocessing times via single embed + jitter by scal444 · Pull Request #181 · NVIDIA-BioNeMo/nvMolKit

scal444 · 2026-05-27T12:34:42Z

Instead of running ETKDG on all conformers, embed one conformer, duplicate, jitter coordinates. Improves bottleneck for TFD, conformer RMSD, and ff minimization on the fly generation

Add bench_utils.perturb_conformer and bench_utils.embed_and_jitter as a shared helper for the three benches that need 'embed once then perturb' conformer prep: * tfd_bench, ff_optimize_bench: parallel embed_and_jitter across many mols. * conformer_rmsd_bench: per-mol EmbedMolecule + serial jitter loop in the existing single-mol benchmark structure (no parallel batch helper needed here since the outer loop processes one mol at a time).

greptile-apps · 2026-05-27T12:39:40Z

Greptile Summary

This PR replaces per-conformer ETKDG runs with a single-embed-then-jitter strategy across the TFD, conformer RMSD, and FF minimization benchmarks, significantly reducing preprocessing time. The shared embed_and_jitter helper in bench_utils/molprep.py parallelises the embedding step via process_map and then duplicates/perturbs the single base conformer in-process.

embed_and_jitter and perturb_conformer are extracted into bench_utils/molprep.py and exported; ff_optimize_bench.py removes its local _embed_conformers in favour of the shared helper.
tfd_bench.py switches from a per-molecule serial loop to the batch API, adding --prep-workers for parallelism control; conformer_rmsd_bench.py inlines the same single-embed + jitter pattern directly in its benchmark loop.

Confidence Score: 5/5

The changes are confined to benchmark preprocessing utilities and do not touch library code; the jitter logic is straightforward and the parallelism is well-guarded.

The refactor is clean: seed handling is consistent, the conformer copy-then-perturb order is correct (all copies are taken from the unperturbed base before the base itself is jittered), and no production library paths are affected.

No files require special attention.

Important Files Changed

Filename	Overview
benchmarks/bench_utils/molprep.py	Adds `perturb_conformer` (per-atom uniform jitter in place) and `embed_and_jitter` (single ETKDGv3 embed via process_map, then jitter-duplicate to desired count); logic is correct and seed usage is consistent
benchmarks/ff_optimize_bench.py	Removes local `_embed_conformers` and replaces it with shared `embed_and_jitter`; the call correctly passes `num_workers=args.rdkit_threads` and mols already carry explicit Hs from `prep_mols`, so `add_hs=False` default is appropriate
benchmarks/tfd_bench.py	Replaces per-molecule serial ETKDG with batch `embed_and_jitter`; `add_hs=True` and `min_atoms=4` are correctly threaded through; seed hardcoded to 42 in `prepare_molecules` is functionally equivalent to the old `42+i` scheme since `_embed_one` offsets by index internally
benchmarks/conformer_rmsd_bench.py	Switches from `EmbedMultipleConfs` to single-embed + per-conformer jitter, now guarding `num_confs < 2` explicitly; minor: embedding still happens before the `num_confs < 2` skip check, so a single-conformer run does one unnecessary embed
benchmarks/bench_utils/init.py	Exports `embed_and_jitter` and `perturb_conformer` from `molprep`; straightforward addition

_{Reviews (2): Last reviewed commit: "Update benchmarks/conformer_rmsd_bench.p..." | Re-trigger Greptile}

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

evasnow1992 · 2026-05-27T19:00:14Z

+    if num_confs < 2:
+        raise ValueError(f"num_confs must be >= 2 for TFD, got {num_confs}")
+    workers = num_workers if num_workers > 0 else max(1, multiprocessing.cpu_count() // 2)
+    return embed_and_jitter(


I’m a little concerned about replacing EmbedMultipleConfs with embed_and_jitter. If I understand correctly, the latter may generate conformers with near-identical torsion angles and without meaningful dihedral rotations. In that case, the TFD values could collapse close to 0, where the numerical comparison between RDKit and nvMolKit TFD implementations may become more variable or less informative.

Valid concern - I would expect that the values are different enough but I'll confirm with a quick experiment.

I added an additional spread parameter to avoid RMSD clumps, now both TFD and RMSD have reasonable spreads compared to ETKDG

evasnow1992

One comment, but no code validity issues. Changes look good to me.

greptile-apps Bot reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/bench_utils/molprep.py

Comment thread benchmarks/conformer_rmsd_bench.py

Update benchmarks/conformer_rmsd_bench.py

6919aef

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

scal444 requested a review from evasnow1992 May 27, 2026 12:59

evasnow1992 reviewed May 27, 2026

View reviewed changes

evasnow1992 approved these changes May 27, 2026

View reviewed changes

Spread out the jittered coordinates better

dc267f2

scal444 merged commit 5809dad into NVIDIA-BioNeMo:main May 29, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce conformer bench preprocessing times via single embed + jitter#181

Reduce conformer bench preprocessing times via single embed + jitter#181
scal444 merged 3 commits into
NVIDIA-BioNeMo:mainfrom
scal444:split/embed-perturb

scal444 commented May 27, 2026

Uh oh!

greptile-apps Bot commented May 27, 2026 •

edited

Loading

Greptile Summary

Uh oh!

Uh oh!

Uh oh!

evasnow1992 May 27, 2026

Uh oh!

scal444 May 29, 2026

Uh oh!

scal444 May 29, 2026

Uh oh!

evasnow1992 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

scal444 commented May 27, 2026

Uh oh!

greptile-apps Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

evasnow1992 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

scal444 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

scal444 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

evasnow1992 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 27, 2026 •

edited

Loading