7 changes: 3 additions & 4 deletions README.md
@@ -15,7 +15,7 @@ cd HBDesigner/
To create a virtual environment with `mamba` for use on a GPU or CPU, respectively:
```
# for running on a GPU
-mamba env create -f env.yaml
+mamba env create -f env_gpu.yaml
pip install .

# for running on a CPU
@@ -36,10 +36,9 @@ uv pip install -e ".[gpu-cu124]"
# for running on a CPU
uv pip install -e ".[cpu]"
```
-We also provide an alternative install method using `Pixi`. The following will create a `Pixi` project in the `HBDesigner` root directory:
+To create a virtual environment with `Pixi` for use on a GPU or CPU, respectively:
```
-git clone https://github.com/Kuhlman-Lab/HBDesigner.git
-cd HBDesigner/
+# for running on a GPU
pixi install

# if installing with pixi on a GPU with CUDA 12.4, include -e gpu-cu124 in the pixi run command
2 changes: 1 addition & 1 deletion env.yaml → env_gpu.yaml
@@ -1,4 +1,4 @@
-name: hbdesigner
+name: hbdesigner_gpu
channels:
- conda-forge
dependencies:
13 changes: 9 additions & 4 deletions examples/README.md
@@ -76,6 +76,10 @@ These are the minimal input params that you should consider setting for any desi
# Number of top (best scoring, see below section for details) designs to save. Good values are 5-25, depending on your use case.
--top_k 5

+# If needed, model checkpoint paths can be specified as follows:
+--design_model_ckpt /path/to/model_weights/design_020.pt
+--packing_model_ckpt /path/to/model_weights/pack.pt
+
# If running on a CPU include
--cpu
```
@@ -92,10 +96,11 @@ At larger `--n_res`, packing is harder, so you will get fewer good designs per `
Smaller amino acids, especially SER and THR, have notably higher success rates. This means that, if you don't care what amino acids are in your network, you can get higher success rates using `--guide_seq SXX`, `--guide_seq TXX`, etc.

## Postprocessing
-We provide a helper script called `merge_networks.py` that attempts to naively combine output networks by checking for sequence overlap and clashes. This is NOT an exhaustive sweep, so it will not return ALL possible networks, but a subset from a rapid sampling procedure.
-```
-python merge_networks.py --designs designs/ --output merged_designs/ --no_duplicates --max_order 5 --min_order 2
-```
+We provide a helper script called `merge_networks.py` that attempts to naively combine output networks by checking for sequence overlap and clashes. This is NOT an exhaustive sweep, so it will not return ALL possible networks, but a subset from a rapid sampling procedure. For example usage, see `examples/postprocessing/run_merge.sh`.
+
+When doing one-sided interface design, we may want to graft our output network back onto the wildtype target for downstream modeling. We provide a helper script called `graft_seq.py` to do this. For example usage, see `examples/postprocessing/run_graft.sh`.
+
+The most common use for HBDesigner outputs is as input for traditional sequence design. To do this, we use LigandMPNN to design the remaining sequence, keeping the HBDesigner network residues fixed. For an example of this, see `examples/postprocessing/run_mpnn.sh`.

### Conditioning parameters:
These are extra (optional) params you can use to help guide the model toward making specific types of networks more often.
673 changes: 673 additions & 0 deletions examples/postprocessing/1YRK_HBDes_rank_1.pdb

Large diffs are not rendered by default.

707 changes: 707 additions & 0 deletions examples/postprocessing/grafted/1YRK_HBDes_rank_1_grafted.pdb

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions examples/postprocessing/run_graft.sh
@@ -0,0 +1,11 @@
#!/bin/bash

# After one-sided design, we want to graft the binder residues onto the original structure to get the target seq back
f=1YRK_HBDes_rank_1.pdb
mkdir -p grafted
base=$(basename "$f" .pdb)
python ../../hbdesigner/scripts/graft_seq.py \
    --target_pdb "$f" \
    --ref_pdb ../interface/1YRK.pdb \
    --out_pdb "grafted/${base}_grafted.pdb" \
    --graft_chains B
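
The script above grafts a single file. When several ranked designs need grafting, the same call can be generated per file. The following is a minimal sketch, not part of the PR: `build_graft_command` is a hypothetical helper, and it uses only the `graft_seq.py` flags that appear in `run_graft.sh`.

```python
from pathlib import Path


def build_graft_command(target_pdb, ref_pdb, out_dir, graft_chains="B",
                        script="../../hbdesigner/scripts/graft_seq.py"):
    """Build the argv list for one graft_seq.py call (hypothetical helper).

    Flag names are copied from run_graft.sh; nothing else about
    graft_seq.py is assumed.
    """
    stem = Path(target_pdb).stem  # e.g. "1YRK_HBDes_rank_1"
    out_pdb = (Path(out_dir) / f"{stem}_grafted.pdb").as_posix()
    return [
        "python", script,
        "--target_pdb", str(target_pdb),
        "--ref_pdb", str(ref_pdb),
        "--out_pdb", out_pdb,
        "--graft_chains", graft_chains,
    ]


# Print the command for the example design instead of running it.
cmd = build_graft_command("1YRK_HBDes_rank_1.pdb", "../interface/1YRK.pdb", "grafted")
print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` inside a loop over the design files, after creating the output directory.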
18 changes: 18 additions & 0 deletions examples/postprocessing/run_merge.sh
@@ -0,0 +1,18 @@
#!/bin/bash

# Generate single networks with HBDesigner
run_hbdesigner \
    --pdb ../interface/1YRK.pdb \
    --n_workers 8 \
    --n_samples 200 \
    --n_res 3 \
    --top_k 10 \
    --out_dir ./single_network

# Merge single networks into multi-network designs
python ../../hbdesigner/scripts/merge_networks.py \
    --designs ./single_network \
    --output ./multi_network \
    --max_order 2 \
    --seed 123

32 changes: 32 additions & 0 deletions examples/postprocessing/run_mpnn.sh
@@ -0,0 +1,32 @@
#!/bin/bash

# This example shows how to run LigandMPNN on HBDesigner outputs while keeping the network residues fixed.
# Note: the key helper script (get_sel_res_multi_json.py) can only be found in the Kuhlman Lab fork of LigandMPNN (https://github.com/Kuhlman-Lab/ligandmpnn/)
source ~/.bashrc
mamba activate ligandmpnn

input=./grafted
ligandmpnn_loc=/proj/kuhl_lab/LigandMPNN

# Get JSON listing files to process
python ${ligandmpnn_loc}/get_pdb_multi_json.py --pdb_dir $input --json_file multi_pdb.json

# Get JSON listing which residues to keep fixed (all except GLY)
python ${ligandmpnn_loc}/get_sel_res_multi_json.py --pdb_dir $input --flip --json_file multi_res.json --sel_restypes "G"

# Run LigandMPNN with sidechain context
python ${ligandmpnn_loc}/run.py \
    --pdb_path_multi ./multi_pdb.json \
    --out_folder ./mpnn_outputs \
    --fixed_residues_multi multi_res.json \
    --checkpoint_ligand_mpnn ${ligandmpnn_loc}/model_params/ligandmpnn_v_32_010_25.pt \
    --temperature 0.1 \
    --number_of_batches 1 \
    --batch_size 1 \
    --checkpoint_path_sc ${ligandmpnn_loc}/model_params/ligandmpnn_sc_v_32_002_16.pt \
    --pack_side_chains 1 \
    --number_of_packs_per_design 1 \
    --ligand_mpnn_use_side_chain_context 1 \
    --pack_with_ligand_context 1 \
    --chains_to_design "A" \
    --repack_everything 0
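
The script comment describes the fixed-residue selection as "all except GLY". The sketch below illustrates that selection rule on plain sequences. It is not the fork's `get_sel_res_multi_json.py`: `non_gly_residues` is a hypothetical helper, residue numbering is assumed to start at 1 per chain, and the JSON layout (PDB path mapped to a space-separated list of chain-plus-number labels) is an assumption about what `--fixed_residues_multi` expects.

```python
import json


def non_gly_residues(chain_seqs):
    """Return labels such as "A3 A6" for every non-glycine position.

    chain_seqs maps a chain ID to its one-letter sequence. Numbering is
    assumed to start at 1, which may not match a real PDB file.
    """
    labels = []
    for chain_id, seq in chain_seqs.items():
        for pos, aa in enumerate(seq, start=1):
            if aa != "G":
                labels.append(f"{chain_id}{pos}")
    return " ".join(labels)


# Toy input: chain A is mostly glycine with two non-GLY positions.
fixed = {"grafted/example.pdb": non_gly_residues({"A": "GGSGGTG", "B": "MKG"})}
print(json.dumps(fixed, indent=2))
```

With `--chains_to_design "A"`, a selection like this would leave only the glycine positions of chain A open for redesign.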
13 changes: 9 additions & 4 deletions hbdesigner/inference/inference_hbdesigner.py
@@ -244,16 +244,21 @@ def validate_inputs(self):
 assert n_anchor_res > 0, "You must provide at least one anchor residue if --anchor_res is specified."

 # Retrieve model weights and configurations
-#self.opts.pack_cfg = os.path.join(Path(__file__).parents[2], "model_weights/pack.yaml")
 pack_cfg_name = "pack_cpu.yaml" if self.opts.cpu else "pack.yaml"
 design_cfg_name = (
     f"{self.opts.design_model}_cpu.yaml" if self.opts.cpu else f"{self.opts.design_model}.yaml"
 )
 self.opts.pack_cfg = os.path.join(Path(__file__).parents[2], f"model_weights/{pack_cfg_name}")
-self.opts.pack_ckpt = os.path.join(Path(__file__).parents[2], "model_weights/pack.pt")
-#self.opts.design_cfg = os.path.join(Path(__file__).parents[2], f"model_weights/design_020.yaml")
+# Override model weight paths if provided by user, otherwise use defaults
+if self.opts.packing_model_ckpt is not None:
+    self.opts.pack_ckpt = self.opts.packing_model_ckpt
+else:
+    self.opts.pack_ckpt = os.path.join(Path(__file__).parents[2], "model_weights/pack.pt")
 self.opts.design_cfg = os.path.join(Path(__file__).parents[2], f"model_weights/{design_cfg_name}")
-self.opts.design_ckpt = os.path.join(Path(__file__).parents[2], f"model_weights/{self.opts.design_model}.pt")
+if self.opts.design_model_ckpt is not None:
+    self.opts.design_ckpt = self.opts.design_model_ckpt
+else:
+    self.opts.design_ckpt = os.path.join(Path(__file__).parents[2], f"model_weights/{self.opts.design_model}.pt")
 # These packing options are fixed for inference use
 self.opts.pack_crop = 10.0
 self.opts.packer = "hbpacker"
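
Both override branches in this hunk follow the same pattern: use the user-supplied path if one was given, otherwise fall back to the bundled file under `model_weights/`. A minimal sketch of that pattern as a standalone function follows. `resolve_ckpt` and its parameters are hypothetical names, not part of the PR.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_ckpt(user_ckpt: Optional[str], weights_dir: Path, default_name: str) -> str:
    """Return the user-supplied checkpoint path, or the default under weights_dir."""
    if user_ckpt is not None:
        return user_ckpt
    return os.path.join(weights_dir, default_name)


weights_dir = Path("model_weights")  # stand-in for Path(__file__).parents[2] / "model_weights"
print(resolve_ckpt(None, weights_dir, "pack.pt"))
print(resolve_ckpt("/path/to/model_weights/design_020.pt", weights_dir, "design_020.pt"))
```

Note that in the hunk only the checkpoint paths are overridable. The `.yaml` config paths are still derived from `--design_model` and `--cpu`, so a custom checkpoint presumably has to match the corresponding default config.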
14 changes: 14 additions & 0 deletions hbdesigner/inference/parsers.py
@@ -222,5 +222,19 @@ def get_hbdes_parser() -> FileArgumentParser:
         default=None,
         help="Random seed for reproducible sampling. Default is no seed.",
     )
+    parser.add_argument(
+        "--design_model_ckpt",
+        required=False,
+        type=str,
+        default=None,
+        help="Path to custom design model checkpoint. If not specified, will use default checkpoint for the specified design model (e.g. 'design_020')."
+    )
+    parser.add_argument(
+        "--packing_model_ckpt",
+        required=False,
+        type=str,
+        default=None,
+        help="Path to custom packing model checkpoint. If not specified, will use default checkpoint."
+    )

     return parser
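
To see how the two new options behave when omitted, the sketch below registers them on a plain `argparse.ArgumentParser`. This is a stand-in for illustration only: the project's `FileArgumentParser` is not reproduced here, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser exposing only the two checkpoint options from this hunk."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--design_model_ckpt", required=False, type=str, default=None)
    parser.add_argument("--packing_model_ckpt", required=False, type=str, default=None)
    return parser


parser = build_parser()
defaults = parser.parse_args([])
custom = parser.parse_args(["--packing_model_ckpt", "/path/to/model_weights/pack.pt"])

# Omitted options stay None, which is what validate_inputs tests for.
print(defaults.design_model_ckpt, defaults.packing_model_ckpt)
print(custom.design_model_ckpt, custom.packing_model_ckpt)
```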
111 changes: 111 additions & 0 deletions hbdesigner/scripts/merge_networks.py
@@ -0,0 +1,111 @@
import argparse
import os
import glob
import random
from tqdm import tqdm
import numpy as np
from scipy.spatial.distance import cdist

from hbdesigner.data.protein import Protein
import hbdesigner.data.residue_constants as rc


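def check_clashes(p_anchor, p_next, threshold=3.0):
    # Flatten atom27 slots 4:14 of every residue into one coordinate list per protein
    anchor_xyz = np.reshape(p_anchor.atom27_xyz[:, 4:14], (-1, 3))  # [N_anchor * 10, 3]
    next_xyz = np.reshape(p_next.atom27_xyz[:, 4:14], (-1, 3))  # [N_next * 10, 3]
    anchor_xyz_mask = np.reshape(p_anchor.atom27_mask[:, 4:14], (-1, 1))  # [N_anchor * 10, 1]
    next_xyz_mask = np.reshape(p_next.atom27_mask[:, 4:14], (-1, 1))  # [N_next * 10, 1]

    pair_dists = cdist(anchor_xyz, next_xyz)  # [N_anchor * 10, N_next * 10]
    pair_masks = anchor_xyz_mask * np.transpose(next_xyz_mask)  # same shape as pair_dists
    # A clash is any pair of atoms present in both masks that is closer than the threshold
    clashes = (pair_dists < threshold) * pair_masks
    n_clashes = np.sum(clashes)

    return n_clashes > 0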


def check_overlap(anchor_res, next_res):
    return np.sum(anchor_res[:, None] == next_res[None, :]) > 0
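
The two checks above can be tried on toy data without loading any structures. The sketch below restates them in plain Python as a stand-in for the NumPy/SciPy versions. `has_clash` and `has_overlap` are hypothetical names, and the sketch assumes masked-out atoms have already been dropped from the coordinate lists.

```python
import math


def has_clash(coords_a, coords_b, threshold=3.0):
    """True if any atom pair across the two sets is closer than threshold (strict <)."""
    return any(math.dist(a, b) < threshold for a in coords_a for b in coords_b)


def has_overlap(res_a, res_b):
    """True if the two networks place a residue at the same sequence index."""
    return bool(set(res_a) & set(res_b))


net_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
net_b = [(4.4, 0.0, 0.0)]   # 2.9 A from the second atom of net_a
net_c = [(10.0, 0.0, 0.0)]  # far from everything in net_a

print(has_clash(net_a, net_b))       # closest pair is 2.9 A apart
print(has_clash(net_a, net_c))       # closest pair is 8.5 A apart
print(has_overlap([3, 7], [7, 12]))  # both use residue index 7
```

Because the comparison is strict, two atoms exactly `threshold` apart do not count as a clash, which matches `pair_dists < threshold` in `check_clashes`.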


def main(designs: str, output: str, max_order: int = 2, min_order: int = 2, no_duplicates: bool = False, threshold: float = 3.0):

    assert os.path.isdir(designs), f"Design directory {designs} does not exist."

    os.makedirs(output, exist_ok=True)

    # Collect all network files
    network_files = glob.glob(os.path.join(designs, "*.pdb"))

    # Shuffle into random order
    random.shuffle(network_files)
    idx_used = []

    # Iterate over files and only consider networks after them in the list
    for idx, anchor_network in tqdm(enumerate(network_files)):
        # Load anchor network
        p_anchor = Protein.from_pdb_file(anchor_network, discard_Hs=False)
        if no_duplicates and idx in idx_used:
            continue

        # Iterate over all networks after the anchor network
        graft_idx = idx + 1
        while True:
            # Break on list end
            if graft_idx >= len(network_files):
                break

            next_network = network_files[graft_idx]
            p_next = Protein.from_pdb_file(next_network, discard_Hs=False)

            anchor_res = np.where(p_anchor.aatype != rc.restype_order["G"])[0]
            next_res = np.where(p_next.aatype != rc.restype_order["G"])[0]

            # Break if clash found
            has_clash = check_clashes(p_anchor.mask(anchor_res), p_next.mask(next_res), threshold=threshold)
            has_clash = has_clash or check_overlap(anchor_res, next_res)
            if not has_clash:
                # If no clash, graft together
                p_anchor.aatype[next_res] = p_next.aatype[next_res]
                p_anchor.atom27_xyz[next_res] = p_next.atom27_xyz[next_res]
                p_anchor.atom27_mask[next_res] = p_next.atom27_mask[next_res]
            else:
                break
            graft_idx += 1

            # Break on max graft order
            if (graft_idx - idx) >= max_order:
                break
        # Check if net passes min graft order
        if (graft_idx - idx) < min_order:
            continue
        # Save the merged network
        else:
            order = graft_idx - idx
            p_hash = round(np.abs(p_anchor.__hash__()) % 1e6)
            print("Saving merged network: ", f"HBDes_merged_{p_hash}_order{order}.pdb")
            output_file = os.path.join(output, f"HBDes_merged_{p_hash}_order{order}.pdb")
            with open(output_file, "w") as f:
                f.write(p_anchor.to_pdb())

            if no_duplicates:
                idx_used.extend(range(idx, graft_idx + 1))
    return
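
The control flow of `main` is a greedy, order-dependent scan: each network acts as an anchor and absorbs the networks that follow it in the list until one is incompatible or `max_order` is reached. The sketch below reproduces that idea on toy data. It is a simplification, not the PR's code: `greedy_merge` and `compatible` are hypothetical names, networks are modeled as sets of residue indices, compatibility means the sets are disjoint, and shuffling and `--no_duplicates` are left out.

```python
def greedy_merge(items, compatible, max_order=2, min_order=2):
    """Greedily group each item with the compatible items that follow it."""
    merged = []
    for idx in range(len(items)):
        group = [items[idx]]
        graft_idx = idx + 1
        while graft_idx < len(items) and len(group) < max_order:
            candidate = items[graft_idx]
            # Stop at the first candidate that conflicts with anything already grafted
            if not all(compatible(candidate, member) for member in group):
                break
            group.append(candidate)
            graft_idx += 1
        # Keep only groups that reach the minimum order
        if len(group) >= min_order:
            merged.append(group)
    return merged


networks = [{1, 2, 3}, {4, 5, 6}, {2, 9, 10}, {7, 8, 11}]
groups = greedy_merge(networks, lambda a, b: a.isdisjoint(b), max_order=3)
for group in groups:
    print(len(group), group)
```

Because the scan stops at the first incompatible candidate rather than skipping it, the result depends on list order. This is one reason the script shuffles the files and, as the README notes, returns only a subset of the possible merged networks.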


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Merge multiple network files into one.")
    parser.add_argument("--designs", type=str, required=True, help="Design directory")
    parser.add_argument("--output", type=str, required=True, help="Output directory")
    parser.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility")
    parser.add_argument("--max_order", type=int, default=2, help="Maximum grafting order (i.e., number of concurrent networks attempted).")
    parser.add_argument("--min_order", type=int, default=2, help="Minimum grafting order (i.e., number of concurrent networks attempted).")
    parser.add_argument("--no_duplicates", action="store_true", help="If set, will not allow the same network to be grafted multiple times.")
    parser.add_argument("--threshold", type=float, default=3.0, help="Clash threshold distance in Angstroms.")

    args = parser.parse_args()
    if args.seed is not None:
        random.seed(args.seed)

    main(args.designs, args.output, args.max_order, args.min_order, args.no_duplicates, args.threshold)