Merged
22 changes: 19 additions & 3 deletions docs/source/cli_arguments.rst
@@ -1,6 +1,22 @@
Command Line Options
====================
The command line options for running HBDesigner are described in detail below. For general advice on using these options to maximize HBDesigner's success rate, see the `Usage Advice`_ section at the end of this page.

.. argparse::
   :module: hbdesigner.inference.parsers
   :func: get_hbdes_parser

Usage Advice
------------

Packing becomes harder as ``--n_res`` increases, so a smaller fraction of the ``--n_samples`` generated samples will yield good designs. You may therefore want to increase ``--n_samples`` when you increase ``--n_res``. The following values are a good place to start:

=============  =================
``--n_res``    ``--n_samples``
=============  =================
2              100
3              200
4              500
5              500
6              1000
=============  =================
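
The starting values above can also be encoded in a small helper, for example when scripting a sweep over several network sizes. This is a minimal sketch. ``suggest_n_samples`` is a hypothetical name and is not part of HBDesigner. It deliberately refuses to guess for ``--n_res`` values outside the 2 to 6 range covered above.

.. code-block:: python

   # Hypothetical helper (not part of HBDesigner) encoding the suggested
   # starting --n_samples value for each --n_res.
   SUGGESTED_N_SAMPLES = {2: 100, 3: 200, 4: 500, 5: 500, 6: 1000}


   def suggest_n_samples(n_res: int) -> int:
       """Return a starting --n_samples value for the given --n_res."""
       if n_res not in SUGGESTED_N_SAMPLES:
           raise ValueError(
               f"no suggested --n_samples for --n_res={n_res}; "
               "the table only covers 2-6 residues"
           )
       return SUGGESTED_N_SAMPLES[n_res]


   for n_res in sorted(SUGGESTED_N_SAMPLES):
       print(f"--n_res={n_res} --n_samples={suggest_n_samples(n_res)}")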

Smaller amino acids, especially SER and THR, have notably higher success rates. If you do not need specific amino acids in your network, you can therefore get higher success rates with ``--guide_seq S,X,X``, ``--guide_seq T,X,X``, etc.
71 changes: 71 additions & 0 deletions docs/source/examples/index.rst
@@ -0,0 +1,71 @@
HBDesigner Examples
===================

Several examples of how to use HBDesigner are included in the ``examples`` folder of this repository. They are described below.

Running the Examples
--------------------

Input structures
^^^^^^^^^^^^^^^^

HBDesigner takes a ``.pdb`` file as its primary input. It can have a sequence and/or sidechains on it, but they will be removed to create a PolyGLY backbone before use.

If you want to design intra-chain or monomer networks, you should include only the chain you wish to design.

If you want to design inter-chain or interface networks, you should include only the chains forming the interface you wish to design.

Output files
^^^^^^^^^^^^

HBDesigner produces two kinds of output:

- PDB files named ``HBDes_rank_N.pdb``, where N is the rank assigned by HBDesigner. Lower ranks are "better".
- A CSV file named ``HBDes_stats.csv``, which includes the scores and residue IDs of all designed networks, including networks that passed the scoring filters but did not make the ``--top_k`` cutoff.
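
If the rank in the filename is not zero-padded, a plain string sort places ``HBDes_rank_10.pdb`` before ``HBDes_rank_2.pdb``. The following minimal sketch orders the output files by their numeric rank. It assumes only the ``HBDes_rank_N.pdb`` naming pattern described above. ``sort_by_rank`` is a hypothetical helper and is not part of HBDesigner.

.. code-block:: python

   import re

   # Matches the HBDes_rank_N.pdb naming pattern and captures N.
   RANK_PATTERN = re.compile(r"HBDes_rank_(\d+)\.pdb$")


   def sort_by_rank(filenames):
       """Return only the ranked PDB files, ordered by integer rank (best first)."""
       ranked = []
       for name in filenames:
           match = RANK_PATTERN.search(name)
           if match:
               ranked.append((int(match.group(1)), name))
       return [name for _, name in sorted(ranked)]


   files = ["HBDes_rank_10.pdb", "HBDes_rank_2.pdb", "HBDes_rank_1.pdb", "HBDes_stats.csv"]
   print(sorted(files))        # string order: rank_1, rank_10, rank_2, then the CSV
   print(sort_by_rank(files))  # ['HBDes_rank_1.pdb', 'HBDes_rank_2.pdb', 'HBDes_rank_10.pdb']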

HBDesigner will only output networks that pass all of its scoring filters and meet its definition of 'successful'. A 'successful' design meets the following criteria:

- All network residues must be engaged in at least one sidechain-sidechain H-bond
- All network residues must form a single contiguous network
- The network passes the minimum thresholds for saturation, BUHs, BUPHs, etc. (see :doc:`../cli_arguments` for how to set these thresholds)
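
The first two criteria can be phrased as a graph check:

- Treat the network residues as nodes and their sidechain-sidechain H-bonds as edges.
- Require that every node has at least one edge.
- Require that the graph is connected.

The sketch below assumes the H-bonds have already been detected and are given as pairs of residue IDs. ``is_successful_topology`` and the residue IDs are hypothetical. This illustrates the criteria and is not HBDesigner's own filtering code. The third criterion depends on the score thresholds and is not modeled.

.. code-block:: python

   from collections import deque


   def is_successful_topology(residues, sc_sc_hbonds):
       """Check the 'engaged' and 'single contiguous network' criteria.

       residues: iterable of residue IDs (e.g. "A12") in the network.
       sc_sc_hbonds: iterable of (res_i, res_j) sidechain-sidechain H-bond pairs.
       """
       residues = set(residues)
       if not residues:
           return False
       neighbors = {res: set() for res in residues}
       for a, b in sc_sc_hbonds:
           if a in neighbors and b in neighbors and a != b:
               neighbors[a].add(b)
               neighbors[b].add(a)

       # Criterion 1: every network residue makes at least one sc-sc H-bond.
       if any(not partners for partners in neighbors.values()):
           return False

       # Criterion 2: a breadth-first search from any residue reaches all others.
       start = next(iter(residues))
       seen = {start}
       queue = deque([start])
       while queue:
           for nxt in neighbors[queue.popleft()]:
               if nxt not in seen:
                   seen.add(nxt)
                   queue.append(nxt)
       return seen == residues


   # A12-A15-A40 form one chain of H-bonds: a single contiguous network.
   print(is_successful_topology(["A12", "A15", "A40"], [("A12", "A15"), ("A15", "A40")]))  # True
   # Two isolated pairs: every residue is engaged, but the network is split.
   print(is_successful_topology(["A12", "A15", "A40", "A44"], [("A12", "A15"), ("A40", "A44")]))  # False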

After filtering, HBDesigner will rank the remaining networks using various score terms, with the following priority:

1. ``buried_unsat_Hpol`` (buried unsatisfied polar H atoms): the fewer the better.
2. ``saturation``: the fraction of the total H-bonding "capacity" of all network-residue sidechain atoms that is actually used: the higher the better.
3. HBond Score (``HB_Score_full``): the change in Rosetta energy provided by the designed network, calculated relative to the same backbone as a PolyGLY: the lower (more negative) the better.
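
The priority order above amounts to a lexicographic sort followed by the ``--top_k`` cutoff. This is a minimal sketch under the following assumptions:

- Each network is a dictionary whose keys match the score-term names listed above.
- The values are invented for illustration.
- HBDesigner's own ranking code may handle ties differently.

.. code-block:: python

   # Invented example records; keys are assumed to match the score-term names.
   networks = [
       {"id": "net1", "buried_unsat_Hpol": 1, "saturation": 0.90, "HB_Score_full": -6.2},
       {"id": "net2", "buried_unsat_Hpol": 0, "saturation": 0.75, "HB_Score_full": -4.1},
       {"id": "net3", "buried_unsat_Hpol": 0, "saturation": 0.75, "HB_Score_full": -5.3},
       {"id": "net4", "buried_unsat_Hpol": 0, "saturation": 0.80, "HB_Score_full": -3.0},
   ]


   def rank_key(net):
       # Fewer buried unsatisfied polar H first, then higher saturation
       # (negated so that larger values sort first), then lower HB_Score_full.
       return (net["buried_unsat_Hpol"], -net["saturation"], net["HB_Score_full"])


   def top_k(networks, k):
       """Return the k best networks under the priority order above."""
       return sorted(networks, key=rank_key)[:k]


   print([net["id"] for net in top_k(networks, k=3)])  # ['net4', 'net3', 'net2']

If you load the rows of ``HBDes_stats.csv`` with Python's ``csv.DictReader``, convert the score columns to numbers first. Every field is read as a string.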

This setup has a few implications:

- It is possible for HBDesigner to return 0 networks, if given a hard enough task and/or few enough tries at it. If this happens, try increasing ``--n_samples``.

- If HBDesigner finds more networks than ``--top_k`` allows, it will only return the ``--top_k`` "best" according to the ranking scheme. If you want more outputs, increase ``--top_k``.

.. important::
   It is assumed that you will run these examples from their respective subdirectories in the ``examples`` folder. The examples use relative paths to access the necessary files. If you run them from a different location, you will need to adjust the paths accordingly.

   These examples also assume that you have access to 8 CPUs to use as 'workers' for HBDesigner. If you have fewer, you will need to adjust the value of ``--n_workers`` in the shell scripts before running them.

If you have installed HBDesigner using uv, conda, mamba, or pip, you can run these examples via

.. code-block:: bash

   ./<example_script>.sh

(If you installed via conda, mamba, or uv, make sure the appropriate environment is activated first.)

However, if you installed HBDesigner via pixi, you will need to run the examples via

.. code-block:: bash

   pixi run --manifest-path=<relative path to>/pyproject.toml <example_script>.sh

You can see a full example of this in the ``monomer`` directory.

.. toctree::
   :maxdepth: 1
   :caption: Explanation of the Examples:

   monomer.md
   interface_design.md
   postprocessing.md

For more information about the options used in these examples, see :doc:`../cli_arguments`.
45 changes: 45 additions & 0 deletions docs/source/examples/interface_design.md
@@ -0,0 +1,45 @@
# Interface Design

If the input structure contains an interface, HBDesigner will automatically try to design a network across it. This can be used for either one-sided or two-sided interface design. The one-sided case requires one or more "anchor residue(s)" on the target chain(s).

## Symmetric Design
Once more than one chain is involved in the hydrogen bonding network, symmetry can be taken into account. HBDesigner has limited support for symmetric design across protein-protein interfaces. To do this, we offer two complementary approaches: "lazy" and "strict" symmetry:

```{warning}
Symmetric design is still experimental and has not been validated when used in combination with conditioning features.
```

### Lazy symmetry
In a 'lazy' symmetry scenario, HBDesigner designs asymmetric networks and then attempts to symmetrize them across any symmetric chains. This is best when the network itself doesn't need to be symmetric, but sequence symmetry must be preserved across the interface.
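
At the sequence level, lazy symmetrization works as follows:

- Copy each designed residue onto the equivalent position of every symmetric chain.
- Reject the design if two copies disagree.

The sketch below assumes that symmetric chains share residue numbering. It represents a design as a mapping from `(chain, residue number)` to a one-letter amino acid. Both this representation and `lazy_symmetrize` are hypothetical. Any re-scoring of the symmetrized network that HBDesigner may perform is not shown.

```python
def lazy_symmetrize(design, symm_chains):
    """Propagate designed residues across symmetric chains (hypothetical helper).

    design: dict mapping (chain, resnum) -> one-letter amino acid.
    symm_chains: chain IDs treated as symmetric copies, e.g. ["A", "B"].
    Assumes symmetric chains share residue numbering.
    Raises ValueError if two chains request different amino acids
    at the same equivalent position.
    """
    symmetrized = dict(design)
    for (chain, resnum), aa in design.items():
        if chain not in symm_chains:
            continue  # residues on non-symmetric chains are left untouched
        for other in symm_chains:
            key = (other, resnum)
            existing = symmetrized.get(key)
            if existing is not None and existing != aa:
                raise ValueError(f"conflict at {other}{resnum}: {existing} vs {aa}")
            symmetrized[key] = aa
    return symmetrized


design = {("A", 42): "S", ("A", 45): "T", ("B", 88): "N"}
print(lazy_symmetrize(design, ["A", "B"]))
```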

### Strict symmetry
For 'strict' symmetry, HBDesigner explicitly designs symmetric networks where all symmetric residues must contribute to the network and must interact with the symmetric copies of themselves.

```{note}
For strict symmetry to work well the designable residues must be oriented very close to the plane of symmetry.
```

For a system that is N-wise symmetric, the provided value of `n_res` must be divisible by N. For example, for a homotrimer the valid choices for `n_res` are 3, 6, etc.
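
A wrapper script could check this rule before launching a strict-symmetry run. The sketch below uses hypothetical helper names. `n_fold` is the N of the N-wise symmetry (2 for the C2 dimers used in the examples below).

```python
def valid_n_res_values(n_fold: int, max_n_res: int) -> list[int]:
    """All n_res values up to max_n_res allowed under strict N-fold symmetry."""
    if n_fold < 1:
        raise ValueError("n_fold must be a positive integer")
    return list(range(n_fold, max_n_res + 1, n_fold))


def check_strict_n_res(n_res: int, n_fold: int) -> None:
    """Raise if n_res is not a multiple of n_fold."""
    if n_res % n_fold != 0:
        raise ValueError(f"n_res={n_res} is not divisible by n_fold={n_fold}")


print(valid_n_res_values(n_fold=3, max_n_res=9))  # [3, 6, 9]
check_strict_n_res(n_res=4, n_fold=2)  # 4 is divisible by 2, so nothing is raised
```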

---
## Examples

Examples of using HBDesigner to design hydrogen bonding networks across protein interfaces can be found in `examples/interface`. The following examples use the 1YRK PDB file located in the `interface` folder (see its [RCSB entry](https://www.rcsb.org/structure/1YRK)). This structure is composed of two chains and a ligand: chain A is a protein kinase, chain B is a 13-residue peptide, and the ligand is acetic acid.

- One-sided: `interface/one_sided`
  - Residue B5 (serine) is selected as an anchor residue; this residue will be part of any designed hydrogen bonding network.
  - Chain B is omitted: HBDesigner cannot choose residues from chain B in this calculation, except for those specified by `anchor_res`.
- Two-sided: `interface/two_sided`
  - This example does not have any special options. HBDesigner recognizes that multiple chains are provided and will automatically design a network between them.

The following examples use PDB files that are located in their individual folders:

- Lazy Symmetry: `interface/symm_lazy`
  - System of interest: [10GS](https://www.rcsb.org/structure/10GS), a dimer with C2 symmetry.
  - The only extra option needed for 'lazy' symmetry is `symm_chains`, which tells HBDesigner which chains should be symmetrized.
- Strict Symmetry: `interface/symm_strict`
  - System of interest: [5J0K](https://www.rcsb.org/structure/5J0K), a dimer with C2 symmetry. The provided structure has been cleaned.
  - The [Rosetta](https://github.com/RosettaCommons/rosetta) script [`make_symmdef_file.pl`](https://github.com/RosettaCommons/rosetta/blob/main/source/src/apps/public/symmetry/make_symmdef_file.pl) was used to create the provided `.symm` file.
    ```{important}
    The use of Rosetta requires a [license](https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md).
    ```
  - The `symm_chains` argument is also required here to tell HBDesigner which chains should be symmetrized.
59 changes: 59 additions & 0 deletions docs/source/examples/monomer.md
@@ -0,0 +1,59 @@
# Monomer Design
The types of examples in `monomer` include:
- [Unconditional Monomer Design](#unconditional_monomer_design)
- [Sequence Conditioning](#sequence_conditioning)
- [Virtual Guide Atom Conditioning](#virtual_guide_atom_conditioning)
- [Omitting Amino Acids](#omitting_amino_acids)
- [Pixi Example](#pixi_example)

All of these examples use the provided 1PGA PDB file. You can find out more about this structure [here](https://www.rcsb.org/structure/1PGA).

(unconditional_monomer_design)=
## Unconditional Monomer Design
You will want to perform unconditional monomer design if you:
- want to create networks on a monomer
- do not have specific amino acids that need to be used to create these networks

Unconditional monomer design only requires:
- `--pdb` (a PDB file of your input structure)
- `--n_res` (the number of residues in the completed network(s))
- `--n_samples` (the number of samples to generate before packing and scoring)
- `--n_workers` (the number of CPUs that can be used for packing)

You can run this example by `cd`-ing into `examples/monomer/unconditional` and running the shell script located there.

For the requested 200 samples, it is typical for only ~10 to 'succeed' based on the packing and scoring procedures. Each of the generated designs should have 3 non-glycine residues, as was specified by `n_res`.

(sequence_conditioning)=
## Sequence Conditioning
Sequence conditioning can be used to specify which amino acid(s) are used to form the hydrogen bonding network, including partial or ambiguous (e.g., "Either ASN or GLN") specifications. This is specified with a comma-separated list of amino acid groups.

Aside from the arguments discussed above, the only additional argument required for conditional sequence design is `guide_seq`. This option specifies what amino acids can be used when designing the hydrogen bonding network.

Three sequence conditioning examples are provided:
- `monomer/sequence_conditioning_full`:
  `S,N,T` is passed to `guide_seq`, resulting in designs that use exactly those three amino acids.
- `monomer/sequence_conditioning_partial`:
  `X,T,X` is passed to `guide_seq`, resulting in designs that contain at least one threonine residue. There are no restrictions on the other two residues.
- `monomer/sequence_conditioning_ambiguous`:
  `S,N|Q,T` is passed to `guide_seq`, resulting in designs that contain S, N **or** Q, and T, in any order.
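
The `guide_seq` syntax can be illustrated with a small parser and matcher:

- Commas separate positions.
- `|` separates alternatives at a position.
- `X` leaves a position unrestricted.
- A design matches if its network residues can be assigned to the groups in some order.

This is a minimal sketch, not HBDesigner's parser. It treats `X` as any of the 20 standard amino acids, although HBDesigner may restrict the choices further (for example through `omit-AA`).

```python
from itertools import permutations

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_guide_seq(guide_seq: str) -> list[set[str]]:
    """Turn e.g. "S,N|Q,T" into [{'S'}, {'N', 'Q'}, {'T'}]."""
    groups = []
    for token in guide_seq.split(","):
        options = {aa.strip().upper() for aa in token.split("|")}
        # "X" places no restriction on that position (assumed: any standard AA).
        groups.append(set(STANDARD_AA) if "X" in options else options)
    return groups


def satisfies(network_seq: str, guide_seq: str) -> bool:
    """True if the network residues fill every group, in any order."""
    groups = parse_guide_seq(guide_seq)
    if len(network_seq) != len(groups):
        return False
    # Brute force over orderings; cheap for networks of a handful of residues.
    return any(
        all(aa in group for aa, group in zip(order, groups))
        for order in permutations(network_seq.upper())
    )


print(satisfies("TSN", "S,N,T"))    # True: order does not matter
print(satisfies("NQD", "X,T,X"))    # False: no threonine
print(satisfies("QTS", "S,N|Q,T"))  # True: Q fills the N|Q position
```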

(virtual_guide_atom_conditioning)=
## Virtual Guide Atom Conditioning
Guide atoms can be used to specify an approximate location for the hydrogen bonding network. Given a list of residues in the input structure, HBDesigner will place a 'guide atom' at the centroid of the C-beta atoms of these residues.

You can find an example of this in `monomer/guide_atom_conditioning`. The only additional argument needed, besides those discussed in the [Unconditional Monomer Design section](#unconditional_monomer_design), is `guide_res`.
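
Placing the guide atom amounts to averaging C-beta coordinates. The input is reduced to a PolyGLY backbone, which has no C-beta atoms. The sketch below therefore takes two steps:

- It builds an idealized (virtual) C-beta from each residue's N, CA and C atoms, using the empirical coefficients popularized by ProteinMPNN.
- It takes the centroid of those positions.

Whether HBDesigner constructs its C-beta positions this way is an assumption. The helper names are hypothetical. The coordinates are placeholders rather than values from 1PGA.

```python
Vec = tuple[float, float, float]


def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def cross(u: Vec, v: Vec) -> Vec:
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def virtual_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Idealized C-beta from backbone N, CA, C (ProteinMPNN-style coefficients)."""
    b = sub(ca, n)
    cc = sub(c, ca)
    a = cross(b, cc)
    return tuple(
        -0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
        for i in range(3)
    )


def centroid(points: list[Vec]) -> Vec:
    """Arithmetic mean of a non-empty list of 3D points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Placeholder backbone atoms (N, CA, C) for two hypothetical guide residues.
backbones = [
    ((0.0, 1.4, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)),
    ((6.0, 1.4, 0.0), (6.0, 0.0, 0.0), (7.5, 0.0, 0.0)),
]
cbs = [virtual_cb(*bb) for bb in backbones]
guide_atom = centroid(cbs)
print(guide_atom)
```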

(omitting_amino_acids)=
## Omitting Amino Acids
The `monomer/omit_AA` example shows how to omit amino acids from hydrogen bonding networks designed by HBDesigner. Adding the `omit-AA` argument prevents any of the amino acids specified by the user from being included in the designs.

(pixi_example)=
## Pixi Example
An example has been included here to show how to run HBDesigner with [Pixi](https://pixi.prefix.dev/latest/). The same command line arguments can be used, but
```bash
pixi run --manifest-path=<path to pyproject.toml>
```
will need to be added before the rest of the command.

This is only relevant when pixi was used to install HBDesigner.
8 changes: 8 additions & 0 deletions docs/source/examples/postprocessing.md
@@ -0,0 +1,8 @@
# Postprocessing
We provide a helper script called `merge_networks.py` that attempts to naively combine output networks by checking for sequence overlap and clashes. This is NOT an exhaustive sweep, so it will not return ALL possible networks, but a subset from a rapid sampling procedure. For example usage, see `examples/postprocessing/run_merge.sh`.
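
The description above implies two pairwise checks before networks are combined: compatible sequences and no clashes. The naive sketch below makes several assumptions of its own, which `merge_networks.py` itself may not share:

- Networks are mappings from position to amino acid, plus lists of sidechain atom coordinates.
- Shared residues adopt the same conformation in both networks.
- The 3.0 Å clash cutoff is arbitrary.

The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def compatible(seq_a: dict, seq_b: dict) -> bool:
    """Positions present in both networks must carry the same amino acid."""
    return all(seq_a[pos] == seq_b[pos] for pos in seq_a.keys() & seq_b.keys())


def clashes(atoms_a: list, atoms_b: list, cutoff: float) -> bool:
    """True if any atom of one set lies closer than `cutoff` (in Å) to the other."""
    return any(dist(p, q) < cutoff for p in atoms_a for q in atoms_b)


def mergeable_pairs(networks: list, cutoff: float = 3.0):
    """Yield index pairs of networks that pass both naive checks."""
    for (i, a), (j, b) in combinations(enumerate(networks), 2):
        if not compatible(a["seq"], b["seq"]):
            continue
        shared = a["seq"].keys() & b["seq"].keys()
        # Assumption: shared residues adopt the same conformation in both
        # networks, so only the remaining atoms are tested for clashes.
        atoms_a = [xyz for pos, xyz in a["atoms"] if pos not in shared]
        atoms_b = [xyz for pos, xyz in b["atoms"] if pos not in shared]
        if not clashes(atoms_a, atoms_b, cutoff):
            yield i, j


# Each network: designed positions -> amino acid, plus (position, xyz) sidechain atoms.
networks = [
    {"seq": {"A10": "S", "A14": "T"}, "atoms": [("A10", (0.0, 0.0, 0.0)), ("A14", (4.0, 0.0, 0.0))]},
    {"seq": {"A14": "T", "A30": "N"}, "atoms": [("A14", (4.0, 0.0, 0.0)), ("A30", (9.0, 0.0, 0.0))]},
    {"seq": {"A10": "Q", "A50": "D"}, "atoms": [("A10", (0.0, 1.0, 0.0)), ("A50", (2.0, 0.0, 0.0))]},
]
print(list(mergeable_pairs(networks)))  # [(0, 1)]
```

In this toy data, networks 0 and 2 disagree at `A10`, and networks 1 and 2 have atoms 2.0 Å apart, so only the first pair survives.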

When doing one-sided interface design, we may want to graft our output network back onto the wildtype target for downstream modeling. We provide a helper script called `graft_seq.py` to do this. For example usage, see `examples/postprocessing/run_graft.sh`.

The most common use for HBDesigner outputs is as input for traditional sequence design. To do this, we use LigandMPNN to design the remaining sequence, keeping the HBDesigner network residues fixed. For an example of this, see `examples/postprocessing/run_mpnn.sh`.

These scripts pull from the resources in the other example directories.
8 changes: 5 additions & 3 deletions docs/source/index.rst
@@ -6,15 +6,17 @@
HBDesigner Documentation
========================

Welcome to the official documentation for `HBDesigner <https://github.com/Kuhlman-Lab/HBDesigner>`_.

HBDesigner is a machine learning tool that designs highly-connected hydrogen bonding networks based on user-defined constraints and an input protein backbone.

Make sure to read through the `HBDesigner README <https://github.com/Kuhlman-Lab/HBDesigner/blob/main/README.md>`_ for an overview of the tool, its features, and how to get started.

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   cli_arguments.rst
   installation_guide.md
   models.md
   examples/index.rst
44 changes: 8 additions & 36 deletions docs/source/installation_guide.md
@@ -1,40 +1,13 @@
# Installation Issues

See the [README](https://github.com/Kuhlman-Lab/HBDesigner/blob/main/README.md) for information about the various ways to install HBDesigner.


These installation methods assume a system running CUDA 12.8 or later, and they include options for CUDA 12.4. Your system may have a CUDA version below 12.4, or one too recent to be compatible with CUDA 12.8. In that case, read on for advice on adapting the environment files to your system.

(cuda_version)=
## Adjusting for the CUDA version available on your system
The various installation files (`env.yaml`, `pyproject.toml`) were created for systems running CUDA 12.8.

If your system has a more recent version of CUDA (e.g. 13.0) then these installation files will likely still work correctly. See the [Common Issues](#common-issues) section below if you are having trouble.

If your system has an older version of CUDA, then there are several dependencies you may need to change:
- The PyTorch source:
@@ -48,7 +21,7 @@ If your system has an older version of CUDA, then there are several dependencies
--extra-index-url https://download.pytorch.org/whl/cu124
--find-links https://data.pyg.org/whl/torch-2.6.0+cu124.html
```
You can find the correct link to use for your torch/CUDA combination [here](https://pytorch.org/get-started/previous-versions/).
- The torch version will need to be changed:
```bash
torch==2.8.0+cu128
@@ -77,15 +50,14 @@ If your system has an older version of CUDA, then there are several dependencies
- torch-cluster
- torch-scatter
- You may need to find a version of triton that works with your PyTorch/CUDA combination, or you might be able to remove the version requirement from the triton listing. For CUDA 12.4, `triton==3.2.0` works.
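
The CUDA-specific parts of these lines follow a visible pattern: `cu124` for CUDA 12.4 and `cu128` for CUDA 12.8. The sketch below generates them for a given torch/CUDA pair. The function names are hypothetical. Not every combination has published wheels, so check the result against the PyTorch previous-versions page linked above. After installation, `python -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())'` reports the CUDA version your PyTorch build targets and whether it can see a GPU.

```python
def cuda_tag(cuda_version: str) -> str:
    """Convert a CUDA version such as '12.4' into the 'cu124' tag used in wheel URLs."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def pip_options(torch_version: str, cuda_version: str) -> list[str]:
    """Build the index/find-links lines and the torch pin for one torch/CUDA pair."""
    tag = cuda_tag(cuda_version)
    return [
        f"--extra-index-url https://download.pytorch.org/whl/{tag}",
        f"--find-links https://data.pyg.org/whl/torch-{torch_version}+{tag}.html",
        f"torch=={torch_version}+{tag}",
    ]


for line in pip_options("2.6.0", "12.4"):
    print(line)
```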

---
(common-issues)=
## Common Issues

### ``ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found``
If you are seeing this, it is most likely because your system's CUDA version does not match the CUDA version expected by HBDesigner's dependencies. See [the previous section](#cuda_version) to modify your environment file for the CUDA version you have access to.

### `Segmentation Fault (core dumped)`
This can be caused by a variety of different things, including a CUDA version mismatch. If you haven't already modified your environment file to customize it for your particular system, see [the previous section](#cuda_version).
If you have updated these files, try adding the `dev` requirements to your installation.
10 changes: 10 additions & 0 deletions docs/source/models.md
@@ -0,0 +1,10 @@
# Model Weights

There are several checkpoint files in the `model_weights` directory that can be used with HBDesigner, or you can supply your own. You can use the `design_model`, `design_model_ckpt`, or `packing_model_ckpt` command line options to specify which checkpoint file you want to use for the various operations HBDesigner performs.

Here we will briefly describe the various checkpoint files and their uses.

- `design_020.pt`: Default design checkpoint, and the higher-noise of the two provided design models. It is best for sampling small networks (2-3 residues) and can give greater sample diversity.
- `design_002.pt`: Lower-noise design model. It is best for large networks (4+ residues) and is more precise than `design_020.pt`.
- `pack.pt`: Provided model for packing calculations.
- `pippack_model_x_ckpt.pt`: These three models were used for benchmarking purposes and are not recommended for general use.
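
The guidance above can be turned into a simple rule of thumb when scripting runs. This sketch assumes the checkpoints are read from the `model_weights` directory relative to the working directory. `default_design_checkpoint` is hypothetical. The resulting path would still need to be passed through whichever of the checkpoint options above applies to the design model.

```python
def default_design_checkpoint(n_res: int, weights_dir: str = "model_weights") -> str:
    """Suggest a design checkpoint path following the guidance above."""
    if n_res < 2:
        # A network needs at least two residues to form a sidechain-sidechain H-bond.
        raise ValueError("n_res must be at least 2")
    # design_020.pt (higher noise) for 2-3 residues, design_002.pt (lower noise) for 4+.
    name = "design_020.pt" if n_res <= 3 else "design_002.pt"
    return f"{weights_dir}/{name}"


print(default_design_checkpoint(3))  # model_weights/design_020.pt
print(default_design_checkpoint(5))  # model_weights/design_002.pt
```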