
Fixing does not appear to work for inferring valence and formal charge states of molecules from some PDB files #231

@Croydon-Brixton

Description


Thank you for this nice library!

I have a question about fixing 'broken' Mols by inferring the correct valences and charges, which I was hoping datamol could do for me.

If I load NAP structures from PDB entries (e.g. 5ocm) and simply transfer over the atoms and bond annotations (formal charges are not specified in this PDB file, so I assume a charge of 0), I end up with a structure like this:

```python
smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

# The correct SMILES would be:
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
```
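To see exactly what separates the two strings, here is a minimal sketch using plain RDKit (not datamol). It parses both SMILES without sanitization and lists the atoms whose formal charges differ. It assumes, as holds for these two particular strings, that both write the atoms in the same order, so atom indices line up one-to-one.

```python
from rdkit import Chem

smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

# Parse both without sanitization so the broken SMILES does not fail to load.
raw = Chem.MolFromSmiles(smi, sanitize=False)
ref = Chem.MolFromSmiles(smi_correct, sanitize=False)

# Both strings list atoms in the same order, so indices correspond directly.
diffs = [
    (a.GetIdx(), a.GetSymbol(), a.GetFormalCharge(), b.GetFormalCharge())
    for a, b in zip(raw.GetAtoms(), ref.GetAtoms())
    if a.GetFormalCharge() != b.GetFormalCharge()
]
print(diffs)  # [(4, 'N', 0, 1), (15, 'O', 0, -1)]
```

The only differences are the ring nitrogen of the nicotinamide moiety (atom 4) and one phosphate oxygen (atom 15).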

[Screenshot, 2024-09-02: the structure parsed from smi]

RDKit then fails to parse this SMILES because sanitization fails:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles(smi)  # fails: sanitization error, returns None
mol = Chem.MolFromSmiles(smi, sanitize=False)  # works and produces the structure above, which is an invalid molecule
```
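RDKit can also report what is wrong without raising: `Chem.DetectChemistryProblems` runs the sanitization checks on an unsanitized molecule and returns the problems it finds. A minimal sketch; the exact error types and messages depend on the RDKit version, so none are hard-coded here.

```python
from rdkit import Chem

smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

# With sanitization (the default), MolFromSmiles logs an error and returns None.
sanitized = Chem.MolFromSmiles(smi)
print(sanitized is None)

# Without sanitization the molecular graph is built and can be inspected.
raw = Chem.MolFromSmiles(smi, sanitize=False)
problems = Chem.DetectChemistryProblems(raw)
for p in problems:
    # Each entry reports an error type and a human-readable message.
    print(p.GetType(), p.Message())
```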

This molecule can be 'rescued' by assigning a positive charge to the nitrogen at atom index 4 (the ring nitrogen of the nicotinamide moiety), but the datamol pipeline unfortunately does not do this:

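```python
import datamol as dm
from rdkit import Chem

# Fix, sanitize and standardize with datamol
mol = Chem.MolFromSmiles(smi, sanitize=False)
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)
Chem.SanitizeMol(mol)  # the +1 charge on the ring nitrogen is never assigned
```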
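For comparison, the manual rescue described above can be sketched with plain RDKit (this is not a datamol feature): parse without sanitization, set the formal charges directly on the two atoms that differ from smi_correct, then sanitize. The hard-coded indices 4 and 15 are specific to this particular SMILES string.

```python
from rdkit import Chem

smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

mol = Chem.MolFromSmiles(smi, sanitize=False)
mol.GetAtomWithIdx(4).SetFormalCharge(1)    # ring nitrogen: [n] -> [n+]
mol.GetAtomWithIdx(15).SetFormalCharge(-1)  # phosphate oxygen: [O] -> [O-]

# Raises if the structure is still invalid.
Chem.SanitizeMol(mol)
# Perceive stereochemistry the way MolFromSmiles does after sanitizing.
Chem.AssignStereochemistry(mol, cleanIt=True, force=True)

print(Chem.MolToSmiles(mol) == Chem.CanonSmiles(smi_correct))
```

Only the nitrogen charge is needed for sanitization to succeed; the oxygen charge is set as well so the result matches smi_correct rather than carrying a radical on that oxygen.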

Is there a way to fix this structure computationally with datamol?
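One possible computational approach, sketched under loud assumptions: a hypothetical helper `assign_simple_charges` (not part of datamol or RDKit) that applies two hand-written rules to the unsanitized molecule before sanitizing. Rule 1 charges an aromatic nitrogen with three neighbours in an all-aromatic six-membered ring (pyridinium-like); rule 2 charges a bracket oxygen with no hydrogens and a single single bond. These rules are tuned to this NAP example only; they are not a general valence/charge perception algorithm and will be wrong for other chemistries.

```python
from rdkit import Chem


def assign_simple_charges(mol: Chem.Mol) -> Chem.Mol:
    """Hypothetical heuristic: charge pyridinium-type N and bare terminal O, then sanitize."""
    mol = Chem.Mol(mol)  # work on a copy
    # Smallest set of smallest rings, computed from connectivity alone.
    six_rings = [set(r) for r in Chem.GetSymmSSSR(mol) if len(r) == 6]
    for atom in mol.GetAtoms():
        if atom.GetFormalCharge() != 0 or atom.GetNumExplicitHs() != 0:
            continue
        idx = atom.GetIdx()
        in_aromatic_six_ring = any(
            idx in ring and all(mol.GetAtomWithIdx(i).GetIsAromatic() for i in ring)
            for ring in six_rings
        )
        # Rule 1: aromatic N with three neighbours in an aromatic 6-ring -> [n+]
        if (atom.GetAtomicNum() == 7 and atom.GetIsAromatic()
                and atom.GetDegree() == 3 and in_aromatic_six_ring):
            atom.SetFormalCharge(1)
        # Rule 2: bracket O (no implicit H) with exactly one single bond -> [O-]
        elif (atom.GetAtomicNum() == 8 and atom.GetNoImplicit()
              and atom.GetDegree() == 1
              and atom.GetBonds()[0].GetBondType() == Chem.BondType.SINGLE):
            atom.SetFormalCharge(-1)
    Chem.SanitizeMol(mol)  # raises if the rules were not enough
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    return mol


smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

fixed = assign_simple_charges(Chem.MolFromSmiles(smi, sanitize=False))
print(Chem.MolToSmiles(fixed))
```

Rule 1 is restricted to six-membered rings on purpose: the adenine N9 also has three neighbours but sits in a five-membered (pyrrole-type) ring and must stay neutral.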
