Production-grade cheminformatics workflow nodes for the Continuum platform, powered by RDKit. Designed for computational chemistry, medicinal chemistry, and drug discovery teams who need scalable, reproducible molecular processing pipelines.
From SMILES to insights -- parse, standardize, fingerprint, filter, react, and visualize molecules at scale, all within declarative, versionable Continuum workflows.
| Challenge | How Continuum Cheminformatics Helps |
|---|---|
| Compound library curation | Standardize structures with salt stripping, aromatization, canonical SMILES, and structure normalization nodes |
| Hit triage & filtering | Apply PAINS, BRENK, NIH, and ZINC structural alert catalogs; filter by substructure, functional groups, or Lipinski properties |
| SAR analysis | Decompose molecules into Murcko scaffolds and R-groups; compute 41 molecular descriptors in a single node |
| Virtual screening | Generate 9 fingerprint types, compute Tanimoto/Dice similarity, and select diverse subsets via MaxMin picking |
| Reaction enumeration | Run one- and two-component reactions using SMARTS-based reaction templates at scale |
| 3D modeling prep | Generate conformers, optimize geometries with MMFF94/UFF force fields, and align molecules in 3D |
| Reproducibility | Every step is a versioned workflow node with explicit configuration -- no hidden scripts or ad-hoc notebooks |
Both bit-based and count-based representations are supported:
| Algorithm | Type | Typical Use Case |
|---|---|---|
| Morgan (ECFP) | Circular | General-purpose similarity, QSAR, virtual screening |
| FeatMorgan (FCFP) | Circular (pharmacophoric) | Pharmacophore-aware similarity |
| AtomPair | Topological | Scaffold hopping, diversity analysis |
| Torsion | Topological | Conformational similarity |
| RDKit | Path-based | Substructure-oriented similarity |
| Avalon | Substructure | Fast screening, patent searching |
| Layered | Path-based (layered) | Similarity with tunable abstraction layers |
| MACCS | Structural keys (166-bit) | Quick pharmacophore screening |
| Pattern | Substructure | Substructure matching fingerprints |
Compute any combination of descriptors in a single pass:
| Category | Descriptors |
|---|---|
| Physicochemical | AMW, ExactMW, SlogP, SMR, LabuteASA, TPSA |
| Hydrogen Bonding | NumLipinskiHBA, NumLipinskiHBD, NumHBD, NumHBA |
| Structural Counts | NumRotatableBonds, NumAmideBonds, NumHeteroAtoms, NumHeavyAtoms, NumAtoms |
| Stereochemistry | NumStereocenters, NumUnspecifiedStereocenters |
| Ring Systems | NumRings, NumAromaticRings, NumSaturatedRings, NumAliphaticRings, NumAromaticHeterocycles, NumSaturatedHeterocycles, NumAliphaticHeterocycles, NumAromaticCarbocycles, NumSaturatedCarbocycles, NumAliphaticCarbocycles |
| Complexity | FractionCSP3 |
| Connectivity (Chi) | Chi0v, Chi1v, Chi2v, Chi3v, Chi4v, Chi1n, Chi2n, Chi3n, Chi4n |
| Shape (Kier) | HallKierAlpha, kappa1, kappa2, kappa3 |
The Molecule Catalog Filter node screens compounds against established medicinal chemistry filter sets:
| Catalog | Purpose |
|---|---|
| PAINS | Pan-Assay Interference Compounds -- flags promiscuous HTS artifacts |
| BRENK | Brenk structural alerts -- identifies unstable or reactive motifs |
| NIH | NIH MLSMR exclusion filters |
| ZINC | ZINC database drug-likeness filters |
Nodes for computing molecular properties and descriptors.
| Node | Description |
|---|---|
| Calculate Charges | Computes Gasteiger partial charges per atom using RDKit |
| Descriptor Calculation | Calculates up to 41 molecular descriptors from SMILES strings in a single pass |
Nodes for parsing and converting between molecular representations (SMILES, InChI, SDF, SMARTS).
| Node | Description |
|---|---|
| Canonical SMILES | Converts SMILES strings to their canonical form using RDKit |
| InChI to Molecule | Converts InChI strings to canonical SMILES using RDKit |
| Molecule to InChI | Converts SMILES strings to InChI and InChI Key using RDKit |
| Molecule Writer | Converts SMILES strings to different molecule formats (SDF, SMARTS, etc.) using RDKit |
| SMILES Parser | Parses molecule strings (SMILES, SDF, SMARTS) into canonical SMILES with error routing |
Advanced nodes under active development for specialized workflows.
| Node | Description |
|---|---|
| Adjust Query Properties | Converts molecules to flexible substructure queries by adjusting atom and bond query properties |
| Molecule Catalog Filter | Filters molecules against structural alert catalogs (PAINS, BRENK, NIH, ZINC) using RDKit FilterCatalog |
| R-Group Decomposition | Decomposes molecules into a core scaffold and R-groups using RDKit RGroupDecomposition |
| Structure Normalizer | Applies configurable normalization steps to standardize molecule SMILES representations |
Nodes for generating molecular fingerprints and computing chemical similarity.
| Node | Description |
|---|---|
| Count-Based Fingerprint | Generates count-based fingerprints (Morgan, AtomPair, Torsion) as JSON {index: count} maps |
| Diversity Picker | Selects a diverse subset of molecules using MaxMin diversity picking with RDKit fingerprints |
| Fingerprint | Generates bit-based fingerprints supporting 9 algorithms (Morgan, FeatMorgan, AtomPair, Torsion, RDKit, Avalon, Layered, MACCS, Pattern) |
| Fingerprint Similarity | Computes pairwise Tanimoto or Dice similarity from two SMILES columns |
Nodes for molecular fragmentation, scaffold extraction, and decomposition.
| Node | Description |
|---|---|
| Molecule Fragmenter | Fragments molecules using Murcko decomposition and outputs scaffold plus side chains |
| Molecule Extractor | Splits multi-component molecules (e.g., salts, mixtures) into individual fragments, one row per fragment |
| Murcko Scaffold | Extracts the Murcko scaffold (core ring structure) from molecules using RDKit |
Nodes for 3D coordinate generation, conformer sampling, force field optimization, and molecular alignment.
| Node | Description |
|---|---|
| Add Conformers | Generates multiple 3D conformers using RDKit distance geometry (ETKDG) |
| Add Coordinates | Generates 2D or 3D coordinates for molecules from SMILES strings |
| Open 3D Alignment | Aligns query molecules against reference molecules in 3D using Open3DAlign |
| Optimize Geometry | Optimizes 3D molecular geometry using MMFF94 or UFF force fields |
| RMSD Filter | Filters conformers/molecules using greedy RMSD-based diversity selection |
Nodes for chemical structure standardization and modification.
| Node | Description |
|---|---|
| Add Hydrogens | Adds explicit hydrogens to molecules using RDKit |
| Aromatize | Applies aromaticity perception to SMILES strings using RDKit |
| Kekulize | Converts aromatic SMILES to Kekule form (explicit double bonds) using RDKit |
| Remove Hydrogens | Removes explicit hydrogens from SMILES strings using RDKit |
| Salt Stripper | Removes salt/counterion fragments from SMILES strings using RDKit |
Nodes for combinatorial chemistry and reaction enumeration using SMARTS templates.
| Node | Description |
|---|---|
| Chemical Transformation | Applies chemical transformations iteratively to molecules using reaction SMARTS |
| One Component Reaction | Runs one-component reactions (e.g., deprotection, oxidation) on reactant molecules |
| Two Component Reaction | Runs two-component reactions (e.g., amide coupling, Suzuki) on pairs of reactant molecules |
Nodes for molecular visualization and depiction.
| Node | Description |
|---|---|
| RDKit to SVG | Renders molecules as publication-quality SVG images using RDKit MolDraw2DSVG |
Nodes for substructure searching, pattern matching, and maximum common substructure analysis.
| Node | Description |
|---|---|
| Functional Group Filter | Filters molecules by functional group presence and count using SMARTS patterns |
| Maximum Common Substructure | Finds the MCS among all input molecules -- useful for SAR and lead series identification |
| Molecule Substructure Filter | Filters molecules against a table of SMILES-based substructure queries, splitting matches and non-matches |
| Substructure Counter | Counts substructure matches for each molecule against a table of query patterns |
| Substructure Filter | Filters molecules by SMARTS substructure query, splitting matches and non-matches into separate output ports |
Nodes for validation and quality assurance of molecular data pipelines.
| Node | Description |
|---|---|
| SDF Difference Checker | Compares two tables of molecules row by row, identifying SMILES, property, and coordinate differences |
Below are common cheminformatics pipeline patterns you can build by connecting these nodes:
SMILES Parser --> Salt Stripper --> Structure Normalizer --> Canonical SMILES --> Descriptor Calculation
+--> Molecule Catalog Filter (PAINS) --> [Clean Hits]
SMILES Parser --> Descriptor Calc --> |
+--> Functional Group Filter --> [Flagged Compounds]
SMILES Parser --> Fingerprint (Morgan) --> Fingerprint Similarity --> Diversity Picker --> [Diverse Subset]
[Reactant A] --\
+--> Two Component Reaction (Suzuki SMARTS) --> Descriptor Calculation --> [Product Library]
[Reactant B] --/
SMILES Parser --> Add Coordinates (3D) --> Add Conformers --> Optimize Geometry (MMFF94) --> RMSD Filter --> [Diverse Conformers]
continuum-feature-cheminformatics/
├── features/
│ └── continuum-feature-rdkit/ # RDKit cheminformatics nodes
│ ├── build.gradle.kts
│ ├── lib/org.RDKit.jar # RDKit Java/JNI bindings
│ └── src/
│ ├── main/kotlin/.../node/ # 38 node implementations
│ └── test/kotlin/.../node/ # Unit tests for every node
├── worker/ # Spring Boot worker application
├── docker/ # Local dev infrastructure (Temporal, Kafka, MinIO)
├── test-workflows/ # Automated test workflows organized by category
│ └── rdkit/
│ ├── 01-converters/
│ ├── 02-modifiers/
│ ├── 03-calculators/
│ ├── 04-geometry/
│ ├── 05-fingerprints/
│ ├── 06-fragments/
│ ├── 07-searching/
│ ├── 08-reactions/
│ ├── 09-rendering/
│ ├── 10-experimental/
│ └── 11-testing/
├── settings.gradle.kts
└── gradle.properties
| Component | Technology | Purpose |
|---|---|---|
| Cheminformatics Engine | RDKit (via Java/JNI bindings) | Molecular parsing, fingerprinting, descriptors, reactions, 3D geometry |
| Workflow Orchestration | Temporal | Distributed, fault-tolerant workflow execution |
| Event Streaming | Apache Kafka | Progress reporting, workflow event bus |
| Data Storage | Apache Parquet on S3/MinIO | Columnar storage for tabular molecular data |
| Runtime | Spring Boot + JDK 21 | Worker application framework |
| Build | Gradle (Kotlin DSL) | Multi-module build with Jib containerization |
- JDK 21 -- Eclipse Temurin recommended
- Docker & Docker Compose -- for local infrastructure
- GitHub PAT with
read:packagesscope -- for downloading Continuum dependencies
# 1. Clone the repository
git clone <repository-url>
cd continuum-feature-cheminformatics
# 2. Set GitHub credentials (for Continuum dependency resolution)
export GITHUB_USERNAME=your-username
export GITHUB_TOKEN=ghp_your-token
# 3. Start local infrastructure
cd docker && docker compose up -d && cd ..
# 4. Build the project
./gradlew build
# 5. Run the worker
./gradlew :worker:bootRunYour cheminformatics nodes are now registered with Temporal and ready to execute workflows.
For detailed setup instructions and how to create custom nodes, see the Continuum Feature Template.
| Repository | Description |
|---|---|
| Continuum | Core platform -- API server, worker framework, shared libraries |
| continuum-workbench | Browser-based workflow editor (Eclipse Theia + React Flow) |
| continuum-feature-base | Base analytics nodes -- data transforms, REST, scripting |
| continuum-feature-ai | AI/ML nodes -- LLM fine-tuning with Unsloth + LoRA |
| continuum-feature-template | Template for scaffolding new feature modules |