[WIP] refactor: module support by adthrasher · Pull Request #318 · stjudecloud/workflows

adthrasher · 2026-05-29T13:51:55Z

DO NOT MERGE

Demonstration PR for upcoming module support in WDL 1.4.

adthrasher · 2026-05-29T14:11:03Z

@a-frantz after some discussion with @claymcleod I mocked up some example modules. I intentionally kept it simple, but I covered both a defined entry point and a default entry point (index.wdl). I'll be interested to see how things work once Sprocket has the more fully featured module support.

adthrasher · 2026-06-01T15:14:32Z

Assuming I've understood the module spec properly, I've created a number of examples here.

* The "per-tool" module: This is what I've done with `fq` and `samtools`. This seems like it will be a huge maintenance burden, as each tool definition gets moved to a folder and has an accompanying `module.json`. So it will essentially double the number of files in the repo.
* Grouping tools: This is what I did with the new `alignment` subdirectory. This would enable you to do something like `import alignment/bwa` and `import alignment/star` to get the precise aligner.
* I didn't do this, but we could simply make `tools` a module. Then you'd do something like `import tools/sambamba` to get individual tools.

For workflows, I think the organization is obvious for something like DNAseq or RNAseq. We'd have a single module with entry points for FASTQ and BAM inputs. This has the advantage that it also hides the core workflows from end users. I'm less clear on how we should organize the other workflows (e.g. `bam-to-fastqs`). I have a single example of a standalone module for a workflow.

I also didn't touch the data structures folder. I suspect that should end up as a single module with various sub-paths.

I'm also not sure how the versioning works. Clay's spec says that git-based dependencies are identified by git tags, so we'd have to rethink how we've been doing releases and follow that specific tag format (e.g. `<module>/<version>`).
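To make the tag question concrete, here is a minimal sketch of how release tooling might split tags of the form `<module>/<version>`. The only thing taken from the discussion is that tag shape. The function name, the `ModuleTag` type, and the choice to treat everything after the last `/` as the version are assumptions for illustration, not something the spec is known to mandate.

```python
from typing import NamedTuple, Optional


class ModuleTag(NamedTuple):
    """Hypothetical container for a parsed release tag."""
    module: str
    version: str


def parse_module_tag(tag: str) -> Optional[ModuleTag]:
    """Split a tag assumed to look like '<module>/<version>'.

    Returns None for tags that do not fit that shape, such as a
    repo-wide tag like 'v1.0.0'.
    """
    # Split on the LAST slash so a nested module name such as
    # 'alignment/bwa' stays intact (an assumption, not spec behavior).
    module, sep, version = tag.rpartition("/")
    if not sep or not module or not version:
        return None
    return ModuleTag(module, version)


if __name__ == "__main__":
    for tag in ["samtools/1.2.0", "alignment/bwa/0.3.1", "v1.0.0"]:
        print(tag, "->", parse_module_tag(tag))
```

If the repo moved to per-module tags, a helper like this could separate module releases from any existing repo-wide tags during a transition.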

adthrasher · 2026-06-01T16:10:14Z

+            "path": "./dnaseq-standard-fastq.wdl"
+        },
+        "bam": {
+            "path": "./dnaseq-standard.wdl"


I'm not sure this is right. I think it needs a source object wrapping it.

claymcleod · 2026-06-01T16:29:42Z

I think you've got all of this right. My recommendation is to try the per-tool model (i.e., the first model) to start out, mainly because it will enable really rich discovery of the modules when we write the registry at OpenWDL (e.g., the metadata about which tools exist in the module, and at what versions and licenses, will be easiest to reason about in this model).

I think the second solution could work as well, especially if the maintenance burden of the first becomes too high.

I would stay away from the third version, as I feel it pulls far too much information into one module.json.

adthrasher · 2026-06-01T17:50:38Z

I think I'd be on board with #1 if you could have a flat `tools` directory (as we do now) with a single `module.json`. I'd envision that `module.json` containing, essentially, an array of module declarations. It seems strange to me to have a directory per tool with either a generic `index.wdl` or a `<tool>.wdl` and a `module.json` as the only entries. It feels quite cluttered.
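As a rough illustration of the flat-directory idea, the sketch below builds one manifest holding an array of per-tool declarations from a list of filenames. This is not the module spec's schema: the `modules` and `name` keys and the function name are hypothetical, and only the `"path": "./<file>.wdl"` shape mirrors the diff fragment earlier in this PR.

```python
import json
from pathlib import PurePosixPath


def build_flat_manifest(filenames):
    """Build a hypothetical single manifest for a flat tools directory.

    Each .wdl file becomes one declaration in a 'modules' array
    (hypothetical key names, not taken from the module spec).
    """
    modules = []
    for filename in sorted(filenames):
        path = PurePosixPath(filename)
        if path.suffix != ".wdl":
            continue  # skip READMEs and other non-WDL files
        modules.append({"name": path.stem, "path": f"./{path.name}"})
    return {"modules": modules}


if __name__ == "__main__":
    manifest = build_flat_manifest(["samtools.wdl", "fq.wdl", "README.md"])
    print(json.dumps(manifest, indent=2))
```

The point of the sketch is only that one file could carry every tool's declaration, so the directory keeps its current shape instead of gaining a folder and a `module.json` per tool.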

adthrasher added 3 commits May 29, 2026 09:50

refactor: module support

14eae45

chore: update imports

3efae8e

refactor: arriba with default entrypoint

d4f8a55

adthrasher added 2 commits May 29, 2026 12:39

refactor: combine DNAseq workflows into a single module

b67bf1d

chore: add missing module.json

35e791e

adthrasher wants to merge 5 commits into main from refactor/wdl_modules
