A template for building modular data workflows as `snakemake` modules, part of the Modelblocks toolset.
> [!TIP]
> Looking for general information on Modelblocks? Check the Modelblocks website and our documentation and guidelines.
To familiarise yourself with Modelblocks data modules:
- Check the auto-generated minimal example in `tests/integration/Snakefile`.
- Read about `snakemake` modularisation in the Snakemake documentation.
This template includes:
- A standardised layout compliant with the snakemake workflow catalogue's listing requirements, so modules built from it can be included in the catalogue automatically once published. Read more about those requirements in the catalogue's documentation.
- A standardised input-output structure across modules:
  - `resources/`: files needed for the module's processes.
    - `user/`: files that should be provided by users. Document them well!
    - `automatic/`: files that the module downloads or prepares in intermediate steps.
  - `results/`: files generated by the module's algorithms that are relevant to the user.
- A pre-made integration test setup for your module, in `tests/integration`.
- Continuous Integration (CI) settings, ready for pre-commit.ci.
- GitHub Actions to automate chores during pull requests and releases.
- A pre-made `pytest` setup.
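The standardised input-output structure can be checked with a few lines of Python. The sketch below is a hypothetical helper (`missing_layout_dirs` and `EXPECTED_DIRS` are illustrative names, not part of the template). It assumes `user/` and `automatic/` sit inside `resources/`, and that you point it at the directory where the module's files end up.

```python
from pathlib import Path

# Hypothetical helper, not part of the template. It assumes the layout
# described above: user/ and automatic/ nested inside resources/, plus results/.
EXPECTED_DIRS = (
    Path("resources", "user"),       # files provided by users
    Path("resources", "automatic"),  # files the module downloads or prepares
    Path("results"),                 # outputs relevant to the user
)


def missing_layout_dirs(root="."):
    """Return the expected directories (as POSIX strings) missing under root."""
    base = Path(root)
    return [d.as_posix() for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

For example, `missing_layout_dirs("path/to/run")` returns an empty list when all three directories exist, and otherwise lists the missing ones, such as `"resources/user"`.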
> [!IMPORTANT]
> A few things to be aware of:
>
> - Modules do not work like regular `snakemake` workflows.
>   - The primary way to test them should be external: call them with `module:`, pass them resources, and request their results. Check the pre-made example in `tests/integration` for more info.
>   - Internal access (e.g., calling the `all:` rule) may not work, as the module may not have the `resources/` it needs to execute properly.
> - Please be sure to maintain the following files to ensure Modelblocks compatibility:
>   - `INTERFACE.yaml`: a simple description of the module's input/output structure.
>   - `config/config.yaml`: a basic, functioning example of how to configure this module.
>   - `workflow/internal/config.schema.yaml`: the module's configuration schema, used by `snakemake` for validation.
>   - `AUTHORS` / `CITATION.cff` / `LICENSE`: licensing and attribution of this module's code and methods.
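To catch a forgotten compatibility file early, for example in CI, a presence check over the files listed above could look like the following. This is a hypothetical sketch (`missing_required_files` and `REQUIRED_FILES` are illustrative names, not part of the template); it assumes `AUTHORS` and `LICENSE` have no file extension and that it runs from the module's root directory. It only checks that the files exist; it does not validate their contents.

```python
from pathlib import Path

# Files this README asks module maintainers to keep up to date.
# The helper itself is a hypothetical sketch, not part of the template.
REQUIRED_FILES = (
    "INTERFACE.yaml",                        # input/output description
    "config/config.yaml",                    # working configuration example
    "workflow/internal/config.schema.yaml",  # schema used for validation
    "AUTHORS",                               # assumed to have no extension
    "CITATION.cff",
    "LICENSE",                               # assumed to have no extension
)


def missing_required_files(module_dir="."):
    """Return the required files that are absent from module_dir."""
    base = Path(module_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```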
We require `pixi` as a package manager. Once it is installed, follow these steps:
- Install the templater tool `copier`: `pixi global install copier`
- Use `copier` to build a project with this template. A new module will be created in the directory you choose; we recommend using the module name as the directory name: `copier copy https://github.com/modelblocks-org/data-module-template.git ./<module_name>`
  > [!TIP]
  > If your terminal does not have access to `copier`, you may need to update your `PATH` variable to include `~/.pixi/bin`.
- Answer some questions so we can pre-fill licensing, citation files, etc.
- Initialise the `pixi` project environment of your new module:
  - `cd ./<module_name>` (navigate to the new project)
  - `pixi install --all` (install the project environment)
- Extra: run the auto-generated example module!
  - `cd tests/integration` (go to the integration test)
  - `pixi run snakemake --use-conda` (run it!)
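If you prefer to drive the example from Python, for instance from the `pytest` setup, the last step can be wrapped with `subprocess`. This is a hypothetical sketch (`integration_command` and `run_integration` are illustrative names, not part of the template). It assumes `pixi` is on your `PATH` and that `project_dir` points at the module's root directory.

```python
import subprocess
from pathlib import Path


def integration_command(dry_run=False):
    """Build the command from the last step above as an argument list."""
    cmd = ["pixi", "run", "snakemake", "--use-conda"]
    if dry_run:
        # Snakemake's --dry-run lists the jobs without executing them.
        cmd.append("--dry-run")
    return cmd


def run_integration(project_dir=".", dry_run=False):
    """Run the example from tests/integration; raise CalledProcessError on failure."""
    workdir = Path(project_dir) / "tests" / "integration"
    return subprocess.run(integration_command(dry_run), cwd=workdir, check=True)
```

A quick smoke test could call `run_integration(dry_run=True)`, which only asks Snakemake to plan the jobs; a full run with `dry_run=False` executes the module and may take considerably longer.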