modelblocks-org/data-module-template

Modelblocks data module template

A template for modular data workflows built with snakemake, part of the Modelblocks toolset.

Tip

Looking for general information on Modelblocks? Check the Modelblocks website and our documentation and guidelines.

Resources

To familiarise yourself with Modelblocks data modules:

  • Check the auto-generated minimal example. You can find it in tests/integration/Snakefile.
  • Read about snakemake modularisation in their documentation.

Features

  • Standardised layout compliant with the snakemake workflow catalogue's listing requirements, so that modules are automatically included in the catalogue's listings once published. Read more about those requirements here.
  • Standardised input-output structure across modules:
    • resources/: files needed for the module's processes.
      • user/: files that should be provided by users. Document them well!
      • automatic/: files that the module downloads or prepares in intermediate steps.
    • results/: files generated by the module's algorithms that are relevant to the user.
  • Pre-made integration setup for your module.
    • Continuous Integration (CI) settings, ready for pre-commit.ci.
    • GitHub actions to automate chores during pull requests and releases.
    • Premade pytest setup.
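
The input-output structure listed above can be pictured by recreating it in a scratch directory. This is only a sketch: the name `demo_module` is hypothetical, and the real template generates these folders (and many more files) for you.

```shell
# Hypothetical sketch: recreate the standard module layout in a scratch directory.
# "demo_module" is an illustrative name, not something the template defines.
module_dir="$(mktemp -d)/demo_module"

mkdir -p "$module_dir/resources/user"       # files that users must provide
mkdir -p "$module_dir/resources/automatic"  # files the module downloads or prepares
mkdir -p "$module_dir/results"              # outputs that are relevant to the user

# List the resulting directories, relative to the module root.
layout="$(cd "$module_dir" && find . -mindepth 1 -type d | sort)"
echo "$layout"
```

Keeping user-provided and automatically prepared files in separate folders makes it clear which inputs a module expects from the outside.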

Important

A few things to be aware of.

  • Modules do not work like regular snakemake workflows.
    • The primary way to test them should be external (calling module:, passing resources, and requesting results). Check the pre-made example in tests/integration for more info.
    • Internal access (e.g., calling the all: rule) may not work, as the module may not have the necessary resources/ to execute properly.
  • Please be sure to maintain the following files to ensure Modelblocks compatibility:
    • INTERFACE.yaml: a simple description of the module's input/output structure.
    • config/config.yaml: a basic functioning example of how to configure this module.
    • workflow/internal/config.schema.yaml: the module's configuration schema, used by snakemake for validation.
    • AUTHORS / CITATION.cff / LICENSE: licensing and attribution of this module's code and methods.
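
To illustrate what "external" use means, the sketch below writes a parent Snakefile that loads a module with snakemake's `module:` directive and requests one of its results. The module name `demo_module`, the snakefile path, the config key, the prefix, and the output file name are all assumptions made for illustration. How user resources are passed in is not shown here; the pre-made example in tests/integration is the authoritative reference.

```shell
# Hypothetical sketch: write a parent Snakefile that uses a data module externally.
# The module name, snakefile path, config key, prefix, and file names are assumptions.
workdir="$(mktemp -d)"

cat > "$workdir/Snakefile" <<'EOF'
# Target rule defined first so it is the default target.
# The requested file name is illustrative only.
rule all:
    input:
        "demo_module/results/output.csv"

# Load the module's workflow instead of running it directly.
module demo_module:
    snakefile:
        "path/to/demo_module/workflow/Snakefile"
    config:
        config["demo_module"]
    prefix:
        "demo_module"

# Import all of the module's rules under a namespaced prefix.
use rule * from demo_module as demo_module_*
EOF

echo "Wrote $workdir/Snakefile"
```

With a `prefix:` set, the module's relative paths are resolved under that folder, which is why the requested result sits under `demo_module/results/` in this sketch.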

How to use this template

We require pixi as a package manager. Once installed, do the following:

  1. Install the templater tool copier.

    pixi global install copier
  2. Use copier to build a project with this template. A new module will be created in the directory you chose. We recommend you use the module name as the directory name.

    copier copy https://github.com/modelblocks-org/data-module-template.git ./<module_name>

    Tip: If your terminal cannot find copier, you may need to add ~/.pixi/bin to your PATH variable.

  3. Answer some questions so we can pre-fill licensing, citation files, etc.

  4. Initialise the pixi project environment of your new module.

    cd ./<module_name> # navigate to the new project
    pixi install --all  # install the project environment
  5. Extra: run the auto-generated example module!

    cd tests/integration  # go to the integration test...
    pixi run snakemake --use-conda  # run it!
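
Once the project exists, it can be handy to confirm that the compatibility files named in the "Important" section are still present. The helper below is a minimal sketch, not part of the template: `check_module_files` is a hypothetical function name, and it only checks that each file exists, not that its contents are valid.

```shell
# Hypothetical helper: report which compatibility files exist in a module directory.
# Pass the module path as the first argument; defaults to the current directory.
check_module_files() {
    dir="${1:-.}"
    missing=0
    for f in INTERFACE.yaml config/config.yaml workflow/internal/config.schema.yaml \
             AUTHORS CITATION.cff LICENSE; do
        if [ -f "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every file was found.
    [ "$missing" -eq 0 ]
}
```

Calling `check_module_files ./<module_name>` prints one line per file and returns a non-zero status if any are missing, so it could also be used as a simple pre-release check.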

About

A template for modular data workflows, making energy systems analysis more understandable and transparent!
