Skip to content

Add Snakemake workflow for QA pipelines#2

Draft
mvdoc wants to merge 2 commits intomasterfrom
claude/setup-snakemake-pipelines-SiCGC
Draft

Add Snakemake workflow for QA pipelines#2
mvdoc wants to merge 2 commits intomasterfrom
claude/setup-snakemake-pipelines-SiCGC

Conversation

@mvdoc
Copy link
Owner

@mvdoc mvdoc commented Jan 26, 2026

Create Snakemake pipelines for orchestrating QA analysis:

  • Motion plots: generates traces, FD plots, and violin plots from fMRIprep
  • tSNR plots: computes tSNR volumes then generates mosaics and plots
  • ISC plots: computes ISC then generates violin and surface plots

Each pipeline properly handles dependencies (tSNR/ISC computation before plotting).
Added snakemake>=8.0.0 as optional 'workflow' dependency.

Create Snakemake pipelines for orchestrating QA analysis:
- Motion plots: generates traces, FD plots, and violin plots from fMRIprep
- tSNR plots: computes tSNR volumes then generates mosaics and plots
- ISC plots: computes ISC then generates violin and surface plots

Each pipeline properly handles dependencies (tSNR/ISC computation before plotting).
Added snakemake>=8.0.0 as optional 'workflow' dependency.
Copilot AI review requested due to automatic review settings January 26, 2026 05:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Snakemake-based workflow to orchestrate existing QA scripts (motion, tSNR, ISC) with dependency ordering and documentation, plus an optional dependency for installing Snakemake.

Changes:

  • Introduces a new workflow/Snakefile with rules for generating QA outputs (including compute→plot dependencies for tSNR/ISC).
  • Adds workflow configuration (workflow/config/config.yaml) and usage docs (workflow/README.md).
  • Adds snakemake>=8.0.0 as an optional workflow extra in pyproject.toml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 18 comments.

File Description
workflow/config/config.yaml Adds toggles for which pipelines run and subject filtering settings.
workflow/Snakefile Implements Snakemake rules to run QA scripts and manage pipeline dependencies/targets.
workflow/README.md Documents installation and how to run workflow targets/cleanup.
pyproject.toml Adds workflow optional dependency group including Snakemake.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

snakemake tsnr_plots --cores 1 --forcerun
"""

import os
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os is imported but not used in this Snakefile. Removing unused imports helps keep the workflow file clean and avoids lint noise.

Suggested change
import os

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +17

# Whether to force regeneration of outputs that already exist
force: false
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force is defined in the workflow config but is not used anywhere in the Snakefile. Either wire it up (e.g., pass --force to scripts that support it like qa-plot-tsnr.py, and/or document the intended behavior) or remove it to avoid a misleading option.

Suggested change
# Whether to force regeneration of outputs that already exist
force: false

Copilot uses AI. Check for mistakes.
params:
subjects=get_subjects_arg(),
shell:
"""
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tsnr_plots rule writes its log to {TSNR_DIR}/logs/... but does not ensure that directory exists. If compute_tsnr was run previously (so .tsnr_computed.done exists) but the logs directory is missing, this rule will fail when tee tries to open the log file. Add a mkdir -p for the logs directory (and, if needed, the output directory) in this rule’s shell command.

Suggested change
"""
"""
mkdir -p {TSNR_DIR}/logs

Copilot uses AI. Check for mistakes.
params:
subjects=get_subjects_arg(),
shell:
"""
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motion_report rule writes its log to {MOTION_DIR}/logs/... but does not ensure that directory exists. Add a mkdir -p for the logs directory in this rule to avoid failures when running the rule in isolation (e.g., after cleaning logs but keeping .motion_plots.done).

Suggested change
"""
"""
mkdir -p {MOTION_DIR}/logs

Copilot uses AI. Check for mistakes.
Comment on lines +198 to +205
log:
TSNR_DIR / "logs" / "tsnr_reports.log",
params:
subjects=get_subjects_arg(),
shell:
"""
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
"""
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tsnr_report rule writes its log to {TSNR_DIR}/logs/... but does not create that directory. Add a mkdir -p for {TSNR_DIR}/logs to prevent failures when the directory is missing.

Copilot uses AI. Check for mistakes.
shell:
"""
mkdir -p {MOTION_DIR}/logs
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motion summary rule uses params.subjects directly in the shell command, so shell metacharacters in the subjects config can terminate the Python command and execute arbitrary additional commands. Since this value may be controlled via --config subjects=..., an attacker could gain code execution under the Snakemake user account. This should be corrected by ensuring subjects is passed as a properly escaped argument or through a safer execution mechanism instead of raw shell string interpolation.

Copilot uses AI. Check for mistakes.
shell:
"""
mkdir -p {MOTION_DIR}/logs
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untrusted params.subjects is interpolated directly into this shell command, so shell metacharacters in the subjects config (e.g., passed via --config subjects=...) can break out of the intended --subjects argument and execute arbitrary commands. Because Snakemake formats this into /bin/sh without additional quoting, an attacker who can influence subjects could run code with the privileges of the Snakemake user. To mitigate this, ensure the subjects list is passed as properly quoted arguments or via a non-shell interface (for example, using Snakemake's safer argument passing mechanisms instead of raw string interpolation).

Copilot uses AI. Check for mistakes.
shell:
"""
mkdir -p {TSNR_DIR}/logs
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untrusted params.subjects is interpolated directly into this shell command, so shell metacharacters in the subjects config (e.g., via --config subjects=...) can escape the intended --subjects argument and execute arbitrary commands when this rule runs. Since Snakemake passes this formatted string to the shell without quoting, an attacker controlling subjects could achieve code execution under the Snakemake user account. To remediate, ensure the subjects value is treated as data only (e.g., passed as a properly quoted argument or through a non-shell execution path) rather than concatenated into a shell command.

Copilot uses AI. Check for mistakes.
subjects=get_subjects_arg(),
shell:
"""
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, params.subjects is again interpolated unescaped into a shell command, which allows shell metacharacters in the subjects config to terminate the Python invocation and run arbitrary additional commands. Because this rule may be invoked with user-specified --config subjects=..., a malicious value can lead to code execution as the Snakemake user. This should be fixed by avoiding raw string concatenation of subjects into the shell command and instead passing it as a properly quoted argument or through a safer execution mechanism.

Copilot uses AI. Check for mistakes.
subjects=get_subjects_arg(),
shell:
"""
{PYTHON} {input.script} {params.subjects} 2>&1 | tee {log}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The params.subjects value is used directly in this shell command for the motion HTML reports, so a crafted subjects config string containing shell metacharacters can inject arbitrary commands. Since Snakemake does not automatically escape this interpolation, any component that passes attacker-controlled subjects (for example via --config) could be abused for code execution. To address this, ensure subjects is passed as a data argument rather than concatenated into the shell command string, using proper quoting or non-shell execution.

Copilot uses AI. Check for mistakes.
- Remove unused import (os)
- Fix Python interpreter detection: prefer active interpreter, fallback to .venv
- Fix configfile path using workflow.source_path() for portability
- Add missing log directory creation in tsnr_plots, isc_plots rules
- Add localrules declaration for clean targets (run locally, not on cluster)
- Convert all Path objects to strings for shell commands
- Remove unused 'force' config option
- Remove unnecessary README.md
@mvdoc
Copy link
Owner Author

mvdoc commented Jan 26, 2026

@copilot review

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +50 to +57
# Data directories (from qa_config.yaml defaults)
DATA_DIR = str(PROJECT_ROOT / "data")
DERIVATIVES_DIR = f"{DATA_DIR}/derivatives"
FMRIPREP_DIR = f"{DERIVATIVES_DIR}/fmriprep"
QA_DIR = f"{DERIVATIVES_DIR}/qa"
TSNR_DIR = f"{QA_DIR}/tsnr"
MOTION_DIR = f"{QA_DIR}/motion"
ISC_DIR = f"{QA_DIR}/isc"
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow hard-codes QA output locations under data/derivatives/qa instead of deriving them from the Hyperface QA config (which can be overridden via --config, --data-dir, or HYPERFACE_QA_CONFIG). This can lead to logs/cleanup targeting a different directory than the scripts actually write to. Consider loading hyperface.qa.get_config() (or parsing the same YAML) in the Snakefile and building QA_DIR/TSNR_DIR/MOTION_DIR/ISC_DIR from config.paths, and/or exposing qa_config/data_dir as snakemake config entries passed through to the scripts.

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +64
def get_subjects_arg():
"""Get --subjects argument if configured."""
subjects = config.get("subjects", "")
if subjects:
return f"--subjects {subjects}"
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subjects is interpolated into the shell: command as a raw string. Because it comes from snakemake config, this makes the command fragile (subjects containing whitespace/shell metacharacters) and enables shell injection via --config subjects=.... Prefer passing subjects as a list (e.g., store parsed tokens and use snakemake.utils.quote/shlex.join for safe rendering) or move execution to a run: block using subprocess.run([...]) to avoid shell parsing.

Copilot uses AI. Check for mistakes.
# Data directories (from qa_config.yaml defaults)
DATA_DIR = str(PROJECT_ROOT / "data")
DERIVATIVES_DIR = f"{DATA_DIR}/derivatives"
FMRIPREP_DIR = f"{DERIVATIVES_DIR}/fmriprep"
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FMRIPREP_DIR is defined but never used in this Snakefile, which can be confusing when maintaining the workflow. Either remove it or use it when wiring rule inputs (e.g., as part of declared inputs for motion/tSNR rules).

Suggested change
FMRIPREP_DIR = f"{DERIVATIVES_DIR}/fmriprep"

Copilot uses AI. Check for mistakes.
@mvdoc mvdoc marked this pull request as draft March 10, 2026 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants