Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughPR adds VAF-aware FP/FN expression builders, reorders and reformats contig/liftover helper logic, expands classification outputs to include tp and tp-baseline, makes certain FP/FN-related outputs persistent, and documents the new TP deliverables. ChangesFP/FN/TP classification and expression support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@workflow/rules/common.smk`:
- Around line 613-615: The conditional uses the unsupported
wildcards.get("classification") API; update it to follow the file's established
pattern by checking hasattr(wildcards, "classification") and then comparing
wildcards.classification to "fn" or "tp-baseline" (i.e., replace
wildcards.get("classification") == "fn" / == "tp-baseline" with a guarded
attribute access using hasattr and wildcards.classification) so the rule's
wildcard checks match the rest of the file's usage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| wildcards.get("classification") == "fn" | ||
| or wildcards.get("classification") == "tp-baseline" | ||
| ): |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Snakemake (v7.17+ and current), does the wildcardsobject support.get("name"), or is attribute/index access the supported API?
💡 Result:
In Snakemake (v7.17+ and current), the supported API for the wildcards object is attribute access (e.g. wildcards.sample or {wildcards.sample}), not .get("name"). Authoritative docs describe accessing the wildcards via the wildcards namespace like input/output (e.g. {wildcards.group} and wildcards.sample) [1][2]. While some community answers mention wildcards.get("sample"), that is not documented in the official Snakemake docs as the intended/guaranteed interface for the wildcards object. The official guidance consistently uses attribute access [1][2]. If you want safe access to an optional wildcard, use Python logic around the known wildcard keys/expected structure rather than relying on .get(). For example, validate that the wildcard exists before referencing it as an attribute. Answer: Prefer attribute/index-style access; .get("name") is not the documented/guaranteed supported API in v7.17+ and current.
Citations:
- 1: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
- 2: https://snakemake.readthedocs.io/en/v7.0.2/tutorial/basics.html
🏁 Script executed:
cd workflow/rules && wc -l common.smkRepository: snakemake-workflows/dna-seq-benchmark
Length of output: 96
🏁 Script executed:
# Examine the specific lines mentioned (613-615) and surrounding context
sed -n '605,625p' workflow/rules/common.smk | cat -nRepository: snakemake-workflows/dna-seq-benchmark
Length of output: 890
🏁 Script executed:
# Search for all instances of wildcards.get usage in the file
rg "wildcards\.get\(" workflow/rules/common.smkRepository: snakemake-workflows/dna-seq-benchmark
Length of output: 189
🏁 Script executed:
# Search for hasattr patterns with wildcards in the file
rg "hasattr\(wildcards" workflow/rules/common.smk -A 1 -B 1Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 1681
Use wildcard attribute access with hasattr() guards instead of wildcards.get(...) for consistency and safety.
The file consistently uses hasattr(wildcards, "fieldname") + attribute access (found in 10+ functions), but lines 613-615 use the undocumented wildcards.get("classification") API. Snakemake's official documentation only specifies attribute access; .get() is not a guaranteed/supported interface.
Proposed fix
- if (
- wildcards.get("classification") == "fn"
- or wildcards.get("classification") == "tp-baseline"
- ):
+ classification = (
+ wildcards.classification if hasattr(wildcards, "classification") else None
+ )
+ if classification in {"fn", "tp-baseline"}:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@workflow/rules/common.smk` around lines 613 - 615, The conditional uses the
unsupported wildcards.get("classification") API; update it to follow the file's
established pattern by checking hasattr(wildcards, "classification") and then
comparing wildcards.classification to "fn" or "tp-baseline" (i.e., replace
wildcards.get("classification") == "fn" / == "tp-baseline" with a guarded
attribute access using hasattr and wildcards.classification) so the rule's
wildcard checks match the rest of the file's usage.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@workflow/rules/common.smk`:
- Around line 496-500: The reformatted with-statement uses Python 3.10-only
syntax by placing "as f" inside parentheses; change it so the context manager
expression is assigned to a local variable first (e.g., cm =
checkpoints.stat_truth.get(benchmark=benchmark, cov=cov).output[0].open()) and
then use the traditional form "with cm as f:" so that the "as f" remains outside
parentheses; locate the expression built from
checkpoints.stat_truth.get(...).output[0].open() and refactor as described.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| with ( | ||
| checkpoints.stat_truth.get(benchmark=benchmark, cov=cov) | ||
| .output[0] | ||
| .open() as f | ||
| ): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for declared Python version constraints in setup/config files
fd -e cfg -e toml -e txt -e yaml --exclude ".snakemake" | xargs grep -l "python_requires\|python >=\|python>=\|Programming Language :: Python :: 3\." 2>/dev/null | head -20
rg -n "min_version|python_requires|python >=" --type yaml --type toml 2>/dev/null | head -20Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 63
🏁 Script executed:
# Find all requirements/environment files
fd -type f \( -name "requirements*.txt" -o -name "environment*.yml" -o -name "environment*.yaml" -o -name "setup.py" -o -name "pyproject.toml" -o -name "Snakefile" \) | head -20Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 252
🏁 Script executed:
# Look for Snakemake version constraint or min_version in Snakefiles and rules
rg -l "Snakefile|\.smk" | head -10Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 100
🏁 Script executed:
# Check the actual code at lines 496-500 to confirm the syntax
sed -n '490,505p' workflow/rules/common.smkRepository: snakemake-workflows/dna-seq-benchmark
Length of output: 500
🏁 Script executed:
# Find all constraint/config files
find . -maxdepth 3 \( -name "*.txt" -o -name "*.yml" -o -name "*.yaml" -o -name "*.toml" -o -name "*.py" \) 2>/dev/null | grep -iE "(require|environment|setup|config|snakemake)" | head -30Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 163
🏁 Script executed:
# Check the main Snakefile for min_version
cat workflow/Snakefile | head -50Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 1601
🏁 Script executed:
# Look for requirements in the root and common directories
cat requirements.txt 2>/dev/null || cat environment.yml 2>/dev/null || cat pyproject.toml 2>/dev/null || echo "No standard config found in root"Repository: snakemake-workflows/dna-seq-benchmark
Length of output: 114
🌐 Web query:
Snakemake 7.17.1 Python version requirement minimum
💡 Result:
Minimum Python version required for Snakemake 7.17.1: Python 3.5 (i.e., “Python ≥3.5”). [1] This requirement is stated in the Snakemake 7.17.1 documentation under the setup prerequisites (“Python ≥3.5”). [1]
Citations:
🌐 Web query:
Snakemake 7.17 Python 3.8 compatibility requirements
💡 Result:
Snakemake 7.17.x requires Python >= 3.7, so it is compatible with Python 3.8 on the Python-version level.[1] However, whether it works end-to-end can depend on dependency availability/builds in the installation method you use; there are historical reports that Python 3.8 could be problematic with certain dependency stacks (e.g., datrie) even though Snakemake’s own minimum Python version includes 3.8.[2] Supporting evidence: - PyPI lists Snakemake 7.17.0 with “Python: >=3.7”, which includes Python 3.8.[1] - Snakemake’s 7.17.x changelog notes a “require at least Python 3.7” breaking change for 7.17.0.[3] - GitHub issue discussion indicates installation attempts on Python 3.8 failed in the past due to a dependency constraint, illustrating that “Python >=3.7” doesn’t automatically guarantee every dependency stack will install on 3.8.[2] Bottom line for “Python 3.8 compatibility requirements”: - Required Python version: >= 3.7 (so Python 3.8 is within the supported range).[1] - No additional “Python 3.8 only” requirement is stated in the Snakemake 7.17.0 PyPI metadata beyond the >=3.7 constraint.[1] - Practical compatibility may still be constrained by non-Python dependencies / transitive dependencies depending on your installation method/environment.[2][4]
Citations:
- 1: https://pypi.org/project/snakemake/7.17.0/
- 2: Python 3.8 not supported snakemake/snakemake#130
- 3: https://snakemake.readthedocs.io/en/v7.31.1/project_info/history.html
- 4: https://snakemake.readthedocs.io/en/v7.17.0/getting_started/installation.html
with (… as f): is Python 3.10+ syntax — incompatible with declared minimum Python 3.7.
The reformatted with statement at lines 496–500 places as f inside the parentheses, which is the Python 3.10+ syntax form. The project declares min_version("7.17.1") in the Snakefile, and Snakemake 7.17.1 supports Python ≥ 3.7. This code will fail with a SyntaxError on Python 3.7, 3.8, and (officially) 3.9.
Extract the context manager expression to a local variable so as f remains outside the parentheses:
Suggested fix
- with (
- checkpoints.stat_truth.get(benchmark=benchmark, cov=cov)
- .output[0]
- .open() as f
- ):
+ stat_file = checkpoints.stat_truth.get(benchmark=benchmark, cov=cov).output[0]
+ with stat_file.open() as f:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@workflow/rules/common.smk` around lines 496 - 500, The reformatted
with-statement uses Python 3.10-only syntax by placing "as f" inside
parentheses; change it so the context manager expression is assigned to a local
variable first (e.g., cm = checkpoints.stat_truth.get(benchmark=benchmark,
cov=cov).output[0].open()) and then use the traditional form "with cm as f:" so
that the "as f" remains outside parentheses; locate the expression built from
checkpoints.stat_truth.get(...).output[0].open() and refactor as described.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workflow/rules/eval-results.smk (1)
2-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate
extract-fp-fn.pyto handletpandtp-baselineclassifications.The script currently only filters for
fpandfnclassifications. With the new checkpoint at line 208 expandingclassification=["fp", "fn", "tp", "tp-baseline"], the rule will execute with all four values, but the script will produce empty output files fortpandtp-baselinebecause neither condition matches and execution falls through tocontinue. Add handling forClass.TPandClass.TP_BASELINEto extract those records correctly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/rules/eval-results.smk` around lines 2 - 23, The extract-fp-fn.py script currently only handles Class.FP and Class.FN and falls through for other classifications; update the script (the branch that checks the variant classification) to also handle Class.TP and Class.TP_BASELINE by adding cases that select and write true-positive records (and baseline true-positives) analogous to the existing FP/FN logic so that when Snakemake's extract_fp_fn rule runs with classification values "tp" or "tp-baseline" it emits the correct TSV rows instead of skipping (reference the Class enum values Class.TP and Class.TP_BASELINE and the code block where you currently have the FP/FN filtering and the continue).
🧹 Nitpick comments (3)
workflow/rules/eval-results.smk (3)
221-257: 💤 Low valueOptional: rename the
checkpoint=input key.Both
filter_shared_fnandfilter_uniqueusecheckpoint=as the input dict key, which collides visually with the Snakemakecheckpointkeyword and thecheckpoints.<name>accessor. Renaming to e.g.collection_done=keeps the gating intent obvious without overloading reserved-looking names.♻️ Proposed rename
input: fn="results/fp-fn-tp/benchmarks/{benchmark}.fn.tsv", - checkpoint=( + collection_done=( ancient("results/.checkpoints/fp-fn-tp-collection.done") if not config.get("stop-at-fp-fn-tp-collection") else [] ),Apply the same rename in
filter_unique(lines 245-249). No script changes are needed unlesssnakemake.input.checkpointis referenced inside the scripts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/rules/eval-results.smk` around lines 221 - 257, The input key named checkpoint in the rules filter_shared_fn and filter_unique visually collides with Snakemake's checkpoint keyword; rename that input key to something like collection_done in both rules (i.e., change input: checkpoint=(...) to input: collection_done=(...)) and update any references in the workflow or scripts that access snakemake.input.checkpoint to use snakemake.input.collection_done instead so behavior remains unchanged.
27-42: 💤 Low valueRename the input key
tp=to something classification-agnostic.This rule is now invoked for
fp,fn,tp, andtp-baseline, so calling the inputtpis misleading insideextract-fp-fn.py(snakemake.input.tpfor anfntable reads oddly). Considervcf=orcalls=.♻️ Proposed rename
rule extract_fp_fn_tp: input: - tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf", + vcf="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf",Update the corresponding
snakemake.input.tpreference inworkflow/scripts/if it exists.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/rules/eval-results.smk` around lines 27 - 42, The input key named tp in the rule extract_fp_fn_tp is misleading because the rule is used for fp, fn, tp and tp-baseline; rename the input key to a neutral name like vcf or calls (e.g., change tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf" to vcf=...) and update any references to snakemake.input.tp in the workflow/scripts/ helpers (and any usages of get_fp_fn_expression if it references snakemake.input.tp) to use snakemake.input.vcf (or snakemake.input.calls) so all callers and the wrapper ("v9.4.0/bio/vembrane/table") remain consistent.
208-218: ⚡ Quick winConsider demoting
checkpointtorulefor clarity.This is declared as a
checkpointbut produces only a static touch file, and downstream rules consume it via path andancient(...)rather thancheckpoints.fp_fn_tp_collection_done.get(...). Without reliance on the dynamic-output-driven DAG re-evaluation that checkpoints provide, a regularruleis functionally equivalent and avoids the checkpoint re-eval overhead. Thetp-baselineclassification is actively used downstream (inreformat-fp-fn-tp-tables.pyand throughoutcommon.smk), so the expanded target set is correct.♻️ Optional: demote to a regular rule
-checkpoint fp_fn_tp_collection_done: +rule fp_fn_tp_collection_done: input: expand( "results/fp-fn-tp/benchmarks/{benchmark}.{classification}.tsv", benchmark=used_benchmarks, classification=["fp", "fn", "tp", "tp-baseline"], ), output: "results/.checkpoints/fp-fn-tp-collection.done", shell: "mkdir -p $(dirname {output}) && touch {output}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/rules/eval-results.smk` around lines 208 - 218, The rule is declared as checkpoint fp_fn_tp_collection_done but only creates a static touch file and is consumed by downstream rules via file paths and ancient(...); change it to a regular rule named fp_fn_tp_collection_done (demote checkpoint to rule) keeping the same input expand(...) list (including "tp-baseline") and the same output and shell, so DAG re-evaluation overhead is avoided; update any references if you used checkpoints.* access (none expected since consumers use paths and ancient(...)), and ensure downstream scripts like reformat-fp-fn-tp-tables.py and references in common.smk continue to rely on the produced "results/.checkpoints/fp-fn-tp-collection.done" file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Around line 36-38: Update the fenced code block containing the command
"snakemake --until eval_fp_fn_tp_collection" to include a language identifier
(e.g., change the opening ``` to ```bash) so the snippet is syntax-highlighted
and more readable; locate the code block in README.md and modify only the
opening fence to add the language token.
---
Outside diff comments:
In `@workflow/rules/eval-results.smk`:
- Around line 2-23: The extract-fp-fn.py script currently only handles Class.FP
and Class.FN and falls through for other classifications; update the script (the
branch that checks the variant classification) to also handle Class.TP and
Class.TP_BASELINE by adding cases that select and write true-positive records
(and baseline true-positives) analogous to the existing FP/FN logic so that when
Snakemake's extract_fp_fn rule runs with classification values "tp" or
"tp-baseline" it emits the correct TSV rows instead of skipping (reference the
Class enum values Class.TP and Class.TP_BASELINE and the code block where you
currently have the FP/FN filtering and the continue).
---
Nitpick comments:
In `@workflow/rules/eval-results.smk`:
- Around line 221-257: The input key named checkpoint in the rules
filter_shared_fn and filter_unique visually collides with Snakemake's checkpoint
keyword; rename that input key to something like collection_done in both rules
(i.e., change input: checkpoint=(...) to input: collection_done=(...)) and
update any references in the workflow or scripts that access
snakemake.input.checkpoint to use snakemake.input.collection_done instead so
behavior remains unchanged.
- Around line 27-42: The input key named tp in the rule extract_fp_fn_tp is
misleading because the rule is used for fp, fn, tp and tp-baseline; rename the
input key to a neutral name like vcf or calls (e.g., change
tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf" to vcf=...) and
update any references to snakemake.input.tp in the workflow/scripts/ helpers
(and any usages of get_fp_fn_expression if it references snakemake.input.tp) to
use snakemake.input.vcf (or snakemake.input.calls) so all callers and the
wrapper ("v9.4.0/bio/vembrane/table") remain consistent.
- Around line 208-218: The rule is declared as checkpoint
fp_fn_tp_collection_done but only creates a static touch file and is consumed by
downstream rules via file paths and ancient(...); change it to a regular rule
named fp_fn_tp_collection_done (demote checkpoint to rule) keeping the same
input expand(...) list (including "tp-baseline") and the same output and shell,
so DAG re-evaluation overhead is avoided; update any references if you used
checkpoints.* access (none expected since consumers use paths and ancient(...)),
and ensure downstream scripts like reformat-fp-fn-tp-tables.py and references in
common.smk continue to rely on the produced
"results/.checkpoints/fp-fn-tp-collection.done" file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e544d302-4bee-42a3-98eb-ce51d4c23655
📒 Files selected for processing (5)
README.mdworkflow/Snakefileworkflow/rules/annotation.smkworkflow/rules/common.smkworkflow/rules/eval-results.smk
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/common.smk
There was a problem hiding this comment.
Pull request overview
This PR extends the FP/FN/TP table pipeline to optionally include VAF extracted from the callset VCF, adds a convenience “stop after table collection” target, and adjusts contig-correction helper logic intended to fix liftover behavior.
Changes:
- Add
eval_fp_fn_tp_collectionphony target to build aggregated FP/FN/TP (+ TP-baseline) tables without running downstream unique/shared filtering. - Refactor Vembrane table extraction/renaming expressions to include VAF when available (without requiring a benchmark-level VAF config).
- Update contig-correction helper functions for liftover/merge selection, and document new TP outputs in the README.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
workflow/Snakefile |
Adds a phony target to build aggregated FP/FN/TP tables as an intermediate stopping point. |
workflow/rules/common.smk |
Adjusts liftover/rename input selection helpers and refactors optional VAF column extraction/renaming logic. |
README.md |
Documents optional TP/TP-baseline aggregated outputs and cleans up intermediate/cleanup bullets. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def get_callset_correct_contigs(wildcards): | ||
| callset = config["variant-calls"][wildcards.callset] | ||
| vcf = callset["path"] | ||
| if callset.get("rename-contigs", False): | ||
| return "results/normalized-variants/{callset}.replaced-contigs.vcf.gz" | ||
| elif callset.get("genome-build", "grch38") == "grch37": | ||
| return "results/normalized-variants/{callset}.lifted.vcf.gz" | ||
| elif isinstance(vcf, dict): | ||
| # Input for liftover must be the pre-liftover callset. | ||
| if isinstance(vcf, dict): | ||
| return "results/merge-callsets/{callset}.merged.vcf.gz" | ||
| else: | ||
| return get_raw_callset(wildcards) |
| callset = config["variant-calls"][wildcards.callset] | ||
| vcf = callset["path"] | ||
| if isinstance(vcf, dict): | ||
| if callset.get("genome-build", "grch38") == "grch37": |
This adds the vaf from the callset vcf (if available) to the final fp table.
Summary by CodeRabbit
New Features
Documentation
Refactor
Chores