Skip to content

feat: adapt vaf extraction for non-vaf benchmarks and fix liftover#174

Open
famosab wants to merge 17 commits into
mainfrom
feat/vaf
Open

feat: adapt vaf extraction for non-vaf benchmarks and fix liftover#174
famosab wants to merge 17 commits into
mainfrom
feat/vaf

Conversation

@famosab
Copy link
Copy Markdown
Collaborator

@famosab famosab commented May 6, 2026

This adds the vaf from the callset vcf (if available) to the final fp table.

Summary by CodeRabbit

  • New Features

    • Generates TP and TP-baseline aggregated tables for callsets and benchmarks alongside FP/FN results.
  • Documentation

    • README updated to document new TP/TP-baseline outputs and clarify the --notemp guidance for retaining intermediates.
  • Refactor

    • FP/FN table generation refined to conditionally include VAF when available, adjusting output column behavior.
  • Chores

    • Several FP/FN-related outputs are no longer marked temporary and are written as persistent result files.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f456088-bda2-40b3-a6e8-2ed5013851bb

📥 Commits

Reviewing files that changed from the base of the PR and between c881d9c and 0ff149f.

📒 Files selected for processing (2)
  • workflow/Snakefile
  • workflow/rules/eval-results.smk
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/Snakefile

📝 Walkthrough

Walkthrough

PR adds VAF-aware FP/FN expression builders, reorders and reformats contig/liftover helper logic, expands classification outputs to include tp and tp-baseline, makes certain FP/FN-related outputs persistent, and documents the new TP deliverables.

Changes

FP/FN/TP classification and expression support

Layer / File(s) Summary
FP/FN expression helpers with VAF support
workflow/rules/common.smk
New get_fp_fn_expression(wildcards) builds CHROM/POS/ALT/REF and conditionally appends a VAF field based on wildcards.classification (benchmark VAF for fn/tp-baseline, callset VAF otherwise); get_rename_expression(wildcards) derives a 4- or 5-field rename map from that expression.
Helper function reordering and formatting
workflow/rules/common.smk
Reorder branching in contig/liftover helpers to check genome-build == grch37 before dict-vcf merge handling; add inline comment in get_callset_correct_contigs; reformat _get_nonempty_coverages with a parenthesized multi-line with.
Rule eval input expansion for tp/tp-baseline
workflow/Snakefile
Extend rule all and rule eval input expansions for results/report/fp-fn/callsets/{callset}/{classification} and results/fp-fn/benchmarks/{benchmark}.{classification}.tsv to include ["fp","fn","tp","tp-baseline"].
Make FP/FN/precision-recall outputs persistent
workflow/rules/eval-results.smk
Change several outputs (germline extract_fp_fn, somatic reformat_fp_fn_tp_tables renamed_table, and collect_stratifications TSV) from temp(...) to persistent results/...tsv paths.
Documentation updates
README.md
Add optional per-callset and per-benchmark TP TSV entries under results/fp-fn/ and clarify --notemp guidance for keeping intermediates during debugging.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • BiancaStoecker
  • johanneskoester

Poem

🐰 A rabbit hops through lines of code,
weaving VAF where variants showed.
TP joins FP and FN in queue,
branches tidy, outputs true.
Debugging carrots, --notemp too!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references VAF extraction for non-VAF benchmarks and liftover fixes, matching key changes in the PR that refactor FP/FN expression logic to handle VAF conditionally and reorganize liftover path handling.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/vaf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@workflow/rules/common.smk`:
- Around line 613-615: The conditional uses the unsupported
wildcards.get("classification") API; update it to follow the file's established
pattern by checking hasattr(wildcards, "classification") and then comparing
wildcards.classification to "fn" or "tp-baseline" (i.e., replace
wildcards.get("classification") == "fn" / == "tp-baseline" with a guarded
attribute access using hasattr and wildcards.classification) so the rule's
wildcard checks match the rest of the file's usage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 796fd9b4-a80c-432a-adba-67adb6b37d42

📥 Commits

Reviewing files that changed from the base of the PR and between 5fa52d3 and e4c9bc8.

📒 Files selected for processing (1)
  • workflow/rules/common.smk

Comment thread workflow/rules/common.smk
Comment on lines +613 to +615
wildcards.get("classification") == "fn"
or wildcards.get("classification") == "tp-baseline"
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Snakemake (v7.17+ and current), does the wildcardsobject support.get("name"), or is attribute/index access the supported API?

💡 Result:

In Snakemake (v7.17+ and current), the supported API for the wildcards object is attribute access (e.g. wildcards.sample or {wildcards.sample}), not .get("name"). Authoritative docs describe accessing the wildcards via the wildcards namespace like input/output (e.g. {wildcards.group} and wildcards.sample) [1][2]. While some community answers mention wildcards.get("sample"), that is not documented in the official Snakemake docs as the intended/guaranteed interface for the wildcards object. The official guidance consistently uses attribute access [1][2]. If you want safe access to an optional wildcard, use Python logic around the known wildcard keys/expected structure rather than relying on .get(). For example, validate that the wildcard exists before referencing it as an attribute. Answer: Prefer attribute/index-style access; .get("name") is not the documented/guaranteed supported API in v7.17+ and current.

Citations:


🏁 Script executed:

cd workflow/rules && wc -l common.smk

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 96


🏁 Script executed:

# Examine the specific lines mentioned (613-615) and surrounding context
sed -n '605,625p' workflow/rules/common.smk | cat -n

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 890


🏁 Script executed:

# Search for all instances of wildcards.get usage in the file
rg "wildcards\.get\(" workflow/rules/common.smk

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 189


🏁 Script executed:

# Search for hasattr patterns with wildcards in the file
rg "hasattr\(wildcards" workflow/rules/common.smk -A 1 -B 1

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 1681


Use wildcard attribute access with hasattr() guards instead of wildcards.get(...) for consistency and safety.

The file consistently uses hasattr(wildcards, "fieldname") + attribute access (found in 10+ functions), but lines 613-615 use the undocumented wildcards.get("classification") API. Snakemake's official documentation only specifies attribute access; .get() is not a guaranteed/supported interface.

Proposed fix
-    if (
-        wildcards.get("classification") == "fn"
-        or wildcards.get("classification") == "tp-baseline"
-    ):
+    classification = (
+        wildcards.classification if hasattr(wildcards, "classification") else None
+    )
+    if classification in {"fn", "tp-baseline"}:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/common.smk` around lines 613 - 615, The conditional uses the
unsupported wildcards.get("classification") API; update it to follow the file's
established pattern by checking hasattr(wildcards, "classification") and then
comparing wildcards.classification to "fn" or "tp-baseline" (i.e., replace
wildcards.get("classification") == "fn" / == "tp-baseline" with a guarded
attribute access using hasattr and wildcards.classification) so the rule's
wildcard checks match the rest of the file's usage.

@famosab famosab changed the title feat: adapt vaf extraction for non-vaf benchmarks feat: adapt vaf extraction for non-vaf benchmarks and fix liftover May 6, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@workflow/rules/common.smk`:
- Around line 496-500: The reformatted with-statement uses Python 3.10-only
syntax by placing "as f" inside parentheses; change it so the context manager
expression is assigned to a local variable first (e.g., cm =
checkpoints.stat_truth.get(benchmark=benchmark, cov=cov).output[0].open()) and
then use the traditional form "with cm as f:" so that the "as f" remains outside
parentheses; locate the expression built from
checkpoints.stat_truth.get(...).output[0].open() and refactor as described.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4fbc3c28-eed5-4874-b61d-814049122eef

📥 Commits

Reviewing files that changed from the base of the PR and between e4c9bc8 and e6642e8.

📒 Files selected for processing (1)
  • workflow/rules/common.smk

Comment thread workflow/rules/common.smk
Comment on lines +496 to +500
with (
checkpoints.stat_truth.get(benchmark=benchmark, cov=cov)
.output[0]
.open() as f
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look for declared Python version constraints in setup/config files
fd -e cfg -e toml -e txt -e yaml --exclude ".snakemake" | xargs grep -l "python_requires\|python >=\|python>=\|Programming Language :: Python :: 3\." 2>/dev/null | head -20
rg -n "min_version|python_requires|python >=" --type yaml --type toml 2>/dev/null | head -20

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 63


🏁 Script executed:

# Find all requirements/environment files
fd -type f \( -name "requirements*.txt" -o -name "environment*.yml" -o -name "environment*.yaml" -o -name "setup.py" -o -name "pyproject.toml" -o -name "Snakefile" \) | head -20

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 252


🏁 Script executed:

# Look for Snakemake version constraint or min_version in Snakefiles and rules
rg -l "Snakefile|\.smk" | head -10

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 100


🏁 Script executed:

# Check the actual code at lines 496-500 to confirm the syntax
sed -n '490,505p' workflow/rules/common.smk

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 500


🏁 Script executed:

# Find all constraint/config files
find . -maxdepth 3 \( -name "*.txt" -o -name "*.yml" -o -name "*.yaml" -o -name "*.toml" -o -name "*.py" \) 2>/dev/null | grep -iE "(require|environment|setup|config|snakemake)" | head -30

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 163


🏁 Script executed:

# Check the main Snakefile for min_version
cat workflow/Snakefile | head -50

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 1601


🏁 Script executed:

# Look for requirements in the root and common directories
cat requirements.txt 2>/dev/null || cat environment.yml 2>/dev/null || cat pyproject.toml 2>/dev/null || echo "No standard config found in root"

Repository: snakemake-workflows/dna-seq-benchmark

Length of output: 114


🌐 Web query:

Snakemake 7.17.1 Python version requirement minimum

💡 Result:

Minimum Python version required for Snakemake 7.17.1: Python 3.5 (i.e., “Python ≥3.5”). [1] This requirement is stated in the Snakemake 7.17.1 documentation under the setup prerequisites (“Python ≥3.5”). [1]

Citations:


🌐 Web query:

Snakemake 7.17 Python 3.8 compatibility requirements

💡 Result:

Snakemake 7.17.x requires Python >= 3.7, so it is compatible with Python 3.8 on the Python-version level.[1] However, whether it works end-to-end can depend on dependency availability/builds in the installation method you use; there are historical reports that Python 3.8 could be problematic with certain dependency stacks (e.g., datrie) even though Snakemake’s own minimum Python version includes 3.8.[2] Supporting evidence: - PyPI lists Snakemake 7.17.0 with “Python: >=3.7”, which includes Python 3.8.[1] - Snakemake’s 7.17.x changelog notes a “require at least Python 3.7” breaking change for 7.17.0.[3] - GitHub issue discussion indicates installation attempts on Python 3.8 failed in the past due to a dependency constraint, illustrating that “Python >=3.7” doesn’t automatically guarantee every dependency stack will install on 3.8.[2] Bottom line for “Python 3.8 compatibility requirements”: - Required Python version: >= 3.7 (so Python 3.8 is within the supported range).[1] - No additional “Python 3.8 only” requirement is stated in the Snakemake 7.17.0 PyPI metadata beyond the >=3.7 constraint.[1] - Practical compatibility may still be constrained by non-Python dependencies / transitive dependencies depending on your installation method/environment.[2][4]

Citations:


with (… as f): is Python 3.10+ syntax — incompatible with declared minimum Python 3.7.

The reformatted with statement at lines 496–500 places as f inside the parentheses, which is the Python 3.10+ syntax form. The project declares min_version("7.17.1") in the Snakefile, and Snakemake 7.17.1 supports Python ≥ 3.7. This code will fail with a SyntaxError on Python 3.7, 3.8, and (officially) 3.9.

Extract the context manager expression to a local variable so as f remains outside the parentheses:

Suggested fix
-        with (
-            checkpoints.stat_truth.get(benchmark=benchmark, cov=cov)
-            .output[0]
-            .open() as f
-        ):
+        stat_file = checkpoints.stat_truth.get(benchmark=benchmark, cov=cov).output[0]
+        with stat_file.open() as f:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/common.smk` around lines 496 - 500, The reformatted
with-statement uses Python 3.10-only syntax by placing "as f" inside
parentheses; change it so the context manager expression is assigned to a local
variable first (e.g., cm = checkpoints.stat_truth.get(benchmark=benchmark,
cov=cov).output[0].open()) and then use the traditional form "with cm as f:" so
that the "as f" remains outside parentheses; locate the expression built from
checkpoints.stat_truth.get(...).output[0].open() and refactor as described.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
workflow/rules/eval-results.smk (1)

2-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update extract-fp-fn.py to handle tp and tp-baseline classifications.

The script currently only filters for fp and fn classifications. With the new checkpoint at line 208 expanding classification=["fp", "fn", "tp", "tp-baseline"], the rule will execute with all four values, but the script will produce empty output files for tp and tp-baseline because neither condition matches and execution falls through to continue. Add handling for Class.TP and Class.TP_BASELINE to extract those records correctly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/eval-results.smk` around lines 2 - 23, The extract-fp-fn.py
script currently only handles Class.FP and Class.FN and falls through for other
classifications; update the script (the branch that checks the variant
classification) to also handle Class.TP and Class.TP_BASELINE by adding cases
that select and write true-positive records (and baseline true-positives)
analogous to the existing FP/FN logic so that when Snakemake's extract_fp_fn
rule runs with classification values "tp" or "tp-baseline" it emits the correct
TSV rows instead of skipping (reference the Class enum values Class.TP and
Class.TP_BASELINE and the code block where you currently have the FP/FN
filtering and the continue).
🧹 Nitpick comments (3)
workflow/rules/eval-results.smk (3)

221-257: 💤 Low value

Optional: rename the checkpoint= input key.

Both filter_shared_fn and filter_unique use checkpoint= as the input dict key, which collides visually with the Snakemake checkpoint keyword and the checkpoints.<name> accessor. Renaming to e.g. collection_done= keeps the gating intent obvious without overloading reserved-looking names.

♻️ Proposed rename
     input:
         fn="results/fp-fn-tp/benchmarks/{benchmark}.fn.tsv",
-        checkpoint=(
+        collection_done=(
             ancient("results/.checkpoints/fp-fn-tp-collection.done")
             if not config.get("stop-at-fp-fn-tp-collection")
             else []
         ),

Apply the same rename in filter_unique (lines 245-249). No script changes are needed unless snakemake.input.checkpoint is referenced inside the scripts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/eval-results.smk` around lines 221 - 257, The input key named
checkpoint in the rules filter_shared_fn and filter_unique visually collides
with Snakemake's checkpoint keyword; rename that input key to something like
collection_done in both rules (i.e., change input: checkpoint=(...) to input:
collection_done=(...)) and update any references in the workflow or scripts that
access snakemake.input.checkpoint to use snakemake.input.collection_done instead
so behavior remains unchanged.

27-42: 💤 Low value

Rename the input key tp= to something classification-agnostic.

This rule is now invoked for fp, fn, tp, and tp-baseline, so calling the input tp is misleading inside extract-fp-fn.py (snakemake.input.tp for an fn table reads oddly). Consider vcf= or calls=.

♻️ Proposed rename
 rule extract_fp_fn_tp:
     input:
-        tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf",
+        vcf="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf",

Update the corresponding snakemake.input.tp reference in workflow/scripts/ if it exists.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/eval-results.smk` around lines 27 - 42, The input key named tp
in the rule extract_fp_fn_tp is misleading because the rule is used for fp, fn,
tp and tp-baseline; rename the input key to a neutral name like vcf or calls
(e.g., change tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf" to
vcf=...) and update any references to snakemake.input.tp in the
workflow/scripts/ helpers (and any usages of get_fp_fn_expression if it
references snakemake.input.tp) to use snakemake.input.vcf (or
snakemake.input.calls) so all callers and the wrapper
("v9.4.0/bio/vembrane/table") remain consistent.

208-218: ⚡ Quick win

Consider demoting checkpoint to rule for clarity.

This is declared as a checkpoint but produces only a static touch file, and downstream rules consume it via path and ancient(...) rather than checkpoints.fp_fn_tp_collection_done.get(...). Without reliance on the dynamic-output-driven DAG re-evaluation that checkpoints provide, a regular rule is functionally equivalent and avoids the checkpoint re-eval overhead. The tp-baseline classification is actively used downstream (in reformat-fp-fn-tp-tables.py and throughout common.smk), so the expanded target set is correct.

♻️ Optional: demote to a regular rule
-checkpoint fp_fn_tp_collection_done:
+rule fp_fn_tp_collection_done:
     input:
         expand(
             "results/fp-fn-tp/benchmarks/{benchmark}.{classification}.tsv",
             benchmark=used_benchmarks,
             classification=["fp", "fn", "tp", "tp-baseline"],
         ),
     output:
         "results/.checkpoints/fp-fn-tp-collection.done",
     shell:
         "mkdir -p $(dirname {output}) && touch {output}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/rules/eval-results.smk` around lines 208 - 218, The rule is declared
as checkpoint fp_fn_tp_collection_done but only creates a static touch file and
is consumed by downstream rules via file paths and ancient(...); change it to a
regular rule named fp_fn_tp_collection_done (demote checkpoint to rule) keeping
the same input expand(...) list (including "tp-baseline") and the same output
and shell, so DAG re-evaluation overhead is avoided; update any references if
you used checkpoints.* access (none expected since consumers use paths and
ancient(...)), and ensure downstream scripts like reformat-fp-fn-tp-tables.py
and references in common.smk continue to rely on the produced
"results/.checkpoints/fp-fn-tp-collection.done" file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Around line 36-38: Update the fenced code block containing the command
"snakemake --until eval_fp_fn_tp_collection" to include a language identifier
(e.g., change the opening ``` to ```bash) so the snippet is syntax-highlighted
and more readable; locate the code block in README.md and modify only the
opening fence to add the language token.

---

Outside diff comments:
In `@workflow/rules/eval-results.smk`:
- Around line 2-23: The extract-fp-fn.py script currently only handles Class.FP
and Class.FN and falls through for other classifications; update the script (the
branch that checks the variant classification) to also handle Class.TP and
Class.TP_BASELINE by adding cases that select and write true-positive records
(and baseline true-positives) analogous to the existing FP/FN logic so that when
Snakemake's extract_fp_fn rule runs with classification values "tp" or
"tp-baseline" it emits the correct TSV rows instead of skipping (reference the
Class enum values Class.TP and Class.TP_BASELINE and the code block where you
currently have the FP/FN filtering and the continue).

---

Nitpick comments:
In `@workflow/rules/eval-results.smk`:
- Around line 221-257: The input key named checkpoint in the rules
filter_shared_fn and filter_unique visually collides with Snakemake's checkpoint
keyword; rename that input key to something like collection_done in both rules
(i.e., change input: checkpoint=(...) to input: collection_done=(...)) and
update any references in the workflow or scripts that access
snakemake.input.checkpoint to use snakemake.input.collection_done instead so
behavior remains unchanged.
- Around line 27-42: The input key named tp in the rule extract_fp_fn_tp is
misleading because the rule is used for fp, fn, tp and tp-baseline; rename the
input key to a neutral name like vcf or calls (e.g., change
tp="results/vcfeval/{callset}/{cov}/{classification}.norm.vcf" to vcf=...) and
update any references to snakemake.input.tp in the workflow/scripts/ helpers
(and any usages of get_fp_fn_expression if it references snakemake.input.tp) to
use snakemake.input.vcf (or snakemake.input.calls) so all callers and the
wrapper ("v9.4.0/bio/vembrane/table") remain consistent.
- Around line 208-218: The rule is declared as checkpoint
fp_fn_tp_collection_done but only creates a static touch file and is consumed by
downstream rules via file paths and ancient(...); change it to a regular rule
named fp_fn_tp_collection_done (demote checkpoint to rule) keeping the same
input expand(...) list (including "tp-baseline") and the same output and shell,
so DAG re-evaluation overhead is avoided; update any references if you used
checkpoints.* access (none expected since consumers use paths and ancient(...)),
and ensure downstream scripts like reformat-fp-fn-tp-tables.py and references in
common.smk continue to rely on the produced
"results/.checkpoints/fp-fn-tp-collection.done" file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e544d302-4bee-42a3-98eb-ce51d4c23655

📥 Commits

Reviewing files that changed from the base of the PR and between e6642e8 and fa20213.

📒 Files selected for processing (5)
  • README.md
  • workflow/Snakefile
  • workflow/rules/annotation.smk
  • workflow/rules/common.smk
  • workflow/rules/eval-results.smk
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/common.smk

Comment thread README.md Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the FP/FN/TP table pipeline to optionally include VAF extracted from the callset VCF, adds a convenience “stop after table collection” target, and adjusts contig-correction helper logic intended to fix liftover behavior.

Changes:

  • Add eval_fp_fn_tp_collection phony target to build aggregated FP/FN/TP (+ TP-baseline) tables without running downstream unique/shared filtering.
  • Refactor Vembrane table extraction/renaming expressions to include VAF when available (without requiring a benchmark-level VAF config).
  • Update contig-correction helper functions for liftover/merge selection, and document new TP outputs in the README.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
workflow/Snakefile Adds a phony target to build aggregated FP/FN/TP tables as an intermediate stopping point.
workflow/rules/common.smk Adjusts liftover/rename input selection helpers and refactors optional VAF column extraction/renaming logic.
README.md Documents optional TP/TP-baseline aggregated outputs and cleans up intermediate/cleanup bullets.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread workflow/rules/common.smk
Comment on lines 243 to 250
def get_callset_correct_contigs(wildcards):
callset = config["variant-calls"][wildcards.callset]
vcf = callset["path"]
if callset.get("rename-contigs", False):
return "results/normalized-variants/{callset}.replaced-contigs.vcf.gz"
elif callset.get("genome-build", "grch38") == "grch37":
return "results/normalized-variants/{callset}.lifted.vcf.gz"
elif isinstance(vcf, dict):
# Input for liftover must be the pre-liftover callset.
if isinstance(vcf, dict):
return "results/merge-callsets/{callset}.merged.vcf.gz"
else:
return get_raw_callset(wildcards)
Comment thread workflow/rules/common.smk
callset = config["variant-calls"][wildcards.callset]
vcf = callset["path"]
if isinstance(vcf, dict):
if callset.get("genome-build", "grch38") == "grch37":
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants