Column descriptions

Variant identifiers

  • chrom: Chromosome (e.g., chr1, chrX).
  • Var_pos: 1-based position of the candidate variant.
  • Var_ref / Var_alt: REF/ALT alleles of the candidate variant.

Haplotype-consistency label

  • hap_label: Label derived from read-stacking patterns between the candidate and its germline anchor.
    • hap=2: Diploid heterozygous-like pattern (non-mosaic-like).
    • hap=3: Mosaic-consistent pattern in diploid context (passes the gate).
    • hap=3_sex: Mosaic-consistent pattern on male chrX/chrY using a homozygous anchor (PAR ignored).
      The label keeps the hap=3 prefix so that grep hap=3 captures both hap=3 and hap=3_sex.
    • hap>3: Inconsistent / unmodeled pattern (often artifact or complex locus).
    • hap=NA: A germline anchor exists but no spanning reads cover both loci (n_common_reads == 0).
    • Not_applicable: No germline anchor was assigned upstream.

Platform selection

  • LR_source: Long-read platform whose metrics fill the LR fields, inferred from the BAM filename used for read stacking.
    • PB: use PB_* INFO fields
    • ONT: use ONT_* INFO fields

Short-read (SR) metrics (always Illumina)

  • SR_vaf / SR_dp / SR_alt: Taken from CrossPVal-only VCF INFO fields:
    • ILL_VAF, ILL_DP, ILL_AD_ALT

Long-read (LR) metrics (depends on LR_source)

  • LR_vaf / LR_dp / LR_alt: Taken from CrossPVal-only VCF INFO fields:
    • if LR_source=PB: PB_VAF, PB_DP, PB_AD_ALT
    • if LR_source=ONT: ONT_VAF, ONT_DP, ONT_AD_ALT
  • p_binom_LR_0_5: Two-sided binomial test p-value for LR ALT fraction vs 0.5:
    • binomtest(LR_alt, LR_dp, p=0.5)
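
The field selection and the test above can be sketched as follows. This is a minimal illustration, not the workflow's code: the helper names lr_metrics and binom_two_sided_p are hypothetical, the INFO record is assumed to be an already-parsed dict, and the p-value is a standard-library stand-in for binomtest that sums the probability of every outcome no more likely than the observed count. It is intended for typical read depths (tens to a few hundred reads).

```python
from math import comb

# Which INFO keys feed LR_vaf / LR_dp / LR_alt for each LR_source value.
LR_FIELDS = {
    "PB": ("PB_VAF", "PB_DP", "PB_AD_ALT"),
    "ONT": ("ONT_VAF", "ONT_DP", "ONT_AD_ALT"),
}


def lr_metrics(info, lr_source):
    """Return (LR_vaf, LR_dp, LR_alt) from a parsed INFO dict (hypothetical helper)."""
    vaf_key, dp_key, alt_key = LR_FIELDS[lr_source]
    return float(info[vaf_key]), int(info[dp_key]), int(info[alt_key])


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials."""
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Example record: 3 ALT reads out of 30 on PacBio.
info = {"PB_VAF": "0.1", "PB_DP": "30", "PB_AD_ALT": "3"}
lr_vaf, lr_dp, lr_alt = lr_metrics(info, "PB")
p_binom_lr_0_5 = binom_two_sided_p(lr_alt, lr_dp, 0.5)
```

A small p_binom_LR_0_5, as in this example, means the ALT fraction differs significantly from the 0.5 expected for a germline heterozygote.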

Read-stacking counts (phasing evidence)

  • copy_depth: Expected per-copy depth used as a baseline:
    • diploid: depth/2
    • haploid (male chrX/chrY): depth
      (depth is provided by the workflow.)
  • n_common_reads: Number of reads spanning both the candidate site and the germline anchor site.
  • n_var_alt: Among n_common_reads, reads carrying the candidate ALT at Var_pos.
  • n_germ_alt: Among n_common_reads, reads carrying the germline anchor ALT at Germ_pos.
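
To make the relationship between these counts concrete, here is a small sketch. It assumes the per-read allele observations at the two sites are already available as dicts keyed by read name; that input shape and the function names are hypothetical and do not describe how the workflow extracts reads from the BAM.

```python
def copy_depth(depth, haploid):
    """Expected per-copy depth: depth/2 for diploid, depth for haploid loci."""
    return depth if haploid else depth / 2


def stack_counts(var_calls, germ_calls, var_alt, germ_alt):
    """Count spanning reads and ALT-carrying reads among them.

    var_calls / germ_calls map read name -> allele observed at Var_pos / Germ_pos
    (assumed input format for this illustration).
    """
    common = var_calls.keys() & germ_calls.keys()  # reads covering both sites
    n_var_alt = sum(var_calls[r] == var_alt for r in common)
    n_germ_alt = sum(germ_calls[r] == germ_alt for r in common)
    return len(common), n_var_alt, n_germ_alt


var_calls = {"r1": "T", "r2": "C", "r3": "T", "r4": "C"}
germ_calls = {"r1": "G", "r2": "G", "r3": "A", "r5": "G"}
n_common_reads, n_var_alt, n_germ_alt = stack_counts(var_calls, germ_calls, "T", "G")
```

Reads r4 and r5 each cover only one site, so they are excluded from all three counts.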

Copy-model / segdup-like assessment block

This block evaluates whether the observed fraction of variant-supporting reads can be explained by a discrete copy / multi-mapping model (often seen in segmental duplications or collapsed repeats).

  • Ncopy_est: Estimated effective copy count from coverage inflation:
    • Ncopy_est = round(n_common_reads / copy_depth) (minimum 1)
    • Interpretation:
      • ~1: expected coverage (clean mapping)
      • >=2: inflated coverage (possible multi-mapping / segdup / CN inflation)
  • best_model: The discrete model k/Ncopy_est that best fits n_var_alt / n_common_reads,
    selected by maximizing the binomial p-value over k = 1..Ncopy_est.
  • best_p: The ALT fraction implied by best_model (e.g., 2/3 ≈ 0.667).
  • p_binom_best_model: Binomial p-value under the best-fitting discrete model:
    • binomtest(n_var_alt, n_common_reads, p=best_p)
    • Large values indicate the discrete model explains the observed fraction well.
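
The model search described in this block could look roughly like the sketch below. The function names are hypothetical, the binomial p-value is a standard-library stand-in for binomtest (two-sided, summing outcomes no more likely than the observed one), and half-way cases follow Python's built-in round, which may or may not match the original rounding.

```python
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials."""
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def fit_copy_model(n_var_alt, n_common_reads, copy_depth):
    """Estimate Ncopy_est and pick the k/Ncopy_est model with the largest p-value."""
    if copy_depth <= 0:
        raise ValueError("copy_depth must be positive")
    ncopy = max(1, round(n_common_reads / copy_depth))
    best_k, best_pval = 1, -1.0
    for k in range(1, ncopy + 1):
        pval = binom_two_sided_p(n_var_alt, n_common_reads, k / ncopy)
        if pval > best_pval:  # ties keep the smaller k
            best_k, best_pval = k, pval
    return {
        "Ncopy_est": ncopy,
        "best_model": f"{best_k}/{ncopy}",
        "best_p": best_k / ncopy,
        "p_binom_best_model": best_pval,
    }


# 60 spanning reads against a per-copy depth of 20 suggests three copies;
# 20 ALT reads out of 60 then fits the 1/3 model.
result = fit_copy_model(n_var_alt=20, n_common_reads=60, copy_depth=20.0)
```

When Ncopy_est is 1 the only candidate is 1/1, so p_binom_best_model is 0 unless every spanning read carries the ALT allele.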

REGIONS, Tag, Decision

  • REGIONS: Comma-separated region tags parsed from INFO/REGIONS (e.g., GIAB_Difficult, GIAB_Segdup, SMaHT_Extreme).
  • Tag: Semicolon-separated tags summarizing warnings; HighConf if no warnings.
  • Decision: PASS or Failed.

Tag meanings and assignment rules

After computing hap_label, tags are assigned (default p-value threshold: 0.01):

Gate

  • If hap_label is neither hap=3 nor hap=3_sex:
    Decision=Failed, Tag=Phasing_fail

If the gate passes (hap=3 / hap=3_sex)

Start with Decision=PASS, then add tags:

  • Weak_align
    Added if n_common_reads < copy_depth (insufficient spanning coverage for stable phasing).

  • VAF_high
    Added if the LR ALT fraction is germline-like (not significantly different from 0.5) or above 0.5:

    • p_binom_LR_0_5 > 0.01 OR
    • LR_alt / LR_dp > 0.5

  • pCopy_model
    Added if a discrete copy model explains n_var_alt / n_common_reads well:

    • p_binom_best_model >= 0.01

  • HighConf
    Assigned only if no tags were added after passing the hap gate.
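
The rules above can be condensed into a short sketch. The function name and argument list are hypothetical. The sketch also assumes that Decision stays PASS when warning tags are added, since the rules only say to start from PASS and do not state that a tag changes it.

```python
def assign_tags(hap_label, n_common_reads, copy_depth, lr_alt, lr_dp,
                p_binom_lr_0_5, p_binom_best_model, alpha=0.01):
    """Return (Decision, Tag) following the gate and tag rules (illustrative only)."""
    # Gate: only mosaic-consistent labels continue.
    if hap_label not in ("hap=3", "hap=3_sex"):
        return "Failed", "Phasing_fail"

    tags = []
    if n_common_reads < copy_depth:
        tags.append("Weak_align")
    if p_binom_lr_0_5 > alpha or (lr_dp > 0 and lr_alt / lr_dp > 0.5):
        tags.append("VAF_high")
    if p_binom_best_model >= alpha:
        tags.append("pCopy_model")

    # Assumption: warnings do not flip Decision; HighConf means no warnings.
    return "PASS", ";".join(tags) if tags else "HighConf"


clean = assign_tags("hap=3", 40, 15, 6, 40, 1e-5, 1e-4)
noisy = assign_tags("hap=3_sex", 10, 30, 20, 30, 0.1, 0.5)
gated = assign_tags("hap=2", 40, 15, 6, 40, 1e-5, 1e-4)
```

Note that the two p-value rules treat the boundary differently: VAF_high uses a strict comparison (> 0.01), while pCopy_model includes the threshold (>= 0.01).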