- chrom: Chromosome (e.g., `chr1`, `chrX`).
- Var_pos: 1-based position of the candidate variant.
- Var_ref / Var_alt: REF/ALT alleles of the candidate variant.
- hap_label: Label derived from read-stacking patterns between the candidate and its germline anchor.
  - `hap=2`: Diploid heterozygous-like pattern (non-mosaic-like).
  - `hap=3`: Mosaic-consistent pattern in diploid context (passes the gate).
  - `hap=3_sex`: Mosaic-consistent pattern on male chrX/chrY using a homozygous anchor (PAR ignored). Kept as `hap=3_*` so `grep hap=3` captures both.
  - `hap>3`: Inconsistent / unmodeled pattern (often artifact or complex locus).
  - `hap=NA`: A germline anchor exists but no spanning reads cover both loci (`n_common_reads == 0`).
  - `Not_applicable`: No germline anchor was assigned upstream.
- LR_source: Which long-read platform metrics were used for LR fields, inferred from the BAM filename used for read stacking.
  - `PB`: use `PB_*` INFO fields
  - `ONT`: use `ONT_*` INFO fields
- SR_vaf / SR_dp / SR_alt: Taken from the CrossPVal-only VCF INFO fields `ILL_VAF`, `ILL_DP`, `ILL_AD_ALT`.
- LR_vaf / LR_dp / LR_alt: Taken from the CrossPVal-only VCF INFO fields:
  - if `LR_source=PB`: `PB_VAF`, `PB_DP`, `PB_AD_ALT`
  - if `LR_source=ONT`: `ONT_VAF`, `ONT_DP`, `ONT_AD_ALT`
- p_binom_LR_0_5: Two-sided binomial test p-value for the LR ALT fraction vs 0.5: `binomtest(LR_alt, LR_dp, p=0.5)`
- copy_depth: Expected per-copy depth used as a baseline (`depth` is provided by the workflow):
  - diploid: `depth/2`
  - haploid (male chrX/chrY): `depth`
- n_common_reads: Number of reads spanning both the candidate site and the germline anchor site.
- n_var_alt: Among `n_common_reads`, reads carrying the candidate ALT at `Var_pos`.
- n_germ_alt: Among `n_common_reads`, reads carrying the germline anchor ALT at `Germ_pos`.
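
The depth, read-count, and binomial fields above can be sketched as follows. This is a minimal illustration, not the workflow's code: it assumes `binomtest` refers to `scipy.stats.binomtest` and replaces it with a dependency-free exact two-sided test that follows the same "outcomes no more likely than the observed one" convention. The helper names (`binom_two_sided`, `copy_depth`, `count_spanning`) and the per-read dictionaries are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space for large n."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_comb + i * log(p) + (n - i) * log(1.0 - p))


def binom_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = pmf[k] * (1.0 + 1e-7)  # small tolerance so floating-point ties count
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def copy_depth(depth, haploid=False):
    """Per-copy baseline: depth/2 for diploid loci, depth for male chrX/chrY."""
    return depth if haploid else depth / 2


def count_spanning(var_calls, germ_calls):
    """Count reads spanning both sites.

    var_calls / germ_calls (hypothetical inputs): dict mapping read name to
    True if the read carries ALT at Var_pos / Germ_pos, False otherwise.
    Returns (n_common_reads, n_var_alt, n_germ_alt).
    """
    common = var_calls.keys() & germ_calls.keys()
    n_var_alt = sum(1 for r in common if var_calls[r])
    n_germ_alt = sum(1 for r in common if germ_calls[r])
    return len(common), n_var_alt, n_germ_alt


# Example: a low-fraction candidate (6 ALT reads out of 60) tested against 0.5.
p_binom_LR_0_5 = binom_two_sided(6, 60, 0.5)
```

A small `p_binom_LR_0_5` means the long-read ALT fraction is unlikely to come from a heterozygous germline site; a read is counted in `n_common_reads` only if it appears in both dictionaries.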
This block evaluates whether the observed fraction of variant-supporting reads can be explained by a discrete copy / multi-mapping model (often seen in segmental duplications or collapsed repeats).
- Ncopy_est: Estimated effective copy count from coverage inflation: `Ncopy_est = round(n_common_reads / copy_depth)` (minimum 1)
  - Interpretation:
    - `~1`: expected coverage (clean mapping)
    - `>=2`: inflated coverage (possible multi-mapping / segdup / CN inflation)
- best_model: The discrete model `k/Ncopy_est` that best fits `n_var_alt / n_common_reads`, selected by maximizing the binomial p-value among all `k=1..Ncopy_est`.
- best_p: The probability corresponding to `best_model` (e.g., `2/3 = 0.666...`).
- p_binom_best_model: Binomial p-value under the best-fitting discrete model: `binomtest(n_var_alt, n_common_reads, p=best_p)`
  - Large values indicate the discrete model explains the observed fraction well.
- REGIONS: Comma-separated region tags parsed from `INFO/REGIONS` (e.g., `GIAB_Difficult`, `GIAB_Segdup`, `SMaHT_Extreme`).
- Tag: Semicolon-separated tags summarizing warnings; `HighConf` if no warnings.
- Decision: `PASS` or `Failed`.
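
The copy-model fields above can be sketched as a short search over `k`. This is an illustrative reading of the definitions, not the workflow's implementation: the two-sided test is a stdlib stand-in for `binomtest` (assumed to be `scipy.stats.binomtest`), and the function names are hypothetical. Note that Python's `round` rounds exact halves to the nearest even integer, which may or may not match the original behavior.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p); handles p = 0 and p = 1 explicitly."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_comb + i * log(p) + (n - i) * log(1.0 - p))


def binom_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = pmf[k] * (1.0 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def ncopy_est(n_common_reads, copy_depth):
    """Effective copy count from coverage inflation, never below 1."""
    return max(1, round(n_common_reads / copy_depth))


def best_discrete_model(n_var_alt, n_common_reads, n_copy):
    """Try k/n_copy for k = 1..n_copy and keep the model with the largest p-value.

    Returns (best_model, best_p, p_binom_best_model).
    """
    best = None
    for k in range(1, n_copy + 1):
        p = k / n_copy
        pval = binom_two_sided(n_var_alt, n_common_reads, p)
        if best is None or pval > best[2]:
            best = (f"{k}/{n_copy}", p, pval)
    return best


# Example: 90 spanning reads at a per-copy depth of 30, 60 of them carrying ALT.
n_copy = ncopy_est(90, 30)
best_model, best_p, p_binom_best_model = best_discrete_model(60, 90, n_copy)
```

In this example the coverage suggests three copies and the ALT fraction sits at two of three, so the `2/3` model fits well and `p_binom_best_model` is large.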
After computing hap_label, tags are assigned (default p-value threshold: 0.01):
- If `hap_label` is not `hap=3` or `hap=3_sex`: → `Decision=Failed`, `Tag=Phasing_fail`
Start with Decision=PASS, then add tags:
- `Weak_align`: Added if `n_common_reads < copy_depth` (insufficient spanning coverage for stable phasing).
- `VAF_high`: Added if the LR ALT fraction looks germline-like or too high: `p_binom_LR_0_5 > 0.01` OR `LR_alt / LR_dp > 0.5`.
- `pCopy_model`: Added if a discrete copy model explains `n_var_alt / n_common_reads` well: `p_binom_best_model >= 0.01`.
- `HighConf`: Assigned only if no tags were added after passing the hap gate.
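
The tagging rules above can be written as one small function. This is a sketch under stated assumptions: the function name `assign_tags` and its `alpha` parameter are hypothetical, the zero-depth guard is an addition, and because the text does not say that warning tags change `Decision`, the sketch leaves it at `PASS` once the hap gate is passed.

```python
def assign_tags(hap_label, n_common_reads, copy_depth,
                p_binom_LR_0_5, LR_alt, LR_dp,
                p_binom_best_model, alpha=0.01):
    """Return (Decision, Tag) following the rules described above."""
    # Hap gate: only mosaic-consistent labels continue.
    if hap_label not in ("hap=3", "hap=3_sex"):
        return "Failed", "Phasing_fail"

    tags = []

    # Too few spanning reads for stable phasing.
    if n_common_reads < copy_depth:
        tags.append("Weak_align")

    # LR ALT fraction compatible with 0.5, or above 0.5.
    # Assumption: a zero LR depth is treated as a fraction of 0.
    lr_fraction = LR_alt / LR_dp if LR_dp else 0.0
    if p_binom_LR_0_5 > alpha or lr_fraction > 0.5:
        tags.append("VAF_high")

    # A discrete copy model explains the spanning-read fraction.
    if p_binom_best_model >= alpha:
        tags.append("pCopy_model")

    tag = ";".join(tags) if tags else "HighConf"
    return "PASS", tag


# Example: clean low-fraction candidate with good spanning coverage.
decision, tag = assign_tags("hap=3", 40, 30, 1e-6, 5, 60, 1e-5)
```

Tags are joined with semicolons to match the `Tag` column format, and `HighConf` appears only when the list of warnings is empty.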