Snakemake-based workflow to construct species phylogenies using BUSCOs
```mermaid
flowchart LR
%% ----- INPUT -----
subgraph INPUT["Input data"]
A_fa["Genome assemblies (FASTA)"]
A_vcf2["Multi-sample VCF"]
subgraph VCF_REF["VCF reconstruction"]
A_ref["Reference genome (FASTA)"]
A_vcf["Per-sample VCFs"]
end
end
%% ----- BUSCO -----
subgraph BUSCO["Ortholog extraction"]
B_busco["BUSCO"]
end
%% ----- PREPROCESSING -----
subgraph PREP["Sequence processing"]
subgraph ALN["Multiple alignment"]
C_aln["MAFFT\nMUSCLE\nPRANK"]
end
subgraph FLT["Trimming"]
C_flt["ClipKIT\nGBlocks\nTrimAl"]
end
end
%% ----- PHYLOGENY -----
subgraph PHYLO["Phylogenetic tree inference"]
subgraph CONCAT["Supermatrix approach"]
E_phy["IQTree\nMrBayes\nPHYLIP\nRAxML-NG\nRapidNJ"]
end
subgraph TREE["Multispecies coalescent"]
D_ast["Astral-IV"]
end
end
%% ----- MERGE NODE (invisible) -----
MERGE(( ))
%% ----- EDGES: MAIN -----
A_fa --> B_busco
A_ref -.-> B_busco
B_busco --> C_aln
B_busco -.-> MERGE
A_vcf --> MERGE
MERGE --> |"apply_vcf_to_busco.py"| C_aln
C_aln --> C_flt
C_flt -->|"Concat alignment"| E_phy
C_flt -->|"IQTree per gene"| D_ast
%% ----- EDGES: VCF2PHYLIP -----
A_vcf2 -. "vcf2phylip.py" .-> E_phy
%% ----- STYLE -----
classDef input fill:#e8f4ff,stroke:#2b7cd3,stroke-width:1px
classDef process fill:#eaf7ea,stroke:#2f9e44,stroke-width:1px
classDef phylo fill:#fff4e6,stroke:#e67700,stroke-width:1px
classDef optional fill:#e8f4ff,stroke:#2b7cd3,stroke-width:1px,stroke-dasharray:4 4
classDef merge fill:none,stroke:none,width:0px
class A_fa,A_ref,A_vcf input
class B_busco,C_aln,C_flt process
class D_ast,E_phy phylo
class A_vcf2 optional
class MERGE merge
```
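In the supermatrix branch of the diagram, the "Concat alignment" step joins the trimmed per-gene alignments end to end before tree inference. The Python sketch below illustrates that idea under stated assumptions: each alignment is held as a dict of taxon name to aligned sequence, taxa missing from a gene are padded with gaps, and `concatenate` is a hypothetical helper, not BuscoClade's own code.

```python
"""Minimal sketch of supermatrix concatenation (hypothetical helper).

Assumes each trimmed alignment is a dict {taxon: aligned_sequence}; taxa that
are missing from a gene are padded with gap characters.
"""
from typing import Dict, List, Tuple

Alignment = Dict[str, str]


def concatenate(alignments: List[Alignment]) -> Tuple[Alignment, List[Tuple[int, int]]]:
    """Return the supermatrix and the 1-based (start, end) columns of each gene."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    offset = 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment needs >=1 sequence, all of equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa absent from this gene with gaps so all rows stay aligned.
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((offset + 1, offset + length))
        offset += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    genes = [{"sp1": "ATG-", "sp2": "ATGC"}, {"sp1": "CCA", "sp3": "CTA"}]
    matrix, parts = concatenate(genes)
    print(matrix)  # {'sp1': 'ATG-CCA', 'sp2': 'ATGC---', 'sp3': '----CTA'}
    print(parts)   # [(1, 4), (5, 7)]
```

Returning the gene boundaries alongside the matrix is a design choice: they could be written to a RAxML-style partition file (lines such as `DNA, gene1 = 1-4`) if per-gene substitution models are wanted.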
- Ortholog extraction: BUSCO
- VCF-based input: apply_vcf_to_busco.py (applies per-sample SNPs to BUSCO sequences), vcf2phylip (converts a multi-sample VCF to PHYLIP)
- Alignment: MAFFT, MUSCLE, PRANK
- Trimming: ClipKIT, trimAl, Gblocks
- Phylogenetic tree construction: IQTree, MrBayes, ASTRAL-IV, RapidNJ, PHYLIP, RAxML-NG
- Visualization: ETE Toolkit, Matplotlib
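apply_vcf_to_busco.py derives per-sample BUSCO sequences from a reference genome and per-sample VCFs. The sketch below shows only the core idea of substituting single-nucleotide variants into a reference region. It is a hypothetical simplification, not the script's actual behaviour: it assumes a plus-strand BUSCO interval with known contig coordinates, ignores genotypes and indels, and uses only the first ALT allele. The names `parse_vcf_snvs` and `apply_snvs` are illustrative.

```python
"""Minimal sketch: substitute SNVs from a per-sample VCF into one BUSCO region.

Hypothetical simplification, not the logic of apply_vcf_to_busco.py: the BUSCO
gene is a plus-strand interval on the reference contig, only single-nucleotide
REF/ALT pairs are applied, genotypes are ignored, and the first ALT is used.
"""
from typing import Iterable, List, Tuple

Snv = Tuple[str, int, str, str]  # (contig, 1-based position, REF, ALT)


def parse_vcf_snvs(lines: Iterable[str]) -> List[Snv]:
    """Collect single-nucleotide variants from VCF text lines."""
    snvs: List[Snv] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        alt = alt.split(",")[0]  # keep only the first ALT allele
        if len(ref) == 1 and len(alt) == 1 and alt not in {".", "*"}:
            snvs.append((chrom, int(pos), ref.upper(), alt.upper()))
    return snvs


def apply_snvs(region_seq: str, contig: str, start: int, snvs: Iterable[Snv]) -> str:
    """Return region_seq (contig positions start..start+len-1) with SNVs applied."""
    seq = list(region_seq.upper())
    for chrom, pos, ref, alt in snvs:
        offset = pos - start
        if chrom != contig or not 0 <= offset < len(seq):
            continue  # variant lies outside this BUSCO region
        if seq[offset] != ref:
            raise ValueError(f"REF mismatch at {chrom}:{pos}: {seq[offset]} != {ref}")
        seq[offset] = alt
    return "".join(seq)


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t102\t.\tG\tA\t50\tPASS\t.\n",
        "chr1\t104\t.\tTA\tT\t50\tPASS\t.\n",  # deletion: skipped
        "chr1\t500\t.\tC\tT\t50\tPASS\t.\n",  # outside the region: ignored
    ]
    # Hypothetical BUSCO region chr1:101-106 on the reference.
    print(apply_snvs("AGCTAC", "chr1", 101, parse_vcf_snvs(vcf)))  # AACTAC
```

Checking the REF allele before substituting is a cheap guard against coordinate or reference mismatches between the VCF and the BUSCO region.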
Full documentation is available in the Wiki:
- Usage — input preparation, configuration, running the pipeline
- Configuration — all config parameters with defaults and descriptions
- Apptainer — running BuscoClade via a pre-built Apptainer container
- Advanced Usage — starting from existing BUSCO results, gap-aware AltRef insertion
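When starting from existing BUSCO results, the single-copy sequences of each sample have to be regrouped by BUSCO ID before alignment. A minimal sketch follows, assuming the BUSCO v5 output layout `run_<lineage>/busco_sequences/single_copy_busco_sequences/<busco_id>.faa` with one record per file. `load_sample`, `group_by_busco`, `write_fasta`, and the `min_fraction` filter are hypothetical names, not BuscoClade functions or configuration options.

```python
"""Minimal sketch: regroup existing single-copy BUSCO sequences by BUSCO ID.

Assumes the BUSCO v5 layout
<sample_dir>/run_<lineage>/busco_sequences/single_copy_busco_sequences/<id>.faa
with one sequence per file. All names here are hypothetical illustrations.
"""
from pathlib import Path
from typing import Dict


def read_single_fasta(path: Path) -> str:
    """Return the joined sequence lines of a one-record FASTA file."""
    lines = path.read_text().splitlines()
    return "".join(line.strip() for line in lines if line and not line.startswith(">"))


def load_sample(sample_dir: Path, lineage: str, suffix: str = ".faa") -> Dict[str, str]:
    """Map busco_id -> sequence for one sample's single-copy BUSCOs."""
    seq_dir = sample_dir / f"run_{lineage}" / "busco_sequences" / "single_copy_busco_sequences"
    return {p.stem: read_single_fasta(p) for p in sorted(seq_dir.glob(f"*{suffix}"))}


def group_by_busco(samples: Dict[str, Dict[str, str]],
                   min_fraction: float = 0.8) -> Dict[str, Dict[str, str]]:
    """Map busco_id -> {sample: seq}, keeping IDs found in >= min_fraction of samples."""
    genes: Dict[str, Dict[str, str]] = {}
    for sample, seqs in samples.items():
        for busco_id, seq in seqs.items():
            genes.setdefault(busco_id, {})[sample] = seq
    threshold = min_fraction * len(samples)
    return {bid: hits for bid, hits in sorted(genes.items()) if len(hits) >= threshold}


def write_fasta(records: Dict[str, str], path: Path) -> None:
    """Write {name: sequence} as an unwrapped multi-FASTA file."""
    path.write_text("".join(f">{name}\n{seq}\n" for name, seq in records.items()))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        demo = {"sp1": {"b1": "MKV"}, "sp2": {"b1": "MRV", "b2": "LL"}}
        for sample, seqs in demo.items():
            d = root / sample / "run_lineage_odb10" / "busco_sequences" / "single_copy_busco_sequences"
            d.mkdir(parents=True)
            for bid, seq in seqs.items():
                write_fasta({bid: seq}, d / f"{bid}.faa")
        loaded = {s.name: load_sample(s, "lineage_odb10") for s in sorted(root.iterdir())}
        for bid, records in group_by_busco(loaded, min_fraction=1.0).items():
            print(bid, records)  # b1 {'sp1': 'MKV', 'sp2': 'MRV'}
```

Each group kept by the filter could then be written with `write_fasta` as one multi-FASTA input for the alignment step; the occupancy threshold trades gene count against missing data in the final matrix.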
For questions or feedback, please email andrey.tomarovsky@gmail.com.
