Hello, thanks for the great tool.
We are currently optimizing our WGS pipeline and tried replacing FastQC with Falco.
According to the paper:
We directly compared the text summary for each output of Falco to FastQC’s output summary files, obtaining the same outputs (pass, warning, or fail) for all tested criteria in all datasets.
Differences in the fastqc_data.txt files between the two programs result from choices for numerical precision output, or as a result of Falco calculating certain averages based on more of the data within each file
I assume we cannot expect numerically identical results, but the criteria (pass, warning, or fail) should at least all be the same. Is that correct? And the numerical differences shouldn't be significant (only rounding)?
We ran tests on some public data and found a lot of differences. Could you point out any mistakes in the commands we ran, or have we perhaps found some bugs?
Sample 1 - FASTQ from GIAB:
Falco
falco -o /inputs/public/falco/ /inputs/public/U0a_CGATGT_L001_R1_001.fastq.gz
summary.txt
PASS Basic Statistics U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per base sequence quality U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per tile sequence quality U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per sequence quality scores U0a_CGATGT_L001_R1_001.fastq.gz
WARN Per base sequence content U0a_CGATGT_L001_R1_001.fastq.gz
WARN Per sequence GC content U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per base N content U0a_CGATGT_L001_R1_001.fastq.gz
PASS Sequence Length Distribution U0a_CGATGT_L001_R1_001.fastq.gz
PASS Sequence Duplication Levels U0a_CGATGT_L001_R1_001.fastq.gz
PASS Overrepresented sequences U0a_CGATGT_L001_R1_001.fastq.gz
PASS Adapter Content U0a_CGATGT_L001_R1_001.fastq.gz
fastqc_data.txt
FastQC
fastqc -o /inputs/public/fq/fastqc/ /inputs/public/fq/U0a_CGATGT_L001_R1_001.fastq.gz
summary.txt
PASS Basic Statistics U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per base sequence quality U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per tile sequence quality U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per sequence quality scores U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per base sequence content U0a_CGATGT_L001_R1_001.fastq.gz
WARN Per sequence GC content U0a_CGATGT_L001_R1_001.fastq.gz
PASS Per base N content U0a_CGATGT_L001_R1_001.fastq.gz
PASS Sequence Length Distribution U0a_CGATGT_L001_R1_001.fastq.gz
PASS Sequence Duplication Levels U0a_CGATGT_L001_R1_001.fastq.gz
PASS Overrepresented sequences U0a_CGATGT_L001_R1_001.fastq.gz
PASS Adapter Content U0a_CGATGT_L001_R1_001.fastq.gz
fastqc_data.txt
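For reference, here is a minimal sketch of how the two summary.txt files can be compared module by module. It assumes each line holds a status, a module name, and a file name, separated by tabs (with a fallback for space-separated text as pasted above). The names `parse_summary` and `diff_summaries` are hypothetical helpers written for this comparison, not part of Falco or FastQC.

```python
def parse_summary(text):
    """Map module name -> status (PASS/WARN/FAIL) from summary.txt content.

    Assumption: each line is "<status> <module name> <file name>",
    tab-separated; space-separated lines are handled as a fallback,
    assuming the file name itself contains no spaces.
    """
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "\t" in line:
            parts = line.split("\t")
            if len(parts) < 2:
                continue
            status, module = parts[0], parts[1]
        else:
            tokens = line.split()
            if len(tokens) < 3:
                continue
            # First token is the status, last token is the file name.
            status, module = tokens[0], " ".join(tokens[1:-1])
        statuses[module] = status
    return statuses


def diff_summaries(falco_text, fastqc_text):
    """Return (module, falco_status, fastqc_status) for every mismatch."""
    falco = parse_summary(falco_text)
    fastqc = parse_summary(fastqc_text)
    diffs = []
    for module in sorted(set(falco) | set(fastqc)):
        a = falco.get(module, "MISSING")
        b = fastqc.get(module, "MISSING")
        if a != b:
            diffs.append((module, a, b))
    return diffs


# Two modules taken from the summaries above.
falco_example = (
    "WARN\tPer base sequence content\tU0a_CGATGT_L001_R1_001.fastq.gz\n"
    "WARN\tPer sequence GC content\tU0a_CGATGT_L001_R1_001.fastq.gz\n"
)
fastqc_example = (
    "PASS\tPer base sequence content\tU0a_CGATGT_L001_R1_001.fastq.gz\n"
    "WARN\tPer sequence GC content\tU0a_CGATGT_L001_R1_001.fastq.gz\n"
)

for module, falco_status, fastqc_status in diff_summaries(falco_example, fastqc_example):
    print(f"{module}: Falco={falco_status} FastQC={fastqc_status}")
```

In practice the two strings would be read from the summary.txt files in each tool's output directory instead of being written inline.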
Difference:
Difference in Per base sequence content:
