Skip to content

thisisamirv/SPRING2

Repository files navigation

SPRING2

CI

SPRING2 logo

LICENSE RESTRICTION NOTICE

SPRING is not provided under a permissive open-source license. Under the Non-Exclusive Research Use Software and Patent License Agreement, the software and patent rights are licensed only for non-profit research, educational, personal, and individual use. Commercial use, sublicensing, assignment, transfer, or other unlicensed making-available to third parties is not authorized absent a separate license from the University of Illinois. Any use, copying, modification, disclosure, or redistribution must comply with the original license terms, and derivative works must be clearly marked and renamed. Treat this repository as demonstration and reference material unless your intended use is expressly permitted by that license.

SPRING2 is a compressor for FASTQ and FASTA sequencing data, including paired-end data and gzipped FASTQ inputs. It is built on top of the original SPRING project and represents a substantial modernization of that codebase.

Tip

This revision substantially modernizes the original SPRING codebase. It upgrades the build system to C++20 and CMake 4.2, removes the remaining Boost dependency, adds first-class cross-platform support for Linux, macOS, and native Windows builds, and expands the runtime to cover grouped lane archives, assay-aware compression behavior, preview and audit workflows, richer gzip metadata preservation, and cleaner developer and benchmarking tooling.

Features

  • Near-optimal compression for single-end, paired-end, and grouped multi-lane sequencing datasets
  • Lossless preservation of reads, quality scores, and identifiers, with reversible assay-aware transforms where applicable
  • Optional lossy quality compression using QVZ, Illumina 8-level binning, or binary thresholding
  • Automatic FASTQ versus FASTA detection, plain-text versus gzipped input handling, and gzip-aware output restoration
  • Support for grouped archives with R1/R2 plus optional R3, I1, and I2 lanes
  • Automatic and explicit assay handling for dna, rna, atac, bisulfite, sc-rna, sc-atac, and sc-bisulfite
  • Preview mode for archive inspection and audit mode for dry-run integrity verification without full decompression
  • Record-level CRC32 verification for lossless archives during decompression and optional post-compression auditing
  • Cross-platform builds and CI coverage for Linux, macOS, and native Windows toolchains
  • Free for non-profit research and educational use

Platform Support

SPRING2 is built and validated on:

  • Linux: x86_64, ARM64
  • macOS: Universal (Intel/Apple Silicon)
  • Windows: Native via MinGW-w64, Native ARM64 via MSVC

Documentation

Note

📚 Published documentation, with tips, examples, and full reference is available at: https://spring2.readthedocs.io/

Installation

Pre-built Binaries

The easiest way to use SPRING2 is to download the pre-built binaries from the Releases page.

Windows releases include both portable spring2.exe binaries and architecture-specific Inno Setup installers with embedded version, icon, and publisher metadata.

Running SPRING

Current command-line help:

Allowed options:

* General Options:
  -h [ --help ]                   produce help message
  -V [ --version ]                produce version information
  -o [ --output ] arg             output file name
                                    - if not specified, it uses original input
                                      filenames (swapping extension to .sp during
                                      compression)
                                    - for paired end decompression, if only one file
                                      is specified, two output files will be created
                                      by suffixing .1 and .2
  -t [ --threads ] arg            number of threads (default:
                                  min(max(1, hw_threads - 1), 16))
  -m [ --memory ] arg (=0)        approximate memory budget in GB; reduces
                                  effective thread count using about 1 GB per
                                  worker thread (0 disables)
  -v [ --verbose ]                enable extensive logging (default: progress bar)
--------------------------------------------------------------------------------
* Compression Options:
  -c [ --compress ]               compress
  -R1 [ --R1 ] arg                input read-1 file (required)
  -R2 [ --R2 ] arg                input read-2 file (optional; enables paired-end mode)
  -R3 [ --R3 ] arg                input read-3 file (optional; requires --R2)
  -I1 [ --I1 ] arg                input index-read-1 file (optional; requires --R2)
  -I2 [ --I2 ] arg                input index-read-2 file (optional; requires --I1)
  -l [ --level ] arg (=6)         compression level (1-9) to use for output
                                  (.gz) formatting (passed to gzip unchanged
                                  and scaled to Zstd 1-22 internally)
  -s [ --strip ] arg              discard data: i (ids), o (order), q (quality)
                                  Example: --strip io to drop ids and order.
  -q [ --qmod ] arg               quality mode: possible modes are
                                    1. -q lossless (default)
                                    2. -q qvz qv_ratio (QVZ lossy compression,
                                      parameter qv_ratio roughly corresponds to
                                      bits used per quality value)
                                    3. -q ill_bin (Illumina 8-level binning)
                                    4. -q binary thr high low (binary (2-level)
                                      thresholding, quality binned to high if >=
                                      thr and to low if < thr)
  -n [ --note ] arg               add a custom note to the archive
  -y [ --assay ] arg (=auto)      specify assay type. Valid choices:
                                  auto, dna, rna, atac, bisulfite,
                                  sc-rna, sc-atac, sc-bisulfite
  -a [ --audit ]                  enable post-operation integrity verification
--------------------------------------------------------------------------------
* Decompression Options:
  -d [ --decompress ]             decompress
  -i [ --input ] arg              input archive file (.sp)
  -u [ --unzip ]                  during decompression, force output to be
                                  uncompressed (even if original was .gz)

When an archive was created with grouped lanes (`--R3` and/or `--I1/--I2`),
decompression with a single `-o <prefix>` emits grouped outputs using suffixes
`.R1`, `.R2`, optional `.R3`, optional `.I1`, and optional `.I2`.

Previewing Archives

You can use spring2 --preview to quickly inspect archive metadata, original filenames, and custom notes. It also provides a high-speed Audit mode to verify archive integrity without full decompression:

# Display metadata preview
spring2 --preview archive.sp

# Perform a high-speed integrity audit (dry-run)
spring2 --preview --audit archive.sp

Example:

--------------------------------
Original Input 1:  file.fastq.gz
Note:              My Note
Assay Type:        sc-rna
Mode:              Single-end
Reads Processed:   321055
Compression Ratio: 5.81x (2312 / 398 MB)
Max Read Length:   301 (using short-read encoder)
Preserve Order:    Yes
Preserve IDs:      Yes
Preserve Quality:  Yes
Quality Mode:      Lossless
Compression Level: 6
Use CRLF:          No
--------------------------------
Input 1 Original Compression:
  Profile:         BGZF (Default)
  Format:          BGZF (Block Gzip)
  Block Size:      2185
  Uncompressed Name: file.fastq
  Gzip Header:     FLG=0x4, MTIME=0, OS=255
  Member Count:    102860
  Original Ratio:  5.53x (6382 / 1154 MB)
  Likely Origin:   htslib/samtools/clib
--------------------------------

For a comprehensive list of all options, quality modes, and multi-threaded examples, see the Usage Guide and CLI Reference.

Related

Reporting Bugs and Errors

If you encounter a bug, crash, or unexpected behavior, please open an issue at:

https://github.com/thisisamirv/SPRING2/issues

To help us reproduce and diagnose problems quickly, include:

  • SPRING2 version (spring2 --version)
  • Platform details (OS, architecture, shell/environment)
  • Exact command you ran

For diagnostics, please also provide logs from debug verbosity. Capture both stdout and stderr into a file:

spring2 ... --verbose debug > spring2-debug.log 2>&1

PowerShell equivalent:

spring2 ... --verbose debug *> spring2-debug.log

Please upload spring2-debug.log to the issue so we can diagnose the problem quickly.

If your command includes sensitive paths or data, redact private information before posting logs.

About

A modernized version of SPRING (ONLY FOR DEMONSTRATION)

Resources

License

Stars

Watchers

Forks

Contributors