Library Format Specification

JSON Format

A nimble library is a valid JSON file. The top level of the file is an array containing exactly two JSON objects: the aligner configuration, followed by the reference metadata.
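As a minimal sketch of the layout described above, the following Python snippet parses a library and separates its two objects. The helper name split_library and the abbreviated sample contents are hypothetical and exist only for illustration; this is not part of nimble itself.

```python
import json

# Hypothetical, abbreviated library: a top-level array of two objects.
library_text = """
[
  {"score_percent": 0.9, "num_mismatches": 0},
  {"headers": ["reference_genome", "sequence_name", "nt_length", "sequence"],
   "columns": [["example_genome"], ["ref_1"], ["4"], ["ACGG"]]}
]
"""

def split_library(text):
    """Parse library JSON and return (aligner_config, reference_metadata)."""
    data = json.loads(text)
    if not isinstance(data, list) or len(data) != 2:
        raise ValueError("expected a top-level array of two objects")
    config, metadata = data
    if not isinstance(config, dict) or not isinstance(metadata, dict):
        raise ValueError("both top-level entries must be JSON objects")
    return config, metadata

config, metadata = split_library(library_text)
print(sorted(config))
print(metadata["headers"])
```

The first element is treated as the aligner configuration and the second as the reference metadata, matching the order given in this specification.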

Aligner Configuration

The first object contains the aligner configuration:

{
  "score_threshold": number,
  "score_percent": number,
  "num_mismatches": number,
  "discard_multiple_matches": boolean,
  "intersect_level": number,
  "group_on": string,
  "discard_multi_hits": number,
  "require_valid_pair": boolean,
  "max_hits_to_report": number,
  "trim_target_length": number,
  "trim_strictness": number
}

score_threshold: controls the score in base pairs an alignment needs to reach to be considered a match. For perfect matches, set this value equal to the length of the reads being aligned to the reference library.
score_percent: controls the score, as a fraction of read length, that an alignment needs to reach to be considered a match. For perfect matches, set this value equal to 1.0.
num_mismatches: sets the allowable number of mismatches during alignment.
discard_multiple_matches: boolean flag controlling whether a read that matches multiple references is discarded (true) or retained (false).
intersect_level: controls logic behind how to count matches during alignment. There are three intersect levels. intersect_level: 0 takes the best matches from either the read or reverse read, determined by alignment score. intersect_level: 1 takes the intersection between the read and reverse read -- if there is no intersection, it defaults to the best match. intersect_level: 2 takes the intersection and reports no match if there is no intersection.
group_on: if this is set to the name of a header in the reference metadata file, the output will be filtered to that level of specificity. For instance, if you've added a column with lineage information under a header called "lineage", setting "group_on": "lineage" will report lineage-level information, rather than the default case of reference-level information. If a single read matches onto the group_on category more than once during alignment (for instance, if a read matches multiple alleles in the same lineage and you're grouping on lineage), it will only count as one match and instead increment the score for that lineage. If group_on is unset, reference-level information is returned.
discard_multi_hits: Reads that align ambiguously to more than this number of references are discarded. Set this higher if your references have significant sequence similarity. This filter applies to individual reads, before coercing read-pair alignments via intersection.
require_valid_pair: Enable if you're aligning paired-end data and you only want to include pairs where both mates aligned successfully.
max_hits_to_report: Similar to discard_multi_hits, but applied to read-pairs: pairs that align ambiguously to more than this number of references are discarded. Set this higher if your references have significant sequence similarity. This filter applies after coercing read-pair alignments via intersection.
trim_target_length: Target length for the read trimming step, balanced against the quality of the bases.
trim_strictness: Weight for base quality when trimming reads, balanced against the target length.
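
The three intersect_level behaviors described above can be sketched in Python as follows. This is an interpretation of the prose, not nimble's actual implementation: the function names best_hits and resolve_pair, the representation of hits as a {reference: score} dictionary, and the handling of ties are all assumptions made for illustration.

```python
def best_hits(scores):
    """Return the set of references that share the top alignment score."""
    if not scores:
        return set()
    top = max(scores.values())
    return {ref for ref, score in scores.items() if score == top}

def resolve_pair(forward, reverse, intersect_level):
    """Combine per-mate hits ({reference: score}) according to intersect_level."""
    # Merge both mates, keeping the higher score for each reference.
    combined = dict(forward)
    for ref, score in reverse.items():
        combined[ref] = max(score, combined.get(ref, score))
    # References hit by both the read and the reverse read.
    shared = set(forward) & set(reverse)

    if intersect_level == 0:
        return best_hits(combined)                       # best match from either mate
    if intersect_level == 1:
        return shared if shared else best_hits(combined) # intersection, else best match
    if intersect_level == 2:
        return shared                                    # intersection, else no match
    raise ValueError("intersect_level must be 0, 1, or 2")

forward = {"A": 50, "B": 48}
reverse = {"B": 49, "C": 50}
print(sorted(resolve_pair(forward, reverse, 0)))  # ['A', 'C']
print(sorted(resolve_pair(forward, reverse, 1)))  # ['B']
print(sorted(resolve_pair({"A": 50}, {"C": 45}, 2)))  # []
```

In this sketch, levels 1 and 2 differ only when the two mates share no reference: level 1 falls back to the best-scoring match, while level 2 reports nothing.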

Reference Metadata

The second object is the reference metadata:

{
  "headers": ["reference_genome", "sequence_name", "nt_length", "sequence", ...],
  "columns": [[...], [...], [...], [...], ...]
}

This object contains a headers field and a columns field. headers is an array of strings, each labeling the corresponding column in the columns field. The aligner requires at least the reference_genome, sequence_name, nt_length, and sequence headers, along with their corresponding columns.

reference_genome: name for the reference genome
sequence_name: name of the reference sequence, used to identify counts in the alignment output
nt_length: length of the sequence data
sequence: RNA string

The columns field is a multidimensional array of strings. Each sub-array corresponds to a header in the headers field.
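
Because the data is stored column-wise, it can help to view it as one record per reference sequence. The sketch below does this in Python with zip. The helper name to_records and the sample values are hypothetical, and the check that all columns have equal length is an assumption that this specification does not state explicitly.

```python
# Hypothetical metadata with two reference sequences.
metadata = {
    "headers": ["reference_genome", "sequence_name", "nt_length", "sequence"],
    "columns": [
        ["example_genome", "example_genome"],
        ["ref_1", "ref_2"],
        ["4", "6"],
        ["ACGG", "GGCACA"],
    ],
}

def to_records(metadata):
    """Turn column-wise metadata into one dict per reference sequence."""
    headers, columns = metadata["headers"], metadata["columns"]
    if len(headers) != len(columns):
        raise ValueError("each header needs exactly one column")
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) transposes the columns into per-reference rows.
    return [dict(zip(headers, row)) for row in zip(*columns)]

records = to_records(metadata)
print(records[1]["sequence_name"])  # ref_2
```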

To add another header/column pair (e.g. to add per-allele lineage or locus information), add a string to the headers array and add a column to the corresponding index in the columns field. However, you shouldn't need to directly edit this object -- nimble generate has an option for providing metadata via a .csv file.
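
For cases where editing the object directly is unavoidable, the manual step described above might look like the following Python sketch. The helper name add_column and the lineage values are hypothetical; the sketch appends the new header and its column at the same index and leaves the original object untouched.

```python
import copy

# Hypothetical metadata with two reference sequences.
metadata = {
    "headers": ["reference_genome", "sequence_name", "nt_length", "sequence"],
    "columns": [
        ["example_genome", "example_genome"],
        ["ref_1", "ref_2"],
        ["4", "6"],
        ["ACGG", "GGCACA"],
    ],
}

def add_column(metadata, header, values):
    """Return a copy of the metadata with one extra header/column pair."""
    if header in metadata["headers"]:
        raise ValueError(f"header {header!r} already exists")
    if metadata["columns"] and len(values) != len(metadata["columns"][0]):
        raise ValueError("new column must have one value per reference")
    updated = copy.deepcopy(metadata)
    updated["headers"].append(header)
    # Columns hold strings, so values are converted before being stored.
    updated["columns"].append([str(value) for value in values])
    return updated

with_lineage = add_column(metadata, "lineage", ["lineage_x", "lineage_y"])
print(with_lineage["headers"][-1])  # lineage
```

With a column like this in place, setting "group_on": "lineage" in the aligner configuration would report lineage-level rather than reference-level results, as described for group_on.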
