Library Format Specification
A nimble library is a valid JSON file. The top level of the file is an array containing two JSON objects.
The first object contains the aligner configuration:
{
"score_threshold": number,
"score_percent": number,
"num_mismatches": number,
"discard_multiple_matches": boolean,
"intersect_level": number,
"group_on": string,
"discard_multi_hits": number,
"require_valid_pair": boolean,
"max_hits_to_report": number,
"trim_target_length": number,
"trim_strictness": number
}
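The field types listed above can be checked mechanically before a library is handed to the aligner. The following is a minimal sketch in Python and is not part of nimble. The names CONFIG_TYPES, matches_type, and check_config are hypothetical, and the sketch only checks fields that are present because it does not assume which fields nimble treats as optional.

```python
import numbers

# Field names and JSON types as listed in the configuration object above.
CONFIG_TYPES = {
    "score_threshold": "number",
    "score_percent": "number",
    "num_mismatches": "number",
    "discard_multiple_matches": "boolean",
    "intersect_level": "number",
    "group_on": "string",
    "discard_multi_hits": "number",
    "require_valid_pair": "boolean",
    "max_hits_to_report": "number",
    "trim_target_length": "number",
    "trim_strictness": "number",
}


def matches_type(value, json_type):
    """Return True if a decoded JSON value has the given JSON type."""
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    # bool is a subclass of int in Python, so exclude it from "number".
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


def check_config(config):
    """Return a list of problems found in a decoded configuration object.

    Hypothetical helper: only fields that are present are checked.
    """
    problems = []
    for name, value in config.items():
        expected = CONFIG_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif not matches_type(value, expected):
            problems.append(f"{name}: expected {expected}")
    return problems
```

Calling check_config on the first element of a decoded library returns an empty list when every present field is known and has the listed type.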
- score_threshold: controls the score, in base pairs, that an alignment needs to reach to be considered a match. For perfect matches, set this value equal to the length of the reads being aligned to the reference library.
- score_percent: controls the score, as a fraction of read length, that an alignment needs to reach to be considered a match. For perfect matches, set this value to 1.0.
- num_mismatches: sets the allowable number of mismatches during alignment.
- discard_multiple_matches: boolean flag that determines whether a read that matches multiple references is discarded or retained.
- intersect_level: controls how matches are counted during alignment. There are three intersect levels:
  - intersect_level: 0 takes the best matches from either the read or reverse read, determined by alignment score.
  - intersect_level: 1 takes the intersection between the read and reverse read; if there is no intersection, it defaults to the best match.
  - intersect_level: 2 takes the intersection and reports no match if there is no intersection.
- group_on: if this is set to the name of a header in the reference metadata file, the output will be filtered to that level of specificity. For instance, if you've added a column with lineage information under a header called "lineage", setting "group_on": "lineage" will report lineage-level information, rather than the default case of reference-level information. If a single read matches onto the group_on category more than once during alignment (for instance, if a read matches multiple alleles in the same lineage and you're grouping on lineage), it will only count as one match and instead increment the score for that lineage. If group_on is unset, reference-level information is returned.
- discard_multi_hits: reads that align ambiguously to more than this number of references are discarded. Set this higher if your references have significant sequence similarity. This filter applies to individual reads, before coercing read-pair alignments via intersection.
- require_valid_pair: enable this if you're aligning paired-end data and only want to include pairs where both mates aligned successfully.
- max_hits_to_report: similar to discard_multi_hits; pairs that align ambiguously to more than this number of references are discarded. Set this higher if your references have significant sequence similarity. This filter applies to read-pairs, after coercing read-pair alignments via intersection.
- trim_target_length: target length for the read trimming step, balanced against the quality of the bases.
- trim_strictness: weight for base quality when trimming reads, balanced against the target length.
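The three intersect levels can be illustrated with a short Python sketch. This is not nimble's implementation. It assumes each mate's hits are available as a dictionary mapping reference name to alignment score, and it interprets "best matches" as the references with the highest score seen in either mate. The function names best_matches and combine_pair are hypothetical.

```python
def best_matches(forward, reverse):
    """Return the references with the highest alignment score in either mate."""
    scores = {}
    for hits in (forward, reverse):
        for name, score in hits.items():
            scores[name] = max(score, scores.get(name, score))
    if not scores:
        return set()
    top = max(scores.values())
    return {name for name, score in scores.items() if score == top}


def combine_pair(forward, reverse, intersect_level):
    """Combine the hits of a read and its reverse read (illustrative only)."""
    if intersect_level not in (0, 1, 2):
        raise ValueError("intersect_level must be 0, 1, or 2")
    if intersect_level == 0:
        # Level 0: best matches from either read, no intersection taken.
        return best_matches(forward, reverse)
    shared = set(forward) & set(reverse)
    if shared:
        # Levels 1 and 2: references hit by both reads.
        return shared
    if intersect_level == 1:
        # Level 1: fall back to the best match when nothing is shared.
        return best_matches(forward, reverse)
    # Level 2: no intersection means no match.
    return set()
```

With made-up hits such as forward = {"A": 50, "B": 48} and reverse = {"B": 49, "C": 40}, level 0 selects A under these assumptions, while levels 1 and 2 select B because it is the only reference hit by both reads.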
The second object is the reference metadata:
{
"headers": ["reference_genome", "sequence_name", "nt_length", "sequence", ...],
"columns": [[...], [...], [...], [...], ...]
}
This object contains a headers field and a columns field. headers is an array of strings, each labeling the column at the same index in the columns field. The aligner requires at least the reference_genome, sequence_name, nt_length, and sequence headers, along with their corresponding columns.
- reference_genome: name for the reference genome
- sequence_name: name of the reference sequence, used to identify counts in the alignment output
- nt_length: length of the sequence data
- sequence: RNA string
The columns field is a two-dimensional array of strings. Each sub-array holds the values for the header at the same index in the headers field.
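Because the data is stored column by column, it can help to check the object and view it as one record per reference. The Python sketch below is illustrative and not part of nimble. The name metadata_rows is hypothetical, and the sketch assumes every column has exactly one entry per reference sequence, which the layout implies but this specification does not state.

```python
REQUIRED_HEADERS = ("reference_genome", "sequence_name", "nt_length", "sequence")


def metadata_rows(metadata):
    """Check a decoded reference metadata object and return one dict per reference."""
    headers = metadata["headers"]
    columns = metadata["columns"]
    missing = [h for h in REQUIRED_HEADERS if h not in headers]
    if missing:
        raise ValueError(f"missing required headers: {missing}")
    if len(headers) != len(columns):
        raise ValueError("headers and columns differ in length")
    # Assumption: all columns have the same number of entries.
    if len({len(column) for column in columns}) > 1:
        raise ValueError("columns differ in length")
    # zip(*columns) transposes column-major data into one tuple per reference.
    return [dict(zip(headers, row)) for row in zip(*columns)]
```

Each returned dictionary maps header names to the values for a single reference sequence, which is often easier to inspect than the raw columns.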
To add another header/column pair (e.g. to add per-allele lineage or locus information), add a string to the headers array and add a column to the corresponding index in the columns field. However, you shouldn't need to directly edit this object -- nimble generate has an option for providing metadata via a .csv file.
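If the object does need to be edited by hand, adding a header/column pair amounts to appending to both arrays at the same index. The following Python sketch shows the idea. The helper name add_column is hypothetical and is not part of nimble; the .csv route through nimble generate remains the approach this specification recommends.

```python
def add_column(metadata, header, values):
    """Return a copy of the metadata object with one extra header/column pair."""
    if header in metadata["headers"]:
        raise ValueError(f"header already present: {header}")
    columns = metadata["columns"]
    # Assumption: the new column needs one value per existing reference.
    if columns and len(values) != len(columns[0]):
        raise ValueError("new column must have one value per reference")
    return {
        "headers": metadata["headers"] + [header],
        "columns": columns + [list(values)],
    }
```

For example, add_column(metadata, "lineage", lineages) returns a new object whose last header is "lineage" and whose last column holds the supplied values, leaving the original object unchanged.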