-
-
Notifications
You must be signed in to change notification settings - Fork 10
Open
Description
Core Infrastructure
using Async where possible, parsing bioinformatic file types
- Genbank parser
- Embl parser
- Conversions of formats into gbk/embl/fasta/ffn/faa and save as gff/gbk or embl
- CI/CD with github actions and write tests - ongoing
- Documentation with examples - ongoing
Format expansion
Integration of common types and parsers such as:
- VCF
- BLAST output in various modes, -5 (XML) and -6 (one line per hit), compat with Diamond, MMSeqs2
- GTF
- link ups with formats from rust-bio, noodles-vcf
- look into metabolomics options (like KEGG) some are not open-source
- fastq gzipped version parsing
- SAM format parser
- RPKM output parser
- Support for other compressed files such as BAM and CRAM
- Writer support for those types
Advanced features
Protein parameters such as:
- Hydrophobicity
- Molecular weight
- Counted amino acids & as percentage
- Aromaticity
- Parsing phylogenetic trees
- Parsing multiple sequence alignments
- Methylation data parsing
- Consider pyo3 integration
Testing - bioinformatics
- Testing & updating parsers with various edge case files such as gbk with the CONTIG(JOIN) structure
Data Viz
- Example heatmap with WASM and Js example
- Simple graph types bar, line, scatter
ZEZE1020
Metadata
Metadata
Assignees
Labels
No labels