RNApysoforms is a Python package designed for visualizing RNA isoform structures and expression levels. Leveraging Plotly for interactive plotting and Polars for efficient data manipulation, it enables the creation of fast-rendering, interactive plots suitable for both local and web applications. Inspired by the R package ggtranscript, RNApysoforms brings similar RNA visualization capabilities to the Python ecosystem, facilitating effective exploration and presentation of RNA sequencing data.
https://doi.org/10.1093/bioadv/vbaf057
RNApysoforms expects feature start and end coordinates in GTF format, where coordinates are 1-indexed and inclusive on both ends.
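As a quick illustration of that coordinate convention, here is a minimal sketch in plain Python. The helper names `feature_length` and `gtf_to_zero_based_half_open` are hypothetical and are not part of RNApysoforms; they only show what "1-indexed and inclusive on both ends" implies for lengths and for conversion to the 0-based, half-open convention used by formats such as BED.

```python
from typing import Tuple


def feature_length(start: int, end: int) -> int:
    """Length in bases of a feature given GTF-style coordinates.

    GTF coordinates are 1-based and inclusive on both ends, so a feature
    spanning 100..200 covers 101 bases, not 100.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid GTF interval: start={start}, end={end}")
    return end - start + 1


def gtf_to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert a GTF interval to 0-based, half-open coordinates (BED-style).

    Only the start shifts by one; the inclusive 1-based end equals the
    exclusive 0-based end.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid GTF interval: start={start}, end={end}")
    return start - 1, end
```

If your annotation comes from a 0-based source, the reverse conversion (add one to the start, keep the end) would be needed before passing coordinates in GTF form.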
You can install RNApysoforms using pip:
pip install RNApysoforms

Rescaling introns for a prettier RNA isoform structure plot
Plotting RNA isoform structure and expression
Plotting RNA isoform structure and normalized expression
Function documentation and vignettes
Please go through the documentation and vignettes before submitting an issue.
Contributions to RNApysoforms are welcome! Please feel free to submit a Pull Request.
The function implementations are under the src/RNApysoforms directory.
- calculate_exon_number(): Assigns exon numbers to exons, CDS, and introns within a genomic annotation dataset based on transcript structure and strand direction.
- gene_filtering(): Filters genomic annotations and optionally an expression matrix for a specific gene, with options to order and select top expressed transcripts.
- make_plot(): Creates a Plotly figure panel for transcript structure plots and/or expression data plots.
- make_traces(): Generates Plotly traces for visualizing transcript structures and expression data.
- read_expression_matrix(): Loads and processes an expression matrix, optionally merging with metadata, performing CPM normalization, and calculating relative transcript abundance.
- process_expression_matrix(): Same as read_expression_matrix(), but takes a Polars DataFrame as input instead of a file path.
- read_ensembl_gtf(): Reads an ENSEMBL GTF (Gene Transfer Format) file and returns the data as a Polars DataFrame.
- process_ensembl_gtf(): Same as read_ensembl_gtf(), but takes a Polars DataFrame as input instead of a file path.
- shorten_gaps(): Shortens intron and transcript start gaps between exons in genomic annotations to enhance visualization.
- to_intron(): Converts exon coordinates into corresponding intron coordinates within a genomic annotation dataset.
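
To make two of the operations listed above more concrete, the following is a minimal, standard-library-only sketch of the underlying ideas: deriving intron intervals from exon intervals, and normalizing counts to CPM and relative abundance. It is not the package's implementation and does not use its API. The function names `introns_from_exons`, `counts_per_million`, and `relative_abundance` are hypothetical, and the sketch assumes that relative abundance means each transcript's percentage share of its gene's total expression.

```python
from typing import Dict, List, Tuple

# (start, end) in GTF convention: 1-based, inclusive on both ends.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of a single transcript.

    With inclusive coordinates, an intron starts one base after the
    previous exon ends and stops one base before the next exon starts.
    """
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        intron_start, intron_end = prev_end + 1, next_start - 1
        # Skip exons that touch or overlap: there is no gap between them.
        if intron_start <= intron_end:
            introns.append((intron_start, intron_end))
    return introns


def counts_per_million(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale one sample's raw counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value * 1_000_000 / total for name, value in counts.items()}


def relative_abundance(gene_counts: Dict[str, float]) -> Dict[str, float]:
    """Percentage of a gene's total expression carried by each transcript.

    Assumption: `gene_counts` holds only transcripts of the same gene,
    measured in the same sample.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return {name: 0.0 for name in gene_counts}
    return {name: value * 100 / total for name, value in gene_counts.items()}
```

For example, exons at 100-200, 300-400, and 500-600 yield introns at 201-299 and 401-499. The package itself works on Polars DataFrames rather than Python lists and dictionaries, so refer to the function documentation for the actual inputs and column names.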
This project is licensed under the MIT License.
