# Common Bioconductor Methods and Classes {#reusebioc}

## Motivation {#bioc-common-motivation}

Bioconductor is a large and diverse project whose packages cover a wide range
of biological data types and statistical methods. Its rich set of classes and
methods is used across many of these packages, so reusing existing data classes
and methods is important for keeping new packages interoperable with the rest
of the Bioconductor software ecosystem. Shared data representations let users
combine analysis steps from multiple Bioconductor packages into one workflow,
providing a more seamless user experience.

Many Bioconductor classes are implemented with R's S4 object-oriented system,
which is well suited to representing complex genomic data structures.
Bioconductor originally adopted S4 for its advantages over other systems such
as S3, including, but not limited to, formal class definitions, multiple
inheritance, and validity checking.

Although Bioconductor promotes the reuse of existing S4 classes to represent
genomic data, new classes are sometimes needed for cutting-edge technologies.
In such cases, new classes should ideally be developed through open discussion
with, and input from, the Bioconductor community.

### Use Case: Importing Data {#commonimport}

Developers whose packages import data should first check which existing
packages and methods already handle the relevant formats. The following list
gives commonly used packages and their import functions for various data types:

+ GTF, GFF, BED, BigWig, etc. -- `r BiocStyle::Biocpkg("rtracklayer")` `::import()`
+ VCF -- `r BiocStyle::Biocpkg("VariantAnnotation")` `::readVcf()`
+ SAM / BAM -- `r BiocStyle::Biocpkg("Rsamtools")` `::scanBam()`,
  `r BiocStyle::Biocpkg("GenomicAlignments")` `::readGAlignment*()`
+ FASTA -- `r BiocStyle::Biocpkg("Biostrings")` `::readDNAStringSet()`
+ FASTQ -- `r BiocStyle::Biocpkg("ShortRead")` `::readFastq()`
+ MS data -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()` for XML-based
  formats such as mzML; `Spectra::Spectra(source = MsBackendMgf::MsBackendMgf())`
  for MGF files (requires `r BiocStyle::Biocpkg("MsBackendMgf")`)
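
As a minimal sketch of two of these importers, the R code below writes a tiny
BED file and a tiny FASTA file to temporary paths and reads them back. The file
contents and variable names are illustrative assumptions, not real data.
`import()` infers the format from the `.bed` extension and converts BED's
0-based, half-open intervals into the 1-based, closed coordinates of a
`GRanges` object.

```r
library(rtracklayer)  # import(); also attaches GenomicRanges
library(Biostrings)   # readDNAStringSet()

# Illustrative BED file: two features on chr1 (0-based, half-open starts).
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100\tfeatureA",
             "chr1\t200\t350\tfeatureB"), bed_path)

# The format is inferred from the ".bed" extension; the result is a GRanges
# with 1-based, closed coordinates (chr1:1-100 and chr1:201-350).
features <- import(bed_path)
features

# Illustrative FASTA file with two short DNA sequences.
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT",
             ">seq2", "GGCCTTAA"), fasta_path)

# readDNAStringSet() returns a DNAStringSet named by the FASTA headers.
seqs <- readDNAStringSet(fasta_path)
seqs
```

Because both functions return standard Bioconductor classes, the results can
be passed directly to any package that accepts `GRanges` or `DNAStringSet`.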

This list is not exhaustive; developers are encouraged to consult other
community members to identify additional packages and methods suited to their
use case. We acknowledge that discovering existing classes and methods can be
challenging, and we are working to improve this aspect of the Bioconductor
project.

### Common Classes {#commonclass}

The following table, though not exhaustive, lists selected classes and
constructor functions for representing genomic data:

| Data Type | Package and Function | Description |
|-----------|----------------------|-------------|
| Rectangular feature-by-sample data | `r BiocStyle::Biocpkg("SummarizedExperiment")` `::SummarizedExperiment()` | RNA-seq count matrices, microarray data, etc. |
| Genomic coordinates | `r BiocStyle::Biocpkg("GenomicRanges")` `::GRanges()` | 1-based, closed-interval genomic coordinates |
| Grouped genomic coordinates | `r BiocStyle::Biocpkg("GenomicRanges")` `::GRangesList()` | Lists of `GRanges`, e.g., genomic coordinates from multiple samples |
| Ragged genomic coordinates | `r BiocStyle::Biocpkg("RaggedExperiment")` `::RaggedExperiment()` | Genomic coordinates with a variable number of ranges per sample |
| DNA/RNA/amino acid sequences | `r BiocStyle::Biocpkg("Biostrings")` `::*StringSet()` | DNA, RNA, or amino acid sequences |
| Gene sets | `r BiocStyle::Biocpkg("BiocSet")` `::BiocSet()`, <br>`r BiocStyle::Biocpkg("GSEABase")` `::GeneSet()`, <br>`r BiocStyle::Biocpkg("GSEABase")` `::GeneSetCollection()` | Collections of gene sets |
| Multi-omics data | `r BiocStyle::Biocpkg("MultiAssayExperiment")` `::MultiAssayExperiment()` | Multiple omics assays on overlapping sets of samples |
| Single-cell data | `r BiocStyle::Biocpkg("SingleCellExperiment")` `::SingleCellExperiment()` | Single-cell expression and related data |
| Mass spectrometry data | `r BiocStyle::Biocpkg("Spectra")` `::Spectra()` | Mass spectrometry spectra and their metadata |
| File formats | `r BiocStyle::Biocpkg("BiocIO")` `::BiocFile-class` | Base class for interacting with various biological data file formats |
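
To illustrate how entries in this table fit together, the sketch below attaches
a `GRanges` of feature coordinates to a count matrix in a
`SummarizedExperiment`. Because `rowRanges` is supplied, the constructor
returns a `RangedSummarizedExperiment`. The gene names, coordinates, counts,
and sample annotations are made-up values for illustration only.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges, S4Vectors

# Made-up coordinates for three features on two chromosomes.
feature_ranges <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(1, 501, 101), end = c(400, 900, 350)),
    strand = c("+", "-", "+")
)
names(feature_ranges) <- c("geneA", "geneB", "geneC")

# Made-up counts: features in rows, samples in columns (filled column-wise).
counts <- matrix(
    c(10L, 0L, 5L,    # sample1
      12L, 3L, 7L),   # sample2
    nrow = 3,
    dimnames = list(names(feature_ranges), c("sample1", "sample2"))
)

# Sample-level annotation; row names must match the assay's column names.
sample_info <- DataFrame(
    condition = c("control", "treated"),
    row.names = colnames(counts)
)

se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = feature_ranges,
    colData = sample_info
)
se

# Range-aware subsetting: keep the features overlapping chr1:1-1000.
region <- GRanges("chr1:1-1000")
chr1_se <- se[rowRanges(se) %over% region, ]
assay(chr1_se, "counts")
```

Keeping coordinates, assay values, and sample annotations in one object means
that subsetting by region or by sample keeps all three in sync.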

Search [biocViews][] for other classes and methods that may be useful for your
package.

## Package Submission Considerations

Bioconductor strives for interoperability across packages. Submissions are
generally not accepted unless they demonstrate such interoperability, typically
by reusing existing Bioconductor classes and methods where appropriate.
Submissions that introduce new classes or data structures must provide strong
justification and clearly describe how they interoperate with existing
Bioconductor infrastructure.

When the data do not fit an existing data class, we recommend discussing the
design of a new class with the Bioconductor community on its main communication
channels, such as the [bioc-devel][bioc-devel-mail] mailing list or the
Bioconductor community Slack.
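
As a minimal sketch of what interoperating with existing infrastructure can
look like, the class below extends `RangedSummarizedExperiment` instead of
defining a new container from scratch, so it inherits the existing accessors,
subsetting, and validity checks and adds only one extra slot with its own
validity method. The class name `PlatformExperiment`, its `platform` slot, and
the constructor are hypothetical; a real package would typically also provide
accessor functions rather than relying on `@`.

```r
library(SummarizedExperiment)

# Hypothetical class: a RangedSummarizedExperiment plus one extra slot that
# records the assay platform.
setClass("PlatformExperiment",
    contains = "RangedSummarizedExperiment",
    slots = c(platform = "character")
)

# Validity method for the new slot only; validObject() also runs the
# validity methods of the parent classes.
setValidity("PlatformExperiment", function(object) {
    if (length(object@platform) != 1L || is.na(object@platform))
        return("'platform' must be a single non-NA string")
    TRUE
})

# Hypothetical constructor: build the parent object with the existing
# constructor, then add the new slot via new().
PlatformExperiment <- function(assays, rowRanges, colData, platform) {
    rse <- SummarizedExperiment(assays = assays, rowRanges = rowRanges,
                                colData = colData)
    new("PlatformExperiment", rse, platform = platform)
}

# Usage with made-up data: two features, two samples.
pe <- PlatformExperiment(
    assays = list(counts = matrix(1:4, nrow = 2)),
    rowRanges = GRanges(c("chr1:1-100", "chr1:201-300")),
    colData = DataFrame(sample = c("s1", "s2")),
    platform = "RNA-seq"
)

# Inherited SummarizedExperiment methods work on the new class.
is(pe, "SummarizedExperiment")
dim(pe)
assay(pe, "counts")

# An invalid platform is rejected by the validity method.
tryCatch(
    PlatformExperiment(
        assays = list(counts = matrix(1:4, nrow = 2)),
        rowRanges = GRanges(c("chr1:1-100", "chr1:201-300")),
        colData = DataFrame(sample = c("s1", "s2")),
        platform = c("RNA-seq", "ATAC-seq")
    ),
    error = function(e) conditionMessage(e)
)
```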

## Package Implementations

The following packages are examples of reusing Bioconductor classes and methods:

| Package | Builds on classes and methods from |
|---------|------------------------------------|
| `r BiocStyle::Biocpkg("DESeq2")` | `r BiocStyle::Biocpkg("SummarizedExperiment")`, `r BiocStyle::Biocpkg("GenomicRanges")` |
| `r BiocStyle::Biocpkg("GenomicAlignments")` | `r BiocStyle::Biocpkg("GenomicRanges")`, `r BiocStyle::Biocpkg("Rsamtools")` |
| `r BiocStyle::Biocpkg("VariantAnnotation")` | `r BiocStyle::Biocpkg("GenomicRanges")`, `r BiocStyle::Biocpkg("SummarizedExperiment")`, `r BiocStyle::Biocpkg("Rsamtools")` |
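
One way to see this reuse in practice is to check class inheritance directly.
The sketch below assumes DESeq2 and VariantAnnotation are installed. It
simulates a small `DESeqDataSet` with DESeq2's `makeExampleDESeqDataSet()` and
reads the example VCF file shipped with VariantAnnotation. Both objects are
`RangedSummarizedExperiment`s, so generic accessors such as `assay()`,
`colData()`, and `rowRanges()` apply to them without package-specific code.

```r
library(DESeq2)
library(VariantAnnotation)

# Simulated counts for 100 genes and 4 samples.
dds <- makeExampleDESeqDataSet(n = 100, m = 4)
is(dds, "RangedSummarizedExperiment")
dim(assay(dds, "counts"))
colData(dds)

# Example VCF file distributed with VariantAnnotation.
vcf_file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(vcf_file, genome = "hg19")
is(vcf, "RangedSummarizedExperiment")

# The same generic accessor returns the variant positions as a GRanges.
head(rowRanges(vcf))
```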