Commit ccc39fd

LiNk-NY and Copilot authored
add separate section for Bioconductor classes and methods (#163)
* Create reusebioc from bioc-classes-methods.Rmd - move section to own chapter
* add S4 blurb to motivation section
* additional improvements - convert list to table
* create a table of common classes
* Update bioc-classes-methods.Rmd
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update bioc-classes-methods.Rmd
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent d70cce7 commit ccc39fd

3 files changed

Lines changed: 98 additions & 49 deletions

_bookdown.yml

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ rmd_files: ["index.Rmd",
 "devguide-introduction.Rmd",
 "package-name.Rmd",
 "general-package-development.Rmd",
-"important-bioc-features.Rmd",
+"important-bioc-features.Rmd",
+"bioc-classes-methods.Rmd",
 "readme-file.Rmd",
 "description-file.Rmd",
 "namespace-file.Rmd",

bioc-classes-methods.Rmd

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# Common Bioconductor Methods and Classes {#reusebioc}
+
+## Motivation {#bioc-common-motivation}
+
+Bioconductor is a large and diverse project with many packages that provide
+functionality for a wide range of biological data types and statistical methods.
+It has a rich set of classes and methods that are widely used across
+many packages. It is, therefore, important to reuse existing data classes and
+methods to ensure that packages are interoperable with the rest of the
+_Bioconductor_ software ecosystem. Central data representations allow users to
+readily integrate analysis workflows across multiple Bioconductor packages
+providing a more seamless user experience.
+
+Many classes in Bioconductor are implemented using the S4 object-oriented
+system in R. The S4 system is particularly well-suited for the representation
+of complex genomic data structures. The initial motivations to use S4 in
+Bioconductor were centered around its benefits over other systems such as S3.
+These benefits include, but are not limited to, formal class definitions,
+multiple inheritance, and validity checking.
+
+Although Bioconductor promotes the re-use of existing S4 classes to represent
+genomic data, there are cases where new classes are needed for cutting-edge
+technologies. In such cases, new classes should be developed, ideally, with
+open discussion and consideration of the Bioconductor community.
+
+### Use Case: Importing data {#commonimport}
+
+For developers who import data into their package, it is important to know which
+packages and methods are available for reuse. The following list provides
+commonly used packages and their methods to import various data types:
+
++ GTF, GFF, BED, BigWig, etc., -- `r BiocStyle::Biocpkg("rtracklayer")` `::import()`
++ VCF -- `r BiocStyle::Biocpkg("VariantAnnotation")` `::readVcf()`
++ SAM / BAM -- `r BiocStyle::Biocpkg("Rsamtools")` `::scanBam()`,
+`r BiocStyle::Biocpkg("GenomicAlignments")` `::readGAlignment*()`
++ FASTA -- `r BiocStyle::Biocpkg("Biostrings")` `::readDNAStringSet()`
++ FASTQ -- `r BiocStyle::Biocpkg("ShortRead")` `::readFastq()`
++ MS data (XML-based and mgf formats) -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`,
+`r BiocStyle::Biocpkg("Spectra")` `::Spectra(source = MsBackendMgf::MsBackendMgf())`
+
+This list is not exhaustive, and developers are encouraged to initiate dialogue
+with other community members to identify additional packages and methods that
+may be useful for their specific use case. We acknowledge that class and method
+discoverability can be a challenge and we are working to improve this aspect of
+the Bioconductor project.
+
+### Common Classes {#commonclass}
+
+The following table, though certainly not exhaustive, provides select classes
+and constructor functions to represent genomic data:
+
+| Data Type | Package and Function | Description |
+|-------------------------------|----------------------------------------------------------|--------------------------------------------------------|
+| Rectangular feature by sample | `r BiocStyle::Biocpkg("SummarizedExperiment")` `::SummarizedExperiment()` | RNAseq count matrix, microarray, etc. |
+| Genomic coordinates | `r BiocStyle::Biocpkg("GenomicRanges")` `::GRanges()` | 1-based, closed interval genomic coordinates |
+| Genomic coordinates (multiple)| `r BiocStyle::Biocpkg("GenomicRanges")` `::GRangesList()` | Genomic coordinates from multiple samples |
+| Ragged genomic coordinates | `r BiocStyle::Biocpkg("RaggedExperiment")` `::RaggedExperiment()` | Ragged (variable length) genomic coordinates |
+| DNA/RNA/AA sequences | `r BiocStyle::Biocpkg("Biostrings")` `::*StringSet()` | DNA, RNA, or amino acid sequences |
+| Gene sets | `r BiocStyle::Biocpkg("BiocSet")` `::BiocSet()`, <br>`r BiocStyle::Biocpkg("GSEABase")` `::GeneSet()`, <br>`r BiocStyle::Biocpkg("GSEABase")` `::GeneSetCollection()` | Collections of gene sets |
+| Multi-omics data | `r BiocStyle::Biocpkg("MultiAssayExperiment")` `::MultiAssayExperiment()` | Data integrating multiple omics assays |
+| Single cell data | `r BiocStyle::Biocpkg("SingleCellExperiment")` `::SingleCellExperiment()` | Single-cell expression and related data |
+| Mass spec data | `r BiocStyle::Biocpkg("Spectra")` `::Spectra()` | Mass spectrometry data |
+| File formats | `r BiocStyle::Biocpkg("BiocIO")` `::BiocFile-class` | Classes for interacting with various biological data file formats |
+
+Search [biocViews][] for other classes and methods that may be useful for your
+package.
+
+## Package Submission Considerations
+
+Bioconductor strives for interoperability across packages, and package
+submissions are generally not accepted unless they demonstrate such
+interoperability, typically by reusing existing Bioconductor classes and
+methods where appropriate. Submissions that introduce new classes or data
+structures must provide strong justification and clearly describe how they
+interoperate with existing Bioconductor infrastructure.
+
+In the case where the data does not conform to an existing data class,
+we recommend discussing the design of a new class with the Bioconductor
+community. The open discussion can take place on main Bioconductor communication
+channels such as the [bioc-devel][bioc-devel-mail] mailing list, or the
+Bioconductor community Slack.
+
+## Package Implementations
+
+The following packages are examples of packages that reuse Bioconductor classes
+and methods:
+
+| package | inherits classes and methods from: |
+|---|---|
+| `r BiocStyle::Biocpkg("DESeq2")` | `r BiocStyle::Biocpkg("SummarizedExperiment")`, `r BiocStyle::Biocpkg("GenomicRanges")` |
+| `r BiocStyle::Biocpkg("GenomicAlignments")` | `r BiocStyle::Biocpkg("GenomicRanges")`, `r BiocStyle::Biocpkg("Rsamtools")` |
+| `r BiocStyle::Biocpkg("VariantAnnotation")` | `r BiocStyle::Biocpkg("GenomicRanges")`, `r BiocStyle::Biocpkg("SummarizedExperiment")`, `r BiocStyle::Biocpkg("Rsamtools")` |

important-bioc-features.Rmd

Lines changed: 4 additions & 48 deletions
@@ -71,55 +71,11 @@ acceptable list, please email <bioc-devel@r-project.org> requesting the new
 `biocViews` term, under which hierarchy the term should be placed, and the
 justification for the new term.
 
+## Common Bioconductor Methods and Classes
 
-## Common Bioconductor Methods and Classes {#reusebioc}
-
-We strongly recommend reusing existing methods for importing data, and
-reusing established classes for representing data. Here are some
-suggestions for importing different file types and commonly used
-_Bioconductor_ classes. For more classes and functionality also try
-searching in [biocViews][] for your data type.
-
-### Importing data {#commonimport}
-
-+ GTF, GFF, BED, BigWig, etc., -- `r BiocStyle::Biocpkg("rtracklayer")` `::import()`
-+ VCF -- `r BiocStyle::Biocpkg("VariantAnnotation")` `::readVcf()`
-+ SAM / BAM -- `r BiocStyle::Biocpkg("Rsamtools")` `::scanBam()`,
-`r BiocStyle::Biocpkg("GenomicAlignments")` `::readGAlignment*()`
-+ FASTA -- `r BiocStyle::Biocpkg("Biostrings")` `::readDNAStringSet()`
-+ FASTQ -- `r BiocStyle::Biocpkg("ShortRead")` `::readFastq()`
-+ MS data (XML-based and mgf formats) -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`,
-`r BiocStyle::Biocpkg("Spectra")` `::Spectra(source = MsBackendMgf::MsBackendMgf())`
-
-
-### Common Classes {#commonclass}
-
-+ Rectangular feature x sample data --
-`r BiocStyle::Biocpkg("SummarizedExperiment")` `::SummarizedExperiment()`
-(RNAseq count matrix, microarray, ...)
-+ Genomic coordinates -- `r BiocStyle::Biocpkg("GenomicRanges")` `::GRanges()`
-(1-based, closed interval)
-+ Genomic coordinates from multiple samples --
-`r BiocStyle::Biocpkg("GenomicRanges")` `::GRangesList()`
-+ Ragged genomic coordinates -- `r BiocStyle::Biocpkg("RaggedExperiment")`
-`::RaggedExperiment()`
-+ DNA / RNA / AA sequences -- `r BiocStyle::Biocpkg("Biostrings")`
-`::*StringSet()`
-+ Gene sets -- `r BiocStyle::Biocpkg("BiocSet")` `::BiocSet()`,
-`r BiocStyle::Biocpkg("GSEABase")` `::GeneSet()`,
-`r BiocStyle::Biocpkg("GSEABase")` `::GeneSetCollection()`
-+ Multi-omics data --
-`r BiocStyle::Biocpkg("MultiAssayExperiment")` `::MultiAssayExperiment()`
-+ Single cell data --
-`r BiocStyle::Biocpkg("SingleCellExperiment")` `::SingleCellExperiment()`
-+ Spatial transcriptomics data --
-`r BiocStyle::Biocpkg("SpatialExperiment")` `::SpatialExperiment()`
-+ Mass spec data -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`
-+ File formats -- `r BiocStyle::Biocpkg("BiocIO")` `` ::`BiocFile-class` ``
-
-
-In general, a package will not be accepted if it does not show interoperability
-with the current [_Bioconductor_][] ecosystem.
+Visit the [Common Bioconductor Methods and Classes][bioc-common] page for a
+list of common methods and classes used in Bioconductor packages and for
+a more in-depth overview for supporting Bioconductor classes in packages.
 
 ## Vignette {#bioc-vignette}
 