---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  eval = FALSE
)
```

# consensusNetR


<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Codecov test coverage](https://codecov.io/gh/Systems-Methods/consensusNetR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Systems-Methods/consensusNetR?branch=main)
<!-- badges: end -->

consensusNetR is an R package for combining networks into a consensus network, based on the work of [Laura Cantini and Andrei Zinovyev](https://academic.oup.com/bioinformatics/article/35/21/4307/5426054). In addition to identifying consensus communities by correlating community meta-genes (loadings or membership scores), we also implement methods based on overlap.

## Installation

Install the version from GitHub with:

```{r}
remotes::install_github(
  repo = "Systems-Methods/consensusNetR"
)
```

or:

```{r}
remotes::install_git(
  "https://github.com/Systems-Methods/consensusNetR"
)
```


# Example Workflow

This example creates a consensus network from three public datasets:
[GSE39582](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39582),
[TCGA COAD](https://portal.gdc.cancer.gov/projects/TCGA-COAD), and
[TCGA READ](https://portal.gdc.cancer.gov/projects/TCGA-READ).

The workflow begins with
[icWGCNA](https://systems-methods.github.io/icWGCNA/) results, but
alternative methods (KNN, PCA, ICA, WGCNA) work as well. See
[Example Appendix] for the data download, icWGCNA run details, and code.
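
As an illustration of the PCA alternative mentioned above, the following minimal sketch derives a genes-by-components loading matrix with `stats::prcomp()`. The toy expression matrix, the choice of three components, and the assumption that such a loading matrix can stand in for `community_membership` (genes in rows, communities in columns) are illustrative only and are not confirmed by the package documentation.

```{r}
# Toy expression matrix: genes in rows, samples in columns (hypothetical data)
set.seed(42)
expr <- matrix(
  rnorm(20 * 10),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10))
)

# prcomp() expects observations (samples) in rows, so transpose first
pca <- stats::prcomp(t(expr), center = TRUE, scale. = FALSE)

# Gene loadings for the first three components:
# a genes x components matrix with gene names as row names
pca_memb <- pca$rotation[, 1:3]
dim(pca_memb)
```

Whether raw loadings need rescaling before being passed to the consensus functions is left open here.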

## Consensus construction

```{r Consensus-construction}
# Create a list of community_membership objects, one per study
memb_list <- list(
  GSE39582 = GSE39582_icwgcna$community_membership,
  READ = read_icwgcna$community_membership,
  COAD = coad_icwgcna$community_membership
)

# Construct Meta Reciprocal Best Hits based on overlaps (318 communities found)
rbh <- construct_rbh_overlap_based(memb_list, top_n = 25)
nrow(rbh)
## 318

# RBH Heatmap Creation
plot_rbh(rbh = rbh, network_membership_list = memb_list)
```

![Reciprocal Best Hits Heatmap](man/figures/README-RBH-1.png)
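
To make the idea of overlap-based reciprocal best hits concrete, here is a toy sketch in base R. It is not the implementation behind `construct_rbh_overlap_based()`; the helper names (`top_genes`, `overlap_counts`), the toy matrices, and the use of a raw intersection count as the overlap score are all assumptions made for illustration.

```{r}
# Toy membership matrices: genes in rows, communities in columns
genes <- paste0("g", 1:12)
memb_a <- cbind(
  A1 = c(0.9, 0.8, 0.7, 0.6, rep(0, 8)),
  A2 = c(rep(0, 4), 0.9, 0.8, 0.7, 0.6, rep(0, 4))
)
memb_b <- cbind(
  B1 = c(rep(0, 4), 0.6, 0.7, 0.8, 0.9, rep(0, 4)),
  B2 = c(0.9, 0.8, 0.7, rep(0, 5), 0.6, rep(0, 3))
)
rownames(memb_a) <- rownames(memb_b) <- genes

# Hypothetical helper: names of the top_n highest-scoring genes per community
top_genes <- function(memb, top_n) {
  lapply(
    stats::setNames(seq_len(ncol(memb)), colnames(memb)),
    function(j) names(sort(memb[, j], decreasing = TRUE))[seq_len(top_n)]
  )
}

# Hypothetical helper: shared top genes for every pair of communities
# (rows = communities of study A, columns = communities of study B)
overlap_counts <- function(top_a, top_b) {
  sapply(top_b, function(b) sapply(top_a, function(a) length(intersect(a, b))))
}

ov <- overlap_counts(top_genes(memb_a, 4), top_genes(memb_b, 4))

# A pair is a reciprocal best hit when each community is the other's best match
best_b_for_a <- apply(ov, 1, which.max)
best_a_for_b <- apply(ov, 2, which.max)
is_rbh <- unname(best_a_for_b[best_b_for_a] == seq_len(nrow(ov)))

rbh_pairs <- data.frame(
  a = rownames(ov)[is_rbh],
  b = colnames(ov)[best_b_for_a[is_rbh]],
  overlap = ov[cbind(which(is_rbh), best_b_for_a[is_rbh])]
)
rbh_pairs
```

In this toy case `A1` pairs with `B2` (3 shared genes) and `A2` pairs with `B1` (4 shared genes).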


```{r Consensus-construction2}
# Detect Communities in Adjacency/Reciprocal Best Hits Matrix
consensus_comms <- detect_consensus_communities(rbh)
# Show the first 10 communities
# Note: community 1 is a miscellaneous cluster and will be removed
table(consensus_comms$Cluster)[1:10]
##    1   2   3   4   5   6   7   8   9  10
##  200   3   3   3   3   3   3   3   3   3

# Compute the average metagene across studies for each community
consensus_memb <- calc_consensus_memberships(consensus_comms, memb_list)
```
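
The averaging step can be pictured with a small base R sketch. It assumes a consensus community made up of one community from each of two studies and averages their membership scores over the genes the studies share; how `calc_consensus_memberships()` actually handles genes missing from a study is not described here, so restricting to shared genes is an assumption.

```{r}
# Toy membership columns from two studies (hypothetical values)
memb_a <- matrix(
  c(0.9, 0.1, 0.5),
  ncol = 1, dimnames = list(c("g1", "g2", "g3"), "A1")
)
memb_b <- matrix(
  c(0.7, 0.3, 0.2),
  ncol = 1, dimnames = list(c("g1", "g2", "g4"), "B2")
)

# Restrict to genes present in both studies (an assumption of this sketch)
shared <- intersect(rownames(memb_a), rownames(memb_b))

# Average the two matched communities gene by gene
consensus_vec <- rowMeans(cbind(memb_a[shared, "A1"], memb_b[shared, "B2"]))
consensus_vec
```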

## Downstream Analysis

```{r Downstream-Analysis}
consensus_genes <- get_gene_community_membership(consensus_comms, memb_list, 2)
head(consensus_genes)
##    gene_id cluster n_studies
##  1    A1BG      43         2
##  2     A2M       1         2
##  3    AACS      19         2
##  4   AAGAB      25         2
##  5   AASDH      40         2
##  6    AASS      42         2

# Need to use icWGCNA for individual eigengenes
GSE39582_eigen <- icWGCNA::compute_eigengene_matrix(
  ex = GSE39582_df,
  membership_matrix = consensus_memb
)
read_eigen <- icWGCNA::compute_eigengene_matrix(
  ex = read_df,
  membership_matrix = consensus_memb
)
coad_eigen <- icWGCNA::compute_eigengene_matrix(
  ex = coad_df,
  membership_matrix = consensus_memb
)

eigen_list <- list(GSE39582_eigen, read_eigen, coad_eigen)
plot_consensus_eig_dist(eigen_list)
```

![Individual Eigengene Distributions](man/figures/README-eig-dist-1.png)


## Example Appendix

<details><summary>Downloading data</summary>

GSE39582 was profiled on the Affymetrix Human Genome U133 Plus 2.0 Array, so
its probe IDs must be converted to gene symbols with the
`icWGCNA::gene_mapping()` function. This matches the two TCGA datasets, which
already use gene symbols.

```{r downloading-data}
library(icWGCNA)

# GSE39582
# getGEO() returns a list of ExpressionSets for a GSE accession; keep the first
GSE39582 <- GEOquery::getGEO("GSE39582")[[1]]

# TCGA READ (loading the downloaded files into `read_df` is not shown here)
UCSCXenaTools::getTCGAdata(
  project = "READ",
  mRNASeq = TRUE,
  mRNASeqType = "normalized",
  clinical = TRUE,
  download = TRUE,
  destdir = "/MY_PATH/data/"
)

# TCGA COAD (loading the downloaded files into `coad_df` is not shown here)
UCSCXenaTools::getTCGAdata(
  project = "COAD",
  mRNASeq = TRUE,
  mRNASeqType = "normalized",
  clinical = TRUE,
  download = TRUE,
  destdir = "/MY_PATH/data/"
)
```

</details>

<details><summary> Preprocessing steps</summary>

All datasets must use the same gene annotation (e.g. gene symbols, Entrez,
Ensembl, ...). In this example we convert GSE39582 to gene symbols using
the [icWGCNA::gene_mapping()](https://systems-methods.github.io/icWGCNA/reference/gene_mapping.html) function.

```{r gene-mapping}
# creating annotation file for gene mapping to gene symbols
GSE39582_annotation <- GSE39582@featureData@data |>
  dplyr::select(ID, gene_symbol = `Gene Symbol`) |>
  dplyr::mutate(
    gene_symbol = purrr::map(
      gene_symbol, ~ stringr::str_split(.x, " /// ")[[1]]
    )
  ) |>
  tidyr::unnest(gene_symbol)

GSE39582_hugo <- icWGCNA::gene_mapping(
  GSE39582@assayData$exprs,
  GSE39582_annotation,
  compress_fun = "highest_mean",
  compress_trans = "log_exp"
)
```

All data should be normalized. In this example we downloaded already
normalized data, so no transformations are needed here.
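
Before building a consensus, it can help to confirm that the datasets really do share identifiers. The sketch below uses toy matrices with hypothetical names (`expr_list`, `study1`, `study2`) and assumes gene identifiers are stored as row names; it is a sanity check, not part of the package.

```{r}
# Toy expression matrices with gene symbols as row names (hypothetical data)
expr_list <- list(
  study1 = matrix(
    1:6, nrow = 3,
    dimnames = list(c("TP53", "KRAS", "APC"), c("s1", "s2"))
  ),
  study2 = matrix(
    1:4, nrow = 2,
    dimnames = list(c("KRAS", "APC"), c("s3", "s4"))
  )
)

# Genes present in every dataset
common_genes <- Reduce(intersect, lapply(expr_list, rownames))

# Fraction of each dataset's genes that is shared by all datasets;
# a value near zero suggests mismatched annotation (e.g. symbols vs. Ensembl)
frac_shared <- vapply(
  expr_list,
  function(x) mean(rownames(x) %in% common_genes),
  numeric(1)
)
frac_shared
```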

</details>


<details><summary> icWGCNA runs</summary>

The icWGCNA runs use the default settings, except that the maximum number of
iterations is reduced to 5 for demonstration purposes. These runs benefit
greatly from using multiple CPU cores.

```{r icWGCNA}
# GSE39582
GSE39582_icwgcna <- icWGCNA::icwgcna(GSE39582_hugo, maxIt = 5)

# TCGA READ
read_icwgcna <- icWGCNA::icwgcna(read_df, maxIt = 5)

# TCGA COAD
coad_icwgcna <- icWGCNA::icwgcna(coad_df, maxIt = 5)
```

</details>


# Code of Conduct

Please note that the consensusNetR project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.