Open
Conversation
ESCRI11
approved these changes
Mar 30, 2026
Contributor
ESCRI11
left a comment
There was a problem hiding this comment.
Functions work as expected. Some comments anyways (mainly regarding mergeCpG function).
rmandgccalls are pointless. e.g. on line 322 you remove LL to allocate an empty vector to it on next line, the otherrmis before the function ends, hence variable will get out of scope inmediately and cleaned up by R automatically. R doesgcwhen in need, most of the times calling it manually adds unneeded overheads.- There are two loops where you are adding new elements on each iteration
LL[[ff[i]]]; this is the most inefficient way to do things on R. The first loop could easily be vectorized (grouped mean viarowmeans), the second given the operations of "collapse unique" with strings maybe it's slightly more difficult (and probably less gains). Simple vectorized example run on my laptop shows 100x performance increase on first loop. Would be good to explore this options for better code.
Contributor
Author
|
Contributor
|
@zitoa Quick idea of how first loop could be vectorized: ## small example data
X <- matrix(c(
0.1, 0.2, 0.3,
0.4, NA, 0.6,
0.7, 0.8, 0.9,
NA, 0.5, 0.4,
0.3, 0.3, 0.3,
0.9, 0.9, NA
), nrow = 6, byrow = TRUE, dimnames = list(
paste0("cg", 1:6),
paste0("S", 1:3)
))
gene_sym <- c("A", "A", "B", "B", "B", "C")
X
#> S1 S2 S3
#> cg1 0.1 0.2 0.3
#> cg2 0.4 NA 0.6
#> cg3 0.7 0.8 0.9
#> cg4 NA 0.5 0.4
#> cg5 0.3 0.3 0.3
#> cg6 0.9 0.9 NA
gene_sym
#> [1] "A" "A" "B" "B" "B" "C"
## current approach
collapse_loop <- function(X, gene_sym) {
ff <- unique(gene_sym)
LL <- list()
for (i in seq_along(ff)) {
jj <- which(gene_sym == ff[i])
LL[[ff[i]]] <- t(as.matrix(colMeans(X[jj, , drop = FALSE], na.rm = TRUE)))
}
res <- do.call(rbind, LL)
rownames(res) <- ff
res
}
## vectorized example
collapse_vectorized <- function(X, gene_sym) {
X_nona <- X
not_na <- !is.na(X)
X_nona[!not_na] <- 0
sums <- rowsum(X_nona, gene_sym, reorder = FALSE)
counts <- rowsum(not_na * 1, gene_sym, reorder = FALSE)
counts[counts == 0] <- NA # 0/0 -> NA, same as colMeans(na.rm=TRUE)
sums / counts
}
## compare
res_loop <- collapse_loop(X, gene_sym)
res_vec <- collapse_vectorized(X, gene_sym)
res_loop
#> S1 S2 S3
#> A 0.25 0.2000000 0.4500000
#> B 0.50 0.5333333 0.5333333
#> C 0.90 0.9000000 NaN
res_vec
#> S1 S2 S3
#> A 0.25 0.2000000 0.4500000
#> B 0.50 0.5333333 0.5333333
#> C 0.90 0.9000000 NA
all.equal(res_loop, res_vec, tolerance = 1e-12)
#> [1] TRUECreated on 2026-04-01 with reprex v2.1.1 |
Contributor
|
also, i dont understand the latest commits where you handle |
Contributor
Author
|
@ESCRI11 I set EPIC array & 850K array when i was testing playbase by script. it is in my list to stick only to EPIC. Will push. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Array methylation normalization functions. Playbase.