Closed
Description
I'm facing an issue when I summarize a data.table using a function call inside the "by" clause: filtering the result afterwards returns empty tables for values that are clearly present.
Here is an example:
> library(data.table)
> dt <- data.table(x = c(4, 5, 1, 3, 2), y = 1L, key = "x")
> dt
x y
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1
> str(dt)
Classes ‘data.table’ and 'data.frame': 5 obs. of 2 variables:
$ x: num 1 2 3 4 5
$ y: int 1 1 1 1 1
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "sorted")= chr "x"
> dt_sum <- dt[, .(.N), by = .(round(2 / x))]
> dt_sum
round N
1: 2 1
2: 1 2
3: 0 2
> str(dt_sum)
Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
$ round: num 2 1 0
$ N : int 1 2 2
- attr(*, "sorted")= chr "round"
- attr(*, ".internal.selfref")=<externalptr>
> dt_sum[round == 0] # ERROR
Empty data.table (0 rows and 2 cols): round,N
> dt_sum[round == 1] # CORRECT
round N
1: 1 2
> dt_sum[round == 2] # ERROR
Empty data.table (0 rows and 2 cols): round,N
I think the issue is the attribute - attr(*, "sorted")= chr "round": dt_sum is marked as keyed by round, but it isn't actually sorted by that column (the values are 2, 1, 0). I don't know whether this is a known issue or whether it is documented somewhere; I didn't find anything. Greetings!
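If the stale "sorted" attribute really is the cause, a possible workaround is to check whether the declared key matches the physical row order and drop the key when it does not. The following is only a sketch under that assumption: key_matches_order is a hypothetical helper (not part of data.table), and the table is rebuilt by hand with a deliberately wrong key marker so the example does not depend on the bug being present in the installed version.

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when the rows really are
# in ascending order of the declared key columns.
key_matches_order <- function(dt) {
  k <- key(dt)
  if (is.null(k)) return(TRUE)  # no key declared, nothing to contradict
  cols <- lapply(k, function(nm) dt[[nm]])
  # Radix ordering with NAs first is used as an approximation of data.table's
  # key ordering; this is an assumption that is adequate for numeric columns.
  ord <- do.call(order, c(cols, list(method = "radix", na.last = FALSE)))
  identical(ord, seq_len(nrow(dt)))
}

# Rebuild the reported state by hand: values 2, 1, 0 marked as keyed by "round"
res <- data.table(round = c(2, 1, 0), N = c(1L, 2L, 2L))
setattr(res, "sorted", "round")  # deliberately wrong key marker

if (!key_matches_order(res)) {
  setattr(res, "sorted", NULL)   # drop the stale marker; the data is unchanged
}

res[round == 0]
```

With the key removed, the filter no longer relies on the table being sorted by round, so it should find the matching row. Calling setkey(res, round) instead would physically re-sort the table and set a valid key, at the cost of changing the row order.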
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=es_MX.UTF-8 LC_NUMERIC=C LC_TIME=es_MX.UTF-8 LC_COLLATE=es_MX.UTF-8 LC_MONETARY=es_MX.UTF-8
[6] LC_MESSAGES=es_MX.UTF-8 LC_PAPER=es_MX.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_MX.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.6
loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2