Added shallow search for data.table in tables() #7580

Open
manmita wants to merge 17 commits into master from feat/adding_list_search_to_tables

@manmita manmita commented Jan 9, 2026

Closes #2606

Added the argument depth to tables(); depth = 1L enables a shallow search:
if depth is 0, only top-level data.tables are listed (the existing behavior)
if depth is 1, we also loop through list-like objects (is.list is TRUE, but not a data.table or data.frame) and list the data.tables they contain
if depth > 1, we throw an error

Nested data.tables are named after their parent list, as parent[[1]] or parent$child.
The info table is pre-allocated to avoid reallocation cost.
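
The depth rules and the naming scheme above can be illustrated with a minimal standalone sketch. It is not the code in this PR: find_tables is a hypothetical helper that only returns the names tables() would report, and the environment e and its contents are made up for the illustration.

```r
library(data.table)

# Hypothetical helper, not the PR's implementation: return the names of the
# data.tables in `env`, optionally looking one level into plain lists.
find_tables = function(env, depth=0L) {
  if (depth > 1L) stop("depth > 1L is not implemented yet")
  one_object = function(nm) {
    obj = get(nm, envir=env)
    if (is.data.table(obj)) return(nm)
    # only plain lists are searched, and only when depth is 1
    if (depth < 1L || !is.list(obj) || is.data.frame(obj)) return(character())
    child = names(obj)
    if (is.null(child)) child = rep("", length(obj))
    # named elements become parent$child, unnamed ones parent[[i]]
    label = ifelse(nzchar(child), paste0(nm, "$", child), paste0(nm, "[[", seq_along(obj), "]]"))
    label[vapply(obj, is.data.table, logical(1L))]
  }
  unlist(lapply(ls(env), one_object))
}

e = new.env()
e$DT = data.table(a=1L)
e$L = list(data.table(a=1, b=4:6), 1:5)
e$M = list(d=data.table(a=2, b=7:10), x=1:5)
e$LL = list(list(data.table(a=1L)))  # nested two levels deep

find_tables(e)            # "DT"
find_tables(e, depth=1L)  # "DT" "L[[1]]" "M$d"
```

The table inside e$LL is not reported at depth 1, since it sits inside a list of lists.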

manmita commented Jan 9, 2026

Hello,

I created this PR as a replacement for #7568.

Reasons: there were git issues on that branch and the merge became too complex. I also changed the algorithm, because I did not know previously that rbind and cbind incur reallocation costs.

The current PR takes that into account and avoids appends.

Previous PR: created a separate data.table called info and used rbind at the end
This PR: pre-allocates one data.table of the total size and fills in the info
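
As a rough illustration of the difference, here is a minimal sketch. The column names and the row count n are made up for the example and are not the PR's actual info layout. It fills a pre-allocated data.table in place with set(), and for comparison builds the same table by repeated rbind, which creates a new table on every iteration.

```r
library(data.table)

n = 5L  # assumed total number of rows, known up front

# Pre-allocate every column at its final length, then fill rows by reference.
info = data.table(NAME=character(n), NROW=integer(n))
for (i in seq_len(n)) {
  set(info, i, "NAME", paste0("DT", i))
  set(info, i, "NROW", i)
}

# For comparison: growing by rbind copies the accumulated rows each time.
grown = data.table()
for (i in seq_len(n)) {
  grown = rbind(grown, data.table(NAME=paste0("DT", i), NROW=i))
}
```

Both loops end with the same contents; only the first avoids the repeated allocation.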

manmita commented Jan 9, 2026

In reply to the previous comment from @jangorecki:

An example of when this new feature could be useful?

To support lists of data.tables, such as those produced by split.data.table or fread, like the following:

list(data.table(a = 1, b = 4:6),
     data.table(a = 2, b = 7:10))

The original code only supported top-level data.table objects; this code adds support for data.tables inside a list when the argument shallow_search = TRUE is given.
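
For instance, such a list arises naturally from split. This is a small sketch; DT and parts are made-up names for the illustration.

```r
library(data.table)

DT = data.table(a=c(1, 1, 1, 2, 2, 2, 2), b=4:10)

# split() on a data.table returns a plain list with one data.table per group
parts = split(DT, by="a")

is.data.table(parts)                        # FALSE: the container is a plain list
vapply(parts, is.data.table, logical(1L))   # TRUE for each element
```

Because parts itself is not a data.table, a top-level-only search does not see the tables it holds.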

manmita commented Jan 9, 2026

An example of the original behavior and the new feature:

> A = list(data.table(a = 1, b = 4:6),
      data.table(a = 2, b = 7:10))
> B = list(data.table(a = 1, b = 4:6), 1:5)
> C = data.table(a = 1, b = 4:6)
> tables()
   NAME NROW NCOL MB COLS    KEY
1:    C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> D = list(d = data.table(a = 1, b = 4:6), x = 1:5)
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
5:    D$d    3    2  0  a,b [NULL]
Total: 0MB using type_size

tables() works the same as before, and tables(shallow_search = TRUE) searches one level deep.

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.03%. Comparing base (1bd88cb) to head (c65ff92).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7580   +/-   ##
=======================================
  Coverage   99.02%   99.03%           
=======================================
  Files          87       87           
  Lines       16896    16937   +41     
=======================================
+ Hits        16732    16773   +41     
  Misses        164      164           

github-actions bot commented Jan 9, 2026

No obvious timing issues in HEAD=feat/adding_list_search_to_tables

Generated via commit c65ff92

Download link for the artifact containing the test results: ↓ atime-results.zip

Task                                     Duration
R setup and installing dependencies      3 minutes and 1 seconds
Installing different package versions    22 seconds
Running and plotting the test cases      4 minutes and 2 seconds

#2606 tables() depth=1 finds nested data.tables in lists
# creating env so that the names are within it
xenv2 = new.env()
xenv2$DT = data.table(a = 1L)

Member

nit: data.table style is to omit spaces before and after named arguments: data.table(a=1L)

test(2366.2, tables(env = xenv2, depth = 1L, index = TRUE)$INDICES, list(NULL, NULL, NULL, "b"))
setkey(xenv2$M$b, a)
test(2366.3, tables(env = xenv2, depth = 1L, index = TRUE)$KEY, list(NULL, NULL, NULL, "a"))
test(2366.4, tryCatch(tables(env = xenv2, depth = 2L), error = function(e) e$message), "depth > 1L is not implemented yet")

Member

don't use tryCatch like this in our suite; you can use the error= argument.

https://rdatatable.gitlab.io/data.table/reference/test.html

test(2366.4, tryCatch(tables(env = xenv2, depth = 2L), error = function(e) e$message), "depth > 1L is not implemented yet")
rm(xenv2)

# no data.table test and depth >1 test

Member

comment is maybe in the wrong place?

# creating env so that the names are within it
xenv2 = new.env()
xenv2$DT = data.table(a = 1L)
xenv2$L = list(data.table(a = 1, b = 4:6), data.table(a = 2, b = 7:10))

Member

this test should also include a further-nested table to demonstrate that depth=1L is honored:

xenv2$LL = list(list(data.table(a=1L, b=4:6)))

There, we'd need depth=2L to find the data.table, AIUI.


#2606 tables() depth=1 finds nested data.tables in lists
# creating env so that the names are within it
xenv2 = new.env()

Member

why xenv2? Where is xenv?

\item{env}{ An \code{environment}, typically the \code{.GlobalEnv} by default, see Details. }
\item{silent}{ \code{logical}; should the output be printed? }
\item{index}{ \code{logical}; if \code{TRUE}, the column \code{INDICES} is added to indicate the indices assorted with each object, see \code{\link{indices}}. }
\item{depth}{\code{integer}; if \code{1L}, searches for \code{data.table} objects inside top-level lists. If depth = 0L it accepts data.table and Values greater than \code{1L} are not implemented yet.}

Member

Suggested change
\item{depth}{\code{integer}; if \code{1L}, searches for \code{data.table} objects inside top-level lists. If depth = 0L it accepts data.table and Values greater than \code{1L} are not implemented yet.}
\item{depth}{\code{integer}, default \code{0L}. Larger values govern the depth of search in recursive objects like lists or environments. For example, \code{depth=1L} will find all data.tables in the global environment as well as all data.tables in lists (but not in lists of lists). NB: \code{depth > 1L} is not yet supported.}

Development

Successfully merging this pull request may close these issues.

tables could look for en-list-ed data.tables as well

2 participants