col_paths is overly expensive

Currently `col_paths` is implemented as:

```
function (x) 
{
    if (!is(coltree(x), "LayoutColTree")) {
        stop("I don't know how to extract the column paths from an object of class ", 
            class(x))
    }
    make_col_df(x, visible_only = TRUE)$path
}
```

The problem is that `make_col_df` does a lot of work unrelated to column paths. On top of that, pruning or scoring functions may need to call `col_paths` for every row of a table (if implemented naively). For large tables, we have seen repeated `col_paths` calls take up to 50% of the total pruning/sorting time, even though each call in those contexts is *guaranteed to return the same set of paths*, so that time is entirely wasted.
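To make the cost pattern concrete, here is a minimal sketch that does not use rtables at all. `toy_col_paths`, `toy_tbl`, `score_naive`, and `score_hoisted` are hypothetical stand-ins invented for illustration; the counter only shows how many times the path computation runs when it is called per row versus once up front.

```r
# Hypothetical stand-in for an expensive col_paths(); counts its own calls.
n_calls <- 0L
toy_col_paths <- function(tbl) {
    n_calls <<- n_calls + 1L
    lapply(tbl$cols, function(nm) c("ARM", nm))
}

# Toy "table": three columns, 1000 rows (not a real rtables object).
toy_tbl <- list(cols = c("A", "B", "C"), rows = 1:1000)

# Naive scoring: recomputes the identical set of paths for every row.
score_naive <- function(tbl) {
    vapply(tbl$rows, function(r) {
        length(toy_col_paths(tbl)) * as.numeric(r)
    }, numeric(1))
}

# Hoisted scoring: computes the paths once and reuses them for every row.
score_hoisted <- function(tbl) {
    paths <- toy_col_paths(tbl)
    vapply(tbl$rows, function(r) length(paths) * as.numeric(r), numeric(1))
}

n_calls <- 0L
res_naive <- score_naive(toy_tbl)
naive_calls <- n_calls      # one call per row

n_calls <- 0L
res_hoisted <- score_hoisted(toy_tbl)
hoisted_calls <- n_calls    # a single call
```

Both versions return the same scores. Hoisting only helps when the caller controls the loop, which is part of why caching inside the object is attractive.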

I propose we extend the `InstantiatedColumnInfo` class to cache its set of column paths, the way it already does for column subset expressions. This would make repeated `col_paths` calls acceptable, as each one would be effectively free.
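A rough sketch of what caching at construction time might look like, using a toy S4 class rather than the real one. `ToyColumnInfo`, its slots, `compute_paths`, and `cached_col_paths` are all hypothetical names for illustration; the actual `InstantiatedColumnInfo` slots and constructor differ, and the toy paths here are flat rather than nested.

```r
library(methods)

# Hypothetical toy class; NOT the real InstantiatedColumnInfo definition.
setClass("ToyColumnInfo",
    representation(tree = "list", cached_paths = "list"))

# Stand-in for the expensive path computation (assumed, simplified).
compute_paths <- function(tree) {
    unlist(lapply(names(tree), function(split) {
        lapply(tree[[split]], function(lev) c(split, lev))
    }), recursive = FALSE)
}

# The constructor computes the paths once and stores them in a slot.
ToyColumnInfo <- function(tree) {
    new("ToyColumnInfo", tree = tree, cached_paths = compute_paths(tree))
}

# The accessor is a plain slot read, so repeated calls are cheap.
cached_col_paths <- function(x) x@cached_paths

info <- ToyColumnInfo(list(ARM = c("A", "B"), SEX = c("F", "M")))
paths <- cached_col_paths(info)
```

Because S4 objects have value semantics, the cache would need to be refreshed wherever the column structure is modified; computing it in the constructor keeps that in one place.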

In fact, the result of `make_col_df` doesn't depend on font the way the result of `make_row_df` does, so I think we could consider caching the full result of `make_col_df` rather than just the column paths.

col_paths is overly expensive #1035

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

col_paths is overly expensive #1035

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions