Currently `col_paths` is implemented as:

```r
col_paths <- function(x) {
  if (!is(coltree(x), "LayoutColTree")) {
    stop("I don't know how to extract the column paths from an object of class ",
         class(x))
  }
  # builds the full column data frame, then keeps only its path column
  make_col_df(x, visible_only = TRUE)$path
}
```
The problem is that `make_col_df` does a lot of work unrelated to column paths. Pruning or scoring functions may also (if implemented naively) call `col_paths` once for every row of a table. Together, these mean that for large tables we have seen repeated `col_paths` calls take up to 50% of the total pruning/sorting time. Within those contexts every call is guaranteed to return the same set of paths, so that time is entirely wasted.
I propose we extend the `InstantiatedColumnInfo` class to cache its set of column paths, the way it already does for column subset expressions. This would make repeated `col_paths` calls acceptable, since each one would be effectively free.
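A minimal sketch of that idea in base R, assuming the cache is filled eagerly at construction time. `ColInfoSketch`, `compute_paths`, and `col_paths_sketch` are hypothetical names, and the nested named list only stands in for the real column tree; none of this is the rtables implementation or its API.

```r
library(methods)

## Hypothetical helper (not part of rtables): walk a nested named list that
## stands in for the column tree and return the path to every leaf.
## Leaves are NULL entries.
compute_paths <- function(tree, prefix = character()) {
  out <- list()
  for (nm in names(tree)) {
    path <- c(prefix, nm)
    kid <- tree[[nm]]
    if (is.null(kid)) {
      out <- c(out, list(path))               # leaf: record its full path
    } else {
      out <- c(out, compute_paths(kid, path)) # interior node: recurse
    }
  }
  out
}

## Hypothetical stand-in for InstantiatedColumnInfo with a path cache slot.
setClass("ColInfoSketch",
         representation(tree = "list", paths_cache = "list"))

## Paths are computed exactly once, when the object is constructed.
ColInfoSketch <- function(tree) {
  new("ColInfoSketch", tree = tree, paths_cache = compute_paths(tree))
}

## Every later lookup is a plain slot access.
col_paths_sketch <- function(cinfo) cinfo@paths_cache

tree <- list(ARM = list(A = NULL, B = NULL))
ci <- ColInfoSketch(tree)
col_paths_sketch(ci)
```

Because the cache lives in a slot, it stays correct only as long as every code path that modifies the column tree also recomputes the slot (for example by going back through the constructor).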
In fact, unlike the result of `make_row_df`, the result of `make_col_df` doesn't depend on font, so I think we could consider caching the full result of `make_col_df` rather than just the column paths...
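A sketch of the broader variant, assuming the full column data frame is cached lazily on first use. `build_col_df_sketch` is a hypothetical stand-in for `make_col_df` that only fills a `name` column and a list column `path`, and it counts its own invocations so the effect of the cache is observable; `ColInfoLazySketch` and the accessor names are likewise hypothetical. A getter cannot write back into the caller's copy of an S4 object, so the cache is held in an environment, which has reference semantics.

```r
library(methods)

## Hypothetical stand-in for make_col_df(): builds a small column data frame
## with a list column `path`, and counts how often it is called.
n_builds <- 0L
build_col_df_sketch <- function(leaf_names, parent = "ARM") {
  n_builds <<- n_builds + 1L
  df <- data.frame(name = leaf_names, stringsAsFactors = FALSE)
  df$path <- lapply(leaf_names, function(nm) c(parent, nm))
  df
}

## Hypothetical column-info class whose cache is an environment, so a getter
## can fill it in place without returning a modified object.
setClass("ColInfoLazySketch",
         representation(leaf_names = "character", cache = "environment"))

ColInfoLazySketch <- function(leaf_names) {
  new("ColInfoLazySketch", leaf_names = leaf_names,
      cache = new.env(parent = emptyenv()))
}

## Build the full data frame on the first call only; reuse it afterwards.
col_df_sketch <- function(cinfo) {
  env <- cinfo@cache
  if (is.null(env$col_df)) {
    env$col_df <- build_col_df_sketch(cinfo@leaf_names)
  }
  env$col_df
}

## Column paths become a cheap read from the cached data frame.
col_paths_lazy_sketch <- function(cinfo) col_df_sketch(cinfo)$path

ci <- ColInfoLazySketch(c("A", "B"))
for (i in 1:100) p <- col_paths_lazy_sketch(ci)
n_builds  # 1: the data frame was built on the first call only
```

One consequence of the reference semantics is that copies of the object share the same cache, so the cached data frame would need to be reset whenever the column structure changes.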