Allow expressions in data_summary() that return > 1 column summaries #673
strengejacke merged 33 commits into main
|
I think summary dfs should have one row (to be shape consistent), but allow multi-value expressions - just expanding them to columns. |
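A minimal base-R sketch of the "one row, expand to columns" idea, to make the proposed shape concrete. `expand_summaries()` is a hypothetical helper written for illustration, not datawizard's implementation, and the plain `_1`, `_2` position suffixes are an assumption.

```r
# Hypothetical helper (not part of datawizard): flatten a named list of
# summary results into a single-row data frame.
expand_summaries <- function(results) {
  cols <- list()
  for (nm in names(results)) {
    value <- unname(results[[nm]])
    if (length(value) == 1L) {
      # single-value summaries keep the expression name as column name
      cols[[nm]] <- value
    } else {
      # multi-value summaries become one column per value
      for (i in seq_along(value)) {
        cols[[paste0(nm, "_", i)]] <- value[i]
      }
    }
  }
  out <- as.data.frame(cols)
  # set the names explicitly so they are never altered by data.frame()
  names(out) <- names(cols)
  out
}

x <- c(1, 2, 3, 4, 5)
out <- expand_summaries(list(
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x)
))
out
```

The result always has exactly one row, regardless of how many values each expression returns; only the number of columns grows.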
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
|
Errors:

library(datawizard)
set.seed(123)
d <- data.frame(
  x = rnorm(100, 1, 1),
  y = rnorm(100, 2, 2),
  groups = rep(1:4, each = 25)
)

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  quant_y = quantile(y, c(0.1, 0.9)),
  suffix = c("a", "b", "c")
)
#> Error:
#> ! Argument `suffix` must have the same length as the result of the
#> regarding summary expression. `suffix` has 3 elements (`a`, `b` and `c`)
#> for the expression `quantile(x, c(0.25, 0.75))`, which returned 2
#> values.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  quant_y = quantile(y, c(0.1, 0.9)),
  suffix = list(c("a", "b"), c("c", "d"), c("e", "f"))
)
#> Error:
#> ! If `suffix` is a list of character vectors, it should have the same
#> length as the number of expressions. `suffix` has 3 elements, but
#> there are 2 expressions.

data_summary(
  mtcars,
  n = unique(mpg),
  j = c(min(am), max(am)),
  by = c("am", "gear")
)
#> Error:
#> ! Each expression must return the same number of values for each group.
#> Some of the expressions seem to return varying numbers of values.

Created on 2026-03-11 with reprex v2.1.1 |
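The last error above can be reproduced without datawizard. The following base-R sketch (an illustration only, not how `data_summary()` performs the check internally) counts how many values `unique(mpg)` returns in each `am` by `gear` group of `mtcars`:

```r
# Split mpg by the same grouping variables as in the failing call;
# drop = TRUE removes combinations of am and gear that do not occur.
groups <- split(mtcars$mpg, list(mtcars$am, mtcars$gear), drop = TRUE)

# Number of values that unique() returns within each group
n_values <- lengths(lapply(groups, unique))
n_values

# A grouped summary can only be expanded to a fixed set of columns
# if every group returns the same number of values.
same_length <- length(unique(n_values)) == 1L
same_length
```

Because the groups contain different numbers of distinct `mpg` values, there is no fixed set of columns to expand to, whereas `c(min(am), max(am))` returns two values in every group.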
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@etiennebacher WDYT about the current implementation? It's no longer an additional "off-label-use" functionality like |
|
If the summary is a named vector, can those names be used as suffixes instead of the `suffix` argument? |
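To illustrate the question: some summary functions return named vectors and others do not. The sketch below uses only base R; `default_suffix()` is a hypothetical helper, and the `_1`, `_2` fallback for unnamed results is an assumption chosen for illustration.

```r
x <- c(1, 2, 3, 4, 5)

# quantile() returns a named vector, fivenum() an unnamed one
names(quantile(x, c(0.25, 0.75)))
names(fivenum(x))

# Hypothetical helper: use the names of the result as suffixes if present,
# otherwise fall back to the position of each value.
default_suffix <- function(value) {
  nms <- names(value)
  if (is.null(nms)) {
    paste0("_", seq_along(value))
  } else {
    nms
  }
}

default_suffix(quantile(x, c(0.25, 0.75)))
default_suffix(fivenum(x))
```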
|
Which behaviour would you suggest? To make it less complex, we could do: When the summary expression returns more than one value (and only then)
I would then not allow |
etiennebacher
left a comment
Looks good to me but I'd like to wait for @mattansb's opinion before merging.
|
Sounds good to me, thanks! |
|
Here's the (hopefully) final implementation:

library(datawizard)
set.seed(123)
d <- data.frame(
  x = rnorm(100, 1, 1),
  y = rnorm(100, 2, 2),
  w = rnorm(100, 3, 0.5),
  z = rnorm(100, 4, 3),
  groups = rep(1:4, each = 25)
)

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75))
)
#> quant_x25% | quant_x75% | mean_x | quant_y25% | quant_y50% | quant_y75%
#> -----------------------------------------------------------------------
#> 0.51 | 1.69 | 1.09 | 0.40 | 1.55 | 2.94

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  fivenum = fivenum(y)
)
#> quant_x25% | quant_x75% | mean_x | fivenum_1 | fivenum_2 | fivenum_3
#> --------------------------------------------------------------------
#> 0.51 | 1.69 | 1.09 | -2.11 | 0.37 | 1.55
#>
#> quant_x25% | fivenum_4 | fivenum_5
#> ----------------------------------
#> 0.51 | 2.97 | 8.48

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(quant_y = c("_Q1", "_Q2", "_Q3"))
)
#> quant_x25% | quant_x75% | mean_x | quant_y_Q1 | quant_y_Q2 | quant_y_Q3
#> -----------------------------------------------------------------------
#> 0.51 | 1.69 | 1.09 | 0.40 | 1.55 | 2.94

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(quant_x = c("Q1", "Q3"), quant_y = c("_Q1", "_Q2", "_Q3"))
)
#> quant_xQ1 | quant_xQ3 | mean_x | quant_y_Q1 | quant_y_Q2 | quant_y_Q3
#> ---------------------------------------------------------------------
#> 0.51 | 1.69 | 1.09 | 0.40 | 1.55 | 2.94

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.5)),
  quant_w = quantile(w, c(0.25, 0.5)),
  quant_y = quantile(y, c(0.25, 0.5)),
  quant_z = quantile(z, c(0.25, 0.5)),
  suffix = c("_Q1", "_Q2")
)
#> quant_x_Q1 | quant_x_Q2 | quant_w_Q1 | quant_w_Q2 | quant_y_Q1 | quant_y_Q2
#> ---------------------------------------------------------------------------
#> 0.51 | 1.06 | 2.73 | 3.02 | 0.40 | 1.55
#>
#> quant_x_Q1 | quant_z_Q1 | quant_z_Q2
#> ------------------------------------
#> 0.51 | 1.81 | 3.99

# errors
data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(quant_xy = c("_Q1", "_Q2", "_Q3"))
)
#> Error:
#> ! Names of `suffix` must match the names of the expressions. Suffix
#> `quant_xy` has no corresponding expression.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(c("Q1", "Q3"), "mean", c("_Q1", "_Q2", "_Q3"))
)
#> Error:
#> ! All elements of `suffix` must have names.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = c("_Q1", "_Q2", "_Q3")
)
#> Error:
#> ! Argument `suffix` must have the same length as the result of the
#> corresponding summary expression. `suffix` has 3 elements (`_Q1`, `_Q2`
#> and `_Q3`) for the expression `quantile(x, c(0.25, 0.75))`, which
#> returned 2 values.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(quant_x = c("_Q1", "_Q2", "_Q3"))
)
#> Error:
#> ! Argument `suffix` must have the same length as the result of the
#> corresponding summary expression. `suffix` has 3 elements (`_Q1`, `_Q2`
#> and `_Q3`) for the expression `quantile(x, c(0.25, 0.75))`, which
#> returned 2 values.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.75)),
  mean_x = mean(x),
  quant_y = quantile(y, c(0.25, 0.5, 0.75)),
  suffix = list(quant_x = c("Q1", "Q3"), quant_y = c("_Q1", "_Q2", "_Q2"))
)
#> Error:
#> ! All suffixes for a single expression must be unique. Suffix for element
#> `quant_y` has duplicate values.

data_summary(
  d,
  quant_x = quantile(x, c(0.25, 0.5)),
  quant_w = quantile(w, c(0.25, 0.5)),
  quant_y = quantile(y, c(0.25, 0.5)),
  quant_z = quantile(z, c(0.25, 0.5)),
  suffix = c("_Q1", "_Q2", "_Q3")
)
#> Error:
#> ! Argument `suffix` must have the same length as the result of the
#> corresponding summary expression. `suffix` has 3 elements (`_Q1`, `_Q2`
#> and `_Q3`) for the expression `quantile(x, c(0.25, 0.5))`, which
#> returned 2 values.

Created on 2026-03-12 with reprex v2.1.1 |
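The rules behind these error messages can be summarised in a small base-R sketch. `check_suffix()` and its `n_values` argument are hypothetical names introduced for illustration, and the messages are shortened; this is a reading of the behaviour shown in the reprex, not datawizard's internal code.

```r
# Hypothetical validator (not datawizard's code).
# suffix:   character vector, or named list of character vectors
# n_values: named integer vector, number of values each expression returns
check_suffix <- function(suffix, n_values) {
  multi <- n_values[n_values > 1L]
  if (is.list(suffix)) {
    # every list element must be named after an existing expression
    if (is.null(names(suffix)) || any(names(suffix) == "")) {
      stop("All elements of `suffix` must have names.")
    }
    unknown <- setdiff(names(suffix), names(n_values))
    if (length(unknown) > 0L) {
      stop("Suffix `", unknown[1], "` has no corresponding expression.")
    }
  } else {
    # a plain character vector applies to every multi-value expression
    suffix <- stats::setNames(rep(list(suffix), length(multi)), names(multi))
  }
  for (nm in names(suffix)) {
    if (length(suffix[[nm]]) != n_values[[nm]]) {
      stop(
        "`suffix` for `", nm, "` has ", length(suffix[[nm]]),
        " elements, but the expression returned ", n_values[[nm]], " values."
      )
    }
    if (anyDuplicated(suffix[[nm]]) > 0L) {
      stop("Suffix for element `", nm, "` has duplicate values.")
    }
  }
  invisible(suffix)
}

# Collect the error message of a call, or NA if it succeeds
err <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

n_values <- c(quant_x = 2L, mean_x = 1L, quant_y = 3L)
err(check_suffix(list(quant_y = c("_Q1", "_Q2", "_Q3")), n_values))
err(check_suffix(c("_Q1", "_Q2", "_Q3"), n_values))
```

In this sketch a named list only needs entries for the expressions that should get custom suffixes, while a plain character vector must fit every multi-value expression, which is why the second call fails for `quant_x`.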
|
looks great! |
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>