In working through the implications of implementing means in chunks, it is notable that once missing data is in play, we need to return two numbers from the `reduce_chunk` method: the sum and the count. The mean over all chunks has to be weighted by the number of values that actually contributed in each chunk.
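To illustrate why the count is needed, here is a minimal sketch in plain Python. The names `reduce_chunk_sum` and `combine_mean` are hypothetical and are not the existing `reduce_chunk` API, and missing values are represented by `None` purely for the purposes of the example.

```python
def reduce_chunk_sum(chunk, missing=None):
    """Hypothetical per-chunk reduction: return (sum, count) over valid values."""
    valid = [v for v in chunk if v is not missing]
    return sum(valid), len(valid)


def combine_mean(partials):
    """Combine per-chunk (sum, count) pairs into one overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    # With no valid values anywhere, there is no mean to report.
    return total / count if count else None


# Two chunks of equal size, but the second has only one valid value.
chunks = [[1.0, 2.0, 3.0], [None, None, 10.0]]
partials = [reduce_chunk_sum(c) for c in chunks]

# Weighted by the actual counts: (6.0 + 10.0) / (3 + 1)
weighted = combine_mean(partials)

# Naive alternative: average the per-chunk means, ignoring the counts.
naive = sum(s / n for s, n in partials) / len(partials)

print(weighted, naive)
```

Here the weighted result is 4.0, which matches the mean of the four valid values, while averaging the two chunk means gives 6.0.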
There are a number of ways we could implement this:
- Always return `(X, N)`, where `X` is the result of the expected operation, and `N` the number of values contributing
- Only return `(X, N)` when required (e.g. for means), otherwise return `(X, None)` or `(X,)`
- Return `X`, except when it needs to be `(X, N)`
- Something else.
The "something else" option could be slightly more interesting: would it be a smart idea to let the caller chain a series of methods and get back a series of results, as a lightweight sort of caching?
Obvious use cases would be:
- mean = sum, count
- range = min, max
- sqmean = sum(squares), sum, count
This could be facilitated by passing in not just "a method" but a list of one or more methods, and expecting back a list of one or more results.
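As a rough sketch of what that might look like (an assumption about one possible shape, not a proposal for the final interface), the hypothetical `reduce_chunk_multi` below takes a list of method names and returns the results in the same order. The `REDUCTIONS` registry and the use of `None` for missing values are also invented for the example.

```python
# Hypothetical registry of per-chunk reductions, keyed by method name.
REDUCTIONS = {
    "sum": sum,
    "count": len,
    "min": min,
    "max": max,
    "sumsq": lambda values: sum(v * v for v in values),
}


def reduce_chunk_multi(chunk, methods, missing=None):
    """Apply each named method to the valid values of one chunk.

    Returns a list of results in the same order as `methods`.
    Note: "min" and "max" raise ValueError if the chunk has no valid values.
    """
    # The missing-value filter runs once and is shared by every method.
    valid = [v for v in chunk if v is not missing]
    return [REDUCTIONS[name](valid) for name in methods]


chunk = [1.0, None, 3.0, 4.0]

total, count = reduce_chunk_multi(chunk, ["sum", "count"])            # for a mean
lo, hi = reduce_chunk_multi(chunk, ["min", "max"])                    # for a range
sumsq, total2, count2 = reduce_chunk_multi(chunk, ["sumsq", "sum", "count"])
```

The chunk is read and filtered once per call however many methods are requested, which is where the "lightweight caching" benefit would come from.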