Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 13 additions & 14 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,29 +131,28 @@ Example datasets for testing: `law_data()`, `mouse_data()`,

Docs are served from a DigitalOcean droplet via nginx. The deploy
workflow (`.github/workflows/deploy_docs.yml`) fires on pushes to
master that touch `docs/**`. It SSHes to the droplet and runs
`git pull --ff-only` — it does **not** run `make html`.
master that touch `docs/**`, or via manual `workflow_dispatch`. The
workflow runs `cd docs && uv run make html` in CI and then rsyncs
`docs/_build/html/` to the droplet over SSH (with `--delete`), using
the `DOCS_DEPLOY_KEY`, `DOCS_USER`, and `DOCS_HOST` repository
secrets.

**`docs/_build/html/` must therefore be committed to git.** The
`.gitignore` does not exclude it. When editing docs:
**Build artifacts are not committed.** `docs/_build/` is in
`.gitignore`; CI rebuilds on every deploy. When editing docs:

```bash
# 1. Edit RST sources under docs/
# 2. Rebuild
# 2. (Optional) build locally to catch RST errors before pushing
cd docs && uv run make html

# 3. Stage sources and build artifacts together
# 3. Commit and push the source changes only
git add docs/<changed-source-files>
git add -f docs/_build/html/ # -f needed for any new files
git rm --cached docs/_build/html/.buildinfo.bak 2>/dev/null || true # exclude ephemeral artifact

# 4. Commit and push to master
git commit -m "..."
git push
```

`bootstrap-stat` is a submodule of the nginx-proxy repo on the
droplet. Each pull updates the working tree, but the parent repo's
submodule pointer goes stale — cosmetic only; nginx serves from the
working tree directly.
Pushing to master with any change under `docs/**` is sufficient to
trigger a rebuild and redeploy.

## References

Expand Down
3 changes: 3 additions & 0 deletions docs/references.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,6 @@ References
Accuracy Estimation and Model Selection." In *Proceedings of the 14th
International Joint Conference on Artificial Intelligence (IJCAI)*,
Vol. 2, 1995, pp. 1137–1143.

.. [SA22] Mårten Schultzberg and Sebastian Ankargren. "Resampling-Free
Bootstrap Inference for Quantiles." arXiv:2202.10992, 2022.
6 changes: 4 additions & 2 deletions docs/user_guides/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,9 @@ User Guides
In-depth treatments of the bootstrap methods in this library, each
worked through on a concrete example. The first five guides cover
the core inferential machinery on small datasets from [ET93]_; the
last describes the *zero-inflated bootstrap*, a recipe for efficient
inference on severely zero-inflated data. The guides are
remaining guides describe specialized recipes — the *zero-inflated
bootstrap* for severely zero-inflated data, and a resampling-free
recipe for bootstrap quantile inference at web scale. The guides are
self-contained and may be read in any order.

.. toctree::
Expand All @@ -17,3 +18,4 @@ self-contained and may be read in any order.
bias
significance
zero_inflated
quantiles_at_scale
285 changes: 285 additions & 0 deletions docs/user_guides/quantiles_at_scale.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,285 @@
Bootstrap Quantiles at Scale
============================

Quantiles are the natural target for inference on heavy-tailed data:
the 95th-percentile request latency, the median order value, the
99th-percentile rendering time. The naive bootstrap recipe applies —
draw :math:`B` samples of size :math:`n` with replacement, compute
the quantile on each, read off the percentiles of the resulting
distribution — and the library supports it directly. But it scales
as :math:`O(B \cdot n)` per replicate (or :math:`O(B \cdot n \log n)`
once sorting is included), and on web-scale data with :math:`n` in
the millions the cost becomes prohibitive: a single 95% confidence
interval on a 10-million-observation sample takes minutes when the
sort is repeated for each replicate.

Schultzberg and Ankargren [SA22]_ — engineers at Spotify — observed
that for sample quantiles, the bootstrap loop has redundant
structure that can be eliminated. The Poisson-bootstrap distribution
of the index of the order statistic that ends up *being* the q-th
quantile of a bootstrap replicate is, for large :math:`n`,
approximately :math:`\mathrm{Bin}(n+1, q)`. So one can sort the
original sample once, sample :math:`B` indices from a binomial, and
look up the corresponding order statistics. No resampling, no
per-replicate sort, no per-replicate quantile evaluation.

The technique is small enough — a few lines of NumPy — that the
library does not ship a dedicated function for it. The bootstrap
machinery already accepts a pre-computed array of bootstrap
statistics through the ``theta_star`` argument of
:func:`~bootstrap_stat.confidence.percentile_interval` and
:func:`~bootstrap_stat.confidence.bcanon_interval`, so the trick
composes cleanly with the existing confidence-interval API. This
guide walks through the construction.

The trick
---------

Let :math:`X_1, \ldots, X_n` be the original sample and write
:math:`X_{(1)} \le \cdots \le X_{(n)}` for the order statistics. A
*Poisson bootstrap* sample assigns each :math:`X_i` an independent
:math:`\mathrm{Poisson}(1)` count :math:`N_i` and constructs the
replicate by including :math:`X_i` exactly :math:`N_i` times. Total
sample size is :math:`M = \sum_i N_i \sim \mathrm{Poisson}(n)`,
random rather than fixed at :math:`n` — this is the price the
Poisson bootstrap pays for the independence between the
:math:`N_i`. For large :math:`n` the Poisson and multinomial
bootstraps agree to leading order; both target the same sampling
distribution.

Now consider the q-th quantile of the bootstrap replicate. Sort the
distinct values; the bootstrap quantile is :math:`X_{(I)}` for some
random index :math:`I`. The index :math:`I` is determined by where
the cumulative bootstrap weight first crosses :math:`q M`. With iid
Poisson counts, [SA22]_ show that for large :math:`n`,

.. math::

I \;\approx\; \mathrm{Bin}(n + 1,\, q).

The :math:`n+1` (rather than :math:`n`) is a small-sample correction
that aligns the discrete approximation with the continuous Beta
distribution of the corresponding population quantile under
sampling from :math:`\hat{F}`; the approximation is excellent for
:math:`n` even in the low hundreds. The implication is operational:
the entire bootstrap distribution of :math:`\hat{q}^*` lives in the
sorted original sample, indexed by a binomial random variable.

The example
-----------

We illustrate on synthetic latency data: :math:`n = 1{,}000{,}000`
draws from a lognormal distribution with location 4 and scale 1
(median :math:`\approx 55`, p95 :math:`\approx 283`).

.. code-block:: python

import numpy as np
import bootstrap_stat as bp

rng = np.random.default_rng(0)
n = 1_000_000
data = rng.lognormal(mean=4.0, sigma=1.0, size=n)

The target is a 90% confidence interval on the 95th percentile.

The naive bootstrap (baseline)
------------------------------

The library supports this directly:

.. code-block:: python

dist = bp.EmpiricalDistribution(data)

def p95(x):
return float(np.quantile(x, 0.95))

lo, hi = bp.percentile_interval(dist, p95, B=500)
# CI ≈ (282.55, 284.50) ~9.7 s

Each of the 500 replicates draws :math:`n = 10^6` observations with
replacement and computes their 95th percentile. NumPy's
``np.quantile`` is :math:`O(n)` (via ``np.partition``), but the
draw-with-replacement step itself is the dominant cost, and it is
repeated :math:`B` times. The wall time is several seconds, and it
scales linearly in :math:`n`.

The optimized recipe
--------------------

The Schultzberg–Ankargren construction reduces the bootstrap step to
a sort plus :math:`B` binomial draws plus :math:`B` lookups:

.. code-block:: python

def fast_quantile_bootstrap(data, q, B, rng=None):
rng = rng if rng is not None else np.random.default_rng()
n = len(data)
sorted_data = np.sort(data)
idx = rng.binomial(n + 1, q, size=B)
idx = np.clip(idx, 0, n - 1) # guard the boundary
return sorted_data[idx]

The function returns the bootstrap distribution of :math:`\hat{q}^*`
directly. To turn it into a confidence interval, hand it to the
library's existing percentile-interval routine via the
``theta_star`` argument:

.. code-block:: python

theta_star = fast_quantile_bootstrap(data, q=0.95, B=500, rng=rng)
lo, hi = bp.percentile_interval(dist, p95, theta_star=theta_star)
# CI ≈ (282.55, 284.52) ~0.010 s (~970× faster)

When ``theta_star`` is supplied, ``percentile_interval`` skips the
resampling step entirely and reads the percentiles off the array
that was passed in. The ``dist`` and ``stat`` arguments are still
required for API symmetry but go unused on this path; ``stat`` is
never called.

The interval matches the naive computation to Monte Carlo noise. The
agreement is not coincidental — both target the same bootstrap
distribution; the difference is between sampling from it the long
way and sampling from a known closed-form approximation to its
quantile-position.

Difference in quantiles
-----------------------

The Spotify use case is the difference of a quantile across two
arms of an A/B test. The construction generalizes immediately: sort
each arm once, sample independent binomial indices for each, and
take the difference of the looked-up order statistics.

.. code-block:: python

n = 200_000
ctrl = rng.lognormal(mean=4.00, sigma=1.0, size=n)
trt = rng.lognormal(mean=4.05, sigma=1.0, size=n)

sc = np.sort(ctrl)
st = np.sort(trt)
B = 500

idx_c = np.clip(rng.binomial(n + 1, 0.95, size=B), 0, n - 1)
idx_t = np.clip(rng.binomial(n + 1, 0.95, size=B), 0, n - 1)
theta_star = st[idx_t] - sc[idx_c]

lo, hi = np.quantile(theta_star, [0.05, 0.95])
# CI ≈ (11.17, 16.87) ~0.003 s (vs naive ~3.5 s, ~1130× faster)

The two arms are independent under the null and under the
alternative, so independent binomial draws across arms is correct.
For paired or stratified designs the index distributions are no
longer independent; the technique still applies but the joint
distribution must be specified accordingly, which is no longer a
one-liner.

Scaling
-------

Wall times for ``B = 500`` on a single-arm 95th-percentile
confidence interval, lognormal data:

.. list-table::
:header-rows: 1
:widths: 15 18 18 14

* - :math:`n`
- Naive
- Fast
- Speedup
* - 10,000
- 0.18 s
- 0.0004 s
- 485×
* - 100,000
- 1.16 s
- 0.0009 s
- 1340×
* - 1,000,000
- 9.74 s
- 0.010 s
- 980×
* - 10,000,000
- ≳ 100 s (estimated)
- 0.49 s
- ≳ 200×

The naive method is :math:`O(B \cdot n)` and tracks linearly in
:math:`n`; the fast method is dominated by the one-time sort
(:math:`O(n \log n)`) plus :math:`O(B)` lookups, and so the speedup
*ratio* shrinks at very large :math:`n` while the *absolute* time
stays small enough to be negligible in any analysis pipeline. At
:math:`n = 10^7` a single-quantile confidence interval comes back in
under a second, against the better part of two minutes for the
naive bootstrap.

Why this is not a library function
-----------------------------------

The optimization is roughly ten lines of NumPy. The corresponding
generic library function would have to expose a quantile-only API
and parameterize over the small set of decisions a caller might
plausibly want to vary — single quantile vs. difference-in-quantile
vs. multiple quantiles, percentile vs. BCa interval, optional RNG,
optional pre-sorted input. The result would be considerably larger
than the recipe itself, and would obscure the fact that the trick is
exact-modulo-Poisson-approximation rather than something the
library is doing on the user's behalf.

The library's design choice is to expose the composable hooks —
``theta_star`` and ``jv`` on the confidence-interval functions, plus
the underlying :class:`~bootstrap_stat.distributions.EmpiricalDistribution`
and :func:`~bootstrap_stat.sampling.bootstrap_samples` for cases
where standard resampling is wanted — and let domain-specific
recipes like this one live in user code or in a guide.

Caveats
-------

The Bin(:math:`n+1`, :math:`q`) result is an approximation to the
Poisson-bootstrap index distribution, and the Poisson bootstrap is
in turn slightly different from the multinomial bootstrap that
``EmpiricalDistribution.sample`` implements. Both approximations
have error :math:`O(1/n)` and are negligible by :math:`n` in the
low thousands; for the small samples typical of textbook bootstrap
examples the approximation is unnecessary because the naive
bootstrap is already fast.

The technique applies to *plain* sample quantiles — the q-th order
statistic, or one of the closely related index-based definitions
that NumPy's ``np.quantile`` selects between via the ``method``
argument when the index falls between two order statistics. For
quantile estimators that interpolate (the default ``linear``
method), the order-statistic identity is a slight idealization; on
samples where the interpolation moves the estimate by more than a
single observation the discrepancy matters, but for the large
:math:`n` this technique is built for, the gap between adjacent
order statistics is small enough that the difference is below Monte
Carlo noise.

For BCa intervals, the bootstrap step is still sped up by the trick,
but the BCa acceleration requires a jackknife pass over the
original data, which is :math:`O(n)` evaluations of the statistic.
At very large :math:`n` this becomes the bottleneck. The jackknife
of a sample quantile has a closed form — for the q-th order
statistic, removing observation :math:`X_i` either leaves the
quantile unchanged or shifts it by one position in the sorted
array, depending on whether :math:`i` falls above or below the
quantile index — and pre-computing it analytically and passing it
through the ``jv`` argument of ``bcanon_interval`` parallels the
zero-inflated case described in :doc:`zero_inflated`. For the
percentile interval, no jackknife is needed and the speedup applies
end-to-end.

Finally, the technique gives bootstrap percentile intervals on the
quantile estimator. It does not fix any of the well-known issues
with bootstrapping quantiles on small samples (the bootstrap
distribution of a sample quantile is discrete, supported on at most
:math:`n` distinct values; coverage of nominal-level intervals can
be poor). On the data sizes that motivate this technique — tens of
thousands of observations and up — those concerns are not the
binding constraint, but they remain worth checking for any
particular application.
Loading