[Claude] Migrate RNG handling to modern NumPy default_rng/Generator API #31

Merged
rwilson4 merged 2 commits into master from claude/review-rng-practices-1Hx4m
May 7, 2026

rwilson4 (Owner) commented May 7, 2026

The package previously relied on the legacy global np.random interface
and np.random.seed(), which have several drawbacks: calling
np.random.seed per worker can produce correlated streams when
distributing work, the legacy MT19937-based RandomState lacks the
spawn-based parallelism guarantees of the modern API, and library-level
use of the global RNG makes results depend on whatever the caller has
done to np.random elsewhere.
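
To illustrate the last drawback, here is a minimal sketch in plain NumPy. The helper names `library_draw_global` and `library_draw_local` are hypothetical and are not part of this package; they only contrast a function that reads the process-wide state with one that takes its own `rng`.

```python
import numpy as np


def library_draw_global(n):
    # Legacy pattern: draws from the process-wide RandomState.
    return np.random.normal(size=n)


def library_draw_local(n, rng=None):
    # Modern pattern: default_rng accepts None, an int seed, or a Generator.
    rng = np.random.default_rng(rng)
    return rng.normal(size=n)


np.random.seed(0)
a = library_draw_global(3)

np.random.seed(0)
np.random.uniform()  # unrelated caller code consumes global state
b = library_draw_global(3)  # no longer matches a

c = library_draw_local(3, rng=0)
np.random.uniform()  # has no effect on a locally owned Generator
d = library_draw_local(3, rng=0)  # matches c
```

With the global interface, an unrelated draw elsewhere in the process changes the library's output. With a locally owned generator, the same seed gives the same draws regardless of what else has touched `np.random`.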

Each EmpiricalDistribution (and MultiSampleEmpiricalDistribution)
now owns a Generator accepted via a new rng constructor argument,
sampling routes through that generator instead of the global one, and
all public bootstrap entry points (bootstrap_samples,
standard_error, bias, better_bootstrap_bias, bias_corrected,
percentile_interval, bcanon_interval, t_interval,
calibrate_interval, bootstrap_asl, percentile_asl,
bcanon_asl, bootstrap_power, prediction_error_optimism,
prediction_error_632, prediction_interval) take an optional
rng= argument. Multi-threaded paths use SeedSequence.spawn to
hand each worker an independent stream, replacing the prior pattern of
seeding workers from sibling draws of the parent's MT19937 state.

Tests now construct EmpiricalDistribution(data, rng=seed) (or pass
rng= to inference functions) instead of calling np.random.seed();
expected values were re-recorded under PCG64, the bit generator behind
default_rng. Bumps the minimum versions to numpy>=1.25 (for the public
bit_generator.seed_seq attribute, used with SeedSequence.spawn) and
pandas>=1.4 (for random_state=Generator in DataFrame.sample).
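
As a small illustration of why the expected values had to be re-recorded, the sketch below compares draws from the legacy `RandomState` and from `default_rng` under the same integer seed. It uses only NumPy; the seed value 42 is arbitrary.

```python
import numpy as np

seed = 42

# Legacy interface: MT19937 bit generator with the legacy seeding scheme.
legacy = np.random.RandomState(seed)
legacy_draws = legacy.standard_normal(5)

# Modern interface: default_rng wraps PCG64 seeded through SeedSequence.
rng = np.random.default_rng(seed)
uses_pcg64 = isinstance(rng.bit_generator, np.random.PCG64)
modern_draws = rng.standard_normal(5)

# Same seed, different streams: values recorded under one interface
# cannot be reused as expected values under the other.
streams_match = np.array_equal(legacy_draws, modern_draws)
```

Here `uses_pcg64` is true and `streams_match` is false, so any test that compared against numbers recorded under `np.random.seed()` needs new reference values.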

https://claude.ai/code/session_01DX4Gi3Vwx6qJwr1ZYQEJiJ

claude added 2 commits May 7, 2026 21:24
The custom EmpiricalDistribution subclasses in the zero-inflated and
significance guides previously sampled from the global ``np.random``,
which contradicts the modern rng-as-first-class-argument pattern the
library now uses. Their ``__init__`` now forwards ``rng=`` to the base
class and ``sample()`` draws from ``self._rng``, so the subclasses are
reproducible from a single seed and play correctly with the spawn-based
parallel paths.
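
A rough sketch of the forwarding pattern follows. ``BaseDistribution`` is a hypothetical stand-in for the library's base class, whose internals are not shown here, and ``ZeroInflatedDistribution`` is an illustrative subclass, not the one from the guide. Only the ``rng=`` forwarding and the ``self._rng`` attribute come from the description above.

```python
import numpy as np


class BaseDistribution:
    """Hypothetical stand-in for the library's EmpiricalDistribution."""

    def __init__(self, data, rng=None):
        self.data = np.asarray(data, dtype=float)
        self.n = len(self.data)
        # Accepts None, an int seed, or an existing Generator.
        self._rng = np.random.default_rng(rng)

    def sample(self, size=None):
        size = self.n if size is None else size
        return self._rng.choice(self.data, size=size, replace=True)


class ZeroInflatedDistribution(BaseDistribution):
    """Illustrative subclass: resample zeros and non-zeros separately."""

    def __init__(self, data, rng=None):
        # Forward rng= so the subclass shares the base class's generator.
        super().__init__(data, rng=rng)
        self.nonzero = self.data[self.data != 0]
        self.p_zero = 1.0 - len(self.nonzero) / self.n

    def sample(self, size=None):
        size = self.n if size is None else size
        # Draw from self._rng, never from the global np.random.
        n_zero = self._rng.binomial(size, self.p_zero)
        out = np.zeros(size)
        if n_zero < size:
            out[n_zero:] = self._rng.choice(self.nonzero, size=size - n_zero)
        return out


values = [0, 0, 0, 1.5, 2.0, 0, 3.2, 0]
x = ZeroInflatedDistribution(values, rng=0).sample()
y = ZeroInflatedDistribution(values, rng=0).sample()
```

Two instances built from the same seed produce identical samples, which is the reproducibility property the change is after.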

The README quickstart gains a short paragraph showing
``EmpiricalDistribution(df, rng=0)`` and noting that inference functions
also accept ``rng=`` directly. The quantiles-at-scale guide passes its
existing ``rng`` to the empirical distribution for consistency. ``uv.lock``
is regenerated against the bumped numpy / pandas minimums from the prior
commit.

https://claude.ai/code/session_01DX4Gi3Vwx6qJwr1ZYQEJiJ
rwilson4 merged commit 11720be into master on May 7, 2026
4 checks passed