Skip to content

feat(payload): improve BoundedProbability ergonomics for adoption#1876

Open
goxberry wants to merge 8 commits into
mainfrom
goxberry/improve-probability-type-ergonomics
Open

feat(payload): improve BoundedProbability ergonomics for adoption#1876
goxberry wants to merge 8 commits into
mainfrom
goxberry/improve-probability-type-ergonomics

Conversation

@goxberry
Copy link
Copy Markdown
Contributor

@goxberry goxberry commented May 20, 2026

What does this PR do?

Prepares BoundedProbability (added in #1874) to replace bare f32
probability fields throughout lading_payload configs. The PR is a
sequence of independently-reviewable commits that add the surface the
adoption stack actually needs, plus a final cleanup commit that drops
the items the adoption PRs turned out not to use:

  1. impl PartialEq/Eq/PartialOrd/Ord/Hash — hand-written, sound under
    try_new's invariants (no NaN, no -0.0). PartialEq/Ord use f32
    operators (==, total_cmp); Hash uses the u32 bit pattern because
    f32: !Hash. The Eq/Hash contract holds because numerically equal
    valid values share a bit pattern.
  2. Type aliases — rename the generic struct to
    BoundedProbability<MIN_AS_BITS> and add Probability ([0.0, 1.0])
    and AtLeastOneTenth ([0.1, 1.0]) so callers don't have to spell
    f32::to_bits(...) at each use site.
  3. impl arbitrary::Arbitrary (feature-gated) — needed so structs
    containing a Probability field can keep deriving Arbitrary once
    wired in. A derived impl would emit NaN / infinity / out-of-range
    values; this one samples a u32 in [MIN_AS_BITS, to_bits(+1.0)] and
    relies on the documented f32 <-> u32 monotonicity to guarantee
    validity.
  4. sample_bernoulli helper — replaces the
    rng.random_bool(p.get() as f64) pattern that every adopting call site
    would otherwise repeat. The f32 -> f64 conversion is exact on
    [0.0, 1.0], so the success probability is preserved bit-for-bit.
  5. AtLeastOneHundredth aliasBoundedProbability<{ f32::to_bits(0.01) }>
    for unique_tag_ratio's existing 0.01 floor, which must admit
    in-the-wild values below 0.1. Also drops the unique_tag_ratio
    reference from the AtLeastOneTenth rustdoc.
  6. Cleanup — drop the items the adoption stack (feat(payload): adopt Probability for TimestampConfig::probability #1877feat(payload): adopt AtLeastOneHundredth for Config::unique_tag_ratio #1881) does
    not exercise: impl Display (originally from payload config: add new Probability<const u32: MIN_AS_BITS> type #1874), impl Eq,
    impl Ord, impl PartialOrd, impl Hash, sample_bernoulli, and
    the AtLeastOneTenth alias. Surviving surface: PartialEq
    (transitively required by dogstatsd::Config/ValueConf/TimestampConfig
    deriving PartialEq), Arbitrary (transitively required by fuzz
    targets that derive Arbitrary on Config), TryFrom<f32> /
    From<BoundedProbability<_>> for f32 (required by
    serde(into = "f32", try_from = "f32")), try_new, get, and the
    Probability and AtLeastOneHundredth aliases. The dead tests are
    pruned alongside.

Each non-cleanup commit ships with unit and property tests; the cleanup
commit removes the now-dead tests covering removed items.

Motivation

BoundedProbability exists but isn't yet usable in Config structs:
adopters need PartialEq to keep deriving that trait, Arbitrary for
fuzz harnesses, and named aliases instead of bit-pattern incantations.
Commits 1–5 build out a wider surface area than needed; commit 6 trims
back to what the rest of the stack actually calls. The history is
preserved so the rationale for each piece is reviewable, with the
cleanup commit acting as a journaled "what stuck" record.

Related issues

Follow-up to #1874. Foundation for the adoption stack:

Additional Notes

No wire-format change — BoundedProbability still serializes as a bare
f32 via serde(into = "f32", try_from = "f32"). Phase 1 PRs in the
stack switch one Config field per PR with no behavioral change at
runtime (.get() inlines to the same machine code as direct f32
access, and no sampler is swapped).

goxberry and others added 4 commits May 20, 2026 13:45
Hand-write the comparison and hashing traits, sound under `try_new`'s
invariants (no NaN, no `-0.0`). `PartialEq`/`PartialOrd`/`Ord` use `f32`
native operators (`==`, `total_cmp`); `Hash` uses the `u32` bit pattern
because `f32: !Hash`. The `Eq`/`Hash` contract holds because numerically
equal valid values share a bit pattern.

Add unit and property tests covering equality, ordering, hashing in a
`HashSet`, and agreement between `Ord::cmp` and `f32::partial_cmp` over
the full valid range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the generic struct to `BoundedProbability<MIN_AS_BITS>` and
expose two type aliases for the bounds that occur in lading payload
config: `Probability` for $[0.0, 1.0]$ and `AtLeastOneTenth` for
$[0.1, 1.0]$. Callers adopting the type no longer have to spell
`f32::to_bits(...)` at each use site, which is the main blocker for
wiring the type into existing `f32` fields (`dogstatsd::Config`,
`opentelemetry::trace::Config`, etc.).

ProbabilityError keeps its name since Probability is now the alias
users typically construct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add an arbitrary::Arbitrary impl gated on the existing `arbitrary`
feature so types containing a Probability field can continue to derive
Arbitrary once the type is wired into Config structs. A derived impl
would call f32::arbitrary and emit NaN, infinity, and out-of-range
values, so a manual impl is required.

The impl samples a u32 in [MIN_AS_BITS, f32::to_bits(+1.0)] and decodes
it with f32::from_bits. The f32 <-> u32 bit-ordering monotonicity that
is already documented on the type guarantees the decoded value lies in
[MIN, +1.0] and cannot be -0.0 (whose bit pattern 0x8000_0000 is well
above 0x3f80_0000). Routing the result through try_new fires the
per-monomorphization const-eval bound check and forwards any future
invariant added to the constructor; the expect is sound by the
argument above.

Add three property tests, one per existing bound, gated on the same
feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probabilities in lading payload code exist to drive Bernoulli trials
("flip a coin biased by p"). Without a helper, every adopting call site
would extract the inner f32 via get() and pass it to rng.random_bool(p
as f64) by hand, which both leaks the f32 representation and recomputes
the same f32 -> f64 conversion at each use.

Add sample_bernoulli(&self, &mut rng) -> bool that delegates to
rng.random_bool. The f32 -> f64 conversion is exact for every f32 in
[+0.0, +1.0], so the success probability is preserved bit-for-bit, and
try_new's range invariant means random_bool never panics on
out-of-range input.

Add two property tests over u64 seeds, one for p = 0.0 and one for
p = 1.0. Together they pin the wiring (P(true) is not inverted) and
exercise the boundary values that random_bool is most likely to special-
case. A statistical test on intermediate p would only re-test
rand::Rng::random_bool itself, so it is omitted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented May 20, 2026

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

Changelog Check | changelog-check   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. No changes to CHANGELOG.md detected. Add 'no-changelog' label if this is intentional.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 52539fb | Docs | Datadog PR Page | Give us feedback!

Introduces `AtLeastOneHundredth = BoundedProbability<{ f32::to_bits(0.01) }>`
for fields such as `unique_tag_ratio` whose existing 0.01 floor must admit
in-the-wild values below 0.1. Also updates the `AtLeastOneTenth` rustdoc to
drop the `unique_tag_ratio` reference, since that field will be wired to
`AtLeastOneHundredth` rather than tightened to a 0.1 floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@goxberry goxberry changed the title feat(payload): impl PartialEq/Eq/PartialOrd/Ord/Hash for Probability feat(payload): improve BoundedProbability ergonomics for adoption May 20, 2026
@goxberry goxberry marked this pull request as ready for review May 21, 2026 02:44
@goxberry goxberry requested a review from a team as a code owner May 21, 2026 02:44
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21da2305c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lading_payload/src/common/config.rs Outdated
/// Sample a Bernoulli trial with success probability `self.get()`.
///
/// Returns `true` with probability `self.get()` and `false` otherwise.
/// The f32 ↔ f64 conversion is exact for every f32 in `[+0.0, +1.0]`, so
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace non-ASCII documentation characters

AGENTS.md requires code and documentation to be US-ASCII only, but this added doc comment introduces a Unicode arrow; the same change adds another arrow below and an em dash in a test comment. Please replace the new non-ASCII characters with ASCII equivalents such as <-> and -- so the file continues to satisfy the documented repository standard.

Useful? React with 👍 / 👎.

AGENTS.md requires US-ASCII only in code and documentation. Replace two
left-right arrows with `<->` and an em dash with `--`.
@goxberry
Copy link
Copy Markdown
Contributor Author

BoundedProbability rollout plan

Wire the BoundedProbability aliases (Probability, AtLeastOneTenth, and a
new AtLeastOneHundredth) into the lading_payload config and generator
types. Keep Generator::generate paths bit-exact and zero-overhead.

Decisions locked in

  • unique_tag_ratio: add AtLeastOneHundredth = BoundedProbability<{ f32::to_bits(0.01) }>
    to preserve the existing 0.01 floor. Do NOT use AtLeastOneTenth for this
    field (it would tighten the bound and reject in-the-wild configs in
    [0.01, 0.10)).
  • PR structure: PR the existing branch first; then one stacked PR per field.
  • Stack tool: Graphite (gt). Repo initialized with main as trunk;
    goxberry/improve-probability-type-ergonomics tracked as child of main.

Phase 0 — Land the existing branch

On goxberry/improve-probability-type-ergonomics, add:

  1. AtLeastOneHundredth = BoundedProbability<{ f32::to_bits(0.01) }> alias
    in lading_payload/src/common/config.rs.
  2. Update the rustdoc on AtLeastOneTenth so it does NOT claim
    unique_tag_ratio uses it (since we're keeping unique_tag_ratio on
    the 0.01 floor).

Then gt submit to push and open the PR.

Phase 1 — Stacked PRs (one per field)

Per-field recipe

  • Change the public Config / TimestampConfig / ValueConf field type from
    f32 to the matching alias (Probability or AtLeastOneHundredth).
  • Remove the now-redundant is_finite() && (0.0..=1.0).contains(&x) checks
    from valid()try_from enforces them at deserialize time.
  • Thread the alias type down through the private constructors
    (MemberGenerator::new, MetricGenerator::new, NumValueGenerator::new,
    common::tags::Generator::new, dogstatsd::common::tags::Generator::new,
    opentelemetry::common::TagGenerator::new).
  • At the comparison site in the generate hot path, call .get() to extract
    the f32. .get() is const fn returning a copied scalar — inlines to
    the same machine code, so RNG sequences and bit-exact behavior are
    preserved.
  • Update test fixtures and Default impls that construct these structs
    literally.

Hot-path call sites that stay f32 at comparison time

  • lading_payload/src/dogstatsd/metric.rs:112 self.timestamp_probability == 0.0
  • lading_payload/src/dogstatsd/metric.rs:117 prob < self.timestamp_probability
  • lading_payload/src/dogstatsd/metric.rs:147 prob < self.sampling_probability
  • lading_payload/src/dogstatsd/metric.rs:161 prob < self.multivalue_pack_probability
  • lading_payload/src/dogstatsd/common.rs:70 and :81 prob < *float_probability
  • lading_payload/src/common/tags.rs:277 choose_existing_prob > self.unique_tag_probability

All become .get() calls. Do NOT substitute sample_bernoulli
OpenClosed01 < p and random_bool(p) are not bit-identical samplers, and
swapping would shift the user-visible RNG sequence.

Stack order (smallest blast radius first)

PR Field Alias Notes
1 TimestampConfig::probability Probability Self-contained struct with its own valid(). Many test fixtures in dogstatsd.rs construct TimestampConfig literally.
2 ValueConf::float_probability Probability Threads into NumValueGenerator.
3 Config::sampling_probability Probability Threads through MemberGenerator::new -> MetricGenerator::new.
4 Config::multivalue_pack_probability Probability Same path as PR 3.
5 Config::unique_tag_ratio AtLeastOneHundredth Threads through MemberGenerator::new -> common::tags::Generator::new (also dogstatsd::common::tags::Generator::new wrapper and opentelemetry::common::TagGenerator::new). The opentelemetry::common::UNIQUE_TAG_RATIO: f32 = 0.75 const becomes a const AtLeastOneHundredth via match try_new(0.75). Highest blast radius — tag tests reference the existing const bounds.

Efficiency verification

After each Phase 1 PR, run the relevant criterion benches and compare against
the prior commit:

  • PRs 1–4: cargo bench -p lading_payload --bench dogstatsd (and block).
  • PR 5: also OTel benches (opentelemetry_log, opentelemetry_metric,
    opentelemetry_traces).

Expected delta is noise — no allocation change, no new branch, no extra
function call (the .get() inlines away).

Out of scope

  • No change to YAML/JSON wire format. BoundedProbability continues to
    serialize as a bare f32 via serde(into = "f32", try_from = "f32").
  • No swap of OpenClosed01 < p for sample_bernoulli anywhere.
  • No change to the existing common::tags::MIN_UNIQUE_TAG_RATIO,
    WARN_UNIQUE_TAG_RATIO, or MAX_UNIQUE_TAG_RATIO constants — kept for
    backward compatibility and reuse in tests.

Comment thread lading_payload/src/common/config.rs Outdated
Comment thread lading_payload/src/common/config.rs Outdated
Comment on lines +314 to +331
/// Generate a uniformly-distributed-over-bit-patterns value in `[MIN, +1.0]`
/// by sampling a `u32` in `[MIN_AS_BITS, f32::to_bits(+1.0)]` and decoding it.
///
/// This works because the f32 <-> u32 ordering (documented on the type) is
/// monotonic for non-negative finite values, so every bit pattern in that
/// range decodes to a valid stored value. `-0.0`'s bit pattern is
/// `0x8000_0000`, far above `f32::to_bits(+1.0) = 0x3f80_0000`, so it can
/// never be generated.
#[cfg(feature = "arbitrary")]
impl<'a, const MIN_AS_BITS: u32> arbitrary::Arbitrary<'a> for BoundedProbability<MIN_AS_BITS> {
fn arbitrary(u: &mut arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
let bits = u.int_in_range(MIN_AS_BITS..=f32::to_bits(Self::MAX))?;
let value = f32::from_bits(bits);
// Routing through `try_new` fires the per-monomorphization const-eval
// bound check on `Self::MIN` and forwards any future invariant added
// to the constructor. The `expect` is safe by the argument above.
Ok(Self::try_new(value).expect("bits in [MIN_AS_BITS, MAX_AS_BITS] always valid"))
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this trait implementation would actually get used unless we were fuzzing this type for some reason. I can't think of a scenario where we would actually do that. I think this block of code could be deleted without any real consequence.

Drop `impl Display`, `impl Eq`, `impl Ord`, `impl PartialOrd`,
`impl Hash`, `sample_bernoulli`, and the `AtLeastOneTenth` alias from
`BoundedProbability`. None of these are reached by production code at
the tip of the adoption stack: callers only need `PartialEq`
(transitively via derives on `dogstatsd::Config`, `ValueConf`,
`TimestampConfig`), `Arbitrary` (transitively via fuzz target derives),
`TryFrom`/`From` (required by serde), `try_new`/`get`, and the
`Probability` and `AtLeastOneHundredth` aliases. Drop the now-dead
tests covering the removed items. `cargo test -p lading-payload`
passes 239 tests; clippy and the `arbitrary` feature build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@goxberry
Copy link
Copy Markdown
Contributor Author

Remove unused BoundedProbability trait impls, methods, and aliases

Context

PR 1876 (goxberry/improve-probability-type-ergonomics, current branch) is the
foundation of a five-PR stack that extends BoundedProbability and then adopts
it across lading_payload. After the adoption PRs (1877–1881) land at the
stack tip (goxberry/probability-unique-tag-ratio), several items added in
PR 1876 turn out to have no production caller. Carrying them forward enlarges
the API surface, the test surface, and the supported invariants without any
caller benefiting.

The user wants PR 1876 trimmed so the type only exposes what the rest of the
stack actually uses. Audit performed against the tip of the stack
(goxberry/probability-unique-tag-ratio) — anything not referenced outside
lading_payload/src/common/config.rs is removable.

Audit summary (against tip of PR 1881)

Production / transitive callers verified via git grep against
goxberry/probability-unique-tag-ratio:

Keep (callers exist):

  • BoundedProbability type, Probability alias, AtLeastOneHundredth alias
  • try_new, get methods
  • TryFrom<f32>, From<BoundedProbability<_>> for f32 — required by
    #[serde(into = "f32", try_from = "f32")]
  • impl PartialEq — transitively required because dogstatsd::Config,
    dogstatsd::ValueConf, dogstatsd::TimestampConfig derive PartialEq and
    hold Probability / AtLeastOneHundredth fields
  • impl Arbitrary (feature = "arbitrary") — transitively required by fuzz
    targets in lading_payload/fuzz/fuzz_targets/dogstatsd_*.rs that derive
    arbitrary::Arbitrary on Input { config: lading_payload::dogstatsd::Config, .. }
  • ProbabilityError, NEG_ZERO_AS_BITS, MIN, MAX — internal callers in
    try_new and the Arbitrary impl

Remove (no production caller):

  • impl fmt::Display for BoundedProbability — only tests format the type
  • impl Eq — never required by a derive in production
  • impl Ord — never required by a derive in production
  • impl PartialOrd — never required by a derive in production
  • impl hash::Hash — never required by a derive in production
  • BoundedProbability::sample_bernoulli — no caller; only tested in-place
  • AtLeastOneTenth alias — no caller

Critical file

lading_payload/src/common/config.rs is the only file touched.

Edits in lading_payload/src/common/config.rs

Module-level imports (line 5):

  • Change use std::{cmp, fmt, hash}; to use std::{cmp, fmt};. cmp and
    fmt are still used by ConfRange (PartialOrd bounds, Display impl).
    hash is no longer needed.

Type aliases (around lines 177–179):

  • Delete the AtLeastOneTenth alias and its doc comment.

Trait impls on BoundedProbability (lines 200–236):

  • Delete impl<const MIN_AS_BITS: u32> fmt::Display for BoundedProbability<MIN_AS_BITS> block.
  • Delete the explanatory comment block on lines 206–210 about Eq/Ord/Hash soundness.
  • Delete impl<const MIN_AS_BITS: u32> Eq for BoundedProbability<MIN_AS_BITS> {}.
  • Delete impl<const MIN_AS_BITS: u32> Ord for BoundedProbability<MIN_AS_BITS> block.
  • Delete impl<const MIN_AS_BITS: u32> PartialOrd for BoundedProbability<MIN_AS_BITS> block.
  • Delete impl<const MIN_AS_BITS: u32> hash::Hash for BoundedProbability<MIN_AS_BITS> block.
  • Keep the PartialEq impl above the comment block.

Inherent methods on BoundedProbability (lines 300–311):

  • Delete the sample_bernoulli method and its doc comment.

Tests in mod probability_tests (lines 334–850):

  • Remove ordering_matches_numeric_ordering (lines 408–416) — uses <.
  • Remove hash_agrees_with_eq (lines 418–427) — uses HashSet/Hash.
  • Remove check_display_matches, check_display_precision helpers (lines 499–507).
  • Remove the three display_matches_inner_f32_for_valid_values_* proptests.
  • Remove the three display_propagates_precision_* proptests.
  • Remove ord_matches_f32_partial_cmp proptest (lines 605–618) — uses cmp.
  • Remove bernoulli_at_zero_never_succeeds and bernoulli_at_one_always_succeeds
    proptests (lines 829–848) and the // ===== Property tests: sample_bernoulli =====
    comment block above them.
  • Keep equality_holds_for_same_bit_pattern — relies only on PartialEq.

No other files need to change. The adoption PRs do not reference any of the
removed items, so the stack should rebase through cleanly.

Verification

Run from the repo root, on the current branch
(goxberry/improve-probability-type-ergonomics):

  1. cargo test -p lading-payload — full test suite passes (previously 250
    tests on PR 1881; expect a lower count after removing tests, all green).
  2. cargo clippy -p lading-payload --all-targets -- -D warnings — clean
    (catches any orphan use or dead-code warnings from removed helpers).
  3. cargo build -p lading-payload --features arbitrary — confirms the
    Arbitrary impl path still compiles after the test changes.
  4. Optional smoke: rebase a descendant branch (e.g. goxberry/probability-timestamp-config)
    onto this commit locally and run cargo check -p lading-payload.
    No expected fallout — none of the adoption PRs use the removed items.

`cargo fmt --check` flagged a stray blank line before the closing brace
of `mod probability_tests` introduced by the prior cleanup commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants