Skip to content

100% Test coverage all (?) distributions#1399

Merged
jzmaddock merged 12 commits into
boostorg:developfrom
JacobHass8:distribution-test-coverage
May 29, 2026
Merged

100% Test coverage all (?) distributions#1399
jzmaddock merged 12 commits into
boostorg:developfrom
JacobHass8:distribution-test-coverage

Conversation

@JacobHass8
Copy link
Copy Markdown
Contributor

@JacobHass8 JacobHass8 commented May 25, 2026

Added tests for the following distributions to get 100% test coverage.

  • arcsine
  • bernoulli (97%)
  • beta (99%)
    • In find_alpha and find_beta, there isn't an upper bound on the mean and variance. However, there should be.
  • binomial (98%)
  • cauchy
  • chi-squared (98%)
  • exponential (99%)
  • extreme value (98%)
  • find_location
    • Have a question below. The scale argument is allowed to be negative
  • find_scale
    • Here and in find_location, I think there might be redundant functions. Just a note to go back and take a look later.

Here is the code coverage report for my information.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 99.56332% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.61%. Comparing base (592063d) to head (7a0975e).

Files with missing lines Patch % Lines
include/boost/math/distributions/find_location.hpp 83.33% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1399      +/-   ##
===========================================
+ Coverage    95.35%   95.61%   +0.25%     
===========================================
  Files          825      825              
  Lines        68310    68487     +177     
===========================================
+ Hits         65139    65481     +342     
+ Misses        3171     3006     -165     
Files with missing lines Coverage Δ
include/boost/math/distributions/binomial.hpp 98.00% <100.00%> (+11.44%) ⬆️
include/boost/math/distributions/cauchy.hpp 100.00% <100.00%> (+16.45%) ⬆️
include/boost/math/distributions/chi_squared.hpp 98.13% <100.00%> (+7.30%) ⬆️
include/boost/math/distributions/extreme_value.hpp 98.83% <ø> (+17.68%) ⬆️
test/test_arcsine.cpp 99.24% <100.00%> (+0.06%) ⬆️
test/test_bernoulli.cpp 100.00% <100.00%> (ø)
test/test_beta_dist.cpp 98.95% <100.00%> (+0.24%) ⬆️
test/test_binomial.cpp 100.00% <100.00%> (ø)
test/test_cauchy.cpp 100.00% <100.00%> (ø)
test/test_chi_squared.cpp 100.00% <100.00%> (ø)
... and 5 more

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 592063d...7a0975e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JacobHass8 JacobHass8 changed the title 100% Test coverage for arcsine distribution 100% Test coverage all (?) distributions May 26, 2026
@JacobHass8
Copy link
Copy Markdown
Contributor Author

JacobHass8 commented May 27, 2026

@jzmaddock (or @mborland) you had previously mentioned expanding the existing test coverage. Would this PR be helpful for that? I was just going to keep plugging along for every distribution. Most of the missing tests are for edge cases, or returning NaNs. I've also found a couple of boolean operations that weren't correct which I fixed. There have been a couple cases where something like this appeared (false == check_param1() && check_param2()) when it should have been (false == (check_param1() && check_param2())).

BOOST_MATH_CUDA_ENABLED inline bool check_dist_and_prob(const char* function, const RealType& N, RealType p, RealType prob, RealType* result, const Policy& pol)
{
if((check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol)) == false)
if(!(check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol)))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind this change, but the two lines are equivalent I think, so if we end up with too many of these it gets harder to review the PR.

That said, I wonder if we should just simplify the whole thing down to:

return check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I get the point that the others have a missing set of ()'s. I wonder if those would simplify down to one liners too and be easier to read as a result?

Copy link
Copy Markdown
Contributor Author

@JacobHass8 JacobHass8 May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind this change, but the two lines are equivalent I think, so if we end up with too many of these it gets harder to review the PR.

I was worried about that. For now, I'll leave these alone and just fix any missing ()'s.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's quite a few of these issues. For example, in exponential.hpp all the checks are written as 0 == detail::verify_lambda(function, lambda, &result, Policy()).

return check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol);

Would this work? Wouldn't the function then return true/false when we want to return the variable result which is defined as

result = policies::raise_domain_error<RealType>(
         function,
         "The scale parameter \"lambda\" must be > 0, but was: %1%.", l, pol);```

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's quite a few of these issues. For example, in exponential.hpp all the checks are written as 0 == detail::verify_lambda(function, lambda, &result, Policy()).

I don't see an issue there? Arguably it should be false == detail::verify_lambda(function, lambda, &result, Policy()) but even that's hardly a necessary change?

Would this work? Wouldn't the function then return true/false when we want to return the variable result which is defined as

I'm looking specifically at the helpers that compound two or more checks (and return a bool), for example:

        template <class RealType, class Policy>
        BOOST_MATH_CUDA_ENABLED inline bool check_dist_and_prob(const char* function, const RealType& N, RealType p, RealType prob, RealType* result, const Policy& pol)
        {
           if((check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol)) == false)
              return false;
           return true;
        }

Could be simplified to:

        template <class RealType, class Policy>
        BOOST_MATH_CUDA_ENABLED inline bool check_dist_and_prob(const char* function, const RealType& N, RealType p, RealType prob, RealType* result, const Policy& pol)
        {
           return (check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol);
        }

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's quite a few of these issues. For example, in exponential.hpp all the checks are written as 0 == detail::verify_lambda(function, lambda, &result, Policy()).

I don't see an issue there? Arguably it should be false == detail::verify_lambda(function, lambda, &result, Policy()) but even that's hardly a necessary change?

Yes, you're right. Issue wasn't the right word. It's really just a style change. I'll leave these since it doesn't change anything.

Could be simplified to:

        template <class RealType, class Policy>
        BOOST_MATH_CUDA_ENABLED inline bool check_dist_and_prob(const char* function, const RealType& N, RealType p, RealType prob, RealType* result, const Policy& pol)
        {
           return (check_dist(function, N, p, result, pol) && detail::check_probability(function, prob, result, pol);
        }

Yeah, that would consolidate a couple lines and maybe be more readable. I'll won't make these changes since the PR is already quite large (unless you say otherwise).

Comment thread test/test_arcsine.cpp
BOOST_CHECK((boost::math::isnan)(logpdf(ignore_error_arcsine(-1, 1), static_cast <RealType>(+2)))); // x > x_max

// CDF
BOOST_CHECK((boost::math::isnan)(cdf(ignore_error_arcsine(0, 1), std::numeric_limits<RealType>::infinity()))); // x == infinity
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here, is that generally I would prefer to see a BOOST_CHECK_THROW and verify that the correct exception is thrown: as long as we're getting the right error handler called (and hence correct exception thrown) then we know it's guaranteed we'll get a NaN when exceptions are off, where as a NaN when exceptions are off does not guarantee the correct exception type when they're on. Hope that makes sense!

It wouldn't hurt to do both of course, but probably the only time we actually NEED both is to check for the correct sign on infinities when an overflow_error is raised.

One final really picky point: we should really hide use of std::numeric_limits<RealType>::infinity() behind a BOOST_MATH_IF_CONSTEXPR(std::numeric_limits<RealType>::has_infinity), likewise for NaN's.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't hurt to do both of course, but probably the only time we actually NEED both is to check for the correct sign on infinities when an overflow_error is raised.

The reason I did both is because we never actually hit the return result line in the code coverage report. Maybe this isn't the best reason to do add it though.

One final really picky point: we should really hide use of std::numeric_limits<RealType>::infinity() behind a BOOST_MATH_IF_CONSTEXPR(std::numeric_limits<RealType>::has_infinity), likewise for NaN's.

Why does the BOOST_MATH_IF_CONSTEXPR need added?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the BOOST_MATH_IF_CONSTEXPR need added?

Strictly speaking it's not required and we could use an if, but MSVC emits warnings if something could be if constexpr and isn't, so it just avoids that annoyance. if constexpr is C++17 BTW and we're still C++14 compatible.

@jzmaddock
Copy link
Copy Markdown
Collaborator

Really big thanks for taking this on! Yes definitely worth doing, and yes, it's a really tedious old job that mostly throws up a few trivial errors. Gets a big thumbs up from me!

@JacobHass8
Copy link
Copy Markdown
Contributor Author

JacobHass8 commented May 28, 2026

@jzmaddock I had a question regarding the logcdf function in exponential.hpp. When if (c.param >= tools::max_value<RealType>()) we return 0. Shouldn't this be -inf or at least nan since we would normally return -c.param * lambda? It looks like the logcdf will be very negative and then jump to 0 when c.param > tools::max_value<RealType>()

template <class RealType, class Policy>
BOOST_MATH_GPU_ENABLED inline RealType logcdf(const complemented2_type<exponential_distribution<RealType, Policy>, RealType>& c)
{
   BOOST_MATH_STD_USING // for ADL of std functions

   constexpr auto function = "boost::math::logcdf(const exponential_distribution<%1%>&, %1%)";

   RealType result = 0;
   RealType lambda = c.dist.lambda();
   if(0 == detail::verify_lambda(function, lambda, &result, Policy()))
      return result;
   if(0 == detail::verify_exp_x(function, c.param, &result, Policy()))
      return result;
   // Workaround for VC11/12 bug:
   if (c.param >= tools::max_value<RealType>())
      return 0; // Shouldn't this be -inf? 
   result = -c.param * lambda;

   return result;
}

@JacobHass8
Copy link
Copy Markdown
Contributor Author

Most of the test cases that I've written so far have seemed pretty redundant. I'm mainly just checking functions return a NaN if variables go outside their bounds.

I'm wondering if there's maybe a better way to do this. Could we define a new class of floating point number which inherits from float, double... but the bound can be set (i.e. from [0, inf) or (-inf, inf)). Then a domain_error is raised if a user tries to set a value outside that bound? Or somehow encode the bounds of each parameter into each class of distribution.

I just feel like there has to be a cleaner way to do this than copy and pasting tests which probe each variable for each distribution.

@jzmaddock
Copy link
Copy Markdown
Collaborator

I had a question regarding the logcdf function in exponential.hpp. When if (c.param >= tools::max_value()) we return 0. Shouldn't this be -inf or at least nan since we would normally return -c.param * lambda? It looks like the logcdf will be very negative and then jump to 0 when c.param > tools::max_value()

If you follow the logic, if verify_exp_x fails, then it sets result to the result of raise_domain_error which will be a NaN.

@jzmaddock
Copy link
Copy Markdown
Collaborator

I'm wondering if there's maybe a better way to do this. Could we define a new class of floating point number which inherits from float, double... but the bound can be set (i.e. from [0, inf) or (-inf, inf)). Then a domain_error is raised if a user tries to set a value outside that bound? Or somehow encode the bounds of each parameter into each class of distribution.

Unless I'm misunderstanding, I don't see a way to make this work: we want to be able to instantiate these on float or double, so we need to test that those types do the right thing? But maybe I'm missing something too...

@JacobHass8
Copy link
Copy Markdown
Contributor Author

I had a question regarding the logcdf function in exponential.hpp. When if (c.param >= tools::max_value()) we return 0. Shouldn't this be -inf or at least nan since we would normally return -c.param * lambda? It looks like the logcdf will be very negative and then jump to 0 when c.param > tools::max_value()

If you follow the logic, if verify_exp_x fails, then it sets result to the result of raise_domain_error which will be a NaN.

Hmmm, so is there any need for the check below if this will be caught in verify_exp_x?

 // Workaround for VC11/12 bug:
 if (c.param >= tools::max_value<RealType>())
    return 0; 

@JacobHass8
Copy link
Copy Markdown
Contributor Author

I'm wondering if there's maybe a better way to do this. Could we define a new class of floating point number which inherits from float, double... but the bound can be set (i.e. from [0, inf) or (-inf, inf)). Then a domain_error is raised if a user tries to set a value outside that bound? Or somehow encode the bounds of each parameter into each class of distribution.

Unless I'm misunderstanding, I don't see a way to make this work: we want to be able to instantiate these on float or double, so we need to test that those types do the right thing? But maybe I'm missing something too...

I don't think you're missing anything. I'm just doing a lot of copy and pasting of tests which makes me think there's a more systematic way to do this testing.

@JacobHass8
Copy link
Copy Markdown
Contributor Author

Another edge case that I found a little weird. The find_location function allows the scale of a distribution to be negative as shown in the code below

template <class Dist, class Policy>
inline
  typename Dist::value_type find_location( // For example, normal mean.
  typename Dist::value_type z, // location of random variable z to give probability, P(X > z) == p.
  // For example, a nominal minimum acceptable z, so that p * 100 % are > z
  typename Dist::value_type p, // probability value desired at x, say 0.95 for 95% > z.
  typename Dist::value_type scale, // scale parameter, for example, normal standard deviation.
  const Policy& pol 
  )
{
  static_assert(::boost::math::tools::is_distribution<Dist>::value, "The provided distribution does not meet the conceptual requirements of a distribution."); 
  static_assert(::boost::math::tools::is_scaled_distribution<Dist>::value, "The provided distribution does not meet the conceptual requirements of a scaled distribution."); 
  static const char* function = "boost::math::find_location<Dist, Policy>&, %1%)";

  if(!(boost::math::isfinite)(p) || (p < 0) || (p > 1))
  {
   return policies::raise_domain_error<typename Dist::value_type>(
       function, "Probability parameter was %1%, but must be >= 0 and <= 1!", p, pol);
  }
  if(!(boost::math::isfinite)(z))
  {
   return policies::raise_domain_error<typename Dist::value_type>(
       function, "z parameter was %1%, but must be finite!", z, pol);
  }
  if(!(boost::math::isfinite)(scale))
  {
   return policies::raise_domain_error<typename Dist::value_type>(
       function, "scale parameter was %1%, but must be finite!", scale, pol);
  }
    
  //cout << "z " << z << ", p " << p << ",  quantile(Dist(), p) "
  //  << quantile(Dist(), p) << ", quan * scale " << quantile(Dist(), p) * scale << endl;
  return z - (quantile(Dist(), p) * scale);
} // find_location

If this is unintended, I'll just change the code above to use boost::math::detail::check_scale instead.

return result;
if(0 == detail::check_finite(function, a, &result, Policy()))
return result;
if(0 == detail::check_finite(function, a, &result, Policy()))
Copy link
Copy Markdown
Contributor Author

@JacobHass8 JacobHass8 May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I deleted this, and another instance in this file, because they were duplicate checks.

@jzmaddock
Copy link
Copy Markdown
Collaborator

Hmmm, so is there any need for the check below if this will be caught in verify_exp_x?

Missed that, given the comment about VC11/12 and the fact that these are obsolete, I would say not any more.

@jzmaddock
Copy link
Copy Markdown
Collaborator

I don't think you're missing anything. I'm just doing a lot of copy and pasting of tests which makes me think there's a more systematic way to do this testing.

I wonder if we can't just write a generic set of tests, for example to test invalid support values we could use:

template <class Distribution>
void test_invalid_support(const Distribution& dist)
{
    using Real = Distribution::value_type;
    std::pair<Real, Real> sup = support(dist);

    // Test outside lower bound:
    Real invalid = sup.first;
    if (boost::math::isfinite(invalid))
    {
        if (invalid == -boost::math::tools::max_value<Real>())
            invalid = -std::numeric_limits<Real>::infinity();
        else
            invalid = boost::math::float_prior(invalid);
        BOOST_MATH_CHECK_THROW(pdf(dist, invalid), std::domain_error);
        BOOST_MATH_CHECK_THROW(cdf(dist, invalid), std::domain_error);
        BOOST_MATH_CHECK_THROW(cdf(complement(dist, invalid)), std::domain_error);
    }
    // Test outside upper bound:
    invalid = sup.second;
    if (boost::math::isfinite(invalid))
    {
        if (invalid == boost::math::tools::max_value<Real>())
            invalid = std::numeric_limits<Real>::infinity();
        else
            invalid = boost::math::float_next(invalid);
        BOOST_MATH_CHECK_THROW(pdf(dist, invalid), std::domain_error);
        BOOST_MATH_CHECK_THROW(cdf(dist, invalid), std::domain_error);
        BOOST_MATH_CHECK_THROW(cdf(complement(dist, invalid)), std::domain_error);
    }
    // Test NaN:
    invalid = std::numeric_limits<Real>::quiet_NaN();
    BOOST_MATH_CHECK_THROW(pdf(dist, invalid), std::domain_error);
    BOOST_MATH_CHECK_THROW(cdf(dist, invalid), std::domain_error);
    BOOST_MATH_CHECK_THROW(cdf(complement(dist, invalid)), std::domain_error);
}

And then do something similar for invalid probability values, and finally I guess something similar for distributions which were constructed from invalid parameters (this last one requires no exceptions to be thrown).

@jzmaddock
Copy link
Copy Markdown
Collaborator

If this is unintended, I'll just change the code above to use boost::math::detail::check_scale instead.

I think unintended and too lax yes.

@JacobHass8
Copy link
Copy Markdown
Contributor Author

JacobHass8 commented May 28, 2026

And then do something similar for invalid probability values, and finally I guess something similar for distributions which were constructed from invalid parameters (this last one requires no exceptions to be thrown).

I think something like this would be very nice! I think that invalid probability values should be quite easy since we just need to check negative inputs and inputs greater than 1. The invalid distribution construction might be harder.

Maybe we should just merge the PR at this point. Then I could open a separate PR to implement the test_invalid_support function as well as the invalid probabilities and invalid distribution construction. I've improved code coverage for a lot of distributions, but I think most of it could be consolidated to a single function which is generic across distributions.

@JacobHass8 JacobHass8 closed this May 28, 2026
@JacobHass8 JacobHass8 reopened this May 28, 2026
@jzmaddock jzmaddock merged commit febada3 into boostorg:develop May 29, 2026
74 checks passed
@jzmaddock
Copy link
Copy Markdown
Collaborator

Yep, OK I've merged. With a bit of improved testing infrastructure, hopefully most of the other distribution tests will just need one or two lines added to call the boiler plate testers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants