This document explains the basic usage patterns of the statcpp library.
#include <iostream>
#include <vector>
#include "statcpp/basic_statistics.hpp"
int main() {
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
// Calculate mean
double avg = statcpp::mean(data.begin(), data.end());
std::cout << "Mean: " << avg << std::endl; // 3.0
return 0;
}
statcpp is a header-only library. Headers are located under the include/statcpp/ directory (or include-ja/statcpp/ for Japanese-commented headers). Add the include/ directory to your compiler's include path using the -I flag:
# Compile with the include path
g++ -std=c++17 -I/path/to/statcpp/include your_program.cpp -o your_program
Then include headers with the statcpp/ prefix:
#include "statcpp/basic_statistics.hpp"
When using CMake with find_package(statcpp) or add_subdirectory, the include path is configured automatically.
All functions accept ranges via an iterator pair (first, last) representing the half-open interval [first, last). This covers std::vector, std::array, built-in arrays (T a[N]), and raw pointer ranges (T* first, T* last).
Iterator Category: All functions require RandomAccessIterator.
This is because they assume random access via *(first + i) and std::distance.
Iterators that are not random-access, such as those of std::forward_list or input stream iterators, are not supported.
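When the data lives in a container without random access, one possible workaround is to copy it into a std::vector first. The sketch below uses only the standard library. Both mean_of and to_vector are hypothetical names made up for illustration; mean_of is a stand-in for an iterator-pair function and is not statcpp's implementation.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an iterator-pair statistics function.
// It rejects anything weaker than a random-access iterator at compile time.
template <typename RandomIt>
double mean_of(RandomIt first, RandomIt last) {
    using category = typename std::iterator_traits<RandomIt>::iterator_category;
    static_assert(std::is_base_of<std::random_access_iterator_tag, category>::value,
                  "mean_of requires random-access iterators");
    assert(first != last);  // empty ranges are not handled in this sketch
    const auto n = last - first;  // valid because the iterators are random-access
    return std::accumulate(first, last, 0.0) / static_cast<double>(n);
}

// Copy a forward-only sequence into a std::vector so that its iterators
// satisfy the random-access requirement.
inline std::vector<double> to_vector(const std::forward_list<double>& values) {
    return std::vector<double>(values.begin(), values.end());
}
```

Calling mean_of(list.begin(), list.end()) on a std::forward_list fails the static_assert, while passing the iterators of to_vector(list) compiles. The copy costs O(n) time and memory, which is usually acceptable next to the statistics computation itself.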
#include "statcpp/basic_statistics.hpp"
#include <vector>
#include <array>
// Using std::vector
std::vector<double> vec = {1.0, 2.0, 3.0};
double m1 = statcpp::mean(vec.begin(), vec.end());
// Using std::array
std::array<int, 5> arr = {1, 2, 3, 4, 5};
double m2 = statcpp::mean(arr.begin(), arr.end());
// Using C-style array
double data[] = {1.0, 2.0, 3.0};
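double m3 = statcpp::mean(data, data + 3);
Many functions have overloads that accept a projection function (a callable such as a lambda) as an additional argument.
When a projection function is supplied, the statistic is computed over the values it returns for each element rather than over the elements themselves.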
Supported functions:
- sum, mean, median, mode, geometric_mean, harmonic_mean, trimmed_mean
- range, population_variance, sample_variance, variance
- population_stddev, sample_stddev, stddev
- coefficient_of_variation, iqr, mean_absolute_deviation
- minimum, maximum, quartiles, percentile, five_number_summary
#include "statcpp/basic_statistics.hpp"
#include <string>
#include <vector>
struct Product {
std::string name;
double price;
};
std::vector<Product> products = {
{"Apple", 120.0},
{"Banana", 80.0},
{"Orange", 100.0}
};
// Calculate mean of prices (using projection)
double avg_price = statcpp::mean(
products.begin(),
products.end(),
[](const Product& p) { return p.price; }
);
// avg_price = 100.0
Functions that compute variance or standard deviation have overloads that accept a pre-computed mean value as double precomputed_mean.
This is useful for efficiently computing multiple statistics.
Supported functions:
- population_variance, sample_variance, variance
- population_stddev, sample_stddev, stddev
- coefficient_of_variation, mean_absolute_deviation
For projection versions, the argument order is (first, last, proj, precomputed_mean).
#include "statcpp/basic_statistics.hpp"
#include "statcpp/dispersion_spread.hpp"
#include <vector>
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
// Compute mean only once
double avg = statcpp::mean(data.begin(), data.end());
// Use pre-computed mean to calculate variance and standard deviation
double var = statcpp::variance(data.begin(), data.end(), avg);
double sd = statcpp::stddev(data.begin(), data.end(), avg);
median, trimmed_mean, iqr, quartiles, percentile, and five_number_summary require a sorted range.
Pass a sequence (or iterator range) that has been sorted in ascending order beforehand.
The sort order is based on comparisons satisfying strict weak ordering, same as std::sort.
#include "statcpp/basic_statistics.hpp"
#include "statcpp/order_statistics.hpp"
#include <vector>
#include <algorithm>
std::vector<double> data = {5.0, 2.0, 8.0, 1.0, 3.0};
// Sorting required
std::sort(data.begin(), data.end());
// Compute median on sorted data
double med = statcpp::median(data.begin(), data.end());
For projection versions, the elements must be arranged so that the return values of the projection function are in ascending order.
struct Product {
std::string name;
double price;
};
std::vector<Product> products = {
{"Apple", 120.0},
{"Banana", 80.0},
{"Orange", 100.0}
};
// Sort by projection key price before passing
std::sort(products.begin(), products.end(),
[](const Product& a, const Product& b) { return a.price < b.price; });
auto q = statcpp::quartiles(products.begin(), products.end(),
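[](const Product& p) { return p.price; });
Quantile computation in iqr, quartiles, percentile, and five_number_summary is based on linear interpolation (equivalent to R type=7 / Excel QUARTILE.INC / PERCENTILE.INC).
For parameter p, expressed as a fraction between 0 and 1, and n sorted values, the R type=7 definition places the quantile at the zero-based position (n - 1)p and interpolates linearly between the two neighboring elements.
Endpoint handling: When p is 0 the result is the first (smallest) element, and when p is 1 it is the last (largest) element.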
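To make the interpolation rule concrete, here is a minimal sketch of the R type=7 quantile definition using only the standard library. quantile_type7 is a hypothetical helper written for illustration, and it assumes that p is a fraction in [0, 1]; it is not statcpp's implementation, and the parameter convention of the statcpp functions should be checked in the API reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Type-7 quantile of an ascending-sorted vector (hypothetical helper).
// p is assumed to be a fraction in [0, 1].
inline double quantile_type7(const std::vector<double>& sorted, double p) {
    if (sorted.empty()) {
        throw std::invalid_argument("quantile_type7: empty range");
    }
    if (!(p >= 0.0 && p <= 1.0)) {
        throw std::invalid_argument("quantile_type7: p outside [0, 1]");
    }
    assert(std::is_sorted(sorted.begin(), sorted.end()));  // debug-only precondition check

    // Zero-based fractional position of the quantile in the sorted data.
    const double h = static_cast<double>(sorted.size() - 1) * p;
    const std::size_t lo = static_cast<std::size_t>(std::floor(h));
    // Clamp the upper neighbor so that p == 1 does not read past the end.
    const std::size_t hi = (lo + 1 < sorted.size()) ? lo + 1 : lo;
    const double frac = h - static_cast<double>(lo);

    // Linear interpolation between the two neighboring order statistics.
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}
```

For the sorted values {1, 2, 3, 4}, p = 0.25 gives position 0.75 and therefore 1 + 0.75 * (2 - 1) = 1.75, which matches R's quantile(1:4, 0.25). With p = 0 and p = 1 the function returns the first and last elements.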
For invalid arguments (out-of-range parameters, non-computable conditions, etc.) or when the definition of the statistic does not allow an empty range, std::invalid_argument is thrown.
The message includes the namespace-qualified function name (e.g., "statcpp::mean: empty range").
However, sum and count allow empty ranges and return value_type{} and 0 respectively.
#include "statcpp/basic_statistics.hpp"
#include <vector>
#include <stdexcept>
#include <iostream>
std::vector<double> empty_data;
try {
double avg = statcpp::mean(empty_data.begin(), empty_data.end());
} catch (const std::invalid_argument& e) {
std::cerr << "Error: " << e.what() << std::endl;
// Output: Error: statcpp::mean: empty range
}
// sum and count do not throw exceptions
double s = statcpp::sum(empty_data.begin(), empty_data.end()); // 0.0
std::size_t n = statcpp::count(empty_data.begin(), empty_data.end()); // 0
Variance and standard deviation calculations are implemented using the two-pass algorithm (first compute the mean, then compute the sum of squared deviations). Welford's online algorithm is not used, so precision may be lost when the values are very large relative to their spread, for example values near 1e10 that differ only by 1. Consider normalizing (centering) the data on the caller side when high precision is required.
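For readers unfamiliar with the two approaches named above, the following sketch contrasts a two-pass variance with Welford's online update, using only the standard library. Both function names are hypothetical, and both compute the sample variance (divisor n - 1) purely as an assumption for the example; neither is statcpp's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass sample variance: pass 1 computes the mean,
// pass 2 accumulates the squared deviations from that mean.
inline double two_pass_sample_variance(const std::vector<double>& xs) {
    assert(xs.size() >= 2);  // sample variance needs at least two values
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / n;
    double squared_dev = 0.0;
    for (double x : xs) squared_dev += (x - mean) * (x - mean);
    return squared_dev / (n - 1.0);
}

// Welford's online algorithm: a single pass that updates the running
// mean and the running sum of squared deviations (m2) per element.
inline double welford_sample_variance(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    double mean = 0.0;
    double m2 = 0.0;
    std::size_t k = 0;
    for (double x : xs) {
        ++k;
        const double delta = x - mean;          // deviation from the old mean
        mean += delta / static_cast<double>(k);
        m2 += delta * (x - mean);               // uses the updated mean
    }
    return m2 / static_cast<double>(k - 1);
}
```

On {1, 2, 3, 4, 5} both functions return 2.5. The practical difference is that the two-pass version needs to traverse the data twice, while Welford's update works on a stream that can only be read once.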
#include "statcpp/dispersion_spread.hpp"
#include <vector>
// Data with vastly different scales
std::vector<double> data = {1e10, 1e10 + 1, 1e10 + 2};
// Normalize before computing
double mean_val = statcpp::mean(data.begin(), data.end());
std::vector<double> normalized;
for (double x : data) {
normalized.push_back(x - mean_val);
}
double var = statcpp::variance(normalized.begin(), normalized.end());
Example of using multiple headers together:
#include "statcpp/basic_statistics.hpp"
#include "statcpp/dispersion_spread.hpp"
#include "statcpp/order_statistics.hpp"
#include <iostream>
#include <vector>
#include <algorithm>
int main() {
std::vector<double> data = {5.0, 2.0, 8.0, 1.0, 3.0, 7.0, 4.0};
// Basic statistics
double avg = statcpp::mean(data.begin(), data.end());
std::cout << "Mean: " << avg << std::endl;
// Dispersion
double sd = statcpp::stddev(data.begin(), data.end());
std::cout << "Standard Deviation: " << sd << std::endl;
// Order statistics (sorting required)
std::sort(data.begin(), data.end());
double med = statcpp::median(data.begin(), data.end());
std::cout << "Median: " << med << std::endl;
auto q = statcpp::quartiles(data.begin(), data.end());
std::cout << "First Quartile: " << q.q1 << std::endl;
std::cout << "Second Quartile: " << q.q2 << std::endl;
std::cout << "Third Quartile: " << q.q3 << std::endl;
return 0;
}
- For detailed function reference, see API Reference
- For practical code examples, see Examples
- For building and testing, see Building Guide