This document provides an overview of all modules and public functions in the statcpp library (524 public functions across 31 header files).
For detailed function specifications, formulas, and usage examples, please refer to the HTML documentation generated by Doxygen.
To generate detailed API documentation:

    # Install Doxygen (if necessary)
    # macOS
    brew install doxygen
    # Ubuntu/Debian
    sudo apt-get install doxygen

    # Generate documentation
    ./generate_docs.sh
    # Or run directly
    doxygen Doxyfile

    # Open in browser (macOS)
    open doc/html/index.html
    # Linux
    xdg-open doc/html/index.html

Functions for computing basic statistical measures.
Functions:
| Function | Description | Overloads |
|---|---|---|
| `sum` | Sum of elements | + projection |
| `count` | Number of elements | |
| `mean` | Arithmetic mean | + projection |
| `median` | Median (requires sorted range) | + projection |
| `mode` | Mode (most frequent value) | + projection |
| `modes` | All modes | + projection |
| `geometric_mean` | Geometric mean | + projection |
| `harmonic_mean` | Harmonic mean | + projection |
| `trimmed_mean` | Trimmed mean (requires sorted range) | + projection |
| `weighted_mean` | Weighted mean | + projection |
| `logarithmic_mean` | Logarithmic mean of two values | |
| `weighted_harmonic_mean` | Weighted harmonic mean | + projection |
| `argmin` | Index of minimum element | + projection |
| `argmax` | Index of maximum element | + projection |
Functions for measuring data dispersion (variability).
Functions:
| Function | Description | Overloads |
|---|---|---|
| `range` | Range (max - min) | + projection |
| `var` | Variance with `ddof` parameter | + precomputed mean, + projection |
| `population_variance` | Population variance | + precomputed mean, + projection |
| `sample_variance` | Sample variance | + precomputed mean, + projection |
| `variance` | Variance (defaults to sample) | + precomputed mean, + projection |
| `stdev` | Standard deviation with `ddof` parameter | + precomputed mean, + projection |
| `population_stddev` | Population standard deviation | + precomputed mean, + projection |
| `sample_stddev` | Sample standard deviation | + precomputed mean, + projection |
| `stddev` | Standard deviation (defaults to sample) | + precomputed mean, + projection |
| `coefficient_of_variation` | Coefficient of variation | + precomputed mean, + projection |
| `iqr` | Interquartile range (requires sorted range) | + projection |
| `mean_absolute_deviation` | Mean absolute deviation | + precomputed mean, + projection |
| `weighted_variance` | Weighted variance | + projection |
| `weighted_stddev` | Weighted standard deviation | + projection |
| `geometric_stddev` | Geometric standard deviation | + projection |
Functions for computing order statistics (all require sorted range).
Structs:
- `quartile_result`: fields `q1`, `q2`, `q3`
- `five_number_summary_result`: fields `min`, `q1`, `median`, `q3`, `max`

Functions:

| Function | Description | Overloads |
|---|---|---|
| `interpolate_at` | Interpolate value at fractional index | + projection |
| `minimum` | Minimum value | + projection |
| `maximum` | Maximum value | + projection |
| `quartiles` | Quartiles (Q1, Q2, Q3) | + projection |
| `percentile` | Percentile at given proportion | + projection |
| `five_number_summary` | Five-number summary | + projection |
| `weighted_median` | Weighted median | + projection |
| `weighted_percentile` | Weighted percentile | + projection |
Statistics characterizing distribution shape.
Functions:
| Function | Description | Overloads |
|---|---|---|
| `population_skewness` | Population skewness | + precomputed mean, + projection |
| `sample_skewness` | Sample skewness | + precomputed mean, + projection |
| `skewness` | Skewness (defaults to sample) | + precomputed mean, + projection |
| `population_kurtosis` | Population kurtosis | + precomputed mean, + projection |
| `sample_kurtosis` | Sample kurtosis | + precomputed mean, + projection |
| `kurtosis` | Kurtosis (defaults to sample) | + precomputed mean, + projection |
Functions for measuring relationships between two variables.
Functions:
| Function | Description | Overloads |
|---|---|---|
| `population_covariance` | Population covariance | + precomputed means, + projection |
| `sample_covariance` | Sample covariance | + precomputed means, + projection |
| `covariance` | Covariance (defaults to sample) | + precomputed means, + projection |
| `pearson_correlation` | Pearson correlation coefficient | + precomputed means, + projection |
| `spearman_correlation` | Spearman rank correlation | + projection |
| `kendall_tau` | Kendall's rank correlation | + projection |
| `weighted_covariance` | Weighted covariance | + projection |

Note: `weighted_covariance` assumes frequency weights (repeat counts). The Bessel correction formula `W / (W² − Σwᵢ²)` is designed for this weight type and reduces to `n/(n−1)` when all weights equal 1. For precision weights (inverse-variance) or reliability weights, a different correction is required.
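
To make the correction factor concrete, the following is a minimal sketch that applies `W / (W² − Σwᵢ²)` to the weighted sum of cross-products. The function name `weighted_covariance_sketch` and the vector-based signature are assumptions for illustration and are not the statcpp API. With unit weights the factor equals `1/(n−1)`, which is `n/(n−1)` times the population normalization `1/n`, so the result matches the ordinary sample covariance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the statcpp implementation.
// Weighted covariance using the factor W / (W^2 - sum(w_i^2)),
// where W is the sum of the weights.
double weighted_covariance_sketch(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  const std::vector<double>& w) {
    assert(x.size() == y.size() && x.size() == w.size() && x.size() >= 2);
    double W = 0.0, W2 = 0.0, mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        W += w[i];
        W2 += w[i] * w[i];
        mx += w[i] * x[i];
        my += w[i] * y[i];
    }
    mx /= W;  // weighted mean of x
    my /= W;  // weighted mean of y
    double cross = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        cross += w[i] * (x[i] - mx) * (y[i] - my);
    }
    return cross * W / (W * W - W2);
}
```

For `x = {1, 2, 3, 4}`, `y = {2, 4, 5, 9}`, and all weights equal to 1, the sum of cross-products is 11 and the sketch returns 11/3, the same value as the unweighted sample covariance.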
Creating frequency distributions and histograms.
Structs:
- `frequency_entry<T>`: fields `value`, `count`, `relative_frequency`, `cumulative_count`, `cumulative_relative_frequency`
- `frequency_table_result<T>`: fields `entries`, `total_count`

Functions:

| Function | Description | Overloads |
|---|---|---|
| `frequency_table` | Frequency distribution table | + projection |
| `frequency_count` | Frequency count map | + projection |
| `relative_frequency` | Relative frequency map | + projection |
| `cumulative_frequency` | Cumulative frequency | + projection |
| `cumulative_relative_frequency` | Cumulative relative frequency | + projection |
Special functions used in statistical computations.
Constants:
`pi`, `sqrt_2`, `sqrt_2_pi`, `log_sqrt_2_pi`

Functions:

| Function | Description |
|---|---|
| `lgamma` | Log-gamma function |
| `tgamma` | Gamma function |
| `beta` | Beta function |
| `lbeta` | Log-beta function |
| `betainc` | Regularized incomplete beta function |
| `betaincinv` | Inverse of regularized incomplete beta function |
| `erf` | Error function |
| `erfc` | Complementary error function |
| `norm_cdf` | Standard normal CDF |
| `norm_quantile` | Standard normal quantile (inverse CDF) |
| `gammainc_lower` | Regularized lower incomplete gamma function |
| `gammainc_upper` | Regularized upper incomplete gamma function |
| `gammainc_lower_inv` | Inverse of regularized lower incomplete gamma function |
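
Several of these functions are linked by standard identities. As a rough illustration, the sketch below expresses the standard normal CDF through `std::erfc` and the log-beta function through `std::lgamma`. The `_sketch` names are hypothetical stand-ins, and statcpp's own implementations may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a standard normal CDF.
// Phi(x) = erfc(-x / sqrt(2)) / 2; using erfc keeps precision in the lower tail.
double norm_cdf_sketch(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Hypothetical stand-in for a log-beta function.
// log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
double lbeta_sketch(double a, double b) {
    assert(a > 0.0 && b > 0.0);
    return std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
}
```

For example, `norm_cdf_sketch(0.0)` is 0.5, and `lbeta_sketch(2.0, 3.0)` equals `log(1/12)` because B(2, 3) = 1!·2!/4!.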
Random number generation engine.
Type Aliases:
`default_random_engine` = `std::mt19937_64`

Traits:

- `is_random_engine<T>`: type trait for random engine detection
- `is_random_engine_v<T>`: variable template shortcut

Functions:

| Function | Description |
|---|---|
| `get_random_engine` | Get thread-local default random engine |
| `set_seed` | Set random seed |
| `randomize_seed` | Randomize seed from hardware entropy |
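
A thread-local engine with explicit seeding can be sketched as follows. This only illustrates the general pattern with the standard `<random>` facilities. All `_sketch` names are hypothetical, and the actual statcpp signatures should be taken from the Doxygen output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical accessor: one engine per thread, default-seeded until reseeded.
inline std::mt19937_64& engine_sketch() {
    thread_local std::mt19937_64 engine;
    return engine;
}

// Reseed the calling thread's engine for reproducible runs.
inline void set_seed_sketch(std::uint64_t seed) { engine_sketch().seed(seed); }

// Reseed the calling thread's engine from std::random_device.
inline void randomize_seed_sketch() {
    std::random_device rd;
    engine_sketch().seed(rd());
}

// Draw an index uniformly from [0, n) using the thread-local engine.
inline std::size_t random_index_sketch(std::size_t n) {
    assert(n > 0);
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    return dist(engine_sketch());
}
```

Calling `set_seed_sketch` with the same value before two runs yields the same sequence of draws on that thread, which is the usual reason for exposing a seed function next to a shared engine.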
PDF, CDF, quantile functions, and random number generation for continuous probability distributions.
Each distribution provides `_pdf`, `_cdf`, `_quantile`, and `_rand` functions, except the studentized range distribution, which provides only CDF and quantile functions. The Cauchy distribution is also included.

Functions:

| Distribution | pdf | cdf | quantile | rand |
|---|---|---|---|---|
| Uniform | `uniform_pdf` | `uniform_cdf` | `uniform_quantile` | `uniform_rand` |
| Normal | `normal_pdf` | `normal_cdf` | `normal_quantile` | `normal_rand` |
| Exponential | `exponential_pdf` | `exponential_cdf` | `exponential_quantile` | `exponential_rand` |
| Gamma | `gamma_pdf` | `gamma_cdf` | `gamma_quantile` | `gamma_rand` |
| Beta | `beta_pdf` | `beta_cdf` | `beta_quantile` | `beta_rand` |
| Chi-squared | `chisq_pdf` | `chisq_cdf` | `chisq_quantile` | `chisq_rand` |
| t | `t_pdf` | `t_cdf` | `t_quantile` | `t_rand` |
| F | `f_pdf` | `f_cdf` | `f_quantile` | `f_rand` |
| Log-normal | `lognormal_pdf` | `lognormal_cdf` | `lognormal_quantile` | `lognormal_rand` |
| Weibull | `weibull_pdf` | `weibull_cdf` | `weibull_quantile` | `weibull_rand` |
| Studentized range | n/a | `studentized_range_cdf` | `studentized_range_quantile` | n/a |
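
The relationship between the `_pdf`, `_cdf`, and `_quantile` functions of one distribution can be illustrated with the exponential case, where all three have closed forms. The sketch below assumes a rate parameterization (`lambda > 0`) and uses hypothetical `_sketch` names. The parameter order and conventions of the statcpp functions are not confirmed here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, assuming a rate parameter lambda > 0.

// Density: lambda * exp(-lambda * x) for x >= 0, otherwise 0.
double exponential_pdf_sketch(double x, double lambda) {
    assert(lambda > 0.0);
    return x < 0.0 ? 0.0 : lambda * std::exp(-lambda * x);
}

// CDF: 1 - exp(-lambda * x); expm1 keeps precision for small x.
double exponential_cdf_sketch(double x, double lambda) {
    assert(lambda > 0.0);
    return x < 0.0 ? 0.0 : -std::expm1(-lambda * x);
}

// Quantile (inverse CDF): -log(1 - p) / lambda for p in [0, 1).
double exponential_quantile_sketch(double p, double lambda) {
    assert(lambda > 0.0 && p >= 0.0 && p < 1.0);
    return -std::log1p(-p) / lambda;
}
```

The quantile inverts the CDF, so `exponential_cdf_sketch(exponential_quantile_sketch(p, lambda), lambda)` returns `p` up to rounding.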
PMF, CDF, quantile functions, and random number generation for discrete probability distributions.
Each distribution provides `_pmf`, `_cdf`, `_quantile`, and `_rand` functions.

Utility Functions:

| Function | Description |
|---|---|
| `log_factorial` | Log of factorial |
| `log_binomial_coef` | Log of binomial coefficient |
| `binomial_coef` | Binomial coefficient |

Distribution Functions:

| Distribution | pmf | cdf | quantile | rand |
|---|---|---|---|---|
| Binomial | `binomial_pmf` | `binomial_cdf` | `binomial_quantile` | `binomial_rand` |
| Poisson | `poisson_pmf` | `poisson_cdf` | `poisson_quantile` | `poisson_rand` |
| Geometric | `geometric_pmf` | `geometric_cdf` | `geometric_quantile` | `geometric_rand` |
| Hypergeometric | `hypergeom_pmf` | `hypergeom_cdf` | `hypergeom_quantile` | `hypergeom_rand` |
| Negative binomial | `nbinom_pmf` | `nbinom_cdf` | `nbinom_quantile` | `nbinom_rand` |
| Bernoulli | `bernoulli_pmf` | `bernoulli_cdf` | `bernoulli_quantile` | `bernoulli_rand` |
| Discrete uniform | `discrete_uniform_pmf` | `discrete_uniform_cdf` | `discrete_uniform_quantile` | `discrete_uniform_rand` |
Statistical estimation (confidence intervals, sample size calculation).
Structs:
- `confidence_interval`: fields `lower`, `upper`, `point_estimate`, `confidence_level`

Functions:

| Function | Description | Overloads |
|---|---|---|
| `standard_error` | Standard error of the mean | + precomputed stddev, + projection |
| `ci_mean` | Confidence interval for mean (t-distribution) | 2 overloads |
| `ci_mean_z` | Confidence interval for mean (z, known variance) | |
| `ci_proportion` | CI for proportion (Wald method) | |
| `ci_proportion_wilson` | CI for proportion (Wilson method) | |
| `ci_variance` | CI for variance (chi-squared based) | |
| `ci_mean_diff` | CI for difference of means | |
| `ci_mean_diff_welch` | CI for difference of means (Welch) | |
| `ci_mean_diff_pooled` | CI for difference of means (pooled) | |
| `ci_proportion_diff` | CI for difference of proportions | |
| `margin_of_error_mean` | Margin of error for mean | 2 overloads |
| `margin_of_error_proportion` | Margin of error for proportion | |
| `margin_of_error_proportion_worst_case` | Worst-case margin of error | |
| `sample_size_for_moe_proportion` | Sample size for desired MOE (proportion) | |
| `sample_size_for_moe_mean` | Sample size for desired MOE (mean) | |
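
The difference between the Wald and Wilson intervals for a proportion is easiest to see at the boundary. The sketch below implements both textbook formulas with hypothetical names (`interval_sketch`, `wald_ci_sketch`, `wilson_ci_sketch`) and takes the normal critical value `z` as an argument. It does not reproduce the statcpp signatures or return type.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for this sketch only.
struct interval_sketch {
    double lower;
    double upper;
};

// Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
interval_sketch wald_ci_sketch(double successes, double n, double z) {
    assert(n > 0.0 && successes >= 0.0 && successes <= n);
    const double p = successes / n;
    const double half = z * std::sqrt(p * (1.0 - p) / n);
    return {p - half, p + half};
}

// Wilson score interval: recentred and rescaled by 1 + z^2 / n.
interval_sketch wilson_ci_sketch(double successes, double n, double z) {
    assert(n > 0.0 && successes >= 0.0 && successes <= n);
    const double p = successes / n;
    const double z2 = z * z;
    const double denom = 1.0 + z2 / n;
    const double center = (p + z2 / (2.0 * n)) / denom;
    const double half =
        z * std::sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom;
    return {center - half, center + half};
}
```

With 0 successes out of 10 and `z = 1.96`, the Wald interval collapses to the single point 0, while the Wilson interval still has positive width, with an upper bound of roughly 0.28.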
Parametric hypothesis tests.
Enums:
- `alternative_hypothesis`: values `two_sided`, `less`, `greater`

Structs:

- `test_result`: fields `statistic`, `p_value`, `df`, `alternative`, `df2`

Functions:

| Function | Description |
|---|---|
| `z_test` | One-sample z-test (known variance) |
| `z_test_proportion` | One-sample proportion z-test |
| `z_test_proportion_two_sample` | Two-sample proportion z-test |
| `t_test` | One-sample t-test |
| `t_test_two_sample` | Two-sample t-test (pooled variance) |
| `t_test_welch` | Two-sample t-test (Welch method) |
| `t_test_paired` | Paired t-test |
| `chisq_test_gof` | Chi-squared goodness-of-fit test |
| `chisq_test_gof_uniform` | Chi-squared GOF (uniform expected) |
| `chisq_test_independence` | Chi-squared test of independence |
| `f_test` | F-test for equality of variances |
| `bonferroni_correction` | Bonferroni p-value correction |
| `benjamini_hochberg_correction` | Benjamini-Hochberg FDR correction |
| `holm_correction` | Holm step-down correction |
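
The p-value corrections listed last differ in how aggressively they adjust. As a hedged illustration, the sketch below computes Bonferroni and Holm adjusted p-values with the textbook definitions. The `_sketch` names and the vector-in, vector-out interface are assumptions, and the statcpp functions may take different arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Bonferroni: multiply every p-value by the number of tests m, capped at 1.
std::vector<double> bonferroni_sketch(const std::vector<double>& p) {
    const double m = static_cast<double>(p.size());
    std::vector<double> out(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i] >= 0.0 && p[i] <= 1.0);
        out[i] = std::min(1.0, m * p[i]);
    }
    return out;
}

// Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is multiplied by
// (m - k); a running maximum keeps the adjusted values monotone.
std::vector<double> holm_sketch(const std::vector<double>& p) {
    const std::size_t m = p.size();
    std::vector<std::size_t> order(m);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&p](std::size_t a, std::size_t b) { return p[a] < p[b]; });
    std::vector<double> out(m);
    double running_max = 0.0;
    for (std::size_t k = 0; k < m; ++k) {
        const double adj =
            std::min(1.0, static_cast<double>(m - k) * p[order[k]]);
        running_max = std::max(running_max, adj);
        out[order[k]] = running_max;  // write back in the original order
    }
    return out;
}
```

For the p-values `{0.01, 0.04, 0.03}`, Bonferroni gives `{0.03, 0.12, 0.09}` and Holm gives `{0.03, 0.06, 0.06}`, so Holm is never more conservative than Bonferroni.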
Nonparametric hypothesis tests.
Functions:
| Function | Description |
|---|---|
| `compute_ranks_with_ties` | Compute ranks with tie handling |
| `compute_tie_groups` | Compute tie group information |
| `shapiro_wilk_test` | Shapiro-Wilk normality test |
| `lilliefors_test` | Lilliefors normality test |
| `ks_test_normal` | Deprecated; use `lilliefors_test` |
| `levene_test` | Levene's test for homogeneity of variance |
| `bartlett_test` | Bartlett's test for homogeneity of variance |
| `wilcoxon_signed_rank_test` | Wilcoxon signed-rank test |
| `mann_whitney_u_test` | Mann-Whitney U test |
| `kruskal_wallis_test` | Kruskal-Wallis test |
| `fisher_exact_test` | Fisher's exact test (2x2 table) |

Note (tie detection): Rank-based functions (`compute_ranks_with_ties`, `wilcoxon_signed_rank_test`, `mann_whitney_u_test`, `kruskal_wallis_test`, `spearman_correlation`, `kendall_tau`) use exact floating-point equality (`==`) for tie detection. This is appropriate for observed data (integers, fixed-precision decimals) and consistent with R's behavior. If input values are the result of floating-point arithmetic, mathematically equal values may differ in their last bits and not be recognized as ties. Round or quantize such data before passing it to these functions.

Note (Lilliefors test): `lilliefors_test` uses an asymptotic approximation for p-value calculation, which may be imprecise for small samples (n < 20) or in extreme tail regions. For small samples, consider using `shapiro_wilk_test` as an alternative.
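
The tie-detection caveat can be reproduced with a small stand-alone ranking routine. The sketch below assigns average ranks using exact `==` comparison, as the note describes, and shows how quantizing the input restores an expected tie. The names `average_ranks_sketch` and `quantize_sketch` are hypothetical, and the routine is not the statcpp implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Average (mid) ranks, 1-based, with ties detected by exact equality.
std::vector<double> average_ranks_sketch(const std::vector<double>& v) {
    const std::size_t n = v.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    std::vector<double> ranks(n);
    std::size_t i = 0;
    while (i < n) {
        std::size_t j = i;  // extend j over the run of values equal to v[order[i]]
        while (j + 1 < n && v[order[j + 1]] == v[order[i]]) ++j;
        // Mean of the 1-based ranks i+1 .. j+1.
        const double avg = static_cast<double>(i + j) / 2.0 + 1.0;
        for (std::size_t k = i; k <= j; ++k) ranks[order[k]] = avg;
        i = j + 1;
    }
    return ranks;
}

// Snap a value to a grid of the given step so near-equal values compare equal.
double quantize_sketch(double x, double step) {
    assert(step > 0.0);
    return std::round(x / step) * step;
}
```

In double precision `0.1 + 0.2` is not equal to `0.3`, so ranking `{0.3, 0.1 + 0.2, 1.0}` yields three distinct ranks. After passing each value through `quantize_sketch(x, 1e-9)`, the first two values compare equal and share the average rank 1.5.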
Effect size calculations.
Enums:
- `effect_size_magnitude`: values `negligible`, `small`, `medium`, `large`

Functions:

| Function | Description |
|---|---|
| `cohens_d` | Cohen's d (one-sample, 3 overloads) |
| `cohens_d_two_sample` | Cohen's d (two-sample, pooled SD) |
| `hedges_correction_factor` | Hedges' g bias correction factor |
| `hedges_g` | Hedges' g (one-sample, bias-corrected) |
| `hedges_g_two_sample` | Hedges' g (two-sample, bias-corrected) |
| `glass_delta` | Glass's delta (control group SD) |
| `t_to_r` | Convert t-value to correlation |
| `d_to_r` | Convert Cohen's d to correlation |
| `r_to_d` | Convert correlation to Cohen's d |
| `eta_squared` | Eta squared from F-test |
| `partial_eta_squared` | Partial eta squared |
| `omega_squared` | Omega squared (less biased) |
| `cohens_h` | Cohen's h for proportions |
| `odds_ratio` | Odds ratio (2x2 table) |
| `risk_ratio` | Risk ratio (2x2 table) |
| `interpret_cohens_d` | Interpret Cohen's d magnitude |
| `interpret_correlation` | Interpret correlation magnitude |
| `interpret_eta_squared` | Interpret eta squared magnitude |
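
To show how the two-sample effect sizes relate, here is a minimal sketch of Cohen's d with a pooled standard deviation and of Hedges' g using the common approximation `J = 1 − 3/(4·df − 1)` with `df = n1 + n2 − 2`. All names are hypothetical, and whether statcpp uses this approximation or the exact gamma-function form of the correction factor is not stated in this overview.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean of a non-empty sample.
static double mean_sketch(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

// Sum of squared deviations from the mean m.
static double ss_sketch(const std::vector<double>& v, double m) {
    double s = 0.0;
    for (double x : v) s += (x - m) * (x - m);
    return s;
}

// Cohen's d: mean difference divided by the pooled standard deviation.
double cohens_d_two_sample_sketch(const std::vector<double>& a,
                                  const std::vector<double>& b) {
    assert(a.size() >= 2 && b.size() >= 2);
    const double ma = mean_sketch(a), mb = mean_sketch(b);
    const double df = static_cast<double>(a.size() + b.size() - 2);
    const double pooled_sd = std::sqrt((ss_sketch(a, ma) + ss_sketch(b, mb)) / df);
    return (ma - mb) / pooled_sd;
}

// Hedges' g: Cohen's d times the approximate small-sample correction J.
double hedges_g_two_sample_sketch(const std::vector<double>& a,
                                  const std::vector<double>& b) {
    const double df = static_cast<double>(a.size() + b.size() - 2);
    const double J = 1.0 - 3.0 / (4.0 * df - 1.0);
    return J * cohens_d_two_sample_sketch(a, b);
}
```

For `a = {2, 4, 6}` and `b = {1, 3, 5}`, the pooled standard deviation is 2, so d is 0.5, and with `df = 4` the correction factor is 0.8, giving g = 0.4.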
Resampling methods.
Structs:
- `bootstrap_result`: fields `estimate`, `standard_error`, `ci_lower`, `ci_upper`, `bias`, `replicates`
- `permutation_result`: fields `observed_statistic`, `p_value`, `n_permutations`, `permutation_distribution`

Functions:

| Function | Description |
|---|---|
| `bootstrap_sample` | Generate a bootstrap sample (2 overloads) |
| `bootstrap` | Bootstrap estimation with custom statistic |
| `bootstrap_mean` | Bootstrap for mean |
| `bootstrap_median` | Bootstrap for median |
| `bootstrap_stddev` | Bootstrap for standard deviation |
| `bootstrap_bca` | BCa bootstrap confidence interval |
| `permutation_test_two_sample` | Two-sample permutation test |
| `permutation_test_paired` | Paired permutation test |
| `permutation_test_correlation` | Correlation permutation test |
Power analysis and sample size calculation.
Structs:
- `power_result`: fields `power`, `sample_size`, `effect_size`, `alpha`

Functions (each has string and `alternative_hypothesis` enum overloads):

| Function | Description |
|---|---|
| `power_t_test_one_sample` | Power for one-sample t-test |
| `sample_size_t_test_one_sample` | Sample size for one-sample t-test |
| `power_t_test_two_sample` | Power for two-sample t-test |
| `sample_size_t_test_two_sample` | Sample size for two-sample t-test |
| `power_prop_test` | Power for proportion test |
| `sample_size_prop_test` | Sample size for proportion test |
| `power_analysis_t_one_sample` | Power analysis returning `power_result` |
| `power_analysis_t_one_sample_n` | Sample size analysis returning `power_result` |

Note: The t-test power and sample size functions use a normal distribution approximation. For large samples (n > 30), the accuracy is sufficient. For small samples or small effect sizes, the results may slightly overestimate power compared with the exact noncentral t-distribution. For more precise calculations, consider specialized software such as R's `pwr` package or G*Power.
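
One common form of the normal approximation for a two-sided one-sample test is `power ≈ Φ(d·√n − z) + Φ(−d·√n − z)`, where `d` is the standardized effect size and `z` is the upper α/2 normal quantile. The sketch below implements that formula with hypothetical names. It is an assumption that statcpp uses this exact expression, and the critical value is passed in by the caller rather than computed.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF via erfc (helper for this sketch).
double phi_sketch(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Normal-approximation power of a two-sided one-sample test.
//   d      : standardized effect size (mean shift divided by standard deviation)
//   n      : sample size
//   z_crit : upper alpha/2 standard normal quantile (about 1.959964 for alpha = 0.05)
double power_normal_approx_sketch(double d, double n, double z_crit) {
    assert(n > 0.0 && z_crit > 0.0);
    const double shift = d * std::sqrt(n);
    return phi_sketch(shift - z_crit) + phi_sketch(-shift - z_crit);
}
```

With `d = 0` the expression reduces to the significance level (0.05 for `z_crit ≈ 1.959964`), which is a quick sanity check. With `d = 0.5` and `n = 32` it gives a power of roughly 0.81.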
Linear regression analysis.
Structs:
- `simple_regression_result`: fields `intercept`, `slope`, `intercept_se`, `slope_se`, `intercept_t`, `slope_t`, `intercept_p`, `slope_p`, `r_squared`, `adj_r_squared`, `residual_se`, `f_statistic`, `f_p_value`, `df_regression`, `df_residual`, `ss_total`, `ss_regression`, `ss_residual`
- `multiple_regression_result`: fields `coefficients`, `coefficient_se`, `t_statistics`, `p_values`, `r_squared`, `adj_r_squared`, `residual_se`, `f_statistic`, `f_p_value`, `df_regression`, `df_residual`, `ss_total`, `ss_regression`, `ss_residual`
- `prediction_interval`: fields `prediction`, `lower`, `upper`, `se_prediction`
- `residual_diagnostics`: fields `residuals`, `standardized_residuals`, `studentized_residuals`, `hat_values`, `cooks_distance`, `durbin_watson`

Functions:

| Function | Description |
|---|---|
| `simple_linear_regression` | Simple linear regression |
| `multiple_linear_regression` | Multiple linear regression |
| `predict` | Prediction (2 overloads: simple, multiple) |
| `prediction_interval_simple` | Prediction interval for simple regression |
| `confidence_interval_mean` | CI for mean response in simple regression |
| `compute_residual_diagnostics` | Residual diagnostics (2 overloads) |
| `compute_vif` | Variance inflation factor |
| `correlation_matrix_determinant` | Determinant of correlation matrix |
| `multicollinearity_score` | Multicollinearity assessment score |
| `r_squared` | Coefficient of determination |
| `adjusted_r_squared` | Adjusted R-squared |
Analysis of variance.
Structs:
- `anova_row`: fields `source`, `ss`, `df`, `ms`, `f_statistic`, `p_value`
- `one_way_anova_result`: fields `between`, `within`, `ss_total`, `df_total`, `n_groups`, `n_total`, `grand_mean`, `group_means`, `group_sizes`
- `two_way_anova_result`: fields `factor_a`, `factor_b`, `interaction`, `error`, `ss_total`, `df_total`, `levels_a`, `levels_b`, `n_total`, `grand_mean`
- `posthoc_comparison`: fields `group1`, `group2`, `mean_diff`, `se`, `statistic`, `p_value`, `lower`, `upper`, `significant`
- `posthoc_result`: fields `method`, `comparisons`, `alpha`, `mse`, `df_error`
- `ancova_result`: fields `ss_covariate`, `ss_treatment`, `ss_error`, `df_covariate`, `df_treatment`, `df_error`, `ms_covariate`, `ms_treatment`, `ms_error`, `f_covariate`, `f_treatment`, `p_covariate`, `p_treatment`, `adjusted_means`

Functions:

| Function | Description |
|---|---|
| `one_way_anova` | One-way ANOVA |
| `two_way_anova` | Two-way ANOVA |
| `tukey_hsd` | Tukey HSD post-hoc test |
| `bonferroni_posthoc` | Bonferroni post-hoc test |
| `dunnett_posthoc` | Dunnett post-hoc test |
| `scheffe_posthoc` | Scheffe post-hoc test |
| `one_way_ancova` | One-way ANCOVA |
| `eta_squared` | Eta squared from ANOVA result |
| `partial_eta_squared_a` | Partial eta squared for factor A |
| `partial_eta_squared_b` | Partial eta squared for factor B |
| `partial_eta_squared_interaction` | Partial eta squared for interaction |
| `omega_squared` | Omega squared from ANOVA result |
| `cohens_f` | Cohen's f from ANOVA result |
Generalized linear models.
Enums:
- `link_function`: values `identity`, `logit`, `probit`, `log`, `inverse`, `cloglog`
- `distribution_family`: values `gaussian`, `binomial`, `poisson`, `gamma_family`

Structs:

- `glm_result`: fields `coefficients`, `coefficient_se`, `z_statistics`, `p_values`, `null_deviance`, `residual_deviance`, `df_null`, `df_residual`, `aic`, `bic`, `log_likelihood`, `iterations`, `converged`, `link`, `family`
- `glm_residuals`: fields `response`, `pearson`, `deviance`, `working`

Functions:

| Function | Description |
|---|---|
| `glm_fit` | General GLM fitting (IRLS algorithm) |
| `logistic_regression` | Logistic regression (binomial/logit) |
| `predict_probability` | Predict probabilities from GLM |
| `odds_ratios` | Odds ratios from logistic regression |
| `odds_ratios_ci` | Odds ratios with confidence interval |
| `poisson_regression` | Poisson regression (poisson/log) |
| `predict_count` | Predict counts from Poisson model |
| `incidence_rate_ratios` | IRR from Poisson regression |
| `compute_glm_residuals` | GLM residual analysis |
| `overdispersion_test` | Test for overdispersion |
| `pseudo_r_squared_mcfadden` | McFadden pseudo R-squared |
| `pseudo_r_squared_nagelkerke` | Nagelkerke pseudo R-squared |
Model selection and regularized regression.
Structs:
- `cv_result`: fields `mean_error`, `se_error`, `fold_errors`, `n_folds`
- `regularized_regression_result`: fields `coefficients`, `lambda`, `mse`, `iterations`, `converged`

Functions:

| Function | Description |
|---|---|
| `aic` | Akaike information criterion |
| `aic_linear` | AIC for linear regression (2 overloads) |
| `aicc` | Corrected AIC |
| `bic` | Bayesian information criterion |
| `bic_linear` | BIC for linear regression (2 overloads) |
| `press_statistic` | PRESS statistic |
| `create_cv_folds` | Create cross-validation folds |
| `cross_validate_linear` | Cross-validate linear regression |
| `loocv_linear` | Leave-one-out cross-validation |
| `ridge_regression` | Ridge regression |
| `lasso_regression` | Lasso regression |
| `elastic_net_regression` | Elastic net regression |
| `cv_ridge` | Cross-validated ridge regression |
| `cv_lasso` | Cross-validated lasso regression |
| `generate_lambda_grid` | Generate regularization parameter grid |
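
The information criteria have short closed forms, sketched here from their standard definitions: `AIC = 2k − 2·ln L`, `AICc = AIC + 2k(k+1)/(n − k − 1)`, and `BIC = k·ln n − 2·ln L`, where `k` is the number of estimated parameters and `n` the number of observations. The `_sketch` names and argument order are assumptions and may not match the statcpp signatures.

```cpp
#include <cassert>
#include <cmath>

// AIC = 2k - 2 * log-likelihood.
double aic_sketch(double log_likelihood, double k) {
    return 2.0 * k - 2.0 * log_likelihood;
}

// AICc adds a small-sample penalty; requires n > k + 1.
double aicc_sketch(double log_likelihood, double k, double n) {
    assert(n - k - 1.0 > 0.0);
    return aic_sketch(log_likelihood, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0);
}

// BIC = k * ln(n) - 2 * log-likelihood.
double bic_sketch(double log_likelihood, double k, double n) {
    assert(n > 0.0);
    return k * std::log(n) - 2.0 * log_likelihood;
}
```

With a log-likelihood of −10, `k = 3`, and `n = 10`, this gives AIC = 26, AICc = 30, and BIC = 3·ln 10 + 20. Lower values indicate the preferred model under each criterion.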
Distance and similarity calculations.
Functions:
| Function | Description | Overloads |
|---|---|---|
| `euclidean_distance` | Euclidean distance | iterator, vector |
| `manhattan_distance` | Manhattan distance | iterator, vector |
| `cosine_similarity` | Cosine similarity | iterator, vector |
| `cosine_distance` | Cosine distance (1 - similarity) | iterator, vector |
| `mahalanobis_distance` | Mahalanobis distance | |
| `minkowski_distance` | Minkowski distance | iterator, vector |
| `chebyshev_distance` | Chebyshev distance | iterator, vector |
Utility functions for numerical computations.
Constants:
- `epsilon`: machine epsilon for `double`
- `default_rel_tol`: default relative tolerance (1e-9)
- `default_abs_tol`: default absolute tolerance (1e-12)

Functions:

| Function | Description |
|---|---|
| `approx_equal` | Approximate equality for floating-point numbers |
| `is_zero` | Check if value is approximately zero |
| `is_finite` | Check if value is finite |
| `all_finite` | Check if all values in range are finite |
| `has_converged_abs` | Absolute convergence check |
| `has_converged_rel` | Relative convergence check |
| `has_converged` | Combined convergence check |
| `log1p_safe` | Numerically stable log(1 + x) |
| `expm1_safe` | Numerically stable exp(x) - 1 |
| `clamp` | Clamp value to range |
| `in_range` | Check if value is in range |
| `relative_error` | Relative error between values |
| `safe_divide` | Safe division (avoids divide by zero) |
| `kahan_sum` | Kahan summation (2 overloads) |
| `approx_equal_range` | Approximate equality for ranges |
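
Two of these utilities are worth illustrating. The sketch below shows a tolerance-based comparison that combines the relative and absolute tolerances listed above, and a textbook Kahan (compensated) summation. The names are hypothetical, and the rule for combining the two tolerances (taking the larger bound) is an assumption, not necessarily what `approx_equal` does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical comparison: equal if the difference is within the absolute
// tolerance or within the relative tolerance scaled by the larger magnitude.
bool approx_equal_sketch(double a, double b,
                         double rel_tol = 1e-9, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(abs_tol, rel_tol * scale);
}

// Kahan summation: carries a running compensation term for lost low-order bits.
// Note: aggressive floating-point optimizations (e.g. -ffast-math) can remove
// the compensation.
double kahan_sum_sketch(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double x : values) {
        const double y = x - compensation;  // apply the previous correction
        const double t = sum + y;           // low-order bits of y may be lost here
        compensation = (t - sum) - y;       // recover what was lost
        sum = t;
    }
    return sum;
}
```

As a usage example, `approx_equal_sketch(0.1 + 0.2, 0.3)` is true even though the two values differ in the last bit, and summing one million copies of 0.1 with `kahan_sum_sketch` stays much closer to 100000 than a plain accumulation loop.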
Multivariate analysis functions.
Structs:
- `pca_result`: fields `components`, `explained_variance`, `explained_variance_ratio`

Functions:

| Function | Description |
|---|---|
| `covariance_matrix` | Covariance matrix |
| `correlation_matrix` | Correlation matrix |
| `standardize` | Standardize data (z-score) |
| `min_max_scale` | Min-max scaling |
| `power_iteration` | Power iteration for eigenvalue |
| `pca` | Principal component analysis |
| `pca_transform` | Transform data using PCA result |
Time series analysis functions.
Functions:
| Function | Description |
|---|---|
| `autocorrelation` | Autocorrelation at given lag |
| `acf` | Autocorrelation function (all lags) |
| `pacf` | Partial autocorrelation function |
| `mae` | Mean absolute error |
| `mse` | Mean squared error |
| `rmse` | Root mean squared error |
| `mape` | Mean absolute percentage error |
| `moving_average` | Simple moving average |
| `exponential_moving_average` | Exponential moving average |
| `diff` | First differences |
| `seasonal_diff` | Seasonal differences |
| `lag` | Lag operator |
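
As an illustration of two entries in this table, the sketch below computes the sample autocorrelation at a given lag and a simple moving average. It assumes the common estimator that divides the lagged cross-products by the full sum of squared deviations. The function names are hypothetical, and statcpp's normalization and output conventions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample autocorrelation at the given lag:
// sum_{t >= lag} (x_t - mean)(x_{t-lag} - mean) / sum_t (x_t - mean)^2.
double autocorrelation_sketch(const std::vector<double>& x, std::size_t lag) {
    const std::size_t n = x.size();
    assert(n > 1 && lag < n);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);
    double num = 0.0, den = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
        den += (x[t] - mean) * (x[t] - mean);
        if (t >= lag) num += (x[t] - mean) * (x[t - lag] - mean);
    }
    assert(den > 0.0);  // a constant series has no defined autocorrelation
    return num / den;
}

// Simple moving average over a sliding window; returns n - window + 1 values.
std::vector<double> moving_average_sketch(const std::vector<double>& x,
                                          std::size_t window) {
    assert(window > 0 && window <= x.size());
    std::vector<double> out;
    out.reserve(x.size() - window + 1);
    double sum = 0.0;
    for (std::size_t t = 0; t < x.size(); ++t) {
        sum += x[t];
        if (t >= window) sum -= x[t - window];  // drop the value leaving the window
        if (t + 1 >= window) out.push_back(sum / static_cast<double>(window));
    }
    return out;
}
```

For the series `{1, 2, 3, 4, 5}`, the lag-1 autocorrelation under this estimator is 4/10 = 0.4, and the window-3 moving average is `{2, 3, 4}`.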
Categorical data analysis.
Structs:
- `contingency_table_result`: fields `table`, `row_totals`, `col_totals`, `total`, `n_rows`, `n_cols`
- `odds_ratio_result`: fields `odds_ratio`, `log_odds_ratio`, `se_log_odds_ratio`, `ci_lower`, `ci_upper`
- `relative_risk_result`: fields `relative_risk`, `log_relative_risk`, `se_log_relative_risk`, `ci_lower`, `ci_upper`
- `risk_difference_result`: fields `risk_difference`, `se`, `ci_lower`, `ci_upper`

Functions:

| Function | Description |
|---|---|
| `contingency_table` | Create contingency table |
| `odds_ratio` | Odds ratio (table or 2x2 values) |
| `relative_risk` | Relative risk (table or 2x2 values) |
| `risk_difference` | Risk difference (table or 2x2 values) |
| `number_needed_to_treat` | Number needed to treat |
Survival analysis functions.
Structs:
- `kaplan_meier_result`: fields `times`, `survival`, `se`, `ci_lower`, `ci_upper`, `n_at_risk`, `n_events`, `n_censored`
- `logrank_result`: fields `statistic`, `p_value`, `df`, `expected1`, `expected2`, `observed1`, `observed2`
- `hazard_rate_result`: fields `times`, `hazard`, `cumulative_hazard`

Functions:

| Function | Description |
|---|---|
| `kaplan_meier` | Kaplan-Meier survival estimate |
| `logrank_test` | Log-rank test |
| `median_survival_time` | Median survival time |
| `nelson_aalen` | Nelson-Aalen cumulative hazard estimate |
Robust statistical methods.
Structs:
- `outlier_detection_result`: fields `outliers`, `outlier_indices`, `lower_fence`, `upper_fence`, `q1`, `q3`, `iqr_value`

Functions:

| Function | Description |
|---|---|
| `mad` | Median absolute deviation |
| `mad_scaled` | Scaled MAD (consistent estimator) |
| `detect_outliers_iqr` | Outlier detection via IQR method |
| `detect_outliers_zscore` | Outlier detection via z-score |
| `detect_outliers_modified_zscore` | Outlier detection via modified z-score |
| `winsorize` | Winsorize data |
| `cooks_distance` | Cook's distance |
| `dffits` | DFFITS influence measure |
| `hodges_lehmann` | Hodges-Lehmann estimator |
| `biweight_midvariance` | Biweight midvariance |
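
The median absolute deviation is a good example of why these estimators are called robust. The sketch below computes the MAD and a scaled MAD using the conventional constant 1.4826, which makes it a consistent estimator of the standard deviation for normally distributed data. The names are hypothetical, and it is an assumption that `mad_scaled` uses the same constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (takes a copy so the caller's data stays unsorted).
double median_sketch(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// MAD: median of the absolute deviations from the median.
double mad_sketch(const std::vector<double>& v) {
    const double med = median_sketch(v);
    std::vector<double> deviations;
    deviations.reserve(v.size());
    for (double x : v) deviations.push_back(std::fabs(x - med));
    return median_sketch(deviations);
}

// Scaled MAD with the conventional normal-consistency constant.
double mad_scaled_sketch(const std::vector<double>& v) {
    return 1.4826 * mad_sketch(v);
}
```

For `{1, 2, 3, 4, 100}`, the median is 3 and the absolute deviations are `{2, 1, 0, 1, 97}`, so the MAD is 1 regardless of how extreme the last value is.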
Clustering algorithms.
Enums:
- `linkage_type`: values `single`, `complete`, `average`, `ward`

Structs:

- `kmeans_result`: fields `labels`, `centroids`, `inertia`, `n_iter`
- `dendrogram_node`: fields `left`, `right`, `distance`, `count`

Functions:

| Function | Description |
|---|---|
| `euclidean_distance` | Euclidean distance (vector) |
| `manhattan_distance` | Manhattan distance (vector) |
| `kmeans_plusplus_init` | K-means++ initialization |
| `kmeans` | K-means clustering |
| `hierarchical_clustering` | Hierarchical clustering |
| `cut_dendrogram` | Cut dendrogram at k clusters |
| `silhouette_score` | Silhouette score |
Data transformation and preprocessing.
Constants:
- `NA`: NaN sentinel value

Structs:

- `group_result<K,V>`: fields `groups`
- `aggregation_result<K>`: fields `keys`, `values`
- `label_encoding_result<T>`: fields `encoded`, `mapping`, `classes`
- `validation_result`: fields `is_valid`, `n_missing`, `n_infinite`, `n_negative`, `missing_indices`, `infinite_indices`, `negative_indices`

Functions:

| Function | Description |
|---|---|
| `is_na` | Check if value is NA/NaN |
| `dropna` | Remove NaN values (2 overloads) |
| `fillna` | Fill NaN with constant |
| `fillna_mean` | Fill NaN with mean |
| `fillna_median` | Fill NaN with median |
| `fillna_ffill` | Forward fill NaN |
| `fillna_bfill` | Backward fill NaN |
| `fillna_interpolate` | Interpolate NaN values |
| `filter` | Filter elements by predicate |
| `filter_rows` | Filter rows of matrix |
| `filter_range` | Filter by value range |
| `log_transform` | Log transformation |
| `log1p_transform` | Log(1+x) transformation |
| `sqrt_transform` | Square root transformation |
| `boxcox_transform` | Box-Cox transformation |
| `rank_transform` | Rank transformation |
| `group_by` | Group by key function |
| `group_mean` | Mean by group |
| `group_sum` | Sum by group |
| `group_count` | Count by group |
| `sort_values` | Sort values |
| `argsort` | Indices that sort data |
| `sample_with_replacement` | Random sampling with replacement |
| `sample_without_replacement` | Random sampling without replacement |
| `stratified_sample` | Stratified random sampling |
| `drop_duplicates` | Remove duplicate values |
| `value_counts` | Count unique values |
| `get_duplicates` | Get duplicate values |
| `rolling_mean` | Rolling mean |
| `rolling_std` | Rolling standard deviation |
| `rolling_min` | Rolling minimum |
| `rolling_max` | Rolling maximum |
| `rolling_sum` | Rolling sum |
| `label_encode` | Label encoding |
| `one_hot_encode` | One-hot encoding |
| `bin_equal_width` | Equal-width binning |
| `bin_equal_freq` | Equal-frequency binning |
| `validate_data` | Data validation |
| `validate_range` | Range validation |
Advanced missing data handling.
Enums:
- `missing_mechanism`: values `mcar`, `mar`, `mnar`, `unknown`

Structs:

- `mcar_test_result`: fields `chi_square`, `p_value`, `df`, `is_mcar`, `interpretation`
- `missing_pattern_info`: fields `patterns`, `pattern_counts`, `missing_rates`, `overall_missing_rate`, `n_complete_cases`, `n_patterns`
- `multiple_imputation_result`: fields `imputed_datasets`, `m`, `pooled_means`, `pooled_vars`, `within_vars`, `between_vars`, `fraction_missing_info`
- `sensitivity_analysis_result`: fields `delta_values`, `estimated_means`, `estimated_vars`, `original_mean`, `original_var`, `interpretation`
- `tipping_point_result`: fields `tipping_point`, `found`, `threshold`, `interpretation`
- `complete_case_result`: fields `complete_data`, `n_complete`, `n_dropped`, `proportion_complete`

Functions:

| Function | Description |
|---|---|
| `analyze_missing_patterns` | Analyze missing data patterns |
| `create_missing_indicator` | Create missing indicator matrix |
| `test_mcar_simple` | Simple MCAR test |
| `diagnose_missing_mechanism` | Diagnose missing data mechanism |
| `impute_conditional_mean` | Conditional mean imputation |
| `multiple_imputation_pmm` | Multiple imputation (PMM) |
| `multiple_imputation_bootstrap` | Multiple imputation (bootstrap) |
| `sensitivity_analysis_pattern_mixture` | Pattern mixture sensitivity analysis |
| `sensitivity_analysis_selection_model` | Selection model sensitivity analysis |
| `find_tipping_point` | Tipping point analysis |
| `extract_complete_cases` | Extract complete cases |
| `correlation_matrix_pairwise` | Pairwise complete correlation matrix |
Convenience header that includes all modules. No additional functions are defined.

    #include "statcpp/statcpp.hpp"  // Includes everything

Most functions accept STL-style random access iterator pairs (first, last).

    std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
    double avg = statcpp::mean(data.begin(), data.end());

Note: Matrix-based functions (e.g., GLM, ANOVA with design matrices, multiple regression, covariance matrix) use `std::vector<std::vector<double>>` instead of iterator pairs.

Many functions accept a projection function, which allows direct computation on struct members and similar derived values.

    struct Point { double x, y; };
    std::vector<Point> points = {{1, 2}, {3, 4}, {5, 6}};
    // Mean of x coordinates
    double avg_x = statcpp::mean(points.begin(), points.end(),
                                 [](const Point& p) { return p.x; });

For invalid input (empty range, out-of-range parameters, etc.), `std::invalid_argument` is thrown.

    std::vector<double> empty;
    try {
        double avg = statcpp::mean(empty.begin(), empty.end());
    } catch (const std::invalid_argument& e) {
        std::cerr << e.what() << std::endl;  // "statcpp::mean: empty range"
    }

Example: basic and order statistics.

    #include "statcpp/basic_statistics.hpp"
    #include "statcpp/dispersion_spread.hpp"
    #include <vector>
    #include <algorithm>

    std::vector<double> data = {5, 2, 8, 1, 3, 7, 4};
    // Basic statistics
    double avg = statcpp::mean(data.begin(), data.end());
    double sd = statcpp::stddev(data.begin(), data.end());
    // Order statistics (sorting required)
    std::sort(data.begin(), data.end());
    double med = statcpp::median(data.begin(), data.end());
    auto q = statcpp::quartiles(data.begin(), data.end());

Example: two-sample t-test.

    #include "statcpp/parametric_tests.hpp"
    #include <iostream>
    #include <vector>

    std::vector<double> sample1 = {/* data */};
    std::vector<double> sample2 = {/* data */};
    // Two-sample t-test
    auto result = statcpp::t_test_two_sample(
        sample1.begin(), sample1.end(),
        sample2.begin(), sample2.end()
    );
    std::cout << "t-statistic: " << result.statistic << std::endl;
    std::cout << "p-value: " << result.p_value << std::endl;

Example: simple linear regression.

    #include "statcpp/linear_regression.hpp"
    #include <iostream>
    #include <vector>

    std::vector<double> x = {1, 2, 3, 4, 5};
    std::vector<double> y = {2, 4, 5, 4, 5};
    auto model = statcpp::simple_linear_regression(
        x.begin(), x.end(),
        y.begin()
    );
    std::cout << "Intercept: " << model.intercept << std::endl;
    std::cout << "Slope: " << model.slope << std::endl;
    std::cout << "R²: " << model.r_squared << std::endl;

- For practical code examples, see Examples
- For basic usage, see Usage Guide
- For detailed function specifications, refer to the Doxygen-generated documentation