feat: add t-test mode for statistical significance testing #133
Merged
2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # Statistical Significance Testing (T-Test) | ||
|
|
||
| This example demonstrates how to use Welch's t-test to determine if benchmark differences are statistically significant. | ||
|
|
||
| ## The Problem | ||
|
|
||
| When running benchmarks on shared or cloud environments, results can vary due to: | ||
| - CPU throttling | ||
| - Background processes | ||
| - Memory pressure | ||
| - Cache effects | ||
|
|
||
| A benchmark might show one implementation as "1.05x faster", but is that a real improvement or just noise? | ||
|
|
||
| ## The Solution | ||
|
|
||
| Enable t-test mode with `ttest: true`: | ||
|
|
||
| ```js | ||
| const { Suite } = require('bench-node'); | ||
|
|
||
| const suite = new Suite({ | ||
| ttest: true, // Automatically sets repeatSuite=30 | ||
| }); | ||
|
|
||
| suite.add('baseline', { baseline: true }, () => { | ||
| // ... | ||
| }); | ||
|
|
||
| suite.add('alternative', () => { | ||
| // ... | ||
| }); | ||
| ``` | ||
|
|
||
| When `ttest: true` is set, the suite automatically: | ||
| 1. Sets `repeatSuite=30` for all benchmarks (can be overridden) | ||
| 2. Runs Welch's t-test to compare results against baseline | ||
| 3. Displays significance stars in the output | ||
|
|
||
| ## Understanding the Output | ||
|
|
||
| The output will show significance stars next to comparisons: | ||
|
|
||
| ``` | ||
| Summary (vs. baseline): | ||
| baseline/for-loop (baseline) | ||
| forEach (1.80x slower) *** | ||
| for-of-loop (1.09x slower) *** | ||
| reduce (1.06x faster) ** | ||
|
|
||
| Significance: * p<0.05, ** p<0.01, *** p<0.001 | ||
| ``` | ||
|
|
||
| - `***` = p < 0.001 - If there were no real difference, a gap this large would show up in fewer than 0.1% of runs | ||
| - `**` = p < 0.01 - If there were no real difference, a gap this large would show up in fewer than 1% of runs | ||
| - `*` = p < 0.05 - If there were no real difference, a gap this large would show up in fewer than 5% of runs | ||
| - (no stars) = Not statistically significant - the difference may be noise | ||
|
|
||
| ## When to Use | ||
|
|
||
| 1. **Comparing similar implementations** - Is the "optimization" actually faster? | ||
| 2. **CI/CD pipelines** - Detect real regressions vs. flaky results | ||
| 3. **Cloud/shared environments** - High variance requires statistical validation | ||
| 4. **Small differences** - A 5% speedup could be real or just noise | ||
|
|
||
| ## Run the Example | ||
|
|
||
| ```bash | ||
| node --allow-natives-syntax node.js | ||
| ``` | ||
|
|
||
| ## Sample Output | ||
|
|
||
| ``` | ||
| baseline/for-loop x 85,009,221 ops/sec (311 runs sampled) | ||
| reduce x 89,853,937 ops/sec (321 runs sampled) | ||
| for-of-loop x 78,268,434 ops/sec (302 runs sampled) | ||
| forEach x 47,249,597 ops/sec (334 runs sampled) | ||
|
|
||
| Summary (vs. baseline): | ||
| baseline/for-loop (baseline) | ||
| forEach (1.80x slower) *** | ||
| for-of-loop (1.09x slower) *** | ||
| reduce (1.06x faster) ** | ||
|
|
||
| Significance: * p<0.05, ** p<0.01, *** p<0.001 | ||
| ``` |
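
The star legend and the ratios in the sample output above can be reproduced with a few lines of JavaScript. The following is a minimal sketch, not bench-node's actual reporter: `significanceStars` and `formatComparison` are hypothetical helper names, the ops/sec figures are copied from the sample output, and the p-values are made up for illustration because the output does not print them.

```js
// Hypothetical helpers for illustration; not part of bench-node's API.

// Map a p-value to the legend: * p<0.05, ** p<0.01, *** p<0.001.
function significanceStars(p) {
  if (p < 0.001) return '***';
  if (p < 0.01) return '**';
  if (p < 0.05) return '*';
  return ''; // not statistically significant
}

// Build one summary line from ops/sec figures and a p-value.
function formatComparison(name, opsSec, baselineOpsSec, p) {
  const faster = opsSec >= baselineOpsSec;
  // Always report the ratio as a number >= 1, then label the direction.
  const ratio = faster ? opsSec / baselineOpsSec : baselineOpsSec / opsSec;
  const line = `${name} (${ratio.toFixed(2)}x ${faster ? 'faster' : 'slower'})`;
  const stars = significanceStars(p);
  return stars ? `${line} ${stars}` : line;
}

// ops/sec from the sample output; p-values are assumed for illustration.
console.log(formatComparison('forEach', 47249597, 85009221, 0.0001));
// forEach (1.80x slower) ***
console.log(formatComparison('reduce', 89853937, 85009221, 0.005));
// reduce (1.06x faster) **
```

Reporting the ratio as a value of at least 1 with a "faster" or "slower" label matches the format of the summary lines shown in the README.
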
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| /** | ||
| * Statistical Significance Example | ||
| * | ||
| * This example demonstrates how to use Welch's t-test to determine | ||
| * if benchmark differences are statistically significant. | ||
| * | ||
| * When running benchmarks, especially on shared/cloud environments, | ||
| * small performance differences may just be random noise. The t-test | ||
| * helps identify when a difference is real vs. just variance. | ||
| * | ||
| * Run with: node --allow-natives-syntax node.js | ||
| */ | ||
|
|
||
| const { Suite } = require('../../lib'); | ||
|
|
||
| // Enable t-test mode - this automatically sets repeatSuite=30 for all benchmarks | ||
| const suite = new Suite({ | ||
| ttest: true, | ||
| }); | ||
|
|
||
| // Baseline: Simple array sum using for loop | ||
| suite.add('baseline/for-loop', { baseline: true }, () => { | ||
| const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | ||
| let sum = 0; | ||
| for (let i = 0; i < arr.length; i++) { | ||
| sum += arr[i]; | ||
| } | ||
| return sum; | ||
| }); | ||
|
|
||
| // Alternative 1: Using reduce (one callback call per element; measured 1.06x faster than the baseline in the README's sample run) | ||
| suite.add('reduce', () => { | ||
| const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | ||
| return arr.reduce((acc, val) => acc + val, 0); | ||
| }); | ||
|
|
||
| // Alternative 2: for-of loop (similar performance to for loop) | ||
| suite.add('for-of-loop', () => { | ||
| const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | ||
| let sum = 0; | ||
| for (const val of arr) { | ||
| sum += val; | ||
| } | ||
| return sum; | ||
| }); | ||
|
|
||
| // Alternative 3: forEach (slower due to function call per element) | ||
| suite.add('forEach', () => { | ||
| const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | ||
| let sum = 0; | ||
| arr.forEach((val) => { | ||
| sum += val; | ||
| }); | ||
| return sum; | ||
| }); | ||
|
|
||
| suite.run(); |
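
For readers unfamiliar with the statistic named in the file header, here is a minimal sketch of how Welch's t-test compares two sets of repeated measurements. It is an illustration under stated assumptions, not the code this PR adds to bench-node: `mean`, `sampleVariance`, and `welchTTest` are hypothetical names, and the input arrays are toy data. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value additionally requires the Student's t cumulative distribution function, which is omitted here.

```js
// Hypothetical helpers for illustration; not bench-node's implementation.

function mean(xs) {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t-test: does not assume the two samples share a variance.
function welchTTest(a, b) {
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation of the degrees of freedom.
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Toy data, not real benchmark samples.
const { t, df } = welchTTest([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(t.toFixed(3), df.toFixed(3)); // -1.897 5.882
```

With `ttest: true`, each benchmark would contribute one sample per suite repetition (30 by default according to the README), so the arrays passed to such a function would hold 30 values each rather than 5.
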