Predict_proba functionality to Random Forest Classifier by skywardfire1 · Pull Request #360 · smartcorelib/smartcore

skywardfire1 · 2026-03-16T11:57:32Z

Checklist

[yes] My branch is up-to-date with development branch.
[yes] Everything works and tested on latest stable Rust.
[yes] Coverage and Linting have been applied

This PR adds:

Proper probability estimation to Decision Tree Classifier, matching scikit-learn's predict_proba behavior.
Predict_proba functionality to Random Forest Classifier, also in scikit-learn style (the main goal of this PR).

Previously, probability predictions from Decision Tree returned one-hot encoded vectors (1.0 for the majority class, 0.0 for others), which did not reflect actual class distributions in leaf nodes.
While these one-hot vectors could still be averaged in Random Forest, that approach would not provide calibrated probability estimates.
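To make the difference concrete, here is a minimal sketch in plain Rust. It assumes a leaf keeps its per-class sample counts as a Vec<usize>, as the class_distribution field described in this PR does. The functions one_hot_proba and leaf_proba are hypothetical illustrations, not smartcore's API, and the uniform fallback for an empty leaf is an assumption of the sketch.

```rust
/// Old behaviour (sketch): a one-hot vector for the majority class of a leaf.
fn one_hot_proba(counts: &[usize]) -> Vec<f64> {
    let mut proba = vec![0.0; counts.len()];
    // max_by_key returns the last maximal element, so ties go to the highest index.
    if let Some((majority, _)) = counts.iter().enumerate().max_by_key(|&(_, &c)| c) {
        proba[majority] = 1.0;
    }
    proba
}

/// Proposed behaviour (sketch): class frequencies of the training samples in the leaf.
fn leaf_proba(counts: &[usize]) -> Vec<f64> {
    let total: usize = counts.iter().sum();
    if total == 0 {
        // Assumption of this sketch: an empty leaf falls back to a uniform distribution.
        return vec![1.0 / counts.len() as f64; counts.len()];
    }
    counts.iter().map(|&c| c as f64 / total as f64).collect()
}

fn main() {
    // A mixed leaf holding 8 samples of class 1 and 2 samples of class 2.
    let counts = [0, 8, 2];
    println!("{:?}", one_hot_proba(&counts)); // [0.0, 1.0, 0.0]
    println!("{:?}", leaf_proba(&counts)); // [0.0, 0.8, 0.2]
}
```

The one-hot version discards the fact that 2 of the 10 samples in this leaf belong to class 2; the frequency version keeps it.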

Changes:

Node structure extended: Added class_distribution: Vec<usize> field to store the class histogram in each node. This data was already being computed during training but was not persisted.
DecisionTreeClassifier: Added a predict_proba_for_row_real() method that returns proper probability distributions based on leaf class counts. The original predict_proba() method remains unchanged for backward compatibility. Hopefully it can be deprecated one day.
RandomForestClassifier: Added public predict_proba() method that averages probability distributions from all trees (scikit-learn style), rather than averaging hard class predictions.
Testing: Added 5 new tests covering:
- Probability distributions summing to 1.0
- Correct class ordering in predictions
- Mixed-class leaf handling
- Forest-level probability averaging
Debug assertions: Added 3 debug_assert_eq! checks, just for extra peace of mind.
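Averaging per-tree distributions is soft voting, as opposed to counting each tree's hard prediction. A minimal sketch, assuming each tree has already produced a probability vector for the row; average_proba, argmax and hard_vote are hypothetical helpers, not smartcore functions, and the numbers are illustrative only.

```rust
/// Average the per-tree class probability vectors (soft voting).
fn average_proba(per_tree: &[Vec<f64>]) -> Vec<f64> {
    let n_trees = per_tree.len() as f64;
    let n_classes = per_tree.first().map_or(0, |p| p.len());
    let mut avg = vec![0.0; n_classes];
    for proba in per_tree {
        debug_assert_eq!(proba.len(), n_classes, "every tree must report all classes");
        for (a, &p) in avg.iter_mut().zip(proba.iter()) {
            *a += p;
        }
    }
    avg.iter_mut().for_each(|a| *a /= n_trees);
    avg
}

/// Index of the largest value; the first index wins on ties (assumes v is non-empty).
fn argmax(v: &[f64]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

/// Majority vote over each tree's own argmax (hard voting), for comparison.
fn hard_vote(per_tree: &[Vec<f64>], n_classes: usize) -> usize {
    let mut votes = vec![0usize; n_classes];
    for proba in per_tree {
        votes[argmax(proba)] += 1;
    }
    let votes_f: Vec<f64> = votes.iter().map(|&v| v as f64).collect();
    argmax(&votes_f)
}

fn main() {
    // Three trees lean mildly toward class 0; one tree is certain of class 1.
    let per_tree = vec![
        vec![0.625, 0.375],
        vec![0.625, 0.375],
        vec![0.625, 0.375],
        vec![0.0, 1.0],
    ];
    let avg = average_proba(&per_tree);
    println!("averaged: {:?}", avg); // [0.46875, 0.53125]
    println!("soft vote: {}", argmax(&avg)); // 1
    println!("hard vote: {}", hard_vote(&per_tree, 2)); // 0
}
```

In this example the two strategies disagree: hard voting picks class 0 three votes to one, while averaging the distributions picks class 1. Averaging is what this PR describes as the scikit-learn style.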

Backward Compatibility:
No breaking changes. All existing APIs remain intact.

Note:
The current predict_proba returns Vec<Vec<f64>> rather than DenseMatrix<f64>, since I couldn't find any example of what the default behavior or convention is for this.

codecov · 2026-03-16T12:00:49Z

Codecov Report

❌ Patch coverage is 56.36364% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.69%. Comparing base (70d8a0f) to head (fc969fe).
⚠️ Report is 10 commits behind head on development.

Files with missing lines	Patch %	Lines
src/tree/decision_tree_classifier.rs	62.16%	14 Missing ⚠️
src/ensemble/random_forest_classifier.rs	44.44%	10 Missing ⚠️

Additional details and impacted files

@@               Coverage Diff               @@
##           development     #360      +/-   ##
===============================================
- Coverage        45.59%   44.69%   -0.90%     
===============================================
  Files               93       95       +2     
  Lines             8034     8054      +20     
===============================================
- Hits              3663     3600      -63     
- Misses            4371     4454      +83


skywardfire1 · 2026-03-16T19:43:41Z

Applied auto-formatting.

Mec-iS · 2026-03-17T08:21:41Z

wow. thank you. will take a look asap

skywardfire1 · 2026-03-18T11:50:44Z

Some additional info: this is how it looks in my project.

labels - True:      [1, 1, 4, 0, 0, 5, 3, 0, 0, 5, 0, 0, 0, 0, 0, 4, 4, 1, 2, 3, 5, 1, 4, 0, 3, 1, 0, 0, 3, 3, 0, 0, 0, 4, 5, 1, 1, 0, 0, 1, 5, 2, 4, 4, 0, 0, 1, 1, 3, 4, 0, 0, 4, 2, 2, 3, 4, 5, 5, 0, 0, 5, 0, 0, 0, 4, 4, 1, 5]
labels - Predicted: [5, 1, 4, 0, 0, 5, 2, 0, 0, 5, 0, 0, 4, 0, 0, 4, 4, 5, 2, 3, 5, 5, 4, 0, 3, 3, 0, 0, 3, 3, 0, 0, 0, 4, 5, 1, 1, 0, 0, 1, 5, 2, 4, 1, 0, 0, 1, 5, 3, 4, 0, 0, 4, 4, 4, 3, 4, 5, 5, 0, 0, 5, 0, 0, 0, 4, 4, 1, 1]

Primary probabilities (first 5 samples):
    [[0.0, 0.2, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0],
     [0.0, 0.9, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
     [0.1, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0],
     [0.9, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

And the [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] result is funny to me: the answer is correct, but so shy.
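One quick sanity check on output like this is to take the argmax of each probability row and compare it with the hard predictions. The two need not agree everywhere if predict() still uses majority voting over trees, but for the five rows printed above they do: the sketch below recovers 5, 1, 4, 0, 0, the first five entries of the Predicted line. predicted_labels is a hypothetical helper, not part of smartcore.

```rust
/// Argmax of each probability row; the first maximal index wins on ties.
fn predicted_labels(proba: &[Vec<f64>]) -> Vec<usize> {
    proba
        .iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .fold((0, f64::NEG_INFINITY), |(best_i, best_p), (i, &p)| {
                    if p > best_p { (i, p) } else { (best_i, best_p) }
                })
                .0
        })
        .collect()
}

fn main() {
    // First five probability rows from the output above (8 classes each).
    let proba = vec![
        vec![0.0, 0.2, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0],
        vec![0.0, 0.9, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
        vec![0.1, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0],
        vec![0.9, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
        vec![1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ];
    println!("{:?}", predicted_labels(&proba)); // [5, 1, 4, 0, 0]
}
```

A row such as [1.0, 0.0, ...] is what averaging produces when every tree sends the sample to a leaf containing only class 0: a mean of values in [0, 1] reaches 1.0 only when every term is 1.0.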

Mec-iS · 2026-03-19T05:29:19Z

this looks OK.

it would be nice to have a test using the iris dataset like this one:

    #[cfg_attr(
        all(target_arch = "wasm32", not(target_os = "wasi")),
        wasm_bindgen_test::wasm_bindgen_test
    )]
    #[test]
    fn fit_predict_iris_oob() { 
        ...

so the results can be checked against known values.
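Whichever dataset the test uses, its checks could share a few invariants of predict_proba output: one row per sample, one column per class, every entry within [0, 1], and each row summing to 1 within a tolerance. A minimal sketch of a hypothetical helper (check_proba is not part of smartcore) that such a test might call on the Vec<Vec<f64>> this PR returns:

```rust
/// Validate a probability matrix returned as Vec<Vec<f64>> (hypothetical helper).
fn check_proba(proba: &[Vec<f64>], n_samples: usize, n_classes: usize, tol: f64) -> Result<(), String> {
    if proba.len() != n_samples {
        return Err(format!("expected {} rows, got {}", n_samples, proba.len()));
    }
    for (i, row) in proba.iter().enumerate() {
        if row.len() != n_classes {
            return Err(format!("row {}: expected {} columns, got {}", i, n_classes, row.len()));
        }
        // NaN also fails this range check, since contains() returns false for it.
        if let Some(&p) = row.iter().find(|&&p| !(0.0..=1.0).contains(&p)) {
            return Err(format!("row {}: probability {} outside [0, 1]", i, p));
        }
        let sum: f64 = row.iter().sum();
        if (sum - 1.0).abs() > tol {
            return Err(format!("row {}: probabilities sum to {}", i, sum));
        }
    }
    Ok(())
}

fn main() {
    let good = vec![vec![0.9, 0.1, 0.0], vec![0.2, 0.3, 0.5]];
    let bad = vec![vec![0.9, 0.3, 0.0]];
    println!("{:?}", check_proba(&good, 2, 3, 1e-9)); // Ok(())
    // Reports that row 0 does not sum to 1.
    println!("{:?}", check_proba(&bad, 1, 3, 1e-9));
}
```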

Mec-iS · 2026-03-19T05:51:07Z

maybe a little better error handling like:

    pub fn predict_proba(&self, x: &X) -> Result<Vec<Vec<f64>>, Failed> {
        let (n, _) = x.shape();
        let mut result = Vec::with_capacity(n);
        for i in 0..n {
            result.push(self.predict_proba_for_row(x, i));
        }
        Ok(result)
    }

Rest looks good.
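A Result return type pays off only if something inside can actually fail. A minimal, self-contained sketch of one such check, assuming the model records the number of features seen during fit: an input of the wrong width is reported as an Err instead of panicking on an out-of-bounds index. Model, ProbaError and the row-major &[Vec<f64>] input are hypothetical stand-ins for smartcore's X: Array2 and Failed types.

```rust
use std::fmt;

/// Hypothetical error type standing in for smartcore's Failed.
#[derive(Debug, PartialEq)]
enum ProbaError {
    FeatureMismatch { expected: usize, got: usize },
}

impl fmt::Display for ProbaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProbaError::FeatureMismatch { expected, got } => {
                write!(f, "expected {} features, got {}", expected, got)
            }
        }
    }
}

/// Hypothetical fitted model: remembers the training width and fixed class priors.
struct Model {
    n_features: usize,
    priors: Vec<f64>,
}

impl Model {
    /// Stand-in for per-row prediction (a real forest would walk its trees here).
    fn predict_proba_for_row(&self, _row: &[f64]) -> Vec<f64> {
        self.priors.clone()
    }

    /// Check each row's width before predicting it.
    fn predict_proba(&self, x: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, ProbaError> {
        let mut result = Vec::with_capacity(x.len());
        for row in x {
            if row.len() != self.n_features {
                return Err(ProbaError::FeatureMismatch { expected: self.n_features, got: row.len() });
            }
            result.push(self.predict_proba_for_row(row));
        }
        Ok(result)
    }
}

fn main() {
    let model = Model { n_features: 2, priors: vec![0.25, 0.75] };
    println!("{:?}", model.predict_proba(&[vec![1.0, 2.0]])); // Ok([[0.25, 0.75]])
    println!("{:?}", model.predict_proba(&[vec![1.0, 2.0, 3.0]])); // Err(FeatureMismatch { expected: 2, got: 3 })
}
```

Whether smartcore already validates input width elsewhere is not established here; the sketch only illustrates the pattern.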

Mec-iS

check out my comments


skywardfire1 · 2026-03-19T14:40:13Z

I revisited the tests. As noted in the commit message, there are now 2 tests. The first one uses the Iris dataset defined at the beginning of the file and performs 4 checks.

Everything builds and works on my side, and clippy and cargo fmt --all show no issues, so I have no idea why the builds fail.

As for error handling: I could add error checks, but it seems pointless, since none of these operations can return an error unless the user breaks the API.

skywardfire1 · 2026-03-20T08:18:34Z

Still, the checks don't look good. I suggest closing this PR; I'll open another one with the same functionality soon after.

Mec-iS · 2026-03-20T08:28:00Z

No problem, proceed as you see fit.

skywardfire1 and others added 7 commits March 8, 2026 21:21

added Jaccard distance

9fdfd93

two encounters of a bad pattern is_none() + unwrap(). FIXED.

ece4f28

added 3 tests, incl. symmetry test. Now 4 in total.

ff65a08

it compiles

8898ae2

feat: implement proper predict_proba for Random Forest and Decision Tree

9a50b4f

Merge branch 'smartcorelib:development' into development

f09d365

clippy

470de49

skywardfire1 requested a review from Mec-iS as a code owner March 16, 2026 11:57

Mec-iS reviewed Mar 19, 2026


RF predict_proba functionality is now covered by 2 tests. The first test uses Iris dataset, and consists of 4 checks. The 2nd test consists of 2 checks.

8f7b17a

skywardfire1 force-pushed the development branch from fc969fe to 8f7b17a on March 19, 2026 14:27

Merge branch 'development' into development

ef961e0

skywardfire1 closed this Mar 20, 2026

skywardfire1 wants to merge 9 commits into smartcorelib:development from skywardfire1:development

2 participants