Skip to content

Conversation

@david-cortes-intel
Copy link
Contributor

Description

Many estimators check for sparse inputs using scipy's issparse, but sklearn's data validators also take sparse data frames as sparse and convert them to COO/CSC/CSR internally, whereas sklearnex assumes that data frames are always dense.

This PR fixes the sparsity checks to consider also sparse data frames, and adds tests along the way to ensure that they are processed in the right format.


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@david-cortes-intel
Copy link
Contributor Author

/intelci: run

@david-cortes-intel
Copy link
Contributor Author

/intelci: run



def make_sparse_array():
out = sp.random(50, 4, 0.5, format="csc", random_state=123)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why csc and not csr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test that the format is converted as needed.

assert not onedal.datatypes._data_conversion._convert_one_to_table.called


# Note that some estimators that are implemented through daal4py do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these tests then be in daal4py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They'd be harder to find and to connect among each other if they are all spread throughout different places.

@codecov
Copy link

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
sklearnex/basic_statistics/basic_statistics.py 33.33% 2 Missing ⚠️
onedal/svm/svm.py 75.00% 0 Missing and 1 partial ⚠️
sklearnex/svm/svc.py 50.00% 1 Missing ⚠️
Flag Coverage Δ
azure 81.18% <78.94%> (-0.01%) ⬇️
github ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sklearnex/cluster/dbscan.py 75.32% <100.00%> (-0.32%) ⬇️
sklearnex/cluster/k_means.py 89.65% <100.00%> (-0.08%) ⬇️
sklearnex/decomposition/pca.py 92.67% <100.00%> (-0.89%) ⬇️
sklearnex/dummy/_dummy.py 82.55% <100.00%> (-0.21%) ⬇️
sklearnex/ensemble/_forest.py 82.81% <ø> (-0.04%) ⬇️
sklearnex/linear_model/linear.py 83.89% <100.00%> (+0.55%) ⬆️
sklearnex/linear_model/logistic_regression.py 56.64% <100.00%> (-0.31%) ⬇️
sklearnex/linear_model/ridge.py 73.61% <100.00%> (ø)
sklearnex/neighbors/common.py 92.48% <100.00%> (ø)
onedal/svm/svm.py 90.78% <75.00%> (+0.04%) ⬆️
... and 2 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@david-cortes-intel
Copy link
Contributor Author

/intelci: run

@david-cortes-intel
Copy link
Contributor Author

/azp run Nightly

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@david-cortes-intel
Copy link
Contributor Author

CI failure is due to a bug in KMeans unrelated to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants