FIX: Fix processing of sparse data frames #2826

david-cortes-intel · 2025-12-05T16:25:42Z

Description

Many estimators detect sparse inputs with scipy's issparse. However, sklearn's data validators also treat sparse data frames as sparse input and convert them to COO/CSC/CSR internally, whereas sklearnex assumes that data frames are always dense.

This PR fixes the sparsity checks so that they also account for sparse data frames, and adds tests to ensure that such inputs are processed in the right format.
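
To illustrate the mismatch described above, here is a minimal sketch (not the code from this PR) showing that scipy's issparse returns False for a pandas data frame whose columns are all sparse, together with one possible way to detect such frames. The helper name `is_sparse_input` is hypothetical and was made up for this example; the per-column SparseDtype check is an assumption about how such a test could be written.

```python
import pandas as pd
import scipy.sparse as sp


def is_sparse_input(X):
    """Hypothetical helper: True for scipy sparse containers and for
    2D pandas objects in which every column has a sparse dtype."""
    if sp.issparse(X):
        return True
    dtypes = getattr(X, "dtypes", None)
    if dtypes is None or getattr(X, "ndim", 0) != 2 or len(dtypes) == 0:
        return False
    # A data frame counts as sparse only if all of its columns are sparse.
    return all(isinstance(dt, pd.SparseDtype) for dt in dtypes)


mat = sp.random(50, 4, density=0.5, format="csc", random_state=123)
df_sparse = pd.DataFrame.sparse.from_spmatrix(mat)
df_dense = pd.DataFrame(mat.toarray())

# scipy alone does not recognize the sparse data frame.
print(sp.issparse(df_sparse))
print(is_sparse_input(df_sparse), is_sparse_input(df_dense))

# A sparse data frame can be turned into a scipy container via COO.
csr = df_sparse.sparse.to_coo().tocsr()
print(csr.format, csr.shape)
```

A check along these lines lets an estimator route sparse data frames through the same code path as scipy sparse matrices instead of treating them as dense input.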

Checklist:

Completeness and readability

I have commented my code, particularly in hard-to-understand areas.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended the testing suite if new functionality was introduced in this PR.

david-cortes-intel · 2025-12-05T16:25:54Z

/intelci: run

david-cortes-intel · 2025-12-05T16:51:03Z

/intelci: run

icfaust · 2025-12-07T23:17:32Z

sklearnex/tests/test_sparse_processing.py

+
+
+def make_sparse_array():
+    out = sp.random(50, 4, 0.5, format="csc", random_state=123)


Why csc and not csr?

david-cortes-intel · 2025-12-08

To test that the format is converted as needed.
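
As a rough illustration of what "converted as needed" can mean, the sketch below passes a CSC matrix through sklearn's check_array with accept_sparse="csr", which converts inputs in other sparse formats to CSR. This only demonstrates the general validation behavior; whether the estimators touched by this PR request CSR in exactly this way is an assumption.

```python
import scipy.sparse as sp
from sklearn.utils import check_array

# Same construction as in the test helper: a CSC matrix, not CSR.
X_csc = sp.random(50, 4, density=0.5, format="csc", random_state=123)

# Requesting CSR forces a conversion, because CSC is not an accepted format.
X_checked = check_array(X_csc, accept_sparse="csr")

print(X_csc.format, "->", X_checked.format)
```

Starting from CSC therefore exercises the conversion path, which a CSR input would skip.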

icfaust · 2025-12-07T23:18:10Z

sklearnex/tests/test_sparse_processing.py

+    assert not onedal.datatypes._data_conversion._convert_one_to_table.called
+
+
+# Note that some estimators that are implemented through daal4py do


Shouldn't these tests then be in daal4py?

david-cortes-intel · 2025-12-08

They'd be harder to find and to relate to each other if they were spread across different places.
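
The assertion quoted in this thread checks the `called` attribute of a mocked conversion function. The sketch below shows that general pattern using only the standard library's unittest.mock instead of pytest-mock; the `conversion` namespace, `to_table` and `fit` are hypothetical stand-ins and not part of onedal or sklearnex.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module holding a table conversion routine.
conversion = types.SimpleNamespace(to_table=lambda X: ("table", X))


def fit(X, use_backend):
    """Hypothetical estimator entry point: converts the data only when
    the accelerated backend is used, otherwise returns it unchanged."""
    if use_backend:
        return conversion.to_table(X)
    return X


# Replace the conversion routine with a mock and record whether it ran.
with mock.patch.object(conversion, "to_table") as spy:
    fit([1, 2, 3], use_backend=False)
    called_on_fallback = spy.called

with mock.patch.object(conversion, "to_table") as spy:
    fit([1, 2, 3], use_backend=True)
    called_on_backend = spy.called

print(called_on_fallback, called_on_backend)
```

Asserting `not spy.called` is a way to verify that an input took the fallback path without inspecting the result itself.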

codecov · 2025-12-08T10:13:03Z

Codecov Report

❌ Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
sklearnex/basic_statistics/basic_statistics.py	33.33%	2 Missing ⚠️
onedal/svm/svm.py	75.00%	0 Missing and 1 partial ⚠️
sklearnex/svm/svc.py	50.00%	1 Missing ⚠️

Flag	Coverage Δ
azure	`81.18% <78.94%> (-0.01%)`	⬇️
github	`?`


Files with missing lines	Coverage Δ
sklearnex/cluster/dbscan.py	`75.32% <100.00%> (-0.32%)`	⬇️
sklearnex/cluster/k_means.py	`89.65% <100.00%> (-0.08%)`	⬇️
sklearnex/decomposition/pca.py	`92.67% <100.00%> (-0.89%)`	⬇️
sklearnex/dummy/_dummy.py	`82.55% <100.00%> (-0.21%)`	⬇️
sklearnex/ensemble/_forest.py	`82.81% <ø> (-0.04%)`	⬇️
sklearnex/linear_model/linear.py	`83.89% <100.00%> (+0.55%)`	⬆️
sklearnex/linear_model/logistic_regression.py	`56.64% <100.00%> (-0.31%)`	⬇️
sklearnex/linear_model/ridge.py	`73.61% <100.00%> (ø)`
sklearnex/neighbors/common.py	`92.48% <100.00%> (ø)`
onedal/svm/svm.py	`90.78% <75.00%> (+0.04%)`	⬆️
... and 2 more

... and 2 files with indirect coverage changes


david-cortes-intel · 2025-12-08T13:43:05Z

/intelci: run

david-cortes-intel · 2025-12-08T13:43:14Z

/azp run Nightly

azure-pipelines · 2025-12-08T13:43:23Z

Azure Pipelines successfully started running 1 pipeline(s).

david-cortes-intel · 2025-12-09T08:39:25Z

CI failure is due to a bug in KMeans unrelated to this PR.

correct checks and tests for sparse data frames

7b88cd7

david-cortes-intel requested review from ahuber21, ethanglaser, icfaust, razdoburdin and yuejiaointel as code owners December 5, 2025 16:25

david-cortes-intel added the bug Something isn't working label Dec 5, 2025

david-cortes-intel requested review from Vika-F, avolkov-intel, homksei and napetrov as code owners December 5, 2025 16:25

skip tests on older scipy

ce8d0b7

add copyright header

c0592e8

icfaust reviewed Dec 7, 2025


fix error

cdfbbfa

david-cortes-intel added 6 commits December 8, 2025 11:21

don't deselect valid conformance tests

4c08af0

solve merge conflicts

c1f95e8

update deselections

2973efa

fix bad merge conflict solves

6e8d6f6

fix again

77c6c52

fix for older sklearn

f0b701f

david-cortes-intel added 2 commits December 8, 2025 16:44

fixes for older versions

a51e964

try another way

4b99474

add new dependency also to conda recipe

95d367a

david-cortes-intel mentioned this pull request Dec 9, 2025

Add new pytest-mock package used in tests conda-forge/scikit-learn-intelex-feedstock#68 (Open, 5 tasks)

solve merge conflicts

d5973ae
