Skip to content

Fix #72#78

Open
chr-sa wants to merge 2 commits into
MDAnalysis:mainfrom
chr-sa:fix-issue-72
Open

Fix #72#78
chr-sa wants to merge 2 commits into
MDAnalysis:mainfrom
chr-sa:fix-issue-72

Conversation

@chr-sa
Copy link
Copy Markdown

@chr-sa chr-sa commented Apr 25, 2026

This seems to address a similar issue as #60
@orbeckst

Fixes #72

Changes made in this Pull Request:

  • Change numpy types from long to int64

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@orbeckst
Copy link
Copy Markdown
Member

Not sure yet what the mambaforge failure means. Probably need to update the workflow to use a newer installer.

@chr-sa
Copy link
Copy Markdown
Author

chr-sa commented Apr 26, 2026

I would assume #76 fixes this

@orbeckst orbeckst marked this pull request as ready for review April 26, 2026 19:24
@orbeckst
Copy link
Copy Markdown
Member

Thanks for the PR @chr-sa . I switched it to "Ready for review" because I think from your side you've made the change that you think will fix the issue.

Unfortunately, the CI needs to be fixed first before we can productively review here because we need to see the tests. Please have a bit of patience. Maintenance of mdaencore is on a best-effort basis (which also means that we're very happy about anyone who wants to get involved).

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need the CI to run first (PR #76 ) but in the meantime, can you update CHANGELOG and also add yourself to AUTHORS, please?

orbeckst referenced this pull request in chr-sa/mdaencore Apr 26, 2026
@chr-sa
Copy link
Copy Markdown
Author

chr-sa commented Apr 26, 2026

Sure, happy to help, thank you for encouraging the PR and for the work on MDAnalysis.
I think using intp instead of int64 would be the correct approach, since that would be the replacement for long. I can see if this still fixes the issue next week, I fixed this for a colleague and don't have the data myself.

@chr-sa
Copy link
Copy Markdown
Author

chr-sa commented Apr 27, 2026

Update on this: for me, only the release version v1.0.0 crashes, the main branch of this repo works for my usecase @orbeckst .

@orbeckst
Copy link
Copy Markdown
Member

Ok, that's good to know — so in principle, we'd just need a new 1.0.1 release. Nevertheless, we need CI working again before doing that, so once it's working we might as well look into this PR and see if it may be a more long-term solution than PR #60.

@orbeckst
Copy link
Copy Markdown
Member

@chr-sa can you add yourself to AUTHORS, too?

(I don't know yet if we will ultimately merge the PR but if we do I'd like to be able to do it all quickly and get a 1.0.1 release out #79 .)

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.19%. Comparing base (410ff43) to head (e810164).

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@orbeckst
Copy link
Copy Markdown
Member

Many Windows tests related to the clustering are failing. These tests passed before. This makes me think that the proposed change changes results.

main-tests (windows-latest, 3.12, develop)

Windows run test (main-tests (windows-latest, 3.12, develop)) output with failures
=========================== short test summary info ===========================
FAILED mdaencore/tests/test_mdaencore.py::TestEncore::test_ces_convergence - AssertionError: 
Arrays are not almost equal to 2 decimals
Unexpected value for Clustering Ensemble similarity in convergence estimation
Mismatched elements: 1 / 1 (100%)
Mismatch at index:
 [0]: 0.3443593 (ACTUAL), 0.380395665848578 (DESIRED)
Max absolute difference among violations: 0.03603637
Max relative difference among violations: 0.0947339
 ACTUAL: array(0.34)
 DESIRED: array([0.38])
FAILED mdaencore/tests/test_mdaencore.py::TestEncore::test_ces - AssertionError: 
Arrays are not almost equal to 2 decimals
Unexpected value for Cluster Ensemble Similarity: 0.267713. Expected 0.510000.
 ACTUAL: np.float64(0.26771274394462913)
 DESIRED: 0.51
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_three_ensembles_two_identical - AssertionError: Unexpected result: 
assert 30 == 40
 +  where 30 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_one_ensemble - AssertionError: Unexpected results: 
assert 11 == 7
 +  where 11 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_AffinityPropagationNative_direct - AssertionError: Unexpected result: [ 0  1  2  3  4  5  6  7  8  9  9  9 15 15 15 15 15 15 15 15]
assert 11 == 7
 +  where 11 = len({np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), ...})
 +    where {np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), ...} = set(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  9,  9, 15, 15, 15, 15, 15,\n       15, 15, 15]))
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_two_ensembles - AssertionError: Unexpected results: 
assert 20 == 14
 +  where 20 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_sklearn_affinity_propagation - AssertionError: Native and sklearn implementations of affinity propagation don't agree: mismatch in number of clusters: 11 7
assert 11 == 7
 +  where 11 = len()
 +  and   7 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClusteringSklearn::test_one - AssertionError: Basic clustering test failed to give the rightnumber of clusters: 3 vs 10
assert 3 == 10
 +  where 3 = .n_clusters
 +  and   10 = len()
=========== 8 failed, 48 passed, 3 xpassed, 241 warnings in 34.79s ============

(Note that thanks to ongoing GitHub issues, it may not be possible to look at the output in a nice fashion. The raw logs may be available.)

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the failing Windows tests, I am going to block this change. We'd first have to establish that the proposed change fixes incorrect code. We'd then have to change the test fixtures.

# Prepare input and ouput arrays
cdef numpy.ndarray[numpy.float32_t, ndim=1] matndarray = numpy.ascontiguousarray(s._elements, dtype=numpy.float32)
cdef numpy.ndarray[long, ndim=1] clusters = numpy.zeros((s.size),dtype=numpy.dtype("long"))
cdef numpy.ndarray[numpy.int64_t, ndim=1] clusters = numpy.zeros((s.size),dtype=numpy.int64)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerjereddy do you have a quick opinion on what's the right approach here in matching dtypes in cython?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, given that "long" can be 32 bits on Windows, the proposed fix sounds like the right approach. I am then just a bit surprised that previously the Linux and Windows results agreed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proposed fix just does not work because clusters would always be 64 bit and CAffinityPropagation takes long, which is 32 bit on Windows. Imo the "right" approach would be to use a 64 bit datatype everywhere for consistency, but that would change behavior on Windows.

I only tested my change on Linux and MacOS, where long is 64 bit matching the int64, and was not aware of the Windows behavior before, that's why I initially thought it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

name 'long' is not defined in clustering/affinityprop.pyx

2 participants