Fix #72#78
Conversation
|
Not sure yet what the mambaforge failure means. Probably need to update the workflow to use a newer installer. |
|
I would assume #76 fixes this |
|
Thanks for the PR @chr-sa . I switched it to "Ready for review" because I think from your side you've made the change that you think will fix the issue. Unfortunately, the CI needs to be fixed first before we can productively review here because we need to see the tests. Please have a bit of patience. Maintenance of mdaencore is on a best-effort basis (which also means that we're very happy about anyone who wants to get involved). |
|
Sure, happy to help, thank you for encouraging the PR and for the work on MDAnalysis. |
|
Update on this: for me, only the release version |
|
Ok, that's good to know — so in principle, we'd just need a new 1.0.1 release. Nevertheless, we need CI working again before doing that, so once it's working we might as well look into this PR and see if it may be a more long-term solution than PR #60. |
Codecov Report✅ All modified and coverable lines are covered by tests. 🚀 New features to boost your workflow:
|
|
Many Windows tests related to the clustering are failing. These tests passed before. This makes me think that the proposed change changes results. main-tests (windows-latest, 3.12, develop) Windows run test (main-tests (windows-latest, 3.12, develop)) output with failures=========================== short test summary info ===========================
FAILED mdaencore/tests/test_mdaencore.py::TestEncore::test_ces_convergence - AssertionError:
Arrays are not almost equal to 2 decimals
Unexpected value for Clustering Ensemble similarity in convergence estimation
Mismatched elements: 1 / 1 (100%)
Mismatch at index:
[0]: 0.3443593 (ACTUAL), 0.380395665848578 (DESIRED)
Max absolute difference among violations: 0.03603637
Max relative difference among violations: 0.0947339
ACTUAL: array(0.34)
DESIRED: array([0.38])
FAILED mdaencore/tests/test_mdaencore.py::TestEncore::test_ces - AssertionError:
Arrays are not almost equal to 2 decimals
Unexpected value for Cluster Ensemble Similarity: 0.267713. Expected 0.510000.
ACTUAL: np.float64(0.26771274394462913)
DESIRED: 0.51
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_three_ensembles_two_identical - AssertionError: Unexpected result:
assert 30 == 40
+ where 30 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_one_ensemble - AssertionError: Unexpected results:
assert 11 == 7
+ where 11 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_AffinityPropagationNative_direct - AssertionError: Unexpected result: [ 0 1 2 3 4 5 6 7 8 9 9 9 15 15 15 15 15 15 15 15]
assert 11 == 7
+ where 11 = len({np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), ...})
+ where {np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), ...} = set(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 15, 15, 15, 15, 15,\n 15, 15, 15]))
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_clustering_two_ensembles - AssertionError: Unexpected results:
assert 20 == 14
+ where 20 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClustering::test_sklearn_affinity_propagation - AssertionError: Native and sklearn implementations of affinity propagation don't agree: mismatch in number of clusters: 11 7
assert 11 == 7
+ where 11 = len()
+ and 7 = len()
FAILED mdaencore/tests/test_mdaencore.py::TestEncoreClusteringSklearn::test_one - AssertionError: Basic clustering test failed to give the rightnumber of clusters: 3 vs 10
assert 3 == 10
+ where 3 = .n_clusters
+ and 10 = len()
=========== 8 failed, 48 passed, 3 xpassed, 241 warnings in 34.79s ============
(Note that thanks to ongoing GitHub issues, it may not be possible to look at the output in a nice fashion. The raw logs may be available.) |
orbeckst
left a comment
There was a problem hiding this comment.
Given the failing Windows tests, I am going to block this change. We'd first have to establish that the proposed change fixes incorrect code. We'd then have to change the test fixtures.
| # Prepare input and ouput arrays | ||
| cdef numpy.ndarray[numpy.float32_t, ndim=1] matndarray = numpy.ascontiguousarray(s._elements, dtype=numpy.float32) | ||
| cdef numpy.ndarray[long, ndim=1] clusters = numpy.zeros((s.size),dtype=numpy.dtype("long")) | ||
| cdef numpy.ndarray[numpy.int64_t, ndim=1] clusters = numpy.zeros((s.size),dtype=numpy.int64) |
There was a problem hiding this comment.
@tylerjereddy do you have a quick opinion on what's the right approach here in matching dtypes in cython?
There was a problem hiding this comment.
Admittedly, given that "long" can be 32 bits on Windows, the proposed fix sounds like the right approach. I am then just a bit surprised that previously the Linux and Windows results agreed.
There was a problem hiding this comment.
I think the proposed fix just does not work because clusters would always be 64 bit and CAffinityPropagation takes long, which is 32 bit on Windows. Imo the "right" approach would be to use a 64 bit datatype everywhere for consistency, but that would change behavior on Windows.
I only tested my change on Linux and MacOS, where long is 64 bit matching the int64, and was not aware of the Windows behavior before, that's why I initially thought it works.
This seems to address a similar issue as #60
@orbeckst
Fixes #72
Changes made in this Pull Request:
PR Checklist