-
Notifications
You must be signed in to change notification settings - Fork 250
Find and fix a bug in Molecule isomorphism #1894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1894 +/- ##
==========================================
+ Coverage 47.10% 47.14% +0.03%
==========================================
Files 88 88
Lines 23189 23189
Branches 6033 6033
==========================================
+ Hits 10923 10932 +9
+ Misses 11114 11106 -8
+ Partials 1152 1151 -1
Continue to review full report at Codecov.
|
|
I think @mliu49 and I talked about doing this before and decided not to. I think at the time the argument was that if one cares about the correctness of the initial map (assuming the map is unique) one can simply use the generate_initial_map option which will give the right result. However inside RMG we already know the map and know it is correct the vast majority of times so in all of those situations it doesn't make sense to run this check. I'm unsure how much speed difference this would make, but given how often we use is_isomorphic it could be significant. |
|
Calling with There are other ways to rewrite this particular unit test to work around the problem instead of the solution currently in this PR, but then we still have the fact that a function called |
|
I do remember having that discussion, but I don't remember much of the details. In one of my many attempts to refactor these comparisons, I wrote an entirely new method with multiple arguments to handle all comparison types. The code is still here, and it has an option to verify the initial mapping, whether it was provided or generated. I ended up abandoning that branch because it seemed like a big function with a ton of options was less efficient than having multiple individual ones. I think it would be worthwhile to add this option to |
Previously, if there was an non-isomorphism present in the labeled atoms, then the mapping would be invalid, but this is not (currently) checked by the is_isomorphic method, so two non-isomorphic molecules would be found isomorphic, and would fail the test. Now we ensure that the mapping is valid, before passing it to is_isomorphic. This should work whatever the outcome of the discussion around PR ReactionMechanismGenerator#1894
|
So mazeau@8c78e91 now works around this problem so #1884 is no longer dependent on this PR. |
|
looking good! please rebase |
|
Are we certain we want to make this check non-optional? |
|
We could add a When would we then use the feature? I suppose if we have just algorithmically generated the initial map, but I'm not sure where that's the case. |
|
I'm not sure I understand...aren't all initial maps in an RMG job algorithmically generated? |
|
If there are labelled atoms then the labels are used to generate the initial mapping. If those atoms are NOT actually isomorphic, we never check them. See the example in the unit test that is added in this PR. This was causing tests in the database to incorrectly fail. |
|
Oh sorry, this is just Molecule.is_isomorphic, not all of them. |
|
Am I right that the only remaining concern is that checking isomorphism properly might slow things down a bit? The continuous-integration/rmg-tests show these timings: Observables Test Case: Aromatics Comparison Observables Test Case: NC Comparison Observables Test Case: SO2 Comparison Observables Test Case: EG1 Comparison All these jobs are only ~1 minute, so not a convincing argument, but I don't see evidence that we should worry. One is 1 second faster, and one is 1 second slower, and the other two are the same. Do we have (or should we have?) a more rigorous or accurate away to benchmark speed changes in pull requests? Weirdly, looking carefully at these logs, there appears to be some non-identical thermo in the rmg/nitrogen test But the same oxirene and Cyclopropene interchange was noticed on many different PRs so I think is an underlying non-deterministic behavior that should be addressed separately, in issue #2010. |
|
I think that's sufficient for this case, this one doesn't get as much use as its is_subgraph_isomorphic counterpart. It would be nice for us to have some speed benchmarks...as you've noted though RMG-tests examples aren't necessarily in the regime we're worried about. I think we'd need to add an example using the restart feature to start at a higher species load. To make it test properly on faster/slower processors I think we'd need to run an RMG independent speed test at the beginning that we could scale the RMG runtime to. |
The problem this identifies is that if you give it two different molecules but the only difference is in a set of atoms that you have already passed in as a map (eg. because of their labels) then they are thought to be isomorphic. I.e the initial mapping you pass is not checked.
This fixes the recently added failing isomorphism check. If you pass in a mapping, it is checked for validity. Note that this is not done at the Graph level or VF2, only at the Molecule level. I.e. this bug probably still exists for Graph and maybe Group objects.
|
Thanks |
Previously, if there was an non-isomorphism present in the labeled atoms, then the mapping would be invalid, but this is not (currently) checked by the is_isomorphic method, so two non-isomorphic molecules would be found isomorphic, and would fail the test. Now we ensure that the mapping is valid, before passing it to is_isomorphic. This should work whatever the outcome of the discussion around PR #1894
Motivation or Problem
If you ran
Molecule.is_isomorphicwith an incorrect initial map, it would assume nothing was wrong with the map, and if it found no non-isomorphic things in the rest of the molecule, would conclude the molecules were isomorphic.This was causing some incorrect failing database tests.
Description of Changes
If given an initial mapping, check it's valid.
Testing
Made a unit test that fails, then fixed it and it passes.