-
Notifications
You must be signed in to change notification settings - Fork 17
Update release history compatibility using script #2941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
lucyb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much manual testing have you done? Is it just that one codelist?
How would you feel about writing some unit tests to show that it's either updating the codelist version compatibility or not, depending on the codelist, including some codelists with multiple releases? I don't know how much effort that would be?
|
I can take a look at doing some more testing. Comprehensive unit tests are hard because it's inherently dependent on the history of the real prod data. |
PR #2911 fixed the calculation of coding system release compatibility for codelist versions. It did not correct the incorrectly calculated historical compatibility. This commit introduces a way of fixing that history via a Django script. In order to achieve this, the dynamically-set coding system database connections must be in place. The function that does this - `update_coding_system_database_connections` is not automatically called during script initiation (as it would be during normal app startup) so is called manually here. The script then iterates through all releases of all coding systems (in release date order) and calls the `update_codelist_version_compatibility` function that would normally be used in a coding system release import.
1e54bdc to
50ac1d1
Compare
|
Counting compatible releases ( dict(
sorted(
dict(
Counter(
[cv.compatible_releases.count() for cv in CodelistVersion.objects.all()]
)
).items()
)
)
# before
{
0: 14988,
1: 54,
2: 9,
3: 2,
4: 1,
7: 2,
8: 6,
14: 1,
16: 1,
19: 1,
24: 15,
}
# total: 15080
# after
{
0: 9532,
1: 1819,
2: 602,
3: 938,
4: 129,
5: 64,
6: 137,
7: 65,
8: 694,
9: 273,
10: 20,
11: 125,
12: 102,
13: 19,
14: 19,
15: 8,
16: 19,
17: 8,
18: 12,
19: 48,
20: 7,
21: 2,
22: 64,
23: 3,
24: 287,
25: 2,
26: 1,
27: 6,
28: 1,
29: 1,
30: 1,
31: 2,
32: 1,
34: 2,
35: 2,
36: 1,
38: 1,
40: 1,
41: 2,
42: 1,
47: 3,
49: 2,
50: 1,
55: 2,
59: 1,
61: 3,
62: 1,
63: 1,
79: 2,
82: 5,
83: 1,
87: 1,
94: 1,
95: 1,
99: 2,
100: 14,
102: 2,
117: 2,
118: 1,
122: 2,
124: 1,
126: 1,
131: 1,
135: 1,
154: 1,
160: 6,
}
# total: 15080 |
|
Per-codelist version comparison. Pre- and post-rebuild data exported thusly: with open("compat_counts.csv", "w") as f:
w = csv.DictWriter(f, fieldnames=["codelist_version_id", "compatible_version_count"])
w.writeheader()
w.writerows(
[
{
"codelist_version_id": f"{cv.codelist.full_slug()}/{cv.tag_or_hash}",
"compatible_version_count": cv.compatible_releases.count(),
}
for cv in CodelistVersion.objects.all()
]
)Comparison spreadsheet here |
|
Observations:
Spot-checking extreme examples (compatible counts >100) shows the top few are examples where all of the codes are invalid in the coding system (some user codelists, this un-published opensafely codelist which appears malformed in both its versions). This is something we may want to address (release compatibility calculation with bad codelists). The issue is that the compatibility is unchanged between those releases - it started off as zero and ended as zero! Others appear to be genuine, e.g. this one contains a VMP code for which there is only one AMP and has only been one AMP for as long as it has existed so does genuinely have an extremely long release compatibility list. @lucyb thoughts on whether the issue with "bad" codelist compatibility checking needs addressing? |
|
Looking at one of the malformed old OpenSAFELY-owned codelists, as it has no searches, the hierarchy-comparison is what's giving the false impression of compatibility. It's hierarchy is all descendent from the >>> CodelistVersion.objects.get_by_hash("0893ab89").hierarchy.data_for_cache()
'{
"root": null,
"nodes": [
"0402010J0AAAMAM",
"0402010J0BEABAJ",
"0402010J0BEAGAC",
"0402010J0AAALAL",
"0402010J0AABKBK",
"0402010J0AAA5A5",
"0402010J0AAA7A7",
"0402010J0BEAEAM",
"0402010J0BEAAAA",
"0402010J0AAA6A6",
null,
"0402010J0AAA8A8",
"0402010J0AABGBG",
"0402010J0AABJBJ",
"0402010J0AAAGAG",
"0402010J0BDABAL",
"0402010J0AAAAAA",
"0402010J0AABDBD",
"0402010J0AAAIAI",
"0402010J0BEAFAG",
"0402010J0BDACAG",
"0402010J0AABNBN",
"0402010J0BDAAAK",
"0402010J0AAACAC",
"0402010J0AAA4A4",
"0402010J0BH",
"0402010J0BE",
"0402010J0AAAKAK",
"0402010J0AABCBC",
"0402010J0BD",
"0402010J0AA",
"0402010J0BEAHAD",
"0402010J0AABBBB",
"0402010J0AAAJAJ",
"0402010J0BEACAK",
"0402010J0BBAABC",
"0402010J0BB",
"0402010J0",
"0402010J0BEADAL",
"0402010J0BHAABN",
"0402010J0AAADAD",
"0402010J0BDAEAC"
],
"child_map": {
"null": [
"0402010J0AAAMAM",
"0402010J0BEABAJ",
"0402010J0BEAGAC",
"0402010J0AAALAL",
"0402010J0AABKBK",
"0402010J0AAA5A5",
"0402010J0AAA7A7",
"0402010J0BEAEAM",
"0402010J0BEAAAA",
"0402010J0AAA6A6",
"0402010J0AABGBG",
"0402010J0AAA8A8",
"0402010J0AABJBJ",
"0402010J0AAAGAG",
"0402010J0BDABAL",
"0402010J0AAAAAA",
"0402010J0AABDBD",
"0402010J0AAAIAI",
"0402010J0BEAFAG",
"0402010J0BDACAG",
"0402010J0AABNBN",
"0402010J0BDAAAK",
"0402010J0AAACAC",
"0402010J0AAA4A4",
"0402010J0BH",
"0402010J0BE",
"0402010J0AAAKAK",
"0402010J0AABCBC",
"0402010J0BD",
"0402010J0AA",
"0402010J0BEAHAD",
"0402010J0AABBBB",
"0402010J0AAAJAJ",
"0402010J0BEACAK",
"0402010J0BBAABC",
"0402010J0BB",
"0402010J0",
"0402010J0BEADAL",
"0402010J0BHAABN",
"0402010J0AAADAD",
"0402010J0BDAEAC"
]
},
"parent_map": {
"0402010J0AAA6A6": [
null
],
"0402010J0AAACAC": [
null
],
"0402010J0AAA7A7": [
null
],
"0402010J0BH": [
null
],
"0402010J0AAALAL": [
null
],
"0402010J0AABJBJ": [
null
],
"0402010J0BEAFAG": [
null
],
"0402010J0AABBBB": [
null
],
"0402010J0AAA8A8": [
null
],
"0402010J0BHAABN": [
null
],
"0402010J0BEAEAM": [
null
],
"0402010J0AAAIAI": [
null
],
"0402010J0BEABAJ": [
null
],
"0402010J0BDACAG": [
null
],
"0402010J0BB": [
null
],
"0402010J0AABKBK": [
null
],
"0402010J0AA": [
null
],
"0402010J0BD": [
null
],
"0402010J0AAA4A4": [
null
],
"0402010J0BEAHAD": [
null
],
"0402010J0BDAAAK": [
null
],
"0402010J0BDAEAC": [
null
],
"0402010J0BBAABC": [
null
],
"0402010J0AAA5A5": [
null
],
"0402010J0AAADAD": [
null
],
"0402010J0BDABAL": [
null
],
"0402010J0BEAGAC": [
null
],
"0402010J0AABNBN": [
null
],
"0402010J0BEAAAA": [
null
],
"0402010J0AAAGAG": [
null
],
"0402010J0BE": [
null
],
"0402010J0AABGBG": [
null
],
"0402010J0AABCBC": [
null
],
"0402010J0BEACAK": [
null
],
"0402010J0AAAKAK": [
null
],
"0402010J0": [
null
],
"0402010J0AAAMAM": [
null
],
"0402010J0AAAJAJ": [
null
],
"0402010J0AAAAAA": [
null
],
"0402010J0BEADAL": [
null
],
"0402010J0AABDBD": [
null
]
},
"_descendants_cache": {},
"_ancestors_cache": {}
}' |
|
Seems that way for the other ones on my list >>> CodelistVersion.objects.get_by_hash("255efc95").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "36456811000001104", "36458511000001112", "36457111000001104", "36458411000001104"], "child_map": {"null": ["36458511000001112", "36457111000001104", "36458411000001104", "36456811000001104"]}, "parent_map": {"36458411000001104": [null], "36457111000001104": [null], "36456811000001104": [null], "36458511000001112": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("49396fa2").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "34376311000001104"], "child_map": {"null": ["34376311000001104"]}, "parent_map": {"34376311000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("4e9eef8e").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "32746111000001104", "28691111000001108", "18162111000001104", "21259011000001104", "21393311000001104", "29934011000001104", "21258811000001108", "32982711000001104"], "child_map": {"null": ["32746111000001104", "28691111000001108", "18162111000001104", "21259011000001104", "21393311000001104", "29934011000001104", "21258811000001108", "32982711000001104"]}, "parent_map": {"21393311000001104": [null], "32746111000001104": [null], "32982711000001104": [null], "28691111000001108": [null], "29934011000001104": [null], "21258811000001108": [null], "18162111000001104": [null], "21259011000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("0197e6e9").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "36456811000001104", "36458511000001112", "36457111000001104", "36458411000001104"], "child_map": {"null": ["36458511000001112", "36457111000001104", "36458411000001104", "36456811000001104"]}, "parent_map": {"36458411000001104": [null], "36457111000001104": [null], "36456811000001104": [null], "36458511000001112": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("0c62e6c1").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "34376311000001104"], "child_map": {"null": ["34376311000001104"]}, "parent_map": {"34376311000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("5393839e").hierarchy.data_for_cache()
'{"root": null, "nodes": [], "child_map": {}, "parent_map": {}, "_descendants_cache": {}, "_ancestors_cache": {}}'Checking the other cases where there are >>> null_hierarchy_codelistversions = []
>>> for cv in CodelistVersion.objects.filter(cached_hierarchy__isnull=False):
... if cv.has_hierarchy and not cv.hierarchy.root:
... null_hierarchy_codelistversions.append(cv)
...
>>> len(null_hierarchy_codelistversions)
4676These can't all be "bad" codelists, I'll do some more digging |
With the fixes to the release compatibility checking, some codelist versions have very long compatibility histories. These do not fit at all well within the "Other compatible releases" box. The release compatibility checking can only every produce a contiguous series of releases, so a user only really needs the first and last compatible releases. Therefore, this commit changes this to show just these two releases.


PR #2911 fixed the calculation of coding system release compatibility for codelist versions.
It did not correct the incorrectly calculated historical compatibility - this PR fixes #2913 .
This commit introduces a way of fixing that history via a Django script.
In order to achieve this, the dynamically-set coding system database connections must be in place.
The function that does this -
update_coding_system_database_connectionsis not automatically called during script initiation (as it would be during normal app startup) so is called manually here.The script then iterates through all releases of all coding systems (in release date order) and calls the
update_codelist_version_compatibilityfunction that would normally be used in a coding system release import.