Skip to content

Conversation

@Jongmassey
Copy link
Contributor

PR #2911 fixed the calculation of coding system release compatibility for codelist versions.

It did not correct the incorrectly calculated historical compatibility - this PR fixes #2913 .

This commit introduces a way of fixing that history via a Django script.

In order to achieve this, the dynamically-set coding system database connections must be in place.
The function that does this - update_coding_system_database_connections is not automatically called during script initiation (as it would be during normal app startup) so is called manually here.

The script then iterates through all releases of all coding systems (in release date order) and calls the
update_codelist_version_compatibility function that would normally be used in a coding system release import.

@Jongmassey
Copy link
Contributor Author

This is an alternative implementation to #2940 which avoids some of the issues of using migrations.

This approach yields the same result in the example given here.

Copy link
Contributor

@lucyb lucyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much manual testing have you done? Is it just that one codelist?

How would you feel about writing some unit tests to show that it's either updating the codelist version compatibility or not, depending on the codelist, including some codelists with multiple releases? I don't know how much effort that would be?

@Jongmassey
Copy link
Contributor Author

I can take a look at doing some more testing. Comprehensive unit tests are hard because it's inherently dependent on the history of the real prod data.

PR #2911 fixed the calculation of coding system release compatibility
for codelist versions.

It did not correct the incorrectly calculated historical compatibility.

This commit introduces a way of fixing that history via a Django
script.

In order to achieve this, the dynamically-set coding system database
connections must be in place.
The function that does this - `update_coding_system_database_connections`
is not automatically called during script initiation (as it would be
during normal app startup) so is called manually here.

The script then iterates through all releases of all coding systems
(in release date order) and calls the
`update_codelist_version_compatibility` function that would normally be
used in a coding system release import.
@Jongmassey Jongmassey force-pushed the Jongmassey/update-release-compat-history-script branch from 1e54bdc to 50ac1d1 Compare January 13, 2026 17:23
@Jongmassey
Copy link
Contributor Author

Jongmassey commented Jan 14, 2026

Counting compatible releases (count_of_compatible_releases : count_of_codelist_versions):

dict(
    sorted(
        dict(
            Counter(
                [cv.compatible_releases.count() for cv in CodelistVersion.objects.all()]
            )
        ).items()
    )
)

# before
{
    0: 14988,
    1: 54,
    2: 9,
    3: 2,
    4: 1,
    7: 2,
    8: 6,
    14: 1,
    16: 1,
    19: 1,
    24: 15,
}

# total: 15080

# after
{
    0: 9532,
    1: 1819,
    2: 602,
    3: 938,
    4: 129,
    5: 64,
    6: 137,
    7: 65,
    8: 694,
    9: 273,
    10: 20,
    11: 125,
    12: 102,
    13: 19,
    14: 19,
    15: 8,
    16: 19,
    17: 8,
    18: 12,
    19: 48,
    20: 7,
    21: 2,
    22: 64,
    23: 3,
    24: 287,
    25: 2,
    26: 1,
    27: 6,
    28: 1,
    29: 1,
    30: 1,
    31: 2,
    32: 1,
    34: 2,
    35: 2,
    36: 1,
    38: 1,
    40: 1,
    41: 2,
    42: 1,
    47: 3,
    49: 2,
    50: 1,
    55: 2,
    59: 1,
    61: 3,
    62: 1,
    63: 1,
    79: 2,
    82: 5,
    83: 1,
    87: 1,
    94: 1,
    95: 1,
    99: 2,
    100: 14,
    102: 2,
    117: 2,
    118: 1,
    122: 2,
    124: 1,
    126: 1,
    131: 1,
    135: 1,
    154: 1,
    160: 6,
}

# total: 15080

@Jongmassey
Copy link
Contributor Author

Per-codelist version comparison. Pre- and post-rebuild data exported thusly:

with open("compat_counts.csv", "w") as f:
    w = csv.DictWriter(f, fieldnames=["codelist_version_id", "compatible_version_count"])
    w.writeheader()
    w.writerows(
        [
            {
                "codelist_version_id": f"{cv.codelist.full_slug()}/{cv.tag_or_hash}",
                "compatible_version_count": cv.compatible_releases.count(),
            }
            for cv in CodelistVersion.objects.all()
        ]
    )

Comparison spreadsheet here

@Jongmassey
Copy link
Contributor Author

Observations:

  • number of overall codelist versions hasn't changed
  • number of compatible releases vastly increased in many cases
  • no instances of number of compatible releases decreasing

Spot-checking extreme examples (compatible counts >100) shows the top few are examples where all of the codes are invalid in the coding system (some user codelists, this un-published opensafely codelist which appears malformed in both its versions). This is something we may want to address (release compatibility calculation with bad codelists). The issue is that the compatibility is unchanged between those releases - it started off as zero and ended as zero!

Others appear to be genuine, e.g. this one contains a VMP code for which there is only one AMP and has only been one AMP for as long as it has existed so does genuinely have an extremely long release compatibility list.

@lucyb thoughts on whether the issue with "bad" codelist compatibility checking needs addressing?

@Jongmassey
Copy link
Contributor Author

Looking at one of the malformed old OpenSAFELY-owned codelists, as it has no searches, the hierarchy-comparison is what's giving the false impression of compatibility.

It's hierarchy is all descendent from the null node - I'll check some other but this might be a relatively easy case to catch

>>> CodelistVersion.objects.get_by_hash("0893ab89").hierarchy.data_for_cache()
'{
    "root": null,
    "nodes": [
        "0402010J0AAAMAM",
        "0402010J0BEABAJ",
        "0402010J0BEAGAC",
        "0402010J0AAALAL",
        "0402010J0AABKBK",
        "0402010J0AAA5A5",
        "0402010J0AAA7A7",
        "0402010J0BEAEAM",
        "0402010J0BEAAAA",
        "0402010J0AAA6A6",
        null,
        "0402010J0AAA8A8",
        "0402010J0AABGBG",
        "0402010J0AABJBJ",
        "0402010J0AAAGAG",
        "0402010J0BDABAL",
        "0402010J0AAAAAA",
        "0402010J0AABDBD",
        "0402010J0AAAIAI",
        "0402010J0BEAFAG",
        "0402010J0BDACAG",
        "0402010J0AABNBN",
        "0402010J0BDAAAK",
        "0402010J0AAACAC",
        "0402010J0AAA4A4",
        "0402010J0BH",
        "0402010J0BE",
        "0402010J0AAAKAK",
        "0402010J0AABCBC",
        "0402010J0BD",
        "0402010J0AA",
        "0402010J0BEAHAD",
        "0402010J0AABBBB",
        "0402010J0AAAJAJ",
        "0402010J0BEACAK",
        "0402010J0BBAABC",
        "0402010J0BB",
        "0402010J0",
        "0402010J0BEADAL",
        "0402010J0BHAABN",
        "0402010J0AAADAD",
        "0402010J0BDAEAC"
    ],
    "child_map": {
        "null": [
            "0402010J0AAAMAM",
            "0402010J0BEABAJ",
            "0402010J0BEAGAC",
            "0402010J0AAALAL",
            "0402010J0AABKBK",
            "0402010J0AAA5A5",
            "0402010J0AAA7A7",
            "0402010J0BEAEAM",
            "0402010J0BEAAAA",
            "0402010J0AAA6A6",
            "0402010J0AABGBG",
            "0402010J0AAA8A8",
            "0402010J0AABJBJ",
            "0402010J0AAAGAG",
            "0402010J0BDABAL",
            "0402010J0AAAAAA",
            "0402010J0AABDBD",
            "0402010J0AAAIAI",
            "0402010J0BEAFAG",
            "0402010J0BDACAG",
            "0402010J0AABNBN",
            "0402010J0BDAAAK",
            "0402010J0AAACAC",
            "0402010J0AAA4A4",
            "0402010J0BH",
            "0402010J0BE",
            "0402010J0AAAKAK",
            "0402010J0AABCBC",
            "0402010J0BD",
            "0402010J0AA",
            "0402010J0BEAHAD",
            "0402010J0AABBBB",
            "0402010J0AAAJAJ",
            "0402010J0BEACAK",
            "0402010J0BBAABC",
            "0402010J0BB",
            "0402010J0",
            "0402010J0BEADAL",
            "0402010J0BHAABN",
            "0402010J0AAADAD",
            "0402010J0BDAEAC"
        ]
    },
    "parent_map": {
        "0402010J0AAA6A6": [
            null
        ],
        "0402010J0AAACAC": [
            null
        ],
        "0402010J0AAA7A7": [
            null
        ],
        "0402010J0BH": [
            null
        ],
        "0402010J0AAALAL": [
            null
        ],
        "0402010J0AABJBJ": [
            null
        ],
        "0402010J0BEAFAG": [
            null
        ],
        "0402010J0AABBBB": [
            null
        ],
        "0402010J0AAA8A8": [
            null
        ],
        "0402010J0BHAABN": [
            null
        ],
        "0402010J0BEAEAM": [
            null
        ],
        "0402010J0AAAIAI": [
            null
        ],
        "0402010J0BEABAJ": [
            null
        ],
        "0402010J0BDACAG": [
            null
        ],
        "0402010J0BB": [
            null
        ],
        "0402010J0AABKBK": [
            null
        ],
        "0402010J0AA": [
            null
        ],
        "0402010J0BD": [
            null
        ],
        "0402010J0AAA4A4": [
            null
        ],
        "0402010J0BEAHAD": [
            null
        ],
        "0402010J0BDAAAK": [
            null
        ],
        "0402010J0BDAEAC": [
            null
        ],
        "0402010J0BBAABC": [
            null
        ],
        "0402010J0AAA5A5": [
            null
        ],
        "0402010J0AAADAD": [
            null
        ],
        "0402010J0BDABAL": [
            null
        ],
        "0402010J0BEAGAC": [
            null
        ],
        "0402010J0AABNBN": [
            null
        ],
        "0402010J0BEAAAA": [
            null
        ],
        "0402010J0AAAGAG": [
            null
        ],
        "0402010J0BE": [
            null
        ],
        "0402010J0AABGBG": [
            null
        ],
        "0402010J0AABCBC": [
            null
        ],
        "0402010J0BEACAK": [
            null
        ],
        "0402010J0AAAKAK": [
            null
        ],
        "0402010J0": [
            null
        ],
        "0402010J0AAAMAM": [
            null
        ],
        "0402010J0AAAJAJ": [
            null
        ],
        "0402010J0AAAAAA": [
            null
        ],
        "0402010J0BEADAL": [
            null
        ],
        "0402010J0AABDBD": [
            null
        ]
    },
    "_descendants_cache": {},
    "_ancestors_cache": {}
}'

@Jongmassey
Copy link
Contributor Author

Seems that way for the other ones on my list

>>> CodelistVersion.objects.get_by_hash("255efc95").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "36456811000001104", "36458511000001112", "36457111000001104", "36458411000001104"], "child_map": {"null": ["36458511000001112", "36457111000001104", "36458411000001104", "36456811000001104"]}, "parent_map": {"36458411000001104": [null], "36457111000001104": [null], "36456811000001104": [null], "36458511000001112": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("49396fa2").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "34376311000001104"], "child_map": {"null": ["34376311000001104"]}, "parent_map": {"34376311000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("4e9eef8e").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "32746111000001104", "28691111000001108", "18162111000001104", "21259011000001104", "21393311000001104", "29934011000001104", "21258811000001108", "32982711000001104"], "child_map": {"null": ["32746111000001104", "28691111000001108", "18162111000001104", "21259011000001104", "21393311000001104", "29934011000001104", "21258811000001108", "32982711000001104"]}, "parent_map": {"21393311000001104": [null], "32746111000001104": [null], "32982711000001104": [null], "28691111000001108": [null], "29934011000001104": [null], "21258811000001108": [null], "18162111000001104": [null], "21259011000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("0197e6e9").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "36456811000001104", "36458511000001112", "36457111000001104", "36458411000001104"], "child_map": {"null": ["36458511000001112", "36457111000001104", "36458411000001104", "36456811000001104"]}, "parent_map": {"36458411000001104": [null], "36457111000001104": [null], "36456811000001104": [null], "36458511000001112": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("0c62e6c1").hierarchy.data_for_cache()
'{"root": null, "nodes": [null, "34376311000001104"], "child_map": {"null": ["34376311000001104"]}, "parent_map": {"34376311000001104": [null]}, "_descendants_cache": {}, "_ancestors_cache": {}}'
>>> CodelistVersion.objects.get_by_hash("5393839e").hierarchy.data_for_cache()
'{"root": null, "nodes": [], "child_map": {}, "parent_map": {}, "_descendants_cache": {}, "_ancestors_cache": {}}'

Checking the other cases where there are Hierarchies with a null root node:

>>> null_hierarchy_codelistversions = []
>>> for cv in CodelistVersion.objects.filter(cached_hierarchy__isnull=False):
...     if cv.has_hierarchy and not cv.hierarchy.root:
...         null_hierarchy_codelistversions.append(cv)
... 
>>> len(null_hierarchy_codelistversions)
4676

These can't all be "bad" codelists, I'll do some more digging

@Jongmassey
Copy link
Contributor Author

The instances of CodelistVersions having even genuine very long compatibility lists has thrown up a frontend issue:
image

isn't particularly readable or useful!

Given that the way that compatibility is calculated, there can only ever be a contiguous series of compatible releases, I think it would be far more concise and useful just to tell the user the last release that was compatible - which given it's adjacency to the box showing the release with which the version was created nicely shows a timespan of compatibility:
image

With the fixes to the release compatibility checking,
some codelist versions have very long compatibility histories.

These do not fit at all well within the "Other compatible releases" box.
The release compatibility checking can only every produce a contiguous
series of releases,
so a user only really needs the first and last compatible releases.

Therefore, this commit changes this to show just these two releases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-bulild CodelistVersion's compatible_releases

3 participants