Conversation

@alexfurmenkov
Collaborator

No description provided.

@alexfurmenkov marked this pull request as ready for review October 21, 2025 13:20
…into 715-usdm-schema-validation

# Conflicts:
#	cdisc_rules_engine/dataset_builders/dataset_builder_factory.py
Collaborator

What is the source for creating these files? If possible, the logic for creating them should be added to the update-cache call.

Collaborator Author

A combination of https://github.com/cdisc-org/DDF-RA/blob/main/Deliverables/API/USDM_API.json and this script:

import argparse
import json
import os


def parse_arguments():
    parser = argparse.ArgumentParser()
    # The input path is needed below, so fail early if it is missing.
    parser.add_argument(
        "-i", "--input_file", required=True, help="USDM OpenAPI JSON file"
    )
    return parser.parse_args()


def replace_deep(data, a, b):
    """Recursively replace substring a with b in every string value of data."""
    if isinstance(data, str):
        return data.replace(a, b)
    elif isinstance(data, dict):
        return {k: replace_deep(v, a, b) for k, v in data.items()}
    elif isinstance(data, list):
        return [replace_deep(v, a, b) for v in data]
    else:
        # Numbers, booleans and None are returned unchanged.
        return data


def rewrite_refs(data):
    """Point OpenAPI references at $defs and drop the "-Input" suffix."""
    return replace_deep(replace_deep(data, "components/schemas", "$defs"), "-Input", "")


args = parse_arguments()

input_dir, filename = os.path.split(args.input_file)
outfname = filename.split(".")[0] + "_schemas"

with open(args.input_file, encoding="utf-8") as f:
    openapi = json.load(f)

jschema = {"$defs": {}}

for sn, sd in openapi["components"]["schemas"].items():
    if sn == "Wrapper-Input":
        # The wrapper's keys are copied to the top level of the JSON Schema.
        for k, v in sd.items():
            jschema[k] = rewrite_refs(v)
    elif not sn.endswith("-Output"):
        # Every other non-output schema becomes a definition under $defs.
        jschema["$defs"][sn.replace("-Input", "")] = rewrite_refs(sd)

for v in jschema["$defs"].values():
    # Reject properties that the definition does not declare.
    v.update({"additionalProperties": False})
    for pn, pd in v.get("properties", {}).items():
        # Required array properties must contain at least one item.
        if pn in v.get("required", []) and pd.get("type", "") == "array":
            pd.update({"minItems": 1})

with open(os.path.join(input_dir, outfname + ".json"), "w", encoding="utf-8") as f:
    json.dump(jschema, f, ensure_ascii=False, indent=4)

Btw, this command was used to retrieve the 3rd version of the schema:
git --no-pager --git-dir DDF-RA.git show --format=format:"%B" v3.0.0:Deliverables/API/USDM_API.json > USDM_API_v3-0-0.json
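
To make the effect of the conversion concrete, here is a minimal sketch that applies the same steps (rewrite `components/schemas` references to `$defs`, drop the `-Input` suffix, skip `-Output` schemas, then tighten each definition) to a tiny in-memory OpenAPI fragment. The contents of `toy_openapi` are invented for illustration and are not taken from USDM_API.json, and `convert` is a hypothetical wrapper name, not something defined in this PR.

```python
import json


def replace_deep(data, a, b):
    """Recursively replace substring a with b in every string value."""
    if isinstance(data, str):
        return data.replace(a, b)
    if isinstance(data, dict):
        return {k: replace_deep(v, a, b) for k, v in data.items()}
    if isinstance(data, list):
        return [replace_deep(v, a, b) for v in data]
    return data


def convert(openapi):
    """Hypothetical wrapper around the conversion steps of the script."""
    jschema = {"$defs": {}}
    for name, definition in openapi["components"]["schemas"].items():
        cleaned = replace_deep(
            replace_deep(definition, "components/schemas", "$defs"), "-Input", ""
        )
        if name == "Wrapper-Input":
            # The wrapper's keys are copied to the top level of the schema.
            jschema.update(cleaned)
        elif not name.endswith("-Output"):
            jschema["$defs"][name.replace("-Input", "")] = cleaned
    for definition in jschema["$defs"].values():
        definition["additionalProperties"] = False
        required = definition.get("required", [])
        for prop_name, prop in definition.get("properties", {}).items():
            if prop_name in required and prop.get("type", "") == "array":
                prop["minItems"] = 1
    return jschema


# Made-up input: one input schema, one output schema, and the wrapper.
toy_openapi = {
    "components": {
        "schemas": {
            "Study-Input": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "versions": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "versions"],
            },
            "Study-Output": {"type": "object"},
            "Wrapper-Input": {
                "type": "object",
                "properties": {"study": {"$ref": "#/components/schemas/Study-Input"}},
                "required": ["study"],
            },
        }
    }
}

schema = convert(toy_openapi)
print(json.dumps(schema, indent=4))
```

In the printed result the wrapper's reference points at `#/$defs/Study`, the `Study-Output` schema is gone, and the required `versions` array carries `minItems: 1`. Note that only the entries under `$defs` receive `additionalProperties: false`; the top-level wrapper is left as it was.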

Collaborator Author

@gerrycampion I am not sure it should go into the update-cache call, since the schema definition is loaded into the LibraryMetadataContainer inside the get_library_metadata_from_cache function:

library_metadata: LibraryMetadataContainer = get_library_metadata_from_cache(args)

https://github.com/cdisc-org/cdisc-rules-engine/pull/1375/files#diff-645421107f064022b54581bdaf972ee1baa090acef121e83d4b27be3f50ed802R146
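
As a rough illustration of the load-at-startup approach described here, the sketch below reads a generated schema file for a given USDM version from a directory. It is not the engine's actual code: `load_usdm_schema`, the `schema_dir` parameter, and the assumption that files are named after the conversion script's output (`USDM_API_v3-0-0.json` becomes `USDM_API_v3-0-0_schemas.json`) are all hypothetical.

```python
import json
from pathlib import Path


def load_usdm_schema(schema_dir, version):
    """Load the generated JSON Schema for a USDM version such as "3-0-0".

    Hypothetical helper: the file name pattern mirrors the conversion
    script's output naming and may differ from what the engine uses.
    """
    path = Path(schema_dir) / f"USDM_API_v{version}_schemas.json"
    if not path.is_file():
        raise FileNotFoundError(f"No USDM schema for version {version}: {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

A caller would then keep the returned dictionary alongside the rest of the library metadata, so that validation rules can look it up by version without touching the file system again.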

Collaborator

For consistency, you could - and probably should - also get the v4 spec (and any subsequent versions) in the same way as you get the v3 spec:

git --no-pager --git-dir DDF-RA.git show --format=format:"%B" v4.0.0:Deliverables/API/USDM_API.json > USDM_API_v4-0-0.json

(It's worth noting that git ... show --format=format:"%B" is used because this does a binary transfer that avoids any platform-specific encoding).

Though ideally the specs would be in the Library already...
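
If the retrieval is ever scripted for several versions, one possible shape is sketched below. It wraps the same git command quoted in this thread with `subprocess` and writes the captured bytes unchanged, so no text decoding happens on the Python side. The function names are hypothetical, and the sketch assumes a local `DDF-RA.git` clone and the `v3.0.0` / `v4.0.0` tags mentioned above.

```python
import subprocess


def git_show_command(git_dir, tag, path="Deliverables/API/USDM_API.json"):
    """Build the argument list for the git command quoted in this thread."""
    return [
        "git", "--no-pager", "--git-dir", git_dir,
        "show", "--format=format:%B", f"{tag}:{path}",
    ]


def output_name(tag):
    """Map a tag to a file name, e.g. v3.0.0 -> USDM_API_v3-0-0.json."""
    return f"USDM_API_{tag.replace('.', '-')}.json"


def export_spec(git_dir, tag, out_path):
    """Run git and write the spec at the given tag to out_path as raw bytes."""
    result = subprocess.run(
        git_show_command(git_dir, tag), check=True, capture_output=True
    )
    with open(out_path, "wb") as f:
        f.write(result.stdout)


def export_all(git_dir, tags):
    """Export one spec file per tag into the current directory."""
    for tag in tags:
        export_spec(git_dir, tag, output_name(tag))
```

Calling `export_all("DDF-RA.git", ["v3.0.0", "v4.0.0"])` would produce `USDM_API_v3-0-0.json` and `USDM_API_v4-0-0.json`, which can then be fed to the conversion script.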

Collaborator Author

@ASL-rmarshall Yes, I used the same git command to retrieve the 4th version as well.

Collaborator

@alexfurmenkov If you can't put it in update-cache, I think you should at least add a GitHub Action for it, like "prerelease-update-usdm-schema.yml".

Collaborator Author

Updated the README with USDM schema update instructions. We've agreed that the schema will be updated rarely, so manual instructions should be good enough for now.

@alexfurmenkov merged commit 762b8f9 into main Nov 5, 2025
11 checks passed
@alexfurmenkov deleted the 715-usdm-schema-validation branch November 5, 2025 14:58
@alexfurmenkov linked an issue Nov 6, 2025 that may be closed by this pull request
@alexfurmenkov removed a link to an issue Nov 6, 2025
@alexfurmenkov linked an issue Nov 6, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Implement schema-based validation for USDM JSON data

5 participants