Define-JSON Technical Guide
This guide is aimed at data engineers and Python programmers working with clinical trial data pipelines. It assumes familiarity with JSON schemas, data modelling concepts, and ideally some exposure to CDISC standards (though the latter is not required).
- Architecture Overview
- Schema Foundation: LinkML
- The Root Object: MetaDataVersion
- ItemGroup and Item: The Data Layer
- Slicing: Parameter-Specific Definitions
- Conditions, WhereClauses, and RangeChecks
- Methods, FormalExpressions, and Analysis
- CodeLists and Controlled Vocabulary
- Semantic Layer: Coding and ReifiedConcept
- Data Products and Dataflows
- Versioning and Provenance
- XML ↔ JSON Conversion
- Reverse Engineering Metadata from Data
- Working with the Model in Python
- OID Reference Patterns
Define-JSON is a flat, reference-based JSON structure. Unlike deeply nested formats (e.g. XML hierarchies), most collections live at the top level of MetaDataVersion and reference each other by OID.
MetaDataVersion
├── itemGroups[] ← datasets, FHIR profiles, concept templates
│ └── items[] ← inlined within each group
├── items[] ← top-level template items (no group)
├── codeLists[] ← permissible value sets
├── conditions[] ← reusable logical expressions
├── whereClauses[] ← named data contexts (link conditions to structures)
├── methods[] ← derivation procedures
├── analyses[] ← analysis-specific method extensions
├── concepts[] ← abstract biomedical concepts (ReifiedConcept)
├── codings[] ← shared semantic tags
├── relationships[] ← inter-element semantic links
├── standards[] ← CDISC/external standard references
├── dictionaries[] ← external code systems (MedDRA, SNOMED, etc.)
├── dataProducts[] ← governed data product definitions
└── displays[] ← rendered analysis outputs
Key design decision: Items within an ItemGroup are inlined (embedded objects), but cross-references between groups use OID strings. This means you never dereference a separate lookup for items inside a group, but you do for codeList, method, applicableWhen, and similar cross-cutting references.
The model is defined in LinkML, a YAML-based schema language that generates JSON Schema, OWL, Python dataclasses, and more. The source schema is at https://cdisc.org/define-json.
Key LinkML concepts used throughout:
- is_a: single inheritance (e.g. ItemGroup is a GovernedElement)
- mixins: multiple inheritance for cross-cutting concerns (e.g. IsProfile, IsODMStandard)
- multivalued: true: the slot holds a list
- inlined: true: the object is embedded, not referenced by ID
- any_of: polymorphic ranges (e.g. owner can be a string, User, or Organization)
- identifier: true: marks the primary key slot (OID on all Identifiable classes)
You can generate Python dataclasses from the schema:
pip install linkml
gen-python https://cdisc.org/define-json > define_json_classes.py
MetaDataVersion is the tree root (LinkML tree_root: true), the top-level object in any serialised Define-JSON document.
| Field | Type | Description |
|---|---|---|
| OID | string | Local identifier. Use MDV.STUDY-001.v1 format for submissions |
| fileOID | string | Identifier for the ODM file |
| creationDateTime | datetime | ISO 8601 timestamp |
| odmVersion | string | e.g. "2.0" |
| fileType | string | "Snapshot" or "Transactional" |
| studyOID | string | Identifier for the parent study |
{
"OID": "MDV.LZZT.v1",
"fileOID": "ODM.LZZT",
"creationDateTime": "2024-01-15T10:00:00",
"odmVersion": "2.0",
"fileType": "Snapshot",
"studyOID": "STUDY.LZZT",
"studyName": "LZZT Phase II",
"itemGroups": [],
"codeLists": [],
"methods": [],
"conditions": [],
"whereClauses": []
}
MetaDataVersion
  is_a: GovernedElement
    mixins: [Identifiable, Labelled, Governed]
  mixins: [ODMFileMetadata, StudyMetadata]
This means MetaDataVersion picks up slots from all five classes: OID/uuid from Identifiable, name/label/description/coding/aliases from Labelled, mandatory/owner/purpose/lastUpdated/wasDerivedFrom/comments from Governed, ODM file fields from ODMFileMetadata, and study identification from StudyMetadata.
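As a rough illustration of how is_a and mixins compose, the sketch below models a small subset of these slots with plain Python dataclasses. The class bodies are assumptions made for illustration only; the classes that gen-python produces from the schema will carry more slots and richer types.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Each mixin contributes a handful of slots (illustrative subset, all optional).
@dataclass
class Identifiable:
    OID: Optional[str] = None
    uuid: Optional[str] = None


@dataclass
class Labelled:
    name: Optional[str] = None
    label: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Governed:
    mandatory: Optional[bool] = None
    owner: Optional[str] = None
    purpose: Optional[str] = None


@dataclass
class GovernedElement(Identifiable, Labelled, Governed):
    """is_a parent: bundles the three governance mixins."""


@dataclass
class ODMFileMetadata:
    fileOID: Optional[str] = None
    creationDateTime: Optional[str] = None
    odmVersion: Optional[str] = None
    fileType: Optional[str] = None


@dataclass
class StudyMetadata:
    studyOID: Optional[str] = None
    studyName: Optional[str] = None


@dataclass
class MetaDataVersion(GovernedElement, ODMFileMetadata, StudyMetadata):
    """Inherits slots from all five classes via the parent and the two mixins."""


mdv = MetaDataVersion(OID="MDV.LZZT.v1", fileOID="ODM.LZZT", studyOID="STUDY.LZZT")
slot_names = {f.name for f in fields(MetaDataVersion)}
print(sorted(slot_names))
```

Giving every slot a default keeps the dataclass field ordering valid regardless of the order in which the mixins are listed.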
ItemGroup maps to a dataset, FHIR resource profile, OMOP table, or form section depending on context. Item is a single variable/column within it.
| Slot | Type | Notes |
|---|---|---|
| OID | string (required) | Primary key, e.g. "IG.VS" |
| name | string | Short name, matches dataset name in data files |
| domain | string | CDISC domain abbreviation, e.g. "VS", "LB" |
| type | ItemGroupType enum | Dataset, DatasetSpecialization, FHIR, Form, etc. |
| structure | string | e.g. "One record per visit per vital sign test per subject" |
| items | Item[] | Inlined: full item objects, not references |
| keySequence | Item[] | OID references to items that form the sort/uniqueness key |
| slices | ItemGroup[] | Inlined sub-groups (parameter-specific specialisations) |
| implementsConcept | ReifiedConcept ref | Links to abstract biomedical concept |
| applicableWhen | WhereClause[] refs | When this group is in scope (OR logic across clauses) |
| standard | Standard ref | CDISC IG being implemented |
| wasDerivedFrom | ref | Template this group was derived from |
| Slot | Type | Notes |
|---|---|---|
| OID | string (required) | e.g. "IT.VS.VSTESTCD" |
| name | string | Variable name, e.g. "VSTESTCD" |
| dataType | DataType enum | text, integer, float, date, datetime, time, boolean |
| length | integer | Max character length |
| codeList | CodeList ref | Permissible value constraint |
| method | Method ref | Derivation procedure |
| origin | Origin | Source type and provenance |
| rangeChecks | RangeCheck[] | Edit checks / CORE rules |
| conceptProperty | ConceptProperty ref | Abstract property this item specialises |
| applicableWhen | WhereClause[] refs | Conditional applicability |
| wasDerivedFrom | ref | Template item this was derived from |
{
"OID": "IG.VS",
"name": "VS",
"label": "Vital Signs",
"domain": "VS",
"type": "Dataset",
"structure": "One record per vital sign per visit per subject",
"standard": "SDTMIG.v3.4",
"keySequence": ["IT.VS.STUDYID", "IT.VS.USUBJID", "IT.VS.VSTESTCD", "IT.VS.VISITNUM"],
"items": [
{
"OID": "IT.VS.STUDYID",
"name": "STUDYID",
"label": "Study Identifier",
"dataType": "text",
"length": 12,
"origin": { "type": "Assigned" }
},
{
"OID": "IT.VS.VSTESTCD",
"name": "VSTESTCD",
"label": "Vital Signs Test Short Name",
"dataType": "text",
"length": 8,
"codeList": "CL.VSTESTCD",
"origin": { "type": "Assigned" }
},
{
"OID": "IT.VS.VSORRES",
"name": "VSORRES",
"label": "Result or Finding in Original Units",
"dataType": "text",
"length": 200,
"origin": { "type": "Collected" },
"method": null
},
{
"OID": "IT.VS.VSORRESU",
"name": "VSORRESU",
"label": "Original Units",
"dataType": "text",
"length": 40,
"codeList": "CL.UNIT",
"origin": { "type": "Collected" }
}
],
"slices": [
{ "OID": "VL.VS.DIABP", "name": "VL.VS.DIABP", "type": "DatasetSpecialization" },
{ "OID": "VL.VS.SYSBP", "name": "VL.VS.SYSBP", "type": "DatasetSpecialization" }
]
}
Slices let you attach parameter-specific metadata without duplicating the parent group definition. A slice is itself an ItemGroup with type: "DatasetSpecialization" and an applicableWhen that scopes it.
This is the key structural improvement over Define-XML v2.1's ValueList approach: rather than grouping by variable (VSORRES, VSORRESU), slices group by clinical parameter (DIABP, SYSBP), so each slice carries both the result and the unit for that parameter.
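To make the regrouping concrete, here is a minimal sketch that turns per-variable value-level entries into per-parameter slices. The input shape (`variable`, `param` keys) and the `to_slices` helper are hypothetical, chosen only for illustration; the OID patterns follow the examples used in this guide.

```python
from collections import defaultdict

# Hypothetical value-level entries, one per (variable, parameter) pair.
value_level = [
    {"variable": "VSORRES", "param": "DIABP", "dataType": "float"},
    {"variable": "VSORRESU", "param": "DIABP", "codeList": "CL.MMHG_ONLY"},
    {"variable": "VSORRES", "param": "SYSBP", "dataType": "float"},
    {"variable": "VSORRESU", "param": "SYSBP", "codeList": "CL.MMHG_ONLY"},
]


def to_slices(entries, domain):
    """Group value-level entries by parameter and emit one slice per parameter."""
    by_param = defaultdict(list)
    for entry in entries:
        # Copy everything except the grouping keys into the slice item.
        item = {k: v for k, v in entry.items() if k not in ("variable", "param")}
        item["OID"] = f"IT.{domain}.{entry['param']}.{entry['variable']}"
        item["name"] = entry["variable"]
        by_param[entry["param"]].append(item)
    return [
        {
            "OID": f"VL.{domain}.{param}",
            "name": f"VL.{domain}.{param}",
            "type": "DatasetSpecialization",
            "domain": domain,
            "applicableWhen": [f"WC.{domain}.{param}"],
            "items": items,
        }
        for param, items in by_param.items()
    ]


slices = to_slices(value_level, "VS")
for s in slices:
    print(s["OID"], [i["name"] for i in s["items"]])
```

Each resulting slice holds both the result and the unit definition for one parameter, which is the grouping described above.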
{
"OID": "VL.VS.DIABP",
"name": "VL.VS.DIABP",
"label": "Diastolic Blood Pressure",
"type": "DatasetSpecialization",
"domain": "VS",
"applicableWhen": ["WC.VS.DIABP"],
"items": [
{
"OID": "IT.VS.DIABP.VSORRES",
"name": "VSORRES",
"label": "Diastolic BP Result",
"dataType": "float",
"rangeChecks": [
{
"item": "IT.VS.DIABP.VSORRES",
"comparator": "GE",
"checkValues": ["0"],
"softHard": "Soft"
},
{
"item": "IT.VS.DIABP.VSORRES",
"comparator": "LE",
"checkValues": ["300"],
"softHard": "Hard"
}
]
},
{
"OID": "IT.VS.DIABP.VSORRESU",
"name": "VSORRESU",
"label": "Diastolic BP Units",
"dataType": "text",
"codeList": "CL.MMHG_ONLY"
}
]
}
The WC.VS.DIABP where-clause restricts this slice to rows where VSTESTCD = "DIABP":
{
"OID": "WC.VS.DIABP",
"name": "WC.VS.DIABP",
"conditions": ["COND.VS.DIABP"]
}
WhereClause ← named context; referenced by items/groups via applicableWhen
└── Condition[] ← combined with AND within the clause
├── RangeCheck[] ← simple value comparisons (EQ, NE, IN, GE, LE, etc.)
├── Condition[] ← nested sub-conditions (recursive, for complex logic)
└── FormalExpression[] ← executable code for complex cases
Multiple WhereClause references on the same element are combined with OR logic: "applies when ANY of these clauses matches". Within a clause, Condition objects combine with AND. Inside a single Condition, the operator slot (AND or OR) controls how its range checks and nested conditions are combined.
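One way to see the three levels of combination is to render them as a readable expression. The sketch below is illustrative only: the function names are hypothetical, and it assumes the conditions of each clause have already been resolved from OID strings to inlined Condition dictionaries.

```python
def render_condition(cond: dict) -> str:
    """Render one Condition, joining its parts with the Condition's own operator."""
    operator = cond.get("operator", "AND")
    parts = []
    for rc in cond.get("rangeChecks", []):
        values = ", ".join(repr(v) for v in rc["checkValues"])
        parts.append(f"{rc['item']} {rc['comparator']} [{values}]")
    for sub in cond.get("conditions", []):
        parts.append(f"({render_condition(sub)})")
    return f" {operator} ".join(parts)


def render_where_clause(wc: dict) -> str:
    """Conditions inside one clause are ANDed together."""
    return " AND ".join(f"({render_condition(c)})" for c in wc.get("conditions", []))


def render_applicable_when(clauses: list) -> str:
    """Several clauses attached to one element are ORed together."""
    return " OR ".join(f"[{render_where_clause(wc)}]" for wc in clauses)


def vstestcd_clause(code: str) -> dict:
    # Hypothetical helper building a clause with one already-resolved Condition.
    check = {"item": "IT.VS.VSTESTCD", "comparator": "EQ", "checkValues": [code]}
    return {"conditions": [{"operator": "AND", "rangeChecks": [check]}]}


rendered = render_applicable_when([vstestcd_clause("DIABP"), vstestcd_clause("SYSBP")])
print(rendered)
```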
{
"OID": "COND.VS.DIABP",
"name": "COND.VS.DIABP",
"operator": "AND",
"rangeChecks": [
{
"item": "IT.VS.VSTESTCD",
"comparator": "EQ",
"checkValues": ["DIABP"],
"softHard": "Hard"
}
]
}
Comparator values: EQ, NE, LT, LE, GT, GE, IN, NOTIN
| Value | Meaning |
|---|---|
| Hard | Error: data is invalid if check fails |
| Soft | Warning: data is unusual but not necessarily wrong |
{
"OID": "COND.ADVERSE_SERIOUS",
"name": "COND.ADVERSE_SERIOUS",
"operator": "AND",
"conditions": [
{
"OID": "COND.AESER_YES",
"operator": "AND",
"rangeChecks": [
{ "item": "IT.AE.AESER", "comparator": "EQ", "checkValues": ["Y"] }
]
},
{
"OID": "COND.AESEV_OR",
"operator": "OR",
"rangeChecks": [
{ "item": "IT.AE.AESEV", "comparator": "EQ", "checkValues": ["SEVERE"] },
{ "item": "IT.AE.AESEV", "comparator": "EQ", "checkValues": ["LIFE THREATENING"] }
]
}
]
}
A Method is a reusable derivation procedure. Items reference methods via method: "MT.CALC_BMI".
{
"OID": "MT.CALC_BMI",
"name": "MT.CALC_BMI",
"label": "Calculate BMI",
"type": "Computation",
"expressions": [
{
"OID": "FE.CALC_BMI.SAS",
"context": "SAS",
"expression": "VSSTRESN = (WEIGHT_KG / (HEIGHT_M ** 2))",
"returnType": "float",
"parameters": [
{
"OID": "PARAM.WEIGHT_KG",
"name": "WEIGHT_KG",
"dataType": "float",
"required": true
},
{
"OID": "PARAM.HEIGHT_M",
"name": "HEIGHT_M",
"dataType": "float",
"required": true
}
]
},
{
"OID": "FE.CALC_BMI.PYTHON",
"context": "Python",
"expression": "bmi = weight_kg / (height_m ** 2)",
"returnType": "float"
}
]
}
Analysis extends Method with study-specific traceability fields. Use it when you need to document why and from what an analysis was run, not just how.
{
"OID": "AN.SUMMARY_VS",
"name": "AN.SUMMARY_VS",
"label": "Vital Signs Summary Statistics",
"type": "Computation",
"analysisReason": "Primary Efficacy",
"analysisPurpose": "Exploratory",
"inputData": ["IG.VS", "VL.VS.DIABP"],
"expressions": [
{
"OID": "FE.SUMMARY_VS.R",
"context": "R",
"expression": "vs_summary <- vs_data %>% group_by(VSTESTCD, VISIT) %>% summarise(n=n(), mean=mean(VSSTRESN, na.rm=TRUE), sd=sd(VSSTRESN, na.rm=TRUE))"
}
]
}
inputData accepts OIDs of ItemGroup or slice objects. Make sure every referenced Item (e.g. analysis variables passed as Parameter) has its parent ItemGroup listed here.
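A quick consistency check for inputData might look like the sketch below. The `unresolved_input_data` helper and the sample documents are hypothetical; the check only assumes that ItemGroups and their slices are inlined under itemGroups, as described earlier.

```python
def unresolved_input_data(analysis: dict, mdv: dict) -> list:
    """Return inputData OIDs that match no ItemGroup or slice in the document."""
    known = set()
    for ig in mdv.get("itemGroups", []):
        known.add(ig["OID"])
        for sl in ig.get("slices", []):
            known.add(sl["OID"])
    return [oid for oid in analysis.get("inputData", []) if oid not in known]


# Hypothetical minimal document and analysis for illustration.
sample_mdv = {
    "itemGroups": [
        {"OID": "IG.VS", "slices": [{"OID": "VL.VS.DIABP"}, {"OID": "VL.VS.SYSBP"}]}
    ]
}
sample_analysis = {"OID": "AN.SUMMARY_VS", "inputData": ["IG.VS", "VL.VS.DIABP", "IG.LB"]}

unresolved = unresolved_input_data(sample_analysis, sample_mdv)
print(unresolved)
```

Here IG.LB is reported because the sample document defines no such ItemGroup.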
{
"OID": "CL.VSTESTCD",
"name": "VSTESTCD",
"label": "Vital Signs Test Code",
"dataType": "text",
"standard": "CDISC/NCI",
"codeListItems": [
{
"codedValue": "DIABP",
"decode": "Diastolic Blood Pressure",
"coding": [{ "code": "C25299", "codeSystem": "NCI", "codeSystemVersion": "2023-09-25" }]
},
{
"codedValue": "SYSBP",
"decode": "Systolic Blood Pressure",
"coding": [{ "code": "C25298", "codeSystem": "NCI", "codeSystemVersion": "2023-09-25" }]
},
{
"codedValue": "TEMP",
"decode": "Temperature",
"weight": 3.0,
"coding": [{ "code": "C25206", "codeSystem": "NCI", "codeSystemVersion": "2023-09-25" }]
}
]
}
When the full enumeration lives in an external system (MedDRA, SNOMED, LOINC), use externalCodeList instead of codeListItems:
{
"OID": "CL.MEDDRA_PT",
"name": "MEDDRA_PT",
"label": "MedDRA Preferred Terms",
"dataType": "text",
"externalCodeList": {
"OID": "RES.MEDDRA",
"name": "MedDRA",
"href": "https://www.meddra.org",
"version": "26.1"
}
}
DataType values: text, integer, float, double, decimal, date, time, datetime, dateTime, boolean, base64Binary, hexBinary, anyURI
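A consumer has to treat the two code list styles shown above differently: an enumerated list can be checked locally, while an external one cannot. The following sketch shows one possible handling; the helper names are hypothetical, and treating external lists as "not locally checkable" is an assumption rather than behaviour defined by the schema.

```python
from typing import Optional, Set


def allowed_values(code_list: dict) -> Optional[Set[str]]:
    """Return the enumerated coded values, or None for an external code list."""
    if "externalCodeList" in code_list:
        return None  # the full enumeration lives in the external dictionary
    return {entry["codedValue"] for entry in code_list.get("codeListItems", [])}


def is_permitted(value: str, code_list: dict) -> bool:
    """Check a value locally; external code lists are passed through unchecked."""
    allowed = allowed_values(code_list)
    return True if allowed is None else value in allowed


vstestcd = {
    "OID": "CL.VSTESTCD",
    "codeListItems": [{"codedValue": "DIABP"}, {"codedValue": "SYSBP"}, {"codedValue": "TEMP"}],
}
meddra = {"OID": "CL.MEDDRA_PT", "externalCodeList": {"OID": "RES.MEDDRA", "version": "26.1"}}

print(is_permitted("DIABP", vstestcd), is_permitted("PULSE", vstestcd), is_permitted("Headache", meddra))
```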
The semantic layer is what separates Define-JSON from a pure structural schema. Every element can be anchored to ontologies, and datasets can declare which abstract biomedical concept they implement.
Attach standardised semantic tags to any element using the coding slot:
{
"OID": "IT.VS.VSORRES",
"name": "VSORRES",
"coding": [
{
"code": "C25712",
"codeSystem": "NCI",
"codeSystemVersion": "2023-09-25",
"decode": "Result",
"aliasType": "SameAs"
},
{
"code": "8480-6",
"codeSystem": "LOINC",
"codeSystemVersion": "2.76",
"decode": "Systolic blood pressure",
"aliasType": "NarrowMatch"
}
]
}
aliasType (the AliasPredicate enum) controls the relationship semantics: SameAs, BroadMatch, NarrowMatch, RelatedMatch, Implements, IsA.
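Because coding is a list, consumers will often want to filter it by code system or by relationship type. A minimal sketch, using a hypothetical `find_codings` helper over the plain JSON structure:

```python
def find_codings(element: dict, code_system=None, alias_type=None) -> list:
    """Return the codings on an element that match the given filters (None = any)."""
    return [
        c for c in element.get("coding", [])
        if (code_system is None or c.get("codeSystem") == code_system)
        and (alias_type is None or c.get("aliasType") == alias_type)
    ]


vsorres = {
    "OID": "IT.VS.VSORRES",
    "coding": [
        {"code": "C25712", "codeSystem": "NCI", "aliasType": "SameAs"},
        {"code": "8480-6", "codeSystem": "LOINC", "aliasType": "NarrowMatch"},
    ],
}

exact = find_codings(vsorres, alias_type="SameAs")
loinc = find_codings(vsorres, code_system="LOINC")
print([c["code"] for c in exact], [c["code"] for c in loinc])
```

Filtering on SameAs is a reasonable default when an exact equivalence is needed, for example when joining metadata across systems.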
ReifiedConcept makes an abstract concept — e.g. "Diastolic Blood Pressure" as defined in the CDISC Biomedical Concept model — explicit and referenceable. ItemGroups and Methods then declare that they implement it.
{
"OID": "BC.DIABP",
"name": "DiastolicBloodPressure",
"label": "Diastolic Blood Pressure",
"href": "https://library.cdisc.org/api/cosmos/v2/bc/C25299",
"coding": [
{ "code": "C25299", "codeSystem": "NCI", "decode": "Diastolic Blood Pressure" }
],
"properties": [
{
"OID": "BCP.DIABP.RESULT",
"name": "result",
"label": "Result Value",
"minOccurs": 1,
"maxOccurs": 1,
"codeList": null
},
{
"OID": "BCP.DIABP.UNIT",
"name": "unit",
"label": "Unit of Measure",
"minOccurs": 1,
"maxOccurs": 1,
"codeList": "CL.MMHG_ONLY"
}
]
}
An ItemGroup then declares "implementsConcept": "BC.DIABP" and each Item declares "conceptProperty": "BCP.DIABP.RESULT", forming a typed, verifiable link from concrete implementation to abstract definition.
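The "verifiable" part can be sketched as a coverage check: every concept property with minOccurs of at least 1 should be claimed by some item in the implementing group. The helper below is hypothetical and treats a missing minOccurs as 0, which is an assumption.

```python
def missing_concept_properties(item_group: dict, concept: dict) -> list:
    """List required concept properties that no item in the group claims to implement."""
    implemented = {
        item["conceptProperty"]
        for item in item_group.get("items", [])
        if item.get("conceptProperty")
    }
    return [
        prop["OID"]
        for prop in concept.get("properties", [])
        if prop.get("minOccurs", 0) >= 1 and prop["OID"] not in implemented
    ]


concept = {
    "OID": "BC.DIABP",
    "properties": [
        {"OID": "BCP.DIABP.RESULT", "minOccurs": 1},
        {"OID": "BCP.DIABP.UNIT", "minOccurs": 1},
    ],
}
# Hypothetical slice in which the unit item has not yet been linked to its property.
slice_group = {
    "OID": "VL.VS.DIABP",
    "implementsConcept": "BC.DIABP",
    "items": [
        {"OID": "IT.VS.DIABP.VSORRES", "conceptProperty": "BCP.DIABP.RESULT"},
        {"OID": "IT.VS.DIABP.VSORRESU"},
    ],
}

missing = missing_concept_properties(slice_group, concept)
print(missing)
```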
For pipeline and data contract use cases, DataProduct and Dataflow express the supply/demand boundary.
A Dataflow declares what structure is expected — before any concrete data exists. Think of it as an interface definition.
{
"OID": "DF.VS_TRANSFER",
"name": "DF.VS_TRANSFER",
"label": "Vital Signs Transfer Agreement",
"structure": "IG.VS",
"dimensionConstraint": ["IT.VS.USUBJID", "IT.VS.VSTESTCD", "IT.VS.VISITNUM"],
"version": "1.0"
}
{
"OID": "DP.CLINICAL_DATA_V1",
"name": "ClinicalDataPackage",
"label": "Clinical Data Package v1",
"dataProductOwner": "Data Management Team",
"lifecycleStatus": "Active",
"domain": "SDTM",
"inputDataflow": ["DF.VS_TRANSFER", "DF.LB_TRANSFER"],
"outputDataset": ["DS.VS_FINAL", "DS.LB_FINAL"],
"outputPort": [
{
"OID": "SVC.FHIR_API",
"name": "FHIR R4 API",
"protocol": "HTTPS",
"resourceType": "HL7-FHIR",
"href": "https://api.example.com/fhir/r4",
"securitySchemaType": "OAuth2"
}
]
}
{
"OID": "DS.VS_FINAL",
"name": "vs_final.xpt",
"structuredBy": "IG.VS",
"describedBy": "DF.VS_TRANSFER",
"conformsTo": "SDTM v1.8",
"dataExtractionDate": "2024-01-10",
"validFrom": "2023-01-01",
"validTo": "2023-12-31",
"distribution": [
{
"format": "application/x-xpt",
"accessService": {
"OID": "SVC.SFTP",
"protocol": "SFTP",
"href": "sftp://transfers.example.com/sdtm/"
}
}
]
}
Each MetaDataVersion is a complete, immutable snapshot. Derivation is tracked via wasDerivedFrom, which accepts OID strings or typed object references.
{
"OID": "MDV.STUDY-001.v2",
"wasDerivedFrom": "MDV.STUDY-001.v1",
"creationDateTime": "2024-06-01T09:00:00",
"studyOID": "STUDY.001",
"itemGroups": [
{
"OID": "IG.VS.v2",
"wasDerivedFrom": "IG.VS.v1",
"items": [
{
"OID": "IT.VS.VSORRES.v2",
"wasDerivedFrom": "IT.VS.VSORRES.v1",
"name": "VSORRES",
"dataType": "float"
}
]
}
]
}
This pattern supports:
- Template reuse: a study-specific ItemGroup derives from a CDISC standard template
- Study amendment tracking: each protocol amendment creates a new MetaDataVersion linked to the prior one
- Cross-study comparison: shared wasDerivedFrom ancestry identifies equivalent variables across studies
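Cross-study comparison can be sketched as walking the wasDerivedFrom chain of two elements and checking for a common ancestor. The example assumes wasDerivedFrom values are plain OID strings collected into a dictionary; the template OID `TPL.VSORRES` and the second study's item OID are hypothetical.

```python
def ancestry(oid: str, derived_from: dict) -> list:
    """Follow wasDerivedFrom links from an OID back to its earliest known ancestor."""
    chain = [oid]
    seen = {oid}
    while chain[-1] in derived_from:
        parent = derived_from[chain[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def share_ancestor(a: str, b: str, derived_from: dict) -> bool:
    """True when the two elements have any OID in common along their ancestry."""
    return bool(set(ancestry(a, derived_from)) & set(ancestry(b, derived_from)))


# child OID -> wasDerivedFrom OID, gathered from one or more documents
derived_from = {
    "IT.VS.VSORRES.v2": "IT.VS.VSORRES.v1",
    "IT.VS.VSORRES.v1": "TPL.VSORRES",
    "IT.STUDY2.VSORRES": "TPL.VSORRES",
}

chain = ancestry("IT.VS.VSORRES.v2", derived_from)
related = share_ancestor("IT.VS.VSORRES.v2", "IT.STUDY2.VSORRES", derived_from)
print(chain, related)
```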
{
"OID": "IT.VS.VSDY",
"name": "VSDY",
"label": "Study Day of Vital Signs",
"dataType": "integer",
"origin": {
"type": "Derived",
"source": "Sponsor",
"sourceItems": [
{
"item": "IT.VS.VSDTC",
"resource": ["IG.VS"],
"document": null
},
{
"item": "IT.DM.RFSTDTC",
"resource": ["IG.DM"]
}
]
},
"method": "MT.CALC_STUDY_DAY"
}
OriginType values: Collected, Derived, Assigned, Protocol, eDT
OriginSource values: Investigator, Sponsor, Subject, Vendor
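The sourceItems list makes item-level lineage easy to extract. The sketch below turns an item's origin into (source, target) edges that could feed a lineage graph; `lineage_edges` is a hypothetical helper, not part of the Define-JSON tooling.

```python
def lineage_edges(item: dict) -> list:
    """Return (source item OID, derived item OID) pairs from an item's origin."""
    origin = item.get("origin") or {}
    return [(src["item"], item["OID"]) for src in origin.get("sourceItems", [])]


vsdy = {
    "OID": "IT.VS.VSDY",
    "origin": {
        "type": "Derived",
        "source": "Sponsor",
        "sourceItems": [
            {"item": "IT.VS.VSDTC", "resource": ["IG.VS"]},
            {"item": "IT.DM.RFSTDTC", "resource": ["IG.DM"]},
        ],
    },
    "method": "MT.CALC_STUDY_DAY",
}

edges = lineage_edges(vsdy)
print(edges)
```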
git clone https://github.com/TeMeta/define-json.git
cd define-json
pip install poetry
poetry install

from pathlib import Path

from src.define_json.converters.xml_to_json import PortableDefineXMLToJSONConverter
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter

# Define-XML → Define-JSON
xml_converter = PortableDefineXMLToJSONConverter()
json_data = xml_converter.convert_file(
    Path('data/define-360i.xml'),
    Path('data/define-360i.json')
)

# Define-JSON → Define-XML (separate name so the two converters are not confused)
json_converter = DefineJSONToXMLConverter()
xml_root = json_converter.convert_file(
    Path('data/define-360i.json'),
    Path('data/define-360i-recreated.xml')
)

# XML → JSON
poetry run python -m define_json xml2json data/define.xml data/output.json
# JSON → XML
poetry run python -m define_json json2xml data/input.json data/output.xml
# HTML rendering (no CORS issues for browser viewing)
poetry run python -m define_json json2html input.json output.html
# Roundtrip validation
poetry run python -m define_json roundtrip data/original.xml
# Schema validation
poetry run python -m define_json validate data/input.json
| Aspect | Define-XML v2.1 | Define-JSON |
|---|---|---|
| ValueList grouping | By variable (VSORRES, VSORRESU) | By parameter (DIABP, SYSBP, TEMP) — clinically meaningful |
| WhereClause deduplication | Separate WC per variable per parameter | Shared WC per parameter — 27 → 14 in the 360i sample |
| JSON size | N/A | ~33% smaller than source XML (98KB → 66KB) |
| Reference model | Nested XML with repeated attribute XML | Flat JSON with OID references |
Generate a Define-JSON skeleton from existing Dataset-JSON files:
python scripts/reverse_engineer_define.py examples/sample_dataset_lb.json
This produces four output files:
| File | Contents |
|---|---|
| define_metadata.json | Inferred Define-JSON structure (ItemGroups, Items, CodeLists) |
| sdmx_policy_suggestion.yaml | Suggested SDMX dimension/measure assignments |
| analysis_summary.json | Per-variable statistics and confidence scores for data type inference |
| reverse_engineering_report.md | Human-readable audit of the inference process |
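To give a feel for what a confidence score for data type inference could mean, here is a deliberately simple sketch. It is not the algorithm used by reverse_engineer_define.py; the `infer_data_type` function, the parse-success scoring, and the 0.95 threshold are all assumptions made for illustration.

```python
def infer_data_type(values, threshold=0.95):
    """Guess a Define-JSON dataType and a confidence from a column of raw values."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "text", 0.0  # nothing to infer from

    def fraction_parsed(parser):
        parsed = 0
        for v in present:
            try:
                parser(str(v))
                parsed += 1
            except ValueError:
                pass
        return parsed / len(present)

    # Try the narrowest type first; fall back to text when neither fits.
    for name, parser in (("integer", int), ("float", float)):
        score = fraction_parsed(parser)
        if score >= threshold:
            return name, score
    return "text", 1.0


print(infer_data_type(["1", "2", "3"]))
print(infer_data_type(["1.5", "2", None]))
print(infer_data_type(["a", "1"]))
```

Here the confidence is simply the share of non-missing values that parse as the chosen type, so a column with a few stray text values can still be reported as numeric with a score below 1.0.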
import json

# Load Dataset-JSON
with open('examples/sample_dataset_lb.json') as f:
    dataset = json.load(f)

# The reverse engineering script outputs structured metadata
# that can be post-processed:
with open('define_metadata.json') as f:
    metadata = json.load(f)

# Iterate inferred items
for item_group in metadata.get('itemGroups', []):
    print(f"Dataset: {item_group['name']}")
    for item in item_group.get('items', []):
        print(f"  {item['name']}: {item['dataType']} (confidence: {item.get('confidence', 'n/a')})")

import json
from pathlib import Path

with open('data/define-360i.json') as f:
    mdv = json.load(f)

# Build an OID lookup for O(1) cross-reference resolution
oid_index = {}
for ig in mdv.get('itemGroups', []):
    oid_index[ig['OID']] = ig
    for item in ig.get('items', []):
        oid_index[item['OID']] = item
    # Slices are inlined ItemGroups with their own items
    for sl in ig.get('slices', []):
        oid_index[sl['OID']] = sl
        for item in sl.get('items', []):
            oid_index[item['OID']] = item
# Top-level collections that other elements reference by OID
for key in ('codeLists', 'methods', 'conditions', 'whereClauses'):
    for obj in mdv.get(key, []):
        oid_index[obj['OID']] = obj

# Resolve an item's code list
def get_codelist(item):
    cl_oid = item.get('codeList')
    if not cl_oid:
        return None
    return oid_index.get(cl_oid)

# Example: find all items with a specific data type
float_items = [
    (ig['name'], item['name'])
    for ig in mdv.get('itemGroups', [])
    for item in ig.get('items', [])
    if item.get('dataType') == 'float'
]

def resolve_condition(cond):
    """Accept an inlined Condition dict or an OID string reference."""
    if isinstance(cond, dict):
        return cond
    if cond in oid_index:
        return oid_index[cond]
    # Fall back to the top-level list in case conditions were not indexed
    return next(c for c in mdv.get('conditions', []) if c['OID'] == cond)

def evaluate_where_clause(wc_oid, row: dict) -> bool:
    """Evaluate a WhereClause against a data row. Returns True if all conditions pass."""
    wc = oid_index[wc_oid]
    conditions = wc.get('conditions', [])
    # Within a WhereClause, all conditions must be true (AND logic)
    return all(evaluate_condition(resolve_condition(c), row) for c in conditions)

def evaluate_condition(condition: dict, row: dict) -> bool:
    operator = condition.get('operator', 'AND')
    range_checks = condition.get('rangeChecks', [])
    sub_conditions = condition.get('conditions', [])
    results = []
    for rc in range_checks:
        # Range checks reference items by OID; rows are keyed by variable name
        item = oid_index.get(rc['item'], {})
        value = row.get(item.get('name', ''))
        results.append(evaluate_range_check(rc, value))
    for sub in sub_conditions:
        results.append(evaluate_condition(resolve_condition(sub), row))
    if operator == 'AND':
        return all(results)
    elif operator == 'OR':
        return any(results)
    return False

def evaluate_range_check(rc: dict, value) -> bool:
    comparator = rc['comparator']
    check_values = rc['checkValues']
    if comparator in ('EQ', 'IN'):
        return str(value) in check_values
    if comparator in ('NE', 'NOTIN'):
        return str(value) not in check_values
    # Ordered comparisons need numbers; missing or non-numeric data fails the check
    try:
        number, bound = float(value), float(check_values[0])
    except (TypeError, ValueError):
        return False
    if comparator == 'GE':
        return number >= bound
    elif comparator == 'LE':
        return number <= bound
    elif comparator == 'GT':
        return number > bound
    elif comparator == 'LT':
        return number < bound
    return False

# Determine which slice applies to a given row
def get_applicable_slice(item_group: dict, row: dict):
    for slice_group in item_group.get('slices', []):
        applicable_when = slice_group.get('applicableWhen', [])
        # OR logic: row matches if ANY where-clause matches
        if any(evaluate_where_clause(wc_oid, row) for wc_oid in applicable_when):
            return slice_group
    return None

def validate_row(row: dict, item_group: dict) -> list[dict]:
    """Returns a list of validation failures for a data row."""
    failures = []
    # Determine the applicable slice (if any)
    slice_group = get_applicable_slice(item_group, row)
    # Slice items override domain items of the same name and supplement the rest
    items_by_name = {item['name']: item for item in item_group.get('items', [])}
    if slice_group:
        items_by_name.update({item['name']: item for item in slice_group.get('items', [])})
    for item_name, item in items_by_name.items():
        value = row.get(item_name)
        # Code list check
        cl = get_codelist(item)
        if cl and value is not None:
            allowed = {i['codedValue'] for i in cl.get('codeListItems', [])}
            if allowed and str(value) not in allowed:
                failures.append({
                    'item': item_name,
                    'severity': 'Hard',
                    'message': f"Value '{value}' not in code list {cl['OID']}"
                })
        # Range checks
        for rc in item.get('rangeChecks', []):
            if value is not None and not evaluate_range_check(rc, value):
                failures.append({
                    'item': item_name,
                    'severity': rc.get('softHard', 'Hard'),
                    'message': f"Range check failed: {rc['comparator']} {rc['checkValues']}"
                })
    return failures
OIDs are the primary key mechanism. The schema uses CDISC conventions for regulatory submissions but allows any string for internal use. Recommended patterns:
| Object type | Pattern | Example |
|---|---|---|
| MetaDataVersion | MDV.<study>.<version> | MDV.LZZT.v1 |
| ItemGroup | IG.<domain> | IG.VS |
| ItemGroup slice | VL.<domain>.<param> | VL.VS.DIABP |
| Item | IT.<domain>.<varname> | IT.VS.VSTESTCD |
| CodeList | CL.<name> | CL.VSTESTCD |
| Method | MT.<name> | MT.CALC_BMI |
| Analysis | AN.<name> | AN.SUMMARY_VS |
| WhereClause | WC.<domain>.<param> | WC.VS.DIABP |
| Condition | COND.<name> | COND.DIABP_FILTER |
| FormalExpression | FE.<method>.<context> | FE.CALC_BMI.SAS |
| ReifiedConcept | BC.<name> | BC.DIABP |
| Dataflow | DF.<name> | DF.VS_TRANSFER |
| Dataset | DS.<name> | DS.VS_FINAL |
| DataProduct | DP.<name> | DP.CLINICAL_DATA_V1 |
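Since these patterns are recommendations rather than schema constraints, a lint-style check is one way to apply them. The sketch below encodes a few of the patterns as regular expressions; the exact expressions and the `matches_convention` helper are assumptions, and a non-matching OID is not necessarily invalid.

```python
import re

# Illustrative regular expressions for a subset of the recommended patterns.
OID_PATTERNS = {
    "MetaDataVersion": r"MDV\.[^.]+\..+",
    "ItemGroup": r"IG\.[^.]+",
    "Slice": r"VL\.[^.]+\.[^.]+",
    "Item": r"IT\.[^.]+\..+",
    "CodeList": r"CL\..+",
    "Method": r"MT\..+",
    "WhereClause": r"WC\.[^.]+\.[^.]+",
}


def matches_convention(kind: str, oid: str) -> bool:
    """True when the OID follows the recommended pattern for its object type."""
    pattern = OID_PATTERNS.get(kind)
    return bool(pattern and re.fullmatch(pattern, oid))


for kind, oid in [("Item", "IT.VS.VSTESTCD"), ("ItemGroup", "VS"), ("Slice", "VL.VS.DIABP")]:
    print(kind, oid, matches_convention(kind, oid))
```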
Understanding when objects are inlined vs. referenced by OID string is essential for correct parsing:
| Slot | Inlined? | Notes |
|---|---|---|
| MetaDataVersion.itemGroups | ✅ Yes | Full objects embedded |
| ItemGroup.items | ✅ Yes | Full objects embedded |
| ItemGroup.slices | ✅ Yes | Full objects embedded |
| MetaDataVersion.codeLists | ✅ Yes | Full objects embedded |
| MetaDataVersion.conditions | ✅ Yes | Full objects embedded |
| MetaDataVersion.whereClauses | ✅ Yes | Full objects embedded |
| Item.codeList | ❌ OID ref | String pointing to MetaDataVersion.codeLists[].OID |
| Item.method | ❌ OID ref | String pointing to MetaDataVersion.methods[].OID |
| Item.applicableWhen | ❌ OID ref list | Strings pointing to whereClauses[].OID |
| ItemGroup.applicableWhen | ❌ OID ref list | Strings pointing to whereClauses[].OID |
| ItemGroup.implementsConcept | ❌ OID ref | String pointing to concepts[].OID |
| MetaDataVersion.concepts | ❌ OID ref list | Not inlined; referenced by OID from itemGroups |
This means when building an index, you must traverse both inlined and top-level lists. The concepts array is notably not inlined despite being a top-level collection — it is always referenced by OID.
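A practical consequence is that OID references can dangle. The sketch below walks the inlined groups, items, and slices and reports references that point at nothing in the corresponding top-level collection. The `dangling_references` helper and the sample document are hypothetical, and the check covers only the reference slots listed in the table above.

```python
def dangling_references(mdv: dict) -> list:
    """Return (element OID, slot, missing OID) for references that do not resolve."""
    def oids(key):
        return {obj["OID"] for obj in mdv.get(key, [])}

    code_lists, methods = oids("codeLists"), oids("methods")
    where_clauses, concepts = oids("whereClauses"), oids("concepts")
    problems = []

    def check_applicable_when(element):
        for wc in element.get("applicableWhen", []):
            if wc not in where_clauses:
                problems.append((element["OID"], "applicableWhen", wc))

    def walk(group):
        check_applicable_when(group)
        concept = group.get("implementsConcept")
        if concept and concept not in concepts:
            problems.append((group["OID"], "implementsConcept", concept))
        for item in group.get("items", []):
            check_applicable_when(item)
            for slot, known in (("codeList", code_lists), ("method", methods)):
                ref = item.get(slot)
                if ref and ref not in known:
                    problems.append((item["OID"], slot, ref))
        for sl in group.get("slices", []):  # slices are inlined ItemGroups
            walk(sl)

    for ig in mdv.get("itemGroups", []):
        walk(ig)
    return problems


sample = {
    "codeLists": [{"OID": "CL.VSTESTCD"}],
    "whereClauses": [{"OID": "WC.VS.DIABP"}],
    "itemGroups": [{
        "OID": "IG.VS",
        "items": [
            {"OID": "IT.VS.VSTESTCD", "codeList": "CL.VSTESTCD"},
            {"OID": "IT.VS.VSORRESU", "codeList": "CL.UNIT"},
        ],
        "slices": [{"OID": "VL.VS.DIABP", "applicableWhen": ["WC.VS.DIABP", "WC.VS.MISSING"]}],
    }],
}
problems = dangling_references(sample)
print(problems)
```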
- Schema reference — full class, slot, and enumeration listings
- About page — design rationale and use cases
- Versioning architecture — copy-and-link model in detail
- Conversion guide — full XML↔JSON converter documentation
- GitHub repository — source schema, converters, examples, notebooks
- CDISC LinkML — related CDISC tooling ecosystem
© 2026 Clinical Data Interchange Standards Consortium