@btkcodedev
Contributor

closes #15421

Fix for Boolean Schema Properties in JSON Schema Ingestion

What: Fixed TypeError: argument of type 'bool' is not iterable, which was raised when a JSON schema property is set to a boolean value.

Root cause

The methods expected a Dict but received a bool when a property was set to true or false, which the JSON Schema spec allows.

Code Analysis

Main file: metadata-ingestion/src/datahub/ingestion/extractor/json_schema_util.py
-> The code expected every schema to be a Dict object
-> It crashed at the Ellipsis in schema and "key" in schema checks when the schema was a boolean
-> Boolean schemas are valid per JSON Schema: true accepts any JSON, false never validates
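
To make the failure concrete, here is a minimal reproduction sketch. It does not call DataHub; the example schema and the helper `has_type_key` are hypothetical and only mirror the kind of membership check described above.

```python
from typing import Any, Optional

# Hypothetical schema: the "extra" property uses a boolean schema,
# which JSON Schema permits (true accepts any JSON value).
EXAMPLE_SCHEMA = {"type": "object", "properties": {"extra": True}}


def has_type_key(sub_schema: Any) -> bool:
    """Membership test that silently assumes sub_schema is a dict."""
    return "type" in sub_schema


def reproduce() -> Optional[Exception]:
    """Return the exception raised for the boolean property, if any."""
    try:
        has_type_key(EXAMPLE_SCHEMA["properties"]["extra"])
    except TypeError as exc:
        return exc
    return None
```

Calling `reproduce()` returns a `TypeError`, because the `in` operator cannot be applied to a `bool`.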

Solution

-> Added isinstance(schema, bool) checks before the Ellipsis in schema check
-> Converted boolean schemas to an empty dict ({}) at property iteration

This PR prevents crashes throughout the call chain when a schema is a bool.
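
The two steps above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the code from the PR: `JsonSchema`, `normalize_schema`, and `iter_properties` are hypothetical names introduced here.

```python
from typing import Any, Dict, Iterator, Tuple, Union

# Hypothetical alias for "a schema may be a dict or a bare boolean".
JsonSchema = Union[Dict[str, Any], bool]


def normalize_schema(schema: JsonSchema) -> Dict[str, Any]:
    """Map a boolean schema to {} so later `key in schema` checks are safe."""
    if isinstance(schema, bool):
        return {}
    return schema


def iter_properties(schema: JsonSchema) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (name, normalized sub-schema) for each entry under "properties"."""
    properties = normalize_schema(schema).get("properties", {})
    for name, sub_schema in properties.items():
        yield name, normalize_schema(sub_schema)
```

One trade-off of this sketch: mapping both `true` and `false` to `{}` discards the distinction between "accepts anything" and "never validates". That may be acceptable for metadata extraction, but it is an assumption rather than something the PR states.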

@github-actions github-actions bot added the ingestion and community-contribution labels Nov 28, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review label Nov 28, 2025
@btkcodedev btkcodedev changed the title from "fix: accept bool type in json-schema extractor" to "fix(extractor): accept bool type in json-schema extractor" Nov 28, 2025
@btkcodedev btkcodedev changed the title from "fix(extractor): accept bool type in json-schema extractor" to "fix(ingestion/extractor): accept bool type in json-schema extractor" Nov 28, 2025
@btkcodedev
Contributor Author

Unit test results: (screenshot attached in the original PR)

@codecov

codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@codecov

codecov bot commented Nov 28, 2025

Bundle Report

Bundle size has no change ✅

@deepgarg760
Collaborator

@btkcodedev Thanks for the contribution

@datahub-cyborg datahub-cyborg bot added the pending-submitter-response label and removed the needs-review label Dec 2, 2025
    def _get_type_from_schema(schema: Dict) -> str:
        """Returns a generic json type from a schema."""
        # Handle boolean schemas per JSON Schema spec: true accepts any JSON, false never validates
        if isinstance(schema, bool):
Collaborator

The method signature declares schema as Dict, but here it is being checked against bool.

Contributor Author

This boolean check prevents unexpected crashes like the one in the issue, where the user reported a crash at:
if Ellipsis in schema: TypeError: argument of type 'bool' is not iterable

Contributor

Then should we fix the signature? def _get_type_from_schema(schema: Dict) -> str:

Contributor

schema: Union[Dict, bool]
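
If the signature is widened as suggested, it might look like the sketch below. This is a simplified, standalone stand-in and not the actual method body from json_schema_util.py; the function name and the return values "any" and "unknown" are assumptions chosen only for illustration.

```python
from typing import Any, Dict, Union


def get_type_from_schema(schema: Union[Dict[str, Any], bool]) -> str:
    """Return a generic type name; tolerate boolean schemas (hypothetical helper)."""
    # Boolean schemas carry no "type" key, so handle them before any dict access.
    if isinstance(schema, bool):
        return "any"  # ASSUMPTION: placeholder value, not taken from the PR
    # ASSUMPTION: fall back to "unknown" when the schema declares no type.
    return str(schema.get("type", "unknown"))
```

With the annotation matching the runtime check, a type checker no longer flags the `isinstance(schema, bool)` branch as unreachable, which addresses the reviewer's concern.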

@deepgarg760 deepgarg760 requested review from sgomezvillamor and treff7es and removed request for sgomezvillamor and treff7es December 2, 2025 12:25
    @staticmethod
    def _get_type_from_schema(schema: Dict) -> str:
        """Returns a generic json type from a schema."""
        # Handle boolean schemas per JSON Schema spec: true accepts any JSON, false never validates
Contributor

Is there a public reference to the JSON Schema spec that we could add here as a comment?

Hello, you can find the reference here:
https://json-schema.org/understanding-json-schema/basics#hello-world!
Thanks

Labels

community-contribution, ingestion, pending-submitter-response

Development

Successfully merging this pull request may close these issues.

Datahub failed when ingest json-schema if schema declare a property to true

4 participants