Problem
#1633 exposed an unresolved core schema-model question: what should nullability mean for public GraphSchema declarations at the per-entity-type level versus aggregate table views?
Today, public schema declarations normalize Python shorthand through ScalarType(...), whose default is nullable=True. That makes examples like:
NodeType("Person", {"id": int, "name": str})
behave as nullable unless the user supplied an Arrow field with nullable=False. That may be wrong for the public per-entity contract: a declared Person.id property should likely be required (present and non-null) on Person rows by default. The aggregate nodes table column can still be nullable, because it is fused across multiple entity types (Person.id may be null on Company rows).
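To make the per-entity half of the question concrete, here is a minimal sketch of one possible shorthand policy. It does not use the real GraphSchema/ScalarType API. `Prop` and `normalize_shorthand` are hypothetical stand-ins, and treating `Optional[T]` as the nullable opt-in is an assumption, not a decided design. Under this policy, bare types such as `int` or `str` become required.

```python
import types
from dataclasses import dataclass
from typing import Optional, Union, get_args, get_origin


@dataclass(frozen=True)
class Prop:
    """Hypothetical per-entity property declaration (not the real ScalarType)."""
    py_type: type
    nullable: bool


# typing.Optional[T] has origin Union; on Python 3.10+ `T | None` has origin types.UnionType.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def normalize_shorthand(annotation) -> Prop:
    """Hypothetical policy: bare types are required, Optional[T] opts into nullable."""
    if get_origin(annotation) in _UNION_ORIGINS:
        args = get_args(annotation)
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) == 1 and len(args) == 2:
            return Prop(non_none[0], nullable=True)
        raise TypeError(f"unsupported union annotation: {annotation!r}")
    # Bare shorthand such as int or str: required (non-null) for this entity type.
    return Prop(annotation, nullable=False)


# Mirrors the NodeType("Person", {...}) example, plus one opt-in nullable property.
person = {
    name: normalize_shorthand(ann)
    for name, ann in {"id": int, "name": str, "nickname": Optional[str]}.items()
}
```

With this policy, `person["id"]` is required and `person["nickname"]` is nullable. This inverts today's `nullable=True` default for shorthand only. Explicit Arrow fields would still carry their own flag.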
Proposed direction to decide
Separate two concepts explicitly:
- Per-entity-type declaration nullability
  - Defaults may need to be non-null / required for shorthand declarations.
  - Users should opt into nullable/optional explicitly.
- Aggregate table nullability
  - Merged node/edge table columns may remain nullable because multiple entity types are fused into one table.
  - This should not imply that the per-entity property itself is nullable for rows of that type.
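The two concepts can be related by deriving aggregate column nullability from the per-entity declarations. The sketch below assumes a hypothetical `fuse_columns` helper and the same hypothetical `Prop` record; it is not the library's merge logic. Under this rule, a fused column is nullable if any entity type either lacks the property or declares it nullable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """Hypothetical per-entity property declaration."""
    py_type: type
    nullable: bool


def fuse_columns(entity_types: dict[str, dict[str, Prop]]) -> dict[str, Prop]:
    """Hypothetical: derive aggregate table columns from per-entity declarations."""
    names = sorted({n for props in entity_types.values() for n in props})
    fused = {}
    for name in names:
        decls = [props.get(name) for props in entity_types.values()]
        present = [d for d in decls if d is not None]
        col_types = {d.py_type for d in present}
        if len(col_types) != 1:
            raise TypeError(f"column {name!r} has conflicting types: {col_types}")
        # Nullable in the fused table if some entity type lacks the property
        # (its rows get null there) or declares it nullable.
        nullable = len(present) < len(decls) or any(d.nullable for d in present)
        fused[name] = Prop(col_types.pop(), nullable)
    return fused


schema = {
    "Person": {"id": Prop(int, False), "name": Prop(str, False)},
    "Company": {"name": Prop(str, False)},
}
fused = fuse_columns(schema)
```

Here `fused["id"]` comes out nullable only because Company rows have no `id`, while `Person.id` itself stays required. That is the distinction the issue asks to make explicit.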
Why this should precede pretty/inference work
#1633 initially tried to render non-null Arrow fields with a compact marker, which surfaced that the pretty-printer would be encoding a policy that is not yet settled in the core model. Before pretty-printing, inference (#1338), or additional schema UX work bakes in display conventions, we should decide the model semantics.
Acceptance
- Decide whether Python shorthand declarations default to required/non-null per entity type.
- Decide how users declare nullable/optional/maybe-absent properties.
- Decide how per-entity nullability maps into merged Arrow schemas and validation.
- Add anchored tests covering:
  - per-entity required/default property semantics,
  - explicitly nullable property semantics,
  - aggregate fused-table nullable columns across different entity types,
  - Arrow import/export round trips.
- Update docs to clarify per-entity vs aggregate table nullability.
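For the validation part of the acceptance criteria, one option is to check fused-table rows against per-entity declarations rather than against the aggregate column flags. The sketch below is hypothetical throughout. `validate_rows`, the `label` discriminator column, and the rule that undeclared columns must be null are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """Hypothetical per-entity property declaration."""
    py_type: type
    nullable: bool


def validate_rows(rows, entity_types, label_key="label"):
    """Check rows of a fused nodes table against per-entity declarations."""
    errors = []
    for i, row in enumerate(rows):
        label = row[label_key]
        decls = entity_types[label]
        # Per-entity rule: required properties must be present and non-null.
        for name, prop in decls.items():
            if not prop.nullable and row.get(name) is None:
                errors.append(f"row {i}: {label}.{name} is required but null")
        # Fused-table rule (assumed): columns this type does not declare must be null.
        for name, value in row.items():
            if name != label_key and name not in decls and value is not None:
                errors.append(f"row {i}: {label} does not declare {name!r}")
    return errors


schema = {
    "Person": {"id": Prop(int, False), "name": Prop(str, False)},
    "Company": {"name": Prop(str, False)},
}
rows = [
    {"label": "Person", "id": 1, "name": "Ada"},
    {"label": "Company", "id": None, "name": "Acme"},  # null id is fine for Company
    {"label": "Person", "id": None, "name": "Bob"},  # violates required Person.id
]
errors = validate_rows(rows, schema)
```

The fused `id` column is nullable at the table level, yet this check still rejects a null `id` on a Person row. The anchored tests would need to pin down exactly this split.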
Cross-refs