Summary
Add Python-native typed model adapters for the Arrow-canonical public GraphSchema layer.
The core schema contract remains Arrow-native: GraphSchema.to_arrow() / future graph-schema wire envelopes are the source of truth for internals, wire transport, backend storage, and cross-language interoperability. Dataclasses and Pydantic models are client-side adapters that compile to/from Arrow-backed NodeType, EdgeType, and GraphSchema declarations.
Parent / Related
Motivation
Python users often already have typed domain models, or want generated typed models after inspecting a dataset schema:
- Existing typed app: @dataclass class Cat: ... or class Cat(BaseModel): ... should produce a NodeType / GraphSchema.
- Existing dataset/remote schema: an Arrow-backed GraphSchema should generate dataclass/Pydantic wrappers for row/entity work.
- Heavy mypy/pyright/Pydantic users should not duplicate schema definitions by hand.
Design constraints
- Arrow remains canonical; Python models never replace the wire/backend contract.
- Model adapters must round-trip through NodeType / EdgeType / GraphSchema.
- Runtime-generated models are useful for notebooks and services; static typing needs optional emitted .py / .pyi artifacts.
- Pydantic validators are Python-side validation only unless they can be expressed in Arrow/Graphistry schema metadata.
- Ambiguous Python annotations should require explicit Arrow overrides rather than guessing.
Possible API shape
    from dataclasses import dataclass
    from graphistry.schema import GraphSchema, NodeType, EdgeType

    @dataclass
    class Cat:
        id: int
        name: str
        lives: int | None = None

    # Dataclass -> Arrow-backed node type, then back to a generated dataclass
    CatType = NodeType.from_dataclass(Cat, labels=("Animal", "Cat"))
    CatModel = CatType.to_dataclass()

    # Whole-schema convenience layer
    schema = GraphSchema.from_models(node_models=[Cat], edge_models=[...])
    models = schema.to_dataclasses()
Pydantic equivalents:
    CatType = NodeType.from_pydantic(CatModel, labels=("Animal", "Cat"))
    CatModel = CatType.to_pydantic()
In scope
- Dataclass -> NodeType / EdgeType adapters.
- Pydantic -> NodeType / EdgeType adapters.
- GraphSchema.from_models(...) convenience layer.
- NodeType / EdgeType / GraphSchema -> runtime dataclass generation.
- NodeType / EdgeType / GraphSchema -> runtime Pydantic model generation.
- Optional codegen plan for .py / .pyi artifacts for mypy/pyright users.
- Type mapping table and explicit loss/override policy.
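For the optional codegen item, one possible shape is a function that renders stub source text from a field list. The sketch below is an assumption about how that could look, not a committed design: `emit_stub`, the reverse type table, and the (name, arrow_type, nullable) triple format are all hypothetical, and Arrow types are again plain strings.

```python
# Hypothetical reverse table from Arrow type names to Python annotation text.
PY_ANNOTATIONS = {
    "int64": "int",
    "float64": "float",
    "string": "str",
    "bool": "bool",
    "binary": "bytes",
}


def emit_stub(class_name, fields):
    """Render .pyi-style source for one model from (name, arrow_type, nullable) triples."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    for name, arrow_type, nullable in fields:
        if arrow_type not in PY_ANNOTATIONS:
            # Lossy or unknown Arrow types fail loudly rather than emitting Any.
            raise ValueError(f"No Python annotation known for Arrow type {arrow_type!r}")
        annotation = PY_ANNOTATIONS[arrow_type]
        if nullable:
            annotation = f"{annotation} | None"
        lines.append(f"    {name}: {annotation}")
    if not fields:
        lines.append("    ...")
    return "\n".join(lines) + "\n"


stub_source = emit_stub(
    "Cat",
    [("id", "int64", False), ("name", "string", False), ("lives", "int64", True)],
)
```

Writing `stub_source` to a file such as a `.pyi` next to the package would give mypy/pyright a static view of the generated model. Defaults are deliberately omitted here; a real emitter would need a policy for them.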
Out of scope
- gfql_remote() schema transport; tracked in "GFQL remote: send bound typed GraphSchema with gfql_remote requests" (#1465).
Acceptance
- Dataclass required/optional primitive fields map deterministically to Arrow field type/nullability.
- Pydantic required/optional primitive fields map to the same Arrow schema as equivalent dataclasses.
- Field aliases/metadata can override Arrow field name, dtype, nullability, and Graphistry metadata.
- Ambiguous annotations fail with clear override guidance.
- Arrow schema -> model -> Arrow round trip preserves field names, Arrow types, nullability, and Graphistry metadata where representable.
- GraphSchema with Cat/Dog/Car node models and shared Animal label preserves NodeType.name as type identity and labels as GFQL predicates.
- Docs state that Python models are adapters around the Arrow-canonical graph contract.
Notes
Relevant current docs/APIs:
- Python stdlib supports runtime dataclass generation via dataclasses.make_dataclass.
- Pydantic v2 supports runtime model generation via create_model, validated construction via model_validate, and unvalidated construction from trusted data via model_construct.
- Static typing users need emitted modules/stubs for full mypy/pyright visibility; runtime-generated classes alone are not enough for CI type checking.