GFQL typed schema: Python dataclass/Pydantic adapters for GraphSchema #1642

@lmeyerov

Summary

Add Python-native typed model adapters for the Arrow-canonical public GraphSchema layer.

The core schema contract remains Arrow-native: GraphSchema.to_arrow() / future graph-schema wire envelopes are the source of truth for internals, wire transport, backend storage, and cross-language interoperability. Dataclasses and Pydantic models are client-side adapters that compile to/from Arrow-backed NodeType, EdgeType, and GraphSchema declarations.

Parent / Related

Motivation

Python users often already have typed domain models, or want generated typed models after inspecting a dataset schema:

  • Existing typed app: @dataclass class Cat: ... or class Cat(BaseModel): ... should produce a NodeType / GraphSchema.
  • Existing dataset/remote schema: an Arrow-backed GraphSchema should generate dataclass/Pydantic wrappers for row/entity work.
  • Heavy mypy/pyright/Pydantic users should not duplicate schema definitions by hand.

Design constraints

  • Arrow remains canonical; Python models never replace the wire/backend contract.
  • Model adapters must round-trip through NodeType / EdgeType / GraphSchema.
  • Runtime-generated models are useful for notebooks and services; static typing needs optional emitted .py / .pyi artifacts.
  • Pydantic validators are Python-side validation only unless they can be expressed in Arrow/Graphistry schema metadata.
  • Ambiguous Python annotations should require explicit Arrow overrides rather than guessing.
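A minimal sketch of the last constraint, using only the standard library. The primitive table, the Arrow type names as plain strings, and the helper names `arrow_field` / `arrow_fields` are illustrative assumptions, not part of the proposed graphistry API; a real adapter would presumably emit Arrow field objects instead of tuples.

```python
import types
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Assumed primitive mapping; Arrow types are represented by name only.
PRIMITIVES = {int: "int64", float: "float64", str: "string", bool: "bool"}

# Both typing.Union and the `X | None` syntax (Python 3.10+) count as unions.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def arrow_field(annotation):
    """Return (arrow_type_name, nullable), or raise on an ambiguous annotation."""
    nullable = False
    if get_origin(annotation) in UNION_ORIGINS:
        args = get_args(annotation)
        members = [a for a in args if a is not type(None)]
        nullable = len(members) < len(args)
        if len(members) != 1:
            # e.g. Union[int, str]: no single Arrow type, so refuse to guess
            raise TypeError(f"ambiguous annotation {annotation!r}: add an explicit Arrow override")
        annotation = members[0]
    if annotation not in PRIMITIVES:
        raise TypeError(f"unmapped annotation {annotation!r}: add an explicit Arrow override")
    return PRIMITIVES[annotation], nullable


def arrow_fields(model):
    """Map every dataclass field to (arrow_type_name, nullable)."""
    hints = get_type_hints(model)
    return {f.name: arrow_field(hints[f.name]) for f in fields(model)}


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


cat_fields = arrow_fields(Cat)
```

Here `Optional[int]` maps to a nullable `int64`, while annotations such as `Union[int, str]` or a bare `list` raise instead of being coerced to a guessed type.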

Possible API shape

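from dataclasses import dataclass
from graphistry.schema import GraphSchema, NodeType, EdgeType

# Existing typed domain model
@dataclass
class Cat:
    id: int
    name: str
    lives: int | None = None  # optional, so it maps to a nullable Arrow field

# Dataclass -> Arrow-backed NodeType, then back to a runtime-generated dataclass
CatType = NodeType.from_dataclass(Cat, labels=("Animal", "Cat"))
CatModel = CatType.to_dataclass()

# Whole-schema convenience layer
schema = GraphSchema.from_models(node_models=[Cat], edge_models=[...])
models = schema.to_dataclasses()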

Pydantic equivalents:

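# CatModel is assumed here to be an existing Pydantic BaseModel subclass
CatType = NodeType.from_pydantic(CatModel, labels=("Animal", "Cat"))
# Regenerate a runtime Pydantic model from the Arrow-backed NodeType
CatModel = CatType.to_pydantic()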

In scope

  • Dataclass -> NodeType / EdgeType adapters.
  • Pydantic -> NodeType / EdgeType adapters.
  • GraphSchema.from_models(...) convenience layer.
  • NodeType / EdgeType / GraphSchema -> runtime dataclass generation.
  • NodeType / EdgeType / GraphSchema -> runtime Pydantic model generation.
  • Optional codegen plan for .py / .pyi artifacts for mypy/pyright users.
  • Type mapping table and explicit loss/override policy.
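One possible shape for the codegen item, sketched with the standard library only. The function name `emit_stub` and the `(name, python_type, nullable)` spec format are hypothetical; the point is that the emitted text is an ordinary module or stub that mypy/pyright can read, unlike a class created at runtime.

```python
def emit_stub(class_name, spec):
    """Render .pyi-style source text for one dataclass.

    `spec` is a list of (field_name, python_type_name, nullable) tuples,
    a stand-in for whatever a NodeType would expose.
    """
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    if not spec:
        lines.append("    ...")
    for name, py_type, nullable in spec:
        if nullable:
            # Stubs write defaults as `...`; nullable fields default to None at runtime
            lines.append(f"    {name}: {py_type} | None = ...")
        else:
            lines.append(f"    {name}: {py_type}")
    return "\n".join(lines) + "\n"


stub = emit_stub("Cat", [("id", "int", False), ("name", "str", False), ("lives", "int", True)])
```

Writing `stub` to a file such as `cat.pyi` (the filename is illustrative) would give type checkers a static view of the same fields the runtime model carries.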

Out of scope

Acceptance

  1. Dataclass required/optional primitive fields map deterministically to Arrow field type/nullability.
  2. Pydantic required/optional primitive fields map to the same Arrow schema as equivalent dataclasses.
  3. Field aliases/metadata can override Arrow field name, dtype, nullability, and Graphistry metadata.
  4. Ambiguous annotations fail with clear override guidance.
  5. Arrow schema -> model -> Arrow round trip preserves field names, Arrow types, nullability, and Graphistry metadata where representable.
  6. GraphSchema with Cat/Dog/Car node models and shared Animal label preserves NodeType.name as type identity and labels as GFQL predicates.
  7. Docs state that Python models are adapters around the Arrow-canonical graph contract.
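Items 3 and 4 could interact roughly as follows. This is a sketch under assumptions: the metadata keys `arrow_name`, `arrow_type`, and `arrow_nullable`, the `INFERRED` table, and `to_arrow_spec` are invented for illustration, and Arrow types are again plain strings. Only `dataclasses.field(metadata=...)` is real stdlib behavior.

```python
from dataclasses import dataclass, field, fields

# Assumed default inference for unambiguous primitives.
INFERRED = {int: "int64", str: "string"}


def to_arrow_spec(model):
    """Return [(arrow_name, arrow_type, nullable)], letting field metadata override inference."""
    spec = []
    for f in fields(model):
        meta = f.metadata
        arrow_type = meta.get("arrow_type", INFERRED.get(f.type))
        if arrow_type is None:
            # Ambiguous annotation with no override: fail with guidance
            raise TypeError(
                f"{model.__name__}.{f.name}: cannot infer an Arrow type from "
                f"{f.type!r}; set metadata['arrow_type']"
            )
        spec.append((
            meta.get("arrow_name", f.name),
            arrow_type,
            meta.get("arrow_nullable", False),
        ))
    return spec


@dataclass
class Cat:
    # Override both the Arrow field name and dtype
    id: int = field(metadata={"arrow_name": "cat_id", "arrow_type": "uint32"})
    # No metadata: falls back to inference
    name: str
    # A bare `list` is ambiguous, so the override is required
    tags: list = field(default_factory=list, metadata={"arrow_type": "list<string>"})


cat_spec = to_arrow_spec(Cat)
```

Removing the `arrow_type` entry from `tags` makes `to_arrow_spec` raise a `TypeError` that names the field and the metadata key to set.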

Notes

Relevant current docs/APIs:

  • Python stdlib supports runtime dataclass generation via dataclasses.make_dataclass.
  • Pydantic v2 supports runtime model generation via create_model, validated construction via model_validate, and trusted construction via model_construct.
  • Static typing users need emitted modules/stubs for full mypy/pyright visibility; runtime-generated classes alone are not enough for CI type checking.
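For the runtime-generation path, a small sketch built on `dataclasses.make_dataclass`. The `(column, arrow_type_name, nullable)` spec, the `PY_TYPES` table, and `dataclass_from_spec` are assumptions standing in for a real NodeType; a Pydantic variant would use `create_model` in the same position.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Assumed reverse mapping from Arrow type names to Python types.
PY_TYPES = {"int64": int, "float64": float, "string": str, "bool": bool}


def dataclass_from_spec(name, spec):
    """Build a dataclass at runtime from (column, arrow_type_name, nullable) tuples."""
    required, optional = [], []
    for column, arrow_type, nullable in spec:
        py_type = PY_TYPES[arrow_type]
        if nullable:
            # Nullable columns become Optional fields defaulting to None
            optional.append((column, Optional[py_type], field(default=None)))
        else:
            required.append((column, py_type))
    # Dataclasses require non-default fields first, so nullable columns move to the end
    return make_dataclass(name, required + optional)


Cat = dataclass_from_spec(
    "Cat",
    [("id", "int64", False), ("name", "string", False), ("lives", "int64", True)],
)
cat = Cat(id=1, name="Tom")
```

The generated `Cat` behaves like a hand-written dataclass at runtime, but a type checker cannot see its fields, which is why the emitted modules/stubs matter for CI type checking.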
