Summary
Add Python-native typed adapters and dataframe/entity façades over Arrow-canonical GraphSchema declarations.
The goal is ORM-like ergonomics without replacing the dataframe/Arrow execution core: users should get dataclass/Pydantic adapters, typed selectors/views for node/edge tables, and typed wrappers for extracted rows/entities, while GFQL execution remains dataframe/vectorized and schema transport remains Arrow-native.
Parent / Related
Motivation
Users want type-friendly movement across layers:
- typed Python model definitions: dataclass/Pydantic models should compile to Arrow-backed NodeType / EdgeType / GraphSchema,
- generated Python wrappers: Arrow-backed schemas should generate dataclass/Pydantic row/entity models,
- dataframe/table work: typed views over g._nodes / g._edges or public equivalents,
- query/type narrowing: explicit of_type(Cat) rather than static inference from arbitrary dataframe query strings,
- entity work: c: Cat = ... for extracted node/edge rows,
- mypy/pyright/Pydantic-friendly wrappers for app code and tests.
String dataframe queries like g._edges.query('type == "Cat"') can be useful at runtime, but static type checkers cannot reliably infer a Cat type from arbitrary strings. Type narrowing should therefore happen through explicit typed APIs.
Possible API shape
    from dataclasses import dataclass

    @dataclass
    class Cat:
        id: int
        name: str
        lives: int | None = None

    # Model -> Arrow-backed type, and back to a generated model
    CatType = NodeType.from_dataclass(Cat, labels=("Animal", "Cat"))
    CatModel = CatType.to_dataclass()

    # Explicit narrowing and row materialization
    cats = g.nodes.of_type(Cat)  # NodeFrame[Cat]
    first: Cat = cats.first_model()
    rows: list[Cat] = cats.as_models()

    works = g.edges.of_type(WorksAt)  # EdgeFrame[WorksAt]
    works.select_model_fields()
    works.validate_rows()
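To make the wrapper idea concrete, here is a minimal sketch of a thin NodeFrame[T] with explicit narrowing. It is not the proposed implementation: a list of row dicts stands in for the pandas/cuDF/polars dataframe, and the Nodes class, the TYPE_COLUMN name, and matching on the model's class name are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Generic, Iterator, Optional, Type, TypeVar

T = TypeVar("T")

# Hypothetical: the column that carries the stable type name (NodeType.name).
TYPE_COLUMN = "type"


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


class NodeFrame(Generic[T]):
    """Thin typed view; `df` is a list of row dicts standing in for a dataframe."""

    def __init__(self, df: list[dict[str, Any]], model: Type[T]) -> None:
        self.df = df  # the underlying table stays directly accessible
        self.model = model

    def iter_models(self) -> Iterator[T]:
        names = [f.name for f in fields(self.model)]
        for row in self.df:
            # Keep only the model's fields; absent ones fall back to defaults.
            yield self.model(**{k: row[k] for k in names if k in row})

    def as_models(self) -> list[T]:
        return list(self.iter_models())

    def first_model(self) -> T:
        return next(self.iter_models())


class Nodes:
    """Untyped node table with an explicit narrowing entry point."""

    def __init__(self, df: list[dict[str, Any]]) -> None:
        self.df = df

    def of_type(self, model: Type[T]) -> NodeFrame[T]:
        # Narrow by stable type name, not by overlapping labels.
        rows = [r for r in self.df if r.get(TYPE_COLUMN) == model.__name__]
        return NodeFrame(rows, model)


nodes = Nodes([
    {"id": 1, "name": "Tom", "lives": 9, "type": "Cat"},
    {"id": 2, "name": "Rex", "type": "Dog"},
    {"id": 3, "name": "Kit", "type": "Cat"},
])
cats = nodes.of_type(Cat)  # NodeFrame[Cat]
first: Cat = cats.first_model()
```

Because of_type takes the model class as an argument, a type checker can infer NodeFrame[Cat] from the call itself, which is exactly what a query string cannot provide.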
For heterogeneous results:
    entity = g.nodes.first_entity()
    # generated discriminated union keyed by Graphistry type identity,
    # not raw label marker sets
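A rough sketch of what such a discriminated union could look like on the Python side follows. The discriminator column name "__gtype__", the MODELS registry, and the to_entity helper are hypothetical; the point is only that dispatch uses one stable type name per row and ignores the (possibly overlapping) labels.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


@dataclass
class Dog:
    id: int
    name: str


Entity = Union[Cat, Dog]  # the union a generator might emit

# Hypothetical registry: stable type name -> model class.
MODELS: dict[str, type] = {"Cat": Cat, "Dog": Dog}
DISCRIMINATOR = "__gtype__"  # hypothetical synthesized column name


def to_entity(row: dict[str, Any]) -> Entity:
    """Pick the model from the discriminator value, then build it."""
    row = dict(row)  # do not mutate the caller's row
    type_name = row.pop(DISCRIMINATOR)
    row.pop("labels", None)  # labels may overlap, so they never drive dispatch
    try:
        model = MODELS[type_name]
    except KeyError:
        raise ValueError(f"unknown type {type_name!r}") from None
    return model(**row)


dog = to_entity(
    {"id": 2, "name": "Rex", "labels": ["Animal", "Dog"], "__gtype__": "Dog"}
)
cat = to_entity(
    {"id": 1, "name": "Tom", "lives": 9, "labels": ["Animal", "Cat"], "__gtype__": "Cat"}
)
```

Both rows carry the shared Animal label, yet each resolves to exactly one model because the discriminator holds a single type name.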
Design constraints
- Arrow/GraphSchema remains canonical.
- NodeType.name / EdgeType.name are stable type identity.
- Labels are GFQL predicates and may overlap, e.g. Cat and Dog can both be Animal.
- Do not attempt to statically type arbitrary pandas/cuDF/polars expression strings.
- Do not reimplement full dataframe APIs; wrappers should be thin and explicit.
- Per-row conversion is a user/API boundary tool, not a GFQL hot-path mechanism.
- Runtime-generated models are useful for notebooks and services; static typing needs optional emitted .py / .pyi artifacts.
- Pydantic validators are Python-side validation only unless they can be expressed in Arrow/Graphistry schema metadata.
- Ambiguous Python annotations should require explicit Arrow overrides rather than guessing.
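On the emitted-artifact constraint, the sketch below shows one way source text for a static model could be rendered from a schema description. The FieldSpec shape and render_model_source are hypothetical stand-ins; a real generator would read field names, types, and nullability from the Arrow-backed NodeType / EdgeType instead of a plain mapping.

```python
from typing import Mapping, Tuple

# Hypothetical schema description: field name -> (Python type name, nullable).
FieldSpec = Tuple[str, bool]


def render_model_source(type_name: str, schema: Mapping[str, FieldSpec]) -> str:
    """Render dataclass source text for one node/edge type."""
    lines = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "@dataclass",
        f"class {type_name}:",
    ]
    if not schema:
        lines.append("    pass")
    # Required fields first: a dataclass rejects a non-default field after a default.
    # sorted() is stable, so declared order is kept within each group.
    ordered = sorted(schema.items(), key=lambda item: item[1][1])
    for name, (py_type, nullable) in ordered:
        if nullable:
            lines.append(f"    {name}: Optional[{py_type}] = None")
        else:
            lines.append(f"    {name}: {py_type}")
    return "\n".join(lines) + "\n"


source = render_model_source(
    "Cat",
    {"id": ("int", False), "name": ("str", False), "lives": ("int", True)},
)
print(source)
```

Writing this text to a .py file gives mypy/pyright a concrete class to check against, which a model built only at runtime cannot offer.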
In scope
- Dataclass/Pydantic -> NodeType / EdgeType adapters.
- NodeType / EdgeType / GraphSchema -> runtime dataclass/Pydantic model generation.
- Optional codegen plan for .py / .pyi artifacts for mypy/pyright users.
- Type mapping table and explicit loss/override policy.
- NodeFrame[T] / EdgeFrame[T] thin wrappers over pandas/cuDF/polars dataframes.
- Explicit type narrowing APIs such as of_type(...), where_type(...), or as_type(...).
- Entity/row materialization helpers: first_model(), iter_models(), as_models().
- Validation helpers that reuse Arrow-boundary schema validation.
- Heterogeneous row/entity wrapper support using a synthesized Graphistry type discriminator.
- Docs/examples for trusted vs untrusted conversion:
  - trusted data after Arrow validation: dataclass construction or Pydantic model_construct,
  - untrusted boundary: Pydantic model_validate.
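The trusted/untrusted distinction in the last item can be illustrated with a standard-library analogue. This sketch deliberately avoids Pydantic so it stays self-contained; build_trusted and build_untrusted are hypothetical helpers that play the roles model_construct and model_validate would play for a Pydantic model.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Dog:
    id: int
    name: str


def build_trusted(row: dict[str, Any]) -> Dog:
    """Rows already validated at the Arrow boundary: construct with no checks."""
    return Dog(**row)


def build_untrusted(row: dict[str, Any]) -> Dog:
    """Rows from outside: check field names and types before constructing."""
    expected = {f.name: f.type for f in fields(Dog)}
    extra = set(row) - set(expected)
    if extra:
        raise ValueError(f"unexpected fields {sorted(extra)}")
    for name, py_type in expected.items():
        if name not in row:
            raise ValueError(f"missing field {name!r}")
        if not isinstance(row[name], py_type):
            raise ValueError(f"field {name!r} expects {py_type.__name__}")
    return Dog(**row)


good = build_untrusted({"id": 2, "name": "Rex"})
unchecked = build_trusted({"id": "not-an-int", "name": "Rex"})  # accepted as-is
```

The trusted path skips per-row checks because schema validation has already happened in bulk; the untrusted path pays the per-row cost and should stay at API boundaries.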
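Out of scope
- gfql_remote() schema transport; tracked in "GFQL remote: send bound typed GraphSchema with gfql_remote requests" #1465.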
Acceptance
- Dataclass/Pydantic required/optional primitive fields map deterministically to Arrow field type/nullability.
- Field aliases/metadata can override Arrow field name, dtype, nullability, and Graphistry metadata.
- Ambiguous annotations fail with clear override guidance.
- Arrow schema -> model -> Arrow round trip preserves field names, Arrow types, nullability, and Graphistry metadata where representable.
- Typed frame wrappers preserve access to the underlying dataframe via .df.
- Type narrowing is explicit and keyed by NodeType.name / EdgeType.name, not ambiguous label sets.
- Cat/Dog shared-label cases are covered: Animal can select a broader view while Cat narrows to Cat wrappers.
- Row/entity materialization works for dataclass and Pydantic adapter outputs.
- Heterogeneous results can use a generated discriminated union keyed by a Graphistry type discriminator.
- Docs clearly state that dataframe execution remains vectorized/columnar and wrappers are client ergonomics.
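The first and third acceptance items (deterministic mapping of required/optional primitives, and clear failure on ambiguous annotations) could be exercised by something like the sketch below. Arrow types are represented by name strings so the example needs no Arrow dependency; the ARROW_NAMES table and the arrow_field / schema_of helpers are assumptions, not the proposed mapping table.

```python
import types
from dataclasses import dataclass
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Assumed mapping; Arrow types are shown as name strings, not real Arrow objects.
ARROW_NAMES = {int: "int64", float: "float64", str: "string", bool: "bool"}

# Covers both Optional[X] / Union[...] and the X | None syntax where available.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def arrow_field(annotation: object) -> tuple[str, bool]:
    """Return (arrow_type_name, nullable), or raise on an ambiguous annotation."""
    nullable = False
    if get_origin(annotation) in UNION_ORIGINS:
        members = [a for a in get_args(annotation) if a is not type(None)]
        if len(members) != 1:
            raise TypeError(
                f"ambiguous annotation {annotation!r}: supply an explicit Arrow override"
            )
        annotation, nullable = members[0], True
    if annotation not in ARROW_NAMES:
        raise TypeError(
            f"no default Arrow type for {annotation!r}: supply an explicit Arrow override"
        )
    return ARROW_NAMES[annotation], nullable


def schema_of(model: type) -> dict[str, tuple[str, bool]]:
    """Map every annotated field of a model to (arrow_type_name, nullable)."""
    return {name: arrow_field(hint) for name, hint in get_type_hints(model).items()}


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


@dataclass
class Ambiguous:
    payload: Union[int, str]


cat_schema = schema_of(Cat)
```

Calling schema_of(Ambiguous) raises TypeError with override guidance instead of silently picking one of the two member types.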
Notes
This intentionally bundles Python model adapters with typed dataframe/entity façades. The split is useful conceptually, but the executable UX should be designed together so the generated models have an immediate typed table/entity surface.