GFQL typed schema: Python adapters + typed dataframe/entity façades #1643

@lmeyerov

Summary

Add Python-native typed adapters and dataframe/entity façades over Arrow-canonical GraphSchema declarations.

The goal is ORM-like ergonomics without replacing the dataframe/Arrow execution core. Users should get dataclass/Pydantic adapters, typed selectors/views for node/edge tables, and typed wrappers for extracted rows/entities. GFQL execution remains dataframe/vectorized, and schema transport remains Arrow-native.

Parent / Related

Motivation

Users want type-friendly movement across layers:

  • typed Python model definitions: dataclass/Pydantic models should compile to Arrow-backed NodeType / EdgeType / GraphSchema,
  • generated Python wrappers: Arrow-backed schemas should generate dataclass/Pydantic row/entity models,
  • dataframe/table work: typed views over g._nodes / g._edges or public equivalents,
  • query/type narrowing: explicit of_type(Cat) rather than static inference from arbitrary dataframe query strings,
  • entity work: c: Cat = ... for extracted node/edge rows,
  • mypy/pyright/Pydantic-friendly wrappers for app code and tests.

String dataframe queries like g._edges.query('type == "Cat"') can be useful at runtime, but static type checkers cannot reliably infer a Cat type from arbitrary strings. Type narrowing should therefore happen through explicit typed APIs.

Possible API shape

from dataclasses import dataclass

@dataclass
class Cat:
    id: int
    name: str
    lives: int | None = None

CatType = NodeType.from_dataclass(Cat, labels=("Animal", "Cat"))
CatModel = CatType.to_dataclass()

cats = g.nodes.of_type(Cat)          # NodeFrame[Cat]
first: Cat = cats.first_model()
rows: list[Cat] = cats.as_models()

works = g.edges.of_type(WorksAt)     # EdgeFrame[WorksAt]; WorksAt is an edge model declared like Cat
works.select_model_fields()
works.validate_rows()

For heterogeneous results:

entity = g.nodes.first_entity()
# generated discriminated union keyed by Graphistry type identity,
# not raw label marker sets
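
To make the intended typing behavior concrete, here is a minimal, dependency-free sketch of a generic frame wrapper. It is not the proposed implementation: a list of dict rows stands in for the pandas/cuDF/polars dataframe, the "type" and "labels" column names are assumptions, the class name is assumed to play the role of NodeType.name, and with_label(...) is a hypothetical helper that does not appear in the API shape above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Generic, List, Optional, Type, TypeVar

T = TypeVar("T")
U = TypeVar("U")
Row = Dict[str, Any]


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


@dataclass
class Dog:
    id: int
    name: str


class NodeFrame(Generic[T]):
    """Thin typed view; `.df` exposes the underlying table untouched."""

    def __init__(self, df: List[Row], model: Optional[Type[T]] = None) -> None:
        self.df = df          # stand-in for a real dataframe (assumption)
        self._model = model

    def of_type(self, model: Type[U]) -> "NodeFrame[U]":
        # Narrow by stable type identity (assumed here to be the class name).
        rows = [r for r in self.df if r["type"] == model.__name__]
        return NodeFrame(rows, model)

    def with_label(self, label: str) -> "NodeFrame[T]":
        # Hypothetical label filter: labels may overlap, so no narrowing occurs.
        rows = [r for r in self.df if label in r["labels"]]
        return NodeFrame(rows, self._model)

    def as_models(self) -> List[T]:
        # Per-row conversion: a boundary tool, not a query hot path.
        if self._model is None:
            raise TypeError("call of_type(...) before materializing models")
        names = [f.name for f in fields(self._model)]
        return [self._model(**{n: r.get(n) for n in names}) for r in self.df]

    def first_model(self) -> T:
        return self.as_models()[0]


nodes: NodeFrame[Any] = NodeFrame([
    {"id": 1, "name": "Tom", "lives": 9, "type": "Cat", "labels": ("Animal", "Cat")},
    {"id": 2, "name": "Rex", "type": "Dog", "labels": ("Animal", "Dog")},
])

animals = nodes.with_label("Animal")   # broader, still untyped view
cats = nodes.of_type(Cat)              # checker sees NodeFrame[Cat]
first: Cat = cats.first_model()
```

Because of_type takes the model class itself, mypy/pyright can infer NodeFrame[Cat] from the call site, which is not possible for a string query.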

Design constraints

  • Arrow/GraphSchema remains canonical.
  • NodeType.name / EdgeType.name are stable type identity.
  • Labels are GFQL predicates and may overlap, e.g. Cat and Dog can both be Animal.
  • Do not attempt to statically type arbitrary pandas/cuDF/polars expression strings.
  • Do not reimplement full dataframe APIs; wrappers should be thin and explicit.
  • Per-row conversion is a user/API boundary tool, not a GFQL hot-path mechanism.
  • Runtime-generated models are useful for notebooks and services; static typing needs optional emitted .py / .pyi artifacts.
  • Pydantic validators are Python-side validation only unless they can be expressed in Arrow/Graphistry schema metadata.
  • Ambiguous Python annotations should require explicit Arrow overrides rather than guessing.
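
The last constraint can be illustrated with a small sketch of annotation mapping. Everything here is an assumption for illustration: the arrow_fields helper, the AmbiguousAnnotationError class, and the primitive mapping are hypothetical, and Arrow types are written as plain strings so the example runs without pyarrow.

```python
import dataclasses
import types
import typing
from typing import Optional

# Hypothetical primitive mapping; a real adapter would emit Arrow types.
_PRIMITIVES = {int: "int64", float: "float64", str: "string", bool: "bool"}
# Optional[X] reports typing.Union; `X | None` reports types.UnionType on 3.10+.
_UNION_ORIGINS = {typing.Union, getattr(types, "UnionType", typing.Union)}


class AmbiguousAnnotationError(TypeError):
    """Raised when an annotation has no single obvious Arrow type."""


def arrow_fields(model, overrides=None):
    """Return [(field name, arrow type string, nullable)] for a dataclass."""
    overrides = overrides or {}
    hints = typing.get_type_hints(model)
    out = []
    for f in dataclasses.fields(model):
        ann = hints[f.name]
        nullable = False
        if typing.get_origin(ann) in _UNION_ORIGINS:
            members = typing.get_args(ann)
            non_null = [a for a in members if a is not type(None)]
            nullable = len(non_null) < len(members)
            # A union of several non-null types is ambiguous without an override.
            ann = non_null[0] if len(non_null) == 1 else None
        if f.name in overrides:
            arrow_type = overrides[f.name]
        elif ann in _PRIMITIVES:
            arrow_type = _PRIMITIVES[ann]
        else:
            raise AmbiguousAnnotationError(
                f"{model.__name__}.{f.name}: cannot map {hints[f.name]!r}; "
                "pass an explicit Arrow override"
            )
        out.append((f.name, arrow_type, nullable))
    return out


@dataclasses.dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


@dataclasses.dataclass
class Tagged:
    id: int
    tags: list  # element type unknown, so no Arrow type is guessed


cat_fields = arrow_fields(Cat)
tagged_fields = arrow_fields(Tagged, overrides={"tags": "list<string>"})
```

Calling arrow_fields(Tagged) without the override raises AmbiguousAnnotationError rather than picking a list element type.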

In scope

  • Dataclass/Pydantic -> NodeType / EdgeType adapters.
  • NodeType / EdgeType / GraphSchema -> runtime dataclass/Pydantic model generation.
  • Optional codegen plan for .py / .pyi artifacts for mypy/pyright users.
  • Type mapping table and explicit loss/override policy.
  • NodeFrame[T] / EdgeFrame[T] thin wrappers over pandas/cuDF/polars dataframes.
  • Explicit type narrowing APIs such as of_type(...), where_type(...), or as_type(...).
  • Entity/row materialization helpers: first_model(), iter_models(), as_models().
  • Validation helpers that reuse Arrow-boundary schema validation.
  • Heterogeneous row/entity wrapper support using a synthesized Graphistry type discriminator.
  • Docs/examples for trusted vs untrusted conversion:
    • trusted data after Arrow validation: dataclass construction or Pydantic model_construct,
    • untrusted boundary: Pydantic model_validate.
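
The trusted/untrusted split can be sketched with plain dataclasses. This is an illustrative stand-in, not the planned helpers: construct_trusted and validate_untrusted are hypothetical names, and the hand-written check table only approximates what Pydantic's model_validate would do at an untrusted boundary.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Tuple


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


# Hypothetical per-field runtime checks: (accepted types, nullable).
_CAT_CHECKS: Dict[str, Tuple[tuple, bool]] = {
    "id": ((int,), False),
    "name": ((str,), False),
    "lives": ((int,), True),
}


def construct_trusted(row: Dict[str, Any]) -> Cat:
    """No checks: the row is assumed to have passed Arrow-boundary validation."""
    return Cat(**{f.name: row.get(f.name) for f in fields(Cat)})


def validate_untrusted(row: Dict[str, Any]) -> Cat:
    """Check presence, nullability, and Python type before constructing."""
    for name, (accepted, nullable) in _CAT_CHECKS.items():
        value = row.get(name)
        if value is None:
            if not nullable:
                raise ValueError(f"{name}: required field is missing or null")
        # bool is a subclass of int, so reject it explicitly (no bool fields here).
        elif isinstance(value, bool) or not isinstance(value, accepted):
            raise ValueError(f"{name}: unexpected type {type(value).__name__}")
    return construct_trusted(row)


trusted_cat = construct_trusted({"id": 1, "name": "Tom"})
checked_cat = validate_untrusted({"id": 1, "name": "Tom", "lives": 9})
```

The trusted path does no per-field work beyond construction, which is the reason to prefer it once Arrow validation has already run on the whole table.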

Out of scope

Acceptance

  1. Dataclass/Pydantic required/optional primitive fields map deterministically to Arrow field type/nullability.
  2. Field aliases/metadata can override Arrow field name, dtype, nullability, and Graphistry metadata.
  3. Ambiguous annotations fail with clear override guidance.
  4. Arrow schema -> model -> Arrow round trip preserves field names, Arrow types, nullability, and Graphistry metadata where representable.
  5. Typed frame wrappers preserve access to the underlying dataframe via .df.
  6. Type narrowing is explicit and keyed by NodeType.name / EdgeType.name, not ambiguous label sets.
  7. Cat/Dog shared-label cases are covered: Animal can select a broader view while Cat narrows to Cat wrappers.
  8. Row/entity materialization works for dataclass and Pydantic adapter outputs.
  9. Heterogeneous results can use a generated discriminated union keyed by a Graphistry type discriminator.
  10. Docs clearly state that dataframe execution remains vectorized/columnar and wrappers are client ergonomics.
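
Criteria 6 and 9 can be illustrated with a small discriminated-union sketch. The discriminator column name "__gfql_type__", the registry, and the to_entity helper are all hypothetical; the point is only that dispatch uses the type name and ignores overlapping labels.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class Cat:
    id: int
    name: str
    lives: Optional[int] = None


@dataclass
class Dog:
    id: int
    name: str


Entity = Union[Cat, Dog]

DISCRIMINATOR = "__gfql_type__"          # hypothetical column name
_BY_TYPE_NAME = {"Cat": Cat, "Dog": Dog}  # keyed by stable type identity


def to_entity(row: Dict[str, Any]) -> Entity:
    """Pick the model from the discriminator, never from the label set."""
    type_name = row[DISCRIMINATOR]
    try:
        model = _BY_TYPE_NAME[type_name]
    except KeyError:
        raise LookupError(f"unknown type discriminator: {type_name!r}") from None
    # Drop columns that are not fields of the chosen model (labels, discriminator).
    payload = {k: v for k, v in row.items() if k in model.__dataclass_fields__}
    return model(**payload)


def describe(entity: Entity) -> str:
    # isinstance narrows the union for mypy/pyright.
    if isinstance(entity, Cat):
        return f"cat {entity.name} with {entity.lives} lives"
    return f"dog {entity.name}"


rows = [
    {"id": 1, "name": "Tom", "lives": 9, DISCRIMINATOR: "Cat", "labels": ("Animal", "Cat")},
    {"id": 2, "name": "Rex", DISCRIMINATOR: "Dog", "labels": ("Animal", "Dog")},
]
entities = [to_entity(r) for r in rows]
summaries = [describe(e) for e in entities]
```

Both rows carry the shared Animal label, yet each materializes to exactly one wrapper because only the discriminator is consulted.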

Notes

This intentionally bundles Python model adapters with typed dataframe/entity façades. The split is useful conceptually, but the executable UX should be designed together so the generated models have an immediate typed table/entity surface.
