
feat(gfql): add pretty_print_schema() + compact __repr__ for GraphSchema/NodeType/EdgeType (LLM-friendly) #1633

@lmeyerov

Description


Background

PR #1457 landed graphistry.schema.NodeType / EdgeType / GraphSchema / EdgeTopology as Python dataclasses. The default dataclass __repr__ produces deeply nested, verbose output such as GraphSchema(node_types=(NodeType(name='Person', properties={'age': ScalarType(name='int64', ...), .... The logical-type wrappers (ScalarType, EdgeRef, NodeRef, PathType) expand this to ~600 tokens even for a simple schema.

For LLM/AI-synthesizer consumption (the primary strategic consumer in the broader typed-schema effort), this costs 4-6× more context than a compact Cypher-pattern format. A schema like:

(:Person {id: int, name: str, age: int})
(:Company {id: int, name: str})
(:Person)-[:WORKS_AT {since: int}]->(:Company)
(:Company)-[:CONTRACTS {fee: int}]->(:Person)

conveys the same information in ~120 tokens. All the info is already in the GraphSchema dataclasses; only the formatter is missing.
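To make the size gap concrete, here is a minimal sketch using hypothetical stand-ins for NodeType and ScalarType. The real classes carry more fields (the "..." above), so their reprs are longer still. It compares the @dataclass-generated repr with a Cypher-pattern line; character length is only a rough proxy for token count.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScalarType:
    # Hypothetical stand-in for the logical-type wrapper (the real one has more fields).
    name: str


@dataclass
class NodeType:
    # Hypothetical stand-in: a label plus {property name: logical type}.
    name: str
    properties: Dict[str, ScalarType] = field(default_factory=dict)


def cypher_node(node: NodeType) -> str:
    # Render "(:Label {prop: type, ...})" using only the scalar type names.
    props = ", ".join(f"{k}: {t.name}" for k, t in node.properties.items())
    return f"(:{node.name} {{{props}}})"


person = NodeType(
    "Person",
    {"id": ScalarType("int64"), "name": ScalarType("string"), "age": ScalarType("int64")},
)
default_repr = repr(person)  # generated by @dataclass
compact = cypher_node(person)

print(default_repr)
print(compact)
print(f"{len(default_repr)} chars vs {len(compact)} chars")
```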

Goal

Add a public pretty-printer for GraphSchema that produces compact LLM-friendly representations. Override __repr__ to use the Cypher-pattern form by default.

Scope

  • Add GraphSchema.pretty(format: Literal['cypher', 'yaml', 'compact'] = 'cypher') -> str (and equivalents for NodeType, EdgeType, EdgeTopology)
  • Override __repr__ on GraphSchema / NodeType / EdgeType to call pretty('cypher') by default
  • Provide three output formats:
    • cypher: Cypher-pattern-style for LLM prompts (highest density, most discoverable)
    • yaml: indented YAML-shape for human debugging
    • compact: single-line summary (e.g., GraphSchema(3 node types, 2 edge types, 4 properties))
  • Top-level graphistry.schema.pretty_print_schema(schema, format=...) re-export for ergonomics
  • Document with examples in docstrings and on an RTD docs page
  • Anchored regression tests covering all three formats + each dataclass type's pretty output
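One way the scope above could fit together is sketched below: a format-keyed dispatch table behind pretty(), a __repr__ defined in the class body (which @dataclass then leaves alone instead of generating its own), and a thin top-level pretty_print_schema. The NodeType/EdgeType/GraphSchema classes are hypothetical stand-ins (the src/dst fields and plain-string property types are assumptions, not the #1457 shapes), pretty_print_schema is assumed to return the string rather than print it, and the 'yaml' formatter is omitted here for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Literal, Tuple

Format = Literal["cypher", "yaml", "compact"]


# Hypothetical stand-ins for graphistry.schema types; property types are
# plain strings here rather than ScalarType wrappers.
@dataclass
class NodeType:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class EdgeType:
    name: str
    src: str  # hypothetical field: source node label
    dst: str  # hypothetical field: destination node label
    properties: Dict[str, str] = field(default_factory=dict)


def _props(props: Dict[str, str]) -> str:
    # Render " {k: v, ...}", or nothing when the type has no properties.
    if not props:
        return ""
    return " {" + ", ".join(f"{k}: {v}" for k, v in props.items()) + "}"


def _cypher(schema: "GraphSchema") -> str:
    lines = [f"(:{n.name}{_props(n.properties)})" for n in schema.node_types]
    lines += [
        f"(:{e.src})-[:{e.name}{_props(e.properties)}]->(:{e.dst})"
        for e in schema.edge_types
    ]
    return "\n".join(lines)


def _compact(schema: "GraphSchema") -> str:
    # Assumption: "properties" counts node and edge properties together.
    n_props = sum(len(t.properties) for t in (*schema.node_types, *schema.edge_types))
    return (
        f"GraphSchema({len(schema.node_types)} node types, "
        f"{len(schema.edge_types)} edge types, {n_props} properties)"
    )


# A "yaml" formatter would be registered here the same way.
_FORMATTERS: Dict[str, Callable[["GraphSchema"], str]] = {
    "cypher": _cypher,
    "compact": _compact,
}


@dataclass
class GraphSchema:
    node_types: Tuple[NodeType, ...] = ()
    edge_types: Tuple[EdgeType, ...] = ()

    def pretty(self, format: Format = "cypher") -> str:
        try:
            formatter = _FORMATTERS[format]
        except KeyError:
            raise ValueError(f"unsupported format: {format!r}") from None
        return formatter(self)

    # Defining __repr__ in the class body stops @dataclass from generating one.
    def __repr__(self) -> str:
        return self.pretty("cypher")


def pretty_print_schema(schema: GraphSchema, format: Format = "cypher") -> str:
    # Top-level convenience wrapper; assumed to return the string, not print it.
    return schema.pretty(format)
```

With the four-line example schema from the Background section loaded into these stand-ins, pretty_print_schema(schema, 'compact') returns GraphSchema(2 node types, 2 edge types, 8 properties) under the counting assumption above. A dispatch table keeps adding a format to a one-line registration.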

Non-scope

  • No changes to the underlying GraphSchema data shape (formatter only)
  • No serialization (separate concern — to_json()/from_json() may follow but is not this issue)
  • No inference (separate #1338 lane)
  • No LLM-specific token-budget heuristics (the cypher format is the LLM-friendly default; consumers can choose)

Suggested Cypher-format output

(:Person {id: int64, name: string, age: int64})
(:Company {id: int64, name: string, founded: int64})
(:Person)-[:WORKS_AT {since: int64}]->(:Company)
(:Company)-[:CONTRACTS {fee: float64}]->(:Person)
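A minimal sketch of per-type Cypher rendering that reproduces the four lines above, assuming hypothetical NodeType/EdgeType stand-ins whose properties map names to dtype strings and whose edges carry src/dst labels (the real field names may differ). EdgeTopology is left out because its shape is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict


def _props(props: Dict[str, str]) -> str:
    # " {k: v, ...}" in insertion order, or "" when there are no properties.
    if not props:
        return ""
    return " {" + ", ".join(f"{k}: {v}" for k, v in props.items()) + "}"


@dataclass
class NodeType:
    # Hypothetical stand-in: label plus {property: dtype name}.
    name: str
    properties: Dict[str, str] = field(default_factory=dict)

    def pretty(self) -> str:
        return f"(:{self.name}{_props(self.properties)})"

    # Overrides the dataclass-generated repr; str() falls back to this too.
    def __repr__(self) -> str:
        return self.pretty()


@dataclass
class EdgeType:
    # Hypothetical stand-in: src/dst are the endpoint node labels.
    name: str
    src: str
    dst: str
    properties: Dict[str, str] = field(default_factory=dict)

    def pretty(self) -> str:
        return f"(:{self.src})-[:{self.name}{_props(self.properties)}]->(:{self.dst})"

    def __repr__(self) -> str:
        return self.pretty()
```

Strictly speaking, Cypher property maps hold values rather than types, so this is Cypher-flavored notation; the familiar pattern shape is what makes it cheap for an LLM to read.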

Suggested YAML-format output

nodes:
  Person:
    properties:
      id: int64
      name: string
      age: int64
  Company:
    properties:
      id: int64
      name: string
      founded: int64
relationships:
  WORKS_AT:
    from: Person
    to: Company
    properties:
      since: int64
  CONTRACTS:
    from: Company
    to: Person
    properties:
      fee: float64
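A dependency-free sketch of the YAML shape, again over hypothetical NodeType/EdgeType stand-ins (src/dst field names assumed). schema_to_yaml is an illustrative name, not an existing graphistry function. It emits lines by hand so key order and indentation match the sample exactly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class NodeType:
    # Hypothetical stand-in: label plus {property: dtype name}.
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class EdgeType:
    # Hypothetical stand-in: src/dst are the endpoint node labels.
    name: str
    src: str
    dst: str
    properties: Dict[str, str] = field(default_factory=dict)


def _prop_lines(props: Dict[str, str], indent: str) -> List[str]:
    # Empty mappings are written as {} so the key is not parsed as null.
    if not props:
        return [f"{indent}properties: {{}}"]
    return [f"{indent}properties:"] + [f"{indent}  {k}: {v}" for k, v in props.items()]


def schema_to_yaml(node_types: Sequence[NodeType], edge_types: Sequence[EdgeType]) -> str:
    lines = ["nodes:" if node_types else "nodes: {}"]
    for n in node_types:
        lines.append(f"  {n.name}:")
        lines += _prop_lines(n.properties, "    ")
    lines.append("relationships:" if edge_types else "relationships: {}")
    for e in edge_types:
        lines += [f"  {e.name}:", f"    from: {e.src}", f"    to: {e.dst}"]
        lines += _prop_lines(e.properties, "    ")
    return "\n".join(lines)
```

Emitting lines by hand rather than calling PyYAML's yaml.safe_dump (which sorts keys unless sort_keys=False is passed) avoids a dependency and keeps the output stable for anchored tests. Two caveats of this sketch: names are assumed to be plain identifiers that need no quoting, and keying relationships by edge name means two edge types with the same label but different endpoints would produce duplicate keys, so a real formatter might key by (from, name, to) or emit a list.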

Acceptance

  • schema.pretty() (or repr(schema)) returns the Cypher-format string
  • schema.pretty('yaml') returns YAML-shape
  • schema.pretty('compact') returns single-line summary
  • NodeType.pretty(), EdgeType.pretty(), EdgeTopology.pretty() work analogously
  • Anchored regression tests confirm stable output across format choices
  • Public docs include usage examples
  • Experimental marking preserved per #1457
  • Compiler-plan surface touched: no
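For the anchored regression tests, one option is a table of golden strings keyed by format, plus a guard that every value of the format Literal has an anchor (via typing.get_args). The NodeType below is a hypothetical minimal subject under test, and its yaml and compact renderings are illustrative placeholders rather than agreed output. The test_* functions run under pytest or can be called directly.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, get_args

Format = Literal["cypher", "yaml", "compact"]


@dataclass
class NodeType:
    # Hypothetical minimal subject under test.
    name: str
    properties: Dict[str, str] = field(default_factory=dict)

    def pretty(self, format: Format = "cypher") -> str:
        props = ", ".join(f"{k}: {v}" for k, v in self.properties.items())
        if format == "cypher":
            return f"(:{self.name} {{{props}}})" if props else f"(:{self.name})"
        if format == "yaml":
            body = "".join(f"\n    {k}: {v}" for k, v in self.properties.items())
            return f"{self.name}:\n  properties:{body}"
        if format == "compact":
            return f"NodeType({self.name}, {len(self.properties)} properties)"
        raise ValueError(f"unsupported format: {format!r}")


# Anchored (golden) expectations: any formatter change must update these deliberately.
GOLDEN: Dict[str, str] = {
    "cypher": "(:Company {id: int64, name: string})",
    "yaml": "Company:\n  properties:\n    id: int64\n    name: string",
    "compact": "NodeType(Company, 2 properties)",
}


def test_every_format_is_anchored() -> None:
    # Fails if a format is added to the Literal without a golden string.
    assert set(GOLDEN) == set(get_args(Format))


def test_node_type_golden_outputs() -> None:
    node = NodeType("Company", {"id": "int64", "name": "string"})
    for fmt, expected in GOLDEN.items():
        assert node.pretty(fmt) == expected, fmt
```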

Cross-refs

  • Landed in #1457 / #1337
  • Surfaced by AI-synthesizer user-testing 2026-05-25 (P1 finding — explicit user-requested gap "clean pretty printer")
  • Coordination: pygraphistry2/'s #1338 inference may add presence/nullability info that the formatter should surface; coordinate on field representation
  • Metaissue: #1058, #1046
  • Downstream consumer: graphistrygpt/'s plugins/graphistry/tool.py and any LLM-tool path; relates to pygraphistry#1326/#1590 schema-artifact JSON exports (the compact text format is the human/LLM analog)

Effort

Small-to-medium (~150 prod LOC + ~80 tests). Self-contained, no cross-file ripple. Fits any paused pygraphistry lane.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
