perf: graphistry wheel size (625 kB) — GFQL engine files are large, investigate modularization #1058

@lmeyerov

Observation

Installing graphistry 0.53.16 from PyPI downloads a 625 kB wheel (2.75 MB uncompressed).

What's NOT the problem

Tests: not in wheel. graphistry/tests/ has no top-level __init__.py so find_packages() skips it entirely. Confirmed by inspecting the actual built wheel (212 files, 0 test files).
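For anyone who wants to re-run the wheel check, here is a minimal sketch. It relies only on the fact that a wheel is a zip archive. The helper name `wheel_file_summary` and the `test_markers` heuristic are assumptions for illustration, not part of the graphistry build tooling.

```python
import zipfile


def wheel_file_summary(wheel, test_markers=("/tests/", "/test_")):
    """Return (total_files, test_files) for a wheel path or file-like object.

    Hypothetical helper: the marker substrings are a heuristic, not a
    definition of what counts as a test file.
    """
    with zipfile.ZipFile(wheel) as zf:
        # Skip directory entries; only count real files.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    # Prefix "/" so that a top-level "tests/..." entry also matches "/tests/".
    test_files = [n for n in names if any(m in "/" + n for m in test_markers)]
    return len(names), test_files
```

Pointing it at a locally built or downloaded wheel file should give the file count and any test paths that slipped in.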

Deps are already clean. Confirmed in Docker (python:3.12-slim):

| Use case | Required deps |
| --- | --- |
| import graphistry + basic GFQL | pandas, numpy, requests, pyarrow, typing_extensions, packaging |
| Cypher string GFQL | + lark (already lazy: only imported on first parse) |
| squarify, palettable, scipy, sklearn, igraph, cugraph | not imported unless explicitly used |

squarify is being eliminated separately.
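One way to spot-check the lazy-import claims is to import a package in a fresh interpreter and look at `sys.modules`. This is a sketch, not the method used for the Docker check above; `modules_loaded_after_import` is a hypothetical helper name.

```python
import subprocess
import sys


def modules_loaded_after_import(target, candidates):
    """Import `target` in a fresh interpreter and report which candidates got loaded."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        # Marker prefix so output printed by the target on import is ignored.
        f"print('LOADED:' + ' '.join(m for m in {list(candidates)!r} if m in sys.modules))\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", code], check=True, capture_output=True, text=True
    ).stdout
    loaded = set()
    for line in out.splitlines():
        if line.startswith("LOADED:"):
            loaded = set(line[len("LOADED:"):].split())
    return {m: m in loaded for m in candidates}
```

With graphistry installed, something like `modules_loaded_after_import("graphistry", ["lark", "scipy", "sklearn", "igraph"])` would show whether any of the optional deps are pulled in eagerly.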

What IS the problem

GFQL/Cypher engine files are large and may be bloated. They have grown organically without modularization passes:

| File | Uncompressed | Lines |
| --- | --- | --- |
| graphistry/compute/gfql/cypher/lowering.py | 294 KB | 8,212 |
| graphistry/compute/gfql/row/pipeline.py | 181 KB | 3,976 |
| graphistry/PlotterBase.py | 150 KB | 3,768 |
| graphistry/feature_utils.py | 112 KB | 3,097 |
| graphistry/pygraphistry.py | 101 KB | 2,653 |
| graphistry/compute/gfql/cypher/parser.py | 84 KB | 1,949 |
| graphistry/compute/gfql/temporal_text.py | 74 KB | 2,073 |
| graphistry/compute/ast.py | 66 KB | 1,701 |
| graphistry/compute/gfql_unified.py | 58 KB | 1,548 |
| graphistry/compute/chain.py | 50 KB | 1,227 |

The GFQL-specific files alone (lowering, pipeline, parser, temporal_text, ast, gfql_unified, chain) total ~808 KB uncompressed, ~29% of the 2.75 MB uncompressed wheel.
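A size table like the one above can be regenerated from any built wheel with a short script. This is a sketch assuming the wheel is available locally; `largest_members` is a hypothetical helper, not an existing graphistry utility.

```python
import zipfile


def largest_members(wheel, top=10):
    """List the `top` largest files in a wheel by uncompressed size.

    Returns (filename, size_in_bytes, share_of_total_uncompressed) tuples.
    """
    with zipfile.ZipFile(wheel) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
    total = sum(i.file_size for i in infos)
    ranked = sorted(infos, key=lambda i: i.file_size, reverse=True)[:top]
    return [
        (i.filename, i.file_size, i.file_size / total if total else 0.0)
        for i in ranked
    ]
```

Re-running it after each refactor gives a quick before/after comparison of per-file size and share.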

Scope

  1. Audit lowering.py (8,212 lines) and pipeline.py (3,976 lines) for dead code, duplication, or extractable sub-modules
  2. Same for temporal_text.py and ast.py
  3. Modularization into focused sub-files improves maintainability regardless of wheel impact
  4. Separately: PlotterBase.py and feature_utils.py are non-GFQL large files worth a similar pass
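As a starting point for the audit, ranking top-level functions and classes by line count can show where a large file concentrates its bulk. A minimal sketch using the standard-library `ast` module; `largest_definitions` is a hypothetical helper, and line count is only a rough proxy for what is worth extracting.

```python
import ast


def largest_definitions(source, top=10):
    """Rank top-level functions and classes in Python source by line count."""
    tree = ast.parse(source)
    sizes = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is the def/class line; decorators above it are not counted.
            sizes.append((node.name, node.end_lineno - node.lineno + 1))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Usage would be along the lines of reading `graphistry/compute/gfql/cypher/lowering.py` as text and passing it in; it does not detect dead code or duplication, which would need separate tooling.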

Not urgent: 625 kB downloads in ~18 ms on a typical connection (that figure implies roughly 280 Mbit/s). Worth doing as engineering hygiene when touching these files for other reasons.
