Skip to content

Latest commit

 

History

History
141 lines (116 loc) · 6.23 KB

File metadata and controls

141 lines (116 loc) · 6.23 KB

ZGraph — Unified Master Plan (Reconciled)

This document merges the current repository state with the idealized goals and becomes the single source of truth. Items are marked as:

  • ✅ Done
  • 🚧 Partial / scaffolding present
  • ⭕ Not started

A. Semantics & Types

  • ✅ A1. Compile-time graph semantics (directed, weighted) across storages
  • ✅ A2. Self-loops & multi-edges covered by tests
  • ⭕ A3. Attributes (node/edge KV with typed values)
  • ✅ A4. Numeric node IDs (u64/usize) in use
  • ⭕ A5. Storage-agnostic iterators (common trait iterators)

B. Storage Backends (in-memory)

  • ✅ B1. AdjacencyList: CRUD + neighbor ops + tests
  • ✅ B2. AdjacencyMatrix: CRUD + present bitset + tests
  • ✅ B3. IncidenceMatrix: rows, sign/weight semantics, edge_set, parallel builder, tests
  • ⭕ B4. CSR / ReverseCSR indices (standalone, optional overlays)
  • ⭕ B5. Attribute storage (typed, columnar, interning)

C. Common Trait Surface

  • ✅ C1. GraphStorage(type, directed, weighted) factory
  • ✅ C2. Core ops: init/deinit/add/remove/has/getNeighbors
  • ⭕ C3. Attribute API (node/edge getters/setters)
  • ⭕ C4. Iteration policy & attribute filters

D. Conversion Strategies

  • ⭕ D1–D7. Strategies + API + tests (Duplicate, Streamed, Chunked, CopyOnWrite)

E. File Formats

  • 🚧 E1. .zgraph text spec documented; runtime reader/writer pending
  • 🚧 E2. .zgraphb binary spec documented; runtime reader/writer pending

F. Algorithms

  • 🚧 F1. Traversal (BFS/DFS/CC) — source files exist; unify over trait; add tests
  • 🚧 F2. Shortest paths (Dijkstra/Bellman-Ford/Floyd-Warshall) — wire and test
  • 🚧 F3. Connectivity (SCC) — add Tarjan/Kosaraju; tests
  • 🚧 F4. Flow (Edmonds-Karp present; add Dinic); tests
  • 🚧 F5. Centrality (PageRank present; add Betweenness); tests
  • 🚧 F6. Spectral (Laplacian, eigen routines present); tests & trait integration

G. Indices & Acceleration

  • ⭕ G1. CSR build/use
  • ⭕ G2. Reverse CSR (directed)
  • ⭕ G3. Degree table
  • ⭕ G4. Index persistence in .zgraphb optional blocks
  • ⭕ G5. Rebuild-on-demand hooks

H. Attributes

  • ⭕ H1. Node & Edge KV (int|float|bool|string)
  • ⭕ H2. Storage-agnostic attribute map with typed accessors
  • ⭕ H3. Text↔Binary column mapping (string table)
  • ⭕ H4. Attribute filters in iterators
  • ⭕ H5. Tests for mixed types, missing values, interning

I. I/O & CLI

  • ⭕ I1. CLI subcommands: convert, validate, build-index, stats
  • ⭕ I2. Streaming I/O (chunked) for both formats
  • ⭕ I3. zstd dictionaries & auto-tuning

J. Concurrency & Memory

  • ✅ J0. IncidenceMatrix parallel builder + mutex guarding
  • ⭕ J1. Threading doc & invariants
  • ⭕ J2. Parallel builders for CSR & conversions where safe
  • ⭕ J3. Allocator strategy knobs (GPA/Arena/Page) + docs
  • ⭕ J4. Zero-copy .zgraphb readers (mmap)
  • ⭕ J5. Stress tests (OOM behavior)

K. Testing & Quality

  • ✅ K1. Unit tests for storages
  • ⭕ K2. Property tests (round-trips, cross-storage equivalence)
  • ⭕ K3. Fuzzers for text/binary parsers
  • ⭕ K4. Benchmarks (micro/macro) and tracked results
  • ⭕ K5. CI (Win/macOS/Linux; Zig 0.16.x)
  • ⭕ K6. zig fmt + static checks gates

L. Documentation

  • ✅ L1. README
  • 🚧 L2. Detailed specs for .zgraph/.zgraphb (needs sync with runtime)
  • ⭕ L3. Algorithm docs and complexity notes
  • ⭕ L4. Conversion strategy doc (memory math & decision table)
  • ⭕ L5. Perf tuning guide (allocators, cache, zstd params)
  • ⭕ L6. Roadmap milestones mapping to this checklist

Execution Order (Do This Next, Exactly)

Phase 1 — Lock the Core Interfaces

  1. Define common trait surface (GraphLike) with adapters over existing storages:
    • neighbors(u), hasEdge(u,v), weight(u,v)?, nodeCount(), edgeCount()
  2. Iterator policy: stable order + attribute filter hooks; basic NodeIter, EdgeFromIter

Phase 2 — Attributes (foundation for formats)

  1. Typed attribute store (node & edge): int|float|bool|string + string interning
  2. Attribute API on the trait surface + tests
  3. Iterator filters using attributes

Phase 3 — File Formats (runtime)

  1. .zgraph reader/writer (streaming CSV-ish with schemas; tolerant of unknown sections); round-trip tests
  2. .zgraphb reader/writer (chunked blocks + zstd + mmap-friendly); round-trip & parity tests
  3. Text↔Binary parity: load(text)->save(binary)->load(binary) == load(text)

Phase 4 — Conversion Strategies (memory-bounded)

  1. convertStorage API + Duplicate strategy
  2. Streamed strategy (destroy-as-you-go option; memory ceiling tests)
  3. Chunked strategy (node sharding; parallel)
  4. Copy-On-Write adapter (lazy migration) + background compaction
  5. Invariants & ceilings property tests

Phase 5 — Indices & Acceleration

  1. CSR/ReverseCSR build & use as optional overlays for any storage
  2. Degree table and index persistence blocks in .zgraphb
  3. Rebuild-on-demand hooks

Phase 6 — Algorithms (unify + verify)

  1. Traversal (BFS/DFS/CC) over trait + CSR; tests across storages
  2. Shortest paths (Dijkstra/BF/FW) with layout decision table; tests
  3. Connectivity/SCC (Tarjan/Kosaraju); tests
  4. Flow (finish Dinic); tests
  5. Centrality (PageRank + Betweenness); tests
  6. Spectral (laplacian, eigen); tests & perf

Phase 7 — CLI, Streaming I/O, Docs, Quality Gates

  1. CLI subcommands: convert, validate, build-index, stats
  2. Streaming I/O for both formats + zstd dictionaries
  3. Documentation: conversion strategies, perf tuning, thread model; keep specs synced
  4. Property tests, fuzzers, benches; CI on all platforms; formatting/static checks gates

Acceptance Criteria (per phase, condensed)

  • Formats: Round-trips preserve counts, attributes, weights; unknown sections/blocks ignored.
  • Conversion: Peak memory measured below ceiling; equivalence of neighbor sets/weights post-convert.
  • Indices: CSR neighbors == storage neighbors; persisted indices reload correctly.
  • Algorithms: Identical results across storages (given same semantics); perf within expected bounds.
  • CLI: Non-zero exit on invalid files; stats match library queries; build-index regenerates CSR.
  • Docs: Examples compile; decision tables match implemented behavior.