Merged
100 changes: 61 additions & 39 deletions CLAUDE.md
@@ -113,26 +113,31 @@ PYTEST_WORKERS=2 make test

## Architecture: Unified API

GraphForge exposes four analyst-intent methods that share a single Parquet-backed storage layer.
GraphForge exposes seven analyst-intent methods that share a single Parquet-backed storage layer.

```
forge.execute("MATCH ...") → Cypher path (4-layer pipeline below)
forge.rank("Person", ...) → Algorithm path (bypasses parser/planner/executor)
forge.cluster("Person", ...) → Algorithm path (bypasses parser/planner/executor)
forge.find("query", ...) → Search path (bypasses parser/planner/executor)
forge.execute("MATCH …") → Cypher path (4-layer pipeline below)
forge.rank("Person", by=…) → Algorithm path (centrality, structural scoring)
forge.cluster("Person", by=…) → Algorithm path (community detection, components)
forge.paths(alice, bob, by=…) → Algorithm path (shortest paths, flow, reachability)
forge.analyze(by=…) → Algorithm path (DAG, coloring, spanning trees, embeddings)
forge.similar("Person", by=…) → Algorithm path (pairwise node similarity)
forge.find("query", …) → Search path (text + vector hybrid search)
```

**ALL methods return Apache Arrow Tables.** There are no `CypherValue` wrappers, no `.value` calls, no separate result types. The conventional instance name is `forge`, not `db`.

Full algorithm catalog: `docs/architecture/algorithms.md`

### Cypher Path (forge.execute)

```
User Code
┌─────────────────────────────────────────────────┐
│ 1. Parser (crates/gf-cypher/) │
│ - LALRPOP grammar
│ - Logos lexer
│ - Hand-written Tok lexer (lexer.rs)
│ - Recursive-descent + Pratt parser
│ - → AST with spans │
└─────────────────────────────────────────────────┘
@@ -158,35 +163,34 @@ User Code
Arrow RecordBatch stream → pyarrow.Table
```
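
The parser stage in the diagram above names a recursive-descent + Pratt design. As a rough illustration of the Pratt half only, the sketch below parses infix expressions by binding power. It is written in Python for brevity and is not the `gf-cypher` code: `tokenize`, `parse_expr`, `parse`, and the `BINDING_POWER` table are hypothetical names, and the operator set is a small assumed subset of Cypher.

```python
import re

# Hypothetical binding powers as (left, right) pairs; higher binds tighter.
# right > left makes each operator left-associative.
BINDING_POWER = {
    "OR": (1, 2),
    "AND": (3, 4),
    "=": (5, 6),
    "+": (7, 8),
    "-": (7, 8),
    "*": (9, 10),
    "/": (9, 10),
}

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()=+\-*/])")


def tokenize(src):
    """Split source into integer, identifier/keyword, and operator tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_expr(tokens, min_bp=0):
    """Parse one atom, then fold in infix operators whose left binding
    power is at least min_bp."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        lhs = parse_expr(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    elif tok.isdigit():
        lhs = int(tok)
    elif tok.upper() in BINDING_POWER or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    else:
        lhs = tok  # variable reference

    while tokens:
        op = tokens[0].upper()
        if op not in BINDING_POWER:
            break
        left_bp, right_bp = BINDING_POWER[op]
        if left_bp < min_bp:
            break
        tokens.pop(0)
        rhs = parse_expr(tokens, right_bp)
        lhs = (op, lhs, rhs)
    return lhs


def parse(src):
    """Parse a whole expression and reject trailing tokens."""
    tokens = tokenize(src)
    tree = parse_expr(tokens)
    if tokens:
        raise SyntaxError(f"trailing token {tokens[0]!r}")
    return tree
```

With this table, `parse("1 + 2 * 3")` groups the multiplication first, and `parse("1 - 2 - 3")` nests to the left. Adding an operator in this style means adding one table entry rather than a new grammar rule.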

### Algorithm Path (forge.rank / forge.cluster)
### Algorithm Path (forge.rank / forge.cluster / forge.paths / forge.analyze / forge.similar)

`forge.rank()` and `forge.cluster()` bypass the Cypher pipeline entirely. They live in `crates/gf-exec/` (Rust) and `src/graphforge/algorithms/` (Python 0.4.x shim).
All five analyst verbs bypass the Cypher pipeline entirely. They live in `crates/gf-exec/src/algorithms/` (Rust) and `src/graphforge/algorithms/` (Python 0.4.x shim).

```
forge.rank(label, by=..., via=..., directed=..., write_property=...)
_export_adjacency(label, via, directed) → graph representation
_resolve_backend() → native Rust → igraph → NetworkX
export_adjacency(label, via, directed) → AdjacencyGraph
backend.compute(G, algorithm) → Arrow Table (node props + score)
AlgorithmBackend::rank(graph, alg) → native Rust → igraph → NetworkX
if write_property: set_node_properties(...) ← opt-in only
Arrow Table returned
Arrow Table (node props + score) returned
```

**rank() by= values:** `pagerank` | `betweenness` | `closeness` | `degree` | `clustering_coefficient` | `triangles`
**cluster() by= values:** `louvain` | `components`
**rank() by= values:** see `docs/architecture/algorithms.md` — centrality, structural, link prediction
**cluster() by= values:** community detection, components, decomposition
**paths() by= values:** BFS, Dijkstra, A*, Bellman-Ford, Floyd-Warshall, Yen's, max flow, min cut, etc.
**analyze() by= values:** spanning trees, DAG analysis, coloring, matching, Eulerian, planarity, embeddings
**similar() by= values:** node_similarity, KNN, cosine

Both accept: `label` (node label), `via` (relationship type filter), `directed` (bool), `write_property` (opt-in mutation).
All accept: `via` (relationship type filter), `directed` (bool), `write_property` (opt-in mutation).
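
The backend fallback order shown in the flow above (native Rust → igraph → NetworkX) and the opt-in nature of `write_property` can be sketched as follows. This is a pure-Python illustration under stated assumptions: `Backend`, `resolve_backend`, `rank`, the `node_properties` dict, and the toy `degree` scorer are hypothetical, and the sketch returns a list of dicts where the real verbs return an Arrow Table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Adjacency = Dict[str, List[str]]
Scorer = Callable[[Adjacency], Dict[str, float]]


@dataclass
class Backend:
    """Hypothetical backend: a name, an availability flag, and the
    algorithms it implements."""
    name: str
    available: bool
    algorithms: Dict[str, Scorer] = field(default_factory=dict)


def degree(graph):
    """Toy scorer: out-degree of each node."""
    return {node: float(len(neighbours)) for node, neighbours in graph.items()}


def resolve_backend(backends, algorithm):
    """Return the first available backend, in priority order, that
    implements the requested algorithm."""
    for backend in backends:
        if backend.available and algorithm in backend.algorithms:
            return backend
    raise ValueError(f"no backend supports by={algorithm!r}")


def rank(graph, by, backends, node_properties, write_property=None):
    """Score nodes; mutate node_properties only when write_property is set."""
    backend = resolve_backend(backends, by)
    scores = backend.algorithms[by](graph)
    if write_property is not None:  # opt-in mutation
        for node, score in scores.items():
            node_properties.setdefault(node, {})[write_property] = score
    return [{"node": n, "score": s} for n, s in sorted(scores.items())]


# Priority order mirrors the documented chain; availability is made up.
BACKENDS = [
    Backend("native", available=False, algorithms={"degree": degree}),
    Backend("igraph", available=True, algorithms={}),
    Backend("networkx", available=True, algorithms={"degree": degree}),
]
GRAPH = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": []}
```

Here the native backend is unavailable and the igraph backend lacks the algorithm, so resolution falls through to the last entry. Calling `rank` without `write_property` leaves the property store untouched.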

#### Adding a rank/cluster algorithm
#### Adding an algorithm

1. Implement in `crates/gf-exec/src/algorithms/<category>.rs` (Rust) or `src/graphforge/algorithms/<category>.py` (0.4.x shim)
2. Add dispatch branch in `_dispatch.py` / `gf-exec` for each supported backend
3. Expose as a `by=` value on `forge.rank()` or `forge.cluster()`
4. Tests: unit test per backend + integration test via `forge.rank(label, by="<name>")`
1. Implement in `crates/gf-exec/src/algorithms/<category>.rs`
2. Add variant to the relevant `*Algorithm` enum and `AlgorithmBackend` dispatch
3. Expose as a `by=` value on the appropriate verb
4. Tests: unit test per backend + integration test via the verb

### Search Path (forge.find)

@@ -243,13 +247,16 @@ Current recipes:

### Design Constraint: No Cypher Extensions

**`forge.rank()`, `forge.cluster()`, and `forge.find()` do not modify the Cypher grammar.** If you are adding a feature and find yourself editing the LALRPOP grammar, you are in the wrong path.
**The analyst verbs do not modify the Cypher grammar.** If you are adding a feature and find yourself editing the parser, you are in the wrong path.

| Feature type | Entry point | Touches parser? |
|---|---|---|
| New Cypher clause or function | `forge.execute` | Yes — all 4 layers |
| New ranking/centrality algorithm | `forge.rank` | No |
| New community algorithm | `forge.cluster` | No |
| New centrality / structural algorithm | `forge.rank` | No |
| New community / component algorithm | `forge.cluster` | No |
| New path / flow algorithm | `forge.paths` | No |
| New graph-level metric or embedding | `forge.analyze` | No |
| New similarity measure | `forge.similar` | No |
| New search capability | `forge.find` | No |

### Key Principles
@@ -264,7 +271,7 @@ Current recipes:

**You must modify ALL four layers:**

1. **Parser:** Add grammar to LALRPOP grammar + token definitions
1. **Parser:** Add to recursive-descent clause parser and/or Pratt expression parser in `crates/gf-cypher/src/parser/`
2. **AST:** Add/modify structs in `crates/gf-ast/src/`
3. **Graph IR:** Add `GraphOp` variant in `crates/gf-ir/src/`
4. **Relational lowering + executor:** Implement in `crates/gf-rel/` and `crates/gf-exec/`
@@ -392,7 +399,7 @@ def test_match_returns_nodes():
    assert table.column("name")[0].as_py() == "Alice"
```

**Integration tests for rank/cluster/find:**
**Integration tests for analyst verbs:**
```python
def test_rank_returns_arrow_table():
    forge = GraphForge()
@@ -401,6 +408,12 @@ def test_rank_returns_arrow_table():
    assert "score" in table.schema.names
    assert table.num_rows == 2

def test_paths_returns_cost():
    forge = GraphForge()
    forge.execute("CREATE (:P {name:'A'})-[:R]->(:P {name:'B'})")
    table = forge.paths({"label": "P", "name": "A"}, by="bfs")
    assert "cost" in table.schema.names

def test_find_lazy_index():
    forge = GraphForge()
    forge.execute("CREATE (:Paper {title: 'Graph Neural Networks'})")
@@ -454,12 +467,13 @@ def test_create_node():

## Common Development Tasks

### Adding a forge.rank() Algorithm
### Adding an Algorithm to Any Verb

1. Implement in `crates/gf-exec/src/algorithms/<category>.rs`
2. Add dispatch for each supported backend (native → igraph → NetworkX)
3. Expose as a `by=` value on `forge.rank()` in `crates/gf-bindings-py/`
4. Tests: unit test per backend + integration test via `forge.rank(label, by="<name>")`
2. Add a variant to the appropriate `*Algorithm` enum
3. Add dispatch to `AlgorithmBackend` for each supported backend (native → igraph → NetworkX)
4. Expose as a `by=` value on the correct verb (`rank`, `cluster`, `paths`, `analyze`, or `similar`)
5. Tests: unit test per backend + integration test via the verb

### Adding a forge.find() Capability

@@ -476,7 +490,7 @@ def test_create_node():

### Adding a New Cypher Clause

1. Grammar: Add to LALRPOP grammar in `crates/gf-cypher/src/`
1. Parser: Add to `crates/gf-cypher/src/parser/clauses.rs` (clause dispatch) or `expr.rs` (expression)
2. AST: Add struct to `crates/gf-ast/src/`
3. Graph IR: Add `GraphOp` variant to `crates/gf-ir/src/`
4. Relational lowering: Implement in `crates/gf-rel/src/`
@@ -485,15 +499,16 @@ def test_create_node():

### Adding a New Cypher Function

1. Grammar: Add to `function_call` rule in LALRPOP grammar
1. Parser: Add to `function_call` dispatch in `crates/gf-cypher/src/parser/expr.rs`
2. Evaluator: Implement in `crates/gf-exec/src/evaluator.rs`
3. Tests: Unit + integration tests

### Fixing Parser Issues

1. Check grammar: `crates/gf-cypher/src/cypher.lalrpop`
2. Check token definitions: `crates/gf-cypher/src/lexer.rs`
3. Debug with: `forge.explain("MATCH (n) RETURN n", stage="ast")`
1. Check clause parser: `crates/gf-cypher/src/parser/clauses.rs`
2. Check expression parser: `crates/gf-cypher/src/parser/expr.rs`
3. Check token definitions: `crates/gf-cypher/src/lexer.rs`
4. Debug with: `forge.explain("MATCH (n) RETURN n", stage="ast")`

### Fixing Executor Issues

@@ -517,13 +532,20 @@ print(plan) # Shows AST → GraphIR → LogicalPlan → PhysicalPlan
- `src/graphforge/api.py` — Python 0.4.x shim (reference implementation)

### Rust Crates
- `crates/gf-cypher/` — LALRPOP parser + Logos lexer
- `crates/gf-cypher/` — hand-written Tok lexer + recursive-descent/Pratt parser
- `crates/gf-ast/` — AST structs with spans
- `crates/gf-ir/` — Graph IR (GraphPlan, GraphOp, ExprArena)
- `crates/gf-rel/` — Graph IR → DataFusion LogicalPlan lowering
- `crates/gf-exec/` — execution session, algorithms, search
- `crates/gf-storage/` — Parquet StorageProvider
- `crates/gf-provenance/` — confidence + lineage models
- `crates/gf-bindings-uniffi/` — UniFFI shared binding for Swift + Kotlin

### Language Bindings
- `crates/gf-bindings-py/` — PyO3 + maturin Python binding
- `crates/gf-bindings-node/` — napi-rs Node/TypeScript binding
- `bindings/swift/` — Swift Package Manager package (UniFFI-generated)
- `bindings/kotlin/` — Gradle/Kotlin package (UniFFI-generated)

### Python 0.4.x Shim (src/graphforge/)
- `src/graphforge/api.py` — GraphForge class
50 changes: 1 addition & 49 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 15 additions & 0 deletions crates/gf-ast/src/ast.rs
@@ -60,6 +60,7 @@ pub enum AstClause {
    Delete(DeleteClause),
    Unwind(UnwindClause),
    Call(CallClause),
    Union(UnionClause),
}

impl AstClause {
@@ -78,6 +79,7 @@ impl AstClause {
            Self::Delete(c) => c.span,
            Self::Unwind(c) => c.span,
            Self::Call(c) => c.span,
            Self::Union(c) => c.span,
        }
    }
}
@@ -279,6 +281,17 @@ pub struct CallClause {
    pub span: Span,
}

// ---------------------------------------------------------------------------
// UNION
// ---------------------------------------------------------------------------

/// A `UNION` or `UNION ALL` clause joining two query halves.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct UnionClause {
    pub all: bool,
    pub span: Span,
}

// ---------------------------------------------------------------------------
// Path patterns
// ---------------------------------------------------------------------------
@@ -544,6 +557,8 @@ pub struct FunctionCall {
    /// Namespace-qualified name, e.g. `["apoc", "coll", "sum"]`.
    pub name: Vec<String>,
    pub distinct: bool,
    /// `true` when called as `f(*)` — `args` is empty in this case.
    pub star: bool,
    pub args: Vec<Expr>,
    pub span: Span,
}
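
The `star` flag lets later stages tell `count(*)` apart from a call with an explicit argument. In Cypher, `count(*)` counts rows, while `count(expr)` skips rows where the expression evaluates to null. The sketch below mirrors the `FunctionCall` fields in a Python dataclass to show why the distinction matters. `evaluate_count`, the row representation, and the use of plain column names for `args` are hypothetical simplifications, not the project's evaluator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FunctionCall:
    """Simplified mirror of the AST struct; args are column names here
    instead of full expressions (an assumption for this sketch)."""
    name: List[str]
    distinct: bool = False
    star: bool = False
    args: List[str] = field(default_factory=list)


def evaluate_count(call: FunctionCall, rows: List[Dict[str, Any]]) -> int:
    """Evaluate count(*), count(x) or count(DISTINCT x) over rows."""
    if call.name != ["count"]:
        raise ValueError("this sketch only handles count")
    if call.star:
        if call.args:
            raise ValueError("f(*) must not carry explicit args")
        return len(rows)  # every row counts, nulls included
    if len(call.args) != 1:
        raise ValueError("count takes exactly one argument")
    column = call.args[0]
    values = [row.get(column) for row in rows]
    values = [v for v in values if v is not None]  # nulls are skipped
    if call.distinct:
        values = set(values)  # requires hashable values
    return len(values)


ROWS = [{"n": "a"}, {"n": None}, {"n": "a"}]
```

Without the flag, an empty `args` list would be ambiguous between `count(*)` and a malformed `count()`.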
2 changes: 1 addition & 1 deletion crates/gf-ast/src/lib.rs
@@ -21,7 +21,7 @@ pub use ast::{
    Literal, MapLiteral, MatchClause, MergeClause, NodePattern, OrderByClause, ParamRef,
    PathElement, PathPattern, PatternComprehension, PropertyAccess, RelPattern, RemoveClause,
    RemoveItem, ReturnClause, ReturnItem, SetClause, SetItem, SortItem, SortOrder, StringOpKind,
    UnaryOp, UnaryOpKind, UnwindClause, VarRef, WhenClause, WhereClause, WithClause,
    UnaryOp, UnaryOpKind, UnionClause, UnwindClause, VarRef, WhenClause, WhereClause, WithClause,
};
pub use gf_core::Span;
pub use parse_error::{ParseError, ParseErrorKind};
3 changes: 3 additions & 0 deletions crates/gf-ast/src/parse_error.rs
@@ -19,6 +19,8 @@ pub enum ParseErrorKind {
    },
    /// A string literal was opened but never closed.
    UnterminatedString,
    /// A block comment was opened with `/*` but never closed with `*/`.
    UnterminatedBlockComment,
    /// An integer or float literal could not be parsed.
    InvalidNumericLiteral,
    /// A `$` parameter prefix was not followed by a valid name or index.
@@ -58,6 +60,7 @@ impl std::fmt::Display for ParseError {
                format!("unexpected token '{found}'")
            }
            ParseErrorKind::UnterminatedString => "unterminated string".to_owned(),
            ParseErrorKind::UnterminatedBlockComment => "unterminated block comment".to_owned(),
            ParseErrorKind::InvalidNumericLiteral => "invalid numeric literal".to_owned(),
            ParseErrorKind::InvalidParameter => "invalid parameter".to_owned(),
            ParseErrorKind::UnexpectedEof { .. } => "unexpected end of input".to_owned(),
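
The new `UnterminatedBlockComment` kind covers a `/*` that is never closed by `*/`. One way a hand-written lexer might detect this while skipping trivia is sketched below in Python. The function name `skip_trivia`, the exception class, and the choice of non-nesting comments are assumptions for illustration, not the behaviour of `lexer.rs`.

```python
class UnterminatedBlockComment(Exception):
    """Raised when `/*` has no matching `*/`; start is the offset of `/*`."""

    def __init__(self, start):
        super().__init__(f"unterminated block comment starting at offset {start}")
        self.start = start


def skip_trivia(src, pos=0):
    """Advance past whitespace, `//` line comments, and `/* */` block
    comments. Returns the offset of the next significant character."""
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
        elif src.startswith("//", pos):
            newline = src.find("\n", pos)
            pos = len(src) if newline == -1 else newline + 1
        elif src.startswith("/*", pos):
            # Search from pos + 2 so that "/*/" is not treated as closed.
            end = src.find("*/", pos + 2)
            if end == -1:
                raise UnterminatedBlockComment(pos)
            pos = end + 2
        else:
            break
    return pos
```

Reporting the offset of the opening `/*` rather than the end of input points the user at the comment that needs closing.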
1 change: 1 addition & 0 deletions crates/gf-ast/src/tests.rs
@@ -238,6 +238,7 @@ fn ast_function_call_roundtrip() {
    let e = Expr::FunctionCall(FunctionCall {
        name: vec!["count".into()],
        distinct: false,
        star: false,
        args: vec![Expr::Var(VarRef {
            name: "n".into(),
            span: zero(),
7 changes: 4 additions & 3 deletions crates/gf-cypher/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "gf-cypher"
description = "GraphForge Cypher lexer (Logos) and parser (LALRPOP)"
description = "GraphForge Cypher lexer and recursive-descent / Pratt parser"
version.workspace = true
edition.workspace = true
license.workspace = true
@@ -9,9 +9,10 @@ repository.workspace = true
[dependencies]
gf-core = { path = "../gf-core" }
gf-ast = { path = "../gf-ast" }
lalrpop-util = { workspace = true }
logos = { workspace = true }
thiserror = { workspace = true }

[dev-dependencies]
serde_json = { workspace = true }

[lints]
workspace = true