Skip to content
222 changes: 194 additions & 28 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,79 @@ DRAW line MAPPING month AS x, total AS y

---

## Public API

### Quick Start

```rust
use ggsql::reader::{DuckDBReader, Reader};
use ggsql::writer::VegaLiteWriter;

// Create a reader
let reader = DuckDBReader::from_connection_string("duckdb://memory")?;

// Execute the ggsql query
let spec = reader.execute(
"SELECT x, y FROM data VISUALISE x, y DRAW point"
)?;

// Render to Vega-Lite JSON
let writer = VegaLiteWriter::new();
let json = writer.render(&spec)?;
```

### Core Functions

| Function | Purpose |
| ----------------------- | ------------------------------------------------------ |
| `reader.execute(query)` | Main entry point: parse, execute SQL, resolve mappings |
| `writer.render(spec)` | Generate output from a Spec |
| `validate(query)` | Validate syntax + semantics, inspect query structure |

### Key Types

**`Validated`** - Result of `validate()`:

- `has_visual()` - Whether query has VISUALISE clause
- `sql()` - The SQL portion (before VISUALISE)
- `visual()` - The VISUALISE portion (raw text)
- `tree()` - CST for advanced inspection
- `valid()` - Whether query is valid
- `errors()` - Validation errors
- `warnings()` - Validation warnings

**`Spec`** - Result of `reader.execute()`, ready for rendering:

- `plot()` - Resolved plot specification
- `metadata()` - Rows, columns, layer count
- `warnings()` - Validation warnings from execution
- `data()` / `layer_data(i)` / `stat_data(i)` - Access DataFrames
- `sql()` / `visual()` / `layer_sql(i)` / `stat_sql(i)` - Query introspection

**`Metadata`**:

- `rows` - Number of rows in primary data
- `columns` - Column names
- `layer_count` - Number of layers

### Reader & Writer

**Reader trait** (data source abstraction):

- `execute_sql(sql)` - Run SQL, return DataFrame
- `register(name, df)` - Register DataFrame as table
- `unregister(name)` - Unregister a previously registered table
- Implementation: `DuckDBReader`

**Writer trait** (output format abstraction):

- `write(spec, data)` - Generate output string
- Implementation: `VegaLiteWriter` (Vega-Lite v6 JSON)

For detailed API documentation, see [`src/doc/API.md`](src/doc/API.md).

---

## Component Breakdown

### 1. Parser Module (`src/parser/`)
Expand Down Expand Up @@ -432,7 +505,7 @@ pub type Result<T> = std::result::Result<T, GgsqlError>;

```rust
pub trait Reader {
fn execute(&self, sql: &str) -> Result<DataFrame>;
fn execute_sql(&self, sql: &str) -> Result<DataFrame>;
fn supports_query(&self, sql: &str) -> bool;
}
```
Expand Down Expand Up @@ -462,7 +535,6 @@ pub fn parse_connection_string(uri: &str) -> Result<ConnectionInfo> {
The codebase includes connection string parsing and feature flags for additional readers, but they are not yet implemented:

- **PostgreSQL Reader** (`postgres://...`)

- Feature flag: `postgres`
- Connection string parsing exists in `connection.rs`
- Reader implementation: Not yet available
Expand Down Expand Up @@ -792,15 +864,18 @@ When running in Positron IDE, the extension provides enhanced functionality:

### 8. Python Bindings (`ggsql-python/`)

**Responsibility**: Python bindings for ggsql, enabling Python users to render Altair charts using ggsql's VISUALISE syntax.
**Responsibility**: Python bindings for ggsql, enabling Python users to create visualizations using ggsql's VISUALISE syntax.

**Features**:

- PyO3-based Rust bindings compiled to a native Python extension
- Two-stage API mirroring the Rust API: `reader.execute()` → `render()`
- DuckDB reader with DataFrame registration
- Custom Python reader support: any object with `execute_sql(sql) -> DataFrame` method
- Works with any narwhals-compatible DataFrame (polars, pandas, etc.)
- LazyFrames are collected automatically
- Returns native `altair.Chart` objects for easy display and customization
- Query splitting to separate SQL from VISUALISE portions
- Returns native `altair.Chart` objects via `render_altair()` convenience function
- Query validation and introspection (SQL, layer queries, stat queries)

**Installation**:

Expand All @@ -817,26 +892,117 @@ maturin develop
import ggsql
import polars as pl

# Split a ggSQL query into SQL and VISUALISE portions
sql, viz = ggsql.split_query("""
SELECT date, revenue FROM sales
VISUALISE date AS x, revenue AS y
DRAW line
""")
# Create reader and register data
reader = ggsql.DuckDBReader("duckdb://memory")
df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
reader.register("data", df)

# Execute visualization
spec = reader.execute(
"SELECT * FROM data VISUALISE x, y DRAW point"
)

# Inspect metadata
print(f"Rows: {spec.metadata()['rows']}")
print(f"Columns: {spec.metadata()['columns']}")
print(f"SQL: {spec.sql()}")

# Render to Vega-Lite JSON
writer = ggsql.VegaLiteWriter()
json_output = writer.render(spec)
```

**Convenience Function** (`render_altair`):

For quick visualizations without explicit reader setup:

```python
import ggsql
import polars as pl

# Execute SQL and render to Altair chart
df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")

# Display or save
# Render DataFrame to Altair chart in one call
chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
chart.display() # In Jupyter
chart.save("chart.html")
```

**Query Validation**:

```python
# Validate syntax without execution
validated = ggsql.validate(
"SELECT x, y FROM data VISUALISE x, y DRAW point"
)
print(f"Valid: {validated.valid()}")
print(f"Has VISUALISE: {validated.has_visual()}")
print(f"SQL portion: {validated.sql()}")
print(f"Errors: {validated.errors()}")
```

**Classes**:

| Class | Description |
| -------------------------- | ------------------------------------------------- |
| `DuckDBReader(connection)` | Database reader with DataFrame registration |
| `VegaLiteWriter()` | Vega-Lite JSON output writer |
| `Validated` | Result of `validate()` with query inspection |
| `Spec` | Result of `reader.execute()`, ready for rendering |

**Functions**:

- `split_query(query: str) -> tuple[str, str]` - Split ggSQL query into SQL and VISUALISE portions
- `render_altair(df, viz, **kwargs) -> altair.Chart` - Render DataFrame with VISUALISE spec to Altair chart
| Function | Description |
| ------------------------ | ------------------------------------------------ |
| `validate(query)` | Syntax/semantic validation with query inspection |
| `reader.execute(query)` | Execute ggsql query, return Spec |
| `execute(query, reader)` | Execute with custom reader (bridge path) |
| `render_altair(df, viz)` | Convenience: render DataFrame to Altair chart |

**Spec Methods**:

| Method | Description |
| ---------------- | -------------------------------------------- |
| `render(writer)` | Generate Vega-Lite JSON |
| `metadata()` | Get rows, columns, layer_count |
| `sql()` | Get the SQL portion |
| `visual()` | Get the VISUALISE portion |
| `layer_count()` | Number of DRAW layers |
| `data()` | Get the main DataFrame |
| `layer_data(i)` | Get layer-specific DataFrame (if filtered) |
| `stat_data(i)` | Get stat transform DataFrame (if applicable) |
| `layer_sql(i)` | Get layer filter SQL (if applicable) |
| `stat_sql(i)` | Get stat transform SQL (if applicable) |
| `warnings()` | Get validation warnings |

**Custom Python Readers**:

Any Python object with an `execute_sql(sql: str) -> polars.DataFrame` method can be used as a reader:

```python
import ggsql
import polars as pl

class MyReader:
"""Custom reader that returns static data."""

def execute_sql(self, sql: str) -> pl.DataFrame:
return pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Use custom reader with ggsql.execute()
reader = MyReader()
spec = ggsql.execute(
"SELECT * FROM data VISUALISE x, y DRAW point",
reader
)
```

Optional methods for custom readers:

- `supports_register() -> bool` - Return `True` if registration is supported
- `register(name: str, df: polars.DataFrame) -> None` - Register a DataFrame as a table
- `unregister(name: str) -> None` - Unregister a previously registered table

Native readers (e.g., `DuckDBReader`) use an optimized fast path, while custom Python readers are automatically bridged via IPC serialization.

**Dependencies**:

Expand Down Expand Up @@ -920,22 +1086,23 @@ cargo build --all-features
```

Where `<global_mapping>` can be:

- Empty: `VISUALISE` (layers must define all mappings)
- Mappings: `VISUALISE x, y, date AS x` (mixed implicit/explicit)
- Wildcard: `VISUALISE *` (map all columns)

### Clause Types

| Clause | Repeatable | Purpose | Example |
| -------------- | ---------- | ------------------ | ------------------------------------ |
| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
| `SCALE` | ✅ Yes | Configure scales | `SCALE x SETTING type => 'date'` |
| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
| `GUIDE` | ✅ Yes | Legend/axis config | `GUIDE color SETTING position => 'right'` |
| `THEME` | ❌ No | Visual styling | `THEME minimal` |
| Clause | Repeatable | Purpose | Example |
| ----------- | ---------- | ------------------ | ----------------------------------------- |
| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
| `SCALE` | ✅ Yes | Configure scales | `SCALE x SETTING type => 'date'` |
| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
| `GUIDE` | ✅ Yes | Legend/axis config | `GUIDE color SETTING position => 'right'` |
| `THEME` | ❌ No | Visual styling | `THEME minimal` |

### DRAW Clause (Layers)

Expand Down Expand Up @@ -1201,7 +1368,6 @@ COORD cartesian SETTING xlim => [0, 100], ylim => [0, 200]
LABEL x => 'Category', y => 'Count'
```


### LABEL Clause

**Syntax**:
Expand Down
3 changes: 2 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,8 @@ csscolorparser = "0.8.1"
polars = { version = "0.52", features = ["lazy", "sql", "ipc"] }

# Readers
duckdb = { version = "1.1", features = ["bundled"] }
duckdb = { version = "1.4", features = ["bundled", "vtab-arrow"] }
arrow = { version = "56", default-features = false, features = ["ipc"] }
postgres = "0.19"
sqlx = { version = "0.8", features = ["postgres", "runtime-tokio-rustls"] }
rusqlite = "0.32"
Expand Down
40 changes: 39 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ THEME minimal
- ✅ REST API server (`ggsql-rest`) with CORS support
- ✅ Jupyter kernel (`ggsql-jupyter`) with inline Vega-Lite visualizations
- ✅ VS Code extension (`ggsql-vscode`) with syntax highlighting and Positron IDE integration
- ✅ Python bindings (`ggsql-python`) with Altair chart output

**Planned:**

Expand Down Expand Up @@ -93,7 +94,9 @@ ggsql/
├── ggsql-jupyter/ # Jupyter kernel
└── ggsql-vscode/ # VS Code extension
├── ggsql-vscode/ # VS Code extension
└── ggsql-python/ # Python bindings
```

## Development Workflow
Expand Down Expand Up @@ -297,6 +300,41 @@ When running in Positron IDE, the extension provides additional features:
- **Language runtime registration** for executing ggsql queries directly within Positron
- **Plot pane integration** - visualizations are automatically routed to Positron's Plots pane

## Python Bindings

The `ggsql-python` package provides Python bindings for using ggsql with DataFrames.

### Installation

```bash
cd ggsql-python
pip install maturin
maturin develop
```

### Usage

```python
import ggsql
import polars as pl

# Simple usage with render_altair
df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
chart.display()

# Two-stage API for full control
reader = ggsql.DuckDBReader("duckdb://memory")
reader.register("data", df)

spec = reader.execute("SELECT * FROM data VISUALISE x, y DRAW point")

writer = ggsql.VegaLiteWriter()
json_output = writer.render(spec)
```

See the [ggsql-python README](ggsql-python/README.md) for complete API documentation.

## CLI

### Installation
Expand Down
Loading