Spatial phase I: Layer#370
Conversation
Registers GeomType::Spatial across the parser, AST, and builder. The spatial geom requires a `geometry` aesthetic and supports fill, stroke, opacity, linewidth, and linetype. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… SpatialRenderer - Feature-gated `spatial` (default-on) with geozero + hex dependencies - Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers - SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features - Vega-Lite geoshape mark with proper encoding channel handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip geometry aesthetic in encoding builder (structural, not visual) - Remove SpatialRenderer::modify_encoding stub (default suffices) - Remove geometry column auto-detection (revisit after Arrow migration) - Add end-to-end tests: GeoJSON features, WKB hex, mixed layers - Add execute test for explicit geometry mapping Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relax the grammar's other_sql_statement rule to accept any non-delimiter tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error. Execute these setup statements before the main query in the pipeline. Flip DDL detection in DuckDB and SQLite readers to a returns_rows whitelist, so unknown statement types are handled gracefully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…query_arrow DuckDB's query_arrow API exports results directly as Arrow RecordBatches, eliminating the manual row-by-row type mapping. Extension types like GEOMETRY now flow through as native Binary columns instead of hitting a lossy string fallback. Decimal128 columns are normalized to Float64 at the boundary for downstream compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… materialisation Replaces the execute_sql() → register() two-step with direct CREATE TEMP TABLE AS DDL. This keeps data inside the database engine and preserves native types (e.g. DuckDB GEOMETRY) that were lost during the Arrow materialisation round-trip. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y tests Only apply ST_AsWKB when the geometry column is Binary (native DuckDB GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through to the writer unchanged. Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText to verify the full native GEOMETRY pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mp table materialisation" This reverts commit 2694724.
Remove GeoJSON/WKB hex string handling from the writer and stat transform. Spatial data should come from native GEOMETRY columns (via ST_Read, ST_GeomFromText, etc.), not string columns. Drops hex crate dependency from the spatial feature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial stat transform uses the dialect instead of hardcoding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Natural Earth 110m country boundaries as geoparquet. Columns: name, iso_a3, continent, subregion, income_group, population, gdp, geom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spatial stat transform now calls dialect.sql_spatial_setup() before emitting ST_AsWKB, so users no longer need manual LOAD spatial statements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a geom declares a geometry aesthetic and the user hasn't mapped it explicitly, scan the schema for a column with a conventional geometry name (geom, geometry, wkb_geometry, the_geom, shape) and binary type. Requires exactly one match to avoid ambiguity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB reads geoparquet geometry columns as BLOB when the spatial extension is not loaded, making ST_AsWKB fail later. Pre-load spatial when the world dataset is referenced so the column is read as GEOMETRY. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thomasp85
left a comment
There was a problem hiding this comment.
This in general looks good. Most comments are cosmetic.
One thing we kinda ignore here, and maybe that is for the future projection PR, but do we need a geometry scale? Or do you scale latitude and longitude (defined by the projection), and the geometry mapping just sort of follows?
| fn looks_like_geometry(name: &str) -> bool { | ||
| matches!( | ||
| name.to_lowercase().as_str(), | ||
| "geom" | "geometry" | "wkb_geometry" | "the_geom" | "shape" |
There was a problem hiding this comment.
Are these based on use in the wild? "the_geom" sounds made up 🫠
There was a problem hiding this comment.
These are 100% based on what claude decided might be reasonable. I didn't ask it for receipts, but it does seem to be out there in the wild:
https://gis.stackexchange.com/questions/49455/geometry-column-naming-convention-geom-or-the-geom
| return Some(candidates[0].name.clone()); | ||
| } | ||
|
|
||
| // Fall back to name-only match (e.g. extension types we don't recognise) |
There was a problem hiding this comment.
what if there is a geometry type with another name? Shouldn't that win over the "standard" name?
There was a problem hiding this comment.
I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.
The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first
There was a problem hiding this comment.
So I think the logic you propose itself is sound, but the problem with this is that every DB has its own interpretation of what constitutes 'valid geometry'. WKB is the 'standard' but DuckDB has its own flavour of geometry and I imagine different databases store it in different formats yet again (i.e. the 'extension types we don't recognise'). The homogenisation of data is the main reason we cast the geometry column as WKB.
I'm happy to remove this fallback though, it'll force the user to declare a geometry column if we can't guess it.
There was a problem hiding this comment.
can't we rely on the dialect to know what is "valid geometry"?
There was a problem hiding this comment.
Have a look at 0f42b58 for how this could work. I personally think all the extra plumbing is not worth the slightly better heuristics and it should be reverted. Let me know if you think otherwise.
thomasp85
left a comment
There was a problem hiding this comment.
Also, a comment from Claude to consider:
GeoJSON reserved-key collisions in SpatialRenderer::prepare_data (src/writer/vegalite/layer.rs:2239):
properties.insert(col_name.clone(), value.clone());
feature.insert(col_name.clone(), value);
Every non-geometry column is also splatted into the feature top level. If a user has a column named type, this overwrites "Feature" and Vega-Lite stops treating it as a feature. properties, bbox, id collide too. At minimum, skip those reserved keys when copying to the top level — or just drop the top-level copy (see next point).
Worth fixing while you're here
Properties stored twice in the inline data (same lines as above). Each non-geometry column ends up both at feature.X and feature.properties.X. Vega-Lite resolves bare field: "X" references against datum.X, which is the top-level form — so the properties map is unused output. Drop the properties map (simpler, matches how the rest of the codebase emits row data), or drop the top-level copy and use from: { data: ..., fields: ["properties.X"] } style. Either way, half the geometry-row JSON goes away.
| } | ||
|
|
||
| fn sql_spatial_setup(&self) -> Vec<String> { | ||
| vec!["LOAD spatial".into()] |
There was a problem hiding this comment.
What happens if spatial is not installed? IS it installed on first load?
There was a problem hiding this comment.
No, the user should install it. We ourselves only ever try loading it.
If it isn't installed the error message returned by the execution of LOAD spatial surfaces to the user.
The one exception is that we try to install it in our own tests, but these are all contingent of the spatial feature gate and we can't really run the tests without DuckDB's spatial extension.
| return Some(candidates[0].name.clone()); | ||
| } | ||
|
|
||
| // Fall back to name-only match (e.g. extension types we don't recognise) |
There was a problem hiding this comment.
I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.
The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first
Co-authored-by: Thomas Lin Pedersen <thomasp85@gmail.com>
At some point maybe, but we'd have to have graticules first before it becomes meaningful.
Issue 1 about the reserved names is moot, because our naming scheme prevents collisions. I'm fine with being performatively defensive though. |
…to spatial_start
Geometry columns lose their native type during Arrow materialisation (DuckDB GEOMETRY becomes plain Binary). Add Reader::geometry_columns() that queries the backend's type system (DESCRIBE for DuckDB) to find actual geometry columns, with a name+type heuristic fallback. The detection is implemented as GeomTrait::detect_aesthetics(), which spatial overrides. This runs after global mapping merge so user-declared geometry mappings take precedence. Multiple native geometry columns are treated as ambiguous (user must declare explicitly). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 0f42b58.
This PR advances #336.
It introduces the
spatiallayer with draws the simple feature geometry.Internally, the geometry column is converted to WKB, which is handled via a dialect.
The vegalite writer converts the WKB to GeoJSON.
Like ggplot2, detection of the geometry column is automated for common cases.
This PR also adds the
ggsql:worldbuilt-in dataset that contains the Natural Earth 110m dataset with selected columns.Noteably absent in this PR is the whole projection ordeal.