Skip to content

feat: N-dimensional raster type extension (Phase 1)#749

Merged
paleolimbot merged 16 commits into
apache:mainfrom
james-willis:jw/nd-raster-type
May 14, 2026
Merged

feat: N-dimensional raster type extension (Phase 1)#749
paleolimbot merged 16 commits into
apache:mainfrom
james-willis:jw/nd-raster-type

Conversation

@james-willis
Copy link
Copy Markdown
Contributor

@james-willis james-willis commented Apr 1, 2026

Summary

Upgrades SedonaDB's raster type from 2D tiles to N-dimensional chunks with named dimensions, enabling support for climate model outputs, hyperspectral imagery, Zarr/NetCDF datacubes, and other multi-dimensional geospatial datasets.

This is a schema-and-API migration. All 33 existing RS_* functions are migrated mechanically and produce identical outputs on 2D inputs; no new SQL functions are added in this PR. GDAL-backed functions added in #787 are ported to read from the new schema and gate on BandRef::is_2d() — N-D bands return a clear Execution error pending MDArray/Zarr support.

This PR is rebased onto post-#787 main and is the destructive replacement of #787's band schema. Because the schema swap and the GDAL loader port have to land together to keep main compiling, the PR is structured as four review-friendly commits but only the tip is required to be CI-green — intermediate commits intentionally don't build and the PR is not bisectable across commits 1–3.

This PR depends on #812 (GDAL outdb-URI parser); merge order: #812#749.

Note on utils.rs — the GDAL indb-loader helpers added in #811 (append_as_indb_raster, dataset_to_indb_raster) are deleted in this PR because they target the legacy BandMetadata / StorageType / band_data_writer API that the N-D port removes. They are reintroduced against the canonical N-D schema in the follow-up PR #818, which is already drafted and stacked on top of this branch.

Commits

  1. Schema swap — replace feat(sedona-raster-gdal) add GDAL foundation library for supporting GDAL-based RS functions #787's band schema with the canonical N-D schema. Removes nodata_value, storage_type, outdb_url, outdb_band_id; adds dim_names, source_shape, nullable view, outdb_uri, outdb_format. Top-level gains spatial_dims / spatial_shape.
  2. Trait surface + is_2dRasterRef and BandRef accessors for the N-D schema; BandRef::is_2d() default method returns true iff dim_names == ["y","x"] with the identity view (used by GDAL-backed paths to refuse N-D input cleanly).
  3. Reader / builder / RS_* migration — identity-view reader and builder, identity-view default via null view row, all 33 RS_* functions ported. RS_BandPath keeps its existing fragment-stripping (format-agnostic display) and is unchanged.
  4. Port GDAL loader to canonical schema — rewrite sedona-raster-gdal to read from outdb_uri + the feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver #812 parser instead of storage_type / outdb_url / outdb_band_id. VSI normalization, dataset cache, and RasterIO bodies are byte-for-byte unchanged — only the schema-read sites move. Each GDAL-backed SQL function gates on band.is_2d() at entry.

Equivalence mapping (vs #787)

#787 field New schema
storage_type (InDb / OutDbRef enum) data.is_empty() — InDb bands carry their bytes in data; OutDb bands have an empty data buffer and resolve through outdb_uri + outdb_format.
outdb_url outdb_uri (no rename, same string)
outdb_band_id encoded inside outdb_uri (<uri>#band=N or GDAL native subdataset URI). Parsed only inside the GDAL format driver via #812's parse_outdb_source. Format-agnostic surfaces (incl. RS_BandPath) treat outdb_uri as opaque.
nodata_value nodata: Binary (typed bytes; a null row means "no nodata")

Top-level adds spatial_dims: List<Utf8> and spatial_shape: List<UInt64> (e.g. ["y","x"] and [height, width]).

Schema (canonical)

raster: Struct<
  crs:           Utf8View (nullable),
  transform:     List<Float64>,            // 6-element GDAL GeoTransform
  spatial_dims:  List<Utf8View>,           // ["x","y"] today
  spatial_shape: List<Int64>,              // [width, height] today
  bands: List<Struct<
    name:         Utf8 (nullable),
    dim_names:    List<Utf8>,              // 2D bands: ["y","x"]
    source_shape: List<UInt64>,            // C-order extent of source
    data_type:    UInt32 (BandDataType discriminant),
    nodata:       Binary (nullable),
    view:         List<Struct<source_axis,start,step,steps: Int64>>
                  (nullable; null = canonical identity view),
    outdb_uri:    Utf8 (nullable),         // null = InDb; set = OutDbRef
    outdb_format: Utf8View (nullable),     // GDAL driver hint
    data:         BinaryView (non-nullable; empty for outdb-only)
  >>
>

The view field is defined in the schema so it is part of the canonical N-D layout and round-trips through Arrow IPC, but in this PR every writer emits a null view row (canonical identity) and the reader treats a non-null view row as a hard read error. Encoding identity as a null row keeps the common "no slice applied" case free in Arrow space; consumers see the same shape and bytes regardless of which path produced the band. The composition path that decodes non-null view rows lands in #813.

Traits

  • RasterRefspatial_dims, spatial_shape, transform, crs, num_bands, band(i), width/height/x_dim/y_dim (derived).
  • BandRefname, dim_names, source_shape, shape (visible, derived), view, data_type, nodata, outdb_uri, outdb_format, nd_buffer, contiguous_data returning Cow<[u8]>, is_2d (default).
  • NdBuffer — zero-copy buffer view used at the numpy / Arrow C Data Interface boundary.

Builder

  • start_raster / start_band for full N-D; start_raster_2d / start_band_2d for legacy 2D.
  • start_band writes a null view (canonical identity) — no caller change.
  • finish_band validates each band's visible shape against the raster's spatial_shape along the spatial dims.

Backward compatibility

Legacy 2D rasters are represented as bands with dim_names = ["y","x"], source_shape = [height, width], and an identity view. All existing RS_* outputs are byte-identical on 2D inputs; every test that passed against #787's GDAL functions passes against the ported loader.

Crates modified

  • sedona-schema — N-D raster schema definition.
  • sedona-raster — traits, Arrow array reader, builder, affine transform, display.
  • sedona-testing — raster test helpers.
  • sedona-raster-functions — all RS_* function implementations migrated; RS_BandPath unchanged.
  • sedona-raster-gdal — loader reads outdb_uri + feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver #812 parser; SQL functions gate on is_2d().

Out of scope (follow-ups)

Test plan

  • sedona-schema: tests pass; verifies the view field is nullable and the view-struct field order matches the indices module.
  • sedona-raster: tests pass — identity round-trip via start_band (null view row); on-disk schema invariant that identity bands serialise as null view rows; reader rejects non-null view rows with a clear error; BandRef::is_2d truth-table tests; builder visible-shape validation against spatial_shape.
  • sedona-raster-functions: tests pass — all existing function tests produce identical outputs after migration.
  • sedona-raster-gdal: 46/46 pass — every feat(sedona-raster-gdal) add GDAL foundation library for supporting GDAL-based RS functions #787 SQL-function test passes against the ported loader; new tests cover N-D rejection at raster_ref_to_gdal_mem and the VRT path; one end-to-end test confirms path#band=2 selects band 2 of a two-band GeoTIFF.
  • cargo clippy --all-targets -- -D warnings clean on the affected crates.
  • cargo fmt --all --check clean.

@james-willis james-willis marked this pull request as ready for review April 3, 2026 18:06
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look at the implementation tomorrow, but wanted to get a preliminary review of the schema out!

Comment thread rust/sedona-schema/src/raster.rs
Comment thread rust/sedona-schema/src/raster.rs Outdated
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another round...there are a few places where tests or comments were removed that were still useful.

I do think it's worth iterating a tiny bit on the schema, and importantly, I think we should wait until @Kontinuation has finished the GDAL read portion of the previous set of raster refactoring because (1) this was a lot of work and Kristin's time is valuable and (2) these are important use cases to consider when making the new type work!

Comment thread rust/sedona-raster/src/affine_transformation.rs Outdated
Comment thread rust/sedona-raster/src/affine_transformation.rs
Comment thread rust/sedona-raster/src/display.rs
Comment thread rust/sedona-raster/src/outdb_uri.rs Outdated
Comment thread rust/sedona-testing/src/rasters.rs
Comment thread rust/sedona-testing/src/rasters.rs
@james-willis james-willis marked this pull request as draft April 20, 2026 18:17
@james-willis james-willis force-pushed the jw/nd-raster-type branch 2 times, most recently from 0af0b01 to 4302af4 Compare April 20, 2026 18:41
james-willis added a commit to james-willis/sedona-db that referenced this pull request Apr 22, 2026
…hape

Replace the scalar x_dim/y_dim Utf8View fields with a top-level
spatial_dims (List<Utf8View>) and add spatial_shape (List<Int64>).
The raster struct becomes the single source of truth for the spatial
grid; every band must contain each name in spatial_dims with a size
matching the corresponding entry in spatial_shape.

- sedona-schema: column constants SPATIAL_DIMS / SPATIAL_SHAPE,
  raster_indices 0..4, new spatial_dims_type() / spatial_shape_type()
  accessors.
- sedona-raster: RasterRef gains spatial_dims() + spatial_shape();
  x_dim() / y_dim() / width() / height() become default methods
  reading from the top-level fields, so width/height work even on
  zero-band rasters ("target grid" GDAL warp pattern).
- RasterStructArray: decode the two new ListArrays; per-row slicing
  mirrors the existing band dim_names/shape accessors.
- RasterBuilder: new start_raster(transform, spatial_dims[],
  spatial_shape[], crs) signature; finish_raster strictly validates
  every band against the declared spatial grid and returns
  InvalidArgumentError on mismatch.
- sedona-testing: assert_raster_equal compares the new fields.

Addresses Dewey's review comment on PR apache#749 re: making the raster the
authority on the spatial grid.
james-willis added a commit to james-willis/sedona-db that referenced this pull request May 4, 2026
…hape

Replace the scalar x_dim/y_dim Utf8View fields with a top-level
spatial_dims (List<Utf8View>) and add spatial_shape (List<Int64>).
The raster struct becomes the single source of truth for the spatial
grid; every band must contain each name in spatial_dims with a size
matching the corresponding entry in spatial_shape.

- sedona-schema: column constants SPATIAL_DIMS / SPATIAL_SHAPE,
  raster_indices 0..4, new spatial_dims_type() / spatial_shape_type()
  accessors.
- sedona-raster: RasterRef gains spatial_dims() + spatial_shape();
  x_dim() / y_dim() / width() / height() become default methods
  reading from the top-level fields, so width/height work even on
  zero-band rasters ("target grid" GDAL warp pattern).
- RasterStructArray: decode the two new ListArrays; per-row slicing
  mirrors the existing band dim_names/shape accessors.
- RasterBuilder: new start_raster(transform, spatial_dims[],
  spatial_shape[], crs) signature; finish_raster strictly validates
  every band against the declared spatial grid and returns
  InvalidArgumentError on mismatch.
- sedona-testing: assert_raster_equal compares the new fields.

Addresses Dewey's review comment on PR apache#749 re: making the raster the
authority on the spatial grid.
@james-willis james-willis force-pushed the jw/nd-raster-type branch 3 times, most recently from 58b56e1 to 4ddfdbf Compare May 5, 2026 00:18
@james-willis james-willis force-pushed the jw/nd-raster-type branch 2 times, most recently from 999a577 to dda82a7 Compare May 5, 2026 18:14
Comment thread rust/sedona-raster-functions/src/rs_band_accessors.rs Outdated
@james-willis james-willis force-pushed the jw/nd-raster-type branch 6 times, most recently from 94be286 to 965e7d9 Compare May 5, 2026 22:22
Comment thread rust/sedona-raster-gdal/src/gdal_common.rs Outdated
@james-willis james-willis force-pushed the jw/nd-raster-type branch 3 times, most recently from 3484f09 to a33fdd1 Compare May 5, 2026 23:00
Replaces apache#787's 2D-only band schema with the canonical N-D schema:
spatial_dims/spatial_shape at the raster level; bands carry dim_names,
source_shape, nullable view, outdb_uri, outdb_format, plus the
non-nullable data buffer. Removes nodata_value, storage_type,
outdb_url, and outdb_band_id - every one is encodable in the new
schema:

- storage_type ↔ outdb_uri.is_null() (null = InDb, set = OutDbRef).
- outdb_url ↔ outdb_uri (no rename, same string).
- outdb_band_id ↔ encoded inside outdb_uri (#band=N or GDAL native
  subdataset URI), parsed only inside the GDAL format driver.
- nodata_value ↔ typed nodata: Binary (a null row means "no nodata").

Top-level adds spatial_dims: List<Utf8View> and spatial_shape:
List<Int64>; nullable view is List<Struct<source_axis, start, step,
steps: Int64>> where a null row encodes the canonical identity view.

Note: intermediate commits in this PR are not expected to build; only
the PR tip is CI-green. The trait, reader/builder, RS_* migration,
and GDAL loader port land in subsequent commits.
RasterRef and BandRef accessors over the canonical N-D schema:
spatial_dims/spatial_shape, transform, crs, num_bands, band(i), and
band-level dim_names, source_shape, shape (visible, derived from view),
view, data_type, nodata, outdb_uri, outdb_format, nd_buffer,
contiguous_data returning Cow<[u8]>.

validate_view enforces all view rules including i64-overflow on
start + (steps-1)*step. NdBuffer exposes raw buffer + shape + byte
strides + offset for zero-copy access (numpy / Arrow C Data Interface
boundary); VIEW → byte strides happens inside nd_buffer().

Adds BandRef::is_2d() default method as the gate GDAL-backed paths
use to refuse N-D input cleanly: true iff dim_names == ["y","x"]
over the identity view.
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.

Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.

`raster_ref_to_gdal_empty` discards the always-empty vector.
`BandRef::nd_buffer()` and `BandRef::contiguous_data()` previously
returned the empty Arrow `data` buffer when `is_indb() == false`,
giving consumers silent garbage instead of a signal that the band
needs a backend-specific OutDb resolver. Replace the silent path
with `ArrowError::NotYetImplemented`, documenting the integration
point for the future resolver work.

No existing call site in `sedona-raster-functions` reads bytes from
OutDb bands today (RS_BandPath only reads the URI), so this guard
turns a latent gap into a visible one without breaking behavior.
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! This is a solid schema design that will be the foundation of a solid nd spatial raster framework that will be flexible enough to deliver the type of performance we are hoping for in the native translation of Sedona raster functions to Rust.

Most of my comments are about minimizing the diff to avoid disrupting existing work. With a few small changes I think you avoid almost all disruption of existing + in progress work and minimize the footprint of this PR considerably. Basically, just keep RasterMetadata and BandMetadata and bands() around, even if they are just shims returning concrete structs you'll remove later. For what it's worth I frequently do this with PRs into larger projects using VSCode's pull request extension which nicely lets you roll through your PR diff to minimize it.

Comment thread rust/sedona-raster-functions/src/executor.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_band_accessors.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_band_accessors.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_convexhull.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_envelope.rs Outdated
Comment thread rust/sedona-raster/src/builder.rs Outdated
Comment thread rust/sedona-raster/src/builder.rs
Comment thread rust/sedona-raster/src/builder.rs
Comment thread rust/sedona-raster/src/builder.rs Outdated
Comment thread rust/sedona-raster/src/builder.rs Outdated
@james-willis james-willis marked this pull request as draft May 12, 2026 20:55
@james-willis
Copy link
Copy Markdown
Contributor Author

Design note on raster_ref_to_gdal_mem lifetime plumbing

The GDAL backend in this PR uses the shim's BandRef::data() -> &[u8] rather than BandRef::contiguous_data() -> Result<Cow<'_, [u8]>>. That's deliberate, and it lets us drop the (Dataset, Vec<Vec<u8>>) tuple return / _owned_band_bytes: Vec<Vec<u8>> field on RasterDataset.

Why it works in this PR: every band is identity-viewed, so the bytes GDAL points at are a borrow into the StructArray's backing buffer. RasterDataset already keeps the source raster alive via _source_raster: PhantomData<&'a dyn RasterRef>, so the pointer is valid for the dataset's lifetime — no extra owned-bytes plumbing needed. Non-identity views error at read time here, so Cow::Owned cannot appear.

What changes when the view machinery lands: once non-identity views are supported, contiguous_data() may return Cow::Owned for sliced/permuted bands (the reader materializes a new buffer). That Vec<u8> will need to outlive the GDAL dataset that points into it. Bringing back the _owned_band_bytes: Vec<Vec<u8>> field on RasterDataset (and the matching tuple return from raster_ref_to_gdal_mem) is the right way to do that — earlier revisions of this PR carried that plumbing for exactly this reason.

Short version: if you're picking this code up later to add view support, expect to re-thread owned band bytes from the reader, through raster_ref_to_gdal_mem, and onto RasterDataset.

…lity shim

Reintroduces the pre-N-D metadata surface as a parallel API layer so call
sites stay aligned with main and downstream branches don't have to chase
the N-D refactor while their work is in flight. Per Dewey's PR review
feedback, this is intentionally a compatibility scaffold — DB-15 tracks
the removal once the stack lands.

What the shim adds (`rust/sedona-raster/src/traits.rs`):
- `RasterMetadata` struct + `MetadataRef` trait (with a blanket
  `&T: MetadataRef` impl so `from_metadata(&m)` and
  `from_metadata(raster.metadata())` both compile)
- `BandMetadata` struct exposing `data_type()`, `nodata_value()`,
  `storage_type()`, `outdb_url()`, `outdb_band_id()`,
  `nodata_value_as_f64()` — `outdb_url`/`outdb_band_id` are eagerly
  parsed from the N-D `outdb_uri` via `split_outdb_band_fragment`
- `RasterRef::metadata() -> RasterMetadata` and
  `BandRef::metadata() -> BandMetadata` default impls
- `BandRef::data() -> &[u8]` for InDb identity views (panics elsewhere
  in PR-B; the `Cow::Owned` materialized-view case comes back with the
  view machinery in PR-D)
- `Bands<'a>` view + `RasterRefBandsExt` extension trait so callers can
  keep using `raster.bands().band(i)` / `.iter()` / `.len()`

Builder shim (`rust/sedona-raster/src/builder.rs`):
- `start_raster(&dyn MetadataRef, Option<&str>)` and
  `start_band(BandMetadata)` alongside the N-D-native
  `start_raster_nd` / `start_band_nd` (and the 2D positional convenience
  variants)
- Restored doc example using the `RasterMetadata{..}` / `BandMetadata{..}`
  literal pattern

`AffineMatrix` and GDAL adapters:
- `AffineMatrix::from_metadata<M: MetadataRef>(m: M)` (replaces
  `from_transform`) so `to_world_coordinate(raster)` / `rotation(raster)`
  /`to_raster_coordinate(raster)` match main
- Re-add `ToGdalGeoTransform` and `RasterMetadataFromGdalGeoTransform`
  helper traits in `gdal_common.rs`; `raster_ref_to_gdal_mem` reads
  `bands.band(i)` with `band.metadata().{data_type, nodata_value,
  storage_type}` and returns `Result<Dataset>` (no more owned-bytes
  plumbing — `band.data()` yields a borrowed slice straight out of the
  StructArray, kept alive via `RasterDataset._source_raster:
  PhantomData<&dyn RasterRef>`)
- `build_vrt_from_sources` / `raster_ref_to_gdal` / `VrtKey::from_raster`
  in `gdal_dataset_provider.rs` use 1-based `bands.band(i)` and the
  `BandMetadata` accessors; `VrtBandKey` gets its `storage_type` field
  back

Test fixtures and call sites: `rs_band_accessors`, `rs_bandpath`,
`rs_convexhull`, `rs_envelope`, `rs_example`, `rs_georeference`,
`rs_geotransform`, `rs_numbands`, `rs_pixel_functions`, `rs_setsrid`,
`rs_size`, `rs_spatial_predicates`, `rs_srid`, `rasters.rs`,
`benchmark_util.rs`, and the affine / display / GDAL tests all use the
metadata-struct surface that main exposes. The handful of PR-B-only
additions that remain are structural (the `RasterRefBandsExt` import on
the few generic helpers that need it, and the `is_2d()` guards / N-D
rejection tests on the GDAL backend).

Net effect on PR-B vs main: roughly 18 files, +3.2k / -1.5k, with most
of the inserts now sitting in the N-D-native reader/builder code and
nearly all of main's pre-N-D test bodies preserved verbatim.
…on cfg(test)

`utils.rs` is the canonical `Dataset -> SedonaDB raster` loader on main:
`append_as_indb_raster` and `dataset_to_indb_raster` are re-exported from
the crate root and are the public API consumers reach for. The earlier
N-D loader port deleted the file outright and replaced its callers with
code reaching into `BandRef::contiguous_data()` directly — a breaking
API change with no in-tree replacement. Restore the file verbatim from
main; the shim's `start_raster(&RasterMetadata{..}) /
start_band(BandMetadata{..})` already produces N-D-compatible 2D
rasters, so the existing body works unchanged. The only PR-B-only diff
is `use sedona_raster::traits::{RasterRef, RasterRefBandsExt};` in the
test module so `raster.bands()` resolves through our extension trait.
Brings 7 GDAL-loader tests back into the suite.

Also gate `mod source_uri;` on `#[cfg(test)]` to match main. With the
shim eagerly parsing the `#band=N` fragment via
`split_outdb_band_fragment` in `sedona-raster::traits`, the GDAL
backend no longer reaches for `parse_outdb_source`. Main already had
the module test-only for exactly this reason — neither
`#[allow(dead_code)]` nor a TODO is needed.
…me tie

Two structural cleanups that cascade through call sites and shave ~280
lines of divergence vs main:

1. Fold `RasterRefBandsExt::bands()` into `RasterRef` as a required
   trait method. Each `impl RasterRef for X` provides
   `fn bands(&self) -> Bands<'_> { Bands::new(self) }` (one line per
   impl, two impls total). Drops the extension trait, its blanket impl
   for `T: RasterRef`, its impl for `dyn RasterRef + 'r`, the six
   `use sedona_raster::traits::RasterRefBandsExt;` imports across the
   raster-functions / raster-gdal / testing crates, and the
   `+ RasterRefBandsExt` bound on the four generic GDAL-backend
   functions (`raster_ref_to_gdal_mem`, `raster_ref_to_gdal_empty`,
   `build_vrt_from_sources`, `raster_ref_to_gdal`,
   `VrtKey::from_raster`). Adds `Bands::new` as a pub constructor so
   `bands()` impls outside `traits.rs` can build the wrapper.

2. Rebuild `RasterRefImpl<'a>` to hold flat `&'a Array` references for
   every field instead of a back-pointer to `&'a RasterStructArray<'a>`.
   `RasterStructArray::get` goes back to `fn get(&self, idx) ->
   RasterRefImpl<'a>` (was `&'a self`), which removes the lifetime tie
   that was forcing every scalar-raster call site to hoist
   `RasterStructArray::new(...)` into a local. Reverts the
   `let arr0; arr0 = ...; if arr0.is_null(0) ...` plumbing in
   `executor.rs` and the two-line hoists in `sedona-testing/rasters.rs`.
   Restores `gdal_common::tests::single_raster` so the four
   `let rsa = RasterStructArray::new(&raster_array); let raster =
   rsa.get(0).unwrap();` test sites can go back to
   `let raster = single_raster(&raster_array);`.

Also revert `AffineMatrix::from_metadata` to main's `&dyn MetadataRef`
parameter — callers pass `&raster.metadata()` (one extra `&` versus
main, since our shim's `metadata()` returns `RasterMetadata` by value
rather than `&dyn MetadataRef`). The `impl<T: MetadataRef + ?Sized>
MetadataRef for &T` blanket impl that was carrying the by-value-generic
form goes with it.
`builder.rs` and `array.rs` each had their test modules entirely rewritten
during the N-D port, with main's tests deleted and replaced by N-D-flavoured
variants. The shim makes main's tests buildable as-is. Restore them
verbatim and keep the PR-B-only N-D tests alongside:

- builder.rs: `test_iterator_basic_functionality`, `test_multi_band_iterator`,
  `test_copy_metadata_from_iterator`, `test_band_data_types`,
  `test_outdb_metadata_fields`, `test_band_access_errors`. Adds
  `use crate::traits::{BandMetadata, MetadataRef};` at the top so `start_band`
  / `start_raster` can drop their fully-qualified `crate::traits::` paths
  and match main's call shape, and the test module pulls `BandMetadata` /
  `RasterMetadata` / `StorageType` in alongside the existing N-D imports.
- array.rs: `test_array_basic_functionality`, `test_multi_band_array`,
  `test_raster_is_null`. The fourth main test
  (`test_invalid_band_metadata_returns_err`) doesn't translate — it pokes a
  nested `metadata` Struct that doesn't exist in our flat N-D layout — and
  the equivalent corruption check is already covered by
  `band_and_band_data_type_return_none_for_unknown_discriminant`.

Two cleanups that fall out:

- `Bands::band(0)` and out-of-range errors now use main's exact wording
  ("band numbers must be 1-based" / "is out of range") so
  `test_band_access_errors` passes without a tweak.
- `BandRefImpl::data()` is now an explicit override that returns the raw
  `data` column bytes (matching main: `&[]` for OutDb, the row-major buffer
  for InDb identity views) instead of routing through `nd_buffer()` and
  panicking on OutDb. The shim's default `BandRef::data()` is kept for
  other implementors.

PR-B total: 10 files +3,286 / -897 (was +3,260 / -1,985 before this round).
Second-round review punch list — small, surgical:

- `builder.rs` imports: merge the two `sedona_schema::raster::{...}`
  imports into one, recombine `arrow_schema::{ArrowError, DataType}`
  (the lone `use arrow_schema::DataType;` line was unnecessary), drop
  the duplicate blank lines around the imports block.
- `start_raster_2d` doc-comment was the stitched-together result of two
  drafts ("Sets `spatial_dims=...`, and Build the 6-element GDAL
  transform..." with a self-reference at the end). Rewrite as a single
  coherent sentence and point at `Self::start_raster` instead of
  `Self::start_raster_2d`.
- `MetadataRef`: restore main's per-method doc comments
  (`/// Width of the raster in pixels` etc.) — they were dropped on the
  N-D rewrite for no N-D reason since this trait is the pre-N-D shim.
- `array.rs`: drop the three `// ----` banner blocks ("Band
  implementation", "Raster implementation", "RasterStructArray …") that
  weren't in main; the doc-comment on each type already names it. Also
  drop the `Critical #N` / `Important #N` prefixes from the five
  test-section dividers — they encode review prioritization in source.
- `sedona-schema/src/raster.rs`: `StorageType` repr back to `#[repr(u16)]`
  to match main; the only existing caller casts `as u32` explicitly so
  binary layout doesn't matter here.
- `traits.rs`: drop the unused `PartialEq` derive on `RasterMetadata`
  and `BandMetadata` — main has bare `#[derive(Debug, Clone)]` on both
  and no workspace consumer compares either via `==` / `assert_eq!` on
  the values themselves.

The other big finding (drop the inherent `impl RasterMetadata { ... }`
in favour of `MetadataRef` trait imports at every call site) was a wash
once accounted for — 25 inherent-method lines vs ~10 added trait imports
across ~8 files, with the structural cost that every downstream
consumer of `RasterMetadata.width()` style would need the trait import
forever. Kept the inherent block.

PR-B total: 10 files +3,270 / -888.
CI's codespell hook flagged two occurrences in `traits.rs` doc comments
(`MetadataRef` and `RasterRef::bands` definitions).
@james-willis james-willis marked this pull request as ready for review May 12, 2026 23:36
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few optional nits probably worth considering now but in general this look great! Thank you!

Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment thread rust/sedona-raster/src/array.rs
Comment thread rust/sedona-raster/src/array.rs Outdated
@james-willis james-willis force-pushed the jw/nd-raster-type branch 2 times, most recently from 8e62d5f to f1eaba4 Compare May 13, 2026 18:40
- `RasterRefImpl::band()`: change the non-null view row case from a
  silent `return None` to an `assert!`, surfacing the corrupt-schema
  invariant rather than hiding it behind an out-of-range-looking miss.
- `BandRef::data()` (trait default): document as a compatibility shim,
  delegate to `contiguous_data()` and panic on `Cow::Owned` (the
  view-materialized path can't be returned through `&[u8]` since the
  owned `Vec` would die at the end of the call). Implementors that
  need view-materialized bytes via `data()` must override and anchor
  the materialized buffer on `Self`; other consumers should reach for
  `contiguous_data()` directly.
- `BandRefImpl::data()` (concrete override): keeps the one-line raw
  bytes return; correct by construction since the upstream `assert!`
  rejects non-identity views before BandRefImpl construction. GDAL
  backend in `raster_ref_to_gdal_mem` keeps using this directly —
  no `contiguous_data()` plumbing until view support actually lands.
- `traits::nodata_bytes_to_f64`: drop `pub`. The only external surface
  is the lossless wrapper (`nodata_bytes_to_f64_lossless`); the lossy
  variant is now an internal fallback for non-i64/u64 dispatch.
  `sedona-raster-gdal::gdal_common::nodata_bytes_to_f64` is a separate
  `Option<f64>`-returning helper, unaffected.
- `traits.rs`: drop the `test_nodata_as_f64_int64_loses_precision_above_2_pow_53`
  test. With `nodata_bytes_to_f64_lossless` as the canonical path,
  locking in the lossy fallback's rounding behavior on out-of-mantissa
  Int64 documents an implementation detail rather than a contract.
james-willis added a commit to james-willis/sedona-db that referenced this pull request May 13, 2026
…nal_err

Per Dewey's review (PR apache#749): the silent-`return None` and the assert!
follow-up both hid the difference between "out-of-range index" and
"schema-level invariant violation." Change `RasterRef::band(idx)` to
return `Result<Option<Box<dyn BandRef + '_>>, DataFusionError>`:

- `Ok(None)` — out-of-range index (back-compat with the old Option
  semantics for that case).
- `Ok(Some(_))` — valid band.
- `Err(DataFusionError::External(...))` — schema corruption, surfaced
  via `sedona_common::sedona_internal_err!` with the standardized
  SedonaDB internal-error message format.

`RasterRefImpl::band` uses `sedona_internal_err!` for the non-null view
row case; the upstream `assert!` is gone.

Back-compat preserved at the `Bands::band(N)` wrapper — it still
returns `Result<_, ArrowError>` (matching main's `BandsRef::band`
surface). `DataFusionError` from the underlying `RasterRef::band` is
wrapped into `ArrowError::ExternalError` so callers using main's
`bands().band(N)?` shape see no change.

Default impls and internal callers updated:
- `band_data_type` default: `.ok().flatten().map(...)` — silently drops
  corruption (the only consumer is the default, which `RasterRefImpl`
  overrides; corruption surfaces through the explicit `band()` path).
- `band_by_name`: same pattern.
- `Bands::iter`: `.filter_map(|i| raster.band(i).ok().flatten())` —
  matches main's `BandsRef::iter` (no error channel through the
  iterator). Doc note that consumers needing corruption signal must
  iterate by index and call `band()` directly.

Test sites updated mechanically: every `r.band(N).unwrap()` becomes
`r.band(N).unwrap().unwrap()` (Result → Option → Box), and
`r.band(N).is_none()` becomes `r.band(N).unwrap().is_none()`.
`bands.band(N)` style (the 1-based Result-returning wrapper) is
untouched.

`sedona-raster/Cargo.toml` picks up `datafusion-common` as a direct
dep so the macro's `datafusion_common::DataFusionError::External(...)`
path resolves.
…ion_err

Use the SedonaDB-internal-error macro as the assert message so the
non-identity view panic carries the standard "file a bug report"
pointer and backtrace, matching how the rest of the codebase signals
internal invariant violations.
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment thread rust/sedona-raster/src/array.rs Outdated
…l errors

Changes `RasterRef::band(index)` from `Option<Box<dyn BandRef + '_>>` to
`Result<Box<dyn BandRef + '_>, ArrowError>`, matching main's pre-N-D
shape and giving callers a single signal that distinguishes "found a
band" from "lookup failed." Knock-on effects:

- `Bands::band` simplifies — no more `ok_or_else` to convert the Option
  back into the ArrowError the back-compat surface promises.
- `Bands::iter` now yields Result items so corrupt bands surface
  rather than being silently dropped by the previous `filter_map`.
- `band_by_name` mirrors `band` and returns a Result.
- `band_data_type` keeps its Option fast-path shape (consistent with
  the other ScalAR-column fast-path accessors), using `.ok()` to
  collapse errors to None.
- Schema-corruption paths in `RasterRefImpl::band` (empty source_shape,
  unknown data_type discriminant, non-null view row) now return
  `ArrowError::ExternalError(Box::new(sedona_internal_datafusion_err!(
  ...)))` so the error message carries the standardised
  "SedonaDB internal error: ... file a bug report" framing, addressing
  the spirit of Dewey's review comment.

Tests updated to assert the new error shape rather than `is_none()`.
@paleolimbot paleolimbot merged commit 8ed9e7d into apache:main May 14, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants