feat: N-dimensional raster type extension (Phase 1)#749
Conversation
2b969d1 to
66b3c2e
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
I'll take a closer look at the implementation tomorrow, but wanted to get a preliminary review of the schema out!
paleolimbot
left a comment
There was a problem hiding this comment.
I did another round...there are a few places where tests or comments were removed that were still useful.
I do think it's worth iterating a tiny bit on the schema, and importantly, I think we should wait until @Kontinuation has finished the GDAL read portion of the previous set of raster refactoring because (1) this was a lot of work and Kristin's time is valuable and (2) these are important use cases to consider when making the new type work!
0af0b01 to
4302af4
Compare
…hape
Replace the scalar x_dim/y_dim Utf8View fields with a top-level
spatial_dims (List<Utf8View>) and add spatial_shape (List<Int64>).
The raster struct becomes the single source of truth for the spatial
grid; every band must contain each name in spatial_dims with a size
matching the corresponding entry in spatial_shape.
- sedona-schema: column constants SPATIAL_DIMS / SPATIAL_SHAPE,
raster_indices 0..4, new spatial_dims_type() / spatial_shape_type()
accessors.
- sedona-raster: RasterRef gains spatial_dims() + spatial_shape();
x_dim() / y_dim() / width() / height() become default methods
reading from the top-level fields, so width/height work even on
zero-band rasters ("target grid" GDAL warp pattern).
- RasterStructArray: decode the two new ListArrays; per-row slicing
mirrors the existing band dim_names/shape accessors.
- RasterBuilder: new start_raster(transform, spatial_dims[],
spatial_shape[], crs) signature; finish_raster strictly validates
every band against the declared spatial grid and returns
InvalidArgumentError on mismatch.
- sedona-testing: assert_raster_equal compares the new fields.
Addresses Dewey's review comment on PR apache#749 re: making the raster the
authority on the spatial grid.
4302af4 to
4033c82
Compare
…hape
Replace the scalar x_dim/y_dim Utf8View fields with a top-level
spatial_dims (List<Utf8View>) and add spatial_shape (List<Int64>).
The raster struct becomes the single source of truth for the spatial
grid; every band must contain each name in spatial_dims with a size
matching the corresponding entry in spatial_shape.
- sedona-schema: column constants SPATIAL_DIMS / SPATIAL_SHAPE,
raster_indices 0..4, new spatial_dims_type() / spatial_shape_type()
accessors.
- sedona-raster: RasterRef gains spatial_dims() + spatial_shape();
x_dim() / y_dim() / width() / height() become default methods
reading from the top-level fields, so width/height work even on
zero-band rasters ("target grid" GDAL warp pattern).
- RasterStructArray: decode the two new ListArrays; per-row slicing
mirrors the existing band dim_names/shape accessors.
- RasterBuilder: new start_raster(transform, spatial_dims[],
spatial_shape[], crs) signature; finish_raster strictly validates
every band against the declared spatial grid and returns
InvalidArgumentError on mismatch.
- sedona-testing: assert_raster_equal compares the new fields.
Addresses Dewey's review comment on PR apache#749 re: making the raster the
authority on the spatial grid.
58b56e1 to
4ddfdbf
Compare
999a577 to
dda82a7
Compare
94be286 to
965e7d9
Compare
3484f09 to
a33fdd1
Compare
Replaces apache#787's 2D-only band schema with the canonical N-D schema: spatial_dims/spatial_shape at the raster level; bands carry dim_names, source_shape, nullable view, outdb_uri, outdb_format, plus the non-nullable data buffer. Removes nodata_value, storage_type, outdb_url, and outdb_band_id - every one is encodable in the new schema: - storage_type ↔ outdb_uri.is_null() (null = InDb, set = OutDbRef). - outdb_url ↔ outdb_uri (no rename, same string). - outdb_band_id ↔ encoded inside outdb_uri (#band=N or GDAL native subdataset URI), parsed only inside the GDAL format driver. - nodata_value ↔ typed nodata: Binary (a null row means "no nodata"). Top-level adds spatial_dims: List<Utf8View> and spatial_shape: List<Int64>; nullable view is List<Struct<source_axis, start, step, steps: Int64>> where a null row encodes the canonical identity view. Note: intermediate commits in this PR are not expected to build; only the PR tip is CI-green. The trait, reader/builder, RS_* migration, and GDAL loader port land in subsequent commits.
RasterRef and BandRef accessors over the canonical N-D schema: spatial_dims/spatial_shape, transform, crs, num_bands, band(i), and band-level dim_names, source_shape, shape (visible, derived from view), view, data_type, nodata, outdb_uri, outdb_format, nd_buffer, contiguous_data returning Cow<[u8]>. validate_view enforces all view rules including i64-overflow on start + (steps-1)*step. NdBuffer exposes raw buffer + shape + byte strides + offset for zero-copy access (numpy / Arrow C Data Interface boundary); VIEW → byte strides happens inside nd_buffer(). Adds BandRef::is_2d() default method as the gate GDAL-backed paths use to refuse N-D input cleanly: true iff dim_names == ["y","x"] over the identity view.
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.
Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.
`raster_ref_to_gdal_empty` discards the always-empty vector.
`BandRef::nd_buffer()` and `BandRef::contiguous_data()` previously returned the empty Arrow `data` buffer when `is_indb() == false`, giving consumers silent garbage instead of a signal that the band needs a backend-specific OutDb resolver. Replace the silent path with `ArrowError::NotYetImplemented`, documenting the integration point for the future resolver work. No existing call site in `sedona-raster-functions` reads bytes from OutDb bands today (RS_BandPath only reads the URI), so this guard turns a latent gap into a visible one without breaking behavior.
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you for working on this! This is a solid schema design that will be the foundation of a solid nd spatial raster framework that will be flexible enough to deliver the type of performance we are hoping for in the native translation of Sedona raster functions to Rust.
Most of my comments are about minimizing the diff to avoid disrupting existing work. With a few small changes I think you avoid almost all disruption of existing + in progress work and minimize the footprint of this PR considerably. Basically, just keep RasterMetadata and BandMetadata and bands() around, even if they are just shims returning concrete structs you'll remove later. For what it's worth I frequently do this with PRs into larger projects using VSCode's pull request extension which nicely lets you roll through your PR diff to minimize it.
|
Design note on The GDAL backend in this PR uses the shim's Why it works in this PR: every band is identity-viewed, so the bytes GDAL points at are a borrow into the StructArray's backing buffer. What changes when the view machinery lands: once non-identity views are supported, Short version: if you're picking this code up later to add view support, expect to re-thread owned band bytes from the reader, through |
…lity shim
Reintroduces the pre-N-D metadata surface as a parallel API layer so call
sites stay aligned with main and downstream branches don't have to chase
the N-D refactor while their work is in flight. Per Dewey's PR review
feedback, this is intentionally a compatibility scaffold — DB-15 tracks
the removal once the stack lands.
What the shim adds (`rust/sedona-raster/src/traits.rs`):
- `RasterMetadata` struct + `MetadataRef` trait (with a blanket
`&T: MetadataRef` impl so `from_metadata(&m)` and
`from_metadata(raster.metadata())` both compile)
- `BandMetadata` struct exposing `data_type()`, `nodata_value()`,
`storage_type()`, `outdb_url()`, `outdb_band_id()`,
`nodata_value_as_f64()` — `outdb_url`/`outdb_band_id` are eagerly
parsed from the N-D `outdb_uri` via `split_outdb_band_fragment`
- `RasterRef::metadata() -> RasterMetadata` and
`BandRef::metadata() -> BandMetadata` default impls
- `BandRef::data() -> &[u8]` for InDb identity views (panics elsewhere
in PR-B; the `Cow::Owned` materialized-view case comes back with the
view machinery in PR-D)
- `Bands<'a>` view + `RasterRefBandsExt` extension trait so callers can
keep using `raster.bands().band(i)` / `.iter()` / `.len()`
Builder shim (`rust/sedona-raster/src/builder.rs`):
- `start_raster(&dyn MetadataRef, Option<&str>)` and
`start_band(BandMetadata)` alongside the N-D-native
`start_raster_nd` / `start_band_nd` (and the 2D positional convenience
variants)
- Restored doc example using the `RasterMetadata{..}` / `BandMetadata{..}`
literal pattern
`AffineMatrix` and GDAL adapters:
- `AffineMatrix::from_metadata<M: MetadataRef>(m: M)` (replaces
`from_transform`) so `to_world_coordinate(raster)` / `rotation(raster)`
/`to_raster_coordinate(raster)` match main
- Re-add `ToGdalGeoTransform` and `RasterMetadataFromGdalGeoTransform`
helper traits in `gdal_common.rs`; `raster_ref_to_gdal_mem` reads
`bands.band(i)` with `band.metadata().{data_type, nodata_value,
storage_type}` and returns `Result<Dataset>` (no more owned-bytes
plumbing — `band.data()` yields a borrowed slice straight out of the
StructArray, kept alive via `RasterDataset._source_raster:
PhantomData<&dyn RasterRef>`)
- `build_vrt_from_sources` / `raster_ref_to_gdal` / `VrtKey::from_raster`
in `gdal_dataset_provider.rs` use 1-based `bands.band(i)` and the
`BandMetadata` accessors; `VrtBandKey` gets its `storage_type` field
back
Test fixtures and call sites: `rs_band_accessors`, `rs_bandpath`,
`rs_convexhull`, `rs_envelope`, `rs_example`, `rs_georeference`,
`rs_geotransform`, `rs_numbands`, `rs_pixel_functions`, `rs_setsrid`,
`rs_size`, `rs_spatial_predicates`, `rs_srid`, `rasters.rs`,
`benchmark_util.rs`, and the affine / display / GDAL tests all use the
metadata-struct surface that main exposes. The handful of PR-B-only
additions that remain are structural (the `RasterRefBandsExt` import on
the few generic helpers that need it, and the `is_2d()` guards / N-D
rejection tests on the GDAL backend).
Net effect on PR-B vs main: roughly 18 files, +3.2k / -1.5k, with most
of the inserts now sitting in the N-D-native reader/builder code and
nearly all of main's pre-N-D test bodies preserved verbatim.
…on cfg(test)
`utils.rs` is the canonical `Dataset -> SedonaDB raster` loader on main:
`append_as_indb_raster` and `dataset_to_indb_raster` are re-exported from
the crate root and are the public API consumers reach for. The earlier
N-D loader port deleted the file outright and replaced its callers with
code reaching into `BandRef::contiguous_data()` directly — a breaking
API change with no in-tree replacement. Restore the file verbatim from
main; the shim's `start_raster(&RasterMetadata{..}) /
start_band(BandMetadata{..})` already produces N-D-compatible 2D
rasters, so the existing body works unchanged. The only PR-B-only diff
is `use sedona_raster::traits::{RasterRef, RasterRefBandsExt};` in the
test module so `raster.bands()` resolves through our extension trait.
Brings 7 GDAL-loader tests back into the suite.
Also gate `mod source_uri;` on `#[cfg(test)]` to match main. With the
shim eagerly parsing the `#band=N` fragment via
`split_outdb_band_fragment` in `sedona-raster::traits`, the GDAL
backend no longer reaches for `parse_outdb_source`. Main already had
the module test-only for exactly this reason — neither
`#[allow(dead_code)]` nor a TODO is needed.
e7feb8a to
5b12a07
Compare
…me tie
Two structural cleanups that cascade through call sites and shave ~280
lines of divergence vs main:
1. Fold `RasterRefBandsExt::bands()` into `RasterRef` as a required
trait method. Each `impl RasterRef for X` provides
`fn bands(&self) -> Bands<'_> { Bands::new(self) }` (one line per
impl, two impls total). Drops the extension trait, its blanket impl
for `T: RasterRef`, its impl for `dyn RasterRef + 'r`, the six
`use sedona_raster::traits::RasterRefBandsExt;` imports across the
raster-functions / raster-gdal / testing crates, and the
`+ RasterRefBandsExt` bound on the four generic GDAL-backend
functions (`raster_ref_to_gdal_mem`, `raster_ref_to_gdal_empty`,
`build_vrt_from_sources`, `raster_ref_to_gdal`,
`VrtKey::from_raster`). Adds `Bands::new` as a pub constructor so
`bands()` impls outside `traits.rs` can build the wrapper.
2. Rebuild `RasterRefImpl<'a>` to hold flat `&'a Array` references for
every field instead of a back-pointer to `&'a RasterStructArray<'a>`.
`RasterStructArray::get` goes back to `fn get(&self, idx) ->
RasterRefImpl<'a>` (was `&'a self`), which removes the lifetime tie
that was forcing every scalar-raster call site to hoist
`RasterStructArray::new(...)` into a local. Reverts the
`let arr0; arr0 = ...; if arr0.is_null(0) ...` plumbing in
`executor.rs` and the two-line hoists in `sedona-testing/rasters.rs`.
Restores `gdal_common::tests::single_raster` so the four
`let rsa = RasterStructArray::new(&raster_array); let raster =
rsa.get(0).unwrap();` test sites can go back to
`let raster = single_raster(&raster_array);`.
Also revert `AffineMatrix::from_metadata` to main's `&dyn MetadataRef`
parameter — callers pass `&raster.metadata()` (one extra `&` versus
main, since our shim's `metadata()` returns `RasterMetadata` by value
rather than `&dyn MetadataRef`). The `impl<T: MetadataRef + ?Sized>
MetadataRef for &T` blanket impl that was carrying the by-value-generic
form goes with it.
`builder.rs` and `array.rs` each had their test modules entirely rewritten
during the N-D port, with main's tests deleted and replaced by N-D-flavoured
variants. The shim makes main's tests buildable as-is. Restore them
verbatim and keep the PR-B-only N-D tests alongside:
- builder.rs: `test_iterator_basic_functionality`, `test_multi_band_iterator`,
`test_copy_metadata_from_iterator`, `test_band_data_types`,
`test_outdb_metadata_fields`, `test_band_access_errors`. Adds
`use crate::traits::{BandMetadata, MetadataRef};` at the top so `start_band`
/ `start_raster` can drop their fully-qualified `crate::traits::` paths
and match main's call shape, and the test module pulls `BandMetadata` /
`RasterMetadata` / `StorageType` in alongside the existing N-D imports.
- array.rs: `test_array_basic_functionality`, `test_multi_band_array`,
`test_raster_is_null`. The fourth main test
(`test_invalid_band_metadata_returns_err`) doesn't translate — it pokes a
nested `metadata` Struct that doesn't exist in our flat N-D layout — and
the equivalent corruption check is already covered by
`band_and_band_data_type_return_none_for_unknown_discriminant`.
Two cleanups that fall out:
- `Bands::band(0)` and out-of-range errors now use main's exact wording
("band numbers must be 1-based" / "is out of range") so
`test_band_access_errors` passes without a tweak.
- `BandRefImpl::data()` is now an explicit override that returns the raw
`data` column bytes (matching main: `&[]` for OutDb, the row-major buffer
for InDb identity views) instead of routing through `nd_buffer()` and
panicking on OutDb. The shim's default `BandRef::data()` is kept for
other implementors.
PR-B total: 10 files +3,286 / -897 (was +3,260 / -1,985 before this round).
Second-round review punch list — small, surgical:
- `builder.rs` imports: merge the two `sedona_schema::raster::{...}`
imports into one, recombine `arrow_schema::{ArrowError, DataType}`
(the lone `use arrow_schema::DataType;` line was unnecessary), drop
the duplicate blank lines around the imports block.
- `start_raster_2d` doc-comment was the stitched-together result of two
drafts ("Sets `spatial_dims=...`, and Build the 6-element GDAL
transform..." with a self-reference at the end). Rewrite as a single
coherent sentence and point at `Self::start_raster` instead of
`Self::start_raster_2d`.
- `MetadataRef`: restore main's per-method doc comments
(`/// Width of the raster in pixels` etc.) — they were dropped on the
N-D rewrite for no N-D reason since this trait is the pre-N-D shim.
- `array.rs`: drop the three `// ----` banner blocks ("Band
implementation", "Raster implementation", "RasterStructArray …") that
weren't in main; the doc-comment on each type already names it. Also
drop the `Critical #N` / `Important #N` prefixes from the five
test-section dividers — they encode review prioritization in source.
- `sedona-schema/src/raster.rs`: `StorageType` repr back to `#[repr(u16)]`
to match main; the only existing caller casts `as u32` explicitly so
binary layout doesn't matter here.
- `traits.rs`: drop the unused `PartialEq` derive on `RasterMetadata`
and `BandMetadata` — main has bare `#[derive(Debug, Clone)]` on both
and no workspace consumer compares either via `==` / `assert_eq!` on
the values themselves.
The other big finding (drop the inherent `impl RasterMetadata { ... }`
in favour of `MetadataRef` trait imports at every call site) was a wash
once accounted for — 25 inherent-method lines vs ~10 added trait imports
across ~8 files, with the structural cost that every downstream
consumer of `RasterMetadata.width()` style would need the trait import
forever. Kept the inherent block.
PR-B total: 10 files +3,270 / -888.
CI's codespell hook flagged two occurrences in `traits.rs` doc comments (`MetadataRef` and `RasterRef::bands` definitions).
paleolimbot
left a comment
There was a problem hiding this comment.
A few optional nits probably worth considering now but in general this look great! Thank you!
8e62d5f to
f1eaba4
Compare
- `RasterRefImpl::band()`: change the non-null view row case from a silent `return None` to an `assert!`, surfacing the corrupt-schema invariant rather than hiding it behind an out-of-range-looking miss. - `BandRef::data()` (trait default): document as a compatibility shim, delegate to `contiguous_data()` and panic on `Cow::Owned` (the view-materialized path can't be returned through `&[u8]` since the owned `Vec` would die at the end of the call). Implementors that need view-materialized bytes via `data()` must override and anchor the materialized buffer on `Self`; other consumers should reach for `contiguous_data()` directly. - `BandRefImpl::data()` (concrete override): keeps the one-line raw bytes return; correct by construction since the upstream `assert!` rejects non-identity views before BandRefImpl construction. GDAL backend in `raster_ref_to_gdal_mem` keeps using this directly — no `contiguous_data()` plumbing until view support actually lands. - `traits::nodata_bytes_to_f64`: drop `pub`. The only external surface is the lossless wrapper (`nodata_bytes_to_f64_lossless`); the lossy variant is now an internal fallback for non-i64/u64 dispatch. `sedona-raster-gdal::gdal_common::nodata_bytes_to_f64` is a separate `Option<f64>`-returning helper, unaffected. - `traits.rs`: drop the `test_nodata_as_f64_int64_loses_precision_above_2_pow_53` test. With `nodata_bytes_to_f64_lossless` as the canonical path, locking in the lossy fallback's rounding behavior on out-of-mantissa Int64 documents an implementation detail rather than a contract.
f1eaba4 to
fa32b34
Compare
…nal_err Per Dewey's review (PR apache#749): the silent-`return None` and the assert! follow-up both hid the difference between "out-of-range index" and "schema-level invariant violation." Change `RasterRef::band(idx)` to return `Result<Option<Box<dyn BandRef + '_>>, DataFusionError>`: - `Ok(None)` — out-of-range index (back-compat with the old Option semantics for that case). - `Ok(Some(_))` — valid band. - `Err(DataFusionError::External(...))` — schema corruption, surfaced via `sedona_common::sedona_internal_err!` with the standardized SedonaDB internal-error message format. `RasterRefImpl::band` uses `sedona_internal_err!` for the non-null view row case; the upstream `assert!` is gone. Back-compat preserved at the `Bands::band(N)` wrapper — it still returns `Result<_, ArrowError>` (matching main's `BandsRef::band` surface). `DataFusionError` from the underlying `RasterRef::band` is wrapped into `ArrowError::ExternalError` so callers using main's `bands().band(N)?` shape see no change. Default impls and internal callers updated: - `band_data_type` default: `.ok().flatten().map(...)` — silently drops corruption (the only consumer is the default, which `RasterRefImpl` overrides; corruption surfaces through the explicit `band()` path). - `band_by_name`: same pattern. - `Bands::iter`: `.filter_map(|i| raster.band(i).ok().flatten())` — matches main's `BandsRef::iter` (no error channel through the iterator). Doc note that consumers needing corruption signal must iterate by index and call `band()` directly. Test sites updated mechanically: every `r.band(N).unwrap()` becomes `r.band(N).unwrap().unwrap()` (Result → Option → Box), and `r.band(N).is_none()` becomes `r.band(N).unwrap().is_none()`. `bands.band(N)` style (the 1-based Result-returning wrapper) is untouched. `sedona-raster/Cargo.toml` picks up `datafusion-common` as a direct dep so the macro's `datafusion_common::DataFusionError::External(...)` path resolves.
…ion_err Use the SedonaDB-internal-error macro as the assert message so the non-identity view panic carries the standard "file a bug report" pointer and backtrace, matching how the rest of the codebase signals internal invariant violations.
368a29e to
a97def9
Compare
…l errors Changes `RasterRef::band(index)` from `Option<Box<dyn BandRef + '_>>` to `Result<Box<dyn BandRef + '_>, ArrowError>`, matching main's pre-N-D shape and giving callers a single signal that distinguishes "found a band" from "lookup failed." Knock-on effects: - `Bands::band` simplifies — no more `ok_or_else` to convert the Option back into the ArrowError the back-compat surface promises. - `Bands::iter` now yields Result items so corrupt bands surface rather than being silently dropped by the previous `filter_map`. - `band_by_name` mirrors `band` and returns a Result. - `band_data_type` keeps its Option fast-path shape (consistent with the other ScalAR-column fast-path accessors), using `.ok()` to collapse errors to None. - Schema-corruption paths in `RasterRefImpl::band` (empty source_shape, unknown data_type discriminant, non-null view row) now return `ArrowError::ExternalError(Box::new(sedona_internal_datafusion_err!( ...)))` so the error message carries the standardised "SedonaDB internal error: ... file a bug report" framing, addressing the spirit of Dewey's review comment. Tests updated to assert the new error shape rather than `is_none()`.
Summary
Upgrades SedonaDB's raster type from 2D tiles to N-dimensional chunks with named dimensions, enabling support for climate model outputs, hyperspectral imagery, Zarr/NetCDF datacubes, and other multi-dimensional geospatial datasets.
This is a schema-and-API migration. All 33 existing
RS_*functions are migrated mechanically and produce identical outputs on 2D inputs; no new SQL functions are added in this PR. GDAL-backed functions added in #787 are ported to read from the new schema and gate onBandRef::is_2d()— N-D bands return a clearExecutionerror pending MDArray/Zarr support.This PR is rebased onto post-#787
mainand is the destructive replacement of #787's band schema. Because the schema swap and the GDAL loader port have to land together to keepmaincompiling, the PR is structured as four review-friendly commits but only the tip is required to be CI-green — intermediate commits intentionally don't build and the PR is not bisectable across commits 1–3.This PR depends on #812 (GDAL outdb-URI parser); merge order: #812 → #749.
Commits
nodata_value,storage_type,outdb_url,outdb_band_id; addsdim_names,source_shape, nullableview,outdb_uri,outdb_format. Top-level gainsspatial_dims/spatial_shape.is_2d—RasterRefandBandRefaccessors for the N-D schema;BandRef::is_2d()default method returns true iffdim_names == ["y","x"]with the identity view (used by GDAL-backed paths to refuse N-D input cleanly).viewrow, all 33RS_*functions ported.RS_BandPathkeeps its existing fragment-stripping (format-agnostic display) and is unchanged.sedona-raster-gdalto read fromoutdb_uri+ the feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver #812 parser instead ofstorage_type/outdb_url/outdb_band_id. VSI normalization, dataset cache, and RasterIO bodies are byte-for-byte unchanged — only the schema-read sites move. Each GDAL-backed SQL function gates onband.is_2d()at entry.Equivalence mapping (vs #787)
storage_type(InDb / OutDbRef enum)data.is_empty()— InDb bands carry their bytes indata; OutDb bands have an emptydatabuffer and resolve throughoutdb_uri+outdb_format.outdb_urloutdb_uri(no rename, same string)outdb_band_idoutdb_uri(<uri>#band=Nor GDAL native subdataset URI). Parsed only inside the GDAL format driver via #812'sparse_outdb_source. Format-agnostic surfaces (incl.RS_BandPath) treatoutdb_urias opaque.nodata_valuenodata: Binary(typed bytes; a null row means "no nodata")Top-level adds
spatial_dims: List<Utf8>andspatial_shape: List<UInt64>(e.g.["y","x"]and[height, width]).Schema (canonical)
The
viewfield is defined in the schema so it is part of the canonical N-D layout and round-trips through Arrow IPC, but in this PR every writer emits a null view row (canonical identity) and the reader treats a non-null view row as a hard read error. Encoding identity as a null row keeps the common "no slice applied" case free in Arrow space; consumers see the same shape and bytes regardless of which path produced the band. The composition path that decodes non-null view rows lands in #813.Traits
RasterRef—spatial_dims,spatial_shape,transform,crs,num_bands,band(i),width/height/x_dim/y_dim(derived).BandRef—name,dim_names,source_shape,shape(visible, derived),view,data_type,nodata,outdb_uri,outdb_format,nd_buffer,contiguous_datareturningCow<[u8]>,is_2d(default).NdBuffer— zero-copy buffer view used at the numpy / Arrow C Data Interface boundary.Builder
start_raster/start_bandfor full N-D;start_raster_2d/start_band_2dfor legacy 2D.start_bandwrites a null view (canonical identity) — no caller change.finish_bandvalidates each band's visible shape against the raster'sspatial_shapealong the spatial dims.Backward compatibility
Legacy 2D rasters are represented as bands with
dim_names = ["y","x"],source_shape = [height, width], and an identity view. All existingRS_*outputs are byte-identical on 2D inputs; every test that passed against #787's GDAL functions passes against the ported loader.Crates modified
sedona-schema— N-D raster schema definition.sedona-raster— traits, Arrow array reader, builder, affine transform, display.sedona-testing— raster test helpers.sedona-raster-functions— allRS_*function implementations migrated;RS_BandPathunchanged.sedona-raster-gdal— loader readsoutdb_uri+ feat(raster-gdal): add parse_outdb_source helper for the GDAL format driver #812 parser; SQL functions gate onis_2d().Out of scope (follow-ups)
append_as_indb_raster/dataset_to_indb_raster(deleted here, see note above) against the N-D schema.validate_view,start_band_with_view, view-decoding in the reader, view-awarecontiguous_data, and the explicit-view test cluster — landing in [WIP] feat(raster): view machinery for non-identity band views #813 between this PR and feat: add N-D raster dimension query and manipulation functions #750.is_2d()guard for selected functions).Test plan
sedona-schema: tests pass; verifies theviewfield is nullable and the view-struct field order matches the indices module.sedona-raster: tests pass — identity round-trip viastart_band(null view row); on-disk schema invariant that identity bands serialise as null view rows; reader rejects non-null view rows with a clear error;BandRef::is_2dtruth-table tests; builder visible-shape validation againstspatial_shape.sedona-raster-functions: tests pass — all existing function tests produce identical outputs after migration.sedona-raster-gdal: 46/46 pass — every feat(sedona-raster-gdal) add GDAL foundation library for supporting GDAL-based RS functions #787 SQL-function test passes against the ported loader; new tests cover N-D rejection atraster_ref_to_gdal_memand the VRT path; one end-to-end test confirmspath#band=2selects band 2 of a two-band GeoTIFF.cargo clippy --all-targets -- -D warningsclean on the affected crates.cargo fmt --all --checkclean.