Skip to content

Lazy Raster Loading Support for Zarr and GDAL#886

Open
james-willis wants to merge 9 commits into
apache:mainfrom
james-willis:jw/outdb-loader-v2
Open

Lazy Raster Loading Support for Zarr and GDAL#886
james-willis wants to merge 9 commits into
apache:mainfrom
james-willis:jw/outdb-loader-v2

Conversation

@james-willis
Copy link
Copy Markdown
Contributor

@james-willis james-willis commented May 28, 2026

Summary

This PR adds support for OutDB raster loading via the RS_EnsureLoaded UDF. When a function that needs the bytes of a raster available is called, an optimizer rule injects RS_EnsureLoaded into the expression tree. This _should also leverage Common Subexpression Elimination from Datafusion to avoid repeated loads.

This PR supercedes #849. This is the new approach based on the review comments I received from Dewey on that PR.

Copy reduction

Returning arrow_buffer::Buffer (not Vec<u8> or Bytes) and constructing the output BinaryViewArray directly from collected Buffers — bypassing BinaryViewBuilder's internal block copy — gives one copy total (storage → loader-internal Buffer) on the byte path, down from three with a naive Vec<u8> interface.

Zarr Registration

I punted on this but we will need it in the near future. The standard datafusion FFI doesnt help us here because these OutDBLoaders are not part of the datafusion interface. the Agent presented two solutions:

A. Build a C-ABI wrapper for AsyncByteLoader following the datafusion-ffi pattern (#[repr(C)] struct, extern "C" function pointers, abi_stable types). ~200-400 LoC of FFI scaffolding + safety review. Future-proof but heavyweight, and we'd own the ABI compatibility story going forward.

B. PyCapsule with raw Arc transit. Non-standard but works when both wheels link the same sedona-raster version. PyArrow has analogous patterns (PyCapsule carrying ArrowSchema* etc., where the C struct happens to be standardized — ours wouldn't be). Document the version-coupling constraint.

Foundational module for async OutDb byte loading. Defines:

- `AsyncByteLoader` trait — async fn that returns `arrow_buffer::Buffer`,
  enabling zero-copy hand-off to the downstream BinaryViewArray
  construction in RS_EnsureLoaded (one storage→column copy instead of
  three: storage→loader-internal→Vec<u8>→BinaryViewBuilder-block→column).
- `OutDbLoadRequest` — borrowed request struct carrying the URI,
  dim_names, source_shape, and pixel type for one band's worth of bytes.
- `OutDbLoaderRegistry` — format-keyed map from `outdb_format` strings to
  `Arc<dyn AsyncByteLoader>`. One instance per SedonaContext; the context
  wraps it in `Arc<RwLock<…>>` so plugin crates can register
  post-construction via a public SedonaContext API.

No callers yet — the registry/trait stand alone. Subsequent commits add
the SedonaScalarUDF needs_bytes annotation, the RS_EnsureLoaded async
UDF that consumes the registry, the analyzer rule that injects the wrap,
and the GDAL + Zarr backend impls.

7 unit tests cover registration, lookup, dispatch, overwrite semantics,
and the format-iteration helper used for diagnostic messages.

See docs/research/async-outdb-design.md for the full design rationale.
Class-level annotation declaring whether a UDF's kernels read raster
pixel bytes. The analyzer rule (later commit) reads this to decide
whether to wrap raster arguments with RS_EnsureLoaded.

Shape:
- New `needs_bytes: bool` field on SedonaScalarUDF, default false.
- `with_needs_bytes()` builder method flips it to true at construction.
- `needs_bytes()` accessor for the analyzer to query post-downcast.

Class-level by convention: each UDF is constructed once per session, so
the instance field reflects a fixed property of the underlying kernels.
The field doesn't live on SedonaScalarKernel because kernels within a
single UDF don't realistically disagree on byte-needing-ness — they
differ by type signature, not by data-access pattern. If a future use
case demands per-kernel granularity, that's the trigger to push the
flag down to the kernel trait; not before.

No current RS_* UDFs read pixel bytes (audit returns empty across
sedona-raster-functions, sedona-raster-gdal, sedona). The annotation
mechanism is staged for future byte-reading kernels (e.g., RS_Value,
RS_Sum) which will call `.with_needs_bytes()` at construction.

See docs/research/async-outdb-design.md §2 for the design rationale.
Materialises OutDb raster bands at query time via the format-keyed
AsyncByteLoader registry from sedona-raster. The UDF instance closes
over an Arc<RwLock<OutDbLoaderRegistry>> owned by SedonaContext, so
loader registrations from session bootstrap (compiled-in defaults like
GDAL under a Cargo feature, later commit) and from plugin entry points
(out-of-tree crates calling SedonaContext::register_outdb_loader,
later commits) are immediately visible at dispatch time without
session-extension lookups — AsyncScalarUDFImpl::invoke_async_with_args
in DataFusion 52.5 only receives Arc<ConfigOptions>, so the closure-bound
registry is the right runtime path.

Implementation:

- `RsEnsureLoaded` implements `AsyncScalarUDFImpl`. For each input row,
  walks bands, passes through InDb bytes verbatim, dispatches OutDb
  bytes via the registered loader for the band's `outdb_format`,
  validates the loader-returned `Buffer` length against
  `Π source_shape × data_type.byte_size()`, and assembles the output
  raster StructArray via `RasterBuilder`. Sequential await for the
  first cut; parallel `buffer_unordered` fan-out is a follow-up that
  doesn't change the trait surface.
- `SedonaContext` gains a private `outdb_registry: Arc<RwLock<...>>`
  field initialised empty at `new_from_context`. The `RsEnsureLoaded`
  UDF is registered there alongside other session UDFs, with a clone
  of the same Arc.
- Public `SedonaContext::register_outdb_loader(format, loader)` is the
  single insertion hook used by both compiled-in defaults and plugins.
  `registered_outdb_formats()` exposes the keys for diagnostics.
- `OutDbLoaderRegistry` gains a manual `Debug` impl listing format keys
  (the loader values themselves are opaque trait objects).
- `SedonaScalarUDF` Eq/Hash/Debug derive lessons applied: manual impls
  by identity-by-name, since the registry field doesn't participate in
  UDF identity and isn't itself Eq/Hash.

5 unit tests cover the resolver end-to-end with a mock loader:
InDb passthrough, OutDb byte materialisation, missing-format diagnostic
with registered-formats listing, undersized-loader-output detection,
null raster row preservation. One context-level test asserts the
registry plumbing (empty initial state, post-register listing, UDF
visible in the scalar function registry by name).
Pre-optimizer rewrite that makes OutDb byte materialisation explicit in
the logical plan. For every `Expr::ScalarFunction` whose UDF downcasts
to a `SedonaScalarUDF` annotated `with_needs_bytes()`, each
raster-typed argument is wrapped in `RS_EnsureLoaded(arg)`. The wrap
references the same `Arc<ScalarUDF>` as the one
`SedonaContext::new_from_context` registers under the UDF registry, so
both injected and explicitly-written calls dispatch through the same
async resolver.

Recursion guard: `RS_EnsureLoaded` itself is left alone, so
`SELECT ... FROM RS_EnsureLoaded(rast)` doesn't recurse into
`RS_EnsureLoaded(RS_EnsureLoaded(rast))`.

Detection of "raster-typed" arguments uses `Expr::to_field(schema)` —
not `get_type()` — so the Field's extension-type metadata is available;
`SedonaType::Raster` is identified by the `"sedona.raster"` extension
type, not by raw `DataType::Struct`.

Two adjustments to Commit 3's RS_EnsureLoaded UDF that pair with the
analyzer rule:

- Signature volatility flipped from `Volatile` to `Stable`. DataFusion's
  `CommonSubexprEliminate` pass skips volatile expressions
  (`is_volatile_node()` returns true iff signature is exactly
  `Volatility::Volatile`), so the previous Volatile classification
  would have prevented dedup of injected wraps. Semantically Stable is
  the honest call: within a query, byte materialisation is deterministic
  for fixed inputs; across queries the underlying storage may change.
- Signature shape changed from `user_defined` to `any(1, ...)`. The
  `AsyncScalarUDF` adapter doesn't delegate `coerce_types` to the inner
  impl, so `Signature::user_defined` (which requires `coerce_types`)
  produced a planning-time "function does not implement coerce_types"
  error. `any(1, ...)` lets DataFusion accept whatever single-arg type
  the caller passes; we validate "argument is Struct" in `return_type`.

The CSE-dedup test (load-bearing) constructs the post-analyzer plan
shape directly (two `needs_bytes` UDFs calling the same raster
column, both with their args wrapped) and applies DataFusion's
`CommonSubexprEliminate::rewrite` to it. Before CSE: two
RS_EnsureLoaded(rast) calls in the plan tree. After CSE: exactly one.
This validates the volatility-Stable choice and proves we lean on
DataFusion's existing optimiser for dedup rather than re-implementing
it in our rule.

8 tests cover:
- `expr_is_raster` detection on raster + non-raster columns
- analyzer rule wraps raster args of needs_bytes UDFs
- leaves non-raster args alone
- leaves metadata-only (un-annotated) UDFs alone
- recursion guard against rewrapping RS_EnsureLoaded
- nested calls are rewritten inside-out via transform_up
- CSE dedup of injected wraps (2 → 1)

Registered from `SedonaContext::new_from_context` via
`ctx.state_ref().write().add_analyzer_rule(...)`. The rule runs in
DataFusion's analyzer phase, before any optimizer rules including
PushDownFilter, PushDownLimit, and CommonSubexprEliminate.

See docs/research/async-outdb-design.md for the broader design.
Materialises OutDb GeoTIFF (and other GDAL-supported) bands when the
RS_EnsureLoaded UDF dispatches against the `"gdal"` registry key.

Shape:
- `GdalLoader` is a stateless `AsyncByteLoader` impl. The thread-local
  `GDALDatasetCache` from `gdal_dataset_provider` does the per-thread
  dataset memoisation; the loader itself owns no state, so instances
  are interchangeable and cheap to clone.
- Inside `load()`, GDAL's blocking API is wrapped in
  `tokio::task::spawn_blocking` so the calling tokio worker stays
  responsive. The dataset cache uses `Rc` (not Send/Sync) and is
  thread-local, so each blocking-pool worker maintains its own cache
  — concurrent loads via `buffer_unordered` build per-worker caches
  without contention.
- `read_as_bytes` returns a `Vec<u8>`; `Buffer::from_vec` wraps it
  zero-copy for the return path, matching the design's
  one-storage-to-column-copy goal end-to-end on the GDAL backend.

Request validation (synchronous, before spawning):
- Source shape must be 2-D and `dim_names == ["y", "x"]`. Higher-rank
  or transposed OutDb reads need MDArray support and explicit axis
  mapping, tracked separately.
- File's GDAL pixel type must match the band metadata's `data_type`
  claim. Caught here with a clear error before reading, otherwise a
  size mismatch would mis-blame the loader rather than naming the
  dtype problem.

Bootstrap registration: `SedonaContext::new_from_context` calls
`out.register_outdb_loader(GDAL_FORMAT, Arc::new(GdalLoader::new()))`.
GDAL itself is dlopen'd lazily by `sedona-gdal` (workspace-default
`gdal-sys = false`), so registration is safe on systems without
libgdal — the first OutDb GeoTIFF query surfaces a clean "libgdal not
found" error from `sedona-gdal::global` instead of a panic, and
non-OutDb queries are unaffected.

Module bookkeeping: the existing `source_uri` module (`parse_outdb_source`)
was previously `#[cfg(test)]`-only; it's needed in production now, so
the cfg gate is removed. `get_or_create_outdb_source` on
`GDALDatasetCache` is promoted from private to `pub(crate)` so the
loader can reach it.

7 unit tests cover: 2x3 UInt8 round-trip, default-band-1 when fragment
absent, request-validation rejections (non-2D, non-yx dim_names, dtype
mismatch), and propagation of GDAL-layer errors (missing file,
out-of-range band index). Tests use real GDAL via the existing
`with_gdal` helper and the in-crate driver pattern from
`gdal_dataset_provider.rs`.

Updated context-level test: bootstrap registry contains `"gdal"`
instead of being empty; plugin registrations are stacked on top.
Materialises OutDb Zarr-backed bands when the RS_EnsureLoaded UDF
dispatches against the `"zarr"` registry key. Pairs with the Zarr
reader's existing chunk-anchor URI scheme
(`<store_uri>#array=<array_path>&chunk=<i0>,<i1>,...`): the loader
parses the anchor, opens the Zarr store + array, retrieves the named
chunk via `zarrs`, and returns the bytes as an `arrow_buffer::Buffer`.

Shape:
- `ZarrLoader` is a stateless `AsyncByteLoader` impl. Per-call store
  opens; per-(store_uri, array_path) caching is a follow-up
  optimisation that doesn't change the trait surface.
- Inside `load()`, the sync `zarrs` API runs inside
  `tokio::task::spawn_blocking` so the calling tokio worker stays
  responsive. Resulting `Buffer` is Send, crosses the await cleanly.
- Validates the Zarr array's dtype against the request's
  `data_type` claim before reading. Mismatches surface here with a
  clear diagnostic rather than as a downstream byte-count surprise.

Delivery model: as an out-of-tree plugin, `sedona-raster-zarr` does
NOT depend on `sedona` — callers register the loader from their own
context-setup code:

  ctx.register_outdb_loader(
      sedona_raster_zarr::ZARR_FORMAT,
      Arc::new(sedona_raster_zarr::ZarrLoader::new()),
  );

This is the asymmetry from Commit 5 (GDAL): GDAL is upstream of
sedona and registered automatically; Zarr is a plugin and explicit.
The Python wheel `sedonadb-zarr` will surface a Python-level
`register(con)` shim that wraps this call in a follow-up.

Module bookkeeping: `ChunkAnchor`, `parse_chunk_anchor`, and the
`retrieve_chunk_bytes` chunk-read primitive in `loader.rs` were
previously `#[cfg(test)]`-only — they're production-needed now, so
the cfg gates are removed. `retrieve_chunk_bytes` is now `pub(crate)`
since `outdb_loader` is its only consumer outside tests.

5 unit tests cover: UInt8 chunk round-trip (write via `ArrayBuilder`,
read via the loader), dtype mismatch with clear diagnostic, malformed
chunk-anchor URI rejection (missing fragment), missing-array-path
propagation from `zarrs`, and cloud-scheme rejection (until cloud
stores are supported). Tests build a real Zarr v3 group via
`GroupBuilder` + `ArrayBuilder` against a temp filesystem.
@github-actions github-actions Bot requested a review from zhangfengcdt May 28, 2026 17:19
Three small adjustments from review:

- Drop the "Class-level by convention…" paragraph from
  `SedonaScalarUDF::with_needs_bytes()`'s doc comment. The remaining
  doc already names the purpose; the extra prose was prescriptive
  without adding information.
- Replace `unreachable!("matched above")` in
  `ensure_loaded_analyzer::rewrite_expr_node` with a
  `sedona_internal_datafusion_err!`-bearing let-else. Same
  intent-preserved-as-leading-comment, but the failure mode on a
  future refactor that breaks the invariant is a clean query-level
  internal error instead of a process-level panic — matches the
  defensive style used everywhere else in this PR.
- Fix codespell: "Implementors" → "Implementers" in the
  `AsyncByteLoader` trait doc.
@james-willis james-willis marked this pull request as ready for review May 28, 2026 18:49
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get through the analyzer bits today but thought I'd leave the loader part and I'll take a look at the analyzer tomorrow. In general this looks great!

Comment on lines +131 to +132

let bytes = retrieve_chunk_bytes(&array, &anchor.chunk_indices)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is nicely happening on another thread so as to not block what's happening elsewhere, doing it all at once makes it non-cancelable. If it's possible to break this into chunks of work that are independently awaitable it would make it easier for tokio to schedule them effectively. In the absence of that, perhaps put a limit on either the number of bytes or the number of seconds and loop in such a way that a user won't get a hanging process.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will all go away when we migrate to the async/obstore-based implementation when we add cloud support to the zarr crate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rough(!) draft of migrating sedona-zarr to be all async: #888

let buffer = tokio::task::spawn_blocking(move || -> Result<Buffer, ArrowError> {
let anchor = parse_chunk_anchor(&uri)?;
let fs_path = group_uri_to_filesystem_path(&anchor.store_uri)?;
let store = FilesystemStore::new(&fs_path).map_err(|e| {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the zarr crate have an async API you can use to avoid a spawn here? I was expecting this to look something like

let read_tasks = open_zarr(uri).await?;
futures::stream::iter(read_tasks)
    .map(|task| {
        async move {
            task.doit().await
        }
    })
    .boxed()
    .buffered(max_concurrency)
    .try_collect()
    .await?;

...but maybe that's what's happening one level higher than this. As long as the data transfer size is bounded (a few MB per chance to evaluate whether this should continue or not) the current approach is OK too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Cloud ones will. AFAIK file doesnt. I can look for something to avoid forking those two cases but no promises.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I did some more research. There is a way to use zarrs with obs store to get a local fs that works asynchronously.

My plan (wrote it down in my notes for that work) is that when I do that work to integrate zarr cloud support I will remove all of this synchronous code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, just wondering if there was already an async way to split this up (totally fine if there isn't, although best if you can find a way to do it chunks like the GDAL loader does)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rough(!) draft of migrating sedona-zarr to be all async: #888

Comment thread rust/sedona-raster-gdal/src/outdb_loader.rs Outdated
Comment on lines +79 to +84
let height = usize::try_from(req.source_shape[0]).map_err(|_| {
ArrowError::InvalidArgumentError(format!(
"GDAL OutDb source_shape[0]={} exceeds usize::MAX",
req.source_shape[0]
))
})?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this loading the source shape or the target shape? (I think Kristin added a getter for an extract that considers a target shape as long as it is contiguous)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_shape is the name of the field in the raster trait. it represents the shape of:

a. in the indentity view case, the data
b. in the non-identity view case, the data that backs the view.

it is named as such to be clear of the difference between the visible bytes vs the backing bytes.

So in this case source_shape is the target shape.

Comment on lines +50 to +53
/// Raw source shape in `dim_names` order. The loader returns a
/// `Buffer` whose length equals `Π source_shape × data_type.byte_size()`
/// bytes, encoding pixels in C-order over `dim_names`.
pub source_shape: &'a [u64],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this see the view as well? In some cases the IO will be the same (but we can discard a lot of pixels right away); in other cases we can avoid IO because the retile will map to a small subset of a contiguous range. (Apologies if I misunderstood what source_shape is here)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I havent thought about designing the system to allow "pushing" views into loaders. This PR does not support that and I'm not sure when we would be able to take advantage of such a thing.

Generally a cog tile or zarr chunk its retrieved atomically so we can't benefit from "pushing" the view in these cases. so I dont think theres a way to reduce io with such support.

Im not sure when you want to eagerly slice/crop vs when you want to perform that lazily.

Do you think crops on indb rasters should be eager in general? if so we could probably solve this with operator fusion + a rule in the crop operators impl that always eager crops the indb raster

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).

It's OK not to implement crop pushdown but if the information isn't there it can never happen.

Comment on lines +73 to +79
#[async_trait::async_trait]
pub trait AsyncByteLoader: Send + Sync {
/// Fetch the band's bytes. The returned `Buffer` must contain exactly
/// `Π source_shape × data_type.byte_size()` bytes in C-order over
/// `dim_names`. Errors propagate to the caller of `RS_EnsureLoaded`.
async fn load(&self, req: &OutDbLoadRequest<'_>) -> Result<Buffer, ArrowError>;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure exactly what the parameter should be here, but this should possibly also pass an Arc<dyn ObjectStore>. The loader might not be able to make use of this (i.e., it might use the URI and hope/pray that the authentication ENV vars specified auth identically) but it would be great to centralize IO around this (we need this for non-raster datasources as well, in particular we need to FFI it to get true auth unification across a plugin boundary).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea. GDAL wouldnt use but native loaders (ie zarr) will.

I think probably this can find a good home in the zarr cloud PR.

I'll sketch that up tonight or tomorrow in parallel with this PR in case we want to land that first. Some of the changes in that might make this PR simpler to review.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rough(!) draft of migrating sedona-zarr to be all async: #888

depending which of these PRs we land first ill think about how to best pipe the obstore into the byte load request.

Comment thread rust/sedona/src/context.rs Outdated
Addresses two of Dewey's review comments on the GDAL loader: the
non-cancellable spawn_blocking concern and the absence of a size
sanity check.

- New `MAX_OUTDB_LOAD_BYTES` (4 GiB) constant. Requests whose
  `Π source_shape × byte_size` exceeds this are rejected before
  the blocking task is spawned — runaway / corrupt-metadata loads
  can't tie up a blocking-pool thread.
- `CancelOnDrop` guard flips an `Arc<AtomicBool>` when the outer
  async future is dropped (e.g. query-cancel). The blocking task
  observes the flag between block-height-aligned strip reads and
  returns a cancellation error rather than running to completion.
- Read loop replaced: single `band.read_as_bytes` → pre-allocated
  output `Vec<u8>` + `band.read_into_bytes` per strip of
  `band.block_size().1` rows. Each strip writes into a contiguous
  output slice (no per-block row-splice math). Cancellation
  granule = natural block height: 1–64 rows for strip GeoTIFFs,
  256 rows for tile GeoTIFFs (COG-style). For whole-image-block
  formats (PNG / JPEG) the byte cap is the primary safety net.
- Helper `read_band_blockwise(band, output, w, h, byte_size, cancel)`
  extracted as the testable unit. Direct tests for pre-armed cancel
  and multi-strip byte-correctness; full-loader integration tests
  for byte-cap rejection.

10 unit tests, up from 7. All existing tests pass unchanged.
Addresses Dewey's review feedback that the registry belongs in the
session's `ConfigOptions` extensions, mirroring how `CrsProvider` is
hosted inside `SedonaOptions`. The motivation: any UDF can reach
`ConfigOptions` at dispatch time via
`AsyncScalarUDFImpl::invoke_async_with_args`, whereas the previous
closure-bound `Arc` was reachable only by the one UDF that closed
over it. ConfigOptions is the DataFusion-idiomatic placement.

Shape:
- New `OutDbLoaderConfig` `ConfigExtension` in `sedona-raster` that
  holds a `OutDbLoaderRegistryOption(Arc<RwLock<OutDbLoaderRegistry>>)`
  field. Lives in `sedona-raster` (not `sedona-common`) because the
  registry type is here and a cross-crate move would create a circular
  dep. Has its own `sedona.outdb_loader` PREFIX namespace alongside
  the existing `sedona` `SedonaOptions`.
- `SedonaContext::new_from_context` builds the shared
  `Arc<RwLock<OutDbLoaderRegistry>>` once, stashes a clone inside
  ConfigOptions via `extensions.insert(OutDbLoaderConfig::from_handle(...))`,
  and keeps another clone in the existing `SedonaContext.outdb_registry`
  field. Both handles share the same `RwLock`, so writes through
  `register_outdb_loader(&self, ...)` are immediately visible to UDF
  reads.
- `RsEnsureLoaded` no longer holds a registry field. It pulls the
  handle out of `args.config_options.extensions.get::<OutDbLoaderConfig>()`
  at the top of every `invoke_async_with_args` call. The struct is
  now session-agnostic — one instance is correct for any session that
  has the extension registered. Tests updated to call `RsEnsureLoaded::new()`
  with no args.
- `ensure_loaded(...)` test helper signature simplified to
  `(input, lookup_closure)`; the redundant registry arg is gone.

Mirrors the precedent established by `CrsProviderOption`/`SedonaOptions`
in `sedona-common`. If a future caller bypasses `SedonaContext::new()`
and builds their own session without the extension, RS_EnsureLoaded
returns a clear internal error naming `OutDbLoaderConfig` so the fix
is obvious.

All 109 sedona-crate tests + 72 sedona-raster tests pass unchanged.
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes, but the details here are all great!

Comment on lines +50 to +53
/// Raw source shape in `dim_names` order. The loader returns a
/// `Buffer` whose length equals `Π source_shape × data_type.byte_size()`
/// bytes, encoding pixels in C-order over `dim_names`.
pub source_shape: &'a [u64],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).

It's OK not to implement crop pushdown but if the information isn't there it can never happen.

Comment on lines +98 to +99
// Show registered format keys but hide the opaque loader impls
// (which don't carry a meaningful Debug surface).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to just add Debug (AsyncByteLoader: Send + Sync + Debug) and #[derive(Debug)] here.

Comment on lines +274 to +283
// Register the analyzer rule that wraps raster args of
// `needs_bytes` UDFs with `RS_EnsureLoaded`. The rule holds an
// Arc clone of the same UDF so the wraps it injects route
// through the same async dispatcher (and thus the same
// per-session registry) as any explicit `RS_EnsureLoaded(...)`
// already written by the user.
out.ctx
.state_ref()
.write()
.add_analyzer_rule(Arc::new(EnsureLoadedAnalyzerRule::new(ensure_loaded_udf)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need this because logical optimizer rules have access to the function_registry(). (If this has to stay an analyzer rule and analyzer rules don't share that context this is a good solution)

Comment on lines +285 to +295
// Register the GDAL OutDb byte loader. `sedona-raster-gdal` is a
// mandatory dep on `sedona`, but libgdal itself is dlopen'd
// lazily by `sedona-gdal` (workspace-default-features = false),
// so this registration is safe on systems without libgdal —
// the loader's `load()` call will surface a clean "libgdal not
// found" error when first invoked, but registration and import
// succeed regardless.
out.register_outdb_loader(
sedona_raster_gdal::GDAL_FORMAT,
Arc::new(sedona_raster_gdal::GdalLoader::new()),
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this becomes problematic we could gate it behind the [gdal] feature and just set this up automatically in the Python package when creating the context, but I think lazy GDAL is OK to include.

Comment on lines +164 to +168
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
// Output shape mirrors input — the schema is preserved; only the
// `data` column's contents change. We require exactly one Struct
// argument shaped like a raster.
if arg_types.len() != 1 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need return_field() here (or the extension "sedona.raster" from the raster type will get dropped).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this live in the sedona-raster-funcs crate?

Comment on lines +304 to +309
// For InDb bands, copy bytes into an owned Buffer.
// `Buffer::from_vec` is zero-copy ownership transfer of
// the Vec; the per-row clone of `band.data()` itself is
// the one InDb copy we accept for sequential simplicity.
// Parallel fan-out + zero-copy borrowing of the input
// column is a follow-up optimisation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it isn't already, this should have a tracking issue + link here because it is a particularly egregious copy that should be fixed before anybody uses this seriously

Comment on lines +46 to +54
/// Analyzer rule that wraps raster arguments of `needs_bytes` UDFs with
/// `RS_EnsureLoaded`. Holds an `Arc<ScalarUDF>` clone of the registered
/// `RS_EnsureLoaded` UDF so wraps it injects reference the same async
/// dispatcher (and thus the same per-session loader registry) as any
/// explicit `RS_EnsureLoaded(...)` already written by the user.
#[derive(Debug)]
pub struct EnsureLoadedAnalyzerRule {
ensure_loaded_udf: Arc<ScalarUDF>,
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something I think this is probably better as a logical optimizer rule (you can put it in sedona-query-planner with the other logical optimizer rule we have there).

Comment on lines +74 to +80
/// Class-level metadata: does this UDF read raster pixel bytes from
/// any of its inputs? The optimiser/analyzer rule for `RS_EnsureLoaded`
/// uses this to decide whether to wrap raster arguments. Default
/// `false`; set via [`SedonaScalarUDF::with_needs_bytes`] at
/// construction time for UDFs whose kernels call
/// `BandRef::nd_buffer()` / `BandRef::contiguous_data()`.
needs_bytes: bool,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this more generic (like: HashMap<String, String>)? This would need to be added to the FFI in sedona-extension and a single bool is tricky to future proof. Alternatively, just keep a list of the functions that need pixels in the optimizer rule.

Comment on lines +70 to +80
// Walk plan bottom-up so a nested `RS_X(RS_Y(rast))` is rewritten
// inside-out: the inner call's raster arg is wrapped first, then
// the outer call's (now-wrapped) arg is rewritten if it itself
// is a needs_bytes target.
plan.transform_up(|p| {
// Only single-input nodes (Projection, Filter, Aggregate,
// Window) carry expressions whose args are evaluated against
// a single input schema. Multi-input nodes (Join) and leaf
// nodes (TableScan, EmptyRelation) are passed through —
// joins with needs_bytes UDFs in the condition are a niche
// case to handle in a follow-up.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this becomes a logical optimizer rule it will need to be able to run twice on the same plan, which might make the inside out logic need guarding. (Maybe you'd have to check the expression for any RS_EnsureLoaded call and skip transforming it if so).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants