Lazy Raster Loading Support for Zarr and GDAL#886
Conversation
Foundational module for async OutDb byte loading. Defines: - `AsyncByteLoader` trait — async fn that returns `arrow_buffer::Buffer`, enabling zero-copy hand-off to the downstream BinaryViewArray construction in RS_EnsureLoaded (one storage→column copy instead of three: storage→loader-internal→Vec<u8>→BinaryViewBuilder-block→column). - `OutDbLoadRequest` — borrowed request struct carrying the URI, dim_names, source_shape, and pixel type for one band's worth of bytes. - `OutDbLoaderRegistry` — format-keyed map from `outdb_format` strings to `Arc<dyn AsyncByteLoader>`. One instance per SedonaContext; the context wraps it in `Arc<RwLock<…>>` so plugin crates can register post-construction via a public SedonaContext API. No callers yet — the registry/trait stand alone. Subsequent commits add the SedonaScalarUDF needs_bytes annotation, the RS_EnsureLoaded async UDF that consumes the registry, the analyzer rule that injects the wrap, and the GDAL + Zarr backend impls. 7 unit tests cover registration, lookup, dispatch, overwrite semantics, and the format-iteration helper used for diagnostic messages. See docs/research/async-outdb-design.md for the full design rationale.
Class-level annotation declaring whether a UDF's kernels read raster pixel bytes. The analyzer rule (later commit) reads this to decide whether to wrap raster arguments with RS_EnsureLoaded. Shape: - New `needs_bytes: bool` field on SedonaScalarUDF, default false. - `with_needs_bytes()` builder method flips it to true at construction. - `needs_bytes()` accessor for the analyzer to query post-downcast. Class-level by convention: each UDF is constructed once per session, so the instance field reflects a fixed property of the underlying kernels. The field doesn't live on SedonaScalarKernel because kernels within a single UDF don't realistically disagree on byte-needing-ness — they differ by type signature, not by data-access pattern. If a future use case demands per-kernel granularity, that's the trigger to push the flag down to the kernel trait; not before. No current RS_* UDFs read pixel bytes (audit returns empty across sedona-raster-functions, sedona-raster-gdal, sedona). The annotation mechanism is staged for future byte-reading kernels (e.g., RS_Value, RS_Sum) which will call `.with_needs_bytes()` at construction. See docs/research/async-outdb-design.md §2 for the design rationale.
Materialises OutDb raster bands at query time via the format-keyed AsyncByteLoader registry from sedona-raster. The UDF instance closes over an Arc<RwLock<OutDbLoaderRegistry>> owned by SedonaContext, so loader registrations from session bootstrap (compiled-in defaults like GDAL under a Cargo feature, later commit) and from plugin entry points (out-of-tree crates calling SedonaContext::register_outdb_loader, later commits) are immediately visible at dispatch time without session-extension lookups — AsyncScalarUDFImpl::invoke_async_with_args in DataFusion 52.5 only receives Arc<ConfigOptions>, so the closure-bound registry is the right runtime path. Implementation: - `RsEnsureLoaded` implements `AsyncScalarUDFImpl`. For each input row, walks bands, passes through InDb bytes verbatim, dispatches OutDb bytes via the registered loader for the band's `outdb_format`, validates the loader-returned `Buffer` length against `Π source_shape × data_type.byte_size()`, and assembles the output raster StructArray via `RasterBuilder`. Sequential await for the first cut; parallel `buffer_unordered` fan-out is a follow-up that doesn't change the trait surface. - `SedonaContext` gains a private `outdb_registry: Arc<RwLock<...>>` field initialised empty at `new_from_context`. The `RsEnsureLoaded` UDF is registered there alongside other session UDFs, with a clone of the same Arc. - Public `SedonaContext::register_outdb_loader(format, loader)` is the single insertion hook used by both compiled-in defaults and plugins. `registered_outdb_formats()` exposes the keys for diagnostics. - `OutDbLoaderRegistry` gains a manual `Debug` impl listing format keys (the loader values themselves are opaque trait objects). - `SedonaScalarUDF` Eq/Hash/Debug derive lessons applied: manual impls by identity-by-name, since the registry field doesn't participate in UDF identity and isn't itself Eq/Hash. 5 unit tests cover the resolver end-to-end with a mock loader: InDb passthrough, OutDb byte materialisation, missing-format diagnostic with registered-formats listing, undersized-loader-output detection, null raster row preservation. One context-level test asserts the registry plumbing (empty initial state, post-register listing, UDF visible in the scalar function registry by name).
Pre-optimizer rewrite that makes OutDb byte materialisation explicit in the logical plan. For every `Expr::ScalarFunction` whose UDF downcasts to a `SedonaScalarUDF` annotated `with_needs_bytes()`, each raster-typed argument is wrapped in `RS_EnsureLoaded(arg)`. The wrap references the same `Arc<ScalarUDF>` as the one `SedonaContext::new_from_context` registers under the UDF registry, so both injected and explicitly-written calls dispatch through the same async resolver. Recursion guard: `RS_EnsureLoaded` itself is left alone, so `SELECT ... FROM RS_EnsureLoaded(rast)` doesn't recurse into `RS_EnsureLoaded(RS_EnsureLoaded(rast))`. Detection of "raster-typed" arguments uses `Expr::to_field(schema)` — not `get_type()` — so the Field's extension-type metadata is available; `SedonaType::Raster` is identified by the `"sedona.raster"` extension type, not by raw `DataType::Struct`. Two adjustments to Commit 3's RS_EnsureLoaded UDF that pair with the analyzer rule: - Signature volatility flipped from `Volatile` to `Stable`. DataFusion's `CommonSubexprEliminate` pass skips volatile expressions (`is_volatile_node()` returns true iff signature is exactly `Volatility::Volatile`), so the previous Volatile classification would have prevented dedup of injected wraps. Semantically Stable is the honest call: within a query, byte materialisation is deterministic for fixed inputs; across queries the underlying storage may change. - Signature shape changed from `user_defined` to `any(1, ...)`. The `AsyncScalarUDF` adapter doesn't delegate `coerce_types` to the inner impl, so `Signature::user_defined` (which requires `coerce_types`) produced a planning-time "function does not implement coerce_types" error. `any(1, ...)` lets DataFusion accept whatever single-arg type the caller passes; we validate "argument is Struct" in `return_type`. The CSE-dedup test (load-bearing) constructs the post-analyzer plan shape directly (two `needs_bytes` UDFs calling the same raster column, both with their args wrapped) and applies DataFusion's `CommonSubexprEliminate::rewrite` to it. Before CSE: two RS_EnsureLoaded(rast) calls in the plan tree. After CSE: exactly one. This validates the volatility-Stable choice and proves we lean on DataFusion's existing optimiser for dedup rather than re-implementing it in our rule. 8 tests cover: - `expr_is_raster` detection on raster + non-raster columns - analyzer rule wraps raster args of needs_bytes UDFs - leaves non-raster args alone - leaves metadata-only (un-annotated) UDFs alone - recursion guard against rewrapping RS_EnsureLoaded - nested calls are rewritten inside-out via transform_up - CSE dedup of injected wraps (2 → 1) Registered from `SedonaContext::new_from_context` via `ctx.state_ref().write().add_analyzer_rule(...)`. The rule runs in DataFusion's analyzer phase, before any optimizer rules including PushDownFilter, PushDownLimit, and CommonSubexprEliminate. See docs/research/async-outdb-design.md for the broader design.
Materialises OutDb GeoTIFF (and other GDAL-supported) bands when the RS_EnsureLoaded UDF dispatches against the `"gdal"` registry key. Shape: - `GdalLoader` is a stateless `AsyncByteLoader` impl. The thread-local `GDALDatasetCache` from `gdal_dataset_provider` does the per-thread dataset memoisation; the loader itself owns no state, so instances are interchangeable and cheap to clone. - Inside `load()`, GDAL's blocking API is wrapped in `tokio::task::spawn_blocking` so the calling tokio worker stays responsive. The dataset cache uses `Rc` (not Send/Sync) and is thread-local, so each blocking-pool worker maintains its own cache — concurrent loads via `buffer_unordered` build per-worker caches without contention. - `read_as_bytes` returns a `Vec<u8>`; `Buffer::from_vec` wraps it zero-copy for the return path, matching the design's one-storage-to-column-copy goal end-to-end on the GDAL backend. Request validation (synchronous, before spawning): - Source shape must be 2-D and `dim_names == ["y", "x"]`. Higher-rank or transposed OutDb reads need MDArray support and explicit axis mapping, tracked separately. - File's GDAL pixel type must match the band metadata's `data_type` claim. Caught here with a clear error before reading, otherwise a size mismatch would mis-blame the loader rather than naming the dtype problem. Bootstrap registration: `SedonaContext::new_from_context` calls `out.register_outdb_loader(GDAL_FORMAT, Arc::new(GdalLoader::new()))`. GDAL itself is dlopen'd lazily by `sedona-gdal` (workspace-default `gdal-sys = false`), so registration is safe on systems without libgdal — the first OutDb GeoTIFF query surfaces a clean "libgdal not found" error from `sedona-gdal::global` instead of a panic, and non-OutDb queries are unaffected. Module bookkeeping: the existing `source_uri` module (`parse_outdb_source`) was previously `#[cfg(test)]`-only; it's needed in production now, so the cfg gate is removed. `get_or_create_outdb_source` on `GDALDatasetCache` is promoted from private to `pub(crate)` so the loader can reach it. 7 unit tests cover: 2x3 UInt8 round-trip, default-band-1 when fragment absent, request-validation rejections (non-2D, non-yx dim_names, dtype mismatch), and propagation of GDAL-layer errors (missing file, out-of-range band index). Tests use real GDAL via the existing `with_gdal` helper and the in-crate driver pattern from `gdal_dataset_provider.rs`. Updated context-level test: bootstrap registry contains `"gdal"` instead of being empty; plugin registrations are stacked on top.
Materialises OutDb Zarr-backed bands when the RS_EnsureLoaded UDF
dispatches against the `"zarr"` registry key. Pairs with the Zarr
reader's existing chunk-anchor URI scheme
(`<store_uri>#array=<array_path>&chunk=<i0>,<i1>,...`): the loader
parses the anchor, opens the Zarr store + array, retrieves the named
chunk via `zarrs`, and returns the bytes as an `arrow_buffer::Buffer`.
Shape:
- `ZarrLoader` is a stateless `AsyncByteLoader` impl. Per-call store
opens; per-(store_uri, array_path) caching is a follow-up
optimisation that doesn't change the trait surface.
- Inside `load()`, the sync `zarrs` API runs inside
`tokio::task::spawn_blocking` so the calling tokio worker stays
responsive. Resulting `Buffer` is Send, crosses the await cleanly.
- Validates the Zarr array's dtype against the request's
`data_type` claim before reading. Mismatches surface here with a
clear diagnostic rather than as a downstream byte-count surprise.
Delivery model: as an out-of-tree plugin, `sedona-raster-zarr` does
NOT depend on `sedona` — callers register the loader from their own
context-setup code:
ctx.register_outdb_loader(
sedona_raster_zarr::ZARR_FORMAT,
Arc::new(sedona_raster_zarr::ZarrLoader::new()),
);
This is the asymmetry from Commit 5 (GDAL): GDAL is upstream of
sedona and registered automatically; Zarr is a plugin and explicit.
The Python wheel `sedonadb-zarr` will surface a Python-level
`register(con)` shim that wraps this call in a follow-up.
Module bookkeeping: `ChunkAnchor`, `parse_chunk_anchor`, and the
`retrieve_chunk_bytes` chunk-read primitive in `loader.rs` were
previously `#[cfg(test)]`-only — they're production-needed now, so
the cfg gates are removed. `retrieve_chunk_bytes` is now `pub(crate)`
since `outdb_loader` is its only consumer outside tests.
5 unit tests cover: UInt8 chunk round-trip (write via `ArrayBuilder`,
read via the loader), dtype mismatch with clear diagnostic, malformed
chunk-anchor URI rejection (missing fragment), missing-array-path
propagation from `zarrs`, and cloud-scheme rejection (until cloud
stores are supported). Tests build a real Zarr v3 group via
`GroupBuilder` + `ArrayBuilder` against a temp filesystem.
4ab8a0d to
d7fce30
Compare
Three small adjustments from review:
- Drop the "Class-level by convention…" paragraph from
`SedonaScalarUDF::with_needs_bytes()`'s doc comment. The remaining
doc already names the purpose; the extra prose was prescriptive
without adding information.
- Replace `unreachable!("matched above")` in
`ensure_loaded_analyzer::rewrite_expr_node` with a
`sedona_internal_datafusion_err!`-bearing let-else. Same
intent-preserved-as-leading-comment, but the failure mode on a
future refactor that breaks the invariant is a clean query-level
internal error instead of a process-level panic — matches the
defensive style used everywhere else in this PR.
- Fix codespell: "Implementors" → "Implementers" in the
`AsyncByteLoader` trait doc.
paleolimbot
left a comment
There was a problem hiding this comment.
I didn't get through the analyzer bits today but thought I'd leave the loader part and I'll take a look at the analyzer tomorrow. In general this looks great!
|
|
||
| let bytes = retrieve_chunk_bytes(&array, &anchor.chunk_indices)?; |
There was a problem hiding this comment.
While this is nicely happening on another thread so as to not block what's happening elsewhere, doing it all at once makes it non-cancelable. If it's possible to break this into chunks of work that are independently awaitable it would make it easier for tokio to schedule them effectively. In the absence of that, perhaps put a limit on either the number of bytes or the number of seconds and loop in such a way that a user won't get a hanging process.
There was a problem hiding this comment.
This will all go away when we migrate to the async/obstore-based implementation when we add cloud support to the zarr crate.
There was a problem hiding this comment.
rough(!) draft of migrating sedona-zarr to be all async: #888
| let buffer = tokio::task::spawn_blocking(move || -> Result<Buffer, ArrowError> { | ||
| let anchor = parse_chunk_anchor(&uri)?; | ||
| let fs_path = group_uri_to_filesystem_path(&anchor.store_uri)?; | ||
| let store = FilesystemStore::new(&fs_path).map_err(|e| { |
There was a problem hiding this comment.
Does the zarr crate have an async API you can use to avoid a spawn here? I was expecting this to look something like
let read_tasks = open_zarr(uri).await?;
futures::stream::iter(read_tasks)
.map(|task| {
async move {
task.doit().await
}
})
.boxed()
.buffered(max_concurrency)
.try_collect()
.await?;...but maybe that's what's happening one level higher than this. As long as the data transfer size is bounded (a few MB per chance to evaluate whether this should continue or not) the current approach is OK too.
There was a problem hiding this comment.
the Cloud ones will. AFAIK file doesnt. I can look for something to avoid forking those two cases but no promises.
There was a problem hiding this comment.
Ok I did some more research. There is a way to use zarrs with obs store to get a local fs that works asynchronously.
My plan (wrote it down in my notes for that work) is that when I do that work to integrate zarr cloud support I will remove all of this synchronous code.
There was a problem hiding this comment.
Not a big deal, just wondering if there was already an async way to split this up (totally fine if there isn't, although best if you can find a way to do it chunks like the GDAL loader does)
There was a problem hiding this comment.
rough(!) draft of migrating sedona-zarr to be all async: #888
| let height = usize::try_from(req.source_shape[0]).map_err(|_| { | ||
| ArrowError::InvalidArgumentError(format!( | ||
| "GDAL OutDb source_shape[0]={} exceeds usize::MAX", | ||
| req.source_shape[0] | ||
| )) | ||
| })?; |
There was a problem hiding this comment.
Is this loading the source shape or the target shape? (I think Kristin added a getter for an extract that considers a target shape as long as it is contiguous)
There was a problem hiding this comment.
source_shape is the name of the field in the raster trait. it represents the shape of:
a. in the indentity view case, the data
b. in the non-identity view case, the data that backs the view.
it is named as such to be clear of the difference between the visible bytes vs the backing bytes.
So in this case source_shape is the target shape.
| /// Raw source shape in `dim_names` order. The loader returns a | ||
| /// `Buffer` whose length equals `Π source_shape × data_type.byte_size()` | ||
| /// bytes, encoding pixels in C-order over `dim_names`. | ||
| pub source_shape: &'a [u64], |
There was a problem hiding this comment.
Shouldn't this see the view as well? In some cases the IO will be the same (but we can discard a lot of pixels right away); in other cases we can avoid IO because the retile will map to a small subset of a contiguous range. (Apologies if I misunderstood what source_shape is here)
There was a problem hiding this comment.
I'm not sure. I havent thought about designing the system to allow "pushing" views into loaders. This PR does not support that and I'm not sure when we would be able to take advantage of such a thing.
Generally a cog tile or zarr chunk its retrieved atomically so we can't benefit from "pushing" the view in these cases. so I dont think theres a way to reduce io with such support.
Im not sure when you want to eagerly slice/crop vs when you want to perform that lazily.
Do you think crops on indb rasters should be eager in general? if so we could probably solve this with operator fusion + a rule in the crop operators impl that always eager crops the indb raster
There was a problem hiding this comment.
Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).
It's OK not to implement crop pushdown but if the information isn't there it can never happen.
| #[async_trait::async_trait] | ||
| pub trait AsyncByteLoader: Send + Sync { | ||
| /// Fetch the band's bytes. The returned `Buffer` must contain exactly | ||
| /// `Π source_shape × data_type.byte_size()` bytes in C-order over | ||
| /// `dim_names`. Errors propagate to the caller of `RS_EnsureLoaded`. | ||
| async fn load(&self, req: &OutDbLoadRequest<'_>) -> Result<Buffer, ArrowError>; | ||
| } |
There was a problem hiding this comment.
I am not sure exactly what the parameter should be here, but this should possibly also pass an Arc<dyn ObjectStore>. The loader might not be able to make use of this (i.e., it might use the URI and hope/pray that the authentication ENV vars specified auth identically) but it would be great to centralize IO around this (we need this for non-raster datasources as well, in particular we need to FFI it to get true auth unification across a plugin boundary).
There was a problem hiding this comment.
Probably a good idea. GDAL wouldnt use but native loaders (ie zarr) will.
I think probably this can find a good home in the zarr cloud PR.
I'll sketch that up tonight or tomorrow in parallel with this PR in case we want to land that first. Some of the changes in that might make this PR simpler to review.
There was a problem hiding this comment.
rough(!) draft of migrating sedona-zarr to be all async: #888
depending which of these PRs we land first ill think about how to best pipe the obstore into the byte load request.
Addresses two of Dewey's review comments on the GDAL loader: the non-cancellable spawn_blocking concern and the absence of a size sanity check. - New `MAX_OUTDB_LOAD_BYTES` (4 GiB) constant. Requests whose `Π source_shape × byte_size` exceeds this are rejected before the blocking task is spawned — runaway / corrupt-metadata loads can't tie up a blocking-pool thread. - `CancelOnDrop` guard flips an `Arc<AtomicBool>` when the outer async future is dropped (e.g. query-cancel). The blocking task observes the flag between block-height-aligned strip reads and returns a cancellation error rather than running to completion. - Read loop replaced: single `band.read_as_bytes` → pre-allocated output `Vec<u8>` + `band.read_into_bytes` per strip of `band.block_size().1` rows. Each strip writes into a contiguous output slice (no per-block row-splice math). Cancellation granule = natural block height: 1–64 rows for strip GeoTIFFs, 256 rows for tile GeoTIFFs (COG-style). For whole-image-block formats (PNG / JPEG) the byte cap is the primary safety net. - Helper `read_band_blockwise(band, output, w, h, byte_size, cancel)` extracted as the testable unit. Direct tests for pre-armed cancel and multi-strip byte-correctness; full-loader integration tests for byte-cap rejection. 10 unit tests, up from 7. All existing tests pass unchanged.
Addresses Dewey's review feedback that the registry belongs in the session's `ConfigOptions` extensions, mirroring how `CrsProvider` is hosted inside `SedonaOptions`. The motivation: any UDF can reach `ConfigOptions` at dispatch time via `AsyncScalarUDFImpl::invoke_async_with_args`, whereas the previous closure-bound `Arc` was reachable only by the one UDF that closed over it. ConfigOptions is the DataFusion-idiomatic placement. Shape: - New `OutDbLoaderConfig` `ConfigExtension` in `sedona-raster` that holds a `OutDbLoaderRegistryOption(Arc<RwLock<OutDbLoaderRegistry>>)` field. Lives in `sedona-raster` (not `sedona-common`) because the registry type is here and a cross-crate move would create a circular dep. Has its own `sedona.outdb_loader` PREFIX namespace alongside the existing `sedona` `SedonaOptions`. - `SedonaContext::new_from_context` builds the shared `Arc<RwLock<OutDbLoaderRegistry>>` once, stashes a clone inside ConfigOptions via `extensions.insert(OutDbLoaderConfig::from_handle(...))`, and keeps another clone in the existing `SedonaContext.outdb_registry` field. Both handles share the same `RwLock`, so writes through `register_outdb_loader(&self, ...)` are immediately visible to UDF reads. - `RsEnsureLoaded` no longer holds a registry field. It pulls the handle out of `args.config_options.extensions.get::<OutDbLoaderConfig>()` at the top of every `invoke_async_with_args` call. The struct is now session-agnostic — one instance is correct for any session that has the extension registered. Tests updated to call `RsEnsureLoaded::new()` with no args. - `ensure_loaded(...)` test helper signature simplified to `(input, lookup_closure)`; the redundant registry arg is gone. Mirrors the precedent established by `CrsProviderOption`/`SedonaOptions` in `sedona-common`. If a future caller bypasses `SedonaContext::new()` and builds their own session without the extension, RS_EnsureLoaded returns a clear internal error naming `OutDbLoaderConfig` so the fix is obvious. All 109 sedona-crate tests + 72 sedona-raster tests pass unchanged.
paleolimbot
left a comment
There was a problem hiding this comment.
A few notes, but the details here are all great!
| /// Raw source shape in `dim_names` order. The loader returns a | ||
| /// `Buffer` whose length equals `Π source_shape × data_type.byte_size()` | ||
| /// bytes, encoding pixels in C-order over `dim_names`. | ||
| pub source_shape: &'a [u64], |
There was a problem hiding this comment.
Yes, I think that if a loader is asked to load a 3x3 chunk of a 1e6 by 1e6 raster, the loader should return 9 pixels. It would be up to the caller to choose whether retiling or cropping is appropriate (and up to the loader to decide whether or not to cache 1e6 by 1e6 pixels in anticipation of reading another slice of the same chunk).
It's OK not to implement crop pushdown but if the information isn't there it can never happen.
| // Show registered format keys but hide the opaque loader impls | ||
| // (which don't carry a meaningful Debug surface). |
There was a problem hiding this comment.
Probably better to just add Debug (AsyncByteLoader: Send + Sync + Debug) and #[derive(Debug)] here.
| // Register the analyzer rule that wraps raster args of | ||
| // `needs_bytes` UDFs with `RS_EnsureLoaded`. The rule holds an | ||
| // Arc clone of the same UDF so the wraps it injects route | ||
| // through the same async dispatcher (and thus the same | ||
| // per-session registry) as any explicit `RS_EnsureLoaded(...)` | ||
| // already written by the user. | ||
| out.ctx | ||
| .state_ref() | ||
| .write() | ||
| .add_analyzer_rule(Arc::new(EnsureLoadedAnalyzerRule::new(ensure_loaded_udf))); |
There was a problem hiding this comment.
I think you don't need this because logical optimizer rules have access to the function_registry(). (If this has to stay an analyzer rule and analyzer rules don't share that context this is a good solution)
| // Register the GDAL OutDb byte loader. `sedona-raster-gdal` is a | ||
| // mandatory dep on `sedona`, but libgdal itself is dlopen'd | ||
| // lazily by `sedona-gdal` (workspace-default-features = false), | ||
| // so this registration is safe on systems without libgdal — | ||
| // the loader's `load()` call will surface a clean "libgdal not | ||
| // found" error when first invoked, but registration and import | ||
| // succeed regardless. | ||
| out.register_outdb_loader( | ||
| sedona_raster_gdal::GDAL_FORMAT, | ||
| Arc::new(sedona_raster_gdal::GdalLoader::new()), | ||
| ); |
There was a problem hiding this comment.
If this becomes problematic we could gate it behind the [gdal] feature and just set this up automatically in the Python package when creating the context, but I think lazy GDAL is OK to include.
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| // Output shape mirrors input — the schema is preserved; only the | ||
| // `data` column's contents change. We require exactly one Struct | ||
| // argument shaped like a raster. | ||
| if arg_types.len() != 1 { |
There was a problem hiding this comment.
I think you need return_field() here (or the extension "sedona.raster" from the raster type will get dropped).
There was a problem hiding this comment.
Should this live in the sedona-raster-funcs crate?
| // For InDb bands, copy bytes into an owned Buffer. | ||
| // `Buffer::from_vec` is zero-copy ownership transfer of | ||
| // the Vec; the per-row clone of `band.data()` itself is | ||
| // the one InDb copy we accept for sequential simplicity. | ||
| // Parallel fan-out + zero-copy borrowing of the input | ||
| // column is a follow-up optimisation. |
There was a problem hiding this comment.
If it isn't already, this should have a tracking issue + link here because it is a particularly egregious copy that should be fixed before anybody uses this seriously
| /// Analyzer rule that wraps raster arguments of `needs_bytes` UDFs with | ||
| /// `RS_EnsureLoaded`. Holds an `Arc<ScalarUDF>` clone of the registered | ||
| /// `RS_EnsureLoaded` UDF so wraps it injects reference the same async | ||
| /// dispatcher (and thus the same per-session loader registry) as any | ||
| /// explicit `RS_EnsureLoaded(...)` already written by the user. | ||
| #[derive(Debug)] | ||
| pub struct EnsureLoadedAnalyzerRule { | ||
| ensure_loaded_udf: Arc<ScalarUDF>, | ||
| } |
There was a problem hiding this comment.
Unless I'm missing something I think this is probably better as a logical optimizer rule (you can put it in sedona-query-planner with the other logical optimizer rule we have there).
| /// Class-level metadata: does this UDF read raster pixel bytes from | ||
| /// any of its inputs? The optimiser/analyzer rule for `RS_EnsureLoaded` | ||
| /// uses this to decide whether to wrap raster arguments. Default | ||
| /// `false`; set via [`SedonaScalarUDF::with_needs_bytes`] at | ||
| /// construction time for UDFs whose kernels call | ||
| /// `BandRef::nd_buffer()` / `BandRef::contiguous_data()`. | ||
| needs_bytes: bool, |
There was a problem hiding this comment.
Can you make this more generic (like: HashMap<String, String>)? This would need to be added to the FFI in sedona-extension and a single bool is tricky to future proof. Alternatively, just keep a list of the functions that need pixels in the optimizer rule.
| // Walk plan bottom-up so a nested `RS_X(RS_Y(rast))` is rewritten | ||
| // inside-out: the inner call's raster arg is wrapped first, then | ||
| // the outer call's (now-wrapped) arg is rewritten if it itself | ||
| // is a needs_bytes target. | ||
| plan.transform_up(|p| { | ||
| // Only single-input nodes (Projection, Filter, Aggregate, | ||
| // Window) carry expressions whose args are evaluated against | ||
| // a single input schema. Multi-input nodes (Join) and leaf | ||
| // nodes (TableScan, EmptyRelation) are passed through — | ||
| // joins with needs_bytes UDFs in the condition are a niche | ||
| // case to handle in a follow-up. |
There was a problem hiding this comment.
If this becomes a logical optimizer rule it will need to be able to run twice on the same plan, which might make the inside out logic need guarding. (Maybe you'd have to check the expression for any RS_EnsureLoaded call and skip transforming it if so).
Summary
This PR adds support for OutDB raster loading via the
RS_EnsureLoadedUDF. When a function that needs the bytes of a raster available is called, an optimizer rule injects RS_EnsureLoaded into the expression tree. This _should also leverage Common Subexpression Elimination from Datafusion to avoid repeated loads.This PR supercedes #849. This is the new approach based on the review comments I received from Dewey on that PR.
Copy reduction
Returning
arrow_buffer::Buffer(notVec<u8>orBytes) and constructing the outputBinaryViewArraydirectly from collectedBuffers — bypassingBinaryViewBuilder's internal block copy — gives one copy total (storage → loader-internalBuffer) on the byte path, down from three with a naiveVec<u8>interface.Zarr Registration
I punted on this but we will need it in the near future. The standard datafusion FFI doesnt help us here because these OutDBLoaders are not part of the datafusion interface. the Agent presented two solutions:
A. Build a C-ABI wrapper for AsyncByteLoader following the datafusion-ffi pattern (#[repr(C)] struct, extern "C" function pointers, abi_stable types). ~200-400 LoC of FFI scaffolding + safety review. Future-proof but heavyweight, and we'd own the ABI compatibility story going forward.
B. PyCapsule with raw Arc transit. Non-standard but works when both wheels link the same sedona-raster version. PyArrow has analogous patterns (PyCapsule carrying ArrowSchema* etc., where the C struct happens to be standardized — ours wouldn't be). Document the version-coupling constraint.