Add storage:schemes to geoparquet #999

@rsignell

TL;DR: It would be great if we could use rustac to search STAC items in geoparquet and get back enough info to open Icechunk stores in Xarray.

I made a STAC catalog with an item for each of the dynamical.org Icechunk datasets on AWS Open Data, and then used rustac to create a geoparquet version. I pushed both to a bucket, mostly because I wanted to keep the regular catalog for the STAC browser.

I then asked Claude Code to make a notebook that queries the geoparquet with rustac and opens one of the returned items in Xarray: https://nbviewer.org/gist/rsignell/42cba4d8db34f49ed91538ef47375b32

What it built works. But after querying the geoparquet file, it uses the JSON STAC items to get the asset info necessary to load into Xarray, saying:
"rustac drops non-spec top-level attributes (like storage:schemes) when writing GeoParquet, so we look the item up by ID in the original JSON catalog, which has the full metadata needed by xpystac to open the icechunk store."
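For reference, the workaround amounts to something like the sketch below. This is not the notebook's actual code: the function name `restore_top_level_fields` and the example items are hypothetical, plain dicts stand in for the items returned by the rustac search, and it assumes `storage:schemes` sits at the top level of each JSON item, as the quote above describes.

```python
import copy


def restore_top_level_fields(parquet_items, json_items, fields=("storage:schemes",)):
    """Copy selected top-level fields from the full JSON items onto
    the items returned by a geoparquet search, matching on item ID."""
    by_id = {item["id"]: item for item in json_items}
    restored = []
    for item in parquet_items:
        merged = copy.deepcopy(item)  # leave the search results untouched
        source = by_id.get(item["id"], {})
        for field in fields:
            # Only fill in fields that the geoparquet round trip dropped
            if field in source and field not in merged:
                merged[field] = copy.deepcopy(source[field])
        restored.append(merged)
    return restored


# Hypothetical stand-ins: one full JSON item, and the same item as it
# comes back from the geoparquet search (without storage:schemes).
# The scheme values are illustrative only.
json_items = [
    {
        "type": "Feature",
        "id": "example-item",
        "storage:schemes": {
            "aws": {"type": "aws-s3", "region": "us-west-2"},
        },
        "assets": {},
    }
]
parquet_items = [{"type": "Feature", "id": "example-item", "assets": {}}]

items = restore_top_level_fields(parquet_items, json_items)
print(items[0]["storage:schemes"])
```

If the geoparquet kept `storage:schemes`, this extra lookup (and the need to keep the JSON catalog reachable just for that) would go away.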

So it would be great if rustac could also store storage:schemes in the geoparquet, so that we can open and start working with the Icechunk assets directly from the items returned by the search.

In case it's useful, here are the notebooks I used for the dynamical data:

The notebook to create the STAC catalog: https://github.com/OpenScienceComputing/NCICS-2026/blob/main/build_dynamical_catalog_full.ipynb

The notebook to create geoparquet: https://github.com/OpenScienceComputing/NCICS-2026/blob/main/build_geoparquet.ipynb
