Apache Iceberg read/write support #942

@hrodmn

Description

I am unsure if this repo is the right place for any of this, but I just want to get the idea out there!

I am interested in building operational pipelines that can stream new STAC records into a stac-geoparquet store. So far I have built a pipeline that appends records via hive-partitioned parquet files, but that approach would not scale well for a continuously updated metadata archive. Apache Iceberg could be a good option: it is simple to append an Arrow table to an Iceberg table transactionally, and queries don't require potentially slow/expensive ListBucket requests.

Here is a rough pseudo-code Python example of how I have written STAC records to Iceberg:

```python
import pyarrow.parquet as pq
from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionSpec

catalog = load_catalog()

# get arrow table with stac records
# could probably do this with rustac.arrow, too
df = pq.read_table("/path/to/stac.parquet")

collection_id = "my-collection"  # example collection ID

table = catalog.create_table(
    f"stac.{collection_id}",
    schema=df.schema,  # need to convert to an actual Iceberg schema but you get the idea
    partition_spec=PartitionSpec(),
    properties={"geo": ...},
)

table.append(df)
```

Subsequent tasks can then write items with the same schema via additional `table.append(df)` calls, even in a distributed context, without worrying about manual partitioning or file locks.

In any case, it would be nice to be able to read from an Iceberg table using the DuckDB client. The only change I think you would need to make (compared to reading a parquet file) is `iceberg_scan('/path/to/iceberg.metadata.json')` instead of `read_parquet('/path/to/stac.parquet')`.
