Merged
33 changes: 19 additions & 14 deletions README.md
@@ -2,14 +2,18 @@
>
> v0.1.0 includes a major refactor with breaking API changes.
>
> - Migration steps: [docs/migration-guide-v0.1.0.md](docs/migration-guide-v0.1.0.md)
> - Canonical install options (prerelease vs stable): [docs/installation.md](docs/installation.md)
> - Migration steps:
> [docs/migration-guide-v0.1.0.md](docs/migration-guide-v0.1.0.md)
> - Canonical install options (prerelease vs stable):
> [docs/installation.md](docs/installation.md)

# Xee: Xarray + Google Earth Engine

![Xee Logo](https://raw.githubusercontent.com/google/Xee/main/docs/xee-logo.png)

Xee is an Xarray backend for Google Earth Engine. Open `ee.Image` / `ee.ImageCollection` objects as lazy `xarray.Dataset`s and analyze petabyte‑scale Earth data with the scientific Python stack.
Xee is an Xarray backend for Google Earth Engine. Open `ee.Image` /
`ee.ImageCollection` objects as lazy `xarray.Dataset`s and analyze
petabyte-scale Earth data with the scientific Python stack.

[![image](https://img.shields.io/pypi/v/xee.svg)](https://pypi.python.org/pypi/xee)
[![image](https://static.pepy.tech/badge/xee)](https://pepy.tech/project/xee)
@@ -51,25 +55,27 @@ print(ds)

Next steps:

- [Quickstart](docs/quickstart.md)
- [Concepts (grid params, CRS, orientation)](docs/concepts.md)
- [User Guide (workflows)](docs/guide.md)
- [Quickstart](docs/quickstart.md)
- [Concepts (grid params, CRS, orientation)](docs/concepts.md)
- [User Guide (workflows)](docs/guide.md)

## Features

- Lazy, parallel pixel retrieval through Earth Engine
- Flexible output grid definition (fixed resolution or fixed shape)
- CF-friendly dimension order: `[time, y, x]`
- Plays nicely with Xarray, Dask, and friends
- Lazy, parallel pixel retrieval through Earth Engine
- Flexible output grid definition (fixed resolution or fixed shape)
- CF-friendly dimension order: `[time, y, x]`
- Plays nicely with Xarray, Dask, and friends

## Community & Support

- [Discussions](https://github.com/google/Xee/discussions)
- [Issues](https://github.com/google/Xee/issues)
- [Discussions](https://github.com/google/Xee/discussions)
- [Issues](https://github.com/google/Xee/issues)

## Contributing

See [Contributing](https://github.com/google/Xee/blob/main/docs/contributing.md) and sign the required CLA. For local development, we recommend the Pixi environments defined in this repository for reproducible test and docs runs.
See [Contributing](https://github.com/google/Xee/blob/main/docs/contributing.md)
and sign the required CLA. For local development, we recommend the Pixi
environments defined in this repository for reproducible test and docs runs.

## License

@@ -78,4 +84,3 @@ See [Contributing](https://github.com/google/Xee/blob/main/docs/contributing.md)
`SPDX-License-Identifier: Apache-2.0`

This is not an official Google product.

9 changes: 4 additions & 5 deletions docs/api.md
@@ -7,7 +7,7 @@
## User grid helpers

High-level utilities for deriving or matching pixel grid parameters passed to
``xarray.open_dataset(..., engine='ee')``.
`xarray.open_dataset(..., engine='ee')`.

```{eval-rst}
.. autosummary::
@@ -21,9 +21,8 @@ High-level utilities for deriving or matching pixel grid parameters passed to

## Core extension backend

Lower-level interfaces used internally by the xarray backend. Most users do
not need these directly; they're documented for advanced workflows and
debugging.
Lower-level interfaces used internally by the xarray backend. Most users do not
need these directly; they're documented for advanced workflows and debugging.

```{eval-rst}
.. autosummary::
@@ -41,4 +40,4 @@ debugging.
:toctree: _autosummary

geometry_to_bounds
```
```
3 changes: 1 addition & 2 deletions docs/code-of-conduct.md
@@ -89,5 +89,4 @@ harassment or threats to anyone's safety, we may take action without notice.
## Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 1.4,
available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct/
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
10 changes: 6 additions & 4 deletions docs/contributing.md
@@ -20,8 +20,8 @@ sign a new one.

### Review our Community Guidelines

This project follows [Google's Open Source Community
Guidelines](https://opensource.google/conduct/).
This project follows
[Google's Open Source Community Guidelines](https://opensource.google/conduct/).

## Contribution process

@@ -61,7 +61,9 @@ build with warnings treated as errors.

### Running tests

The Xee integration tests only pass on Xee branches (no forks). Please run the integration tests locally before sending a PR. To run the tests locally, authenticate using `earthengine authenticate` and run one of the following:
The Xee integration tests only pass on Xee branches (no forks). Please run the
integration tests locally before sending a PR. To run the tests locally,
authenticate using `earthengine authenticate` and run one of the following:

```bash
pixi run -e tests python -m unittest xee/ext_integration_test.py
@@ -87,4 +89,4 @@ pixi run -e docs docs-check
```

If your change touches Earth Engine integration behavior and you are working on
an Xee branch, also run the integration tests locally.
an Xee branch, also run the integration tests locally.
4 changes: 2 additions & 2 deletions docs/index.md
@@ -8,7 +8,8 @@ Xee v0.1.0 introduced a refactored API with breaking changes relative to the 0.0
If you are upgrading from 0.0.x, see the [Migration Guide](migration-guide-v0.1.0.md).
```

Xee is an Xarray extension for Google Earth Engine that lets you open `ee.Image` and `ee.ImageCollection` objects as lazy `xarray.Dataset`s.
Xee is an Xarray extension for Google Earth Engine that lets you open `ee.Image`
and `ee.ImageCollection` objects as lazy `xarray.Dataset`s.

```{toctree}
:maxdepth: 2
@@ -27,4 +28,3 @@ why-xee
contributing
code-of-conduct
```

49 changes: 26 additions & 23 deletions docs/installation.md
@@ -1,8 +1,11 @@
# Installation

Install Xee with pip or conda. Use virtual environments (`venv`, conda envs) to avoid dependency conflicts.
Install Xee with pip or conda. Use virtual environments (`venv`, conda envs) to
avoid dependency conflicts.

Xee v0.1.0 introduced a refactored API with breaking changes relative to the 0.0.x series. This page documents the v0.1.0+ install and usage path. If you are upgrading from 0.0.x, review the [Migration Guide](migration-guide-v0.1.0.md).
Xee v0.1.0 introduced a refactored API with breaking changes relative to the
0.0.x series. This page documents the v0.1.0+ install and usage path. If you are
upgrading from 0.0.x, review the [Migration Guide](migration-guide-v0.1.0.md).

## Install

@@ -34,23 +37,24 @@ Prerelease builds are not published to Conda-Forge.

## Earth Engine setup

Xee makes requests to [Google Earth
Engine](https://developers.google.com/earth-engine/guides) for data. To use
Earth Engine, you'll need to create and register a Google Cloud project,
authenticate with Google, and initialize the service.
Xee makes requests to
[Google Earth Engine](https://developers.google.com/earth-engine/guides) for
data. To use Earth Engine, you'll need to create and register a Google Cloud
project, authenticate with Google, and initialize the service.

If you already have a registered Earth Engine Cloud project and know the auth/initialize steps, skip to the [Quickstart](quickstart.md).
If you already have a registered Earth Engine Cloud project and know the
auth/initialize steps, skip to the [Quickstart](quickstart.md).

**Note**: the authentication and initialization steps described in the following
sections cover the majority of common system configurations and access methods.
If you're having trouble, refer to the Earth Engine [Authentication and
Initialization guide](https://developers.google.com/earth-engine/guides/auth).
If you're having trouble, refer to the Earth Engine
[Authentication and Initialization guide](https://developers.google.com/earth-engine/guides/auth).

### 1. Create and register a Cloud project

Follow instructions in the [Earth Engine Access
guide](https://developers.google.com/earth-engine/guides/access#get_access_to_earth_engine
) to create and register a Google Cloud project.
Follow instructions in the
[Earth Engine Access guide](https://developers.google.com/earth-engine/guides/access#get_access_to_earth_engine)
to create and register a Google Cloud project.

### 2. Authentication

@@ -63,9 +67,8 @@ environment:
#### Persistent environment (one-time)

If you're working from a system with a persistent environment, such as a local
computer or on-premises server, you can authenticate using the [Earth Engine
command line
utility](https://developers.google.com/earth-engine/guides/command_line#authenticate):
computer or on-premises server, you can authenticate using the
[Earth Engine command line utility](https://developers.google.com/earth-engine/guides/command_line#authenticate):

```shell
earthengine authenticate
@@ -87,25 +90,25 @@ every session. In this case, you can use the `earthengine-api` library
ee.Authenticate()
```

This method selects the most appropriate [authentication
mode](https://developers.google.com/earth-engine/guides/auth#authentication_details)
This method selects the most appropriate
[authentication mode](https://developers.google.com/earth-engine/guides/auth#authentication_details)
and guides you through steps to generate authentication credentials. Be sure to
rerun the authentication process each time the environment is reset.

### 3. Initialization

Initialization checks user authentication credentials, sets the Cloud project to
use for requests, and connects the client to Earth Engine's services. At the
top of your script, include one of the following expressions with the `project`
use for requests, and connects the client to Earth Engine's services. At the top
of your script, include one of the following expressions with the `project`
argument modified to match the Google Cloud project ID enabled and registered
for Earth Engine use.

#### High-volume endpoint (stored collections)

If you are requesting stored data (supplying a collection ID or passing an
unmodified `ee.ImageCollection()` object to `xarray.open_dataset`), connect to
the [high-volume
endpoint](https://developers.google.com/earth-engine/guides/processing_environments#high-volume_endpoint).
the
[high-volume endpoint](https://developers.google.com/earth-engine/guides/processing_environments#high-volume_endpoint).

```python
ee.Initialize(
@@ -117,8 +120,8 @@ ee.Initialize(
#### Standard endpoint (computed collections / iterative development)

If you are requesting computed data (applying expressions to the data), consider
connecting to the [standard
endpoint](https://developers.google.com/earth-engine/guides/processing_environments#standard_endpoint).
connecting to the
[standard endpoint](https://developers.google.com/earth-engine/guides/processing_environments#standard_endpoint).
It utilizes caching, so it can be more efficient if you need to rerun or adjust
something about the request.
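
The endpoint choice described in the two sections above can be sketched as a
small helper. This is a minimal sketch and not part of Xee: the function name
`initialize_kwargs` and the project ID `my-project` are hypothetical. The two
URLs are assumed to be the high-volume and standard Earth Engine API endpoints,
and `opt_url` is assumed to be the `ee.Initialize` keyword that overrides the
endpoint.

```python
# Endpoint URLs assumed from the Earth Engine processing environments guide.
HIGH_VOLUME_URL = "https://earthengine-highvolume.googleapis.com"
STANDARD_URL = "https://earthengine.googleapis.com"


def initialize_kwargs(project: str, computed: bool) -> dict:
    """Build keyword arguments for ee.Initialize (hypothetical helper).

    computed=False: stored collections, so use the high-volume endpoint.
    computed=True: computed collections, so use the standard (cached) endpoint.
    """
    url = STANDARD_URL if computed else HIGH_VOLUME_URL
    return {"project": project, "opt_url": url}


kwargs = initialize_kwargs("my-project", computed=False)

# With earthengine-api installed and authenticated, this would be passed on as:
#   import ee
#   ee.Initialize(**kwargs)
```

Keeping the choice in one place makes it easy to switch endpoints when a
workflow moves from iterative development to bulk reads.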

122 changes: 65 additions & 57 deletions docs/why-xee.md
@@ -1,85 +1,93 @@
# Why Xee?

We noticed two clusters of users working with climate and weather data at
Google Research: Some were [Xarray](https://xarray.dev) (and
[Zarr](https://zarr.dev/)) centric and others, Google Earth Engine centric. Xee
came about as an effort to bring these two groups of developers closer together.
We noticed two clusters of users working with climate and weather data at Google
Research: Some were [Xarray](https://xarray.dev) (and [Zarr](https://zarr.dev/))
centric and others, Google Earth Engine centric. Xee came about as an effort to
bring these two groups of developers closer together.

## Goals

Primary Goals:

- Make [EE-curated data](https://developers.google.com/earth-engine/datasets)
accessible to users in the Xarray community and to the wider scientific Python
ecosystem.
- Make it trivial to avoid quota limits when computing pixels from Earth Engine.
- Provide an easy way for scientists and ML practitioners to coalesce Earth data
at different scales into a common resolution.
- Make [EE-curated data](https://developers.google.com/earth-engine/datasets)
accessible to users in the Xarray community and to the wider scientific
Python ecosystem.
- Make it trivial to avoid quota limits when computing pixels from Earth
Engine.
- Provide an easy way for scientists and ML practitioners to coalesce Earth
data at different scales into a common resolution.

Secondary Goals:

- Provide a succinct interface for querying Earth Engine data at scale (i.e. via
[Xarray-Beam](https://xarray-beam.readthedocs.io/)).
- Make it trivial to quickly [export Earth Engine data to Zarr](https://github.com/google/xee/tree/main/examples#export-earth-engine-imagecollections-to-zarr-with-xarray-beam).
- Provide a compelling alternative to exporting Zarr in the first
place (e.g. during the ML training process).
- Provide a succinct interface for querying Earth Engine data at scale (i.e.
via [Xarray-Beam](https://xarray-beam.readthedocs.io/)).
- Make it trivial to quickly
[export Earth Engine data to Zarr](https://github.com/google/xee/tree/main/examples#export-earth-engine-imagecollections-to-zarr-with-xarray-beam).
- Provide a compelling alternative to exporting Zarr in the first
place (e.g. during the ML training process).

## Approach

With the addition of Earth Engine's [Pixel API](https://medium.com/google-earth/pixels-to-the-people-2d3c14a46da6),
With the addition of Earth Engine's
[Pixel API](https://medium.com/google-earth/pixels-to-the-people-2d3c14a46da6),
it became possible to easily get NumPy array data from `ee.Image`s. In building
tools atop this, we noticed that the best practices for managing data were
Xarray-shaped. For example:

- Our codebases involved many similar LOC to translate between Earth Engine and
arrays: Users typically thought in NumPy and molded EE's Python client to fit
those idioms.
- We often needed to page `computePixel()` requests in a way that's strikingly
similar to Dask/Xarray's concept of [`chunks`](https://docs.xarray.dev/en/stable/user-guide/dask.html#what-is-a-dask-array).
- Users were wrapping NumPy arrays within dataclasses to associate metadata and
labels with data.
- Our codebases involved many similar LOC to translate between Earth Engine
and arrays: Users typically thought in NumPy and molded EE's Python client
to fit those idioms.
- We often needed to page `computePixel()` requests in a way that's strikingly
similar to Dask/Xarray's concept of
[`chunks`](https://docs.xarray.dev/en/stable/user-guide/dask.html#what-is-a-dask-array).
- Users were wrapping NumPy arrays within dataclasses to associate metadata
and labels with data.

In an attempt to group these disparate solutions into a singular interface, we
experimented with wrapping `computePixels()` into
[Xarray's standard mechanism for defining backends](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html). The result of this effort is Xee.

[Xarray's standard mechanism for defining backends](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html).
The result of this effort is Xee.

## An array by any other name? (Xee vs Zarr)

[Zarr](https://zarr.dev/) has been growing in relevance to the world of [cloud-based scientific data](https://doi.org/10.1109/MCSE.2021.3059437).
Members of the open source community have [demonstrated](https://www.youtube.com/watch?v=0bqpxX3Nn_A)
that Zarr is more of a data protocol rather than a data format. In many ways,
Xee is inspired by this work. To this end, we'd like to point out some
similarities and differences between Zarr backed and Earth Engine backed data in
Xarray.
[Zarr](https://zarr.dev/) has been growing in relevance to the world of
[cloud-based scientific data](https://doi.org/10.1109/MCSE.2021.3059437).
Members of the open source community have
[demonstrated](https://www.youtube.com/watch?v=0bqpxX3Nn_A) that Zarr is more of
a data protocol rather than a data format. In many ways, Xee is inspired by this
work. To this end, we'd like to point out some similarities and differences
between Zarr backed and Earth Engine backed data in Xarray.

Similarities:
- **Xarray-compatible**: Of course, this library proves that both types of data
stores can be compatible with Xarray. [Zarr](https://docs.xarray.dev/en/stable/user-guide/io.html#zarr)
reading and writing is deeply integrated into Xarray as well.
- **Optimal IO Chunks**: Ultimately, cloud-based data stores will inherently
involve networking overhead. There are similarities in the best way to page
data across a network into a local context: the optimal Zarr chunk
size is around [10-100 MBs](https://esipfed.github.io/cloud-computing-cluster/optimization-practices.html#chunk-size). With Earth Engine's backend, the maximum chunk size possible
is 48 MBs.

- **Xarray-compatible**: Of course, this library proves that both types of
data stores can be compatible with Xarray.
[Zarr](https://docs.xarray.dev/en/stable/user-guide/io.html#zarr) reading
and writing is deeply integrated into Xarray as well.
- **Optimal IO Chunks**: Ultimately, cloud-based data stores will inherently
involve networking overhead. There are similarities in the best way to page
data across a network into a local context: the optimal Zarr chunk size is
around
[10-100 MBs](https://esipfed.github.io/cloud-computing-cluster/optimization-practices.html#chunk-size).
With Earth Engine's backend, the maximum chunk size possible is 48 MBs.
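
To make the 48 MB ceiling concrete, the arithmetic can be sketched as below.
This is an illustrative calculation, not Xee's implementation: the helper name
`max_square_chunk` is hypothetical, and "48 MB" is assumed to mean
`48 * 2**20` bytes.

```python
import math

# Assumption: the 48 MB request ceiling is read as 48 * 2**20 bytes.
REQUEST_BYTE_LIMIT = 48 * 2**20


def max_square_chunk(bytes_per_value: int, bands: int = 1, time_steps: int = 1) -> int:
    """Largest square chunk edge (in pixels) whose payload fits in one request."""
    bytes_per_pixel = bytes_per_value * bands * time_steps
    max_pixels = REQUEST_BYTE_LIMIT // bytes_per_pixel
    return math.isqrt(max_pixels)


# float32 values, one band, one time step
edge_f32 = max_square_chunk(4)

# float64 values, three bands, one time step
edge_f64 = max_square_chunk(8, bands=3)

print(edge_f32, edge_f64)
```

Wider dtypes, more bands, or more time steps per chunk all shrink the spatial
extent that fits in a single request.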

Differences:
- **Quota vs No Quota**: Since Earth Engine is API based, there are quota
restrictions that limit IO, namely a 100 QPS limit on data requests. Readers
all need to be authenticated and tied to a GCP project quota. Zarr, on the
other hand, has a lower level access pattern. Reading is delegated to basic
permissions on cloud buckets.
- **On the fly vs up-front data shaping**: In Zarr, the representation of data
at rest fundamentally influences performance at query time. For this reason,
[rechunking](https://xarray-beam.readthedocs.io/en/latest/rechunking.html) and
projecting is a common routine performed up front on Zarr when data does not
quite fit the problem at hand. Earth Engine provides a more flexible interface
than this. Since datasets are pyramided (either at ingestion or server-side),
users are free to request the resolution and projection of the data
during dataset open. Similarly, while Earth Engine's internal dataset
does fit an internal chunking scheme, chunking schemes are a lot more
fungible.

We hope that this comparison provides the user with a set of useful precedents
for working with cloud-based datasets.

- **Quota vs No Quota**: Since Earth Engine is API based, there are quota
restrictions that limit IO, namely a 100 QPS limit on data requests. Readers
all need to be authenticated and tied to a GCP project quota. Zarr, on the
other hand, has a lower level access pattern. Reading is delegated to basic
permissions on cloud buckets.
- **On the fly vs up-front data shaping**: In Zarr, the representation of data
at rest fundamentally influences performance at query time. For this reason,
[rechunking](https://xarray-beam.readthedocs.io/en/latest/rechunking.html)
and projecting is a common routine performed up front on Zarr when data does
not quite fit the problem at hand. Earth Engine provides a more flexible
interface than this. Since datasets are pyramided (either at ingestion or
server-side), users are free to request the resolution and projection of the
data during dataset open. Similarly, while Earth Engine's internal dataset
does fit an internal chunking scheme, chunking schemes are a lot more
fungible.
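
The effect of the 100 QPS limit can be estimated with a short calculation. This
is a rough sketch under stated assumptions: it treats each chunk as one data
request, ignores per-request latency and retries, and uses hypothetical helper
names (`request_count`, `min_seconds`) and a made-up dataset shape.

```python
import math

QPS_LIMIT = 100  # data-request limit quoted in the comparison above


def request_count(shape: dict, chunks: dict) -> int:
    """Number of chunk requests needed to cover every dimension of a dataset."""
    return math.prod(math.ceil(shape[dim] / chunks[dim]) for dim in shape)


def min_seconds(n_requests: int, qps: int = QPS_LIMIT) -> float:
    """Lower bound on wall time if requests are issued at exactly the quota."""
    return n_requests / qps


# Hypothetical dataset and chunking, for illustration only.
shape = {"time": 365, "y": 4096, "x": 8192}
chunks = {"time": 24, "y": 512, "x": 512}

n = request_count(shape, chunks)
print(n, min_seconds(n))
```

Larger chunks mean fewer requests against the quota, but each request must
still fit under the per-request size ceiling.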

We hope that this comparison provides the user with a set of useful precedents for
working with cloud-based datasets.