
Commit b0c583d

aboydnw and claude committed
docs: tighten obstore notebook copy
Refines the intro framing, expected-result notes, and section explanations across the notebook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ab7889a commit b0c583d

1 file changed

Lines changed: 20 additions & 30 deletions

quickstarts/obstore.ipynb

@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"id": "b02d33be",
66
"metadata": {},
7-
"source": "# Working with Planetary Computer data using obstore\n\nThis notebook walks through reading Planetary Computer data with [obstore](https://developmentseed.org/obstore/) — a Python library that talks to cloud object stores (Azure Blob, S3, GCS) directly, without going through HTTP wrappers like fsspec. It's the foundation that higher-level libraries (async-geotiff, zarr-python, deck.gl-raster via Lonboard) sit on top of.\n\n**Five reasons to use it over the older `planetary_computer.sign() + fsspec` pattern:**\n\n1. **Reliability** — SAS tokens auto-refresh. No `TokenExpiredError` mid-job, no manual re-signing.\n2. **Cost** — range reads download only the bytes you need (e.g. a 16 KB COG header instead of a 100 MB file).\n3. **Speed** — async surface fires reads in parallel. Roughly N× faster than serial for multi-file workloads.\n4. **Composability** — the same store works with async-geotiff, zarr-python, Lonboard, etc.\n5. **Portability** — `AzureStore`, `S3Store`, `GCSStore` are interchangeable. Cloud-agnostic code.\n\nEach cell below calls out which of these it demonstrates. Speed-relevant cells use `%%time` so you can compare wall-clock numbers.\n\nThe companion [obstore tutorial](../overview/obstore.md) has the full narrative and migration reference."
7+
"source": "# Working with Planetary Computer data using obstore\n\nThis notebook walks through reading Planetary Computer data with [obstore](https://developmentseed.org/obstore/). Obstore is a Python library that talks to cloud object stores (Azure Blob, S3, GCS) directly, without going through HTTP wrappers like fsspec. This has several key benefits:\n\n1. **Reliability** - SAS tokens auto-refresh. No `TokenExpiredError` mid-job, no manual re-signing.\n2. **Cost** - range reads download only the bytes you need.\n3. **Speed** - the async API issues reads in parallel. Roughly N× faster than serial for multi-file workloads.\n4. **Composability** - any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store reads through your authenticated connection without re-doing auth.\n5. **Portability** - `AzureStore`, `S3Store`, `GCSStore` are interchangeable. Cloud-agnostic code.\n\nSpeed-relevant cells use `%%time` so you can compare wall-clock numbers.\n\nThe companion [obstore tutorial](../overview/obstore.md) has the full narrative and migration reference."
88
},
99
{
1010
"cell_type": "markdown",
@@ -33,7 +33,7 @@
3333
"source": [
3434
"## Authenticate from a STAC asset\n",
3535
"\n",
36-
"**Demonstrates: reliability.** `PlanetaryComputerCredentialProvider` handles SAS token acquisition and refresh under the hood — no manual `planetary_computer.sign()` calls anywhere in this notebook. If a token expires mid-job, the provider re-acquires it transparently. The old fsspec pattern required you to handle re-signing and retry logic yourself.\n",
36+
"`PlanetaryComputerCredentialProvider` handles SAS token acquisition and refresh under the hood — no manual `planetary_computer.sign()` calls anywhere in this notebook. If a token expires mid-job, the provider re-acquires it transparently. The old fsspec pattern required you to handle re-signing and retry logic yourself.\n",
3737
"\n",
3838
"**Expected result:** working `provider` object, no output printed."
3939
]
@@ -62,7 +62,7 @@
6262
"id": "b81cc8c9",
6363
"metadata": {},
6464
"source": [
65-
"Notice the asset href is unsigned — no SAS query string appended. The provider signs it for you at read time."
65+
"Notice the asset href is unsigned, with no SAS query string appended. The provider signs it for you at read time."
6666
]
6767
},
6868
{
@@ -82,7 +82,7 @@
8282
"source": [
8383
"## Build a store\n",
8484
"\n",
85-
"A *store* is obstore's connection to a specific cloud location. Once built, you hand it to any obstore read/write function or to any higher-level library that accepts an obstore-compatible store.\n",
85+
"A *store* is obstore's connection to a specific cloud location. Once built, you hand it to any obstore read/write function, or to any higher-level library that accepts an obstore-compatible store.\n",
8686
"\n",
8787
"**Expected result:** working `store` object, no output printed."
8888
]
@@ -106,18 +106,11 @@
106106
"source": [
107107
"## Read\n",
108108
"\n",
109-
"`from_asset()` scopes the store to that *specific blob* — the asset URL becomes the store's prefix. So every read uses an empty string as the path; obstore appends the path to the prefix, and you don't want it appending anything. (For multi-object access you'd build a container-scoped store instead — covered further down.)\n",
110-
"\n",
111-
"Three ways to read, each demonstrating a different value prop:\n",
109+
"There are three ways to read data, depending on your needs:\n",
112110
"\n",
113111
"### 1. Read the entire file\n",
114112
"\n",
115-
"**Demonstrates: baseline (the slow path you want to avoid).** Use when you actually want all the bytes. NAIP scenes range 100–500 MB.\n",
116-
"\n",
117-
"A surprise here: this cell is slow even on a fast connection. Azure Blob caps single-stream downloads at roughly 8–15 MB/s — your home bandwidth doesn't help. To go faster you'd need parallel range reads against the same file (which is what async-geotiff does internally when it reads COG tiles).\n",
118-
"\n",
119-
"This cell is the foil for everything below. The whole point of range reads and async is to avoid this scenario.\n",
120-
"\n",
113+
"This is the slowest path. Use it when you actually want all the bytes. For large files, this can take a long time. For example, these NAIP scenes range from 100 to 500 MB and can take a minute or more to download, depending on your connection.\n",
121114
"**Expected result:** 100–500 million bytes, 30–90 seconds depending on which NAIP scene came back."
122115
]
123116
},
@@ -142,9 +135,9 @@
142135
"source": [
143136
"### 2. Read a byte range (16 KB)\n",
144137
"\n",
145-
"**Demonstrates: cost savings.** A Cloud Optimized GeoTIFF stores its header in the first few KB. Most libraries (async-geotiff, GDAL, rasterio) only need the header to start working — they don't need the pixel data until you ask for a specific window. Range reads make this possible.\n",
138+
"A Cloud Optimized GeoTIFF stores its header in the first few KB. Most libraries (async-geotiff, GDAL, rasterio) only need the header to start working. They don't need the pixel data until you ask for a specific window. Range reads make this possible.\n",
146139
"\n",
147-
"**Expected result:** 16,384 bytes, well under a second. Tens of thousands of times less data than the full file above (the exact multiple depends on your scene's size)."
140+
"**Expected result:** 16,384 bytes, well under a second. Much less data than the full file above."
148141
]
149142
},
150143
{
@@ -167,7 +160,7 @@
167160
"source": [
168161
"### 3. Read multiple byte ranges in one request\n",
169162
"\n",
170-
"**Demonstrates: latency savings.** When you need several slices of the same file — say, multiple COG tiles — you could issue separate `get_range` calls. Each one is a round-trip to Azure. `get_ranges` batches them into a single HTTP request, cutting round-trip latency.\n",
163+
"When you need several slices of the same file, you could issue separate `get_range` calls. Each one is a round-trip to Azure. `get_ranges` batches them into a single HTTP request, cutting round-trip latency.\n",
171164
"\n",
172165
"**Expected result:** two ranges of 16 KB each, similar wall time to a single `get_range`."
173166
]
@@ -193,7 +186,7 @@
193186
"source": [
194187
"## Listing requires a container-scoped store\n",
195188
"\n",
196-
"**Demonstrates: reach beyond a single asset.** Up to here we've worked with one blob. To enumerate objects under a prefix (\"show me every NAIP scene in Montana in 2023\"), the store needs to be scoped to the container *and* the credential provider needs container-level `List` permission. The asset-derived provider above only signs the single blob — it can't list — so we build a fresh provider against the container URL.\n",
189+
"To enumerate objects under a prefix (\"show me every NAIP scene in Montana in 2023\"), the store needs to be scoped to the container *and* the credential provider needs container-level `List` permission. The asset-derived provider above only signs the single blob, so we build a fresh provider against the container URL.\n",
197190
"\n",
198191
"**Expected result:** three lines printed, each a blob path and its size in bytes."
199192
]
@@ -225,9 +218,9 @@
225218
"id": "93160f86",
226219
"metadata": {},
227220
"source": [
228-
"## Concurrent reads (async) — the speed payoff\n",
221+
"## Concurrent reads (async)\n",
229222
"\n",
230-
"**Demonstrates: speed via parallelism.** Up to here, reads happen one at a time. For multi-file workloads, running them in parallel is dramatically faster than serial. Below we read the same 4 KB header four times first serially, then concurrently — and compare wall times.\n",
223+
"For multi-file workloads, running reads in parallel is dramatically faster than running them serially. Below we read the same 4 KB header four times, first serially and then concurrently, and compare wall times.\n",
231224
"\n",
232225
"Async needs its own credential provider class (`PlanetaryComputerAsyncCredentialProvider`) backed by `aiohttp` instead of `requests`."
233226
]
@@ -251,7 +244,6 @@
251244
"id": "79c84f80",
252245
"metadata": {},
253246
"source": [
254-
"We'll time both with `time.perf_counter()` for an apples-to-apples comparison (the `%%time` magic doesn't play well with top-level `await`, so we measure manually).\n",
255247
"\n",
256248
"**Warmup the async store.** First call has to acquire a SAS token from Planetary Computer — a separate HTTP round-trip. We do one throwaway read so that overhead doesn't pollute the timing below. (The sync store was already warmed up by cells 1/2/3, which is why we only need to warm the async store.)"
257249
]
@@ -336,7 +328,7 @@
336328
"source": [
337329
"## Hand the store to async-geotiff\n",
338330
"\n",
339-
"**Demonstrates: composability.** The whole point of obstore is that *other libraries* sit on top of it. Once you have a working `AzureStore`, you can hand it to any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store — async-geotiff, zarr-python, and others — and they'll read through your authenticated connection. No re-auth, no double signing."
331+
"Once you have a working `AzureStore`, you can hand it to any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store, and it will read through your authenticated connection."
340332
]
341333
},
342334
{
@@ -354,9 +346,9 @@
354346
"id": "1c801295",
355347
"metadata": {},
356348
"source": [
357-
"Open the NAIP scene as a COG and read its metadata. `geotiff.transform` is the affine that maps pixel coordinates to geographic coordinates. `geotiff.crs` is the coordinate reference system.\n",
349+
"Open the NAIP scene as a COG and read its metadata. `geotiff.transform` tells you the scene's pixel size and corner position on the ground. `geotiff.crs` is the coordinate reference system.\n",
358350
"\n",
359-
"**Expected result:** the affine transform and the CRS name (e.g. `NAD83 / UTM zone 11N` — varies by scene)."
351+
"**Expected result:** a transform and a CRS name (e.g. NAD83 / UTM zone 11N)."
360352
]
361353
},
362354
{
@@ -379,7 +371,7 @@
379371
"id": "f6d207db",
380372
"metadata": {},
381373
"source": [
382-
"If you want the full CRS details (datum, axis order, area of use), just evaluate `geotiff.crs` on its own — pyproj prints a detailed dump.\n",
374+
"If you want the full CRS details (datum, axis order, area of use), just evaluate `geotiff.crs` on its own.\n",
383375
"\n",
384376
"Notice async-geotiff only fetched ~16 KB to get this metadata, not the full file. The range-read win compounds at every level of the stack."
385377
]
@@ -389,25 +381,23 @@
389381
"id": "9dd739eb",
390382
"metadata": {},
391383
"source": [
392-
"## Bonus: portability\n",
384+
"## Portability\n",
393385
"\n",
394-
"**Demonstrates: cloud-agnostic code.** The same `obstore.get(store, ...)` call works against S3 or GCSonly the store constructor changes. This cell isn't runnable (we don't have S3 creds), but it shows the shape:\n",
386+
"The same `obstore.get(store, ...)` call works against S3 or GCS; only the store constructor changes. The example below shows the shape of the request against S3:\n",
395387
"\n",
396388
"```python\n",
397389
"from obstore.store import S3Store\n",
398390
"\n",
399391
"s3_store = S3Store(bucket=\"my-bucket\", region=\"us-west-2\")\n",
400392
"buf = obstore.get(s3_store, \"path/to/object\").bytes() # same call\n",
401-
"```\n",
402-
"\n",
403-
"Any library that accepts an obspec-compatible store benefits automatically. async-geotiff opening a COG works identically against Azure, S3, or GCS."
393+
"```\n"
404394
]
405395
},
406396
{
407397
"cell_type": "markdown",
408398
"id": "b2d0385b",
409399
"metadata": {},
410-
"source": "## You're done\n\nIf you got expected output on every cell above, the obstore stack is wired up end-to-end:\n\n- **Reliability** — authenticated against Planetary Computer with auto-refreshing SAS tokens\n- **Cost** — read 16 KB of a multi-hundred-MB file via range reads (tens of thousands of times less egress)\n- **Latency** — batched multi-range read in a single round-trip\n- **Speed** — parallel reads several times faster than serial\n- **Composability** — handed the same store to async-geotiff, opened a COG with no re-auth\n- **Portability** — same API shape works for S3 and GCS\n\nFrom here, any obspec-compatible library plugs in the same way. Check the companion [Lonboard tutorial](../overview/lonboard.md) for interactive visualization or the [async-geotiff tutorial](../overview/async-geotiff.md) for pixel-level analysis."
400+
"source": "## You're done\n\nIf you got expected output on every cell above, the obstore stack is wired up end-to-end.\n\nFrom here, any obspec-compatible library plugs in the same way. Check the companion [Lonboard tutorial](../overview/lonboard.md) for interactive visualization or the [async-geotiff tutorial](../overview/async-geotiff.md) for pixel-level analysis."
411401
}
412402
],
413403
"metadata": {

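The "Concurrent reads (async)" cell text in the diff above describes reading the same header four times, first serially and then concurrently, and comparing wall times. The notebook's code cells are not part of this diff, so the following is only a minimal sketch of that timing pattern. It assumes a hypothetical `fake_range_read` coroutine that sleeps instead of calling obstore, which keeps the example runnable without credentials or network access.

```python
import asyncio
import time


async def fake_range_read(name: str, latency: float = 0.05) -> bytes:
    """Hypothetical stand-in for one network range read; sleeps instead of doing I/O."""
    await asyncio.sleep(latency)
    return name.encode()


async def read_serial(names: list[str]) -> list[bytes]:
    # Await each read before starting the next, so latencies add up.
    return [await fake_range_read(n) for n in names]


async def read_concurrent(names: list[str]) -> list[bytes]:
    # Start every read at once; gather returns results in input order.
    return list(await asyncio.gather(*(fake_range_read(n) for n in names)))


def timed(coro_fn, names: list[str]) -> tuple[list[bytes], float]:
    # Measure wall time with perf_counter around a full event-loop run.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(names))
    return result, time.perf_counter() - start


names = ["a", "b", "c", "d"]
serial_result, serial_s = timed(read_serial, names)
concurrent_result, concurrent_s = timed(read_concurrent, names)
print(f"serial: {serial_s:.3f}s, concurrent: {concurrent_s:.3f}s")
```

Run as a script, the serial time is roughly the sum of the four simulated latencies, while the concurrent time is close to a single latency. Inside a notebook, where an event loop is already running, the coroutines would be awaited directly rather than passed to `asyncio.run`.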