This repository was archived by the owner on Dec 1, 2025. It is now read-only.
Merged
60 changes: 49 additions & 11 deletions docs/gettingstarted/quickstart.ipynb
Expand Up @@ -57,7 +57,7 @@
"\n",
"* The `0` and `9` tell us the \"divisions\" of the partitions. When the dataset is sorted by the index, these divisions are ranges to show which index values reside in each partition.\n",
"\n",
"We can signal to Dask that we'd like to actually obtain the data as `nested_pandas.NestedFrame` by using `compute`."
"We can peek at the first `n` rows using `ndf.head(n)` (or the last few with `ndf.tail(n)`)."
]
},
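The "divisions" described above can be illustrated with a minimal sketch in plain Python rather than Dask itself. `partition_for` is a hypothetical helper, not part of Dask or Nested-Dask. It assumes the usual Dask convention that `divisions` has one more entry than there are partitions, that each partition covers a half-open index range, and that the last partition also includes its upper bound.

```python
from bisect import bisect_right


def partition_for(index_value, divisions):
    """Return the position of the partition that would hold index_value.

    Hypothetical helper for illustration only; Dask does this lookup internally.
    """
    if not divisions[0] <= index_value <= divisions[-1]:
        raise KeyError(f"{index_value} lies outside the divisions {divisions}")
    # bisect_right finds the first division greater than index_value;
    # the min() keeps the final upper bound inside the last partition.
    return min(bisect_right(divisions, index_value) - 1, len(divisions) - 2)


# One partition covering index values 0 through 9, as in the text above.
single = (0, 9)
# An assumed two-partition layout, for comparison.
double = (0, 5, 9)

print(partition_for(9, single))
print(partition_for(4, double), partition_for(5, double), partition_for(9, double))
```

With divisions `(0, 9)` every index value maps to the only partition, which is why the two numbers alone describe the whole dataset.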
{
Expand All @@ -66,7 +66,23 @@
"metadata": {},
"outputs": [],
"source": [
"ndf.compute() # or could use ndf.head(n) to peek at the first n rows"
"ndf.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can signal to Dask that we'd like to actually obtain *all* of the data as a `nested_pandas.NestedFrame` by using `compute`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ndf.compute()"
]
},
{
Expand Down Expand Up @@ -134,14 +150,14 @@
"metadata": {},
"outputs": [],
"source": [
"result.head(5).nested[0] # no t value lower than 17.0"
"result.head(5).nested[0] # no `t` value is lower than 17.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nested-Dask `reduce` functions near-identically to Nested-Pandas `reduce`, providing a way to call custom functions on `NestedFrame` data. The one addition is that we'll need to provide the Dask `meta` value for the result. This is a dataframe-like or series-like object that has the same structure as the expected output. Let's compute the mean flux for each dataframe in the \"nested\" column. "
"Nested-Dask `reduce` functions near-identically to Nested-Pandas `reduce`, providing a way to call custom functions on `NestedFrame` data. The one addition is that Dask requires, in almost every case, a `meta=` argument describing the column names and dtypes of the output data. Dask provides a `make_meta` function, to which you can pass a small sample of the expected output."
]
},
{
Expand All @@ -152,16 +168,38 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from dask.dataframe.utils import make_meta\n",
"\n",
"# The result will be a series with float values\n",
"meta = pd.DataFrame(columns=[0], dtype=float)\n",
"# Use hierarchical column names to access the flux column\n",
"# passed as an array to np.mean.\n",
"#\n",
"# Take a single sample row, computed (that's what .head(1) will do),\n",
"# and generate the meta for it.\n",
"meta = make_meta(ndf.head(1).reduce(np.mean, \"nested.flux\"))\n",
"\n",
"# use hierarchical column names to access the flux column\n",
"# passed as an array to np.mean\n",
"means = ndf.reduce(np.mean, \"nested.flux\", meta=meta)\n",
"means.compute()"
]
},
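As a rough illustration of what a `meta` value is, the sketch below uses plain pandas only. The `sample_output` frame is a hypothetical stand-in for the computed result of a `reduce` call on one row; the real notebook obtains it from `ndf.head(1).reduce(...)`. The zero-row slice is an assumption about what a meta looks like for a pandas input (an empty object that keeps the column names and dtypes), not a substitute for calling `make_meta`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the output of reducing a single row:
# one column named 0 holding the mean of that row's flux values.
sample_output = pd.DataFrame({0: [np.mean([1.0, 2.0, 3.0])]})

# Keep the structure (column names and dtypes) but none of the rows.
meta_like = sample_output.iloc[:0]

print(meta_like.columns.tolist())
print(meta_like.dtypes)
print(len(meta_like))
```

Deriving the meta from a real sample keeps it in step with the function being applied, so a change from `np.mean` to another function does not require rewriting the meta by hand.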
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `reduce` function can also be used to apply any row-based calculation, even one that leaves the dimension unchanged. We can use a similar pattern to produce, say, the square of the flux. It is still a \"reduction\" in that the result is no longer within the original `NestedFrame` structure, but each output row now has the same cardinality as its input row."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"meta = make_meta(ndf.head(1).reduce(np.square, \"nested.flux\"))\n",
"\n",
"flux_sq = ndf.reduce(np.square, \"nested.flux\", meta=meta)\n",
"flux_sq.compute()"
]
},
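The difference in output cardinality between the two calls can be seen without Dask at all. This is a minimal sketch using NumPy only; `nested_flux` is made-up data standing in for the flux arrays that `reduce` would pass to the function, one array per top-level row.

```python
import numpy as np

# Hypothetical flux arrays, one per top-level row.
nested_flux = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# np.mean collapses each row's array to a single value.
means = [np.mean(flux) for flux in nested_flux]

# np.square returns an array as long as the one it was given.
squares = [np.square(flux) for flux in nested_flux]

for flux, mean, square in zip(nested_flux, means, squares):
    print(len(flux), np.ndim(mean), len(square))
```

In both cases the function is called once per row; only the size of what it returns differs.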
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -172,7 +210,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -186,9 +224,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.13.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}