Commit 4b21572

Authored by toppyy, kevinjqliu, and timsaucer
Add a working, more complete example of using a catalog (docs) (#1427)
* Add a working, more complete example of using a catalog
* the default schema is 'public', not 'default'
* in-memory table instead of imaginary csv for standalone example
* typo fix

  Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
* minor c string fix after merge

---------

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
1 parent 75d07ce commit 4b21572

2 files changed: +20 -8 lines changed

crates/core/src/context.rs

Lines changed: 1 addition & 1 deletion
@@ -196,7 +196,7 @@ impl PySessionConfig {
         let capsule = capsule.cast::<PyCapsule>()?;
 
         let extension: NonNull<FFI_ExtensionOptions> = capsule
-            .pointer_checked(Some(c_str!("datafusion_extension_options")))?
+            .pointer_checked(Some(c"datafusion_extension_options"))?
             .cast();
         let mut extension = unsafe { extension.as_ref() }.clone();
 
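
The hunk above swaps a `c_str!` macro call for a C string literal. As a minimal sketch of what that literal is (assuming Rust 1.77 or newer, where `c"..."` literals are available), the snippet below shows that it evaluates to a NUL-terminated `&'static CStr`. The helper `capsule_name` is hypothetical and exists only for illustration; it is not part of the commit.

```rust
use std::ffi::CStr;

// Hypothetical helper (not in the commit): returns the capsule name
// written as a C string literal. Requires Rust 1.77 or newer.
fn capsule_name() -> &'static CStr {
    c"datafusion_extension_options"
}

fn main() {
    let name = capsule_name();

    // to_bytes() excludes the trailing NUL terminator.
    assert_eq!(name.to_bytes(), b"datafusion_extension_options");

    // to_bytes_with_nul() includes it, so the last byte is 0.
    assert_eq!(name.to_bytes_with_nul().last(), Some(&0u8));

    // The literal is valid UTF-8 here, so it converts cleanly to &str.
    println!("{}", name.to_str().unwrap());
}
```

Because the literal is checked at compile time, an interior NUL byte is rejected by the compiler rather than surfacing at runtime.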

docs/source/user-guide/data-sources.rst

Lines changed: 19 additions & 7 deletions
@@ -224,25 +224,37 @@ A common technique for organizing tables is using a three level hierarchical app
 supports this form of organizing using the :py:class:`~datafusion.catalog.Catalog`,
 :py:class:`~datafusion.catalog.Schema`, and :py:class:`~datafusion.catalog.Table`. By default,
 a :py:class:`~datafusion.context.SessionContext` comes with a single Catalog and a single Schema
-with the names ``datafusion`` and ``default``, respectively.
+with the names ``datafusion`` and ``public``, respectively.
 
 The default implementation uses an in-memory approach to the catalog and schema. We have support
-for adding additional in-memory catalogs and schemas. This can be done like in the following
+for adding additional in-memory catalogs and schemas. You can access tables registered in a schema
+either through the Dataframe API or via sql commands. This can be done like in the following
 example:
 
 .. code-block:: python
 
+    import pyarrow as pa
     from datafusion.catalog import Catalog, Schema
+    from datafusion import SessionContext
+
+    ctx = SessionContext()
 
     my_catalog = Catalog.memory_catalog()
-    my_schema = Schema.memory_schema()
+    my_schema = Schema.memory_schema()
+    my_catalog.register_schema('my_schema_name', my_schema)
+    ctx.register_catalog_provider('my_catalog_name', my_catalog)
 
-    my_catalog.register_schema("my_schema_name", my_schema)
+    # Create an in-memory table
+    table = pa.table({
+        'name': ['Bulbasaur', 'Charmander', 'Squirtle'],
+        'type': ['Grass', 'Fire', 'Water'],
+        'hp': [45, 39, 44],
+    })
+    df = ctx.create_dataframe([table.to_batches()], name='pokemon')
 
-    ctx.register_catalog("my_catalog_name", my_catalog)
+    my_schema.register_table('pokemon', df)
 
-You could then register tables in ``my_schema`` and access them either through the DataFrame
-API or via sql commands such as ``"SELECT * from my_catalog_name.my_schema_name.my_table"``.
+    ctx.sql('SELECT * FROM my_catalog_name.my_schema_name.pokemon').show()
 
 User Defined Catalog and Schema
 -------------------------------
