
Commit 0bc626a

timsaucer and claude committed
Add FFI type coverage and implementation pattern to check-upstream skill
Document the full FFI type pipeline (Rust PyO3 wrapper → Protocol type → Python wrapper → ABC base class → exports → example) and catalog which upstream datafusion-ffi types are supported, which have been evaluated as not needing direct exposure, and how to check for new gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1d27e8f commit 0bc626a

1 file changed: +144 -0 lines

.claude/skills/check-upstream/SKILL.md

Lines changed: 144 additions & 0 deletions
@@ -196,6 +196,150 @@ def new_method(self, param: str) -> DataFrame:
return DataFrame(self.ctx.new_method(param))
```

### Adding a New FFI Type

Exposing an upstream FFI type requires a full pipeline, from the C-compatible struct exported by `datafusion-ffi` through to a typed Python wrapper. Each of the following layers must be present.

**Step 1: Rust PyO3 wrapper class** in a new or existing file under `crates/core/src/`:
```rust
use std::sync::Arc;

use datafusion_ffi::new_type::FFI_NewType;
use pyo3::prelude::*;
use pyo3::types::PyCapsule;

use crate::errors::PyDataFusionResult;

#[pyclass(from_py_object, frozen, name = "RawNewType", module = "datafusion.module_name", subclass)]
pub struct PyNewType {
    pub inner: Arc<dyn NewTypeTrait>,
}

#[pymethods]
impl PyNewType {
    #[staticmethod]
    fn from_pycapsule(obj: &Bound<'_, PyAny>) -> PyDataFusionResult<Self> {
        // `downcast_into` takes ownership of the object returned by `call0()`;
        // `downcast` would borrow a temporary dropped at the end of the statement.
        let capsule = obj
            .getattr("__datafusion_new_type__")?
            .call0()?
            .downcast_into::<PyCapsule>()
            .map_err(PyErr::from)?;
        // Consider checking the capsule name here before dereferencing it.
        // SAFETY: the exporter is expected to place a valid `FFI_NewType` in the capsule.
        let ffi_ptr = unsafe { capsule.reference::<FFI_NewType>() };
        let provider: Arc<dyn NewTypeTrait> = ffi_ptr.into();
        Ok(Self { inner: provider })
    }

    fn some_method(&self) -> PyResult<ReturnType> {
        // Wrap the corresponding trait method on `self.inner`.
        todo!()
    }
}
```
Register in the appropriate `init_module()`:
```rust
m.add_class::<PyNewType>()?;
```

**Step 2: Python Protocol type** in the appropriate Python module (e.g., `python/datafusion/catalog.py`):
```python
from typing import Protocol


class NewTypeExportable(Protocol):
    """Type hint for objects providing a __datafusion_new_type__ PyCapsule."""

    def __datafusion_new_type__(self) -> object: ...
```

**Step 3: Python wrapper class** in the same module:
```python
# Assumes the module already has `from __future__ import annotations`
# and `import datafusion._internal as df_internal`.
class NewType:
    """Description of the type.

    This class wraps a DataFusion NewType, which can be created from a native
    Python implementation or imported from an FFI-compatible library.
    """

    def __init__(
        self,
        new_type: df_internal.module_name.RawNewType | NewTypeExportable,
    ) -> None:
        # Objects already wrapped on the Rust side are stored directly; anything
        # else is treated as an FFI exporter and converted via its PyCapsule.
        if isinstance(new_type, df_internal.module_name.RawNewType):
            self._raw = new_type
        else:
            self._raw = df_internal.module_name.RawNewType.from_pycapsule(new_type)

    def some_method(self) -> ReturnType:
        """Description of the method."""
        return self._raw.some_method()
```

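The dispatch in `__init__` can be exercised without building the extension by swapping the PyO3 class for a pure-Python stand-in. In this sketch `RawNewType`, `ThirdPartyExporter`, and the string return values are all hypothetical; the point is only that raw objects pass straight through while anything else goes through `from_pycapsule()`.

```python
from __future__ import annotations

from typing import Protocol


class NewTypeExportable(Protocol):
    def __datafusion_new_type__(self) -> object: ...


class RawNewType:
    """Stand-in for the hypothetical df_internal.module_name.RawNewType."""

    def __init__(self, source: str) -> None:
        self.source = source

    @staticmethod
    def from_pycapsule(obj: NewTypeExportable) -> RawNewType:
        # The real method reads an FFI struct out of the returned PyCapsule.
        obj.__datafusion_new_type__()
        return RawNewType(source="ffi")

    def some_method(self) -> str:
        return f"raw from {self.source}"


class NewType:
    def __init__(self, new_type: RawNewType | NewTypeExportable) -> None:
        if isinstance(new_type, RawNewType):
            self._raw = new_type
        else:
            self._raw = RawNewType.from_pycapsule(new_type)

    def some_method(self) -> str:
        return self._raw.some_method()


class ThirdPartyExporter:
    def __datafusion_new_type__(self) -> object:
        return object()  # a real exporter returns a PyCapsule


print(NewType(RawNewType("native")).some_method())  # raw from native
print(NewType(ThirdPartyExporter()).some_method())  # raw from ffi
```
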
**Step 4: ABC base class** (if users should be able to subclass and provide custom implementations in Python):
```python
from abc import ABC, abstractmethod


class NewTypeProvider(ABC):
    """Abstract base class for implementing a custom NewType in Python."""

    @abstractmethod
    def some_method(self) -> ReturnType:
        """Description of the method."""
        ...
```

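A quick sanity check for Step 4: Python refuses to instantiate a subclass that leaves an abstract method unimplemented, so the ABC enforces the contract at construction time. `Incomplete`, `MyProvider`, and `can_instantiate` are illustrative names only, and `str` stands in for `ReturnType`.

```python
from abc import ABC, abstractmethod


class NewTypeProvider(ABC):
    """Abstract base class for implementing a custom NewType in Python."""

    @abstractmethod
    def some_method(self) -> str:
        """Description of the method."""
        ...


class Incomplete(NewTypeProvider):
    """Forgets to implement some_method."""


class MyProvider(NewTypeProvider):
    def some_method(self) -> str:
        return "custom"


def can_instantiate(cls: type) -> bool:
    """Return False if constructing cls raises TypeError (e.g. unimplemented abstract methods)."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(Incomplete))  # False
print(can_instantiate(MyProvider))  # True
print(MyProvider().some_method())  # custom
```
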
**Step 5: Module exports**: add the new names to the appropriate `__init__.py`:
- Add the wrapper class (`NewType`) to `python/datafusion/__init__.py`
- Add the ABC (`NewTypeProvider`) if applicable
- Add the Protocol type (`NewTypeExportable`) if it should be public

**Step 6: FFI example**: add an example implementation under `examples/datafusion-ffi-example/src/`:
```rust
// examples/datafusion-ffi-example/src/new_type.rs
use datafusion_ffi::new_type::FFI_NewType;
// ... example showing how an external Rust library exposes this type via PyCapsule
```

**Checklist for each FFI type:**
- [ ] Rust PyO3 wrapper with `from_pycapsule()` method
- [ ] Python Protocol type (e.g., `NewTypeExportable`) for FFI objects
- [ ] Python wrapper class with full type hints on all public methods
- [ ] ABC base class (if the type can be user-implemented)
- [ ] Registered in Rust `init_module()` and Python `__init__.py`
- [ ] FFI example in `examples/datafusion-ffi-example/`
- [ ] Type appears in union type hints where accepted (e.g., `Table | TableProviderExportable`)

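The registration items in this checklist are easy to miss. One way to catch a missing export is a small check like the following, run after building the package. `missing_exports` is a hypothetical helper, not part of this repository; it treats a name as missing if the module lacks the attribute, or if the module defines `__all__` and the name is not listed there.

```python
from __future__ import annotations

import types
from collections.abc import Iterable


def missing_exports(module: types.ModuleType, names: Iterable[str]) -> list[str]:
    """Return the names that `module` does not expose.

    A name counts as missing if it is not an attribute of the module, or if the
    module defines `__all__` and the name is not listed there.
    """
    public = getattr(module, "__all__", None)
    return [
        name
        for name in names
        if not hasattr(module, name) or (public is not None and name not in public)
    ]


# Against the real package (names are the hypothetical ones from the steps above):
#     import datafusion
#     assert not missing_exports(datafusion, ["NewType", "NewTypeProvider"])

demo = types.ModuleType("demo")
demo.NewType = object
demo.__all__ = ["NewType"]
print(missing_exports(demo, ["NewType", "NewTypeProvider"]))  # ['NewTypeProvider']
```
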
### 7. FFI Types (datafusion-ffi)

**Upstream source of truth:**
- Crate source: https://github.com/apache/datafusion/tree/main/datafusion/ffi/src
- Rust docs: https://docs.rs/datafusion-ffi/latest/datafusion_ffi/

**Where they are exposed in this project:**
- Rust bindings: various files under `crates/core/src/` and `crates/util/src/`
- FFI example: `examples/datafusion-ffi-example/src/`
- Dependency declared in root `Cargo.toml` and `crates/core/Cargo.toml`

**Currently supported FFI types:**
- `FFI_ScalarUDF`: `crates/core/src/udf.rs`
- `FFI_AggregateUDF`: `crates/core/src/udaf.rs`
- `FFI_WindowUDF`: `crates/core/src/udwf.rs`
- `FFI_TableFunction`: `crates/core/src/udtf.rs`
- `FFI_TableProvider`: `crates/core/src/table.rs`, `crates/util/src/lib.rs`
- `FFI_TableProviderFactory`: `crates/core/src/context.rs`
- `FFI_CatalogProvider`: `crates/core/src/catalog.rs`, `crates/core/src/context.rs`
- `FFI_CatalogProviderList`: `crates/core/src/context.rs`
- `FFI_SchemaProvider`: `crates/core/src/catalog.rs`
- `FFI_LogicalExtensionCodec`: multiple files
- `FFI_ExtensionOptions`: `crates/core/src/context.rs`
- `FFI_TaskContextProvider`: `crates/core/src/context.rs`

**Evaluated and not requiring direct Python exposure:**
These upstream FFI types have been reviewed and do not need to be exposed to end users on their own:
- `FFI_ExecutionPlan`: already used indirectly through table providers; no need for direct exposure
- `FFI_PhysicalExpr` / `FFI_PhysicalSortExpr`: internal physical planning types not expected to be needed by end users
- `FFI_RecordBatchStream`: one level below `FFI_ExecutionPlan`; used internally when execution plans stream results
- `FFI_SessionRef` / `ForeignSession`: session sharing across FFI; Python manages sessions natively via `SessionContext`
- `FFI_SessionConfig`: Python can configure sessions natively without FFI
- `FFI_ConfigOptions` / `FFI_TableOptions`: internal configuration plumbing
- `FFI_PlanProperties` / `FFI_Boundedness` / `FFI_EmissionType`: read from existing plans; not user-facing
- `FFI_Partitioning`: supporting type for physical planning
- Supporting/utility types (`FFI_Option`, `FFI_Result`, `WrappedSchema`, `WrappedArray`, `FFI_ColumnarValue`, `FFI_Volatility`, `FFI_InsertOp`, `FFI_AccumulatorArgs`, `FFI_Accumulator`, `FFI_GroupsAccumulator`, `FFI_EmitTo`, `FFI_AggregateOrderSensitivity`, `FFI_PartitionEvaluator`, `FFI_PartitionEvaluatorArgs`, `FFI_Range`, `FFI_SortOptions`, `FFI_Distribution`, `FFI_ExprProperties`, `FFI_SortProperties`, `FFI_Interval`, `FFI_TableProviderFilterPushDown`, `FFI_TableType`): used as building blocks within the types above; not exposed independently

**How to check:**
1. Compare the upstream `datafusion-ffi` crate's `lib.rs` exports against the lists above
2. If new FFI types appear upstream, evaluate whether they represent a user-facing capability
3. Check against the "evaluated and not requiring exposure" list before flagging as a gap
4. Report any genuinely new types that enable user-facing functionality

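Step 1 of this check can be partly automated. The sketch below is one possible approach, not an existing tool in this repository: instead of reading only `lib.rs`, it scans the upstream `src/` tree for `pub struct` / `pub enum` declarations whose names start with `FFI_`, which approximates the crate's public FFI types. Re-exports under other names, non-`FFI_` names such as `ForeignSession`, and `pub(crate)` types are not detected, so treat the output as a starting list for steps 2 to 4, not a verdict. `FFI_BrandNewThing` is a made-up name used only for the demo.

```python
from __future__ import annotations

import re
from pathlib import Path

# Copied from the lists above; keep them in sync when the lists change.
SUPPORTED = {
    "FFI_ScalarUDF", "FFI_AggregateUDF", "FFI_WindowUDF", "FFI_TableFunction",
    "FFI_TableProvider", "FFI_TableProviderFactory", "FFI_CatalogProvider",
    "FFI_CatalogProviderList", "FFI_SchemaProvider", "FFI_LogicalExtensionCodec",
    "FFI_ExtensionOptions", "FFI_TaskContextProvider",
}
# Only the FFI_-prefixed names from the "evaluated" list; WrappedSchema,
# WrappedArray, and ForeignSession are not matched by the pattern below anyway.
EVALUATED = {
    "FFI_ExecutionPlan", "FFI_PhysicalExpr", "FFI_PhysicalSortExpr",
    "FFI_RecordBatchStream", "FFI_SessionRef", "FFI_SessionConfig",
    "FFI_ConfigOptions", "FFI_TableOptions", "FFI_PlanProperties",
    "FFI_Boundedness", "FFI_EmissionType", "FFI_Partitioning", "FFI_Option",
    "FFI_Result", "FFI_ColumnarValue", "FFI_Volatility", "FFI_InsertOp",
    "FFI_AccumulatorArgs", "FFI_Accumulator", "FFI_GroupsAccumulator",
    "FFI_EmitTo", "FFI_AggregateOrderSensitivity", "FFI_PartitionEvaluator",
    "FFI_PartitionEvaluatorArgs", "FFI_Range", "FFI_SortOptions",
    "FFI_Distribution", "FFI_ExprProperties", "FFI_SortProperties",
    "FFI_Interval", "FFI_TableProviderFilterPushDown", "FFI_TableType",
}

# Matches `pub struct FFI_Foo`, `pub struct FFI_Foo<T>`, and `pub enum FFI_Foo`.
FFI_DECL = re.compile(r"\bpub\s+(?:struct|enum)\s+(FFI_\w+)")


def find_ffi_types(source: str) -> set[str]:
    """Collect public FFI_* struct/enum names declared in Rust source text."""
    return set(FFI_DECL.findall(source))


def scan_dir(src_dir: Path) -> set[str]:
    """Scan every .rs file under src_dir (e.g. a local checkout of datafusion/ffi/src)."""
    found: set[str] = set()
    for path in src_dir.rglob("*.rs"):
        found |= find_ffi_types(path.read_text(encoding="utf-8"))
    return found


def unreviewed(upstream: set[str]) -> list[str]:
    """Names that appear upstream but are in neither list above."""
    return sorted(upstream - SUPPORTED - EVALUATED)


sample = """
pub struct FFI_TableProvider { /* ... */ }
pub enum FFI_TableType { Base, View, Temporary }
pub struct FFI_BrandNewThing<T> { inner: T }
"""
print(unreviewed(find_ffi_types(sample)))  # ['FFI_BrandNewThing']
```

Point `scan_dir` at the `datafusion/ffi/src` directory of an upstream checkout that matches the version pinned in `crates/core/Cargo.toml`, then pass the result to `unreviewed`.
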
## Important Notes

- The upstream DataFusion version used by this project is specified in `crates/core/Cargo.toml` — check the `datafusion` dependency version to ensure you're comparing against the right upstream version.
