Commit a657f7d

Initial commit for skill to check upstream repo
1 parent ad8d41f commit a657f7d

1 file changed: 205 additions, 0 deletions

@@ -0,0 +1,205 @@
---
name: check-upstream
description: Check if upstream Apache DataFusion features (functions, DataFrame ops, SessionContext methods) are exposed in this Python project. Use when adding missing functions, auditing API coverage, or ensuring parity with upstream.
argument-hint: [area] (e.g., "scalar functions", "aggregate functions", "window functions", "dataframe", "session context", "all")
---

# Check Upstream DataFusion Feature Coverage

You are auditing the datafusion-python project to find features from the upstream Apache DataFusion Rust library that are **not yet exposed** in this Python binding project. Your goal is to identify gaps and, if asked, implement the missing bindings.

## Areas to Check

The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" is given, check all areas.

### 1. Scalar Functions

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions/index.html
- User docs: https://datafusion.apache.org/user-guide/sql/scalar_functions.html

**Where they are exposed in this project:**
- Python API: `python/datafusion/functions.py` (each function wraps a call to `datafusion._internal.functions`)
- Rust bindings: `crates/core/src/functions.rs` (`#[pyfunction]` definitions registered via `init_module()`)

**How to check:**
1. Fetch the upstream scalar function documentation page
2. Compare against functions listed in `python/datafusion/functions.py` (check the `__all__` list)
3. Also check `crates/core/src/functions.rs` for what's registered in `init_module()`
4. Report functions that exist upstream but are missing from this project
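
The comparison in steps 2 and 4 can be sketched in Python. This is a minimal sketch rather than part of the skill itself: it assumes `__all__` is a plain list of string literals assigned at module level, and the helper names `exported_names` and `missing_functions`, along with the sample inputs, are hypothetical.

```python
import ast


def exported_names(source: str) -> set[str]:
    """Return the string entries of a module-level ``__all__`` assignment."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                # literal_eval accepts an AST node holding only literals.
                return set(ast.literal_eval(node.value))
    return set()


def missing_functions(upstream: set[str], source: str) -> list[str]:
    """Names present upstream but absent from the module's ``__all__``."""
    return sorted(upstream - exported_names(source))


# Illustrative inputs only; in practice the source would be read from
# python/datafusion/functions.py and the names taken from the upstream docs.
sample_source = '__all__ = ["abs", "array_append"]\n'
upstream_names = {"abs", "array_append", "list_append"}
print(missing_functions(upstream_names, sample_source))  # ['list_append']
```

Parsing with `ast` avoids importing the package, so the check works even when the compiled extension has not been built.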

### 2. Aggregate Functions

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_aggregate/index.html
- User docs: https://datafusion.apache.org/user-guide/sql/aggregate_functions.html

**Where they are exposed in this project:**
- Python API: `python/datafusion/functions.py` (aggregate functions are mixed in with scalar functions)
- Rust bindings: `crates/core/src/functions.rs`

**How to check:**
1. Fetch the upstream aggregate function documentation page
2. Compare against aggregate functions in `python/datafusion/functions.py`
3. Report missing aggregate functions

### 3. Window Functions

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_window/index.html
- User docs: https://datafusion.apache.org/user-guide/sql/window_functions.html

**Where they are exposed in this project:**
- Python API: `python/datafusion/functions.py` (window functions like `rank`, `dense_rank`, `lag`, `lead`, etc.)
- Rust bindings: `crates/core/src/functions.rs`

**How to check:**
1. Fetch the upstream window function documentation page
2. Compare against window functions in `python/datafusion/functions.py`
3. Report missing window functions

### 4. Table Functions

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/functions_table/index.html
- User docs: https://datafusion.apache.org/user-guide/sql/table_functions.html (if available)

**Where they are exposed in this project:**
- Python API: `python/datafusion/functions.py` and `python/datafusion/user_defined.py` (TableFunction/udtf)
- Rust bindings: `crates/core/src/functions.rs` and `crates/core/src/udtf.rs`

**How to check:**
1. Fetch the upstream table function documentation
2. Compare against what's available in this project
3. Report missing table functions

### 5. DataFrame Operations

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html

**Where they are exposed in this project:**
- Python API: `python/datafusion/dataframe.py` (the `DataFrame` class)
- Rust bindings: `crates/core/src/dataframe.rs` (`PyDataFrame` with `#[pymethods]`)

**How to check:**
1. Fetch the upstream DataFrame documentation page listing all methods
2. Compare against methods in `python/datafusion/dataframe.py`
3. Also check `crates/core/src/dataframe.rs` for what's implemented
4. Report DataFrame methods that exist upstream but are missing
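
For class-based areas, step 2 amounts to collecting the public methods of one class and subtracting them from the upstream list. The sketch below is one way to do that, assuming the class is defined in a single Python file; `public_methods` and the sample class body are hypothetical, and real naming differences between Rust and Python still need a manual look.

```python
import ast


def public_methods(source: str, class_name: str) -> set[str]:
    """Collect names of methods on ``class_name`` that do not start with '_'."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and not item.name.startswith("_")
            }
    return set()


# Illustrative source; in practice this would be the text of
# python/datafusion/dataframe.py.
sample_source = '''
class DataFrame:
    def select(self): ...
    def _repr_html_(self): ...
'''
upstream_methods = {"select", "hypothetical_method"}
print(sorted(upstream_methods - public_methods(sample_source, "DataFrame")))
# ['hypothetical_method']
```

The same helper applies to `SessionContext` by passing the contents of `python/datafusion/context.py` and the class name `"SessionContext"`.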

### 6. SessionContext Methods

**Upstream source of truth:**
- Rust docs: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html

**Where they are exposed in this project:**
- Python API: `python/datafusion/context.py` (the `SessionContext` class)
- Rust bindings: `crates/core/src/context.rs` (`PySessionContext` with `#[pymethods]`)

**How to check:**
1. Fetch the upstream SessionContext documentation page listing all methods
2. Compare against methods in `python/datafusion/context.py`
3. Also check `crates/core/src/context.rs` for what's implemented
4. Report SessionContext methods that exist upstream but are missing
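
Step 3 (checking the Rust side) can be approximated with a text scan. The following is a rough heuristic, not a Rust parser: it collects every `fn` name in a file, including private helpers that are never exposed to Python, and it does not confirm that a function sits inside a `#[pymethods]` block. The function name `rust_fn_names` and the sample source are hypothetical.

```python
import re

# Matches `fn name`, optionally preceded by `pub` or `pub(...)`, at line start.
FN_PATTERN = re.compile(
    r"^\s*(?:pub(?:\([^)]*\))?\s+)?fn\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.MULTILINE,
)


def rust_fn_names(rust_source: str) -> set[str]:
    """Return the names of all functions declared in a Rust source string."""
    return set(FN_PATTERN.findall(rust_source))


# Illustrative snippet; in practice this would be the text of
# crates/core/src/context.rs.
sample_rust = """
#[pymethods]
impl PySessionContext {
    pub fn example_one(&self) {}
    fn example_two(&self) {}
    pub(crate) fn example_three(&self) {}
}
"""
print(sorted(rust_fn_names(sample_rust)))
# ['example_one', 'example_three', 'example_two']
```

A name found here but absent from the Python class usually means the binding exists and only the Python wrapper is missing, which is worth calling out under Notes in the report.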

## Output Format

For each area checked, produce a report like:

```
## [Area Name] Coverage Report

### Currently Exposed (X functions/methods)
- list of what's already available

### Missing from Upstream (Y functions/methods)
- function_name: brief description of what it does
- function_name: brief description of what it does

### Notes
- Any relevant observations about partial implementations, naming differences, etc.
```

## Implementation Pattern

If the user asks you to implement missing features, follow these patterns:

### Adding a New Function (Scalar/Aggregate/Window)

**Step 1: Rust binding** in `crates/core/src/functions.rs`:
```rust
#[pyfunction]
#[pyo3(signature = (arg1, arg2))]
fn new_function_name(arg1: PyExpr, arg2: PyExpr) -> PyResult<PyExpr> {
    Ok(datafusion::functions::module::expr_fn::new_function_name(arg1.expr, arg2.expr).into())
}
```
Then register it in `init_module()`:
```rust
m.add_wrapped(wrap_pyfunction!(new_function_name))?;
```

**Step 2: Python wrapper** in `python/datafusion/functions.py`:
```python
def new_function_name(arg1: Expr, arg2: Expr) -> Expr:
    """Description of what the function does.

    Args:
        arg1: Description of first argument.
        arg2: Description of second argument.

    Returns:
        Description of return value.
    """
    return Expr(f.new_function_name(arg1.expr, arg2.expr))
```
Then add the function name to the `__all__` list.

### Adding a New DataFrame Method

**Step 1: Rust binding** in `crates/core/src/dataframe.rs`:
```rust
#[pymethods]
impl PyDataFrame {
    fn new_method(&self, py: Python, param: PyExpr) -> PyDataFusionResult<Self> {
        let df = self.df.as_ref().clone().new_method(param.into())?;
        Ok(Self::new(df))
    }
}
```

**Step 2: Python wrapper** in `python/datafusion/dataframe.py`:
```python
def new_method(self, param: Expr) -> DataFrame:
    """Description of the method."""
    return DataFrame(self.df.new_method(param.expr))
```

### Adding a New SessionContext Method

**Step 1: Rust binding** in `crates/core/src/context.rs`:
```rust
#[pymethods]
impl PySessionContext {
    pub fn new_method(&self, py: Python, param: String) -> PyDataFusionResult<PyDataFrame> {
        let df = wait_for_future(py, self.ctx.new_method(&param))?;
        Ok(PyDataFrame::new(df))
    }
}
```

**Step 2: Python wrapper** in `python/datafusion/context.py`:
```python
def new_method(self, param: str) -> DataFrame:
    """Description of the method."""
    return DataFrame(self.ctx.new_method(param))
```

## Important Notes

- The upstream DataFusion version used by this project is specified in `crates/core/Cargo.toml`; check the `datafusion` dependency version to ensure you're comparing against the matching upstream version.
- Some upstream features may intentionally not be exposed (e.g., internal-only APIs). Use judgment about what's user-facing.
- When fetching upstream docs, prefer the published docs.rs documentation for the version this project depends on. The URLs above point at `latest`, which may be newer, so replace `latest` with the pinned version number where they differ.
- Function aliases (e.g., `array_append` / `list_append`) should both be exposed if upstream supports them.
- Check the `__all__` list in `functions.py` to see what's publicly exported versus merely defined.
