File: `.ai/skills/check-upstream/SKILL.md` (45 lines changed: 45 additions, 0 deletions)
@@ -29,6 +29,29 @@ You are auditing the datafusion-python project to find features from the upstrea
**IMPORTANT: The Python API is the source of truth for coverage.** A function or method is considered "exposed" if it exists in the Python API (e.g., `python/datafusion/functions.py`), even if there is no corresponding entry in the Rust bindings. Many upstream functions are aliases of other functions — the Python layer can expose these aliases by calling a different underlying Rust binding. Do NOT report a function as missing if it appears in the Python `__all__` list and has a working implementation, regardless of whether a matching `#[pyfunction]` exists in Rust.
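
The rule above amounts to a set difference against the Python export list only. The following is a minimal sketch under stated assumptions: `missing_from_python` is a hypothetical helper, and the sample names are illustrative data, not the real state of datafusion-python. The Rust bindings are deliberately not consulted.

```python
def missing_from_python(upstream_names, python_all):
    """Return upstream function names not exposed by the Python API.

    Only the Python export list is compared; whether a matching
    #[pyfunction] exists in Rust is intentionally ignored.
    """
    return sorted(set(upstream_names) - set(python_all))


# Illustrative data only (hypothetical, not the project's actual lists).
upstream = ["array_has", "array_transform", "list_transform"]
python_all = ["array_has", "list_transform"]  # an alias may wrap another binding

print(missing_from_python(upstream, python_all))  # prints ['array_transform']
```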

**IMPORTANT: Audit the total upstream surface, not the delta since the last pin.** Gaps accumulate across syncs. A patch-release bump with a "bug fixes only" changelog does not mean there is nothing to find: pre-existing gaps from earlier major releases still need to be surfaced. Always run the full comparison.

## Compile-Signal Triggers

If a recent upstream bump required *any* of the following while fixing
compile errors in `crates/core/` or the FFI example, treat that as a
**hard signal** that user-facing surface area grew and run this skill
before considering the bump done. Each pattern corresponds to a class of
gap that frequently shows up in the audit:

| Signal during PR 1 compile fix | Likely gap to check |
|---|---|
| New `Expr::*` variant added to a non-exhaustive `match` (`HigherOrderFunction`, `Lambda`, `LambdaVariable`, …) | New lambda / higher-order scalar functions (`any_match`, `array_transform`, `list_transform`, …) |
| New `ScalarValue::*` variant (`ListView`, `LargeListView`, …) | New scalar / array functions that consume or produce the type |
| New required trait method on `ExecutionPlan` / `TableProvider` / `*UDFImpl` (`apply_expressions`, …) | Corresponding capability on the Python wrapper class |
| Renamed or restructured struct field (e.g. `Cast.data_type` → `Cast.field: FieldRef`) | Any Python accessor / SKILL.md doc that read the old field |
| Newly deprecated trait method with a `_with_args` / `_with_options` replacement | The `*_with_options` variant frequently warrants a separate Python entry point |

PR 1 of `dev/release/upstream-sync.md` asks you to log these signals as
they appear. When you run this skill, use that log as a checklist: every
entry must either show up in the audit output or be explicitly skipped
with a reason.
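
The checklist rule can be sketched as a small reconciliation pass. This is only an illustration: the source does not define the log format, so the `LoggedSignal` record, its fields, and the substring match against the audit output are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedSignal:
    """Hypothetical record for one compile-signal log entry."""
    description: str                   # what the compile fix touched
    keyword: str                       # term expected in the audit output
    skip_reason: Optional[str] = None  # set when deliberately skipped


def unresolved(log, audit_output):
    """Return entries neither present in the audit output nor skipped with a reason."""
    return [
        s for s in log
        if s.keyword not in audit_output and not s.skip_reason
    ]


log = [
    LoggedSignal("New Expr::Lambda variant", "array_transform"),
    LoggedSignal("New ScalarValue::ListView variant", "ListView"),
    LoggedSignal("New required trait method", "apply_expressions",
                 skip_reason="internal only"),
]
audit_output = "Missing scalar functions: array_transform"

for entry in unresolved(log, audit_output):
    print(entry.description)  # prints "New ScalarValue::ListView variant"
```

An empty result means every logged signal was accounted for; anything returned still needs either an audit finding or a written skip reason.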
## Areas to Check
The user may specify an area via `$ARGUMENTS`. If no area is specified or "all" is given, check all areas.
@@ -173,6 +196,28 @@ These upstream FFI types have been reviewed and do not need to be independently
- FFI example in `examples/datafusion-ffi-example/`
- Type appears in union type hints where accepted

### 8. `__all__` Hygiene (functions.py)

Independent of upstream parity, also flag public `def` symbols in
`python/datafusion/functions.py` that are missing from the module's
`__all__`. These are functions a user can call but that do not show up in
`from datafusion.functions import *`, in tab-completion against the
namespace, or in generated API docs. Such an omission is typically an
oversight rather than intentional.

**How to check:**
1. Grep for `^def ([a-z_][a-z0-9_]*)\(` in `python/datafusion/functions.py`
   to enumerate every top-level function definition.
2. Read the `__all__` list at the top of the same file.
3. Report any function in (1) that is not in (2). Skip private helpers
   (names starting with `_`).
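
The three steps above can be sketched in Python. This is a minimal illustration, not the skill's prescribed tooling: `missing_from_all` is a hypothetical helper, it reuses the grep pattern via `re`, and it assumes `__all__` is a plain literal list assigned at module level. The sample source is made up for the demonstration.

```python
import ast
import re

# Same pattern as the grep in step 1, applied per line.
DEF_PATTERN = re.compile(r"^def ([a-z_][a-z0-9_]*)\(", re.MULTILINE)


def missing_from_all(source: str) -> list:
    """Return public top-level function names that are absent from __all__."""
    # Step 1: enumerate top-level `def` names.
    defined = DEF_PATTERN.findall(source)

    # Step 2: read the literal __all__ list from the module's AST.
    exported = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            exported = set(ast.literal_eval(node.value))

    # Step 3: report names missing from __all__, skipping private helpers.
    return sorted(
        name for name in defined
        if not name.startswith("_") and name not in exported
    )


# Made-up module text for illustration only.
SAMPLE = '''
__all__ = ["abs", "position"]

def abs(arg):
    return arg

def instr(string, substr):
    return position(string, substr)

def position(string, substr):
    return (string, substr)

def _helper():
    pass
'''

print(missing_from_all(SAMPLE))  # prints ['instr']
```

To run it against the real module, pass the text of `python/datafusion/functions.py` as `source`.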

A historical example: `instr` and `position` shipped as public `def`s but
were absent from `__all__` until the gap was caught here.

For each finding, propose adding the name to `__all__` in its
alphabetical position among the existing entries.

## Checking for Existing GitHub Issues
After identifying missing APIs, search the open issues at https://github.com/apache/datafusion-python/issues for each gap to see if an issue already exists requesting that API be exposed. Search using the function or method name as the query.
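
As a convenience, the per-gap search can be turned into a ready-made link. This is a sketch, assuming the standard GitHub issue-search qualifiers `is:issue is:open`; `issue_search_url` is a hypothetical helper name, and `array_transform` is only an example query.

```python
from urllib.parse import urlencode

ISSUES_URL = "https://github.com/apache/datafusion-python/issues"


def issue_search_url(api_name: str) -> str:
    """Build a search URL for open issues that mention the given API name."""
    query = f"is:issue is:open {api_name}"
    return f"{ISSUES_URL}?{urlencode({'q': query})}"


print(issue_search_url("array_transform"))
```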