chore: post-CLI cleanup — fix Feature.compute, llm exports, cache silence; add CLI docs#6
Conversation
…ence; add CLI docs Findings from a post-PR-#5 sweep of the repo: Bug fixes --------- 1. `featcopilot/core/feature.py` — `Feature.compute` previously called `exec(self.code, {"__builtins__": {}}, locals)` so any stored snippet that used a Python builtin (`len`, `range`, `int`, `sum`, ...) raised `NameError` at exec time, even though the snippet was otherwise valid. The fix introduces a curated `_SAFE_BUILTINS` dict mirroring `TransformRule._get_safe_builtins` so common idioms work without giving the snippet unrestricted access to imports / file I/O. 2. `featcopilot/llm/__init__.py` — `SyncCopilotFeatureClient` was defined in `copilot_client.py` and used internally by sibling modules but was missing from the package's imports and `__all__`, while sister classes `SyncOpenAIFeatureClient` and `SyncLiteLLMFeatureClient` were both exported. `from featcopilot.llm import SyncCopilotFeatureClient` raised `ImportError` even though the class itself was importable from the submodule. The fix adds the missing import + `__all__` entry so all three sync clients have a consistent public surface. 3. `featcopilot/utils/cache.py` — three bare `except Exception: pass` blocks silently swallowed disk I/O / pickle / JSON errors when the on-disk cache entry was corrupted, when a value was unpicklable (e.g. lambda), or when metadata JSON was truncated. The fix replaces each with `logger.warning(...)` so callers can diagnose disk issues, version mismatches, and unpicklable values from logs while still preserving the "cache is best-effort, never crash the caller" contract. Documentation fixes ------------------- 4. `docs/user-guide/cli.md` — NEW page documenting the `featcopilot` CLI added in PR #5 (`info` / `transform` / `explain` subcommands, output contract, agentic-usage tips, parquet caveat, `--config` example). The CLI was previously only mentioned in the README; mkdocs had no nav entry. 5. `mkdocs.yml` — added `Command-Line Interface: user-guide/cli.md` to the `User Guide` nav section. 6. `docs/getting-started/installation.md:82` — sample expected output updated from `# Output: 0.1.0` to `# Output: 0.3.7` (matches current `pyproject.toml` version). Tests ----- * `tests/test_core.py::TestFeature::test_feature_compute_uses_safe_builtins` — pins the safe-builtin whitelist; covers `len` / `range` / `sum` / `int` / `abs` / `round` / `min` / `max`. * `tests/test_utils.py` — three new `FeatureCache` regression tests for corrupted `.pkl` reads, unpicklable values on `set`, and corrupted metadata JSON, each asserting `logger.warning` is called via `mock.patch` on the module logger. * `tests/test_llm_modules.py::TestLLMPackageExports` — pins the `featcopilot.llm` public API: all three sync clients are importable directly AND listed in `__all__`, with identity checks against the underlying submodule definitions. Local: 912 tests pass (full suite), pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6 +/- ##
==========================================
+ Coverage 89.24% 89.32% +0.08%
==========================================
Files 38 38
Lines 4351 4358 +7
==========================================
+ Hits 3883 3893 +10
+ Misses 468 465 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0bb637b370
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR is a post-PR-#5 cleanup sweep that fixes three correctness issues (Feature snippet execution, LLM package exports, and cache error visibility) and fills in CLI documentation gaps so the CLI’s public contract is documented and test-pinned.
Changes:
- Fix
Feature.computeto expose a curated safe-builtin whitelist duringexec, and add regression coverage. - Export
SyncCopilotFeatureClientfromfeatcopilot.llm(imports +__all__) and pin the public API via tests. - Replace silent cache exception swallowing with
logger.warning(...)and add regression tests; add MkDocs CLI page + installation doc version update.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
featcopilot/core/feature.py |
Adds _SAFE_BUILTINS and uses it in Feature.compute execution environment. |
featcopilot/llm/__init__.py |
Exports SyncCopilotFeatureClient to align sync-client public surface with other backends. |
featcopilot/utils/cache.py |
Replaces silent exception swallowing with warnings on disk cache read/write/metadata failures. |
docs/user-guide/cli.md |
Adds a full CLI documentation page (subcommands, flags, output/exit-code contract, parquet notes). |
mkdocs.yml |
Adds the CLI page to the documentation nav. |
docs/getting-started/installation.md |
Updates the expected version output example to 0.3.7. |
tests/test_core.py |
Adds regression tests ensuring Feature.compute supports common builtins. |
tests/test_utils.py |
Adds regression tests asserting cache failures log warnings (read, write, metadata). |
tests/test_llm_modules.py |
Pins featcopilot.llm package exports and __all__ consistency for sync clients. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nct error Round-1 review on PR #6 raised four threads (3x copilot-pull-request-reviewer, 1x codex). All addressed: 1. **Codex P2 (feature.py:176) — comprehensions/lambdas saw NameError** Free variables inside a comprehension or lambda body are resolved against the enclosing function's globals, NOT against the caller's locals dict passed to `exec`. The previous fix put `df`/`np`/ `pd` only in `locals` so a snippet such as `[df['c'].iloc[i] for i in range(len(df))]` would still raise `NameError` and `FeatureSet.compute_all` would silently drop the feature. `Feature.compute` now uses a SINGLE shared namespace (passing one dict as both `globals` and `locals` to `exec`), so comprehensions / lambdas / generator expressions resolve `df`, `np`, `pd` and the safe builtins identically to the top-level statement scope. 2. **Copilot (feature.py:51) — `_SAFE_BUILTINS` mutation persisted** `_SAFE_BUILTINS` was passed to `exec` by reference, so a snippet that rebound entries (`__builtins__['len'] = lambda x: -999`) could leak the mutation into every subsequent `Feature.compute` call in the same process. `compute` now passes `dict(_SAFE_BUILTINS)` (a fresh shallow copy) on every call, mirroring `TransformRule._get_safe_builtins()` which already returns a fresh dict per call. Side effects are now strictly local to the single `exec` call. 3. **Copilot (feature.py:179) — misleading error when `result` missing** The docstring said `ValueError` would be raised when the executed code did not bind a `result` variable, but the implementation always surfaced `"No code defined for feature ..."` even when `self.code` was clearly present and only `result` was missing. The two branches now produce DISTINCT messages: - `self.code` empty → `"No code defined for feature ..."` - `self.code` present but no `result` → `"Feature '...' code did not produce a 'result' variable. Stored snippet must bind its output to a name called 'result'."` 4. **Copilot (docs/user-guide/cli.md:223) — misleading "by `--config llm_config`"** `--config` takes a JSON file path; `llm_config` is a key inside that JSON. Reworded the See-also bullet to "provide an `llm_config` object inside the JSON file passed to `--config`" and link back to the existing Configuration Files section. Tests added (regression locks): - `test_feature_compute_resolves_df_inside_comprehension` — list comprehension, lambda body, and generator expression all referencing `df` / `pd` / `len` / `range` succeed. - `test_feature_compute_isolates_safe_builtins_across_calls` — runs a "poisoner" snippet that rebinds `__builtins__['len']`, then asserts a clean follow-up snippet still sees the original `len`. - `test_feature_compute_distinguishes_missing_code_from_missing_result` — pins the two distinct `ValueError` messages. Local: 915 tests pass (full suite); +3 new tests vs round-0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Round 1 push at 3cd59a6 addresses all four previously-unresolved threads:
3 new regression tests pin each behavior. Local: 915 tests pass (full suite); pre-commit clean. @copilot @codex please re-review the WHOLE PR (not just the latest commit). Thanks! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3cd59a699c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Re-reviewed the full PR (both commits). All changes are correct and the test suite passes (911 passed, 6 skipped locally).
No issues found. The PR is ready to merge. |
Round-2 review on PR #6: Codex P2 noted that the user-guide CLI doc listed `--leakage-guard` choices as `warn` / `block` / `ignore` and the `info` JSON sample showed `["block", "ignore", "warn"]`, but the actual `AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS` set is `{"off", "warn", "raise"}` (per `featcopilot/transformers/sklearn_compat.py:136`) and the argparse parser validates against `sorted(SUPPORTED_LEAKAGE_GUARDS)`. Users following the doc with `--leakage-guard block` would get an argparse error rather than the described behavior. Two doc fixes: - `"supported_leakage_guards": ["block", "ignore", "warn"]` → `"supported_leakage_guards": ["off", "raise", "warn"]` in the `info` JSON sample. - `--leakage-guard` row in the transform flags table now lists the three actual values (`warn`, `raise`, `off`) with a short description of what each does. Added `test_cli_doc_leakage_guard_values_match_source_of_truth` which reads `docs/user-guide/cli.md` and asserts (a) the JSON sample contains the literal sorted-list of `SUPPORTED_LEAKAGE_GUARDS` and (b) every valid value appears as a backtick-quoted token AND no non-existent value (`block`, `ignore`) appears anywhere on the page. Future drift in either direction (renaming a guard, adding a new one, removing one) will fail this test fast. Local: 916 tests pass (full suite); pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Round 2 push at a9c25ec addresses the round-2 Codex P2 thread on Local: 916 tests pass (full suite); pre-commit clean; CI was green for the previous round-1 push and is currently re-running on a9c25ec. @copilot @codex please re-review the WHOLE PR (not just the latest commit). Thanks! |
Re-reviewed the full PR (all three commits). All changes are correct and the full test suite passes (912 passed, 6 skipped locally — up 1 from round-2 due to the new regression test). The round-2 fix in No issues found across the full PR. Ready to merge. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9c25ec5d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…cstring Round-2 review on PR #6 (4 new threads). All addressed: 1. **Codex P3 (test_core.py:162) + Copilot (test_core.py:168) — broken isolation test.** `test_feature_compute_isolates_safe_builtins_across_calls` started its attacker snippet with `import builtins`, but `__import__` is intentionally NOT in the safe-builtins whitelist, so the snippet raised at the first line and the surrounding `try/except` swallowed the failure. Result: the test would still pass even if `_SAFE_BUILTINS` were passed by reference, so it was not actually pinning the per-call-copy contract it claims to protect. Rewritten to mutate `__builtins__` directly via the dict that `exec` sees as the snippet's builtins source AND assert inside the snippet (`len(df) == -999`) that the rebinding took effect. Now requires BOTH the in-snippet mutation to succeed AND the next call to see the original `len`: a real regression check. 2. **Codex P2 (cli.md:63) — info sample omitted valid choices.** The sample showed `supported_engines: ["relational", "tabular", "text", "timeseries"]` (missing `llm`) and `supported_selection_methods: ["correlation", "importance", "mutual_info"]` (missing `chi2`, `f_test`, `xgboost`). The sample now lists the FULL sorted set from `AutoFeatureEngineer.SUPPORTED_*` and a follow-up paragraph clarifies the parquet-conditional behaviour for the I/O format arrays. Added `test_cli_doc_info_sample_matches_engines_and_selection_methods` which pins both lists against the source of truth. 3. **Copilot (feature.py:156) — misleading docstring.** The previous docstring claimed the snippet ran "without giving the snippet unrestricted access to imports / file I/O". That's incorrect: `pd` is in scope, which means `pd.read_csv` / `pd.read_parquet` / `df.to_csv` are all reachable. The fix reframes the doc accurately: `__import__` is not in the safe builtins (so plain `import foo` raises) but the runtime is NOT a security sandbox; stored snippets must come from a trusted source. Verification - 917 tests pass (full suite); pre-commit clean. - The new attacker-isolation test would FAIL if `_SAFE_BUILTINS` were passed by reference (the `-999` rebinding would leak into the next call's `[len(df)] * len(df)` and produce `[-999, -999, -999]` instead of `[3, 3, 3]`); confirmed locally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Round 2 push at e5b7a19 addresses all four round-2 threads:
Local: 917 tests pass (full suite); pre-commit clean. @copilot @codex please re-review the WHOLE PR (not just the latest commit). Thanks! |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Re-reviewed the full PR (all four commits). All changes are correct and the full suite passes (913 passed, 6 skipped — up 1 from round-3 due to the new
No issues found. The PR is ready to merge. |
PR #5 added the agent-friendly `featcopilot` CLI (`info` / `transform` / `explain`) — a new user-facing surface that warrants a minor-version bump under semver. PR #6 followed up with three correctness fixes (`Feature.compute` safe-builtins, `SyncCopilotFeatureClient` public-API consistency, `FeatureCache` exception logging) and the new CLI documentation page. This commit bumps: - `pyproject.toml::version` 0.3.7 → 0.4.0 - `docs/getting-started/installation.md` sample expected output - `docs/user-guide/cli.md` `info` JSON sample The CLI's `info --json` reports the actual installed version via `featcopilot.__version__` (read from package metadata at import time), so all three references stay in sync once installed. Local: `python -m pytest tests/test_cli.py` passes 135/135 with the bumped version (the `test_info_json_emits_supported_options` and related tests cross-check `payload["version"] == __version__`). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Post-PR-#5 sweep of the repo. Three real correctness bugs and three documentation gaps:
Bug fixes
featcopilot/core/feature.pyFeature.computecalledexec(code, {"__builtins__": {}}, ...)so any stored snippet using a Python builtin (len,range,int,sum, ...) raisedNameErrorat compute time._SAFE_BUILTINSdict mirroringTransformRule._get_safe_builtins;computenow passes it as__builtins__.featcopilot/llm/__init__.pySyncCopilotFeatureClientwas defined incopilot_client.pyand used internally but missing from package imports +__all__, whileSyncOpenAIFeatureClient/SyncLiteLLMFeatureClientwere both exported.from featcopilot.llm import SyncCopilotFeatureClientraisedImportError.__all__entry so all three sync clients have a consistent public surface.featcopilot/utils/cache.pyexcept Exception: passblocks silently swallowed disk I/O / pickle / JSON errors when on-disk cache entries were corrupted, values were unpicklable, or metadata JSON was truncated.logger.warning(...)so callers can diagnose disk issues / version mismatches from logs; the "cache is best-effort, never crash the caller" contract is preserved.Documentation fixes
docs/user-guide/cli.md(NEW) +mkdocs.ymlfeatcopilotCLI added in #5 (info/transform/explain) was undocumented in mkdocs ΓÇö only mentioned in the README.stdout=JSON,stderr=failures, exit codes), all flags, the--configexample, parquet caveat, and agentic-usage tips. Added nav entry.docs/getting-started/installation.md:82# Output: 0.1.0but currentpyproject.tomlis0.3.7.# Output: 0.3.7.Tests
tests/test_core.py::TestFeature::test_feature_compute_uses_safe_builtinsΓÇö pins the whitelist (len,range,sum,int,abs,round,min,max).tests/test_utils.pyΓÇö three newFeatureCacheregression tests for corrupted.pklreads, unpicklable values onset, and corrupted metadata JSON, each assertinglogger.warningis called viamock.patchon the module logger.tests/test_llm_modules.py::TestLLMPackageExportsΓÇö pins thefeatcopilot.llmpublic API: all three sync clients are importable directly AND listed in__all__, with identity checks against the underlying submodule definitions.Verification
python -m pytest tests/ -q: 912 passed, 2 skipped in 4m 21s on Windows / Python 3.13 (→ +6 over main's 906).Out of scope
Several speculative findings from the audit were intentionally left out:
"gpt-5.2"default model name is the project's chosen identifier inutils/models.pyand four LLM clients; changing it would be a behavioral change, not a doc fix.except Exceptioninbenchmarks/simple_models/run_simple_models_benchmark.py) is not on the contract surface and is best handled in a dedicated benchmark hardening PR.Closes the post-#5 cleanup item; no functional contract changes for end users beyond making
Feature.computeactually work for snippets that use builtins.