Skip to content

feat(cli): add plugin catalog core#618

Open
johnnygreco wants to merge 34 commits intomainfrom
johnny/feat/617/plugin-catalog-cli-core
Open

feat(cli): add plugin catalog core#618
johnnygreco wants to merge 34 commits intomainfrom
johnny/feat/617/plugin-catalog-cli-core

Conversation

@johnnygreco
Copy link
Copy Markdown
Contributor

@johnnygreco johnnygreco commented May 7, 2026

📋 Summary

This PR adds the client-side plugin package workflow for Data Designer. The new data-designer plugin command discovers package-first catalogs, shows compatibility-aware package metadata, installs and uninstalls plugin packages, and verifies the runtime plugin entry points exposed by those packages.

The implementation follows the current NVIDIA-NeMo/DataDesignerPlugins catalog contract and uses the stable NVIDIA catalog at https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json as the built-in trusted source.

🔗 Related Issue

Closes #617

🔄 Changes

Public CLI

  • Adds the singular data-designer plugin command group with list, search, info, install, uninstall, and installed.
  • Adds singular catalog management under data-designer plugin catalog list/add/remove.
  • Keeps package distributions as the only install/uninstall unit. Commands that target a package accept either the full package name, such as data-designer-calculator, or its package alias, such as calculator.
  • Shows the runtime plugins exposed by a package in catalog and installed views without treating runtime plugin names as install targets.
  • Includes package descriptions in catalog list/search output.

Catalog Contract And Discovery

  • Adds schema v2 catalog models and strict validation for package-first catalog payloads: package metadata, docs URLs, install requirements, compatibility constraints, and nested runtime plugin entry points.
  • Adds the built-in nvidia catalog, user-managed catalog aliases in plugin_catalogs.yaml, and DATA_DESIGNER_DEFAULT_PLUGIN_CATALOG_URL for QA/staging overrides.
  • Supports remote catalog URLs, GitHub repository/tree/blob URLs, local catalog files, and local catalog directories, normalizing each to the catalog/plugins.json layout used by the plugin repo.
  • Caches catalog payloads by alias and URL hash, writes cache files atomically, and only falls back to stale cache data when the catalog source cannot be fetched.

Compatibility And Package Management

  • Evaluates catalog-declared Python and Data Designer compatibility before install, hides incompatible packages by default, and still supports dry-run inspection of incompatible install plans.
  • Builds manager-specific install/uninstall plans for uv project installs, uv pip environment installs, and pip-only environments.
  • Uses uv add --active when the user is in an active virtual environment with a user pyproject.toml, so plugin packages are recorded in the project.
  • Keeps pip support for constrained or sandboxed environments where uv is unavailable; auto mode prefers supported uv and falls back to pip when needed.
  • Protects the active Data Designer package family (data-designer, data-designer-config, and data-designer-engine) from being reinstalled or upgraded as plugin package dependencies.
  • Verifies every declared runtime entry point after install and verifies removal after uninstall.

Docs And Tests

  • Updates the CLI architecture docs and CLI README for package-first catalogs, package aliases, install/uninstall behavior, cache behavior, and catalog commands.
  • Adds unit coverage across command dispatch, controller UX, catalog schema validation, repository loading/caching, package alias resolution, compatibility filtering, install/uninstall planning, entry-point verification, and lazy CLI loading.
  • Adds upstream DataDesignerPlugins catalog fixtures as consumer contract tests plus a network-gated production catalog smoke test.
  • Refreshes the existing packaging lockfile entry used by the new catalog/compatibility code.

👀 Review Focus

🧪 Testing

Local verification:

  • uv run --package data-designer pytest packages/data-designer/tests/cli -q (690 passed, 1 skipped, 86 warnings)
  • make format-check
  • make lint
  • git diff --check
  • Temporary-directory integration gauntlet covering stable/local catalogs, aliases/full names, runtime-name rejection, compatibility blocking, uv project installs/uninstalls, uv environment installs/uninstalls, pip-only installs/uninstalls, Data Designer version preservation, and temporary constraint cleanup (145 checks)

✅ Checklist

  • Unit tests added/updated
  • Architecture and CLI docs updated
  • Consistent with the current DataDesignerPlugins package catalog and package-index publishing flow
  • Keeps install, uninstall, and info package-only while still displaying runtime plugin metadata

Description updated with AI

@johnnygreco johnnygreco requested a review from a team as a code owner May 7, 2026 20:58
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Code Review: PR #618 — feat(cli): add plugin catalog core

Summary

Adds a data-designer plugins command group that lets users browse, search, inspect, and install plugins from "tap" catalogs (built-in NVIDIA tap plus user-added aliases), plus a strict schema-v2 validator that matches the DataDesignerPlugins PR #36 contract. The implementation follows the repo's existing command → controller → service → repository CLI layering, adds packaging>=24,<27 as a runtime dep, and ships ~850 lines of test coverage across the four new layers.

Overall the PR is well-structured, strictly validated, and thoughtfully tested. Comments below are mostly minor; one high-value item (entry point group constant usage) and a few smaller polish items.

Findings

Correctness

  • plugin_catalog.py:900 and plugin_catalog_service.py:1796 hard-code "data_designer.plugins" instead of using the PLUGIN_ENTRY_POINT_GROUP constant defined in plugin_catalog.py:839. Three call-sites with the same literal invite drift — swap the string for the constant (including the PluginEntryPointInfo.group default and importlib.metadata.entry_points(group=...) call).
  • PluginTapRepository.load() (repositories/plugin_tap_repository.py:1359) swallows every exception and returns None. A corrupted plugin_taps.yaml silently disappears from the user's config — they get the built-in tap back and no error. Consider narrowing the catch (at least to OSError | yaml.YAMLError | ValidationError) and surfacing a warning so the user knows their file didn't parse.
  • _load_cached_catalog (repositories/plugin_tap_repository.py:1473) also catches bare Exception, which hides JSON corruption / partial writes. A print_warning on fall-through would help debuggability without changing behavior.
  • PluginCatalogController exception handling is broad. run_search, _list_entries_or_exit, _get_entry_or_exit, run_taps_remove, etc. catch bare Exceptionprint_error(str(e)). Prefer catching the known types (PluginCatalogError, ValueError, URLError, OSError) so genuine bugs (e.g. AttributeError during refactors) don't turn into vague user-facing messages.
  • run_taps_add double-catches ValidationError and then Exception. Since the repository only raises ValueError (duplicate) and Pydantic only raises ValidationError, the bare Exception branch is dead — remove it or narrow it.
  • plugins_callback's dummy _ = tap is load-bearing but unobvious. A one-line comment ("# read via ctx.parent.params in subcommands") would save the next reader a trip to Click's internals. (Feedback instruction says "no comments unless WHY is non-obvious" — this qualifies.)
  • _resolve_path_source (services/plugin_install_service.py:2029) assumes the tap URL points to <root>/catalog/plugins.json for path sources. This matches normalize_tap_location, but a user who passes a local catalog JSON named something else gets Path.parent which may or may not be what they want. Catalog-level _validate_package_path already blocks .. traversal, so this is a UX rather than a security issue — worth a test for the non-canonical tap location case.

Schema validation

  • Strict key sets (CATALOG_PLUGIN_KEYS etc.) require exact equality. Good for the v2 freeze, but note: the Pydantic models carry ConfigDict(extra="allow") which becomes unreachable once the strict validator runs first. Either drop extra="allow" (defensive; strict is already enforced) or keep the asymmetry deliberately and document why in the module docstring. Current code works but the intent is ambiguous.
  • _catalog_data_designer_compatibility enforces specifier == str(requirement.specifier) exactly. Catalog authors who write data-designer (>=0.5.7) in requirement and >=0.5.7 in specifier will hit spurious mismatches if Requirement's canonical form differs from what they wrote. Likely fine in practice (tap schema v2 controls the format), but worth noting to the tap-producer side.
  • Catalog size limit is exactly 1 * 1024 * 1024. A 1 MB cap for a JSON catalog is reasonable but quite tight if the NVIDIA tap grows. Consider 4–8 MB, or document the cap in the tap schema so producers know.

Security

  • urlopen(request, timeout=10) with 1 MB read cap is solid: no shell exec, size-capped read, http/https scheme guard, explicit timeout. ✓
  • Subprocess install uses list-form subprocess.run(command, check=False) — no shell, no injection. ✓
  • PEP 508 direct references (git install target) are built from validated fields (url scheme-checked, subdirectory passed through _validate_package_path). ✓
  • Untrusted-tap install warning is printed but does not block. That matches the design (force confirmation via confirm_action(default=False)), but reviewers should note that --yes on an untrusted tap skips the prompt with only a yellow warning. Consider requiring an explicit --allow-untrusted flag when combining --yes with a non-trusted tap.
  • Remote fetch follows redirects implicitly via urlopen's default HTTPRedirectHandler, which restricts to http/https/ftp — no file:// escape. ✓

Tests

  • Coverage is broad: command delegation, controller branching (dry-run, incompatible, force, trusted/untrusted), repository (cache keying, zero-TTL refresh, schema rejection, case-insensitive aliases, duplicate runtime names), service (compatibility evaluation, grouping, installed entry points), install service (pypi/git/path, uv vs pip, verification flow).
  • Missing: no test exercises _fetch_remote_catalog error paths (HTTP 4xx/5xx, oversized body, decode failure). A single patch("urllib.request.urlopen") test per branch would round this out.
  • Missing: no test for the "cache fallback when fresh fetch fails" branch in load_catalog. That's the main reason to have a cache — worth one direct test.

Style / minor

  • Several raise typer.Exit(code=1) blocks follow except ... as e: but don't chain (from e). The project convention appears to be bare raise typer.Exit(...) for CLI errors, so this is consistent, but the lost traceback makes debug logs less useful. Consider a debug-mode env var that re-raises.
  • plugin_catalog.py is 490 lines with 20+ module-level constants and helper _-prefixed validators. Consider splitting the validator helpers into plugin_catalog/_validation.py in a follow-up — not blocking.
  • InstalledPluginInfo, InstallPlan, CompatibilityResult are @dataclass(frozen=True) while the catalog models are Pydantic. Fine, but an out-of-band explanation ("internal DTOs vs serialized catalog schema") in the module docstring would help.
  • _fetch_catalog_payload return type is dict; under the project's modern-typing rule it should be dict[str, object] (already done elsewhere in this PR).

Verdict

Approve with minor requests. The architecture is sound, the schema validation is appropriately strict for a v2 contract freeze, and the test coverage is solid. Action items in priority order:

  1. Replace three "data_designer.plugins" string literals with PLUGIN_ENTRY_POINT_GROUP.
  2. Narrow the bare except Exception catches in the controller and repository; drop the dead Exception branch in run_taps_add.
  3. Surface (don't swallow) load errors for plugin_taps.yaml and cache files.
  4. Add a couple of tests for remote-fetch error paths and cache fallback.
  5. Consider --allow-untrusted as a stronger guard when --yes + untrusted tap combine.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR introduces the data-designer plugin command group, providing a full client-side plugin catalog workflow: browsing, searching, installing, uninstalling, and verifying runtime entry points from package-first catalogs. The implementation covers schema v2 catalog validation, multi-source URL normalization, cache management, and both uv and pip install paths with Data Designer package-family protection.

  • New modules: plugin_catalog.py (schema/models), plugin_catalog_repository.py (alias persistence, caching, URL normalization), plugin_catalog_service.py (compatibility evaluation, entry-point discovery), plugin_install_service.py (uv/pip plan building and verification), plugin_catalog_controller.py (UX/dispatch), and commands/plugin.py (CLI surface).
  • Catalog contract: strict schema v2 key validation (additive changes require a version bump), duplicate package/entry-point deduplication, specifier/marker cross-consistency enforcement, and stale-cache fallback when the remote source is unreachable.
  • Install safety: Data Designer package family is protected from unintended upgrades via --excludes - (uv environment), --no-install-package (uv project), and a pip constraint file; post-install entry-point verification uses importlib.metadata directly to avoid coupling on catalog name vs. runtime discriminator.

Confidence Score: 5/5

Safe to merge; all previously flagged issues have been addressed and no new defects were found in the current HEAD.

The plugin catalog workflow is fully self-contained with no mutations to existing code paths. The schema validator, cache layer, install service, and compatibility evaluator all behave correctly: strict catalog key validation is intentional by design, the atomic temp-file rename correctly cleans up on both success and failure, the uv-environment excludes list is passed via stdin as documented, and post-install verification uses importlib.metadata entry points rather than the registry discriminator. The 690-test suite covers commands, controllers, catalog schema, repository caching, compatibility logic, alias resolution, install/uninstall planning, and entry-point verification.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer/src/data_designer/cli/plugin_catalog.py Schema v2 models, strict catalog validator, and shared dataclasses; deduplication keyed correctly by entry_point.name; specifier/marker cross-consistency enforced.
packages/data-designer/src/data_designer/cli/repositories/plugin_catalog_repository.py Catalog alias persistence, URL normalization (including GitHub tree subdirectory fix), atomic PID-keyed cache writes, and stale-cache fallback are all correct.
packages/data-designer/src/data_designer/cli/services/plugin_install_service.py uv/pip plan construction, active-project detection, Data Designer protection (excludes/constraints/no-install-package), and entry-point verification via importlib.metadata are correct.
packages/data-designer/src/data_designer/cli/controllers/plugin_catalog_controller.py Compatibility gate, dry-run path, install/uninstall dispatch, and empty-list UX guidance are all implemented correctly.
packages/data-designer/src/data_designer/cli/services/plugin_catalog_service.py Compatibility evaluation correctly merges the supplied python_version override into the default PEP 508 environment; alias resolution, grouping, and installed-plugin listing look correct.
packages/data-designer/src/data_designer/cli/commands/plugin.py CLI surface; parent-catalog alias propagation and unused-catalog warnings are correctly implemented.
packages/data-designer/src/data_designer/cli/lazy_group.py Adds parse_args override to restore no_args_is_help behavior for lazy-loaded subcommand groups; correct and non-breaking.
packages/data-designer/src/data_designer/cli/main.py Wires plugin_app and plugin_catalog_app into the main CLI; plugin callback correctly captures --catalog for propagation to subcommands.

Reviews (17): Last reviewed commit: "remove plugin catalog trust flag" | Re-trigger Greptile

Comment thread packages/data-designer/src/data_designer/cli/plugin_catalog.py Outdated
@johnnygreco johnnygreco force-pushed the johnny/feat/617/plugin-catalog-cli-core branch from 4549bd7 to 3916648 Compare May 8, 2026 02:43
Comment thread packages/data-designer/src/data_designer/cli/services/plugin_install_service.py Outdated
@johnnygreco
Copy link
Copy Markdown
Contributor Author

Greptile follow-up addressed in 955a675.

What changed:

  • Aligned the plugin CLI UX/docs around catalog packages and runtime plugins, matching the current DataDesignerPlugins contract.
  • Synced validation and repository behavior with the merged plugin repo updates: package-first schema, optional install.index_url handling, duplicate canonical package rejection, catalog-directory URL/path normalization, and untrusted env-overridden default catalogs.
  • Strengthened install verification to match entry-point name, target value, and owning distribution for all declared runtime plugins in a package.
  • Replaced broad controller/repository error handling with known catalog/config error paths and added coverage for invalid registries, remote fetch failures, stale-cache fallback, package-first display, compatibility filtering, and lazy CLI wiring.
  • Confirmed tracked source/docs no longer contain the old vocabulary.

Verified with:

  • uv run --package data-designer pytest packages/data-designer/tests/cli -q
  • make format-check
  • make lint
  • Live validation of https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json against the local catalog validator

Comment thread packages/data-designer/src/data_designer/cli/plugin_catalog.py
"""Return a catalog entry by runtime plugin name or package name."""
entries = self.list_entries(catalog_alias, refresh=refresh, include_incompatible=True)
canonical_name = canonicalize_name(name)
matches = [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate from the casing issue already noted in the PR summary: this combines exact runtime-plugin matches and package-name matches, then picks by sort order. If package alpha exists and another package exposes runtime plugin alpha, plugins install alpha can select the package match instead of the exact runtime name. Could separate the two match sets and either prefer runtime names explicitly or raise an ambiguity error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in fb27046, and the CLI path is now package-only as well. plugins install, plugins info, and plugins uninstall resolve through get_package_entries() using a full package name or package alias, so runtime plugin names no longer provide an install affordance. I also tightened get_entry() itself to separate runtime-plugin matches from package-name matches and prefer exact runtime-plugin matches, with a regression test for the alpha collision.

@eric-tramel
Copy link
Copy Markdown
Contributor

While testing the plugin install flow, I hit one more install-safety issue.

The plugin package installs successfully, but the resolver can also replace the Data Designer packages in the current environment. In my test, installing data-designer-github changed the local/editable data-designer, data-designer-config, and data-designer-engine packages to the published 0.5.9 wheels, so the plugin command disappeared immediately afterward.

A simple approach for the uv path would be to treat the Data Designer host packages as protected during plugin installs, e.g. generate/pass an excludes file containing:

data-designer
data-designer-config
data-designer-engine

and include it in the install plan:

uv pip install \
  --python <current-python> \
  --excludes <protected-packages.txt> \
  --default-index https://pypi.org/simple/ \
  --index <catalog-index> \
  <plugin-requirement>

I tried that shape locally with --dry-run; it installs only the plugin package/deps and no longer plans to uninstall/reinstall the Data Designer packages. It would also be worth adding a regression test around the generated install plan and, ideally, a pre/post guard that errors or warns if those protected packages would change.

For pip, I do not think there is an equally clean equivalent to uv --excludes, so the safest near-term policy may be to require uv for catalog installs or make the pip path do an explicit dry-run/report guard before mutating the environment.

@johnnygreco
Copy link
Copy Markdown
Contributor Author

johnnygreco commented May 9, 2026

@eric-tramel thanks, this install-safety issue is now addressed in the latest pushed commits.

What changed:

  • uv pip install environment mode now protects the full Data Designer package family with --excludes - and stdin entries for:
    • data-designer
    • data-designer-config
    • data-designer-engine
  • uv add project mode protects the same package family with --no-install-package for each distribution, and also uses --no-install-project so installing a plugin does not build/install the user's current project as a side effect.
  • The pip path is still supported for pip-only/sandboxed environments, but now uses a process-scoped temporary constraints file pinning all three installed Data Designer distributions to their current versions.
  • uv >= 0.6.0 is now required for uv plugin installs; auto mode falls back to pip with a warning when uv is unavailable or too old, while explicit --manager uv errors clearly.

Coverage added/verified:

  • Unit tests assert the generated uv and pip install plans include the protected package family.
  • Unit tests assert the pip constraint file is temporary and cleaned up after install.
  • The temporary-directory integration gauntlet verified uv project, uv environment, and pip-only installs/uninstalls while confirming the installed Data Designer versions do not change.

I did not add a separate pre/post mutation guard in code because the resolver commands now carry the protection directly, and pip has the equivalent constraints protection for the pip-only case. Happy to add an extra post-install belt-and-suspenders version check later if we decide the extra noise is worth it, but I don't think it should block this PR now.

johnnygreco added 13 commits May 9, 2026 01:19
Add typed catalog and tap models, persistent tap storage, cached
catalog loading, compatibility evaluation, install plan generation,
and runtime plugin discovery helpers.

Refs #617
Wire list, search, info, install, installed, and tap management
commands through the existing command-controller CLI pattern.

Refs #617
Add regression coverage for tap caching, catalog compatibility,
installer command generation, local path resolution, and Typer command
delegation.

Refs #617
Validate tap catalogs against the schema v2 contract used by
NVIDIA-NeMo/DataDesignerPlugins#36, including source union fields,
docs URLs, package paths, compatibility metadata, and unique runtime
plugin names.

Derive Git install targets as package-qualified PEP 508 direct
references so git tap entries install the package described by the
catalog source metadata.

Refs #617
- Invalidate import caches before post-install entry point verification
- Make tap aliases case-insensitive and cache catalogs by alias plus URL
- Prefer compatible catalog entries before falling back to forced installs
- Clarify unused --tap behavior and list installed entry points without imports
- Add direct controller coverage and update CLI plugin documentation

Refs #617
Fetch install targets before compatibility filtering so the controller
owns the final --force decision and the incompatible install guard stays
reachable.

Refs #617
Apply ruff formatting to the plugin command and tap repository tests so
CI format checks pass on the PR merge commit.

Refs #617
Key catalog duplicate detection by entry_point.name so distinct catalog
entries cannot register the same runtime plugin name.

Refs #617
- adopt catalog terminology for plugin source aliases
- parse package-first plugin catalog metadata from the plugin repo
- install package requirements with optional catalog indexes
@johnnygreco johnnygreco force-pushed the johnny/feat/617/plugin-catalog-cli-core branch from 8cda7f4 to ff50a1e Compare May 9, 2026 01:19
@johnnygreco johnnygreco force-pushed the johnny/feat/617/plugin-catalog-cli-core branch from ff50a1e to 13a1c9b Compare May 9, 2026 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build Data Designer plugin catalog CLI core

3 participants