2 changes: 2 additions & 0 deletions AGENTS.md
@@ -507,6 +507,8 @@ When working with column configurations, understand these key types:
- **`ExpressionColumnConfig`**: Expression-based derived columns (Python eval or Jinja2)
- **`ValidationColumnConfig`**: Validation results (Python, SQL, Code, Remote validators)
- **`SeedDatasetColumnConfig`**: Data from seed datasets
- **`EmbeddingColumnConfig`**: Embedding generation for text columns using a specified model
- **`CustomColumnConfig`**: Custom user-defined column generators via `@custom_column_generator` decorator

See [packages/data-designer-config/src/data_designer/config/column_configs.py](packages/data-designer-config/src/data_designer/config/column_configs.py) for detailed schemas.

@@ -60,6 +60,8 @@
from data_designer.config.run_config import RunConfig # noqa: F401
from data_designer.config.sampler_constraints import ( # noqa: F401
ColumnInequalityConstraint,
ConstraintType,
InequalityOperator,
Comment on lines +63 to +64 (Contributor, Author):

ugh, these leaked over from a different PR 🤦‍♂️

ScalarInequalityConstraint,
)
from data_designer.config.sampler_params import ( # noqa: F401
@@ -168,6 +170,8 @@
"RunConfig": (f"{_MOD_BASE}.run_config", "RunConfig"),
# sampler_constraints
"ColumnInequalityConstraint": (_MOD_SAMPLER_CONSTRAINTS, "ColumnInequalityConstraint"),
"ConstraintType": (_MOD_SAMPLER_CONSTRAINTS, "ConstraintType"),
"InequalityOperator": (_MOD_SAMPLER_CONSTRAINTS, "InequalityOperator"),
"ScalarInequalityConstraint": (_MOD_SAMPLER_CONSTRAINTS, "ScalarInequalityConstraint"),
# sampler_params
"BernoulliMixtureSamplerParams": (_MOD_SAMPLER_PARAMS, "BernoulliMixtureSamplerParams"),
79 changes: 79 additions & 0 deletions skill/README.md
@@ -0,0 +1,79 @@
# Data Designer Skill for Claude Code

A [Claude Code skill](https://docs.anthropic.com/en/docs/claude-code/skills) that teaches Claude how to generate synthetic datasets using [NVIDIA NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner).

When activated, Claude can design and build complete data generation pipelines (choosing the right column types, writing prompts, wiring up dependencies, and iterating on previews), all from a natural language description of the dataset you want.

## What's in the skill

```
.claude/skills/data-designer/
├── SKILL.md                  # Core skill definition and workflow guide
├── references/
│   ├── api_reference.md      # Complete API documentation
│   └── advanced_patterns.md  # Custom columns, MCP tools, multimodal, etc.
├── examples/                 # 5 runnable pattern-reference scripts
├── scripts/                  # Discovery tools for API introspection
└── hooks/                    # Session startup check, ruff lint, ty type-check
```

## Prerequisites

- **[uv](https://docs.astral.sh/uv/getting-started/installation/)**: used for environment management and required by the skill's session hooks
- **[Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview)**: the CLI that runs the skill
- **Python 3.10+**: any version from 3.10 to 3.13 works (`uv` will install it for you)
- **An LLM provider API key**: e.g., an [NVIDIA API key](https://build.nvidia.com/) (`NVIDIA_API_KEY`)
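
The prerequisites above can be checked with a short script before you start. This is a minimal sketch and not part of the skill: the function name `check_prerequisites` is hypothetical, it assumes the Claude Code CLI is installed as `claude` (the command used in step 4), and it only checks for `NVIDIA_API_KEY`, which is just the example key named above.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of problems; an empty list means every check passed.

    `env`, `which`, and `version` are injectable only to make the function
    easy to test; by default the real environment is inspected.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []

    # uv and the Claude Code CLI must both be on PATH.
    for tool in ("uv", "claude"):
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")

    # The list above gives 3.10 through 3.13 as the supported range.
    if not ((3, 10) <= version <= (3, 13)):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.10-3.13")

    # Assumption: the NVIDIA key is the one in use; other providers
    # would need a different variable name here.
    if not env.get("NVIDIA_API_KEY"):
        problems.append("NVIDIA_API_KEY is not set")

    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment. Since `uv` can install Python for you, a failed version check on the system interpreter is not necessarily a blocker.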

## Quick start

### 1. Set up a project and download the skill

```bash
mkdir my-project && cd my-project
mkdir -p .claude/skills
```

Download the `skill/data-designer` folder into `.claude/skills/data-designer`:

```bash
# with curl
curl -L https://github.com/NVIDIA-NeMo/DataDesigner/archive/refs/heads/main.tar.gz \
| tar xz --strip-components=2 -C .claude/skills "DataDesigner-main/skill/data-designer"

# or with wget
wget -qO- https://github.com/NVIDIA-NeMo/DataDesigner/archive/refs/heads/main.tar.gz \
| tar xz --strip-components=2 -C .claude/skills "DataDesigner-main/skill/data-designer"
```
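
If neither `curl` nor `wget` is available, the same download can be approximated with Python's standard library. This is a sketch under assumptions, not an official installer: the helper names `extract_subtree` and `download_skill` are hypothetical, and the archive layout (`DataDesigner-main/skill/data-designer`) is taken from the commands above.

```python
import io
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath

ARCHIVE_URL = "https://github.com/NVIDIA-NeMo/DataDesigner/archive/refs/heads/main.tar.gz"
SKILL_PREFIX = "DataDesigner-main/skill/data-designer"


def extract_subtree(archive, prefix, dest, strip=2):
    """Extract files under `prefix` into `dest`, dropping `strip` leading
    path components (mirrors `tar --strip-components=2 -C dest prefix`)."""
    dest = Path(dest)
    written = []
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if member.name != prefix and not member.name.startswith(prefix + "/"):
                continue  # outside the requested subtree
            parts = PurePosixPath(member.name).parts
            if ".." in parts:
                continue  # refuse paths that could escape `dest`
            target = dest.joinpath(*parts[strip:])
            if member.isdir():
                target.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src:
                    target.write_bytes(src.read())
                # Keep permission bits so hook scripts stay executable.
                target.chmod(member.mode & 0o777)
                written.append(target)
            # Symlinks and other special entries are skipped in this sketch.
    return written


def download_skill(dest=".claude/skills", url=ARCHIVE_URL):
    """Download the main-branch tarball into memory and extract the skill."""
    with urllib.request.urlopen(url) as response:
        archive = io.BytesIO(response.read())
    return extract_subtree(archive, SKILL_PREFIX, dest)
```

Running `download_skill()` from the project root should leave the files under `.claude/skills/data-designer`, the same location the `tar` commands write to.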

### 2. Create a Python environment and install Data Designer

```bash
uv venv --python 3.13
source .venv/bin/activate
uv pip install --pre data-designer
```

> **Note:** The `--pre` flag installs the latest pre-release.

### 3. Set up your default model providers and models

Use the Data Designer CLI to configure your LLM provider(s) and model(s) interactively:

```bash
# Configure a provider (endpoint, API key, etc.)
data-designer config providers

# Configure model(s) that use the provider
data-designer config models

# Verify your configuration
data-designer config list
```

The CLI walks you through each setting with an interactive prompt. You only need to do this once; configurations are saved to `~/.data-designer/`.
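
To confirm that something was actually written, you can list the contents of that directory. The sketch below assumes nothing about the file names or formats the CLI uses; `list_saved_configs` is a hypothetical helper that only enumerates whatever files exist there.

```python
from pathlib import Path


def list_saved_configs(config_dir=None):
    """Return sorted relative paths of all files under the config directory.

    Defaults to ~/.data-designer, the location named above. Returns an
    empty list if the directory does not exist yet.
    """
    root = Path.home() / ".data-designer" if config_dir is None else Path(config_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

An empty result after running the `config` commands suggests the configuration was saved somewhere else or not saved at all; `data-designer config list` remains the authoritative check.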

### 4. Launch Claude Code

```bash
claude
```