12 changes: 12 additions & 0 deletions .claude/rules/skill-guidance.md
@@ -16,6 +16,8 @@ This project has guided skills for common workflows. **Proactively suggest the r
| `/review-pr` | Review a PR, audit code changes, check PR quality, validate a PR against standards |
| `/adr` | Record an architecture decision, choose between frameworks/libraries/patterns, query past decisions |
| `/scaffold-snowflake-connector` | Add a new Snowflake-connector data source or integration |
| `/packages-worker-setup` | First-time setup of packages-db and github-repos-enricher for a new engineer |
| `/packages-worker-add-entrypoint` | Scaffold a new sibling worker inside packages_worker (npm, OSV, scorecard, etc.) |

## Trigger Phrases

@@ -45,3 +47,13 @@ This project has guided skills for common workflows. **Proactively suggest the r
**`/scaffold-snowflake-connector`** — match any of these intents:
- "Add a new Snowflake connector", "New integration for [platform]"
- "Scaffold a new data source", anything about adding a platform to `snowflake_connectors`

**`/packages-worker-setup`** — match any of these intents:
- "Set up packages worker", "how do I run the enricher", "first time on this branch"
- "Get packages-db running", "packages-db won't start", "ENRICHER_GITHUB_TOKENS"
- Any first-time setup question specific to `packages_worker` or `packages-db`

**`/packages-worker-add-entrypoint`** — match any of these intents:
- "Add a new packages worker", "scaffold a sibling worker", "new entry point in packages_worker"
- "Add npm ingestion", "add OSV worker", "add scorecard runner"
- Any request to create a new `src/bin/*.ts` worker inside `packages_worker`
129 changes: 129 additions & 0 deletions .claude/skills/packages-worker-add-entrypoint/SKILL.md
@@ -0,0 +1,129 @@
---
name: packages-worker-add-entrypoint
description: >
  Scaffold a new sub-worker inside packages_worker (npm, deps.dev, osv, scorecard,
  etc.) following the single-service multi-entry-point structure. Use when: "add a
  new packages worker", "scaffold a sub-worker in packages_worker", "new worker for
  packages-db", "add npm worker", "add OSV worker", "add deps.dev worker".
allowed-tools: Read, Write, Edit, Bash, AskUserQuestion, Glob
---

# packages-worker — Add a New Sub-worker

You are adding a new data-ingestion worker to `services/apps/packages_worker/`.
The structure follows the same pattern as `backend/` (where `api.ts` and
`job-generator.ts` share one Dockerfile): one npm package, one Docker image,
each worker keeps its logic in its own `src/<worker>/` directory and has its own entry point in `src/bin/`.

```
services/apps/packages_worker/
  src/
    bin/
      packages-worker.ts          ← parent stub
      github-repos-enricher.ts    ← existing worker
      <name>.ts                   ← entry point you will create
    github/                       ← existing worker logic
    <worker>/                     ← directory you will create
      index.ts                    ← main logic for this worker
      types.ts
    config.ts                     ← shared: add your config getter here
    db.ts                         ← shared: do not modify
```

## Step 1 — Gather requirements

Ask the engineer for:

1. **Worker name** (kebab-case) — e.g. `npm-sync`, `osv-sync`, `scorecard-runner`. Used as the entry point filename (`src/bin/<name>.ts`) and docker-compose service name.
2. **Worker directory name** (short, lowercase) — e.g. `npm`, `osv`, `scorecard`. Becomes `src/<worker>/`.
3. **What it does** — what data it fetches/writes, what table(s) in packages-db it reads from and writes to.
4. **External API or data source** (if any) — URL, auth method, rate-limit characteristics.
5. **Required env vars** beyond the shared DB vars — e.g. `NPM_API_URL`, `OSV_API_KEY`.

Do not proceed until you have answers to 1–3.

## Step 2 — Read existing files first

```bash
cat services/apps/packages_worker/src/bin/github-repos-enricher.ts
cat services/apps/packages_worker/src/config.ts
cat services/apps/packages_worker/package.json
cat scripts/services/github-repos-enricher.yaml
```

These are the canonical references. Do not deviate from the patterns you see there.

## Step 3 — Scaffold the files

### 3a. Worker directory — `services/apps/packages_worker/src/<worker>/`

Create the directory with at minimum:

**`types.ts`** — types specific to this worker (input/output shapes, error kinds if calling an external API).

**`index.ts`** — the main logic function(s) this worker runs. What goes here depends entirely on what the worker does — do not force a loop shape if it does not fit. Discuss with the engineer what the execution model should be (continuous loop, one-shot batch, event-driven, etc.) and implement accordingly.

Add any additional files the worker needs (e.g. an API client, a DB query helper). All DB access uses inline pg-promise SQL via `qx.select` / `qx.result` / `qx.none` — do not add files to `services/libs/data-access-layer`.
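
If the engineer picks the continuous-loop model, `index.ts` might look roughly like the sketch below. It is an illustration only: the `QueryExecutor` interface is a guess at the shape of `qx` (only the method names `select` and `result` come from this guide), and the `example_items` table, its columns, and `runExampleWorker` are placeholders rather than anything in packages-db.

```typescript
// Hypothetical stand-in for the `qx` object; the signatures are assumed, not confirmed.
interface QueryExecutor {
  select(sql: string, params?: Record<string, unknown>): Promise<Array<Record<string, unknown>>>
  result(sql: string, params?: Record<string, unknown>): Promise<number>
}

interface LoopOptions {
  batchSize: number
  idleSleepMs: number
  isShuttingDown: () => boolean
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Continuous-loop model: fetch a batch, process it, sleep when there is nothing to do.
// In src/<worker>/index.ts this function would be exported for the entry point to import.
async function runExampleWorker(qx: QueryExecutor, opts: LoopOptions): Promise<number> {
  let processed = 0
  while (!opts.isShuttingDown()) {
    // `example_items` is a placeholder table name.
    const rows = await qx.select(
      'SELECT id FROM example_items WHERE processed_at IS NULL LIMIT $(limit)',
      { limit: opts.batchSize },
    )
    if (rows.length === 0) {
      await sleep(opts.idleSleepMs)
      continue
    }
    for (const row of rows) {
      // Stop between rows so a shutdown signal is honoured quickly.
      if (opts.isShuttingDown()) break
      await qx.result('UPDATE example_items SET processed_at = now() WHERE id = $(id)', {
        id: row.id,
      })
      processed += 1
    }
  }
  return processed
}
```

Passing `isShuttingDown` in as a callback keeps the loop unaware of signal handling, which stays in the entry point. A one-shot batch worker would drop the `while` loop and the idle sleep.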

### 3b. Entry point — `services/apps/packages_worker/src/bin/<name>.ts`

Follow the structure of `github-repos-enricher.ts`:
- Import `getServiceLogger` from `@crowd/logging`
- Import your worker's config getter from `../config` and `getPackagesDb` from `../db`
- Import your worker's main function from `../<worker>/index`
- Set `liveFilePath` / `readyFilePath` to `../tmp/<name>-live.tmp` / `../tmp/<name>-ready.tmp`
- Handle SIGINT / SIGTERM with a `shuttingDown` flag
- In `main()`: call config getter → validate any required tokens/keys → `const qx = await getPackagesDb()` → `await qx.selectOne('SELECT 1')` → `fs.mkdirSync` for the tmp dir → `setInterval` writing probe files every 5000ms → call your worker's main function → `clearInterval` → `process.exit(0)`
- Fatal handler: `main().catch(err => { log.error({ err }, '<name> fatal error'); process.exit(1) })`
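
The startup sequence in the list above can be sketched as follows. This is not a copy of `github-repos-enricher.ts`: every member of `EntryPointDeps` is a hypothetical stand-in for the real imports (`../config`, `../db`, `../<worker>/index`) so that the sketch runs on its own, and writing the probe files once before the first interval tick is a choice made here, not something the guide prescribes.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Hypothetical stand-ins; a real entry point imports these instead of receiving them.
interface EntryPointDeps {
  name: string // worker name, e.g. 'npm-sync'
  tmpDir: string // a real entry point would resolve '../tmp' relative to src/bin
  getConfig: () => { apiKey: string } // stand-in for get<Worker>Config()
  getDb: () => Promise<{ selectOne(sql: string): Promise<unknown> }> // stand-in for getPackagesDb()
  runWorker: (isShuttingDown: () => boolean) => Promise<void> // stand-in for the worker's main function
}

async function main(deps: EntryPointDeps): Promise<void> {
  let shuttingDown = false
  const requestShutdown = () => {
    shuttingDown = true
  }
  process.on('SIGINT', requestShutdown)
  process.on('SIGTERM', requestShutdown)

  let probeTimer: ReturnType<typeof setInterval> | undefined
  try {
    // Fail fast: validate config, then run a trivial query to prove the DB is reachable.
    const config = deps.getConfig()
    if (!config.apiKey) throw new Error(`${deps.name}: API key is empty`)
    const qx = await deps.getDb()
    await qx.selectOne('SELECT 1')

    // Probe files: written once up front, then refreshed every 5000 ms.
    fs.mkdirSync(deps.tmpDir, { recursive: true })
    const liveFilePath = path.join(deps.tmpDir, `${deps.name}-live.tmp`)
    const readyFilePath = path.join(deps.tmpDir, `${deps.name}-ready.tmp`)
    const writeProbes = () => {
      const stamp = new Date().toISOString()
      fs.writeFileSync(liveFilePath, stamp)
      fs.writeFileSync(readyFilePath, stamp)
    }
    writeProbes()
    probeTimer = setInterval(writeProbes, 5000)

    // The worker polls the flag and returns once it sees a shutdown request.
    await deps.runWorker(() => shuttingDown)
  } finally {
    if (probeTimer) clearInterval(probeTimer)
    process.off('SIGINT', requestShutdown)
    process.off('SIGTERM', requestShutdown)
  }
}
```

A real `src/bin/<name>.ts` would then call `main()` with the actual imports, exit with code 0 on success, and attach the fatal handler shown in the last bullet.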

### 3c. Config additions — `services/apps/packages_worker/src/config.ts`

Read the file first, then add a `get<Worker>Config()` function:
- Use `requireEnv(name)` for string vars, `requireEnvInt(name)` for integers
- No defaults, no `?? undefined` — the process must refuse to start on missing config
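
As a rough illustration of the fail-fast rule, a config getter could look like the sketch below. The real `requireEnv` and `requireEnvInt` already exist in `config.ts` and may behave differently; the versions here are assumed re-implementations so the example is self-contained. In particular, treating an empty string as missing is an assumption, and `NPM_SYNC_BATCH_SIZE` and `getNpmSyncConfig` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>

// Assumed behaviour: throw when the variable is unset or blank.
function requireEnv(name: string, env: Env = process.env): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required env var: ${name}`)
  }
  return value
}

// Assumed behaviour: accept only whole numbers, reject anything else.
function requireEnvInt(name: string, env: Env = process.env): number {
  const raw = requireEnv(name, env).trim()
  if (!/^-?\d+$/.test(raw)) {
    throw new Error(`Env var ${name} must be an integer, got "${raw}"`)
  }
  return Number(raw)
}

interface NpmSyncConfig {
  apiUrl: string
  batchSize: number
}

// Hypothetical getter for an `npm-sync` worker; note there is no fallback value anywhere.
function getNpmSyncConfig(env: Env = process.env): NpmSyncConfig {
  return {
    apiUrl: requireEnv('NPM_API_URL', env),
    batchSize: requireEnvInt('NPM_SYNC_BATCH_SIZE', env),
  }
}
```

The optional `env` parameter exists only to make the sketch easy to exercise without touching `process.env`; the real helpers take just the variable name.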

### 3d. Docker-compose service — `scripts/services/<name>.yaml`

Copy `scripts/services/github-repos-enricher.yaml` and adapt:
- Service names: `<name>` (prod) and `<name>-dev` (dev)
- `command` (prod): `pnpm run start:<name>`
- `command` (dev): `pnpm run dev:<name>`
- `env_file`: keep the same four files (`backend/.env.dist.local`, `backend/.env.dist.composed`, `backend/.env.override.local`, `backend/.env.override.composed`)
- `environment`: set any tuning var defaults inline (avoids requiring them in `.env.override.local` for local dev)
- `volumes` (dev only): bind-mount `./services/apps/packages_worker/src` plus every `services/libs/*/src` directory (copy the full list from the enricher yaml for hot reload)

### 3e. package.json scripts — `services/apps/packages_worker/package.json`

Read the file first, then add:
```json
"start:<name>": "tsx src/bin/<name>.ts",
"dev:<name>": "tsx watch src/bin/<name>.ts"
```

### 3f. Env var files — `backend/.env.dist.local` and `backend/.env.dist.composed`

Append new required vars with empty-string defaults (or sensible local values for non-secrets):
```
NEW_WORKER_API_KEY=
```

## Step 4 — TypeScript check

```bash
cd services/apps/packages_worker && pnpm tsc --noEmit
```

Fix any errors before proceeding.

## Checklist before committing

- [ ] `src/<worker>/` directory created with `types.ts` and `index.ts`
- [ ] `src/bin/<name>.ts` — probe files, SIGINT/SIGTERM handler, fail-fast config check, `SELECT 1` on startup
- [ ] `config.ts` — new `get<Worker>Config()` using `requireEnv`/`requireEnvInt`, no defaults
- [ ] `scripts/services/<name>.yaml` — prod + dev services with bind mounts
- [ ] `package.json` — `start:<name>` and `dev:<name>` scripts added
- [ ] `backend/.env.dist.local` and `.env.dist.composed` — new vars documented
- [ ] No new files in `services/libs/data-access-layer` (packages-db uses inline SQL)
- [ ] `pnpm tsc --noEmit` passes

Use `/preflight` before opening a PR and `/commit` to sign off.
99 changes: 99 additions & 0 deletions .claude/skills/packages-worker-setup/SKILL.md
@@ -0,0 +1,99 @@
---
name: packages-worker-setup
description: >
  Get packages_worker running locally, whether for the first time or resuming after a break.
  Spins up packages-db if not running, applies any pending migrations, and starts
  the worker. All steps are safe to re-run.
  Use when: "set up packages worker", "start packages worker", "resume packages worker",
  "get packages-db running", "packages-db stopped", "restart the worker".
allowed-tools: Read, Bash, Edit, AskUserQuestion
---

# packages-worker

Get `packages_worker` running locally. All steps are idempotent — safe to run
whether this is your first time or you're resuming after a break.

## Prerequisites check

```bash
git branch --show-current # should be feat/track-packages
docker info --format '{{.ServerVersion}}'
pnpm --version
```

If the branch is wrong: `git checkout feat/track-packages && pnpm i`.

## Step 1 — Start packages-db

No-op if already running.

```bash
docker compose -f scripts/scaffold.yaml up -d packages
until docker compose -f scripts/scaffold.yaml exec packages pg_isready -U postgres; do sleep 1; done
echo "packages-db is ready"
```

## Step 2 — Apply pending migrations

Flyway skips already-applied migrations, so this is safe to re-run.

```bash
arch=$(uname -m)
# macOS reports arm64; Linux on ARM reports aarch64.
if [ "$arch" = "arm64" ] || [ "$arch" = "aarch64" ]; then
  PLATFORM="--platform=linux/arm64/v8"
else
  PLATFORM="--platform=linux/amd64"
fi
docker build $PLATFORM -t packages_flyway \
  -f backend/src/osspckgs/Dockerfile.flyway backend/src/osspckgs --load

docker run --rm --network crowd-bridge \
  -e PGHOST=packages \
  -e PGPORT=5432 \
  -e PGUSER=postgres \
  -e PGPASSWORD=example \
  -e PGDATABASE=packages-db \
  packages_flyway
```

To create a new migration:

```bash
./scripts/cli scaffold create-packages-migration <descriptive_name>
```

## Step 3 — Start the worker

```bash
DEV=1 ./scripts/cli service packages-worker up
```

Dev mode uses hot reload — edits to `services/apps/packages_worker/src/` and
`services/libs/*/src/` are picked up immediately without restarting.

## Day-to-day commands

```bash
# Follow logs
./scripts/cli service packages-worker logs

# Stop
./scripts/cli service packages-worker down

# Restart
./scripts/cli service packages-worker restart

# Check status
./scripts/cli service packages-worker status
```

## Going further

- Add a new sub-worker (npm-sync, osv-sync, etc.): `/packages-worker-add-entrypoint`
- Record an architecture decision: `/adr`
- Before opening a PR: `/preflight`
- Commit with DCO sign-off: `/commit`

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `Connection refused` on packages-db | packages-db container not running (or Docker itself is stopped) | Start Docker if needed, then `docker compose -f scripts/scaffold.yaml up -d packages` |
| `permission denied: scripts/cli` | CLI not executable | `chmod +x scripts/cli` |
9 changes: 8 additions & 1 deletion backend/.env.dist.composed
@@ -27,4 +27,11 @@ CROWD_OPENSEARCH_NODE=http://open-search:9200
CROWD_TEMPORAL_SERVER_URL=temporal:7233

# Search sync api
CROWD_SEARCH_SYNC_API_URL=http://search-sync-api:8083
# packages DB (osspckgs)
CROWD_PACKAGES_DB_READ_HOST=packages
CROWD_PACKAGES_DB_WRITE_HOST=packages
CROWD_PACKAGES_DB_PORT=5432
CROWD_PACKAGES_DB_USERNAME=postgres
CROWD_PACKAGES_DB_PASSWORD=example
CROWD_PACKAGES_DB_DATABASE=packages-db
15 changes: 14 additions & 1 deletion backend/.env.dist.local
@@ -166,4 +166,17 @@ CROWD_TINYBIRD_BASE_URL=http://localhost:7181/

# Auth0
CROWD_AUTH0_ISSUER_BASE_URLS=
CROWD_AUTH0_AUDIENCE=
# packages DB (osspckgs)
CROWD_PACKAGES_DB_READ_HOST=localhost
CROWD_PACKAGES_DB_WRITE_HOST=localhost
CROWD_PACKAGES_DB_PORT=5434
CROWD_PACKAGES_DB_USERNAME=postgres
CROWD_PACKAGES_DB_PASSWORD=example
CROWD_PACKAGES_DB_DATABASE=packages-db

# github-repos-enricher
ENRICHER_GITHUB_TOKENS=
ENRICHER_BATCH_SIZE=100
ENRICHER_REPO_UPDATE_INTERVAL_HOURS=24
ENRICHER_IDLE_SLEEP_SEC=60
17 changes: 17 additions & 0 deletions backend/src/osspckgs/Dockerfile.flyway
@@ -0,0 +1,17 @@
FROM flyway/flyway:7.8.1-alpine

USER root

# Install envsubst from gettext used for templating.
RUN apk update \
    && apk add --no-cache gettext

USER flyway

COPY ./flyway_migrate.sh /migrate.sh

# Override default `flyway` entrypoint.
ENTRYPOINT ["/migrate.sh"]

# Copy migrations.
COPY ./migrations /tmp/migrations
17 changes: 17 additions & 0 deletions backend/src/osspckgs/flyway_migrate.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash

set -e
echo "Migrating jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}"

flyway \
  -locations="filesystem:/tmp/migrations" \
  -url="jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}" \
  -user="$PGUSER" \
  -password="$PGPASSWORD" \
  -connectRetries=60 \
  -outOfOrder=true \
  -mixed=true \
  -placeholderReplacement=false \
  -schemas=public \
  -X \
  migrate