Open
141 commits
87ac4f1
Add GitHub Actions workflow for AKS deployment and configure client/s…
sombaner Aug 21, 2025
c74025a
Implement memory leak debugging tool and load testing scripts
sombaner Aug 21, 2025
ea032f8
Enable debug endpoints in server deployment configuration
sombaner Aug 21, 2025
3a50b93
Fix debug endpoint registration to ensure it is always registered
sombaner Aug 21, 2025
2d52fb4
Add MemoryLeakTool component and enable debug endpoints in server con…
sombaner Aug 21, 2025
4ce3acb
Initial plan
Copilot Aug 26, 2025
2e7649a
Split deploy-aks.yml into separate server and client workflows
Copilot Aug 26, 2025
9f14edc
Merge pull request #3 from sombaner/copilot/fix-2
sombaner Aug 26, 2025
6eeda50
Create AGENTS.md
sombaner Aug 26, 2025
039988c
Create AGENTS.md
sombaner Aug 26, 2025
2880fa2
Create AGENTS.md
sombaner Aug 26, 2025
bfcbb4f
Initial plan
Copilot Aug 26, 2025
4e6a698
Implement comprehensive end-to-end tests with API and UI integration
Copilot Aug 26, 2025
8ae7bbe
Merge pull request #5 from sombaner/copilot/fix-4
sombaner Aug 26, 2025
5c1c83b
Update AGENTS.md
sombaner Aug 28, 2025
3050436
Update AGENTS.md
sombaner Aug 28, 2025
b2a8b80
Initial plan
Copilot Aug 28, 2025
8e603e5
Fix Memory Leak Tool alignment and button layout issues
Copilot Aug 28, 2025
b5d0a57
Merge pull request #7 from sombaner/copilot/fix-6
sombaner Aug 28, 2025
6a93185
Add weekly-research workflow
sombaner Sep 2, 2025
2794808
Add daily test improver
sombaner Sep 2, 2025
aa5a50d
Add workflow: githubnext/agentics/update-docs
sombaner Sep 3, 2025
86e2250
Merge pull request #10 from sombaner/add-workflow-githubnext-agentics…
sombaner Sep 3, 2025
af8e292
Initial plan
Copilot Oct 26, 2025
c6406cb
Fix workflow permission to allow disabling workflow
Copilot Oct 26, 2025
9f37532
Merge pull request #17 from sombaner/copilot/fix-safety-check-job
sombaner Oct 27, 2025
2b5a662
Create detailed documentation instructions
sombaner Nov 18, 2025
d6577b6
Create documentation instructions for bookstore-supreme
sombaner Nov 18, 2025
053f52f
Add name and description to Documenter agent
sombaner Nov 18, 2025
73ad11a
Fix formatting of Documenter agent name
sombaner Nov 18, 2025
f45c272
feat: Add templates for checklist, plan, spec, and tasks for feature …
sombaner Dec 6, 2025
8324678
Add research and tasks documentation for AKS deployment automation
sombaner Dec 6, 2025
603a2e5
Initial plan
Copilot Dec 6, 2025
efa206d
Complete Phase 1 and Phase 2: Terraform foundation and K8s manifests
Copilot Dec 6, 2025
16124da
Complete Phases 1-5: Infrastructure, workflows, and documentation
Copilot Dec 6, 2025
b211492
Complete Phase 6: Architecture documentation and final polish
Copilot Dec 6, 2025
fc1eee3
Merge pull request #19 from sombaner/copilot/start-implementation-in-…
sombaner Dec 6, 2025
76bfdf4
Merge pull request #20 from sombaner/001-aks-deployment-automation
sombaner Dec 6, 2025
29155f7
Initial plan
Copilot Dec 6, 2025
21c4503
Fix Azure OIDC authentication error by removing environment designation
Copilot Dec 6, 2025
1c516d5
Improve consistency in documentation placeholders
Copilot Dec 6, 2025
81a4961
Merge pull request #22 from sombaner/copilot/fix-azure-cli-login-error
sombaner Dec 6, 2025
73730e9
Initial plan
Copilot Dec 6, 2025
b4768da
Add Terraform backend validation and fix Azure setup guide
Copilot Dec 6, 2025
a258f97
Add comprehensive Terraform backend fix documentation
Copilot Dec 6, 2025
670833b
Merge pull request #23 from sombaner/copilot/fix-terraform-storage-ac…
sombaner Dec 6, 2025
9ecdf32
Initial plan
Copilot Jan 2, 2026
382f920
Fix namespace mismatch in GitHub Actions workflows
Copilot Jan 2, 2026
12bb156
Merge pull request #26 from sombaner/copilot/fix-deploy-client-workflow
sombaner Jan 2, 2026
eb49051
Initial plan
Copilot Jan 3, 2026
f1a868f
Update K8s deployments to use GHCR images and remove imagePullSecrets
Copilot Jan 3, 2026
d0952b9
Simplify workflows and add deployment instructions
Copilot Jan 3, 2026
307534e
Add detailed summary of changes and next steps
Copilot Jan 3, 2026
055f877
Merge pull request #28 from sombaner/copilot/update-image-names-in-de…
sombaner Jan 3, 2026
5b4e274
Initial plan
Copilot Feb 20, 2026
71dd341
Initial plan for issue triage agentic workflow
Copilot Feb 20, 2026
3a61da3
Add issue-triage agentic workflow with compiled lock file
Copilot Feb 20, 2026
46f5eaa
Merge pull request #50 from sombaner/copilot/create-triage-issues-wor…
sombaner Feb 20, 2026
9ae6ab9
Initial plan
Copilot Feb 21, 2026
14dbb5d
feat: add CI Doctor daily report GitHub Agentic Workflow
Copilot Feb 21, 2026
3e531ab
Merge pull request #55 from sombaner/copilot/create-github-agentic-wo…
sombaner Feb 21, 2026
c6b7beb
Update CI Daily Report workflow name
sombaner Feb 21, 2026
b897e15
Initial plan
Copilot Feb 22, 2026
d92d75c
Add CI Issue Trigger agentic workflow for deployment failure analysis
Copilot Feb 22, 2026
7841da3
Merge pull request #58 from sombaner/copilot/create-ci-issue-trigger-…
sombaner Feb 23, 2026
1ead0e8
Fix artifact name for client manifests upload
sombaner Feb 23, 2026
900744e
Simplify condition for activation job
sombaner Feb 23, 2026
efe491c
Create platform-sre-kubernetes.agent.md for SRE guidance
sombaner Feb 23, 2026
9209b01
Fix typo in artifact name for client manifests
sombaner Feb 23, 2026
1663043
Fix: Use GHCR images in k8s manifests and remove imagePullSecrets
sombaner Feb 23, 2026
6a8110f
Trigger Actions: Build and Deploy Client/Server to AKS
sombaner Feb 23, 2026
f9f19ef
chore(ci): trigger AKS deploy workflows - scheduled health check
sombaner Feb 24, 2026
df6417b
chore(ci): automated trigger for AKS client/server deploys (cluster s…
sombaner Feb 26, 2026
d0f9fd5
Scheduled task: trigger Tailspin Client/Server deploy workflows (AKS …
sombaner Feb 27, 2026
e74232e
Trigger AKS deploy workflows due to AKS cluster Stopped state at 2026…
sombaner Feb 28, 2026
19b5f2a
chore: trigger AKS deploy workflows (client & server) due to AKS clus…
sombaner Mar 1, 2026
5ec79d3
Align k8s images with GHCR, remove pull secrets; ensure public images…
sombaner Mar 2, 2026
4114d92
chore(scheduled): trigger client AKS deploy via manifest touch; no fu…
sombaner Mar 2, 2026
70b5180
chore(scheduled): trigger server AKS deploy via manifest touch; no fu…
sombaner Mar 2, 2026
41d9888
chore(sre): scheduled trigger to retrun client AKS deploy workflow (2…
sombaner Mar 2, 2026
19c5577
chore(sre): scheduled trigger to retrun server AKS deploy workflow (2…
sombaner Mar 2, 2026
c3dd3ff
chore: SRE trigger — re-touch client manifest to re-run AKS deploy
sombaner Mar 2, 2026
e0e1061
chore: SRE trigger — re-touch server manifest to re-run AKS deploy
sombaner Mar 2, 2026
971adcb
chore(sre): scheduled trigger to re-run client AKS deploy (2026-03-02…
sombaner Mar 2, 2026
d3abe17
chore(sre): scheduled trigger to re-run server AKS deploy (2026-03-02…
sombaner Mar 2, 2026
9d4dc28
fix(sre): correct server service targetPort and ensure valid manifest…
sombaner Mar 2, 2026
04c0e26
SRE: re-trigger AKS deploy workflows by touching k8s manifests (2026-…
sombaner Mar 2, 2026
e20a4bc
chore: retrigger client AKS deploy (SRE auto-trigger 2026-03-02 09:46…
sombaner Mar 2, 2026
acf7075
chore: retrigger server AKS deploy (SRE auto-trigger 2026-03-02 09:47…
sombaner Mar 2, 2026
24b41a9
SRE: Retrigger AKS client/server deploy workflows (public GHCR images)
sombaner Mar 2, 2026
62d9bba
SRE: Retrigger AKS client/server deploy workflows (touch k8s manifest…
sombaner Mar 2, 2026
7bba0de
SRE: Retrigger AKS client/server deploy workflows — touch manifests (…
sombaner Mar 2, 2026
f7bd73d
chore: retrigger AKS deploy workflows (SRE touch 2026-03-02T10:05Z)
sombaner Mar 2, 2026
02cba58
Revise Copilot instructions for coding and testing standards
sombaner Mar 2, 2026
f71e143
Initial plan
Copilot Mar 2, 2026
4b8e498
Initial plan for search and support comment features
Copilot Mar 2, 2026
1ada223
Add search query parameter to GET /api/games endpoint with tests
Copilot Mar 2, 2026
2e862da
Add search bar to GameList and support comment textbox to GameDetails…
Copilot Mar 2, 2026
b64cfce
Use string concatenation instead of f-string in ilike filter for clarity
Copilot Mar 2, 2026
d7ca6f7
Update client/e2e-tests/games.spec.ts
sombaner Mar 2, 2026
7e6d355
Update server/routes/games.py
sombaner Mar 2, 2026
4e33ddd
Merge pull request #98 from sombaner/copilot/add-search-functionality
sombaner Mar 2, 2026
c9c78be
SRE: Retrigger AKS client deploy workflow (timestamped touch)
sombaner Mar 3, 2026
f1d0bf2
SRE: Retrigger AKS server deploy workflow (timestamped touch)
sombaner Mar 3, 2026
9dba893
SRE: Retrigger AKS deployments for Tailspin client/server
sombaner Mar 3, 2026
cebd77a
SRE: touch to trigger Client AKS deploy workflow (2026-03-03T09:12:37Z)
sombaner Mar 3, 2026
8740f32
SRE: touch to trigger Server AKS deploy workflow (2026-03-03T09:13:20Z)
sombaner Mar 3, 2026
f0b00ef
SRE: touch k8s manifests to retrigger AKS deploy workflows (public GH…
sombaner Mar 3, 2026
044dcfe
SRE: Retrigger AKS deploy workflows (touch manifests) - 2026-03-04
sombaner Mar 4, 2026
6a1f1db
SRE: Fix GHCR image names for client/server to ghcr.io/sombaner/tails…
sombaner Mar 4, 2026
e10b3c4
SRE: Retrigger AKS deploys
sombaner Mar 4, 2026
2aa84a5
sre: fix GHCR image replacements in deploy workflows (client/server)
sombaner Mar 4, 2026
3948dd2
sre: retrigger AKS deploys for client and server (touch manifests)
sombaner Mar 4, 2026
15abf97
SRE: Daily AKS verification retrigger (2026-03-04)
sombaner Mar 4, 2026
5df0830
SRE: Daily AKS verification retrigger (2026-03-04)
sombaner Mar 4, 2026
00a722e
chore(sre): retrigger AKS deploy and align image refs for SHA rendering
sombaner Mar 6, 2026
5618e52
chore(sre): retrigger AKS deploys and align image refs for SHA rendering
sombaner Mar 6, 2026
958b1a2
chore(sre): retrigger AKS deploys for client/server and align image refs
sombaner Mar 6, 2026
cafb8b0
chore(sre): fix GHCR images in k8s manifests and retrigger AKS deploys
sombaner Mar 6, 2026
85faaf6
SRE: Retrigger AKS deploys (2026-03-07)
sombaner Mar 7, 2026
9980c01
SRE: Retrigger AKS deploys (2026-03-07 09:12 UTC)
sombaner Mar 7, 2026
7f45a81
SRE: Retrigger AKS deploys (2026-03-07 09:18 UTC)
sombaner Mar 7, 2026
39e4300
SRE: Fix GHCR image refs and retrigger AKS deploys (2026-03-08)
sombaner Mar 8, 2026
4d6e825
SRE: Retrigger AKS deploys by touching manifests (2026-03-08) (#163)
sombaner Mar 8, 2026
9e40396
SRE: Retrigger AKS deploys by touching manifests (2026-03-08 09:12 UTC)
sombaner Mar 8, 2026
1d65898
SRE retrigger: touch client manifest (2026-03-08T09:18:55Z)
sombaner Mar 8, 2026
c6a8ed1
SRE retrigger: touch server manifest (2026-03-08T09:19:35Z)
sombaner Mar 8, 2026
ca65989
SRE: Retrigger AKS deploys by touching manifests (2026-03-08 09:19 UTC)
sombaner Mar 8, 2026
21ad267
SRE: Retrigger AKS client/server deploys (touch manifests)
sombaner Mar 8, 2026
d581ab3
SRE: Fix image refs and retrigger AKS deploys (2026-03-10)
sombaner Mar 10, 2026
614f266
SRE: Public GHCR images; trigger AKS deploys (2026-03-11)
sombaner Mar 11, 2026
7d25057
SRE: Retrigger AKS deploys for client/server (2026-03-11 09:21 UTC)
sombaner Mar 11, 2026
6c4c77c
SRE: Fix server resources key; add post-deploy AKS health checks
sombaner Mar 11, 2026
83eb785
SRE: daily 09:00 UTC retrigger for AKS client/server deploys (2026-03…
sombaner Mar 12, 2026
6bdf372
add multiple text game support
sombaner Mar 12, 2026
bc7d7c6
Add sorting by popularity, release date, and user rating
sombaner Mar 17, 2026
29e3c79
Add skill for listing pull requests assigned to the user
sombaner Mar 17, 2026
b6c26fa
Add review functionality with API endpoints and UI component
sombaner Mar 30, 2026
11da421
feat: Implement Session Start Security Hook for sensitive data detection
sombaner Apr 18, 2026
3928b2a
docs: scribe orchestration logs, decisions consolidation, team histor…
sombaner Apr 19, 2026
71b3c2d
feat: Implement checkout functionality with payment processing
sombaner Apr 19, 2026
14 changes: 14 additions & 0 deletions .copilot/mcp-config.json
@@ -0,0 +1,14 @@
{
  "mcpServers": {
    "EXAMPLE-github": {
      "command": "npx",
      "args": [
        "-y",
        "@anthropic/github-mcp-server"
      ],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
42 changes: 42 additions & 0 deletions .copilot/skills/agent-collaboration/SKILL.md
@@ -0,0 +1,42 @@
---
name: "agent-collaboration"
description: "Standard collaboration patterns for all squad agents — worktree awareness, decisions, cross-agent communication"
domain: "team-workflow"
confidence: "high"
source: "extracted from charter boilerplate — identical content in 18+ agent charters"
---

## Context

Every agent on the team follows identical collaboration patterns for worktree awareness, decision recording, and cross-agent communication. These were previously duplicated in every charter's Collaboration section (~300 bytes × 18 agents = ~5.4KB of redundant context). Now centralized here.

The coordinator's spawn prompt already instructs agents to read decisions.md and their history.md. This skill adds the patterns for WRITING decisions and requesting help.

## Patterns

### Worktree Awareness
Use the `TEAM ROOT` path provided in your spawn prompt. All `.squad/` paths are relative to this root. If TEAM ROOT is not provided (rare), run `git rev-parse --show-toplevel` as fallback. Never assume CWD is the repo root.
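
A minimal sketch of that lookup order, assuming the `TEAM ROOT` value has already been parsed out of the spawn prompt. `resolveTeamRoot`, `squadPath`, and the injected `gitTopLevel` callback are hypothetical names for illustration; a real caller would have the callback run `git rev-parse --show-toplevel`.

```typescript
// Hypothetical helper: prefer TEAM ROOT from the spawn prompt, fall back to git.
// `gitTopLevel` is injected so the sketch performs no process or file I/O itself.
function resolveTeamRoot(
  teamRoot: string | undefined,
  gitTopLevel: () => string,
): string {
  const provided = teamRoot?.trim();
  if (provided) {
    return provided;
  }
  // Rare path: TEAM ROOT is missing, so ask git instead of trusting CWD.
  return gitTopLevel().trim();
}

// Every .squad/ path is built from the resolved root, never from the CWD.
function squadPath(root: string, relative: string): string {
  const base = root.replace(/\/+$/, "");
  return `${base}/.squad/${relative.replace(/^\/+/, "")}`;
}
```

Injecting the git lookup keeps the fallback easy to test; it only runs when no root was supplied.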

### Decision Recording
After making a decision that affects other team members, write it to:
`.squad/decisions/inbox/{your-name}-{brief-slug}.md`

Format:
```
### {date}: {decision title}
**By:** {Your Name}
**What:** {the decision}
**Why:** {rationale}
```
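
The path and body above could be generated along these lines. This is a sketch only: the `Decision` interface, `toSlug`, `decisionInboxPath`, and `formatDecision` are hypothetical names, and lowercasing the slug is an assumption the skill does not state.

```typescript
// Hypothetical shape for a decision; fields mirror the documented format.
interface Decision {
  date: string; // e.g. "2026-02-21"
  title: string;
  by: string;
  what: string;
  why: string;
}

// Turn free-form text into a short, filename-safe slug (lowercasing is an assumption).
function toSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build the drop-box path: .squad/decisions/inbox/{your-name}-{brief-slug}.md
function decisionInboxPath(agentName: string, title: string): string {
  return `.squad/decisions/inbox/${toSlug(agentName)}-${toSlug(title)}.md`;
}

// Render the file body in the documented four-line format.
function formatDecision(d: Decision): string {
  return [
    `### ${d.date}: ${d.title}`,
    `**By:** ${d.by}`,
    `**What:** ${d.what}`,
    `**Why:** ${d.why}`,
  ].join("\n");
}
```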

### Cross-Agent Communication
If you need another team member's input, say so in your response. The coordinator will bring them in. Don't try to do work outside your domain.

### Reviewer Protocol
If you have reviewer authority and reject work: the original author is locked out from revising that artifact. A different agent must own the revision. State who should revise in your rejection response.

## Anti-Patterns
- Don't read all agent charters — you only need your own context + decisions.md
- Don't write directly to `.squad/decisions.md` — always use the inbox drop-box
- Don't modify other agents' history.md files — that's Scribe's job
- Don't assume CWD is the repo root — always use TEAM ROOT
24 changes: 24 additions & 0 deletions .copilot/skills/agent-conduct/SKILL.md
@@ -0,0 +1,24 @@
---
name: "agent-conduct"
description: "Shared hard rules enforced across all squad agents"
domain: "team-governance"
confidence: "high"
source: "reskill extraction — Product Isolation Rule and Peer Quality Check appeared in all 20 agent charters"
---

## Context

Every squad agent must follow these two hard rules. They were previously duplicated in every charter. Now they live here as a shared skill, loaded once.

## Patterns

### Product Isolation Rule (hard rule)
Tests, CI workflows, and product code must NEVER depend on specific agent names from any particular squad. "Our squad" must not impact "the squad." No hardcoded references to agent names (Flight, EECOM, FIDO, etc.) in test assertions, CI configs, or product logic. Use generic/parameterized values. If a test needs agent names, use obviously-fake test fixtures (e.g., "test-agent-1", "TestBot").
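
One way to check this rule mechanically is a small scan for squad names in test or product source. The sketch below is an assumption, not an existing tool: `findHardcodedAgentNames` is a hypothetical helper, and the caller supplies the name list so the check itself stays generic.

```typescript
// Hypothetical lint helper: report which agent names appear in a source string.
// Matching is case-sensitive and whole-word, so common words can still
// produce false positives that need human review.
function findHardcodedAgentNames(source: string, agentNames: string[]): string[] {
  return agentNames.filter((agentName) => {
    // Escape regex metacharacters before building the whole-word pattern.
    const escaped = agentName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(source);
  });
}
```

Run against a fixture that uses `"test-agent-1"` or `"TestBot"`, the scan should come back empty.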

### Peer Quality Check (hard rule)
Before finishing work, verify your changes don't break existing tests. Run the test suite for files you touched. If CI has been failing, check your changes aren't contributing to the problem. When you learn from mistakes, update your history.md.

## Anti-Patterns
- Don't hardcode dev team agent names in product code or tests
- Don't skip test verification before declaring work done
- Don't ignore pre-existing CI failures that your changes may worsen
151 changes: 151 additions & 0 deletions .copilot/skills/architectural-proposals/SKILL.md
@@ -0,0 +1,151 @@
---
name: "architectural-proposals"
description: "How to write comprehensive architectural proposals that drive alignment before code is written"
domain: "architecture, product-direction"
confidence: "high"
source: "earned (2026-02-21 interactive shell proposal)"
tools:
  - name: "view"
    description: "Read existing codebase, prior decisions, and team context before proposing changes"
    when: "Always read .squad/decisions.md, relevant PRDs, and current architecture docs before writing proposal"
  - name: "create"
    description: "Create proposal in docs/proposals/ with structured format"
    when: "After gathering context, before any implementation work begins"
---

## Context

Proposals create alignment before code is written. Cheaper to change a doc than refactor code. Use this pattern when:
- Architecture shifts invalidate existing assumptions
- Product direction changes require new foundation
- Multiple waves/milestones will be affected by a decision
- External dependencies (Copilot CLI, SDK APIs) change

## Patterns

### Proposal Structure (docs/proposals/)

**Required sections:**
1. **Problem Statement** — Why current state is broken (specific, measurable evidence)
2. **Proposed Architecture** — Solution with technical specifics (not hand-waving)
3. **What Changes** — Impact on existing work (waves, milestones, modules)
4. **What Stays the Same** — Preserve existing functionality (no regression)
5. **Key Decisions Needed** — Explicit choices with recommendations
6. **Risks and Mitigations** — Likelihood + impact + mitigation strategy
7. **Scope** — What's in v1, what's deferred (timeline clarity)

**Optional sections:**
- Implementation Plan (high-level milestones)
- Success Criteria (measurable outcomes)
- Open Questions (unresolved items)
- Appendix (prior art, alternatives considered)

### Tone Ceiling Enforcement

**Always:**
- Cite specific evidence (user reports, performance data, failure modes)
- Justify recommendations with technical rationale
- Acknowledge trade-offs (no perfect solutions)
- Be specific about APIs, libraries, file paths

**Never:**
- Hype ("revolutionary", "game-changing")
- Hand-waving ("we'll figure it out later")
- Unsubstantiated claims ("users will love this")
- Vague timelines ("soon", "eventually")

### Wave Restructuring Pattern

When a proposal invalidates existing wave structure:
1. **Acknowledge the shift:** "This becomes Wave 0 (Foundation)"
2. **Cascade impacts:** Adjust downstream waves (Wave 1, Wave 2, Wave 3)
3. **Preserve non-blocking work:** Identify what can proceed in parallel
4. **Update dependencies:** Document new blocking relationships

**Example (Interactive Shell):**
- Wave 0 (NEW): Interactive Shell — blocks all other waves
- Wave 1 (ADJUSTED): npm Distribution — shell bundled in cli.js
- Wave 2 (DEFERRED): SquadUI — waits for shell foundation
- Wave 3 (ADJUSTED): Public Docs — now documents shell as primary interface

### Decision Framing

**Format:** "{Decision topic}: X (recommended) or alternatives?"

**Components:**
- Recommendation (pick one, justify)
- Alternatives (what else was considered)
- Decision rationale (why recommended option wins)
- Needs sign-off from (which agents/roles must approve)

**Example:**
```
### 1. Terminal UI Library: `ink` (recommended) or alternatives?

**Recommendation:** `ink`
**Alternatives:** `blessed`, raw readline
**Decision rationale:** Component model enables testable UI. Battle-tested ecosystem.

**Needs sign-off from:** Brady (product direction), Fortier (runtime performance)
```

### Risk Documentation

**Format per risk:**
- **Risk:** Specific failure mode
- **Likelihood:** Low / Medium / High (not percentages)
- **Impact:** Low / Medium / High
- **Mitigation:** Concrete actions (measurable)

**Example:**
```
### Risk 2: SDK Streaming Reliability

**Risk:** SDK streaming events might drop messages or arrive out of order.
**Likelihood:** Low (SDK is production-grade).
**Impact:** High — broken streaming makes shell unusable.

**Mitigation:**
- Add integration test: Send 1000-message stream, verify all deltas arrive in order
- Implement fallback: If streaming fails, fall back to polling session state
- Log all SDK events to `.squad/orchestration-log/sdk-events.jsonl` for debugging
```
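
The per-risk format can also be captured as a small typed structure, so the Low / Medium / High scale is enforced by the compiler and a risk cannot be rendered without a mitigation. This is a sketch under assumptions: `Level`, `Risk`, and `renderRisk` are hypothetical names, not part of any existing tooling.

```typescript
// Likelihood and impact use the three-step scale, not percentages.
type Level = "Low" | "Medium" | "High";

interface Risk {
  title: string;
  risk: string;
  likelihood: Level;
  impact: Level;
  mitigations: string[];
}

// Render one risk entry in the documented layout; reject entries without mitigations.
function renderRisk(index: number, r: Risk): string {
  if (r.mitigations.length === 0) {
    throw new Error(`Risk "${r.title}" has no mitigation`);
  }
  return [
    `### Risk ${index}: ${r.title}`,
    "",
    `**Risk:** ${r.risk}`,
    `**Likelihood:** ${r.likelihood}`,
    `**Impact:** ${r.impact}`,
    "",
    "**Mitigation:**",
    ...r.mitigations.map((m) => `- ${m}`),
  ].join("\n");
}
```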

## Examples

**File references from interactive shell proposal:**
- Full proposal: `docs/proposals/squad-interactive-shell.md`
- User directive: `.squad/decisions/inbox/copilot-directive-2026-02-21T202535Z.md`
- Team decisions: `.squad/decisions.md`
- Current architecture: `docs/architecture/module-map.md`, `docs/prd-23-release-readiness.md`

**Key patterns demonstrated:**
1. Read user directive first (understand the "why")
2. Survey current architecture (module map, existing waves)
3. Research SDK APIs (exploration task to validate feasibility)
4. Document problem with specific evidence (unreliable handoffs, zero visibility, UX mismatch)
5. Propose solution with technical specifics (ink components, SDK session management, spawn.ts module)
6. Restructure waves when foundation shifts (Wave 0 becomes blocker)
7. Preserve backward compatibility (squad.agent.md still works, VS Code mode unchanged)
8. Frame decisions explicitly (5 key decisions with recommendations)
9. Document risks with mitigations (5 risks, each with concrete actions)
10. Define scope (what's in v1 vs. deferred)

## Anti-Patterns

**Avoid:**
- ❌ Proposals without problem statements (solution-first thinking)
- ❌ Vague architecture ("we'll use a shell") — be specific (ink components, session registry, spawn.ts)
- ❌ Ignoring existing work — always document impact on waves/milestones
- ❌ No risk analysis — every architecture has risks, document them
- ❌ Unbounded scope — draw the v1 line explicitly
- ❌ Missing decision ownership — always say "needs sign-off from X"
- ❌ No backward compatibility plan — users don't care about your replatform
- ❌ Hand-waving timelines ("a few weeks") — be specific (2-3 weeks, 1 engineer full-time)

**Red flags in proposal reviews:**
- "Users will love this" (citation needed)
- "We'll figure out X later" (scope creep incoming)
- "This is revolutionary" (tone ceiling violation)
- No section on "What Stays the Same" (regression risk)
- No risks documented (wishful thinking)
84 changes: 84 additions & 0 deletions .copilot/skills/ci-validation-gates/SKILL.md
@@ -0,0 +1,84 @@
---
name: "ci-validation-gates"
description: "Defensive CI/CD patterns: semver validation, token checks, retry logic, draft detection — earned from v0.8.22"
domain: "ci-cd"
confidence: "high"
source: "extracted from Drucker and Trejo charters — earned knowledge from v0.8.22 release incident"
---

## Context

CI workflows must be defensive. These patterns were learned from the v0.8.22 release disaster where invalid semver, wrong token types, missing retry logic, and draft releases caused a multi-hour outage. Both Drucker (CI/CD) and Trejo (Release Manager) carried this knowledge in their charters — now centralized here.

## Patterns

### Semver Validation Gate
Every publish workflow MUST validate version format before `npm publish`. 4-part versions (e.g., 0.8.21.4) are NOT valid semver — npm mangles them.

```yaml
- name: Validate semver
  run: |
    VERSION="${{ github.event.release.tag_name }}"
    VERSION="${VERSION#v}"
    if ! npx semver "$VERSION" > /dev/null 2>&1; then
      echo "❌ Invalid semver: $VERSION"
      echo "Only 3-part versions (X.Y.Z) or prerelease (X.Y.Z-tag.N) are valid."
      exit 1
    fi
    echo "✅ Valid semver: $VERSION"
```
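
For scripts that cannot shell out to `npx semver`, the same gate can be approximated in code. The sketch below is not the `semver` package's implementation: `isValidReleaseVersion` is a hypothetical helper, and the pattern is modeled on the regular expression suggested in the SemVer 2.0.0 specification.

```typescript
// Pattern modeled on the regular expression suggested by the SemVer 2.0.0 spec:
// MAJOR.MINOR.PATCH, optional -prerelease, optional +build metadata.
const SEMVER =
  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/;

// Strip a leading "v" (as the workflow step does) and test the remainder.
function isValidReleaseVersion(tagName: string): boolean {
  const version = tagName.startsWith("v") ? tagName.slice(1) : tagName;
  return SEMVER.test(version);
}
```

A 4-part tag such as `0.8.21.4` fails this check because nothing but a prerelease or build suffix may follow the patch number.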

### NPM Token Type Verification
NPM_TOKEN MUST be an Automation token, not a User token with 2FA:
- User tokens require OTP — CI can't provide it → EOTP error
- Create Automation tokens at npmjs.com → Settings → Access Tokens → Automation
- Verify before first publish in any workflow

### Retry Logic for npm Registry Propagation
npm registry uses eventual consistency. After `npm publish` succeeds, the package may not be immediately queryable.
- Propagation: typically 5-30s, up to 2min in rare cases
- All verify steps: 5 attempts, 15-second intervals
- Log each attempt: "Attempt 1/5: Checking package..."
- Exit loop on success, fail after max attempts

```yaml
- name: Verify package (with retry)
  run: |
    MAX_ATTEMPTS=5
    WAIT_SECONDS=15
    for attempt in $(seq 1 $MAX_ATTEMPTS); do
      echo "Attempt $attempt/$MAX_ATTEMPTS: Checking $PACKAGE@$VERSION..."
      if npm view "$PACKAGE@$VERSION" version > /dev/null 2>&1; then
        echo "✅ Package verified"
        exit 0
      fi
      [ $attempt -lt $MAX_ATTEMPTS ] && sleep $WAIT_SECONDS
    done
    echo "❌ Failed to verify after $MAX_ATTEMPTS attempts"
    exit 1
```
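
The same loop can be written as a reusable helper when verification runs from a Node script instead of a shell step. This is a minimal sketch with hypothetical names (`verifyWithRetry`, `RetryResult`); the registry check and the sleep function are injected, so nothing here calls npm directly.

```typescript
interface RetryResult {
  ok: boolean;
  attempts: number;
}

// Hypothetical helper: poll `check` until it succeeds or attempts run out.
// `sleep` is supplied by the caller (e.g. a setTimeout-based promise) so tests
// can skip real waiting. Defaults mirror the 5 attempts / 15 s rule above.
async function verifyWithRetry(
  check: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
  waitMs = 15000,
  log: (line: string) => void = () => {},
): Promise<RetryResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    log(`Attempt ${attempt}/${maxAttempts}: Checking package...`);
    if (await check()) {
      return { ok: true, attempts: attempt };
    }
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts) {
      await sleep(waitMs);
    }
  }
  return { ok: false, attempts: maxAttempts };
}
```

Returning the attempt count lets the caller log how long propagation actually took.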

### Draft Release Detection
Draft releases don't emit a `release: published` event. Workflows MUST:
- Trigger on `release: published` (NOT `created`)
- If using workflow_dispatch: verify release is published via GitHub API before proceeding
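
For the `workflow_dispatch` case, the guard reduces to a check on the release object once it has been fetched. The sketch below assumes the release JSON is already in hand and uses only the `draft` and `published_at` fields of a GitHub REST API release; `ReleaseInfo` and `isPublishedRelease` are hypothetical names, and the fetch itself is left out.

```typescript
// Subset of the fields the GitHub REST API returns for a release.
interface ReleaseInfo {
  draft: boolean;
  published_at: string | null;
}

// Hypothetical guard: proceed with publishing only for a non-draft,
// already-published release.
function isPublishedRelease(release: ReleaseInfo): boolean {
  return !release.draft && release.published_at !== null;
}
```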

### Build Script Protection
Set `SKIP_BUILD_BUMP=1` (or `$env:SKIP_BUILD_BUMP = "1"` on Windows) before ANY release build. bump-build.mjs is for dev builds ONLY — it silently mutates versions.

## Known Failure Modes (v0.8.22 Incident)

| # | What Happened | Root Cause | Prevention |
|---|---------------|-----------|------------|
| 1 | 4-part version published, npm mangled it | No semver validation gate | `npx semver` check before every publish |
| 2 | CI failed 5+ times with EOTP | User token with 2FA | Automation token only |
| 3 | Verify returned false 404 | No retry logic for propagation | 5 attempts, 15s intervals |
| 4 | Workflow never triggered | Draft release doesn't emit event | Never create draft releases |
| 5 | Version mutated during release | bump-build.mjs ran in release | SKIP_BUILD_BUMP=1 |

## Anti-Patterns
- ❌ Publishing without semver validation gate
- ❌ Single-shot verification without retry
- ❌ Hard-coded secrets in workflows
- ❌ Silent CI failures — every error needs actionable output with remediation
- ❌ Assuming npm publish is instantly queryable
47 changes: 47 additions & 0 deletions .copilot/skills/cli-wiring/SKILL.md
@@ -0,0 +1,47 @@
# Skill: CLI Command Wiring

**Bug class:** Commands implemented in `packages/squad-cli/src/cli/commands/` but never routed in `cli-entry.ts`.

## Checklist — Adding a New CLI Command

1. **Create command file** in `packages/squad-cli/src/cli/commands/<name>.ts`
   - Export a `run<Name>(cwd, options)` async function (or class with static methods for utility modules)

2. **Add routing block** in `packages/squad-cli/src/cli-entry.ts` inside `main()`:
```ts
if (cmd === '<name>') {
  const { run<Name> } = await import('./cli/commands/<name>.js');
  // parse args, call function
  await run<Name>(process.cwd(), options);
  return;
}
```

3. **Add help text** in the help section of `cli-entry.ts` (search for `Commands:`):
```ts
console.log(` ${BOLD}<name>${RESET} <description>`);
console.log(` Usage: <name> [flags]`);
```

4. **Verify both exist** — the recurring bug is doing step 1 but missing steps 2-3.
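
Step 4 can be partly automated. The sketch below is an assumption rather than an existing script: `findUnwiredCommands` is a hypothetical helper that takes the command file names and the text of `cli-entry.ts`, and reports files with no `cmd === '<name>'` block in the style shown in step 2.

```typescript
// Hypothetical check: list command files that have no routing block in the
// entry-point source. Assumes routing uses the `cmd === '<name>'` form.
function findUnwiredCommands(commandFiles: string[], entrySource: string): string[] {
  return commandFiles
    .map((file) => file.replace(/\.ts$/, ""))
    .filter((commandName) => {
      // Escape regex metacharacters, then accept single or double quotes.
      const escaped = commandName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      const routed = new RegExp(`cmd\\s*===\\s*['"]${escaped}['"]`);
      return !routed.test(entrySource);
    });
}
```

Modules wired some other way, such as diagnostic checks, would show up here as false positives and need a manual look.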

## Wiring Patterns by Command Type

| Type | Example | How to wire |
|------|---------|-------------|
| Standard command | `export.ts`, `build.ts` | `run*()` function, parse flags from `args` |
| Placeholder command | `loop`, `hire` | Inline in cli-entry.ts, prints pending message |
| Utility/check module | `rc-tunnel.ts`, `copilot-bridge.ts` | Wire as diagnostic check (e.g., `isDevtunnelAvailable()`) |
| Subcommand of another | `init-remote.ts` | Already used inside parent + standalone alias |

## Common Import Pattern

```ts
import { BOLD, RESET, DIM, RED, GREEN, YELLOW } from './cli/core/output.js';
```

Use dynamic `await import()` for command modules to keep startup fast (lazy loading).
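
To illustrate why lazy loading keeps startup fast, here is a minimal sketch of a routing table whose entries load a command only when it is requested. `createRouter` and `CommandRunner` are hypothetical names; in the real entry point each loader would wrap a dynamic `import()` of the command module.

```typescript
type CommandRunner = (cwd: string, args: string[]) => Promise<void>;

// Each loader defers its work until the command is actually requested, so
// unrelated command modules are never loaded during startup.
function createRouter(loaders: Record<string, () => Promise<CommandRunner>>) {
  return async function route(cmd: string, cwd: string, args: string[]): Promise<boolean> {
    // Own-property check avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(loaders, cmd)) {
      return false; // unknown command: the caller can print help
    }
    const run = await loaders[cmd]();
    await run(cwd, args);
    return true;
  };
}
```

Only the requested command's loader runs; every other entry stays untouched until it is needed.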

## History

- **#237 / PR #244:** 4 commands wired (rc, copilot-bridge, init-remote, rc-tunnel). aspire, link, loop, hire were already present.