Skip to content

Latest commit

 

History

History
99 lines (58 loc) · 20.6 KB

File metadata and controls

99 lines (58 loc) · 20.6 KB

gstack-extensions — Project Log

Narrative + decision log for the gstack-extensions repo. For per-skill technical changes, see each skill's own CHANGELOG.md and the repo's git history.

Format: date-headed sections, topic-tagged entries. One line per decision; expand inline if the why is non-obvious.


2026-05-21

[skill][feature-spike] Added; pre-plan risk-discovery skill

New skill at skills/feature-spike/. Triggered by /feature-spike or phrases like "spike this", "spike a feature", "throwaway prove this works", "test the risky part first", "is this even possible", "smallest thing that proves the mechanic". Sits before /plan-eng-review and /autoplan in the workflow; explicitly opts out of karpathy-guidelines production discipline for the duration of the spike. Drives a four-phase loop: (1) lock a one-line outcome of the form "I will know X if Y", reject vague goals; (2) prepare isolation (worktree if dirty, branch if clean) and open SPIKE.md as a live ledger with a THROWAWAY SPIKE header marker, then write the leanest possible falsifier; (3) escalate when blocked via /second-opinion first then user, with "block" defined as two attempts on the same blocking predicate and a one-sentence blocker statement required before escalating; (4) complete the verdict (PROVEN / DISPROVEN / INCONCLUSIVE) in SPIKE.md.

Design history worth keeping: the worked example was the sms-hero "draft injection into messages.google.com reply box" arc from a parallel session, where the user wanted ~30 min of throwaway code to prove the React-aware setter trick works before committing to the full LLM-voice-trained-replies feature. Codex /second-opinion pressure-tested the initial four-phase spine and surfaced four load-bearing changes that all made it in: (a) SPIKE.md as a live ledger created at outcome-lock instead of written at the end. This collapses the unreliable "watch for compaction signals" box mechanic into a self-healing artifact, since a ledger that already exists cannot be lost to compaction; (b) git worktree as the preferred isolation when the working tree is dirty, not auto-stash; (c) "same wall" defined by blocking predicate rather than surface error string, with a forced one-sentence blocker statement before escalation; (d) karpathy opt-out as an on-disk header marker in SPIKE.md plus // SPIKE ONLY inline comments, not just prose in the skill body. The codex 5-phase split was rejected as relabeling; the four-phase shape stayed.

Searched before building: mcpmarket lists a "Technical Spike" community skill, but its philosophy (no time-box, verdict-document-centric, code is secondary) does not match the user's "minutes-to-an-hour, code-IS-the-experiment, throw away when done" intent. Built from scratch. No evals (judgment skill like first-principles-thinking). No CHANGELOG.md or references/ yet (lean v1).

[skill][first-principles-thinking] Added; goal-first reframe coaching skill

New skill at skills/first-principles-thinking/. Triggered by /first-principles-thinking or phrases like "first principles this", "reframe from first principles", "challenge the assumptions", "Musk this", "what's the ultimate goal here". Operates on three input modes (in-flight conversation, referenced plan file, standalone problem statement), auto-detected. Drives a seven-step walk: goal, success signal, hard constraints, assumed constraints, current path, constraint-class attacks (3+ reframes each attacking a different class), pressure-test. Modeled on the SpaceX rocket-floor and Neuralink no-surgeons examples; both worked examples ship in the skill body.

Default operating mode is Light (agent drafts each step, user corrects); escalates to Deep (Socratic, one question per turn) only when user pushes back or context is missing. No upfront mode-pick gate. Artifact behavior is chat-only by default; asks before writing to a plan file or FIRST-PRINCIPLES.md.

Design history worth keeping: original draft had a six-step flow centered on "find the floor" as a mandatory step. Codex second opinion pushed back hard on this (false-floors risk, numeric-fetish risk, gap-theater on problems with no real floor) and reframed it as "separate goal, constraints, assumptions, and solution path" with floor as an optional subtype of hard constraints. The seven-step flow and the constraint-class-diversity rule for reframes both came from that critique. Decision: incorporate, ship only SpaceX + Neuralink examples (user did not have a third real one yet). No evals (judgment skill, not verifiable). No CHANGELOG.md or references/ yet (lean v1).


2026-05-20

[bundle][qa-quincey] Created QA Quincey bundle; absorbed qa-headless and replaced qa-quincey-browser

New bundle at qa-quincey/ mirroring the pm-penny/ shape: shared/core.md carries the QA Quincey persona, report/plan storage conventions under ~/.gstack/projects/<slug>/qa-quincey/, deviation category vocabulary (LAYOUT, COPY, COLOR, TYPOGRAPHY, MISSING, EXTRA, STATE), the reconcile loop, and the verdict rubric (PASS / DEVIATIONS / FAIL). Two sub-skills live under qa-quincey/skills/:

  • qa-quincey-manual-browser-testing (flagship, new): AI-autonomous defined-flow browser QA against Pencil mockups. Intake from GitHub issue / .pen file / free-form / saved plan. AI narrative visual diff (no pixel diff in v1) per the prompt in references/visual-diff-prompt.md. Chrome profile-aware cookie probe documented in references/cookie-profile-probe.md. Happy-path extraction rules in references/happy-path-extraction.md.
  • qa-quincey-manual-headless-testing (migrated, was skills/qa-headless/): same backend-feature QA workflow as before, now loading shared/core.md at Step 0.5 and writing artifacts under .gstack/qa-quincey/headless-reports/ and .gstack/qa-quincey/headless-golden/ (was .gstack/qa-headless-reports/ and .gstack/qa-headless/golden/). Version bumped to 2.0.0 because the path change and slash-command rename are breaking.

Deleted: skills/qa-quincey-browser/ (the labeled-Chromium-only skill; its scope is now subsumed by the flagship's autonomous browser-driving workflow) and skills/qa-headless/ (moved into the bundle). Stale ~/.claude/skills/qa-quincey-browser and ~/.claude/skills/qa-headless symlinks were removed and ./install regenerated the new bundle symlinks. No alias left behind for /qa-headless per "full refactor" intent; users invoking the old name will get an unknown-skill error and will need to retype.

Design intent: QA Quincey is an agent concept that will accrete sibling skills (mobile, accessibility, performance regression). The bundle gives them one identity file to share so persona and report shape stay consistent. Same pattern as pm-penny. Decided in a /handoff sidequest started from the per-person-memory deploy session; alignment questions answered: AI autonomous (not human-driven, not hybrid), AI narrative visual diff (not pixel diff).

[skill][coderabbit-config] Added, rescued from orphan state

New skill: generates tailored .coderabbit.yaml per repo (inspects languages, monorepo shape, generated/vendored dirs, lifts conventions from CLAUDE.md/AGENTS.md). Originally created by a sibling session directly under ~/.claude/skills/coderabbit-config/, which meant it wouldn't survive a fresh clone of any repo and wasn't published anywhere. Moved into this repo's skills/ so it's installable via the standard ./install symlink flow. Picked gstack-extensions over mutwo because the user explicitly chose it; the skill has no gstack-specific deps so mutwo would also have fit. Shipped via PR #8.

[skill][pr-watcher] Fresh-watch baselines now filter to addressed CR items only

Old behavior was dead on arrival in the common case: /ship opens a PR, CodeRabbit posts a review, user invokes /pr-watcher, the skill silently baselined every existing CR comment ("watch from now forward only"), the sensor returned already_settled, and the user had to address the comments by hand anyway. The "watch forward" framing made sense for long-lived PRs with backlogs the user had already triaged, but that case turned out to be hypothetical: the user always either replies to a CR comment when triaging or pushes a commit referencing its URL.

New behavior: on a fresh watch (all three baseline files are empty arrays), only baseline CR items that already look addressed. "Addressed" means either (a) the item is an inline review_comment in a thread where any non-CR author also commented (uses GraphQL reviewThreads, since REST doesn't expose thread structure), or (b) the item's html_url appears in some other PR comment body (the watcher's own "Addressed in <sha>" template and most manual triage replies both include the URL). Unaddressed items get left out of the baseline so the first sensor cycle returns them and Step 4 processes them normally. The opt-out for "I decided not to fix this" becomes uniform with the rest of the skill: reply with the reason and the next fresh watch baselines it.

API failures during seeding fall back to empty baselines for that stream. The dispatcher's status_ping / nitpick_only classification in Step 4b absorbs CR's walkthrough and "Actionable comments posted: 0" noise without making edits, so the worst case is one extra processing cycle, not an unwanted commit. PR #6 carried the change; addressed both CR nitpicks (per_page pagination tradeoff documented inline, FRESH variable renamed to IS_FRESH with positive logic) and merged via squash same session.

Dogfood bug found and shipped in the same PR: the original filter_url_referenced helper used select($bodies | contains(.html_url)), which jq evaluates with the current input being the piped-in $bodies string, not the iterated object. Substring check fails with Cannot index string with string "html_url". Fix: bind the URL first (.html_url as $url | select($bodies | contains($url))). Worth remembering: inside a pipe | function(.field) shape, jq re-evaluates .field against the piped value, not the outer iteration context.


2026-05-18

[repo][layout] Promoted bundles out of plugins/, dropped plugin-marketplace install path

Restructured plugins/pm-penny/ and plugins/feature-frank/ to top-level pm-penny/ and feature-frank/, removed the now-empty plugins/ dir, and deleted both .claude-plugin/plugin.json manifests. The Claude Code plugin marketplace install path (enabledPlugins: pm-penny@mj-claude pointing at /Users/mujtaba/dev/claude) was already orphaned: that source directory no longer exists, and ~/.claude/plugins/cache/local-agents/pm-penny/ was marked .orphaned_at. The live install has been via ./install symlinks for some time, so the plugin shape was dead weight. Decided not to commit to a bundles/ wrapper directory either: a category dir that only ever has two members is more clutter than the word "plugin" was. Generalized the install script to walk any top-level dir (other than skills/) that has its own skills/<sub-skill>/ tree. Renamed "plugin root" to "bundle root" across the 4 SKILL.md files and the README/INDEX so the language matches the install model. Decision validated via /second-opinion panel (Claude/Codex/Gemini) — all three converged on "don't flatten the sub-skills, keep shared/ co-located"; the steelman of the consensus (promote bundles to top level without a wrapper dir) is what we picked.

[skill][pm-penny] Added --fast mode to pm-penny-feature and pm-penny-bug

New shared/fast-mode.md in pm-penny. Triggered by /pm-penny-feature --fast <description> or /pm-penny-bug --fast <description>. Skips discovery, scope gate (feature), repro gate (bug, marks reproduction as waived), and the pre-creation preview. Calibrates by reading the past 30 issues in the repo plus a learnings log at ~/.claude/pm-penny/fast-learnings.md. Files immediately, shows the rendered body. If the user requests a change in the next turn, applies the edit on GitHub via gh issue edit and appends a one-line rule to fast-learnings.md under an H2 keyed on owner/repo. The learnings log is user-scoped, not in any repo, and grows by correction. Why both fast mode and the layout migration in one session: the --fast work surfaced the misleading "plugin" name during discussion, which led to the restructure.

[skill][pm-penny] Consolidated user-scoped state into ~/.claude/pm-penny/

Penny's state was split across two dirs: ~/.claude/plugins/pm-penny/config.json (issue-source config used by pm-penny-next-issue) and ~/.claude/pm-penny/<repo>.json (per-repo project-board cache used by shared/core.md). Moved the lone outlier to ~/.claude/pm-penny/config.json, removed the empty ~/.claude/plugins/pm-penny/ dir, updated the two path references in pm-penny-next-issue/SKILL.md. New flat layout under ~/.claude/pm-penny/: config.json (global), <repo>.json (per-repo cache), fast-learnings.md (fast-mode log). Added a "State location rule" guardrail to shared/core.md and a parallel note in shared/fast-mode.md instructing future Claude never to write Penny state inside a project repo. Also added /pm-penny/*.json and /pm-penny/fast-learnings.md to .gitignore as belt-and-suspenders against a stray write into the in-repo pm-penny/ bundle dir (which now shares its basename with the user-scoped state dir).


2026-05-17

[skill][pr-watcher] Dropped the test-command gate from the skill contract

Removed Step 2's TEST_CMD discovery + AskUserQuestion fallback, Step 4d's "run tests before push" step, the test-failure row in the failure-handling table, and the "do not skip the test command before pushing" rule at the bottom. Test gating was there to prevent the watcher from pushing a CR-induced fix that broke the suite, but in practice it caused friction on every invocation against a repo without a configured test framework: the watcher asked for a command, then if the answer was a no-op like true it was theatre, and if the answer was a real command in a repo with no tests it never resolved. The user hit this on the first run against mutwo (a docs + bash repo with no tests) and the workflow halted on the question. Trade-off accepted: CR-induced fixes now push without an automated verification step. The mitigation is that CR's own re-review on the new HEAD will catch a regression on the next cycle, and the change-per-finding atomicity (one commit per CR comment with a clear Address CodeRabbit: ... subject) makes any post-merge revert surgical. Skill version not bumped because the contract is strictly looser, no caller has to change anything.

[skill][pr-watcher] v3 staleness fixes: sensor init pass, all-clear exit, cr_failure, fallback hardening

Shipped via PR #3 squash-merge to main as commit 136f821. Three original causes of the 30-minute stale-wait, plus five robustness gaps surfaced over four /codex review iterations, all closed.

Original three (the user's complaint that opened the session): (1) sensor initialized last_terminal_status_updated_at = null and required current_updated_at > null, which is always falsy when CR was already terminal at spawn. Init pass added before the 15s loop returns already_settled (clean success), cr_failure (terminal failure/error), or new_cr_feedback (CR has unprocessed items meeting a settling condition) immediately. (2) After a nitpicks-only batch with zero commits pushed, the dispatcher spawned another sensor that had nothing to wait for. Step 4h added: explicit "loop until CR has nothing for us" exit when batch processed only nitpicks/status_pings and no commits were pushed. (3) idle_timeout default flipped from "keep watching" to "stop", since 30 minutes of silence after watcher start overwhelmingly means CR is done.

Codex review surfaced five additional gaps across four iterations (1× [P1] + 4× [P2], all addressed). [P1] the 15s loop's step 2c had the same null-baseline bug as the init pass; treats null explicitly as "no prior terminal seen" now. [P2] already_settled was firing for failure/error states, contradicting Step 4h's "keep alive for human inspection" rule; success-only now, with new cr_failure outcome for failure/error at init that exits with a clear warning citing the failed status. [P2] step 0e returned new_cr_feedback on any non-empty INIT_NEW regardless of CR status, risking partial-batch dispatch; now requires a settling condition (terminal status, sentinel marker, or quiet_period). [P2] step 4(c) quiet_period used updated_at only; reviews expose submitted_at, so age computation never resolved for review-only batches; effective-timestamp fallback added. [P2] step 2d's fallback counter only advanced on absent status; a stuck pending would hide late comments; now advances on both absent AND pending.

Live validation on test PR #2 (closed without merging): three substantive CR review rounds + clean exit with zero user intervention. Sensor cycles returned in 0-31 s across all rounds (vs the pre-fix 30-min idle_timeout). Final cycle exited 23 s after CR's terminal success.

Operational note: the /codex skill's hardcoded --base BRANCH "<PROMPT>" invocation broke at v0.130.0 (the two are now mutually exclusive). See [[reference_codex_cli_breaking_changes]].


2026-05-16

[skill][qa-quincey-browser] New skill: visible Chromium with "QA Quincey | …" tab-title prefix

User wanted to visually distinguish a QA-mode browser window from regular dogfood browsing. The clean fix would be a one-time CDP Page.addScriptToEvaluateOnNewDocument injection, but gstack browse's CDP allowlist is deny-default and that method isn't on it. Rather than patch gstack core (which /gstack-upgrade would clobber), the skill connects the headed browser the same way /open-gstack-browser does, then spawns a tiny background bash loop that re-applies the prefix to the active tab every 2 seconds via browse js. PID lands at ~/.gstack/qa-quincey-title.pid so the next invocation's pre-flight can kill stale loops. One gotcha worth recording: document.title getter strips trailing whitespace per HTML spec, so a startsWith("QA Quincey | ") check (with trailing space) returns false against the trimmed string and the poller re-prefixes infinitely ("QA Quincey | QA Quincey | …"). The shipped script uses a regex that strips any number of leading QA Quincey |… runs and re-adds exactly one, making the poll idempotent regardless of starting state. Verified live: prefix persists across goto calls (Hesco → Hacker News → admin/login).


2026-05-14

[meta][schema] Bootstrapped the project schema (CLAUDE.md / LOG.md / INDEX.md)

Repo had none of the three. Adopted the ~/dev/ convention (per /Users/mujtaba/dev/CLAUDE.md) and adapted the unbound/ template to this repo's reality: single-purpose skills repo, no sub-repos, no DESIGN.md, no features folder. CLAUDE.md documents the skill layout convention (each skills/<name>/SKILL.md is the entry point; description is load-bearing; optional CHANGELOG.md per skill). INDEX.md catalogs the two current skills and the install scripts.

[meta][plugins] Absorbed pm-penny and feature-frank from the deprecated mj-claude repo

The mj-claude marketplace repo (mujtaba3B/mj-claude) was being archived; its two plugins (PM Penny, Feature Frank) needed a home. Moved them into a new plugins/ directory parallel to skills/. Kept the plugin's internal structure (shared/*.md plus skills/<sub-skill>/SKILL.md) so the sub-skills' "load shared/core.md from the plugin root" instructions keep resolving. Extended ./install to also symlink plugins/*/skills/*/ into ~/.claude/skills/; resolution of shared/*.md through the symlink works because the symlink targets a real directory whose parent contains shared/. Updated README.md and INDEX.md to document the plugin layout. Rationale for keeping plugin shape (rather than flattening into skills/): three pm-penny sub-skills share core.md, repro-gate.md, and scope-gate.md; duplicating those into each sub-skill folder would be worse than the small install-script extension.

[skill][pr-watcher] Architecture v2: dispatcher + sensor split

Replaced the 9-minute Bash polling loop with a passive sensor subagent that blocks up to 30 minutes in one Agent call and returns a single JSON blob when CodeRabbit posts a settled round of feedback. Main agent now applies fixes, runs tests, commits, pushes, and replies on the PR itself (subagents only sense). Reasons: (1) one Agent call can wait for real signal up to 30 min vs. re-entering Bash every 9 min, (2) main context absorbs one short JSON per cycle instead of minutes of "tick" output, (3) main-owned git + one sensor at a time eliminates concurrent-edit risk. Live validation: PR #49 on Healthcare-Super-Connector/hesco landed via this pattern, two CR rounds, < 5 min total. Skill CHANGELOG entry added at skills/pr-watcher/CHANGELOG.md.