apecloud · weicao · May 6, 2026 · May 6, 2026
diff --git a/skills/README.md b/skills/README.md
@@ -56,6 +56,24 @@ Some skills are distilled from the legacy `apecloud/kubeblocks-addon-skills` rep
 
 These skills were adapted to include standard `name` / `description` frontmatter. Their old `docs/...` and `scripts/...` references are bundled under `skills/kubeblocks-addon-source-docs/`; when installed as user skills, copy that support directory beside the individual skill directories.
 
+## Methodology companion skills
+
+Some skills are distilled directly from the methodology guides in `docs/`. These are used when the guide contains a repeatable decision tree, preflight, review checklist, or closeout protocol:
+
+- `bounded-eventual-convergence` — companion to `docs/addon-bounded-eventual-convergence-guide.md`
+- `evidence-discipline` — companion to `docs/addon-evidence-discipline-guide.md`
+- `first-blocker-classification` — companion to `docs/addon-test-acceptance-and-first-blocker-guide.md`
+- `github-submission-discipline` — companion to `docs/addon-github-submission-discipline-guide.md`
+- `paramdef-range-validation` — companion to `docs/addon-paramdef-cue-range-validation-guide.md`
+- `plain-language-with-westonnnn` — communication rule for westonnnn-facing messages
+- `probe-soft-failure` — companion to `docs/addon-probe-timeout-and-soft-failure-guide.md`
+- `runner-portability` — companion to `docs/addon-test-runner-portability-guide.md`
+- `soak-test-classification` — companion to `docs/addon-soak-test-result-classification-guide.md`
+- `team-skill-rollout` — companion to `docs/addon-agent-skill-rollout-guide.md`
+- `xp-design-contract-review` — companion to `docs/addon-design-contract-review-during-xp-guide.md`
+
+Treat this list as the first conversion batch. Future conversions should keep the same rule: only turn a document into a skill when it changes an agent's behavior at the moment of coding, testing, reviewing, debugging, or reporting.
+
 ## Distribution
 
 Each addon team member loads skills via one of:

diff --git a/skills/github-submission-discipline/SKILL.md b/skills/github-submission-discipline/SKILL.md
@@ -0,0 +1,105 @@
+---
+name: github-submission-discipline
+description: Use before committing, amending, pushing, force-pushing, or merging public GitHub PRs from AI agents. Prevents AI attribution leakage and multi-agent branch overwrite incidents by enforcing clean commit bodies, trailer grep checks, fetch/log comparison, and single-owner recovery.
+allowed-tools: Bash(git *) Bash(gh *) Bash(rg *) Bash(grep *)
+---
+
+# GitHub Submission Discipline
+
+## Hard Rules
+
+1. **No AI attribution in public commits or PR bodies.** Remove `Co-Authored-By`, generated-by banners, model names, and vendor noreply addresses.
+2. **Check every commit body, not only the latest.** Squash and amend can keep old body text.
+3. **Fetch before force-push.** `--force-with-lease` protects only against the last fetched remote tip, not a teammate push you never fetched.
+4. **Compare local and remote commit chains before pushing shared PR branches.**
+5. **One recovery owner at a time.** When a branch was overwritten, stop parallel fixes and let one person rebuild the chain.
+6. **Dropped commit owner verifies content.** The owner of the dropped work checks the recovered diff, not only the person who rebuilt the branch.
+
+## When To Invoke
+
+Use this skill when:
+
+- Creating a commit for a public GitHub repo.
+- Amending or squashing a PR.
+- Pushing or force-pushing a branch other agents may touch.
+- Reviewing a PR for merge readiness.
+- Recovering from a missing commit, unexpected rebase, or force-push race.
+
+## Pre-Push Checks
+
+Run before any push:
+
+```bash
+git fetch origin
+git log --oneline --decorate --graph --max-count=12 HEAD
+git log --oneline --decorate --graph --max-count=12 origin/$(git branch --show-current) 2>/dev/null || true
+
+git log origin/main..HEAD --format='%H' | while read c; do
+  git show -s --format='%B' "$c" | grep -Ei 'Co-Authored-By|Generated with|Anthropic|Claude|OpenAI|🤖|noreply@anthropic' && {
+    echo "AI attribution found in $c" >&2
+    exit 1
+  }
+done
+```
+
+Also inspect the PR body before requesting review:
+
+```bash
+gh pr view --json body --jq .body | grep -Ei 'Co-Authored-By|Generated with|Anthropic|Claude|OpenAI|🤖|noreply@anthropic' && exit 1
+```
+
+## Commit Message Pattern
+
+Use a normal human commit:
+
+```text
+docs: add probe soft-failure skill
+
+Summarize what changed and why.
+```
+
+Do not add model signatures, generated-by sections, or AI co-author trailers.
+
+## Shared Branch Push Protocol
+
+Before force-pushing:
+
+1. `git fetch origin`
+2. Compare `HEAD` vs remote branch.
+3. Confirm the commits you expect are present.
+4. Use `git push --force-with-lease`.
+5. Immediately re-read remote tip:
+   ```bash
+   git ls-remote origin "$(git branch --show-current)"
+   ```
+
+If the remote contains work you did not expect, stop and coordinate. Do not overwrite.
+
+## Recovery Protocol
+
+If a commit disappeared:
+
+1. Freeze branch writes.
+2. Identify remote tip before and after the bad push.
+3. Reconstruct the intended chain in one local checkout.
+4. Ask each dropped-commit owner to verify their diff is present.
+5. Push once with `--force-with-lease`.
+6. Run the attribution grep again across the final chain.
+
+## Closeout Language
+
+Good:
+
+```text
+Submission check passed: PR body clean; commits origin/main..HEAD clean for AI attribution; local chain matches remote tip after fetch; pushed with force-with-lease.
+```
+
+Bad:
+
+```text
+Looks clean.
+```
+
+## Related Docs
+
+- `docs/addon-github-submission-discipline-guide.md`
diff --git a/skills/paramdef-range-validation/SKILL.md b/skills/paramdef-range-validation/SKILL.md
@@ -0,0 +1,88 @@
+---
+name: paramdef-range-validation
+description: Use when adding or reviewing KubeBlocks ParametersDefinition cue/tpl numeric ranges, or when a reconfigure OpsRequest passes validation but the database cannot restart. Forces practical_min/practical_max validation instead of copying vendor hard limits.
+allowed-tools: Bash(kubectl *) Bash(helm *) Bash(rg *) Read
+---
+
+# Parameter Range Validation
+
+## Hard Rules
+
+1. **Schema guards startup, not documentation theory.** A range in ParametersDefinition must block values that make the database fail to start.
+2. **Use practical_min / practical_max, not doc_hard_min / doc_hard_max.** Vendor docs often describe parser limits, not values that can run a real instance.
+3. **Every numeric lower bound needs evidence.** If you write `>=N`, be able to say how `N` was tested.
+4. **Always set an upper bound.** A lower-only range still allows values that can exhaust PID, memory, file descriptor, or engine resource limits.
+5. **Version upgrades reset the evidence.** Do not reuse a 12c / 5.7 / old-major range for a new major engine version without a boundary test.
+6. **Separate default, recommendation, and guardrail.** The schema rejects unsafe values; defaults and docs recommend good values.
+
+## When To Invoke
+
+Use this skill when:
+
+- Writing a new `ParametersDefinition` / cue / tpl file.
+- Reviewing any `int & >=...`, `int & <=...`, min/max, or enum-like numeric range.
+- Debugging reconfigure where validation succeeded but the engine failed to restart.
+- Upgrading an addon to a new major engine version or KubeBlocks parameter API version.
+
+## Review Checklist
+
+For each numeric parameter:
+
+1. Identify the source of each bound:
+   - `doc_hard_min/max`
+   - observed startup boundary
+   - resource cap
+   - arbitrary default copied from examples
+2. Require a startup proof for `practical_min`:
+   - value at `practical_min` starts successfully
+   - value at `practical_min - 1` is rejected by schema or demonstrably unsafe
+3. Require a resource proof for `practical_max`:
+   - container resources can tolerate the value
+   - engine hard max is not exceeded
+4. Add a negative validation case:
+   - unsafe low value must fail during ValidatePhase
+   - unsafe high value must fail during ValidatePhase
+5. Confirm runtime readback after a valid reconfigure:
+   - OpsRequest success alone is not enough
+   - read the parameter from the running engine
+
+## Incident Triage
+
+If reconfigure is stuck after a numeric change:
+
+1. Check whether the value passed schema validation.
+2. Check engine logs for startup failure after writing that value.
+3. If the value is below practical startup requirements, classify the bug as **schema guard too weak**.
+4. Fix the schema first. Do not paper over it by only changing the test.
+5. Add a boundary-negative test so the same value is rejected before engine restart.
+
+## Bad Patterns
+
+| Pattern | Why It Fails | Fix |
+|---|---|---|
+| Copy vendor minimum directly | Parser accepts it, engine may not start | Use tested practical minimum |
+| Lower bound only | Huge values can exhaust resources | Add practical upper bound |
+| Default as schema minimum | Rejects values that are safe but not recommended | Keep default separate |
+| OpsRequest Succeed as proof | Runtime may not apply or engine may restart later | Perform engine readback |
+| Shared range across major versions | Startup requirements change | Split by engine version |
+
+## Closeout Language
+
+Good:
+
+```text
+Checked parameter range for `processes`: schema now uses practical_min=100, not vendor hard_min=6. Boundary negative `99` is rejected in ValidatePhase; `100` starts and runtime readback confirms applied value.
+```
+
+Bad:
+
+```text
+Range looks fine; copied from docs.
+```
+
+## Related Docs
+
+- `docs/addon-paramdef-cue-range-validation-guide.md`
+- `docs/addon-reconfigure-guide.md`
+- `docs/addon-reconfigure-version-skew-guide.md`
+- `docs/cases/oracle/oracle-12c-processes-cue-paramdef-range-case.md`
diff --git a/skills/probe-soft-failure/SKILL.md b/skills/probe-soft-failure/SKILL.md
@@ -0,0 +1,94 @@
+---
+name: probe-soft-failure
+description: Use when writing or reviewing addon livenessProbe/readinessProbe scripts, or diagnosing unexpected restarts/readiness flaps. Forces channel errors such as client timeout or connection refusal to be treated as transient unless the client succeeds and output proves a bad database state.
+allowed-tools: Bash(kubectl *) Bash(rg *) Read
+---
+
+# Probe Soft Failure
+
+## Hard Rules
+
+1. **Probe exit code must represent product state, not client transport state.**
+2. **Every external client call must be wrapped in a shorter timeout than the probe timeout.**
+3. **Client `rc != 0` is transient / unknown, so probe should usually `exit 0`.**
+4. **Only `client succeeded + output proves bad state` may `exit 1`.**
+5. **Known legal slow windows must short-circuit before the client call.** Use a narrow process guard when startup, DBCA, backup, restore, switchover, or role setup is known to make the control plane slow.
+6. **Probe timing must fit the slowest legal operation.** Size `initialDelaySeconds`, `periodSeconds`, `timeoutSeconds`, and `failureThreshold` from real slow windows.
+7. **Readiness can be conservative; liveness must be careful.** A false liveness failure kills the container and can create a restart cascade.
+
+## When To Invoke
+
+Use this skill when:
+
+- Adding or reviewing `livenessProbe` / `readinessProbe` in `ComponentDefinition`.
+- Writing shell scripts called by K8s probes.
+- Seeing restarts increase while engine logs do not show a real crash.
+- Seeing readiness flap after switchover, reconfigure, backup/restore, role rebuild, or startup.
+- Reviewing PRs that use `mysql`, `redis-cli`, `sqlplus`, `dgmgrl`, `psql`, or similar clients in probes.
+
+## Probe Script Shape
+
+Use this shape unless the engine has a stronger local health API:
+
+```bash
+# 1. Known legal slow window guard
+if pgrep -f '<init-or-role-change-marker>' >/dev/null 2>&1; then
+  echo "probe: legal slow window, soft pass"
+  exit 0
+fi
+
+# 2. Client call with explicit timeout
+out=$(timeout 25 <engine-client-command> 2>&1)
+rc=$?
+
+# 3. Transport/client failure is unknown, not product failure
+if [ "$rc" -ne 0 ]; then
+  echo "probe: client unavailable rc=$rc, soft pass"
+  exit 0
+fi
+
+# 4. Only parsed bad state fails
+case "$out" in
+  *'<known-good-state>'*) exit 0 ;;
+  *'<known-bad-state>'*)  echo "probe: bad state: $out"; exit 1 ;;
+  *)                     echo "probe: unknown output, soft pass"; exit 0 ;;
+esac
+```
+
+## Review Checklist
+
+Before approving a probe:
+
+1. Find every command substitution and pipeline that calls an engine client.
+2. Confirm each client call has an explicit timeout.
+3. Confirm `rc != 0` does not directly become `exit 1`.
+4. Confirm `exit 1` paths are based on parsed output from a successful client call.
+5. Confirm legal slow windows are guarded before the client call.
+6. Confirm probe timing leaves enough time for the slowest legal operation.
+7. Confirm liveness and readiness semantics are not copied blindly from each other.
+
+## Incident Triage
+
+If a pod is repeatedly killed by liveness:
+
+1. Compare restart timestamps with DB startup / role-change / backup / restore activity.
+2. Read previous container logs and probe script output.
+3. If probe failed on timeout, connection refusal, or transient client error, classify as **channel error promoted to product failure**.
+4. Patch the probe script semantics first.
+5. Re-run the same lifecycle path and verify restarts stop.
+
+## Bad Patterns
+
+| Pattern | Risk | Fix |
+|---|---|---|
+| `client || exit 1` | Kills during legal slow window | Treat client failure as unknown |
+| `stderr contains ORA/ERR => exit 1` | Transient startup errors become product failure | Require successful readback + parsed state |
+| No explicit `timeout` | Probe SIGKILLs client mid-call | Wrap client with `timeout N` |
+| Broad `pgrep` guard | Probe always soft-passes | Use narrow marker strings |
+| Liveness checks role readiness | Role changes kill otherwise healthy DB | Keep liveness minimal |
+
+## Related Docs
+
+- `docs/addon-probe-timeout-and-soft-failure-guide.md`
+- `docs/addon-test-probe-classification-guide.md`
+- `docs/cases/oracle/oracle-12c-post-switchover-probe-cascade-kill.md`