some recent changes #251

Merged
plusplusoneplusplus merged 24 commits into main from pr/recent-changes
May 30, 2026

@plusplusoneplusplus
Owner

No description provided.

plusplusoneplusplus and others added 22 commits May 30, 2026 10:41
- buildRalphSynthesisPrompt() accepts optional seedGoal param (AC-01):
  when provided, injects the pre-existing ## Goal block with an
  authoritative-preserve instruction so the model keeps all [decision]
  tags and constraints verbatim and only expands missing slots.

- Improved RALPH_SYNTHESIS_PROMPT_BASE (AC-03): explicitly instructs the
  model to reconstruct decisions ([decision] tags), constraints, ACs with
  DoD bullets, and assumptions — no information from the conversation
  should be omitted.

- ralph-promote-routes.ts (AC-02): inspects the last assistant
  conversationTurn for a ## Goal block via regex; if found, extracts the
  block and passes it as seedGoal to buildRalphSynthesisPrompt().

- Tests: updated synthesis-prompt.test.ts (replaced stale snapshot
  assertions, added seedGoal coverage); added three new route tests for
  seed detection (match, no-match, last-turn-only).
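
A minimal sketch of the seed detection described above, assuming a simplified turn shape. `ConversationTurn`, `extractGoalBlock`, and `findSeedGoal` are hypothetical names, and the rule that the block ends at the next level-1 or level-2 heading is an assumption; the actual regex in ralph-promote-routes.ts is not shown in this PR.

```typescript
// Hypothetical turn shape; the real conversationTurn type is not shown here.
interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
}

// Returns the "## Goal" block (heading plus body), or undefined if absent.
// Assumption: the block runs until the next "# " or "## " heading, or end of text.
function extractGoalBlock(text: string): string | undefined {
  const start = /^## Goal\b.*$/m.exec(text);
  if (!start) return undefined;
  const rest = text.slice(start.index + start[0].length);
  const next = /^#{1,2} /m.exec(rest);
  const body = next ? rest.slice(0, next.index) : rest;
  return (start[0] + body).trimEnd();
}

// Only the last assistant turn is inspected; earlier turns are ignored.
function findSeedGoal(turns: ConversationTurn[]): string | undefined {
  for (let i = turns.length - 1; i >= 0; i--) {
    const turn = turns[i];
    if (turn && turn.role === "assistant") return extractGoalBlock(turn.content);
  }
  return undefined;
}
```

The three cases mirror the route tests listed above: a match, no match, and a Goal block that appears only in an earlier assistant turn (which is not used).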

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep schedule run records active after successful enqueue until the queued task reaches a terminal state. Add regression coverage for queue success, queue failure, and enqueue failure lifecycle boundaries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep overlapped schedule timer fires from enqueueing duplicate work by recording a missed run and deferring the next timer until the active run completes.
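
The overlap handling can be pictured as a small guard around the timer callback. This is a sketch only: `ScheduleOverlapGuard` and its methods are hypothetical names, not the ScheduleExecutor API, and persistence of the missed-run record is left out.

```typescript
// Hypothetical guard; illustrates "record a missed run, defer the next timer".
class ScheduleOverlapGuard {
  private activeRun = false;
  private missedRuns = 0;

  // Called when the schedule timer fires.
  // Returns "enqueue" if work should be queued, "missed" if a run is still active.
  onTimerFire(): "enqueue" | "missed" {
    if (this.activeRun) {
      this.missedRuns += 1; // record the overlap instead of enqueueing a duplicate
      return "missed";
    }
    this.activeRun = true;
    return "enqueue";
  }

  // Called when the active run reaches a terminal state.
  // Returns how many fires were missed; the caller re-arms the next timer here.
  onRunTerminal(): number {
    const missed = this.missedRuns;
    this.activeRun = false;
    this.missedRuns = 0;
    return missed;
  }
}
```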

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After commit f56a824a8, ScheduleExecutor.executeRun() became async and
awaits task completion before returning. In test environments with no AI
executor, queued tasks fail immediately, so POST /schedules/:id/run now
returns a run with status 'failed'.

Update the three affected integration test assertions to also accept
'failed' alongside 'running'/'completed':
- schedule-concurrent.test.ts
- schedule-handler.test.ts
- schedule-pause-markers.test.ts

Also fix a stale comment in schedule-mutation-during-run.test.ts that
incorrectly described executeRun() as synchronous.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use a validation-only system prompt for Ralph final-check tasks while keeping autopilot execution enabled. This prevents the final-check phase from receiving normal implementation-loop instructions that tell it to edit code and commit.

Add regression coverage for final-check system-message construction and executor dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Render completed ask_user calls as read-only history cards in Activity, keep them visible outside whisper collapse, and improve the fallback tool summary for batched questions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a compact "Open PR" input row at the top of the Pull Requests tab
queue panel. Bare numeric input resolves against the active workspace
repo; full GitHub PR URLs are parsed and matched against registered
workspaces by repository remote URL, then opened in the existing internal
PR detail/conversation view (#repos/<repoId>/pull-requests/<n>/overview).

Validation calls the existing single-PR API before navigating, so closed,
merged, or non-listed PRs still open, while PRs that do not exist surface an
inline error. Unmatched URL repositories produce a "repository not registered"
inline error with no auto-registration and no external navigation.
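
A rough sketch of the input handling described above. The function and type names (`parseOpenPrInput`, `remoteSlug`, `findWorkspaceForPr`, `Workspace`) are hypothetical, and the remote-URL normalization is an assumption; only the `https://github.com/<owner>/<repo>/pull/<n>` URL shape and the internal hash route come from GitHub and this commit message respectively.

```typescript
type OpenPrInput =
  | { kind: "number"; prNumber: number }
  | { kind: "url"; owner: string; repo: string; prNumber: number }
  | { kind: "invalid" };

// Bare numbers resolve against the active repo; full PR URLs carry owner/repo.
function parseOpenPrInput(raw: string): OpenPrInput {
  const input = raw.trim();
  if (/^[1-9]\d*$/.test(input)) return { kind: "number", prNumber: Number(input) };
  const m = /^https?:\/\/github\.com\/([^/\s]+)\/([^/\s]+)\/pull\/([1-9]\d*)(?:[/?#].*)?$/.exec(input);
  if (!m) return { kind: "invalid" };
  const [, owner = "", repo = "", num = ""] = m;
  return { kind: "url", owner, repo, prNumber: Number(num) };
}

// Hypothetical workspace record; assumes each workspace stores its git remote URL.
interface Workspace {
  id: string;
  remoteUrl: string;
}

// Reduces HTTPS or SSH GitHub remotes to a lower-case "owner/repo" slug.
function remoteSlug(remoteUrl: string): string | undefined {
  const m = /github\.com[:/]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/.exec(remoteUrl.trim());
  return m ? `${m[1]}/${m[2]}`.toLowerCase() : undefined;
}

// Returns undefined when no workspace matches ("repository not registered").
function findWorkspaceForPr(workspaces: Workspace[], owner: string, repo: string): Workspace | undefined {
  const wanted = `${owner}/${repo}`.toLowerCase();
  return workspaces.find((w) => remoteSlug(w.remoteUrl) === wanted);
}

// Internal route from the commit message: #repos/<repoId>/pull-requests/<n>/overview
function toPrRoute(repoId: string, prNumber: number): string {
  return `#repos/${repoId}/pull-requests/${prNumber}/overview`;
}
```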

No new persistent storage, feature flag, or top-level per-repo data path
is introduced.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add gitCommitLookup feature flag (disabled by default) wired through
  CLIConfig -> ResolvedCLIConfig -> RuntimeDashboardConfig -> SPA config
- Add isGitCommitLookupEnabled() helper in SPA utils/config.ts
- RepoGitTab: add handleCommitLookup callback that validates SHA pattern,
  checks loaded list first, then falls back to getCommit API for misses
- RepoGitTab: search input shows '↵ open commit' hint and loading state
  when query looks like a SHA and feature is enabled; Enter triggers lookup
- RepoGitTab: deep-link useEffect attempts getCommit API lookup when commit
  is not in the loaded history page (feature-gated)
- RepoGitTab: render 'Opened commit' section above CommitList for directly
  opened commits not in the loaded page; calls handleSelect on click
- RepoGitTab: show inline commitLookupError near search bar on failure
- Add namespace-registry merge for gitCommitLookup feature flag
- Add 21 new tests covering all new state, callbacks, UI elements, and
  safety constraints (no state-changing git commands in lookup path)
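
The lookup order in handleCommitLookup (validate, check the loaded page, then fall back to the API) might look roughly like this. Everything here is illustrative: `CommitSummary`, `looksLikeSha`, `findLoadedCommit`, and `lookupCommit` are hypothetical names, the 7 to 40 hex-character rule is an assumed SHA pattern, and `getCommit` is passed in as a stand-in for the real API client.

```typescript
// Hypothetical commit shape for illustration.
interface CommitSummary {
  hash: string;
  subject: string;
}

// Assumption: abbreviated or full SHA-1, 7 to 40 hex characters.
const SHA_PATTERN = /^[0-9a-f]{7,40}$/i;

function looksLikeSha(query: string): boolean {
  return SHA_PATTERN.test(query.trim());
}

// Prefix match against the already-loaded history page.
function findLoadedCommit(loaded: CommitSummary[], sha: string): CommitSummary | undefined {
  const needle = sha.trim().toLowerCase();
  return loaded.find((c) => c.hash.toLowerCase().startsWith(needle));
}

// Read-only lookup: the loaded list first, the API only on a miss.
async function lookupCommit(
  query: string,
  loaded: CommitSummary[],
  getCommit: (sha: string) => Promise<CommitSummary>,
): Promise<CommitSummary | undefined> {
  if (!looksLikeSha(query)) return undefined;
  return findLoadedCommit(loaded, query) ?? (await getCommit(query.trim()));
}
```

Checking the loaded list first avoids a network round trip for commits that are already on screen.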

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refresh all 15 outdated specs under packages/coc/specs/ to reflect the
current dashboard SPA, including the Admin shell redesign (embedded
Memory/Skills/Logs/Usage&Costs/Models/Servers panels), the Workflows tab
unification (Workflows + Templates + AI Chat Templates + Prompt & Script
Templates), the deprecated Plans/Tasks tab, the redesigned PR queue, and
the regrouped Repository Settings sidebar. Bumps each affected spec to a
new minor or major version with revision-history entries.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Create packages/forge/resources/bundled-skills/ultra-ralph/SKILL.md with
  five sections (grill, synthesis, execution, iteration, final-check), each
  section delimited by '## Section: <name>' headings (greppable)
- Register ultra-ralph in bundled-skills-registry.ts
- Add ultra-ralph to DEFAULT_BUNDLED_SKILLS in coc/src/config.ts for
  auto-install at server startup (never clobbers user edits)
- Thin the five Ralph prompt call sites to a skill reference + minimal
  machine-contract lines only:
  - RALPH_GRILL_SUFFIX (chat-base-executor.ts)
  - RALPH_SYNTHESIS_PROMPT_BASE (synthesis-prompt.ts)
  - RALPH_BASE_INSTRUCTIONS (ralph-executor.ts)
  - RALPH_SPEC_CONTRACT_PROMPT (iteration-prompt.ts)
  - RALPH_FINAL_CHECK_BASE_INSTRUCTIONS + READ_ONLY_INSTRUCTIONS
- Delete ralph-prompt-overrides.ts; replace with admin-prompt-overrides.ts
  (no RALPH_PROMPT_IDS) for diff classification override still in use
- Remove getPromptOverride calls from all five Ralph prompt sites
- Remove /api/admin/prompts Ralph entries from getBuiltInPrompts()
- Remove Ralph from PromptsPanel GROUP_ORDER in admin SPA
- Add Vitest test for ultra-ralph SKILL.md presence and section checks
- Update all affected test files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Running and queued tasks were always shown with the Copilot (green) color
in ChatListPane regardless of their actual provider. The root cause was
that serializeTaskSummary stripped payload.provider when building the slim
payload for queue-list API responses. ChatListPane resolves colors via
getTaskChatProvider, which checks task.payload.provider among other paths;
with it absent, all tasks fell through to the Copilot green fallback.

Adds payload.provider to slimPayload so Codex (indigo) and Claude (coral)
sessions are colored correctly in the running/queued sidebar list.

Also adds four regression tests covering all three provider values and the
undefined (no provider) case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ensures Ralph follow-on tasks (next iteration, final-check, gap-fix loops)
are admitted ahead of unrelated exclusive backlog in the same workspace,
preventing session interruption.

Core mechanism: add continuationOfSessionId to QueuedTask/CreateTaskInput
and a new insertAsContinuation() method in TaskQueueManager that scans the
queue for the first exclusive task not belonging to the same session and
inserts the continuation before it.
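
As an illustration of the ordering rule, the following sketch works on a plain array. The `exclusive` and `sessionId` fields and the "same session" test are assumptions about the task shape; the real insertAsContinuation() in TaskQueueManager may differ.

```typescript
// Simplified, hypothetical task shape for illustration.
interface QueuedTask {
  id: string;
  exclusive: boolean;
  sessionId?: string;
  continuationOfSessionId?: string;
}

// Inserts `task` before the first exclusive task from a different session.
// If there is no such task, the continuation goes to the end of the queue.
function insertAsContinuation(queue: QueuedTask[], task: QueuedTask): QueuedTask[] {
  const session = task.continuationOfSessionId;
  const idx = queue.findIndex(
    (q) => q.exclusive && q.sessionId !== session && q.continuationOfSessionId !== session,
  );
  const at = idx === -1 ? queue.length : idx;
  return [...queue.slice(0, at), task, ...queue.slice(at)];
}
```

Tasks already queued for the same session keep their place ahead of the new continuation, so follow-on work stays in order while still jumping the unrelated exclusive backlog.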

Changes:
- packages/forge/src/queue/types.ts: add continuationOfSessionId? to QueuedTask
- packages/forge/src/queue/task-queue-manager.ts: add insertAsContinuation(),
  call it from insertTask() when task.continuationOfSessionId is set
- packages/coc/src/server/ralph/types.ts: extend RalphFinalCheckStatus with 'queued'
- packages/coc/src/server/ralph/enqueue-iteration.ts: thread continuationOfSessionId
- packages/coc/src/server/ralph/enqueue-final-check.ts: set continuationOfSessionId
  on final-check payload; buildFinalCheckStartRecord returns status: 'queued' (AC-06)
- packages/coc/src/server/ralph/orchestrate-final-check.ts: set continuationOfSessionId
  on gap-fix loop task (AC-03)
- packages/coc/src/server/queue/queue-executor-bridge.ts: set continuationOfSessionId
  on next-iteration enqueue (AC-01)

Tests:
- forge/test/queue/task-queue-manager.test.ts: 6 new continuation ordering tests
- coc/test/server/ralph/enqueue-final-check.test.ts: expect 'queued' start status,
  assert continuationOfSessionId is set
- coc/test/server/queue-executor-bridge.test.ts: Ralph session queue continuity
  describe block with AC-01 (RALPH_NEXT) and AC-02 (RALPH_COMPLETE final-check) tests

AC-04 (workspace-scoped) is guaranteed by per-repo TaskQueueManager instances.
AC-05 (non-exclusive concurrency) is unaffected.
AC-07 (scheduled Ralph semantics) verified via existing schedule tests (350 pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…imization

Implements a simplified SkillOpt loop (arXiv:2605.23904) under
scripts/skillopt/ that trains a SKILL.md document by driving the
GitHub Copilot CLI non-interactively.

AC-01 Copilot CLI driver (cli-driver.ts)
- runCopilotCli(prompt, workdir, model) with --allow-all-tools
- Typed CliError for non-zero exits; configurable timeout
- captureGitDiff for post-run diff capture

AC-02 Task corpus + schema (corpus.ts + corpus/tasks.json)
- Documented Task schema (id, prompt, seedRef, visibleTests,
  hiddenTests, judgeRubric, split)
- loadCorpus validates schema, split presence, unique IDs
- 5-task seed corpus (3 train, 2 selection)
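
A possible shape for the corpus validation, using the field names listed above. The field types, the two split names (`train` and `selection`, inferred from the seed corpus description), and the reading of "split presence" as "each split has at least one task" are all assumptions; `validateCorpus` is a hypothetical stand-in for loadCorpus without the file I/O.

```typescript
type Split = "train" | "selection";

// Field names come from the commit message; the types are assumed.
interface Task {
  id: string;
  prompt: string;
  seedRef: string;
  visibleTests: string[];
  hiddenTests: string[];
  judgeRubric: string;
  split: Split;
}

const isString = (v: unknown): v is string => typeof v === "string" && v.length > 0;
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");
const isSplit = (v: unknown): v is Split => v === "train" || v === "selection";

function validateCorpus(raw: unknown): Task[] {
  if (!Array.isArray(raw)) throw new Error("corpus must be an array of tasks");
  const ids = new Set<string>();
  const tasks: Task[] = [];
  for (const entry of raw) {
    const fields = (entry ?? {}) as Record<string, unknown>;
    const { id, prompt, seedRef, visibleTests, hiddenTests, judgeRubric, split } = fields;
    if (!isString(id) || !isString(prompt) || !isString(seedRef) || !isString(judgeRubric)) {
      throw new Error("task is missing a required string field");
    }
    if (!isStringArray(visibleTests) || !isStringArray(hiddenTests)) {
      throw new Error(`task ${id}: visibleTests and hiddenTests must be string arrays`);
    }
    if (!isSplit(split)) throw new Error(`task ${id}: unknown split`);
    if (ids.has(id)) throw new Error(`duplicate task id: ${id}`);
    ids.add(id);
    tasks.push({ id, prompt, seedRef, visibleTests, hiddenTests, judgeRubric, split });
  }
  // Assumed meaning of "split presence": every split has at least one task.
  for (const s of ["train", "selection"] as const) {
    if (!tasks.some((t) => t.split === s)) throw new Error(`no tasks in split: ${s}`);
  }
  return tasks;
}
```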

AC-03 Isolated rollout (rollout.ts)
- git worktree add --detach + skill injection
- Injects skill as .github/skills/active-skill.md
- Runs hidden tests before worktree cleanup
- Always cleans up in finally block

AC-04 Scoring (scoring.ts)
- blendScores(hidden, judge, weights) -- pure, testable
- Default weights w1=0.7 / w2=0.3 (configurable)
- LLM judge via headless CLI call on the diff
- Hidden tests NEVER placed in agent-visible prompt
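
For illustration, the blend can be read as a weighted sum of the two scores. This sketch assumes both scores are already normalized to [0, 1] and that the blend is linear; the `BlendWeights` type is hypothetical, while the default weights come from the list above.

```typescript
// Hypothetical weights type; w1 applies to hidden tests, w2 to the LLM judge.
interface BlendWeights {
  w1: number;
  w2: number;
}

const DEFAULT_WEIGHTS: BlendWeights = { w1: 0.7, w2: 0.3 };

// Pure function: assumes a linear blend of two scores in [0, 1].
function blendScores(hidden: number, judge: number, weights: BlendWeights = DEFAULT_WEIGHTS): number {
  return weights.w1 * hidden + weights.w2 * judge;
}
```

With the defaults, a rollout that passes all hidden tests but gets a zero judge score lands at about 0.7, so hidden-test results dominate the ranking.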

AC-05 Optimizer edit (optimizer.ts)
- buildOptimizerPrompt from current skill + scored rollouts
- parseOptimizerEdit: JSON block -> add/delete/replace
- applyEdit: anchor-based bounded edit applicator
- Malformed output = no-op candidate (run continues)

AC-06 Held-out validation gate (gate.ts)
- evaluateGate: accept iff candidateScore > bestScore (strict)
- Records decision + scores for history
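
The strict acceptance rule is small enough to sketch directly. The `GateDecision` record shape is an assumption; only the comparison (`candidateScore > bestScore`) is taken from the list above.

```typescript
// Hypothetical record shape for the history log.
interface GateDecision {
  accepted: boolean;
  candidateScore: number;
  bestScore: number;
}

// Accept only on strict improvement; a tie keeps the current best skill.
function evaluateGate(candidateScore: number, bestScore: number): GateDecision {
  return { accepted: candidateScore > bestScore, candidateScore, bestScore };
}
```

Rejecting ties means an edit that does not measurably help on the held-out split never replaces the current best.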

AC-07 Loop + artifacts (loop.ts)
- Bounded loop (max-steps), Ctrl-C safe SIGINT handler
- Atomic writes (tmp -> rename) for best_skill.md
- Appends step records to history.jsonl
- Writes summary.json on completion
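
The tmp -> rename pattern and the append-only history can be sketched with Node's synchronous fs API. The helper names `writeFileAtomic` and `appendHistory`, the temp-file naming, and the record fields are assumptions, not the code in loop.ts.

```typescript
import { appendFileSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Write to a sibling temp file, then rename over the target.
// A reader sees either the old file or the new one, never a partial write
// (rename within one directory is atomic on POSIX filesystems).
function writeFileAtomic(target: string, contents: string): void {
  const tmp = join(dirname(target), `.${basename(target)}.${process.pid}.tmp`);
  writeFileSync(tmp, contents, "utf8");
  renameSync(tmp, target);
}

// One JSON object per line, appended so earlier steps are never rewritten.
function appendHistory(historyPath: string, record: object): void {
  appendFileSync(historyPath, `${JSON.stringify(record)}\n`, "utf8");
}

// Usage with a throwaway output directory.
const outDir = mkdtempSync(join(tmpdir(), "skillopt-"));
writeFileAtomic(join(outDir, "best_skill.md"), "# Skill v1\n");
writeFileAtomic(join(outDir, "best_skill.md"), "# Skill v2\n");
appendHistory(join(outDir, "history.jsonl"), { step: 1, accepted: true });
const bestSkill = readFileSync(join(outDir, "best_skill.md"), "utf8");
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not atomic and can fail.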

AC-08 CLI ergonomics (skillopt.ts + README.md)
- parseArgs interface with --help, --skill, --corpus, --out,
  --target-model, --optimizer-model, --max-steps, --w1, --w2,
  --timeout-ms
- Pre-flight check: fails fast if copilot not on PATH
- README with schema docs, optimizer prompt contract, algorithm

Tests: 62 vitest tests; all CLI calls mocked for CI

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract shared owner-bridge dispatch in MultiRepoQueueRouter and pass reasoningEffort through follow-up routing. Add regression coverage for follow-up, steering, and ask-user dispatch across bridges.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ClaudeSDKService only passed cwd to the agent-sdk query, so the Claude agent was sandboxed to the working directory and could not read out-of-repo files such as CoC skill definitions under ~/.coc/skills.

Wire an additionalDirectories option through SendMessageOptions into the SDK query options. ClaudeSDKService.resolveAdditionalDirectories always grants access to ~/.coc and the system temp directory, merges any caller-provided directories, resolves them to absolute paths, and de-duplicates (case-insensitively on Windows).
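
A sketch of the merge-and-de-duplicate step, written as a free function rather than the ClaudeSDKService method. The `platform` parameter is a hypothetical addition to make the Windows branch testable, and the exact ordering of the result is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Always includes ~/.coc and the system temp directory, then caller entries.
// Paths are resolved to absolute form and de-duplicated; comparison is
// case-insensitive only when the platform is Windows ("win32").
function resolveAdditionalDirectories(
  extra: string[] = [],
  platform: string = process.platform,
): string[] {
  const candidates = [path.join(os.homedir(), ".coc"), os.tmpdir(), ...extra];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const dir of candidates) {
    const abs = path.resolve(dir);
    const key = platform === "win32" ? abs.toLowerCase() : abs;
    if (!seen.has(key)) {
      seen.add(key);
      result.push(abs);
    }
  }
  return result;
}
```

The first spelling of a directory wins, so the returned paths keep the casing the caller (or the OS) supplied.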

Add unit tests covering the auto-injected directories, caller-provided entries, and de-duplication.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add deriveEffort() pure function (effortUtils.ts) that computes the
  effort override from a model's stored preference, supported efforts, and
  reasoning capability flag.
- Add useProviderReasoningEfforts(provider) hook that fetches the
  per-provider, per-model effort preference map from
  GET /api/agent-providers/<provider>/models/reasoning-efforts.
- Update NewChatArea to auto-derive effortOverride whenever provider or
  effective model changes. Uses userPickedForModelRef to preserve explicit
  user picks within the same (provider, model) combo while re-deriving on
  any model/provider swap.  Draft restore no longer restores effortOverride;
  always re-derives from current preferences.
- Update ChatDetail to initialize effortOverride from
  processDetails.config.reasoningEffort on first task load (§5.1) and
  re-derive on mid-conversation model override change (§5.4).  Resets on
  taskId change.  Uses chatEffectiveModelId so effort options and disabled
  state reflect the active override, not only the original session model.
- Tests: deriveEffort unit tests (13), useProviderReasoningEfforts hook
  tests (7), source-level wiring tests for NewChatArea and ChatDetail (23).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
plusplusoneplusplus and others added 2 commits May 30, 2026 12:01
…dled skill

The Ralph prompt entries were removed from getBuiltInPrompts() when the five
Ralph prompt call sites were thinned to reference the bundled ultra-ralph
skill. Update the stale admin-prompts test to expect the remaining 8 built-in
prompts and drop the Ralph group assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…effort hook

- chat-mode-executors: assert grilling suffix references the bundled ultra-ralph
  skill + '## Goal' contract instead of the removed inline directive text.
- config snapshot: accept new feature-flag key 'gitCommitLookup'.
- ChatDetail source-introspection: widen the taskId-reset window now that the
  effect resets additional refs.
- NewChatArea (x2) and composer ordering: add getReasoningEfforts to the
  agentProviders client mock now that ChatDetail/NewChatArea call
  useProviderReasoningEfforts on mount.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@plusplusoneplusplus plusplusoneplusplus merged commit 4675d79 into main May 30, 2026
66 of 68 checks passed
@plusplusoneplusplus plusplusoneplusplus deleted the pr/recent-changes branch May 30, 2026 19:50