Skip to content

fix(mothership): fix boundary#3745

Open
Sg312 wants to merge 12 commits intostagingfrom
fix/mothership-boundary
Open

fix(mothership): fix boundary#3745
Sg312 wants to merge 12 commits intostagingfrom
fix/mothership-boundary

Conversation

@Sg312
Copy link
Collaborator

@Sg312 Sg312 commented Mar 24, 2026

Summary

Fix boundary issues

Type of Change

  • Bug fix

Testing

Manual

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Mar 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 24, 2026 10:14pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR replaces the client-side async tool resumption logic (claim/release/deliver flow against the local DB) with a lighter Go-side checkpoint polling model: Sim now polls /api/tools/checkpoint-status until the Go backend reports readiness, then issues a resume request containing only the checkpointId. It also changes markToolComplete from fire-and-forget to fully awaited, ensuring proper ordering before resource persistence and improving error visibility.

  • index.ts: Removes claimCompletedAsyncToolCall, releaseCompletedAsyncToolClaim, markAsyncToolDelivered, and didAsyncToolSucceed in favour of waitForCheckpointReady (40-attempt exponential poll). Adds upstream retry for 502/503/504 on the resume route. Two dead-code artifacts left behind: let resumeReady = false (line 279) and if (!resumeReady) { break } (lines 340–342) can be removed.
  • tool-execution.ts: markToolComplete is now awaited via waitForMarkComplete (previously fire-and-forget). reportCancelledTool is now async. Added ASYNC_RESUME_DIAG_TAG diagnostic logging at each completion phase.
  • handlers.ts: Added diagnostic logging before context.awaitingAsyncContinuation is set in the done handler.
  • Tests: Updated mocks to reflect the new boundary; fetchMock stubs the checkpoint-status endpoint; all four continuation scenarios are covered including the 502 upstream retry path.

Confidence Score: 5/5

  • Safe to merge; the only finding is cosmetic dead code that poses no runtime risk.
  • The architectural change is well-scoped and thoroughly tested. All four async-continuation scenarios are covered in the updated test suite, including the new checkpoint polling and upstream retry paths. The markToolComplete promotion from fire-and-forget to awaited is coherent with the new boundary design (tool execution is always a background promise, so the SSE reader remains unblocked). The sole issue is the leftover resumeReady variable and its dead guard — a non-blocking cleanup with no impact on correctness.
  • No files require special attention beyond the minor dead-code cleanup in apps/sim/lib/copilot/orchestrator/index.ts.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/orchestrator/index.ts Core change: replaces claim-based async tool resumption with Go-side checkpoint polling. New waitForCheckpointReady polls /api/tools/checkpoint-status with exponential backoff (up to 40 × 250ms). Upstream resume retries added (3× for 502/503/504). Dead code: resumeReady variable and its guard are leftover from the old inner loop and should be removed.
apps/sim/lib/copilot/orchestrator/sse/handlers/tool-execution.ts Changed markToolComplete from fire-and-forget to fully awaited (via new waitForMarkComplete wrapper). reportCancelledTool is now async and awaited. Added diagnostic logging at each completion phase. The old "must not await to avoid deadlock" comment is gone, consistent with the new boundary design where tool execution is always a background promise.
apps/sim/lib/copilot/orchestrator/sse/handlers/handlers.ts Added ASYNC_RESUME_DIAG_TAG constant and diagnostic logger.warn call in the done handler before setting context.awaitingAsyncContinuation, improving observability of the async-pause flow.
apps/sim/lib/copilot/orchestrator/index.test.ts Tests updated to reflect new checkpoint polling model. Mocks trimmed to only relevant functions. New fetchMock stubs the checkpoint-status endpoint. afterEach properly restores fake timers and globals. All four scenarios covered: readiness success, readiness failure, async-pause done event, and upstream 502 retry.
apps/sim/lib/copilot/orchestrator/sse/handlers/handlers.test.ts Updated test mock setup to include upsertAsyncToolCall. Tests cover tool execution, deduplication, cancellation, in-flight promise stability, mark-complete ordering, and upsert failure resilience. Coverage looks solid.

Sequence Diagram

sequenceDiagram
    participant Sim as Sim (orchestrator)
    participant Go as Go Backend

    Sim->>Go: POST /api/copilot (initial stream)
    Go-->>Sim: SSE events (tool_call, content, ...)
    Note over Sim: tool_call handler fires executeToolAndReport() in background
    Go-->>Sim: SSE done {async_pause: {checkpointId, pendingToolCallIds}}
    Note over Sim: context.awaitingAsyncContinuation set

    loop Poll checkpoint readiness (up to 40x × 250ms backoff)
        Sim->>Go: GET /api/tools/checkpoint-status?checkpointId=...
        Go-->>Sim: {ready: false}
    end
    Sim->>Go: GET /api/tools/checkpoint-status?checkpointId=...
    Go-->>Sim: {ready: true}

    loop Retry on 502/503/504 (up to 3x)
        Sim->>Go: POST /api/tools/resume {checkpointId}
        Go-->>Sim: SSE events (next turn)
    end

    Note over Sim: executeToolAndReport awaits markToolComplete (no longer fire-and-forget)
    Sim->>Go: markToolComplete(toolCallId, status, result)
    Go-->>Sim: ack
Loading

Reviews (1): Last reviewed commit: "Fix lint" | Re-trigger Greptile

Comment on lines +333 to 342
context.awaitingAsyncContinuation = undefined
route = '/api/tools/resume'
payload = {
checkpointId: continuation.checkpointId,
}
resumeReady = true

if (!resumeReady) {
break
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead code: resumeReady flag is always true here

resumeReady is set to true unconditionally on line 338, immediately before this guard. There is no code path that reaches if (!resumeReady) with resumeReady === false — if checkpoint readiness failed, the earlier if (!readiness.ready) block pushes an error and breaks out of the loop before getting here. This guard (and the variable itself) is a leftover artifact from the previous inner for(;;) loop pattern that was removed in this refactor and should be cleaned up.

Suggested change
context.awaitingAsyncContinuation = undefined
route = '/api/tools/resume'
payload = {
checkpointId: continuation.checkpointId,
}
resumeReady = true
if (!resumeReady) {
break
}
context.awaitingAsyncContinuation = undefined
route = '/api/tools/resume'
payload = {
checkpointId: continuation.checkpointId,
}

@@ -190,193 +277,66 @@ export async function orchestrateCopilotStream(
if (!continuation) break

let resumeReady = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unreferenced variable declaration

resumeReady is declared here but the only value it ever holds when the if (!resumeReady) guard is reached is true. The declaration can be removed along with the guard.

Suggested change
let resumeReady = false

Sg312 and others added 9 commits March 24, 2026 13:13
* fix(home): voice input text persistence bugs

* fix(home): gate setIsListening on startRecognition success

* fix(home): handle startRecognition failure in restartRecognition

* fix(home): reset speech prefix on submit while mic is active
* feat(table): column drag-and-drop reorder

* fix(table): remove duplicate onDragEnd call from handleDrop

* fix(table): persist columnOrder on rename/delete and defer delete to onSuccess

* fix(table): prevent stale refs during column drag operations

Fix two bugs in column drag-and-drop:
1. Stale columnWidths ref during rename - compute updated widths inline
   before passing to updateMetadata
2. Escape-cancelled drag still reorders - update dropTargetColumnNameRef
   directly in handleColumnDragLeave to prevent handleColumnDragEnd from
   reading stale ref value

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(table): insert column at correct side when anchor is unordered

When the anchor column isn't in columnOrder, add it first then insert
the new column relative to it, so 'right' insertions appear after the
anchor as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* chore: fix conflicts

* chore: fix review changes

* chore: fix review changes

* chore: fix review changes
* feat: add product tour

* chore: updated modals

* chore: fix the tour

* chore: Tour Updates

* chore: fix review changes

* chore: fix review changes

* chore: fix review changes

* chore: fix review changes

* chore: fix review changes

* minor improvements

* chore(tour): address PR review comments

- Extract shared TourState, TourStateContext, mapPlacement, and TourTooltipAdapter
  into tour-shared.tsx, eliminating ~100 lines of duplication between product-tour.tsx
  and workflow-tour.tsx
- Fix stale closure in handleStartTour — add isOnWorkflowPage to useCallback deps
  so Take a tour dispatches the correct event after navigation

* chore(tour): address remaining PR review comments

- Remove unused logger import and instance in product-tour.tsx
- Remove unused tour-tooltip-fade animation from tailwind config
- Remove unnecessary overflow-hidden wrapper around WorkflowTour
- Add border stroke to arrow SVG in tour-tooltip for visual consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(tour): address second round of PR review comments

- Remove unnecessary 'use client' from workflow layout (children are already client components)
- Fix ref guard timing issue in TourTooltipAdapter that could prevent Joyride from tracking tooltip on subsequent steps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(tour): extract shared Joyride config, fix popover arrow overflow

- Extract duplicated Joyride floaterProps/styles into getSharedJoyrideProps()
  in tour-shared.tsx, parameterized by spotlightBorderRadius
- Fix showArrow disabling content scrolling in PopoverContent by wrapping
  children in a scrollable div when arrow is visible

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* lint

* fix(tour): stop running tour when disabled becomes true

Prevents nav and workflow tours from overlapping. When a user navigates
to a workflow page while the nav tour is running, the disabled flag
now stops the nav tour instead of just suppressing auto-start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tour): move auto-start flag into timer, fix truncate selector conflict

- Move hasAutoStarted flag inside setTimeout callback so it's only set
  when the timer fires, allowing retry if disabled changes during delay
- Add data-popover-scroll attribute to showArrow scroll wrapper and
  exclude it from the flex-1 truncate selector to prevent overflow
  conflict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tour): remove duplicate overlay on center-placed tour steps

Joyride's spotlight already renders a full-screen overlay via boxShadow.
The centered TourTooltip was adding its own bg-black/55 overlay on top,
causing double-darkened backgrounds. Removed the redundant overlay div.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: move docs link from settings to help dropdown

The Docs link (https://docs.sim.ai) was buried in settings navigation.
Moved it to the Help dropdown in the sidebar for better discoverability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Adithya Krishna <aadithya794@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(home): auth-aware landing page navigation

- Redirect authenticated users from / to /workspace via middleware (?home param bypasses)
- Show "Go to App" instead of "Log in / Get started" in navbar for authenticated users
- Logo links to /?home for authenticated users to stay in marketing context
- Settings "Home Page" button opens /?home
- Handle isPending session state to prevent CTA button flash

* lint

* fix(home): remove stale ?from=nav params in landing nav

* fix(home): preserve ?home param in nav links during session pending state

* lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants