Extract preview/sync GitHub Actions #4897

Draft
backspace wants to merge 4 commits into main from cs-11180-extract-shared-preview-realm-github-actions-to-monorepo
@backspace backspace commented May 19, 2026

I noticed that boxel-home PR previews are broken:

image

This is because the interface to _publish-realm changed:

Publishing https://realms-staging.stack.cards/boxel_homepage_realm/boxel-home-pr-57/ to https://boxel_homepage_realm.staging.boxel.dev/boxel-home-pr-57/
Failed to publish realm (HTTP 202):
{
  "data": {
    "type": "published_realm",
    "id": "23ea3f2a-9c7a-4028-ad3a-7be647ed476b",
    "attributes": {
      "sourceRealmURL": "https://realms-staging.stack.cards/boxel_homepage_realm/boxel-home-pr-57/",
      "publishedRealmURL": "https://boxel_homepage_realm.staging.boxel.dev/boxel-home-pr-57/",
      "lastPublishedAt": "1778870465062",
      "status": "pending"
    }
  }
}

HTTP 202 is actually expected now!
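
As a minimal sketch of what treating 202 as success could look like on the caller's side: the `PublishResponse` shape below is copied from the response body quoted above, while `interpretPublishResponse`, the `PublishOutcome` type, and the handling of 2xx codes other than 202 are assumptions made for illustration, not the actual CLI or action code.

```typescript
// Shape modelled on the response body quoted in this comment.
interface PublishResponse {
  data: {
    type: string;
    id: string;
    attributes: {
      sourceRealmURL: string;
      publishedRealmURL: string;
      lastPublishedAt: string;
      status: string;
    };
  };
}

// Hypothetical outcome type, not part of the real interface.
type PublishOutcome =
  | { kind: "ready"; publishedRealmURL: string }
  | { kind: "pending"; publishedRealmURL: string }
  | { kind: "failed"; reason: string };

// Hypothetical helper: classify a _publish-realm response.
function interpretPublishResponse(
  httpStatus: number,
  body: PublishResponse | undefined,
): PublishOutcome {
  // Anything outside 2xx is a failure.
  if (httpStatus < 200 || httpStatus >= 300) {
    return { kind: "failed", reason: `HTTP ${httpStatus}` };
  }
  if (body === undefined) {
    return { kind: "failed", reason: "empty response body" };
  }
  const { status, publishedRealmURL } = body.data.attributes;
  // 202 Accepted with status "pending" means the publish was queued,
  // so the caller should poll for readiness instead of failing.
  if (httpStatus === 202 || status === "pending") {
    return { kind: "pending", publishedRealmURL };
  }
  // Assumption: any other 2xx response means the realm is already usable.
  return { kind: "ready", publishedRealmURL };
}
```

With this classification, the broken preview workflow's "Failed to publish realm (HTTP 202)" case would instead become a "pending" outcome followed by a readiness check.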

I also noticed that boxel-catalog, boxel-home, and boxel-skills were each maintaining their own bespoke workflows for similar tasks, variously built on cardstack/boxel-cli, the npm Boxel CLI, and the old workspace sync CLI.

This extracts the preview/sync workflows into the monorepo so they can be used from external repositories and tested in-monorepo in case of interface changes like the above.

@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from 037a389 to 79657eb May 20, 2026 00:00
@backspace backspace changed the base branch from cs-11161-extract-workspace-sync-action to main May 20, 2026 00:01

github-actions Bot commented May 20, 2026

Observability diff (vs staging)

diff --git a/tmp/remote-canon.Nq1dRP/dashboards/boxel-status/indexing.json b/tmp/committed-canon.XjD11i/dashboards/boxel-status/indexing.json
index a39cf75..25280b9 100644
--- a/tmp/remote-canon.Nq1dRP/dashboards/boxel-status/indexing.json
+++ b/tmp/committed-canon.XjD11i/dashboards/boxel-status/indexing.json
@@ -69,6 +69,10 @@
           "uid": "cef5v5sl9k7i8f"
         },
         "description": "System-wide operator action: queue a full reindex across every realm. The button disables itself while a `full-reindex` orchestration job is already pending or running. Per-realm reindex moved to the Realms dashboard. Click POSTs with `Authorization: Bearer ${grafana_secret}` (substituted from SSM at apply time, CS-10929).",
+        "fieldConfig": {
+          "defaults": {},
+          "overrides": []
+        },
         "gridPos": {
           "h": 8,
           "w": 24,

(Run: https://github.com/cardstack/boxel/actions/runs/26161560752)


github-actions Bot commented May 20, 2026

Grafana preview

Preview deployed for 1 dashboard in the staging Grafana.
Cross-dashboard drill-throughs still point at the canonical staging dashboards.

Dashboards:

Preview is torn down automatically when this PR is closed or merged.

(Run: https://github.com/cardstack/boxel/actions/runs/26161560825)


github-actions Bot commented May 20, 2026

Preview deployments

Host Test Results

    1 files  ±  0      1 suites  ±0   1h 34m 40s ⏱️ + 3m 50s
2 712 tests +212  2 697 ✅ +212  15 💤 ±0  0 ❌ ±0 
2 731 runs  +213  2 716 ✅ +213  15 💤 ±0  0 ❌ ±0 

Results for commit 880fdff. ± Comparison against earlier commit 57d3fe8.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   10m 23s ⏱️ -18s
1 480 tests ±0  1 480 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 571 runs  ±0  1 571 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 880fdff. ± Comparison against earlier commit 57d3fe8.

@backspace backspace changed the title from "feat: shared preview-realm GitHub Actions (split off from #4851)" to "Extract preview/sync GitHub Actions" May 20, 2026
@backspace backspace changed the base branch from main to cs-11161-extract-workspace-sync-action May 20, 2026 00:27
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch 2 times, most recently from d7095f0 to 434ac24 May 20, 2026 12:09
@backspace backspace changed the base branch from cs-11161-extract-workspace-sync-action to main May 20, 2026 12:09
backspace added a commit that referenced this pull request May 20, 2026
Node's fetch always reports `TypeError: fetch failed` as `error.message`;
the actual transport reason (ECONNRESET, TLS handshake error, undici
socket error, ENOTFOUND, GOAWAY, etc.) is stashed on `error.cause` and
was being silently dropped by the publish/unpublish error paths. That
left the action-demo workflow showing a bare "Error: fetch failed" with
no way to distinguish a real network issue from, say, a self-signed
cert problem against the published-realm subdomain.

Wrap the three swallowed sites:

- `publish.ts` `.action()` catch: log `err.cause` separately if present.
- `publish.ts` `waitForPublishedRealmReady`: capture cause into the
  `lastError` string so the readiness-timeout error reports the same
  thing the polling loop kept hitting.
- `unpublish.ts` `unpublishRealm`: embed cause into the `result.error`
  string the CLI surfaces.

This is the diagnostic the action-demo on #4897 needs to figure out
why publish hangs at the initial POST despite the server-side mount
completing successfully.
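
A minimal sketch of the idea, assuming a hypothetical helper named `describeWithCause` (the actual changes in `publish.ts` and `unpublish.ts` may be structured differently): it appends the immediate `cause`, plus its `code` when one is present, to the top-level message so that "fetch failed" no longer hides the transport reason.

```typescript
// Hypothetical helper, not the code from publish.ts or unpublish.ts.
// Turns an error into a one-line string that includes its direct cause.
function describeWithCause(err: unknown): string {
  if (!(err instanceof Error)) {
    return String(err);
  }
  // `cause` is read through a cast so this compiles without ES2022 lib typings.
  const cause = (err as unknown as { cause?: unknown }).cause;
  if (cause === undefined || cause === null) {
    return err.message;
  }
  if (cause instanceof Error) {
    // Node system errors usually carry a string `code` such as ECONNRESET.
    const code = (cause as unknown as { code?: unknown }).code;
    const codePart = typeof code === "string" ? `${code}: ` : "";
    return `${err.message} (cause: ${codePart}${cause.message})`;
  }
  return `${err.message} (cause: ${String(cause)})`;
}

// Example shaped like a Node fetch failure: a bare TypeError whose
// cause holds the real transport error.
const exampleFailure = Object.assign(new TypeError("fetch failed"), {
  cause: Object.assign(new Error("socket hang up"), { code: "ECONNRESET" }),
});
console.log(describeWithCause(exampleFailure));
```

The same string could be stored as the last polling error or embedded in a result's error field, which is what the three sites listed above need.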

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
backspace added a commit that referenced this pull request May 20, 2026
The worker's `fatalExit` handler already exists (uncaughtException /
unhandledRejection backstop with a finalize-reservation race) — but
it reports the error via `log.error(...)` immediately before
`process.exit(1)`. `worker-manager.ts` spawns the child with
`stdio: ['pipe', 'pipe', 'pipe', 'ipc']`, so the child's stderr is a
libuv-async pipe; the final stream chunk gets discarded when the
process disappears, and the captured server log shows the child as
having silently exited `code=1, signal=null` with no clue why.

worker.ts already uses `writeSync(2, ...)` for exactly this reason
on the STARTUP / SIGINT / SIGTERM / disconnect stamps (see the
comment above the STARTUP block at the top of the file). Apply the
same pattern to the three fatal-exit paths: the uncaughtException /
unhandledRejection handler, its inner finalize-failed fallback, and
the outer startup-error `.catch`. Route each through a new helper
that serializes the error with its full stack and walks `error.cause`
(where Node fetch / undici / TLS errors stash the real reason).

Discovered while debugging the action-demo on #4897 (CS-11180): every
`_publish-realm` of a fresh source realm enqueues a copy-index job
that throws *something* inside the worker; the worker exited
silently; pg-queue retried, hit the 2-reservation cap, abandoned the
job; the realm-server returned HTTP 500
`Job abandoned after 2 failed attempts (max=2)` to the publish
endpoint caller. Without this fix the underlying job-processing
error is unobservable.

The bundled `serialize-fatal-reason` helper is in its own module
because the FD-level write behavior can't be unit-tested in-process
(it requires a real child_process.spawn + libuv-piped stderr to
reproduce the bug being fixed) — but the serialization can. Tests
cover: stack preservation, cause-chain walking, non-Error values,
self-referential cause cycles (depth-capped), and Node fetch's
typical `TypeError: fetch failed` + ECONNRESET-on-cause shape.
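
The serialization half can be sketched as a pure function. This is an illustration under assumptions, not the bundled `serialize-fatal-reason` module: the function name, the `caused by:` prefix, and the default depth cap of 5 are all hypothetical.

```typescript
// Assumed default cap on how many causes are followed.
const MAX_CAUSE_DEPTH = 5;

// Hypothetical sketch: serialize a fatal reason with its stack and its
// chain of `cause` values, stopping at maxDepth so cycles terminate.
function serializeFatalReason(
  reason: unknown,
  maxDepth: number = MAX_CAUSE_DEPTH,
): string {
  const parts: string[] = [];
  let current: unknown = reason;
  for (
    let depth = 0;
    depth <= maxDepth && current !== undefined && current !== null;
    depth++
  ) {
    const prefix = depth === 0 ? "" : "caused by: ";
    if (current instanceof Error) {
      // Prefer the full stack; fall back to "Name: message" if it is missing.
      const text = current.stack ?? `${current.name}: ${current.message}`;
      parts.push(prefix + text);
      // Read `cause` through a cast so older lib typings still compile.
      current = (current as unknown as { cause?: unknown }).cause;
    } else {
      // Non-Error values (strings, numbers, objects) end the chain.
      parts.push(prefix + String(current));
      current = undefined;
    }
  }
  // A null or undefined reason produces no parts; stringify it directly.
  return parts.length > 0 ? parts.join("\n") : String(reason);
}
```

In a fatal-exit handler, the returned string would then be written synchronously, for example with `writeSync(2, ...)` from `node:fs`, before `process.exit(1)` so the final chunk is not lost with the pipe.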

Closes CS-11200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@backspace backspace marked this pull request as draft May 20, 2026 19:46
backspace added a commit that referenced this pull request May 21, 2026
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from f8a1399 to 608717a May 21, 2026 19:05
backspace and others added 2 commits May 21, 2026 14:08
Extract the publish-preview-realm / unpublish-preview-realm /
workspace-sync composite actions so `boxel-catalog`, `boxel-home`,
`boxel-skills` (and any future consumer) can stop maintaining
duplicated bespoke preview-realm workflows.

This branch is layered on top of cs-11161 (#4851) so the bundled
demo workflow can exercise `boxel realm publish` / `unpublish` /
`push` end-to-end against the CLI commits in this branch's
ancestry. Once #4851 lands and its branch is deleted, GitHub will
automatically retarget this PR's base to main and the diff will stay
clean against main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Used while iterating on the three composite actions; not part of the
shipped product. External consumers (boxel-catalog, boxel-home,
boxel-skills) exercise the actions in their own preview workflows.
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from 608717a to cb1d9db May 21, 2026 19:08
backspace and others added 2 commits May 21, 2026 14:33
Adds preview-realm-actions-integration.yml — runs the three composite
actions (publish, workspace-sync, unpublish) end-to-end against the
same local matrix + realm-server stack `boxel-cli-test` boots, so
contract drift between the actions, the boxel-cli commands they wrap,
and the realm-server handlers they POST to is caught the moment any
side changes.

Path-gated triggers (on `pull_request` and `push` to main, plus
`workflow_dispatch` for manual) so only PRs touching the integration
surface pay the runtime cost. The set covers each action.yml, this
workflow, the publish/unpublish/push CLI commands, the
handle-publish-realm / handle-unpublish-realm server handlers, and
the copy-index task that the publish handler enqueues.

Uses path-relative `uses: ./.github/actions/...` so the actions run
at the PR's own commit. External consumers (boxel-catalog, -home,
-skills) pin a SHA instead.

Also re-applies the in-repo `mise` short-circuit in each action: when
`github.action_repository == github.repository` (i.e., invoked from
inside cardstack/boxel itself), set BOXEL_SRC to $GITHUB_WORKSPACE
and skip the clone + mise/pnpm install steps because the calling
workflow's ./.github/actions/init already did them. Without this, the
inner `jdx/mise-action` re-hashes a separate cache key whose lookup
stalls for ~30 minutes before the transfer starts. External consumers continue to go
through the full clone + install path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The in-repo short-circuit compared `github.action_repository` against
`github.repository`, but `github.action_repository` is only populated
for *external* `uses: org/repo/...@ref` references. For path-relative
`uses: ./.github/...` (which is exactly how
preview-realm-actions-integration.yml invokes these actions), the
value is empty, so the predicate `"" = "cardstack/boxel"` was false
and the action fell into the external-consumer branch and tried to
`git clone https://github.com/.git/`, failing with `remote: Not Found`.

Treat empty BOXEL_REPO as in-repo too. External consumers still hit
the populated-and-different branch and run the full clone + install.
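
The corrected predicate can be sketched as follows. The real check lives in the composite actions' own steps; `planBoxelSetup` and the `SetupPlan` type are hypothetical names used only to make the three cases explicit.

```typescript
// Hypothetical result type describing which setup path an action takes.
type SetupPlan =
  | { mode: "in-repo"; reason: string }
  | { mode: "external"; sourceRepository: string };

// Hypothetical sketch of the decision described above.
// actionRepository mirrors `github.action_repository`, which is empty for
// path-relative `uses: ./.github/...` references.
// repository mirrors `github.repository`, e.g. "cardstack/boxel".
function planBoxelSetup(
  actionRepository: string | undefined,
  repository: string,
): SetupPlan {
  const actionRepo = (actionRepository ?? "").trim();
  if (actionRepo === "") {
    // Path-relative invocation: the action runs from the caller's checkout.
    return { mode: "in-repo", reason: "action repository is empty" };
  }
  if (actionRepo === repository) {
    return { mode: "in-repo", reason: "action repository matches caller" };
  }
  // Populated and different: an external consumer, full clone + install.
  return { mode: "external", sourceRepository: actionRepo };
}
```

Under this sketch, an empty value never reaches the clone step, so no clone URL is ever built from an empty repository name, which is the failure described above.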