Pin default --format archive layout: document.md + document.html + images/ #131
Conversation
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #113
…ages/
Pins the default archive layout contract (issue #113) across paths:
- JS: the markdown-format /archive endpoint previously bundled only document.md + images/, omitting the reference document.html. It now emits document.html on every path. Adds buildArchiveFromHtml(html, baseUrl) and an extracted appendMarkdownAndImages helper so the layout stays identical and avoids code duplication.
- Rust: adds web_capture::archive::build_zip_from_html(html, base_url) building the same document.md + document.html + images/ layout via the shared create_archive_zip path.
- Tests: integration tests on both paths pin the layout (and assert document.md references relative images/ paths, never inline base64) so a future refactor cannot regress the contract.
- Strengthens the existing /archive markdown endpoint test to assert the document.md + document.html layout on the real handler path.
--format archive layout: zip = document.md + document.html + images/ → --format archive layout: document.md + document.html + images/
Working session summary
Working tree is clean and everything is committed and pushed. Here's a summary of what I did.
Summary
Issue #113 asked to pin the default
Root cause found
While the issue framed this as purely "add a regression test," I discovered a real behavioral gap: the JS markdown-format
Changes
Verification
PR #131 is updated, marked ready for review, and up to date with
This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $4.039436
📊 Context and tokens usage:
Total: (4.6K new + 82.3K cache writes + 5.4M cache reads) input tokens, 32.8K output tokens, $4.039436 cost
🤖 Models used:
📎 Log file uploaded as Gist (1967KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
🚨 Solution Draft Failed
The automated solution draft encountered an error:
🤖 Models used:
📎 Failure log uploaded as Gist (3072KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
https://github.com/link-assistant/web-capture/actions/runs/26643060561/job/78598688406?pr=131 gets cancelled, can we somehow use prebuilt images on top of https://github.com/link-foundation/box to actually solve it?
🤖 AI Work Session Started
Starting automated work session at 2026-05-30T00:19:02.995Z
The PR has been converted to draft mode while work is in progress.
This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.
The Playwright/Puppeteer browser downloads occasionally stall after reaching 100% (the connection never closes). With no per-step bound this consumed the entire 30-minute job budget and the run was cancelled (issue #113 PR #131). Wrap each install in `timeout 600` with up to 3 attempts so a stalled download fails fast and retries instead of hanging the whole job.
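As a rough illustration of the timeout-and-retry idea in this commit message, the following Node.js sketch bounds each attempt and retries a fixed number of times. The helper name `runWithRetry` and its options are hypothetical; the workflow itself wraps the install in the shell `timeout 600` command rather than in JavaScript, and the commented-out install command at the end is only an assumption about what is being wrapped.

```javascript
// Minimal sketch, not the workflow's actual code: bound each attempt with a
// timeout and retry, so a stalled download fails fast instead of hanging.
const { spawnSync } = require('node:child_process');

// Hypothetical helper. Defaults mirror the commit message:
// 600 seconds per attempt, up to 3 attempts.
function runWithRetry(command, args, { timeoutMs = 600_000, attempts = 3 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    // spawnSync kills the child when `timeout` elapses and reports
    // the failure through `result.error` (code ETIMEDOUT).
    const result = spawnSync(command, args, { stdio: 'inherit', timeout: timeoutMs });
    if (result.status === 0) {
      return { ok: true, attempt };
    }
    const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
    const reason = timedOut ? 'timed out' : `exited with status ${result.status}`;
    console.error(`attempt ${attempt}/${attempts} ${reason}`);
  }
  return { ok: false, attempt: attempts };
}

// Assumed usage (the exact install command is not stated in the commit message):
// const outcome = runWithRetry('npx', ['playwright', 'install', 'chromium']);
// if (!outcome.ok) process.exit(1);
```

The point of the per-attempt bound is that a download stuck at 100% costs at most one timeout window per attempt rather than the whole job budget.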
c73d57c to 070c496
Working session summary
The CI run is queued (runners busy). My background monitors will notify me when it progresses. Let me wait for those notifications before proceeding.
Summary of work so far:
I'll wait for the CI result, then post a comment addressing the box suggestion and mark the PR ready if green.
This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $2.404680
📊 Context and tokens usage:
Total: (7.2K new + 64.9K cache writes + 2.8M cache reads) input tokens, 21.9K output tokens, $2.404680 cost
🤖 Models used:
📎 Log file uploaded as Gist (1427KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
🔄 Auto-restart triggered (iteration 1)
Reason: CI failures detected
Starting new session to address the issues.
Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.
…hangs The live download from cdn.playwright.dev intermittently hangs at 100% on GitHub-hosted runners, consuming the full 30-min job budget and getting the run cancelled (issue #113 / PR #131). Copy the browser bundle out of Microsoft's official prebuilt Playwright image (mcr.microsoft.com/playwright:v<version>-noble, version resolved dynamically) into ~/.cache/ms-playwright, with the previous timeout+retry CDN download kept as a fallback. Stay on the host runner so the later Docker-based e2e steps keep working.
The runner image ships an incomplete Chrome cache entry (version folder present, executable missing); puppeteer browsers install aborts instantly on it rather than redownloading. Remove incomplete Chrome installs before the retry loop. Also drop the accidental duplicate Install Puppeteer browsers step introduced in the previous commit.
The previous run's install step exited 0 in ~1.4s with no download and no path output, leaving Chrome absent so browser-launch tests failed with 'Could not find Chrome'. Wipe the Chrome cache before installing so the download always runs, override any skip-download env the runner image may set, list the cache after install, and hard-verify the executable Puppeteer resolves at launch time (puppeteer.executablePath).
…incomplete) Puppeteer reliably downloads the Chrome archive (~163 MiB) but its built-in extraction leaves the bundle incomplete on the runner — the main chrome binary is missing while the command still exits 0, so browser-launch tests fail with 'Could not find Chrome'. Let Puppeteer download, then extract the archive ourselves with system unzip and hard-verify the resolved executable. Version is derived dynamically via puppeteer.executablePath().
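The "hard-verify the resolved executable" step described above could look roughly like the following Node.js sketch. The function name `assertExecutable` is hypothetical and this is not the workflow's actual script; it only shows the idea of failing loudly when the binary is absent even though the install command exited 0.

```javascript
// Minimal sketch: refuse to continue unless the resolved browser path
// points at a real, executable file.
const fs = require('node:fs');

// Hypothetical helper; returns the path unchanged when the check passes.
function assertExecutable(executablePath) {
  let stat;
  try {
    stat = fs.statSync(executablePath);               // throws if the path is missing
    fs.accessSync(executablePath, fs.constants.X_OK); // throws if it is not executable
  } catch (error) {
    throw new Error(
      `Browser executable is missing or not executable: ${executablePath} (${error.code})`
    );
  }
  if (!stat.isFile()) {
    throw new Error(`Browser executable path is not a regular file: ${executablePath}`);
  }
  return executablePath;
}

// Assumed usage, with Puppeteer installed in the project:
// const puppeteer = require('puppeteer');
// assertExecutable(puppeteer.executablePath());
```

Checking the same path that Puppeteer resolves at launch time keeps the verification aligned with what the browser-launch tests will actually use.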
Working session summary
Confirmed green. Final state for PR #131 (HEAD
✅ All CI checks pass
PR state:
What fixed the failures
The originally-reported JS - Test failure traced to the markdown-format
Two supporting fixes were also needed for the JS workflow to go fully green:
The task is complete: the failing CI checks are fixed, all checks pass on the current commit, the changes meet the issue #113 archive-contract requirements, and PR #131 is open and mergeable (
PR: #131
This summary was automatically extracted from the AI working session output.
🔄 Auto-restart-until-mergeable Log (iteration 1)
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $6.160428
📊 Context and tokens usage:
Claude Opus 4.8: (3 sub-sessions)
Total: (9.0K new + 352.0K cache writes + 160.1K cache reads) input tokens, 153.4K output tokens, $6.160428 cost
🤖 Models used:
📎 Log file uploaded as Gist (3965KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
CI is green now ✅
Full run (both workflows passing): https://github.com/link-assistant/web-capture/actions/runs/26670735682
The cancelled job turned out to be two independent browser-install failures that the 30-min budget was masking as a single "cancelled" run. I diagnosed each from the job logs and fixed them:
1. Playwright:
🔄 Auto-restart triggered (iteration 2)
Reason: New comment(s) from non-bot user(s): konard
Starting new session to address the issues.
Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.
Working session summary
Everything checks out. The
Summary
I reviewed the triggering comment and verified the full state of PR #131. The comment that triggered this restart, konard's "CI is green now ✅", is informational; it confirms the user diagnosed and fixed the two browser-install CI flakes (Playwright CDN hang → pull from
Verification results (all green, nothing to fix):
Issue #113 requirements intact: the PR still pins the default
PR #131 is fully mergeable with all checks passing on the current commit and the branch up to date with
This summary was automatically extracted from the AI working session output.
🔄 Auto-restart-until-mergeable Log (iteration 2)
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $0.327416
📊 Context and tokens usage:
Total: (2.3K new + 22.6K cache writes + 168.6K cache reads) input tokens, 3.6K output tokens, $0.327416 cost
🤖 Models used:
📎 Log file uploaded as Gist (4340KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
✅ Ready to merge
This pull request is now ready to be merged:
Monitored by hive-mind with --auto-restart-until-mergeable flag
This reverts commit ebab194.
Summary
Pins the default `--format archive` layout contract from issue #113 so it is identical across capture paths:
Fixes #113
Root cause fixed
The JS markdown-format `/archive` endpoint bundled only `document.md` + `images/` and omitted the reference `document.html`. It now emits `document.html` on every path, matching the Rust paths (which already used `create_archive_zip`).
Changes
- (`js/src/archive.js`):
  - `/archive` output now includes `document.html`.
  - `buildArchiveFromHtml(html, baseUrl)` builds the contract layout as a `Buffer`.
  - `appendMarkdownAndImages(archive, markdown)` so the handler and the new helper share one code path (keeps `jscpd` duplication at 0).
- (`rust/src/archive.rs`): `web_capture::archive::build_zip_from_html(html, base_url)` building the same `document.md` + `document.html` + `images/` layout via the shared `create_archive_zip`.
- (`patch`) for `@link-assistant/web-capture`. Rust `Cargo.toml` is already at `0.3.18`; CI auto-bumps on merge.
How to reproduce / tests
The fixture is a tiny HTML page with one inline base64 PNG — no external service required.
- `js/tests/integration/archive-default-layout.test.js`: asserts the zip from `buildArchiveFromHtml` contains `document.md` + `document.html` + `images/`, that `document.md` references `images/` and contains no `data:image`, and that `document.html` is the source.
- `rust/tests/integration/archive_default_layout.rs`: same contract via `web_capture::archive::build_zip_from_html`.
- `js/tests/integration/api-endpoints.test.js` to assert the `document.md` + `document.html` layout on the real `/archive` markdown handler path.
Local verification
The 4 failing tests in `tests/unit/browser.test.js` in the sandbox are pre-existing and unrelated (no Chrome/Playwright browser available in the environment).
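The layout contract that the integration tests pin can be summarized in a small checker. This is only a sketch: `checkArchiveLayout` is a hypothetical function, not part of the repository, and it takes a plain list of zip entry names plus the text of `document.md` rather than reading a real archive, so it needs no zip library.

```javascript
// Minimal sketch of the archive layout contract from issue #113:
// document.md + document.html + images/, with document.md pointing at
// relative images/ paths and never embedding inline base64 images.

// Hypothetical helper; returns a list of contract violations (empty when valid).
function checkArchiveLayout(entryNames, markdown) {
  const problems = [];
  for (const required of ['document.md', 'document.html']) {
    if (!entryNames.includes(required)) {
      problems.push(`missing ${required}`);
    }
  }
  // At least one real file must live under images/.
  const hasImage = entryNames.some(
    (name) => name.startsWith('images/') && name.length > 'images/'.length
  );
  if (!hasImage) {
    problems.push('no files under images/');
  }
  if (!markdown.includes('images/')) {
    problems.push('document.md does not reference images/');
  }
  if (markdown.includes('data:image')) {
    problems.push('document.md contains an inline data:image');
  }
  return problems;
}

// Example with made-up entry names:
const problems = checkArchiveLayout(
  ['document.md', 'document.html', 'images/image-1.png'],
  '![pixel](images/image-1.png)'
);
console.log(problems.length === 0 ? 'layout ok' : problems.join('; '));
```

Returning a list of violations instead of throwing on the first one makes a failing test report every part of the contract that regressed at once.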