feat: remote image builder for the TypeScript SDK with SIGINT cancellation and unified retry policy #78
Merged
mohamedveron
approved these changes
May 22, 2026
Optimize image-based snapshots through the remote builder
❌ Current behavior
✅ New behavior
🤔 Assumptions
- […] `FROM <image>` Dockerfile and produces a nydus-optimized image in the internal registry; no special-case API on the builder side needed.
- […] `ghcr.io/foo/bar:latest` → `bar`) is preferable to a random UUID for the build's image name, since the existing `_buildRemotely` derives the name from `path.basename(contextDir)`.
🧠 Decisions
- […] `_buildRemotely` rather than adding a new builder code path: write a synthetic `FROM <image>` Dockerfile into a temp folder and delegate. Keeps one remote-build implementation; nydus conversion happens for free because `_buildRemotely` already passes `nydus: true`.
- `mkdtemp` + `finally` cleanup so the temp directory is removed regardless of build outcome (success, failure, Ctrl+C).
- […] `_` → `-`. Falls back to `"image"` for pathological inputs that strip to empty.
🧪 Testing
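The naming and cleanup decisions above can be sketched as follows. This is a minimal illustration, not the SDK's code: `imageNameFromRef` and `withFromImageContext` are hypothetical names, the exact sanitization rules are assumed, and the `build` callback stands in for `_buildRemotely`. Because the build name is said to come from `path.basename(contextDir)`, the sketch writes the Dockerfile into a subdirectory named after the image (an assumption about layout).

```typescript
import { mkdir, mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical helper: derive a build name from an image reference,
// e.g. "ghcr.io/foo/bar:latest" -> "bar". Sanitization rules are assumed.
function imageNameFromRef(ref: string): string {
  const withoutDigest = ref.split("@")[0]; // drop "@sha256:..." if present
  const lastSegment = withoutDigest.split("/").pop() ?? "";
  const withoutTag = lastSegment.split(":")[0]; // drop ":tag"
  const cleaned = withoutTag
    .toLowerCase()
    .replace(/_/g, "-") // "_" -> "-"
    .replace(/[^a-z0-9-]/g, "") // strip anything else unexpected
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
  return cleaned || "image"; // fallback for inputs that strip to empty
}

// Hypothetical wrapper: create a temp build context containing a synthetic
// `FROM <image>` Dockerfile, hand it to `build`, and always clean up.
async function withFromImageContext<T>(
  image: string,
  build: (contextDir: string) => Promise<T>,
): Promise<T> {
  const parent = await mkdtemp(path.join(tmpdir(), "snapshot-"));
  const contextDir = path.join(parent, imageNameFromRef(image));
  try {
    await mkdir(contextDir);
    await writeFile(path.join(contextDir, "Dockerfile"), `FROM ${image}\n`);
    return await build(contextDir);
  } finally {
    // Runs whether `build` resolved or threw.
    await rm(parent, { recursive: true, force: true });
  }
}
```

In this sketch the `finally` block covers success and failure. Whether it also runs on Ctrl+C depends on how SIGINT is handled elsewhere, since it only runs if the process is still alive when the promise settles.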
📁 References
- together-sandbox-typescript/src/Snapshots.ts
- together-sandbox-typescript/src/RemoteImageBuilder.ts

Tear down server-side builds on Ctrl+C
❌ Current behavior
✅ New behavior
sequenceDiagram
    participant U as User
    participant SDK as SDK
    participant API as Build API
    participant Pod as Build Pod
    U->>SDK: snapshots.create({ image })
    SDK->>API: POST /builds (FROM <image>)
    API-->>SDK: { build_id }
    SDK->>API: GET /builds/{id}/logs (SSE)
    Note over U,SDK: Ctrl+C →
    U->>SDK: SIGINT
    SDK->>API: DELETE /builds/{id}
    API->>Pod: tear down
🤔 Assumptions
- `process.once("SIGINT", ...)` handler is required to issue the DELETE before exiting with status 130.
🧠 Decisions
- `once` + `removeListener` in TS so repeated `build()` calls in the same process (tests, daemons) don't accumulate stale SIGINT handlers.
- `asyncio.shield(self.cancel(...))` in Python guards against a second Ctrl+C aborting the DELETE mid-flight and leaving an orphaned pod.
🧪 Testing
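The `once` + `removeListener` pattern from the Decisions above might look roughly like this. It is a sketch under assumptions: `withSigintCancel` is a hypothetical name, `cancel` stands in for whatever issues `DELETE /builds/{id}`, and `run` stands in for the log-streaming build call.

```typescript
// Hypothetical wrapper: while `run` is in flight, a Ctrl+C triggers `cancel`
// (assumed to issue the server-side DELETE) and then exits with status 130,
// the conventional exit code for termination by SIGINT (128 + 2).
async function withSigintCancel<T>(
  cancel: () => Promise<void>,
  run: () => Promise<T>,
): Promise<T> {
  const onSigint = (): void => {
    void cancel()
      .catch(() => undefined) // best effort: still exit if the DELETE fails
      .finally(() => process.exit(130));
  };
  // `once` means the handler fires at most one time per registration.
  process.once("SIGINT", onSigint);
  try {
    return await run();
  } finally {
    // Remove the handler so repeated calls in one process do not pile up listeners.
    process.removeListener("SIGINT", onSigint);
  }
}
```

Registering inside the call and removing in `finally` keeps the listener count unchanged after each build, whether `run` resolves or throws.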
📁 References
- together-sandbox-typescript/src/RemoteImageBuilder.ts
- together-sandbox-python/together_sandbox/_remote_image_builder.py

Faster, consistent retries when streaming remote build logs
❌ Current behavior
✅ New behavior
🤔 Assumptions
- `eventsource` npm library sets `evt.code` only for HTTP responses; transport failures leave `code` undefined and already flow through the resolve-with-`sawDone=false` retry path.
🧠 Decisions
- `new Set([...RETRYABLE_STATUS_CODES, 404])` instead of mutating the shared export, so the 404-retry stays scoped to the SSE stream and doesn't leak into other `callApi` consumers.
- […] `EventSource` constructor failures from malformed URLs) throw immediately rather than burn the retry budget on programmer errors.
- `except HTTPStatusError` covers HTTP retries; `except Exception` already covers transport errors. TS reaches the same outcome via its two-path topology.
🔄 Discussions
- […] `except Exception:` parity. Verified via the `eventsource` library source that transport failures always take the `resolve(sawDone=false)` path, so the strict predicate is correct.
- […] `RETRYABLE_STATUS_CODES.add(404)` to share the set, but that would silently extend retry behavior for every other `callApi` consumer; chose a local spread-constructed set instead.
🧪 Testing
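The scoped 404 retry and the two-path retry topology discussed above can be illustrated with a small sketch. Everything here is assumed for illustration: the contents of `RETRYABLE_STATUS_CODES` are a stand-in for the shared export, and `isRetryableSseError`, `streamWithRetry`, and the `streamOnce` callback are hypothetical names. Backoff between attempts is omitted for brevity.

```typescript
// Stand-in for the shared export; the real set's contents are an assumption.
const RETRYABLE_STATUS_CODES: ReadonlySet<number> = new Set([408, 429, 500, 502, 503, 504]);

// Local copy with 404 added, so the shared set is never mutated.
const SSE_RETRYABLE_STATUS_CODES = new Set([...RETRYABLE_STATUS_CODES, 404]);

// Strict predicate: only errors carrying a numeric, retryable HTTP status
// qualify. Anything else (e.g. a TypeError from a malformed URL) does not.
function isRetryableSseError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "number" && SSE_RETRYABLE_STATUS_CODES.has(code);
}

// One streaming attempt: resolves with sawDone=false when the connection
// dropped before the final event, rejects on HTTP or programmer errors.
type StreamOnce = () => Promise<{ sawDone: boolean }>;

async function streamWithRetry(streamOnce: StreamOnce, maxAttempts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const { sawDone } = await streamOnce();
      if (sawDone) return; // stream completed normally
      // Path 1: transport drop (resolved without "done"), so try again.
    } catch (err) {
      // Path 2: retry only retryable HTTP statuses; rethrow everything else.
      if (!isRetryableSseError(err)) throw err;
    }
  }
  throw new Error(`log stream did not complete after ${maxAttempts} attempts`);
}
```

Because the 404 lives only in the local set, any other consumer of the shared set keeps its original retry behavior.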
📁 References
- together-sandbox-typescript/src/RemoteImageBuilder.ts
- together-sandbox-typescript/src/utils.ts
- together-sandbox-python/together_sandbox/_remote_image_builder.py
- together-sandbox-python/together_sandbox/_utils.py