
improvement(sandbox): expand document generation — style extraction, sandbox hardening, OOM errors, task guards#4526

Merged
waleedlatif1 merged 5 commits into staging from improvement/pptx-docx-pdf-sandbox
May 9, 2026

Conversation

@waleedlatif1
Collaborator

Summary

  • Expand style extraction for DOCX/PPTX/PDF: OOXML font inheritance, bold detection (LibreOffice compat), heading ID case-insensitivity, compound font suffix stripping, PPTX slide count/aspect ratio/background color
  • Fix auth order in style route (session before parseRequest per CLAUDE.md mandate) and add 100 MB size guard
  • Fix contract schema drift: add pdf to format enum, make theme optional, add defaults/pageSize/fonts/slideCount/aspectRatio/background fields
  • Add __docxDocOptions to docx sandbox bootstrap/finalize so chunked mode can set document-wide styles and numbering
  • Add null guard to pptx finalize; align docx addImage validation with pdf/pptx (throw on missing dims)
  • Surface friendly OOM error from isolated-vm worker instead of raw V8 message
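
The auth-order fix and the size guard in the style route can be pictured roughly as follows. This is a minimal sketch, not the route's actual code: `StyleRequest`, `handleStyleRequest`, the status codes, and the error strings are all hypothetical stand-ins; only the ordering (session before request parsing) and the 100 MB limit come from the summary above.

```typescript
// Hypothetical shapes: the real route, session helper, and error format
// are not shown on this page.
const MAX_STYLE_FILE_BYTES = 100 * 1024 * 1024 // 100 MB, per the PR summary

interface StyleRequest {
  hasSession: boolean // stand-in for a real session lookup
  rawParams: { fileId?: string }
  fileSizeBytes: number
}

interface StyleResponse {
  status: number
  error?: string
}

function handleStyleRequest(req: StyleRequest): StyleResponse {
  // 1. Authenticate first, so unauthenticated callers never reach validation.
  if (!req.hasSession) return { status: 401, error: 'Unauthorized' }

  // 2. Only then validate the request parameters.
  if (!req.rawParams.fileId) return { status: 400, error: 'Missing fileId' }

  // 3. Reject oversized files before any parsing work begins.
  if (req.fileSizeBytes > MAX_STYLE_FILE_BYTES) {
    return { status: 413, error: 'File exceeds the 100 MB style extraction limit' }
  }

  return { status: 200 }
}
```

With this ordering, a request that is both unauthenticated and malformed is answered as unauthenticated rather than as a validation failure.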

Type of Change

  • Bug fix
  • Improvement

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…sandbox hardening, OOM errors, PPTX/DOCX/PDF task guards
@vercel

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 9, 2026 0:26am |


@cursor

cursor Bot commented May 8, 2026

PR Summary

Medium Risk
Medium risk: expands document parsing to PDFs and adds more OOXML style inference, which can affect performance/memory and edge-case handling for user-uploaded files. Also tweaks sandbox/worker error classification and generation guards, which could change failure modes in production.

Overview
Adds PDF support to workspace file style extraction (API + VFS), including a 100MB size limit, updated error messaging, and an expanded response schema (optional theme, plus PDF pageSize/fonts and PPTX slideCount/aspectRatio/background).
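
As a rough illustration of the expanded response shape, the sketch below models it as a TypeScript type. Only the field names and the `pdf` format come from this PR; the value types (for example `pageSize` as a width/height pair or `fonts` as a string list) and the helper `isDocumentFormat` are assumptions made for illustration, not the contents of `documentStyleSummarySchema`.

```typescript
// Hypothetical value types: only the field names are taken from the PR text.
type DocumentFormat = 'docx' | 'pptx' | 'pdf'

interface DocumentStyleSummary {
  format: DocumentFormat
  theme?: Record<string, string> // optional, e.g. absent for some DOCX files
  defaults?: Record<string, unknown>
  pageSize?: { width: number; height: number } // PDF
  fonts?: string[] // PDF
  slideCount?: number // PPTX
  aspectRatio?: string // PPTX
  background?: string // PPTX
}

const DOCUMENT_FORMATS: DocumentFormat[] = ['docx', 'pptx', 'pdf']

// Narrow an arbitrary string (such as a file extension) to a supported format.
function isDocumentFormat(value: string): value is DocumentFormat {
  return DOCUMENT_FORMATS.indexOf(value as DocumentFormat) !== -1
}

// A PDF summary is valid without a theme because every extra field is optional.
const pdfSummary: DocumentStyleSummary = {
  format: 'pdf',
  pageSize: { width: 612, height: 792 },
  fonts: ['Helvetica'],
}
```

Keeping every new field optional is what makes the change additive: a consumer that only reads `format` keeps working unchanged.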

Improves OOXML extraction robustness by making DOCX themes optional (LibreOffice compatibility), resolving style inheritance/theme fonts, and enriching PPTX metadata; updates the sandbox worker to surface a friendlier OOM error before cancellation detection.
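
Style inheritance with theme fonts can be sketched as a walk up the `basedOn` chain until some style declares a font. The types `StyleDef` and `ThemeFonts` and the function `resolveFont` are hypothetical simplifications; the actual parser works on OOXML and is not reproduced here. The cycle guard is an added assumption to keep malformed files from looping forever.

```typescript
// Simplified, hypothetical model of a parsed DOCX style entry.
interface StyleDef {
  basedOn?: string // id of the parent style
  font?: string // explicit font name, if declared
  themeFont?: 'major' | 'minor' // reference into the theme, if declared
}

interface ThemeFonts {
  major?: string
  minor?: string
}

function resolveFont(
  styleId: string,
  styles: Map<string, StyleDef>,
  theme?: ThemeFonts
): string | undefined {
  const seen = new Set<string>() // guards against cyclic basedOn chains
  let current: string | undefined = styleId

  while (current !== undefined && !seen.has(current)) {
    seen.add(current)
    const style: StyleDef | undefined = styles.get(current)
    if (!style) return undefined

    // An explicit font wins over a theme reference on the same style.
    if (style.font) return style.font
    if (style.themeFont && theme && theme[style.themeFont]) {
      return theme[style.themeFont]
    }

    current = style.basedOn // nothing declared here, so try the parent
  }
  return undefined
}
```

Because the theme is an optional argument, a document without a theme part still resolves explicitly declared fonts.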

Enhances document generation tasks: DOCX chunked mode can now pass document-wide options via __docxDocOptions, and PPTX finalize now guards against globalThis.pptx being overwritten.
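
The two generation guards described above might look something like the following. This is an illustrative sketch only: `DocOptions`, `buildDocumentConfig`, and `requirePresentation` are hypothetical names, and the error text is invented. The part taken from the PR is the idea that document-wide options are spread before `sections`, and that finalize fails loudly if the presentation instance is missing.

```typescript
// Hypothetical option bag standing in for whatever __docxDocOptions holds.
interface DocOptions {
  styles?: unknown
  numbering?: unknown
  sections?: unknown[]
}

// Spread the document-wide options first, then set sections explicitly.
// In an object literal a later key overrides an earlier one, so the
// accumulated sections always win over any stale value in docOptions.
function buildDocumentConfig(docOptions: DocOptions | undefined, sections: unknown[]): DocOptions {
  return { ...(docOptions || {}), sections }
}

// Fail with a clear message if the presentation instance was overwritten.
function requirePresentation<T>(pptx: T | null | undefined): T {
  if (pptx === null || pptx === undefined) {
    throw new Error('Presentation instance is missing; it may have been overwritten')
  }
  return pptx
}
```

A caller would pass the accumulated chunk sections as the second argument, so a `sections` key accidentally placed in the options cannot discard generated content.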

Reviewed by Cursor Bugbot for commit d251915.

@greptile-apps
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR expands document style extraction to cover PDF files (via pdf-lib) alongside the existing DOCX/PPTX support, adds PPTX metadata (slide count, aspect ratio, background color), and introduces OOXML font-inheritance resolution for DOCX styles. It also fixes a critical error-ordering bug in the isolated-vm worker where OOM exceptions were silently swallowed by the cancellation path, and hardens the sandbox with a 100 MB file-size guard and auth-order correction.

  • document-style.ts: New extractPdfStyle function reads page dimensions and embedded font families from the first 10 pages; parseDocxStyles now walks basedOn chains and resolves w:asciiTheme font references; parsePptxPresentation and parseSlideMasterBackground add slide metadata.
  • isolated-vm-worker.cjs: OOM check moved before the isDisposed guard in both executeCode and executeTask so that an OOM (which auto-disposes the isolate) is no longer misreported as a user cancellation.
  • Route + contract: auth order fixed per CLAUDE.md mandate, 100 MB size guard added, and documentStyleSummarySchema updated to reflect all new optional fields.
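
The ordering issue in the worker can be illustrated with a small classifier. This is a sketch under stated assumptions, not the code in `isolated-vm-worker.cjs`: `classifyWorkerError`, the `OOM_PATTERN` regular expression, and the friendly message are hypothetical. What it demonstrates is why the OOM check has to run before the `isDisposed` check: after an OOM both conditions are true, so whichever test comes first decides the classification.

```typescript
// Assumption: an OOM failure can be recognized from the error message text.
const OOM_PATTERN = /memory limit|out of memory/i

interface IsolateLike {
  isDisposed: boolean
}

interface ClassifiedError {
  name: 'OutOfMemoryError' | 'AbortError' | 'Error'
  message: string
}

function classifyWorkerError(err: Error, isolate: IsolateLike): ClassifiedError {
  // OOM first: an OOM also disposes the isolate, so checking isDisposed
  // before this would report the failure as a cancellation.
  if (OOM_PATTERN.test(err.message)) {
    return {
      name: 'OutOfMemoryError',
      message: 'The script ran out of memory. Try processing less data at once.',
    }
  }

  // A disposed isolate without an OOM signal is treated as a cancellation.
  if (isolate.isDisposed) {
    return { name: 'AbortError', message: 'Execution was cancelled' }
  }

  return { name: 'Error', message: err.message }
}
```

Swapping the first two checks reproduces the misclassification the summary describes, where every OOM surfaces as an `AbortError`.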

Confidence Score: 5/5

Safe to merge — all logic paths are well-guarded and the OOM ordering fix corrects a genuine misclassification without introducing new failure modes.

The OOM reordering is the most load-bearing change and it is clearly correct: OOM auto-disposes the isolate, so the old code's isDisposed check would fire first and mask the real error. Moving OOM detection above the dispose check eliminates that ambiguity. The style-extraction additions are well-insulated behind try/catch at every layer (outer, per-page, per-font-entry), so a malformed or unusual document degrades gracefully to null. The contract schema changes are additive and backward-compatible. No existing call sites are broken.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/vfs/document-style.ts | Major expansion: adds PDF style extraction via pdf-lib, PPTX slide count/aspect ratio/background parsing, DOCX inheritance resolution and font theme mapping; well-guarded with try/catch; landscape orientation not detected as a named preset |
| apps/sim/lib/execution/isolated-vm-worker.cjs | OOM check moved before the isDisposed guard in both executeCode and executeTask; fixes a real ordering bug where OOM silently appeared as AbortError |
| apps/sim/app/api/workspaces/[id]/files/[fileId]/style/route.ts | Auth order fixed (session before parseRequest), 100 MB size guard added, PDF extension support added; all changes correct |
| apps/sim/lib/api/contracts/workspace-files.ts | Contract schema updated: pdf added to format enum, theme made optional, new fields for pageSize/fonts/slideCount/aspectRatio/background; aligns with runtime implementation |
| apps/sim/sandbox-tasks/docx-generate.ts | Adds __docxDocOptions global for chunked document-wide styles/numbering; spread correctly placed before sections so sections always wins |
| apps/sim/sandbox-tasks/pptx-generate.ts | Null guard added to finalize to catch overwritten pptx instance; error message is clear and accurate |
| apps/sim/lib/copilot/vfs/workspace-vfs.ts | Comment and type annotation updates to reflect pdf support; no logic changes |
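
For the PPTX aspect ratio mentioned above, one plausible approach is to reduce the slide size to its smallest whole-number ratio. In PresentationML the slide size is stored in EMU (914,400 per inch) as the `cx` and `cy` attributes of `p:sldSz`; how `parsePptxPresentation` actually derives or labels the ratio is not shown on this page, so `aspectRatioFromEmu` below is a hypothetical helper.

```typescript
// Greatest common divisor via the Euclidean algorithm.
function gcd(a: number, b: number): number {
  while (b !== 0) {
    const remainder = a % b
    a = b
    b = remainder
  }
  return a
}

// Reduce a slide size in EMU to a "W:H" string, or null for unusable input.
function aspectRatioFromEmu(cx: number, cy: number): string | null {
  const isPositiveInt = (n: number) => isFinite(n) && n > 0 && Math.floor(n) === n
  if (!isPositiveInt(cx) || !isPositiveInt(cy)) return null
  const divisor = gcd(cx, cy)
  return `${cx / divisor}:${cy / divisor}`
}

// Standard widescreen slides are 12192000 x 6858000 EMU (13.333in x 7.5in).
const widescreen = aspectRatioFromEmu(12192000, 6858000) // "16:9"
// Standard 4:3 slides are 9144000 x 6858000 EMU (10in x 7.5in).
const standard = aspectRatioFromEmu(9144000, 6858000) // "4:3"
```

Note that a pure reduction reports a 16:10 slide as `8:5`, so a real implementation may prefer to map known sizes to conventional labels instead.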

Reviews (2): Last reviewed commit: "chore(lint): suppress noTemplateCurlyInS..."

Comment thread apps/sim/sandbox-tasks/pptx-generate.ts
Comment thread apps/sim/lib/copilot/vfs/document-style.ts Outdated
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

…ings intentionally assert template literal preservation

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit d251915.

@waleedlatif1 waleedlatif1 merged commit b74f8da into staging May 9, 2026
9 checks passed
@waleedlatif1 waleedlatif1 deleted the improvement/pptx-docx-pdf-sandbox branch May 9, 2026 00:26