π‘οΈ Sentinel: [CRITICAL] Fix PDF compilation RCE#323
Conversation
Enforce `-no-shell-escape` for `pdflatex` and `--pdf-engine-opt=-no-shell-escape` for `pandoc` in `cli/pdf/converter.py` and `cli/generators/cover_letter_generator.py` to prevent Remote Code Execution via untrusted LaTeX input. Also added a subprocess timeout to prevent Denial of Service via infinite compilation loops. Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
|
π Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a π emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideHarden LaTeX-based PDF compilation against RCE and DoS by disabling shell-escape and adding timeouts/cleanup around pdflatex and pandoc subprocesses, plus documenting the issue in the Sentinel journal. Sequence diagram for secure LaTeX PDF compilation with timeoutsequenceDiagram
participant CoverLetterGenerator as CoverLetterGenerator._compile_pdf
participant PdfConverter as converter._compile_pdflatex
participant PandocConverter as converter._compile_pandoc
participant Subprocess as subprocess
CoverLetterGenerator->>Subprocess: Popen(["pdflatex", "-interaction=nonstopmode", "-no-shell-escape", tex_path.name])
Note over CoverLetterGenerator,Subprocess: Primary pdflatex compilation
CoverLetterGenerator->>Subprocess: communicate(timeout=30)
alt [pdflatex completes]
Subprocess-->>CoverLetterGenerator: stdout, stderr (returncode == 0)
CoverLetterGenerator->>CoverLetterGenerator: set pdf_created = True
else [TimeoutExpired]
CoverLetterGenerator->>Subprocess: kill()
CoverLetterGenerator->>Subprocess: communicate()
CoverLetterGenerator->>CoverLetterGenerator: pdf_created remains False
end
alt [pdf_created is False]
CoverLetterGenerator->>Subprocess: Popen(["pandoc", tex_path, "-o", output_path, "--pdf-engine=xelatex", "--pdf-engine-opt=-no-shell-escape"])
Note over CoverLetterGenerator,Subprocess: Fallback pandoc compilation
CoverLetterGenerator->>Subprocess: communicate(timeout=30)
alt [pandoc completes]
Subprocess-->>CoverLetterGenerator: stdout, stderr (returncode == 0)
CoverLetterGenerator->>CoverLetterGenerator: set pdf_created = True
else [TimeoutExpired]
CoverLetterGenerator->>Subprocess: kill()
CoverLetterGenerator->>Subprocess: communicate()
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The timeout and
Popen/communicate/killlogic forpdflatexandpandocis now duplicated in multiple places; consider extracting a shared helper to reduce repetition and keep future security changes centralized. - The
30second timeout is hard-coded in several calls tocommunicate; it would be easier to tune and reason about if this were a single constant or configuration value used across all LaTeX compilation paths. - The timeout behavior differs slightly between
cover_letter_generator._compile_pdf(which falls through and may try pandoc) andconverter._compile_pdflatex/_compile_pandoc(which immediately returnFalse); consider aligning these semantics so callers can rely on consistent behavior when a compilation times out.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The timeout and `Popen`/`communicate`/`kill` logic for `pdflatex` and `pandoc` is now duplicated in multiple places; consider extracting a shared helper to reduce repetition and keep future security changes centralized.
- The `30` second timeout is hard-coded in several calls to `communicate`; it would be easier to tune and reason about if this were a single constant or configuration value used across all LaTeX compilation paths.
- The timeout behavior differs slightly between `cover_letter_generator._compile_pdf` (which falls through and may try pandoc) and `converter._compile_pdflatex`/`_compile_pandoc` (which immediately return `False`); consider aligning these semantics so callers can rely on consistent behavior when a compilation times out.Help me be more useful! Please click π or π on each comment and I'll use the feedback to improve your reviews.
π¨ Severity: CRITICAL
π‘ Vulnerability: The application was executing LaTeX compilers without restricting the shell-escape feature (
\write18). It also lacked a subprocess timeout when runningpdflatexorpandoc.π― Impact: Maliciously crafted LaTeX input could execute arbitrary shell commands on the host machine leading to full system compromise (RCE). Additionally, malicious documents could induce an infinite compilation loop leading to a Denial of Service (DoS) by resource exhaustion.
π§ Fix: Passed the
-no-shell-escapeargument topdflatexand passed--pdf-engine-opt=-no-shell-escapewhen utilizingpandoc. Also enforced atimeout=30argument with process cleanup and graceful failure logic for the subprocess executions. Added journal entry to.jules/sentinel.mdoutlining the vulnerabilities and preventions.β Verification: Verified by reviewing the implementation with
catand runningtests/test_pdf_security.pyviapytest, as well as standard project lint and test checks.PR created automatically by Jules for task 1962055901658537478 started by @anchapin
Summary by Sourcery
Harden LaTeX-based PDF compilation against remote code execution and denial-of-service by tightening compiler options and subprocess handling.
Bug Fixes:
Documentation: