feat: add content sanitization for user-provided content by github-actions[bot] · Pull Request #210 · shaftoe/pi-coding-agent-action

github-actions · 2026-05-24T10:52:29Z

Summary

Implements content sanitization for all user-provided text before it enters the LLM prompt context. Closes #209.

Approach: No 3rd party library

Regarding the question about using 3rd party libraries — after evaluating the options, a self-contained utility is the better fit here:

sanitize-html: 7 dependencies (including postcss, htmlparser2), designed for HTML sanitization — overkill for stripping HTML comments from Markdown text
deghost: The most relevant library (zero-dep invisible Unicode stripper), but it's v0.0.1 with a single publisher and zero track record. That's a supply-chain risk for a GitHub Action.
Our implementation: 3 stable regex patterns covering well-established Unicode and HTML standards that won't change. ~25 lines, zero dependencies, zero maintenance burden.

The regexes target immutable standards (Unicode character categories, HTML comment syntax) — they require the same "maintenance" as knowing that \n means newline. No library update will ever improve .

Changes

New file: `src/platform/github/sanitize.ts`

sanitizeContent(text: string): string — strips three categories of hidden content:
1. HTML comments () — invisible in rendered Markdown but visible to LLMs
2. Invisible Unicode characters — zero-width spaces, joiners, directional markers, BOM
3. ASCII control characters — preserves meaningful whitespace (\n, \r, \t)

Integration points

src/platform/github/context.ts — sanitizes issue/PR bodies (via CONTEXT_EXTRACTORS) and comment bodies (in getComment())
src/platform/github/tools/thread.ts — sanitizes:
- Issue/PR body in buildThreadResult()
- Issue comment bodies in transformComment()
- PR review comment bodies in fetchPRReviewComments()

Tests: `tests/platform/github/sanitize.spec.ts`

22 test cases covering:

HTML comment removal (single-line, multi-line, multiple, nested)
Individual invisible Unicode character removal (U+200B-F, U+2028-F, U+2060-F, U+FEFF)
Control character removal (null, BEL, BS, DEL, form feed, vertical tab)
Meaningful whitespace preservation (\n, \r, \t)
Unicode content preservation (emoji, CJK, Arabic)
Realistic attack scenarios (hidden instructions + zero-width chars)

Validation

✅ ESLint — pass
✅ TypeScript — pass
✅ Prettier — pass
✅ 22/22 new tests — pass

Co-authored-by: shaftoe <shaftoe@users.noreply.github.com>

feat: add content sanitization for user-provided content

ca01456

Co-authored-by: shaftoe <shaftoe@users.noreply.github.com>

github-actions Bot mentioned this pull request May 24, 2026

Feature: Content sanitization for user-provided content #209

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add content sanitization for user-provided content#210

feat: add content sanitization for user-provided content#210
github-actions[bot] wants to merge 1 commit into
developfrom
pi/issue209-1779619931630

github-actions Bot commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

github-actions Bot commented May 24, 2026

Summary

Approach: No 3rd party library

Changes

New file: src/platform/github/sanitize.ts

Integration points

Tests: tests/platform/github/sanitize.spec.ts

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

New file: `src/platform/github/sanitize.ts`

Tests: `tests/platform/github/sanitize.spec.ts`