fix(vt): forward UTF-8 bytes through DCS passthrough #11216

Open
daiimus wants to merge 1 commit into ghostty-org:main from daiimus:fix/dcs-passthrough-utf8

@daiimus daiimus commented Mar 7, 2026

Summary

The VT parser's dcs_passthrough state only forwarded bytes 0x00-0x7E via .put actions. Bytes 0x80-0x9F triggered C1 "anywhere" transitions that exited dcs_passthrough entirely, and bytes 0xA0-0xFF defaulted to .none (silently dropped). This breaks any DCS payload containing UTF-8 multi-byte sequences.

Changes

src/terminal/parse_table.zig:

  • Override 0x80-0x9B and 0x9D-0xFF to stay in dcs_passthrough with
    .put (0x9C / C1 ST is preserved as the legitimate DCS terminator)
  • Override ESC (0x1B), CAN (0x18), SUB (0x1A) to .put instead of
    the "anywhere" transitions that would exit dcs_passthrough
  • 11 tests covering every byte range
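
The byte ranges in this list can be restated as a small classifier. This is only a sketch: `Outcome` and `classifyPassthrough` are hypothetical names made up for illustration, not the generated table in parse_table.zig. DEL (0x7F) is not mentioned in the description, so it is left unspecified here.

```zig
const std = @import("std");

/// Hypothetical outcome of feeding one byte to dcs_passthrough after this change.
const Outcome = enum { put, terminate, unspecified };

/// Restates the ranges from the PR description; not the real table generator.
fn classifyPassthrough(byte: u8) Outcome {
    return switch (byte) {
        0x00...0x7E => .put, // now includes CAN (0x18), SUB (0x1A), ESC (0x1B)
        0x7F => .unspecified, // DEL: assumption, the description does not cover it
        0x80...0x9B, 0x9D...0xFF => .put, // C1 range and all UTF-8 high bytes
        0x9C => .terminate, // C1 ST stays the DCS terminator
    };
}

test "UTF-8 bytes are forwarded, 0x9C still terminates" {
    // "é" is 0xC3 0xA9 in UTF-8; both bytes must reach the handler.
    try std.testing.expectEqual(Outcome.put, classifyPassthrough(0xC3));
    try std.testing.expectEqual(Outcome.put, classifyPassthrough(0xA9));
    try std.testing.expectEqual(Outcome.put, classifyPassthrough(0x1B));
    try std.testing.expectEqual(Outcome.terminate, classifyPassthrough(0x9C));
}
```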

src/terminal/dcs.zig:

  • Add pending_esc flag to detect 7-bit ST (ESC \) manually, since ESC
    is now forwarded as .put rather than triggering the parser's escape
    state
  • 5 tests including tmux control mode with embedded ESC sequences in
    block content
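
The pending_esc idea can be illustrated with a toy model. This is a sketch under stated assumptions, not Ghostty's handler: `ToyHandler`, `forward`, `payload`, and the fixed 64-byte buffer are all hypothetical, and the model only mirrors the one-byte lookahead described above (hold ESC, terminate on a following `\`, otherwise release ESC plus the current byte).

```zig
const std = @import("std");

/// Toy stand-in for the DCS handler. All names here are hypothetical.
const ToyHandler = struct {
    pending_esc: bool = false,
    done: bool = false,
    buf: [64]u8 = undefined,
    len: usize = 0,

    /// Append one payload byte; silently drops bytes once the buffer is full.
    fn forward(self: *ToyHandler, byte: u8) void {
        if (self.len < self.buf.len) {
            self.buf[self.len] = byte;
            self.len += 1;
        }
    }

    /// Hold ESC back for one byte, then decide whether it was 7-bit ST.
    fn put(self: *ToyHandler, byte: u8) void {
        if (self.done) return;
        if (self.pending_esc) {
            self.pending_esc = false;
            if (byte == '\\') {
                self.done = true; // ESC \ is 7-bit ST
                return;
            }
            self.forward(0x1B); // not ST: release the held ESC
            self.forward(byte);
            return;
        }
        if (byte == 0x1B) {
            self.pending_esc = true;
            return;
        }
        self.forward(byte);
    }

    fn payload(self: *const ToyHandler) []const u8 {
        return self.buf[0..self.len];
    }
};

test "embedded ESC is kept, ESC backslash terminates" {
    var h = ToyHandler{};
    // "a", an embedded ESC [ m sequence, "é" as UTF-8, then 7-bit ST.
    const input = "a\x1b[m\xc3\xa9\x1b\\";
    for (input) |b| h.put(b);
    try std.testing.expect(h.done);
    try std.testing.expectEqualStrings("a\x1b[m\xc3\xa9", h.payload());
}
```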

Context

Found while building tmux control mode support in Geistty, an iOS SSH terminal using Ghostty as its rendering engine (fork). tmux %output notifications carry raw pane content inside DCS 1000p, which regularly contains UTF-8 text — every non-ASCII character was being silently dropped or causing premature DCS termination.

This is a prerequisite for #1935 (tmux control mode support) but is independently correct — any DCS extension carrying UTF-8 payloads benefits from this fix.

AI Disclosure

Fix was developed with Claude Code assistance. I understand the changes: the VT parser state machine (based on the vt100.net model) was designed for 7-bit/8-bit character sets where 0x80-0x9F are C1 controls. Overriding these in dcs_passthrough is safe because DCS content is opaque to the parser — the sub-handler (not the state machine) is responsible for interpreting the byte stream. The pending_esc mechanism in dcs.zig restores 7-bit ST detection that was previously handled by the parser's ESC → escape state transition.

@daiimus daiimus requested a review from a team as a code owner March 7, 2026 20:31
Copilot AI review requested due to automatic review settings March 7, 2026 20:31
Copilot AI left a comment

Pull request overview

This PR aims to make the VT parser’s dcs_passthrough state forward full 8-bit payloads (including UTF-8 multibyte sequences) so DCS-based protocols like tmux control mode can carry non-ASCII data without being dropped or prematurely terminating the DCS.

Changes:

  • Update dcs_passthrough transitions to .put for bytes that were previously treated as “anywhere” transitions (notably ESC/CAN/SUB and C1 ranges) and for 0xA0..0xFF that previously defaulted to .none.
  • Add targeted table-generation tests to ensure the transition table matches the intended DCS passthrough behavior across key byte ranges.
  • Add pending_esc tracking and new tests in the DCS handler to detect 7-bit ST (ESC \) when ESC is forwarded.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File                          Description
src/terminal/parse_table.zig  Adjusts DCS passthrough transition table so high bytes and specific controls are forwarded via .put, plus adds table validation tests.
src/terminal/dcs.zig          Adds pending_esc logic intended to recognize 7-bit ST in the DCS handler, plus adds tests around ST handling and tmux embedded ESC content.

range(&result, 0, 0x17, source, source, .put);
single(&result, 0x18, source, source, .put); // CAN: override "anywhere" → ground
single(&result, 0x1A, source, source, .put); // SUB: override "anywhere" → ground
single(&result, 0x1B, source, source, .put); // ESC: override "anywhere" → escape
Copilot AI Mar 7, 2026

Overriding ESC (0x1B) to stay in dcs_passthrough means the parser no longer transitions to .escape, so 7-bit ST (ESC \) will not terminate DCS at the parser level. Since Parser.next only emits .dcs_unhook when the state changes away from dcs_passthrough, the parser can get stuck in dcs_passthrough after a normal ESC \ terminator, causing subsequent input to be routed as DCS payload instead of being parsed/printed. Consider implementing 7-bit ST recognition in the parser/state machine (e.g., a dedicated post-ESC DCS state or explicit pending-ESC tracking in Parser) so that ESC \ transitions back to ground and triggers .dcs_unhook.

Suggested change
- single(&result, 0x1B, source, source, .put); // ESC: override "anywhere" → escape
+ single(&result, 0x1B, source, .escape, .none); // ESC: allow ESC → escape for 7-bit ST terminator

Author

This is an intentional design choice: DCS termination is handled at the DCS handler level, not at the parse table level. @copilot

Comment thread src/terminal/dcs.zig
Comment on lines +123 to +144
// Handle 7-bit ST detection. Because ESC (0x1B) is now forwarded
// as .put in dcs_passthrough (rather than triggering a state
// transition to .escape), we must detect the ESC + '\' sequence
// that forms 7-bit ST ourselves.
if (self.pending_esc) {
    self.pending_esc = false;
    if (byte == 0x5C) {
        // ESC \ = 7-bit ST → terminate DCS, same as unhook()
        return self.unhook();
    }

    // Not ST: forward the stored ESC to the sub-handler first,
    // then fall through to handle the current byte normally.
    if (self.forwardPut(0x1B)) |cmd| return cmd;
    return self.forwardPut(byte);
}

if (byte == 0x1B) {
    self.pending_esc = true;
    return null;
}

Copilot AI Mar 7, 2026

put() terminates DCS on 7-bit ST by calling self.unhook(), but this only changes the DCS handler’s internal state; it does not transition the VT Parser out of dcs_passthrough. As a result, after receiving ESC \ the parser will continue emitting .dcs_put actions for subsequent bytes (now routed to an inactive/ignore DCS handler), effectively dropping normal terminal output until a byte like 0x9C forces a parser state change. The terminator handling likely needs to stay in the VT parser/state machine so it can emit .dcs_unhook and return to ground.

Suggested change
// Handle 7-bit ST detection. Because ESC (0x1B) is now forwarded
// as .put in dcs_passthrough (rather than triggering a state
// transition to .escape), we must detect the ESC + '\' sequence
// that forms 7-bit ST ourselves.
if (self.pending_esc) {
self.pending_esc = false;
if (byte == 0x5C) {
// ESC \ = 7-bit ST → terminate DCS, same as unhook()
return self.unhook();
}
// Not ST: forward the stored ESC to the sub-handler first,
// then fall through to handle the current byte normally.
if (self.forwardPut(0x1B)) |cmd| return cmd;
return self.forwardPut(byte);
}
if (byte == 0x1B) {
self.pending_esc = true;
return null;
}

Author

I think this is about the interaction between DCS handlers and the parser state machine, which is a bigger architectural question that affects all DCS sequences, not just tmux. Changing it risks breaking other DCS handlers. @copilot

Comment thread src/terminal/dcs.zig
Comment on lines 191 to 197
pub fn unhook(self: *Handler) ?Command {
    // Note: we do NOT call deinit here on purpose because some commands
    // transfer memory ownership. If state needs cleanup, the switch
    // prong below should handle it.
    self.pending_esc = false;
    defer self.state = .inactive;

Copilot AI Mar 7, 2026

unhook() clears pending_esc unconditionally. If the DCS payload ends with a literal ESC byte and the string is terminated via 8-bit ST (0x9C) or another parser-level unhook trigger, the ESC will have been buffered (not forwarded) and will be silently dropped when unhook() runs. If buffering ESC remains part of the design, consider flushing a pending ESC into the sub-handler before finalizing unhook() (or redesigning so an ESC is never held across a parser-driven unhook).

Author

The pending ESC is always consumed as part of the ST terminator. There's no code path where a "real" payload ESC gets dropped by unhook(). @copilot

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


// in dcs_passthrough. tmux control mode protocol data contains raw
// UTF-8 in %output pane content. UTF-8 multi-byte sequences use:
// - Continuation bytes: 0x80-0xBF
// - 2-byte start bytes: 0xC0-0xDF
Copilot AI Mar 7, 2026

The comment describes UTF-8 2-byte start bytes as 0xC0-0xDF, but 0xC0 and 0xC1 are not valid UTF-8 leading bytes (they would create overlong encodings). Since this block is documentation explaining the rationale, it’d be clearer to either call out the technically valid lead-byte range (0xC2-0xDF) or note that the table forwards the full high-byte range regardless of UTF-8 validity.

Suggested change
- // - 2-byte start bytes: 0xC0-0xDF
+ // - 2-byte start bytes: 0xC2-0xDF (0xC0-0xC1 are not valid UTF-8
+ //   leading bytes but are still forwarded by this table)

Author

The table forwards the full 0x80-0xFF range intentionally. We're not validating UTF-8, just passing bytes through to the DCS handler. The comment could be more precise but the behavior is correct. @copilot
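
For reference, the UTF-8 byte classes discussed in this thread can be written out as a switch. This is an illustrative sketch following the standard UTF-8 definition (RFC 3629); `Utf8Class` and `classifyUtf8Byte` are hypothetical names and are not part of the PR, which forwards every high byte without validating it.

```zig
const std = @import("std");

const Utf8Class = enum { ascii, continuation, lead2, lead3, lead4, invalid };

/// Classify a single byte by its role in well-formed UTF-8.
fn classifyUtf8Byte(byte: u8) Utf8Class {
    return switch (byte) {
        0x00...0x7F => .ascii,
        0x80...0xBF => .continuation,
        0xC0...0xC1 => .invalid, // could only start an overlong 2-byte form
        0xC2...0xDF => .lead2,
        0xE0...0xEF => .lead3,
        0xF0...0xF4 => .lead4,
        0xF5...0xFF => .invalid, // would encode beyond U+10FFFF
    };
}

test "0xC0 and 0xC1 are never valid lead bytes" {
    try std.testing.expectEqual(Utf8Class.invalid, classifyUtf8Byte(0xC0));
    try std.testing.expectEqual(Utf8Class.invalid, classifyUtf8Byte(0xC1));
    try std.testing.expectEqual(Utf8Class.lead2, classifyUtf8Byte(0xC2));
    try std.testing.expectEqual(Utf8Class.continuation, classifyUtf8Byte(0xA9));
}
```

Every non-ASCII class above, valid or not, falls inside 0x80-0xFF, which is why forwarding the whole range is sufficient for the passthrough.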

Comment thread src/terminal/dcs.zig
pub fn unhook(self: *Handler) ?Command {
    // Note: we do NOT call deinit here on purpose because some commands
    // transfer memory ownership. If state needs cleanup, the switch
    // prong below should handle it.
Copilot AI Mar 7, 2026

put() buffers a lone ESC in pending_esc, but if the DCS is terminated via the parser-level 8-bit ST (0x9C) then put() is never called for the terminator and unhook() will run while pending_esc is still true. Since unhook() clears pending_esc unconditionally, that final ESC payload byte will be silently dropped. Consider flushing a pending ESC to the active sub-handler at the start of unhook() (only when pending_esc is set) before clearing state, so payload bytes aren’t lost when termination happens via 0x9C.

Suggested change
- // prong below should handle it.
+ // prong below should handle it.
+ if (self.pending_esc) {
+     // If a lone ESC was buffered as part of the DCS payload and the
+     // DCS is terminated via the parser-level 8-bit ST (0x9C), `put()`
+     // is not called for the terminator. Flush that pending ESC to
+     // the active sub-handler here so it is not lost.
+     _ = self.tryPut(0x1b) catch {};
+ }

Author

Same concern as the earlier pending_esc comment. In tmux control mode, 0x9C termination doesn't occur; tmux uses ESC \. This is a theoretical edge case in the general DCS machinery, not a bug introduced by this PR. @copilot
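
The edge case debated in this thread can be reproduced in a toy model. This is a sketch only: `ToyHandler` and its `flush` parameter are hypothetical. It assumes, as the review comments state, that 0x9C is consumed by the parser and reaches the handler only as an unhook call, so a payload ESC held in pending_esc is never followed by another put().

```zig
const std = @import("std");

/// Toy model; names are hypothetical and the real handler does more.
const ToyHandler = struct {
    pending_esc: bool = false,
    buf: [16]u8 = undefined,
    len: usize = 0,

    fn forward(self: *ToyHandler, byte: u8) void {
        if (self.len < self.buf.len) {
            self.buf[self.len] = byte;
            self.len += 1;
        }
    }

    fn put(self: *ToyHandler, byte: u8) void {
        if (self.pending_esc) {
            self.pending_esc = false;
            if (byte == '\\') return; // 7-bit ST; termination elided in this toy
            self.forward(0x1B);
            self.forward(byte);
            return;
        }
        if (byte == 0x1B) {
            self.pending_esc = true;
            return;
        }
        self.forward(byte);
    }

    /// Parser-driven unhook (for example on 0x9C). `flush` toggles the
    /// reviewer's suggestion of releasing a held ESC before finishing.
    fn unhook(self: *ToyHandler, flush: bool) []const u8 {
        if (flush and self.pending_esc) self.forward(0x1B);
        self.pending_esc = false;
        return self.buf[0..self.len];
    }
};

test "payload ending in ESC, terminated by a parser-level unhook" {
    var a = ToyHandler{};
    for ("ab\x1b") |b| a.put(b);
    try std.testing.expectEqualStrings("ab", a.unhook(false)); // ESC is lost

    var c = ToyHandler{};
    for ("ab\x1b") |b| c.put(b);
    try std.testing.expectEqualStrings("ab\x1b", c.unhook(true)); // ESC is kept
}
```

Whether this matters in practice depends on whether any real DCS producer ends a payload with ESC and terminates with 0x9C; the author's point above is that tmux does not.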

Contributor

@mitchellh mitchellh left a comment

This looks good, thank you.

I would love to know what other terminals do about this and plan to do some more research there, but the implementation here looks good to me and the tests cover good cases across the board.

Btw, please don't trigger Copilot for PR review, Copilot in its present form is a piece of shit.

@mitchellh mitchellh added this to the 1.4.0 milestone Mar 8, 2026
Contributor

@mitchellh mitchellh left a comment

I still approve this, but I'm going to put a block on merge until 1.3 is out the door; I don't want to risk something in the core parse table for 1.3.

Author

daiimus commented Mar 10, 2026

So, full disclaimer: I've definitely waded into waters a bit too deep for me, but it's good to see a contribution be usable. I did take a step back to look at other terminals, namely how iTerm2 handles tmux. My rough understanding is that iTerm2's approach is eager, whereas Ghostty's is lazy.

So, I've changed how I approach my project to be a better reference upstream and to adhere to this lazy philosophy a bit more. I'm going to take some time to review the other PRs I submitted. (I won't submit any more.)

I think my hope is that, rather than sending a bunch of AI-fueled PRs, I can create a reference for the path I trudged through that can hopefully help along the way.

@ghostty-bot ghostty-bot Bot added the vt Control sequence related label Mar 25, 2026