fix(vt): forward UTF-8 bytes through DCS passthrough #11216
daiimus wants to merge 1 commit into ghostty-org:main
The VT parser's dcs_passthrough state only forwarded bytes 0x00-0x7E via .put actions. Bytes 0x80-0x9F triggered C1 "anywhere" transitions that exited dcs_passthrough entirely, and bytes 0xA0-0xFF defaulted to .none (silently dropped). This breaks any DCS payload containing UTF-8 multi-byte sequences.

parse_table.zig:
- Override 0x80-0x9B and 0x9D-0xFF to stay in dcs_passthrough with .put (0x9C / C1 ST preserved as legitimate DCS terminator)
- Override ESC (0x1B), CAN (0x18), SUB (0x1A) to .put instead of the "anywhere" transitions that would exit dcs_passthrough
- 11 tests covering every byte range

dcs.zig:
- Add pending_esc flag to detect 7-bit ST (ESC \) manually, since ESC is now forwarded as .put rather than triggering the parser's escape state
- 5 tests including tmux control mode with embedded ESC sequences
Pull request overview
This PR aims to make the VT parser’s dcs_passthrough state forward full 8-bit payloads (including UTF-8 multibyte sequences) so DCS-based protocols like tmux control mode can carry non-ASCII data without being dropped or prematurely terminating the DCS.
Changes:
- Update `dcs_passthrough` transitions to `.put` for bytes that were previously treated as “anywhere” transitions (notably ESC/CAN/SUB and C1 ranges) and for `0xA0..0xFF`, which previously defaulted to `.none`.
- Add targeted table-generation tests to ensure the transition table matches the intended DCS passthrough behavior across key byte ranges.
- Add `pending_esc` tracking and new tests in the DCS handler to detect 7-bit ST (ESC \) when ESC is forwarded.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/terminal/parse_table.zig | Adjusts DCS passthrough transition table so high bytes and specific controls are forwarded via .put, plus adds table validation tests. |
| src/terminal/dcs.zig | Adds pending_esc logic intended to recognize 7-bit ST in the DCS handler, plus adds tests around ST handling and tmux embedded ESC content. |
    range(&result, 0, 0x17, source, source, .put);
    single(&result, 0x18, source, source, .put); // CAN: override "anywhere" → ground
    single(&result, 0x1A, source, source, .put); // SUB: override "anywhere" → ground
    single(&result, 0x1B, source, source, .put); // ESC: override "anywhere" → escape
Overriding ESC (0x1B) to stay in dcs_passthrough means the parser no longer transitions to .escape, so 7-bit ST (ESC \) will not terminate DCS at the parser level. Since Parser.next only emits .dcs_unhook when the state changes away from dcs_passthrough, the parser can get stuck in dcs_passthrough after a normal ESC \ terminator, causing subsequent input to be routed as DCS payload instead of being parsed/printed. Consider implementing 7-bit ST recognition in the parser/state machine (e.g., a dedicated post-ESC DCS state or explicit pending-ESC tracking in Parser) so that ESC \ transitions back to ground and triggers .dcs_unhook.
    - single(&result, 0x1B, source, source, .put); // ESC: override "anywhere" → escape
    + single(&result, 0x1B, source, .escape, .none); // ESC: allow ESC → escape for 7-bit ST terminator
This is intentional design: DCS termination is handled at the DCS handler level, not the parse table level. @copilot
    // Handle 7-bit ST detection. Because ESC (0x1B) is now forwarded
    // as .put in dcs_passthrough (rather than triggering a state
    // transition to .escape), we must detect the ESC + '\' sequence
    // that forms 7-bit ST ourselves.
    if (self.pending_esc) {
        self.pending_esc = false;
        if (byte == 0x5C) {
            // ESC \ = 7-bit ST → terminate DCS, same as unhook()
            return self.unhook();
        }

        // Not ST: forward the stored ESC to the sub-handler first,
        // then fall through to handle the current byte normally.
        if (self.forwardPut(0x1B)) |cmd| return cmd;
        return self.forwardPut(byte);
    }

    if (byte == 0x1B) {
        self.pending_esc = true;
        return null;
    }
put() terminates DCS on 7-bit ST by calling self.unhook(), but this only changes the DCS handler’s internal state; it does not transition the VT Parser out of dcs_passthrough. As a result, after receiving ESC \ the parser will continue emitting .dcs_put actions for subsequent bytes (now routed to an inactive/ignore DCS handler), effectively dropping normal terminal output until a byte like 0x9C forces a parser state change. The terminator handling likely needs to stay in the VT parser/state machine so it can emit .dcs_unhook and return to ground.
    // Handle 7-bit ST detection. Because ESC (0x1B) is now forwarded
    // as .put in dcs_passthrough (rather than triggering a state
    // transition to .escape), we must detect the ESC + '\' sequence
    // that forms 7-bit ST ourselves.
    if (self.pending_esc) {
        self.pending_esc = false;
        if (byte == 0x5C) {
            // ESC \ = 7-bit ST → terminate DCS, same as unhook()
            return self.unhook();
        }
        // Not ST: forward the stored ESC to the sub-handler first,
        // then fall through to handle the current byte normally.
        if (self.forwardPut(0x1B)) |cmd| return cmd;
        return self.forwardPut(byte);
    }
    if (byte == 0x1B) {
        self.pending_esc = true;
        return null;
    }
I think this is about the interaction between DCS handlers and the parser state machine, which is a bigger architectural question that affects all DCS sequences, not just tmux. Changing it risks breaking other DCS handlers. @copilot
    pub fn unhook(self: *Handler) ?Command {
        // Note: we do NOT call deinit here on purpose because some commands
        // transfer memory ownership. If state needs cleanup, the switch
        // prong below should handle it.
        self.pending_esc = false;
        defer self.state = .inactive;
unhook() clears pending_esc unconditionally. If the DCS payload ends with a literal ESC byte and the string is terminated via 8-bit ST (0x9C) or another parser-level unhook trigger, the ESC will have been buffered (not forwarded) and will be silently dropped when unhook() runs. If buffering ESC remains part of the design, consider flushing a pending ESC into the sub-handler before finalizing unhook() (or redesigning so an ESC is never held across a parser-driven unhook).
The pending ESC is always consumed as part of the ST terminator. There's no code path where a "real" payload ESC gets dropped by unhook(). @copilot
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
    // in dcs_passthrough. tmux control mode protocol data contains raw
    // UTF-8 in %output pane content. UTF-8 multi-byte sequences use:
    //   - Continuation bytes: 0x80-0xBF
    //   - 2-byte start bytes: 0xC0-0xDF
The comment describes UTF-8 2-byte start bytes as 0xC0-0xDF, but 0xC0 and 0xC1 are not valid UTF-8 leading bytes (they would create overlong encodings). Since this block is documentation explaining the rationale, it’d be clearer to either call out the technically valid lead-byte range (0xC2-0xDF) or note that the table forwards the full high-byte range regardless of UTF-8 validity.
    - //   - 2-byte start bytes: 0xC0-0xDF
    + //   - 2-byte start bytes: 0xC2-0xDF (0xC0-0xC1 are not valid UTF-8
    + //     leading bytes but are still forwarded by this table)
The table forwards the full 0x80-0xFF range intentionally. We're not validating UTF-8, just passing bytes through to the DCS handler. The comment could be more precise but the behavior is correct. @copilot
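To make the byte ranges in this thread concrete, here is a minimal sketch of how single bytes map onto roles in well-formed UTF-8. `Utf8Role` and `utf8Role` are hypothetical names for illustration only and are not part of the PR or of Ghostty. The point it supports is the one made above: every non-ASCII role sits at 0x80 or higher, so a passthrough that forwards the whole high range does not need to care which of those bytes are valid lead bytes.

```zig
const std = @import("std");

/// Role of a single byte in well-formed UTF-8 (hypothetical helper,
/// not part of the PR).
const Utf8Role = enum { ascii, continuation, lead2, lead3, lead4, invalid };

fn utf8Role(byte: u8) Utf8Role {
    return switch (byte) {
        0x00...0x7F => .ascii,
        0x80...0xBF => .continuation,
        0xC0...0xC1 => .invalid, // could only start an overlong encoding
        0xC2...0xDF => .lead2,
        0xE0...0xEF => .lead3,
        0xF0...0xF4 => .lead4,
        0xF5...0xFF => .invalid, // would encode beyond U+10FFFF
    };
}

pub fn main() void {
    // The bytes at the edges of the range discussed in the review comment.
    for ([_]u8{ 0xC0, 0xC1, 0xC2, 0xDF }) |b| {
        std.debug.print("0x{X:0>2} -> {s}\n", .{ b, @tagName(utf8Role(b)) });
    }
}
```

Whether the source comment should say 0xC0-0xDF or 0xC2-0xDF is a documentation question only; the table behavior is the same either way.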
    pub fn unhook(self: *Handler) ?Command {
        // Note: we do NOT call deinit here on purpose because some commands
        // transfer memory ownership. If state needs cleanup, the switch
        // prong below should handle it.
put() buffers a lone ESC in pending_esc, but if the DCS is terminated via the parser-level 8-bit ST (0x9C) then put() is never called for the terminator and unhook() will run while pending_esc is still true. Since unhook() clears pending_esc unconditionally, that final ESC payload byte will be silently dropped. Consider flushing a pending ESC to the active sub-handler at the start of unhook() (only when pending_esc is set) before clearing state, so payload bytes aren’t lost when termination happens via 0x9C.
    - // prong below should handle it.
    + // prong below should handle it.
    + if (self.pending_esc) {
    +     // If a lone ESC was buffered as part of the DCS payload and the
    +     // DCS is terminated via the parser-level 8-bit ST (0x9C), `put()`
    +     // is not called for the terminator. Flush that pending ESC to
    +     // the active sub-handler here so it is not lost.
    +     _ = self.tryPut(0x1b) catch {};
    + }
Same concern as the earlier pending_esc comment. In tmux control mode, 0x9C termination doesn't occur; tmux uses ESC \. This is a theoretical edge case in the general DCS machinery, not a bug introduced by this PR. @copilot
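For anyone trying to follow the pending_esc back-and-forth, the two cases can be modeled with a toy handler. This is a sketch under stated assumptions, not Ghostty's `Handler`: `ToyHandler`, its fixed buffer, and the `flush_on_unhook` switch are all hypothetical, and `put` only mirrors the control flow of the snippet quoted earlier in the review.

```zig
const std = @import("std");

/// Toy stand-in for the DCS handler. All names here are illustrative
/// assumptions; only the pending_esc control flow follows the PR snippet.
const ToyHandler = struct {
    buf: [64]u8 = undefined,
    len: usize = 0,
    pending_esc: bool = false,
    done: bool = false,
    /// Hypothetical switch modeling the flush suggested in the review.
    flush_on_unhook: bool = false,

    fn forward(self: *ToyHandler, byte: u8) void {
        if (self.len < self.buf.len) {
            self.buf[self.len] = byte;
            self.len += 1;
        }
    }

    /// Returns true when this byte completed a 7-bit ST (ESC \).
    fn put(self: *ToyHandler, byte: u8) bool {
        if (self.pending_esc) {
            self.pending_esc = false;
            if (byte == 0x5C) {
                self.unhook();
                return true;
            }
            // Not ST: the held ESC was payload, so forward both bytes.
            self.forward(0x1B);
            self.forward(byte);
            return false;
        }
        if (byte == 0x1B) {
            self.pending_esc = true; // hold ESC until the next byte
            return false;
        }
        self.forward(byte);
        return false;
    }

    /// Termination, either from put() above or parser-driven (e.g. 0x9C).
    fn unhook(self: *ToyHandler) void {
        if (self.pending_esc and self.flush_on_unhook) self.forward(0x1B);
        self.pending_esc = false;
        self.done = true;
    }

    fn payload(self: *const ToyHandler) []const u8 {
        return self.buf[0..self.len];
    }
};

pub fn main() void {
    // Case 1: UTF-8 payload ("é" = 0xC3 0xA9) terminated by 7-bit ST.
    var st7 = ToyHandler{};
    for ("\xC3\xA9\x1B\\") |byte| _ = st7.put(byte);
    std.debug.print("7-bit ST: done={} len={d}\n", .{ st7.done, st7.payload().len });

    // Case 2: payload ends in a literal ESC, then the parser unhooks
    // (as it would on 8-bit ST). Compare with and without the flush.
    var st8 = ToyHandler{};
    for ("hi\x1B") |byte| _ = st8.put(byte);
    st8.unhook();
    var st8_flush = ToyHandler{ .flush_on_unhook = true };
    for ("hi\x1B") |byte| _ = st8_flush.put(byte);
    st8_flush.unhook();
    std.debug.print("no flush len={d}, flush len={d}\n", .{ st8.payload().len, st8_flush.payload().len });
}
```

In this model the trailing ESC is lost only in case 2 without the flush, which is the edge case the review describes. Whether that input ever occurs for real DCS producers is the open question in the thread, and the sketch does not settle it.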
mitchellh left a comment
This looks good, thank you.
I would love to know what other terminals do about this and plan to do some more research there, but the implementation here looks good to me and the tests cover good cases across the board.
Btw, please don't trigger Copilot for PR review, Copilot in its present form is a piece of shit.
mitchellh left a comment
I still approve this but going to put a block on merge until 1.3 is out the door, I don't want to risk something in the core parse table for 1.3.
So, full disclaimer: I've definitely waded into waters a bit too deep for me, but it's good to see a contribution be usable. I did take a step back to look at other terminals, namely how iTerm2 handles tmux. My rough understanding is that it is eager, versus Ghostty's lazy approach. So I've changed how I approach my project to be a better reference upstream and to adhere to this lazy philosophy a bit more. I'm going to take some time to review the other PRs I submitted (I won't submit any more). My hope is that, rather than sending a bunch of AI-fueled PRs, I can create a reference for the path I trudged through that can hopefully help along the way.
Summary
The VT parser's `dcs_passthrough` state only forwarded bytes 0x00-0x7E via `.put` actions. Bytes 0x80-0x9F triggered C1 "anywhere" transitions that exited `dcs_passthrough` entirely, and bytes 0xA0-0xFF defaulted to `.none` (silently dropped). This breaks any DCS payload containing UTF-8 multi-byte sequences.
Changes
src/terminal/parse_table.zig:
- Override 0x80-0x9B and 0x9D-0xFF to stay in `dcs_passthrough` with `.put` (0x9C / C1 ST is preserved as the legitimate DCS terminator)
- Override ESC (0x1B), CAN (0x18), SUB (0x1A) to `.put` instead of the "anywhere" transitions that would exit `dcs_passthrough`
- 11 tests covering every byte range

src/terminal/dcs.zig:
- Add `pending_esc` flag to detect 7-bit ST (ESC \) manually, since ESC is now forwarded as `.put` rather than triggering the parser's escape state
- 5 tests including tmux control mode with embedded ESC sequences
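As a rough illustration of the ranges listed above, the following sketch classifies a single byte seen in `dcs_passthrough` before and after the change. It is a toy model, not the real transition table: `Outcome`, `before`, and `after` are hypothetical names, and treating 0x7F as ignored is an assumption taken from the vt100.net state machine rather than from this PR.

```zig
const std = @import("std");

/// Outcome for one byte seen while in dcs_passthrough (toy model,
/// not Ghostty's actual table types).
const Outcome = enum { put, none, leave };

/// Behavior before the PR, as described in the summary.
fn before(byte: u8) Outcome {
    return switch (byte) {
        0x18, 0x1A, 0x1B => .leave, // CAN, SUB, ESC: "anywhere" transitions
        0x7F => .none, // assumption: DEL ignored, as in the vt100.net model
        0x00...0x17, 0x19, 0x1C...0x7E => .put,
        0x80...0x9F => .leave, // C1 "anywhere" transitions (0x9C is ST)
        0xA0...0xFF => .none, // no table entry: silently dropped
    };
}

/// Behavior after the PR: everything except 8-bit ST stays in the state.
fn after(byte: u8) Outcome {
    return switch (byte) {
        0x7F => .none, // assumption: unchanged by the PR
        0x9C => .leave, // 8-bit ST remains the parser-level terminator
        0x00...0x7E, 0x80...0x9B, 0x9D...0xFF => .put,
    };
}

pub fn main() void {
    const euro = "\xE2\x82\xAC"; // U+20AC EURO SIGN encoded as UTF-8
    for (euro) |b| {
        std.debug.print("0x{X:0>2}: before={s} after={s}\n", .{
            b, @tagName(before(b)), @tagName(after(b)),
        });
    }
}
```

For the euro sign, the middle byte 0x82 falls in the C1 range and the other two fall in 0xA0-0xFF, so in this model the old behavior both drops bytes and leaves the state partway through one character, while the new behavior forwards all three.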
Context
Found while building tmux control mode support in Geistty, an iOS SSH terminal using Ghostty as its rendering engine (fork). tmux `%output` notifications carry raw pane content inside DCS 1000p, which regularly contains UTF-8 text; every non-ASCII character was being silently dropped or causing premature DCS termination.

This is a prerequisite for #1935 (tmux control mode support) but is independently correct: any DCS extension carrying UTF-8 payloads benefits from this fix.
AI Disclosure
Fix was developed with Claude Code assistance. I understand the changes: the VT parser state machine (based on the vt100.net model) was designed for 7-bit/8-bit character sets where 0x80-0x9F are C1 controls. Overriding these in `dcs_passthrough` is safe because DCS content is opaque to the parser: the sub-handler (not the state machine) is responsible for interpreting the byte stream. The `pending_esc` mechanism in `dcs.zig` restores 7-bit ST detection that was previously handled by the parser's ESC → escape state transition.