Set TextFormat default recursion limit to 100#26491
Set TextFormat default recursion limit to 100#26491gladiator9797 wants to merge 1 commit intoprotocolbuffers:mainfrom
Conversation
The TextFormat::Parser constructor initializes recursion_limit_ to
std::numeric_limits<int>::max() (2,147,483,647). This is inconsistent
with the binary format parser (default 100) and the JSON parser (default
64-100). A textproto with ~5,000-10,000 nested messages overflows the C
stack before the recursion check triggers, causing SIGSEGV.
Change the default to 100, matching CodedInputStream::default_recursion_limit_.
Users who need a different limit can still call SetRecursionLimit().
Reproducer:
Parse textproto with 10000 levels of nested_type { ... }
-> SIGSEGV (exit 139) due to stack overflow
9cd1ebe to
8991e76
Compare
|
Sorry, it's not clear that if we can accept this change or not, as it would be a breaking change for a large number of people. As a breaking change, it would need to be done with one of our breaking change releases which we only do in Q1 (we already did the one for 2026), if we want to move forward with this we likely couldn't release it until Q1-2027. Security issues take precedence over breaking change cadence, but we don't view this specific issue as a security topic. The reason why is that TextProto is intended to be a debugging and configuration format rather than a wire format. It generally should not used in adversarial/untrusted inputs, and the lack of depth enforcement is working as intended. The opt-in depth limit on the C++ TextFormat parser is primarily available as a convenience knob to avoid confusion/ergonomic problems when you do parseText > serializeBinary > parseBinary since the first moment of the depth limit enforcement is that third step, which can be confusing. In general because of the second property, we actually probably will update our guidance that new TextProto parsers should enforce depth by default with an easily available disable, but likely we would view 'edge case depth bypass' issues as a vanilla minor bug rather than as a security-sensitive topic from there. In general, because AI tools are suddenly finding a lot of depth bypass concerns we are likely to clarify and document our stance on this, so please bear with us. Thanks! |
|
We are going to have more followups to clarify the security posture regarding TextProto (namely: that by default it is presumed to not be adversarial input, unlike Binary and JSON formats). Even on trusted inputs, aligned depth has good ergonomic benefits because it is quite unfortunate to parse a config and then not be able to round trip it through binary wire format. But, after evaluating the fallout of how many preexisting cases would suddenly fail with this change, we definitely cannot take this change today unfortunately, as a significant breaking change that we can't justify doing under our intended security posture. Thanks again for the candidate change. |
Summary
The
TextFormat::Parserconstructor initializesrecursion_limit_tostd::numeric_limits<int>::max()(2,147,483,647) attext_format.cc:1803. This is inconsistent with:CodedInputStream::default_recursion_limit_)kUpb_WireFormat_DefaultDepthLimit)A textproto with ~5,000-10,000 nested messages overflows the C stack before the recursion check ever triggers, causing
SIGSEGV.Fix
Change the default from
std::numeric_limits<int>::max()to100, matchingCodedInputStream::default_recursion_limit_. Users who need a different limit can still callSetRecursionLimit().Reproducer
Test plan
SetRecursionLimit()API still works for custom limits