-
Notifications
You must be signed in to change notification settings - Fork 985
MultipartEntityBuilder to use a fixed boundary by a default #619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@arturobernalg Please, please do not take everything I say literally or even seriously. By boundary_blah I did not mean 'boundary_blah', but something that is unique enough and is unlikely to be present in body parts' text. Something like |
… in MultipartEntityBuilder. Update Javadoc to clarify users must set their own boundary for uniqueness, and adjust test to verify the new fixed value.
bd9160a to
b307812
Compare
|
@arturobernalg Do you think you would have time to do the following?
|
4727a13 to
cbc6082
Compare
@ok2c |
2837a9e to
8749a1a
Compare
httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/MultipartEntityBuilder.java
Outdated
Show resolved
Hide resolved
ok2c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arturobernalg Looks good to me. Please do allow the security folks a few more days to review and comment
httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/MultipartEntityBuilder.java
Show resolved
Hide resolved
httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/MultipartEntityBuilder.java
Show resolved
Hide resolved
|
Is there a reason to default to the fixed boundary rather than the secure-random boundary? I think the only time it's safe to use a fixed boundary is if there's no user-controlled data in the request that could possibly include the fixed boundary. A few examples:
With Rather than trying to communicate these risks via Javadoc, why not just default to the always-safe option of a UUID, which uses a cryptographically-secure RNG and doesn't have any of these risks? Most other MultipartEntity libraries use a cryptographically-secure RNG to generate the boundary. And then users could explicitly opt-in to the simpler behavior if they're confident that none of their data could contain the fixed boundary. |
|
@benweissmann Regular users can be expected to sanitize / validate UI input and use TLS for transport security, in which case the boundary value can as well be fixed. Folks concerned about (1) green men from Mars, (2) Evil Russians, (3) witchcraft, etc can opt to use the UUID based generator. We can make the latter default but I personally do not see a good reason. |
|
This PR appears to fundamentally misunderstand the risks; it's not a packet injection or MITM attack, it's an in-band attack within user supplied data. The notion that an application should know that it needs to parse and sanitize out a magic apache-specific boundary token from input is absurd. The changes in this PR are actively dangerous and should be reverted immediately, as they result in a fundamentally unsafe default behavior. |
|
Aside from the security implications, this change also violates the guidance in RFC 2048 that:
With this change, nested multipart encoding will also break unless users explicitly toggle on the randomized boundaries. I'd also note the following section from the RFC:
|
|
@outlandishlizard Thank you for sharing your opinion. Please do note HttpClient does not support nested MIME bodies and never will. For anything other than trivial cases it is recommended to use a full-featured MIME library such as Apache Mime4J. |
This does not mean that our code should be compliant. |
|
Exactly compliant with that? The section in question is referring to "a body part within another"multipart" entity". We do not have such cases. |
You mean that the client is responsible for the payload? |
|
Circling back, what is the benefit of using a fixed boundary value? It appears to be a change that is less RFC compliant and less secure, and places an unrealistic burden on users to understand particulars of the internals of the library when sanitizing message content. |
@outlandishlizard They are not many: it is less computationally expensive and is simple, the folks would know what to look for when sanitizing input By the way, what are exactly the benefits of using a random boundary other than giving the users a false notion of not having to worry about the security of their applications and having someone else to blame in case something goes sideways? |
|
The RFC names quite a few reasons for using a pseudorandom or random value, which I have quoted. Additionally, I don't think that it is "providing a false sense of security" to ensure that users don't need to be aware of unique implementation details of a downstream library when parsing input. I have never before considered whether a message I was sending via cURL, or python requests, or any other number of tools might be mangled because of multipart encoding issues with the underlying library, and I'm a bit confused as to why you think your users, who may be several wrappers downstream, will have any idea that they need to escape these values-- it's not a normal application security consideration at all. |
|
Even if we accept your premise that this is a user responsibility to ensure input never contains a boundary separator (something that they literally cannot do in many RFC compliant implementations that use randomized boundaries), that would make this patch a breaking change to current behavior without any user comms. |
This is a responsibility of people who wrap the library and enforce a security model or provide a UI, not that of the library. Again, do not pin it on us. |
|
So, you contend that your users, consumers of an HTTP client library, are responsible for piercing the abstraction layer, seeing that your code handles multipart encoding wrong, and defending themselves from the library? |
|
I will also note that you have yet to address my comment that this constitutes a breaking change that users had no way to defend themselves from-- surely you don't contend that they should also have read your mind that you planned to switch the implementation to a static boundary and defensively coded their applications to contain filters for that boundary? |
@outlandishlizard Please re-read what I have said.
I do not. There is a reasonably well-established process for dealing with potentially breaking changes. |
I have read what you said several times. The above quote seems to me to say that you believe it is the responsibility of your users to escape the boundary values, piercing the abstraction layer of "send a multipart message please" and requiring a full scan of the request body in order to ensure it complies with the specific quirks of multipart encoding this PR implements. It seems to me that this is both onerous on the user, and substantially less performant than using a random boundary value from a secure random source (there will be a constant-time versus linear time cost breakpoint for certain-- for small enough multipart requests the linear scan probably wins!). With regard to your comments about a "false sense of security": The entire point of using a random value is to make the chance of a collision of message content and boundary content negligible; the sense of security provided is in no way false. The spec allows for up to 70 characters of boundary; these characters are selected from a set of 75 potential options, meaning that we have a total of: 75^70 = potential valid boundary delimiters, making the chance of a robust random scheme producing a collision negligible, in the cryptographic sense of the word. So, we are left with three options for providing users a robust path forward:
|
@outlandishlizard It is a responsibility of the application layer that enforces domain specific security constraints and handles user input through a UI. HttpClient does neither. It cannot make those decisions. |
|
No, the chance of a boundary value collision is NOT zero, as boundary value collisions can also occur due to the message body containing the boundary value, which is the issue that was originally brought up in this thread. Changing the boundary value to a static value makes it possible for a library to anticipate this and perform a linear scan to prevent the content it sends from being mangled, but in doing so also dramatically increases the chances of a collision occurring in the first place. |
|
@outlandishlizard So, you happily ignored the main point and proceeded to tell me the good old tale if thousands of monkeys bang away at a type writer long enough they will eventually produce Leo Tolstoy's "War and Peace". So, we keep on merrily going in circles. I can no longer help you here. If someone else submits a pull request making UUID based boundary default and another project committers accepts the PR, I will not stand in their way. My personal preference though is to deprecate the whole damn thing and be done with it and Green Men from Mars. |
|
Hey folks, I'm a software engineer at Stainless. We generate Java/Kotlin SDKs and they all use I was just made aware of this change. I don't agree with a fixed boundary being the default behavior, but putting that aside, does this PR change the default behavior in a non-major version bump? It seems like a clear breaking change to me. If this will not be associated with a major version bump, then we're going to have a serious problem. The SDKs will be stuck between the following two choices:
Technically I could use reflection to look up at runtime whether So with all that being said, if you're not going to revert this PR, can you at least release this as a major version bump? I think it would be the responsible thing to do to set expectations around which versions of this library are backwards compatible with each other. |
|
Hello @TomerAberbach We normally reserve major version bumps for breaking binary compatibility as defined by the JLS, which is not the case here. |
|
Okay, I hear you, but did my comment help explain why I think this change would be disruptive if it's not released as a major version bump? Do you have any thoughts on the problem I described? |
|
I'd rather make it random and move it. Not worth the pain or reduction of the computational overhead. |
|
@TomerAberbach Maven dependency resolver should pick version 5.5 over 5.4 if both are present as transitive dependencies. (Please @michael-o correct me if I am wrong.) If the user has explicitly pinned HttpClient to a specific version in their POM this is a general dependency management problem that the user needs to address either by pinning the version of your library to match that of HttpClient or by upgrading (or unpinning) HttpClient version. You may be much better off migrating to Apache Mime4J given the direction this whole thing is going. |
Pinning is always crucial, if you know something is coming.
I don't know whether this is really necessary. The MIME package here covers a lot of cases, I don't see a reason why it should not continue to do so. I am really inclined to say that it should be random by default and static by choice. I do use MIME code myself and it is fully sufficient for almost all use cases here. |
@michael-o And who deals with them "security professional"? Who reads through numerous RFCs to make sure we stay conformant? |
I wish this were true, but unfortunately Maven will pick the version closest to the root of the user's project... Gradle does the right thing and picks the higher of the two versions, but enough people use Maven that its resolution strategy is a problem for me.
The problem with this is that the user may be on an old version and it may be onerous for them to upgrade to the latest one just so that my SDK can call this That's why I consider this PR a breaking change:
Is this not clearly a breaking change? A change to the contract of a class that also ends up requiring breaking changes to users of this library?
You're one of the maintainers right? I'm curious why you'd prefer users to migrate to a different library instead of using this one? |
@TomerAberbach Disregard Apache Mime4J is a full featured MIME library. I used to contribute to it a lot 15 years ago but not any more. When HttpComponents was established as a Top Level Project it was decided we would be using Mime4J instead of developing out own MIME support. Then, Google came and everything had gone to sh*te |
Simplify
MultipartEntityBuilderby replacing random boundary with fixed "boundary-blah" default. Update Javadoc to note users should set custom boundaries if needed, and tweak test for the fixed value.