-
Notifications
You must be signed in to change notification settings - Fork 428
net-tokio: add fn socks5_connect_outbound
#4305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Thanks for assigning @TheBlueMatt as a reviewer! |
a7f498c to
42cbfd4
Compare
lightning-net-tokio/src/lib.rs
Outdated
|
|
||
| tcp_stream.write_all(&socks5_request).await.map_err(|_| ())?; | ||
|
|
||
| let mut buffer: Vec<u8> = Vec::with_capacity(SOCKS5_REPLY_MIN_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably immediately re-allocate since the reply should be larger than min reply len in the common case if we read_to_end, no? Should we rather pick a reasonable bufsize for this? Might also make sense to only read the 3 first bytes we look at in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably immediately re-allocate since the reply should be larger than min reply len in the common case if we read_to_end, no?
If we take the common case to be Tor, this reply will always be 10 bytes.
Should we rather pick a reasonable bufsize for this?
I have nonetheless set this to the max possible reply given my reading of the RFC spec.
Might also make sense to only read the 3 first bytes we look at in the first place?
Want to make sure any bytes after the first 3 get discarded. Didn't find a better solution other than just reading them into the buffer and ignoring them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nonetheless set this to the max possible reply given my reading of the RFC spec.
I'm confused, you're saying SOCKS5_REPLY_MIN_LEN is the maximum reply length? Edit: nevermind only now saw that you actually changed the code to include SOCKS5_REQUEST_REPLY_MAX_LEN.
Want to make sure any bytes after the first 3 get discarded. Didn't find a better solution other than just reading them into the buffer and ignoring them.
Hmm, if we're positive this ~always 10 bytes, why not actually discard 10 bytes (or read the fields and keep discarding until we've seen all expected fields)? FWIW, seems we could def. avoid the allocations at all if we make buffer a small fixed-size byte array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, you're saying SOCKS5_REPLY_MIN_LEN is the maximum reply length?
I changed the length of the buffer we preallocate to the max length of the reply given my reading of the RFC spec.
Hmm, if we're positive this ~always 10 bytes, why not actually discard 10 bytes (or read the fields and keep discarding until we've seen all expected fields)? FWIW, seems we could def. avoid the allocations at all if we make buffer a small fixed-size byte array?
I want to avoid being too tor-specific here. Another socks5 setup could definitely return more bytes in this response, up to 262 SOCKS5_REQUEST_REPLY_MAX_LEN.
Would you rather we use a [0u8; SOCKS5_REQUEST_REPLY_MAX_LEN] to avoid any heap allocations ? We could make it work yes, but seems to me Vec<u8> is good enough.
lightning-net-tokio/src/lib.rs
Outdated
| tcp_stream.write_all(&socks5_request).await.map_err(|_| ())?; | ||
|
|
||
| let mut buffer: Vec<u8> = Vec::with_capacity(SOCKS5_REPLY_MIN_LEN); | ||
| let n_read = tcp_stream.read_to_end(&mut buffer).await.map_err(|_| ())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we read what's available from the stream and then discard the buffer. Are we positive that this would never hold data we actually want to use and bubble up to handlers? As mentioned above, maybe we should just read 3 bytes rather than read_to_end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we positive that this would never hold data we actually want to use and bubble up to handlers?
In the common case where we talk to Tor, any bytes after this will always be set to ATYP IPV4 and then ADDR 0.0.0.0 and PORT 0. None of this information is useful to callers, so I ignore it. Happy to update this in the future should this information be needed by callers in other SOCKS5 setups.
As mentioned above, maybe we should just read 3 bytes rather than read_to_end?
See above, I want to make sure any bytes after the first 3 get properly discarded, and not read by future calls to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the common case where we talk to Tor, any bytes after this will always be set to ATYP IPV4 and then ADDR
0.0.0.0and PORT0. None of this information is useful to callers, so I ignore it. Happy to update this in the future should this information be needed by callers in other SOCKS5 setups.
Hmm, if we expect these fields exactly, why not then read the expected number of bytes only, and error out if there's anything unexpected? I'm still wondering if it would be ensured we'd never be able to read actual payload on this first read after connection establishment. FWIW, that kinda violates TCP's streaming model, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not then read the expected number of bytes only, and error out if there's anything unexpected?
As mentioned above, I want to match the socks5 spec, and not constrain the use case to the tor proxy only.
I'm still wondering if it would be ensured we'd never be able to read actual payload on this first read after connection establishment. FWIW, that kinda violates TCP's streaming model, no?
I don't follow this one, can you clarify ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this one, can you clarify ?
TCP is a stream-based protocol. If you 'read to end' that only means you read what's currently there to your buffer. That might be the full response, only a part of it, or even the response followed by the some bytes of following packets. See for example Tor's 'optimistic data' mode: given that they allow to pipeline sending requests even when the connection attempt is still inflight could potentially mean that the first response bytes are already queued when we get to read_to_end, so we might end up discarding some valid goodput? It might actually be unlikely, but such races happen, so it would be safer to read only what we need to the buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point thank you ! See the fixup below, I now read the socks5 reply to the exact byte in case the payload follows immediately in the stream.
|
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4305 +/- ##
==========================================
- Coverage 86.59% 86.55% -0.05%
==========================================
Files 158 158
Lines 102408 102821 +413
Branches 102408 102821 +413
==========================================
+ Hits 88678 88992 +314
- Misses 11309 11412 +103
+ Partials 2421 2417 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
42cbfd4 to
e5f6359
Compare
lightning-net-tokio/src/lib.rs
Outdated
| let mut buffer: Vec<u8> = Vec::with_capacity(SOCKS5_REQUEST_REPLY_MAX_LEN); | ||
| let n_read = tcp_stream.read_to_end(&mut buffer).await.map_err(|_| ())?; | ||
| if n_read < SOCKS5_REQUEST_REPLY_MIN_LEN | ||
| || n_read > SOCKS5_REQUEST_REPLY_MAX_LEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is too strict ? Anything longer than this deviates from the spec, so I lean towards returning Err in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO fine to error if it doesn't adhere to the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also aware we have no tests for this code now, would it be ok to add tests that try connections to actual endpoints on the internet ? ie torproject.org, some famous lightning node onion V3 address...
i have tested socks5_connect in a toy binary and everything works.
Yeah, tests towards some known endpoints on the internet are notoriously flaky (especially in Github CI), however, they are better than no tests! |
lightning-net-tokio/src/lib.rs
Outdated
|
|
||
| tcp_stream.write_all(&socks5_request).await.map_err(|_| ())?; | ||
|
|
||
| let mut buffer: Vec<u8> = Vec::with_capacity(SOCKS5_REQUEST_REPLY_MAX_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other comment: now that we have SOCKS5_REQUEST_REPLY_MAX_LEN, coudl we make this a [u8; SOCKS5_REQUEST_REPLY_MAX_LEN]?
Let me see that would require installing C tor in CI right ? Do we want to do this still ? |
Hmm, it would likely not be too bad, though indeed it might have some overhead if run every time. It would be a good candidate for a nightly CI job, IMO. For now it might be fine to add some tests, if they all reuse the same setup? |
8d22deb to
7fdaeb2
Compare
lightning-net-tokio/src/lib.rs
Outdated
| let method_selection_request = [VERSION, NMETHODS, selected_auth]; | ||
| let mut tcp_stream = TcpStream::connect(&socks5_proxy_addr).await.map_err(|_| ())?; | ||
| tcp_stream.write_all(&method_selection_request).await.map_err(|_| ())?; | ||
|
|
||
| let mut method_selection_reply = [0u8; METHOD_SELECT_REPLY_LEN]; | ||
| let n_read = tcp_stream.read_exact(&mut method_selection_reply).await.map_err(|_| ())?; | ||
| if n_read != METHOD_SELECT_REPLY_LEN || method_selection_reply != [VERSION, selected_auth] { | ||
| return Err(()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wat? No, just send the CONNECT command and be done with it. We don't need the vast majority of this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Send something like this - https://github.com/rust-bitcoin/corepc/blob/master/bitreq/src/proxy.rs#L104 and then just check the status code response https://github.com/rust-bitcoin/corepc/blob/master/bitreq/src/proxy.rs#L113
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, HTTP CONNECT is gated behind the HTTPTunnelPort setting, itself not enabled by default in C Tor, so we go with SOCKS5.
tnull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good I think, feel free to squash.
lightning-net-tokio/src/lib.rs
Outdated
| let mut username_password_request = [0u8; USERNAME_PASSWORD_REQUEST_MAX_LEN]; | ||
| username_password_request[0] = USERNAME_PASSWORD_VERSION; | ||
| username_password_request[1] = username.len() as u8; | ||
| username_password_request[2..2 + username.len()].copy_from_slice(username.as_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having raw numbers here and introducing multiple _pos variables below, should we just use an io::Cursor and write to it? Alternatively we could introduce a single mut pos variable that indicates our current position? Same goes for the socks5_request below, seems like its construction is unnecessarily complicated as keeping track of all the exact offset numbers could go wrong at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks if you don't mind I add a quick custom BufWriter, see below. I don't like Cursor much since it forces std::io::Write, and a bunch of unwrap or map_err everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why using std::io::Write is bad? Adding a bunch of boilerplate seems a bit overkill, IMO? We could also just use a stream like let mut writer = &mut buf[..]; if you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should I mentioned this: Rust is confused on which write to call if we import both std::io::Write, and AsyncWriteExt - Cursor implements both. I think I'll go with std::io::Write and fully qualify calls to AsyncWriteExt::write
7fdaeb2 to
5f60f17
Compare
5f60f17 to
323f1e0
Compare
TheBlueMatt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two questions, but otherwise looks fine, i didnt bother to check that the protocol is correct but if tor likes it that's fine enough.
lightning-net-tokio/src/lib.rs
Outdated
| /// Same as [`connect_outbound`], using a SOCKS5 proxy | ||
| pub async fn socks5_connect_outbound<PM: Deref + 'static + Send + Sync + Clone>( | ||
| peer_manager: PM, their_node_id: PublicKey, socks5_proxy_addr: SocketAddr, addr: SocketAddress, | ||
| user_pass: Option<(&str, &str)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is ~entirely for Tor (has anyone used SOCKS for anything else in the last decade?), should we instead require an RNG and pick a random password and always pass <torS0X>0 as the username? I suppose we could leave the an option for socks-custom but istm we should encourage a stream isolation parameter and a naive use of this API will simply provide no user/pass and get no stream isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just keep the current socks5_connect_outbound and add another tor_socks5_connect_outbound variant that does that and calls back into socks5_connect_outbound with the corresponding username/password values? This way we'd keep both options open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely could, not like its that complex, just a question of if anyone would ever use it (except accidentally because they just add a config knob in their application called "socks proxy" and never use the Tor one). It seems VPNs basically entirely supplanted SOCKS proxies over the last 10-15 years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should encourage a stream isolation parameter and a naive use of this API will simply provide no user/pass and get no stream isolation.
Seems to me forcing <torS0X>0 as the username is too restrictive ?
If we remove the Option, and force users to always pass a username: &str, password: &str, seems this does the job.
Users have to remember to rotate the stream isolation parameter in both approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting take an EntropySource and do the rotation ourselves.
lightning-net-tokio/src/lib.rs
Outdated
|
|
||
| #[cfg(tor_socks5)] | ||
| #[tokio::test] | ||
| async fn test_socks5_connect() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test this in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a standalone job below, let me know if that's what you have in mind.
tnull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side, mod pending comments.
In particular some coverage in CI would be good to have to check it actually works as expected, even if we want to offload that to a nightly job eventually (which we had at least discussed briefly elsewhere).
lightning-net-tokio/src/lib.rs
Outdated
| /// Same as [`connect_outbound`], using a SOCKS5 proxy | ||
| pub async fn socks5_connect_outbound<PM: Deref + 'static + Send + Sync + Clone>( | ||
| peer_manager: PM, their_node_id: PublicKey, socks5_proxy_addr: SocketAddr, addr: SocketAddress, | ||
| user_pass: Option<(&str, &str)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just keep the current socks5_connect_outbound and add another tor_socks5_connect_outbound variant that does that and calls back into socks5_connect_outbound with the corresponding username/password values? This way we'd keep both options open?
a04440d to
f563923
Compare
source the stream isolation parameter from `EntropySource::get_secure_random_bytes`
f563923 to
051b8f2
Compare
No description provided.