Dev/sftp start: Implementing basic SFTP functionality #29
base: dev/sftp-start
Conversation
After that I have:
As a minor change, I have clarified that the module will follow the SFTP v3 specification in 12
We have a complete, non-complaining sftp/src/proto.rs. However, I restricted the lifetime for Apart from that, I have avoided the use of There are a number of TODOs in the file, mainly regarding:
Fixed according to the indications in the next discussion: mkj#29 (comment). Adding the expanded version for reference too. Thanks
Hi again, I am starting to process the sftp subsystem. I have deserialized I have made major changes in The use of the macro has changed as well, and maybe for good.
I am working on providing basic functionality to handle the sftp initialisation and adjusting the SftpServer trait to expose as few protocol details as possible to the user. I also have to handle errors properly inside sftphandle. The code at this point is able to start an sftp session after running
At this point, I need to handle the case where, after extracting from the I am exploring storing the fragment of the request in an auxiliary buffer to make sense of it after the next buffer read. Did you have to deal with this situation at a different level, @mkj? For now, I am going to use an auxiliary buffer to assemble the request. More coming soon...
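The fragment-assembly idea described above can be sketched roughly as follows. This is a hypothetical illustration, not sunset code: `PartialRequest`, `push` and `take_complete` are made-up names, and the only assumption taken from the protocol is that an SFTP v3 packet starts with a big-endian u32 length that excludes the length field itself.

```rust
// Hypothetical sketch of assembling a partial SFTP request across reads.
struct PartialRequest {
    buf: Vec<u8>,
}

impl PartialRequest {
    fn new() -> Self {
        PartialRequest { buf: Vec::new() }
    }

    /// Append freshly read channel bytes to the pending fragment.
    fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Return a complete SFTP packet (length prefix included) once enough
    /// bytes have accumulated, leaving any excess bytes buffered.
    fn take_complete(&mut self) -> Option<Vec<u8>> {
        if self.buf.len() < 4 {
            return None; // length prefix not complete yet
        }
        let len =
            u32::from_be_bytes([self.buf[0], self.buf[1], self.buf[2], self.buf[3]]) as usize;
        let total = 4 + len;
        if self.buf.len() < total {
            return None; // still a fragment, wait for the next buffer read
        }
        let packet = self.buf[..total].to_vec();
        self.buf.drain(..total);
        Some(packet)
    }
}
```

The buffer keeps whatever trails the extracted packet, so the next read simply appends to the remainder.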
Yeah I think a buffer is probably necessary (if SSHWire had partial decoding it'd still need to store intermediate parts in memory somewhere). As an optimisation it might be possible to special_case |
Thanks for the ideas. For now I am writing a helper where I can take enough data from the in buffer to get the size and packet type. In the case of SSH_FXP_WRITE I am copying enough data to create a multi-part write of the content, reusing the current technique for it. SSH_FXP_NAME is an interesting case. And FXP_DATA will also require some thinking, since the sink length may limit the sftpserver implementer and would create a bottleneck when reading.
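A helper like the one mentioned above might look roughly like this. The function name is illustrative; the layout it assumes is the SFTP v3 framing from the draft: a big-endian u32 length (excluding itself) followed by a one-byte packet type.

```rust
// Sketch: peek the SFTP v3 length and packet type from the front of the
// inbound buffer without consuming any bytes.
fn peek_header(buf: &[u8]) -> Option<(usize, u8)> {
    if buf.len() < 5 {
        return None; // not enough bytes yet to know the size and type
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let packet_type = buf[4];
    Some((len, packet_type))
}
```

Knowing the length and type up front lets the caller decide whether to wait for more bytes or, for SSH_FXP_WRITE, start streaming the payload in parts.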
Thanks to @mkj's original design of the structures in proto.rs, there is only one entity that for now cannot be used in The reason is simple, and @mkj mentioned it before. The response for Since reading a directory is part of the basic features that I have proposed in the library roadmap, I think that I should address it. I would like to share the approach that I am starting to put together, but I am open to more ideas. I am considering processing the data element by element in the directory, to resolve the arbitrary number of elements with new traits and structures for collections of SSH_FXP_READDIR responses. I would like to capture the next workflow:
I have a lifetime challenge here because the I would like to keep those internal details for the Does anybody have a simpler or better idea? Maybe there is a better pattern to use here that I am overlooking.
One way to resolve lifetime problems could be to have the
Thanks @mkj! We've been inspecting this piece of code (and your advice) for readdir, and we have a few questions: We assume that the developer would eventually implement We are a bit confused by the What is We think that As in: we noticed the
I'm currently travelling, should be able to take a look later next week.
Hi again @mkj, I think that I finally understand what you were suggesting that could work: a visitor pattern to avoid borrowing/lifetime issues while allowing iterative processing of the directory elements. @brainstorm I am implementing a
For now I have a mock structure that simulates sending data. Soon I will push it.
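The visitor idea can be sketched like this. Everything here is illustrative, not the sunset API: `DirVisitor`, `Entry` and `read_dir_with` are made-up names. The point is that the server walks the directory and hands each entry to a caller-supplied visitor, so no collection of borrowed entries has to outlive the walk.

```rust
// Hypothetical visitor-pattern sketch for iterating directory entries
// without returning a borrowed collection to the caller.
struct Entry<'a> {
    filename: &'a str,
    size: u64,
}

trait DirVisitor {
    /// Called once per entry; return false to stop early, e.g. when the
    /// response buffer is full and the rest waits for the next READDIR.
    fn visit(&mut self, entry: &Entry) -> bool;
}

fn read_dir_with(names: &[(&str, u64)], v: &mut dyn DirVisitor) {
    for &(filename, size) in names {
        let e = Entry { filename, size };
        if !v.visit(&e) {
            break;
        }
    }
}

// Example visitor that pretends only two entries fit per response.
struct Collect {
    seen: Vec<String>,
}

impl DirVisitor for Collect {
    fn visit(&mut self, entry: &Entry) -> bool {
        self.seen.push(entry.filename.to_string());
        self.seen.len() < 2
    }
}
```

Each `Entry` lives only for the duration of one `visit` call, which sidesteps the lifetime challenge of storing an arbitrary number of entries at once.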
Looping to write all bytes. The counters confirm that there are no missing bytes, as they match. Tested with a 100MB write operation: ./test_long_write_requests.sh I am confident that the write issue is solved
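The write loop with matching counters can be sketched as below. `WriteOnce` is a stand-in for a channel write that may accept only part of the buffer (as ChanOut can); the names are illustrative, not sunset code.

```rust
// Sketch: loop until every byte is written, counting what was accepted.
trait WriteOnce {
    fn write(&mut self, data: &[u8]) -> usize; // returns bytes accepted
}

fn write_all(w: &mut dyn WriteOnce, data: &[u8]) -> usize {
    let mut written = 0;
    while written < data.len() {
        let n = w.write(&data[written..]);
        if n == 0 {
            break; // sink made no progress; avoid spinning forever
        }
        written += n;
    }
    written // caller compares this counter against data.len()
}

// Test sink that accepts at most 3 bytes per call, forcing several loops.
struct SmallSink {
    out: Vec<u8>,
}

impl WriteOnce for SmallSink {
    fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(3);
        self.out.extend_from_slice(&data[..n]);
        n
    }
}
```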
packet type + request id + items (1 + 4 + 4 bytes) hardcoded but will refactor
…e read dir request [skip ci] WIP: BUG **Sending loop includes sftp client** - the ReqId increases - The loop stops on a client interruption (Ctrl+C). I am going to need more visibility on why the client keeps sending new requests for the same files [skip ci] WIP: BUG I was not following the V3 specification, as I did not send EOF: https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filexfer-02#section-6.7 The next issue is that I am sending the relative path and not just the filename. Will fix next
… all the items and is asked again. Anything else fails. This is a point I had not considered. I could have sent one item at a time, with different challenges. In my opinion I should add this behaviour to the SftpServer implementation, but that passes the responsibility to the library user, and the idea is that the user should not need to care too much about the protocol implementation [skip ci] WIP: Typo and clean up commented-out code
Found a bug sending back the status code. It is always Ok! [skip ci] WIP: Temporary ways to track completed Read operations. I will add a test to understand the unexpected behaviour in the response status packet
…ode implementation for enum sshwire-derive lib.rs defines the rules to encode enums in `encode_enum` and the encoding decision is not to encode the value. See [implementation here](https://github.com/mkj/sunset/blob/8e5d20916cf7b29111b90e4d3b7bb7827c9be8e5/sshwire-derive/src/lib.rs#L287) This could be addressed by introducing a parameter for enum fields where the encoded data type is declared (or not). There is probably a good reason for this behavior.
… generate encoded values? To fix the issue with the message type encoding I have done a manual implementation of SSHEncode for `StatusCode`. It is tested in a proto_test mod. From last commit: sshwire-derive lib.rs defines the rules to encode enums in `encode_enum` and the encoding decision is not to encode the value. See [implementation here](https://github.com/mkj/sunset/blob/8e5d20916cf7b29111b90e4d3b7bb7827c9be8e5/sshwire-derive/src/lib.rs#L287) This could be addressed by introducing a parameter for enum fields where the encoded data type is declared (or not). There is probably a good reason for this behavior.
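The core of the manual workaround can be shown with a simplified stand-in. Sunset's real `SSHEncode` trait has a different signature, so this is only a sketch of the idea: the enum's numeric value is what must go on the wire (a big-endian u32 per SFTP v3 status codes), which is exactly what the derive's `encode_enum` currently does not emit.

```rust
// Simplified stand-in for a manual status-code encoder: the discriminant
// itself is serialised, big-endian, as the SFTP v3 wire format expects.
// (SSH_FX_OK = 0, SSH_FX_EOF = 1, ... per draft-ietf-secsh-filexfer-02.)
#[derive(Clone, Copy)]
enum StatusCode {
    Ok = 0,
    Eof = 1,
    NoSuchFile = 2,
    PermissionDenied = 3,
    Failure = 4,
}

fn encode_status(code: StatusCode, out: &mut Vec<u8>) {
    // Emit the enum's numeric value as a u32, big-endian.
    out.extend_from_slice(&(code as u32).to_be_bytes());
}
```

An automatic derive would need to know this target integer width, which is why the comment above suggests declaring the encoded data type per enum field, perhaps via an attribute.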
Not ready yet. In order to answer EOF to an SSH_FXP_READDIR after all the dir items have already been sent, I am storing a `sftpserver::ReadStatus`, passed as a result of `SftpServer.dirread()`, with PendingData or EndOfFile. This would not work well in scenarios where there is more than one read transaction in progress. I have decided to give SftpServer implementors the responsibility to send the EOF status message, but I will help them by providing detailed trait documentation. To test this listing you may run demo/sftp/std and run the script testing/test_read_dir.sh; mind that the hard link counter is not recovered, as it is not part of the SFTP V3 implementation
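One way to address the multiple-transaction limitation mentioned above would be to key the status per directory handle rather than keeping a single flag. This is purely illustrative: `DirReads` and the u32 handle key are assumptions, not sunset code; only the `ReadStatus` variants mirror the comment.

```rust
// Illustrative sketch: per-handle readdir progress, so several READDIR
// transactions can be in flight without clobbering each other's state.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum ReadStatus {
    PendingData,
    EndOfFile,
}

struct DirReads {
    by_handle: HashMap<u32, ReadStatus>,
}

impl DirReads {
    fn new() -> Self {
        DirReads { by_handle: HashMap::new() }
    }

    /// Record the outcome of a dirread step for one handle.
    fn set(&mut self, handle: u32, st: ReadStatus) {
        self.by_handle.insert(handle, st);
    }

    /// Should the next SSH_FXP_READDIR on this handle answer EOF?
    fn should_send_eof(&self, handle: u32) -> bool {
        self.by_handle.get(&handle) == Some(&ReadStatus::EndOfFile)
    }
}
```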
In the commit Fixed Status Encoding.. and the previous one, I came across the fact that sshwire-derive does not encode enum values. Therefore, SSHEncode structures with enum fields would not encode those fields. To solve this I implemented SSHEncode for I identified the relevant function and point in sshwire-derive/src/lib.rs where automatic encoding of numeric enum fields could be done, but the enum data type would need to be provided somehow, maybe as a struct attribute, to automate the numerical value encoding. @mkj Do you think that implementing automatic enum value encoding would be a good use of time?
I have recorded packets (encrypted) and straces for exec and the test script, finding:
- invalid packets reading from the channel
- no data reading from the channel after sending EOF
- demo-sftp-std server stuck reading from file
The following files have been updated to facilitate the collection of data. The data collected will include:
- Output at trace level of sunset-demo-sftp-std
- strace for sunset-demo-sftp-std
- Output of the log_get_single.sh script
First situation. The client:
Files in the archive:
completing_packet_waiting_for_incoming_bytes.tar.gz
Files in the archive:
log_get_single.log: log_get_single.sh output
missing_bytes_break_communication.tar.gz: In this archive I have collected all the bytes received, grouped them by packets, and I could identify that:
Files in the archive:
error!("server progress failed: {:?}", e); // NoRoom: 2048 Bytes Output buffer
return Err(e);
match e {
    Error::NoRoom {} => {
Don't think it's related to the current problems, but catching an error after progress() isn't generally OK. Some operations modify internal state so it should just exit if it fails here.
I think the specific changes in 2869501 should be OK though for the window adjust case.
My apologies, I did not realise that your fix was not applied
My apologies, I thought that it had been merged to the main branch, and merged when the env_var (pull 35) feature was included
… 100MB get Will provide further logs in a comment. Improved log scripts. In summary, the communication stalls because ChanOut.write at some point does not return after being called. The reception of packets does not show any anomaly.
I tested again getting files from the server and it worked, BUT when I made the file longer (100MB), the communication stalls after 4 MB have been transferred, at a non-deterministic point. This is the only problem that I can record at this point. I looked at the logs and can see that the communication stalls because ChanOut.write blocks and no more bytes get sent. More details in the attached logs stalled_not_sending_bytes.zip and
Coordinating pcap files, test script output, demo program output and strace (only network traffic)
Not all the data that the producer tries to send reaches the consumer, which is the one that writes the data to the ChannelOut. This is likely to be responsible for the stuck communication. Apart from that, I added more testing and fixed an issue in Proto::Data (SSH_FXP_DATA): it had too many fields. More details to come
Analysing the logs in more detail and putting together all the received and sent packets, I have noticed that not all the data gets sent on the ChannelOut. In the library, the I have noticed that the call to write_all gets blocked indefinitely at the point where the communication stalls. You can find that in long_extract.log. I have found some errors around the proto::Data structure; they are fixed but were not causing the problem. Moving forward, I am going to examine the packets sent to As a curiosity, I am attaching two sets of SftpPackets decoded: one for a short get transmission that succeeds and another for a long transmission that stalls.
It was introduced for read_packet_from_file.rs and it is no longer needed
At this point I can see that on stall, the backpressure from the ChanOut blocks the data put into the pipe. I took measures in a previous commit (5d1c1f9) to guarantee that for SSH_FXP_DATA and SSH_FXP_NAME the announced length and the real length match:
- SftpServer unit tests to check the length in the header and the data sent
- Runtime checks for ReadDir, Read and PathInfo requests
I am going to start digging into the ChanOut write process
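The length-consistency check for SSH_FXP_DATA can be sketched as below. The function name is hypothetical; the only assumption is the v3 layout of a DATA response after the outer length field: packet type (1 byte), request id (4 bytes), then a string whose own u32 length prefix (4 bytes) precedes the payload. A mismatch here would desynchronise the stream and could look like a stall.

```rust
// Sketch: verify that the announced outer length of an SSH_FXP_DATA
// response matches type (1) + request id (4) + string length prefix (4)
// + the payload bytes themselves.
fn fxp_data_len_matches(announced: u32, payload_len: usize) -> bool {
    announced as usize == 1 + 4 + 4 + payload_len
}
```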
This gives us visibility on when the ChanOut won't take all the bytes. It happens regularly and that is expected. It does not provide extra visibility on the problem. I will leave it like this for now
One question @mkj: looking at async channels, the first thing that I have noticed is that I might not be using ChanOut as intended. According to the ChanOut docs
And later adds
Does this mean that only one instance should be used for writing and only one (other) instance should be used for reading, or that only one instance should be used for both reading and writing? I believe it is the first interpretation, given the word 'each', but please correct me if I am wrong and I will refactor accordingly.
I have noticed something important. The On every call of The only point in the code where the window length is increased is in the dispatch_inner match packet case with But Checking why they stopped being sent by the client, I realised that they were sent. Here is an extract of the client vs server logs:
Yes, one reading instance and one writing instance. If you want they can be the same (ChanInOut).
I'm not certain, but it looks like it might be a problem with OpenSSH's SFTP read request pipelining. The server is processing read requests sequentially, so it has to send all response channel data for a read request before proceeding to the next packet. If it runs out of channel window while there are outstanding read requests, it will get stuck. The Window Adjust is queued after the outstanding read requests.
By default the sftp client has 64 requests pending, each of size 32768. The default channel window is 64*32768 (maybe deliberate?). That doesn't leave any leeway for SFTP headers. The lock always seems to occur when the gap between read requests and sent data reaches the 2MB target window size. An example running log_get_single_long.sh with https://github.com/mkj/sunset/tree/dev/matt-sftp-debug (changed debugging a bit): the last request received by sunset-sftp is offset 4751360. In the client log, that's around the point the amount of outstanding data crosses the window size: 4751360 - 2654208 = 2097152 (64*32 kB)
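The arithmetic above can be cross-checked in a couple of lines. The offsets are taken from the log excerpt quoted in the comment; the function names are just for illustration.

```rust
// 64 outstanding requests of 32768 bytes each exactly equal the default
// 2 MiB channel window, leaving no slack for SFTP headers.
fn default_window() -> u64 {
    64 * 32768 // default request count * default request size
}

// Gap between the last request offset the server received and the last
// acknowledged data offset, as observed in the logs at the stall.
fn observed_gap() -> u64 {
    4751360 - 2654208
}
```

When the observed gap equals the window exactly, no channel window remains for the queued responses, which matches the deadlock description.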
That is a very interesting point and all your analysis adds up. Thanks for your insights! Now that you have pointed me to the client logs, I have noticed something that I find peculiar. Doing a grep to select Here we have eight requests of 32kB each followed by a window adjust of 98816 bytes. This adjust Counting the 32kB requests in my log files and adding up all the window adjustments sent by the client, I can summarise this deficit: which accounts for a total of 2,089,984 fewer bytes in adjust messages than in requests sent. Isn't that strange?
The reason is that the adjust occurs after the read response is received by the client, and that may be some time later than the read request. So when the deficit reaches 2097152 (64*32kB) it can't send any more data and the deadlock occurs. The exact window adjust sizes are somewhat arbitrary (since data packets aren't necessarily a nice multiple); the client will send those whenever the window drops below 2097152, I think. I added some logging here to the openssh client; the window adjustments matched what was being done in sunset.
You are right, the receiver window adjustments are not preemptive but need to follow the reception of bytes from the sending peer. So really, the window is used to prevent a peer from sending more bytes than the receiver can process. In this case the client is sending more bytes than the server can process. Without a full understanding of the SSH connection protocol, my intuition tells me that maybe we should slow down the number of requests that the client sends by delaying the server's window reception adjustment. Looking at I believe that, since sending a 32kB data block in an SSH_FXP_DATA response is orders of magnitude slower than receiving short sftp requests, ideally the SFTP subsystem should slow the request rate. How to do this is tricky, since the SSH connection protocol layer is not concerned with the nature of the subsystem. @mkj: Would you consider an optional Maybe a transparent way to do this is using the I have tried different thresholds in check_window_adjust and could not find a "sweet spot" that prevents the deadlock.
Hi there @mkj,
This is an early draft of a pull request to help with the SFTP subsystem for sunset. I have been reading and working on proto.rs and I would like you to see my commits and discuss problems and further steps. If you prefer other channels to discuss this, I am happy to do it in any other way.
To begin with, there were a number of build errors, and to make incremental steps I opted to comment out mod sftpserver from lib.rs and the macro_rules! sftpmessages in recent commits (1, 2, 3). This way I could focus and use cargo-expand to analyse the SSHEncode/SSHDecode macro expansions.
In the commits from the 20th of August I fixed some minor problems without getting into how the sftp module would work, and it is likely that I will have to revert some changes later on, like 4, 5.
Before working on the big entity in proto.rs, sftpmessages!, I need to fix some issues with StatusCode. I would like to ask you if I am using the SSHEncode/SSHDecode macros correctly for StatusCode, since I felt forced to add a lifetime to it which might not be necessary. Also, I believe that SSHEncodeEnum is not working as expected (6, 7). I added sftp/out/cargo-expand-sftp.rs as a reference for the expanded SSHEncode.
I would also like to mention that I had to modify some files outside of the sftp module, as you will see in the Files changed tab, to implement or alter the visibility of elements (src/packets.rs, src/sshwire.rs).