fix: buffered writes when using io.Copy
#318
base: master
Conversation
When copying files to a FAT filesystem using `io.Copy` on the file returned by `filesystem.OpenFile`, writes were slow because the entries were flushed on every write, which is inefficient. Only flush once on `Close()`. Signed-off-by: Noel Georgi <git@frezbo.dev>
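A minimal sketch of the flush-on-close idea described above, assuming a dirty flag tracked on the file; the type and method names are illustrative only, not the actual go-diskfs code:

```go
package fatsketch

// bufferedFile is an illustrative stand-in for the filesystem file type.
type bufferedFile struct {
	dirty bool // set by Write, cleared when metadata is flushed
}

// Write stores the data (elided here) but no longer flushes the directory
// entry and FAT on every call.
func (f *bufferedFile) Write(p []byte) (int, error) {
	// ... write p into the file's clusters ...
	f.dirty = true
	return len(p), nil
}

// Close flushes the accumulated metadata exactly once.
func (f *bufferedFile) Close() error {
	if !f.dirty {
		return nil
	}
	f.dirty = false
	return f.flushMetadata()
}

// flushMetadata stands in for writing the directory entry and FAT back to the
// underlying image.
func (f *bufferedFile) flushMetadata() error { return nil }
```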
@deitch this is more of a draft and I'll leave it up to the maintainers to do any fixes; most of this was generated with Claude, so I don't feel fully confident in the implementation. I've tried to add tests. The benchmark test shows how bad it is when using `io.Copy`; it can be run with `go test -bench`.
return totalRead, retErr
}

// ReadFrom reads data from r until EOF and writes it to the file.
from go doc io.Copy
go doc io.Copy
package io // import "io"
func Copy(dst Writer, src Reader) (written int64, err error)
Copy copies from src to dst until either EOF is reached on src or an
error occurs. It returns the number of bytes copied and the first error
encountered while copying, if any.
A successful Copy returns err == nil, not err == EOF. Because Copy is
defined to read from src until EOF, it does not treat an EOF from Read as an
error to be reported.
If src implements WriterTo, the copy is implemented by calling
src.WriteTo(dst). Otherwise, if dst implements ReaderFrom, the copy is
implemented by calling dst.ReadFrom(src).
otherwise it was calling `Write` with a default buffer, which is extremely slow
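To illustrate the dispatch rule quoted from `go doc io.Copy`, here is a small self-contained demo (the types are invented for the example, nothing from go-diskfs): when the destination implements `io.ReaderFrom`, `io.Copy` hands the whole source to `ReadFrom` in a single call instead of looping over `Write` with its default buffer.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingDest counts which path io.Copy takes.
type countingDest struct {
	writeCalls    int
	readFromCalls int
}

func (d *countingDest) Write(p []byte) (int, error) {
	d.writeCalls++
	return len(p), nil
}

// ReadFrom makes countingDest an io.ReaderFrom, so io.Copy delegates to it.
func (d *countingDest) ReadFrom(r io.Reader) (int64, error) {
	d.readFromCalls++
	data, err := io.ReadAll(r) // one pass over the whole source
	return int64(len(data)), err
}

func main() {
	d := &countingDest{}
	// The wrapper hides strings.Reader's WriteTo so the ReaderFrom path is used.
	src := struct{ io.Reader }{strings.NewReader("some file contents")}
	if _, err := io.Copy(d, src); err != nil {
		panic(err)
	}
	fmt.Printf("ReadFrom calls: %d, Write calls: %d\n", d.readFromCalls, d.writeCalls)
	// Prints: ReadFrom calls: 1, Write calls: 0
}
```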
// Try to determine source size for optimal allocation using Seeker interface
var knownSize int64 = -1

if seeker, ok := r.(io.Seeker); ok {
this helps pre-allocate by checking if the source implements `io.Seeker`; when reading from a source file opened with `os.Open` this works great
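A sketch of that size probe, assuming a helper function (`sourceSize` is a made-up name, not the one in the PR): if the reader is also an `io.Seeker`, such as an `*os.File` from `os.Open`, seek to the end to learn the remaining length and then seek back so the copy resumes from the original offset.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// sourceSize reports how many bytes remain in r when r is seekable.
func sourceSize(r io.Reader) (int64, bool) {
	seeker, ok := r.(io.Seeker)
	if !ok {
		return -1, false
	}
	cur, err := seeker.Seek(0, io.SeekCurrent) // remember the current offset
	if err != nil {
		return -1, false
	}
	end, err := seeker.Seek(0, io.SeekEnd) // total length of the source
	if err != nil {
		return -1, false
	}
	if _, err := seeker.Seek(cur, io.SeekStart); err != nil { // restore the offset
		return -1, false
	}
	return end - cur, true
}

func main() {
	f, err := os.CreateTemp("", "seek-demo-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString("0123456789"); err != nil {
		panic(err)
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		panic(err)
	}
	if n, ok := sourceSize(f); ok {
		fmt.Println("bytes remaining to copy:", n) // prints 10
	}
}
```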
)

const (
	benchmarkFileSize = (4 * 1024 * 1024 * 1024) - 1 // max FAT32 file size (4 GiB - 1)
tried to use the max FAT32 file size, since that would be the most extreme case
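For reference, a rough sketch of the shape of the two benchmarks being compared, scaled down so it runs quickly; `flushCounter`, the size constant, and the package name are placeholders, not the PR's actual test code. The contrast comes from how many `Write` calls each path issues, since before this change every `Write` flushed metadata.

```go
package copybench

import (
	"bytes"
	"io"
	"testing"
)

const sketchFileSize = 8 << 20 // 8 MiB stand-in for the 4 GiB - 1 FAT32 maximum

// flushCounter stands in for the filesystem file; in the unpatched code every
// Write flushed metadata, so the number of Write calls is what matters.
type flushCounter struct{ writes int }

func (w *flushCounter) Write(p []byte) (int, error) {
	w.writes++
	return len(p), nil
}

func BenchmarkDirectWrite(b *testing.B) {
	data := make([]byte, sketchFileSize)
	for i := 0; i < b.N; i++ {
		w := &flushCounter{}
		if _, err := w.Write(data); err != nil { // one Write per file
			b.Fatal(err)
		}
	}
}

func BenchmarkIOCopy(b *testing.B) {
	data := make([]byte, sketchFileSize)
	for i := 0; i < b.N; i++ {
		w := &flushCounter{}
		// The wrapper hides bytes.Reader's WriteTo so io.Copy uses its default
		// 32 KiB buffer and issues many Write calls.
		src := struct{ io.Reader }{bytes.NewReader(data)}
		if _, err := io.Copy(w, src); err != nil {
			b.Fatal(err)
		}
	}
}
```

Run with `go test -bench .`: the `io.Copy` path makes roughly 256 Write calls per 8 MiB file versus one for the direct path.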
Do you mind explaining the logic of the changes? I really like that you did the two benchmark tests, which is nice to pull the contrast together. As far as I can tell, they are identical except that one does:

_, err = destFile.Write(data)

and the other does:

_, err = io.Copy(destFile, srcFile)

If the problem is just that … Which are you saying is, or is not, implemented? Or is it …?
Also, this does need a rebase.
It's not about the buffer; I had a benchmark test that used a large buffer with similar bad performance. If `ReaderFrom` is not there, it uses `Write` with the default buffer.
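As a side note on that large-buffer experiment, a hedged sketch (the paths and the 4 MiB size are arbitrary examples): `io.CopyBuffer` only changes the chunk size, so without `ReaderFrom` every chunk still arrives via `Write`, and before this PR each of those writes still triggered a flush.

```go
package main

import (
	"io"
	"os"
)

func main() {
	src, err := os.Open("/tmp/src.img") // arbitrary example path
	if err != nil {
		panic(err)
	}
	defer src.Close()

	dst, err := os.Create("/tmp/dst.img") // arbitrary example path
	if err != nil {
		panic(err)
	}
	defer dst.Close()

	// A buffer much larger than io.Copy's default 32 KiB. The anonymous struct
	// wrappers hide WriteTo/ReadFrom so the explicit buffer is actually used.
	buf := make([]byte, 4<<20)
	if _, err := io.CopyBuffer(
		struct{ io.Writer }{dst},
		struct{ io.Reader }{src},
		buf,
	); err != nil {
		panic(err)
	}
}
```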
OK, so the problem is that `io.Copy` falls back to repeated `Write` calls when the destination does not implement `io.ReaderFrom`. You are trying to fix this by having it implement `io.ReaderFrom`. Now that I get it (although I haven't dug into your allocation algorithm), I have a few questions.

First, if …

Second, do we know what algorithm the fat32 driver uses, maybe in Linux? It must have a similar issue when someone calls write(2), which eventually sends data to a file descriptor, which is held open by the kernel for a particular filesystem driver, i.e. fat32. A naive approach might be similar to what you did: always create a large number of clusters, keep them cached, use them from the cache, and when the cache runs dry, allocate another large chunk. I don't recall offhand how the fat32 filesystem handles clusters that have been created but not yet assigned to a particular file, if that is even doable.
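A rough sketch of that naive cluster-cache idea, purely hypothetical; none of these types exist in go-diskfs, and `allocateClusters` is a stand-in for whatever would actually extend the FAT:

```go
package fatsketch

// clusterCache holds clusters that were reserved in bulk but not yet used.
type clusterCache struct {
	free      []uint32 // reserved clusters waiting to be handed out
	chunkSize int      // how many clusters to reserve per refill
}

// next returns a reserved cluster, refilling the cache in large chunks so the
// FAT is extended once per chunk rather than once per cluster.
func (c *clusterCache) next(fs *fakeFS) (uint32, error) {
	if len(c.free) == 0 {
		clusters, err := fs.allocateClusters(c.chunkSize)
		if err != nil {
			return 0, err
		}
		c.free = clusters
	}
	cl := c.free[0]
	c.free = c.free[1:]
	return cl, nil
}

// fakeFS is a placeholder so the sketch compiles on its own.
type fakeFS struct{ nextCluster uint32 }

func (fs *fakeFS) allocateClusters(n int) ([]uint32, error) {
	out := make([]uint32, n)
	for i := range out {
		fs.nextCluster++
		out[i] = fs.nextCluster
	}
	return out, nil
}
```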