-
Notifications
You must be signed in to change notification settings - Fork 134
CodeQL guided medium risk (potential) file handle leaks path.c/flock.c #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CodeQL guided medium risk (potential) file handle leaks path.c/flock.c #374
Conversation
dlmiles
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments to the flock.c code that raises questions over what the intention really is.
What is being protected from what, it is usual for 2 independent processes to be modifying files. I'm not sure which operations in magic and which files needs inter-process co-operation to prevent data corruption? So the design goal is unclear to me to envisage why FILE_LOCKS exists.
Maybe I should check if ./configure --disable-locking is working as I can't spot the use case.
| fl.l_pid = getpid(); | ||
|
|
||
| if (fcntl(fd, F_GETLK, &fl)) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re use of F_GETLK before F_SETLK
It seems strange to call F_GETLK first. The same process can not see its own lock, but it can see the lock of another process on the file(dev:ino). But if we are looking for another process with an active lock, there is a
race condition to this, because maybe it places down the lock between our call to F_GETLK and our F_SETLK.
So in effect any data obtained from doing things in this order is always out-of-date.
It is more usual to attempt to make the lock first, then if it fails, enquire with F_GETLK the PID of the process that is blocking us by using F_GETLK. So the nice error message can be displayed. Now if the F_GETLK fails to find a lock is set, maybe we just missed the other process unlocking, so we might (optionally) retry the operation of obtaining the lock one more time. Obviously we don't want to repeat more than once. But maybe magic files are not high-contention file (like database) so the retry maybe over the top.
| perror(fname); | ||
| perror(fname); | ||
| /* appears to be a best-effort rather than signal error, */ | ||
| /* and continues to gzdopen() to provide caller handle */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when we don't get the lock, we carry on regardless, no indication or action made on error. A best-effort to obtain.
| close(fd); | ||
| fd = -1; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are successful at this point here, we did fail to obtain the lock on this FD.
We did not mark *is_locked = TRUE informing the caller.
But... if the caller were to use close(fd) and there was another open FD on this same file(dev:ino) by another part of magic then it would release all our locks. Line 137-139 area and Line 160-161 area.
I think the caller will just use close(fd) like normal because *is_locked == TRUE. which removes all locks from the fd anyway.
So either this file is lock aware or it isn't. Given is_locked != NULL we can assume the caller is classifying this open call is being lock aware. If this is correct then all calls to open should have their fd managed
| f = gzdopen(fd, mode); | ||
| goto done; | ||
| if (f) | ||
| goto done_store_fdp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vvvv Re Line 160 below vvvv
This mandatory call to close(fd) removes all locks our process has on this file. Another process can now gain their lock. Even if another part of our application has another FD open and will be doing more work on the FD later.
I don't get why a close() is made here, to a file that might be regarded as a lock aware file/path.
| f = gzdopen(fd, mode); | ||
| if (f == NULL) | ||
| { | ||
| close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this close to manage the potential error path from the 3rd party library not accepting the 'fd'.
However this should also be a managed close if the file/path is consider file-lock aware.
| done_store_fdp: | ||
| if (fdp) *fdp = fd; | ||
| done: | ||
| if (fdp != NULL) *fdp = fd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move because the function path_gzdopen_internal() will take care of this when successful and only one exit path exists no in this function that needs to ensure it is done here. So special goto label for that one use case'
| f = fopen(filename, mode); | ||
| if ((fdp != NULL) && (f != NULL)) *fdp = fileno(f); | ||
| if (f) | ||
| if (fdp) *fdp = fileno(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rewrite of what is already there, in a form that is both similar to similar lines above and (I find) easier to see glance at its intention.
| return gzhandle; | ||
|
|
||
| close: | ||
| close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't do anything for *fdp = -1 to handle error path. Maybe it should. None of previous code has it but looking back on it again I spot this.
dlmiles
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...
| fl.l_type = F_WRLCK; | ||
| fl.l_pid = getpid(); | ||
|
|
||
| fd = open(fname, O_RDWR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^^ line 166 ^^^
Consider fl.l_type = ((oflags & O_ACCMODE) == O_RDONLY) ? F_RDLCK : F_WRLCK
Consider honouring the callers open mode flags for O_RDONLY and O_RDWR, I think maybe a O_RDONLY caller request is upgraded to O_RDWR with F_WRLCK. Unclear why that would be done.
|
File locking was introduced by my advisor Mike Godfrey at Stanford back in the early 1990s. The point of file locking was related to our use of magic with a database on a central server, and a number of people working on the same design at the same time. File locking prevents two people from overwriting each other's edits on a cell layout. Only one of the users can have the layout file open at a time, and if another person opens the same layout, they will find it read-only and uneditable. This is a feature of commercial layout tools, although I doubt they use file locking. Otherwise, even on a single-user system, the file locking can prevent a script from overwriting an open layout, so there are valid use cases. The major downside of file locking is that there is a major conflict between large, complex chip layouts with many cells, and Linux version that have hard-coded a limit of 1024 open file descriptors. This first showed up on the gf180mcu open PDK, where the I/O cells were so deeply nested that my system ran out of file descriptors. Since then, I hacked in some fixes to allow file locking to be disabled on demand, but much of the code is really hacks on hacks on hacks, mostly due to hastily trying to fix things during emergency situations happening at the last minute close to tapeouts. |
|
There having been 33 years between the original file locking code and now, I would be very surprised if there aren't much better ways to deal with the whole issue of preventing people from overwriting each other's edits. I would not, however, want to depend on an automated git merge to resolve layout differences. . . |
|
Thanks for the background, information, I'll digest what you said into what I can see in codebase. To me a cell editmode lock sounds like a traditional There is also the UI ability to, simply take a copy of the original file (into private auto-save area), checksum/timestamp original, have auto-save always use a private magic managed area, so when it comes to the user pressing save button, it then resolves that the original file has been changed under the session, and provide the user resolution options. Many regular text editors have some combination of the same thing from a UI perspective. Worst case scenario the current session of work is already safely inside the auto-save area. Linux does have a default environment limit of 1024, there are historical reasons for why this is set at that level: The hard limits are with -H and the soft/current/active limit is with -S. Many programs using a lot of FDs do this transparently on startup using The reason why this limit exists is the historical kernel syscalls for When the magic process starts up it can call: Can you provide/point-me-to a torture project ZIP of gf180 (or any open-source tech) that demonstrates the high fd usage ? at least I can profile/examine it to understand the nature of the file usage pattern. |
|
FWIW the existing magic Other than this there isn't any other area I have seen that is a concern to when using FDs above 1023. |
|
@dlmiles : The I/O library of gf180mcu itself exhibits the problem and so any full project will do so as well. So, |
|
This is more refined version that works for me. |
I have taken a look (impact assessment for raising RLIMIT_NOFILE by default). Have a few commits to protect the lesser used The final part is the 3rd party libraries.
So any strategy here may consider making it a default option for tcl9 and to advise (from a support perspective) to use tcl9 when using higher numbered FDs with magic. |
|
The previous comments on RLIMIT can be ignored in respect of this PR, they are for a different topic for which a new PR will be created for. The previous comments on I just updated this with style changes and invalidation of fd=-1 on error case. This was my only changes I expect to make. I just need to test now. Merge Status: merge on hold, pending targeted testing (review status in 3 weeks) |
a296fd1 to
ddab8ae
Compare
|
ddab8ae formatting fix, and set *fdp=-1 on error path |
|
Any update on this? Your "review status in 3 weeks" passed some time ago. |
|
The main subject of this change was to cleanup error handling scenarios and introduce While I have run with this patch for testing I can't say that I used input files which were gzip compressed during that time. Other than that I am still happy things should be ok. |
ddab8ae to
a4c5c7c
Compare
|
Rebase onto 8.3.530 no changes |
|
@dlmiles : Finally starting on another round of merges; Should I go ahead and merge this? |
|
Yes please. Despite the large amount of commentary in this PR. Most of the comments are about something else. This commit here just cleans up to error handling path for gzopen that static source analyser was showing it need attention (over leaked file handles during error path). |
|
@RTimothyEdwards is this one going soon and will clear all mergable outstanding PRs. |
Guided by CodeQL static code analyser. FileMayNotBeClosed.ql FileMayNeverBeClosed.ql Multiple paths exist that seems to be equivalent, some of those paths handled the error path, some of them didn't. CodeQL will still report a concern (instead of multiple concerns) so this has been commented to clarify when the libz handle setup is successful, the ownership of the 'fd' changed.
This is due to the transfer of ownership of the 'fd' into libz/gzip but the error handling wasn't always being checked.
a4c5c7c to
84264b3
Compare
|
git rebase --onto 1ec8b6e HEAD~2 HEAD # 8.3.534 |
84264b3 to
1a139e7
Compare
|
Fixup from header file location change utils.h => magic_zlib.h |
|
Pulled and merged (finally). |
This is mainly a refactor of a commonly repeated pattern into a new function that fully handle error paths.
Medium risk due to the high foot-fall and different use cases/filesystem scenarios.
I have a few questions with the flock.c file, best annotated into the source in a moment.