BF: job_templates: Call tar with --ignore-failed-read #451
kyleam wants to merge 1 commit into ReproNim:master
After a command completes, it writes to "status.$subjob". If, after completing its command, a subjob sees that the status files for all the other subjobs are in, it claims responsibility for the post-processing step. For the datalad-run orchestrators, post-processing includes calling `find` to get a list of newly added files and then calling `tar` with these files as input.

Given that the above procedure waits until each command exits, the hope is that all the output files are created and any temporary files will have been cleaned up. But we're hitting cases [*] where apparently intermediate files are present for the `find` call but gone by the time `tar` is called. This leads to `tar` exiting with a non-zero status and the post-processing being aborted.

Until someone has a better idea of how to deal with this, instruct `tar` to exit with zero even if an expected file isn't present. This allows post-processing to succeed, and the incident will still show up in the captured stderr.

[*] ReproNim#438 (comment)
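The race described above can be reproduced in isolation. The following is a minimal sketch, not the actual job template: the directory layout, the file names, and passing the list to `tar` with `-T` are all assumptions made for illustration, and `--ignore-failed-read` is a GNU tar option, so GNU tar is assumed.

```shell
#!/bin/sh
# Sketch only: simulate a file that `find` lists but that is gone before `tar` runs.
# All paths and file names below are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/outputs"
touch "$workdir/outputs/result.dat" "$workdir/outputs/scratch.tmp"

cd "$workdir/outputs" || exit 1

# Step 1: collect the list of files (kept outside the directory being scanned).
find . -type f >"$workdir/filelist"

# Step 2: a temporary file disappears after `find` has already listed it.
rm scratch.tmp

# Step 3a: plain tar treats the missing file as an error.
tar -cf "$workdir/default.tar" -T "$workdir/filelist" 2>"$workdir/default.err"
status_default=$?

# Step 3b: with --ignore-failed-read the missing file is only reported on stderr.
tar --ignore-failed-read -cf "$workdir/ignore.tar" -T "$workdir/filelist" \
    2>"$workdir/ignore.err"
status_ignore=$?

echo "plain tar exit status: $status_default"
echo "tar --ignore-failed-read exit status: $status_ignore"
echo "stderr captured with --ignore-failed-read:"
cat "$workdir/ignore.err"
```

With GNU tar, the first call should exit non-zero and the second should exit zero while still writing a warning about the missing file to stderr, which is the behavior this change relies on.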
I kept thinking about it, and so far I treat the failure as a "feature": it suggests that we aren't catching the complete shutdown of the underlying process, or that there is some filesystem effect.
I'm not really seeing it. Sure, it's useful for us to know there's something funky going on. But until we have a clear understanding of the issue and how to fix it, it seems unnecessary to make the post-processing fail completely because a file (very likely an uninteresting and temporary one) was removed between the `find` and `tar` calls.

But ok, let's hold off on this and revisit it if we hit it in other scenarios.
I'll close this for now. We can resurrect it if desired at a later point.