O2DPG workflow_runner: New early-file removal feature #2132
This implements a major new feature in the O2DPG workflow/pipeline runner.
The runner can now automatically delete artefacts from intermediate stages as soon as they are no longer needed. For example, TPC hits can be deleted as soon as TPC digitization finishes.
This makes it possible to run with less disc space or to simulate more timeframes within a job.
To use the feature, provide a "file-access" report with
--remove-files-early access_report.json. This report is a "learned" structure listing the files that are written or consumed in a workflow, and by which task. The report needs to be generated in a prior pilot job with the same workflow, by a user with sudo rights. See #2126.
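As a rough illustration of the idea (not the code in this PR), the sketch below decides which files can be removed once a set of tasks has finished. The report layout (a "files" list with "file", "written_by" and "read_by" keys), the function name `removable_files`, and the file and task names are all assumptions made for the example; the real access_report.json may be structured differently.

```python
def removable_files(report, finished_tasks, already_removed=frozenset()):
    """Return files that no remaining task needs (hypothetical report schema)."""
    finished = set(finished_tasks)
    removable = []
    for entry in report["files"]:
        readers = set(entry["read_by"])
        writers = set(entry["written_by"])
        if not readers:
            # Assumption: a file that nobody reads is a final product; keep it.
            continue
        if entry["file"] in already_removed:
            continue
        # Safe to delete only once every writer and every reader is done.
        if writers <= finished and readers <= finished:
            removable.append(entry["file"])
    return removable


# Illustrative report; file and task names are made up for this example.
report = {
    "files": [
        {"file": "tpc_hits.root", "written_by": ["tpcsim"], "read_by": ["tpcdigi"]},
        {"file": "tpcdigits.root", "written_by": ["tpcdigi"], "read_by": ["tpcreco"]},
        {"file": "AO2D.root", "written_by": ["aod"], "read_by": []},
    ]
}

after_digi = removable_files(report, ["tpcsim", "tpcdigi"])
```

Checking against the set of finished tasks, rather than a fixed task order, keeps the sketch valid when tasks run in parallel: a file only becomes removable once all of its consumers have completed.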
This is primarily useful for productions on the GRID. The idea is to
(a) produce the file-access report for each new MC production, in a pilot
job or in GitHub Actions when releasing software;
(b) use this report to reduce disc usage in MC productions
on the GRID.
This development is related to https://its.cern.ch/jira/browse/O2-4365