@sawenzel
Contributor

This implements a major new feature in the O2DPG workflow/pipeline runner.

The runner can now auto-delete artefacts from intermediate stages as soon as they are no longer needed. For example, we can delete TPC hits as soon as TPC digitization finishes.

This makes it possible to run with less disc space or to simulate more timeframes within a job.

To use the feature, one needs to provide a "file-access" report with `--remove-files-early access_report.json`. This report is a "learned" structure listing the files that are written or consumed in a workflow, and by which task.

Such a report must be generated in a prior pilot job with the same workflow, by a user with sudo rights. See #2126.
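To illustrate the idea, here is a minimal sketch of how such a report could drive early removal. The report layout (`file`, `written_by`, `read_by`), the task names, and the helper `removable_files` are all hypothetical and chosen for illustration only; the actual schema of `access_report.json` and the runner's internal logic may differ. The assumed rule is that a file becomes removable once every task that reads it has finished.

```python
import json

# Hypothetical report layout (an assumption, NOT the real access_report.json schema):
# one entry per file, naming the task that writes it and the tasks that read it.
EXAMPLE_REPORT = json.loads("""
[
  {"file": "tpc_hits.root",   "written_by": "tpc_sim",  "read_by": ["tpc_digi"]},
  {"file": "tpc_digits.root", "written_by": "tpc_digi", "read_by": ["tpc_reco", "qc"]},
  {"file": "tracks.root",     "written_by": "tpc_reco", "read_by": []}
]
""")


def removable_files(report, finished_tasks):
    """Return the files whose readers have all finished.

    Files that nobody reads are treated as final outputs and are never returned.
    """
    finished = set(finished_tasks)
    return [
        entry["file"]
        for entry in report
        if entry["read_by"] and set(entry["read_by"]) <= finished
    ]


# After TPC digitization finishes, only the TPC hits are no longer needed.
print(removable_files(EXAMPLE_REPORT, ["tpc_sim", "tpc_digi"]))
```

Checking "all readers finished" rather than "the last reader in some fixed order finished" keeps the rule valid when tasks run in parallel, which is why the sketch is written this way.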

This is primarily useful for productions on the GRID. The idea would be to:

(a) produce the file-access file for each new MC production, in a pilot job or in GitHub Actions when releasing software;

(b) use this file to optimize the disc space in MC productions on the GRID.

This development is related to https://its.cern.ch/jira/browse/O2-4365

@github-actions

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@sawenzel sawenzel merged commit 93ff9d0 into AliceO2Group:master Sep 26, 2025
7 of 8 checks passed
@sawenzel sawenzel deleted the swenzel/early-file-removal branch September 26, 2025 12:44
@sawenzel sawenzel restored the swenzel/early-file-removal branch September 26, 2025 15:06
@sawenzel sawenzel deleted the swenzel/early-file-removal branch September 26, 2025 15:06