@sawenzel
Contributor

  • improvements to analyse_FileIO.py
  • o2dpg_workflow_runner.py: Integrated option to produce fileaccess reports

@github-actions

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5
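
The comment format described above (labels separated by ",", with a leading "!" marking a label to remove) can be illustrated with a small parser. This is only a sketch of how such a comment could be interpreted; the function name `parse_async_label_comment` is hypothetical, and this is not the bot's actual implementation.

```python
def parse_async_label_comment(comment):
    """Split a '+async-label' comment into labels to add and labels to remove.

    Hypothetical helper for illustration; the real bot may behave differently.
    """
    prefix = "+async-label"
    text = comment.strip()
    if not text.startswith(prefix):
        return [], []
    add, remove = [], []
    # Labels are separated by commas; a leading "!" requests removal.
    for item in text[len(prefix):].split(","):
        label = item.strip()
        if not label:
            continue
        if label.startswith("!"):
            remove.append(label[1:].strip())
        else:
            add.append(label)
    return add, remove


add, remove = parse_async_label_comment(
    "+async-label async-2024-pp-apass1, async-2023-pp-apass4, !async-2022-pp-apass7"
)
print("add:", add)
print("remove:", remove)
```

Here the first two labels end up in `add` and the label prefixed with "!" ends up in `remove`.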

@sawenzel sawenzel merged commit 8eb8228 into AliceO2Group:master Sep 23, 2025
8 checks passed
@sawenzel sawenzel deleted the swenzel/fileaccess_impr branch September 23, 2025 12:00
sawenzel added a commit to sawenzel/O2DPG that referenced this pull request Sep 25, 2025
This implements a major new feature in the O2DPG workflow/pipeline runner.

The runner can now automatically delete artefacts from intermediate stages as soon as
they are no longer needed. For example, TPC hits can be deleted as soon as TPC digitization
finishes.

This makes it possible to run with less disc space or to simulate more timeframes
within a job.
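
To make the disc-space argument concrete, here is a toy model of a sequential run, assuming tasks execute one after another and each writes files of a known size. The task names, file names, and sizes are invented for illustration, and `peak_disk_usage` is a hypothetical helper, not part of the runner.

```python
def peak_disk_usage(tasks, delete_after=None):
    """Return the peak disc usage (arbitrary units) of a sequential run.

    tasks: list of (task_name, {output_file: size}) in execution order.
    delete_after: optional {task_name: [files removable once it has finished]}.
    """
    delete_after = delete_after or {}
    on_disk = {}
    peak = 0
    for name, outputs in tasks:
        # The task writes its outputs while its inputs are still present.
        on_disk.update(outputs)
        peak = max(peak, sum(on_disk.values()))
        # Once the task is done, files nobody needs any more are removed.
        for filename in delete_after.get(name, []):
            on_disk.pop(filename, None)
    return peak


# Two timeframes; all numbers are made up.
tasks = [
    ("tpcsim_1", {"tpc_hits_1.root": 50}),
    ("tpcdigi_1", {"tpc_digits_1.root": 30}),
    ("tpcsim_2", {"tpc_hits_2.root": 50}),
    ("tpcdigi_2", {"tpc_digits_2.root": 30}),
]
cleanup = {
    "tpcdigi_1": ["tpc_hits_1.root"],
    "tpcdigi_2": ["tpc_hits_2.root"],
}

print(peak_disk_usage(tasks))           # all files kept: 160
print(peak_disk_usage(tasks, cleanup))  # hits removed after digitization: 110
```

In this toy setup the peak drops from 160 to 110 units, because the hits of the first timeframe are gone before the second timeframe is simulated.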

To use the feature, one needs to provide a "file-access" report with
`--remove-files-early access_report.json`. This report is a "learned"
structure containing the list of files that are written/consumed in a workflow and by which task.
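
As an illustration of how such a report can drive early deletion, the sketch below derives, for each task, the files that become removable once that task has finished. The JSON layout (`written_by` / `read_by` per file), the task names, and the function `deletion_schedule` are assumptions made for this example; the actual report schema and runner logic may differ.

```python
import json
from collections import defaultdict

# Hypothetical report layout (an assumption, not the actual schema):
# {filename: {"written_by": [tasks], "read_by": [tasks]}}
EXAMPLE_REPORT = json.loads("""
{
  "tpc_hits.root":   {"written_by": ["tpcsim"],  "read_by": ["tpcdigi"]},
  "tpc_digits.root": {"written_by": ["tpcdigi"], "read_by": ["tpcreco"]},
  "aod.root":        {"written_by": ["aod"],     "read_by": []}
}
""")


def deletion_schedule(report, execution_order, keep=()):
    """Map each task to the files that can be removed once it has finished."""
    position = {task: i for i, task in enumerate(execution_order)}
    schedule = defaultdict(list)
    for filename, access in report.items():
        # Files that are never read are treated as final products and kept.
        if filename in keep or not access["read_by"]:
            continue
        # A file is no longer needed after its last consumer has run.
        last_reader = max(access["read_by"], key=position.__getitem__)
        schedule[last_reader].append(filename)
    return dict(schedule)


order = ["tpcsim", "tpcdigi", "tpcreco", "aod"]
schedule = deletion_schedule(EXAMPLE_REPORT, order)
print(schedule)
```

With this example report, `tpc_hits.root` becomes removable after `tpcdigi`, which matches the TPC hits example given in the commit message, while `aod.root` is never scheduled for deletion.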

Such a report needs to be generated in a prior pilot job with the same workflow, by a user with sudo rights.
See AliceO2Group#2126.

This is primarily useful for productions on the GRID. The idea would be to
(a) produce the file-access file for each new MC production, in a pilot
    job or in GitHub Actions when releasing software, and

(b) then use this file to optimize the disc space in MC productions
    on the GRID.

This development is related to https://its.cern.ch/jira/browse/O2-4365
sawenzel added a commit that referenced this pull request Sep 26, 2025