
Conversation

@f3sch
Collaborator

@f3sch f3sch commented May 18, 2025

Do not merge.
For now, this is just to check whether it compiles with HIP, since I do not have the hardware. Sorry for the noise.

This PR prints the GPU params and introduces a general stream abstraction, which is then used for the lowest-hanging fruit, e.g. the trackleting. I still have to measure whether this actually brings any benefit. In any case, the number of streams used can later be configured via the params.

@f3sch f3sch requested review from fprino, mconcas and shahor02 as code owners May 18, 2025 10:24
@github-actions
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@f3sch f3sch changed the title [WIP] ITS: GPU: minor changes & add stream abstraction ITS: GPU: minor changes & add stream abstraction May 18, 2025
@alibuild
Collaborator

Error while checking build/O2/fullCI_slc9 for b967887 at 2025-05-19 01:45:

## sw/BUILD/O2-latest/log
/sw/BUILD/9bc8b0b901c0781437adfd032d0e6f6470ba3caa/O2/Detectors/ITSMFT/ITS/tracking/GPU/hip/TimeFrameGPU.hip:23:10: fatal error: 'format' file not found
ninja: build stopped: subcommand failed.

Full log here.

@f3sch f3sch marked this pull request as draft May 21, 2025 19:00
@f3sch f3sch changed the title ITS: GPU: minor changes & add stream abstraction ITS-GPU: print params, add stream abstraction and use for trackleting Jun 9, 2025
@davidrohr
Collaborator

@f3sch just FYI: Do you have access to the EPN slurm batch system? From there you can get an interactive node and test compilation there. That might be easier than using the CI.
Alternatively, you can get the registry.cern.ch/alisw/slc9-gpu-builder docker container and build inside it. It has the NVIDIA and HIP compilers. It is basically the build container we use for the CI.

@f3sch
Collaborator Author

f3sch commented Jun 26, 2025

Hi @davidrohr, yes I do, and I am actively using it to test most of the code :) At home I can compile for NVIDIA as well, since I have not figured out how to use the CI containers... yet.
This PR has taken a bit of a back seat for now due to the recent refactoring of the CPU part, although I think it should be ready by the end of this week.

@davidrohr
Collaborator

OK, I see.
To use the container, I run:

docker run -ti --name testneu --privileged --shm-size 128G -v /home/qon/docker:/home/qon/docker:rw registry.cern.ch/alisw/slc9-gpu-builder

The shm-size and privileged flags are only needed if you actually want to run O2 inside; for compilation you can skip them.
I mount a directory from my home directory into the container and keep all the alidist, O2 and sw folders there, so they persist if I recreate the container.
Then, inside, it is just:

pip3 install alibuild
cd /home/qon/docker
aliBuild build ...
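
For a compile-only session, the note above about skipping the two flags could be sketched as follows. This is an illustration rather than a command taken from the thread: the helper name `compile_only_cmd` is hypothetical, the path, container name and image are simply reused from the command above, and the helper only prints the command so it can be inspected before being run.

```shell
#!/bin/sh
# Sketch (assumption): compile-only variant of the container command above.
# It drops --privileged and --shm-size, which are only needed to run O2.
WORKDIR=/home/qon/docker   # host directory holding alidist, O2 and sw
NAME=testneu               # container name reused from the original command
IMAGE=registry.cern.ch/alisw/slc9-gpu-builder

# Hypothetical helper: prints the command instead of executing it.
compile_only_cmd() {
  echo "docker run -ti --name $NAME -v $WORKDIR:$WORKDIR:rw $IMAGE"
}

compile_only_cmd
```

Copying the printed line into a terminal starts the container with the same bind mount, so the build products still persist on the host.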

f3sch and others added 3 commits June 30, 2025 18:30
prints gpu kernel params

Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
uses multiple streams for trackleting

Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
@f3sch
Collaborator Author

f3sch commented Jun 30, 2025

OK, I tested this, and I get reasonable results.
Execution time generally does not improve, since trackleting is fast anyway.
But we can easily use the multiple streams for other parts of the algorithm.

@f3sch f3sch marked this pull request as ready for review June 30, 2025 16:33
@davidrohr
Collaborator

This is fine for now. In principle, we could just use the stream arrays of GPU Reconstruction instead of your own set of streams, and we already have an equivalent of your sync function. But that can be adapted later. For now, perhaps @mconcas wants to have a look as well. From my side, it is fine to merge this.

Collaborator

@mconcas mconcas left a comment

It is fine to allow for streams that are independent of GPUReco, just for those cases (e.g. prototyping of ALICE 3 tracking) where we don't need the GPUReconstruction (yet).
I would add the possibility of obtaining the streams from the GPU reconstruction in the case that we actually use it.

However, I agree that this can be done in a second step, where we also demonstrate a clearer performance improvement from using multiple streams.

@davidrohr davidrohr merged commit fdc30b1 into AliceO2Group:dev Jul 1, 2025
13 checks passed
@f3sch f3sch deleted the its3/gpu branch July 1, 2025 08:05
4 participants