Skip to content

Conversation

@shahor02
Copy link
Collaborator

@shahor02 shahor02 commented Apr 9, 2025

Edge clusters before and after the fix with GPU_rec_tpc.clustersEdgeFixDistance=0.5
Default GPU_rec_tpc.clustersEdgeFixDistance is 0 (no fix).

clfix

@shahor02 shahor02 requested a review from davidrohr as a code owner April 9, 2025 16:14
@github-actions
Copy link
Contributor

github-actions bot commented Apr 9, 2025

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

}
const auto cluster = decompressTrackStore(cmprClusters, clusterOffset, slice, row, pad, time, args...);
auto cluster = decompressTrackStore(cmprClusters, clusterOffset, slice, row, pad, time, args...);
if ((cluster.getFlags() & ClusterNative::flagEdge) && param.rec.tpc.clustersShiftTimebins != 0.f) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is wrong, instead of param.rec.tpc.clustersShiftTimebins you'd need to use param.rec.tpc.clustersEdgeFixDistance.
But then, why do you need this change in the TPCClusterDecompressionCore.inc at all, if you fix the clusters later in GPUTPCDecompressionKernels and TPCClusterDecompressor? I'd actually remove this part here, I believe it is not needed, and it adds an additional branch on the GPU if clustersEdgeFixDistance is not set.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this decompressTrack( is called for attached clusters, in https://github.com/AliceO2Group/AliceO2/blob/dev/GPU/GPUTracking/DataCompression/GPUTPCDecompressionKernels.cxx#L36, while the previous step1unattached is called for unattached clusters in GPU/GPUTracking/Global/GPUChainTrackingCompression.cxx

Please confirm that this is not used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this is a bit more convoluted than I thought and when checking again at first I thought you were right and took me one hour to understand...

So, the decompressTrack( is of course used as you say, but the clusters are copied and touched again in the step for unattached clusters here

if (param.rec.tpc.clustersShiftTimebins != 0.f) {
and here
if (processors.param.rec.tpc.clustersShiftTimebins != 0.f) {
.
Note that the loop in these places runs from 0 to nClusters, not only over the unattached ones, so these 2 loops apply the corrections for all clusters.
The TPCClusterDecompressor is for the tpcUseOldCPUDecoding, while the GPUTPCDecompressionKernels is for Gabriele`s new GPU-capable decoding (which runs on both CPU and GPU).
So fixing these 2 places is enough as I wrote initially.

In your case, you are fixing the edge pad during the attached decoding, and fixing it afterwards again.

Also, your approach is bogus, since decompressTrackStore is not guaranteed to return a reference (it does in some of the overloads). I.e., you might be fixing a temporary object. For your approach, you'd need to fix the edge flag before line 140 here

const auto cluster = decompressTrackStore(cmprClusters, clusterOffset, slice, row, pad, time, args...);
.

That said, I am not sure if Gabriele or we ever tested the clustersShiftTimebins parameter for the new decoding. Perhaps we should double-check before we rely on it, and try with tpcUseOldCPUDecoding=1 on CPU and with tpcUseOldCPUDecoding=0 on CPU and GPU.

@alibuild
Copy link
Collaborator

alibuild commented Apr 9, 2025

Error while checking build/O2/fullCI_slc9 for 2fad56c at 2025-04-10 06:38:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile sim.log


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/0ee7f2deea1feb7f44770c093e77888c309d4c4d/slc9_x86-64/o2checkcode/1.0-local116/etc/modulefiles
++ cat
--

Full log here.

@shahor02
Copy link
Collaborator Author

@davidrohr ok, removed TPCClusterDecompressionCore.inc changes and tested GPU_rec_tpc.clustersEdgeFixDistance=0.5;GPU_rec_tpc.clustersShiftTimebins=100.; with

(1) GPU_global.deviceType=CPU;GPU_proc.tpcUseOldCPUDecoding=true;
(2) GPU_global.deviceType=CPU;GPU_proc.tpcUseOldCPUDecoding=false;
(3) GPURECOPARAM="$GPUEXTRA;GPU_global.deviceType=CUDA;

Both timeshift and edge bit fix work:

testTBshift100Edge05

@davidrohr
Copy link
Collaborator

great, thx for doing these checks!

@alibuild
Copy link
Collaborator

Error while checking build/O2/fullCI_slc9 for 29796fd at 2025-04-10 16:24:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile sim.log


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
grep: error-log.txt: binary file matches
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/8f26ccab4d2c55bbee20e2c6f6f0c8f2dc69d6e9/slc9_x86-64/o2checkcode/1.0-local132/etc/modulefiles
++ cat
--

Full log here.

@shahor02 shahor02 enabled auto-merge (rebase) April 18, 2025 10:39
@shahor02 shahor02 disabled auto-merge April 18, 2025 10:39
@shahor02 shahor02 merged commit 6963217 into AliceO2Group:dev Apr 18, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants