Skip to content

Conversation

@ariedel-cern
Copy link
Contributor

Add skeleton for GPUErrorQA task in O2.

In this draft, only the number of errors for a given error code is counted and put into a 1D histogram.

In principle the error is reporting 4 values, the error code +3 parameters (like SectorRow, Sector, Value, Max ...). However, the meaning of the parameters depends on the error code itself (defined in GPUErrorCodes.h). There was a suggestion in the past to display the errors in a 2D histogram.

In principle, we could implement 3 2D histograms, putting the error code on the x-axis and the different parameter values on the y-axis.

Tagging @wiechula and @davidrohr for their opinion.

@github-actions
Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1

@davidrohr
Copy link
Collaborator

I think a 2D histogram might make sense, at least as an overview, since you would see al error types.
Perhaps even a 1D, just with the counts for the error codes could give a good first overview?
Ideally, could we put the error code names as ticks on the x-axis.

Then we might want additional 3D histograms on top, for single error codes, with multiple parameters?
I guess adding the histograms is simple? We could also just start with something and then extend?

And to be honest, I am not so familiar with other QC tasks, so I don't know what is the normal strategy.

@ariedel-cern ariedel-cern marked this pull request as ready for review March 6, 2025 11:11
@ariedel-cern
Copy link
Contributor Author

Thanks for the comments, David.
For now I put only 1 histogram which counts the occurring errors. Then I can also finish up the implementation in QC itself.
I think memory consumption is always a bit of a concern for QC. If you can tell which errors are of special importance we can discuss if we can have dedicated higher dimensional histograms for these.

Comment on lines +30 to +35
namespace o2
{
namespace tpc
{
namespace qc
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace o2
{
namespace tpc
{
namespace qc
{
namespace o2::tpc::qc
{

Comment on lines +69 to +71
} // namespace qc
} // namespace tpc
} // namespace o2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} // namespace qc
} // namespace tpc
} // namespace o2
} // namespace o2::tpc::qc

Comment on lines +24 to +28
// root includes
#include "TH1.h"

// o2 includes
// #include "DataFormatsTPC/Defs.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use forward declaration for TH1

Suggested change
// root includes
#include "TH1.h"
// o2 includes
// #include "DataFormatsTPC/Defs.h"
class TH1;

void resetHistograms();

/// return histograms
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() { return mMapHist; };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the histograms don't need to be filled outside, this can be const

Suggested change
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() { return mMapHist; };
std::unordered_map<std::string, std::unique_ptr<TH1>>& getMapHist() const { return mMapHist; };

#ifndef AliceO2_TPC_QC_GPUERRORQA_H
#define AliceO2_TPC_QC_GPUERRORQA_H

#include <memory>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <memory>
#include <string>
#include <memory>


// root includes
#include "TFile.h"
#include <TH1.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <TH1.h>
#include "TH1.h"

Comment on lines +15 to +16
#include <memory>
#include <unordered_map>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already defined in the header

Suggested change
#include <memory>
#include <unordered_map>

};

// 1D histogram counting all reported errors
mMapHist["ErrorCounter"] = std::make_unique<TH1F>("ErrorCounter", "ErrorCounter", errorNames.size(), 0, errorNames.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a histogram of floats? If it is a simple counter, it could also be TH1I, unless one would like to normalize at some point.

Comment on lines +73 to +78
for (const auto& [name, hist] : mMapHist) {
TObjArray arr;
arr.SetName(name.data());
arr.Add(hist.get());
arr.Write(arr.GetName(), TObject::kSingleKey);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it does not make sense to put each histogram in a separate array.

Suggested change
for (const auto& [name, hist] : mMapHist) {
TObjArray arr;
arr.SetName(name.data());
arr.Add(hist.get());
arr.Write(arr.GetName(), TObject::kSingleKey);
}
TObjArray arr;
arr.SetName("GPUError_Hists");
for (const auto& [name, hist] : mMapHist) {
arr.Add(hist.get());
}
arr.Write(arr.GetName(), TObject::kSingleKey);

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2025

This PR did not have any update in the last 30 days. Is it still needed? Unless further action in will be closed in 5 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

3 participants