@ktf
Member

@ktf ktf commented Apr 1, 2025

No description provided.

@ktf ktf requested a review from a team as a code owner April 1, 2025 21:46
@github-actions
Contributor

github-actions bot commented Apr 1, 2025

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@ktf
Member Author

ktf commented Apr 1, 2025

@shahor02 this should be close to what is needed. Not tested yet since I am rebuilding the world...
@knopers8 do you use the tfCounter of timers in any meaningful way? If not, any objections to making it seconds since EPOCH or something like that?
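
For illustration only, the "seconds since EPOCH" idea could be sketched with `std::chrono` as below. The helper name `secondsSinceEpoch` is hypothetical and not part of DPL, and the thread does not show how a timer would actually fill tfCounter. `std::chrono::system_clock` counts from the Unix epoch on common platforms (guaranteed since C++20).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not a DPL API): whole seconds elapsed between the
// system_clock epoch and the given time point (defaults to "now").
inline uint64_t secondsSinceEpoch(
  std::chrono::system_clock::time_point tp = std::chrono::system_clock::now())
{
  using namespace std::chrono;
  return static_cast<uint64_t>(duration_cast<seconds>(tp.time_since_epoch()).count());
}
```

Unlike a counter that restarts from zero, such a value keeps increasing across restarts, which is presumably part of the appeal.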

@alibuild
Collaborator

alibuild commented Apr 2, 2025

Error while checking build/O2/fullCI_slc9 for fcdd7f3 at 2025-04-02 13:43:

## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile digi.log
digi.log-[20040:SimReader]: [13:43:34][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Cannot find N2o29framework17DataTakingContextE service using a global salt.
[20040:SimReader]: [13:43:34][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Cannot find N2o29framework17DataTakingContextE service using a global salt.
[ERROR] Workflow crashed - PID 20040 (SimReader) did not exit correctly however it's not clear why. Exit code forced to 128.


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/98c3703005fe7ad79bdd33137f30356664a8f9cc/slc9_x86-64/o2checkcode/1.0-local63/etc/modulefiles
++ cat
--

Full log here.

@knopers8
Collaborator

knopers8 commented Apr 2, 2025

> @knopers8 do you use the tfCounter of timers in any meaningful way? If not, any objections to making it seconds since EPOCH or something like that?

We use it only with data inputs. It should be safe to change it with regard to QC. Unless it can break the latest possible timeframe computations?

@ktf
Member Author

ktf commented Apr 2, 2025

In principle timers are already skipped for that...

@ktf ktf changed the title from "Attempt at adding the run number to timers and enumerations" to "DPL: attempt at adding the run number to timers and enumerations" Apr 2, 2025
@davidrohr
Collaborator

Should we perhaps reserve special run numbers to put in there, instead of initializing to 0 in case of failure? Then we would at least know why we get invalid run numbers.

@ktf
Member Author

ktf commented Apr 2, 2025

Yes, I was thinking about it; however, I wanted to minimise changes to the current behaviour.

try {
  dh.runNumber = atoi(services.get<DataTakingContext>().runNumber.c_str());
} catch (...) {
  dh.runNumber = 0;
Member Author

Suggested change
- dh.runNumber = 0;
+ dh.runNumber = -1;

@davidrohr something like this?

Collaborator

I would probably rather reserve a range of ~100 invalid positive numbers, each with a defined meaning, like the range we have for unanchored MC. You can discuss the range with @ehellbar and RC.
Then the invalid runNumber check would also need to check for that range. And if we get an error with a number from that range, it is clear how it happened.
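
A minimal sketch of the reserved-range idea, under stated assumptions: the bounds `kInvalidRunMin` and `kInvalidRunMax`, the `RunNumberError` codes, and both helper functions are placeholders invented for illustration. None of this is existing O2 code, and the real range would still have to be agreed with RC as noted above.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical reserved block of 100 "invalid with meaning" run numbers.
// The values are placeholders, not an agreed O2 convention.
constexpr uint32_t kInvalidRunMin = 1000000000u;
constexpr uint32_t kInvalidRunMax = kInvalidRunMin + 99u;

// Hypothetical reason codes, each mapped to one value inside the reserved block.
enum class RunNumberError : uint32_t {
  ServiceMissing = kInvalidRunMin,  // e.g. the context service could not be found
  NotANumber = kInvalidRunMin + 1u, // the string is not a plain unsigned integer
  OutOfRange = kInvalidRunMin + 2u  // overflow, or a value inside the reserved block
};

// The invalid-run check has to cover the whole reserved block.
constexpr bool isReservedInvalidRun(uint32_t run)
{
  return run >= kInvalidRunMin && run <= kInvalidRunMax;
}

// Parse a run number string; on failure return a reserved code that records why.
inline uint32_t parseRunNumber(std::string_view text)
{
  uint32_t value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec == std::errc::result_out_of_range) {
    return static_cast<uint32_t>(RunNumberError::OutOfRange);
  }
  if (ec != std::errc{} || ptr != last) {
    return static_cast<uint32_t>(RunNumberError::NotANumber);
  }
  if (isReservedInvalidRun(value)) {
    return static_cast<uint32_t>(RunNumberError::OutOfRange);
  }
  return value;
}
```

With something like this, the catch branch in the snippet under review could assign `static_cast<uint32_t>(RunNumberError::ServiceMissing)` instead of 0, assuming the header field is an unsigned 32-bit integer. A downstream error that reports a number from the block would then point directly at the failure mode.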

@alibuild
Collaborator

alibuild commented Apr 2, 2025

Error while checking build/O2/fullCI_slc9 for 3f8e858 at 2025-04-02 19:32:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile digi.log
digi.log:[21522:internal-dpl-ccdb-backend]: [19:31:57][ERROR] Exception while running: Fatal error. Rethrowing.
digi.log-[21522:internal-dpl-ccdb-backend]: [19:31:57][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Fatal error
[21522:internal-dpl-ccdb-backend]: [19:31:54][ERROR] CCDBDownloader CURL transfer error - Timeout was reached
[21522:internal-dpl-ccdb-backend]: [19:31:54][ERROR] CcdbDownloader finished transfer http://alice-ccdb.cern.ch/CTP/Calib/OrbitReset for 1550600800000 (agent_id: alimetal04.cern.ch-1743615103-qwInpc) with http code: 0
[21522:internal-dpl-ccdb-backend]: [19:31:54][ERROR] File CTP/Calib/OrbitReset could not be retrieved. No more hosts to try.
[21522:internal-dpl-ccdb-backend]: [19:31:54][FATAL] Unable to find CCDB object CTP/Calib/OrbitReset/1550600800000
[21522:internal-dpl-ccdb-backend]: [19:31:57][ERROR] Exception while running: Fatal error. Rethrowing.
[21522:internal-dpl-ccdb-backend]: [19:31:57][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Fatal error
[ERROR] Workflow crashed - PID 21522 (internal-dpl-ccdb-backend) did not exit correctly however it's not clear why. Exit code forced to 128.


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/858be680255d42de56e650d92b49dec242332dbf/slc9_x86-64/o2checkcode/1.0-local66/etc/modulefiles
++ cat
--

Full log here.

@ktf ktf merged commit 75153a0 into AliceO2Group:dev Apr 3, 2025
19 checks passed
4 participants