Conversation

@ktf
Member

@ktf ktf commented Feb 18, 2025

This is the first step towards defining workflows inside plugins
rather than in executables. This will allow us to accumulate the plugins needed
to instantiate a topology and to do the option parsing and topology building only once,
simplifying the current setup.

The end goal is to allow the driver to preload certain common services (e.g. ROOT)
and share them among the different tasks, which is not possible at the moment because different
tasks live in different executables. Moreover, this will allow us to coalesce tightly coupled
data processors and reduce the number of running processes.

For now the plugins are embedded in the executables and behave exactly as before.
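
To make the intent concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes a driver that collects several workflow plugins and builds the topology in a single pass. All names (`ProcessorSketch`, `WorkflowPluginSketch`, `DriverSketch`, and the two example plugins) are hypothetical and do not correspond to actual DPL types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data processor description (not a DPL type).
struct ProcessorSketch {
  std::string name;
};

// Hypothetical plugin interface: each plugin contributes its part of the workflow.
struct WorkflowPluginSketch {
  virtual ~WorkflowPluginSketch() = default;
  virtual std::vector<ProcessorSketch> define(const std::vector<std::string>& args) const = 0;
};

// Two illustrative plugins; in this sketch they ignore the command-line arguments.
struct ReaderPluginSketch : WorkflowPluginSketch {
  std::vector<ProcessorSketch> define(const std::vector<std::string>&) const override
  {
    return {ProcessorSketch{"reader"}};
  }
};

struct DigitizerPluginSketch : WorkflowPluginSketch {
  std::vector<ProcessorSketch> define(const std::vector<std::string>&) const override
  {
    return {ProcessorSketch{"digitizer"}};
  }
};

// Hypothetical driver: accumulates plugins, then builds the topology once,
// passing the same arguments to every plugin.
class DriverSketch
{
 public:
  void add(std::unique_ptr<WorkflowPluginSketch> plugin)
  {
    assert(plugin != nullptr);
    mPlugins.push_back(std::move(plugin));
  }

  std::vector<ProcessorSketch> buildTopology(const std::vector<std::string>& args) const
  {
    std::vector<ProcessorSketch> topology;
    for (auto const& plugin : mPlugins) {
      auto part = plugin->define(args);
      topology.insert(topology.end(), part.begin(), part.end());
    }
    return topology;
  }

 private:
  std::vector<std::unique_ptr<WorkflowPluginSketch>> mPlugins;
};
```

In this sketch every plugin lives in the same process as the driver, which is what would let shared services be loaded once and tightly coupled processors be merged. How plugins are discovered and loaded is deliberately left out.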

@ktf ktf requested a review from a team as a code owner February 18, 2025 10:20
@github-actions
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1

@alibuild
Collaborator

Error while checking build/O2/fullCI for c381e69 at 2025-02-18 12:40:

## sw/BUILD/O2-latest/log
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
/sw/SOURCES/O2/13976-slc8_x86-64/0/Framework/Core/src/runDataProcessing.cxx:62:10: fatal error: Framework/WorkflowDefinitionContext.h: No such file or directory
ninja: build stopped: subcommand failed.

Full log here.

@alibuild
Collaborator

Error while checking build/O2/fullCI_slc9 for c381e69 at 2025-02-18 12:52:

## sw/BUILD/O2-latest/log
/sw/SOURCES/O2/13976-slc9_x86-64/0/Framework/Core/src/runDataProcessing.cxx:62:10: fatal error: Framework/WorkflowDefinitionContext.h: No such file or directory
ninja: build stopped: subcommand failed.

Full log here.

@alibuild
Collaborator

alibuild commented Mar 7, 2025

Error while checking build/O2/fullCI for 9f7bb5c at 2025-03-07 19:40:

## sw/BUILD/O2-latest/log
c++: error: unrecognized command-line option '--rtlib=compiler-rt'
c++: error: unrecognized command-line option '--rtlib=compiler-rt'


## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
Detected critical problem in logfile digi.log
digi.log:[19440:internal-dpl-ccdb-backend]: [18:39:54][ERROR] Exception while running: Fatal error. Rethrowing.
digi.log-[19440:internal-dpl-ccdb-backend]: [18:39:54][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Fatal error
[19440:internal-dpl-ccdb-backend]: [18:39:52][ERROR] CCDBDownloader CURL transfer error - Timeout was reached
[19440:internal-dpl-ccdb-backend]: [18:39:52][ERROR] CcdbDownloader finished transfer http://alice-ccdb.cern.ch/CTP/Calib/OrbitReset for 1550600800000 (agent_id: alimetal09.cern.ch-1741372786-C9T8W4) with http code: 0
[19440:internal-dpl-ccdb-backend]: [18:39:52][ERROR] File CTP/Calib/OrbitReset could not be retrieved. No more hosts to try.
[19440:internal-dpl-ccdb-backend]: [18:39:52][FATAL] Unable to find object CTP/Calib/OrbitReset/1550600800000
[19440:internal-dpl-ccdb-backend]: [18:39:54][ERROR] Exception while running: Fatal error. Rethrowing.
[19440:internal-dpl-ccdb-backend]: [18:39:54][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-sim-digitizer-workflow, device shutting down. Reason: Fatal error
[ERROR] Workflow crashed - PID 19440 (internal-dpl-ccdb-backend) did not exit correctly however it's not clear why. Exit code forced to 128.


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/fafc3821b8e07a2624e54e94ca631a495eb2429b/slc8_x86-64/o2checkcode/1.0-local283/etc/modulefiles
++ cat
[0 more errors; see full log]

Full log here.

@alibuild
Collaborator

alibuild commented Mar 9, 2025

Error while checking build/O2/fullCI_slc9 for 9f7bb5c at 2025-03-09 09:51:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-sim-challenge-test-latest/log
./sim-challenge.logDetected critical problem in logfile trdMatch.log
./sim-challenge.logtrdMatch.log:[13647:internal-dpl-ccdb-backend]: [09:51:47][ERROR] Exception while running: Fatal error. Rethrowing.
./sim-challenge.logtrdMatch.log-[13647:internal-dpl-ccdb-backend]: [09:51:48][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-trd-global-tracking, device shutting down. Reason: Fatal error
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:47][ERROR] CCDBDownloader CURL transfer error - Timeout was reached
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:47][ERROR] CcdbDownloader finished transfer http://alice-ccdb.cern.ch/CTP/Calib/OrbitReset for 1546300800000 (agent_id: alimetal00.cern.ch-1741510305-DRrLFH) with http code: 0
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:47][ERROR] File CTP/Calib/OrbitReset could not be retrieved. No more hosts to try.
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:47][FATAL] Unable to find object CTP/Calib/OrbitReset/1546300800000
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:47][ERROR] Exception while running: Fatal error. Rethrowing.
./sim-challenge.log[13647:internal-dpl-ccdb-backend]: [09:51:48][FATAL] Unhandled o2::framework::runtime_error reached the top of main of o2-trd-global-tracking, device shutting down. Reason: Fatal error
./sim-challenge.log[ERROR] Workflow crashed - PID 13647 (internal-dpl-ccdb-backend) did not exit correctly however it's not clear why. Exit code forced to 128.
./sim-challenge.log[ERROR]  - Device internal-dpl-ccdb-backend: pid 13647 (exit 128)
./sim-challenge.log[INFO]    - First error: [09:51:47][FATAL] Unable to find object CTP/Calib/OrbitReset/1546300800000
./sim-challenge.log[ERROR] SEVERE: Device internal-dpl-ccdb-backend (13647) had at least one message above severity 7: Unable to find object CTP/Calib/OrbitReset/1546300800000
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/45}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/46}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/47}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/48}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/49}
./digi.log[5923:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[5923:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[5923:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[5923:internal-dpl-clock]: [ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
[0 more errors; see full log]

Full log here.

@ktf ktf requested a review from a team as a code owner March 20, 2025 11:25
@ktf ktf changed the title DPL: first step to make workflow definition plugins Ignore signals for now Mar 20, 2025
@alibuild
Collaborator

Error while checking build/O2/fullCI_slc9 for 4e90ae6 at 2025-03-20 16:22:

## sw/BUILD/O2Physics-latest/log
[ERROR] Could not load library ""
[ERROR] lib"".so: cannot open shared object file: No such file or directory
ninja: build stopped: subcommand failed.

Full log here.

@alibuild
Collaborator

alibuild commented Mar 21, 2025

Error while checking build/O2/fullCI_slc9 for 50c20c6 at 2025-03-27 06:27:

## sw/BUILD/O2-RTC-test-latest/log
clang++: error: unknown argument: '-mdaz-ftz'
clang++: error: unknown argument: '-mdaz-ftz'
clang++: error: unknown argument: '-mdaz-ftz'
clang++: error: unknown argument: '-mdaz-ftz'


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
grep: error-log.txt: binary file matches
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/8cf16e10b67428ee404246e8b98da18bd1d07a65/slc9_x86-64/o2checkcode/1.0-local9/etc/modulefiles
++ cat
--

Full log here.

@ktf ktf changed the title Ignore signals for now DPL: first step to make workflow definition plugins Mar 27, 2025
@alibuild
Collaborator

alibuild commented Mar 27, 2025

Error while checking build/O2/fullCI_slc9 for d466abb at 2025-04-15 23:40:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
task timeout reached .. killing all processes


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
grep: error-log.txt: binary file matches
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/e20a5f1f5d46957e4c6480c7cd3d47744f3f0827/slc9_x86-64/o2checkcode/1.0-local188/etc/modulefiles
++ cat
--

Full log here.

@ktf ktf changed the title DPL: first step to make workflow definition plugins More fixes for the plugins Apr 16, 2025
@alibuild
Collaborator

alibuild commented Apr 16, 2025

Error while checking build/O2/fullCI_slc9 for 7f354f4 at 2025-04-17 09:06:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-sim-challenge-test-latest/log
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/37}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/38}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/40}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/42}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/43}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/45}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/46}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/47}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/48}
./digi.log[ERROR] Found duplicate input binding with different spec.:collisioncontext {SIM/COLLISIONCONTEXT/49}


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/0f7d3c85df7188ebf07fe58b3ab46be893602459/slc9_x86-64/o2checkcode/1.0-local196/etc/modulefiles
++ cat
--

Full log here.

@ktf ktf changed the title More fixes for the plugins DPL: first step to make workflow definition plugins Apr 17, 2025
@alibuild
Collaborator

alibuild commented Apr 17, 2025

Error while checking build/O2/fullCI_slc9 for 0c10047 at 2025-06-11 10:38:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:


## sw/BUILD/O2-full-system-test-latest/log
command o2-raw-file-reader-workflow --session default --shm-segment-size 8000000000 --configKeyValues "HBFUtils.nHBFPerTF=128" --detect-tf0 --input-conf raw/FT0/FT0raw.cfg | o2-ft0-flp-dpl-workflow  --session default --shm-segment-size 8000000000 -b had nonzero exit code 141


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
grep: error-log.txt: binary file matches
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/738efe48533deb74caaf83b72ec4052b86eb91d9/slc9_x86-64/o2checkcode/1.0-local113/etc/modulefiles
++ cat
--

Full log here.

@github-actions
Contributor

This PR did not have any update in the last 30 days. Is it still needed? Unless further action is taken, it will be closed in 5 days.

@github-actions github-actions bot added the stale label Jul 12, 2025
@github-actions github-actions bot closed this Jul 17, 2025