

@ehellbar (Collaborator) commented Jul 4, 2025

In case of a topology generation error, this PR does the following:

  1. In the InfoLogger GUI, the last message from the gen_topo scripts' stderr is printed first, directly after the "Topology generation script failed" message from ODC. It is printed as FATAL so that it is displayed at level 1. Afterwards, the full stderr stream is printed as it is now.
  2. The error string in the AliECS GUI contains only the first one or two lines of the gen_topo scripts' stderr output. With this change, the last line of the gen_topo stderr output is printed first in the AliECS GUI error string as well.
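The reordering described in the two points above can be sketched roughly as follows. This is a minimal Python sketch, not the actual gen_topo or ODC code: the function name `build_error_message`, the way the `FATAL` prefix is added, and the joining of stderr lines with spaces are assumptions inferred from the example error strings below.

```python
def build_error_message(stderr: str, severity: str = "FATAL") -> str:
    """Return stderr with its last non-empty line moved to the front.

    Hypothetical sketch only; the real gen_topo scripts and ODC may
    format and transport the message differently.
    """
    lines = [line.strip() for line in stderr.splitlines() if line.strip()]
    if not lines:
        # Nothing useful to promote; keep the original text unchanged.
        return stderr
    last_line = lines[-1]
    # Flatten the full stream onto one line, as it appears in the GUI string.
    full_output = " ".join(lines)
    return f"{severity} {last_line} - full stderr output: {full_output}"


if __name__ == "__main__":
    sample = (
        "Running topology generation to temporary file /var/tmp/gen_topo/1_1/output.xml\n"
        "Error during EPN resource allocation\n"
    )
    print(build_error_message(sample))
```

Because a GUI that shows only the beginning of the string now starts with the most specific error line, the shifter sees the failure reason even when the full stderr is cut off.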

The scripts contain a few foreseen error sources, for each of which we print a corresponding error message. Especially in these cases, this should help the shifter distinguish between different reasons for a topology failure.

Two examples of AliECS GUI error strings:

a) EPN resource allocation error (this is what prompted me to look into this)

# current
cannot create new environment: critical hook failed at trigger after_DEPLOY: EPN PartitionInitialize call failed. REASON: status ERROR from ODC with error code 111 from ODC: Failed topology (Incorrect topology provided: Topology generation script failed with exit code: 1, stderr: "Reusing cached XML topology 6a8ce85e53cff1a07fa786ee7e0aa590 Running post-caching topo-merger command: env - PYTHONPATH+=/usr/local/lib/python

# with this PR
cannot create new environment: critical hook failed at trigger after_DEPLOY: EPN PartitionInitialize call failed. REASON: status ERROR from ODC with error code 111 from ODC: Failed topology (Incorrect topology provided: Topology generation script failed with exit code: 1, stderr: "FATAL Error during EPN resource allocation - full stderr output: Running topology generation to temporary file /var/tmp/gen_topo/1_1/output.xm

b) workflow parsing error

# current
cannot create new environment: critical hook failed at trigger after_DEPLOY: EPN PartitionInitialize call failed. REASON: status ERROR from ODC with error code 111 from ODC: Failed topology (Incorrect topology provided: Topology generation script failed with exit code: 1, stderr: "Running topology generation to temporary file /var/tmp/gen_topo/1_1/output.xml Loading O2PDPSuite/epn-20250605.2-DDv1.6.9-QCv1.176.0-flp-suite

# with this PR
cannot create new environment: critical hook failed at trigger after_DEPLOY: EPN PartitionInitialize call failed. REASON: status ERROR from ODC with error code 111 from ODC: Failed topology (Incorrect topology provided: Topology generation script failed with exit code: 1, stderr: "FATAL Error during workflow description parsing - full stderr output: Running topology generation to temporary file /var/tmp/gen_topo/1_1/outp

github-actions bot commented Jul 4, 2025

REQUEST FOR PRODUCTION RELEASES:
To request that your PR be included in production software, please add the corresponding "async-" labels to your PR. Add the labels directly (if you have the permissions) or add a comment of the following form (note that labels are separated by ","):

+async-label <label1>, <label2>, !<label3> ...

This adds <label1> and <label2> and removes <label3>.

The following labels are available:
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@ehellbar (Collaborator, Author) commented Jul 9, 2025

@davidrohr do you approve? :)

@ehellbar merged commit d0f996b into AliceO2Group:master on Jul 12, 2025
7 checks passed