@lmassacr
Contributor

@lmassacr lmassacr commented May 6, 2025

This fix is meant to cure the bug in the nightlies reported in https://its.cern.ch/jira/browse/O2-5885
With this change, the aQC MCH+MFT task is enabled only if MID is not active in the run.
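
As a rough illustration of the condition described here (this is not the code of the PR), the gating could be sketched as below. The function name `mch_mft_qc_enabled`, the representation of active detectors as a list of strings, and the requirement that both MCH and MFT be present are assumptions made for this example only.

```python
def mch_mft_qc_enabled(active_detectors):
    """Return True if the MCH+MFT aQC task should be scheduled (illustrative only)."""
    active = {det.upper() for det in active_detectors}
    # Assumption for this sketch: the task needs both MCH and MFT in the run.
    required = {"MCH", "MFT"}
    # Condition stated in this PR: do not enable the task when MID is active.
    return required <= active and "MID" not in active
```

For example, `mch_mft_qc_enabled(["MCH", "MFT"])` evaluates to `True`, while adding `"MID"` to the list makes it `False`.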

@github-actions

github-actions bot commented May 6, 2025

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@sawenzel sawenzel merged commit 7e3511e into AliceO2Group:master May 12, 2025
8 checks passed
@alcaliva
Collaborator

alcaliva commented Jun 3, 2025

Hi @sawenzel,
I was trying to cherry-pick this PR and I get conflicts. Do you perhaps know what else I am missing?

ERROR: There was a problem cherry-picking 7e3511e
Auto-merging MC/bin/o2dpg_qc_finalization_workflow.py
CONFLICT (content): Merge conflict in MC/bin/o2dpg_qc_finalization_workflow.py
Auto-merging MC/bin/o2dpg_sim_workflow.py
CONFLICT (content): Merge conflict in MC/bin/o2dpg_sim_workflow.py
error: could not apply 7e3511e... Fix for the MC aQC MCH+MFT task to avoid MID active cases (#1988)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
INFO: Trying to continue
INFO: Cherry-picking
SUCCESS: 1 (441b9cc)
SKIPPED: 0 ()
FAILED: 1 (7e3511e)
INFO: Cherry-picking has failed, resetting everything in package O2DPG

@sawenzel
Contributor

sawenzel commented Jun 3, 2025

@alcaliva : Did you already try to use this tool: https://github.com/AliceO2Group/O2DPG/blob/master/UTILS/get_cherrypick_commit_list.sh ? It's not perfect ... but might give you the precise list of commits. Let me know if it doesn't work.

@alcaliva
Collaborator

alcaliva commented Jun 3, 2025

I get:

To cherry-pick 7e3511e onto branch async-v1-02-10, we need to apply:
0:

Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
Deleting expired sessions... 2 completed.

it didn't help much :/

@alcaliva
Collaborator

alcaliva commented Jun 3, 2025

I tried to fix the conflicts myself but the build failed.

@sawenzel
Contributor

sawenzel commented Jun 4, 2025

> I tried to fix the conflicts myself but the build failed.

Weird. This is only changing stuff in O2DPG which should not affect compilation. How did the build fail?

@sawenzel
Contributor

sawenzel commented Jun 4, 2025

> I get:
>
> To cherry-pick 7e3511e onto branch async-v1-02-10, we need to apply:
> 0:
>
> Saving session...
> ...copying shared history...
> ...saving history...truncating history files...
> ...completed.
> Deleting expired sessions... 2 completed.
>
> it didn't help much :/

This could mean that the commit is already part of the release.
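
One possible way to test that hypothesis, sketched here under assumptions and not taken from the O2DPG tooling, is `git cherry <upstream> <head>`, which compares patches and prefixes a commit with `-` when an equivalent change already exists upstream. The helper names `parse_git_cherry` and `patch_already_on_branch` are hypothetical.

```python
import subprocess


def parse_git_cherry(output, sha_prefix):
    """Return the '+'/'-' marker `git cherry` printed for a commit, or None if not listed."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].startswith(sha_prefix):
            return parts[0]
    return None


def patch_already_on_branch(repo, branch, commit):
    """True if `git cherry` reports an equivalent patch for `commit` on `branch`."""
    # "-" means an equivalent patch is already in `branch`; "+" means it is not.
    # A commit that is directly reachable from `branch` is not listed at all (None).
    out = subprocess.run(
        ["git", "-C", repo, "cherry", branch, commit],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_git_cherry(out, commit) == "-"
```

A call such as `patch_already_on_branch(".", "async-v1-02-10", "7e3511e")` would then answer the question for this commit. Note that a conflicting cherry-pick suggests the patch is not present in identical form, so this check is only a first indication.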

@alcaliva
Collaborator

alcaliva commented Jun 4, 2025

I'm trying again. I never understand the Jenkins logs... do you?
This is what I got:

Started by user alcaliva
Building remotely on slc8-builder-2 (slc8_x86-64-light) in workspace /local/workspace/Build async reco O2 for CPU+GPU
[WS-CLEANUP] Deleting project workspace...
Triggering Build async reco O2 for CPU+GPU » slc9_x86-64
Build async reco O2 for CPU+GPU » slc9_x86-64 completed with result FAILURE
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
Started calculate disk usage of workspace
Finished Calculation of disk usage of workspace in 0 seconds
Finished: FAILURE

@chiarazampolli
Collaborator

Ciao @alcaliva ,
From yesterday, when it also failed, I found:

2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0: CMake Error at dependencies/FindO2GPU.cmake:175 (message):
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0:   AMD OpenCL 1.2 not available
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0: Call Stack (most recent call first):
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0:   dependencies/O2Dependencies.cmake:199 (find_package)
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0:   dependencies/CMakeLists.txt:13 (include)
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0:   CMakeLists.txt:65 (include)
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0: 
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0: 
2025-06-03@17:27:57:DEBUG:O2PDPSuite:O2:0: -- Configuring incomplete, errors occurred!
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Error while executing /local/workspace/BuildAsyncRecoO2/daily-tags.lyk4pTuFWC/SPECS/slc9_x86-64/O2/async-2024-ppRef-apass1-v1-1/build.sh on `alimetal01.cern.ch'.
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Log can be found in /local/workspace/BuildAsyncRecoO2/daily-tags.lyk4pTuFWC/BUILD/O2-latest/log
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Please upload it to CERNBox/Dropbox if you intend to request support.
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Build directory is /local/workspace/BuildAsyncRecoO2/daily-tags.lyk4pTuFWC/BUILD/O2-latest/O2.
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: 
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Build info:
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: OS: slc9_x86-64
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Using aliBuild from alibuild@1.17.18 recipes in alidist@7395dd4794
2025-06-03@17:27:57:ERROR:O2PDPSuite:O2:0: Build arguments: --pkgname=['O2PDPSuite'] --defaults=o2 --architecture=slc9_x86-64 --jobs=32 --plugin=legacy --disable=['mesos', 'MySQL', 'openmp', 'make', 'yacc-like', 'make', 'opengl', 'Xdevel'] --annotate={'O2PDPSuite': 'New tag for ppRef 2024 apass1'} --noSystem=*
+ builderr=1
+ echo 'Exiting with an error (1), not tagging'
Exiting with an error (1), not tagging
+ exit 1
+ err=1
+ rm -rf alidist daily-tags.lyk4pTuFWC mirror
+ exit 1
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // retry
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] emailext
Sending email to: alberto.caliva@cern.ch
[Pipeline] End of Pipeline
ERROR: script returned exit code 1
Finished: FAILURE

from: https://alijenkins.cern.ch/job/BuildAsyncRecoO2/719/console

I don't know if this is what you are looking for. In any case, I think @ktf and @singiamtel should have a look at this error.

Chiara

@alcaliva
Collaborator

alcaliva commented Jun 4, 2025

Where did you find this log? Can you post the link to it?

Apparently, the build system requires AMD OpenCL 1.2, which is not available.
I have no clue how to fix that. I hope that @ktf or @singiamtel can help.

@chiarazampolli
Collaborator

chiarazampolli commented Jun 4, 2025

Ciao @alcaliva ,

I posted the link above: https://alijenkins.cern.ch/job/BuildAsyncRecoO2/719/console,

from:

https://alijenkins.cern.ch/job/BuildAsyncRecoO2/719/

from:

https://alijenkins.cern.ch/job/Build%20async%20reco%20O2%20for%20CPU+GPU/272/ARCHITECTURE=slc9_x86-64/console

Note that there is another build ongoing, so I was not sure which one you were interested in. I took the latest one that finished (failing) yesterday.

@alcaliva
Collaborator

alcaliva commented Jun 4, 2025

Sorry, I had missed it. Yes, the other build is from today, when I retried (it failed again).
Thanks for your help

@singiamtel
Collaborator

@davidrohr this looks like it stopped working after yesterday's slc9-gpu-builder redeploy. How can we fix it?

Maybe we need to pin the OpenCL version?

[root@alimetal04 ~]# docker run --rm -it registry.cern.ch/alisw/slc9-gpu-builder clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3635.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0

@davidrohr
Collaborator

What O2 version are you building? This asks for AMD OCL 1.2, which we removed some months ago.
Perhaps the builder was only now updated with it removed?

@davidrohr
Collaborator

Could you try setting these two environment variables during the build? They should disable the unnecessary OpenCL 1.2 backend:
DISABLE_GPU=1
ALIBUILD_ENABLE_HIP=1
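
Outside of Jenkins, the same two variables could be passed to a build process roughly as in this sketch. The helper name `build_env` is hypothetical, and the actual build command is not given in this thread, so it is left as a placeholder.

```python
import os


def build_env(base=None):
    """Return a copy of the environment with the two variables from this comment set."""
    env = dict(os.environ if base is None else base)
    env["DISABLE_GPU"] = "1"
    env["ALIBUILD_ENABLE_HIP"] = "1"
    return env


# Usage (placeholder command; substitute whatever actually launches the build):
# import subprocess
# subprocess.run(build_command, env=build_env(), check=True)
```

Working on a copy keeps the calling process's own environment untouched, so the setting applies only to the build that needs it.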

@davidrohr
Collaborator

@alcaliva : Did you manage to retry? Do you know how to set env variables in Jenkins? There is an extra field to set them:
https://alijenkins.cern.ch/job/DailyBuilds/job/WeeklyO2Release/configure

@alcaliva
Collaborator

alcaliva commented Jun 6, 2025

Retrying now... I'll let you know.

@alcaliva
Collaborator

alcaliva commented Jun 6, 2025

@davidrohr, your trick worked! The new tag is ready.
Thanks!

@davidrohr
Collaborator

@alcaliva : ok, great. Then you will have to keep using this trick for building the old O2 tags. But please remember to remove it once you start building new O2 tags :).
