Proposal: putting `libcuda.so` in our Gentoo Prefix installation.

Some important info about the CUDA software stack, and how it could change (subject to testing):

As most of you know, we have 3 layers:

1. the kernel modules (`/lib/modules/$(uname -r)/extra/nvidia.ko.xz` and related)
2. the user-mode driver component used to run CUDA applications (`/usr/lib64/nvidia/libcuda.so`)
3. the CUDA toolkit (from `module load cuda`)

Up to now we assumed that layers 1 and 2 are tightly coupled. But an NVIDIA employee in the EasyBuild Slack clarified that they are not: `libcuda.so.1` is forward compatible, and the newest libcuda (465.x) is compatible with kernel drivers going all the way back to 418.40.04+.

Note that there are in fact four maintained driver families: the long-term support ones (R418, EOL Mar 2022; R450, EOL Jul 2023) and the short-term ones (R460, EOL Jan 2022; R465). Béluga and Graham are running an R460 version; Cedar is at R455, which is no longer supported.

This means that we could put the newest libcuda in cvmfs, and the sysadmins would only need to worry about the kernel modules. This will need to be tested, of course (which we can do via `LD_LIBRARY_PATH` and/or the cvmfs-dev repo).
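A minimal sketch of how such an `LD_LIBRARY_PATH` test could look, assuming Python with `ctypes` is available on a GPU node. The helper names (`env_with_libcuda`, `decode_cuda_version`, `probe_libcuda`) are made up for illustration; `cuInit` and `cuDriverGetVersion` are the CUDA driver API entry points, and the idea is that `cuInit` failing would indicate that the user-mode library and the kernel module do not work together.

```python
import ctypes
import os


def env_with_libcuda(libcuda_dir, base_env=None):
    """Return a copy of the environment with libcuda_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = libcuda_dir + (":" + old if old else "")
    return env


def decode_cuda_version(value):
    """cuDriverGetVersion encodes the version as 1000*major + 10*minor."""
    return value // 1000, (value % 1000) // 10


def probe_libcuda(path="libcuda.so.1"):
    """Load libcuda, initialise it, and return the (major, minor) CUDA version it supports.

    Only meaningful on a node with an NVIDIA GPU and kernel module loaded.
    """
    lib = ctypes.CDLL(path)  # resolved via LD_LIBRARY_PATH unless path is absolute
    rc = lib.cuInit(0)
    if rc != 0:  # CUDA_SUCCESS is 0
        raise RuntimeError(f"cuInit failed with CUresult {rc}")
    version = ctypes.c_int(0)
    rc = lib.cuDriverGetVersion(ctypes.byref(version))
    if rc != 0:
        raise RuntimeError(f"cuDriverGetVersion failed with CUresult {rc}")
    return decode_cuda_version(version.value)
```

To try a candidate library, one could run a small script that calls `probe_libcuda()` in a subprocess with `env=env_with_libcuda(...)` pointing at the directory holding the newer libcuda, or pass an absolute path to `probe_libcuda` directly.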

Once libcuda is in place, all CUDA toolkit modules, including 11.3, can be used on all clusters irrespective of the kernel driver (as long as it is >= R418.40.04), and the present Lmod check could become obsolete.
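The only remaining condition would be the minimum kernel driver version. As a rough illustration (this is not the existing Lmod check, and the function names are hypothetical), the comparison could be sketched as below. It assumes the kernel module reports its version on the `NVRM version:` line of `/proc/driver/nvidia/version`.

```python
import re

# Oldest kernel driver the newest libcuda is said to work with (418.40.04).
MIN_KERNEL_DRIVER = (418, 40, 4)


def extract_version(nvrm_line):
    """Pull the first dotted version number (e.g. '460.32.03') out of a line of text."""
    match = re.search(r"\d+\.\d+(?:\.\d+)?", nvrm_line)
    if match is None:
        raise ValueError("no driver version found")
    return match.group(0)


def kernel_driver_ok(version, minimum=MIN_KERNEL_DRIVER):
    """Compare a dotted driver version string against the minimum, component by component."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= minimum


def installed_kernel_driver(path="/proc/driver/nvidia/version"):
    """Read the loaded kernel module's version (assumes the NVRM line format)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("NVRM version:"):
                return extract_version(line)
    raise RuntimeError("NVRM version line not found")
```

On a GPU node, `kernel_driver_ok(installed_kernel_driver())` would then say whether the node meets the minimum.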

As for kernel modules, clusters could consider staying with an R450 version: with libcuda in cvmfs, the kernel driver no longer needs to be upgraded to R460 to stay compatible with newer CUDA toolkit versions.

See the driver lifecycle documentation:
https://docs.nvidia.com/datacenter/tesla/drivers/#lifecycle
and the CUDA compatibility documentation:
https://docs.nvidia.com/deploy/cuda-compatibility/index.html#cuda-compatibility-platform
