@agirault commented Mar 31, 2025

Context

  • The current CUDA SASS/PTX list is hardcoded based on an error-prone versioning heuristic. Case in point:
    • Blackwell sm 10.1 is missing.
    • Blackwell sm 10.0 is supported starting with CUDA 12.8, not 12.6:
      make[1]: Entering directory '***/gdrcopy/tests'
      /usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -gencode arch=compute_60,code=compute_60 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_62,code=compute_62 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_72,code=compute_72 -gencode arch=compute_75,code=compute_75 -gencode arch=compute_80,code=compute_80 -gencode arch=compute_86,code=compute_86 -gencode arch=compute_87,code=compute_87 -gencode arch=compute_90,code=compute_90 -gencode arch=compute_100,code=compute_100 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_100,code=sm_100
      nvcc fatal   : Unsupported gpu architecture 'compute_100'
      make[1]: *** [Makefile:54: pplat.o] Error 1
  • The full arch list is used for both code (SASS) and compute (PTX). For PTX, only the latest arch is needed, since PTX can be JIT-compiled for newer GPUs.

Changes

  • Use the sm list from nvcc --list-gpu-code directly when available
  • Fix the Blackwell sm list and version compatibility
  • Consolidate the compute and sm lists into a single variable
  • Only build PTX for the last supported arch
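
The changes above can be illustrated with a minimal sketch. It is not the code from this PR: `gencode_flags` is a hypothetical helper name, and it assumes the caller passes `sm_XX` names in ascending order (for example, as printed by `nvcc --list-gpu-code`, whose ordering is assumed here rather than guaranteed). It emits SASS for every arch and PTX only for the last one.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): build nvcc -gencode flags from a
# list of sm_XX names given as arguments, assumed sorted in ascending order.
gencode_flags() {
    flags=""
    last=""
    for sm in "$@"; do
        arch="${sm#sm_}"    # strip the prefix: "sm_90" -> "90"
        # SASS (binary code) for every listed arch
        flags="$flags -gencode arch=compute_${arch},code=sm_${arch}"
        last="$arch"
    done
    if [ -n "$last" ]; then
        # PTX only for the last (assumed highest) arch
        flags="$flags -gencode arch=compute_${last},code=compute_${last}"
    fi
    # Drop the leading space before printing
    echo "${flags# }"
}
```

With a toolkit that supports the option, one might call it as `gencode_flags $(nvcc --list-gpu-code)` and fall back to a hardcoded list otherwise. Taking the list as arguments keeps the flag construction independent of how the list was obtained.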

Signed-off-by: Alexis Girault <agirault@nvidia.com>
- 10.0 is from CTK 12.8+
- 10.1 was missing

Signed-off-by: Alexis Girault <agirault@nvidia.com>
The list was the same

Signed-off-by: Alexis Girault <agirault@nvidia.com>
COMPUTE_LIST="$COMPUTE_LIST 120"
SM_LIST="$SM_LIST 120"
# Add Blackwell (10.0, 10.1, 12.0) if CUDA >= 12.8
if [ "$CUDA_VERSION_MAJOR" -ge 12 ] && [ "$CUDA_VERSION_MINOR" -ge 8 ]; then
Member

Does this check work for CUDA 13.0?

Author

same as above

fi

# Add Ada Lovelace (8.9) if CUDA >= 11.8
if [ "$CUDA_VERSION_MAJOR" -ge 11 ] && [ "$CUDA_VERSION_MINOR" -ge 8 ]; then
Member

Does this work on CUDA 12.0?

Author

I doubt it would. That wasn't in the scope of this MR, but I can edit these checks to compare the major version first.
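
The concern in both review threads is that testing major and minor independently rejects valid versions: CUDA 13.0 fails the `-ge 8` minor test for the 12.8 threshold, and CUDA 12.0 fails it for the 11.8 threshold. A major-first comparison could be sketched as follows. `cuda_version_ge` is a hypothetical helper name, not code from this PR or from the merged fix.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeed when the detected CUDA
# version is at least the required one, comparing the major version first.
#   $1, $2: detected major and minor
#   $3, $4: required major and minor
cuda_version_ge() {
    # A newer major always satisfies the requirement; the minor only
    # matters when the majors are equal.
    [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}
```

The Blackwell check would then read `if cuda_version_ge "$CUDA_VERSION_MAJOR" "$CUDA_VERSION_MINOR" 12 8; then`, and the Ada Lovelace check would use `11 8` in the same way.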

@pakmarkthub
Collaborator

@agirault The fix has been merged to R2.5 branch (https://github.com/NVIDIA/gdrcopy/tree/R2.5)
