-
Notifications
You must be signed in to change notification settings - Fork 62
Handle CUDA 13 CI compatibility in gpu-app-collection #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,6 +13,10 @@ all: pannotia rodinia_2.0-ft proxy-apps microbench rodinia-3.1 ispass-2009 pol | |
| endif | ||
| endif | ||
| ci: rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples | ||
| # Keep the repository pinned to the legacy cuda-samples revision for older toolkits, | ||
| # but use the upstream CUDA 13.1 samples when building with CUDA 13. | ||
| CUDA_SAMPLES_CUDA13_REF := 4f735616ba599fe93cc2c6c85dcb4369260f9643 | ||
| CUDA_SAMPLES_PINNED_REF := $(shell git -C .. rev-parse HEAD:src/cuda/cuda-samples 2>/dev/null) | ||
| accelwattch: accelwattch_validation accelwattch_hw_power accelwattch_ubench | ||
| accelwattch_validation: rodinia-3.1_accelwattch_validation parboil_accelwattch_validation cutlass cuda_samples-11.0_accelwattch_validation | ||
| accelwattch_hw_power: rodinia-3.1_hw_power parboil_hw_power cuda_samples-11.0_hw_power | ||
|
|
@@ -518,8 +522,40 @@ mlperf_training: | |
|
|
||
| cuda_samples: | ||
| mkdir -p $(BINDIR)/$(BINSUBDIR)/ | ||
| mkdir -p ./cuda/cuda-samples/build && cd ./cuda/cuda-samples/build && cmake .. && $(MAKE) | ||
| find $(GPUAPPS_ROOT)/src/cuda/cuda-samples/build/Samples -type f -executable -exec mv {} "$(BINDIR)/$(BINSUBDIR)/" \; ; | ||
| set -eu; \ | ||
| repo_dir=./cuda/cuda-samples; \ | ||
| desired_ref="$(CUDA_SAMPLES_PINNED_REF)"; \ | ||
| if [ -z "$$desired_ref" ]; then \ | ||
| desired_ref=$$(git -C "$$repo_dir" rev-parse HEAD); \ | ||
| fi; \ | ||
| source_dir="$$repo_dir"; \ | ||
| build_dir="$$repo_dir/build-$(CUDA_VERSION)"; \ | ||
| export_dir=; \ | ||
| cleanup_cuda_samples() { \ | ||
| if [ -n "$$export_dir" ] && [ -d "$$export_dir" ]; then \ | ||
| rm -rf "$$export_dir"; \ | ||
| fi; \ | ||
| }; \ | ||
| trap 'cleanup_cuda_samples' EXIT; \ | ||
| if [ ${CUDA_VERSION_MAJOR} -ge 13 ]; then \ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we have the stuff in setup_environment, do we still need this?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that setup_environment is ok for normal Makefile builds. cuda_samples is a separate case, I believe cause it hardcodes 'CMAKE_CUDA_ARCHITECTURES' not sure. ^ could be completely wrong about this.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok - this is a lot of code which I think is mostly setting up enviroment stuff for samples.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it works, we can keep it - @JRPan - who originally integrated samples? Maybe we can check with them? |
||
| desired_ref="$(CUDA_SAMPLES_CUDA13_REF)"; \ | ||
| fi; \ | ||
| original_ref=$$(git -C "$$repo_dir" rev-parse HEAD); \ | ||
| if [ "$$original_ref" != "$$desired_ref" ]; then \ | ||
| export_dir=./cuda/cuda-samples-export-$(CUDA_VERSION); \ | ||
| build_dir="$$repo_dir/build-$(CUDA_VERSION)-compat"; \ | ||
| git -C "$$repo_dir" rev-parse --verify "$$desired_ref^{commit}" >/dev/null; \ | ||
| rm -rf "$$export_dir"; \ | ||
| mkdir -p "$$export_dir"; \ | ||
| archive_file="$$export_dir/source.tar"; \ | ||
| git -C "$$repo_dir" archive --format=tar "$$desired_ref" > "$$archive_file"; \ | ||
| tar -xf "$$archive_file" -C "$$export_dir"; \ | ||
| rm -f "$$archive_file"; \ | ||
| source_dir="$$export_dir"; \ | ||
| fi; \ | ||
| cmake -S "$$source_dir" -B "$$build_dir"; \ | ||
| cmake --build "$$build_dir"; \ | ||
| find "$$build_dir"/Samples -type f -executable -exec mv {} "$(BINDIR)/$(BINSUBDIR)/" \; ; | ||
|
|
||
| pytorch_examples: | ||
| mkdir -p $(BINDIR)/$(BINSUBDIR)/ | ||
|
|
@@ -711,10 +747,11 @@ clean_pytorch_examples: | |
| rm -f $(BINDIR)/$(BINSUBDIR)/inference_vae | ||
|
|
||
| clean_cuda_samples: | ||
| $(MAKE) clean -C ./cuda/cuda-samples/build | ||
| find ./cuda/cuda-samples -maxdepth 1 -type d \( -name 'build' -o -name 'build-*' \) -exec rm -rf {} + | ||
| find ./cuda -maxdepth 1 -type d -name 'cuda-samples-export-*' -exec rm -rf {} + | ||
|
|
||
| clean_huggingface: | ||
| rm -rf $(BINDIR)/$(BINSUBDIR)/huggingface | ||
|
|
||
| clean_GPU_Microbenchmark: | ||
| $(MAKE) clean -C ./cuda/GPU_Microbenchmark | ||
| $(MAKE) clean -C ./cuda/GPU_Microbenchmark | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,6 +46,15 @@ struct GpuConfig | |
| unsigned L2_BANKS = 0; // L2 Cache Banks (LTCs) | ||
| }; | ||
| inline GpuConfig config; | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it's 0, the uBench will be wrong. Why is it failing? If the syntax has changed, just update it to use the new syntax. There must still be a way to query device attributes.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait, I don't think I changed the FBP_COUNT = 0 and L2_BANKS = 0 I think in the simulator path, they are overwritten from gpgpusim.config. In the hardware path filled with queryGrInfo().
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These apps also run on real GPUs, not just in gpgpu-sim. The ubench uses device query to determine kernel parameters. So these value must be correct (matching the HW) for these kernels to perform as expected. This is not the config in the simulation.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it seems like a pre-existing issue in commit 4becbe3. The only changes I made to stuff like this in the file were to the clock rate, memory clock rate, and memory bus width.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I think I misunderstood your point, yes, that's true. When I go over this, I can make that fix even though it was pre-existing.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So @JRPan - are you saying we don't need this function? Isn't the real issue that we are returning 0s for values that should have an attribute? Said another way, why aren't we seeing success with these values? We can just fill them in from the gpgpusim config like the other values. But maybe I am misunderstanding this. |
||
| inline int getDeviceAttributeOrZero(cudaDeviceAttr attr, int deviceID) | ||
| { | ||
| int value = 0; | ||
| if (cudaDeviceGetAttribute(&value, attr, deviceID) != cudaSuccess) | ||
| return 0; | ||
| return value; | ||
| } | ||
|
|
||
| // Parses short flags like --sm 80 into a GpuConfig object | ||
| inline void parseGpuConfigArgs(int argc, char *argv[]) | ||
| { | ||
|
|
@@ -280,8 +289,12 @@ inline unsigned initializeDeviceProp(unsigned deviceID, int argc, char *argv[]) | |
| cudaSetDevice(deviceID); | ||
| cudaGetDeviceProperties(&deviceProp, deviceID); | ||
|
|
||
| int clockRateKHz; | ||
| cudaDeviceGetAttribute(&clockRateKHz, cudaDevAttrClockRate, deviceID); | ||
| int clockRateKHz = | ||
| getDeviceAttributeOrZero(cudaDevAttrClockRate, deviceID); | ||
| int memoryClockRateKHz = | ||
| getDeviceAttributeOrZero(cudaDevAttrMemoryClockRate, deviceID); | ||
| int memoryBusWidthBits = | ||
| getDeviceAttributeOrZero(cudaDevAttrGlobalMemoryBusWidth, deviceID); | ||
|
|
||
| // core stats | ||
| config.SM_NUMBER = deviceProp.multiProcessorCount; | ||
|
|
@@ -310,9 +323,12 @@ inline unsigned initializeDeviceProp(unsigned deviceID, int argc, char *argv[]) | |
|
|
||
| // memory | ||
| config.MEM_SIZE = deviceProp.totalGlobalMem; | ||
| config.MEM_CLK_FREQUENCY = deviceProp.memoryClockRate * 1e-3f; | ||
| config.MEM_BITWIDTH = deviceProp.memoryBusWidth; | ||
| config.CLK_FREQUENCY = clockRateKHz * 1e-3f; | ||
| if (memoryClockRateKHz > 0) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is with all the zeros?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to clarify my intent: the > 0 means if CUDA returns positive, we overwrite the default. If it returns 0, it's because the query is unsupported; it leaves it alone. I made this choice because before we used deviceProp.memoryClockRate and deviceProp.memoryBusWidth, this needed to be switched to cudaDeviceGetAttribute() I don't think its a correctness issue but yea its not the most readable I can change it if you'd like.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JRPan - Should we just be asserting if we don't get good values here? @SamMausberg - the issue is that some of the uBenchmarks need good values here to work properly. If we give them back something conforrect, the uBench will not test the thing we think it is supposed to test. |
||
| config.MEM_CLK_FREQUENCY = memoryClockRateKHz * 1e-3f; | ||
| if (memoryBusWidthBits > 0) | ||
| config.MEM_BITWIDTH = memoryBusWidthBits; | ||
| if (clockRateKHz > 0) | ||
| config.CLK_FREQUENCY = clockRateKHz * 1e-3f; | ||
|
|
||
| // Get FBP_COUNT and L2_BANKS from NVIDIA RM API | ||
| config.FBP_COUNT = queryGrInfo(NV2080_CTRL_GR_INFO_INDEX_LITTER_NUM_FBPS); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,12 @@ SRC = mbarrier.cu | |
|
|
||
| EXE = mbarrier | ||
|
|
||
| ARCH?=sm_80 sm_90a sm_100a sm_101 sm_120 | ||
| ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know why all this code got added - @SamMausberg, doesn't this just do the exact same thing as the single-line baseline? A bunch of the makefiles have this - please delete unless there is a good reason to have it. |
||
| ARCH_DEFAULT := sm_80 sm_90a sm_100a sm_110 sm_120 | ||
| else | ||
| ARCH_DEFAULT := sm_80 sm_90a sm_100a sm_101 sm_120 | ||
| endif | ||
| ARCH ?= $(ARCH_DEFAULT) | ||
| # Unset the CUDA_CPPFLAGS which is set based on CUDA version | ||
| CUDA_CPPFLAGS= | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Samples should build it; it is an NVIDIA collection and should work at different CUDA versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails with nvcc fatal: Unsupported gpu architecture 'compute_50' when in CUDA 13.1.1 environment.
How would you like me to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm ok - they probably dropped support for compute_50 in 13 - I am a little shocked they didn't update the samples to build properly in 13, unless they are just lagging on updating samples.
Are you 100% sure that the latest cuda_samples fails to build with cuda13? Maybe we are syncing to an older version of samples and we need to update it?