This repository was archived by the owner on Mar 1, 2025. It is now read-only.

RuntimeError: CUDA error: an illegal memory access was encountered #231

@eddiewrc

Hi, first of all, thanks for sharing this library with all of us!
Unfortunately, I am encountering a few problems while trying to run it. In particular, I tried to build the following network, which is supposed to take as input a sparse tensor of shape (8192, 16384). Part of it is commented out because I was trying to locate the origin of the problem, and apparently the error already occurs with just the first Convolution module (so I have commented out the rest for now).
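One thing that may be worth ruling out first is whether every input coordinate actually lies inside the declared (8192, 16384) spatial size. The sketch below is a plain-Python check, not part of SparseConvNet; `out_of_bounds_rows` is a hypothetical helper, and it assumes each coordinate row stores the two spatial coordinates first, optionally followed by a batch index.

```python
def out_of_bounds_rows(coords, spatial_size):
    """Return the indices of rows whose spatial coordinates fall outside spatial_size.

    coords: iterable of integer rows. The first len(spatial_size) entries are
    treated as spatial coordinates (assumption); trailing entries such as a
    batch index are ignored.
    """
    bad = []
    for i, row in enumerate(coords):
        spatial = row[:len(spatial_size)]
        if any(c < 0 or c >= s for c, s in zip(spatial, spatial_size)):
            bad.append(i)
    return bad


size = (8192, 16384)
# Made-up sample rows: (x, y, batch_index)
sample = [(0, 0, 0), (8191, 16383, 0), (8192, 5, 1), (3, -1, 1)]
print(out_of_bounds_rows(sample, size))  # -> [2, 3]
```

With a tensor of coordinates, the same check could be run on `coord.tolist()` just before the input layer is called.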

The error I get is pasted below. The GPU is a Quadro GV100, the system CUDA version is 11.4, and the PyTorch build is 1.11.0 py3.9_cuda11.3_cudnn8.2.0_0.

import torch as t
import sparseconvnet as scn

class HCSparseConvNet1(t.nn.Module):
    def __init__(self, featSize, numOut, size, name = "NN"):
        super(HCSparseConvNet1, self).__init__()
        print(size)
        # 2-D input of spatial size `size`, input mode 2
        self.inputLayer = scn.InputLayer(2, size, 2)

        # Only the first Convolution is active; the rest is commented out while debugging
        self.sparseModel = scn.Sequential(scn.Convolution(2,1,4,8,8, True))#, scn.Convolution(2,4,8,8,4, True), scn.LeakyReLU(), scn.Convolution(2,8,16,3,2,True), scn.LeakyReLU(), scn.Convolution(2,16,16, 3,2, True), scn.SparseToDense(2, 16))#, scn.MaxPooling(2,16,8), scn.Convolution(2, 10,10,64,32, False))
        self.out1 = t.nn.Sequential(t.nn.GroupNorm(1,16), t.nn.Tanh(), t.nn.Conv2d(16,8,3,2), t.nn.GroupNorm(1,8), t.nn.Tanh(), t.nn.Conv2d(8,4,3,1, padding=1), t.nn.GroupNorm(1,4), t.nn.Tanh())
        #self.spatial_size= self.sparseModel.input_spatial_size(size)
        self.final = t.nn.Sequential(t.nn.Linear(7812, 100), t.nn.LayerNorm(100), t.nn.Tanh(), t.nn.Linear(100, numOut))

    def forward(self, x, batchSize):
        #print(x[0].size(), x[1].size())
        x = self.inputLayer(x)
        x = self.sparseModel(x)
        print(x)
        #x = self.out1(x)
        #print(x.size())
        #x = self.final(x.view(batchSize, -1))
        return x
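As a sanity check on the layer sizes, the spatial dimensions of the full stack (including the commented-out layers) can be worked out by hand. This is only a sketch: it assumes that each `scn.Convolution` follows the usual unpadded strided arithmetic, `(size - kernel) // stride + 1`, which I have not confirmed for the library. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = (8192, 16384)
# (filter_size, filter_stride) of the four sparse convolutions in the listing
for kernel, stride in [(8, 8), (8, 4), (3, 2), (3, 2)]:
    size = tuple(conv_out(s, kernel, stride) for s in size)
    print(size)

# Dense tail: Conv2d(16, 8, 3, 2) followed by Conv2d(8, 4, 3, 1, padding=1)
h, w = size
h, w = conv_out(h, 3, 2), conv_out(w, 3, 2)
h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
flat = 4 * h * w  # 4 output channels
print(flat)  # -> 7812
```

Under that assumption the flattened size comes out as 7812, which matches the `Linear(7812, 100)` layer, so the intended shapes at least look consistent with each other.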

The error:

Traceback (most recent call last):
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 144, in <module>
    sys.exit(main(sys.argv))
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 94, in main
    wrapper.fit(X, Y, device, epochs=50, batch_size = 11, LOG=False)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 200, in fit
    yp = self.model.forward([coord, features], batchSize)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 58, in forward
    print(x)
  File "/home/eddiewrc/SparseConvNet/sparseconvnet/sparseConvNetTensor.py", line 58, in __repr__
    'features=' + repr(self.features) + \
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor.py", line 305, in __repr__
    return torch._tensor_str._str(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 434, in _str
    return _str_intern(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 409, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 264, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 296, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
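Since CUDA reports kernel errors asynchronously, the `print(x)` frame in the traceback is probably only where the error surfaced, not where it was caused. A minimal sketch of turning on the synchronous mode that the message suggests is below. The variable has to be set before CUDA is initialized, so the safest place is at the very top of the script, before `torch` is imported; `launch_blocking_enabled` is a hypothetical helper used only to confirm the setting.

```python
import os

# Set this before importing torch so that it is in place when CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def launch_blocking_enabled() -> bool:
    """Report whether synchronous kernel launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print(launch_blocking_enabled())  # -> True

# import torch  # import torch (and sparseconvnet) only after this point
```

The same effect can be had from the shell by prefixing the command with `CUDA_LAUNCH_BLOCKING=1`. With blocking launches, the traceback should point at the call that actually failed.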
