CUDA Error when pytorch distribution training...

Hi, thanks for your contribution. When I use multi-GPU training with `torch.nn.DataParallel`, I always get `RuntimeError: CUDA error: invalid device function`. Here is my test code:

```
import torch
from spatial_correlation_sampler import SpatialCorrelationSampler

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)

model = torch.nn.DataParallel(correlation_sampler, device_ids=[0,1,2]).cuda()

out = model(input1, input2)

print(out.shape)
```

My environment is:
```
Ubuntu 18.04.5 LTS
PyTorch -- 1.6.0
torchvision -- 0.7.0
gcc -- 7.5.0
CUDA -- 10.2
```
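
The list above does not include the GPU models. In case it helps, here is a minimal sketch that could be used to collect per-GPU details for a report like this. `invalid device function` is often reported when a CUDA kernel was not compiled for the compute capability of the GPU it runs on, but whether that applies here is only an assumption. The helper name `gpu_report` is hypothetical, and the script handles the case where PyTorch or CUDA is unavailable.

```python
# Sketch: collect per-GPU details to attach to a bug report.
# Assumption: torch may be missing or built without CUDA, so both cases are handled.
try:
    import torch
except ImportError:  # keep the helper usable without PyTorch installed
    torch = None


def gpu_report():
    """Return a list of human-readable lines describing the CUDA setup."""
    if torch is None:
        return ["PyTorch is not installed"]
    # torch.version.cuda is None for CPU-only builds.
    lines = [f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}"]
    if not torch.cuda.is_available():
        lines.append("CUDA is not available")
        return lines
    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        major, minor = torch.cuda.get_device_capability(index)
        lines.append(f"GPU {index}: {name}, compute capability {major}.{minor}")
    return lines


print("\n".join(gpu_report()))
```

If the GPUs listed in `device_ids` report different compute capabilities, that would be worth mentioning alongside the versions above.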

The full error output is:
```
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    out = model(input1, input2)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
    return self.gather(outputs, self.output_device)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/liming/anaconda3/envs/motionsqueeze/lib/python3.8/site-packages/torch/cuda/comm.py", line 166, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: CUDA error: invalid device function
[1]    20866 segmentation fault (core dumped)  python test.py
```

Without `DataParallel` (single-GPU training), there is no CUDA error, but the script still ends with a segmentation fault after printing the output shape:
```
torch.Size([1, 1, 1, 3, 3])
[1]    22742 segmentation fault (core dumped)  python test.py
```
