Multi GPU support #290

Open
@saswat0

Problem Description

The current implementation does not take advantage of servers with multiple GPUs. When several cards are present, each with limited VRAM, running CTGAN fails with an out-of-memory error.

The trace below is from a job run on a single T4 GPU (common on cloud servers). The real dataset had 26 columns and 20k rows.

OutOfMemoryError: CUDA out of memory. Tried to allocate 3.46 GiB (GPU 0; 14.76 GiB total capacity; 10.49 GiB already allocated; 621.75 MiB free; 13.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Expected behavior

CTGAN should be able to leverage PyTorch's DataParallel module so that each batch is split across the available GPUs (data parallelism), making larger batch sizes feasible on cards with limited VRAM.
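
To make the request concrete, here is a minimal sketch of what wrapping a network in `torch.nn.DataParallel` could look like. `ToyGenerator` and `wrap_for_multi_gpu` are hypothetical names used only for illustration; they are not CTGAN's actual classes or API, and the layer sizes are arbitrary (only `data_dim=26` echoes the column count above, ignoring any column transformation CTGAN applies).

```python
import torch
from torch import nn


class ToyGenerator(nn.Module):
    """Hypothetical stand-in for a GAN generator; not CTGAN's actual class."""

    def __init__(self, noise_dim: int, data_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, noise: torch.Tensor) -> torch.Tensor:
        return self.net(noise)


def wrap_for_multi_gpu(module: nn.Module) -> nn.Module:
    """Wrap in nn.DataParallel only when more than one CUDA device is visible."""
    if torch.cuda.is_available() and torch.cuda.device_count() > 1:
        # DataParallel replicates the module on every visible GPU and splits
        # each input batch along dim 0; outputs are gathered on device 0.
        return nn.DataParallel(module).cuda()
    # Single GPU or CPU-only machine: just move the module, no wrapper.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return module.to(device)


generator = wrap_for_multi_gpu(ToyGenerator(noise_dim=128, data_dim=26))
device = next(generator.parameters()).device

# A batch of 500 noise vectors; with N GPUs each replica sees about 500 / N rows.
noise = torch.randn(500, 128, device=device)
fake_rows = generator(noise)
print(fake_rows.shape)  # torch.Size([500, 26])
```

Note that `DataParallel` keeps a full copy of the model on every GPU and only divides the batch, so it helps when per-batch activations dominate memory use rather than the model weights. The PyTorch documentation also recommends `DistributedDataParallel` over `DataParallel` for multi-GPU training, which may be worth considering if this feature is picked up.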

Labels: feature request (request for a new feature); pending review (this issue needs to be further reviewed, so work cannot be started)
