
Current instructions don't work for installation. #1092


Open
mullenba opened this issue May 17, 2025 · 1 comment


mullenba commented May 17, 2025

Working from a fresh Ubuntu 24.04 install, I can't get nvidia-container-toolkit working.

I've tried many times, and no matter which instructions I follow, I get:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
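The error says nvidia-container-cli could not load libnvidia-ml.so.1. A quick way to narrow this down is to check whether the host's dynamic linker cache lists that library at all. The sketch below assumes `ldconfig` is on the PATH; `lib_in_cache` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper (not part of any NVIDIA tool): succeeds if the library
# name given as $1 appears in `ldconfig -p`-style text read from stdin.
lib_in_cache() {
  grep -qF -- "$1"
}

# Query the host's linker cache; if ldconfig is missing, this reports "not listed".
if ldconfig -p 2>/dev/null | lib_in_cache "libnvidia-ml.so.1"; then
  echo "host linker cache lists libnvidia-ml.so.1"
else
  echo "host linker cache does not list libnvidia-ml.so.1"
fi
```

If the library is listed on the host and the container still fails with the same message, the driver install is probably not the cause, and the Docker installation itself is worth a closer look.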

nvidia-smi shows everything is fine with the driver.

Sat May 17 19:28:26 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:08:00.0 Off |                  N/A |
| 53%   64C    P0            138W /  420W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:09:00.0 Off |                  Off |
| 30%   43C    P0             64W /  450W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:42:00.0 Off |                  Off |
| 30%   46C    P0             56W /  450W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        Off |   00000000:43:00.0 Off |                  N/A |
| 30%   51C    P0            113W /  350W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Here's the process I'm using, starting from scratch.

# Remove old drivers

sudo apt-get --purge remove -y "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"

sudo apt-get --purge remove -y "*nvidia*" "libxnvctrl*" 'libnvidia-.*' 

sudo /usr/bin/nvidia-uninstall

sudo apt-get autoremove -y

# Install base drivers

sudo apt install -y build-essential gcc dirmngr ca-certificates software-properties-common apt-transport-https dkms curl

sudo apt install -y nvidia-driver-550-server

sudo apt install -y nvidia-utils-550-server

sudo apt install -y nvidia-cuda-toolkit

(Reboot)

sudo apt-get install -y docker-ce

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker

sudo systemctl restart docker
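After the steps above, one way to verify the setup is to confirm that Docker reports an `nvidia` runtime and then run a small GPU workload. This is a sketch, not a definitive procedure: `has_nvidia_runtime` is a hypothetical helper name, and the commands assume the current user can reach the Docker daemon (otherwise prefix the `docker` calls with `sudo`).

```shell
# Hypothetical helper: succeeds if Docker's runtime list (JSON on stdin)
# contains an entry named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
    # Sample workload: runs nvidia-smi inside a plain ubuntu container.
    docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi \
      || echo "sample workload failed"
  else
    # Also reached when `docker info` cannot contact the daemon.
    echo "nvidia runtime not listed (or the daemon was unreachable; try with sudo)"
  fi
else
  echo "docker not found on PATH"
fi
```

On a working setup the sample workload should print the same kind of nvidia-smi table as on the host.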

I think I've tried every version of the drivers from 535-570 and their corresponding CUDA versions. Nothing works.

mullenba (Author) commented:

I figured out the issue. I think that during the Ubuntu install I told it to install Docker. That installed Docker through snap, and the snap version does not work with nvidia-container-toolkit. Removing the snap Docker fixed the issue.

sudo snap disable docker

sudo snap remove docker
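To check whether a machine is affected before removing anything, it may help to see where the `docker` command comes from. A minimal sketch, assuming snap exposes its commands under `/snap/bin` (the usual location on Ubuntu); `is_snap_path` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given path lies under /snap,
# where snap-installed commands are normally exposed.
is_snap_path() {
  case "$1" in
    /snap/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check which docker binary comes first on PATH.
docker_path="$(command -v docker 2>/dev/null || true)"
if [ -z "$docker_path" ]; then
  echo "docker not found on PATH"
elif is_snap_path "$docker_path"; then
  echo "docker on PATH is the snap version: $docker_path"
else
  echo "docker on PATH is not from snap: $docker_path"
fi

# A snap copy can be installed alongside the apt package, so also ask snap directly.
if command -v snap >/dev/null 2>&1 && snap list docker >/dev/null 2>&1; then
  echo "a docker snap is installed"
fi
```

The PATH check only shows the first `docker` found, which is why the sketch also queries `snap list docker`.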
