
Faiss IVFPQ (GPU): enabling RAFT causes /dev/nvidiactl ioctl race #4272


Open
neezeeyee opened this issue Apr 3, 2025 · 6 comments
@neezeeyee

neezeeyee commented Apr 3, 2025

Severe Performance Degradation Due to NVIDIA Driver (nvidiactl) ioctl Contention When RAFT is Enabled in Faiss GPU Build

Environment

  • Faiss Version: 1.9.0

  • RAFT Version: 24.06.00

  • GPU

  • Faiss Build Configuration:

  # Docker build snippet
RUN set -ex \
  && mkdir -p /root/logs \
  && rm -rf /lib64/libstdc++.so.6.0.25-gdb.py \
  && ldconfig \
  && cd cmake-3.29.4 \
  && ./configure --prefix=/usr/local/cmake \
  && make && make install \
  && ln -s /usr/local/cmake/bin/cmake /usr/bin/cmake \
  && cd .. && rm -rf cmake-3.29.4 \
  && cd raft-24.06.00 \
  && ./build.sh libraft \
  && cd ..

ENV raft_DIR=/home/raft-24.06.00/cpp/build/

RUN cd faiss-1.9.0 \
  && cmake -B build . -DFAISS_ENABLE_GPU=ON -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF -DCUDAToolkit_ROOT=/usr/local/cuda/targets/x86_64-linux/include -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_RAFT=ON  -DCMAKE_BUILD_TYPE=Release  -DCMAKE_CUDA_ARCHITECTURES="70;75;80;86;90" \
  && make -C build install -j6 && cd .. \
  && rm -rf faiss-1.9.0

Problem Description

We run GPU searches using the following workflow:

// Load the index on the CPU, then clone it onto GPU 0 with a dedicated resources object.
std::unique_ptr<faiss::Index> index_cpu_ptr(faiss::read_index(faiss_local_file_path.c_str(), 0));
faiss::Index* cpu_ptr = index_cpu_ptr.release();
faiss::gpu::StandardGpuResources res;
faiss::gpu::GpuClonerOptions options;
faiss::Index* index_gpu = faiss::gpu::index_cpu_to_gpu(&res, 0, cpu_ptr, &options);
// n, queries, k, distances, labels are the usual search buffers.
index_gpu->search(n, queries, k, distances, labels);

Within a single k8s Pod, multiple GPU indices (gpu_index) may coexist, each bound to its own isolated StandardGpuResources. Under normal circumstances, search operations across these indices should not interfere with each other, because their resources are isolated.
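
As a rough illustration of that setup, here is a minimal sketch; the index paths and device number are hypothetical placeholders, not taken from the original report:

#include <faiss/Index.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/index_io.h>

#include <memory>

int main() {
    // Two indices in one process, each bound to its own (isolated) resources object.
    faiss::gpu::StandardGpuResources res_a, res_b;

    std::unique_ptr<faiss::Index> cpu_a(faiss::read_index("/data/index_a.ivfpq", 0));  // hypothetical paths
    std::unique_ptr<faiss::Index> cpu_b(faiss::read_index("/data/index_b.ivfpq", 0));

    std::unique_ptr<faiss::Index> gpu_a(faiss::gpu::index_cpu_to_gpu(&res_a, /*device=*/0, cpu_a.get()));
    std::unique_ptr<faiss::Index> gpu_b(faiss::gpu::index_cpu_to_gpu(&res_b, /*device=*/0, cpu_b.get()));

    // Each index has its own temporary-memory pool and default stream, so in principle
    // searches against gpu_a should not be serialized behind searches against gpu_b.
    return 0;
}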

1. Cross-Pod Driver-Level Contention

When monitoring multiple k8s Pods sharing the same GPU host, we observed heavy contention on the following /dev/nvidiactl ioctl calls:

@ioctl_file[b, /dev/nvidiactl, NV_ESC_RM_FREE]:
[0]                 2248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@ioctl_file[b, /dev/nvidiactl, NV_ESC_RM_VID_HEAP_CONTROL]:
[0]                 2248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

2. Within-Pod Contention

[Image: raft race]

Workaround

Remove -DFAISS_ENABLE_RAFT=ON and recompile Faiss.

@cjnolet
Contributor

cjnolet commented Apr 4, 2025

Hi @neezeeyee. There have been significant updates to this since June of last year (version 24.06 denotes June of 2024). Are you able to update FAISS to use the new cuVS library? It's also possible that updating RAFT could work. @tarang-jain, do you recall what version of RAFT was released with FAISS 1.9.0?

@tarang-jain
Contributor

@cjnolet
Contributor

cjnolet commented Apr 4, 2025

Thanks @tarang-jain. @neezeeyee, in that case it might be easier to try the latest version of FAISS (1.10), which upgrades the RAFT dependency to cuVS (and provides a wealth of new capabilities and bug fixes).

@neezeeyee
Author

neezeeyee commented Apr 6, 2025

@cjnolet Are there any known issues with FAISS 1.9.0 and RAFT 24.06 when performing GPU-based searches? Will upgrading versions definitely resolve the problem, or only potentially resolve it? What is the root cause of the currently observed issue? Thanks.

@mdouze
Contributor

mdouze commented Apr 7, 2025

Normally there should be one StandardGpuResources per device, not per index.
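
For reference, a minimal sketch of that recommendation, assuming all indices live on device 0; the index paths are hypothetical:

#include <faiss/Index.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/index_io.h>

#include <memory>
#include <vector>

int main() {
    // One resources object for the whole device, shared by every index cloned onto it.
    faiss::gpu::StandardGpuResources res;

    std::vector<std::unique_ptr<faiss::Index>> gpu_indices;
    for (const char* path : {"/data/index_a.ivfpq", "/data/index_b.ivfpq"}) {  // hypothetical paths
        std::unique_ptr<faiss::Index> cpu(faiss::read_index(path, 0));
        gpu_indices.emplace_back(faiss::gpu::index_cpu_to_gpu(&res, /*device=*/0, cpu.get()));
    }

    // All indices now share the same temporary-memory pool and default stream on device 0.
    return 0;
}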

@neezeeyee
Author

neezeeyee commented Apr 7, 2025

@mdouze But the concurrency level is determined by the GpuResources configuration: each GpuResources instance can handle only one request at a time, which means that if all indexes share the same GpuResources, search operations across different indexes will be queued sequentially.
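
A minimal sketch of the pattern described here: two indices, each with its own StandardGpuResources, searched from separate threads. The paths, sizes, and query data are hypothetical placeholders, and whether the two searches actually overlap on the GPU depends on stream scheduling and the driver:

#include <faiss/Index.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/index_io.h>

#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Each worker owns an index backed by its own resources object, so its queries
// are not queued behind searches issued against the other index.
static void search_worker(faiss::Index* index, const std::vector<float>& queries,
                          faiss::idx_t n, faiss::idx_t k,
                          std::vector<float>& distances, std::vector<faiss::idx_t>& labels) {
    index->search(n, queries.data(), k, distances.data(), labels.data());
}

int main() {
    faiss::gpu::StandardGpuResources res_a, res_b;  // one per index, as in the issue

    std::unique_ptr<faiss::Index> cpu_a(faiss::read_index("/data/index_a.ivfpq", 0));  // hypothetical paths
    std::unique_ptr<faiss::Index> cpu_b(faiss::read_index("/data/index_b.ivfpq", 0));
    std::unique_ptr<faiss::Index> gpu_a(faiss::gpu::index_cpu_to_gpu(&res_a, 0, cpu_a.get()));
    std::unique_ptr<faiss::Index> gpu_b(faiss::gpu::index_cpu_to_gpu(&res_b, 0, cpu_b.get()));

    // Placeholder query/result buffers: n queries of dimension d, top-k results each.
    const faiss::idx_t n = 16, k = 10;
    std::vector<float> q_a(n * gpu_a->d), q_b(n * gpu_b->d);
    std::vector<float> d_a(n * k), d_b(n * k);
    std::vector<faiss::idx_t> l_a(n * k), l_b(n * k);

    // Issue the two searches concurrently, each through its own StandardGpuResources.
    std::thread t_a(search_worker, gpu_a.get(), std::cref(q_a), n, k, std::ref(d_a), std::ref(l_a));
    std::thread t_b(search_worker, gpu_b.get(), std::cref(q_b), n, k, std::ref(d_b), std::ref(l_b));
    t_a.join();
    t_b.join();
    return 0;
}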

bshethmeta added the GPU label Apr 14, 2025