Closed
Description
Severe Performance Degradation Due to NVIDIA Driver (nvidiactl) ioctl Contention When RAFT is Enabled in Faiss GPU Build
Environment
- Faiss Version: 1.9.0
- RAFT Version: 24.06.00
- GPU
- Faiss Build Configuration:
# Docker build snippet
RUN set -ex \
&& mkdir -p /root/logs \
&& rm -rf /lib64/libstdc++.so.6.0.25-gdb.py \
&& ldconfig \
&& cd cmake-3.29.4 \
&& ./configure --prefix=/usr/local/cmake \
&& make && make install \
&& ln -s /usr/local/cmake/bin/cmake /usr/bin/cmake \
&& cd .. && rm -rf cmake-3.29.4 \
&& cd raft-24.06.00 \
&& ./build.sh libraft \
&& cd ..
ENV raft_DIR=/home/raft-24.06.00/cpp/build/
RUN cd faiss-1.9.0 \
&& cmake -B build . -DFAISS_ENABLE_GPU=ON -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF -DCUDAToolkit_ROOT=/usr/local/cuda/targets/x86_64-linux/include -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_RAFT=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES="70;75;80;86;90" \
&& make -C build install -j6 && cd .. \
&& rm -rf faiss-1.9.0
Problem Description
We run GPU searches using the following workflow:
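// Load the CPU index from disk, then hand ownership over as a raw pointer.
std::unique_ptr<faiss::Index> index_cpu_ptr(faiss::read_index(faiss_local_file_path.c_str(), 0));
faiss::Index* cpu_ptr = index_cpu_ptr.release();
faiss::gpu::StandardGpuResources res;
// Clone onto GPU device 0; `options` is a cloner options object set up elsewhere.
faiss::Index* index_gpu = faiss::gpu::index_cpu_to_gpu(&res, 0, cpu_ptr, &options);
index_gpu->search(/* n, x, k, distances, labels */);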
Within a single k8s Pod, multiple GPU indices (gpu_index) may coexist, where each index is bound to its own isolated StandardGpuResources. Under normal circumstances, search operations across these indices should not interfere with each other due to this resource isolation.
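One rough way to check whether concurrent searches really are independent is to time the same calls alone and then side by side. The sketch below is a hypothetical helper, not part of Faiss: `mean_latency_ms` and its parameters are names introduced here for illustration. It runs each task on its own thread and reports the mean wall-clock time per call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs every task `iterations` times on its own thread
// and returns the mean wall-clock latency per call, in milliseconds, in the
// same order as `tasks`.
std::vector<double> mean_latency_ms(
        const std::vector<std::function<void()>>& tasks,
        int iterations) {
    assert(iterations > 0);
    std::vector<double> result(tasks.size(), 0.0);
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&tasks, &result, i, iterations] {
            const auto start = std::chrono::steady_clock::now();
            for (int it = 0; it < iterations; ++it) {
                tasks[i]();
            }
            const std::chrono::duration<double, std::milli> elapsed =
                    std::chrono::steady_clock::now() - start;
            result[i] = elapsed.count() / iterations;
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return result;
}
```

In the workflow above, each task would wrap one search call on its own GPU index. If the per-call mean with several tasks is much higher than with a single task, the indices are probably contending for something shared, although this measurement alone does not say what.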
1. Cross-Pod Driver-Level Contention
When monitoring multiple k8s Pods sharing the same GPU host, we observed heavy contention through:
@ioctl_file[b, /dev/nvidiactl, NV_ESC_RM_FREE]: //
[0] 2248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@ioctl_file[b /dev/nvidiactl, NV_ESC_RM_VID_HEAP_CONTROL]:
[0] 2248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
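To compare traces between builds (for example with and without RAFT), it can help to reduce output like the above to one total per ioctl command. The following is a minimal sketch, assuming the trace keeps the shape shown here: a header line starting with `@ioctl_file[` whose last comma-separated field is the command name, followed by bucket lines of the form `[range] count |bars|`. The function name `sum_ioctl_counts` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: sums the histogram bucket counts under each
// "@ioctl_file[...]" header, keyed by the last field inside the brackets.
std::map<std::string, std::uint64_t> sum_ioctl_counts(const std::string& text) {
    std::map<std::string, std::uint64_t> totals;
    std::istringstream lines(text);
    std::string line;
    std::string current;  // command name of the most recent header line
    while (std::getline(lines, line)) {
        if (line.rfind("@ioctl_file[", 0) == 0) {
            const std::size_t close = line.find(']');
            if (close == std::string::npos) {
                current.clear();
                continue;
            }
            // The command name starts after the last comma before ']'.
            std::size_t begin = line.rfind(',', close);
            begin = (begin == std::string::npos) ? line.find('[') + 1 : begin + 1;
            while (begin < close && line[begin] == ' ') {
                ++begin;
            }
            current = line.substr(begin, close - begin);
        } else if (!current.empty() && !line.empty() && line[0] == '[') {
            // Bucket line: the range ends at ']' or ')', then comes the count.
            const std::size_t range_end = line.find_first_of("])");
            if (range_end == std::string::npos) {
                continue;
            }
            std::istringstream rest(line.substr(range_end + 1));
            std::uint64_t count = 0;
            if (rest >> count) {
                totals[current] += count;
            }
        }
    }
    return totals;
}
```

Feeding it the two entries above would give a total of 2248 for each of `NV_ESC_RM_FREE` and `NV_ESC_RM_VID_HEAP_CONTROL`.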
2. Within-Pod Contention
Solution
Remove -DFAISS_ENABLE_RAFT=ON from the CMake configuration and recompile Faiss.