Faiss IVFPQ (GPU) Enable raft-on cause /dev/nvidiactl ioctl race #4272
Comments
Hi @neezeeyee. There have been significant updates to this since June of last year (version 24.06 denotes June of 2024). Are you able to update FAISS to use the new cuVS library? It's also possible that updating RAFT could work. @tarang-jain, do you recall what version of RAFT was released with FAISS 1.9.0?
Faiss 1.9.0 had RAFT=24.06 (https://github.com/facebookresearch/faiss/blob/v1.9.0/cmake/thirdparty/fetch_rapids.cmake)
Thanks @tarang-jain. @neezeeyee, in that case it might be easier to try the latest version of FAISS (1.10), which upgrades the RAFT dependency to cuVS (and provides a wealth of new capabilities and bug fixes).
@cjnolet Are there any known issues with FAISS 1.9.0 and RAFT 24.06 when performing GPU-based searches? Will upgrading versions definitely resolve the problem or only potentially resolve it? What is the root cause of the current observed issue? thx
Normally there should be one |
@mdouze But the concurrency level is determined by the GpuResources configuration. Each GpuResources instance can handle only one request at a time, which means that if all indexes share the same GpuResources, search operations across different indexes will be queued sequentially.
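The queuing behaviour described in the comment above can be illustrated with a toy model. This is only a sketch using Python's standard library, not Faiss code: `ToyGpuResources`, `PeakTracker`, and `peak_concurrency` are hypothetical names, and the per-object lock is an assumption that stands in for "one request at a time per GpuResources instance".

```python
import threading
import time


class PeakTracker:
    """Records the highest number of searches in flight at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def exit(self):
        with self._lock:
            self.active -= 1


class ToyGpuResources:
    """Hypothetical stand-in for a resources object that serves one request at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: models per-resource serialization

    def search(self, tracker, duration=0.05):
        with self._lock:
            tracker.enter()
            time.sleep(duration)  # pretend to run a search
            tracker.exit()


def peak_concurrency(resources):
    """Run one simulated search per entry in `resources` and return the peak overlap."""
    tracker = PeakTracker()
    threads = [threading.Thread(target=r.search, args=(tracker,)) for r in resources]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker.peak


# Four indexes sharing one resources object: searches are queued one after another.
shared = ToyGpuResources()
shared_peak = peak_concurrency([shared] * 4)

# Four indexes, each with its own resources object: searches can overlap.
isolated_peak = peak_concurrency([ToyGpuResources() for _ in range(4)])

print("shared peak:", shared_peak)
print("isolated peak:", isolated_peak)
```

In this model the shared case never has more than one search in flight, while the isolated case lets searches overlap. It says nothing about contention below the resources object, such as the driver-level ioctl contention reported in this issue.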
Severe Performance Degradation Due to NVIDIA Driver (nvidiactl) ioctl Contention When RAFT is Enabled in Faiss GPU Build
Environment
Faiss Version: 1.9.0
RAFT Version: 24.06.00
GPU
Faiss Build Configuration:
Problem Description
We run GPU searches using the following workflow
Within a single k8s Pod, multiple GPU indices (gpu_index) may coexist, where each index is bound to an isolated StandardGpuResources. Under normal circumstances, search operations across these indices should not interfere with each other due to resource isolation.
1. Cross-Pod Driver-Level Contention
When monitoring multiple k8s Pods sharing the same GPU host, we observed heavy contention through:
2. Within-Pod Contention
raft race

Solution
Remove -DFAISS_ENABLE_RAFT=ON and recompile Faiss.