store_dataset CAGRA parameter #4274

Open · rchitale7 opened this issue Apr 3, 2025 · 11 comments

@rchitale7

rchitale7 commented Apr 3, 2025

Hi,

I am trying to build a GPU index using GpuIndexCagra, and was under the impression that the store_dataset parameter prevents the dataset from being attached to the index. I am converting the index to HNSW afterwards, so I don't want to load the dataset into GPU memory. Recently, there was a PR that seemed to address this: #4173.

However, for a 6.144 GB example dataset, I noticed that GPU memory spiked as high as 10.5 GB when I monitored GPU usage with nvidia-smi in the background. The code I'm using to test is here: https://github.com/navneet1v/VectorSearchForge/tree/main/cuvs_benchmarks. Specifically, this function is used to build the index on the GPU: https://github.com/navneet1v/VectorSearchForge/blob/main/cuvs_benchmarks/main.py#L324

Interestingly, when I used the numpy mmap feature to load the dataset, I did not see GPU memory exceed 5.039 GB, regardless of the value I set for the store_dataset parameter. It looks like CAGRA supports keeping the dataset on disk, which is probably why GPU memory doesn't spike in that case. However, we want to see if it's possible to keep the dataset entirely in CPU memory, without loading it into GPU memory and without using disk. Is the store_dataset parameter supposed to do this? If not, is there any other way to do this with the faiss Python API? Please let me know, thank you!
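
For context, here is a minimal sketch of the build-and-convert flow I have in mind (illustrative only, not the exact benchmark code; it assumes faiss is built with cuVS support, so that GpuIndexCagraConfig exposes the store_dataset field added in #4173, and that the GPU index can be copied into an IndexHNSWCagra):

    import faiss
    import numpy as np

    d = 1536  # openai_1536-style dataset
    xb = np.random.rand(100_000, d).astype('float32')  # stand-in for the real dataset

    res = faiss.StandardGpuResources()
    config = faiss.GpuIndexCagraConfig()
    config.store_dataset = False  # intent: do not keep the dataset attached to the GPU index

    gpu_index = faiss.GpuIndexCagra(res, d, faiss.METRIC_L2, config)
    gpu_index.train(xb)  # builds the CAGRA graph on the GPU

    # Convert to a CPU HNSW index, since we only search on the CPU afterwards
    # (M=32 is an arbitrary choice here).
    cpu_index = faiss.IndexHNSWCagra(d, 32, faiss.METRIC_L2)
    gpu_index.copyTo(cpu_index)
    faiss.write_index(cpu_index, "cagra_hnsw.index")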

Additional Background
Faiss version: We are using Faiss as a git submodule, with version 1.10.0; the submodule is pinned to commit df6a8f6.

OS version:

NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023.6.20250218"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/"
DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/"
SUPPORT_URL="https://aws.amazon.com/premiumsupport/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
VENDOR_NAME="AWS"
VENDOR_URL="https://aws.amazon.com/"
SUPPORT_END="2029-06-30"

Type of GPU:

00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1)

EC2 Instance Type: g5.2xlarge

Reproduction Instructions

  1. On a server with GPUs, clone https://github.com/navneet1v/VectorSearchForge
    • Server must have git and docker installed
    • Server must have nvidia developer tools installed, such as nvidia-smi and nvidia-container-toolkit
  2. cd into cuvs_benchmarks folder, and create a temp directory to store the faiss graph files:
mkdir ./benchmarks_files
chmod 777 ./benchmarks_files
  3. Build the docker image:
docker build -t <your_image_name> .
  4. Run the image:
docker run -v ./benchmarks_files:/tmp/files --gpus all <your_image_name>

In a separate terminal, run nvidia-smi to monitor the GPU memory:

nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,temperature.gpu --format=csv -l 1

For loading the numpy dataset with mmap, I added the following lines below https://github.com/navneet1v/VectorSearchForge/blob/main/cuvs_benchmarks/main.py#L253:

    # Line 1-253 code above ...
    np.save("array.npy",xb)
    del xb
    xb = np.load("array.npy", mmap_mode='r+')
    # rest of code below ...
@tarang-jain
Contributor

The store_dataset parameter is used to attach the dataset to the CAGRA index. That is, it will bring the dataset into device memory (if it is not already on the device) so that the CAGRA index can be searched.
For the IVF-PQ graph build, part of the dataset will have to be loaded onto the GPU for training the IVF centroids. In your benchmarking script, I see the trainset_ratio is set to 0.5, which means that at some point, half of the dataset is loaded onto the device. Reducing that could potentially reduce the peak memory pressure (if that training part is the culprit for your peak GPU memory usage). Furthermore, reducing pq_dim could also help (for openai_1536, a pq_dim of 384 should work).
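
For reference, a rough sketch of where those knobs live in the faiss config (assuming the Python bindings expose the IVFPQBuildCagraConfig struct from GpuIndexCagra.h; field names may differ slightly in your build):

    import faiss

    config = faiss.GpuIndexCagraConfig()  # IVF_PQ is the default graph build algorithm

    ivf_pq = faiss.IVFPQBuildCagraConfig()
    ivf_pq.kmeans_trainset_fraction = 0.1  # subsample less of the dataset for k-means training
    ivf_pq.pq_dim = 384                    # for openai_1536, compress 1536 dims to 384 PQ dims
    config.ivf_pq_params = ivf_pq          # attach the IVF-PQ build parameters to the CAGRA config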

@tarang-jain
Contributor

To answer your original question, at least some of your dataset will have to be loaded onto the GPU for IVF-PQ build and search, during the CAGRA graph build.

@navneet1v
Contributor

navneet1v commented Apr 4, 2025

@tarang-jain Thanks for responding to this. We understand that some part of the dataset needs to be loaded into GPU memory for clustering, but this is where it is not consistent with trainset_ratio. A simple question is why we see the following behavior:

However, for a 6.144 GB example dataset, I noticed that GPU memory spiked as high as 10.5 GB when I monitored GPU usage with nvidia-smi in the background.

This is very bizarre.
cc: @cjnolet and @divyegala

@navneet1v
Contributor

@bshethmeta this issue should not be closed, as I think this is a bug in the code.

@tfeher
Copy link

tfeher commented Apr 15, 2025

The memory usage reported above is higher than it should be. Below I show the expected memory usage when using cuVS natively (via cuvs-bench). I will need to repeat the same test with the faiss-integrated version of cuVS to see where the additional allocations happen. I suspect some temporary allocations grow larger than expected.

Here is the GPU memory usage for CAGRA index building when the dataset is in host memory. Initially we subsample the dataset, and that subsample is used for k-means clustering. As discussed above, the size of that allocation can be controlled by the kmeans_trainset_fraction parameter. Afterwards, the memory allocated on the GPU is mainly determined by the IVF-PQ index size.

[Figure: GPU memory usage over time during a CAGRA index build with the dataset in host memory]

Note that the IVF index has nlist clusters, each allocated separately. The allocator might round up small allocations to MB size, and that would add some overhead.
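
As a rough back-of-the-envelope (my own numbers, assuming the 6.144 GB dataset above is ~1M float32 vectors of dim 1536 and 8-bit PQ codes):

    # Back-of-the-envelope only; actual usage also includes workspace and allocator rounding.
    n_vectors, dim = 1_000_000, 1536
    dataset_gb = n_vectors * dim * 4 / 1e9               # ~6.1 GB of float32 vectors

    kmeans_trainset_fraction = 0.5
    trainset_gb = dataset_gb * kmeans_trainset_fraction  # ~3.1 GB subsampled for k-means

    pq_dim, pq_bits = 384, 8
    pq_codes_gb = n_vectors * pq_dim * pq_bits / 8 / 1e9 # ~0.38 GB of IVF-PQ codes

    print(f"trainset ~{trainset_gb:.2f} GB, IVF-PQ codes ~{pq_codes_gb:.2f} GB")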

@rchitale7
Author

@tfeher @tarang-jain Thanks for the responses. I am most confused about why we see such a dramatic difference in peak GPU memory usage when loading the dataset into CPU memory vs. storing it on disk with numpy.mmap. When I keep the dataset entirely in CPU memory, the peak GPU memory usage during the CAGRA index build is 10.5 GB. If I keep all of the CAGRA hyperparameters (such as kmeans_trainset_fraction and nlist) the same but only change how the dataset is stored (by using numpy.mmap), the peak GPU memory usage drops to 5.039 GB. I understand that changing certain CAGRA hyperparameters can reduce GPU memory usage, but I'm not sure why storing the dataset on disk instead of in CPU memory would also affect GPU memory usage when store_dataset=False. Does this seem like a bug with the store_dataset parameter?

@navneet1v
Contributor

navneet1v commented Apr 16, 2025

I will need to repeat the same test with the faiss-integrated version of cuVS to see where the additional allocations happen.

@tfeher That would be great, because we use faiss in our testing.

@mnorris11 mnorris11 added the GPU label Apr 16, 2025
@tfeher

tfeher commented Apr 25, 2025

@rchitale7 you are right that from a GPU memory usage point of view, it should not matter whether the dataset is accessed using numpy.mmap or sits in CPU memory.

Note that the store_dataset option only influences what we do with the dataset after the neighborhood graph is created. When store_dataset=True, the dataset is copied to the GPU and we return a GPU index that can immediately be used for searching on the GPU.

When store_dataset=False, the returned index does not contain the dataset; it stores only the graph. The index can be saved or converted to a CPU index and searched using HNSW on the CPU.

If we run a CAGRA search on an index created with store_dataset=False, then that will load the dataset onto the GPU.

The memory usage that you report is higher than expected, please see my answer here: rapidsai/cuvs#566 (comment)

Could you run your test with RMM logging enabled (as described in the linked answer) to see when these allocations happen?
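
For reference, one way to enable this from Python (a sketch only; it assumes the cuVS allocations inside faiss go through RMM's current device resource, and the exact steps in the linked comment take precedence):

    import rmm

    # Wrap the default device memory resource in a logging adaptor;
    # each allocation/deallocation is appended to the CSV log file.
    rmm.reinitialize(logging=True, log_file_name="rmm_log.csv")

    # ... run the CAGRA index build here, then inspect the log for the large allocations ...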

@navneet1v
Contributor

@rchitale7 can you please run the benchmarks as suggested by @tfeher?

@navneet1v
Contributor

@tfeher did you try running the code (referenced here: rapidsai/cuvs#566 (comment)) without the memory-mapped file? Because it's the non-memory-mapped code that was spiking the memory.

@rchitale7
Author

I added the RMM memory log results here: rapidsai/cuvs#566 (comment).
