diff --git a/acceleration/README.md b/acceleration/README.md index e803b6e44..8c6cf65f8 100644 --- a/acceleration/README.md +++ b/acceleration/README.md @@ -4,6 +4,8 @@ Typically, model training is a time-consuming step during deep learning developm ### List of notebooks and examples #### [fast_model_training_guide](./fast_model_training_guide.md) The document introduces details of how to profile the training pipeline, how to analyze the dataset and select suitable algorithms, and how to optimize GPU utilization in single GPU, multi-GPUs or even multi-nodes. +#### [fast_inference_tutorial](./fast_inference_tutorial) +The example introduces details of how to use GDS, GPU transforms and TensorRT to accelerate the inference. #### [distributed_training](./distributed_training) The examples show how to execute distributed training and evaluation based on 3 different frameworks: - PyTorch native `DistributedDataParallel` module with `torchrun`. diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb new file mode 100644 index 000000000..ea0f398c7 --- /dev/null +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -0,0 +1,646 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) MONAI Consortium \n", + "Licensed under the Apache License, Version 2.0 (the \"License\"); \n", + "you may not use this file except in compliance with the License. \n", + "You may obtain a copy of the License at \n", + "    http://www.apache.org/licenses/LICENSE-2.0 \n", + "Unless required by applicable law or agreed to in writing, software \n", + "distributed under the License is distributed on an \"AS IS\" BASIS, \n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
\n", + "See the License for the specific language governing permissions and \n", + "limitations under the License.\n", + "\n", + "# Fast Inference with MONAI Features" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accelerating Model Inference with MONAI\n", + "\n", + "In the rapidly evolving field of medical imaging, the ability to perform fast and efficient model inference is crucial for real-time diagnostics and treatment planning. This tutorial explores three advanced features of the MONAI framework that are designed to significantly accelerate model inference, particularly when leveraging the computational power of NVIDIA GPUs.\n", + "\n", + "1. **TensorRT Inference**: Learn how to utilize NVIDIA's TensorRT to optimize and execute models for high-performance inference, reducing latency and improving throughput.\n", + "\n", + "2. **GPU-Based Preprocessing**: Discover how to offload data preprocessing tasks to the GPU, minimizing CPU bottlenecks and accelerating the overall inference pipeline.\n", + "\n", + "3. **Direct GPU Data Loading**: Understand the benefits of loading data directly from disk into GPU memory, which reduces data transfer times and enhances processing efficiency.\n", + "\n", + "In addition to exploring these features, this tutorial provides a comprehensive benchmarking strategy to evaluate the performance improvements offered by each feature. We will use MONAI's [wholeBody_ct_segmentation](https://github.com/Project-MONAI/model-zoo/tree/dev/models/wholeBody_ct_segmentation) as a reference and build a Liver CT segmentation model for benchmarking purposes.\n", + "\n", + "Finally, we will analyze and visualize the benchmark results, offering insights into the performance gains achieved through these optimizations. By the end of this tutorial, you will have a deeper understanding of how to leverage MONAI's capabilities to enhance the efficiency of your medical imaging workflows." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup environment\n", + "\n", + "Loading data directly from disk to GPU memory requires the `kvikio` library. In addition, this tutorial requires many other dependencies such as `monai`, `torch`, `torch_tensorrt`, `numpy`, `ignite`, `pandas`, `matplotlib`, etc. We recommend using the [MONAI Docker](https://docs.monai.io/en/latest/installation.html#from-dockerhub) image to run this tutorial, which includes pre-configured dependencies and allows you to skip manual installation.\n", + "\n", + "If not using MONAI Docker, install `kvikio` using one of these methods:\n", + "\n", + "- **PyPI Installation** \n", + " Use the appropriate package for your CUDA version:\n", + " ```bash\n", + " pip install kvikio-cu12 # For CUDA 12\n", + " pip install kvikio-cu11 # For CUDA 11\n", + " ```\n", + "\n", + "- **Conda/Mamba Installation** \n", + " Follow the official [KvikIO installation guide](https://docs.rapids.ai/api/kvikio/nightly/install/) for Conda/Mamba installations.\n", + "\n", + "For convenience, we provide the cell below to install all the dependencies (please modify the cell based on your actual CUDA version, and please note that only CUDA 11 and CUDA 12 are supported for now)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, pydicom, tqdm]\"\n", + "!python -c \"import matplotlib\" || pip install -q matplotlib\n", + "!python -c \"import torch_tensorrt\" || pip install torch_tensorrt\n", + "!python -c \"import kvikio\" || pip install kvikio-cu12\n", + "!python -c \"import pandas\" || pip install pandas\n", + "!python -c \"import requests\" || pip install requests\n", + "!python -c \"import onnx\" || pip install onnx\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import torch\n", + "import torch_tensorrt\n", + "from monai.config import print_config\n", + "from monai.transforms import (\n", + " EnsureChannelFirstd,\n", + " EnsureTyped,\n", + " LoadImaged,\n", + " Orientationd,\n", + " Spacingd,\n", + " NormalizeIntensityd,\n", + " ScaleIntensityd,\n", + " Compose,\n", + ")\n", + "from monai.inferers import sliding_window_inference\n", + "from monai.networks.nets import SegResNet\n", + "import matplotlib.pyplot as plt\n", + "import gc\n", + "import pandas as pd\n", + "from timeit import default_timer as timer\n", + "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", + "\n", + "print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction on Fast Inference Features" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. TensorRT Inference\n", + "\n", + "`monai.networks.utils.convert_to_trt` is a function that converts a PyTorch model to a TensorRT engine-based TorchScript model. 
Apart from how it is loaded (use `torch.jit.load`), the converted TorchScript model is used in the same way as the original model.\n", + "\n", + "`monai.data.torchscript_utils.save_net_with_metadata` is a function that saves the converted TorchScript model and its metadata.\n", + "\n", + "Example:\n", + "\n", + "```py\n", + "import torch\n", + "from monai.networks.nets import SegResNet\n", + "from monai.networks.utils import convert_to_trt\n", + "from monai.data.torchscript_utils import save_net_with_metadata\n", + "\n", + "model = SegResNet(\n", + " spatial_dims=3,\n", + " in_channels=1,\n", + " out_channels=105,\n", + " init_filters=32,\n", + " blocks_down=[1, 2, 2, 4],\n", + " blocks_up=[1, 1, 1],\n", + " dropout_prob=0.2,\n", + ")\n", + "weights = torch.load(\"model.pt\")\n", + "model.load_state_dict(weights)\n", + "torchscript_model = convert_to_trt(\n", + " model=model,\n", + " precision=\"fp16\",\n", + " input_shape=[1, 1, 96, 96, 96],\n", + " dynamic_batchsize=[1, 4, 4],\n", + " use_trace=True,\n", + " verify=False,\n", + ")\n", + "\n", + "save_net_with_metadata(torchscript_model, \"segresnet_trt\")\n", + "\n", + "model = torch.jit.load(\"segresnet_trt.ts\")\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. GPU-Based Preprocessing\n", + "\n", + "The `monai.transforms.EnsureTyped` transform lets you specify the `device` and `dtype` of the output tensor. To perform GPU-based preprocessing, insert the `EnsureTyped` transform at the beginning of your preprocessing transforms. 
For example:\n", + "\n", + "```python\n", + "preprocess_transforms = [\n", + " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\"), track_meta=True),\n", + " Spacingd(keys=[\"image\"], pixdim=(1.5, 1.5, 2.0), mode=\"bilinear\"),\n", + " ScaleIntensityRanged(\n", + " keys=[\"image\"],\n", + " a_min=-57,\n", + " a_max=164,\n", + " b_min=0.0,\n", + " b_max=1.0,\n", + " clip=True,\n", + " ),\n", + "]\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Direct GPU Data Loading\n", + "\n", + "Starting with MONAI `1.4.1rc1`, `monai.data.PydicomReader` and `monai.data.NibabelReader` accept a `to_gpu` argument that enables direct GPU data loading. To use this feature, set `to_gpu=True` when initializing the `LoadImaged` transform. For example:\n", + "\n", + "```python\n", + "loader = LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=True)\n", + "```\n", + "\n", + "Please note that only NIfTI (`.nii`) and DICOM (`.dcm`) files are supported for direct GPU data loading. Compressed `.nii.gz` files can also be loaded this way, but acceleration is not guaranteed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Benchmarking Strategy\n", + "\n", + "In this section, we will benchmark the acceleration provided by each feature. Specifically, we will benchmark the following inference workflows:\n", + "\n", + "- Original inference workflow\n", + "- TensorRT inference workflow\n", + "- TensorRT inference workflow with GPU-based preprocessing\n", + "- TensorRT inference workflow with direct GPU data loading and GPU-based preprocessing\n", + "\n", + "For each benchmark type, `timeit.default_timer` is used to measure the time taken." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Benchmark Type\n", + "\n", + "The `benchmark_type` variable specifies which benchmark to run. 
For a fair comparison, restart the notebook kernel before running each benchmark type. `benchmark_type` can be one of the following:\n", + "\n", + "- `\"original\"`: benchmark the original model inference.\n", + "- `\"trt\"`: benchmark the TensorRT accelerated model inference.\n", + "- `\"trt_gpu_transforms\"`: benchmark the TensorRT model inference with GPU transforms.\n", + "- `\"trt_gds_gpu_transforms\"`: benchmark the TensorRT model inference with direct GPU data loading and GPU transforms." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# uncomment the benchmark type you want to run\n", + "\n", + "benchmark_type = \"original\"\n", + "# benchmark_type = \"trt\"\n", + "# benchmark_type = \"trt_gpu_transforms\"\n", + "# benchmark_type = \"trt_gds_gpu_transforms\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare Data and Model\n", + "\n", + "The [Medical Segmentation Decathlon Task03 Liver dataset](http://medicaldecathlon.com/) is used as an example to benchmark the acceleration performance.\n", + "\n", + "A helper script, [`utils.py`](./utils.py), is used to download and extract the dataset. The script also prepares the model weights and the TensorRT engine-based TorchScript model.\n", + "\n", + "The script automatically checks for existing data, so repeated executions of the notebook do not repeat the download or the model conversion." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "root_dir = \".\"\n", + "torch.backends.cudnn.benchmark = True\n", + "torch_tensorrt.runtime.set_multi_device_safe_mode(True)\n", + "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", + "train_files = prepare_test_datalist(root_dir)\n", + "weights_path = prepare_model_weights(root_dir=root_dir, bundle_name=\"wholeBody_ct_segmentation\")\n", + "trt_model_name = \"model_trt.ts\"\n", + "trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Inference Components" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False):\n", + " preprocess_transforms = [\n", + " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=gpu_loading_flag),\n", + " EnsureChannelFirstd(keys=\"image\"),\n", + " Orientationd(keys=[\"image\"], axcodes=\"RAS\"),\n", + " Spacingd(keys=[\"image\"], pixdim=(1.5, 1.5, 1.5), mode=\"bilinear\"),\n", + " NormalizeIntensityd(keys=\"image\", nonzero=True),\n", + " ScaleIntensityd(\n", + " keys=[\"image\"],\n", + " minv=-1.0,\n", + " maxv=1.0,\n", + " ),\n", + " ]\n", + "\n", + " if gpu_transforms_flag and not gpu_loading_flag:\n", + " preprocess_transforms.insert(1, EnsureTyped(keys=\"image\", device=device, track_meta=True))\n", + " infer_transforms = Compose(preprocess_transforms)\n", + "\n", + " return infer_transforms\n", + "\n", + "\n", + "def get_model(device, weights_path, trt_model_path, trt_flag=False):\n", + " if not trt_flag:\n", + " model = SegResNet(\n", + " spatial_dims=3,\n", + " in_channels=1,\n", + " out_channels=105,\n", + " init_filters=32,\n", + " blocks_down=[1, 2, 2, 4],\n", + " blocks_up=[1, 1, 1],\n", + " dropout_prob=0.2,\n", + " 
)\n", + " weights = torch.load(weights_path)\n", + " model.load_state_dict(weights)\n", + " model.to(device)\n", + " model.eval()\n", + " else:\n", + " model = torch.jit.load(trt_model_path)\n", + " return model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Inference Workflow\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def run_inference(data_list, infer_transforms, model, device, benchmark_type):\n", + " total_time_dict = {}\n", + " roi_size = (96, 96, 96)\n", + " sw_batch_size = 4\n", + "\n", + " for idx, sample in enumerate(data_list[:10]):\n", + " start = timer()\n", + " data = infer_transforms({\"image\": sample})\n", + "\n", + " with torch.no_grad():\n", + " input_image = (\n", + " data[\"image\"].unsqueeze(0).to(device)\n", + " if benchmark_type in [\"trt\", \"original\"]\n", + " else data[\"image\"].unsqueeze(0)\n", + " )\n", + "\n", + " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", + " output_image = output_image.cpu()\n", + "\n", + " end = timer()\n", + "\n", + " print(output_image.mean())\n", + "\n", + " del data\n", + " del input_image\n", + " del output_image\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + "\n", + " sample_name = sample.split(\"/\")[-1]\n", + " if idx > 0:\n", + " total_time_dict[sample_name] = end - start\n", + " print(end - start)\n", + " return total_time_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Running the Benchmark\n", + "\n", + "The cell below will execute the benchmark based on the `benchmark_type` variable.\n", + "\n", + "#### (Optional) Using the Python Script\n", + "\n", + "For convenience, a Python script, [`run_benchmark.py`](./run_benchmark.py), is available to run the benchmark. 
You can open a terminal and execute the following command to run the benchmark for all benchmark types:\n", + "\n", + "\n", + "```bash\n", + "for benchmark_type in \"original\" \"trt\" \"trt_gpu_transforms\" \"trt_gds_gpu_transforms\"; do\n", + " python run_benchmark.py --benchmark_type \"$benchmark_type\"\n", + "done\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "gpu_transforms_flag = False\n", + "gpu_loading_flag = False\n", + "trt_flag = False\n", + "\n", + "if \"trt\" in benchmark_type:\n", + " trt_flag = True\n", + "if \"gpu_transforms\" in benchmark_type:\n", + " gpu_transforms_flag = True\n", + "if \"gds\" in benchmark_type:\n", + " gpu_loading_flag = True\n", + "\n", + "infer_transforms = get_transforms(device, gpu_loading_flag, gpu_transforms_flag)\n", + "model = get_model(device, weights_path, trt_model_path, trt_flag)\n", + "\n", + "# total_time_dict = run_inference(train_files, infer_transforms, model, device, benchmark_type)\n", + "\n", + "# df = pd.DataFrame(list(total_time_dict.items()), columns=[\"file_name\", \"time\"])\n", + "# df.to_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"), index=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Analyze and Visualize the Results\n", + "\n", + "In this section, we will analyze and visualize the results.\n", + "All cell outputs presented in this section were obtained on an NVIDIA RTX A6000 GPU." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# collect benchmark results\n", + "all_df = pd.read_csv(os.path.join(root_dir, \"time_original.csv\"))\n", + "all_df.columns = [\"file_name\", \"original_time\"]\n", + "\n", + "for benchmark_type in [\"trt\", \"trt_gpu_transforms\", \"trt_gds_gpu_transforms\"]:\n", + " df = pd.read_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"))\n", + " df.columns = [\"file_name\", f\"{benchmark_type}_time\"]\n", + " all_df = pd.merge(all_df, df, on=\"file_name\", how=\"left\")\n", + "\n", + "# for each file, add its size\n", + "all_df[\"file_size\"] = all_df[\"file_name\"].apply(\n", + " lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x))\n", + ")\n", + "# sort by file size\n", + "all_df = all_df.sort_values(by=\"file_size\", ascending=True)\n", + "# convert file size to MB\n", + "all_df[\"file_size\"] = all_df[\"file_size\"] / 1024 / 1024\n", + "# get the total time for each benchmark type\n", + "total_time = all_df.sum(numeric_only=True)\n", + "del total_time[\"file_size\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Analyze the Total Inference Time\n", + "\n", + "- TensorRT Improvement:\n", + "Switching from the original model to TensorRT (`trt_time`) results in a slight performance improvement, reducing total inference time by about 0.94%.\n", + "\n", + "- TensorRT + GPU Transforms Improvement:\n", + "Adding GPU transforms (`trt_gpu_transforms_time`) brings the reduction in total inference time to 9.32% relative to the original model.\n", + "\n", + "- TensorRT + GDS + GPU Transforms Improvement:\n", + "Combining TensorRT with GPU Direct Storage and GPU transforms (`trt_gds_gpu_transforms_time`) provides the most substantial improvement, reducing total inference time by more than 55% compared to the original model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "original_time 360.122527\n", + "trt_time 356.739906\n", + "trt_gpu_transforms_time 326.563954\n", + "trt_gds_gpu_transforms_time 160.416928\n", + "dtype: float64\n" + ] + } + ], + "source": [ + "print(total_time)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TensorRT Improvement: 0.009392972563697605\n", + "TensorRT + GPU Transforms Improvement: 0.09318654129529037\n", + "TensorRT + GDS + GPU Transforms Improvement: 0.5545490328713701\n" + ] + } + ], + "source": [ + "print(\"TensorRT Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_time\"]) / total_time[\"original_time\"])\n", + "print(\n", + " \"TensorRT + GPU Transforms Improvement: \",\n", + " (total_time[\"original_time\"] - total_time[\"trt_gpu_transforms_time\"]) / total_time[\"original_time\"],\n", + ")\n", + "print(\n", + " \"TensorRT + GDS + GPU Transforms Improvement: \",\n", + " (total_time[\"original_time\"] - total_time[\"trt_gds_gpu_transforms_time\"]) / total_time[\"original_time\"],\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "total_time.index = [\"pytorch_model\", \"TensorRT\", \"TensorRT_GPU_Transform\", \"TensorRT_GPU_Transform_GDS\"]\n", + "\n", + "plt.figure(figsize=(10, 6))\n", + "total_time.plot(kind=\"bar\", color=[\"skyblue\", \"orange\", \"green\", \"red\"])\n", + "plt.title(\"Total Inference Time for Each Benchmark Type\")\n", + "plt.xlabel(\"Benchmark Type\")\n", + "plt.ylabel(\"Total Time (seconds)\")\n", + "plt.xticks(rotation=45)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compare the Original Model and the Most Optimized Model\n", + "\n", + "A scatter plot of per-file inference time for the original model and the most optimized model shows that larger files benefit significantly more from the optimizations.\n", + "\n", + "As file size increases, the inference time of the original model rises markedly, while the inference time of the most optimized model shows no obvious increase. This indicates that the approach is particularly effective for larger files." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(10, 6))\n", + "plt.scatter(all_df[\"file_size\"], all_df[\"original_time\"], label=\"Original Model\")\n", + "plt.scatter(all_df[\"file_size\"], all_df[\"trt_gds_gpu_transforms_time\"], label=\"Optimized Model\")\n", + "plt.xlabel(\"File Size (MB)\")\n", + "plt.ylabel(\"Inference Time (seconds)\")\n", + "plt.title(\"Comparison of original and most optimized model\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Limitations\n", + "\n", + "Although the optimizations have shown significant improvements in inference time, there are still some limitations to consider:\n", + "\n", + "1. **TensorRT**: \n", + " - **Model Compatibility**: Not all models are compatible with TensorRT. Models with unsupported layers or operations may not benefit from TensorRT acceleration.\n", + " - **Batch Size**: TensorRT is optimized for larger batch sizes. For very small batch sizes, the overhead of conversion and execution might outweigh the performance gains.\n", + " - **Precision**: While using lower precision (e.g., FP16) can speed up inference, it may lead to a loss in model accuracy, which is critical in medical imaging applications.\n", + "\n", + "2. **GPU-Based Preprocessing**:\n", + " - **Memory Usage**: GPU-based preprocessing requires additional GPU memory, which can be a constraint when the available GPU memory is limited.\n", + "\n", + "3. **GPU Direct Storage (GDS)**:\n", + " - **File Format Support**: Currently, only NIfTI and DICOM files are supported for direct GPU data loading; compressed `.nii.gz` NIfTI files can also be loaded this way, but acceleration is not guaranteed. 
Other formats may not benefit from this feature.\n", + " - **Small File Acceleration**: For small files, the overhead of conversion and execution might outweigh the performance gains.\n", + "\n", + "By understanding these limitations, users can better assess when and how to apply these acceleration features effectively in their workflows." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "monai_tutorial", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/acceleration/fast_inference_tutorial/run_benchmark.py b/acceleration/fast_inference_tutorial/run_benchmark.py new file mode 100644 index 000000000..df7245c60 --- /dev/null +++ b/acceleration/fast_inference_tutorial/run_benchmark.py @@ -0,0 +1,163 @@ +# Copyright (c) MONAI Consortium +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + +import argparse +import gc +import os +from timeit import default_timer as timer + +import pandas as pd +import torch +import torch_tensorrt +from monai.inferers import sliding_window_inference +from monai.networks.nets import SegResNet +from monai.transforms import ( + Activationsd, + AsDiscreted, + Compose, + EnsureChannelFirstd, + EnsureTyped, + Invertd, + LoadImaged, + NormalizeIntensityd, + Orientationd, + ScaleIntensityd, + Spacingd, +) + +from utils import prepare_model_weights, prepare_tensorrt_model, prepare_test_datalist + + +def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False): + preprocess_transforms = [ + LoadImaged(keys="image", reader="NibabelReader", to_gpu=gpu_loading_flag), + EnsureChannelFirstd(keys="image"), + Orientationd(keys=["image"], axcodes="RAS"), + Spacingd(keys=["image"], pixdim=(1.5, 1.5, 1.5), mode="bilinear"), + NormalizeIntensityd(keys="image", nonzero=True), + ScaleIntensityd( + keys=["image"], + minv=-1.0, + maxv=1.0, + ), + ] + + if gpu_transforms_flag and not gpu_loading_flag: + preprocess_transforms.insert(1, EnsureTyped(keys="image", device=device, track_meta=True)) + infer_transforms = Compose(preprocess_transforms) + + return infer_transforms + + +def get_post_transforms(infer_transforms): + post_transforms = Compose( + [ + Activationsd(keys="pred", softmax=True), + AsDiscreted(keys="pred", argmax=True), + Invertd( + keys="pred", + transform=infer_transforms, + orig_keys="image", + nearest_interp=True, + to_tensor=True, + ), + ] + ) + return post_transforms + + +def get_model(device, weights_path, trt_model_path, trt_flag=False): + if not trt_flag: + model = SegResNet( + spatial_dims=3, + in_channels=1, + out_channels=105, + init_filters=32, + blocks_down=[1, 2, 2, 4], + blocks_up=[1, 1, 1], + dropout_prob=0.2, + ) + weights = torch.load(weights_path) + model.load_state_dict(weights) + model.to(device) + model.eval() + else: + model = torch.jit.load(trt_model_path) + return model + + +def 
run_inference(data_list, infer_transforms, model, device, benchmark_type): + total_time_dict = {} + roi_size = (96, 96, 96) + sw_batch_size = 4 + + for idx, sample in enumerate(data_list): + start = timer() + data = infer_transforms({"image": sample}) + + with torch.no_grad(): + input_image = ( + data["image"].unsqueeze(0).to(device) + if benchmark_type in ["trt", "original"] + else data["image"].unsqueeze(0) + ) + + output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model) + output_image = output_image.cpu() + + end = timer() + + del data + del input_image + del output_image + torch.cuda.empty_cache() + gc.collect() + + sample_name = sample.split("/")[-1] + if idx > 0: + total_time_dict[sample_name] = end - start + print(f"Time taken for {sample_name}: {end - start} seconds") + return total_time_dict + + +def main(): + parser = argparse.ArgumentParser(description="Run inference benchmark.") + parser.add_argument("--benchmark_type", type=str, default="original", help="Type of benchmark to run") + args = parser.parse_args() + + ### Prepare the environment + root_dir = "." 
+ torch.backends.cudnn.benchmark = True + torch_tensorrt.runtime.set_multi_device_safe_mode(True) + device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") + train_files = prepare_test_datalist(root_dir) + # since the dataset is too large, the smallest 31 files are used for warm up (1 file) and benchmarking (30 files) + train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:31] + weights_path = prepare_model_weights(root_dir=root_dir, bundle_name="wholeBody_ct_segmentation") + trt_model_name = "model_trt.ts" + trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name) + + gpu_transforms_flag = "gpu_transforms" in args.benchmark_type + gpu_loading_flag = "gds" in args.benchmark_type + trt_flag = "trt" in args.benchmark_type + # Get components + infer_transforms = get_transforms(device, gpu_loading_flag, gpu_transforms_flag) + model = get_model(device, weights_path, trt_model_path, trt_flag) + # Run inference + total_time_dict = run_inference(train_files, infer_transforms, model, device, args.benchmark_type) + # Save the results + df = pd.DataFrame(list(total_time_dict.items()), columns=["file_name", "time"]) + df.to_csv(os.path.join(root_dir, f"time_{args.benchmark_type}.csv"), index=False) + + +if __name__ == "__main__": + main() diff --git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py new file mode 100644 index 000000000..60486b7bf --- /dev/null +++ b/acceleration/fast_inference_tutorial/utils.py @@ -0,0 +1,90 @@ +# Copyright (c) MONAI Consortium +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import glob +import os +import shutil + +import monai +import torch +from monai.apps import download_and_extract +from monai.data.torchscript_utils import save_net_with_metadata +from monai.networks.nets import SegResNet +from monai.networks.utils import convert_to_trt + + +def prepare_test_datalist(root_dir): + resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar" + + compressed_file = os.path.join(root_dir, "Task03_Liver.tar") + data_root = os.path.join(root_dir, "Task03_Liver") + if not os.path.exists(data_root): + download_and_extract(resource, compressed_file, root_dir) + + nii_dir = os.path.join(data_root, "imagesTs_nii") + if not os.path.exists(nii_dir): + os.makedirs(nii_dir, exist_ok=True) + train_gz_files = sorted(glob.glob(os.path.join(data_root, "imagesTs", "*.nii.gz"))) + for file in train_gz_files: + new_file = file.replace(".nii.gz", ".nii") + if not os.path.exists(new_file): + os.system(f"gzip -dc {file} > {new_file}") + shutil.copy(new_file, nii_dir) + else: + print(f"Test data already exists at {nii_dir}") + + files = sorted(glob.glob(os.path.join(nii_dir, "*.nii"))) + return files + + +def prepare_model_weights(root_dir, bundle_name="spleen_ct_segmentation"): + bundle_path = os.path.join(root_dir, bundle_name) + weights_path = os.path.join(root_dir, "model.pt") + if not os.path.exists(weights_path): + monai.bundle.download(name=bundle_name, bundle_dir=root_dir) + + weights_original_path = os.path.join(bundle_path, "models", "model.pt") + shutil.copy(weights_original_path, weights_path) + else: + print(f"Weights 
already exist at {weights_path}") + + return weights_path + + +def prepare_tensorrt_model(root_dir, weights_path, trt_model_name="model_trt.ts"): + trt_path = os.path.join(root_dir, trt_model_name) + if not os.path.exists(trt_path): + model = SegResNet( + spatial_dims=3, + in_channels=1, + out_channels=105, + init_filters=32, + blocks_down=[1, 2, 2, 4], + blocks_up=[1, 1, 1], + dropout_prob=0.2, + ) + weights = torch.load(weights_path) + model.load_state_dict(weights) + torchscript_model = convert_to_trt( + model=model, + precision="fp16", + input_shape=[1, 1, 96, 96, 96], + dynamic_batchsize=[1, 4, 4], + use_trace=True, + verify=False, + ) + + save_net_with_metadata(torchscript_model, os.path.splitext(trt_path)[0])  # save under root_dir so the existence check above finds it + else: + print(f"TensorRT model already exists at {trt_path}") + + return trt_path diff --git a/runner.sh b/runner.sh index e2bd1f853..94773d273 100755 --- a/runner.sh +++ b/runner.sh @@ -70,6 +70,7 @@ doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TCIA_PROSTATEx_Pros doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_functional.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_compose.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TensorRT_inference_acceleration.ipynb) +doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" fast_inference_tutorial.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_benchmark.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" modular_patch_inferer.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" GDS_dataset.ipynb) @@ -118,6 +119,7 @@ skip_run_papermill=("${skip_run_papermill[@]}" .*swinunetr_finetune*) skip_run_papermill=("${skip_run_papermill[@]}" .*active_learning*) skip_run_papermill=("${skip_run_papermill[@]}" .*transform_visualization*) # https://github.com/Project-MONAI/tutorials/issues/1155 
skip_run_papermill=("${skip_run_papermill[@]}" .*TensorRT_inference_acceleration*) +skip_run_papermill=("${skip_run_papermill[@]}" .*fast_inference_tutorial*) skip_run_papermill=("${skip_run_papermill[@]}" .*mednist_classifier_ray*) # https://github.com/Project-MONAI/tutorials/issues/1307 skip_run_papermill=("${skip_run_papermill[@]}" .*TorchIO_MONAI_PyTorch_Lightning*) # https://github.com/Project-MONAI/tutorials/issues/1324 skip_run_papermill=("${skip_run_papermill[@]}" .*GDS_dataset*) # https://github.com/Project-MONAI/tutorials/issues/1324
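
The improvement figures quoted in the notebook's "Analyze the Total Inference Time" cell can be reproduced from the totals printed in the notebook output. The sketch below is a minimal standalone check, assuming the four totals shown there (a single NVIDIA RTX A6000 run); `relative_improvement` is a hypothetical helper name used only for illustration and is not part of the PR.

```python
# Total inference time per workflow, in seconds, copied from the notebook's
# printed `total_time` output (assumption: these rounded values are used as-is).
total_time = {
    "original_time": 360.122527,
    "trt_time": 356.739906,
    "trt_gpu_transforms_time": 326.563954,
    "trt_gds_gpu_transforms_time": 160.416928,
}


def relative_improvement(baseline: float, optimized: float) -> float:
    """Fraction of the baseline time saved by the optimized workflow."""
    return (baseline - optimized) / baseline


baseline = total_time["original_time"]

# Same formula as the notebook: (original - optimized) / original.
improvements = {
    name: relative_improvement(baseline, seconds)
    for name, seconds in total_time.items()
    if name != "original_time"
}

# Equivalent view of the same data as a speedup factor (baseline / optimized).
speedups = {
    name: baseline / seconds
    for name, seconds in total_time.items()
    if name != "original_time"
}

for name in improvements:
    print(f"{name}: {improvements[name]:.2%} less time, {speedups[name]:.2f}x speedup")
```

Every improvement here is measured against the original workflow rather than against the previous step, so the three fractions are cumulative and cannot be added together.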