Welcome to the Image Classification Project, which applies deep learning techniques to the analysis of camera trap images. The motivation for tackling wildlife species classification was to develop an intelligent tool that assists researchers in the rapid and accurate identification of animal species in camera trap images. The images were collected in Taï National Park by the Wild Chimpanzee Foundation and the Max Planck Institute for Evolutionary Anthropology, and were published as part of a competition on the DrivenData platform. Camera traps are one of the best tools available for monitoring wildlife populations, but the vast amount of data they generate requires advanced processing. By applying deep learning techniques, the aim was to support conservation efforts by automating the analysis of this data. The classification involved eight categories: seven animal species (birds, civets, duikers, hogs, leopards, other monkeys, rodents) and an additional class for images with no animals. The goal was to build a model that could predict whether an image contains one of these species or belongs to the empty class.
The project implemented and tested five models, including three convolutional neural network architectures. Key evaluation metrics included loss, accuracy, precision, recall, and F1-score. The study also explored the impact of data splitting and augmentation techniques.
The performance of all models was evaluated using loss and accuracy on the validation set. Below is a table summarizing the configurations and results of all models.
The table presents a comparison of all models using loss and accuracy metrics. The ResNet-101 model demonstrated better performance when the data was split using the first method, stratified k-fold, which ensures an even distribution of classes between the datasets. In contrast, when the data was split based on "site" location clustering, the model achieved significantly worse results, with a classification accuracy of only 38%. Consequently, further tests were conducted using the first data split method.
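To illustrate the two splitting strategies (this is a minimal scikit-learn sketch, not the project's actual code; the `label` and `site` column names and the toy data are assumptions):

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, GroupKFold

# Toy metadata: one row per image, with its class label and camera-trap site.
labels = pd.DataFrame({
    "image_id": [f"img_{i:02d}" for i in range(12)],
    "label": ["duiker", "bird", "blank", "rodent"] * 3,
    "site": ["S01", "S01", "S02", "S02", "S03", "S03"] * 2,
})

# Method 1: stratified k-fold -- every fold keeps the class proportions balanced.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(labels, labels["label"]):
    print("stratified val classes:", labels.loc[val_idx, "label"].value_counts().to_dict())

# Method 2: grouping by "site" -- images from the same camera location never
# appear in both the training and validation sets.
gkf = GroupKFold(n_splits=3)
for train_idx, val_idx in gkf.split(labels, labels["label"], groups=labels["site"]):
    print("grouped val sites:", sorted(labels.loc[val_idx, "site"].unique()))
```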
Surprisingly, the application of augmentation techniques to the ResNet model resulted in worse performance, as indicated by both the loss function values and classification accuracy.
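The exact augmentation pipeline is not reproduced here; the sketch below shows a typical Keras augmentation block of the kind tested, assuming a TensorFlow/Keras setup (consistent with the CUDA 11.2 / cuDNN 8.1 environment described later). The specific transforms and parameters are illustrative only:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative augmentation block; the transforms and ranges actually used in
# the experiments may differ.
augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # rotate by up to ~36 degrees in either direction
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

# Augmentation is normally applied to the training pipeline only, e.g.:
# train_ds = train_ds.map(lambda x, y: (augmentation(x, training=True), y))
```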
In pursuit of better results, two additional architectures were tested: EfficientNet and ConvNeXt. Both models outperformed ResNet, but ConvNeXt-Base proved to be the best, achieving the lowest loss value (0.44) and the highest overall classification accuracy (84.63%).
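For context, here is a minimal transfer-learning sketch using `ConvNeXtBase` from `tf.keras.applications` (available in TensorFlow/Keras 2.10+). The input size, classification head, and optimizer settings are assumptions, not the configuration that produced the reported results:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8  # seven species + the "blank" (no animal) class

# Pretrained ConvNeXt-Base backbone without its ImageNet classification head.
backbone = tf.keras.applications.ConvNeXtBase(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3),
)
backbone.trainable = False  # optionally unfreeze later for fine-tuning

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```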
To analyze the classification performance for each class, metrics such as precision, recall, and F1-score were calculated. The following chart illustrates the performance for all classes.
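Per-class metrics of this kind can be computed, for example, with scikit-learn's `classification_report`; the class names and predictions below are placeholders, not the project's results:

```python
from sklearn.metrics import classification_report

# Placeholder class names and predictions; in practice y_true comes from the
# validation labels and y_pred from np.argmax(model.predict(val_ds), axis=1).
class_names = ["bird", "blank", "civet", "duiker",
               "hog", "leopard", "monkey", "rodent"]
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 0, 0, 1]

print(classification_report(y_true, y_pred, target_names=class_names, zero_division=0))
```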
The intended goal was successfully achieved through the implementation and analysis of deep neural network models for classifying animal species in camera trap images. A key factor was the appropriate data split method, which significantly impacted model training. However, the application of augmentation did not yield the expected results and led to a decrease in classification accuracy.
To enhance results in future studies, consider:
- Object Detection: Utilize detection models to identify objects within images.
- Classification Refinement: Use convolutional neural networks to classify the detected objects (a minimal two-stage sketch follows below).
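A minimal sketch of such a two-stage pipeline, with a placeholder detector and a stand-in classifier so the example runs end to end; in practice both stages would be trained models:

```python
import numpy as np
import tensorflow as tf

CLASS_NAMES = ["bird", "blank", "civet", "duiker",
               "hog", "leopard", "monkey", "rodent"]

def detect_animals(image):
    """Placeholder detector: in practice this would be a trained detection model
    returning bounding boxes as (x_min, y_min, x_max, y_max) in pixels."""
    h, w = image.shape[:2]
    return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]  # one dummy box

def classify_crop(crop, classifier):
    """Resize a detected crop and run the species classifier on it."""
    crop = tf.image.resize(crop, (224, 224))
    probs = classifier(tf.expand_dims(crop, axis=0), training=False)
    return CLASS_NAMES[int(tf.argmax(probs, axis=-1)[0])]

# Stand-in classifier so the sketch runs end to end; in practice this would be
# the trained CNN from the classification stage.
classifier = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax"),
])

image = np.random.randint(0, 255, size=(480, 640, 3)).astype("float32")
for (x1, y1, x2, y2) in detect_animals(image):
    print("predicted:", classify_crop(image[y1:y2, x1:x2], classifier))
```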
This guide provides all the necessary steps to configure a local development environment, including setting up a Python virtual environment (venv), installing dependencies, and running the application.
This project was developed on Windows 11, utilizing an NVIDIA RTX 3060 GPU with 6 GB of VRAM for faster model training and inference. The system also includes an Intel Core i7-10750H CPU with 6 cores and 32 GB of RAM, ensuring smooth processing of the datasets.
To leverage the GPU for deep learning tasks, I installed CUDA (Compute Unified Device Architecture) and cuDNN (CUDA Deep Neural Network library) from NVIDIA, which are essential for accelerating deep learning operations. Here's how the installation process was completed:
If you haven't already, download and install Visual Studio Code from the official website: Download Visual Studio Code
Before setting up the Python environment, make sure to install CUDA 11.2.2 and cuDNN 8.1.1 to enable GPU acceleration for deep learning tasks. Follow these steps:
- Download CUDA 11.2.2 from the official NVIDIA website.
- Install CUDA and ensure it is installed in the default path: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\`.
- Download cuDNN 8.1.1 from the official NVIDIA cuDNN page.
- After downloading cuDNN, extract the contents and copy the files to the respective directories within the CUDA Toolkit installation (e.g., `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\`).
- Add the following paths to your system Environment Variables under `Path` to ensure that your system can locate the CUDA and cuDNN libraries:
  - `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin`
  - `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp`
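Once the Python environment and the project dependencies are installed (see the steps below), you can verify from Python that the GPU is visible; this check assumes a TensorFlow-based setup:

```python
import tensorflow as tf

# Should list at least one GPU device if CUDA and cuDNN are set up correctly.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```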
Open Visual Studio Code and create a new working folder for the project. You can create a Python virtual environment using one of the following methods:
- Open the Command Palette:
  - Shortcut: `Ctrl+Shift+P` (Windows/Linux)
  - Shortcut: `Cmd+Shift+P` (macOS)
- Type and select Python: Select Interpreter.
- Choose Create New Virtual Environment.
- Follow the prompts:
- Select Venv as the type of virtual environment.
- Choose your Python installation (e.g., Python 3.10).
- After the virtual environment is created, VS Code will automatically switch to it.
- If VS Code does not automatically detect the environment, use the Command Palette (shortcut `Ctrl+Shift+P`), type `Python: Select Interpreter`, and choose the interpreter associated with your newly created environment.
- In VS Code, open the PowerShell terminal (shortcut `Ctrl + ~`). Use the following command to install the `virtualenv` package:
python -m pip install virtualenv
Afterwards, create a new Python virtual environment using:
python -m virtualenv venv -p="C:\Program Files\Python310\python.exe"
You may also want to update `pip`:
python.exe -m pip install --upgrade pip
- To activate the virtual environment on Windows, use the following command:
.\venv\Scripts\activate.ps1
Once the interpreter is selected, install all required dependencies (for example, from the `requirements.txt` file created earlier with pinned package versions):
pip install -r requirements.txt
To save the current list of dependencies, run the following command:
pip freeze > requirements.txt
To use the project's Jupyter notebooks, install the `ipykernel` package. Run the following command inside your virtual environment:
pip install ipykernel
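After installing it, you can register the environment as a Jupyter kernel so that it appears in the notebook kernel picker (the kernel name below is only an example):
python -m ipykernel install --user --name=venv --display-name "Python (venv)"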
The structure of the project is based on the Cookiecutter Data Science project template, as described on DrivenData-Labs. This template provides a clear and organized framework for data science projects, ensuring consistency and best practices.
Deep learning project focused on animal classification in camera trap images.
├── LICENSE <- Open-source license if one is chosen
├── Makefile <- Makefile with convenience commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default mkdocs project; see www.mkdocs.org for details
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml <- Project configuration file with package metadata for
│ src and configuration for tools like black
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.cfg <- Configuration file for flake8
│
└── src <- Source code for use in this project.
│
├── __init__.py <- Makes src a Python module
│
├── config.py <- Store useful variables and configuration
│
├── dataset.py <- Scripts to download or generate data
│
├── features.py <- Code to create features for modeling
│
├── modeling
│ ├── __init__.py
│ ├── predict.py <- Code to run model inference with trained models
│ └── train.py <- Code to train models
│
└── plots.py <- Code to create visualizations