| 1 | +--- |
| 2 | +title: "CIFAR Dataset (CIFAR-10)" |
| 3 | +author: |
| 4 | + - name: Aidan O'Brien |
| 5 | + |
| 6 | + |
| 7 | +date: 2025/03/13 |
| 8 | +date-format: long |
| 9 | +image: "../../../images/cifar.png" |
| 10 | + |
| 11 | +categories: |
| 12 | + - Data |
| 13 | + - Image |
| 14 | + - Computer vision |
| 15 | + - Image classification |
| 16 | + - CNN |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## About this resource |
| 21 | +CIFAR-10 (named for the Canadian Institute For Advanced Research) is a dataset consisting of 60,000 32x32-pixel RGB color images in 10 classes, with 6,000 images per class. CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset (created by MIT and NYU), with labels assigned by students who were paid to label the images. The images cover a broad range of everyday real-world objects while remaining simple. CIFAR-10 is widely used as a benchmark for training and evaluating computer vision models. |
| 22 | + |
| 23 | +#### Key features |
| 24 | +- **Small Size**: Each image is only 32x32 pixels, and with only 60,000 images in total, the dataset is computationally manageable. |
| 25 | +- **Diversity**: The dataset covers a broad range of real-world objects, mainly animals and transportation vehicles. To be exact, the classes are airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. This diversity lets CIFAR-10 test whether models generalize across different types of objects rather than specializing in a narrow domain. |
| 26 | +- **Balance**: Each class contains an equal number of images (6,000), preventing class imbalance issues. |
| 27 | + |
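As a rough illustration of why the dataset is computationally manageable, the sketch below works out the raw size from the figures above. It assumes uncompressed storage at one byte per channel value, and `is_balanced` is a hypothetical helper for checking class balance on any list of labels, not part of an official loader.

```python
from collections import Counter

NUM_IMAGES = 60_000
NUM_CLASSES = 10
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

# Assumption: uncompressed 8-bit RGB, i.e. one byte per channel value.
BYTES_PER_IMAGE = HEIGHT * WIDTH * CHANNELS
IMAGES_PER_CLASS = NUM_IMAGES // NUM_CLASSES
TOTAL_MIB = NUM_IMAGES * BYTES_PER_IMAGE / 2**20


def is_balanced(labels):
    """Return True if every label value occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


print(BYTES_PER_IMAGE)      # 3072
print(IMAGES_PER_CLASS)     # 6000
print(round(TOTAL_MIB, 1))  # 175.8
```

Under these assumptions the whole dataset fits comfortably in memory on a typical laptop, which is part of why it is popular for quick experiments.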
| 28 | +#### Key applications |
| 29 | +- **Image Classification Benchmarking**: Due to its wide popularity, CIFAR-10 is commonly used as a benchmark dataset to evaluate the performance of various image classification models, including convolutional neural networks (CNNs), vision transformers (ViTs), and other deep learning architectures. |
| 30 | +- **Training, Testing, Experimentation**: CIFAR-10 is commonly used for quick supervised learning tasks such as training CNNs for object recognition, evaluating feature extraction techniques (like PCA), and testing optimization techniques. |
| 31 | +- **Transfer Learning**: Pre-trained CNNs are commonly fine-tuned on CIFAR-10 to assess model generalizability, and domain adaptation techniques are often tested on CIFAR-10 before models are fine-tuned for real-world tasks. |
| 32 | +- **Data Augmentation**: Using image augmentation, the size of CIFAR-10 can be synthetically increased by applying random rotations, flips, crops, and color transformations. This may also help models generalize better. Other techniques like MixUp and CutMix can also improve classification accuracy. |
| 33 | + |
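The augmentations listed above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: images are assumed to be lists of rows of `(r, g, b)` tuples, labels are assumed to be one-hot lists, and the function names and the default `padding=4` are illustrative choices rather than anything prescribed by CIFAR-10.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, padding=4, rng=random):
    """Zero-pad every side by `padding`, then crop back to the
    original size at a random offset."""
    height, width = len(image), len(image[0])
    zero = (0, 0, 0)
    padded_width = width + 2 * padding
    padded = (
        [[zero] * padded_width for _ in range(padding)]
        + [[zero] * padding + list(row) + [zero] * padding for row in image]
        + [[zero] * padded_width for _ in range(padding)]
    )
    # randint is inclusive, so offsets range over every valid crop position.
    top = rng.randint(0, 2 * padding)
    left = rng.randint(0, 2 * padding)
    return [row[left:left + width] for row in padded[top:top + height]]


def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with weight `lam` in [0, 1]."""
    mixed_image = [
        [
            tuple(lam * a + (1 - lam) * b for a, b in zip(px_a, px_b))
            for px_a, px_b in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label
```

In MixUp, the weight `lam` is typically sampled per batch from a Beta distribution, for example with `random.betavariate(alpha, alpha)` for a chosen `alpha`. Deep learning libraries usually ship tensor-based equivalents of these operations, which are preferable for real training runs.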
| 34 | + |
| 35 | +#### Ethical Concerns and Acknowledgments |
| 36 | +CIFAR-10's parent dataset has been widely criticized for its content and collection method. 80 Million Tiny Images was created by scraping images from the internet without the knowledge or consent of the photos' owners. In 2020, it was found to contain a range of racist, sexist, and other offensive labels. MIT and NYU took it offline for that reason and asked researchers and other users to refrain from using copies of the dataset. |
| 37 | + |
| 38 | +#### Related datasets & projects |
| 39 | +- **CIFAR-100**: A sister dataset to CIFAR-10, CIFAR-100 contains 100 classes grouped into 20 superclasses, with 600 images per class. Compared to CIFAR-10, CIFAR-100 has more classes and fewer images per class, making it a more fine-grained classification challenge. |
| 40 | +- **Dollar street 10**: The MLCommons Dollar Street dataset is an alternative to CIFAR-10 that is not ethically problematic. Unlike CIFAR-10, all the images in Dollar street 10 are in the public domain, licensed for academic, commercial, and non-commercial use, or released under CC-BY or CC-BY-SA 4. Like CIFAR-10, it covers 10 categories of everyday, real-life objects, but it was created to visually capture the socioeconomic diversity of these objects around the globe in order to avoid bias in machine learning models. |
| 41 | + |
| 42 | +## Questions? |
| 43 | +If you have any lingering questions about this resource, feel free to post them on the [ML+X Nexus Q&A](https://github.com/UW-Madison-DataScience/ML-X-Nexus/discussions/categories/q-a) on GitHub. We will update this resource as new information or applications arise. |
| 44 | + |
| 45 | +## See also |
| 47 | +- [CIFAR-10 Kaggle](https://www.kaggle.com/c/cifar-10/): CIFAR-10 Kaggle image prediction competition. |
| 48 | +- [Papers With Code (CIFAR-10)](https://paperswithcode.com/dataset/cifar-10): Additional documentation on CIFAR-10 |
| 49 | +- [Papers With Code (CIFAR-100)](https://paperswithcode.com/dataset/cifar-100): CIFAR-100 documentation |
| 50 | +- [CIFAR Ethical Acknowledgement](https://venturebeat.com/ai/mit-takes-down-80-million-tiny-images-data-set-due-to-racist-and-offensive-content/): News article that details the ethical concerns about 80 Million Tiny Images that led to its shutdown. |
| 51 | +- [Dollar street 10](https://zenodo.org/records/10970014): More details and download link to Dollar street 10. |