Commit c488dc8

Update CIFAR.qmd
1 parent 99dba83 commit c488dc8

1 file changed

+7
-0
lines changed

Toolbox/Data/CIFAR.qmd

Lines changed: 7 additions & 0 deletions
@@ -31,8 +31,13 @@ CIFAR-10 (Canadian Institute For Advanced Research) is a dataset consisting of
3131
- **Transfer Learning**: CIFAR-10 is commonly used to fine-tune pre-trained CNNs in order to assess model generalizability, and to test domain adaptation techniques by training models on CIFAR-10 and then fine-tuning them for real-world tasks.
3232
- **Data Augmentation**: Image augmentation can synthetically increase the size of CIFAR-10 by adding randomly rotated, flipped, cropped, and color-transformed copies of the images. This may also help models generalize better. Other techniques, such as MixUp and CutMix, can also improve classification accuracy.
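
As an illustration of the augmentation ideas above, here is a minimal NumPy sketch of a random horizontal flip followed by MixUp. It is not taken from this resource: the helper names `random_flip` and `mixup`, the `alpha=0.2` setting, and the random stand-in batch are all assumptions made for the example, and a real pipeline would load actual CIFAR-10 images and likely use a library's built-in transforms.

```python
import numpy as np

def random_flip(images, rng):
    """Horizontally flip each image with probability 0.5.

    images: array of shape (N, H, W, C).
    """
    flip = rng.random(len(images)) < 0.5
    out = images.copy()
    out[flip] = out[flip, :, ::-1, :]  # reverse the width axis
    return out

def mixup(images, labels_onehot, alpha, rng):
    """Blend each example (and its label) with a randomly chosen partner.

    A single mixing weight lam ~ Beta(alpha, alpha) is drawn per batch.
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed_x = lam * images + (1.0 - lam) * images[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y

# Hypothetical stand-in for a CIFAR-10 batch: 8 RGB images of 32x32 pixels
# with values in [0, 1], plus one-hot labels over 10 classes.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype(np.float32)
labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=8)]

aug_x, aug_y = mixup(random_flip(images, rng), labels, alpha=0.2, rng=rng)
```

Because MixUp produces soft labels (each row of `aug_y` still sums to 1 but may be split between two classes), the training loss needs to accept label distributions rather than integer class indices.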
3333

34+
35+
#### Ethical Concerns and Acknowledgments
36+
CIFAR-10's parent dataset, 80 Million Tiny Images, has been widely criticized for its content and collection method. It was created by scraping images from the internet without the knowledge or consent of the photos' owners. In 2020, it was also found to contain a range of racist, sexist, and otherwise offensive labels. For that reason, MIT and NYU took it offline that year and asked researchers and other users to refrain from using copies of the dataset.
37+
3438
#### Related datasets & projects
3539
- **CIFAR-100**: A sister dataset to CIFAR-10, CIFAR-100 contains 100 classes grouped into 20 superclasses, with 600 images per class. Compared to CIFAR-10, CIFAR-100 has more classes and fewer images per class, which makes for a more fine-grained classification challenge.
40+
- **Dollar street 10**: Derived from the MLCommons Dollar Street Dataset, Dollar street 10 is a great alternative to CIFAR-10 that is not ethically problematic. Unlike CIFAR-10, all the images in Dollar street 10 are either in the public domain, licensed for academic, commercial, and non-commercial use, or released under CC-BY and CC-BY-SA 4.0. Like CIFAR-10, its images cover 10 categories of everyday, real-life objects, but the dataset was created to visually capture the socioeconomic diversity of these objects around the globe in order to avoid bias in machine learning models.
3641

3742
## Questions?
3843
If you have any lingering questions about this resource, feel free to post them on the [ML+X Nexus Q&A](https://github.com/UW-Madison-DataScience/ML-X-Nexus/discussions/categories/q-a) on GitHub. We will update this resource as new information or applications arise.
@@ -42,3 +47,5 @@ If you have any lingering questions about this resource, feel free to post them
4247
- [CIFAR-10 Kaggle](https://www.kaggle.com/c/cifar-10/): CIFAR-10 Kaggle image prediction competition.
4348
- [Papers With Code (CIFAR-10)](https://paperswithcode.com/dataset/cifar-10): Additional documentation on CIFAR-10
4449
- [Papers With Code (CIFAR-100)](https://paperswithcode.com/dataset/cifar-100): CIFAR-100 documentation
50+
- [CIFAR Ethical Acknowledgement](https://venturebeat.com/ai/mit-takes-down-80-million-tiny-images-data-set-due-to-racist-and-offensive-content/): News article that details the ethical concerns about 80 Million Tiny Images that led to its shutdown.
51+
- [Dollar street 10](https://zenodo.org/records/10970014): More details and a download link for Dollar street 10.
