Toolbox/Data/CIFAR.qmd (7 additions & 0 deletions)
@@ -31,8 +31,13 @@ CIFAR-10 (Canadian Institute For Advanced Research) is a dataset consisting of
- **Transfer Learning**: Pre-trained CNNs are commonly fine-tuned on CIFAR-10 to assess model generalizability and to test domain adaptation techniques before applying them to real-world tasks.
- **Data Augmentation**: Image augmentation can synthetically increase the effective size of CIFAR-10 by applying random rotations, flips, crops, and color transformations, which may also help models generalize better. Other techniques such as MixUp and CutMix can also improve classification accuracy.
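
As a rough illustration of the augmentation ideas above, the sketch below shows a horizontal flip and a MixUp-style blend of two images and their one-hot labels. It uses plain Python lists to stay dependency-free; the function names, the tiny 2x2 stand-in images, and the `alpha=1.0` default are assumptions made for illustration, not part of any CIFAR-10 tooling. In practice an array or deep learning library would typically handle this.

```python
import random


def horizontal_flip(image):
    """Mirror an image left to right; `image` is a list of rows of RGB pixels."""
    return [row[::-1] for row in image]


def mixup(image_a, label_a, image_b, label_b, alpha=1.0, rng=random):
    """Blend two images and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    mixed_image = [
        [
            [lam * ca + (1.0 - lam) * cb for ca, cb in zip(pixel_a, pixel_b)]
            for pixel_a, pixel_b in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed_image, mixed_label, lam


# Tiny 2x2 RGB stand-ins for 32x32 CIFAR-10 images (hypothetical values).
img_a = [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 255]]]
img_b = [[[0, 0, 0], [10, 10, 10]], [[20, 20, 20], [30, 30, 30]]]
one_hot_a = [1.0] + [0.0] * 9               # class 0 of 10
one_hot_b = [0.0] * 3 + [1.0] + [0.0] * 6   # class 3 of 10

flipped = horizontal_flip(img_a)
mixed_img, mixed_lbl, lam = mixup(
    img_a, one_hot_a, img_b, one_hot_b, rng=random.Random(0)
)
```

The mixed label keeps weight `lam` on the first class and `1 - lam` on the second, so it still sums to 1 and can be used with a standard cross-entropy loss over soft targets.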
+
+#### Ethical Concerns and Acknowledgments
+CIFAR's parent dataset, 80 Million Tiny Images, has been widely criticized for its content and collection method. It was created by scraping images from the internet without the knowledge or consent of the photos' owners. In 2020, it was found to contain a range of racist, sexist, and otherwise offensive labels; for that reason, MIT and NYU took it offline that year and asked other researchers and users to refrain from using copies of the dataset.
+
#### Related datasets & projects
- **CIFAR-100**: A sister dataset to CIFAR-10, CIFAR-100 contains 100 classes grouped into 20 superclasses, with 600 images per class. Compared to CIFAR-10, CIFAR-100 has more classes and fewer images per class, providing a more fine-grained classification challenge.
+- **Dollar street 10**: The MLCommons Dollar Street Dataset is a great alternative to CIFAR-10 without the same ethical problems. Unlike CIFAR-10, all images in Dollar street 10 are either in the public domain, licensed for academic, commercial, and non-commercial use, or released under CC-BY or CC-BY-SA 4.0. Like CIFAR-10, it covers 10 categories of everyday, real-life objects, but it was created to visually capture the socioeconomic diversity of these objects around the globe in order to reduce bias in machine learning models.
## Questions?
If you have any lingering questions about this resource, feel free to post them on the [ML+X Nexus Q&A](https://github.com/UW-Madison-DataScience/ML-X-Nexus/discussions/categories/q-a) on GitHub. We will update this resource as new information or applications arise.
@@ -42,3 +47,5 @@ If you have any lingering questions about this resource, feel free to post them
- [Papers With Code (CIFAR-10)](https://paperswithcode.com/dataset/cifar-10): Additional documentation on CIFAR-10
- [Papers With Code (CIFAR-100)](https://paperswithcode.com/dataset/cifar-100): CIFAR-100 documentation
+- [CIFAR Ethical Acknowledgement](https://venturebeat.com/ai/mit-takes-down-80-million-tiny-images-data-set-due-to-racist-and-offensive-content/): News article detailing the ethical concerns about 80 Million Tiny Images that led to its shutdown.
+- [Dollar street 10](https://zenodo.org/records/10970014): More details and a download link for Dollar street 10.