
Commit 09373f2

Add v4.0 suite on Readme (#716)
* Add v4.0 suite on Readme
* Update README.md
* Update README.md
* Update README.md: Added links to src code for new benchmarks
1 parent: 42aaab3

README.md

Lines changed: 17 additions & 10 deletions
@@ -31,22 +31,29 @@ Each reference implementation provides the following:

# Running Benchmarks

-These benchmarks have been tested on the following machine configuration:
-
-* 16 CPUs, one Nvidia P100.
-* Ubuntu 16.04, including docker with nvidia support.
-* 600GB of disk (though many benchmarks do require less disk).
-* Either CPython 2 or CPython 3, depending on benchmark (see Dockerfiles for details).
-
-Generally, a benchmark can be run with the following steps:
+Follow instructions on the Readme of each benchmark. Generally, a benchmark can be run with the following steps:

1. Setup docker & dependencies. There is a shared script (install_cuda_docker.sh) to do this. Some benchmarks will have additional setup, mentioned in their READMEs.
2. Download the dataset using `./download_dataset.sh`. This should be run outside of docker, on your host machine. This should be run from the directory it is in (it may make assumptions about CWD).
3. Optionally, run `verify_dataset.sh` to ensure the dataset was successfully downloaded.
4. Build and run the docker image; the command to do this is included with each benchmark.

-Each benchmark will run until the target quality is reached and then stop, printing timing results.
+Each benchmark will run until the target quality is reached and then stop, printing timing results.
+
+Some of these benchmarks are rather slow or take a long time to run on the reference hardware. We expect to see significant performance improvements with more hardware and optimized implementations.

-Some these benchmarks are rather slow or take a long time to run on the reference hardware (i.e. 16 CPUs and one P100). We expect to see significant performance improvements with more hardware and optimized implementations.
+# MLPerf Training v4.0 (Submission Deadline May 10, 2024)
+*Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

+| model | reference implementation | framework | dataset |
+| ---- | ---- | ---- | ---- |
+| resnet50v1.5 | [vision/classification_and_detection](https://github.com/mlcommons/training/tree/master/image_classification) | tensorflow2 | Imagenet |
+| RetinaNet | [vision/object detection](https://github.com/mlcommons/training/tree/master/single_stage_detector) | pytorch | OpenImages |
+| 3DUnet | [vision/image segmentation](https://github.com/mlcommons/training/tree/master/image_segmentation/pytorch) | pytorch | KiTS19 |
+| Stable Diffusionv2 | [image generation](https://github.com/mlcommons/training/tree/master/stable_diffusion) | pytorch | LAION-400M-filtered |
+| BERT-large | [language/nlp](https://github.com/mlcommons/training/tree/master/language_model/tensorflow/bert) | tensorflow | Wikipedia 2020/01/01 |
+| GPT3 | [language/llm](https://github.com/mlcommons/training/tree/master/large_language_model) | paxml,megatron-lm | C4 |
+| LLama2 70B-LoRA | [language/LLM fine-tuning](https://github.com/mlcommons/training/tree/master/llama2_70b_lora) | pytorch | SCROLLS govtReport |
+| DLRMv2 | [recommendation](https://github.com/mlcommons/training/tree/master/recommendation_v2/torchrec_dlrm) | torchrec | Criteo 4TB multi-hot |
+| RGAT | [GNN](https://github.com/mlcommons/training/tree/master/graph_neural_network) | pytorch | IGBFull |
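As orientation, here is a minimal shell sketch of the four generic steps as they read after this commit. It is an illustration only: the benchmark directory name, image tag, and docker flags below are placeholders, and each benchmark's own README gives the exact, authoritative build and run commands.

```bash
#!/usr/bin/env bash
# Sketch of the generic per-benchmark flow described in the README diff above.
# BENCHMARK_DIR and IMAGE_TAG are placeholders; real commands are benchmark-specific.
set -euo pipefail

BENCHMARK_DIR=some_benchmark      # placeholder: one of the benchmark directories in this repo
IMAGE_TAG=mlperf-example:latest   # placeholder image name

# 1. Set up docker and dependencies via the shared script (path may vary per checkout).
./install_cuda_docker.sh

# 2. Download the dataset on the host, from the directory the script lives in.
cd "$BENCHMARK_DIR"
./download_dataset.sh

# 3. Optionally verify the download.
./verify_dataset.sh

# 4. Build and run the docker image; substitute the command from the benchmark's README.
docker build -t "$IMAGE_TAG" .
docker run --gpus all "$IMAGE_TAG"
```

In practice steps 2–4 differ per benchmark, so treat this only as a summary of the flow, not a drop-in script.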
