
Commit 09373f2

Add v4.0 suite on Readme (#716)
* Add v4.0 suite on Readme
* Update README.md
* Update README.md
* Update README.md: Added links to src code for new benchmarks
1 parent: 42aaab3

README.md

Lines changed: 17 additions & 10 deletions
@@ -31,22 +31,29 @@ Each reference implementation provides the following:

# Running Benchmarks

-These benchmarks have been tested on the following machine configuration:
-
-* 16 CPUs, one Nvidia P100.
-* Ubuntu 16.04, including docker with nvidia support.
-* 600GB of disk (though many benchmarks do require less disk).
-* Either CPython 2 or CPython 3, depending on benchmark (see Dockerfiles for details).
-
-Generally, a benchmark can be run with the following steps:
+Follow instructions on the Readme of each benchmark. Generally, a benchmark can be run with the following steps:

1. Setup docker & dependencies. There is a shared script (install_cuda_docker.sh) to do this. Some benchmarks will have additional setup, mentioned in their READMEs.
2. Download the dataset using `./download_dataset.sh`. This should be run outside of docker, on your host machine. This should be run from the directory it is in (it may make assumptions about CWD).
3. Optionally, run `verify_dataset.sh` to ensure the dataset was successfully downloaded.
4. Build and run the docker image; the command to do this is included with each benchmark.

-Each benchmark will run until the target quality is reached and then stop, printing timing results.
+Each benchmark will run until the target quality is reached and then stop, printing timing results.
+
+Some of these benchmarks are rather slow or take a long time to run on the reference hardware. We expect to see significant performance improvements with more hardware and optimized implementations.

-Some these benchmarks are rather slow or take a long time to run on the reference hardware (i.e. 16 CPUs and one P100). We expect to see significant performance improvements with more hardware and optimized implementations.
+# MLPerf Training v4.0 (Submission Deadline May 10, 2024)
+*Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

+| model | reference implementation | framework | dataset |
+| ---- | ---- | ---- | ---- |
+| resnet50v1.5 | [vision/classification_and_detection](https://github.com/mlcommons/training/tree/master/image_classification) | tensorflow2 | Imagenet |
+| RetinaNet | [vision/object detection](https://github.com/mlcommons/training/tree/master/single_stage_detector) | pytorch | OpenImages |
+| 3DUnet | [vision/image segmentation](https://github.com/mlcommons/training/tree/master/image_segmentation/pytorch) | pytorch | KiTS19 |
+| Stable Diffusionv2 | [image generation](https://github.com/mlcommons/training/tree/master/stable_diffusion) | pytorch | LAION-400M-filtered |
+| BERT-large | [language/nlp](https://github.com/mlcommons/training/tree/master/language_model/tensorflow/bert) | tensorflow | Wikipedia 2020/01/01 |
+| GPT3 | [language/llm](https://github.com/mlcommons/training/tree/master/large_language_model) | paxml,megatron-lm | C4 |
+| LLama2 70B-LoRA | [language/LLM fine-tuning](https://github.com/mlcommons/training/tree/master/llama2_70b_lora) | pytorch | SCROLLS govtReport |
+| DLRMv2 | [recommendation](https://github.com/mlcommons/training/tree/master/recommendation_v2/torchrec_dlrm) | torchrec | Criteo 4TB multi-hot |
+| RGAT | [GNN](https://github.com/mlcommons/training/tree/master/graph_neural_network) | pytorch | IGBFull |
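As orientation, here is a minimal shell sketch of the four generic steps as they read after this commit. It is an illustration only: the benchmark directory name, image tag, and docker flags below are placeholders, and each benchmark's own README gives the exact, authoritative build and run commands.

```bash
#!/usr/bin/env bash
# Sketch of the generic per-benchmark flow described in the README diff above.
# BENCHMARK_DIR and IMAGE_TAG are placeholders; real commands are benchmark-specific.
set -euo pipefail

BENCHMARK_DIR=some_benchmark      # placeholder: one of the benchmark directories in this repo
IMAGE_TAG=mlperf-example:latest   # placeholder image name

# 1. Set up docker and dependencies via the shared script (path may vary per checkout).
./install_cuda_docker.sh

# 2. Download the dataset on the host, from the directory the script lives in.
cd "$BENCHMARK_DIR"
./download_dataset.sh

# 3. Optionally verify the download.
./verify_dataset.sh

# 4. Build and run the docker image; substitute the command from the benchmark's README.
docker build -t "$IMAGE_TAG" .
docker run --gpus all "$IMAGE_TAG"
```

In practice steps 2–4 differ per benchmark, so treat this only as a summary of the flow, not a drop-in script.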
