[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

mit-han-lab/once-for-all

Once-for-All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] Once-for-All is available at PyTorch Hub now!

[News] Once-for-All (OFA) Network is adopted by SONY Neural Architecture Search Library.

[News] Once-for-All (OFA) Network is adopted by ADI MAX78000/MAX78002 Model Training and Synthesis Tool.

[News] Once-for-All (OFA) Network is adopted by Alibaba and ranked 1st in the open division of the MLPerf Inference Benchmark (Datacenter and Edge).

[News] First place in the CVPR 2020 Low-Power Computer Vision Challenge, CPU detection and FPGA track.

[News] OFA-ResNet50 is released.

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, both classification and detection tracks.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19 using the Once-for-All Network.

Train once, specialize for many deployment scenarios

80% top-1 ImageNet accuracy under the mobile setting

Consistently outperforms MobileNetV3 on diverse hardware platforms

OFA-ResNet50 [How to use]

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
"""
from ofa.model_zoo import ofa_net

# Pick any OFA network ID from the "OFA Network" table below
net_id = 'ofa_mbv3_d234_e346_k357_w1.0'
ofa_network = ofa_net(net_id, pretrained=True)

# Randomly sample a sub-network from the OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)

# Manually set the sub-network (ks: kernel size, e: expand ratio, d: depth)
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

| OFA Network | Design Space | Resolution | Width Multiplier | Depth | Expand Ratio | Kernel Size |
|---|---|---|---|---|---|---|
| ofa_resnet50 | ResNet50D | 128 - 224 | 0.65, 0.8, 1.0 | 0, 1, 2 | 0.2, 0.25, 0.35 | 3 |
| ofa_mbv3_d234_e346_k357_w1.0 | MobileNetV3 | 128 - 224 | 1.0 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
| ofa_mbv3_d234_e346_k357_w1.2 | MobileNetV3 | 160 - 224 | 1.2 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
| ofa_proxyless_d234_e346_k357_w1.3 | ProxylessNAS | 128 - 224 | 1.3 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
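
To give a feel for how large this design space is, here is a minimal pure-Python sketch that draws a configuration from the `ofa_mbv3_d234_e346_k357_w1.0` choices in the table above and counts the distinct architectures. It does not use the `ofa` package. `NUM_UNITS = 5`, the helper names, and the dictionary layout are assumptions made for illustration; they are not stated in this README.

```python
import random

# Choices for ofa_mbv3_d234_e346_k357_w1.0, copied from the table above.
KERNEL_SIZES = [3, 5, 7]
EXPAND_RATIOS = [3, 4, 6]
DEPTHS = [2, 3, 4]
NUM_UNITS = 5  # assumption: number of elastic units (stages); not given in this README


def sample_subnet_config(rng):
    """Draw each dimension independently and uniformly at random."""
    max_layers = NUM_UNITS * max(DEPTHS)
    return {
        "ks": [rng.choice(KERNEL_SIZES) for _ in range(max_layers)],
        "e": [rng.choice(EXPAND_RATIOS) for _ in range(max_layers)],
        "d": [rng.choice(DEPTHS) for _ in range(NUM_UNITS)],
    }


def count_subnets():
    """Count distinct architectures, ignoring the input resolution."""
    per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)
    # A unit with depth d has d layers, each with its own (kernel, expand) choice.
    per_unit = sum(per_layer ** d for d in DEPTHS)
    return per_unit ** NUM_UNITS


if __name__ == "__main__":
    print(sample_subnet_config(random.Random(0)))
    print(f"{count_subnets():.2e} sub-networks")
```

Under these assumptions the count is on the order of 10^19, which is why sub-networks are sampled rather than enumerated.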

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@595M_top1@80.0_finetune@75', pretrained=True)
"""
from ofa.model_zoo import ofa_specialized

# Pick any ID from the "Details" column of the table below
net_id = 'flops@595M_top1@80.0_finetune@75'
net, image_size = ofa_specialized(net_id, pretrained=True)

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@595M_top1@80.0_finetune@75
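
The specialized network IDs pack several fields into one string. The sketch below splits an ID into its parts, assuming IDs follow the pattern `<target>@<budget>_top1@<accuracy>` with an optional `_finetune@<n>` suffix. The function name `parse_specialized_id` and the field names are hypothetical, and what the finetune number measures is not specified here.

```python
import re

# Assumed ID layout: <target>@<budget>_top1@<accuracy>[_finetune@<n>]
_ID_PATTERN = re.compile(
    r"^(?P<target>.+)@(?P<budget>[^_@]+)_top1@(?P<top1>\d+(?:\.\d+)?)"
    r"(?:_finetune@(?P<finetune>\d+))?$"
)


def parse_specialized_id(net_id):
    """Split a specialized-network ID into its fields (hypothetical helper)."""
    match = _ID_PATTERN.match(net_id)
    if match is None:
        raise ValueError(f"unrecognized network ID: {net_id!r}")
    finetune = match.group("finetune")
    return {
        "target": match.group("target"),    # e.g. 'flops' or 'LG-G8_lat'
        "budget": match.group("budget"),    # e.g. '595M' or '24ms'
        "top1": float(match.group("top1")),
        "finetune": int(finetune) if finetune is not None else None,
    }


if __name__ == "__main__":
    print(parse_specialized_id("flops@595M_top1@80.0_finetune@75"))
```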

| Model Name | Details | Top-1 (%) | Top-5 (%) | #Params | #MACs |
|---|---|---|---|---|---|
| ResNet50 Design Space | | | | | |
| ofa-resnet50D-41 | resnet50D_MAC@4.1B_top1@79.8 | 79.8 | 94.7 | 30.9M | 4.1B |
| ofa-resnet50D-37 | resnet50D_MAC@3.7B_top1@79.7 | 79.7 | 94.7 | 26.5M | 3.7B |
| ofa-resnet50D-30 | resnet50D_MAC@3.0B_top1@79.3 | 79.3 | 94.5 | 28.7M | 3.0B |
| ofa-resnet50D-24 | resnet50D_MAC@2.4B_top1@79.0 | 79.0 | 94.2 | 29.0M | 2.4B |
| ofa-resnet50D-18 | resnet50D_MAC@1.8B_top1@78.3 | 78.3 | 94.0 | 20.7M | 1.8B |
| ofa-resnet50D-12 | resnet50D_MAC@1.2B_top1@77.1_finetune@25 | 77.1 | 93.3 | 19.3M | 1.2B |
| ofa-resnet50D-09 | resnet50D_MAC@0.9B_top1@76.3_finetune@25 | 76.3 | 92.9 | 14.5M | 0.9B |
| ofa-resnet50D-06 | resnet50D_MAC@0.6B_top1@75.0_finetune@25 | 75.0 | 92.1 | 9.6M | 0.6B |
| FLOPs | | | | | |
| ofa-595M | flops@595M_top1@80.0_finetune@75 | 80.0 | 94.9 | 9.1M | 595M |
| ofa-482M | flops@482M_top1@79.6_finetune@75 | 79.6 | 94.8 | 9.1M | 482M |
| ofa-389M | flops@389M_top1@79.1_finetune@75 | 79.1 | 94.5 | 8.4M | 389M |
| LG G8 | | | | | |
| ofa-lg-24 | LG-G8_lat@24ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 5.8M | 230M |
| ofa-lg-16 | LG-G8_lat@16ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 5.8M | 151M |
| ofa-lg-11 | LG-G8_lat@11ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 5.0M | 103M |
| ofa-lg-8 | LG-G8_lat@8ms_top1@71.1_finetune@25 | 71.1 | 89.7 | 4.1M | 74M |
| Samsung S7 Edge | | | | | |
| ofa-s7edge-88 | s7edge_lat@88ms_top1@76.3_finetune@25 | 76.3 | 92.9 | 6.4M | 219M |
| ofa-s7edge-58 | s7edge_lat@58ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 4.6M | 145M |
| ofa-s7edge-41 | s7edge_lat@41ms_top1@73.1_finetune@25 | 73.1 | 91.0 | 4.7M | 96M |
| ofa-s7edge-29 | s7edge_lat@29ms_top1@70.5_finetune@25 | 70.5 | 89.5 | 3.8M | 66M |
| Samsung Note8 | | | | | |
| ofa-note8-65 | note8_lat@65ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 5.3M | 220M |
| ofa-note8-49 | note8_lat@49ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 164M |
| ofa-note8-31 | note8_lat@31ms_top1@72.8_finetune@25 | 72.8 | 90.8 | 4.6M | 101M |
| ofa-note8-22 | note8_lat@22ms_top1@70.4_finetune@25 | 70.4 | 89.3 | 4.3M | 67M |
| Samsung Note10 | | | | | |
| ofa-note10-64 | note10_lat@64ms_top1@80.2_finetune@75 | 80.2 | 95.1 | 9.1M | 743M |
| ofa-note10-50 | note10_lat@50ms_top1@79.7_finetune@75 | 79.7 | 94.9 | 9.1M | 554M |
| ofa-note10-41 | note10_lat@41ms_top1@79.3_finetune@75 | 79.3 | 94.5 | 9.0M | 457M |
| ofa-note10-30 | note10_lat@30ms_top1@78.4_finetune@75 | 78.4 | 94.2 | 7.5M | 339M |
| ofa-note10-22 | note10_lat@22ms_top1@76.6_finetune@25 | 76.6 | 93.1 | 5.9M | 237M |
| ofa-note10-16 | note10_lat@16ms_top1@75.5_finetune@25 | 75.5 | 92.3 | 4.9M | 163M |
| ofa-note10-11 | note10_lat@11ms_top1@73.6_finetune@25 | 73.6 | 91.2 | 4.3M | 110M |
| ofa-note10-08 | note10_lat@8ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 3.8M | 79M |
| Google Pixel1 | | | | | |
| ofa-pixel1-143 | pixel1_lat@143ms_top1@80.1_finetune@75 | 80.1 | 95.0 | 9.2M | 642M |
| ofa-pixel1-132 | pixel1_lat@132ms_top1@79.8_finetune@75 | 79.8 | 94.9 | 9.2M | 593M |
| ofa-pixel1-79 | pixel1_lat@79ms_top1@78.7_finetune@75 | 78.7 | 94.2 | 8.2M | 356M |
| ofa-pixel1-58 | pixel1_lat@58ms_top1@76.9_finetune@75 | 76.9 | 93.3 | 5.8M | 230M |
| ofa-pixel1-40 | pixel1_lat@40ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 162M |
| ofa-pixel1-28 | pixel1_lat@28ms_top1@73.3_finetune@25 | 73.3 | 91.0 | 5.2M | 109M |
| ofa-pixel1-20 | pixel1_lat@20ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 4.3M | 77M |
| Google Pixel2 | | | | | |
| ofa-pixel2-62 | pixel2_lat@62ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 5.8M | 208M |
| ofa-pixel2-50 | pixel2_lat@50ms_top1@74.7_finetune@25 | 74.7 | 91.9 | 4.7M | 166M |
| ofa-pixel2-35 | pixel2_lat@35ms_top1@73.4_finetune@25 | 73.4 | 91.1 | 5.1M | 113M |
| ofa-pixel2-25 | pixel2_lat@25ms_top1@71.5_finetune@25 | 71.5 | 90.1 | 4.1M | 79M |
| 1080ti GPU (Batch Size 64) | | | | | |
| ofa-1080ti-27 | 1080ti_gpu64@27ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 6.5M | 397M |
| ofa-1080ti-22 | 1080ti_gpu64@22ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M |
| ofa-1080ti-15 | 1080ti_gpu64@15ms_top1@73.8_finetune@25 | 73.8 | 91.3 | 6.0M | 226M |
| ofa-1080ti-12 | 1080ti_gpu64@12ms_top1@72.6_finetune@25 | 72.6 | 90.9 | 5.9M | 165M |
| V100 GPU (Batch Size 64) | | | | | |
| ofa-v100-11 | v100_gpu64@11ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 6.2M | 352M |
| ofa-v100-09 | v100_gpu64@9ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M |
| ofa-v100-06 | v100_gpu64@6ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 4.9M | 179M |
| ofa-v100-05 | v100_gpu64@5ms_top1@71.6_finetune@25 | 71.6 | 90.3 | 5.2M | 141M |
| Jetson TX2 GPU (Batch Size 16) | | | | | |
| ofa-tx2-96 | tx2_gpu16@96ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 6.2M | 349M |
| ofa-tx2-80 | tx2_gpu16@80ms_top1@75.4_finetune@25 | 75.4 | 92.4 | 5.2M | 313M |
| ofa-tx2-47 | tx2_gpu16@47ms_top1@72.9_finetune@25 | 72.9 | 91.1 | 4.9M | 179M |
| ofa-tx2-35 | tx2_gpu16@35ms_top1@70.3_finetune@25 | 70.3 | 89.4 | 4.3M | 121M |
| Intel Xeon CPU with MKL-DNN (Batch Size 1) | | | | | |
| ofa-cpu-17 | cpu_lat@17ms_top1@75.7_finetune@25 | 75.7 | 92.6 | 4.9M | 365M |
| ofa-cpu-15 | cpu_lat@15ms_top1@74.6_finetune@25 | 74.6 | 92.0 | 4.9M | 301M |
| ofa-cpu-11 | cpu_lat@11ms_top1@72.0_finetune@25 | 72.0 | 90.4 | 4.4M | 160M |
| ofa-cpu-10 | cpu_lat@10ms_top1@71.1_finetune@25 | 71.1 | 89.9 | 4.2M | 143M |
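
One way to read the table above is as a lookup: given a compute budget, pick the most accurate model that fits. The sketch below does this for the Samsung Note10 rows, using the Top-1 and #MACs values copied from the table. The helper `best_under_budget` and the tuple layout are illustrative assumptions, not part of the OFA codebase.

```python
# (model name, top-1 accuracy in %, MACs in millions), copied from the Note10 rows above.
NOTE10_MODELS = [
    ("ofa-note10-64", 80.2, 743),
    ("ofa-note10-50", 79.7, 554),
    ("ofa-note10-41", 79.3, 457),
    ("ofa-note10-30", 78.4, 339),
    ("ofa-note10-22", 76.6, 237),
    ("ofa-note10-16", 75.5, 163),
    ("ofa-note10-11", 73.6, 110),
    ("ofa-note10-08", 71.4, 79),
]


def best_under_budget(models, max_macs_m):
    """Return the most accurate model whose MACs fit the budget, or None."""
    feasible = [m for m in models if m[2] <= max_macs_m]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m[1])


if __name__ == "__main__":
    # Hypothetical budget of 400M MACs.
    print(best_under_budget(NOTE10_MODELS, 400))
```

The same lookup works with latency in place of MACs if that is the binding constraint on the target device.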

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

or

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py
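
In both commands the process count passed to `-np` equals the sum of the per-host slot counts (4 servers × 8 = 32). A minimal sketch of building the `horovodrun` command from a host list, so the two stay in sync, could look like this. The helper name `build_horovodrun_command` and the example host names are hypothetical.

```python
import shlex


def build_horovodrun_command(hosts, slots_per_host, script="train_ofa_net.py"):
    """Build a horovodrun argument list with -np equal to the total slot count."""
    if not hosts:
        raise ValueError("at least one host is required")
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be positive")
    host_spec = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    total_processes = len(hosts) * slots_per_host
    return ["horovodrun", "-np", str(total_processes), "-H", host_spec,
            "python", script]


if __name__ == "__main__":
    # Placeholder host names; substitute your own server addresses.
    cmd = build_horovodrun_command(["server1", "server2", "server3", "server4"], 8)
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues.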

Introduction Video

Watch the video

Hands-on Tutorial Video

Watch the video

Requirements

  • Python 3.6+
  • PyTorch 1.4.0+
  • ImageNet Dataset
  • Horovod
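
A quick way to check the software side of this list before training is sketched below. It only tests the Python version and whether the `torch` and `horovod` packages can be found; it does not verify the PyTorch version or the presence of the ImageNet dataset. The function name `check_requirements` is an assumption for illustration.

```python
import importlib.util
import sys


def check_requirements():
    """Report which software requirements appear to be satisfied."""
    return {
        "python>=3.6": sys.version_info >= (3, 6),
        # find_spec returns None when a top-level package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        "horovod": importlib.util.find_spec("horovod") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```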

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)