Releases: intel/neural-compressor
Intel® Neural Compressor v1.13.1 Release
Features
- Support experimental auto-coding quantization for PyTorch
  - Post-training static and dynamic quantization for PyTorch (see the sketch after this list)
  - Post-training static quantization for IPEX
  - Mixed-precision (BF16, INT8, and FP32) for PyTorch
- Refactor quantization utilities for ONNX Runtime
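For reference, the underlying post-training static quantization flow looks roughly like the sketch below, assuming the 1.x `neural_compressor.experimental` API; the toy model, calibration data, and `conf.yaml` are hypothetical placeholders, not artifacts from this release.

```python
# Minimal post-training static quantization sketch (INC 1.x experimental API).
# The toy model, calibration data, and "conf.yaml" are hypothetical placeholders.
import torch
from neural_compressor.experimental import Quantization, common

fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
calib_data = [(torch.randn(8), 0) for _ in range(32)]       # (input, label) pairs

quantizer = Quantization("conf.yaml")                       # framework, approach, tuning criteria
quantizer.model = common.Model(fp32_model)                  # wrap the FP32 model
quantizer.calib_dataloader = common.DataLoader(calib_data)  # calibration batches
q_model = quantizer.fit()                                   # tune and return the INT8 model
q_model.save("./quantized_model")
```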
Bug fixes
- Fixed a model compression orchestration issue caused by PyTorch v1.11
- Fixed GUI issues
Validated Configurations
- Python 3.8
- CentOS 8.4
- TensorFlow 2.9
- Intel TensorFlow 2.9
- PyTorch 1.12.0+cpu
- IPEX 1.12.0
- MXNet 1.7.0
- ONNX Runtime 1.11.0
Intel® Neural Compressor v1.13 Release
Features
- Quantization
  - Support new quantization APIs for Intel TensorFlow
  - Support FakeQuant (QDQ) quantization format for ITEX
  - Improve INT8 quantization recipes for ONNX Runtime
- Mixed Precision
  - Enhance the mixed precision interface to support BF16 (FP16) mixed with FP32 (see the sketch after this list)
- Neural Architecture Search
  - Support SuperNet-based neural architecture search (DyNAS)
- Sparsity
  - Support training for block-wise structured sparsity
- Strategy
  - Support operator-type based tuning strategy
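As a rough illustration of the mixed precision interface, the sketch below converts an FP32 model to BF16 with the experimental `MixedPrecision` component; the model path is a hypothetical placeholder.

```python
# Hedged sketch: BF16 mixed-precision conversion (INC 1.x experimental API).
# "./fp32_model.pb" is a hypothetical path to a saved FP32 TensorFlow model.
from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()            # built-in default configuration
converter.precisions = "bf16"           # mix BF16 with FP32 where beneficial
converter.model = "./fp32_model.pb"     # placeholder model path
optimized_model = converter()           # returns the mixed-precision model
optimized_model.save("./bf16_model")
```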
Productivity
- Support light (default) and full binary packages (default package size 0.5 MB, full package size 2 MB)
- Add an experimental accuracy diagnostic feature for INT8 quantization, including tensor statistics visualization and fine-grained precision setting
- Add experimental one-click BF16/INT8 low-precision enabling and inference optimization, an industry-first code-free solution
Ecosystem
- Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
- Upstream 10 quantized Transformers-based models to HuggingFace Model Hub
Examples
- Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
- Add more quantization examples from TensorFlow Model Zoo
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.7, 2.8, 2.9
- Intel TensorFlow 2.7, 2.8, 2.9
- PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
- IPEX 1.10.0, 1.11.0, 1.12.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.9.0, 1.10.0, 1.11.0
Intel® Neural Compressor v1.12 Release
Features
- Quantization
  - Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
  - Improve post-training quantization (static & dynamic) on PyTorch
  - Improve post-training quantization on TensorFlow
  - Improve QLinear and QDQ quantization modes on ONNX Runtime
  - Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
- Pruning
  - Improve pruning-once-for-all for NLP models
- Sparsity
  - Support experimental sparse kernel for reference examples
Productivity
- Support model deployment by loading INT8 models directly from the HuggingFace model hub (see the sketch after this list)
- Improve GUI with optimized model downloading, performance profiling, etc.
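A hedged sketch of that deployment path, assuming the 1.x helper `neural_compressor.utils.load_huggingface.OptimizedModel`; the model id below is a hypothetical placeholder, not a published checkpoint.

```python
# Hedged sketch: load an INC INT8 Transformers model from the HuggingFace hub.
# Assumes the 1.x OptimizedModel helper; the model id is a hypothetical placeholder.
from neural_compressor.utils.load_huggingface import OptimizedModel

model = OptimizedModel.from_pretrained("Intel/example-int8-model")  # placeholder id
model.eval()  # ready for INT8 inference
```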
Ecosystem
- Highlight simple few-click quantization usage on ONNX Model Zoo
- Upstream INC quantized models (ResNet101, Tiny YoloV3) to ONNX Model Zoo
Examples
- Add Bert-mini distillation + quantization notebook example
- Add DLRM & SSD-ResNet34 quantization examples on IPEX
- Improve BERT structured sparsity training example
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
Intel® Neural Compressor v1.11 Release
Features
- Quantization
  - Supported QDQ as an experimental quantization format for ONNX Runtime
  - Improved FX symbolic tracing for PyTorch
  - Supported multi-metric tuning for quantization
- Knowledge distillation
  - Improved the distillation algorithm for intermediate-layer knowledge transfer
- Productivity
  - Improved quantization productivity for ONNX Runtime through the GUI
  - Improved PyTorch INT8 model save/load methods (see the sketch after this list)
- Ecosystem
  - Upstreamed INC quantized YOLOv3, DenseNet, Mask R-CNN, and YOLOv4 models to ONNX Model Zoo
  - Became a PyTorch ecosystem tool shortly after the PyTorch INC tutorial was published
- Examples
  - Added INC quantized ResNet50 v1.5 and BERT-Large models for IPEX
  - Supported dynamic quantization & weight sharing on the bare-metal reference engine
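A minimal sketch of the save/load flow, assuming the 1.x utility `neural_compressor.utils.pytorch.load`; `q_model` and `fp32_model` are placeholders for a tuned result and the original FP32 architecture.

```python
# Hedged sketch of the PyTorch INT8 save/load flow (INC 1.x utilities).
# q_model is a placeholder for the result of quantizer.fit(); fp32_model is the
# original FP32 architecture, needed to rebuild the quantized module at load time.
from neural_compressor.utils.pytorch import load

q_model.save("./saved_int8")                    # persist the tuned INT8 model
int8_model = load("./saved_int8", fp32_model)   # restore it for inference
```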
Intel® Neural Compressor v1.10 Release
Features
- Quantization
  - Supported quantization on the latest deep learning frameworks
  - Supported quantization for a new model domain (audio)
  - Supported compatible quantization recipes for framework upgrades
- Pruning & Knowledge distillation
  - Supported fine-tuning and quantization using INC & Optimum for “Prune Once for All: Sparse Pre-Trained Language Models”, published at the ENLSP NeurIPS Workshop 2021
- Structured sparsity
  - Proved the sparsity training recipes across multiple model domains (CV, NLP, and recommendation systems)
Productivity
- Improved the INC GUI for easy quantization
- Supported conda installation on Windows
Ecosystem
- Upgraded the INC version integrated into HuggingFace Optimum to v1.9
- Upstreamed INC quantized MobileNet & Faster R-CNN models to ONNX Model Zoo
Examples
- Supported quantization on 300 random models
- Added bare-metal examples for Bert-mini and DLRM
Validated Configurations
- Python 3.7, 3.8, 3.9
- CentOS 8.3 & Ubuntu 18.04 & Win10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.9 Release
Features
- Knowledge distillation
  - Supported one-shot compression pipelines (knowledge distillation during quantization-aware training) on PyTorch
  - Added more distillation examples on TensorFlow and PyTorch
- Quantization
  - Supported multi-objective tuning for quantization
  - Supported Intel Extension for PyTorch v1.10
  - Improved quantization-aware training support on PyTorch v1.10
- Pruning
  - Added more magnitude pruning examples on TensorFlow
- Reference bare-metal examples
  - Supported BF16 optimizations on NLP models
  - Added a sparse DLRM model (experimental)
- Productivity
  - Added a Python-friendly API as an alternative to the YAML configuration file (see the sketch after this list)
  - Made the user-facing APIs more Pythonic
- Ecosystem
  - Integrated the pruning API into HuggingFace Optimum
  - Added ssd-mobilenetv1, efficientnet, ssd, fcn_rn50, and inception_v1 quantized models to ONNX Model Zoo
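A hedged sketch of the YAML-free flow, assuming the experimental components fall back to a built-in default configuration when constructed without a YAML path; `fp32_model` and `calib_data` are hypothetical placeholders.

```python
# Hedged sketch: configuring quantization purely from Python (no YAML file).
# Assumes Quantization() builds a default config when no YAML is passed;
# fp32_model and calib_data are hypothetical placeholders for user assets.
from neural_compressor.experimental import Quantization, common

quantizer = Quantization()                                  # no YAML; defaults
quantizer.model = common.Model(fp32_model)                  # FP32 model to tune
quantizer.calib_dataloader = common.DataLoader(calib_data)  # calibration batches
q_model = quantizer.fit()
```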
Validated Configurations
- Python 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.8.1 Release
Features
- Knowledge distillation
  - Supported knowledge distillation on TensorFlow
- Pruning
  - Supported multi-node training on TensorFlow
- Acceleration library
  - Supported Hugging Face minilm_l6_h384_uncased_sst2, bert_base_cased_mrpc, and bert_base_nli_mean_tokens_stsb models
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.8 Release
Features
- Knowledge distillation
  - Implemented the algorithms from the paper “Prune Once for All: Sparse Pre-Trained Language Models”, accepted at the NeurIPS 2021 ENLSP workshop
  - Supported optimization pipelines (knowledge distillation & quantization-aware training) on PyTorch (see the sketch after this list)
- Quantization
  - Added support for ONNX Runtime 1.7
  - Added support for TensorFlow 2.6.2 and 2.7
  - Added support for PyTorch 1.10
- Pruning
  - Supported magnitude pruning on TensorFlow
- Acceleration library
  - Supported the top 10 downloaded Hugging Face NLP models
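A hedged sketch of such a pipeline, assuming the 1.x `Scheduler` orchestration API; the YAML paths and `model` are hypothetical placeholders.

```python
# Hedged sketch: one-shot distillation + quantization-aware training pipeline,
# assuming the 1.x Scheduler orchestration API. The YAML paths and `model`
# are hypothetical placeholders.
from neural_compressor.experimental import Distillation, Quantization, Scheduler, common

distiller = Distillation("distillation.yaml")   # distillation criterion settings
quantizer = Quantization("qat.yaml")            # quantization-aware training settings

scheduler = Scheduler()
scheduler.model = common.Model(model)                   # student model to compress
combination = scheduler.combine(distiller, quantizer)   # fuse into one pass
scheduler.append(combination)
opt_model = scheduler.fit()                             # distill and quantize together
```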
Productivity
- Added a performance profiling feature to the INC UI service
- Improved the ease of use of the user interface for few-click quantization
Ecosystem
- Added a notebook on using the HuggingFace optimization library (Optimum) with Transformers
- Enabled the top 20 downloaded Hugging Face NLP models with Optimum
- Upstreamed more INC quantized models to ONNX Model Zoo
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.2 & 2.7
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.
Intel® Neural Compressor v1.7.1 Release
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) v1.7.1 features:
Features
- Acceleration library
  - Supported a unified buffer memory allocation policy
Ecosystem
- Upstreamed INC quantized models (alexnet/caffenet/googlenet/squeezenet) to ONNX Model Zoo
Documentation
- Updated performance and accuracy data
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.0
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact the INC maintainers if you have any questions.
Intel® Neural Compressor v1.7 Release
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) v1.7 features:
Features
- Quantization
  - Improved quantization accuracy for SSD-ResNet34 and MobileNet v3 on TensorFlow
- Pruning
  - Supported magnitude pruning on TensorFlow
- Knowledge distillation
  - Supported knowledge distillation on PyTorch (see the sketch after this list)
- Multi-node support
  - Supported multi-node pruning with a distributed dataloader on PyTorch
  - Supported multi-node inference for benchmarking on PyTorch
- Acceleration library
  - Added a domain-specific acceleration library for NLP models
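A hedged sketch of the distillation flow, assuming the 1.x experimental API; the YAML path, `student`, `teacher`, and the training/evaluation hooks are hypothetical placeholders.

```python
# Hedged sketch: knowledge distillation on PyTorch (INC 1.x experimental API).
# "distillation.yaml", student, teacher, train_func, and eval_func are
# hypothetical placeholders for user-supplied assets.
from neural_compressor.experimental import Distillation, common

distiller = Distillation("distillation.yaml")    # loss/criterion configuration
distiller.student_model = common.Model(student)  # model being trained
distiller.teacher_model = common.Model(teacher)  # frozen reference model
distiller.train_func = train_func                # user-defined training loop
distiller.eval_func = eval_func                  # user-defined evaluation
distilled_model = distiller.fit()
```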
Productivity
- Supported configuration-free (pure Python) quantization
- Improved the ease of use of the user interface for few-click quantization
Ecosystem
- Integrated into HuggingFace optimization library (Optimum)
- Upstreamed INC quantized models (RN50, VGG16) to ONNX Model Zoo
Documentation
- Added a tutorial and examples for knowledge distillation
- Added a tutorial and examples for multi-node training
- Added a tutorial and examples for the acceleration library
Validated Configurations
- Python 3.6 & 3.7 & 3.8 & 3.9
- CentOS 8.3 & Ubuntu 18.04
- TensorFlow 2.6.0
- Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
- PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.6.0, 1.7.0, 1.8.0
Distribution:
| Channel | Links | Install Command |
|---|---|---|
| Source | [Github](https://github.com/intel/neural-compressor.git) | `$ git clone https://github.com/intel/neural-compressor.git` |
| Binary | [Pip](https://pypi.org/project/neural-compressor) | `$ pip install neural-compressor` |
| Binary | [Conda](https://anaconda.org/intel/neural-compressor) | `$ conda install neural-compressor -c conda-forge -c intel` |
Contact:
Please feel free to contact [email protected] if you have any questions.