
Commit 3c284cf

Freeze rel (#2863)

* freeze requirements and docker image
* fixes

1 parent dbe6bb0 commit 3c284cf

File tree: 5 files changed (+123, −40 lines)

Lines changed: 101 additions & 10 deletions
```diff
@@ -1,11 +1,102 @@
 --extra-index-url "https://download.pytorch.org/whl/cpu"
---extra-index-url "https://storage.openvinotoolkit.org/simple/wheels/nightly"
---pre
-optimum-intel@git+https://github.com/huggingface/optimum-intel.git
-openvino-tokenizers[transformers]==2024.5.*
-openvino==2024.5.*
-nncf>=2.11.0
-sentence_transformers==3.1.1
-openai
-transformers<4.45
-einops
+about-time==4.2.1
+aiohappyeyeballs==2.4.3
+aiohttp==3.11.6
+aiosignal==1.3.1
+alive-progress==3.2.0
+annotated-types==0.7.0
+anyio==4.6.2.post1
+async-timeout==5.0.1
+attrs==24.2.0
+autograd==1.7.0
+certifi==2024.8.30
+charset-normalizer==3.4.0
+cma==3.2.2
+coloredlogs==15.0.1
+contourpy==1.3.1
+cycler==0.12.1
+datasets==3.1.0
+Deprecated==1.2.15
+dill==0.3.8
+distro==1.9.0
+einops==0.8.0
+exceptiongroup==1.2.2
+filelock==3.16.1
+fonttools==4.55.0
+frozenlist==1.5.0
+fsspec==2024.9.0
+grapheme==0.6.0
+h11==0.14.0
+httpcore==1.0.7
+httpx==0.27.2
+huggingface-hub==0.26.2
+humanfriendly==10.0
+idna==3.10
+Jinja2==3.1.4
+jiter==0.7.1
+joblib==1.4.2
+jsonschema==4.23.0
+jsonschema-specifications==2024.10.1
+jstyleson==0.0.2
+kiwisolver==1.4.7
+markdown-it-py==3.0.0
+MarkupSafe==3.0.2
+matplotlib==3.9.2
+mdurl==0.1.2
+mpmath==1.3.0
+multidict==6.1.0
+multiprocess==0.70.16
+natsort==8.4.0
+networkx==3.3
+ninja==1.11.1.1
+nncf==2.13.0
+numpy==1.26.4
+onnx==1.17.0
+openai==1.54.5
+openvino==2024.5.0
+openvino-telemetry==2024.5.0
+openvino-tokenizers==2024.5.0.0
+optimum==1.23.3
+optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@e3031f058fff4763a9fd917464e26aab9994449f
+packaging==24.2
+pandas==2.2.3
+pillow==11.0.0
+propcache==0.2.0
+protobuf==5.28.3
+psutil==6.1.0
+pyarrow==18.0.0
+pydantic==2.9.2
+pydantic_core==2.23.4
+pydot==2.0.0
+Pygments==2.18.0
+pymoo==0.6.1.3
+pyparsing==3.2.0
+python-dateutil==2.9.0.post0
+pytz==2024.2
+PyYAML==6.0.2
+referencing==0.35.1
+regex==2024.11.6
+requests==2.32.3
+rich==13.9.4
+rpds-py==0.21.0
+safetensors==0.4.5
+scikit-learn==1.5.2
+scipy==1.14.1
+sentence-transformers==3.1.1
+sentencepiece==0.2.0
+six==1.16.0
+sniffio==1.3.1
+sympy==1.13.1
+tabulate==0.9.0
+threadpoolctl==3.5.0
+tiktoken==0.8.0
+tokenizers==0.19.1
+torch==2.5.1+cpu
+tqdm==4.67.0
+transformers==4.44.2
+typing_extensions==4.12.2
+tzdata==2024.2
+urllib3==2.2.3
+wrapt==1.16.0
+xxhash==3.5.0
+yarl==1.17.2
```
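The diff above replaces loose version ranges with an exact pin for every dependency, including a commit-level pin for the VCS requirement. A minimal sketch of how such a file could be sanity-checked, assuming the simple line formats seen above (this helper is illustrative only, not part of this commit):

```python
# Illustrative helper (not from the repository): check that every line of a
# requirements file is frozen to an exact version or commit.
def is_pinned(line: str) -> bool:
    line = line.strip()
    if not line or line.startswith(("#", "-")):
        return True  # comments and pip options (--extra-index-url, --pre) carry no version
    if "git+" in line:
        # a VCS requirement is pinned only when a commit ref follows the repo URL,
        # i.e. there is a second "@" as in "pkg @ git+https://host/repo.git@<sha>"
        return line.count("@") >= 2
    return "==" in line

frozen = ["openvino==2024.5.0", "torch==2.5.1+cpu"]
loose = ["openai", "nncf>=2.11.0",
         "optimum-intel@git+https://github.com/huggingface/optimum-intel.git"]
print(all(is_pinned(r) for r in frozen), any(is_pinned(r) for r in loose))  # → True False
```

Under this check, every line removed by the commit except the index URLs fails, and every line added passes.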

demos/continuous_batching/README.md

Lines changed: 8 additions & 10 deletions
````diff
@@ -7,16 +7,14 @@ That makes it easy to use and efficient especially on on Intel® Xeon® processo
 
 ## Get the docker image
 
-Build the image from source to try the latest enhancements in this feature.
+Pull the image from Dockerhub with CPU support:
 ```bash
-git clone https://github.com/openvinotoolkit/model_server.git
-cd model_server
-make release_image GPU=1
+docker pull openvino/model_server:2024.5
+```
+or if you want to include also the support for GPU execution:
+```bash
+docker pull openvino/model_server:2024.5-gpu
 ```
-It will create an image called `openvino/model_server:latest`.
-> **Note:** This operation might take 40min or more depending on your build host.
-> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
-> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image.
 
 ## Model preparation
 > **Note** Python 3.9 or higher is need for that step
@@ -69,14 +67,14 @@ Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn a
 
 Running this command starts the container with CPU only target device:
 ```bash
-docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config.json
+docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --rest_port 8000 --config_path /workspace/config.json
 ```
 ### GPU
 
 In case you want to use GPU device to run the generation, add extra docker parameters `--device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)`
 to `docker run` command, use the image with GPU support. Export the models with precision matching the GPU capacity and adjust pipeline configuration.
 It can be applied using the commands below:
-```
+```bash
 python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int4 --target_device GPU --cache_size 2 --config_file_path models/config.json --model_repository_path models
 
 docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json
````
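The container started by these `docker run` commands serves an OpenAI-compatible REST API on the port given as `--rest_port`, so a chat request is plain JSON. A minimal sketch of building such a request body in Python (the endpoint path and model name are assumptions based on this demo; the model name must match what your `config.json` registers):

```python
import json

# Sketch of an OpenAI-style chat completion request body for the server started
# above; the model id from the demo's export command is assumed here.
def chat_body(model: str, prompt: str, max_tokens: int = 100) -> str:
    return json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = chat_body("meta-llama/Meta-Llama-3-8B-Instruct", "What is OpenVINO?")
# POST this to http://localhost:8000/v3/chat/completions with
# header Content-Type: application/json (e.g. curl -d "$body").
```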

demos/embeddings/README.md

Lines changed: 8 additions & 10 deletions
````diff
@@ -4,15 +4,13 @@ Text generation use case is exposed via OpenAI API `embeddings` endpoint.
 
 ## Get the docker image
 
-Build the image from source to try this new feature. It will be included in the public image in the coming version 2024.5.
+Pull the image from Dockerhub with CPU support:
 ```bash
-git clone https://github.com/openvinotoolkit/model_server.git
-cd model_server
-make release_image GPU=1
+docker pull openvino/model_server:2024.5
 ```
-It will create an image called `openvino/model_server:latest`.
-> **Note:** This operation might take 40min or more depending on your build host.
-> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
+or if you want to include also the support for GPU execution:
+```bash
+docker pull openvino/model_server:2024.5-gpu
 
 ## Model preparation
 > **Note** Python 3.9 or higher is needed for that step
@@ -77,17 +75,17 @@ All models supported by [optimum-intel](https://github.com/huggingface/optimum-i
 ### CPU
 
 ```bash
-docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --port 9000 --rest_port 8000 --config_path /workspace/config.json
+docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --port 9000 --rest_port 8000 --config_path /workspace/config.json
 ```
 ### GPU
 
 In case you want to use GPU device to run the embeddings model, add extra docker parameters `--device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)`
 to `docker run` command, use the image with GPU support and make sure set the target_device in subconfig.json to GPU. Also make sure the export model quantization level and cache size fit to the GPU memory. All of that can be applied with the commands:
 
-```
+```bash
 python demos/common/export_models/export_model.py embeddings --source_model Alibaba-NLP/gte-large-en-v1.5 --weight-format int8 --target_device GPU --config_file_path models/config.json --model_repository_path models
 
-docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json
+docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5-gpu --rest_port 8000 --config_path /workspace/config.json
 ```
 ### Check readiness
 
````
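Once the embeddings endpoint returns vectors, a common client-side follow-up is comparing them with cosine similarity. A dependency-free sketch (illustrative, not part of the demo):

```python
import math

# Cosine similarity between two embedding vectors, as a client might compute it
# on the vectors returned in an embeddings response.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [0.6, 0.8]))  # ≈ 0.6
```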

demos/python_demos/clip_image_classification/download_model_requirements.txt

Lines changed: 0 additions & 2 deletions
```diff
@@ -1,6 +1,4 @@
 --extra-index-url "https://download.pytorch.org/whl/cpu"
---extra-index-url "https://storage.openvinotoolkit.org/simple/wheels/nightly"
---pre
 openvino==2024.5.*
 numpy<2.0
 transformers==4.40.2
```

demos/rerank/README.md

Lines changed: 6 additions & 8 deletions
````diff
@@ -2,15 +2,13 @@
 
 ## Get the docker image
 
-Build the image from source to try this new feature. It will be included in the public image in the coming version 2024.5.
+Pull the image from Dockerhub with CPU support:
 ```bash
-git clone https://github.com/openvinotoolkit/model_server.git
-cd model_server
-make release_image GPU=1
+docker pull openvino/model_server:2024.5
 ```
-It will create an image called `openvino/model_server:latest`.
-> **Note:** This operation might take 40min or more depending on your build host.
-> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
+or if you want to include also the support for GPU execution:
+```bash
+docker pull openvino/model_server:2024.5-gpu
 
 ## Model preparation
 > **Note** Python 3.9 or higher is needed for that step
@@ -53,7 +51,7 @@ models
 ## Deployment
 
 ```bash
-docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --port 9000 --rest_port 8000 --config_path /workspace/config.json
+docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --port 9000 --rest_port 8000 --config_path /workspace/config.json
 ```
 
 Readiness of the model can be reported with a simple curl command.
````
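The README's readiness curl targets a per-model URL. A small sketch of building that URL in Python, assuming the KServe-style `/v2/models/<name>/ready` path and a hypothetical model name; a name containing `/` must be percent-encoded:

```python
from urllib.parse import quote

# Sketch: readiness URL for a served model on the --rest_port. The path
# convention and the model name below are assumptions for illustration.
def ready_url(host: str, port: int, model: str) -> str:
    return f"http://{host}:{port}/v2/models/{quote(model, safe='')}/ready"

print(ready_url("localhost", 8000, "BAAI/bge-reranker-large"))
# → http://localhost:8000/v2/models/BAAI%2Fbge-reranker-large/ready
```

A GET on this URL returning HTTP 200 would indicate the model is ready to serve.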
