Why do I get this error after executing the command python3 scripts/launch_triton_server.py --world_size 1 --model_repo=llama_ifb/? #405
SevenEmotion asked this question in Q&A (unanswered)
I0412 01:49:31.086437 386 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f3bec000000' with size 268435456
I0412 01:49:31.088762 386 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0412 01:49:31.093652 386 model_lifecycle.cc:461] loading: postprocessing:1
I0412 01:49:31.093690 386 model_lifecycle.cc:461] loading: preprocessing:1
I0412 01:49:31.093763 386 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0412 01:49:31.093790 386 model_lifecycle.cc:461] loading: tensorrt_llm_bls:1
I0412 01:49:31.146171 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0412 01:49:31.146238 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
E0412 01:49:31.247550 386 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
E0412 01:49:31.247624 386 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
I0412 01:49:31.247646 386 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
I0412 01:49:31.247701 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_bls_0_0 (CPU device 0)
I0412 01:49:31.492028 386 model_lifecycle.cc:818] successfully loaded 'tensorrt_llm_bls'
I0412 01:49:31.736910 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.738283 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.829928 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.830012 386 model_lifecycle.cc:621] failed to load 'preprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
I0412 01:49:31.830029 386 model_lifecycle.cc:756] failed to load 'preprocessing'
E0412 01:49:31.831569 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
E0412 01:49:31.831641 386 model_lifecycle.cc:621] failed to load 'postprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.831668 386 model_lifecycle.cc:756] failed to load 'postprocessing'
E0412 01:49:31.831731 386 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'postprocessing' which has no loaded version. Model 'postprocessing' loading failed with error: version 1 is at UNAVAILABLE state: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
;
I0412 01:49:31.831783 386 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0412 01:49:31.831832 386 server.cc:619]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.831930 386 server.cc:662]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
|                  |         | (1) a `tokenizers` library serialization file,                                                                                            |
|                  |         | (2) a slow tokenizer instance to convert or                                                                                               |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__                                            |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__                               |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize |
| preprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
|                  |         | (1) a `tokenizers` library serialization file,                                                                                            |
|                  |         | (2) a slow tokenizer instance to convert or                                                                                               |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__                                            |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__                               |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize |
| tensorrt_llm | 1 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found |
| tensorrt_llm_bls | 1 | READY |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862367 386 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A100-PCIE-40GB
I0412 01:49:31.862637 386 metrics.cc:710] Collecting CPU metrics
I0412 01:49:31.862782 386 tritonserver.cc:2458]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_te |
| | nsor_data parameters statistics trace logging |
| model_repository_path[0] | llama_ifb/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862790 386 server.cc:293] Waiting for in-flight requests to complete.
I0412 01:49:31.862795 386 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0412 01:49:31.862836 386 server.cc:324] All models are stopped, unloading models
I0412 01:49:31.862841 386 server.cc:331] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0412 01:49:32.862929 386 server.cc:331] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
Cleaning up...
I0412 01:49:33.161251 386 model_lifecycle.cc:603] successfully unloaded 'tensorrt_llm_bls' version 1
I0412 01:49:33.863024 386 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[57830,1],0]
Exit code: 1
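Two separate failures show up in this log. The first is the `tensorrt_llm` model: `[json.exception.out_of_range.403] key 'builder_config' not found` means the backend parsed the engine's `config.json` but could not find the `builder_config` section it expects, which typically indicates the engine was built with a TensorRT-LLM release that does not match the one the tensorrtllm backend was compiled against. A minimal sketch to check what the config actually contains (the engine directory below is an assumption; substitute the `gpt_model_path` from your `tensorrt_llm/config.pbtxt`):

```python
import json
from pathlib import Path

# Hypothetical engine path -- use the gpt_model_path configured in
# tensorrt_llm/config.pbtxt for your model repository.
engine_dir = Path("/tensorrtllm_backend/llama_ifb/tensorrt_llm/1")

config = json.loads((engine_dir / "config.json").read_text())
print("top-level keys:", sorted(config))

# The backend that produced this log looks for a 'builder_config' section;
# if it is absent, the engine format does not match this backend build and
# the engine should be rebuilt with the matching TensorRT-LLM version.
if "builder_config" not in config:
    print("missing 'builder_config': rebuild the engine with the "
          "TensorRT-LLM release that matches the tensorrtllm backend")
```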
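The second failure is shared by `preprocessing` and `postprocessing`: the fast LLaMA tokenizer cannot be instantiated because `sentencepiece` is missing, so `transformers` cannot convert the slow tokenizer to a fast one. Installing it and loading the tokenizer once by hand should confirm the fix (a sketch with a hypothetical tokenizer path; use the `tokenizer_dir` set in the two models' `config.pbtxt` files):

```python
# Run inside the Triton container after: pip install sentencepiece
from transformers import AutoTokenizer

# Hypothetical path -- substitute the tokenizer_dir from the
# preprocessing/postprocessing config.pbtxt files.
tokenizer_dir = "/tensorrtllm_backend/llama_ifb/tokenizer"

# This mirrors the AutoTokenizer.from_pretrained call that model.py makes
# in initialize(); if it succeeds here, the Python stubs should load.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
print(tokenizer("hello world").input_ids)
```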