Why do I get this error after executing the command python3 scripts/launch_triton_server.py --world_size 1 --model_repo=llama_ifb/? #405
SevenEmotion asked this question in Q&A (unanswered)
I0412 01:49:31.086437 386 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f3bec000000' with size 268435456
I0412 01:49:31.088762 386 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0412 01:49:31.093652 386 model_lifecycle.cc:461] loading: postprocessing:1
I0412 01:49:31.093690 386 model_lifecycle.cc:461] loading: preprocessing:1
I0412 01:49:31.093763 386 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0412 01:49:31.093790 386 model_lifecycle.cc:461] loading: tensorrt_llm_bls:1
I0412 01:49:31.146171 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0412 01:49:31.146238 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
E0412 01:49:31.247550 386 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
E0412 01:49:31.247624 386 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
I0412 01:49:31.247646 386 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
I0412 01:49:31.247701 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_bls_0_0 (CPU device 0)
I0412 01:49:31.492028 386 model_lifecycle.cc:818] successfully loaded 'tensorrt_llm_bls'
I0412 01:49:31.736910 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.738283 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.829928 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.830012 386 model_lifecycle.cc:621] failed to load 'preprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
I0412 01:49:31.830029 386 model_lifecycle.cc:756] failed to load 'preprocessing'
E0412 01:49:31.831569 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
E0412 01:49:31.831641 386 model_lifecycle.cc:621] failed to load 'postprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.831668 386 model_lifecycle.cc:756] failed to load 'postprocessing'
E0412 01:49:31.831731 386 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'postprocessing' which has no loaded version. Model 'postprocessing' loading failed with error: version 1 is at UNAVAILABLE state: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
;
I0412 01:49:31.831783 386 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0412 01:49:31.831832 386 server.cc:619]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.831930 386 server.cc:662]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
|                  |         | (1) a `tokenizers` library serialization file,                                                                                            |
|                  |         | (2) a slow tokenizer instance to convert or                                                                                               |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__                                            |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__                               |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize |
| preprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
|                  |         | (1) a `tokenizers` library serialization file,                                                                                            |
|                  |         | (2) a slow tokenizer instance to convert or                                                                                               |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__                                            |
|                  |         | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__                               |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize |
| tensorrt_llm | 1 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found |
| tensorrt_llm_bls | 1 | READY |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862367 386 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A100-PCIE-40GB
I0412 01:49:31.862637 386 metrics.cc:710] Collecting CPU metrics
I0412 01:49:31.862782 386 tritonserver.cc:2458]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_te |
| | nsor_data parameters statistics trace logging |
| model_repository_path[0] | llama_ifb/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862790 386 server.cc:293] Waiting for in-flight requests to complete.
I0412 01:49:31.862795 386 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0412 01:49:31.862836 386 server.cc:324] All models are stopped, unloading models
I0412 01:49:31.862841 386 server.cc:331] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0412 01:49:32.862929 386 server.cc:331] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
Cleaning up...
I0412 01:49:33.161251 386 model_lifecycle.cc:603] successfully unloaded 'tensorrt_llm_bls' version 1
I0412 01:49:33.863024 386 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[57830,1],0]
Exit code: 1
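Two separate failures show up in this log. The first is the `tensorrt_llm` model: `[json.exception.out_of_range.403] key 'builder_config' not found` means the backend parsed the engine's `config.json` but could not find the `builder_config` section it expects, which typically indicates the engine was built with a TensorRT-LLM release that does not match the one the tensorrtllm backend was compiled against. A minimal sketch to check what the config actually contains (the engine directory below is an assumption; substitute the `gpt_model_path` from your `tensorrt_llm/config.pbtxt`):

```python
import json
from pathlib import Path

# Hypothetical engine path -- use the gpt_model_path configured in
# tensorrt_llm/config.pbtxt for your model repository.
engine_dir = Path("/tensorrtllm_backend/llama_ifb/tensorrt_llm/1")

config = json.loads((engine_dir / "config.json").read_text())
print("top-level keys:", sorted(config))

# The backend that produced this log looks for a 'builder_config' section;
# if it is absent, the engine format does not match this backend build and
# the engine should be rebuilt with the matching TensorRT-LLM version.
if "builder_config" not in config:
    print("missing 'builder_config': rebuild the engine with the "
          "TensorRT-LLM release that matches the tensorrtllm backend")
```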
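The second failure is shared by `preprocessing` and `postprocessing`: the fast LLaMA tokenizer cannot be instantiated because `sentencepiece` is missing, so `transformers` cannot convert the slow tokenizer to a fast one. Installing it and loading the tokenizer once by hand should confirm the fix (a sketch with a hypothetical tokenizer path; use the `tokenizer_dir` set in the two models' `config.pbtxt` files):

```python
# Run inside the Triton container after: pip install sentencepiece
from transformers import AutoTokenizer

# Hypothetical path -- substitute the tokenizer_dir from the
# preprocessing/postprocessing config.pbtxt files.
tokenizer_dir = "/tensorrtllm_backend/llama_ifb/tokenizer"

# This mirrors the AutoTokenizer.from_pretrained call that model.py makes
# in initialize(); if it succeeds here, the Python stubs should load.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
print(tokenizer("hello world").input_ids)
```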