System Info
Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
CPU: x86_64
GPU: H100
The container also includes the following:
Ubuntu 24.04, including Python 3.12
NVIDIA CUDA 12.6.3
NVIDIA cuBLAS 12.6.4.1
cuDNN 9.6.0.74
NVIDIA NCCL 2.23.4
NVIDIA TensorRT™ 10.7.0.23
OpenUCX 1.15.0
GDRCopy 2.4.1
NVIDIA HPC-X 2.21
OpenMPI 4.1.7
nvImageCodec 0.2.0.7
ONNX Runtime 1.20.1
Intel OpenVINO
DCGM 3.3.6
TensorRT-LLM version release/0.16.0
vLLM version 0.5.5
Who can help?
After the Triton server launched successfully, I checked its status with triton status; it showed the server running and ready.
I then sent the following two requests:
1. triton infer -m gpt2 --prompt hello -i grpc -u localhost -p 8001
2. genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001
Both always returned the error message shown under Actual behavior. (For completeness, a plain Python gRPC client that sends the same request is sketched below.)
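This is a minimal sketch of such a client, to rule out the triton CLI itself. The tensor names text_input, max_tokens, and text_output are my assumption based on the TRT-LLM BLS template that triton CLI generates, so adjust them to the model's config.pbtxt:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Assumed input tensors; check config.pbtxt for the real names and shapes.
text = np.array(["hello"], dtype=object)
max_tokens = np.array([64], dtype=np.int32)

inputs = [
    grpcclient.InferInput("text_input", list(text.shape), "BYTES"),
    grpcclient.InferInput("max_tokens", list(max_tokens.shape), "INT32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)

result = client.infer(model_name="gpt2", inputs=inputs)
print(result.as_numpy("text_output"))
```

If this also fails with the same InferenceServerException, the problem is on the server side in model.py rather than in either client.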
Can anyone help with this?
Thanks/Gavin
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Send this request:
genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001
Expected behavior
The output should look like this:
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Request latency (ns) │ 296,990,497 │ 43,312,449 │ 332,788,242 │ 327,475,292 │ 317,392,767 │ 310,343,333 │
│ Output sequence length │ 109 │ 11 │ 158 │ 142 │ 118 │ 113 │
│ Input sequence length │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
└────────────────────────┴─────────────┴────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
Output token throughput (per sec): 366.78
Request throughput (per sec): 3.37
Actual behavior
E0212 21:46:42.323909 655 model.py:120] "Traceback (most recent call last):\n File "/root/models/gpt2/1/model.py", line 88, in execute\n req = self.decoder.convert_triton_request(request)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request\n request = Request()\n ^^^^^^^^^\n File "<string>", line 3, in __init__\nTypeError: 'numpy.ndarray' object is not callable\n"
triton - ERROR - Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable
triton - ERROR - Unexpected error:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 51, in main
run()
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 45, in run
args.func(args)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/parser.py", line 363, in handle_infer
client.infer(model=args.model, prompt=args.prompt)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 217, in infer
self.__async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 221, in __async_infer
self.__grpc_async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 273, in __grpc_async_infer
raise result
tritonclient.utils.InferenceServerException: Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable
Additional notes
I checked triton_decoder.py in tensorrtllm_backend/inflight_batcher_llm; it has the same code as the gpt model's copy. A sketch of the failure mode I suspect is below.
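The File "<string>", line 3, in __init__ frame suggests the exception is raised inside a dataclass-generated __init__ (dataclasses build __init__ by exec'ing a string, which is why the file shows as "<string>"). My guess, not confirmed against the repo, is that Request in lib/triton_decoder.py is a dataclass whose fields default to numpy arrays, and one field passes an ndarray instance to default_factory instead of a zero-argument callable. A minimal sketch reproducing that failure mode, with a hypothetical field name:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class BuggyRequest:
    # Wrong: default_factory receives an ndarray instance, so the generated
    # __init__ tries to call np.array([])() and raises exactly
    # TypeError: 'numpy.ndarray' object is not callable.
    text_input: np.ndarray = field(default_factory=np.array([]))


@dataclass
class FixedRequest:
    # Right: wrap the construction in a lambda so default_factory is a
    # callable and every instance gets a fresh array.
    text_input: np.ndarray = field(default_factory=lambda: np.array([]))


try:
    BuggyRequest()
except TypeError as e:
    print(e)  # 'numpy.ndarray' object is not callable

print(repr(FixedRequest().text_input))  # array([], dtype=float64)
```

If triton_decoder.py line 160's Request has such a field, changing the default_factory to a lambda (or dropping the default) should fix the crash.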