'numpy.ndarray' object is not callable in gpt2/1/lib/triton_decoder.py, line 160, in convert_triton_request #702

Open
freedom-168 opened this issue Feb 12, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@freedom-168

System Info

Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
CPU: x86_64
GPU: H100
The container also includes the following:
Ubuntu 24.04 including Python 3.12
NVIDIA CUDA 12.6.3
NVIDIA cuBLAS 12.6.4.1
cuDNN 9.6.0.74
NVIDIA NCCL 2.23.4
NVIDIA TensorRT™ 10.7.0.23
OpenUCX 1.15.0
GDRCopy 2.4.1
NVIDIA HPC-X 2.21
OpenMPI 4.1.7
nvImageCodec 0.2.0.7
ONNX Runtime 1.20.1
Intel OpenVINO
DCGM 3.3.6
TensorRT-LLM version release/0.16.0
vLLM version 0.5.5

Who can help?

After the Triton server launched successfully, I checked its status by running triton status. It showed the server was running and ready.

I then sent the following two requests:

1. triton infer -m gpt2 --prompt hello -i grpc -u localhost -p 8001
2. genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001

Both always returned the error message described under actual behavior below.

Can anyone help with this?

Thanks/Gavin
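
For reference, the same readiness that triton status reports can be checked in Python over the gRPC endpoint both failing clients use. A minimal sketch with tritonclient, assuming the default gRPC port 8001 and the model name gpt2 from the commands above:

    import tritonclient.grpc as grpcclient

    # Connect over gRPC, the same protocol and port used by
    # "triton infer -i grpc -u localhost -p 8001" and genai-perf above.
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Both should print True if the server and model are ready, which
    # would confirm the failure happens only once a request reaches
    # model.py, not during model load.
    print("server ready:", client.is_server_ready())
    print("model ready:", client.is_model_ready("gpt2"))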

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Send the request:

genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001

Expected behavior

The output should look like this:

                                            LLM Metrics

┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Statistic              ┃ avg         ┃ min        ┃ max         ┃ p99         ┃ p90         ┃ p75         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Request latency (ns)   │ 296,990,497 │ 43,312,449 │ 332,788,242 │ 327,475,292 │ 317,392,767 │ 310,343,333 │
│ Output sequence length │ 109         │ 11         │ 158         │ 142         │ 118         │ 113         │
│ Input sequence length  │ 1           │ 1          │ 1           │ 1           │ 1           │ 1           │
└────────────────────────┴─────────────┴────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
Output token throughput (per sec): 366.78
Request throughput (per sec): 3.37

Actual behavior

E0212 21:46:42.323909 655 model.py:120] "Traceback (most recent call last):\n File "/root/models/gpt2/1/model.py", line 88, in execute\n req = self.decoder.convert_triton_request(request)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request\n request = Request()\n ^^^^^^^^^\n File "<string>", line 3, in __init__\nTypeError: 'numpy.ndarray' object is not callable\n"
triton - ERROR - Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable

triton - ERROR - Unexpected error:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 51, in main
run()
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 45, in run
args.func(args)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/parser.py", line 363, in handle_infer
client.infer(model=args.model, prompt=args.prompt)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 217, in infer
self.__async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 221, in __async_infer
self.__grpc_async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 273, in __grpc_async_infer
raise result
tritonclient.utils.InferenceServerException: Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable
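
For what it's worth, the File "<string>", line 3, in __init__ frame is characteristic of a dataclass-generated __init__, which Python builds by exec'ing a string (hence the "<string>" filename in the traceback). One way to hit exactly this TypeError at Request() is a Request dataclass where field(default_factory=...) was given a ready-made numpy array instead of a zero-argument callable. A minimal sketch reproducing the symptom (the field name stop_words is hypothetical, not taken from triton_decoder.py):

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class Request:
        # BUG: default_factory must be a callable, but an ndarray is
        # passed. The generated __init__ then tries to call it when
        # Request() is constructed with no arguments.
        stop_words: np.ndarray = field(default_factory=np.array([[]]))

    # Raises TypeError: 'numpy.ndarray' object is not callable, with the
    # failing frame reported as File "<string>", in __init__.
    Request()

Whether the Request dataclass shipped in release/0.16.0 actually contains such a default is an assumption; it would be worth diffing the deployed gpt2/1/lib/triton_decoder.py (and whichever module defines Request) against the version in tensorrtllm_backend.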

Additional notes

I checked triton_decoder.py in tensorrtllm_backend/inflight_batcher_llm; it has the same code as the gpt2 model's copy.
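
If the default_factory pattern sketched above is indeed the culprit, the usual fix is to wrap the array construction in a callable, so the factory is actually callable and each Request gets a fresh default. Again a sketch against the hypothetical field, not a verified patch for release/0.16.0:

    from dataclasses import dataclass, field
    from typing import Optional
    import numpy as np

    @dataclass
    class Request:
        # A zero-argument lambda satisfies default_factory's contract.
        stop_words: np.ndarray = field(default_factory=lambda: np.array([[]]))
        # Alternatively, sidestep the factory with an Optional default.
        bad_words: Optional[np.ndarray] = None

    req = Request()  # constructs without raising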

@freedom-168 added the bug label Feb 12, 2025