'numpy.ndarray' object is not callable in gpt2/1/lib/triton_decoder.py, line 160, in convert_triton_request #702

Open
freedom-168 opened this issue Feb 12, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@freedom-168

System Info

Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
CPU: x86_64
GPU: H100
The container also includes the following:
Ubuntu 24.04 including Python 3.12
NVIDIA CUDA 12.6.3
NVIDIA cuBLAS 12.6.4.1
cuDNN 9.6.0.74
NVIDIA NCCL 2.23.4
NVIDIA TensorRT™ 10.7.0.23
OpenUCX 1.15.0
GDRCopy 2.4.1
NVIDIA HPC-X 2.21
OpenMPI 4.1.7
nvImageCodec 0.2.0.7
ONNX Runtime 1.20.1
Intel OpenVINO
DCGM 3.3.6
TensorRT-LLM version release/0.16.0
vLLM version 0.5.5

Who can help?

After the Triton server launched successfully, I checked its status by running triton status. It showed the server was running and ready.

I then sent the following two requests:

1. triton infer -m gpt2 --prompt hello -i grpc -u localhost -p 8001
2. genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001

Both always returned the error message described under actual behavior below.

Can anyone help with this?

Thanks/Gavin
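
For reference, the same readiness that triton status reports can be checked in Python over the gRPC endpoint both failing clients use. A minimal sketch with tritonclient, assuming the default gRPC port 8001 and the model name gpt2 from the commands above:

    import tritonclient.grpc as grpcclient

    # Connect over gRPC, the same protocol and port used by
    # "triton infer -i grpc -u localhost -p 8001" and genai-perf above.
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Both should print True if the server and model are ready, which
    # would confirm the failure happens only once a request reaches
    # model.py, not during model load.
    print("server ready:", client.is_server_ready())
    print("model ready:", client.is_model_ready("gpt2"))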

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Send the request:

genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001

Expected behavior

The output should look like this:

                                            LLM Metrics

┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Statistic              ┃ avg         ┃ min        ┃ max         ┃ p99         ┃ p90         ┃ p75         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Request latency (ns)   │ 296,990,497 │ 43,312,449 │ 332,788,242 │ 327,475,292 │ 317,392,767 │ 310,343,333 │
│ Output sequence length │ 109         │ 11         │ 158         │ 142         │ 118         │ 113         │
│ Input sequence length  │ 1           │ 1          │ 1           │ 1           │ 1           │ 1           │
└────────────────────────┴─────────────┴────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
Output token throughput (per sec): 366.78
Request throughput (per sec): 3.37

Actual behavior

E0212 21:46:42.323909 655 model.py:120] "Traceback (most recent call last):\n File "/root/models/gpt2/1/model.py", line 88, in execute\n req = self.decoder.convert_triton_request(request)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request\n request = Request()\n ^^^^^^^^^\n File "<string>", line 3, in __init__\nTypeError: 'numpy.ndarray' object is not callable\n"
triton - ERROR - Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable

triton - ERROR - Unexpected error:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 51, in main
run()
File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 45, in run
args.func(args)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/parser.py", line 363, in handle_infer
client.infer(model=args.model, prompt=args.prompt)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 217, in infer
self.__async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 221, in __async_infer
self.__grpc_async_infer(model, inputs)
File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 273, in __grpc_async_infer
raise result
tritonclient.utils.InferenceServerException: Traceback (most recent call last):
File "/root/models/gpt2/1/model.py", line 88, in execute
req = self.decoder.convert_triton_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
request = Request()
^^^^^^^^^
File "", line 3, in init
TypeError: 'numpy.ndarray' object is not callable
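
For what it's worth, the File "<string>", line 3, in __init__ frame is characteristic of a dataclass-generated __init__, which Python builds by exec'ing a string (hence the "<string>" filename in the traceback). One way to hit exactly this TypeError at Request() is a Request dataclass where field(default_factory=...) was given a ready-made numpy array instead of a zero-argument callable. A minimal sketch reproducing the symptom (the field name stop_words is hypothetical, not taken from triton_decoder.py):

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class Request:
        # BUG: default_factory must be a callable, but an ndarray is
        # passed. The generated __init__ then tries to call it when
        # Request() is constructed with no arguments.
        stop_words: np.ndarray = field(default_factory=np.array([[]]))

    # Raises TypeError: 'numpy.ndarray' object is not callable, with the
    # failing frame reported as File "<string>", in __init__.
    Request()

Whether the Request dataclass shipped in release/0.16.0 actually contains such a default is an assumption; it would be worth diffing the deployed gpt2/1/lib/triton_decoder.py (and whichever module defines Request) against the version in tensorrtllm_backend.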

Additional notes

I checked triton_decoder.py in tensorrtllm_backend/inflight_batcher_llm; it has the same code as the gpt2 model's copy.
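
If the default_factory pattern sketched above is indeed the culprit, the usual fix is to wrap the array construction in a callable, so the factory is actually callable and each Request gets a fresh default. Again a sketch against the hypothetical field, not a verified patch for release/0.16.0:

    from dataclasses import dataclass, field
    from typing import Optional
    import numpy as np

    @dataclass
    class Request:
        # A zero-argument lambda satisfies default_factory's contract.
        stop_words: np.ndarray = field(default_factory=lambda: np.array([[]]))
        # Alternatively, sidestep the factory with an Optional default.
        bad_words: Optional[np.ndarray] = None

    req = Request()  # constructs without raising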

@freedom-168 added the bug label Feb 12, 2025