System Info
Container: nvcr.io/nvidia/tritonserver:25.04-trtllm-python-py3 (run with --gpus all and --shm-size=16GB)

Who can help?
@byshiue @kaiyux

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Download the LoRA adapter for Qwen2.5-7B-Instruct and convert it using TensorRT-LLM:

huggingface-cli download doubleyyh/email-tuned-qwen2-lora --local-dir loras/email-lora-0
python hf_lora_convert.py -i loras/email-lora-0 -o loras/email-lora-0-converted
# Do NOT run fix_loras.py

Run the inflight batcher client against the converted adapter:

python inflight_batcher_llm_client.py \
    --text "What is the capital of France?" \
    --tokenizer-dir Qwen2.5-7B-Instruct \
    --lora-task-id 0 \
    --lora-path loras/email-lora-0-converted/
Output error
[StatusCode.INVALID_ARGUMENT] [request id: <id_unknown>] unexpected shape for input 'lora_config' for model 'tensorrt_llm'. Expected [-1,-1,3], got [1,196,4].
NOTE: Setting a non-zero max_batch_size in the model config requires a batch dimension to be prepended to each input shape.
Expected behavior
The LoRA adapter should work out of the box with hf_lora_convert.py: the generated model.lora_config.npy should conform to the required [1, N, 3] shape and be accepted by the Triton server.
Actual behavior
hf_lora_convert.py produces a model.lora_config.npy file with shape [1, N, 4], so the Triton server rejects the LoRA config with an INVALID_ARGUMENT error. Only after manually running fix_loras.py (see additional notes) to truncate the config to 3 columns and restore the batch dimension does the request pass shape validation.
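A quick way to see the mismatch (a minimal check; the path assumes the reproduction steps above, and the row count 196 is taken from the error message):

import numpy as np

config = np.load("loras/email-lora-0-converted/model.lora_config.npy")
print(config.shape)  # prints (1, 196, 4); the server expects [-1, -1, 3]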
Additional notes
#!/usr/bin/env python3
# fix_loras.py
import argparse
import os
import shutil

import numpy as np


def fix_lora_config(input_path):
    # Load the converted LoRA config
    config = np.load(input_path)
    print(f"Original config shape: {config.shape}")
    # Remove the batch dimension if it exists
    if config.ndim == 3 and config.shape[0] == 1:
        config = np.squeeze(config, axis=0)
    # Truncate to 3 columns
    if config.shape[1] > 3:
        config = config[:, :3]
    if config.ndim != 2 or config.shape[1] != 3:
        raise ValueError(f"Invalid config shape after cleanup: {config.shape} (expected (N, 3))")
    # Add back the batch dimension
    config = np.expand_dims(config, axis=0)
    print(f"Fixed config shape: {config.shape}")
    return config


def main():
    parser = argparse.ArgumentParser(description="Fix LoRA .npy files for TensorRT-LLM compatibility.")
    parser.add_argument("-i", "--input", required=True, help="Input directory with LoRA files")
    parser.add_argument("-o", "--output", required=True, help="Output directory for fixed LoRA files")
    args = parser.parse_args()

    input_dir = os.path.abspath(args.input)
    output_dir = os.path.abspath(args.output)
    if not os.path.isdir(input_dir):
        raise FileNotFoundError(f"Input directory not found: {input_dir}")
    os.makedirs(output_dir, exist_ok=True)

    # Copy model.lora_weights.npy through unchanged
    weight_file = os.path.join(input_dir, "model.lora_weights.npy")
    if not os.path.isfile(weight_file):
        raise FileNotFoundError(f"{weight_file} not found")
    shutil.copy2(weight_file, os.path.join(output_dir, "model.lora_weights.npy"))
    print(f"Copied weights to: {output_dir}/model.lora_weights.npy")

    # Fix and write model.lora_config.npy
    config_file = os.path.join(input_dir, "model.lora_config.npy")
    if not os.path.isfile(config_file):
        raise FileNotFoundError(f"{config_file} not found")
    fixed_config = fix_lora_config(config_file)
    np.save(os.path.join(output_dir, "model.lora_config.npy"), fixed_config)
    print(f"Saved fixed config to: {output_dir}/model.lora_config.npy")
    print("LoRA conversion complete.")


if __name__ == "__main__":
    main()
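For completeness, this is how the workaround is applied (loras/email-lora-0-fixed is just an example output directory name):

python fix_loras.py -i loras/email-lora-0-converted -o loras/email-lora-0-fixed
python inflight_batcher_llm_client.py \
    --text "What is the capital of France?" \
    --tokenizer-dir Qwen2.5-7B-Instruct \
    --lora-task-id 0 \
    --lora-path loras/email-lora-0-fixed/

With the fixed files the request passes shape validation.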