
[Doc]: Find out about how to optimize the parameters of vLLM V1 on ROCm #64

Open

Opened by @tjtanaa

Description

📚 The doc issue

We use the docker image rocm/vllm-dev:base_aiter_test_main_20250606_tuned_20250609 as the base to build our vLLM docker image.

Upload the per-model result files to the following link:
https://embeddedllm502.sharepoint.com/:f:/s/ExternalSharing/Em4fFI2PF6hEoyW2OlvbJwcB-lOPSao34-Rw0Cv4CSv7CA?e=uqyhMQ

We need to run experiments similar to those in this blog post https://blog.vllm.ai/2024/10/23/vllm-serving-amd.html, sweeping multiple vLLM V1 engine arguments to find the best configuration for hosting models on the V1 engine.

Test in server mode
Number of prompts: 200
ISL/OSL: 500 (ShareGPT dataset), 1000/2000, 5000/1000
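
A minimal sketch of how the three workloads could be driven with vLLM's serving benchmark client (assuming benchmarks/benchmark_serving.py from the vLLM repo, a server already listening on localhost:8000, and placeholder model/dataset paths):

```python
# Sketch: one benchmark run per workload via vLLM's serving benchmark client.
# Assumes the server is already up on localhost:8000 and that
# benchmarks/benchmark_serving.py from the vLLM repo is available locally.
import subprocess

MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder; swap per experiment

workloads = [
    # (dataset-name, extra client args)
    ("sharegpt", ["--dataset-path", "ShareGPT_V3_unfiltered_cleaned_split.json"]),
    ("random",   ["--random-input-len", "1000", "--random-output-len", "2000"]),
    ("random",   ["--random-input-len", "5000", "--random-output-len", "1000"]),
]

for dataset, extra in workloads:
    cmd = [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", "vllm",
        "--model", MODEL,
        "--dataset-name", dataset,
        "--num-prompts", "200",
        "--save-result",          # writes a JSON results file to upload later
        *extra,
    ]
    subprocess.run(cmd, check=True)
```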

The following aspects should be benchmarked as a full permutation of settings (see the command-generation sketch after this list):

  1. With/Without AITER
    i. With AITER
    ii. Without AITER

  2. Chunked Prefill
    i. Without Chunked Prefill
    ii. Chunked Size = default (find out from source code)
    iii. Chunked Size = 2048
    iv. Chunked Size = 4096
    v. Chunked Size = 8192
    vi. Chunked Size = 16384
    vii. Chunked Size = 32768

  3. Prefix Caching
    i. Without prefix caching
    ii. With prefix caching

  4. Models:
    i. Llama4-Maverick-FP8 - TP8
    ii. DeepSeekV3 - TP8
    iii. Llama-3.1-70B-Instruct - TP1
    iv. Qwen3-32B-Instruct - TP1
    v. Llama-3.1-70B-Instruct --quantization ptpc_fp8 - TP1
    vi. Qwen/Qwen3-32B-FP8 - TP1

  5. If [Attention][V1] Toggle for v1 attention backend (vllm-project/vllm#18275) has been merged

    • VLLM_V1_USE_PREFILL_DECODE_ATTENTION=True (uses the prefill-decode attention backend)
    • VLLM_V1_USE_PREFILL_DECODE_ATTENTION=False (uses the unified Triton attention backend)
  6. block-size

    • 1 (MLA ROCm only)
    • 8
    • 16
    • 32
    • 64
    • 128
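
A rough sketch of how the full permutation grid could be enumerated into vllm serve command lines. The flag and environment-variable names used here (VLLM_ROCM_USE_AITER, --enable-chunked-prefill/--no-enable-chunked-prefill, --max-num-batched-tokens, --enable-prefix-caching, --block-size) follow current vLLM conventions but should be verified against the docker image above; the model and TP values are placeholders:

```python
# Sketch: enumerate the permutation grid of server configurations and print
# one launch command per combination. Names of flags/env vars are assumptions
# to be checked against the vLLM build inside the docker image.
import itertools
import shlex

MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder; pick one model from section 4
TP = 1                                       # tensor-parallel size for that model

aiter_opts = [0, 1]                                   # 1. with/without AITER
chunk_opts = [None, "default", 2048, 4096, 8192,
              16384, 32768]                           # 2. chunked prefill (None = disabled)
prefix_opts = [False, True]                           # 3. prefix caching
attn_opts = ["0", "1"]                                # 5. only once PR #18275 is merged
block_sizes = [8, 16, 32, 64, 128]                    # 6. block size (1 is MLA/ROCm only)

for aiter, chunk, prefix, attn, block in itertools.product(
        aiter_opts, chunk_opts, prefix_opts, attn_opts, block_sizes):
    env = {
        "VLLM_USE_V1": "1",
        "VLLM_ROCM_USE_AITER": str(aiter),
        "VLLM_V1_USE_PREFILL_DECODE_ATTENTION": attn,
    }
    args = ["vllm", "serve", MODEL,
            "--tensor-parallel-size", str(TP),
            "--block-size", str(block)]
    if chunk is None:
        args.append("--no-enable-chunked-prefill")
    else:
        args.append("--enable-chunked-prefill")
        if chunk != "default":
            args += ["--max-num-batched-tokens", str(chunk)]
    args.append("--enable-prefix-caching" if prefix else "--no-enable-prefix-caching")

    env_str = " ".join(f"{k}={v}" for k, v in env.items())
    print(env_str, " ".join(shlex.quote(a) for a in args))
```

Each printed configuration would be launched in turn, with the benchmark client above run against it (for every workload) before moving on to the next combination.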

Suggest a potential alternative/fix

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
