Description
📚 The doc issue
Use the docker image rocm/vllm-dev:base_aiter_test_main_20250606_tuned_20250609
as the base to build our vLLM docker image.
Upload the per-model results files to the following link:
https://embeddedllm502.sharepoint.com/:f:/s/ExternalSharing/Em4fFI2PF6hEoyW2OlvbJwcB-lOPSao34-Rw0Cv4CSv7CA?e=uqyhMQ
We need to run experiments similar to this blog post (https://blog.vllm.ai/2024/10/23/vllm-serving-amd.html) across multiple facets of the vLLM V1 engine arguments, to find the best way to serve models with the V1 engine.
Test Server Mode
- Number of prompts: 200
- ISL/OSL: 500 ShareGPT dataset, 1000/2000, 5000/1000
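As a concrete reading of the workload spec above, here is a minimal sketch of a hypothetical driver that builds `benchmark_serving.py` invocations for the three workloads. The flag names follow the serving benchmark client used in the blog post and should be verified against the copy shipped in the docker image; the driver itself is illustrative only.

```python
# Hypothetical sweep driver: builds benchmark_serving.py commands for the
# three workloads (ShareGPT, 1000/2000, 5000/1000) with 200 prompts each.
# Verify the flag names against benchmarks/benchmark_serving.py in the image.
NUM_PROMPTS = 200
WORKLOADS = [
    {"dataset": "sharegpt", "isl": None, "osl": None},  # ShareGPT natural lengths
    {"dataset": "random",   "isl": 1000, "osl": 2000},
    {"dataset": "random",   "isl": 5000, "osl": 1000},
]

def benchmark_cmd(model: str, workload: dict, port: int = 8000) -> list[str]:
    """Return the benchmark client command for one workload against a running server."""
    cmd = [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", "vllm",
        "--model", model,
        "--port", str(port),
        "--num-prompts", str(NUM_PROMPTS),
        "--dataset-name", workload["dataset"],
        "--save-result",
    ]
    if workload["dataset"] == "random":
        cmd += ["--random-input-len", str(workload["isl"]),
                "--random-output-len", str(workload["osl"])]
    # Note: the ShareGPT run additionally needs --dataset-path pointing at the
    # downloaded ShareGPT JSON file.
    return cmd

# Example: print the commands for one model instead of running them.
for w in WORKLOADS:
    print(" ".join(benchmark_cmd("meta-llama/Llama-3.1-70B-Instruct", w)))
```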
The following aspects should be treated as a permutation grid (a sketch mapping them to engine arguments follows the list):
- With/Without AITER
  i. With AITER
  ii. Without AITER
- Chunked Prefill
  i. Without chunked prefill
  ii. Chunk size = default (find out from source code)
  iii. Chunk size = 2048
  iv. Chunk size = 4096
  v. Chunk size = 8192
  vi. Chunk size = 16384
  vii. Chunk size = 32768
- Prefix Caching
  i. Without prefix caching
  ii. With prefix caching
- Models:
  i. Llama4-Maverick-FP8 - TP8
  ii. DeepSeekV3 - TP8
  iii. Llama-3.1-70B-Instruct - TP1
  iv. Qwen3-32B-Instruct - TP1
  v. Llama-3.1-70B-Instruct --quantization ptpc_fp8 - TP1
  vi. Qwen/Qwen3-32B-FP8 - TP1
- If it has been merged ([Attention][V1] Toggle for v1 attention backend vllm-project/vllm#18275)
  - VLLM_V1_USE_PREFILL_DECODE_ATTENTION=True (this will use the prefill-decode attention)
  - VLLM_V1_USE_PREFILL_DECODE_ATTENTION=False (this will use the unified Triton attention)
- block-size
  - 1 (MLA ROCm only)
  - 8
  - 16
  - 32
  - 64
  - 128
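To make the grid concrete, here is a minimal sketch of a single permutation point expressed through vLLM's Python engine arguments; the same knobs are exposed as CLI flags on `vllm serve` (e.g. `--enable-prefix-caching`, `--max-num-batched-tokens`, `--block-size`, `--quantization`, `-tp`). Two assumptions to verify against the docker image: the "chunk size" above is controlled via `max_num_batched_tokens` when chunked prefill is enabled, and the AITER toggle is the `VLLM_ROCM_USE_AITER` environment variable.

```python
# Hypothetical example of one point in the permutation grid, not a prescribed setup.
import os

# Attention-path toggles (set before importing vllm). Names assumed as above;
# confirm against the environment variables documented in the docker image.
os.environ["VLLM_ROCM_USE_AITER"] = "1"                    # with/without AITER
os.environ["VLLM_V1_USE_PREFILL_DECODE_ATTENTION"] = "1"   # prefill-decode vs. unified Triton attention

from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=1,
    # Chunked prefill: the chunk size is governed by max_num_batched_tokens.
    enable_chunked_prefill=True,
    max_num_batched_tokens=8192,
    # Prefix caching on/off.
    enable_prefix_caching=True,
    # KV-cache block size (1 is valid only for MLA models on ROCm).
    block_size=16,
    # Only for the PTPC-FP8 variant of the Llama-3.1-70B run.
    quantization="ptpc_fp8",
)
```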
Suggest a potential alternative/fix
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.