Skip to content

[ET-VK][ez] Fix Vulkan Validation layer errors due to consecutive command buffer encoding #11401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 9, 2025

Conversation

SS-JIA
Copy link
Contributor

@SS-JIA SS-JIA commented Jun 5, 2025

Stack from ghstack (oldest at bottom):

Changes

  • In VulkanBackend.cpp do not call encode_execute() during model load if the model compile spec specifies requires_dynamic_shapes as true
  • In test files, do not call encode_execute() if propagate_resize() is subsequently called.

Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, encode_execute() must be called after updating tensor sizes.

As a result, propagate_resize() now calls encode_execute() internally. This results in scenarios where encode_execute() is called once during model load, then again right before the first inference during propagate_resize(), without actually executing the command buffer in-between.

This causes Validation layer errors like


UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

Perf Impact

  • Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
  • No impact for non-dynamic shape models

Differential Revision: D76047203

cc @manuelcandales @cbilgin

…mand buffer encoding

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Jun 5, 2025
Copy link

pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11401

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit bc4fc9f with merge base ef1d2ff (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SS-JIA added a commit that referenced this pull request Jun 5, 2025
…mand buffer encoding

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

ghstack-source-id: 288459856
Pull Request resolved: #11401
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 5, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76047203

…ecutive command buffer encoding"

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

cc manuelcandales cbilgin

[ghstack-poisoned]
SS-JIA added a commit that referenced this pull request Jun 6, 2025
…mand buffer encoding

Pull Request resolved: #11401

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

## Motivation

> In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

>  `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

Llama models will need to call `propagate_resize()`, to account for changes in the second input tensor (a scalar tensor which represents input token position). However, the cost of `encode_execute()` is costly, so we should avoid calling it if possible.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)
ghstack-source-id: 288772382
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76047203

Copy link

github-actions bot commented Jun 6, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…ecutive command buffer encoding"

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

cc manuelcandales cbilgin

[ghstack-poisoned]
SS-JIA added a commit that referenced this pull request Jun 6, 2025
…mand buffer encoding

Pull Request resolved: #11401

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

## Motivation

> In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

>  `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

Llama models will need to call `propagate_resize()`, to account for changes in the second input tensor (a scalar tensor which represents input token position). However, the cost of `encode_execute()` is costly, so we should avoid calling it if possible.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models
ghstack-source-id: 288810484

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76047203

…ecutive command buffer encoding"

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

cc manuelcandales cbilgin

[ghstack-poisoned]
SS-JIA added a commit that referenced this pull request Jun 9, 2025
…mand buffer encoding

Pull Request resolved: #11401

## Changes

* In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true
* `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

## Motivation

> In `VulkanBackend.cpp` do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true

Recently, it was discovered that a command buffer re-encode was required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes.

As a result, `propagate_resize()` now calls `encode_execute()` internally. This results in scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without actually executing the command buffer in-between.

This causes Validation layer errors like

```

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x88d2b500000000e2, type: 10, name: NULL
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
    Objects: 2
        [0] 0x24086224ec0, type: 6, name: NULL
        [1] 0x6caffc00000000e3, type: 10, name: NULL

```

because the last access information of image/buffer resources are inaccurate during the second command buffer encoding, since the first command buffer never executed.

>  `propagate_resize()` will not call `encode_execute()` if dynamic shapes are not expected.

Llama models will need to call `propagate_resize()`, to account for changes in the second input tensor (a scalar tensor which represents input token position). However, the cost of `encode_execute()` is costly, so we should avoid calling it if possible.

## Perf Impact

* Performance improvement for first inference of dynamic shape models if actual tensor sizes are much smaller than maximum possible sizes
* No impact for non-dynamic shape models
ghstack-source-id: 289110008

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76047203

@facebook-github-bot facebook-github-bot merged commit 31640dc into gh/SS-JIA/237/base Jun 9, 2025
95 of 98 checks passed
@facebook-github-bot facebook-github-bot deleted the gh/SS-JIA/237/head branch June 9, 2025 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants