Commit 0a243bb

fyellin, almarklein, and Korijn authored
Implement push-constants (#574)
* Initial implementation of push_constants
* Initial implementation of push_constants
* Better handling of limits
  Fix lint errors.
* One more lint error.
* And one more typo.
* Change limits to use hyphens
  Combine the code that accesses features and limits for adapters and devices, since they are almost identical.
  Add an error for unknown limit
* Forgot to uncomment some lines
* Removed a couple of more comments
* Fix typo in comment. Minor cleanup.
* Move push_constants stuff to extras.py
* Fix flake and codegen
* Fix failing test
* Linux is failing even though my Mac isn't. I have to figure out what's wrong. :-(
* And one last lint problem
* First pass at documentation.
* First pass at documentation.
* Undo accidental modification
* See
* Found one carryover from move to 22.1 that I forgot to include. Undoing all typo mistakes and moving to a different push.
* Yikes. One more _api change
* Yikes. One more _api change
* Apply suggestions from code review
  Co-authored-by: Almar Klein <[email protected]>
* Update comments. Comment @create_and_release as requested.
* Tiny change to get tests to run again.
* Apply suggestions from code review
  Co-authored-by: Almar Klein <[email protected]>

---------

Co-authored-by: Almar Klein <[email protected]>
Co-authored-by: Korijn van Golen <[email protected]>
1 parent 466af69 commit 0a243bb

8 files changed: +538 -98 lines changed

docs/backends.rst

Lines changed: 97 additions & 0 deletions
@@ -59,6 +59,103 @@ The wgpu_native backend provides a few extra functionalities:
     :return: Device
     :rtype: wgpu.GPUDevice

+The wgpu_native backend provides support for push constants.
+Since WebGPU does not support this feature, documentation on its use is hard to find.
+A full explanation of push constants and their use in Vulkan can be found
+`here <https://vkguide.dev/docs/chapter-3/push_constants/>`_.
+Using push constants in WGPU closely follows the Vulkan model.
+
+The advantage of push constants is that they are typically faster to update than uniform buffers.
+Modifications to push constants are included in the command encoder; updating a uniform
+buffer involves sending a separate command to the GPU.
+The disadvantage of push constants is that their size limit is much smaller. The limit
+is guaranteed to be at least 128 bytes, and 256 bytes is typical.
+
+Given an adapter, first determine if it supports push constants::
+
+    >>> "push-constants" in adapter.features
+    True
+
+If push constants are supported, determine the maximum number of bytes that can
+be allocated for push constants::
+
+    >>> adapter.limits["max-push-constant-size"]
+    256
+
+You must tell the adapter to create a device that supports push constants,
+and you must tell it the number of bytes of push constants that you are using.
+Overestimating is okay::
+
+    device = adapter.request_device(
+        required_features=["push-constants"],
+        required_limits={"max-push-constant-size": 256},
+    )
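The feature check, the limit check, and the device request described above can be folded into one small helper. The following is only a sketch: `push_constant_device_kwargs` is a hypothetical name, not part of wgpu, and it takes plain Python stand-ins for `adapter.features` and `adapter.limits` so that it can run without a GPU.

```python
def push_constant_device_kwargs(features, limits, needed_bytes):
    """Build keyword arguments for adapter.request_device().

    `features` and `limits` are stand-ins for adapter.features and
    adapter.limits; this helper is hypothetical and not part of wgpu.
    """
    if "push-constants" not in features:
        raise RuntimeError("adapter does not support push constants")
    available = limits["max-push-constant-size"]
    if needed_bytes > available:
        raise ValueError(
            f"need {needed_bytes} bytes of push constants, adapter allows {available}"
        )
    return {
        "required_features": ["push-constants"],
        "required_limits": {"max-push-constant-size": needed_bytes},
    }


# Stand-in values; with a real adapter, pass adapter.features and adapter.limits.
kwargs = push_constant_device_kwargs(
    {"push-constants"}, {"max-push-constant-size": 256}, needed_bytes=192
)
```

With a real adapter, the result would presumably be used as `adapter.request_device(**kwargs)`.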
+
+Creating a push constant in your shader code is similar to the way you would create
+a uniform buffer.
+Fields used only in the ``@vertex`` shader should be kept separate from fields used
+only in the ``@fragment`` shader, which in turn should be kept separate from fields
+used in both shaders::
+
+    struct PushConstants {
+        // vertex shader
+        vertex_transform: mat4x4f,
+        // fragment shader
+        fragment_transform: mat4x4f,
+        // used in both
+        generic_transform: mat4x4f,
+    }
+    var<push_constant> push_constants: PushConstants;
+
+To create the pipeline layout for this shader, use
+``wgpu.backends.wgpu_native.create_pipeline_layout`` instead of
+``device.create_pipeline_layout``. It takes an additional argument,
+``push_constant_layouts``, describing
+the layout of the push constants. For the shader above::
+
+    push_constant_layouts = [
+        {"visibility": ShaderStage.VERTEX, "start": 0, "end": 64},
+        {"visibility": ShaderStage.FRAGMENT, "start": 64, "end": 128},
+        {"visibility": ShaderStage.VERTEX | ShaderStage.FRAGMENT, "start": 128, "end": 192},
+    ]
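The `start` and `end` offsets are easy to get wrong by hand, so it may help to derive them from the block sizes. The sketch below assumes the blocks are laid out back to back with no padding, which holds for the three 64-byte matrices above. `contiguous_layouts` is a hypothetical helper, and the visibility strings are used here only as opaque labels.

```python
def contiguous_layouts(blocks):
    """Turn (visibility, size_in_bytes) pairs into push_constant_layouts entries.

    Hypothetical helper: assumes the blocks follow one another without padding.
    """
    layouts = []
    start = 0
    for visibility, size in blocks:
        layouts.append({"visibility": visibility, "start": start, "end": start + size})
        start += size
    return layouts


MAT4X4_SIZE = 4 * 4 * 4  # 16 f32 values of 4 bytes each

layouts = contiguous_layouts(
    [
        ("VERTEX", MAT4X4_SIZE),
        ("FRAGMENT", MAT4X4_SIZE),
        ("VERTEX|FRAGMENT", MAT4X4_SIZE),
    ]
)
total_bytes = layouts[-1]["end"]
print(total_bytes)  # 192
```

Note that 192 bytes exceeds the guaranteed minimum of 128 bytes, so this layout only works on adapters that report a larger ``max-push-constant-size``.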
+
+Finally, you set the value of the push constant by using
+``wgpu.backends.wgpu_native.set_push_constants``::
+
+    set_push_constants(this_pass, ShaderStage.VERTEX, 0, 64, <64 bytes>)
+    set_push_constants(this_pass, ShaderStage.FRAGMENT, 64, 64, <64 bytes>)
+    set_push_constants(this_pass, ShaderStage.VERTEX | ShaderStage.FRAGMENT, 128, 64, <64 bytes>)
+
+Bytes must be set separately for each of the three ranges, since each has a different
+visibility. If the push constant has already been set, on the next use you only need to
+call ``set_push_constants`` on those bytes you wish to change.
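One way to produce the 64 bytes for a 4x4 matrix of 32-bit floats is the standard library's `struct` module. This is a sketch under two assumptions: that the data argument accepts a bytes-like object (the test in this commit passes a NumPy array), and that the matrix is stored column by column, as WGSL matrices are. `pack_mat4x4` is a hypothetical helper name.

```python
import struct


def pack_mat4x4(columns):
    """Pack four columns of four floats into 64 bytes of little-endian f32.

    Hypothetical helper; the column-by-column order follows WGSL's matrix layout.
    """
    flat = [value for column in columns for value in column]
    if len(flat) != 16:
        raise ValueError("expected 4 columns of 4 values each")
    return struct.pack("<16f", *flat)


identity = pack_mat4x4(
    [
        (1.0, 0.0, 0.0, 0.0),
        (0.0, 1.0, 0.0, 0.0),
        (0.0, 0.0, 1.0, 0.0),
        (0.0, 0.0, 0.0, 1.0),
    ]
)
print(len(identity))  # 64
```

The resulting bytes object would take the place of one `<64 bytes>` placeholder in the calls above.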
+
+.. py:function:: wgpu.backends.wgpu_native.create_pipeline_layout(device, *, label="", bind_group_layouts, push_constant_layouts=[])
+
+    This method provides the same functionality as :func:`wgpu.GPUDevice.create_pipeline_layout`,
+    but provides an extra `push_constant_layouts` argument.
+    When using push constants, this argument is a list of dictionaries, where each
+    dictionary has three fields: `visibility`, `start`, and `end`.
+
+    :param device: The device on which we are creating the pipeline layout
+    :param label: An optional label
+    :param bind_group_layouts:
+    :param push_constant_layouts: Described above.
145+
.. py:function:: wgpu.backends.wgpu_native.set_push_constants(render_pass_encoder, visibility, offset, size_in_bytes, data, data_offset=0)
146+
147+
This function requires that the underlying GPU implement `push_constants`.
148+
These push constants are a buffer of bytes available to the `fragment` and `vertex`
149+
shaders. They are similar to a bound buffer, but the buffer is set using this
150+
function call.
151+
152+
:param render_pass_encoder: The render pass encoder to which we are pushing constants.
153+
:param visibility: The stages (vertex, fragment, or both) to which these constants are visible
154+
:param offset: The offset into the push constants at which the bytes are to be written
155+
:param size_in_bytes: The number of bytes to copy from the ata
156+
:param data: The data to copy to the buffer
157+
:param data_offset: The starting offset in the data at which to begin copying.
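The relationship between `size_in_bytes`, `data`, and `data_offset` can be stated as a bounds check: the range starting at `data_offset` must fit inside the data. The sketch below is not the library's implementation. `check_push_constant_args` is a hypothetical function that reproduces the two failure cases exercised by `test_bad_set_push_constants` in this commit.

```python
def check_push_constant_args(size_in_bytes, data, data_offset=0):
    """Raise ValueError unless `data` holds size_in_bytes bytes from data_offset.

    Hypothetical sketch of the bounds check, not the code used by wgpu.
    """
    nbytes = memoryview(data).nbytes
    if size_in_bytes < 0 or data_offset < 0:
        raise ValueError("size_in_bytes and data_offset must be non-negative")
    if data_offset + size_in_bytes > nbytes:
        raise ValueError(
            f"need bytes [{data_offset}, {data_offset + size_in_bytes}) "
            f"but data has only {nbytes} bytes"
        )


COUNT = 10
check_push_constant_args(COUNT * 4, bytes(COUNT * 4))  # exactly enough data

for data, data_offset in [(bytes((COUNT - 1) * 4), 0), (bytes((COUNT + 1) * 4), 8)]:
    try:
        check_push_constant_args(COUNT * 4, data, data_offset)
    except ValueError as err:
        print("rejected:", err)
```

The first rejected case has 36 bytes where 40 are needed. The second has 44 bytes, but starting at offset 8 leaves only 36.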
+

 The js_webgpu backend
 ---------------------

tests/test_set_constant.py

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
+import numpy as np
+import pytest
+
+import wgpu.utils
+from tests.testutils import can_use_wgpu_lib, run_tests
+from wgpu import TextureFormat
+from wgpu.backends.wgpu_native.extras import create_pipeline_layout, set_push_constants
+
+if not can_use_wgpu_lib:
+    pytest.skip("Skipping tests that need the wgpu lib", allow_module_level=True)
+
+
+"""
+This code is an amazingly slow way of adding together two 10-element arrays of 32-bit
+integers defined by push constants and storing the sums into an output buffer.
+
+The first number of the addition is purposely pulled using the vertex stage, and the
+second number from the fragment stage, so that we can ensure that we are
+using stage-separated push constants correctly.
+
+The source code assumes the topology is POINT-LIST, so that each call to vertexMain
+corresponds with one call to fragmentMain.
+"""
+COUNT = 10
+
+SHADER_SOURCE = (
+    f"""
+    const COUNT = {COUNT}u;
+"""
+    """
+    // Put the results here
+    @group(0) @binding(0) var<storage, read_write> data: array<u32, COUNT>;
+
+    struct PushConstants {
+        values1: array<u32, COUNT>, // VERTEX constants
+        values2: array<u32, COUNT>, // FRAGMENT constants
+    }
+    var<push_constant> push_constants: PushConstants;
+
+    struct VertexOutput {
+        @location(0) index: u32,
+        @location(1) value: u32,
+        @builtin(position) position: vec4f,
+    }
+
+    @vertex
+    fn vertexMain(
+        @builtin(vertex_index) index: u32,
+    ) -> VertexOutput {
+        return VertexOutput(index, push_constants.values1[index], vec4f(0, 0, 0, 1));
+    }
+
+    @fragment
+    fn fragmentMain(@location(0) index: u32,
+                    @location(1) value: u32
+    ) -> @location(0) vec4f {
+        data[index] = value + push_constants.values2[index];
+        return vec4f();
+    }
+"""
+)
+
+BIND_GROUP_ENTRIES = [
+    {"binding": 0, "visibility": "FRAGMENT", "buffer": {"type": "storage"}},
+]
+
+
+def setup_pipeline():
+    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
+    device = adapter.request_device(
+        required_features=["push-constants"],
+        required_limits={"max-push-constant-size": 128},
+    )
+    output_texture = device.create_texture(
+        # Actual size is immaterial. Could just be 1x1
+        size=[128, 128],
+        format=TextureFormat.rgba8unorm,
+        usage="RENDER_ATTACHMENT|COPY_SRC",
+    )
+    shader = device.create_shader_module(code=SHADER_SOURCE)
+    bind_group_layout = device.create_bind_group_layout(entries=BIND_GROUP_ENTRIES)
+    render_pipeline_layout = create_pipeline_layout(
+        device,
+        bind_group_layouts=[bind_group_layout],
+        push_constant_layouts=[
+            {"visibility": "VERTEX", "start": 0, "end": COUNT * 4},
+            {"visibility": "FRAGMENT", "start": COUNT * 4, "end": COUNT * 4 * 2},
+        ],
+    )
+    pipeline = device.create_render_pipeline(
+        layout=render_pipeline_layout,
+        vertex={
+            "module": shader,
+            "entry_point": "vertexMain",
+        },
+        fragment={
+            "module": shader,
+            "entry_point": "fragmentMain",
+            "targets": [{"format": output_texture.format}],
+        },
+        primitive={
+            "topology": "point-list",
+        },
+    )
+    render_pass_descriptor = {
+        "color_attachments": [
+            {
+                "clear_value": (0, 0, 0, 0),  # only first value matters
+                "load_op": "clear",
+                "store_op": "store",
+                "view": output_texture.create_view(),
+            }
+        ],
+    }
+
+    return device, pipeline, render_pass_descriptor
+
+
+def test_normal_push_constants():
+    device, pipeline, render_pass_descriptor = setup_pipeline()
+    vertex_call_buffer = device.create_buffer(size=COUNT * 4, usage="STORAGE|COPY_SRC")
+    bind_group = device.create_bind_group(
+        layout=pipeline.get_bind_group_layout(0),
+        entries=[
+            {"binding": 0, "resource": {"buffer": vertex_call_buffer}},
+        ],
+    )
+
+    encoder = device.create_command_encoder()
+    this_pass = encoder.begin_render_pass(**render_pass_descriptor)
+    this_pass.set_pipeline(pipeline)
+    this_pass.set_bind_group(0, bind_group)
+
+    buffer = np.random.randint(0, 1_000_000, size=(2 * COUNT), dtype=np.uint32)
+    set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, buffer)
+    set_push_constants(this_pass, "FRAGMENT", COUNT * 4, COUNT * 4, buffer, COUNT * 4)
+    this_pass.draw(COUNT)
+    this_pass.end()
+    device.queue.submit([encoder.finish()])
+    info_view = device.queue.read_buffer(vertex_call_buffer)
+    result = np.frombuffer(info_view, dtype=np.uint32)
+    expected_result = buffer[0:COUNT] + buffer[COUNT:]
+    assert all(result == expected_result)
+
+
+def test_bad_set_push_constants():
+    device, pipeline, render_pass_descriptor = setup_pipeline()
+    encoder = device.create_command_encoder()
+    this_pass = encoder.begin_render_pass(**render_pass_descriptor)
+
+    def zeros(n):
+        return np.zeros(n, dtype=np.uint32)
+
+    with pytest.raises(ValueError):
+        # Buffer is too short
+        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT - 1))
+
+    with pytest.raises(ValueError):
+        # Buffer is too short once data_offset is taken into account
+        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT + 1), 8)
+
+
+if __name__ == "__main__":
+    run_tests(globals())

tests/test_wgpu_native_basics.py

Lines changed: 32 additions & 2 deletions
@@ -424,18 +424,48 @@ def test_features_are_legal():
     )
     # We can also use underscore
     assert are_features_wgpu_legal(["push_constants", "vertex_writable_storage"])
+    # We can also use camel case
+    assert are_features_wgpu_legal(["PushConstants", "VertexWritableStorage"])


 def test_features_are_illegal():
-    # not camel Case
-    assert not are_features_wgpu_legal(["pushConstants"])
     # writable is misspelled
     assert not are_features_wgpu_legal(
         ["multi-draw-indirect", "vertex-writeable-storage"]
     )
     assert not are_features_wgpu_legal(["my-made-up-feature"])


+def are_limits_wgpu_legal(limits):
+    """Returns true if the dictionary of limits is legal. Determining whether a specific
+    set of limits is supported on a particular device would make the tests fragile,
+    so we only verify that the names are legal limit names."""
+    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
+    try:
+        adapter.request_device(required_limits=limits)
+        return True
+    except RuntimeError as e:
+        assert "Unsupported features were requested" in str(e)
+        return True
+    except KeyError:
+        return False
+
+
+def test_limits_are_legal():
+    # A standard limit. Probably exists
+    assert are_limits_wgpu_legal({"max-bind-groups": 8})
+    # A common extension limit
+    assert are_limits_wgpu_legal({"max-push-constant-size": 128})
+    # We can also use underscore
+    assert are_limits_wgpu_legal({"max_bind_groups": 8, "max_push_constant_size": 128})
+    # We can also use camel case
+    assert are_limits_wgpu_legal({"maxBindGroups": 8, "maxPushConstantSize": 128})
+
+
+def test_limits_are_not_legal():
+    assert not are_limits_wgpu_legal({"max-bind-group": 8})
+
+
 if __name__ == "__main__":
     run_tests(globals())
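The tests above accept feature and limit names written with hyphens, underscores, or camel case. One plausible way to support that is to normalize every spelling to the hyphenated form before looking it up. The sketch below only illustrates the idea. `to_hyphenated`, `lookup_limit`, and the `KNOWN_LIMITS` values are hypothetical, and this is not necessarily how wgpu-py does it.

```python
import re


def to_hyphenated(name):
    """Normalize snake_case, camelCase, or CamelCase to hyphenated lower case."""
    name = name.replace("_", "-")
    # Insert a hyphen wherever a lower-case letter or digit is followed by a capital.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    return name.lower()


# Hypothetical table of known limits; the values are placeholders.
KNOWN_LIMITS = {"max-bind-groups": 8, "max-push-constant-size": 128}


def lookup_limit(name):
    """Return the known limit for any accepted spelling; KeyError if unknown."""
    return KNOWN_LIMITS[to_hyphenated(name)]


for spelling in ("max-push-constant-size", "max_push_constant_size", "maxPushConstantSize"):
    print(spelling, "->", to_hyphenated(spelling))
```

An unknown name such as `"max-bind-group"` (singular) raises `KeyError` in this sketch, which is the exception `are_limits_wgpu_legal` treats as "not legal".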

tests_mem/testutils.py

Lines changed: 34 additions & 1 deletion
@@ -145,7 +145,40 @@ def ob_name_from_test_func(func):


 def create_and_release(create_objects_func):
-    """Decorator."""
+    """
+    This wrapper goes around a test that takes a single argument n. That test should
+    be a generator function that yields a descriptor followed by
+    n different objects corresponding to the name of the test function. Hence
+    a test named `test_release_foo_bar` would yield a descriptor followed by
+    n FooBar objects.
+
+    The descriptor is a dictionary with three fields, each optional.
+    In a typical situation, there will be `n` FooBar objects after the test, and after
+    releasing, there will be zero. However, sometimes there are auxiliary objects,
+    in which case it's necessary to provide one or more fields.
+
+    The keys "expected_counts_after_create" and "expected_counts_after_release" each have
+    as their value a sub-dictionary giving the number of still-alive WGPU objects.
+    The key "expected_counts_after_create" gives the expected state after the
+    n objects have been created and put into a list; "expected_counts_after_release"
+    gives the state after the n objects have been released.
+
+    These sub-dictionaries have as their keys the names of WGPU object types, and
+    their value is a tuple of two integers: the first is the number of Python objects
+    expected to exist and the second is the number of native objects. Any type not in
+    the sub-dictionary has an implied value of (0, 0).
+
+    The key "ignore" has as its value a collection of object types that we should ignore
+    in this test. Ideally we should not use this, but currently there are a few cases where
+    we cannot reliably predict the number of objects in wgpu-native.
+
+    If the descriptor doesn't contain an "expected_counts_after_create", then the default
+    is {"FooBar": (n, n)}, where "FooBar" is derived from the name of the test.
+
+    If the descriptor doesn't contain an "expected_counts_after_release", then the
+    default is {}, indicating that creating and removing the objects should completely
+    clean itself up.
+    """

     def core_test_func():
         """The core function that does the testing."""

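The defaults described in the `create_and_release` docstring can be illustrated with a small stand-alone sketch. The names `object_name_from_test_name` and `resolve_descriptor` are hypothetical. The real module has its own `ob_name_from_test_func` helper, whose body is not shown in this diff, so the name derivation here is an assumption based on the `test_release_foo_bar` example.

```python
def object_name_from_test_name(test_name):
    """Derive 'FooBar' from 'test_release_foo_bar' (assumed naming rule)."""
    prefix = "test_release_"
    if not test_name.startswith(prefix):
        raise ValueError(f"unexpected test name: {test_name!r}")
    parts = test_name[len(prefix):].split("_")
    return "".join(part.capitalize() for part in parts)


def resolve_descriptor(test_name, n, descriptor):
    """Fill in the documented defaults for the three optional descriptor keys."""
    name = object_name_from_test_name(test_name)
    after_create = descriptor.get("expected_counts_after_create", {name: (n, n)})
    after_release = descriptor.get("expected_counts_after_release", {})
    ignore = set(descriptor.get("ignore", ()))
    return after_create, after_release, ignore


# An empty descriptor falls back to the defaults: n objects alive, then none.
print(resolve_descriptor("test_release_foo_bar", 3, {}))
```

A test with auxiliary objects would pass its own `"expected_counts_after_create"` sub-dictionary, which then replaces the default rather than being merged with it in this sketch.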