SVE microbenchmarks with string operations #4841
Microbenchmarks for string operations (len, indexof, cmp) that compare run times across scalar, Vector128 and SVE implementations.
@a74nh @kunalspathak @dotnet/arm64-contrib
@dotnet-policy-service agree company="Arm"
Question for the maintainers: Is microbenchmarks the right place for these? Microbenchmarks feels maybe too "small", but it doesn't fit into "real world". "Loops" or "vectorisation" would be a better category, but it doesn't exist.
Thanks @jacob-crawley for coming up with this. I am wondering if you got a chance to verify the performance behavior of
src/benchmarks/micro/sve/StrCmp.cs
[Benchmark]
public unsafe long SveStrCmp()
{
    int i = 0;
We might need to add a Sve.IsSupported check here, because we do have machines in the perf lab that don't support SVE. @LoopedBard3 or @caaavik-msft can suggest the right way to do it.
I've added these checks to the Sve tests:
[Benchmark]
public unsafe long SveStrCmpTail()
{
    if (Sve.IsSupported)
    {
If there's a way of doing these checks as a filter before running the benchmarks, please let me know.
@adamsitnik do you have any recommendations for this type of filtering?
Include a 'Sve.IsSupported' check on each SVE benchmark so they still run on machines that don't have support for SVE.
@caaavik-msft ping
Filtering the benchmarks so that they only run when SVE is supported could be done with something similar to what we have here:
Defining a custom config type that is applied via an attribute: https://github.com/dotnet/BenchmarkDotNet/blob/ee248c319919ac112eb908394f1e941b78ca6a28/samples/BenchmarkDotNet.Samples/IntroFilters.cs#L9-L20
The filter could look similar to this (I have not tested it):
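using System.Runtime.Intrinsics.Arm;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Filters;

[Config(typeof(Config))]
public class IntroFilters
{
    private class Config : ManualConfig
    {
        public Config()
        {
            // Keep the benchmarks in this class only when the machine supports SVE.
            AddFilter(new SimpleFilter(_ => Sve.IsSupported));
        }
    }
}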
for (int i = 0; i < Size; i++)
{
    if (_arr1[i] != _arr2[i])
        return _arr1[i] - _arr2[i];
The benchmarks that are part of this repo are used to determine whether there is any performance regression in .NET. Running this scalar benchmark multiple times every day is unlikely to catch any regression, so I would focus purely on the ones that use Sve directly or indirectly (via Vector types if possible).
Thoughts...
- They give a comparison point, so we can easily show the advantage of using intrinsics. Knowing that an SVE loop is slightly slower than Vector128, but still massively faster than scalar, is useful I think.
- If C# started to add loop optimisations / auto-vectorisation, then the gap between scalar and intrinsics would start to close.
- There are some loops (not in this PR) that cannot easily be optimised via Vector128 (e.g. the partition used by a quicksort). For those we definitely want scalar versions.
I agree with @a74nh. The point of adding the scalar version is not to catch any regression in that code, but to compare against it the improvements we get from using the Vector128/Sve APIs.
Adds a custom filter to each benchmark class which means it can only be run if SVE is supported on the machine
Addressing upstream comments to rename the 'Scenario' param in StrCmp to 'Modify'; all scalar method names have been renamed to 'Scalar()'.
The SVE .NET APIs were introduced in .NET 9 as an experimental feature.
Currently there are no microbenchmarks for testing the performance of these SVE features.
This PR introduces an initial set of microbenchmarks to measure the performance of SVE in comparison to scalar and Vector128 implementations using BenchmarkDotNet.
This commit includes benchmarks of the following string operations
Some tests contain two SVE implementations. Those marked with 'Tail' at the end of the test name use fully populated vectors in each iteration, with a scalar loop afterwards to compute any remaining values.
The purpose of these tests is to compare against the other SVE implementations to highlight the impact of using predicate vectors (that aren't all set to true) on performance.
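To illustrate the 'Tail' loop shape described above, here is a minimal sketch, not the code from this PR. It uses the portable Vector128 API as a stand-in for SVE so that it runs on any machine, and the class and method names (TailLoopSketch, StrLen) are hypothetical. The main loop only ever processes fully populated vectors, and a scalar loop handles whatever is left over. A predicated SVE loop would instead fold the leftover elements into a final, partially active vector.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class TailLoopSketch
{
    // Hypothetical helper: returns the index of the first zero byte,
    // or data.Length if the buffer contains no zero byte.
    public static int StrLen(ReadOnlySpan<byte> data)
    {
        int i = 0;
        int width = Vector128<byte>.Count; // 16 bytes per vector

        // Main loop: only fully populated vectors are processed here.
        for (; i + width <= data.Length; i += width)
        {
            Vector128<byte> v = Vector128.Create(data.Slice(i, width));
            // One bit per lane, set where the lane equals zero.
            uint zeroLanes = Vector128.Equals(v, Vector128<byte>.Zero)
                                      .ExtractMostSignificantBits();
            if (zeroLanes != 0)
                return i + BitOperations.TrailingZeroCount(zeroLanes);
        }

        // Scalar tail: the remaining (fewer than 'width') elements.
        for (; i < data.Length; i++)
        {
            if (data[i] == 0)
                return i;
        }

        return data.Length;
    }
}
```

Comparing a loop of this shape against a fully predicated one isolates the cost of running the last iteration with a partially active predicate, which is what the 'Tail' variants are intended to measure.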
Initial results of these tests (run on cobalt-100) are as follows:
StrLen:
StrIndexof:
StrCmp: