Enhancing indexOfDiff efficiency in large input slices #24097
Open: MINGtoMING wants to merge 9 commits into ziglang:master from MINGtoMING:opt-index-of-diff
cpu: AArch64 Processor rev 0 (aarch64) vendor Kirin820
fn/T/len elapsed
---------------------------------------
indexOfDiff_V1/u8/5 19ns
indexOfDiff_V2/u8/5 21ns
std.mem.eql/u8/5 9ns
---------------------------------------
indexOfDiff_V1/u8/10 31ns
indexOfDiff_V2/u8/10 16ns
std.mem.eql/u8/10 9ns
---------------------------------------
indexOfDiff_V1/u8/20 55ns
indexOfDiff_V2/u8/20 27ns
std.mem.eql/u8/20 13ns
---------------------------------------
indexOfDiff_V1/u8/50 129ns
indexOfDiff_V2/u8/50 39ns
std.mem.eql/u8/50 26ns
---------------------------------------
indexOfDiff_V1/u8/100 254ns
indexOfDiff_V2/u8/100 52ns
std.mem.eql/u8/100 41ns
---------------------------------------
indexOfDiff_V1/u8/1000 2.667us
indexOfDiff_V2/u8/1000 359ns
std.mem.eql/u8/1000 356ns
---------------------------------------
indexOfDiff_V1/u8/10000 15.768us
indexOfDiff_V2/u8/10000 3.449us
std.mem.eql/u8/10000 3.591us
---------------------------------------
indexOfDiff_V1/u32/5 7ns
indexOfDiff_V2/u32/5 14ns
std.mem.eql/u32/5 7ns
---------------------------------------
indexOfDiff_V1/u32/10 14ns
indexOfDiff_V2/u32/10 36ns
std.mem.eql/u32/10 10ns
---------------------------------------
indexOfDiff_V1/u32/20 27ns
indexOfDiff_V2/u32/20 20ns
std.mem.eql/u32/20 15ns
---------------------------------------
indexOfDiff_V1/u32/50 66ns
indexOfDiff_V2/u32/50 44ns
std.mem.eql/u32/50 38ns
---------------------------------------
indexOfDiff_V1/u32/100 132ns
indexOfDiff_V2/u32/100 72ns
std.mem.eql/u32/100 68ns
---------------------------------------
indexOfDiff_V1/u32/1000 1.363us
indexOfDiff_V2/u32/1000 1.038us
std.mem.eql/u32/1000 980ns
---------------------------------------
indexOfDiff_V1/u32/10000 11.881us
indexOfDiff_V2/u32/10000 5.994us
std.mem.eql/u32/10000 6.089us
---------------------------------------
indexOfDiff_V1/u128/5 10ns
indexOfDiff_V2/u128/5 17ns
std.mem.eql/u128/5 12ns
---------------------------------------
indexOfDiff_V1/u128/10 18ns
indexOfDiff_V2/u128/10 28ns
std.mem.eql/u128/10 31ns
---------------------------------------
indexOfDiff_V1/u128/20 35ns
indexOfDiff_V2/u128/20 50ns
std.mem.eql/u128/20 45ns
---------------------------------------
indexOfDiff_V1/u128/50 90ns
indexOfDiff_V2/u128/50 115ns
std.mem.eql/u128/50 113ns
---------------------------------------
indexOfDiff_V1/u128/100 204ns
indexOfDiff_V2/u128/100 232ns
std.mem.eql/u128/100 228ns
---------------------------------------
indexOfDiff_V1/u128/1000 2.279us
indexOfDiff_V2/u128/1000 2.45us
std.mem.eql/u128/1000 2.323us
---------------------------------------
indexOfDiff_V1/u128/10000 26.107us
indexOfDiff_V2/u128/10000 25.562us
std.mem.eql/u128/10000 25.135us
Background
The previous std.mem.indexOfDiff was implemented with a naive while loop, whose performance relies on the compiler's auto-vectorization. When processing large input data, it fails to fully utilize the CPU's capabilities, resulting in longer execution times. Therefore, I attempted to optimize std.mem.indexOfDiff by referencing std.mem.eql to better leverage CPU performance.

The optimization strategy is as follows:
shortest < @sizeOf(usize): Use a while loop.
shortest <= @sizeOf(usize) * 2 or vec_len == 0: Use SWAR.
shortest < vec_len: Choose a smaller but appropriate vector length.
shortest >= vec_len: Use that vector length and perform loop unrolling.

Benchmark
indexOfDiff(V1): std.mem.indexOfDiff
indexOfDiff(V2): current implementation (this PR)
eql: std.mem.eql

cpu: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx
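
For readers unfamiliar with the techniques named in the optimization strategy, here is a minimal sketch of how a vector pass, a SWAR pass, and a scalar loop can be combined to find the first differing byte. It is not the code from this PR: the function name indexOfDiffSketch is hypothetical, it handles only u8 slices, vec_len is fixed at 16 as an assumption, and it runs the three passes as a cascade rather than dispatching on shortest. The smaller-vector tier and the loop unrolling are left out.

```zig
const std = @import("std");

/// Hypothetical sketch (not the PR's implementation): returns the index of
/// the first differing byte, or null if both slices are equal.
fn indexOfDiffSketch(a: []const u8, b: []const u8) ?usize {
    const shortest = @min(a.len, b.len);
    const vec_len = 16; // assumed width; real code would pick it per target
    const word = @sizeOf(usize);
    const Vec = @Vector(vec_len, u8);

    var i: usize = 0;

    // Vector pass: compare vec_len bytes per iteration.
    while (i + vec_len <= shortest) : (i += vec_len) {
        const va: Vec = a[i..][0..vec_len].*;
        const vb: Vec = b[i..][0..vec_len].*;
        const ne = va != vb;
        if (@reduce(.Or, ne)) {
            return i + std.simd.firstTrue(ne).?;
        }
    }

    // SWAR pass: compare one machine word per iteration.
    while (i + word <= shortest) : (i += word) {
        const wa: usize = @bitCast(a[i..][0..word].*);
        const wb: usize = @bitCast(b[i..][0..word].*);
        // Normalize to little-endian so the lowest-addressed byte sits in
        // the least significant bits on every target.
        const diff = std.mem.nativeToLittle(usize, wa ^ wb);
        if (diff != 0) {
            // The first set bit lies inside the first differing byte.
            return i + @ctz(diff) / 8;
        }
    }

    // Scalar pass: the plain while loop for the remaining bytes.
    while (i < shortest) : (i += 1) {
        if (a[i] != b[i]) return i;
    }

    // Equal up to the shorter length: they differ only if lengths differ.
    return if (a.len == b.len) null else shortest;
}
```

The cascade keeps the tail handling simple, at the cost of the branch on shortest that lets very short inputs skip the wider paths entirely.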