
Very poor performance from std::*_distribution #144237

Open
@Disservin


Unfortunately there is no short godbolt reproducer as of now, only the entire source: https://godbolt.org/z/6TMYYsW3e

While working on the C++ data loader of nnue-pytorch
(https://github.com/official-stockfish/nnue-pytorch/blob/master/training_data_loader.cpp),
I noticed a 2x performance difference between the latest clang and gcc.

Running a perf profile on this showed __ieee754_logl at the very top, which is nowhere to be seen with gcc. I assume this call somehow doesn't get optimized properly with clang?
The flamegraph shows that it comes from std::bernoulli_distribution, and seemingly from any *_distribution call.

https://github.com/official-stockfish/nnue-pytorch/blob/e1f4c5fbd50b37b4f5315f5b364b502c061a8576/training_data_loader.cpp#L922
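To check whether the distribution object itself is the hot spot, one option is to swap it for a draw that does no floating-point work per call. The following is only a sketch: `FastBernoulli` is a hypothetical helper that is not part of the linked loader or of any standard library, and it assumes a full-range 64-bit engine such as `std::mt19937_64`. It does not produce the same sequence of results as `std::bernoulli_distribution`.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical stand-in for std::bernoulli_distribution (an assumption for
// illustration, not the loader's code). The probability is converted once
// into a 64-bit integer threshold, so each draw is a single integer compare.
class FastBernoulli {
public:
    explicit FastBernoulli(double p)
        : always_true_(p >= 1.0),
          // 18446744073709551616.0 is 2^64; for 0 < p < 1 the product is
          // below 2^64, so the conversion to uint64_t is well defined.
          threshold_(p > 0.0 && p < 1.0
                         ? static_cast<std::uint64_t>(p * 18446744073709551616.0)
                         : 0) {}

    template <class Engine>
    bool operator()(Engine& rng) const {
        // The threshold only makes sense if the engine covers [0, 2^64).
        static_assert(Engine::min() == 0 &&
                          Engine::max() == std::numeric_limits<std::uint64_t>::max(),
                      "sketch assumes a full-range 64-bit engine");
        return always_true_ || rng() < threshold_;
    }

private:
    bool always_true_;          // set when p >= 1
    std::uint64_t threshold_;   // floor(p * 2^64) for 0 < p < 1, else 0
};
```

If the profile no longer shows __ieee754_logl after such a swap, that would point at the floating-point path inside the distribution rather than at the engine.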

[Images: perf profile and flamegraph]


I haven't been able to create a small standalone example that reproduces this yet. To compile the example above, get the file from godbolt and run:

clang++ -march=native test.cpp -O3 -o loader && ./loader test77-jan2022-2tb7p.high-simple-eval-1k.min-v2.binpack

The input file can be downloaded from https://huggingface.co/datasets/official-stockfish/master-smallnet-binpacks/tree/main

If you compile against libc++ instead of libstdc++, the program becomes another 1.5x slower:

clang++-21  libc++      10.0457s
clang++-21  libstdc++   5.43586s
g++-15                  3.56669s
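A possible starting point for a standalone comparison across compilers and standard libraries is a loop that only times the distribution calls. This is a sketch, not a confirmed reproducer: `DrawStats` and `time_bernoulli` are hypothetical names, and the way the loader constructs and calls its distributions may differ, so the slowdown may not show up here.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Result of one timing run (hypothetical type for this sketch).
struct DrawStats {
    std::size_t hits;   // number of draws that returned true
    double seconds;     // wall-clock time spent in the draw loop
};

// Times `draws` calls to std::bernoulli_distribution(p) on a seeded
// std::mt19937_64. Returning the hit count gives the caller a value to
// print, which keeps the loop from being discarded as dead code.
inline DrawStats time_bernoulli(std::size_t draws, double p, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution dist(p);
    std::size_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < draws; ++i) {
        hits += dist(rng) ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {hits, std::chrono::duration<double>(stop - start).count()};
}
```

Calling `time_bernoulli` with a large draw count from `main`, printing both fields, and building the same file with each compiler and standard library would show whether the gap appears outside the loader.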


Labels: libc++, performance, random
