Never approximate e^x in transformer models #1237

bobqianic started this conversation in Show and tell
A few days ago, I came across a method (*A Fast, Compact Approximation of the Exponential Function*, Schraudolph 1999) that can accelerate the calculation of `e^x` without losing more than 6% accuracy; a sketch of the trick is shown below.
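For reference, here is a minimal sketch of the trick adapted to single precision (the paper's original code targets `double`). The constants follow the commonly used `float` variant of Schraudolph's method, so this is illustrative rather than necessarily the exact code I benchmarked:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Schraudolph-style fast exp, single-precision variant.
// Idea: e^x = 2^(x / ln 2), so scale x by 2^23 / ln(2) and add the IEEE-754
// exponent bias; reinterpreting the resulting integer as a float yields an
// approximation of e^x in a couple of instructions.
static inline float fast_expf(float x) {
    // 12102203 ≈ 2^23 / ln(2); 1064866805 = 127 * 2^23 minus an error-tuning offset
    int32_t i = static_cast<int32_t>(12102203.0f * x) + 1064866805;
    float r;
    std::memcpy(&r, &i, sizeof r); // reinterpret the bits as a float
    return r;
}

int main() {
    // Compare against the exact expf over a few inputs to see the relative error.
    for (float x = -5.0f; x <= 5.0f; x += 2.5f) {
        float approx = fast_expf(x);
        float exact  = std::exp(x);
        std::printf("x=%+5.2f  fast=%12.5f  expf=%12.5f  rel.err=%+.2f%%\n",
                    x, approx, exact, 100.0f * (approx - exact) / exact);
    }
}
```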
The reason I wanted to speed up `e^x` is that I found `whisper_process_logits` takes up 95% of the sample time under the default `beam_size` and `best_of` settings, and 62% of that time is spent in the `expf()` calls used by the `softmax` computation. Using the method from the paper, my tests on an i7-12700H showed a 3.3x speedup, reducing a single evaluation from 2.14 ns to 0.645 ns.
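A micro-benchmark along these lines reproduces the comparison (a sketch, not the exact harness I used; absolute numbers vary with compiler flags and CPU):

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// fast_expf from the sketch above.
static inline float fast_expf(float x) {
    int32_t i = static_cast<int32_t>(12102203.0f * x) + 1064866805;
    float r;
    std::memcpy(&r, &i, sizeof r);
    return r;
}

// Average per-call time in nanoseconds; the volatile sink keeps the
// compiler from deleting the loop.
template <typename F>
double bench_ns(F f, const std::vector<float>& xs, int iters) {
    volatile float sink = 0.0f;
    auto t0 = std::chrono::steady_clock::now();
    for (int it = 0; it < iters; ++it)
        for (float x : xs)
            sink = sink + f(x);
    auto t1 = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / (double(iters) * xs.size());
}

int main() {
    std::vector<float> xs(4096);
    for (size_t i = 0; i < xs.size(); ++i)
        xs[i] = -10.0f + 20.0f * i / xs.size(); // inputs spread over [-10, 10)
    std::printf("expf      : %.3f ns/call\n",
                bench_ns([](float x) { return std::exp(x); }, xs, 10000));
    std::printf("fast_expf : %.3f ns/call\n",
                bench_ns(fast_expf, xs, 10000));
}
```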
However, I found that this method is not applicable to Whisper; transformers are more sensitive than I thought. Transcription quality decreases as `-bs` and `-bo` increase, and sometimes large chunks of repeated content appear. (My guess is that the approximation's relative error distorts the probability ratios between candidate tokens, and beam search compounds those distortions over many decoding steps.) So if you want to accelerate a transformer's `softmax`, try not to sacrifice the accuracy of the `expf()` function.
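For context, the hot path is the usual softmax over the logits. A generic sketch (not the exact whisper.cpp implementation) of where the `expf()` calls sit:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard numerically stable softmax: subtract the max logit before
// exponentiating, then normalize. Assumes logits is non-empty.
std::vector<float> softmax(const std::vector<float>& logits) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        // One expf() per vocabulary entry -- the hot spot, and the place
        // where an approximate exp distorts the token probabilities.
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}
```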
Before Correction: *(image)*

After Correction: *(image)*