Never approximate e^x in transformer models #1237

bobqianic started this conversation in Show and tell
A few days ago, I came across a method (*A Fast, Compact Approximation of the Exponential Function*, Schraudolph 1999) that can accelerate the calculation of `e^x` without losing more than 6% accuracy; a sketch of the trick is shown below.
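For reference, here is a minimal sketch of the trick adapted to single precision (the paper's original code targets `double`). The constants follow the commonly used `float` variant of Schraudolph's method, so this is illustrative rather than necessarily the exact code I benchmarked:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Schraudolph-style fast exp, single-precision variant.
// Idea: e^x = 2^(x / ln 2), so scale x by 2^23 / ln(2) and add the IEEE-754
// exponent bias; reinterpreting the resulting integer as a float yields an
// approximation of e^x in a couple of instructions.
static inline float fast_expf(float x) {
    // 12102203 ≈ 2^23 / ln(2); 1064866805 = 127 * 2^23 minus an error-tuning offset
    int32_t i = static_cast<int32_t>(12102203.0f * x) + 1064866805;
    float r;
    std::memcpy(&r, &i, sizeof r); // reinterpret the bits as a float
    return r;
}

int main() {
    // Compare against the exact expf over a few inputs to see the relative error.
    for (float x = -5.0f; x <= 5.0f; x += 2.5f) {
        float approx = fast_expf(x);
        float exact  = std::exp(x);
        std::printf("x=%+5.2f  fast=%12.5f  expf=%12.5f  rel.err=%+.2f%%\n",
                    x, approx, exact, 100.0f * (approx - exact) / exact);
    }
}
```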
The reason I wanted to speed up `e^x` is that I found `whisper_process_logits` takes up 95% of the sample time under the default `beam_size` and `best_of` settings, and 62% of that time is spent in the `expf()` calls used by the `softmax` computation. Using the method from the paper, my tests on an i7-12700H showed a 3.3x speedup, reducing a single evaluation from 2.14 ns to 0.645 ns.
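A micro-benchmark along these lines reproduces the comparison (a sketch, not the exact harness I used; absolute numbers vary with compiler flags and CPU):

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// fast_expf from the sketch above.
static inline float fast_expf(float x) {
    int32_t i = static_cast<int32_t>(12102203.0f * x) + 1064866805;
    float r;
    std::memcpy(&r, &i, sizeof r);
    return r;
}

// Average per-call time in nanoseconds; the volatile sink keeps the
// compiler from deleting the loop.
template <typename F>
double bench_ns(F f, const std::vector<float>& xs, int iters) {
    volatile float sink = 0.0f;
    auto t0 = std::chrono::steady_clock::now();
    for (int it = 0; it < iters; ++it)
        for (float x : xs)
            sink = sink + f(x);
    auto t1 = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / (double(iters) * xs.size());
}

int main() {
    std::vector<float> xs(4096);
    for (size_t i = 0; i < xs.size(); ++i)
        xs[i] = -10.0f + 20.0f * i / xs.size(); // inputs spread over [-10, 10)
    std::printf("expf      : %.3f ns/call\n",
                bench_ns([](float x) { return std::exp(x); }, xs, 10000));
    std::printf("fast_expf : %.3f ns/call\n",
                bench_ns(fast_expf, xs, 10000));
}
```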
However, I found that this method is not applicable to Whisper; transformers are more sensitive than I thought. Transcription quality decreases as `-bs` and `-bo` increase, and sometimes large chunks of repeated content appear. (My guess is that the approximation's relative error distorts the probability ratios between candidate tokens, and beam search compounds those distortions over many decoding steps.) So if you want to accelerate a transformer's `softmax`, try not to sacrifice the accuracy of the `expf()` function.
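For context, the hot path is the usual softmax over the logits. A generic sketch (not the exact whisper.cpp implementation) of where the `expf()` calls sit:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard numerically stable softmax: subtract the max logit before
// exponentiating, then normalize. Assumes logits is non-empty.
std::vector<float> softmax(const std::vector<float>& logits) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        // One expf() per vocabulary entry -- the hot spot, and the place
        // where an approximate exp distorts the token probabilities.
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}
```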
Before Correction: *(image)*

After Correction: *(image)*