-
Quantization inherently comes with trade-offs. While it can offer improved performance, such as faster inference and a reduced memory footprint, it often sacrifices accuracy, since each quantized weight can only represent a limited set of values. More advanced quantization techniques can mitigate some of these limitations but can't entirely eliminate them.
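To make that accuracy loss concrete, here is a minimal C++ sketch of symmetric 8-bit quantization: each weight is rounded to one of the integer steps in [-127, 127] and then mapped back, and the leftover difference is the per-weight error that accumulates across a model. The single-scale scheme and the example values are simplified assumptions for illustration, not whisper.cpp's exact q8_0 block layout.

```cpp
// Minimal sketch of symmetric 8-bit quantization and the rounding error it
// introduces. Values and the one-scale-per-block scheme are illustrative
// assumptions, not whisper.cpp's actual q8_0 format.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // A few made-up fp32 weights.
    std::vector<float> w = {0.031f, -0.274f, 0.118f, 0.902f, -0.655f, 0.007f};

    // One scale for the block: map max |w| onto the int8 range [-127, 127].
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    const float scale = amax / 127.0f;

    for (float x : w) {
        int8_t q = (int8_t) std::lround(x / scale); // quantize
        float  d = q * scale;                       // dequantize
        std::printf("fp32 % .4f -> int8 %4d -> % .4f (err % .5f)\n",
                    x, q, d, d - x);
    }
    return 0;
}
```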
-
In that case, at least for speech recognition, quantized AI models seem useless if even 8-bit is this degraded (look at the transcribed text, the differences are huge)... how bad must it be at 5-bit, or worse, 4-bit?
-
As my tests show with the full ggml-large.bin vs ggml-large_q8.bin: even the 8-bit model is much worse than the full model.
Tested with the newest whisper.cpp source, built for CPU only, on Windows 11:
.\main.exe -m ggml-large.bin -f OUTPUT.WAV -t 28 -pc --prompt music <- transcription is correct about 99% of the time, almost perfect, even though the audio is very poor quality from an old vinyl record
.\main.exe -m ggml-large_q8.bin -f OUTPUT.WAV -t 28 -pc --prompt music <- the q8 version of the same model... much worse output. Why so bad? Shouldn't q8 be very close to fp16 quality?
Why is even q8 this much worse than the full fp16 model?
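For reference, here is a rough C++ sketch of running the same comparison through the whisper.cpp C API instead of main.exe, which can help confirm that the difference really comes from the model weights rather than the command line. The load_pcm_16khz_mono helper is a placeholder assumption (main.exe decodes the WAV internally; you would plug in your own decoder there), and decoding parameters are left at the defaults from whisper_full_default_params.

```cpp
// Rough sketch: transcribe the same audio with two whisper.cpp models and
// print both outputs. Assumes whisper.cpp is built as a library and whisper.h
// is on the include path. load_pcm_16khz_mono is NOT a whisper.cpp function;
// it is a stub standing in for your own WAV decoding.
#include <cstdio>
#include <string>
#include <vector>
#include "whisper.h"

// Placeholder: decode a WAV file to 16 kHz mono float PCM with your own code.
static std::vector<float> load_pcm_16khz_mono(const std::string & /*path*/) {
    return std::vector<float>(16000, 0.0f); // 1 s of silence, just so it runs
}

static void transcribe(const char * model_path, const std::vector<float> & pcm) {
    // Newer whisper.cpp versions prefer whisper_init_from_file_with_params.
    whisper_context * ctx = whisper_init_from_file(model_path);
    if (!ctx) { std::fprintf(stderr, "failed to load %s\n", model_path); return; }

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;

    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
        std::printf("--- %s ---\n", model_path);
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            std::printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }
    whisper_free(ctx);
}

int main() {
    const auto pcm = load_pcm_16khz_mono("OUTPUT.WAV");
    transcribe("ggml-large.bin",    pcm); // fp16 reference
    transcribe("ggml-large_q8.bin", pcm); // q8 model to compare against
    return 0;
}
```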