-
Quantization inherently comes with trade-offs. While it can offer improved performance, such as faster inference and a reduced memory footprint, it often sacrifices accuracy, since each quantized weight can only represent a limited set of values. More advanced quantization techniques can mitigate some of these limitations but can't entirely eliminate them.
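To make that accuracy loss concrete, here is a minimal C++ sketch of symmetric 8-bit quantization: each weight is rounded to one of the integer steps in [-127, 127] and then mapped back, and the leftover difference is the per-weight error that accumulates across a model. The single-scale scheme and the example values are simplified assumptions for illustration, not whisper.cpp's exact q8_0 block layout.

```cpp
// Minimal sketch of symmetric 8-bit quantization and the rounding error it
// introduces. Values and the one-scale-per-block scheme are illustrative
// assumptions, not whisper.cpp's actual q8_0 format.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // A few made-up fp32 weights.
    std::vector<float> w = {0.031f, -0.274f, 0.118f, 0.902f, -0.655f, 0.007f};

    // One scale for the block: map max |w| onto the int8 range [-127, 127].
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    const float scale = amax / 127.0f;

    for (float x : w) {
        int8_t q = (int8_t) std::lround(x / scale); // quantize
        float  d = q * scale;                       // dequantize
        std::printf("fp32 % .4f -> int8 %4d -> % .4f (err % .5f)\n",
                    x, q, d, d - x);
    }
    return 0;
}
```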
-
In that case, at least for speech recognition, quantized AI models seem useless if even 8-bit is this degraded (look at the transcribed text, the differences are huge)... how bad must it be at 5-bit, or worse, 4-bit?
-
As my tests show with the full ggml-large.bin vs ggml-large_q8.bin: even the 8-bit model is much worse than the full model.
Tested with the newest whisper.cpp source, built for CPU only, on Windows 11:
.\main.exe -m ggml-large.bin -f OUTPUT.WAV -t 28 -pc --prompt music <- transcription is correct about 99% of the time, almost perfect, even though the audio is very poor quality from an old vinyl record
.\main.exe -m ggml-large_q8.bin -f OUTPUT.WAV -t 28 -pc --prompt music <- the q8 version of the same model... much worse output. Why so bad? Shouldn't q8 be very close to fp16 quality?
Why is even q8 this much worse than the full fp16 model?
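For reference, here is a rough C++ sketch of running the same comparison through the whisper.cpp C API instead of main.exe, which can help confirm that the difference really comes from the model weights rather than the command line. The load_pcm_16khz_mono helper is a placeholder assumption (main.exe decodes the WAV internally; you would plug in your own decoder there), and decoding parameters are left at the defaults from whisper_full_default_params.

```cpp
// Rough sketch: transcribe the same audio with two whisper.cpp models and
// print both outputs. Assumes whisper.cpp is built as a library and whisper.h
// is on the include path. load_pcm_16khz_mono is NOT a whisper.cpp function;
// it is a stub standing in for your own WAV decoding.
#include <cstdio>
#include <string>
#include <vector>
#include "whisper.h"

// Placeholder: decode a WAV file to 16 kHz mono float PCM with your own code.
static std::vector<float> load_pcm_16khz_mono(const std::string & /*path*/) {
    return std::vector<float>(16000, 0.0f); // 1 s of silence, just so it runs
}

static void transcribe(const char * model_path, const std::vector<float> & pcm) {
    // Newer whisper.cpp versions prefer whisper_init_from_file_with_params.
    whisper_context * ctx = whisper_init_from_file(model_path);
    if (!ctx) { std::fprintf(stderr, "failed to load %s\n", model_path); return; }

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;

    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
        std::printf("--- %s ---\n", model_path);
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            std::printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }
    whisper_free(ctx);
}

int main() {
    const auto pcm = load_pcm_16khz_mono("OUTPUT.WAV");
    transcribe("ggml-large.bin",    pcm); // fp16 reference
    transcribe("ggml-large_q8.bin", pcm); // q8 model to compare against
    return 0;
}
```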