Chapter 6: LLMs from Scratch issue: "RuntimeError: value cannot be converted to type int64 without overflow" #552
Replies: 2 comments
-
I used Cursor.ai to help debug the error and got the following feedback, but Cursor doesn't seem able to resolve the issue either:
"Let me investigate this error by looking at the relevant code. The error occurs during the accuracy calculation in calc_accuracy_loader, so let me first check its implementation. Let me analyze the code and the error. The error occurs in calc_accuracy_loader when trying to calculate the accuracy of the model. Looking at the implementation, I can see that it calculates accuracy by comparing predicted labels with target labels."
"The root cause of this error is a mismatch in how the data is being processed. In this case, the issue is likely that the model is outputting logits for the entire vocabulary (50,257 tokens, as seen in the config). Let me verify the data loader format to confirm this hypothesis. After analyzing the code, I can now explain the root cause of the error. The issue stems from a mismatch between how the model is being used for classification and how the accuracy calculation is being performed. Here's the detailed explanation:"
"However, the calc_accuracy_loader function is using the wrong model output format. It's trying to use the full vocabulary output dimension when it should only be looking at the binary classification output:"
"The fix would be to modify the calc_accuracy_loader function to handle binary classification correctly. Let me propose an edit to fix this:
The edit I made adds a check on the output dimension of the logits. If it is 2 (the binary classification case), it handles it appropriately. This should prevent the integer overflow error because we're now properly handling the binary classification case. The root cause was that the model was modified for binary classification, but the accuracy calculation function was still treating it as if it were generating predictions over the full GPT vocabulary. This mismatch led to trying to convert very large numbers (from the wrong dimension) into integers, causing the overflow error. You can now try running the accuracy calculation again. It should work correctly with the binary classification output."
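For reference, here is a minimal sketch of an accuracy helper that follows the idea in Cursor's explanation: keep only the last token's logits and take the argmax over the class dimension. It assumes the GPT model's output head has already been replaced with a num_classes-wide linear layer, so model(input_ids) returns logits shaped (batch, seq_len, num_classes). The explicit shape check and the num_classes parameter are illustrative additions, not code from the book or from Cursor. Since the poster reports that Cursor's suggestion did not resolve the error, treat this as a way to rule out an output-head mismatch rather than as a confirmed fix.

```python
import torch


def calc_accuracy_loader(data_loader, model, device, num_batches=None, num_classes=2):
    """Classification accuracy based on each sequence's last-token logits.

    Assumes model(input_ids) returns logits shaped (batch, seq_len, num_classes),
    i.e. the GPT output head was replaced by a num_classes-wide linear layer.
    """
    model.eval()
    correct, total = 0, 0
    if num_batches is None:
        num_batches = len(data_loader)
    else:
        num_batches = min(num_batches, len(data_loader))

    for i, (input_batch, target_batch) in enumerate(data_loader):
        if i >= num_batches:
            break
        input_batch = input_batch.to(device)
        target_batch = target_batch.to(device)
        with torch.no_grad():
            logits = model(input_batch)[:, -1, :]  # (batch, num_classes)

        # Illustrative check: a last dimension of 50257 would mean the
        # vocabulary-sized head is still attached instead of the 2-class head.
        if logits.shape[-1] != num_classes:
            raise ValueError(
                f"expected {num_classes} output classes, got {logits.shape[-1]}"
            )

        predicted = torch.argmax(logits, dim=-1)
        correct += (predicted == target_batch).sum().item()
        total += predicted.shape[0]

    return correct / total if total else float("nan")
```

Passing num_batches (for example 10) limits the estimate to the first few batches of a validation loader. If the ValueError fires with 50257 classes, the classification head was never swapped in; if it does not fire, the overflow most likely originates elsewhere.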
-
I've been able to run the code up to Chapter 6, where I now get an error when using the "SpamDataset" class:
This is the class as defined:
Usage:
However, a run fails with this error:
My full copy of the code in one file is here: https://github.com/archmangler/trai-llm/blob/main/main.py
Questions:
Has anyone encountered this error before?
How can I drill down further to find the root cause?
Appreciate any help in advance!
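One way to start drilling down is to validate the tokenized data in plain Python before any tensors are built. PyTorch typically raises "value cannot be converted to type int64 without overflow" when a value such as a NaN or an out-of-range number has to be stored as a 64-bit integer (for example in a torch.long tensor of labels). The helper below is a hypothetical sketch, not part of the book's code. It assumes each sample is a list of GPT-2 token IDs (vocabulary size 50,257) with a 0/1 label, and flags samples that are too long, contain non-integer or out-of-vocabulary token IDs, or have unexpected labels.

```python
# Hypothetical helper for narrowing down the int64 overflow: it checks the
# already-tokenized data in plain Python, before any torch.long tensors are built.
def find_bad_samples(encoded_texts, labels, vocab_size=50257, max_length=None):
    """Return (index, reason) pairs for samples that look malformed.

    encoded_texts: list of token-ID lists (e.g. output of a GPT-2 tokenizer).
    labels: one label per sample, expected to be 0 or 1.
    """
    problems = []
    for i, (ids, label) in enumerate(zip(encoded_texts, labels)):
        if max_length is not None and len(ids) > max_length:
            problems.append((i, f"length {len(ids)} > max_length {max_length}"))
        for tok in ids:
            # Token IDs must be plain ints inside the vocabulary range.
            if not isinstance(tok, int):
                problems.append((i, f"non-integer token {tok!r}"))
                break
            if not 0 <= tok < vocab_size:
                problems.append((i, f"token id {tok} outside [0, {vocab_size})"))
                break
        # A NaN or unexpected label (e.g. from an incomplete "ham"/"spam"
        # mapping, which is an assumption about the data) is flagged here.
        if label not in (0, 1):
            problems.append((i, f"unexpected label {label!r}"))
    return problems
```

For example, calling it on the dataset's encoded texts and label column with max_length=1024 (the GPT-2 context length) would list any offending rows. The exact attribute and column names depend on how SpamDataset is written in the linked main.py, so they are left as assumptions here.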