[Llama 3.1] Updates MLLOG tags #790
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@Elnifio can you verify that the logs from this reference indeed pass the latest checkers?
```diff
@@ -124,11 +124,11 @@ def log_metrics(self, metrics, step):
         assert step_time <= self.train_step_time_atol, f"Logged train step time ({step_time}) is slower than tolerable ({self.train_step_time_atol}). "

     def log_validation_loss(self, metrics, step):
-        consumed_tokens = (step - self.init_global_step) * self.gbs * self.seq_len
+        consumed_tokens = step * self.gbs
```
Shouldn't that be called `consumed_samples`?
Updated in the latest commit. Thanks for catching that!
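For readers following along, a self-contained illustration of the token-count vs. sample-count arithmetic behind this rename; the batch size and sequence length below are made-up example values, not taken from the reference config:

```python
# Hedged illustration of the change above; gbs and seq_len are example
# values only, not the reference hyperparameters.
gbs = 1152            # global batch size: samples consumed per training step
seq_len = 8192        # tokens per sample
init_global_step = 0  # step the run started from
step = 100            # current global step

# Old expression: tokens consumed since the starting step.
consumed_tokens = (step - init_global_step) * gbs * seq_len

# New expression: samples consumed, which is what the samples-count
# metadata in the MLLOG tags is meant to carry.
consumed_samples = step * gbs

print(consumed_tokens)   # 943718400
print(consumed_samples)  # 115200
```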
```diff
@@ -202,12 +202,12 @@ def on_train_end(self, trainer, pl_module):

     @rank_zero_only
     def on_validation_start(self, trainer, pl_module):
-        mllogger.end(key=constants.BLOCK_STOP, metadata={'epoch_num': self.consumed_tokens(trainer)})
-        mllogger.start(key=constants.EVAL_START, metadata={'epoch_num': self.consumed_tokens(trainer)})
+        mllogger.end(key=constants.BLOCK_STOP, metadata={constants.SAMPLES_COUNT: self.consumed_tokens(trainer)})
```
Not sure where `self.consumed_tokens()` is defined, but does it return tokens or samples? If tokens, we need to switch it to samples; if samples, we need to rename it.
Updated in the latest commit.
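For context, a minimal sketch (not the exact reference code) of how the block/eval boundary events look once they carry `SAMPLES_COUNT` metadata; the helper name `consumed_samples`, the `gbs` attribute, and the callback class around it are assumptions based on the discussion above:

```python
# Hedged sketch: tag BLOCK_STOP / EVAL_START with a samples count instead of
# epoch_num. The consumed_samples helper and the callback class it lives in
# are assumed for illustration, not copied from the reference implementation.
from mlperf_logging import mllog
from mlperf_logging.mllog import constants

mllogger = mllog.get_mllogger()

class MLPerfCallback:  # hypothetical callback class
    def __init__(self, gbs):
        self.gbs = gbs  # global batch size (samples per step)

    def consumed_samples(self, trainer):
        # Assumption: samples = global step * global batch size, as in the diff above.
        return trainer.global_step * self.gbs

    def on_validation_start(self, trainer, pl_module):
        samples = self.consumed_samples(trainer)
        mllogger.end(key=constants.BLOCK_STOP,
                     metadata={constants.SAMPLES_COUNT: samples})
        mllogger.start(key=constants.EVAL_START,
                       metadata={constants.SAMPLES_COUNT: samples})
```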
@ShriyaPalsamudram @mmarcinkiewicz the latest run fails the RCP checker in the following test:
Further update on this comment: I have run the RCP checker against the latest compliance checker from PR 414, and it passed without any errors or warnings, so I'd say both PRs are ready to merge.
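For anyone reproducing this check locally, a hedged sketch of running a result log through the compliance checker module in the `mlperf_logging` package; the log file path below is hypothetical:

```python
# Hedged sketch: invoke the mlperf_logging compliance checker on a single
# MLLOG result file. The file name is hypothetical; point it at a real log.
import subprocess

result_file = "result_0.txt"  # hypothetical path to a reference MLLOG output

subprocess.run(
    ["python3", "-m", "mlperf_logging.compliance_checker", result_file],
    check=True,  # raise if the checker exits with an error
)
```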
No description provided.