Behaviour of `batch_eval_metrics` determines the `include_for_metrics` behaviour #37683
Comments
Thanks for the report! Can you share a minimal reproducer? This is a bit strange since we are passing `inputs_host`, which comes from `inputs_decode`, and we should only select `main_input_name`:

```python
main_input_name = getattr(self.model, "main_input_name", "input_ids")
inputs_decode = (
    self._prepare_input(inputs[main_input_name]) if "inputs" in args.include_for_metrics else None
)
```

```python
if self.args.batch_eval_metrics:
    if self.compute_metrics is not None and preds_host is not None and labels_host is not None:
        is_last_step = self.accelerator.gradient_state.end_of_dataloader
        batch_kwargs = {}
        batch_kwargs["losses"] = losses_host if "loss" in args.include_for_metrics else None
        batch_kwargs["inputs"] = inputs_host if "inputs" in args.include_for_metrics else None
```
In L4351, the code is … On the other hand, in L4388 the code is …
Indeed, we should pass …
@SunMarc Aha! I was hoping you wouldn't say that. We use …
Hello! This discrepancy seems to lead to another issue when using distributed training: in L436, gather is done on only …
System Info
Hello!
In the `evaluation_loop` method, there is an interplay between the `batch_eval_metrics` and `include_for_metrics` arguments. When `include_for_metrics` is set to `inputs`, what is sent to `compute_metrics` is the `all_inputs` object, which only contains the `input_ids`, whereas when `batch_eval_metrics` is set to `True`, `compute_metrics` is called with `inputs`, which includes the `input_ids` and also any other inputs that are passed to the model. This behaviour is inconsistent. Can you please look into this?
@zach-huggingface @SunMarc
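To make the contrast concrete, here is an illustrative sketch of the two call patterns (the function names, bodies, and shapes are my assumptions; `compute_result` is the extra keyword the Trainer passes in batched mode):

```python
# Illustrative only: how compute_metrics is invoked in each mode,
# per the behaviour described above.

# batch_eval_metrics=False: called once after the whole loop;
# eval_pred.inputs is the accumulated all_inputs (main input only).
def compute_metrics(eval_pred):
    inputs = eval_pred.inputs  # e.g. an array of input_ids, (num_samples, seq_len)
    return {}

# batch_eval_metrics=True: called once per batch, with compute_result
# signalling the final call; eval_pred.inputs here is the full batch,
# e.g. input_ids plus attention_mask and anything else the model receives.
def batched_compute_metrics(eval_pred, compute_result=False):
    inputs = eval_pred.inputs  # e.g. {"input_ids": ..., "attention_mask": ...}
    return {}
```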
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Any trainer example with both `batch_eval_metrics` and `include_for_metrics` set should cause this.
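A hypothetical minimal reproducer along these lines (the tiny checkpoint, dataset, and metric are placeholders I chose, not taken from the report) should surface the difference:

```python
# Sketch: compare what compute_metrics receives as `inputs`
# with and without batch_eval_metrics.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

class ToyDataset(Dataset):
    def __init__(self, tokenizer):
        self.enc = tokenizer(["a", "b", "c", "d"], padding=True, return_tensors="pt")
        self.labels = torch.tensor([0, 1, 0, 1])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tok = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-bert")
model = AutoModelForSequenceClassification.from_pretrained("hf-internal-testing/tiny-random-bert")

def metrics_fn(eval_pred, compute_result=True):
    # batch_eval_metrics=False: only the gathered main input (input_ids).
    # batch_eval_metrics=True: reportedly the full batch mapping.
    print(type(eval_pred.inputs))
    return {"dummy": 0.0}

for batched in (False, True):
    args = TrainingArguments(
        output_dir="out",
        include_for_metrics=["inputs"],
        batch_eval_metrics=batched,
        per_device_eval_batch_size=2,
        report_to=[],
    )
    trainer = Trainer(model=model, args=args, compute_metrics=metrics_fn)
    trainer.evaluate(ToyDataset(tok))
```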
Expected behavior
Ideally, `batch_eval_metrics` shouldn't dictate how `compute_metrics` is called. Can this inconsistency be fixed?