Description
Dear setfit team,
first, thank you so much for all your great work!
I have a question/feature suggestion regarding early stopping in setfit classifier training:
The status quo in setfit is:
- `SetFitTrainer` allows using transformers' `EarlyStoppingCallback` (see https://huggingface.co/docs/setfit/en/how_to/callbacks) if combined with the arguments `metric_for_best_model`, `greater_is_better`, and `load_best_model_at_end` in `TrainingArguments`.
- Early stopping is applied only in the embedding model fine-tuning stage (see the `train_embeddings()` method of a `SetFitTrainer` instance).
- Early stopping is not applied in the classification head training stage (see the `train_classifier()` method of a `SetFitTrainer` instance, which relies on `SetFitHead`'s `fit` method).
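For context, a minimal sketch of the current embedding-stage setup described above, roughly following the linked callbacks guide (model name and step counts are placeholders; argument names may differ slightly across setfit versions, e.g. `eval_strategy` vs. `evaluation_strategy`):

```python
from transformers import EarlyStoppingCallback
from setfit import SetFitModel, Trainer, TrainingArguments

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    eval_strategy="steps",          # evaluate periodically during embedding fine-tuning
    eval_steps=20,
    load_best_model_at_end=True,
    metric_for_best_model="embedding_loss",
    greater_is_better=False,        # lower embedding loss is better
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,    # placeholders: your own Dataset objects
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()  # early stopping only affects the embedding stage here
```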
My question/feature suggestion: would it be desirable to apply early stopping in the classifier training step as well?
I think "yes", because I observed that the hard-coded number of classifier training epochs (I believe the current default in `TrainingArguments` is 16 epochs) can lead to overconfidence in the classifier's predicted probabilities, at least on medium-sized datasets. Of course, I could simply lower the number of classifier training epochs, but what would the appropriate value be? Hence early stopping.
What this would require:
- Either (a) `SetFitHead`'s `fit` would need to accept `eval_x` and `eval_y` to check performance on a dev/validation set, or (b) the logic in `SetFitTrainer`'s `train_classifier` would need to implement epoch-wise training and evaluation.
- `metric_for_best_model` and `greater_is_better` in `TrainingArguments` would need to accept tuples, because different metrics will be used for early stopping during embedding model fine-tuning (e.g., `"embedding_loss"`) and (end-to-end) classifier training (e.g., `"f1"`).
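To make option (b) concrete, here is a minimal, self-contained sketch of the epoch-wise stopping logic (the `EarlyStopper` name and its interface are hypothetical, not part of setfit; it just mirrors the `greater_is_better` / patience semantics of transformers' `EarlyStoppingCallback`):

```python
class EarlyStopper:
    """Stop training when the monitored metric has not improved for
    `patience` consecutive evaluations. Hypothetical sketch, not setfit API."""

    def __init__(self, patience: int = 3, greater_is_better: bool = True):
        self.patience = patience
        self.greater_is_better = greater_is_better
        self.best = None        # best metric value seen so far
        self.bad_evals = 0      # consecutive evaluations without improvement

    def should_stop(self, metric: float) -> bool:
        improved = self.best is None or (
            metric > self.best if self.greater_is_better else metric < self.best
        )
        if improved:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

A hypothetical `train_classifier` loop would then evaluate the chosen metric (e.g. `"f1"`) on `(eval_x, eval_y)` after each epoch and break as soon as `should_stop(...)` returns `True`, optionally restoring the best checkpoint to mirror `load_best_model_at_end`.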
Are there any methodological reasons speaking against my proposal that I'm missing? And is this something you'd consider including in a future setfit release, @tomaarsen, assuming I and maybe others would contribute?