Description
Dear setfit team,
first, thank you so much for all your great work!
I have a question/feature suggestion regarding early stopping in setfit classifier training:
The status quo in setfit is:
- `SetFitTrainer` allows using transformers' `EarlyStoppingCallback` (see https://huggingface.co/docs/setfit/en/how_to/callbacks) if combined with the arguments `metric_for_best_model`, `greater_is_better`, and `load_best_model_at_end` in `TrainingArguments`.
- Early stopping is applied only in the embedding model fine-tuning stage (see the `train_embeddings()` method of a `SetFitTrainer` instance).
- Early stopping is not applied in the classification head training stage (see the `train_classifier()` method of a `SetFitTrainer` instance, which relies on `SetFitHead`'s `fit` method).
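For context, a minimal sketch of the current embedding-stage setup described above, roughly following the linked callbacks guide (model name and step counts are placeholders; argument names may differ slightly across setfit versions, e.g. `eval_strategy` vs. `evaluation_strategy`):

```python
from transformers import EarlyStoppingCallback
from setfit import SetFitModel, Trainer, TrainingArguments

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    eval_strategy="steps",          # evaluate periodically during embedding fine-tuning
    eval_steps=20,
    load_best_model_at_end=True,
    metric_for_best_model="embedding_loss",
    greater_is_better=False,        # lower embedding loss is better
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,    # placeholders: your own Dataset objects
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()  # early stopping only affects the embedding stage here
```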
My question/feature suggestion: would it be desirable to apply early stopping in the classifier training step as well?
I think "yes", because I observed that the hard-coded number of classifier training epochs (I believe the current default in `TrainingArguments` is 16 epochs) can lead to overconfidence in the classifier's predicted probabilities, at least on medium-sized datasets. Of course, I could simply lower the number of classifier training epochs, but what would the appropriate value be? Hence early stopping.
What this would require:
- Either (a) `SetFitHead`'s `fit` would need to accept `eval_x` and `eval_y` to check performance on a dev/validation set, or (b) the logic in `SetFitTrainer`'s `train_classifier` would need to implement epoch-wise training and evaluation.
- `metric_for_best_model` and `greater_is_better` in `TrainingArguments` would need to accept tuples, because different metrics will be used for early stopping during embedding model fine-tuning (e.g., `"embedding_loss"`) and (end-to-end) classifier training (e.g., `"f1"`).
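To make option (b) concrete, here is a minimal, self-contained sketch of the epoch-wise stopping logic (the `EarlyStopper` name and its interface are hypothetical, not part of setfit; it just mirrors the `greater_is_better` / patience semantics of transformers' `EarlyStoppingCallback`):

```python
class EarlyStopper:
    """Stop training when the monitored metric has not improved for
    `patience` consecutive evaluations. Hypothetical sketch, not setfit API."""

    def __init__(self, patience: int = 3, greater_is_better: bool = True):
        self.patience = patience
        self.greater_is_better = greater_is_better
        self.best = None        # best metric value seen so far
        self.bad_evals = 0      # consecutive evaluations without improvement

    def should_stop(self, metric: float) -> bool:
        improved = self.best is None or (
            metric > self.best if self.greater_is_better else metric < self.best
        )
        if improved:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

A hypothetical `train_classifier` loop would then evaluate the chosen metric (e.g. `"f1"`) on `(eval_x, eval_y)` after each epoch and break as soon as `should_stop(...)` returns `True`, optionally restoring the best checkpoint to mirror `load_best_model_at_end`.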
Are there any methodological reasons speaking against my proposal that I'm missing? And is this something you'd consider including in a future setfit release, @tomaarsen, assuming I and maybe others would contribute?