How do I disable requests to huggingface.co:443 after training? #590

Open
@AdrianSchneble

I'm currently evaluating SetFit for a proof of concept. Unfortunately, I'm working behind a company firewall, so I have no access to the public internet, only to company-internal URLs.

That's a bit annoying when it comes to downloading models, but I can work around that. More importantly, there seem to be calls to huggingface.co:443 after training is done, which cannot succeed because internet access is blocked.
That wouldn't be a big problem if the timeout were a minute or so, but it seems to be more like 5-10 minutes, which is a lot of time wasted just waiting for the results.

How can I disable these blocking HTTP requests?

My minimal training pipeline looks roughly like this (shortened for readability, especially the data loading):

from setfit import SetFitModel, Trainer, TrainingArguments

model = SetFitModel.from_pretrained(
    "/local/path/local-bge-small-en-v1.5",
    local_files_only=True,
    multi_target_strategy="multi-output",
)
train_dataset, test_dataset = a_bunch_of_loading_and_sampling_code_thats_irrelevant_here()
args = TrainingArguments(
    batch_size=128,
    num_epochs=10,
    report_to=None
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    metric="f1",
    callbacks=None,
    column_mapping={"column": "mapping"},
    metric_kwargs={"average": "samples"}
)
trainer.train()

After all training steps are done, I get the following console logs:

INFO:sentence_transformers.trainer:Saving model checkpoint to checkpoints/checkpoint-258
INFO:sentence_transformers.SentenceTransformer:Save model to checkpoints/checkpoint-258
Request [id]: GET https://huggingface.co/api/models/setfit-test/local-bge-small-en-v1.5 (authenticated: False)
DEBUG:huggingface_hub.utils._http:Request [id]: GET https://huggingface.co/api/models/setfit-test/local-bge-small-en-v1.5 (authenticated: False)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443

Then nothing happens for about 10 minutes, after which I get a "Batches: 100% [tqdm progress bar]" that finishes almost immediately.
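
To narrow down where the waiting time comes from, a small standard-library probe can show whether a connection attempt to huggingface.co:443 fails quickly or hangs until a timeout expires. This is only a diagnostic sketch: the helper name `probe` and the 5-second default timeout are illustrative choices, not part of setfit or huggingface_hub.

```python
import socket
import time
from typing import Tuple


def probe(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, float]:
    """Try to open a TCP connection; return (connected, seconds elapsed)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            connected = True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        connected = False
    return connected, time.monotonic() - start
```

Calling `probe("huggingface.co", 443)` on the affected machine and printing the result would show how long a single attempt takes. If one attempt already runs into the full timeout, the firewall is probably dropping packets silently instead of rejecting them, and the long wait may then be the sum of several such attempts.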

Is there a parameter I can set to disable this call to huggingface.co? Neither "report_to=None" nor "callbacks=None" seems to do the trick.
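
One option that may be worth trying is the offline mode that huggingface_hub and transformers expose through the environment variables `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE`. The sketch below assumes the variables are set before any of these libraries are imported, since they are read at import time. Whether this suppresses the specific request shown in the logs above has not been verified here, so treat it as something to test, not as a confirmed fix.

```python
import os

# Assumption: this runs at the very top of the training script, before
# setfit, sentence_transformers, transformers or huggingface_hub are imported.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: make no HTTP calls to the Hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use local files only

# The library imports and the training pipeline would follow from here, e.g.
# from setfit import SetFitModel, Trainer, TrainingArguments
```

The same effect can be achieved by exporting both variables in the shell that launches the script, which avoids any dependence on import order inside the code.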
