Update dependency huggingface_hub to v0.33.0 #60
Conversation
Pull Request Test Coverage Report for Build 15505390404 (Details)
💛 - Coveralls
This PR contains the following updates:
huggingface_hub: ==0.23.4 -> ==0.33.0
Warning
Some dependencies could not be looked up. Check the warning logs for more information.
Release Notes
huggingface/huggingface_hub (huggingface_hub)
v0.33.0: Welcoming Featherless.AI and Groq as Inference Providers! (Compare Source)
⚡ New provider: Featherless.AI
Featherless AI is a serverless AI inference provider with unique model loading and GPU orchestration abilities that make an exceptionally large catalog of models available to users. Providers often offer either low-cost access to a limited set of models, or an unlimited range of models with users managing servers and the associated costs of operation. Featherless provides the best of both worlds, offering unmatched model range and variety with serverless pricing. Find the full list of supported models on the models page.
⚡ New provider: Groq
At the heart of Groq's technology is the Language Processing Unit (LPU™), a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component, such as Large Language Models (LLMs). LPUs are designed to overcome the limitations of GPUs for inference, offering significantly lower latency and higher throughput. This makes them ideal for real-time AI applications.
Groq offers fast AI inference for openly available models. It provides an API that allows developers to easily integrate these models into their applications, with an on-demand, pay-as-you-go model for accessing a wide range of openly available LLMs.
🤖 MCP and Tiny-agents
It is now possible to run tiny-agents using a local server, e.g. llama.cpp. 100% local agents are right around the corner!
Fixing some DX issues in the tiny-agents CLI:
tiny-agents CLI exit issues by @Wauplin in #3125
📚 Documentation
New translation from the Hindi-speaking community, for the community!
🛠️ Small fixes and maintenance
😌 QoL improvements
🐛 Bug and typo fixes
🏗️ internal
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.32.6: [Upload large folder] fix for wrongly saved upload_mode/remote_oid (Compare Source)
Full Changelog: huggingface/huggingface_hub@v0.32.5...v0.32.6
v0.32.5: [Tiny-Agents] inject environment variables in headers (Compare Source)
Full Changelog: huggingface/huggingface_hub@v0.32.4...v0.32.5
v0.32.4: Bug fixes in tiny-agents, and fix input handling for question-answering task (Compare Source)
Full Changelog: huggingface/huggingface_hub@v0.32.3...v0.32.4
This release introduces bug fixes to tiny-agents and InferenceClient.question_answering:
asyncio.wait() does not accept bare coroutines #3135 by @hanouticelina

v0.32.3: Handle env variables in tiny-agents, better CLI exit and handling of MCP tool calls arguments (Compare Source)
Full Changelog: huggingface/huggingface_hub@v0.32.2...v0.32.3
This release introduces some improvements and bug fixes to tiny-agents:
tiny-agents CLI exit issues #3125

v0.32.2: Add endpoint support in Tiny-Agent + fix snapshot_download on large repos (Compare Source)
Full Changelog: huggingface/huggingface_hub@v0.32.1...v0.32.2
v0.32.1: hot-fix: Fix tiny agents on Windows (Compare Source)
Patch release to fix #3116
Full Changelog: huggingface/huggingface_hub@v0.32.0...v0.32.1
v0.32.0: MCP Client, Tiny Agents CLI and more! (Compare Source)
🤖 Powering LLMs with Tools: MCP Client & Tiny Agents CLI
✨ The huggingface_hub library now includes an MCP Client, designed to empower Large Language Models (LLMs) with the ability to interact with external Tools via the Model Context Protocol (MCP). This client extends the InferenceClient and provides a seamless way to connect LLMs to both local and remote tool servers!

In the following example, we use the Qwen/Qwen2.5-72B-Instruct model via the Nebius inference provider. We then add a remote MCP server, in this case an SSE server that makes the Flux image generation tool available to the LLM:
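A minimal sketch of that flow, assuming the MCPClient constructor takes model/provider/api_key and exposes async add_mcp_server() and process_single_turn_with_tools() methods (check the MCP client docs for the exact API); the SSE server URL is illustrative:

```python
import asyncio
import os

from huggingface_hub import MCPClient


async def main() -> None:
    client = MCPClient(
        model="Qwen/Qwen2.5-72B-Instruct",
        provider="nebius",
        api_key=os.environ["HF_TOKEN"],
    )
    # Register a remote SSE MCP server exposing a Flux image-generation tool
    # (illustrative URL).
    await client.add_mcp_server(
        type="sse",
        url="https://example-flux-server.hf.space/gradio_api/mcp/sse",
    )

    messages = [{"role": "user", "content": "Generate a picture of a cat on the moon"}]
    # Stream the answer; any tool calls are routed through the MCP server.
    async for chunk in client.process_single_turn_with_tools(messages):
        print(chunk)


asyncio.run(main())
```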
For even simpler development, we now also offer a higher-level Agent class. These 'Tiny Agents' simplify creating conversational Agents by managing the chat loop and state, essentially acting as a user-friendly wrapper around MCPClient. It's designed to be a simple while loop built right on top of an MCPClient. You can run these Agents directly from the command line:
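A hedged sketch of what running such an Agent might look like; the Agent import location, constructor arguments, servers config schema, and the tiny-agents CLI invocation in the comment are assumptions based on the description above:

```python
import asyncio

from huggingface_hub import Agent

# CLI equivalent (assumed syntax): tiny-agents run ./my-agent/agent.json

agent = Agent(
    model="Qwen/Qwen2.5-72B-Instruct",
    provider="nebius",
    servers=[
        # Illustrative MCP server entry; the real config schema may differ.
        {"type": "sse", "config": {"url": "https://example-mcp-server.hf.space/sse"}},
    ],
)


async def main() -> None:
    # The Agent manages the chat loop and state around the underlying MCPClient.
    async for chunk in agent.run("Which tools do you have access to?"):
        print(chunk)


asyncio.run(main())
```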
You can run these Agents using your own local configs or load them directly from the Hugging Face dataset tiny-agents.
This is an early version of the MCPClient, and community contributions are welcome 🤗
InferenceClient is also a MCPClient by @julien-c in #2986
⚡ Inference Providers
Thanks to @diadorer, feature extraction (embeddings) inference is now supported with the Nebius provider!
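A hedged sketch of embeddings through Nebius with InferenceClient.feature_extraction; the model id is illustrative, not taken from the release notes:

```python
from huggingface_hub import InferenceClient

# Reads HF_TOKEN from the environment / local login by default.
client = InferenceClient(provider="nebius")
embedding = client.feature_extraction(
    "Inference Providers now support embeddings.",
    model="BAAI/bge-multilingual-gemma2",  # illustrative embeddings model
)
print(embedding.shape)
```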
We’re thrilled to introduce Nscale as an official inference provider! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models 🔥
We also fixed compatibility issues with structured outputs across providers by ensuring the InferenceClient follows the OpenAI API structured-output specification.
💾 Serialization
We've introduced a new @strict decorator for dataclasses, providing robust validation capabilities to ensure data integrity both at initialization and during assignment; a basic example is sketched below. This feature also includes support for custom validators, class-wise validation logic, handling of additional keyword arguments, and automatic validation based on type hints. Documentation can be found here.
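A hedged sketch of the decorator, assuming it is imported from huggingface_hub.dataclasses as in the library documentation:

```python
from dataclasses import dataclass

from huggingface_hub.dataclasses import strict  # assumed import path


@strict
@dataclass
class Config:
    model_type: str
    hidden_size: int = 1024


config = Config(model_type="bert")  # validated at initialization
config.hidden_size = 2048           # validated on assignment as well
# Config(model_type="bert", hidden_size="big")  # would raise a validation error
```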
@strict decorator for dataclass validation by @Wauplin in #2895

This release also brings support for DTensor in _get_unique_id/get_torch_storage_size helpers, allowing transformers to seamlessly use save_pretrained with DTensor.
✨ HF API
When creating an Endpoint, the default for scale_to_zero_timeout is now None, meaning endpoints will no longer scale to zero by default unless explicitly configured (see the sketch below).

We've also introduced experimental helpers to manage OAuth within FastAPI applications, bringing functionality previously used in Gradio to a wider range of frameworks for easier integration.
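A hedged sketch of setting the timeout explicitly when creating an Endpoint; the hardware values are illustrative and the full set of required arguments should be checked in the HfApi.create_inference_endpoint docs:

```python
from huggingface_hub import HfApi

api = HfApi()
endpoint = api.create_inference_endpoint(
    "my-endpoint",                      # hypothetical endpoint name
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x2",
    instance_type="intel-icl",
    scale_to_zero_timeout=30,  # explicit timeout; None (the new default) disables scale-to-zero
)
```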
📚 Documentation
We now have much more detailed documentation for Inference! This includes more detailed explanations and examples to clarify that the InferenceClient can also be effectively used with local endpoints (llama.cpp, vllm, MLX, etc.).
🛠️ Small fixes and maintenance
😌 QoL improvements
api.endpoint to arguments for _get_upload_mode by @matthewgrossman in #3077
🐛 Bug and typo fixes
read() by @lhoestq in #3080
🏗️ internal
hf-xet optional by @hanouticelina in #3079
Community contributions
huggingface-cli repo create command by @Wauplin in #3094
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.31.4: strict dataclasses, support DTensor saving & some bug fixes (Compare Source)
This release includes some new features and bug fixes:
strict decorators for runtime dataclass validation with custom and type-based checks, by @Wauplin in https://github.com/huggingface/huggingface_hub/pull/2895.
DTensor support to _get_unique_id/get_torch_storage_size helpers, enabling transformers to use save_pretrained with DTensor, by @S1ro1 in https://github.com/huggingface/huggingface_hub/pull/3042.

Full Changelog: huggingface/huggingface_hub@v0.31.2...v0.31.4
v0.31.3 (Compare Source)
v0.31.2: Hot-fix: make hf-xet optional again and bump the min version of the package (Compare Source)
Patch release to make hf-xet optional. More context in #3079 and #3078.
Full Changelog: huggingface/huggingface_hub@v0.31.1...v0.31.2
v0.31.1 (Compare Source)
v0.31.0: LoRAs with Inference Providers, auto mode for provider selection, embeddings models and more (Compare Source)
🧑🎨 Introducing LoRAs with fal.ai and Replicate providers
We're introducing blazingly fast LoRA inference powered by fal.ai and Replicate through Hugging Face Inference Providers! You can use any compatible LoRA available on the Hugging Face Hub and get generations at lightning fast speed ⚡
⚙️ auto mode for provider selection

You can now automatically select a provider for a model using auto mode — it will pick the first available provider based on your preferred order set in https://hf.co/settings/inference-providers.
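A hedged sketch of auto mode; the model id is illustrative:

```python
from huggingface_hub import InferenceClient

# "auto" picks the first available provider from your preference order
# (and is also the new default, see below).
client = InferenceClient(provider="auto")
completion = client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    model="deepseek-ai/DeepSeek-V3-0324",  # illustrative model id
)
print(completion.choices[0].message.content)
```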
auto is also now the default value for the provider argument. Previously, the default was hf-inference, so this change may be a breaking one if you're not specifying the provider name when initializing InferenceClient or AsyncInferenceClient.
provider="auto" by @julien-c in #3011
🧠 Embeddings support with Sambanova (feature-extraction)
We added support for feature extraction (embeddings) inference with the Sambanova provider.
⚡ Other Inference features
The HF Inference API provider is now fully integrated as an Inference Provider, which means it only supports a predefined list of deployed models, selected based on popularity.
Cold-starting arbitrary models from the Hub is no longer supported — if a model isn't already deployed, it won’t be available via HF Inference API.
Miscellaneous improvements and some bug fixes:
✅ Of course, all of those inference changes are available in the AsyncInferenceClient async equivalent 🤗
🚀 Xet
Thanks to @bpronan's PR, Xet now supports uploading byte arrays:
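A hedged sketch; upload_file accepts raw bytes for path_or_fileobj, and with hf_xet installed the transfer goes through the Xet backend. The repo id is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj=b"raw bytes produced in memory",
    path_in_repo="data/blob.bin",
    repo_id="my-username/my-dataset",  # placeholder repo
    repo_type="dataset",
)
```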
Additionally, we've added documentation for environment variables used by hf-xet to optimize file download/upload performance — including options for caching (HF_XET_CHUNK_CACHE_SIZE_BYTES), concurrency (HF_XET_NUM_CONCURRENT_RANGE_GETS), high-performance mode (HF_XET_HIGH_PERFORMANCE), and sequential writes (HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY).

Miscellaneous improvements:
✨ HF API
We added HTTP download support for files larger than 50GB — enabling more reliable handling of large file downloads.
We also added dynamic batching to upload_large_folder, replacing the fixed 50-files-per-commit rule with an adaptive strategy that adjusts based on commit success and duration — improving performance and reducing the risk of hitting the commit rate limit on large repositories (see the sketch below).

We added support for new arguments when creating or updating Hugging Face Inference Endpoints.
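A hedged sketch of the call whose batching behavior changed; repo id and folder path are placeholders:

```python
from huggingface_hub import HfApi

api = HfApi()
# Commits are now batched adaptively instead of a fixed 50 files per commit.
api.upload_large_folder(
    repo_id="my-username/my-large-model",  # placeholder repo
    repo_type="model",
    folder_path="./checkpoints",
)
```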
💔 Breaking changes
The default value of the provider argument in InferenceClient and AsyncInferenceClient is now "auto" instead of "hf-inference" (HF Inference API). This means provider selection will now follow your preferred order set in your inference provider settings. If your code relied on the previous default ("hf-inference"), you may need to update it explicitly to avoid unexpected behavior.

The URL for the feature-extraction and sentence-similarity tasks has changed from https://router.huggingface.co/hf-inference/pipeline/{task}/{model} to https://router.huggingface.co/hf-inference/models/{model}/pipeline/{task}.
🛠️ Small fixes and maintenance
😌 QoL improvements
🐛 Bug and typo fixes
🏗️ internal
hf_xet min version to 1.0.0 + make it required dep on 64 bits by @hanouticelina in #2971
Community contributions
The following contributors have made significant changes to the library over the last release:
v0.30.2: Fix text-generation task in InferenceClient (Compare Source)
Fixing some InferenceClient-related bugs:
Full Changelog: huggingface/huggingface_hub@v0.30.1...v0.30.2
v0.30.1: fix 'sentence-transformers/all-MiniLM-L6-v2' doesn't support task 'feature-extraction' (Compare Source)
Patch release to fix https://github.com/huggingface/huggingface_hub/issues/2967.
Full Changelog: huggingface/huggingface_hub@v0.30.0...v0.30.1
v0.30.0: Xet is here! (+ many cool Inference-related things!) (Compare Source)
🚀 Ready. Xet. Go!
This might just be our biggest update in the past two years! Xet is a groundbreaking new protocol for storing large objects in Git repositories, designed to replace Git LFS. Unlike LFS, which deduplicates files, Xet operates at the chunk level—making it a game-changer for AI builders collaborating on massive models and datasets. Our Python integration is powered by xet-core, a Rust-based package that handles all the low-level details.
You can start using Xet today by installing the optional dependency:
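This is presumably the hf_xet extra, e.g. pip install -U "huggingface_hub[hf_xet]" (or pip install hf_xet directly); check the Xet docs to confirm the exact command.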
With that, you can seamlessly download files from Xet-enabled repositories! And don’t worry—everything remains fully backward-compatible if you’re not ready to upgrade yet.
Blog post: Xet on the Hub
Docs: Storage backends → Xet
This is the result of collaborative work by @bpronan, @hanouticelina, @rajatarya, @jsulz, @assafvayner, @Wauplin, + many others on the infra/Hub side!
xetEnabled as an expand property by @hanouticelina in #2907
⚡ Enhanced InferenceClient
The InferenceClient has received significant updates and improvements in this release, making it more robust and easier to work with.

We're thrilled to introduce Cerebras and Cohere as official inference providers! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models.
Novita is now our 3rd provider to support text-to-video task after Fal.ai and Replicate:
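A hedged sketch of text-to-video through Novita; the model id is illustrative:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="novita")
video = client.text_to_video(
    "A young man walking on the street",
    model="Wan-AI/Wan2.1-T2V-14B",  # illustrative text-to-video model
)
with open("video.mp4", "wb") as f:
    f.write(video)  # raw video bytes
```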
It is now possible to centralize billing on your organization rather than on individual accounts! This helps companies manage their budget and set limits at a team level. The organization must be subscribed to Enterprise Hub.
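A hedged sketch of organization billing, assuming the bill_to argument used in the Inference Providers docs; provider, model and organization name are illustrative:

```python
from huggingface_hub import InferenceClient

# Usage is billed to the organization instead of the personal account.
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a futuristic city",
    model="black-forest-labs/FLUX.1-dev",
)
image.save("lion.png")
```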
Handling long-running inference tasks just got easier! To prevent request timeouts, we’ve introduced asynchronous calls for text-to-video inference. We are expecting more providers to leverage the same structure soon, ensuring better robustness and developer-experience.
Miscellaneous improvements:
InferenceClient docstring to reflect that token=False is no longer accepted by @abidlabs in #2853
provider parameter by @hanouticelina in #2949
✨ New Features and Improvements
This release also includes several other notable features and improvements.
It's now possible to pass a path with a wildcard to the upload command instead of passing the --include=... option:
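For example (hedged; check huggingface-cli upload --help for the exact syntax): huggingface-cli upload my-username/my-model "*.safetensors", where the repo id is a placeholder.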
Deploying an Inference Endpoint from the Model Catalog just got 100x easier! Simply select which model to deploy and we handle the rest to guarantee the best hardware and settings for your dedicated endpoints.
The ModelHubMixin got two small updates: config ... until now.

You can now sort by name, size, last updated and last used when using the delete-cache command:
--sort arg to delete-cache to sort by size by @AlpinDale in #2815

Since end 2024, it has been possible to manage the LFS files stored in a repo from the UI (see docs). This release makes it possible to do the same programmatically. The goal is to enable users to free up some storage space in their private repositories.
💔 Breaking Changes
labels has been removed from the InferenceClient.zero_shot_classification and InferenceClient.zero_shot_image_classification tasks in favor of candidate_labels. There has been a proper deprecation warning for that.
🛠️ Small Fixes and Maintenance
🐛 Bug and Typo Fixes
Configuration
📅 Schedule: Branch creation - "after 5am on saturday" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
To execute skipped test pipelines write comment /ok-to-test.

This PR has been generated by MintMaker (powered by Renovate Bot).