[v1] Move block management logic from KVCacheManager to SpecializedManager #17474
+279
−152
Should be merged after #17398
To prepare for the hybrid allocator, this PR moves the logic that needs to run for each specialized manager from KVCacheManager to SpecializedManager. As the SpecializedManager no longer contains only the customized logic for different attention types, I renamed it to SingleTypeKVCacheManager.
Prefer to rename specialized_manager.py in a separate PR.
Didn't move the hashing logic (e.g. req_to_block_hashes) to SpecializedManager, as the HybridAllocator will do hashing at the KVCacheManager level so that different managers with the same block_size can use the same block_hash.
Split from #16101
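The division of responsibilities described above can be illustrated with a minimal sketch. This is not the code in this PR: the class names `KVCacheManagerSketch` and `SingleTypeManagerSketch`, the helper `hash_blocks`, and the toy block-id scheme are all hypothetical, and the only details taken from the description are that hashing stays at the top level (keyed per `block_size`) while per-type block bookkeeping lives in each single-type manager.

```python
def hash_blocks(token_ids, block_size):
    """Chain-hash every full block of tokens (hypothetical helper)."""
    hashes = []
    parent = None
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = tuple(token_ids[start:start + block_size])
        parent = hash((parent, block))  # each hash depends on the prefix
        hashes.append(parent)
    return hashes


class SingleTypeManagerSketch:
    """Per-attention-type block bookkeeping (illustrative only)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.cached = {}          # block_hash -> block id
        self.req_to_blocks = {}   # request id -> list of block ids
        self._next_id = 0

    def allocate(self, request_id, block_hashes):
        blocks = self.req_to_blocks.setdefault(request_id, [])
        # Only handle blocks this request does not hold yet.
        for h in block_hashes[len(blocks):]:
            if h not in self.cached:
                self.cached[h] = self._next_id
                self._next_id += 1
            blocks.append(self.cached[h])
        return blocks


class KVCacheManagerSketch:
    """Top-level manager: owns hashing, delegates block management."""

    def __init__(self, managers):
        self.managers = managers
        # Assumption: hashes are keyed by (request id, block_size), so
        # managers with the same block_size share one list of hashes.
        self.req_to_block_hashes = {}

    def allocate(self, request_id, token_ids):
        result = []
        for manager in self.managers:
            key = (request_id, manager.block_size)
            if key not in self.req_to_block_hashes:
                self.req_to_block_hashes[key] = hash_blocks(
                    token_ids, manager.block_size)
            result.append(
                manager.allocate(request_id, self.req_to_block_hashes[key]))
        return result


kv = KVCacheManagerSketch(
    [SingleTypeManagerSketch(4), SingleTypeManagerSketch(4)])
blocks = kv.allocate("r1", list(range(10)))
```

In this sketch the two managers have the same `block_size`, so the top-level manager computes the hashes once and passes the same list to both; moving the hashing into each manager would compute them twice.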