A Python implementation of the CompressedList class from R/Bioconductor for memory-efficient list-like objects.
CompressedList is a memory-efficient container for list-like objects. Instead of storing each list element separately, it concatenates all elements into a single vector-like object and records where each original element begins and ends. This is significantly more memory-efficient than a standard list of lists, especially when there are many list elements.
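The idea can be illustrated with a minimal pure-Python sketch. It does not use this package and is not its implementation; the helper names `compress` and `get_element` are hypothetical and exist only for illustration.

```python
from itertools import accumulate


def compress(elements):
    """Flatten a list of lists into one flat list plus cumulative end positions."""
    flat = [item for element in elements for item in element]
    ends = list(accumulate(len(element) for element in elements))
    return flat, ends


def get_element(flat, ends, i):
    """Recover the i-th original element by slicing the flat list."""
    start = ends[i - 1] if i > 0 else 0
    return flat[start:ends[i]]


flat, ends = compress([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
print(flat)                        # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(ends)                        # [3, 5, 9]
print(get_element(flat, ends, 1))  # [4, 5]
```

Only one flat sequence and one list of integers are kept, however many elements the original list had.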
To get started, install the package from PyPI:
pip install compressed-lists
from compressed_lists import CompressedIntegerList, CompressedStringList
# Create a CompressedIntegerList
int_data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
names = ["A", "B", "C"]
int_list = CompressedIntegerList.from_list(int_data, names)
# Access elements
print(int_list[0]) # [1, 2, 3]
print(int_list["B"]) # [4, 5]
print(int_list[1:3]) # Slice of elements
# Apply a function to each element
squared = int_list.lapply(lambda x: [i**2 for i in x])
print(squared[0]) # [1, 4, 9]
# Convert to a regular Python list
regular_list = int_list.to_list()
# Create a CompressedStringList
char_data = [["apple", "banana"], ["cherry", "date", "elderberry"], ["fig"]]
char_list = CompressedStringList.from_list(char_data)
The Partitioning class handles the information about where each element begins and ends in the concatenated data. It allows for efficient extraction of elements without storing each element separately.
from compressed_lists import Partitioning
# Create partitioning from end positions
ends = [3, 5, 10]
names = ["A", "B", "C"]
part = Partitioning(ends, names)
# Get partition range for an element
start, end = part[1] # Returns (3, 5)
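To see how end positions alone determine each element's range, here is a small standalone sketch. It assumes half-open `(start, end)` ranges, consistent with the `(3, 5)` result above. The `partition_ranges` helper is hypothetical and is not part of the package's API.

```python
def partition_ranges(ends):
    """Derive (start, end) pairs from cumulative end positions.

    Each element starts where the previous one ended; the first starts at 0.
    """
    starts = [0] + list(ends[:-1])
    return list(zip(starts, ends))


ends = [3, 5, 10]
names = ["A", "B", "C"]

ranges = partition_ranges(ends)
by_name = dict(zip(names, ranges))  # optional lookup by element name

print(ranges)        # [(0, 3), (3, 5), (5, 10)]
print(by_name["B"])  # (3, 5)
```

Because the start of each element is implied by the previous end, only the end positions need to be stored.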
Note
Check out the documentation for extending CompressedLists to custom data types.
This project has been set up using BiocSetup and PyScaffold.