Skip to content

zip: Entry.name(); Entry.extract_to() accepting user-provided writers #23991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 85 additions & 15 deletions lib/std/zip.zig
Original file line number Diff line number Diff line change
Expand Up @@ -435,12 +435,14 @@ pub fn Iterator(comptime SeekableStream: type) type {
uncompressed_size: u64,
file_offset: u64,

pub fn extract(
/// Extract this entry. `extractor` must implement `openFile` and `openDir`.
/// See also `extract`.
pub fn extract_to(
self: Entry,
stream: SeekableStream,
options: ExtractOptions,
filename_buf: []u8,
dest: std.fs.Dir,
extractor: anytype,
) !u32 {
if (filename_buf.len < self.filename_len)
return error.ZipInsufficientBuffer;
Expand Down Expand Up @@ -532,41 +534,68 @@ pub fn Iterator(comptime SeekableStream: type) type {
if (filename[filename.len - 1] == '/') {
if (self.uncompressed_size != 0)
return error.ZipBadDirectorySize;
try dest.makePath(filename[0 .. filename.len - 1]);
var dir = try extractor.openDir(filename[0 .. filename.len - 1]);
dir.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this openDir function really isn't to open a directory, but to create directories to form that path, which is evident by the filesystem implementation of openDir below and the original code that was here before these changes.

There's also not much reason to have a close function; any extractor implementation could just call it before returning from it's openDir (or more accurately makePath) function if it was necessary. The filesystem implementation implements it as a no-op anyways.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this.

return std.hash.Crc32.hash(&.{});
}

const out_file = blk: {
if (std.fs.path.dirname(filename)) |dirname| {
var parent_dir = try dest.makeOpenPath(dirname, .{});
defer parent_dir.close();

const basename = std.fs.path.basename(filename);
break :blk try parent_dir.createFile(basename, .{ .exclusive = true });
}
break :blk try dest.createFile(filename, .{ .exclusive = true });
};
defer out_file.close();
const local_data_file_offset: u64 =
@as(u64, self.file_offset) +
@as(u64, @sizeOf(LocalFileHeader)) +
local_data_header_offset;
try stream.seekTo(local_data_file_offset);
var limited_reader = std.io.limitedReader(stream.context.reader(), self.compressed_size);
var file = try extractor.openFile(filename);
defer file.close();
const crc = try decompress(
self.compression_method,
self.uncompressed_size,
limited_reader.reader(),
out_file.writer(),
file.writer(),
);
if (limited_reader.bytes_left != 0)
return error.ZipDecompressTruncated;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, but I do wonder if this could be factored out to remove the need for this extractor interface altogether, since all that's really needed here is a writer to decompress the data to.

For example, maybe have extract_to return a struct containing the filename/path, crc, and a "file or directory" enum, then this code that actually does the decompression and the openDir code above that creates the directory could be moved into extract, and the extractor interface can be removed.

Copy link
Author

@l1yefeng l1yefeng May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decompression depends on the (generalized concept of) opened file and opening the file depends on the filename. Then decompression may have to be moved from extract_to to its caller, too. That's too much for callers I'm afraid.

return crc;
}

/// Extract this entry to the file system.
pub fn extract(
self: Entry,
stream: SeekableStream,
options: ExtractOptions,
filename_buf: []u8,
dest: std.fs.Dir,
) !u32 {
return self.extract_to(stream, options, filename_buf, FsExtractor{ .dest = dest });
}
};
};
}

const FsExtractor = struct {
dest: std.fs.Dir,

const VirtualDir = struct {
fn close(_: VirtualDir) void {}
};

fn openDir(self: @This(), name: []u8) !VirtualDir {
try self.dest.makePath(name);
return VirtualDir{};
}

fn openFile(self: @This(), name: []u8) !std.fs.File {
if (std.fs.path.dirname(name)) |dirname| {
var parent_dir = try self.dest.makeOpenPath(dirname, .{});
defer parent_dir.close();

const basename = std.fs.path.basename(name);
return try parent_dir.createFile(basename, .{ .exclusive = true });
}
return try self.dest.createFile(name, .{ .exclusive = true });
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function creates a file and ensures it's parent directory exists, and doesn't open a file in the usual sense, it should probably be renamed as such. Of course if the extractor interface is removed (see comment above), this wouldn't be relevant.

};

// returns true if `filename` starts with `root` followed by a forward slash
fn filenameInRoot(filename: []const u8, root: []const u8) bool {
return (filename.len >= root.len + 1) and
Expand Down Expand Up @@ -697,6 +726,47 @@ test "zip verify filenames" {
try testZipError(error.ZipFilenameHasBackslash, .{ .name = "foo\\bar", .content = "", .compression = .store }, .{});
}

test "zip extract file to buffer" {
const test_files = [_]File{
.{ .name = "a.txt", .content = "aaa", .compression = .store },
};

var extracted = std.ArrayList(u8).init(std.testing.allocator);
defer extracted.deinit();
var extractor = struct {
writer: ArrayList.Writer,

const ArrayList = std.ArrayList(u8);
const Extractor = @This();
const VirtualFile = struct {
context: *Extractor,
fn writer(self: @This()) ArrayList.Writer {
return self.context.writer;
}
fn close(_: @This()) void {}
};
fn openDir(_: @This(), _: []u8) !VirtualFile {
unreachable;
}
fn openFile(self: *@This(), _: []u8) !VirtualFile {
return VirtualFile{ .context = self };
}
}{ .writer = extracted.writer() };

var zip_buf: [4096]u8 = undefined;
var store: [test_files.len]FileStore = undefined;
var fbs = try testutil.makeZipWithStore(&zip_buf, &test_files, .{}, &store);
const stream = fbs.seekableStream();

var iter = try Iterator(@TypeOf(stream)).init(stream);
const entry = (try iter.next()).?;
var filename_buf: [8]u8 = undefined;
_ = try entry.extract_to(stream, .{}, &filename_buf, &extractor);

const content = comptime test_files[0].content;
try std.testing.expectEqualSlices(u8, content, extracted.items);
}

test "zip64" {
const test_files = [_]File{
.{ .name = "fram", .content = "fram foo fro fraba", .compression = .store },
Expand Down
Loading