Releases: git-for-windows/git-snapshots
Fri, 11 Oct 2024 13:58:53 +0200
path-walk: improve path-walk speed with many tags (#5205)

In the presence of many tags, the use of oid_array_lookup() can become extremely slow. We should rely upon the SEEN bit instead. This affects the tag-peeling walk as well as the switch statement for adding the peeled object to the correct oid_array.

----

@derrickstolee found this while testing the 2.47.0.vfs.0.0 pre-release against a repo with many annotated tags. This is a backport of https://github.com/microsoft/git/pull/695.
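To illustrate the trade-off, here is a minimal Python sketch contrasting a per-insert search with a per-object flag test. It does not model how `oid_array_lookup()` is implemented in Git. The `Obj` class, the `SEEN` constant, and both helper functions are hypothetical names used only for this illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass

SEEN = 1 << 0  # hypothetical flag bit, standing in for Git's SEEN object flag


@dataclass
class Obj:
    oid: str
    flags: int = 0


def add_with_lookup(sorted_oids, obj):
    """Search the collected list before every insertion (cost grows with its size)."""
    i = bisect_left(sorted_oids, obj.oid)
    if i < len(sorted_oids) and sorted_oids[i] == obj.oid:
        return False  # already collected
    sorted_oids.insert(i, obj.oid)
    return True


def add_with_seen_bit(oids, obj):
    """Test a flag on the object itself; no search of the list is needed."""
    if obj.flags & SEEN:
        return False  # already collected
    obj.flags |= SEEN
    oids.append(obj.oid)
    return True
```

Both helpers reject duplicates, but the flag-based variant does constant work per object no matter how many tags have already been peeled.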
Wed, 9 Oct 2024 12:39:24 +0200
builtin/gc: fix crash when running `git maintenance start` (#5198)

> This patch was sent upstream by Patrick. I'm contributing it to Git for Windows quickly to make sure it gets into microsoft/git, but also in advance of any potential 2.47.1.

It was reported on the mailing list that running `git maintenance start` immediately segfaults starting with b6c3f8e12c (builtin/maintenance: fix leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is trivial to reproduce up to a point where one is scratching their head why we didn't catch this regression in our test suite.

The root cause of this error is `get_schedule_cmd()`, which does not populate the `out` parameter in all cases anymore starting with the mentioned commit. Callers do assume it to always be populated though and will e.g. call `strvec_split()` on the returned value, which will of course segfault when the variable is uninitialized.

So why didn't we catch this trivial regression? The reason is that our tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable via "t/test-lib.sh", which allows us to override the scheduler command with a custom one so that we don't accidentally modify the developer's system. But the faulty code where we don't set the `out` parameter will only get hit in case that environment variable is _not_ set, which is never the case when executing our tests.

Fix the regression by again unconditionally allocating the value in the `out` parameter, if provided. Add a test that unsets the environment variable to catch future regressions in this area.
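The shape of this regression can be sketched in Python. This is a loose analogy, not the C code from `builtin/gc.c`: a dictionary stands in for the `out` parameter, a `KeyError` stands in for the segfault, and the override value is treated as a plain command string, which is a simplifying assumption. All function names below are hypothetical.

```python
ENV_VAR = "GIT_TEST_MAINT_SCHEDULER"  # variable name taken from the commit message


def get_schedule_cmd_buggy(cmd, env):
    """Regressed shape: 'out' is only filled in when the test override is set."""
    out = {}
    override = env.get(ENV_VAR)
    if override is not None:
        out["cmd"] = override
    return out  # "cmd" is missing when the variable is unset


def get_schedule_cmd_fixed(cmd, env):
    """Fixed shape: 'out' is populated unconditionally, then optionally overridden."""
    out = {"cmd": cmd}
    override = env.get(ENV_VAR)
    if override is not None:
        out["cmd"] = override
    return out


def split_cmd(out):
    """Caller that assumes 'out' was always populated."""
    return out["cmd"].split()
```

With the variable set, as in the test suite, both variants behave identically. Only a call with the variable unset exercises the broken path, which is why the fix adds a test that unsets it.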
Tue, 8 Oct 2024 09:13:06 +0200
fixup! path-walk API: avoid adding a root tree more than once Ooops. Must not risk a segmentation fault in a partial clone missing trees... Signed-off-by: Johannes Schindelin
Tue, 8 Oct 2024 08:15:03 +0200
path-walk API: avoid adding a root tree more than once (#5195) When adding tree objects, we are very careful to avoid adding the same tree object more than once. There was one small gap in that logic, though: when adding a root tree object. Two refs can easily share the same root tree object, and we should still not add it more than once.
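A minimal Python sketch of the gap described above, assuming a hypothetical mapping from ref names to root tree IDs (the function name and data layout are illustrative and do not come from the path-walk code):

```python
def collect_root_trees(ref_to_root_tree):
    """Return each root tree ID once, even when several refs share it."""
    seen = set()
    roots = []
    for ref, tree_oid in ref_to_root_tree.items():
        if tree_oid in seen:
            continue  # another ref already contributed this root tree
        seen.add(tree_oid)
        roots.append(tree_oid)
    return roots
```

For example, a branch and a tag that point at the same commit share a root tree, and the sketch adds that tree only once.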
Mon, 7 Oct 2024 18:42:24 +0200
Fix `git log --graph -u` hangs (#5193) This fixes https://github.com/git-for-windows/git/issues/5185 by backporting https://github.com/gitgitgadget/git/pull/1806 (which, sadly, seems not to have made it into Git v2.47.0).
Fri, 4 Oct 2024 14:22:55 +0200
t0610: skip concurrent write test case on Windows We tried quite a few things, but this is a failure introduced at the last -rc before v2.47.0 _and_ it only documents existing behavior as far as Windows is concerned (concurrent writes are a problem there with reftables). So let's punt and simply disable this test for now, to take the pressure off of v2.47.0. Signed-off-by: Johannes Schindelin
Thu, 26 Sep 2024 23:41:44 +0200
Merge 'readme' into HEAD Add a README.md for GitHub goodness. Signed-off-by: Johannes Schindelin
Thu, 26 Sep 2024 20:32:11 +0200
Add experimental 'git survey' builtin (#5174)

This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks. The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft/git#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides. The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path.

For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
Path                                     | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
whats-cooking.txt                        |  1373 |  11637459 |      37226854
t/helper/test-gvfs-protocol              |     2 |   6847105 |      17233072
git-rebase--helper                       |     1 |   6027849 |      15269664
compat/mingw.c                           |  6111 |   5194453 |     463466970
t/helper/test-parse-options              |     1 |   3420385 |       8807968
t/helper/test-pkt-line                   |     1 |   3408661 |       8778960
t/helper/test-dump-untracked-cache       |     1 |   3408645 |       8780816
t/helper/test-dump-fsmonitor             |     1 |   3406639 |       8776656
po/vi.po                                 |   104 |   1376337 |      51441603
po/de.po                                 |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171.
With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project. Unfortunately, this will mean that in the next `microsoft/git` rebase, @jeffhostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data.
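The per-path aggregation behind a "top files by disk size" report can be sketched as follows. This is not the `git survey` implementation: the `(path, disk_size, inflated_size)` record layout and the function name are assumptions made for illustration, with one record per object version found at a path.

```python
from collections import defaultdict


def top_paths_by_disk_size(records, limit=10):
    """Sum count, disk size and inflated size per path; return the largest paths."""
    totals = defaultdict(lambda: [0, 0, 0])  # path -> [count, disk, inflated]
    for path, disk_size, inflated_size in records:
        entry = totals[path]
        entry[0] += 1
        entry[1] += disk_size
        entry[2] += inflated_size
    rows = [(path, c, d, i) for path, (c, d, i) in totals.items()]
    # Largest on-disk total first; ties broken by path name for stable output.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows[:limit]
```

Each returned row has the same columns as the report: path, count of versions, total disk size, and total inflated size.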
Thu, 26 Sep 2024 13:10:40 +0200
Introduce 'git backfill' to get missing blobs in a partial clone (#5172)

This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions.

These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will fetch only the objects that are still missing.

When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads.

Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
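The batching and restart behavior can be sketched in Python under some stated assumptions: `blobs_by_path`, `have`, and `fetch` are hypothetical stand-ins for the path walk results, the local object store, and a network request, and flushing at a path boundary once the batch reaches `batch_size` is a guess at a reasonable policy, not a description of the actual command.

```python
def backfill(blobs_by_path, have, fetch, batch_size=3):
    """Request missing blobs path by path; return the number of requests made."""
    pending = []
    requests = 0
    for path, oids in blobs_by_path.items():
        for oid in oids:
            # Skip blobs that are already present or already queued.
            if oid not in have and oid not in pending:
                pending.append(oid)
        # Flush only at a path boundary so versions of one file travel together.
        if len(pending) >= batch_size:
            fetch(pending)
            have.update(pending)
            requests += 1
            pending = []
    if pending:
        fetch(pending)
        have.update(pending)
        requests += 1
    return requests
```

Because every run filters against `have`, rerunning after an interruption requests only what is still missing, and a run on a complete store makes no requests at all.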
Wed, 25 Sep 2024 16:48:41 -0400
Add path walk API and its use in 'git pack-objects' (#5171)

This is a follow-up to #5157 as well as motivated by the RFC in gitgitgadget/git#1786.

We have ways of walking all objects, but they focus on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons. This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
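The batch order described above (tags, commits, root trees, then a depth-first search by path) can be sketched with a toy in-memory repository. The data layout and the `path_walk` generator are hypothetical and are not the C API. As a simplification, one global `seen` set means an object reachable at two paths is reported only at the first.

```python
def path_walk(tags, commits, trees):
    """Yield (type, path, oids) batches.

    tags:    list of tag IDs
    commits: dict of commit ID -> root tree ID
    trees:   dict of tree ID -> list of (name, type, oid) entries
    """
    if tags:
        yield ("tag", "", list(tags))
    yield ("commit", "", list(commits))

    seen = set()
    roots = []
    for root in commits.values():
        if root not in seen:  # commits may share a root tree
            seen.add(root)
            roots.append(root)

    stack = [("tree", "", roots)]
    while stack:
        kind, path, oids = stack.pop()
        yield (kind, path, oids)
        if kind != "tree":
            continue
        # Group the unseen children of every tree in this batch by path.
        children = {}
        for tree_oid in oids:
            for name, child_kind, child_oid in trees[tree_oid]:
                if child_oid in seen:
                    continue
                seen.add(child_oid)
                suffix = "/" if child_kind == "tree" else ""
                key = (child_kind, path + name + suffix)
                children.setdefault(key, []).append(child_oid)
        # Push in reverse sorted order so batches pop in a deterministic order.
        for key, child_oids in sorted(children.items(), reverse=True):
            stack.append((key[0], key[1], child_oids))
```

Each yielded batch holds objects of one type at one path, which is the grouping a consumer such as a delta search would want to receive together.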