Releases: git-for-windows/git-snapshots

Fri, 11 Oct 2024 13:58:53 +0200

25 Jan 10:27
8597dce
path-walk: improve path-walk speed with many tags (#5205)

In the presence of many tags, the use of oid_array_lookup() can become
extremely slow. We should rely upon the SEEN bit instead.

This affects the tag-peeling walk as well as the switch statement for
adding the peeled object to the correct oid_array.
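
The difference between the two approaches can be sketched in Python. This is only an illustration, not Git's C code: `Obj`, `SEEN`, and both helper functions are hypothetical names, and the sketch assumes (as in Git) that every object id maps to a single in-memory object, so a flag set on it is visible wherever that object shows up again.

```python
import bisect
from dataclasses import dataclass

SEEN = 1 << 0  # hypothetical flag bit standing in for Git's SEEN


@dataclass
class Obj:
    oid: str
    flags: int = 0


def collect_with_lookup(objs):
    """Deduplicate by binary-searching a sorted list of ids on every visit."""
    sorted_oids = []
    out = []
    for o in objs:
        i = bisect.bisect_left(sorted_oids, o.oid)
        if i < len(sorted_oids) and sorted_oids[i] == o.oid:
            continue  # already collected
        sorted_oids.insert(i, o.oid)  # keeping the list sorted costs extra work
        out.append(o.oid)
    return out


def collect_with_seen_bit(objs):
    """Deduplicate with a constant-time flag check on the object itself."""
    out = []
    for o in objs:
        if o.flags & SEEN:
            continue  # already collected
        o.flags |= SEEN
        out.append(o.oid)
    return out


a, b, c = Obj("aa"), Obj("bb"), Obj("cc")
walk = [a, b, a, c, b]  # e.g. several tags peeling to the same objects
by_lookup = collect_with_lookup(walk)
by_flag = collect_with_seen_bit(walk)
```

Both helpers return the same ids in the same order; the flag-based one simply avoids the per-object search, which is where the time goes when there are many tags.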

----

@derrickstolee found this while testing the 2.47.0.vfs.0.0 pre-release
against a repo with many annotated tags.

This is a backport of https://github.com/microsoft/git/pull/695.

Wed, 9 Oct 2024 12:39:24 +0200

25 Jan 10:27
c5d00b2
builtin/gc: fix crash when running `git maintenance start` (#5198)

> This patch was sent upstream by Patrick. I'm contributing it to Git
for Windows quickly to make sure it gets into microsoft/git, but also in
advance of any potential 2.47.1.

It was reported on the mailing list that running `git maintenance start`
immediately segfaults starting with b6c3f8e12c (builtin/maintenance: fix
leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is
trivial to reproduce, to the point that one is left scratching their
head as to why we didn't catch this regression in our test suite.

The root cause of this error is `get_schedule_cmd()`, which does not
populate the `out` parameter in all cases anymore starting with the
mentioned commit. Callers do assume it to always be populated though and
will e.g. call `strvec_split()` on the returned value, which will of
course segfault when the variable is uninitialized.

So why didn't we catch this trivial regression? The reason is that our
tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable
via "t/test-lib.sh", which allows us to override the scheduler command
with a custom one so that we don't accidentally modify the developer's
system. But the faulty code where we don't set the `out` parameter will
only get hit in case that environment variable is _not_ set, which is
never the case when executing our tests.

Fix the regression by again unconditionally allocating the value in the
`out` parameter, if provided. Add a test that unsets the environment
variable to catch future regressions in this area.
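
The shape of the bug, and why the tests missed it, can be modeled in a few lines of Python. This is a simplified analogue and not the actual patch: the C `out` parameter is modeled as a return value, the override is assumed to be a plain command string, and the function names and the `"crontab"` default are hypothetical.

```python
def get_schedule_cmd_buggy(default_cmd, env):
    """Regressed shape: a command is only produced when the override is set."""
    override = env.get("GIT_TEST_MAINT_SCHEDULER")
    if override is not None:
        return override
    return None  # stands in for the `out` value that was left uninitialized


def get_schedule_cmd_fixed(default_cmd, env):
    """Fixed shape: a command is produced in every case."""
    override = env.get("GIT_TEST_MAINT_SCHEDULER")
    return override if override is not None else default_cmd


def split_cmd(cmd):
    """Analogue of the caller splitting the returned string into arguments."""
    return cmd.split()


# The test suite always sets the override, so the buggy path looks healthy.
test_env = {"GIT_TEST_MAINT_SCHEDULER": "fake-scheduler --dry-run"}
in_tests = split_cmd(get_schedule_cmd_buggy("crontab", test_env))

# Outside the test suite the variable is unset and the caller gets nothing usable.
outside_tests = get_schedule_cmd_buggy("crontab", {})
after_fix = split_cmd(get_schedule_cmd_fixed("crontab", {}))
```

A regression test therefore has to exercise the branch with the variable unset, which is what the added test does.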

Tue, 8 Oct 2024 09:13:06 +0200

25 Jan 10:27
fixup! path-walk API: avoid adding a root tree more than once

Ooops. Must not risk a segmentation fault in a partial clone missing
trees...

Signed-off-by: Johannes Schindelin <[email protected]>

Tue, 8 Oct 2024 08:15:03 +0200

25 Jan 10:26
d48cc90
path-walk API: avoid adding a root tree more than once (#5195)

When adding tree objects, we are very careful to avoid adding the same
tree object more than once. There was one small gap in that logic,
though: when adding a root tree object. Two refs can easily share the
same root tree object, and we should still not add it more than once.
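
As a minimal illustration of the gap (in Python rather than C, with a hypothetical `add_root_trees` helper and made-up ref names and ids), two refs that point at the same root tree should contribute it only once:

```python
def add_root_trees(ref_to_root_tree):
    """Collect each distinct root tree once, in first-seen order."""
    seen = set()
    roots = []
    for ref, tree_oid in ref_to_root_tree.items():
        if tree_oid in seen:
            continue  # another ref already contributed this root tree
        seen.add(tree_oid)
        roots.append(tree_oid)
    return roots


refs = {
    "refs/heads/main": "t1",
    "refs/tags/v1.0": "t1",  # same root tree as main
    "refs/heads/topic": "t2",
}
roots = add_root_trees(refs)
```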

Mon, 7 Oct 2024 18:42:24 +0200

25 Jan 10:26
46171c9
Fix `git log --graph -u` hangs (#5193)

This fixes https://github.com/git-for-windows/git/issues/5185 by
backporting https://github.com/gitgitgadget/git/pull/1806 (which, sadly,
seems not to have made it into Git v2.47.0).

Fri, 4 Oct 2024 14:22:55 +0200

25 Jan 10:26
t0610: skip concurrent write test case on Windows

We tried quite a few things, but this is a failure introduced at the
last -rc before v2.47.0 _and_ it only documents existing behavior as
far as Windows is concerned (concurrent writes are a problem there with
reftables).

So let's punt and simply disable this test for now, to take the pressure
off of v2.47.0.

Signed-off-by: Johannes Schindelin <[email protected]>

Thu, 26 Sep 2024 23:41:44 +0200

25 Jan 10:26
Merge 'readme' into HEAD

Add a README.md for GitHub goodness.

Signed-off-by: Johannes Schindelin <[email protected]>

Thu, 26 Sep 2024 20:32:11 +0200

25 Jan 10:26
68f029a
Add experimental 'git survey' builtin (#5174)

This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis in the hands of
more folks.

The inspiration for this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it has limits to how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft/git#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.

The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.

For example, here are the top ten paths contributing to my local Git
repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
                                    Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
                       whats-cooking.txt |  1373 |  11637459 |      37226854
             t/helper/test-gvfs-protocol |     2 |   6847105 |      17233072
                      git-rebase--helper |     1 |   6027849 |      15269664
                          compat/mingw.c |  6111 |   5194453 |     463466970
             t/helper/test-parse-options |     1 |   3420385 |       8807968
                  t/helper/test-pkt-line |     1 |   3408661 |       8778960
      t/helper/test-dump-untracked-cache |     1 |   3408645 |       8780816
            t/helper/test-dump-fsmonitor |     1 |   3406639 |       8776656
                                po/vi.po |   104 |   1376337 |      51441603
                                po/de.po |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.
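
The aggregation behind a table like the one above can be sketched as follows. This is a hedged Python illustration, not the builtin's implementation: `top_paths_by_disk_size` is a hypothetical helper, it assumes one `(path, disk_size, inflated_size)` record per object version, and the sample numbers are invented.

```python
from collections import defaultdict


def top_paths_by_disk_size(objects, limit=10):
    """Sum count, disk size and inflated size per path; largest disk size first."""
    totals = defaultdict(lambda: [0, 0, 0])  # path -> [count, disk, inflated]
    for path, disk, inflated in objects:
        entry = totals[path]
        entry[0] += 1
        entry[1] += disk
        entry[2] += inflated
    rows = [(path, c, d, i) for path, (c, d, i) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


# Invented sample data: two versions of a.txt, one version of b.bin.
versions = [("a.txt", 100, 300), ("b.bin", 500, 600), ("a.txt", 450, 900)]
top = top_paths_by_disk_size(versions)
```

Because the path-walk API already delivers objects grouped by path, the real tool does not need to build such a dictionary over all objects at once; the sketch only shows what is being summed.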

With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.

Unfortunately, this will mean that in the next `microsoft/git` rebase,
@jeffhostetler's version will need to be pulled out since there are
enough conflicts. These conflicts include how tables are stored and
generated, as the version in this PR is slightly more general to allow
for different kinds of data.

Thu, 26 Sep 2024 13:10:40 +0200

25 Jan 10:25
5e2e8b4
Introduce 'git backfill' to get missing blobs in a partial clone (#5172)

This change introduces the `git backfill` command which uses the path
walk API to download missing blobs in a blobless partial clone.

By downloading blobs that correspond to the same file path at the same
time, we hope to maximize the potential benefits of delta compression
against multiple versions.

These downloads occur in a configurable batch size, presenting a
mechanism to perform "resumable" clones: `git clone --filter=blob:none`
gets the commits and trees, then `git backfill` will download all
missing blobs. If `git backfill` is interrupted partway through, it can
be restarted and will download only the objects that are still missing.
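
Why batching makes the operation resumable can be shown with a toy model in Python. It is a sketch under stated assumptions, not the command's implementation: `backfill`, `fetch_batch`, and the data layout are hypothetical, and the batch size is treated as a simple upper bound per request.

```python
def backfill(blobs_by_path, present, fetch_batch, batch_size=3):
    """Toy model: request missing blobs path by path, in fixed-size batches.

    blobs_by_path: dict path -> list of blob ids (all versions at that path)
    present: set of blob ids already in the local object store (mutated)
    fetch_batch: callable that downloads a list of blob ids
    """
    batch = []
    for path in sorted(blobs_by_path):
        for oid in blobs_by_path[path]:
            if oid in present or oid in batch:
                continue  # already local or already queued
            batch.append(oid)
            if len(batch) >= batch_size:
                fetch_batch(batch)
                present.update(batch)
                batch = []
    if batch:  # flush the final, possibly smaller batch
        fetch_batch(batch)
        present.update(batch)


# Simulate an interruption after the first batch, then a restart.
blobs = {"a.c": ["a1", "a2"], "b.c": ["b1", "b2", "b3"]}
present = set()
first_run = []


def flaky_fetch(batch):
    if first_run:
        raise ConnectionError("interrupted")
    first_run.append(list(batch))


try:
    backfill(blobs, present, flaky_fetch, batch_size=2)
except ConnectionError:
    pass

second_run = []
backfill(blobs, present, lambda b: second_run.append(list(b)), batch_size=2)
```

The second run skips everything the first run managed to store and only requests the remaining blobs.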

When combining blobless partial clones with sparse-checkout, `git
backfill` will assume its `--sparse` option and download only the blobs
within the sparse-checkout. Users may want to do this as the repo size
will still be smaller than the full repo size, but commands like `git
blame` or `git log -L` will not suffer from many one-by-one blob
downloads.

Future directions should consider adding a pathspec or file prefix to
further focus which paths are being downloaded in a batch.

Wed, 25 Sep 2024 16:48:41 -0400

25 Jan 10:25
275c9db
Add path walk API and its use in 'git pack-objects' (#5171)

This is a follow-up to #5157 and is also motivated by the RFC in
gitgitgadget/git#1786.

We have ways of walking all objects, but they are focused on visiting a
single commit and then expanding the new trees and blobs reachable from
that commit that have not been visited yet. This means that objects
arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches
according to their type and path. This will walk all annotated tags, all
commits, all root trees, and then start a depth-first search among all
paths in the repo to collect trees and blobs in batches.
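
The walk order described above can be illustrated with a small Python model. It is a sketch, not the API itself: `path_walk` and its input layout are hypothetical, and as a simplification an object that appears at several paths is reported only at the first path where it is found.

```python
def path_walk(tags, commits, trees):
    """Return (type, path, ids) batches: tags, commits, root trees, then paths.

    tags: list of tag ids
    commits: dict commit id -> root tree id
    trees: dict tree id -> dict of entry name -> (kind, id), kind "tree" or "blob"
    """
    batches = [("tag", "", list(tags)), ("commit", "", list(commits))]
    seen = set()
    roots = []
    for root in commits.values():
        if root not in seen:
            seen.add(root)
            roots.append(root)
    stack = [("", roots)]  # depth-first search over paths
    while stack:
        path, tree_ids = stack.pop()
        batches.append(("tree", path, tree_ids))
        blobs, subtrees = {}, {}
        for tid in tree_ids:
            for name, (kind, oid) in trees[tid].items():
                if oid in seen:
                    continue
                seen.add(oid)
                if kind == "tree":
                    subtrees.setdefault(path + name + "/", []).append(oid)
                else:
                    blobs.setdefault(path + name, []).append(oid)
        for blob_path in sorted(blobs):
            batches.append(("blob", blob_path, blobs[blob_path]))
        for tree_path in sorted(subtrees, reverse=True):
            stack.append((tree_path, subtrees[tree_path]))
    return batches


trees = {
    "T1": {"README": ("blob", "b1"), "src": ("tree", "S1")},
    "T2": {"README": ("blob", "b2"), "src": ("tree", "S2")},
    "S1": {"main.c": ("blob", "m1")},
    "S2": {"main.c": ("blob", "m2")},
}
batches = path_walk(["tag1"], {"c1": "T1", "c2": "T2"}, trees)
```

Every batch holds objects of one type at one path, so both versions of `src/main.c` arrive together instead of being interleaved with unrelated objects.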

The most important application for this is being fast-tracked to Git for
Windows: `git pack-objects --path-walk`. This application of the path
walk API discovers the objects to pack via this batched walk, and
automatically groups objects that appear at a common path so they can be
checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions
that sometimes occur with the new `--full-name-hash` option) and can be
much faster to compute since the first pass of delta calculations does
not waste time on objects that are unlikely to be diffable.
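
To make the collision problem concrete, here is a Python model of the default name-hash, based on my reading of `pack_name_hash()` in Git's `pack-objects.h`; treat it as an approximation for illustration rather than a reference. The hash is dominated by the trailing characters of the path, so files with the same name in different directories can land in the same bucket, whereas grouping by full path keeps them apart. The `group_by` helper and the sample paths are hypothetical.

```python
from collections import defaultdict

ASCII_WHITESPACE = b" \t\n\v\f\r"


def pack_name_hash(name):
    """32-bit hash in which later characters carry the most weight."""
    h = 0
    for byte in name.encode():
        if byte in ASCII_WHITESPACE:
            continue  # whitespace is skipped
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


def group_by(paths, key):
    """Group paths by key(path), preserving first-seen order of the groups."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return list(groups.values())


paths = ["a/CMakeLists.txt", "b/CMakeLists.txt"]
by_hash = group_by(paths, pack_name_hash)  # one bucket: the hashes collide
by_path = group_by(paths, lambda p: p)     # two buckets: one per path
```

With hash-based grouping the two unrelated files become delta candidates for each other; with path-based batches each file is only compared against its own history.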

Some statistics are available in the commit messages.