Source: GitHub Blog | Author: Taylor Blau
Protocol version 2 is now the default
You may remember when Git introduced a new version of its network fetch protocol way back in 2018. That protocol is now used by default in 2.26, so let’s refresh ourselves on what that means.
The biggest problem with the old protocol is that the server would immediately list all of the branches, tags, and other references in the repository before the client had a chance to send anything. For some repositories, this could mean sending megabytes of extra data, when the client really only wanted to know about the master branch.
The new protocol starts with the client request and provides a way for the client to tell the server which references it’s interested in. Fetching a single branch will only ask about that branch, while most clones will only ask about branches and tags. This might seem like everything, but server repositories may store other references (such as the head of every pull request opened in the repository since its creation).
Fetches from large repositories are now faster, especially when the fetch itself is small, because the initial reference advertisement then accounts for a larger share of the total cost.
And the best part is that you won’t need to do anything! Due to some clever design, any client that speaks the new protocol can work seamlessly with both old and new servers, falling back to the original protocol if the server doesn’t support it. The only reason for the delay between introducing the protocol and making it the default was to let early adopters discover any bugs.
If you’re not ready to upgrade, you can try the new protocol immediately with Git 2.19 and above by setting this config option:
git config --global protocol.version 2
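If you want to confirm which protocol a connection actually negotiates, one option is to record a packet trace. The following is a minimal sketch rather than anything from the release itself: it creates a throwaway repository under a temporary directory (the `server` name and paths are made up for the example), talks to it over `file://`, and looks for the server's version announcement in the trace written by `GIT_TRACE_PACKET`.

```shell
# Throwaway repository standing in for a remote (hypothetical path).
tmp=$(mktemp -d)
git init --quiet "$tmp/server"

# Request protocol v2 explicitly and write packet-level traffic to a file.
GIT_TRACE_PACKET="$tmp/trace" \
  git -c protocol.version=2 ls-remote "file://$tmp/server" >/dev/null

# In a v2 conversation the server opens by announcing "version 2".
grep "version 2" "$tmp/trace"
```

The same environment variable works against a real remote, where the trace also shows which references the client asked about.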
If you’re interested in the technical details, check out Introducing Git protocol version 2 from the feature’s author.
Some new config tricks
Git can read config options from a few different files: one for the repository (.git/config), one for the user (~/.gitconfig), and one for the whole system (/etc/gitconfig). On top of that, options can be set from the command line or the environment (like git -c foo=bar ...). Occasionally this leads to confusion: you know an option is set, but you’re not sure who is setting it, or whether it overrides an earlier setting.
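To make the layering concrete, here is a small sketch that sets the same key in two files and then on the command line. Everything in it is illustrative: `example.setting` is a made-up key, and the script points `HOME` at a temporary directory so that your real configuration is left alone.

```shell
# Sandbox: a temporary HOME keeps the real ~/.gitconfig untouched.
tmp=$(mktemp -d)
export HOME="$tmp/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init --quiet "$tmp/repo"
cd "$tmp/repo"

# "example.setting" is a made-up key used only for this demonstration.
git config --global example.setting from-user-file
git config --local  example.setting from-repository-file

# The repository file is read after the user file, so its value wins.
effective=$(git config example.setting)
echo "$effective"

# A -c option on the command line overrides both files.
overridden=$(git -c example.setting=from-command-line config example.setting)
echo "$overridden"
```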
The git config command has had a
--show-origin option for a while, but the results are rather verbose. They show the actual path where the option is found. That’s helpful if you’re going to open the file in an editor, but less so if you’re going to use git config to modify the value (by writing
Git 2.26 introduces --show-scope, which works similarly but uses names that can be fed back to git config. For example:
$ git config --show-scope --get-regexp 'diff.*'
global  diff.statgraphwidth 35
local   diff.colormoved plain
$ git config --global --unset diff.statgraphwidth
The new option can be combined with --show-origin for more detail, and can be used when looking up a single option, or listing them all:
$ git config --list --show-scope --show-origin
global  file:/home/user/.gitconfig  diff.interhunkcontext=1
global  file:/home/user/.gitconfig  push.default=current
[...]
local   file:.git/config  branch.master.remote=origin
local   file:.git/config  branch.master.merge=refs/heads/master
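If you would like to try the round trip yourself without touching your own settings, the sketch below does it in a sandbox. It assumes Git 2.26 or newer; `example.setting` is a made-up key, and the temporary `HOME` is only there to keep the experiment away from your real ~/.gitconfig.

```shell
# Sandbox: a temporary HOME keeps the real ~/.gitconfig untouched.
tmp=$(mktemp -d)
export HOME="$tmp/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init --quiet "$tmp/repo"
cd "$tmp/repo"

# "example.setting" is a made-up key set at two different scopes.
git config --global example.setting from-user-file
git config --local  example.setting from-repository-file

# List every value of the key together with the scope that supplied it.
scopes=$(git config --show-scope --get-all example.setting)
echo "$scopes"

# The reported scope name maps directly onto the matching flag.
git config --global --unset example.setting
remaining=$(git config --show-scope --get-all example.setting)
echo "$remaining"
```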
Another new trick is the ability to use wildcards when matching credential URLs. Any of Git’s HTTP config options can be set for all connections (such as
http.extraHeader) or only for connections to specific URLs (
http.https://example.com.extraHeader). Likewise, credential config can be set everywhere (
credential.helper) or for a specific URL (
The http config matcher has some extra features; it can match wildcards like
*.example.com. But the older credential config matcher never learned that trick. Until now! This is useful for setting the username or a custom helper for subdomains. For example:
username = ttaylorr
This will work on
bar.example.com, and so on.
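One way to check that a wildcard entry is picked up is to ask Git to fill in a credential for a subdomain. This is only a sketch under a few assumptions: it needs Git 2.26 or newer, it writes to a temporary `HOME` rather than your real config, the `https://*.example.com` pattern and `foo.example.com` host are illustrative, and the inline helper is a stand-in that returns a dummy password so that nothing prompts.

```shell
# Sandbox: a temporary HOME keeps the real ~/.gitconfig untouched.
tmp=$(mktemp -d)
export HOME="$tmp/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

# Illustrative wildcard entry: one username for every subdomain.
git config --global 'credential.https://*.example.com.username' ttaylorr

# Ask Git to fill in a credential for one subdomain. The empty helper value
# clears any configured helpers; the inline one just supplies a dummy password.
filled=$(printf 'protocol=https\nhost=foo.example.com\n\n' |
  git -c credential.helper= \
      -c credential.helper='!f() { echo password=dummy; }; f' \
      credential fill)
echo "$filled"
```

If the wildcard matched, the output includes a `username=ttaylorr` line alongside the host and the dummy password.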
Updates to git sparse-checkout
From our last blog post and technical deep-dive, you may recall our discussion of the new sub-command
git sparse-checkout. We recommend checking out both of those posts, but in case you did and forgot, let’s take some time to cover what sparse-checkouts are, and how they’re used.
Sparse-checkouts are a way to have only part of your repository checked out at a time. For example, let’s consider that you’re working in a monorepo, and only need the
client/macos directory, and everything in it. You probably don’t want to spend time downloading old blobs from outside of that directory if you’re never going to use them. So, how do you do it?
Git does this in two parts:
- First, it asks the server to only send tree and commit objects instead of all blobs.
- Then, it tells the client to expect that some objects from the repository may be missing, and to ask the server for any of those objects if they are needed, say, for a checkout.
To tell Git to do both of these things, add --filter=blob:none and --sparse as command-line options to git clone, and your repository will be cloned in such a way as to only fetch blob objects as they’re needed.
You’ll notice that the first time you check out your repository, your Git client issues a subsequent fetch to the server. This is done automatically in order to fetch the blobs in the top level of your working copy so that they can be checked out for you to browse.
In earlier versions of Git, the only way to add new sub-directories to Git’s list of directories that need their blobs populated has been to run git sparse-checkout set, which replaces the entire list of sparse-checkout directories, forcing you to re-specify them every time.
Git 2.26 now has a new
git sparse-checkout add mode, which allows you to add new directory entries one at a time. Here’s an example:
$ git clone --filter=blob:none --sparse email@example.com:git/git.git
Cloning into 'git'...
remote: Enumerating objects: 175470, done.
remote: Total 175470 (delta 0), reused 0 (delta 0), pack-reused 175470
Receiving objects: 100% (175470/175470), 59.07 MiB | 10.48 MiB/s, done.
Resolving deltas: 100% (111328/111328), done.
remote: Enumerating objects: 379, done.
remote: Counting objects: 100% (379/379), done.
remote: Compressing objects: 100% (379/379), done.
remote: Total 431 (delta 0), reused 0 (delta 0), pack-reused 52
Receiving objects: 100% (431/431), 1.73 MiB | 4.06 MiB/s, done.
Updating files: 100% (432/432), done.
$ cd git
$ git sparse-checkout init --cone
$ git sparse-checkout add t
remote: Enumerating objects: 797, done.
# ...
Updating files: 100% (1946/1946), done.
$ git sparse-checkout add Documentation
remote: Enumerating objects: 334, done.
# ...
Updating files: 100% (723/723), done.
$ git sparse-checkout list
Documentation
t
In the example, we told Git to clone git/git excluding all blob objects, and to check out only the files in the top-level directory into our working copy. From the example, you can observe a few things:
- There are two “enumerating objects” lines in the output of
git clone. These come from the initial clone request, and the subsequent fetch to load in all blobs in the top-level directory.
- Even after multiple
git sparse-checkout adds, we don’t need to re-specify the directories that we want checked out, as we would have had to with
git sparse-checkout set.
- Sometimes Git asks for objects using more fetch requests than it needs to (like when we added directory t, three fetches took place when there only needed to be one). Some rough edges like this exist on the client side, and are being improved with each release.
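To see the difference between set and add without cloning anything, you can try both on a small local repository. The sketch below builds one with three made-up directories (`client`, `server`, and `docs` are placeholders, not paths from git/git) and assumes Git 2.26 or newer.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Small stand-in repository with three directories (hypothetical names).
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
mkdir client server docs
echo a > client/app.txt
echo b > server/api.txt
echo c > docs/guide.txt
echo top > README
git add .
git commit --quiet -m "initial layout"

git sparse-checkout init --cone   # keep only the top-level files
git sparse-checkout set client    # replaces the whole list
git sparse-checkout add docs      # appends without restating "client"

listed=$(git sparse-checkout list)
echo "$listed"
ls
```

After the last step the working copy contains `README`, `client`, and `docs`, while `server` stays out of the way until you ask for it.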
If you’ve never used git grep to search through your Git repository, this release is a good time to try it out, since
git grep is now faster than before. If you haven’t used
git grep, here’s a quick run-down:
git grep behaves like
grep, but works in your Git repository. You can
grep through the checked-out contents of your repository, but you can also
grep through historical revisions, too.
git grep uses multiple threads to enhance its performance when scanning through the contents of your working tree. However, in previous releases of Git, due to some details of Git’s object storage mechanism,
git grep avoided using multiple threads when looking at historical revisions.
In Git 2.26, this limitation is no more, thanks to work by Matheus Tavares, a Google Summer of Code student, who made reading from the object storage layer safe for concurrent access. Now, you can enjoy all of the benefits of
git grep --threads regardless of where you search. And since
--threads defaults to the number of cores on your workstation, you don’t even have to type
--threads at all.
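Here is a short sketch of both kinds of search on a throwaway repository. The file name and contents are made up for the example, and the explicit `--threads=2` is only there to show the option; as noted above, you can normally leave it out.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Two commits, so the old and new contents of the file differ.
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
echo "hello world" > notes.txt
git add notes.txt
git commit --quiet -m "add greeting"
echo "goodbye world" > notes.txt
git commit --quiet -am "change greeting"

# Search the checked-out files: "hello" is gone from the working tree.
git grep -n hello || echo "no match in the working tree"

# Search a historical revision, with an explicit thread count.
old=$(git grep --threads=2 -n hello HEAD~1)
echo "$old"
```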
Another lesser-known feature of Git is “worktrees”. In these posts, we’ve often discussed “the working copy” (in fact, there’s a reference in the tidbit just above this one!). This has always been intended to mean: “the copy of your repository that you have on your hard drive”. But, did you know that you can have multiple working copies per repository?
Though this may remind you of Git’s submodules, worktrees are entirely different. For example, I use a worktree to mount a special
meta branch in my fork of Git, which contains scripts and Makefile tweaks.
meta is a separate branch in
ttaylorr/git without a history, but I can mount it in a top-level
Meta directory in my checkout of
ttaylorr/git to check scripts into my repository without having to send them upstream, or use a submodule.
In Git 2.26, the completion engine that powers the results you receive when you type
git <TAB> learned about
git worktree, and can now complete subcommands, paths, refs, and more.
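If you haven’t created a worktree before, the sketch below shows the basic shape. It is a simpler setup than the meta branch described above: `scratch` is a hypothetical branch and directory name, and the repository is a throwaway one.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
git commit --quiet --allow-empty -m "initial"

# Check out a new branch in its own directory next to the main one.
# "scratch" is a hypothetical branch and directory name.
git worktree add --quiet -b scratch "$tmp/scratch"

# Both working copies share one object store and one set of refs.
git worktree list
second=$(git -C "$tmp/scratch" rev-parse --abbrev-ref HEAD)
echo "$second"
```

When you are done, `git worktree remove` takes the extra directory away again without affecting the branch.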
Way back in our post about Git 2.19, we had a number of color-related tidbits. We talked about
git config --color, colorization by age in git blame, and reminisced about
git diff --color-moved. Since it’s been a few versions, it felt right to talk about Git’s use of color again.
In many Git commands, you can use the
--format option to specify the appearance of the output to suit your liking. For example, you can write:
$ git log --format="%aN - %s"
This shows the author and message for each commit in the output of
git log. These
--format specifiers also support shorthands for the ANSI color escape sequences, so you can type
%C(blue) instead of
Now, Git supports the “bright” variant of the colors that have ANSI escape sequences, so you can now write
%C(brightblue) to obtain, well, bright blue!
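As a quick illustration, the sketch below colors the author name in bright blue on a throwaway repository. It assumes Git 2.26 or newer; the author name and commit message are placeholders, and `--color=always` is only there because the output is captured into a variable instead of going to a terminal.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
git commit --quiet --allow-empty -m "initial"

# %C(brightblue) switches to bright blue and %C(reset) switches back.
colored=$(git log -1 --color=always \
  --format="%C(brightblue)%aN%C(reset) - %s")
echo "$colored"
```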
If you use Scalar or are otherwise in-the-know, you might know about Git’s capability to interact with
fsmonitor-like tools (such as Facebook’s Watchman) in order to skip filesystem operations that are expensive over large trees. For example, instead of having Git query the filesystem for updates (which gets slower the larger your repository is on disk), Watchman can tell Git which files change, skipping those queries from Git altogether, making operations such as
git status (which typically involves many filesystem interactions) much faster.
Watchman supports a number of different ways to tag the time at which the filesystem was last updated: the UNIX epoch, a vector-like clock identifier, and opaque tokens [source]. Because Watchman prefers the clock identifier style, Git has been updated to understand that this may be sent instead of a UNIX epoch.
When you upgrade to Git 2.26, all you need to do is replace the hook your repository uses to communicate with Watchman, and Git will work as expected.
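In practice, replacing the hook usually means copying the sample that ships with Git over the one in your repository. The sketch below shows that on a throwaway repository, assuming your installation includes the `fsmonitor-watchman.sample` template; it skips the step if the sample is missing, and Watchman itself still has to be installed for the hook to do anything useful.

```shell
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"

# The sample hook comes from Git's templates; skip if this install lacks it.
if [ -f .git/hooks/fsmonitor-watchman.sample ]; then
  cp .git/hooks/fsmonitor-watchman.sample .git/hooks/query-watchman
  chmod +x .git/hooks/query-watchman
  # Point Git at the hook it should ask about changed files.
  git config core.fsmonitor .git/hooks/query-watchman
fi
```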
When we talked about partial clones earlier, we highlighted that the “partial” in “partial clones” roughly means taking the set of objects Git would have sent you and filtering it down to just the ones that you want. Ordinarily, this filtration process requires a full object traversal and thus gets slower as the number of unfiltered objects increases. However, some of these checks can be made faster by using Git’s bitmap machinery, and in Git 2.26, they have.
Because we don’t need to perform an object traversal to check whether an object is a blob, or what size it is, partial clone filters such as --filter=blob:limit=<n> can run much more quickly using just bitmaps.
We’re running these patches at GitHub, so you can experiment with them and try out partial clones today by cloning any repository on GitHub.com.
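If you would rather experiment offline first, a partial clone also works between two local repositories. The sketch below is an illustration with made-up paths: it creates a small "server" repository, opts it in to filtering (hosted services do this for you), and clones it without blobs. It assumes a reasonably recent Git, since older releases recorded the filter under different config keys.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Throwaway repository standing in for the server (hypothetical path).
tmp=$(mktemp -d)
git init --quiet "$tmp/server"
cd "$tmp/server"
echo "some file contents" > file.txt
git add file.txt
git commit --quiet -m "initial"

# The serving side must allow filters and on-demand object requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Clone without blobs; Git fetches the ones it needs during checkout.
git clone --quiet --filter=blob:none "file://$tmp/server" "$tmp/client"
promisor=$(git -C "$tmp/client" config remote.origin.promisor)
filter=$(git -C "$tmp/client" config remote.origin.partialclonefilter)
echo "$promisor $filter"
```

Swapping in `--filter=blob:limit=<n>` follows the same pattern, with only blobs larger than the limit left out of the initial transfer.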
When you’re performing a rebase, previous versions of Git used a different mechanism depending on whether or not the rebase was interactive (this is the difference between git rebase and git rebase -i). git rebase -i used the “merge” backend, and git rebase used the “apply” backend. In 2.26, git rebase and git rebase -i both use the “merge” backend by default. The two backends behave slightly differently, so it’s worth knowing about the differences.
For example: suppose you’re rebasing and a commit pauses with conflicts. After you resolve the conflicts and stage your files, you tell Git to move to the next step using
git rebase --continue. This used to immediately move ahead, taking the old commit message verbatim. With the new backend, you’re now prompted to edit the commit message so you can make note of the resolved conflicts.
For more, the author of this change contributed excellent documentation on the caveats and differences between the “merge” backend and its former counterpart, the “apply” backend, which you can find in Git’s documentation.
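If you run into one of those differences and want the old behavior back, Git 2.26 lets you pick the backend explicitly. The sketch below shows both ways on a throwaway repository: the `--apply` flag for a single rebase, and the `rebase.backend` setting for a lasting default. The branch and file names are made up for the example.

```shell
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
echo base > base.txt
git add base.txt
git commit --quiet -m "base"
base_branch=$(git rev-parse --abbrev-ref HEAD)

# A topic branch and a base branch that each gain one commit.
git checkout --quiet -b topic
echo topic > topic.txt
git add topic.txt
git commit --quiet -m "topic work"
git checkout --quiet "$base_branch"
echo more > more.txt
git add more.txt
git commit --quiet -m "more base work"

# Use the old backend for this one rebase.
git checkout --quiet topic
git rebase --quiet --apply "$base_branch"

# Or make the old backend the default for this repository.
git config rebase.backend apply

git log --format=%s
```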