Vendoring in Go with Git Submodules

Go 1.5 and newer include support for vendoring. Vendoring is a way of managing dependencies where, instead of relying on the install or build process to find the dependent libraries (as when building C, or when using system installation directories with a dynamic language), or using some kind of “virtual installation” (e.g., Python’s virtualenv, perlbrew, etc.), you include the modules you depend on under a path in your library’s source tree.

In Go 1.5, you needed to set an environment variable (GO15VENDOREXPERIMENT=1) to enable the feature. In newer Go versions, you need to set GO15VENDOREXPERIMENT=0 to disable it: one might conclude that the experiment was successful, or at least warranted further exploration.

Since Go came out, there has been an abundance of systems written to solve the vendoring and dependency management problem. Here I’d like to put forward an alternative system that’s so simple that in the simplest cases, it doesn’t even need separate tooling.

Git Submodules

Submodules in git were designed as a solution to this problem for projects managed using git. The way they work is that in your tree, you check in a reference to a commit object, which represents a checked-out repository at that location. The superproject also has a file called .gitmodules, which specifies where the repositories can be cloned from.

To add a new vendor dependency, I can use git submodule add; don’t worry, I’ll explain what the options all mean:

    $ git submodule add --name github.com/lib/pq \
          git@github.com:lib/pq vendor/github.com/lib/pq
    Cloning into 'vendor/github.com/lib/pq'...
    remote: Counting objects: 1377, done.
    remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
    Receiving objects: 100% (1377/1377), 598.03 KiB | 162.00 KiB/s, done.
    Resolving deltas: 100% (841/841), done.
    Checking connectivity... done.
    $


What this will do is clone the repository at git@github.com:lib/pq to the path vendor/github.com/lib/pq. The --name part is somewhat important. It doesn’t matter what you use here, but it shouldn’t change over the lifetime of your project.

You can see what is staged for commit with git status, and commit normally:

    $ git status
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            modified:   .gitmodules
            new file:   vendor/github.com/lib/pq

    $ git commit -m "Vendor github.com/lib/pq"
     2 files changed, 4 insertions(+)
     create mode 160000 vendor/github.com/Pallinder/go-randomdata
    $

Under the hood: git submodule mechanics

Git’s better understood with reference to its very simple inner workings, so let’s look at the .gitmodules file that gave us:

    [submodule "github.com/lib/pq"]
        path = vendor/github.com/lib/pq
        url = git@github.com:lib/pq

OK, that’s just recorded the options we gave. Now let’s look at the checkout:

    $ cd vendor/github.com/lib/pq/
    $ ls -a
    .                bench_test.go  encode.go       ssl_test.go
    ..               buf.go         encode_test.go  url.go
    .git             certs          error.go        url_test.go
    .gitignore       conn.go        hstore          user_posix.go
    .travis.yml      conn_test.go   listen_example  user_windows.go
    CONTRIBUTING.md  copy.go        notify.go
    LICENSE.md       copy_test.go   notify_test.go
    README.md        doc.go         oid
    $


The .git path is a regular file, not a directory! Let’s look at it:

    $ cat .git
    gitdir: ../../../../.git/modules/github.com/lib/pq

Sure enough, if you follow that path, you’ll arrive at the git repository for this path:

    $ cd ../../../../.git/modules/github.com/lib/pq
    $ ls -a
    .   HEAD    description  hooks  info  objects      refs
    ..  config  gitdir       index  logs  packed-refs
    $


The path under modules is exactly what you specified to --name on the git submodule add command. This path is the local, symbolic name for the dependency. Even if you switch to another fork of a dependency, or move the checkout to a different location in your tree, it’s worth keeping this the same.
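To poke at these mechanics without touching a real project, here is a minimal sketch that builds two throwaway repositories and adds one to the other as a named submodule. Everything in it is an assumption made for illustration: the example.com/demo/dep name, the temporary paths, and the demo identity. The protocol.file.allow override is only there because the "remote" is a local path, which recent git versions refuse to clone as a submodule by default.

```shell
#!/bin/sh
# Sketch: add a local throwaway repository as a named submodule and inspect
# what git records. All names and paths here are hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

name=example.com/demo/dep
path=vendor/example.com/demo/dep

# A stand-in "upstream" dependency with a single commit.
git init -q "$work/dep"
echo 'package dep' > "$work/dep/dep.go"
git -C "$work/dep" add dep.go
git -C "$work/dep" commit -q -m 'Initial commit'

# The superproject that vendors it.
git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add \
    --name "$name" "$work/dep" "$path"
git commit -q -m "Vendor $name"

# The index entry for the submodule path is a "gitlink": its mode is 160000
# and its object ID is a commit in the submodule, not a tree or blob.
gitlink_mode=$(git ls-files -s "$path" | cut -d' ' -f1)

# The checkout's .git is a plain file pointing into .git/modules/<name>.
gitdir_line=$(cat "$path/.git")

echo "mode:  $gitlink_mode"
echo ".git:  $gitdir_line"
```

The two values printed at the end correspond to the "create mode 160000" line in the commit output and to the contents of the .git file shown above.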

The initial clone

With vendored submodules, when you first clone, you’ll need to either pass --recursive to git clone, or run git submodule update --init afterwards (git submodule init on its own only registers the submodules, it does not clone them) to fetch the recorded versions of all the dependencies; here it is, assuming that you are testing using a branch called git-vendoring instead of master:

    $ git clone --recursive -b git-vendoring git@github.com:cutesyname/yourproject
    Cloning into 'yourproject'...
    remote: Counting objects: 35331, done.
    remote: Compressing objects: 100% (12/12), done.
    remote: Total 35331 (delta 8), reused 3 (delta 3), pack-reused 35316
    Receiving objects: 100% (35331/35331), 22.23 MiB | 300.00 KiB/s, done.
    Resolving deltas: 100% (24082/24082), done.
    Checking connectivity... done.
    Submodule 'github.com/lib/pq' (git@github.com:lib/pq) registered for path 'vendor/github.com/lib/pq'
    Cloning into 'vendor/github.com/lib/pq'...
    remote: Counting objects: 1377, done.
    remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
    Receiving objects: 100% (1377/1377), 598.03 KiB | 509.00 KiB/s, done.
    Resolving deltas: 100% (841/841), done.
    Checking connectivity... done.
    Submodule path 'vendor/github.com/lib/pq': checked out 'dc50b6ad2d3ee836442cf3389009c7cd1e64bb43'
    $


This new clone also has the vendored directory checked out and ready to use.
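If the clone was made without --recursive, the vendored directory exists but is empty until the submodule is initialised and updated. The following is a rough sketch of that situation using throwaway local repositories; the example.com/demo/dep name, the paths, and the demo identity are all made up for the example, and the protocol.file.allow override is only needed because the submodule URL is a local path.

```shell
#!/bin/sh
# Sketch: a plain clone leaves the submodule directory empty;
# "git submodule update --init" populates it. Names are hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

name=example.com/demo/dep
path=vendor/example.com/demo/dep

# Stand-in dependency.
git init -q "$work/dep"
echo 'package dep' > "$work/dep/dep.go"
git -C "$work/dep" add dep.go
git -C "$work/dep" commit -q -m 'Initial commit'

# Superproject with the dependency vendored as a submodule.
git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add \
    --name "$name" "$work/dep" "$path"
git commit -q -m "Vendor $name"

# Clone the superproject WITHOUT --recursive.
git clone -q "$work/super" "$work/clone"
cd "$work/clone"
[ -e "$path/dep.go" ] && before=present || before=missing

# Register the submodule and check out the recorded commit in one step.
git -c protocol.file.allow=always submodule --quiet update --init
[ -e "$path/dep.go" ] && after=present || after=missing

echo "dep.go before update: $before"
echo "dep.go after update:  $after"
```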

Switching branches (with different dependency versions)

As you switch branches, the dependencies are generally not switched automatically. However, git status makes it easy to see when this has happened, and you can switch to the recorded version using git submodule update:

    $ git checkout olderversion
    On branch olderversion
    Your branch is up-to-date with 'origin/olderversion'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   vendor/github.com/lib/pq (new commits)

    no changes added to commit (use "git add" and/or "git commit -a")
    $ git submodule update
    Submodule path 'vendor/github.com/lib/pq': checked out '5e3230b4aee4ae51bfd11634f6592e12936b6145'
    $ git status
    On branch olderversion
    Your branch is up-to-date with 'origin/olderversion'.
    nothing to commit, working directory clean
    $


As of Git 2.7.0 there’s no built-in way to make this automatic (later Git releases add a submodule.recurse setting for this), but adding a git submodule update command to a post-checkout hook should be relatively safe. Continuous Integration builds where you don’t care about changes in your local checkout should probably use git submodule update --force.
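A post-checkout hook along those lines might look something like the sketch below. It is an illustration rather than a tested recipe: it installs the hook into a scratch repository created with mktemp, and relies only on the documented hook arguments (previous HEAD, new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout).

```shell
#!/bin/sh
# Sketch: install a post-checkout hook that runs "git submodule update"
# after branch switches. The scratch repository is hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p .git/hooks
hook=.git/hooks/post-checkout

# Quoted heredoc delimiter: nothing inside is expanded while writing the file.
cat > "$hook" <<'EOF'
#!/bin/sh
# Arguments: <previous HEAD> <new HEAD> <flag>
# flag is 1 when a branch was checked out, 0 for "git checkout -- <file>".
[ "${3:-0}" = 1 ] || exit 0
exec git submodule update
EOF
chmod +x "$hook"

# Syntax-check the hook without running it.
sh -n "$hook"
echo "installed $hook"
```

Hooks live in .git/hooks and are not versioned with the project, so each clone has to install this for itself. A CI variant could use git submodule update --force in place of the plain update.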

One of the great things about this approach is that the above commands will typically execute in well under a second if run again on the same checkout, which works well with (for example) Circle CI build directory caching.

Updating a vendored dependency

To see how up to date your dependencies are, first fetch them all using git submodule foreach, and then use git submodule status:

    $ git submodule foreach git fetch
    Entering 'vendor/github.com/lib/pq'
    $ git submodule status
    $

Want to switch to a release version? Check it out, and git add the dependency:

    $ cd vendor/github.com/lib/pq
    $ git checkout go1.0-cutoff
    Previous HEAD position was dc50b6a... Also send prepared statements' parameters over in binary
    HEAD is now at 5da8732... Add Jonathan Rudenberg to the list of contributors
    $ cd ..
    $ git add pq
    $ git commit -m "Pin libpq at 'go1.0-cutoff' tag"
    [vendor-experiment ab799ab] Pin libpq at 'go1.0-cutoff' tag
     1 file changed, 1 insertion(+), 1 deletion(-)
    $

Investigating changes by submodule updates

If you set diff.submodule to log, then you can see the changes in submodules when you use git log -p:

    $ git config --global diff.submodule log
    $ git log -1 -p | head
    commit ab799abfc61ccdddfe4cd8d1cad4327ffd6a9cc7
    Author: Sam Vilain <sam@vilain.net>
    Date:   Mon Jan 25 15:53:44 2016 -0800

        Pin libpq at 'go1.0-cutoff' tag

    Submodule vendor/github.com/lib/pq dc50b6a..5da8732 (rewind):
      < Also send prepared statements' parameters over in binary
      < Add Chris Gilling to the list of contributors
      < Implement driver option binary_parameters
    $


Really, git log is just scratching the surface of what could be done here (as of Git 2.7.0, anyway). There’s enough information here for GUIs to do clever things like overlay the submodule project history with the superproject. It’s easy to imagine improvements to this, such as options showing a git diff --stat of the submodule, etc. Of course, vendoring-specific tooling could also do this, but by using submodules you benefit from any generic, non go-vendoring based software that is written.

Switching forks

If a dependency needs to switch to a different fork, just change the URL in the .gitmodules file and use git submodule sync to fix up the remote; then you can git submodule update as before, and git add the dependency once it is at the version which contains the fix you need.

This isn’t seamless; the git submodule sync command will need to be issued by people who switch branches to a version with a new fork the first time (but not thereafter, so long as your fork also tracks the original fork). But it does work, and you have resilience from things like the original repository disappearing: if the version your project needs is already in the clone, it does not need to use the network at all.
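Here is a rough end-to-end sketch of a fork switch, again using throwaway local repositories in place of real remotes. The "upstream" and "fork" repositories, the example.com/demo/dep name, and the demo identity are all invented for the example; git config -f .gitmodules is used simply as a scriptable way of editing the URL.

```shell
#!/bin/sh
# Sketch: repoint a vendored submodule at a fork and pin the fixed commit.
# All repositories and names are hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

name=example.com/demo/dep
path=vendor/example.com/demo/dep

# Original upstream with one commit.
git init -q "$work/upstream"
echo 'package dep' > "$work/upstream/dep.go"
git -C "$work/upstream" add dep.go
git -C "$work/upstream" commit -q -m 'Initial commit'

# A fork of it carrying the fix we want.
git clone -q "$work/upstream" "$work/fork"
echo '// fixed' >> "$work/fork/dep.go"
git -C "$work/fork" commit -q -a -m 'Fix a bug'
fix=$(git -C "$work/fork" rev-parse HEAD)

# Superproject vendoring the original upstream.
git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add \
    --name "$name" "$work/upstream" "$path"
git commit -q -m "Vendor $name"

# 1. Change the URL recorded in .gitmodules (the name stays the same).
git config -f .gitmodules "submodule.$name.url" "$work/fork"
# 2. Propagate the new URL to the submodule's "origin" remote.
git submodule --quiet sync
# 3. Fetch from the fork and move the checkout to the fixed commit.
git -C "$path" fetch -q origin
git -C "$path" checkout -q "$fix"
# 4. Record both the new URL and the new commit in the superproject.
git add .gitmodules "$path"
git commit -q -m "Switch $name to fork with fix"

origin_url=$(git -C "$path" config remote.origin.url)
pinned=$(git -C "$path" rev-parse HEAD)
echo "origin now: $origin_url"
echo "pinned at:  $pinned"
```

Because the --name given at git submodule add time is unchanged, the repository under .git/modules stays where it was and only its remote URL moves.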

Otherwise, you can spot your local dependencies which are not vendored by looking in your $GOPATH for modules which were checked out. You can also use go build -a -v to call out any dependencies which are not already vendored:

    $ eval "$(go env)"  # set GOROOT
    $ go build -a -v 2>&1 |
          grep -v 'github.com/cutesyname/yourproject' |
          while read dir
          do
              # anything that is neither vendored nor in the stdlib tree
              [ -d "vendor/$dir" -o -d "$GOROOT/src/$dir" ] ||
                  echo "$dir is not stdlib, nor vendored"
          done


At least one person on the project has to know not to use go get but instead to use the git submodule add command.

If the dependencies also have dependencies, all you have to do is fork the upstream, add vendoring to the project as you did to your own, and then make a pull request against the original upstream. (Just kidding. Go finds your dependencies’ imported modules if they are in your vendor/ tree.)

In Summary…

Using git submodule alone, without any extra scripts or tooling, is not currently for the faint of heart, but unlike the dim and distant past of the future, it does basically work.

Historically, in the early days of git, a lot of people used to write wrappers around core git functionality - things like making branches, fetching, updating branches and copying. The cogito tool was an early example of this. While cogito moved the needle forward and invented many concepts and features later added to core git - remote tracking branches and history rewriting to name but two - most of these wrappers are just dead-end scripts, serving only to illustrate the conceptual model of authoring software that the writer possesses. They rarely add anything, and have tended to become less and less necessary as people become more familiar with distributed version control and as git usability features have been added.

The lesson is, if a git feature sucks, but works, then use it anyway and hopefully someone will eventually contribute code to core to make it better. Perhaps that person will be you!