Go 1.5 and newer include support for vendoring. Vendoring is a way of managing dependencies: instead of relying on the install or build process to find the dependent libraries (as when building C code, or when a dynamic language uses system installation directories), or using some kind of “virtual installation” (e.g., Python’s virtualenv, perlbrew), you include the modules you depend on under a path in your library’s source tree.
In Go 1.5, you needed to set an environment variable
(GO15VENDOREXPERIMENT=1) to enable the feature. In newer Go
versions, you need to set
GO15VENDOREXPERIMENT=0 to disable the
feature: one might conclude that the experiment was successful, or at
least warranted further exploration.
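As a rough sketch of the rule just described, the function below reports whether vendor directories would be honoured for a given Go 1.x minor version and GO15VENDOREXPERIMENT value. The function name is made up for illustration, and the final branch (Go 1.7 and later ignoring the variable) is an addition beyond what the text above states.

```shell
# vendoring_enabled MINOR [SETTING]
#   MINOR   - the Go 1.x minor version, e.g. 5 for Go 1.5
#   SETTING - the value of GO15VENDOREXPERIMENT, empty if unset
# Hypothetical helper: succeeds (exit 0) if vendor/ would be honoured.
vendoring_enabled() {
    minor=$1
    setting=${2-}
    if [ "$minor" -lt 5 ]; then
        return 1                 # no vendor support before Go 1.5
    elif [ "$minor" -eq 5 ]; then
        [ "$setting" = 1 ]       # Go 1.5: opt in with =1
    elif [ "$minor" -eq 6 ]; then
        [ "$setting" != 0 ]      # Go 1.6: on unless disabled with =0
    else
        return 0                 # assumption: Go 1.7+ ignores the variable
    fi
}

# Example: check the current environment against Go 1.6 behaviour.
if vendoring_enabled 6 "${GO15VENDOREXPERIMENT-}"; then
    echo "vendor/ directories will be used"
fi
```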
Since Go came out, there has been an abundance of systems written to solve the vendoring and dependency management problem. Here I’d like to put forward an alternative system that’s so simple that in the simplest cases, it doesn’t even need separate tooling.
Submodules in git were designed as a solution to this problem for
projects managed using git. The way they work is that in your tree,
you check in a commit object. This represents a checked out
repository at that location. The tree also has a file called the
.gitmodules file, which specifies where the repositories can be
cloned from.
Adding a new vendor dependency
To add a new vendor dependency, I can use
git submodule add; don’t
worry, I’ll explain what the options all mean:
    $ git submodule add --name github.com/lib/pq \
          git@github.com:lib/pq vendor/github.com/lib/pq
    Cloning into 'vendor/github.com/lib/pq'...
    remote: Counting objects: 1377, done.
    remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
    Receiving objects: 100% (1377/1377), 598.03 KiB | 162.00 KiB/s, done.
    Resolving deltas: 100% (841/841), done.
    Checking connectivity... done.
    $
What this will do is clone the repository at
git@github.com:lib/pq
to the path
vendor/github.com/lib/pq. The --name part is
somewhat important. It doesn’t matter what you use here, but it
shouldn’t change over the lifetime of your project.
You can see what is due to commit with
git status and commit normally:
    $ git status
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            modified:   .gitmodules
            new file:   vendor/github.com/lib/pq

    $ git commit -m "Vendor github.com/lib/pq"
    [readme-update 5e2afb1] Add dependency on go-randomdata
     2 files changed, 4 insertions(+)
     create mode 160000 vendor/github.com/Pallinder/go-randomdata
    $
Under the hood: git submodule mechanics
Git’s better understood with reference to its very simple inner
workings, so let’s look at the
.gitmodules file that gave us:
    [submodule "github.com/lib/pq"]
        path = vendor/github.com/lib/pq
        url = git@github.com:lib/pq
OK, that’s just recorded the options we gave. Now let’s look at the checkout:
    $ cd vendor/github.com/lib/pq/
    $ ls -a
    .                bench_test.go  encode.go       ssl_test.go
    ..               buf.go         encode_test.go  url.go
    .git             certs          error.go        url_test.go
    .gitignore       conn.go        hstore          user_posix.go
    .travis.yml      conn_test.go   listen_example  user_windows.go
    CONTRIBUTING.md  copy.go        notify.go
    LICENSE.md       copy_test.go   notify_test.go
    README.md        doc.go         oid
    $
Notice that the .git path is a regular file, not a directory! Let’s look at it:
    $ cat .git
    gitdir: ../../../../.git/modules/github.com/lib/pq
Sure enough, if you follow that path, you’ll arrive at the git repository for this path:
    $ cd ../../../../.git/modules/github.com/lib/pq
    $ ls -a
    .   HEAD    description  hooks  info  objects      refs
    ..  config  gitdir       index  logs  packed-refs
    $
The path under
modules is exactly what you specified with --name to the
git submodule add command. This path is the local,
symbolic name for the dependency. Even if you switch to another fork
of a dependency, or move the checkout to a different location in your
tree, it’s worth keeping this the same.
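To tie those pieces together, here is a small sketch that walks a project's .gitmodules file and prints each submodule's symbolic name, its checkout path, and the git directory its .git file points at. The function name describe_submodules is invented for this example; it relies only on git config -f to parse the file.

```shell
# describe_submodules [TOPLEVEL]
# Hypothetical helper: prints "name<TAB>path<TAB>gitdir" for every
# submodule listed in TOPLEVEL/.gitmodules (default: current directory).
describe_submodules() {
    top=${1:-.}
    # Each matching line looks like: submodule.<name>.path <path>
    git config -f "$top/.gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        name=${key#submodule.}
        name=${name%.path}
        gitfile=$top/$path/.git
        if [ -f "$gitfile" ]; then
            # A submodule's .git is a one-line file: "gitdir: <path>"
            gitdir=$(sed -n 's/^gitdir: //p' "$gitfile")
        elif [ -d "$gitfile" ]; then
            gitdir=$gitfile          # a .git directory stored in place
        else
            gitdir='(not checked out)'
        fi
        printf '%s\t%s\t%s\n' "$name" "$path" "$gitdir"
    done
}
```

Run from the top of the project as `describe_submodules`, this should print one line per vendored dependency, which makes it easy to spot a name that no longer matches its path.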
The initial clone
With vendored submodules, when you first clone, you’ll need to either
pass --recursive to git clone, or run
git submodule init followed by git submodule update, to clone all the
dependent versions of modules; here it is, assuming that you are
testing using a branch called
git-vendoring instead of master:
    $ git clone --recursive -b git-vendoring git@github.com:cutesyname/yourproject
    Cloning into 'yourproject'...
    remote: Counting objects: 35331, done.
    remote: Compressing objects: 100% (12/12), done.
    remote: Total 35331 (delta 8), reused 3 (delta 3), pack-reused 35316
    Receiving objects: 100% (35331/35331), 22.23 MiB | 300.00 KiB/s, done.
    Resolving deltas: 100% (24082/24082), done.
    Checking connectivity... done.
    Submodule 'github.com/lib/pq' (git@github.com:lib/pq) registered for path 'vendor/github.com/lib/pq'
    Cloning into 'vendor/github.com/lib/pq'...
    remote: Counting objects: 1377, done.
    remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
    Receiving objects: 100% (1377/1377), 598.03 KiB | 509.00 KiB/s, done.
    Resolving deltas: 100% (841/841), done.
    Checking connectivity... done.
    Submodule path 'vendor/github.com/lib/pq': checked out 'dc50b6ad2d3ee836442cf3389009c7cd1e64bb43'
    $
This new clone also has the dependency checked out and ready to use.
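If you'd like to try the whole cycle without touching a real project, the following throwaway script may help. It is only a sketch: the repository names, the example.com/demo/dep import path and the g wrapper are all made up, everything lives under a temporary directory, and the protocol.file.allow setting is there because recent git versions refuse to clone submodules from local paths by default.

```shell
# Scratch area; nothing outside this directory is touched.
work=$(mktemp -d)

# Wrapper supplying a commit identity and permitting local-path
# submodule clones (an assumption about your git version's defaults).
g() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c protocol.file.allow=always "$@"
}

# 1. A stand-in dependency with a single commit.
g init -q "$work/dep"
echo 'package dep' > "$work/dep/dep.go"
g -C "$work/dep" add dep.go
g -C "$work/dep" commit -q -m 'Initial commit'

# 2. A superproject that vendors the dependency as a submodule.
g init -q "$work/super"
g -C "$work/super" submodule --quiet add --name example.com/demo/dep \
    "$work/dep" vendor/example.com/demo/dep
g -C "$work/super" commit -q -m 'Vendor example.com/demo/dep'

# 3. A plain clone leaves the vendored directory empty...
g clone -q "$work/super" "$work/clone"
ls -A "$work/clone/vendor/example.com/demo/dep"

# 4. ...until the submodule is initialized and checked out.
g -C "$work/clone" submodule --quiet update --init
ls -A "$work/clone/vendor/example.com/demo/dep"
```

The first ls should print nothing and the second should list the dependency's files, which is the difference between a plain clone and a --recursive one.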
Switching branches (with different dependency versions)
As you switch branches, the dependencies are generally not
automatically switched. However, you can easily see that this is the
case with git status, and switch to the recorded one using
git submodule update:
    $ git checkout olderversion
    On branch olderversion
    Your branch is up-to-date with 'origin/olderversion'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   vendor/github.com/lib/pq (new commits)

    no changes added to commit (use "git add" and/or "git commit -a")
    $ git submodule update
    Submodule path 'vendor/github.com/lib/pq': checked out '5e3230b4aee4ae51bfd11634f6592e12936b6145'
    $ git status
    On branch olderversion
    Your branch is up-to-date with 'origin/olderversion'.
    nothing to commit, working directory clean
    $
There’s currently no way to make this automatic, but adding a
git submodule update command to a
post-checkout hook should be
relatively safe. Continuous Integration builds where you don’t care
about changes in your local checkout should probably use
git submodule update --force.
One of the great things about this approach is that the above commands will typically execute in well under a second if run again on the same checkout, which works well with (for example) Circle CI build directory caching.
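A post-checkout hook along those lines could be installed with something like the sketch below. The install_post_checkout_hook name is invented here; the hook itself relies on git passing three arguments to post-checkout, the third being 1 for a branch checkout and 0 for a file checkout.

```shell
# install_post_checkout_hook HOOKS_DIR
# Hypothetical helper: writes a post-checkout hook that re-syncs
# submodules to the commits recorded on the branch just checked out.
install_post_checkout_hook() {
    hooks_dir=$1
    mkdir -p "$hooks_dir"
    cat > "$hooks_dir/post-checkout" <<'EOF'
#!/bin/sh
# Arguments from git: $1 = previous HEAD, $2 = new HEAD,
# $3 = 1 for a branch checkout, 0 for a file checkout.
[ "$3" = 1 ] || exit 0
exec git submodule update
EOF
    chmod +x "$hooks_dir/post-checkout"
}
```

From the top of a checkout, `install_post_checkout_hook .git/hooks` would set it up. Hooks are local to each clone, so every developer who wants this behaviour has to install it themselves.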
Updating a vendored dependency
To see how up to date your dependencies are, first fetch them all
using git submodule foreach, and then use
git submodule status:
    $ git submodule foreach git fetch
    Entering 'vendor/github.com/lib/pq'
    $ git submodule status
     dc50b6ad2d3ee836442cf3389009c7cd1e64bb43 vendor/github.com/lib/pq (go1.0-cutoff-56-gdc50b6a)
    $
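The first character of each git submodule status line carries extra information: a space means the checkout matches the recorded commit, - means the submodule is not initialized, + means a different commit is checked out, and U means there are merge conflicts. As a sketch, a filter like the following (check_submodule_status is a name invented for this example) could turn that into a pass or fail check for scripts:

```shell
# check_submodule_status
# Hypothetical helper: reads `git submodule status` output on stdin,
# reports every submodule that is out of sync, and fails if any is.
check_submodule_status() {
    rc=0
    while IFS= read -r line; do
        rest=${line#?}        # drop the one-character state flag
        path=${rest#* }       # drop the commit id
        path=${path% (*}      # drop the trailing "(describe)" part, if any
        case $line in
            ' '*) ;;          # in sync with the recorded commit
            -*) echo "$path: not initialized"; rc=1 ;;
            +*) echo "$path: checked-out commit differs from the recorded one"; rc=1 ;;
            U*) echo "$path: merge conflicts"; rc=1 ;;
        esac
    done
    return $rc
}
```

It would be used as `git submodule status | check_submodule_status`, for instance as an early step in a build script.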
Want to switch to a release version? Check it out, then git add and commit the change:
    $ cd vendor/github.com/lib/pq
    $ git checkout go1.0-cutoff
    Previous HEAD position was dc50b6a... Also send prepared statements' parameters over in binary
    HEAD is now at 5da8732... Add Jonathan Rudenberg to the list of contributors
    $ cd ..
    $ git add pq
    $ git commit -m "Pin libpq at 'go1.0-cutoff' tag"
    [vendor-experiment ab799ab] Pin libpq at 'go1.0-cutoff' tag
     1 file changed, 1 insertion(+), 1 deletion(-)
    $
Investigating changes by submodule updates
If you set
diff.submodule to log, then you can see the changes
in submodules when you use
git log -p:
    $ git config --global diff.submodule log
    $ git log -1 -p | head
    commit ab799abfc61ccdddfe4cd8d1cad4327ffd6a9cc7
    Author: Sam Vilain <firstname.lastname@example.org>
    Date:   Mon Jan 25 15:53:44 2016 -0800

        Pin libpq at 'go1.0-cutoff' tag

    Submodule vendor/github.com/lib/pq dc50b6a..5da8732 (rewind):
      < Also send prepared statements' parameters over in binary
      < Add Chris Gilling to the list of contributors
      < Implement driver option binary_parameters
    $
This git log output is just scratching the surface of what could be
done here (as of Git 2.7.0, anyway). There’s enough information here
for GUIs to do clever things like overlay the submodule project
history with the superproject. It’s easy to imagine improvements to
this, such as options showing a
git diff --stat of the submodule,
etc. Of course, vendoring-specific tooling could also do this, but by
using submodules you benefit from any generic software that is
written, not just tools aimed at Go vendoring.
If you need to switch a dependency to a different fork, then just change
the URL in the
.gitmodules file, and use
git submodule sync to
fix up the remote; then you can
git submodule update as before and
git add the dependency which contains the fix you need.
This isn’t seamless; the
git submodule sync command will need to
be issued by people who switch branches to a version with a new fork
the first time (but not thereafter, so long as your fork also tracks
the original fork). But it does work, and you have resilience from
things like the original repository disappearing: if the version your
project needs is already in the clone, it does not need to use the
network at all.
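Rather than editing .gitmodules by hand, the URL can be changed with git config -f, which keeps the file's formatting valid. The helper below is only a sketch: set_submodule_url is an invented name, and the fork URL in the usage comment is a placeholder.

```shell
# set_submodule_url NAME URL [FILE]
# Hypothetical helper: points the submodule called NAME at a new URL by
# rewriting its entry in FILE (default: .gitmodules). The name and path
# are left untouched, so the local symbolic name stays stable.
set_submodule_url() {
    name=$1
    url=$2
    file=${3:-.gitmodules}
    git config -f "$file" "submodule.$name.url" "$url"
}

# Typical sequence, run from the top of the superproject:
#   set_submodule_url github.com/lib/pq git@github.com:yourfork/pq
#   git submodule sync      # copy the new URL into the submodule's remote
#   git submodule update    # as before
```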
Adding missing dependencies
You can spot your local dependencies which are not vendored
by looking in your
$GOPATH for modules which were checked out.
You can also use
go build -a -v to call out any dependencies which
are not already vendored:
    $ eval "$(go env)"  # set GOROOT
    $ go build -a -v 2>&1 |
          grep -v 'github.com/cutesyname/yourproject' |
          while read -r dir
          do
              # report anything that is neither vendored nor in the standard library
              [ -d "vendor/$dir" ] || [ -d "$GOROOT/src/$dir" ] ||
                  echo "$dir is not stdlib, nor vendored"
          done
At least one person on the project has to know not to use
go get,
but instead to use the
git submodule add command.
If the dependencies also have dependencies, all you have to do is fork
the upstream, add vendoring to the project as you did to your own, and
then make a pull request against the original upstream. (Just
kidding. Go finds your dependencies’ imported modules if they are in
your vendor directory.)
Using git submodule alone without any extra scripts or tooling is
not currently for the faint of heart, but unlike the dim and distant
past of the future, does basically work.
In the early days of git, a lot of people used to write wrappers around core git functionality: things like making branches, fetching, updating branches and copying. The cogito tool was an early example of this. While cogito moved the needle forward and invented many concepts and features later added to core git (remote tracking branches and history rewriting, to name but two), most of these wrappers are just dead-end scripts serving only to illustrate the conceptual model of authoring software that the writer possesses. They rarely add anything, and have tended to become less and less necessary as people become more familiar with distributed version control and as git usability features have been added.
The lesson is, if a git feature sucks, but works, then use it anyway and hopefully someone will eventually contribute code to core to make it better. Perhaps that person will be you!