Keeping Git Branches in Sync (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 05 Aug 2017

Keeping Git Branches in Sync

I do most of my coding on my home machine. But when I travel (or sit in boring meetings), sometimes I do a little hacking on my laptop. Most of my code is hosted in GitHub repos, so when I travel, I like to update all the repos on the laptop to make sure I have what I need even when I'm offline.

That works great as long as I don't make branches. I have a variable $myrepos that lists all the github repositories where I want to contribute, and with a little shell alias it's easy enough to update them all:

allgit() {
    pushd ~
    foreach repo ($myrepos)
        echo $repo :
        cd ~/src/$repo
        git pull
    end
    popd
}

That works well enough -- as long as you don't use branches.

Git's branch model seems to be that branches are for local development, and aren't meant to be shared, pushed, or synchronized among machines. It's ridiculously difficult in git to do something like, "for all branches on the remote server, make sure I have that branch and it's in sync with the server." When you create branches, they don't push to the server by default, and it's remarkably difficult to figure out which of your branches is actually tracking a branch on the server.

A web search finds plenty of people asking, and most of the Git experts answering say things like "Just check out the branch, then pull." In other words, if you want to work on a branch, you'd better know before you go offline exactly which branches in which repositories might have been created or updated since the last time you worked in that repository on that machine. I guess that works if you only ever work on one project in one repo and only on one or two branches at a time. It certainly doesn't work if you need to update lots of repos on a laptop for the first time in two weeks.

Further web searching does find a few possibilities. For checking whether there are files modified that need to be committed, git status --porcelain -uno works well. For checking whether changes are committed but not pushed, git for-each-ref --format="%(refname:short) %(push:track)" refs/heads | fgrep '[ahead' works ... if you make an alias so you never have to look at it.

Figuring out whether branches are tracking remotes is a lot harder. I found some recommendations like git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done and for remote in `git branch -r`; do git branch --track ${remote#origin/} $remote; done but neither of them really did what I wanted. I was chasing down the rabbit hole of writing shell loops using variables like

  localbranches=("${(@f)$(git branch | sed 's/..//')}")
  remotebranches=("${(@f)$(git branch -a | grep remotes | grep -v HEAD | grep -v master | sed 's_remotes/origin/__' | sed 's/..//')}")
when I thought, there must be a better way. Maybe using Python bindings?

git-python

In Debian, the available packages for Git Python bindings are python-git, python-pygit2, and python-dulwich. Nobody on #python seemed to like any of them, but based on quick attempts with all three, python-git seemed the most straightforward. Confusingly, though Debian calls it python-git, it's called "git-python" in its docs or in web searches, and it's "import git" when you use it.

It's pretty straightforward to use, at least for simple things. You can create a Repo object with

from git import Repo
repo = Repo('.')
and then you can get lists like repo.heads (local branches), repo.refs (local and remote branches and other refs such as tags), etc. Once you have a ref, you can use ref.name, check whether it's tracking a remote branch with ref.tracking_branch(), and make it track one with ref.set_tracking_branch(remoteref). That makes it very easy to get a list of branches showing which ones are tracking a remote branch, something that had proved almost impossible with the git command line.

Nice. But now I wanted more: I wanted to replace those baroque git status --porcelain and git for-each-ref commands I had been using to check whether my repos needed committing or pushing. That proved harder.

Checking for uncommitted files, I decided it would be easiest stick with the existing git status --porcelain -uno. Which was sort of true. git-python lets you call git commands, for cases where the Python bindings aren't quite up to snuff yet, but it doesn't handle all cases. I could call:

    output = repo.git.status(porcelain=True)
but I never did find a way to pass the -uno; I tried u=False, u=None, and u="no" but none of them worked. But -uno actually isn't that important so I decided to do without it.

I found out later that there's another way to call the git command, using execute, which lets you pass the exact arguments you'd pass on the command line. It didn't work to call for-each-ref the way I'd called repo.git.status (repo.git.for_each_ref isn't defined), but I could call it this way:

    foreachref = repo.git.execute(['git', 'for-each-ref',
                                   '--format="%(refname:short) %(push:track)"',
                                   'refs/heads'])
and then parse the output looking for "[ahead]". That worked, but ... ick. I wanted to figure out how to do that using Python.

It's easy to get a ref (branch) and its corresponding tracking ref (remote branch). ref.log() gives you a list of commits on each of the two branches, ordered from earliest to most recent, the opposite of git log. In the simple case, then, what I needed was to iterate backward over the two commit logs, looking for the most recent SHA that's common to both. The Python builtin reversed was useful here:

    for i, entry in enumerate(reversed(ref.log())):
        for j, upstream_entry in enumerate(reversed(upstream.log())):
            if entry.newhexsha == upstream_entry.newhexsha:
                return i, j

(i, j) are the number of commits on the local branch that the remote hasn't seen, and vice versa. If i is zero, or if there's nothing in ref.log(), then the repo has no new commits and doesn't need pushing.

Making branches track a remote

The last thing I needed to do was to make branches track their remotes. Too many times, I've found myself on the laptop, ready to work, and discovered that I didn't have the latest code because I'd been working on a branch on my home machine, and my git pull hadn't pulled the info for the branch because that branch wasn't in the laptop's repo yet. That's what got me started on this whole "update everything" script in the first place.

If you have a ref for the local branch and a ref for the remote branch, you can verify their ref.name is the same, and if the local branch has the same name but isn't tracking the remote branch, probably something went wrong with the local repo (like one of my earlier attempts to get branches in sync, and it's an easy fix: ref.set_tracking_branch(remoteref).

But what if the local branch doesn't exist yet? That's the situation I cared about most, when I've been working on a new branch and it's not on the laptop yet, but I'm going to want to work on it while traveling. And that turned out to be difficult, maybe impossible, to do in git-python.

It's easy to create a new local branch: repo.head.create(repo, name). But that branch gets created as a copy of master, and if you try to turn it into a copy of the remote branch, you get conflicts because the branch is ahead of the remote branch you're trying to copy, or vice versa. You really need to create the new branch as a copy of the remote branch it's supposed to be tracking.

If you search the git-python documentation for ref.create, there are references to "For more documentation, please see the Head.create method." Head.create takes a reference argument (the basic ref.create doesn't, though the documentation suggests it should). But how can you call Head.create? I had no luck with attempts like repo.git.Head.create(repo, name, reference=remotebranches[name]).

I finally gave up and went back to calling the command line from git-python.

repo.git.checkout(remotebranchname, b=name)
I'm not entirely happy with that, but it seems to work.

I'm sure there are all sorts of problems left to solve. But this script does a much better job than any git command I've found of listing the branches in my repositories, checking for modifications that require commits or pushes, and making local branches to mirror new branches on the server. And maybe with time the git-python bindings will improve, and eventually I'll be able to create new tracking branches locally without needing the command line.

The final script, such as it is: gitbranchsync.py.

Tags: , ,
[ 14:39 Aug 05, 2017    More programming | permalink to this entry | comments ]
(Commenting requires Javascript from ShallowSky.com and Disqus.com, and a cookie from Disqus.com.)
blog comments powered by Disqus