Shallow Thoughts : tags : git

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 05 Aug 2017

Keeping Git Branches in Sync

I do most of my coding on my home machine. But when I travel (or sit in boring meetings), sometimes I do a little hacking on my laptop. Most of my code is hosted in GitHub repos, so when I travel, I like to update all the repos on the laptop to make sure I have what I need even when I'm offline.

That works great as long as I don't make branches. I have a variable $myrepos that lists all the github repositories where I want to contribute, and with a little shell alias it's easy enough to update them all:

allgit() {
    pushd ~
    foreach repo ($myrepos)
        echo $repo :
        cd ~/src/$repo
        git pull
    end
    popd
}

That works well enough -- as long as you don't use branches.

Git's branch model seems to be that branches are for local development, and aren't meant to be shared, pushed, or synchronized among machines. It's ridiculously difficult in git to do something like, "for all branches on the remote server, make sure I have that branch and it's in sync with the server." When you create branches, they don't push to the server by default, and it's remarkably difficult to figure out which of your branches is actually tracking a branch on the server.

A web search finds plenty of people asking, and most of the Git experts answering say things like "Just check out the branch, then pull." In other words, if you want to work on a branch, you'd better know before you go offline exactly which branches in which repositories might have been created or updated since the last time you worked in that repository on that machine. I guess that works if you only ever work on one project in one repo and only on one or two branches at a time. It certainly doesn't work if you need to update lots of repos on a laptop for the first time in two weeks.

Further web searching does find a few possibilities. For checking whether there are files modified that need to be committed, git status --porcelain -uno works well. For checking whether changes are committed but not pushed, git for-each-ref --format="%(refname:short) %(push:track)" refs/heads | fgrep '[ahead' works ... if you make an alias so you never have to look at it.

Figuring out whether branches are tracking remotes is a lot harder. I found some recommendations like git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done and for remote in `git branch -r`; do git branch --track ${remote#origin/} $remote; done but neither of them really did what I wanted. I was chasing down the rabbit hole of writing shell loops using variables like

  localbranches=("${(@f)$(git branch | sed 's/..//')}")
  remotebranches=("${(@f)$(git branch -a | grep remotes | grep -v HEAD | grep -v master | sed 's_remotes/origin/__' | sed 's/..//')}")
when I thought, there must be a better way. Maybe using Python bindings?

git-python

In Debian, the available packages for Git Python bindings are python-git, python-pygit2, and python-dulwich. Nobody on #python seemed to like any of them, but based on quick attempts with all three, python-git seemed the most straightforward. Confusingly, though Debian calls it python-git, it's called "git-python" in its docs or in web searches, and it's "import git" when you use it.

It's pretty straightforward to use, at least for simple things. You can create a Repo object with

from git import Repo
repo = Repo('.')
and then you can get lists like repo.heads (local branches), repo.refs (local and remote branches and other refs such as tags), etc. Once you have a ref, you can use ref.name, check whether it's tracking a remote branch with ref.tracking_branch(), and make it track one with ref.set_tracking_branch(remoteref). That makes it very easy to get a list of branches showing which ones are tracking a remote branch, something that had proved almost impossible with the git command line.

Nice. But now I wanted more: I wanted to replace those baroque git status --porcelain and git for-each-ref commands I had been using to check whether my repos needed committing or pushing. That proved harder.

Checking for uncommitted files, I decided it would be easiest stick with the existing git status --porcelain -uno. Which was sort of true. git-python lets you call git commands, for cases where the Python bindings aren't quite up to snuff yet, but it doesn't handle all cases. I could call:

    output = repo.git.status(porcelain=True)
but I never did find a way to pass the -uno; I tried u=False, u=None, and u="no" but none of them worked. But -uno actually isn't that important so I decided to do without it.

I found out later that there's another way to call the git command, using execute, which lets you pass the exact arguments you'd pass on the command line. It didn't work to call for-each-ref the way I'd called repo.git.status (repo.git.for_each_ref isn't defined), but I could call it this way:

    foreachref = repo.git.execute(['git', 'for-each-ref',
                                   '--format="%(refname:short) %(push:track)"',
                                   'refs/heads'])
and then parse the output looking for "[ahead]". That worked, but ... ick. I wanted to figure out how to do that using Python.

It's easy to get a ref (branch) and its corresponding tracking ref (remote branch). ref.log() gives you a list of commits on each of the two branches, ordered from earliest to most recent, the opposite of git log. In the simple case, then, what I needed was to iterate backward over the two commit logs, looking for the most recent SHA that's common to both. The Python builtin reversed was useful here:

    for i, entry in enumerate(reversed(ref.log())):
        for j, upstream_entry in enumerate(reversed(upstream.log())):
            if entry.newhexsha == upstream_entry.newhexsha:
                return i, j

(i, j) are the number of commits on the local branch that the remote hasn't seen, and vice versa. If i is zero, or if there's nothing in ref.log(), then the repo has no new commits and doesn't need pushing.

Making branches track a remote

The last thing I needed to do was to make branches track their remotes. Too many times, I've found myself on the laptop, ready to work, and discovered that I didn't have the latest code because I'd been working on a branch on my home machine, and my git pull hadn't pulled the info for the branch because that branch wasn't in the laptop's repo yet. That's what got me started on this whole "update everything" script in the first place.

If you have a ref for the local branch and a ref for the remote branch, you can verify their ref.name is the same, and if the local branch has the same name but isn't tracking the remote branch, probably something went wrong with the local repo (like one of my earlier attempts to get branches in sync, and it's an easy fix: ref.set_tracking_branch(remoteref).

But what if the local branch doesn't exist yet? That's the situation I cared about most, when I've been working on a new branch and it's not on the laptop yet, but I'm going to want to work on it while traveling. And that turned out to be difficult, maybe impossible, to do in git-python.

It's easy to create a new local branch: repo.head.create(repo, name). But that branch gets created as a copy of master, and if you try to turn it into a copy of the remote branch, you get conflicts because the branch is ahead of the remote branch you're trying to copy, or vice versa. You really need to create the new branch as a copy of the remote branch it's supposed to be tracking.

If you search the git-python documentation for ref.create, there are references to "For more documentation, please see the Head.create method." Head.create takes a reference argument (the basic ref.create doesn't, though the documentation suggests it should). But how can you call Head.create? I had no luck with attempts like repo.git.Head.create(repo, name, reference=remotebranches[name]).

I finally gave up and went back to calling the command line from git-python.

repo.git.checkout(remotebranchname, b=name)
I'm not entirely happy with that, but it seems to work.

I'm sure there are all sorts of problems left to solve. But this script does a much better job than any git command I've found of listing the branches in my repositories, checking for modifications that require commits or pushes, and making local branches to mirror new branches on the server. And maybe with time the git-python bindings will improve, and eventually I'll be able to create new tracking branches locally without needing the command line.

The final script, such as it is: gitbranchsync.py.

Tags: , ,
[ 14:39 Aug 05, 2017    More programming | permalink to this entry | ]

Mon, 23 Jan 2017

Testing a GitHub Pull Request

Several times recently I've come across someone with a useful fix to a program on GitHub, for which they'd filed a GitHub pull request.

The problem is that GitHub doesn't give you any link on the pull request to let you download the code in that pull request. You can get a list of the checkins inside it, or a list of the changed files so you can view the differences graphically. But if you want the code on your own computer, so you can test it, or use your own editors and diff tools to inspect it, it's not obvious how. That this is a problem is easily seen with a web search for something like download github pull request -- there are huge numbers of people asking how, and most of the answers are vague unclear.

That's a shame, because it turns out it's easy to pull a pull request. You can fetch it directly with git into a new branch as long as you have the pull request ID. That's the ID shown on the GitHub pull request page:

[GitHub pull request screenshot]

Once you have the pull request ID, choose a new name for your branch, then fetch it:

git fetch origin pull/PULL-REQUEST_ID/head:NEW-BRANCH-NAME
git checkout NEW-BRANCH-NAME

Then you can view diffs with something like git difftool NEW-BRANCH-NAME..master

Easy! GitHub should give a hint of that on its pull request pages.

Fetching a Pull Request diff to apply it to another tree

But shortly after I learned how to apply a pull request, I had a related but different problem in another project. There was a pull request for an older repository, but the part it applied to had since been split off into a separate project. (It was an old pull request that had fallen through the cracks, and as a new developer on the project, I wanted to see if I could help test it in the new repository.)

You can't pull a pull request that's for a whole different repository. But what you can do is go to the pull request's page on GitHub. There are 3 tabs: Conversation, Commits, and Files changed. Click on Files changed to see the diffs visually.

That works if the changes are small and only affect a few files (which fortunately was the case this time). It's not so great if there are a lot of changes or a lot of files affected. I couldn't find any "Raw" or "download" button that would give me a diff I could actually apply. You can select all and then paste the diffs into a local file, but you have to do that separately for each file affected. It might be, if you have a lot of files, that the best solution is to check out the original repo, apply the pull request, generate a diff locally with git diff, then apply that diff to the new repo. Rather circuitous. But with any luck that situation won't arise very often.

Update: thanks very much to Houz for the solution! (In the comments, below.) Just append .diff or .patch to the pull request URL, e.g. https://github.com/OWNER/REPO/pull/REQUEST-ID.diff which you can view in a browser or fetch with wget or curl.

Tags: , ,
[ 14:34 Jan 23, 2017    More programming | permalink to this entry | ]

Tue, 05 Apr 2016

Modifying a git repo so you can pull without a password

There's been a discussion in the GIMP community about setting up git repos to host contributed assets like scripts, plug-ins and brushes, to replace the long-stagnant GIMP Plug-in Repository. One of the suggestions involves having lots of tiny git repos rather than one that holds all the assets.

That got me to thinking about one annoyance I always have when setting up a new git repository on github: the repository is initially configured with an ssh URL, so I can push to it; but that means I can't pull from the repo without typing my ssh password (more accurately, the password to my ssh key).

Fortunately, there's a way to fix that: a git configuration can have one url for pulling source, and a different pushurl for pushing changes.

These are defined in the file .git/config inside each repository. So edit that file and take a look at the [remote "origin"] section.

For instance, in the GIMP source repositories, hosted on git.gnome.org, instead of the default of url = ssh://git.gnome.org/git/gimp I can set

pushurl = ssh://git.gnome.org/git/gimp
url = git://git.gnome.org/gimp
(disclaimer: I'm not sure this is still correct; my gnome git access stopped working -- I think it was during the Heartbleed security fire drill, or one of those -- and never got fixed.)

For GitHub the syntax is a little different. When I initially set up a repository, the url comes out something like url = git@github.com:username/reponame.git (sometimes the git@ part isn't included), and the password-free pull URL is something you can get from github's website. So you'll end up with something like this:

pushurl = git@github.com:username/reponame.git
url = https://github.com/username/reponame.git

Automating it

That's helpful, and I've made that change on all of my repos. But I just forked another repo on github, and as I went to edit .git/config I remembered what a pain this had been to do en masse on all my repos; and how it would be a much bigger pain to do it on a gazillion tiny GIMP asset repos if they end up going with that model and I ever want to help with the development. It's just the thing that should be scriptable.

However, the rules for what constitutes a valid git passwordless pull URL, and what constitutes a valid ssh writable URL, seem to encompass a lot of territory. So the quickie Python script I whipped up to modify .git/config doesn't claim to handle everything; it only handles the URLs I've encountered personally on Gnome and GitHub. Still, that should be useful if I ever have to add multiple repos at once. The script: repo-pullpush (yes, I know it's a terrible name) on GitHub.

Tags: , , ,
[ 12:28 Apr 05, 2016    More programming | permalink to this entry | ]

Fri, 04 Feb 2011

Quick tip: Disabling version control in Emacs

For some time I've been mildly annoyed that whenever I start emacs and open a file that's under any sort of version control -- cvs, svn, git or whatever -- I can't start editing right away, because emacs has to pause for a while and load a bunch of version-control cruft I never use. Sometimes it also causes problems later, when I try to write to the file or if I update the directory.

It wasn't obvious what keywords to search for, but I finally found a combination, emacs prevent OR disable autoload vc (the vc was the important part), which led me to the solution (found on this page):

;; Disable all version control
(setq vc-handled-backends nil)

Files load much faster now!

Tags: , , ,
[ 13:11 Feb 04, 2011    More linux/editors | permalink to this entry | ]

Tue, 02 Feb 2010

Configuring git colors

I spent a morning wrestling with git after writing a minor GIMP fix that I wanted to check in. Deceptively simple ideas, like "Check the git log to see the expected format of check-in messages", turned out to be easier said than done.

Part of the problem was git's default colors: colors calculated to be invisible to anyone using a terminal with dark text on a light background. And that sent me down the perilous path of git configuration.

git-config does have a manual page. But it lacks detail: you can't get from there to knowing what to change so that the first line of commits in git log doesn't show up yellow.

But that's okay, thought I: all I need to do is list the default settings, then change anything that's a light color like yellow to a darker color. Easy, right?

Well, no. It turns out there's no way to get the default settings -- because they aren't part of git's config; they're hardwired into the C code.

But you can find most of them with a seach for GIT_COLOR in the source. The most useful lines are these the ones in diff.c, builtin-branch.c and wt-status.c.

gitconfig

The next step is to translate those C lines to git preferences, something you can put in a .gitconfig. Here's a list of all the colors mentioned in the man page, and their default values -- I used "normal" for grep and interactive where I wasn't sure of the defaults.

[color "diff"]
	plain = normal
	meta = bold
	frag = cyan
	old = red
	new = green
	commit = yellow
	whitespace = normal red
[color "branch"]
	current = green
	local = normal
	remote = red
	plain = normal
[color "status"]
	header = normal
	added = red
	updated = green
	changed = red
	untracked = red
	nobranch = red
[color "grep"]
	match = normal
[color "interactive"]
	prompt = normal
	header = normal
	help = normal
	error = normal

The syntax and colors are fairly clearly explained in the manual: allowable colors are normal, black, red, green, yellow, blue, magenta, cyan and white. After the foreground color, you can optionally list a background color. You can also list an attribute, chosen from bold, dim, ul, blink and reverse -- only one at a time, no combining of attributes.

So if you really wanted to, you could say something like

[color "status"]
	header = normal blink
	added = magenta yellow
	updated = green reverse
	changed = red bold
	untracked = blue white
	nobranch = red white bold

Minimal changes for light backgrounds

What's the minimum you need to get everything readable? On the light grey background I use, I needed to change the yellow, cyan and green entries:

[color "diff"]
	frag = cyan
	new = green
	commit = yellow
[color "branch"]
	current = green
[color "status"]
	updated = green

Disclaimer: I haven't tested all these settings -- because I haven't yet figured out where all of them apply. That's another area where the manual is a bit short on detail ...

Tags: , ,
[ 23:26 Feb 02, 2010    More programming | permalink to this entry | ]