Shallow Thoughts : : programming

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sun, 23 Sep 2018

Writing Solar System Simulations with NAIF SPICE and SpiceyPy

Someone asked me about my Javascript Jupiter code, and whether it used PyEphem. It doesn't, of course, because it's Javascript, not Python (I wish there was something as easy as PyEphem for Javascript!); instead it uses code from the book Astronomical Formulae for Calculators by Jean Meeus. (His better known Astronomical Algorithms, intended for computers rather than calculators, is actually harder to use for programming because Astronomical Algorithms is written for BASIC and the algorithms are relatively hard to translate into other languages, whereas Astronomical Formulae for Calculators concentrates on explaining the algorithms clearly, so you can punch them into a calculator by hand, and this ends up making it fairly easy to implement them in a modern computer language as well.)

Anyway, the person asking also mentioned JPL's HORIZONS Ephemerides page, which I've certainly found useful at times. Years ago, I tried emailing the site maintainer asking if they might consider releasing the code as open source; it seemed like a reasonable request, given that it came from a government agency and didn't involve anything secret. But I never got an answer.

[SpiceyPy example: Cassini's position] But going to that page today, I find that code is now available! What's available is a massive toolkit called SPICE (it's all in capitals, but there's no indication what it might stand for). It comes from NAIF, which is NASA's Navigation and Ancillary Information Facility.

SPICE allows for accurate calculations of all sorts of solar system quantities, from the basic solar system bodies like planets to all of NASA's active and historical public missions. It has bindings for quite a few languages, including C. The official list doesn't include Python, but there's a third-party Python wrapper called SpiceyPy that works fine.

The tricky part of programming with SPICE is that most of the code is hidden away in "kernels" that are specific to the objects and quantities you're calculating. For any given program you'll probably need to download at least four "kernels", maybe more. That wouldn't be a problem except that there's not much help for figuring out which kernels you need and then finding them. There are lots of SPICE examples online but few of them tell you which kernels they need, let alone where to find them.

After wrestling with some of the examples, I learned some tricks for finding kernels, at least enough to get the basic examples working. I've collected what I've learned so far into a new GitHub repository: NAIF SPICE Examples. The README there explains what I know so far about getting kernels; as I learn more, I'll update it.

SPICE isn't easy to use, but it's probably much more accurate than simpler code like PyEphem or my Meeus-based Javascript code, and it can calculate so many more objects. It's definitely something worth knowing about for anyone doing solar system simulations.

[ 16:43 Sep 23, 2018    More programming | permalink to this entry | comments ]

Thu, 24 May 2018

Google Maps API No Longer Free?

A while ago I wrote an interactive trail map page for the PEEC nature center website. At the time, I wanted to use an open library, like OpenLayers or Leaflet; but there were no good sources of satellite/aerial map tiles at the time. The only one I found didn't work because they had a big blank area anywhere near LANL -- maybe because of the restricted airspace around the Lab. Anyway, I figured people would want a satellite option, so I used Google Maps instead despite its much more frustrating API.

This week we've been working on converting the website to https. Most things went surprisingly smoothly (though we had a lot more absolute URLs in our pages and databases than we'd realized). But when we got through, I discovered the trail map was broken. I'm still not clear why, but somehow the change from http to https made Google's API stop working. In trying to fix the problem, I discovered that Google's map API may soon cease to be free:

New pricing and product changes will go into effect starting June 11, 2018. For more information, check out the Guide for Existing Users.

That has a button for "Transition Tool" which, when you click it, won't tell you anything about the new pricing structure until you've already set up a billing account. Um ... no thanks, Google.

Googling for google maps api billing led to a page headed "Pricing that scales to fit your needs", which has an elaborate pricing structure listing a whole bunch of variants (I have no idea which of these I was using), of which the first $200/month is free. But since they insist on setting up a billing account, I'd probably have to give them a credit card number -- which one? My personal credit card, for a page that isn't even on my site? Does the nonprofit nature center even have a credit card? How many of these API calls is their site likely to get in a month, and what are the chances of going over the limit?

It all rubbed me the wrong way, especially in the context of "Your trail maps page that real people actually use has broken without warning, and will be held hostage until you give us a credit card number". This is what one gets for using a supposedly free (as in beer) library that's not Free open source software.

So I replaced Google with the excellent open source Leaflet library, which, as a bonus, has much better documentation than Google Maps. (It's not that Google's documentation is poorly written; it's that they keep changing their APIs, but there's no way to tell the dozen or so different APIs apart because they're all just called "Maps", so when you search for documentation you're almost guaranteed to get something that stopped working six years ago -- but the documentation is still there making it look like it's still valid.) And I was happy to discover that, in the time since I originally set up the trailmap page, some open providers of aerial/satellite map tiles have appeared. So we can use open source and have a satellite view.

Our trail map is back online with Leaflet, and with any luck, this time it will keep working. PEEC Los Alamos Area Trail Map.

[ 16:13 May 24, 2018    More programming | permalink to this entry | comments ]

Mon, 14 May 2018

Plotting the Jet Stream, or Other Winds, with ECMWF Data

I've been trying to learn more about weather from a friend who used to work in the field -- in particular, New Mexico's notoriously windy spring. One of the reasons behind our spring winds relates to the location of the jet stream. But I couldn't find many good references showing how the jet stream moves throughout the year. So I decided to try to plot it myself -- if I could find the data. Getting weather data can be surprisingly hard.

In my search, I stumbled across Geert Barentsen's excellent Annual variations in the jet stream (video). It wasn't quite what I wanted -- it shows the position of the jet stream in December in successive years -- but the important thing is that he provides a Python script on GitHub that shows how he produced his beautiful animation.

[Sample jet stream image]

Well -- mostly. It turns out his data sources are no longer available, and he didn't go into a lot of detail on where he got his data, only saying that it was from the ECMWF ERA re-analysis model (with a link that's now 404). That led me on a merry chase through the ECMWF website trying to figure out which part of which database I needed. ECMWF has lots of publicly available databases (and even more), Python libraries to access them, and a lot of documentation; but somehow none of the documentation addresses questions like which database includes which variables and how to find and fetch the data you're after, and a lot of the sample code doesn't actually work. I ended up using the "ERA Interim, Daily" dataset and requesting data for only specific times and only the variables and pressure levels I was interested in. It's a great source of data once you figure out how to request it.

Sign up for an ECMWF API Key

Start at Access ECMWF Public Datasets (there's also Access MARS and I'm not sure what the difference is), which has links you can click on to register for an API key.

Once you get the email with your initial password, log in using the URL in the email, and change the password. That gave me a "next" button that, when I clicked it, took me to a page warning me that the page was obsolete and I should update whatever bookmark I had used to get there. That page also doesn't offer a link to the new page where you can get your key details, so go here: Your API key. The API Key page gives you some lines you can paste into ~/.ecmwfapirc.

You'll also have to accept the license terms for the databases you want to use.

Install the Python API

That sets you up to use the ECMWF API. They have a Web API and a Python library, plus some other Python packages, but after struggling with a bunch of Magics tutorial examples that mostly crashed or couldn't find data, I decided I was better off sticking to the basic Python downloader API and plotting the results with Matplotlib.

The Python data-fetching API works well. To install it, activate your preferred Python virtualenv or whatever you use for pip packages, then run the pip command shown at Web API Downloads (under "Click here to see the installation/update instructions..."). As always with pip packages, you'll have to decide on a Python version (they support both 2 and 3) and whether to use a virtualenv, the much-disrecommended sudo pip, pip3, etc. I used pip3 in a virtualenv and it worked fine.

Specify a dataset and parameters

That's great, but how do you know which dataset you want to load?

There doesn't seem to be anything that just lists which datasets have which variables. The only way I found is to go to the Web API page for a particular dataset to see the form where you can request different variables. For instance, I ended up using the "interim-full-daily" database, where you can choose date ranges and lists of parameters. There are more choices in the sidebar: for instance, clicking on "Pressure levels" lets you choose from a list of barometric pressures ranging from 1000 all the way down to 1. No units are specified, but they're millibars, also known as hectoPascals (hPa): 1000 is more or less the pressure at ground level, 250 is roughly where the jet stream is, and Los Alamos is roughly at 775 hPa (you can find charts of pressure vs. altitude on the web).
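
For a rough idea of which altitude a pressure level corresponds to, the standard-atmosphere barometric formula is good enough. The following is only a sketch: it assumes the International Standard Atmosphere (sea-level pressure of 1013.25 hPa, 15 °C at sea level, a lapse rate of 6.5 K per km), which real weather never matches exactly, and the function name is made up for illustration.

```python
# Rough altitude for a pressure level, assuming the International
# Standard Atmosphere (an assumption: real conditions vary day to day).
# Only meaningful in the troposphere, i.e. below about 11 km.

P0_HPA = 1013.25        # standard sea-level pressure, hPa
SCALE_M = 44330.77      # T0 / lapse rate = 288.15 K / 0.0065 K/m
EXPONENT = 0.190263     # R * lapse rate / (g * M)


def pressure_to_altitude_m(pressure_hpa):
    """Approximate altitude in meters for a pressure in hPa (millibars)."""
    return SCALE_M * (1.0 - (pressure_hpa / P0_HPA) ** EXPONENT)


for level in (1000, 775, 250):
    print("%5d hPa is roughly %6.0f m" % (level, pressure_to_altitude_m(level)))
```

With these assumptions, 775 hPa comes out at about 2200 m and 250 hPa at a bit over 10 km.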

When you go to any of the Web API pages, it will show you a dialog suggesting you read about Data retrieval efficiency, which you should definitely do if you're expecting to request a lot of data, then click on the details for the database you're using to find out how data is grouped in "tape files". For instance, in the ERA-interim database, tapes are grouped by date, so if you're requesting multiple parameters for multiple months, request all the parameters for a given month together, rather than making one request for level 250, another request for level 1000, etc.

Once you've checked the boxes for the data you want, you can fetch the data via the web interface, or click on "View the MARS request" to get parameters you can plug into a Python script.

If you choose the Python script option as I did, you can start with the basic data retrieval example. Use the second example, the one that uses 'format' : "netcdf", which will (eventually) give you a file ending in .nc.
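
As a sketch of how the pieces fit together, here is one way to build a request that asks for every level and parameter for a whole month at once, following the advice above about how ERA-Interim data is grouped. The key names and values are assumptions modeled on the kind of request the "View the MARS request" link displays; copy the real ones from the form for your dataset. The helper names build_request and fetch are hypothetical, and fetch needs the ecmwfapi package plus a working ~/.ecmwfapirc.

```python
import calendar


def build_request(year, month, levels, params, target):
    """Build one request dict covering a whole calendar month.

    All levels and parameters go into the same request, so the
    server only has to read each month's data once.
    """
    last_day = calendar.monthrange(year, month)[1]
    return {
        # Assumed values: replace with what the web form shows you.
        "class": "ei",
        "dataset": "interim",
        "stream": "oper",
        "type": "an",
        "expver": "1",
        "levtype": "pl",
        "levelist": "/".join(str(lev) for lev in levels),
        "param": "/".join(params),
        "date": "%04d-%02d-01/to/%04d-%02d-%02d"
                % (year, month, year, month, last_day),
        "time": "12:00:00",
        "step": "0",
        "grid": "0.75/0.75",
        "format": "netcdf",
        "target": target,
    }


def fetch(request):
    """Send the request to ECMWF (requires ecmwfapi and an API key)."""
    from ecmwfapi import ECMWFDataServer
    ECMWFDataServer().retrieve(request)


# 131.128 and 132.128 are the U and V wind components.
request = build_request(2015, 4, [250, 775, 1000],
                        ["131.128", "132.128"], "jetstream-2015-04.nc")
print(request["date"], request["levelist"])
```

Calling fetch(request) then does the actual download; looping over months and calling build_request once per month keeps each request to a single block of dates.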

Requesting a specific area

You can request only a limited area,

"area": "75/-20/10/60",

but they're not very forthcoming on the syntax of that, and it's particularly confusing since "75/-20/10/60" supposedly means "Europe". It's hard to figure out how those numbers as longitudes and latitudes correspond to Europe, which doesn't go down to 10 degrees latitude, let alone -20 degrees. The Post-processing keywords page gives more information: it's North/West/South/East, which still makes no sense for Europe, until you expand the Area examples tab on that page and find out that by "Europe" they mean Europe plus Saudi Arabia and most of North Africa.
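
To keep the North/West/South/East order straight, it can help to name the fields instead of juggling a bare string. This is a small sketch using only the standard library; Area, parse_area and make_area are hypothetical names, not part of any ECMWF package.

```python
from collections import namedtuple

# Field order matches the "area" keyword: North/West/South/East.
Area = namedtuple("Area", "north west south east")


def parse_area(area):
    """Split an "N/W/S/E" string into named floats (degrees)."""
    north, west, south, east = (float(x) for x in area.split("/"))
    return Area(north, west, south, east)


def make_area(north, west, south, east):
    """Format bounds as an area string.

    West longitudes and south latitudes are negative.
    """
    if south > north:
        raise ValueError("south edge is north of the north edge")
    return "%g/%g/%g/%g" % (north, west, south, east)


europe = parse_area("75/-20/10/60")
print(europe)
print(make_area(europe.north, europe.west, europe.south, europe.east))
```

Read this way, the "Europe" box runs from 75 degrees north down to 10 degrees north, and from 20 degrees west to 60 degrees east.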

Using the data: What's in it?

Once you have the data file, assuming you requested data in netcdf format, you can read the .nc file with the netCDF4 Python module, available as Debian package "python3-netcdf4" or via pip:

import netCDF4

data = netCDF4.Dataset('filename.nc')

But what's in that Dataset? Try running the preceding two lines in the interactive Python shell, then:

>>> for key in data.variables:
...   print(key)
... 
longitude
latitude
level
time
w
vo
u
v

You can find out more about a parameter, like its units, type, and shape (array dimensions). Let's look at "level":

>>> data['level']
<class 'netCDF4._netCDF4.Variable'>
int32 level(level)
    units: millibars
    long_name: pressure_level
unlimited dimensions: 
current shape = (3,)
filling on, default _FillValue of -2147483647 used

>>> data['level'][:]
array([ 250,  775, 1000], dtype=int32)

>>> type(data['level'][:])
<class 'numpy.ndarray'>

level has shape (3,): it's a one-dimensional array with three elements: 250, 775 and 1000. Those are the three levels I requested (from the web API and in my Python script). The units are millibars.

More complicated variables

How about something more complicated? u and v are the two components of wind speed.

>>> data['u']
<class 'netCDF4._netCDF4.Variable'>
int16 u(time, level, latitude, longitude)
    scale_factor: 0.002161405503194121
    add_offset: 30.095301438361684
    _FillValue: -32767
    missing_value: -32767
    units: m s**-1
    long_name: U component of wind
    standard_name: eastward_wind
unlimited dimensions: time
current shape = (30, 3, 241, 480)
filling on

u (v is the same) has a shape of (30, 3, 241, 480): it's a 4-dimensional array. Why? Looking at the numbers in the shape gives a clue. The second dimension has 3 rows: they correspond to the three levels, because there's a wind speed at every level. The first dimension has 30 rows: it corresponds to the dates I requested (the month of April 2015). I can verify that:

>>> data['time'].shape
(30,)

Sure enough, there are 30 times, so that's what the first dimension of u and v corresponds to. The other dimensions, presumably, are latitude and longitude. Let's check that:

>>> data['longitude'].shape
(480,)
>>> data['latitude'].shape
(241,)

Sure enough! So, although it would be nice if it actually told you which dimension corresponded with which parameter, you can probably figure it out. If you're not sure, print the shapes of all the variables and work out which dimensions correspond to what:

>>> for key in data.variables:
...   print(key, data[key].shape)
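
Since u and v are the two components of the wind, the usual next step is to combine them into a speed and a direction. Here is a minimal sketch for a single grid point, assuming u is the eastward component (as its standard_name says) and v is the northward one; the function names are made up for illustration.

```python
import math


def wind_speed(u, v):
    """Wind speed from its eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_from_direction(u, v):
    """Meteorological wind direction in degrees.

    This is the direction the wind blows FROM: 0 = north, 90 = east,
    180 = south, 270 = west.
    """
    return math.degrees(math.atan2(-u, -v)) % 360.0


# A 10 m/s wind blowing toward the east, i.e. a westerly:
print(wind_speed(10.0, 0.0), wind_from_direction(10.0, 0.0))
```

The same formula works on whole arrays if you swap in numpy.hypot, for example on the slices of data['u'] and data['v'] for one time and one level.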

Iterating over times

data['time'] has all the times for which you have data (30 data points for my initial test of the days in April 2015). The easiest way to plot anything is to iterate over those values:

    # data is the netCDF4.Dataset opened earlier
    timeunits = data['time'].units
    cal = data['time'].calendar
    for i, t in enumerate(data['time'][:]):
        thedate = netCDF4.num2date(t, units=timeunits, calendar=cal)

Then you can use thedate like a datetime, calling thedate.strftime or whatever you need.
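
If num2date feels like a black box, here is roughly what it does for the simplest case, using only the standard library. It assumes the file's time units are "hours since 1900-01-01 00:00:0.0" with a Gregorian calendar, which is an assumption to check against data['time'].units in your own file; hours_to_datetime is a hypothetical helper, and num2date handles many more unit strings and calendars than this.

```python
from datetime import datetime, timedelta

# Assumption: time values count hours from this epoch.
EPOCH = datetime(1900, 1, 1)


def hours_to_datetime(hours):
    """Convert "hours since 1900-01-01" to a datetime."""
    return EPOCH + timedelta(hours=float(hours))


thedate = hours_to_datetime(1010244)
print(thedate.strftime("%Y-%m-%d %H:%M"))
```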

So that's how to access your data. All that's left is to plot it -- and in this case I had Geert Barentsen's script to start with, so I just modified it a little to work with slightly changed data format, and then added some argument parsing and runtime options.

Converting to Video

I already wrote about how to take the still images the program produces and turn them into a video: Making Videos (that work in Firefox) from a Series of Images.

However, it turns out ffmpeg can't handle files that are named with timestamps, like jetstream-2017-06-14-250.png. It can only handle one sequential integer. So I thought, what if I removed the dashes from the name, and used names like jetstream-20170614-250.png with %8d? No dice: ffmpeg also has the limitation that the integer can have at most four digits.

So I had to rename my images. A shell command works: I ran this in zsh but I think it should work in bash too.

cd outdir
mkdir moviedir

i=1
for fil in *.png; do
  newname=$(printf "%04d.png" $i)
  ln -s ../$fil moviedir/$newname
  i=$((i+1))
done

ffmpeg -i moviedir/%4d.png -filter:v "setpts=2.5*PTS" -pix_fmt yuv420p jetstream.mp4

The -filter:v "setpts=2.5*PTS" controls the delay between frames: setpts multiplies each frame's presentation timestamp (PTS), so larger numbers mean more delay, and 2.5 makes the video 2.5 times slower than the default.

When I uploaded the video to YouTube, I got a warning, "Your videos will process faster if you encode into a streamable file format." I then spent half a day trying to find a combination of ffmpeg arguments that avoided that warning, and eventually gave up. As far as I can tell, the warning only affects the 20 seconds or so of processing that happens after the 5-10 minutes it takes to upload the video, so I'm not sure it's terribly important.

Results

Here's a video of the jet stream from 2012 to early 2018, and an earlier effort with a much longer 6.0x delay.

And here's the script, updated from the original Barentsen script and with a bunch of command-line options to let you plot different collections of data: jetstream.py on GitHub.

[ 14:18 May 14, 2018    More programming | permalink to this entry | comments ]

Fri, 27 Apr 2018

Displaying PDF with Python, Qt5 and Poppler

I had a need for a Qt widget that could display PDF. That turned out to be surprisingly hard to do. The Qt Wiki has a page on Handling PDF, which suggests only two alternatives: QtPDF, which is C++ only so I would need to write a wrapper to use it with Python (and then anyone else who used my code would have to compile and install it); or Poppler. Poppler is a common library on Linux, available as a package and used for programs like evince, so that seemed like the best route.

But Python bindings for Poppler are a bit harder to come by. I found a little one-page example using Poppler and Gtk3 via gi.repository ... but in this case I needed it to work with a Qt5 program, and my attempts to translate that example to work with Qt were futile. Poppler's page.render(ctx) takes a Cairo context, and Cairo is apparently a Gtk-centered phenomenon: I couldn't find any way to get a Cairo context from a Qt5 widget, and although I found some web examples suggesting renderToImage(), the Poppler available in gi.repository doesn't have that function.

But it turns out there's another Poppler: popplerqt5, available in the Debian package python3-poppler-qt5. That Poppler does have renderToImage, and you can take that image and paint it in a paint() callback or turn it into a pixmap you can use with a QLabel. Here's the basic sequence:

    from popplerqt5 import Poppler
    from PyQt5.QtGui import QPixmap

    document = Poppler.Document.load(filename)
    document.setRenderHint(Poppler.Document.TextAntialiasing)
    page = document.page(pageno)
    img = page.renderToImage(dpi, dpi)

    # Use the rendered image as the pixmap for a label:
    pixmap = QPixmap.fromImage(img)
    label.setPixmap(pixmap)

The line to set text antialiasing is not optional. Well, theoretically it's optional; go ahead, try it without that and see for yourself. It's basically unreadable.

Of course, there are plenty of other details to take care of. For instance, you can get the size of the page:

    size = page.pageSize()

... after which you can use size.width() and size.height(). They're in points. There are 72 points per inch, so calculate accordingly in the dpi values you pass to renderToImage if you're targeting a specific DPI or need it to fit in a specific window size.
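
The points-to-pixels arithmetic is simple enough to sketch. At a given DPI, a page that is W points wide renders W / 72 * DPI pixels wide, so fitting a page into a pixel box means solving for DPI in each direction and taking the smaller answer. The function name below is hypothetical.

```python
POINTS_PER_INCH = 72.0


def dpi_to_fit(page_width_pts, page_height_pts, max_width_px, max_height_px):
    """Largest DPI at which the whole page fits inside the pixel box."""
    dpi_for_width = POINTS_PER_INCH * max_width_px / page_width_pts
    dpi_for_height = POINTS_PER_INCH * max_height_px / page_height_pts
    return min(dpi_for_width, dpi_for_height)


# US Letter is 612 x 792 points (8.5 x 11 inches).
dpi = dpi_to_fit(612, 792, 850, 1200)
print(dpi)
```

In the viewer, the page dimensions would come from size.width() and size.height(), and the result would be passed as both arguments to renderToImage.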

Window Resize and Efficient Rendering

Speaking of fitting to a window size, I wanted to resize the content whenever the window was resized, which meant redefining resizeEvent(self, event) on the widget. Initially my PDFWidget inherited from QWidget with a custom paintEvent(), like this:

        # Create self.img once, early on:
        self.img = self.page.renderToImage(self.dpi, self.dpi)

    def paintEvent(self, event):
        qp = QPainter()
        qp.begin(self)
        qp.drawImage(QPoint(0, 0), self.img)
        qp.end()

(Poppler also has a function page.renderToPainter(), but I never did figure out how to get it to do anything useful.)

That worked, but when I added resizeEvent I got an infinite loop: paintEvent() called resizeEvent() which triggered another paintEvent(), ad infinitum. I couldn't find a way around that (GTK has similar problems -- seems like nearly everything you do generates another expose event -- but there you can temporarily disable expose events while you're drawing). So I rewrote my PDFWidget class to inherit from QLabel instead of QWidget, converted the QImage to a QPixmap and passed it to self.setPixmap(). That let me get rid of the paintEvent() function entirely and let QLabel handle the painting, which is probably more efficient anyway.

Showing all pages in a scrolled widget

renderToImage gives you one image corresponding to one page of the PDF document. More often, you'll want to see the whole document laid out, with all the pages. So you need a way to stack a bunch of widgets vertically, one for each page. You can do that with a QVBoxLayout on a widget inside a QScrollArea.

I haven't done much Qt5 programming, so I wasn't familiar with how these QVBoxes work. Most toolkits I've worked with have a VBox container widget to which you add child widgets, but in Qt5, you create a widget (no particular type -- a QWidget is enough), then create a layout object that modifies the widget, and add the sub-widgets to the layout object. There isn't much documentation for any of this, and very few examples of doing it in Python, so it took some fiddling to get it working.
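
Here is a minimal sketch of that widget-plus-layout pattern in PyQt5. It stacks plain QLabels as stand-ins for rendered pages; make_page_stack and main are hypothetical names, and a real viewer would pass in labels whose pixmaps come from renderToImage.

```python
from PyQt5.QtWidgets import (QApplication, QLabel, QScrollArea,
                             QVBoxLayout, QWidget)


def make_page_stack(page_widgets):
    """Return a QScrollArea showing the given widgets stacked vertically."""
    container = QWidget()              # plain widget that holds the pages
    layout = QVBoxLayout(container)    # the layout installs itself on it
    for page in page_widgets:
        layout.addWidget(page)         # children are added to the layout

    scroll = QScrollArea()
    scroll.setWidget(container)        # the scroll area wraps the container
    scroll.setWidgetResizable(True)
    return scroll


def main():
    app = QApplication([])
    # Stand-ins for rendered pages.
    pages = [QLabel("Page %d" % (i + 1)) for i in range(3)]
    scroll = make_page_stack(pages)
    scroll.resize(400, 300)
    scroll.show()
    app.exec_()
```

Calling main() opens a small scrollable window. The important detail is that the child widgets are added to the layout object, not to the container widget itself.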

Initial Window Size

One last thing: Qt5 doesn't seem to have a concept of desired initial window size. Most of the examples I found, especially the ones that use a .ui file, use setGeometry(); but that requires an (X, Y) position as well as (width, height), and there's no way to tell it to ignore the position. That means that instead of letting your window manager place the window according to your preferences, the window will insist on showing up at whatever arbitrary place you set in the code. Worse, most of the Qt5 examples I found online set the geometry to (0, 0): when I tried that, the window came up with the widget in the upper left corner of the screen and the window's titlebar hidden above the top of the screen, so there's no way to move the window to a better location unless you happen to know your window manager's hidden key binding for that. (Hint: on many Linux window managers, hold Alt down and drag anywhere in the window to move it. If that doesn't work, try holding down the "Windows" key instead of Alt.)

This may explain why I've been seeing an increasing number of these ill-behaved programs that come up with their titlebars offscreen. But if you want your programs to be better behaved, it works to call self.resize(width, height) on a widget when you first create it.

The current incarnation of my PDF viewer, set up as a module so you can import it and use it in other programs, is at qpdfview.py on GitHub.

[ 19:01 Apr 27, 2018    More programming | permalink to this entry | comments ]

Sun, 24 Dec 2017

Saving a transparent PNG image from Cairo, in Python

Dave and I will be giving a planetarium talk in February on the analemma and related matters.

Our planetarium, which runs a fiddly and rather limited program called Nightshade, has no way of showing the analemma. Or at least, after trying for nearly a week once, I couldn't find a way. But it can show images, and since I once wrote a Python program to plot the analemma, I figured I could use my program to generate the analemmas I wanted to show and then project them as images onto the planetarium dome.

[analemma simulation] But naturally, I wanted to project just the analemma and associated labels; I didn't want the blue background to cover up the stars the planetarium shows. So I couldn't just use a simple screenshot; I needed a way to get my GTK app to create a transparent image such as a PNG.

That turns out to be hard. GTK can't do it (either GTK2 or GTK3), and people wanting to do anything with transparency are nudged toward the Cairo library. As a first step, I updated my analemma program to use Cairo and GTK3 via gi.repository. Then I dove into Cairo.

I found one C solution for converting an existing Cairo surface to a PNG, but I didn't have much luck with it. But I did find a Python program that draws to a PNG without bothering to create a GUI. I could use that.

The important part of that program is where it creates a new Cairo "surface", and then creates a "context" for that surface:

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, *imagesize)

cr = cairo.Context(surface)

A Cairo surface is like a canvas to draw on, and it knows how to save itself to a PNG image. A context is the equivalent of a GC in X11 programming: it knows about the current color, font and so forth. So the trick is to create a new surface, create a context, then draw everything all over again with the new context and surface.

A Cairo widget will already have a function to draw everything (in my case, the analemma and all its labels), with this signature:

    def draw(self, widget, ctx):

It already allows passing the context in, so passing in a different context is no problem. I added an argument specifying the background color and transparency, so I could use a blue background in the user interface but a transparent background for the PNG image:

    def draw(self, widget, ctx, background=None):

I also had a minor hitch: in draw(), I was saving the context as self.ctx rather than passing it around to every draw routine. That means calling it with the saved image's context would overwrite the one used for the GUI window. So I save it first.

Here's the final image saving code:

    def save_image(self, outfile):
        dst_surface = cairo.ImageSurface(cairo.FORMAT_ARGB32,
                                         self.width, self.height)

        dst_ctx = cairo.Context(dst_surface)

        # draw() will overwrite self.ctx, so save it first:
        save_ctx = self.ctx

        # Draw everything again to the new context,
        # with a transparent instead of an opaque background:
        self.draw(None, dst_ctx, (0, 0, 1, 0))  # transparent blue

        # Restore the GUI context:
        self.ctx = save_ctx

        dst_surface.write_to_png(outfile)
        print("Saved to", outfile)

[ 19:39 Dec 24, 2017    More programming | permalink to this entry | comments ]

Sat, 05 Aug 2017

Keeping Git Branches in Sync

I do most of my coding on my home machine. But when I travel (or sit in boring meetings), sometimes I do a little hacking on my laptop. Most of my code is hosted in GitHub repos, so when I travel, I like to update all the repos on the laptop to make sure I have what I need even when I'm offline.

That works great as long as I don't make branches. I have a variable $myrepos that lists all the GitHub repositories where I want to contribute, and with a little shell function (zsh, in my case) it's easy enough to update them all:

allgit() {
    pushd ~
    foreach repo ($myrepos)
        echo $repo :
        cd ~/src/$repo
        git pull
    end
    popd
}

That works well enough -- as long as you don't use branches.

Git's branch model seems to be that branches are for local development, and aren't meant to be shared, pushed, or synchronized among machines. It's ridiculously difficult in git to do something like, "for all branches on the remote server, make sure I have that branch and it's in sync with the server." When you create branches, they don't push to the server by default, and it's remarkably difficult to figure out which of your branches is actually tracking a branch on the server.

A web search finds plenty of people asking, and most of the Git experts answering say things like "Just check out the branch, then pull." In other words, if you want to work on a branch, you'd better know before you go offline exactly which branches in which repositories might have been created or updated since the last time you worked in that repository on that machine. I guess that works if you only ever work on one project in one repo and only on one or two branches at a time. It certainly doesn't work if you need to update lots of repos on a laptop for the first time in two weeks.

Further web searching does find a few possibilities. For checking whether there are files modified that need to be committed, git status --porcelain -uno works well. For checking whether changes are committed but not pushed, git for-each-ref --format="%(refname:short) %(push:track)" refs/heads | fgrep '[ahead' works ... if you make an alias so you never have to look at it.

Figuring out whether branches are tracking remotes is a lot harder. I found some recommendations like git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done and for remote in `git branch -r`; do git branch --track ${remote#origin/} $remote; done but neither of them really did what I wanted. I was chasing down the rabbit hole of writing shell loops using variables like

  localbranches=("${(@f)$(git branch | sed 's/..//')}")
  remotebranches=("${(@f)$(git branch -a | grep remotes | grep -v HEAD | grep -v master | sed 's_remotes/origin/__' | sed 's/..//')}")

when I thought, there must be a better way. Maybe using Python bindings?

git-python

In Debian, the available packages for Git Python bindings are python-git, python-pygit2, and python-dulwich. Nobody on #python seemed to like any of them, but based on quick attempts with all three, python-git seemed the most straightforward. Confusingly, though Debian calls it python-git, it's called "git-python" in its docs or in web searches, and it's "import git" when you use it.

It's pretty straightforward to use, at least for simple things. You can create a Repo object with

from git import Repo
repo = Repo('.')

and then you can get lists like repo.heads (local branches), repo.refs (local and remote branches and other refs such as tags), etc. Once you have a ref, you can use ref.name, check whether it's tracking a remote branch with ref.tracking_branch(), and make it track one with ref.set_tracking_branch(remoteref). That makes it very easy to get a list of branches showing which ones are tracking a remote branch, something that had proved almost impossible with the git command line.
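
A minimal sketch of such a listing might look like this. The tracking_report and main names are hypothetical; the report function only relies on each branch having a .name and a .tracking_branch() method, which is what the objects in repo.heads provide.

```python
def tracking_report(heads):
    """Map each local branch name to the remote branch it tracks.

    Branches that track nothing map to None.
    """
    report = {}
    for head in heads:
        remote = head.tracking_branch()
        report[head.name] = remote.name if remote is not None else None
    return report


def main():
    from git import Repo      # git-python
    report = tracking_report(Repo('.').heads)
    for name, remote in sorted(report.items()):
        print("%-20s %s" % (name, remote or "(not tracking)"))
```

Run main() from inside a repository to print one line per local branch.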

Nice. But now I wanted more: I wanted to replace those baroque git status --porcelain and git for-each-ref commands I had been using to check whether my repos needed committing or pushing. That proved harder.

For checking for uncommitted files, I decided it would be easiest to stick with the existing git status --porcelain -uno. Which was sort of true. git-python lets you call git commands, for cases where the Python bindings aren't quite up to snuff yet, but it doesn't handle all cases. I could call:

    output = repo.git.status(porcelain=True)

but I never did find a way to pass the -uno; I tried u=False, u=None, and u="no" but none of them worked. But -uno actually isn't that important so I decided to do without it.

I found out later that there's another way to call the git command, using execute, which lets you pass the exact arguments you'd pass on the command line. It didn't work to call for-each-ref the way I'd called repo.git.status (repo.git.for_each_ref isn't defined), but I could call it this way:

    foreachref = repo.git.execute(['git', 'for-each-ref',
                                   '--format="%(refname:short) %(push:track)"',
                                   'refs/heads'])

and then parse the output looking for "[ahead]". That worked, but ... ick. I wanted to figure out how to do that using Python.
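
For reference, the parsing step might look roughly like this. It's only a sketch: branches_ahead is a hypothetical name, and the sample text is invented to show the shapes git uses for the push:track field, such as [ahead 2] or [ahead 1, behind 3]. Because the format string above is passed as a list item rather than through a shell, the double quotes in it come back literally in the output, so the sketch strips them.

```python
def branches_ahead(foreachref_output):
    """Return names of branches whose push:track field reports "ahead"."""
    ahead = []
    for line in foreachref_output.splitlines():
        # Strip whitespace and the literal double quotes from the format.
        line = line.strip().strip('"')
        if not line:
            continue
        # The branch name comes first, then a space, then the track field.
        name, _, track = line.partition(" ")
        if "ahead" in track:
            ahead.append(name)
    return ahead


# Hypothetical output, for illustration only.
sample = ('"master [ahead 2]"\n'
          '"feature "\n'
          '"bugfix [ahead 1, behind 3]"\n'
          '"old [behind 4]"\n')
print(branches_ahead(sample))
```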

It's easy to get a ref (branch) and its corresponding tracking ref (remote branch). ref.log() gives you a list of commits on each of the two branches, ordered from earliest to most recent, the opposite of git log. In the simple case, then, what I needed was to iterate backward over the two commit logs, looking for the most recent SHA that's common to both. The Python builtin reversed was useful here:

    for i, entry in enumerate(reversed(ref.log())):
        for j, upstream_entry in enumerate(reversed(upstream.log())):
            if entry.newhexsha == upstream_entry.newhexsha:
                return i, j

(i, j) are the number of commits on the local branch that the remote hasn't seen, and vice versa. If i is zero, or if there's nothing in ref.log(), then the repo has no new commits and doesn't need pushing.
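
The same loop can be tried out on plain lists of SHA strings, without a repository. This is a minimal sketch under that assumption: commits_since_common is a hypothetical name, and the single-letter "SHAs" are stand-ins for the newhexsha values from ref.log().

```python
def commits_since_common(local_shas, upstream_shas):
    """Given two lists of SHAs ordered oldest to newest, return (i, j):
    how many entries each list has after the most recent SHA they share.
    Return None if the lists have no SHA in common (or either is empty).
    """
    for i, sha in enumerate(reversed(local_shas)):
        for j, upstream_sha in enumerate(reversed(upstream_shas)):
            if sha == upstream_sha:
                return i, j
    return None


# Local has two commits the upstream hasn't seen; upstream has one
# that the local branch hasn't seen.
local = ["a", "b", "c", "d"]
upstream = ["a", "b", "x"]
print(commits_since_common(local, upstream))
```

The nested loop is quadratic in the length of the logs, which is fine for the short logs of a typical branch.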

Making branches track a remote

The last thing I needed to do was to make branches track their remotes. Too many times, I've found myself on the laptop, ready to work, and discovered that I didn't have the latest code because I'd been working on a branch on my home machine, and my git pull hadn't pulled the info for the branch because that branch wasn't in the laptop's repo yet. That's what got me started on this whole "update everything" script in the first place.

If you have a ref for the local branch and a ref for the remote branch, you can verify their ref.name is the same, and if the local branch has the same name but isn't tracking the remote branch, probably something went wrong with the local repo (like one of my earlier attempts to get branches in sync), and it's an easy fix: ref.set_tracking_branch(remoteref).

But what if the local branch doesn't exist yet? That's the situation I cared about most, when I've been working on a new branch and it's not on the laptop yet, but I'm going to want to work on it while traveling. And that turned out to be difficult, maybe impossible, to do in git-python.

It's easy to create a new local branch: repo.head.create(repo, name). But that branch gets created as a copy of master, and if you try to turn it into a copy of the remote branch, you get conflicts because the branch is ahead of the remote branch you're trying to copy, or vice versa. You really need to create the new branch as a copy of the remote branch it's supposed to be tracking.

If you search the git-python documentation for ref.create, there are references to "For more documentation, please see the Head.create method." Head.create takes a reference argument (the basic ref.create doesn't, though the documentation suggests it should). But how can you call Head.create? I had no luck with attempts like repo.git.Head.create(repo, name, reference=remotebranches[name]).

I finally gave up and went back to calling the command line from git-python.

    repo.git.checkout(remotebranchname, b=name)

I'm not entirely happy with that, but it seems to work.

I'm sure there are all sorts of problems left to solve. But this script does a much better job than any git command I've found of listing the branches in my repositories, checking for modifications that require commits or pushes, and making local branches to mirror new branches on the server. And maybe with time the git-python bindings will improve, and eventually I'll be able to create new tracking branches locally without needing the command line.

The final script, such as it is: gitbranchsync.py.

[ 14:39 Aug 05, 2017 ]

Tue, 23 May 2017

Python help from the shell -- greppable and saveable

I'm working on a project involving PyQt5 (on which, more later). One of the problems is that there's not much online documentation, and it's hard to find out details like what signals (events) each widget offers.

Like most Python packages, there is inline help in the source, which means that in the Python console you can say something like

    >>> from PyQt5.QtWebEngineWidgets import QWebEngineView
    >>> help(QWebEngineView)

The problem is that it's ordered alphabetically; if you want a list of signals, you need to read through all the objects and methods the class offers to look for a few one-liners that include "unbound PYQT_SIGNAL".

If only there was a way to take help(CLASSNAME) and pipe it through grep!

A web search revealed that plenty of other people have wished for this, but I didn't see any solutions. But when I tried running python -c "help(list)" it worked fine -- help isn't dependent on the console.

That means that you should be able to do something like

python -c "from sys import exit; help(exit)"

Sure enough, that worked too.

From there it was only a matter of setting up a zsh function to save on complicated typing. I set up separate aliases for python2, python3 and whatever the default python is. You can get help on builtins (pythonhelp list) or on objects in modules (pythonhelp sys.exit). The zsh suffixes :r (remove extension) and :e (extension) came in handy for separating the module name, before the last dot, and the class name, after the dot.

#############################################################
# Python help functions. Get help on a Python class in a
# format that can be piped through grep, redirected to a file, etc.
# Usage: pythonhelp [module.]class [module.]class ...
pythonXhelp() {
    python=$1
    shift
    for f in $*; do
        if [[ $f =~ '.*\..*' ]]; then
            module=$f:r
            obj=$f:e
            s="from ${module} import ${obj}; help($obj)"
        else
            module=''
            obj=$f
            s="help($obj)"
        fi
        $python -c $s
    done
}
alias pythonhelp="pythonXhelp python"
alias python2help="pythonXhelp python2"
alias python3help="pythonXhelp python3"

So now I can type

    python3help PyQt5.QtWebEngineWidgets.QWebEngineView | grep PYQT_SIGNAL

and get that list of signals I wanted.
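
A similar filter can also be done from inside Python. The sketch below relies on the standard pydoc module, which is what help() is built on: pydoc.render_doc returns the help text as a string, and it accepts a dotted name like "sys.exit" directly. The name grep_help is hypothetical, and this is an alternative to the zsh function above, not a translation of it.

```python
import pydoc
import re


def grep_help(name, pattern):
    """Return the lines of help(name) that match the regex pattern.

    name is a dotted path such as "list" or "sys.exit"; pydoc resolves it
    and raises ImportError if it can't be found.
    """
    # pydoc.plaintext renders without the overstrike bold that help() uses.
    text = pydoc.render_doc(name, renderer=pydoc.plaintext)
    return [line for line in text.splitlines() if re.search(pattern, line)]


for line in grep_help("list", r"append"):
    print(line)
```

With PyQt5 installed, a call like grep_help("PyQt5.QtWebEngineWidgets.QWebEngineView", "PYQT_SIGNAL") should give the same kind of list as the shell pipeline.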

[ 14:12 May 23, 2017 ]

Thu, 06 Apr 2017

Clicking through a translucent window: using X11 input shapes

It happened again: someone sent me a JPEG file with an image of a topo map, with a hiking trail and interesting stopping points drawn on it. Better than nothing. But what I really want on a hike is GPX waypoints that I can load into OsmAnd, so I can see whether I'm still on the trail and how to get to each point from where I am now.

My PyTopo program lets you view the coordinates of any point, so you can make a waypoint from that. But for adding lots of waypoints, that's too much work, so I added an "Add Waypoint" context menu item -- that was easy, took maybe twenty minutes. PyTopo already had the ability to save its existing tracks and waypoints as a GPX file, so no problem there.

[transparent image viewer overlayed on top of topo map] But how do you locate the waypoints you want? You can do it the hard way: show the JPEG in one window, PyTopo in the other, and do the "let's see the road bends left then right, and the point is off to the northwest just above the right bend and about two and a half times as far away as the distance through both road bends". Ugh. It takes forever and it's terribly inaccurate.

More than once, I've wished for a way to put up a translucent image overlay that would let me click through it. So I could see the image, line it up with the map in PyTopo (resizing as needed), then click exactly where I wanted waypoints.

I needed two features beyond what normal image viewers offer: translucency, and the ability to pass mouse clicks through to the window underneath.

A translucent image viewer, in Python

The first part, translucency, turned out to be trivial. In a class inheriting from my Python ImageViewerWindow, I just needed to add this line to the constructor:

    self.set_opacity(.5)

Plus one more step. The window was translucent now, but it didn't look translucent, because I'm running a simple window manager (Openbox) that doesn't have a compositor built in. Turns out you can run a compositor on top of Openbox. There are lots of compositors; the first one I found, which worked fine, was xcompmgr -c -t-6 -l-6 -o.1

The -c specifies client-side compositing. -t and -l specify top and left offsets for window shadows (negative so they go on the bottom right). -o.1 sets the opacity of window shadows. In the long run, -o0 is probably best (no shadows at all) since the shadow interferes a bit with seeing the window under the translucent one. But having a subtle .1 shadow was useful while I was debugging.

That's all I needed: voilà, translucent windows. Now on to the (much) harder part.

A click-through window, in C

X11 has something called the SHAPE extension, which I experimented with once before to make a silly program called moonroot. It's also used for the familiar "xeyes" program. It's used to make windows that aren't rectangular, by passing a shape mask telling X what shape you want your window to be. In theory, I knew I could do something like make a mask where every other pixel was transparent, which would simulate a translucent image, and I'd at least be able to pass clicks through on half the pixels.

But fortunately, first I asked the estimable Openbox guru Mikael Magnusson, who tipped me off that the SHAPE extension also allows for an "input shape" that does exactly what I wanted: lets you catch events on only part of the window and pass them through on the rest, regardless of which parts of the window are visible.

Knowing that was great. Making it work was another matter. Input shapes turn out to be something hardly anyone uses, and there's very little documentation.

In both C and Python, I struggled with drawing onto a pixmap and using it to set the input shape. Finally I realized that there's a call to set the input shape from an X region. It's much easier to build a region out of rectangles than to draw onto a pixmap.

I got a C demo working first. The essence of it was this:

    if (!XShapeQueryExtension(dpy, &shape_event_base, &shape_error_base)) {
        printf("No SHAPE extension\n");
        return;
    }

    /* Make a shaped window, a rectangle smaller than the total
     * size of the window. The rest will be transparent.
     */
    region = CreateRegion(outerBound, outerBound,
                          XWinSize-outerBound*2, YWinSize-outerBound*2);
    XShapeCombineRegion(dpy, win, ShapeBounding, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

    /* Make a frame region.
     * So in the outer frame, we get input, but inside it, it passes through.
     */
    region = CreateFrameRegion(innerBound);
    XShapeCombineRegion(dpy, win, ShapeInput, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

CreateRegion sets up rectangle boundaries, then creates a region from those boundaries:

Region CreateRegion(int x, int y, int w, int h) {
    Region region = XCreateRegion();
    XRectangle rectangle;
    rectangle.x = x;
    rectangle.y = y;
    rectangle.width = w;
    rectangle.height = h;
    XUnionRectWithRegion(&rectangle, region, region);

    return region;
}

CreateFrameRegion() is similar but a little longer. Rather than post it all here, I've created a GIST: transregion.c, demonstrating X11 shaped input.
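
To give a rough idea of what a frame region involves, here is a small sketch of the geometry only, written in Python with no X calls. It is not the code from the gist: frame_rectangles and in_region are hypothetical names, and the layout of the four rectangles is just one plausible way to build a frame. In C, each rectangle would be added to the region with XUnionRectWithRegion, as in CreateRegion above.

```python
def frame_rectangles(width, height, border):
    """Return four (x, y, w, h) rectangles covering a frame of the given
    border thickness just inside a width x height window."""
    inner_h = height - 2 * border
    return [
        (0, 0, width, border),                      # top edge
        (0, height - border, width, border),        # bottom edge
        (0, border, border, inner_h),               # left edge
        (width - border, border, border, inner_h),  # right edge
    ]


def in_region(rects, px, py):
    """True if point (px, py) falls inside any rectangle in the region."""
    return any(x <= px < x + w and y <= py < y + h for x, y, w, h in rects)


rects = frame_rectangles(100, 80, 10)
print(in_region(rects, 5, 5))    # in the frame: the window gets the input
print(in_region(rects, 50, 40))  # in the middle: clicks pass through
```

The four rectangles don't overlap, so their areas add up to exactly the window area minus the inner pass-through area.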

Next problem: once I had shaped input working, I could no longer move or resize the window, because the window manager passed events through the window's titlebar and decorations as well as through the rest of the window. That's why you'll see that CreateFrameRegion call in the gist: I had a theory that if I omitted the outer part of the window from the input shape, and handled input normally around the outside, maybe that would extend to the window manager decorations. But the problem turned out to be a minor Openbox bug, which Mikael quickly tracked down (in openbox/frame.c, in the XShapeCombineRectangles call on line 321, change ShapeBounding to kind). Openbox developers are the greatest!

Input Shapes in Python

Okay, now I had a proof of concept: X input shapes definitely can work, at least in C. How about in Python?

There's a set of python-xlib bindings, and they even support the SHAPE extension, but they have no documentation and didn't seem to include input shapes. I filed a GitHub issue and traded a few notes with the maintainer of the project. It turned out the newest version of python-xlib had been completely rewritten, and supposedly does support input shapes. But the API is completely different from the C API, and after wasting about half a day tweaking the demo program trying to reverse engineer it, I gave up.

Fortunately, it turns out there's a much easier way. Python-gtk has shape support, even including input shapes. And if you use regions instead of pixmaps, it's this simple:

    if self.is_composited():
        region = gtk.gdk.region_rectangle(gtk.gdk.Rectangle(0, 0, 1, 1))
        self.window.input_shape_combine_region(region, 0, 0)

My transimageviewer.py came out nice and simple, inheriting from imageviewer.py and adding only translucency and the input shape.

If you want to define an input shape based on pixmaps instead of regions, it's a bit harder and you need to use the Cairo drawing API. I never got as far as working code, but I believe it should go something like this:

    # Warning: untested code!
    bitmap = gtk.gdk.Pixmap(None, self.width, self.height, 1)
    cr = bitmap.cairo_create()
    # Draw a white circle in a black rect:
    cr.rectangle(0, 0, self.width, self.height)
    cr.set_operator(cairo.OPERATOR_CLEAR)
    cr.fill()

    # draw white filled circle
    cr.arc(self.width / 2, self.height / 2, self.width / 4,
           0, 2 * math.pi)
    cr.set_operator(cairo.OPERATOR_OVER)
    cr.fill()

    self.window.input_shape_combine_mask(bitmap, 0, 0)

The translucent image viewer worked just as I'd hoped. I was able to take a JPG of a trailmap, overlay it on top of a PyTopo window, scale the JPG using the normal Openbox window manager handles, then right-click on top of trail markers to set waypoints. When I was done, a "Save as GPX" in PyTopo and I had a file ready to take with me on my phone.

[ 17:08 Apr 06, 2017 ]