Shallow Thoughts : : programming

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Tue, 23 May 2017

Python help from the shell -- greppable and saveable

I'm working on a project involving PyQt5 (on which, more later). One of the problems is that there's not much online documentation, and it's hard to find out details like what signals (events) each widget offers.

Like most Python packages, there is inline help in the source, which means that in the Python console you can say something like

>>> from PyQt5.QtWebEngineWidgets import QWebEngineView
>>> help(QWebEngineView)
The problem is that it's ordered alphabetically; if you want a list of signals, you need to read through all the objects and methods the class offers to look for a few one-liners that include "unbound PYQT_SIGNAL".

If only there was a way to take help(CLASSNAME) and pipe it through grep!

A web search revealed that plenty of other people have wished for this, but I didn't see any solutions. But when I tried running python -c "help(list)" it worked fine -- help isn't dependent on the console.

That means that you should be able to do something like

python -c "from sys import exit; help(exit)"

Sure enough, that worked too.

From there it was only a matter of setting up a zsh function to save on complicated typing. I set up separate aliases for python2, python3 and whatever the default python is. You can get help on builtins (pythonhelp list) or on objects in modules (pythonhelp sys.exit). The zsh suffixes :r (remove extension) and :e (extension) came in handy for separating the module name, before the last dot, and the class name, after the dot.

#############################################################
# Python help functions. Get help on a Python class in a
# format that can be piped through grep, redirected to a file, etc.
# Usage: pythonhelp [module.]class [module.]class ...
pythonXhelp() {
    python=$1
    shift
    for f in $*; do
        if [[ $f =~ '.*\..*' ]]; then
            module=$f:r
            obj=$f:e
            s="from ${module} import ${obj}; help($obj)"
        else
            module=''
            obj=$f
            s="help($obj)"
        fi
        $python -c $s
    done
}
alias pythonhelp="pythonXhelp python"
alias python2help="pythonXhelp python2"
alias python3help="pythonXhelp python3"

So now I can type

python3help PyQt5.QtWebEngineWidgets.QWebEngineView | grep PYQT_SIGNAL
and get that list of signals I wanted.

Tags: , ,
[ 14:12 May 23, 2017    More programming | permalink to this entry | comments ]

Thu, 06 Apr 2017

Clicking through a translucent window: using X11 input shapes

It happened again: someone sent me a JPEG file with an image of a topo map, with a hiking trail and interesting stopping points drawn on it. Better than nothing. But what I really want on a hike is GPX waypoints that I can load into OsmAnd, so I can see whether I'm still on the trail and how to get to each point from where I am now.

My PyTopo program lets you view the coordinates of any point, so you can make a waypoint from that. But for adding lots of waypoints, that's too much work, so I added an "Add Waypoint" context menu item -- that was easy, took maybe twenty minutes. PyTopo already had the ability to save its existing tracks and waypoints as a GPX file, so no problem there.

[transparent image viewer overlayed on top of topo map] But how do you locate the waypoints you want? You can do it the hard way: show the JPEG in one window, PyTopo in the other, and do the "let's see the road bends left then right, and the point is off to the northwest just above the right bend and about two and a half times as far away as the distance through both road bends". Ugh. It takes forever and it's terribly inaccurate.

More than once, I've wished for a way to put up a translucent image overlay that would let me click through it. So I could see the image, line it up with the map in PyTopo (resizing as needed), then click exactly where I wanted waypoints.

I needed two features beyond what normal image viewers offer: translucency, and the ability to pass mouse clicks through to the window underneath.

A translucent image viewer, in Python

The first part, translucency, turned out to be trivial. In a class inheriting from my Python ImageViewerWindow, I just needed to add this line to the constructor:

    self.set_opacity(.5)

Plus one more step. The window was translucent now, but it didn't look translucent, because I'm running a simple window manager (Openbox) that doesn't have a compositor built in. Turns out you can run a compositor on top of Openbox. There are lots of compositors; the first one I found, which worked fine, was xcompmgr -c -t-6 -l-6 -o.1

The -c specifies client-side compositing. -t and -l specify top and left offsets for window shadows (negative so they go on the bottom right). -o.1 sets the opacity of window shadows. In the long run, -o0 is probably best (no shadows at all) since the shadow interferes a bit with seeing the window under the translucent one. But having a subtle .1 shadow was useful while I was debugging.

That's all I needed: voilà, translucent windows. Now on to the (much) harder part.

A click-through window, in C

X11 has something called the SHAPE extension, which I experimented with once before to make a silly program called moonroot. It's also used for the familiar "xeyes" program. It's used to make windows that aren't square, by passing a shape mask telling X what shape you want your window to be. In theory, I knew I could do something like make a mask where every other pixel was transparent, which would simulate a translucent image, and I'd at least be able to pass clicks through on half the pixels.

But fortunately, first I asked the estimable Openbox guru Mikael Magnusson, who tipped me off that the SHAPE extension also allows for an "input shape" that does exactly what I wanted: lets you catch events on only part of the window and pass them through on the rest, regardless of which parts of the window are visible.

Knowing that was great. Making it work was another matter. Input shapes turn out to be something hardly anyone uses, and there's very little documentation.

In both C and Python, I struggled with drawing onto a pixmap and using it to set the input shape. Finally I realized that there's a call to set the input shape from an X region. It's much easier to build a region out of rectangles than to draw onto a pixmap.

I got a C demo working first. The essence of it was this:

    if (!XShapeQueryExtension(dpy, &shape_event_base, &shape_error_base)) {
        printf("No SHAPE extension\n");
        return;
    }

    /* Make a shaped window, a rectangle smaller than the total
     * size of the window. The rest will be transparent.
     */
    region = CreateRegion(outerBound, outerBound,
                          XWinSize-outerBound*2, YWinSize-outerBound*2);
    XShapeCombineRegion(dpy, win, ShapeBounding, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

    /* Make a frame region.
     * So in the outer frame, we get input, but inside it, it passes through.
     */
    region = CreateFrameRegion(innerBound);
    XShapeCombineRegion(dpy, win, ShapeInput, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

CreateRegion sets up rectangle boundaries, then creates a region from those boundaries:

Region CreateRegion(int x, int y, int w, int h) {
    Region region = XCreateRegion();
    XRectangle rectangle;
    rectangle.x = x;
    rectangle.y = y;
    rectangle.width = w;
    rectangle.height = h;
    XUnionRectWithRegion(&rectangle, region, region);

    return region;
}

CreateFrameRegion() is similar but a little longer. Rather than post it all here, I've created a GIST: transregion.c, demonstrating X11 shaped input.

Next problem: once I had shaped input working, I could no longer move or resize the window, because the window manager passed events through the window's titlebar and decorations as well as through the rest of the window. That's why you'll see that CreateFrameRegion call in the gist: -- I had a theory that if I omitted the outer part of the window from the input shape, and handled input normally around the outside, maybe that would extend to the window manager decorations. But the problem turned out to be a minor Openbox bug, which Mikael quickly tracked down (in openbox/frame.c, in the XShapeCombineRectangles call on line 321, change ShapeBounding to kind). Openbox developers are the greatest!

Input Shapes in Python

Okay, now I had a proof of concept: X input shapes definitely can work, at least in C. How about in Python?

There's a set of python-xlib bindings, and they even supports the SHAPE extension, but they have no documentation and didn't seem to include input shapes. I filed a GitHub issue and traded a few notes with the maintainer of the project. It turned out the newest version of python-xlib had been completely rewritten, and supposedly does support input shapes. But the API is completely different from the C API, and after wasting about half a day tweaking the demo program trying to reverse engineer it, I gave up.

Fortunately, it turns out there's a much easier way. Python-gtk has shape support, even including input shapes. And if you use regions instead of pixmaps, it's this simple:

    if self.is_composited():
        region = gtk.gdk.region_rectangle(gtk.gdk.Rectangle(0, 0, 1, 1))
        self.window.input_shape_combine_region(region, 0, 0)

My transimageviewer.py came out nice and simple, inheriting from imageviewer.py and adding only translucency and the input shape.

If you want to define an input shape based on pixmaps instead of regions, it's a bit harder and you need to use the Cairo drawing API. I never got as far as working code, but I believe it should go something like this:

    # Warning: untested code!
    bitmap = gtk.gdk.Pixmap(None, self.width, self.height, 1)
    cr = bitmap.cairo_create()
    # Draw a white circle in a black rect:
    cr.rectangle(0, 0, self.width, self.height)
    cr.set_operator(cairo.OPERATOR_CLEAR)
    cr.fill();

    # draw white filled circle
    cr.arc(self.width / 2, self.height / 2, self.width / 4,
           0, 2 * math.pi);
    cr.set_operator(cairo.OPERATOR_OVER);
    cr.fill();

    self.window.input_shape_combine_mask(bitmap, 0, 0)

The translucent image viewer worked just as I'd hoped. I was able to take a JPG of a trailmap, overlay it on top of a PyTopo window, scale the JPG using the normal Openbox window manager handles, then right-click on top of trail markers to set waypoints. When I was done, a "Save as GPX" in PyTopo and I had a file ready to take with me on my phone.

Tags: , , ,
[ 17:08 Apr 06, 2017    More programming | permalink to this entry | comments ]

Sat, 25 Mar 2017

Reading keypresses in Python

As part of preparation for Everyone Does IT, I was working on a silly hack to my Python script that plays notes and chords: I wanted to use the computer keyboard like a music keyboard, and play different notes when I press different keys. Obviously, in a case like that I don't want line buffering -- I want the program to play notes as soon as I press a key, not wait until I hit Enter and then play the whole line at once. In Unix that's called "cbreak mode".

There are a few ways to do this in Python. The most straightforward way is to use the curses library, which is designed for console based user interfaces and games. But importing curses is overkill just to do key reading.

Years ago, I found a guide on the official Python Library and Extension FAQ: Python: How do I get a single keypress at a time?. I'd even used it once, for a one-off Raspberry Pi project that I didn't end up using much. I hadn't done much testing of it at the time, but trying it now, I found a big problem: it doesn't block.

Blocking is whether the read() waits for input or returns immediately. If I read a character with c = sys.stdin.read(1) but there's been no character typed yet, a non-blocking read will throw an IOError exception, while a blocking read will wait, not returning until the user types a character.

In the code on that Python FAQ page, blocking looks like it should be optional. This line:

fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
is the part that requests non-blocking reads. Skipping that should let me read characters one at a time, block until each character is typed. But in practice, it doesn't work. If I omit the O_NONBLOCK flag, reads never return, not even if I hit Enter; if I set O_NONBLOCK, the read immediately raises an IOError. So I have to call read() over and over, spinning the CPU at 100% while I wait for the user to type something.

The way this is supposed to work is documented in the termios man page. Part of what tcgetattr returns is something called the cc structure, which includes two members called Vmin and Vtime. man termios is very clear on how they're supposed to work: for blocking, single character reads, you set Vmin to 1 (that's the number of characters you want it to batch up before returning), and Vtime to 0 (return immediately after getting that one character). But setting them in Python with tcsetattr doesn't make any difference.

(Python also has a module called tty that's supposed to simplify this stuff, and you should be able to call tty.setcbreak(fd). But that didn't work any better than termios: I suspect it just calls termios under the hood.)

But after a few hours of fiddling and googling, I realized that even if Python's termios can't block, there are other ways of blocking on input. The select system call lets you wait on any file descriptor until has input. So I should be able to set stdin to be non-blocking, then do my own blocking by waiting for it with select.

And that worked. Here's a minimal example:

import sys, os
import termios, fcntl
import select

fd = sys.stdin.fileno()
newattr = termios.tcgetattr(fd)
newattr[3] = newattr[3] & ~termios.ICANON
newattr[3] = newattr[3] & ~termios.ECHO
termios.tcsetattr(fd, termios.TCSANOW, newattr)

oldterm = termios.tcgetattr(fd)
oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

print "Type some stuff"
while True:
    inp, outp, err = select.select([sys.stdin], [], [])
    c = sys.stdin.read()
    if c == 'q':
        break
    print "-", c

# Reset the terminal:
termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

A less minimal example: keyreader.py, a class to read characters, with blocking and echo optional. It also cleans up after itself on exit, though most of the time that seems to happen automatically when I exit the Python script.

Tags: ,
[ 12:42 Mar 25, 2017    More programming | permalink to this entry | comments ]

Fri, 24 Feb 2017

Coder Dojo: Kids Teaching Themselves Programming

We have a terrific new program going on at Los Alamos Makers: a weekly Coder Dojo for kids, 6-7 on Tuesday nights.

Coder Dojo is a worldwide movement, and our local dojo is based on their ideas. Kids work on programming projects to earn colored USB wristbelts, with the requirements for belts getting progressively harder. Volunteer mentors are on hand to help, but we're not lecturing or teaching, just coaching.

Despite not much advertising, word has gotten around and we typically have 5-7 kids on Dojo nights, enough that all the makerspace's Raspberry Pi workstations are filled and we sometimes have to scrounge for more machines for the kids who don't bring their own laptops.

A fun moment early on came when we had a mentor meeting, and Neil, our head organizer (who deserves most of the credit for making this program work so well), looked around and said "One thing that might be good at some point is to get more men involved." Sure enough -- he was the only man in the room! For whatever reason, most of the programmers who have gotten involved have been women. A refreshing change from the usual programming group. (Come to think of it, the PEEC web development team is three women. A girl could get a skewed idea of gender demographics, living here.) The kids who come to program are about 40% girls.

I wondered at the beginning how it would work, with no lectures or formal programs. Would the kids just sit passively, waiting to be spoon fed? How would they get concepts like loops and conditionals and functions without someone actively teaching them?

It wasn't a problem. A few kids have some prior programming practice, and they help the others. Kids as young as 9 with no previous programming experience walk it, sit down at a Raspberry Pi station, and after five minutes of being shown how to bring up a Python console and use Python's turtle graphics module to draw a line and turn a corner, they're happily typing away, experimenting and making Python draw great colorful shapes.

Python-turtle turns out to be a wonderful way for beginners to learn. It's easy to get started, it makes pretty pictures, and yet, since it's Python, it's not just training wheels: kids are using a real programming language from the start, and they can search the web and find lots of helpful examples when they're trying to figure out how to do something new (just like professional programmers do. :-)

Initially we set easy requirements for the first (white) belt: attend for three weeks, learn the names of other Dojo members. We didn't require any actual programming until the second (yellow) belt, which required writing a program with two of three elements: a conditional, a loop, a function.

That plan went out the window at the end of the first evening, when two kids had already fulfilled the yellow belt requirements ... even though they were still two weeks away from the attendance requirement for the white belt. One of them had never programmed before. We've since scrapped the attendance belt, and now the white belt has the conditional/loop/function requirement that used to be the yellow belt.

The program has been going for a bit over three months now. We've awarded lots of white belts and a handful of yellows (three new ones just this week). Although most of the kids are working in Python, there are also several playing music or running LED strips using Arduino/C++, writing games and web pages in Javascript, writing adventure games Scratch, or just working through Khan Academy lectures.

When someone is ready for a belt, they present their program to everyone in the room and people ask questions about it: what does that line do? Which part of the program does that? How did you figure out that part? Then the mentors review the code over the next week, and they get the belt the following week.

For all but the first belt, helping newer members is a requirement, though I suspect even without that they'd be helping each other. Sit a first-timer next to someone who's typing away at a Python program and watch the magic happen. Sometimes it feels almost superfluous being a mentor. We chat with the kids and each other, work on our own projects, shoulder-surf, and wait for someone to ask for help with harder problems.

Overall, a terrific program, and our only problems now are getting funding for more belts and more workstations as the word spreads and our Dojo nights get more crowded. I've had several adults ask me if there was a comparable program for adults. Maybe some day (I hope).

Tags: ,
[ 13:46 Feb 24, 2017    More programming | permalink to this entry | comments ]

Mon, 23 Jan 2017

Testing a GitHub Pull Request

Several times recently I've come across someone with a useful fix to a program on GitHub, for which they'd filed a GitHub pull request.

The problem is that GitHub doesn't give you any link on the pull request to let you download the code in that pull request. You can get a list of the checkins inside it, or a list of the changed files so you can view the differences graphically. But if you want the code on your own computer, so you can test it, or use your own editors and diff tools to inspect it, it's not obvious how. That this is a problem is easily seen with a web search for something like download github pull request -- there are huge numbers of people asking how, and most of the answers are vague unclear.

That's a shame, because it turns out it's easy to pull a pull request. You can fetch it directly with git into a new branch as long as you have the pull request ID. That's the ID shown on the GitHub pull request page:

[GitHub pull request screenshot]

Once you have the pull request ID, choose a new name for your branch, then fetch it:

git fetch origin pull/PULL-REQUEST_ID/head:NEW-BRANCH-NAME
git checkout NEW-BRANCH-NAME

Then you can view diffs with something like git difftool NEW-BRANCH-NAME..master

Easy! GitHub should give a hint of that on its pull request pages.

Fetching a Pull Request diff to apply it to another tree

But shortly after I learned how to apply a pull request, I had a related but different problem in another project. There was a pull request for an older repository, but the part it applied to had since been split off into a separate project. (It was an old pull request that had fallen through the cracks, and as a new developer on the project, I wanted to see if I could help test it in the new repository.)

You can't pull a pull request that's for a whole different repository. But what you can do is go to the pull request's page on GitHub. There are 3 tabs: Conversation, Commits, and Files changed. Click on Files changed to see the diffs visually.

That works if the changes are small and only affect a few files (which fortunately was the case this time). It's not so great if there are a lot of changes or a lot of files affected. I couldn't find any "Raw" or "download" button that would give me a diff I could actually apply. You can select all and then paste the diffs into a local file, but you have to do that separately for each file affected. It might be, if you have a lot of files, that the best solution is to check out the original repo, apply the pull request, generate a diff locally with git diff, then apply that diff to the new repo. Rather circuitous. But with any luck that situation won't arise very often.

Update: thanks very much to Houz for the solution! (In the comments, below.) Just append .diff or .patch to the pull request URL, e.g. https://github.com/OWNER/REPO/pull/REQUEST-ID.diff which you can view in a browser or fetch with wget or curl.

Tags: , ,
[ 14:34 Jan 23, 2017    More programming | permalink to this entry | comments ]

Thu, 19 Jan 2017

Plotting Shapes with Python Basemap wwithout Shapefiles

In my article on Plotting election (and other county-level) data with Python Basemap, I used ESRI shapefiles for both states and counties.

But one of the election data files I found, OpenDataSoft's USA 2016 Presidential Election by county had embedded county shapes, available either as CSV or as GeoJSON. (I used the CSV version, but inside the CSV the geo data are encoded as JSON so you'll need JSON decoding either way. But that's no problem.)

Just about all the documentation I found on coloring shapes in Basemap assumed that the shapes were defined as ESRI shapefiles. How do you draw shapes if you have latitude/longitude data in a more open format?

As it turns out, it's quite easy, but it took a fair amount of poking around inside Basemap to figure out how it worked.

In the loop over counties in the US in the previous article, the end goal was to create a matplotlib Polygon and use that to add a Basemap patch. But matplotlib's Polygon wants map coordinates, not latitude/longitude.

If m is your basemap (i.e. you created the map with m = Basemap( ... ), you can translate coordinates like this:

    (mapx, mapy) = m(longitude, latitude)

So once you have a region as a list of (longitude, latitude) coordinate pairs, you can create a colored, shaped patch like this:

    for coord_pair in region:
        coord_pair[0], coord_pair[1] = m(coord_pair[0], coord_pair[1])
    poly = Polygon(region, facecolor=color, edgecolor=color)
    ax.add_patch(poly)

Working with the OpenDataSoft data file was actually a little harder than that, because the list of coordinates was JSON-encoded inside the CSV file, so I had to decode it with json.loads(county["Geo Shape"]). Once decoded, it had some counties as a Polygonlist of lists (allowing for discontiguous outlines), and others as a MultiPolygonlist of list of lists (I'm not sure why, since the Polygon format already allows for discontiguous boundaries)

[Blue-red-purple 2016 election map]

And a few counties were missing, so there were blanks on the map, which show up as white patches in this screenshot. The counties missing data either have inconsistent formatting in their coordinate lists, or they have only one coordinate pair, and they include Washington, Virginia; Roane, Tennessee; Schley, Georgia; Terrell, Georgia; Marshall, Alabama; Williamsburg, Virginia; and Pike Georgia; plus Oglala Lakota (which is clearly meant to be Oglala, South Dakota), and all of Alaska.

One thing about crunching data files from the internet is that there are always a few special cases you have to code around. And I could have gotten those coordinates from the census shapefiles; but as long as I needed the census shapefile anyway, why use the CSV shapes at all? In this particular case, it makes more sense to use the shapefiles from the Census.

Still, I'm glad to have learned how to use arbitrary coordinates as shapes, freeing me from the proprietary and annoying ESRI shapefile format.

The code: Blue-red map using CSV with embedded county shapes

Tags: , , , , ,
[ 09:36 Jan 19, 2017    More programming | permalink to this entry | comments ]

Sat, 14 Jan 2017

Plotting election (and other county-level) data with Python Basemap

After my arduous search for open 2016 election data by county, as a first test I wanted one of those red-blue-purple charts of how Democratic or Republican each county's vote was.

I used the Basemap package for plotting. It used to be part of matplotlib, but it's been split off into its own toolkit, grouped under mpl_toolkits: on Debian, it's available as python-mpltoolkits.basemap, or you can find Basemap on GitHub.

It's easiest to start with the fillstates.py example that shows how to draw a US map with different states colored differently. You'll need the three shapefiles (because of ESRI's silly shapefile format): st99_d00.dbf, st99_d00.shp and st99_d00.shx, available in the same examples directory.

Of course, to plot counties, you need county shapefiles as well. The US Census has county shapefiles at several different resolutions (I used the 500k version). Then you can plot state and counties outlines like this:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def draw_us_map():
    # Set the lower left and upper right limits of the bounding box:
    lllon = -119
    urlon = -64
    lllat = 22.0
    urlat = 50.5
    # and calculate a centerpoint, needed for the projection:
    centerlon = float(lllon + urlon) / 2.0
    centerlat = float(lllat + urlat) / 2.0

    m = Basemap(resolution='i',  # crude, low, intermediate, high, full
                llcrnrlon = lllon, urcrnrlon = urlon,
                lon_0 = centerlon,
                llcrnrlat = lllat, urcrnrlat = urlat,
                lat_0 = centerlat,
                projection='tmerc')

    # Read state boundaries.
    shp_info = m.readshapefile('st99_d00', 'states',
                               drawbounds=True, color='lightgrey')

    # Read county boundaries
    shp_info = m.readshapefile('cb_2015_us_county_500k',
                               'counties',
                               drawbounds=True)

if __name__ == "__main__":
    draw_us_map()
    plt.title('US Counties')
    # Get rid of some of the extraneous whitespace matplotlib loves to use.
    plt.tight_layout(pad=0, w_pad=0, h_pad=0)
    plt.show()
[Simple map of US county borders]

Accessing the state and county data after reading shapefiles

Great. Now that we've plotted all the states and counties, how do we get a list of them, so that when I read out "Santa Clara, CA" from the data I'm trying to plot, I know which map object to color?

After calling readshapefile('st99_d00', 'states'), m has two new members, both lists: m.states and m.states_info.

m.states_info[] is a list of dicts mirroring what was in the shapefile. For the Census state list, the useful keys are NAME, AREA, and PERIMETER. There's also STATE, which is an integer (not restricted to 1 through 50) but I'll get to that.

If you want the shape for, say, California, iterate through m.states_info[] looking for the one where m.states_info[i]["NAME"] == "California". Note i; the shape coordinates will be in m.states[i]n (in basemap map coordinates, not latitude/longitude).

Correlating states and counties in Census shapefiles

County data is similar, with county names in m.counties_info[i]["NAME"]. Remember that STATE integer? Each county has a STATEFP, m.counties_info[i]["STATEFP"] that matches some state's m.states_info[i]["STATE"].

But doing that search every time would be slow. So right after calling readshapefile for the states, I make a table of states. Empirically, STATE in the state list goes up to 72. Why 72? Shrug.

    MAXSTATEFP = 73
    states = [None] * MAXSTATEFP
    for state in m.states_info:
        statefp = int(state["STATE"])
        # Many states have multiple entries in m.states (because of islands).
        # Only add it once.
        if not states[statefp]:
            states[statefp] = state["NAME"]

That'll make it easy to look up a county's state name quickly when we're looping through all the counties.

Calculating colors for each county

Time to figure out the colors from the Deleetdk election results CSV file. Reading lines from the CSV file into a dictionary is superficially easy enough:

    fp = open("tidy_data.csv")
    reader = csv.DictReader(fp)

    # Make a dictionary of all "county, state" and their colors.
    county_colors = {}
    for county in reader:
        # What color is this county?
        pop = float(county["votes"])
        blue = float(county["results.clintonh"])/pop
        red = float(county["Total.Population"])/pop
        county_colors["%s, %s" % (county["name"], county["State"])] \
            = (red, 0, blue)

But in practice, that wasn't good enough, because the county names in the Deleetdk names didn't always match the official Census county names.

Fuzzy matches

For instance, the CSV file had no results for Alaska or Puerto Rico, so I had to skip those. Non-ASCII characters were a problem: "Doña Ana" county in the census data was "Dona Ana" in the CSV. I had to strip off " County", " Borough" and similar terms: "St Louis" in the census data was "St. Louis County" in the CSV. Some names were capitalized differently, like PLYMOUTH vs. Plymouth, or Lac Qui Parle vs. Lac qui Parle. And some names were just different, like "Jeff Davis" vs. "Jefferson Davis".

To get around that I used SequenceMatcher to look for fuzzy matches when I couldn't find an exact match:

def fuzzy_find(s, slist):
    '''Try to find a fuzzy match for s in slist.
    '''
    best_ratio = -1
    best_match = None

    ls = s.lower()
    for ss in slist:
        r = SequenceMatcher(None, ls, ss.lower()).ratio()
        if r > best_ratio:
            best_ratio = r
            best_match = ss
    if best_ratio > .75:
        return best_match
    return None

Correlate the county names from the two datasets

It's finally time to loop through the counties in the map to color and plot them.

Remember STATE vs. STATEFP? It turns out there are a few counties in the census county shapefile with a STATEFP that doesn't match any STATE in the state shapefile. Mostly they're in the Virgin Islands and I don't have election data for them anyway, so I skipped them for now. I also skipped Puerto Rico and Alaska (no results in the election data) and counties that had no corresponding state: I'll omit that code here, but you can see it in the final script, linked at the end.

    for i, county in enumerate(m.counties_info):
        countyname = county["NAME"]
        try:
            statename = states[int(county["STATEFP"])]
        except IndexError:
            print countyname, "has out-of-index statefp of", county["STATEFP"]
            continue

        countystate = "%s, %s" % (countyname, statename)
        try:
            ccolor = county_colors[countystate]
        except KeyError:
            # No exact match; try for a fuzzy match
            fuzzyname = fuzzy_find(countystate, county_colors.keys())
            if fuzzyname:
                ccolor = county_colors[fuzzyname]
                county_colors[countystate] = ccolor
            else:
                print "No match for", countystate
                continue

        countyseg = m.counties[i]
        poly = Polygon(countyseg, facecolor=ccolor)  # edgecolor="white"
        ax.add_patch(poly)

Moving Hawaii

Finally, although the CSV didn't have results for Alaska, it did have Hawaii. To display it, you can move it when creating the patches:

    countyseg = m.counties[i]
    if statename == 'Hawaii':
        countyseg = list(map(lambda (x,y): (x + 5750000, y-1400000), countyseg))
    poly = Polygon(countyseg, facecolor=countycolor)
    ax.add_patch(poly)
The offsets are in map coordinates and are empirical; I fiddled with them until Hawaii showed up at a reasonable place. [Blue-red-purple 2016 election map]

Well, that was a much longer article than I intended. Turns out it takes a fair amount of code to correlate several datasets and turn them into a map. But a lot of the work will be applicable to other datasets.

Full script on GitHub: Blue-red map using Census county shapefile

Tags: , , , , , ,
[ 15:10 Jan 14, 2017    More programming | permalink to this entry | comments ]

Thu, 22 Dec 2016

Tips on Developing Python Projects for PyPI

I wrote two recent articles on Python packaging: Distributing Python Packages Part I: Creating a Python Package and Distributing Python Packages Part II: Submitting to PyPI. I was able to get a couple of my programs packaged and submitted.

Ongoing Development and Testing

But then I realized all was not quite right. I could install new releases of my package -- but I couldn't run it from the source directory any more. How could I test changes without needing to rebuild the package for every little change I made?

Fortunately, it turned out to be fairly easy. Set PYTHONPATH to a directory that includes all the modules you normally want to test. For example, inside my bin directory I have a python directory where I can symlink any development modules I might need:

mkdir ~/bin/python
ln -s ~/src/metapho/metapho ~/bin/python/

Then add the directory at the beginning of PYTHONPATH:

export PYTHONPATH=$HOME/bin/python

With that, I could test from the development directory again, without needing to rebuild and install a package every time.

Cleaning up files used in building

Building a package leaves some extra files and directories around, and git status will whine at you since they're not version controlled. Of course, you could gitignore them, but it's better to clean them up after you no longer need them.

To do that, you can add a clean command to setup.py.

from setuptools import Command

class CleanCommand(Command):
    """Custom clean command to tidy up the project root."""
    user_options = []
    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        os.system('rm -vrf ./build ./dist ./*.pyc ./*.tgz ./*.egg-info ./docs/sphinxdoc/_build')
(Obviously, that includes file types beyond what you need for just cleaning up after package building. Adjust the list as needed.)

Then in the setup() function, add these lines:

      cmdclass={
          'clean': CleanCommand,
      }

Now you can type

python setup.py clean
and it will remove all the extra files.

Keeping version strings in sync

It's so easy to update the __version__ string in your module and forget that you also have to do it in setup.py, or vice versa. Much better to make sure they're always in sync.

I found several version of that using system("grep..."), but I decided to write my own that doesn't depend on system(). (Yes, I should do the same thing with that CleanCommand, I know.)

def get_version():
    '''Read the pytopo module versions from pytopo/__init__.py'''
    with open("pytopo/__init__.py") as fp:
        for line in fp:
            line = line.strip()
            if line.startswith("__version__"):
                parts = line.split("=")
                if len(parts) > 1:
                    return parts[1].strip()

Then in setup():

      version=get_version(),

Much better! Now you only have to update __version__ inside your module and setup.py will automatically use it.

Using your README for a package long description

setup has a long_description for the package, but you probably already have some sort of README in your package. You can use it for your long description this way:

# Utility function to read the README file.
# Used for the long_description.
def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
    long_description=read('README'),

Tags: , ,
[ 10:15 Dec 22, 2016    More programming | permalink to this entry | comments ]