Shallow Thoughts : tags : python

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 13 Sep 2024

How to Move Around More (Being an "Outdoor Person")

I like to think of myself as an outdoor person. I like hiking, mountain biking, astronomy, and generally enjoying the beauty of the world.

Except — let's not kid ourselves here — I'm really more of a computer geek. Without some sort of push, I can easily stay planted on my butt in front of the computer all day — sure, looking out the window and admiring the view (I do a lot of that since we moved to New Mexico), but still sitting indoors in the computer chair.

Earlier this year, the science podcast "Short Wave" played an NPR series called Body Electric that had a lot of interviews with scientists who have studied some aspect of the health benefits of motion versus sitting, and specifically, the idea of getting up and moving around for five minutes every half hour. They challenged listeners to try it, and featured statements from listeners about their improved health and energy levels.

Read more ...

Tags: , , ,
[ 18:55 Sep 13, 2024    More misc | permalink to this entry | ]

Thu, 01 Aug 2024

Fetching OpenStreetMap Details with OSMPythonTools

I was talking to a friend about LANL's proposed new powerline. A lot of people are opposing it because the line would run through the Caja del Rio, an open-space piñon-juniper area adjacent to Santa Fe which is owned by the US Forest Service. The proposed powerline would run from the Caja across the Rio Grande to the Lab. It would carry not just power but also a broadband fiber line, something Los Alamos town, if not the Lab, needs badly. On the other hand, those opposed worry about road-building and habitat destruction in the Caja.

[A bad map showing a proposed route but with no details labeled] I'm always puzzled reading accounts of the debate. There already is a powerline running through the Caja and across the Rio via Powerline Point. The discussions never say (a) whether the proposed line would take a different route, and if so, (b) Why? why can't they just tack on some more lines to the towers along the existing route?

For instance, in the slides from one of the public meetings, the map on slide 9 not only doesn't show the existing powerline, but also uses a basemap that has no borders and NO ROADS. Why would you use a map that doesn't show roads unless you're deliberately trying to confuse people?

Read more ...

Tags: , , , ,
[ 12:14 Aug 01, 2024    More mapping | permalink to this entry | ]

Sat, 04 May 2024

Creating an Image with Wrapped Text using the Python Imaging Library

I stumbled onto the page for this year's Asimov's Magazine Readers' Award Finalists. They offer all the stories right there -- but only as PDF. I prefer reading fiction on my ebook reader (a Kobo Clara with 6" screen), away from the computer. I spend too much time sitting at the computer as it is. But trying to read a PDF on a 6" screen is just painful.

The open-source ebook program Calibre has a command-line program called ebook-convert that can convert some PDF to epub. It did an okay job in this case — except that the PDFs had the wrong author name (they all have the same author, so I'm guessing it's the name of the person who prepared the PDFs for Asimov's), and the wrong title information (or maybe just no title), and ebook-convert compounded that error by generating cover images for each work that had the wrong title and author.

I went through the files and fixed each one's title and author metadata using my epubtag.py Python script. But what about the cover images? I wasn't eager to spend the time GIMPing up a cover image by hand for each of the stories.

Read more ...

Tags: , ,
[ 13:52 May 04, 2024    More programming | permalink to this entry | ]

Thu, 18 Apr 2024

Using Sox Play to Help Learn Guitar

I mentioned last month that I'm learning guitar. It's been going well and I'm having fun. But I've gotten to the point where I sometimes get chords confused: a song is listed as using E major and I play D major instead.

Also, it's important to practice transitions between chords, which is easy when you only know three chords; but with eight or so, I had stopped practicing transitions in general and was only practicing the ones that occur in songs I like to play.

I found myself wishing I had something like flash cards for guitar chords.

Someone must have already written that, right? But I couldn't find anything promising with a web search. And besides, it's more fun to write programs than to flail at unhelpful search engines, and you always end up learning something new.

Read more ...

Tags: , , , ,
[ 20:02 Apr 18, 2024    More linux | permalink to this entry | ]

Sun, 10 Mar 2024

How Common is Easter in March?

"Easter is March 31 this year," my husband said. "I think that's rare, having Easter in March."

"I guess so," I said.

"There's a Cray somewhere that they use to calculate the date each year," he joked.

And of course that made me want to find out if Easters in March really are rare.

Read more ...

Tags: ,
[ 12:49 Mar 10, 2024    More programming | permalink to this entry | ]

Mon, 20 Nov 2023

Pixel 6a Stores the Wrong GPS in Images: an Analysis

[Map of GPS from Pixel 6a photos compared with actual positions]

I've been relying more on my phone for photos I take while hiking, rather than carry a separate camera. The Pixel 6a takes reasonably good photos, if you can put up with the wildly excessive processing Google's camera app does whether you want it or not.

That opens the possibility of GPS tagging photos, so I'd have a good record of where on the trail each photo was taken.

But as it turns out: no. It seems the GPS coordinates the Pixel's camera app records in photos is always wrong, by a significant amount. And, weirdly, this doesn't seem to be something anyone's talking about on the web ... or am I just using the wrong search terms?

Read more ...

Tags: , , , , , , ,
[ 19:09 Nov 20, 2023    More mapping | permalink to this entry | ]

Thu, 28 Sep 2023

Opening a URL in a New Tab of an Existing Browser Window

My search for a good desktop Mastodon client has led me down a path that involved learning some fun ways to interact with existing browser windows on Linux with X programs like xdotool and wmctrl.

Like many people, I've switched from The App Formerly Known As Twitter to Mastodon (where I'm @akkana@fosstodon.org). But the next question was which Mastodon app to use.

Read more ...

Tags: , , , , , ,
[ 11:48 Sep 28, 2023    More linux | permalink to this entry | ]

Fri, 22 Sep 2023

Drag-and-Drop in Python Qt6

I had a need for a window to which I could drag and drop URLs.

I don't use drag-and-drop much, since I prefer using the commandline rather than a file manager and icon-studded desktop. Usually when I need some little utility and can't immediately find what I need, I whip up a little Python script.

This time, it wasn't so easy. Python has a GUI problem (as does open source in general): there are quite a few options, like TkInter, Qt, GTK, WxWidgets and assorted others, and they all have different strengths and (especially) weaknesses.

Drag-and-drop turns out to be something none of them do very well.

Read more ...

Tags: , ,
[ 18:45 Sep 22, 2023    More programming | permalink to this entry | ]

Thu, 07 Sep 2023

Los Alamos Voting Data on a Folium Choropleth Map

Somebody in a group I'm in has commented more than once that White Rock is a hotbed of Republicanism whereas Los Alamos leans Democratic. (For outsiders, our tiny county has two geographically-distinct towns in it, with separate zip codes, though officially they're both part of Los Alamos township which covers all of Los Alamos county. White Rock is about half the size of Los Alamos.)

After I'd heard her say it a couple times, I got curious. Was it true? I asked her for a reference, but she didn't have one. I decided to find out.

Read more ...

Tags: , , ,
[ 11:58 Sep 07, 2023    More programming | permalink to this entry | ]

Sun, 19 Mar 2023

Rsync: Combining Includes and Excludes

I back up my computer to a local disk (well, several redundant local disks) using rsync. (I don't particularly trust cloud providers, and in any case our internet connection is very slow, especially for upload, so waiting hours while the entire contents of my disk uploads isn't appealing.)

To save space and time, I have script that includes a list of files and directories I don't need to back up: browser cache directories, object files, build directories, generated files like thumbnails, large video files, downloaded source, and so on.

I also have a list of files I do want to back up even though they'd otherwise be excluded. For instance, I sometimes have local changes in my GIMP source directory, outsrc/gimp-master/gimp/, even though most of outsrc doesn't need to be backed up. Or /blog/tags/build in my local mirror of the shallowsky website, even though I have a rule that says directories named build shouldn't usually be backed up.

I've been using rsync's --include and --exclude to handle this. But I discovered yesterday that I'd been using them wrong, and some things I thought were getting backed up, weren't. It took some reading and experimenting before I figured out how these rsync flags actually work — which doesn't seem to be well explained anywhere.

Read more ...

Tags: , , ,
[ 16:11 Mar 19, 2023    More linux/cmdline | permalink to this entry | ]

Sun, 18 Dec 2022

View Mail Attachments from Mutt

Back in 2015, I wrote a script for the mutt mailer (or any plaintext mail program, really) to view MS Word documents (or other unfriendly formats) attached to emails. (This is unfortunately something that comes up constantly in email exchanges with League of Women Voters people —

Read more ...

Tags: , , , ,
[ 18:52 Dec 18, 2022    More tech/email | permalink to this entry | ]

Thu, 23 Jun 2022

Clicking through a Translucent Image Window

[transparent image viewer overlayed on top of topo map]

Five years ago, I wrote about Clicking through a translucent window: using X11 input shapes and how I used a translucent image window that allows click-through, positioned on top of PyTopo, to trace an image of an old map and create tracks or waypoints.

But the transimageviewer.py app that I wrote then was based on GTK2, which is now obsolete and has been removed from most Linux distro repositories. So when I found myself wanting GIS to help investigate a growing trail controversy in Pueblo Canyon, I discovered I didn't have a usable click-through image viewer.

Read more ...

Tags: , , , , ,
[ 19:08 Jun 23, 2022    More programming | permalink to this entry | ]

Sun, 12 Dec 2021

Battling Signup Spam on the Bill Tracker

I've spent a lot of the past week battling Russian spammers on the New Mexico Bill Tracker.

The New Mexico legislature just began a special session to define the new voting districts, which happens every 10 years after the census. When new legislative sessions start, the BillTracker usually needs some hand-holding to make sure it's tracking the new session. (I've been working on code to make it notice new sessions automatically, but it's not fully working yet). So when the session started, I checked the log files...

and found them full of Russian spam.

Specifically, what was happening was that a bot was going to my new user registration page and creating new accounts where the username was a paragraph of Cyrillic spam.

Read more ...

Tags: , , , , , ,
[ 18:50 Dec 12, 2021    More tech/web | permalink to this entry | ]

Fri, 03 Dec 2021

Importing Cookies from a Firefox Profile in Python

I wrote at length about my explorations into selenium to fetch stories from the New York Times (as a subscriber). But I mentioned in Part III that there was a much easier way to fetch those stories, as long as the stories didn't need JavaScript.

That way is to use normal file fetching (using urllib or requests), but with a CookieJar object containing the cookies from a Firefox session where I'd logged in.

Read more ...

Tags: , , , ,
[ 12:22 Dec 03, 2021    More programming | permalink to this entry | ]

Sat, 20 Nov 2021

Wikipedia: All Roads Lead to ... Philosophy?

At a recent LUG meeting, we were talking about various uses for web scraping, and someone brought up a Wikipedia game: start on any page, click on the first real link, then repeat on the page that comes up. The claim is that this chain always gets to Wikipedia's page on Philosophy.

We tried a few rounds, and sure enough, every page we tried did eventually get to Philosophy, usually via languages, which goes to communication, goes to discipline, action, intention, mental, thought, idea, philosophy.

It's a perfect game for a discussion of scraping. It should be an easy exercise to write a scraper to do this, right?

Read more ...

Tags: , , , ,
[ 19:31 Nov 20, 2021    More programming | permalink to this entry | ]

Thu, 11 Nov 2021

Selenium: Handling Timeouts and Errors

This is part 3 of my selenium exploration trying to fetch stories from the NY Times ((as a subscriber).

At the end of Part II, selenium was running on a server with the minimal number of X and GTK libraries installed.

But now that it can run unattended, there's nother problem: there are all kinds of ways this can fail, and your script needs to handle those errors somehow.

Before diving in, I should mention that for my original goal, fetching stories from the NY Times as a subscriber, it turned out I didn't need selenium after all. Since handling selenium errors turned out to be so brittle (as I'll describe in this article), I'm now using requests combined with a Python CookieJar. I'll write about that in a future article. Meanwhile ...

Handling Errors and Timeouts

Timeouts are a particular problem with selenium, because there doesn't seem to be any reliable way to change them so the selenium script doesn't hang for ridiculously long periods.

Read more ...

Tags: , , ,
[ 12:07 Nov 11, 2021    More programming | permalink to this entry | ]

Sun, 07 Nov 2021

Configuring Selenium to Run Headless, Without a Desktop

This is part 2 of my selenium exploration trying to fetch stories from the NY Times ((as a subscriber).

When we left off, I was learning the basics of selenium in order to fetch stories (as a subscriber) from the New York Times. Fetching stories was working properly, and all that remained was to put it in an automated script, then move it to a server where it could run automatically without my desktop machine needing to be on.

Unfortunately, that turned out to be the hardest part of the problem.

Read more ...

Tags: , , ,
[ 12:18 Nov 07, 2021    More programming | permalink to this entry | ]

Tue, 02 Nov 2021

Web Scraping with Selenium in Python

This is part 1 of my selenium exploration.

At the New Mexico GNU & Linux User Group, currently meeting virtually on Jitsi, someone expressed interest in scraping websites. Since I do quite a bit of scraping, I offered to give a tutorial on scraping with the Python module BeautifulSoup.

"What about selenium?" he asked. Sorry, I said, I've never needed selenium enough to figure it out.

But then a week later, I found I did have a need.

Read more ...

Tags: , , ,
[ 19:58 Nov 02, 2021    More programming | permalink to this entry | ]

Tue, 30 Mar 2021

Fetching Browser Cookies Programmatically

In my eternal quest for a decent RSS feed for top World/National news, I decided to try subscribing to the New York Times online. But when I went to try to add them to my RSS reader, I discovered it wasn't that easy: their login page sometimes gives a captcha, so you can't just set a username and password in the RSS reader.

A common technique for sites like this is to log in with a browser, then copy the browser's cookies into your news reading program. At least, I thought it was a common technique -- but when I tried a web search, examples were surprisingly hard to find.

None of the techniques to examine or save browser cookies were all that simple, so I ended up writing a browser_cookies.py Python script to extract cookies from chromium and firefox browsers.

Read more ...

Tags: , , , ,
[ 11:19 Mar 30, 2021    More programming | permalink to this entry | ]

Thu, 21 Jan 2021

Track Bills in the 2021 New Mexico Legislative Session

This year's New Mexico Legislative Session started Tuesday. For the last few weeks I've been madly scrambling to make sure the bugs are out of some of the New Mexico Bill Tracker's new features: notably, it now lets you switch between the current session and past sessions, and I cleaned up the caching code that tries to guard against hitting the legislative website too often.

Read more ...

Tags: , , , ,
[ 17:50 Jan 21, 2021    More politics | permalink to this entry | ]

Thu, 16 Jul 2020

Comet C/2020 F3 NEOWISE in the evening sky

[Comet C2020 F3 NEOWISE the morning of 2020-07-16 from White Rock, NM] Comet C/2020 F3 NEOWISE continues to improve, and as of Tuesday night it has moved into the evening sky (while also still being visible in the morning for a few more days).

I caught it Tuesday night at 9:30 pm. The sky was still a bit bright, and although the comet was easy in binoculars, it was a struggle to see it with the unaided eye. However, over the next fifteen minutes the sky darkened, and it looked pretty good by 9:50, considering the partly cloudy sky. I didn't attempt a photograph; this photo is from Sunday morning, in twilight and with a bright moon.

Read more ...

Tags: , , , ,
[ 12:58 Jul 16, 2020    More science/astro | permalink to this entry | ]

Sat, 11 Jul 2020

Comet C2020 F3 NEOWISE in the Morning (and eventually, the evening)

[Comet C2020F3 NEOWISE over California desert landscape, by Dbot3000]
Comet C2020F3 NEOWISE over California desert landscape. Photo by Dbot3000

I've learned not to get excited when I read about a new comet. They're so often a disappointment. That goes double for comets in the morning sky: I need a darned good reason to get up before dawn.

But the chatter among astronomers about the current comet, C2020 F3 NEOWISE, has been different. So when I found myself awake at 4 am, I grabbed some binoculars and went out on the deck to look.

And I was glad I did. NEOWISE is by far the best comet I've seen since Hale-Bopp. Which is not to say it's in Hale-Bopp's class -- certainly not. But it's easily visible to the unaided eye, with a substantial several-degree-long tail. Even in dawn twilight. Even with a bright moon. It's beautiful!

Update: the morning after I wrote that, I did get a photo, though it's not nearly as good as Dbot3000's that's shown here.


Read more ...

Tags: , , , ,
[ 18:18 Jul 11, 2020    More science/astro | permalink to this entry | ]

Fri, 29 May 2020

M is for Merging ("Dissolving") Several Geographic Shapes

[San Juan County Council Districts]
San Juan County Council districts
[San Juan County voting precincts]
San Juan County Council voting precincts

For this year's LWV NM Voter Guides at VOTE411.org, I've been doing a lot of GIS fiddling, since the system needs to know the voting districts for each race.

You would think it would be easy to find GIS for voting districts — surely that's public information? — but counties and the state are remarkably resistant to giving out any sort of data (they're happy to give you a PDF or a JPG), so finding the district data takes a lot of searching.

Often, when we finally manage to get GIS info, it isn't for what we want. For instance, for San Juan County, there's a file that claims to be County Commission districts (which would look like the image above left), but the shapes in the file are actually voting precincts (above right). A district is made up of multiple precincts; in San Juan, there are 77 precincts making up five districts.

In a case like that, you need some way of combining several shapes (a bunch of precincts) into one (a district).

GIS "Dissolving"

It turns out that the process of coalescing lots of small shapes into a smaller number of larger shapes is unintuitively called "dissolving".

Read more ...

Tags: , ,
[ 17:43 May 29, 2020    More mapping | permalink to this entry | ]

Sun, 01 Mar 2020

Plotting Epicycles

Galen Gisler, our master of Planetarium Tricks, presented something strange and cool in his planetarium show last Friday.

[inner planet orbits from north ecliptic pole, with Venus pentagram] He'd been looking for a way to visualize the "Venus Pentagram", a regularity where Venus' inferior conjunctions -- the point where Venus is approximately between Earth and the Sun -- follow a cycle of five. If you plot the conjunction positions, you'll see a pentagram, and the sixth conjunction will be almost (but not quite) in the same place where the first one was. Supposedly many ancient civilizations supposedly knew about this pattern, though as Galen noted (and I'd also noticed when researching my Stonehenge talk), the evidence is sometimes spotty.

Galen's latest trick: he moved the planetarium's observer location up above the Earth's north ecliptic pole. Then he told the planetarium to looked back at the Earth and lock the observer's position so it moves along with the Earth; then he let the planets move in fast-forward, leaving trails so their motions were plotted.

The result was fascinating to watch. You could see the Venus pentagram easily as it made its five loops toward Earth, and the loops of all the other planets as their distance from Earth changed over the course of both Earth's orbits and theirs.

You can see the patterns they make at right, with the Venus pentagram marked (click on the image for a larger version). Venus' orbit is white, Mercury is yellow, Mars is red. If you're wondering why Venus' orbit seems to go inside Mercury's, remember: this is a geocentric model, so it's plotting distance from Earth, and Venus gets both closer to and farther from Earth than Mercury does.

He said he'd shown this to the high school astronomy club and their reaction was, "My, this is complicated." Indeed. It gives insight into what a difficult problem geocentric astronomers had in trying to model planetary motion, with their epicycles and other corrections.

Of course that made me want one of my own. It's neat to watch it in the planetarium, but you can't do that every day.

So: Python, Gtk/Cairo, and PyEphem. It's pretty simple, really. The goal is to plot planet positions as viewed from high above the north ecliptic pole: so for each time step, for each planet, compute its right ascension and distance (declination doesn't matter) and convert that to rectangular coordinates. Then draw a colored line from the planet's last X, Y position to the new one. Save all the coordinates in case the window needs to redraw.

[planet orbits from north ecliptic pole] At first I tried using Skyfield, the Python library which is supposed to replace PyEphem (written by the same author). But Skyfield, while it's probably more accurate, is much harder to use than PyEphem. It uses SPICE kernels (my blog post on SPICE, some SPICE examples and notes), which means there's no clear documentation or list of which kernels cover what. I tried the kernels mentioned in the Skyfield documentation, and after running for a while the program died with an error saying its model for Jupiter in the de421.bsp kernel wasn't good beyond 2471184.5 (October 9 2053).

Rather than spend half a day searching for other SPICE kernels, I gave up on Skyfield and rewrote the program to use PyEphem, which worked beautifully and amazed me with how much faster it was: I had to rewrite my GTK code to use a timer just to slow it down to where I could see the orbits as they developed!

It's fun to watch; maybe not quite as spacey as Galen's full-dome view in the planetarium, but a lot more convenient. You need Python 3, PyEphem and the usual GTK3 introspection modules; on Debian-based systems I think the python3-gi-cairo package will pull in most of them as dependencies.

Plot your own epicycles: epicycles.py.

Tags: , , ,
[ 13:04 Mar 01, 2020    More science/astro | permalink to this entry | ]

Thu, 27 Feb 2020

Automatic Plant Watering with a Raspberry Pi

[Raspberry Pi automatic plant waterer] An automatic plant watering system is a project that's been on my back burner for years. I'd like to be able to go on vacation and not worry about whatever houseplant I'm fruitlessly nursing at the moment. (I have the opposite of a green thumb -- I have very little luck growing plants -- but I keep trying, and if nothing else, I can make sure lack of watering isn't the problem.)

I've had all the parts sitting around for quite some time, and had tried them all individually, but never seemed to make the time to put them all together. Today's "Raspberry Pi Jam" at Los Alamos Makers seemed like the ideal excuse.

Sensing Soil Moisture

First step: the moisture sensor. I used a little moisture sensor that I found on eBay. It says "YL-38" on it. It has the typical forked thingie you stick into the soil, connected to a little sensor board.

The board has four pins: power, ground, analog and digital outputs. The digital output would be the easiest: there's a potentiometer on the board that you can turn to adjust sensitivity, then you can read the digital output pin directly from the Raspberry Pi.

But I had bigger plans: in addition to watering, I wanted to keep track of how fast the soil dries out, and update a web page so that I could check my plant's status from anywhere. For that, I needed to read the analog pin.

Raspberry Pis don't have a way to read an analog input. (An Arduino would have made this easier, but then reporting to a web page would have been much more difficult.) So I used an ADS1115 16-bit I2sup>C Analog to Digital Converter board from Adafruit, along with Adafruit's ADS1x15 library. It's written for CircuitPython, but it works fine in normal Python on Raspbian.

It's simple to use. Wire power, ground, SDA and SDC to the appropriate Raspberry Pi pins (1, 6, 3 and 5 respectively). Connect the soil sensor's analog output pin with A0 on the ADC. Then

# Initialize the ADC
i2c = busio.I2C(board.SCL, board.SDA)
ads = ADS.ADS1015(i2c)
adc0 = AnalogIn(ads, ADS.P0)

# Read a value
value = adc0.value
voltage = adc0.voltage

With the probe stuck into dry soil, it read around 26,500 value, 3.3 volts. Damp soil was more like 14528, 1.816V. Suspended in water, it was more like 11,000 value, 1.3V.

Driving a Water Pump

The pump also came from eBay. They're under $5; search for terms like "Mini Submersible Water Pump 5V to 12V DC Aquarium Fountain Pump Micro Pump". As far as driving it is concerned, treat it as a motor. Which means you can't drive it directly from a Raspberry Pi pin: they don't generate enough current to run a motor, and you risk damaging the Pi with back-EMF when the motor stops.

[Raspberry Pi automatic plant waterer wiring] Instead, my go-to motor driver for small microcontroller projects is a SN754410 SN754410 H-bridge chip. I've used them before for driving little cars with a Raspberry Pi or with an Arduino. In this case the wiring would be much simpler, because there's only one motor and I only need to drive it in one direction. That means I could hardwire the two motor direction pins, and the only pin I needed to control from the Pi was the PWM motor speed pin. The chip also needs a bunch of ground wires (which it uses as heat sinks), a line to logic voltage (the Pi's 3.3V pin) and motor voltage (since it's such a tiny motor, I'm driving it from the Pi's 5v power pin).

Here's the full wiring diagram.

Driving a single PWM pin is a lot simpler than the dual bidirectional motor controllers I've used in other motor projects.

GPIO.setmode(GPIO.BCM)
GPIO.setup(23, GPIO.OUT)
pump = GPIO.PWM(PUMP_PIN, 50)
pump.start(0)

# Run the motor at 30% for 2 seconds, then stop.
pump.ChangeDutyCycle(30)
time.sleep(2)
pump.ChangeDutyCycle(0)

The rest was just putting together some logic: check the sensor, and if it's too dry, pump some water -- but only a little, then wait a while for the water to soak in -- and repeat. Here's the full plantwater.py script. I haven't added the website part yet, but the basic plant waterer is ready to use -- and ready to demo at tonight's Raspberry Pi Jam.

Tags: , ,
[ 13:50 Feb 27, 2020    More hardware | permalink to this entry | ]

Sat, 08 Feb 2020

Displaying Quotes on a Kiosk -- and Javascript Memory Leaks

The LWV had a 100th anniversary celebration earlier this week. In New Mexico, that included a big celebration at the Roundhouse. One of our members has collected a series of fun facts that she calls "100-Year Minutes". You can see them at lwvnm.org. She asked me if it would be possible to have them displayed somehow during our display at the Roundhouse.

Of course! I said. "Easy, no problem!" I said.

Famous last words.

There are two parts: first, display randomly (or sequentially) chosen quotes with large text in a fullscreen window. Second, set up a computer (the obvious choice is a Raspberry Pi) run the kiosk automatically. This article only covers the first part; I'll write about the Raspberry Pi setup separately.

A Simple Plaintext Kiosk Python Script

When I said "easy" and "no problem", I was imagining writing a little Python program: get text, scale it to the screen, loop. I figured the only hard part would be the scaling. the quotes aren't all the same length, but I want them to be easy to read, so I wanted each quote displayed in the largest font that would let the quote fill the screen.

Indeed, for plaintext it was easy. Using GTK3 in Python, first you set up a PangoCairo layout (Cairo is the way you draw in GTK3, Pango is the font/text rendering library, and a layout is Pango's term for a bunch of text to be rendered). Start with a really big font size, ask PangoCairo how large the layout would render, and if it's so big that it doesn't fit in the available space, reduce the font size and try again. It's not super elegant, but it's easy and it's fast enough. It only took an hour or two for a working script, which you can see at quotekiosk.py.

But some of the quotes had minor HTML formatting. GtkWebkit was orphaned several years ago and was never available for Python 3; the only Python 3 option I know of for displaying HTML is Qt5's QtWebEngine, which is essentially a fully functioning browser window.

Which meant that it seeming made more sense to write the whole kiosk as a web page, with the resizing code in JavaScript. I say "seemingly"; it didn't turn out that way.

JavaScript: Resizing Text to Fit Available Space

The hard part about using JavaScript was the text resizing, since I couldn't use my PangoCairo resizing code.

Much web searching found lots of solutions that resize a single line to fit the width of the screen, plus a lot of hand-waving suggestions that didn't work. I finally found a working solution in a StackOverflow thread: Fit text perfectly inside a div (height and width) without affecting the size of the div. The only one of the three solutions there that actually worked was the jQuery one. It basically does the same thing my original Python script did: check element.scrollHeight and if it overflows, reduce the font size and try again.

I used the jquery version for a little while, but eventually rewrote it to pure javascript so I wouldn't have to keep copying jquery-min.js around.

JS Timers on Slow Machines

There are two types of timers in Javascript: setTimeout, which schedules something to run once N seconds from now, and setInterval, which schedules something to run repeatedly every N seconds. At first I thought I wanted setInterval, since I want the kiosk to keep running, changing its quote every so often.

I coded that, and it worked okay on my laptop, but failed miserably on the Raspberry Pi Zero W. The Pi, even with a lightweight browser like gpreso (let alone chromium), takes so long to load a page and go through the resize-and-check-height loop that by the time it has finally displayed, it's about ready for the timer to fire again. And because it takes longer to scale a big quote than a small one, the longest quotes give you the shortest time to read them.

So I switched to setTimeout instead. Choose a quote (since JavaScript makes it hard to read local files, I used Python to read all the quotes in, turn them into a JSON list and write them out to a file that I included in my JavaScript code), set the text color to the background color so you can't see all the hacky resizing, run the resize loop, set the color back to the foreground color, and only then call setTimeout again:

function newquote() {
    // ... resizing and other slow stuff here

    setTimeout(newquote, 30000);
}

// Display the first page:
newquote();

That worked much better on the Raspberry Pi Zero W, so I added code to resize images in a similar fashion, and added some fancy CSS fade effects that it turned out the Pi was too slow to run, but it looks nice on a modern x86 machine. The full working kiosk code is quotekioska>).

Memory Leaks in JavaScript's innerHTML

I ran it for several hours on my development machine and it looked great. But when I copied it to the Pi, even after I turned off the fades (which looked jerky and terrible on the slow processor), it only ran for ten or fifteen minutes, then crashed. Every time. I tried it in several browsers, but they all crashed after running a while.

The obvious culprit, since it ran fine for a while then crashed, was a memory leak. The next step was to make a minimal test case.

I'm using innerHTML to change the kiosk content, because it's the only way I know of to parse and insert a snippet of HTML that may or may not contain paragraphs and other nodes. This little test page was enough to show the effect:

<h1>innerHTML Leak</h1>

<p id="thecontent">
</p>

<script type="text/javascript">
var i = 0;
function changeContent() {
    var s = "Now we're at number " + i;
    document.getElementById("thecontent").innerHTML = s;
    i += 1;

    setTimeout(changeContent, 2000);
}

changeContent();
</script>

Chromium has a nice performance recording tool that can show you memory leaks. (Firefox doesn't seem to have an equivalent, alas.)

[Chrome performance graph showing innerHTML node leak] To test a leak, go to More Tools > Developer Tools and choose the Performance tab. Load your test page, then click the Record button. Run it for a while, like a couple of minutes, then stop it and you'll see a graph like this (click on the image for a full-size version).

Both the green line, Nodes, and the blue line, JS Heap, are going up. But if you run it for longer, say, ten minutes, the garbage collector eventually runs and the JS Heap line drops back down. The Nodes line never does: the node count just continues going up and up and up no matter how long you run it.

So it looks like that's the culprit: setting innerHTML adds a new node (or several) each time you call it, and those nodes are never garbage collected. No wonder it couldn't run for long on the poor Raspberry Pi Zero with 512Gb RAM (the Pi 3 with 1Gb didn't fare much better).

It's weird that all browsers would have the same memory leak; maybe something about the definition of innerHTML causes it. I'm not enough of a Javascript expert to know, and the experts I was able to find didn't seem to know anything about either why it happened, or how to work around it.

Python html2text

So I gave up on JavaScript and went back to my original Python text kiosk program. After reading in an HTML snippet, I used the Python html2text module to convert the snippet to text, then displayed it. I added image resizing using GdkPixbuf and I was good to go.

quotekiosk.py ran just fine throughout the centennial party, and no one complained about the formatting not being fancy enough. A happy ending, complete with cake and lemonade. But I'm still curious about that JavaScript leak, and whether there's a way to work around it. Anybody know?

Tags: , , ,
[ 18:48 Feb 08, 2020    More tech/web | permalink to this entry | ]

Sat, 01 Feb 2020

Migrate a sqlite3 Flask App to Postgresql

The New Mexico legislature is in session again, which means the New Mexico Bill Tracker I wrote last year is back in season. But I guess the word has gotten out, because this year, I started seeing a few database errors. Specifically, "sqlite3.OperationalError: database is locked".

It turns out that even read queries on an sqlite3 database inside flask and sqlalchemy can sometimes keep the database open indefinitely. Consider something like:

    userbills = user.get_bills()    # this does a read query

    # Do some slow operations that don't involve the database at all
    for bill in userbills:
        slow_update_involving_web_scraping(bill)

    # Now bills are all updated; add and commit them.
    # Here's where the write operations start.
    for bill in userbills:
        db.session.add(bill)
    db.session.commit()

I knew better than to open a write query that might keep the database open during all those long running operations. But apparently, when using sqlite3, even the initial query of the database to get the user's bill list opens the database and keeps it open ... until when? Can you close it manually, then reopen it when you're ready? Does it help to call db.session.commit() after the read query? No one seems to know, and it's not obvious how to test to find out.

I've suspected for a long time that sqlite was only a temporary solution. While developing the billtracker, I hit quite a few difficulties where the answer turned out to be "well, this would be easy in a real database, but sqlite doesn't support that". I figured I'd eventually migrate to postgresql. But I'm such a database newbie that I'd been putting it off.

And rightly so. It turns out that migrating an existing database from sqlite3 to postgresql isn't something that gets written about much; I really couldn't find any guides on it. Apparently everybody but me just chooses the right database to begin with? Anyway, here are the steps on Debian. Obviously, install postgresql first.

Create a User and a Database

Postgresql has its own notion of users, which you need to create. At least on Debian, the default is that if you create a postgres user named martha, then the Linux user martha on the same machine can access databases that the postgres user martha has access to. This is controlled by the "peer" auth method, which you can read about in the postgresql documentation on pg_hba.conf.

First su to the postgres Linux user and run psql:

$ sudo su - postgres
$ psql

Inside psql, create a postgresql user with the same name as your flask user, and create a database for that user:

CREATE USER myflaskuser WITH PASSWORD 'password';
ALTER ROLE myflaskuser SET client_encoding TO 'utf8';
ALTER ROLE myflaskuser SET default_transaction_isolation TO 'read committed';
ALTER ROLE myflaskuser SET timezone TO 'UTC';

CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO myflaskuser;

If you like, you can also create a user yourusername and give it access to the same database, to make debugging easier.

With the database created, the next step is to migrate the old data from the sqlite database.

pgloader (if you have a very recent pgloader)

Using sqlalchemy in my flask app meant that I could use flask db upgrade to create the database schema in any database I chose. It does a lovely job of creating an empty database. Unfortunately, that's no help if you already have an existing database full of user accounts.

Some people suggested exporting data in either SQL or CSV format, then importing it into postgresql. Bad idea. There are many incompatibilities between the two databases: identifiers that work in sqlite but not in postgresql (like "user", which is a reserved word in postgres but a common table name in flask-based apps), capitalization of column names, incompatible date formats, and probably many more.

A program called pgloader takes care of many (but not all) of the incompatibilities. Create a file -- I'll call it migrate.pgloader -- like this:

load database
    from 'latest-sqlite-file.db'
    into postgresql:///new_db_name

with include drop, quote identifiers, create tables, create indexes, reset sequences

set work_mem to '16MB', maintenance_work_mem to '512 MB';

Then, from a Linux user who has access to the database (e.g. the myflaskuser you created earlier), run pgloader migrate.pgloader.

That worked nicely on my Ubuntu 19.10 desktop, which has pgloader 3.6.1. It failed utterly on the server, which is running Debian stable and pgloader 3.3.2. Building the latest pgloader from source didn't work on Debian either; it's based on Common Lisp, and the older CL on Debian dumped me into some kind of assembly debugger when I tried to build pgloader. Rather than build CL from source too, I looked for another option.

On an Older OS: Use pgloader Remotely

Postgresql can take commands from remote machines. So you can configure postgresql to accept remote connections, then run the migration from a machine with a new enough pgloader version.

There are two files to edit. The location of postgresql's configuration directory varies with version, so do a locate pg_hba.conf to find it. In that directory, first edit pg_hba.conf and add these lines to the end to allow net socket connections from IP4 and IP6:

host  all     all   0.0.0.0/0     md5
host  all     all   ::/0          md5

In the same directory, edit postgresql.conf and search for listen_addr. Comment out the localhost line if it's uncommented, and add this to allow connections from anywhere, not just localhost:

listen_addresses = '*'

Then restart the database with

service postgresql restart

Modify the migrate.pgloader file from the previous section so the "into" line looks like

    into postgresql://username:password@host/dbname
The username there is the postgres username, if you made that different from the Unix username. You need to use a password because postgres is no longer using peer auth (see that postgres documentation file I linked earlier).

Assuming this You're done with the remote connection part. If you don't need remote database connections for your app, you can now edit postgresql.conf, comment out that listen_addresses = '*' line, and restart the database again with service postgresql restart. Don't remove the two lines you added in pg_hba.conf; flask apparently needs them.

You're ready for the migration. Make sure you have the latest copy of the server's sqlite database, then, from your desktop, run:

pgloader migrate.pgloader

Migrate Autoincrements to Sequences

But that's not enough. If you're using any integer primary keys that autoincrement -- a pretty common database model -- postgresql doesn't understand that. Instead, it has sequence objects. You need to define a sequence, tie it to a table, and tell postgresql that when it adds a new object to the table, the default value of id is the maximum number in the corresponding sequence. Here's how to do that for the table named "user":

CREATE SEQUENCE user_id_seq OWNED by "user".id;
ALTER TABLE "user" ALTER COLUMN id SET default nextval('user_id_seq');
SELECT setval(pg_get_serial_sequence('user', 'id'), coalesce(max(id)+1,1), false) FROM "user";

Note the quotes around "user" because otherwise user is a postgresql reserved word. Repeat these three lines for every table in your database, except that you don't need the quotes around any table name except user.

Incidentally, I've been told that using autoincrement/sequence primary keys isn't best practice, because it can be a bottleneck if lots of different objects are being created at once. I used it because all the models I was following when I started with flask worked that way, but eventually I plan try to switch to using some other unique primary key.

Update: Turns out there was another problem with the sequences, and it was pretty annoying. I ended up with a bunch of indices with names like "idx_15517_ix_user_email" when they should have been "ix_user_email". The database superficially worked fine, but it havoc ensues if you ever need to do a flask/sqlalchemy/alembic migration, since sqlalchemy doesn't know anything about those indices with the funny numeric names. It's apparently possible to rename indices in postgresql, but it's a tricky operation that has to be done by hand for each index.

Now the database should be ready to test.

Test

Your flask app probably has something like this in config.py:

    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
        'sqlite:///' + os.path.join(basedir, 'dbname.db')

If so, you can export DATABSE_URL=postgresql:///dbname and then test it as you usually would. If you normally test on a local machine and not on the server, remember you can tell flask's test server to accept connections from remote machines with flask run --host=0.0.0.0

Database Backups

You're backing up your database, right? That's easier in sqlite where you can just copy the db file.

From the command line, you can back up a postgresql database with: pg_dump dbname > dbname-backup.pg You can do that from Python in a subprocess:

    with open(backup_file, 'w') as fp:
        subprocess.call(["pg_dump", dbname], stdout=fp)

Verify You're Using The New Database

I had some problems with that DATABASE_URL setting; I'd never used it so I didn't realize that it wasn't in the right place and didn't actually work. So I ran through my migration steps, changed DATABASE_URL, thought I was done, and realized later that the app was still running off sqlite3.

It's better to know for sure what your app is running. For instance, you can add a route to routes.py that prints details like that.

You can print app.config["SQLALCHEMY_DATABASE_URI"]. That's enough in theory, but I wanted to know for sure. Turns out str(db.session.get_bind()) will print the connection the flask app's database is actually using. So I added a route that prints both, plus some other information about the running app.

Whew! I was a bit surprised that migrating was as tricky as it was, and that there wasn't more documentation for it. Happy migrations, everyone.

Tags: , , , , ,
[ 12:34 Feb 01, 2020    More tech/web | permalink to this entry | ]

Tue, 14 Jan 2020

Plotting War

A recent article on Pharyngula blog, You ain’t no fortunate one, discussed US wars, specifically the qeustion: depending on when you were born, for how much of your life has the US been at war?

It was an interesting bunch of plots, constantly increasing until for people born after 2001, the percentage hit 100%.

Really? That didn't seem right. Wasn't the US in a lot of wars in the past? When I was growing up, it seemed like we were always getting into wars, poking our nose into other countries' business. Can it really be true that we're so much more warlike now than we used to be?

It made me want to see a plot of when the wars were, beyond Pharyngula's percentage-of-life pie charts. So I went looking for data.

The best source of war dates I could find was American Involvement in Wars from Colonial Times to the Present. I pasted that data into a table and reformatted it to turn it into Python data, and used matplotlib to plot it as a Gantt chart. (Script here: us-wars.py.)

[US Wars Since 1900]

Sure enough. If that Thoughtco page with the war dates is even close to accurate -- it could be biased toward listing recent conflicts, but I didn't find a more authoritative source for war dates -- the prevalence of war took a major jump in 2001. We used to have big gaps between wars, and except for Vietnam, the wars we were involved with were short, mostly less than a year each. But starting in 2001, we've been involved in a never-ending series of overlapping wars unprecedented in US history.

The Thoughtco page had wars going back to 1675, so I also made a plot showing all of them (click for the full-sized version). It's no different: short wars, not overlapping, all the way back to before the revolution. We've seen nothing in the past like the current warmongering. [US Wars Since 1675]

Depressing. Climate change isn't the only phenomenon showing a modern "hockey stick" curve, it seems.

Tags: , , ,
[ 12:25 Jan 14, 2020    More politics | permalink to this entry | ]

Tue, 01 Oct 2019

Making Web Maps using Python, Folium and Shapefiles

A friend recently introduced me to Folium, a quick and easy way of making web maps with Python.

The Folium Quickstart gets you started in a hurry. In just two lines of Python (plus the import line), you can write an HTML file that you can load in any browser to display a slippy map, or you can display it inline in a Jupyter notebook.

Folium uses the very mature Leaflet JavaScript library under the hood. But it lets you do all the development in a few lines of Python rather than a lot of lines of Javascript.

Having run through most of the quickstart, I was excited to try Folium for showing GeoJSON polygons. I'm helping with a redistricting advocacy project; I've gotten shapefiles for the voting districts in New Mexico, and have been wanting to build a map that shows them which I can then extend for other purposes.

Step 1: Get Some GeoJSON

The easiest place to get voting district data is from TIGER, the geographic arm of the US Census.

For the districts resulting from the 2010 Decadal Census, start here: Cartographic Boundary Files - Shapefile (you can also get them as KML, but not as GeoJSON). There's a category called "Congressional Districts: 116th Congress", and farther down the page, under "State-based Files", you can get shapefiles for the upper and lower houses of your state.

You can also likely download them from at www2.census.gov/geo/tiger/TIGER2010/, as long as you can figure out how to decode the obscure directory names. ELSD and POINTLM, so the first step is to figure out what those mean; I never found anything that could decode them.

(Before I found the TIGER district data, I took a more roundabout path that involved learning how to merge shapes; more on that in a separate post.)

Okay, now you have a shapefile (unzip the TIGER file to get a bunch of files with names like cb_2018_35_sldl_500k.* -- shape "files" are an absurd ESRI concept that actually use seven separate files for each dataset, so they're always packaged as a zip archive and programs that read shapefiles expect that when you pass them a .shp, there will be a bunch of other files with the same basename but different extensions in the same directory).

But Folium can't handle shapefiles, only GeoJSON. You can do that translation with a GDAL command:

ogr2ogr -t_srs EPSG:4326 -f GeoJSON file.json file.shp

Or you can do it programmatically with the GDAL Python bindings:

def shapefile2geojson(infile, outfile, fieldname):
    '''Translate a shapefile to GEOJSON.'''
    options = gdal.VectorTranslateOptions(format="GeoJSON",
                                          dstSRS="EPSG:4326")
    gdal.VectorTranslate(outfile, infile, options=options)

The EPSG:4326 specifier, if you read man ogr2ogr, is supposedly for reprojecting the data into WGS84 coordinates, which is what most web maps want (EPSG:4326 is an alias for WGS84). But it has an equally important function: even if your input shapefile is already in WGS84, adding that option somehow ensures that GDAL will use degrees as the output unit. The TIGER data already uses degrees so you don't strictly need that, but some data, like the precinct data I got from UNM RGIS, uses other units, like meters, which will confuse Folium and Leaflet. And the TIGER data isn't in WGS84 anyway; it's in GRS1980 (you can tell by reading the .prj file in the same directory as the .shp). Don't ask me about details of all these different geodetic reference systems; I'm still trying to figure it all out. Anyway, I recommend adding the EPSG:4326 as the safest option.

Step 2: Show the GeoJSON in a Folium Map

In theory, looking at the Folium Quickstart, all you need is folium.GeoJson(filename, name='geojson').add_to(m). In practice, you'll probably want to more, like

Each of these requires some extra work.

You can color the regions with a style function:

folium.GeoJson(jsonfile, style_function=style_fcn).add_to(m)

Here's a simple style function that chooses random colors:

import random

def random_html_color():
    r = random.randint(0,256)
    g = random.randint(0,256)
    b = random.randint(0,256)
    return '#%02x%02x%02x' % (r, g, b)

def style_fcn(x):
    return { 'fillColor': random_html_color() }

I wanted to let the user choose regions by clicking, but it turns out Folium doesn't have much support for that (it may be coming in a future release). You can do it by reading the GeoJSON yourself, splitting it into separate polygons and making them all separate Folium Polygons or GeoJSON objects, each with its own click behavior; but if you don't mind highlights and popups on mouseover instead of requiring a click, that's pretty easy. For highlighting in red whenever the user mouses over a polygon, set this highlight_function:

def highlight_fcn(x):
    return { 'fillColor': '#ff0000' }

For tooltips:

tooltip = folium.GeoJsonTooltip(fields=['NAME'])
In this case, 'NAME' is the field in the shapefile that I want to display when the user mouses over the region. If you're not sure of the field name, the nice thing about GeoJSON is that it's human readable. Generally you'll want to look inside "features", for "properties" to find the fields defined for each polygon. For instance, if I use jq to prettyprint the JSON generated for the NM state house districts:
$ jq . House.json | less
{
  "type": "FeatureCollection",
  "name": "cb_2018_35_sldl_500k",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "STATEFP": "35",
        "SLDLST": "009",
        "AFFGEOID": "620L600US35009",
        "GEOID": "35009",
        "NAME": "9",
        "LSAD": "LL",
        "LSY": "2018",
        "ALAND": 3405159792,
        "AWATER": 5020507
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
...

If you still aren't sure which property name means what (for example, "NAME" could be anything), just keep browsing through the JSON file to see which fields change from feature to feature and give the values you're looking for, and it should become obvious pretty quickly.

Here's a working code example: polidistmap.py, and here's an example of a working map:

Tags: , , ,
[ 12:29 Oct 01, 2019    More mapping | permalink to this entry | ]

Thu, 01 Aug 2019

Silly Moon Names: a Nice Beginning Python Project

Every time the media invents a new moon term -- super blood black wolf moon, or whatever -- I roll my eyes.

[Lunar Perigee and Apogee sizes] First, this ridiculous "supermoon" thing is basically undetectable to the human eye. Here's an image showing the relative sizes of the absolute closest and farthest moons. It's easy enough to tell when you see the biggest and smallest moons side by side, but when it's half a degree in the sky, there's no way you'd notice that one was bigger or smaller than average.

Even better, here's a link to an animation of how the moon changes size and "librates" -- tilts so that we can see a little bit over onto the moon's far side -- during the course of a month.

Anyway, the media seem to lap this stuff up and every month there's a new stupid moon term. I'm sure nearly every astronomer was relieved to see the thoroughly sensible Gizmodo article yesterday, Oh My God Stop It With the Fake Moon Names What the Hell Is a 'Black Moon' That Isn't Anything. Not that that will stop the insanity.

If You Can't Beat 'Em, Join 'Em

And then, talking about the ridiculous moon name phenom with some friends, I realized I could play this game too. So I spent twenty minutes whipping up my own Silly Moon Name Generator.

It's super simple -- it just uses Linux' built-in dictionary, with no sense of which words are common, or adjectives or nouns or what. Of course it would be funnier with a hand-picked set of words, but there's a limit to how much time I want to waste on this.

You can add a parameter ?nwords=5 (or whatever number) if you want more or fewer words than four.

How Does It Work?

Random phrase generators like this are a great project for someone just getting started with Python. Python is so good at string manipulation that it makes this sort of thing easy: it only takes half a page of code to do something fun. So it's a great beginner project that most people would probably find more rewarding than cranking out Fibonacci numbers (assuming you're not a Fibonacci geek like I am). For more advanced programmers, random phrase generation can still be a fun and educational project -- skip to the end of this article for ideas.

For the basics, this is all you need: I've added comments explaining the code.

import random


def hypermoon(filename, nwords=4):
    '''Return a silly moon name with nwords words,
       each taken from a word list in the given filename.
    '''
    fp = open(filename)
    lines = fp.readlines()

    # A list to store the words to describe the moon:
    words = []
    for i in range(nwords):    # This will be run nwords times
        # Pick a random number between 0 and the number of lines in the file:
        whichline = random.randint(0, len(lines))

        # readlines() includes whitespace like newline characters.
        # Use whichline to pull one line from the file, and use
        # strip() to remove any extra whitespace:
        word = lines[whichline].strip()

        # Append it to our word list:
        words.append(word)

    # The last word in the phrase will be "moon", e.g.
    # super blood wolf black pancreas moon
    words.append("moon")

    # ' '.join(list) combines all the words with spaces between them
    return ' '.join(words)


# This is called when the program runs:
if __name__ == '__main__':
    random.seed()

    print(hypermoon('/usr/share/dict/words', 4))

A More Compact Format

In that code example, I expanded everything to try to make it clear for beginning programmers. In practice, Python lets you be a lot more terse, so the way I actually wrote it was more like:

def hypermoon(filename, nwords=4):
    with open(filename, encoding='utf-8') as fp:
        lines = fp.readlines()

    words = [ lines[random.randint(0, len(lines))].strip()
              for i in range(nwords) ]
    words.append('moon')
    return ' '.join(words)

There are three important differences (in bold):

Opening a file using "with" ensures the file will be closed properly when you're done with it. That's not important in this tiny example, but it's a good habit to get into.

I specify the 'utf-8' encoding when I open the file because when I ran it as a web app, it turned out the web server used the ASCII encoding and I got Python errors because there are accented characters in the dictionary somewhere. That's one of those Python annoyances you get used to when going beyond the beginner level.

The way I define words all in one line (well, it's conceptually one long line, though I split it into two so each line stays under 72 characters) is called a list comprehension. It's a nice compact alternative to defining an empty list [] and then calling append() a bunch of times, like I did in the first example.

Initially they might seem harder to read, but list comprehensions can actually make code clearer once you get used to them.

A Python Driven Web Page

Finally, to make it work as a web page, I added the CGI module. That isn't really a beginner thing so I won't paste it here, but you can see the CGI version at hypermoon.py on GitHub.

I should mention that there's some debate over CGI in Python. The movers and shakers in the Python community don't approve of CGI, and there's a plan to remove it from upcoming Python versions. The alternative is to use technologies like Flask or Django. while I'm a fan of Flask and have used it for several projects, it's way overkill for something like this, mostly because of all the special web server configuration it requires (and Django is far more heavyweight than Flask). In any case, be aware that the CGI module may be removed from Python's standard library in the near future. With any luck, python-cgi will still be available via pip install or as Linux distro packages.

More Advanced Programmers: Making it Funnier

I mentioned earlier that I thought the app would be a lot funnier with a handpicked set of words. I did that long, long ago with my Star Party Observing Report Generator (written in Perl; I hadn't yet started using Python back in 2001). That's easy and fun if you have the time to spare, or a lot of friends contributing.

You could instead use words taken from a set of input documents. For instance, only use words that appear in Shakespeare's plays, or in company mission statements, or in Wikipedia articles about dog breeds (this involves some web scraping, but Python is good at that too; I like BeautifulSoup).

Or you could let users contribute their own ideas for good words to use, storing the user suggestions in a database.

Another way to make the words seem more appropriate and less random might be to use one of the many natural language packages for Python, such as NLTK, the Natural Language Toolkit. That way, you could control how often you used adjectives vs. nouns, and avoid using verbs or articles at all.

Random word generators seem like a silly and trivial programming exercise -- because they are! But they're also a fun starting point for more advanced explorations with Python.

Tags: , , ,
[ 14:24 Aug 01, 2019    More humor | permalink to this entry | ]

Wed, 17 Jul 2019

Ray-Tracing Digital Elevation Data in 3D with Povray (Part III)

This is Part III of a four-part article on ray tracing digital elevation model (DEM) data. The goal: render a ray-traced image of mountains from a digital elevation model (DEM).

In Part II, I showed how the povray camera position and angle need to be adjusted based on the data, and the position of the light source depends on the camera position.

In particular, if the camera is too high, you won't see anything because all the relief will be tiny invisible bumps down below. If it's too low, it might be below the surface and then you can't see anything. If the light source is too high, you'll have no shadows, just a uniform grey surface.

That's easy enough to calculate for a simple test image like the one I used in Part II, where you know exactly what's in the file. But what about real DEM data where the elevations can vary?

Explore Your Test Data

[Hillshade of northern New Mexico mountains] For a test, I downloaded some data that includes the peaks I can see from White Rock in the local Jemez and Sangre de Cristo mountains.

wget -O mountains.tif 'http://opentopo.sdsc.edu/otr/getdem?demtype=SRTMGL3&west=-106.8&south=35.1&east=-105.0&north=36.5&outputFormat=GTiff'

Create a hillshade to make sure it looks like the right region:

gdaldem hillshade mountains.tif hillshade.png
pho hillshade.png
(or whatever your favorite image view is, if not pho). The image at right shows the hillshade for the data I'm using, with a yellow cross added at the location I'm going to use for the observer.

Sanity check: do the lowest and highest elevations look right? Let's look in both meters and feet, using the tricks from Part I.

>>> import gdal
>>> import numpy as np

>>> demdata = gdal.Open('mountains.tif')
>>> demarray = np.array(demdata.GetRasterBand(1).ReadAsArray())
>>> demarray.min(), demarray.max()
(1501, 3974)
>>> print([ x * 3.2808399 for x in (demarray.min(), demarray.max())])
[4924.5406899, 13038.057762600001]

That looks reasonable. Where are those highest and lowest points, in pixel coordinates?

>>> np.where(demarray == demarray.max())
(array([645]), array([1386]))
>>> np.where(demarray == demarray.min())
(array([1667]), array([175]))

Those coordinates are reversed because of the way numpy arrays are organized: (1386, 645) in the image looks like Truchas Peak (the highest peak in this part of the Sangres), while (175, 1667) is where the Rio Grande disappears downstream off the bottom left edge of the map -- not an unreasonable place to expect to find a low point. If you're having trouble eyeballing the coordinates, load the hillshade into GIMP and watch the coordinates reported at the bottom of the screen as you move the mouse.

While you're here, check the image width and height. You'll need it later.

>>> demarray.shape
(1680, 2160)
Again, those are backward: they're the image height, width.

Choose an Observing Spot

Let's pick a viewing spot: Overlook Point in White Rock (marked with the yellow cross on the image above). Its coordinates are -106.1803, 35.827. What are the pixel coordinates? Using the formula from the end of Part I:

>>> import affine
>>> affine_transform = affine.Affine.from_gdal(*demdata.GetGeoTransform())
>>> inverse_transform = ~affine_transform
>>> [ round(f) for f in inverse_transform * (-106.1803, 35.827) ]
[744, 808]

Just to double-check, what's the elevation at that point in the image? Note again that the numpy array needs the coordinates in reverse order: Y first, then X.

>>> demarray[808, 744], demarray[808, 744] * 3.28
(1878, 6159.839999999999)

1878 meters, 6160 feet. That's fine for Overlook Point. We have everything we need to set up a povray file.

Convert to PNG

As mentioned in Part II, povray will only accept height maps as a PNG file, so use gdal_translate to convert:

gdal_translate -ot UInt16 -of PNG mountains.tif mountains.png

Use the Data to Set Camera and Light Angles

The camera should be at the observer's position, and povray needs that as a line like

    location <rightward, upward, forward>
where those numbers are fractions of 1.

The image size in pixels is 2160x1680, and the observer is at pixel location (744, 808). So the first and third coordinates of location should be 744/2160 and 808/1680, right? Well, almost. That Y coordinate of 808 is measured from the top, while povray measures from the bottom. So the third coordinate is actually 1. - 808/1680.

Now we need height, but how do you normalize that? That's another thing nobody seems to document anywhere I can find; but since we're using a 16-bit PNG, I'll guess the maximum is 216 or 65536. That's meters, so DEM files can specify some darned high mountains! So that's why that location <0, .25, 0> line I got from the Mapping Hacks book didn't work: it put the camera at .25 * 65536 or 16,384 meters elevation, waaaaay up high in the sky.

My observer at Overlook Point is at 1,878 meters elevation, which corresponds to a povray height of 1878/65536. I'll use the same value for the look_at height to look horizontally. So now we can calculate all three location coordinates: 744/2160 = .3444, 1878/65536 = 0.0287, 1. - 808/1680 = 0.5190:

    location <.3444, 0.0287, .481>

Povray Glitches

Except, not so fast: that doesn't work. Remember how I mentioned in Part II that povray doesn't work if the camera location is at ground level? You have to put the camera some unspecified minimum distance above ground level before you see anything. I fiddled around a bit and found that if I multiplied the ground level height by 1.15 it worked, but 1.1 wasn't enough. I have no idea whether that will work in general. All I can tell you is, if you're setting location to be near ground level and the generated image looks super dark regardless of where your light source is, try raising your location a bit higher. I'll use 1878/65536 * 1.15 = 0.033.

For a first test, try setting look_at to some fixed place in the image, like the center of the top (north) edge (right .5, forward 1):

    location <.3444, 0.033, .481>
    look_at <.5, 0.033, 1>

That means you won't be looking exactly north, but that's okay, we're just testing and will worry about that later. The middle value, the elevation, is the same as the camera elevation so the camera will be pointed horizontally. (look_at can be at ground level or even lower, if you want to look down.)

Where should the light source be? I tried to be clever and put the light source at some predictable place over the observer's right shoulder, and most of the time it didn't work. I ended up just fiddling with the numbers until povray produced visible terrain. That's another one of those mysterious povray quirks. This light source worked fairly well for my DEM data, but feel free to experiment:

light_source { <2, 1, -1> color <1,1,1> }

All Together Now

Put it all together in a mountains.pov file:

camera {
    location <.3444, 0.0330, .481>
    look_at <.5, 0.0287, 1>
}

light_source { <2, 1, -1> color <1,1,1> }

height_field {
    png "mountains.png"
    smooth
    pigment {
        gradient y
        color_map {
            [ 0 color <.7 .7 .7> ]
            [ 1 color <1 1 1> ]
        }
    }
    scale <1, 1, 1>
}
[Povray-rendering of Black and Otowi Mesas from Overlook Point] Finally, you can run povray and generate an image!
povray +A +W800 +H600 +INAME_OF_POV_FILE +OOUTPUT_PNG_FILE

And once I finally got to this point I could immediately see it was correct. That's Black Mesa (Tunyo) out in the valley a little right of center, and I can see White Rock canyon in the foreground with Otowi Peak on the other side of the canyon. (I strongly recommend, when you experiment with this, that you choose a scene that's very distinctive and very familiar to you, otherwise you'll never be sure if you got it right.)

Next Steps

Now I've accomplished my goal: taking a DEM map and ray-tracing it. But I wanted even more. I wanted a 360-degree panorama of all the mountains around my observing point.

Povray can't do that by itself, but in Part IV, I'll show how to make a series of povray renderings and stitch them together into a panorama. Part IV, Making a Panorama from Raytraced DEM Images

Tags: , , , , ,
[ 16:43 Jul 17, 2019    More mapping | permalink to this entry | ]

Sun, 07 Jul 2019

Working with Digital Elevation Models with GDAL and Python (Ray Tracing Elevation Data, Part I)

Part III of a four-part article:

One of my hiking buddies uses a phone app called Peak Finder. It's a neat program that lets you spin around and identify the mountain peaks you see.

Alas, I can't use it, because it won't work without a compass, and [expletive deleted] Samsung disables the compass in their phones, even though the hardware is there. I've often wondered if I could write a program that would do something similar. I could use the images in planetarium shows, and could even include additions like predicting exactly when and where the moon would rise on a given date.

Before plotting any mountains, first you need some elevation data, called a Digital Elevation Model or DEM.

Get the DEM data

Digital Elevation Models are available from a variety of sources in a variety of formats. But the downloaders and formats aren't as well documented as they could be, so it can be a confusing mess.

USGS

[Typical experience with USGS map tiles not loading] USGS steers you to the somewhat flaky and confusing National Map Download Client. Under Data in the left sidebar, click on Elevation Products (3DEP), select the accuracy you need, then zoom and pan the map until it shows what you need.

Current Extent doesn't seem to work consistently, so use Box/Point and sweep out a rectangle. Then click on Find products. Each "product" should have a download link next to it, or if not, you can put it in your cart and View Cart.

Except that National Map tiles often don't load, so you can end up with a mostly-empty map (as shown here) where you have no idea what area you're choosing. Once this starts happening, switching to a different set of tiles probably won't help; all you can do is wait a few hours and hope it gets better..

Or get your DEM data somewhere else. Even if you stick with the USGS, they have a different set of DEM data, called SRTM (it comes from the Shuttle Radar Topography Mission) which is downloaded from a completely different place, SRTM DEM data, Earth Explorer. It's marginally easier to use than the National Map and less flaky about tile loading, and it gives you GeoTIFF files instead of zip files containing various ArcGIS formats. Sounds good so far; but once you've wasted time defining the area you want, suddenly it reveals that you can't download anything unless you first make an account, and you have to go through a long registration process that demands name, address and phone number (!) before you can actually download anything.

Of course neither of these sources lets you just download data for a given set of coordinates; you have to go through the interactive website any time you want anything. So even if you don't mind giving the USGS your address and phone number, if you want something you can run from a program, you need to go elsewhere.

Unencumbered DEM Sources

Fortunately there are several other sources for elevation data. Be sure to read through the comments, which list better sources than in the main article.

The best I found is OpenTypography's SRTM API, which lets you download arbitrary areas specified by latitude/longitude bounding boxes.

Verify the Data: gdaldem

[Making a DEM visible with GIMP Levels] Okay, you've got some DEM data. Did you get the area you meant to get? Is there any data there? DEM data often comes packaged as an image, primarily GeoTIFF. You might think you could simply view that in an image viewer -- after all, those nice preview images they show you on those interactive downloaders show the terrain nicely. But the actual DEM data is scaled so that even high mountains don't show up; you probably won't be able to see anything but blackness.

One way of viewing a DEM file as an image is to load it into GIMP. Bring up Colors->Levels, go to the input slider (the upper of the two sliders) and slide the rightmost triangle leftward until it's near the right edge of the histogram. Don't save it that way (that will mess up the absolute elevations in the file); it's just a quick way of viewing the data.

Update: A better, one-step way is Color > Auto > Stretch Contrast.


[hillshade generated by gdaldem] A better way to check DEM data files is a beautiful little program called gdaldem (part of the GDAL package). It has several options, like generating a hillshade image:

gdaldem hillshade n35_w107_1arc_v3.tif hillshade.png

Then view hillshade.png in your favorite image viewer and see if it looks like you expect. Having read quite a few elaborate tutorials on hillshade generation over the years, I was blown away at how easy it is with gdaldem.

Here are some other operations you can do on DEM data.

Translate the Data to Another Format

gdal has lots more useful stuff beyond gdaldem. For instance, my ultimate goal, ray tracing, will need a PNG:

gdal_translate -ot UInt16 -of PNG srtm_54_07.tif srtm_54_07.png

gdal_translate can recognize most DEM formats. If you have a complicated multi-file format like ARCGIS, try using the name of the directory where the files live.

Get Vertical Limits, for Scaling

What's the highest point in your data, and at what coordinates does that peak occur? You can find the highest and lowest points easily with Python's gdal package if you convert the gdal.Dataset into a numpy array:

import gdal
import numpy as np

demdata = gdal.Open(filename)
demarray = np.array(demdata.GetRasterBand(1).ReadAsArray())
print(demarray.min(), demarray.max())

That gives you the highest and lowest elevations. But where are they in the data? That's not super intuitive in numpy; the best way I've found is:

indices = np.where(demarray == demarray.max())
ymax, xmax = indices[0][0], indices[1][0]
print("The highest point is", demarray[ymax][xmax])
print("  at pixel location", xmax, ymax)

Translate Between Lat/Lon and Pixel Coordinates

But now that you have the pixel coordinates of the high point, how do you map that back to latitude and longitude? That's trickier, but here's one way, using the affine package:

import affine

affine_transform = affine.Affine.from_gdal(*demdata.GetGeoTransform())
lon, lat = affine_transform * (xmax, ymax)

What about the other way? You have latitude and longitude and you want to know what pixel location that corresponds to? Define an inverse transformation:

inverse_transform = ~affine_transform
px, py = [ round(f) for f in inverse_transform * (lon, lat) ]

Those transforms will become important once we get to Part III. But first, Part II, Understand Povray: Height Fields in Povray

Tags: , , ,
[ 18:15 Jul 07, 2019    More mapping | permalink to this entry | ]

Thu, 13 Jun 2019

Finding Astronomical Alignments in Ancient Monuments (or anywhere else)

Dave and I will be presenting a free program on Stonehenge at the Los Alamos Nature Center tomorrow, June 14.

The nature center has a list of programs people have asked for, and Stonehenge came up as a topic in our quarterly meeting half a year ago. Remembering my seventh grade fascination with Stonehenge and its astronomical alignments -- I discovered Stonehenge Decoded at the local library, and built a desktop model showing the stones and their alignments -- I volunteered. But after some further reading, I realized that not all of those alignments are all they're cracked up to be and that there might not be much of astronomical interest to talk about, and I un-volunteered.

But after thinking about it for a bit, I realized that "not all they're cracked up to be" makes an interesting topic in itself. So in the next round of planning, I re-volunteered; the result is tomorrow night's presentation.

The talk will include a lot of history of Stonehenge and its construction, and a review of some other important or amusing henges around the world. But this article is on the astronomy, or lack thereof.

The Background: Stonehenge Decoded

Stonehenge famously aligns with the summer solstice sunrise, and that's when tens of thousands of people flock to Salisbury, UK to see the event. (I'm told that the rest of the time, the monument is fenced off so you can't get very close to it, though I've never had the opportunity to visit.)

Curiously, archaeological evidence suggests that the summer solstice wasn't the big time for prehistorical gatherings at Stonehenge; the time when it was most heavily used was the winter solstice, when there's a less obvious alignment in the other direction. But never mind that.

[Gerald Hawkins' Stonehenge alignments from Stonehenge Decoded] In 1963, Gerald Hawkins wrote an article in Nature, which he followed up two years later with a book entitled Stonehenge Decoded. Hawkins had access to an IBM 7090, capable of a then-impressive 100 Kflops (thousand floating point operations per second; compare a Raspberry Pi 3 at about 190 Mflops, or about a hundred Gflops for something like an Intel i5). It cost $2.9 million (nearly $20 million in today's dollars).

Using the 7090, Hawkins mapped the positions of all of Stonehenge's major stones, then looked for interesting alignments with the sun and moon. He found quite a few of them. (Hawkins and Fred Hoyle also had a theory about the fifty-six Aubrey holes being a lunar eclipse predictor, which captured my seventh-grade imagination but which most researchers today think was more likely just a coincidence.)

But I got to thinking ... Hawkins mapped at least 38 stones if you don't count the Aubrey holes. If you take 38 randomly distributed points, what are the chances that you'll find interesting astronomical alignments?

A Modern Re-Creation of Hawkins' Work

Programmers today have it a lot easier than Hawkins did. We have languages like Python, with libraries like PyEphem to handle the astronomical calculations. And it doesn't hurt that our computers are about a million times faster.

Anyway, my script, skyalignments.py takes a GPX file containing a list of geographic coordinates and compares those points to sunrise and sunset at the equinoxes and solstices, as well as the full moonrise and moonset nearest the solstice or equinox. It can find alignments among all the points in the GPX file, or from a specified "observer" point to each point in the file. It allows a slop of a few degrees, 2 degrees by default; this is about four times the diameter of the sun or moon, but a half-step from your observing position can make a bigger difference than that. I don't know how much slop Hawkins used; I'd love to see his code.

[Astronomical alignments between pairs of New Mexico peaks] My first thought was, what if you stand on a mountain peak and look around you at other mountain peaks? (It's easy to get GPS coordinates for peaks; if you can't find them online you can click on them on a map.) So I plotted the major peaks in the Jemez and Sangre de Cristo mountains that I figured were all mutually visible. It came to 22 points; about half what Hawkins was working with.

My program found (114 alignments.

[Astronomical alignments between pairs of New Mexico peaks] Yikes! Way too many. What if I cut it down? So I tried eliminating all but the really obvious ones, the ones you really notice from across the valley. The most prominent 11 peaks: 5 in the Jemez, 6 in the Sangres.

That was a little more manageable. Now I was down to only 22 alignments.

Now, I'm pretty sure that the Ancient Ones -- or aliens -- didn't lay out the Jemez and Sangre de Cristo mountains to align with the rising and setting sun and moon. No, what this tells us is that pretty much any distribution of points will give you a bunch of astronomical alignments.

And that's just the sun and moon, all Hawkins was considering. If you look for writing on astronomical alignments in ancient monuments, you'll find all people claiming to have found alignments with all sorts of other rising and setting bodies, like Sirius and Orion's belt. Imagine how many alignments I could have found if I'd included the hundred brightest stars.

So I'm not convinced. Certainly Stonehenge's solstice alignment looks real; I'm not disputing that. And there are lots of other archaeoastronomy sites that are even more convincing, like the Chaco sun dagger. But I've also seen plenty of web pages, and plenty of talks, where someone maps out a collection of points at an ancient site and uses alignments among them as proof that it was an ancient observatory. I suspect most of those alignments are more evidence of random chance and wishful thinking than archeoastronomy.

Tags: , , , , ,
[ 14:54 Jun 13, 2019    More science/astro | permalink to this entry | ]

Wed, 05 Jun 2019

Styling GTK3 in Python with CSS

Lately I've been running with my default python set to Python 3. Debian still uses Python 2 as the default, which is reasonable, but adding a ~/bin/python symlink to /usr/bin/python3 helps me preview scripts that might become a problem once Debian does switch. I thought I had converted most of my Python scripts to Python 3 already, but this link is catching some I didn't convert.

Python has a nice script called 2to3 that can convert the bulk of most scripts with little fanfare. The biggest hassles that 2to3 can't handle are network related (urllib and urllib2) and, the big one, user interfaces. PyGTK, based on GTK2 has no Python 3 equivalent; in Python 3, the only option is to use GObject Introspection (gi) and GTK3. Since there's almost no documentation on python-gi and gtk3, converting a GTK script always involves a lot of fumbling and guesswork.

A few days ago I tried to play an MP3 in my little musicplayer.py script and discovered I'd never updated it. I have enough gi/GTK3 scripts by now that I thought something with such a simple user interface would be easy. Shows how much I know about GTK3!

I got the basic window ported pretty easily, but it looked terrible: huge margins everywhere, and no styling on the text, like the bold, large-sized text I had previously use to highlight the name of the currently playing song. I tried various approaches, but a lot of the old methods of styling have been deprecated in GTK3; you're supposed to use CSS. Except, of course, there's no documentation on it, and it turns out the CSS accepted by GTK3 is a tiny subset of the CSS you can use in HTML pages, but what the subset is doesn't seem to be documented anywhere.

How to Apply a Stylesheet

The first task was to get any CSS at all working. The GNOME Journal: Styling GTK with CSS was helpful in getting started, but had a lot of information that doesn't work (perhaps it did once). At least it gave me this basic snippet:

    css = '* { background-color: #f00; }'
    css_provider = gtk.CssProvider()
    css_provider.load_from_data(css)
    context = gtk.StyleContext()
    screen = Gdk.Screen.get_default()
    context.add_provider_for_screen(screen, css_provider,
                                    gtk.STYLE_PROVIDER_PRIORITY_APPLICATION)

Built-in Class Names

Great! if all you want to do is turn the whole app red. But in reality, you'll want to style different widgets differently. At least some classes have class names:

    css = 'button { background-color: #f00; }'
I found other pages suggesting using 'GtkButton in CSS, but that didn't work for me. How do you find the right class names? No idea, I never found a reference for that. Just guess, I guess.

User-set Class Names

What about classes -- for instance, make all the buttons in a ButtonBox white? You can add classes this way:

    button_context = button.get_style_context()
    button_context.add_class("whitebutton")

If you need to change a class (for instance, turn a red button green), first remove the old class:

    button_context = button.get_style_context()
    entry_style_context.remove_class("red")

Widget Names, like CSS ID

For single widgets, you can give the widget a name and use it like an ID in CSS. Like this:

    label = gtk.Label()
    label.set_use_markup(True)
    label.set_line_wrap(True)
    label.set_name("red_label")
    mainbox.pack_start(label, False, False, 0)
    css = '#red_label { background-color: #f00; }'
[ ... ]

Properties You Can Set

There is, amazingly, a page on which CSS properties GTK3 supports. That page doesn't mention it, but some properties like :hover are also supported. So you can write CSS tweaks like

.button { border-radius: 15; border-width: 2; border-style: outset; }
.button:hover { background: #dff; border-color: #8bb; }

And descendants work, so you can say somthing like

    buttonbox = gtk.ButtonBox(spacing=4)
    buttonbox.set_name("buttonbox")
    mainbox.pack_end(buttonbox, False, False, 0)

    btn = gtk.Button(label="A")
    buttonbox.add(btn)

    btn = gtk.Button(label="B")
    buttonbox.add(btn)
and then use CSS that affects all the buttons inside the buttonbox:
#buttonbox button { color: red; }

No mixed CSS Inside Labels

My biggest disappointment was that I couldn't mix styles inside a label. You can't do something like

label.set_label('Headline'
                'Normal text')

and expect to style the different parts separately. You can use very simple markup like <b>bold</b> normal, but anything further gives errors like "error parsing markup: Attribute 'class' is not allowed on the <span> tag" (you'll get the same error if you try "id"). I had to make separate GtkLabels for each text size and style I wanted, which is a lot more work. If you wanted to mix styles and have them reflow as the content length changed, I don't know how (or if) you could do it.

Fortunately, I don't strictly need that for this little app. So for now, I'm happy to have gotten this much working.

Tags: , , ,
[ 14:49 Jun 05, 2019    More programming | permalink to this entry | ]

Thu, 30 May 2019

Plotting a Sequence of Graphs in Matplotlib 3D

A friend and I were talking about temperature curves: specifically, the way the temperature sinks in the evening but then frequently rises again before it really starts cooling off.

I thought it would be fun to plot the curve of temperature as a function of time over successive days, as a 3-D plot. I knew matplotlib had a way to do 3D plots, but I've never actually generated one.

Well, it turns out there are lots of examples, but they all start by generating mysterious data blobs, and none of them explain the structure of the data they're using, and the documentation has mysterious parameters like "zs" that aren't explained anywhere. So getting something that worked was a fiddly process. Creating a color version, to distinguish the graphs better, was even more fiddly.

[Plotting a series of graphs using matplotlib 3d] So I wrote an example that I hope will make it a little clearer for anyone trying to use this library. It can plot using just lines:

[Plotting a series of graphs using matplotlib 3d, color option] ... or it can plot in color, cycling colors manually because by default matplotlib makes adjacent colors similar, exactly the opposite of what you'd want:

Here's the demo: multiplot3d.py on GitHub.

... Except there's a Bug

All is not perfect. Axes3D gets a bit confused sometimes about which layer is supposed to be in front of which other layer. You can see that on the two plots: in both cases, the fourth and fifth layers from the front are reversed, so the fifth layer is drawn in front of the fourth layer. I haven't yet found anyone in the matplotlib organization who seems to know much about Axes3D; eventually I'll file a bug but I want to write a shorter, clearer test case to illustrate the problem. Still, even with the bugs it's a useful technique to know.

Tags: , , ,
[ 09:57 May 30, 2019    More programming | permalink to this entry | ]

Fri, 25 Jan 2019

Announcing the New Mexico Bill Tracker

For the last few weeks I've been consumed with a project I started last year and then put aside for a while: a bill tracker.

The project sprung out of frustration at the difficulty of following bills as they pass through the New Mexico legislature. Bills I was interested in would die in committee, or they would make it to a vote, and I'd read about it a few days later and wish I'd known that it was a good time to write my representative or show up at the Roundhouse to speak. (I've never spoken at the Roundhouse, and whether I'd have the courage to actually do it remains to be seen, but at least I'd like to have the chance to decide.)

New Mexico has a Legislative web site where you can see the status of each bill, and they even offer a way to register and save a list of bills; but then there's no way to get alerts about bills that change status and might be coming up for debate.

New Mexico legislative sessions are incredibly short: 60 days in odd years, 30 days in even. During last year's 30-day session, I wrote some Python code that scraped the HTML pages describing a bill, extract the useful information like when the bill last changed status and where it was right now, present the information in a table where the user could easily scan it, and email the user a daily summary. Fortunately, the nmlegis.gov site, while it doesn't offer raw data for bill status, at least uses lots of id tags in its HTML which make them relatively easy to scrape.

Then the session ended and there was no further way to test it, since bills' statuses were no longer changing. So the billtracker moved to the back burner.

In the runup to this year's 60-day session, I started with Flask, a lightweight Python web library I've used for a couple of small projects, and added some extensions that help Flask handle tasks like user accounts. Then I patched in the legislative web scraping code from last year, and the result was The New Mexico Bill Tracker. I passed the word to some friends in the League of Women Voters and the Sierra Club to help me test it, and I think (hope) it's ready for wider testing.

There's lots more I'd like to do, of course. I still have no way of knowing when a bill will be up for debate. It looks like this year the Legislative web site is showing committ schedules in a fairly standard way, as opposed to the unparseable PDFs they used in past years, so I may be able to get that. Not that legislative committees actually stick to their published schedules; but at least it's a start.

New Mexico readers (or anyone else interested in following the progress of New Mexico bills) are invited to try it. Let me know about any problems you encounter. And if you want to adapt the billtracker for use in another state, send me a note! I'd love to see it extended and would be happy to work with you. Here's the source: BillTracker on GitHub.

Tags: , , , ,
[ 12:34 Jan 25, 2019    More politics | permalink to this entry | ]

Fri, 18 Jan 2019

Python: Find Your System's Biggest CPU Hogs

My machine has recently developed an overheating problem. I haven't found a solution for that yet -- you'd think Linux would have a way to automatically kill or suspend processes based on CPU temperature, but apparently not -- but my investigations led me down one interesting road: how to write a Python script that finds CPU hogs.

The psutil module can get a list of processes with psutil.process_iter(), which returns Process objects that have a cpu_percent() call. Great! Except it always returns 0.0, even for known hogs like Firefox, or if you start up a VLC and make it play video scaled to the monitor size.

That's because cpu_percent() needs to run twice, with an interval in between: it records the elapsed run time and sees how much it changes. You can pass an interval to cpu_percent() (the units aren't documented, but apparently they're seconds). But if you're calling it on more than one process -- as you usually will be -- it's better not to wait for each process. You have to wait at least a quarter of a second to get useful numbers, and longer is better. If you do that for every process on the system, you'll be waiting a long time.

Instead, use cpu_percent() in non-blocking mode. Pass None as the interval (or leave it blank since None is the default), then loop over the process list and call proc.cpu_percent(None) on each process, throwing away the results the first time. Then sleep for a while and repeat the loop: the second time, cpu_percent() will give you useful numbers.

def hoglist(delay=5):
    '''Return a list of processes using a nonzero CPU percentage
       during the interval specified by delay (seconds),
       sorted so the biggest hog is first.
    '''
    proccesses = list(psutil.process_iter())
    for proc in proccesses:
        proc.cpu_percent(None)    # non-blocking; throw away first bogus value

    print("Sleeping ...")
    sys.stdout.flush()
    time.sleep(delay)
    print()

    procs = []
    for proc in proccesses:
        percent = proc.cpu_percent(None)
        if percent:
            procs.append((proc.name(), percent))

    print(procs)
    procs.sort(key=lambda x: x[1], reverse=True)
    return procs

if __name__ == '__main__':
    prohogscs = hoglist()
    for p in hogs:
        print("%20s: %5.2f" % p)

It's a useful trick. Though actually applying this to a daemon that responds to temperature, to solve my overheating problem, is more complicated. For one thing, you need rules about special processes. If your Firefox goes wonky and starts making your X server take lots of CPU time, you want to suspend Firefox, not the X server.

Tags: , ,
[ 15:54 Jan 18, 2019    More programming | permalink to this entry | ]

Sun, 23 Sep 2018

Writing Solar System Simulations with NAIF SPICE and SpiceyPy

Someone asked me about my Javascript Jupiter code, and whether it used PyEphem. It doesn't, of course, because it's Javascript, not Python (I wish there was something as easy as PyEphem for Javascript!); instead it uses code from the book Astronomical Formulae for Calculators by Jean Meeus. (His better known Astronomical Algorithms, intended for computers rather than calculators, is actually harder to use for programming because Astronomical Algorithms is written for BASIC and the algorithms are relatively hard to translate into other languages, whereas Astronomical Formulae for Calculators concentrates on explaining the algorithms clearly, so you can punch them into a calculator by hand, and this ends up making it fairly easy to implement them in a modern computer language as well.)

Anyway, the person asking also mentioned JPL's page HORIZONS Ephemerides page, which I've certainly found useful at times. Years ago, I tried emailing the site maintainer asking if they might consider releasing the code as open source; it seemed like a reasonable request, given that it came from a government agency and didn't involve anything secret. But I never got an answer.

[SpiceyPy example: Cassini's position] But going to that page today, I find that code is now available! What's available is a massive toolkit called SPICE (it's all in capitals but there's no indication what it might stand for. It comes from NAIF, which is NASA's Navigation and Ancillary Information Facility).

SPICE allows for accurate calculations of all sorts of solar system quantities, from the basic solar system bodies like planets to all of NASA's active and historical public missions. It has bindings for quite a few languages, including C. The official list doesn't include Python, but there's a third-party Python wrapper called SpiceyPy that works fine.

The tricky part of programming with SPICE is that most of the code is hidden away in "kernels" that are specific to the objects and quantities you're calculating. For any given program you'll probably need to download at least four "kernels", maybe more. That wouldn't be a problem except that there's not much help for figuring out which kernels you need and then finding them. There are lots of SPICE examples online but few of them tell you which kernels they need, let alone where to find them.

After wrestling with some of the examples, I learned some tricks for finding kernels, at least enough to get the basic examples working. I've collected what I've learned so far into a new GitHub repository: NAIF SPICE Examples. The README there explains what I know so far about getting kernels; as I learn more, I'll update it.

SPICE isn't easy to use, but it's probably much more accurate than simpler code like PyEphem or my Meeus-based Javascript code, and it can calculate so many more objects. It's definitely something worth knowing about for anyone doing solar system simulations.

Tags: , ,
[ 16:43 Sep 23, 2018    More programming | permalink to this entry | ]

Mon, 14 May 2018

Plotting the Jet Stream, or Other Winds, with ECMWF Data

I've been trying to learn more about weather from a friend who used to work in the field -- in particular, New Mexico's notoriously windy spring. One of the reasons behind our spring winds relates to the location of the jet stream. But I couldn't find many good references showing how the jet stream moves throughout the year. So I decided to try to plot it myself -- if I could find the data. Getting weather data can surprisingly hard.

In my search, I stumbled across Geert Barentsen's excellent Annual variations in the jet stream (video). It wasn't quite what I wanted -- it shows the position of the jet stream in December in successive years -- but the important thing is that he provides a Python script on GitHub that shows how he produced his beautiful animation.

[Sample jet steam image]

Well -- mostly. It turns out his data sources are no longer available, and he didn't go into a lot of detail on where he got his data, only saying that it was from the ECMWF ERA re-analysis model (with a link that's now 404). That led me on a merry chase through the ECMWF website trying to figure out which part of which database I needed. ECMWF has lots of publically available databases (and even more) and they even have Python libraries to access them; and they even have a lot of documentation, but somehow none of the documentation addresses questions like which database includes which variables and how to find and fetch the data you're after, and a lot of the sample code doesn't actually work. I ended up using the "ERA Interim, Daily" dataset and requesting data for only specific times and only the variables and pressure levels I was interested in. It's a great source of data once you figure out how to request it.

Sign up for an ECMWF API Key

Access ECMWF Public Datasets (there's also Access MARS and I'm not sure what the difference is), which has links you can click on to register for an API key.

Once you get the email with your initial password, log in using the URL in the email, and change the password. That gave me a "next" button that, when I clicked it, took me to a page warning me that the page was obsolete and I should update whatever bookmark I had used to get there. That page also doesn't offer a link to the new page where you can get your key details, so go here: Your API key. The API Key page gives you some lines you can paste into ~/.ecmwfapirc.

You'll also have to accept the license terms for the databases you want to use.

Install the Python API

That sets you up to use the ECMWF api. They have a Web API and a Python library, plus some other Python packages, but after struggling with a bunch of Magics tutorial examples that mostly crashed or couldn't find data, I decided I was better off sticking to the basic Python downloader API and plotting the results with Matplotlib.

The Python data-fetching API works well. To install it, activate your preferred Python virtualenv or whatever you use for pip packages, then run the pip command shown at Web API Downloads (under "Click here to see the installation/update instructions..."). As always with pip packages, you'll have to decide on a Python version (they support both 2 and 3) and whether to use a virtualenv, the much-disrecommended sudo pip, pip3, etc. I used pip3 in a virtualenv and it worked fine.

Specify a dataset and parameters

That's great, but how do you know which dataset you want to load?

There doesn't seem to be anything that just lists which datasets have which variables. The only way I found is to go to the Web API page for a particular dataset to see the form where you can request different variables. For instance, I ended up using the "interim-full-daily" database, where you can choose date ranges and lists of parameters. There are more choices in the sidebar: for instance, clicking on "Pressure levels" lets you choose from a list of barometric pressures ranging from 1000 all the way down to 1. No units are specified, but they're millibars, also known as hectoPascals (hPa): 1000 is more or less the pressure at ground level, 250 is roughly where the jet stream is, and Los Alamos is roughly at 775 hPa (you can find charts of pressure vs. altitude on the web).

When you go to any of the Web API pages, it will show you a dialog suggesting you read about Data retrieval efficiency, which you should definitely do if you're expecting to request a lot of data, then click on the details for the database you're using to find out how data is grouped in "tape files". For instance, in the ERA-interim database, tapes are grouped by date, so if you're requesting multiple parameters for multiple months, request all the parameters for a given month together, rather than making one request for level 250, another request for level 1000, etc.

Once you've checked the boxes for the data you want, you can fetch the data via the web interface, or click on "View the MARS request" to get parameters you can plug into a Python script.

If you choose the Python script option as I did, you can start with the basic data retrieval example. Use the second example, the one that uses 'format' : "netcdf", which will (eventually) give you a file ending in .nc.

Requesting a specific area

You can request only a limited area,

"area": "75/-20/10/60",
but they're not very forthcoming on the syntax of that, and it's particularly confusing since "75/-20/10/60" supposedly means "Europe". It's hard to figure how those numbers as longitudes and latitudes correspond to Europe, which doesn't go down to 10 degrees latitude, let alone -20 degrees. The Post-processing keywords page gives more information: it's North/West/South/East, which still makes no sense for Europe, until you expand the Area examples tab on that page and find out that by "Europe" they mean Europe plus Saudi Arabia and most of North Africa.

Using the data: What's in it?

Once you have the data file, assuming you requested data in netcdf format, you can parse the .nc file with the netCDF4 Python module -- available as Debian package "python3-netcdf4", or via pip -- to read that file:

import netCDF4

data = netCDF4.Dataset('filename.nc')

But what's in that Dataset? Try running the preceding two lines in the interactive Python shell, then:

>>> for key in data.variables:
...   print(key)
... 
longitude
latitude
level
time
w
vo
u
v

You can find out more about a parameter, like its units, type, and shape (array dimensions). Let's look at "level":

>>> data['level']
<class 'netCDF4._netCDF4.Variable'>
int32 level(level)
    units: millibars
    long_name: pressure_level
unlimited dimensions: 
current shape = (3,)
filling on, default _FillValue of -2147483647 used

>>> data['level'][:]
array([ 250,  775, 1000], dtype=int32)

>>> type(data['level'][:])
<class 'numpy.ndarray'>

Levels has shape (3,): it's a one-dimensional array with three elements: 250, 775 and 1000. Those are the three levels I requested from the web API and in my Python script). The units are millibars.

More complicated variables

How about something more complicated? u and v are the two components of wind speed.

>>> data['u']
<class 'netCDF4._netCDF4.Variable'>
int16 u(time, level, latitude, longitude)
    scale_factor: 0.002161405503194121
    add_offset: 30.095301438361684
    _FillValue: -32767
    missing_value: -32767
    units: m s**-1
    long_name: U component of wind
    standard_name: eastward_wind
unlimited dimensions: time
current shape = (30, 3, 241, 480)
filling on
u (v is the same) has a shape of (30, 3, 241, 480): it's a 4-dimensional array. Why? Looking at the numbers in the shape gives a clue. The second dimension has 3 rows: they correspond to the three levels, because there's a wind speed at every level. The first dimension has 30 rows: it corresponds to the dates I requested (the month of April 2015). I can verify that:
>>> data['time'].shape
(30,)

Sure enough, there are 30 times, so that's what the first dimension of u and v correspond to. The other dimensions, presumably, are latitude and longitude. Let's check that:

>>> data['longitude'].shape
(480,)
>>> data['latitude'].shape
(241,)

Sure enough! So, although it would be nice if it actually told you which dimension corresponded with which parameter, you can probably figure it out. If you're not sure, print the shapes of all the variables and work out which dimensions correspond to what:

>>> for key in data.variables:
...   print(key, data[key].shape)

Iterating over times

data['time'] has all the times for which you have data (30 data points for my initial test of the days in April 2015). The easiest way to plot anything is to iterate over those values:

    timeunits = JSdata.data['time'].units
    cal = JSdata.data['time'].calendar
    for i, t in enumerate(JSdata.data['time']):
        thedate = netCDF4.num2date(t, units=timeunits, calendar=cal)

Then you can use thedate like a datetime, calling thedate.strftime or whatever you need.

So that's how to access your data. All that's left is to plot it -- and in this case I had Geert Barentsen's script to start with, so I just modified it a little to work with slightly changed data format, and then added some argument parsing and runtime options.

Converting to Video

I already wrote about how to take the still images the program produces and turn them into a video: Making Videos (that work in Firefox) from a Series of Images.

However, it turns out ffmpeg can't handle files that are named with timestamps, like jetstream-2017-06-14-250.png. It can only handle one sequential integer. So I thought, what if I removed the dashes from the name, and used names like jetstream-20170614-250.png with %8d? No dice: ffmpeg also has the limitation that the integer can have at most four digits.

So I had to rename my images. A shell command works: I ran this in zsh but I think it should work in bash too.

cd outdir
mkdir moviedir

i=1
for fil in *.png; do
  newname=$(printf "%04d.png" $i)
  ln -s ../$fil moviedir/$newname
  i=$((i+1))
done

ffmpeg -i moviedir/%4d.png -filter:v "setpts=2.5*PTS" -pix_fmt yuv420p jetstream.mp4
The -filter:v "setpts=2.5*PTS" controls the delay between frames -- I'm not clear on the units, but larger numbers have more delay, and I think it's a multiplier, so this is 2.5 times slower than the default.

When I uploaded the video to YouTube, I got a warning, "Your videos will process faster if you encode into a streamable file format." I then spent half a day trying to find a combination of ffmpeg arguments that avoided that warning, and eventually gave up. As far as I can tell, the warning only affects the 20 seconds or so of processing that happens after the 5-10 minutes it takes to upload the video, so I'm not sure it's terribly important.

Results

Here's a video of the jet stream from 2012 to early 2018, and an earlier effort with a much longer 6.0x delay.

And here's the script, updated from the original Barentsen script and with a bunch of command-line options to let you plot different collections of data: jetstream.py on GitHub.

Tags: , , ,
[ 14:18 May 14, 2018    More programming | permalink to this entry | ]

Fri, 27 Apr 2018

Displaying PDF with Python, Qt5 and Poppler

I had a need for a Qt widget that could display PDF. That turned out to be surprisingly hard to do. The Qt Wiki has a page on Handling PDF, which suggests only two alternatives: QtPDF, which is C++ only so I would need to write a wrapper to use it with Python (and then anyone else who used my code would have to compile and install it); or Poppler. Poppler is a common library on Linux, available as a package and used for programs like evince, so that seemed like the best route.

But Python bindings for Poppler are a bit harder to come by. I found a little one-page example using Poppler and Gtk3 via gi.repository ... but in this case I needed it to work with a Qt5 program, and my attempts to translate that example to work with Qt were futile. Poppler's page.render(ctx) takes a Cairo context, and Cairo is apparently a Gtk-centered phenomenon: I couldn't find any way to get a Cairo context from a Qt5 widget, and although I found some web examples suggesting renderToImage(), the Poppler available in gi.repository doesn't have that function.

But it turns out there's another Poppler: popplerqt5, available in the Debian package python3-poppler-qt5. That Poppler does have renderToImage, and you can take that image and paint it in a paint() callback or turn it into a pixmap you can use with a QLabel. Here's the basic sequence:

    document = Poppler.Document.load(filename)
    document.setRenderHint(Poppler.Document.TextAntialiasing)
    page = document.page(pageno)
    img = self.page.renderToImage(dpi, dpi)

    # Use the rendered image as the pixmap for a label:
    pixmap = QPixmap.fromImage(img)
    label.setPixmap(pixmap)

The line to set text antialiasing is not optional. Well, theoretically it's optional; go ahead, try it without that and see for yourself. It's basically unreadable.

Of course, there are plenty of other details to take care of. For instance, you can get the size of the rendered image:

    size = page.pageSize()
... after which you can use size.width() and size.height(). They're in points. There are 72 points per inch, so calculate accordingly in the dpi values you pass to renderToImage if you're targeting a specific DPI or need it to fit in a specific window size.

Window Resize and Efficient Rendering

Speaking of fitting to a window size, I wanted to resize the content whenever the window was resized, which meant redefining resizeEvent(self, event) on the widget. Initially my PDFWidget inherited from Qwidget with a custom paintEvent(), like this:

        # Create self.img once, early on:
        self.img = self.page.renderToImage(self.dpi, self.dpi)

    def paintEvent(self, event):
        qp = QPainter()
        qp.begin(self)
        qp.drawImage(QPoint(0, 0), self.img)
        qp.end()
(Poppler also has a function page.renderToPainter(), but I never did figure out how to get it to do anything useful.)

That worked, but when I added resizeEvent I got an infinite loop: paintEvent() called resizeEvent() which triggered another paintEvent(), ad infinitum. I couldn't find a way around that (GTK has similar problems -- seems like nearly everything you do generates another expose event -- but there you can temporarily disable expose events while you're drawing). So I rewrote my PDFWidget class to inherit from QLabel instead of QWidget, converted the QImage to a QPixmap and passed it to self.setPixmap(). That let me get rid of the paintEvent() function entirely and let QLabel handle the painting, which is probably more efficient anyway.

Showing all pages in a scrolled widget

renderToImage gives you one image corresponding to one page of the PDF document. More often, you'll want to see the whole document laid out, with all the pages. So you need a way to stack a bunch of widgets vertically, one for each page. You can do that with a QVBoxLayout on a widget inside a QScrollArea.

I haven't done much Qt5 programming, so I wasn't familiar with how these QVBoxes work. Most toolkits I've worked with have a VBox container widget to which you add child widgets, but in Qt5, you create a widget (no particular type -- a QWidget is enough), then create a layout object that modifies the widget, and add the sub-widgets to the layout object. There isn't much documentation for any of this, and very few examples of doing it in Python, so it took some fiddling to get it working.

Initial Window Size

One last thing: Qt5 doesn't seem to have a concept of desired initial window size. Most of the examples I found, especially the ones that use a .ui file, use setGeometry(); but that requires an (X, Y) position as well as (width, height), and there's no way to tell it to ignore the position. That means that instead of letting your window manager place the window according to your preferences, the window will insist on showing up at whatever arbitrary place you set in the code. Worse, most of the Qt5 examples I found online set the geometry to (0, 0): when I tried that, the window came up with the widget in the upper left corner of the screen and the window's titlebar hidden above the top of the screen, so there's no way to move the window to a better location unless you happen to know your window manager's hidden key binding for that. (Hint: on many Linux window managers, hold Alt down and drag anywhere in the window to move it. If that doesn't work, try holding down the "Windows" key instead of Alt.)

This may explain why I've been seeing an increasing number of these ill-behaved programs that come up with their titlebars offscreen. But if you want your programs to be better behaved, it works to self.resize(width, height) a widget when you first create it.

The current incarnation of my PDF viewer, set up as a module so you can import it and use it in other programs, is at qpdfview.py on GitHub.

Tags: , ,
[ 19:01 Apr 27, 2018    More programming | permalink to this entry | ]

Sat, 17 Feb 2018

Multiplexing Input or Output on a Raspberry Pi Part 2: Port Expanders

In the previous article I talked about Multiplexing input/output using shift registers for a music keyboard project. I ended up with three CD4021 8-bit shift registers cascaded. It worked; but I found that I was spending all my time in the delays between polling each bit serially. I wanted a way to read those bits faster. So I ordered some I/O expander chips.

[Keyboard wired to Raspberry Pi with two MCP23017 port expanders] I/O expander, or port expander, chips take a lot of the hassle out of multiplexing. Instead of writing code to read bits serially, you can use I2C. Some chips also have built-in pullup resistors, so you don't need all those extra wires for pullups or pulldowns. There are lots of options, but two common chips are the MCP23017, which controls 16 lines, and the MCP23008 and PCF8574p, which each handle 8. I'll only discuss the MCP23017 here, because if eight is good, surely sixteen is better! But the MCP23008 is basically the same thing with fewer I/O lines.

A good tutorial to get you started is How To Use A MCP23017 I2C Port Expander With The Raspberry Pi - 2013 Part 1 along with part 2, Python and part 3, reading input.

I'm not going to try to repeat what's in those tutorials, just fill in some gaps I found. For instance, I didn't find I needed sudo for all those I2C commands in Part 1 since my user is already in the i2c group.

Using Python smbus

Part 2 of that tutorial uses Python smbus, but it doesn't really explain all the magic numbers it uses, so it wasn't obvious how to generalize it when I added a second expander chip. It uses this code:

DEVICE = 0x20 # Device address (A0-A2)
IODIRA = 0x00 # Pin direction register
OLATA  = 0x14 # Register for outputs
GPIOA  = 0x12 # Register for inputs

# Set all GPA pins as outputs by setting
# all bits of IODIRA register to 0
bus.write_byte_data(DEVICE,IODIRA,0x00)

# Set output all 7 output bits to 0
bus.write_byte_data(DEVICE,OLATA,0)

DEVICE is the address on the I2C bus, the one you see with i2cdetect -y 1 (20, initially).

IODIRA is the direction: when you call

bus.write_byte_data(DEVICE, IODIRA, 0x00)
you're saying that all eight bits in GPA should be used for output. Zero specifies output, one input: so if you said
bus.write_byte_data(DEVICE, IODIRA, 0x1F)
you'd be specifying that you want to use the lowest five bits for output and the upper three for input.

OLATA = 0x14 is the command to use when writing data:

bus.write_byte_data(DEVICE, OLATA, MyData)
means write data to the eight GPA pins. But what if you want to write to the eight GPB pins instead? Then you'd use
OLATB  = 0x15
bus.write_byte_data(DEVICE, OLATB, MyData)

Likewise, if you want to read input from some of the GPB bits, use

GPIOB  = 0x13
val = bus.read_byte_data(DEVICE, GPIOB)

The MCP23017 even has internal pullup resistors you can enable:

GPPUA  = 0x0c    # Pullup resistor on GPA
GPPUB  = 0x0d    # Pullup resistor on GPB
bus.write_byte_data(DEVICE, GPPUB, inmaskB)

Here's a full example: MCP23017.py on GitHub.

Using WiringPi

You can also talk to an MCP23017 using the WiringPi library. In that case, you don't set all the bits at once, but instead treat each bit as though it were a separate pin. That's easier to think about conceptually -- you don't have to worry about bit shifting and masking, just use pins one at a time -- but it might be slower if the library is doing a separate read each time you ask for an input bit. It's probably not the right approach to use if you're trying to check a whole keyboard's state at once.

Start by picking a base address for the pin number -- 65 is the lowest you can pick -- and initializing:

pin_base = 65
i2c_addr = 0x20

wiringpi.wiringPiSetup()
wiringpi.mcp23017Setup(pin_base, i2c_addr)

Then you can set input or output mode for each pin:

wiringpi.pinMode(pin_base, wiringpi.OUTPUT)
wiringpi.pinMode(input_pin, wiringpi.INPUT)
and then write to or read from each pin:
wiringpi.digitalWrite(pin_no, 1)
val = wiringpi.digitalRead(pin_no)

WiringPi also gives you access to the MCP23017's internal pullup resistors:

wiringpi.pullUpDnControl(input_pin, 2)

Here's an example in Python: MCP23017-wiringpi.py on GitHub, and one in C: MCP23017-wiringpi.c on GitHub.

Using multiple MCP23017s

But how do you cascade several MCP23017 chips?

Well, you don't actually cascade them. Since they're I2C devices, you wire them so they each have different addresses on the I2C bus, then query them individually. Happily, that's easier than keeping track of how many bits you've looped through ona shift register.

Pins 15, 16 and 17 on the chip are the address lines, labeled A0, A1 and A2. If you ground all three you get the base address of 0x20. With all three connected to VCC, it will use 0x27 (binary 111 added to the base address). So you can send commands to your first device at 0x20, then to your second one at 0x21 and so on. If you're using WiringPi, you can call mcp23017Setup(pin_base2, i2c_addr2) for your second chip.

I had trouble getting the addresses to work initially, and it turned out the problem wasn't in my understanding of the address line wiring, but that one of my cheap Chinese breadboard had a bad power and ground bus in one quadrant. That's a good lesson for the future: when things don't work as expected, don't assume the breadboard is above suspicion.

Using two MCP23017 chips with their built-in pullup resistors simplified the wiring for my music keyboard enormously, and it made the code cleaner too. Here's the modified code: keyboard.py on GitHub.

What about the speed? It is indeed quite a bit faster than the shift register code. But it's still too laggy to use as a real music keyboard. So I'll still need to do more profiling, and maybe find a faster way of generating notes, if I want to play music on this toy.

Tags: , ,
[ 15:44 Feb 17, 2018    More hardware | permalink to this entry | ]

Tue, 13 Feb 2018

Multiplexing Input or Output on a Raspberry Pi Part 1: Shift Registers

I was scouting for parts at a thrift shop and spotted a little 23-key music keyboard. It looked like a fun Raspberry Pi project.

I was hoping it would turn out to use some common protocol like I2C, but when I dissected it, it turned out there was a ribbon cable with 32 wires coming from the keyboard. So each key is a separate pushbutton.

[23-key keyboard wired to a Raspberry Pi] A Raspberry Pi doesn't have that many GPIO pins, and neither does an Arduino Uno. An Arduino Mega does, but buying a Mega to go between the Pi and the keyboard kind of misses the point of scavenging a $3 keyboard; I might as well just buy an I2C or MIDI keyboard. So I needed some sort of I/O multiplexer that would let me read 31 keys using a lot fewer pins.

There are a bunch of different approaches to multiplexing. A lot of keyboards use a matrix approach, but that makes more sense when you're wiring up all the buttons from scratch, not starting with a pre-wired keyboard like this. The two approaches I'll discuss here are shift registers and multiplexer chips.

If you just want to get the job done in the most efficient way, you definitely want a multiplexer (port expander) chip, which I'll cover in Part 2. But for now, let's look at the old-school way: shift registers.

PISO Shift Registers

There are lots of types of shift registers, but for reading lots of inputs, you need a PISO shift register: "Parallel In, Serial Out." That means you can tell the chip to read some number -- typically 8 -- of inputs in parallel, then switch into serial mode and read all the bits one at a time.

Some PISO shift registers can cascade: you can connect a second shift register to the first one and read twice as many bits. For 23 keys I needed three 8-bit shift registers.

Two popular cascading PISO shift registers are the CD4021 and the SN74LS165. They work similarly but they're not exactly the same.

The basic principle with both the CD4021 and the SN74LS165: connect power and ground, and wire up all your inputs to the eight data pins. You'll need pullup or pulldown resistors on each input line, just like you normally would for a pushbutton; I recommend picking up a few high-value (like 1-10k) resistor arrays: you can get these in SIP (single inline package) or DIP (dual-) form factors that plug easily into a breadboard. Resistor arrays can be either independent two pins for each resistor in the array) or bussed (one pin in the chip is a common pin, which you wire to ground for a pulldown or V+ for a pullup; each of the rest of the pins is a resistor). I find bussed networks particularly handy because they can reduce the number of wires you need to run, and with a job where you're multiplexing lots of lines, you'll find that getting the wiring straight is a big part of the job. (See the photo above to see what a snarl this was even with resistor networks.)

For the CD4021, connect three more pins: clock and data pins (labeled CLK and either Q7 or Q8 on the chip's pinout, pins 10 and 3), plus a "latch" pin (labeled M, pin 9). For the SN74LS165, you need one more pin: you need clock and data (labeled CP and Q7, pins 2 and 9), latch (labeled PL, pin 1), and clock enable (labeled CE, pin 15).

At least for the CD4021, some people recommend a 0.1 uF bypass capacitor across the power/ground connections of each CD4021.

If you need to cascade several chips with the CD4021, wire DS (pin 11) from the first chip to Q7 (pin 3), then wire both chips clock lines together and both chips' data lines together. The SN74LS165 is the same: DS (pin 10) to Q8 (pin 9) and tie the clock and data lines together.

Once wired up, you toggle the latch to read the parallel data, then toggle it again and use the clock pin to read the series of bits. You can see the specific details in my Python scripts: CD4021.py on GitHub and SN74LS165.py on GitHub.

Some References

For wiring diagrams, more background, and Arduino code for the CD4021, read Arduino ShiftIn. For the SN74LS165, read: Arduino: SN74HC165N, 74HC165 8 bit Parallel in/Serial out Shift Register, or Sparkfun: Shift Registers.

Of course, you can use a shift register for output as well as input. In that case you need a SIPO (Serial In, Parallel Out) shift register like a 74HC595. See Arduino ShiftOut: Serial to Parallel Shifting-Out with a 74HC595 Interfacing 74HC595 Serial Shift Register with Raspberry Pi. Another, less common option is the 74HC164N: Using a SN74HC164N Shift Register With Raspberry Pi

For input from my keyboard, initially I used three CD4021s. It basically worked, and you can see the code for it at keyboard.py (older version, for CD4021 shift registers), on GitHub.

But it turned out that looping over all those bits was slow -- I've been advised that you should wait at least 25 microseconds between bits for the CD4021, and even at 10 microseconds I found there wasa significant delay between hitting the key and hearing the note.I thought it might be all the fancy numpy code to generate waveforms for the chords, but when I used the Python profiler, it said most of the program's time was taken up in time.sleep(). Fortunately, there's a faster solution than shift registers: port expanders, which I'll talk about in Multiplexing Part 2: Port Expanders.

Tags: , ,
[ 12:23 Feb 13, 2018    More hardware | permalink to this entry | ]

Sun, 21 Jan 2018

Reading Buttons from a Raspberry Pi

When you attach hardware buttons to a Raspberry Pi's GPIO pin, reading the button's value at any given instant is easy with GPIO.input(). But what if you want to watch for button changes? And how do you do that from a GUI program where the main loop is buried in some library?

Here are some examples of ways to read buttons from a Pi. For this example, I have one side of my button wired to the Raspberry Pi's GPIO 18 and the other side wired to the Pi's 3.3v pin. I'll use the Pi's internal pulldown resistor rather than adding external resistors.

The simplest way: Polling

The obvious way to monitor a button is in a loop, checking the button's value each time:

import RPi.GPIO as GPIO
import time

button_pin = 18

GPIO.setmode(GPIO.BCM)

GPIO.setup(button_pin, GPIO.IN, pull_up_down = GPIO.PUD_DOWN)

try:
    while True:
        if GPIO.input(button_pin):
            print("ON")
        else:
            print("OFF")

        time.sleep(1)

except KeyboardInterrupt:
    print("Cleaning up")
    GPIO.cleanup()

But if you want to be doing something else while you're waiting, instead of just sleeping for a second, it's better to use edge detection.

Edge Detection

GPIO.add_event_detect, will call you back whenever it sees the pin's value change. I'll define a button_handler function that prints out the value of the pin whenever it gets called:

import RPi.GPIO as GPIO
import time

def button_handler(pin):
    print("pin %s's value is %s" % (pin, GPIO.input(pin)))

if __name__ == '__main__':
    button_pin = 18

    GPIO.setmode(GPIO.BCM)

    GPIO.setup(button_pin, GPIO.IN, pull_up_down = GPIO.PUD_DOWN)

    # events can be GPIO.RISING, GPIO.FALLING, or GPIO.BOTH
    GPIO.add_event_detect(button_pin, GPIO.BOTH,
                          callback=button_handler,
                          bouncetime=300)

    try:
        time.sleep(1000)
    except KeyboardInterrupt:
        GPIO.cleanup()

Pretty nifty. But if you try it, you'll probably find that sometimes the value is wrong. You release the switch but it says the value is 1 rather than 0. What's up?

Debounce and Delays

The problem seems to be in the way RPi.GPIO handles that bouncetime=300 parameter.

The bouncetime is there because hardware switches are noisy. As you move the switch from ON to OFF, it doesn't go cleanly all at once from 3.3 volts to 0 volts. Most switches will flicker back and forth between the two values before settling down. To see bounce in action, try the program above without the bouncetime=300. There are ways of fixing bounce in hardware, by adding a capacitor or a Schmitt trigger to the circuit; or you can "debounce" the button in software, by waiting a while after you see a change before acting on it. That's what the bouncetime parameter is for.

But apparently RPi.GPIO, when it handles bouncetime, doesn't always wait quite long enough before calling its event function. It sometimes calls button_handler while the switch is still bouncing, and the value you read might be the wrong one. Increasing bouncetime doesn't help. This seems to be a bug in the RPi.GPIO library.

You'll get more reliable results if you wait a little while before reading the pin's value:

def button_handler(pin):
    time.sleep(.01)    # Wait a while for the pin to settle
    print("pin %s's value is %s" % (pin, GPIO.input(pin)))

Why .01 seconds? Because when I tried it, .001 wasn't enough, and if I used the full bounce time, .3 seconds (corresponding to 300 millisecond bouncetime), I found that the button handler sometimes got called multiple times with the wrong value. I wish I had a better answer for the right amount of time to wait.

Incidentally, the choice of 300 milliseconds for bouncetime is arbitrary and the best value depends on the circuit. You can play around with different values (after commenting out the .01-second sleep) and see how they work with your own circuit and switch.

You might think you could solve the problem by using two handlers:

    GPIO.add_event_detect(button_pin, GPIO.RISING, callback=button_on,
                          bouncetime=bouncetime)
    GPIO.add_event_detect(button_pin, GPIO.FALLING, callback=button_off,
                          bouncetime=bouncetime)
but that apparently isn't allowed: RuntimeError: Conflicting edge detection already enabled for this GPIO channel.

Even if you look just for GPIO.RISING, you'll still get some bogus calls, because there are both rising and falling edges as the switch bounces. Detecting GPIO.BOTH, waiting a short time and checking the pin's value is the only reliable method I've found.

Edge Detection from a GUI Program

And now, the main inspiration for all of this: when you're running a program with a graphical user interface, you don't have control over the event loop. Fortunately, edge detection works fine from a GUI program. For instance, here's a simple TkInter program that monitors a button and shows its state.

import Tkinter
from RPi import GPIO
import time

class ButtonWindow:
    def __init__(self, button_pin):
        self.tkroot = Tkinter.Tk()
        self.tkroot.geometry("100x60")

        self.label = Tkinter.Label(self.tkroot, text="????",
                                   bg="black", fg="white")
        self.label.pack(padx=5, pady=10, side=Tkinter.LEFT)

        self.button_pin = button_pin
        GPIO.setmode(GPIO.BCM)

        GPIO.setup(self.button_pin, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

        GPIO.add_event_detect(self.button_pin, GPIO.BOTH,
                              callback=self.button_handler,
                              bouncetime=300)

    def button_handler(self, channel):
        time.sleep(.01)
        if GPIO.input(channel):
            self.label.config(text="ON")
            self.label.configure(bg="red")
        else:
            self.label.config(text="OFF")
            self.label.configure(bg="blue")

if __name__ == '__main__':
    win = ButtonWindow(18)
    win.tkroot.mainloop()

You can see slightly longer versions of these programs in my GitHub Pi Zero Book repository.

Tags: , , ,
[ 11:32 Jan 21, 2018    More hardware | permalink to this entry | ]

Sun, 24 Dec 2017

Saving a transparent PNG image from Cairo, in Python

Dave and I will be giving a planetarium talk in February on the analemma and related matters.

Our planetarium, which runs a fiddly and rather limited program called Nightshade, has no way of showing the analemma. Or at least, after trying for nearly a week once, I couldn't find a way. But it can show images, and since I once wrote a Python program to plot the analemma, I figured I could use my program to generate the analemmas I wanted to show and then project them as images onto the planetarium dome.

[analemma simulation] But naturally, I wanted to project just the analemma and associated labels; I didn't want the blue background to cover up the stars the planetarium shows. So I couldn't just use a simple screenshot; I needed a way to get my GTK app to create a transparent image such as a PNG.

That turns out to be hard. GTK can't do it (either GTK2 or GTK3), and people wanting to do anything with transparency are nudged toward the Cairo library. As a first step, I updated my analemma program to use Cairo and GTK3 via gi.repository. Then I dove into Cairo.

I found one C solution for converting an existing Cairo surface to a PNG, but I didn't have much luck with it. But I did find a Python program that draws to a PNG without bothering to create a GUI. I could use that.

The important part of that program is where it creates a new Cairo "surface", and then creates a "context" for that surface:

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, *imagesize)

cr = cairo.Context(surface)

A Cairo surface is like a canvas to draw on, and it knows how to save itself to a PNG image. A context is the equivalent of a GC in X11 programming: it knows about the current color, font and so forth. So the trick is to create a new surface, create a context, then draw everything all over again with the new context and surface.

A Cairo widget will already have a function to draw everything (in my case, the analemma and all its labels), with this signature:

    def draw(self, widget, ctx):

It already allows passing the context in, so passing in a different context is no problem. I added an argument specifying the background color and transparency, so I could use a blue background in the user interface but a transparent background for the PNG image:

    def draw(self, widget, ctx, background=None):

I also had a minor hitch: in draw(), I was saving the context as self.ctx rather than passing it around to every draw routine. That means calling it with the saved image's context would overwrite the one used for the GUI window. So I save it first.

Here's the final image saving code:

   def save_image(self, outfile):
        dst_surface = cairo.ImageSurface(cairo.FORMAT_ARGB32,
                                         self.width, self.height)

        dst_ctx = cairo.Context(dst_surface)

        # draw() will overwrite self.ctx, so save it first:
        save_ctx = self.ctx

        # Draw everything again to the new context,
        # with a transparent instead of an opaque background:
        self.draw(None, dst_ctx, (0, 0, 1, 0))  # transparent blue

        # Restore the GUI context:
        self.ctx = save_ctx

        dst_surface.write_to_png("example.png")
        print("Saved to", outfile)

Tags: , , , , ,
[ 19:39 Dec 24, 2017    More programming | permalink to this entry | ]

Sat, 05 Aug 2017

Keeping Git Branches in Sync

I do most of my coding on my home machine. But when I travel (or sit in boring meetings), sometimes I do a little hacking on my laptop. Most of my code is hosted in GitHub repos, so when I travel, I like to update all the repos on the laptop to make sure I have what I need even when I'm offline.

That works great as long as I don't make branches. I have a variable $myrepos that lists all the github repositories where I want to contribute, and with a little shell alias it's easy enough to update them all:

allgit() {
    pushd ~
    foreach repo ($myrepos)
        echo $repo :
        cd ~/src/$repo
        git pull
    end
    popd
}

That works well enough -- as long as you don't use branches.

Git's branch model seems to be that branches are for local development, and aren't meant to be shared, pushed, or synchronized among machines. It's ridiculously difficult in git to do something like, "for all branches on the remote server, make sure I have that branch and it's in sync with the server." When you create branches, they don't push to the server by default, and it's remarkably difficult to figure out which of your branches is actually tracking a branch on the server.

A web search finds plenty of people asking, and most of the Git experts answering say things like "Just check out the branch, then pull." In other words, if you want to work on a branch, you'd better know before you go offline exactly which branches in which repositories might have been created or updated since the last time you worked in that repository on that machine. I guess that works if you only ever work on one project in one repo and only on one or two branches at a time. It certainly doesn't work if you need to update lots of repos on a laptop for the first time in two weeks.

Further web searching does find a few possibilities. For checking whether there are files modified that need to be committed, git status --porcelain -uno works well. For checking whether changes are committed but not pushed, git for-each-ref --format="%(refname:short) %(push:track)" refs/heads | fgrep '[ahead' works ... if you make an alias so you never have to look at it.

Figuring out whether branches are tracking remotes is a lot harder. I found some recommendations like git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done and for remote in `git branch -r`; do git branch --track ${remote#origin/} $remote; done but neither of them really did what I wanted. I was chasing down the rabbit hole of writing shell loops using variables like

  localbranches=("${(@f)$(git branch | sed 's/..//')}")
  remotebranches=("${(@f)$(git branch -a | grep remotes | grep -v HEAD | grep -v master | sed 's_remotes/origin/__' | sed 's/..//')}")
when I thought, there must be a better way. Maybe using Python bindings?

git-python

In Debian, the available packages for Git Python bindings are python-git, python-pygit2, and python-dulwich. Nobody on #python seemed to like any of them, but based on quick attempts with all three, python-git seemed the most straightforward. Confusingly, though Debian calls it python-git, it's called "git-python" in its docs or in web searches, and it's "import git" when you use it.

It's pretty straightforward to use, at least for simple things. You can create a Repo object with

from git import Repo
repo = Repo('.')
and then you can get lists like repo.heads (local branches), repo.refs (local and remote branches and other refs such as tags), etc. Once you have a ref, you can use ref.name, check whether it's tracking a remote branch with ref.tracking_branch(), and make it track one with ref.set_tracking_branch(remoteref). That makes it very easy to get a list of branches showing which ones are tracking a remote branch, something that had proved almost impossible with the git command line.

Nice. But now I wanted more: I wanted to replace those baroque git status --porcelain and git for-each-ref commands I had been using to check whether my repos needed committing or pushing. That proved harder.

Checking for uncommitted files, I decided it would be easiest stick with the existing git status --porcelain -uno. Which was sort of true. git-python lets you call git commands, for cases where the Python bindings aren't quite up to snuff yet, but it doesn't handle all cases. I could call:

    output = repo.git.status(porcelain=True)
but I never did find a way to pass the -uno; I tried u=False, u=None, and u="no" but none of them worked. But -uno actually isn't that important so I decided to do without it.

I found out later that there's another way to call the git command, using execute, which lets you pass the exact arguments you'd pass on the command line. It didn't work to call for-each-ref the way I'd called repo.git.status (repo.git.for_each_ref isn't defined), but I could call it this way:

    foreachref = repo.git.execute(['git', 'for-each-ref',
                                   '--format="%(refname:short) %(push:track)"',
                                   'refs/heads'])
and then parse the output looking for "[ahead]". That worked, but ... ick. I wanted to figure out how to do that using Python.

It's easy to get a ref (branch) and its corresponding tracking ref (remote branch). ref.log() gives you a list of commits on each of the two branches, ordered from earliest to most recent, the opposite of git log. In the simple case, then, what I needed was to iterate backward over the two commit logs, looking for the most recent SHA that's common to both. The Python builtin reversed was useful here:

    for i, entry in enumerate(reversed(ref.log())):
        for j, upstream_entry in enumerate(reversed(upstream.log())):
            if entry.newhexsha == upstream_entry.newhexsha:
                return i, j

(i, j) are the number of commits on the local branch that the remote hasn't seen, and vice versa. If i is zero, or if there's nothing in ref.log(), then the repo has no new commits and doesn't need pushing.

Making branches track a remote

The last thing I needed to do was to make branches track their remotes. Too many times, I've found myself on the laptop, ready to work, and discovered that I didn't have the latest code because I'd been working on a branch on my home machine, and my git pull hadn't pulled the info for the branch because that branch wasn't in the laptop's repo yet. That's what got me started on this whole "update everything" script in the first place.

If you have a ref for the local branch and a ref for the remote branch, you can verify their ref.name is the same, and if the local branch has the same name but isn't tracking the remote branch, probably something went wrong with the local repo (like one of my earlier attempts to get branches in sync, and it's an easy fix: ref.set_tracking_branch(remoteref).

But what if the local branch doesn't exist yet? That's the situation I cared about most, when I've been working on a new branch and it's not on the laptop yet, but I'm going to want to work on it while traveling. And that turned out to be difficult, maybe impossible, to do in git-python.

It's easy to create a new local branch: repo.head.create(repo, name). But that branch gets created as a copy of master, and if you try to turn it into a copy of the remote branch, you get conflicts because the branch is ahead of the remote branch you're trying to copy, or vice versa. You really need to create the new branch as a copy of the remote branch it's supposed to be tracking.

If you search the git-python documentation for ref.create, there are references to "For more documentation, please see the Head.create method." Head.create takes a reference argument (the basic ref.create doesn't, though the documentation suggests it should). But how can you call Head.create? I had no luck with attempts like repo.git.Head.create(repo, name, reference=remotebranches[name]).

I finally gave up and went back to calling the command line from git-python.

repo.git.checkout(remotebranchname, b=name)
I'm not entirely happy with that, but it seems to work.

I'm sure there are all sorts of problems left to solve. But this script does a much better job than any git command I've found of listing the branches in my repositories, checking for modifications that require commits or pushes, and making local branches to mirror new branches on the server. And maybe with time the git-python bindings will improve, and eventually I'll be able to create new tracking branches locally without needing the command line.

The final script, such as it is: gitbranchsync.py.

Tags: , ,
[ 14:39 Aug 05, 2017    More programming | permalink to this entry | ]

Tue, 23 May 2017

Python help from the shell -- greppable and saveable

I'm working on a project involving PyQt5 (on which, more later). One of the problems is that there's not much online documentation, and it's hard to find out details like what signals (events) each widget offers.

Like most Python packages, there is inline help in the source, which means that in the Python console you can say something like

>>> from PyQt5.QtWebEngineWidgets import QWebEngineView
>>> help(QWebEngineView)
The problem is that it's ordered alphabetically; if you want a list of signals, you need to read through all the objects and methods the class offers to look for a few one-liners that include "unbound PYQT_SIGNAL".

If only there was a way to take help(CLASSNAME) and pipe it through grep!

A web search revealed that plenty of other people have wished for this, but I didn't see any solutions. But when I tried running python -c "help(list)" it worked fine -- help isn't dependent on the console.

That means that you should be able to do something like

python -c "from sys import exit; help(exit)"

Sure enough, that worked too.

From there it was only a matter of setting up a zsh function to save on complicated typing. I set up separate aliases for python2, python3 and whatever the default python is. You can get help on builtins (pythonhelp list) or on objects in modules (pythonhelp sys.exit). The zsh suffixes :r (remove extension) and :e (extension) came in handy for separating the module name, before the last dot, and the class name, after the dot.

#############################################################
# Python help functions. Get help on a Python class in a
# format that can be piped through grep, redirected to a file, etc.
# Usage: pythonhelp [module.]class [module.]class ...
pythonXhelp() {
    python=$1
    shift
    for f in $*; do
        if [[ $f =~ '.*\..*' ]]; then
            module=$f:r
            obj=$f:e
            s="from ${module} import ${obj}; help($obj)"
        else
            module=''
            obj=$f
            s="help($obj)"
        fi
        $python -c $s
    done
}
alias pythonhelp="pythonXhelp python"
alias python2help="pythonXhelp python2"
alias python3help="pythonXhelp python3"

So now I can type

python3help PyQt5.QtWebEngineWidgets.QWebEngineView | grep PYQT_SIGNAL
and get that list of signals I wanted.

Tags: , ,
[ 14:12 May 23, 2017    More programming | permalink to this entry | ]

Thu, 06 Apr 2017

Clicking through a translucent window: using X11 input shapes

Update 2022-06-24: Although the concepts described in this article are still valid, the program I wrote depends on GTK2 and is therefore obsolete. I discuss versions for more modern toolkits here: Clicking through a Translucent Image Window.

It happened again: someone sent me a JPEG file with an image of a topo map, with a hiking trail and interesting stopping points drawn on it. Better than nothing. But what I really want on a hike is GPX waypoints that I can load into OsmAnd, so I can see whether I'm still on the trail and how to get to each point from where I am now.

My PyTopo program lets you view the coordinates of any point, so you can make a waypoint from that. But for adding lots of waypoints, that's too much work, so I added an "Add Waypoint" context menu item -- that was easy, took maybe twenty minutes. PyTopo already had the ability to save its existing tracks and waypoints as a GPX file, so no problem there.

[transparent image viewer overlayed on top of topo map] But how do you locate the waypoints you want? You can do it the hard way: show the JPEG in one window, PyTopo in the other, and do the "let's see the road bends left then right, and the point is off to the northwest just above the right bend and about two and a half times as far away as the distance through both road bends". Ugh. It takes forever and it's terribly inaccurate.

More than once, I've wished for a way to put up a translucent image overlay that would let me click through it. So I could see the image, line it up with the map in PyTopo (resizing as needed), then click exactly where I wanted waypoints.

I needed two features beyond what normal image viewers offer: translucency, and the ability to pass mouse clicks through to the window underneath.

A translucent image viewer, in Python

The first part, translucency, turned out to be trivial. In a class inheriting from my Python ImageViewerWindow, I just needed to add this line to the constructor:

    self.set_opacity(.5)

Plus one more step. The window was translucent now, but it didn't look translucent, because I'm running a simple window manager (Openbox) that doesn't have a compositor built in. Turns out you can run a compositor on top of Openbox. There are lots of compositors; the first one I found, which worked fine, was xcompmgr -c -t-6 -l-6 -o.1

The -c specifies client-side compositing. -t and -l specify top and left offsets for window shadows (negative so they go on the bottom right). -o.1 sets the opacity of window shadows. In the long run, -o0 is probably best (no shadows at all) since the shadow interferes a bit with seeing the window under the translucent one. But having a subtle .1 shadow was useful while I was debugging.

That's all I needed: voilà, translucent windows. Now on to the (much) harder part.

A click-through window, in C

X11 has something called the SHAPE extension, which I experimented with once before to make a silly program called moonroot. It's also used for the familiar "xeyes" program. It's used to make windows that aren't square, by passing a shape mask telling X what shape you want your window to be. In theory, I knew I could do something like make a mask where every other pixel was transparent, which would simulate a translucent image, and I'd at least be able to pass clicks through on half the pixels.

But fortunately, first I asked the estimable Openbox guru Mikael Magnusson, who tipped me off that the SHAPE extension also allows for an "input shape" that does exactly what I wanted: lets you catch events on only part of the window and pass them through on the rest, regardless of which parts of the window are visible.

Knowing that was great. Making it work was another matter. Input shapes turn out to be something hardly anyone uses, and there's very little documentation.

In both C and Python, I struggled with drawing onto a pixmap and using it to set the input shape. Finally I realized that there's a call to set the input shape from an X region. It's much easier to build a region out of rectangles than to draw onto a pixmap.

I got a C demo working first. The essence of it was this:

    if (!XShapeQueryExtension(dpy, &shape_event_base, &shape_error_base)) {
        printf("No SHAPE extension\n");
        return;
    }

    /* Make a shaped window, a rectangle smaller than the total
     * size of the window. The rest will be transparent.
     */
    region = CreateRegion(outerBound, outerBound,
                          XWinSize-outerBound*2, YWinSize-outerBound*2);
    XShapeCombineRegion(dpy, win, ShapeBounding, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

    /* Make a frame region.
     * So in the outer frame, we get input, but inside it, it passes through.
     */
    region = CreateFrameRegion(innerBound);
    XShapeCombineRegion(dpy, win, ShapeInput, 0, 0, region, ShapeSet);
    XDestroyRegion(region);

CreateRegion sets up rectangle boundaries, then creates a region from those boundaries:

Region CreateRegion(int x, int y, int w, int h) {
    Region region = XCreateRegion();
    XRectangle rectangle;
    rectangle.x = x;
    rectangle.y = y;
    rectangle.width = w;
    rectangle.height = h;
    XUnionRectWithRegion(&rectangle, region, region);

    return region;
}

CreateFrameRegion() is similar but a little longer. Rather than post it all here, I've created a GIST: transregion.c, demonstrating X11 shaped input.

Next problem: once I had shaped input working, I could no longer move or resize the window, because the window manager passed events through the window's titlebar and decorations as well as through the rest of the window. That's why you'll see that CreateFrameRegion call in the gist: -- I had a theory that if I omitted the outer part of the window from the input shape, and handled input normally around the outside, maybe that would extend to the window manager decorations. But the problem turned out to be a minor Openbox bug, which Mikael quickly tracked down (in openbox/frame.c, in the XShapeCombineRectangles call on line 321, change ShapeBounding to kind). Openbox developers are the greatest!

Input Shapes in Python

Okay, now I had a proof of concept: X input shapes definitely can work, at least in C. How about in Python?

There's a set of python-xlib bindings, and they even supports the SHAPE extension, but they have no documentation and didn't seem to include input shapes. I filed a GitHub issue and traded a few notes with the maintainer of the project. It turned out the newest version of python-xlib had been completely rewritten, and supposedly does support input shapes. But the API is completely different from the C API, and after wasting about half a day tweaking the demo program trying to reverse engineer it, I gave up.

Fortunately, it turns out there's a much easier way. Python-gtk has shape support, even including input shapes. And if you use regions instead of pixmaps, it's this simple:

    if self.is_composited():
        region = gtk.gdk.region_rectangle(gtk.gdk.Rectangle(0, 0, 1, 1))
        self.window.input_shape_combine_region(region, 0, 0)

My transimageviewer.py came out nice and simple, inheriting from imageviewer.py and adding only translucency and the input shape.

If you want to define an input shape based on pixmaps instead of regions, it's a bit harder and you need to use the Cairo drawing API. I never got as far as working code, but I believe it should go something like this:

    # Warning: untested code!
    bitmap = gtk.gdk.Pixmap(None, self.width, self.height, 1)
    cr = bitmap.cairo_create()
    # Draw a white circle in a black rect:
    cr.rectangle(0, 0, self.width, self.height)
    cr.set_operator(cairo.OPERATOR_CLEAR)
    cr.fill();

    # draw white filled circle
    cr.arc(self.width / 2, self.height / 2, self.width / 4,
           0, 2 * math.pi);
    cr.set_operator(cairo.OPERATOR_OVER);
    cr.fill();

    self.window.input_shape_combine_mask(bitmap, 0, 0)

The translucent image viewer worked just as I'd hoped. I was able to take a JPG of a trailmap, overlay it on top of a PyTopo window, scale the JPG using the normal Openbox window manager handles, then right-click on top of trail markers to set waypoints. When I was done, a "Save as GPX" in PyTopo and I had a file ready to take with me on my phone.

Tags: , , , , ,
[ 17:08 Apr 06, 2017    More programming | permalink to this entry | ]

Sat, 25 Mar 2017

Reading keypresses in Python

As part of preparation for Everyone Does IT, I was working on a silly hack to my Python script that plays notes and chords: I wanted to use the computer keyboard like a music keyboard, and play different notes when I press different keys. Obviously, in a case like that I don't want line buffering -- I want the program to play notes as soon as I press a key, not wait until I hit Enter and then play the whole line at once. In Unix that's called "cbreak mode".

There are a few ways to do this in Python. The most straightforward way is to use the curses library, which is designed for console based user interfaces and games. But importing curses is overkill just to do key reading.

Years ago, I found a guide on the official Python Library and Extension FAQ: Python: How do I get a single keypress at a time?. I'd even used it once, for a one-off Raspberry Pi project that I didn't end up using much. I hadn't done much testing of it at the time, but trying it now, I found a big problem: it doesn't block.

Blocking is whether the read() waits for input or returns immediately. If I read a character with c = sys.stdin.read(1) but there's been no character typed yet, a non-blocking read will throw an IOError exception, while a blocking read will wait, not returning until the user types a character.

In the code on that Python FAQ page, blocking looks like it should be optional. This line:

fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
is the part that requests non-blocking reads. Skipping that should let me read characters one at a time, block until each character is typed. But in practice, it doesn't work. If I omit the O_NONBLOCK flag, reads never return, not even if I hit Enter; if I set O_NONBLOCK, the read immediately raises an IOError. So I have to call read() over and over, spinning the CPU at 100% while I wait for the user to type something.

The way this is supposed to work is documented in the termios man page. Part of what tcgetattr returns is something called the cc structure, which includes two members called Vmin and Vtime. man termios is very clear on how they're supposed to work: for blocking, single character reads, you set Vmin to 1 (that's the number of characters you want it to batch up before returning), and Vtime to 0 (return immediately after getting that one character). But setting them in Python with tcsetattr doesn't make any difference.

(Python also has a module called tty that's supposed to simplify this stuff, and you should be able to call tty.setcbreak(fd). But that didn't work any better than termios: I suspect it just calls termios under the hood.)

But after a few hours of fiddling and googling, I realized that even if Python's termios can't block, there are other ways of blocking on input. The select system call lets you wait on any file descriptor until has input. So I should be able to set stdin to be non-blocking, then do my own blocking by waiting for it with select.

And that worked. Here's a minimal example:

import sys, os
import termios, fcntl
import select

fd = sys.stdin.fileno()
newattr = termios.tcgetattr(fd)
newattr[3] = newattr[3] & ~termios.ICANON
newattr[3] = newattr[3] & ~termios.ECHO
termios.tcsetattr(fd, termios.TCSANOW, newattr)

oldterm = termios.tcgetattr(fd)
oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

print "Type some stuff"
while True:
    inp, outp, err = select.select([sys.stdin], [], [])
    c = sys.stdin.read()
    if c == 'q':
        break
    print "-", c

# Reset the terminal:
termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

A less minimal example: keyreader.py, a class to read characters, with blocking and echo optional. It also cleans up after itself on exit, though most of the time that seems to happen automatically when I exit the Python script.

Update, 2017: It turns out this doesn't work in Python 3: 3 needs some extra semantics when opening the file. For a nice example of nonblocking read in Python 3, see ballingt: Nonblocking stdin read works differently in Python 3.

Tags: ,
[ 12:42 Mar 25, 2017    More programming | permalink to this entry | ]

Sat, 14 Jan 2017

Plotting election (and other county-level) data with Python Basemap

After my arduous search for open 2016 election data by county, as a first test I wanted one of those red-blue-purple charts of how Democratic or Republican each county's vote was.

I used the Basemap package for plotting. It used to be part of matplotlib, but it's been split off into its own toolkit, grouped under mpl_toolkits: on Debian, it's available as python-mpltoolkits.basemap, or you can find Basemap on GitHub.

It's easiest to start with the fillstates.py example that shows how to draw a US map with different states colored differently. You'll need the three shapefiles (because of ESRI's silly shapefile format): st99_d00.dbf, st99_d00.shp and st99_d00.shx, available in the same examples directory.

Of course, to plot counties, you need county shapefiles as well. The US Census has county shapefiles at several different resolutions (I used the 500k version). Then you can plot state and counties outlines like this:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def draw_us_map():
    # Set the lower left and upper right limits of the bounding box:
    lllon = -119
    urlon = -64
    lllat = 22.0
    urlat = 50.5
    # and calculate a centerpoint, needed for the projection:
    centerlon = float(lllon + urlon) / 2.0
    centerlat = float(lllat + urlat) / 2.0

    m = Basemap(resolution='i',  # crude, low, intermediate, high, full
                llcrnrlon = lllon, urcrnrlon = urlon,
                lon_0 = centerlon,
                llcrnrlat = lllat, urcrnrlat = urlat,
                lat_0 = centerlat,
                projection='tmerc')

    # Read state boundaries.
    shp_info = m.readshapefile('st99_d00', 'states',
                               drawbounds=True, color='lightgrey')

    # Read county boundaries
    shp_info = m.readshapefile('cb_2015_us_county_500k',
                               'counties',
                               drawbounds=True)

if __name__ == "__main__":
    draw_us_map()
    plt.title('US Counties')
    # Get rid of some of the extraneous whitespace matplotlib loves to use.
    plt.tight_layout(pad=0, w_pad=0, h_pad=0)
    plt.show()
[Simple map of US county borders]

Accessing the state and county data after reading shapefiles

Great. Now that we've plotted all the states and counties, how do we get a list of them, so that when I read out "Santa Clara, CA" from the data I'm trying to plot, I know which map object to color?

After calling readshapefile('st99_d00', 'states'), m has two new members, both lists: m.states and m.states_info.

m.states_info[] is a list of dicts mirroring what was in the shapefile. For the Census state list, the useful keys are NAME, AREA, and PERIMETER. There's also STATE, which is an integer (not restricted to 1 through 50) but I'll get to that.

If you want the shape for, say, California, iterate through m.states_info[] looking for the one where m.states_info[i]["NAME"] == "California". Note i; the shape coordinates will be in m.states[i]n (in basemap map coordinates, not latitude/longitude).

Correlating states and counties in Census shapefiles

County data is similar, with county names in m.counties_info[i]["NAME"]. Remember that STATE integer? Each county has a STATEFP, m.counties_info[i]["STATEFP"] that matches some state's m.states_info[i]["STATE"].

But doing that search every time would be slow. So right after calling readshapefile for the states, I make a table of states. Empirically, STATE in the state list goes up to 72. Why 72? Shrug.

    MAXSTATEFP = 73
    states = [None] * MAXSTATEFP
    for state in m.states_info:
        statefp = int(state["STATE"])
        # Many states have multiple entries in m.states (because of islands).
        # Only add it once.
        if not states[statefp]:
            states[statefp] = state["NAME"]

That'll make it easy to look up a county's state name quickly when we're looping through all the counties.

Calculating colors for each county

Time to figure out the colors from the Deleetdk election results CSV file. Reading lines from the CSV file into a dictionary is superficially easy enough:

    fp = open("tidy_data.csv")
    reader = csv.DictReader(fp)

    # Make a dictionary of all "county, state" and their colors.
    county_colors = {}
    for county in reader:
        # What color is this county?
        pop = float(county["votes"])
        blue = float(county["results.clintonh"])/pop
        red = float(county["Total.Population"])/pop
        county_colors["%s, %s" % (county["name"], county["State"])] \
            = (red, 0, blue)

But in practice, that wasn't good enough, because the county names in the Deleetdk names didn't always match the official Census county names.

Fuzzy matches

For instance, the CSV file had no results for Alaska or Puerto Rico, so I had to skip those. Non-ASCII characters were a problem: "Doña Ana" county in the census data was "Dona Ana" in the CSV. I had to strip off " County", " Borough" and similar terms: "St Louis" in the census data was "St. Louis County" in the CSV. Some names were capitalized differently, like PLYMOUTH vs. Plymouth, or Lac Qui Parle vs. Lac qui Parle. And some names were just different, like "Jeff Davis" vs. "Jefferson Davis".

To get around that I used SequenceMatcher to look for fuzzy matches when I couldn't find an exact match:

def fuzzy_find(s, slist):
    '''Try to find a fuzzy match for s in slist.
    '''
    best_ratio = -1
    best_match = None

    ls = s.lower()
    for ss in slist:
        r = SequenceMatcher(None, ls, ss.lower()).ratio()
        if r > best_ratio:
            best_ratio = r
            best_match = ss
    if best_ratio > .75:
        return best_match
    return None

Correlate the county names from the two datasets

It's finally time to loop through the counties in the map to color and plot them.

Remember STATE vs. STATEFP? It turns out there are a few counties in the census county shapefile with a STATEFP that doesn't match any STATE in the state shapefile. Mostly they're in the Virgin Islands and I don't have election data for them anyway, so I skipped them for now. I also skipped Puerto Rico and Alaska (no results in the election data) and counties that had no corresponding state: I'll omit that code here, but you can see it in the final script, linked at the end.

    for i, county in enumerate(m.counties_info):
        countyname = county["NAME"]
        try:
            statename = states[int(county["STATEFP"])]
        except IndexError:
            print countyname, "has out-of-index statefp of", county["STATEFP"]
            continue

        countystate = "%s, %s" % (countyname, statename)
        try:
            ccolor = county_colors[countystate]
        except KeyError:
            # No exact match; try for a fuzzy match
            fuzzyname = fuzzy_find(countystate, county_colors.keys())
            if fuzzyname:
                ccolor = county_colors[fuzzyname]
                county_colors[countystate] = ccolor
            else:
                print "No match for", countystate
                continue

        countyseg = m.counties[i]
        poly = Polygon(countyseg, facecolor=ccolor)  # edgecolor="white"
        ax.add_patch(poly)

Moving Hawaii

Finally, although the CSV didn't have results for Alaska, it did have Hawaii. To display it, you can move it when creating the patches:

    countyseg = m.counties[i]
    if statename == 'Hawaii':
        countyseg = list(map(lambda (x,y): (x + 5750000, y-1400000), countyseg))
    poly = Polygon(countyseg, facecolor=countycolor)
    ax.add_patch(poly)
The offsets are in map coordinates and are empirical; I fiddled with them until Hawaii showed up at a reasonable place. [Blue-red-purple 2016 election map]

Well, that was a much longer article than I intended. Turns out it takes a fair amount of code to correlate several datasets and turn them into a map. But a lot of the work will be applicable to other datasets.

Full script on GitHub: Blue-red map using Census county shapefile

Tags: , , , , , , , , , , ,
[ 15:10 Jan 14, 2017    More programming | permalink to this entry | ]

Sun, 08 Jan 2017

Using virtualenv to replace the broken pip install --user

Python's installation tool, pip, has some problems on Debian.

The obvious way to use pip is as root: sudo pip install packagename. If you hang out in Python groups at all, you'll quickly find that this is strongly frowned upon. It can lead to your pip-installed packages intermingling with the ones installed by Debian's apt-get, possibly causing problems during apt system updates.

The second most obvious way, as you'll see if you read pip's man page, is pip --user install packagename. This installs the package with only user permissions, not root, under a directory called ~/.local. Python automatically checks .local as part of its PYTHONPATH, and you can add ~/.local/bin to your PATH, so this makes everything transparent.

Or so I thought until recently, when I discovered that pip install --user ignores system-installed packages when it's calculating its dependencies, so you could end up with a bunch of incompatible versions of packages installed. Plus it takes forever to re-download and re-install dependencies you already had.

Pip has a clear page describing how pip --user is supposed to work, and that isn't what it's doing. So I filed pip bug 4222; but since pip has 687 open bugs filed against it, I'm not terrifically hopeful of that getting fixed any time soon. So I needed a workaround.

Use virtualenv instead of --user

Fortunately, it turned out that pip install works correctly in a virtualenv if you include the --system-site-packages option. I had thought virtualenvs were for testing, but quite a few people on #python said they used virtualenvs all the time, as part of their normal runtime environments. (Maybe due to pip's deficiencies?) I had heard people speak deprecatingly of --user in favor of virtualenvs but was never clear why; maybe this is why.

So, what I needed was to set up a virtualenv that I can keep around all the time and use by default every time I log in. I called it ~/.pythonenv when I created it:

virtualenv --system-site-packages $HOME/.pythonenv

Normally, the next thing you do after creating a virtualenv is to source a script called bin/activate inside the venv. That sets up your PATH, PYTHONPATH and a bunch of other variables so the venv will be used in all the right ways. But activate also changes your prompt, which I didn't want in my normal runtime environment. So I stuck this in my .zlogin file:

VIRTUAL_ENV_DISABLE_PROMPT=1 source $HOME/.pythonenv/bin/activate

Now I'll activate the venv once, when I log in (and once in every xterm window since I set XTerm*loginShell: true in my .Xdefaults. I see my normal prompt, I can use the normal Debian-installed Python packages, and I can install additional PyPI packages with pip install packagename (no --user, no sudo).

Tags:
[ 11:37 Jan 08, 2017    More programming | permalink to this entry | ]

Thu, 22 Dec 2016

Tips on Developing Python Projects for PyPI

I wrote two recent articles on Python packaging: Distributing Python Packages Part I: Creating a Python Package and Distributing Python Packages Part II: Submitting to PyPI. I was able to get a couple of my programs packaged and submitted.

Ongoing Development and Testing

But then I realized all was not quite right. I could install new releases of my package -- but I couldn't run it from the source directory any more. How could I test changes without needing to rebuild the package for every little change I made?

Fortunately, it turned out to be fairly easy. Set PYTHONPATH to a directory that includes all the modules you normally want to test. For example, inside my bin directory I have a python directory where I can symlink any development modules I might need:

mkdir ~/bin/python
ln -s ~/src/metapho/metapho ~/bin/python/

Then add the directory at the beginning of PYTHONPATH:

export PYTHONPATH=$HOME/bin/python

With that, I could test from the development directory again, without needing to rebuild and install a package every time.

Cleaning up files used in building

Building a package leaves some extra files and directories around, and git status will whine at you since they're not version controlled. Of course, you could gitignore them, but it's better to clean them up after you no longer need them.

To do that, you can add a clean command to setup.py.

from setuptools import Command

class CleanCommand(Command):
    """Custom clean command to tidy up the project root."""
    user_options = []
    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        os.system('rm -vrf ./build ./dist ./*.pyc ./*.tgz ./*.egg-info ./docs/sphinxdoc/_build')
(Obviously, that includes file types beyond what you need for just cleaning up after package building. Adjust the list as needed.)

Then in the setup() function, add these lines:

      cmdclass={
          'clean': CleanCommand,
      }

Now you can type

python setup.py clean
and it will remove all the extra files.

Keeping version strings in sync

It's so easy to update the __version__ string in your module and forget that you also have to do it in setup.py, or vice versa. Much better to make sure they're always in sync.

I found several version of that using system("grep..."), but I decided to write my own that doesn't depend on system(). (Yes, I should do the same thing with that CleanCommand, I know.)

def get_version():
    '''Read the pytopo module versions from pytopo/__init__.py'''
    with open("pytopo/__init__.py") as fp:
        for line in fp:
            line = line.strip()
            if line.startswith("__version__"):
                parts = line.split("=")
                if len(parts) > 1:
                    return parts[1].strip()

Then in setup():

      version=get_version(),

Much better! Now you only have to update __version__ inside your module and setup.py will automatically use it.

Using your README for a package long description

setup has a long_description for the package, but you probably already have some sort of README in your package. You can use it for your long description this way:

# Utility function to read the README file.
# Used for the long_description.
def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
    long_description=read('README'),

Tags: , ,
[ 10:15 Dec 22, 2016    More programming | permalink to this entry | ]

Sat, 17 Dec 2016

Distributing Python Packages Part II: Submitting to PyPI

In Part I, I discussed writing a setup.py to make a package you can submit to PyPI. Today I'll talk about better ways of testing the package, and how to submit it so other people can install it.

Testing in a VirtualEnv

You've verified that your package installs. But you still need to test it and make sure it works in a clean environment, without all your developer settings.

The best way to test is to set up a "virtual environment", where you can install your test packages without messing up your regular runtime environment. I shied away from virtualenvs for a long time, but they're actually very easy to set up:

virtualenv venv
source venv/bin/activate

That creates a directory named venv under the current directory, which it will use to install packages. Then you can pip install packagename or pip install /path/to/packagename-version.tar.gz

Except -- hold on! Nothing in Python packaging is that easy. It turns out there are a lot of packages that won't install inside a virtualenv, and one of them is PyGTK, the library I use for my user interfaces. Attempting to install pygtk inside a venv gets:

********************************************************************
* Building PyGTK using distutils is only supported on windows. *
* To build PyGTK in a supported way, read the INSTALL file.    *
********************************************************************

Windows only? Seriously? PyGTK works fine on both Linux and Mac; it's packaged on every Linux distribution, and on Mac it's packaged with GIMP. But for some reason, whoever maintains the PyPI PyGTK packages hasn't bothered to make it work on anything but Windows, and PyGTK seems to be mostly an orphaned project so that's not likely to change.

(There's a package called ruamel.venvgtk that's supposed to work around this, but it didn't make any difference for me.)

The solution is to let the virtualenv use your system-installed packages, so it can find GTK and other non-PyPI packages there:

virtualenv --system-site-packages venv
source venv/bin/activate

I also found that if I had a ~/.local directory (where packages normally go if I use pip install --user packagename), sometimes pip would install to .local instead of the venv. I never did track down why this happened some times and not others, but when it happened, a temporary mv ~/.local ~/old.local fixed it.

Test your Python package in the venv until everything works. When you're finished with your venv, you can run deactivate and then remove it with rm -rf venv.

Tag it on GitHub

Is your project ready to publish?

If your project is hosted on GitHub, you can have pypi download it automatically. In your setup.py, set

download_url='https://github.com/user/package/tarball/tagname',

Check that in. Then make a tag and push it:

git tag 0.1 -m "Name for this tag"
git push --tags origin master

Try to make your tag match the version you've set in setup.py and in your module.

Push it to pypitest

Register a new account and password on both pypitest and on pypi.

Then create a ~/.pypirc that looks like this:

[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://pypi.python.org/pypi
username=YOUR_USERNAME
password=YOUR_PASSWORD

[pypitest]
repository=https://testpypi.python.org/pypi
username=YOUR_USERNAME
password=YOUR_PASSWORD

Yes, those passwords are in cleartext. Incredibly, there doesn't seem to be a way to store an encrypted password or even have it prompt you. There are tons of complaints about that all over the web but nobody seems to have a solution. You can specify a password on the command line, but that's not much better. So use a password you don't use anywhere else and don't mind too much if someone guesses.

Update: Apparently there's a newer method called twine that solves the password encryption problem. Read about it here: Uploading your project to PyPI. You should probably use twine instead of the setup.py commands discussed in the next paragraph.

Now register your project and upload it:

python setup.py register -r pypitest
python setup.py sdist upload -r pypitest

Wait a few minutes: it takes pypitest a little while before new packages become available. Then go to your venv (to be safe, maybe delete the old venv and create a new one, or at least pip uninstall) and try installing:

pip install -i https://testpypi.python.org/pypi YourPackageName

If you get "No matching distribution found for packagename", wait a few minutes then try again.

If it all works, then you're ready to submit to the real pypi:

python setup.py register -r pypi
python setup.py sdist upload -r pypi

Congratulations! If you've gone through all these steps, you've uploaded a package to pypi. Pat yourself on the back and go tell everybody they can pip install your package.

Some useful reading

Some pages I found useful:

A great tutorial except that it forgets to mention signing up for an account: Python Packaging with GitHub

Another good tutorial: First time with PyPI

Allowed PyPI classifiers -- the categories your project fits into Unfortunately there aren't very many of those, so you'll probably be stuck with 'Topic :: Utilities' and not much else.

Python Packages and You: not a tutorial, but a lot of good advice on style and designing good packages.

Tags: , ,
[ 16:19 Dec 17, 2016    More programming | permalink to this entry | ]

Sun, 11 Dec 2016

Distributing Python Packages Part I: Creating a Python Package

I write lots of Python scripts that I think would be useful to other people, but I've put off learning how to submit to the Python Package Index, PyPI, so that my packages can be installed using pip install.

Now that I've finally done it, I see why I put it off for so long. Unlike programming in Python, packaging is a huge, poorly documented hassle, and it took me days to get a working.package. Maybe some of the hints here will help other struggling Pythonistas.

Create a setup.py

The setup.py file is the file that describes the files in your project and other installation information. If you've never created a setup.py before, Submitting a Python package with GitHub and PyPI has a decent example, and you can find lots more good examples with a web search for "setup.py", so I'll skip the basics and just mention some of the parts that weren't straightforward.

Distutils vs. Setuptools

However, there's one confusing point that no one seems to mention. setup.py examples all rely on a predefined function called setup, but some examples start with

from distutils.core import setup
while others start with
from setuptools import setup

In other words, there are two different versions of setup! What's the difference? I still have no idea. The setuptools version seems to be a bit more advanced, and I found that using distutils.core , sometimes I'd get weird errors when trying to follow suggestions I found on the web. So I ended up using the setuptools version.

But I didn't initially have setuptools installed (it's not part of the standard Python distribution), so I installed it from the Debian package:

apt-get install python-setuptools python-wheel

The python-wheel package isn't strictly needed, but I found I got assorted warnings warnings from pip install later in the process ("Cannot build wheel") unless I installed it, so I recommend you install it from the start.

Including scripts

setup.py has a scripts option to include scripts that are part of your package:

    scripts=['script1', 'script2'],

But when I tried to use it, I had all sorts of problems, starting with scripts not actually being included in the source distribution. There isn't much support for using scripts -- it turns out you're actually supposed to use something called console_scripts, which is more elaborate.

First, you can't have a separate script file, or even a __main__ inside an existing class file. You must have a function, typically called main(), so you'll typically have this:

def main():
    # do your script stuff

if __name__ == "__main__":
    main()

Then add something like this to your setup.py:

      entry_points={
          'console_scripts': [
              script1=yourpackage.filename:main',
              script2=yourpackage.filename2:main'
          ]
      },

There's a secret undocumented alternative that a few people use for scripts with graphical user interfaces: use 'gui_scripts' rather than 'console_scripts'. It seems to work when I try it, but the fact that it's not documented and none of the Python experts even seem to know about it scared me off, and I stuck with 'console_scripts'.

Including data files

One of my packages, pytopo, has a couple of files it needs to install, like an icon image. setup.py has a provision for that:

      data_files=[('/usr/share/pixmaps',      ["resources/appname.png"]),
                  ('/usr/share/applications', ["resources/appname.desktop"]),
                  ('/usr/share/appname',      ["resources/pin.png"]),
                 ],

Great -- except it doesn't work. None of the files actually gets added to the source distribution.

One solution people mention to a "files not getting added" problem is to create an explicit MANIFEST file listing all files that need to be in the distribution. Normally, setup generates the MANIFEST automatically, but apparently it isn't smart enough to notice data_files and include those in its generated MANIFEST.

I tried creating a MANIFEST listing all the .py files plus the various resources -- but it didn't make any difference. My MANIFEST was ignored.

The solution turned out to be creating a MANIFEST.in file, which is used to generate a MANIFEST. It's easier than creating the MANIFEST itself: you don't have to list every file, just patterns that describe them:

include setup.py
include packagename/*.py
include resources/*
If you have any scripts that don't use the extension .py, don't forget to include them as well. This may have been why scripts= didn't work for me earlier, but by the time I found out about MANIFEST.in I had already switched to using console_scripts.

Testing setup.py

Once you have a setup.py, use it to generate a source distribution with:

python setup.py sdist
(You can also use bdist to generate a binary distribution, but you'll probably only need that if you're compiling C as part of your package. Source dists are apparently enough for pure Python packages.)

Your package will end up in dist/packagename-version.tar.gz so you can use tar tf dist/packagename-version.tar.gz to verify what files are in it. Work on your setup.py until you don't get any errors or warnings and the list of files looks right.

Congratulations -- you've made a Python package! I'll post a followup article in a day or two about more ways of testing, and how to submit your working package to PyPI.

Update: Part II is up: Distributing Python Packages Part II: Submitting to PyPI.

Tags: , ,
[ 12:54 Dec 11, 2016    More programming | permalink to this entry | ]

Wed, 05 Oct 2016

Play notes, chords and arbitrary waveforms from Python

Reading Stephen Wolfram's latest discussion of teaching computational thinking (which, though I mostly agree with it, is more an extended ad for Wolfram Programming Lab than a discussion of what computational thinking is and why we should teach it) I found myself musing over ideas for future computer classes for Los Alamos Makers. Students, and especially kids, like to see something other than words on a screen. Graphics and games good, or robotics when possible ... but another fun project a novice programmer can appreciate is music.

I found myself curious what you could do with Python, since I hadn't played much with Python sound generation libraries. I did discover a while ago that Python is rather bad at playing audio files, though I did eventually manage to write a music player script that works quite well. What about generating tones and chords?

A web search revealed that this is another thing Python is bad at. I found lots of people asking about chord generation, and a handful of half-baked ideas that relied on long obsolete packages or external program. But none of it actually worked, at least without requiring Windows or relying on larger packages like fluidsynth (which looked worth exploring some day when I have more time).

Play an arbitrary waveform with Pygame and NumPy

But I did find one example based on a long-obsolete Python package called Numeric which, when rewritten to use NumPy, actually played a sound. You can take a NumPy array and play it using a pygame.sndarray object this way:

import pygame, pygame.sndarray

def play_for(sample_wave, ms):
    """Play the given NumPy array, as a sound, for ms milliseconds."""
    sound = pygame.sndarray.make_sound(sample_wave)
    sound.play(-1)
    pygame.time.delay(ms)
    sound.stop()

Then you just need to calculate the waveform you want to play. NumPy can generate sine waves on its own, while scipy.signal can generate square and sawtooth waves. Like this:

import numpy
import scipy.signal

sample_rate = 44100

def sine_wave(hz, peak, n_samples=sample_rate):
    """Compute N samples of a sine wave with given frequency and peak amplitude.
       Defaults to one second.
    """
    length = sample_rate / float(hz)
    omega = numpy.pi * 2 / length
    xvalues = numpy.arange(int(length)) * omega
    onecycle = peak * numpy.sin(xvalues)
    return numpy.resize(onecycle, (n_samples,)).astype(numpy.int16)

def square_wave(hz, peak, duty_cycle=.5, n_samples=sample_rate):
    """Compute N samples of a sine wave with given frequency and peak amplitude.
       Defaults to one second.
    """
    t = numpy.linspace(0, 1, 500 * 440/hz, endpoint=False)
    wave = scipy.signal.square(2 * numpy.pi * 5 * t, duty=duty_cycle)
    wave = numpy.resize(wave, (n_samples,))
    return (peak / 2 * wave.astype(numpy.int16))

# Play A (440Hz) for 1 second as a sine wave:
play_for(sine_wave(440, 4096), 1000)

# Play A-440 for 1 second as a square wave:
play_for(square_wave(440, 4096), 1000)

Playing chords

That's all very well, but it's still a single tone, not a chord.

To generate a chord of two notes, you can add the waveforms for the two notes. For instance, 440Hz is concert A, and the A one octave above it is double the frequence, or 880 Hz. If you wanted to play a chord consisting of those two As, you could do it like this:

play_for(sum([sine_wave(440, 4096), sine_wave(880, 4096)]), 1000)

Simple octaves aren't very interesting to listen to. What you want is chords like major and minor triads and so forth. If you google for chord ratios Google helpfully gives you a few of them right off, then links to a page with a table of ratios for some common chords.

For instance, the major triad ratios are listed as 4:5:6. What does that mean? It means that for a C-E-G triad (the first C chord you learn in piano), the E's frequency is 5/4 of the C's frequency, and the G is 6/4 of the C.

You can pass that list, [4, 5, 5] to a function that will calculate the right ratios to produce the set of waveforms you need to add to get your chord:

def make_chord(hz, ratios):
    """Make a chord based on a list of frequency ratios."""
    sampling = 4096
    chord = waveform(hz, sampling)
    for r in ratios[1:]:
        chord = sum([chord, sine_wave(hz * r / ratios[0], sampling)])
    return chord

def major_triad(hz):
    return make_chord(hz, [4, 5, 6])

play_for(major_triad(440), length)

Even better, you can pass in the waveform you want to use when you're adding instruments together:

def make_chord(hz, ratios, waveform=None):
    """Make a chord based on a list of frequency ratios
       using a given waveform (defaults to a sine wave).
    """
    sampling = 4096
    if not waveform:
        waveform = sine_wave
    chord = waveform(hz, sampling)
    for r in ratios[1:]:
        chord = sum([chord, waveform(hz * r / ratios[0], sampling)])
    return chord

def major_triad(hz, waveform=None):
    return make_chord(hz, [4, 5, 6], waveform)

play_for(major_triad(440, square_wave), length)

There are still some problems. For instance, sawtooth_wave() works fine individually or for pairs of notes, but triads of sawtooths don't play correctly. I'm guessing something about the sampling rate is making their overtones cancel out part of the sawtooth wave. Triangle waves (in scipy.signal, that's a sawtooth wave with rising ramp width of 0.5) don't seem to work right even for single tones. I'm sure these are solvable, perhaps by fiddling with the sampling rate. I'll probably need to add graphics so I can look at the waveform for debugging purposes.

In any case, it was a fun morning hack. Most chords work pretty well, and it's nice to know how to to play any waveform I can generate.

The full script is here: play_chord.py on GitHub.

Tags: , ,
[ 11:29 Oct 05, 2016    More programming | permalink to this entry | ]

Sat, 06 Aug 2016

Adding a Back button in Python Webkit-GTK

I have a little browser script in Python, called quickbrowse, based on Python-Webkit-GTK. I use it for things like quickly calling up an anonymous window with full javascript and cookies, for when I hit a page that doesn't work with Firefox and privacy blocking; and as a quick solution for calling up HTML conversions of doc and pdf email attachments.

Python-webkit comes with a simple browser as an example -- on Debian it's installed in /usr/share/doc/python-webkit/examples/browser.py. But it's very minimal, and lacks important basic features like command-line arguments. One of those basic features I've been meaning to add is Back and Forward buttons.

Should be easy, right? Of course webkit has a go_back() method, so I just have to add a button and call that, right? Ha. It turned out to be a lot more difficult than I expected, and although I found a fair number of pages asking about it, I didn't find many working examples. So here's how to do it.

Add a toolbar button

In the WebToolbar class (derived from gtk.Toolbar): In __init__(), after initializing the parent class and before creating the location text entry (assuming you want your buttons left of the location bar), create the two buttons:

        backButton = gtk.ToolButton(gtk.STOCK_GO_BACK)
        backButton.connect("clicked", self.back_cb)
        self.insert(backButton, -1)
        backButton.show()

        forwardButton = gtk.ToolButton(gtk.STOCK_GO_FORWARD)
        forwardButton.connect("clicked", self.forward_cb)
        self.insert(forwardButton, -1)
        forwardButton.show()

Now create those callbacks you just referenced:

   def back_cb(self, w):
        self.emit("go-back-requested")

    def forward_cb(self, w):
        self.emit("go-forward-requested")

That's right, you can't just call go_back on the web view, because GtkToolbar doesn't know anything about the window containing it. All it can do is pass signals up the chain.

But wait -- it can't even pass signals unless you define them. There's a __gsignals__ object defined at the beginning of the class that needs all its signals spelled out. In this case, what you need is

       "go-back-requested": (gobject.SIGNAL_RUN_FIRST,
                              gobject.TYPE_NONE, ()),
       "go-forward-requested": (gobject.SIGNAL_RUN_FIRST,
                              gobject.TYPE_NONE, ()),
Now these signals will bubble up to the window containing the toolbar.

Handle the signals in the containing window

So now you have to handle those signals in the window. In WebBrowserWindow (derived from gtk.Window), in __init__ after creating the toolbar:

        toolbar.connect("go-back-requested", self.go_back_requested_cb,
                        self.content_tabs)
        toolbar.connect("go-forward-requested", self.go_forward_requested_cb,
                        self.content_tabs)

And then of course you have to define those callbacks:

def go_back_requested_cb (self, widget, content_pane):
    # Oops! What goes here?
def go_forward_requested_cb (self, widget, content_pane):
    # Oops! What goes here?

But whoops! What do we put there? It turns out that WebBrowserWindow has no better idea than WebToolbar did of where its content is or how to tell it to go back or forward. What it does have is a ContentPane (derived from gtk.Notebook), which is basically just a container with no exposed methods that have anything to do with web browsing.

Get the BrowserView for the current tab

Fortunately we can fix that. In ContentPane, you can get the current page (meaning the current browser tab, in this case); and each page has a child, which turns out to be a BrowserView. So you can add this function to ContentPane to help other classes get the current BrowserView:

    def current_view(self):
        return self.get_nth_page(self.get_current_page()).get_child()

And now, using that, we can define those callbacks in WebBrowserWindow:

def go_back_requested_cb (self, widget, content_pane):
    content_pane.current_view().go_back()
def go_forward_requested_cb (self, widget, content_pane):
    content_pane.current_view().go_forward()

Whew! That's a lot of steps for something I thought was going to be just adding two buttons and two callbacks.

Tags: , , ,
[ 16:45 Aug 06, 2016    More programming | permalink to this entry | ]

Tue, 05 Apr 2016

Modifying a git repo so you can pull without a password

There's been a discussion in the GIMP community about setting up git repos to host contributed assets like scripts, plug-ins and brushes, to replace the long-stagnant GIMP Plug-in Repository. One of the suggestions involves having lots of tiny git repos rather than one that holds all the assets.

That got me to thinking about one annoyance I always have when setting up a new git repository on github: the repository is initially configured with an ssh URL, so I can push to it; but that means I can't pull from the repo without typing my ssh password (more accurately, the password to my ssh key).

Fortunately, there's a way to fix that: a git configuration can have one url for pulling source, and a different pushurl for pushing changes.

These are defined in the file .git/config inside each repository. So edit that file and take a look at the [remote "origin"] section.

For instance, in the GIMP source repositories, hosted on git.gnome.org, instead of the default of url = ssh://git.gnome.org/git/gimp I can set

pushurl = ssh://git.gnome.org/git/gimp
url = git://git.gnome.org/gimp
(disclaimer: I'm not sure this is still correct; my gnome git access stopped working -- I think it was during the Heartbleed security fire drill, or one of those -- and never got fixed.)

For GitHub the syntax is a little different. When I initially set up a repository, the url comes out something like url = git@github.com:username/reponame.git (sometimes the git@ part isn't included), and the password-free pull URL is something you can get from github's website. So you'll end up with something like this:

pushurl = git@github.com:username/reponame.git
url = https://github.com/username/reponame.git

Automating it

That's helpful, and I've made that change on all of my repos. But I just forked another repo on github, and as I went to edit .git/config I remembered what a pain this had been to do en masse on all my repos; and how it would be a much bigger pain to do it on a gazillion tiny GIMP asset repos if they end up going with that model and I ever want to help with the development. It's just the thing that should be scriptable.

However, the rules for what constitutes a valid git passwordless pull URL, and what constitutes a valid ssh writable URL, seem to encompass a lot of territory. So the quickie Python script I whipped up to modify .git/config doesn't claim to handle everything; it only handles the URLs I've encountered personally on Gnome and GitHub. Still, that should be useful if I ever have to add multiple repos at once. The script: repo-pullpush (yes, I know it's a terrible name) on GitHub.

Tags: , , ,
[ 12:28 Apr 05, 2016    More programming | permalink to this entry | ]

Fri, 19 Feb 2016

GIMP ditty: change font size and face on every text layer

A silly little GIMP ditty:
I had a Google map page showing locations of lots of metal recycling places in Albuquerque. The Google map shows stars for each location, but to find out the name and location of each address, you have to mouse over each star. I wanted a printable version to carry in the car with me.

I made a screenshot in GIMP, then added text for the stars over the places that looked most promising. But I was doing this quickly, and as I added text for more locations, I realized that it was getting crowded and I wished I'd used a smaller font. How do you change the font size for ALL font layers in an image, all at once?

Of course GIMP has no built-in method for this -- it's not something that comes up very often, and there's no reason it would have a filter like that. But the GIMP PDB (Procedural DataBase, part of the GIMP API) lets you change font size and face, so it's an easy script to write.

In the past I would have written something like this in script-fu, but now that Python is available on all GIMP platforms, there's no reason not to use it for everything.

Changing font face is just as easy as changing size, so I added that as well.

I won't bother to break it down line by line, since it's so simple. Here's the script: changefont.py: Mass change font face and size in all GIMP text layers.

Tags: , ,
[ 11:11 Feb 19, 2016    More gimp | permalink to this entry | ]

Thu, 15 Oct 2015

Viewer for email attachments in Office formats

Update, December 2022:
viewmailattachments has been integrated with another mutt helper, viewhtmlmail.py, which can show HTML messages complete with embedded images. It's described in the article View Mail Attachments from Mutt and the script is at viewmailattachments.py. It no longer uses the "please wait" screen described in this article, but the rest of the discussion still applies.

I seem to have fallen into a nest of Mac users whose idea of email is a text part, an HTML part, plus two or three or seven attachments (no exaggeration!) in an unholy combination of .DOC, .DOCX, .PPT and other Microsoft Office formats, plus .PDF.

Converting to text in mutt

As a mutt user who generally reads all email as plaintext, normally my reaction to a mess like that would be "Thanks, but no thanks". But this is an organization that does a lot of good work despite their file format habits, and I want to help.

In mutt, HTML mail attachments are easy. This pair of entries in ~/.mailcap takes care of them:

text/html; firefox 'file://%s'; nametemplate=%s.html
text/html; lynx -dump %s; nametemplate=%s.html; copiousoutput
Then in .muttrc, I have
auto_view text/html
alternative_order text/plain text

If a message has a text/plain part, mutt shows that. If it has text/html but no text/plain, it looks for the "copiousoutput" mailcap entry, runs the HTML part through lynx (or I could use links or w3m) and displays that automatically. If, reading the message in lynx, it looks to me like the message has complex formatting that really needs a browser, I can go to mutt's attachments screen and display the attachment in firefox using the other mailcap entry.

Word attachments are not quite so easy, especially when there are a lot of them. The straightforward way is to save each one to a file, then run LibreOffice on each file, but that's slow and tedious and leaves a lot of temporary files behind. For simple documents, converting to plaintext is usually good enough to get the gist of the attachments. These .mailcap entries can do that:

application/msword; catdoc %s; copiousoutput
application/vnd.openxmlformats-officedocument.wordprocessingml.document; docx2txt %s -; copiousoutput
Alternatives to catdoc include wvText and antiword.

But none of them work so well when you're cross-referencing five different attachments, or for documents where color and formatting make a difference, like mail from someone who doesn't know how to get their mailer to include quoted text, and instead distinguishes their comments from the text they're replying to by making their new comments green (ugh!) For those, you really do need a graphical window.

I decided what I really wanted (aside from people not sending me these crazy emails in the first place!) was to view all the attachments as tabs in a new window. And the obvious way to do that is to convert them to formats Firefox can read.

Converting to HTML

I'd used wvHtml to convert .doc files to HTML, and it does a decent job and is fairly fast, but it can't handle .docx. (People who send Office formats seem to distribute their files fairly evenly between DOC and DOCX. You'd think they'd use the same format for everything they wrote, but apparently not.) It turns out LibreOffice has a command-line conversion program, unoconv, that can handle any format LibreOffice can handle. It's a lot slower than wvHtml but it does a pretty good job, and it can handle .ppt (PowerPoint) files too.

For PDF files, I tried using pdftohtml, but it doesn't always do so well, and it's hard to get it to produce a single HTML file rather than a directory of separate page files. And about three quarters of PDF files sent through email turn out to be PDF in name only: they're actually collections of images of single pages, wrapped together as a PDF file. (Mostly, when I see a PDF like that I just skip it and try to get the information elsewhere. But I wanted my program at least to be able to show what's in the document, and let the user choose whether to skip it.) In the end, I decided to open a firefox tab and let Firefox's built-in PDF reader show the file, though popping up separate mupdf windows is also an option.

I wanted to show the HTML part of the email, too. Sometimes there's formatting there (like the aforementioned people whose idea of quoting messages is to type their replies in a different color), but there can also be embedded images. Extracting the images and showing them in a browser window is a bit tricky, but it's a problem I'd already solved a couple of years ago: Viewing HTML mail messages from Mutt (or other command-line mailers).

Showing it all in a new Firefox window

So that accounted for all the formats I needed to handle. The final trick was the firefox window. Since some of these conversions, especially unoconv, are quite slow, I wanted to pop up a window right away with a "converting, please wait..." message. Initially, I used a javascript: URL, running the command:

firefox -new-window "javascript:document.writeln('<br><h1>Translating documents, please wait ...</h1>');"

I didn't want to rely on Javascript, though. A data: URL, which I hadn't used before, can do the same thing without javascript:

firefox -new-window "data:text/html,<br><br><h1>Translating documents, please wait ...</h1>"

But I wanted the first attachment to replace the contents of that same window as soon as it was ready, and then subsequent attachments open a new tab in that window. But it turned out that firefox is inconsistent about what -new-window and -new-tab do; there's no guarantee that -new-tab will show up in the same window you recently popped up with -new-window, and running just firefox URL might open in either the new window or the old, in a new tab or not, or might not open at all. And things got even more complicated after I decided that I should use -private-window to open these attachments in private browsing mode.

In the end, the only way firefox would behave in a repeatable, predictable way was to use -private-window for everything. The first call pops up the private window, and each new call opens a new tab in the private window. If you want two separate windows for two different mail messages, you're out of luck: you can't have two different private windows. I decided I could live with that; if it eventually starts to bother me, I can always give up on Firefox and write a little python-webkit wrapper to do what I need.

Using a file redirect instead

But that still left me with no way to replace the contents of the "Please wait..." window with useful content. Someone on #firefox came up with a clever idea: write the content to a page with a meta redirect.

So initially, I create a file pleasewait.html that includes the header:

<meta http-equiv="refresh" content="2;URL=pleasewait.html">
(other HTML, charset information, etc. as needed). The meta refresh means Firefox will reload the file every two seconds. When the first converted file is ready, I just change the header to redirect to URL=first_converted_file.html. Meanwhile, I can be opening the other documents in additional tabs.

Finally, I added the command to my .muttrc. When I'm viewing a message either in the index or pager screens, F10 will call the script and decode all the attachments.

macro index <F10> "<pipe-message>~/bin/viewmailattachments\n" "View all attachments in browser"
macro pager <F10> "<pipe-message>~/bin/viewmailattachments\n" "View all attachments in browser"

Whew! It was trickier than I thought it would be. But I find I'm using it quite a bit, and it takes a lot of the pain out of those attachment-full emails.

The script is available at: viewmailattachments.py on GitHub.

Tags: , , , , ,
[ 15:18 Oct 15, 2015    More linux | permalink to this entry | ]

Sun, 27 Sep 2015

Make a series of contrasting colors with Python

[PyTopo with contrasting color track logs] Every now and then I need to create a series of contrasting colors. For instance, in my mapping app PyTopo, when displaying several track logs at once, I want them to be different colors so it's easy to tell which track is which.

Of course, I could make a list of five or ten different colors and cycle through the list. But I hate doing work that a computer could do for me.

Choosing random RGB (red, green and blue) values for the colors, though, doesn't work so well. Sometimes you end up getting two similar colors together. Other times, you get colors that just don't work well, because they're so light they look white, or so dark they look black, or so unsaturated they look like shades of grey.

What does work well is converting to the HSV color space: hue, saturation and value. Hue is a measure of the color -- that it's red, or blue, or yellow green, or orangeish, or a reddish purple. Saturation measures how intense the color is: is it a bright, vivid red or a washed-out red? Value tells you how light or dark it is: is it so pale it's almost white, so dark it's almost black, or somewhere in between? (A similar model, called HSL, substitutes Lightness for Value, but is similar enough in concept.)

[GIMP color chooser] If you're not familiar with HSV, you can get a good feel for it by playing with GIMP's color chooser (which pops up when you click the black Foreground or white Background color swatch in GIMP's toolbox). The vertical rainbow bar selects Hue. Once you have a hue, dragging up or down in the square changes Saturation; dragging right or left changes Value. You can also change one at a time by dragging the H, S or V sliders at the upper right of the dialog.

Why does this matter? Because once you've chosen a saturation and value, or at least ensured that saturation is fairly high and value is somewhere in the middle of its range, you can cycle through hues and be assured that you'll get colors that are fairly different each time. If you had a red last time, this time it'll be a green, or yellow, or blue, depending on how much you change the hue.

How does this work programmatically?

PyTopo uses Python-GTK, so I need a function that takes a gtk.gdk.Color and chooses a new, contrasting Color. Fortunately, gtk.gdk.Color already has hue, saturation and value built in. Color.hue is a floating-point number between 0 and 1, so I just have to choose how much to jump. Like this:

def contrasting_color(color):
    '''Returns a gtk.gdk.Color of similar saturation and value
       to the color passed in, but a contrasting hue.
       gtk.gdk.Color objects have a hue between 0 and 1.
    '''
    if not color:
        return self.first_track_color;

    # How much to jump in hue:
    jump = .37

    return gtk.gdk.color_from_hsv(color.hue + jump,
                                  color.saturation,
                                  color.value)

What if you're not using Python-GTK?

No problem. The first time I used this technique, I was generating Javascript code for a company's analytics web page. Python's colorsys module works fine for converting red, green, blue triples to HSV (or a variety of other colorspaces) which you can then use in whatever graphics package you prefer.

Tags: , , ,
[ 13:27 Sep 27, 2015    More programming | permalink to this entry | ]

Thu, 20 Aug 2015

Python module for reading EPUB e-book metadata

Three years ago I wanted a way to manage tags on e-books in a lightweight way, without having to maintain a Calibre database and fire up the Calibre GUI app every time I wanted to check a book's tags. I couldn't find anything, nor did I find any relevant Python libraries, so I reverse engineered the (simple, XML-bsaed) EPUB format and wrote a Python script to show or modify epub tags.

I've been using that script ever since. It's great for Project Gutenberg books, which tend to be overloaded with tags that I don't find very useful for categorizing books ("United States -- Social life and customs -- 20th century -- Fiction") but lacking in tags that I would find useful ("History", "Science Fiction", "Mystery").

But it wasn't easy to include it in other programs. For the last week or so I've been fiddling with a Kobo ebook reader, and I wanted to write programs that could read epub and also speak Kobo-ese. (I'll write separately about the joys of Kobo hacking. It's really a neat little e-reader.)

So I've factored my epubtag script into a usable Python module, so as well as being a standalone program for viewing epub book data, it's easy to use from other programs. It's available on GitHub: epubtag.py: parse EPUB metadata and view or change subject tags.

Tags: , , , ,
[ 20:27 Aug 20, 2015    More programming | permalink to this entry | ]

Tue, 23 Jun 2015

Cross-Platform Android Development Toolkits: Kivy vs. PhoneGap / Cordova

Although Ant builds have made Android development much easier, I've long been curious about the cross-platform phone development apps: you write a simple app in some common language, like HTML or Python, then run something that can turn it into apps on multiple mobile platforms, like Android, iOS, Blackberry, Windows phone, UbuntoOS, FirefoxOS or Tizen.

Last week I tried two of the many cross-platform mobile frameworks: Kivy and PhoneGap.

Kivy lets you develop in Python, which sounded like a big plus. I went to a Kivy talk at PyCon a year ago and it looked pretty interesting. PhoneGap takes web apps written in HTML, CSS and Javascript and packages them like native applications. PhoneGap seems much more popular, but I wanted to see how it and Kivy compared. Both projects are free, open source software.

If you want to skip the gory details, skip to the summary: how do Kivy and PhoneGap compare?

PhoneGap

I tried PhoneGap first. It's based on Node.js, so the first step was installing that. Debian has packages for nodejs, so apt-get install nodejs npm nodejs-legacy did the trick. You need nodejs-legacy to get the "node" command, which you'll need for installing PhoneGap.

Now comes a confusing part. You'll be using npm to install ... something. But depending on which tutorial you're following, it may tell you to install and use either phonegap or cordova.

Cordova is an Apache project which is intertwined with PhoneGap. After reading all their FAQs on the subject, I'm as confused as ever about where PhoneGap ends and Cordova begins, which one is newer, which one is more open-source, whether I should say I'm developing in PhoneGap or Cordova, or even whether I should be asking questions on the #phonegap or #cordova channels on Freenode. (The one question I had, which came up later in the process, I asked on #phonegap and got a helpful answer very quickly.) Neither one is packaged in Debian.

After some searching for a good, comprehensive tutorial, I ended up on a The Cordova tutorial rather than a PhoneGap one. So I typed:

sudo npm install -g cordova

Once it's installed, you can create a new app, add the android platform (assuming you already have android development tools installed) and build your new app:

cordova create hello com.example.hello HelloWorld
cordova platform add android
cordova build

Oops!

Error: Please install Android target: "android-22"
Apparently Cordova/Phonegap can only build with its own preferred version of android, which currently is 22. Editing files to specify android-19 didn't work for me; it just gave errors at a different point.

So I fired up the Android SDK manager, selected android-22 for install, accepted the license ... and waited ... and waited. In the end it took over two hours to download the android-22 SDK; the system image is 13Gb! So that's a bit of a strike against PhoneGap.

While I was waiting for android-22 to download, I took a look at Kivy.

Kivy

As a Python enthusiast, I wanted to like Kivy best. Plus, it's in the Debian repositories: I installed it with sudo apt-get install python-kivy python-kivy-examples

They have a nice quickstart tutorial for writing a Hello World app on their site. You write it, run it locally in python to bring up a window and see what the app will look like. But then the tutorial immediately jumps into more advanced programming without telling you how to build and deploy your Hello World. For Android, that information is in the Android Packaging Guide. They recommend an app called Buildozer (cute name), which you have to pull from git, build and install.

buildozer init
buildozer android debug deploy run
got started on building ... but then I noticed that it was attempting to download and build its own version of apache ant (sort of a Java version of make). I already have ant -- I've been using it for weeks for building my own Java android apps. Why did it want a different version?

The file buildozer.spec in your project's directory lets you uncomment and customize variables like:

# (int) Android SDK version to use
android.sdk = 21

# (str) Android NDK directory (if empty, it will be automatically downloaded.)
# android.ndk_path = 

# (str) Android SDK directory (if empty, it will be automatically downloaded.)
# android.sdk_path = 

Unlike a lot of Android build packages, buildozer will not inherit variables like ANDROID_SDK, ANDROID_NDK and ANDROID_HOME from your environment; you must edit buildozer.spec.

But that doesn't help with ant. Fortunately, when I inspected the Python code for buildozer itself, I discovered there was another variable that isn't mentioned in the default spec file. Just add this line:

android.ant_path = /usr/bin

Next, buildozer gave me a slew of compilation errors:

kivy/graphics/opengl.c: No such file or directory
 ... many many more lines of compilation interspersed with errors
kivy/graphics/vbo.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.

I had to ask on #kivy to solve that one. It turns out that the current version of cython, 0.22, doesn't work with kivy stable. My choices were to uninstall kivy and pull the development version from git, or to uninstall cython and install version 0.21.2 via pip. I opted for the latter option. Either way, there's no "make clean", so removing the dist and build directories let me start over with the new cython.

apt-get purge cython
sudo pip install Cython==0.21.2
rm -rf ./.buildozer/android/platform/python-for-android/dist
rm -rf ./.buildozer/android/platform/python-for-android/build

Buildozer was now happy, and proceeded to download and build Python-2.7.2, pygame and a large collection of other Python libraries for the ARM platform. Apparently each app packages the Python language and all libraries it needs into the Android .apk file.

Eventually I ran into trouble because I'd named my python file hello.py instead of main.py; apparently this is something you're not allowed to change and they don't mention it in the docs, but that was easily solved. Then I ran into trouble again:

Exception: Unable to find capture version in ./main.py (looking for `__version__ = ['"](.*)['"]`)
The buildozer.spec file offers two types of versioning: by default "method 1" is enabled, but I never figured out how to get past that error with "method 1" so I commented it out and uncommented "method 2". With that, I was finally able to build an Android package.

The .apk file it created was quite large because of all the embedded Python libraries: for the little 77-line pong demo, /usr/share/kivy-examples/tutorials/pong in the Debian kivy-examples package, the apk came out 7.3Mb. For comparison, my FeedViewer native java app, roughly 2000 lines of Java plus a few XML files, produces a 44k apk.

The next step was to make a real mini app. But when I looked through the Kivy examples, they all seemed highly specialized, and I couldn't find any documentation that addressed issues like what widgets were available or how to lay them out. How do I add a basic text widget? How do I put a button next to it? How do I get the app to launch in portrait rather than landscape mode? Is there any way to speed up the very slow initialization?

I'd spent a few hours on Kivy and made a Hello World app, but I was having trouble figuring out how to do anything more. I needed a change of scenery.

PhoneGap, redux

By this time, android-22 had finally finished downloading. I was ready to try PhoneGap again.

This time,

cordova platforms add android
cordova build
worked fine. It took a long time, because it downloaded the huge gradle build system rather than using something simpler like ant. I already have a copy of gradle somewhere (I downloaded it for the OsmAnd build), but it's not in my path, and I was too beaten down by this point to figure out where it was and how to get cordova to point to it.

Cordova eventually produced a 1.8Mb "hello world" apk -- a quarter the size of the Kivy package, though 20 times as big as a native Java app. Deployed on Android, it initialized much faster than the Kivy app, and came up in portrait mode but rotated correctly if I rotated the phone.

Editing the HTML, CSS and Javascript was fairly simple. You'll want to replace pretty much all of the default CSS if you don't want your app monopolized by the Cordova icon.

The only tricky part was file access: opening a file:// URL didn't work. I asked on #phonegap and someone helpfully told me I'd need the file plugin. That was easy to find in the documentation, and I added it like this:

cordova plugin search file
cordova plugin add org.apache.cordova.file

My final apk, for a small web app I use regularly on Android, was almost the same size as their hello world example: 1.8Mb. And it works great: phonegap had no problem playing an audio clip, something that was tricky when I was trying to do the same thing from a native Android java WebView class.

Summary: How do Kivy and PhoneGap compare?

This has been a long article, I know. So how do Kivy and PhoneGap compare, and which one will I be using?

They both need a large amount of disk space for the development environment. I wish I had good numbers to give you, but I was working with both systems at the same time, and their packages are scattered all over the disk so I haven't found a good way of measuring their size. I suspect PhoneGap is quite a bit bigger, because it uses gradle rather than ant and because it insists on android-22.

On the other hand, PhoneGap wins big on packaged application size: its .apk files are a quarter the size of Kivy's.

PhoneGap definitely wins on documentation. Kivy has seemingly lots of documentation, but its tutorials jumped around rather than following a logical sequence, and I had trouble finding answers to basic questions like "How do I display a text field with a button?" PhoneGap doesn't need that, because the UI is basic HTML and CSS -- limited though they are, at least most people know how to use them.

Finally, PhoneGap wins on startup speed. For my very simple test app, startup was more or less immediate, while the Kivy Hello World app required several seconds of startup time on my Galaxy S4.

Kivy is an interesting project. I like the ant-based build, the straightforward .spec file, and of course the Python language. But it still has some catching up to do in performance and documentation. For throwing together a simple app and packaging it for Android, I have to give the win to PhoneGap.

Tags: , , , , , ,
[ 12:09 Jun 23, 2015    More programming | permalink to this entry | ]

Thu, 08 Jan 2015

Accessing image metadata: storing tags inside the image file

A recent Slashdot discussion on image tagging and organization a while back got me thinking about putting image tags inside each image, in its metadata.

Currently, I use my MetaPho image tagger to update a file named Tags in the same directory as the images I'm tagging. Then I have a script called fotogr that searches for combinations of tags in these Tags files.

That works fine. But I have occasionally wondered if I should also be saving tags inside the images themselves, in case I ever want compatibility with other programs. I decided I should at least figure out how that would work, in case I want to add it to MetaPho.

I thought it would be simple -- add some sort of key in the images's EXIF tags. But no -- EXIF has no provision for tags or keywords. But JPEG (and some other formats) supports lots of tags besides EXIF. Was it one of the XMP tags?

Web searching only increased my confusion; it seems that there is no standard for this, but there have been lots of pseudo-standards over the years. It's not clear what tag most programs read, but my impression is that the most common is the "Keywords" IPTC tag.

Okay. So how would I read or change that from a Python program?

Lots of Python libraries can read EXIF tags, including Python's own PIL library -- I even wrote a few years ago about reading EXIF from PIL. But writing it is another story.

Nearly everybody points to pyexiv2, a fairly mature library that even has a well-written pyexiv2 tutorial. Great! The only problem with it is that the pyexiv2 front page has a big red Deprecation warning saying that it's being replaced by GExiv2. With a link that goes to a nonexistent page; and Debian doesn't seem to have a package for GExiv2, nor could I find a tutorial on it anywhere.

Sigh. I have to say that pyexiv2 sounds like a much better bet for now even if it is supposedly deprecated.

Following the tutorial, I was able to whip up a little proof of concept that can look for an IPTC Keywords tag in an existing image, print out its value, add new tags to it and write it back to the file.

import sys
import pyexiv2

if len(sys.argv) < 2:
    print "Usage:", sys.argv[0], "imagename.jpg [tag ...]"
    sys.exit(1)

metadata = pyexiv2.ImageMetadata(sys.argv[1])
metadata.read()

newkeywords = sys.argv[2:]

keyword_tag = 'Iptc.Application2.Keywords'
if keyword_tag in metadata.iptc_keys:
    tag = metadata[keyword_tag]
    oldkeywords = tag.value
    print "Existing keywords:", oldkeywords
    if not newkeywords:
        sys.exit(0)
    for newkey in newkeywords:
        oldkeywords.append(newkey)
    tag.value = oldkeywords
else:
    print "No IPTC keywords set yet"
    if not newkeywords:
        sys.exit(0)
    metadata[keyword_tag] = pyexiv2.IptcTag(keyword_tag, newkeywords)

tag = metadata[keyword_tag]
print "New keywords:", tag.value

metadata.write()

Does that mean I'm immediately adding it to MetaPho? No. To be honest, I'm not sure I care very much, since I don't have any other software that uses that IPTC field and no other MetaPho user has ever asked for it. But it's nice to know that if I ever have a reason to add it, I can.

Tags: , , ,
[ 10:28 Jan 08, 2015    More photo | permalink to this entry | ]

Thu, 11 Sep 2014

Making emailed LinkedIn discussion thread links actually work

I don't use web forums, the kind you have to read online, because they don't scale. If you're only interested in one subject, then they work fine: you can keep a browser tab for your one or two web forums perenially open and hit reload every few hours to see what's new. If you're interested in twelve subjects, each of which has several different web forums devoted to it -- how could you possibly keep up with that? So I don't bother with forums unless they offer an email gateway, so they'll notify me by email when new discussions get started, without my needing to check all those web pages several times per day.

LinkedIn discussions mostly work like a web forum. But for a while, they had a reasonably usable email gateway. You could set a preference to be notified of each new conversation. You still had to click on the web link to read the conversation so far, but if you posted something, you'd get the rest of the discussion emailed to you as each message was posted. Not quite as good as a regular mailing list, but it worked pretty well. I used it for several years to keep up with the very active Toastmasters group discussions.

About a year ago, something broke in their software, and they lost the ability to send email for new conversations. I filed a trouble ticket, and got a note saying they were aware of the problem and working on it. I followed up three months later (by filing another ticket -- there's no way to add to an existing one) and got a response saying be patient, they were still working on it. 11 months later, I'm still being patient, but it's pretty clear they have no intention of ever fixing the problem.

Just recently I fiddled with something in my LinkedIn prefs, and started getting "Popular Discussions" emails every day or so. The featured "popular discussion" is always something stupid that I have no interest in, but it's followed by a section headed "Other Popular Discussions" that at least gives me some idea what's been posted in the last few days. Seemed like it might be worth clicking on the links even though it means I'd always be a few days late responding to any conversations.

Except -- none of the links work. They all go to a generic page with a red header saying "Sorry it seems there was a problem with the link you followed."

I'm reading the plaintext version of the mail they send out. I tried viewing the HTML part of the mail in a browser, and sure enough, those links worked. So I tried comparing the text links with the HTML:

Text version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&amp;t=gde&amp;midToken=AQEqep2nxSZJIg&amp;ek=b2_anet_digest&amp;li=82&amp;m=group_discussions&amp;ts=textdisc-6&amp;itemID=5914453683503906819&amp;itemType=member&amp;anetID=98449
HTML version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&t=gde&midToken=AQEqep2nxSZJIg&ek=b2_anet_digest&li=17&m=group_discussions&ts=grouppost-disc-6&itemID=5914453683503906819&itemType=member&anetID=98449

Well, that's clear as mud, isn't it?

HTML entity substitution

I pasted both links one on top of each other, to make it easier to compare them one at a time. That made it fairly easy to find the first difference:

Text version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&amp;t=gde&amp;midToken= ...
HTML version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&t=gde&midToken= ...

Time to die laughing: they're doing HTML entity substitution on the plaintext part of their email notifications, changing & to &amp; everywhere in the link.

If you take the link from the text email and replace &amp; with &, the link works, and takes you to the specific discussion.

Pagination

Except you can't actually read the discussion. I went to a discussion that had been open for 2 days and had 35 responses, and LinkedIn only showed four of them. I don't even know which four they are -- are they the first four, the last four, or some Facebook-style "four responses we thought you'd like". There's a button to click on to show the most recent entries, but then I only see a few of the most recent responses, still not the whole thread.

Hooray for the web -- of course, plenty of other people have had this problem too, and a little web searching unveiled a solution. Add a pagination token to the end of the URL that tells LinkedIn to show 1000 messages at once.

&count=1000&paginationToken=
It won't actually show 1000 (or all) responses -- but if you start at the beginning of the page and scroll down reading responses one by one, it will auto-load new batches. Yes, infinite scrolling pages can be annoying, but at least it's a way to read a LinkedIn conversation in order.

Making it automatic

Okay, now I know how to edit one of their URLs to make it work. Do I want to do that by hand any time I want to view a discussion? Noooo!

Time for a script! Since I'll be selecting the URLs from mutt, they'll be in the X PRIMARY clipboard. And unfortunately, mutt adds newlines so I might as well strip those as well as fixing the LinkedIn problems. (Firefox will strip newlines for me when I paste in a multi-line URL, but why rely on that?)

Here's the important part of the script:

import subprocess, gtk

primary = gtk.clipboard_get(gtk.gdk.SELECTION_PRIMARY)
if not primary.wait_is_text_available() :
    sys.exit(0)
link = primary.wait_for_text()
link = link.replace("\n", "").replace("&amp;", "&") + \
       "&count=1000&paginationToken="
subprocess.call(["firefox", "-new-tab", link])

And here's the full script: linkedinify on GitHub. I also added it to pyclip, the script I call from Openbox to open a URL in Firefox when I middle-click on the desktop.

Now I can finally go back to participating in those discussions.

Tags: , , ,
[ 13:10 Sep 11, 2014    More tech/web | permalink to this entry | ]

Thu, 31 Jul 2014

Predicting planetary visibility with PyEphem

Part II: Predicting Conjunctions

After I'd written a basic script to calculate when planets will be visible, the next step was predicting conjunctions, times when two or more planets are close together in the sky.

Finding separation between two objects is easy in PyEphem: it's just one line once you've set up your objects, observer and date.

p1 = ephem.Mars()
p2 = ephem.Jupiter()
observer = ephem.Observer()  # and then set it to your city, etc.
observer.date = ephem.date('2014/8/1')
p1.compute(observer)
p2.compute(observer)

ephem.separation(p1, p2)

So all I have to do is loop over all the visible planets and see when the separation is less than some set minimum, like 4 degrees, right?

Well, not really. That tells me if there's a conjunction between a particular pair of planets, like Mars and Jupiter. But the really interesting events are when you have three or more objects close together in the sky. And events like that often span several days. If there's a conjunction of Mars, Venus, and the moon, I don't want to print something awful like

Friday:
  Conjunction between Mars and Venus, separation 2.7 degrees.
  Conjunction between the moon and Mars, separation 3.8 degrees.
Saturday:
  Conjunction between Mars and Venus, separation 2.2 degrees.
  Conjunction between Venus and the moon, separation 3.9 degrees.
  Conjunction between the moon and Mars, separation 3.2 degrees.
Sunday:
  Conjunction between Venus and the moon, separation 4.0 degrees.
  Conjunction between the moon and Mars, separation 2.5 degrees.

... and so on, for each day. I'd prefer something like:

Conjunction between Mars, Venus and the moon lasts from Friday through Sunday.
  Mars and Venus are closest on Saturday (2.2 degrees).
  The moon and Mars are closest on Sunday (2.5 degrees).

At first I tried just keeping a list of planets involved in the conjunction. So if I see Mars and Jupiter close together, I'd make a list [mars, jupiter], and then if I see Venus and Mars on the same date, I search through all the current conjunction lists and see if either Venus or Mars is already in a list, and if so, add the other one. But that got out of hand quickly. What if my conjunction list looks like [ [mars, venus], [jupiter, saturn] ] and then I see there's also a conjunction between Mars and Jupiter? Oops -- how do you merge those two lists together?

The solution to taking all these pairs and turning them into a list of groups that are all connected actually lies in graph theory: each conjunction pair, like [mars, venus], is an edge, and the trick is to find all the connected edges. But turning my list of conjunction pairs into a graph so I could use a pre-made graph theory algorithm looked like it was going to be more code -- and a lot harder to read and less maintainable -- than making a bunch of custom Python classes.

I eventually ended up with three classes: ConjunctionPair, for a single conjunction observed between two bodies on a single date; Conjunction, a collection of ConjunctionPairs covering as many bodies and dates as needed; and ConjunctionList, the list of all Conjunctions currently active. That let me write methods to handle merging multiple conjunction events together if they turned out to be connected, as well as a method to summarize the event in a nice, readable way.

So predicting conjunctions ended up being a lot more code than I expected -- but only because of the problem of presenting it neatly to the user. As always, user interface represents the hardest part of coding.

The working script is on github at conjunctions.py.

Tags: , , ,
[ 19:57 Jul 31, 2014    More science/astro | permalink to this entry | ]

Wed, 23 Jul 2014

Predicting planetary visibility with PyEphem

Part 1: Basic Planetary Visibility

All through the years I was writing the planet observing column for the San Jose Astronomical Association, I was annoyed at the lack of places to go to find out about upcoming events like conjunctions, when two or more planets are close together in the sky. It's easy to find out about conjunctions in the next month, but not so easy to find sites that will tell you several months in advance, like you need if you're writing for a print publication (even a club newsletter).

For some reason I never thought about trying to calculate it myself. I just assumed it would be hard, and wanted a source that could spoon-feed me the predictions.

The best source I know of is the RASC Observer's Handbook, which I faithfully bought every year and checked each month so I could enter that month's events by hand. Except for January and February, when I didn't have the next year's handbook yet by the time my column went to press and I was on my own. I have to confess, I was happy to get away from that aspect of the column when I moved.

In my new town, I've been helping the local nature center with their website. They had some great pages already, like a What's Blooming Now? page that keeps track of which flowers are blooming now and only shows the current ones. I've been helping them extend it by adding features like showing only flowers of a particular color, separating the data into CSV databases so it's easier to add new flowers or butterflies, and so forth. Eventually we hope to build similar databases of birds, reptiles and amphibians.

And recently someone suggested that their astronomy page could use some help. Indeed it could -- it hadn't been updated in about five years. So we got to work looking for a source of upcoming astronomy events we could use as a data source for the page, and we found sources for a few things, like moon phases and eclipses, but not much.

Someone asked about planetary conjunctions, and remembering how I'd always struggled to find that data, especially in months when I didn't have the RASC handbook yet, I got to wondering about calculating it myself. Obviously it's possible to calculate when a planet will be visible, or whether two planets are close to each other in the sky. And I've done some programming with PyEphem before, and found it fairly easy to use. How hard could it be?

Note: this article covers only the basic problem of predicting when a planet will be visible in the evening. A followup article will discuss the harder problem of conjunctions.

Calculating planet visibility with PyEphem

The first step was figuring out when planets were up. That was straightforward. Make a list of the easily visible planets (remember, this is for a nature center, so people using the page aren't expected to have telescopes):

import ephem

planets = [
    ephem.Moon(),
    ephem.Mercury(),
    ephem.Venus(),
    ephem.Mars(),
    ephem.Jupiter(),
    ephem.Saturn()
    ]

Then we need an observer with the right latitude, longitude and elevation. Elevation is apparently in meters, though they never bother to mention that in the PyEphem documentation:

observer = ephem.Observer()
observer.name = "Los Alamos"
observer.lon = '-106.2978'
observer.lat = '35.8911'
observer.elevation = 2286  # meters, though the docs don't actually say

Then we loop over the date range for which we want predictions. For a given date d, we're going to need to know the time of sunset, because we want to know which planets will still be up after nightfall.

observer.date = d
sunset = observer.previous_setting(sun)

Then we need to loop over planets and figure out which ones are visible. It seems like a reasonable first approach to declare that any planet that's visible after sunset and before midnight is worth mentioning.

Now, PyEphem can tell you directly the rising and setting times of a planet on a given day. But I found it simplified the code if I just checked the planet's altitude at sunset and again at midnight. If either one of them is "high enough", then the planet is visible that night. (Fortunately, here in the mid latitudes we don't have to worry that a planet will rise after sunset and then set again before midnight. If we were closer to the arctic or antarctic circles, that would be a concern in some seasons.)

min_alt = 10. * math.pi / 180.
for planet in planets:
    observer.date = sunset
    planet.compute(observer)
    if planet.alt > min_alt:
        print planet.name, "is already up at sunset"

Easy enough for sunset. But how do we set the date to midnight on that same night? That turns out to be a bit tricky with PyEphem's date class. Here's what I came up with:

    midnight = list(observer.date.tuple())
    midnight[3:6] = [7, 0, 0]
    observer.date = ephem.date(tuple(midnight))
    planet.compute(observer)
    if planet.alt > min_alt:
        print planet.name, "will rise before midnight"

What's that 7 there? That's Greenwich Mean Time when it's midnight in our time zone. It's hardwired because this is for a web site meant for locals. Obviously, for a more general program, you should get the time zone from the computer and add accordingly, and you should also be smarter about daylight savings time and such. The PyEphem documentation, fortunately, gives you tips on how to deal with time zones. (In practice, though, the rise and set times of planets on a given day doesn't change much with time zone.)

And now you have your predictions of which planets will be visible on a given date. The rest is just a matter of writing it out into your chosen database format.

In the next article, I'll cover planetary and lunar conjunctions -- which were superficially very simple, but turned out to have some tricks that made the programming harder than I expected.

Tags: , , ,
[ 21:32 Jul 23, 2014    More science/astro | permalink to this entry | ]

Sun, 11 May 2014

Sonograms in Python

I went to a terrific workshop last week on identifying bird songs. We listened to recordings of songs from some of the trickier local species, and discussed the differences and how to remember them. I'm not a serious birder -- I don't do lists or Big Days or anything like that, and I dislike getting up at 6am just because the birds do -- but I do try to identify birds (as well as mammals, reptiles, rocks, geographic features, and pretty much anything else I see while hiking or just sitting in the yard) and I've always had trouble remembering their songs.

[Sonogram of ruby-crowned kinglet] One of the tools birders use to study bird songs is the sonogram. It's a plot of frequency (on the vertical axis) and intensity (represented by color, red being louder) versus time. Looking at a sonogram you can identify not just how fast a bird trills and whether it calls in groups of three or five, but whether it's buzzy/rattly (a vertical line, lots of frequencies at once) or a purer whistle, and whether each note is ascending or descending.

The class last week included sonograms for the species we studied. But what about other species? The class didn't cover even all the local species I'd like to be able to recognize. I have several collections of bird calls on CD (which I bought to use in combination with my "tweet" script -- yes, the name messes up google searches, but my tweet predates Twitter -- a tweet Python script and tweet in HTML for Android). It would be great to be able to make sonograms from some of those recordings too.

But a search for Linux sonogram turned up nothing useful. Audacity has a histogram visualization mode with lots of options, but none of them seem to result in a usable sonogram, and most discussions I found on the net agreed that it couldn't do it. There's another sound editor program called snd which can do sonograms, but it's fiddly to use and none of the many color schemes produce a sonogram that I found very readable.

Okay, what about python scripts? Surely that's been done?

I had better luck there. Matplotlib's pylab package has a specgram() call that does more or less what I wanted, and here's an example of how to use pylab.specgram(). (That post also has another example using a library called timeside, but timeside's PyPI package doesn't have any dependency information, and after playing the old RPM-chase game installing another dependency, trying it, then installing the next dependency, I gave up.)

The only problem with pylab.specgram() was that it shows the full range of the sound, both in time and frequency. The recordings I was examining can last a minute or more and go up to 20,000 Hz -- and when pylab tries to fit that all on the screen, you end up with a plot where the details are too small to show you anything useful.

You'd think there would be a way for pylab.specgram() to show only part of the spectrum, but that doesn't seem to be. I finally found a Stack Overflow discussion where "edited" gives an excellent rewritten version of pylab.specgram which allows setting minimum and maximum frequency cutoffs. Worked great!

Then I did some fiddling to allow for analyzing only part of the recording -- Python's wave package has no way to read in just the first six seconds of a .wav file, so I had to read in the whole file, read the data into a numpy array, then take a slice representing the seconds of the recording I actually wanted.

But now I can plot nice sonograms of any bird song I want to see, print them out or stick them on my Android device so I can carry them with me.

Update: Oops! I forgot to include a link to the script. Here it is: Sonograms in Python.


Tags: , , , ,
[ 09:17 May 11, 2014    More programming | permalink to this entry | ]

Thu, 17 Apr 2014

Back from PyCon

I'm back from Montreal, settling back in.

The PiDoorbell tutorial went well, in the end. Of course just about everything that could go wrong, did. The hard-wired ethernet connection we'd been promised didn't materialize, and there was no way to get the Raspberry Pis onto the conference wi-fi because it used browser authentication (it still baffles me why anyone still uses that! Browser authentication made sense in 2007 when lots of people only had 801.11g and couldn't do WPA; it makes absolutely zero sense now).

Anyway, lacking a sensible way to get everyone's Pis on the net, Deepa stepped as network engineer for the tutorial and hooked up the router she had brought to her laptop's wi-fi connection so the Pis could route through that.

Then we found we had too few SD cards. We didn't realize why until afterward: when we compared the attendee count to the sign-up list we'd gotten, we had quite a few more attendees than we'd planned for. We had a few extra SD cards, but not enough, so I and a couple of the other instructors/TAs had to loan out SD cards we'd brought for our own Pis. ("Now edit /etc/network/interfaces ... okay, pretend you didn't see that, that's the password for my home router, now delete that and change it to ...")

Then some of the SD cards turned out not to have been updated with the latest packages, Mac users couldn't find the drivers to run the serial cable, Windows users (or was it Macs?) had trouble setting static ethernet addresses so they could ssh to the Pi, all the problems we'd expected and a few we hadn't.

But despite all the problems, the TAs: Deepa (who was more like a co-presenter than a TA), Serpil, Lyz and Stuart, plus Rupa and I, were able to get everyone working. All the attendees got their LEDs blinking, their sonar rangefinders rangefinding, and the PiDoorbell script running. Many people brought cameras and got their Pis snapping pictures when the sensor registered someone in front of it. Time restrictions and network problems meant that most people didn't get the Dropbox and Twilio registration finished to get notifications sent to their phones, but that's okay -- we knew that was a long shot, and everybody got far enough that they can add the network notifications later if they want.

And the most important thing is that everybody looked like they were having a good time. We haven't seen the reviews (I'm not sure if PyCon shares reviews with the tutorial instructors; I hope so, but a lot of conferences don't) but I hope everybody had fun and felt like they got something out of it.

The rest of PyCon was excellent, too. I went to some great talks, got lots of ideas for new projects and packages I want to try, had fun meeting new people, and got to see a little of Montreal. And ate a lot of good food.

Now I'm back in the land of enchantment, with its crazy weather -- we've gone from snow to sun to cold breezes to HOT to threatening thunderstorm in the couple of days I've been back. Never a dull moment! I confess I'm missing those chocolate croissants for breakfast just a little bit. We still don't have internet: it's nearly 9 weeks since Comcast's first visit, and their latest prediction (which changes every time I talk to them) is a week from today.

But it's warm and sunny this morning, there's a white-crowned sparrow singing outside the window, and I've just seen our first hummingbird (a male -- I think it's a broad-tailed, but it'll take a while to be confident of IDs on all these new-to-me birds). PyCon was fun -- but it's nice to be home.

Tags: , ,
[ 10:20 Apr 17, 2014    More conferences | permalink to this entry | ]

Sun, 06 Apr 2014

Snow-Hail while preparing for Montreal

Things have been hectic in the last few days before I leave for Montreal with last-minute preparation for our PyCon tutorial, Build your own PiDoorbell - Learn Home Automation with Python next Wednesday.

[Snow-hail coming down on the Piñons] But New Mexico came through on my next-to-last full day with some pretty interesting weather. A windstorm in the afternoon gave way to thunder (but almost no lightning -- I saw maybe one indistinct flash) which gave way to a strange fluffy hail that got gradually bigger until it eventually grew to pea-sized snowballs, big enough and snow enough to capture well in photographs as they came down on the junipers and in the garden.

Then after about twenty minutes the storm stopped the sun came out. And now I'm back to tweaking tutorial slides and thinking about packing while watching the sunset light on the Rio Grande gorge.

But tomorrow I leave it behind and fly to Montreal. See you at PyCon!

Tags: , , , , , ,
[ 18:55 Apr 06, 2014    More misc | permalink to this entry | ]

Wed, 29 Jan 2014

PyCon Tutorial: Build your own PiDoorbell - Learn Home Automation with Python

[Raspberry Pi from wikipedia] The first batch of hardware has been ordered for Rupa's and my tutorial at PyCon in Montreal this April!

We're presenting Build your own PiDoorbell - Learn Home Automation with Python on the afternoon of Wednesday, April 9.

It'll be a hands-on workshop, where we'll experiment with the Raspberry Pi's GPIO pins and learn how to control simple things like an LED. Then we'll hook up sonar rangefinders to the RPis, and build a little device that can be used to monitor visitors at your front door, birds at your feeder, co-workers standing in front of your monitor while you're away, or just about anything else you can think of.

Participants will bring their own Raspberry Pi computers and power supplies -- attendees of last year's PyCon got them there, but a new Model A can be gotten for $30, and a model B for $40.

We'll provide everything else. We worried that requiring participants to bring a long list of esoteric hardware was just asking for trouble, so we worked a deal with PyCon and they're sponsoring hardware for attendees. Thank you, PyCon! CodeChix is fronting the money for the kits and helping with our travel expenses, thanks to donations from some generous sponsors. We'll be passing out hardware kits and SD cards at the beginning of the workshop, which attendees can take home afterward.

We're also looking for volunteer T/As. The key to a good hardware workshop is having lots of helpers who can make sure everybody's keeping up and nobody's getting lost. We have a few top-notch T/As signed up already, but we can always use more. We can't provide hardware for T/As, but most of it's quite inexpensive if you want to buy your own kit to practice on. And we'll teach you everything you need to know about how get your PiDoorbell up and running -- no need to be an expert at hardware or even at Python, as long as you're interested in learning and in helping other people learn.

This should be a really fun workshop! PyCon tutorial sign-ups just opened recently, so sign up for the tutorial (we do need advance registration so we know how many hardware kits to buy). And if you're going to be at PyCon and are interested in being a T/A, drop me or Rupa a line and we'll get you on the list and get you all the information you need.

See you at PyCon!

Tags: , , , , , ,
[ 20:32 Jan 29, 2014    More hardware | permalink to this entry | ]

Wed, 11 Dec 2013

Counting syllables in Python

When I wrote recently about my Dactylic dinosaur doggerel, I glossed over a minor problem with my final poem: the rules of double-dactylic doggerel say that the sixth line (or sometimes the seventh) should be a single double-dactyl word -- something like "paleontologist" or "hexasyllabic'ly". I used "dinosaur orchestra" -- two words, which is cheating.

I don't feel too guilty about that. If you read the post, you may recall that the verse was the result of drifting grumpily through an insomniac morning where I would have preferred to be getting back to sleep. Coming up with anything that scans at all is probably good enough.

Still, it bugged me, not being able to think of a double-dactylic word that related somehow to Parasaurolophus. So I vowed that, later that day when I was up and at the computer, I would attempt to find one and rewrite the poem accordingly.

I thought that would be fairly straightforward. Not so much. I thought there would be some utility I could run that would count syllables for me, then I could run /usr/share/dict/words through it, print out all the 6-syllable words, and find one that fit. Turns out there is no such utility.

But Python has a library for everything, doesn't it?

Some searching turned up PyHyphen, which includes some syllable-counting functions. It apparently uses the hyphenation dictionaries that come with LibreOffice.

There's a Debian package for it, python-pyhyphen -- but it doesn't work. First, it depends on another package, hyphen-en-us, but doesn't have that dependency encoded in the package, even as a suggested or recommended package. But even when you install the hyphenated dictionary, it still doesn't work because it doesn't point to the dictionary in the place it was installed. Looks like that problem was reported almost two years ago, bug 627944: python-pyhyphen: doesn't work out-of-the-box with hyphen-* packages. There's a fix there that involves editing two files, /usr/lib/python2.7/dist-packages/hyphen/config.py and /usr/lib/python2.7/dist-packages/hyphen/__init__.py.

Or you can just give up on Debian and pip install pyhyphen, which is a lot easier.

But once you get it working, you find that it's terrible. It was wrong about almost every word I tried. I hope not too many people are relying on this hyphen-en-us dictionary for important documents. Its results seemed nearly random, and I quickly gave up on it for getting a useful list of words around six syllables.

Just for fun, since my count syllables web search turned up quite a few websites claiming that functionality, I tried entering some of my long test words manually. All of the websites I tried were wrong more than half the time, and often they were off by more than two syllables. I don't mind off-by-ones -- I can look at words claiming 5 and 7 syllables while searching for double dactyls -- but if I have to include 4-syllable words as well, I'll never find what I'm looking for.

That discouraged me from using another Python suggestion I'd seen, the nltk (natural language toolkit) package. I've been looking for an excuse to play with nltk, and some day I will, but for this project I was looking for a quick approximate solution, and the nltk examples I found mostly looked like using it would require a bigger time commitment than I was willing to devote to silly poetry. And if none of the dedicated syllable-counting websites or dictionaries got it right, would a big time investment in nltk pay off?

Anyway, by this time I'd wasted more than an hour poking around various libraries and websites for this silly unimportant problem, and I decided that with that kind of time investment, I could probably do better on my own than the official solutions were giving me. Why not basically just count vowels?

So I whipped up a little script, countsyl, that did just that. I gave it a list of vowels, with a few simple rules. Obviously, you can't just say every vowel is a new syllable -- there are too many double vowels and silent letters and such. But you can't say that any run of multiple vowels together counts as one syllable, because sometimes the vowels do count; and you can't make absolute rules like "'e' at the end of a word is always silent", because sometimes it isn't. So I kept both minimum and maximum syllable counts for each word, and printed both.

And much to my surprise, without much tuning at all my silly little script immediately much better results than the hyphenation dictionary or the dedicated websites.

Alas, although it did give me quite a few hexasyllabic words in /usr/share/dict/words, none of them were useful at all for a program on Parasaurolophus. What I really needed was a musical term (since that's what the poem is about). What about a musical dictionary?

I found a list of musical terms on Wikipedia: Glossary of musical terminology, saved it as a local file, ran a few vim substitutes and turned it into a plain list of words. That did a little better, and gave me some possible ideas: (non?)contrapuntally? (something)harmonically? extemporaneously?

But none of them worked out, and by then I'd run out of steam. I gave up and blogged the poem as originally written, with the cheating two-word phrase "dinosaur orchestra", and vowed to write up how to count words in Python -- which I have now done. Quite noncontrapuntally, and definitely not extemporaneously. But at least I have a useful little script next time I want to get an approximate syllable count.

Tags: , , , , ,
[ 17:51 Dec 11, 2013    More programming | permalink to this entry | ]

Wed, 13 Nov 2013

Does scrolling output make a program slower? Followup.

Last week I wrote about some tests I'd made to answer the question Does scrolling output make a program slower? My test showed that when running a program that generates lots of output, like an rsync -av, the rsync process will slow way down as it waits for all that output to scroll across whatever terminal client you're using. Hiding the terminal helps a lot if it's an xterm or a Linux console, but doesn't help much with gnome-terminal.

A couple of people asked in the comments about the actual source of the slowdown. Is the original process -- the rsync, or my test script, that's actually producing all that output -- actually blocking waiting for the terminal? Or is it just that the CPU is so busy doing all that font rendering that it has no time to devote to the original program, and that's why it's so much slower?

I found pingu on IRC (thanks to JanC) and the group had a very interesting discussion, during which I ran a series of additional tests.

In the end, I'm convinced that CPU allocation to the original process is not the issue, and that output is indeed blocked waiting for the terminal to display the output. Here's why.

First, I installed a couple of performance meters and looked at the CPU load while rendering. With conky, CPU use went up equally (about 35-40%) on both CPU cores while the test was running. But that didn't tell me anything about which processes were getting all that CPU.

htop was more useful. It showed X first among CPU users, xterm second, and my test script third. However, the test script never got more than 10% of the total CPU during the test; X and xterm took up nearly all the remaining CPU.

Even with the xterm hidden, X and xterm were the top two CPU users. But this time the script, at number 3, got around 30% of the CPU rather than 10%. That still doesn't seem like it could account for the huge difference in speed (the test ran about 7 times faster with xterm hidden); but it's interesting to know that even a hidden xterm will take up that much CPU.

It was also suggested that I try running it to /dev/null, something I definitely should have thought to try before. The test took .55 seconds with its output redirected to /dev/null, and .57 seconds redirected to a file on disk (of course, the kernel would have been buffering, so there was no disk wait involved). For comparison, the test had taken 56 seconds with xterm visible and scrolling, and 8 seconds with xterm hidden.

I also spent a lot of time experimenting with sleeping for various amounts of time between printed lines. With time.sleep(.0001) and xterm visible, the test took 104.71 seconds. With xterm shaded and the same sleep, it took 98.36 seconds, only 6 seconds faster. Redirected to /dev/null but with a .0001 sleep, it took 97.44 sec.

I think this argues for the blocking theory rather than the CPU-bound one: the argument being that the sleep gives the program a chance to wait for the output rather than blocking the whole time. If you figure it's CPU bound, I'm not sure how you'd explain the result.

But a .0001 second sleep probably isn't very accurate anyway -- we were all skeptical that Linux can manage sleep times that small. So I made another set of tests, with a .001 second sleep every 10 lines of output. The results: 65.05 with xterm visible; 63.36 with xterm hidden; 57.12 to /dev/null. That's with a total of 50 seconds of sleeping included (my test prints 500000 lines). So with all that CPU still going toward font rendering, the visible-xterm case still only took 7 seconds longer than the /dev/null case. I think this argues even more strongly that the original test, without the sleep, is blocking, not CPU bound.

But then I realized what the ultimate test should be. What happens when I run the test over an ssh connection, with xterm and X running on my local machine but the actual script running on the remote machine?

The remote machine I used for the ssh tests was a little slower than the machine I used to run the other tests, but that probably doesn't make much difference to the results.

The results? 60.29 sec printing over ssh (LAN) to a visible xterm; 7.24 sec doing the same thing with xterm hidden. Fairly similar to what I'd seen before when the test, xterm and X were all running on the same machine.

Interestingly, the ssh process during the test took 7% of my CPU, almost as much as the python script was getting before, just to transfer all the output lines so xterm could display them.

So I'm convinced now that the performance bottleneck has nothing to do with the process being CPU bound and having all its CPU sucked away by rendering the output, and that the bottleneck is in the process being blocked in writing its output while waiting for the terminal to catch up.

I'd be interested it hear further comments -- are there other interpretations of the results besides mine? I'm also happy to run further tests.

Tags: , , ,
[ 17:19 Nov 13, 2013    More linux | permalink to this entry | ]

Fri, 08 Nov 2013

Does scrolling output make a program slower?

While watching my rsync -av messages scroll by during a big backup, I wondered, as I often have, whether that -v (verbose) flag was slowing my backup down.

In other words: when you run a program that prints lots of output, so there's so much output the terminal can't display it all in real-time -- like an rsync -v on lots of small files -- does the program wait ("block") while the terminal catches up?

And if the program does block, can you speed up your backup by hiding the terminal, either by switching to another desktop, or by iconifying or shading the terminal window so it's not visible? Is there any difference among the different ways of hiding the terminal, like switching desktops, iconifying and shading?

Since I've never seen a discussion of that, I decided to test it myself. I wrote a very simple Python program:

import time

start = time.time()

for i in xrange(500000):
    print "Now we have printed", i, "relatively long lines to stdout."

print time.time() - start, "seconds to print", i, "lines."

I ran it under various combinations of visible and invisible terminal. The results were striking. These are rounded to the nearest tenth of a second, in most cases the average of several runs:

Terminal type Seconds
xterm, visible 56.0
xterm, other desktop 8.0
xterm, shaded 8.5
xterm, iconified 8.0
Linux framebuffer, visible 179.1
Linux framebuffer, hidden 3.7
gnome-terminal, visible 56.9
gnome-terminal, other desktop 56.7
gnome-terminal, iconified 56.7
gnome-terminal, shaded 43.8

Discussion:

First, the answer to the original question is clear. If I'm displaying output in an xterm, then hiding it in any way will make a huge difference in how long the program takes to complete.

On the other hand, if you use gnome-terminal instead of xterm, hiding your terminal window won't make much difference. Gnome-terminal is nearly as fast as xterm when it's displaying; but it apparently lacks xterm's smarts about not doing that work when it's hidden. If you use gnome-terminal, you don't get much benefit out of hiding it.

I was surprised how slow the Linux console was (I'm using the framebuffer in the Debian 3.2.0-4-686-pae on Intel graphics). But it's easy to see where that time is going when you watch the output: in xterm, you see lots of blank space as xterm skips drawing lines trying to keep up with the program's output. The framebuffer doesn't do that: it prints and scrolls every line, no matter how far behind it gets.

But equally interesting is how much faster the framebuffer is when it's not visible. (I typed Ctrl-alt-F2, logged in, ran the program, then typed Ctrl-alt-F7 to go back to X while the program ran.) Obviously xterm is doing some background processing that the framebuffer console doesn't need to do. The absolute time difference, less than four seconds, is too small to worry about, but it's interesting anyway.

I would have liked to try it my test a base Linux console, with no framebuffer, but figuring out how to get a distro kernel out of framebuffer mode was a bigger project than I wanted to tackle that afternoon.

I should mention that I wasn't super-scientific about these tests. I avoided doing any heavy work on the machine while the tests were running, but I was still doing light editing (like this article), reading mail and running xchat. The times for multiple runs were quite consistent, so I don't think my light system activity affected the results much.

So there you have it. If you're running an output-intensive program like rsync -av and you care how fast it runs, use either xterm or the console, and leave it hidden most of the time.

Tags: , , ,
[ 15:17 Nov 08, 2013    More linux | permalink to this entry | ]

Mon, 07 Oct 2013

Viewing HTML mail messages from Mutt (or other command-line mailers)

Update: the script described in this article has been folded into another script called viewmailattachments.py.

Command-line mailers like mutt have one disadvantage: viewing HTML mail with embedded images. Without images, HTML mail is no problem -- run it through lynx, links or w3m. But if you want to see images in place, how do you do it?

Mutt can send a message to a browser like firefox ... but only the textual part of the message. The images don't show up.

That's because mail messages include images, not as separate files, but as attachments within the same file, encoded it a format known as MIME (Multipurpose Internet Mail Extensions). An image link in the HTML, instead of looking like <img src="picture.jpg">., will instead look something like <img src="cid:0635428E-AE25-4FA0-93AC-6B8379300161">. (Apple's Mail.app) or <img src="cid:1.3631871432@web82503.mail.mud.yahoo.com">. (Yahoo's webmail).

CID stands for Content ID, and refers to the ID of the image as it is encoded in MIME inside the image. GUI mail programs, of course, know how to decode this and show the image. Mutt doesn't.

A web search finds a handful of shell scripts that use the munpack program (part of the mpack package on Debian systems) to split off the files; then they use various combinations of sed and awk to try to view those files. Except that none of the scripts I found actually work for messages sent from modern mailers -- they don't decode the CID links properly.

I wasted several hours fiddling with various shell scripts, trying to adjust sed and awk commands to figure out the problem, when I had the usual epiphany that always eventually arises from shell script fiddling: "Wouldn't this be a lot easier in Python?"

Python's email package

Python has a package called email that knows how to list and unpack MIME attachments. Starting from the example near the bottom of that page, it was easy to split off the various attachments and save them in a temp directory. The key is

import email

fp = open(msgfile)
msg = email.message_from_file(fp)
fp.close()

for part in msg.walk():

That left the problem of how to match CIDs with filenames, and rewrite the links in the HTML message accordingly.

The documentation on the email package is a bit unclear, unfortunately. For instance, they don't give any hints what object you'll get when iterating over a message with walk, and if you try it, they're just type 'instance'. So what operations can you expect are legal on them? If you run help(part) in the Python console on one of the parts you get from walk, it's generally class Message, so you can use the Message API, with functions like get_content_type(), get_filename(). and get_payload().

More useful, it has dictionary keys() for the attributes it knows about each attachment. part.keys() gets you a list like

['Content-Type', 
 'Content-Transfer-Encoding',
 'Content-ID',
 'Content-Disposition' ]

So by making a list relating part.get_filename() (with a made-up filename if it doesn't have one already) to part['Content-ID'], I'd have enough information to rewrite those links.

Case-insensitive dictionary matching

But wait! Not so simple. That list is from a Yahoo mail message, but if you try keys() on a part sent by Apple mail, instead if will be 'Content-Id'. Note the lower-case d, Id, instead of the ID that Yahoo used.

Unfortunately, Python doesn't have a way of looking up items in a dictionary with the key being case-sensitive. So I used a loop:

    for k in part.keys():
        if k.lower() == 'content-id':
            print "Content ID is", part[k]

Most mailers seem to put angle brackets around the content id, so that would print things like "Content ID is <14.3631871432@web82503.mail.mud.yahoo.com>". Those angle brackets have to be removed, since the CID links in the HTML file don't have them.

for k in part.keys():
    if k.lower() == 'content-id':
        if part[k].startswith('<') and part[k].endswith('>'):
            part[k] = part[k][1:-1]

But that didn't work -- the angle brackets were still there, even though if I printed part[k][1:-1] it printed without angle brackets. What was up?

Unmutable parts inside email.Message

It turned out that the parts inside an email Message (and maybe the Message itself) are unmutable -- you can't change them. Python doesn't throw an exception; it just doesn't change anything. So I had to make a local copy:

for k in part.keys():
    if k.lower() == 'content-id':
        content_id = part[k]
        if content_id.startswith('<') and content_id.endswith('>'):
            content_id = content_id[1:-1]
and then save content_id, not part[k], in my list of filenames and CIDs.

Then the rest is easy. Assuming I've built up a list called subfiles containing dictionaries with 'filename' and 'Content-Id', I can do the substitution in the HTML source:

    htmlsrc = html_part.get_payload(decode=True)
    for sf in subfiles:
        htmlsrc = re.sub('cid: ?' + sf['Content-Id'],
                         'file://' + sf['filename'],
                         htmlsrc, flags=re.IGNORECASE)

Then all I have to do is hook it up to a key in my .muttrc:

# macro  index  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"
# macro  pager  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"

Works nicely! Here's the complete script: viewhtmlmail.

Tags: , , , , ,
[ 11:49 Oct 07, 2013    More tech/email | permalink to this entry | ]

Wed, 28 Aug 2013

Python scripts for Android

Python on Android. Wouldn't that make so many things so much easier?

I've known for a long time about SL4A, but when I read, a year or two ago, that Google officially disclaimed support for languages other than Java and C and didn't want their employees working on projects like SL4A, I decided it wasn't a good bet.

But recently I heard from someone who had just discovered SL4A and its Python support and talked about it like a going thing. I had an Android scripting problem I really wanted to solve, and decided it was time to take another look.

It turns out SL4A and its Python interpreter are still being maintained, and indeed, I was able to solve my problem that way. But the documentation was scanty at best. So here are some shortcuts.

Getting Python running on Android

How do you install it in the first place? Took me three or four tries: it turns out it's extremely picky about the order in which you do things, and the documentation doesn't warn you about that. Follow these steps:

  1. Enable "Unknown Sources" under Application settings if you haven't already.
  2. Download both sl4a_r6.apk and PythonForAndroid_r4.apk
  3. Install sl4a from the apk. Do not install Python yet.
  4. Find SL4A in Applications and run it. It will say "no matches found" (i.e. no scripts) but that's okay: the important thing is that it creates the directory where the scripts will live, /sdcard/sl4a/scripts, without which PythonForAndroid would fail to install.
  5. Install PythonForAndroid from the apk.
  6. Find Python for Android in Applications and run it. Tap Install. This will install the sample scripts, and you'll be ready to go.

Make a shortcut on the home screen:

You've written a script and it does what you want. But to run it, you have to run SL4A, choose the Python interpreter, scroll around to find the script, tap on it, and indicate whether or not you want to see the console. Way too many steps!

Turns out you can make a shortcut on the home screen to an SL4A script, like this: (thanks to this tip):

This will give you the familiar twin-snake Python icon on your home screen. There doesn't seem to be any way to change this to a different icon.

Wait, what about UI?

Well, that still seems to be a big hole in the whole SL4A model. You can write great scripts that print to the console. You can even do a few specialized things, like popup menus, messages (what the Python Android module calls makeToast()) and notifications. The test.py sample script is a great illustration of how to use all those features, plus a lot more.

But what if you want to show a window, put a few buttons in it, let the user control things? Nobody seems to have thought about that possibility. I mean, it's not "sorry, we haven't had time to implement this", it isn't even mentioned as something someone would want to do on an Android device. Boggle.

The only possibility I've found is that there is apparently a way to use Android's WebView class from Python. I have not tried this yet; when I do, I'll write it up separately.

WebView may not be the best way to do UI. I've spent many hours tearing my hair out over its limitations even when called from Java. But still, it's something. And one very interesting thing about it is that it provides an easy way to call up an HTML page, either local or remote, from an Android home screen icon. So that may be the best reason yet to check out SL4A.

Tags: , ,
[ 22:31 Aug 28, 2013    More programming | permalink to this entry | ]

Tue, 20 Aug 2013

Using Google Maps with Python to turn a list of addresses into waypoints

A few days ago I wrote about how I use making waypoint files for a list of house addresses is OsmAnd. For waypoint files, you need latitude/longitude coordinates, and I was getting those from a web page that used the online Google Maps API to convert an address into latitude and longitude coordinates.

It was pretty cool at first, but pasting every address into the latitude/longitude web page and then pasting the resulting coordinates into the address file, got old, fast. That's exactly the sort of repetitive task that computers are supposed to handle for us.

The lat/lon page used Javascript and the Google Maps API. and I already had a Google Maps API key (they have all sorts of fun APIs for map geeks) ... but I really wanted something that could run locally, reading and converting a local file.

And then I discovered the Python googlemaps package. Exactly what I needed! It's in the Python Package Index, so I installed it with pip install googlemaps. That enabled me to change my waymaker Python script: if the first line of a description wasn't a latitude and longitude, instead it looked for something that might be an address.

Addresses in my data files might be one line or might be two, but since they're all US addresses, I know they'll end with a two-capital-letter state abbreviation and a 5-digit zip code: 2948 W Main St. Anytown, NM 12345. You can find that with a regular expression:

    match = re.search('.*[A-Z]{2}\s+\d{5}$', line)

But first I needed to check whether the first line of the entry was already latitude/longitude coordinates, since I'd already converted some of my files. That uses another regular expression. Python doesn't seem to have a built-in way to search for generic numeric expressions (containing digits, decimal points or +/- symbols) so I made one, since I had to use it twice if I was searching for two numbers with whitespace between them.

    numeric = '[\+\-\d\.]'
    match = re.search('^(%s+)\s+(%s+)$' % (numeric, numeric),
                      line)
(For anyone who wants to quibble, I know the regular expression isn't perfect. For instance, it would match expressions like 23+48..6.1-64.5. Not likely to be a problem in these files, so I didn't tune it further.)

If the script doesn't find coordinates but does find something that looks like an address, it feeds the address into Google Maps and gets the resulting coordinates. That code looks like this:

from googlemaps import GoogleMaps

gmaps = GoogleMaps('YOUR GOOGLE MAPS API KEY HERE')
try:
    lat, lon = gmaps.address_to_latlng(addr)
except googlemaps.GoogleMapsError, e:
    print "Oh, no! Couldn't geocode", addr
    print e

Overall, a nice simple solution made possible with python-googlemaps. The full script is on github: waymaker.

Tags: , , , , , , ,
[ 12:24 Aug 20, 2013    More mapping | permalink to this entry | ]

Tue, 28 May 2013

A quick URL shortener

For years I've used bookmarklets to shorten URLs. For instance, with is.gd, I set up a bookmark to javascript:document.location='http://is.gd/create.php?longurl='+encodeURIComponent(location.href);, give it a keyword like isgd, and then when I'm on a page I want to paste into Twitter (the only reason I need a URL shortener), I type Ctrl-L (to focus the URL bar) then isgd and hit return. Easy.

But with the latest rev of Firefox (I'm not sure if this started with version 20 or 21), sometimes javascript: links don't work. They just display the javascript source in the URLbar rather than executing it. Lacking a solution to the Firefox problem, I still needed a way of shortening URLs. So I looked into Python solutions.

It turns out there are a few URL shorteners with public web APIs. is.gd is one of them; shorturl.com is another. There are also APIs for bit.ly and goo.gl if you don't mind registering and getting an API key. Given that, it's pretty easy to write a Python script.

Which of course I did: shorturl.

[Python url shortening script] In the browser, I select the URL I want (e.g. by doubleclicking in the URLbar, or by right-clicking and choosing "Copy link location". That puts the URL in the X selection. Then I run the shorturl script, with no arguments. (I have it in my window manager's root menu.)

shorturl reads the X selection and shortens the URL (it tries is.gd first, then shorturl.com if is.gd doesn't work for some reason). Then it pops up a little window showing me both the short URL and the original long one, so I can be sure I shortened the right thing. (One thing I don't like about a lot of the URL services is that they don't tell you the original URL; I only find out later that I tweeted a link to something that wasn't at all the link I intended to share.)

It also copies the short URL into the X selection, so after verifying that the long URL was the one I wanted, I can go straight to my Twitter window (in my case, a Bitlbee tab in my IRC client) and middleclick to paste it.

After I've pasted the short link, I can dismiss the window by typing q. Don't type q too early -- since the python script owns the X selection, you won't be able to paste it anywhere once you've closed the window. (Unless you're running a selection-managing app like klipper.)

I just wish there were some way to use it for Twitter's own shortener, t.co. It's so frustrating that Twitter makes us all shorten URLs to fit in 140 characters just so they can shorten them again with their own service -- in the process removing any way for readers to see where the link will go. Sorry, folks -- nothing I can do about that. Complain to Twitter about why they won't let anyone use t.co directly.

Tags: , ,
[ 12:42 May 28, 2013    More tech/web | permalink to this entry | ]

Sat, 25 May 2013

Telling your Raspberry Pi that your terminal is bigger than 24 lines

When I'm working with an embedded Linux box -- a plug computer, or most recently with a Raspberry Pi -- I usually use GNU screen as my terminal program. screen /dev/ttyUSB0 115200 connects to the appropriate USB serial port at the appropriate speed, and then you can log in just as if you were using telnet or ssh.

With one exception: the window size. Typically everything is fine until you use an editor, like vim. Once you fire up an editor, it assumes your terminal window is only 24 lines high, regardless of its actual size. And even after you exit the editor, somehow your window will have been changed so that it scrolls at the 24th line, leaving the bottom of the window empty.

Tracking down why it happens took some hunting. Tthere are lots of different places the screen size can be set. Libraries like curses can ask the terminal its size (but apparently most programs don't). There's a size built into most terminfo entries (specified by the TERM environment variable) -- but it's not clear that gets used very much any more. There are environment variables LINES and COLUMNS, and a lot of programs read those; but they're often unset, and even if they are set, you can't trust them. And setting any of these didn't help -- I could change TERM and LINES and COLUMNS all I wanted, but as soon as I ran vim the terminal would revert to that scrolling-at-24-lines behavior.

In the end it turned out the important setting was the tty setting. You can get a summary of what the tty driver thinks its size is:

% stty size
32 80

But to set it, you use rows and columns rather than size. I discovered I could type stty rows 32 (or whatever my current terminal size was), and then I could run vim and it would stay at 32 rather than reverting to 24. So that was the important setting vim was following.

The basic problem was that screen, over a serial line, doesn't have a protocol for passing the terminal's size information, the way a remote login program like ssh, rsh or telnet does. So how could I get my terminal size set appropriately on login?

Auto-detecting terminal size

There's one program that will do it for you, which I remembered from the olden days of Unix, back before programs like telnet had this nice size-setting built in. It's called resize, and on Debian, it turned out to be part of the xterm package.

That's actually okay on my current Raspberry Pi, since I have X libraries installed in case I ever want to hook up a monitor. But in general, a little embedded Linux box shouldn't need X, so I wasn't very satisfied with this solution. I wanted something with no X dependencies. Could I do the same thing in Python?

How it works

Well, as I mentioned, there are ways of getting the size of the actual terminal window, by printing an escape sequence and parsing the result.

But finding the escape sequence was trickier than I expected. It isn't written about very much. I ended up running script and capturing the output that resize sent, which seemed a little crazy: '\e[7\e[r\e[999;999H\e[6n' (where \e means the escape character). Holy cow! What are all those 999s?

Apparently what's going on is that there isn't any sequence to ask xterm (or other terminal programs) "What's your size?" But there is a sequence to ask, "Where is the cursor on the screen right now?"

So what you do is send a sequence telling it to go to row 999 and column 999; and then another sequence asking "Where are you really?" Then read the answer: it's the window size.

(Note: if we ever get monitors big enough for 1000x1000 terminals, this will fail. I'm not too worried.)

Reading the answer

Okay, great, we've asked the terminal where it is, and it responds. How do we read the answer? That was actually the trickiest part.

First, you have to write to /dev/tty, not just stdout.

Second, you need the output to be available for your program to read, not just echo in the terminal for the user to see. Setting the tty to noncanonical mode does that.

Third, you can't just do a normal blocking read of stdin -- it'll never return. Instead, put stdin into non-blocking mode and use select() to see when there's something available to read.

And of course, you have to make sure you reset the terminal back to normal canonical line-buffered mode when you're done, whether or not your read succeeds.

Once you do all that, you can read the output, which will look something like "\e[32;80R". The two numbers, of course, are the lines and columns values you want; ignore the rest.

stty in python

Oh, yes, and one other thing: once you've read the terminal size, how do you set the stty size appropriately? You can't just run system('stty rows %d' % (rows) seems like it should work, but it doesn't, probably because it's using stdout instead of /dev/tty. But I did find one way to do it, the enigmatic:

fcntl.ioctl(fd, termios.TIOCSWINSZ,
            struct.pack("HHHH", rows, cols, 0, 0))

Here it all is in one script, which you can install on your Raspberry Pi (or other embedded Linux box) and run from .bash_profile:
termsize: set stty size to the size of the current terminal window.

Update, 2017: Turns out this doesn't quite work in Python 3, but I've updated the script, so use the code in the script rather than copying and pasting from this article. The explanation of the basic method hasn't changed.

Tags: , , , , ,
[ 19:47 May 25, 2013    More hardware | permalink to this entry | ]

Wed, 15 May 2013

Finding versions of installed packages in Debian/Ubuntu

Checking versions in Debian-based systems is a bit of a pain.

This happens to me a couple of times a month: for some reason I need to know what version of something I'm currently running -- often a library, like libgtk. aptitude show will tell you all about a package -- but only if you know its exact name. You can't do aptitude show libgtk or even aptitude show '*libgtk*' -- you have to know that the package name is libgtk2.0-0. Why is it libgtk2.0-0? I have no idea, and it makes no sense to me.

So I always have to do something like aptitude search libgtk | egrep '^i' to find out what packages I have installed that matches the name libgtk, find the package I want, then copy and paste that name after typing aptitude show.

But it turns out it's super easy in Python to query Debian packages using the Python apt package. In fact, this is all the code you need:

import sys
import apt

cache = apt.cache.Cache()

pat = sys.argv[1]

for pkgname in cache.keys():
    if pat in pkgname:
        pkg = cache[pkgname]
        instver = pkg.installed
        if instver:
            print pkg.name, instver.version
Then run aptver libgtk and you're all set.

In practice, I wanted nicer formatting, with columns that lined up, so the actual script is a little longer. I also added a -u flag to show uninstalled packages as well as installed ones. Amusingly, the code to format the columns took about twice as many lines as the code that does the actual work. There doesn't seem to be a standard way of formatting columns in Python, though there are lots of different implementations on the web. Now there's one more -- in my aptver on github.

Tags: , , , ,
[ 16:07 May 15, 2013    More linux | permalink to this entry | ]

Sat, 13 Apr 2013

Parsing NOAA historical weather data

We've been considering the possibility of moving out of the Bay Area to somewhere less crowded, somewhere in the desert southwest we so love to visit. But that also means moving to somewhere with much harsher weather.

How harsh? It's pretty easy to search for a specific location and get average temperatures. But what if I want to make a table to compare several different locations? I couldn't find any site that made that easy.

No problem, I say. Surely there's a Python library, I say. Well, no, as it turns out. There are Python APIs to get the current weather anywhere; but if you want historical weather data, or weather data averaged over many years, you're out of luck.

NOAA purports to have historical climate data, but the only dataset I found was spotty and hard to use. There's an FTP site containing directories by year; inside are gzipped files with names like 723710-03162-2012.op.gz. The first two numbers are station numbers, and there's a file at the top level called ish-history.txt with a list of the station codes and corresponding numbers. Not obvious!

Once you figure out the station codes, the files themselves are easy to parse, with lines like

STN--- WBAN   YEARMODA    TEMP       DEWP      SLP        STP       VISIB      WDSP     MXSPD   GUST    MAX     MIN   PRCP   SNDP   FRSHTT
724945 23293  20120101    49.5 24    38.8 24  1021.1 24  1019.5 24    9.9 24    1.5 24    4.1  999.9    68.0    37.0   0.00G 999.9  000000
Each line represents one day (20120101 is January 1st, 2012), and the codes are explained in another file called GSOD_DESC.txt. For instance, MAX is the daily high temperature, and SNDP is snow depth.

[NOAA historical temp program] So all I needed was to decode the station names, download the right files and parse them. That took about a day to write (including a lot of time wasted futzing with mysterious incantations for matplotlib).

Little accessibility refresher: I showed it to Dave -- "Neat, look at this, San Jose is the blue pair, Flagstaff is green and Page is red." His reaction: "This makes no sense. They all look the same to me. I have no idea which is which." Oops -- right. Don't use color as your only visual indicator. I knew that, supposedly! So I added markers in different shapes for each site. (I wish somebody would teach that lesson to Google Maps, which uses color as its only indicator on the traffic layer, so it's useless for red-green colorblind people.)

Back to the data -- it turns out NOAA doesn't actually have that much historical data available for download. If you search on most of these locations, you'll find sites that claim to have historical temperatures dating back 50 years or more, sometimes back to the 1800s. But NOAA typically only has files starting at about 2005 or 2006. I don't know where sites are getting this older data, or how reliable it is.

Still, averages since 2006 are still interesting to compare. Here's a run of noaatemps.py KSJC KFLG KSAF KLAM KCEZ KPGA KCNY. It's striking how moderate California weather is compared to any of these inland sites. No surprise there. Another surprise was that Los Alamos, despite its high elevation, has more moderate weather than most of the others -- lower highs, higher lows. I was a bit disappointed at how sparse the site list was -- no site in Moab? Really? So I used Canyonlands Field instead.

Anyway, it's fun for a data junkie to play around with, and it prints data on other weather factors, like precipitation and snowpack, although it doesn't plot them yet. The code is on my GitHub scripts page, under Weather.

Anyone found a better source for historical weather information? I'd love to have something that went back far enough to do some climate research, see what sites are getting warmer, colder, or seeing greater or lesser spreads between their extreme temperatures. The NOAA dataset obviously can't do that, so there must be something else that weather researchers use. Data on other countries would be interesting, too. Is there anything that's available to the public?

Tags: , , ,
[ 22:57 Apr 13, 2013    More programming | permalink to this entry | ]

Tue, 19 Mar 2013

Letters not used in Python keywords

One of the closing lightning talks at PyCon this year concerned the answers to a list of Python programming puzzles given at some other point during the conference. I hadn't seen the questions (I'm still not sure where they are), but some of the problems looked fun.

One of them was: "What are the letters not used in Python keywords?" I hadn't known about Python's keyword module, which could come in handy some day:

>>> import keyword
>>> keyword.kwlist
['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']

So, given the list of keywords, what's the best way to find the list of unique letters?

Any time you want a list of unique anything, you want a set. For instance,

>>> set([1, 2, 3, 2, 2, 4, 5, 1, 5])
set([1, 2, 3, 4, 5])
But first you need a list of letters so can make a set out of it.

Split the list of words into a list of letters

My first idea was to use list comprehensions. You can split a single word into letters like this:

>>> [ x for x in 'hello' ]
['h', 'e', 'l', 'l', 'o']

It took a bit of fiddling to get the right syntax to apply that to every word in the list:

>>> [[c for c in w] for w in keyword.kwlist]
[['a', 'n', 'd'], ['a', 's'], ['a', 's', 's', 'e', 'r', 't'], ... ]

Update: Dave Foster points out that [list(w) for w in keyword.kwlist] is another way, simpler and cleaner way than the double list comprehension.

That's a list of lists, so it needs to be flattened into a single list of letters before we can turn it into a set.

Flatten the list of lists

There are lots of ways to flatten a list of lists. Here are four of them:

[item for sublist in [[c for c in w] for w in keyword.kwlist] for item in sublist]

reduce(lambda x,y: x+y, [[c for c in w] for w in keyword.kwlist])

import itertools
list(itertools.chain.from_iterable([[c for c in w] for w in keyword.kwlist]))

sum([[c for c in w] for w in keyword.kwlist], [])

That last one, using sum(), makes use of the fact that Python uses + for list concatenation -- in other words, that [1, 2, 3] + [4, 5, 6] is [1, 2, 3, 4, 5, 6]. But the first method (item for sublist in) is faster: see Making a flat list out of list of lists in Python on StackOverflow. And another StackOverflow thread has a nice script for plotting speed vs. list size of various flatteners.

A simpler way of making the set

But it turns out none of this list comprehension stuff is needed anyway. set('word') splits words into letters already:

>>> set('bubble')
set(['e', 'b', 'u', 'l'])
Ignore the order -- elements of a set often end up displaying in some strange order. The important thing is that it has all the letters and no repeats.

Now we have an easy way of making a set containing the letters in one word. But how do we apply that to a list of words?

Again I initially tried using list comprehensions, then realized there's an easier way. Given a list of strings, it's trivial to join them into a single string using ''.join(). And that gives us our set of letters within keywords:

>>> set(''.join(keyword.kwlist))
set(['a', 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'k', 'm', 'l', 'o', 'n', 'p', 's', 'r', 'u', 't', 'w', 'y', 'x'])

What letters are not in the set?

Almost done! But the original problem was to find the letters not in keywords. We can do that by subtracting this set from the set of all letters from a to z. How do we get that? The string module will give us a list:

>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'

You could also use a list comprehension and ord and chr (alas, range won't give you a range of letters directly):

>>> [chr(i) for i in range(ord('a'), ord('z')+1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
It's a bit longer, but doesn't require an import.

Now that you have your a-z set, just subtract the two sets:

>>> set(string.lowercase[:]) - set(''.join(keyword.kwlist))
set(['q', 'j', 'z', 'v'])

So the only letters not used in Python keywords are q, j, z and v.

Just a useless little ditty, really ... but I thought it was a fun exercise, so maybe you will too.

Tags: ,
[ 13:36 Mar 19, 2013    More programming | permalink to this entry | ]

Sat, 16 Mar 2013

SimpleCV on Raspberry Pi

I'm at PyCon, and I spent a lot of the afternoon in the Raspberry Pi lab.

Raspberry Pis are big at PyCon this year -- because everybody at the conference got a free RPi! To encourage everyone to play, they have a lab set up, well equipped with monitors, keyboards, power and ethernet cables, plus a collection of breadboards, wires, LEDs, switches and sensors.

I'm primarily interested in the RPi as a robotics controller, one powerful enough to run a camera and do some minimal image processing (which an Arduino can't do). And on Thursday, I attended a PyCon tutorial on the Python image processing library SimpleCV. It's a wrapper for OpenCV that makes it easy to access parts of images, do basic transforms like greyscale, monochrome, blur, flip and rotate, do edge and line detection, and even detect faces and other objects. Sounded like just the ticket, if I could get it to work on a Raspberry Pi.

SimpleCV can be a bit tricky to install on Mac and Windows, apparently. But the README on the SimpleCV git repository gives an easy 2-line install for Ubuntu. It doesn't run on Debian Squeeze (though it installs), because apparently it depends on a recent version of pygame and Squeeze's is too old; but Ubuntu Pangolin handled it just fine.

The question was, would it work on Raspbian Wheezy? Seemed like a perfect project to try out in the PyCon RPi lab. Once my RPi was set up and I'd run an apt-get update, I used used netsurf (the most modern of the lightweight browsers available on the RPi) to browse to the SimpleCV installation instructions. The first line,

sudo apt-get install ipython python-opencv python-scipy python-numpy python-pygame python-setuptools python-pip
was no problem. All those packages are available in the Raspbian repositories.

But the second line,

sudo pip install https://github.com/ingenuitas/SimpleCV/zipball/master
failed miserably. Seems that pip likes to put its large downloaded files in /tmp; and on Raspbian, running off an SD card, /tmp quite reasonably is a tmpfs, running in RAM. But that means it's quite small, and programs that expect to be able to use it to store large files are doomed to failure.

I tried a couple of simple Linux patches, with no success. You can't rename /tmp to replace it with a symlink to a directory on the SD card, because /tmp is always in use. And pip makes a new temp directory name each time it's run, so you can't just symlink the pip location to a place on the SD card.

I thought about rebooting after editing the tmpfs out of /etc/fstab, but it turns out it's not set up there, and it wasn't obvious how to disable the tmpfs. Searching later from home, the size is set in /etc/default/tmpfs. As for disabling the tmpfs and using the SD card instead, it's not clear. There's a block of code in /etc/init.d/mountkernfs.sh that makes that decision; it looks like symlinking /tmp to somewhere else might do it, or else commenting out the code that sets RAMTMP="yes". But I haven't tested that.

Instead of rebooting, I downloaded the file to the SD card:

wget https://github.com/ingenuitas/SimpleCV/master

But it turned out it's not so easy to pip install from a local file. After much fussing around I came up with this, which worked:

pip install http:///home/pi/master --download-cache /home/pi/tmp

That worked, and the resulting SimpleCV install worked nicely! I typed some simple tests into the simplecv shell, playing around with their built-in test image "lenna":

img = Image('lenna')
img.show()
img.binarize().show()
img.toGray().show()
img.edges().show()
img.invert().show()

And, for something a little harder, some face feature detection: let's find her eyes and outline them in yellow.

img.listHaarFeatures()
img.findHaarFeatures('eye.xml').draw(color=Color.YELLOW)
[Lenna, edges] [Lenna, eyes detected]

SimpleCV is lots of fun! And the edge detection was quite fast on the RPi -- this may well be usable by a robot, once I get the motors going.

Tags: , , , , ,
[ 21:43 Mar 16, 2013    More linux/install | permalink to this entry | ]

Thu, 21 Feb 2013

New project: Metapho image tagger

I'm excited about my new project: MetaPho, an image tagger.

It arose out of a discussion on the LinuxChix Techtalk list: photo collection management software. John Sturdy was looking for an efficient way of viewing and tagging large collections of photos. Like me, he likes fast, lightweight, keyboard-driven programs. And like me, he didn't want a database-driven system that ties you forever to one image cataloging program. I put my image tags in plaintext files, named Keywords, so that I can easily write scripts to search or modify them, or user grep, and I can even make quick changes with a text editor.

I shared some tips on how I use my Pho image viewer for tagging images, and it sounded close to what he was looking for. But as we discussed ideas about image tagging, we realized that there were things he wanted to do that pho doesn't do well, things not offered by any other image tagger we've been able to find. While discussing how we might add new tagging functionality to pho, I increasingly had the feeling that I was trying to fit off-road tires onto a Miata -- or insert your own favorite metaphor for "making something do something it wasn't designed to do."

Pho is a great image viewer, but the more I patched it to handle tagging, the uglier and more complicated the code got, and it also got more complex to use.

[metapho screenshot] And really, everything we needed for tagging could be easily done in a Python-GTK application. (Pho is written in C because it does a lot of complicated focus management to deal with how window managers handle window moving and resizing. A tagger wouldn't need any of that.)

I whipped up a demo image viewer in a few hours and showed it to John. We continued the discussion, I made a GitHub repo, and over the next week or so the code grew into an efficient and already surprisingly usable image tagger.

We have big plans for it, like tags organized into categories so we can have lots of tags without cluttering the interface too much. But really, even as it is, it's better than anything I've used before. I've been scanning in lots of photos from old family albums (like this one of my mother and grandmother, and me at 9 months) and it's been great to be able to add and review tags easily.

If you want to check out MetaPho, or contribute to it (either code or user interface design), it lives in my MetaPho repository on GitHub. And I wrote up a quick man page in markdown format: metapho.1.md.

Feedback and contributors welcome!

Tags: , , , , ,
[ 19:31 Feb 21, 2013    More programming | permalink to this entry | ]

Sat, 19 Jan 2013

Converting C to Python with a vi regexp

I'm fiddling with a serial motor controller board, trying to get it working with a Raspberry Pi. (It works nicely with an Arduino, but one thing I'm learning is that everything hardware-related is far easier with Arduino than with RPi.)

The excellent Arduino library helpfully provided by Pololu has a list of all the commands the board understands. Since it's Arduino, they're in C++, and look something like this:

#define QIK_GET_FIRMWARE_VERSION         0x81
#define QIK_GET_ERROR_BYTE               0x82
#define QIK_GET_CONFIGURATION_PARAMETER  0x83
[ ... ]
#define QIK_CONFIG_DEVICE_ID                        0
#define QIK_CONFIG_PWM_PARAMETER                    1
and so on.

On the Arduino side, I'd prefer to use Python, so I need to get them to look more like:

    QIK_GET_FIRMWARE_VERSION = 0x81
    QIK_GET_ERROR_BYTE = 0x82
    QIK_GET_CONFIGURATION_PARAMETER = 0x83
[ ... ]
    QIK_CONFIG_DEVICE_ID = 0
    QIK_CONFIG_PWM_PARAMETER = 1
... and so on ... with an indent at the beginning of each line since I want this to be part of a class.

There are 32 #defines, so of course, I didn't want to make all those changes by hand. So I used vim. It took a little fiddling -- mostly because I'd forgotten that vim doesn't offer + to mean "one or more repetitions", so I had to use * instead. Here's the expression I ended up with:

.,$s/\#define *\([A-Z0-9_]*\) *\(.*\)/ \1 = \2/

In English, you can read this as:

From the current line to the end of the file (,.$/), look for a pattern consisting of only capital letters, digits and underscores ([A-Z0-9_]). Save that as expression #1 (\( \)). Skip over any spaces, then take the rest of the line (.*), and call it expression #2 (\( \)).

Then replace all that with a new line consisting of 4 spaces, expression 1, a spaced-out equals sign, and expression 2 ( \1 = \2).

Who knew that all you needed was a one-line regular expression to translate C into Python?

(Okay, so maybe it's not quite that simple. Too bad a regexp won't handle the logic inside the library as well, and the pin assignments.)

Tags: , , , , ,
[ 21:38 Jan 19, 2013    More linux/editors | permalink to this entry | ]

Wed, 31 Oct 2012

Comparing sunset times with PyEphem

We were marveling at how early it's getting dark now -- seems like a big difference even compared to a few weeks ago. Things change fast this time of year.

Since we're bouncing back and forth a lot between southern and northern California, Dave wondered how Los Angeles days differed from San Jose days. Of course, San Jose being nearly 4 degrees farther north, it should have shorter days -- but by the weirdness of orbital mechanics that doesn't necessarily mean that the sun sets earlier in San Jose. His gut feel was that LA was actually getting an earlier sunset.

"I can calculate that," I said, and fired up a Python interpreter to check with PyEphem. Since PyEphem doesn't know San Jose (hmph! San Jose is bigger than San Francisco) I used San Francisco.

Since PyEphem's Observer class only has next_rising() and next_setting(), I had to set a start date of midnight so I could subtract the two dates properly to get the length of the day.

>>> sun = ephem.Sun()
>>> la = ephem.city('Los Angeles')
>>> sf = ephem.city('San Francisco')
>>> 
>>> mid = ephem.Date('2012/10/31 8:00')
>>> 
>>> la.next_rising(sun, start=mid)
2012/10/31 14:11:57
>>> la.next_setting(sun, start=mid)
2012/11/1 01:00:45
>>> la.next_setting(sun, start=mid) - la.next_rising(sun, start=mid)
0.45055988136300584
>>> 
>>> sf.next_rising(sun, start=mid)
2012/10/31 14:34:19
>>> sf.next_setting(sun, start=mid)
2012/11/1 01:11:44
>>> sf.next_setting(sun, start=mid) - sf.next_rising(sun, start=mid)
0.4426457611261867

So Dave's intuition was right: northern California really does have a later sunset than southern California at this time of year, even though the total day length is shorter -- the difference in sunrise time makes up for the later sunset.

How much shorter?

>>> (la.next_setting(sun, start=mid) - la.next_rising(sun, start=mid)) - (sf.next_setting(sun, start=mid) - sf.next_rising(sun, start=mid))
0.007914120236819144
>>> ((la.next_setting(sun, start=mid) - la.next_rising(sun, start=mid)) - (sf.next_setting(sun, start=mid) - sf.next_rising(sun, start=mid))) * 24
0.18993888568365946
>>> ((la.next_setting(sun, start=mid) - la.next_rising(sun, start=mid)) - (sf.next_setting(sun, start=mid) - sf.next_rising(sun, start=mid))) * 24 * 60
11.396333141019568

And we have our answer -- there's about 11 minutes difference in day length between SF and LA.

Tags: , ,
[ 11:46 Oct 31, 2012    More science/astro | permalink to this entry | ]

Wed, 17 Oct 2012

Asynchronous sound playing in Python

A little while back I wrote about my Python xchat script to play sound alerts.

But one thing that's been annoying me about it -- it was a problem with the old perl alert script too -- is the repeated sounds. If lots of twitter updates come in on the Bitlbee channel, or if someone pastes numerous lines into a channel, I hear POPPOPPOPPOPPOPPOP or repetitions of whatever the alert sound is for that type of message. It's annoying to me, but even more so to anyone else in the same room.

It would be so much nicer if I could have it play just one repetition of any given alert, even if there are eight lines all coming in at the same time. So I decided to write a Python class to handle that.

My existing code used subprocesses to call the basic ALSA sound player, /usr/bin/aplay -q. Initially I used
if not os.fork() : os.execl(APLAY, APLAY, "-q", alertfile)
but I later switched to the cleaner
subprocess.call([APLAY, '-q', alertfile])
But of course, it would be better to do it all from Python without requiring an external process like aplay. So I looked into that first.

Sadly, it turns out Python audio support is a mess. The built-in libraries are fairly limited in functionality and formats, and the external libraries that handle sound are mostly unmaintained, unless you want to pull in a larger library like pygame. After a little web searching I decided that maybe an aplay subprocess wasn't so bad after all.

Okay, so how should I handle the subprocesses? I decided the best way was to keep track of what sound was currently playing. If another alert fires for the same sound while that sound is already playing, just ignore it. If an alert comes in for a different sound, then wait() for the current sound to finish, then start the new sound.

That's all quite easy with Python's subprocess module. subprocess.Popen() returns a Popen object that tracks a process ID and can check whether that process has finished or not. If self.curpath is the path to the sound currently playing and self.current is the Popen object for whatever aplay process is currently running, then:

    if self.current :
        if self.current.poll() == None :
            # Current process hasn't finished yet. Is this the same sound?
            if path == self.curpath :
                # A repeat of the currently playing sound.
                # Don't play it more than once.
                return
            else :
                # Trying to play a different sound.
                # Wait on the current sound then play the new one.
                self.wait()

    self.curpath = path
    self.current = subprocess.Popen([ "/usr/bin/aplay", '-q', path ] )

Finally, it's a good idea when exiting the program to check whether any aplay process is running, and wait() for it. Otherwise, you might end up with a zombie aplay process.

    def __del__(self) :
        self.wait()

I don't know if xchat actually closes down Python objects gracefully, so I don't know whether the __del__ destructor will actually be called. But at least I tried. It's possible that a context manager might be more reliable.

The full scripts are on github at pyplay.py for the basic SoundPlayer class, and chatsounds.py for the xchat script that includes SoundPlayer.

Tags: , ,
[ 13:07 Oct 17, 2012    More programming | permalink to this entry | ]

Wed, 26 Sep 2012

Writing xchat scripts in Python (to play sound alerts)

I use xchat as my IRC client. Mostly I like it, but its sound alerts aren't quite as configurable as I'd like. I have a few channels, like my Bitlbee Twitter feed, where I want a much more subtle alert, or no alert at all. And I want an easy way of turning sounds on and off, in case I get busy with something and need to minimize distractions.

Years ago I grabbed a perl xchat plug-in called "Smet's NickSound" that did something close to what I wanted. I've hacked a few things into it. But every time I try to customize it any further, I'm hit with the pain of write-only Perl. I've written Perl scripts, honest. But I always have a really hard time reading anyone else's Perl code and figuring out what it's doing. When I dove in again recently to try to figure out why I was getting so many alerts when first starting up xchat, I finally decided: learning how to write a Python xchat script couldn't be any harder than reverse engineering a Perl one.

First, of course, I looked for an existing nick sound Python script ... and totally struck out. In fact, mostly I struck out on finding any xchat Python scripts at all. I know there are Python bindings for xchat, because there's documentation for them. But sample plug-ins? Nope. For some reason, nobody's writing xchat plug-ins in Python.

I eventually found two minimal examples: this very simple example and the more elaborate utf8decoder. I was able to put them together and cobble up a working nick sound plug-in. It's easy once you have an example to work from to help you figure out the event hook arguments.

So here's my own little example, which may help the next person trying to learn xchat Python scripting: chatsounds.py on github.

Tags: , , ,
[ 22:13 Sep 26, 2012    More programming | permalink to this entry | ]

Wed, 19 Sep 2012

Python: Do two dates span a particular day of the week or month?

When I'm using my RSS reader FeedMe, I normally check every feed every day. But that can be wasteful: some feeds, like World Wide Words, only update once a week. A few feeds update even less often, like serialized books that come out once a month or whenever the author has time to add something new.

So I decided it would be nice to add some "when" logic to FeedMe, so I could add when = Sat in the config section for World Wide Words and have it only update once a week.

That sounded trivial -- a little python parsing logic to tell days from numbers, a few calls to time.localtime() and I was done.

Except of course I wasn't. Because sometimes, like when I'm on vacation, I don't always update every day. If I missed a Saturday, then I'd never see that week's edition of World Wide Words. And that would be terrible!

So what I really needed was a way to ask, "Has a Saturday occurred (including today) since the last time I ran feedme?"

The last time I ran feedme is easy to determine: it's in the last modified date of the cache file. Or, in more Pythonic terms, it's statbuf = os.stat(cachefile).st_mtime. And of course I can get the current time with time.localtime(). But how do I figure out whether a given week or month day falls between those two dates?

I'm sure this particular wheel has been invented many times. There's probably even a nifty Python library somewhere to do it. But how do you google for that? I tried to think of keywords and found nothing. So I went for a nice walk in the redwoods and thought about it for a bit, and came up with a solution.

Turns out for the week day case, you can just use modular arithmetic: if (weekday_2 - target_weekday) % 7 < (weekday_2 - weekday_1) then the day does indeed fall between the two dates.

Things are a little more complicated for the day of the month, though, because you don't know whether you need mod 30 or 31 or 29 or 28, so you either have to make your own table, or import the calendar module just so you can call calendar.monthrange().

I decided it was easier to use logic: if the difference between the two dates is greater than 31, then it definitely includes any month day. Otherwise, check whether they're in the same month or not, and do greater than/less than comparisons on the three dates.

Throw in a bunch of conversion to make it easy to call, and a bunch of unit tests to make sure everything works and my later tweaks don't break anything, and I had a nice function I could call from Feedme.

falls_between.py on github

Tags: ,
[ 22:07 Sep 19, 2012    More programming | permalink to this entry | ]

Sun, 02 Sep 2012

GIMP plug-in to export scaled-down versions of images

In a discussion on Google+arising from my Save/Export clean plug-in, someone said to the world in general

PLEASE provide an option to select the size of the export. Having to scale the XCF then export then throw out the result without saving is really awkward.

I thought, What a good idea! Suppose you're editing a large image, with layers and text and other jazz, saving in GIMP's native XCF format, but you want to export a smaller version for the web. Every time you make a significant change, you have to: Scale (remembering the scale size or percentage you're targeting); Save a Copy (or Export in GIMP 2.8); then Undo the Scale. If you forget the Undo, you're in deep trouble and might end up overwriting the XCF original with a scaled-down version.

If I had a plug-in that would export to another file type (such as JPG) with a scale factor, remembering that scale factor so I didn't have to, it would save me both effort and risk.

And that sounded pretty easy to write, using some of the tricks I'd learned from my Save/Export Clean and wallpaper scripts. So I wrote export-scaled.py

Update: Please consider using Saver instead. Saver integrates the various functions I used to have in different save/export plug-ins; it should do everything export-scaled.py does and more, and export-scaled.py is no longer maintained. If you need something export-scaled.py does that saver doesn't do as well, please let me know.

It's still brand new, so if anyone tries it, I'd appreciate knowing if it's useful or if you have any problems with it.

Geeky programming commentary

(Folks not interested in the programming details can stop reading now.)

Linked input fields

[screenshot: linked input fields] One fun project was writing a set of linked text entries for the dialog:
Scale to: Percentage 100 % Width: 640 Height: 480

Change any one of the three, and the other two change automatically. There's no chain link between width and height: It's assumed that if you're exporting a scaled copy, you won't want to change the image's aspect ratio, so any one of the three is enough.

That turned out to be surprisingly hard to do with GTK SpinBoxes: I had to read their values as strings and parse them, because the numeric values kept snapping back to their original values as soon as focus went to another field.

Image parasites

Another fun challenge was how to save the scale ratio, so the second time you call up the plug-in on the same image it uses whatever values you used the first time. If you're going to scale to 50%, you don't want to have to type that in every time. And of course, you want it to remember the exported file path, so you don't have to navigate there every time.

For that, I used GIMP parasites: little arbitrary pieces of data you can attach to any image. I've known about parasites for a long time, but I'd never had occasion to use them in a Python plug-in before. I was pleased to find that they were documented in the official GIMP Python documentation, and they worked just as documented. It was easy to test them, too: in the Python console (Filters->Python->Console...), type something like

img = gimp_image_list()[0]
img.parasite_list()
img.parasite_find(img.parasite_list()[0])
and so forth. Nice!

Not prompting for JPG settings

My plug-in was almost done. But when I ran it and told it to save to filenamecopy.jpg, it prompted me with that annoying JPEG settings dialog. Okay, being prompted once isn't so bad. But then when I exported a second time, it prompted me again, and didn't remember the values from before. So the question was, what controls whether the settings dialog is shown, and how could I prevent it?

Of course, I could prompt the user for JPEG quality, then call jpeg-save-file directly -- but what if you want to export to PNG or GIF or some other format? I needed something more general

Turns out, nobody really remembers how this works, and it's not documented anywhere. Some people thought that passing run_mode=RUN_WITH_LAST_VALS when I called pdb.gimp_file_save() would do the trick, but it didn't help.

So I guessed that there might be a parasite that was storing those settings: if the JPEG save plug-in sees the parasite, it uses those values and doesn't prompt. Using the Python console technique I just mentioned, I tried checking the parasites on a newly created image and on an image read in from an existing JPG file, then saving each one as JPG and checking the parasite list afterward.

Bingo! When you read in a JPG file, it has a parasite called 'jpeg-settings'. (The new image doesn't have this, naturally). But after you write a file to JPG from within GIMP, it has not only 'jpeg-settings' but also a second parasite, 'jpeg-save-options'.

So I made the plug-in check the scaled image after saving it, looking for any parasites with names ending in either -settings or -save-options; any such parasites are copied to the original image. Then, the next time you invoke Export Scaled, it does the same search, and copies those parasites to the scaled image before calling gimp-file-save.

That darned invisible JPG settings dialog

One niggling annoyance remained. The first time you get the JPG settings dialog, it pops up invisibly, under the Export dialog you're using. So if you didn't know to look for it by moving the dialog, you'd think the plug-in had frozen. GIMP 2.6 had a bug where that happened every time I saved, so I assumed there was nothing I can do about it.

GIMP 2.8 has fixed that bug -- yet it still happened when my plug-in called gimp_file_save: the JPG dialog popped up under the currently active dialog, at least under Openbox.

There isn't any way to pass window IDs through gimp_file_save so the JPG dialog pops up as transient to a particular window. But a few days after I wrote the export-scaled, I realized there was still something I could do: hide the dialog when the user clicks Save. Then make sure that I show it again if any errors occur during saving.

Of course, it wasn't quite that simple. Calling chooser.hide() by itself does nothing, because X is asynchronous and things don't happen in any predictable order. But it's possible to force X to sync the display: chooser.get_display().sync().

I'm not sure how robust this is going to be -- but it seems to work well in the testing I've done so far, and it's really nice to get that huge GTK file chooser dialog out of the way as soon as possible.

Tags: , ,
[ 18:34 Sep 02, 2012    More gimp | permalink to this entry | ]

Tue, 21 Aug 2012

GIMP: Re-uniting Save and Export

In GIMP 2.8, the developers changed the way you save files. "Save" is now used only for GIMP's native format, XCF (and compressed variants like .xcf.gz and .xcf.bz2). Other formats that may lose information on layers, fonts and other aspects of the edited image must be "Exported" rather than saved.

This has caused much consternation and flameage on the gimp-user mailing list, especially from people who use GIMP primarily for simple edits to JPEG or PNG files.

I don't particularly like the new model myself. Sometimes I use GIMP in the way the developers are encouraging, adding dozens of layers, fonts, layer masks and other effects. Much more often, I use GIMP to crop and rescale a handful of JPG photos I took with my camera on a hike. While I found it easy enough to adapt to using Ctrl-E (Export) instead of Ctrl-S (Save), it was annoying that when I exited the app, I'd always get am "Unsaved images" warning, and it was impossible to tell from the warning dialog which images were safely exported and which might not have been saved or exported at all.

But flaming on the mailing lists, much as some people seem to enjoy it (500 messages on the subject and still counting!) wasn't the answer. The developers have stated very clearly that they're not going to change the model back. So is there another solution?

Yes -- a very simple solution, in fact. Write a plug-in that saves or exports the current image back to its current file name, then marks it as clean so GIMP won't warn about it when you quit.

It turned out to be extremely easy to write, and you can get it here: GIMP: Save/export clean plug-in. If it suits your GIMP workflow, you can even bind it to Ctrl-S ... or any other key you like.

Warning: I deliberately did not add any "Are you sure you want to overwrite?" confirmation dialogs. This plug-in will overwrite your current file, without asking for permission. After all, that's its job. So be aware of that.

How it's written

Here are some details about how it works. Non software geeks can skip the rest of this article.

When I first looked into writing this, I was amazed at how simple it was: really just two lines of Python (plus the usual plug-in registration boilerplate).

    pdb.gimp_file_save(img, drawable, img.filename, img.filename)
    pdb.gimp_image_clean_all(img)

The first line saves the image back to its current filename. (The gimp-file-save PDB call still handles all types, not just XCF.) The second line marks the image as clean.

Both of those are PDB calls, which means that people who don't have GIMP Python could write script-fu to do this part.

So why didn't I use script-fu? Because I quickly found that if I bound the plug-in to Ctrl-S, I'd want to use it for new images -- images that don't have a filename yet. And for that, you need to pop up some sort of "Save as" dialog -- something Python can do easily, and Script-fu can't do at all.

A Save-as dialog with smart directory default

I couldn't use the standard GIMP save-as dialog: as far as I can tell, there's no way to call that dialog from a plug-in. But it turned out the GTK save-as dialog has no default directory to start in: you have to set the starting directory every time. So I needed a reasonable initial directory.

I didn't want to come up with some desktop twaddle like ~/My Pictures or whatever -- is there really anyone that model fits? Certainly not me. I debated maintaining a preference you could set, or saving the last used directory as a preference, but that complicates things and I wasn't sure it's really that helpful for most people anyway.

So I thought about where I usually want to save images in a GIMP session. Usually, I want to save them to the same directory where I've been saving other images in the same session, right?

I can figure that out by looping through all currently open images with for img in gimp.image_list() : and checking os.path.dirname(img.filename) for each one. Keep track of how many times each directory is being used; whichever is used the most times is probably where the user wants to store the next image.

Keeping count in Python

Looping through is easy, but what's the cleanest, most Pythonic way of maintaining the count for each directory and finding the most popular one? Naturally, Python has a class for that, collections.Counter.

Once I've counted everything, I can ask for the most common path. The code looks a bit complicated because most_common(1) returns a one-item list of a tuple of the single most common path and the number of times it's been used -- for instance, [ ('/home/akkana/public_html/images/birds', 5) ]. So the path is the first element of the first element, or most_common(1)[0][0]. Put it together:

    counts = collections.Counter()
    for img in gimp.image_list() :
        if img.filename :
            counts[os.path.dirname(img.filename)] += 1
    try :
        return counts.most_common(1)[0][0]
    except :
        return None

So that's the only tricky part of this plug-in. The rest is straightforward, and you can read the code on GitHub: save-export-clean.py.

Tags: , ,
[ 12:26 Aug 21, 2012    More gimp | permalink to this entry | ]

Sat, 09 Jun 2012

Viewing and modifying epub ebook tags

My epub Books folder is starting to look like my physical bookshelf at home -- huge and overflowing with books I hope to read some day. Mostly free books from the wonderful Project Gutenberg and DRM-free books from publishers and authors who support that model.

With the Nook's standard library viewer that's impossible to manage. All you can do is sort all those books alphabetically by title or author and laboriously page through, some five books to a page, hoping the one you want will catch your eye. Worse, sometimes books show up in the author view but don't show up in the title view, or vice versa. I guess Barnes & Noble think nobody keeps more than ten or so books on their shelves.

Fortunately on my rooted Nook I have the option of using better readers, like FBreader and Aldiko, that let me sort by tags. If I want to read something about the Civil War, or Astronomy, or just relax with some Science Fiction, I can browse by keyword.

Well, in theory. In practice, tagging of ebooks is inconsistent and not very useful.

For instance, the Gutenberg tags for Othello are:

while the tags for Vanity Fair are

The Prince and the Pauper's tag list looks like:

while Captains Courageous looks like

I can understand wanting to tag details like this, but few of those tags are helpful when I'm browsing books on my little handheld device. I can't imagine sitting down to read and thinking, "Let's see, what books do I have on Interracial marriage? Or Saltwater fishing? No, on second thought I'd rather read some fiction set in the time of Edward VI, King of England, 1537-1553."

And of course, with over 90 books loaded on my ebook readers, it means I have hundreds of entries in my tags list, with few of them including more than one book.

Clearly what I needed to do was to change the tags on my ebooks.

Viewing and modifying epub tags

That ought to be simple, right? But ebooks are still a very young technology, and there's surprisingly little software devoted to them. Calibre can probably do it if you don't mind maintaining your whole book collection under calibre; but I like to be able to work on files one at a time or in small groups. And I couldn't find a program that would let me do that.

What to do? Well, epub is a fairly simple XML format, right? So modifying it with Python shouldn't that hard.

Managing epub in Python

An epub file is a collection of XML files packaged in a zip archive. So I unzipped one of my epub books and poked around. I found the tags in a file called content.opf, inside a <metadata> tag. They look like this:

<dc:subject>Science fiction</dc:subject>

So I could use Python's zipfile module to access the content.opf file inside the zip archive, then use the xml.dom.minidom parser to get to the tags. Writing a script to display existing tags was very easy.

What about replacing the old, unweildy tag list with new, simple tags?

It's easy enough to add nodes in Python's minidom. So the trick is writing it back to the epub file. The zipfile module doesn't have a way to modify a zip file in place, so I created a new zip archive and copied files from the old archive to the new one, replacing content.opf with a new version.

Python's difficulty with character sets in XML

But I hit a snag in writing the new content.opf. Python's XML classes have a toprettyxml() method to write the contents of a DOM tree. Seemed simple, and that worked for several ebooks ... until I hit one that contained a non-ASCII character. Then Python threw a UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 606: ordinal not in range(128).

Of course, there are ways (lots of them) to encode that output string -- I could do

ozf.writestr(info, dom.toprettyxml().encode(encoding, 'xmlcharrefreplace'))
, or
writestr(info, dom.toprettyxml(encoding=encoding)
Except ... what should I pass as the encoding? The content.opf file started with its encoding:
<?xml version='1.0' encoding='UTF-8'?>
but Python's minidom offers no way to get that information. In fact, none of Python's XML parsers seem to offer this.

Since you need a charset to avoid the UnicodeEncodeError, the only options are (1) always use a fixed charset, like utf-8, for content.opf, or (2) open content.opf and parse the charset line by hand after Python has already parsed the rest of the file. Yuck! So I chose the first option ... I can always revisit that if the utf-8 in content.opf ever causes problems.

The final script

Charset difficulties aside, though, I'm quite pleased with my epubtags.py script. It's very handy to be able to print tags on any .epub file, and after cleaning up the tags on my ebooks, it's great to be able to browse by category in FBreader. Here's the program: epubtag.py.

Tags: , ,
[ 13:05 Jun 09, 2012    More programming | permalink to this entry | ]

Sat, 26 May 2012

Use stdeb to make Debian packages for a Python package

I write a lot of little Python scripts. And I use Ubuntu and Debian. So why aren't any of my scripts packaged for those distros?

Because Debian packaging is absurdly hard, and there's very little documentation on how to do it. In particular, there's no help on how to take something small, like a Python script, and turn it into a package someone else could install on a Debian system. It's pretty crazy, since RPM packaging of Python scripts is so easy.

Recently at the Ubuntu Developers' Summit, Asheesh of OpenHatch pointed me toward a Python package called stdeb that simplifies a lot of the steps and makes Python packaging fairly straightforward.

You'll need a setup.py file to describe your Python script, and you'll probably want a .desktop file and an icon. If you haven't done that before, see my article on Packaging Python for MeeGo for some hints.

Then install python-stdeb. The package has some requirements that aren't listed as dependencies, so you'll need to install:

apt-get install python-stdeb fakeroot python-all
(I have no idea why it needs python-all, which installs only a directory /usr/share/doc/python-all with some policy documentation files, but if you don't install it, stdeb will fail later.)

Now create a config file for stdeb to tell it what Debian/Ubuntu version you're going to be targeting, if it's anything other than Debian unstable (stdeb's default). Unfortunately, there seems to be no way to pass this on the command line rather than in a config file. So if you want to make packages for several distros, you'll have to edit the config file for every distro you want to support. Here's what I'm using for Ubuntu 12.04 Precise Pangolin:

[DEFAULT]
Suite: precise

Now you're ready to run stdeb. I know of two ways to run it. You can generate both source and binary packages, like this:

python setup.py --command-packages=stdeb.command bdist_deb
Or you can generate source packages only, like this:
python setup.py --command-packages=stdeb.command sdist_dsc

Either syntax creates a directory called deb_dist. It contains a lot of files including a source .dsc, several tarballs, a copy of your source directory, and (if you used bdist_deb) a binary .deb package.

If you used the bdist_deb form, don't be put off that it concludes with a message:

dpkg-buildpackage: binary only upload (no source included)
It's fibbing: the source .dsc is there as well as the binary .deb. I presume it prints the warning because it creates them as separate steps, and the binary is the last step.

Now you can use dpkg -i to install your binary deb, or you can use the source dsc for various purposes, like creating a repository or a Launchpad PPA. But those involve a lot more steps -- so I'll cover that in a separate article about creating PPAs.

Update: you can find that article here: Creating packages for a Launchpad PPA.

Tags: , , , ,
[ 11:44 May 26, 2012    More programming | permalink to this entry | ]

Fri, 27 Apr 2012

Venus is at its brightest -- why? And how to calculate it

Venus has been a beautiful sight in the evening sky for months, but at the end of April it's reaching a brightness peak, magnitude -4.7.

By then, if you look at it in a telescope or even good binoculars, you'll see it has waned to a crescent. That's a bit non-obvious: when the moon is a crescent, it's a lot fainter than a full moon. So why is Venus brightest in its crescent phase?

It has to do with their orbits. The moon is always about the same distance away, about 385,000 km or 239,000 miles (I've owned cars with more miles than that!), though it varies a little, from 362,600 km at perigee to 405,400 km at apogee.

When we look at the full moon, not only are we seeing the whole Earth-facing surface illuminated, but the central part of that light is reflecting straight up off the moon's surface. When we look at a crescent moon, we're seeing light that's near the moon's sunrise or sunset point -- dimmer and more spread out than the concentrated light of noon -- and in addition we're seeing less of it.

Venus, in contrast, varies its distance from us immensely. We can't see Venus when it's "full", because it's on the other side of the sun from us and lost in the sun's glow. It'll next be there a year from now, in April of 2013. But if we could see it when it's full, Venus would be a distant 1.7 AU from us. An AU is an Astronomical Unit, the average distance of the earth from the sun or about 89 million miles, so Venus when it's full is about 170 million miles away. Its disk is a tiny 9.9 arcseconds (an arcsecond is 1/3600 of a degree) -- about the size of Mars this month.

In contrast, when we look at the crescent Venus around the end of this month, although we're only seeing about 28% of its surface illuminated, and that only with glancing twilight rays, it's much closer to us -- less than half an AU, or about 45 million miles -- and its disk extends a huge 37 arcseconds, bigger than Jupiter this month.

Of course, eventually, as Venus pulls between us and the sun, its crescent gets so slim that even its huge size can't compensate. So its peak brightness happens when those two curves cross, when the disk is somewhere around 27% illuminated, as happens at the end of this month and the beginning of May.

Exactly when? Good question. The RASC Handbook says Venus' "greatest illuminated extent" is on April 30, but PyEphem and XEphem say Venus is actually brighter from May 3-8 ... and when it emerges from the sun's glare and moves into the morning sky in June, it'll be slightly brighter still, peaking at magnitude -4.8 in the first week of July.)

Tracking Venus with PyEphem

When I started my Shallow Sky column this month, I saw the notice of Venus's maximum brightness and greatest illuminated extent in the RASC Handbook. But I wanted more details -- how much did its distance and size really change, when would the brightness peak again as it emerged from the sun's glare, when would it next be "full"?

PyEphem made it easy to calculate all this. Just create an ephem.Venus() object, calculate its values for any date of interest, then print out parameters like phase, mag, earth_distance and size. In just a few minutes of programming, I had a nice table of Venus data.

import ephem

venus = ephem.Venus()

print '%10s   %6s %6s %6s %6s' % ('date', '%', 'mag', 'dist', 'size')
def print_venus(when) :
    venus.compute(when)
    fmt = '%02d-%02d-%02d   %6.2f %6.2f %6.2f %6.2f'
    trip = when.triple()
    print fmt % (trip[0], trip[1], trip[2],
                 venus.phase, venus.mag, venus.earth_distance, venus.size)

# Loop from the beginning of 2012 through the middle of 2013:
d = ephem.date('2012')
end_date = ephem.date('2013/6/1')
while d < end_date :
    print_venus(d)
    # Add a week:
    d = ephem.date(d + ephem.hour * 24)

I've found PyEphem very handy for calculations like this -- and it's great to be able to double-check listings in other publications.

Tags: , ,
[ 14:44 Apr 27, 2012    More science/astro | permalink to this entry | ]

Fri, 16 Mar 2012

Image manipulation in Python

Someone asked me about determining whether an image was "portrait" or "landscape" mode from a script.

I've long had a script for automatically rescaling and rotating images, using ImageMagick under the hood and adjusting automatically for aspect ratio. But the scripts are kind of a mess -- I've been using them for over a decade, and they started life as a csh script back in the pnmscale days, gradually added ImageMagick and jpegtran support and eventually got translated to (not very good) Python.

I've had it in the back of my head that I should rewrite this stuff in cleaner Python using the ImageMagick bindings, rather than calling its commandline tools. So the question today spurred me to look into that. I found that ImageMagick isn't the way to go, but PIL would be a fine solution for most of what I need.

ImageMagick: undocumented and inconstant

Ubuntu has a python-pythonmagick package, which I installed. Unfortunately, it has no documentation, and there seems to be no web documentation either. If you search for it, you find a few other people asking where the documentation is.

Using things like help(PythonMagick) and help(PythonMagick.Image), you can ferret out a few details, like how to get an image's size:

import PythonMagick
filename = 'img001.jpg'
img = PythonMagick.Image(filename)
size = img.size()
print filename, "is", size.width(), "x", size.height()

Great. Now what if you want to rescale it to some other size? Web searching found examples of that, but it doesn't work, as illustrated here:

>>> img.scale('1024x768')
>>> img.size().height()
640

The built-in help was no help:

>>> help(img.scale)
Help on method scale:

scale(...) method of PythonMagick.Image instance
    scale( (Image)arg1, (Geometry)arg2) -> None :
    
        C++ signature :
            void scale(Magick::Image {lvalue},Magick::Geometry)

So what does it want for (Geometry)? Strings don't seem to work, 2-tuples don't work, and there's no Geometry object in PythonMagick. By this time I was tired of guesswork. Can the Python Imaging Library do better?

PIL -- the Python Imaging Library

PIL, happily, does have documentation. So it was easy to figure out how to get an image's size:

from PIL import Image
im = Image.open(filename)
w = im.size[0]
h = im.size[1]
print filename, "is", w, "x", h
It was equally easy to scale it to half its original size, then write it to a file:
newim = im.resize((w/2, h/2))
newim.save("small-" + filename)

Reading EXIF

Wow, that's great! How about EXIF -- can you read that? Yes, PIL has a module for that too:

import PIL.ExifTags

exif = im._getexif()
for tag, value in exif.items():
    decoded = PIL.ExifTags.TAGS.get(tag, tag)
    print decoded, '->', value

There are other ways to read exif -- pyexiv2 seems highly regarded. It has documentation, a tutorial, and apparently it can even write EXIF tags.

If neither PIL nor pyexiv2 meets your needs, here's a Stack Overflow thread on other Python EXIF solutions, and here's another discussion of Python EXIF. But since you probably already have PIL, it's certainly an easy way to get started.

What about the query that started all this: how to find out whether an image is portrait or landscape? Well, the most important thing is the image dimensions themselves -- whether img.size[0] > img.size[1]. But sometimes you want to know what the camera's orientation sensor thought. For that, you can use this code snippet:

for tag, value in exif.items():
    decoded = PIL.ExifTags.TAGS.get(tag, tag)
    if decoded == 'Orientation':
        print decoded, ":", value
Then compare the number you get to this Exif Orientation table. Normal landscape-mode photos will be 1.

Given all this, have I actually rewritten resizeall and rotateall using PIL? Why, no! I'll put it on my to-do list, honest. But since the scripts are actually working fine (just don't look at the code), I'll leave them be for now.

Tags: , , , ,
[ 15:33 Mar 16, 2012    More programming | permalink to this entry | ]

Tue, 17 Jan 2012

Converting HTML pages into PDF

I've long wanted a way of converting my HTML presentation slides to PDF. Mostly because conference organizers tend to freak out at slides in any format other than Open/Libre Office, Powerpoint or PDF. Too many times, I've submitted a tarball of my slides and discovered they weren't even listed on the conference site. (I ask you, in what way is a tarball more difficult to deal with than an .odp file?) Slide-sharing websites also have a limited set of formats they'll accept.

A year or so ago, I added screenshot capability to my webkit-based presentation program, Preso, do "screenshots", but I really needed PDF, not images.

Now, creating PDF from HTML shouldn't be that hard. Every browser has a print function that can print to a PDF file. So why is it so hard to create PDF from HTML in any kind of scriptable way?

After much searching and experimenting, I finally found a Python code snippet that worked: XHTML to PDF using PyGTK4 Webkit from Alex Dong. It uses Python-Qt, not GTK, so I can't integrate it into my Preso app, but that's okay -- a separate tool is just as good.

(I struggled to write an equivalent in PyGTK, but gave up due to the complete lack of documentation of Python-Webkit-GTK, and not much more for gtk.PrintOperation(). QWebView's documentation may not be as complete as I'd like, but at least there is some.)

Printing from QtWebView to QPrinter

Here are the important things I learned about QWebView from fiddling around with Alex's code to adapt it to what I needed, which is printing a list of pages to sequentially numbered files:

Things I learned about QPrinter():

Anyway, it's a little hacky with that empirical zoom factor ... but it works! The program is here: qhtmlprint: convert HTML to PDF using Qt Webkit.

And it does produce reasonable PDF, with the text properly vectorized, not just raster screenshots of each page.

Printing the slides in the right order

Terrific -- now I can feed a list of slides to qhtmlprint and get a bunch of PDF files back. How do I print the right slides?

My slides are listed in order in an array inside a Javascript file, one per line. If I grep .html navigate.js, I get a list like this:

    "arduino.html",
    "img.html?pix/arduinos/arduino-clones.jpg",
    "getting_started.html",
    "img.html?pix/projects/led.jpg",
    //"blink.html",
    "arduino-ide.html",

To pass that to qhtmlprint, I only need to remove the commented-out lines (the ones with //) and strip off the quotes and commas. I can do that all in one command with a grep and sed pipeline:

qhtmlprint ` fgrep .html navigate.js  | grep -v // | sed -e 's/",/"/' -e 's/"//g' `

And voiaà! I have a bunch of fileNNN.pdf files.

Creating a multi-page slide deck

Okay, great! Now how do I stick those files all together into one slide deck I can submit to conference organizers?

That part's easy -- Ghostscript can do it.

gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=slidedeck.pdf -dBATCH file*.pdf

And now slidedeck.pdf contains my whole presentation, ready to go.

Tags: , ,
[ 12:16 Jan 17, 2012    More programming | permalink to this entry | ]

Sun, 08 Jan 2012

Parsing HTML in Python

I've been having (mis)adventures learning about Python's various options for parsing HTML.

Up until now, I've avoided doing any HTMl parsing in my RSS reader FeedMe. I use regular expressions to find the places where content starts and ends, and to screen out content like advertising, and to rewrite links. Using regexps on HTML is generally considered to be a no-no, but it didn't seem worth parsing the whole document just for those modest goals.

But I've long wanted to add support for downloading images, so you could view the downloaded pages with their embedded images if you so chose. That means not only identifying img tags and extracting their src attributes, but also rewriting the img tag afterward to point to the locally stored image. It was time to learn how to parse HTML.

Since I'm forever seeing people flamed on the #python IRC channel for using regexps on HTML, I figured real HTML parsing must be straightforward. A quick web search led me to Python's built-in HTMLParser class. It comes with a nice example for how to use it: define a class that inherits from HTMLParser, then define some functions it can call for things like handle_starttag and handle_endtag; then call self.feed(). Something like this:

from HTMLParser import HTMLParser

class MyFancyHTMLParser(HTMLParser):
  def fetch_url(self, url) :
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    link = response.geturl()
    html = response.read()
    response.close()
    self.feed(html)   # feed() starts the HTMLParser parsing

  def handle_starttag(self, tag, attrs):
    if tag == 'img' :
      # attrs is a list of tuples, (attribute, value)
      srcindex = self.has_attr('src', attrs)
      if srcindex < 0 :
        return   # img with no src tag? skip it
      src = attrs[srcindex][1]
      # Make relative URLs absolute
      src = self.make_absolute(src)
      attrs[srcindex] = (attrs[srcindex][0], src)

    print '<' + tag
    for attr in attrs :
      print ' ' + attr[0]
      if len(attr) > 1 and type(attr[1]) == 'str' :
        # make sure attr[1] doesn't have any embedded double-quotes
        val = attr[1].replace('"', '\"')
        print '="' + val + '"')
    print '>'

  def handle_endtag(self, tag):
    self.outfile.write('</' + tag.encode(self.encoding) + '>\n')

Easy, right? Of course there are a lot more details, but the basics are simple.

I coded it up and it didn't take long to get it downloading images and changing img tags to point to them. Woohoo! Whee!

The bad news about HTMLParser

Except ... after using it a few days, I was hitting some weird errors. In particular, this one:
HTMLParser.HTMLParseError: bad end tag: ''

It comes from sites that have illegal content. For instance, stories on Slate.com include Javascript lines like this one inside <script></script> tags:
document.write("<script type='text/javascript' src='whatever'></scr" + "ipt>");

This is technically illegal html -- but lots of sites do it, so protesting that it's technically illegal doesn't help if you're trying to read a real-world site.

Some discussions said setting self.CDATA_CONTENT_ELEMENTS = () would help, but it didn't.

HTMLParser's code is in Python, not C. So I took a look at where the errors are generated, thinking maybe I could override them. It was easy enough to redefine parse_endtag() to make it not throw an error (I had to duplicate some internal strings too). But then I hit another error, so I redefined unknown_decl() and _scan_name(). And then I hit another error. I'm sure you see where this was going. Pretty soon I had over 100 lines of duplicated code, and I was still getting errors and needed to redefine even more functions. This clearly wasn't the way to go.

Using lxml.html

I'd been trying to avoid adding dependencies to additional Python packages, but if you want to parse real-world HTML, you have to. There are two main options: Beautiful Soup and lxml.html. Beautiful Soup is popular for large projects, but the consensus seems to be that lxml.html is more error-tolerant and lighter weight.

Indeed, lxml.html is much more forgiving. You can't handle start and end tags as they pass through, like you can with HTMLParser. Instead you parse the HTML into an in-memory tree, like this:

  tree = lxml.html.fromstring(html)

How do you iterate over the tree? lxml.html is a good parser, but it has rather poor documentation, so it took some struggling to figure out what was inside the tree and how to iterate over it.

You can visit every element in the tree with

for e in tree.iter() :
  print e.tag

But that's not terribly useful if you need to know which tags are inside which other tags. Instead, define a function that iterates over the top level elements and calls itself recursively on each child.

The top of the tree itself is an element -- typically the <html></html> -- and each element has .tag and .attrib. If it contains text inside it (like a <p> tag), it also has .text. So to make something that works similarly to HTMLParser:

def crawl_tree(tree) :
  handle_starttag(tree.tag, tree.attrib)
  if tree.text :
    handle_data(tree.text)
  for node in tree :
    crawl_tree(node)
  handle_endtag(tree.tag)

But wait -- we're not quite all there. You need to handle two undocumented cases.

First, comment tags are special: their tag attribute, instead of being a string, is <built-in function Comment> so you have to handle that specially and not assume that tag is text that you can print or test against.

Second, what about cases like <p>Here is some <i>italicised</i> text.</p> ? in this case, you have the p tag, and its text is "Here is some ". Then the p has a child, the i tag, with text of "italicised". But what about the rest of the string, " text."?

That's called a tail -- and it's the tail of the adjacent i tag it follows, not the parent p tag that contains it. Confusing!

So our function becomes:

def crawl_tree(tree) :
  if type(tree.tag) is str :
    handle_starttag(tree.tag, tree.attrib)
    if tree.text :
      handle_data(tree.text)
    for node in tree :
      crawl_tree(node)
    handle_endtag(tree.tag)
  if tree.tail :
    handle_data(tree.tail)

See how it works? If it's a comment (tree.tag isn't a string), we'll skip everything -- except the tail. Even a comment might have a tail:
<p>Here is some <!-- this is a comment --> text we want to show.</p>
so even if we're skipping comment we need its tail.

I'm sure I'll find other gotchas I've missed, so I'm not releasing this version of feedme until it's had a lot more testing. But it looks like lxml.html is a reliable way to parse real-world pages. It even has a lot of convenience functions like link rewriting that you can use without iterating the tree at all. Definitely worth a look!

Tags: , ,
[ 15:04 Jan 08, 2012    More programming | permalink to this entry | ]

Thu, 29 Dec 2011

Plotting the Analemma

My SJAA planet-observing column for January is about the Analemma and the Equation of Time.

The analemma is that funny figure-eight you see on world globes in the middle of the Pacific Ocean. Its shape is the shape traced out by the sun in the sky, if you mark its position at precisely the same time of day over the course of an entire year.

The analemma has two components: the vertical component represents the sun's declination, how far north or south it is in our sky. The horizontal component represents the equation of time.

The equation of time describes how the sun moves relatively faster or slower at different times of year. It, too, has two components: it's the sum of two sine waves, one representing how the earth speeds up and slows down as it moves in its elliptical orbit, the other a function the tilt (or "obliquity") of the earth's axis compared to its orbital plane, the ecliptic.

[components of the Equation of time] The Wikipedia page for Equation of time includes a link to a lovely piece of R code by Thomas Steiner showing how the two components relate. It's labeled in German, but since the source is included, I was able to add English labels and use it for my article.

But if you look at photos of real analemmas in the sky, they're always tilted. Shouldn't they be vertical? Why are they tilted, and how does the tilt vary with location? To find out, I wanted a program to calculate the analemma.

Calculating analemmas in PyEphem

The very useful astronomy Python package PyEphem makes it easy to calculate the position of any astronomical object for a specific location. Install it with: easy_install pyephem for Python 2, or easy_install ephem for Python 3.

import ephem
observer = ephem.city('San Francisco')
sun = ephem.Sun()
sun.compute(observer)
print sun.alt, sun.az

The alt and az are the altitude and azimuth of the sun right now. They're printed as strings: 25:23:16.6 203:49:35.6 but they're actually type 'ephem.Angle', so float(sun.alt) will give you a number in radians that you can use for calculations.

Of course, you can specify any location, not just major cities. PyEphem doesn't know San Jose, so here's the approximate location of Houge Park where the San Jose Astronomical Association meets:

observer = ephem.Observer()
observer.name = "San Jose"
observer.lon = '-121:56.8'
observer.lat = '37:15.55'

You can also specify elevation, barometric pressure and other parameters.

So here's a simple analemma, calculating the sun's position at noon on the 15th of each month of 2011:

    for m in range(1, 13) :
        observer.date('2011/%d/15 12:00' % (m))
        sun.compute(observer)

I used a simple PyGTK window to plot sun.az and sun.alt, so once it was initialized, I drew the points like this:

    # Y scale is 45 degrees (PI/2), horizon to halfway to zenith:
    y = int(self.height - float(self.sun.alt) * self.height / math.pi)
    # So make X scale 45 degrees too, centered around due south.
    # Want az = PI to come out at x = width/2.
    x = int(float(self.sun.az) * self.width / math.pi / 2)
    # print self.sun.az, float(self.sun.az), float(self.sun.alt), x, y
    self.drawing_area.window.draw_arc(self.xgc, True, x, y, 4, 4, 0, 23040)

So now you just need to calculate the sun's position at the same time of day but different dates spread throughout the year.

[analemma in San Jose at noon clock time] And my 12-noon analemma came out almost vertical! Maybe the tilt I saw in analemma photos was just a function of taking the photo early in the morning or late in the afternoon? To find out, I calculated the analemma for 7:30am and 4:30pm, and sure enough, those were tilted.

But wait -- notice my noon analemma was almost vertical -- but it wasn't exactly vertical. Why was it skewed at all?

Time is always a problem

As always with astronomy programs, time zones turned out to be the hardest part of the project. I tried to add other locations to my program and immediately ran into a problem.

The ephem.Date class always uses UTC, and has no concept of converting to the observer's timezone. You can convert to the timezone of the person running the program with localtime, but that's not useful when you're trying to plot an analemma at local noon.

At first, I was only calculating analemmas for my own location. So I set time to '20:00', that being the UTC for my local noon. And I got the image at right. It's an analemma, all right, and it's almost vertical. Almost ... but not quite. What was up?

Well, I was calculating for 12 noon clock time -- but clock time isn't the same as mean solar time unless you're right in the middle of your time zone.

You can calculate what your real localtime is (regardless of what politicians say your time zone should be) by using your longitude rather than your official time zone:

    date = '2011/%d/12 12:00' % (m)
    adjtime = ephem.date(ephem.date(date) \
                    - float(self.observer.lon) * 12 / math.pi * ephem.hour)
    observer.date = adjtime

Maybe that needs a little explaining. I take the initial time string, like '2011/12/15 12:00', and convert it to an ephem.date. The number of hours I want to adjust is my longitude (in radians) times 12 divided by pi -- that's because if you go pi (180) degrees to the other side of the earth, you'll be 12 hours off. Finally, I have to multiply that by ephem.hour because ... um, because that's the way to add hours in PyEphem and they don't really document the internals of ephem.Date.

[analemma in San Jose at noon clock time] Set the observer date to this adjusted time before calculating your analemma, and you get the much more vertical figure you see here. This also explains why the morning and evening analemmas weren't symmetrical in the previous run.

This code is location independent, so now I can run my analemma program on a city name, or specify longitude and latitude.

PyEphem turned out to be a great tool for exploring analemmas. But to really understand analemma shapes, I had more exploring to do. I'll write about that, and post my complete analemma program, in the next article.

Tags: , , , , ,
[ 20:54 Dec 29, 2011    More science/astro | permalink to this entry | ]

Thu, 22 Dec 2011

Calculating the Solstice and shortest day

Today is the winter solstice -- the official beginning of winter.

The solstice is determined by the Earth's tilt on its axis, not anything to do with the shape of its orbit: the solstice is the point when the poles come closest to pointing toward or away from the sun. To us, standing on Earth, that means the winter solstice is the day when the sun's highest point in the sky is lowest.

You can calculate the exact time of the equinox using the handy Python package PyEphem. Install it with: easy_install pyephem for Python 2, or easy_install ephem for Python 3. Then ask it for the date of the next or previous equinox. You have to give it a starting date, so I'll pick a date in late summer that's nowhere near the solstice:

>>> ephem.next_solstice('2011/8/1')
2011/12/22 05:29:52
That agrees with my RASC Observer's Handbook: Dec 22, 5:30 UTC. (Whew!)

PyEphem gives all times in UTC, so, since I'm in California, I subtract 8 hours to find out that the solstice was actually last night at 9:30. If I'm lazy, I can get PyEphem to do the subtraction for me:

ephem.date(ephem.next_solstice('2011/8/1') - 8./24)
2011/12/21 21:29:52
I used 8./24 because PyEphem's dates are in decimal days, so in order to subtract 8 hours I have to convert that into a fraction of a 24-hour day. The decimal point after the 8 is to get Python to do the division in floating point, otherwise it'll do an integer division and subtract int(8/24) = 0.

The shortest day

The winter solstice also pretty much marks the shortest day of the year. But was the shortest day yesterday, or today? To check that, set up an "observer" at a specific place on Earth, since sunrise and sunset times vary depending on where you are. PyEphem doesn't know about San Jose, so I'll use San Francisco:

>>> import ephem
>>> observer = ephem.city("San Francisco")
>>> sun = ephem.Sun()
>>> for i in range(20,25) :
...   d = '2011/12/%i 20:00' % i
...   print d, (observer.next_setting(sun, d) - observer.previous_rising(sun, d)) * 24
2011/12/20 20:00 9.56007901422
2011/12/21 20:00 9.55920379754
2011/12/22 20:00 9.55932991847
2011/12/23 20:00 9.56045709446
2011/12/24 20:00 9.56258416496
I'm multiplying by 24 to get hours rather than decimal days.

So the shortest day, at least here in the bay area, was actually yesterday, 2011/12/21. Not too surprising, since the solstice wasn't that long after sunset yesterday.

If you look at the actual sunrise and sunset times, you'll find that the latest sunrise and earliest sunset don't correspond to the solstice or the shortest day. But that's all tied up with the equation of time and the analemma ... and I'll cover that in a separate article.

Tags: , , , ,
[ 11:28 Dec 22, 2011    More science/astro | permalink to this entry | ]

Wed, 16 Nov 2011

New trails, and new PyTopo 1.1 release

A new trail opened up above Alum Rock park! Actually a whole new open space preserve, called Sierra Vista -- with an extensive set of trails that go all sorts of interesting places.

Dave and I visit Alum Rock frequently -- we were married there -- so having so much new trail mileage is exciting. We tried to explore it on foot, but quickly realized the mileage was more suited to mountain bikes. Even with bikes, we'll be exploring this area for a while (mostly due to not having biked in far too long, so it'll take us a while to work up to that much riding ... a combination of health problems and family issues have conspired to keep us off the bikes).

Of course, part of the fun of discovering a new trail system is poring over maps trying to figure out where the trails will take us, then taking GPS track logs to study later to see where we actually went.

And as usual when uploading GPS track logs and viewing them in pytopo, I found some things that weren't working quite the way I wanted, so the session ended up being less about studying maps and more about hacking Python.

In the end, I fixed quite a few little bugs, improved some features, and got saved sites with saved zoom levels working far better.

Now, PyTopo 1.0 happened quite a while ago -- but there were two of us hacking madly on it at the time, and pinning down the exact time when it should be called 1.0 wasn't easy. In fact, we never actually did it. I know that sounds silly -- of all releases to not get around to, finally reaching 1.0? Nevertheless, that's what happened.

I thought about cheating and calling this one 1.0, but we've had 1.0 beta RPMs floating around for so long (and for a much earlier release) that that didn't seem right.

So I've called the new release PyTopo 1.1. It seems to be working pretty solidly. It's certainly been very helpful to me in exploring the new trails. It's great for cross-checking with Google Earth: the OpenCycleMap database has much better trail data than Google does, and pytopo has easy track log loading and will work offline, while Google has the 3-D projection aerial imagery that shows where trails and roads were historically (which may or may not correspond to where they decide to put the new trails). It's great to have both.

Anyway, here's the new PyTopo.

Tags: , , , ,
[ 20:59 Nov 16, 2011    More mapping | permalink to this entry | ]

Sun, 16 Oct 2011

Monitor an Arduino's serial output from Python

Debugging Arduino sensors can sometimes be tricky. While working on my Arduino sonar project, I found myself wanting to know what values the Arduino was reading from its analog port.

It's easy enough to print from the Arduino to its USB-serial line. First add some code like this in setup():

    Serial.begin(9600);
Then in loop(), if you just read the value "val":
    Serial.println(val);

Serial output from Python

That's all straightforward -- but then you need something that reads it on the PC side.

When you're using the Arduino Java development environment, you can set it up to display serial output in a few lines at the bottom of the window. But it's not terrifically easy to read there, and I don't want to be tied to the Java IDE -- I'm much happier doing my Arduino development from the command line. But then how do you read serial output when you're debugging? In general, you can use the screen program to talk to serial ports -- it's the tool of choice to log in to plug computers. For the Arduino, you can do something like this: screen /dev/ttyUSB0 9600

But I found that a bit fiddly for various reasons. And I discovered that it's easy to write something like this in Python, using the serial module.

You can start with something as simple as this:

import serial

ser = serial.Serial("/dev/ttyUSB0", 9600)
while True:
    print ser.readline()

Serial input as well as output

That worked great for debugging purposes. But I had another project (which I will write up separately) where I needed to be able to send commands to the Arduino as well as reading output it printed. How do you do both at once?

With the select module, you can monitor several file descriptors at once. If the user has typed something, send it over the serial line to the Arduino; if the Arduino has printed something, read it and display it for the user to see.

That loop looks like this:

while True :
    # Check whether the user has typed anything (timeout of .2 sec):
    inp, outp, err = select.select([sys.stdin, self.ser], [], [], .2)

    # If the user has typed anything, send it to the Arduino:
    if sys.stdin in inp :
        line = sys.stdin.readline()
        self.ser.write(line)

    # If the Arduino has printed anything, display it:
    if self.ser in inp :
line = self.ser.readline().strip()
print "Arduino:", line

Add in a loop to find the right serial port (the Arduino doesn't always show up on /dev/ttyUSB0) and a little error and exception handling, and I had a useful script that met all my Arduino communication needs: ardmonitor.

Tags: , , , ,
[ 20:27 Oct 16, 2011    More hardware | permalink to this entry | ]

Tue, 27 Sep 2011

Banishing errant tooltips

Every now and then I have to run a program that doesn't manage its tooltips well. I mouse over some button to find out what it does, a tooltip pops up -- but then the tooltip won't go away. Even if I change desktops, the tooltip follows me and stays up on all desktops. Worse, it's set to stay on top of all other windows, so it blocks anything underneath it.

The places where I see this happen most often are XEphem (probably as an artifact of the broken Motif libraries we're stuck with on Linux); Adobe's acroread (Acrobat Reader), though perhaps that's gotten better since I last used it; and Wine.

I don't use Wine much, but lately I've had to use it for a medical imaging program that doesn't seem to have a Linux equivalent (viewing PETscan data). Every button has a tooltip, and once a tooltip pops up, it never goes aawy. Eventually I might have five of six of these little floating windows getting in the way of whatever I'm doing on other desktops, until I quit the wine program.

So how does one get rid of errant tooltips littering your screen? Could I write an Xlib program that could nuke them?

Finding window type

First we need to know what's special about tooltip windows, so the program can identify them. First I ran my wine program and produced some sticky tooltips.

Once they were up, I ran xwininfo and clicked on a tooltip. It gave me a bunch of information about the windows size and location, color depth, etc. ... but the useful part is this:

  Override Redirect State: yes

In X, override-redirect windows are windows that are immune to being controlled by the window manager. That's why they don't go away when you change desktops, or move when you move the parent window.

So what if I just find all override-redirect windows and unmap (hide) them? Or would that kill too many innocent victims?

Python-Xlib

I thought I'd have to write my little app in C, since it's doing low-level Xlib calls. But no -- there's a nice set of Python bindings, python-xlib. The documentation isn't great, but it was still pretty easy to whip something up.

The first thing I needed was a window list: I wanted to make sure I could find all the override-redirect windows. Here's how to do that:

from Xlib import display

dpy = display.Display()
screen = dpy.screen()
root = screen.root
tree = root.query_tree()

for w in tree.children :
    print w

w is a Window (documented here). I see in the documentation that I can get_attributes(). I'd also like to know which window is which -- calling get_wm_name() seems like a reasonable way to do that. Maybe if I print them, those will tell me how to find the override-redirect windows:

for w in tree.children :
    print w.get_wm_name(), w.get_attributes()

Window type, redux

Examining the list, I could see that override_redirect was one of the attributes. But there were quite a lot of override-redirect windows. It turns out many apps, such as Firefox, use them for things like menus. Most of the time they're not visible. But you can look at w.get_attributes().map_state to see that.

So that greatly reduced the number of windows I needed to examine:

for w in tree.children :
    att = w.get_attributes()
    if att.map_state and att.override_redirect :
        print w.get_wm_name(), att

I learned that tooltips from well-behaved programs like Firefox tended to set wm_name to the contents of the tooltip. Wine doesn't -- the wine tooltips had an empty string for wm_name. If I wanted to kill just the wine tooltips, that might be useful to know.

But I also noticed something more important: the tooltip windows were also "transient for" their parent windows. Transient for means a temporary window popped up on behalf of a parent window; it's kept on top of its parent window, and goes away when the parent does.

Now I had a reasonable set of attributes for the windows I wanted to unmap. I tried it:

for w in tree.children :
    att = w.get_attributes()
    if att.map_state and att.override_redirect and w.get_wm_transient_for():
        w.unmap()

It worked! At least in my first test: I ran the wine program, made a tooltip pop up, then ran my killtips program ... and the tooltip disappeared.

Multiple tooltips: flushing the display

But then I tried it with several tooltips showing (yes, wine will pop up new tooltips without hiding the old ones first) and the result wasn't so good. My program only hid the first tooltip. If I ran it again, it would hide the second, and again for the third. How odd!

I wondered if there might be a timing problem. Adding a time.sleep(1) after each w.unmap() fixed it, but sleeping surely wasn't the right solution.

But X is asynchronous: things don't necessarily happen right away. To force (well, at least encourage) X to deal with any queued events it might have stacked up, you can call dpy.flush().

I tried adding that after each w.unmap(), and it worked. But it turned out I only need one

dpy.flush()
at the end of the program, just exiting. Apparently if I don't do that, only the first unmap ever gets executed by the X server, and the rest are discarded. Sounds like flush() is a good idea as the last line of any python-xlib program.

killtips will hide tooltips from well-behaved programs too. If you have any tooltips showing in Firefox or any GTK programs, or any menus visible, killtips will unmap them. If I wanted to make sure the program only attacked the ones generated by wine, I could add an extra test on whether w.get_wm_name() == "".

But in practice, it doesn't seem to be a problem. Well-behaved programs handle having their tooltips unmapped just fine: the next time you call up a menu or a tooltip, the program will re-map it.

Not so in wine: once you dismiss one of those wine tooltips, it's gone forever, at least until you quit and restart the program. But that doesn't bother me much: once I've seen the tooltip for a button and found out what that button does, I'm probably not going to need to see it again for a while.

So I'm happy with killtips, and I think it will solve the problem. Here's the full script: killtips.

Tags: , , , ,
[ 11:36 Sep 27, 2011    More programming | permalink to this entry | ]

Fri, 09 Sep 2011

Count characters or words in the X selection from Python

This post is, above all, a lesson in doing a web search first. Even when what you're looking for is so obscure you're sure no one else has wanted it. But the script I got out of it might turn out to be useful.

It started with using Bitlbee for Twitter. I love bitlbee -- it turns a Twitter stream into just another IRC channel tab in the xchat I'm normally running anyway.

The only thing I didn't love about bitlbee is that, unlike the twitter app I'd previously used, I didn't have any way to keep track of when I neared the 140-character limit. There were various ways around that, mostly involving pasting the text into other apps before submitting it. But they were all too many steps.

It occurred to me that one way around this was to select-all, then run something that would show me the number of characters in the X selection. That sounded like an easy app to write.

Getting the X selection from Python

I was somewhat surprised to find that Python has no way of querying the X selection. It can do just about everything else -- even simulate X events. But there are several command-line applications that can print the selection, so it's easy enough to run xsel or xclip from Python and read the output.

I ended up writing a little app that brings up a dialog showing the current count, then hangs around until you dismiss it, querying the selection once a second and updating the count. It's called countsel.

Of course, if you don't want to write a Python script you can use commandline tools directly. Here are a couple of examples, using xclip instead of xsel: xterm -title 'lines words chars' -geometry 25x2 -e bash -c 'xclip -o | wc; read -n 1' pops up a terminal showing the "wc" counts of the selection once, and xterm -title 'lines words chars' -geometry 25x1 -e watch -t 'xclip -o | wc' loops over those counts printing them once a second.

Binding commands to a key is different for every window manager. In Openbox, I added this to rc.xml to call up my program whenever I type W-t (short for Twitter):

    <keybind key="W-t">
      <action name="Execute">
        <execute>/home/akkana/bin/countsel</execute>
      </action>
    </keybind>

Now, any time I needed to check my character count, I could triple-click or type Shift-Home, then hit W-t to call up the dialog and get a count. Then I could leave the dialog up, and whenever I wanted a new count, just Shift-Home or triple-click again, and the dialog updates automatically. Not perfect, but not bad.

Xchat plug-in for a much more elegant solution

Only after getting countsel working did it occur to me to wonder if anyone else had the same Bitlbee+xchat+twitter problem. And a web search found exactly what I needed: xchat-inputcount.pl, a wonderful xchat script that adds a character-counter next to the input box as you're typing. It's a teensy bit buggy, but still, it's far better than my solution. I had no idea you could add user-interface elements to xchat like that!

But that's okay. Countsel didn't take long to write. And I've added word counting to countsel, so I can use it for word counts on anything I'm writing.

Tags: , ,
[ 12:32 Sep 09, 2011    More programming | permalink to this entry | ]

Wed, 31 Aug 2011

Read Excel XLS spreadsheets with Python

Someone mailed out information to a club I'm in as an .XLS file. Another Excel spreadsheet. Sigh.

I do know one way to read them. Fire up OpenOffice, listen to my CPU fan spin as I wait forever for the app to start up, open the xls file, then click in one cell after another as I deal with the fact that spreadsheet programs only show you a tiny part of the text in each cell. I'm not against spreadsheets per se -- they're great for calculating tables of interconnected numbers -- but they're a terrible way to read tabular data.

Over the years, lots of open-source programs like word2x and catdoc have sprung up to read the text in MS Word .doc files. Surely by now there must be something like that for XLS files?

Well, I didn't find any ready-made programs, but I found something better: Python's xlrd module, as well as a nice clear example at ScienceOSS of how to Read Excel files from Python.

Following that example, in six lines I had a simple program to print the spreadsheet's contents:

import xlrd

for filename in sys.argv[1:] :
    wb = xlrd.open_workbook(filename)
    for sheetname in wb.sheet_names() :
        sh = wb.sheet_by_name(sheetname)
        for rownum in range(sh.nrows) :
            print sh.row_values(rownum)

Of course, having gotten that far, I wanted better formatting so I could compare the values in the spreadsheet. Didn't take long to write, and the whole thing still came out under 40 lines: xlsrd. And I was able to read that XLS file that was mailed to the club, easily and without hassle.

I'm forever amazed at all the wonderful, easy-to-use modules there are for Python.

Tags: ,
[ 10:58 Aug 31, 2011    More programming | permalink to this entry | ]

Thu, 25 Aug 2011

Deleting email from a mail server with Python

How do you delete email from a mail server without downloading or reading it all?

Why? Maybe you got a huge load of spam and you need to delete it. Maybe you have your laptop set up to keep a copy of your mail on the server so you can get it on your desktop later ... but after a while you realize it's not worth downloading all that mail again. In my case, I use an ISP that keeps copies of all mail forwarded from one alias to another, so I periodically need to clean out the copies.

There are quite a few reasons you might want to delete mail without reading it ... so I was surprised to find that there didn't seem to be any easy way to do so.

But POP3 is a fairly simple protocol. How hard could it be to write a Python script to do what I needed?

Not hard at all, in fact. The poplib package does most of the work for you, encapsulating both the networking and the POP3 protocol. It even does SSL, so you don't have to send your password in the clear.

Once you've authenticated, you can list() messages, which gives you a status and a list of message numbers and sizes, separated by a space. Just loop through them and delete each one.

Here's a skeleton program to delete messages:

server = "mail.example.com"
port = 995
user = "myname"
passwd = "seekrit"

pop = poplib.POP3_SSL(server, port)
pop.user(user)
pop.pass_(passwd)

poplist = pop.list()
if poplist[0].startswith('+OK') :
    msglist = poplist[1]
    for msgspec in msglist :
        # msgspec is something like "3 3941", 
        # msg number and size in octets
        msgnum = int(msgspec.split(' ')[0])
        print "Deleting msg %d\r" % msgnum,
        pop.dele(msgnum)
    else :
        print "No messages for", user
else :
    print "Couldn't list messages: status", poplist[0]
pop.quit()

Of course, you might want to add more error checking, loop through a list of users, etc. Here's the full script: deletemail.

Tags: , ,
[ 17:41 Aug 25, 2011    More programming | permalink to this entry | ]

Fri, 19 Aug 2011

Beginning Python: Sorting lists of objects

The Beginning Python class has pretty much died down -- although there are still a couple of interested students posting really great homework solutions, I think most people have fallen behind, and it's time to wrap up the course.

So today, I didn't post a formal lesson. But I did have something to share about how I used Python's object-oriented capabilities to solve a problem I had copying new podcast files onto my MP3 player. I used Python's built-in list sort() function, along with the easy way it lets me define operators like < and > for any object I define.

You can read all about it in my post to the Courses list describing how I sorted my list of podcast objects. Or just go straight to the final program, pods.

Tags: , ,
[ 19:48 Aug 19, 2011    More education | permalink to this entry | ]

Fri, 12 Aug 2011

Beginning Python, Lesson 9: More extras

Lesson 9 in my online Python course is up: Lesson 9: Extras (requested topics), including string operations, web development and GUI toolkits.

The web development and GUI toolkits are topics which were requested by students, while the string ops are things that just seemed too useful not to include.

Tags: , ,
[ 17:45 Aug 12, 2011    More education | permalink to this entry | ]

Fri, 05 Aug 2011

Beginning Python, Lesson 8: Extras

Lesson 8 in my online Python course is up: Lesson 8: Extras, including exception handling, optional arguments, and running system commands. A motley collection of fun and useful topics that didn't quite fit anywhere in the earlier formal lessons, but you'll find a lot of use for them in writing real-world Python scripts. In the homework, I have some examples of some of my scripts using these techniques; I'm sure the students will have lots of interesting problems of their own.

Tags: , ,
[ 14:56 Aug 05, 2011    More education | permalink to this entry | ]

Sat, 30 Jul 2011

Beginning Python, Lesson 7: Object-oriented programming

Lesson 7 in my online Python course is up: Lesson 7: Object-oriented programming.

This is the last formal lesson in the Beginning Python class. But I will be posting a few more "tips and tricks" lessons, little things that didn't fit in other lessons plus suggestions for useful Python packages students may want to check out as they continue their Python hacking.

Tags: , ,
[ 10:28 Jul 30, 2011    More education | permalink to this entry | ]

Fri, 22 Jul 2011

Beginning Python, Lesson 6: Functions and Dictionaries

Lesson 6 in my online Python course is up: Lesson 6: Functions and Dictionaries.

We're getting near the end of the course -- partly because I think students may be saturated, though I may post one more lesson. I'll post on the list and see what the students think about it.

This afternoon, though, is pretty much booked up trying to get my mother's new Nook Touch e-book reader working with Linux. Would be easy ... except that she wants to be able to check out books from her local public library, which of course uses proprietary software from Adobe and other companies to do DRM. It remains to be seen if this will be possible ... of course, I'll post the results once we know.

Tags: , ,
[ 17:49 Jul 22, 2011    More education | permalink to this entry | ]

Fri, 15 Jul 2011

Beginning Python, Lesson 5

Lesson 5 in my online Python course is up: Infinite loops, modulo, and random numbers.

It's a motley mix of topics, mostly because I wanted to have a fun homework project that actually did something interesting. I hope everyone enjoys it!

Tags: , ,
[ 16:44 Jul 15, 2011    More education | permalink to this entry | ]

Fri, 08 Jul 2011

Beginning Python, Lesson 4

Lesson 4 in my online Python course is up: Modules and command-line arguments.

This lesson is a little longer than previous lessons, but that's partly because of a couple of digressions at the beginning. Hope I didn't overdo it! The homework includes an optional debugging problem for folks who want to dive a little deeper into this stuff.

Tags: , ,
[ 20:20 Jul 08, 2011    More education | permalink to this entry | ]

Sun, 03 Jul 2011

Beginning Python, Lesson 3: Strings and Lists

Lesson 3 in my online Python course is up: Fun with Strings and Lists.

There may be some backlog on the mailing list -- my first attempt to post the lesson didn't show up at all, but my second try made it. Mail seems to be flowing now, but if you try to post something and it doesn't show up, let me know or tell us on irc.linuxchix.org, so we know if there's a continuing problem that needs to be fixed, not just a one-time glitch.

Meanwhile, I'm having some trouble getting new blog entries posted. Due to some network glitches, I had to migrate shallowsky.com to a different ISP, and it turns out the PyBlosxom 1.4 I'd been using doesn't work with more recent versions of Python; but none of my PyBlosxom plug-ins work in 1.5. Aren't software upgrades a joy? So I'm getting lots of practice debugging other people's Python code trying to get the plug-ins updated, and there probably won't be many blog entries until I've figured that out.

Once that's all straightened out, I should have a cool new PyTopo feature to report on, as well as some Arduino hacks I've had on the back burner for a while.

Tags: , ,
[ 11:57 Jul 03, 2011    More education | permalink to this entry | ]

Fri, 24 Jun 2011

Beginning Python, Lesson 2 posted

I've just posted Lesson 2 in my online Python course, covering loops, if statements, and beer! You can read it in the list archives: Lesson 2: Loops, if, and beer, or, better, subscribe to the list so you can join the discussion.

I hope everybody has fun writing loops!

Tags: , ,
[ 16:10 Jun 24, 2011    More education | permalink to this entry | ]

Thu, 16 Jun 2011

Beginning Programming in Python course starting

I'm about to start a new LinuxChix course: Beginning Programming in Python.

It will be held on the Linuxchix Courses mailing list: to follow the course, subscribe to the list. Lessons will be posted weekly, on Fridays, with the first lesson starting tomorrow, Friday, June 17.

This is intended a short course, probably only 4-5 weeks to start with, aimed mostly at people who are new to programming. Though of course anyone is welcome, even if you've programmed before. And experienced programmers are welcome to hang out, lurk and help answer questions. I might extended the course if people are still interested and having fun.

The course is free (just subscribe to the mailing list) and open to both women and men. Standard LinuxChix rules apply: Be polite, be helpful. And do the homework. :-)

Tags: , ,
[ 09:51 Jun 16, 2011    More education | permalink to this entry | ]

Fri, 20 May 2011

Packaging Python for MeeGo (or other RPM-based distros)

Writing Python scripts for MeeGo is easy. But how do you package a Python script in an RPM other MeeGo users can install?

It turned out to be far easier than I expected. Python and Ubuntu had all the tools I needed.

First you'll need a .desktop file describing your app, if you don't already have one. This gives window managers the information they need to show your icon and application name so the user can run it. Here's the one I wrote for PyTopo: pytopo.desktop.

Of course, you'll also want a desktop icon. Most other applications on MeeGo seemed to use 48x48 pixel PNG images, so that's what I made, though it seems to be quite flexible -- an SVG is ideal.

With your script, desktop file and an icon, you're ready to create a package.

Create a setup.py file describing your package, as in the distutils simple example or the more detailed distutils setup script page. For a sample standalone script with a desktop file and icon, you can take a look at my PyTopo setup.py.

Starting from the Python setup script, Python's distutils can generate RPM or even Windows packages -- assuming you have the appropriate tools installed on your machine.

I'm on an Ubuntu (Debian-based) machine, and all the docs imply you have to be on an RPM-based distro to make an RPM. Happily, that's not true: Ubuntu has RPM tools you can install.

$ sudo apt-get install rpm

Then let Python do its thing:

$ python setup.py bdist_rpm

Python generates the spec file and everything else needed and builds a multiarch RPM that's ready to install on MeeGo. You can install it by copying it to the MeeGo device with scp dist/PyTopo-1.0-1.noarch.rpm meego@address.of.device:/tmp/. Then, as root on the device, install it with rpm -i /tmp/PyTopo-1.0-1.noarch.rpm. You're done!

To see a working example, you can browse my latest PyTopo source (only what's in SVN; it needs a few more tweaks before it's ready for a formal release). Or try the RPM I made for MeeGo: PyTopo-1.0-1.noarch.rpm. I'd love to hear whether this works on other RPM-based distros.

What about Debian packages?

Curiously, making a Debian package on Debian/Ubuntu is much less straightforward even if you're starting on a Debian/Ubuntu machine. Distutils can't do it on its own. There's a Debian Python package recipe, but it begins with a caution that you shouldn't use it for a package you want to submit. For that, you probably have to wade through the Complete Ubuntu Packaging Guide. Clearly, that will need a separate article.

Tags: , , , ,
[ 18:44 May 20, 2011    More programming | permalink to this entry | ]

Fri, 13 May 2011

Children of the Code -- Derived Python projects

I got some fun email today -- two different people letting me know about new projects derived from my Python code.

One is M-Poker, originally based on a PyQt tutorial I wrote for Linux Planet. Ville Jyrkkä has taken that sketch and turned it into a real poker program. And it uses PySide now -- the new replacement for PyQt, and one I need to start using for MeeGo development. So I'll be taking a look at M-Poker myself and maybe learning things from it. There are some screenshots on the blog A Hacker's Life in Finland.

The other project is xkemu, a Python module for faking keypresses, grown out of pykey, a Python version of my Crikey keypress generation program. xkemu-server.py looks like a neat project -- you can run it and send it commands to generate key presses, rather than just running a script each time.

(Sniff) My children are going out into the world and joining other projects. I feel so proud. :-)

Tags: ,
[ 21:04 May 13, 2011    More programming | permalink to this entry | ]

Mon, 18 Apr 2011

A simple Python mixer (to solve a problem with sound in Natty)

I had to buy a new hard drive recently, and figured as long as I had a new install ahead of me, why not try the latest Ubuntu 11.04 beta, "Natty Narwhal"?

One of the things I noticed right away was that sound was really LOUD! -- and my usual volume keys weren't working to change that.

I have a simple setup under openbox: Meta-F7 and Meta-F8 call a shell script called "louder" and "softer" (two links to the same script), and depending on how it's invoked, the script calls aumix -v +4 or aumix -v -4.

Great, except it turns out aumix doesn't work -- at all -- under Natty (bug 684416). Rumor has it that Natty has dropped all support for OSS sound, though I don't know if that's actually true -- the bug has been sitting for four months without anyone commenting on it. (Ubuntu never seems terribly concerned about having programs in their repositories that completely fail to do anything; sadly, programs can persist that way for years.)

The command-line replacement for aumix seems to be amixer, but its documentation is sketchy at best. After a bit of experimentation, I found if I set the Master volume to 100% using alsamixergui, I could call amixer set PCM 4- or 4-. But I couldn't use amixer set Master 4+ -- sometimes it would work but most of the time it wouldn't.

That all seemed a bit too flaky for me -- surely there must be a better way? Some magic Python library? Sure enough, there's python-alsaaudio, and learning how to use it took a lot less time than I'd already wasted trying random amixer commands to see what worked. Here's the program:

#!/usr/bin/env python
# Set the volume louder or softer, depending on program name.

import alsaaudio, sys, os

increment = 4

# First find a mixer. Use the first one.
try :
    mixer = alsaaudio.Mixer('Master', 0)
except alsaaudio.ALSAAudioError :
    sys.stderr.write("No such mixer\n")
    sys.exit(1)

cur = mixer.getvolume()[0]
if os.path.basename(sys.argv[0]).startswith("louder") :
    mixer.setvolume(cur + increment, alsaaudio.MIXER_CHANNEL_ALL)
else :
    mixer.setvolume(cur - increment, alsaaudio.MIXER_CHANNEL_ALL)
print "Volume from", cur, "to", mixer.getvolume()[0]

Tags: , , ,
[ 21:13 Apr 18, 2011    More programming | permalink to this entry | ]

Fri, 18 Mar 2011

Finding Twitter references to you

Twitter is a bit frustrating when you try to have conversations there. You say something, then an hour later, someone replies to you (by making a tweet that includes your Twitter @handle). If you're away from your computer, or don't happen to be watching it with an eagle eye right then -- that's it, you'll never see it again. Some Twitter programs alert you to @ references even if they're old, but many programs don't.

Wouldn't it be nice if you could be notified regularly if anyone replied to your tweets, or mentioned you?

Happily, you can. The Twitter API is fairly simple; I wrote a Python function a while back to do searches in my Twitter app "twit", based on a code snippet I originally cribbed from Gwibber. But if you take out all the user interface code from twit and use just the simple JSON code, you get a nice short app. The full script is here: twitref, but the essence of it is this:

import sys, simplejson, urllib, urllib2

def get_search_data(query):
    s = simplejson.loads(urllib2.urlopen(
            urllib2.Request("http://search.twitter.com/search.json",
                            urllib.urlencode({"q": query}))).read())
    return s

def json_search(query):
    for data in get_search_data(query)["results"]:
        yield data

if __name__ == "__main__" :
    for searchterm in sys.argv[1:] :
        print "**** Tweets containing", searchterm
        statuses = json_search(searchterm)
        for st in statuses :
            print st['created_at']
            print "<%s> %s" % (st['from_user'], st['text'])
            print ""

You can run twitref @yourname from the commandline now and then. You can even call it as a cron job and mail yourself the output, if you want to make sure you see replies. Of course, you can use it to search for other patterns too, like twitref #vss or twitref #scale9x.

You'll need the simplejson Python library, which most distros offer as a package; on Ubuntu, install python-simplejson.

It's unclear how long any of this will continue to be supported, since Twitter recently announced that they disapprove of third-party apps using their API. Oh, well ... if Twitter stops allowing outside apps, I'm not sure how interested I'll be in continuing to use it.

On the other hand, their original announcement on Google Groups seems to have been removed -- I was going to link to it here and discovered it was no longer there. So maybe Twitter is listening to the outcry and re-thinking their position.

Tags: , , ,
[ 10:53 Mar 18, 2011    More programming | permalink to this entry | ]

Thu, 10 Mar 2011

On Linux Planet: Plotting mail logs with CairoPlot

[Pie chart showing origins of spam]

My latest LinuxPlanet article is on plotting pretty graphs from Python with CairoPlot.

Of course, to demonstrate a graphing package I needed some data. So I decided to plot some stats parsed from my Postfix mail log file. We bounce a lot of mail (mostly spam but some false positives from mis-configured email servers) that comes in with bogus HELO addresses. So I thought I'd take a graphical look at the geographical sources of those messages.

The majority were from IPs that weren't identifiable at all -- no reverse DNS info. But after that, the vast majority turned out to be, surprisingly, from .il (Israel) and .br (Brazil).

Surprised me! What fun to get useful and interesting data when I thought I was just looking for samples for an article.

Tags: , , ,
[ 15:08 Mar 10, 2011    More programming | permalink to this entry | ]

Tue, 22 Feb 2011

Python for (Cartalk) Puzzlers

Last week's Car Talk had a fun puzzler called "Three Pieces of Paper":

Three different numbers are chosen at random, and one is written on each of three slips of paper. The slips are then placed face down on the table. The objective is to choose the slip upon which is written the largest number.

Here are the rules: You can turn over any slip of paper and look at the amount written on it. If for any reason you think this is the largest, you're done; you keep it. Otherwise you discard it and turn over a second slip. Again, if you think this is the one with the biggest number, you keep that one and the game is over. If you don't, you discard that one too.

What are the odds of winning? The obvious answer is one in three, but you can do better than that. After thinking about it a little I figured out the strategy pretty quickly (I won't spoil it here; follow the link above to see the answer). But the question was: how often does the correct strategy give you the answer?

It made for a good "things to think about when trying to fall asleep" insomnia game. And I mostly convinced myself that the answer was 50%. But probability problems are tricky beasts (witness the Monty Hall Problem, which even professional mathematicians got wrong) and I wasn't confident about it. Even after hearing Click and Clack describe the answer on this week's show, asserting that the answer was 50%, I still wanted to prove it to myself.

Why not write a simple program? That way I could run lots of trials and see if the strategy wins 50% of the time.

So here's my silly Python program:

#! /usr/bin/env python

# Cartalk puzzler Feb 2011

import random, time

random.seed()

tot = 0
wins = 0

while True:
    # pick 3 numbers:
    n1 = random.randint(0, 100)
    n2 = random.randint(0, 100)
    n3 = random.randint(0, 100)

    # Always look at but discard the first number.
    # If the second number is greater than the first, stick with it;
    # otherwise choose the third number.
    if n2 > n1 :
        final = n2
    else :
        final = n3

    biggest = max(n1, n2, n3)
    win = (final == biggest)
    tot += 1
    if win :
        wins += 1
    print "%4d %4d %4d %10d %10s %6d/%-6d = %10d%%" % (n1, n2, n3, final,
                                                       str(win),
                                                       wins, tot,
                                                       int(wins*100/tot))
    if tot % 1000 == 0:
        print "(%d ...)" % tot
        time.sleep(1)

It chooses numbers between 0 and 100, for no particular reason; I could randomize that, but it wouldn't matter to the result. I made it print out all the outcomes, but pause for a second after every thousand trials ... otherwise the text scrolls too fast to read.

And indeed, the answer converges very rapidly to 50%. Hurray!

After I wrote the script, I checked Car Talk's website. They have a good breakdown of all the possible outcomes and how they map to a probability. Of course, I could have checked that first, before writing the program. But I was thinking about this in the car while driving home, with no access to the web ... and besides, isn't it always more fun to prove something to yourself than to take someone else's word for it?

Tags: , , ,
[ 21:17 Feb 22, 2011    More programming | permalink to this entry | ]

Fri, 18 Feb 2011

New GIMP Arrow Designer

[arrow] While writing a blog post on GIMP's confusing Auto button (to be posted soon), I needed some arrows, and discovered a bug in my Arrow Designer script when making arrows that are mostly vertical.

So I fixed it. You can get the new Arrow Designer 0.5 on my GIMP Arrow Designer page.

It's purely a coincidence that I discovered this a week before SCALE, where I'll be speaking on Writing GIMP Scripts and Plug-Ins. Arrow Designer is one of my showpieces for making interactive plug-ins with GIMP-Python, so I'm glad I noticed the bug when I did.

Tags: , ,
[ 21:28 Feb 18, 2011    More gimp | permalink to this entry | ]

Mon, 31 Jan 2011

Feedme 0.7

[FeedMe, Seymour!] I've been enjoying my Android tablet e-reader for a couple of months now ... and it's made me realize some of the shortcomings in FeedMe. So of course I've been making changes along the way -- quite a few of them, from handling multiple output file types (html, plucker, ePub or FictionBook) to smarter handling of start, end and skip patterns to a different format of the output directory.

It's been fairly solid for a few weeks now, so it's time to release ... FeedMe 0.7.

Tags: , , ,
[ 22:32 Jan 31, 2011    More programming | permalink to this entry | ]

Tue, 18 Jan 2011

X Terminal Colors (and dark and light backgrounds)

[Displaying colors in an xterm] At work, I'm testing some web programming on a server where we use a shared account -- everybody logs in as the same user. That wouldn't be a problem, except nearly all Linuxes are set up to use colors in programs like ls and vim that are only readable against a dark background. I prefer a light background (not white) for my terminal windows.

How, then, can I set things up so that both dark- and light-backgrounded people can use the account? I could set up a script that would set up a different set of aliases and configuration files, like when I changed my vim colors. Better, I could fix all of them at once by changing my terminal's idea of colors -- so when the remote machine thinks it's feeding me a light color, I see a dark one.

I use xterm, which has an easy way of setting colors: it has a list of 16 colors defined in X resources. So I can change them in ~/.Xdefaults.

That's all very well. But first I needed a way of seeing the existing colors, so I knew what needed changing, and of testing my changes.

Script to show all terminal colors

I thought I remembered once seeing a program to display terminal colors, but now that I needed one, I couldn't find it. Surely it should be trivial to write. Just find the escape sequences and write a script to substitute 0 through 15, right?

Except finding the escape sequences turned out to be harder than I expected. Sure, I found them -- lots of them, pages that conflicted with each other, most giving sequences that didn't do anything visible in my xterm.

Eventually I used script to capture output from a vim session to see what it used. It used <ESC>[38;5;Nm to set color N, and <ESC>[m to reset to the default color. This more or less agreed Wikipedia's ANSI escape code page, which says <ESC>[38;5; does "Set xterm-256 text coloor" with a note "Dubious - discuss". The discussion says this isn't very standard. That page also mentions the simpler sequence <ESC>[0;Nm to set the first 8 colors.

Okay, so why not write a script that shows both? Like this:

#! /usr/bin/env python

# Display the colors available in a terminal.

print "16-color mode:"
for color in range(0, 16) :
    for i in range(0, 3) :
        print "\033[0;%sm%02s\033[m" % (str(color + 30), str(color)),
    print

# Programs like ls and vim use the first 16 colors of the 256-color palette.
print "256-color mode:"
for color in range(0, 256) :
    for i in range(0, 3) :
        print "\033[38;5;%sm%03s\033[m" % (str(color), str(color)),
    print

Voilà! That shows the 8 colors I needed to see what vim and ls were doing, plus a lovely rainbow of other possible colors in case I ever want to do any serious ASCII graphics in my terminal.

Changing the X resources

The next step was to change the X resources. I started by looking for where the current resources were set, and found them in /etc/X11/app-defaults/XTerm-color:

$ grep color /etc/X11/app-defaults/XTerm-color
irrelevant stuff snipped
*VT100*color0: black
*VT100*color1: red3
*VT100*color2: green3
*VT100*color3: yellow3
*VT100*color4: blue2
*VT100*color5: magenta3
*VT100*color6: cyan3
*VT100*color7: gray90
*VT100*color8: gray50
*VT100*color9: red
*VT100*color10: green
*VT100*color11: yellow
*VT100*color12: rgb:5c/5c/ff
*VT100*color13: magenta
*VT100*color14: cyan
*VT100*color15: white
! Disclaimer: there are no standard colors used in terminal emulation.
! The choice for color4 and color12 is a tradeoff between contrast, depending
! on whether they are used for text or backgrounds.  Note that either color4 or
! color12 would be used for text, while only color4 would be used for a
! Originally color4/color12 were set to the names blue3/blue
!*VT100*color4: blue3
!*VT100*color12: blue
!*VT100*color4: DodgerBlue1
!*VT100*color12: SteelBlue1

So all I needed to do was take the ones that don't show up well -- yellow, green and so forth -- and change them to colors that work better, choosing from the color names in /etc/X11/rgb.txt or my own RGB values. So I added lines like this to my ~/.Xdefaults:

!! color2 was green3
*VT100*color2: green4
!! color8 was gray50
*VT100*color8: gray30
!! color10 was green
*VT100*color10: rgb:00/aa/00
!! color11 was yellow
*VT100*color11: dark orange
!! color14 was cyan
*VT100*color14: dark cyan
... and so on.

Now I can share accounts, and I no longer have to curse at those default ls and vim settings!

Update: Tip from Mikachu: ctlseqs.txt is an excellent reference on terminal control sequences.


Tags: , , , , ,
[ 10:56 Jan 18, 2011    More linux | permalink to this entry | ]

Tue, 04 Jan 2011

Fontasia v 0.5

[Fontasia, a font viewer and categorizer] I had a nice relaxing holiday season. A little too relaxing -- I didn't get much hacking done, and spent more time fighting with things that didn't work than making progress fixing things.

But I did spend quite a bit of time with my laptop, currently running Arch Linux, trying to get the fonts to work as well as they do in Ubuntu. I don't have a definite solution yet to my Arch font issues, but all the fiddling with fonts did lead me to realize that I needed an easier way to preview specific fonts in bold.

So I added Bold and Italic buttons to fontasia, and called it Fontasia 0.5. I'm finding it quite handy for previewing all my fixed-width fonts while trying to find one emacs can display.

Tags: , ,
[ 23:00 Jan 04, 2011    More programming | permalink to this entry | ]

Sat, 30 Oct 2010

New versions of mapping programs: Pytopo and Ellie

[pytopo logo] On our recent Mojave trip, as usual I spent some of the evenings reviewing maps and track logs from some of the neat places we explored.

There isn't really any existing open source program for offline mapping, something that works even when you don't have a network. So long ago, I wrote Pytopo, a little program that can take map tiles from a Windows program called Topo! (or tiles you generate yourself somehow) and let you navigate around in that map.

But in the last few years, a wonderful new source of map tiles has become available: OpenStreetMap. On my last desert trip, I whipped up some code to show OSM tiles, but a lot of the code was hacky and empirical because I couldn't find any documentation for details like the tile naming scheme.

Well, that's changed. Upon returning to civilization I discovered there's now a wonderful page explaining the Slippy map tilenames very clearly, with sample code and everything. And that was the missing piece -- from there, all the things I'd been missing in pytopo came together, and now it's a useful self-contained mapping script that can download its own tiles, and cache them so that when you lose net access, your maps don't disappear along with everything else.

Pytopo can show GPS track logs and waypoints, so you can see where you went as well as where you might want to go, and whether that road off to the right actually would have connected with where you thought you were heading.

It's all updated in svn and on the Pytopo page.

Ellie

[Ellie icon]

Most of the pytopo work came after returning from the desert, when I was able to google and find that OSM tile naming page. But while still out there and with no access to the web, I wanted to review the track logs from some of our hikes and see how much climbing we'd done. I have a simple package for plotting elevation from track logs, called Ellie. But when I ran it, I discovered that I'd never gotten around to installing the pylab Python plotting package (say that three times fast!) on this laptop.

No hope of installing the package without a net ... so instead, I tweaked Ellie so that so that without pylab you can still print out statistics like total climb. While I was at it I added total distance, time spent moving and time spent stopped. Not a big deal, but it gave me the numbers I wanted. It's available as ellie 0.3.

Tags: , , , ,
[ 19:24 Oct 30, 2010    More mapping | permalink to this entry | ]

Fri, 15 Oct 2010

Snakes on a Couch! Using Python with CouchDB

Part II of my CouchDB tutorial is out at Linux Planet. In it, I use Python and CouchDB to write a simple application that keeps track of which restaurants you've been to recently, and to suggest new places to eat where you haven't been.

Snakes on a Couch, Part 2: Where do you want to eat?

Tags: , , , ,
[ 21:00 Oct 15, 2010    More writing | permalink to this entry | ]

Thu, 23 Sep 2010

Snakes on a Couch! Using Python with CouchDB

I've been learning CouchDB, the hot NoSQL database, as part of my new job. It's interesting -- a very different mindset compared to classic databases like MySQL.

There's a fairly good Python package for it, python-couchdb ... but the documentation is somewhat incomplete and there's very little else written about it, and virtually no sample code to steal.

That makes it a perfect topic for a Linux Planet tutorial! So here it is, Part 1:

Snakes on a Couch! Using Python with CouchDB.

I have a rather fun application for the database I introduce in the article, but you'll have to wait until Part 2, two weeks from now, to see the details.

Tags: , , , ,
[ 11:55 Sep 23, 2010    More writing | permalink to this entry | ]

Fri, 03 Sep 2010

Fontasia v 0.3

A couple of weeks ago I posted about fontasia, my new font-chooser app. [Fontasia: font viewer/categorizer It's gone through a couple of revisions since then, and Mikael Magnusson contributed several excellent improvements, like being able to render each font in the font list.

I'd been holding off on posting 0.3, hoping to have time to do something about the font buttons -- they really need to be smaller, so there's space for more categories. But between a new job and several other commitments, I haven't had time to implement that. And the fancy font list is so cool it really ought to be shared.

So here it is: fontasia 0.3.

Tags: , ,
[ 10:31 Sep 03, 2010    More programming | permalink to this entry | ]

Tue, 17 Aug 2010

Fontasia: View and categorize your fonts

[Fontasia: font viewer/categorizer We were talking about fonts again on IRC, and how there really isn't any decent font viewer on Linux that lets you group fonts into categories.

Any time you need to choose a font -- perhaps you know you need one that's fixed-width, script, cartoony, western-themed -- you have to go through your entire font list, clicking one by one on hundreds of fonts and saving the relevant ones somehow so you can compare them later. If you have a lot of fonts installed, it can take an hour or more to choose the right font for a project.

There's a program called fontypython that does some font categorization, but it's hard to use: it doesn't operate on your installed fonts, only on fonts you copy into a special directory. I never quite understood that; I want to categorize the fonts I can actually use on my system.

I've been wanting to write a font categorizer for a long time, but I always trip up on finding documentation on getting Python to render fonts. But this time, when I googled, I found jan bodnar's ZetCode Pango tutorial, which gave me all I needed and I was off and running.

Fontasia is initially a font viewer. It shows all your fonts in a list on the left, with a preview on the right. But it also lets you add categories: just type the category name in the box and click Add category and a button for that category will appear, with the current font added to it. A font can be in multiple categories.

Once you've categorized your fonts, a menu at the top of the window lets you show just the fonts in a particular category. So if you're working on a project that needs a Western-style font, show that category and you'll see only relevant fonts.

You can also show only the fonts you've categorized -- that way you can exclude fonts you never use -- I don't speak Tamil or Urdu so I don't really need to see those fonts when I'm choosing a font. Or you can show only the uncategorized fonts: this is useful when you add some new fonts to your system and need to go through them and categorize them.

I'm excited about fontasia. It's only a few days old and already used it several times for real-world font selection problems.

If you want to try it, it's here: Fontasia: View and categorize fonts.

Tags: , ,
[ 12:20 Aug 17, 2010    More programming | permalink to this entry | ]

Sat, 10 Jul 2010

Interactive arrow design in GIMP

How many times have you wanted an easy way of making arrows in GIMP?

I need arrows all the time, for screenshots and diagrams. And there really isn't any easy way to do that in GIMP. There's a script-fu for making arrows in the Plug-in registry, but it's fiddly and always takes quite a few iterations to get it right. More often, I use a collection of arrow brushes I downloaded from somewhere -- I can't remember exactly where I got my collection, but there are lots of options if you google gimp arrow brushes -- then use the free rotate tool to rotate the arrow in the right direction.

[GIMP Arrow Designer] The topic of arrows came up again on #gimp yesterday, and Alexia Death mentioned her script-fu in GIMP Fx Foundary that "abuses the selection" to make shapes, like stars and polygons. She suggested that it would be easy to make arrows the same way, using the current selection as a guide to where the arrow should go.

And that got me thinking about Joao Bueno's neat Python plug-in demo that watches the size of the selection and updates a dialog every time the selection changes. Why not write an interactive Python script that monitors the selection and lets you change the arrow by changing the size of the selection, while fine-tuning the shape and size of the arrowhead interactively via a dialog?

Of course I had to write it. And it works great! I wish I'd written this five years ago.

This will also make a great demo for my OSCON 2010 talk on Writing GIMP Scripts and Plug-ins, Thursday July 22. I wish I'd had it for Libre Graphics Meeting last month.

It's here: GIMP Arrow Designer.

Tags: , , ,
[ 11:25 Jul 10, 2010    More gimp | permalink to this entry | ]

Fri, 16 Apr 2010

Tee in Python

I needed a way to send the output of a Python program to two places simultaneously: print it on-screen, and save it to a file.

Normally I'd use the Linux command tee for that: prog | tee prog.out saves a copy of the output to the file prog.out as well as printing it. That worked fine until I added something that needed to prompt the user for an answer. That doesn't work when you're piping through tee: the output gets buffered and doesn't show up when you need it to, even if you try to flush() it explicitly.

I investigated shell-based solutions: the output I need is on sterr, while Python's raw_input() user prompt uses stdout, so if I could get the shell to send stderr through tee without stdout, that would have worked. My preferred shell, tcsh, can't do this at all, but bash supposedly can. But the best examples I could find on the web, like the arcane prog 2>&1 >&3 3>&- | tee prog.out 3>&- didn't work.

I considered using /dev/tty or opening a pty, but those calls only work on Linux and Unix and the program is otherwise cross-platform.

What I really wanted was a class that acts like a standard Python file object, but when you write to it it writes to two places: the log file and stderr.

I found an example of someone trying to write a Python tee class, but it didn't work: it worked for write() but not for print >>

I am greatly indebted to KirkMcDonald of #python for finding the problem. In the Python source implementing >>, PyFile_WriteObject (line 2447) checks the object's type, and if it's subclassed from the built-in file object, it writes directly to the object's fd instead of calling write().

The solution is to use composition rather than inheritance. Don't make your file-like class inherit from file, but instead include a file object inside it. Like this:

import sys

class tee :
    def __init__(self, _fd1, _fd2) :
        self.fd1 = _fd1
        self.fd2 = _fd2

    def __del__(self) :
        if self.fd1 != sys.stdout and self.fd1 != sys.stderr :
            self.fd1.close()
        if self.fd2 != sys.stdout and self.fd2 != sys.stderr :
            self.fd2.close()

    def write(self, text) :
        self.fd1.write(text)
        self.fd2.write(text)

    def flush(self) :
        self.fd1.flush()
        self.fd2.flush()

stderrsav = sys.stderr
outputlog = open(logfilename, "w")
sys.stderr = tee(stderrsav, outputlog)

And it works! print >>sys.stderr, "Hello, world" now goes to the file as well as stderr, and raw_input still works to prompt the user for input.

In general, I'm told, it's not safe to inherit from Python's built-in objects like file, because they tend to make assumptions instead of making virtual calls to your overloaded methods. What happened here will happen for other objects too. So use composition instead when extending Python's built-in types.

Tags: ,
[ 09:48 Apr 16, 2010    More programming | permalink to this entry | ]

Fri, 08 Jan 2010

Python-GTK regression: How to catch mouse button release

We just had the second earthquake in two days, and I was chatting with someone about past earthquakes and wanted to measure the distance to some local landmarks. So I fired up PyTopo as the easiest way to do that. Click on one point, click on a second point and it prints distance and bearing from the first point to the second.

Except it didn't. In fact, clicks weren't working at all. And although I have hacked a bit on parts of pytopo (the most recent project was trying to get scaling working properly in tiles imported from OpenStreetMap), the click handling isn't something I've touched in quite a while.

It turned out that there's a regression in PyGTK: mouse button release events now need you to set an event mask for button presses as well as button releases. You need both, for some reason. So you now need code that looks like this:

drawing_area.connect("button-release-event", button_event)
drawing_area.set_events(gtk.gdk.EXPOSURE_MASK |
                        # next line wasn't needed before:
                        gtk.gdk.BUTTON_PRESS_MASK |
                        gtk.gdk.BUTTON_RELEASE_MASK )

An easy fix ... once you find it.

I filed bug 606453 to see whether the regression was intentional.

I've checked in the fix to the PyTopo svn repository on Google Code. It's so nice having a public source code repository like that! I'm planning to move Pho to Google Code soon.

Tags: , , , ,
[ 14:20 Jan 08, 2010    More programming | permalink to this entry | ]

Wed, 25 Nov 2009

Character Sets and Encodings in Linux, part 2

Continuing the discussion of those funny characters you sometimes see in email or on web pages, today's Linux Planet article discusses how to convert and handle encoding errors, using Python or the command-line tool recode:

Mastering Characters Sets in Linux (Weird Characters, part 2).

Tags: , , , , , , ,
[ 15:06 Nov 25, 2009    More writing | permalink to this entry | ]

Wed, 11 Nov 2009

Building a Py-Webkit-GTK presentation tool

I almost always write my presentation slides using HTML. Usually I use Firefox to present them; it's the browser I normally run, so I know it's installd and the slides all work there. But there are several disadvantages to using Firefox:

Last year, when I was researching lightweight browsers, one of the ones that impressed me most was something I didn't expect: the demo app that comes with pywebkitgtk (package python-webkit on Ubuntu). In just a few lines of Python, you can create your own browser with any UI you like, with a fully functional content area. Their current demo even has tabs.

So why not use pywebkitgtk to create a simple fullscreen webkit-based presentation tool?

It was even simpler than I expected. Here's the code:

#!/usr/bin/env python
# python-gtk-webkit presentation program.
# Copyright (C) 2009 by Akkana Peck.
# Share and enjoy under the GPL v2 or later.

import sys
import gobject
import gtk
import webkit

class WebBrowser(gtk.Window):
    def __init__(self, url):
        gtk.Window.__init__(self)
        self.fullscreen()

        self._browser= webkit.WebView()
        self.add(self._browser)
        self.connect('destroy', gtk.main_quit)

        self._browser.open(url)
        self.show_all()

if __name__ == "__main__":
    if len(sys.argv) <= 1 :
        print "Usage:", sys.argv[0], "url"
        sys.exit(0)

    gobject.threads_init()
    webbrowser = WebBrowser(sys.argv[1])
    gtk.main()

That's all! No navigation needed, since the slides include javascript navigation to skip to the next slide, previous, beginning and end. It does need some way to quit (for now I kill it with ctrl-C) but that should be easy to add.

Webkit and image buffering

It works great. The only problem is that webkit's image loading turns out to be fairly poor compared to Firefox's. In a presentation where most slides are full-page images, webkit clears the browser screen to white, then loads the image, creating a noticable flash each time. Having the images in cache, by stepping through the slide show then starting from the beginning again, doesn't help much (these are local images on disk anyway, not loaded from the net). Firefox loads the same images with no flash and no perceptible delay.

I'm not sure if there's a solution. I asked some webkit developers and the only suggestion I got was to rewrite the javascript in the slides to do image preloading. I'd rather not do that -- it would complicate the slide code quite a bit solely for a problem that exists only in one library.

There might be some clever way to hack double-buffering in the app code. Perhaps something like catching the 'load-started' signal, switching to another gtk widget that's a static copy of the current page (if there's a way to do that), then switching back on 'load-finished'.

But that will be a separate article if I figure it out. Ideas welcome!

Update, years later: I've used this for quite a few real presentations now. Of course, I keep tweaking it: see my scripts page for the latest version.

Tags: , , , ,
[ 17:12 Nov 11, 2009    More programming | permalink to this entry | ]

Tue, 20 Oct 2009

Gathering RSS files for a Palm PDA: FeedMe

For years I've been reading daily news feeds on a series of PalmOS PDAs, using a program called Sitescooper that finds new pages on my list of sites, downloads them, then runs Plucker to translate them into Plucker's open Palm-compatible ebook format.

Sitescooper has an elaborate series of rules for trying to get around the complicated formatting in modern HTML web pages. It has an elaborate cache system to figure out what it's seen before. When sites change their design (which most news sites seem to do roughly monthly), it means going in and figuring out the new format and writing a new Sitescooper site file. And it doesn't understand RSS, so you can't use the simplified RSS that most sites offer. Finally, it's no longer maintained; in fact, I was the last maintainer, after the original author lost interest.

Several weeks ago, bma tweeted about a Python RSS reader he'd hacked up using the feedparser package. His reader targeted email, not Palm, but finding out about feedparser was enough to get me started. So I wrote FeedMe (Carla Schroder came up with the all-important name).

I've been using it for a couple of weeks now and I'm very happy with the results. It's still quite rough, of course, but it's already producing better files than Sitescooper did, and it seems more maintainable. Time will tell.

Of course it needs to be made more flexible, adjusted so that it can produce formats besides Plucker, and so on. I'll get to it.

And the only site I miss now, because it doesn't offer an RSS feed, is Linux Planet. Maybe I'll find a solution for that eventually.

Tags: , , , ,
[ 21:08 Oct 20, 2009    More programming | permalink to this entry | ]

Mon, 03 Aug 2009

Twit: Now with pattern searches

During OSCON a couple of weeks ago, I kept wishing I could do Twitter searches for a pattern like #oscon in a cleaner way than keeping a tab open in Firefox where I periodically hit Refresh.

Python-twitter doesn't support searches, alas, though it is part of the Twitter API. There's an experimental branch of python-twitter with searching, but I couldn't get it to work. But it turns out Gwibber is also written in Python, and I was able to lift some JSON code from Gwibber to implement a search. (Gwibber itself, alas, doesn't work for me: it bombs out looking for the Gnome keyring. Too bad, looks like it might be a decent client.)

I hacked up a "search for OSCON" program and used it a little during the week of the conference, then got home and absorbed in catching up and preparing for next week's GetSET summer camp, where I'm running an astronomy workshop and a Javascript workshop for high school girls. That's been keeping me frazzled, but I found a little time last night to clean up the search code and release Twit 0.3 with search and a few other new command-line arguments.

No big deal, but it was nice to take a hacking break from all this workshop coordinating. I'm definitely happier program than I am organizing events, that's for sure.

Tags: , ,
[ 18:23 Aug 03, 2009    More programming | permalink to this entry | ]

Thu, 09 Jul 2009

Twittering -- and writing Twitter clients

I finally dragged myself into 2009 and tried Twitter.

I'd been skeptical, but it's actually fairly interesting and not that much of a time sink. While it's true that some people tweet about every detail of their lives -- "I'm waiting for a bus" / "Oh, hooray, the bus is finally here" / "I got a good seat in the second row of the bus" / "The bus just passed Second St. and two kids got on" / "Here's a blurry photo from my phone of the Broadway Av. sign as we pass it" -- it's easy enough to identify those people and un-follow them.

And there are tons of people tweeting about interesting stuff. It's like a news ticker, but customizable -- news on the latest protests in Iran, the latest progress on freeing the Mars Spirit Rover, the latest interesting publication on dinosaur fossils, and what's going on at that interesting conference halfway around the world.

The trick is to figure out how you want the information delivered. I didn't want to have to leave a tab open in Firefox all the time. There was an xchat plug-in that sounded perfect -- I have an xchat window up most of the time I'm online -- but it turned out it works by picking one of the servers you're connected to, making a private channel and posting things there. That seemed abusive to the server -- what if everyone on Freenode did that?

So I wanted a separate client. Something lightweight and simple. Unfortunately, all the Twitter clients available for Linux either require that I install a lot of infrastructure first (either Adobe Air or Mono), or they just plain didn't work (a Twitter client where you can't click on links? Come on!)

But then I tried out the Python-Twitter bindings, and they were so easy to use I decided to write them up for my next Linux Planet article, which came out today: Write Your Own Linux Twitter Client In Less Time Than It Takes To Find One!.

The article shows how to use the bindings to write a bare-bones client. But of course, I've been hacking on the client all along, so the one I'm actually using has a lot more features like *ahem* letting you click on links. And letting you block threads, though I haven't actually tested that since I haven't seen any threads I wanted to block since my first day.

You can download the current version of Twit, and anyone who's interested can follow me on Twitter. I don't promise to be interesting -- that's up to you to decide -- but I do promise not to tweet about every block of my bus ride.

Tags: , , ,
[ 16:09 Jul 09, 2009    More writing | permalink to this entry | ]

Sat, 20 Jun 2009

Pytopo 0.8 released

On my last Mojave trip, I spent a lot of the evenings hacking on PyTopo.

I was going to try to stick to OpenStreetMap and other existing mapping applications like TangoGPS, a neat little smartphone app for downloading OpenStreetMap tiles that also runs on the desktop -- but really, there still isn't any mapping app that works well enough for exploring maps when you have no net connection.

In particular, uploading my GPS track logs after a day of mapping, I discovered that Tango really wasn't a good way of exploring them, and I already know Merkaartor, nice as it is for entering new OSM data, isn't very good at working offline. There I was, with PyTopo and a boring hotel room; I couldn't stop myself from tweaking a bit.

Adding tracklogs was gratifyingly easy. But other aspects of the code bother me, and when I started looking at what I might need to do to display those Tango/OSM tiles ... well, I've known for a while that some day I'd need to refactor PyTopo's code, and now was the time.

Surprisingly, I completed most of the refactoring on the trip. But even after the refactoring, displaying those OSM tiles turned out to be a lot harder than I'd hoped, because I couldn't find any reliable way of mapping a tile name to the coordinates of that tile. I haven't found any documentation on that anywhere, and Tango and several other programs all do it differently and get slightly different coordinates. That one problem was to occupy my spare time for weeks after I got home, and I still don't have it solved.

But meanwhile, the rest of the refactoring was done, nice features like track logs were working, and I've had to move on to other projects. I am going to finish the OSM tile MapCollection class, but why hold up a release with a lot of useful changes just for that?

So here's PyTopo 0.8, and the couple of known problems with the new features will have to wait for 0.9.

Tags: , , , , ,
[ 20:49 Jun 20, 2009    More programming | permalink to this entry | ]

Fri, 19 Jun 2009

Python: show all methods in a given object or module

A silly little thing, but something that Python books mostly don't mention and I can never find via Google:

How do you find all the methods in a given class, object or module?

Ideally the documentation would tell you. Wouldn't that be nice? But in the real world, you can't count on that, and examining all of an object's available methods can often give you a good guess at how to do whatever you're trying to do.

Python objects keep their symbol table in a dictionary called __dict__ (that's two underscores on either end of the word). So just look at object.__dict__. If you just want the names of the functions, use object.__dict__.keys().

Thanks to JanC for suggesting dir(object) and help(object), which can be more helpful -- not all objects have a __dict__.

Tags: , ,
[ 12:44 Jun 19, 2009    More programming | permalink to this entry | ]

Sun, 14 Jun 2009

Programming With PyGTK, part 3: Key events and object oriented Python

Part 3 of "Graphical Python Programming With PyGTK" uses object-oriented Python to clean up the code from Part 2, and also adds handling of key events to get rid of that silly Quit button. PythonGTK Programming part 3: Screensaver, Objects, and User Input

Tags: , ,
[ 12:18 Jun 14, 2009    More writing | permalink to this entry | ]

Mon, 01 Jun 2009

A GPX file manager

Someone on the OSM newbies list asked how he could strip waypoints out of a GPX track file. Seems he has track logs of an interesting and mostly-unmapped place that he wants to add to openstreetmap, but there are some waypoints that shouldn't be included, and he wanted a good way of separating them out before uploading.

Most of the replies involved "just edit the XML." Sure, GPX files are pretty simple and readable XML -- but a user shouldn't ever have to do that! Gpsman and gpsbabel were also mentioned, but they're not terribly easy to use either.

That reminded me that I had another XML-parsing task I'd been wanting to write in Python: a way to split track files from my Garmin GPS.

Sometimes, after a day of mapping, I end up with several track segments in the same track log file. Maybe I mapped several different trails; maybe I didn't get a chance to upload one day's mapping before going out the next day. Invariably some of the segments are of zero length (I don't know why the Garmin does that, but it always does). Applications like merkaartor don't like this one bit, so I usually end up editing the XML file and splitting it into segments by hand. I'm comfortable with XML -- but it's still silly.

I already have some basic XML parsing as part of PyTopo and Ellie, so I know the parsing very easy to do. So, spurred on by the posting on OSM-newbies, I wrote a little GPX parser/splitter called gpxmgr. gpxmgr -l file.gpx can show you how many track logs are in the file; gpxmgr -w file.gpx can write new files for each non-zero track log. Add -p if you want to be prompted for each filename (otherwise it'll use the name of the track log, which might be something like "ACTIVE\ LOG\ #2").

How, you may wonder, does that help the original poster's need to separate out waypoints from track files? It doesn't. See, my GPS won't save tracklogs and waypoints in the same file, even if you want them that way; you have to use two separate gpsbabel commands to upload a track file and a waypoint file. So I don't actually know what a tracklog-plus-waypoint file looks like. If anyone wants to use gpxmgr to manage waypoints as well as tracks, send me a sample GPX file that combines them both.

Tags: , , ,
[ 20:43 Jun 01, 2009    More mapping | permalink to this entry | ]

Thu, 28 May 2009

Programming With PyGTK, part 2: pretty screensaver-type graphics

Part 2 of Graphical Python Programming With PyGTK gets into how to do some cool Qix screensaver-style graphics, in: Graphical Python Programming With PyGTK, part 2: Write Your Own Screensaver.

There's also a digg link.

Tags: , ,
[ 18:09 May 28, 2009    More writing | permalink to this entry | ]

Thu, 14 May 2009

Graphical Python Programming With PyGTK

This week's Linux Planet article is another one on Python and graphical toolkits, but this time it's a little more advanced: Graphical Python Programming With PyGTK.

This one started out as a fun and whizzy screensaver sort of program that draws lots of pretty colors -- but I couldn't quite fit it all into one article, so that will have to wait for the sequel two weeks from now.

Tags: , ,
[ 19:53 May 14, 2009    More writing | permalink to this entry | ]

Thu, 23 Apr 2009

Linux Planet: GIMP Python Plugins, part II

Latest Linux Planet article: How to write a "blobify" GIMP plug-in in Python to make text look three-dimensional.

Creating a Fancy 3D-Effect GIMP Plugin in Python.

Tags: , ,
[ 11:46 Apr 23, 2009    More writing | permalink to this entry | ]

Thu, 09 Apr 2009

Linux Planet: Writing Plugins for GIMP in Python

Latest Linux Planet article: Part 1 of a two-parter on Writing GIMP scripts in Python. As usual, there's a Digg link too.

Tags: , ,
[ 22:21 Apr 09, 2009    More writing | permalink to this entry | ]

Thu, 26 Mar 2009

GUI Programming in Python For Beginners

Latest on Linux Planet: another introductory programming article, this time on Python's tkinter library: GUI Programming in Python For Beginners. (As usual, there's a Digg link and also a Reddit one.)

Tags: , ,
[ 16:36 Mar 26, 2009    More writing | permalink to this entry | ]

Tue, 03 Mar 2009

Ellie: Plot GPS elevation profiles

Ever since I got the GPS I've been wanting something that plots the elevation data it stores. There are lots of apps that will show me the track I followed in latitude and longitude, but I couldn't find anything that would plot elevations.

But GPX (the XML-based format commonly used to upload track logs) is very straightforward -- you can look at the file and read the elevations right out of it. I knew it wouldn't be hard to write a script to plot them in Python; it just needed a few quiet hours. Sounded like just the ticket for a rainy day stuck at home with a sore throat.

Sure enough, it was fairly easy. I used xml.dom.minidom to parse the file (I'd already had some experience with it in gimplabels for converting gLabels templates), and pylab from matplotlib for doing the plotting. Easy and nice looking.

I even threw in the nice "conditional main" code from Matt Harrison's SCALE7x Python talk, so it should be callable from other Python code.

Here's the page and a screenshot: Ellie: plot elevation from a GPS track.

Tags: , ,
[ 17:57 Mar 03, 2009    More programming | permalink to this entry | ]

Sat, 28 Feb 2009

langgrep: search only in scripts of a specified language

I was making a minor tweak to my garmin script that uses gpsbabel to read in tracklogs and waypoints from my GPS unit, and I needed to look up the syntax of how to do some little thing in sh script. (One of the hazards of switching languages a lot: you forget syntax details and have to look things up a lot, or at least I do.)

I have quite a collection of scripts in various languages in my ~/bin (plus, of course, all the scripts normally installed in /usr/bin on any Linux machine) so I knew I'd have lots of examples. But there are scripts of all languages sharing space in those directories; it's hard to find just sh examples. For about the two-hundredth time, I wished, "Wouldn't it be nice to have a command that can search for patterns only in files that are really sh scripts?"

And then, the inevitable followup ... "You know, that would be really easy to write."

So I did -- a little python hack called langgrep that takes a language, grep arguments and a file list, looks for a shebang line and only greps the files that have a shebang matching the specified language.

Of course, while writing langgrep I needed langgrep, to look up details of python syntax for things like string.find (I can never remember whether it's string.find(s, pat) or s.find(pat); the python libraries are usually nicely object-oriented but strings are an exception and it's the former, string.find). I experimented with various shell options -- this is Unix, so of course there are plenty of ways of doing this in the shell, without writing a script. For instance:

grep find `egrep -l '#\\!.*python' *`
grep find `file * | grep python | sed 's/:.*//'`
i in foo; file $i|grep python && grep find $i; done    # in sh/bash
These are all pretty straightforward, but when I try to make them into tcsh aliases things get a lot trickier. tcsh lets you make aliases that take arguments, so you can use !:1 to mean the first argument, !2-$ to mean all the arguments starting with the second one. That's all very well, but when you put them into a shell alias in a file like .cshrc that has to be parsed, characters like ! and $ can mean other things as well, so you have to escape them with \. So the second of those three lines above turns into something like
alias greplang "grep \!:2-$ `file * | grep \!:1 | sed 's/:.*//'`"
except that doesn't work either, so it probably needs more escaping somewhere. Anyway, I decided after a little alias hacking that figuring out the right collection of backslash escapes would probably take just as long as writing a python script to do the job, and writing the python script sounded more fun.

So here it is: my langgrep script. (Awful name, I know; better ideas welcome!) Use it like this (if python is the language you're looking for, find is the search pattern, and you want -w to find only "find" as a whole word):

langgrep python -w find ~/bin/*

Tags: , ,
[ 10:57 Feb 28, 2009    More programming | permalink to this entry | ]

Sat, 16 Aug 2008

Fast Pixel Ops in GIMP-Python

Last night Joao and I were on IRC helping someone who was learning to write gimp plug-ins. We got to talking about pixel operations and how to do them in Python. I offered my arclayer.py as an example of using pixel regions in gimp, but added that C is a lot faster for pixel operations. I wondered if reading directly from the tiles (then writing to a pixel region) might be faster.

But Joao knew a still faster way. As I understand it, one major reason Python is slow at pixel region operations compared to a C plug-in is that Python only writes to the region one pixel at a time, while C can write batches of pixels by row, column, etc. But it turns out you can grab a whole pixel region into a Python array, manipulate it as an array then write the whole array back to the region. He thought this would probably be quite a bit faster than writing to the pixel region for every pixel.

He showed me how to change the arclayer.py code to use arrays, and I tried it on a few test layers. Was it faster? I made a test I knew would take a long time in arclayer, a line of text about 1500 pixels wide. Tested it in the old arclayer; it took just over a minute to calculate the arc. Then I tried Joao's array version: timing with my wristwatch stopwatch, I call it about 1.7 seconds. Wow! That might be faster than the C version.

The updated, fast version (0.3) of arclayer.py is on my arclayer page.

If you just want the trick to using arrays, here it is:

from array import array

[ ... setting up ... ]
        # initialize the regions and get their contents into arrays:
        srcRgn = layer.get_pixel_rgn(0, 0, srcWidth, srcHeight,
                                     False, False)
        src_pixels = array("B", srcRgn[0:srcWidth, 0:srcHeight])

        dstRgn = destDrawable.get_pixel_rgn(0, 0, newWidth, newHeight,
                                            True, True)
        p_size = len(srcRgn[0,0])               
        dest_pixels = array("B", "\x00" * (newWidth * newHeight * p_size))

[ ... then inside the loop over x and y ... ]
                        src_pos = (x + srcWidth * y) * p_size
                        dest_pos = (newx + newWidth * newy) * p_size
                        
                        newval = src_pixels[src_pos: src_pos + p_size]
                        dest_pixels[dest_pos : dest_pos + p_size] = newval

[ ... when the loop is all finished ... ]
        # Copy the whole array back to the pixel region:
        dstRgn[0:newWidth, 0:newHeight] = dest_pixels.tostring() 

Good stuff!

Tags: , , ,
[ 22:02 Aug 16, 2008    More gimp | permalink to this entry | ]

Sun, 25 May 2008

Crikey in Python, and generating key events with XTest

A user on the One Laptop Per Child (OLPC, also known as the XO) platform wrote to ask me how to use crikey on that platform.

There are two stages to getting crikey running on a new platform:

  1. Build it, and
  2. Figure out how to make a key run a specific program.

The crikey page contains instructions I've collected for binding keys in various window managers, since that's usually the hard part. On normal Linux machines the first step is normally no problem. But apparently the OLPC comes with gcc but without make or the X header files. (Not too surprising: it's not a machine aimed at developers and I assume most people developing for the machine cross-compile from a more capable Linux box.)

We're still working on that (if my correspondant gets it working, I'll post the instructions), but while I was googling for information about the OLPC's X environment I stumbled upon a library I didn't know existed: python-xlib. It turns out it's possible to do most or all of what crikey does from Python. The OLPC is Python based; if I could write crikey in Python, it might solve the problem. So I whipped up a little key event generating script as a test.

Unfortunately, it didn't solve the OLPC problem (they don't include python-xlib on the machine either) but it was a fun exercises, and might be useful as an example of how to generate key events in python-xlib. It supports both event generating methods: the X Test extension and XSendEvent. Here's the script: /pykey-0.1.

But while I was debugging the X Test code, I had to solve a bug that I didn't remember ever solving in the C version of crikey. Sure enough, it needed the same fix I'd had to do in the python version. Two fixes, actually. First, when you send a fake key event through XTest, there's no way to specify a shift mask. So if you need a shifted character like A, you have to send KeyPress Shift, KeyPress a. But if that's all you send, XTest on some systems does exactly what the real key would do if held down and never released: it autorepeats. (But only for a little while, not forever. Go figure.)

So the real answer is to send KeyPress Shift, KeyPress a, KeyRelease a, KeyRelease Shift. Then everything works nicely. I've updated crikey accordingly and released version 0.7 (though since XTest isn't used by default, most users won't see any change from 0.6). In the XSendEvent case, crikey still doesn't send the KeyRelease event -- because some systems actually see it as another KeyPress. (Hey, what fun would computers be if they were consistent and always predictable, huh?)

Both C and Python versions are linked off the crikey page.

Tags: , , ,
[ 15:50 May 25, 2008    More programming | permalink to this entry | ]

Fri, 12 Oct 2007

PyTopo and PyGTK pixbuf memory leakage

On a recent Mojave desert trip, we tried to follow a minor dirt road that wasn't mapped correctly on any of the maps we had, and eventually had to retrace our steps. Back at the hotel, I fired up my trusty PyTopo on the East Mojave map set and tried to trace the road. But I found that as I scrolled along the road, things got slower and slower until it just wasn't usable any more.

PyTopo was taking up all of my poor laptop's memory. Why? Python is garbage collected -- you're not supposed to have to manage memory explicitly, like freeing pixbufs. I poked around in all the sample code and man pages I had available but couldn't find any pygtk examples that seemed to be doing any explicit freeing.

When we got back to civilization (read: internet access) I did some searching and found the key. It's even in the PyGTK Image FAQ, and there's also some discussion in a mailing list thread from 2003.

Turns out that although Python is supposed to handle its own garbage collection, the Python interpreter doesn't grok the size of a pixbuf object; in particular, it doesn't see the image bits as part of the object's size. So dereferencing lots of pixbuf objects doesn't trigger any "enough memory has been freed that it's time to run the garbage collector" actions.

The solution is easy enough: call gc.collect() explicitly after drawing a map (or any other time a bunch of pixbufs have been dereferenced).

So there's a new version of PyTopo, 0.6 that should run a lot better on small memory machines, plus a new collection format (yet another format from the packaged Topo! map sets) courtesy of Tom Trebisky.

Oh ... in case you're wondering, the ancient USGS maps from Topo! didn't show the road correctly either.

Tags: , , ,
[ 22:21 Oct 12, 2007    More programming | permalink to this entry | ]

Tue, 04 Sep 2007

Egg Timer in Python and TkInter

I left the water on too long in the garden again. I keep doing that: I'll set up something where I need to check back in five minutes or fifteen minutes, then I get involved in what I'm doing and 45 minutes later, the cornbread is burnt or the garden is flooded.

When I was growing up, my mom had a little mechanical egg timer. You twist the dial to 5 minutes or whatever, and it goes tick-tick-tick and then DING! I could probably find one of those to buy (they're probably all digital now and include clocks and USB plugs and bluetooth ports) but since the problem is always that I'm getting distracted by something on the computer, why not run an app there?

Of course, you can do this with shell commands. The simple solution is:

(sleep 300; zenity --info --text="Turn off the water!") &

But the zenity dialogs are small -- what if I don't notice it? -- and besides, I have to multiply by 60 to turn a minute delay into sleep seconds. I'm lazy -- I want the computer to do that for me!
Update: Ed Davies points out that "sleep 5m" also works.

A slightly more elaborate solution is at. Say something like: at now + 15 minutes and when it prompts for commands, type something like:

export DISPLAY=:0.0
zenity --info --text="Your cornbread is ready"
to pop up a window with a message. But that's too much typing and has the same problem of the small easily-ignored dialogs. I'd really rather have a great big red window that I can't possibly miss.

Surely, I thought, someone has already written a nice egg-timer application! I tried aptitude search timer and found several apps such as gtimer, which is much more complicated than I wanted (you can define named events and choose from a list of ... never mind, I stopped reading there). I tried googling, but didn't have much luck there either (lots of Windows and web apps, no Linux apps or cross-platform scripts).

Clearly just writing the damn thing was going to be easier than finding one. (Why is it that every time I want to do something simple on a computer, I have to write it? I feel so sorry for people who don't program.)

I wanted to do it in python, but what to use for the window that pops up? I've used python-gtk in the past, but I've been meaning to check out TkInter (the gui toolkit that's kinda-sorta part of Python) and this seemed like a nice opportunity since the goal was so simple.

The resulting script: eggtimer. Call it like this:

eggtimer 5 Turn off the water
and in five minutes, it will pop up a huge red window the size of the screen with your message in big letters. (Click it or hit a key to dismiss it.)

First Impressions of TkInter

It was good to have an excuse to try TkInter and compare it with python-gtk. TkInter has been recommended as something normally installed with Python, so the user doesn't have to install anything extra. This is apparently true on Windows (and maybe on Mac), but on Ubuntu it goes the other way: I already had pygtk, because GIMP uses it, but to use TkInter I had to install python-tk.

For developing I found TkInter irritating. Most of the irritation concerned the poor documentation: there are several tutorials demonstrating very basic uses, but not much detailed documentation for answering questions like "What class is the root Tk() window and what methods does it have?" (The best I found -- which never showed up in google, but was referenced from O'Reilly's Programming Python -- was here.) In contrast, python-gtk is very well documented.

Things I couldn't do (or, at least, couldn't figure out how to do, and googling found only postings from other people wanting to do the same thing):

I expect I'll be sticking with pygtk for future projects. It's just too hard figuring things out with no documentation. But it was fun having an excuse to try something new.

Tags: , ,
[ 14:35 Sep 04, 2007    More programming | permalink to this entry | ]

Fri, 25 Aug 2006

PyTopo 0.5

Belated release announcement: 0.5b2 of my little map viewer PyTopo has been working well, so I released 0.5 last week with only a few minor changes from the beta. I'm sure I'll immediately find six major bugs -- but hey, that's what point releases are for. I only did betas this time because of the changed configuration file format.

I also made a start on a documentation page for the .pytopo file (though it doesn't really have much that wasn't already written in comments inside the script).

Tags: , , , , , ,
[ 22:10 Aug 25, 2006    More programming | permalink to this entry | ]

Sat, 03 Jun 2006

Cleaner, More Flexible Python Map Viewing

A few months ago, someone contacted me who was trying to use my PyTopo map display script for a different set of map data, the Topo! National Parks series. We exchanged some email about the format the maps used.

I'd been wanting to make PyTopo more general anyway, and already had some hacky code in my local version to let it use a local geologic map that I'd chopped into segments. So, faced with an Actual User (always a good incentive!), I took the opportunity to clean up the code, use some of Python's support for classes, and introduce several classes of map data.

I called it 0.5 beta 1 since it wasn't well tested. But in the last few days, I had occasion to do some map exploring, cleaned up a few remaining bugs, and implemented a feature which I hadn't gotten around to implementing in the new framework (saving maps to a file).

I think it's ready to use now. I'm going to do some more testing: after visiting the USGS Open House today and watching Jim Lienkaemper's narrated Virtual Tour of the Hayward Fault, I'm all fired up about trying again to find more online geologic map data. But meanwhile, PyTopo is feature complete and has the known bugs fixed. The latest version is on the PyTopo page.

Tags: , , , , ,
[ 18:25 Jun 03, 2006    More programming | permalink to this entry | ]

Tue, 21 Jun 2005

A Fast Volume Control App

I updated my Debian sid system yesterday, and discovered today that gnome-volume-control has changed their UI yet again. Now the window comes up with two tabs, Playback and Capture; the default tab, Playback, has only one slider in it, PCM, and all the important sliders, like Volume, are under Capture. (I'm told this is some interaction with how ALSA sees my sound chip.)

That's just silly. I've never liked the app anyway -- it takes forever to come up, so I end up missing too much of any clip that starts out quiet. All I need is a simple, fast window with a single slider controlling master volume. But nothing like that seems to exist, except panel applets that are tied to the panels of particular window managers.

So I wrote one, in PyGTK. vol is a simple script which shows a slider, and calls aumix under the hood to get and set the volume. It's horizontal by default; vol -h gives a vertical slider.

Aside: it's somewhat amazing that Python has no direct way to read an integer out of a string containing more than just that integer: for example, to read 70 out of "70,". I had to write a function to handle that. It's such a terrific no-nonsense language most of the time, yet so bad at a few things. (And when I asked about a general solution in the python channel at [large IRC network], I got a bunch of replies like "use int(str[0:2])" and "use int(str[0:-1])". Shock and bafflement ensued when I pointed out that 5, 100, and -27 are all integers too and wouldn't be handled by those approaches.)

Tags: , , ,
[ 15:54 Jun 21, 2005    More programming | permalink to this entry | ]

Wed, 13 Apr 2005

PyTopo 0.3

I needed to print some maps for one of my geology class field trips, so I added a "save current map" key to PyTopo (which saves to .gif, and then I print it with gimp-print). It calls montage from Image Magick.

Get yer PyTopo 0.3 here.

Tags: , , , ,
[ 17:56 Apr 13, 2005    More programming | permalink to this entry | ]

Sat, 09 Apr 2005

Python Expose vs. Focus

A few days ago, I mentioned my woes regarding Python sending spurious expose events every time the drawing area gains or loses focus.

Since then, I've spoken with several gtk people, and investigated several workarounds, which I'm writing up here for the benefit of anyone else trying to solve this problem.

First, "it's a feature". What's happening is that the default focus in and out handlers for the drawing area (or perhaps its parent class) assume that any widget which gains keyboard focus needs to redraw its entire window (presumably because it's locate-highlighting and therefore changing color everywhere?) to indicate the focus change. Rather than let the widget decide that on its own, the focus handler forces the issue via this expose event. This may be a bad decision, and it doesn't agree with the gtk or pygtk documentation for what an expose event means, but it's been that way for long enough that I'm told it's unlikely to be changed now (people may be depending on the current behavior).

Especially if there are workarounds -- and there are.

I wrote that this happened only in pygtk and not C gtk, but I was wrong. The spurious expose events are only passed if the CAN_FOCUS flag is set. My C gtk test snippet did not need CAN_FOCUS, because the program from which it was taken, pho, already implements the simplest workaround: put the key-press handler on the window, rather than the drawing area. Window apparently does not have the focus/expose misbehavior.

I worry about this approach, though, because if there are any other UI elements in the window which need to respond to key events, they will never get the chance. I'd rather keep the events on the drawing area.

And that becomes possible by overriding the drawing area's default focus in/out handlers. Simply write a no-op handler which returns TRUE, and set it as the handler for both focus-in and focus-out. This is the solution I've taken (and I may change pho to do the same thing, though it's unlikely ever to be a problem in pho).

In C, there's a third workaround: query the default focus handlers, and disconnect() them. That is a little more efficient (you aren't calling your nop routines all the time) but it doesn't seem to be possible from pygtk: pygtk offers disconnect(), but there's no way to locate the default handlers in order to disconnect them.

But there's a fourth workaround which might work even in pygtk: derive a class from drawing area, and set the focus in and out handlers to null. I haven't actually tried this yet, but it may be the best approach for an app big enough that it needs its own UI classes.

One other thing: it was suggested that I should try using AccelGroups for my key bindings, instead of a key-press handler, and then I could even make the bindings user-configurable. Sounded great! AccelGroups turn out to be very easy to use, and a nice feature. But they also turn out to have undocumented limitations on what can and can't be an accelerator. In particular, the arrow keys can't be accelerators; which makes AccelGroup accelerators less than useful for a widget or app that needs to handle user-initiated scrolling or movement. Too bad!

Tags: , , ,
[ 21:52 Apr 09, 2005    More programming | permalink to this entry | ]

Wed, 06 Apr 2005

PyTopo is usable; pygtk is inefficient

While on vacation, I couldn't resist tweaking pytopo so that I could use it to explore some of the areas we were visiting.

It seems fairly usable now. You can scroll around, zoom in and out to change between the two different map series, and get the coordinates of a particular location by clicking. I celebrated by making a page for it, with a silly tux-peering-over-map icon.

One annoyance: it repaints every time it gets a focus in or out, which means, for people like me who use mouse focus, that it repaints twice for each time the mouse moves over the window. This isn't visible, but it would drag the CPU down a bit on a slow machine (which matters since mapping programs are particularly useful on laptops and handhelds).

It turns out this is a pygtk problem: any pygtk drawing area window gets spurious Expose events every time the focus changes (whether or not you've asked to track focus events), and it reports that the whole window needs to be repainted, and doesn't seem to be distinguishable in any way from a real Expose event. The regular gtk libraries (called from C) don't do this, nor do Xlib C programs; only pygtk.

I filed bug 172842 on pygtk; perhaps someone will come up with a workaround, though the couple of pygtk developers I found on #pygtk couldn't think of one (and said I shouldn't worry about it since most people don't use pointer focus ... sigh).

Tags: , , , , , ,
[ 17:26 Apr 06, 2005    More programming | permalink to this entry | ]

Sun, 27 Mar 2005

Python GTK Topographic Map Program

I couldn't stop myself -- I wrote up a little topo map viewer in PyGTK, so I can move around with arrow keys or by clicking near the edges. It makes it a lot easier to navigate the map directory if I don't know the exact starting coordinates.

It's called PyTopo, and it's in the same place as my earlier two topo scripts.

I think CoordsToFilename has some bugs; the data CD also has some holes, and some directories don't seem to exist in the expected place. I haven't figured that out yet.

Tags: , , , , ,
[ 18:53 Mar 27, 2005    More programming | permalink to this entry | ]

Topographic Maps for Linux

I've long wished for something like those topographic map packages I keep seeing in stores. The USGS (US Geological Survey) sells digitized versions of their maps, but there's a hefty setup fee for setting up an order, so it's only reasonable when buying large collections all at once.

There are various Linux mapping applications which do things like download squillions of small map sections from online mapping sites, but they're all highly GPS oriented and I haven't had much luck getting them to work without one. I don't (yet?) have a GPS; but even if I had one, I usually want to make maps for places I've been or might go, not for where I am right now. (I don't generally carry a laptop along on hikes!)

The Topo! map/software packages sold in camping/hiking stores (sometimes under the aegis of National Geographic are very reasonably priced. But of course, the software is written for Windows (and maybe also Mac), not much help to Linux users, and the box gives no indication of the format of the data. Googling is no help; it seems no Linux user has ever tried buying one of these packages to see what's inside. The employees at my local outdoor equipment store (Mel Cotton's) were very nice without knowing the answer, and offered the sensible suggestion of calling the phone number on the box, which turns out to be a small local company, "Wildflower Productions", located in San Francisco.

Calling Wildflower, alas, results in an all too familiar runaround: a touchtone menu tree where no path results in the possibility of contact with a human. Sometimes I wonder why companies bother to list a phone number at all, when they obviously have no intention of letting anyone call in.

Concluding that the only way to find out was to buy one, I did so. A worthwhile experiment, as it turned out! The maps inside are simple GIF files, digitized from the USGS 7.5-minute series and, wonder of wonders, also from the discontinued but still useful 15-minute series. Each directory contains GIF files covering the area of one 7.5 minute map, in small .75-minute square pieces, including pieces of the 15-minute map covering the same area.

A few minutes of hacking with python and Image Magick resulted in a script to stitch together all images in one directory to make one full USGS 7.5 minute map; after a few hours of hacking, I can stitch a map of arbitrary size given start and end longitude and latitude. My initial scripts, such as they are.

Of course, I don't yet have nicities like a key, or an interactive scrolling window, or interpretation of the USGS digital elevation data. I expect I have more work to do. But for now, just being able to generate and print maps for a specific area is a huge boon, especially with all the mapping we're doing in Field Geology class. GIMP's "measure" tool will come in handy for measuring distances and angles!

Tags: , , ,
[ 12:13 Mar 27, 2005    More programming | permalink to this entry | ]