I'm not sure where they got that idea; more science-leaning resources, like
Universe Today
and
Science Alert,
say 2024 is an "off" year for the Leonids,
with an expected Zenithal Hourly Rate (ZHR) of 15-20 meteors per hour
even with ideal conditions, which we don't have because of an
almost-full moon.
For Day 15 of the 30 Day Map Challenge, "My Data", I'm highlighting
a feature I added to PyTopo
last week: the ability to read GPS tags in image files.
JPEG, and probably other image formats as well, lets you store GPS
coordinates inside the EXIF (EXchangeable Image File format) metadata
embedded in each image file.
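Those EXIF GPS tags are stored as degree/minute/second rationals plus a hemisphere reference, so they need a little conversion before you can plot them. Here's a minimal sketch of that conversion; the Pillow-based retrieval in the comment is illustrative, not the code PyTopo actually uses.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus an N/S/E/W
    hemisphere reference into signed decimal degrees."""
    dd = degrees + minutes / 60.0 + seconds / 3600.0
    return -dd if ref in ('S', 'W') else dd

# With Pillow, the GPS IFD can be fetched something like this
# (hypothetical usage, not taken from PyTopo):
#   from PIL import Image
#   exif = Image.open("photo.jpg").getexif()
#   gps = exif.get_ifd(0x8825)       # 0x8825 is the GPSInfo tag
#   lat = dms_to_decimal(*gps[2], gps[1])   # GPSLatitude, GPSLatitudeRef
```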
I like to think of myself as an outdoor person. I like hiking,
mountain biking, astronomy, and generally enjoying the beauty of
the world.
Except — let's not kid ourselves here — I'm
really more of a computer geek.
Without some sort of push, I can easily stay planted on my butt
in front of the computer all day — sure, looking out the window
and admiring the view (I do a lot of that since we moved to New Mexico),
but still sitting indoors in the computer chair.
Earlier this year,
the science podcast "Short Wave" played an NPR series called
Body
Electric that had a lot of interviews with scientists who have
studied some aspect of the health benefits of motion versus sitting,
and specifically, the idea of getting up and moving around for five
minutes every half hour. They challenged listeners to try it, and
featured statements from listeners about their improved health and
energy levels.
I was talking to a friend about LANL's proposed new powerline.
A lot of people are opposing it because
the line would run through the Caja del Rio, an open-space
piñon-juniper area adjacent to Santa Fe which is owned by the
US Forest Service.
The proposed powerline would run from the Caja across the Rio Grande to the Lab.
It would carry not just power but also a broadband fiber line, something
Los Alamos town, if not the Lab, needs badly.
On the other hand, those opposed worry about
road-building and habitat destruction in the Caja.
I'm always puzzled reading accounts of the debate. There already is a
powerline running through the Caja and across the Rio via Powerline Point.
The discussions never say (a) whether the proposed
line would take a different route, and if so, (b) why. Why can't they
just tack on some more lines to the towers along the existing route?
For instance, in the slides from one of the public meetings, the
map
on slide 9
not only doesn't show the existing powerline, but also
uses a basemap that has no borders and NO ROADS. Why would you use a
map that doesn't show roads unless you're deliberately trying to
confuse people?
I stumbled onto the page for this year's Asimov's Magazine
Readers'
Award Finalists. They offer all the stories right there --
but only as PDF. I prefer reading fiction on my ebook reader (a Kobo
Clara with 6" screen), away from the computer. I spend too much time
sitting at the computer as it is. But trying to read a PDF on a 6" screen
is just painful.
The open-source ebook program Calibre has a command-line program called
ebook-convert that can convert some PDFs to epub. It did an
okay job in this case — except that the PDFs had the wrong
author name (they all have the same author, so I'm guessing it's the
name of the person who prepared the PDFs for Asimov's), and the wrong
title information (or maybe just no title), and ebook-convert
compounded that error by generating cover images for each work that had
the wrong title and author.
I went through the files and fixed each one's title and author metadata
using my
epubtag.py
Python script. But what about the cover images? I wasn't eager to spend
the time GIMPing up a cover image by hand for each of the stories.
I mentioned last month that I'm learning guitar. It's been going well
and I'm having fun. But I've gotten to the point where I sometimes get
chords confused: a song is listed as using E major and I play D major
instead.
Also, it's important to practice transitions between chords,
which is easy when you only know three chords; but with eight or so,
I had stopped practicing transitions in general and was only practicing
the ones that occur in songs I like to play.
I found myself wishing I had something like flash cards for guitar chords.
Someone must have already written that, right? But I couldn't find
anything promising with a web search. And besides, it's more fun to
write programs than to flail at unhelpful search engines, and you
always end up learning something new.
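The core of what I wanted is tiny. A sketch of the flash-card idea (the chord list here is just an example, and the function is my own illustration, not the program I ended up writing):

```python
import random

# Placeholder chord list -- substitute whatever chords you're practicing.
CHORDS = ['E', 'A', 'D', 'G', 'C', 'Em', 'Am', 'Dm']

def next_pair(previous=None, rng=random):
    """Pick a random transition between two different chords,
    avoiding an immediate repeat of the last chord shown."""
    first = rng.choice([c for c in CHORDS if c != previous])
    second = rng.choice([c for c in CHORDS if c != first])
    return first, second
```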
I maintain quite a few small websites. I have several of my own under
different domains (shallowsky.com, nmbilltracker.com and so forth),
plus a few smaller projects like flask apps running on a different port.
In addition, I maintain websites for several organizations on a volunteer
basis (because if you join any volunteer organization and they find out
you're at all technical, that's the first job they want you to do).
I typically maintain a local copy of each website, so I can try out
any change locally first.
I've been relying more on my phone
for photos I take while hiking, rather than carry a separate camera.
The Pixel 6a takes reasonably good photos, if you can put up with
the wildly excessive processing Google's camera app does whether you
want it or not.
That opens the possibility of GPS tagging photos, so I'd
have a good record of where on the trail each photo was taken.
But as it turns out: no. It seems the GPS coordinates the Pixel's
camera app records in photos are always wrong, by a significant amount.
And, weirdly, this doesn't seem to be something anyone's talking
about on the web ... or am I just using the wrong search terms?
I had a need for a window to which I could drag and drop URLs.
I don't use drag-and-drop much, since I prefer using the commandline
rather than a file manager and icon-studded desktop.
Usually when I need some little utility and can't immediately find
what I need, I whip up a little Python script.
This time, it wasn't so easy. Python has a GUI problem (as does open
source in general): there are quite a few options, like TkInter, Qt, GTK,
WxWidgets and assorted others, and they all have different strengths and
(especially) weaknesses.
Drag-and-drop turns out to be something none of them do very well.
Somebody in a group I'm in has commented more than once that White
Rock is a hotbed of Republicanism whereas Los Alamos leans Democratic.
(For outsiders, our tiny county has two geographically-distinct towns
in it, with separate zip codes, though officially they're both part of
Los Alamos township, which covers all of Los Alamos county.
White Rock is about half the size of Los Alamos.)
After I'd heard her say it a couple times, I got curious. Was it true?
I asked her for a reference, but she didn't have one. I decided to
find out.
Five years ago, I wrote about
Clicking through a translucent window: using X11 input shapes
and how I used a translucent image window that allows click-through,
positioned on top of PyTopo, to trace an image of an old map and
create tracks or waypoints.
But the transimageviewer.py app that I wrote then was based on
GTK2, which is now obsolete and has been removed from most Linux
distro repositories. So when I found myself wanting GIS to help
investigate a
growing trail controversy in Pueblo Canyon,
I discovered I didn't have a usable click-through image viewer.
I've spent a lot of the past week battling Russian spammers on
the New Mexico Bill Tracker.
The New Mexico legislature just began a special session to define the
new voting districts, which happens every 10 years after the census.
When new legislative sessions start, the BillTracker usually needs
some hand-holding to make sure it's tracking the new session. (I've
been working on code to make it notice new sessions automatically, but
it's not fully working yet). So when the session started, I checked
the log files...
and found them full of Russian spam.
Specifically, what was happening was that a bot was going to my
new user registration page and creating new accounts where the
username was a paragraph of Cyrillic spam.
I wrote at length about my explorations into
selenium
to fetch stories from the New York Times (as a subscriber).
But I mentioned in Part III that there was a much easier way
to fetch those stories, as long as the stories didn't need JavaScript.
That way is to use normal file fetching (using urllib or requests),
but with a CookieJar object containing the cookies from a Firefox
session where I'd logged in.
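Firefox keeps its cookies in a cookies.sqlite database inside the profile directory, so the jar can be built with nothing but the standard library. A rough sketch of the idea (the helper name and details are mine, not the actual script):

```python
import shutil
import sqlite3
import tempfile
from http.cookiejar import Cookie, CookieJar

def firefox_cookiejar(cookies_sqlite_path):
    """Build a CookieJar from a Firefox profile's cookies.sqlite.
    Work on a copy, since Firefox keeps the file locked while running."""
    with tempfile.NamedTemporaryFile(suffix=".sqlite") as tmp:
        shutil.copyfile(cookies_sqlite_path, tmp.name)
        conn = sqlite3.connect(tmp.name)
        rows = conn.execute(
            "SELECT host, path, isSecure, expiry, name, value"
            " FROM moz_cookies").fetchall()
        conn.close()

    jar = CookieJar()
    for host, path, secure, expiry, name, value in rows:
        jar.set_cookie(Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=host, domain_specified=True,
            domain_initial_dot=host.startswith('.'),
            path=path, path_specified=True,
            secure=bool(secure), expires=expiry,
            discard=False, comment=None, comment_url=None,
            rest={}))
    return jar
```

The resulting jar can be handed to urllib via HTTPCookieProcessor, or copied into a requests session.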
At a recent LUG meeting, we were talking about various uses for web
scraping, and someone brought up a Wikipedia game: start on any page,
click on the first real link, then repeat on the page that comes up.
The claim is that this chain always gets to Wikipedia's page on
Philosophy.
We tried a few rounds, and sure enough, every page we tried did
eventually get to Philosophy, usually via languages, which goes to
communication, goes to discipline, action, intention, mental, thought,
idea, philosophy.
It's a perfect game for a discussion of scraping. It should be an easy
exercise to write a scraper to do this, right?
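The extraction step can be sketched with just the standard library's HTMLParser. This is a rough approximation of the "first real link" rule: it only skips links inside parentheses, while the actual game also skips italicized and boxed links.

```python
from html.parser import HTMLParser

class FirstLinkFinder(HTMLParser):
    """Find the first in-article wiki link: the first <a href="/wiki/...">
    inside a <p>, ignoring anything within parentheses."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paren_depth = 0
        self.first = None

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.in_p = True
        elif (tag == 'a' and self.in_p and self.paren_depth == 0
              and self.first is None):
            href = dict(attrs).get('href') or ''
            if href.startswith('/wiki/'):
                self.first = href

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_p = False

    def handle_data(self, data):
        # Track open parentheses so links inside them are skipped
        if self.in_p:
            self.paren_depth += data.count('(') - data.count(')')

def first_wiki_link(html):
    finder = FirstLinkFinder()
    finder.feed(html)
    return finder.first
```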
Part III: Handling Errors and Timeouts (this article)
At the end of Part II, selenium was running on a server with the
minimal number of X and GTK libraries installed.
But now that it can run unattended, there's another problem:
there are all kinds of ways this can fail,
and your script needs to handle those errors somehow.
Before diving in, I should mention that for my original goal,
fetching stories from the NY Times as a subscriber,
it turned out I didn't need selenium after all.
Since handling selenium errors turned out to be so brittle
(as I'll describe in this article), I'm now using requests
combined with a Python CookieJar. I'll write about that in a
future article. Meanwhile ...
Handling Errors and Timeouts
Timeouts are a particular problem with selenium,
because there doesn't seem to be any reliable way to change them
so the selenium script doesn't hang for ridiculously long periods.
When we left off, I was learning
the
basics of selenium in order to fetch stories (as a subscriber)
from the New York Times. Fetching stories was working properly,
and all that remained was to put it in an automated script, then
move it to a server where it could run automatically without my
desktop machine needing to be on.
Unfortunately, that turned out to be the hardest part of the problem.
At the New Mexico GNU & Linux User Group,
currently meeting virtually on Jitsi, someone expressed interest in scraping
websites. Since I do quite a bit of scraping, I offered to give
a tutorial on scraping with the Python module
BeautifulSoup.
"What about selenium?" he asked. Sorry, I said, I've never needed
selenium enough to figure it out.
I have another PEEC Planetarium talk coming up in a few weeks,
a talk on the
summer solstice
co-presenting with Chick Keller on Fri, Jun 18 at 7pm MDT.
I'm letting Chick do most of the talking about archaeoastronomy
since he knows a lot more about it than I do, while I'll be talking
about the celestial dynamics -- what is a solstice, what is the sun
doing in our sky and why would you care, and some weirdnesses relating
to sunrise and sunset times and the length of the day.
And of course I'll be talking about the analemma, because
just try to stop me talking about analemmas whenever the topic
of the sun's motion comes up.
But besides the analemma, I need a lot of graphics of the earth
showing the terminator, the dividing line between day and night.
In my eternal quest for a decent RSS feed for top World/National news,
I decided to try subscribing to the New York Times online.
But when I went to try to add them to my RSS reader, I discovered
it wasn't that easy: their login page sometimes gives a captcha, so
you can't just set a username and password in the RSS reader.
A common technique for sites like this is to log in with a browser,
then copy the browser's cookies into your news reading program.
At least, I thought it was a common technique -- but when I tried
a web search, examples were surprisingly hard to find.
None of the techniques to examine or save browser cookies were all
that simple, so I ended up writing a
browser_cookies.py
Python script to extract cookies from chromium and firefox browsers.
This year's New Mexico Legislative Session started Tuesday.
For the last few weeks I've been madly scrambling to make sure
the bugs are out of some of the
New Mexico Bill Tracker's
new features: notably, it now lets you switch between the current
session and past sessions, and I cleaned up the caching code that tries to
guard against hitting the legislative website too often.
I got a new phone. (Not something that happens often.)
Fun, right? Well, partly, but also something I'd been dreading.
I had a feeling that my ancient RSS reader, FeedViewer,
which I use daily to read all my news feeds,
probably wouldn't work under a modern Android
(I wrote it for KitKat and it was last updated under Marshmallow).
And that was correct.
I was doing some disk housekeeping and noticed that my venerable
image viewer,
Pho,
was at version 1.0pre1, and had been since 2017.
It's had only very minimal changes since that time.
I guess maybe it's been long enough that it's time to
remove that -pre1 moniker, huh?
Of course I couldn't leave it at that. There were a couple of very
minor bugs I'd been ignoring, when you delete from the end or
beginning of the image list. So I fixed those, bumped the version,
updated the web page, tagged the git tree and made a release.
Pho is now 1.0. About time!
Comet C/2020 F3 NEOWISE continues to improve, and as of Tuesday night
it has moved into the evening sky (while also still being visible in
the morning for a few more days).
I caught it Tuesday night at 9:30 pm. The sky was still a bit bright,
and although the comet was easy in binoculars, it was a struggle to see
it with the unaided eye. However, over the next fifteen minutes the sky
darkened, and it looked pretty good by 9:50, considering the partly
cloudy sky. I didn't attempt a photograph; this photo is from Sunday morning,
in twilight and with a bright moon.
I've learned not to get excited when I read about a new comet. They're
so often a disappointment. That goes double for comets in the morning
sky: I need a darned good reason to get up before dawn.
But the chatter among astronomers about the current comet, C/2020 F3
NEOWISE, has been different. So when I found myself awake at 4 am,
I grabbed some binoculars and went out on the deck to look.
And I was glad I did. NEOWISE is by far the best comet I've seen
since Hale-Bopp. Which is not to say it's in Hale-Bopp's class --
certainly not. But it's easily visible to the unaided eye, with a
substantial several-degree-long tail. Even in dawn twilight. Even
with a bright moon. It's beautiful!
Update: the morning after I wrote that,
I did
get a photo,
though it's not nearly as good as Dbot3000's that's shown here.
Galen Gisler, our master of Planetarium Tricks,
presented something strange and cool in his planetarium show last Friday.
He'd been looking for a way to visualize
the "Venus Pentagram", a regularity where Venus'
inferior conjunctions -- the point where Venus is approximately
between Earth and the Sun -- follow a cycle of five.
If you plot the conjunction positions, you'll see a pentagram,
and the sixth conjunction will be almost (but not quite) in the
same place where the first one was.
Many ancient civilizations supposedly knew about this
pattern, though as Galen noted (and I'd also noticed when researching
my Stonehenge talk), the evidence is sometimes spotty.
Galen's latest trick: he moved the planetarium's observer location
up above the Earth's north ecliptic pole. Then he told the planetarium to
look back at the Earth and lock the observer's position so it
moves along with the Earth; then he let the planets move in fast-forward,
leaving trails so their motions were plotted.
The result was fascinating to watch. You could see the Venus pentagram
easily as it made its five loops toward Earth, and the loops of all
the other planets as their distance from Earth changed over the course
of both Earth's orbits and theirs.
You can see the patterns they make at right, with the Venus pentagram
marked (click on the image for a larger version).
Venus' orbit is white, Mercury is yellow, Mars is red.
If you're wondering why Venus' orbit seems to go inside Mercury's,
remember: this is a geocentric model, so it's plotting distance from
Earth, and Venus gets both closer to and farther from Earth than Mercury does.
He said he'd shown this to the high school astronomy club and their
reaction was, "My, this is complicated." Indeed.
It gives insight into what a difficult problem geocentric astronomers
had in trying to model planetary motion, with their epicycles and
other corrections.
Of course that made me want one of my own. It's neat to watch it in
the planetarium, but you can't do that every day.
So: Python, Gtk/Cairo, and PyEphem. It's pretty simple, really.
The goal is to plot planet positions as viewed from high
above the north ecliptic pole: so for each time step, for each planet,
compute its right ascension and distance (declination doesn't matter)
and convert that to rectangular coordinates. Then draw a colored line
from the planet's last X, Y position to the new one. Save all the
coordinates in case the window needs to redraw.
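That projection step can be sketched like so. The scale factor is arbitrary, and the PyEphem usage in the comment assumes the ephem module's documented attributes:

```python
import math

def ecliptic_xy(ra_radians, distance, scale=80):
    """Project a planet's geocentric right ascension and distance onto
    the plane seen from above the north ecliptic pole (declination is
    ignored, as described above)."""
    return (scale * distance * math.cos(ra_radians),
            scale * distance * math.sin(ra_radians))

# With PyEphem, for each time step (sketch):
#   venus = ephem.Venus()
#   venus.compute(some_date)
#   x, y = ecliptic_xy(venus.ra, venus.earth_distance)
```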
At first I tried using Skyfield, the Python library which is supposed
to replace PyEphem (written by the same author). But Skyfield, while
it's probably more accurate, is much harder to use than PyEphem.
It uses SPICE kernels
(my blog post
on SPICE, some SPICE
examples and notes), which means there's no clear documentation or
list of which kernels cover what. I tried the kernels mentioned in the
Skyfield documentation, and after running for a while the program
died with an error saying its model for Jupiter in the de421.bsp kernel
wasn't good beyond 2471184.5 (October 9 2053).
Rather than spend half a day searching for other SPICE kernels,
I gave up on Skyfield and rewrote the program to use PyEphem,
which worked beautifully and amazed me with how much faster it was: I
had to rewrite my GTK code to use a timer just to slow it down to
where I could see the orbits as they developed!
It's fun to watch; maybe not quite as spacey as Galen's full-dome view
in the planetarium, but a lot more convenient.
You need Python 3, PyEphem and the usual GTK3 introspection modules;
on Debian-based systems I think the python3-gi-cairo package
will pull in most of them as dependencies.
An automatic plant watering system is a
project that's been on my back burner for years.
I'd like to be able to go on vacation and not worry about
whatever houseplant I'm fruitlessly nursing at the moment.
(I have the opposite of a green thumb -- I have very little luck
growing plants -- but I keep trying, and if nothing else, I can
make sure lack of watering isn't the problem.)
I've had all the parts sitting around for quite some time,
and had tried them all individually,
but never seemed to make the time to put them all together.
Today's "Raspberry Pi Jam" at Los Alamos Makers seemed like
the ideal excuse.
Sensing Soil Moisture
First step: the moisture sensor. I used a little moisture sensor that
I found on eBay. It says "YL-38" on it. It has the typical forked thingie
you stick into the soil, connected to a little sensor board.
The board has four pins: power, ground, analog and digital outputs.
The digital output would be the easiest: there's a potentiometer on
the board that you can turn to adjust sensitivity, then you can read
the digital output pin directly from the Raspberry Pi.
But I had bigger plans: in addition to watering, I wanted to
keep track of how fast the soil dries out, and update a
web page so that I could check my plant's status from anywhere.
For that, I needed to read the analog pin.
Raspberry Pis don't have a way to read an analog input.
(An Arduino would have made this easier, but then reporting to a
web page would have been much more difficult.)
So I used an ADS1115 16-bit I2C Analog to Digital
Converter board from Adafruit, along with
Adafruit's
ADS1x15 library. It's written for CircuitPython, but it works
fine in normal Python on Raspbian.
It's simple to use. Wire power, ground, SDA and SCL to the appropriate
Raspberry Pi pins (1, 6, 3 and 5 respectively). Connect the soil
sensor's analog output pin with A0 on the ADC. Then
import board
import busio
import adafruit_ads1x15.ads1115 as ADS
from adafruit_ads1x15.analog_in import AnalogIn

# Initialize the ADC (the board is an ADS1115, so use the ADS1115 class)
i2c = busio.I2C(board.SCL, board.SDA)
ads = ADS.ADS1115(i2c)
adc0 = AnalogIn(ads, ADS.P0)

# Read a value
value = adc0.value
voltage = adc0.voltage
With the probe stuck into dry soil, it read around 26,500 (about 3.3 V).
Damp soil was more like 14,500 (1.8 V), and suspended in water,
around 11,000 (1.3 V).
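Those readings are enough to turn raw ADC values into a rough moisture percentage. A sketch using the dry and in-water readings as calibration endpoints (the function name and the simple linear mapping are my own choices):

```python
# Calibration endpoints taken from the measurements above
DRY_RAW = 26500    # probe in dry soil
WET_RAW = 11000    # probe suspended in water

def moisture_percent(raw):
    """Map a raw ADC reading to 0% (bone dry) through 100% (saturated),
    clamping anything outside the calibrated range."""
    pct = 100.0 * (DRY_RAW - raw) / (DRY_RAW - WET_RAW)
    return max(0.0, min(100.0, pct))
```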
Driving a Water Pump
The pump also came from eBay. They're under $5; search for terms like
"Mini Submersible Water Pump 5V to 12V DC Aquarium Fountain Pump Micro Pump".
As far as driving it is concerned, treat it as a motor. Which means you
can't drive it directly from a Raspberry Pi pin: they don't generate
enough current to run a motor, and you risk damaging the Pi with back-EMF
when the motor stops.
Instead, my go-to motor driver for small microcontroller projects is
an SN754410 H-bridge chip. I've used them before for
driving
little cars with a Raspberry Pi or
with
an Arduino. In this case the wiring would be much simpler, because
there's only one motor and I only need to drive it in one direction.
That means I could hardwire the two motor direction pins, and the
only pin I needed to control from the Pi was the PWM motor speed pin.
The chip also needs a bunch of ground wires (which it uses as heat
sinks), a line to logic voltage (the Pi's 3.3V pin) and motor voltage
(since it's such a tiny motor, I'm driving it from the Pi's 5v power pin).
Here's the full wiring diagram.
Driving a single PWM pin is a lot simpler than the dual bidirectional
motor controllers I've used in other motor projects.
import time
import RPi.GPIO as GPIO

PUMP_PIN = 23    # the PWM motor-speed pin

GPIO.setmode(GPIO.BCM)
GPIO.setup(PUMP_PIN, GPIO.OUT)
pump = GPIO.PWM(PUMP_PIN, 50)
pump.start(0)

# Run the motor at 30% for 2 seconds, then stop.
pump.ChangeDutyCycle(30)
time.sleep(2)
pump.ChangeDutyCycle(0)
The rest was just putting together some logic: check the sensor,
and if it's too dry, pump some water -- but only a little, then wait a
while for the water to soak in -- and repeat.
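That logic might be sketched like this, with read_sensor() and run_pump() standing in for the ADC and PWM code shown earlier; the names and the threshold value are hypothetical:

```python
import time

# On this sensor, higher raw readings mean drier soil
# (dry soil read ~26,500, water ~11,000).
MOISTURE_THRESHOLD = 20000
PUMP_SECONDS = 2
SOAK_SECONDS = 60

def needs_water(raw_reading, threshold=MOISTURE_THRESHOLD):
    """True if the soil reads drier than the threshold."""
    return raw_reading > threshold

def watering_loop(read_sensor, run_pump):
    """Check the sensor; if it's too dry, pump a little water,
    then wait for it to soak in before checking again."""
    while True:
        if needs_water(read_sensor()):
            run_pump(PUMP_SECONDS)
        time.sleep(SOAK_SECONDS)
```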
Here's the full
plantwater.py
script.
I haven't added the website part yet, but the basic plant waterer
is ready to use -- and ready to demo at tonight's Raspberry Pi Jam.
The LWV had a 100th anniversary celebration earlier this week.
In New Mexico, that included a big celebration at the Roundhouse. One of
our members has collected a series of fun facts that she calls
"100-Year Minutes". You can see them at
lwvnm.org.
She asked me if it would be possible to have them displayed somehow
during our display at the Roundhouse.
Of course! I said. "Easy, no problem!" I said.
Famous last words.
There are two parts: first, display randomly (or sequentially) chosen
quotes with large text in a fullscreen window. Second, set up a computer
(the obvious choice is a Raspberry Pi) to run the kiosk automatically.
This article only covers the first part; I'll write about the
Raspberry
Pi setup separately.
A Simple Plaintext Kiosk Python Script
When I said "easy" and "no problem", I was imagining writing a
little Python program: get text, scale it to the screen, loop.
I figured the only hard part would be the scaling:
the quotes aren't all the same length, but I want them to be easy to read,
so I wanted each quote displayed in the largest font that would let the
quote fill the screen.
Indeed, for plaintext it was easy. Using GTK3 in Python, first you
set up a PangoCairo layout (Cairo is the way you draw in GTK3, Pango
is the font/text rendering library, and a layout is Pango's term
for a bunch of text to be rendered).
Start with a really big font size, ask PangoCairo how large the layout would
render, and if it's so big that it doesn't fit in the available space,
reduce the font size and try again.
It's not super elegant, but it's easy and it's fast enough.
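The shrink-until-it-fits loop can be sketched independently of the toolkit. Here measure(size) is a stand-in for setting the size on the Pango FontDescription and asking the layout for its rendered size:

```python
def largest_fitting_font(measure, max_width, max_height,
                         start_size=120, min_size=8, step=4):
    """Return the largest font size at which measure(size) -> (w, h)
    fits inside max_width x max_height. With PangoCairo, measure would
    update the layout's font description and call get_pixel_size()."""
    size = start_size
    while size > min_size:
        w, h = measure(size)
        if w <= max_width and h <= max_height:
            return size
        size -= step
    return min_size
```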
It only took an hour or two for a working script, which you can see at
quotekiosk.py.
But some of the quotes had minor HTML formatting. GtkWebkit was
orphaned several years ago and was never available for Python 3; the
only Python 3 option I know of for displaying HTML is Qt5's
QtWebEngine, which is essentially a fully functioning browser window.
Which meant that it seemingly made more sense to write the whole kiosk
as a web page, with the resizing code in JavaScript. I say "seemingly";
it didn't turn out that way.
JavaScript: Resizing Text to Fit Available Space
The hard part about using JavaScript was the text resizing, since
I couldn't use my PangoCairo resizing code.
Much web searching found lots of solutions that resize a single line
to fit the width of the screen, plus a lot of hand-waving
suggestions that didn't work.
I finally found a working solution in a StackOverflow thread:
Fit text perfectly inside a div (height and width) without affecting the size of the div.
The only one of the three solutions there that actually worked was
the jQuery one. It basically does the same thing my original Python
script did: check element.scrollHeight and if it overflows,
reduce the font size and try again.
I used the jQuery version for a little while, but eventually rewrote it
in pure JavaScript so I wouldn't have to keep copying jquery-min.js around.
JS Timers on Slow Machines
There are two types of timers in JavaScript:
setTimeout, which schedules something to run once N seconds from now, and
setInterval, which schedules something to run repeatedly every N seconds.
At first I thought I wanted setInterval, since I want
the kiosk to keep running, changing its quote every so often.
I coded that, and it worked okay on my laptop, but failed miserably
on the Raspberry Pi Zero W. The Pi, even with a lightweight browser
like gpreso (let alone chromium), takes so long to load a page and
go through the resize-and-check-height loop that by the time it has
finally displayed, it's about ready for the timer to fire again.
And because it takes longer to scale a big quote than a small one,
the longest quotes give you the shortest time to read them.
So I switched to setTimeout instead. Choose a quote (since JavaScript
makes it hard to read local files, I used Python to read all the
quotes in, turn them into a JSON list and write them out to a file
that I included in my JavaScript code), set the text color to the
background color so you can't see all the hacky resizing, run the
resize loop, set the color back to the foreground color, and only
then call setTimeout again:
function newquote() {
    // ... resizing and other slow stuff here
    setTimeout(newquote, 30000);
}
// Display the first page:
newquote();
That worked much better on the Raspberry Pi Zero W, so
I added code to resize images in a similar fashion, plus some fancy
CSS fade effects. It turned out the Pi was too slow to run the fades,
but they look nice on a modern x86 machine.
The full working kiosk code is
quotekiosk.js.
Memory Leaks in JavaScript's innerHTML
I ran it for several hours on my development machine and it looked
great. But when I copied it to the Pi, even after I turned off the
fades (which looked jerky and terrible on the slow processor), it
only ran for ten or fifteen minutes, then crashed. Every time.
I tried it in several browsers, but they all crashed after running a while.
The obvious culprit, since it ran fine for a while then crashed,
was a memory leak. The next step was to make a minimal test case.
I'm using innerHTML to change
the kiosk content, because it's the only way I know of to parse and
insert a snippet of HTML that may or may not contain paragraphs and
other nodes. This little test page was enough to show the effect:
<h1>innerHTML Leak</h1>
<p id="thecontent">
</p>
<script type="text/javascript">
var i = 0;

function changeContent() {
    var s = "Now we're at number " + i;
    document.getElementById("thecontent").innerHTML = s;
    i += 1;
    setTimeout(changeContent, 2000);
}

changeContent();
</script>
Chromium has a nice performance recording tool that can show
you memory leaks. (Firefox doesn't seem to have an equivalent, alas.)
To test a leak, go to More Tools > Developer Tools
and choose the Performance tab. Load your test page,
then click the Record button. Run it for a while, like a couple
of minutes, then stop it and you'll see a graph like this (click on
the image for a full-size version).
Both the green line, Nodes, and the blue line, JS Heap,
are going up. But if you run it for longer, say, ten minutes, the
garbage collector eventually runs and the JS Heap line
drops back down. The Nodes line never does:
the node count just continues going up and up and up no matter how
long you run it.
So it looks like that's the culprit: setting innerHTML
adds a new node (or several) each time you call it, and those nodes are
never garbage collected. No wonder it couldn't run for long on the
poor Raspberry Pi Zero with 512MB RAM (the Pi 3 with 1GB didn't fare
much better).
It's weird that all browsers would have the same memory leak; maybe
something about the definition of innerHTML causes it.
I'm not enough of a JavaScript expert to know, and the experts I
was able to find didn't seem to know anything about either why it
happened, or how to work around it.
Python html2text
So I gave up on JavaScript and
went back to my original Python text kiosk program.
After reading in an HTML snippet, I used the Python html2text
module to convert the snippet to text, then displayed it.
I added image resizing using GdkPixbuf and I was good to go.
quotekiosk.py
ran just fine throughout the centennial party,
and no one complained about the formatting not being
fancy enough. A happy ending, complete with cake and lemonade.
But I'm still curious about that JavaScript
leak, and whether there's a way to work around it. Anybody know?
The New Mexico legislature is in session again, which means the
New Mexico Bill Tracker
I wrote last year is back in season. But I guess the word has gotten
out, because this year, I started seeing a few database errors.
Specifically, "sqlite3.OperationalError: database is locked".
It turns out that even read queries on an sqlite3 database inside
flask and sqlalchemy can sometimes keep the database open
indefinitely. Consider something like:
userbills = user.get_bills()    # this does a read query

# Do some slow operations that don't involve the database at all
for bill in userbills:
    slow_update_involving_web_scraping(bill)

# Now bills are all updated; add and commit them.
# Here's where the write operations start.
for bill in userbills:
    db.session.add(bill)
db.session.commit()
I knew better than to open a write query that might keep the database
open during all those long running operations. But apparently, when
using sqlite3, even the initial query of the database to get the
user's bill list opens the database and keeps it open ... until
when? Can you close it manually, then reopen it when you're ready?
Does it help to call db.session.commit()
after the read query? No one seems to know, and it's not obvious
how to test to find out.
I've suspected for a long time that sqlite was only a temporary
solution. While developing the billtracker, I hit quite a few
difficulties where the answer turned out to be "well, this would be
easy in a real database, but sqlite doesn't support that". I figured
I'd eventually migrate to postgresql. But I'm such a database newbie
that I'd been putting it off.
And rightly so. It turns out that migrating an existing database
from sqlite3 to postgresql isn't something that gets written
about much; I really couldn't find any guides on it.
Apparently everybody but me just chooses the right
database to begin with? Anyway, here are the steps on Debian.
Obviously, install postgresql first.
Create a User and a Database
Postgresql has its own notion of users, which you need to create.
At least on Debian, the default is that if you create a postgres
user named martha, then the Linux user martha on the same machine
can access databases that the postgres user martha has access to.
This is controlled by the "peer" auth method, which you can read about in
the
postgresql documentation on pg_hba.conf.
First su to the postgres Linux user and run psql:
$ sudo su - postgres
$ psql
Inside psql, create a postgresql user with the same name as your
flask user, and create a database for that user:
CREATE USER myflaskuser WITH PASSWORD 'password';
ALTER ROLE myflaskuser SET client_encoding TO 'utf8';
ALTER ROLE myflaskuser SET default_transaction_isolation TO 'read committed';
ALTER ROLE myflaskuser SET timezone TO 'UTC';
CREATE DATABASE dbname;
GRANT ALL PRIVILEGES ON DATABASE dbname TO myflaskuser;
If you like, you can also create a user yourusername and give
it access to the same database, to make debugging easier.
With the database created, the next step is to migrate the old
data from the sqlite database.
pgloader (if you have a very recent pgloader)
Using sqlalchemy in my flask app meant that I could use
flask db upgrade to create the database schema in any
database I chose. It does a lovely job of creating an empty database.
Unfortunately, that's no help
if you already have an existing database full of user accounts.
Some people suggested exporting data in either SQL or CSV format,
then importing it into postgresql. Bad idea. There are many
incompatibilities between the two databases: identifiers that
work in sqlite but not in postgresql (like "user", which is a reserved
word in postgres but a common table name in flask-based apps),
capitalization of column names, incompatible date formats, and
probably many more.
A program called pgloader takes care of many (but not all)
of the incompatibilities.
Create a file -- I'll call it migrate.pgloader --
like this:
load database
from 'latest-sqlite-file.db'
into postgresql:///new_db_name
with include drop, quote identifiers, create tables, create indexes, reset sequences
set work_mem to '16MB', maintenance_work_mem to '512 MB';
Then, from a Linux user who has access to the database (e.g. the
myflaskuser you created earlier),
run pgloader migrate.pgloader.
That worked nicely on my Ubuntu 19.10 desktop, which has pgloader
3.6.1. It failed utterly on the server, which is running Debian stable
and pgloader 3.3.2.
Building the latest pgloader from source didn't work on Debian either;
it's based on Common Lisp, and the older CL on Debian dumped me into
some kind of assembly debugger when I tried to build pgloader. Rather
than build CL from source too, I looked for another option.
There are two files to edit. The location of postgresql's
configuration directory varies with version, so do a
locate pg_hba.conf to find it.
In that directory, first edit pg_hba.conf
and add these lines to the end to allow net socket connections
from IP4 and IP6:
host all all 0.0.0.0/0 md5
host all all ::/0 md5
In the same directory, edit postgresql.conf and
search for listen_addr.
Comment out the localhost line if it's uncommented, and add this to allow
connections from anywhere, not just localhost:
listen_addresses = '*'
Then restart the database with
service postgresql restart
Modify the migrate.pgloader file from the previous section
so the "into" line looks like
into postgresql://username:password@host/dbname
The username there is the postgres username, if you made that
different from the Unix username. You need to use a password because
postgres is no longer using peer auth (see that postgres
documentation file I linked earlier).
Assuming this works, you're done with the remote connection part. If you don't need remote
database connections for your app, you can now edit
postgresql.conf, comment out that
listen_addresses = '*' line, and restart the database
again with service postgresql restart.
Don't remove the two lines you added in pg_hba.conf;
flask apparently needs them.
You're ready for the migration.
Make sure you have the latest copy of the server's sqlite database,
then, from your desktop, run:
pgloader migrate.pgloader
Migrate Autoincrements to Sequences
But that's not enough. If you're using any integer primary keys that
autoincrement -- a pretty common database model -- postgresql doesn't
understand that. Instead, it has sequence objects.
You need to define a sequence, tie it to a table, and tell postgresql
that when it adds a new row to the table,
the default value of id is the next value from the
corresponding sequence. Here's how to do that for the table named "user":
CREATE SEQUENCE user_id_seq OWNED by "user".id;
ALTER TABLE "user" ALTER COLUMN id SET default nextval('user_id_seq');
SELECT setval(pg_get_serial_sequence('user', 'id'), coalesce(max(id)+1,1), false) FROM "user";
Note the quotes around "user" because otherwise user is a postgresql
reserved word. Repeat these three lines for every table in your database,
except that you don't need the quotes around any table name except user.
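Since typing those three statements by hand for every table is tedious and error-prone, you could generate them with a short script. This is just a sketch of that idea; "bill" is a made-up example table name:

```python
def sequence_sql(table):
    """Generate the three statements above for one table."""
    # Only "user" needs quoting; other table names can be bare.
    q = '"%s"' % table if table == 'user' else table
    return [
        "CREATE SEQUENCE %s_id_seq OWNED BY %s.id;" % (table, q),
        "ALTER TABLE %s ALTER COLUMN id SET DEFAULT nextval('%s_id_seq');"
        % (q, table),
        "SELECT setval(pg_get_serial_sequence('%s', 'id'), "
        "coalesce(max(id)+1,1), false) FROM %s;" % (table, q),
    ]

for table in ('user', 'bill'):
    print('\n'.join(sequence_sql(table)))
```

You'd paste the output into psql, or feed it in with psql -f.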
Incidentally, I've been told that using autoincrement/sequence
primary keys isn't best practice, because it can be a bottleneck
if lots of different objects are being created at once. I used it
because all the models I was following when I started with flask
worked that way, but eventually I plan try to switch to using some
other unique primary key.
Update:
Turns out there was another problem with the sequences, and it was
pretty annoying. I ended up with a bunch of indices with names like
"idx_15517_ix_user_email" when they should have been "ix_user_email".
The database superficially worked fine, but havoc ensues if you ever
need to do a flask/sqlalchemy/alembic migration, since sqlalchemy
doesn't know anything about those indices with the funny numeric names.
It's apparently possible to rename indices in postgresql, but it's a
tricky operation that has to be done by hand for each index.
Now the database should be ready to test.
Test
Your flask app probably has something like this in config.py:
SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
'sqlite:///' + os.path.join(basedir, 'dbname.db')
If so, you can export DATABASE_URL=postgresql:///dbname
and then test it as you usually would. If you normally test on a
local machine and not on the server, remember you can tell flask's
test server to accept connections from remote machines with
flask run --host=0.0.0.0
Database Backups
You're backing up your database, right?
That's easier in sqlite where you can just copy the db file.
From the command line, you can back up a postgresql database with:
pg_dump dbname > dbname-backup.pg
You can do that from Python in a subprocess:
with open(backup_file, 'w') as fp:
    subprocess.call(["pg_dump", dbname], stdout=fp)
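Expanding that slightly, you might give each backup a dated filename so old backups don't get overwritten. The date-stamping is my own embellishment, and "billtracker" is just an example database name:

```python
import datetime
import subprocess

def backup_filename(dbname, when=None):
    """Build a dated name like 'billtracker-2020-01-25.pg'."""
    when = when or datetime.date.today()
    return "%s-%s.pg" % (dbname, when.isoformat())

def backup_db(dbname):
    # Same as:  pg_dump dbname > dbname-DATE.pg
    fname = backup_filename(dbname)
    with open(fname, 'w') as fp:
        subprocess.call(["pg_dump", dbname], stdout=fp)
    return fname
```

Run backup_db() from a nightly cron job and you have rolling dated backups.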
Verify You're Using The New Database
I had some problems with that DATABASE_URL setting; I'd never
used it so I didn't realize that it wasn't in the right place and
didn't actually work. So I ran through my migration steps, changed
DATABASE_URL, thought I was done, and realized later
that the app was still running off sqlite3.
It's better to know for sure what your app is running.
For instance, you can add a route to routes.py that prints
details like that.
You can print app.config["SQLALCHEMY_DATABASE_URI"].
That's enough in theory, but I wanted to know for sure.
Turns out str(db.session.get_bind()) will print the
connection the flask app's database is actually using. So I added a route
that prints both, plus some other information about the running app.
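A minimal sketch of such a helper might look like this; app and db stand for the usual flask and flask-sqlalchemy objects, and the function name is illustrative, not from the billtracker:

```python
def describe_database(app, db):
    """Report the configured database URI vs. the one actually in use."""
    configured = app.config["SQLALCHEMY_DATABASE_URI"]
    actual = str(db.session.get_bind())
    return "configured: %s\nbound to: %s" % (configured, actual)
```

In a real app you'd wrap it in a route, something like @app.route('/dbinfo'), and probably restrict it to admin users.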
Whew! I was a bit surprised that migrating was as tricky as it was,
and that there wasn't more documentation for it. Happy migrations, everyone.
A recent article on Pharyngula blog,
You ain’t no fortunate one,
discussed US wars, specifically the question: depending on when you were born,
for how much of your life has the US been at war?
It was an interesting bunch of plots, with the percentage constantly
increasing until, for people born after 2001, it hit 100%.
Really? That didn't seem right.
Wasn't the US in a lot of wars in the past?
When I was growing up, it seemed like we were always getting into wars,
poking our nose into other countries' business.
Can it really be true that we're so much more warlike now than we used to be?
It made me want to see a plot of when the wars were, beyond Pharyngula's
percentage-of-life pie charts. So I went looking for data.
Sure enough. If that Thoughtco page with the war dates is even close to
accurate -- it could be biased toward listing recent conflicts,
but I didn't find a more authoritative source for war dates --
the prevalence of war took a major jump in 2001.
We used to have big gaps between wars, and except for Vietnam,
the wars we were involved with were short, mostly less than a year each.
But starting in 2001, we've been involved in a never-ending series of
overlapping wars unprecedented in US history.
The Thoughtco page had wars going back to 1675, so I also made a plot
showing all of them (click for the full-sized version).
It's no different: short wars, not overlapping, all the way back
to before the revolution. We've seen nothing in the past like the
current warmongering.
Depressing. Climate change isn't the only phenomenon showing
a modern "hockey stick" curve, it seems.
First I tried using a data: URI.
In that scheme, you encode a page's full content into the URL. For instance:
try this in your browser:
data:text/html,Hello%2C%20World!
Nice and easy -- and it even works from file: URIs!
Well, sort of works. It turns out it has a problem related to
the same-origin problems I saw with postMessage.
A data: URI is always opened with an origin of about:blank;
and two about:blank origin pages can't talk to each other.
But I don't need them to talk to each other if I'm not using postMessage,
do I? Yes, I do.
The problem is that stylesheet I included in htmlhead above:
All browsers I tested refuse to open the stylesheet in the
about:blank popup. This seems strange: don't people use stylesheets
from other domains fairly often? Maybe it's a behavior special to
null (about:blank) origin pages. But in any case, I couldn't find
a way to get my data: URI popup to load a stylesheet. So unless I
hard-code all the styles I want for the notes page into the Javascript
that opens the popup window (and I'd really rather not do that),
I can't use data: as a solution.
Clever hack: Use the Same Page, Loaded in a Different Way
That's when I finally came across
Remy Sharp's page, Creating popups without HTML files.
Remy first explores the data: URI solution, and rejects it because of
the cross-origin problem, just as I did. But then he comes up with a
clever hack. It's ugly, as he acknowledges ... but it works.
The trick is to create the popup with the URL of the parent page that
created it, but with a named anchor appended:
parentPage.html#popup.
Then, in the Javascript, check whether #popup is in the
URL. If not, we're in the parent page and still need to call
window.open to create the popup. If it is there, then
the JS code is being executed in the popup. In that case, rewrite the
page as needed. In my case, since I want the popup to show only whatever
is in the div named #notes, and the slide content is all inside a div
called #page, I can do this:
function updateNoteWindow() {
    if (window.location.hash.indexOf('#notes') === -1) {
        window.open(window.location + '#notes', 'noteWin',
                    'width=300,height=300');
        return;
    }

    // If here, it's the popup notes window.
    // Remove the #page div
    var pageDiv = document.getElementById("page");
    pageDiv.remove();

    // and rename the #notes div so it will be displayed in a different place
    var notesDiv = document.getElementById("notes");
    notesDiv.id = "fullnotes";
}
It works great, even in file: URIs, and even in QtWebEngine.
That's the solution I ended up using.
I'm trying to update my
htmlpreso
HTML presentation slide system to allow for a separate notes window.
Up to now, I've just used display mirroring. I connect to the
projector at 1024x768, and whatever is on the first (topmost/leftmost)
1024x768 pixels of my laptop screen shows on the projector. Since my
laptop screen is wider than 1024 pixels, I can put notes to myself
to the right of the slide, and I'll see them but the audience won't.
That works fine, but I'd like to be able to make the screens completely
separate, so I can fiddle around with other things while still
displaying a slide on the projector. But since my slides are in HTML,
and I still want my presenter notes, that requires putting the notes
in a separate window, instead of just to the right of each slide.
The notes for each slide are in a <div id="notes">
on each page. So all I have to do is pop up another browser window
and mirror whatever is in that div to the new window, right?
Sure ...
except this is JavaScript, so nothing is simple. Every little thing
is going to be multiple days of hair-tearing frustration, and this
was no exception.
I should warn you up front that I eventually found a much simpler way
of doing this. I'm documenting this method anyway because it seems
useful to be able to communicate between two windows, but if you
just want a simple solution for the "pop up notes in another window"
problem, stay tuned for Part 2.
Step 0: Give Up On file:
Normally I use file: URLs for presentations. There's no need
to run a web server, and in fact, on my lightweight netbook I usually
don't start apache2 by default, only if I'm actually working on
web development.
But most of the methods of communicating between windows don't work in
file URLs, because of the "same-origin policy".
That policy is a good security measure: it ensures that a page
from innocent-url.com can't start popping up windows with content
from evilp0wnU.com without you knowing about it. I'm good with that.
The problem is that file: URLs have location.origin
of null, and every null-origin window is considered to be a
different origin -- even if they're both from the same directory. That
makes no sense to me, but there seems to be no way around it. So if I
want notes in a separate window, I have to run a web server and use
http://localhost.
Step 1: A Separate Window
The first step is to pop up the separate notes window, or get a
handle to it if it's already up.
JavaScript offers window.open(), but there's a trick:
if you just call
notewin = window.open("notewin.html", "notewindow")
you'll actually get a new tab, not a new window. If you actually
want a window, the secret code for that is to give it a size:
notewin = window.open("notewin.html", "notewindow",
                      "width=300,height=300")
There's apparently no way to just get a handle to an existing window.
The only way is to call window.open(), which
pops up a new window if it wasn't there before, or reloads it if it's
already there.
I saw some articles implying that passing an empty string ""
as the first argument would return a handle to an existing window without
changing it, but it's not true: in Firefox and Chromium, at least,
that makes the existing window load about:blank instead of
whatever page it already has. So just give it the same page every time.
Step 2: Figure Out When the Window Has Loaded
There are several ways to change the content in the popup window from
the parent, but they all have one problem:
if you update the content right away after calling window.open,
whatever content you put there will be overwritten immediately when
the popup reloads its notewin.html page (or even about:blank).
So you need to wait until the popup is finished loading.
That sounds suspiciously easy. Assuming you have a function called
updateNoteWinContent(), just do this:
// XXX This Doesn't work:
notewin.addEventListener('load', updateNoteWinContent, false);
Except it turns out the "load" event listener isn't called on reloads,
at least not in popups.
So this will work the first time, when the
note window first pops up, but never after that.
I tried other listeners, like "DOMContentLoaded" and
"readystatechange", but none of them are called on reload.
Why not? Who knows?
It's possible this is because the listener gets set too early, and
then is wiped out when the page reloads, but that's just idle
speculation.
For a while, I thought I was going to have to resort to an ugly hack:
sleep for several seconds in the parent window to give the popup time
to load:
await new Promise(r => setTimeout(r, 3000));
(requires declaring the calling function as async).
This works, but ... ick.
Fortunately, there's a better way.
Step 2.5: Simulate onLoad with postMessage
What finally worked was a tricky way to use postMessage()
in reverse. I'd already experimented with using postMessage()
from the parent window to the popup, but it didn't work because the
popup was still loading and wasn't ready for the content.
What works is to go the other way. In the code loaded by the popup
(notewin.html in this example), put some code at the end
of the page that calls
window.opener.postMessage("Loaded");
Then in the parent, handle that message, and don't try to update the
popup's content until you've gotten the message:
function receiveMessageFromPopup(event) {
    console.log("Parent received a message from the notewin:", event.data);
    // Optionally, check whether event.data == "Loaded"
    // if you want to support more than one possible message.

    // Update the "notes" div in the popup notewin,
    // copying over the content of the parent's own "notes" div:
    var noteDiv = notewin.document.getElementById("notes");
    noteDiv.innerHTML = document.getElementById("notes").innerHTML;
}
In the end, though, this didn't solve my presentation problem.
I got it all debugged and working, only to discover that
postMessage doesn't work in QtWebEngine, so
I couldn't use it in my slide presentation app.
Fortunately, I found a couple of other ways: stay tuned for Part 2.
A note on debugging:
One thing that slowed me down was that JS I put in the popup didn't
seem to be running: I never saw its console.log() messages.
It took me a while to realize that each window has its own web console,
both in Firefox and Chromium. So you have to wait until the popup has
opened before you can see any debugging messages for it. Even then,
the popup window doesn't have a menu, and its context menu doesn't
offer a console window option. But it does offer Inspect element,
which brings up a Developer Tools window where you can click on
the Console tab to see errors and debugging messages.
A friend recently introduced me to Folium, a quick and easy way of
making web maps with Python.
The Folium
Quickstart gets you started in a hurry. In just two lines of Python
(plus the import line), you can write an HTML file that you can load
in any browser to display a slippy map, or you can display it inline in a
Jupyter notebook.
Folium uses the very mature Leaflet
JavaScript library under the hood. But it lets you do all the
development in a few lines of Python rather than a lot of lines
of Javascript.
Having run through most of the quickstart, I was excited to try
Folium for showing GeoJSON polygons. I'm helping with a redistricting
advocacy project; I've gotten shapefiles for the voting districts in
New Mexico, and have been wanting to build a map that shows them
which I can then extend for other purposes.
Step 1: Get Some GeoJSON
The easiest place to get voting district data is from TIGER, the
geographic arm of the US Census.
For the districts resulting from the 2010 Decadal Census,
start here:
Cartographic
Boundary Files - Shapefile (you can also get them as KML,
but not as GeoJSON). There's a category called
"Congressional Districts: 116th Congress", and farther down the page,
under "State-based Files", you can get shapefiles for the upper and
lower houses of your state.
You can also likely download them from
www2.census.gov/geo/tiger/TIGER2010/,
as long as you can figure out how to decode the obscure directory names:
they're things like ELSD and POINTLM, so
the first step is to figure out what those mean; I never found anything
that could decode them.
(Before I found the TIGER district data, I took a more roundabout path that
involved learning how to merge shapes; more on that in a separate post.)
Okay, now you have a shapefile (unzip the TIGER file to get a bunch of
files with names like cb_2018_35_sldl_500k.* -- shape "files"
are an absurd ESRI concept that actually use seven separate files for
each dataset, so they're always packaged as a zip archive and programs
that read shapefiles expect that when you pass them a .shp,
there will be a bunch of other files with the same basename but
different extensions in the same directory).
But Folium can't handle shapefiles, only GeoJSON. You can do that
translation with a GDAL command:
ogr2ogr -f GeoJSON -t_srs EPSG:4326 outfile.json infile.shp
Or you can do it programmatically with the GDAL Python bindings:
def shapefile2geojson(infile, outfile, fieldname):
    '''Translate a shapefile to GEOJSON.'''
    options = gdal.VectorTranslateOptions(format="GeoJSON",
                                          dstSRS="EPSG:4326")
    gdal.VectorTranslate(outfile, infile, options=options)
The EPSG:4326 specifier, if you read man ogr2ogr, is supposedly
for reprojecting the data into WGS84 coordinates, which is what most
web maps want (EPSG:4326 is an alias for WGS84). But it has an equally
important function: even if your input shapefile is already in WGS84,
adding that option somehow ensures that GDAL will use degrees as the
output unit. The TIGER data already uses degrees so you don't strictly
need that, but some data, like the precinct data I got from UNM RGIS,
uses other units, like meters, which will confuse Folium and Leaflet.
And the TIGER data isn't in WGS84 anyway; it's in GRS1980 (you can tell
by reading the .prj file in the same directory as the .shp).
Don't ask me about details of all these different geodetic reference systems;
I'm still trying to figure it all out. Anyway, I recommend adding the
EPSG:4326 as the safest option.
Step 2: Show the GeoJSON in a Folium Map
In theory, looking at the Folium Quickstart, all you need is
folium.GeoJson(filename, name='geojson').add_to(m).
In practice, you'll probably want to do more, like
color different regions differently
show some sort of highlight when the user chooses a region
Here's a simple style function that chooses random colors:
import random

def random_html_color():
    r = random.randint(0, 255)
    g = random.randint(0, 255)
    b = random.randint(0, 255)
    return '#%02x%02x%02x' % (r, g, b)

def style_fcn(x):
    return { 'fillColor': random_html_color() }
I wanted to let the user choose regions by clicking, but it turns out
Folium doesn't have much support for that (it may be coming in a
future release). You can do it by reading the GeoJSON yourself,
splitting it into separate polygons and making them all separate Folium
Polygons or GeoJSON objects, each with its own click behavior; but
if you don't mind highlights and popups on mouseover instead of
requiring a click, that's pretty easy. For highlighting in red whenever
the user mouses over a polygon, set this highlight_function:
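A minimal highlight function consistent with that red-highlight behavior could be as simple as this; the names are illustrative, and the commented-out call sketches how it would be wired into folium.GeoJson along with a tooltip:

```python
def highlight_fcn(feature):
    # Red fill for whichever polygon the mouse is over:
    return { 'fillColor': '#ff0000' }

# It would be passed to the GeoJson layer something like this
# (not runnable here without folium and the data file):
# folium.GeoJson(filename, style_function=style_fcn,
#                highlight_function=highlight_fcn,
#                tooltip=folium.GeoJsonTooltip(fields=['NAME'])).add_to(m)
```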
In this case, 'NAME' is the field in the shapefile that I want to display
when the user mouses over the region.
If you're not sure of the field name, the nice thing about GeoJSON
is that it's human readable. Generally you'll want to look inside
"features", for "properties" to find the fields defined for each polygon.
For instance, if I use jq to prettyprint the JSON generated for the NM
state house districts:
jq . outfile.json | less
If you still aren't sure which property name means what (for example,
"NAME" could be anything), just keep browsing through the JSON file to
see which fields change from feature to feature and give the values
you're looking for, and it should become obvious pretty quickly.
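If you'd rather do that from Python than by eyeballing the file, here's a quick sketch; pass it whatever your GeoJSON file is called:

```python
import json

def feature_fields(geojson_file):
    """Return the property names defined on the first feature."""
    with open(geojson_file) as fp:
        data = json.load(fp)
    # Each feature's "properties" dict holds the shapefile's fields:
    return sorted(data["features"][0]["properties"].keys())
```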
Here's a working code example:
polidistmap.py,
and here's an example of a working map:
Every time the media invents a new moon term -- super blood black wolf
moon, or whatever -- I roll my eyes.
First, this ridiculous "supermoon" thing is basically undetectable to
the human eye. Here's an image showing the relative sizes of the absolute
closest and farthest moons. It's easy enough to tell when you see the
biggest and smallest moons side by side, but when it's half a degree
in the sky, there's no way you'd notice that one was bigger or smaller
than average.
And then, talking about the ridiculous moon name phenom with some
friends, I realized I could play this game too.
So I spent twenty minutes whipping up my own
Silly Moon Name Generator.
It's super simple -- it just uses Linux' built-in dictionary, with no sense
of which words are common, or adjectives or nouns or what.
Of course it would be funnier with a hand-picked set of words,
but there's a limit to how much time I want to waste on this.
You can add a parameter ?nwords=5 (or whatever number)
if you want more or fewer words than four.
How Does It Work?
Random phrase generators like this are a great project for someone
just getting started with Python.
Python is so good at string manipulation that it makes this sort
of thing easy: it only takes half a page of code to do something fun.
So it's a great beginner project that most people would probably find
more rewarding than cranking out Fibonacci numbers (assuming you're not a
Fibonacci
geek like I am).
For more advanced programmers, random phrase generation can still be a
fun and educational project -- skip to the end of this article for ideas.
For the basics, this is all you need: I've added comments explaining
the code.
import random

def hypermoon(filename, nwords=4):
    '''Return a silly moon name with nwords words,
       each taken from a word list in the given filename.
    '''
    fp = open(filename)
    lines = fp.readlines()

    # A list to store the words to describe the moon:
    words = []

    for i in range(nwords):    # This will be run nwords times
        # Pick a random line number within the file:
        whichline = random.randint(0, len(lines) - 1)

        # readlines() includes whitespace like newline characters.
        # Use whichline to pull one line from the file, and use
        # strip() to remove any extra whitespace:
        word = lines[whichline].strip()

        # Append it to our word list:
        words.append(word)

    # The last word in the phrase will be "moon", e.g.
    # super blood wolf black pancreas moon
    words.append("moon")

    # ' '.join(list) combines all the words with spaces between them
    return ' '.join(words)

# This is called when the program runs:
if __name__ == '__main__':
    random.seed()
    print(hypermoon('/usr/share/dict/words', 4))
A More Compact Format
In that code example,
I expanded everything to try to make it clear for beginning programmers.
In practice, Python lets you be a lot more terse, so the way
I actually wrote it was more like:
def hypermoon(filename, nwords=4):
    with open(filename, encoding='utf-8') as fp:
        lines = fp.readlines()

    words = [ lines[random.randint(0, len(lines) - 1)].strip()
              for i in range(nwords) ]
    words.append('moon')
    return ' '.join(words)
There are three important differences (in bold):
Opening a file using "with" ensures the file will be closed properly
when you're done with it. That's not important in this tiny example, but
it's a good habit to get into.
I specify the 'utf-8' encoding when I open the file because when I
ran it as a web app, it turned out the web server used the ASCII
encoding and I got Python errors because there are accented characters
in the dictionary somewhere. That's one of those Python annoyances
you get used to when going beyond the beginner level.
The way I define words all in one line (well, it's conceptually
one long line, though I split it into two so each line stays under 72
characters) is called a list comprehension. It's a nice compact
alternative to defining an empty list [] and then
calling append() a bunch of times, like I did in the
first example.
Initially they might seem harder to read, but list comprehensions can
actually make code clearer once you get used to them.
A Python Driven Web Page
Finally, to make it work as a web page, I added the CGI module.
That isn't really a beginner thing so I won't paste it here,
but you can see the CGI version at
hypermoon.py
on GitHub.
I should mention that there's some debate over CGI in Python.
The movers and shakers in the Python community don't approve of CGI,
and there's a plan to remove it from upcoming Python versions.
The alternative is to use technologies like Flask or Django.
While I'm a fan of Flask and have used it for several projects,
it's way overkill for something like this, mostly because of all
the special web server configuration it requires (and Django is
far more heavyweight than Flask). In any case,
be aware that the CGI module may be removed from Python's standard
library in the near future. With any luck, python-cgi will still be
available via pip install or as Linux distro packages.
More Advanced Programmers: Making it Funnier
I mentioned earlier that I thought the app would be a lot funnier with
a handpicked set of words. I did that long, long ago with my
Star Party
Observing Report Generator (written in Perl; I hadn't yet
started using Python back in 2001). That's easy and fun if you
have the time to spare, or a lot of friends contributing.
You could instead use words taken from a set of input documents.
For instance, only use words that appear in Shakespeare's plays, or
in company mission statements, or in Wikipedia articles about dog breeds
(this involves some web scraping, but Python is good at that too;
I like
BeautifulSoup).
Or you could let users contribute their own ideas for good words to use,
storing the user suggestions in a database.
Another way to make the words seem more appropriate and less random
might be to use one of the many natural language packages for Python,
such as NLTK, the Natural Language Toolkit. That way, you could
control how often you used adjectives vs. nouns, and avoid using verbs
or articles at all.
Random word generators seem like a silly and trivial programming
exercise -- because they are! But they're also a fun starting
point for more advanced explorations with Python.
This is Part III of a four-part article on ray tracing digital elevation
model (DEM) data.
The goal: render a ray-traced image of mountains from a digital
elevation model (DEM).
In Part II, I showed how the povray camera position and angle
need to be adjusted based on the data, and the position of the light
source depends on the camera position.
In particular, if the camera is too high, you won't see anything
because all the relief will be tiny invisible bumps down below.
If it's too low, it might be below the surface and then you
can't see anything.
If the light source is too high, you'll have no shadows, just a
uniform grey surface.
That's easy enough to calculate for a simple test image like the one I
used in Part II, where you know exactly what's in the file.
But what about real DEM data where the elevations can vary?
Explore Your Test Data
For a test, I downloaded some data that includes the peaks I can see
from White Rock in the local Jemez and Sangre de Cristo mountains.
(or whatever your favorite image viewer is, if not
pho).
The image at right shows the hillshade for the data I'm using, with a
yellow cross added at the location I'm going to use for the observer.
Sanity check: do the lowest and highest elevations look right?
Let's look in both meters and feet, using the tricks from Part I.
>>> import gdal
>>> import numpy as np
>>> demdata = gdal.Open('mountains.tif')
>>> demarray = np.array(demdata.GetRasterBand(1).ReadAsArray())
>>> demarray.min(), demarray.max()
(1501, 3974)
>>> print([ x * 3.2808399 for x in (demarray.min(), demarray.max())])
[4924.5406899, 13038.057762600001]
That looks reasonable. Where are those highest and lowest points,
in pixel coordinates?
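The usual numpy recipe for this is argmin()/argmax() plus unravel_index(). Here's a sketch on a small stand-in array, since the real calls need mountains.tif; run on demarray, the same two lines give the coordinates discussed next:

```python
import numpy as np

# Stand-in for demarray:
dem = np.array([[5, 1, 9],
                [7, 3, 2]])

# argmin()/argmax() return indices into the flattened array;
# unravel_index() converts them back to (row, column) pairs:
lowest = np.unravel_index(dem.argmin(), dem.shape)
highest = np.unravel_index(dem.argmax(), dem.shape)
print(lowest, highest)
```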
Those coordinates are reversed because of the way numpy arrays
are organized: (1386, 645) in the image looks like
Truchas Peak (the highest peak in this part of the Sangres), while
(175, 1667) is where the Rio Grande disappears downstream off the
bottom left edge of the map -- not an unreasonable place to expect to
find a low point. If you're having trouble eyeballing the coordinates,
load the hillshade into GIMP and watch the coordinates reported at the
bottom of the screen as you move the mouse.
While you're here, check the image width and height. You'll need it later.
>>> demarray.shape
(1680, 2160)
Again, those are backward: they're the image height, width.
Choose an Observing Spot
Let's pick a viewing spot: Overlook Point in White Rock
(marked with the yellow cross on the image above).
Its coordinates are -106.1803, 35.827. What are the pixel coordinates?
Using the formula from the end of Part I:
>>> import affine
>>> affine_transform = affine.Affine.from_gdal(*demdata.GetGeoTransform())
>>> inverse_transform = ~affine_transform
>>> [ round(f) for f in inverse_transform * (-106.1803, 35.827) ]
[744, 808]
Just to double-check, what's the elevation at that point in the image?
Note again that the numpy array needs the coordinates in reverse order:
Y first, then X.
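In other words, the check is just an array lookup with the axes swapped. A toy demonstration (the elevations here are invented; on the real data you'd index the full demarray from the session above):

```python
import numpy as np

# Invented 2x2 "DEM"; numpy indexes [row][col], i.e. [y][x].
demarray = np.array([[1501, 1600],
                     [1700, 1878]])

px, py = 1, 1                  # pixel coordinates from the inverse transform
elevation = demarray[py][px]   # Y first, then X
print(elevation)               # -> 1878
```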
The camera should be at the observer's position, and povray needs
that as a line like
location <rightward, upward, forward>
where those numbers are fractions of 1.
The image size in pixels is
2160x1680, and the observer is at pixel location (744, 808).
So the first and third coordinates of location should
be 744/2160 and 808/1680, right?
Well, almost. That Y coordinate of 808 is measured from the top,
while povray measures from the bottom. So the third coordinate is
actually 1. - 808/1680.
Now we need height, but how do you normalize that? That's another thing
nobody seems to document anywhere I can find; but since we're
using a 16-bit PNG, I'll guess the maximum is 2^16, or 65536.
That's meters, so DEM files can specify some darned high mountains!
So that's why that location <0, .25, 0> line I got
from the Mapping Hacks book didn't work: it put the camera at
.25 * 65536 or 16,384 meters elevation, waaaaay up high in the sky.
My observer at Overlook Point is at 1,878 meters elevation, which
corresponds to a povray height of 1878/65536. I'll use the same value
for the look_at height to look horizontally. So now we can
calculate all three location coordinates: 744/2160 = .3444,
1878/65536 = 0.0287, 1. - 808/1680 = 0.5190:
location <.3444, 0.0287, .519>
Povray Glitches
Except, not so fast: that doesn't work. Remember how I mentioned in
Part II that povray doesn't work if the camera location is at ground
level? You have to put the camera some unspecified minimum distance
above ground level before you see anything. I fiddled around a bit and
found that if I multiplied the ground level height by 1.15 it worked,
but 1.1 wasn't enough. I have no idea whether that will work in
general. All I can tell you is, if you're setting location to
be near ground level and the generated image looks super dark
regardless of where your light source is, try raising your location a
bit higher. I'll use 1878/65536 * 1.15 = 0.033.
For a first test, try setting look_at to some fixed place in
the image, like the center of the top (north) edge (right .5, forward 1).
That means you won't be looking exactly north, but that's okay, we're
just testing and will worry about that later.
The middle value, the elevation, is the same as the camera elevation
so the camera will be pointed horizontally. (look_at can be at
ground level or even lower, if you want to look down.)
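Putting the arithmetic from the last few paragraphs in one place, here's a sketch of the whole calculation. Remember that the 2^16 vertical scale is a guess and the 1.15 factor is purely empirical:

```python
# Camera placement arithmetic for the povray scene.
width, height = 2160, 1680   # DEM image size in pixels
px, py = 744, 808            # observer's pixel location
elev_m = 1878                # observer's elevation in meters
vscale = 2**16               # guessed max elevation for a 16-bit PNG

cam_height = elev_m / vscale * 1.15    # empirical fudge: lift the camera
                                       # slightly off the ground
location = (px / width, cam_height, 1. - py / height)
look_at = (.5, elev_m / vscale, 1.)    # center of the north edge, horizontal

print("location <%.4f, %.4f, %.4f>" % location)
print("look_at  <%.4f, %.4f, %.4f>" % look_at)
```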
Where should the light source be? I tried to be clever and put the
light source at some predictable place over the observer's right
shoulder, and most of the time it didn't work. I ended up just
fiddling with the numbers until povray produced visible terrain.
That's another one of those mysterious povray quirks. This light
source worked fairly well for my DEM data, but feel free to experiment:
And once I finally got to this point I could immediately see it was correct.
That's Black Mesa (Tunyo) out in the valley a little right of center,
and I can see White Rock
canyon in the foreground with Otowi Peak on the other side of the canyon.
(I strongly recommend, when you experiment with this, that you choose a
scene that's very distinctive and very familiar to you, otherwise you'll
never be sure if you got it right.)
Next Steps
Now I've accomplished my goal: taking a DEM map and ray-tracing it.
But I wanted even more. I wanted a 360-degree panorama of
all the mountains around my observing point.
Povray can't do that by itself, but
in Part IV, I'll show how to make a series of povray renderings
and stitch them together into a panorama.
Part IV,
Making a Panorama
from Raytraced DEM Images
One of my hiking buddies uses a phone app called Peak Finder. It's a neat
program that lets you spin around and identify the mountain peaks you see.
Alas, I can't use it, because it won't work without a compass, and
[expletive deleted] Samsung disables the compass in their phones, even
though the hardware is there. I've often wondered if I could write a program
that would do something similar. I could use the images in planetarium
shows, and could even include additions like
predicting exactly when and where the moon would rise on a given date.
Before plotting any mountains, first you need some elevation data,
called a Digital Elevation Model or DEM.
Get the DEM data
Digital Elevation Models are available from a variety of sources in a
variety of formats. But the downloaders and formats aren't as well
documented as they could be, so it can be a confusing mess.
USGS
USGS steers you to the somewhat flaky and confusing
National Map
Download Client. Under Data in the left sidebar, click
on Elevation Products (3DEP), select the accuracy you need,
then zoom and pan the map until it shows what you need.
Current Extent doesn't seem to work consistently, so use
Box/Point and sweep out a rectangle.
Then click on Find products.
Each "product" should have a download link next to it,
or if not, you can put it in your cart and View Cart.
Except that National Map tiles often don't load, so you can end up with a
mostly-empty map (as shown here) where you have no idea what area
you're choosing. Once this starts happening, switching to a different
set of tiles probably won't help; all you can do is wait a few hours
and hope it gets better.
Or get your DEM data somewhere else. Even if you stick with the USGS,
they have a different set of DEM data, called SRTM (it comes from the
Shuttle Radar Topography Mission) which is downloaded from a completely
different place,
SRTM DEM data, Earth Explorer.
It's marginally easier to use than the National Map and less flaky
about tile loading, and it gives you
GeoTIFF files instead of zip files containing various ArcGIS formats.
Sounds good so far; but once you've wasted time defining the area you
want, suddenly it reveals that you can't download anything unless you
first make an account, and
you have to go through a long registration process that demands name,
address and phone number (!) before you can actually download anything.
Of course neither of these sources lets you just download data for
a given set of coordinates; you have to go through the interactive
website any time you want anything. So even if you don't mind giving
the USGS your address and phone number, if you want something you can
run from a program, you need to go elsewhere.
The best I found is
OpenTopography's
SRTM API, which lets you download arbitrary areas specified by
latitude/longitude bounding boxes.
Verify the Data: gdaldem
Okay, you've got some DEM data. Did you get the area you meant to get?
Is there any data there? DEM data often comes packaged as an image,
primarily GeoTIFF. You might think you could simply view that in an
image viewer -- after all, those nice preview images they show you
on those interactive downloaders show the terrain nicely.
But the actual DEM data is scaled so that even high mountains
don't show up; you probably won't be able to see anything but blackness.
One way of viewing a DEM file as an image is to load it into GIMP.
Bring up Colors->Levels, go to the input slider (the upper
of the two sliders) and slide the rightmost triangle leftward until it's
near the right edge of the histogram. Don't save it that way (that will
mess up the absolute elevations in the file); it's just
a quick way of viewing the data.
Update: A better, one-step way is Colors > Auto > Stretch Contrast.
A better way to check DEM data files is a beautiful little program
called gdaldem
(part of the GDAL package).
It has several options, like generating a hillshade image:
Then view hillshade.png in your favorite image viewer and see
if it looks like you expect. Having read quite a few elaborate
tutorials on hillshade generation over the years, I was blown away at
how easy it is with gdaldem.
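For the curious, the computation gdaldem does isn't deep. Here's a rough numpy sketch of the standard hillshade formula, with the sun at azimuth 315° and altitude 45° like gdaldem's defaults; it's a simplification that ignores gdaldem's edge handling and z-factor options:

```python
import numpy as np

def hillshade(dem, azimuth=315., altitude=45.):
    """Rough hillshade: 0-255 illumination of each cell by a distant sun."""
    az, alt = np.radians(azimuth), np.radians(altitude)
    gy, gx = np.gradient(dem.astype(float))
    slope = np.pi / 2. - np.arctan(np.hypot(gx, gy))
    aspect = np.arctan2(-gx, gy)
    shaded = (np.sin(alt) * np.sin(slope)
              + np.cos(alt) * np.cos(slope) * np.cos(az - aspect))
    return 255. * (shaded + 1.) / 2.

# A flat plain comes out uniformly lit:
print(hillshade(np.ones((3, 3))).round(1))
```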
Here are some other operations you can do on DEM data.
Translate the Data to Another Format
gdal has lots more useful stuff beyond gdaldem. For instance, my
ultimate goal, ray tracing, will need a PNG:
gdal_translate can recognize most DEM formats. If you have a
complicated multi-file format like ARCGIS,
try using the name of the directory where the files live.
Get Vertical Limits, for Scaling
What's the highest point in your data, and at what coordinates does that
peak occur? You can find the highest and lowest points easily with Python's
gdal package if you convert the gdal.Dataset into a numpy array:
That gives you the highest and lowest elevations. But where are they
in the data? That's not super intuitive in numpy; the best way I've
found is:
indices = np.where(demarray == demarray.max())
ymax, xmax = indices[0][0], indices[1][0]
print("The highest point is", demarray[ymax][xmax])
print(" at pixel location", xmax, ymax)
Translate Between Lat/Lon and Pixel Coordinates
But now that you have the pixel coordinates of the high point, how do
you map that back to latitude and longitude?
That's trickier, but here's one way, using the affine package:
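The affine package is a thin wrapper around GDAL's six-number geotransform, so it may help to see the underlying arithmetic spelled out. This sketch uses a made-up north-up, 1-arcsecond geotransform; the real one comes from demdata.GetGeoTransform():

```python
def pixel_to_lonlat(gt, px, py):
    """Apply a GDAL geotransform: pixel (col, row) -> (lon, lat).
    gt = (x_origin, pixel_w, row_rot, y_origin, col_rot, pixel_h)."""
    lon = gt[0] + px * gt[1] + py * gt[2]
    lat = gt[3] + px * gt[4] + py * gt[5]
    return lon, lat

# Made-up geotransform: 1-arcsecond pixels, north up (negative pixel height):
gt = (-106.6, 1. / 3600., 0., 36.2, 0., -1. / 3600.)
print(pixel_to_lonlat(gt, 744, 808))
```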
What about the other way? You have latitude and longitude
and you want to know what pixel location that corresponds to?
Define an inverse transformation:
inverse_transform = ~affine_transform
px, py = [ round(f) for f in inverse_transform * (lon, lat) ]
Those transforms will become important once we get to Part III.
But first, Part II, Understand Povray:
Height Fields in Povray
Dave and I will be presenting a free program on Stonehenge at the Los
Alamos Nature Center tomorrow, June 14.
The nature center has a list of programs people have asked for, and
Stonehenge came up as a topic in our quarterly meeting half a year ago.
Remembering my seventh grade fascination
with Stonehenge and its astronomical alignments -- I discovered
Stonehenge Decoded at the local library, and built a desktop
model showing the stones and their alignments -- I volunteered.
But after some further reading, I realized that not all of those
alignments are all they're cracked up to be and that there might not
be much of astronomical interest to talk about, and I un-volunteered.
But after thinking about it for a bit, I realized that "not all
they're cracked up to be" makes an interesting topic in itself.
So in the next round of planning, I re-volunteered; the result is
tomorrow night's presentation.
The talk will include a lot of history of Stonehenge and its construction,
and a review of some other important or amusing henges around the world.
But this article is on the astronomy, or lack thereof.
The Background: Stonehenge Decoded
Stonehenge famously aligns with the summer solstice sunrise, and
that's when tens of thousands of people flock to Salisbury, UK to
see the event. (I'm told that the rest of the time, the monument is
fenced off so you can't get very close to it, though I've never had
the opportunity to visit.)
Curiously, archaeological evidence suggests that the summer solstice
wasn't the big time for prehistorical gatherings at Stonehenge; the
time when it was most heavily used was the winter solstice, when there's
a less obvious alignment in the other direction. But never mind that.
In 1963, Gerald Hawkins wrote an article in Nature, which he
followed up two years later with a book entitled Stonehenge Decoded.
Hawkins had access to an IBM 7090, capable of a then-impressive
100 Kflops (thousand floating point operations per second; compare
a Raspberry Pi 3 at about 190 Mflops, or about a hundred Gflops for
something like an Intel i5). It cost $2.9 million (nearly $20 million
in today's dollars).
Using the 7090, Hawkins mapped the positions of all of Stonehenge's
major stones, then looked for interesting alignments with the sun and moon.
He found quite a few of them.
(Hawkins and Fred Hoyle also had a theory about the fifty-six Aubrey
holes being a lunar eclipse predictor, which captured my seventh-grade
imagination but which most researchers today think was more likely
just a coincidence.)
But I got to thinking ... Hawkins mapped at least 38 stones if you
don't count the Aubrey holes. If you take 38 randomly distributed points,
what are the chances that you'll find interesting astronomical alignments?
A Modern Re-Creation of Hawkins' Work
Programmers today have it a lot easier than Hawkins did.
We have languages like Python, with libraries like PyEphem to handle
the astronomical calculations.
And it doesn't hurt that our computers are about a million times faster.
Anyway, my script,
skyalignments.py
takes a GPX file containing a list of geographic coordinates and compares
those points to sunrise and sunset at the equinoxes and solstices,
as well as the full moonrise and moonset nearest the solstice or equinox.
It can find alignments among all the points in the GPX file, or from a
specified "observer" point to each point in the file. It allows a slop
of a few degrees, 2 degrees by default; this is about four times the
diameter of the sun or moon, but a half-step from your observing
position can make a bigger difference than that. I don't know how
much slop Hawkins used; I'd love to see his code.
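The core test in a script like this is simple: compute the azimuth (compass bearing) from the observer to each point and see whether it falls within the slop of a rising or setting azimuth. Here's a sketch of that piece; the bearing formula is the standard great-circle one, not necessarily what skyalignments.py uses:

```python
import math

def bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees east of north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.

def aligns(azimuth1, azimuth2, slop=2.):
    """True if two azimuths agree within slop degrees (wrapping at 360)."""
    diff = abs(azimuth1 - azimuth2) % 360.
    return min(diff, 360. - diff) <= slop

# A point due east on the equator bears 90 degrees:
print(bearing(0., 0., 0., 1.))    # -> 90.0
print(aligns(359.5, 0.5))         # -> True
```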
My first thought was, what if you stand on a mountain peak and look
around you at other mountain peaks? (It's easy to get GPS coordinates
for peaks; if you can't find them online you can click on them on a map.)
So I plotted the major peaks in the Jemez and Sangre de Cristo mountains
that I figured were all mutually visible. It came to 22 points; about
half what Hawkins was working with.
Yikes! Way too many. What if I cut it down? So I tried eliminating all
but the really obvious ones, the ones you really notice from across
the valley. The most prominent 11 peaks: 5 in the Jemez, 6 in the Sangres.
That was a little more manageable. Now I was down to only 22 alignments.
Now, I'm pretty sure that the Ancient Ones -- or aliens -- didn't lay
out the Jemez and Sangre de Cristo mountains to align with the rising
and setting sun and moon. No, what this tells us is that pretty much any
distribution of points will give you a bunch of astronomical alignments.
And that's just the sun and moon, all Hawkins was considering. If you
look for writing on astronomical alignments in ancient monuments,
you'll find people claiming to have found alignments with all
sorts of other rising and setting bodies, like Sirius and
Orion's belt. Imagine how many alignments I could have found if I'd
included the hundred brightest stars.
So I'm not convinced.
Certainly Stonehenge's solstice alignment looks real; I'm not disputing that.
And there are lots of other archaeoastronomy sites that are even
more convincing, like the Chaco sun dagger. But I've also seen plenty of
web pages, and plenty of talks, where someone maps out a collection of
points at an ancient site and uses alignments among them as proof that
it was an ancient observatory. I suspect most of those alignments are more
evidence of random chance and wishful thinking than archeoastronomy.
Lately I've been running with my default python set to Python 3.
Debian still uses Python 2 as the default, which is reasonable, but
adding a ~/bin/python symlink to /usr/bin/python3
helps me preview scripts that might become a problem once Debian
does switch. I thought I had converted most of my Python scripts to
Python 3 already, but this link is catching some I didn't convert.
Python has a nice script called 2to3 that can convert the bulk
of most scripts with little fanfare.
The biggest hassles that 2to3 can't handle are network related
(urllib and urllib2) and, the big one, user interfaces.
PyGTK, based on GTK2, has no Python 3 equivalent; in Python 3,
the only option is to use GObject Introspection (gi) and
GTK3. Since there's almost no documentation on python-gi and gtk3,
converting a GTK script always involves a lot of fumbling and guesswork.
A few days ago I tried to play an MP3 in my little
musicplayer.py
script and discovered I'd never updated it. I have enough gi/GTK3 scripts
by now that I thought something with such a simple user interface
would be easy. Shows how much I know about GTK3!
I got the basic window ported pretty easily, but it looked terrible:
huge margins everywhere, and no styling on the text, like the bold,
large-sized text I had previously used to highlight the name of the
currently playing song. I tried various approaches, but a lot of the
old methods of styling have been deprecated in GTK3; you're supposed to
use CSS. Except, of course, there's no documentation on it, and it turns
out the CSS accepted by GTK3 is a tiny subset of the CSS you can use in
HTML pages, but what the subset is doesn't seem to be documented anywhere.
How to Apply a Stylesheet
The first task was to get any CSS at all working.
The
GNOME Journal: Styling GTK with CSS
was helpful in getting started, but had a lot of information that
doesn't work (perhaps it did once). At least it gave me this basic snippet:
Great! If all you want to do is turn the whole app red.
But in reality, you'll want to style different widgets differently.
At least some classes have class names:
css = 'button { background-color: #f00; }'
I found other pages suggesting using GtkButton in CSS,
but that didn't work for me. How do you find the right class names?
No idea, I never found a reference for that. Just guess, I guess.
User-set Class Names
What about classes -- for instance, make all the buttons in a ButtonBox white?
You can add classes this way:
There is, amazingly, a page on
which
CSS properties GTK3 supports.
That page doesn't mention it, but some properties like :hover
are also supported. So you can write CSS tweaks like
and then use CSS that affects all the buttons inside the buttonbox:
#buttonbox button { color: red; }
No mixed CSS Inside Labels
My biggest disappointment was that I couldn't mix styles inside a label.
You can't do something like
label.set_label('<span class="headline">Headline</span> Normal text')
and expect to style the different parts separately.
You can use very simple markup like <b>bold</b> normal,
but anything further gives errors like
"error parsing markup: Attribute 'class' is not allowed on the
<span> tag" (you'll get the same error if you try "id").
I had to make separate GtkLabels for each text size and style I wanted,
which is a lot more work. If you wanted to mix styles and have them
reflow as the content length changed, I don't know how (or if)
you could do it.
Fortunately, I don't strictly need that for this little app.
So for now, I'm happy to have gotten this much working.
A friend and I were talking about temperature curves: specifically,
the way the temperature sinks in the evening but then frequently rises
again before it really starts cooling off.
I thought it would be fun to plot the curve of temperature as a
function of time over successive days, as a 3-D plot. I knew matplotlib
had a way to do 3D plots, but I've never actually generated one.
Well, it turns out there are lots of examples, but they all start by
generating mysterious data blobs, and none of them explain the
structure of the data they're using, and the documentation has
mysterious parameters like "zs" that aren't explained anywhere. So
getting something that worked was a fiddly process. Creating a color
version, to distinguish the graphs better, was even more fiddly.
So I wrote an example that I hope will make it a little clearer for
anyone trying to use this library. It can plot using just lines:
... or it can plot in color, cycling colors manually because by default
matplotlib makes adjacent colors similar, exactly the opposite of what
you'd want:
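Since the data structure is the confusing part: each call to Axes3D's plot() draws one curve, with zs giving that layer's position along the depth axis and zdir naming which axis that is. Here's a minimal sketch with synthetic temperature data and the manual color cycling described above:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')    # render without a display
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D   # noqa: registers the 3d projection

hours = np.arange(24)
colors = ['red', 'green', 'blue', 'orange', 'purple']

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for day in range(5):
    # Synthetic temperature curve, one layer per day:
    temps = 10. + 8. * np.sin((hours - 8.) / 24. * 2. * np.pi) + day
    ax.plot(hours, temps, zs=day, zdir='y', color=colors[day % len(colors)])

ax.set_xlabel('Hour')
ax.set_ylabel('Day')
ax.set_zlabel('Temperature (C)')
fig.savefig('tempcurves.png')
```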
All is not perfect. Axes3D gets a bit confused sometimes about which
layer is supposed to be in front of which other layer. You can see that
on the two plots: in both cases, the fourth and fifth layers from the
front are reversed, so the fifth layer is drawn in front of the fourth
layer. I haven't yet found anyone in the matplotlib organization who
seems to know much about Axes3D; eventually I'll file a bug but I want
to write a shorter, clearer test case to illustrate the problem.
Still, even with the bugs it's a useful technique to know.
A while back, Dave ordered a weather station.
His research pointed to the
Ambient Weather WS-2000 as the best bang for the buck as far as accuracy
(after it's calibrated, which is a time consuming and exacting process
where you compare readings to a known-good mercury thermometer, a process
that I suspect most weather station owners don't bother with).
It comes with a little 7" display console that sits indoors and
reads the radio signal from the outside station as well as a second
thermometer inside, then displays all the current weather data.
It also uses wi-fi to report the data upstream to Ambient and,
optionally, to a weather site such as Wunderground.
(Which we did for a while, but now Wunderground is closing off
their public API, so why give them data if they're not going to
make it easy to share it?)
Having the console readout and the Ambient "dashboard" is all very
nice, but of course, being a data geek, I wanted a way to get the data
myself, so I could plot it, save it or otherwise process it. And
that's where Ambient falls short. The console, though it's already
talking on wi-fi, gives you no way to get the data. They sell a
separate unit called an "Observer" that provides a web page you
can scrape, and we actually ordered one, but it turned out to be
buggy and difficult to use, giving numbers that were substantially
different from what the console showed, and randomly failing to answer,
and we ended up returning the observer for a refund.
The other way of getting the data is online. Ambient provides an API
you can use for that purpose, if you email them for a key. It
mostly works, but it sometimes lags considerably behind real time, and
it seems crazy to have to beg for a key and then get data from a
company website that originated in our own backyard.
What I really wanted to do was read the signal from the weather
station directly. I'd planned for ages to look into how to do that,
but I'm a complete newbie to software defined radio and wasn't
sure where to start. Then one day I noticed an SDR discussion
on the #raspberrypi IRC channel on Freenode where I often hang out.
I jumped in, asked some questions, and got pointed in the right direction
and referred to the friendly and helpful #rtlsdr Freenode channel.
An Inexpensive SDR Dongle
Update:
Take everything that follows with a grain of salt.
I got it working, everything was great -- then when I tried it the
very next day after I wrote the article, none of it worked. At all.
The SDR dongle no longer saw anything from the station, even though
the station was clearly still sending to the console.
I never did get it working reliably, nor did I ever find out what
the problem was, and in the end I gave up.
Occasionally the dongle will see the weather station's output,
but most of the time it doesn't. It might be a temperature sensitivity
issue (though the dongle I bought is supposed to be temperature compensated).
Or maybe it's gremlins. Whatever it is, be warned that although the
information below might get you started, it probably won't get you
a reliably working SDR solution. I wish I knew the answer.
Indeed it did. The command to monitor the weather station is
rtl_433 -f 915M
rtl_433 already knows the protocol for the WS-2000,
so I didn't even need to do any decoding or reverse engineering;
it produces a running account of the periodic signals being
broadcast from the station. rtl_433 also helpfully offers -F json
and -F csv options, along with a few other formats.
What a great program!
JSON turned out to be the easiest for me to use; initially I thought
CSV would be more compact, but rtl_433's CSV format includes fields
for every possible quantity a weather station could ever broadcast.
When you think about it, that makes sense: once you're outputting
CSV you can't add a new field in mid-stream, so you'd better be
ready for anything. JSON, on the other hand, lets you report
just whatever the weather station reports, and it's easy to parse
from nearly any programming language.
Testing the SDR Dongle
Full disclosure: at first, rtl_433 -f 915M wasn't showing
me anything and I still don't know why. Maybe I had a loose connection
on the antenna, or maybe I got unlucky and the weather station picked
the exact wrong time to take a vacation. But while I was testing,
I found another program that was very helpful in testing whether
my SDR dongle was working: rtl_fm, which plays radio stations.
The only trick is finding the right arguments,
since the example from the man page just played static.
Here's what worked for me:
rtl_fm -f 101.1M -M fm -g 20 -s 200k -A fast -r 32k -l 0 -E deemp | play -r 32k -t raw -e s -b 16 -c 1 -V1 -
That command plays the 101.1 FM radio station. (I had to do a web search
to give me some frequencies of local radio stations; it's been
a long time since I listened to normal radio.)
Once I knew the dongle was working, I needed to verify what frequency
the weather station was using for its broadcasts.
What I really wanted was something that would scan frequencies around
915M and tell me what it found. Everyone kept pointing me to a program
called Gqrx. But it turns out Gqrx on Linux requires PulseAudio and
absolutely refuses to work or install without it, even if you have no
interest in playing audio. I didn't want to break my system's sound
(I've never managed to get sound working reliably under PulseAudio),
and although it's supposedly possible to build Gqrx without Pulse,
it's a difficult build: I saw plenty of horror stories, and it
requires Boost, always a sign that a build will be problematic.
I fiddled with it a little but decided it wasn't a good time investment.
I eventually found a scanner that worked:
RTLSDR-Scanner.
It let me set limiting frequencies and scan between them, and by
setting it to accumulate, I was able to verify that indeed, the
weather station (or something else!) was sending a signal on 915 MHz.
I guess by then, the original problem had fixed itself, and after that,
rtl_433 started showing me signals from the weather station.
It's not super polished, but it's the only scanner I've found that
works without requiring PulseAudio.
That Puzzling Rainfall Number
One mystery remained to be solved. The JSON I was getting from the
weather station looked like this (I've reformatted it for readability):
This on a day when it hadn't rained in ages. What was up with that
"rainfall_mm" : 90.678 ?
I asked on the rtl_433 list and got a prompt and helpful answer:
it's a cumulative number, since some unspecified time in the past
(possibly the last time the battery was changed?) So as long as
I make a note of the rainfall_mm number, any change in
that number means new rainfall.
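So the client just has to diff successive cumulative readings. A sketch of that logic on a few made-up JSON records -- the rainfall_mm field name is real, but the model string and values here are invented:

```python
import json

def rain_deltas(records):
    """Yield incremental rainfall (mm) from cumulative rainfall_mm readings."""
    last = None
    for rec in records:
        total = rec.get("rainfall_mm")
        if total is None:
            continue
        yield 0. if last is None else total - last
        last = total

# Made-up sequence, like successive lines of rtl_433 -F json output:
lines = [
    '{"model": "WS2000-station", "rainfall_mm": 90.678}',
    '{"model": "WS2000-station", "rainfall_mm": 90.678}',
    '{"model": "WS2000-station", "rainfall_mm": 92.178}',
]
deltas = list(rain_deltas(json.loads(l) for l in lines))
print(deltas)
```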
This being a snowy winter, I haven't been able to test that yet:
the WS-2000 doesn't measure snowfall unless some snow happens to melt
in the rain cup.
Some of the other numbers, like uv and uvi, are in mysterious unknown
units and sometimes give results that make no sense (why doesn't
uv go to zero at night? You're telling me that there's that much UV
in starlight?), but I already knew that was an issue with the Ambient.
It's not rtl_433's fault.
I notice that the numbers are often a bit different from what the
Ambient API reports; apparently they do some massaging of the numbers,
and the console has its own adjustment factors too.
We'll have to do some more calibration with a mercury thermometer
to see which set of numbers is right.
Anyway, cool stuff! It took no time at all to write a simple client
for my WatchWeather
web app that runs rtl_433 and monitors the JSON output.
I already had WatchWeather clients collecting reports from
Raspberry Pi Zero Ws sitting at various places in the house with
temperature/humidity sensors attached; and now my WatchWeather page
can include the weather station itself.
For the last few weeks I've been consumed with a project I started
last year and then put aside for a while: a bill tracker.
The project sprung out of frustration at the difficulty of following
bills as they pass through the New Mexico legislature. Bills I was
interested in would die in committee, or they would make it to a
vote, and I'd read about it a few days later and wish I'd known
that it was a good time to write my representative or show up at
the Roundhouse to speak. (I've never spoken at the Roundhouse,
and whether I'd have the courage to actually do it remains to be
seen, but at least I'd like to have the chance to decide.)
New Mexico has a Legislative web site
where you can see the status of each bill, and they even offer a way
to register and save a list of bills; but then there's no way to
get alerts about bills that change status and might be coming up for debate.
New Mexico legislative sessions are incredibly short: 60 days in
odd years, 30 days in even. During last year's 30-day session,
I wrote some Python code that scraped the HTML pages describing a bill,
extracted the useful information like when the bill last changed
status and where it was right now, presented the information
in a table where the user could easily scan it, and emailed the user a
daily summary.
Fortunately, the nmlegis.gov site, while it doesn't offer raw data for
bill status, at least uses lots of id tags in its HTML which make them
relatively easy to scrape.
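That kind of id-based scraping needs nothing heavier than the standard library. A sketch with html.parser -- the id value and page fragment here are invented, not nmlegis.gov's actual markup:

```python
from html.parser import HTMLParser

class IdTextExtractor(HTMLParser):
    """Collect the text content of elements with the ids we care about."""
    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.current = None
        self.results = {}

    def handle_starttag(self, tag, attrs):
        idval = dict(attrs).get('id')
        if idval in self.wanted_ids:
            self.current = idval
            self.results[idval] = ''

    def handle_data(self, data):
        if self.current:
            self.results[self.current] += data

    def handle_endtag(self, tag):
        self.current = None

# Invented page fragment with an id tag like the ones nmlegis.gov uses:
page = '<td><span id="lblBillLocation">House Judiciary</span></td>'
parser = IdTextExtractor(['lblBillLocation'])
parser.feed(page)
print(parser.results)   # -> {'lblBillLocation': 'House Judiciary'}
```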
Then the session ended and there was no further way to test it,
since bills' statuses were no longer changing. So the billtracker
moved to the back burner.
In the runup to this year's 60-day session, I started with Flask, a
lightweight Python web library I've used for a couple of small
projects, and added some extensions that help Flask handle tasks
like user accounts. Then I patched in the legislative web scraping
code from last year, and the result was
The New Mexico Bill Tracker.
I passed the word to some friends in the League of Women Voters and
the Sierra Club to help me test it, and I think (hope) it's ready for
wider testing.
There's lots more I'd like to do, of course. I still have no way of
knowing when a bill will be up for debate. It looks like this year
the Legislative web site is showing committee schedules in a fairly
standard way, as opposed to the unparseable PDFs they used in past years,
so I may be able to get that. Not that legislative committees actually
stick to their published schedules; but at least it's a start.
New Mexico readers (or anyone else interested in following the
progress of New Mexico bills) are invited to try it. Let me know about
any problems you encounter. And if you want to adapt the billtracker
for use in another state, send me a note! I'd love to see it extended
and would be happy to work with you. Here's the source:
BillTracker on GitHub.
My machine has recently developed an overheating problem.
I haven't found a solution for that yet -- you'd think Linux would
have a way to automatically kill or suspend processes based on CPU
temperature, but apparently not -- but my investigations led me
down one interesting road: how to write
a Python script that finds CPU hogs.
The psutil module can get a list
of processes with psutil.process_iter(), which returns
Process objects that have a cpu_percent() call.
Great! Except it always returns 0.0, even for known hogs like Firefox,
or if you start up a VLC and make it play video scaled to the monitor size.
That's because cpu_percent() needs to run twice,
with an interval in between:
it records the elapsed run time and sees how much it changes.
You can pass an interval to cpu_percent()
(the units aren't documented, but apparently they're seconds).
But if you're calling it on more than one process -- as you usually
will be -- it's better not to wait for each process.
You have to wait at least a quarter of a second to get useful
numbers, and longer is better. If you do that for every process on the
system, you'll be waiting a long time.
Instead, use cpu_percent() in non-blocking mode.
Pass None as the interval (or leave it blank since None is
the default), then loop over the process list and call
proc.cpu_percent(None) on each process, throwing away the
results the first time.
Then sleep for a while and repeat the loop: the second time,
cpu_percent() will give you useful numbers.
import psutil
import sys
import time

def hoglist(delay=5):
    '''Return a list of processes using a nonzero CPU percentage
       during the interval specified by delay (seconds),
       sorted so the biggest hog is first.
    '''
    processes = list(psutil.process_iter())
    for proc in processes:
        proc.cpu_percent(None)   # non-blocking; throw away the first bogus value

    print("Sleeping ...")
    sys.stdout.flush()
    time.sleep(delay)
    print()

    procs = []
    for proc in processes:
        percent = proc.cpu_percent(None)
        if percent:
            procs.append((proc.name(), percent))

    procs.sort(key=lambda x: x[1], reverse=True)
    return procs

if __name__ == '__main__':
    hogs = hoglist()
    for p in hogs:
        print("%20s: %5.2f" % p)
It's a useful trick. Though actually applying this to a daemon
that responds to temperature, to solve my overheating problem, is more
complicated. For one thing, you need rules about special processes. If
your Firefox goes wonky and starts making your X server take lots of
CPU time, you want to suspend Firefox, not the X server.
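The temperature side of such a daemon, at least, is easy to read on Linux
via sysfs. Here's a minimal sketch; the thermal_zone0 path is an
assumption -- it varies by machine, and psutil's sensors_temperatures()
is another option:

```python
import pathlib

def cpu_temperature():
    """Return the CPU temperature in degrees C, or None if unavailable.
    Reads the Linux sysfs thermal zone (a common location, but the
    zone number and even its existence vary by machine)."""
    zone = pathlib.Path("/sys/class/thermal/thermal_zone0/temp")
    try:
        # sysfs reports millidegrees, e.g. 52000 for 52.0 C
        return int(zone.read_text()) / 1000.0
    except (OSError, ValueError):
        return None

temp = cpu_temperature()
if temp is not None and temp > 80:
    print("Too hot! Time to suspend the biggest hog.")
```

A real daemon would loop, combining this reading with hoglist() above,
plus those rules about which processes are safe to suspend.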
Years ago, I saw someone demonstrating an obscure slide presentation
system, and one of the tricks it had was to let you draw on slides
with the mouse. So you could underline or arrow specific points,
or, more important (since underlines and arrows are easily included
in slides), draw something in response to an audience question.
Neat feature, but there were other reasons I didn't want to switch to
that particular slide system.
Many years later, and quite happy with my home-grown
htmlpreso system
for HTML-based slides, I was sitting in an astronomy panel discussion
listening to someone explain black holes when it occurred to me:
with HTML Canvas being a fairly mature technology, how hard could
it be to add drawing to my htmlpreso setup? It would just take a javascript
snippet that creates a canvas on top of the existing slide, plus
some basic event handling and drawing code that surely someone else
has already written.
Curled up in front of the fire last night with my laptop, I only needed a
couple of hours to whip up a proof of concept that seems remarkably usable.
I've added it to htmlpreso.
I have to confess, I've never actually felt the need to draw on a slide
during a talk. But I still love knowing that it's possible.
It'll be interesting to see how often I actually use it.
To play with drawing on slides, go to my
HTMLPreso
self-documenting slide set (with JavaScript enabled)
and, on any slide, type Shift-D.
Some color swatches should appear in the upper right of the slide,
and now you can scribble over the tops of slides to your heart's content.
About fifteen years ago, a friend in LinuxChix blogged about doing the
"50-50 Book Challenge". The goal was to read fifty new books in a year,
plus another fifty old books she'd read before.
I had no idea whether this was a lot of books or not. How many books
do I read in a year? I had no idea. But now I wanted to know.
So I started keeping a list: not for the 50-50 challenge specifically,
but just to see what the numbers were like.
It would be easy enough to do this in a spreadsheet, but I'm not
really a spreadsheet kind of girl, unless there's a good reason to
use one, like accounting tables or other numeric data. So I used
a plain text file with a simple, readable format,
like these entries from that first year, 2004:
Dragon Hunter: Roy Chapman Andrews and the Central Asiatic Expeditions, Charles Gallenkamp, Michael J. Novacek
Fascinating account of a series of expeditions in the early 1900s
searching for evidence of early man. Instead, they found
groundbreaking dinosaur discoveries, including the first evidence
of dinosaurs protecting their eggs (Oviraptor).
Life of Pi
Uneven, quirky, weird. Parts of it are good, parts are awful.
I found myself annoyed by it ... but somehow compelled to keep
reading. The ending may have redeemed it.
The Lions of Tsavo : Exploring the Legacy of Africa's Notorious Man-Eaters, Bruce D. Patterson
Excellent overview of the Tsavo lion story, including some recent
findings. Makes me want to find the original book, which turns
out to be public domain in Project Gutenberg.
- Bellwether, Connie Willis
What can I say? Connie Willis is one of my favorite writers and
this is arguably her best book. Everyone should read it.
I can't imagine anyone not liking it.
If there's a punctuation mark in the first column, it's a reread.
(I keep forgetting what character to use, so sometimes it's a dot,
sometimes a dash, sometimes an atsign.)
If there's anything else besides a space, it's a new book.
Lines starting with spaces are short notes on what I thought
of the book. I'm not trying to write formal reviews, just reminders.
If I don't have anything specific to say, I leave it blank or
write a word or two, like "fun" or "disappointing".
Crunching the numbers
That means it's fairly easy to pull out book titles and count them
with grep and wc. For years I just used simple aliases:
All books this year: egrep '^[^ ]' books2019 | wc -l
Just new books: egrep '^[^ -.@]' books2019 | wc -l
Just reread books: egrep '^[-.@]' books2019 | wc -l
But after I had years of accumulated data I started wanting to see
it all together, so I wrote a shell alias that I put in my .zshrc:
booksread() {
    setopt extendedglob
    for f in ~/Docs/Lists/books/books[0-9](#c4); do
        year=$(echo $f | sed 's/.*books//')
        let allbooks=$(egrep '^[^ ]' $f | grep -v 'Book List:' | wc -l)
        let rereads=$(egrep '^[-.@\*]' $f | grep -v 'Book List:' | wc -l)
        printf "%4s: All: %3d New: %3d rereads: %3d\n" \
               $year $allbooks $(($allbooks - $rereads)) $rereads
    done
}
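If shell isn't your thing, the same counting is easy in Python. Here's a
minimal sketch for this file format (a hypothetical helper, not part of
my actual scripts):

```python
import re

def count_books(text):
    """Count entries in a book-list file: returns (all, new, rereads).
    Title lines start in the first column; a punctuation mark there
    marks a reread, and lines starting with spaces are notes."""
    allbooks = new = rereads = 0
    for line in text.splitlines():
        if not line or line[0] == ' ':   # blank line or note line
            continue
        if 'Book List:' in line:         # header line, not a book
            continue
        allbooks += 1
        if re.match(r'[-.@*]', line):    # punctuation in column 1: a reread
            rereads += 1
        else:
            new += 1
    return allbooks, new, rereads

sample = """Dragon Hunter, Charles Gallenkamp
  Fascinating account.
- Bellwether, Connie Willis
  A reread.
"""
print(count_books(sample))   # (2, 1, 1)
```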
In case you're curious, my numbers are all over the map.
Sometimes I beat that 100-book target that the 50-50 people advocated,
other times not. I'm not worried about the overall numbers. Some years
I race through a lot of lightweight series mysteries; other years I
spend more time delving into long nonfiction books.
But I have learned quite a few interesting tidbits.
What Does it all Mean?
I expected my reread count would be quite high.
As it turns out, I don't reread nearly as much as I thought.
I have quite a few "comfort books" that I like to read over and over
again (am I still five years old?), especially when I'm tired or ill.
I sometimes feel guilty about that, like I'm wasting time when I could
be improving my mind. I tell myself that it's not entirely a
waste: by reading these favorite books over and over, perhaps I'll
absorb some of the beautiful rhythms, strong characters, or clever
plot twists, that make me love them; and that maybe some of that will
carry over into my own writing. But it feels like rationalization.
But that first year, 2004, I read 44 new books and reread 9,
including the Lord of the Rings trilogy that I hadn't read
since I was a teenager. So I don't actually "waste" that much time on
rereading. Over the years, my highest reread count was 25 in 2011,
when I reread the whole Tony Hillerman series.
Is my reread count low because I'm conscious of the record-keeping,
and therefore I reread less than I would otherwise? I don't think so.
I'm still happy to pull out a battered copy of Tea with the Black
Dragon or Bellwether or Watership Down or
The Lion when I don't feel up to launching into a new book.
Another thing I wondered:
would keeping count encourage me to read more short mysteries and fewer
weighty non-fiction tomes? I admit I am a bit more aware of book
lengths now -- oh, god, the new Stephenson is how many pages?
-- but I try not to get competitive, even with myself, about numbers,
and I don't let a quest for big numbers keep me from reading Blood
and Thunder or The Invention of Nature. (And I had that
sinking feeling about Stephenson even before I started keeping a book
list. The man can write, but he could use an editor with a firm hand.)
What counts as a book? Do I feel tempted to pile up short,
easy books to "get credit" for them, or to finish a bad book I'm not
enjoying? Sometimes a little, but mostly no. What about novellas?
What about partial reads, like skipping chapters?
I decide on a case by case basis but don't stress over it.
I do keep entries for books I start and don't finish (with spaces at
the beginning of the line so they don't show up in the count), with
notes on why I gave up on them, or where I left off if I intend to go back.
Unexpected Benefits
Keeping track of my reading has turned out to have other benefits.
For instance, it prevents accidental rereads.
Last year Dave checked a mystery out of the library (we read a lot of
the same books, so anything one of us reads, the other will at least
consider). I looked at it and said "That sounds awfully familiar.
Haven't we already read it?" Sure enough, it was on my list from
the previous year, and I hadn't liked it. Dave doesn't keep a book
list, so he started reading, but eventually realized that he, too, had
read it before.
And sometimes my memory of a book isn't very clear, and my notes
on what I thought of a book are useful.
Last year, on a hike, a friend and I got to talking about the efforts
to eradicate rats on southern California's Channel Islands. I said
"Oh, I read an interesting novel about that recently. Was it
Barbara Kingsolver? No, wait ... I think it was T.C. Boyle.
Interesting book, you should check it out."
When I got home, I consulted my book lists and found it in 2011:
When the Killing's Done, T.C. Boyle
A tough slog through part 1, but it gets somewhat better in part 2
(there are actually a few characters you don't hate, finally)
and some plot eventually emerges, near the end of the novel.
I sent my friend an email rescinding my recommendation. I told her the
book does cover some interesting details related to the rat eradication,
but I'd forgotten that it was a poor excuse for a novel. In the end
she decided to read it anyway, and her opinion agreed with mine.
I believe she's started keeping a book list of her own now.
On the other hand, it's also good to have a record of delightful new
discoveries. A gem from last year:
Mr. Penumbra's 24-hour bookstore, Robin Sloan
Unexpectedly good! I read this because Sloan was on the Embedded
podcast, but I didn't expect much. Turns out Sloan can write!
Had me going from the beginning. Also, the glow-in-the-dark books
on the cover were fun.
Even if I forget Sloan's name (sad, I know, but I have a poor memory
for names), when I see a new book of his I'll know to check it out.
I didn't love his second book, Sourdough, quite as much as
Mr. Penumbra, but he's still an author worth following.
Someone asked me about my Javascript
Jupiter code, and whether it used PyEphem. It doesn't, of course,
because it's Javascript, not Python (I wish there was something
as easy as PyEphem for Javascript!); instead it uses code from the book
Astronomical Formulae for Calculators by Jean Meeus.
(His better known Astronomical Algorithms, intended for
computers rather than calculators, is actually harder to use for
programming: it's written for BASIC, and its algorithms are
relatively hard to translate into other languages. Astronomical
Formulae for Calculators, by contrast, concentrates on explaining
the algorithms clearly enough to punch into a calculator by hand,
and that ends up making them fairly easy to implement in a modern
computer language as well.)
Anyway, the person asking also mentioned JPL's
HORIZONS Ephemerides
page, which I've certainly found useful at times.
Years ago, I tried emailing the site maintainer asking if they might
consider releasing the code as open source; it seemed like a
reasonable request, given that it came from a government agency
and didn't involve anything secret. But I never got an answer.
But going to that page today, I find that code is now available!
What's available is a massive toolkit called SPICE
(it's in all capitals, but there's no indication of what it stands for).
It comes from NAIF, which is NASA's Navigation and Ancillary
Information Facility.
SPICE allows for accurate calculations of all sorts of solar system
quantities, from the basic solar system bodies like planets to
all of NASA's active and historical public missions.
It has bindings for quite a few languages, including C.
The official list doesn't include Python, but there's a third-party Python
wrapper called SpiceyPy
that works fine.
The tricky part of programming with SPICE is that most of the code is
hidden away in "kernels" that are specific to the objects and quantities
you're calculating. For any given program you'll probably need to
download at least four "kernels", maybe more. That wouldn't be a
problem except that there's not much help for figuring out which
kernels you need and then finding them. There are lots of SPICE
examples online but few of them tell you which kernels they need,
let alone where to find them.
After wrestling with some of the examples, I learned some tricks for
finding kernels, at least enough to get the basic examples working.
I've collected what I've learned so far into a new GitHub repository:
NAIF SPICE Examples.
The README there explains what I know so far about getting kernels;
as I learn more, I'll update it.
SPICE isn't easy to use, but it's probably much more accurate than
simpler code like PyEphem or my Meeus-based Javascript code, and it
can calculate so many more objects. It's definitely something
worth knowing about for anyone doing solar system simulations.
In my several recent articles about building Firefox from source,
I omitted one minor change I made, which will probably sound a bit silly.
A self-built Firefox thinks its name is "Nightly", so, for example,
the Help menu includes About Nightly.
Somehow I found that unreasonably irritating. It's not
a nightly build; in fact, I hope to build it as seldom as possible,
ideally only after a git pull when new versions are released.
Yet Firefox shows its name in quite a few places, so you're constantly
faced with that "Nightly". After all the work to build Firefox,
why put up with that?
To find where it was coming from,
I used my recursive grep alias which skips the obj- directory plus
things like object files and metadata. This is how I define it in my .zshrc
(obviously, not all of these clauses are necessary for this Firefox
search), and then how I called it to try to find instances of
"Nightly" in the source:
Even with all those exclusions, that still ends up printing
an enormous list. But it turns out all the important hits
are in the browser directory, so you can get away with
running it from there rather than from the top level.
I found a bunch of likely files that all had very similar
"Nightly" lines in them:
Since I didn't know which one was relevant, I changed each of them to
slightly different names, then rebuilt and checked to see which names
I actually saw while running the browser.
It turned out that
browser/branding/unofficial/locales/en-US/brand.dtd
is the file that controls the application name in the Help menu
and in Help->About -- though the title of the
About window is still "Nightly" and I haven't found what controls that.
branding/unofficial/locales/en-US/brand.ftl controls the
"Nightly" references in the Edit->Preferences window.
I don't know what all the others do.
There may be other instances of "Nightly" that appear elsewhere in the app,
controlled by the other files, but I haven't seen them yet.
When I first tried switching to Firefox Quantum,
the regression that bothered me most was
Ctrl-W, which I use everywhere as word erase (try it -- you'll get
addicted, like I am). Ctrl-W deletes words in the URL bar;
but if you type Ctrl-W in a text field on a website,
like when editing a bug report or a "Contact" form,
it closes the current tab, losing everything you've just typed.
It's always worked in Firefox in the past; this is a new problem
with Quantum, and after losing a page of typing for about the
20th time, I was ready to give up and find another browser.
A web search found plenty of people online asking about key bindings
like Ctrl-W, but apparently since the deprecation of XUL and XBL
extensions, Quantum no longer offers any way to change or even
just to disable its built-in key bindings.
I wasted a few days chasing a solution inspired by this
clever way
of remapping keys only for certain windows using
xdotool getactivewindow; I even went so far as to
write a Python script that intercepts keystrokes, determines the
application for the window where the key was typed, and remaps it if
the application and keystroke match a list of keys to be remapped.
So if Ctrl-W is typed in a Firefox window, Firefox will instead
receive Alt-Backspace. (Why not just type Alt-Backspace, you ask?
Because it's much harder to type, can't be typed from the home
position, and isn't in the same place on every keyboard the way
W is.)
But sadly, that approach didn't work because it turned out my window
manager, Openbox, acts on programmatically-generated key bindings as
well as ones that are actually typed. If I type a Ctrl-W and it's in
Firefox, that's fine: my Python program sees it, generates an
Alt-Backspace and everything is groovy.
But if I type a Ctrl-W in any other application, the program
doesn't need to change it, so it generates a Ctrl-W, which Openbox
sees and calls the program again, and you have an infinite loop.
I couldn't find any way around this. And admittedly, it's a horrible
hack having a program intercept every keystroke. So I needed to fix
Firefox somehow.
But after spending days searching for a way to customize Firefox's
keys, to no avail, I came to the conclusion that the only way was to
modify the source code and
rebuild
Firefox from source.
Ironically, one of the snags I hit in building it was that
I'd named my key remapper "pykey.py", and it was still in my
PYTHONPATH; it turns out the Firefox build also has a module called
pykey.py and mine was interfering. But eventually I got the build working.
Firefox Key Bindings
I was lucky: building was the only hard part, because
a very helpful person on Mozilla's #introduction IRC channel
pointed me toward the solution, saving me hours of debugging. Edit
browser/base/content/browser-sets.inc
around line 240
and remove reserved="true" from key_closeWindow.
It turned out I needed to remove reserved="true" from
the adjacent key_close line as well.
In theory, since browser-sets.inc isn't compiled C++, it seems
like you should be able to make this fix without building the whole
source tree. In an actual Firefox release,
browser-sets.inc is part of omni.ja, and indeed if
you unpack omni.ja you'll see the key_closeWindow and
key_close lines. So it seems like you ought to be able to
regenerate omni.ja without rebuilding all the C++ code.
Unfortunately, in practice omni.ja is more complicated than that.
Although you can unzip it and edit the files, if you zip it back up,
Firefox doesn't see it as valid. I guess that's why they renamed it
.ja: long ago it used to be omni.jar and, like other
.jar files, was a standard zip archive that you could edit.
But the new .ja file isn't documented anywhere I could find,
and all the web discussions I found on how to re-create it amounted
to "it's complicated, you probably don't want to try".
And you'd think that I could take the omni.ja file from my
desktop machine, where I built Firefox, and copy it to my laptop,
replacing the omni.ja file from a released copy of Firefox.
But no -- somehow, it isn't seen, and the old key bindings are still
active. They must be duplicated somewhere else, and I haven't figured
out where.
It sure would be nice to have a way to transfer an omni.ja.
Building Firefox on my laptop takes nearly a full day (though
hopefully rebuilding after pulling minor security updates won't be
quite so bad). If anyone knows of a way, please let me know!
After I'd
switched
from the Google Maps API to Leaflet to get my trail map
working on my own website,
the next step was to move it to the Nature Center's website
to replace the broken Google Maps version.
PEEC, unfortunately for me, uses Wordpress (on the theory that this
makes it easier for volunteers and non-technical staff to add
content). I am not a Wordpress person at all; to me, systems
like Wordpress and Drupal mostly add obstacles that mean standard HTML
doesn't work right and has to be modified in nonstandard ways.
This was a case in point.
The Leaflet library for displaying maps relies on calling an
initialization function when the body of the page is loaded:
<body onLoad="javascript:init_trailmap();">
But in a Wordpress website, the <body> tag comes
from Wordpress, so you can't edit it to add an onload.
A web search found lots of people wanting body onloads, and
they had found all sorts of elaborate ruses to get around the problem.
Most of the solutions seemed like they involved editing
site-wide Wordpress files to add special case behavior depending
on the page name. That sounded brittle, especially on a site where
I'm not the Wordpress administrator: would I have to figure this out
all over again every time Wordpress got upgraded?
But I found a trick in a Stack Overflow discussion,
Adding onload to body,
that included a tricky bit of code. There's a javascript function to add
an onload to the <body> tag; then that javascript is wrapped inside a
PHP function. Then, if I'm reading it correctly, the PHP function registers
itself with Wordpress so it will be called when the Wordpress footer is
added; at that point, the PHP will run, which will add the javascript
to the body tag in time for the onload event to call the Javascript. Yikes!
But it worked.
Here's what I ended up with, in the PHP page that Wordpress was
already calling for the page:
<?php
/* Wordpress doesn't give you access to the <body> tag to add a call
 * to init_trailmap(). This is a workaround to dynamically add that tag.
 */
function add_onload() {
?>
  <script type="text/javascript">
    document.getElementsByTagName('body')[0].onload = init_trailmap;
  </script>
<?php
}
add_action( 'wp_footer', 'add_onload' );
?>
A while ago I wrote an
interactive
trail map page for the PEEC nature center website.
At the time, I wanted to use an open library, like OpenLayers or Leaflet;
but there were no good sources of satellite/aerial map tiles at the
time. The only one I found didn't work because they had a big blank
area anywhere near LANL -- maybe because of the restricted
airspace around the Lab. Anyway, I figured people would want a
satellite option, so I used Google Maps instead despite its much
more frustrating API.
This week we've been working on converting the website to https.
Most things went surprisingly smoothly (though we had a lot more
absolute URLs in our pages and databases than we'd realized).
But when we got through, I discovered the trail map was broken.
I'm still not clear why, but somehow the change from http to https
made Google's API stop working.
In trying to fix the problem, I discovered that Google's map API
may soon cease to be free:
New pricing and product changes will go into effect starting June 11,
2018. For more information, check out the
Guide for
Existing Users.
That has a button for "Transition Tool" which, when you click it,
won't tell you anything about the new pricing structure until you've
already set up a billing account. Um ... no thanks, Google.
Googling for google maps api billing led to a page headed
"Pricing
that scales to fit your needs", which has an elaborate pricing
structure listing a whole bunch of variants (I have no idea which
of these I was using), of which the first $200/month is free.
But since they insist on setting up a billing account, I'd probably
have to give them a credit card number -- which one? My personal
credit card, for a page that isn't even on my site? Does the nonprofit
nature center even have a credit card? How many of these API calls is
their site likely to get in a month, and what are the chances of going
over the limit?
It all rubbed me the wrong way, especially given the context
of "Your trail maps page that real people actually use has
broken without warning, and will be held hostage until you give us a
credit card number". This is what one gets for using a supposedly free
(as in beer) library that's not Free open source software.
So I replaced Google with the excellent open source
Leaflet library, which, as a
bonus, has much better documentation than Google Maps. (It's not that
Google's documentation is poorly written; it's that they keep changing
their APIs, but there's no way to tell the dozen or so different APIs
apart because they're all just called "Maps", so when you search for
documentation you're almost guaranteed to get something that stopped
working six years ago -- but the documentation is still there making
it look like it's still valid.)
And I was happy to discover that, in the time since I originally set
up the trailmap page, some open providers of aerial/satellite map
tiles have appeared. So we can use open source and have a
satellite view.
Our trail map is back online with Leaflet, and with any luck,
this time it will keep working.
PEEC
Los Alamos Area Trail Map.
Humble Bundle has a great
bundle going right now (for another 15 minutes -- sorry, I meant to post
this earlier) on books by Nebula-winning science fiction authors,
including some old favorites of mine, and a few I'd been meaning to read.
I like Humble Bundle a lot, but one thing about them I don't like:
they make it very difficult to download books, insisting that you click
on every single link (and then do whatever "Download this link / yes, really
download, to this directory" dance your browser insists on) rather than
offering a sane option like a tarball or zip file. I guess part of their
business model includes wanting their customers to get RSI. This has
apparently been a problem for quite some time; a web search found lots
of discussions of ways of automating the downloads, most of which
apparently no longer work (none of the ones I tried did).
But a wizard friend on IRC quickly came up with a solution:
some javascript you can paste into Firefox's console. She started
with a quickie function that fetched all but a few of the files, but
then modified it for better error checking and the ability to get
different formats.
In Firefox, open the web console (Tools/Web Developer/Web Console)
and paste this in the single-line javascript text field at the bottom.
// How many milliseconds to delay between downloads.
var delay = 1000;

// Whether to use window.location or window.open:
// window.open is more convenient, but may be popup-blocked.
var window_open = false;

// The filetypes to look for, in order of preference.
// Make sure your browser won't try to preview these filetypes.
var filetypes = ['epub', 'mobi', 'pdf'];

var downloads = document.getElementsByClassName('download-buttons');
var i = 0;
var success = 0;

function download() {
    var children = downloads[i].children;
    var hrefs = {};
    for (var j = 0; j < children.length; j++) {
        var href = children[j].getElementsByTagName('a')[0].href;
        for (var k = 0; k < filetypes.length; k++) {
            if (href.includes(filetypes[k])) {
                hrefs[filetypes[k]] = href;
                console.log('Found ' + filetypes[k] + ': ' + href);
            }
        }
    }
    var href = undefined;
    for (var k = 0; k < filetypes.length; k++) {
        if (hrefs[filetypes[k]] != undefined) {
            href = hrefs[filetypes[k]];
            break;
        }
    }
    if (href != undefined) {
        console.log('Downloading: ' + href);
        if (window_open) {
            window.open(href);
        } else {
            window.location = href;
        }
        success++;
    }
    i++;
    console.log(i + '/' + downloads.length + '; ' + success + ' successes.');
    if (i < downloads.length) {
        window.setTimeout(download, delay);
    }
}
download();
If you have "Always ask where to save files" checked in
Preferences/General, you'll still get a download dialog for each book
(but at least you don't have to click; you can hit return for each
one). Even if this is your preference, you might want to consider
changing it before downloading a bunch of Humble books.
Anyway, pretty cool! Takes the sting out of bundles, especially big
ones like this 42-book collection.
I've been trying to learn more about weather from a friend who
used to work in the field -- in particular, New Mexico's notoriously
windy spring. One of the reasons behind our spring winds relates to
the location of the jet stream. But I couldn't find many
good references showing how the jet stream moves throughout the year.
So I decided to try to plot it myself -- if I could find the data.
Getting weather data can be surprisingly hard.
In my search, I stumbled across Geert Barentsen's excellent
Annual
variations in the jet stream (video). It wasn't quite what I wanted --
it shows the position of the jet stream in December in successive
years -- but the important thing is that he provides a Python script
on GitHub that shows how he produced his beautiful animation.
Well -- mostly. It turns out his data sources are no longer available,
and he didn't go into a lot of detail on where he got his data, only
saying that it was from the ECMWF ERA re-analysis model (with a
link that's now 404).
That led me on a merry chase through the ECMWF website trying to
figure out which part of which database I needed. ECMWF has
lots
of publicly available databases
(and even more)
and they even have Python libraries to access them;
and they even have a lot of documentation, but somehow none of
the documentation addresses questions like which database includes
which variables and how to find and fetch the data you're after,
and a lot of the sample code doesn't actually work.
I ended up using the "ERA Interim, Daily" dataset and requesting
data for only specific times and only the variables and pressure levels
I was interested in. It's a great source of data once you figure out
how to request it.
Sign up for an ECMWF API Key
Access ECMWF Public Datasets
(there's also
Access
MARS and I'm not sure what the difference is),
which has links you can click on to register for an API key.
Once you get the email with your initial password, log in using the URL
in the email, and change the password.
That gave me a "next" button that, when I clicked it, took me to
a page warning me that the page was obsolete and I should update
whatever bookmark I had used to get there.
That page also doesn't offer a link to the new page where you can
get your key details, so go here:
Your API key.
The API Key page gives you some lines you can paste into ~/.ecmwfapirc.
That sets you up to use the ECMWF api. They have a
Web
API and a Python library,
plus some
other Python packages,
but after struggling with a bunch of Magics tutorial examples that mostly
crashed or couldn't find data, I decided I was better off sticking to
the basic Python downloader API and plotting the results with Matplotlib.
The Python data-fetching API works well. To install it,
activate your preferred Python virtualenv or whatever you use
for pip packages, then run the pip command shown at
Web
API Downloads (under "Click here to see the
installation/update instructions...").
As always with pip packages, you'll have to decide on a Python version
(they support both 2 and 3) and whether to use a virtualenv, the
much-disrecommended sudo pip, pip3, etc. I used pip3 in a virtualenv
and it worked fine.
Specify a dataset and parameters
That's great, but how do you know which dataset you want to load?
There doesn't seem to be anything that just lists which datasets have
which variables. The only way I found is to go to the Web API page
for a particular dataset to see the form where you can request
different variables. For instance, I ended up using the
"interim-full-daily"
database, where you can choose date ranges and lists of parameters.
There are more choices in the sidebar: for instance, clicking on
"Pressure levels" lets you choose from a list of barometric pressures
ranging from 1000 all the way down to 1. No units are specified, but
they're millibars, also known as hectoPascals (hPa): 1000 is more or
less the pressure at ground level, 250 is roughly where the jet stream
is, and Los Alamos is roughly at 775 hPa (you can find charts of
pressure vs. altitude on the web).
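You don't even need a chart: the standard-atmosphere barometric formula gives a decent approximation. Here's a quick sketch (an idealized model that assumes 1013.25 hPa at sea level, so real weather will differ):

```python
def pressure_hpa(altitude_m):
    """Approximate standard-atmosphere pressure, in hPa,
    at an altitude given in meters."""
    return 1013.25 * (1 - 2.25577e-5 * altitude_m) ** 5.25588

# Los Alamos is at roughly 2230 meters:
print(round(pressure_hpa(2230)))    # close to the 775 hPa figure above
```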
When you go to any of the Web API pages, it will show you a dialog suggesting
you read about
Data
retrieval efficiency, which you should definitely do if you're
expecting to request a lot of data, then click on the details for
the database you're using to find out how data is grouped in "tape files".
For instance, in the ERA-interim database, tapes are grouped by date,
so if you're requesting multiple parameters for multiple months,
request all the parameters for a given month together, rather than
making one request for level 250, another request for level 1000, etc.
Once you've checked the boxes for the data you want, you can fetch the
data via the web interface, or click on "View the MARS request" to get
parameters you can plug into a Python script.
If you choose the Python script option as I did, you can start with the
basic data retrieval example.
Use the second example, the one that uses 'format' : "netcdf",
which will (eventually) give you a file ending in .nc.
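For ERA-Interim, the request dictionary you pass to the API ends up looking something like this. It's a sketch patterned on my own request; the exact class/stream/param strings come from the "View the MARS request" button, so don't trust mine (131.128/132.128 are the U and V wind component codes in my request):

```python
# Sketch of an ERA-Interim MARS request. Values are illustrative --
# get the real strings from "View the MARS request" on the web form.
request = {
    "class": "ei",
    "dataset": "interim",
    "date": "2015-04-01/to/2015-04-30",
    "levtype": "pl",                # pressure levels
    "levelist": "250/775/1000",     # millibars
    "param": "131.128/132.128",     # U and V wind components
    "stream": "oper",
    "time": "12:00:00",
    "type": "an",
    "grid": "0.75/0.75",
    "format": "netcdf",
    "target": "jetstream.nc",
}

# To actually fetch it, with the ecmwfapi package installed and
# ~/.ecmwfapirc in place:
#     from ecmwfapi import ECMWFDataServer
#     ECMWFDataServer().retrieve(request)
```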
Requesting a specific area
You can request only a limited area,
"area": "75/-20/10/60",
but they're not very forthcoming on the syntax of that, and it's
particularly confusing since "75/-20/10/60" supposedly means "Europe".
It's hard to figure out how those numbers, as longitudes and
latitudes, correspond to Europe, which doesn't go down to 10 degrees
latitude, let alone -20 degrees. The
Post-processing
keywords page gives more information: it's North/West/South/East,
which still makes no sense for Europe,
until you expand the Area examples tab on that
page and find out that by "Europe" they mean Europe plus Saudi Arabia and
most of North Africa.
Using the data: What's in it?
Once you have the data file, assuming you requested data in netcdf format,
you can parse the .nc file with the netCDF4 Python module -- available
as Debian package "python3-netcdf4", or via pip -- to read that file:
import netCDF4
data = netCDF4.Dataset('filename.nc')
But what's in that Dataset?
Try running the preceding two lines in the
interactive Python shell, then:
>>> for key in data.variables:
...     print(key)
...
longitude
latitude
level
time
w
vo
u
v
You can find out more about a parameter, like its units, type,
and shape (array dimensions). Let's look at "level":
>>> data['level']
<class 'netCDF4._netCDF4.Variable'>
int32 level(level)
units: millibars
long_name: pressure_level
unlimited dimensions:
current shape = (3,)
filling on, default _FillValue of -2147483647 used
>>> data['level'][:]
array([ 250, 775, 1000], dtype=int32)
>>> type(data['level'][:])
<class 'numpy.ndarray'>
Levels has shape (3,): it's a one-dimensional array with
three elements: 250, 775 and 1000 -- the three levels I requested
from the web API and in my Python script. The units are millibars.
More complicated variables
How about something more complicated? u and v are the two
components of wind speed.
>>> data['u']
<class 'netCDF4._netCDF4.Variable'>
int16 u(time, level, latitude, longitude)
scale_factor: 0.002161405503194121
add_offset: 30.095301438361684
_FillValue: -32767
missing_value: -32767
units: m s**-1
long_name: U component of wind
standard_name: eastward_wind
unlimited dimensions: time
current shape = (30, 3, 241, 480)
filling on
u (v is the same) has a shape of (30, 3, 241, 480): it's a
4-dimensional array. Why? Looking at the numbers in the shape gives a clue.
The second dimension has 3 rows: they correspond to the three levels,
because there's a wind speed at every level. The first dimension has
30 rows: it corresponds to the dates I requested (the month of April 2015).
I can verify that:
>>> data['time'].shape
(30,)
Sure enough, there are 30 times, so that's what the first dimension
of u and v corresponds to. The other dimensions, presumably, are
latitude and longitude; checking data['latitude'].shape and
data['longitude'].shape gives (241,) and (480,). Sure enough!
So, although it would be nice if it actually told you
which dimension corresponded with which parameter, you can probably
figure it out. If you're not sure, print the shapes of all the
variables and work out which dimensions correspond to what:
>>> for key in data.variables:
...     print(key, data[key].shape)
Iterating over times
data['time'] has all the times for which you have data
(30 data points for my initial test of the days in April 2015).
The easiest way to plot anything is to iterate over those values:
timeunits = data['time'].units
cal = data['time'].calendar
for i, t in enumerate(data['time']):
    thedate = netCDF4.num2date(t, units=timeunits, calendar=cal)
Then you can use thedate like a datetime,
calling thedate.strftime or whatever you need.
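Under the hood, those time values are just offsets from an epoch given in the units string; my ERA-Interim file used "hours since 1900-01-01 00:00:0.0". For that simple case the conversion is easy to sketch in plain Python (netCDF4.num2date handles many more calendars and unit strings; this is just to show the idea):

```python
from datetime import datetime, timedelta

def hours_since_to_date(t, units="hours since 1900-01-01 00:00:0.0"):
    """Convert an 'hours since <epoch>' offset to a datetime.
    Assumes a midnight epoch; real files can be messier."""
    datestr = units.split("since")[1].strip().split()[0]
    epoch = datetime.strptime(datestr, "%Y-%m-%d")
    return epoch + timedelta(hours=float(t))

print(hours_since_to_date(24))    # 1900-01-02 00:00:00
```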
So that's how to access your data. All that's left is to plot it --
and in this case I had Geert Barentsen's script to start with, so I
just modified it a little to work with slightly changed data format,
and then added some argument parsing and runtime options.
However, it turns out ffmpeg can't handle files that are named with
timestamps, like jetstream-2017-06-14-250.png. It can only
handle one sequential integer. So I thought, what if I removed the
dashes from the name, and used names like jetstream-20170614-250.png
with %8d? No dice: ffmpeg also has the limitation that the
integer can have at most four digits.
So I had to rename my images. A shell command works: I ran this in
zsh but I think it should work in bash too.
cd outdir
mkdir moviedir
i=1
for fil in *.png; do
    newname=$(printf "%04d.png" $i)
    ln -s ../$fil moviedir/$newname
    i=$((i+1))
done
ffmpeg -i moviedir/%04d.png -filter:v "setpts=2.5*PTS" -pix_fmt yuv420p jetstream.mp4
The -filter:v "setpts=2.5*PTS" controls the delay between
frames -- I'm not clear on the units, but larger numbers have more delay,
and I think it's a multiplier,
so this is 2.5 times slower than the default.
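If you'd rather skip the shell loop, the same symlinking is easy in Python (a sketch using the same hypothetical outdir/moviedir layout; sorting the timestamped names keeps them in chronological order):

```python
import os

def link_sequential(srcdir, moviedir):
    """Symlink every .png in srcdir into moviedir as 0001.png,
    0002.png, ... in sorted order."""
    os.makedirs(moviedir, exist_ok=True)
    pngs = sorted(f for f in os.listdir(srcdir) if f.endswith(".png"))
    for i, fil in enumerate(pngs, start=1):
        os.symlink(os.path.join(os.path.abspath(srcdir), fil),
                   os.path.join(moviedir, "%04d.png" % i))

# Usage: link_sequential("outdir", "outdir/moviedir")
```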
When I uploaded the video to YouTube, I got a warning,
"Your videos will process faster if you encode into a streamable file format."
I then spent half a day trying to find a combination of ffmpeg arguments
that avoided that warning, and eventually gave up. As far as I can tell,
the warning only affects the 20 seconds or so of processing that happens
after the 5-10 minutes it takes to upload the video, so I'm not sure
it's terribly important.
And here's the script, updated from the original Barentsen script
and with a bunch of command-line options to let you plot different
collections of data:
jetstream.py on GitHub.
I had a need for a Qt widget that could display PDF. That turned out
to be surprisingly hard to do. The Qt Wiki has a page on
Handling PDF,
which suggests only two alternatives: QtPDF, which is C++ only
so I would need to write a wrapper to use it with Python (and
then anyone else who used my code would have to compile and install it);
or Poppler. Poppler is a common library on Linux, available as a
package and used for programs like evince, so that seemed like the
best route.
But Python bindings for Poppler are a bit harder to come by.
I found a little
one-page
example using Poppler and Gtk3 via gi.repository ...
but in this case I needed it to work with a Qt5 program, and my
attempts to translate that example to work with Qt were futile.
Poppler's page.render(ctx) takes a Cairo context,
and Cairo is apparently a Gtk-centered phenomenon: I couldn't find any way
to get a Cairo context from a Qt5 widget, and although I found some
web examples suggesting renderToImage(),
the Poppler available in gi.repository doesn't have that function.
But it turns out there's another Poppler:
popplerqt5,
available in the Debian package python3-poppler-qt5. That Poppler
does have renderToImage, and you can take that image and
paint it in a paint() callback or turn it into a pixmap you can use
with a QLabel. Here's the basic sequence:
document = Poppler.Document.load(filename)
document.setRenderHint(Poppler.Document.TextAntialiasing)
page = document.page(pageno)
img = page.renderToImage(dpi, dpi)
# Use the rendered image as the pixmap for a label:
pixmap = QPixmap.fromImage(img)
label.setPixmap(pixmap)
The line to set text antialiasing is not optional. Well, theoretically
it's optional; go ahead, try it without that and see for yourself.
It's basically unreadable.
Of course, there are plenty of other details to take care of.
For instance, you can get the size of the rendered image:
size = page.pageSize()
... after which you can use size.width()
and size.height(). They're in points.
There are 72 points per inch, so calculate accordingly in the dpi
values you pass to renderToImage if you're targeting a specific
DPI or need it to fit in a specific window size.
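For example, here's the little helper I'd use to pick the DPI that makes a page fit a window (my own function, not part of Poppler):

```python
def dpi_to_fit(page_w_pts, page_h_pts, win_w_px, win_h_px):
    """DPI at which a page, measured in points (72 per inch),
    just fits inside a window measured in pixels."""
    return 72 * min(win_w_px / page_w_pts, win_h_px / page_h_pts)

# A US Letter page is 612x792 points, so in a 612x792-pixel
# window it renders 1:1:
print(dpi_to_fit(612, 792, 612, 792))    # 72.0
```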
Window Resize and Efficient Rendering
Speaking of fitting to a window size, I wanted to resize the content
whenever the window was resized, which meant redefining
resizeEvent(self, event) on the widget.
Initially my PDFWidget inherited from QWidget with a custom
paintEvent() that painted the rendered page image.
(Poppler also has a function page.renderToPainter(),
but I never did figure out how to get it to do anything useful.)
That worked, but when I added resizeEvent I got an
infinite loop: paintEvent() called resizeEvent() which triggered
another paintEvent(), ad infinitum. I couldn't find a way around
that (GTK has similar problems -- seems like nearly everything you do
generates another expose event -- but there you can temporarily disable
expose events while you're drawing). So I rewrote my PDFWidget
class to inherit from QLabel instead of QWidget, converted the
QImage to a QPixmap and passed it to self.setPixmap().
That let me get rid of the paintEvent() function entirely and
let QLabel handle the painting, which is probably more efficient
anyway.
Showing all pages in a scrolled widget
renderToImage gives you one image corresponding to one page of
the PDF document. More often, you'll want to see the whole document
laid out, with all the pages. So you need a way to stack a bunch of
widgets vertically, one for each page. You can do that with a
QVBoxLayout on a widget inside a QScrollArea.
I haven't done much Qt5 programming, so I wasn't familiar with how
these QVBoxes work. Most toolkits I've worked with have a VBox
container widget to which you add child widgets, but in Qt5, you
create a widget (no particular type -- a QWidget is enough), then
create a layout object that modifies the widget, and add the
sub-widgets to the layout object. There isn't much documentation for
any of this, and very few examples of doing it in Python, so it took
some fiddling to get it working.
Initial Window Size
One last thing: Qt5 doesn't seem to have a concept of desired initial
window size. Most of the examples I found, especially the ones that
use a .ui file, use setGeometry(); but that requires an
(X, Y) position as well as (width, height), and there's no way to tell
it to ignore the position. That means that instead of letting your
window manager place the window according to your preferences, the
window will insist on showing up at whatever arbitrary place you set
in the code. Worse, most of the Qt5 examples I found online set the
geometry to (0, 0): when I tried that, the window came up with the
widget in the upper left corner of the screen and the window's
titlebar hidden above the top of the screen, so there's no way to move
the window to a better location unless you happen to know your window
manager's hidden key binding for that. (Hint: on many Linux window
managers, hold Alt down and drag anywhere in the window to move it. If
that doesn't work, try holding down the "Windows" key instead of Alt.)
This may explain why I've been seeing an increasing number of these
ill-behaved programs that come up with their titlebars offscreen.
But if you want your programs to be better behaved, you can call
self.resize(width, height) on a widget when you first create it.
The current incarnation of my PDF viewer, set up as a module so you
can import it and use it in other programs, is at
qpdfview.py
on GitHub.
When you attach hardware buttons to a Raspberry Pi's GPIO pin,
reading the button's value at any given instant is easy with
GPIO.input(). But what if you want to watch for
button changes? And how do you do that from a GUI program where
the main loop is buried in some library?
Here are some examples of ways to read buttons from a Pi.
For this example, I have one side of my button wired to the Raspberry
Pi's GPIO 18 and the other side wired to the Pi's 3.3v pin.
I'll use the Pi's internal pulldown resistor rather than adding
external resistors.
The simplest way: Polling
The obvious way to monitor a button is in a loop, checking the
button's value each time:
import RPi.GPIO as GPIO
import time
button_pin = 18
GPIO.setmode(GPIO.BCM)
GPIO.setup(button_pin, GPIO.IN, pull_up_down = GPIO.PUD_DOWN)
try:
    while True:
        if GPIO.input(button_pin):
            print("ON")
        else:
            print("OFF")
        time.sleep(1)
except KeyboardInterrupt:
    print("Cleaning up")
    GPIO.cleanup()
But if you want to be doing something else while you're waiting,
instead of just sleeping for a second, it's better to use edge detection.
Edge Detection
RPi.GPIO's edge detection function, GPIO.add_event_detect,
will call you back whenever it sees the pin's value change.
I'll define a button_handler function that prints out
the value of the pin whenever it gets called:
import RPi.GPIO as GPIO
import time
def button_handler(pin):
    print("pin %s's value is %s" % (pin, GPIO.input(pin)))

if __name__ == '__main__':
    button_pin = 18
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(button_pin, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)
    # events can be GPIO.RISING, GPIO.FALLING, or GPIO.BOTH
    GPIO.add_event_detect(button_pin, GPIO.BOTH,
                          callback=button_handler,
                          bouncetime=300)
    try:
        time.sleep(1000)
    except KeyboardInterrupt:
        GPIO.cleanup()
Pretty nifty. But if you try it, you'll probably find that sometimes
the value is wrong. You release the switch but it says the value is
1 rather than 0. What's up?
Debounce and Delays
The problem seems to be in the way RPi.GPIO handles that
bouncetime=300 parameter.
The bouncetime is there because hardware switches are noisy. As you
move the switch from ON to OFF, it doesn't go cleanly all at once
from 3.3 volts to 0 volts. Most switches will flicker back
and forth between the two values before settling down. To see bounce
in action, try the program above without the bouncetime=300.
There are ways of fixing bounce in hardware, by adding a capacitor or
a Schmitt trigger to the circuit; or you can "debounce" the button
in software, by waiting a while after you see a change before
acting on it. That's what the bouncetime parameter is for.
But apparently RPi.GPIO, when it handles bouncetime, doesn't
always wait quite long enough before calling its event function.
It sometimes calls button_handler while the switch is still
bouncing, and the value you read might be the wrong one.
Increasing bouncetime doesn't help.
This seems to be a bug in the RPi.GPIO library.
You'll get more reliable results if you wait a little while before
reading the pin's value:
def button_handler(pin):
    time.sleep(.01)    # Wait a while for the pin to settle
    print("pin %s's value is %s" % (pin, GPIO.input(pin)))
Why .01 seconds? Because when I tried it, .001 wasn't enough, and if
I used the full bounce time, .3 seconds (corresponding to 300 millisecond
bouncetime), I found that the button handler
sometimes got called multiple times with the wrong value. I wish
I had a better answer for the right amount of time to wait.
Incidentally, the choice of 300 milliseconds for bouncetime is arbitrary
and the best value depends on the circuit. You can play around with
different values (after commenting out the .01-second sleep) and
see how they work with your own circuit and switch.
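The debouncing idea itself -- ignore any change that arrives too soon after the last one you accepted -- is simple to sketch in plain Python, with simulated timestamps standing in for real GPIO events:

```python
class Debouncer:
    """Accept an event only if at least `bounce` seconds have
    passed since the last accepted event."""
    def __init__(self, bounce=0.3):
        self.bounce = bounce
        self.last = None

    def accept(self, t):
        if self.last is None or t - self.last >= self.bounce:
            self.last = t
            return True
        return False

# One real press at t=0 that bounces for a few milliseconds,
# then a real release at t=0.5:
deb = Debouncer(bounce=0.3)
events = [0.0, 0.002, 0.005, 0.011, 0.5]
print([t for t in events if deb.accept(t)])    # [0.0, 0.5]
```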
You might think you could solve the problem by using two handlers,
one for rising edges and one for falling,
but that apparently isn't allowed:
RuntimeError: Conflicting edge detection already enabled for
this GPIO channel.
Even if you look just for GPIO.RISING, you'll
still get some bogus calls, because there are both rising and falling
edges as the switch bounces. Detecting GPIO.BOTH, waiting
a short time and checking the pin's value is the only reliable method
I've found.
Edge Detection from a GUI Program
And now, the main inspiration for all of this: when you're running a
program with a graphical user interface, you don't have
control over the event loop. Fortunately, edge detection works
fine from a GUI program: a simple TkInter program can register the
same add_event_detect callback and update its display to show the
button's state.
Dave and I will be giving a planetarium talk in February
on the analemma and related matters.
Our planetarium, which runs a fiddly and rather limited program called
Nightshade, has no way of showing the analemma. Or at least, after
trying for nearly a week once, I couldn't find a way. But it can
show images, and since I once wrote a
Python
program to plot the analemma, I figured I could use my program
to generate the analemmas I wanted to show and then project them
as images onto the planetarium dome.
But naturally, I wanted to project just the analemma and
associated labels; I didn't want the blue background to
cover up the stars the planetarium shows. So I couldn't just use
a simple screenshot; I needed a way to get my GTK app to create a
transparent image such as a PNG.
That turns out to be hard. GTK can't do it (either GTK2 or GTK3),
and people wanting to do anything with transparency are nudged toward
the Cairo library. As a first step, I updated my analemma program to
use Cairo and GTK3 via gi.repository. Then I dove into Cairo.
A Cairo surface is like a canvas to draw on, and it knows how to
save itself to a PNG image.
A context is the equivalent of a GC in X11 programming:
it knows about the current color, font and so forth.
So the trick is to create a new surface, create a context,
then draw everything all over again with the new context and surface.
A Cairo widget will already have a function to draw everything
(in my case, the analemma and all its labels), with this signature:
def draw(self, widget, ctx):
It already allows passing the context in, so passing in a different
context is no problem. I added an argument specifying the background
color and transparency, so I could use a blue background in the user
interface but a transparent background for the PNG image:
def draw(self, widget, ctx, background=None):
I also had a minor hitch: in draw(), I was saving the context as
self.ctx rather than passing it around to every draw routine.
That means calling it with the saved image's context would overwrite
the one used for the GUI window. So I save it first.
Here's the final image saving code:
def save_image(self, outfile):
    dst_surface = cairo.ImageSurface(cairo.FORMAT_ARGB32,
                                     self.width, self.height)
    dst_ctx = cairo.Context(dst_surface)
    # draw() will overwrite self.ctx, so save it first:
    save_ctx = self.ctx
    # Draw everything again to the new context,
    # with a transparent instead of an opaque background:
    self.draw(None, dst_ctx, (0, 0, 1, 0))    # transparent blue
    # Restore the GUI context:
    self.ctx = save_ctx
    dst_surface.write_to_png(outfile)
    print("Saved to", outfile)
Are you interested in all things Raspberry Pi, or just curious about them?
Come join like-minded people this Thursday at 7pm for the inaugural meeting
of the Los Alamos Raspberry Pi club!
At Los Alamos Makers,
we've had the Coder Dojo for Teens going on for over a year now,
but there haven't been any comparable programs that welcome adults.
Pi club is open to all ages.
The format will be similar to Coder Dojo: no lectures or formal
presentations, just a bunch of people with similar interests.
Bring a project you're working on, see what other people are working
on, ask questions, answer questions, trade ideas and share knowledge.
Bring your own Pi if you like, or try out one of the Pi 3 workstations
Los Alamos Makers has set up. (If you use one of the workstations there,
I recommend bringing a USB stick so you can save your work to take home.)
Although the group is officially for Raspberry Pi hacking, I'm sure
many attendees will be interested in Arduino or other microcontrollers, or
Beaglebones or other tiny Linux computers; conversation and projects
along those lines will be welcome.
Beginners are welcome too. You don't have to own a Pi, know a resistor
from a capacitor, or know anything about programming. I've been asked
a few times about where an adult can learn to program. The Raspberry Pi
was originally introduced as a fun way to teach schoolchildren to
program computers, and it includes programming resources suitable to
all ages and abilities. If you want to learn programming on your own
laptop rather than a Raspberry Pi, we won't turn you away.
Raspberry Pi Club:
Thursdays, 7pm, at Los Alamos Makers, 3540 Orange Street (the old PEEC
location), Suite LV1 (the farthest door from the parking lot -- look
for the "elevated walkway" painted outside the door).
There's a Facebook event:
Raspberry Pi club
on Facebook. We have meetings scheduled for the next few Thursdays:
December 7, 14, and 21, and after that we'll decide based on interest.
Having written a basic blink program in C for
my
ATtiny85 with a USBtinyISP (Part 1), I wanted to use it to control other
types of hardware. That meant I wanted to be able to use Arduino libraries.
ATtiny with the Arduino IDE
In the Arduino IDE's File->Preferences, find "Additional Boards Manager URLs" near the bottom, paste this: https://raw.githubusercontent.com/damellis/attiny/ide-1.6.x-boards-manager/package_damellis_attiny_index.json and click OK
Tools->Boards->Board Manager...
Find the ATTiny entry, click on it, and click Install
Back in the main Arduino IDE, Tools->Boards should now have a
couple of ATtiny entries. Choose the one that corresponds to your
ATtiny; then, under Processor, narrow it down further.
In Tools->Programmer, choose the programmer you're using
(for example, USBtinyISP).
Now you should be able to Verify and Upload a blink sketch
just like you would to a regular Arduino, subject to the pin limitations
of the ATTiny.
That worked for blink. But it didn't work when I started adding libraries.
Since the command-line was what I really cared about, I moved on rather
than worrying about libraries just yet.
ATtiny with Arduino-Makefile
For most of my Arduino development I use an excellent package called
Arduino-Makefile.
There's a Debian package called arduino-mk that works fine for normal
Arduinos, but for ATtiny, there have been changes, so use the version
from git.
A minimal blink Makefile looks like this:
BOARD_TAG = uno
include /usr/share/arduino/Arduino.mk
It assumes that if you're in a directory called blink, it
should compile a file called blink.ino. It will also build
any additional .cpp files it finds there. make upload
uploads the code to a normal Arduino.
With ATtiny it gets quite a bit more complicated.
The key is that you have to specify an alternate core:
ALTERNATE_CORE = ATTinyCore
But there are lots of different ATtiny cores, they're all different,
and they each need a different set of specifiers like BOARD_TAG in
the Makefile. Arduino-Makefile comes with an example, but it isn't
very useful since it doesn't say where to get the cores that correspond
with the various samples. I ended up filing a documentation bug and
exchanging some back-and-forth with the maintainer of the package,
Simon John, and here's what I learned.
First: as I mentioned earlier, you should use the latest git version
of Arduino-Makefile. The version in Debian is a little older and some
things have changed; while the older version can be made to work with
ATtiny, the recipes will be different from the ones here.
Second, the recipes for each core will be different depending on which
version of the Arduino software you're using. Simon
says he sticks to version 1.0.5 when he uses ATtinys, because newer
versions don't work as well. That may be smart (certainly he has a lot
more experience than I do), but I'm always hesitant to rely on
software that old, so I wanted to get things working with the latest
Arduino, 1.8.5, if I could, so that's what the recipes here will
reflect.
Third, as mentioned in Part 1, clock rate should be 1MHz, not 8MHz
as you'll see in a lot of web examples, so:
F_CPU = 1000000L
Fourth, uploading sketches. As mentioned in the last article, I'm using
a USBtinyISP. For that, I use ISP_PROG = usbtiny and
sketches are uploaded by typing make ispload rather than
the usual make upload. Change that if you're using a
different programmer.
With those preliminaries over:
I ended up getting two different cores working,
and there were two that didn't work.
Install the cores in subdirectories in
your ~/sketchbook/hardware directory. You can have multiple
cores installed at once if you want to test different cores.
Here are the recipes.
CodingBadly's arduino-tiny
This is the core that Simon says he prefers, so it's the one I'm going
to use as my default. It's at
https://github.com/Coding-Badly/arduino-tiny.git,
and also a version on Google Code. (Neither one has been updated since 2013.)
git clone it into your sketchbook/hardware.
Then either cp 'Prospective Boards.txt' boards.txt
or create a new boards.txt and copy from 'Prospective Boards.txt'
all the boards you're interested in (for instance, all the attiny85
definitions if attiny85 is the only attiny board you have).
Then your Makefile should look something like this:
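Here's a sketch of the shape of that Makefile. Treat the specifics as assumptions to check against your own setup: ALTERNATE_CORE must match the directory name you cloned into under sketchbook/hardware, BOARD_TAG must match an entry you copied into your boards.txt (attiny85at1 here assumes a 1MHz attiny85 entry exists there), and the include path is wherever your git clone of Arduino-Makefile lives.

```make
ARDUINO_DIR = $(HOME)/arduino-1.8.5    # where the Arduino software lives
ALTERNATE_CORE = arduino-tiny          # directory name under sketchbook/hardware
BOARD_TAG = attiny85at1                # an entry from your boards.txt
F_CPU = 1000000L
ISP_PROG = usbtiny
include $(HOME)/src/Arduino-Makefile/Arduino.mk   # your Arduino-Makefile clone
```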
If your Arduino software is installed in /usr/share/arduino you can
omit the first line.
Now copy blink.ino -- of course, you'll have to change pin 13
to be something between 1 and 6 since that's how many pins an ATtiny
has -- and try make and make ispload.
SpenceKonde's ATTinyCore
This core is at https://github.com/SpenceKonde/ATTinyCore.git.
I didn't need to copy boards.txt or make any other changes,
just clone it under sketchbook/hardware and then use this Makefile:
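Again, a sketch with the assumptions flagged: attinyx5 with an 85 sub-board is what I'd expect from ATTinyCore's boards.txt, so verify the tags against the copy you actually cloned, and adjust the paths for your system.

```make
ARDUINO_DIR = $(HOME)/arduino-1.8.5    # where the Arduino software lives
ALTERNATE_CORE = ATTinyCore
BOARD_TAG = attinyx5                   # check ATTinyCore's boards.txt
BOARD_SUB = 85                         # the ATtiny85 variant
F_CPU = 1000000L
ISP_PROG = usbtiny
include $(HOME)/src/Arduino-Makefile/Arduino.mk   # your Arduino-Makefile clone
```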
There are plenty of other ATtiny cores around. Here are two that
apparently worked once, but I couldn't get them working with the
current version of the tools. I'll omit links to them to try to
reduce the probability of search engines linking to them rather
than to the more up-to-date cores.
Damellis's attiny (you may see this referred to as HLT after the
domain name, "Highlowtech"), on GitHub as
damellis/attiny,
was the first core I got working with Debian's older version of
arduino-mk and Arduino 1.8.4. But when I upgraded to the latest
Arduino-Makefile and Arduino 1.8.5, it no longer worked. Ironic since
an older version of it was the one used in most of the tutorials I
found for using ATtiny with the Arduino IDE.
Simon says this core is buggy: in particular, there are problems with
software PWM.
I also tried rexxar-tc's arduino-tiny.core2 (also on GitHub).
I couldn't get it to work with any of the Makefile or Arduino
versions I tried, though it may have worked with Arduino 1.0.
With two working cores, I can get an LED to blink.
But libraries are the point of using the Arduino framework ...
and as I tried to move beyond blink.ino, I found that
not all Arduino libraries work with ATtiny.
In particular, Wire, used for protocols like I2C to talk to all
kinds of useful chips, doesn't work without substantial revisions.
But that's a whole separate topic. Stay tuned.
Arduinos are great for prototyping, but for a small, low-power,
cheap and simple design, an ATtiny chip seems like just the ticket.
For just a few dollars you can do most of what you could with an
Arduino and use a lot of the same code, as long as you can make do
with a little less memory and fewer pins.
I've been wanting to try them, and recently I ordered a few ATtiny85 chips.
There are quite a few ways to program them. You can buy programmers
specifically intended for an ATtiny, but I already had a USBtinyISP,
a chip used to program Arduino bootloaders, so that's what I'll
discuss here.
Wiring to the USBtinyISP
The best reference I found on wiring was
Using USBTinyISP to program ATTiny45 and ATTiny85.
That's pretty clear, but I made my own Fritzing diagram, with colors,
so it'll be easy to reconstruct it next time I need it.
With the chip wired up, here's a minimal blink program in C:
#include <avr/io.h>
#include <util/delay.h>
int main (void)
{
    // Set Data Direction to output on port B, pin 3:
    DDRB = 0b00001000;
    while (1) {
        // set PB3 high
        PORTB = 0b00001000;
        _delay_ms(500);
        // set PB3 low
        PORTB = 0b00000000;
        _delay_ms(500);
    }
    return 1;
}
Then you need a Makefile. I started with the one linked from the electronut
page above. Modify it if you're using a programmer other than a USBtinyISP.
make builds the program, and make install
loads it to the ATtiny. And, incredibly, my light started blinking,
the first time!
Encouraged, I added another LED to make sure I understood.
The ATtiny85 has six pins you can use (the other two are power and ground).
The pin numbers correspond to the bits in DDRB and PORTB:
my LED was on PB3. I added another LED on PB2 and made it alternate
with the first one:
DDRB = 0b00001100;
[ ... ]
// set PB3 high, PB2 low
PORTB = 0b00001000;
_delay_ms(500);
// set PB3 low, PB2 high
PORTB = 0b00000100;
_delay_ms(500);
Timing Woes
But wait -- not everything was rosy. I was calling _delay_ms(500),
but it was waiting a lot longer than half a second between flashes.
What was wrong?
For some reason, a lot of ATtiny sample code on the web assumes the
chip is running at 8MHz. The chip's internal oscillator is indeed 8MHz
(though you can also run it with an external crystal at various
speeds) -- but its default mode uses that oscillator in "divide by
eight" mode, meaning its actual clock rate is 1MHz. But Makefiles
you'll find on the web don't take that into account (maybe because
they're all copied from the same original source). So, for instance,
the Makefile I got from electronut has
CLOCK = 8000000
If I changed that to
CLOCK = 1000000
now my delays were proper milliseconds, as I'd specified.
Here's my working
attiny85
blink Makefile.
In case you're curious about clock rate, it's specified by what are
called fuses, which sound permanent but aren't: they hold their
values when the chip loses power, but you can set them over and over.
You can read the current fuse settings like this:
avrdude -c usbtiny -p attiny85 -U lfuse:r:-:i -v
which should print something like this:
avrdude: safemode: hfuse reads as DF
avrdude: safemode: efuse reads as FF
avrdude: safemode: Fuses OK (E:FF, H:DF, L:62)
To figure out what that means, go to the
Fuse calculator,
scroll down to Current settings and enter the three values
you got from avrdude (E, H and L correspond to Extended, High and Low).
Then scroll up to Feature configuration
to see what the fuse settings correspond to.
In my case it was
Int. RC Osc. 8 Mhz; Start-up time PWRDWN/RESET; 6CK/14CK+
64ms; [CKSEL=1011 SUT=10]; default value
and Divide clock by 8 internally; [CKDIV8=0] was checked.
Nobody seems to have written much about AVR/ATTINY
programming in general. Symbols like PORTB and
functions like _delay_ms() come from files in
/usr/lib/avr/include/, at least on my Debian system.
There's not much there, so if you want library functions to handle
nontrivial hardware, you'll have to write them or find them somewhere else.
As for understanding pins, you're supposed to go to the datasheet and read it
through, all 234 pages. Hint: for understanding basics of reading from and
writing to ports, speed forward to section 10, I/O Ports.
A short excerpt from that section:
Three I/O memory address locations are allocated for each port, one
each for the Data Register - PORTx, Data Direction Register - DDRx,
and the Port Input Pins - PINx. The Port Input Pins I/O location is
read only, while the Data Register and the Data Direction Register are
read/write. However, writing a logic one to a bit in the PINx
Register, (comma sic) will result in a toggle in the
corresponding Data Register. In addition, the Pull-up Disable - PUD
bit in MCUCR disables the pull-up function for all pins in all ports
when set.
There's also some interesting information there about built-in pull-up
resistors and how to activate or deactivate them.
That's helpful, but here's the part I wish they'd said:
PORTB (along with DDRB and PINB) represents all six pins. (Why B? Is
there a PORTA? Not as far as I can tell; at least, no PORTA is
mentioned in the datasheet.) There are six I/O pins, corresponding
to the six pins on the chip that are not power or ground. Set the bits
in DDRB and PORTB to correspond to the pins you want to set. So if you
want to use pins 0 through 3 for output, do this:
DDRB = 0b00001111;
If you want to set logical pins 1 and 3 (corresponding to pins 6 and 2
on the chip) high, and the rest of the pins low, do this:
PORTB = 0b00001010;
To read from pins, use PINB.
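Since DDRB and PORTB are just bitmasks, it can help to see the arithmetic
that builds them from pin numbers. A small Python sketch (the helper name
is mine, not part of any AVR library):

```python
def pinmask(pins):
    """Build a register value with the given bit numbers set."""
    mask = 0
    for p in pins:
        mask |= 1 << p      # set bit p
    return mask

print(bin(pinmask([0, 1, 2, 3])))  # 0b1111: DDRB for output on pins 0-3
print(bin(pinmask([1, 3])))        # 0b1010: PORTB with pins 1 and 3 high
```

On the real chip you'd assign the resulting values to DDRB and PORTB in C;
this just shows the bit twiddling.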
In addition to basic functionality, all the pins have specialized
uses, like timers, SPI, ADC and even temperature measurement (see the
diagram above). The datasheet goes into more detail about how to get
into some of those specialized modes.
But a lot of those specialties are easier to deal with using
libraries. And there are a lot more libraries available for the Arduino
C++ environment than there are for a bare ATtiny using C.
So the next step is to program the ATtiny using Arduino ...
which deserves its own article.
I do most of my coding on my home machine. But when I travel (or sit
in boring meetings), sometimes I do a little hacking on my laptop.
Most of my code is hosted in GitHub
repos, so when I travel, I like to update all the repos on the laptop
to make sure I have what I need even when I'm offline.
That works great as long as I don't make branches. I have a variable
$myrepos that lists all the github repositories where I want to contribute,
and with a little shell alias it's easy enough to update them all:
allgit() {
    pushd ~
    foreach repo ($myrepos)
        echo $repo :
        cd ~/src/$repo
        git pull
    end
    popd
}
That works well enough -- as long as you don't use branches.
Git's branch model seems to be that branches are for local development,
and aren't meant to be shared, pushed, or synchronized among machines.
It's ridiculously difficult in git to do something like, "for all
branches on the remote server, make sure I have that branch and it's
in sync with the server." When you create branches, they don't push
to the server by default, and it's remarkably difficult to figure out
which of your branches is actually tracking a branch on the server.
A web search finds plenty of people asking, and most of the Git experts
answering say things like "Just check out the branch, then pull."
In other words, if you want to work on a branch, you'd better know
before you go offline exactly which branches in which repositories
might have been created or updated since the last time you worked
in that repository on that machine. I guess that works if you only
ever work on one project in one repo and only on one or two branches
at a time. It certainly doesn't work if you need to update lots of
repos on a laptop for the first time in two weeks.
Further web searching does find a few possibilities. For checking
whether there are files modified that need to be committed,
git status --porcelain -uno works well.
For checking whether changes are committed but not pushed,
git for-each-ref --format="%(refname:short) %(push:track)" refs/heads | fgrep '[ahead'
works ... if you make an alias so you never have to look at it.
Figuring out whether branches are tracking remotes is a lot harder.
I found some recommendations like
git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done
and
for remote in `git branch -r`; do git branch --track ${remote#origin/} $remote; done
but neither of them really did what I wanted. I was chasing down the
rabbit hole of writing shell loops using variables like
localbranches=("${(@f)$(git branch | sed 's/..//')}")
remotebranches=("${(@f)$(git branch -a | grep remotes | grep -v HEAD | grep -v master | sed 's_remotes/origin/__' | sed 's/..//')}")
when I thought, there must be a better way. Maybe using Python bindings?
git-python
In Debian, the available packages for Git Python bindings are
python-git, python-pygit2, and python-dulwich.
Nobody on #python seemed to like any of them, but based on quick
attempts with all three, python-git seemed the most straightforward.
Confusingly, though Debian calls it python-git, it's called
"git-python" in
its docs or in web searches, and it's "import git" when you use it.
It's pretty straightforward to use, at least for simple things.
You can create a Repo object with
from git import Repo
repo = Repo('.')
and then you can get lists like repo.heads (local branches),
repo.refs (local and remote branches and other refs such
as tags), etc. Once you have a ref, you can use ref.name,
check whether it's tracking a remote branch
with ref.tracking_branch(), and make it track one with
ref.set_tracking_branch(remoteref). That makes it very
easy to get a list of branches showing which ones are tracking a remote
branch, something that had proved almost impossible with the git
command line.
Nice. But now I wanted more: I wanted to replace those baroque
git status --porcelain and git for-each-ref
commands I had been using to check whether my repos needed committing
or pushing. That proved harder.
Checking for uncommitted files, I decided it would be easiest to stick with the existing
git status --porcelain -uno. Which was sort of true.
git-python lets you call git commands, for cases where the
Python bindings aren't quite up to snuff yet, but it doesn't handle
all cases. I could call:
output = repo.git.status(porcelain=True)
but I never did find a way to pass the -uno; I tried u=False,
u=None, and u="no" but none of them worked.
But -uno actually isn't that important so I decided to do without it.
I found out later that there's another way to call the git command,
using execute, which lets you pass the exact arguments you'd
pass on the command line. It didn't work to call for-each-ref
the way I'd called repo.git.status (repo.git.for_each_ref
isn't defined), but I could call it this way:
repo.git.execute(['git', 'for-each-ref', '--format=%(refname:short) %(push:track)', 'refs/heads'])
and then parse the output looking for "[ahead]". That worked, but ... ick.
I wanted to figure out how to do that using Python.
It's easy to get a ref (branch) and its corresponding tracking ref
(remote branch).
ref.log() gives you a list of commits on each of the two branches,
ordered from earliest to most recent, the opposite of git log.
In the simple case, then, what I needed was to iterate backward over
the two commit logs, looking for the most recent SHA that's common to both.
The Python builtin reversed was useful here:
for i, entry in enumerate(reversed(ref.log())):
    for j, upstream_entry in enumerate(reversed(upstream.log())):
        if entry.newhexsha == upstream_entry.newhexsha:
            return i, j
(i, j) are the number of commits on the local branch that the
remote hasn't seen, and vice versa. If i is zero, or if there's nothing
in ref.log(), then the repo has no new commits and doesn't need
pushing.
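To see what those nested reversed() loops compute, here's the same logic
with plain lists of SHAs standing in for the two reflogs (a self-contained
sketch, not the git-python code itself):

```python
def divergence(local_shas, remote_shas):
    """Count commits unique to each side, scanning back from the newest."""
    for i, sha in enumerate(reversed(local_shas)):
        for j, remote_sha in enumerate(reversed(remote_shas)):
            if sha == remote_sha:
                # Found the most recent commit both sides share.
                return i, j
    # No common commit at all: everything is unshared.
    return len(local_shas), len(remote_shas)

# The local branch has one commit ('d') the remote hasn't seen:
print(divergence(['a', 'b', 'c', 'd'], ['a', 'b', 'c']))  # (1, 0)
# Fully in sync:
print(divergence(['a', 'b'], ['a', 'b']))                 # (0, 0)
```

A (0, 0) result means there's nothing to push or pull for that branch.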
Making branches track a remote
The last thing I needed to do was to make branches track their remotes.
Too many times, I've found myself on the laptop, ready to work, and
discovered that I didn't have the latest code because I'd been working
on a branch on my home machine, and my git pull hadn't pulled
the info for the branch because that branch wasn't in the laptop's
repo yet. That's what got me started on this whole "update everything"
script in the first place.
If you have a ref for the local branch and a ref for the remote branch,
you can verify their ref.name is the same, and if the local
branch has the same name but isn't tracking the remote branch,
probably something went wrong with the local repo (like one of my
earlier attempts to get branches in sync), and it's an easy fix:
ref.set_tracking_branch(remoteref).
But what if the local branch doesn't exist yet? That's the situation I
cared about most, when I've been working on a new branch and it's not
on the laptop yet, but I'm going to want to work on it while traveling.
And that turned out to be difficult, maybe impossible, to do in git-python.
It's easy to create a new local branch:
repo.head.create(repo, name).
But that branch gets created as a copy of master, and if you try to
turn it into a copy of the remote branch, you get conflicts because
the branch is ahead of the remote branch you're trying to copy, or
vice versa. You really need to create the new branch as a copy of
the remote branch it's supposed to be tracking.
If you search the git-python documentation for ref.create, there
are references to "For more documentation, please see the Head.create method."
Head.create takes a reference argument (the basic ref.create
doesn't, though the documentation suggests it should).
But how can you call Head.create? I had no luck with attempts like
repo.git.Head.create(repo, name, reference=remotebranches[name]).
I finally gave up and went back to calling the command line
from git-python.
repo.git.checkout(remotebranchname, b=name)
I'm not entirely happy with that, but it seems to work.
I'm sure there are all sorts of problems left to solve. But this
script does a much better job than any git command I've found of
listing the branches in my repositories, checking for modifications
that require commits or pushes, and making local branches
to mirror new branches on the server. And maybe with time the git-python
bindings will improve, and eventually I'll be able to create new tracking
branches locally without needing the command line.
Update 2022-06-24: Although the concepts described in this article
are still valid, the program I wrote depends on GTK2 and is therefore
obsolete. I discuss versions for more modern toolkits here:
Clicking through a Translucent Image Window.
It happened again: someone sent me a JPEG file with an image of a topo
map, with a hiking trail and interesting stopping points drawn on it.
Better than nothing. But what I really want on a hike is GPX waypoints
that I can load into OsmAnd, so I can see whether I'm still on the trail
and how to get to each point from where I am now.
My PyTopo program
lets you view the coordinates of any point, so you can make a waypoint
from that. But for adding lots of waypoints, that's too much work, so
I added an "Add Waypoint" context menu item -- that was easy,
took maybe twenty minutes.
PyTopo already had the ability to save its existing tracks and waypoints
as a GPX file, so no problem there.
But how do you locate the waypoints you want? You can do it the hard
way: show the JPEG in one window, PyTopo in the other, and
do the "let's see the road bends left then right, and the point is
off to the northwest just above the right bend and about two and a half
times as far away as the distance through both road bends". Ugh.
It takes forever and it's terribly inaccurate.
More than once, I've wished for a way to put up a translucent image
overlay that would let me click through it. So I could see the image,
line it up with the map in PyTopo (resizing as needed),
then click exactly where I wanted waypoints.
I needed two features beyond what normal image viewers offer:
translucency, and the ability to pass mouse clicks through to the
window underneath.
A translucent image viewer, in Python
The first part, translucency, turned out to be trivial.
In a class inheriting from my
Python
ImageViewerWindow, I just needed to add this line to the constructor:
self.set_opacity(.5)
Plus one more step.
The window was translucent now, but it didn't look translucent,
because I'm running a simple window manager (Openbox) that
doesn't have a compositor built in. Turns out you can run a compositor on top
of Openbox. There are lots of compositors; the first one I found,
which worked fine, was
xcompmgr -c -t-6 -l-6 -o.1
The -c specifies client-side compositing. -t and -l specify top and left
offsets for window shadows (negative so they go on the bottom right).
-o.1 sets the opacity of window shadows. In the long run, -o0 is
probably best (no shadows at all) since the shadow interferes
a bit with seeing the window under the translucent one. But having a
subtle .1 shadow was useful while I was debugging.
That's all I needed: voilà, translucent windows.
Now on to the (much) harder part.
A click-through window, in C
X11 has something called the SHAPE extension, which I experimented with
once before to make a silly program called
moonroot.
It's also used for the familiar "xeyes" program.
It's used to make windows that aren't square, by passing a shape mask
telling X what shape you want your window to be.
In theory, I knew I could do something like make a mask where every
other pixel was transparent, which would simulate a translucent image,
and I'd at least be able to pass clicks through on half the pixels.
But fortunately, first I asked the estimable Openbox guru Mikael
Magnusson, who tipped me off that the SHAPE extension also allows for
an "input shape" that does exactly what I wanted: lets you catch
events on only part of the window and pass them through on the rest,
regardless of which parts of the window are visible.
Knowing that was great. Making it work was another matter.
Input shapes turn out to be something hardly anyone uses, and
there's very little documentation.
In both C and Python, I struggled with drawing onto a pixmap
and using it to set the input shape. Finally I realized that there's a
call to set the input shape from an X region. It's much easier to build
a region out of rectangles than to draw onto a pixmap.
I got a C demo working first. The essence of it was this:
if (!XShapeQueryExtension(dpy, &shape_event_base, &shape_error_base)) {
    printf("No SHAPE extension\n");
    return;
}

/* Make a shaped window, a rectangle smaller than the total
 * size of the window. The rest will be transparent.
 */
region = CreateRegion(outerBound, outerBound,
                      XWinSize-outerBound*2, YWinSize-outerBound*2);
XShapeCombineRegion(dpy, win, ShapeBounding, 0, 0, region, ShapeSet);
XDestroyRegion(region);

/* Make a frame region.
 * So in the outer frame, we get input, but inside it, it passes through.
 */
region = CreateFrameRegion(innerBound);
XShapeCombineRegion(dpy, win, ShapeInput, 0, 0, region, ShapeSet);
XDestroyRegion(region);
CreateRegion sets up rectangle boundaries, then creates a region
from those boundaries:
Region CreateRegion(int x, int y, int w, int h) {
    Region region = XCreateRegion();
    XRectangle rectangle;
    rectangle.x = x;
    rectangle.y = y;
    rectangle.width = w;
    rectangle.height = h;
    XUnionRectWithRegion(&rectangle, region, region);
    return region;
}
Next problem: once I had shaped input working, I could no longer move
or resize the window, because the window manager passed events through
the window's titlebar and decorations as well as through the rest of
the window.
That's why you'll see that CreateFrameRegion call in the gist
-- I had a theory that if I omitted the outer part of the window from
the input shape, and handled input normally around the outside, maybe
that would extend to the window manager decorations. But the problem
turned out to be a minor Openbox bug, which Mikael quickly
tracked down (in openbox/frame.c, in the
XShapeCombineRectangles call on line 321,
change ShapeBounding to kind).
Openbox developers are the greatest!
Input Shapes in Python
Okay, now I had a proof of concept: X input shapes definitely can work,
at least in C. How about in Python?
There's a set of python-xlib bindings, and they even support the SHAPE
extension, but they have no documentation and didn't seem to include
input shapes. I filed a GitHub issue and traded a few notes with
the maintainer of the project.
It turned out the newest version of python-xlib had been completely
rewritten, and supposedly does support input shapes. But the API is
completely different from the C API, and after wasting about half a day
tweaking the demo program trying to reverse engineer it, I gave up.
Fortunately, it turns out there's a much easier way. Python-gtk has
shape support, even including input shapes. And if you use regions
instead of pixmaps, it's this simple:
if self.is_composited():
    region = gtk.gdk.region_rectangle(gtk.gdk.Rectangle(0, 0, 1, 1))
    self.window.input_shape_combine_region(region, 0, 0)
My transimageviewer.py
came out nice and simple, inheriting from imageviewer.py and adding only
translucency and the input shape.
If you want to define an input shape based on pixmaps instead of regions,
it's a bit harder and you need to use the Cairo drawing API. I never got as
far as working code, but I believe it should go something like this:
# Warning: untested code!
bitmap = gtk.gdk.Pixmap(None, self.width, self.height, 1)
cr = bitmap.cairo_create()
# Draw a white circle in a black rect:
cr.rectangle(0, 0, self.width, self.height)
cr.set_operator(cairo.OPERATOR_CLEAR)
cr.fill()
# draw white filled circle
cr.arc(self.width / 2, self.height / 2, self.width / 4,
       0, 2 * math.pi)
cr.set_operator(cairo.OPERATOR_OVER)
cr.fill()
self.window.input_shape_combine_mask(bitmap, 0, 0)
The translucent image viewer worked just as I'd hoped. I was able to
take a JPG of a trailmap, overlay it on top of a PyTopo window, scale
the JPG using the normal Openbox window manager handles, then
right-click on top of trail markers to set waypoints. When I was done,
a "Save as GPX" in PyTopo and I had a file ready to take with me on my
phone.
As part of preparation for Everyone Does IT, I was working on a silly
hack to my
Python
script that plays notes and chords:
I wanted to use the computer keyboard like a music keyboard, and play
different notes when I press different keys. Obviously, in a case like
that I don't want line buffering -- I want the program to play notes
as soon as I press a key, not wait until I hit Enter and then play the
whole line at once. In Unix that's called "cbreak mode".
There are a few ways to do this in Python. The most straightforward way
is to use the curses library, which is designed for console-based
user interfaces and games. But importing curses is overkill just to do
key reading.
Years ago, I found a guide on the official Python Library and
Extension FAQ:
Python:
How do I get a single keypress at a time?.
I'd even used it once, for a one-off Raspberry Pi project that I didn't
end up using much. I hadn't done much testing of it at the time, but
trying it now, I found a big problem: it doesn't block.
Blocking is whether the read() waits for input or returns immediately.
If I read a character with
c = sys.stdin.read(1) but there's been no character typed yet,
a non-blocking read will throw an IOError exception, while a blocking
read will wait, not returning until the user types a character.
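You can see the two behaviors side by side without involving a terminal at
all, by setting O_NONBLOCK on the read end of a pipe. A self-contained
sketch (Python 3 raises BlockingIOError, a subclass of OSError, where
Python 2 raised IOError):

```python
import os, fcntl, errno

r, w = os.pipe()
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    os.read(r, 1)                    # nothing has been written yet
except OSError as e:                 # IOError on Python 2
    print("non-blocking read raised", errno.errorcode[e.errno])

os.write(w, b'x')
print("after a write, read returns", os.read(r, 1))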
In the code on that Python FAQ page, blocking looks like it should be
optional. This line:
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
is the part that requests non-blocking reads. Skipping that should
let me read characters one at a time, blocking until each character
is typed. But in practice, it doesn't work. If I omit the O_NONBLOCK flag,
reads never return, not even if I hit Enter; if I set O_NONBLOCK, the read
immediately raises an IOError. So I have to call read() over
and over, spinning the CPU at 100% while I wait for the user to type something.
The way this is supposed to work is documented in the termios
man page. Part of what tcgetattr returns is something called the
cc structure, which includes two members called Vmin
and Vtime. man termios is very clear on how they're
supposed to work: for blocking, single character reads, you set Vmin
to 1 (that's the number of characters you want it to batch up before
returning), and Vtime to 0 (return immediately after getting that one
character). But setting them in Python with tcsetattr
doesn't make any difference.
(Python also has a module called
tty
that's supposed to simplify this stuff, and you should be able to call
tty.setcbreak(fd). But that didn't work any better
than termios: I suspect it just calls termios under the hood.)
But after a few hours of fiddling and googling, I realized that even
if Python's termios can't block, there are other ways of blocking on input.
The select system call lets you wait on any file
descriptor until it has input. So I should be able to set stdin to be
non-blocking, then do my own blocking by waiting for it with select.
And that worked. Here's a minimal example:
import sys, os
import termios, fcntl
import select
fd = sys.stdin.fileno()
oldterm = termios.tcgetattr(fd)
oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
newattr = termios.tcgetattr(fd)
newattr[3] = newattr[3] & ~termios.ICANON
newattr[3] = newattr[3] & ~termios.ECHO
termios.tcsetattr(fd, termios.TCSANOW, newattr)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
print "Type some stuff"
while True:
    inp, outp, err = select.select([sys.stdin], [], [])
    c = sys.stdin.read()
    if c == 'q':
        break
    print "-", c
# Reset the terminal:
termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)
I've been quiet for a while, partly because I've been busy preparing
for a booth at the upcoming
Everyone Does IT
event at PEEC, organized by
LANL.
In addition to booths from quite a few LANL and community groups,
they'll show the movie "CODE: Debugging the Gender Gap" in the planetarium.
I checked out the movie last week (our library has it) and it's a good
overview of the problem of diversity, and especially the problems
women face in programming jobs.
I'll be at the Los Alamos Makers/Coder Dojo booth, where we'll be
showing an assortment of Raspberry Pi and Arduino based projects.
We've asked the Coder Dojo kids to come by and show off some of their projects.
I'll have my RPi crittercam there (such as it is) as well as another
Pi running motioneyeos, for comparison. (Motioneyeos turned out to be
remarkably difficult to install and configure, and doesn't seem to
do any better than my lightweight scripts at detecting motion without
false positives. But it does offer streaming video, which might be
nice for a booth.) I'll also be demonstrating cellular automata and
the Game of Life (especially since the CODE movie uses Life as a
background in quite a few scenes), music playing in Python,
a couple of Arduino-driven NeoPixel LED light strings, and possibly
an arm-waving penguin I built a few years ago for GetSET, if I can
get it working again: the servos aren't behaving reliably, but I'm not
sure yet whether it's a problem with the servos and their wiring or a
power supply problem.
The music playing script turned up an interesting Raspberry Pi problem.
The Pi has a headphone output, and initially when I plugged a powered
speaker into it, the program worked fine. But then later, it didn't.
After much debugging, it turned out that the difference was that I'd
made myself a user so I could have my normal shell environment.
I'd added my user to the audio group and all the other groups the
default "pi" user is in,
but the Pi's pulseaudio is set up to allow audio only from users
root and pi, and it ignores groups.
Nobody seems to have found a way around that, but
sudo apt-get purge pulseaudio solved the problem nicely.
I also hit a minor snag attempting to upgrade some of my older Raspbian
installs: lightdm can't upgrade itself (Errors were encountered
while processing: lightdm). Lots of people on the web have hit
this, and nobody has found a way around it; the only solution seems to
be to abandon the old installation and download a new Raspbian image.
But I think I have all my Raspbian cards installed and working now;
pulseaudio is gone, music plays, the Arduino light shows run.
Now to play around with servo power supplies and see if I can get
my penguin's arms waving again when someone steps in front of him.
Should be fun, and I can't wait to see the demos the other booths will have.
If you're in northern New Mexico, come by Everyone Does IT this Tuesday
night! It's 5:30-7:30 at PEEC,
the Los Alamos Nature Center, and everybody's welcome.
We have a terrific new program going on at
Los Alamos Makers:
a weekly Coder Dojo for kids, 6-7 on Tuesday nights.
Coder Dojo is a worldwide movement,
and our local dojo is based on their ideas.
Kids work on programming projects to earn colored USB wristbelts,
with the requirements for belts getting progressively harder.
Volunteer mentors are on hand to help, but we're not lecturing or
teaching, just coaching.
Despite not much advertising, word has gotten around and we typically
have 5-7 kids on Dojo nights, enough that all the makerspace's
Raspberry Pi workstations are filled and we sometimes have to scrounge
for more machines for the kids who don't bring their own laptops.
A fun moment early on came when we had a mentor meeting, and Neil,
our head organizer (who deserves most of the credit for making this
program work so well), looked around and said "One thing that might
be good at some point is to get more men involved." Sure enough --
he was the only man in the room! For whatever reason, most of the
programmers who have gotten involved have been women. A refreshing
change from the usual programming group.
(Come to think of it, the PEEC web development team is three women.
A girl could get a skewed idea of gender demographics, living here.)
The kids who come to program are about 40% girls.
I wondered at the beginning how it would work, with no lectures or
formal programs. Would the kids just sit passively, waiting to be
spoon fed? How would they get concepts like loops and conditionals
and functions without someone actively teaching them?
It wasn't a problem. A few kids have some prior programming practice,
and they help the others. Kids as young as 9 with no previous
programming experience walk in, sit down at a Raspberry Pi station,
and after five minutes of being shown how to bring up a Python console
and use Python's turtle graphics module to draw a line and turn a corner,
they're happily typing away, experimenting and making Python draw
great colorful shapes.
Python-turtle turns out to be a wonderful way for beginners to learn.
It's easy to get started, it makes pretty pictures, and yet, since
it's Python, it's not just training wheels: kids are using a real
programming language from the start, and they can search the web and
find lots of helpful examples when they're trying to figure out how to
do something new (just like professional programmers do. :-)
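There's even a little geometry lesson hiding in "draw a line and turn a
corner": to close a regular shape, the turtle's corner turns have to total
360 degrees. That part is easy to check in plain Python, no display needed
(the helper name here is mine, just for illustration):

```python
def turn_angle(sides):
    """Exterior angle a turtle turns at each corner of a regular polygon."""
    return 360.0 / sides

# A square needs 90-degree turns; a hexagon only 60:
print(turn_angle(4))   # 90.0
print(turn_angle(6))   # 60.0

# However many sides, the turns always sum back to 360 degrees:
print(sum(turn_angle(5) for _ in range(5)))   # 360.0
```

Kids tend to discover this by experiment: guess a turn angle, watch the
shape fail to close, adjust.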
Initially we set easy requirements for the first (white) belt: attend
for three weeks, learn the names of other Dojo members. We didn't
require any actual programming until the second (yellow) belt, which
required writing a program with two of three elements: a conditional,
a loop, a function.
That plan went out the window at the end of the first evening, when
two kids had already fulfilled the yellow belt requirements ... even
though they were still two weeks away from the attendance requirement
for the white belt. One of them had never programmed before. We've
since scrapped the attendance belt, and now the white belt has the
conditional/loop/function requirement that used to be the yellow belt.
The program has been going for a bit over three months now. We've
awarded lots of white belts and a handful of yellows (three new ones
just this week). Although most of the kids are working in Python,
there are also several playing music or running LED strips using
Arduino/C++, writing games and web pages in Javascript, writing
adventure games in Scratch, or just working through Khan Academy lectures.
When someone is ready for a belt, they present their program to
everyone in the room and people ask questions about it: what does that
line do? Which part of the program does that? How did you figure out
that part? Then the mentors review the code over the next week, and
they get the belt the following week.
For all but the first belt, helping newer members is a requirement,
though I suspect even without that they'd be helping each other. Sit a
first-timer next to someone who's typing away at a Python program and
watch the magic happen. Sometimes it feels almost superfluous being a
mentor. We chat with the kids and each other, work on our own projects,
shoulder-surf, and wait for someone to ask for help with harder problems.
Overall, a terrific program, and our only problems now are getting
funding for more belts and more workstations as the word spreads and
our Dojo nights get more crowded. I've had several adults ask me if
there was a comparable program for adults. Maybe some day (I hope).
Part of being a programmer is having an urge to automate repetitive tasks.
Every new HTML file I create should include some boilerplate HTML, like
<html><head></head><body></body></html>.
Every new Python file I create should start with
#!/usr/bin/env python, and most of them should end
with an if __name__ == "__main__": clause.
I get tired of typing all that, especially the dunderscores and
slash-greater-thans.
Long ago, I wrote an emacs function called newhtml
to insert the boilerplate code:
(defun newhtml ()
  "Insert a template for an empty HTML page"
  (interactive)
  (insert "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
          "<html>\n"
          "<head>\n"
          "<title></title>\n"
          "</head>\n\n"
          "<body>\n\n"
          "<h1></h1>\n\n"
          "<p>\n\n"
          "</body>\n"
          "</html>\n")
  (forward-line -11)
  (forward-char 7)
)
The motion commands at the end move the cursor back to point in
between the <title> and </title>, so I'm ready to type
the page title. (I should probably have it prompt me, so it can insert
the same string in title and h1, which
is almost always what I want.)
That has worked for quite a while. But when I decided it was time to
write the same function for python:
(defun newpython ()
  "Insert a template for an empty Python script"
  (interactive)
  (insert "#!/usr/bin/env python\n"
          "\n"
          "\n"
          "\n"
          "if __name__ == '__main__':\n"
          "\n"
          )
  (forward-line -4)
)
... I realized that I wanted to be even more lazy than that.
Emacs knows what sort of file it's editing -- it switches to html-mode
or python-mode as appropriate. Why not have it insert the template
automatically?
My first thought was to have emacs run the function upon loading a
file. There's a function with-eval-after-load which
supposedly can act based on file suffix, so something like
(with-eval-after-load ".py" (newpython))
is documented to work. But I found that it was never called, and
couldn't find an example that actually worked.
But then I realized that I have mode hooks for all the programming
modes anyway, to set up things like indentation preferences. Inserting
some text at the end of the mode hook seems perfectly simple:
(add-hook 'python-mode-hook
          (lambda ()
            (if (= (buffer-size) 0)
                (newpython))))
The (= (buffer-size) 0) test ensures this only happens
if I open a new file. Obviously I don't want to be auto-inserting code
inside existing programs!
HTML mode was a little more complicated. I edit some files, like
blog posts, that use HTML formatting, and hence need html-mode,
but they aren't standalone HTML files that need the usual HTML
template inserted. For blog posts, I use a different file extension,
so I can use the elisp string-suffix-p to test for that:
;; string-suffix-p is like Python endswith
(if (and (= (buffer-size) 0)
         (string-suffix-p ".html" (buffer-file-name)))
    (newhtml) )
I may eventually find other files that don't need the template;
if I need to, it's easy to add other tests, like the directory where
the new file will live.
A nice timesaver: open a new file and have a template automatically
inserted.
Several times recently I've come across someone with a useful fix
to a program on GitHub, for which they'd filed a GitHub pull request.
The problem is that GitHub doesn't give you any link on the pull
request to let you download the code in that pull request. You can
get a list of the checkins inside it, or a list of the changed files
so you can view the differences graphically. But if you want the code
on your own computer, so you can test it, or use your own editors and
diff tools to inspect it, it's not obvious how. That this is a problem
is easily seen with a web search for something like
download github pull request -- there are huge numbers
of people asking how, and most of the answers are vague and unclear.
That's a shame, because it turns out it's easy to pull a pull request.
You can fetch it directly with git into a new branch as long as you
have the pull request ID. That's the ID shown on the GitHub pull
request page:
Once you have the pull request ID, choose a new name for your branch,
then fetch it:
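The fetch uses GitHub's refs/pull namespace; with placeholders for the request ID and your branch name, the command looks like:

```shell
git fetch origin pull/REQUEST-ID/head:NEW-BRANCH-NAME
```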
Then you can view diffs with something like
git difftool NEW-BRANCH-NAME..master
Easy! GitHub should give a hint of that on its pull request pages.
Fetching a Pull Request diff to apply it to another tree
But shortly after I learned how to apply a pull request, I had a
related but different problem in another project. There was a pull
request for an older repository, but the part it applied to had since
been split off into a separate project. (It was an old pull request
that had fallen through the cracks, and as a new developer on the
project, I wanted to see if I could help test it in the new
repository.)
You can't pull a pull request that's for a whole different repository.
But what you can do is go to the pull request's page on GitHub.
There are 3 tabs: Conversation, Commits, and Files changed.
Click on Files changed to see the diffs visually.
That works if the changes are small and only affect a few files
(which fortunately was the case this time).
It's not so great if there are a lot of changes or a lot of files affected.
I couldn't find any "Raw" or "download" button that would give me a
diff I could actually apply. You can select all and then paste
the diffs into a local file, but you have to do that separately for
each file affected. It might be, if you have a lot of files, that the
best solution is to check out the original repo, apply the pull request,
generate a diff locally with git diff, then apply that
diff to the new repo. Rather circuitous. But with any luck that
situation won't arise very often.
Update: thanks very much to Houz for the solution! (In the comments, below.)
Just append .diff or .patch to the pull request URL, e.g.
https://github.com/OWNER/REPO/pull/REQUEST-ID.diff
which you can view in a browser or fetch with wget or curl.
But one of the election data files I found, OpenDataSoft's
USA 2016 Presidential Election by county
had embedded county shapes,
available either as CSV or as GeoJSON. (I used the CSV version, but
inside the CSV the geo data are encoded as JSON so you'll need JSON
decoding either way. But that's no problem.)
Just about all the documentation
I found on coloring shapes in Basemap assumed that the shapes were
defined as ESRI shapefiles. How do you draw shapes if you have
latitude/longitude data in a more open format?
As it turns out, it's quite easy, but it took a fair amount of poking
around inside Basemap to figure out how it worked.
In the loop over counties in the US in the previous article,
the end goal was to create a matplotlib Polygon
and use that to add a Basemap patch.
But matplotlib's Polygon wants map coordinates, not latitude/longitude.
If m is your basemap (i.e., you created the map with
m = Basemap( ... )), you can translate coordinates like this:
(mapx, mapy) = m(longitude, latitude)
So once you have a region as a list of (longitude, latitude) coordinate
pairs, you can create a colored, shaped patch like this:
for coord_pair in region:
    coord_pair[0], coord_pair[1] = m(coord_pair[0], coord_pair[1])
poly = Polygon(region, facecolor=color, edgecolor=color)
ax.add_patch(poly)
Working with the OpenDataSoft data file was actually a little harder than
that, because the list of coordinates was JSON-encoded inside the CSV file,
so I had to decode it with json.loads(county["Geo Shape"]).
Once decoded, some counties were a Polygon, a list of
lists (allowing for discontiguous outlines), and others were
a MultiPolygon, a list of lists of lists (I'm not sure why,
since the Polygon format already allows for discontiguous boundaries).
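Here's a sketch of handling both cases with the stdlib json module (the geometries are toy triangles, not real county outlines):

```python
import json

def geo_shape_to_polygons(geo_shape_json):
    """Decode a GeoJSON-style geometry string into a flat list of
       outlines, each a list of [longitude, latitude] pairs,
       whether the geometry is a Polygon or a MultiPolygon."""
    geom = json.loads(geo_shape_json)
    if geom["type"] == "Polygon":
        # Polygon: a list of rings.
        return list(geom["coordinates"])
    elif geom["type"] == "MultiPolygon":
        # MultiPolygon: a list of Polygons, one more level of nesting.
        return [ring for poly in geom["coordinates"] for ring in poly]
    raise ValueError("Unexpected geometry type: " + geom["type"])

poly = '{"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1]]]}'
mpoly = ('{"type": "MultiPolygon", "coordinates":'
         ' [[[[0, 0], [1, 0], [1, 1]]], [[[2, 2], [3, 2], [3, 3]]]]}')
```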
And a few counties were missing, so there were blanks on the map,
which show up as white patches in this screenshot.
The counties missing data either have inconsistent formatting in
their coordinate lists, or they have only one coordinate pair, and
they include Washington, Virginia; Roane, Tennessee; Schley, Georgia;
Terrell, Georgia; Marshall, Alabama; Williamsburg, Virginia; and Pike,
Georgia; plus Oglala Lakota (which is clearly meant to be Oglala,
South Dakota), and all of Alaska.
One thing about crunching data files
from the internet is that there are always a few special cases you
have to code around. And I could have gotten those coordinates from
the census shapefiles; but as long as I needed the census shapefile
anyway, why use the CSV shapes at all? In this particular case, it
makes more sense to use the shapefiles from the Census.
Still, I'm glad to have learned how to use arbitrary coordinates as shapes,
freeing me from the proprietary and annoying ESRI shapefile format.
I used the Basemap package for plotting.
It used to be part of matplotlib, but it's been split off into its
own toolkit, grouped under mpl_toolkits: on Debian, it's
available as python-mpltoolkits.basemap, or you can find
Basemap on GitHub.
It's easiest to start with the
fillstates.py
example that shows
how to draw a US map with different states colored differently.
You'll need the three shapefiles (because of ESRI's silly shapefile format):
st99_d00.dbf, st99_d00.shp and st99_d00.shx, available
in the same examples directory.
Of course, to plot counties, you need county shapefiles as well.
The US Census has
county
shapefiles at several different resolutions (I used the 500k version).
Then you can plot state and counties outlines like this:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def draw_us_map():
    # Set the lower left and upper right limits of the bounding box:
    lllon = -119
    urlon = -64
    lllat = 22.0
    urlat = 50.5
    # and calculate a centerpoint, needed for the projection:
    centerlon = float(lllon + urlon) / 2.0
    centerlat = float(lllat + urlat) / 2.0

    m = Basemap(resolution='i',  # crude, low, intermediate, high, full
                llcrnrlon=lllon, urcrnrlon=urlon,
                lon_0=centerlon,
                llcrnrlat=lllat, urcrnrlat=urlat,
                lat_0=centerlat,
                projection='tmerc')

    # Read state boundaries.
    shp_info = m.readshapefile('st99_d00', 'states',
                               drawbounds=True, color='lightgrey')

    # Read county boundaries
    shp_info = m.readshapefile('cb_2015_us_county_500k',
                               'counties',
                               drawbounds=True)

if __name__ == "__main__":
    draw_us_map()
    plt.title('US Counties')
    # Get rid of some of the extraneous whitespace matplotlib loves to use.
    plt.tight_layout(pad=0, w_pad=0, h_pad=0)
    plt.show()
Accessing the state and county data after reading shapefiles
Great. Now that we've plotted all the states and counties, how do we
get a list of them, so that when I read out "Santa Clara, CA" from
the data I'm trying to plot, I know which map object to color?
After calling readshapefile('st99_d00', 'states'), m has two new
members, both lists: m.states and m.states_info.
m.states_info[] is a list of dicts mirroring what was in the shapefile.
For the Census state list, the useful keys are NAME, AREA, and PERIMETER.
There's also STATE, which is an integer (not restricted to 1 through 50)
but I'll get to that.
If you want the shape for, say, California,
iterate through m.states_info[] looking for the one where
m.states_info[i]["NAME"] == "California".
Note the index i; the shape coordinates will be in m.states[i]
(in basemap map coordinates, not latitude/longitude).
Correlating states and counties in Census shapefiles
County data is similar, with county names in
m.counties_info[i]["NAME"].
Remember that STATE integer? Each county has a STATEFP,
m.counties_info[i]["STATEFP"] that matches some state's
m.states_info[i]["STATE"].
But doing that search every time would be slow. So right after calling
readshapefile for the states, I make a table of states. Empirically,
STATE in the state list goes up to 72. Why 72? Shrug.
MAXSTATEFP = 73
states = [None] * MAXSTATEFP

for state in m.states_info:
    statefp = int(state["STATE"])
    # Many states have multiple entries in m.states (because of islands).
    # Only add each one once.
    if not states[statefp]:
        states[statefp] = state["NAME"]
That'll make it easy to look up a county's state name quickly when
we're looping through all the counties.
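In miniature, with made-up entries, the table works like this: indexing by the STATE/STATEFP integer makes each county-to-state lookup a constant-time list access.

```python
MAXSTATEFP = 73
states = [None] * MAXSTATEFP

# Toy stand-in for m.states_info; the real dicts come from the shapefile.
states_info = [
    {"STATE": "35", "NAME": "New Mexico"},
    {"STATE": "35", "NAME": "New Mexico"},   # a second entry, e.g. an island
    {"STATE": "06", "NAME": "California"},
]
for state in states_info:
    statefp = int(state["STATE"])
    if not states[statefp]:
        states[statefp] = state["NAME"]

# Look up a toy county's state by its STATEFP:
county = {"NAME": "Santa Fe", "STATEFP": "35"}
print(states[int(county["STATEFP"])])   # prints: New Mexico
```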
Calculating colors for each county
Time to figure out the colors from the Deleetdk election results CSV file.
Reading lines from the CSV file into a dictionary is superficially easy enough:
import csv

fp = open("tidy_data.csv")
reader = csv.DictReader(fp)

# Make a dictionary of all "county, state" and their colors.
county_colors = {}
for county in reader:
    # What color is this county?
    pop = float(county["votes"])
    blue = float(county["results.clintonh"]) / pop
    red = float(county["results.trumpd"]) / pop
    county_colors["%s, %s" % (county["name"], county["State"])] \
        = (red, 0, blue)
But in practice, that wasn't good enough, because the county names
in the Deleetdk data didn't always match the official Census county names.
Fuzzy matches
For instance, the CSV file had no results for Alaska or Puerto Rico,
so I had to skip those. Non-ASCII characters were a problem:
"Doña Ana" county in the census data was "Dona Ana" in the CSV.
I had to strip off " County", " Borough" and similar terms:
"St Louis" in the census data was "St. Louis County" in the CSV.
Some names were capitalized differently, like PLYMOUTH vs. Plymouth,
or Lac Qui Parle vs. Lac qui Parle.
And some names were just different, like "Jeff Davis" vs. "Jefferson Davis".
To get around that I used SequenceMatcher to look for fuzzy matches
when I couldn't find an exact match:
from difflib import SequenceMatcher

def fuzzy_find(s, slist):
    '''Try to find a fuzzy match for s in slist.
    '''
    best_ratio = -1
    best_match = None
    ls = s.lower()
    for ss in slist:
        r = SequenceMatcher(None, ls, ss.lower()).ratio()
        if r > best_ratio:
            best_ratio = r
            best_match = ss
    if best_ratio > .75:
        return best_match
    return None
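For instance (the name lists here are made up; SequenceMatcher comes from the standard library's difflib, and the snippet repeats the function so it stands alone):

```python
from difflib import SequenceMatcher

def fuzzy_find(s, slist):
    '''Try to find a fuzzy match for s in slist.'''
    best_ratio = -1
    best_match = None
    ls = s.lower()
    for ss in slist:
        r = SequenceMatcher(None, ls, ss.lower()).ratio()
        if r > best_ratio:
            best_ratio = r
            best_match = ss
    if best_ratio > .75:
        return best_match
    return None

# "Dona Ana County" has no exact match in the census-style list,
# but it's close enough to "Doña Ana" to clear the .75 threshold:
census_names = ["Doña Ana, New Mexico", "Santa Fe, New Mexico"]
print(fuzzy_find("Dona Ana County, New Mexico", census_names))
```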
Correlate the county names from the two datasets
It's finally time to loop through the counties in the map to color and
plot them.
Remember STATE vs. STATEFP? It turns out there are a few counties in
the census county shapefile with a STATEFP that doesn't match any
STATE in the state shapefile. Mostly they're in the Virgin Islands
and I don't have election data for them anyway, so I skipped them for now.
I also skipped Puerto Rico and Alaska (no results in the election data)
and counties that had no corresponding state: I'll omit that code here,
but you can see it in the final script, linked at the end.
for i, county in enumerate(m.counties_info):
    countyname = county["NAME"]
    try:
        statename = states[int(county["STATEFP"])]
    except IndexError:
        print(countyname, "has out-of-index statefp of", county["STATEFP"])
        continue

    countystate = "%s, %s" % (countyname, statename)
    try:
        ccolor = county_colors[countystate]
    except KeyError:
        # No exact match; try for a fuzzy match
        fuzzyname = fuzzy_find(countystate, county_colors.keys())
        if fuzzyname:
            ccolor = county_colors[fuzzyname]
            county_colors[countystate] = ccolor
        else:
            print("No match for", countystate)
            continue

    countyseg = m.counties[i]
    poly = Polygon(countyseg, facecolor=ccolor)  # edgecolor="white"
    ax.add_patch(poly)
Moving Hawaii
Finally, although the CSV didn't have results for Alaska, it did have
Hawaii. To display it, you can move it when creating the patches:
The offsets are in map coordinates and are empirical; I fiddled with
them until Hawaii showed up at a reasonable place.
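The patch-creation tweak might look like this sketch (the offsets here are made up; as noted, the real ones were found by fiddling):

```python
# Hypothetical map-coordinate offsets for relocating Hawaii:
HAWAII_XOFF, HAWAII_YOFF = 5100000, -900000

def shift_segment(seg, xoff, yoff):
    """Translate a list of (x, y) map-coordinate pairs by a fixed offset."""
    return [(x + xoff, y + yoff) for x, y in seg]

# In the county loop, before creating the Polygon, something like:
# if statename == 'Hawaii':
#     countyseg = shift_segment(countyseg, HAWAII_XOFF, HAWAII_YOFF)
```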
Well, that was a much longer article than I intended. Turns out
it takes a fair amount of code to correlate several datasets and
turn them into a map. But a lot of the work will be applicable
to other datasets.
Back in 2012, I got interested in fiddling around with election data
as a way to learn about data analysis in Python. So I went searching
for results data on the presidential election. And got a surprise: it
wasn't available anywhere in the US. After many hours of searching,
the only source I ever found was at the UK newspaper, The Guardian.
Surely in 2016, we're better off, right? But when I went looking,
I found otherwise. There's still no official source for US election
results data; there isn't even a source as reliable as The Guardian
this time.
You might think Data.gov would be the place to go for official
election results, but no:
searching
for 2016 election on Data.gov yields nothing remotely useful.
The Federal
Election Commission has an election results page, but it only goes
up to 2014 and only includes the Senate and House, not presidential elections.
Archives.gov
has popular vote totals for the 2012 election but not the current one.
Maybe in four years, they'll have some data.
After striking out on official US government sites, I searched the web.
I found a few sources, none of them even remotely official.
Early on I found
Simon
Rogers, How to Download County-Level Results Data,
which leads to GitHub user tonmcg's
County
Level Election Results 12-16. It's a comparison of Democratic vs.
Republican votes in the 2012 and 2016 elections (I assume that means votes
for that party's presidential candidate, though the field names don't
make that entirely clear), with no information on third-party
candidates.
KidPixo's
Presidential Election USA 2016 on GitHub is a little better: the fields
make it clear that it's recording votes for Trump and Clinton, but still
no third party information. It's also scraped from the New York Times,
and it includes the scraping code so you can check it and have some
confidence on the source of the data.
Kaggle
claims to have election data, but you can't download their datasets
or even see what they have without signing up for an account.
Ben Hamner
has some publicly available Kaggle data on GitHub, but only for the primary.
I also found several companies selling election data,
and several universities that had datasets available
for researchers with accounts at that university.
The most complete dataset I found, and the only open one that included
third party candidates, was through
OpenDataSoft.
Like the other two, this data is scraped from the NYT.
It has data for all the minor party candidates as well as the majors,
plus lots of demographic data for each county in the lower 48, plus
Hawaii, but not the territories, and the election data for all the
Alaska counties is missing.
You can get it either from a GitHub repo,
Deleetdk's
USA.county.data (look in inst/ext/tidy_data.csv).
If you want a larger version with geographic shape data included,
clicking through several other opendatasoft pages eventually gets
you to an export page,
USA 2016 Presidential Election by county,
where you can download CSV, JSON, GeoJSON and other formats.
The OpenDataSoft data file was pretty useful, though it had gaps
(for instance, there's no data for Alaska). I was able to make
my own red-blue-purple plot of county voting results (I'll write
separately about how to do that with python-basemap),
and to play around with statistics.
Implications of the lack of open data
But the point my search really brought home: By the time I finally
found a workable dataset, I was so sick of the search, and so
relieved to find anything at all, that I'd stopped being picky about
where the data came from. I had long since given up on finding
anything from a known source, like a government site or even a
newspaper, and was just looking for data, any data.
And that's not good. It means that a lot of the people doing
statistics on elections are using data from unverified sources,
probably copied from someone else who claimed to have scraped it,
using unknown code, from some post-election web page that likely no
longer exists. Is it accurate? There's no way of knowing.
What if someone wanted to spread misinformation? There's a
hunger for data, particularly on something as important as a US
Presidential election. Looking at Google's suggested results and
"Searches related to" made it clear that it wasn't just me: there are
a lot of people searching for this information who can't find it
through official sources.
If I were a foreign power wanting to spread disinformation, providing
easily available data files -- to fill the gap left by the US
Government's refusal to do so -- would be a great way to mislead
people. I could put anything I wanted in those files: there's no way
of checking them against official results since there are no official
results. Just make sure the totals add up to what people expect to
see. You could easily set up an official-looking site and put made-up
data there, and it would look a lot more real than all the people
scraping from the NYT.
If our government -- or newspapers, or anyone else -- really wanted to
combat "fake news", they should take open data seriously. They should
make datasets for important issues like the presidential election
publicly available, as soon as possible after the election -- not
four years later when nobody but historians care any more.
Without that, we're leaving ourselves open to fake news and fake data.
But then I realized all was not quite right. I could install new releases of
my package -- but I couldn't run it from the source directory any more.
How could I test changes without needing to rebuild the package for
every little change I made?
Fortunately, it turned out to be fairly easy. Set PYTHONPATH to a
directory that includes all the modules you normally want to test.
For example, inside my bin directory I have a python directory
where I can symlink any development modules I might need:
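For instance (hypothetical paths for a module named pytopo):

```shell
# Symlink a module's development source into a directory
# that will go on PYTHONPATH:
mkdir -p $HOME/bin/python
ln -sfn $HOME/src/pytopo/pytopo $HOME/bin/python/pytopo
```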
Then add the directory at the beginning of PYTHONPATH:
export PYTHONPATH=$HOME/bin/python
With that, I could test from the development directory again,
without needing to rebuild and install a package every time.
Cleaning up files used in building
Building a package leaves some extra files and directories around,
and git status will whine at you since they're not
version controlled. Of course, you could gitignore them, but it's
better to clean them up after you no longer need them.
To do that, you can add a clean command to setup.py.
import os
from setuptools import Command

class CleanCommand(Command):
    """Custom clean command to tidy up the project root."""
    user_options = []
    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        os.system('rm -vrf ./build ./dist ./*.pyc ./*.tgz ./*.egg-info ./docs/sphinxdoc/_build')
(Obviously, that includes file types beyond what you need for just
cleaning up after package building. Adjust the list as needed.)
Then in the setup() function, add these lines:
cmdclass={
'clean': CleanCommand,
}
Now you can type
python setup.py clean
and it will remove all the extra files.
Keeping version strings in sync
It's so easy to update the __version__ string in your module and
forget that you also have to do it in setup.py, or vice versa.
Much better to make sure they're always in sync.
I found several versions of that using system("grep..."),
but I decided to write my own that doesn't depend on system().
(Yes, I should do the same thing with that CleanCommand, I know.)
def get_version():
    '''Read the pytopo module version from pytopo/__init__.py'''
    with open("pytopo/__init__.py") as fp:
        for line in fp:
            line = line.strip()
            if line.startswith("__version__"):
                parts = line.split("=")
                if len(parts) > 1:
                    return parts[1].strip()
Then in setup():
version=get_version(),
Much better! Now you only have to update __version__ inside your module
and setup.py will automatically use it.
Using your README for a package long description
setup has a long_description for the package, but you probably
already have some sort of README in your package. You can use it for
your long description this way:
import os

# Utility function to read the README file.
# Used for the long_description.
def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
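Then pass it in setup() (a sketch; this assumes your README sits next to setup.py):

```python
setup(
    # ... the rest of your setup() arguments ...
    long_description=read('README'),
)
```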
In
Part
I, I discussed writing a setup.py
to make a package you can submit to PyPI.
Today I'll talk about better ways of testing the package,
and how to submit it so other people can install it.
Testing in a VirtualEnv
You've verified that your package installs. But you still need to test
it and make sure it works in a clean environment, without all your
developer settings.
The best way to test is to set up a "virtual environment", where you can
install your test packages without messing up your regular runtime
environment. I shied away from virtualenvs for a long time, but
they're actually very easy to set up:
virtualenv venv
source venv/bin/activate
That creates a directory named venv under the current directory,
which it will use to install packages.
Then you can pip install packagename or
pip install /path/to/packagename-version.tar.gz
Except -- hold on! Nothing in Python packaging is that easy.
It turns out there are a lot of packages that won't install inside
a virtualenv, and one of them is PyGTK, the library I use for my
user interfaces. Attempting to install pygtk inside a venv gets:
********************************************************************
* Building PyGTK using distutils is only supported on windows. *
* To build PyGTK in a supported way, read the INSTALL file. *
********************************************************************
Windows only? Seriously? PyGTK works fine on both Linux and Mac;
it's packaged on every Linux distribution, and on Mac it's packaged
with GIMP. But for some reason, whoever maintains the PyPI PyGTK
packages hasn't bothered to make it work on anything but Windows,
and PyGTK seems to be mostly an orphaned project so that's not likely
to change.
(There's a package called ruamel.venvgtk that's supposed to work around
this, but it didn't make any difference for me.)
The solution is to let the virtualenv use your system-installed packages,
so it can find GTK and other non-PyPI packages there:
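That's virtualenv's --system-site-packages flag:

```shell
virtualenv --system-site-packages venv
source venv/bin/activate
```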
I also found that if I had a ~/.local directory (where packages
normally go if I use pip install --user packagename),
sometimes pip would install to .local instead of the venv. I never
did track down why this happened some times and not others, but when
it happened, a temporary
mv ~/.local ~/old.local fixed it.
Test your Python package in the venv until everything works.
When you're finished with your venv, you can run deactivate
and then remove it with rm -rf venv.
Tag it on GitHub
Is your project ready to publish?
If your project is hosted on GitHub, you can have pypi download it
automatically. In your setup.py, set
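Something like this, with OWNER, REPO and the version tag filled in for your project (the tarball URL pattern is GitHub's standard one):

```python
download_url='https://github.com/OWNER/REPO/tarball/VERSION-TAG',
```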
Yes, those passwords are in cleartext. Incredibly, there doesn't seem
to be a way to store an encrypted password or even have it prompt you.
There are tons of complaints about that all over the web but nobody
seems to have a solution.
You can specify a password on the command line, but that's not much better.
So use a password you don't use anywhere else and don't mind too much
if someone guesses.
Update: Apparently there's a newer method called twine that solves the
password encryption problem. Read about it here:
Uploading your project to PyPI.
You should probably use twine instead of the setup.py commands discussed
in the next paragraph.
Wait a few minutes: it takes pypitest a little while before new packages
become available.
Then go to your venv (to be safe, maybe delete the old venv and create a
new one, or at least pip uninstall) and try installing:
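A sketch of the install command, using the TestPyPI index URL of the era and a placeholder package name:

```shell
pip install -i https://testpypi.python.org/pypi PACKAGENAME
```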
Congratulations! If you've gone through all these steps, you've uploaded
a package to pypi. Pat yourself on the back and go tell everybody they
can pip install your package.
Allowed PyPI classifiers
-- the categories your project fits into
Unfortunately there aren't very many of those, so you'll probably be
stuck with 'Topic :: Utilities' and not much else.
Python
Packages and You: not a tutorial, but a lot of good advice on style
and designing good packages.
I write lots of Python scripts that I think would be useful to other
people, but I've put off learning how to submit to the Python Package Index,
PyPI, so that my packages can be installed using pip install.
Now that I've finally done it, I see why I put it off for so long.
Unlike programming in Python, packaging is a huge, poorly documented
hassle, and it took me days to get a working package. Maybe some of the
hints here will help other struggling Pythonistas.
Create a setup.py
The setup.py file is the file that describes the files in your
project and other installation information.
If you've never created a setup.py before,
Submitting a Python package with GitHub and PyPI
has a decent example, and you can find lots more good examples with a
web search for "setup.py", so I'll skip the basics and just mention
some of the parts that weren't straightforward.
Distutils vs. Setuptools
However, there's one confusing point that no one seems to mention.
setup.py examples all rely on a predefined function
called setup, but some examples start with
from distutils.core import setup
while others start with
from setuptools import setup
In other words, there are two different versions of setup!
What's the difference? I still have no idea. The setuptools
version seems to be a bit more advanced, and I found that using
distutils.core, sometimes I'd get weird errors when
trying to follow suggestions I found on the web. So I ended up using
the setuptools version.
But I didn't initially have setuptools installed (it's not part of the
standard Python distribution), so I installed it from the Debian package:
apt-get install python-setuptools python-wheel
The python-wheel package isn't strictly needed, but I
found I got assorted warnings from pip install
later in the process ("Cannot build wheel") unless I installed it, so
I recommend you install it from the start.
Including scripts
setup.py has a scripts option to include scripts that
are part of your package:
scripts=['script1', 'script2'],
But when I tried to use it, I had all sorts of problems, starting with
scripts not actually being included in the source distribution. There
isn't much support for using scripts -- it turns out
you're actually supposed to use something called
console_scripts, which is more elaborate.
First, you can't have a separate script file, or even a __main__
inside an existing class file. You must have a function, typically
called main(), so you'll typically have this:
def main():
    # do your script stuff
    pass

if __name__ == "__main__":
    main()
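The corresponding setup() stanza looks something like this (the script and module names here are made up):

```python
entry_points={
    'console_scripts': [
        'myscript=mypackage.mymodule:main',
    ],
},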
There's a secret undocumented alternative that a few people use
for scripts with graphical user interfaces: use 'gui_scripts' rather
than 'console_scripts'. It seems to work when I try it, but the fact
that it's not documented and none of the Python experts even seem to
know about it scared me off, and I stuck with 'console_scripts'.
Including data files
One of my packages, pytopo, has a couple of files it needs to install,
like an icon image. setup.py has a provision for that:
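The option is data_files, a list of (target directory, file list) pairs; something like this sketch (paths and filenames are illustrative):

```python
data_files=[('/usr/share/pixmaps',      ['resources/pytopo-icon.png']),
            ('/usr/share/applications', ['resources/pytopo.desktop'])],
```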
Great -- except it doesn't work. None of the files actually gets added
to the source distribution.
One solution people mention to a "files not getting added" problem is
to create an explicit MANIFEST file listing all files that need to be
in the distribution. Normally, setup generates the MANIFEST automatically,
but apparently it isn't smart enough to notice data_files
and include those in its generated MANIFEST.
I tried creating a MANIFEST listing all the .py files plus
the various resources -- but it didn't make any difference. My
MANIFEST was ignored.
The solution turned out to be creating a MANIFEST.in file, which is
used to generate a MANIFEST. It's easier than creating the MANIFEST
itself: you don't have to list every file, just patterns that describe
them:
include setup.py
include packagename/*.py
include resources/*
If you have any scripts that don't use the extension .py,
don't forget to include them as well. This may have been why
scripts= didn't work for me earlier, but by the time
I found out about MANIFEST.in I had already switched to using
console_scripts.
Testing setup.py
Once you have a setup.py, use it to generate a source distribution with:
python setup.py sdist
(You can also use bdist to generate a binary distribution, but you'll
probably only need that if you're compiling C as part of your package.
Source dists are apparently enough for pure Python packages.)
Your package will end up in dist/packagename-version.tar.gz
so you can use tar tf dist/packagename-version.tar.gz
to verify what files are in it. Work on your setup.py until you
don't get any errors or warnings and the list of files looks right.
Congratulations -- you've made a Python package!
I'll post a followup article in a day or two about more ways of testing,
and how to submit your working package to PyPI.
Reading
Stephen
Wolfram's latest discussion of teaching computational thinking
(which, though I mostly agree with it, is more an extended ad for
Wolfram Programming Lab than a discussion of what computational
thinking is and why we should teach it) I found myself musing over
ideas for future computer classes for
Los Alamos Makers.
Students, and especially kids, like to see something other than words
on a screen. Graphics and games good, or robotics when possible ...
but another fun project a novice programmer can appreciate is music.
I found myself curious what you could do with Python, since
I hadn't played much with Python sound generation libraries.
I did discover a while ago that
Python
is rather bad at playing audio files,
though I did eventually manage to write
a music
player script that works quite well.
What about generating tones and chords?
A web search revealed that this is another thing Python is bad at. I
found lots of people asking about chord generation, and a handful of
half-baked ideas that relied on long-obsolete packages or external
programs. But none of it actually worked, at least without requiring
Windows or relying on larger packages like fluidsynth (which looked
worth exploring some day when I have more time).
Play an arbitrary waveform with Pygame and NumPy
But I did find one example based on a long-obsolete Python package
called Numeric which, when rewritten to use NumPy, actually played a sound.
You can take a NumPy array and play it using a pygame.sndarray object
this way:
import pygame, pygame.sndarray

# (pygame's mixer must be initialized first, e.g.
#  pygame.mixer.init(frequency=44100, size=-16, channels=1) for 16-bit mono.)

def play_for(sample_wave, ms):
    """Play the given NumPy array, as a sound, for ms milliseconds."""
    sound = pygame.sndarray.make_sound(sample_wave)
    sound.play(-1)
    pygame.time.delay(ms)
    sound.stop()
Then you just need to calculate the waveform you want to play. NumPy
can generate sine waves on its own, while scipy.signal can generate
square and sawtooth waves. Like this:
import numpy
import scipy.signal

sample_rate = 44100

def sine_wave(hz, peak, n_samples=sample_rate):
    """Compute N samples of a sine wave with given frequency and peak amplitude.
       Defaults to one second.
    """
    length = sample_rate / float(hz)
    omega = numpy.pi * 2 / length
    xvalues = numpy.arange(int(length)) * omega
    onecycle = peak * numpy.sin(xvalues)
    return numpy.resize(onecycle, (n_samples,)).astype(numpy.int16)
def square_wave(hz, peak, duty_cycle=.5, n_samples=sample_rate):
    """Compute N samples of a square wave with given frequency and peak amplitude.
       Defaults to one second.
    """
    t = numpy.linspace(0, 1, 500 * 440/hz, endpoint=False)
    wave = scipy.signal.square(2 * numpy.pi * 5 * t, duty=duty_cycle)
    wave = numpy.resize(wave, (n_samples,))
    return (peak / 2 * wave.astype(numpy.int16))
# Play A (440Hz) for 1 second as a sine wave:
play_for(sine_wave(440, 4096), 1000)
# Play A-440 for 1 second as a square wave:
play_for(square_wave(440, 4096), 1000)
Playing chords
That's all very well, but it's still a single tone, not a chord.
To generate a chord of two notes, you can add the waveforms for the
two notes. For instance, 440Hz is concert A, and the A one octave above
it is double the frequency, or 880 Hz. If you wanted to play a chord
consisting of those two As, you could do it like this:
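Here's a minimal sketch of that addition, using only the standard math module so it stands alone; with the NumPy sine_wave() above, summing the two arrays works the same way, and the result can go straight to play_for().

```python
import math

sample_rate = 44100

def sine_samples(hz, peak, n_samples=sample_rate):
    """One second of a sine wave as a list of int samples."""
    return [int(peak * math.sin(2 * math.pi * hz * i / sample_rate))
            for i in range(n_samples)]

# Adding the two waveforms sample by sample gives the octave chord:
a440 = sine_samples(440, 2048)
a880 = sine_samples(880, 2048)
octave = [lo + hi for lo, hi in zip(a440, a880)]
```

Note that the two peak amplitudes have to be chosen so their sum stays within the int16 range.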
Simple octaves aren't very interesting to listen to.
What you want is chords like major and minor triads and so forth.
If you google for chord ratios Google helpfully gives
you a few of them right off, then links to a page with
a table of
ratios for some common chords.
For instance, the major triad ratios are listed as 4:5:6.
What does that mean? It means that for a C-E-G triad (the first C
chord you learn in piano), the E's frequency is 5/4 of the C's
frequency, and the G is 6/4 of the C.
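For example, taking middle C as roughly 261.63 Hz (an assumed value, just for illustration), scaling the 4:5:6 ratios gives the three triad frequencies:

```python
c = 261.63                 # approximate frequency of middle C, in Hz
ratios = [4, 5, 6]         # major triad ratios
freqs = [c * r / ratios[0] for r in ratios]
# C stays at 261.63 Hz; E comes out near 327 Hz, G near 392 Hz.
```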
You can pass that list, [4, 5, 6], to a function that will calculate
the right ratios to produce the set of waveforms you need to add
to get your chord:
def make_chord(hz, ratios):
    """Make a chord based on a list of frequency ratios."""
    sampling = 4096
    chord = sine_wave(hz, sampling)
    for r in ratios[1:]:
        chord = sum([chord, sine_wave(hz * r / ratios[0], sampling)])
    return chord
def major_triad(hz):
    return make_chord(hz, [4, 5, 6])

play_for(major_triad(440), 1000)
Even better, you can pass in the waveform you want to use
when you're adding instruments together:
def make_chord(hz, ratios, waveform=None):
    """Make a chord based on a list of frequency ratios
       using a given waveform (defaults to a sine wave).
    """
    sampling = 4096
    if not waveform:
        waveform = sine_wave
    chord = waveform(hz, sampling)
    for r in ratios[1:]:
        chord = sum([chord, waveform(hz * r / ratios[0], sampling)])
    return chord

def major_triad(hz, waveform=None):
    return make_chord(hz, [4, 5, 6], waveform)

play_for(major_triad(440, square_wave), 1000)
There are still some problems. For instance, sawtooth_wave() works
fine individually or for pairs of notes, but triads of sawtooths don't
play correctly. I'm guessing something about the sampling rate is
making their overtones cancel out part of the sawtooth wave. Triangle
waves (in scipy.signal, that's a sawtooth wave with rising ramp width
of 0.5) don't seem to work right even for single tones. I'm sure these
are solvable, perhaps by fiddling with the sampling rate. I'll
probably need to add graphics so I can look at the waveform for
debugging purposes.
In any case, it was a fun morning hack. Most chords work pretty well,
and it's nice to know how to play any waveform I can generate.
I have a little browser script in Python, called
quickbrowse,
based on Python-Webkit-GTK. I use it for things like quickly calling
up an anonymous window with full javascript and cookies, for when I
hit a page that doesn't work with Firefox and privacy blocking;
and as a quick solution for calling up HTML conversions of doc and pdf
email attachments.
Python-webkit comes with a simple browser as an example -- on Debian
it's installed in /usr/share/doc/python-webkit/examples/browser.py.
But it's very minimal, and lacks important basic features like
command-line arguments. One of those basic features I've been meaning
to add is Back and Forward buttons.
Should be easy, right? Of course webkit has a go_back() method, so
I just have to add a button and call that, right? Ha. It turned out to
be a lot more difficult than I expected, and although I found a fair
number of pages asking about it, I didn't find many working examples.
So here's how to do it.
Add a toolbar button
In the WebToolbar class (derived from gtk.Toolbar):
In __init__(), after initializing the parent class and
before creating the location text entry (assuming you want your
buttons left of the location bar), create the two buttons:
That's right, you can't just call go_back on the web view, because
GtkToolbar doesn't know anything about the window containing it.
All it can do is pass signals up the chain.
But wait -- it can't even pass signals unless you define them.
There's a __gsignals__ object defined at the beginning
of the class that needs all its signals spelled out.
In this case, what you need is
And then of course you have to define those callbacks:
def go_back_requested_cb (self, widget, content_pane):
    # Oops! What goes here?

def go_forward_requested_cb (self, widget, content_pane):
    # Oops! What goes here?
But whoops! What do we put there? It turns out that WebBrowserWindow
has no better idea than WebToolbar did of where its content is or
how to tell it to go back or forward.
What it does have is a ContentPane (derived from gtk.Notebook),
which is basically just a container with no exposed methods that
have anything to do with web browsing.
Get the BrowserView for the current tab
Fortunately we can fix that. In ContentPane, you can get the current
page (meaning the current browser tab, in this case); and each page
has a child, which turns out to be a BrowserView.
So you can add this function to ContentPane to help other classes
get the current BrowserView:
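Since the PyGTK code itself isn't reproduced here, here's a GTK-free sketch of the delegation chain just described. The class and method names are stand-ins; in the real browser.py, ContentPane derives from gtk.Notebook and would reach the current tab's view with something like self.get_nth_page(self.get_current_page()).get_child().

```python
class BrowserView:
    """Stand-in for the webkit view; knows how to go back in history."""
    def __init__(self):
        self.history = ["page1.html", "page2.html"]

    def go_back(self):
        # Drop the current page, return the previous one.
        self.history.pop()
        return self.history[-1]

class ContentPane:
    """Stand-in for the gtk.Notebook: one BrowserView per tab."""
    def __init__(self, views):
        self.views = views
        self.current_tab = 0

    def current_view(self):
        """Helper so other classes can get the current tab's BrowserView."""
        return self.views[self.current_tab]

class BrowserWindow:
    def __init__(self):
        self.content_pane = ContentPane([BrowserView()])

    def go_back_requested_cb(self, widget=None):
        # The toolbar's signal ends up here; delegate to the current view.
        return self.content_pane.current_view().go_back()

win = BrowserWindow()
```

The point is the plumbing: the toolbar only emits a signal, the window's callback catches it, and the content pane's helper finds the view that actually knows how to go back.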
There's been a discussion in the GIMP community about setting up git
repos to host contributed assets like scripts, plug-ins and brushes,
to replace the long-stagnant GIMP Plug-in Repository. One of the
suggestions involves having lots of tiny git repos rather than one
that holds all the assets.
That got me to thinking about one annoyance I always have when setting
up a new git repository on github: the repository is initially
configured with an ssh URL, so I can push to it; but that means
I can't pull from the repo without typing my ssh password (more
accurately, the password to my ssh key).
Fortunately, there's a way to fix that: a git configuration can have
one url for pulling source, and a different pushurl
for pushing changes.
These are defined in the file .git/config inside each
repository. So edit that file and take a look at the
[remote "origin"] section.
For instance, in the GIMP source repositories, hosted on git.gnome.org,
instead of the default of
url = ssh://git.gnome.org/git/gimp
I can set
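something like this (the anonymous-pull URL here is illustrative; check the host's documentation for the exact one):

```ini
[remote "origin"]
	url = git://git.gnome.org/gimp
	pushurl = ssh://git.gnome.org/git/gimp
```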
(disclaimer: I'm not sure this is still correct; my gnome git access
stopped working -- I think it was during the Heartbleed security fire drill,
or one of those -- and never got fixed.)
For GitHub the syntax is a little different. When I initially set up
a repository, the url comes out something like
url = git@github.com:username/reponame.git
(sometimes the git@ part isn't included), and the password-free
pull URL is something you can get from github's website. So you'll end
up with something like this:
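Using the placeholder username/reponame from above, the edited section would read:

```ini
[remote "origin"]
	url = https://github.com/username/reponame.git
	pushurl = git@github.com:username/reponame.git
```

Pulls then go over anonymous HTTPS, while pushes still use the ssh key.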
That's helpful, and I've made that change on all of my repos.
But I just forked another repo on github, and as I went to edit
.git/config I remembered what a pain this had been to
do en masse on all my repos; and how it would be a much bigger
pain to do it on a gazillion tiny GIMP asset repos if they end up
going with that model and I ever want to help with the development.
It's just the thing that should be scriptable.
However, the rules for what constitutes a valid git passwordless pull
URL, and what constitutes a valid ssh writable URL, seem to encompass
a lot of territory. So the quickie Python script I whipped up to
modify .git/config doesn't claim to handle everything; it only handles
the URLs I've encountered personally on Gnome and GitHub.
Still, that should be useful if I ever have to add multiple repos at once.
The script:
repo-pullpush
(yes, I know it's a terrible name) on GitHub.
A silly little GIMP ditty:
I had a Google map page showing locations of lots of metal recycling
places in Albuquerque. The Google map shows stars for each location,
but to find out the name and location of each address, you have to
mouse over each star. I wanted a printable version to carry in the
car with me.
I made a screenshot in GIMP, then added text for the stars over
the places that looked most promising. But I was doing this quickly,
and as I added text for more locations, I realized that it was getting
crowded and I wished I'd used a smaller font. How do you change the
font size for ALL font layers in an image, all at once?
Of course GIMP has no built-in method for this -- it's not something
that comes up very often, and there's no reason it would have a filter
like that. But the GIMP PDB (Procedural DataBase, part of the GIMP API)
lets you change font size and face, so it's an easy script to write.
In the past I would have written something like this in script-fu,
but now that Python is available on all GIMP platforms, there's no
reason not to use it for everything.
Changing font face is just as easy as changing size, so I added that
as well.
But it might be a bit more reliable to use config.status --
I'm guessing this is the file that make
uses when it finds it needs to re-run autogen.sh.
However, the syntax in that file is more complicated,
and parsing it taught me some useful zsh tricks.
I can see the relevant line from config.status like this:
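For the arguments used in this example, that line is:

```
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
```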
--enable-foo --disable-bar are options I added
purely for testing. I wanted to make sure my shell function would
work with multiple arguments.
Ultimately, I want my shell function to call
autogen.sh --prefix=/usr/local/gimp-git --enable-foo --disable-bar
The goal is to end up with $args being a zsh array containing those
three arguments. So I'll need to edit out those quotes and split the
line into an array.
Sed tricks
The first thing to do is to get rid of that initial ac_cs_config=
in the line from config.status. That's easy with sed:
But since we're using sed anyway, there's no need to use grep to
get the line: we can do it all with sed.
First try:
sed -n '/^ac_cs_config/s/ac_cs_config=//p' config.status
Search for the line that starts with ac_cs_config (^ matches
the beginning of a line);
then replace ac_cs_config= with nothing, and p
print the resulting line.
-n tells sed not to print anything except when told to with a p.
But it turns out that if you give a sed substitution a blank pattern,
it uses the last pattern it was given. So a more compact version,
using the search pattern ^ac_cs_config, is:
sed -n '/^ac_cs_config=/s///p' config.status
But there's also another way of doing it:
sed '/^ac_cs_config=/!d;s///' config.status
! after a search pattern matches every line that doesn't match
the pattern. d deletes those lines. Then for lines that weren't
deleted (the one line that does match), do the substitution.
Since there's no -n, sed will print all lines that weren't deleted.
I find that version more difficult to read. But I'm including it
because it's useful to know how to chain several commands in sed,
and how to use ! to search for lines that don't match a pattern.
You can also use sed to eliminate the double quotes:
sed '/^ac_cs_config=/!d;s///;s/"//g' config.status
'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
But it turns out that zsh has a better way of doing that.
Zsh parameter substitution
I'm still relatively new to zsh, but I got some great advice on #zsh.
The first suggestion:
I'll be using a final print -rl - $args for all these examples:
it prints an array variable with one member per line.
For the actual distclean function, of course, I'll be passing
the variable to autogen.sh, not printing it out.
First, let's look at the heart of that expression: the
args=( ${(Q)${(z)${(Q)REPLY}}} ).
The heart of this is the expression ${(Q)${(z)${(Q)x}}}
The zsh parameter substitution syntax is a bit arcane, but each of
the parenthesized letters does some operation on the variable that follows.
(z) splits an expression and stores it in an array.
But to see that, we have to use print -l, so array members
will be printed on separate lines.
$ x="a b c"; print -l $x; print "....."; print -l ${(z)x}
a b c
.....
a
b
c
Zsh is smart about quotes, so if you have quoted expressions it will
group them correctly when assigning array members:
$ x="'a a' 'b b' 'c c'"; print -l $x; print "....."; print -l ${(z)x}
'a a' 'b b' 'c c'
.....
'a a'
'b b'
'c c'
So let's break down the larger expression: this is best read
from right to left, inner expressions to outer.
${(Q) ${(z) ${(Q) x }}}
| | | \
| | | The original expression,
| | | "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
| | \
| | Strip off the double quotes:
| | '--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
| \
| Split into an array of three items
\
Strip the single quotes from each array member,
( --prefix=/usr/local/gimp-git --enable-foo --disable-bar )
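Incidentally, the whole pipeline (keep the one line, drop the prefix and outer double quotes, then do a quote-aware split) has a close analogue in Python's standard shlex module, which may help clarify what (z) and the inner (Q) are doing:

```python
import re
import shlex

# The line as it appears in config.status:
line = 'ac_cs_config="\'--prefix=/usr/local/gimp-git\' \'--enable-foo\' \'--disable-bar\'"'

# Like sed '/^ac_cs_config=/!d;s///': keep only the value part.
m = re.match(r'ac_cs_config=(.*)', line)
value = m.group(1).strip('"')      # outer (Q): strip the double quotes

# (z) plus the inner (Q): split on whitespace, respecting and removing quotes.
args = shlex.split(value)
# args == ['--prefix=/usr/local/gimp-git', '--enable-foo', '--disable-bar']
```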
Passing the sed results to the parameter substitution
There's still a little left to wonder about in our expression,
sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args
The IFS= read -r seems to be a common idiom in zsh scripting.
It takes standard input and assigns it to the variable $REPLY. IFS is
the input field separator: you can split variables into words by
spaces, newlines, semicolons or any other character you
want. IFS= sets it to nothing. But because the input expression --
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'" --
has quotes around it, IFS is ignored anyway.
So you can do the same thing with this simpler expression, to
assign the quoted expression to the variable $x.
I'll declare it a local variable: that makes no difference
when testing it in the shell, but if I call it in a function, I won't
have variables like $x and $args cluttering up my shell afterward.
local x=$(sed -n '/^ac_cs_config=/s///p' config.status); local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args
That works in the version of zsh I'm running here, 5.1.1. But I've
been warned that it's safer to quote the result of $(): without
quotes, if you ever run the function in an older zsh, $x might end up
being set only to the first word of the expression. It's also a
good idea to put "local" in front of the variable; that way, $x won't
still be set once you've returned from the function. So now we have:
local x="$(sed -n '/^ac_cs_config=/s///p' config.status)"; local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args
You don't even need to use a local variable. For added brevity (making
the function even more difficult to read! -- but we're way past the
point of easy readability), you could say:
If you keep up with source trees for open source projects, it often
happens that you pull the latest source, type make,
and get an error like this (edited for brevity):
$ make
cd . && /bin/sh ./missing --run aclocal-1.14
missing: line 52: aclocal-1.14: command not found
WARNING: 'aclocal-1.14' is missing on your system. You should only need it if you modified 'acinclude.m4' or 'configure.ac'. You might want to install the 'Automake' and 'Perl' packages. Grab them from any GNU archive site.
What's happening is that make is set up to run ./autogen.sh
(similar to running ./configure except it does some other stuff tailored
to people who build from the most current source tree) automatically
if anything has changed in the tree. But if the version of aclocal has
changed since the last time you ran autogen.sh or configure, then
running configure with the same arguments won't work.
Often, running a make distclean, to clean out all local
configuration in your tree and start from scratch, will fix the problem.
A simpler make clean might even be enough. But when you
try it, you get the same aclocal error.
Whoops! make clean runs make, which
triggers the rule that configure has to run before make, which fails.
It would be nice if the make rules were smart enough to
notice this and not require configure or autogen if the make target
is something simple like clean or distclean.
Alas, in most projects, they aren't.
But it turns out that even if you can't run autogen.sh with your
usual arguments -- e.g. ./autogen.sh --prefix=/usr/local/gimp-git
-- running ./autogen.sh by itself with no extra arguments
will often fix the problem.
This happens to me often enough with the GIMP source tree that I made
a shell alias for it:
alias distclean="./autogen.sh && ./configure && make clean"
Saving your configure arguments
Of course, this wipes out any arguments you've previously passed to
autogen and configure. So assuming this succeeds, your very next
action should be to run autogen again with the arguments you actually
want to use, e.g.:
./autogen.sh --prefix=/usr/local/gimp-git
Before you ran the distclean, you could get those arguments by looking
at the first few lines of config.log. But after you've run distclean,
config.log is gone -- what if you forgot to save the arguments first?
Or what if you just forget that you need to re-run autogen.sh again
after your distclean?
To guard against that, I wrote a somewhat more complicated shell function
to use instead of the simple alias I listed above.
The first trick is to get the arguments you previously passed to
configure. You can parse them out of config.log:
(There's a better place for getting those arguments,
config.status -- but parsing them from there is a bit more
complicated, so I'll follow up with a separate article on that,
chock-full of zsh goodness.)
So here's the distclean shell function, written for zsh:
The setopt localoptions errreturn at the beginning is a
zsh-ism that tells the shell to exit if there's an error.
You don't want to forge ahead and run configure and make clean
if your autogen.sh didn't work right.
errreturn does much the same thing as the
&& between the commands in the simpler shell alias above,
but with cleaner syntax.
If you're using bash, you could string all the commands on one line instead,
with && between them, something like this:
./autogen.sh && ./configure && make clean && ./autogen.sh $args
Or perhaps some bash user will tell me of a better way.
Update, December 2022:
viewmailattachments has been integrated with another mutt helper, viewhtmlmail.py, which can show HTML messages complete with embedded images. It's described
in the article View Mail Attachments from Mutt
and the script is at
viewmailattachments.py. It no longer uses the "please wait" screen described in this article, but the rest of the discussion still applies.
I seem to have fallen into a nest of Mac users whose idea of
email is a text part, an HTML part, plus two or three or seven attachments
(no exaggeration!) in an unholy combination of .DOC, .DOCX, .PPT and other
Microsoft Office formats, plus .PDF.
Converting to text in mutt
As a mutt user who generally reads all email as plaintext,
normally my reaction to a mess like that would be "Thanks, but
no thanks". But this is an organization that does a lot of good work
despite their file format habits, and I want to help.
In mutt, HTML mail attachments are easy.
This pair of entries in ~/.muttrc takes care of them:
auto_view text/html
alternative_order text/plain text
If a message has a text/plain part, mutt shows that. If it has text/html
but no text/plain, it looks for the "copiousoutput" mailcap entry,
runs the HTML part through lynx (or I could use links or w3m) and
displays that automatically.
If, reading the message in lynx, it looks to me like the message has
complex formatting that really needs a browser, I can go to
mutt's attachments screen and display the attachment in firefox
using the other mailcap entry.
Word attachments are not quite so easy, especially when there are a
lot of them. The straightforward way is to save each one to a file,
then run LibreOffice on each file, but that's slow and tedious
and leaves a lot of temporary files behind.
For simple documents, converting to plaintext is usually good
enough to get the gist of the attachments.
These .mailcap entries can do that:
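The entries I use look something like this (catdoc and docx2txt must be installed, and the exact MIME types your correspondents' mailers attach may vary):

```
application/msword; catdoc %s; copiousoutput
application/vnd.openxmlformats-officedocument.wordprocessingml.document; docx2txt %s -; copiousoutput
```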
Alternatives to catdoc include wvText and antiword.
But none of them work so well when you're cross-referencing five
different attachments, or for documents where color and formatting
make a difference, like mail from someone who doesn't know how to get
their mailer to include quoted text, and instead distinguishes their
comments from the text they're replying to by making their new
comments green (ugh!).
For those, you really do need a graphical window.
I decided what I really wanted (aside from people not sending me these
crazy emails in the first place!) was to view all the attachments as
tabs in a new window. And the obvious way to do that is to convert
them to formats Firefox can read.
Converting to HTML
I'd used wvHtml to convert .doc files to HTML, and it does a decent
job and is fairly fast, but it can't handle .docx. (People who send
Office formats seem to distribute their files fairly evenly between
DOC and DOCX. You'd think they'd use the same format for everything
they wrote, but apparently not.) It turns out LibreOffice has a
command-line conversion program, unoconv, that can handle any format
LibreOffice can handle. It's a lot slower than wvHtml but it does a
pretty good job, and it can handle .ppt (PowerPoint) files too.
For PDF files, I tried using pdftohtml, but it doesn't always do so well,
and it's hard to get it to produce a single HTML file rather than a
directory of separate page files. And about three quarters of PDF files
sent through email turn out to be PDF in name only: they're actually
collections of images of single pages, wrapped together as a PDF file.
(Mostly, when I see a PDF like that I just skip it and try to get the
information elsewhere. But I wanted my program at least to be able to
show what's in the document, and let the user choose whether to skip it.)
In the end, I decided to open a firefox tab and let Firefox's built-in
PDF reader show the file, though popping up separate mupdf windows is
also an option.
I wanted to show the HTML part of the email, too. Sometimes there's
formatting there (like the aforementioned people whose idea of quoting
messages is to type their replies in a different color), but there can
also be embedded images. Extracting the images and showing them in a
browser window is a bit tricky, but it's a problem I'd already solved
a couple of years ago:
Viewing HTML mail messages from Mutt (or other command-line mailers).
Showing it all in a new Firefox window
So that accounted for all the formats I needed to handle.
The final trick was the firefox window. Since some of these conversions,
especially unoconv, are quite slow, I wanted to pop up a window right
away with a "converting, please wait..." message.
Initially, I used a javascript: URL, running the command:
But I wanted the first attachment to replace the contents of that same
window as soon as it was ready, and then subsequent attachments open a
new tab in that window.
But it turned out that firefox is inconsistent about what -new-window
and -new-tab do; there's no guarantee that -new-tab will show up in
the same window you recently popped up with -new-window, and
running just firefox URL might open in either the new
window or the old, in a new tab or not, or might not open at all.
And things got even more complicated after I decided that I should use
-private-window to open these attachments in private browsing mode.
In the end, the only way firefox would behave in a repeatable,
predictable way was to use -private-window for everything.
The first call pops up the private window, and each new call opens
a new tab in the private window. If you want two separate windows
for two different mail messages, you're out of luck: you can't have
two different private windows. I decided I could live with that;
if it eventually starts to bother me, I can always give up on Firefox
and write a little python-webkit wrapper to do what I need.
Using a file redirect instead
But that still left me with no way to replace the contents of the
"Please wait..." window with useful content. Someone on #firefox
came up with a clever idea: write the content to a page with a
meta redirect.
So initially, I create a file pleasewait.html that includes the header:
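Based on the behavior described below, the key line is a meta refresh with a two-second interval:

```html
<meta http-equiv="refresh" content="2">
```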
(other HTML, charset information, etc. as needed).
The meta refresh means Firefox will reload the file every two seconds.
When the first converted file is ready, I just change the header
to redirect to URL=first_converted_file.html.
Meanwhile, I can be opening the other documents in additional tabs.
Finally, I added the command to my .muttrc. When I'm viewing a message
either in the index or pager screens, F10 will call the script and
decode all the attachments.
macro index <F10> "<pipe-message>~/bin/viewmailattachments\n" "View all attachments in browser"
macro pager <F10> "<pipe-message>~/bin/viewmailattachments\n" "View all attachments in browser"
Whew! It was trickier than I thought it would be.
But I find I'm using it quite a bit, and it takes a lot of the pain
out of those attachment-full emails.
Every now and then I need to create a series of contrasting colors.
For instance, in my mapping app
PyTopo,
when displaying several track logs at once, I want them to be different
colors so it's easy to tell which track is which.
Of course, I could make a list of five or ten different colors and
cycle through the list. But I hate doing work that a computer could
do for me.
Choosing random RGB (red, green and blue) values for the colors,
though, doesn't work so well. Sometimes you end up getting two similar
colors together. Other times, you get colors that just don't work
well, because they're so light they look white, or so dark they look
black, or so unsaturated they look like shades of grey.
What does work well is converting to the HSV color space:
hue, saturation and value.
Hue is a measure of the color -- that it's red, or blue, or
yellow green, or orangeish, or a reddish purple.
Saturation measures how intense the color is: is it a bright, vivid
red or a washed-out red? Value tells you how light or dark it is: is
it so pale it's almost white, so dark it's almost black, or somewhere
in between? (A related model, HSL, substitutes Lightness for Value,
but the concept is similar.)
If you're not familiar with HSV, you can get a good feel for it by
playing with GIMP's color chooser (which pops up when you click the
black Foreground or white Background color swatch in GIMP's toolbox).
The vertical rainbow bar selects Hue. Once you have a hue, dragging
up or down in the square changes Saturation;
dragging right or left changes Value.
You can also change one at a time by dragging the H, S or V sliders at
the upper right of the dialog.
Why does this matter? Because once you've chosen a saturation and value,
or at least ensured that saturation is fairly high and value is
somewhere in the middle of its range, you can cycle through hues
and be assured that you'll get colors that are fairly different each time.
If you had a red last time, this time it'll be a green, or yellow, or
blue, depending on how much you change the hue.
How does this work programmatically?
PyTopo uses Python-GTK, so I need a function that takes a gtk.gdk.Color
and chooses a new, contrasting Color. Fortunately, gtk.gdk.Color already
has hue, saturation and value built in.
Color.hue is a floating-point number between 0 and 1,
so I just have to choose how much to jump. Like this:
def contrasting_color(color):
    '''Returns a gtk.gdk.Color of similar saturation and value
       to the color passed in, but a contrasting hue.
       gtk.gdk.Color objects have a hue between 0 and 1.
    '''
    if not color:
        return self.first_track_color

    # How much to jump in hue:
    jump = .37

    return gtk.gdk.color_from_hsv(color.hue + jump,
                                  color.saturation,
                                  color.value)
What if you're not using Python-GTK?
No problem. The first time I used this technique, I was generating
Javascript code for a company's analytics web page.
Python's
colorsys module works fine for converting red, green, blue triples
to HSV (or a variety of other colorspaces) which you can then use in
whatever graphics package you prefer.
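As a GTK-free sketch of the same hue-jumping idea, here's roughly what it looks like with colorsys (r, g, b and the returned tuple are floats between 0 and 1; the function name is mine, not part of any library):

```python
import colorsys

def contrasting_rgb(r, g, b, jump=0.37):
    """Return a color of similar saturation and value but contrasting hue."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue in colorsys is a 0-1 float, so wrap around with modulo:
    return colorsys.hsv_to_rgb((h + jump) % 1.0, s, v)

# Cycle through a few contrasting colors, starting from pure red:
color = (1.0, 0.0, 0.0)
palette = []
for i in range(5):
    color = contrasting_rgb(*color)
    palette.append(color)
```

Each color in the palette keeps full saturation and value; only the hue moves, so successive entries stay visually distinct.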
Three years ago I wanted a way to manage tags on e-books in a
lightweight way,
without having to maintain a Calibre database and fire up the
Calibre GUI app every time I wanted to check a book's tags.
I couldn't find anything, nor did I find any relevant Python
libraries, so I reverse engineered the (simple, XML-based)
EPUB format and wrote a
Python
script to show or modify epub tags.
I've been using that script ever since. It's great for Project
Gutenberg books, which tend to be overloaded with tags that I don't
find very useful for categorizing books
("United States -- Social life and customs -- 20th century -- Fiction")
but lacking in tags that I would find useful ("History", "Science Fiction",
"Mystery").
But it wasn't easy to include it in other programs. For the last week
or so I've been fiddling with a Kobo ebook reader, and I wanted to
write programs that could read epub and also speak Kobo-ese. (I'll
write separately about the joys of Kobo hacking. It's really a neat
little e-reader.)
So I've factored my epubtag script into a proper Python module:
as well as being a standalone program for viewing epub book data,
it's now easy to use from other programs. It's available on GitHub:
epubtag.py:
parse EPUB metadata and view or change subject tags.
I wrote last week about
developing
apps with PhoneGap/Cordova. But there's one thing I didn't cover.
When you type cordova build, you're building only a
debug version of your app. If you want to release it, you have to sign it.
Figuring out how turned out to be a little tricky.
Most pages on the web say you can sign your apps by creating
platforms/android/ant.properties with the same keystore
information in it that you'd put in an
ant
build, then running
cordova build android --release
But Cordova completely ignored my ant.properties file and
went on creating a debug .apk file and no signed one.
I found various other purported solutions on the web, like creating
a build.json file in the app's top-level directory ... but that
just made Cordova die with a syntax error inside one of its own files.
This is the only method
that worked for me:
Create a file called
platforms/android/release-signing.properties, and put this in it:
storeFile=/path/to/your-keystore.keystore
storeType=jks
keyAlias=some-key
# if you don't want to enter the password at every build, use these:
keyPassword=your-key-password
storePassword=your-store-password
Then
cordova build android --release
finally works, and creates a file called
platforms/android/build/outputs/apk/android-release.apk
Although Ant
builds
have made Android development much easier, I've long been curious
about the cross-platform phone development apps: you write a simple
app in some common language, like HTML or Python, then run something
that can turn it into apps on multiple mobile platforms, like
Android, iOS, Blackberry, Windows Phone, Ubuntu, FirefoxOS or Tizen.
Last week I tried two of the many cross-platform mobile frameworks:
Kivy and PhoneGap.
Kivy lets you develop in Python, which sounded like a big plus. I went
to a Kivy talk at PyCon a year ago and it looked pretty interesting.
PhoneGap takes web apps written in HTML, CSS and Javascript and
packages them like native applications. PhoneGap seems much more
popular, but I wanted to see how it and Kivy compared.
Both projects are free, open source software.
I tried PhoneGap first.
It's based on Node.js, so the first step was installing that.
Debian has packages for nodejs, so
apt-get install nodejs npm nodejs-legacy did the trick.
You need nodejs-legacy to get the "node" command, which you'll
need for installing PhoneGap.
Now comes a confusing part. You'll be using npm to install ...
something. But depending on which tutorial you're following, it may
tell you to install and use either phonegap or cordova.
Cordova is an Apache project which is intertwined with PhoneGap. After
reading all their FAQs on the subject, I'm as confused as ever about
where PhoneGap ends and Cordova begins, which one is newer, which one
is more open-source, whether I should say I'm developing in PhoneGap
or Cordova, or even whether I should be asking questions on the
#phonegap or #cordova channels on Freenode. (The one question I had,
which came up later in the process, I asked on #phonegap and got a
helpful answer very quickly.) Neither one is packaged in Debian.
After some searching for a good, comprehensive tutorial, I ended up
following a Cordova tutorial rather than a PhoneGap one. So I typed:
sudo npm install -g cordova
Once it's installed, you can create a new app, add the android platform
(assuming you already have android development tools installed) and
build your new app:
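The commands for those three steps aren't shown here; with the Cordova CLI of that era they looked roughly like this (the directory name, package ID and app name are placeholders you choose yourself):

```shell
# Create a new Cordova project: directory, package id, display name
cordova create hello com.example.hello HelloWorld
cd hello
# Add the Android platform (requires the Android SDK tools on your PATH)
cordova platform add android
# Build a debug .apk
cordova build
```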
Apparently Cordova/Phonegap can only build with its own
preferred version of android, which currently is 22.
Editing files to specify android-19 didn't work for me;
it just gave errors at a different point.
So I fired up the Android SDK manager, selected android-22 for install,
accepted the license ... and waited ... and waited. In the end it took
over two hours to download the android-22 SDK; the system image is 13Gb!
So that's a bit of a strike against PhoneGap.
While I was waiting for android-22 to download, I took a look at Kivy.
Kivy
As a Python enthusiast, I wanted to like Kivy best.
Plus, it's in the Debian repositories: I installed it with
sudo apt-get install python-kivy python-kivy-examples
They have a nice
quickstart
tutorial for writing a Hello World app on their site. You write
it, run it locally in python to bring up a window and see what the
app will look like. But then the tutorial immediately jumps into more
advanced programming without telling you how to build and deploy
your Hello World. For Android, that information is in the
Android
Packaging Guide. They recommend an app called Buildozer (cute name),
which you have to pull from git, build and install.
buildozer init
buildozer android debug deploy run
got started on building ... but then I noticed that it was attempting
to download and build its own version of apache ant
(sort of a Java version of make). I already have ant --
I've been using it for weeks for building my own Java android apps.
Why did it want a different version?
The file buildozer.spec in your project's
directory lets you uncomment and customize variables like:
# (int) Android SDK version to use
android.sdk = 21
# (str) Android NDK directory (if empty, it will be automatically downloaded.)
# android.ndk_path =
# (str) Android SDK directory (if empty, it will be automatically downloaded.)
# android.sdk_path =
Unlike a lot of Android build packages, buildozer will not inherit
variables like ANDROID_SDK, ANDROID_NDK and ANDROID_HOME from your
environment; you must edit buildozer.spec.
But that doesn't help with ant.
Fortunately, when I inspected the Python code for buildozer itself, I
discovered there was another variable that isn't mentioned in the
default spec file. Just add this line:
android.ant_path = /usr/bin
Next, buildozer gave me a slew of compilation errors:
kivy/graphics/opengl.c: No such file or directory
... many many more lines of compilation interspersed with errors
kivy/graphics/vbo.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
I had to ask on #kivy to solve that one. It turns out that the current
version of cython, 0.22, doesn't work with kivy stable. My choices were
to uninstall kivy and pull the development version from git, or to uninstall
cython and install version 0.21.2 via pip. I opted for the latter.
Either way, there's no "make clean", so I removed the dist and build
directories to start over with the new cython.
Buildozer was now happy, and proceeded to download and build Python-2.7.2,
pygame and a large collection of other Python libraries for the ARM platform.
Apparently each app packages the Python language and all libraries it needs
into the Android .apk file.
Eventually I ran into trouble because I'd named my python file hello.py
instead of main.py; apparently this is something you're not allowed to change
and they don't mention it in the docs, but that was easily solved.
Then I ran into trouble again:
Exception: Unable to find capture version in ./main.py (looking for `__version__ = ['"](.*)['"]`)
The buildozer.spec file offers two types of versioning: by default "method 1"
is enabled, but I never figured out how to get past that error with
"method 1" so I commented it out and uncommented "method 2".
With that, I was finally able to build an Android package.
The .apk file it created was quite large because of all the embedded
Python libraries: for the little 77-line pong demo,
/usr/share/kivy-examples/tutorials/pong
in the Debian kivy-examples package, the apk came out 7.3Mb.
For comparison, my FeedViewer native java app, roughly 2000 lines of
Java plus a few XML files, produces a 44k apk.
The next step was to make a real mini app.
But when I looked through the Kivy examples, they all seemed highly
specialized, and I couldn't find any documentation that addressed issues
like what widgets were available or how to lay them out. How do I add a
basic text widget? How do I put a button next to it? How do I get
the app to launch in portrait rather than landscape mode?
Is there any way to speed up the very slow initialization?
I'd spent a few hours on Kivy and made a Hello World app, but I
was having trouble figuring out how to do anything more. I needed a
change of scenery.
PhoneGap, redux
By this time, android-22 had finally finished downloading.
I was ready to try PhoneGap again.
This time,
cordova platforms add android
cordova build
worked fine. It took a long time, because it downloaded the huge gradle
build system rather than using something simpler like ant. I already have
a copy of gradle somewhere (I downloaded it for the OsmAnd build), but
it's not in my path, and I was too beaten down by this point to figure
out where it was and how to get cordova to point to it.
Cordova eventually produced a 1.8Mb "hello world" apk --
a quarter the size of the Kivy package,
though 20 times as big as a native Java app.
Deployed on Android, it initialized much faster than the Kivy app, and
came up in portrait mode but rotated correctly if I rotated the phone.
Editing the HTML, CSS and Javascript was fairly simple. You'll want
to replace pretty much all of the default CSS if you don't want your app
monopolized by the Cordova icon.
The only tricky part was file access: opening a file:// URL
didn't work. I asked on #phonegap and someone helpfully told me I'd
need the file plugin. That was easy to find in the documentation, and
I added it like this:
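The plugin-add command looks something like the following; the exact plugin ID depends on your Cordova version, since the plugins later moved from the Cordova registry to npm:

```shell
# Older Cordova (plugins fetched from the Cordova plugin registry):
cordova plugin add org.apache.cordova.file
# Newer Cordova (plugins fetched from npm):
cordova plugin add cordova-plugin-file
```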
My final apk, for a small web app I use regularly on Android,
was almost the same size as their hello world example: 1.8Mb.
And it works great: phonegap had no problem playing an audio clip,
something that was tricky when I was trying to do the same thing
from a native Android java WebView class.
Summary: How do Kivy and PhoneGap compare?
This has been a long article, I know. So how do Kivy and PhoneGap compare,
and which one will I be using?
They both need a large amount of disk space for the development environment.
I wish I had good numbers to give you, but I was working with
both systems at the same time, and their packages are scattered all
over the disk so I haven't found a good way of measuring their size. I
suspect PhoneGap is quite a bit bigger, because it uses gradle rather
than ant and because it insists on android-22.
On the other hand, PhoneGap wins big on packaged application size:
its .apk files are a quarter the size of Kivy's.
PhoneGap definitely wins on documentation. Kivy has seemingly lots of
documentation, but its tutorials jumped around rather than following
a logical sequence, and I had trouble finding answers to basic
questions like "How do I display a text field with a button?"
PhoneGap doesn't need that, because the UI is basic HTML and CSS --
limited though they are, at least most people know how to use them.
Finally, PhoneGap wins on startup speed. For my very simple test app,
startup was more or less immediate, while the Kivy Hello World app
required several seconds of startup time on my Galaxy S4.
Kivy is an interesting project. I like the ant-based build, the
straightforward .spec file, and of course the Python language.
But it still has some catching up to do in performance and documentation.
For throwing together a simple app and packaging it for Android, I
have to give the win to PhoneGap.
I recently needed to update an old Android app that I hadn't touched
in years. My Eclipse setup is way out of date, and I've been hearing
about more and more projects switching to using command-line builds.
I wanted to ditch my fiddly, difficult to install Eclipse setup
and switch to something easier to use.
Some of the big open-source packages, like OsmAnd, have switched to
gradle for their Java builds. So I tried to install gradle -- and
on Debian, apt-get install gradle wanted to pull
in a total of 153 packages! Maybe gradle wasn't the best option to pursue.
But there's another option for command-line android builds: ant.
When I tried apt-get install ant, since I
already have Java installed (I think the relevant package
is openjdk-7-jdk), it installed without needing a single
additional package.
For a small program, that's clearly a better way to go!
Then I needed to create a build directory and move my project into it.
That turned out to be fairly easy, too -- certainly compared to the
hours I spent setting up an Eclipse environment.
Here's how to set up your ant Android build:
First install the Android "Stand-alone SDK Tools" from
Installing
the Android SDK. This requires a fair amount of clicking around,
accepting licenses, and waiting for a long download.
Now install an SDK or two. Use
android sdk
to install new SDK versions, and
android list targets
to see what versions you have installed.
Create a new directory for your project, cd into it, and then:
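With the stand-alone SDK tools, project creation looks something like this (the name, activity, package and target are examples -- run android list targets first to see what's valid on your system):

```shell
android create project \
    --target android-19 \
    --name YourProject \
    --path . \
    --activity YourActivity \
    --package tld.yourdomain.yourproject
```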
Adjust the Android target for the version you want to use.
When this is done, type ant with no arguments to
make sure the directory structure was created properly.
If it doesn't print errors, that's a good sign.
Check that local.properties has sdk.dir set correctly.
It should have picked that up from your environment.
There will be a stub source file in src/tld/yourdomain/YourProject.java.
Edit it as needed, or, if you're transferring a project from another
build system such as eclipse, copy the existing .java files
to that directory.
If you have custom icons for your project, or other resources like
layout or menu files, put them in the appropriate directories under res.
The directory structure is the same as in eclipse, but unlike an eclipse
build, you can edit the files at any time without the build mysteriously
breaking.
Signing your app
Now you'll need a key to sign your app. Eclipse generates a debugging
key automatically, but ant doesn't. It's better to use a real key
anyway, since debugging keys expire and need to be regenerated periodically.
If you don't already have a key, generate one with:
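A typical keytool invocation, using the my-key.keystore filename and mykey alias mentioned below, is:

```shell
keytool -genkey -v -keystore my-key.keystore -alias mykey \
        -keyalg RSA -keysize 2048 -validity 10000
```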
It will ask you for a password; be sure to use one you won't forget
(or record it somewhere).
You can use any filename you want instead of my-key.keystore, and any
alias you want instead of mykey.
Now create a file called ant.properties containing these two lines:
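The two lines are key.store and key.alias, telling ant where your keystore is and which key to use -- with the example names from above:

```
key.store=/path/to/my-key.keystore
key.alias=mykey
```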
Some tutorials tell you to put this in build.properties, but
that's outdated and no longer works.
If you forget your key alias, you can find out with this command and
the password:
keytool -list -keystore /path/to/my-key.keystore
Optionally, you can also include your key's password:
key.store.password=xxxx
key.alias.password=xxxx
If you don't, you'll be prompted twice for the password (which echoes
on the terminal, so be aware of that if anyone is bored enough to watch
over your shoulder as you build packages. I guess build-signing keys
aren't considered particularly high security). Of course, you should
make sure not to include both the private keystore file and the password
in any public code repository.
Building
Finally, you're ready to build!
ant release
If you get an error like:
AndroidManifest.xml:6: error: Error: No resource found that matches the given name (at 'icon' with value '@drawable/ic_launcher').
it's because older eclipse builds wanted icons named icon.png,
while ant wants them named ic_launcher.png. You can fix this
either by renaming your icons to res/drawable-hdpi/ic_launcher.png
(and the same for res/drawable-lpdi and -mdpi), or
by removing everything under bin (rm -rf bin/*)
and then editing AndroidManifest.xml. If you don't clear bin
before rebuilding, bin/AndroidManifest.xml will take
precedence over the AndroidManifest.xml in the root, so you
might have to edit both files.
After ant release, your binary will be in
bin/YourProject-release.apk.
If you have an adb connection, you can (re)install it with:
adb install -r bin/YourProject-release.apk
Done! So much easier than eclipse, and you can use any editor you want,
and check your files into any version control system.
That just leaves the coding part.
If only Java development were as easy as Python or C ...
I recently took over a website that's been neglected for quite a
while. As well as some bad links, I noticed a lot of old files, files
that didn't seem to be referenced by any of the site's pages.
Orphaned files.
So I went searching for a link checker that
also finds orphans. I figured that would be easy. It's
something every web site maintainer needs, right? I've gotten by
without one for my own website, but I know there are some bad links
and orphans there and I've often wanted a way to find them.
An intensive search turned up only one possibility: linklint, which
has a -orphan flag. Great! But, well, not really: after a few hours of
fiddling with options, I couldn't find any way to make it actually
find orphans. Either you run it on an http:// URL,
and it says it's searching for orphans but doesn't find any (because it
ignores any local directory you specify); or you can run it just on a
local directory, in which case it finds a gazillion orphans
that aren't actually orphans, because they're referenced by files
generated with PHP or other web technology. Plus it flags all the
bad links in all those supposed orphans, which get in the way of
finding the real bad links you need to worry about.
I tried asking on a couple of technical mailing lists and IRC channels.
I found a few people who had managed to use linklint, but only by spidering
an entire website to local files (thus getting rid of any server side
dependencies like PHP, CGI or SSI) and then running linklint on the
local directory. I'm sure I could do that one time, for one website.
But if it's that much hassle, there's not much chance I'll keep
using it to keep websites maintained.
What I needed was a program that could look at a website and local
directory at the same time, and compare them, flagging any file
that isn't referenced by anything on the website. That sounded like it
would be such a simple thing to write.
So, of course, I had to try it. This is a tool that needs to exist --
and if for some bizarre reason it doesn't exist already, I was
going to remedy that.
Naturally, I found out that it wasn't quite as easy to write as it
sounded. Reconciling a URL like "http://mysite.com/foo/bar.html"
or "../asdf.html" with the corresponding path on disk turned
out to have a lot of twists and turns.
But in the end I prevailed. I ended up with a script called
weborphans
(on github). Give it both a local directory for the files making
up your website, and the URL of that website, for instance:
$ weborphans /var/www/ http://localhost/
It's still a little raw, certainly not perfect. But it's good
enough that I was able to find the 10 bad links and 606 orphaned
files on this website I inherited.
The local bird community has gotten me using
eBird.
It's sort of social networking for birders -- you can report sightings,
keep track of what birds you've seen where, and see what other people
are seeing in your area.
The only problem is the user interface for that last part. The data is
all there, but asking a question like "Where in this county have people
seen broad-tailed hummingbirds so far this spring?" is a lengthy
process, involving clicking through many screens and typing the
county name (not even a zip code -- you have to type the name).
If you want some region smaller than the county, good luck.
I found myself wanting that so often that I wrote an entry page for it.
My Bird Maps page
is meant to be used as a smart bookmark (also known as bookmarklets
or keyword bookmarks),
so you can type birdmap hummingbird or birdmap golden eagle
in your location bar as a quick way of searching for a species.
It reads the bird you've typed in, and looks through a list of
species, and if there's only one bird that matches, it takes you
straight to the eBird map to show you where people have reported
the bird so far this year.
If there's more than one match -- for instance, for birdmap hummingbird
or birdmap sparrow -- it will show you a list of possible matches,
and you can click on one to go to the map.
Like every Javascript project, it was both fun and annoying to write.
Though the hardest part wasn't programming; it was getting a list of
the nonstandard 4-letter bird codes eBird uses. I had to scrape one
of their HTML pages for that.
But it was worth it: I'm finding the page quite useful.
Firefox has made it increasingly difficult with every release to make
smart bookmarks. There are a few extensions, such as "Add Bookmark Here",
which make it a little easier. But without any extensions installed,
here's how you do it in Firefox 36:
First, go to the birdmap page
(or whatever page you want to smart-bookmark) and click on the * button
that makes a bookmark. Then click on the = next to the *, and in the
menu, choose Show all bookmarks.
In the dialog that comes up, find the bookmark you just made (maybe in
Unsorted bookmarks?) and click on it.
Click the More button at the bottom of the dialog.
(Click on the image at right for a full-sized screenshot.)
Now you should see a Keyword entry under the Tags entry
in the lower right of that dialog.
Change the Location to
http://shallowsky.com/birdmap.html?bird=%s.
Then give it a Keyword of birdmap
(or anything else you want to call it).
Close the dialog.
Now, you should be able to go to your location bar and type:
birdmap common raven
or
birdmap sparrow
and it will take you to my birdmap page. If the bird name specifies
just one bird, like common raven, you'll go straight from there to
the eBird map. If there are lots of possible matches, as with sparrow,
you'll stay on the birdmap page so you can choose which sparrow you want.
How to change the default location
If you're not in Los Alamos, you probably want a way to set your own
coordinates. Fortunately, you can; but first you have to get those
coordinates.
Here's the fastest way I've found to get coordinates for a region on eBird:
Click "Explore a Region"
Type in your region and hit Enter
Click on the map in the upper right
Then look at the URL: a part of it should look something like this:
env.minX=-122.202087&env.minY=36.89291&env.maxX=-121.208778&env.maxY=37.484802
If the map isn't right where you want it, try editing the URL, hitting
Enter for each change, and watch the map reload until it points where
you want it to. Then copy the four parameters and add them to your
smart bookmark, like this:
http://shallowsky.com/birdmap.html?bird=%s&minX=-122.202087&minY=36.89291&maxX=-121.208778&maxY=37.484802
Note that all of the "env." prefixes have been removed.
The only catch is that I got my list of 4-letter eBird codes from an
eBird page for New Mexico.
I haven't found any way of getting the list for the entire US.
So if you want a bird that doesn't occur in New Mexico, my page might
not find it. If you like birdmap but want to use it in a different
state, contact me and tell me which state
you need, and I'll add those birds.
Google Code
is shutting
down. They've sent out notices to all project owners suggesting
they migrate projects to other hosting services.
I moved all my personal projects to GitHub years ago, back when
Google Code still didn't support git. But I'm co-owner on another
project that was still hosted there, and I volunteered to migrate it.
I remembered that being very easy back when I moved my personal projects:
GitHub had a one-click option to import from Google Code. I assumed
(I'm sure you know what that stands for) that it would be just as easy now.
Nope. Turns out GitHub no longer has any way to import from Google Code:
it tells you it can't find a repository there when you give it the
address to Google's SVN repository.
Google's announcement said they were providing an exporter to GitHub.
So I tried that next. I had the new repository ready on GitHub --
under the owner's account, not mine -- and I expected Google's
exporter to ask me for the repository.
Not so. As soon as I gave it my OAuth credentials, it immediately
created a new repository on GitHub under my name, using the name
we had used on Google Code (not the right name, since Google Code
project names have to be globally unique while GitHub projects don't).
So I had to wait for the export to finish; then, on GitHub, I went
to our real repository, and did an import there from the new
repository Google had created under my name. I have no idea how
long that took: GitHub's importer said it would email me when the
import was finished, but it didn't, so I waited several hours and
decided it was probably finished. Then I deleted the intermediate repository.
That worked fine, despite being a bit circuitous, and we're up and
running on GitHub now.
If you want to move your Google Code repository to GitHub without the
intermediate step of making a temporary repository, or if you don't
want to give Google OAuth access to your GitHub account,
here are some instructions (which I haven't tested) on how to do
the import via a local copy of the repo on your own machine, rather
than going directly from Google to GitHub:
krishnanand's
steps for migrating Google code to GitHub
Currently, I use my MetaPho
image tagger to update a file named Tags in the same directory as
the images I'm tagging. Then I have a script called
fotogr
that searches for combinations of tags in these Tags files.
That works fine. But I have occasionally wondered if I
should also be saving tags inside the images themselves, in case I
ever want compatibility with other programs. I decided I should at
least figure out how that would work, in case I want to add it to
MetaPho.
I thought it would be simple -- add some sort of key in the image's
EXIF tags. But no -- EXIF has no provision for tags or keywords.
But JPEG (and some other formats) supports lots of tags besides EXIF.
Was it one of the XMP tags?
Web searching only increased my confusion; it seems that there is
no standard for this, but there have been lots of pseudo-standards
over the years. It's not clear what tag most programs read, but my
impression is that the most common is the
"Keywords" IPTC tag.
Okay. So how would I read or change that from a Python program?
Lots of Python libraries can read EXIF tags, including Python's own
PIL library -- I even wrote a few years ago about
reading
EXIF from PIL. But writing it is another story.
Nearly everybody points to pyexiv2,
a fairly mature library that even has a well-written
pyexiv2 tutorial.
Great! The only problem with it is that the pyexiv2 front page has a big
red Deprecation warning saying that it's being replaced by GExiv2.
With a link that goes to a nonexistent page; and Debian doesn't seem
to have a package for GExiv2, nor could I find a tutorial on it anywhere.
Sigh. I have to say that pyexiv2 sounds like a much better bet for now
even if it is supposedly deprecated.
Following the tutorial, I was able to whip up a little proof of concept
that can look for an IPTC Keywords tag in an existing image, print out
its value, add new tags to it and write it back to the file.
import sys
import pyexiv2

if len(sys.argv) < 2:
    print "Usage:", sys.argv[0], "imagename.jpg [tag ...]"
    sys.exit(1)

metadata = pyexiv2.ImageMetadata(sys.argv[1])
metadata.read()
newkeywords = sys.argv[2:]

keyword_tag = 'Iptc.Application2.Keywords'
if keyword_tag in metadata.iptc_keys:
    tag = metadata[keyword_tag]
    oldkeywords = tag.value
    print "Existing keywords:", oldkeywords
    if not newkeywords:
        sys.exit(0)
    for newkey in newkeywords:
        oldkeywords.append(newkey)
    tag.value = oldkeywords
else:
    print "No IPTC keywords set yet"
    if not newkeywords:
        sys.exit(0)
    metadata[keyword_tag] = pyexiv2.IptcTag(keyword_tag, newkeywords)
    tag = metadata[keyword_tag]

print "New keywords:", tag.value
metadata.write()
Does that mean I'm immediately adding it to MetaPho? No. To be honest,
I'm not sure I care very much, since I don't have any other software
that uses that IPTC field and no other MetaPho user has ever asked for it.
But it's nice to know that if I ever have a reason to add it, I can.
Today dinner was a bit delayed because I got caught up dealing with an
RSS feed that wasn't feeding. The website was down, and Python's
urllib2, which I use in my
"feedme" RSS fetcher,
has an inordinately long timeout.
That certainly isn't the first time that's happened, but I'd like it
to be the last. So I started to write code to set a shorter timeout,
and realized: how does one test that? Of course, the offending site
was working again by the time I finished eating dinner, went for a
little walk then sat down to code.
I did a lot of web searching, hoping maybe someone had already set up
a web service somewhere that times out for testing timeout code.
No such luck. And discussions of how to set up such a site
always seemed to center around installing elaborate heavyweight Java
server-side packages. Surely there must be an easier way!
How about PHP? A web search for that wasn't helpful either. But I
decided to try the simplest possible approach ... and it worked!
Just put something like this at the beginning of your HTML page
(assuming, of course, your server has PHP enabled):
<?php sleep(500); ?>
Of course, you can adjust that 500 to be any delay you like.
Or you can even make the timeout adjustable, with a
few more lines of code:
<?php
if (isset($_GET['timeout']))
    sleep($_GET['timeout']);
else
    sleep(500);
?>
Then surf to yourpage.php?timeout=6 and watch the page load
after six seconds.
Simple once I thought of it, but it's still surprising no one
had written it up as a cookbook formula. It certainly is handy.
Now I just need to get some Python timeout-handling code working.
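That part turns out to be simple too: urlopen takes a timeout argument. Here's a sketch of the idea -- the function name and error handling are mine, not feedme's actual code:

```python
import socket

try:                                    # Python 3
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:                     # Python 2, as feedme used
    from urllib2 import urlopen, URLError

def fetch_with_timeout(url, timeout=10):
    """Fetch a URL, but give up after `timeout` seconds
       instead of hanging on an unresponsive server."""
    try:
        return urlopen(url, timeout=timeout).read()
    except (URLError, socket.timeout) as e:
        print("Gave up on %s: %s" % (url, e))
        return None
```

Point it at the PHP sleep page above to watch it give up on schedule.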
I don't use web forums, the kind you have to read online, because they
don't scale. If you're only interested in one subject, then they work fine:
you can keep a browser tab for your one or two web forums perennially open
and hit reload every few hours to see what's new.
If you're interested in twelve subjects, each of which has several
different web forums devoted to it -- how could
you possibly keep up with that? So I don't bother with forums unless
they offer an email gateway, so they'll notify me by email when new
discussions get started, without my needing to check all those web
pages several times per day.
LinkedIn discussions mostly work like a web forum.
But for a while, they had a reasonably usable email gateway. You
could set a preference to be notified of each new conversation.
You still had to click on the web link to read the conversation so far,
but if you posted something, you'd get the rest of the discussion
emailed to you as each message was posted.
Not quite as good as a regular mailing list, but it worked pretty well.
I used it for several years to keep up with the very active Toastmasters
group discussions.
About a year ago, something broke in their software, and they lost
the ability to send email for new conversations. I filed a trouble
ticket, and got a note saying they were aware of the problem and
working on it. I followed up three months later (by filing another
ticket -- there's no way to add to an existing one) and got a response
saying be patient, they were still working on it. 11 months later,
I'm still being patient, but it's pretty clear they have no intention
of ever fixing the problem.
Just recently I fiddled with something in my LinkedIn prefs, and
started getting "Popular Discussions" emails every day or so.
The featured "popular discussion" is always something stupid that
I have no interest in, but it's followed by a section headed
"Other Popular Discussions" that at least gives me some idea what's
been posted in the last few days. Seemed like it might be worth
clicking on the links even though it means I'd always be a few days
late responding to any conversations.
Except -- none of the links work. They all go to a generic page with a red
header saying "Sorry it seems there was a problem with the link you followed."
I'm reading the plaintext version of the mail they send out. I tried
viewing the HTML part of the mail in a browser, and sure enough, those
links worked. So I tried comparing the text links with the HTML:
Text version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&amp;t=gde&amp;midToken=AQEqep2nxSZJIg&amp;ek=b2_anet_digest&amp;li=82&amp;m=group_discussions&amp;ts=textdisc-6&amp;itemID=5914453683503906819&amp;itemType=member&amp;anetID=98449
HTML version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&t=gde&midToken=AQEqep2nxSZJIg&ek=b2_anet_digest&li=17&m=group_discussions&ts=grouppost-disc-6&itemID=5914453683503906819&itemType=member&anetID=98449
Well, that's clear as mud, isn't it?
HTML entity substitution
I pasted both links one on top of each other, to make it easier to
compare them one at a time. That made it fairly easy to find the first
difference:
Text version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&amp;t=gde&amp;midToken= ...
HTML version:
http://www.linkedin.com/e/v2?e=3x1l-hzwzd1q8-6f&t=gde&midToken= ...
Time to die laughing: they're doing HTML entity substitution on the
plaintext part of their email notifications, changing every & in the
link to &amp;.
If you take the link from the text email and replace every &amp; with &,
the link works, and takes you to the specific discussion.
Pagination
Except you can't actually read the discussion. I went to a discussion
that had been open for 2 days and had 35 responses, and LinkedIn only
showed four of them. I don't even know which four they are -- are
they the first four, the last four, or some Facebook-style "four
responses we thought you'd like"? There's a button to click on to
show the most recent entries, but then I only see a few of the most
recent responses, still not the whole thread.
Hooray for the web -- of course, plenty of other people have had this
problem too, and a little web searching unveiled a solution. Add a
pagination token to the end of the URL that tells LinkedIn to show
1000 messages at once.
&count=1000&paginationToken=
It won't actually show 1000 (or all) responses -- but
if you start at the beginning of the page and scroll down reading
responses one by one, it will auto-load new batches.
Yes, infinite scrolling pages
can be annoying, but at least it's a way to read a LinkedIn
conversation in order.
Making it automatic
Okay, now I know how to edit one of their URLs to make it work. Do I
want to do that by hand any time I want to view a discussion? Noooo!
Time for a script! Since I'll be selecting the URLs from mutt, they'll
be in the X PRIMARY clipboard. And unfortunately, mutt adds newlines so
I might as well strip those as well as fixing the LinkedIn problems.
(Firefox will strip newlines for me when I paste in a multi-line URL,
but why rely on that?)
Here's the important part of the script:
import sys, subprocess, gtk

primary = gtk.clipboard_get(gtk.gdk.SELECTION_PRIMARY)
if not primary.wait_is_text_available():
    sys.exit(0)
link = primary.wait_for_text()

# Strip mutt's newlines, undo LinkedIn's entity substitution,
# and append the pagination token:
link = link.replace("\n", "").replace("&amp;", "&") + \
       "&count=1000&paginationToken="

subprocess.call(["firefox", "-new-tab", link])
And here's the full script:
linkedinify
on GitHub. I also added it to
pyclip,
the script I call from Openbox to open a URL in Firefox
when I middle-click on the desktop.
Now I can finally go back to participating in those discussions.
Finding separation between two objects is easy in PyEphem: it's just one
line once you've set up your objects, observer and date.
import ephem

p1 = ephem.Mars()
p2 = ephem.Jupiter()
observer = ephem.Observer() # and then set it to your city, etc.
observer.date = ephem.date('2014/8/1')
p1.compute(observer)
p2.compute(observer)
ephem.separation(p1, p2)
So all I have to do is loop over all the visible planets and see when
the separation is less than some set minimum, like 4 degrees, right?
Well, not really. That tells me if there's a conjunction between
a particular pair of planets, like Mars and Jupiter. But the really
interesting events are when you have three or more objects close
together in the sky. And events like that often span several days.
If there's a conjunction of Mars, Venus, and the moon, I don't want to
print something awful like
Friday:
Conjunction between Mars and Venus, separation 2.7 degrees.
Conjunction between the moon and Mars, separation 3.8 degrees.
Saturday:
Conjunction between Mars and Venus, separation 2.2 degrees.
Conjunction between Venus and the moon, separation 3.9 degrees.
Conjunction between the moon and Mars, separation 3.2 degrees.
Sunday:
Conjunction between Venus and the moon, separation 4.0 degrees.
Conjunction between the moon and Mars, separation 2.5 degrees.
... and so on, for each day. I'd prefer something like:
Conjunction between Mars, Venus and the moon lasts from Friday through Sunday.
Mars and Venus are closest on Saturday (2.2 degrees).
The moon and Mars are closest on Sunday (2.5 degrees).
At first I tried just keeping a list of planets involved in the conjunction.
So if I see Mars and Jupiter close together, I'd make a list [mars,
jupiter], and then if I see Venus and Mars on the same date, I search
through all the current conjunction lists and see if either Venus or
Mars is already in a list, and if so, add the other one. But that got
out of hand quickly. What if my conjunction list looks like
[ [mars, venus], [jupiter, saturn] ] and then I see there's also
a conjunction between Mars and Jupiter? Oops -- how do you merge
those two lists together?
The solution to taking all these pairs and turning them into a list
of groups that are all connected actually lies in graph theory: each
conjunction pair, like [mars, venus], is an edge, and the trick is to
find all the connected edges. But turning my list of conjunction pairs
into a graph so I could use a pre-made graph theory algorithm looked
like it was going to be more code -- and a lot harder to read and less
maintainable -- than making a bunch of custom Python classes.
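For the record, the underlying merge idea can be sketched in a few lines (a hedged illustration of the approach, not the actual class-based code; the function name is made up):

```python
def group_pairs(pairs):
    """Merge pairs like (mars, venus) into connected groups:
    any existing groups sharing a member with the new pair
    get combined into one."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        # Pull in every group that already contains a or b
        for g in [g for g in groups if a in g or b in g]:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups

# [mars, venus] and [jupiter, saturn] merge once mars-jupiter appears:
group_pairs([("mars", "venus"), ("jupiter", "saturn"), ("mars", "jupiter")])
```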
I eventually ended up with three classes:
ConjunctionPair, for a single conjunction observed between two bodies
on a single date;
Conjunction, a collection of ConjunctionPairs covering as many bodies
and dates as needed;
and ConjunctionList, the list of all Conjunctions currently active.
That let me write methods to handle merging multiple conjunction
events together if they turned out to be connected, as well as a
method to summarize the event in a nice, readable way.
So predicting conjunctions ended up being a lot more code than I
expected -- but only because of the problem of presenting it neatly
to the user. As always, user interface represents the hardest part
of coding.
All through the years I was writing the planet observing column for
the San Jose Astronomical Association, I was annoyed at the lack of places
to go to find out about upcoming events like conjunctions, when two or
more planets are close together in the sky. It's easy to find out
about conjunctions in the next month, but not so easy to find sites
that will tell you several months in advance, like you need if you're
writing for a print publication (even a club newsletter).
For some reason I never thought about trying to calculate it myself.
I just assumed it would be hard, and wanted a source that could
spoon-feed me the predictions.
The best source I know of is the
RASC Observer's Handbook,
which I faithfully bought every year and checked each month so I could
enter that month's events by hand. Except for January and February, when I
didn't have the next year's handbook yet by the time my column went
to press and I was on my own.
I have to confess, I was happy to get away from that aspect of the
column when I moved.
In my new town, I've been helping the local nature center with their
website. They had some great pages already, like a
What's
Blooming Now? page that keeps track
of which flowers are blooming now and only shows the current ones.
I've been helping them extend it by adding features like showing only
flowers of a particular color, separating the data into CSV databases
so it's easier to add new flowers or butterflies, and so forth.
Eventually we hope to build similar databases of birds, reptiles and
amphibians.
And recently someone suggested that their astronomy page could use
some help. Indeed it could -- it hadn't been updated in about five years.
So we got to work looking for a source of upcoming astronomy events
we could use as a data source for the page, and we found sources for
a few things, like moon phases and eclipses, but not much.
Someone asked about planetary conjunctions, and remembering
how I'd always struggled to find that data, especially in months when
I didn't have the RASC handbook yet, I got to wondering about
calculating it myself.
Obviously it's possible to calculate when a planet will
be visible, or whether two planets are close to each other in the sky.
And I've done some programming with
PyEphem before, and found
it fairly easy to use. How hard could it be?
Note: this article covers only the basic problem of predicting when
a planet will be visible in the evening.
A followup article will discuss the harder problem of conjunctions.
Calculating planet visibility with PyEphem
The first step was figuring out when planets were up.
That was straightforward. Make a list of the easily visible planets
(remember, this is for a nature center, so people using the page
aren't expected to have telescopes):
Then we need an observer with the right latitude, longitude and
elevation. Elevation is apparently in meters, though they never bother
to mention that in the PyEphem documentation:
observer = ephem.Observer()
observer.name = "Los Alamos"
observer.lon = '-106.2978'
observer.lat = '35.8911'
observer.elevation = 2286 # meters, though the docs don't actually say
Then we loop over the date range for which we want predictions.
For a given date d, we're going to need to know the time of sunset,
because we want to know which planets will still be up after nightfall.
sun = ephem.Sun()
observer.date = d
sunset = observer.previous_setting(sun)
Then we need to loop over planets and figure out which ones are visible.
It seems like a reasonable first approach to declare that any planet
that's visible after sunset and before midnight is worth mentioning.
Now, PyEphem can tell you directly the rising and setting times of a planet
on a given day. But I found it simplified the code if I just checked
the planet's altitude at sunset and again at midnight. If either one
of them is "high enough", then the planet is visible that night.
(Fortunately, here in the mid latitudes we don't have to
worry that a planet will rise after sunset and then set again before
midnight. If we were closer to the arctic or antarctic circles, that
would be a concern in some seasons.)
import math

min_alt = 10. * math.pi / 180.

for planet in planets:
    observer.date = sunset
    planet.compute(observer)
    if planet.alt > min_alt:
        print planet.name, "is already up at sunset"
Easy enough for sunset. But how do we set the date to midnight on
that same night? That turns out to be a bit tricky with PyEphem's
date class. Here's what I came up with:
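A sketch of the kind of thing being described (the helper name is mine, not from the post): PyEphem date tuples are (year, month, day, hour, minute, second) in GMT, so setting the date to local midnight means keeping the calendar date and replacing the time fields with the GMT offset:

```python
def local_midnight_tuple(date_tuple, gmt_hour=7):
    """Return a PyEphem-style (y, m, d, h, min, s) tuple for local
    midnight, expressed in GMT. 7 is GMT when it's midnight in
    US Mountain time -- hardwired, as explained below."""
    year, month, day = date_tuple[:3]
    return (year, month, day, gmt_hour, 0, 0)

# Then, roughly:
#   observer.date = ephem.date(local_midnight_tuple(observer.date.tuple()))
```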
What's that 7 there? That's Greenwich Mean Time when it's midnight in
our time zone. It's hardwired because this is for a web site meant for
locals. Obviously, for a more general program, you should get the time
zone from the computer and add accordingly, and you should also be
smarter about daylight savings time and such. The PyEphem documentation,
fortunately, gives you tips on how to deal with time zones.
(In practice, though, the rise and set times of planets on a given
day don't change much with time zone.)
And now you have your predictions of which planets will be visible
on a given date. The rest is just a matter of writing it out into
your chosen database format.
In the next article, I'll cover planetary and lunar
conjunctions -- which were superficially very simple, but turned out
to have some tricks that made the programming harder than I expected.
When I built my Raspberry Pi motion camera
(http://shallowsky.com/blog/hardware/raspberry-pi-motion-camera.html,
and part 2), I always had the NoIR camera in the back of my mind. The NoIR is a
version of the Pi camera module with the infra-red blocking
filter removed, so you can shoot IR photos at night without disturbing
nocturnal wildlife (or alerting nocturnal burglars, if that's your target).
After I got the daylight version of the camera working, I ordered a NoIR
camera module and plugged it in to my RPi. I snapped some daylight
photos with raspistill and verified that it was connected and working;
then I waited for nightfall.
In the dark, I set up the camera and put my cup of hot chocolate in
front of it. Nothing. I hadn't realized that although CCD
cameras are sensitive in the near IR, the wavelengths only slightly
longer than visible light, they aren't sensitive anywhere near
the IR wavelengths that hot objects emit. For that, you need a special
thermal camera. For a near-IR CCD camera like the Pi NoIR, you need an
IR light source.
Knowing nothing about IR light sources, I did a search and came up
with something called a
"Infrared IR 12 Led Illuminator Board Plate for CCTV Security CCD Camera"
for about $5. It seemed similar to the light sources used on a few
pages I'd found for home-made night vision cameras, so I ordered it.
Then I waited, because I stupidly didn't notice until a week and a half
later that it was coming from China and wouldn't arrive for three weeks.
Always check the shipping time when ordering hardware!
When it finally arrived, it had a tiny 2-pin connector that I couldn't
match locally. In the end I bought a package of female-female SchmartBoard
jumpers at Radio Shack which were small enough to make decent contact
on the light's tiny-gauge power and ground pins.
I soldered up a connector that would let me use a universal power
supply, taking a guess that it wanted 12 volts (most of the cheap LED
rings for CCD cameras seem to be 12V, though this one came with no
documentation at all). I was ready to test.
Testing the IR light
One problem with buying a cheap IR light with no documentation:
how do you tell whether it's getting power, when the light it emits
is completely invisible?
The only way to find out was to check on the Pi. I didn't want to have
to run back and forth between the dark room where the camera was set
up and the desktop where I was viewing raspistill images. So I
started a video stream on the RPi:
Then, on the desktop: I ran vlc, and opened the network stream:
rtsp://pi:8554/
(I have a "pi" entry in /etc/hosts, but using an IP address also works).
Now I could fiddle with hardware in the dark room while looking through
the doorway at the video output on my monitor.
It took some fiddling to get a good connection on that tiny connector
... but eventually I got a black-and-white view of my darkened room,
just as I'd expect under IR illumination.
I poked some holes in the milk carton and used twist-ties to secure
the light source next to the NoIR camera.
Lights, camera, action
Next problem: mute all the blinkenlights, so my camera wouldn't look
like a Christmas tree and scare off the nocturnal critters.
The Pi itself has a relatively dim red run light, and it's inside the
milk carton so I wasn't too worried about it.
But the Pi camera has quite a bright red
light that goes on whenever the camera is being used.
Even through the thick milk carton bottom, it was glaring and obvious.
Fortunately, you can
disable
the Pi camera light: edit /boot/config.txt and add this line
disable_camera_led=1
My USB wi-fi dongle has a blue light that flickers as it gets traffic.
Not super bright, but attention-grabbing. I addressed that issue
with a triple thickness of duct tape.
The IR LEDs -- remember those invisible, impossible-to-test LEDs?
Well, it turns out that in darkness, they emit a faint but still
easily visible glow. Obviously there's nothing I can do about that --
I can't cover the camera's only light source! But it's quite dim, so
with any luck it's not spooking away too many animals.
Results, and problems
For most of my daytime testing I'd used a threshold of 30 -- meaning
a pixel was considered to have changed if its value differed by more
than 30 from the previous photo. That didn't work at all in IR: changes
are much more subtle since we're seeing essentially a black-and-white
image, and I had to divide by three and use a sensitivity of 10 or 11
if I wanted the camera to trigger at all.
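The thresholding idea is simple enough to sketch (a toy illustration using flat lists of grayscale values, not the camera script itself):

```python
def changed_pixels(prev, cur, threshold=30):
    """Count how many pixel values differ by more than `threshold`
    between two frames, given as flat lists of 0-255 grayscale values."""
    return sum(1 for p, c in zip(prev, cur) if abs(p - c) > threshold)

# In near-monochrome IR frames the changes are subtler, so a lower
# threshold catches differences that 30 would miss:
frame1 = [100, 100, 100, 100]
frame2 = [100, 112, 100, 135]
changed_pixels(frame1, frame2, threshold=30)   # only the 135 counts
changed_pixels(frame1, frame2, threshold=10)   # the 112 counts too
```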
With that change, I did capture some nocturnal visitors, and some
early morning ones too. Note the funny colors on the daylight shots:
that's why cameras generally have IR-blocking filters if they're not
specifically intended for night shots.
But I'm not happy with the setup. For one thing, it has far too many
false positives. Maybe one out of ten or fifteen images actually has
an animal in it; the rest just triggered because the wind made the
leaves blow, or because a shadow moved or the color of the light changed.
A simple count of differing pixels is clearly not enough for this task.
Of course, the software could be smarter about things: it could try to
identify large blobs that had changed, rather than small changes
(blowing leaves) all over the image. I already know
SimpleCV
runs fine on the Raspberry Pi, and I could try using it to do
object detection.
But there's another problem with detection purely through camera images:
the Pi is incredibly slow to capture an image. It takes around 20 seconds
per cycle; some of that is waiting for the network but I think most of
it is the Pi talking to the camera. With quick-moving animals,
the animal may well be gone by the time the system has noticed a change.
I've caught several images of animal tails disappearing out of the
frame, including a quail who visited yesterday morning. Adding smarts
like SimpleCV will only make that problem worse.
So I'm going to try another solution: hooking up an infra-red motion detector.
I'm already working on setting up tests for that, and should have a
report soon. Meanwhile, pure image-based motion detection has been
an interesting experiment.
I'm helping an organization with some website work. But I'm not the only
one working on the website, and there's no version control.
I wanted an easy way to make sure all my files were up-to-date
before I start to work on one ... a way to mirror the website,
or at least specific directories, to my local disk.
Normally I use rsync -av over ssh to mirror directories, but
this website is on a server that only offers ftp access.
I've been using ncftp to copy files up one by one, but
although ncftp's manual says it has a mirror mode and I found a
few web references to that, I couldn't find anything telling me how
to activate it.
Making matters worse, there are some large files that I don't need to
mirror. The first time I tried to use get * in ncftp to get one directory,
it spent 15 minutes trying to download a huge powerpoint file, then
stalled and lost the connection. There are some big .doc and .docx files,
too. And ncftp doesn't seem to have a way to exclude specific files.
Enter lftp. It has a mirror mode (with documentation, even!)
which includes a -X to exclude files matching specified patterns.
lftp includes a -e to pass commands -- like "mirror" -- to it on the
command line. But the documentation doesn't say whether you can use
more than one command at a time. So it seemed safer to start up an lftp
session and pass a series of commands to it.
And that works nicely. Just set up the list of directories you want to
mirror, and you can write a nice shell function to put in your
.zshrc or .bashrc:
sitemirror() {
  commands=""
  for dir in thisdir thatdir theotherdir
  do
    commands="$commands
mirror --only-newer -vvv -X '*.ppt' -X '*.doc*' -X '*.pdf' htdocs/$dir $HOME/web/webmirror/$dir"
  done

  echo Commands to be run:
  echo $commands
  echo

  lftp <<EOF
open -u 'user,password' ftp.example.com
$commands
bye
EOF
}
Super easy -- all I do is type sitemirror and wait a little.
Now I don't have any excuse for not being up to date.
Although I use emacs for most of my coding, I use vim quite a lot too,
for quick edits, mail messages, and anything I need to edit when logged
onto a remote server.
In particular, that means editing my procmail spam filter files
on the mail server.
The spam rules are mostly lists of regular expression patterns,
and they can include long lines, such as:
gift ?card .*(Visa|Walgreen|Applebee|Costco|Starbucks|Whitestrips|free|Wal.?mart|Arby)
My default vim settings for editing text, including line wrap,
don't work if I get a flood of messages offering McDonald's gift cards
and decide I need to add a "|McDonald" on the end of that long line.
Of course, I can type ":set tw=0" to turn off wrapping, but who wants
to have to do that every time? Surely vim has a way to adjust settings
based on file type or location, like emacs has.
It didn't take long to find an example of
Project
specific settings on the vim wiki.
Thank goodness for the example -- I definitely wouldn't have figured
that syntax out just from reading manuals. From there, it was easy to
make a few modifications and set textwidth=0 if I'm opening a file in
my procmail directory:
" Set wrapping/textwidth according to file location and type
function! SetupEnvironment()
let l:path = expand('%:p')
if l:path =~ '/home/akkana/Procmail'
" When editing spam filters, disable wrapping:
setlocal textwidth=0
endfunction
autocmd! BufReadPost,BufNewFile * call SetupEnvironment()
Nice! But then I remembered other cases where I want to turn off
wrapping. For instance, editing source code in cases where emacs
doesn't work so well -- like remote logins over slow connections, or
machines where emacs isn't even installed, or when I need to do a lot
of global substitutes or repetitive operations. So I'd like to be able
to turn off wrapping for source code.
I couldn't find any way to just say "all source code file types" in vim.
But I can list the ones I use most often. While I was at it, I threw
in a special wrap setting for mail files:
" Set wrapping/textwidth according to file location and type
function! SetupEnvironment()
let l:path = expand('%:p')
if l:path =~ '/home/akkana/Procmail'
" When editing spam filters, disable wrapping:
setlocal textwidth=0
elseif (&ft == 'python' || &ft == 'c' || &ft == 'html' || &ft == 'php')
setlocal textwidth=0
elseif (&ft == 'mail')
" Slightly narrower width for mail (and override mutt's override):
setlocal textwidth=68
else
" default textwidth slightly narrower than the default
setlocal textwidth=70
endif
endfunction
autocmd! BufReadPost,BufNewFile * call SetupEnvironment()
As long as we're looking at language-specific settings, what about
doing language-specific indentation like emacs does? I've always
suspected vim must have a way to do that, but it doesn't enable it
automatically like emacs does. You need to set three variables,
assuming you prefer to use spaces rather than tabs:
" Indent specifically for the current filetype
filetype indent on
" Set indent level to 4, using spaces, not tabs
set expandtab shiftwidth=4
Then you can also use useful commands like << and >> for
in- and out-denting blocks of code, or == for indenting to the right
level. It turns out vim's language indenting isn't all that smart, at
least for Python, and gets the wrong answer a lot of the time. You
can't rely on it as a syntax checker the way you can with emacs.
But it's a lot better than no language-specific indentation.
I went to a terrific workshop last week on identifying bird songs.
We listened to recordings of songs from some of the trickier local species,
and discussed the differences and how to remember them. I'm not a serious
birder -- I don't do lists or Big Days or anything like that, and I
dislike getting up at 6am just because the birds do -- but I do try to
identify birds (as well as mammals, reptiles, rocks, geographic
features, and pretty much anything else I see while hiking or just
sitting in the yard) and I've always had trouble remembering their songs.
One of the tools birders use to study bird songs is the sonogram.
It's a plot of frequency (on the vertical axis) and intensity (represented
by color, red being louder) versus time. Looking at a sonogram
you can identify not just how fast a bird trills and whether it calls in
groups of three or five, but whether it's buzzy/rattly (a vertical
line, lots of frequencies at once) or a purer whistle, and whether
each note is ascending or descending.
The class last week included sonograms for the species we studied.
But what about other species? The class didn't cover even all the local
species I'd like to be able to recognize.
I have several collections of bird calls on CD
(which I bought to use in combination with my "tweet" script
-- yes, the name messes up google searches, but my tweet predates Twitter --
a tweet
Python script and
tweet
in HTML for Android).
It would be great to be able to make sonograms from some of those
recordings too.
But a search for Linux sonogram turned up nothing useful.
Audacity has a spectrogram visualization mode with lots of options, but
none of them seem to result in a usable sonogram, and most discussions
I found on the net agreed that it couldn't do it. There's another
sound editor program called snd which can do sonograms, but it's
fiddly to use and none of the many color schemes produce a sonogram
that I found very readable.
Okay, what about python scripts? Surely that's been done?
I had better luck there. Matplotlib's pylab package has a
specgram() call that does more or less what I wanted,
and here's
an
example of how to use pylab.specgram().
(That post also has another example using a library called timeside,
but timeside's PyPI package doesn't have any dependency information,
and after playing the old RPM-chase game installing another dependency,
trying it, then installing the next dependency, I gave up.)
The only problem with pylab.specgram() was that it shows
the full range of the sound, both in time and frequency.
The recordings I was examining can
last a minute or more and go up to 20,000 Hz -- and when pylab tries
to fit that all on the screen, you end up with a plot where the details
are too small to show you anything useful.
Then I did some fiddling to allow for analyzing only part of the
recording -- Python's wave package has no way to read in just the first
six seconds of a .wav file, so I had to read in the
whole file, read the data into a numpy array, then take a slice
representing the seconds of the recording I actually wanted.
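That workaround might look roughly like this (a hedged sketch with a made-up helper name, not the actual script; the returned bytes would then go through numpy.frombuffer on their way to pylab.specgram()):

```python
import wave

def read_first_seconds(wavfile, seconds=6):
    """Python's wave module has no way to read just part of a file,
    so read the whole thing, then keep only the bytes covering the
    first `seconds` of audio."""
    wav = wave.open(wavfile, 'rb')
    rate = wav.getframerate()
    # bytes per second = frames/sec * channels * bytes per sample
    nbytes = int(seconds * rate) * wav.getnchannels() * wav.getsampwidth()
    frames = wav.readframes(wav.getnframes())
    return frames[:nbytes], rate
```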
But now I can plot nice sonograms of any bird song I want to see,
print them out or stick them on my Android device so I can carry them
with me.
Update: Oops! I forgot to include a link to the script. Here it is:
Sonograms
in Python.
But New Mexico came through on my next-to-last full day with some
pretty interesting weather. A windstorm in the afternoon gave way
to thunder (but almost no lightning -- I saw maybe one indistinct flash)
which gave way to a strange fluffy hail that got gradually bigger until
it eventually grew to pea-sized snowballs, big enough and snow enough
to capture well in photographs as they came down on the junipers
and in the garden.
Then after about twenty minutes the storm stopped and the sun came out.
And now I'm back to tweaking tutorial slides and thinking about packing
while watching the sunset light on the Rio Grande gorge.
But tomorrow I leave it behind and fly to Montreal.
See you at PyCon!
It'll be a hands-on workshop, where we'll experiment with the
Raspberry Pi's GPIO pins and learn how to control simple things like
an LED. Then we'll hook up sonar rangefinders to the RPis, and
build a little device that can be used to monitor visitors at your
front door, birds at your feeder, co-workers standing in front of your
monitor while you're away, or just about anything else you can think of.
Participants will bring their own Raspberry Pi computers and power supplies
-- attendees of last year's PyCon got them there, but a new Model A
can be gotten for $30, and a model B for $40.
We'll provide everything else.
We worried that requiring participants to bring a long list of esoteric
hardware was just asking for trouble, so we worked a deal with PyCon
and they're sponsoring hardware for attendees. Thank you, PyCon!
CodeChix is fronting the money
for the kits and helping with our travel expenses, thanks to donations
from some generous sponsors.
We'll be passing out hardware kits and SD cards at the
beginning of the workshop, which attendees can take home afterward.
We're also looking for volunteer T/As.
The key to a good hardware workshop is having lots of
helpers who can make sure everybody's keeping up and nobody's getting lost.
We have a few top-notch T/As signed up already, but we can always
use more. We can't provide hardware for T/As, but most of it's quite
inexpensive if you want to buy your own kit to practice on. And we'll
teach you everything you need to know about how get your PiDoorbell
up and running -- no need to be an expert at hardware or even at
Python, as long as you're interested in learning and in helping
other people learn.
This should be a really fun workshop! PyCon tutorial sign-ups just
opened recently, so sign up for the tutorial (we do need advance
registration so we know how many hardware kits to buy). And if you're
going to be at PyCon and are interested in being a T/A, drop me or
Rupa a line and we'll get you on the list and get you all the
information you need.
I've been scanning a bunch of records with Audacity (using as a guide
Carla Schroder's excellent Book of
Audacity and a
Behringer
UCA222 USB audio interface) -- Audacity doesn't seem able to record
properly from the built-in sound card on any laptop I own, while it
works fine with the Behringer.
Audacity's user interface isn't great for assembly-line recording of
lots of tracks one after the other, especially on a laptop with a
trackpad that doesn't work very well, so I wasn't always as organized
with directory names as I could have been, and I ended up with a mess.
I was periodically backing up the recordings to my desktop, but as I
shifted from everything-in-one-directory to an organized system, the
two directories got out of sync.
To get them back in sync, I needed a way to answer this question:
is every file inside directory A (maybe in some subdirectory of it)
also somewhere under subdirectory B? In other words, can I safely
delete all of A knowing that anything in it is safely stored in B,
even though the directory structures are completely different?
I was hoping for some clever find | xargs way to do it,
but came up blank. So eventually I used a little zsh loop:
one find to get the list of files to test, then for each of
those, another find inside the target directory, then check
whether that find printed anything. (find exits with status 0
even when it matches nothing, so you can't just test its exit code.)
(I'm assuming that if the songname.aup file is there, the songname_data
directory is too.)
for fil in $(find AAA/ -name '*.aup'); do
  fil=$(basename $fil)
  if ! find BBB -name $fil | grep -q . ; then
    echo $fil is not in BBB
  fi
done
When I wrote recently about my
Dactylic
dinosaur doggerel, I glossed over a minor problem with my final poem:
the rules of
double-dactylic
doggerel say that the sixth line (or sometimes the seventh) should
be a single double-dactyl word -- something like "paleontologist"
or "hexasyllabic'ly". I used "dinosaur orchestra" -- two words,
which is cheating.
I don't feel too guilty about that.
If you read the post, you may recall that the verse was the result of
drifting grumpily through an insomniac morning where I would have
preferred to be getting back to sleep. Coming up with anything that
scans at all is probably good enough.
Still, it bugged me, not being able to think of a double-dactylic word
that related somehow to Parasaurolophus. So I vowed that, later that
day when I was up and at the computer, I would attempt to find one and
rewrite the poem accordingly.
I thought that would be fairly straightforward. Not so much. I thought
there would be some utility I could run that would count syllables for
me, then I could run /usr/share/dict/words through it, print
out all the 6-syllable words, and find one that fit. Turns out there
is no such utility.
But Python has a library for everything, doesn't it?
Some searching turned up
PyHyphen,
which includes some syllable-counting functions.
It apparently uses the hyphenation dictionaries that come with
LibreOffice.
There's a Debian package for it, python-pyhyphen -- but it doesn't work.
First, it depends on another package, hyphen-en-us, but doesn't
have that dependency encoded in the package, even as a suggested or
recommended package. But even when you install the hyphenated dictionary,
it still doesn't work because it doesn't point to the dictionary in
the place it was installed.
Looks like that problem was reported almost two years ago,
bug 627944:
python-pyhyphen: doesn't work out-of-the-box with hyphen-* packages.
There's a fix there that involves editing two files,
/usr/lib/python2.7/dist-packages/hyphen/config.py and
/usr/lib/python2.7/dist-packages/hyphen/__init__.py.
Or you can just give up on Debian and pip install pyhyphen,
which is a lot easier.
But once you get it working, you find that it's terrible.
It was wrong about almost every word I tried.
I hope not too many people are relying on this hyphen-en-us dictionary
for important documents. Its results seemed nearly random, and I
quickly gave up on it for getting a useful list of words around
six syllables.
Just for fun, since my count syllables web search turned
up quite a few websites claiming that functionality, I tried entering
some of my long test words manually. All of the websites I tried were
wrong more than half the time, and often they were off by more than
two syllables. I don't mind off-by-ones -- I can look at words
claiming 5 and 7 syllables while searching for double dactyls --
but if I have to include 4-syllable words as well, I'll never find
what I'm looking for.
That discouraged me from using another Python suggestion I'd seen, the
nltk (natural language toolkit) package. I've been looking for an
excuse to play with nltk, and some day I will, but for this project
I was looking for a quick approximate solution, and the nltk examples
I found mostly looked like using it would require a bigger time
commitment than I was willing to devote to silly poetry. And if
none of the dedicated syllable-counting websites or dictionaries
got it right, would a big time investment in nltk pay off?
Anyway, by this time I'd wasted more than an hour poking around
various libraries and websites for this silly unimportant problem,
and I decided that with that kind of time investment, I could probably
do better on my own than the official solutions were giving me.
Why not basically just count vowels?
So I whipped up a little script,
countsyl,
that did just that. I gave it a list of vowels, with a few simple rules.
Obviously, you can't just say every vowel is a new syllable -- there
are too many double vowels and silent letters and such. But you can't
say that any run of multiple vowels together counts as one syllable,
because sometimes the vowels do count; and you can't make absolute
rules like "'e' at the end of a word is always silent", because
sometimes it isn't. So I kept both minimum and maximum syllable counts
for each word, and printed both.
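That approach can be sketched like this (a simplified reconstruction, not the actual countsyl script; the vowel list and the silent-'e' rule here are my own approximations):

```python
VOWELS = "aeiouy"

def count_syllables(word):
    """Return (minimum, maximum) guesses at a word's syllable count.
    Each maximal run of consecutive vowels counts once toward the
    minimum (treating e.g. 'ea' as one syllable), and each vowel
    counts once toward the maximum (treating 'ea' as two)."""
    word = word.lower()
    minimum = maximum = 0
    in_vowel_run = False
    for ch in word:
        if ch in VOWELS:
            maximum += 1
            if not in_vowel_run:
                minimum += 1
            in_vowel_run = True
        else:
            in_vowel_run = False
    # A trailing 'e' is usually silent, so don't count it
    # toward the minimum:
    if word.endswith("e") and minimum > 1:
        minimum -= 1
    return minimum, maximum

print(count_syllables("extemporaneously"))
```

Even these crude rules bracket most words: "extemporaneously" comes out as (6, 8), which includes the true count of 7.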
And much to my surprise, without much tuning at all, my silly little
script immediately gave much better results than the hyphenation dictionary
or the dedicated websites.
Alas, although it did give me quite a few hexasyllabic words in
/usr/share/dict/words, none of them were useful at all for a program
on Parasaurolophus. What I really needed was a musical term (since
that's what the poem is about). What about a musical dictionary?
I found a list of musical terms on
Wikipedia:
Glossary of musical terminology, saved it as a local file,
ran a few vim substitutes and turned it into a plain list of words.
That did a little better, and gave me some possible ideas:
(non?)contrapuntally?
(something)harmonically?
extemporaneously?
But none of them worked out, and by then I'd run out of steam.
I gave up and blogged the poem as originally written, with the
cheating two-word phrase "dinosaur orchestra", and vowed to write
up how to count syllables in Python -- which I have now done.
Quite noncontrapuntally, and definitely not extemporaneously.
But at least I have a useful little script next time I want to
get an approximate syllable count.
Last week I wrote about some tests I'd made to answer the question
Does
scrolling output make a program slower?
My test showed that when running a program that generates lots of output,
like an rsync -av, the rsync process will slow way down as it waits for
all that output to scroll across whatever terminal client you're using.
Hiding the terminal helps a lot if it's an xterm or a Linux console,
but doesn't help much with gnome-terminal.
A couple of people asked in the comments about the actual source of
the slowdown. Is the original process -- the rsync, or my test script,
that's actually producing all that output -- actually blocking waiting
for the terminal? Or is it just that the CPU is so busy doing all that
font rendering that it has no time to devote to the original program,
and that's why it's so much slower?
I found pingu on IRC (thanks to JanC) and the group had a very
interesting discussion, during which I ran a series of additional tests.
In the end, I'm convinced that CPU allocation to the original process
is not the issue, and that output is indeed blocked waiting for the
terminal to display the output. Here's why.
First, I installed a couple of performance meters and looked at the
CPU load while rendering. With conky, CPU use went up equally (about
35-40%) on both CPU cores while the test was running. But that didn't
tell me anything about which processes were getting all that CPU.
htop was more useful. It showed X first among CPU users, xterm second,
and my test script third. However, the test script never got more than
10% of the total CPU during the test; X and xterm took up nearly all
the remaining CPU.
Even with the xterm hidden, X and xterm were the top two CPU users.
But this time the script, at number 3, got around 30% of the CPU
rather than 10%. That still doesn't seem like it could account for the
huge difference in speed (the test ran about 7 times faster with xterm
hidden); but it's interesting to know that even a hidden xterm will
take up that much CPU.
It was also suggested that I try redirecting the output to /dev/null,
something I definitely should have thought to try before.
The test took .55 seconds with its output redirected to /dev/null,
and .57 seconds redirected to a file on disk (of course, the kernel
would have been buffering, so there was no disk wait involved).
For comparison, the test had taken 56 seconds with xterm visible
and scrolling, and 8 seconds with xterm hidden.
I also spent a lot of time experimenting with sleeping for various
amounts of time between printed lines.
With time.sleep(.0001) and xterm visible, the test took 104.71 seconds.
With xterm shaded and the same sleep, it took 98.36 seconds, only 6 seconds
faster. Redirected to /dev/null but with a .0001 sleep, it took 97.44 sec.
I think this argues for the blocking theory rather than the CPU-bound one:
the argument being that the sleep gives the program a chance
to wait for the output rather than blocking the whole time.
If you figure it's CPU bound, I'm not sure how you'd explain the result.
But a .0001 second sleep probably isn't very accurate anyway -- we
were all skeptical that Linux can manage sleep times that small.
So I made another set of tests, with a .001 second sleep every 10
lines of output. The results:
65.05 with xterm visible; 63.36 with xterm hidden; 57.12 to /dev/null.
That's with a total of 50 seconds of sleeping included
(my test prints 500000 lines).
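The sleep-every-N-lines variant can be sketched like this (a Python 3 paraphrase of my Python 2 test script, with the counts as parameters so it's quick to try with smaller numbers):

```python
import time

def timed_print(nlines, sleep_every=0, sleep_time=0.0):
    """Print nlines lines to stdout, optionally sleeping for
    sleep_time seconds after every sleep_every lines.
    Return the elapsed wall-clock time in seconds."""
    start = time.time()
    for i in range(nlines):
        print("Now we have printed", i, "relatively long lines to stdout.")
        if sleep_every and (i + 1) % sleep_every == 0:
            time.sleep(sleep_time)
    return time.time() - start

if __name__ == '__main__':
    # The real test printed 500000 lines with a .001 sleep every 10.
    print(timed_print(1000, sleep_every=10, sleep_time=.001), "seconds")
```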
So with all that CPU still going toward font rendering, the
visible-xterm case still only took 7 seconds longer than the /dev/null
case. I think this argues even more strongly that the original test,
without the sleep, is blocking, not CPU bound.
But then I realized what the ultimate test should be. What happens when
I run the test over an ssh connection, with xterm and X running on my
local machine but the actual script running on the remote machine?
The remote machine I used for the ssh tests was a little slower than the
machine I used to run the other tests, but that probably doesn't make
much difference to the results.
The results? 60.29 sec printing over ssh (LAN) to a visible xterm;
7.24 sec doing the same thing with xterm hidden. Fairly similar to
what I'd seen before when the test, xterm and X were all running on the
same machine.
Interestingly, the ssh process during the test took 7% of my CPU,
almost as much as the python script was getting before, just to
transfer all the output lines so xterm could display them.
So I'm convinced now that the performance bottleneck has nothing to do
with the process being CPU bound and having all its CPU sucked away by
rendering the output, and that the bottleneck is in the process being
blocked in writing its output while waiting for the terminal to catch up.
I'd be interested to hear further comments -- are there other
interpretations of the results besides mine?
I'm also happy to run further tests.
While watching my rsync -av messages scroll by during a big backup,
I wondered, as I often have, whether that -v (verbose) flag was slowing
my backup down.
In other words: when you run a program that prints lots of output,
so there's so much output the terminal can't display it all in real-time
-- like an rsync -v on lots of small files --
does the program wait ("block") while the terminal catches up?
And if the program does block, can you speed up your backup by
hiding the terminal, either by switching to another desktop, or by
iconifying or shading the terminal window so it's not visible?
Is there any difference among the different ways of hiding the
terminal, like switching desktops, iconifying and shading?
Since I've never seen a discussion of that, I decided to test it myself.
I wrote a very simple Python program:
import time
start = time.time()
for i in xrange(500000):
    print "Now we have printed", i, "relatively long lines to stdout."
print time.time() - start, "seconds to print", i, "lines."
I ran it under various combinations of visible and invisible terminal.
The results were striking.
These are rounded to the nearest tenth of a second, in most cases
the average of several runs:
Terminal type                    Seconds
xterm, visible                      56.0
xterm, other desktop                 8.0
xterm, shaded                        8.5
xterm, iconified                     8.0
Linux framebuffer, visible         179.1
Linux framebuffer, hidden            3.7
gnome-terminal, visible             56.9
gnome-terminal, other desktop       56.7
gnome-terminal, iconified           56.7
gnome-terminal, shaded              43.8
Discussion:
First, the answer to the original question is clear. If I'm displaying
output in an xterm, then hiding it in any way will make a huge
difference in how long the program takes to complete.
On the other hand, if you use gnome-terminal instead of xterm,
hiding your terminal window won't make much difference.
Gnome-terminal is nearly as fast as xterm when it's displaying;
but it apparently lacks xterm's smarts about not doing
that work when it's hidden. If you use gnome-terminal,
you don't get much benefit out of hiding it.
I was surprised how slow the Linux console was (I'm using the framebuffer
in Debian's 3.2.0-4-686-pae kernel on Intel graphics). But it's easy to see
where that time is going when you watch the output: in xterm, you see
lots of blank space as xterm skips drawing lines trying to keep up
with the program's output. The framebuffer doesn't do that:
it prints and scrolls every line, no matter how far behind it gets.
But equally interesting is how much faster the framebuffer is when
it's not visible. (I typed Ctrl-alt-F2, logged in, ran the program,
then typed Ctrl-alt-F7 to go back to X while the program ran.)
Obviously xterm is doing some background processing that the framebuffer
console doesn't need to do. The absolute time difference, less than four
seconds, is too small to worry about, but it's interesting anyway.
I would have liked to try my test on a plain Linux console, with no framebuffer,
but figuring out how to get a distro kernel out of framebuffer mode was
a bigger project than I wanted to tackle that afternoon.
I should mention that I wasn't super-scientific about these tests.
I avoided doing any heavy work on the machine while the tests were running,
but I was still doing light editing (like this article), reading mail and
running xchat. The times for multiple runs were quite consistent, so I
don't think my light system activity affected the results much.
So there you have it. If you're running an output-intensive program
like rsync -av and you care how fast it runs, use either xterm or the
console, and leave it hidden most of the time.
Update: the script described in this article has been folded into another
script called
viewmailattachments.py.
Command-line mailers like mutt have one disadvantage: viewing HTML mail
with embedded images. Without images, HTML mail is no problem -- run
it through lynx, links or w3m. But if you want to see images in place,
how do you do it?
Mutt can send a message to a browser like firefox ... but only the
textual part of the message. The images don't show up.
That's because mail messages include images
not as separate files, but as attachments within the same file, encoded
in a format known as MIME (Multipurpose Internet Mail Extensions).
An image link in the HTML, instead of looking like
<img src="picture.jpg">, will look
something like
<img src="cid:0635428E-AE25-4FA0-93AC-6B8379300161">
(Apple's Mail.app) or
<img src="cid:1.3631871432@web82503.mail.mud.yahoo.com">
(Yahoo's webmail).
CID stands for Content ID, and refers to the ID of the image as
it is encoded in MIME inside the message. GUI mail programs, of course,
know how to decode this and show the image. Mutt doesn't.
A web search finds a handful of shell scripts that use
the munpack program (part of the mpack package on Debian
systems) to split off the files;
then they use various combinations of sed and awk to try to view those files.
Except that none of the scripts I found actually work for messages sent
from modern mailers -- they don't decode the
CID links properly.
I wasted several hours fiddling with various shell scripts, trying
to adjust sed and awk commands to figure out the problem, when I
had the usual epiphany that always eventually arises from shell script
fiddling: "Wouldn't this be a lot easier in Python?"
Python's email package
Python has a package called
email
that knows how to list and unpack MIME attachments. Starting from the
example near the bottom of that page, it was easy to split off the various
attachments and save them in a temp directory. The key is
import email
fp = open(msgfile)
msg = email.message_from_file(fp)
fp.close()
for part in msg.walk():
That left the problem of how to match CIDs with filenames, and rewrite
the links in the HTML message accordingly.
The documentation on the email package is a bit unclear, unfortunately.
For instance, it doesn't give any hints about what objects you'll get when
iterating over a message with walk, and if you try it,
they're just type 'instance'. So what operations can you expect are
legal on them? If you run help(part) in the Python console
on one of the parts you get from walk,
it's generally class Message, so you can use the
Message API,
with functions like get_content_type(),
get_filename(), and get_payload().
More usefully, it has dictionary keys() for the attributes
it knows about each attachment. part.keys() gets you a list like
So by making a list relating part.get_filename() (with a
made-up filename if it doesn't have one already) to part['Content-ID'],
I'd have enough information to rewrite those links.
Case-insensitive dictionary matching
But wait! Not so simple. That list is from a Yahoo mail message, but
if you try keys() on a part sent by Apple mail, instead it will be
'Content-Id'. Note the lower-case d: Id, instead of the ID that Yahoo used.
Unfortunately, Python dictionaries don't offer case-insensitive
key lookup. So I used a loop:
for k in part.keys():
    if k.lower() == 'content-id':
        print "Content ID is", part[k]
Most mailers seem to put angle brackets around the content id, so
that would print things like
"Content ID is <14.3631871432@web82503.mail.mud.yahoo.com>".
Those angle brackets have to be removed, since the
CID links in the HTML file don't have them.
for k in part.keys():
    if k.lower() == 'content-id':
        if part[k].startswith('<') and part[k].endswith('>'):
            part[k] = part[k][1:-1]
But that didn't work -- the angle brackets were still there, even
though if I printed part[k][1:-1] it printed without angle brackets.
What was up?
Immutable parts inside email.Message
It turned out that the parts inside an email Message (and maybe the
Message itself) are immutable -- you can't change them. Python doesn't
throw an exception; it just doesn't change anything. So I had to make
a local copy:
for k in part.keys():
    if k.lower() == 'content-id':
        content_id = part[k]
        if content_id.startswith('<') and content_id.endswith('>'):
            content_id = content_id[1:-1]
and then save content_id, not part[k], in my list of filenames and CIDs.
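Wrapped up as a helper, the whole lookup-and-strip dance looks something like this (a Python 3 sketch; it works on anything with keys() and item lookup, whether an email Message part or, as here, a plain dict):

```python
def get_content_id(part):
    """Find a Content-ID header regardless of capitalization,
    strip surrounding angle brackets if present, and return it.
    Return None if there's no such header."""
    for k in part.keys():
        if k.lower() == 'content-id':
            content_id = part[k]
            if content_id.startswith('<') and content_id.endswith('>'):
                content_id = content_id[1:-1]
            return content_id
    return None

# The same code handles Yahoo-style and Apple-style capitalization:
print(get_content_id({'Content-ID': '<14.3631871432@web82503.mail.mud.yahoo.com>'}))
print(get_content_id({'Content-Id': '<0635428E-AE25-4FA0-93AC-6B8379300161>'}))
```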
Then the rest is easy. Assuming I've built up a list called subfiles
containing dictionaries with 'filename' and 'Content-Id', I can
do the substitution in the HTML source:
htmlsrc = html_part.get_payload(decode=True)
for sf in subfiles:
    htmlsrc = re.sub('cid: ?' + sf['Content-Id'],
                     'file://' + sf['filename'],
                     htmlsrc, flags=re.IGNORECASE)
Then all I have to do is hook it up to a key in my .muttrc:
# macro index <F10> "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"
# macro pager <F10> "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"
Works nicely! Here's the complete script:
viewhtmlmail.
Someone on the gimp-developers list asked whether there was
documentation of GIMP's menu hooks.
I wasn't sure what they meant by "hooks", but GIMP menus do have
an interesting feature that plug-in writers should know about:
placeholders.
Placeholders let you group similar types of actions together.
For instance, ever notice that in the image window's File menu,
all the things that Open images are grouped together? There's Open,
Open as Layers..., Open Location... and Open Recent. And then
there's a group of Save actions all grouped together -- Save, Save
As..., Save a Copy... and so forth. That's because there's a
placeholder for Open in the File menu, and another placeholder
for Save.
When you write your own plug-ins, you can take advantage of these
placeholders. For instance, I want my Save/Export clean plug-in to
show up next to the other Save menu items, not somewhere else down
near the bottom of the menu -- so when I register it, I pass
menu = "<Image>/File/Save/" so GIMP knows to group
it with the other Save actions, even though it's directly in the File
menu, not a submenu called Save.
Pretty slick, huh? But how do you know what placeholders are available?
I took a look at the source.
In the menus/ subdirectory are all the menu definitions in XML,
and they're pretty straightforward.
In image-menu.xml you'll see things like
<placeholder name="Open">, <placeholder name="Save">
So to get a list of all the menu placeholders, you just need to
find all the "<placeholder" lines:
grep '<placeholder' menus/*.xml
That's not actually so useful, though, because it doesn't tell you
what submenu contains the placeholder. For instance, Acquire is a
placeholder but you need to know that it's actually
File->Create->Acquire. So let's be a little more clever.
We want to see <menu lines as well as <placeholder lines,
but not <menuitem since those are just individual menu entries.
egrep '<(placeholder|menu) ' will do that.
Then pass it through some sed expressions to clean up the output,
loop over all the XML files, and I ended up with:
for f in *.xml; do
    echo $f
    egrep '<(placeholder|menu) ' $f | sed -e 's_<placeholder *name="_** _' -e 's_<menu.*name="__' -e 's_"/*>__'
done
It isn't perfect: a few lines still show up that
shouldn't -- but it'll get you the list you need. Fortunately the
GIMP developers are very good about things like code formatting,
so the indentation of the file shows
which placeholder is inside which submenu.
I only found placeholders in the image window menu, plus a single
placeholder, "Outline", in the selection menu popup.
I'm a little confused about that menu file: it seems to duplicate
the existing Select menu in the image-menu.xml, except that
the placeholder items in question -- feather, sharpen, shrink, grow,
and border -- are in a placeholder called Outline in
selection-menu.xml, but in a placeholder called Modify
in image-menu.xml.
Anyway, here's the full list of placeholders, cleaned up for readability.
Placeholders are in bold and followed with an asterisk *.
Python on Android. Wouldn't that make so many things so much easier?
I've known for a long time about
SL4A, but
when I read, a year or two ago, that Google officially disclaimed
support for languages other than Java and C and didn't want their
employees working on projects like SL4A, I decided it wasn't a good bet.
But recently I heard from someone who had just discovered SL4A and
its Python support and talked about it like a going thing. I had an
Android scripting problem I really wanted to solve, and decided it
was time to take another look.
It turns out SL4A and its Python interpreter are still being
maintained, and indeed, I was able to solve my problem that way.
But the documentation was scanty at best. So here are some shortcuts.
Getting Python running on Android
How do you install it in the first place? Took me three or four tries:
it turns out it's extremely picky about the order in which you do
things, and the documentation doesn't warn you about that.
Follow these steps:
Enable "Unknown Sources" under Application settings if you haven't already.
Download both sl4a_r6.apk and PythonForAndroid_r4.apk
Install sl4a from the apk. Do not install Python yet.
Find SL4A in Applications and run it. It will say "no matches found"
(i.e. no scripts)
but that's okay: the important thing is that it creates the directory
where the scripts will live,
/sdcard/sl4a/scripts, without which PythonForAndroid would fail to install.
Install PythonForAndroid from the apk.
Find Python for Android in Applications and run it. Tap Install.
This will install the sample scripts, and you'll be ready to go.
Make a shortcut on the home screen:
You've written a script and it does what you want. But to run it, you
have to run SL4A, choose the Python interpreter, scroll around to find
the script, tap on it, and indicate whether or not you want to see
the console. Way too many steps!
Turns out you can make a shortcut on the home screen to an SL4A
script, like this:
(thanks to this
tip):
Hit the add icon button ("+") on the main screen.
tap on Shortcuts
scroll down to Scripts
choose your script
choose the icon indicating whether you want to show the console
while the script is running
This will give you the familiar twin-snake Python icon on your home screen.
There doesn't seem to be any way to change this to a different icon.
Wait, what about UI?
Well, that still seems to be a big hole in the whole SL4A model.
You can write great scripts that print to the console. You can even
do a few specialized things, like popup menus, messages (what the
Python Android module calls makeToast()) and notifications.
The test.py sample script is a great illustration of how
to use all those features, plus a lot more.
But what if you want to show a window, put a few buttons in it,
let the user control things? Nobody seems to have thought about
that possibility. I mean, it's not "sorry, we haven't had time to
implement this", it isn't even mentioned as something someone would
want to do on an Android device. Boggle.
The only possibility I've found is that there is apparently a way to use
Android's
WebView class from Python.
I have not tried this yet; when I do, I'll write it up separately.
WebView may not be the best way to do UI. I've spent many hours
tearing my hair out over its limitations even when called from Java.
But still, it's something. And one very interesting thing about it
is that it provides an easy way to call up an HTML page, either local
or remote, from an Android home screen icon. So that may be the best
reason yet to check out SL4A.
It was pretty cool at first, but pasting every address into the
latitude/longitude web page and then pasting the resulting coordinates
into the address file, got old, fast.
That's exactly the sort of repetitive task that computers are supposed
to handle for us.
The lat/lon page used Javascript and the Google Maps API,
and I already had a Google Maps API key (they have all sorts of fun
APIs for map geeks) ... but I really wanted something
that could run locally, reading and converting a local file.
And then I discovered the
Python googlemaps
package. Exactly what I needed! It's in the Python Package Index,
so I installed it with pip install googlemaps.
That enabled me to change my
waymaker
Python script: if the first line of a
description wasn't a latitude and longitude, instead it looked for
something that might be an address.
Addresses in my data files might be one line or might be two,
but since they're all US addresses, I know they'll end with a
two-capital-letter state abbreviation and a 5-digit zip code:
2948 W Main St. Anytown, NM 12345.
You can find that with a regular expression:
match = re.search('.*[A-Z]{2}\s+\d{5}$', line)
But first I needed to check whether the first line of the entry was already
latitude/longitude coordinates, since I'd already converted some of
my files. That uses another regular expression. Python doesn't seem
to have a built-in way to search for generic numeric expressions
(containing digits, decimal points or +/- symbols) so I made one,
since I had to use it twice if I was searching for two numbers with
whitespace between them.
numeric = '[\+\-\d\.]'
match = re.search('^(%s+)\s+(%s+)$' % (numeric, numeric), line)
(For anyone who wants to quibble, I know the regular expression
isn't perfect.
For instance, it would match expressions like 23+48..6.1-64.5.
Not likely to be a problem in these files, so I didn't tune it further.)
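Putting the two regular expressions together, the decision logic looks roughly like this (a Python 3 sketch of the relevant part of waymaker, with a return convention I made up for illustration):

```python
import re

# Character class for numbers: digits, signs, decimal points.
numeric = r'[\+\-\d\.]'

def parse_first_line(line):
    """Classify the first line of an entry.  Return
    ('coords', lat, lon) for a latitude/longitude pair,
    ('address', line) for a line ending in a two-letter state
    abbreviation and a 5-digit zip code, or (None,) otherwise."""
    line = line.strip()
    m = re.search(r'^(%s+)\s+(%s+)$' % (numeric, numeric), line)
    if m:
        return ('coords', float(m.group(1)), float(m.group(2)))
    if re.search(r'[A-Z]{2}\s+\d{5}$', line):
        return ('address', line)
    return (None,)

print(parse_first_line("35.1983  -111.6513"))
print(parse_first_line("2948 W Main St. Anytown, NM 12345"))
```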
If the script doesn't find coordinates
but does find something that looks like an address, it feeds the
address into Google Maps and gets the resulting coordinates.
That code looks like this:
import googlemaps
from googlemaps import GoogleMaps

gmaps = GoogleMaps('YOUR GOOGLE MAPS API KEY HERE')
try:
    lat, lon = gmaps.address_to_latlng(addr)
except googlemaps.GoogleMapsError, e:
    print "Oh, no! Couldn't geocode", addr
    print e
Overall, a nice simple solution made possible with python-googlemaps.
The full script is on github:
waymaker.
Dave and I have been doing some exploratory househunting trips,
and one of the challenges is how to maintain a list of houses and
navigate from location to location. It's basically like geocaching,
navigating from one known location to the next.
Sure, there are smartphone apps to do things like "show houses for
sale near here" against a Google Maps background. But we didn't want
everything, just the few gems we'd picked out ahead of time.
And some of the places we're looking are fairly remote -- you can't
always count on a consistent signal everywhere as you drive around,
let alone a connection fast enough to download map tiles.
Fortunately, I use a wonderful open-source Android program called
OsmAnd.
It's the best, bar none, at offline mapping: download data files
prepared from OpenStreetMap
vector data, and you're good to go, even into remote areas with no
network connectivity. It's saved our butts more than once exploring
remote dirt tracks in the Mojave. And since the maps come from
OpenStreetMap, if you find anything wrong with the map, you can fix it.
So the map part is taken care of. What about that list of houses?
Making waypoint files
On the other hand, one of OsmAnd's many cool features is that it can
show track logs. I can upload a GPX file from my Garmin, or record a
track within OsmAnd, and display the track on OsmAnd's map.
GPX track files can include waypoints. What if I made a GPX file
consisting only of waypoints and descriptions for each house?
My husband was already making text files of potentially interesting houses:
404 E David Dr
Flagstaff, AZ 86001
$355,000
3 Bed 2 Bath
1,673 Sq Ft
0.23 acres
http://blahblah/long_url
2948 W Wilson Dr
Flagstaff, AZ 86001
$285,000
3 Bed 2 Bath
1,908 Sq Ft
8,000 Sq Ft Lot
http://blahblah/long_url
... (and so on)
So I just needed to turn those into GPX.
GPX is a fairly straightforward XML format -- I've parsed GPX files
for pytopo
and for ellie,
and generating them from Python should be easier than parsing.
But first I needed latitude and longitude coordinates.
A quick web search solved that: an excellent page called
Find
latitude and longitude with Google Maps.
You paste the address in and it shows you the location on a map
along with latitude and longitude. Thanks to Bernard Vatant at Mondeca!
For each house, I copied the coordinates directly from the page
and pasted them into the file. (Though that got old after about the fifth
house; I'll write about automating that step in a separate article.)
Then I wrote a script called
waymaker
that parses a file of coordinates and descriptions and makes waypoint files.
Run it like this: waymaker infile.txt outfile.gpx
and it will create (or overwrite) a gpx file consisting of those waypoints.
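The GPX generation itself is straightforward. Here's a minimal sketch (not the real waymaker; the coordinates and description are made up for illustration):

```python
from xml.sax.saxutils import escape

def write_gpx(waypoints, outpath):
    """Write a minimal GPX 1.1 file of waypoints.
    waypoints is a list of (lat, lon, description) tuples."""
    with open(outpath, 'w') as fp:
        fp.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fp.write('<gpx version="1.1" creator="waymaker-sketch" '
                 'xmlns="http://www.topografix.com/GPX/1/1">\n')
        for lat, lon, desc in waypoints:
            fp.write('  <wpt lat="%.6f" lon="%.6f">\n' % (lat, lon))
            # escape() protects &, < and > in the description
            fp.write('    <name>%s</name>\n' % escape(desc))
            fp.write('  </wpt>\n')
        fp.write('</gpx>\n')

write_gpx([(35.198284, -111.651302, "404 E David Dr: $355,000, 3 Bed 2 Bath")],
          "houses.gpx")
```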
Getting it into OsmAnd
I plugged my Android device into my computer's USB port, mounted it as
usb-storage and copied all the GPX files into osmand/tracks
(I had to create the tracks subdirectory myself, since I hadn't
recorded any tracks.) After restarting OsmAnd, it was able to see all
the waypoint files.
OsmAnd has a couple of other ways of showing points besides track files.
"Favorites" lets you mark a point on the map and save it to various
Favorites categories. But although there's a file named favorites.gpx,
changes you make to it never show up in the program. Apparently they're
cached somewhere else. "POI" (short for Points of Interest) can be
uploaded, but only as a .obf OsmAnd file or a .sqlitedb database, and
there isn't much documentation on how to create either one.
GPX tracks seemed like the easiest solution, and I've been happy
with them so far.
Update: I asked on the osmand mailing list; it turns out that on the
Favorites screen (Define View, then Favorites) there's a Refresh
button that makes osmand re-read favorites.gpx. Works great.
It uses pretty much the same format as track files -- I took
<wpt></wpt> sequences I'd generated with waymaker and
added them to my existing favorites.gpx file, adding appropriate
categories. It's nice to have two different ways to display and
categorize waypoints within the app.
Using waypoints in OsmAnd
How do you view these waypoints once they're loaded?
When you're in OsmAnd's map view, tap the menu button and choose
Define View, then GPX track...
You'll see a list of all your GPX files; choose the one you want.
You'll be taken back to the map view,
at a location and zoom level that shows all your waypoints. Don't
panic if you don't see them immediately; sometimes I needed
to scroll and zoom around a little before OsmAnd noticed there were
waypoints and started drawing them.
Then you can navigate in the usual way. When you get to a waypoint,
tap on it to see the description briefly -- I was happy to find that
multiple line descriptions work just fine. Or long-press on it to pop up a
persistent description window that will stay up until you dismiss it.
It worked beautifully for our trip, both for houses and for
other things like motels and points of interest along the way.
Wednesday I taught my "Robotics and Sensors" workshop at
the SWE GetSET summer camp.
It was lots of fun, and definitely better than last year. It helped that
I had a wonderful set of volunteers helping out -- five women from
CodeChix (besides myself), so we had
lots of programming expertise, plus a hardware engineer who was
wonderfully helpful with debugging circuits. Thanks so much to all the
volunteers! You really made the workshop!
We also had a great group of girls -- 14 high school seniors, all smart
and motivated, working in teams of two.
How much detail?
One big issue when designing a one-day programming workshop is how
much detail to provide in each example, and how much to leave to the
students to work out. Different people learn differently. I'm the sort
who learns from struggling through a problem, not from simply copying
an example, and last year I think I erred too much in that direction,
giving minimal information and encouraging the girls to work out the rest.
Some of them did fine, but others found it frustrating. In a one-day
workshop, if you have to spend too much time working everything out,
you might never get to the fun stuff.
So this year I took a different approach. For each new piece of hardware,
I gave them one small, but complete, working example, then suggested
ways they could develop that. So for the first example
(File->Examples->Basic->Blink is everyone's first
Arduino exercise), I gave everyone two LEDs and two resistors, and
as soon as they got their first LED blinking, I encouraged them to
try adding another.
It developed that about half the teams wired their second
LED right next to the first one, still on pin 13. Clever! but not what
I'd had in mind.
So I encouraged them to try moving the second LED to a different pin,
like pin 12, and see if they could make one LED turn on while the
other one turned off.
Another challenge with workshops is that people work at very different
speeds. You have to have projects the fast students can work on to keep them
from getting bored while the rest are catching up. So for LEDs, having
a box full of extra LEDs helped, and by the time we were ready to move on,
they had some great light shows going -- tri-colored blinkers, fast
flashers, slow double-blinks.
I had pushbuttons on the tentative agenda but I was pretty sure that
we'd skip that part. Pushbuttons are useful but they aren't really all
that much fun. You have to worry about details like pull-down resistors
and debouncing, too much detail when you have only six hours total.
Potentiometers are more rewarding. We went through
File->Examples->03.Analog->AnalogInput,
and a few teams also tried LED fading with
File->Examples->03.Analog->AnalogInOutSerial.
Music
But then we moved on to what was really the highlight of the day,
piezo speakers.
Again, I provided a small working
example
program to create a rising tone. The Arduino IDE has no good
speaker examples built in, so I'd made a short url for my
Robots and Sensors
workshop page, is.gd/getset, to make it easy to
copy/paste code. It took no time at all before their speakers were
making noise.
I was afraid they'd just stop there ...
but as it turned out, everybody was energized
(including me and the other volunteers) by all the funny noises,
and without any prompting the girls immediately got to work changing
their tones, making them rise faster or slower, or (with some help
from volunteers) making them fall instead of rise. Every team had
different sounds, and everybody was laughing and having fun as they
tweaked their code.
In fact, that happened so fast that we ended up with plenty of time
left before lunch. My plan was to do speakers right before lunch because
noise is distracting, and after you've done that you can't
concentrate on anything else for a while. So I let them continue to
play with the speakers.
I was glad I did. At least three different teams took the initiative
to search the web and find sample code for playing music.
There were some hitches -- a lot of the code samples needed to be
tweaked a bit, from changing the pin where the speaker was plugged in,
to downloading an include file of musical notes. One page gave code
that didn't compile at all. But it was exciting to watch -- after all,
this sort of experimentation and trial-and-error is a big part
of what programmers do, and they all eventually got their music projects
working.
One thing I learned was that providing a complete working
.ino file makes a big difference. Some of the "music on Arduino"
pages the girls found provided C functions but no hints as to how
to call those functions. (It wasn't obvious to me, either.)
Some of my own examples for the afternoon projects were like that,
providing code snippets without setup() and loop(), and some teams
were at sea, unsure how to create setup() and loop(). Of course
I'd explained about setup() and loop() during the initial blink
exercise. But considering how much material we covered in such a short
time, it's not reasonable to expect everybody to remember details like
that. And the Arduino IDE error messages aren't terribly easy to read,
especially showing up orange on black in a tiny 3-line space at the
bottom of the window.
So, for future workshops, I'll provide complete .ino files for all my
own examples, plus a skeleton file with an empty setup() and loop()
already there.
It's okay to spoon feed basic details like the structure of an .ino
file if it gives the students more time to think about the really
interesting parts of their project.
Afternoon projects
After lunch, the afternoon was devoted to projects. Teams could pick
anything we had hardware for, work on it throughout the afternoon and
present it at the end of the workshop. There were two teams working on
robotic cars (sadly, as with so many motor projects, the hardware
ended up being too flaky and the cars didn't do much).
Other teams worked with sonar rangefinders, light sensors or tilt
switches, while some continued to work on their lights and music.
Everybody seemed like they were having a good time, and I'd seen a lot of
working (or at least partly working) projects as I walked around
during the afternoon, but when it came to present what they'd done,
I was a little sad.
There was a lot of "Well, I tried this, but I couldn't get it to work,
so then I switched to doing this." Of course, trying things and
changing course are also part of engineering ... that sentence
describes a lot of my own playing with hardware, now that I think of
it. But still ... I was sad hearing it.
Notes for next time
So, overall, I was happy with the workshop. I haven't seen the evaluation
forms yet, but it sure seemed like everybody was having fun,
and I know we volunteers did.
What are the points I want to remember for next time?
Start with small but complete working examples to introduce each
new hardware component.
Provide complete .ino files, not just code snippets.
Skip pushbuttons, but do try to cover AnalogInOutSerial and PWM output.
Or at least have printed handouts explaining the PWM outputs and LED fading.
Turnkey kits are good: the less "connect the blue wire to pin 7,
the green one to pin 8" the better. For things like cars, I'd
like something already wired up with battery and shield,
"Just add Arduino".
Keep a closer eye on the afternoon projects -- try to make sure
each team has something they're proud to show off.
Thanks again to the great volunteers! I'm looking forward to giving
this workshop again.
For years I've used bookmarklets to shorten URLs.
For instance, with is.gd, I set up a bookmark to
javascript:document.location='http://is.gd/create.php?longurl='+encodeURIComponent(location.href);,
give it a keyword like isgd, and then when I'm on a page
I want to paste into Twitter (the only reason I need a URL shortener),
I type Ctrl-L (to focus the URL bar) then isgd and hit return.
Easy.
But with the latest rev of Firefox (I'm not sure if this started with
version 20 or 21), sometimes javascript: links don't work. They just
display the javascript source in the URLbar rather than executing it.
Lacking a solution to the Firefox problem, I still needed a way of
shortening URLs. So I looked into Python solutions.
It turns out there are a few URL shorteners with public web APIs.
is.gd is one of them; shorturl.com is another.
There are also APIs for bit.ly and goo.gl if you don't mind
registering and getting an API key. Given that, it's pretty easy
to write a Python script.
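The heart of such a script is tiny. Here's a minimal sketch (Python 3; the function names are mine, not from my actual script -- is.gd's format=simple parameter asks it to return just the short URL as plain text):

```python
# Minimal is.gd shortener sketch (Python 3; function names are my own).
# Passing format=simple makes is.gd return the bare short URL as text.
import urllib.parse
import urllib.request

def isgd_request_url(longurl):
    """Build the is.gd API request URL for a given long URL."""
    return ('https://is.gd/create.php?format=simple&url='
            + urllib.parse.quote(longurl, safe=''))

def shorten(longurl):
    """Ask is.gd to shorten longurl; return the short URL as a string."""
    with urllib.request.urlopen(isgd_request_url(longurl)) as resp:
        return resp.read().decode().strip()
```

Reading and setting the X selection on top of that can be done with a GUI toolkit or by shelling out to a clipboard tool.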
In the browser, I select the URL I want (e.g. by doubleclicking in
the URLbar, or by right-clicking and choosing
"Copy link location"). That puts the URL in the X selection.
Then I run the shorturl script, with no arguments. (I have it
in my window manager's root menu.)
shorturl reads the X selection and shortens the URL (it tries is.gd
first, then shorturl.com if is.gd doesn't work for some reason).
Then it pops up a little window showing me both the short URL and the
original long one, so I can be sure I shortened the right thing.
(One thing I don't like about a lot of the URL services is that
they don't tell you the original URL; I only find out later that
I tweeted a link to something that wasn't at all the link I intended
to share.)
It also copies the short URL into the X selection, so after verifying
that the long URL was the one I wanted, I can go straight to my Twitter
window (in my case, a Bitlbee tab in my IRC client) and middleclick
to paste it.
After I've pasted the short link, I can dismiss the window by typing q.
Don't type q too early -- since the python script owns the X selection,
you won't be able to paste it anywhere once you've closed the window.
(Unless you're running a selection-managing app like klipper.)
I just wish there were some way to use it for Twitter's own shortener,
t.co. It's so frustrating that Twitter makes us all shorten URLs to
fit in 140 characters just so they can shorten them again with their
own service -- in the process removing any way for readers to see where
the link will go. Sorry, folks -- nothing I can do about that.
Complain to Twitter about why they won't let anyone use t.co directly.
When I'm working with an embedded Linux box -- a plug computer, or most
recently with a Raspberry Pi -- I usually use GNU screen as my
terminal program.
screen /dev/ttyUSB0 115200 connects to the appropriate
USB serial port at the appropriate speed, and then you can log in
just as if you were using telnet or ssh.
With one exception: the window size. Typically everything is fine
until you use an editor, like vim. Once you fire up an editor, it
assumes your terminal window is only 24 lines high, regardless of
its actual size. And even after you exit the editor, somehow your
window will have been changed so that it scrolls at the 24th line,
leaving the bottom of the window empty.
Tracking down why it happens took some hunting.
There are lots of different places the
screen size can be set. Libraries like curses can ask the terminal
its size (but apparently most programs don't). There's a size built
into most terminfo entries (specified by the TERM environment
variable) -- but it's not clear that gets used very much any more.
There are environment variables LINES and COLUMNS,
and a lot of programs read those; but they're often unset, and even if
they are set, you can't trust them. And setting any of these didn't
help -- I could change TERM and LINES and COLUMNS all I wanted, but
as soon as I ran vim the terminal would revert to that
scrolling-at-24-lines behavior.
In the end it turned out the important setting was the tty setting.
You can get a summary of what the tty driver thinks its size is:
% stty size
32 80
But to set it, you use rows and columns rather than
size.
I discovered I could type stty rows 32 (or whatever my
current terminal size was), and then I could run vim and it would stay
at 32 rather than reverting to 24. So that was the important setting vim
was following.
The basic problem was that screen, over a serial line, doesn't have a
protocol for passing the terminal's size information, the way
a remote login program like ssh, rsh or telnet does. So how could
I get my terminal size set appropriately on login?
Auto-detecting terminal size
There's one program that will do it for you, which I remembered
from the olden days of Unix, back before programs like telnet had this
nice size-setting built in. It's called resize, and on Debian,
it turned out to be part of the xterm package.
That's actually okay on my current Raspberry Pi, since I have X
libraries installed in case I ever want to hook up a monitor.
But in general, a little embedded Linux box shouldn't need X,
so I wasn't very satisfied with this solution. I wanted something with
no X dependencies. Could I do the same thing in Python?
How it works
Well, as I mentioned, there are ways of getting the size of the
actual terminal window, by printing an escape sequence and parsing
the result.
But finding the escape sequence was trickier than I expected. It isn't
written about very much. I ended up running script and
capturing the output that resize sent, which seemed a little crazy:
'\e[7\e[r\e[999;999H\e[6n' (where \e means the escape character).
Holy cow! What are all those 999s?
Apparently what's going on is that there isn't any sequence to ask
xterm (or other terminal programs) "What's your size?" But there is
a sequence to ask, "Where is the cursor on the screen right now?"
So what you do is send a sequence telling it to go to row 999 and
column 999; and then another sequence asking "Where are you really?"
Then read the answer: it's the window size.
(Note: if we ever get monitors big enough for 1000x1000 terminals,
this will fail. I'm not too worried.)
Reading the answer
Okay, great, we've asked the terminal where it is, and it responds.
How do we read the answer?
That was actually the trickiest part.
First, you have to write to /dev/tty, not just stdout.
Second, you need the output to be available for your program to read,
not just echo in the terminal for the user to see. Setting the tty
to noncanonical mode
does that.
Third, you can't just do a normal blocking read of stdin -- it'll
never return. Instead, put stdin into non-blocking mode and use
select()
to see when there's something available to read.
And of course, you have to make sure you reset the terminal back
to normal canonical line-buffered mode when you're done, whether
or not your read succeeds.
Once you do all that, you can read the output, which will look
something like "\e[32;80R". The two numbers, of course, are the
lines and columns values you want; ignore the rest.
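Putting those steps together looks roughly like this (a sketch of the technique, not the exact code from my script; it assumes a Unix tty and Python 3):

```python
# Sketch of the size query: noncanonical mode, cursor-position query,
# select() to read the reply. Not the exact code from the real script.
import os
import re
import select
import termios

def parse_cursor_reply(reply):
    """Pull (rows, cols) out of a reply like b'\\x1b[32;80R'."""
    m = re.search(rb'\[(\d+);(\d+)R', reply)
    return (int(m.group(1)), int(m.group(2))) if m else None

def terminal_size(timeout=2):
    fd = os.open('/dev/tty', os.O_RDWR)
    saved = termios.tcgetattr(fd)
    try:
        # Noncanonical mode, no echo: the reply comes to us,
        # not the screen.
        attrs = termios.tcgetattr(fd)
        attrs[3] &= ~(termios.ICANON | termios.ECHO)
        termios.tcsetattr(fd, termios.TCSANOW, attrs)
        # Try to move to row 999, column 999, then ask where the
        # cursor really ended up. (A real implementation should also
        # save and restore the cursor position, as resize does.)
        os.write(fd, b'\x1b[999;999H\x1b[6n')
        reply = b''
        while select.select([fd], [], [], timeout)[0]:
            reply += os.read(fd, 32)
            if b'R' in reply:
                break
        return parse_cursor_reply(reply)
    finally:
        # Always restore canonical line-buffered mode.
        termios.tcsetattr(fd, termios.TCSANOW, saved)
        os.close(fd)
```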
stty in python
Oh, yes, and one other thing: once you've read the terminal size,
how do you set the stty size appropriately? Running
os.system('stty rows %d' % rows) seems like it should work,
but it doesn't, probably because it's using stdout instead of /dev/tty.
But I did find one way to do it, the enigmatic:
Update, 2017:
Turns out this doesn't quite work in Python 3, but I've updated the
script, so use the code in the script rather than copying and pasting
from this article. The explanation of the basic method hasn't changed.
Checking versions in Debian-based systems is a bit of a pain.
This happens to me a couple of times a month: for some reason I need
to know what version of something I'm currently running -- often a
library, like libgtk. aptitude show
will tell you all about a package -- but only if you know its exact name.
You can't do aptitude show libgtk or even
aptitude show '*libgtk*' -- you have to know that the
package name is libgtk2.0-0. Why is it libgtk2.0-0? I have no idea,
and it makes no sense to me.
So I always have to do something like
aptitude search libgtk | egrep '^i' to find out what
packages I have installed that match the name libgtk, find the
package I want, then copy and paste that name after typing
aptitude show.
But it turns out it's super easy in Python to query Debian packages using the
Python
apt package. In fact, this is all the code you need:
import sys
import apt
cache = apt.cache.Cache()
pat = sys.argv[1]
for pkgname in cache.keys():
    if pat in pkgname:
        pkg = cache[pkgname]
        instver = pkg.installed
        if instver:
            print pkg.name, instver.version
Then run aptver libgtk and you're all set.
In practice, I wanted nicer formatting, with columns that lined up, so
the actual script is a little longer. I also added a -u flag to show
uninstalled packages as well as installed ones. Amusingly, the code to
format the columns took about twice as many lines as the code that does the
actual work. There doesn't seem to be a standard way of formatting
columns in Python, though there are lots of different implementations
on the web. Now there's one more -- in my
aptver
on github.
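For the curious, a minimal version of that column formatting looks something like this (my own sketch, not the code from aptver, which is fancier):

```python
# Simple two-column alignment: measure the widest name, then pad.
# (My own sketch; aptver's formatting code is more elaborate.)
def format_columns(pairs):
    """pairs is a list of (name, version) tuples; returns one
    aligned string with one package per line."""
    width = max(len(name) for name, _ in pairs)
    return '\n'.join('%-*s %s' % (width, name, ver)
                     for name, ver in pairs)
```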
We've been considering the possibility of moving out of the Bay Area
to somewhere less crowded, somewhere in the desert southwest we so
love to visit. But that also means moving to somewhere
with much harsher weather.
How harsh? It's pretty easy to search for a specific location and get
average temperatures. But what if I want to make a table to compare
several different locations? I couldn't find any site that made
that easy.
No problem, I say. Surely there's a Python library, I say.
Well, no, as it turns out. There are Python APIs to get the current
weather anywhere; but if you want historical weather data, or weather
data averaged over many years, you're out of luck.
NOAA purports to have historical climate data, but the only dataset I
found was spotty and hard to use. There's an
FTP site containing
directories by year; inside are gzipped files with names like
723710-03162-2012.op.gz. The first two numbers are station numbers,
and there's a file at the top level called ish-history.txt
with a list of the station codes and corresponding numbers.
Not obvious!
Once you figure out the station codes, the files themselves are easy to
parse, with lines like
Each line represents one day (20120101 is January 1st, 2012),
and the codes are explained in another file called
GSOD_DESC.txt.
For instance, MAX is the daily high temperature, and SNDP is snow depth.
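Just to illustrate how mechanical the decoding is, here's a sketch (the function names are mine, and I'm assuming the two leading numbers are USAF and WBAN station identifiers; the real script does considerably more):

```python
# Sketch of decoding the NOAA GSOD naming conventions.
# (Function names are mine; "station" and "wban" are my guesses at
# what the two leading numbers mean.)
import re
from datetime import datetime

def parse_gsod_filename(fname):
    """Split a name like '723710-03162-2012.op.gz' into
    (station, wban, year) strings."""
    m = re.match(r'(\d{6})-(\d{5})-(\d{4})\.op\.gz$', fname)
    return m.groups() if m else None

def parse_yearmoda(s):
    """Turn a YEARMODA field like '20120101' into a date."""
    return datetime.strptime(s, '%Y%m%d').date()
```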
So all I needed was to decode the station names, download the right files
and parse them. That took about a day to write (including a lot of
time wasted futzing with mysterious incantations for matplotlib).
Little accessibility refresher: I showed it to Dave -- "Neat, look at
this, San Jose is the blue pair, Flagstaff is green and Page is red."
His reaction:
"This makes no sense. They all look the same to me. I have no idea
which is which."
Oops -- right. Don't use color as your only visual indicator. I knew that,
supposedly! So I added markers in different shapes for each site.
(I wish somebody would teach that lesson to Google Maps, which uses
color as its only indicator on the traffic layer, so it's useless
for red-green colorblind people.)
Back to the data --
it turns out NOAA doesn't actually have that much historical data
available for download. If you search on most of these locations,
you'll find sites that claim to have historical temperatures dating
back 50 years or more, sometimes back to the 1800s. But NOAA typically
only has files starting at about 2005 or 2006. I don't know where
sites are getting this older data, or how reliable it is.
Still, averages since 2006 are still interesting to compare.
Here's a run of noaatemps.py KSJC KFLG KSAF KLAM KCEZ KPGA KCNY.
It's striking how moderate California weather is compared
to any of these inland sites. No surprise there. Another surprise
was that Los Alamos, despite its high elevation, has more moderate weather
than most of the others -- lower highs, higher lows. I was a bit
disappointed at how sparse the site list was -- no site in Moab?
Really? So I used Canyonlands Field instead.
Anyway, it's fun for a data junkie to play around with, and it prints
data on other weather factors, like precipitation and snowpack, although
it doesn't plot them yet.
The code is on my
GitHub
scripts page, under Weather.
Anyone found a better source for historical weather information?
I'd love to have something that went back far enough to do some
climate research, see what sites are getting warmer, colder, or
seeing greater or lesser spreads between their extreme temperatures.
The NOAA dataset obviously can't do that, so there must be something
else that weather researchers use. Data on other countries would be
interesting, too. Is there anything that's available to the public?
One of the closing lightning talks at PyCon this year concerned the answers
to a list of Python programming puzzles given at some other point during
the conference. I hadn't seen the questions (I'm still not sure
where they are), but some of the problems looked fun.
One of them was: "What are the letters not used in Python keywords?"
I hadn't known about Python's keyword module, which could
come in handy some day:
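A first look at what it offers (Python 3 shown here):

```python
# The keyword module knows the language's reserved words.
import keyword

print(keyword.kwlist[:5])         # the start of the full keyword list
print(keyword.iskeyword('and'))   # True
print(keyword.iskeyword('spam'))  # False
```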
But first you need a list of letters so you can make a set out of it.
Split the list of words into a list of letters
My first idea was to use list comprehensions. You can split a single
word into letters like this:
>>> [ x for x in 'hello' ]
['h', 'e', 'l', 'l', 'o']
It took a bit of fiddling to get the right syntax to apply that to
every word in the list:
>>> [[c for c in w] for w in keyword.kwlist]
[['a', 'n', 'd'], ['a', 's'], ['a', 's', 's', 'e', 'r', 't'], ... ]
Update: Dave Foster points out that
[list(w) for w in keyword.kwlist] is another,
simpler and cleaner way than the double list comprehension.
That's a list of lists, so it needs to be flattened into a single
list of letters before we can turn it into a set.
Flatten the list of lists
There are lots of ways to flatten a list of lists.
Here are four of them:
[item for sublist in [[c for c in w] for w in keyword.kwlist] for item in sublist]
reduce(lambda x,y: x+y, [[c for c in w] for w in keyword.kwlist])
import itertools
list(itertools.chain.from_iterable([[c for c in w] for w in keyword.kwlist]))
sum([[c for c in w] for w in keyword.kwlist], [])
But it turns out none of this list comprehension stuff is needed anyway.
set('word') splits words into letters already:
>>> set('bubble')
set(['e', 'b', 'u', 'l'])
Ignore the order -- elements of a set often end up displaying in some
strange order. The important thing is that it has all the letters
and no repeats.
Now we have an easy way of making a set containing the letters in
one word. But how do we apply that to a list of words?
Again I initially tried using list comprehensions, then realized
there's an easier way. Given a list of strings, it's trivial to
join them into a single string using ''.join(). And that gives us
our set of letters within keywords:
Almost done! But the original problem was to find the letters not in
keywords. We can do that by subtracting this set from the set of all
letters from a to z. How do we get that? The string
module will give us a list:
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'
You could also use a list comprehension and ord and
chr (alas, range won't give you a range of
letters directly):
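Putting all the pieces together (a Python 3 sketch; string.lowercase became string.ascii_lowercase in Python 3):

```python
# The whole puzzle (Python 3: string.lowercase is now
# string.ascii_lowercase).
import keyword
import string

used = set(''.join(keyword.kwlist))
alphabet = set(string.ascii_lowercase)
# The ord/chr alternative mentioned above:
# alphabet = set(chr(c) for c in range(ord('a'), ord('z') + 1))
unused = alphabet - used
print(''.join(sorted(unused)))
```

The exact answer depends on your Python version, since keywords come and go between releases.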
I'm at PyCon, and I spent a lot of the afternoon in the Raspberry Pi lab.
Raspberry Pis are big at PyCon this year -- because everybody at
the conference got a free RPi! To encourage everyone to play, they
have a lab set up, well equipped with monitors, keyboards, power
and ethernet cables, plus a collection of breadboards, wires, LEDs,
switches and sensors.
I'm primarily interested in the RPi as a robotics controller,
one powerful enough to run a camera and do some minimal image processing
(which an Arduino can't do).
And on Thursday, I attended a PyCon tutorial on the Python image processing
library SimpleCV.
It's a wrapper for OpenCV that makes it easy to access parts of images,
do basic transforms like greyscale, monochrome, blur, flip and rotate,
do edge and line detection, and even detect faces and other objects.
Sounded like just the ticket, if I could get it to work on a Raspberry Pi.
SimpleCV can be a bit tricky to install on Mac and Windows, apparently.
But the README on the SimpleCV
git repository gives an easy 2-line install for Ubuntu. It doesn't
run on Debian Squeeze (though it installs), because apparently it
depends on a recent version of pygame and Squeeze's is too old;
but Ubuntu Pangolin handled it just fine.
The question was, would it work on Raspbian Wheezy? Seemed like a
perfect project to try out in the PyCon RPi lab. Once my RPi was
set up and I'd run an apt-get update, I used
netsurf (the most modern of the lightweight browsers available
on the RPi) to browse to the
SimpleCV
installation instructions.
The first line, a pip install,
failed miserably. Seems that pip likes to put its large downloaded
files in /tmp; and on Raspbian, running off an SD card, /tmp quite
reasonably is a tmpfs, running in RAM. But that means it's quite small,
and programs that expect to be able to use it to store large files
are doomed to failure.
I tried a couple of simple Linux patches, with no success.
You can't rename /tmp to replace it with a symlink to a directory on the
SD card, because /tmp is always in use. And pip makes a new temp directory
name each time it's run, so you can't just symlink the pip location to
a place on the SD card.
I thought about rebooting after editing the tmpfs out of /etc/fstab,
but it turns out it's not set up there, and it wasn't obvious how to
disable the tmpfs. Searching later from home, the size is
set in /etc/default/tmpfs. As for disabling the tmpfs and using the
SD card instead, it's not clear. There's a block of code in
/etc/init.d/mountkernfs.sh that makes that decision; it looks like
symlinking /tmp to somewhere else might do it, or else commenting out
the code that sets RAMTMP="yes". But I haven't tested that.
Instead of rebooting, I downloaded the file to the SD card:
That worked, and the resulting SimpleCV install worked nicely!
I typed some simple tests into the simplecv shell, playing around
with their built-in test image "lenna":
This is an edited and updated version of my "Shallow Sky" column
this month in the
SJAA Ephemeris newsletter.
A few months ago, I got email from a Jupiter observer
calling my attention to an interesting phenomenon of Jupiter's moons
that I hadn't seen before. The person who mailed me described himself
as a novice, and wasn't quite sure what he had seen, but he knew it
was odd. After some further discussion we pinned it down.
He was observing Jupiter on 11/11/12 at 00.25 UT (which would have
been mid-afternoon here in San Jose). Three of the moons were
visible, with only Ganymede missing. Then Ganymede appeared: near
Jupiter's limb, but not right on it. As he watched over the next
few minutes, Ganymede seemed to be moving backward -- in toward Jupiter
rather than away from it. Eventually it disappeared behind the planet.
It turned out that what he was seeing was the end of an eclipse.
Jupiter was still a few months away from opposition, so the shadow
thrown by the planet streamed off to one side as seen from our
inner-planet vantage point on Earth. At 0:26 UT on that evening, long
before he started observing, Ganymede, still far away from Jupiter's
limb, had entered Jupiter's shadow and disappeared into eclipse. It
took over two hours for Ganymede to cross Jupiter's shadow; but at
2:36, when it left the shadow, it hadn't yet disappeared behind the
planet. So it became visible again. It wasn't until 2:50
that Ganymede finally disappeared behind Jupiter.
So it was an interesting effect -- bright Ganymede appearing out of
nowhere, moving in toward Jupiter then disappearing again fourteen
minutes later. It was something I'd never seen, or thought to look for.
It's sort of like playing Whac-a-mole -- the moon appears only
briefly, so you've got to hit it with your telescope at just the right
time if you want to catch it before it disappears again.
A lot of programs don't show this eclipse effect -- including, I'm sad
to say, my own Javascript
Jupiter's moons web page. (I have since remedied that.)
The open source program Stellarium shows the effect; on the web,
Sky and Telescope's Jupiter's Moons page shows it, and even prints out
a table of times of various moon events, including eclipses.
These eclipse events aren't all that uncommon -- but only when the sun
angle is just right.
Searching in late February and early March this year, I found
several events for Ganymede and Europa (though, sadly, many of them were
during our daytime). By mid-March, the angles have changed so that
Europa doesn't leave Jupiter's shadow until after it's disappeared
behind the planet's limb; but Ganymede is farther out, so we can see
Ganymede appearances all the way through March and for months after.
The most interesting view, it seems to me, is right on the boundary
when the moon only appears for a short time before disappearing again.
Like the Europa eclipse that's happening this Sunday night, March 10.
Reporting on that one got a little tricky -- because that's the day we
switch to Daylight Savings time. I have to confess that I got a little
twisted up trying to compare results between programs that use UTC and
programs that use local time -- especially when the time zone converter
I was using to check my math told me "That time doesn't exist!"
Darnit, if we'd all just use UTC all the time, astronomy calculations
would be a lot easier! (Not to mention dropping the silly Daylight
Savings Time fiasco, but that's another rant.)
Before I go into the details, I want to point out that Jupiter's moons
are visible even in binoculars. So even if you don't have a telescope,
grab binoculars and set them up in as steady a way as you can -- if
you don't have a tripod adaptor, try bracing them on the top of a
gate or box.
On Sunday night, March 10, at some time around 7:40 pm PDT,
Europa peeks out from behind Jupiter's northeast limb.
(All times are given in PDT; add 7 hours for GMT.)
The sky will still be bright here in California -- the
sun sets at 7:12 that night -- but Jupiter will be 66 degrees up and
well away from the sun, so it shouldn't give you too much trouble.
Once Europa pops out, keep a close eye on it -- because if Sky & Tel's
calculations are right, it will disappear again just four minutes
later, at 7:44, into eclipse in Jupiter's shadow. It will remain
invisible for almost three hours, finally reappearing out of nowhere,
well off Jupiter's limb, at around 10:24 pm.
I want to stress that those times are approximate. In fact,
I tried simulating the event in several different programs, and got
wildly varying times:
                      Io          Europa    Europa      Europa     Io
                      disappears  appears   disappears  reappears  appears
XEphem                7:15        7:43      7:59        10:06      10:43
S&T Jupiter's Moons   7:16        7:40      7:44        10:24      10:48
Javascript Jupiter    7:17        7:45      7:52        10:15      10:41
Stellarium            6:21        6:49      7:05         9:32      10:01
You'll note Stellarium seems to have a time zone problem ...
maybe because I ran the prediction while we were still in standard time,
not daylight savings time.
I'm looking forward to timing the events to see which program is
most accurate. I'm betting on XEphem. Once I know the real times,
maybe I can adjust my Javascript Jupiter's code to be more accurate.
If anyone else times the event, please send me your times, in case
something goes wrong here!
Anyway, the spread of times makes it clear that when observing this
sort of phenomenon, you should always set up the telescope ten or
fifteen minutes early, just in case. And ten extra minutes spent
observing Jupiter -- even without moons -- is certainly never
time wasted! Just keep an eye out for Europa to appear -- and be
ready to whack that moon before it disappears again.
I'm excited about my new project: MetaPho, an image tagger.
It arose out of a discussion on the LinuxChix Techtalk list:
photo collection management software.
John Sturdy was looking for an efficient way of viewing and tagging
large collections of photos. Like me, he likes fast, lightweight,
keyboard-driven programs. And like me, he didn't want a database-driven
system that ties you forever to one image cataloging program.
I put my image tags in plaintext files, named Keywords, so that
I can easily write scripts to search or modify them, or use grep,
and I can even make quick changes with a text editor.
I shared some tips on how I use my
Pho image viewer
for tagging images, and it sounded close to what he was looking for.
But as we discussed ideas about image tagging, we realized that
there were things he wanted to do that pho doesn't do well, things
not offered by any other image tagger we've been able to find.
While discussing how we might add new tagging functionality to pho,
I increasingly had the feeling that I was trying to fit off-road
tires onto a Miata -- or insert your own favorite metaphor for "making
something do something it wasn't designed to do."
Pho is a great image viewer, but the more I patched it to handle tagging,
the uglier and more complicated the code got, and it also got more
complex to use.
And really, everything we needed for tagging could be easily done in
a Python-GTK application. (Pho is written in C because it does a lot
of complicated focus management to deal with how window managers
handle window moving and resizing. A tagger wouldn't need any of that.)
I whipped up a demo image viewer in a few hours and showed it to John.
We continued the discussion, I made a GitHub repo, and over the next
week or so the code grew into an efficient and already surprisingly usable
image tagger.
We have big plans for it, like tags organized into categories so we
can have lots of tags without cluttering the interface too much.
But really, even as it is, it's better than anything I've used before.
I've been scanning in lots of photos from old family albums
(like this one of my mother and grandmother, and me at 9 months)
and it's been great to be able to add and review tags easily.
If you want to check out MetaPho, or contribute to it (either code or
user interface design), it lives in my
MetaPho
repository on GitHub.
And I wrote up a quick man page in markdown format:
metapho.1.md.
I'm fiddling with a serial motor controller board, trying to get it
working with a Raspberry Pi. (It works nicely with an Arduino, but
one thing I'm learning is that everything hardware-related is far
easier with Arduino than with RPi.)
The excellent Arduino library helpfully
provided by Pololu
has a list of all the commands the board understands. Since it's Arduino,
they're in C++, and look something like this:
... and so on ... with an indent at the beginning of each line since
I want this to be part of a class.
There are 32 #defines, so of course, I didn't want to make all those
changes by hand. So I used vim. It took a little fiddling -- mostly
because I'd forgotten that vim doesn't offer + to mean "one or more
repetitions", so I had to use * instead. Here's the expression
I ended up with:
.,$s/\#define *\([A-Z0-9_]*\) *\(.*\)/    \1 = \2/
In English, you can read this as:
From the current line to the end of the file (.,$),
skip over #define and any spaces after it, then
look for a pattern
consisting of only capital letters, digits and underscores
([A-Z0-9_]*). Save that as expression #1 (\( \)).
Skip over any spaces, then take the rest of the line (.*),
and call it expression #2 (\( \)).
Then replace all that with a new line consisting of 4 spaces,
expression 1, a spaced-out equals sign, and expression 2
( \1 = \2).
Who knew that all you needed was a one-line regular expression to
translate C into Python?
(Okay, so maybe it's not quite that simple.
Too bad a regexp won't handle the logic inside the library as well,
and the pin assignments.)
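To see the same transformation outside vim, here it is with Python's re module (the #define line below is made up for illustration; the real names come from Pololu's header, and re uses plain parentheses for groups where vim needs \( \)):

```python
import re

# A hypothetical line from the header, in the style of Pololu's #defines:
line = "#define QIK_GET_FIRMWARE_VERSION 0x81"

# The same substitution as the vim command above: capture the name and
# the value, and emit an indented Python assignment.
print(re.sub(r'#define *([A-Z0-9_]*) *(.*)', r'    \1 = \2', line))
```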
We were marveling at how early it's getting dark now -- seems like
a big difference even compared to a few weeks ago. Things change fast
this time of year.
Since we're bouncing back and forth a lot between southern and northern
California, Dave wondered how Los Angeles days differed from San Jose days.
Of course, San Jose being nearly 4 degrees farther north, it should
have shorter days -- but by the weirdness of orbital mechanics that
doesn't necessarily mean that the sun sets earlier in San Jose.
His gut feel was that LA was actually getting an earlier sunset.
"I can calculate that," I said, and fired up a Python interpreter
to check with PyEphem. Since PyEphem doesn't know San Jose (hmph!
San Jose is bigger than San Francisco) I used San Francisco.
Since PyEphem's Observer class only has next_rising() and next_setting(),
I had to set a start date of midnight so I could subtract the two dates
properly to get the length of the day.
So Dave's intuition was right: northern California really does have a
later sunset than southern California at this time of year, even
though the total day length is shorter -- the difference in sunrise
time makes up for the later sunset.
But one thing that's been annoying me about it -- it was a problem
with the old perl alert script too -- is the repeated sounds.
If lots of twitter updates come in on the Bitlbee channel, or if
someone pastes numerous lines into a channel, I hear POPPOPPOPPOPPOPPOP
or repetitions of whatever the alert sound is for that type of message.
It's annoying to me, but even more so to anyone else in the same room.
It would be so much nicer if I could have it play just one repetition
of any given alert, even if there are eight lines all coming in at the
same time. So I decided to write a Python class to handle that.
My existing code used subprocesses to call the basic ALSA sound player,
/usr/bin/aplay -q.
Initially I used if not os.fork() : os.execl(APLAY, APLAY, "-q", alertfile)
but I later switched to the cleaner subprocess.call([APLAY, '-q', alertfile]).
But of course, it would be better to do it all from Python without
requiring an external process like aplay. So I looked into that first.
Sadly, it turns out Python audio support is a mess. The built-in libraries
are fairly limited in functionality and formats, and the external
libraries that handle sound are mostly unmaintained, unless you want
to pull in a larger library like pygame. After a little web searching
I decided that maybe an aplay subprocess wasn't so bad after all.
Okay, so how should I handle the subprocesses? I decided the best way was
to keep track of what sound was currently playing. If another alert fires
for the same sound while that sound is already playing, just ignore it.
If an alert comes in for a different sound, then wait() for the
current sound to finish, then start the new sound.
That's all quite easy with Python's subprocess module.
subprocess.Popen() returns a Popen object that tracks
a process ID and can check whether that process has finished or not.
If self.curpath is the path to the sound currently playing
and self.current is the Popen object for whatever aplay process
is currently running, then:
if self.current :
    if self.current.poll() == None :
        # Current process hasn't finished yet. Is this the same sound?
        if path == self.curpath :
            # A repeat of the currently playing sound.
            # Don't play it more than once.
            return
        else :
            # Trying to play a different sound.
            # Wait on the current sound then play the new one.
            self.wait()
self.curpath = path
self.current = subprocess.Popen([ "/usr/bin/aplay", '-q', path ])
Finally, it's a good idea when exiting the program to check whether
any aplay process is running, and wait() for it. Otherwise, you might
end up with a zombie aplay process.
def __del__(self) :
    self.wait()
I don't know if xchat actually closes down Python objects gracefully,
so I don't know whether the __del__ destructor will actually be called.
But at least I tried. It's possible that a
context
manager might be more reliable.
The full scripts are on github at
pyplay.py
for the basic SoundPlayer class, and
chatsounds.py
for the xchat script that includes SoundPlayer.
I use xchat as my IRC client. Mostly I like it, but its sound alerts
aren't quite as configurable as I'd like. I have a few channels, like
my Bitlbee Twitter feed, where I want a much more subtle alert, or no
alert at all. And I want an easy way of turning sounds on and off,
in case I get busy with something and need to minimize distractions.
Years ago I grabbed a perl xchat plug-in called "Smet's NickSound"
that did something close to what I wanted. I've hacked a few things
into it. But every time I try to customize it any further, I'm hit
with the pain of write-only Perl. I've written Perl scripts, honest.
But I always have a really hard time reading anyone else's Perl code
and figuring out what it's doing. When I dove in again recently to
try to figure out why I was getting so many alerts when first starting
up xchat, I finally decided: learning how to write a Python xchat
script couldn't be any harder than reverse engineering a Perl one.
First, of course, I looked for an existing nick sound Python script ...
and totally struck out. In fact, mostly I struck out on finding any
xchat Python scripts at all. I know there are
Python bindings for
xchat, because there's documentation for them. But sample plug-ins?
Nope. For some reason, nobody's writing xchat plug-ins in Python.
I eventually found two minimal examples:
this very
simple example and the more elaborate
utf8decoder.
I was able to put them together and cobble up a working nick sound plug-in.
It's easy once you have an example to work from to help you figure out
the event hook arguments.
So here's my own little example, which may help the next person trying
to learn xchat Python scripting:
chatsounds.py
on github.
When I'm using my RSS reader
FeedMe,
I normally check every feed every day. But that can be wasteful: some
feeds, like World Wide Words,
only update once a week.
A few feeds update even less often, like serialized books that come
out once a month or whenever the author has time to add something new.
So I decided it would be nice to add some "when" logic to FeedMe,
so I could add when = Sat in the config section for World
Wide Words and have it only update once a week.
That sounded trivial -- a little python parsing logic to tell days from
numbers, a few calls to time.localtime() and I was done.
Except of course I wasn't. Because sometimes, like when I'm on vacation,
I don't always update every day. If I missed a Saturday, then I'd
never see that week's edition of World Wide Words. And that would
be terrible!
So what I really needed was a way to ask, "Has a Saturday occurred
(including today) since the last time I ran feedme?"
The last time I ran feedme is easy to determine: it's in the last
modified date of the cache file. Or, in more Pythonic terms, it's
statbuf = os.stat(cachefile).st_mtime. And of course
I can get the current time with time.localtime().
But how do I figure out whether a given week or month day falls
between those two dates?
I'm sure this particular wheel has been invented many times. There's
probably even a nifty Python library somewhere to do it. But how
do you google for that? I tried to think of keywords and found nothing.
So I went for a nice walk in the redwoods and thought about it for a bit,
and came up with a solution.
Turns out for the week day case, you can just use modular arithmetic:
if (weekday_2 - target_weekday) % 7 < (weekday_2 - weekday_1) % 7
then the day does indeed fall between the two dates
(gaps of a week or longer need their own check, since they
include every weekday).
Things are a little more complicated for the day of the month, though,
because you don't know whether you need mod 30 or 31 or 29 or 28,
so you either have to make your own table, or import the calendar module
just so you can call calendar.monthrange().
I decided it was easier to use logic:
if the difference between the two dates is
greater than 31, then it definitely includes any month day. Otherwise,
check whether they're in the same month or not, and do greater than/less
than comparisons on the three dates.
Throw in a bunch of conversion to make it easy to call, and a bunch of
unit tests to make sure everything works and my later tweaks don't
break anything, and I had a nice function I could call from Feedme.
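Here's a sketch of both checks (the function names and signatures are mine, not FeedMe's; the real code adds conversions and handles edge cases like short months more carefully):

```python
from datetime import date

def weekday_occurred_since(last_run, today, target_weekday):
    """Has target_weekday (Monday=0 .. Sunday=6) occurred after
    last_run, up to and including today?"""
    elapsed = (today - last_run).days
    if elapsed >= 7:
        return True     # a full week includes every weekday
    # Days back to the most recent occurrence of target_weekday
    # (0 if today is that weekday):
    return (today.weekday() - target_weekday) % 7 < elapsed

def monthday_occurred_since(last_run, today, target_day):
    """Has day-of-month target_day occurred after last_run,
    up to and including today?"""
    if (today - last_run).days > 31:
        return True     # more than a month includes any month day
    if (last_run.year, last_run.month) == (today.year, today.month):
        return last_run.day < target_day <= today.day
    # The interval straddles a month boundary: the target day occurred
    # either late in last_run's month or early in today's.
    return target_day > last_run.day or target_day <= today.day
```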
I decided to give myself a birthday present and release version 0.9.8 of
Pho, my image viewer,
at long last.
I've been using it essentially unchanged for many months now,
occasionally tweaking things or fixing minor bugs ... but I haven't
run into any bugs in quite a while, and think I've fixed all the
pending ones. Been meaning to make a release for a long time, but
somehow I keep getting sidetracked and forgetting about it.
This should rationalize the version number again ... the official
releases have been 0.9.7-preN forever, but there was an unofficial
0.9.7 and even a 0.9.8 that snuck in along with some patches I got
from David Gardner. It's been confusing. So now it's officially
0.9.8, and any future versions will start with 0.9.9, and we might
even see a 1.0 one of these days. (I suppose it's time -- Pho is ten
years old!)
So here it is: Pho 0.9.8.
I think it's working well. If you're already a Pho user, or if
you want a lightweight image viewer that's also good at triaging and
annotating large batches of images, you might want to take a look.
In a discussion on Google+ arising from my
Save/Export
clean plug-in, someone said to the world in general
PLEASE provide an option to select the size of the export. Having to
scale the XCF then export then throw out the result without saving is
really awkward.
I thought, What a good idea! Suppose you're editing a large image, with
layers and text and other jazz, saving in GIMP's native XCF format,
but you want to export a smaller version for the web. Every time you
make a significant change, you have to: Scale (remembering the scale
size or percentage you're targeting);
Save a Copy (or Export in GIMP 2.8);
then Undo the Scale.
If you forget the Undo, you're in deep trouble and might end up
overwriting the XCF original with a scaled-down version.
If I had a plug-in that would export to another file type (such
as JPG) with a scale factor, remembering that scale factor
so I didn't have to, it would save me both effort and risk.
And that sounded pretty easy to write,
using some of the tricks I'd learned from my Save/Export Clean
and wallpaper
scripts.
So I wrote export-scaled.py
Update: Please consider using
Saver instead.
Saver integrates the various functions I used to have in different
save/export plug-ins; it should do everything export-scaled.py does
and more, and export-scaled.py is no longer maintained.
If you need something export-scaled.py does that saver doesn't
do as well, please let me know.
It's still brand new, so if anyone tries it, I'd appreciate knowing
if it's useful or if you have any problems with it.
Geeky programming commentary
(Folks not interested in the programming details can stop reading now.)
Linked input fields
One fun project was writing a set of linked text entries for the dialog:
Scale to:
Percentage 100 %
Width: 640
Height: 480
Change any one of the three, and the other two change automatically.
There's no chain link between width and height:
It's assumed that if you're exporting a scaled copy, you won't want
to change the image's aspect ratio, so any one of the three is enough.
That turned out to be surprisingly hard to do with GTK SpinBoxes:
I had to read their values as strings and parse them,
because the numeric values kept snapping back
to their original values as soon as focus went to another field.
Image parasites
Another fun challenge was how to save the scale ratio, so the second
time you call up the plug-in on the same image it uses whatever values
you used the first time. If you're going to scale to 50%, you don't
want to have to type that in every time. And of course, you want it
to remember the exported file path, so you don't have to navigate
there every time.
For that, I used GIMP parasites: little arbitrary pieces of data you
can attach to any image. I've known about parasites for a long time,
but I'd never had occasion to use them in a Python plug-in before.
I was pleased to find that they were documented in the
official GIMP
Python documentation, and they worked just as documented.
It was easy to test them, too: in the Python console
(Filters->Python->Console...), type something like
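the following (the parasite name and value here are invented, and I'm assuming pygimp's Image parasite methods):

```python
img = gimp.image_list()[0]
img.attach_new_parasite('export-scaled', 0, '50%')
print img.parasite_list()
print img.parasite_find('export-scaled').data
```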
My plug-in was almost done. But when I ran it and told it to save to
filenamecopy.jpg, it prompted me with that annoying JPEG
settings dialog.
Okay, being prompted once isn't so bad. But then
when I exported a second time, it prompted me again,
and didn't remember the values from before.
So the question was, what controls whether the settings dialog is
shown, and how could I prevent it?
Of course, I could prompt the user for JPEG quality, then call
file-jpeg-save directly -- but what if you want to export to PNG
or GIF or some other format? I needed something more general.
Turns out, nobody really remembers how this works, and it's not
documented anywhere. Some people thought that passing
run_mode=RUN_WITH_LAST_VALS when I called
pdb.gimp_file_save() would do the trick, but it didn't help.
So I guessed that there might be a parasite that was storing those
settings: if the JPEG save plug-in sees the parasite, it uses those
values and doesn't prompt. Using the Python console technique I just
mentioned, I tried checking the parasites on a newly created image
and on an image read in from an existing JPG file, then saving each
one as JPG and checking the parasite list afterward.
Bingo! When you read in a JPG file, it has a parasite called
'jpeg-settings'. (The new image doesn't have this, naturally).
But after you write a file to JPG from within GIMP, it has not
only 'jpeg-settings' but also a second parasite, 'jpeg-save-options'.
So I made the plug-in check the scaled image after saving it,
looking for any parasites with names ending in either -settings
or -save-options; any such parasites are copied to the
original image. Then, the next time you invoke Export Scaled, it does
the same search, and copies those parasites to the scaled image before
calling gimp-file-save.
That darned invisible JPG settings dialog
One niggling annoyance remained.
The first time you get the JPG settings dialog, it
pops up invisibly, under the Export dialog you're using. So if
you didn't know to look for it by moving the dialog, you'd think the
plug-in had frozen. GIMP 2.6 had a bug where that happened every time
I saved, so I assumed there was nothing I could do about it.
GIMP 2.8 has fixed that bug -- yet it still happened
when my plug-in called gimp_file_save: the JPG dialog popped
up under the currently active dialog, at least under Openbox.
There isn't any way to pass window IDs through gimp_file_save so
the JPG dialog pops up as transient to a particular window. But a few
days after I wrote export-scaled, I realized there was still
something I could do: hide the dialog when the user clicks Save.
Then make sure that I show it again if any errors occur during saving.
Of course, it wasn't quite that simple. Calling chooser.hide()
by itself does nothing, because X is asynchronous and things don't happen
in any predictable order. But it's possible to force X to sync the display:
chooser.get_display().sync().
I'm not sure how robust this is going to be -- but it seems to work well
in the testing I've done so far, and it's really nice to get that huge
GTK file chooser dialog out of the way as soon as possible.
In GIMP 2.8, the developers changed the way you save files. "Save" is
now used only for GIMP's native format, XCF (and compressed variants
like .xcf.gz and .xcf.bz2). Other formats that may lose information on
layers, fonts and other aspects of the edited image must be "Exported"
rather than saved.
This has caused much consternation and flameage on the gimp-user
mailing list, especially from people who use GIMP primarily for
simple edits to JPEG or PNG files.
I don't particularly like the new model myself. Sometimes I use GIMP
in the way the developers are encouraging, adding dozens of layers,
fonts, layer masks and other effects. Much more often, I use GIMP
to crop and rescale a handful of JPG photos I took with my camera on a hike.
While I found it easy enough to adapt to using
Ctrl-E (Export) instead of Ctrl-S (Save), it was annoying that when I
exited the app, I'd always get an "Unsaved images" warning, and it was
impossible to tell from the warning dialog which images were safely
exported and which might not have been saved or exported at all.
But flaming on the mailing lists, much as some people seem to enjoy it
(500 messages on the subject and still counting!)
wasn't the answer. The developers have stated very clearly that they're
not going to change the model back. So is there another solution?
Yes -- a very simple solution, in fact. Write a plug-in that saves or
exports the current image back to its current file name, then marks it
as clean so GIMP won't warn about it when you quit.
It turned out to be extremely easy to write, and you can get it here:
GIMP: Save/export
clean plug-in. If it suits your GIMP workflow, you can even
bind it to Ctrl-S ... or any other key you like.
Warning: I deliberately did not add any "Are you sure you want
to overwrite?" confirmation dialogs. This plug-in will overwrite
your current file, without asking for permission. After all, that's its
job. So be aware of that.
How it's written
Here are some details about how it works.
Non software geeks can skip the rest of this article.
When I first looked into writing this, I was amazed at how simple it was:
really just two lines of Python (plus the usual plug-in registration
boilerplate).
The first line saves the image back to its current filename.
(The gimp-file-save PDB call still handles all types, not just XCF.)
The second line marks the image as clean.
Both of those are PDB calls, which means that people who don't have
GIMP Python could write script-fu to do this part.
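In GIMP Python, the pair looks something like this (a sketch from memory; the real plug-in, linked above, adds the registration boilerplate and error handling):

```python
def save_clean(img, drawable):
    # gimp-file-save picks the format from the filename's extension,
    # so this handles JPG, PNG etc., not just XCF:
    pdb.gimp_file_save(img, drawable, img.filename, img.filename)
    # Mark the image clean so GIMP won't warn about it at quit time:
    pdb.gimp_image_clean_all(img)
```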
So why didn't I use script-fu? Because I quickly found that if I bound
the plug-in to Ctrl-S, I'd want to use it for new images -- images that
don't have a filename yet. And for that, you need to pop up some sort
of "Save as" dialog -- something Python can do easily, and Script-fu
can't do at all.
A Save-as dialog with smart directory default
I couldn't use the
standard GIMP save-as dialog: as far as I can tell, there's
no way to call that dialog from a plug-in.
But it turned out the GTK save-as dialog has no default directory to
start in: you have to set the starting directory every time. So I
needed a reasonable initial directory.
I didn't want to come up with some desktop twaddle like ~/My Pictures
or whatever -- is there really anyone that model fits? Certainly not me.
I debated maintaining a preference you could set, or saving the last
used directory as a preference, but that complicates things and I
wasn't sure it's really that helpful for most people anyway.
So I thought about where I usually want to save images in a GIMP session.
Usually, I want to save them to the same directory where I've been saving
other images in the same session, right?
I can figure that out by looping through all currently open images
with for img in gimp.image_list() : and checking
os.path.dirname(img.filename) for each one.
Keep track of how many times each directory is being used;
whichever is used the most times is probably where the user wants
to store the next image.
Keeping count in Python
Looping through is easy, but what's the cleanest, most Pythonic way
of maintaining the count for each directory and finding the most
popular one? Naturally, Python has a class for that,
collections.Counter.
Once I've counted everything, I can ask for the most common path.
The code looks a bit complicated because
most_common(1) returns a one-item list of a tuple of the single
most common path and the number of times it's been used -- for instance,
[ ('/home/akkana/public_html/images/birds', 5) ].
So the path is the first element of the first element, or
most_common(1)[0][0]. Put it together:
counts = collections.Counter()
for img in gimp.image_list() :
    if img.filename :
        counts[os.path.dirname(img.filename)] += 1
try :
    return counts.most_common(1)[0][0]
except :
    return None
So that's the only tricky part of this plug-in.
The rest is straightforward, and you can read the code on
GitHub:
save-export-clean.py.
Motor shields worked well -- but they cost around $50 each, more than
the Arduino itself. That's fine for a single one, but I'm teaching an
Arduino workshop (this Thursday!) for high school girls, and I needed
something I could buy more cheaply so I could get more of them.
(Incidentally, if any women in the Bay Area want to help with the
workshop this Thursday, June 28 2012,
I could definitely use a few more helpers! Please drop me an email.)
What I found on the web and on the Arduino IRC channel was immensely
confusing to someone who isn't an electronics expert -- most people
recommended things like building custom H-bridge circuits out of zener
diodes.
But it's not that complicated after all.
I got the help I needed from ITP Physical Computing's
DC Motor
Control Using an H-Bridge.
It turns out you can buy a chip called an SN754410 that implements an
H-bridge circuit -- including routing a power source to the motors
while keeping the Arduino's power supply isolated -- for under $2.
I ordered my
SN754410
chips from Jameco and they arrived the next day.
(Technically, the SN754410 is a "quad half-bridge" rather than a "dual
H-bridge". In practice I'm not sure of the difference. There's another
chip, the L298, which is a full h-bridge and is also cheap to buy --
but it's a bit harder to use because the pins are wonky and it doesn't
plug in directly to a breadboard unless you bend the pins around.
I decided to stick with the SN754410;
but the L298 might be better for high-powered motors.)
Now, the SN754410 isn't as simple to use as a motor shield. It has a
lot of wires -- for two motors, you'll need six Arduino output pins,
plus a 5v reference and ground, the four wires to the two motors,
and the two wires to the motor power supply. Here's the picture
of the wiring, made with Fritzing.
With all those wires, I didn't want to make the girls wire them up
themselves -- it's way too easy to make a mistake and connect the wrong
pin (as I found when doing my own testing). So I've wired up several of
them on mini-breadboards so they'll be more or less ready to use.
They look like little white mutant spiders with all the wires going
everywhere.
A simple library for half-bridge motor control
The programming for the SN754410 is a bit less simple than motor shields
as well. For each motor, you need an enable pin on the Arduino -- the
analog signal that controls the motor's speed, so it needs to be one
of the Arduino's PWM output pins, 9, 10 or 11 -- plus two logic pins,
which can be any of the digital output pins.
To spin the motor in one direction, set
the first logic pin high and the second low; to spin in the other
direction, reverse the pins, with the first one low and the second one high.
That's simple enough to program -- but I didn't look forward to trying
to explain it to a group of high school students with basically no
programming experience.
To make it simpler for them, I wrote a drop-in library that simplifies
the code quite a bit. It defines a Motor object that you can initialize
with the pins you'll be using -- the enable pin first, then the two logic pins.
Initialize them in setup() like this:
Then from your loop() function, you can make calls like this:
motors[0].setSpeed(128);
motors[1].setSpeed(-85);
Setting a negative speed will tell the library to reverse the logic pins
so the motor spins the opposite direction.
I hope this will make motors easier to deal with for the girls who
choose to try them. (We'll be giving them a choice of projects, so
some of them may prefer to make light shows with LEDs, or
music with piezo buzzers.)
You can get the code for the HalfBridge library, and a sample sketch
that uses it, at
my
Arduino github repository
Cheap and easy motor control -- and I have a fleet of toy cars to
connect to them. I hope this ends up being a fun workshop!
My epub Books folder is starting to look like my physical bookshelf at
home -- huge and overflowing with books I hope to read some day.
Mostly free books from the wonderful
Project Gutenberg and
DRM-free books from publishers and authors who support that model.
With the Nook's standard library viewer that's impossible to manage.
All you can do is sort all those books alphabetically by title or author
and laboriously page through, some five books to a page, hoping the
one you want will catch your eye. Worse, sometimes books show up in
the author view but don't show up in the title view, or vice versa.
I guess Barnes & Noble thinks nobody keeps more than ten or so
books on their shelves.
Fortunately on my rooted Nook I have the option of using better
readers, like FBreader and Aldiko, that let me sort by tags.
If I want to read something about the Civil War, or Astronomy, or just
relax with some Science Fiction, I can browse by keyword.
Well, in theory. In practice, tagging of ebooks is inconsistent
and not very useful.
I can understand wanting to tag details like this, but
few of those tags are helpful when I'm browsing books on
my little handheld device. I can't imagine sitting
down to read and thinking,
"Let's see, what books do I have on Interracial marriage? Or Saltwater
fishing? No, on second thought I'd rather read some fiction set in the
time of Edward VI, King of England, 1537-1553."
And of course, with over 90 books loaded on my ebook readers, it means
I have hundreds of entries in my tags list,
with few of them including more than one book.
Clearly what I needed to do was to change the tags on my ebooks.
Viewing and modifying epub tags
That ought to be simple, right? But ebooks are still a very young
technology, and there's surprisingly little software devoted to them.
Calibre can probably do it if you don't mind maintaining your whole
book collection under calibre; but I like to be able to work on files
one at a time or in small groups. And I couldn't find a program that
would let me do that.
What to do? Well, epub is a fairly simple XML format, right?
So modifying it with Python shouldn't be that hard.
Managing epub in Python
An epub file is a collection of XML files packaged in a zip archive.
So I unzipped one of my epub books and poked around. I found the tags
in a file called content.opf, inside a <metadata> tag.
They look like this:
<dc:subject>Science fiction</dc:subject>
So I could use Python's
zipfile module
to access the content.opf file inside the zip archive, then use the
xml.dom.minidom
parser to get to the tags. Writing a script to display existing tags
was very easy.
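A sketch of that script (the real epubtag.py, linked below, does more; note that content.opf isn't always at the zip's top level, so we search for it):

```python
import zipfile
import xml.dom.minidom

def epub_tags(path):
    """Return the dc:subject tags of an epub file, using the
    zipfile + minidom approach described above."""
    with zipfile.ZipFile(path) as zf:
        # Find content.opf wherever it lives inside the archive:
        opf = next(n for n in zf.namelist() if n.endswith('content.opf'))
        dom = xml.dom.minidom.parseString(zf.read(opf))
    return [node.firstChild.data
            for node in dom.getElementsByTagName('dc:subject')
            if node.firstChild]
```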
What about replacing the old, unwieldy tag list with new, simple tags?
It's easy enough to add nodes in Python's minidom.
So the trick is writing it back to the epub file.
The zipfile module doesn't have a way to modify a zip file
in place, so I created a new zip archive and copied files from the
old archive to the new one, replacing content.opf with a new
version.
Python's difficulty with character sets in XML
But I hit a snag in writing the new content.opf.
Python's XML classes have a toprettyxml() method to write the contents
of a DOM tree. Seemed simple, and that worked for several ebooks ...
until I hit one that contained a non-ASCII character. Then Python threw
a UnicodeEncodeError: 'ascii' codec can't encode character
u'\u2014' in position 606: ordinal not in range(128).
Of course, there are ways (lots of them) to encode that output string --
I could do
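something like passing an explicit encoding to toprettyxml(), which then returns encoded bytes rather than trying to build an ASCII string:

```python
import xml.dom.minidom

# An XML fragment containing an em dash (u2014), the character that
# triggered the UnicodeEncodeError:
dom = xml.dom.minidom.parseString(u'<r>\u2014</r>'.encode('utf-8'))

# With an encoding argument, toprettyxml() hands back encoded bytes:
xmlbytes = dom.toprettyxml(encoding='utf-8')
```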
Except ... what should I pass as the encoding?
The content.opf file started with its encoding: <?xml version='1.0' encoding='UTF-8'?>
but Python's minidom offers no way to get that information.
In fact, none of Python's XML parsers seem to offer this.
Since you need a charset to avoid the UnicodeEncodeError,
the only options are (1) always use a fixed charset, like utf-8,
for content.opf, or (2) open content.opf and parse the
charset line by hand after Python has already parsed the rest of the file.
Yuck! So I chose the first option ... I can always revisit that if the utf-8
in content.opf ever causes problems.
The final script
Charset difficulties aside, though, I'm quite pleased with my epubtags.py
script. It's very handy to be able to print tags on any .epub file,
and after cleaning up the tags on my ebooks, it's great to be
able to browse by category in FBreader. Here's the program:
epubtag.py.
You can distribute a .deb file that people can download and install;
but it's a lot easier for people to install if you set up a repository,
so they can get automatic updates from you.
If you're targeting Ubuntu, the best way to do that is to set up a
Launchpad Personal
Package Archive, or PPA.
Create your PPA
First, create your PPA.
If you don't have a Launchpad account yet,
create one, add a GPG key, and sign the Code of Conduct.
Then log in to your account and click on Create a new PPA.
You'll have to pick a name and a display name for your PPA.
The default is "ppa", and many people leave personal PPAs as that.
You might want to give it a display name of yourname-ppa
or something similar if it's for a collection of stuff;
or, if you're only going to use it for software related to one program or
package, name it accordingly.
Ubuntu requires nonstandard paths
When you're creating your package with stdeb,
if you're ultimately targeting a PPA, you'll only need the source dsc
package, not the binary deb.
But as you'll see, you'll need to rebuild it to make Launchpad happy.
If you're intending to go through the
Developer.ubuntu.com
process, there are specific requirements for version
numbering and tarball naming -- see "Packaging" in the
App
Review Board Guidelines.
Your app will also need to install to unusual locations --
in particular, any files it installs, including the script itself,
need to be in
/opt/extras.ubuntu.com/<packagename> instead of a more
standard location.
How the user is supposed to run these apps (run a script to add
each of /opt/extras.ubuntu.com/* to your path?) is not clear to me;
I'm not sure this app review thing has been fully thought out.
In any case, you may need to massage your setup.py accordingly,
and keep a separate version around for when you're creating the
Ubuntu version of your app.
Okay, now comes the silly part. You know that source .dsc package
you just made? Now you have to unpack it and "build" it before you
can upload it. That's partly because you have to sign it
with your GPG key -- stdeb apparently can't do the signing step.
Normally, you'd sign a package with
debsign deb_dist/packagename_version.changes
(then type your GPG passphrase when prompted).
Unfortunately, that sort of signing doesn't work here.
If you used stdeb's bdist_deb to generate both binary and
source packages, the .changes file it generates will contain
both source and binary and Launchpad will reject it.
If you used sdist_dsc to generate only the source package,
then you don't have a .changes file to sign and submit to Launchpad.
So here's how you can make a signed, source-only .changes file
Launchpad will accept.
Since this will extract all your files again, I suggest doing this in
a temporary directory to make it easier to clean up afterward:
$ mkdir tmp
$ cd tmp
$ dpkg-source -x ../deb_dist/packagename_version.dsc
$ cd packagename_version
Now is a good time to take a look at the
deb_dist/packagename_version/debian/changelog that stdeb created,
and make sure it got the right version and OS codename for the
Ubuntu release you're targeting -- oneiric, precise, quantal or whatever.
stdeb's default is "unstable" (Debian) so you'll probably need to change it.
You can cross-check this information in the
deb_dist/packagename_version.changes file, which is the file
you'll actually be uploading to the PPA.
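If you want to script that retargeting step, it's a one-line substitution on the changelog's first line. Here's a minimal sketch -- the helper name is my own, not part of stdeb:

```python
# Hypothetical helper: stdeb defaults the changelog's distribution to
# "unstable"; rewrite it to the Ubuntu codename you're targeting.
# The first changelog line looks like:
#   pkgname (1.0-1) unstable; urgency=low
def retarget_changelog(text, suite):
    lines = text.splitlines(True)   # keep the newlines
    lines[0] = lines[0].replace(' unstable;', ' %s;' % suite, 1)
    return ''.join(lines)
```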
Finally, build and sign your source package:
$ debuild -S -sa
[type your GPG passphrase when prompted, twice]
$ dput ppa:yourppa ../packagename_version_source.changes
This will give you some output and eventually probably tell you
Successfully uploaded packages.
It's lying -- it may have failed. Watch your inbox
for messages. If Launchpad rejects your changes, you should get an
email fairly quickly.
If Launchpad accepts the changes, you'll get an Accepted email.
Great! But don't celebrate quite yet. Launchpad still has to build
your package before it can be installed. If you try to add your PPA
now, you'll get a 404.
Wait for Launchpad to build
You might as well add your repository now so you can install from it
once it's ready:
$ sudo add-apt-repository ppa:your-ppa-name
But don't apt-get update yet!
If you try that too soon, you'll get a 404, or an Ign, meaning
that the repository exists but there are no packages in it for
your architecture.
It might be as long as a few hours before Launchpad builds your package.
To keep track of this, go to your Launchpad PPA page (something like
https://launchpad.net/~yourname/+archive/ppa) and look under
PPA Statistics for something like "1 package waiting to build".
Click on that link, then in the page that comes up, click on the link
like i386 build of pkgname version in ubuntu precise RELEASE.
That should give you a time estimate.
Wondering why it's being built for i386 when Python should be
arch independent? Worry not -- that's just the architecture that's
doing the building. Once it's built, your package should install anywhere.
Once the Launchpad build page finally says the package is built,
it's finally safe to run the usual apt-get update.
Add your key
But when you apt-get update you may get an error like this:
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 16126D3A3E5C1192
Obviously you have your own public key, so what's up?
You have to import the key from Ubuntu's keyserver,
and then export it into apt-key, before apt can use it --
even if it's your own key.
For this, you need the last 8 digits given in the NO_PUBKEY message.
Take those 8 digits and run these two commands:
I'm told that apt-add-repository is supposed to add the key automatically,
but it didn't for me. Maybe it will if you wait until after your package
is built before calling apt-add-repository.
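If you're scripting the key step, the ID is easy to pull out of apt's error output. A sketch (the function name is hypothetical):

```python
import re

def pubkey_id(apt_error):
    """Extract the key ID from apt's NO_PUBKEY error message.
    Returns (full ID, the last 8 hex digits you'll pass to the keyserver)."""
    m = re.search(r'NO_PUBKEY\s+([0-9A-F]+)', apt_error)
    if not m:
        return None, None
    keyid = m.group(1)
    return keyid, keyid[-8:]
```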
Now if you apt-get update, you should see no errors.
Finally, you can apt-get install pkgname.
Congratulations! You have a working PPA package.
I write a lot of little Python scripts. And I use Ubuntu and Debian.
So why aren't any of my scripts packaged for those distros?
Because Debian packaging is absurdly hard, and there's very little
documentation on how to do it. In particular, there's no help on how
to take something small, like a Python script,
and turn it into a package someone else could install on a Debian
system. It's pretty crazy, since
RPM
packaging of Python scripts is so easy.
Recently at the Ubuntu Developers' Summit, Asheesh of OpenHatch pointed me toward
a Python package called stdeb that simplifies a lot of the steps
and makes Python packaging fairly straightforward.
You'll need a setup.py file to describe your Python script, and
you'll probably want a .desktop file and an icon.
If you haven't done that before, see my article on
Packaging Python for MeeGo
for some hints.
Then install python-stdeb.
The package has some requirements that aren't listed
as dependencies, so you'll need to install:
apt-get install python-stdeb fakeroot python-all
(I have no idea why it needs python-all, which installs only a
directory /usr/share/doc/python-all with some policy
documentation files, but if you don't install it, stdeb will fail later.)
Now create a config file for stdeb to tell it what Debian/Ubuntu version
you're going to be targeting, if it's anything other than Debian unstable
(stdeb's default).
Unfortunately, there seems to be no way to pass this on the command
line rather than in a config file. So if you want to make packages for
several distros, you'll have to edit the config
file for every distro you want to support.
Here's what I'm using for Ubuntu 12.04 Precise Pangolin:
[DEFAULT]
Suite: precise
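Since stdeb only reads the suite from this file, one workaround for multi-distro builds is to regenerate stdeb.cfg before each build. A minimal sketch (helper names are my own):

```python
# Sketch: build the stdeb.cfg contents for a given target suite,
# then write it out before each per-distro build.
def stdeb_cfg(suite):
    return '[DEFAULT]\nSuite: %s\n' % suite

def write_stdeb_cfg(suite, path='stdeb.cfg'):
    with open(path, 'w') as f:
        f.write(stdeb_cfg(suite))
```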
Now you're ready to run stdeb. I know of two ways to run it.
You can generate both source and binary packages, like this:
Either syntax creates a directory called deb_dist. It contains a lot of
files including a source .dsc, several tarballs, a copy of your source
directory, and (if you used bdist_deb) a binary .deb package.
If you used the bdist_deb form, don't be put off that
it concludes with a message:
dpkg-buildpackage: binary only upload (no source included)
It's fibbing: the source .dsc is there as well as the binary .deb.
I presume it prints the warning because it creates them as
separate steps, and the binary is the last step.
Now you can use dpkg -i to install your binary deb, or you can use
the source dsc for various purposes, like creating a repository or
a Launchpad PPA. But those involve a lot more steps -- so I'll
cover that in a separate article about creating PPAs.
I've mostly been enormously happy with my
upgrade from my old Archos 5 to the Samsung Galaxy Player 5.0.
The Galaxy does everything I always wanted the Archos to do,
all those things the Archos should have done but couldn't because
of its buggy and unsupported Android 1.6.
That is, I've been happy with everything except one thing: my
birdsong app no longer worked.
I have a little HTML app based on my "tweet" python script
which lets you choose from a list of birdsong MP3 files.
(The actual MP3 files are ripped from the excellent 4-CD
Stokes
Field Guide to Western Bird Songs set.)
The HTML app matches bird names as you type in characters.
(If you're curious, an earlier test version is at
tweet.html.)
On the Archos, I ran that under my
WebClient
Android app (I had to modify the HTML to add a keyboard, since in Android
1.6 the soft keyboard doesn't work in WebView text fields).
I chose a bird, and WebView passed off the MP3 file to the Archos'
built-in audio player. Worked great.
On the Samsung Galaxy, no such luck. Apparently Samsung's built-in
media player can only play files it has indexed itself. If you try
to use it to play an arbitrary file, say, "Song_Sparrow.mp3", it
will say: unknown file type. No matter that the file ends in .mp3 ...
and no matter that I've called
intent.setDataAndType(Uri.parse(url), "audio/mpeg"); ...
and no matter that the file is sitting on the SD card and has in fact
been indexed already by the media player. You didn't navigate to it
via the media player's UI, so it refuses to play it.
I haven't been able to come up with an answer to how to make Samsung's
media player more flexible, and I was just starting a search for
alternate Android MP3 player apps, when I ran across
Play
mp3 in SD Card, using Android's MediaPlayer
and Error
creating MediaPlayer with Uri or file in assets
which gave me the solution. Instead of using an intent and letting
WebView call up a music application, you can use an Android
MediaPlayer
to play your file directly.
Here's what the code looks like, inside setWebViewClient() which is
itself inside onCreate():
@Override
public boolean shouldOverrideUrlLoading(WebView view, String url) {
    if (url.endsWith(".mp3")) {
        MediaPlayer mediaPlayer = new MediaPlayer();
        try {
            mediaPlayer.setDataSource(getApplicationContext(), Uri.parse(url));
            mediaPlayer.prepare();
            mediaPlayer.start();
        }
        catch (IllegalArgumentException e) { showMessage("Illegal argument exception on " + url); }
        catch (IllegalStateException e) { showMessage("Illegal State exception on " + url); }
        catch (IOException e) { showMessage("I/O exception on " + url); }
        // We handled this URL ourselves, so tell the WebView not to load it:
        return true;
    }
    // Anything else: let the WebView load the page normally.
    return false;
}
showMessage() is my little wrapper that pops up an error message dialog.
Of course, you can handle other types, not just files ending in .mp3.
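The extension-to-MIME-type decision doesn't have to be hand-rolled, either. Here's the idea in Python using the standard mimetypes module (the Java side would use a similar lookup table):

```python
import mimetypes

def media_type(url):
    """Guess the MIME type to hand the media player, from the URL's
    file extension. Returns None for unrecognized extensions."""
    mtype, _encoding = mimetypes.guess_type(url)
    return mtype
```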
And now I can take the Galaxy out on a birdwalk and use it to help me
identify bird songs.
Venus has been a beautiful sight in the evening sky for months, but
at the end of April it's reaching a brightness peak, magnitude -4.7.
By then, if you look at it in a telescope or even good binoculars,
you'll see it has waned to a crescent. That's a bit non-obvious:
when the moon is a crescent, it's a lot fainter than a full moon.
So why is Venus brightest in its crescent phase?
It has to do with their orbits. The moon is always about the same
distance away, about 385,000 km or 239,000 miles (I've owned cars with
more miles than that!), though it varies a little, from 362,600 km at
perigee to 405,400 km at apogee.
When we look at the full moon, not only are we seeing the whole
Earth-facing surface illuminated, but the central part of that light
is reflecting straight up off the moon's surface. When we look at a
crescent moon, we're seeing light that's near the moon's sunrise or
sunset point -- dimmer and more spread out than the concentrated light
of noon -- and in addition we're seeing less of it.
Venus, in contrast, varies its distance from us immensely.
We can't see Venus when it's "full", because it's on the other side of
the sun from us and lost in the sun's glow. It'll next be there a year
from now, in April of 2013. But if we could see it when it's full, Venus
would be a distant 1.7 AU from us. An AU is an Astronomical Unit, the
average distance of the earth from the sun, about 93 million miles,
so Venus when it's full is about 160 million miles away.
Its disk is a tiny 9.9 arcseconds (an arcsecond is 1/3600 of a degree)
-- about the size of Mars this month.
In contrast, when we look at the crescent Venus around the end of this
month, although we're only seeing about 28% of its surface illuminated,
and that only with glancing twilight rays, it's much closer to us --
less than half an AU, or about 45 million miles -- and its disk
extends a huge 37 arcseconds, bigger than Jupiter this month.
Of course, eventually, as Venus pulls between us and the sun, its
crescent gets so slim that even its huge size can't compensate. So
its peak brightness happens when those two curves cross, when the disk
is somewhere around 27% illuminated, as happens at the end of this
month and the beginning of May.
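You can watch those two curves cross with a toy model: treat Venus as a uniformly bright disk, so brightness is proportional to illuminated fraction times apparent disk area (which falls off as the square of the distance), and assume circular, coplanar orbits. This is only a sketch, not a real photometric model:

```python
import math

R_VENUS = 0.723   # Venus' orbital radius in AU (Earth's is 1)

def venus_brightness(theta):
    """theta: angle between Earth and Venus as seen from the sun, radians.
    Returns (relative brightness, illuminated fraction, distance in AU)."""
    # Earth-Venus distance, law of cosines:
    d = math.sqrt(1 + R_VENUS**2 - 2 * R_VENUS * math.cos(theta))
    # Phase angle alpha is the sun-Venus-Earth angle:
    cos_alpha = (R_VENUS**2 + d*d - 1) / (2 * R_VENUS * d)
    k = (1 + cos_alpha) / 2          # illuminated fraction
    return k / (d * d), k, d         # brightness ~ fraction * disk area

# Scan the orbit for the brightness peak:
best = max((venus_brightness(math.radians(t)) for t in range(1, 180)),
           key=lambda b: b[0])
```

Even this crude model puts the peak at roughly a quarter illuminated, with Venus well under half an AU away.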
Exactly when? Good question. The RASC Handbook says Venus' "greatest
illuminated extent" is on April 30, but PyEphem and XEphem say Venus
is actually brighter from May 3-8 ... and when it emerges from the
sun's glare and moves into the morning sky in June, it'll be slightly
brighter still, peaking at magnitude -4.8 in the first week of July.
Tracking Venus with PyEphem
When I started my Shallow
Sky column this month, I saw the notice of Venus's maximum
brightness and greatest illuminated extent in the
RASC Handbook. But
I wanted more details -- how much did its distance and size really change,
when would the brightness peak again as it emerged from the sun's glare,
when would it next be "full"?
PyEphem made it easy to
calculate all this. Just create an ephem.Venus() object,
calculate its values for any date of interest, then print out
parameters like phase, mag, earth_distance and size.
In just a few minutes of programming, I had a nice table of Venus data.
import ephem

venus = ephem.Venus()

print '%10s %6s %6s %6s %6s' % ('date', '%', 'mag', 'dist', 'size')

def print_venus(when) :
    venus.compute(when)
    fmt = '%02d-%02d-%02d %6.2f %6.2f %6.2f %6.2f'
    trip = when.triple()
    print fmt % (trip[0], trip[1], trip[2],
                 venus.phase, venus.mag, venus.earth_distance, venus.size)

# Loop from the beginning of 2012 through the middle of 2013:
d = ephem.date('2012')
end_date = ephem.date('2013/6/1')

while d < end_date :
    print_venus(d)
    # Add a week:
    d = ephem.date(d + ephem.hour * 24 * 7)
I've found PyEphem very handy for calculations like this --
and it's great to be able to double-check listings in other publications.
I've been fighting a bug in Android's WebView class for ages:
on some pages, clicking FeedViewer's back arrow (which calls
WebView::goBack())
doesn't go back to the previous page. Instead, it jumps to some random
position in the current page. If you repeat it, after five or so
tries (depending on the page), goBack() will finally work and you'll
be back at the previous page.
It was especially frustrating in that it didn't happen everywhere -- only
in pages from certain sites. I saw it all the time in pages from the
Los Angeles Times and from
Make Magazine, but only rarely
on other sites.
But I finally tracked it down: it's because those pages include
the HTML <iframe> tag. Apparently, if a WebView is on a page
(at least if it's a local page) that contains N iframes, the first
N calls to goBack will jump somewhere in the document -- probably
the location of the associated iframe, though I haven't verified that --
and only on the N+1st call will the WebView actually go back to the
previous page.
The oddest thing is, this doesn't seem to be reported anywhere.
Android's bug tracker finds nothing for webview iframe goback,
and web searching hasn't found a hint of it, even though I see this
in Android versions from 1.6 through 2.3.5. How is it possible that
other people haven't noticed this? I wonder if it only happens on
local file:// URLs, and not many people use those.
In any case, I'm very happy to have found the answer at last.
It was easy enough to modify FeedMe to omit iframes (and who wants
iframes in simplified HTML anyway?), and it's great
to have a Back button that works on any page.
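The omission itself can be as simple as one regular-expression pass over the simplified HTML. A rough sketch -- crude, but enough for pages that have already been simplified:

```python
import re

# Drop <iframe>...</iframe> elements before handing the HTML to WebView.
IFRAME_PAT = re.compile(r'<iframe\b.*?</iframe\s*>',
                        re.DOTALL | re.IGNORECASE)

def strip_iframes(html):
    return IFRAME_PAT.sub('', html)
```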
I've been fiddling with several new Android devices, which means
I have to teach myself how to use adb all over again.
adb is the
Android
Debug Bridge, and it's great for debugging. It lets you type commands
on your desktop keyboard rather than tapping them into the soft
keyboard in Android's terminal emulator, it gives you a fast
way to install apps, and most important, it lets you get Java stack traces
from crashing programs.
Alas, the documentation is incomplete and sometimes just plain wrong.
Since I don't need adb very often, I always forget how to use it
between sessions, and it takes some googling to figure out the tricks.
Here are the commands I've found most helpful.
Start the server
First you have to start the adb server, and that must be done as root.
But adb isn't a system program and probably lives in some path like
/home/username/path/to/android-sdk-linux_x86/tools.
Even if you put it in your own path, it may not be in root's.
You can probably run it with the explicit path:
If you're also running eclipse, that probably won't work the first time,
because eclipse may also have started an adb server (that gets in the
way when you try to run adb manually). If you don't see
"* daemon started successfully *", try killing the server and
restarting it:
# adb kill-server
# adb start-server
* daemon not running. starting it now on port 5037 *
* daemon started successfully *
Keep trying until you see that "* daemon started successfully *" message.
Connecting
$ adb usb
Occasionally, this will give "error: closed". Don't panic -- sometimes
this actually means "I noticed something connected on USB and automatically
connected to it, so no need to connect again." It's mysterious, and no
one seems to have an explanation for what's really happening. Anyway,
try running some adb commands and you may find you're actually connected.
Shell and install
The most fun is running an interactive shell on your Android device.
$ adb shell
It's just busybox, not a full shell, so you don't have nice things like
tab completion. But it's still awfully useful.
You can also install apps. On some devices (like the Nook, where I
haven't found a way to allow install from non-market sources), it's
the only way to install an apk file.
$ adb install /path/to/appname.apk
If the app is already installed, you'll get an error.
Theoretically you can also do adb uninstall first,
but when I tried that it just printed "Failure".
But you can use -r for "reinstall":
$ adb install -r /path/to/appname.apk
Neither uninstall nor -r is mentioned in the online adb documentation,
though adb -h mentions both.
Update: To uninstall, you need the full name of the package. To get
the names of installed packages (another undocumented command), do this:
adb shell pm list packages
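The output is one package:<name> line per installed package, so it's trivial to parse if you're scripting adb. A sketch (the helper name is mine):

```python
# Hypothetical helper: parse the output of "adb shell pm list packages".
def parse_package_list(output):
    pkgs = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith('package:'):
            pkgs.append(line[len('package:'):])
    return pkgs
```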
Debug crashes with logcat
Finally, for debugging crashes, you can start up a logcat
and see system messages, including stack traces from crashing apps:
$ adb logcat
Logcat is great for fixing reproducible crashes. Sadly, it's not
so good for random/sporadic crashes that happen during the course
of using the device.
You're supposed to be able to do adb logcat -s AppName
if you're only interested in debugging messages from one app,
but that doesn't work for me -- I get no output even when the
app runs and crashes.
Someone asked me about determining whether an image was "portrait"
or "landscape" mode from a script.
I've long had a script for
automatically rescaling
and rotating images, using
ImageMagick under the hood and adjusting automatically for aspect ratio.
But the scripts are kind of a mess -- I've been using them for over a
decade, and they started life as a csh script back in the pnmscale
days, gradually added ImageMagick and jpegtran support and eventually
got translated to (not very good) Python.
I've had it in the back of my head that I should rewrite this
stuff in cleaner Python using the ImageMagick bindings, rather than
calling its commandline tools. So the question today spurred me to
look into that. I found that ImageMagick isn't the way to go, but
PIL would be a fine solution for most of what I need.
ImageMagick: undocumented and inconsistent
Ubuntu has a python-pythonmagick package, which I installed.
Unfortunately, it has no documentation, and there seems to be no
web documentation either. If you search for it, you find a few
other people asking where the documentation is.
Using things like help(PythonMagick) and
help(PythonMagick.Image), you can ferret out a
few details -- for instance, the signature of the scale method:
>>> help(img.scale)
Help on method scale:
scale(...) method of PythonMagick.Image instance
scale( (Image)arg1, (Geometry)arg2) -> None :
C++ signature :
void scale(Magick::Image {lvalue},Magick::Geometry)
So what does it want for (Geometry)? Strings don't seem to work,
2-tuples don't work, and there's no Geometry object in PythonMagick.
By this time I was tired of guesswork.
Can the Python Imaging Library do better?
What about the query that started all this: how to find out whether
an image is portrait or landscape? Well, the most important thing is
the image dimensions themselves -- whether img.size[0] > img.size[1].
But sometimes you want to know what the camera's orientation sensor
thought. For that, you can use this code snippet:
import PIL.Image, PIL.ExifTags

exif = PIL.Image.open(filename)._getexif()
for tag, value in exif.items():
    decoded = PIL.ExifTags.TAGS.get(tag, tag)
    if decoded == 'Orientation':
        print decoded, ":", value
Then compare the number you get to this Exif
Orientation table. Normal landscape-mode photos will be 1.
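Putting the two checks together -- dimensions plus the orientation tag -- might look like this sketch (my own helper, not part of PIL). Values 6 and 8 mean the camera was turned 90 degrees, so width and height are effectively swapped:

```python
# Decide portrait vs. landscape from pixel dimensions plus the
# EXIF Orientation value (1 = normal landscape mounting).
def is_portrait(width, height, exif_orientation=1):
    if exif_orientation in (6, 8):   # rotated 90 degrees either way
        width, height = height, width
    return height > width
```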
Given all this, have I actually rewritten resizeall and rotateall
using PIL? Why, no! I'll put it on my to-do list, honest.
But since the scripts are actually working fine (just don't look at the code),
I'll leave them be for now.
I've long wanted a way of converting my HTML presentation slides to PDF.
Mostly because conference organizers tend to freak out at slides in any
format other than Open/Libre Office, Powerpoint or PDF. Too many times,
I've submitted a tarball of my slides and discovered they weren't even
listed on the conference site. (I ask you, in what way is a tarball
more difficult to deal with than an .odp file?) Slide-sharing websites
also have a limited set of formats they'll accept.
A year or so ago, I added screenshot capability to my webkit-based
presentation program,
Preso,
so it could do "screenshots", but I really needed PDF, not images.
Now, creating PDF from HTML shouldn't be that hard. Every browser has
a print function that can print to a PDF file. So why is it so hard
to create PDF from HTML in any kind of scriptable way?
After much searching and experimenting,
I finally found a Python code snippet that worked:
XHTML
to PDF using PyQt4 Webkit from Alex Dong. It uses Python-Qt,
not GTK, so I can't integrate it into my Preso app, but that's okay --
a separate tool is just as good.
(I struggled to write an equivalent in PyGTK, but gave up due to the
complete lack of documentation of Python-Webkit-GTK, and not much
more for gtk.PrintOperation(). QWebView's documentation may not be
as complete as I'd like, but at least there is some.)
Printing from QtWebView to QPrinter
Here are the important things I learned about QWebView from fiddling
around with Alex's code to adapt it to what I needed, which is
printing a list of pages to sequentially numbered files:
To print, you need to wait until the page has finished loading,
so connect a function to SIGNAL("loadFinished(bool)"),
then load(QUrl(url)).
That loadFinished function remains registered, so as you load new
pages, it will be called each time. So you can load() the next URL
as the last step in your loadFinished callback.
If you get confused about callbacks and connect more than one of
them, bad things happen, and only the last page gets printed, or
QApplication.exit() doesn't exit at all.
Things I learned about QPrinter():
All the examples I found online set the page size with lines like
QPrinter.setPageSize(QPrinter.A4) or setPaperSize(QPrinter.A4)
(setPageSize is apparently deprecated in favor of setPaperSize); but
If you want to set a specific size, you can do that with a line like
QPrinter.setPaperSize(QSizeF(1024, 768), QPrinter.DevicePixel)
The second argument (DevicePixel) is a
unit,
from this list.
That line gives you the right aspect ratio. But if you think
"DevicePixels" means the size will correspond to pixels in your browser
window (just because the documentation says so), you're sadly mistaken.
If you want a PDF page that actually corresponds to the size of your
browser window, you can get it by calling QWebView.setZoomFactor(z)
You'll have to experiment to find the right value of z;
I found I needed about 1.24 if I wanted to capture my
full 1366x768 slides, or exactly 2.0 if I wanted to restrict the
saved PDF to only the 1024x768 part that shows up in the projector.
To pass that to qhtmlprint, I only need to remove the commented-out lines
(the ones with //) and strip off the quotes and commas. I can do that
all in one command with a grep and sed pipeline:
When I give talks that need slides, I've been using my
Slide
Presentations in HTML and JavaScript for many years.
I uploaded it in 2007 -- then left it there, without many updates.
But meanwhile, I've been giving lots of presentations, tweaking the code,
tweaking the CSS to make it display better. And every now and then I get
reminded that a few other people besides me are using this stuff.
For instance, around a year ago, I gave a talk where nearly all the
slides were just images. Silly to have to make a separate HTML file
to go with each image. Why not just have one file, img.html, that
can show different images? So I wrote some code that lets you go to
a URL like img.html?pix/whizzyphoto.jpg, and it will display
it properly, and the Next and Previous slide links will still work.
Of course, I tweak this software mainly when I have a talk coming up.
I've been working lately on my SCALE talk, coming up on January 22:
Fun
with Linux and Devices (be ready for some fun Arduino demos!)
Sometimes when I overload on talk preparation, I procrastinate
by hacking the software instead of the content of the actual talk.
So I've added some nice changes just in the past few weeks.
For instance, the speaker notes that remind me of where I am in
the talk and what's coming next. I didn't have any way to add notes on
image slides. But I need them on those slides, too -- so I added that.
Then I decided it was silly not to have some sort of automatic
reminder of what the next slide was. Why should I have to
put it in the speaker notes by hand? So that went in too.
And now I've done the less fun part -- collecting it all together and
documenting the new additions. So if you're using my HTML/JS slide
kit -- or if you think you might be interested in something like that
as an alternative to Powerpoint or Libre Office Presenter -- check
out the presentation I have explaining the package, including the
new features.
I've been having (mis)adventures learning about Python's various
options for parsing HTML.
Up until now, I've avoided doing any HTML parsing
in my RSS reader FeedMe.
I use regular expressions to find the places where content starts and
ends, and to screen out content like advertising, and to rewrite links.
Using regexps on HTML is generally considered to be a no-no, but it
didn't seem worth parsing the whole document just for those modest goals.
But I've long wanted to add support for downloading images, so you
could view the downloaded pages with their embedded images if you so chose.
That means not only identifying img tags and extracting their src
attributes, but also rewriting the img tag afterward to point to the
locally stored image. It was time to learn how to parse HTML.
Since I'm forever seeing people flamed on the #python IRC channel for
using regexps on HTML, I figured real HTML parsing must be straightforward.
A quick web search led me to
Python's built-in
HTMLParser class. It comes with a nice example for how to use it:
define a class that inherits from HTMLParser, then define
some functions it can call for things like handle_starttag and
handle_endtag; then call self.feed(). Something like this:
import urllib2
from HTMLParser import HTMLParser

class MyFancyHTMLParser(HTMLParser):
    def fetch_url(self, url) :
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        link = response.geturl()
        html = response.read()
        response.close()
        self.feed(html)   # feed() starts the HTMLParser parsing

    def handle_starttag(self, tag, attrs):
        if tag == 'img' :
            # attrs is a list of tuples, (attribute, value).
            # has_attr() and make_absolute() are helpers defined elsewhere.
            srcindex = self.has_attr('src', attrs)
            if srcindex < 0 :
                return   # img with no src tag? skip it
            src = attrs[srcindex][1]
            # Make relative URLs absolute
            src = self.make_absolute(src)
            attrs[srcindex] = (attrs[srcindex][0], src)
        print '<' + tag
        for attr in attrs :
            print ' ' + attr[0]
            if len(attr) > 1 and isinstance(attr[1], str) :
                # Escape any embedded double-quotes
                val = attr[1].replace('"', '\\"')
                print '="' + val + '"'
        print '>'

    def handle_endtag(self, tag):
        self.outfile.write('</' + tag.encode(self.encoding) + '>\n')
Easy, right? Of course there are a lot more details, but the
basics are simple.
I coded it up and it didn't take long to get it downloading images
and changing img tags to point to them. Woohoo!
Whee!
The bad news about HTMLParser
Except ... after using it a few days, I was hitting some weird errors.
In particular, this one:
HTMLParser.HTMLParseError: bad end tag: ''
It comes from sites that have illegal content. For instance, stories
on Slate.com include Javascript lines like this one inside
<script></script> tags: document.write("<script type='text/javascript' src='whatever'></scr" + "ipt>");
This is
technically illegal html -- but lots of sites do it, so protesting
that it's technically illegal doesn't help if you're trying to read a
real-world site.
Some discussions said setting
self.CDATA_CONTENT_ELEMENTS = () would help, but it didn't.
HTMLParser's code is in Python, not C. So I took a look at where the
errors are generated, thinking maybe I could override them.
It was easy enough to redefine parse_endtag() to make it not throw
an error (I had to duplicate some internal strings too). But then I
hit another error, so I redefined unknown_decl() and
_scan_name().
And then I hit another error. I'm sure you see where this was going.
Pretty soon I had over 100 lines of duplicated code, and I was still
getting errors and needed to redefine even more functions.
This clearly wasn't the way to go.
Using lxml.html
I'd been trying to avoid adding dependencies to additional Python packages,
but if you want to parse real-world HTML, you have to.
There are two main options: Beautiful Soup and lxml.html.
Beautiful Soup is popular for large projects, but the consensus seems
to be that lxml.html is more error-tolerant and lighter weight.
Indeed, lxml.html is much more forgiving. You can't handle start and
end tags as they pass through, like you can with HTMLParser. Instead
you parse the HTML into an in-memory tree, like this:
tree = lxml.html.fromstring(html)
How do you iterate over the tree? lxml.html is a good parser, but it
has rather poor documentation, so it took some struggling to figure out
what was inside the tree and how to iterate over it.
You can visit every element in the tree with
for e in tree.iter() :
print e.tag
But that's not terribly useful if you need to know which
tags are inside which other tags. Instead, define a function that iterates
over the top level elements and calls itself recursively on each child.
The top of the tree itself is an element -- typically the
<html></html> -- and each element has .tag and .attrib.
If it contains text inside it (like a <p> tag), it also has
.text. So to make something that works similarly to HTMLParser:
def crawl_tree(tree) :
    handle_starttag(tree.tag, tree.attrib)
    if tree.text :
        handle_data(tree.text)
    for node in tree :
        crawl_tree(node)
    handle_endtag(tree.tag)
But wait -- we're not quite all there. You need to handle two
undocumented cases.
First, comment tags are special: their tag attribute,
instead of being a string, is <built-in function Comment>
so you have to handle that specially and not assume that tag
is text that you can print or test against.
Second, what about cases like
<p>Here is some <i>italicised</i> text.</p>
? In this case, you have the p tag, and its text is
"Here is some ".
Then the p has a child, the i tag, with text of "italicised".
But what about the rest of the string, " text."?
That's called a tail -- and it's the tail of the adjacent i tag it follows,
not the parent p tag that contains it. Confusing!
So our function becomes:
def crawl_tree(tree) :
    if type(tree.tag) is str :
        handle_starttag(tree.tag, tree.attrib)
        if tree.text :
            handle_data(tree.text)
        for node in tree :
            crawl_tree(node)
        handle_endtag(tree.tag)
    if tree.tail :
        handle_data(tree.tail)
See how it works? If it's a comment (tree.tag isn't a string),
we'll skip everything -- except the tail. Even a comment
might have a tail: <p>Here is some <!-- this is a comment --> text we want to show.</p>
so even if we're skipping a comment, we need its tail.
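If you want to experiment with the text/tail model, the standard library's xml.etree.ElementTree uses the same scheme, so you don't even need lxml installed:

```python
import xml.etree.ElementTree as ET

# Parse the example paragraph: the trailing " text." ends up as the
# tail of the <i> element, not as text of the parent <p>.
p = ET.fromstring('<p>Here is some <i>italicised</i> text.</p>')
i = p[0]   # the <i> child element
```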
I'm sure I'll find other gotchas I've missed, so I'm not releasing
this version of feedme until it's had a lot more testing. But it
looks like lxml.html is a reliable way to parse real-world pages.
It even has a lot of convenience functions like link rewriting
that you can use without iterating the tree at all. Definitely worth
a look!
The analemma is that funny figure-eight you see on world globes in the
middle of the Pacific Ocean. Its shape is the shape traced out by
the sun in the sky, if you mark its position at precisely the same
time of day over the course of an entire year.
The analemma has two components: the vertical component represents
the sun's declination, how far north or south it is in our sky.
The horizontal component represents the equation of time.
The equation of time describes how the sun moves relatively faster or
slower at different times of year. It, too, has two components: it's
the sum of two sine waves, one representing how the earth speeds up
and slows down as it moves in its elliptical orbit, the other a
function of the tilt (or "obliquity") of the earth's axis compared to
its orbital plane, the ecliptic.
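You can see those two sine-wave components in a common empirical approximation of the equation of time. This is just a sketch with standard approximate coefficients, good to a minute or so, not anything derived from PyEphem:

```python
import math

def equation_of_time(day_of_year):
    """Approximate equation of time in minutes, positive when the
    sundial runs ahead of the clock."""
    b = 2 * math.pi * (day_of_year - 81) / 365
    # The sin(2b) term is the obliquity component; the cos(b) and
    # sin(b) terms come from the orbit's eccentricity.
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)

# Early November, near the maximum: the sun runs about 16 minutes
# ahead of the clock.
print(round(equation_of_time(308), 1))
```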
But if you look at photos
of real analemmas in the sky, they're always tilted. Shouldn't they
be vertical? Why are they tilted, and how does the tilt vary with
location? To find out, I wanted a program to calculate the analemma.
Calculating analemmas in PyEphem
The very useful astronomy Python package
PyEphem
makes it easy to calculate the position of any astronomical object
for a specific location. Install it with: easy_install pyephem
for Python 2, or easy_install ephem for Python 3.
The alt and az are the altitude and azimuth of the sun right now.
They're printed as strings: 25:23:16.6 203:49:35.6
but they're actually type 'ephem.Angle', so float(sun.alt) will
give you a number in radians that you can use for calculations.
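For instance, the altitude string above works out to these radians. This is a pure-Python check of the degrees-to-radians conversion; no PyEphem required:

```python
import math

# sun.alt prints as 25:23:16.6 (degrees:minutes:seconds), but
# float(sun.alt) yields radians. Converting the printed value by hand:
deg, minutes, seconds = 25, 23, 16.6
alt = math.radians(deg + minutes / 60 + seconds / 3600)
print(round(alt, 4))  # 0.4431
```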
Of course, you can specify any location, not just major cities.
PyEphem doesn't know San Jose, so here's the approximate location of
Houge Park where the San Jose Astronomical
Association meets:
You can also specify elevation, barometric pressure and other parameters.
So here's a simple analemma, calculating the sun's position at noon
on the 15th of each month of 2011:
for m in range(1, 13) :
    observer.date = '2011/%d/15 12:00' % m
    sun.compute(observer)
I used a simple PyGTK window to plot sun.az and sun.alt, so once
it was initialized, I drew the points like this:
# Y scale is 45 degrees (PI/2), horizon to halfway to zenith:
y = int(self.height - float(self.sun.alt) * self.height / math.pi)
# So make X scale 45 degrees too, centered around due south.
# Want az = PI to come out at x = width/2.
x = int(float(self.sun.az) * self.width / math.pi / 2)
# print self.sun.az, float(self.sun.az), float(self.sun.alt), x, y
self.drawing_area.window.draw_arc(self.xgc, True, x, y, 4, 4, 0, 23040)
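That coordinate mapping is easy to sanity-check in plain Python. The 512x512 window size here is made up for illustration:

```python
import math

width, height = 512, 512

def sun_to_pixel(az, alt):
    # Same scaling as above: altitude measured up from the bottom edge,
    # azimuth scaled so that due south (az = pi) lands at width/2.
    y = int(height - alt * height / math.pi)
    x = int(az * width / math.pi / 2)
    return x, y

print(sun_to_pixel(math.pi, 0.0))          # (256, 512): due south, on the horizon
print(sun_to_pixel(math.pi, math.pi / 4))  # (256, 384): due south, 45 degrees up
```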
So now you just need to calculate the sun's position at the same time
of day but different dates spread throughout the year.
And my 12-noon analemma came out almost vertical! Maybe the tilt I saw
in analemma photos was just a function of taking the photo early in
the morning or late in the afternoon? To find out, I calculated the
analemma for 7:30am and 4:30pm, and sure enough, those were tilted.
But wait -- notice my noon analemma was almost vertical -- but
it wasn't exactly vertical. Why was it skewed at all?
Time is always a problem
As always with astronomy programs, time zones turned out to be the
hardest part of the project. I tried to add other locations to my
program and immediately ran into a problem.
The ephem.Date class always uses UTC, and has no concept
of converting to the observer's timezone. You can convert to the timezone
of the person running the program with localtime, but
that's not useful when you're trying to plot an analemma at local noon.
At first, I was only calculating analemmas for my own location.
So I set time to '20:00', that being the UTC for my local noon.
And I got the image at right. It's an analemma, all right, and
it's almost vertical. Almost ... but not quite. What was up?
Well, I was calculating for 12 noon clock time -- but clock time isn't
the same as mean solar time unless you're right in the middle of your
time zone.
You can calculate what your real localtime is (regardless of
what politicians say your time zone should be) by using your longitude
rather than your official time zone:
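The arithmetic is simple enough to check by hand. Here's a sketch using a hypothetical longitude of 122 degrees west (roughly the Bay Area), in PyEphem's radians, negative-means-west convention:

```python
import math

lon = math.radians(-122.0)     # negative = west of Greenwich

# pi radians of longitude corresponds to 12 hours of time:
hours_east = lon * 12 / math.pi

# Local mean solar noon, expressed as an hour in UTC:
utc_noon = 12 - hours_east
print(round(utc_noon, 2))  # 20.13
```

In PyEphem terms, the equivalent move is to subtract observer.lon * 12 / math.pi * ephem.hour from the ephem.date before computing.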
Maybe that needs a little explaining. I take the initial time string,
like '2011/12/15 12:00', and convert it to an ephem.date.
The number of hours I want to adjust is my longitude (in radians)
times 12 divided by pi -- that's because if you go pi (180) degrees
to the other side of the earth, you'll be 12 hours off.
Finally, I have to multiply that by ephem.hour because ...
um, because that's the way to add hours in PyEphem and they don't really
document the internals of ephem.Date.
Set the observer date to this adjusted time before calculating your
analemma, and you get the much more vertical figure you see here.
This also explains why the morning and evening analemmas weren't
symmetrical in the previous run.
This code is location independent, so now I can run my analemma program
on a city name, or specify longitude and latitude.
PyEphem turned out to be a great tool for exploring analemmas.
But to really understand analemma shapes, I had more exploring to do.
I'll write about that, and post my complete analemma program,
in the next article.
Today is the winter solstice -- the official beginning of winter.
The solstice is determined by the Earth's tilt on its axis, not
anything to do with the shape of its orbit: the solstice is the point
when the poles come closest to pointing toward or away from the sun.
To us, standing on Earth, that means the winter solstice is the day
when the sun's highest point in the sky is lowest.
You can calculate the exact time of the solstice using the handy Python
package PyEphem.
Install it with: easy_install pyephem
for Python 2, or easy_install ephem for Python 3.
Then ask it for the date of the next or previous solstice.
You have to give it a starting date, so I'll pick a date in late summer
that's nowhere near the solstice:
That agrees with my RASC Observer's Handbook: Dec 22, 5:30 UTC. (Whew!)
PyEphem gives all times in UTC, so, since I'm in California, I subtract
8 hours to find out that the solstice was actually last night at 9:30.
If I'm lazy, I can get PyEphem to do the subtraction for me:
I used 8./24 because PyEphem's dates are in decimal days, so in order
to subtract 8 hours I have to convert that into a fraction of a 24-hour day.
The decimal point after the 8 is to get Python to do the division in
floating point, otherwise it'll do an integer division and subtract
int(8/24) = 0.
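The gotcha is easy to see. (In Python 3, / always does true division, so there you need // to reproduce the old Python 2 behavior of 8/24.)

```python
print(8 // 24)    # 0: the Python 2 result of 8/24 -- a no-op offset
print(8.0 / 24)   # 0.3333...: a third of a day, i.e. 8 hours
```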
The shortest day
The winter solstice also pretty much marks the shortest day of the year.
But was the shortest day yesterday, or today?
To check that, set up an "observer" at a specific place on Earth,
since sunrise and sunset times vary depending on where you are.
PyEphem doesn't know about San Jose, so I'll use San Francisco:
>>> import ephem
>>> observer = ephem.city("San Francisco")
>>> sun = ephem.Sun()
>>> for i in range(20,25) :
...     d = '2011/12/%i 20:00' % i
...     print d, (observer.next_setting(sun, d) - observer.previous_rising(sun, d)) * 24
2011/12/20 20:00 9.56007901422
2011/12/21 20:00 9.55920379754
2011/12/22 20:00 9.55932991847
2011/12/23 20:00 9.56045709446
2011/12/24 20:00 9.56258416496
I'm multiplying by 24 to get hours rather than decimal days.
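Picking the shortest day out of that output is then just a min() over the values (the numbers below are copied from the run above):

```python
day_lengths = {
    '2011/12/20': 9.56007901422,
    '2011/12/21': 9.55920379754,
    '2011/12/22': 9.55932991847,
    '2011/12/23': 9.56045709446,
    '2011/12/24': 9.56258416496,
}
# The key whose day length is smallest:
shortest = min(day_lengths, key=day_lengths.get)
print(shortest)  # 2011/12/21
```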
So the shortest day, at least here in the bay area, was actually yesterday,
2011/12/21. Not too surprising, since the solstice wasn't that long
after sunset yesterday.
If you look at the actual sunrise and sunset times, you'll find
that the latest sunrise and earliest sunset don't correspond to the
solstice or the shortest day. But that's all tied up with the equation
of time and the analemma ... and I'll cover that in a separate article.
A new trail opened up above Alum Rock park! Actually a whole new open
space preserve, called Sierra Vista -- with an extensive set of trails
that go all sorts of interesting places.
Dave and I visit Alum Rock frequently -- we were married there --
so having so much new trail mileage is exciting. We tried to explore it
on foot, but quickly realized the mileage was more suited to mountain
bikes. Even with bikes, we'll be exploring this area for a while
(mostly due to not having biked in far too long, so it'll take us
a while to work up to that much riding ... a combination of health
problems and family issues has conspired to keep us off the bikes).
Of course, part of the fun of discovering a new trail system is poring
over maps trying to figure out where the trails will take us, then
taking GPS track logs to study later to see where we actually went.
And as usual when uploading GPS track logs and viewing them in pytopo,
I found some things that weren't working quite the way I wanted,
so the session ended up being less about studying maps and more
about hacking Python.
In the end, I fixed quite a few little bugs, improved some features,
and got saved sites with saved zoom levels working far better.
Now, PyTopo 1.0 happened quite a while ago -- but there were two of
us hacking madly on it at the time, and pinning down the exact time
when it should be called 1.0 wasn't easy. In fact, we never actually
did it. I know that sounds silly -- of all releases to not get around
to, finally reaching 1.0? Nevertheless, that's what happened.
I thought about cheating and calling this one 1.0, but we've had 1.0
beta RPMs floating around for so long (and for a much earlier release)
that that didn't seem right.
So I've called the new release PyTopo 1.1. It seems to be working
pretty solidly. It's certainly been very helpful to me in exploring
the new trails. It's great for cross-checking with Google Earth:
the OpenCycleMap database has much better trail data than Google
does, and pytopo has easy track log loading and will work offline,
while Google has the 3-D projection aerial imagery that shows
where trails and roads were historically (which may or may not
correspond to where they decide to put the new trails).
It's great to have both.
In case you haven't been following it, Stanford's computer science
department began a grand experiment in online learning early this month:
free, upper division college courses, given online and open
to the whole world. There are three classes offered:
Artificial Intelligence,
Machine Learning and
Introduction to Databases.
They've sparked an incredible response: exact numbers don't seem to be
available, but rumor is that AI had about 130,000 enrollees, while ML
had about 70,000. (Nobody seems to have published numbers for DB.)
Update, a day later: @seemsArtless tweets that
ML
currently has 87,000 registered users.
Why so much interest? Surely there are lots of places to get free
information (like wikipedia) and
even course lectures (like MIT).
And there are plenty of places to take classes for relatively low cost,
like local junior colleges or ed2go.
What's different about the Stanford classes is that they cover advanced
material, in far more depth than you'd find at a junior college or
typical online site. They offer graded homework so you can see how
you're doing, and there are other students taking the class at the
same time, so if you get stuck, there are all sorts of discussion
groups you can turn to. It's one thing to read a textbook or watch a
video by yourself; I find a class much more helpful, and judging by
the response to the Stanford classes, I'm not alone in that.
I agonized over whether to take AI or Machine Learning. They both
sounded so interesting! Since I couldn't decide, I initially signed up
for both, figuring I'd drop one if the load was too great.
By the end of the second week, I'd settled on Machine Learning.
I was starting to dread the AI class flash quizzes -- which didn't always
work right, but made it hard to proceed until you'd answered the
question right even if you couldn't see the question -- and to feel
frustrated about the lectures, which clearly were meant as a jumping
off point for students to go do their own outside reading.
On the other hand, I was really enjoying the Machine Learning
lectures, and looking forward to each new one.
And the real kicker:
Machine Learning includes programming assignments, so students can
implement the algorithms Professor Ng talks about in the lectures.
What's great about Machine Learning
Andrew Ng's video lectures are wonderfully clear, well paced and full of
interesting content.
He uses a lot of graphs to help students visualize what's going on
geometrically, rather than just relying on the equations.
(Better yet, in the programming exercises he shows us how to create
those graphs for ourselves.)
And he's great about flagging certain portions as possibly review
(you can skip this lecture if you already know linear algebra) or
advanced (this is some extra background for people who know calculus,
but you can skip it and still do fine in the course).
The technology is simpler than that used in the AI course.
If you have a slow net connection or travel a lot, you can download
the lectures as mp4 files and watch them offline.
You can download lecture slides as a PDF or PPT.
Review questions (graded) are handled with simple HTML forms.
All very simple, well-tested technology, and it works great.
I've had no problems accessing the servers, submitting homework
or anything else -- very impressive!
But the heart of the course is the programming exercises. ML is taught
in GNU octave, a framework and language for numerical computing
and matrix operations. Students aren't absolutely required to use
octave, but it's highly recommended: Professor Ng says he's found
that students learn much faster that way.
Sounds good to me, and octave looks like a useful skill, well worth
acquiring. I'm having fun learning it.
The programming exercises come with a lot of scaffold code plus a few
files with "Your code goes here". The actual amount of coding isn't
large. But I'm finding that it does the job: it forces me to make
sure I understand the matrix operations discussed in the lectures.
And at the end, you come out with something that's actually useful!
From the first few weeks, I have linear and logistic regression code
that I could use to analyze and visualize all sorts of datasets. Now,
at the end of week 4, we're halfway through writing a neural network to
recognize handwritten numerals from image data. How cool is that?
Suggestions for improvement
The class is a huge success. Who would have thought that you could
teach something this advanced on such a huge scale, so effectively?
I have only a couple of small suggestions -- ways the class could be
even better next time.
An errata page. In week 3, there was an error in the lecture and
notes, a - instead of a +, that made one part of programming ex. 2 quite
a bit trickier than it would otherwise have been. If I hadn't
noticed that the slides used + in some places and - in
others, I might never have gotten that part of
the assignment working. Lots of other people found that too, and there
were discussions in the Q&A forum ... but you wouldn't find it
without coming up with clever search terms.
The Q&A forum would be so much more useful if it was organized
by topic, and/or by week. There are some great discussions there, but
the only way of getting to them is by searching for the right terms.
There's no way to browse discussions, see how people are doing on
assignment 3, or look for errata and similar warnings. It would
help make the class more of a community, more like a real in-person class.
Hope for future expansion
I mentioned my suggestions because I fervently hope there is a "next time".
These classes are a great service, and I hope the huge response isn't
putting too much burden on the instructors.
"Common wisdom" among providers of online classes seems to be that
there's no demand outside of enrolled university students for hard
courses, courses with prerequisites, and especially courses that
involve (shudder) math. Just look at the offerings from any
online courseware or adult ed program -- they're long on art appreciation
and "Introduction to MS Word", short on physics and econometrics.
Even the for-pay online degree mills concentrate on humanities and
business, not technical subjects.
Stanford's experiment has proven that "common wisdom" is wrong -- that
tens of thousands of students will jump at the chance to take highly
technical, mathematical courses.
I'd love to see the model expanded to other subjects,
such as statistics, economics, physics, geology and climate science.
And, yes, there is money to be made here. If this many people will
take a free class, wouldn't quite a few of them be willing to pay?
Most couldn't afford $1000 like UC Extension classes -- but how about
$100, comparable to other online education classes?
Would people pay more if you offered college credit?
Online education providers, take note!
There's a large, underserved market for scientific and technical classes
out here in the long tail.
Debugging Arduino sensors can sometimes be tricky.
While working on my Arduino sonar
project, I found myself wanting to know what values
the Arduino was reading from its analog port.
It's easy enough to print from the Arduino to its USB-serial line.
First add some code like this in setup():
Serial.begin(9600);
Then in loop(), if you just read the value "val":
Serial.println(val);
Serial output from Python
That's all straightforward --
but then you need something that reads it on the PC side.
When you're using the Arduino Java development environment, you can
set it up to display serial output in a few lines at the bottom of
the window. But it's not terrifically easy to read there, and I
don't want to be tied to the Java IDE -- I'm much happier doing my
Arduino
development from the command line. But then how do you read serial
output when you're debugging?
In general, you can use the screen program to talk to serial
ports -- it's the tool of choice to log in to plug computers.
For the Arduino, you can do something like this:
screen /dev/ttyUSB0 9600
But I found that a bit fiddly for various reasons. And I discovered
that it's easy to write something like this in Python, using
the serial module.
You can start with something as simple as this:
import serial

ser = serial.Serial("/dev/ttyUSB0", 9600)
while True:
    print ser.readline()
Serial input as well as output
That worked great for debugging purposes.
But I had another project (which I will write up separately)
where I needed to be able to send commands to the Arduino as well
as reading output it printed. How do you do both at once?
With the select module, you can monitor several file descriptors
at once. If the user has typed something, send it over the serial line
to the Arduino; if the Arduino has printed something, read it and
display it for the user to see.
That loop looks like this:
while True :
    # Check whether the user has typed anything (timeout of .2 sec):
    inp, outp, err = select.select([sys.stdin, self.ser], [], [], .2)

    # If the user has typed anything, send it to the Arduino:
    if sys.stdin in inp :
        line = sys.stdin.readline()
        self.ser.write(line)

    # If the Arduino has printed anything, display it:
    if self.ser in inp :
        line = self.ser.readline().strip()
        print "Arduino:", line
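You can try the same select() pattern with no hardware at all, substituting a pipe for the serial port. This is a Unix-only sketch, and the message text is made up:

```python
import os
import select

# A pipe stands in for the serial connection to the Arduino:
readable_fd, writable_fd = os.pipe()
os.write(writable_fd, b"fake Arduino says hi\n")

# Wait up to 0.2 seconds for something to arrive:
ready, _, _ = select.select([readable_fd], [], [], 0.2)
if readable_fd in ready:
    line = os.read(readable_fd, 1024).decode().strip()
    print("Arduino:", line)
```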
Add in a loop to find the right serial port (the Arduino doesn't always
show up on /dev/ttyUSB0) and a little error and exception handling,
and I had a useful script that met all my Arduino communication needs:
ardmonitor.
Every now and then I have to run a program that doesn't manage its
tooltips well. I mouse over some button to find out what it does,
a tooltip pops up -- but then the tooltip won't go away. Even if I
change desktops, the tooltip follows me and stays up on all desktops.
Worse, it's set to stay on top of all other windows, so it blocks
anything underneath it.
The places where I see this happen most often are XEphem (probably as an
artifact of the broken Motif libraries we're stuck with on Linux);
Adobe's acroread (Acrobat Reader), though perhaps that's gotten
better since I last used it; and Wine.
I don't use Wine much, but lately I've had to use it
for a medical imaging program that doesn't seem to have a Linux
equivalent (viewing PETscan data). Every button has a tooltip, and
once a tooltip pops up, it never goes away. Eventually I might have
five or six of these little floating windows getting in the way of
whatever I'm doing on other desktops, until I quit the wine program.
So how does one get rid of errant tooltips littering your screen?
Could I write an Xlib program that could nuke them?
Finding window type
First we need to know what's special about tooltip windows, so the program can
identify them. First I ran my wine program and produced some sticky tooltips.
Once they were up, I ran xwininfo and clicked on a tooltip.
It gave me a bunch of information about the window's size and location,
color depth, etc. ... but the useful part is this:
Override Redirect State: yes
In X,
override-redirect windows
are windows that are immune to being controlled by the window manager.
That's why they don't go away when you change desktops, or move when
you move the parent window.
So what if I just find all override-redirect windows and unmap (hide) them?
Or would that kill too many innocent victims?
Python-Xlib
I thought I'd have to write my little app in C, since it's doing
low-level Xlib calls. But no -- there's a nice set of Python bindings,
python-xlib. The documentation isn't great, but it was still pretty
easy to whip something up.
The first thing I needed was a window list: I wanted to make sure I
could find all the override-redirect windows. Here's how to do that:
from Xlib import display

dpy = display.Display()
screen = dpy.screen()
root = screen.root
tree = root.query_tree()
for w in tree.children :
    print w
w is a
Window
(documented here). I see in the documentation that I can get_attributes().
I'd also like to know which window is which -- calling get_wm_name()
seems like a reasonable way to do that. Maybe if I print them, those
will tell me how to find the override-redirect windows:
for w in tree.children :
    print w.get_wm_name(), w.get_attributes()
Window type, redux
Examining the list, I could see that override_redirect was one of
the attributes.
But there were quite a lot of override-redirect windows.
It turns out many apps, such as Firefox, use them for things like
menus. Most of the time they're not visible. But you can look at
w.get_attributes().map_state to see that.
So that greatly reduced the number of windows I needed to examine:
for w in tree.children :
    att = w.get_attributes()
    if att.map_state and att.override_redirect :
        print w.get_wm_name(), att
I learned that tooltips from well-behaved programs like Firefox tended
to set wm_name to the contents of the tooltip. Wine doesn't -- the wine
tooltips had an empty string for wm_name. If I wanted to kill just
the wine tooltips, that might be useful to know.
But I also noticed something more important: the tooltip windows
were also "transient for" their parent windows.
Transient
for means a temporary window popped up on behalf of a parent window;
it's kept on top of its parent window, and goes away when the parent does.
Now I had a reasonable set of attributes for the windows I wanted to
unmap. I tried it:
for w in tree.children :
    att = w.get_attributes()
    if att.map_state and att.override_redirect and w.get_wm_transient_for():
        w.unmap()
It worked! At least in my first test: I ran the wine program, made a
tooltip pop up, then ran my killtips program ... and the tooltip disappeared.
Multiple tooltips: flushing the display
But then I tried it with several tooltips showing (yes, wine will pop
up new tooltips without hiding the old ones first) and the result
wasn't so good. My program only hid the first tooltip. If I ran it again,
it would hide the second, and again for the third. How odd!
I wondered if there might be a timing problem.
Adding a time.sleep(1) after each w.unmap()
fixed it, but sleeping surely wasn't the right solution.
X is asynchronous: things don't necessarily happen right away.
To force (well, at least encourage) X to deal with any queued events
it might have stacked up, you can call dpy.flush().
I tried adding that after each w.unmap(), and it worked. But it turned
out I only need one
dpy.flush()
at the end of the program, just before exiting. Apparently if I don't do that,
only the first unmap ever gets executed by the X server, and the rest
are discarded. Sounds like flush() is a good idea as the last line
of any python-xlib program.
killtips will hide tooltips from well-behaved programs too.
If you have any tooltips showing in Firefox or any GTK programs, or
any menus visible, killtips will unmap them.
If I wanted to make sure the
program only attacked the ones generated by wine, I could
add an extra test on whether w.get_wm_name() == "".
But in practice, it doesn't seem to be a problem. Well-behaved
programs handle having their tooltips unmapped just fine: the next
time you call up a menu or a tooltip, the program will re-map it.
Not so in wine: once you dismiss one of those wine tooltips, it's gone
forever, at least until you quit and restart the program. But that
doesn't bother me much: once I've seen the tooltip for a button and
found out what that button does, I'm probably not going to need to see
it again for a while.
So I'm happy with killtips, and I think it will solve the problem.
Here's the full script:
killtips.
This post is, above all, a lesson in doing a web search first.
Even when what you're looking for is so obscure you're sure no one
else has wanted it. But the script I got out of it might turn out to
be useful.
It started with using
Bitlbee for Twitter.
I love bitlbee -- it turns a Twitter stream into just another IRC channel
tab in the xchat I'm normally running anyway.
The only thing I didn't love about bitlbee is that, unlike the twitter
app I'd previously used, I didn't have any way to keep track of when I
neared the 140-character limit. There were various ways around that,
mostly involving pasting the text into other apps before submitting it.
But they were all too many steps.
It occurred to me that one way around this was to select-all, then run
something that would show me the number of characters in the X selection.
That sounded like an easy app to write.
Getting the X selection from Python
I was somewhat surprised to find that Python has no way of querying the
X selection. It can do just about everything else -- even
simulate
X events. But there are several
command-line applications that can print the selection, so it's easy
enough to run xsel or xclip from Python and
read the output.
I ended up writing a little app that brings up a dialog showing the
current count, then hangs around until you dismiss it, querying the
selection once a second and updating the count. It's called
countsel.
Of course, if you don't want to write a Python script you can use
commandline tools directly. Here are a couple of examples, using xclip instead
of xsel:
xterm -title 'lines words chars' -geometry 25x2 -e bash -c 'xclip -o | wc; read -n 1'
pops up a terminal showing the "wc" counts of the selection once, and
xterm -title 'lines words chars' -geometry 25x1 -e watch -t 'xclip -o | wc'
loops over those counts printing them once a second.
Binding commands to a key is different for every window manager.
In Openbox, I added this to rc.xml to call up my program
whenever I type W-t (short for Twitter):
Now, any time I needed to check my character count, I could triple-click
or type Shift-Home, then hit W-t to call up the dialog and get a count.
Then I could leave the dialog up, and whenever I wanted a new count,
just Shift-Home or triple-click again, and the dialog updates automatically.
Not perfect, but not bad.
Xchat plug-in for a much more elegant solution
Only after getting countsel working did it occur to me
to wonder if anyone else had the same Bitlbee+xchat+twitter problem.
And a web search found exactly what I needed:
xchat-inputcount.pl,
a wonderful xchat script that adds a character-counter next to the
input box as you're typing. It's a teensy bit buggy, but still, it's
far better than my solution. I had no idea you could add user-interface
elements to xchat like that!
But that's okay. Countsel didn't take long to write.
And I've added word counting to countsel, so I can use it for
word counts on anything I'm writing.
Someone mailed out information to a club I'm in as an .XLS file.
Another Excel spreadsheet. Sigh.
I do know one way to read them. Fire up OpenOffice,
listen to my CPU fan spin as I wait forever for the app to start up,
open the xls file, then click in one cell after another as I deal
with the fact that spreadsheet programs only show you a tiny part
of the text in each cell. I'm not against spreadsheets per se --
they're great for calculating tables of interconnected numbers --
but they're a terrible way to read tabular data.
Over the years, lots of open-source programs like word2x and catdoc
have sprung up to read the text in MS Word .doc files. Surely by
now there must be something like that for XLS files?
Well, I didn't find any ready-made programs, but I found something better:
Python's xlrd module, as well as a nice clear example at ScienceOSS
of how to Read
Excel files from Python.
Following that example, in six lines I had a simple program to print
the spreadsheet's contents:
import sys
import xlrd

for filename in sys.argv[1:] :
    wb = xlrd.open_workbook(filename)
    for sheetname in wb.sheet_names() :
        sh = wb.sheet_by_name(sheetname)
        for rownum in range(sh.nrows) :
            print sh.row_values(rownum)
Of course, having gotten that far, I wanted better formatting so I
could compare the values in the spreadsheet. Didn't take long to write,
and the whole thing still came out under 40 lines:
xlsrd.
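The formatting part is mostly a matter of measuring column widths first. Something along these lines works; this is my sketch with made-up data, not the actual xlsrd code:

```python
rows = [
    ['Name', 'Dues', 'Paid'],
    ['Alice', '25', 'yes'],
    ['Bob', '25', 'no'],
]

# Each column's width is the widest cell in that column:
widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]

for row in rows:
    print('  '.join(str(cell).ljust(w) for cell, w in zip(row, widths)))
```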
And I was able to read that XLS file that was mailed to the club,
easily and without hassle.
I'm forever amazed at all the wonderful, easy-to-use modules there are
for Python.
A few weeks ago, at the annual
GetSET engineering summer
camp for high school girls, I taught my usual
one-day workshop on beginning programming in Javascript.
The big question every year is always how to make the class more interactive.
The girls who come to GetSET are great -- smart and motivated --
but after six hours of lectures and working through exercises,
anyone, of any age, is going to glaze over.
Especially when it's their first introduction to programming
and they only have a day to learn it.
People learn better when they're asking questions, thinking and solving
problems, not just listening or following instructions.
For years I've heard vague references to "programming a person"
as an exercise for teaching the basic idea of programming.
The idea is to get the students to come up with step-by-step
instructions for someone to do something --
say, walk across the room and pick up a water bottle -- so they
realize how specific you have to be. It also solves another problem:
giving everyone a break from sitting still and focusing on a computer screen.
But how do you really do it? What kind of problems work best in practice?
How much time should you allow? If you have a volunteer carrying
out the instructions, how do you keep them from skipping steps?
Surprisingly, I couldn't find anything written up to help an
inexperienced would-be teacher of programming.
What I needed was a chance to try out some ideas, or watch someone
with more of a clue on this sort of teaching. This year, I found
opportunities for both.
First try: Toastmasters
One of the reasons I love
Toastmasters,
especially with a small and friendly club like
Coherent Communicators,
is that it offers a safe place to try new presentation techniques
and get good feedback about what does and doesn't work.
So I made my first try at a Toastmasters meeting a few weeks before
the GetSET workshop.
I allowed 15-20 minutes for the exercise.
I explained to the audience that I wanted them to get me to turn left,
walk over to the easel at the side of the room, touch it, turn around,
walk back to the lectern, pick up the gavel and pound it on the lectern.
I would solicit a command from them, write it on the whiteboard,
then carry out the command and ask for the next command.
The day's audience was a fairly even mix of techies and non.
I had wondered whether the audience would be widely mixed in how
specific their instructions were, but they were fairly uniform --
mostly along the lines of "Turn 90 degrees left." "Take 5 steps."
"Take 2 more steps".
Of course, there were a few joking suggestions from the techies, like
"send an electrical impulse from your brain to your left quadriceps",
that you wouldn't expect with a high school group, but mostly everyone
was on the same page.
When I got near the easel, we hit "Raise your right arm". (Oops, not
close enough yet.) "Um ... lean forward about a foot?" A good
illustration of being specific ... just the sort of thing I was hoping for.
They got me back to the lectern, got me to pick up the gavel (I was
letting them skip a few steps by this point) ... and improvised a
little, getting me to knock my head rather than the lectern.
That was fun, and got some laughs ... it worked well.
I had hoped to do a second run where I guided them into understanding
a while loop ("while (not yet to the easel), take another step").
But seeing a yellow light from the timer, I opted for a quick
explanation of how a loop would work rather than guiding the audience
into it. I found out later that the timer had hit the wrong button and
only given me 8 minutes rather than my requested 15-20 ... so 20 minutes
actually would have been plenty of time to cover loops as well as
basic instructions. Disappointing ... but I was surprised we'd gotten
so much done in so little time.
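For the record, here's the loop I'd hoped to coax out of the audience, sketched in Python (the step count is made up, standing in for "not yet at the easel"):

```python
# The "walk to the easel" exercise as a while loop. The counter is
# invented for illustration; in the room, the condition would be
# "while (not yet to the easel)".
steps_to_easel = 5
actions = []
while steps_to_easel > 0:          # while not yet at the easel...
    actions.append("take another step")
    steps_to_easel -= 1
actions.append("touch the easel")
print(actions)
```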
Lessons learned:
Draft a volunteer to write the instructions on the board.
It was distracting and time-wasting to run back from the side of the
room to write each new instruction.
You can teach the basic concepts in less than ten minutes.
Try 2: "Program a blind robot"
For the real workshop, I had help in the form of Esther Heller, an
experienced girl scout leader as well as a many-year GetSET veteran.
Esther had done exercises like this before and was willing to take the lead;
I was looking forward to learning from her. We had discussed two
different variants, and decided to try both of them at different times
during the day.
For the first variant, we waited until mid-morning when the class was
bogging down a bit and looked like they needed a break. Esther called
for two volunteers: one programmer and one robot. The girl playing the
robot was blindfolded with a bandanna and escorted to the door of the
room, while Esther whispered the task to the other girl. The task was
something like walking over to a water bottle, picking it up,
walking over to another girl and handing it to her -- though the
rest of us didn't know that until it was completed.
The instructions suggested by the girls were quite similar to the ones
I'd heard in Toastmasters. There was lots of "Take 5 steps" ... "take
two more steps", guessing at how many steps it would take to get from
one place to another. No one came up with anything like a loop or
conditional. I'd wondered if anyone would try remote control -- "walk"
then wait until the right moment to yell "STOP!" -- but no one did.
The blindfolding worked really well. I'd worried that with a volunteer
chosen to be the robot, she might skip steps she hadn't been given.
But if the "robot" is blindfolded and doesn't know the task, she can't
skip steps; she can only do what she's programmed to. The only problem
was that a blindfolded person told to walk straight ahead does not
necessarily hold to a straight line, much to the consternation of
the girl playing the programmer.
There was a lot of "turn right" ... "no, not that much, turn back left
again" ... "now turn JUST A LITTLE to the right" that helped stress
the need for specificity -- exactly what we were after. I had wondered
beforehand whether anyone would ever suggest anything like "turn right
by 30 degrees", but no one, either in Toastmasters or GetSET, ever did.
The exercise was successful and everybody seemed to have fun,
so it broke up the morning well. We didn't get to loops or
conditionals, though.
I didn't record how long we spent, but it was probably in the neighborhood
of 20 minutes.
Lessons:
Blindfolding and choosing a volunteer definitely helps this exercise:
it solves the problem of a volunteer who might skip steps.
I wished the whole room knew what the task was ... but I'm not sure
how to accomplish that. Either you have to escort the "robot" far
enough away that she can't hear you explain it, or write it on the
board after she's blindfolded. Extra time either way.
Try 3, in groups: "The muffin is ready"
At the end of the day, we tried Esther's favorite variant. You're
watching TV, and you want to go to the kitchen, get an English muffin,
toast it, put butter/jam/peanut butter/whatever on it, take it back
to your seat and eat it. What are the steps?
Esther divided the girls into groups of 4-5 and passed out post-its
on which to write the steps. There was some inertia getting started ...
it was late in the day and everybody was tired. (That's not unique to
this exercise -- it's always a challenge to come up with something
that will hold the girls' interest for the last hour. It's a long day
for everyone.)
Eventually they got rolling and got into it -- I saw some very long
stacks of post-its from various groups. With ten minutes left to go in
the session, Esther picked two volunteers from one group: one to read
the instructions, one to execute them.
She pointed out places where they skipped steps -- "Hey, wait,
how can she get the muffins out of the cupboard without opening the
cupboard first?" After a minute or two, Esther called on a new pair
from a different group to continue where the first pair had left off.
As she worked through all the groups, you could see
each group becoming more cognizant of steps they had skipped, and
improvising them on the spot. Despite the end-of-day crankiness,
you could see they were learning from the exercise.
Lessons:
Splitting into groups allows for more discussion among the girls,
and comparing various groups' answers is fun.
Splitting into groups takes a lot of time, and you have to
monitor to make sure all the groups are actually working on the
problem and not just chatting.
So which is better? The muffin exercise
was definitely more time-consuming than the previous
"robot" exercise, due to the overhead of splitting into groups and
bringing up volunteers from each group. On the other hand, I could
see there was benefit in having them work in small groups, and in the
touch of competition in comparing their group's answers with the ones
from other groups.
It was hard to compare the two exercises directly to
see which one worked better, because of end-of-day crankiness.
But they both worked well -- I'm going to keep using some variant
of this in future workshops, ideally with loops and conditionals added.
Thanks, Esther, for your expertise ... and to the students and the rest
of the volunteers for making it a successful class!
How do you delete email from a mail server without downloading or
reading it all?
Why? Maybe you got a huge load of spam and you need to delete it.
Maybe you have your laptop set up to keep a copy of your mail on the
server so you can get it on your desktop later ... but after a while
you realize it's not worth downloading all that mail again.
In my case, I use an ISP that keeps copies of all mail forwarded from
one alias to another, so I periodically need to clean out the copies.
There are quite a few reasons you might want to delete mail without
reading it ... so I was surprised to find that there didn't seem to be
any easy way to do so.
But POP3 is a fairly simple protocol. How hard could it be
to write a Python script to do what I needed?
Not hard at all, in fact. The
poplib package
does most of the work for you, encapsulating both the networking and the
POP3 protocol. It even does SSL, so you don't have to send your password
in the clear.
Once you've authenticated, you can list() messages, which gives you a
status and a list of message numbers and sizes, separated by a space.
Just loop through them and delete each one.
Here's a skeleton program to delete messages:
import poplib

server = "mail.example.com"
port = 995
user = "myname"
passwd = "seekrit"

pop = poplib.POP3_SSL(server, port)
pop.user(user)
pop.pass_(passwd)

poplist = pop.list()
if poplist[0].startswith('+OK') :
    msglist = poplist[1]
    if msglist :
        for msgspec in msglist :
            # msgspec is something like "3 3941",
            # msg number and size in octets
            msgnum = int(msgspec.split(' ')[0])
            print "Deleting msg %d\r" % msgnum,
            pop.dele(msgnum)
    else :
        print "No messages for", user
else :
    print "Couldn't list messages: status", poplist[0]

pop.quit()
Of course, you might want to add more error checking, loop through a
list of users, etc. Here's the full script:
deletemail.
The Beginning Python class has pretty much died down -- although there
are still a couple of interested students posting really great homework
solutions, I think most people have fallen behind, and it's time to
wrap up the course.
So today, I didn't post a formal lesson. But I did have something to
share about how I used Python's object-oriented capabilities to solve
a problem I had copying new podcast files onto my MP3 player.
I used Python's built-in list sort() function, along with the
easy way it lets me define operators like < and > for any
object I define.
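In miniature, the technique looks like this (a made-up Podcast class, not the actual script): list.sort() only needs the &lt; operator, and defining __lt__ provides it.

```python
# Sketch of sorting custom objects (this Podcast class is hypothetical,
# not the real podcast-copying script). list.sort() compares elements
# with <, so defining __lt__ is all it takes.
class Podcast:
    def __init__(self, filename, episode):
        self.filename = filename
        self.episode = episode

    def __lt__(self, other):
        # Sort by episode number rather than alphabetically by filename
        return self.episode < other.episode

podcasts = [Podcast("foo-12.mp3", 12), Podcast("foo-3.mp3", 3)]
podcasts.sort()
print([p.filename for p in podcasts])
```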
The web development and GUI toolkits are topics which were requested by
students, while the string ops are things that just seemed too useful
not to include.
Lesson 8 in my online Python course is up:
Lesson
8: Extras, including exception handling, optional arguments, and
running system commands.
A motley collection of fun and useful topics that didn't quite fit
anywhere in the earlier formal lessons, but you'll find a lot of use
for them in writing real-world Python scripts. In the homework, I have
some examples of some of my scripts using these techniques; I'm sure
the students will have lots of interesting problems of their own.
This is the last formal lesson in the Beginning Python class.
But I will be posting a few more "tips and tricks" lessons,
little things that didn't fit in other lessons plus suggestions
for useful Python packages students may want to check out as they
continue their Python hacking.
We're getting near the end of the course -- partly because I think
students may be saturated, though I may post one more lesson. I'll
post on the list and see what the students think about it.
This afternoon, though, is pretty much booked up trying to get my
mother's new Nook Touch e-book reader working with Linux.
Would be easy ... except that she wants to be able to check out
books from her local public library, which of course uses proprietary
software from Adobe and other companies to do DRM. It remains to be
seen if this will be possible ... of course, I'll post the results
once we know.
It's a motley mix of topics, mostly because I wanted to have a fun
homework project that actually did something interesting. I hope
everyone enjoys it!
This lesson is a little longer than previous lessons, but that's
partly because of a couple of digressions at the beginning.
Hope I didn't overdo it! The homework includes an optional debugging
problem for folks who want to dive a little deeper into this stuff.
There may be some backlog on the mailing list -- my first attempt
to post the lesson didn't show up at all, but my second try made it.
Mail seems to be flowing now, but
if you try to post something and it doesn't show up, let me know or
tell us on irc.linuxchix.org, so we know if there's a continuing
problem that needs to be fixed, not just a one-time glitch.
Meanwhile, I'm having some trouble getting new blog entries posted.
Due to some network glitches, I had to migrate shallowsky.com to a
different ISP, and it turns out the PyBlosxom 1.4 I'd been using
doesn't work with more recent versions of Python; but none of my
PyBlosxom plug-ins work in 1.5. Aren't software upgrades a joy?
So I'm getting lots of practice debugging other people's Python code
trying to get the plug-ins updated, and there probably won't be many
blog entries until I've figured that out.
Once that's all straightened out, I should have a cool new PyTopo
feature to report on, as well as some Arduino hacks I've had on the
back burner for a while.
I've just posted Lesson 2 in my online Python course, covering
loops, if statements, and beer! You can read it in the list archives:
Lesson
2: Loops, if, and beer, or, better, subscribe to the list so
you can join the discussion.
I'm about to start a new LinuxChix course:
Beginning Programming in Python.
It will be held on the
Linuxchix
Courses mailing list:
to follow the course, subscribe to the list.
Lessons will be posted weekly, on Fridays, with the
first lesson starting tomorrow, Friday, June 17.
This is intended as a short course, probably only 4-5 weeks to start with,
aimed mostly at people who are new to programming. Though of course
anyone is welcome, even if you've programmed before. And experienced
programmers are welcome to hang out, lurk and help answer questions.
I might extend the course if people are still interested and
having fun.
The course is free (just subscribe to the mailing list)
and open to both women and men. Standard LinuxChix rules apply:
Be polite, be helpful. And do the homework. :-)
I've been doing more Arduino development lately.
But I don't use the Arduino Java development environment -- programming
is so much easier when you have a real editor, like emacs or vim, and
key bindings to speed everything up.
I've found very little documentation on how to do command-line Arduino
development, and most of the Makefiles out there are old and no longer
work. So I've written up a tutorial. It ended up too long for a blog
post, so I've made it a separate article:
Writing Python scripts for MeeGo is easy. But how do you package a
Python script as an RPM that other MeeGo users can install?
It turned out to be far easier than I expected. Python and Ubuntu had
all the tools I needed.
First you'll need a .desktop file describing your app, if you don't
already have one. This gives window managers the information they
need to show your icon and application name so the user can run it.
Here's the one I wrote for PyTopo:
pytopo.desktop.
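A .desktop file is just a short text file; PyTopo's looks roughly like this (abbreviated here, so check the linked file for the real thing):

```
[Desktop Entry]
Type=Application
Name=PyTopo
Comment=Tiled map viewer
Exec=pytopo
Icon=pytopo
Categories=Utility;
```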
Of course, you'll also want a desktop icon. Most other applications on
MeeGo seemed to use 48x48 pixel PNG images, so that's what I made,
though it seems to be quite flexible -- an SVG is ideal.
With your script, desktop file and an icon, you're ready to create
a package.
I'm on an Ubuntu (Debian-based) machine, and all the docs imply you
have to be on an RPM-based distro to make an RPM. Happily, that's not
true: Ubuntu has RPM tools you can install.
$ sudo apt-get install rpm
Then let Python do its thing:
$ python setup.py bdist_rpm
Python generates the spec file and everything else needed and builds
a multiarch RPM that's ready to install on MeeGo.
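Distutils gets everything it needs from your setup.py. Something like this works (the file names here are illustrative, not the exact PyTopo sources); the data_files entries install the desktop file and icon where window managers look for them:

```python
# Illustrative setup.py for bdist_rpm -- names and paths are examples,
# not the exact PyTopo files.
from distutils.core import setup

setup(name="PyTopo",
      version="1.0",
      description="Tiled map viewer",
      scripts=["pytopo"],
      data_files=[("share/applications", ["pytopo.desktop"]),
                  ("share/icons/hicolor/48x48/apps", ["pytopo.png"])])
```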
You can install it by copying it to the MeeGo device with
scp dist/PyTopo-1.0-1.noarch.rpm meego@address.of.device:/tmp/.
Then, as root on the device, install it with
rpm -i /tmp/PyTopo-1.0-1.noarch.rpm.
You're done!
To see a working example, you can browse my latest
PyTopo source
(only what's in SVN; it needs a few more tweaks before it's ready for
a formal release). Or try the RPM I made for MeeGo:
PyTopo-1.0-1.noarch.rpm.
I'd love to hear whether this works on other RPM-based distros.
What about Debian packages?
Curiously, making a Debian package on Debian/Ubuntu is much less
straightforward even if you're starting on a Debian/Ubuntu machine.
Distutils can't do it on its own.
There's a
Debian
Python package recipe, but it begins with a caution that you
shouldn't use it for a package you want to submit.
For that, you probably have to wade through the
Complete Ubuntu
Packaging Guide. Clearly, that will need a separate article.
I got some fun email today -- two different people letting me know
about new projects derived from my Python code.
One is M-Poker,
originally based on a PyQt tutorial I wrote for Linux Planet.
Ville Jyrkkä has taken that sketch and turned it into a real
poker program.
And it uses PySide now -- the new replacement for PyQt, and one
I need to start using for MeeGo development. So I'll be taking a look
at M-Poker myself and maybe learning things from it.
There are some screenshots on the blog
A Hacker's Life in Finland.
The other project is xkemu,
a Python module for faking keypresses, grown out of
pykey,
a Python version of my
Crikey keypress
generation program. xkemu-server.py looks like a neat project -- you
can run it and send it commands to generate key presses, rather than
just running a script each time.
(Sniff) My children are going out into the world and joining other
projects. I feel so proud. :-)
Mostly, the transition to Firefox 4 has been pretty smooth. But there's been one
big hassle: middlemouse content load URL doesn't work as well as it used to.
Middlemouse content load is a great Firefox feature on Linux and other Unix
platforms. You see a URL somewhere that doesn't have clickable URLs --
say, a plaintext mail message, or something somebody typed in IRC.
You highlight it with the mouse --
no need to Copy explicitly, since X does that automatically whenever you
highlight text. Then move to the Firefox window and click the middle mouse
button somewhere in the content window -- anywhere as long as it's not
over a link or text input -- and Firefox goes straight to the URL you pasted.
A few silly Linux distros, like Ubuntu, disable this feature
by default. You can turn it back on by going to about:config,
searching for middlemouse, and setting
middlemouse.contentLoadURL to true.
Except, in Firefox 4, much of the time nothing happens. In Firefox 4,
contentLoadURL only works if the URL you pasted is a complete URL,
like http://example.com. This is completely silly, because
most of the time, if you had a complete URL, it would have been
clickable already in whatever window you originally found it in.
When you need contentLoadURL is
when someone types "Go to example.com if you want to see this".
It's also great for when
you get those horrible Facebook or Stumbleupon or Google URLs like
http://www.facebook.com/l/bfd4f/example.com/somelink
and you want just the real link (example.com/somelink),
without the added cruft.
Hacking the jar
Hooray! It turns out the offending code is in browser.js,
so it's hackable without needing to recompile all of Firefox.
You just need to unpack omni.jar,
patch browser.js, then zip up a new omni.jar.
In other words, something like this (on Ubuntu Natty,
starting in an empty directory):
Except, as I was testing this, I discovered: I could make changes and
most of the time Firefox wouldn't see them. I would put in something
obvious like alert("Hello, world");, verify that the alert
was really in omni.jar, run Firefox, click the middle mouse button and --
no alert. Where was Firefox getting the code it was actually running,
if not from omni.jar?
I'll spare you the agonizing details of the hunt and just say that
eventually I discovered that if I ran Firefox from a different profile
on the same machine, I got a different result.
It turns out that if you remove either of two files,
extensions.sqlite and XUL.mfasl,
Firefox4 will re-read the new code in omni.jar.
Removing XUL.mfasl seems to be a little safer: extensions.sqlite
contains some details of which extensions are enabled.
Of course, back up both files first before experimenting with
removing them.
Why these files are keeping a cache of code that's already in omni.jar is
anybody's guess.
The Patch: fix contentLoadURL
Oh, and the change? Mikachu came up with a cleaner fix than
mine, so this is his. It accepts partial URLs like
example.com and also bookmarklet keywords:
Over the past week I've been playing with the MeeGo ExoPC tablet,
experimenting with the various options for building programs.
One advantage MeeGo has over Android or iOS is that there are quite
a lot of language and toolkit options. If you have an existing program,
especially if it runs on Linux, you can probably
find a way to port it to MeeGo without much extra coding.
But since MeeGo is still quite new, not all of the options work quite
as well as you might hope -- yet. I'm sure they'll get better, but
here's what the development climate looks like now.
The documentation turns out to be somewhat incomplete; I'll write
up a howto as a separate article.
But once you get it installed (much easier than installing,
say, Eclipse for Android), it runs nicely. You can use normal Qt, not
a specialized environment, so existing Qt programs should be quick
to port, and the IDE takes care of packaging, copying the app to your
remote device and starting it running. Very painless.
Testing apps locally isn't so easy. They're set up to use QEMU,
which only works if your development machine has hardware virtualization.
Supposedly there's a way to make this work in virtualbox (which doesn't
require hardware virtualization), but the docs don't address how.
Still, testing on the target device is easy.
Unfortunately, not many of my existing programs are C++ and Qt.
I mostly write Python, C, and Javascript/HTML web apps.
Update: I should have mentioned that QtCreator also lets you program in QML,
an XML-like language that lets you design Qt user interfaces and
even write application code.
Web Apps under C++ and QWebView
So what about those web apps?
Since the C++ SDK was so well supported, I figured, why not use a
QWebView so I can run my HTML and Javascript code?
Unfortunately QWebView turns out to be tricky to use, has very
few sample apps available, and so far I've been unable to get it to
work under MeeGo at all.
Nokia Web RunTime
And anyway, for pure web apps, there's a kit explicitly for that:
WRT, the Nokia Web RunTime. But it's pretty heavyweight: the official
download and documentation is all based around Eclipse (for HTML/Javascript
apps? Seriously?) and a required set of Nokia libraries that I had
trouble installing.
But WRT apps are packaged according to the
W3C Widget Packaging and
Configuration specification -- so it's probably possible to
roll your own without the big Nokia WRT package.
More research required.
Python, Qt and PySide
I have a lot of Python apps, so I was very happy to see that
Python comes already
installed on MeeGo. (Perl is also supported; Ruby theoretically
is but it's not installed by default.)
Since Qt is the officially blessed user interface,
Python-Qt seems like the way to go.
Except -- Python in MeeGo has no Qt bindings installed.
The package that provides them is called
PySide, but depending on
where you look you'll either be steered toward a lengthy source
compile on the MeeGo device, or a zypper install from someone's
personal repository.
None of that is difficult for developers, but you can't expect users
to enable experimental repositories or compile from source. Is PySide going to
be a standard part of MeeGo by the time it ships to users? Nobody's saying.
It makes me leery of putting much energy into Python-Qt development
until I hear more.
Python-GTK
A little frustrated with the options so far,
I was idly poking around an interactive Python session and typed
>>> import gtk
And it worked! Even though Qt is the supported UI toolkit and nobody talks
about GTK, it's installed on MeeGo -- complete with Python bindings.
It gave two deprecation warnings, but what's a little deprecation
among friends?
A simple test program ran just fine.
So I whipped up a starter page for
PyTopo,
made a .desktop file and icon for it, copied the three files over
with scp -- and everything worked great.
There's no standard way yet to make a MeeGo RPM from a Python program,
so that will require a little hand-fiddling, but only a little.
And sadly, python-webkit-gtk isn't included,
so this isn't a good solution for porting web apps.
I'll be keeping an eye on PySide and continuing to experiment
with WRT and C++.
But in the meantime, it's great that it's so easy to port all
my existing Python-GTK programs to MeeGo.
Are you a GIMP user or Summer of Code student who's been
wanting to get involved,
but having trouble building, or a bit intimidated by the build process?
I'll be running a session on IRC to help anyone build GIMP
on Linux, as part of the
OpenHatch "Build it"
project.
The session will take place on #gimp on irc.gimp.org (also known as
GimpNet), on Fri, Apr 15, 0300 UTC -- that's Thursday
night in the Americas. To convert to your time zone,
run this command on your local machine:
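One way to do the conversion, assuming GNU date (BSD and Mac date need different flags):

```shell
# Convert Fri, Apr 15, 0300 UTC to your local time zone
# (GNU date; BSD/macOS date uses different options).
date -d '2011-04-15 03:00 UTC'
```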
This is a time that's usually fairly quiet on #gimp -- European users
needn't fret, since it's pretty
easy to get help there at more Europe-friendly times.
I'll hang around for at least two hours; that should be plenty of
time to build GIMP and all its prerequisites.
For folks new to IRC, note that irc.gimp.org is its own server --
this is not the #gimp channel on Freenode. You can learn more about
IRC on the LinuxChix IRC for
Beginners page, or, if you have trouble getting an IRC client
configured, try this link for
mibbit
web chat.
Note: The #gimp IRC channel was recently under attack by trolls,
and it's possible that it may not be usable at the time of the session.
In that case, I will update this blog page with the name of an
alternate channel to use, and any other necessary details.
Preparation
If you want to get your system set up ahead of time, I've put the
instructions needed to build on Ubuntu Lucid and other older Linux
distros here:
Gimp Building
Tips (for Linux).
I might be able to offer a little help with building on Macs,
but no guarantees.
Mac and Windows users, or people running a very old Linux distro
(more than a year old) might want to consider an alternate approach:
install Virtualbox or
VMware and install Ubuntu "Natty
Narwhal" (currently still in beta) in a virtual machine.
Of course, this isn't the only time you can get help with building GIMP.
There are folks around on #gimp most of the time who are happy to
help with problems. But if you've been meaning to get started and
want a good excuse, or you've been holding off on asking for help ...
come hang out with us and try it!
Okay, a webcam is sorta cool, but it's still a pretty easy thing to do
from any laptop. I wanted to demonstrate some lower-level hardware control.
As I mentioned in the previous article, trying to run hardware directly from a plug
computer is an exercise in frustration.
So what do you do when you want to drive low-level hardware?
Use an Arduino, of course!
Add the Arduino
Happily, the sheeva.with-linux
kernels include the FTDI driver you need to talk to an Arduino.
So you can plug the Arduino to the plug computer, then let the Arduino
read the sensor and write its value to the serial port, which you
can read on the plug.
I wrote a very simple Arduino sketch to read the analog output:
lightsensor.pde.
I'm allergic to Java IDEs, so I compiled the sketch from
the commandline using this
lightsensor
Arduino Makefile. Edit the Makefile to point to wherever you've
installed the Arduino software.
Now, on the plug, I needed a Python script to read the numbers coming in on
the serial line. I ran apt-get install python-serial, then wrote
this script:
readsensor.py.
The script loops, reading the sensor and writing the output to an
HTML file called arduino.html. Visit that in a browser from
your desktop or laptop, and watch it reload and change the number as
you wave your hand or a flashlight over the photocell.
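The shape of the script is simple enough to sketch (this is a sketch, not necessarily readsensor.py verbatim; the pyserial setup appears only in a comment so the sketch stays self-contained):

```python
# The shape of readsensor.py (the real script may differ). On the plug,
# you'd create the line reader with pyserial:
#     import serial
#     ser = serial.Serial("/dev/ttyUSB0", 9600)
#     read_line = ser.readline

def write_html(value, path="arduino.html"):
    # Rewrite the page that the browser polls
    with open(path, "w") as f:
        f.write("<html><body><h1>Sensor reading: %s</h1></body></html>"
                % value)

def poll_once(read_line, path="arduino.html"):
    # Read one line from the sensor and publish it
    value = read_line().strip()
    write_html(value, path)
    return value
```

In the real loop you'd call poll_once(ser.readline) every couple of seconds.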
Ultrasonic rangefinder for proximity detection
Pretty cool ... if you're extremely geeky and have no life. Otherwise,
it's maybe a bit limited. But can we use this Arduino technique to
do something useful in combination with the webcam exercise?
How about an
ultrasonic
sonar rangefinder?
The rangefinder comes with a little PC board, and you have to solder
wires to it. I wanted to be able to plug and unplug -- the rangefinder
also has digital outputs and I may want to experiment with those some day.
So I soldered an 8-pin header to the board. (The rangefinder board only
has 7 holes, so I had to cut off the 8th pin on the header.)
I ran power and ground wires to 5v and Gnd on the Arduino, and a wire from
the rangefinder's analog out to the Arduino's Analog In 2. A little
heatshrink keeps the three wires together.
Then I rubber-banded the rangefinder to the front of the webcam,
and I was ready to test.
Use a sketch almost identical to the one for the light sensor:
rangefinder.pde, and its
rangefinder Arduino Makefile.
I used pin 2 so I could leave the light sensor plugged in on Pin 1.
Now I ran that same readsensor.py script, paying attention to the
numbers being printed out. I found that they generally read around 35-40
when I was sitting right in front of it (camera mounted on my laptop
screen), and more like 150-250 when I got out of the way and pointed
it across the room.
So I wrote a script,
proximity.py,
that basically does this:
if data < 45 :
    if verbose :
        print "Snapping photo!"
    os.system("fswebcam --device /dev/video0 -S 1 output.jpg")
It also rewrites the HTML file to display the value it read from the
rangefinder, though that part isn't so important.
Put it all together, and the proximity-sensitive camera
snaps a photo any time something is right in front of it;
otherwise, it keeps displaying the last photo and doesn't snap a
new one. Sample uses: find out who's using your
computer when you're away at lunch, or run a security camera at home,
or set up a camera to snap shots of the exotic wildlife that's
visiting your feeder or research station.
You could substitute an infra-red motion sensor and use it as a
motion-operated security camera or bird feeder camera. I ordered one,
but got bogged down trying to reverse-engineer the sensor (I should
have just ordered one from Adafruit or Sparkfun).
I'm happy to say this all worked pretty well as a demo. But mostly,
it's fun to know that I can plug in virtually any sensor and collect
any sort of data I want. Adding the Arduino makes the plug computer
much more fun and useful.
I was asked to give a talk on plug computers ("sheevaplugs") at a local LUG.
I thought at first I didn't have much to say about them, but after
writing down an outline I realized that wouldn't be a problem.
But plugs aren't that interesting unless you have something fun to
demonstrate. Sure, plugs can run a web server or a file server, but
that doesn't make for a very fun demo -- "woo, look, I'm loading a
web page!" What's more fun? Hardware.
The first step to running any hardware off a plug computer is to get
an upgraded kernel. The kernels that come with these things can't
drive any useful external gizmos.
I've often lamented how the folks who build plug computers
seem oblivious to the fact that a large part of their potential customer
base wants to drive hardware -- temperature and light sensors,
weather stations, garage door openers, servos, whatever.
By not including drivers for GPIO, 1-wire, video and so forth,
they're shutting out anyone who doesn't feel up to building a kernel.
And make no mistake: building a kernel for a sheevaplug is quite a bit
harder than building one for your laptop or workstation. Some of the
hardware isn't supported by fully open source drivers, and most Linux
distros don't offer a cross-compiler that can do the job.
I covered some of the issues in my LinuxPlanet article on
Cross-compiling
Custom Kernels for Plug Computers.
Fortunately, the sheeva.with-linux
kernels include a webcam driver. That seemed like a good start for a demo.
A simple webcam demo
My demo plug is running Debian Squeeze, which has a wealth of webcam
software available. Although there are lots of packages to stream live
video to a web server, they all have a lot of requirements, so
I settled for a simple snapshot program, fswebcam.
The command I needed to snap a photo is:
fswebcam --device /dev/video0 -S 1 output.jpeg
The -S 1 skips a frame to account for the fact that my
cheap and crappy webcam (a Gearhead
Sound FX) tends to return wildly striped green and purple images otherwise.
So I run that in a loop, something like:
while /bin/true; do
  fswebcam --device /dev/video0 -S 1 output.jpeg
  sleep 2
done
Now that I have a continuously updating image,
I need to run some sort of web server on the plug.
Plugs are perfectly capable of running apache or lighttpd or whatever
server you favor. But for this simple demo, I used
a tiny Python server script:
simpleserver.py.
Then all I have to do is make a simple web page that includes <img
src="output.jpeg"> and point my computer at http://192.168.1.102:8080
to see the image. Either refresh the page to see the image update, or
add something like
<meta http-equiv="Refresh" content='2'>
to make it refresh automatically.
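The server just has to hand files in the current directory to the browser.
I haven't reproduced simpleserver.py here, but a stand-in using Python's
standard library would look something like this (the function name and
default port are my choices):

```python
import http.server
import socketserver

def serve(port=8080):
    # Serve files (the web page plus output.jpeg) from the
    # current directory, like any minimal web server.
    handler = http.server.SimpleHTTPRequestHandler
    with socketserver.TCPServer(("", port), handler) as httpd:
        httpd.serve_forever()
```

Run it from the directory where fswebcam is writing its snapshots,
then browse to port 8080 on the plug.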
The next parts of the demo added an Arduino to the mix. But this is
already getting long and I'm out of time ... so the second part of
this demo will follow in a day or two.
Twitter is a bit frustrating when you try to have
conversations there. You say something, then an hour later, someone
replies to you (by making a tweet that includes your Twitter @handle).
If you're away from your computer, or don't happen to be watching
it with an eagle eye right then -- that's it, you'll never see it again.
Some Twitter programs alert you to @ references even if they're old,
but many programs don't.
Wouldn't it be nice if you could be notified regularly if anyone
replied to your tweets, or mentioned you?
Happily, you can. The Twitter API is fairly simple; I wrote
a Python function a while back to do searches in my Twitter app "twit",
based on a code snippet I originally cribbed from Gwibber.
But if you take out all the user interface code from twit and
use just the simple JSON code, you get a nice short app.
The full script is here:
twitref,
but the essence of it is this:
import sys, simplejson, urllib, urllib2

def get_search_data(query):
    s = simplejson.loads(urllib2.urlopen(
        urllib2.Request("http://search.twitter.com/search.json",
                        urllib.urlencode({"q": query}))).read())
    return s

def json_search(query):
    for data in get_search_data(query)["results"]:
        yield data

if __name__ == "__main__" :
    for searchterm in sys.argv[1:] :
        print "**** Tweets containing", searchterm
        statuses = json_search(searchterm)
        for st in statuses :
            print st['created_at']
            print "<%s> %s" % (st['from_user'], st['text'])
            print ""
You can run twitref @yourname from the commandline
now and then. You can even call it as a cron job and mail
yourself the output, if you want to make sure you see replies.
Of course, you can use it to search for other patterns too,
like twitref #vss or twitref #scale9x.
You'll need the simplejson Python library, which most distros offer
as a package; on Ubuntu, install python-simplejson.
It's unclear how long any of this will continue to be supported, since
Twitter recently announced that they disapprove of third-party apps
using their API.
Oh, well ... if Twitter stops allowing outside apps, I'm not sure
how interested I'll be in continuing to use it.
On the other hand, their original announcement on Google Groups seems
to have been removed -- I was going to link to it here and discovered
it was no longer there. So maybe Twitter is listening to the outcry and
re-thinking their position.
Of course, to demonstrate a graphing package I needed some data.
So I decided to plot some stats parsed from my Postfix mail log file.
We bounce a lot of mail (mostly spam but some false positives from
mis-configured email servers) that comes in with bogus HELO
addresses. So I thought I'd take a graphical look at the
geographical sources of those messages.
The majority were from IPs that weren't identifiable at all --
no reverse DNS info. But after that, the vast majority turned out
to be, surprisingly, from .il (Israel) and .br (Brazil).
Surprised me! What fun to get useful and interesting data when I thought
I was just looking for samples for an article.
Three different numbers are chosen at random, and one is written on
each of three slips of paper. The slips are then placed face down on
the table. The objective is to choose the slip upon which is written
the largest number.
Here are the rules: You can turn over any slip of paper and look at
the amount written on it. If for any reason you think this is the
largest, you're done; you keep it. Otherwise you discard it and turn
over a second slip. Again, if you think this is the one with the
biggest number, you keep that one and the game is over. If you
don't, you discard that one too.
What are the odds of winning? The obvious answer is one in three,
but you can do better than that. After thinking about it a little
I figured out the strategy pretty quickly (I won't spoil it here;
follow the link above to see the answer). But the question was:
how often does the correct strategy win?
It made for a good "things to think about when trying to fall
asleep" insomnia game. And I mostly convinced myself that the
answer was 50%. But probability problems are tricky beasts
(witness the Monty
Hall Problem, which even professional mathematicians got wrong)
and I wasn't confident about it. Even after hearing Click and Clack
describe the answer on this week's show, asserting that the answer was 50%,
I still wanted to prove it to myself.
Why not write a simple program? That way I could run lots of
trials and see if the strategy wins 50% of the time.
So here's my silly Python program:
#! /usr/bin/env python

# Cartalk puzzler Feb 2011

import random, time

random.seed()

tot = 0
wins = 0

while True:
    # pick 3 numbers:
    n1 = random.randint(0, 100)
    n2 = random.randint(0, 100)
    n3 = random.randint(0, 100)

    # Always look at but discard the first number.
    # If the second number is greater than the first, stick with it;
    # otherwise choose the third number.
    if n2 > n1 :
        final = n2
    else :
        final = n3

    biggest = max(n1, n2, n3)
    win = (final == biggest)
    tot += 1
    if win :
        wins += 1

    print "%4d %4d %4d %10d %10s %6d/%-6d = %10d%%" % (n1, n2, n3, final,
                                                       str(win),
                                                       wins, tot,
                                                       int(wins*100/tot))
    if tot % 1000 == 0:
        print "(%d ...)" % tot
        time.sleep(1)
It chooses numbers between 0 and 100, for no particular reason;
I could randomize that, but it wouldn't matter to the result.
I made it print out all the outcomes, but pause for a second after
every thousand trials ... otherwise the text scrolls too fast to read.
And indeed, the answer converges very rapidly to 50%. Hurray!
After I wrote the script, I checked Car Talk's website.
They have a good breakdown of all the possible outcomes and how they
map to a probability. Of course, I could have checked that first,
before writing the program.
But I was thinking about this in the car while driving home, with no
access to the web ... and besides, isn't it always more fun to prove
something to yourself than to take someone else's word for it?
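The 50% figure can also be checked by brute force rather than by
sampling: only the relative order of the three numbers matters, so
it's enough to try all six orderings. A quick sketch (not part of the
original script):

```python
from itertools import permutations

wins = 0
for n1, n2, n3 in permutations([1, 2, 3]):
    # Same strategy as above: discard the first number, keep the
    # second if it beats the first, otherwise take the third.
    final = n2 if n2 > n1 else n3
    if final == max(n1, n2, n3):
        wins += 1

# The strategy wins in 3 of the 6 orderings: exactly 50%.
```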
While writing a blog post on GIMP's confusing Auto button (to be
posted soon), I needed some arrows, and discovered a bug in my
Arrow Designer script when making arrows that are mostly vertical.
So I fixed it. You can get the new Arrow Designer 0.5 on my
GIMP
Arrow Designer page.
It's purely a coincidence that I discovered this a week before
SCALE, where I'll be speaking on
Writing
GIMP Scripts and Plug-Ins.
Arrow Designer is one of my showpieces for making interactive
plug-ins with GIMP-Python, so I'm glad I noticed the bug when I did.
I've been enjoying my Android tablet e-reader for a couple of months
now ... and it's made me realize some of the shortcomings in FeedMe.
So of course I've been making changes along the way -- quite a few
of them, from handling multiple output file types (html, plucker,
ePub or FictionBook) to smarter handling of start, end and skip
patterns to a different format of the output directory.
It's been fairly solid for a few weeks now, so it's time to release
... FeedMe 0.7.
Eclipse has been driving me batty with all the extra spaces it adds
everywhere -- blank lines all have indents on them, and lots of
code lines have extra spaces randomly tacked on to the end.
I sure wouldn't want to share files like that with coworkers
or post them as open source.
I found lots of suggestions on the web for eliminating extra whitespace,
and several places to configure this within Eclipse,
but most of them don't do anything.
Here's the one that actually worked:
Window->Preferences
Java->Editor->Save Actions
Enable Perform the selected actions on save.
Enable Additional actions.
Click Configure.
In the Code Organizing tab, enable
Remove trailing whitespace for All lines.
Review all the other options there, since it will all happen automatically
whenever you save -- make sure there isn't anything there you
don't want.
Dismiss the Configure window.
Review the other options under Save Actions, since these will
also happen automatically now.
Don't forget to click Apply in the Save Actions
preference page.
Whew! There are other places to set this, in various Code style
and Cleanup options, but all the others require taking some
action periodically,
like Source->Clean up...
By the way, while you're changing whitespace preferences,
you may also want the
Insert spaces for tabs preference under
General->Editors->Text Editors.
An easy way to check whether you've succeeded in exorcising the
spaces -- eclipse doesn't show them all, even when you tell it to --
is to :set hlsearch in vim, then search for a space.
(Here are some other ways to show
spaces in vim.) In emacs, you can M-x set-variable
show-trailing-whitespace to true, but that
doesn't show spaces on blank lines; for that you might want
whitespace.el
or similar packages.
At work, I'm testing some web programming on a server where we use a
shared account -- everybody logs in as the same user. That wouldn't
be a problem, except nearly all Linuxes are set up to use colors in
programs like ls and vim that are only readable against a dark background.
I prefer a light background (not white) for my terminal windows.
How, then, can I set things up so that both dark- and light-backgrounded
people can use the account? I could set up a script that would set up
a different set of aliases and configuration files, like when I
changed
my vim colors.
Better, I could fix all of them at once by
changing my terminal's idea of colors -- so when the remote machine
thinks it's feeding me a light color, I see a dark one.
I use xterm, which has an easy way of setting colors: it has a list
of 16 colors defined in X resources. So I can change them in ~/.Xdefaults.
That's all very well. But first I needed a way of seeing the existing
colors, so I knew what needed changing, and of testing my changes.
Script to show all terminal colors
I thought I remembered once seeing a program to display terminal colors,
but now that I needed one, I couldn't find it.
Surely it should be trivial to write. Just find the
escape sequences and write a script to substitute 0 through 15, right?
Except finding the escape sequences turned out to be harder than I
expected. Sure, I found them -- lots of them, pages that
conflicted with each other, most giving sequences that
didn't do anything visible in my xterm.
Eventually I used script to capture output from a vim session
to see what it used. It used <ESC>[38;5;Nm to set color
N, and <ESC>[m to reset to the default color.
This more or less agreed with Wikipedia's
ANSI
escape code page, which says <ESC>[38;5; does "Set xterm-256
text color" with a note "Dubious - discuss". The discussion says this
isn't very standard. That page also mentions the simpler sequence
<ESC>[0;Nm to set the
first 8 colors.
Okay, so why not write a script that shows both? Like this:
#! /usr/bin/env python

# Display the colors available in a terminal.

print "16-color mode:"
for color in range(0, 16) :
    for i in range(0, 3) :
        print "\033[0;%sm%02s\033[m" % (str(color + 30), str(color)),
    print

# Programs like ls and vim use the first 16 colors of the 256-color palette.
print "256-color mode:"
for color in range(0, 256) :
    for i in range(0, 3) :
        print "\033[38;5;%sm%03s\033[m" % (str(color), str(color)),
    print
Voilà! That shows the 8 colors I needed to see what vim and ls
were doing, plus a lovely rainbow of other possible colors in case I ever
want to do any serious ASCII graphics in my terminal.
Changing the X resources
The next step was to change the X resources. I started
by looking for where the current resources were set, and found them
in /etc/X11/app-defaults/XTerm-color:
$ grep color /etc/X11/app-defaults/XTerm-color
[... irrelevant stuff snipped ...]
*VT100*color0: black
*VT100*color1: red3
*VT100*color2: green3
*VT100*color3: yellow3
*VT100*color4: blue2
*VT100*color5: magenta3
*VT100*color6: cyan3
*VT100*color7: gray90
*VT100*color8: gray50
*VT100*color9: red
*VT100*color10: green
*VT100*color11: yellow
*VT100*color12: rgb:5c/5c/ff
*VT100*color13: magenta
*VT100*color14: cyan
*VT100*color15: white
! Disclaimer: there are no standard colors used in terminal emulation.
! The choice for color4 and color12 is a tradeoff between contrast, depending
! on whether they are used for text or backgrounds. Note that either color4 or
! color12 would be used for text, while only color4 would be used for a background.
! Originally color4/color12 were set to the names blue3/blue
!*VT100*color4: blue3
!*VT100*color12: blue
!*VT100*color4: DodgerBlue1
!*VT100*color12: SteelBlue1
So all I needed to do was take the ones that don't show up well --
yellow, green and so forth -- and change them to colors that work
better, choosing from the color names in /etc/X11/rgb.txt
or my own RGB values. So I added lines like this to my ~/.Xdefaults:
!! color2 was green3
*VT100*color2: green4
!! color8 was gray50
*VT100*color8: gray30
!! color10 was green
*VT100*color10: rgb:00/aa/00
!! color11 was yellow
*VT100*color11: dark orange
!! color14 was cyan
*VT100*color14: dark cyan
... and so on.
Now I can share accounts, and
I no longer have to curse at those default ls and vim settings!
Update: Tip from Mikachu: ctlseqs.txt
is an excellent reference on terminal control sequences.
I had a nice relaxing holiday season. A little too relaxing -- I
didn't get much hacking done, and spent more time fighting with
things that didn't work than making progress fixing things.
But I did spend quite a bit of time with my laptop,
currently running Arch Linux,
trying to get the fonts to work as well as they do in Ubuntu.
I don't have a definite solution yet to my Arch font issues,
but all the fiddling with fonts did lead me to realize that
I needed an easier way to preview specific fonts in bold.
So I added Bold and Italic buttons to
fontasia,
and called it Fontasia 0.5. I'm finding it quite handy for previewing
all my fixed-width fonts while trying to find one emacs can display.
I wrote yesterday about
my quest
for an app for reading news feeds
and other timely information from the web.
And how existing ebook readers didn't meet that need.
That meant I would have to write something.
Android development is done in Java,
using Eclipse as an IDE. Let me just state up front that (a) I dislike
Java (and have forgotten most of what I once knew about it) and
(b) I hate IDEs -- they make you use their crippled editor instead of
your own, they control what you can put where on the screen, and
they're always popping up windows that get in your way.
Okay, not an auspicious beginning. But let's try to be open-minded,
follow the instructions and see what happens.
I installed Eclipse from eclipse.org
after being advised that the version on Ubuntu is out of date. Then I
installed all the various Android plug-ins and SDKs and set them up (there
is no single page that lists all the steps, so I did a lot of
googling). It took maybe an hour or so to get it all installed
to the point where it could produce its "Hello world".
And then ... wow! Hello world worked right off the bat, and the
emulator worked for basic testing.
Hmm, okay ... how about if we use HTML as a format ... is there
a standard HTML display component? Sure enough -- I added a WebView to my
app and right away I had a working HTML reader. Okay, how about a row
of buttons and a status bar on top? Sure, no problem.
The standard Android online docs aren't great -- they're a wonderful
example of how to write seemingly comprehensive documentation that
somehow manages to tell you nothing of what you actually need to
know. But that's not as bad as it sounds, because there are lots of
forums and tutorials to fill in the gaps.
Stack Overflow
is particularly good for Android tips.
And yes, I did some swearing at Eclipse and spent too much time
googling how to disable features, like the "Content Assist" that
periodically freezes the whole UI for a minute or so in the middle of
your typing a line of code, while it thinks about some unhelpful and
irrelevant hints to offer you in a popup.
Turn it off in the prefs under Java/Editor.
(Eclipse's actual useful hints, like the ones you get when you
hover over something that's red because of an error, will still work.
I found them very helpful.)
More specifically: Java/Editor/Content Assist/Hovers, and turn off
Combined Hover and maybe anything else that happens without a
modifier key. You probably also want to turn off Enable Auto
Activation under Java/Editor/Content Assist. And possibly others
-- I kept turning things off until the popups and delays went away,
and I haven't found anything explaining how all these parameters relate.
Okay, so there were snags, and it's frustrating how there are almost no
open source apps for this open source OS. (Yes, my app will be.)
But here's the thing: in about 4 days, starting from nothing,
I had a little RSS reader that did everything I needed.
I've been adding features since then. Android doesn't have a reasonable
battery status monitor? Fine, my reader can show battery
percentage in the status bar. Android doesn't dim the screen enough?
Fine, I can dim it further inside the application (an idea borrowed from
Aldiko).
After less than a week of work I have an RSS reader that's better than my
Palms running Plucker ever were. And that says a lot about the ease of
the Android programming environment. I'm impressed!
Update: The source, an apk, and a brief discussion of how I use
my feed reader are now here:
FeedViewer.
I've been doing some Android development, using the standard Eclipse
development tools. A few days ago, I pasted some code that included
a comment about different Android versions, and got a surprise:
What do you think -- should I change all the "Android" references
to "Undried"?
Last week I found myself writing another article that includes code
snippets in HTML.
So what, you ask? The problem is, when you're writing articles in HTML,
every time you include a code snippet inside a <pre> tag you
invariably forget that special characters like < > & have
special meanings in HTML, and must be escaped. Every < has to
change to &lt;, and so forth, after you paste the code.
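The substitution itself is trivial in any language; here's the idea as
a quick Python sketch (the function name is mine; the point is that &
must be escaped first):

```python
def escape_html(code):
    # Escape & first, or the &'s introduced by the other
    # substitutions would get escaped again.
    return (code.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))

# escape_html("if (a < b && c > 0)") -> "if (a &lt; b &amp;&amp; c &gt; 0)"
```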
In vi/vim, replacing characters is straightforward. But I usually
write longer articles in emacs, for various unimportant reasons,
and although emacs has global replace, it only works from wherever
you are now (called "point" in emacs lingo) to the end of the file.
So if you're trying to fix something you pasted in the middle of the
article, you can't do it with normal emacs replace.
Surely this is a wheel that has already been re-invented a thousand
times, I thought! But googling and asking emacs experts turned up nothing.
Looks like I'd have to write it.
And that turned out to be more difficult than I expected, for the same
reason: emacs replace-string works the same way from a
program as it does interactively, and replaces from point to the end
of the file, and there's no way to restrict it to a more limited range.
Several helpful people on #emacs chimed in with ideas, but most of
them didn't pan out. But ggole knew a way to do it that was both
clean and reliable (thanks!).
The elisp function I ended up with uses save-excursion
to put the cursor back where it started before you ran the function,
narrow-to-region to make replace-string work
only on the region, and save-restriction to get rid of that
narrow-to-region after we're done. Nice!
My last entry mentioned some work I'd done to one of my mapping programs,
Ellie, to gather statistics from the track logs I get from my Garmin GPS.
In the course of working on Ellie, I discovered something
phenomenally silly about the GPX files from my Garmin Vista CX,
as uploaded with gpsbabel.
Track log points, quite reasonably, have time stamps in "Zulu time"
(essentially the same as GMT, give or take some fraction of a second).
They look like this:
But the waypoints you set for specific points of interest, even if
they're in the same GPX file, have timestamps that have no time zone at all.
They look like this:
Notice the waypoint's time isn't actually in a time field -- it's
duplicated in two fields, cmt (comment) and desc (description).
So it's not really intended to be a time stamp -- but it sure would be
handy if you could use it as one.
You might be able to correlate waypoints with track points by
comparing coordinates ... unless you spent more than an hour hanging
around a particular location, or came back several hours later (perhaps
starting and ending your hike at the same place). In that case ...
you'd better know what the local time zone was, including daylight
savings time.
What a silly omission, considering that the GPS obviously already knows
the Zulu time and could just as easily use that!
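For what it's worth, the Zulu timestamps are at least unambiguous to
parse. A sketch in Python (the timestamp value here is made up for
illustration, not taken from my GPX files):

```python
from datetime import datetime, timezone

# A track point's Zulu time pins down one exact moment:
t = datetime.strptime("2011-03-28T17:25:18Z", "%Y-%m-%dT%H:%M:%SZ")
t = t.replace(tzinfo=timezone.utc)

# A zone-less waypoint time can't be converted to a real moment
# without knowing the local zone and whether daylight savings was
# in effect -- which is exactly the problem described above.
```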
On our recent Mojave trip, as usual I spent some of the evenings
reviewing maps and track logs from some of the neat places we explored.
There isn't really any existing open source program for offline
mapping, something that works even when you don't have a network.
So long ago, I wrote Pytopo,
a little program that can take map tiles from a Windows program called
Topo! (or tiles you generate yourself somehow) and let you navigate
around in that map.
But in the last few years, a wonderful new source of map tiles has
become available: OpenStreetMap.
On my last desert trip, I whipped up some code to show OSM tiles, but
a lot of the code was hacky and empirical because I couldn't find any
documentation for details like the tile naming scheme.
Well, that's changed. Upon returning to civilization I discovered
there's now a wonderful page explaining the
Slippy
map tilenames very clearly, with sample code and everything.
And that was the missing piece -- from there, all the things I'd
been missing in pytopo came together, and now it's a useful
self-contained mapping script that can download its own tiles, and
cache them so that when you lose net access, your maps don't disappear
along with everything else.
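The tile-naming scheme that page documents is compact enough to show as
a sketch (this is the standard conversion from the OSM wiki, not
pytopo's actual code):

```python
import math

def deg2tile(lat_deg, lon_deg, zoom):
    # Slippy-map tiles use a spherical-Mercator grid of
    # 2^zoom by 2^zoom tiles; a tile's URL is built from zoom/x/y.
    n = 2 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

# Tiles are then fetched from URLs of the form
# http://tile.openstreetmap.org/zoom/xtile/ytile.png
```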
Pytopo can show GPS track logs and waypoints, so you can see where you
went as well as where you might want to go, and whether that road off
to the right actually would have connected with where you thought you
were heading.
Most of the pytopo work came after returning from the desert, when I
was able to google and find that OSM tile naming page. But while still
out there and with no access to the web, I wanted to review the track
logs from some of our hikes and see how much climbing we'd done.
I have a simple package for plotting elevation from track logs,
called Ellie.
But when I ran it, I discovered that I'd never gotten around to
installing the pylab Python plotting package (say that three times
fast!) on this laptop.
No hope of installing the package without a net ... so instead, I
tweaked Ellie so that without pylab you can still print out
statistics like total climb. While I was at it I added total distance,
time spent moving and time spent stopped. Not a big deal, but it gave
me the numbers I wanted. It's available as ellie 0.3.
Part II of my CouchDB tutorial is out at Linux Planet.
In it, I use Python and CouchDB to write a simple application
that keeps track of which restaurants you've been to recently,
and to suggest new places to eat where you haven't been.
Dave was using some old vacation photos to test filesystem performance,
and that made me realize that I had beautiful photos from the same
trip that I hadn't yet turned into desktop backgrounds.
Sometimes I think that my
GIMP Wallpaper
script is the most useful of the GIMP plug-ins I've written.
It's such a simple thing ... but I bet I use it more than any of
my other plug-ins, and since I normally make backgrounds for at least
two resolutions (my 1680x1050 desktop and my 1366x768 laptop),
it certainly saves me a lot of time and hassle.
But an hour into my background-making, I started to have nagging doubts.
I wasn't renaming these images, just keeping the original filenames
from the camera, like pict0828.jpg. What if some of these
were overwriting images of the same name?
The one thing my script doesn't do is check for that, and
gimp_file_save doesn't pop up any warnings.
I've always meant to add a check for it.
Of course, once the doubts started, I had to stop generating backgrounds
and start generating code. And I'm happy with the result:
wallpaper-0.4.py warns and won't let you save over an old background
image, but keeps all the logic in one dialog rather than popping up
extra warnings.
Now I can generate backgrounds without worrying that I'm stomping on
earlier ones.
I've been learning CouchDB, the hot NoSQL database, as part of my
new job. It's interesting -- a very different mindset compared to
classic databases like MySQL.
There's a fairly good Python package for it, python-couchdb ...
but the documentation is somewhat incomplete and there's very little
else written about it, and virtually no sample code to steal.
That makes it a perfect topic for a Linux Planet tutorial!
So here it is, Part 1:
I have a rather fun application for the database I introduce in the
article, but you'll have to wait until Part 2, two weeks from now,
to see the details.
A couple of weeks ago I posted about
fontasia,
my new font-chooser app.
It's gone through a couple of revisions since then, and Mikael Magnusson
contributed several excellent improvements, like being able to
render each font in the font list.
I'd been holding off on posting 0.3, hoping to have time to do
something about the font buttons -- they really need to be smaller,
so there's space for more categories. But between a new job and
several other commitments, I haven't had time to implement that.
And the fancy font list is so cool it really ought to be shared.
We were talking about fonts again on IRC, and how there really isn't
any decent font viewer on Linux that lets you group fonts into categories.
Any time you need to choose a font -- perhaps you know you need one
that's fixed-width, script, cartoony, western-themed --
you have to go through your entire font list, clicking
one by one on hundreds of fonts and saving the relevant ones somehow
so you can compare them later. If you have a lot of fonts installed,
it can take an hour or more to choose the right font for a project.
There's a program called fontypython that does some font categorization,
but it's hard to use: it doesn't operate on your installed fonts, only
on fonts you copy into a special directory. I never quite understood
that; I want to categorize the fonts I can actually use on my system.
I've been wanting to write a font categorizer for a long time, but
I always trip up on finding documentation on getting Python to render fonts.
But this time, when I googled, I found jan bodnar's
ZetCode
Pango tutorial, which gave me all I needed and I was off and running.
Fontasia is initially a font viewer. It shows all your fonts in a list
on the left, with a preview on the right. But it also lets you add
categories: just type the category name in the box and click
Add category and a button for that category will appear,
with the current font added to it. A font can be in multiple categories.
Once you've categorized your fonts, a menu at the top of the window
lets you show just the fonts in a particular category. So if you're
working on a project that needs a Western-style font, show that
category and you'll see only relevant fonts.
You can also show only the fonts you've categorized -- that way you can
exclude fonts you never use -- I don't speak Tamil or Urdu so I don't
really need to see those fonts when I'm choosing a font. Or you can
show only the uncategorized fonts: this is useful when you add
some new fonts to your system and need to go through them and categorize
them.
I'm excited about fontasia. It's only a few days old, and I've already
used it several times for real-world font selection problems.
I found that CHDK scripting wasn't quite as good as I'd hoped -- some
of the functions, especially the aperture and shutter setting, were
quite flaky on my A540 so it really didn't work to write a bracketing
script. But it's fantastic for simple tasks like time-lapse photography,
or taking a series of shots like the Grass Roots Mapping folk do.
If you're at OSCON and you like scripting and photos, check out my
session on Thursday afternoon at 4:30:
Writing
GIMP Plug-ins and Scripts, in which I'll walk through several GIMP
scripts in Python and Script-Fu and show some little-known tricks
you can do with Python plug-ins.
How many times have you wanted an easy way of making arrows in GIMP?
I need arrows all the time, for screenshots and diagrams. And there
really isn't any easy way to do that in GIMP. There's a script-fu for
making arrows in the Plug-in registry,
but it's fiddly and always takes quite a few iterations to get it right.
More often, I use a collection of arrow brushes I downloaded from somewhere
-- I can't remember exactly where I got my collection, but there are
lots of options if you google gimp arrow brushes -- then
use the free rotate tool to rotate the arrow in the right direction.
The topic of arrows came up again on #gimp yesterday, and Alexia Death
mentioned her script-fu in
GIMP Fx Foundary
that "abuses the selection" to make shapes, like stars and polygons.
She suggested that it would be easy to make arrows the same way, using
the current selection as a guide to where the arrow should go.
And that got me thinking about Joao Bueno's neat Python plug-in demo that
watches the size of the selection and updates a dialog every time the
selection changes. Why not write an interactive Python script that
monitors the selection and lets you change the arrow by changing the
size of the selection, while fine-tuning the shape and size of the
arrowhead interactively via a dialog?
Of course I had to write it. And it works great! I wish I'd written
this five years ago.
This will also make a great demo for my OSCON 2010 talk on
Writing
GIMP Scripts and Plug-ins, Thursday July 22. I wish I'd had it for
Libre Graphics Meeting last month.
I needed a way to send the output of a Python program to two places
simultaneously: print it on-screen, and save it to a file.
Normally I'd use the Linux command tee for that:
prog | tee prog.out saves a copy of the output to the
file prog.out as well as printing it. That worked fine until
I added something that needed to prompt the user for an answer.
That doesn't work when you're piping through tee: the output gets
buffered and doesn't show up when you need it to, even if you try
to flush() it explicitly.
I investigated shell-based solutions: the output I need is on
stderr, while Python's raw_input() user prompt uses stdout, so
if I could get the shell to send stderr through tee without stdout,
that would have worked. My preferred shell, tcsh, can't do this at all,
but bash supposedly can. But the best examples I could find on the
web, like the arcane
prog 2>&1 >&3 3>&- | tee prog.out 3>&-
didn't work.
I considered using /dev/tty or opening a pty, but those calls only work
on Linux and Unix and the program is otherwise cross-platform.
What I really wanted was a class that acts like a standard
Python file object,
but when you write to it it writes to two places: the log file and stderr.
I am greatly indebted to KirkMcDonald of #python for finding the problem.
In the Python source implementing >>,
PyFile_WriteObject (line 2447) checks the object's type, and if it's
subclassed from the built-in file object, it writes
directly to the object's fd instead of calling
write().
The solution is to use composition rather than inheritance. Don't make your
file-like class inherit from file, but instead include a
file object inside it. Like this:
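A minimal sketch of such a class (the name TeeFile and the details are my reconstruction, not the post's verbatim code):

```python
import sys

class TeeFile:
    # Composition, not inheritance: keep a real file object inside,
    # rather than subclassing the built-in file type.
    def __init__(self, filename, stream=sys.stderr):
        self.file = open(filename, "w")   # the contained log file
        self.stream = stream              # usually stderr

    def write(self, text):
        self.file.write(text)             # copy to the log file ...
        self.stream.write(text)           # ... and to the screen

    def flush(self):
        self.file.flush()
        self.stream.flush()

    def close(self):
        self.file.close()
```

Presumably the program then points its error stream at a TeeFile instance, so every message lands in both places.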
And it works! print >>sys.stderr, "Hello, world" now
goes to the file as well as stderr, and raw_input still
works to prompt the user for input.
In general, I'm told, it's not safe to inherit from
Python's built-in objects like file, because they tend
to make assumptions instead of making virtual calls to your
overridden methods. What happened here will happen for other objects too.
So use composition instead when extending Python's built-in types.
I spent a morning wrestling with git after writing a minor GIMP fix
that I wanted to check in.
Deceptively simple ideas, like "Check the git log to see the expected
format of check-in messages", turned out to be easier said than done.
Part of the problem was git's default colors: colors calculated to be
invisible to anyone using a terminal with dark text on a light background.
And that sent me down the perilous path of git configuration.
git-config
does have a manual page. But it lacks detail: you can't get
from there to knowing what to change so that the first line of commits
in git log doesn't show up yellow.
But that's okay, thought I: all I need to do is list the default
settings, then change anything that's a light color like yellow to
a darker color. Easy, right?
Well, no. It turns out there's no way to get the default settings --
because they aren't part of git's config; they're hardwired into the
C code.
But you can find most of them with a
search
for GIT_COLOR in the source.
The most useful lines are the ones in diff.c, builtin-branch.c and
wt-status.c.
gitconfig
The next step is to translate those C lines to git preferences,
something you can put in a .gitconfig.
Here's a list of all the colors mentioned in the man page,
and their default values -- I used "normal" for grep and
interactive where I wasn't sure of the defaults.
[color "diff"]
plain = normal
meta = bold
frag = cyan
old = red
new = green
commit = yellow
whitespace = normal red
[color "branch"]
current = green
local = normal
remote = red
plain = normal
[color "status"]
header = normal
added = red
updated = green
changed = red
untracked = red
nobranch = red
[color "grep"]
match = normal
[color "interactive"]
prompt = normal
header = normal
help = normal
error = normal
The syntax and colors are fairly clearly explained in the manual:
allowable colors are normal, black, red, green,
yellow, blue, magenta, cyan and white. After the foreground color,
you can optionally list a background color. You can also list an
attribute, chosen from bold, dim, ul, blink and reverse --
only one at a time, no combining of attributes.
So if you really wanted to, you could say something like
[color "status"]
header = normal blink
added = magenta yellow
updated = green reverse
changed = red bold
untracked = blue white
nobranch = red white bold
Minimal changes for light backgrounds
What's the minimum you need to get everything readable?
On the light grey background I use, I needed to change the yellow, cyan
and green entries:
[color "diff"]
frag = cyan
new = green
commit = yellow
[color "branch"]
current = green
[color "status"]
updated = green
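Rather than editing .gitconfig by hand, each of these entries can also be set with the git config command; the section name and key join with dots. A sketch (blue is just one example of a color that reads well on a light background):

```shell
# An entry like "frag" under [color "diff"] becomes the key color.diff.frag
git config --global color.diff.frag blue
git config --global color.diff.commit blue
git config --global color.branch.current blue
git config --global color.status.updated blue
```
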
Disclaimer: I haven't tested all these settings -- because I haven't
yet figured out where all of them apply. That's another area where the
manual is a bit short on detail ...
I've written in the past about Python GUI programming using the GTK and
Tk toolkits, and several KDE fans felt that I was slighting the much
nicer looking Qt.
I didn't want to dwell on it in the article (and didn't have space anyway),
but pyqt turned out to be a bit of a pain.
There's no official documentation -- or at least nothing
that's obviously official -- and a lot of
the examples on google are out of date because of API changes.
None of the tutorial examples explain much, and they never demonstrate
the practical features I'd want in a real app.
It was surprisingly hard to come up with an application idea
that worked well, looked good and was still easy to explain.
And don't get me started on this whole "Slots and signals are
revolutionarily different even though they look just like the callbacks
every other toolkit has used for the last three decades" meme.
I'm sure there is a subtle technical difference -- but if there's
a difference that matters to the average UI programmer, their
documentation sure doesn't make it clear.
All that aside, PyQt (and Qt in general) does produce very pretty apps
and is worth trying for that reason.
The suit images in the article are adapted from some suits I found on
Wikimedia Commons
(the "Naipe" set).
I wanted them to look more 3-dimensional, so I applied my
blobipy GIMP
script as well as scaling and resizing them.
I really liked those shiny-looking Tango heart and spade emblems (also
on the Wikimedia Commons page) but I couldn't find a diamond or club
to match.
The poker program I wrote has menus and a second round of dealing,
where you can mark off the cards you want to keep.
I couldn't fit all that in a 700-word article, but
the complete program is available here:
qpoker.py
or you can get it in a tarball along with the suit images at
qpoker.tar.gz.
My Linux Planet article last week was on
printing
pretty calendars.
But I hit one bug in Photo
Calendar.
It had an HTML file chooser for picking an image ...
and when I chose an image and clicked Select to use it,
it got the pathname wrong every time.
I poked into the code (Photo Calendar's code turned out to be
exceptionally clean and well documented) and found that it was
expecting to get the pathname from the file input element's
value attribute. But input.File.value was just
returning the filename, foo.jpg, instead of the full pathname,
/home/user/Images/yosemite/foo.jpg. So when the app tried
to make it into a file:/// URL, it ended up pointing to the
wrong place.
It turned out the cause was a
security
change in Firefox 3. The issue: it's considered a security
hole to expose full pathnames on your computer to Javascript code
coming from someone else's server. The Javascript could give bad
guys access to information about the directory structures on your disk.
That's a perfectly reasonable concern, and it makes sense to consider
it as a security hole.
The problem is that this happens even when you're running a local app
on your local disk. Programs written in any other language and toolkit
-- a Python program using pygtk, say, or a C++ Qt program -- have
access to the directories on your disk, but you can't use Javascript
inside Firefox to do the same thing.
The only way to make an exception seems to be an elaborate procedure
requiring the user to change settings in about:config.
Not too helpful.
Perhaps this is even reasonable, given how common
cross-site scripting bugs have been in browsers lately -- maybe
running a local script really is a security risk if you have other
tabs active.
But it leaves us with the problem of what to do about apps that need
to do things like choose a local image file, then display it.
Is there a way around the restriction? It turns out there is: a data URL.
Take the entire contents of the file (ouch) and create a URL out of those
contents, then set the src attribute of the image to that.
Of course, that makes for a long, horrifying, unreadable URL --
but the user never has to see that part.
I suspect it's also horribly memory intensive -- the image has to be
loaded into memory anyway, to display it, but is Firefox also
translating all of that to a URL-legal syntax? Obviously, any real
app using this technique had better keep an eye on memory consumption.
But meanwhile, it fixes Photo Calendar's file button.
We just had the second earthquake in two days, and I was chatting with
someone about past earthquakes and wanted to measure the distance to
some local landmarks. So I fired up
PyTopo as the easiest way
to do that. Click on one point, click on a second point and it prints
distance and bearing from the first point to the second.
Except it didn't. In fact, clicks weren't working at all. And although
I have hacked a bit on parts of pytopo (the most recent project was
trying to get scaling working properly in tiles imported from OpenStreetMap),
the click handling isn't something I've touched in quite a while.
It turned out that there's a regression in PyGTK: mouse button release
events now need you to set an event mask for button presses as well as
button releases. You need both, for some reason. So you now need code
that looks like this:
drawing_area.connect("button-release-event", button_event)
drawing_area.set_events(gtk.gdk.EXPOSURE_MASK |
                        # next line wasn't needed before:
                        gtk.gdk.BUTTON_PRESS_MASK |
                        gtk.gdk.BUTTON_RELEASE_MASK )
An easy fix ... once you find it.
I filed
bug 606453
to see whether the regression was intentional.
I've checked in the fix to the
PyTopo
svn repository on Google Code.
It's so nice having a public source code repository like that!
I'm planning to move Pho to Google Code soon.
While debugging Javascript, I've occasionally come across references
to a useful function called console.log. Supposedly you
can log errors with a line like:
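For instance (the message itself is arbitrary):

```javascript
console.log("entered the click handler");
```
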
Since the only native way of logging debug statements in Javascript
is with a pop-up alert() box, having a less obtrusive way to print
is something any JS programmer could use.
The catch? It didn't do anything -- except print
console is not defined.
Today a friend was crowing about how wonderful Javascript debugging
was in Google's Chrome browser -- because it has functions like
console.log.
After some searching and poking around, we determined that Firefox
also has console.log -- it's just well hidden and a bit
hard to get going.
First, you need the Firebug extension. If you're developing Javascript,
you probably already have it. If not, you need it.
Run Firebug and click to the Console tab. Now click on the
tiny arrow that shows up at the right edge of that tab, as shown.
Turns out there's a whole menu of options under there -- one of
which is Enabled.
But wait, that's not all. In my case, the console was already
Enabled according to the menu. To get the console working,
I had to
Disable the console
Re-enable it
Shift-reload the page being debugged
My friend also said that if she didn't enable the console in Firebug,
then her script died when she called console.log.
That didn't happen for me -- all that happened was that I got error
messages in the error console (the one accessed from Firefox's
Tools menu, different from the Firebug console). But it's
a good idea to check for its existence if you're going to use
debugging statements in your code. Like this:
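A guard along these lines works (the exact style is a matter of taste):

```javascript
// Only call console.log if the browser actually provides it:
if (typeof console !== "undefined" && console.log) {
    console.log("page loaded, starting init");
}
```
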
Continuing the discussion of those funny characters you sometimes
see in email or on web pages, today's Linux Planet article
discusses how to convert and handle encoding errors, using
Python or the command-line tool recode:
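To give the flavor of the Python approach (Python 3 syntax; the sample string is mine, not from the article): a string that was decoded with the wrong charset can often be repaired by undoing the bad decode and decoding again.

```python
# Bytes that are really UTF-8 but were wrongly decoded as Latin-1
# show up as mojibake; re-encode, then decode with the right charset:
mangled = "Mojibake: \xe2\x80\x9cquoted\xe2\x80\x9d"
repaired = mangled.encode("latin-1").decode("utf-8")
print(repaired)   # -> Mojibake: “quoted”
```
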
I almost always write my
presentation
slides using HTML. Usually I use Firefox to present them; it's
the browser I normally run, so I know it's installed and the slides
all work there. But there are several disadvantages to using Firefox:
In fullscreen mode, it has a small "minimized urlbar" at the
top of the screen that I've never figured out how to banish -- not only
is it visible to users, but it also messes up the geometry of
the slides (they have to be 762 pixels high rather than 768);
It's very heavyweight, bad when using a mini laptop or netbook;
Any personal browsing preferences, like no-animation,
flashblock or noscript, apply to slides too unless explicitly
disabled, which I've forgotten to do more than once before a talk.
Last year, when I was researching lightweight browsers, one of the
ones that impressed me most was something I didn't expect: the demo
app that comes with
pywebkitgtk
(package python-webkit on Ubuntu).
In just a few lines of Python, you can create your own browser with
any UI you like, with a fully functional content area.
Their current demo even has tabs.
So why not use pywebkitgtk to create a simple fullscreen
webkit-based presentation tool?
It was even simpler than I expected. Here's the code:
#!/usr/bin/env python
# python-gtk-webkit presentation program.
# Copyright (C) 2009 by Akkana Peck.
# Share and enjoy under the GPL v2 or later.
import sys
import gobject
import gtk
import webkit
class WebBrowser(gtk.Window):
    def __init__(self, url):
        gtk.Window.__init__(self)
        self.fullscreen()
        self._browser = webkit.WebView()
        self.add(self._browser)
        self.connect('destroy', gtk.main_quit)
        self._browser.open(url)
        self.show_all()

if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print "Usage:", sys.argv[0], "url"
        sys.exit(0)

    gobject.threads_init()
    webbrowser = WebBrowser(sys.argv[1])
    gtk.main()
That's all! No navigation needed, since the slides include javascript
navigation to skip to the next slide, previous, beginning and end.
It does need some way to quit (for now I kill it with ctrl-C)
but that should be easy to add.
Webkit and image buffering
It works great. The only problem is that webkit's image loading turns out
to be fairly poor compared to Firefox's. In a presentation where most
slides are full-page images, webkit clears the browser screen to
white, then loads the image, creating a noticeable flash each time.
Having the images in cache, by stepping through the slide show then
starting from the beginning again, doesn't help much (these are local
images on disk anyway, not loaded from the net). Firefox loads the
same images with no flash and no perceptible delay.
I'm not sure if there's a solution. I asked some webkit developers and
the only suggestion I got was to rewrite the javascript in the slides
to do image preloading. I'd rather not do that -- it would complicate
the slide code quite a bit solely for a problem that exists only in
one library.
There might be some clever way to hack double-buffering in the app code.
Perhaps something like catching the 'load-started' signal,
switching to another gtk widget that's a static copy of the current
page (if there's a way to do that), then switching back on 'load-finished'.
But that will be a separate article if I figure it out. Ideas welcome!
Update, years later: I've used this for quite a few real presentations now.
Of course, I keep tweaking it: see
my scripts page
for the latest version.
Helping people get started with Linux shells, I've noticed they
tend to make two common mistakes vastly more than any others:
Typing a file path without a slash, like etc/fstab
Typing just a filename, without a command in front of it
The first boils down to a misunderstanding of how the Linux file
system hierarchy works. (For a refresher, you might want to check out
my Linux Planet article
Navigating
the Linux Filesystem.)
The second problem is due to forgetting the rules of shell grammar.
Every shell sentence needs a verb, just like every sentence in English.
In the shell, the command is the verb: what do you want to do?
The arguments, if any, are the verb's direct object:
What do you want to do it to?
(For grammar geeks, there's no noun phrase for a subject because shell
commands are imperative. And yes, I ended a sentence with a preposition,
so go ahead and feel superior if you believe that's incorrect.)
The thing is, both mistakes are easy to make, especially when you're
new to the shell, perhaps coming from a "double-click on the file and let
the computer decide what you should do with it" model. The shell model
is a lot more flexible and (in my opinion) better -- you, not
the computer, gets to decide what you should do with each file --
but it does take some getting used to.
But as a newbie, all you know is that you type a command and get some
message like "Permission denied." Why was permission denied? How are
you to figure out what the real problem was? And why can't the shell
help you with that?
And a few days ago I realized ... it can! Bash, zsh and
similar shells have a fairly flexible error handling mechanism.
Ubuntu users have seen one part of this, where if you type a command
you don't have installed, Ubuntu gives you a fancy error message
suggesting what you might have meant and/or what package you might
be missing:
$ catt /etc/fstab
No command 'catt' found, did you mean:
Command 'cat' from package 'coreutils' (main)
Command 'cant' from package 'swap-cwm' (universe)
catt: command not found
What if I tapped into that same mechanism and wrote a more general
handler that could offer helpful suggestions when it looked like the user
forgot the command or the leading slash?
It turns out that Ubuntu's error handler uses a ridiculously specific
function called command_not_found_handle that can't be used for
other errors. Some helpful folks I chatted with on #bash felt, as I
did, that such a specific mechanism was silly. But they pointed me to
a more general error trapping mechanism that turned out to work fine
for my purposes.
It took some fussing and fighting with bash syntax, but I have a basic
proof-of-concept. Of course it could be expanded to cover a lot more
types of error cases -- and more types of files the user might want
to open.
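As a toy illustration of the mechanism (this one uses bash's command_not_found_handle hook for brevity; my actual script uses the more general error trap, which also catches cases like "Permission denied"):

```shell
# Runs whenever bash fails to find a command.  If the "command" is
# really a file in the current directory, suggest a viewer for it.
command_not_found_handle() {
    local cmd=$1
    if [ -f "$cmd" ]; then
        case "$cmd" in
            *.html) echo "$cmd is an HTML file. Did you want to run: firefox $cmd" >&2 ;;
            *)      echo "$cmd is a file. Did you want to run: less $cmd" >&2 ;;
        esac
    else
        echo "bash: $cmd: command not found" >&2
    fi
    return 127
}
```
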
Here are some sample errors it catches:
$ schedule.html
bash: ./schedule.html: Permission denied
schedule.html is an HTML file. Did you want to run: firefox schedule.html
$ screenshot.jpg
bash: ./screenshot.jpg: Permission denied
screenshot.jpg is an image file. Did you want to run:
pho screenshot.jpg
gimp screenshot.jpg
$ .bashrc
bash: ./.bashrc: Permission denied
.bashrc is a text file. Did you want to run:
less .bashrc
vim .bashrc
$ ls etc/fstab
/bin/ls: cannot access etc/fstab: No such file or directory
Did you forget the leading slash?
etc/fstab doesn't exist, but /etc/fstab does.
You can find the code here:
Friendly shell errors
and of course I'm happy to take suggestions or contributions for how
to make it friendlier to new shell users.
For years I've been reading daily news feeds on a series of PalmOS
PDAs, using a program called
Sitescooper
that finds new pages on my list of sites, downloads them, then runs
Plucker to translate them into Plucker's
open Palm-compatible ebook format.
Sitescooper has an elaborate series of rules for trying to get around
the complicated formatting in modern HTML web pages. It has an
elaborate cache system to figure out what it's seen before.
When sites change their design (which most news sites seem to
do roughly monthly), it means going in and figuring out the new
format and writing a new Sitescooper site file. And it doesn't
understand RSS, so you can't use the simplified RSS that most
sites offer. Finally, it's no longer maintained; in fact, I was
the last maintainer, after the original author lost interest.
Several weeks ago, bma tweeted
about a Python RSS reader he'd hacked up using the feedparser
package. His reader targeted email, not Palm, but finding out
about feedparser was enough to get me started. So I wrote
FeedMe
(Carla Schroder came up with the all-important name).
I've been using it for a couple of weeks now and I'm very happy
with the results. It's still quite rough, of course, but it's
already producing better files than Sitescooper did, and it
seems more maintainable. Time will tell.
Of course it needs to be made more flexible, adjusted so that
it can produce formats besides Plucker, and so on. I'll get to it.
And the only site I miss now, because it doesn't offer an RSS feed,
is Linux Planet.
Maybe I'll find a solution for that eventually.
I've been getting tired of my various desktop backgrounds, and
realized that I had a lot of trip photos, from fabulous places
like Grosvenor Arch (at right),
that I'd never added to my background collection.
There's nothing like lots of repetitions of the same task to
bring out the shortcomings of a script, and the
wallpaper
script I threw together earlier this year was no exception.
I found myself frequently irritated by not having enough information
about what the script was doing, and by not being able to change the
filename. Then I could have backgrounds named grosvenor.jpg rather
than img2691.jpg.
Alas, I can't use the normal GIMP Save-as dialog, since GIMP doesn't
make that dialog available to plug-ins. (That's a deliberate choice,
though I've never been clear on the reason behind it.) If I wanted
to give that control to the user, I'd have to make my own dialogs.
It's no problem to make a GTK dialog from Python. Just create a
gtk.Dialog, add a gtk.Entry to it, call dialog.run(), then check
the return value and get the entry's text to see if it changed.
No problem, right?
Ha! If you think that, you don't work with computers.
The dialog popped up fine, it read the text entry fine ... but it
wouldn't go away afterward. So after the user clicked OK, the
plug-in tried to save and GIMP popped up the JPEG save dialog
(the one that has a quality slider and other controls, but no
indication of filename) under my text entry dialog, which
remained there.
All attempts at calling dialog.hide() and dialog.destroy() and
similar methods were of no avail. A helpful person on #pygtk worked
with me but ended up as baffled as I was. What was up?
In the end, GIMP guru Sven pointed me to the answer.
The problem was that my dialog wasn't part of the GTK main loop. In
retrospect, this makes sense: the plug-in is an entirely different
process, so I shouldn't be surprised that it would have its own main loop.
So when I hide() and destroy(), those events don't happen right away
because there's no loop in the plug-in process that would see them.
The plug-in passes control back to GIMP to do the gimp_file_save().
GIMP's main loop doesn't have access to the hide and destroy signals I
just sent. So the gimp_file_save runs, popping up its own dialog
(under mine, because the JPEG save dialog is transient to the original
image window while my python dialog isn't).
That finishes, returns control to the plug-in, the plug-in exits and
at that point GTK cleans up and finally destroys the dialog.
The solution is to loop over GTK events in the plug-in before calling
gimp_file_save, like this:
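The snippet isn't shown here, but the standard PyGTK (GTK 2) idiom for draining pending events -- reconstructed from the description, not the post's verbatim code -- is:

```python
# Let the plug-in's own GTK main loop process the queued hide/destroy
# events before control passes to GIMP's gimp_file_save():
while gtk.events_pending():
    gtk.main_iteration()
```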
That loop gives the Python process a chance to clean up the dialog
before passing control to GIMP and its main loop. GTK in the
subprocess is happy, the user is happy, and I'm happy because now
I have a much more efficient way of making lots of desktop
backgrounds for lots of different machines.
Someone was asking for help building XEphem on the XEphem mailing list.
It was a simple case of a missing include file, where the only trick
is to find out what package you need to install to get that file.
(This is complicated on Ubuntu, which the poster was using,
by the way they fragment the X development headers into a maze of
a zillion tiny packages.)
The solution -- apt-file -- is so simple and easy to use, and yet
a lot of people don't know about it. So here's how it works.
The poster reported getting these compiler errors:
ar rc libz.a adler32.o compress.o crc32.o uncompr.o deflate.o trees.o zutil.o inflate.o inftrees.o inffast.o
ranlib libz.a
make[1]: Leaving directory `/home/gregs/xephem-3.7.4/libz'
gcc -I../../libastro -I../../libip -I../../liblilxml -I../../libjpegd -I../../libpng -I../../libz -g -O2 -Wall -I../../libXm/linux86 -I/usr/X11R6/include -c -o aavso.o aavso.c
In file included from aavso.c:12:
../../libXm/linux86/Xm/Xm.h:56:27: error: X11/Intrinsic.h: No such file or directory
../../libXm/linux86/Xm/Xm.h:57:23: error: X11/Shell.h: No such file or directory
../../libXm/linux86/Xm/Xm.h:58:23: error: X11/Xatom.h: No such file or directory
../../libXm/linux86/Xm/Xm.h:59:34: error: X11/extensions/Print.h: No such file or directory
In file included from ../../libXm/linux86/Xm/Xm.h:60,
from aavso.c:12:
../../libXm/linux86/Xm/XmStrDefs.h:1373: error: expected `=', `,', `;', `asm' or `__attribute__' before `char'
In file included from ../../libXm/linux86/Xm/Xm.h:60,
from aavso.c:12:
../../libXm/linux86/Xm/XmStrDefs.h:5439:28: error: X11/StringDefs.h: No such file or directory
In file included from ../../libXm/linux86/Xm/Xm.h:61,
from aavso.c:12:
../../libXm/linux86/Xm/VirtKeys.h:108: error: expected `)' before `*' token
In file included from ../../libXm/linux86/Xm/Display.h:49,
from ../../libXm/linux86/Xm/DragC.h:48,
from ../../libXm/linux86/Xm/Transfer.h:44,
from ../../libXm/linux86/Xm/Xm.h:62,
from aavso.c:12:
../../libXm/linux86/Xm/DropSMgr.h:88: error: expected specifier-qualifier-list before `XEvent'
../../libXm/linux86/Xm/DropSMgr.h:100: error: expected specifier-qualifier-list before `XEvent'
How do you go about figuring this out?
When interpreting compiler errors, usually what matters is the
*first* error. So try to find that. In the transcript above, the first
line saying "error:" is this one:
../../libXm/linux86/Xm/Xm.h:56:27: error: X11/Intrinsic.h: No such file or directory
So the first problem is that the compiler is trying to find a file
called Intrinsic.h that isn't installed.
On Debian-based systems, there's a great program you can use to find
files available for install: apt-file. It's not installed by default,
so install it, then update it, like this (the update will take a long time):
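Reconstructing the commands from the description (a transcript sketch, not verbatim from the post; installing needs root):

```shell
$ sudo apt-get install apt-file
$ apt-file update          # downloads the file index; this is the slow part
$ apt-file search Intrinsic.h
```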
In this case, two packages could install a file by that name.
You can usually figure out from looking which one is the
"real" one (usually the one with the shorter name, or the one
where the package name sounds related to what you're trying to do).
If you're still not sure, try something like
apt-cache show libxt-dev tendra to find out more
about the packages involved.
In this case, it's pretty clear that tendra is a red herring,
and the problem is likely that the libxt-dev package is missing.
So apt-get install libxt-dev and try the build again.
Repeat the process until you have everything you need for the build.
Remember apt-file if you're not already using it.
It's tremendously useful in tracking down build dependencies.
Over the years, I've kept a few sets of records in the Datebook app
on my PalmOS PDA -- health records and such.
I've been experimenting with a few python plotting packages
(pycha, CairoPlot and a few others) and I wanted to try plotting
one of my Datebook databases.
Not so fast. It seems that it's been a year or more since I last
crunched any of this data -- and in the time since then,
pilot-link has bumped its version numbers and is now shipping
libpisock.so.9 instead of .8.
So what? Well, the problem is that Linux hasn't offered any way
to read Palm Datebook files for years. The pilot-link package
offered on most distros used to include a program
called pilot-datebook, but it was deleted from the source several
years ago. Apparently it was hard to maintain.
Back when it first disappeared, I built the previous version of
the source, stuck the pilot-datebook binary in ~/bin/linux and
have been using it ever since. Which worked fine -- until
libpisock.so.8 was no longer there. (Linking .9 to .8 didn't work either.)
This is all the more ironic because I don't need pilot-datebook to
talk to the PDA with libpisock -- all I want to do is parse the format
of a file I've already uploaded.
Off to hunt for an old version of the source. I started at
pilot-link.org, but gave up after a while -- they don't seem to
have source there except for the latest couple of versions, nor
do they have any documentation. Ironically, in their FAQ the very
first question is "How can I read the databook entries from a Palm
backup?" but the FAQ page is broken and the "answer" is actually
another unrelated FAQ question.
Anyway, no help there. I tried googling for old tarballs but there doesn't
seem to be anything like archive.org for source code.
All I found was the original
pilot-datebook
page, with a tarball that you insert into a copy of pilot-link 0.9.5
then modify the Makefile. Might work but that's really old.
So I fell back on old distributions. I guessed that Ubuntu Dapper was
old enough that it might still have pilot-datebook. So I went to the
Dapper pilot-link
source and downloaded the source tarball (curiously, they don't offer
src debs -- you have to download the tarball and patches separately).
Of course, it doesn't build on Ubuntu Jaunty. It had various
entertaining errors, ranging from wanting a mysterious
tcl.m4 file not present in the code ... to not being
able to find <iostream.h> because all the C++ stdlib headers have
recently been renamed to remove the .h ... to a change in the
open() system call where I needed to add a permissions argument
for O_CREAT.
But I did get it working! So now I have a pilot-datebook program
that builds and runs on Ubuntu Jaunty, and parses my DatebookDB.pdb file.
Since I bet I'm not the only one in the world who occasionally wants
to read a Palm Datebook file, I've put my working version of the
source here:
pilot-link_0.11.8.jaunty.tar.gz.
After the usual configure and make, if all you want is pilot-datebook,
cd src/pilot-datebook then copy both pilot-datebook
and the directory .libs to wherever you want to install them.
And yeah, it would be better to write a standalone program that just
parsed the format. But it's hard to justify that for what's
essentially a dead platform. The real solution is to quit using
a Palm for this, import the data into some common format and keep it
on my Linux workstation from now on.
Sometimes I love open source. A user contacted me about my program
Crikey!,
which lets you generate key events to do things like assign a key
that will type in a string you don't want to type in by hand.
He had some thorny problems where crikey was failing, and a few
requests, like sending Alt and other modifier keys.
We corresponded a bit, figured out exactly how things should work
and some test cases, went through a couple iterations of changes
where I got lots of detailed and thoughtful feedback and more test cases,
and now Crikey can do an assortment of new useful stuff.
New features: crikey now handles number codes like \27, modifier
keys like \A for alt, does a better job with symbols like
\(Return\), and handles a couple of new special characters
like \e for escape.
It also works better at sending window manager commands,
like "\A\t" to change the active window.
I've added some better documentation on all the syntaxes it
understands, both on the web page and in the -h and -l (longhelp)
command-line arguments, and made a release: crikey 0.8.3.
Plus: a list of great regression tests that I can use when testing
future updates (in the file TESTING in the tarball).
Continuing my Linux Planet series on Linux performance monitoring,
the latest article looks at bloat and how you can measure it:
Finding
and Trimming Linux Bloat.
This one just covers the basics.
The followup article, in two weeks, will dive into more detail
on how to analyze what resources programs are really using.
I survived another GetSET Javascript-in-a-day workshop for
GetSET 2009.
It went okay, but not as well as I'd hoped.
This year's class was more distractable
than classes of past years -- and, judging by their career goals,
less interested in computers or engineering, unfortunate in a
program run by the Society of Women Engineers.
In the morning, we had a hard time getting them to focus long enough
to learn basics like what a variable was. After a caucus at lunchtime,
we decided to skip the next exercise (looping
over an array of colors) and spend some time drilling on the basics,
and keep at it 'til they got it.
It took a while but we eventually got through.
We needed more examples in the morning, more interaction, some
visceral way of explaining programming basics so they really get it.
They do better working as a group on a concrete problem,
like the final whiteboard exercise,
"How do we figure out whether the click was on the flower?".
That always ends up being a highlight of the class,
even though it involves (gasp) doing math.
This year was no exception, but it did take a while to get through.
Using variables lost them completely
("is the mouse's X coordinate bigger than or less than the flower's X?")
but when we used actual numbers and ran through several
examples, things eventually clicked.
"The flower starts at (2, 5) and is 200 pixels wide. If the mouse
click is at (34, 45), who thinks it's inside the flower? Raise your
hands. Who thinks it's not? Now what if I click at (300, 24)?"
A couple of them got it right away, but it took a long time to
bring the whole class along.
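The class works in Javascript, but the whiteboard check itself is plain rectangle containment. Written out (in Python here; the flower's height is my assumption, since only the width is given above), it's something like:

```python
def click_in_flower(click_x, click_y, flower_x, flower_y, width, height):
    """Is the mouse click inside the flower's bounding rectangle?"""
    return (flower_x <= click_x <= flower_x + width and
            flower_y <= click_y <= flower_y + height)

# The flower starts at (2, 5) and is 200 pixels wide (height assumed 200):
print(click_in_flower(34, 45, 2, 5, 200, 200))    # inside -> True
print(click_in_flower(300, 24, 2, 5, 200, 200))   # outside -> False
```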
I'm still not sure how to use that method for more basic concepts
like "what is a variable?".
Perhaps some sort of role-playing?
Watching William Phelps guide the girls through planet motions
in our astronomy workshop Wednesday,
each girl playing the role of a solar system object, inspired me.
I'd used role-playing like that with little kids,
but William says it works even with adults to get concepts across,
and after seeing him with the high schoolers I believe it.
But how to adapt that to programming concepts?
A recent Slate article on
teaching programming had some interesting ideas I want to try.
Printed handouts for GetSET may be a waste of time. Nobody was
even bothering to look at them, despite the fact that they had
complete instructions for everything we were doing. Do schools not
give students printed assignments or homework any more? Last year,
they used the printed exercises but not the quick reference guides;
this year they wouldn't even read the exercises. On the other hand,
it might be worth it for the handful in each class who really
love programming. I always hope some of them take the handouts
home and try some of the extras on their own.
Finally, the class would be so much easier if we could teach it on
a less pointy-clicky OS!
Or at least on machines where IE isn't the default browser.
The first 3-4 exercises go painfully slowly, guiding
a roomful of girls through many GUI navigation steps:
Open the GetSET folder on your desktop
Find the file named whatever.html
Right-click on it, find Open With, and choose Wordpad
Now find the window where you were just looking at the GetSET files
(because everything on Windows tends to open with huge windows,
it's now covered by Wordpad)
Drag whatever.html into Firefox
Then the helpers have to go around the room ensuring that the
girls have the correct file loaded in both Wordpad and Firefox.
This took way too long with only four people to check the whole class,
especially since we had to do it for every exercise.
Invariably some girls will doubleclick instead of right-clicking
or dragging, and will end up in whatever HTML editor Microsoft is
pushing this year, or with an IE window instead of Firefox
(and then the Error Console won't be there when she looks for it later).
Suggestions like "Keep that window open, you'll need it throughout
the class" or "Try making that window smaller, so you can see both
windows at once" don't help. The girls are too used to the standard
Windows model of one screen-filling window at a time. Keeping two
apps visible at once is too foreign. A few of them are good at using
the taskbar to switch among apps, but for the rest, loading new files
is awkward and error prone.
In postmortems two years ago we talked about having them work on one
file throughout the whole workshop. That would solve the problem,
but I'm still working on how to do it without a lot of "Now comment
out the code you just wrote, so you won't get the prompt every time,
then scroll down to the next block of code and uncomment it."
I couldn't help thinking how on Linux, we could just tell them to type
leafpad whatever.html; firefox whatever.html and be done.
Or even give them an alias that would do it. Hmm ... I wonder if
I could make a Windows .bat file that would open the same file in
Wordpad and Firefox both? Must try that.
During OSCON a couple of weeks ago, I kept wishing I could do
Twitter searches for a pattern like #oscon in a cleaner way than
keeping a tab open in Firefox where I periodically hit Refresh.
Python-twitter doesn't support searches, alas, though it is part
of the Twitter API. There's an experimental branch of python-twitter
with searching, but I couldn't get it to work. But it turns out
Gwibber is also written in Python, and I was able to lift some
JSON code from Gwibber to implement a search. (Gwibber itself,
alas, doesn't work for me: it bombs out looking for the Gnome
keyring. Too bad, looks like it might be a decent client.)
I hacked up a "search for OSCON" program and used it a little during
the week of the conference, then got home and absorbed in catching
up and preparing for next week's GetSET summer camp, where I'm
running an astronomy workshop and a Javascript workshop for high
school girls. That's been keeping me frazzled, but I found a little
time last night to clean up the search code and release
Twit 0.3
with search and a few other new command-line arguments.
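The parsing side of a search like that is small enough to sketch. This assumes the response shape of the old search.twitter.com JSON API (a "results" list with "from_user" and "text" fields); the function name and sample data are mine, not Twit's actual code:

```python
import json

def parse_search_results(raw):
    """Pull (user, text) pairs out of a Twitter search JSON response."""
    data = json.loads(raw)
    return [(r["from_user"], r["text"]) for r in data.get("results", [])]

# A canned response, the kind a search for #oscon might have returned:
sample = '{"results": [{"from_user": "someone", "text": "Heading to #oscon"}]}'
for user, text in parse_search_results(sample):
    print("%s: %s" % (user, text))
```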
No big deal, but it was nice to take a hacking break from all this
workshop coordinating. I'm definitely happier programming than I am
organizing events, that's for sure.
I was reading a terrific article on the New York Times about
Watching
Whales Watching Us.
At least, I was trying to read it -- but the NYT website forces font
faces and sizes that, on my system, end up giving me a tiny font
that's too small to read. Of course I can increase font size with
Ctrl-+ -- but it gets old having to do that every time I load a NYT page.
The first step was to get Greasemonkey working on Firefox 3.5.
"Update scripts" doesn't find a new script, and if you go to
Greasemonkey's home page, the last entry is from many months ago
and announces Firefox 3.1 support. But curiously, if you go to the
Greasemonkey
page on the regular Mozilla add-ons site, it does support 3.5.
I've had Greasemonkey for quite some time, but
every time I try to get started writing a script I have trouble
getting started. There are dozens of Greasemonkey tutorials on the
web, but most of them are oriented toward installing scripts and
don't address "What do you type into the fields of the Greasemonkey
New User Script dialog?"
Fortunately, I did find one that explained it:
The
beginner's guide to Greasemonkey scripting.
I gave my script a name (NYT font) and a namespace (my own domain),
added http://*nytimes.com/* for Includes,
and nothing for Excludes.
Click OK, and Greasemonkey offers a "choose editor" dialog. I chose
emacs, which mostly worked though the emacs window unaccountably
came up with a split window that I had to dismiss with C-x 1.
Now what to type in the editor? Firebug came to the rescue here.
I went back to the NYT page with the too-small fonts and clicked on
Firebug. The body style showed that they're setting
font-family: Georgia, serif
font-size: 84.5%
84.5%? Where does that come from? What happens if I change that
to 100%? Fortunately, I can test that right there in the Firebug
window. 100% made the fonts fairly huge, but 90% was about right.
I went back to greasemonkey's editor window and added:
document.body.style.fontSize = "90%";
Saved the file, and that was all I needed! Once I hit Reload on the
NYT page I got a much more readable font size.
I finally dragged myself into 2009 and tried Twitter.
I'd been skeptical, but it's actually fairly interesting and not
that much of a time sink. While it's true that some people tweet
about every detail of their lives -- "I'm waiting for a bus" /
"Oh, hooray, the bus is finally here" / "I got a good seat in the
second row of the bus" / "The bus just passed Second St. and two
kids got on" / "Here's a blurry photo from my phone of the Broadway Av.
sign as we pass it"
-- it's easy enough to identify those people and un-follow them.
And there are tons of people tweeting about interesting stuff.
It's like a news ticker, but customizable -- news on the latest
protests in Iran, the latest progress on freeing the Mars Spirit
Rover, the latest interesting publication on dinosaur fossils,
and what's going on at that interesting conference halfway around
the world.
The trick is to figure out how you want the information delivered.
I didn't want to have to leave a tab open in Firefox all the time.
There was an xchat plug-in that sounded perfect -- I have an xchat
window up most of the time I'm online -- but it turned out it works
by picking one of the servers you're connected to, making a private
channel and posting things there. That seemed abusive to the server
-- what if everyone on Freenode did that?
So I wanted a separate client. Something lightweight and simple.
Unfortunately, all the Twitter clients available for Linux either
require that I install a lot of infrastructure first (either Adobe
Air or Mono), or they just plain didn't work (a Twitter client
where you can't click on links? Come on!)
The article shows how to use the bindings to write a bare-bones
client. But of course, I've been hacking on the client all along,
so the one I'm actually using has a lot more features like *ahem*
letting you click on links. And letting you block threads, though
I haven't actually tested that since I haven't seen any threads
I wanted to block since my first day.
You can download the
current version of
Twit, and anyone who's interested can
follow me on Twitter.
I don't promise to be interesting -- that's up to you to decide --
but I do promise not to tweet about every block of my bus ride.
On my last Mojave trip, I spent a lot of the evenings hacking on
PyTopo.
I was going to try to stick to OpenStreetMap and other existing mapping
applications like TangoGPS, a neat little smartphone app for
downloading OpenStreetMap tiles that also runs on the desktop --
but really, there still isn't any mapping app that works well enough
for exploring maps when you have no net connection.
In particular, uploading my GPS track logs after a day of mapping,
I discovered that Tango really wasn't a good way of exploring them,
and I already know Merkaartor, nice as it is for entering new OSM
data, isn't very good at working offline. There I was, with PyTopo
and a boring hotel room; I couldn't stop myself from tweaking a bit.
Adding tracklogs was gratifyingly easy. But other aspects of the
code bother me, and when I started looking at what I might need to
do to display those Tango/OSM tiles ... well, I've known for a while
that some day I'd need to refactor PyTopo's code, and now was the time.
Surprisingly, I completed most of the refactoring on the trip.
But even after the refactoring, displaying those OSM tiles turned out
to be a lot harder than I'd hoped, because I couldn't find any
reliable way of mapping a tile name to the coordinates of that tile.
I haven't found any documentation on that anywhere, and Tango and
several other programs all do it differently and get slightly
different coordinates. That one problem was to occupy my spare time
for weeks after I got home, and I still don't have it solved.
But meanwhile, the rest of the refactoring was done, nice features
like track logs were working, and I've had to move on to other
projects. I am going to finish the OSM tile MapCollection class,
but why hold up a release with a lot of useful changes just for that?
So here's PyTopo 0.8,
and the couple of known problems with the new features will have to wait
for 0.9.
A silly little thing, but something that Python books mostly don't
mention and I can never find via Google:
How do you find all the methods in a given class, object or module?
Ideally the documentation would tell you. Wouldn't that be nice?
But in the real world, you can't count on that,
and examining all of an object's available methods can often give
you a good guess at how to do whatever you're trying to do.
Python objects keep their symbol table in a dictionary
called __dict__ (that's two underscores on either end of the word).
So just look at object.__dict__. If you just want the
names of the functions, use object.__dict__.keys().
Thanks to JanC for suggesting dir(object) and help(object), which
can be more helpful -- not all objects have a __dict__.
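A quick sketch of the difference between the two approaches:

```python
import math

# dir() works on modules, classes and instances alike:
print("sqrt" in dir(math))              # True

class Flower:
    def area(self):
        return 42

# A class's __dict__ maps its method names to function objects:
print("area" in Flower.__dict__)        # True

# An instance's own __dict__ doesn't list inherited methods,
# but dir() still finds them:
f = Flower()
print("area" in f.__dict__)             # False
print("area" in dir(f))                 # True
```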
Someone on the OSM newbies list asked how he could strip
waypoints out of a GPX track file. Seems he has track logs of an
interesting and mostly-unmapped place that he wants to add to
openstreetmap, but there
are some waypoints that shouldn't be included, and he wanted a
good way of separating them out before uploading.
Most of the replies involved "just edit the XML." Sure, GPX files
are pretty simple and readable XML -- but a user shouldn't ever have
to do that! Gpsman and gpsbabel were also mentioned, but they're not
terribly easy to use either.
That reminded me that I had another XML-parsing task I'd been wanting
to write in Python: a way to split track files from my Garmin GPS.
Sometimes, after a day of mapping, I end up with several track
segments in the same track log file. Maybe I mapped several different
trails; maybe I didn't get a chance to upload one day's mapping before
going out the next day. Invariably some of the segments are of zero
length (I don't know why the Garmin does that, but it always does).
Applications like merkaartor don't like this one bit, so I
usually end up editing the XML file and splitting it into
segments by hand. I'm comfortable with XML -- but it's still silly.
I already have some basic XML parsing as part
of PyTopo and Ellie, so I knew the parsing would be easy to do.
So, spurred on by the posting on OSM-newbies,
I wrote a little GPX parser/splitter called
gpxmgr.
gpxmgr -l file.gpx can show you how many track logs are
in the file; gpxmgr -w file.gpx can write new files for
each non-zero track log. Add -p if you want to be prompted for
each filename (otherwise it'll use the name of the track log,
which might be something like "ACTIVE\ LOG\ #2").
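The core of the -l listing is only a few lines of xml.dom.minidom. Here's a bare-bones sketch of the idea (my own code and sample data, not gpxmgr's actual source):

```python
import xml.dom.minidom

def track_summary(gpx_text):
    """Return (name, point_count) for each track in a GPX document."""
    dom = xml.dom.minidom.parseString(gpx_text)
    tracks = []
    for trk in dom.getElementsByTagName("trk"):
        names = trk.getElementsByTagName("name")
        name = names[0].firstChild.data if names else "unnamed"
        npts = len(trk.getElementsByTagName("trkpt"))
        tracks.append((name, npts))
    return tracks

# A made-up log with one real track and one zero-length one:
sample = """<gpx><trk><name>ACTIVE LOG</name>
<trkseg><trkpt lat="35.0" lon="-106.0"/><trkpt lat="35.1" lon="-106.1"/></trkseg>
</trk><trk><name>ACTIVE LOG #2</name><trkseg/></trk></gpx>"""
print(track_summary(sample))
```

The -w mode is then just a matter of writing each non-zero track back out to its own file.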
How, you may wonder, does that help the original
poster's need to separate out waypoints from track files?
It doesn't. See, my GPS won't save tracklogs and
waypoints in the same file, even if you want them that way;
you have to use two separate gpsbabel commands to upload a track
file and a waypoint file. So I don't actually know what a
tracklog-plus-waypoint file looks like.
If anyone wants to use gpxmgr to manage waypoints as well as tracks,
send me a sample GPX file that combines them both.
I wrote last week about the sorts of
programmer
compulsions that lead to silly apps like my
animated Javascript
Jupiter. I got it working well enough and stopped, knowing
there were more features that would be easy to add but trying
to ignore them.
My mom, immediately upon seeing it, unerringly zeroed in on the biggest
missing feature I'd been trying to ignore. "Can you make it go
faster or slower?"
I put it off for a while, but of course I had to do it -- so now
there are Faster and Slower buttons. It still goes by hour jumps,
so the fastest you can go is an hour per millisecond. Fun to watch.
Or you can slow it down to 1 hour per 3600000 milliseconds if you
want to see it animate in real time. :-)
It's not like I needed another Jupiter's moons application.
I've already written more or less the same app for four platforms.
I don't use the Java web version,
Juplet, very much
any more, because I often have Java disabled or missing. And I don't
use my Zaurus any more so
Juplet for Zaurus
isn't very relevant. But I can always call up my
Xlib or PalmOS
Jupiter's moons app if I need to check on those Galilean moons.
They work fine.
Another version would be really pointless. A waste of time.
So it should have been no big deal when, during the course of
explaining to someone the difference between Java and Javascript,
it suddenly occurred to me that it would be awfully easy to
re-implement that Java Juplet web page using Javascript, HTML
and CSS. I mean, a rational person would just say "oh, yeah, I
suppose that's true" and go on with life.
But what I'm trying to say is that programming isn't a career path,
or a hobby, or a field of academic study. It's a disease.
It's a compulsion, where, sometimes, just realizing that
something could be done renders you unable to think about
anything else until you just ... try ... just a few minutes ...
see how well it works ... oh, wow, that really looks a lot better
than the Java version, wouldn't it look even nicer if you just added
in this one other little tweak ... but wait, now it's so close to
working, I bet it wouldn't be all that hard to take the Java class
and turn it into ...
... and before you know it, it's tomorrow and you have something
that's almost a working app, and it's just really a shame to
get that far and not finish it at least to the point where you can
share it.
But then, Javascript and web pages are so easy to work on that it
really isn't that much extra work to add in some features that the
old version didn't have, like an animate button ...
... and your Saturday morning is gone forever, and there's not much
you can do about that, but at least you have a nice
animated Jupiter's moons
(and shadows) page when the sickness passes and you can finally
think about other things.
This week's Linux Planet article is another one on Python and
graphical toolkits, but this time it's a little more advanced:
Graphical
Python Programming With PyGTK.
This one started out as a fun and whizzy screensaver sort of program
that draws lots of pretty colors -- but I couldn't quite fit it all
into one article, so that will have to wait for the sequel two weeks
from now.
Ever since I got the GPS I've been wanting something that plots the
elevation data it stores. There are lots of apps that will show me
the track I followed in latitude and longitude, but I couldn't find
anything that would plot elevations.
But GPX (the XML-based format commonly used to upload track logs)
is very straightforward -- you can look at the file and read the
elevations right out of it. I knew it wouldn't be hard to write
a script to plot them in Python; it just needed a few quiet hours.
Sounded like just the ticket for a rainy day stuck at home with
a sore throat.
Sure enough, it was fairly easy. I used xml.dom.minidom to
parse the file (I'd already had some experience with it in
gimplabels
for converting gLabels templates), and pylab from
matplotlib
for doing the plotting. Easy and nice looking.
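The parsing really is that straightforward. A stripped-down sketch of the two pieces (my own code and made-up sample data, not the script's actual source):

```python
import xml.dom.minidom

def get_elevations(gpx_text):
    """Read the <ele> value of every track point in a GPX document."""
    dom = xml.dom.minidom.parseString(gpx_text)
    return [float(ele.firstChild.data)
            for ele in dom.getElementsByTagName("ele")]

def plot_elevations(elevations):
    """Plot the profile with pylab. (Import guarded inside the
    function so the parsing part works without matplotlib.)"""
    import pylab
    pylab.plot(elevations)
    pylab.ylabel("Elevation (m)")
    pylab.show()

# A tiny made-up track log:
sample = """<gpx><trk><trkseg>
<trkpt lat="35.0" lon="-106.0"><ele>2130.5</ele></trkpt>
<trkpt lat="35.1" lon="-106.1"><ele>2145.0</ele></trkpt>
</trkseg></trk></gpx>"""
print(get_elevations(sample))
```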
I was making a minor tweak to my
garmin script
that uses gpsbabel to read in tracklogs and waypoints from my GPS
unit, and I needed to look up the syntax of how to do some little
thing in sh script. (One of the hazards of switching languages a
lot: you forget syntax details and have to look things up a lot,
or at least I do.)
I have quite a collection of scripts in various languages in my
~/bin (plus, of course, all the scripts normally installed in
/usr/bin on any Linux machine) so I knew I'd have lots of examples.
But there are scripts of all languages sharing space in those
directories; it's hard to find just sh examples.
For about the two-hundredth time, I wished, "Wouldn't it be nice
to have a command that can search for patterns only in files that
are really sh scripts?"
And then, the inevitable followup ... "You know, that would be
really easy to write."
So I did -- a little python hack called langgrep that takes a language,
grep arguments and a file list, looks for a shebang line and only greps
the files that have a shebang matching the specified language.
Of course, while writing langgrep I needed langgrep, to look up
details of python syntax for things like string.find (I can never
remember whether it's string.find(s, pat) or s.find(pat); the python
libraries are usually nicely object-oriented but strings are an
exception and it's the former, string.find). I experimented with
various shell options -- this is Unix, so of course there are plenty
of ways of doing this in the shell, without writing a script. For instance:
grep find `egrep -l '#\\!.*python' *`
grep find `file * | grep python | sed 's/:.*//'`
for i in foo; do file $i|grep python && grep find $i; done # in sh/bash
These are all pretty straightforward, but when I try to make them
into tcsh aliases things get a lot trickier. tcsh lets you make
aliases that take arguments, so you can use !:1 to mean the first
argument, !2-$ to mean all the arguments starting with the second
one. That's all very well, but when you put them into a shell alias
in a file like .cshrc that has to be parsed, characters like ! and $
can mean other things as well, so you have to escape them with \.
So the second of those three lines above turns into something like
alias greplang "grep \!:2-$ `file * | grep \!:1 | sed 's/:.*//'`"
except that doesn't work either, so it probably needs more escaping
somewhere. Anyway, I decided after a little alias hacking that
figuring out the right collection of backslash escapes would
probably take just as long as writing a python script to do the
job, and writing the python script sounded more fun.
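The central trick is nothing more than reading each file's first line and checking the shebang. A minimal sketch of that piece (my own code, not langgrep's actual source):

```python
import os
import tempfile

def has_shebang_for(filename, lang):
    """True if the file's first line is a shebang mentioning lang."""
    try:
        with open(filename) as f:
            first = f.readline()
    except (IOError, OSError):
        return False
    return first.startswith("#!") and lang in first

# Demo: one python script and one sh script in a scratch directory.
tmpdir = tempfile.mkdtemp()
pyfile = os.path.join(tmpdir, "hello.py")
shfile = os.path.join(tmpdir, "hello.sh")
with open(pyfile, "w") as f:
    f.write("#!/usr/bin/env python\nprint('hi')\n")
with open(shfile, "w") as f:
    f.write("#!/bin/sh\necho hi\n")

# langgrep would now hand only the matching files to grep:
python_scripts = [p for p in (pyfile, shfile) if has_shebang_for(p, "python")]
print([os.path.basename(p) for p in python_scripts])
```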
So here it is: my
langgrep
script. (Awful name, I know; better ideas welcome!)
Use it like this (if python is the language you're looking for,
find is the search pattern, and you want -w to find only "find"
as a whole word):
langgrep python -w find ~/bin/*
Making desktop backgrounds in GIMP is a bit tedious if you have
several machines with screens of different sizes. The workflow goes
something like this:
First, choose Crop tool and turn on Fixed: Aspect Ratio.
Then loop over images:
Load an image
Go to Tool Options
Type in the aspect ratio: 4:3, 8:5, 5:4, 1366:768 etc.
Go to the image and crop.
Image->Scale (I have this on Shift-S, can't remember whether
that was a default binding or not).
Ctrl-K to delete the current width (Ctrl-U also works, but beeps;
I'm not sure why)
Type in the desired width (1024 or 1680 or 1366 or whatever)
(I always hit Tab here, though it's probably not necessary)
Click Scale or type Alt-S (unfortunately, Return doesn't work
in this dialog).
Save As to the appropriate name and path for the current resolution
Undo (the Scale), Undo (the Crop)
Load a new image (continue loop)
But you can use Save Options (Tool Presets) to avoid step 3,
typing in the aspect ratio.
Here's how:
Set up the aspect ratio you want in the Crop tool, with Fixed checked
Click on Save Options (the floppy disk icon in the lower left of
Tool Options)
Choose a name (choose New Entry first if you've already saved options).
Repeat, for each aspect ratio you might want to use.
Now clicking on Restore Options gives you a menu of all your commonly
used aspect ratios -- much faster than typing them in every time.
Too bad there's no way to use this shortcut for the Scale step,
or to do Crop and Scale in one operation.
Nice shortcut! But having done that, I realized I could shorten it
even more: I could make a selection (not a crop) with one of my preset
aspect ratios, then run a script that would figure out from the aspect
ratio which absolute size I wanted, crop and scale, and either save
it to the right place, or make a new image so I could save without
needing to Redo or Save a Copy. That was an easy script to write,
so here it is:
wallpaper.py.
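The only interesting logic is mapping the selection's aspect ratio to a full-screen target size. Here's a sketch of that piece (the size table is my own example list, not wallpaper.py's actual one):

```python
from fractions import Fraction

# Aspect ratio -> the wallpaper size to scale to for that screen.
# Example entries only; wallpaper.py's real table would list your screens.
TARGET_SIZES = {
    Fraction(4, 3):      (1024, 768),
    Fraction(8, 5):      (1680, 1050),
    Fraction(5, 4):      (1280, 1024),
    Fraction(1366, 768): (1366, 768),
}

def target_size(sel_width, sel_height):
    """Given a selection's dimensions, return the matching wallpaper
    size, or None if the aspect ratio isn't a known one."""
    return TARGET_SIZES.get(Fraction(sel_width, sel_height))

print(target_size(800, 600))    # a 4:3 selection -> (1024, 768)
print(target_size(640, 400))    # an 8:5 selection -> (1680, 1050)
```

Given that size, the script can crop, scale and save without any further prompting.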
Mostly this week has been consumed with preparations for
LCA ... but programming
is a sickness. When you get email from someone suggesting
something relatively simple and obviously useful, well ...
it's simply impossible not to pull out that emacs window
and start typing.
And so it was when I got a request for a backspace character in
crikey.
Of course backspace and delete seem like perfectly reasonable
and useful characters to want; don't know why I didn't think
of putting them in before. So I did.
But while I was in there, suddenly it occurred to me that it
really wouldn't be much harder to let users specify any key
by symbol. (Did I mention being a programmer is a sickness?)
And then I realized that specifying control characters with
a caret, like ^H, would also be quite useful. (Did I mention
that ...)
So anyway, now there's a Crikey 0.8 and it's time to get back
to packing and endless fiddling with my talk slides.
Except, wait, I need to update my netscheme script to work
right with the new laptop, and ...
I've been wanting for a long time to make Debian and Ubuntu
repositories so people can install
pho with apt-get,
but every time I try to look it up I get bogged down.
But I got mail from a pho user who really wanted that, and even
suggested a howto.
That howto
didn't quite do it, but it got me moving to look for a better one,
which I eventually found in the
Debian
Repository Howto.
It wasn't complete either, alas, so it took some trial-and-error
before it actually worked. Here's what finally worked:
I created two web-accessible directories, called hardy and etch.
I copied all the files created by dpkg-buildpackage on each distro --
.deb, .dsc, .tar.gz, and .changes (I don't think
this last file is used by anything) -- into each directory
(renaming them to add -etch and -hardy as appropriate).
Then:
dpkg-scanpackages . /dev/null | gzip > Packages.gz
It gives an error, ** Packages in archive but missing from override file: **
but seems to work anyway.
Now you can use one of the following /etc/apt/sources.list lines:
deb http://shallowsky.com/apt/hardy ./
deb http://shallowsky.com/apt/etch ./
After an apt-get update, it saw pho, but it warned me
WARNING: The following packages cannot be authenticated!
pho
Install these packages without verification [y/N]?
There's some discussion in the
SecureAPT page
on the Debian wiki, but it's a bit involved and I'm not clear if
it helps me if I'm not already part of the official Debian keychain.
This page on
Release
check of non Debian sources was a little more helpful, and told me
how to create the Release and Release.gpg file -- but then I just get
a different error,
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY
And worse, it's an error now, not just a warning,
preventing any apt-get update.
Going back to the SecureApt page, under
Setting up a secure apt repository they give the two steps the
other page gave for creating Release and Release.gpg, with a third
step: "Publish the key fingerprint, that way your users will know what
key they need to import in order to authenticate the files in the
archive."
So apparently if users don't take steps to import the key manually,
they can't update at all. Whereas if I leave out the Release and
Release.gpg files, all they have to do is type y when they see the
warning. Sounds like it's better to leave off the key.
I wish, though, that there was a middle ground, where I could offer the
key for those who wanted it without making it harder for those
who don't care.
I wrote moonroot
more to figure out how to do it than to run it myself.
But on the new monitor I have so much screen real estate
that I've started using it -- but the quality of the images was
such an embarrassment that I couldn't stand it. So I took a few
minutes and cleaned up the images and made a moonroot 0.6 release.
Turned out there was a trick I'd missed when I originally made the
images, years ago. XPM apparently only allows 1-bit transparency.
When I was editing the RGB image and removing the outside edge of the circle,
some of the pixels ended up semi-transparent, and when I saved the
file as .xpm, they ended up looking very different (much darker)
from what I had edited.
Here are two ways to solve that in GIMP:
Use the "Hard edge" option on the eraser tool (and a hard-edged
brush, of course, not a fuzzy one).
Convert the image to indexed, in which case GIMP will only allow
one bit's worth of transparency. (That doesn't help for full-color
images, but for a greyscale image like the moon, there's no loss
of color since even RGB images can only have 8 bits per channel.)
Either way, the way to edit a transparent image where you're trying
to make the edges look clean is to add a solid-color background
layer (I usually use white, but of course it depends on how you're going
to use the image) underneath the layer you're trying to edit.
(In the layers dialog, click the New button, choose White for the
new layer, click the down-arrow button to move it below the original
layer, then click on the original layer so your editing will all
happen there.)
Once you're editing a circle with sharp edges, you'll probably need
to adjust the colors for some of the edge pixels too. Unfortunately
the Smudge tool doesn't seem to work on indexed images, so you'll
probably spend a lot of time alternating between the Color Picker
and the Pencil tool, picking pixel colors then dabbing them onto
other pixels. Key bindings are the best way to do that: o activates
the Color Picker, N the Pencil, P the Paintbrush. Even if you don't
normally use those shortcuts it's worth learning them for the
duration of this sort of operation.
Or use the Clone tool, where the only keyboard shortcut you have to
remember is Ctrl to pick a new source pixel. (I didn't think of that
until I was already finished, but it works fine.)
Pho 0.9.6-pre3 has been working great for me for about a month, and
I've been trying to find the time to do a release. I finally managed
it this weekend, after making a final tweak to change the default
PHO_REMOTE command from gimp-remote to gimp since
gimp-remote is obsolete and is no longer built by default.
The big changes from 0.9.5 are Keywords mode, slideshow mode,
the new PHO_REMOTE environment variable,
swapping -f and -F, and a bunch of performance work and
minor bug fixing.
I built deb packages for Ubuntu (Hardy, but they should work on
Intrepid too) and Debian (Etch), as well as the usual source tarball,
and they're available at the usual place:
Someone on #openbox this morning wanted help in bringing up a window
without decorations -- no titlebar or window borders.
Afterward, Mikael commented that the app should really be coded
not to have borders in the first place.
Me: You can do that?
Turns out it's not a standard ICCCM request, but one that mwm
introduced, MWM_HINTS_DECORATIONS.
Mikael pointed me to the urxvt source as an example of an app that uses it.
My own need was more modest: my little
moonroot
Xlib program that draws the moon at approximately its current phase.
Since the code is a lot simpler than urxvt, perhaps the new version,
moonroot 0.4, will be useful as an example for someone (it's also
an example of how to use the X Shape extension for making
non-rectangular windows).
I've released
Pho 0.9.6-pre3.
The only change is to fix a sporadic bug where
pho would sometimes jump back to the first image after deleting
the last one, rather than backing up to the next-to-last image.
I was never able to reproduce the bug reliably, but
I cleaned up the image list next/prev code quite a bit and
haven't seen the bug since then. I'd appreciate having a few
testers exercising this code as much as possible.
Otherwise pho is looking pretty solid for a 0.9.6 release.
Last night Joao and I were on IRC helping someone who was learning
to write gimp plug-ins. We got to talking about pixel operations and
how to do them in Python. I offered my arclayer.py as an example of
using pixel regions in gimp, but added that C is a lot faster for
pixel operations. I wondered if reading directly from the tiles
(then writing to a pixel region) might be faster.
But Joao knew a still faster way. As I understand it, one major reason
Python is slow at pixel region operations compared to a C plug-in is
that Python only writes to the region one pixel at a time, while C can
write batches of pixels by row, column, etc. But it turns out you
can grab a whole pixel region into a Python array, manipulate it as
an array then write the whole array back to the region. He thought
this would probably be quite a bit faster than writing to the pixel
region for every pixel.
He showed me how to change the arclayer.py code to use arrays,
and I tried it on a few test layers. Was it faster?
I made a test I knew would take a long time in arclayer,
a line of text about 1500 pixels wide. Tested it in the old arclayer;
it took just over a minute to calculate the arc. Then I tried Joao's
array version: timing with my wristwatch stopwatch, I call it about
1.7 seconds. Wow! That might be faster than the C version.
The updated, fast version (0.3) of arclayer.py is on my
arclayer page.
If you just want the trick to using arrays, here it is:
from array import array

[ ... setting up ... ]

# Initialize the regions and get their contents into arrays:
srcRgn = layer.get_pixel_rgn(0, 0, srcWidth, srcHeight, False, False)
src_pixels = array("B", srcRgn[0:srcWidth, 0:srcHeight])

dstRgn = destDrawable.get_pixel_rgn(0, 0, newWidth, newHeight, True, True)
p_size = len(srcRgn[0, 0])
dest_pixels = array("B", "\x00" * (newWidth * newHeight * p_size))

[ ... then inside the loop over x and y ... ]

src_pos = (x + srcWidth * y) * p_size
dest_pos = (newx + newWidth * newy) * p_size
newval = src_pixels[src_pos : src_pos + p_size]
dest_pixels[dest_pos : dest_pos + p_size] = newval

[ ... when the loop is all finished ... ]

# Copy the whole array back to the pixel region:
dstRgn[0:newWidth, 0:newHeight] = dest_pixels.tostring()
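If you want to play with the slice-copy trick outside GIMP, here's a minimal Python 3 sketch of the same idea (the image sizes and the mirror operation are made up for illustration; GIMP 2.x plug-ins themselves run under Python 2 and write the array back with tostring() as shown above):

```python
from array import array

# Treat an image as a flat byte array and move whole pixels with slice
# assignment instead of per-channel writes -- the same idea as above.
width, height, p_size = 4, 3, 3  # tiny hypothetical 4x3 RGB "image"
src_pixels = array("B", range(width * height * p_size))
dest_pixels = array("B", bytes(width * height * p_size))

for y in range(height):
    for x in range(width):
        # Copy pixel (x, y) to its horizontally mirrored position.
        src_pos = (x + width * y) * p_size
        dest_pos = ((width - 1 - x) + width * y) * p_size
        dest_pixels[dest_pos:dest_pos + p_size] = \
            src_pixels[src_pos:src_pos + p_size]
```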
I've been using my pre-released 0.9.6-pre1 version of
pho, my image
viewer, for ages now, and it's been working fine. I keep wanting
to release it, but there were a
couple of minor bugs that irritated me and I hadn't had time to
track down. Tonight, I finally got caught up with my backlog and
found myself with a few extra minutes to spare, and fixed the last
two known bugs. Quick, time to release before I discover anything else!
(There were a couple other features I was hoping to implement --
multiple external commands, parsing a .phorc file, and having
Keywords mode read and write the Keywords file itself -- but
none of those is terribly important and they can wait.)
It's only a -pre release, but I'm not going to have a long
protracted set of betas this time. 0.9.6-pre1 is very usable,
and I'm finding Keywords mode to be awfully useful for classifying
my mountain of back photos.
So, pho users, give it a try and let me know if you see any bugs!
It's my hope to release the real 0.9.6 in a week or two, if nobody
finds any monstrous bugs in the meantime.
Every summer I volunteer as an instructor for a one-day Javascript
programming class at the GetSET
summer technology camp for high school girls. GetSET is a great
program run by the Society of Women Engineers.
It's intended for minority girls from relatively poor neighborhoods,
and the camp is free to the girls (thanks to some great corporate
sponsors). They're selected through a competitive interview process
so they're all amazingly smart and motivated, and it's always rewarding
being involved.
Teaching programming in one day to people with no programming
background at all is challenging, of course. You can't get into any
of the details you'd like to cover, like style, or debugging
techniques. By the time you get through if-else, for and while loops,
some basic display methods, the usual debugging issues like reading
error messages, and typographical issues like
"Yes, uppercase and lowercase really are different" and "No, sorry,
that's a colon, you need a semicolon", it's a pretty full day and
the students are saturated.
I got drafted as lead presenter several years ago, by default by
virtue of being the only one of the workshop leaders who actually
programs in Javascript. For several years I'd been asking for a chance
to rewrite the course to try to make it more fun and visual
(originally it used a lot of form validation exercises), and
starting with last year's class I finally got the chance. I built
up a series of graphics and game exercises (using some of Sara
Falamaki's Hangman code, which seemed perfect since she wrote it
when she was about the same age as the girls in the class) and
it went pretty well. Of course, we had no idea how fast the girls
would go or how much material we could get through, so I tried to
keep it flexible and we adjusted as needed.
Last year went pretty well, and in the time since then we've
exchanged a lot of email about how we could improve it.
We re-ordered some of the exercises, shifted our emphasis in a few
places, factored some of the re-used code (like windowWidth()) into
a library file so the exercise files weren't so long, and moved more of
the visual examples earlier.
I also eliminated a lot of the slides. One of the biggest surprises
last year was the "board work". I had one exercise where the user
clicks in the page, and the student has to write the code to figure
out whether the click was over the image or not. I had been nervous
about that exercise -- I considered it the hardest of the exercises.
You have to take the X and Y coordinates of the mouse click, the X and
Y coordinates of the image (the upper left corner of the <div>
or <img> tag), and the size of the image (assumed to be 200x200),
and turn that all into a couple of lines of working Javascript code.
Not hard once you understand the concepts, but hard to explain, right?
I hadn't made a slide for that, so we went to the whiteboard to draw
out the image, the location of the mouse click, the location of the
image's upper left corner, and figure out the math ...
and the students, who had mostly been sitting passively
through the heavily slide-intensive earlier stuff, came alive. They
understood the diagram, they were able to fill in the blanks and keep
track of mouse click X versus image X, and they didn't even have much
trouble turning that into code they typed into their exercise. Fantastic!
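The whiteboard math boils down to a simple bounds check. Here's the logic in Python (the class itself uses JavaScript, and these function and variable names are mine, not the exercise's): imgX and imgY are the image's upper left corner, and the image is assumed to be 200x200 as in the exercise.

```python
# Was the mouse click inside the image's bounding box?
def click_is_over_image(clickX, clickY, imgX, imgY, size=200):
    return (imgX <= clickX <= imgX + size and
            imgY <= clickY <= imgY + size)
```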
Remembering that, I tried to use a lot fewer slides this year.
I felt like I still needed to have slides to explain the basic
concepts that they actually needed to use for the exercises -- but
if there was anything I thought they could figure out from context,
or anything that was just background, I cut it. I tried for as few
slides as possible between exercises, and more places where we could
elicit answers from the students. I think we still have too many slides
and not enough "board work" -- but we're definitely making progress,
and this year went a lot better and kept them much better engaged.
We're considering next year doing the first several exercises on the
board first, then letting them type it in to their own copies to
verify that it works.
We did find we needed to leave code examples visible:
after showing slides saying something like "Ex 7:
Write a loop that writes a line of text in each color", I had to
back up to the previous slide where I'd showed what the code actually
looked like. I had planned on their using my "Javascript Quick
Reference" handout for reference and not needing that information
on the slides; but in fact, I think they were confused about the
quickref and most never even opened it. Either that information needs
to be in the handout, or it needs to be displayed on the screen as
they work, or I have to direct them to the quickref page explicitly
("Now turn to page 3 in ...") or put that information in the exercises.
The graphical flower exercises were a big hit this year (I showed them
early and promised we'd get to them, and when we did, just before
lunch, several girls cheered) and, like last year, some of the girls
who finished them earlier decided on their own that they wanted to
change them to use other images, which was also a big hit. Several
other girls decided they wanted more than 10 flowers displayed, and
others hit on the idea of changing the timeout to be a lot shorter,
which made for some very fun displays. Surprisingly, hardly anyone
got into infinite loops and had to kill the browser (always a
potential problem with javascript, especially when using popups
like alert() or prompt()).
I still have some issues I haven't solved, like what to do about
semicolons and braces. Javascript is fairly agnostic about
them. Should I tell the girls that they're required? (I did that
this year, but it's confusing because then when you get to "if"
statements you have to explain why that's different.) Not mention
them at all? (I'm leaning toward that for next year.)
And it's always a problem figuring out what the fastest girls should
do while waiting for the rest to finish.
This year, in addition to trying to make each exercise shorter,
we tried having the girls work on them in groups of
two or three, so they could help each other. It didn't quite work out
that way -- they all worked on their own copies of the exercises
but they did seem to collaborate more, and I think that's the best
balance. We also encourage the ones who finish first to help the girls
around them, which mostly they do on their own anyway.
And we really do need to find a better editor we can use on the
Windows lab machines instead of Wordpad. Wordpad's font is too small on
the projection machine, and on the lab machines it's impossible for
most of us to tell the difference between parentheses, brackets and
braces, which leads to lots of time-wasting subtle bugs. Surely
there's something available for Windows that's easy to use,
freely distributable, makes it easy to change the font, and has
parenthesis and brace matching (syntax highlighting would be nice too).
Well, we have a year to look for one now.
All in all, we had a good day and most of the girls gave the class
high marks. Even the ones who concluded "I learned I shouldn't
be a programmer because it takes too much attention to detail"
said they liked the class. And we're fine with that --
not everybody wants to be a programmer, and the point isn't to
force them into any specific track. We're happy if we can give
them an idea of what computer programming is really like ...
then they'll decide for themselves what they want to be.
A user on the One Laptop Per Child (OLPC, also known as the XO)
platform wrote to ask me how to use crikey on that platform.
There are two stages to getting crikey running on a new platform:
Build it, and
Figure out how to make a key run a specific program.
The crikey page
contains instructions I've collected for binding keys in various
window managers, since that's usually the hard part.
On normal Linux machines the first step is normally no problem.
But apparently the OLPC comes with gcc but without make or the X
header files. (Not too surprising: it's not a machine aimed at
developers and I assume most people developing for the machine
cross-compile from a more capable Linux box.)
We're still working on that (if my correspondent gets it working,
I'll post the instructions), but while I was googling for
information about the OLPC's X environment I stumbled upon
a library I didn't know existed: python-xlib.
It turns out it's possible to do most or all of what crikey does
from Python. The OLPC is Python based; if I could write crikey
in Python, it might solve the problem.
So I whipped up a little key event generating script as a test.
Unfortunately, it didn't solve the OLPC problem (they don't include
python-xlib on the machine either) but it was a fun exercise, and
might be useful as an example of how to generate key events in
python-xlib. It supports both event generating methods: the X Test
extension and XSendEvent. Here's the script:
/pykey-0.1.
But while I was debugging the X Test code, I had to solve a bug that
I didn't remember ever solving in the C version of crikey. Sure
enough, the C version needed the same fix I'd just made in the Python version.
Two fixes, actually. First, when you send a fake key event through
XTest, there's no way to specify a shift mask. So if you need a
shifted character like A, you have to send KeyPress Shift, KeyPress a.
But if that's all you send, XTest on some systems does exactly what
the real key would do if held down and never released: it
autorepeats. (But only for a little while, not forever. Go figure.)
So the real answer is to send KeyPress Shift, KeyPress a, KeyRelease
a, KeyRelease Shift. Then everything works nicely. I've updated
crikey accordingly and released version 0.7 (though since XTest
isn't used by default, most users won't see any change from 0.6).
In the XSendEvent case, crikey still doesn't send the KeyRelease
event -- because some systems actually see it as another KeyPress.
(Hey, what fun would computers be if they were consistent and
always predictable, huh?)
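The press/release ordering for the XTest case can be sketched like this (a hedged illustration, not crikey's actual code: the function and key names are mine, and real delivery would go through python-xlib's xtest.fake_input() rather than building a list):

```python
# For a shifted character, emit press AND release events, properly
# nested, so the fake key doesn't autorepeat the way a held key would.
def key_event_sequence(char):
    events = []
    shifted = char.isupper()
    if shifted:
        events.append(("Shift_L", "press"))
    events.append((char.lower(), "press"))
    events.append((char.lower(), "release"))   # without this, it autorepeats
    if shifted:
        events.append(("Shift_L", "release"))
    return events
```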
Both C and Python versions are linked off the
crikey page.
I was looking at Dave's little phase-of-the-moon Mac application,
and got the urge to play with moonroot, the little xlib ditty I
wrote several years ago to put a moon (showing the right phase)
on the desktop.
I fired it up, and got the nice moon-shaped window ... but with a
titlebar. I didn't want that! Figuring out how to get rid of the
titlebar in openbox was easy, just
... but it didn't work! A poke with xwininfo showed the likely
cause: instead of "moonroot", the window was listed as "Unnamed window".
Whoops!
A little poking around revealed three different ways to set "name"
for a window: XStoreName, XSetClassHint (which sets both class
name and app name), and XSetWMName. Available online documentation
on these functions was not very helpful in explaining the differences;
fortunately someone hanging out on the openbox channel knew the
difference (thanks, Crazy_Hopper). Thus:
XSetWMName sets a property called XA_NAME which is
primarily used to update the window's titlebar.
Note that this may be more than the app name (for instance,
Firefox puts the title of the current page in the titlebar).
To use XSetWMName, you have to set up and populate an
XTextProperty structure, which first requires that you set up
a string list then run XStringListToTextProperty
-- not difficult but it's several annoying steps.
XStoreName is a shortcut to XSetWMName, a way to set
the XA_NAME (titlebar name) in one step.
XSetClassHint sets two properties at once: a name hint
and a class hint. This is the name and class that the window
manager uses for directives like suppressing the titlebar.
I didn't see much in the way of example code for what an app ought to
do with these, so I'll post mine here:
char* appname;
XClassHint* classHint;

[ ... ]

if (argv && argc >= 1)
    appname = basename(argv[0]);
else
    appname = "moonroot";

/* set the titlebar name */
XStoreName(dpy, win, appname);

/* set the name and class hints for the window manager to use,
 * but only if the allocation succeeded */
classHint = XAllocClassHint();
if (classHint) {
    classHint->res_name = appname;
    classHint->res_class = "MoonRoot";
    XSetClassHint(dpy, win, classHint);
    XFree(classHint);
}
And if anyone is interested in my silly moon program, it's at
moonroot-0.3.tar.gz.
moonroot gives you a large moon,
moonroot -s gives a smaller one.
I'm not terribly happy with its accuracy and wasted too much time
today fiddling with it and verifying that it's doing the right time
conversions. All I can figure is that the approximation in Meeus'
Astronomical Algorithms is way too approximate (it's
sometimes off by more than a day) and I should just rewrite all my
moon programs to calculate moon phase the hard (and slow) way.
Python is so cool. I love how I'll be working on a script and
suddenly think "Oh, it should also do X, but I bet that'll be a
lot more work", and then it occurs to me that I can do exactly that
by adding about 2 more lines of python. And I add them and it
works the first time.
Anyway, it turned out to be very easy to go through all existing
blog articles and add tags for the current category hierarchy,
being careful to preserve each file's last-modified date since
that's what pyblosxom uses for the date of the entry.
add-tags.py
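The mtime-preserving part looks roughly like this (a hedged sketch in Python 3, not the actual add-tags.py; the function name and the "#tags" line format are from the tags plug-in discussed below):

```python
import os, tempfile

# Insert a "#tags" line as the second line of a blog entry while
# preserving the file's mtime, since pyblosxom uses mtime as the
# entry's date.
def add_tags_line(path, tags):
    st = os.stat(path)                     # remember the original times
    with open(path) as f:
        lines = f.readlines()
    lines.insert(1, "#tags %s\n" % ",".join(tags))
    with open(path, "w") as f:
        f.writelines(lines)
    os.utime(path, (st.st_atime, st.st_mtime))  # restore the dates
```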
Entries on this blog are arranged by category. But all too often I
have something that really belongs equally well in two
categories. Since pyblosxom's categories follow the hierarchy on disk,
there's no way to have an entry in two categories. Enter tags.
Tags are a way of assigning any number of keywords to each blog
entry. Search engines apparently pay attention to tags, and most
tagged blogs also let you search by tag.
I wanted my tags to follow whatever canonical tag format the big
blogging sites use, so search engines would index them. Unfortunately,
this isn't well documented anywhere. Wikipedia has a
tags
entry that mentions a couple of common formats; the HTML format
given in that entry (<a rel="tag" ...>) turns out
to be the format used on most popular sites like livejournal and
blogspot, so that's what I wanted to use. Later, someone pointed me
to a much better tag
explanation on technorati, which is useful whether or not you
decide to register with technorati.
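Generating that format is trivial; for instance (a sketch, where the /tags/<name> URL scheme matches this blog's layout but is otherwise my assumption):

```python
# Build <a rel="tag"> links, the format search engines index.
def tag_links(tags):
    return " ".join('<a rel="tag" href="/tags/%s">%s</a>' % (t, t)
                    for t in tags)
```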
Next: how to implement searching?
The simplest pyblosxom tags plug-in is called simply
tags.py.
All the others are much more complex and do tons of things I'm
not interested in.
But tags.py doesn't support static mode, and points
to a modified tags.py
that's supposed to work with static blogs.
Alas, when I tried that version, it didn't work (and an inquiry on the
pyblosxom list got a response from someone who agreed it didn't work).
So I hacked around and eventually got it working.
Here's a
diff
for what I changed or just the
tags-static.py
plug-in.
Additional steps I needed that weren't mentioned in tags.py:
Add "#tags foo,bar" directives as the second line of an entry,
right under the title; anywhere else in the file it will be ignored.
You may need to create the tag directories
http://yoursite/tags/$tagname
yourself (pyblosxom created the directories for me on the web
server, but not on the machine where I first tested).
In addition to the config file entries discussed below, if you use
an extension other than .txt (or maybe even if you don't) you also
need to set py[ 'taggable_files' ] = [ "ext" ]
In your story.html template, include $tag_links
wherever you want the tags line to go. But make "Tags:
" or something similar be part of the pretext, so it won't
be included on un-tagged entries.
I also wrote a little python
index.cgi
for my blog's /tags directory, so you can see the list of tags used so
far. Strangely, tags.py didn't create any such index, and it was
easier to make a cgi than to figure out how to do it from a blosxom
plug-in.
And as long as I'm posting pyblosxom diffs, here's the little
filename
diff for 1.4.3 that I apply to pyblosxom whenever I update it, to
let me use the .blx extension rather than .txt for my blog source files.
(That way I can configure my editor to treat blog files as html, which
they are -- they aren't plaintext.)
Anyway, it all seems to be working now, and in theory I can tag all
future articles. I'll probably go back and gradually add tags to
older articles, but that's a bigger project and there's no rush.
On a recent Mojave desert trip, we tried to follow a minor dirt road
that wasn't mapped correctly on any of the maps we had, and eventually
had to retrace our steps. Back at the hotel, I fired up my trusty
PyTopo on the East
Mojave map set and tried to trace the road. But I found that as I
scrolled along the road, things got slower and slower until it
just wasn't usable any more.
PyTopo was taking up all of my poor laptop's memory. Why?
Python is garbage collected -- you're not supposed to have
to manage memory explicitly, like freeing pixbufs.
I poked around in all the sample code and man pages I had available
but couldn't find any pygtk examples that seemed to be doing any
explicit freeing.
When we got back to civilization (read: internet access) I did
some searching and found the key. It's even in the
PyGTK
Image FAQ, and there's also some discussion in a
mailing
list thread from 2003.
Turns out that although Python is supposed to handle its own garbage
collection, the Python interpreter doesn't grok the size of a pixbuf
object; in particular, it doesn't see the image bits as part of the
object's size. So dereferencing lots of pixbuf objects doesn't trigger
any "enough memory has been freed that it's time to run the garbage
collector" actions.
The solution is easy enough: call gc.collect() explicitly
after drawing a map (or any other time a bunch of pixbufs have been
dereferenced).
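Here's a self-contained sketch of why the explicit call matters (using objects in reference cycles as a stand-in for pixbufs, since the real issue -- image bits invisible to the interpreter -- can't be reproduced without pygtk; the FakePixbuf class is mine):

```python
import gc

# When dropping references isn't enough to trigger collection, call
# gc.collect() explicitly -- as PyTopo 0.6 now does after drawing a map.
class FakePixbuf:
    count = 0                  # live-instance counter for the demo
    def __init__(self):
        FakePixbuf.count += 1
        self.me = self         # cycle, so refcounting alone can't free it
    def __del__(self):
        FakePixbuf.count -= 1

gc.disable()                   # keep automatic collection out of the demo
tiles = [FakePixbuf() for _ in range(100)]
tiles = None                   # dereference everything, as after a redraw
still_alive = FakePixbuf.count # the cycles keep them all alive...
gc.collect()                   # ...until an explicit collection pass
gc.enable()
```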
So there's a new version of PyTopo, 0.6
that should run a lot better on small memory machines, plus
a new collection format (yet another format from
the packaged Topo! map sets) courtesy of Tom Trebisky.
Oh ... in case you're wondering, the ancient USGS maps from
Topo! didn't show the road correctly either.
I left the water on too long in the garden again. I keep doing
that: I'll set up something where I need to check back in five minutes or
fifteen minutes, then I get involved in what I'm doing and 45 minutes
later, the cornbread is burnt or the garden is flooded.
When I was growing up, my mom had a little mechanical egg timer.
You twist the dial to 5 minutes or whatever, and it goes
tick-tick-tick and then DING! I could probably
find one of those to buy (they're probably all digital now
and include clocks and USB plugs and bluetooth ports) but since the
problem is always that I'm getting distracted by something on the
computer, why not run an app there?
Of course, you can do this with shell commands. The simple solution
is:
(sleep 300; zenity --info --text="Turn off the water!") &
But the zenity dialogs are small -- what if I don't notice it? --
and besides, I have to multiply by 60 to turn a minute delay into
sleep seconds. I'm lazy -- I want the computer to do that for me!
Update: Ed Davies points out that "sleep 5m" also works.
A slightly more elaborate solution is the at command. Say something like:
at now + 15 minutes
and when it prompts for commands, type something like:
export DISPLAY=:0.0
zenity --info --text="Your cornbread is ready"
to pop up a window with a message.
But that's too much typing and has the same problem of the small
easily-ignored dialogs. I'd really rather have a great big red
window that I can't possibly miss.
Surely, I thought, someone has already written a nice egg-timer
application! I tried aptitude search timer and found several
apps such as gtimer, which is much more complicated than I wanted (you
can define named events and choose from a list of ... never mind, I
stopped reading there). I tried googling, but didn't have much luck
there either (lots of Windows and web apps, no Linux apps or
cross-platform scripts).
Clearly just writing the damn thing was going to be easier than
finding one.
(Why is it that every time I want to do something simple on a computer,
I have to write it? I feel so sorry for people who don't program.)
I wanted to do it in python, but what to use for the window that pops up?
I've used python-gtk in the past, but I've been meaning to check out
TkInter (the gui toolkit that's kinda-sorta part of Python) and
this seemed like a nice opportunity since the goal was so simple.
The resulting script:
eggtimer.
Call it like this:
eggtimer 5 Turn off the water
and in five minutes, it will pop up a huge red window the size of the
screen with your message in big letters. (Click it or hit a key to
dismiss it.)
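The skeleton of such a timer looks something like this (a hedged Python 3 sketch, not the released eggtimer script; the function names are mine, and the Tk window is built lazily so the minutes-to-milliseconds logic stands on its own):

```python
# Convert a minutes argument (possibly fractional) to Tk's milliseconds.
def delay_ms(minutes):
    return int(float(minutes) * 60 * 1000)

def eggtimer(minutes, message):
    import tkinter                       # imported here; needs python-tk
    root = tkinter.Tk()
    root.withdraw()                      # stay hidden until the timer fires
    def ding():
        root.deiconify()
        root.attributes("-fullscreen", True)   # huge, impossible to miss
        tkinter.Label(root, text=message, bg="red", fg="white",
                      font=("Helvetica", 72)).pack(expand=True, fill="both")
    root.after(delay_ms(minutes), ding)  # Tk's timer, instead of sleep
    root.bind("<Key>", lambda e: root.destroy())
    root.bind("<Button-1>", lambda e: root.destroy())
    root.mainloop()
```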
First Impressions of TkInter
It was good to have an excuse to try TkInter and compare it with python-gtk.
TkInter has been recommended as something normally installed
with Python, so the user doesn't have to install anything extra.
This is apparently true on Windows (and maybe on Mac), but on
Ubuntu it goes the other way: I already had pygtk, because GIMP
uses it, but to use TkInter I had to install python-tk.
For developing I found TkInter irritating. Most
of the irritation concerned the poor documentation:
there are several tutorials demonstrating very basic uses, but
not much detailed documentation for answering questions like "What
class is the root Tk() window and what methods does it have?"
(The best I found -- which never showed up in google, but was
referenced from O'Reilly's Programming Python -- was
here.)
In contrast, python-gtk is
very well documented.
Things I couldn't do (or, at least, couldn't figure out how to do, and
googling found only postings from other people wanting to do the same thing):
Button didn't respond to any of the obvious keys, like Return or
Space, and in fact setting key handlers on the button didn't work --
I ended up setting a key handler on the root window.
I couldn't find a way to set the root window size and background
explicitly, so I had to set approximate window size by guessing at
the size of the internal padding of the button.
There's an alternative to the root Tk() window called
Toplevel, which is documented and does allow setting window
size. Unfortunately, it also pops up an empty dialog without being
told to (presumably a bug).
All of the tutorials I found for creating dialogs were wrong,
and I finally gave up on dialogs and just used a regular window.
I couldn't fork and return control to the shell, because TkInter
windows don't work when called from a child process (for reasons no
one seems to be able to explain), so you have to run it in the
background with & if you want your shell prompt back.
I expect I'll be sticking with pygtk for future projects.
It's just too hard figuring things out with no documentation.
But it was fun having an excuse to try something new.
Someone showed up on #gimp today with a color specified as an HTML
hex color specifier, and wanted to know how to find the nearest
color name.
Easy, right? There have got to be a bazillion pages that do that,
plus at least a couple of Linux apps.
But I googled for a while and couldn't find a single one. There
are lots of pages that list all the RGB colors, or convert decimal
red, green and blue into HTML #nnn hex codes, or offer aesthetic
advice about pleasing combinations of colors for themes (including
this lovely page on butterfly-inspired color themes,
courtesy of Rik) but nothing I could
find that gave color names. Apparently there used to be a Linux
app that did that, a piece of Gnome 1 called GColorSel,
but it's gone now.
I got to thinking (always dangerous!) ...
/etc/X11/rgb.txt has a list of color names
with their RGB color equivalents. It would be really easy to write
something that just read down the list finding the ones closest to
the specified color.
Uh-oh ... of course, once that thought occurred to me, I was doomed.
Programmer's disease. I had to write it. So I did, and here it is:
Find the
Nearest Matching Color Name. It checks against both rgb.txt
and the much smaller list of 17 CSS color names.
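The core of it is just a nearest-neighbor search over the color table. Here's the idea in Python (a sketch, not the actual page's code; the table holds a few CSS names for illustration rather than all of rgb.txt):

```python
# A few name -> RGB entries standing in for the full color list.
COLORS = {
    "black": (0, 0, 0),     "white": (255, 255, 255),
    "red":   (255, 0, 0),   "lime":  (0, 255, 0),
    "blue":  (0, 0, 255),   "gray":  (128, 128, 128),
}

# Parse an HTML "#rrggbb" spec and return the name with the smallest
# squared Euclidean distance in RGB space.
def nearest_color_name(hexspec):
    r = int(hexspec[1:3], 16)
    g = int(hexspec[3:5], 16)
    b = int(hexspec[5:7], 16)
    def dist2(rgb):
        return (rgb[0] - r)**2 + (rgb[1] - g)**2 + (rgb[2] - b)**2
    return min(COLORS, key=lambda name: dist2(COLORS[name]))
```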
A couple of friends periodically pester me to write about why
I stopped contributing to Mozilla after so many years with the
project. I've held back, feeling like it's airing dirty laundry
in public.
But a
discussion
on mozilla.dev.planning over the last week, started by Nelson
Bolyard, aired it for me: it was their culture of regressions.
I love Mozilla technology. I'm glad it exists, and I still use it for
my everyday browsing. But trying to contribute to Mozilla just got
too frustrating. I spent more time chasing down and trying to fix
other people's breakages than I did working on anything I
wanted to work on.
That might be okay, barely, when you're getting paid for it. But
when you're volunteering your own time, who wants to spend it fixing
code checked in by some other programmer who just can't be bothered to
clean up his own mess?
It's the difference between spending a day cleaning your own house ...
and spending every day cleaning other people's houses.
Nelson said it eloquently in this exchange:
(Robert Kaiser writes)
As we are open source, everyone can access and test that code, and
find and file the regressions, so that they get fixed over time.
(Boris Zbarsky writes)
That last conclusion doesn't necessarily follow. To get them fixed you
need someone fixing them.
(Nelson Bolyard writes)
We're very unlikely to get volunteers to spent large amounts of effort,
rewriting formerly working code to get it to work again, after it was
broken by someone else's checkin. This demotivates developers and drives
them away. They think "why should I keep working on this when others can
break my code and I must pay for their mistakes?" and "I worked hard to
get that working, and now person X has broken it. Let HIM fix it."
This was exactly how I felt,
and it's the reason I quit working on Mozilla.
A little later in the thread,
Boris Zbarsky reports that the trunk has been so broken with regressions
that it's been unusable for him for weeks or months. (When you have
someone as committed and sharp as Boris unable to use your software,
you know there's something wrong with your project's culture.)
He writes:
"For example, on my machine (Linux) about one in three SVG testcases in
Bugzilla causes trunk Gecko to hang X ..."
Justin Dolske replies, "Oh, Linux," and asks if it's related to
turning on Cairo. Boris replies affirmatively.
Just another example where a change was
checked in that caused serious regressions keeping at least one
important contributor from using the browser on a regular basis;
yet it's still there and hasn't been backed out. Of course, it's
"only Linux".
David Baron appears to take Nelson's concerns seriously,
and suggests criteria for closing the tree and making
everyone stop work to track down regressions. As he correctly
comments, closing the tree is very serious and inefficient, and should
be avoided in all but the most serious cases.
But Nelson repeats the real question:
(Nelson Bolyard writes)
Under what circumstances does a Sheriff back out a patch due to
functional regressions? From what you wrote above, I gather it's "never". :(
Alas, the thread peters out after that; there's no reply
to Nelson's question.
The problem with Mozilla isn't that there are regressions.
Mistakes happen. The problem is that regressions never get
fixed, because the project's culture encourages regressions.
The prevailing attitude is
that it's okay to check in changes that break other people's features,
as long as your new feature is cool enough or the right people want
it. If you break something, well, hey, someone will figure out a fix
eventually. Or not. Either way, it's not your problem.
Working on new features is fun, and so is getting the credit for being
the one to check them in. Fixing bugs, writing API documentation,
extensive testing -- these things aren't fun, they're hard work, and
there isn't much glory in them either (you don't get much appreciation
or credit for it). So why do them if you don't have to? Let someone
else worry about it, as long as the project lets you get away with it!
A project with a culture of responsibility would say that the person
who broke something should fix it, and that broken stuff should stay
out of the tree. If programmers don't do that
themselves just because it's the right thing to do, the project could
enforce it: just insist that regression-causing changes that can't
be fixed right away be backed out. Fix the regressions out
of the tree where they aren't causing problems for other people.
Get help from people to test it and to integrate it with those
other modules you forgot about the first time around.
Yes, even if it's a change that's needed -- even if it's something
a lot of people want. If it's a good change, there will always be time
to check it in later.
For a bit
over a year I've been running a patched version of Firefox,
which I call Kitfox,
as my main browser. I patch it because there are a few really
important features that the old Mozilla suite had which Firefox
removed; for a long time this kept me from using Firefox
(and I'm not
the only one who feels that way), but when the Mozilla Foundation
stopped supporting the suite and made Firefox the only supported
option, I knew my only choice was to make Firefox do what I needed.
The patches were pretty simple, but they meant that I've been building
my own Firefox all this time.
Since all my changes were in JavaScript code, not C++,
I knew this was probably all achievable with a Firefox extension.
But I never got around to it;
building the Mozilla source isn't that big a deal to me. I did it as
part of my job for quite a few years, and my desktop machine is fast
enough that it doesn't take that long to update and rebuild, then
copy the result to my laptop.
But when I installed the latest Debian, "Etch", on the laptop, things
got more complicated. It turns out Etch is about a year behind in
its libraries. Programs built on any other system won't run on Etch.
So I'd either have to build Mozilla on my laptop (a daunting prospect,
with builds probably in the 4-hour range) or keep another
system around for the purpose of building software for Etch.
Not worth it. It was time to learn to build an extension.
There are an amazing number of good tutorials on the web for writing
Firefox extensions (I won't even bother to link to any; just google
firefox extension and make your own choices).
They're all organized as step by step examples with sample code.
That's great (my favorite type of tutorial) but it left my real
question unanswered: what can you do in an extension?
The tutorial examples all do simple things like add a new menu or toolbar
button. None of them override existing JavaScript, as I needed to do.
Canonical
URL to the rescue.
It's an extension that overrides one of the very behaviors I wanted to
override: that of adding "www." to the beginning and ".com" or ".org"
to the end of whatever's in the URLbar when you ctrl-click.
(The Mozilla suite behaved much more usefully: ctrl-click opened the
URL in a new tab, just like ctrl-clicking on a link. You never need
to add www. and .com or .org explicitly because the URL loading code
will do that for you if the initial name doesn't resolve by itself.)
Canonical URL showed me that all you need to do is make an overlay
containing your new version of the JavaScript method you want to
override. Easy!
So now I have a tiny Kitfox extension
that I can use on the laptop or anywhere else. Whee!
Since extensions are kind of a pain to unpack,
I also made a source tarball which includes a simple Makefile:
kitfox-0.1.tar.gz.
Pho 0.9.5-pre5
has been working nicely since I released it two weeks
ago. And meanwhile, I've already started working on the next
version. So I've released it as 0.9.5 with no changes (except for
version string and some updates to the documentation and debian
config files).
I made a .deb on Ubuntu Edgy, but haven't actually tested it yet
(anyone who sees problems, please let me know) and I'll try to
make a straight Debian package on Sarge sometime soon.
So what's this stuff I've been working on for the next version?
Image categorization. I shoot so many photos, and categorizing
them by keyword can be a lot of work. Although
Pho's "Notes 0 through 9" are helpful for a small number of notes,
it's tough keeping track of which note corresponds to which keyword
when I'm categorizing a directory full of photos from a trip.
The next Pho release (which will have a much shorter release cycle
than 0.9.5 did, honest!) will have an optional Keywords dialog
where you can type in keywords and associate them with photos.
I know there are apps such as f-spot, gthumb and Picasa, but they
all seem much more heavyweight than what I need, and Pho only
needs a tiny bit of work to get there.
While I'm working on dialogs, I'm also cleaning up modality:
Pho dialogs will now stay visible so they can't get lost behind
the image, and the question dialogs ("Really delete?" or "Do you want
to quit?") will be modal.
But that's all coming in the next version.
For now, 0.9.5 is the stable version: get it from
the Pho page.
Pho's been static for a long time -- it's been working well enough
that I kept forgetting there were a couple of bugs that needed
fixing for a 0.9.5 release.
I had some time tonight, so I dug in and fixed the bugs I
remembered: some issues with zooming in and out, a bug with
aspect ratio when switching out of fullscreen mode, and
the fact that Note 0 didn't work.
While I was at it, I added an environment variable, PHO_ARGS,
where you can preset your default values. I find that I always
want -p (presentation mode), so now I can specify that with
PHO_ARGS=p, and use pho -P when I want window borders.
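Pho itself is written in C, but the pattern here — seeding the argument list from an environment variable so that real command-line flags can still override it — is easy to sketch in Python. The helper name below is made up for illustration; the dash-less PHO_ARGS syntax follows the usage above:

```python
import os

def args_with_env_defaults(argv, env=None):
    """Prepend flags taken from PHO_ARGS so that later flags on the
    real command line can override them (e.g. -P overriding -p)."""
    if env is None:
        env = os.environ
    preset = env.get("PHO_ARGS", "")
    # PHO_ARGS holds bare letters, e.g. PHO_ARGS=p, so add the dash here.
    defaults = ["-" + flag for flag in preset.split()]
    return defaults + argv
```

With PHO_ARGS=p, running pho -P would then see ["-p", "-P"], and the later flag wins if options are scanned left to right.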
I also updated the man page.
I'll test this for a little while and if nobody finds any
serious bugs, maybe I can finally release 0.9.5.
I was talking about desktop backgrounds -- wallpaper -- with some
friends the other day, and it occurred to me that it might be fun
to have my system choose a random backdrop for me each morning.
Finding backgrounds is no problem: I have plenty of images
stored in ~/Backgrounds -- mostly photos I've taken over the
years, with a smattering of downloads from sites like the
APOD.
So all I needed was a way to select one file at random from the
directory.
This is Unix, so there's definitely a commandline way to do it, right?
Well, surprisingly, I couldn't find an easy way that didn't involve
any scripting. Some shells have a random number generator built in
($RANDOM in bash) but you still have to do some math on the result.
Of course, I could have googled, since I'm sure other people have
written random-wallpaper scripts ... but what's the fun in that?
If it has to be a script, I might as well write my own.
Rather than write a random wallpaper script, I wanted something that
could be more generally useful: pick one random line from standard
input and print it. Then I could pass it the output of ls -1
$HOME/Backgrounds, and at the same time I'd have a script that
I could also use for other purposes, such as choosing a random
quotation, or choosing a "flash card" question when studying for
an exam.
The obvious approach is to read all of standard input into an array,
count the lines, then pick a random number between one and $num_lines
and print that array element. It took no time to whip that up in
Python and it worked fine. But it's not very efficient -- what if
you're choosing a line from a 10 MB file?
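For the record, that obvious approach really is only a few lines of Python (the function name here is my own, just for illustration):

```python
import random

def naive_random_line(stream):
    """Slurp the whole stream into memory, then pick one line at random.
    Simple, but memory use grows with the size of the input."""
    lines = stream.readlines()
    return random.choice(lines) if lines else None
```

Fine for a directory listing of backgrounds; less fine for a huge file.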
Then Sara Falamaki (thanks, Sara!) pointed me to a
page
with a neat Perl algorithm. It's Perl so it's not easy to read,
but the algorithm is cute. You read through the input line by line,
keeping track of the line number. For each line, the chance that
this line should be the one printed at the end is the reciprocal of
the line number: in other words, there's one chance out of
$line_number that this line is the one to print.
So if there's only one line, of course you print that line;
when you get to the second line, there's one chance out of two that
you should switch; on the third, one chance out of three, and so on.
A neat idea, and it doesn't require storing the whole file in memory.
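Translated out of Perl into Python, the trick looks like this — it's the k=1 case of what's usually called reservoir sampling (the function name is mine):

```python
import random

def random_line(lines):
    """Pick one item uniformly from an iterable in a single pass,
    without storing it all: keep a running winner, and replace it
    with line n with probability 1/n."""
    choice = None
    for n, line in enumerate(lines, start=1):
        if random.randrange(n) == 0:  # true with probability 1/n
            choice = line
    return choice
```

A quick induction shows every one of N lines ends up chosen with probability 1/N: line n wins with probability 1/n, then survives each later round k with probability (k-1)/k, and the product telescopes to 1/N.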
In retrospect, I should have thought of it myself: this is basically
the same algorithm I used for averaging images in GIMP for
my silly Chix Stack Mars
project, and I later described the method in the image stacking
section of my GIMP book.
To average images by stacking them, you give the bottom layer 100%
opacity, the second layer 50% opacity, the third 33% opacity, and so
on up the stack. Each layer makes an equal contribution to the final
result, so what you see is the average of all layers.
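It's easy to check numerically that this opacity schedule really does compute a mean. Here's a toy sketch with plain numbers standing in for pixel values (the function name is made up):

```python
def stack_average(values):
    """Blend each new 'layer' over the running result at opacity 1/k
    (bottom layer 100%, second 50%, third 33%, ...).  After every
    step the result is the average of the layers so far."""
    result = 0.0
    for k, value in enumerate(values, start=1):
        alpha = 1.0 / k
        result = alpha * value + (1.0 - alpha) * result
    return result
```

For example, stacking the values 10, 20 and 30 this way blends to 15 after two layers and 20 after all three — exactly the running averages.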
The randomline script, which you can inspect
here,
worked fine, so I hooked it up to accomplish the original
problem: setting a randomly chosen desktop background each day.
Since I use a lightweight window manager (fvwm) rather than gnome or
kde, and I start X manually rather than using gdm, I put this in my
.xinitrc:
So, an overlong article about a relatively trivial but nonetheless
nifty algorithm. And now I have a new desktop background each day.
Today it's something prosaic: mud cracks from Death Valley.
Who knows what I'll see tomorrow?
Update, years later:
I've written a script for the whole job,
randombg,
because on my laptop I want to choose from a different set of
backgrounds depending on whether I'm plugged in to an external monitor
or using the lower resolution laptop display.
But meanwhile, I've just been pointed to the shuf command,
which does pretty much what my randomline script did.
So you don't actually need any scripts, just shuf -n 1.
Belated release announcement: 0.5b2 of my little map viewer
PyTopo
has been working well, so I released 0.5 last week with only a
few minor changes from the beta.
I'm sure I'll immediately find six major bugs -- but hey, that's
what point releases are for. I only did betas this time because
of the changed configuration file format.
I also made a start on a documentation page for the .pytopo file
(though it doesn't really have much that wasn't already written
in comments inside the script).
I'm working on some little Javascript demos (for a workshop at
this summer's Get SET girls'
technology camp) so I've had the Javascript Console up for most of
my browsing over the last few days. I also have Mozilla's
strict Javascript checking on
(user_pref("javascript.options.strict", true); in prefs.js
or user.js) since I don't want to show the girls code that generates
warnings. (Strict mode also reports errors in CSS.)
It's been eye opening how many sites give warnings.
You know that nice clean ultra-simple Google search page?
One CSS error and one JS warning. But that's peanuts compared to
the pages of errors I see on most sites,
and they're not all missing "var" declarations.
I have to hit the "Clear" button frequently if I want to be able to
see the errors on the pages I'm working on.
And my own sites? Yes, I admit it, I've seen some errors in my own
pages too. Though it makes me feel better that there aren't very many
of them (mostly CSS problems, not JS). I'm going to keep the JS
Console visible more often so I'll see these errors and correct them.
A few months ago, someone contacted me who was trying to use my
PyTopo map display script for a different set of map data, the
Topo! National Parks series. We exchanged some email about the
format the maps used.
I'd been wanting to make PyTopo more general
anyway, and already had some hacky code in my local version to
let it use a local geologic map that I'd chopped into segments.
So, faced with an Actual User (always a good incentive!), I
took the opportunity to clean up the code, use some of Python's
support for classes, and introduce several classes of map data.
I called it 0.5 beta 1 since it wasn't well tested. But in the last
few days, I had occasion to do some map exploring,
cleaned up a few remaining bugs, and implemented a feature which
I hadn't gotten around to implementing in the new framework
(saving maps to a file).
I think it's ready to use now. I'm going to do some more testing:
after visiting the USGS
Open House today and watching Jim Lienkaemper's narrated
Virtual
Tour of the Hayward Fault,
I'm all fired up about trying again to find more online geologic
map data.
But meanwhile, PyTopo is feature complete and has the known
bugs fixed. The latest version is on
the PyTopo page.
Over dinner, I glanced at the cover of the latest Dr. Dobb's
(a new article on Ruby on Rails),
then switched to BBC World News. The first Beeb headline
was Aid
flow begins for Java victims.
I guess I was a little distracted from dinner preparations ...
my first thought was "Are they going to give them all copies of
Ruby and Rails?"
Then, of course, I remembered the earthquake.
Oh, right, those Java victims!
(Not to make light of the situation there, which sounds grim.
And just as I was writing this, I got email from the USGS Earthquake
Notification Service reporting another aftershock in Indonesia,
this one magnitude 5.6. I hope it doesn't make matters worse.)
This morning I was all ready to continue working on an ongoing web
project when I discovered that mysql wasn't running.
That's funny, it was running fine yesterday! I tried
/etc/init.d/mysql start, but it failed. The only error message
was, "Please take a look at the syslog."
So I hied myself over to /var/log, to discover that
mysql.log and mysql.err were both there, but empty.
Some poking around /etc/mysql/my.cnf revealed that logging is
commented out by default, because: "# Be aware that this log
type is a performance killer."
I uncommented logging and tried again, but /var/log/mysql.err
remained blank, and all that was in mysql.log was three lines
basically giving its runtime arguments and the name of the
socket file.
Back to the drawing board. I was well aware that I had changed the
mysql settings yesterday. See, mysqld on Ubuntu likes to create its
socket as /var/run/mysqld/mysqld.sock, but other apps, like
Ruby, all expect to find it in /tmp/mysql.sock. It's easy enough to
change Ruby's expectations. But then I found out that although the
cmdline client mysql also expects the socket in
/var/run/mysqld, it depends on something called
mysqladmin that wants the socket in /tmp. (I may have
those two reversed. No matter: the point is that you can't use the
client to talk to the database because it and the program it depends
on disagree about the name of the socket. This is probably a Dapper bug.)
Okay, so I had to pick one. I decided that /tmp/mysql.sock was
easier to remember and more standard with non-Debian setups. I knew
where to change it in the server (/etc/mysql/my.cnf is there and well
commented) but the mysql client doesn't use that, and it took some
googling and help from clueful friends to find out that what it wanted
was a new file called /etc/my.cnf (how's that for a nice clear
configuration file name?) containing one line:
socket = /tmp/mysql.sock
That done, mysql started and ran and everything worked. Woo!
Except that it didn't the following morning after a reboot, and didn't
give any error messages as to why.
Off I went on a merry chase across init files: /etc/init.d/mysql calls
/etc/mysql/debian-start (which made me realize that debian has added
yet another config file, debian.cnf, which has yet another copy
of the line specifying the socket filename) which calls
/usr/share/mysql/debian-start.inc.sh as well as calling various
other programs. But those were all red herrings:
the trick to debugging the problem was to run mysqld
directly rather than via /etc/init.d/mysql start. Run directly,
mysqld actually does print error messages; they were just being
swallowed by the init.d script.
The real problem turned out to be that I had changed the location of the
socket file, but not the pid file, in /etc/mysql/my.cnf, which was
also located by default in /var/run/mysqld. Apparently that
directory is created dynamically at each boot, and it isn't created
unless it's needed for the socket file (whether the pid file needs it
doesn't matter). So since I'd moved the socket file to /tmp,
/var/run/mysqld wasn't created, mysqld couldn't create its pid file
and it bailed. Solution: edit my.cnf to use /tmp for the pid file.
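So the relevant part of my.cnf ends up looking something like this (the exact pid-file name below is just my example — anything works as long as its directory actually exists at boot):

```ini
[mysqld]
socket   = /tmp/mysql.sock
pid-file = /tmp/mysqld.pid
```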
Back when I laboriously installed Ruby and Rails on Ubuntu "Hoary
Hedgehog" (which involved basically ignoring all the Ubuntu packages
and building everything, including Ruby itself, from source), I was
cheered by the notes in Ubuntu's forums and bugzilla indicating that
as of the next release ("Breezy Badger") all the right versions
would be there, and all this source compilation would no longer
be necessary.
I didn't get around to trying it until today. Breezy and its successor
"Dapper Drake" do indeed have a rails package as well as a Ruby
package, and I happily installed them. All looked great -- until
I actually tried to use them on a real-world application. It turns
out that the Ruby and Rails packages don't include gems, Ruby's
package manager (similar to the CPAN system familiar to Perl
programmers). And gems is required for doing anything
useful in Rails.
Drat! After several false starts, I eventually found the
instructions on this
page. Except that installs way more than seems necessary
for what I need to do, and if you copy/paste lines from that page
you may end up with a bunch of packages you don't want, like an
out of date version of mysql.
So here are simplified instructions for using Ruby on Rails
on Ubuntu Breezy or Dapper.
As yourself:
wget http://rubyforge.org/frs/download.php/5207/rubygems-0.8.11.tgz
tar zxvf rubygems-0.8.11.tgz
As root:
cd rubygems-0.8.11
ruby setup.rb
gem install rubygems-update
gem install rails
Say yes to all dependency questions during the gem install of rails.
Add your web server and database of choice (you probably already
have them installed, anyway) and you should be good to go.
You may note that the page I referenced tells you to in