Akkana's Musings on Open Source, Science, and Nature.
Sun, 31 Aug 2008
I wanted to get a list of who'd been contributing the most in a
particular open source project. Most projects of any size have a
ChangeLog file, in which check-ins have entries like this:
2008-08-26  Jane Hacker  <firstname.lastname@example.org>
* src/app/print.c: make sure the Portrait and Landscape
* buttons update according to the current setting.
I wanted to take each entry, save the name of the developer checking
in, then eventually count the number of times each name occurs (the
number of times that developer checked in) and print them in order
from most check-ins to least.
Getting the names is easy: for check-ins in the last 9 years, I just
want the lines that start with "200". (Of course, if I wanted earlier
check-ins I could make the match more general.)
grep "^200" ChangeLog
But now I want to trim the line so it includes only the
contributor's name. A bit of sed geekery can do that: the date is a
fixed format (four characters, a dash, two, dash, two, then two
spaces), so "^....-..-..  " matches that pattern.
But I want to remove the email address part too
(sometimes people use different email addresses
when they check in). So I want a sed pattern that will match
something at the front (to discard), something in the middle (keep that part)
and something at the end (discard).
Here's how to do that in sed:
grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)<.*$/\1/'
In English, that says: "For each line in the ChangeLog that starts
with 200, find a pattern at the beginning consisting of any four
characters, a dash, two characters, a dash, two characters, and
two spaces; then immediately after that, save all characters up to
a < symbol; then throw away the < and any characters that follow
until the end of the line."
That works pretty well! But it's not quite right: it includes the
two spaces after the name as part of the name. In GNU sed, \s matches
any space character (like space or tab).
So you'd think this should work:
grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)\s+<.*$/\1/'
\s+ means it will require that at least one and maybe more space
characters immediately before the < are also discarded.
But it doesn't work. It turns out the reason is that the \(.*\)
expression is "greedier" than the \s+: so the saved name expression
grabs the first space, leaving only the second to the \s+.
The way around that is to make the name expression specify that it
can't end with a space. \S is the term for "anything that's not a
space character"; so the expression becomes
grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s\+<.*$/\1/'
(the + turned out to need a backslash before it).
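To see the greedy-match problem in isolation, here's a small Python sketch of the same two expressions. It's only an illustration: the sample line and its email address are made up, and Python's re module spells the group and the + without backslashes.

```python
import re

# Hypothetical sample line in the ChangeLog format described above:
# date, two spaces, name, two spaces, <email>.
line = "2008-08-26  Jane Hacker  <jane@example.org>"

# Greedy capture: (.*) runs all the way up to the "<", so it
# swallows the two spaces after the name.
greedy = re.sub(r"^....-..-..  (.*)<.*$", r"\1", line)

# Forcing the capture to end with a non-space character (\S)
# leaves the trailing spaces for \s+ to consume.
trimmed = re.sub(r"^....-..-..  (.*\S)\s+<.*$", r"\1", line)

print(repr(greedy))   # 'Jane Hacker  '
print(repr(trimmed))  # 'Jane Hacker'
```

The repr() calls are there just to make the trailing spaces visible.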
We have the list of names! Add a
| sort on the end to
sort them alphabetically -- that will make sure you get all the
"Jane Hacker" lines listed together. But how to count them?
The Unix program most frequently invoked after sort is
uniq, which gets rid of all the repeated lines.
On a hunch, I checked out the man page,
and found the -c option: "prefix lines by the number of occurrences".
Perfect! Then just sort them by the number, from largest to smallest:
grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s\+<.*$/\1/' | sort | uniq -c | sort -rn
And we're done!
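For anyone who would rather not do it as a pipeline, the same count can be sketched in Python. This is an equivalent I'm assuming would behave the same way, not what I actually ran; count_checkins and the sample entries are invented for the example.

```python
import re
from collections import Counter

# Same idea as the sed expression: a date starting with 200, spaces,
# the name (ending in a non-space character), spaces, then "<".
ENTRY = re.compile(r"200\d-\d\d-\d\d\s+(.*\S)\s+<")

def count_checkins(lines):
    """Return (name, count) pairs, most check-ins first."""
    names = []
    for line in lines:
        m = ENTRY.match(line)
        if m:
            names.append(m.group(1))
    # most_common() stands in for sort | uniq -c | sort -rn
    return Counter(names).most_common()

# Made-up sample data; real use would pass open("ChangeLog") instead.
sample = [
    "2008-08-26  Jane Hacker  <jane@example.org>",
    "\t* src/app/print.c: make sure the buttons update.",
    "2008-08-25  Joe Coder  <joe@example.org>",
    "2008-08-24  Jane Hacker  <jhacker@example.com>",
]

for name, n in count_checkins(sample):
    print(n, name)
```

Because only the name is captured, Jane's two different email addresses still count as one contributor, which is the reason for stripping the address in the first place.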
Now, this isn't perfect since it doesn't catch "Checking in patch
contributed by email@example.com" attributions -- but those aren't in
a standard format in most projects, so they have to be handled by hand.
Disclaimer: Of course, number of check-ins is not a good measure of
how important or productive someone is. You can check in a lot of
one-line fixes, or you can write an important new module and submit
it for someone else to merge in. The point here wasn't to rank
developers, but just to get an idea who was checking into the tree
and how often.
Well, that ... and an excuse to play with nifty Linux shell pipelines.
[ 11:12 Aug 31, 2008 | More linux ]
Thu, 28 Aug 2008
I have an article on Linux Planet! The first of many, I hope.
At least the first of a short series on Linux astronomy programs,
starting with the one that's easiest to use: KStars.
It's oriented toward binocular observing, with suggestions
for good targets for beginners.
the Night Sky with Linux, Part I: KStars
[ 21:46 Aug 28, 2008 | More writing ]
Sat, 23 Aug 2008
I've wanted to know forever how to forward a message with all or
some of its attachments from mutt.
You can set the variable mime_forward so that when you forward
a message, it includes the entire message, headers, attachments and
all, as a single attachment. You can't edit this or change it in
any way. If you want to trim the original message, or omit one
of the attachments, you're out of luck.
I've found two ways to do it.
First: type v to get to the attachments screen. Type t repeatedly to
tag all the attachments, including the initial small text/plain
attachment (that's the original message body). When they're all
tagged, type ;f (forward all tagged attachments). After you fill
in the To: prompt, you'll be able to edit the message body, and
when you leave the editor, you'll have the attachment list there
to edit as you see fit.
If that doesn't work (I haven't tried it on HTML messages),
there's a slightly more elaborate procedure: use Esc-e to
use the current message as a template for a new one.
This calls up an editor on the current message, including headers.
Change the From to your name, the To to your intended recipient, and
edit the message body to your heart's content. When you're done,
you're sent to the Compose screen, where you can adjust the
attachment list and send the message.
Forwarding is pretty clearly not what Esc-E was intended for ...
but it does the job and might be a handy trick to know.
[ 22:14 Aug 23, 2008 | More linux ]
Fri, 22 Aug 2008
One of the local community colleges sent out glossy flyers
advertising their program, with the tag line "College pays
for itself; don't put it off!"
To prove how valuable college can be, they include a helpful
table showing the "Mediun earnings" for people with various
West Valley actually has a decent sciences program, and some
other interesting programs like Park Management (ranger training).
But I suspect I should stay away from their English and Statistics
[ 15:47 Aug 22, 2008 | More humor ]
Thu, 21 Aug 2008
On a short afternoon hike at Sanborn today, Dave and I decided to go
by the tiny koi pond near the visitors' center to see if any newts
were left this late into summer.
What a scene! In the current semi-drought, the pond has become a mud
flat, its surface
tracks and squirming with newts and crayfish trying to push
themselves out of the sticky mud.
In the few holes where the water was more than a couple inches
deep, fish flopped -- several 6-8" long golden koi plus something brown
but similarly large. A few of the newts thrashed in the water holes, too,
seemingly trying to get clean of the mud that coated them;
but most of the newts wriggled across the shallower mud flats,
heading nowhere in particular but looking very unhappy.
The crayfish seemed most numerous at the dryer edges of the pond,
pushing themselves laboriously up out of the mud with their claws
and dragging themselves across the mud.
Newts normally migrate, and can go surprisingly long distances
(miles) across land, so I think at least some of these newts will
survive. The fish, I must assume, are doomed unless someone rescues them.
I wonder if the rangers have considered selling the non-native
koi to someone who wants them, and replacing them with native fish?
Are there any fish native here this far upstream? Penitencia Creek
(at Alum Rock) has small fish (up to about 3" long), but it carries
more water in dry seasons than any creek near Sanborn.
What about the crayfish? Can crayfish survive long out of water,
bury themselves in mud (the ones here didn't seem too happy about
that idea) or migrate overland?
I suspect there will be some happy park raccoons tonight.
[ 20:21 Aug 21, 2008 | More nature ]
Chase sent us replacement MasterCards out of the blue. They came with
a brochure touting their wonderful new "Blink"
feature. So convenient! You just hold the card near a reader
and it charges your account, no need for any of that pesky swiping
of cards or signing of forms!
Yes, it's RFID (Radio Frequency ID tags), the small low-power radio
transmitters also used by Walmart and various other retailers,
and in other applications such as company security badges/access cards
(and, unfortunately, new passports in quite a few countries).
It seems a little odd to me that Chase's marketing implies that most
people would think it's a good thing to have a credit card that
can be charged easily without even taking it out of your wallet ...
to have a card that can be charged from some distance away without
your even knowing about it.
It's apparently easy and cheap to build an RFID credit card skimmer:
Bruce Schneier has
several articles about it, and in a later article he
offers several links to articles on how to build
your own RFID skimmer.
We called Chase right away and told them we didn't want the "Blink"
cards. They said we could keep our old, non-RFID cards and continue
to use them, and destroy the new ones. Whew!
Googling for links for this article, I found that we're not
the only Chase customers to want to decline Blink.
For anyone wondering how secure this technology is, the recent
debacle of the cracked Dutch RFID subway cards gives you an idea:
Schneier, "Dutch RFID Transit Card Hacked";
Register, "Dutch transit card crippled by multihacks",
and a followup where the MBTA, Boston's transit agency,
the courts to muzzle three MIT students who were trying to present
a paper at Defcon on the security holes in the MBTA's RFID-based pay system.
For anyone who gets stuck with an RFID credit card, here's how to make an
wallet, and how to make an
[ 10:26 Aug 21, 2008 | More tech/security ]
Sat, 16 Aug 2008
Last night Joao and I were on IRC helping someone who was learning
to write gimp plug-ins. We got to talking about pixel operations and
how to do them in Python. I offered my arclayer.py as an example of
using pixel regions in gimp, but added that C is a lot faster for
pixel operations. I wondered if reading directly from the tiles
(then writing to a pixel region) might be faster.
But Joao knew a still faster way. As I understand it, one major reason
Python is slow at pixel region operations compared to a C plug-in is
that Python only writes to the region one pixel at a time, while C can
write batches of pixels by row, column, etc. But it turns out you
can grab a whole pixel region into a Python array, manipulate it as
an array then write the whole array back to the region. He thought
this would probably be quite a bit faster than writing to the pixel
region for every pixel.
He showed me how to change the arclayer.py code to use arrays,
and I tried it on a few test layers. Was it faster?
I made a test I knew would take a long time in arclayer,
a line of text about 1500 pixels wide. Tested it in the old arclayer;
it took just over a minute to calculate the arc. Then I tried Joao's
array version: timing with my wristwatch stopwatch, I call it about
1.7 seconds. Wow! That might be faster than the C version.
The updated, fast version (0.3) of arclayer.py is on my
If you just want the trick to using arrays, here it is:
from array import array
[ ... setting up ... ]
# initialize the regions and get their contents into arrays:
# (the trailing dirty/shadow flags are an assumption: a read-only
# source region and a writable destination region)
srcRgn = layer.get_pixel_rgn(0, 0, srcWidth, srcHeight,
                             False, False)
src_pixels = array("B", srcRgn[0:srcWidth, 0:srcHeight])
dstRgn = destDrawable.get_pixel_rgn(0, 0, newWidth, newHeight,
                                    True, True)
# p_size is the number of bytes per pixel
p_size = len(srcRgn[0,0])
dest_pixels = array("B", "\x00" * (newWidth * newHeight * p_size))
[ ... then inside the loop over x and y ... ]
# byte offsets of the source and destination pixels in the flat arrays
src_pos = (x + srcWidth * y) * p_size
dest_pos = (newx + newWidth * newy) * p_size
newval = src_pixels[src_pos: src_pos + p_size]
dest_pixels[dest_pos : dest_pos + p_size] = newval
[ ... when the loop is all finished ... ]
# Copy the whole array back to the pixel region:
dstRgn[0:newWidth, 0:newHeight] = dest_pixels.tostring()
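The index arithmetic can be tried outside of gimp, too. Here's a minimal standalone sketch, assuming a plain row-major byte buffer instead of a real pixel region; flip_horizontal is a made-up example function, not part of arclayer.py. It's written for Python 3, so it uses tobytes() where the plug-in code above uses tostring().

```python
from array import array

def flip_horizontal(src_bytes, width, height, p_size):
    """Mirror a flat, row-major pixel buffer left to right.

    p_size is the number of bytes per pixel (3 for RGB, 4 for RGBA).
    """
    src_pixels = array("B", src_bytes)
    # Destination starts out as all zero bytes, same size as the source.
    dest_pixels = array("B", bytes(width * height * p_size))
    for y in range(height):
        for x in range(width):
            newx = width - 1 - x
            # Byte offset of pixel (x, y) in a flat row-major buffer.
            src_pos = (x + width * y) * p_size
            dest_pos = (newx + width * y) * p_size
            # Copy one whole pixel (p_size bytes) with a slice assignment.
            dest_pixels[dest_pos:dest_pos + p_size] = \
                src_pixels[src_pos:src_pos + p_size]
    return dest_pixels.tobytes()

# A 2x1 RGB "image": the two pixels swap places.
print(list(flip_horizontal(bytes([1, 2, 3, 4, 5, 6]), 2, 1, 3)))
# [4, 5, 6, 1, 2, 3]
```

All the per-pixel work happens on the in-memory array; in the plug-in, the pixel region is only touched twice, once to read and once to write.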
[ 21:02 Aug 16, 2008 | More gimp ]
Mon, 11 Aug 2008
I've been using my pre-released 0.9.6-pre1 version of pho, my image
viewer, for ages now, and it's been working fine. I keep wanting
to release it, but there were a
couple of minor bugs that irritated me and I hadn't had time to
track down. Tonight, I finally got caught up with my backlog and
found myself with a few extra minutes to spare, and fixed the last
two known bugs. Quick, time to release before I discover anything else!
(There were a couple other features I was hoping to implement --
multiple external commands, parsing a .phorc file, and having
Keywords mode read and write the Keywords file itself -- but
none of those is terribly important and they can wait.)
It's only a -pre release, but I'm not going to have a long
protracted set of betas this time. 0.9.6-pre1 is very usable,
and I'm finding Keywords mode to be awfully useful for classifying
my mountain of back photos.
So, pho users, give it a try and let me know if you see any bugs!
It's my hope to release the real 0.9.6 in a week or two, if nobody
finds any monstrous bugs in the meantime.
Get Pho here.
[ 20:30 Aug 11, 2008 | More programming ]
Sat, 09 Aug 2008
programming class at the GetSET
summer technology camp for high school girls. GetSET is a great
program run by the Society of Women Engineers.
It's intended for minority girls from relatively poor neighborhoods,
and the camp is free to the girls (thanks to some great corporate
sponsors). They're selected through a competitive interview process
so they're all amazingly smart and motivated, and it's always rewarding
Teaching programming in one day to people with no programming
background at all is challenging, of course. You can't get into any
of the details you'd like to cover, like style, or debugging
techniques. By the time you get through if-else, for and while loops,
some basic display methods, the usual debugging issues like reading
error messages, and typographical issues like
"Yes, uppercase and lowercase really are different" and "No, sorry,
that's a colon, you need a semicolon", it's a pretty full day and
the students are saturated.
I got drafted as lead presenter several years ago, by default by
virtue of being the only one of the workshop leaders who actually
to rewrite the course to try to make it more fun and visual
(originally it used a lot of form validation exercises), and
starting with last year's class I finally got the chance. I built
up a series of graphics and game exercises (using some of Sara
Falamaki's Hangman code, which seemed perfect since she wrote it
when she was about the same age as the girls in the class) and
it went pretty well. Of course, we had no idea how fast the girls
would go or how much material we could get through, so I tried to
keep it flexible and we adjusted as needed.
Last year went pretty well, and in the time since then we've
exchanged a lot of email about how we could improve it.
We re-ordered some of the exercises, shifted our emphasis in a few
places, factored some of the re-used code (like windowWidth()) into
a library file so the exercise files weren't so long, and moved more of
the visual examples earlier.
I also eliminated a lot of the slides. One of the biggest surprises
last year was the "board work". I had one exercise where the user
clicks in the page, and the student has to write the code to figure
out whether the click was over the image or not. I had been nervous
about that exercise -- I considered it the hardest of the exercises.
You have to take the X and Y coordinates of the mouse click, the X and
Y coordinates of the image (the upper left corner of the <div>
or <img> tag), and the size of the image (assumed to be 200x200),
and work out whether the click falls inside that rectangle.
Not hard once you understand the concepts, but hard to explain, right?
I hadn't made a slide for that, so we went to the whiteboard to draw
out the image, the location of the mouse click, the location of the
image's upper left corner, and figure out the math ...
and the students, who had mostly been sitting passively
through the heavily slide-intensive earlier stuff, came alive. They
understood the diagram, they were able to fill in the blanks and keep
track of mouse click X versus image X, and they didn't even have much
trouble turning that into code they typed into their exercise. Fantastic!
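For the curious, the math from the whiteboard boils down to two range checks. Here's a rough sketch, written in Python rather than the browser scripting the class used; the function and parameter names are invented for this example, and only the 200x200 size comes from the exercise.

```python
def click_is_over_image(click_x, click_y, img_x, img_y,
                        img_w=200, img_h=200):
    """True if the click lands inside the image's rectangle.

    (img_x, img_y) is the image's upper left corner; the image is
    assumed to be 200x200 unless another size is passed in.
    """
    inside_x = img_x <= click_x < img_x + img_w
    inside_y = img_y <= click_y < img_y + img_h
    return inside_x and inside_y

# Image with its upper left corner at (50, 80):
print(click_is_over_image(100, 100, 50, 80))   # True
print(click_is_over_image(300, 100, 50, 80))   # False
```

The click has to pass both checks: being within the image's X range isn't enough if it's above or below the image.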
Remembering that, I tried to use a lot fewer slides this year.
I felt like I still needed to have slides to explain the basic
concepts that they actually needed to use for the exercises -- but
if there was anything I thought they could figure out from context,
or anything that was just background, I cut it. I tried for as few
slides as possible between exercises, and more places where we could
elicit answers from the students. I think we still have too many slides
and not enough "board work" -- but we're definitely making progress,
and this year went a lot better and kept them much better engaged.
We're considering next year doing the first several exercises on the
board first, then letting them type it in to their own copies to
verify that it works.
We did find we needed to leave code examples visible:
after showing slides saying something like "Ex 7:
Write a loop that writes a line of text in each color", I had to
back up to the previous slide where I'd shown what the code actually
Reference" handout for reference and not needing that information
on the slides; but in fact, I think they were confused about the
quickref and most never even opened it. Either that information needs
to be in the handout, or it needs to be displayed on the screen as
they work, or I have to direct them to the quickref page explicitly
("Now turn to page 3 in ...") or put that information in the exercises.
The graphical flower exercises were a big hit this year (I showed them
early and promised we'd get to them, and when we did, just before
lunch, several girls cheered) and, like last year, some of the girls
who finished them earlier decided on their own that they wanted to
change them to use other images, which was also a big hit. Several
other girls decided they wanted more than 10 flowers displayed, and
others hit on the idea of changing the timeout to be a lot shorter,
which made for some very fun displays. Surprisingly, hardly anyone
got into infinite loops and had to kill the browser (always a
like alert() or prompt()).
I still have some issues I haven't solved, like what to do about
semicolons. Should I tell the girls that they're required? (I did that
this year, but it's confusing because then when you get to "if"
statements you have to explain why that's different.) Not mention
them at all? (I'm leaning toward that for next year.)
And it's always a problem figuring out what the fastest girls should
do while waiting for the rest to finish.
This year, in addition to trying to make each exercise shorter,
we tried having the girls work on them in groups of
two or three, so they could help each other. It didn't quite work out
that way -- they all worked on their own copies of the exercises --
but they did seem to collaborate more, and I think that's the best
balance. We also encourage the ones who finish first to help the girls
around them, which mostly they do on their own anyway.
And we really do need to find a better editor we can use on the
Windows lab machines instead of Wordpad. Wordpad's font is too small on
the projection machine, and on the lab machines it's impossible for
most of us to tell the difference between parentheses, brackets and
braces, which leads to lots of time-wasting subtle bugs. Surely
there's something available for Windows that's easy to use,
freely distributable, makes it easy to change the font, and has
parenthesis and brace matching (syntax highlighting would be nice too).
Well, we have a year to look for one now.
All in all, we had a good day and most of the girls gave the class
high marks. Even the ones who concluded "I learned I shouldn't
be a programmer because it takes too much attention to detail"
said they liked the class. And we're fine with that --
not everybody wants to be a programmer, and the point isn't to
force them into any specific track. We're happy if we can give
them an idea of what computer programming is really like ...
then they'll decide for themselves what they want to be.
[ 11:54 Aug 09, 2008 | More education ]
Tue, 05 Aug 2008
The tech press is in a buzz about the new search company,
Cuil (pronounced "cool").
Most people don't like it much, but are using it as an excuse
to rhapsodize about Google and why they took such
a commanding lead in the search market, PageRank and huge
data centers and all those other good things Google has.
Not to run down PageRank or other Google inventions -- Google
does an excellent job at search these days (sometimes spam-SEO sites
get ahead of them, but so far they've always caught up) -- but that's
not how I remember it. Google's victory over other search engines
was a lot simpler and more basic than that. What did they bring?
Most of you have probably forgotten it since we take Google so for
granted now, but back in the bad old days when search engines were
just getting started, they all did it the wrong way. If you searched
red fish, pretty much all the early search engines would
give you all the pages that had either red or fish
anywhere in them. The more words you added, the less likely you
were to find anything that was remotely related to what you wanted.
Google was the first search engine that realized the simple fact
(obvious to all of us who were out there actually doing
searches) that what people want when they search for multiple words
is only the pages that have all the words -- the pages that
have both red and fish. It was the search
engine where it actually made sense to search for more than one word,
the first where you could realistically narrow down your search to
something fairly specific.
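The difference is easy to show with a toy example. This is only a sketch of "any word" versus "all words" matching over three made-up pages, and says nothing about how any real search engine is implemented; the search function and the page names are invented.

```python
# Three made-up pages, keyed by a hypothetical file name.
pages = {
    "a.html": "one fish two fish red fish blue fish",
    "b.html": "a little red wagon",
    "c.html": "fish tacos for dinner",
}

def search(pages, query, require_all=True):
    """Return the names of pages matching the query words.

    require_all=True is the logical AND; False is the old-style OR.
    """
    words = query.lower().split()
    combine = all if require_all else any
    hits = []
    for name, text in pages.items():
        tokens = set(text.lower().split())
        if combine(word in tokens for word in words):
            hits.append(name)
    return sorted(hits)

print(search(pages, "red fish", require_all=False))
# ['a.html', 'b.html', 'c.html']
print(search(pages, "red fish", require_all=True))
# ['a.html']
```

With OR, adding a word can only make the result list longer; with AND, every extra word narrows it.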
Even today, most site searches don't do this right. Try searching for
several keywords on your local college's web site, or on a retail site
that doesn't license Google (or Yahoo or other major search engine)
Logical and. The killer boolean for search engines.
(I should mention that Dave, when he heard this, shook his
head. "No. Google took over because it was the first engine that just
gave you simple text that you could read, without spinning blinking
images and tons of other crap cluttering up the page."
He has a point -- that was certainly
another big improvement Google brought, which hardly anybody else
seems to have realized even now. Commercial sites get more and more
nobody notices that Google, the industry leader, eschews all that crap
and sticks with simplicity. I don't agree that's why they won, but
it would be an excellent reason to stick with Google even if their search
results weren't the best.)
So what about Cuil? I finally got around to trying it this morning,
starting with a little "vanity google" for my name.
The results were fairly reasonable, though oddly slanted toward
TAC, a local astronomy group
in which I was fairly active around ten years ago
(three hits out of the first ten are TAC!)
Dave then started typing colors into Cuil to see what he would get,
and found some disturbing results. He has Firefox's cookie preference
set to "Ask me before setting a cookie" -- and it looks like Cuil loads
pages in the background, setting cookies galore for sites you haven't
ever seen or even asked to see. For every search term he thought of,
Cuil popped up a cookie request dialog while he was still typing.
blu wanted to set a cookie for bluefish.something.
gre wanted to set a cookie for www.gre.ac.uk.
yel wanted to set a cookie for www.myyellow.com.
pra wanted to set a cookie for www.pvamu.edu.
Pretty creepy, especially when combined with Cuil's propensity
(noted by every review I've seen so far, and it's true here too)
for including porn and spam sites. We only noticed this because he
happened to have the "Ask me" pref set. Most people wouldn't even know.
Use Cuil and you may end up with a lot of cookies set from sites
you've never even seen, sites you wouldn't want to be associated
with. Better hope no investigators come crawling through your
browser profile any time soon.
[ 10:10 Aug 05, 2008 | More tech ]
Mon, 04 Aug 2008
No postings for a while -- I was too tied up with getting ready for
OSCON, and now that it's over, too tied up with catching up with
stuff that had gotten behind.
A few notes about OSCON:
It was a good conference -- lots of good speakers, interesting topics
and interesting people. Best talks: anything by Paul Fenwick,
anything by Damian Conway.
The Arduino tutorial was fun too. It's a little embedded processor with a
breadboard and sockets to control arbitrary electronic devices,
all programmed over a USB plug using a Java app.
I'm not a hardware person at all (what do
those resistor color codes mean again?) but even I, even after coming
in late, managed to catch up and build the basic circuits they
demonstrated, including programming them with my laptop. Very cool!
I'm looking forward to playing more with the Arduino when I get a
spare few moments.
The conference's wi-fi network was slow and sometimes flaky (what else is new?)
but they had a nice touch I haven't seen at any other conference:
Wired connections, lots of them, on tables and sofas scattered
around the lounge area (and more in rooms like the speakers' lounge).
The wired net was very fast and very reliable. I'm always surprised
I don't see more wired connections at hotels and conferences, and
it sure came in handy at OSCON.
The AV staff was great, very professional and helpful. I was speaking
first thing Monday morning (ulp!) so I wanted to check the room Sunday
night and make sure my laptop could talk to the projector and so
forth. Everything worked fine.
Portland is a nice place to hold a convention -- the light rail is
great, the convention center is very accessible, and street parking
isn't bad either if you have a car there.
Dave went with me, so it made more sense for us to drive.
The drive was interesting because the central valley was so thick
with smoke from all the fires (including the terrible Paradise fire
that burned for so long, plus a new one that had just started up near
Yosemite) that we couldn't see Mt Shasta when driving right by it.
It didn't get any better until just outside of Sacramento. It must
have been tough for Sacramento valley residents, living in that for
weeks! I hope they've gotten cleared out now.
I finally saw that Redding Sundial bridge I've been hearing so much
about. We got there just before sunset, so we didn't get to check the
sundial, but we did get an impressive deep red smoky sun vanishing
into the gloom.
End of my little blog-break, and time to get back to
scrambling to get caught up on writing and prep for the
school girls. Every year we try to make it more relevant and
less boring, with more thinking and playing and less rote typing.
I think we're making progress, but we'll see how it goes next week.
[ 22:00 Aug 04, 2008 | More conferences ]