Shallow Thoughts

Akkana's Musings on Open Source, Science, and Nature.

Sun, 31 Aug 2008

Useful shell pipeline: Who checks in how much?

I wanted to get a list of who'd been contributing the most in a particular open source project. Most projects of any size have a ChangeLog file, in which check-ins have entries like this:
2008-08-26  Jane Hacker  <hacker@domain.org>

        * src/app/print.c: make sure the Portrait and Landscape
        * buttons update according to the current setting.

I wanted to take each entry, save the name of the developer checking in, then eventually count the number of times each name occurs (the number of times that developer checked in) and print them in order from most check-ins to least.

Getting the names is easy: for check-ins in the last 9 years, I just want the lines that start with "200". (Of course, if I wanted earlier check-ins I could make the match more general.)

grep "^200" ChangeLog

But now I want to trim the line so it includes only the contributor's name. A bit of sed geekery can do that: the date is a fixed format (four characters, a dash, two, dash, two, then two spaces, so "^....-..-.. " matches that pattern.

But I want to remove the email address part too (sometimes people use different email addresses when they check in). So I want a sed pattern that will match something at the front (to discard), something in the middle (keep that part) and something at the end (discard).

Here's how to do that in sed:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)<.*$/\1/'
In English, that says: "For each line in the ChangeLog that starts with 200, find a pattern at the beginning consisting of any four characters, a dash, two characters, dash, two characters, dash, and two spaces; then immediately after that, save all characters up to a < symbol; then throw away the < and any characters that follow until the end of the line."

That works pretty well! But it's not quite right: it includes the two spaces after the name as part of the name. In sed, \s matches any space character (like space or tab). So you'd think this should work:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)\s+<.*$/\1/'
\s+ means it will require that at least one and maybe more space characters immediately before the < are also discarded. But it doesn't work. It turns out the reason is that the \(.*\) expression is "greedier" than the \s+: so the saved name expression grabs the first space, leaving only the second to the \s+.

The way around that is to make the name expression specify that it can't end with a space. \S is the term for "anything that's not a space character"; so the expression becomes

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s\+<.*$/\1/'
(the + turned out to need a backslash before it).

We have the list of names! Add a | sort on the end to sort them alphabetically -- that will make sure you get all the "Jane Hacker" lines listed together. But how to count them? The Unix program most frequently invoked after sort is uniq, which gets rid of all the repeated lines. On a hunch, I checked out the man page, man uniq, and found the -c option: "prefix lines by the number of occurrences". Perfect! Then just sort them by the number, from largest to smallest:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s+<.*$/\1/' | sort | uniq -c | sort -rn
And we're done!

Now, this isn't perfect since it doesn't catch "Checking in patch contributed by susan@otherhost.com" attributions -- but those aren't in a standard format in most projects, so they have to be handled by hand.

Disclaimer: Of course, number of check-ins is not a good measure of how important or productive someone is. You can check in a lot of one-line fixes, or you can write an important new module and submit it for someone else to merge in. The point here wasn't to rank developers, but just to get an idea who was checking into the tree and how often.

Well, that ... and an excuse to play with nifty Linux shell pipelines.

Tags: , , , ,
[ 11:12 Aug 31, 2008    More linux | permalink to this entry ]

Thu, 28 Aug 2008

Writing for Linux Planet: Stargazing with KStars

I have an article on Linux Planet! The first of many, I hope. At least the first of a short series on Linux astronomy programs, starting with the one that's easiest to use: KStars. It's oriented toward binocular observing, with suggestions for good targets for beginners.

Viewing the Night Sky with Linux, Part I: KStars

Tags: , ,
[ 21:46 Aug 28, 2008    More writing | permalink to this entry ]

Sat, 23 Aug 2008

Forwarding a message with attachments in Mutt

I've wanted to know forever how to forward a message with all or some of its attachments from mutt.

You can set the variable mime_forward so that when you forward a message, it includes the entire message, headers, attachments and all, as a single attachment. You can't edit this or change it in any way. If you want to trim the original message, or omit one of the attachments, you're out of luck.

I've found two ways to do it.

First: type v to get to the attachments screen. Type t repeatedly to tag all the attachments, including the initial small text/plain attachment (that's the original message body). When they're all tagged, type ;f (forward all tagged attachments). After you fill in the To: prompt, you'll be able to edit the message body, and when you leave the editor, you'll have the attachment list there to edit as you see fit.

If that doesn't work (I haven't tried it on HTML messages), there's a slightly more elaborate procedure: use
Esc-e   resend-message   use the current message as a template for a new one.
This calls up an editor on the current message, including headers. Change the From to your name, the To to your intended recipient, and edit the message body to your heart's content. When you're done, you're sent to the Compose screen, where you can adjust the attachment list and send the message.

Forwarding is pretty clearly not what Esc-E was intended for ... but it does the job and might be a handy trick to know.

Tags: , , ,
[ 22:14 Aug 23, 2008    More linux | permalink to this entry ]

Fri, 22 Aug 2008

College Pays

[Mediun earnings] One of the local community colleges sent out glossy flyers advertising their program, with the tag line "College pays for itself; don't put it off!"

To prove how valuable college can be, they include a helpful table showing the "Mediun earnings" for people with various education levels.

West Valley actually has a decent sciences program, and some other interesting programs like Park Management (ranger training). But I suspect I should stay away from their English and Statistics classes.

Tags: ,
[ 15:47 Aug 22, 2008    More humor | permalink to this entry ]

Thu, 21 Aug 2008

Pond denizens struggle against mud

On a short afternoon hike at Sanborn today, Dave and I decided to go by the tiny koi pond near the visitors' center to see if any newts were left this late into summer. [newt stuck in mud]

What a scene! In the current semi-drought, the pond has become a mud flat, its surface criss-crossed with tracks and squirming with newts and crayfish trying to push themselves out of the sticky mud.

In the few holes where the water was more than a couple inches deep, fish flopped -- several 6-8" long golden koi plus something brown but similarly large. A few of the newts thrashed in the water holes, too, seemingly trying to get clean of the mud that coated them; but most of the newts wriggled across the shallower mud flats, heading nowhere in particular but looking very unhappy. The crayfish seemed most numerous at the dryer edges of the pond, pushing themselves laboriously up out of the mud with their claws and dragging themselves across the mud.

Newts normally migrate, and can go surprisingly long distances (miles) across land, so I think at least some of these newts will survive. The fish, I must assume, are doomed unless someone rescues them. I wonder if the rangers have considered selling the non-native koi to someone who wants them, and replacing them with native fish? Are there any fish native here this far upstream? Penitencia Creek (at Alum Rock) has small fish (up to about 3" long), but it carries more water in dry seasons than any creek near Sanborn.

What about the crayfish? Can crayfish survive long out of water, bury themselves in mud (the ones here didn't seem too happy about that idea) or migrate overland?

I suspect there will be some happy park raccoons tonight.

Tags: , , ,
[ 20:21 Aug 21, 2008    More nature | permalink to this entry ]

Chase "Blink" RFID credit cards

Chase sent us replacement MasterCards out of the blue. They came with a brochure touting their wonderful new Chase's "Blink" feature. So convenient! You just hold the card near a reader and it charges your account, no need for any of that pesky swiping of cards or signing of forms!

Yes, it's RFID (Radio Frequency ID tags), the small low-power radio transmitters also used by Walmart and various other retailers, and in other applications such as company security badges/access cards (and, unfortunately, new passports in quite a few countries).

It seems a little odd to me that Chase's marketing implies that most people would think it's a good thing to have a credit card that can be charged easily without even taking it out of your wallet ... to have a card that can be charged from some distance away without your even knowing about it.

It's apparently easy and cheap to build an RFID credit card skimmer: Bruce Schneier has collected several articles about it, and in a later article he offers several links to articles on how to build your own RFID skimmer.

We called Chase right away and told them we didn't want the "Blink" cards. They said we could keep our old, non-RFID cards and continue to use them, and destroy the new ones. Whew!

Googling for links for this article, I found that we're not the only Chase customers to want to decline Blink.

For anyone wondering how secure this technology is, the recent debacle of the cracked Dutch RFID subway cards gives you an idea (Bruce Schneier, "Dutch RFID Transit Card Hacked"; The Register, "Dutch transit card crippled by multihacks", and a followup where the MBTA, Boston's transit agency, used the courts to muzzle three MIT students who were trying to present a paper at Defcon on the security holes in the MBTA's RFID-based pay system.

For anyone who gets stuck with an RFID credit card, here's how to make an RFID-blocking wallet, and how to make an RFID zapper.

Tags: , ,
[ 10:26 Aug 21, 2008    More tech/security | permalink to this entry ]

Sat, 16 Aug 2008

Fast Pixel Ops in GIMP-Python

Last night Joao and I were on IRC helping someone who was learning to write gimp plug-ins. We got to talking about pixel operations and how to do them in Python. I offered my arclayer.py as an example of using pixel regions in gimp, but added that C is a lot faster for pixel operations. I wondered if reading directly from the tiles (then writing to a pixel region) might be faster.

But Joao knew a still faster way. As I understand it, one major reason Python is slow at pixel region operations compared to a C plug-in is that Python only writes to the region one pixel at a time, while C can write batches of pixels by row, column, etc. But it turns out you can grab a whole pixel region into a Python array, manipulate it as an array then write the whole array back to the region. He thought this would probably be quite a bit faster than writing to the pixel region for every pixel.

He showed me how to change the arclayer.py code to use arrays, and I tried it on a few test layers. Was it faster? I made a test I knew would take a long time in arclayer, a line of text about 1500 pixels wide. Tested it in the old arclayer; it took just over a minute to calculate the arc. Then I tried Joao's array version: timing with my wristwatch stopwatch, I call it about 1.7 seconds. Wow! That might be faster than the C version.

The updated, fast version (0.3) of arclayer.py is on my arclayer page.

If you just want the trick to using arrays, here it is:

from array import array

[ ... setting up ... ]
        # initialize the regions and get their contents into arrays:
        srcRgn = layer.get_pixel_rgn(0, 0, srcWidth, srcHeight,
                                     False, False)
        src_pixels = array("B", srcRgn[0:srcWidth, 0:srcHeight])

        dstRgn = destDrawable.get_pixel_rgn(0, 0, newWidth, newHeight,
                                            True, True)
        p_size = len(srcRgn[0,0])               
        dest_pixels = array("B", "\x00" * (newWidth * newHeight * p_size))

[ ... then inside the loop over x and y ... ]
                        src_pos = (x + srcWidth * y) * p_size
                        dest_pos = (newx + newWidth * newy) * p_size
                        
                        newval = src_pixels[src_pos: src_pos + p_size]
                        dest_pixels[dest_pos : dest_pos + p_size] = newval

[ ... when the loop is all finished ... ]
        # Copy the whole array back to the pixel region:
        dstRgn[0:newWidth, 0:newHeight] = dest_pixels.tostring() 

Good stuff!

Tags: , , ,
[ 21:02 Aug 16, 2008    More gimp | permalink to this entry ]

Mon, 11 Aug 2008

Pho 0.9.6-pre1

I've been using my pre-released 0.9.6-pre1 version of pho, my image viewer, for ages, now, and it's been working fine. I keep wanting to release it, but there were a couple of minor bugs that irritated me and I hadn't had time to track down. Tonight, I finally got caught up with my backlog and found myself with a few extra minutes to spare, and fixed the last two known bugs. Quick, time to release before I discover anything else!

(There were a couple other features I was hoping to implement -- multiple external commands, parsing a .phorc file, and having Keywords mode read and write the Keywords file itself -- but none of those is terribly important and they can wait.)

It's only a -pre release, but I'm not going to have a long protracted set of betas this time. 0.9.6-pre1 is very usable, and I'm finding Keywords mode to be awfully useful for classifying my mountain of back photos.

So, pho users, give it a try and let me know if you see any bugs! It's my hope to release the real 0.9.6 in a week or two, if nobody finds any monstrous bugs in the meantime.

Get Pho here.

[ 20:30 Aug 11, 2008    More programming | permalink to this entry ]

Sat, 09 Aug 2008

GetSET: Teaching Javascript to high school girls

Every summer I volunteer as an instructor for a one-day Javascript programming class at the GetSET summer technology camp for high school girls. GetSET is a great program run by the Society of Women Engineers. it's intended for minority girls from relatively poor neighborhoods, and the camp is free to the girls (thanks to some great corporate sponsors). They're selected through a competitive interview process so they're all amazingly smart and motivated, and it's always rewarding being involved.

Teaching programming in one day to people with no programming background at all is challenging, of course. You can't get into any of the details you'd like to cover, like style, or debugging techniques. By the time you get through if-else, for and while loops, some basic display methods, the usual debugging issues like reading error messages, and typographical issues like "Yes, uppercase and lowercase really are different" and "No, sorry, that's a colon, you need a semicolon", it's a pretty full day and the students are saturated.

I got drafted as lead presenter several years ago, by default by virtue of being the only one of the workshop leaders who actually programs in Javascript. For several years I'd been asking for a chance to rewrite the course to try to make it more fun and visual (originally it used a lot of form validation exercises), and starting with last year's class I finally got the chance. I built up a series of graphics and game exercises (using some of Sara Falamaki's Hangman code, which seemed perfect since she wrote it when she was about the same age as the girls in the class) and it went pretty well. Of course, we had no idea how fast the girls would go or how much material we could get through, so I tried to keep it flexible and we adjusted as needed.

Last year went pretty well, and in the time since then we've exchanged a lot of email about how we could improve it. We re-ordered some of the exercises, shifted our emphasis in a few places, factored some of the re-used code (like windowWidth()) into a library file so the exercise files weren't so long, and moved more of the visual examples earlier.

I also eliminated a lot of the slides. One of the biggest surprises last year was the "board work". I had one exercise where the user clicks in the page, and the student has to write the code to figure out whether the click was over the image or not. I had been nervous about that exercise -- I considered it the hardest of the exercises. You have to take the X and Y coordinates of the mouse click, the X and Y coordinates of the image (the upper left corner of the <div> or <img> tag), and the size of the image (assumed to be 200x200), and turn that all into a couple of lines of working Javascript code. Not hard once you understand the concepts, but hard to explain, right?

I hadn't made a slide for that, so we went to the whiteboard to draw out the image, the location of the mouse click, the location of the image's upper left corner, and figure out the math ... and the students, who had mostly been sitting passively through the heavily slide-intensive earlier stuff, came alive. They understood the diagram, they were able to fill in the blanks and keep track of mouse click X versus image X, and they didn't even have much trouble turning that into code they typed into their exercise. Fantastic!

Remembering that, I tried to use a lot fewer slides this year. I felt like I still needed to have slides to explain the basic concepts that they actually needed to use for the exercises -- but if there was anything I thought they could figure out from context, or anything that was just background, I cut it. I tried for as few slides as possible between exercises, and more places where we could elicit answers from the students. I think we still have too many slides and not enough "board work" -- but we're definitely making progress, and this year went a lot better and kept them much better engaged. We're considering next year doing the first several exercises on the board first, then letting them type it in to their own copies to verify that it works.

We did find we needed to leave code examples visible: after showing slides saying something like "Ex 7: Write a loop that writes a line of text in each color", I had to back up to the previous slide where I'd showed what the code actually looked like. I had planned on their using my "Javascript Quick Reference" handout for reference and not needing that information on the slides; but in fact, I think they were confused about the quickref and most never even opened it. Either that information needs to be in the handout, or it needs to be displayed on the screen as they work, or I have to direct them to the quickref page explicitly ("Now turn to page 3 in ...") or put that information in the exercises.

The graphical flower exercises were a big hit this year (I showed them early and promised we'd get to them, and when we did, just before lunch, several girls cheered) and, like last year, some of the girls who finished them earlier decided on their own that they wanted to change them to use other images, which was also a big hit. Several other girls decided they wanted more than 10 flowers displayed, and others hit on the idea of changing the timeout to be a lot shorter, which made for some very fun displays. Surprisingly, hardly anyone got into infinite loops and had to kill the browser (always a potential problem with javascript, especially when using popups like alert() or prompt()).

I still have some issues I haven't solved, like what to do about semicolons and braces. Javascript is fairly agnostic about them. Should I tell the girls that they're required? (I did that this year, but it's confusing because then when you get to "if" statements you have to explain why that's different.) Not mention them at all? (I'm leaning toward that for next year.)

And it's always a problem figuring out what the fastest girls should do while waiting for the rest to finish. This year, in addition to trying to make each exercise shorter, we tried having the girls work on them in groups of two or three, so they could help each other. It didn't quite work out that way -- they all worked on their own copies of the exercises but they did seem to collaborate more, and I think that's the best balance. We also encourage the ones who finish first to help the girls around them, which mostly they do on their own anyway.

And we really do need to find a better editor we can use on the Windows lab machines instead of Wordpad. Wordpad's font is too small on the projection machine, and on the lab machines it's impossible for most of us to tell the difference between parentheses, brackets and braces, which leads to lots of time-wasting subtle bugs. Surely there's something available for Windows that's easy to use, freely distributable, makes it easy to change the font, and has parenthesis and brace matching (syntax highlighting would be nice too). Well, we have a year to look for one now.

All in all, we had a good day and most of the girls gave the class high marks. Even the ones who concluded "I learned I shouldn't be a programmer because it takes too much attention to detail" said they liked the class. And we're fine with that -- not everybody wants to be a programmer, and the point isn't to force them into any specific track. We're happy if we can give them an idea of what computer programming is really like ... then they'll decide for themselves what they want to be.

Tags: , , ,
[ 11:54 Aug 09, 2008    More education | permalink to this entry ]

Tue, 05 Aug 2008

In Praise of Logical AND. In Censure of Invasive Cookies.

The tech press is in a buzz about the new search company, Cuil (pronounced "cool"). Most people don't like it much, but are using it as an excuse to rhapsodize about Google and why they took such a commanding lead in the search market, PageRank and huge data centers and all those other good things Google has.

Not to run down PageRank or other Google inventions -- Google does an excellent job at search these days (sometimes spam-SEO sites get ahead of them, but so far they've always caught up) -- but that's not how I remember it. Google's victory over other search engines was a lot simpler and more basic than that. What did they bring?

Logical AND.

Most of you have probably forgotten it since we take Google so for granted now, but back in the bad old days when search engines were just getting started, they all did it the wrong way. If you searched for red fish, pretty much all the early search engines would give you all the pages that had either red or fish anywhere in them. The more words you added, the less likely you were to find anything that was remotely related to what you wanted.

Google was the first search engine that realized the simple fact (obvious to all of us who were out there actually doing searches) that what people want when they search for multiple words is only the pages that have all the words -- the pages that have both red and fish. It was the search engine where it actually made sense to search for more than one word, the first where you could realistically narrow down your search to something fairly specific.

Even today, most site searches don't do this right. Try searching for several keywords on your local college's web site, or on a retail site that doesn't license Google (or Yahoo or other major search engine) technology.

Logical and. The killer boolean for search engines.

(I should mention that Dave, when he heard this, shook his head. "No. Google took over because it was the first engine that just gave you simple text that you could read, without spinning blinking images and tons of other crap cluttering up the page." He has a point -- that was certainly another big improvement Google brought, which hardly anybody else seems to have realized even now. Commercial sites get more and more cluttered, and nobody notices that Google, the industry leader, eschews all that crap and sticks with simplicity. I don't agree that's why they won, but it would be an excellent reason to stick with Google even if their search results weren't the best.)

So what about Cuil? I finally got around to trying it this morning, starting with a little "vanity google" for my name. The results were fairly reasonable, though oddly slanted toward TAC, a local astronomy group in which I was fairly active around ten years ago (three hits out of the first ten are TAC!)

Dave then started typing colors into Cuil to see what he would get, and found some disturbing results. He has Firefox' cookie preference set to "Ask me before setting a cookie" -- and it looks like Cuil loads pages in the background, setting cookies galore for sites you haven't ever seen or even asked to see. For every search term he thought of, Cuil popped up a cookie request dialog while he was still typing.

Searching for blu wanted to set a cookie for bluefish.something.
Searching for gre wanted to set a cookie for www.gre.ac.uk.
Searching for yel wanted to set a cookie for www.myyellow.com.
Searching for pra wanted to set a cookie for www.pvamu.edu.

Pretty creepy, especially when combined with Cuil's propensity (noted by every review I've seen so far, and it's true here too) for including porn and spam sites. We only noticed this because he happened to have the "Ask me" pref set. Most people wouldn't even know. Use Cuil and you may end up with a lot of cookies set from sites you've never even seen, sites you wouldn't want to be associated with. Better hope no investigators come crawling through your browser profile any time soon.

Tags: , ,
[ 10:10 Aug 05, 2008    More tech | permalink to this entry ]

Mon, 04 Aug 2008

Back from OSCON

No postings for a while -- I was too tied up with getting ready for OSCON, and now that it's over, too tied up with catching up with stuff that gotten behind.

A few notes about OSCON:

It was a good conference -- lots of good speakers, interesting topics and interesting people. Best talks: anything by Paul Fenwick, anything by Damian Conway.

The Arduino tutorial was fun too. It's a little embedded processor with a breadboard and sockets to control arbitrary electronic devices, all programmed over a USB plug using a Java app. I'm not a hardware person at all (what do those resistor color codes mean again?) but even I, even after coming in late, managed to catch up and build the basic circuits they demonstrated, including programming them with my laptop. Very cool! I'm looking forward to playing more with the Arduino when I get a spare few moments.

The conference's wi-fi network was slow and sometimes flaky (what else is new?) but they had a nice touch I haven't seen at any other conference: Wired connections, lots of them, on tables and sofas scattered around the lounge area (and more in rooms like the speakers' lounge). The wired net was very fast and very reliable. I'm always surprised I don't see more wired connections at hotels and conferences, and it sure came in handy at OSCON.

The AV staff was great, very professional and helpful. I was speaking first thing Monday morning (ulp!) so I wanted to check the room Sunday night and make sure my laptop could talk to the projector and so forth. Everything worked fine.

Portland is a nice place to hold a convention -- the light rail is great, the convention center is very accessible, and street parking isn't bad either if you have a car there.

Dave went with me, so it made more sense for us to drive. The drive was interesting because the central valley was so thick with smoke from all the fires (including the terrible Paradise fire that burned for so long, plus a new one that had just started up near Yosemite) that we couldn't see Mt Shasta when driving right by it. It didn't get any better until just outside of Sacramento. It must have been tough for Sacramento valley residents, living in that for weeks! I hope they've gotten cleared out now.

[Redding Sundial bridge] I finally saw that Redding Sundial bridge I've been hearing so much about. We got there just before sunset, so we didn't get to check the sundial, but we did get an impressive deep red smoky sun vanishing into the gloom. Photos here.

End of my little blog-break, and time to get back to scrambling to get caught up on writing and prep for the GetSET Javascript class for high school girls. Every year we try to make it more relevant and less boring, with more thinking and playing and less rote typing. I think we're making progress, but we'll see how it goes next week.

Tags: , , , , ,
[ 22:00 Aug 04, 2008    More conferences | permalink to this entry ]