Shallow Thoughts : : Mar

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Mon, 29 Mar 2010

Non-greedy regular expressions to clean up crappy autogenerated HTML

I maintain the websites for several clubs. No surprise there -- anyone with even a moderate knowledge of HTML, or just a lack of fear of it, invariably gets drafted into that job in any non-computer club.

In one club, the person in charge of scheduling sends out an elaborate document every three months in various formats -- Word, RTF, Excel, it's different each time. The only regularity is that it's always full of crap that makes it hard to turn it into a nice simple HTML table.

This quarter, the formats were Word and RTF. I used unrtf to turn the RTF version into HTML -- and found a horrifying page full of lines like this:

<body><font size=3></font><font size=3><br>
</font><font size=3></font><b><font size=4></font></b><b><font size=4><table border=2>
</font></b><b><font size=4><tr><td><b><font size=4><font face="Arial">Club Schedule</font></font></b><b><font size=4></font></b><b><font size=4></font></b></td>
<font size=3></font><font size=3><td><font size=3><b><font face="Arial">April 13</font></b></font><font size=3></font><font size=3><br>
</font><font size=3></font><font size=3><b></b></font></td>
The only actual page content in there is "Club Schedule" and "April 13"; the rest is just junk, mostly doing nothing, mostly not even legal HTML, that needs to be eliminated if I want the page to load and display reasonably.

I didn't want to clean up that mess by hand! So I needed some regular expressions to clean it up in an editor. I tried emacs first, but emacs makes it hard to try an expression then modify it a little when the first try doesn't work, so I switched to vim.

The key to this sort of cleanup is non-greedy regular expressions. When you have a bad tag sandwiched in the middle of a line containing other tags, you want to remove everything from the <font through the next > -- but no farther, or else you'll delete real content. If you have a line like

<td><font size=3>Hello<font> world</td>
you only want to delete through the <font>, not through the </td>.

In general, you make a regular expression non-greedy by adding a ? after the wildcard -- e.g. <font.*?>. But that doesn't work in vim. In vim, you have to use \{-M,N}, which matches from M to N repetitions of whatever immediately precedes it, as few as possible. (Without the leading -, \{M,N} is the ordinary greedy version.) You can also use the shortcut \{-} to mean the same thing as *? (0 or more matches, as few as possible) in other programs.

Using that, I built up a series of regexp substitutes to clean up that unrtf mess in vim:

:%s/<\/\{0,1}font.\{-}>//g
:%s/<b><\/b>//g
:%s/<\/b><b>//g
:%s/<\/i><i>//g
:%s/<td><\/td>/<td><br><\/td>/g
:%s/<\/\{0,1}span.\{-}>//g
:%s/<\/\{0,1}center>//g

That took care of 90% of the work, leaving me with hardly any cleanup I needed to do by hand. I'll definitely keep that list around for the next time I need to do this.
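
If you'd rather run the same cleanup as a batch job instead of inside an editor, here's a rough sketch using sed. Two assumptions to flag: the function name clean_unrtf is made up for illustration, and since sed has no non-greedy operator, the sketch swaps in the negated character class [^>]* ("anything except >"), which stops at the first > much as \{-} does in the vim expressions above. It isn't the vim method, just an approximate equivalent.

```shell
#!/bin/sh
# clean_unrtf is a hypothetical helper name, not part of unrtf or vim.
# It reads HTML on stdin and writes the cleaned HTML to stdout.
# sed has no non-greedy "*?"; [^>]* cannot run past the first ">",
# so each match stays inside a single tag.
clean_unrtf() {
    sed -E \
        -e 's/<\/?font[^>]*>//g' \
        -e 's/<b><\/b>//g' \
        -e 's/<\/b><b>//g' \
        -e 's/<\/i><i>//g' \
        -e 's/<td><\/td>/<td><br><\/td>/g' \
        -e 's/<\/?span[^>]*>//g' \
        -e 's/<\/?center>//g'
}
```

Usage would be something like clean_unrtf < schedule.html > schedule-clean.html, where both file names are placeholders.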

[ 23:02 Mar 29, 2010    More linux/editors | permalink to this entry | ]

Sat, 27 Mar 2010

Creating a Linux Live USB stick: Big win for Fedora

Three times now I've gotten myself into a situation where I was trying to install Ubuntu and for some reason couldn't burn a CD. So I thought hey, maybe I can make a bootable USB image on this handy thumb drive here. And spent the next three hours unsuccessfully trying to create one. And finally gave up, got in the car and went to buy a new CD burner or find someone who could burn the ISO to a CD because that's really the only way you can install or run Ubuntu.

There are tons of howtos on the web for creating live USB sticks for Ubuntu. Almost all of them start with "First, download the CD image and burn it to a CD. Now, boot off the CD and ..."

The few that don't instead discuss apps like usb-creator-gtk or unetbootin, which work great if you're burning the current Ubuntu Live CD image from a reasonably current Ubuntu machine, but fail miserably in every other case (wildly pathological cases like burning the current Ubuntu alternate installer CD from the last long-term-support version of Ubuntu. I mean, really, should that be so unusual?)

Tonight, I wanted a bootable USB of Fedora 12. I tried the Ubuntu tools already mentioned, but usb-creator-gtk won't even try with an image that isn't Ubuntu, and unetbootin wrote something but the resulting stick didn't boot.

I asked on the Fedora IRC channel, where a helpful person pointed me to this paragraph on copying an ISO image with dd.

Holy mackerel! One command:

dd if=Fedora-12-i686-Live.iso of=/dev/sdf bs=8M
and in less than ten minutes it was ready. And it booted just fine!
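
One caution: dd will happily overwrite whatever device you name, so it's worth checking the target first. Here's a minimal sketch of a wrapper that does that. The name write_iso is hypothetical, and the mounted-device check assumes a Linux system with /proc/mounts; treat it as a starting point rather than a complete safety net.

```shell
#!/bin/sh
# write_iso is a hypothetical wrapper name; pass the ISO and the target device.
# Example: write_iso Fedora-12-i686-Live.iso /dev/sdf   (run as root)
write_iso() {
    iso=$1
    dev=$2
    if [ ! -f "$iso" ]; then
        echo "write_iso: no such image: $iso" >&2
        return 1
    fi
    # -b is true only for block devices, so a mistyped path is rejected
    # instead of being created as a regular file.
    if [ ! -b "$dev" ]; then
        echo "write_iso: not a block device: $dev" >&2
        return 1
    fi
    # Assumes Linux: refuse if the stick or one of its partitions is mounted.
    if grep -q "^$dev" /proc/mounts 2>/dev/null; then
        echo "write_iso: $dev is mounted; unmount it first" >&2
        return 1
    fi
    # Same dd command as above, plus sync to flush buffers before unplugging.
    dd if="$iso" of="$dev" bs=8M && sync
}
```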

Really, Ubuntu, you should take a look at Fedora now and then. For machines that are new enough, USB boot is much faster and easier than CD burning -- so give people an easy way to get a bootable USB version of your operating system. Or they might give up and try a distro that does make it easy.

[ 23:01 Mar 27, 2010    More linux/install | permalink to this entry | ]

Thu, 25 Mar 2010

Article: Linux Boot Camp (part 1: SysV Init)

My latest article is up on Linux Planet: How Linux Boots: Linux Boot Camp (Part I: SysV Init)

It describes the boot sequence, from grub to kernel loading to init scripts to starting X. Part I covers the classic "SysV Init" model still used to some extent by every distro; part II will cover Upstart, the version that's gradually working its way into some of the newer Linux releases.

[ 15:25 Mar 25, 2010    More writing | permalink to this entry | ]

Tue, 23 Mar 2010

They're teaching you wrong

Overheard at a restaurant tonight:

Mother: So, what did you learn in school today?

Son: (Excited) We learned to do one takeaway one!

Mother: Really? What is one takeaway one?

Son: (with obvious pride) Zero!

Mother: Are you sure?

Son: (slightly flustered) Uh, yeah ... Zero ...?

Mother: One takeaway one is ten. They're teaching you wrong, aren't they?

[ 19:13 Mar 23, 2010    More education | permalink to this entry | ]

Fri, 19 Mar 2010

Firefox printer preferences: the novel

I discovered recently that 1067 lines in my Firefox preferences file (out of 1438 total) were devoted to duplicating default printer settings.

I got a new printer recently. I needed to set up a preference in user.js so I can switch temporarily to landscape mode printing without having landscape mode become permanent. So I checked in on prefs.js to see what Firefox called my new printer -- and, well, eek!

For every printer I've ever used on this machine, I had a set of options that looked like this:

user_pref("print.printer_Epson.print_bgcolor", false);
user_pref("print.printer_Epson.print_bgimages", false);
user_pref("print.printer_Epson.print_colorspace", "default");
user_pref("print.printer_Epson.print_command", "lpr ${MOZ_PRINTER_NAME:+-P\"$MOZ_PRINTER_NAME\"}");
user_pref("print.printer_Epson.print_margin_bottom", "0.500000012107193");
user_pref("print.printer_Epson.print_margin_left", "0.500000012107193");
and so on -- 41 lines, in the case of print.printer_Epson. But some printers had multiple sets of preferences -- here's the list of printer names, each of which had those 41 lines, more or less:

In case you're curious, this encompasses three physical printers I've used with Firefox: my old Epson C86, my new HP F4280, and Dave's Brother HL 2070N. None of these values is anything I've ever set myself; they're all default values.

Why Firefox feels the need to store them for all eternity is anybody's guess.

But wait, you say ... 41 lines times 9 printers is a lot, but it doesn't come close to equalling 1067 lines. What else is there?

Well, there are another 43 lines that repeat all those same defaults again but don't specify any particular printer, like user_pref("print.print_footerleft", "&PT");.

And then, oh wait, what's this? All the preceding prefs are duplicated all over again, with "tmp" added, like this:

user_pref("print.tmp.printerfeatures.Brother_HL-2070N_series.supports_paper_size_change", true);
user_pref("print.tmp.printerfeatures.CUPS/Brother.orientation.count", 2);
and so on. 456 lines of that.

Unfortunately, I got a little over-zealous in deleting lines before I'd made a backup of the original file. So by the time it occurred to me to write this up, I'd destroyed some of the evidence and had to work from a backup, which "only" had 813 lines of print preferences. Part of that is that I didn't have the new printer yet (two entries times 41 lines times two, or 164 lines) but that only gets me up to 977 lines. I'm not sure what the other 90 lines were.

How many printing preferences do you have? You can see them by going to about:config and typing print. Or on Linux, you can count them. First find your profile folder, where your prefs.js file lives, or search for prefs.js directly:

locate prefs.js | grep home

Then use wc on that prefs.js file to count your print preference lines:

grep print yourprofile/prefs.js | wc -l
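
If you want a slightly stricter count, plus a breakdown by printer, here's a small sketch. Both function names are invented for illustration; the only things taken from the examples above are the user_pref("print. prefix and the print.printer_NAME naming pattern, so adjust if your prefs.js looks different.

```shell
#!/bin/sh
# Both function names are hypothetical; pass the path to your prefs.js.

# Count only real print preferences (lines starting with user_pref("print.)
# rather than every line that mentions "print" anywhere.
count_print_prefs() {
    grep -c '^user_pref("print\.' "$1"
}

# Show how many of those lines belong to each print.printer_NAME group.
print_prefs_by_printer() {
    sed -n 's/^user_pref("print\.\(printer_[^.]*\)\..*/\1/p' "$1" | sort | uniq -c
}
```

For example, count_print_prefs yourprofile/prefs.js prints a single number, and print_prefs_by_printer yourprofile/prefs.js prints one count per printer name.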

As to why Firefox uses so many redundant lines in the preference file for settings that have never been changed from the defaults ... well, your guess is as good as mine.

[ 19:59 Mar 19, 2010    More tech/web | permalink to this entry | ]

Sun, 14 Mar 2010

Finally -- Tapered lines in GIMP! How to make them (in 2.7).

[grass brush example] So many times I've wanted a way to make tapered lines in GIMP. It doesn't come up that often, but when it does, it's frustrating that it's so difficult.

For instance, when I was working on the animated brush section of Beginning GIMP, I wanted to make a brush that looked like grass, because that's something I've found quite difficult to do by hand in GIMP. But to make each blade of grass, I ended up drawing a green line of fixed width, zooming way in, then using the lasso selection tool to select and clear the edges of the end of each stroke. What a pain!

[tapered lines in GIMP] Imagine my excitement when I saw GIMP developer Alexia Death talking about how she'd added taper to GIMP's Paint Dynamics in the development version of GIMP. I had to try it.

But I needed some help figuring out how to do it, and I know I'll forget; so I wrote up a tutorial, both for myself and to help anyone else who needs tapered lines.

Alas, this feature is brand new and only works in recent development builds. But if you aren't that current with GIMP, it's something to look forward to. I'll keep this tutorial updated in case methods change.

GIMP Tutorial: Tapered lines

[ 20:26 Mar 14, 2010    More gimp | permalink to this entry | ]

Thu, 11 Mar 2010

Grub2 Tutorial, Part 3

Part 3 and final of my series on configuring Ubuntu's new grub2 boot menu. I translate a couple of commonly-seen error messages, but most of the article is devoted to multi-boot machines. If you have several different operating systems or Linux distros installed on separate disk partitions, grub2 has some unpleasant surprises, so see my article for some (unfortunately very hacky) workarounds for its limitations.

Why use Grub2? Good question!
(Let me note that I didn't write the title, though I don't disagree with it.)

[ 10:56 Mar 11, 2010    More writing | permalink to this entry | ]

Tue, 09 Mar 2010

Making those Fn- laptop keys do something useful

A friend was trying to get some of her laptop's function keys working under Ubuntu, and that reminded me that I'd been meaning to do the same on my Vaio TX 650P.

My brightness keys worked automagically -- I suspected via the scripts in /etc/acpi -- and that was helpful in tracking down the rest of the information I needed. But it still took a bit of fiddling since (surprise!) how this stuff works isn't documented.

Update: That "isn't documented" remark applies to the ACPI system. Matt Zimmerman points out that there is some good documentation on the rest of the key-handling system, and pointed me to two really excellent pages: Hotkeys architecture and Hotkeys Troubleshooting. Recommended reading!

Here's the procedure I found.

First, use acpi_listen to find out what events are generated by the key you care about. Not all keys generate ACPI events. I haven't yet figured out what controls this -- possibly the kernel. When you type the key, you're looking for something like this:

sony/hotkey SPIC 00000001 00000012
You may get separate events for key down and key up. It's your choice as to which one matters.

Once you know the code for your key, it's time to make it do something. Create a new file in /etc/acpi/events -- I called mine sony-lcd-btn. It doesn't matter what you call it -- acpid will read all of them. (Yes, that means every time you start up it's reading all those toshiba and asus files even if you have a Lenovo or Sony. Looks like a nice place to shave off a little boot time.)

The file is very simple and should look something like this:

# /etc/acpi/events/sony-lcd-btn

event=sony/hotkey SPIC 00000001 00000012
action=/etc/acpi/sonylcd.sh

Now create a script for the action you specified in the event file. I created a script /etc/acpi/sonylcd.sh that looks like this:

#! /bin/bash
# temporary, for testing:
echo "LCD button!" >/dev/console

Now restart acpid: service acpid restart if you're on karmic, or /etc/init.d/acpid restart on earlier releases. Press the button. If you're running from the console (or using a tool like xconsole), and you got all the codes right, you should be able to see the echo from your script.

Now you can do anything you want. For instance, when I press the LCD button I generally want to run this:

xrandr --output VGA --mode 1024x768

Or to make it toggle, I could write a slightly smarter script using xrandr --query to find out the current mode and behave accordingly. I'll probably do that at some point when I have a projector handy.
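
A toggle along those lines might look roughly like this. It's an untested-on-a-projector sketch under several assumptions: the output is named VGA as in the command above (some drivers call it VGA1 or VGA-1), xrandr --query prints a line such as "VGA connected 1024x768+0+0 ..." when the output is active and omits the geometry when it's off, and the function names are made up. Since acpid runs scripts outside your X session, you may also need to set DISPLAY (and possibly XAUTHORITY) before xrandr can talk to the server.

```shell
#!/bin/sh
# Function names are hypothetical; "VGA" is the output name from the
# xrandr command above and may differ on your hardware.

# Reads `xrandr --query` output on stdin; succeeds if the VGA output
# currently has an active mode (a WIDTHxHEIGHT+X+Y geometry on its line).
vga_is_active() {
    grep -Eq '^VGA connected( primary)? [0-9]+x[0-9]+\+'
}

# Turn the VGA output off if it's on, otherwise switch it to 1024x768.
toggle_vga() {
    if xrandr --query | vga_is_active; then
        xrandr --output VGA --off
    else
        xrandr --output VGA --mode 1024x768
    fi
}
```

Calling toggle_vga from /etc/acpi/sonylcd.sh in place of the echo would then flip the external display each time the button is pressed.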

[ 17:15 Mar 09, 2010    More linux/kernel | permalink to this entry | ]

Sun, 07 Mar 2010

Recipe: Crockpot Rouladen

I never blog recipes. But while I was making rouladen today, I remembered when I first tried to make it, and discovered that the recipes on the web were all for something entirely different than the delicious rouladen my mom used to make. Mom got the recipe from a German babysitter named Betty who used to take care of me when I was little. It was fantastic and I haven't had anything else like it anywhere, so I asked Mom for the recipe, adapted it a little for my crockpot, and have been enjoying it ever since.

Apologies for the lack of precise quantities. This is how we do recipes in my family, and I'm not great at following precise instructions anyway, and in any case, the recipe originally came from Mom watching Betty make it once.

Crockpot Rouladen

Flank steak - lay it out flat.

Mustard - whatever kind you have lying around. Paint a thin layer onto steak. I personally hate mustard, but it doesn't taste like mustard in the final dish so it's okay.

Bacon - maybe 5 pieces. Cook to not-quite-crisp, to get rid of some of the fat. I cut off some of the fat too, but I'm weird that way. Lay strips on top of mustard.

Bread crumbs - Sprinkle on top of bacon. A little or a lot, as you wish. Enough to leak out when it's rolled, as it thickens the sauce nicely.

Roll steak up and secure with skewers or string. Watch the grain and roll it so that when you slice it, you'll be slicing across the grain. This will seem weird and wrong and you'll want to roll it up the other way because this way you'll end up with a long skinny thing that doesn't fit in the pot. It'll taste just as good either way, but it'll be a lot easier to eat if you roll it up the right way.

Brown steak a bit in small amount of oil, any kind ... maybe use a little of the bacon grease.

Onions, sliced - I don't like onions, so I leave them out.

Tomato sauce - one regular-sized can. Pour over steak. Add a little water too, up to about 1/3 can, if you want more sauce.

Salt, pepper, spices as desired. I add a little cinnamon, to make it taste more like Grecian Chicken (another tomato-sauce recipe where googling gets entirely the wrong result, and if I ever find it I'll be sure to blog it) or like the chicken tikka masala at Bollywood Cafe (which has no resemblance to tikka masala anywhere else, but is wonderful). I usually toss in a couple of bay leaves too, and whatever else I feel like adding that day.

Cook in the crockpot maybe 6.5 hours on high, longer on low. Also works fine simmering in a pan on the stove -- check it about 2.5 hours but expect it to take 3 or so. It doesn't hurt to baste occasionally, or add water if it starts to look dry (in the crockpot that usually isn't needed).

In the last hour or two, toss in:

Raisins - maybe a double handful (a couple small boxes).

When it's done, it should be falling-apart tender.

Serving: Cut small rounds, ladle sauce over them, and serve with noodles or bread.

Enjoy!

[ 11:56 Mar 07, 2010    More recipes | permalink to this entry | ]

Fri, 05 Mar 2010

Adding video to an OpenOffice Impress presentation

(and how to convert MPEG video to animated GIF)

I gave an Ignite talk this week at Ignite Silicon Valley. It was a great event! Lots of entertaining talks about all sorts of topics.

I'd always wanted to do an Ignite speech. I always suspected the kicker would be the format: O'Reilly's guidelines specified PowerPoint format.

Of course, as a Linux user, my only option for creating PowerPoint slides is OpenOffice. Historically, OpenOffice and I haven't gotten along very well, and this slide show was no exception. Happily, Ignite needs only 20 slides ... how hard can that be, right? Most of my slides were very simple (a few words, or one picture), with one exception: I had one simulation I wanted to show as a video. (When I give this presentation on my own machine, I run the simulation live, but that's not an option on someone else's machine.)

Impress woes

First I wrestled with Open Office to create the non-animated slides. It was harder than I'd expected. I just loved having to go back and un-capitalize words that OO kept helpfully re-capitalizing for me. And the way it wouldn't let me change text format on any word that triggered the spellchecker, because it needed to show me the spellcheck context menu instead. And the guessing game clicking around trying to find a place where OO would let me drag to move the text to somewhere where it was approximately centered.

And when I finally thought I had everything, I saved as .ppt, re-loaded and discovered that it had lost all my formatting, so instead of yellow 96 point centered text I had white 14-point left-aligned, and I had to go in and select the text on each slide and change three or four properties on each one.

And I couldn't use it for an actual presentation. In slideshow mode, it only showed the first slide about one time out of six. The other times, it showed a blank slide for the first 15 seconds before auto-advancing to the second one. The auto-advance timing was off anyway (see below). Fortunately, I didn't need to use OpenOffice for this presentation; I only needed it to create the PPT file. I ended up making a separate version of the slides in HTML to practice with.

Inserting a movie

But I did eventually have all my static slides ready. It was time to insert my movie, which I had converted to MPEG1 on the theory that it works everywhere. With the mpeg added, I saved one copy to OpenOffice's native format of .odp, plus the .ppt copy I would need for the actual presentation.

Then I quit and opened the .ppt -- and the video slide was blank. A bit of searching revealed that this was a long-known issue, bug 90272, but there seems to be no interest in fixing it. So I was out of luck if I wanted to attach an MPEG, unless I could find someone with a real copy of PowerPoint.

Plan B: Animated GIF

Next idea: convert my 15-second video to an animated GIF. But how to do that? Google found me quite a few web pages that claimed to give the recipe, but they all led to the same error message: ERROR: gif only handles the rgb24 pixel format. Use -pix_fmt rgb24.

So what? Just add -pix_fmt rgb24 to the commandline, right? But the trick turns out to be where to add it, since ffmpeg turns out to be highly picky about its argument order. Here's the working formula to convert a movie to animated GIF:

$ ffmpeg -i foo.mpeg -pix_fmt rgb24 foo.gif

This produced a huge file, though, and it didn't really need to be 1024x768, so I scaled it down with ImageMagick:

convert -depth 8 -scale 800x600 flock-mpeg.gif flock-mpeg-800.gif
which brought the file size from 278M down to a much more reasonable 1.9M.

Happily, OpenOffice does seem to be able to import and save animated GIFs, even to .ppt format. It has trouble displaying them -- that's bug 90272 -- so you wouldn't want to use this format for a presentation you were actually going to give in OpenOffice. But as I mentioned, OpenOffice was already out for that.

If you do this, make sure all your static slides are finished first. Once I loaded the animated GIF, OpenOffice slowed to a crawl and it was hard to do anything at all. Moving text on a slide turned into an ordeal of "hover the mouse where you think a move cursor might show up, and wait 45 seconds ... cursor change? No? Okay, move a few pixels and wait again." Nothing happened in real time. A single mouse click wouldn't register for 30 seconds or more. And this was on my fast dual-core desktop with 4G RAM; I don't even want to think what it would be like on my laptop. I don't know if OOo is running the animations continuously, or what -- but be sure you have everything else finished before you load any animations.

The moment of truth

I never found out whether my presentation worked in real Microsoft PowerPoint. As it turned out, at the real event, the display machine was a Mac running Keynote. Keynote was able to import the .ppt from OpenOffice, and to display the animation. Whew!

One curiosity about the display: the 15 seconds per slide auto-advance failed on the animated slide. The slide showed for 30 seconds rather than 15. I had written this off as another OpenOffice bug, so I wasn't prepared when Keynote did the same thing in the live presentation, and I had to extemporize for 15 seconds.

My theory, thinking about it afterward, is that the presentation programs don't start the counter until the animation has finished playing. So for an Ignite presentation, you might need to set the animation to play for exactly 15 seconds, then set that slide to advance after 0 seconds. If that's even possible.

Or just use HTML. The great irony of this whole story is that some of the other presenters used their own laptops, so I probably could have used my HTML version (which had none of these problems) had I asked. I will definitely remember that for the next Ignite! Meanwhile, I suppose it's good for me to try OO Impress every few years and remind myself why I avoid it the rest of the time.

[ 16:36 Mar 05, 2010    More speaking | permalink to this entry | ]