Shallow Thoughts

Akkana's Musings on Open Source, Science, and Nature.

Tue, 29 May 2007

A Culture of Regressions (or, Why I no longer work on Mozilla)

A couple of friends periodically pester me to write about why I stopped contributing to Mozilla after so many years with the project. I've held back, feeling like it's airing dirty laundry in public.

But a discussion on over the last week, started by Nelson Bolyard, aired it for me: it was their culture of regressions.

I love Mozilla technology. I'm glad it exists, and I still use it for my everyday browsing. But trying to contribute to Mozilla just got too frustrating. I spent more time chasing down and trying to fix other people's breakages than I did working on anything I wanted to work on.

That might be okay, barely, when you're getting paid for it. But when you're volunteering your own time, who wants to spend it fixing code checked in by some other programmer who just can't be bothered to clean up his own mess?

It's the difference between spending a day cleaning your own house ... and spending every day cleaning other people's houses.

Nelson said it eloquently in this exchange:

(Robert Kaiser writes)
As we are open source, everyone can access and test that code, and find and file the regressions, so that they get fixed over time.

(Boris Zbarsky writes)
That last conclusion doesn't necessarily follow. To get them fixed you need someone fixing them.

(Nelson Bolyard writes)
We're very unlikely to get volunteers to spent large amounts of effort, rewriting formerly working code to get it to work again, after it was broken by someone else's checkin. This demotivates developers and drives them away. They think "why should I keep working on this when others can break my code and I must pay for their mistakes?" and "I worked hard to get that working, and now person X has broken it. Let HIM fix it."

This was exactly how I felt, and it's the reason I quit working on Mozilla.

A little later in the thread, Boris Zbarsky reports that the trunk has been so broken with regressions that it's been unusable for him for weeks or months. (When you have someone as committed and sharp as Boris unable to use your software, you know there's something wrong with your project's culture.) He writes: "For example, on my machine (Linux) about one in three SVG testcases in Bugzilla causes trunk Gecko to hang X ..."

Justin Dolske replies, "Oh, Linux," and asks if it's related to turning on Cairo. Boris replies affirmatively. Just another example where a change was checked in that caused serious regressions keeping at least one important contributor from using the browser on a regular basis; yet it's still there and hasn't been backed out. Of course, it's "only Linux".

David Baron appears to take Nelson's concerns seriously, and suggests criteria for closing the tree and making everyone stop work to track down regressions. As he correctly comments, closing the tree is very serious and inefficient, and should be avoided in all but the most serious cases.

But Nelson repeats the real question:

(Nelson Bolyard writes)
Under what circumstances does a Sheriff back out a patch due to functional regressions? From what you wrote above, I gather it's "never". :(

Alas, the thread peters out after that; there's no reply to Nelson's question.

The problem with Mozilla isn't that there are regressions. Mistakes happen. The problem is that regressions never get fixed, because the project's culture encourages regressions. The prevailing attitude is that it's okay to check in changes that break other people's features, as long as your new feature is cool enough or the right people want it. If you break something, well, hey, someone will figure out a fix eventually. Or not. Either way, it's not your problem.

Working on new features is fun, and so is getting the credit for being the one to check them in. Fixing bugs, writing API documentation, extensive testing -- these things aren't fun, they're hard work, and there isn't much glory in them either (you don't get much appreciation or credit for it). So why do them if you don't have to? Let someone else worry about it, as long as the project lets you get away with it!

A project with a culture of responsibility would say that the person who broke something should fix it, and that broken stuff should stay out of the tree. If programmers don't do that themselves just because it's the right thing to do, the project could enforce it: just insist that regression-causing changes that can't be fixed right away be backed out. Fix the regressions out of the tree where they aren't causing problems for other people. Get help from people to test it and to integrate it with those other modules you forgot about the first time around.

Yes, even if it's a change that's needed -- even if it's something a lot of people want. If it's a good change, there will always be time to check it in later.

When it's really working.

[ 10:07 May 29, 2007    More programming | permalink to this entry ]

Sun, 27 May 2007

A Kitfox Extension

For a bit over a year I've been running a patched version of Firefox, which I call Kitfox, as my main browser. I patch it because there are a few really important features that the old Mozilla suite had which Firefox removed; for a long time this kept me from using Firefox (and I'm not the only one who feels that way), but when the Mozilla Foundation stopped supporting the suite and made Firefox the only supported option, I knew my only choice was to make Firefox do what I needed. The patches were pretty simple, but they meant that I've been building my own Firefox all this time.

Since all my changes were in JavaScript code, not C++, I knew this was probably all achievable with a Firefox extension. But never around to it; building the Mozilla source isn't that big a deal to me. I did it as part of my job for quite a few years, and my desktop machine is fast enough that it doesn't take that long to update and rebuild, then copy the result to my laptop.

But when I installed the latest Debian, "Etch", on the laptop, things got more complicated. It turns out Etch is about a year behind in its libraries. Programs built on any other system won't run on Etch. So I'd either have to build Mozilla on my laptop (a daunting prospect, with builds probably in the 4-hour range) or keep another system around for the purpose of building software for Etch. Not worth it. It was time to learn to build an extension.

There are an amazing number of good tutorials on the web for writing Firefox extensions (I won't even bother to link to any; just google firefox extension and make your own choices). They're all organized as step by step examples with sample code. That's great (my favorite type of tutorial) but it left my real question unanswered: what can you do in an extension? The tutorial examples all do simple things like add a new menu or toolbar button. None of them override existing Javascript, as I needed to do.

Canonical URL to the rescue. It's an extension that overrides one of the very behaviors I wanted to override: that of adding "www." to the beginning and ".com" or ".org" to the end of whatever's in the URLbar when you ctrl-click. (The Mozilla suite behaved much more usefully: ctrl-click opened the URL in a new tab, just like ctrl-clicking on a link. You never need to add www. and .com or .org explicitly because the URL loading code will do that for you if the initial name doesn't resolve by itself.) Canonical URL showed me that all you need to do is make an overlay containing your new version of the JavaScript method you want to override. Easy!

So now I have a tiny Kitfox extension that I can use on the laptop or anywhere else. Whee!

Since extensions are kind of a pain to unpack, I also made a source tarball which includes a simple Makefile: kitfox-0.1.tar.gz.

Tags: , , , ,
[ 10:59 May 27, 2007    More tech/web | permalink to this entry ]

Tue, 15 May 2007

Etch: Blacklisting Modules, Udev, and Firewire Networking

The new Debian Etch installation on my laptop was working pretty well. But it had one weirdness: the ethernet card was on eth1, not eth0. ifconfig -a revealed that eth0 was ... something else, with no IP address configured and a really long MAC address. What was it?

Poking around dmesg revealed that it was related to the IEEE 1394 and the eth1394 module. It was firewire networking.

This laptop, being a Vaio, does have a built-in firewire interface (Sony calls it i.Link). The Etch installer, when it detected no network present, had noted that it was "possible, though unlikely" that I might want to use firewire instead, and asked whether to enable it. I declined.

Yet the installed system ended up with firewire networking not only installed, but taking the first network slot, ahead of any network cards. It didn't get in the way of functionality, but it was annoying and clutters the output whenever I type ifconfig -a. Probably took up a little extra boot time and system resources, too. I wanted it gone.

Easier said than done, as it turns out.

I could see two possible approaches.

  1. Figure out who was setting it to eth1, and tell it to ignore the device instead.
  2. Blacklist the kernel module, so it couldn't load at all.

I begain with approach 1. The obvious culprit, of course, was udev. (I had already ruled out hal, by removing it, rebooting and observing that the bogus eth0 was still there.) Poking around /etc/udev/rules.d revealed the file where the naming was happening: z25_persistent-net.rules.

It looks like all you have to do is comment out the two lines for the firewire device in that file. Don't believe it. Upon reboot, udev sees the firewire devices and says "Oops! persistent-net.rules doesn't have a rule for this device. I'd better add one!" and you end up with both your commented-out line, plus a brand new uncommented line. No help.

Where is that controlled? From another file, z45_persistent-net-generator.rules. So all you have to do is edit that file and comment out the lines, right?

Well, no. The firewire lines in that file merely tell udev how to add a comment when it updates z25_persistent-net.rules. It still updates the file, it just doesn't comment it as clearly.

There are some lines in z45_persistent-net-generator.rules whose comments say they're disabling particular devices, by adding a rule GOTO="persistent_net_generator_end". But adding that in the firewire device lines caused the boot process to hang. There may be a way to ignore a device from this file, but I haven't found it, nor any documentation on how this system works.

Defeated, I switched to approach 2: prevent the module from loading at all. I never expect to use firewire networking, so it's no loss. And indeed, there are lots of other modules loaded I'd like to blacklist, since they represent hardware this machine doesn't have. So it would be nice to learn how.

I had a vague memory of there having been a file with a name like /etc/modules.blacklist some time back in the Pliocene. But apparently no such file exists any more. I did find /etc/modprobe.d/blacklist, which looked promising; but the comment at the beginning of that file says

# This file lists modules which will not be loaded as the result of
# alias expsnsion, with the purpose of preventing the hotplug subsystem
# to load them. It does not affect autoloading of modules by the kernel.
Okay, sounds like this file isn't what I wanted. (And ... hotplug? I thought that was long gone, replaced by udev scripts.) I tried it anyway. Sure enough, not what I wanted.

I fiddled with several other approaches before Debian diva Erinn Clark found this helpful page. I created a file called /etc/modprobe.d/00local and added this line to it:

install eth1394 /bin/true
and on the next boot, the module was no longer loaded, and no longer showed up as a bogus ethernet device. Hurray!

This /etc/modprobe.d/00local technique probably doesn't bear examining too closely. It has "hack" written all over it. But if that's the only way to blacklist problematic modules, I guess it's better than nothing.

Tags: , , ,
[ 18:10 May 15, 2007    More linux | permalink to this entry ]

Debian "Etch": a Sketch

Since I'd already tried the latest Ubuntu on my desktop, I wanted to check out Debian's latest, "Etch", on my laptop.

The installer was the same as always, and the same as the Ubuntu installer. No surprises, although I do like the way Debian gives me a choice of system types to install (Basic desktop, Web server, etc. ... though why isn't "Development" an option?) compared to Ubuntu's "take the packages we give you and deal with it later" approach.

Otherwise, the install went very much like a typical Ubuntu install. I followed the usual procedures and workarounds so as not to overwrite the existing grub, to get around the Vaio hardware issues, etc. No big deal, and the install went smoothly.

The good

But the real surprise came on booting into the new system. Background: my Vaio SR-17 has a quirk (which regular readers will have heard about already): it has one PCMCIA slot, which is needed for either the external CDROM drive or a network card. This means that at any one time, you can have a network, or a CDROM, but not both. This tends to throw Debian-based installers into a tizzy -- you have to go through five or more screens (including timing out on DHCP even after you've told it that you have no network card) to persuade the installer that yes, you really don't have a network and it's okay to continue anyway.

That means that the first step after rebooting into the new system is always configuring the network card. In Ubuntu installs, this typically means either fiddling endlessly with entries in the System or Admin menus, or editing /etc/network/interfaces.

Anticipating a vi session, I booted into my new Etch and inserted the network card (a 3COM 3c59x which often confuses Ubuntu). Immediately, something began spinning in the upper taskbar. Curious, I waited, and in ten seconds or so a popup appeared informing me "You are now connected to the wired net."

And indeed I was! The network worked fine. Kudos to debian -- Etch is the first distro which has ever handled this automatically. (I still need to edit /etc/network/interfaces to set my static IP address -- network manager

Of course, since this was my laptop, the next most important feature is power management. Happily, both sleep and hibernate worked correctly, once I installed the hibernate package. That had been my biggest worry: Ubuntu was an early pioneer in getting ACPI and power management code working properly, but it looks like Debian has caught up.

The bad

I did see a couple of minor glitches.

First, I got a lot of system hangs in X. These turned out to be the usual dri problem on S3 video cards. It's a well known bug, and I wish distros would fix it!

I've also gotten at least one kernel OOPS, but I have a theory about what might be causing that. Time will tell whether it's a real problem.

It took a little googling to figure out the line I needed to add to /etc/apt/sources.list in order to install programs that weren't included on the CD. (Etch automatically adds lines for security updates, but not for getting new software). But fortunately, lots of other people have already asked this in a variety of forums. The answer is:

deb etch main contrib non-free

My husband had suggested that Etch might be lighter weight than Ubuntu and less dependent on hal (which I always remove from my laptop, because its constant hardware polling makes noise and sucks power). But no: Etch installed hal, and any attempt to uninstall it takes with it the whole gnome desktop environment, plus network-manager (that's apparently that nice app that noticed my network card earlier) and rhythmbox. I don't actually use the gnome desktop or these other programs, but it would be nice to have the option of trying them when I want to check something out. So for now I've resorted to the temporary solution: mv /usr/sbin/hald /usr/sbin/hald-not

The ugly

Etch looks fairly nice, and I'm looking forward to exploring it. I'm mostly kidding about the "ugly". I did hit one minor bit of ugliness involving network devices which led me on a two-hour chase ... but I'll save that for its own article.

Tags: , ,
[ 13:29 May 15, 2007    More linux | permalink to this entry ]

Sun, 13 May 2007

Feisty Fawn Versus Apache

In the last installment, I got the Visor driver working. My sitescooper process also requires that I have a local web server (long story), so I needed Apache. It was already there and running (curiously, Apache 1.3.34, not Apache 2), and it was no problem to point the DocumentRoot to the right place.

But when I tested my local site, I discovered that although I could see the text on my website, I couldn't see any of the images. Furthermore, if I right-clicked on any of those images and tried "View image", the link was pointing to the right place (http://localhost/images/foo.jpg). The file (/path/to/mysite/images/foo.jpg) existed with all the right permissions. What was going on?

/var/log/apache/error.log gave me the clue. When I was trying to view http://localhost/images/foo.jpg, apache was throwing this error:

 [error] [client] File does not exist: /usr/share/images/foo.jpg
/usr/share/images? Huh?

Searching for usr/share/images in /etc/apache/httpd.conf gave the answer. It turns out that Ubuntu, in their infinite wisdom, has decided that no one would ever want a directory called images in their webspace. Instead, they set up an alias so that any reference to /images gets redirected to /usr/share/images.


Anyway, the solution is to comment out that stanza of httpd.conf:

<IfModule mod_alias.c>
#    Alias /icons/ /usr/share/apache/icons/
#    <Directory /usr/share/apache/icons>
#         Options Indexes MultiViews
#         AllowOverride None
#         Order allow,deny
#         Allow from all
#    </Directory>
#    Alias /images/ /usr/share/images/
#    <Directory /usr/share/images>
#         Options MultiViews
#         AllowOverride None
#         Order allow,deny
#         Allow from all
#    </Directory>

I suppose it's nice that they provided an example for how to use mod_alias. But at the cost of breaking any site that has directories named /images or /icons? Is it just me, or is that a bit crazy?

Tags: , ,
[ 21:55 May 13, 2007    More linux | permalink to this entry ]

Feisty Fawn: The Adventure Continues, with the Visor Driver

When we left off, I had just found a workaround for my Feisty Fawn installer problems and had gotten the system up and running.

By now, it was late in the day, time for my daily Sitescooper run to grab some news to read on my Treo PDA. The process starts with making a backup (pilot-xfer -s). But pilot-xfer failed because it couldn't find the device, /dev/ttyUSB1. The system was seeing the device connection -- dmesg said

[ 1424.598770] usb 5-2.3: new full speed USB device using ehci_hcd and address 4
[ 1424.690951] usb 5-2.3: configuration #1 chosen from 1 choice
"configuration #1"? What does that mean? I poked around /etc/udev a bit and found this rule in rules.d/60-symlinks.rules:
# Create /dev/pilot symlink for Palm Pilots
KERNEL=="ttyUSB*", ATTRS{product}=="Palm Handheld*|Handspring *|palmOne Handheld", \
Oh, maybe they were calling it /dev/pilot1? But no, there was nothing matching /dev/*pilot*, just as there was nothing matching /dev/ttyUSB*.

But this time googling led me right to the bug, bug 108512. Turns out that for some reason (which no one has investigated yet), feisty doesn't autoload the visor module when you plug in a USB palm device the way other distros always have. The temporary workaround is sudo modprobe visor; the long-term workaround is to add visor to /etc/modules.

On the subject of Feisty's USB support, though, I do have some good news to report.

My biggest motivation for upgrading from edgy was because USB2 had stopped working a few months ago -- bug 54419. I hoped that the newer kernel in Feisty might fix the problem.

So once I had the system up and running, I plugged my trusty hated-by-edgy MP3 player into the USB2 hub, and checked dmesg. It wasn't working -- but the error message was actually useful. Rather than obscure complaints like end_request: I/O error, dev sde, sector 2033440 or device descriptor read/64, error -110 or 3:0:0:0: rejecting I/O to dead device it had a message (which I've since lost) about "insufficient power". Now that's something I might be able to do something about!

So I dug into my bag o' cables and found a PS/2 power adaptor that fit my USB2 hub, plugged it in, plugged the MP3 player into the hub, and voila! it was talking on USB2 again.

Tags: , , , , ,
[ 20:10 May 13, 2007    More linux | permalink to this entry ]

Sat, 12 May 2007

Installing "Feisty Fawn"

I finally found some time to try the latest Ubuntu, "Feisty Fawn", on my desktop machine.

I used a xubuntu alternate installer disk, since I don't need the gnome desktop, and haven't had much luck booting from the Ubuntu live CDs lately. (I should try the latest, but my husband had already downloaded and burned the alternate disk and I couldn't work up the incentive to download another disk image.

The early portions of the install were typical ubuntu installer: choose a few language options, choose manual disk partitioning, spend forever up- and down-arrowing through the partitioner trying to persuade it not to automount every partition on the disk (after about the sixth time through I gave up and just let it mount the partitions; I'll edit /etc/fstab later) then begin the install.

Cannot find /lib/modules/2.6.20-15-generic
update-initramfs: failed for /boot/initrd.img-2.6.0-15-generic

Couldn't install grub, and warning direly, "This is a fatal error".

But then popcorn on #linuxchix found Ubuntu bug 37527. Turns out the problem is due to using an existing /boot partition, which has other kernels installed. Basically, Ubuntu's new installer can't handle this properly. The workaround is to go through the installer without a separate /boot partition, let it install its kernels to /boot on the root partition (but don't let it install grub, even though it's fairly insistent about it), then reboot into an old distro and copy the files from the newly-installed feisty partition to the real /boot. That worked fine.

The rest of the installation was smooth, and soon I was up and running. I made some of my usual tweaks (uninstall gdm, install tcsh, add myself to /etc/password with my preferred UID, install fvwm and xloadimage, install build-essentials and the zillion other packages needed to compile anything useful) and I had a desktop.

Of course, the adventure wasn't over. There was more fun yet to come! But I'll post about that separately.

Tags: ,
[ 19:36 May 12, 2007    More linux | permalink to this entry ]

Fri, 11 May 2007

Great Deals on Brush Mouse

The previous entry covered springtime butterflies, but it's springtime in the back yard, too.

Notch (our longtime resident squirrel) is heavily pregnant. It's not slowing her down much -- she still leaps and climbs gracefully -- but apparently raging hormones in a pregnant squirrel create a desperate need to bury walnuts. She's here all day long, demanding one walnut after another. She isn't very interested in eating, only burying.

We play games. Today I handed her a walnut then raised it while she was still holding it; she hung on for a few seconds, then pulled her hind legs up, did a backflip, landed on her forelegs and scampered off, to reappear a few minutes later wanting another one.

Ringtail the fox squirrel is still with us, as is a young male Eastern grey (perhaps the father of Notch's brood?) and the most recent arrival, a male fox squirrel. But in addition, we have a new visitor we've only seen a few times: a mouse, larger than a house mouse but smaller than a black rat. It's apparently some kind of native mouse. (Good! That's much more interesting, plus it means it's far less likely to want to move inside the house. Wildlife is great fun outdoors, less fun when they want to move in with you.)

So what kind of mouse is it? Hey, no problem -- there are only thirty or forty species of native mouse in my mammals field guide! Okay, so identifying a mouse that you only see for a few seconds at a time isn't terribly easy. But one caught my eye pretty early on: the brush mouse with its long ears and habit of moving by jumping, like our mouse. I don't know for sure that this is a brush mouse, but it seems like a reasonable first guess.

When I google for "brush mouse", the links aren't that useful, but the ads are intriguing. Google presents two sponsored ads. One is a colored ad at the top of the page for a Mouse Brush, from I know someone who keeps mice -- I'll have to ask her if she has a Mouse Brush. I thought they normally kept themselves clean pretty well without needing to be brushed, but you never know, maybe those fancy longhaired mice need some help.

The second ad was over on the right and was even more interesting. It said:

Brush Mouse
Great deals on Brush Mouse
Shop on eBay and Save!

That's a relief -- if anything happens to our brush mouse, now we know where we can get a new one!

It's just amazing the sorts of things you can find on ebay.

Tags: ,
[ 20:53 May 11, 2007    More nature | permalink to this entry ]

Spring Butterfly Madness

It's spring, and butterflies are everywhere in the local parks. If you like butterflies and live in Northern California (or anywhere with a similar climate), get yourself out this weekend ot check out the action! There are a few northern checkerspots, tiger swallowtails and others flitting about, but the real partiers are the variable checkerspots.

[Variable checkerspot butterflies on yerba santa] At Stevens Creek, they're clustered in huge numbers on the pale blue-violet flowers of yerba santa. Some yerba santa bushes are completely covered with butterflies. Others aren't: a closer look shows that those bushes have flowers pointing down, rather than up. Maybe once a flower is pollinated and its nectar gone, it sags?

[Variable checkerspot butterflies on buckeye flowers] On the other side of the road, at Piccheti Ranch, yerba santa isn't so common, and the checkerspots gather on the last of the clusters of buckeye flowers.

And one more checkerspot-on-yerba-santa picture, just 'cause they're pretty.

[ 20:17 May 11, 2007    More nature | permalink to this entry ]

Sat, 05 May 2007

The Pesky "Unresponsive Script" Dialog

For quite some time, I've been seeing all too frequently the dialog in Firefox which says:
A script on this page may be busy, or it may have stopped responding. You can stop the script now, or continue to see if the script will complete.
[Continue] [Stop script]

Googling found lots of pages offering advice on how to increase the timeout for scripts from the default of 5 seconds to 20 or more (change the preference dom.max_script_run_time in about:config. But that seemed wrong. I was seeing the dialog on lots of pages where other people didn't see it, even on my desktop machine, which, while it isn't the absolute latest and greatest in supercomputing, still is plenty fast for basic web tasks.

The kicker came when I found the latest page that triggers this dialog: Firefox' own cache viewer. Go to about:cache and click on "List Cache Entries" under Disk cache device. After six or seven seconds I got an Unresponsive script dialog every time. So obviously this wasn't a problem with the web sites I was visiting.

Someone on #mozillazine pointed me to Mozillazine's page discussing this dialog, but it's not very useful. For instance, it includes advice like

To determine what script is running too long, open the Error Console and tell it to stop the script. The Error Console should identify the script causing the problem.
Error console? What's that? I have a JavaScript Console, but it doesn't offer any way to stop scripts. No one on #mozillazine seemed to have any idea where I might find this elusive Error console either. Later Update: turns out this is new with Firefox 2.0. I've edited the Mozillazine page to say so. Funny that no one on IRC knew about it.

But there's a long and interesting MozillaZine discussion of the problem in which it's clear that it's often caused by extensions (which the Mozillazine page had also suggested). I checked the suggested list of Problematic extensions, but I didn't see anything that looked likely.

So I backed up my Firefox profile and set to work, disabling my extensions one at a time. First was Adblock, since it appeared in the Problematic list, but removing it didn't help: I still got the Unresponsive script when viewing my cache.

The next try was Media Player Connectivity. Bingo! No more Unresponsive dialog. That was easy.

Media Player Connectivity never worked right for me anyway. It's supposed to help with pages that offer videos not as a simple video link, like movie.mpeg or or whatever, but as an embedded object in the page which insists on a specific browser plug-in (like Apples's QuickTime or Microsoft's Windows Media Player).

Playing these videos in Firefox is a huge pain in the keister -- you have to View Source and crawl through the HTML trying to find the URL for the actual video. Media Player Connectivity is supposed to help by doing the crawl for you and presenting you with video links for any embedded video it finds. But it typically doesn't find anything, and its user interface is so inconsistent and complicated that it's hard to figure out what it's telling you. It also can't follow the playlists and .SMIL files that so many sites use now. So I end up having to crawl through HTML source anyway.

Too bad! Maybe some day someone will find a way to make it easier to view video on Linux Firefox. But at least I seem to have gotten rid of those Unresponsive Script errors. That should make for nicer browsing!

Tags: , , ,
[ 12:07 May 05, 2007    More tech/web | permalink to this entry ]