Shallow Thoughts : tags : kernel

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 25 Feb 2011

Article: Building kernels for plug computers

This week's Linux Planet article continues the Plug Computer series, with Cross-compiling Custom Kernels for Little Linux Plug Computers.

It covers how to find and install a cross-compiler (sadly, your Linux distro probably doesn't include one), configuring the Linux kernel build, and a few gotchas and things that I've found not to work reliably.

It took me a lot of trial and error to figure out some of this -- there are lots of howtos on the web but a lot of them skip the basic steps, like the syntax for the kernel's CROSS_COMPILE argument -- but you'll need it if you want to enable any unusual drivers like GPIO. So good luck, and have fun!

Tags: , , , ,
[ 12:30 Feb 25, 2011    More tech | permalink to this entry | ]

Sun, 09 May 2010

The new udev in Lucid

Ubuntu's latest release, 10.04 "Lucid Lynx", really seems remarkably solid. It boots much faster than any Ubuntu of the past three years, and has some other nice improvements too.

But like every release, they made some pointless random undocumented changes that broke stuff. The most frustrating has been getting my front-panel flash card reader to work under Lucid's new udev, so I could read SD cards from my camera and PDA.

The SD card slot shows up as /dev/sdb, but unless there's a card plugged in at boot time, there's no /dev/sdb1 that you can actually mount.

hal vs udisks

Prior to Lucid, the "approved" way of creating sdb1 was to let hald-addons-storage poll every USB device every so often, to see if anyone has plugged in a card and if so, check its partition table and create appropriate devices.

That's a lot of polling -- and in any case, hald isn't standard on Lucid, and even when it's installed, it sometimes runs and sometimes doesn't. (I haven't figured out what controls whether it decides to run). Hal isn't even supposed to be needed on Lucid -- it's supposed to use devicekit (renamed to) udisks for that.

Except I guess they couldn't quite figure out how to get udisks working in time, so they patched things together so that on Gnome systems, hald does the same old polling stuff -- and on non Gnome systems, well, maybe it does and maybe it doesn't. And maybe you can't read your camera cards. Oh well!

udev rules

But on systems prior to Lucid there was another way: make a udev rule to create sdb1 through sdb15 every time. I have an older article on setting up udev rules for multicard readers, but none of my old udev rules worked on Lucid.

After many rounds of udevadm info -a -p /block/sdb and udevadm test /block/sdb, service udev restart, and many reboots, I finally found a rule that worked.

Create a /etc/udev/rules.d/71-multicard-reader.rules file containing the following:

# Create all devices for multicard reader:
KERNEL=="sd[b-g]", SUBSYSTEMS=="usb", ATTRS{idVendor}=="1d6b", ATTRS{idProduct}=="0002", OPTIONS+="all_partitions,last_rule"

Replace the 1d6b and 0002 with the vendor and product of your own device, as determined with udevadm info -a -p /block/sdb ... and don't be tempted to use the vendor and device ID you get from lsusb, because those are different.

What didn't work that used to? String matches. Some of them. For example, this worked:

KERNEL=="sd[b-g]", SUBSYSTEMS=="scsi", ATTRS{model}=="*SD*", NAME{all_partitions}="sdcard"
but these didn't:
KERNEL=="sd[b-g]", SUBSYSTEMS=="scsi", ATTRS{model}=="*SD*Reader*", NAME{all_partitions}="sdcard"
KERNEL=="sd[a-g]", SUBSYSTEMS=="scsi", ATTRS{model}=="USB SD Reader   ", NAME{all_partitions}="cardsd"

Update: The first of those two lines does indeed work now, whereas it didn't when I was testing. It's possible that this has something to do with saving hardware states and needing an extra udevadm trigger, as suggested in Alex's Changes in Ubuntu Lucid to udev.

According to udevadm info, the model is "USB SD Reader " (three spaces at the end). But somehow "*SD*" matches this while "*SD*Reader*" and the exact string do not. Go figure.

Numeric order

I'd like to have this rule run earlier, so it runs before /lib/udev/rules.d/60-persistent-storage.rules and could use OPTIONS+="last_rule" to keep the persistent storage rules from firing (they run a lot of unnecessary external programs for each device). But if I rename the rule from 71-multicard-reader.rules to 59-, it doesn't run at all. Why? Shrug. It's not like udevadm test will tell me.

Other things I love (not) about the new udev

Tags: , , , ,
[ 21:51 May 09, 2010    More linux/kernel | permalink to this entry | ]

Thu, 15 Apr 2010

Function keys on an Apple keyboard, on Linux

Quick tip from Dave, passed along to someone else trying to use an Apple keyboard on Linux:

On Linux, for some reason Apple keyboards' function keys don't work by default. Most of them try to run special functions instead, like volume up/down or play/pause.

But you can get normal function keys by talking to the kernel module that drives the keyboard:

echo 2 >  /sys/module/hid_apple/parameters/fnmode

This will only last until shutdown, so put that line in /etc/rc.local or a similar place so it runs every time you boot.

Here's an Ubuntu help page on Apple Keyboards with more information and other tricks.

Tags: , , ,
[ 16:03 Apr 15, 2010    More linux/kernel | permalink to this entry | ]

Mon, 16 Nov 2009

More on kernel options needed for new X servers

A week ago I wrote about my mouse woes and how they were solved by enabling the "Enable modesetting on intel by default" kernel option.

But all was still not right with X. I could boot into a console, start X, and everything was fine -- but once X was running, I couldn't go back to console mode. Switching to a console, e.g. Ctrl-Alt-F2, did nothing except make the mouse cursor disappear, and any attempt to quit X to go back to my login shell put the monitor in sleep mode permanently -- the machine was still up, and I could ssh in or ctrl-alt-Delete to reboot, but nothing else I did would bring my screen back.

It wasn't strictly an Ubuntu problem, though this showed up with Karmic; I have a Gentoo install on another partition and it had the same problem. And I knew it was a kernel problem, because the Ubuntu kernel did let me quit X.

I sought in vain among the kernel's various Graphics settings. You might think that "enable modesetting" would be related to, you know, being unable to change video modes ... but it wasn't. I tried different DRM options and switching framebuffer on and off. Though, oddly, enabling framebuffer didn't actually seem to enable the framebuffer.

Finally I stepped through the Graphics section of make menuconfig comparing my settings with a working kernel, and saw a couple of differences that didn't look at all important: "Select compiled-in fonts" and "VGA 8x16 font". Silly, but what could they hurt? I switched them on and rebuilt.

And on the next boot, I had a framebuffer, and mode switching.

So be warned: those compiled-in fonts are not optional if you want a framebuffer; and you'd better want a framebuffer, because that isn't optional either if you want to be able to get out of X once you start it.

Tags: , , ,
[ 20:22 Nov 16, 2009    More linux/kernel | permalink to this entry | ]

Tue, 10 Nov 2009

Mouse failures with 2.6.31, Karmic and Intel

I've been seeing intermittent mouse failures since upgrading to Ubuntu 9.10 "Karmic". At first, maybe one time out of five I would boot, start X, and find that I couldn't move my mouse pointer. But after building a 2.6.31.4 kernel, things got worse and it happened nearly every time.

It wasn't purely an X problem; if I enabled gpm, the mouse failed in the console as well as in X. And it wasn't hardware, because if I used Ubuntu 9.10's standard kernel, my mouse worked every time.

After much poking around with kernel options, I discovered that if I tunred off the Direct Rendering manager ("Intel 830M, 845G, 852GM, 855GM, 865G (i915 driver)"), my mouse would work. But that wasn't a satisfactory solution; aside from not being able to run Google Earth, it seems that Intel graphics needs DRM even to get reasonable performance redrawing windows. Without it, every desktop switch means watching windows slowly redraw over two or three seconds.

(Aside: why is it that Intel cards with shared CPU memory need DRM to draw basic 2-D windows, when my ancient ATI Radeon cards without shared memory had no such problems?)

But I think I finally have it nailed. In the kernel's Direct Rendering Manager options (under Graphics), the "Intel 830M, 845G, 852GM, 855GM, 865G (i915 driver)" using its "i915 driver" option has a new sub-option: "Enable modesetting on intel by default".

The help says:

CONFIG_DRM_I915_KMS:
Choose this option if you want kernel modesetting enabled by default, and you have a new enough userspace to support this. Running old userspaces with this enabled will cause pain. Note that this causes the driver to bind to PCI devices, which precludes loading things like intelfb.

Sounds optional, right? Sounds like, if I want to build a kernel that will work on both karmic and jaunty, I should leave that off so as not to "cause pain".

But no. It turns out it's actually mandatory on karmic. Without it, there's a race condition where about 80-90% of the time, hal won't see a mouse device at all, so the mouse won't work either in X or even on the console with gpm.

It's sort of the opposite of the "Remove sysfs features which may confuse old userspace tools" in General Setup, where the name implies that it's optional on new distros like Karmic, but in fact, if you leave it on, the kernel won't work reliably.

So be warned when configuring a kernel for brand-new distros. There are some new pitfalls, and options that worked in the past may not work any longer!

Update: see also the followup post for two more non-optional options.

Tags: , , , ,
[ 23:34 Nov 10, 2009    More linux/kernel | permalink to this entry | ]

Mon, 02 Nov 2009

Controlling brightness on a Sony laptop in Linux 2.6.31

My laptop, a Sony Vaio TX650P, badly needed a kernel update. 2.6.28.3 had been running so well that I just hadn't gotten around to changing it. When I finally updated to 2.6.31.5, nearly everything worked, with one exception: Fn-F5 and Fn-F6 no longer adjusted screen brightness.

I found that there were lots of bugs filed on this problem -- kernel.org bug 12816, Ubuntu bug 414810, Fedora bug 519105 and so on. A few of them had terse comments like "Can you try this patched binary release?" but none of them had a source patch, or any hints as to the actual nature of the problem.

But one of them did point me to /sys/class/backlight. In my working 2.6.28.3 kernel, that directory contained a sony subdirectory containing useful files that let me query or change the brightness: echo 1 >/sys/class/backlight/sony/brightness
On my nonworking 2.6.31.5 kernel, I had /sys/class/backlight but there was no sony subdirectory there.

grep SONY .config in the two kernels revealed that my working kernel had SONY_LAPTOP set, while the nonworking one did not. No problem! Just figure out where SONY_LAPTOP is in the configuration (it turns out to be at the very end of "device drivers" under "X86 Platform Specific Device Drivers"), make menuconfig, set SONY_LAPTOP and rebuild ... right?

Well, no. make menuconfig showed me lots of laptop manufacturers in the "Platform Specific" category, but Sony wasn't one of them. Of course, since it didn't show the option it also didn't offer a chance to read the help for the option either, which might have told me what its dependencies were.

Time for a recursive grep of kernel source: grep -r SONY_LAPTOP .
arch/x86/configs/i386_defconfig told me the option did indeed still exist, and where to find it. drivers/platform/x86/Kconfig lists the option itself, and says it depends on INPUT and RFKILL.

RFKILL? A bit more poking around located that one near the end of "Networking support", with the name "RF switch subsystem support". (I love how user-visible names for kernel options only marginally resemble the CONFIG names.) It's apparently intended for "control over RF switches found on many WiFi and Bluetooth cards," something I don't seem to need on this laptop (my WiFi works fine without it) -- except that the kernel for some reason won't let me build the ACPI brightness control without it.

So that's the secret. Go to "Networking support", set "RF switch subsystem support", then back up to "Device drivers", scroll down to the end to find "X86 Platform Specific Device Drivers" and inside it you'll now see "Sony Laptop Extras". Enable that, and the Fn-F5/F6 keys (as well as /sys/class/backlight/sony/brightness) will work again.

Tags: , , ,
[ 10:07 Nov 02, 2009    More linux/kernel | permalink to this entry | ]

Thu, 22 Oct 2009

Article: Building Your Own Linux Kernel, Part III

Part III of building your own kernel, Tricky kernel options, covers some of the more confusing options you'll encounter when configuring your kernel.

Meanwhile, I'm still jazzed about the great howto that Nathan Willis of Worldlabel.com wrote a few days ago for my Gimp labels scripts: Fast labels and Card layout with Gimplabels.

Tags: , ,
[ 15:03 Oct 22, 2009    More writing | permalink to this entry | ]

Sat, 10 Oct 2009

Article: Building Your Own Linux Kernel, Part II

Part II of building your own kernel, Configuring a New Linux Kernel, covers how to run menuconfig, how to disable modules and slim down the kernel to only the parts you need, and a few important options to look for.

Part III, in two weeks, will tour some specific kernel options and what they do.

Tags: , ,
[ 10:19 Oct 10, 2009    More writing | permalink to this entry | ]

Fri, 25 Sep 2009

Article: Building Your Own Linux Kernel, Part I

My latest article is up on Linux Planet: Building Your Own Linux Kernel, Part I.

"But aren't there a gazillion howtos already on the web on kernel building?" I thought so too. But when someone showed up on LinuxChix recently asking for help building her kernel, I went looking -- and all the howtos I could find were out of date (even the README in the very latest kernel gives instructions based on LILO, not GRUB).

More important, none of them offered help in that all-important question: How do I start with a configuration file I know will work?

My quick-and-dirty howto shows you how to take your distro's configuration file and build the latest mainstream kernel based on it. The next article will cover how to change that configuration and tune it for your own machine.

Tags: , ,
[ 10:45 Sep 25, 2009    More writing | permalink to this entry | ]

Thu, 02 Jul 2009

Disabling mouse/keyboard wakeup

Suspend (sleep) works very well on the dual-Atom desktop. The only problem with it is that the mouse or keyboard wake it up. I don't mind the keyboard, but the mouse is quite sensitive, so a breeze through the window or a big truck driving by on the street can jiggle the mouse and wake the machine when I'm away.

I've been through all the BIOS screens looking for a setting to flip, but there's nothing there. Some web searching told me that under Windows, there's a setting you can change that will affect this, but I couldn't find anything similar for Linux, until finally drc clued me in to /proc/acpi/wakeup.

cat /proc/acpi/wakeup
will tell you all the events that can cause your machine to wake up from various sleep states.

Unfortunately, they're also obscurely coded. Here are mine:

Device  S-state   Status   Sysfs node
SLPB      S4    *enabled  
P32       S4     disabled  pci:0000:00:1e.0
UAR1      S4     enabled   pnp:00:0a
PEX0      S4     disabled  pci:0000:00:1c.0
PEX1      S4     disabled  
PEX2      S4     disabled  pci:0000:00:1c.2
PEX3      S4     disabled  pci:0000:00:1c.3
PEX4      S4     disabled  
PEX5      S4     disabled  
UHC1      S3     disabled  pci:0000:00:1d.0
UHC2      S3     disabled  pci:0000:00:1d.1
UHC3      S3     disabled  pci:0000:00:1d.2
UHC4      S3     disabled  pci:0000:00:1d.3
EHCI      S3     disabled  pci:0000:00:1d.7
AC9M      S4     disabled  
AZAL      S4     disabled  pci:0000:00:1b.0

What do all those symbols mean? I have no clue. Apparently the codes come from the BIOS's DSDT code, and since it varies from board to board, nobody has published tables of likely translations.

The only two wakeups that were enabled for me were SLPB and UAR1. SLPB apparently stands for SLeeP Button, and Rik suggested UAR probably stood for Universal Asynchronous Receiver (the more familiar term UART both receives and Transmits.) Some of the other devices in the list can possibly be identified by comparing their pci: codes against lspci, but not those two.

Time for some experimentation. You can toggle any of these by writing to the wakeup device:

echo UAR1 >/proc/acpi/wakeup

It turned out that to disable mouse and keyboard wakeup, I had to disable both SLPB and UAR1. With both disabled, the machine wakes up when I press the power button. (What the SLeeP Button is, if it's not the power button, I don't know.)

My mouse and keyboard are PS/2. For a USB mouse and keyboard, look for something like USB0, UHC0, USB1.

The UAR1 setting is remembered even across boots: there's no need to do anything to make sure the setting is remembered. But the SLPB setting resets every time I boot. So I edited /etc/rc.local and added this line:

echo SLPB >/proc/acpi/wakeup

Tags: , ,
[ 10:21 Jul 02, 2009    More linux/kernel | permalink to this entry | ]

Wed, 24 Jun 2009

Keeping external kernel modules from being deleted

I've been enjoying my random system beeps, different every day. At least up until yesterday, when I didn't seem to have one. Today I didn't have one either, and discovered that was because the beep module was no longer loaded.

Why not? Well, I updated my kernel to tweak some ACPI parameters (fruitlessly, as it turns out; I'm trying to get powertop to give me more information but I haven't found the magic combination of kernel parameters it wants on this machine) and so I did a make modules_install. And it seems that make modules_install starts out by doing rm -rf /lib/modules/VERSION/kernel which removed my externally built beep module along with everything else.

I couldn't find documentation on this, but I did find Intel Wireless bug 556 which talks about the issue. Apparently somewhere along the way 2.6 started doing this rm -rf, but you can get around it by installing outside the kernel directory.

In other words, instead of

cp beep.ko /lib/modules/2.6.29.4/kernel/drivers/input/misc/
do
cp beep.ko /lib/modules/2.6.29.4/drivers/input/misc/
Then your external module won't get wiped out at the next modules_install.

I've let the maintainer of Fancy Beeper know about this, so it won't be a problem for that module, but it's a good tip to know about in general -- I'm sure there are lots of modules that hit this problem.

Tags: , ,
[ 10:30 Jun 24, 2009    More linux/kernel | permalink to this entry | ]

Wed, 08 Apr 2009

Reading the temperature using /sys rather than /proc

I was curious whether Linux could read the CPU temperature on Dave's new Mac Mini. I normally read the temperature with something like this:
cat /proc/acpi/thermal_zone/ATF0/temperature
(the ATF0 part varies from machine to machine).

Though this doesn't work on all machines -- on my AMD desktop it always returns the same number, which, I'm told, means that the BIOS probably has some code that looks something like this:

if (OS == "Win95" || OS == "Win98") {
  return get_win9x_temp();
}
else if (OS == "WinNT" || OS == "WinXP" || OS == "Vista") {
  return get_nt_temp();
}
else {
  return 40;
}
Anyway, I wondered whether the Mac would have that problem (with different OS names, of course).

There wasn't anything in /proc/acpi/thermal_zone on the Mac, but /proc is deprecated and we're all supposed to be moving to /sys, right? But nobody writes about the new way to get the temperature from /sys; most people are still using the old /proc way.

Took some digging, but I found it:

cat /sys/class/thermal/thermal_zone0/temp
It's in thousandths of a degree C now, rather than straight degrees C.

And on the Mini? Nope, it's not there either. If Dave needs the temperature he needs to stick to OS X, or else figure out lm_sensors.

Update: Matthew Garrett has an excellent blog article on the OS entries reported to ACPI. Apparently Linux since 2.6.29 has claimed to be "Microsoft Windows NT" to avoid just the sort of problem I mentioned. Though that leaves me confused about why my desktop machine always reports 40C. Thanks to JanC for pointing me to that article!

Tags: , , ,
[ 21:54 Apr 08, 2009    More linux/kernel | permalink to this entry | ]

Sat, 08 Sep 2007

Custom beep sounds

It's always amazed me that Linux doesn't let you customize the system beep. You can call xset b volume pitch duration to make it beep higher or lower, or you can turn it off or switch to "visual bell"; but there's nothing like the way most other OSes let you customize it to any sound you want. (Some desktops let you customize the desktop alerts, but that doesn't do anything about the beeping you get from vim, or emacs, or bash, or a hundred other programs that run in terminals.)

Today someone pointed me toward a Gentoo wiki page that led me to Fancy Beeper Daemon. This is a small kernel module that adds a device, /dev/beep, which outputs a byte every time there's a beep. You can write a very simple daemon (several samples in Python are included with the module) which reads /dev/beep and plays the sound of your choice.

I downloaded beep-2.6.15+.tar.gz, but it needed one tweak to build it under 2.6.23-rc3: I needed to add

#include <linux/sched.h>
among the includes at the beginning of beep.c. After that, it compiled and installed just fine. Since it's a kernel module, it works in consoles as well as under X.

/dev/beep is created with owner and group root, and readable/ writable only by owner and group, so to test it I had to chmod 666 /dev/beep to run the daemon. I fixed that by making a file in /etc/udev/rules.d called 41-beep.rules, containing:

KERNEL=="beep", GROUP="audio"

Finally, based on the nice samples that came with the module, I wrote my own very simple Python daemon, playbeepd, that uses aplay.

Lots of fun! I don't know how much I'll use it, but it's good to have the option of custom beep sounds.

Tags: , ,
[ 21:47 Sep 08, 2007    More linux | permalink to this entry | ]

Thu, 28 Jun 2007

A Quartet of Workarounds

I upgraded my laptop's Ubuntu partition from Edgy to Feisty. Debian Etch works well, but it's just too old and I can't build software like GIMP that insists on depending on cutting-edge libraries.

But Feisty is cutting edge in other ways, so it's been a week of workarounds, in two areas: Firefox and the kernel. I'll start with Firefox.

Firefox crashes playing flash

First, the way Ubuntu's Firefox crashes when running Flash. I run flashblock, fortunately, so I've been able to browse the web just fine as long as I don't click on a flashblock button. But I like being able to view the occasional youtube video, and flash 7 has worked fine for me on every distro except Ubuntu. I first saw the problem on Edgy, and upgrading to Feisty didn't cure the problem.

But it wasn't their Firefox build; my own "kitfox" firefox build crashed as well. And it wasn't their flash installation; I've never had any luck with either their adobe flash installer nor their opensource libswfdec, so I'm running the same old flash 7 plug-in that I've used all along for other distros.

To find out what was really happening, I ran Firefox from the commandline, then went to a flash page. It turned out it was triggering an X error:

The error was: 'BadMatch (invalid parameter attributes)'.
(Details: serial 104 error_code 8 request_code 145 minor_code 3)

That gave me something to search for. It turns out there's a longstanding Ubuntu bug, 14911 filed on this issue, with several workarounds. Setting the environment variable XLIB_SKIP_ARGB_VISUALS to 1 fixed the problem, but, reading farther in the bug, I saw that the real problem was that Ubuntu's installer had, for some strange reason, configured my X to use 16 bit color instead of 24. Apparently this is pretty common, and due to some bug involving X's and Mozilla's or Flash's handling of transparency, this causes flash to crash Mozilla.

So the solution is very simple. Edit /etc/X11/xorg.conf, look for the DefaultDepth line, and if it's 16, that's your problem. Change it to 24, restart X and see if flash works. It worked for me!

Eliminating Firefox's saved session pester dialog

While I was fiddling with Firefox, Dave started swearing. "Why does Firefox always make me go through this dialog about restoring the last session? Is there a way to turn that off?"

Sure enough, there's no exposed preference for this, so I poked around about:config, searched for browser and found browser.sessionstore.resume_from_crash. Doubleclick that line to change it to false and you'll have no more pesky dialog.

For more options related to session store, check the Mozillazine Session Restore page.

Kernel: runaway kacpid

Alas, having upgraded to Feisty expressly so that I could build cutting-edge programs like GIMP, I discovered that I couldn't build anything at all. Anything that uses heavy CPU for more than a minute or two triggers a kernel daemon, kacpid, to grab most of the CPU for itself. Being part of the kernel (even though it has a process ID), kacpi is unkillable, and prevents the machine from shutting down, so once this happens the only solution is to pull the power plug.

This has actually been a longstanding Ubuntu problem (bug 75174) but it used to be that disabling acpid would stop kacpid from running away, and with feisty, that no longer helps. The bug is also kernel.org bug 8274.

The Ubuntu bug suggested that disabling cpufreq solved it for one person. Apparently the only way to do that is to build a new kernel. There followed a long session of attempted kernel building. It was tricky because of course I couldn't build on the target machine (inability to build being the problem I was trying to solve), and even if I built on my desktop machine, a large rsync of the modules directory would trigger a runaway kacpi. In the end, building a standalone kernel with no modules was the only option.

But turning off cpufreq didn't help, nor did any of the other obvious acpi options. The only option which stops kacpid is to disable acpi altogether, and use apm. I'm sorry to lose hibernate, and temperature monitoring, but that appears to be my only option with modern kernels. Sigh.

Kernel: Hangs for 2 minutes at boot time initializing sound card

While Dave and I were madly trying to find a set of config options to build a 2.6.21 that would boot on a Vaio (he was helping out with his SR33 laptop, starting from a different set of config options) we both hit, at about the same time, an odd bug: partway through boot, the kernel would initialize the USB memory stick reader:

sd 0:0:0:0: Attached scsi removable disk sda
sd 0:0:0:0: Attached scsi generic sg0 type 0
and then it would hang, for a long time. Two minutes, as it turned out. And the messages after that were pretty random: sometimes related to the sound card, sometimes to the network, sometimes ... GConf?! (What on earth is GConf doing in a kernel boot sequence?) We tried disabling various options to try to pin down the culprit: what was causing that two minute hang?

We eventually narrowed the blame to the sound card (which is a Yamaha, using the ymfpci driver). And that was enough information for google to find this Linux Kernel Mailing List thread. Apparently the sound maintainer decided, for some reason, to make the ymfpci driver depend on an external firmware file ... and then didn't include the firmware file, nor is it included in the alsa-firmware package he references in that message. Lovely. I'm still a little puzzled about the timeout: the post does not explain why, if a firmware file isn't found on the disk, waiting for two minutes is likely to make one magically appear.

Apparently it will be fixed in 2.6.22, which isn't much help for anyone who's trying to run a kernel on any of the 2.6.21.* series in the meantime. (Isn't it a serious enough regression to fix in 2.6.21.*?) And he didn't suggest a workaround, except that alsa-firmware package which doesn't actually contain the firmware for that card. Looks like it's left to the user to make things work.

So here's what to do: it turns out that if you take a 2.6.21 kernel, and substitute the whole sound/pci/ymfpci directory from a 2.6.20 kernel source tree, it builds and boots just fine. And I'm off and running with a standalone apm kernel with no acpi; sound works, and I can finally build GIMP again.

So it's been quite a week of workarounds. You know, I used to argue with all those annoying "Linux is not ready for the desktop" people. But sometimes I feel like Linux usability is moving in the wrong direction. I try to imagine explaining to my mac-using friends why they should have to edit /etc/X11/xorg.conf because their distro set up a configuration that makes Firefox crash, or why they need to build a new kernel because the distributed ones all crash or hang ... I love Linux and don't regret using it, but I seem to need workarounds like this more often now than I did a few years ago. Sometimes it really does seem like the open source world is moving backward, not forward.

Tags: , , , , ,
[ 23:24 Jun 28, 2007    More linux | permalink to this entry | ]

Tue, 15 May 2007

Etch: Blacklisting Modules, Udev, and Firewire Networking

The new Debian Etch installation on my laptop was working pretty well. But it had one weirdness: the ethernet card was on eth1, not eth0. ifconfig -a revealed that eth0 was ... something else, with no IP address configured and a really long MAC address. What was it?

Poking around dmesg revealed that it was related to the IEEE 1394 and the eth1394 module. It was firewire networking.

This laptop, being a Vaio, does have a built-in firewire interface (Sony calls it i.Link). The Etch installer, when it detected no network present, had noted that it was "possible, though unlikely" that I might want to use firewire instead, and asked whether to enable it. I declined.

Yet the installed system ended up with firewire networking not only installed, but taking the first network slot, ahead of any network cards. It didn't get in the way of functionality, but it was annoying and clutters the output whenever I type ifconfig -a. Probably took up a little extra boot time and system resources, too. I wanted it gone.

Easier said than done, as it turns out.

I could see two possible approaches.

  1. Figure out who was setting it to eth1, and tell it to ignore the device instead.
  2. Blacklist the kernel module, so it couldn't load at all.

I begain with approach 1. The obvious culprit, of course, was udev. (I had already ruled out hal, by removing it, rebooting and observing that the bogus eth0 was still there.) Poking around /etc/udev/rules.d revealed the file where the naming was happening: z25_persistent-net.rules.

It looks like all you have to do is comment out the two lines for the firewire device in that file. Don't believe it. Upon reboot, udev sees the firewire devices and says "Oops! persistent-net.rules doesn't have a rule for this device. I'd better add one!" and you end up with both your commented-out line, plus a brand new uncommented line. No help.

Where is that controlled? From another file, z45_persistent-net-generator.rules. So all you have to do is edit that file and comment out the lines, right?

Well, no. The firewire lines in that file merely tell udev how to add a comment when it updates z25_persistent-net.rules. It still updates the file, it just doesn't comment it as clearly.

There are some lines in z45_persistent-net-generator.rules whose comments say they're disabling particular devices, by adding a rule GOTO="persistent_net_generator_end". But adding that in the firewire device lines caused the boot process to hang. There may be a way to ignore a device from this file, but I haven't found it, nor any documentation on how this system works.

Defeated, I switched to approach 2: prevent the module from loading at all. I never expect to use firewire networking, so it's no loss. And indeed, there are lots of other modules loaded I'd like to blacklist, since they represent hardware this machine doesn't have. So it would be nice to learn how.

I had a vague memory of there having been a file with a name like /etc/modules.blacklist some time back in the Pliocene. But apparently no such file exists any more. I did find /etc/modprobe.d/blacklist, which looked promising; but the comment at the beginning of that file says

# This file lists modules which will not be loaded as the result of
# alias expsnsion, with the purpose of preventing the hotplug subsystem
# to load them. It does not affect autoloading of modules by the kernel.
Okay, sounds like this file isn't what I wanted. (And ... hotplug? I thought that was long gone, replaced by udev scripts.) I tried it anyway. Sure enough, not what I wanted.

I fiddled with several other approaches before Debian diva Erinn Clark found this helpful page. I created a file called /etc/modprobe.d/00local and added this line to it:

install eth1394 /bin/true
and on the next boot, the module was no longer loaded, and no longer showed up as a bogus ethernet device. Hurray!

This /etc/modprobe.d/00local technique probably doesn't bear examining too closely. It has "hack" written all over it. But if that's the only way to blacklist problematic modules, I guess it's better than nothing.

Tags: , , ,
[ 19:10 May 15, 2007    More linux | permalink to this entry | ]