Shallow Thoughts

Akkana's Musings on Open Source, Science, and Nature.

Tue, 29 Apr 2008

How often does updatedb really need to run?

Since updating to Hardy, I've been getting mail from Anacron:
/etc/cron.weekly/slocate:
slocate: fatal error: load_file: Could not open file: /etc/updatedb.conf: No such file or directory

That's the script that updates the database for locate, Linux's fast find system. I figured I must have screwed something up when I moved that slocate cron script from cron.daily to cron.weekly (because I hate having my machine slow to a crawl as soon as I boot it in the morning, and it doesn't bother me if the database doesn't necessarily have files added in the last day or two).

But after talking to some other folks and googling for Ubuntu bugs, I discovered I wasn't the only one getting that mail, and there was already a bug covering it. Comparing my setup with another Hardy user's, I found that the file slocate was failing to find, /etc/updatedb.conf, belongs to a different package, mlocate. If mlocate is installed, then slocate's cron script works; otherwise, it doesn't. Sounds like slocate should have a dependency that pulls in mlocate, no?

But wait, what do these two packages do? Let's try a little aptitude search locate:

p   dlocate                         - fast alternative to dpkg -L and dpkg -S   
p   kio-locate                      - kio-slave for the locate command          
i   locate                          - maintain and query an index of a directory
p   mlocate                         - quickly find files on the filesystem based
i   slocate                         - Secure replacement of findutil's locate   
Okay, forget the first two, but we have locate, mlocate, and slocate. How do they relate?

Worse, if I install mlocate (so slocate will work) and then look in my cron directories, it turns out I now have, count 'em, five different cron scripts that run updatedb. They are:

In cron.daily:

locate: 72 lines! but a lot of that is comments and pruning, and a lot of fiddling to figure out what version of the kernel is running to see whether it can pass any advanced flags when it tries to renice the process. In the end it calls updatedb.findutils (note no full path, though it uses a full path when it checks for it earlier in the script).

slocate: A much simpler but unfortunately buggy 20 lines. It checks for /etc/updatedb.conf, runs it if it exists, fiddles with ionice, checks again for /etc/updatedb.conf, and based on whether it finds it, runs either /usr/bin/slocate -u or /usr/bin/slocate -u -f proc. The latter path is what was failing and sending root mail every time the script was run.

mlocate: an even slimmer 12 line script, which checks for /usr/bin/updatedb.mlocate and, if it exists, fiddles ionice then runs /usr/bin/updatedb.mlocate.

In cron.weekly:

Two virtually identical scripts called find.notslocate and find.notslocate.dpkg-new, which differ only in dpkg-new having more elaborate ionice options. They both run updatedb. And which updatedb would that be? Probably /usr/bin/updatedb, which links to /etc/alternatives/updatedb, which probably links to either updatedb.mlocate or updatedb.slocate, whichever you've installed most recently. But in either case, it's hard to see why you'd need this script running weekly if you're already running both flavors of updatedb from other scripts cron.daily. And having two copies of the script is just plain wrong (and there was already a bug filed on it). (As long as you're poking around in cron.daily and cron.weekly, check and see if you have any more of these extra dpkg-new or dpkg-old scripts -- they might be slowing down your machine for no reason.)

Further research reveals that mlocate is a new(ish) package intended to replace slocate. (There was a long discussion of that on ubuntu-devel, leading to the replacement of slocate with mlocate very late in the Hardy development cycle. There was also lots of discussion of "tracker", apparently a GUI fast find tool that can only search in the user's home directory.)

What is this mlocate? The m stands for "merge": the advantage of mlocate is that it can merge new results into its existing database instead of replacing the whole thing every time. Sounds good, right? However, the down side is that mlocate apparently can't to purge its database of old files that no longer exist, and these files will clutter up your locate results. Running locate -e will keep them from being printed -- but there seems to be no way to set this permanently, via an environment variable or .locaterc file, nor to tell updatedb.mlocate to clean up its database. So you'll need to alias locate to locate -e if you want sensible behavior. Or go back to slocate. Sigh.

Cleaning up

The important thing is to get rid of most of those spurious updatedb cron scripts. You might choose to run updatedb daily, weekly, or only when you choose to run it; but you probably don't want five different scripts running two different versions of updatedb at different times. The packages obviously aren't cleaning up after themselves, so let's do a little manual cleanup.

That find.slocate script looks suspicious. In fact, if you run dpkg -S find.notslocate, you find out that it doesn't belong to any package -- not only should the .dpkg-old version not be there, neither should the other one! So out they go.

As for slocate and mlocate, it's important to know that the two packages can coexist: installing mlocate doesn't remove slocate or vice versa. A clean Hardy install should have only mlocate; upgrades from Gutsy are more likely to have a broken slocate.

Having both packages probably isn't what you want. So pick one, and remove or disable the other. If mlocate is what you want, apt-get purge slocate and just make sure that /etc/cron.*/slocate disappears. If you decide you want slocate, it's a little trickier since the slocate package is broken; but you can fix it by creating an empty /etc/updatedb.conf so updatedb.slocate won't fail.

Tags: , , ,
[ 20:48 Apr 29, 2008    More linux/install | permalink to this entry ]