Shallow Thoughts : tags : debugging

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Tue, 02 Sep 2014

Using strace to find configuration file locations

I was using strace to figure out how to set up a program, lftp, and a friend commented that he didn't know how to use it and would like to learn. I don't use strace often, but when I do, it's indispensible -- and it's easy to use. So here's a little tutorial.

My problem, in this case, was that I needed to find out what configuration file I needed to modify in order to set up an alias in lftp. The lftp man page tells you how to define an alias, but doesn't tell you how to save it for future sessions; apparently you have to edit the configuration file yourself.

But where? The man page suggested a couple of possible config file locations -- ~/.lftprc and ~/.config/lftp/rc -- but neither of those existed. I wanted to use the one that already existed. I had already set up bookmarks in lftp and it remembered them, so it must have a config file already, somewhere. I wanted to find that file and use it.

So the question was, what files does lftp read when it starts up? strace lets you snoop on a program and see what it's doing.

strace shows you all system calls being used by a program. What's a system call? Well, it's anything in section 2 of the Unix manual. You can get a complete list by typing: man 2 syscalls (you may have to install developer man pages first -- on Debian that's the manpages-dev package). But the important thing is that most file access calls -- open, read, chmod, rename, unlink (that's how you remove a file), and so on -- are system calls.

You can run a program under strace directly:

$ strace lftp sitename
Interrupt it with Ctrl-C when you've seen what you need to see.

Pruning the output

And of course, you'll see tons of crap you're not interested in, like rt_sigaction(SIGTTOU) and fcntl64(0, F_GETFL). So let's get rid of that first. The easiest way is to use grep. Let's say I want to know every file that lftp opens. I can do it like this:

$ strace lftp sitename |& grep open

I have to use |& instead of just | because strace prints its output on stderr instead of stdout.

That's pretty useful, but it's still too much. I really don't care to know about strace opening a bazillion files in /usr/share/locale/en_US/LC_MESSAGES, or libraries like /usr/lib/i386-linux-gnu/libp11-kit.so.0.

In this case, I'm looking for config files, so I really only want to know which files it opens in my home directory. Like this:

$ strace lftp sitename |& grep 'open.*/home/akkana'

In other words, show me just the lines that have either the word "open" or "read" followed later by the string "/home/akkana".

Digression: grep pipelines

Now, you might think that you could use a simpler pipeline with two greps:

$ strace lftp sitename |& grep open | grep /home/akkana

But that doesn't work -- nothing prints out. Why? Because grep, under certain circumstances that aren't clear to me, buffers its output, so in some cases when you pipe grep | grep, the second grep will wait until it has collected quite a lot of output before it prints anything. (This comes up a lot with tail -f as well.) You can avoid that with

$ strace lftp sitename |& grep --line-buffered open | grep /home/akkana
but that's too much to type, if you ask me.

Back to that strace | grep

Okay, whichever way you grep for open and your home directory, it gives:

open("/home/akkana/.local/share/lftp/bookmarks", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.netrc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/akkana/.local/share/lftp/rl_history", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.inputrc", O_RDONLY|O_LARGEFILE) = 5
Now we're getting somewhere! The file where it's getting its bookmarks is ~/.local/share/lftp/bookmarks -- and I probably can't use that to set my alias.

But wait, why doesn't it show lftp trying to open those other config files?

Using script to save the output

At this point, you might be sick of running those grep pipelines over and over. Most of the time, when I run strace, instead of piping it through grep I run it under script to save the whole output.

script is one of those poorly named, ungoogleable commands, but it's incredibly useful. It runs a subshell and saves everything that appears in that subshell, both what you type and all the output, in a file.

Start script, then run lftp inside it:

$ script /tmp/lftp.strace
Script started on Tue 26 Aug 2014 12:58:30 PM MDT
$ strace lftp sitename

After the flood of output stops, I type Ctrl-D or Ctrl-C to exit lftp, then another Ctrl-D to exit the subshell script is using. Now all the strace output was in /tmp/lftp.strace and I can grep in it, view it in an editor or anything I want.

So, what files is it looking for in my home directory and why don't they show up as open attemps?

$ grep /home/akkana /tmp/lftp.strace

Ah, there it is! A bunch of lines like this:

access("/home/akkana/.lftprc", R_OK)    = -1 ENOENT (No such file or directory)
stat64("/home/akkana/.lftp", 0xbff821a0) = -1 ENOENT (No such file or directory)
mkdir("/home/akkana/.config", 0755)     = -1 EEXIST (File exists)
mkdir("/home/akkana/.config/lftp", 0755) = -1 EEXIST (File exists)
access("/home/akkana/.config/lftp/rc", R_OK) = 0

So I should have looked for access and stat as well as open. Now I have the list of files it's looking for. And, curiously, it creates ~/.config/lftp if it doesn't exist already, even though it's not going to write anything there.

So I created ~/.config/lftp/rc and put my alias there. Worked fine. And I was able to edit my bookmark in ~/.local/share/lftp/bookmarks later when I had a need for that. All thanks to strace.

Tags: , ,
[ 13:06 Sep 02, 2014    More linux/cmdline | permalink to this entry | ]

Thu, 28 Aug 2014

Debugging a mysterious terminal setting

For the last several months, I repeatedly find myself in a mode where my terminal isn't working quite right. In particular, Ctrl-C doesn't work to interrupt a running program. It's always in a terminal where I've been doing web work. The site I'm working on sadly has only ftp access, so I've been using ncftp to upload files to the site, and git and meld to do local version control on the copy of the site I keep on my local machine. I was pretty sure the problem was coming from either git, meld, or ncftp, but I couldn't reproduce it.

Running reset fixed the problem. But since I didn't know what program was causing the problem, I didn't know when I needed to type reset.

The first step was to find out which of the three programs was at fault. Most of the time when this happened, I wouldn't notice until hours later, the next time I needed to stop a program with Ctrl-C. I speculated that there was probably some way to make zsh run a check after every command ... if I could just figure out what to check.

Terminal modes and stty -a

It seemed like my terminal was getting put into raw mode. In programming lingo, a terminal is in raw mode when characters from it are processed one at a time, and special characters like Ctrl-C, which would normally interrupt whatever program is running, are just passed like any other character.

You can list your terminal modes with stty -a:

$ stty -a
speed 38400 baud; rows 32; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = ;
eol2 = ; swtch = ; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff
-iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig icanon -iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke

But that's a lot of information. Unfortunately there's no single flag for raw mode; it's a collection of a lot of flags. I checked the interrupt character: yep, intr = ^C, just like it should be. So what was the problem?

I saved the output with stty -a >/tmp/stty.bad, then I started up a new xterm and made a copy of what it should look like with stty -a >/tmp/stty.good. Then I looked for differences: meld /tmp/stty.good /tmp/stty.bad. I saw these flags differing in the bad one: ignbrk ignpar -iexten -ixon, while the good one had -ignbrk -ignpar iexten ixon. So I should be able to run:

$ stty -ignbrk -ignpar iexten ixon
and that would fix the problem. But it didn't. Ctrl-C still didn't work.

Setting a trap, with precmd

However, knowing some things that differed did give me something to test for in the shell, so I could test after every command and find out exactly when this happened. In zsh, you do that by defining a precmd function, so here's what I did:

precmd()
{
    stty -a | fgrep -- -ignbrk > /dev/null
    if [ $? -ne 0 ]; then
        echo
        echo "STTY SETTINGS HAVE CHANGED \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!"
        echo
    fi
}
Pardon all the exclams. I wanted to make sure I saw the notice when it happened.

And this fairly quickly found the problem: it happened when I suspended ncftp with Ctrl-Z.

stty sane and isig

Okay, now I knew the culprit, and that if I switched to a different ftp client the problem would probably go away. But I still wanted to know why my stty command didn't work, and what the actual terminal difference was.

Somewhere in my web searching I'd stumbled upon some pages suggesting stty sane as an alternative to reset. I tried it, and it worked.

According to man stty, stty sane is equivalent to

$ stty cread -ignbrk brkint -inlcr -igncr icrnl -iutf8 -ixoff -iuclc -ixany  imaxbel opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0 isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke

Eek! But actually that's helpful. All I had to do was get a bad terminal (easy now that I knew ncftp was the culprit), then try:

$ stty cread 
$ stty -ignbrk 
$ stty brkint
... and so on, trying Ctrl-C each time to see if things were back to normal. Or I could speed up the process by grouping them:
$ stty cread -ignbrk brkint
$ stty -inlcr -igncr icrnl -iutf8 -ixoff
... and so forth. Which is what I did. And that quickly narrowed it down to isig. I ran reset, then ncftp again to get the terminal in "bad" mode, and tried:
$ stty isig
and sure enough, that was the difference.

I'm still not sure why meld didn't show me the isig difference. But if nothing else, I learned a bit about debugging stty settings, and about stty sane, which is a much nicer way of resetting the terminal than reset since it doesn't clear the screen.

Tags: , ,
[ 15:41 Aug 28, 2014    More linux | permalink to this entry | ]

Sun, 09 Jun 2013

Debugging a Firefox freeze on Ringtail and Sid

I recently went on an upgrading spree on my main computer. In the hope of getting more up-to-date libraries, I updated my Ubuntu to 13.04 "Raring Ringtail", and Debian to unstable "Sid". Most things went fine -- except for Firefox.

Under both Ringtail and Sid, Firefox became extremely unstable. I couldn't use it for more than about fifteen minutes before it would freeze while trying to access some web resource. The only cure when that happened was to kill it and start another Firefox. This was happening with the exact same Firefox -- a 21.0 build from mozilla.org -- that I was using without any problems on older versions of Debian and Ubuntu; and with the exact same profile. So it was clearly something that had changed about Debian and Ubuntu.

The first thing I do when I hit a Firefox bug is test with a fresh profile. I have all sorts of Firefox customizations, extensions and other hacks. In fact, the customizations are what keep me tied to Firefox rather than jumping to some other browser. But they do, too often, cause problems. I have a generic profile I keep around for testing, so I fired it up and used it for browsing for a day. Firefox still froze, but not as often.

Disabling Extensions

Was it one of my extensions? I went to the Tools->Add-ons to try disabling them all ... and Firefox froze. Bingo! That was actually good news. Problems like "Firefox freezes a lot" are hard to debug. "Firefox freezes every time I open Tools->Add-ons" are a whole lot easier. Now I needed to find some other way of disabling extensions to see if that helped.

I went to my Firefox profile directory and moved everything in the extensions directory into a new directory I made called extensions.sav. Then I started moving them back one by one, each time starting Firefox and calling up Tools->Add-ons. It turned out two extensions were causing the freeze: Open in Browser and Custom Tab Width. So I left those off for the time being.

Disabling Themes

Along the way, I discovered that clicking on Appearance in Tools->Add-ons would also cause a freeze, so my visual theme was also a problem. This wasn't something I cared about: some time back when Mozilla started trumpeting their themeability, I clicked around and picked up some theme involving stars and planets. I could live without that.

But how do you disable a theme? Especially if you can't go to Tools->Add-ons->Appearance?

Turns out everything written on the web on this is wrong. First, everything on themes on mozilla.org assumes you can get to that Appearance tab, and doesn't even consider the possibility that you might have to look in your profile and remove a file. Search further and you might find references to files named lightweighttheme-header and lightweighttheme-footer, neither of which existed in my profile.

But I did have a directory called lwtheme. So I removed that, plus four preferences in prefs.js that included the term "lightweightThemes". After a restart, my theme was gone, I was able to view that Appearance tab, and I was able to browse the web for nearly 4 hours before firefox hung again. Darn! That wasn't all of it.

Debugging the environment

But soon after that I had a breakthrough. I discovered a page on my bank's website that froze Firefox every time. But that was annoying for testing, since it required logging in then clicking through several other pages, and you never know what a bank website might decide to do if you start logging in over and over. I didn't want to get locked out.

But then I was checking an episode in one of the podcasts I listen to, which involved going to the link http://downloads.bbc.co.uk/podcasts/radio4/moreorless/rss.xml -- and Firefox froze, on a simple RSS link. I restarted and tried again -- another freeze. I'd finally found the Rosetta stone, something that hung Firefox every time. Now I could do some serious testing!

I'd had friends try this using the same version of Firefox and Ubuntu, without seeing a freeze. Was it something about my user environment? I created a new user, switched to another virtual console (Ctrl-Alt-F2) and logged in as my new user, then ran X. This was a handy way to test: I could get to my normal user's X session in Ctrl-Alt-F7, while the new user's X session was on Ctrl-Alt-F8. Since I don't have Gnome or KDE installed on this machine, the new user came up with a default Openbox session. It came up at the wrong resolution -- the X11 in the newest Linux distros apparently doesn't read the HDMI monitor properly -- but I wasn't worried about that.

And when I ran Firefox as the new user (letting it create a new profile) and middlemouse-pasted the BBC RSS URL, it loaded it, without freezing.

Now we're getting somewhere. Now I knew it was something about my user environment.

I tried copying all of ~/.config from my user to the new user. No hang. I tried various other configuration files. Still no hang.

The X initialization

I'll skip some steps here, and just mention that in trying to fix the resolution problem, so I didn't have to do all my debugging at 1024x768, I discovered that if I used my .xinitrc file to start X, I'd get a freezy Firefox. If I didn't use my .xinitrc, and defaulted to the system one, Firefox was fine. Even if I removed everything else from my .xinitrc, and simply ran openbox from it, that was enough to make Firefox hang.

Okay, what was the system doing? I poked around /etc/X11: it was running /etc/X11/Xsession. I copied that file to my .xinitrc and started X. No hang.

Xsession does a bunch of things, but one of the main things it does is run every script in the /etc/X11/Xsession.d directory. So I made a copy of that directory inside my home directory, and modified .xinitrc to execute those files instead. Then I started moving them aside to see which ones made a difference.

And I found it. /etc/X11/Xsession.d/75dbus_dbus-launch was the file that mattered.

75dbus_dbus-launch takes the name of the program that's going to be executed -- in this case that was x-session-manager, which links to /etc/alternatives/x-session-manager, which links to /usr/bin/openbox-session -- and instead runs /usr/bin/dbus-launch --exit-with-session x-session-manager.

Now that I knew that, I moved everything aside and made a little .xinitrc that ran /usr/bin/dbus-launch --exit-with-session openbox-session. And Firefox didn't crash.

Dbus

So it all comes down to dbus. I was already running dbus: ps shows /usr/bin/dbus-daemon --system running -- and that worked fine for everything dbussy I normally do, like run "gimp image.jpg" and have it open in my already running GIMP.

But on Ringtail and Sid, that isn't enough for Firefox. For some reason, on these newer systems, Firefox requires a second dbus daemon -- it shows up in ps as /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session -- for the X session. If it doesn't have that, it's fine for a while, and then, hours later, it will mysteriously freeze while waiting for a network resource.

Why? I have no idea. No one I've asked seems to know anything about how dbus works, the difference between system and session dbus daemons, or why any of it it would have this effect on Firefox.

I filed a Firefox bug, Bug 881122, though I don't have much hope of anyone being interested in a bug that only affects Linux users using nonstandard X sessions. But maybe I'm not the only one. If your Firefox is hanging and you found your way here, I hope I've given you some ideas. And if anyone has a clue as to what's really happening and why dbus would have that effect, I'd love to hear from you.

Tags: , , , , , ,
[ 20:08 Jun 09, 2013    More linux | permalink to this entry | ]

Thu, 09 Dec 2010

Article: Troubleshooting, part II: /proc and Python

My article this week on Linux Planet concerns Advanced Linux Server Troubleshooting (part 2).

It's two loosely related topics: exploring the /proc filesystem, and how to use it to find information on a running process; and several ways to get stack traces from Python programs.

This (as well as Troubleshooting part I) arose from a problem we had at work, where we use Linux plug computers (ARM-based Linux appliances) running Python scripts. It's not uncommon for Python networking scripts to go into never-never-land, waiting forever on a network connection without timing out. Since plug computers tend not to be outfitted with the latest and greatest tools like gdb and debug versions of libraries, we've needed to find more creative ways of figuring out what processes are doing to make sure our programs are ready for anything.

Tags: , ,
[ 11:44 Dec 09, 2010    More linux | permalink to this entry | ]

Wed, 24 Nov 2010

Article: Troubleshooting running processes, part I

How do you troubleshoot a process that's running away, sucking up too much CPU, or not doing anything at all? Today on Linux Planet: Troubleshooting Linux Servers: top and Other Basic System Tools.

This is part I, covering basics like top, strace and gdb. Part II will get into hairier stuff and tips for debugging Python applications.

Tags: , ,
[ 21:06 Nov 24, 2010    More linux | permalink to this entry | ]