Shallow Thoughts : tags : debugging
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Tue, 02 Sep 2014
I was using strace to figure out how to set up a program, lftp, and
a friend commented that he didn't know how to use it
and would like to learn. I don't use strace often, but when I do,
it's indispensable -- and it's easy to use. So here's a little tutorial.
My problem, in this case, was that I needed to find out what
configuration file I needed to modify in order to set up an alias
in lftp. The lftp man page tells you how to define an alias, but doesn't
tell you how to save it for future sessions; apparently you have
to edit the configuration file yourself.
But where? The man page suggested
a couple of possible config file locations -- ~/.lftprc and
~/.config/lftp/rc -- but neither of those existed. I wanted
to use the one that already existed. I had already set up bookmarks
in lftp and it remembered them, so it must have a config file already,
somewhere. I wanted to find that file and use it.
So the question was, what files does lftp read when it starts up?
strace lets you snoop on a program and see what it's doing.
strace shows you all system calls being used by a program.
What's a system call? Well, it's anything in section 2 of the Unix manual.
You can get a complete list by typing:
man 2 syscalls
(you may have to install developer man pages first -- on Debian that's
the manpages-dev package). But the important thing is that most
file access calls -- open, read, chmod, rename, unlink (that's how you
remove a file), and so on -- are system calls.
You can run a program under strace directly:
$ strace lftp sitename
Interrupt it with Ctrl-C when you've seen what you need to see.
Pruning the output
And of course, you'll see tons of crap you're not interested in,
like rt_sigaction(SIGTTOU) and fcntl64(0, F_GETFL). So let's get rid
of that first. The easiest way is to use grep. Let's say I want to know
every file that lftp opens. I can do it like this:
$ strace lftp sitename |& grep open
I have to use |& instead of just | because strace prints its
output on stderr instead of stdout.
That's pretty useful, but it's still too much. I really don't care
to know about lftp opening a bazillion files in
/usr/share/locale/en_US/LC_MESSAGES, or about all the shared libraries it loads.
In this case, I'm looking for config files, so I really only want to know
which files it opens in my home directory. Like this:
$ strace lftp sitename |& grep 'open.*/home/akkana'
In other words, show me just the lines that have the word "open"
followed later by the string "/home/akkana".
Digression: grep pipelines
Now, you might think that you could use a simpler pipeline with two greps:
$ strace lftp sitename |& grep open | grep /home/akkana
But that doesn't work -- nothing prints out. Why? Because grep
buffers its output when it's writing to a pipe rather than to a terminal,
so when you pipe grep | grep, the first grep holds on to its matches
until it has collected quite a lot of output, and the second grep has
nothing to print in the meantime. When you finally hit Ctrl-C, the first
grep is killed before it ever flushes that buffer.
(This comes up a lot with
tail -f as well.)
You can avoid that with
$ strace lftp sitename |& grep --line-buffered open | grep /home/akkana
but that's too much to type, if you ask me.
Back to that strace | grep
Okay, whichever way you grep for open and your home directory,
you'll see something like this:
open("/home/akkana/.local/share/lftp/bookmarks", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.netrc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/akkana/.local/share/lftp/rl_history", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.inputrc", O_RDONLY|O_LARGEFILE) = 5
Now we're getting somewhere! The file where it's getting its bookmarks
is ~/.local/share/lftp/bookmarks -- and I probably can't use that
to set my alias.
But wait, why doesn't it show lftp trying to open those other config files?
Using script to save the output
At this point, you might be sick of running those grep pipelines over
and over. Most of the time, when I run strace, instead of piping it
through grep I run it under script to save the whole output.
script is one of those poorly named, ungoogleable commands, but it's
incredibly useful. It runs a subshell and saves everything that appears
in that subshell, both what you type and all the output, in a file.
Start script, then run lftp inside it:
$ script /tmp/lftp.strace
Script started on Tue 26 Aug 2014 12:58:30 PM MDT
$ strace lftp sitename
After the flood of output stops, I type Ctrl-D or Ctrl-C to exit lftp,
then another Ctrl-D to exit the subshell script is using.
Now all the strace output was in /tmp/lftp.strace and I can
grep in it, view it in an editor or anything I want.
So, what files is it looking for in my home directory, and why don't
they show up as open attempts?
$ grep /home/akkana /tmp/lftp.strace
Ah, there it is! A bunch of lines like this:
access("/home/akkana/.lftprc", R_OK) = -1 ENOENT (No such file or directory)
stat64("/home/akkana/.lftp", 0xbff821a0) = -1 ENOENT (No such file or directory)
mkdir("/home/akkana/.config", 0755) = -1 EEXIST (File exists)
mkdir("/home/akkana/.config/lftp", 0755) = -1 EEXIST (File exists)
access("/home/akkana/.config/lftp/rc", R_OK) = 0
So I should have looked for access and stat as well as open.
Now I have the list of files it's looking for. And, curiously,
it creates ~/.config/lftp if it doesn't exist already, even though
it's not going to write anything there.
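Once the whole trace is saved in a file, one pattern can pick out all of those file-related calls at once. Here's a minimal sketch in plain sh; the function name is made up, and the list of syscall names is an assumption: the trace above shows open and stat64, while newer systems more often show openat, stat or newfstatat.

```shell
# Hypothetical helper: list file-related system calls that mention DIR,
# from a trace saved earlier (with script, or with strace -o FILE).
# Usage: home_file_calls LOGFILE DIR
home_file_calls() (
    log=$1
    dir=$2
    # Assumed syscall list: this trace shows open and stat64, but newer
    # systems usually show openat, stat or newfstatat instead.
    # ".*" before the path also matches openat(AT_FDCWD, "/home/...").
    grep -E "^(open|openat|access|faccessat|stat|stat64|newfstatat|mkdir|mkdirat)\(.*\"$dir" "$log"
)
```

For the session above, that would be home_file_calls /tmp/lftp.strace /home/akkana, which catches the access, stat64 and mkdir lines along with the opens.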
So I created ~/.config/lftp/rc and put my alias there. Worked fine.
And I was able to edit my bookmark in ~/.local/share/lftp/bookmarks
later when I had a need for that. All thanks to strace.
[ 13:06 Sep 02, 2014 ]
Thu, 28 Aug 2014
For the last several months, I've repeatedly found myself in a mode where
my terminal isn't working quite right. In particular, Ctrl-C doesn't
work to interrupt a running program. It's always in a terminal where
I've been doing web work. The site I'm working on sadly has only ftp
access, so I've been using ncftp to upload files to the site, and git
and meld to do local version control on the copy of the site I keep on
my local machine. I was pretty sure the problem was coming from either
git, meld, or ncftp, but I couldn't reproduce it.
Running reset fixed the problem. But since I didn't know
what program was causing the problem, I didn't know when I needed to run it.
The first step was to find out which of the three programs was at fault.
Most of the time when this happened, I wouldn't notice until hours
later, the next time I needed to stop a program with Ctrl-C.
I speculated that there was probably some way to make zsh run a check
after every command ... if I could just figure out what to check.
Terminal modes and stty -a
It seemed like my terminal was getting put into raw mode.
In programming lingo, a terminal is in raw mode when characters
from it are processed one at a time, and special characters like
Ctrl-C, which would normally interrupt whatever program is running,
are just passed through to the program like any other character.
You can list your terminal modes with stty -a:
$ stty -a
speed 38400 baud; rows 32; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff
-iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig icanon -iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
But that's a lot of information. Unfortunately there's no single flag
for raw mode; it's a collection of a lot of flags.
I checked the interrupt character:
intr = ^C, just like it should be. So what was different?
I saved the output with
stty -a >/tmp/stty.bad, then
I started up a new xterm and made a copy of what it should
look like with
stty -a >/tmp/stty.good. Then I looked at the differences with
meld /tmp/stty.good /tmp/stty.bad.
I saw these flags differing in the bad one: ignbrk ignpar -iexten -ixon,
while the good one had -ignbrk -ignpar iexten ixon. So I should be
able to run:
$ stty -ignbrk -ignpar iexten ixon
and that would fix the problem. But it didn't. Ctrl-C still didn't work.
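One reason a difference can be easy to miss is that meld compares whole lines, and a single stty -a line holds a dozen flags. A flag-by-flag comparison sidesteps that. This is a sketch, not something from the original debugging session: the function name is hypothetical, and it treats every space- or semicolon-separated word as a token.

```shell
# Hypothetical helper: compare two saved `stty -a` dumps flag by flag.
# Prints "< flag" for settings only in the first file and "> flag"
# for settings only in the second.
stty_flag_diff() (
    tmp1=$(mktemp) && tmp2=$(mktemp) || exit 1
    trap 'rm -f "$tmp1" "$tmp2"' EXIT
    # One token per line: split on spaces and semicolons, then sort.
    # LC_ALL=C keeps sort and comm agreeing on the order.
    LC_ALL=C tr -s ' ;' '\n\n' < "$1" | LC_ALL=C sort -u > "$tmp1"
    LC_ALL=C tr -s ' ;' '\n\n' < "$2" | LC_ALL=C sort -u > "$tmp2"
    # comm -3 drops common tokens and indents second-file tokens with a tab.
    tab=$(printf '\t')
    LC_ALL=C comm -3 "$tmp1" "$tmp2" | sed -e "s/^$tab/> /" -e '/^> /!s/^/< /'
)
```

With the files saved earlier, that would be stty_flag_diff /tmp/stty.good /tmp/stty.bad; a pair like "< isig" and "> -isig" would stand out on its own line.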
Setting a trap, with precmd
However, knowing some things that differed did give me something to
test for in the shell, so I could test after every command and find
out exactly when this happened. In zsh, you do that by defining a
precmd function, so here's what I did:
precmd() {
    # Warn as soon as the terminal loses its normal -ignbrk setting
    if ! stty -a | grep -qF -- -ignbrk; then
        echo "STTY SETTINGS HAVE CHANGED \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!"
    fi
}
Pardon all the exclams. I wanted to make sure I saw the notice when it happened.
And this fairly quickly found the problem: it happened when I suspended
ncftp with Ctrl-Z.
stty sane and isig
Okay, now I knew the culprit, and that if I switched to a different ftp
client the problem would probably go away. But I still wanted to know
why my stty command didn't work, and what the actual terminal problem was.
Somewhere in my web searching I'd stumbled upon some pages suggesting
stty sane as an alternative to reset.
I tried it, and it worked.
According to man stty, stty sane is equivalent to
$ stty cread -ignbrk brkint -inlcr -igncr icrnl -iutf8 -ixoff -iuclc -ixany imaxbel opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0 isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke
Eek! But actually that's helpful. All I had to do was get a bad
terminal (easy now that I knew ncftp was the culprit), then try:
$ stty cread
$ stty -ignbrk
$ stty brkint
... and so on, trying Ctrl-C each time to see if things were back to normal.
Or I could speed up the process by grouping them:
$ stty cread -ignbrk brkint
$ stty -inlcr -igncr icrnl -iutf8 -ixoff
... and so forth. Which is what I did. And that quickly narrowed it
down to isig. I ran reset, then ncftp again to get the terminal
in "bad" mode, and tried:
$ stty isig
and sure enough, that was the difference.
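The grouping step is easy to prepare ahead of time. A small sketch, assuming the flag list in your man stty matches the one quoted above (it varies between coreutils versions), turns it into stty commands of five flags each, to paste one at a time into the misbehaving terminal:

```shell
# Flags that man stty lists as equivalent to "stty sane", copied from above.
# (The exact list depends on your coreutils version.)
sane_flags='cread -ignbrk brkint -inlcr -igncr icrnl -iutf8 -ixoff -iuclc -ixany imaxbel opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0 isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke'

# xargs splits the list on whitespace and runs "echo stty" with five flags
# at a time, printing one ready-to-paste command per line.
printf '%s\n' "$sane_flags" | xargs -n 5 echo stty
```

The first line it prints is stty cread -ignbrk brkint -inlcr -igncr; once one group brings Ctrl-C back, that group can be split further the same way.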
I'm still not sure why meld didn't show me the isig difference.
But if nothing else, I learned a bit about debugging stty settings,
and about stty sane, which is a much nicer way of
resetting the terminal than
reset, since it doesn't
clear the screen.
[ 15:41 Aug 28, 2014 ]
Sun, 09 Jun 2013
I recently went on an upgrading spree on my main computer. In the hope
of getting more up-to-date libraries, I updated my Ubuntu to 13.04
"Raring Ringtail", and Debian to unstable "Sid". Most things went fine
-- except for Firefox.
Under both Ringtail and Sid, Firefox became extremely unstable.
I couldn't use it for more than about fifteen minutes before it would
freeze while trying to access some web resource. The only cure when
that happened was to kill it and start another Firefox.
This was happening with the exact same Firefox -- a 21.0 build from
mozilla.org -- that I was using without any problems on older versions
of Debian and Ubuntu; and with the exact same profile. So it was
clearly something that had changed about Debian and Ubuntu.
The first thing I do when I hit a Firefox bug is test with
a fresh profile. I have all sorts of Firefox customizations, extensions
and other hacks. In fact, the customizations are what keep me tied
to Firefox rather than jumping to some other browser. But they do,
too often, cause problems. I have a generic profile I keep around
for testing, so I fired it up and used it for browsing for a day.
Firefox still froze, but not as often.
Was it one of my extensions?
I went to Tools->Add-ons to try disabling them all ...
and Firefox froze. Bingo! That was actually good news. Problems like
"Firefox freezes a lot" are hard to debug. "Firefox freezes every time
I open Tools->Add-ons" is a whole lot easier.
Now I needed to find some other way of disabling extensions to see if
one of them was the culprit.
I went to my Firefox profile directory and moved everything
in the extensions directory into a new directory I made called
extensions.sav. Then I started moving them back one by one,
each time starting Firefox and calling up Tools->Add-ons.
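That move-aside-and-restore loop can be scripted a little. A sketch under stated assumptions: the two function names are invented, their argument is your own Firefox profile directory, and Firefox should be closed whenever you move files around in it.

```shell
# Hypothetical helpers for bisecting extensions by hand.
# Argument: your Firefox profile directory. Run only while Firefox is closed.

# Step 1: move everything in extensions/ into extensions.sav/.
disable_all_extensions() (
    prof=$1
    mkdir -p "$prof/extensions.sav" || exit 1
    for f in "$prof"/extensions/*; do
        [ -e "$f" ] || continue       # empty directory: the glob didn't match
        mv "$f" "$prof/extensions.sav/" || exit 1
    done
)

# Step 2: move one extension back, then start Firefox and open Tools->Add-ons.
restore_next_extension() (
    prof=$1
    for f in "$prof"/extensions.sav/*; do
        [ -e "$f" ] || break          # nothing left in extensions.sav
        mv "$f" "$prof/extensions/" || exit 1
        echo "restored: ${f##*/}"
        exit 0
    done
    echo "nothing left to restore"
    exit 1
)
```

Restoring one at a time is slow but unambiguous; with many extensions, moving back half at a time would need fewer Firefox restarts.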
It turned out two extensions were causing the freeze: Open in Browser
and Custom Tab Width. So I left those off for the time being.
Along the way, I discovered that clicking on Appearance in
Tools->Add-ons would also cause a freeze, so my visual
theme was also a problem. This wasn't something I cared about:
some time back when Mozilla started trumpeting their themeability,
I clicked around and picked up some theme involving stars and planets.
I could live without that.
But how do you disable a theme?
Especially if you can't go to Tools->Add-ons->Appearance?
Turns out everything written on the web on this is wrong. First,
everything on themes on mozilla.org assumes you can get to that
Appearance tab, and doesn't even consider the possibility that you
might have to look in your profile and remove a file.
Search further and you might find references to files named
lightweighttheme-header and lightweighttheme-footer, neither of
which existed in my profile.
But I did have a directory called lwtheme.
So I removed that, plus four preferences in prefs.js that included
the term "lightweightThemes".
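For anyone else stuck without the Appearance tab, the same removal can be done from the shell. This is a hypothetical helper, not an official Mozilla procedure: it assumes the theme lives in an lwtheme directory and in prefs.js lines containing "lightweightThemes", as described above, and it should only be run while Firefox is closed, since Firefox rewrites prefs.js when it exits.

```shell
# Hypothetical helper: strip a lightweight theme out of a profile by hand.
# Usage: remove_lwtheme PROFILE_DIR   (with Firefox closed)
remove_lwtheme() (
    prof=$1
    [ -f "$prof/prefs.js" ] || { echo "no prefs.js in $prof" >&2; exit 1; }
    # Keep a backup, then write back every line except the theme prefs.
    cp "$prof/prefs.js" "$prof/prefs.js.bak" || exit 1
    grep -v 'lightweightThemes' "$prof/prefs.js.bak" > "$prof/prefs.js"
    # The theme's files lived in this directory in the profile described above.
    rm -rf "$prof/lwtheme"
)
```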
After a restart, my theme was gone, I was able to view that Appearance tab,
and I was able to browse the web for nearly 4 hours before Firefox hung again.
Darn! That wasn't all of it.
Debugging the environment
But soon after that I had a breakthrough.
I discovered a page on my bank's website that froze Firefox every time.
But that was annoying for testing, since it required logging in then
clicking through several other pages, and you never know what a bank
website might decide to do if you start logging in over and over.
I didn't want to get locked out.
But then I was checking an episode in one of the podcasts I listen to,
which involved going to the link
-- and Firefox froze, on a simple RSS link. I restarted and tried
again -- another freeze. I'd finally found the Rosetta stone,
something that hung Firefox every time. Now I could do some serious testing!
I'd had friends try this using the same version of Firefox and Ubuntu,
without seeing a freeze. Was it something about my user environment?
I created a new user, switched to another virtual console (Ctrl-Alt-F2)
and logged in as my new user, then ran X. This was a handy way to test:
I could get to my normal user's X session with Ctrl-Alt-F7, while the new
user's X session was on Ctrl-Alt-F8. Since I don't have Gnome or KDE
installed on this machine, the new user came up with a default Openbox
session. It came up at the wrong resolution -- the X11 in the newest
Linux distros apparently doesn't read the HDMI monitor properly --
but I wasn't worried about that.
And when I ran Firefox as the new user (letting it create a new profile)
and middlemouse-pasted the BBC RSS URL, it loaded it, without freezing.
Now we're getting somewhere.
So it was something about my user environment.
I tried copying all of ~/.config from my user to the new user. No hang.
I tried various other configuration files. Still no hang.
The X initialization
I'll skip some steps here, and just mention that in trying to fix the
resolution problem, so I didn't have to do all my debugging at 1024x768,
I discovered that if I used my .xinitrc file to start X, I'd get a freezy
Firefox. If I didn't use my .xinitrc, and defaulted to the system one,
Firefox was fine. Even if I removed everything else from my .xinitrc,
and simply ran openbox from it, that was enough to make Firefox hang.
Okay, what was the system doing? I poked around /etc/X11:
it was running /etc/X11/Xsession. I copied that file to my
.xinitrc and started X. No hang.
Xsession does a bunch of things, but one of the main things it does is run
every script in the /etc/X11/Xsession.d directory.
So I made a copy of that directory inside my home directory, and modified
.xinitrc to execute those files instead. Then I started moving them
aside to see which ones made a difference.
And I found it. /etc/X11/Xsession.d/75dbus_dbus-launch was the
file that mattered.
75dbus_dbus-launch takes the name of the program that's
going to be executed -- in this case that was x-session-manager, which
links to /etc/alternatives/x-session-manager, which links to
/usr/bin/openbox-session -- and instead runs
/usr/bin/dbus-launch --exit-with-session x-session-manager.
Now that I knew that, I moved everything aside and made a little
.xinitrc that ran
/usr/bin/dbus-launch --exit-with-session openbox-session.
And Firefox didn't freeze.
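For reference, that little .xinitrc amounts to something like the following. The exec is an assumption, not part of the original file; it roughly mirrors how Debian's Xsession scripts launch the session command, so that X exits when the session does.

```shell
#!/bin/sh
# Minimal ~/.xinitrc: start openbox-session under its own D-Bus session bus.
# --exit-with-session makes dbus-launch shut the bus down when the session ends.
exec /usr/bin/dbus-launch --exit-with-session openbox-session
```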
So it all comes down to dbus. I was already running dbus: ps shows
/usr/bin/dbus-daemon --system running -- and that worked fine
for everything dbussy I normally do, like run "gimp image.jpg" and
have it open in my already running GIMP.
But on Ringtail and Sid, that isn't enough for Firefox. For some
reason, on these newer systems, Firefox requires a second
dbus daemon -- it shows up in ps as
/usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
-- for the X session. If it doesn't have that, it's fine for a while,
and then, hours later, it will mysteriously freeze while waiting for
a network resource.
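A quick way to tell whether a given shell is inside such a session: dbus-launch exports DBUS_SESSION_BUS_ADDRESS to the program it starts, and everything started from that program inherits it. A minimal sketch (the function name is made up; on systems where something other than dbus-launch provides the session bus, the variable may be set by that instead):

```shell
# Hypothetical check: does this environment point at a D-Bus session bus?
have_session_bus() {
    if [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ]; then
        echo "session bus: $DBUS_SESSION_BUS_ADDRESS"
    else
        echo "no session bus in this environment" >&2
        return 1
    fi
}
```

Running it in an xterm from the bare openbox session versus one started under dbus-launch should show the difference.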
Why? I have no idea. No one I've asked seems to know anything about
how dbus works, the difference between system and session dbus daemons,
or why any of it would have this effect on Firefox.
I filed a Firefox bug,
though I don't have much hope of anyone being interested in a bug
that only affects Linux users using nonstandard X sessions.
But maybe I'm not the only one. If your Firefox is hanging and
you found your way here, I hope I've given you some ideas.
And if anyone has a clue as to what's really happening and why
dbus would have that effect, I'd love to hear from you.
[ 20:08 Jun 09, 2013 ]
Thu, 09 Dec 2010
My article this week on Linux Planet concerns
Linux Server Troubleshooting (part 2).
It's two loosely related topics: exploring the /proc filesystem, and
how to use it to find information on a running process; and several
ways to get stack traces from Python programs.
This (as well as
part I) arose from a problem we had at work, where we use
Linux plug computers (ARM-based Linux appliances) running Python
scripts. It's not uncommon for Python networking scripts to go into
never-never-land, waiting forever on a network connection without
timing out. Since plug computers tend not to be outfitted with the
latest and greatest tools like gdb and debug versions of libraries,
we've needed to find more creative ways of figuring out what
processes are doing to make sure our programs are ready for anything.
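As a taste of the /proc approach, here's a small sketch that gathers a few things /proc exposes about a process without needing gdb or lsof. It isn't taken from the article: the function name is hypothetical, and it assumes a Linux /proc layout.

```shell
# Hypothetical helper: peek at a running process through /proc alone.
# Usage: procinfo PID
procinfo() (
    pid=$1
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; exit 1; }
    # Arguments in /proc/PID/cmdline are separated by NUL bytes.
    printf 'cmdline: %s\n' "$(tr '\000' ' ' < "/proc/$pid/cmdline")"
    # State letter: R running, S sleeping, D uninterruptible wait, ...
    grep '^State:' "/proc/$pid/status"
    # Kernel function the process is waiting in; may read 0 if it's
    # running or if the kernel hides the symbol.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Open file descriptors; network connections appear as socket:[inode].
    ls -l "/proc/$pid/fd" 2>/dev/null
    exit 0
)
```

A script stuck waiting on the network would typically show State: S with a socket among its file descriptors.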
[ 11:44 Dec 09, 2010 ]
Wed, 24 Nov 2010
How do you troubleshoot a process that's running away, sucking up
too much CPU, or not doing anything at all?
Today on Linux Planet:
Linux Servers: top and Other Basic System Tools
This is part I, covering basics like top, strace and gdb.
Part II will get into hairier stuff and tips for debugging Python.
[ 21:06 Nov 24, 2010 ]