Shallow Thoughts : tags : cmdline

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Tue, 23 May 2017

Python help from the shell -- greppable and saveable

I'm working on a project involving PyQt5 (on which, more later). One of the problems is that there's not much online documentation, and it's hard to find out details like what signals (events) each widget offers.

Like most Python packages, there is inline help in the source, which means that in the Python console you can say something like

>>> from PyQt5.QtWebEngineWidgets import QWebEngineView
>>> help(QWebEngineView)
The problem is that it's ordered alphabetically; if you want a list of signals, you need to read through all the objects and methods the class offers to look for a few one-liners that include "unbound PYQT_SIGNAL".

If only there was a way to take help(CLASSNAME) and pipe it through grep!

A web search revealed that plenty of other people have wished for this, but I didn't see any solutions. But when I tried running python -c "help(list)" it worked fine -- help isn't dependent on the console.

That means that you should be able to do something like

python -c "from sys import exit; help(exit)"

Sure enough, that worked too.

From there it was only a matter of setting up a zsh function to save on complicated typing. I set up separate aliases for python2, python3 and whatever the default python is. You can get help on builtins (pythonhelp list) or on objects in modules (pythonhelp sys.exit). The zsh suffixes :r (remove extension) and :e (extension) came in handy for separating the module name, before the last dot, and the class name, after the dot.

# Python help functions. Get help on a Python class in a
# format that can be piped through grep, redirected to a file, etc.
# Usage: pythonhelp [module.]class [module.]class ...
pythonXhelp() {
    for f in $*; do
        if [[ $f =~ '.*\..*' ]]; then
            s="from ${module} import ${obj}; help($obj)"
        $python -c $s
alias pythonhelp="pythonXhelp python"
alias python2help="pythonXhelp python2"
alias python3help="pythonXhelp python3"

So now I can type

python3help PyQt5.QtWebEngineWidgets.QWebEngineView | grep PYQT_SIGNAL
and get that list of signals I wanted.

Tags: , ,
[ 14:12 May 23, 2017    More programming | permalink to this entry | comments ]

Fri, 31 Mar 2017

Show mounted filesystems

Used to be that you could see your mounted filesystems by typing mount or df. But with modern Linux kernels, all sorts are implemented as virtual filesystems -- proc, /run, /sys/kernel/security, /dev/shm, /run/lock, /sys/fs/cgroup -- I have no idea what most of these things are except that they make it much more difficult to answer questions like "Where did that ebook reader mount, and did I already unmount it so it's safe to unplug it?" Neither mount nor df has a simple option to get rid of all the extraneous virtual filesystems and only show real filesystems. oints-filtering-non-interesting-types had some suggestions that got me started:

mount -t ext3,ext4,cifs,nfs,nfs4,zfs
mount | grep -E --color=never  '^(/|[[:alnum:]\.-]*:/)'
Another answer there says it's better to use findmnt --df, but that still shows all the tmpfs entries (findmnt --df | grep -v tmpfs might do the job).

And real mounts are always mounted on a filesystem path starting with /, so you can do mount | grep '^/'.

But it also turns out that mount will accept a blacklist of types as well as a whitelist: -t notype1,notype2... I prefer the idea of excluding a blacklist of filesystem types versus restricting it to a whitelist; that way if I mount something unusual like curlftpfs that I forgot to add to the whitelist, or I mount a USB stick with a filesystem type I don't use very often (ntfs?), I'll see it.

On my system, this was the list of types I had to disable (sheesh!):

mount -t nosysfs,nodevtmpfs,nocgroup,nomqueue,notmpfs,noproc,nopstore,nohugetlbfs,nodebugfs,nodevpts,noautofs,nosecurityfs,nofusectl

df is easier: like findmnt, it excludes most of those filesystem types to begin with, so there are only a few you need to exclude:

df -hTx tmpfs -x devtmpfs -x rootfs

Obviously I don't want to have to type either of those commands every time I want to check my mount list. SoI put this in my .zshrc. If you call mount or df with no args, it applies the filters, otherwise it passes your arguments through. Of course, you could make a similar alias for findmnt.

# Mount and df are no longer useful to show mounted filesystems,
# since they show so much irrelevant crap now.
# Here are ways to clean them up:
mount() {
    if [[ $# -ne 0 ]]; then
        /bin/mount $*

    # Else called with no arguments: we want to list mounted filesystems.
    /bin/mount -t nosysfs,nodevtmpfs,nocgroup,nomqueue,notmpfs,noproc,nopstore,nohugetlbfs,nodebugfs,nodevpts,noautofs,nosecurityfs,nofusectl

df() {
    if [[ $# -ne 0 ]]; then
        /bin/df $*

    # Else called with no arguments: we want to list mounted filesystems.
    /bin/df -hTx tmpfs -x devtmpfs -x rootfs

Update: Chris X Edwards suggests lsblk or lsblk -o 'NAME,MOUNTPOINT'. it wouldn't have solved my problem because it only shows /dev devices, not virtual filesystems like sshfs, but it's still a command worth knowing about.

Tags: ,
[ 12:25 Mar 31, 2017    More linux | permalink to this entry | comments ]

Sat, 01 Oct 2016

Zsh magic: remove all raw photos that don't have a corresponding JPEG

Lately, when shooting photos with my DSLR, I've been shooting raw mode but with a JPEG copy as well. When I triage and label my photos (with pho and metapho), I use only the JPEG files, since they load faster and there's no need to index both. But that means that sometimes I delete a .jpg file while the huge .cr2 raw file is still on my disk.

I wanted some way of removing these orphaned raw files: in other words, for every .cr2 file that doesn't have a corresponding .jpg file, delete the .cr2.

That's an easy enough shell function to write: loop over *.cr2, change the .cr2 extension to .jpg, check whether that file exists, and if it doesn't, delete the .cr2.

But as I started to write the shell function, it occurred to me: this is just the sort of magic trick zsh tends to have built in.

So I hopped on over to #zsh and asked, and in just a few minutes, I had an answer:

rm *.cr2(e:'[[ ! -e ${REPLY%.cr2}.jpg ]]':)

Yikes! And it works! But how does it work? It's cheating to rely on people in IRC channels without trying to understand the answer so I can solve the next similar problem on my own.

Most of the answer is in the zshexpn man page, but it still took some reading and jumping around to put the pieces together.

First, we take all files matching the initial wildcard, *.cr2. We're going to apply to them the filename generation code expression in parentheses after the wildcard. (I think you need EXTENDED_GLOB set to use that sort of parenthetical expression.)

The variable $REPLY is set to the filename the wildcard expression matched; so it will be set to each .cr2 filename, e.g. img001.cr2.

The expression ${REPLY%.cr2} removes the .cr2 extension. Then we tack on a .jpg: ${REPLY%.cr2}.jpg. So now we have img001.jpg.

[[ ! -e ${REPLY%.cr2}.jpg ]] checks for the existence of that jpg filename, just like in a shell script.

So that explains the quoted shell expression. The final, and hardest part, is how to use that quoted expression. That's in section 14.8.7 Glob Qualifiers. (estring) executes string as shell code, and the filename will be included in the list if and only if the code returns a zero status.

The colons -- after the e and before the closing parenthesis -- are just separator characters. Whatever character immediately follows the e will be taken as the separator, and anything from there to the next instance of that separator (the second colon, in this case) is taken as the string to execute. Colons seem to be the character to use by convention, but you could use anything. This is also the part of the expression responsible for setting $REPLY to the filename being tested.

So why the quotes inside the colons? They're because some of the substitutions being done would be evaluated too early without them: "Note that expansions must be quoted in the string to prevent them from being expanded before globbing is done. string is then executed as shell code."

Whew! Complicated, but awfully handy. I know I'll have lots of other uses for that.

One additional note: section 14.8.5, Approximate Matching, in that manual page caught my eye. zsh can do fuzzy matches! I can't think offhand what I need that for ... but I'm sure an idea will come to me.

Tags: , , ,
[ 15:28 Oct 01, 2016    More linux/cmdline | permalink to this entry | comments ]

Fri, 15 May 2015

Of file modes, umasks and fmasks, and mounting FAT devices

I have a bunch of devices that use VFAT filesystems. MP3 players, camera SD cards, SD cards in my Android tablet. I mount them through /etc/fstab, and the files always look executable, so when I ls -f them, they all have asterisks after their names. I don't generally execute files on these devices; I'd prefer the files to have a mode that doesn't make them look executable.

I'd like the files to be mode 644 (or 0644 in most programming languages, since it's an octal, or base 8, number). 644 in binary is 110 100 100, or as the Unix ls command puts it, rw-r--r--.

There's a directive, fmask, that you can put in fstab entries to control the mode of files when the device is mounted. (Here's Wikipedia's long umask article.) But how do you get from the mode you want the files to be, 644, to the mask?

The mask (which corresponds to the umask command) represent the bits you don't want to have set. So, for instance, if you don't want the world-execute bit (1) set, you'd put 1 in the mask. If you don't want the world-write bit (2) set, as you likely don't, put 2 in the mask. So that's already a clue that I'm going to want the rightmost byte to be 3: I don't want files mounted from my MP3 player to be either world writable or executable.

But I also don't want to have to puzzle out the details of all nine bits every time I set an fmask. Isn't there some way I can take the mode I want the files to be -- 644 -- and turn them into the mask I'd need to put in /etc/fstab or set as a umask?

Fortunately, there is. It seemed like it ought to be straightforward, but it took a little fiddling to get it into a one-line command I can type. I made it a shell function in my .zshrc:

# What's the complement of a number, e.g. the fmask in fstab to get
# a given file mode for vfat files? Sample usage: invertmask 755
invertmask() {
    python -c "print '0%o' % (~(0777 & 0$1) & 0777)"

This takes whatever argument I give to it -- $1 -- and takes only the three rightmost bytes from it, (0777 & 0$1). It takes the bitwise NOT of that, ~. But the result of that is a negative number, and we only want the three rightmost bytes of the result, (result) & 0777, expressed as an octal number -- which we can do in python by printing it as %o. Whew!

Here's a shorter, cleaner looking alias that does the same thing, though it's not as clear about what it's doing:

invertmask1() {
    python -c "print '0%o' % (0777 - 0$1)"

So now, for my MP3 player I can put this in /etc/fstab:

UUID=0000-009E /mp3 vfat user,noauto,exec,fmask=133,shortname=lower 0 0

Tags: ,
[ 10:27 May 15, 2015    More linux/cmdline | permalink to this entry | comments ]

Thu, 19 Feb 2015

Finding core dump files

Someone on the SVLUG list posted about a shell script he'd written to find core dumps.

It sounded like a simple task -- just locate core | grep -w core, right? I mean, any sensible packager avoids naming files or directories "core" for just that reason, don't they?

But not so: turns out in the modern world, insane numbers of software projects include directories called "core", including projects that are developed primarily on Linux so you'd think they would avoid it ... even the kernel. On my system, locate core | grep -w core | wc -l returned 13641 filenames.

Okay, so clearly that isn't working. I had to agree with the SVLUG poster that using "file" to find out which files were actual core dumps is now the only reliable way to do it. The output looks like this:

$ file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), too many program headers (375)

The poster was using a shell script, but I was fairly sure it could be done in a single shell pipeline. Let's see: you need to run locate to find any files with 'core" in the name.

Then you pipe it through grep to make sure the filename is actually core: since locate gives you a full pathname, like /lib/modules/3.14-2-686-pae/kernel/drivers/edac/edac_core.ko or /lib/modules/3.14-2-686-pae/kernel/drivers/memstick/core, you want lines where only the final component is core -- so core has a slash before it and an end-of-line (in grep that's denoted by a dollar sign, $) after it. So grep '/core$' should do it.

Then take the output of that locate | grep and run file on it, and pipe the output of that file command through grep to find the lines that include the phrase 'core file'.

That gives you lines like

/home/akkana/geology/NorCal/pinnaclesGIS/core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), too many program headers (523)

But those lines are long and all you really need are the filenames; so pass it through sed to get rid of anything to the right of "core" followed by a colon.

Here's the final command:

file `locate core | grep '/core$'` | grep 'core file' | sed 's/core:.*//'

On my system that gave me 11 files, and they were all really core dumps. I deleted them all.

Tags: ,
[ 12:54 Feb 19, 2015    More linux | permalink to this entry | comments ]

Sun, 07 Sep 2014

Dot Reminders

I read about cool computer tricks all the time. I think "Wow, that would be a real timesaver!" And then a week later, when it actually would save me time, I've long since forgotten all about it.

After yet another session where I wanted to open a frequently opened file in emacs and thought "I think I made a bookmark for that a while back", but then decided it's easier to type the whole long pathname rather than go re-learn how to use emacs bookmarks, I finally decided I needed a reminder system -- something that would poke me and remind me of a few things I want to learn.

I used to keep cheat sheets and quick reference cards on my desk; but that never worked for me. Quick reference cards tend to be 50 things I already know, 40 things I'll never care about and 4 really great things I should try to remember. And eventually they get burned in a pile of other papers on my desk and I never see them again.

My new system is working much better. I created a file in my home directory called .reminders, in which I put a few -- just a few -- things I want to learn and start using regularly. It started out at about 6 lines but now it's grown to 12.

Then I put this in my .zlogin (of course, you can do this for any shell, not just zsh, though the syntax may vary):

if [[ -f ~/.reminders ]]; then
  cat ~/.reminders

Now, in every login shell (which for me is each new terminal window I create on my desktop), I see my reminders. Of course, I don't read them every time; but I look at them often enough that I can't forget the existence of great things like emacs bookmarks, or diff <(cmd1) <(cmd2).

And if I forget the exact keystroke or syntax, I can always cat ~/.reminders to remind myself. And after a few weeks of regular use, I finally have internalized some of these tricks, and can remove them from my .reminders file.

It's not just for tech tips, either; I've used a similar technique for reminding myself of hard-to-remember vocabulary words when I was studying Spanish. It could work for anything you want to teach yourself.

Although the details of my .reminders are specific to Linux/Unix and zsh, of course you could use a similar system on any computer. If you don't open new terminal windows, you can set a reminder to pop up when you first log in, or once a day, or whatever is right for you. The important part is to have a small set of tips that you see regularly.

Tags: , ,
[ 21:10 Sep 07, 2014    More tech | permalink to this entry | comments ]

Tue, 02 Sep 2014

Using strace to find configuration file locations

I was using strace to figure out how to set up a program, lftp, and a friend commented that he didn't know how to use it and would like to learn. I don't use strace often, but when I do, it's indispensible -- and it's easy to use. So here's a little tutorial.

My problem, in this case, was that I needed to find out what configuration file I needed to modify in order to set up an alias in lftp. The lftp man page tells you how to define an alias, but doesn't tell you how to save it for future sessions; apparently you have to edit the configuration file yourself.

But where? The man page suggested a couple of possible config file locations -- ~/.lftprc and ~/.config/lftp/rc -- but neither of those existed. I wanted to use the one that already existed. I had already set up bookmarks in lftp and it remembered them, so it must have a config file already, somewhere. I wanted to find that file and use it.

So the question was, what files does lftp read when it starts up? strace lets you snoop on a program and see what it's doing.

strace shows you all system calls being used by a program. What's a system call? Well, it's anything in section 2 of the Unix manual. You can get a complete list by typing: man 2 syscalls (you may have to install developer man pages first -- on Debian that's the manpages-dev package). But the important thing is that most file access calls -- open, read, chmod, rename, unlink (that's how you remove a file), and so on -- are system calls.

You can run a program under strace directly:

$ strace lftp sitename
Interrupt it with Ctrl-C when you've seen what you need to see.

Pruning the output

And of course, you'll see tons of crap you're not interested in, like rt_sigaction(SIGTTOU) and fcntl64(0, F_GETFL). So let's get rid of that first. The easiest way is to use grep. Let's say I want to know every file that lftp opens. I can do it like this:

$ strace lftp sitename |& grep open

I have to use |& instead of just | because strace prints its output on stderr instead of stdout.

That's pretty useful, but it's still too much. I really don't care to know about strace opening a bazillion files in /usr/share/locale/en_US/LC_MESSAGES, or libraries like /usr/lib/i386-linux-gnu/

In this case, I'm looking for config files, so I really only want to know which files it opens in my home directory. Like this:

$ strace lftp sitename |& grep 'open.*/home/akkana'

In other words, show me just the lines that have either the word "open" or "read" followed later by the string "/home/akkana".

Digression: grep pipelines

Now, you might think that you could use a simpler pipeline with two greps:

$ strace lftp sitename |& grep open | grep /home/akkana

But that doesn't work -- nothing prints out. Why? Because grep, under certain circumstances that aren't clear to me, buffers its output, so in some cases when you pipe grep | grep, the second grep will wait until it has collected quite a lot of output before it prints anything. (This comes up a lot with tail -f as well.) You can avoid that with

$ strace lftp sitename |& grep --line-buffered open | grep /home/akkana
but that's too much to type, if you ask me.

Back to that strace | grep

Okay, whichever way you grep for open and your home directory, it gives:

open("/home/akkana/.local/share/lftp/bookmarks", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.netrc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/akkana/.local/share/lftp/rl_history", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.inputrc", O_RDONLY|O_LARGEFILE) = 5
Now we're getting somewhere! The file where it's getting its bookmarks is ~/.local/share/lftp/bookmarks -- and I probably can't use that to set my alias.

But wait, why doesn't it show lftp trying to open those other config files?

Using script to save the output

At this point, you might be sick of running those grep pipelines over and over. Most of the time, when I run strace, instead of piping it through grep I run it under script to save the whole output.

script is one of those poorly named, ungoogleable commands, but it's incredibly useful. It runs a subshell and saves everything that appears in that subshell, both what you type and all the output, in a file.

Start script, then run lftp inside it:

$ script /tmp/lftp.strace
Script started on Tue 26 Aug 2014 12:58:30 PM MDT
$ strace lftp sitename

After the flood of output stops, I type Ctrl-D or Ctrl-C to exit lftp, then another Ctrl-D to exit the subshell script is using. Now all the strace output was in /tmp/lftp.strace and I can grep in it, view it in an editor or anything I want.

So, what files is it looking for in my home directory and why don't they show up as open attemps?

$ grep /home/akkana /tmp/lftp.strace

Ah, there it is! A bunch of lines like this:

access("/home/akkana/.lftprc", R_OK)    = -1 ENOENT (No such file or directory)
stat64("/home/akkana/.lftp", 0xbff821a0) = -1 ENOENT (No such file or directory)
mkdir("/home/akkana/.config", 0755)     = -1 EEXIST (File exists)
mkdir("/home/akkana/.config/lftp", 0755) = -1 EEXIST (File exists)
access("/home/akkana/.config/lftp/rc", R_OK) = 0

So I should have looked for access and stat as well as open. Now I have the list of files it's looking for. And, curiously, it creates ~/.config/lftp if it doesn't exist already, even though it's not going to write anything there.

So I created ~/.config/lftp/rc and put my alias there. Worked fine. And I was able to edit my bookmark in ~/.local/share/lftp/bookmarks later when I had a need for that. All thanks to strace.

Tags: , ,
[ 13:06 Sep 02, 2014    More linux/cmdline | permalink to this entry | comments ]

Thu, 28 Aug 2014

Debugging a mysterious terminal setting

For the last several months, I repeatedly find myself in a mode where my terminal isn't working quite right. In particular, Ctrl-C doesn't work to interrupt a running program. It's always in a terminal where I've been doing web work. The site I'm working on sadly has only ftp access, so I've been using ncftp to upload files to the site, and git and meld to do local version control on the copy of the site I keep on my local machine. I was pretty sure the problem was coming from either git, meld, or ncftp, but I couldn't reproduce it.

Running reset fixed the problem. But since I didn't know what program was causing the problem, I didn't know when I needed to type reset.

The first step was to find out which of the three programs was at fault. Most of the time when this happened, I wouldn't notice until hours later, the next time I needed to stop a program with Ctrl-C. I speculated that there was probably some way to make zsh run a check after every command ... if I could just figure out what to check.

Terminal modes and stty -a

It seemed like my terminal was getting put into raw mode. In programming lingo, a terminal is in raw mode when characters from it are processed one at a time, and special characters like Ctrl-C, which would normally interrupt whatever program is running, are just passed like any other character.

You can list your terminal modes with stty -a:

$ stty -a
speed 38400 baud; rows 32; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = ;
eol2 = ; swtch = ; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff
-iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig icanon -iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke

But that's a lot of information. Unfortunately there's no single flag for raw mode; it's a collection of a lot of flags. I checked the interrupt character: yep, intr = ^C, just like it should be. So what was the problem?

I saved the output with stty -a >/tmp/stty.bad, then I started up a new xterm and made a copy of what it should look like with stty -a >/tmp/stty.good. Then I looked for differences: meld /tmp/stty.good /tmp/stty.bad. I saw these flags differing in the bad one: ignbrk ignpar -iexten -ixon, while the good one had -ignbrk -ignpar iexten ixon. So I should be able to run:

$ stty -ignbrk -ignpar iexten ixon
and that would fix the problem. But it didn't. Ctrl-C still didn't work.

Setting a trap, with precmd

However, knowing some things that differed did give me something to test for in the shell, so I could test after every command and find out exactly when this happened. In zsh, you do that by defining a precmd function, so here's what I did:

    stty -a | fgrep -- -ignbrk > /dev/null
    if [ $? -ne 0 ]; then
        echo "STTY SETTINGS HAVE CHANGED \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!"
Pardon all the exclams. I wanted to make sure I saw the notice when it happened.

And this fairly quickly found the problem: it happened when I suspended ncftp with Ctrl-Z.

stty sane and isig

Okay, now I knew the culprit, and that if I switched to a different ftp client the problem would probably go away. But I still wanted to know why my stty command didn't work, and what the actual terminal difference was.

Somewhere in my web searching I'd stumbled upon some pages suggesting stty sane as an alternative to reset. I tried it, and it worked.

According to man stty, stty sane is equivalent to

$ stty cread -ignbrk brkint -inlcr -igncr icrnl -iutf8 -ixoff -iuclc -ixany  imaxbel opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0 isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke

Eek! But actually that's helpful. All I had to do was get a bad terminal (easy now that I knew ncftp was the culprit), then try:

$ stty cread 
$ stty -ignbrk 
$ stty brkint
... and so on, trying Ctrl-C each time to see if things were back to normal. Or I could speed up the process by grouping them:
$ stty cread -ignbrk brkint
$ stty -inlcr -igncr icrnl -iutf8 -ixoff
... and so forth. Which is what I did. And that quickly narrowed it down to isig. I ran reset, then ncftp again to get the terminal in "bad" mode, and tried:
$ stty isig
and sure enough, that was the difference.

I'm still not sure why meld didn't show me the isig difference. But if nothing else, I learned a bit about debugging stty settings, and about stty sane, which is a much nicer way of resetting the terminal than reset since it doesn't clear the screen.

Tags: , ,
[ 15:41 Aug 28, 2014    More linux | permalink to this entry | comments ]

Sat, 28 Dec 2013

Finding filenames in a disorganized directory

I've been scanning a bunch of records with Audacity (using as a guide Carla Schroder's excellent Book of Audacity and a Behringer UCA222 USB audio interface -- audacity doesn't seem able to record properly from the built-in sound card on any laptop I own, while it works fine with the Behringer.

Audacity's user interface isn't great for assembly-line recording of lots of tracks one after the other, especially on a laptop with a trackpad that doesn't work very well, so I wasn't always as organized with directory names as I could have been, and I ended up with a mess. I was periodically backing up the recordings to my desktop, but as I shifted from everything-in-one-directory to an organized system, the two directories got out of sync.

To get them back in sync, I needed a way to answer this question: is every file inside directory A (maybe in some subdirectory of it) also somewhere under subdirectory B? In other words, can I safely delete all of A knowing that anything in it is safely stored in B, even though the directory structures are completely different?

I was hoping for some clever find | xargs way to do it, but came up blank. So eventually I used a little zsh loop: one find to get the list of files to test, then for each of those, another find inside the target directory, then test the exit code of find to see if it found the file. (I'm assuming that if the songname.aup file is there, the songname_data directory is too.)

for fil in $(find AAA/ -name '*.aup'); do
  fil=$(basename $fil)
  find BBB -name $fil >/dev/null
  if [[ $? != 0 ]]; then
    echo $fil is not in BBB

Worked fine. But is there an easier way?

Tags: , , ,
[ 10:36 Dec 28, 2013    More linux/cmdline | permalink to this entry | comments ]

Mon, 07 Oct 2013

Viewing HTML mail messages from Mutt (or other command-line mailers)

Command-line mailers like mutt have one disadvantage: viewing HTML mail with embedded images. Without images, HTML mail is no problem -- run it through lynx, links or w3m. But if you want to see images in place, how do you do it?

Mutt can send a message to a browser like firefox ... but only the textual part of the message. The images don't show up.

That's because mail messages include images, not as separate files, but as attachments within the same file, encoded it a format known as MIME (Multipurpose Internet Mail Extensions). An image link in the HTML, instead of looking like <img src="picture.jpg">., will instead look something like <img src="cid:0635428E-AE25-4FA0-93AC-6B8379300161">. (Apple's or <img src="">. (Yahoo's webmail).

CID stands for Content ID, and refers to the ID of the image as it is encoded in MIME inside the image. GUI mail programs, of course, know how to decode this and show the image. Mutt doesn't.

A web search finds a handful of shell scripts that use the munpack program (part of the mpack package on Debian systems) to split off the files; then they use various combinations of sed and awk to try to view those files. Except that none of the scripts I found actually work for messages sent from modern mailers -- they don't decode the CID links properly.

I wasted several hours fiddling with various shell scripts, trying to adjust sed and awk commands to figure out the problem, when I had the usual epiphany that always eventually arises from shell script fiddling: "Wouldn't this be a lot easier in Python?"

Python's email package

Python has a package called email that knows how to list and unpack MIME attachments. Starting from the example near the bottom of that page, it was easy to split off the various attachments and save them in a temp directory. The key is

import email

fp = open(msgfile)
msg = email.message_from_file(fp)

for part in msg.walk():

That left the problem of how to match CIDs with filenames, and rewrite the links in the HTML message accordingly.

The documentation on the email package is a bit unclear, unfortunately. For instance, they don't give any hints what object you'll get when iterating over a message with walk, and if you try it, they're just type 'instance'. So what operations can you expect are legal on them? If you run help(part) in the Python console on one of the parts you get from walk, it's generally class Message, so you can use the Message API, with functions like get_content_type(), get_filename(). and get_payload().

More useful, it has dictionary keys() for the attributes it knows about each attachment. part.keys() gets you a list like

 'Content-Disposition' ]

So by making a list relating part.get_filename() (with a made-up filename if it doesn't have one already) to part['Content-ID'], I'd have enough information to rewrite those links.

Case-insensitive dictionary matching

But wait! Not so simple. That list is from a Yahoo mail message, but if you try keys() on a part sent by Apple mail, instead if will be 'Content-Id'. Note the lower-case d, Id, instead of the ID that Yahoo used.

Unfortunately, Python doesn't have a way of looking up items in a dictionary with the key being case-sensitive. So I used a loop:

    for k in part.keys():
        if k.lower() == 'content-id':
            print "Content ID is", part[k]

Most mailers seem to put angle brackets around the content id, so that would print things like "Content ID is <>". Those angle brackets have to be removed, since the CID links in the HTML file don't have them.

for k in part.keys():
    if k.lower() == 'content-id':
        if part[k].startswith('<') and part[k].endswith('>'):
            part[k] = part[k][1:-1]

But that didn't work -- the angle brackets were still there, even though if I printed part[k][1:-1] it printed without angle brackets. What was up?

Unmutable parts inside email.Message

It turned out that the parts inside an email Message (and maybe the Message itself) are unmutable -- you can't change them. Python doesn't throw an exception; it just doesn't change anything. So I had to make a local copy:

for k in part.keys():
    if k.lower() == 'content-id':
        content_id = part[k]
        if content_id.startswith('<') and content_id.endswith('>'):
            content_id = content_id[1:-1]
and then save content_id, not part[k], in my list of filenames and CIDs.

Then the rest is easy. Assuming I've built up a list called subfiles containing dictionaries with 'filename' and 'Content-Id', I can do the substitution in the HTML source:

    htmlsrc = html_part.get_payload(decode=True)
    for sf in subfiles:
        htmlsrc = re.sub('cid: ?' + sf['Content-Id'],
                         'file://' + sf['filename'],
                         htmlsrc, flags=re.IGNORECASE)

Then all I have to do is hook it up to a key in my .muttrc:

# macro  index  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/\n" "View HTML in browser"
# macro  pager  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/\n" "View HTML in browser"

Works nicely! Here's the complete script: viewhtmlmail.

Tags: , , , , ,
[ 11:49 Oct 07, 2013    More tech/email | permalink to this entry | comments ]

Sat, 24 Aug 2013

A nifty shell redirection trick: process substitution

I love shell pipelines, and flatter myself that I'm pretty good at them. But a discussion last week on the Linuxchix Techtalk mailing list on finding added lines in a file turned up a terrific bash/zsh shell redirection trick I'd never seen before:

join -v 2 <(sort A.txt) <(sort B.txt)

I've used backquotes, and their cognate $(), plenty. For instance, you can do things like PS1=$(hostname): or PS1=`hostname`: to set your prompt to the current hostname: the shell runs the hostname command, takes its output, and substitutes that output in place of the backquoted or parenthesized expression.

But I'd never seen that <(...) trick before, and immediately saw how useful it was. Backquotes or $() let you replace arguments to a command with a program's output -- they're great for generating short strings for programs that take all their arguments on the command line. But they're no good for programs that need to read a file, or several files. <(...) lets you take the output of a command and pass it to a program as though it was the contents of a file. And if you can do it more than once in the same command -- as in Little Girl's example -- that could be tremendously useful.

Playing with it to see if it really did what it looked like it did, and what other useful things I could do with it, I tried this (and it worked just fine):

$ diff <(echo hello; echo there) <(echo hello; echo world)
< there
> world
It acts as though I had two files, which each have "hello" as their first line; but one has "there" as the second line, while the other has "world". And diff shows the difference. I don't think there's any way of doing anything like that with backquotes; you'd need to use temp files.

Of course, I wanted to read more about it -- how have I gone all these years without knowing about this? -- and it looks like I'm not the only one who didn't know about it. In fact, none of the pages I found on shell pipeline tricks even mentioned it.

It turns out it's called "process substitution" and I found it documented in Chapter 23 of the Advanced Bash-Scripting Guide.

I tweeted it, and a friend who is a zsh master gave me some similar cool tricks. For instance, in zsh echo hi > >(cat) > >(cat -n) lets you pipe the output of a command to more than one other command.

That's zsh, but in bash (or zsh too, of course), you can use >() and tee to do the same thing: echo hi | tee >(cat) | cat -n

If you want a temp file to be created automatically, one you can both read and write, you can use =(foo) (zsh only?)

Great stuff! Some other pages that discuss some of these tricks:

Tags: , , ,
[ 19:23 Aug 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 24 Jul 2013

Yet more on that comma-inserting regexp, plus a pattern to filter unprintable characters

One more brief followup on that comma inserting sed pattern and its followup:

$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'

In the second article, I'd mentioned that the hardest part of the exercise was figuring out where we needed backslashes. Devdas (f3ew) asked on Twitter whether I would still need all the backslash escapes even if I put the pattern in a file -- in other worse, are the backslashes merely to get the shell to pass special characters unchanged?

A good question, and I suspected the need for some of the backslashes would disappear. So I tried this:

$ echo ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   
$ echo 20130607215015 | sed -f /tmp/commas

And it didn't work. No commas were inserted.

The problem, it turns out, is that my shell, zsh, changed both instances of \b to an ASCII backspace, ^H. Editing the file fixes that, and so does

$ echo -E ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   

But that only applies to echo: zsh doesn't do the \b -> ^H substitution in the original command, where you pass the string directly as a sed argument.

Okay, with that straightened out, what about Devdas' question?

Surprisingly, it turns out that all the backslashes are still needed. None of them go away when you echo > file, so they weren't there just to get special characters past the shell; and if you edit the file and try removing some of the backslashes, you'll see that the pattern no longer works. I had thought at least some of them, like the ones before the \{ \}, were extraneous, but even those are still needed.

Filtering unprintable characters

As long as I'm writing about regular expressions, I learned a nice little tidbit last week. I'm getting an increasing flood of Asian-language spams which my mail ISP doesn't filter out (they use spamassassin, which is pretty useless for this sort of filtering). I wanted a simple pattern I could pass to egrep (via procmail) that would filter out anything with a run of more than 4 unprintable characters in a row. [^[:print:]]{4,} should do it, but it wasn't working.

The problem, it turns out, is the definition of what's printable. Apparently when the default system character set is UTF-8, just about everything is considered printable! So the trick is that you need to set LC_ALL to something more restrictive, like C (which basically means ASCII) to before :print: becomes useful for language-based filtering. (Thanks to Mikachu for spotting the problem).

So in a terminal, you can do something like

LC_ALL=C egrep -v '[^[:print:]]' filename

In procmail it was a little harder; I couldn't figure out any way to change LC_ALL from a procmail recipe; the only solution I came up with was to add this to ~/.procmailrc:

export LC_ALL=C

It does work, though, and has cut the spam load by quite a bit.

Tags: , , , ,
[ 19:35 Jul 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Tue, 09 Jul 2013

Sed: insert commas into numbers, but in a smarter way

A few days ago I wrote about a nifty sed script to insert commas into numbers that I dissected with the help of Dana Jansens.

Once we'd figured it out, though, Dana thought this wasn't really the best solution. For instance, what if you have a file that has some numbers in it, but also has some digits mixed up with letters? Do you really want to insert commas into every string of digits? What if you have some license plates, like abc1234? Maybe it would be better to restrict the change to digits that stand by themselves and are obviously meant to be numbers. How much harder would that be?

More regexp fun! We kicked it around a bit, and came up with a solution:

$ echo abc20130607215015 | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
$ echo abc20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'   

Breaking that down: \b is any word boundary -- you could also use \< to indicate that it's the start of a word, much like \> was the end of a word.

\([0-9]\+\) is any string of one or more digits, taken as a group. The \( \) part marks it as a group so we'll be able to use it later.

\([0-9]\{3\}\) is a string of exactly three digits: again, we're using \( \) to mark it as our second numbered group.

\b is another word boundary (we could use \>), to indicate that the group of three digits must come at the end of a word, with only whitespace or punctuation following it.

/\1,\2/: once we've matched the pattern -- a word break, one or more digits, three digits and another word break -- we'll replace it with this. \1 matches the first group we found -- that was the string of one or more digits. \2 matches the second group, the final trio of digits. And there's a comma in between. We use the same :a; ;ta trick as in the first example to loop around until there are no more triplets to match.

The hardest part of this was figuring out what needed to be escaped with backslashes. The one that really surprised me was the \+. Although * works in sed the same way it does in other programs, matching zero or more repetitions of the preceding pattern, sed uses \+ rather than + for one or more repetitions. It took us some fiddling to find all the places we needed backslashes.

Tags: , ,
[ 21:16 Jul 09, 2013    More linux/cmdline | permalink to this entry | comments ]

Sun, 07 Jul 2013

Inserting commas into numbers with sed

Carla Schroder's recent article, More Great Linux Awk, Sed, and Bash Tips and Tricks , had a nifty sed command I hadn't seen before to take a long number and insert commas appropriately:

sed -i ':a;s/\B[0-9]\{3\}\gt;/,&/;ta' numbers.txt
. Or, if you don't have a numbers.txt file, you can do something like
echo 20130607215015 | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
(I dropped the -i since that's for doing in-place edits of a file).

Nice! But why does it work? It would be easy enough to insert commas after every third number, but that doesn't work unless the number of digits is a multiple of three. In other words, you don't want 20130607215015 to become 201,306,072,150,15 (note how the last group only has two digits); it has to count in threes from the right if you want to end up with 20,130,607,215,015.

Carla's article didn't explain it, and neither did any of the other sites I found that mentioned this trick.

So, with some help from regexp wizard Dana Jansens (of OpenBox fame), I've broken it down into more easily understood bits.

Labels and loops

The first thing to understand is that this is actually several sed commands. I was familiar with sed's basic substitute command, s/from/to/. But what's the rest of it? The semicolons separate the commands, so the whole sed script is:


What this does is set up a label called a. It tries to do the substitute command, and if the substitute succeeds (if something was changed), then ta tells it to loop back around to label a, the beginning of the script.

So let's look at that substitute command.

The substitute

Sed's s/from/to/ (like the equivalent command in vim and many other programs) looks for the first instance of the from pattern and replaces it with the to pattern. So we're searching for \B[0-9]\{3\}\> and replacing it with ,&/

Clear as mud, right? Well, the to pattern is easy: & matches whatever we just substituted (from), so this just sticks a comma in front of ... something.

The from pattern, \B[0-9]\{3\}\>, is a bit more challenging. Let's break down the various groups:

Matches anything that is not a word boundary.
Matches any digit.
Matches three repetitions of whatever precedes it (in this case, a digit).
Matches a word boundary at the end of a word. This was the hardest part to figure out, because no sed documentation anywhere bothers to mention this pattern. But Dana knew it as a vim pattern, and it turns out it does the same thing in sed even though the docs don't say so.

Okay, put them together, and the whole pattern matches any three digits that are not preceded by a word boundary but which are at the end of a word (i.e. they're followed by a word boundary).

Cool! So in our test number, 20130607215015, this matches the last three digits, 015. It doesn't match any of the other digits because they're not followed by a word end boundary.

So the substitute will insert a comma before the last three numbers. Let's test that:

$ echo 20130607215015 | sed 's/\B[0-9]\{3\}\>/,&/'

Sure enough!

How the loop works

So the substitution pattern just adds the last comma. Once the comma is inserted, the ta tells sed to go back to the beginning (label :a) and do it again.

The second time, the comma that was just inserted is now a word boundary, so the pattern matches the three digits before the comma, 215, and inserts another comma before them. Let's make sure:

$ echo 20130607215,015 | sed 's/\B[0-9]\{3\}\>/,&/'

So that's how the pattern manages to match triplets from right to left.

Dana later commented that this wasn't really the best solution -- what if the string of digits is attached to other characters and isn't really a number? I'll cover that in a separate article in a few days. Update: Here's the smarter pattern, Sed: insert commas into numbers, but in a smarter way.

Tags: , ,
[ 14:14 Jul 07, 2013    More linux/cmdline | permalink to this entry | comments ]

Tue, 26 Mar 2013

Pasting the Primary X Selection into Firefox

Sometimes I need to take a URL from some text app -- like a shell window, or an IRC comment -- and open it in Firefox.

If it's a standard http://, that's trivial: I highlight the URL with my house (often a doubleclick will do it), go to my Firefox window and middleclick somewhere in the content area, anywhere that's not a link, and Firefox goes to the URL.

That works because selecting anything, in X, copies the selection to the Primary selection buffer. The Primary selection is different from the Clipboard selection that's used with Windows and Mac style Ctrl-X/Ctrl-C/Ctrl-V copy and paste; it's faster and doesn't require switching between keyboard and mouse. Since your hand is already on the mouse (from selecting the text), you don't have to move to the keyboard to type Ctrl-C, then back to the mouse to go to the Firefox window, then back to the keyboard to type Ctrl-V.

But it fails in some cases. Like when someone says in IRC, "There's a great example of that at". You can highlight and middleclick in Firefox all you want, but Firefox doesn't recognize it as a URL and won't go there. Or if I want to highlight a couple of search terms and pass them into a Google search.

(Rant: middlemouse used to work for these cases, but it was disabled -- without even an option for getting it back -- due to a lot of whining in bugzilla by people coming from Windows backgrounds who didn't like middleclick paste because they found it unexpected, yet who weren't willing to turn it off with the middlemouse.contentLoadURL pref).

So in those cases, what I've been doing is:

It works, but it's a lot more steps, and entails several switches between keyboard and mouse. Frustrating!

It would be a little less frustrating if I had a key binding in Firefox that said "Paste the current X primary selection." A web search shows that quite a few other people have been bothered by this problem -- for instance, here and here -- but without any solutions. Apparently in a lot of apps, Ctrl-Insert inserts the Primary X selection -- but in Firefox and a few others, it inserts the Clipboard instead, just like Ctrl-C.

I could write my own fix, by unzipping Firefox's omni.ja file and editing various .xul and .js files inside it. But if I were doing that, I could just as easily revert Firefox's original behavior of going to the link. Neither of these is difficult; the problem is that every time I Firefox updates (which is about twice a week these days), things break until I manually go in and unzip the jar and make my changes again. I used to do that, but I got tired of needing to do it so often. And I tried to do it via a Firefox extension, until Mozilla changed the Firefox extension API so that extensions couldn't modify key bindings any more.

Since Firefox changes so often, it's nicer to have a solution that's entirely outside of Firefox. And a comment in one of those discussion threads gave me an idea: make a key binding in my window manager that uses xset to copy the primary selection to the clipboard, then use my crikey program to insert a fake Ctrl-V that Firefox will see.

Here's a command to do that:

xsel -o -p | xsel -i -b; crikey -s 1 "^V"

xsel -o prints a current X selection, and -p specifies the Primary. xsel -i sets an X selection to whatever it gets on standard input (which in this case will be whatever was in the Primary selection), and -b tells it to set the Clipboard selection.

Then crikey -s 1 "^V" waits one second (I'll probably reduce this after more testing) and then generates an X event for a Ctrl-V.

I bound that command to Ctrl-Insert in my window manager, Openbox, like this:

<keybind key="C-Insert">
  <action name="Execute">
    <execute>/bin/sh -c 'xsel -o -p | xsel -i -b; crikey -s 1 "^V"'</execute>
Openbox didn't seem happy with the pipe, so I wrapped the whole thing in a sh -c.

Now, whenever I type Ctrl-Insert, whatever program I'm in will do a Ctrl-V but insert the Primary selection rather than the Clipboard. It should work in other recalcitrant programs, like LibreOffice, as well. In Firefox, now, I just have to type Ctrl-L Ctrl-Insert Return.

Of course, it's easy enough to make a binding specific to Firefox that does the Ctrl-L and the Return automatically. I've bound that to Alt-Insert, and its execute line looks like this:

    <execute>/bin/sh -c 'xsel -o -p | xsel -i -b; crikey -s 1 "^L^V\\n"'</execute>

Fun with Linux! Now the only hard part will be remembering to use the bindings instead of doing things the hard way.

Tags: , , , ,
[ 20:35 Mar 26, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 15 Aug 2012

Getting ls to show symlinks (and stripping terminal slashes in shells)

The Linux file listing program, ls, has been frustrating me for some time with its ever-changing behavior on symbolic links.

For instance, suppose I have a symlink named Maps that points to a directory on another disk called /data/Maps. If I say ls ~/Maps, I might want to see where the link points:

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
or I might equally want to see the contents of the /data/Maps directory.

Many years ago, the Unix ls program magically seemed to infer when I wanted to see the link and what it points to, versus when I wanted to see the contents of the directory the link points to. I'm not even sure any more what the rule was; just that I was always pleasantly surprised that it did what I wanted. Now, in modern Linux, it usually manages to do the opposite of what I want. But the behavior has changed several times until, I confess, I'm no longer even sure of what I want it to do.

So if I'm not sure whether I usually want it to show the symlink or follow it ... why not make it do both?

There's no ls flag that will do that. But that's okay -- I can make a shell function to do what I want..

Current ls flags

First let's review man ls to see the relevant flags we do have, searching for the string "deref".

I find three different flags to tell ls to dereference a link: -H (dereference any link explicitly mentioned on the command line -- even though ls does that by default); --dereference-command-line-symlink-to-dir (do the same if it's a directory -- even though -H already does that, and even though ls without any flags also already does that); and -L (dereference links even if they aren't mentioned on the command line). The GNU ls maintainers are clearly enamored with dereferencing symlinks.

In contrast, there's one flag, -d, that says not to dereference links (when used in combination with -l). And -d isn't useful in general (you can't make it part of a normal ls alias) because -d also has another, more primary meaning: it also prevents you from listing the contents of normal, non-symlinked directories.

Solution: a shell function

Let's move on to the problem of how to show both the link information and the dereferenced file.

Since there's no ls flag to do it, I'll have to do it by looping over the arguments of my shell function. In a shell test, you can use -h to tell if a file is a symlink. So my first approach was to call ls -ld on all the symlinks to show what the point to:

ll() {
    /bin/ls -laFH $*
    for f in $*; do
        if [[ -h $f ]]; then
            echo -n Symlink:
            /bin/ls -ld $f

Terminally slashed

That worked on a few simple tests. But when I tried to use it for real I hit another snag: terminal slashes.

In real life, I normally run this with autocompletion. I don't type ll ~/Maps -- I'm more likely to type like ll Ma<tab> -- the tab looks for files beginning with Ma and obligingly completes it as Maps/ -- note the slash at the end.

And, well, it turns out /bin/ls -ld Maps/ no longer shows the symlink, but derefernces it instead -- yes, never mind that the man page says -d won't dereference symlinks. As I said, those ls maintainers really love dereferencing.

Okay, so if I want to not dereference, since there's no ls flag that means really don't dereference, I mean it -- my little zsh function needs to find a way of stripping any terminal slash on each directory name. Of course, I could do it with sed:

        f=`echo $f | sed 's/\/$//'`
and that works fine, but ... ick. Surely zsh has a better way?

In fact, there's a better way that even works in bash (thanks to zsh wizard Mikachu for this gem):


That "remove terminal slash" trick has already come in handy in a couple of other shell functions I use -- definitely a useful trick if you use autocompletion a lot.

Making the link line more readable

But wait: one more tweak, as long as I'm tweaking. That long ls -ld line,

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
is way too long and full of things I don't really care about (the permissions, ownership and last-modified date on a symlink aren't very interesting). I really only want the last three words,
Maps -> /data/Maps/

Of course I could use something like awk to get that. But zsh has everything -- I bet it has a clever way to separate words.

And indeed it does: arrays. The documentation isn't very clear and not all the array functions worked as the docs implied, but here's what ended up working: you can set an array variable by using parentheses after the equals sign in a normal variable-setting statement, and after that, you can refer to it using square brackets. You can even use negative indices, like in python, to count back from the end of an array. That made it easy to do what I wanted:

            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[-3,-1]

Hooray zsh! Though it turned out that -3 didn't work for directories with spaces in the name, so I had to use [9, -1] instead. The echo -E is to prevent strange things happening if there are things like backslashes in the filename.

The completed shell function

I moved the symlink-showing function into a separate function, so I can call it from several different ls aliases, and here's the final result:

show_symlinks() {
    for f in $*; do
        # Remove terminal slash.
        if [[ -h $f ]]; then
            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[9,-1]

ll() {
    /bin/ls -laFH $*
    show_symlinks $*

Bash doesn't have arrays like zsh, so replace those two lines with

            echo -n 'Symlink: '
            /bin/ls -ld $f | cut -d ' ' -f 10-
and the rest of the function should work just fine.

Tags: , , ,
[ 20:22 Aug 15, 2012    More linux/cmdline | permalink to this entry | comments ]

Sat, 24 Mar 2012

Find out what processes are making network connections

A thread on the Ubuntu-devel-discuss mailing list last month asked about how to find out what processes are making outgoing network connectsion on a Linux machine. It referenced Ubuntu bug 820895: Log File Viewer does not log "Process Name", which is specific to Ubuntu's iptables logging of apps that are already blocked in iptables ... but the question goes deeper.

Several years ago, my job required me to use a program -- never mind which one -- from a prominent closed-source company. This program was doing various annoying things in addition to its primary task -- operations that got around the window manager and left artifacts all over my screen, operations that potentially opened files other than the ones I asked it to open -- but in addition, I noticed that when I ran the program, the lights on the DSL modem started going crazy. It looked like the program was making network connections, when it had no reason to do that. Was it really doing that?

Unfortunately, at the time I couldn't find any Linux command that would tell me the answer. As mentioned in the above Ubuntu thread, there are programs for Mac and even Windows to tell you this sort of information, but there's no obvious way to find out on Linux.

The discussion ensuing in the ubuntu-devel-discuss thread tossed around suggestions like apparmor and selinux -- massive, complex ways of putting up fortifications your whole system. But nobody seemed to have a simple answer to how to find information about what apps are making network connections.

Well, it turns out there are a a couple ofsimple way to get that list. First, you can use ss:

$ ss -tp
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port   
ESTAB      0      0                     ::1:58466                  ::1:ircd     users:(("xchat",1063,43))
ESTAB      0      0        users:(("xchat",1063,36))
ESTAB      0      0                     ::1:ircd                   ::1:58466    users:(("bitlbee",1076,10))
ESTAB      0      0        users:(("xchat",1063,24))
ESTAB      0      0   

Update: you might also want to add listening connections where programs are listening for incoming connections: ss -tpla
Though this may be less urgent if you have a firewall in place.

-t shows only TCP connections (so you won't see all the interprocess communication among programs running on your machine). -p prints the process associated with each connection.

ss can do some other useful things, too, like show all the programs connected to your X server right now, or show all your ssh connections. See man ss for examples.

Or you can use netstat:

$ netstat -A inet -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 imbrium.timochari:51800 linuxchix.osuosl.o:ircd ESTABLISHED 1063/xchat      
tcp        0      0 imbrium.timochari:59011 ec2-107-21-74-122.:ircd ESTABLISHED 1063/xchat      
tcp        0      0 imbrium.timochari:54253 ESTABLISHED 1063/xchat      
tcp        0      0 imbrium.timochari:58158 s3-1-w.amazonaws.:https ESTABLISHED

In both cases, the input is a bit crowded and hard to read. If all you want is a list of processes making connections, that's easy enough to do with the usual Unix utilities like grep and sed:

$ ss -tp | grep -v Recv-Q | sed -e 's/.*users:(("//' -e 's/".*$//' | sort | uniq
$ netstat -A inet -p | grep '^tcp' | grep '/' | sed 's_.*/__' | sort | uniq

Finally, you can keep an eye on what's going on by using watch to run one of these commands repeatedly:

watch ss -tp

Using watch with one of the pipelines to print only process names is possible, but harder since you have to escape a lot of quotation marks. If you want to do that, I recommend writing a script.

And back to the concerns expressed on the Ubuntu thread, you could also write a script to keep logs of which processes made connections over the course of a day. That's definitely a tool I'll keep in my arsenal.

Tags: , , ,
[ 12:28 Mar 24, 2012    More linux | permalink to this entry | comments ]

Tue, 03 Jan 2012

Open the X selection in a browser window, from any desktop

Like most Linux users, I use virtual desktops. Normally my browser window is on a desktop of its own.

Naturally, it often happens that I encounter a link I'd like to visit while I'm on a desktop where the browser isn't visible. From some apps, I can click on the link and have it show up. But sometimes, the link is just text, and I have to select it, change to the browser desktop, paste the link into firefox, then change desktops again to do something else while the link loads.

So I set up a way to load whatever's in the X selection in firefox no matter what desktop I'm on.

In most browsers, including firefox, you can tell your existing browser window to open a new link from the command line: firefox opens that link in your existing browser window if you already have one up, rather than starting another browser. So the trick is to get the text you've selected.

At first, I used a program called xclip. You can run this command: firefox `xclip -o` to open the selection. That worked okay at first -- until I hit my first URL in weechat that was so long that it was wrapped to the next line. It turns out xclip does odd things with multi-line output; depending on whether it thinks the output is a terminal or not, it may replace the newline with a space, or delete whatever follows the newline. In any case, I couldn't find a way to make it work reliably when pasted into firefox.

After futzing with xclip for a little too long, trying to reverse-engineer its undocumented newline behavior, I decided it would be easier just to write my own X clipboard app in Python. I already knew how to do that, and it's super easy once you know the trick:

mport gtk
primary = gtk.clipboard_get(gtk.gdk.SELECTION_PRIMARY)
if primary.wait_is_text_available() :
    print primary.wait_for_text()

That just prints it directly, including any newlines or spaces. But as long as I was writing my own app, why not handle that too?

It's not entirely necessary on Firefox: on Linux, Firefox has some special code to deal with pasting multi-line URLs, so you can copy a URL that spans multiple lines, middleclick in the content area and things will work. On other platforms, that's disabled, and some Linux distros disable it as well; you can enable it by going to about:config and searching for single, then setting the preference editor.singlelinepaste.pasteNewlines to 2.

However, it was easy enough to make my Python clipboard app do the right thing so it would work in any browser. I used Python's re (regular expressions) module:

#!/usr/bin/env python

import gtk
import re

primary = gtk.clipboard_get(gtk.gdk.SELECTION_PRIMARY)

if not primary.wait_is_text_available() :
s = primary.wait_for_text()

# eliminate newlines, and any spaces immediately following a newline:
print re.sub(r'[\r\n]+ *', '', s)

That seemed to work fine, even on long URLs pasted from weechat with newlines and spaces, like that looked like

All that was left was binding it so I could access it from anywhere. Of course, that varies depending on your desktop/window manager. In Openbox, I added two items to my desktop menu in menu.xml:

  <item label="open selection in Firefox">
    <action name="Execute"><execute>sh -c 'firefox `xclip -o`'</execute></action>
  <item label="open selection in new tab">
    <action name="Execute"><execute>sh -c 'firefox -new-tab `xclip -o`'</execute></action>

I also added some code in rc.xml inside <context name="Desktop">, so I can middle-click or control-middle-click on the desktop to open a link in the browser:

      <mousebind button="Middle" action="Press">
        <action name="Execute">
          <execute>sh -c 'firefox `pyclip`'</execute>
      <mousebind button="C-Middle" action="Press">
        <action name="Execute">
          <execute>sh -c -new-tab 'firefox `pyclip`'</execute>

I set this up maybe two hours ago and I've probably used it ten or fifteen times already. This is something I should have done long ago!

Tags: , , ,
[ 22:37 Jan 03, 2012    More linux | permalink to this entry | comments ]

Sun, 18 Dec 2011

Convert patterns in only some lines to title case

A friend had a fun problem: she had some XML files she needed to import into GNUcash, but the program that produced them left names in all-caps and she wanted them more readable. So she'd have a file like this:


   <MEMO>SOME COMPANY    ANY TOWN   CA 11-25-11 330346
and wanted to change the NAME and MEMO lines to read Some Company and Any Town. However, the tags, like <NAME>, all had to remain upper case, and presumably so did strings like DEBIT. How do you change just the NAME and MEMO lines from upper case to title case?

The obvious candidate to do string substitutes is sed. But there are several components to the problem.


First, how do you ensure the replacement only happens on lines with NAME and MEMO?

sed lets you specify address ranges for just that purpose. If you say sed 's/xxx/yyy/' sed will change all xxx's to yyy; but if you say sed '/NAME/s/xxx/yyy/' then sed will only do that substitution on lines containing NAME.

But we need this to happen on lines that contain either NAME or MEMO. How do you do that? With \|, like this: sed '/\(NAME\|MEMO\)/s/xxx/yyy/'

Converting to title case

Next, how do you convert upper case to lower case? There's a sed command for that: \L. Run sed 's/.*/\L&/' and type some upper and lower case characters, and they'll all be converted to lower-case.

But here we want title case -- we want most of each word converted to lowercase, but the first letter should stay uppercase. That means we need to detect a word and figure out which is the first letter.

In the strings we're considering, a word is a set of letters A through Z with one of the following characteristics:

  1. It's preceded by a space
  2. It's preceded by a close-angle-bracket, >

So the pattern /[ >][A-Z]*/ will match anything we consider a word that might need conversion.

But we need to separate the first letter and the rest of the word, so we can treat them separately. sed's \( \) operators will let us do that. The pattern \([ >][A-Z]\) finds the first letter of a word (including the space or > preceding it), and saves that as its first matched pattern, \1. Then \([A-Z]*\) right after it will save the rest of the word as \2.

So, taking our \L case converter, we can convert to title case like this: sed 's/\([ >][A-Z]\)\([A-Z]*\)/\1\L\2/g

Starting to look long and scary, right? But it's not so bad if you build it up gradually from components. I added a g on the end to tell sed this is a global replace: do the operation on every word it finds in the line, otherwise it will only make the substitution once, on the first word it sees, then quit.

Putting it together

So we know how to seek out specific lines, and how to convert to title case. Put the two together, and you get the final command:

sed '/\(NAME\|MEMO\)/s/\([ >][A-Z]\)\([A-Z]*\)/\1\L\2/g'

I ran it on the test input, and it worked just fine.

For more information on sed, a good place to start is the sed regular expressions manual.

Tags: , ,
[ 14:13 Dec 18, 2011    More linux/cmdline | permalink to this entry | comments ]

Sat, 03 Sep 2011

List only directories

Fairly often, I want a list of subdirectories inside a particular directory. For instance, when posting blog entries, I may need to decide whether an entry belongs under "linux" or some sub-category, like "linux/cmdline" -- so I need to remind myself what categories I have under linux.

But strangely, Linux offers no straightforward way to ask that question. The ls command lists directories -- along with the files. There's no way to list just the directories. You can list the directories first, with the --group-directories-first option. Or you can flag the directories specially: ls -F appends a slash to each directory name, so instead of linux you'd see linux/. But you still have to pick the directories out of a long list of files. You can do that with grep, of course:

ls -1F ~/web/blog/linux | grep /
That's a one, not an ell: it tells ls to list files one per line. So now you get a list of directories, one per line, with a slash appended to each one. Not perfect, but it's a start.

Or you can use the find program, which has an option -type d that lists only directories. Perfect, right?

find ~/web/blog/linux -maxdepth 1 -type d

Except that lists everything with full pathnames: /home/akkana/web/blog/linux, /home/akkana/web/blog/linux/editors, /home/akkana/web/blog/linux/cmdline and so forth. Way too much noise to read quickly.

What I'd really like is to have just a list of directory names -- no slashes, no newlines. How do we get from ls or find output to that? Either we can start with find and strip off all the path information, either in a loop with basename or with a sed command; or start with ls -F, pick only the lines with slashes, then strip off those slashes. The latter sounds easier.

So let's go back to that ls -1F ~/web/blog/linux | grep / command. To strip off the slashes, you can use sed's s (substitute) command. Normally the syntax is sed 's/oldpat/newpat/'. But since slashes are the pattern we're substituting, it's better to use something else as the separator character. I'll use an underscore.

The old pattern, the one I want to replace, is / -- but I only want to replace the last slash on the line, so I'll add a $ after it, representing end-of-line. The new pattern I want instead of the slash is -- nothing.

So my sed argument is 's_/$__' and the command becomes:

ls -1F ~/web/blog/linux | grep / | sed 's_/$__'

That does what I want. If I don't want them listed one per line, I can fudge that using backquotes to pass the output of the whole command to the shell's echo command:

echo `ls -1F ~/web/blog/linux | grep / | sed 's_/$__'`

If you have a lot of directories to list and you want ls's nice columnar format, that's a little harder. You can ls the list of directories (the names inside the backquotes), ls `your long command` -- except that now that you've stripped off the path information, ls won't know where to find the files. So you'd have to change directory first:

cd ~/web/blog/linux; ls -d `ls -1F | grep / | sed 's_/$__'`

That's not so good, though, because now you've changed directories from wherever you were before. To get around that, use parentheses to run the commands inside a subshell:

(cd ~/web/blog/linux; ls -d `ls -1F | grep / | sed 's_/$__'`)

Now the cd only applies within the subshell, and when the command finishes, your own shell will still be wherever you started.

Finally, I don't want to have to go through this discovery process every time I want a list of directories. So I turned it into a couple of shell functions, where $* represents all the arguments I pass to the command, and $1 is just the first argument.

lsdirs() { 
  (cd $1; /bin/ls -d `/bin/ls -1F | grep / | sed 's_/$__'`)

lsdirs2() { 
  echo `/bin/ls -1F $* | grep / | sed 's_/$__'` 
I specify /bin/ls because I have a function overriding ls in my .zshrc. Most people won't need to, but it doesn't hurt.

Now I can type lsdirs ~/web/blog/linux and get a nice list of directories.

Update, shortly after posting: In zsh (which I use), there's yet another way: */ matches only directories. It appends a trailing slash to them, but *(/) matches directories and omits the trailing slash. So you can say

echo ~/web/blog/linux/*(/:t)
:t strips the directory part of each match. To see other useful : modifiers, type ls *(: then hit TAB.

Thanks to Mikachu for the zsh tips. Zsh can do anything, if you can just figure out how ...

Tags: , , ,
[ 11:22 Sep 03, 2011    More linux/cmdline | permalink to this entry | comments ]

Mon, 30 May 2011

Command-line Arduino development

I've been doing more Arduino development lately. But I don't use the Arduino Java development environment -- programming is so much easier when you have a real editor, like emacs or vim, and key bindings to speed everything up.

I've found very little documentation on how to do command-line Arduino development, and most of the Makefiles out there are old and no longer work. So I've written up a tutorial. It ended up too long for a blog post, so I've made it a separate article:

Command-line Arduino development.

Tags: , , , ,
[ 14:45 May 30, 2011    More programming | permalink to this entry | comments ]

Tue, 15 Mar 2011

Using grep to solve another Cartalk puzzler

It's another episode of "How to use Linux to figure out CarTalk puzzlers"! This time you don't even need any programming.

Last week's puzzler was A Seven-Letter Vacation Curiosity. Basically, one couple hiking in Northern California and another couple carousing in Florida both see something described by a seven-letter word containing all five vowels -- but the two things they saw were very different. What's the word?

That's an easy one to solve using basic Linux command-line skills -- assuming the word is in the standard dictionary. If it's some esoteric word, all bets are off. But let's try it and see. It's a good beginning exercise in regular expressions and how to use the command line.

There's a handy word list in /usr/share/dict/words, one word per line. Depending on what packages you have installed, you may have bigger dictionaries handy, but you can usually count on /usr/share/dict/words being there on any Linux system. Some older Unix systems may have it in /usr/dict/words instead.

We need a way to choose all seven letter words. That's easy. In a regular expression, . (a dot) matches one letter. So ....... (seven dots) matches any seven letters.

(There's a more direct way to do that: the expression .\{7\} will also match 7 letters, and is really a better way. But personally, I find it harder both to remember and to type than the seven dots. Still, if you ever need to match 43 characters, or 114, it's good to know the "right" syntax.)

Fine, but if you grep ....... /usr/share/dict/words you get a list of words with seven or more letters. See why? It's because grep prints any line where it finds a match -- and a word with nine letters certainly contains seven letters within it.

The pattern you need to search for is '^.......$' -- the up-caret ^ matches the beginning of a line, and the dollar sign $ matches the end. Put single quotes around the pattern so the shell won't try to interpret the caret or dollar sign as special characters. (When in doubt, it's always safest to put single quotes around grep patterns.)

So now we can view all seven-letter words: grep '^.......$' /usr/share/dict/words
How do we choose only the ones that contain all the letters a e i o and u?

That's easy enough to build up using pipelines, using the pipe character | to pipe the output of one grep into a different grep. grep '^.......$' /usr/share/dict/words | grep a sends that list of 7-letter words through another grep command to make sure you only see words containing an a.

Now tack a grep for each of the other letters on the end, the same way:
grep '^.......$' /usr/share/dict/words | grep a | grep e | grep i | grep o | grep u

Voilà! I won't spoil the puzzler, but there are two words that match, and one of them is obviously the answer.

The power of the Unix command line to the rescue!

Tags: , , , , ,
[ 11:00 Mar 15, 2011    More linux/cmdline | permalink to this entry | comments ]

Wed, 29 Sep 2010

"Who am I?" Maybe nobody!

We hit an interesting problem at work recently. A coworker made a deb package which, during installation, needed to figure out the ID of the user running it, so it could make files writable by that user. Of course, while a package is being installed it's run by root, so the trick is to find out who you were before you sudoed or sued to root.

He was using the command who am i -- reasonable, since it's been a staple since the early days of Unix. For those not familiar with the command, /usr/bin/who, if given two arguments, regardless of what those arguments are, will print information about the current logged-in user. It also offers a -m option to do the same thing. So who am i, who a b, and who -m should all print a line like:

$ who am i
akkana   pts/1        2010-09-29 09:33 (:0.0)

Except they don't. For me, they printed nothing at all -- which broke my colleague's install script.

A quick poll among friends on IRC showed that who am i worked for some people, failed for others, with no obvious logic to it.

It's the terminal

It took some digging to find out what was going on, but the difference turned out to be the terminal being used. The who program -- with or without -m -- gets its info from /var/run/utmp, a file that maintains a record of who's logged in to the system. And it turns out some terminals create a utmp entry, while others don't. So:
Program Creates utmp entry?
gnome-terminal yes
konsole yes
xterm no
xfterm4 yes
terminator no
rxvt no
roxterm yes

I use xterm myself. Xterm is documented (in its man page) to modify the utmp entry, and it has a command-line flat, +ut, plus two X resources, ptyHandshake and utmpInhibit. None of the three work: setting

XTerm*ptyHandshake: true
XTerm*utmpInhibit: false
then running xterm +ut still doesn't show up in who. I guess that's a bug in xterm (or Ubuntu's version of xterm).

How do you get the real user?

Okay, so who am i clearly isn't a reliable way of getting the user ID. What can you use instead?

Several people suggested the id program. It has a -r option which supposedly prints the real UID. Unfortunately, what it really does is print:

$ id -r
id: cannot print only names or real IDs in default format
The man page doesn't offer any suggestions how to use a format other than default, so we're kinda stuck there.

Update: people keep suggesting id -ru to me. Evidently I wasn't very clear in this article: the goal is to get the real id of the login user. In other words, if you're logged in as mary and using sudo, you want mary, not root.

Alas, adding -u to id's flags gets only the effective user id: -u wins over -r. This is very easy to test: sudo id -ru prints 0, as does id -ru inside su.

But elly on Freenode had a great suggestion:

stat -c '%U' `readlink /proc/self/fd/0`
What does this do?

/proc/self is a symlink to /proc/pid, a directory where you can find out all sorts of information about a process.

One of the things you can find out about a process is open file descriptors: in particular, standard input, output and error. So /proc/self/fd/0 corresponds to standard input of the current process -- which in the example above is readlink.

What is readlink? Well, /proc/self/fd/0, in the normal case, is actually a symlink to the terminal controlling the process. readlink prints the file to which that link points -- for instance, /dev/pts/1. That's the terminal being used.

Now that we know the name of the terminal, all we need to do is find out who owns it. (This is the information who am i would have gotten from utmp, had there been a utmp entry.) ls -l /dev/pts/1 will show you that it's you, even if you run it as sudo ls -l /dev/pts/1. You could take that and strip off fields to get the username, but stat, as elly suggested, is a much better way of doing that.

Put it all together, and stat -c '%U' `readlink /proc/self/fd/0 gets standard input for the current process, follows the link to get the controlling terminal, then finds out who owns that terminal.

That's you!

A similar but slightly shorter solution suggested by Mikachu: stat -c %u `tty`

Tags: ,
[ 17:39 Sep 29, 2010    More linux/cmdline | permalink to this entry | comments ]

Fri, 18 Jun 2010

Use "date" to show time abroad

While I was in Europe, Dave stumbled on a handy alias on his Mac to check the time where I was: date -v +10 (+10 is the offset from the current time). But when he tried to translate this to Linux, he found that the -v flag from FreeBSD's date program wasn't available on the GNU date on Linux.

But I suggested he could do the same thing with the TZ environment variable. It's not documented well anywhere I could find, but if you set TZ to the name of a time zone, date will print out the time for that zone rather than your current one.

So, for bash:

$ TZ=Europe/Paris date  # time in Paris
$ TZ=GB date            # time in Great Britain
$ TZ=GMT-02 date        # time two timezones east of GMT
or for csh:
% ( setenv TZ Europe/Paris; date)
% ( setenv TZ GB; date)
% ( setenv TZ GMT-02; date)

That's all very well. But when I tried

% ( setenv TZ UK; date)
% ( setenv TZ FR; date)
they gave the wrong time, even though Wikipedia's list of time zones seemed to indicate that those abbreviations were okay.

The trick seems to be that setting TZ only works for abbreviations in /usr/share/zoneinfo/, or maybe in /usr/share/zoneinfo/posix/. If you give an abbreviation, like UK or FR or America/San_Francisco, it won't give you an error, it'll just print GMT as if that was what you had asked for.

So this trick is useful for printing times abroad -- but if you want to be safe, either stick to syntaxes like GMT-2, or make a script that checks whether your abbreviation exists in the directory before calling date, and warns you rather than just printing the wrong time.

Tags: , , ,
[ 14:04 Jun 18, 2010    More linux/cmdline | permalink to this entry | comments ]