Shallow Thoughts : tags : zsh

Akkana's Musings on Open Source Computing, Science, and Nature.

Sat, 24 Aug 2013

A nifty shell redirection trick: process substitution

I love shell pipelines, and flatter myself that I'm pretty good at them. But a discussion last week on the Linuxchix Techtalk mailing list on finding added lines in a file turned up a terrific bash/zsh shell redirection trick I'd never seen before:

join -v 2 <(sort A.txt) <(sort B.txt)

I've used backquotes, and their cognate $(), plenty. For instance, you can do things like PS1=$(hostname): or PS1=`hostname`: to set your prompt to the current hostname: the shell runs the hostname command, takes its output, and substitutes that output in place of the backquoted or parenthesized expression.

But I'd never seen that <(...) trick before, and immediately saw how useful it was. Backquotes or $() let you replace arguments to a command with a program's output -- they're great for generating short strings for programs that take all their arguments on the command line. But they're no good for programs that need to read a file, or several files. <(...) lets you take the output of a command and pass it to a program as though it was the contents of a file. And if you can do it more than once in the same command -- as in Little Girl's example -- that could be tremendously useful.

Playing with it to see if it really did what it looked like it did, and what other useful things I could do with it, I tried this (and it worked just fine):

$ diff <(echo hello; echo there) <(echo hello; echo world)
2c2
< there
---
> world
It acts as though I had two files, which each have "hello" as their first line; but one has "there" as the second line, while the other has "world". And diff shows the difference. I don't think there's any way of doing anything like that with backquotes; you'd need to use temp files.

Of course, I wanted to read more about it -- how have I gone all these years without knowing about this? -- and it looks like I'm not the only one who didn't know about it. In fact, none of the pages I found on shell pipeline tricks even mentioned it.

It turns out it's called "process substitution" and I found it documented in Chapter 23 of the Advanced Bash-Scripting Guide.

I tweeted it, and a friend who is a zsh master gave me some similar cool tricks. For instance, in zsh echo hi > >(cat) > >(cat -n) lets you pipe the output of a command to more than one other command.

That's zsh, but in bash (or zsh too, of course), you can use >() and tee to do the same thing: echo hi | tee >(cat) | cat -n

If you want a temp file to be created automatically, one you can both read and write, you can use =(foo) (zsh only?)

Great stuff! Some other pages that discuss some of these tricks:

Tags: , , ,
[ 18:23 Aug 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 24 Jul 2013

Yet more on that comma-inserting regexp, plus a pattern to filter unprintable characters

One more brief followup on that comma inserting sed pattern and its followup:

$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
20,130,607,215,015

In the second article, I'd mentioned that the hardest part of the exercise was figuring out where we needed backslashes. Devdas (f3ew) asked on Twitter whether I would still need all the backslash escapes even if I put the pattern in a file -- in other worse, are the backslashes merely to get the shell to pass special characters unchanged?

A good question, and I suspected the need for some of the backslashes would disappear. So I tried this:

$ echo ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   
$ echo 20130607215015 | sed -f /tmp/commas

And it didn't work. No commas were inserted.

The problem, it turns out, is that my shell, zsh, changed both instances of \b to an ASCII backspace, ^H. Editing the file fixes that, and so does

$ echo -E ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   

But that only applies to echo: zsh doesn't do the \b -> ^H substitution in the original command, where you pass the string directly as a sed argument.

Okay, with that straightened out, what about Devdas' question?

Surprisingly, it turns out that all the backslashes are still needed. None of them go away when you echo > file, so they weren't there just to get special characters past the shell; and if you edit the file and try removing some of the backslashes, you'll see that the pattern no longer works. I had thought at least some of them, like the ones before the \{ \}, were extraneous, but even those are still needed.

Filtering unprintable characters

As long as I'm writing about regular expressions, I learned a nice little tidbit last week. I'm getting an increasing flood of Asian-language spams which my mail ISP doesn't filter out (they use spamassassin, which is pretty useless for this sort of filtering). I wanted a simple pattern I could pass to egrep (via procmail) that would filter out anything with a run of more than 4 unprintable characters in a row. [^[:print:]]{4,} should do it, but it wasn't working.

The problem, it turns out, is the definition of what's printable. Apparently when the default system character set is UTF-8, just about everything is considered printable! So the trick is that you need to set LC_ALL to something more restrictive, like C (which basically means ASCII) to before :print: becomes useful for language-based filtering. (Thanks to Mikachu for spotting the problem).

So in a terminal, you can do something like

LC_ALL=C egrep -v '[^[:print:]]' filename

In procmail it was a little harder; I couldn't figure out any way to change LC_ALL from a procmail recipe; the only solution I came up with was to add this to ~/.procmailrc:

export LC_ALL=C

It does work, though, and has cut the spam load by quite a bit.

Tags: , , , ,
[ 18:35 Jul 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Sat, 15 Jun 2013

Autocompleting xchat channel log filenames in zsh

Sometimes zsh is a little too smart for its own good.

Something I do surprisingly often is to complete the filenames for my local channel logs in xchat. Xchat gives its logs crazy filenames like /home/akkana/.xchat2/xchatlogs/FreeNode-#ubuntu-us-ca.log. They're hard to autocomplete -- I have to type something like: ~/.xc<tab>xc<tab>l<tab>Fr<tab>\#ub<tab>us<tab> Even with autocompletion, that's a lot of typing!

Bug zsh makes it even worse: I have to put that backslash in front of the hash, \#, or else zsh will see it either as a comment (unless I unsetopt interactivecomments, in which case I can't paste functions from my zshrc when I'm testing them); or as an extended regular expression (unless I unsetopt extendedglob). I don't want to unset either of those options: I use both of them.

Tonight I was fiddling with something else related to extendedglob, and was moved to figure out another solution to the xchat completion problem. Why not get zsh's smart zle editor to insert most of that annoying, not easily autocompletable string for me?

The easy solution was to bind it to a function key. I picked F8 for testing, and figured out its escape sequence by typing echo , then Ctrl-V, then hitting F8. It turns out to insert <ESC>[20~. So I made a binding:

bindkey -s '\e[20~' '~/.xchat2/xchatlogs/ \\\#^B^B^B'

When I press F8, that inserts the following string:

~/.xchat2/xchatlogs/ \#
                    ↑ (cursor ends up here)
... moving the cursor back three characters, so it's right before the space. The space is there so I can autocomplete the server name by typing something like Fr<TAB> for FreeNode. Then I delete the space (Ctrl-D), go to the end of the line (Ctrl-E), and start typing my channel name, like ubu<TAB>us<TAB>. I don't have to worry about typing the rest of the path, or the escaped hash sign.

That's pretty cool. But I wished I could bind it to a character sequence, like maybe .xc, rather than using a function key. (I could use my Crikey program to do that at the X level, but that's cheating; I wanted to do it within zsh.) You can't just use bindkey -s '.xch' '~/.xchat2/xchatlogs/ \\\#^B^B^B' because it's recursive: as soon as zsh inserts the ~/.xc part, that expands too, and you end up with ~/~/.xchat2/xchatlogs/hat2/xchatlogs/ \# \#.

The solution, though it's a lot more lines, is to use the special variables LBUFFER and RBUFFER. LBUFFER is everything left of the cursor position, and RBUFFER everything right of it. So I define a function to set those, then set a zle "widget" to that function, then finally bindkey to that widget:

function autoxchat()
{
    LBUFFER+="~/.xchat2/xchatlogs/"
    RBUFFER=" \\#$RBUFFER"
}
zle -N autoxchat
bindkey ".xc" autoxchat

Pretty cool! The only down side: now that I've gone this far in zle bindings, I'm probably an addict and will waste a lot more time tweaking them.

Tags: , ,
[ 20:31 Jun 15, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 15 Aug 2012

Getting ls to show symlinks (and stripping terminal slashes in shells)

The Linux file listing program, ls, has been frustrating me for some time with its ever-changing behavior on symbolic links.

For instance, suppose I have a symlink named Maps that points to a directory on another disk called /data/Maps. If I say ls ~/Maps, I might want to see where the link points:

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
or I might equally want to see the contents of the /data/Maps directory.

Many years ago, the Unix ls program magically seemed to infer when I wanted to see the link and what it points to, versus when I wanted to see the contents of the directory the link points to. I'm not even sure any more what the rule was; just that I was always pleasantly surprised that it did what I wanted. Now, in modern Linux, it usually manages to do the opposite of what I want. But the behavior has changed several times until, I confess, I'm no longer even sure of what I want it to do.

So if I'm not sure whether I usually want it to show the symlink or follow it ... why not make it do both?

There's no ls flag that will do that. But that's okay -- I can make a shell function to do what I want..

Current ls flags

First let's review man ls to see the relevant flags we do have, searching for the string "deref".

I find three different flags to tell ls to dereference a link: -H (dereference any link explicitly mentioned on the command line -- even though ls does that by default); --dereference-command-line-symlink-to-dir (do the same if it's a directory -- even though -H already does that, and even though ls without any flags also already does that); and -L (dereference links even if they aren't mentioned on the command line). The GNU ls maintainers are clearly enamored with dereferencing symlinks.

In contrast, there's one flag, -d, that says not to dereference links (when used in combination with -l). And -d isn't useful in general (you can't make it part of a normal ls alias) because -d also has another, more primary meaning: it also prevents you from listing the contents of normal, non-symlinked directories.

Solution: a shell function

Let's move on to the problem of how to show both the link information and the dereferenced file.

Since there's no ls flag to do it, I'll have to do it by looping over the arguments of my shell function. In a shell test, you can use -h to tell if a file is a symlink. So my first approach was to call ls -ld on all the symlinks to show what the point to:

ll() {
    /bin/ls -laFH $*
    for f in $*; do
        if [[ -h $f ]]; then
            echo -n Symlink:
            /bin/ls -ld $f
        fi
    done
}

Terminally slashed

That worked on a few simple tests. But when I tried to use it for real I hit another snag: terminal slashes.

In real life, I normally run this with autocompletion. I don't type ll ~/Maps -- I'm more likely to type like ll Ma<tab> -- the tab looks for files beginning with Ma and obligingly completes it as Maps/ -- note the slash at the end.

And, well, it turns out /bin/ls -ld Maps/ no longer shows the symlink, but derefernces it instead -- yes, never mind that the man page says -d won't dereference symlinks. As I said, those ls maintainers really love dereferencing.

Okay, so if I want to not dereference, since there's no ls flag that means really don't dereference, I mean it -- my little zsh function needs to find a way of stripping any terminal slash on each directory name. Of course, I could do it with sed:

        f=`echo $f | sed 's/\/$//'`
and that works fine, but ... ick. Surely zsh has a better way?

In fact, there's a better way that even works in bash (thanks to zsh wizard Mikachu for this gem):

        f=${f%/}

That "remove terminal slash" trick has already come in handy in a couple of other shell functions I use -- definitely a useful trick if you use autocompletion a lot.

Making the link line more readable

But wait: one more tweak, as long as I'm tweaking. That long ls -ld line,

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
is way too long and full of things I don't really care about (the permissions, ownership and last-modified date on a symlink aren't very interesting). I really only want the last three words,
Maps -> /data/Maps/

Of course I could use something like awk to get that. But zsh has everything -- I bet it has a clever way to separate words.

And indeed it does: arrays. The documentation isn't very clear and not all the array functions worked as the docs implied, but here's what ended up working: you can set an array variable by using parentheses after the equals sign in a normal variable-setting statement, and after that, you can refer to it using square brackets. You can even use negative indices, like in python, to count back from the end of an array. That made it easy to do what I wanted:

            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[-3,-1]

Hooray zsh! Though it turned out that -3 didn't work for directories with spaces in the name, so I had to use [9, -1] instead. The echo -E is to prevent strange things happening if there are things like backslashes in the filename.

The completed shell function

I moved the symlink-showing function into a separate function, so I can call it from several different ls aliases, and here's the final result:

show_symlinks() {
    for f in $*; do
        # Remove terminal slash.
        f=${f%/}
        if [[ -h $f ]]; then
            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[9,-1]
        fi
    done
}

ll() {
    /bin/ls -laFH $*
    show_symlinks $*
}

Bash doesn't have arrays like zsh, so replace those two lines with

            echo -n 'Symlink: '
            /bin/ls -ld $f | cut -d ' ' -f 10-
and the rest of the function should work just fine.

Tags: , , ,
[ 19:22 Aug 15, 2012    More linux/cmdline | permalink to this entry | comments ]

Syndicated on:
LinuxChix Live
Ubuntu Women
Women in Free Software
Graphics Planet
DevChix
Ubuntu California
Planet Openbox
Devchix
Planet LCA2009

Friends' Blogs:
Morris "Mojo" Jones
Jane Houston Jones
Dan Heller
Long Live the Village Green
Ups & Downs
DailyBBG

Other Blogs of Interest:
DevChix
Scott Adams
Dave Barry
BoingBoing

Powered by PyBlosxom.