Shallow Thoughts : tags : zsh

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 04 Dec 2015

Distclean part 2: some useful zsh tricks

I wrote recently about a zsh shell function to run make distclean on a source tree even if something in autoconf is messed up. In order to save any arguments you've previously passed to configure or autogen.sh, my function parsed the arguments from a file called config.log.

But it might be a bit more reliable to use config.status -- I'm guessing this is the file that make uses when it finds it needs to re-run autogen.sh. However, the syntax in that file is more complicated, and parsing it taught me some useful zsh tricks.

I can see the relevant line from config.status like this:

$ grep '^ac_cs_config' config.status
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

--enable-foo --disable-bar are options I added purely for testing. I wanted to make sure my shell function would work with multiple arguments.

Ultimately, I want my shell function to call autogen.sh --prefix=/usr/local/gimp-git --enable-foo --disable-bar. The goal is to end up with $args being a zsh array containing those three arguments. So I'll need to edit out those quotes and split the line into an array.

Sed tricks

The first thing to do is to get rid of that initial ac_cs_config= in the line from config.status. That's easy with sed:

$ grep '^ac_cs_config' config.status | sed -e 's/ac_cs_config=//'
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

But since we're using sed anyway, there's no need to use grep to get the line: we can do it all with sed. First try:

sed -n '/^ac_cs_config/s/ac_cs_config=//p' config.status

Search for the line that starts with ac_cs_config (^ matches the beginning of a line); then replace ac_cs_config= with nothing, and p print the resulting line. -n tells sed not to print anything except when told to with a p.

But it turns out that if you give a sed substitution a blank pattern, it uses the last pattern it was given. So a more compact version, using the search pattern ^ac_cs_config, is:

sed -n '/^ac_cs_config=/s///p' config.status

But there's also another way of doing it:

sed '/^ac_cs_config=/!d;s///' config.status

! after a search pattern matches every line that doesn't match the pattern. d deletes those lines. Then for lines that weren't deleted (the one line that does match), do the substitution. Since there's no -n, sed will print all lines that weren't deleted.

I find that version more difficult to read. But I'm including it because it's useful to know how to chain several commands in sed, and how to use ! to search for lines that don't match a pattern.
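To check that the three sed forms above really are equivalent, here is a small sketch that runs them against a throwaway file. The file contents are a made-up stand-in for a real config.status, and the variable names (sample, v1, v2, v3) are only for illustration.

```shell
# Made-up stand-in for config.status; a real one is much longer.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#! /bin/sh
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
ac_cs_version="example"
EOF

# 1. Address plus an explicit substitution pattern; p prints only that line.
v1=$(sed -n '/^ac_cs_config/s/ac_cs_config=//p' "$sample")
# 2. The empty pattern in s/// reuses the regexp from the address.
v2=$(sed -n '/^ac_cs_config=/s///p' "$sample")
# 3. Delete every non-matching line, then substitute on what is left.
v3=$(sed '/^ac_cs_config=/!d;s///' "$sample")

printf '%s\n' "$v1" "$v2" "$v3"
rm -f "$sample"
```

All three variables should end up holding the same string, still wrapped in its double quotes.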

You can also use sed to eliminate the double quotes:

sed '/^ac_cs_config=/!d;s///;s/"//g' config.status
'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
But it turns out that zsh has a better way of doing that.

Zsh parameter substitution

I'm still relatively new to zsh, but I got some great advice on #zsh. The first suggestion:

sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

I'll be using that final print -rl - $args for all these examples: it prints an array variable with one member per line. For the actual distclean function, of course, I'll be passing the variable to autogen.sh, not printing it out.

First, let's look at the heart of that expression: ${(Q)${(z)${(Q)x}}}, which the command above applies to REPLY inside args=( ... ). The zsh parameter substitution syntax is a bit arcane, but each of the parenthesized letters does some operation on the variable that follows.

The first (Q) strips off a level of quoting. So:

$ x='"Hello world"'; print $x; print ${(Q)x}
"Hello world"
Hello world

(z) splits an expression and stores it in an array. But to see that, we have to use print -l, so array members will be printed on separate lines.

$ x="a b c"; print -l $x; print "....."; print -l ${(z)x}
a b c
.....
a
b
c

Zsh is smart about quotes, so if you have quoted expressions it will group them correctly when assigning array members:

$ x="'a a' 'b b' 'c c'"; print -l $x; print "....."; print -l ${(z)x}
'a a' 'b b' 'c c'
.....
'a a'
'b b'
'c c'

So let's break down the larger expression: this is best read from right to left, inner expressions to outer.

${(Q) ${(z) ${(Q) x }}}
   |     |     |   \
   |     |     |    The original expression, 
   |     |     |   "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
   |     |     \
   |     |      Strip off the double quotes:
   |     |      '--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
   |     \
   |      Split into an array of three items
   \
    Strip the single quotes from each array member,
    ( --prefix=/usr/local/gimp-git --enable-foo --disable-bar )
Neat!

For more on zsh parameter substitutions, see the Zsh Guide, Chapter 5: Substitutions.

Passing the sed results to the parameter substitution

There's still a little left to wonder about in our expression, sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

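The IFS= read -r seems to be a common idiom in zsh scripting. It takes a line of standard input and assigns it to the variable $REPLY; -r keeps read from treating backslashes as escapes. (Piping into read like this works in zsh because the last command of a pipeline runs in the current shell.) IFS is the input field separator: you can split variables into words by spaces, newlines, semicolons or any other character you want. IFS= sets it to nothing, which stops read from trimming leading and trailing whitespace. Since read is putting the whole line into a single variable here, and the input -- "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'" -- has no whitespace around it, IFS makes no practical difference in this case.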
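What IFS= changes for read is easiest to see with input that has padding around it. This is a small sketch using a made-up padded string and explicit variable names (kept, trimmed) rather than $REPLY; here-documents stand in for the pipe so that the same lines also run in bash.

```shell
# Made-up input with three spaces of padding on each side.
padded="   '--enable-foo' '--disable-bar'   "

# With IFS emptied, read keeps the line exactly as it arrived.
IFS= read -r kept <<EOF
$padded
EOF

# With the default IFS, leading and trailing whitespace is trimmed.
read -r trimmed <<EOF
$padded
EOF

# Brackets make the surviving whitespace visible.
printf '[%s]\n' "$kept" "$trimmed"
```

The spaces between the two quoted words survive in both cases, because a single variable receives the whole rest of the line.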

So you can do the same thing with this simpler expression, to assign the quoted expression to the variable $x. I'll declare it a local variable: that makes no difference when testing it in the shell, but if I call it in a function, I won't have variables like $x and $args cluttering up my shell afterward.

local x=$(sed -n '/^ac_cs_config=/s///p' config.status); local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

That works in the version of zsh I'm running here, 5.1.1. But I've been warned that it's safer to quote the result of $(). Without quotes, if you ever run the function in an older zsh, $x might end up being set only to the first word of the expression. So now we have:

local x="$(sed -n '/^ac_cs_config=/s///p' config.status)"; local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

You don't even need to use a local variable. For added brevity (making the function even more difficult to read! -- but we're way past the point of easy readability), you could say:

args=( ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}} ); print -rl - $args
or even
print -rl - ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}}
... but that final version, since it doesn't assign to a variable at all, isn't useful for the function I'm writing.

[ 13:25 Dec 04, 2015 | linux/cmdline ]

Sat, 24 Aug 2013

A nifty shell redirection trick: process substitution

I love shell pipelines, and flatter myself that I'm pretty good at them. But a discussion last week on the Linuxchix Techtalk mailing list on finding added lines in a file turned up a terrific bash/zsh shell redirection trick I'd never seen before:

join -v 2 <(sort A.txt) <(sort B.txt)

I've used backquotes, and their cognate $(), plenty. For instance, you can do things like PS1=$(hostname): or PS1=`hostname`: to set your prompt to the current hostname: the shell runs the hostname command, takes its output, and substitutes that output in place of the backquoted or parenthesized expression.

But I'd never seen that <(...) trick before, and immediately saw how useful it was. Backquotes or $() let you replace arguments to a command with a program's output -- they're great for generating short strings for programs that take all their arguments on the command line. But they're no good for programs that need to read a file, or several files. <(...) lets you take the output of a command and pass it to a program as though it was the contents of a file. And if you can do it more than once in the same command -- as in Little Girl's example -- that could be tremendously useful.

Playing with it to see if it really did what it looked like it did, and what other useful things I could do with it, I tried this (and it worked just fine):

$ diff <(echo hello; echo there) <(echo hello; echo world)
2c2
< there
---
> world
It acts as though I had two files, which each have "hello" as their first line; but one has "there" as the second line, while the other has "world". And diff shows the difference. I don't think there's any way of doing anything like that with backquotes; you'd need to use temp files.
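The join command from the mailing list can be tried out the same way. This is a small sketch with made-up file contents; the temp directory and the variable name added are only for illustration. join -v 2 prints the lines of the second input that have no match in the first, and join expects sorted input, which is why both files go through sort. It needs bash or zsh, since plain sh has no <(...).

```shell
# Made-up sample files: B.txt is A.txt plus two extra lines, in a different order.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/A.txt"
printf '%s\n' banana date apple cherry elderberry > "$dir/B.txt"

# Each <(...) looks like a file name to join, but its contents come from sort.
added=$(join -v 2 <(sort "$dir/A.txt") <(sort "$dir/B.txt"))

printf '%s\n' "$added"
rm -rf "$dir"
```

With these inputs, the two lines that appear only in B.txt (date and elderberry) are what gets printed.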

Of course, I wanted to read more about it -- how have I gone all these years without knowing about this? -- and it looks like I'm not the only one who didn't know about it. In fact, none of the pages I found on shell pipeline tricks even mentioned it.

It turns out it's called "process substitution" and I found it documented in Chapter 23 of the Advanced Bash-Scripting Guide.

I tweeted it, and a friend who is a zsh master gave me some similar cool tricks. For instance, in zsh echo hi > >(cat) > >(cat -n) lets you pipe the output of a command to more than one other command.

That's zsh, but in bash (or zsh too, of course), you can use >() and tee to do the same thing: echo hi | tee >(cat) | cat -n

If you want a temp file to be created automatically, one you can both read and write, you can use =(foo). That one is zsh only: bash has no =(...) form.

Great stuff!

[ 19:23 Aug 24, 2013 | linux/cmdline ]

Wed, 24 Jul 2013

Yet more on that comma-inserting regexp, plus a pattern to filter unprintable characters

One more brief followup on that comma inserting sed pattern and its followup:

$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
20,130,607,215,015

In the second article, I'd mentioned that the hardest part of the exercise was figuring out where we needed backslashes. Devdas (f3ew) asked on Twitter whether I would still need all the backslash escapes even if I put the pattern in a file -- in other words, are the backslashes merely to get the shell to pass special characters unchanged?

A good question, and I suspected the need for some of the backslashes would disappear. So I tried this:

$ echo ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   
$ echo 20130607215015 | sed -f /tmp/commas

And it didn't work. No commas were inserted.

The problem, it turns out, is that my shell, zsh, changed both instances of \b to an ASCII backspace, ^H. Editing the file fixes that, and so does

$ echo -E ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas   

But that only applies to echo: zsh doesn't do the \b -> ^H substitution in the original command, where you pass the string directly as a sed argument.
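Another way around the echo problem, one that doesn't depend on which shell or which echo you have, is printf with a %s format: printf interprets escapes in the format string but not in the arguments passed to %s. A small sketch, with the temp file name chosen by mktemp rather than /tmp/commas; like the original pattern, it assumes GNU sed for \b and \+.

```shell
# Write the sed script to a temp file without any escape processing.
script=$(mktemp)
printf '%s\n' ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' > "$script"

# Run the same comma-inserting loop as before, but from the file.
result=$(echo 20130607215015 | sed -f "$script")

printf '%s\n' "$result"
rm -f "$script"
```

This should print the same 20,130,607,215,015 as the one-line version.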

Okay, with that straightened out, what about Devdas' question?

Surprisingly, it turns out that all the backslashes are still needed. None of them go away when you echo > file, so they weren't there just to get special characters past the shell; and if you edit the file and try removing some of the backslashes, you'll see that the pattern no longer works. I had thought at least some of them, like the ones before the \{ \}, were extraneous, but even those are still needed.

Filtering unprintable characters

As long as I'm writing about regular expressions, I learned a nice little tidbit last week. I'm getting an increasing flood of Asian-language spams which my mail ISP doesn't filter out (they use spamassassin, which is pretty useless for this sort of filtering). I wanted a simple pattern I could pass to egrep (via procmail) that would filter out anything with a run of 4 or more unprintable characters in a row. [^[:print:]]{4,} should do it, but it wasn't working.

The problem, it turns out, is the definition of what's printable. Apparently when the default system character set is UTF-8, just about everything is considered printable! So the trick is that you need to set LC_ALL to something more restrictive, like C (which basically means ASCII), before [:print:] becomes useful for language-based filtering. (Thanks to Mikachu for spotting the problem).

So in a terminal, you can do something like

LC_ALL=C egrep -v '[^[:print:]]' filename
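Here is a small sketch of the {4,} version of that filter on a made-up two-line file. The second line is written as raw bytes with printf octal escapes (they happen to be two UTF-8 CJK characters), and the variable names sample and kept are only for illustration. It uses grep -E in place of egrep, which should behave the same and which newer GNU grep releases prefer.

```shell
# Line 1 is plain ASCII; line 2 is six bytes that are all >= 0x80.
sample=$(mktemp)
printf 'plain ascii subject\n' > "$sample"
printf '\344\275\240\345\245\275\n' >> "$sample"

# In the C locale none of those high bytes count as printable, so the
# second line contains a run of 4 or more and -v filters it out.
kept=$(LC_ALL=C grep -E -v '[^[:print:]]{4,}' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

Only the ASCII line should come through.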

In procmail it was a little harder; I couldn't figure out any way to change LC_ALL from a procmail recipe; the only solution I came up with was to add this to ~/.procmailrc:

export LC_ALL=C

It does work, though, and has cut the spam load by quite a bit.

[ 19:35 Jul 24, 2013 | linux/cmdline ]

Sat, 15 Jun 2013

Autocompleting xchat channel log filenames in zsh

Sometimes zsh is a little too smart for its own good.

Something I do surprisingly often is to complete the filenames for my local channel logs in xchat. Xchat gives its logs crazy filenames like /home/akkana/.xchat2/xchatlogs/FreeNode-#ubuntu-us-ca.log. They're hard to autocomplete -- I have to type something like: ~/.xc<tab>xc<tab>l<tab>Fr<tab>\#ub<tab>us<tab> Even with autocompletion, that's a lot of typing!

But zsh makes it even worse: I have to put that backslash in front of the hash, \#, or else zsh will see it either as a comment (unless I unsetopt interactivecomments, in which case I can't paste functions from my zshrc when I'm testing them); or as an extended glob pattern (unless I unsetopt extendedglob). I don't want to unset either of those options: I use both of them.

Tonight I was fiddling with something else related to extendedglob, and was moved to figure out another solution to the xchat completion problem. Why not get zsh's smart zle editor to insert most of that annoying, not easily autocompletable string for me?

The easy solution was to bind it to a function key. I picked F8 for testing, and figured out its escape sequence by typing echo , then Ctrl-V, then hitting F8. It turns out to insert <ESC>[20~. So I made a binding:

bindkey -s '\e[20~' '~/.xchat2/xchatlogs/ \\\#^B^B^B'

When I press F8, that inserts the following string:

~/.xchat2/xchatlogs/ \#
                    ↑ (cursor ends up here)
... moving the cursor back three characters, so it's right before the space. The space is there so I can autocomplete the server name by typing something like Fr<TAB> for FreeNode. Then I delete the space (Ctrl-D), go to the end of the line (Ctrl-E), and start typing my channel name, like ubu<TAB>us<TAB>. I don't have to worry about typing the rest of the path, or the escaped hash sign.

That's pretty cool. But I wished I could bind it to a character sequence, like maybe .xc, rather than using a function key. (I could use my Crikey program to do that at the X level, but that's cheating; I wanted to do it within zsh.) You can't just use bindkey -s '.xch' '~/.xchat2/xchatlogs/ \\\#^B^B^B' because it's recursive: as soon as zsh inserts the ~/.xc part, that expands too, and you end up with ~/~/.xchat2/xchatlogs/hat2/xchatlogs/ \# \#.

The solution, though it's a lot more lines, is to use the special variables LBUFFER and RBUFFER. LBUFFER is everything left of the cursor position, and RBUFFER everything right of it. So I define a function to set those, then set a zle "widget" to that function, then finally bindkey to that widget:

function autoxchat()
{
    LBUFFER+="~/.xchat2/xchatlogs/"
    RBUFFER=" \\#$RBUFFER"
}
zle -N autoxchat
bindkey ".xc" autoxchat

Pretty cool! The only down side: now that I've gone this far in zle bindings, I'm probably an addict and will waste a lot more time tweaking them.

[ 21:31 Jun 15, 2013 | linux/cmdline ]

Wed, 15 Aug 2012

Getting ls to show symlinks (and stripping terminal slashes in shells)

The Linux file listing program, ls, has been frustrating me for some time with its ever-changing behavior on symbolic links.

For instance, suppose I have a symlink named Maps that points to a directory on another disk called /data/Maps. If I say ls ~/Maps, I might want to see where the link points:

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
or I might equally want to see the contents of the /data/Maps directory.

Many years ago, the Unix ls program magically seemed to infer when I wanted to see the link and what it points to, versus when I wanted to see the contents of the directory the link points to. I'm not even sure any more what the rule was; just that I was always pleasantly surprised that it did what I wanted. Now, in modern Linux, it usually manages to do the opposite of what I want. But the behavior has changed several times until, I confess, I'm no longer even sure of what I want it to do.

So if I'm not sure whether I usually want it to show the symlink or follow it ... why not make it do both?

There's no ls flag that will do that. But that's okay -- I can make a shell function to do what I want.

Current ls flags

First let's review man ls to see the relevant flags we do have, searching for the string "deref".

I find three different flags to tell ls to dereference a link: -H (dereference any link explicitly mentioned on the command line -- even though ls does that by default); --dereference-command-line-symlink-to-dir (do the same if it's a directory -- even though -H already does that, and even though ls without any flags also already does that); and -L (dereference links even if they aren't mentioned on the command line). The GNU ls maintainers are clearly enamored with dereferencing symlinks.

In contrast, there's one flag, -d, that says not to dereference links (when used in combination with -l). And -d isn't useful in general (you can't make it part of a normal ls alias) because -d also has another, more primary meaning: it also prevents you from listing the contents of normal, non-symlinked directories.

Solution: a shell function

Let's move on to the problem of how to show both the link information and the dereferenced file.

Since there's no ls flag to do it, I'll have to do it by looping over the arguments of my shell function. In a shell test, you can use -h to tell if a file is a symlink. So my first approach was to call ls -ld on all the symlinks to show what they point to:

ll() {
    /bin/ls -laFH $*
    for f in $*; do
        if [[ -h $f ]]; then
            echo -n Symlink:
            /bin/ls -ld $f
        fi
    done
}

Terminally slashed

That worked on a few simple tests. But when I tried to use it for real I hit another snag: terminal slashes.

In real life, I normally run this with autocompletion. I don't type ll ~/Maps -- I'm more likely to type something like ll Ma<tab> -- the tab looks for files beginning with Ma and obligingly completes it as Maps/ -- note the slash at the end.

And, well, it turns out /bin/ls -ld Maps/ no longer shows the symlink, but dereferences it instead -- yes, never mind that the man page says -d won't dereference symlinks. As I said, those ls maintainers really love dereferencing.

Okay, so if I want to not dereference, since there's no ls flag that means really don't dereference, I mean it -- my little zsh function needs to find a way of stripping any terminal slash on each directory name. Of course, I could do it with sed:

        f=`echo $f | sed 's/\/$//'`
and that works fine, but ... ick. Surely zsh has a better way?

In fact, there's a better way that even works in bash (thanks to zsh wizard Mikachu for this gem):

        f=${f%/}

That "remove terminal slash" trick has already come in handy in a couple of other shell functions I use -- definitely a useful trick if you use autocompletion a lot.
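To see exactly what ${f%/} does and doesn't remove, here is a small sketch. strip_slash is a made-up helper name for the demonstration, not something from my zshrc.

```shell
# strip_slash is a hypothetical helper, only for trying out the expansion.
strip_slash() {
    # ${1%/} removes one trailing "/" if there is one, and nothing otherwise.
    printf '%s\n' "${1%/}"
}

strip_slash Maps/          # one trailing slash is removed
strip_slash /data/Maps     # no trailing slash, so unchanged
strip_slash Maps//         # only the last of the two slashes is removed
```

One edge case to keep in mind: a bare / becomes the empty string, so a function that might be handed the root directory would need to check for that.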

Making the link line more readable

But wait: one more tweak, as long as I'm tweaking. That long ls -ld line,

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
is way too long and full of things I don't really care about (the permissions, ownership and last-modified date on a symlink aren't very interesting). I really only want the last three words,
Maps -> /data/Maps/

Of course I could use something like awk to get that. But zsh has everything -- I bet it has a clever way to separate words.

And indeed it does: arrays. The documentation isn't very clear and not all the array functions worked as the docs implied, but here's what ended up working: you can set an array variable by using parentheses after the equals sign in a normal variable-setting statement, and after that, you can refer to it using square brackets. You can even use negative indices, like in Python, to count back from the end of an array. That made it easy to do what I wanted:

            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[-3,-1]

Hooray zsh! Though it turned out that -3 didn't work for directories with spaces in the name, so I had to use [9,-1] instead. The echo -E is to prevent strange things happening if there are things like backslashes in the filename.

The completed shell function

I moved the symlink-showing function into a separate function, so I can call it from several different ls aliases, and here's the final result:

show_symlinks() {
    for f in $*; do
        # Remove terminal slash.
        f=${f%/}
        if [[ -h $f ]]; then
            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[9,-1]
        fi
    done
}

ll() {
    /bin/ls -laFH $*
    show_symlinks $*
}

Bash's array syntax is different from zsh's (there's no $line[9,-1]), so in bash replace those two lines with

            echo -n 'Symlink: '
            /bin/ls -ld $f | cut -d ' ' -f 10-
and the rest of the function should work just fine.
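A variation that avoids counting ls columns altogether, in either shell, is to ask readlink for the link target. This isn't what the function above does; it's a sketch of a swapped-in technique, with show_symlinks_rl as a made-up name, and it assumes a readlink command is installed (it is part of GNU coreutils).

```shell
# show_symlinks_rl is a hypothetical variant that uses readlink, not ls.
show_symlinks_rl() {
    for f in "$@"; do
        # Remove terminal slash so the test sees the link, not its target.
        f=${f%/}
        if [ -h "$f" ]; then
            # readlink prints where the link points, spaces and all.
            printf 'Symlink: %s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
}
```

Called as show_symlinks_rl Maps/, it should print a line in the same "Symlink: name -> target" form, and because the arguments are quoted throughout, names with spaces don't need special handling.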

[ 20:22 Aug 15, 2012 | linux/cmdline ]