Shallow Thoughts : tags : shell

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sun, 06 Jan 2019

Keeping track of reading

About fifteen years ago, a friend in LinuxChix blogged about doing the "50-50 Book Challenge". The goal was to read fifty new books in a year, plus another fifty old books she'd read before.

I had no idea whether this was a lot of books or not. How many books do I read in a year? I had no idea. But now I wanted to know. So I started keeping a list: not for the 50-50 challenge specifically, but just to see what the numbers were like.

It would be easy enough to do this in a spreadsheet, but I'm not really a spreadsheet kind of girl, unless there's a good reason to use one, like accounting tables or other numeric data. So I used a plain text file with a simple, readable format, like these entries from that first year, 2004:

Dragon Hunter: Roy Chapman Andrews and the Central Asiatic Expeditions, Charles Gallenkamp, Michael J. Novacek
  Fascinating account of a series of expeditions in the early 1900s
  searching for evidence of early man.  Instead, they found
  groundbreaking dinosaur discoveries, including the first evidence
  of dinosaurs protecting their eggs (Oviraptor).

Life of Pi
  Uneven, quirky, weird.  Parts of it are good, parts are awful.
  I found myself annoyed by it ... but somehow compelled to keep
  reading.  The ending may have redeemed it.

The Lions of Tsavo : Exploring the Legacy of Africa's Notorious Man-Eaters, Bruce D. Patterson
  Excellent overview of the Tsavo lion story, including some recent
  findings.  Makes me want to find the original book, which turns
  out to be public domain in Project Gutenberg.

- Bellwether, Connie Willis
  What can I say?  Connie Willis is one of my favorite writers and
  this is arguably her best book.  Everyone should read it.
  I can't imagine anyone not liking it.

If there's a punctuation mark in the first column, it's a reread. (I keep forgetting what character to use, so sometimes it's a dot, sometimes a dash, sometimes an atsign.) If there's anything else besides a space, it's a new book. Lines starting with spaces are short notes on what I thought of the book. I'm not trying to write formal reviews, just reminders. If I don't have anything in specific to say, I leave it blank or write a word or two, like "fun" or "disappointing".

Crunching the numbers

That means it's fairly easy to pull out book titles and count them with grep and wc. For years I just used simple aliases:

 All books this year: egrep '^[^ ]' books2019 | wc -l
 Just new books:      egrep '^[^ -.@]' books2019 | wc -l
 Just reread books:   egrep '^[-.@]' books2019 | wc -l

But after I had years of accumulated data I started wanting to see it all together, so I wrote a shell alias that I put in my .zshrc:

booksread() {
  setopt extendedglob
  for f in ~/Docs/Lists/books/books[0-9](#c4); do
    year=$(echo $f | sed 's/.*books//')
    let allbooks=$(egrep '^[^ ]' $f | grep -v 'Book List:' | wc -l)
    let rereads=$(egrep '^[-.@\*]' $f  | grep -v 'Book List:'| wc -l)
    printf "%4s:   All: %3d   New: %3d   rereads: %3d\n" \
           $year $allbooks $(($allbooks - $rereads)) $rereads

In case you're curious, my numbers are all over the map:

$ booksread
2004:   All:  53   New:  44   rereads:   9
2005:   All:  51   New:  36   rereads:  15
2006:   All:  72   New:  59   rereads:  13
2007:   All:  59   New:  49   rereads:  10
2008:   All:  42   New:  33   rereads:   9
2009:   All:  56   New:  47   rereads:   9
2010:   All:  43   New:  27   rereads:  16
2011:   All:  80   New:  55   rereads:  25
2012:   All:  65   New:  58   rereads:   7
2013:   All:  59   New:  54   rereads:   5
2014:   All: 128   New: 121   rereads:   7
2015:   All: 111   New: 103   rereads:   8
2016:   All:  66   New:  62   rereads:   4
2017:   All:  57   New:  56   rereads:   1
2018:   All:  74   New:  71   rereads:   3
2019:   All:   3   New:   3   rereads:   0

So sometimes I beat that 100-book target that the 50-50 people advocated, other times not. I'm not worried about the overall numbers. Some years I race through a lot of lightweight series mysteries; other years I spend more time delving into long nonfiction books.

But I have learned quite a few interesting tidbits.

What Does it all Mean?

I expected my reread count would be quite high. As it turns out, I don't reread nearly as much as I thought. I have quite a few "comfort books" that I like to read over and over again (am I still five years old?), especially when I'm tired or ill. I sometimes feel guilty about that, like I'm wasting time when I could be improving my mind. I tell myself that it's not entirely a waste: by reading these favorite books over and over, perhaps I'll absorb some of the beautiful rhythms, strong characters, or clever plot twists, that make me love them; and that maybe some of that will carry over into my own writing. But it feels like rationalization.

But that first year, 2004, I read 44 new books and reread 9, including the Lord of the Rings trilogy that I hadn't read since I was a teenager. So I don't actually "waste" that much time on rereading. Over the years, my highest reread count was 25 in 2011, when I reread the whole Tony Hillerman series.

Is my reread count low because I'm conscious of the record-keeping, and therefore I reread less than I would otherwise? I don't think so. I'm still happy to pull out a battered copy of Tea with the Black Dragon or Bellweather or Watership Down or The Lion when I don't feel up to launching into a new book.

Another thing I wondered: would keeping count encourage me to read more short mysteries and fewer weighty non-fiction tomes? I admit I am a bit more aware of book lengths now -- oh, god, the new Stephenson is how many pages? -- but I try not to get competitive, even with myself, about numbers, and I don't let a quest for big numbers keep me from reading Blood and Thunder or The Invention of Nature. (And I had that sinking feeling about Stephenson even before I started keeping a book list. The man can write, but he could use an editor with a firm hand.)

What counts as a book? Do I feel tempted to pile up short, easy books to "get credit" for them, or to finish a bad book I'm not enjoying? Sometimes a little, but mostly no. What about novellas? What about partial reads, like skipping chapters? I decide on a case by case basis but don't stress over it. I do keep entries for books I start and don't finish (with spaces at the beginning of the line so they don't show up in the count), with notes on why I gave up on them, or where I left off if I intend to go back.

Unexpected Benefits

Keeping track of my reading has turned out to have other benefits. For instance, it prevents accidental rereads. Last year Dave checked a mystery out of the library (we read a lot of the same books, so anything one of us reads, the other will at least consider). I looked at it and said "That sounds awfully familiar. Haven't we already read it?" Sure enough, it was on my list from the previous year, and I hadn't liked it. Dave doesn't keep a book list, so he started reading, but eventually realized that he, too, had read it before.

And sometimes my memory of a book isn't very clear, and my notes on what I thought of a book are useful. Last year, on a hike, a friend and I got to talking about the efforts to eradicate rats on southern California's Channel Islands. I said "Oh, I read an interesting novel about that recently. Was it Barbara Kingsolver? No, wait ... I think it was T.C. Boyle. Interesting book, you should check it out."

When I got home, I consulted my book lists and found it in 2011:

When the Killing's Done, T.C. Boyle
  A tough slog through part 1, but it gets somewhat better in part 2
  (there are actually a few characters you don't hate, finally)
  and some plot eventually emerges, near the end of the novel.

I sent my friend an email rescinding my recommendation. I told her the book does cover some interesting details related to the rat eradication, but I'd forgotten that it was a poor excuse for a novel. In the end she decided to read it anyway, and her opinion agreed with mine. I believe she's started keeping a book list of her own now.

On the other hand, it's also good to have a record of delightful new discoveries. A gem from last year:

Mr. Penumbra's 24-hour bookstore, Robin Sloan
  Unexpectedly good! I read this because Sloan was on the Embedded
  podcast, but I didn't expect much. Turns out Sloan can write!
  Had me going from the beginning. Also, the glow-in-the-dark books
  on the cover were fun.

Even if I forget Sloan's name (sad, I know, but I have a poor memory for names), when I see a new book of his I'll know to check it out. I didn't love his second book, Sourdough, quite as much as Mr. Penumbra, but he's still an author worth following.

Tags: , , ,
[ 12:09 Jan 06, 2019    More misc | permalink to this entry | comments ]

Thu, 23 Aug 2018

Making Sure the Debian Kernel is Up To Date

I try to avoid Grub2 on my Linux machines, for reasons I've discussed before. Even if I run it, I usually block it from auto-updating /boot since that tends to overwrite other operating systems. But on a couple of my Debian machines, that has meant needing to notice when a system update has installed a new kernel, so I can update the relevant boot files. Inevitably, I fail to notice, and end up running an out of date kernel.

But didn't Debian use to have a /boot/vmlinuz that always linked to the latest kernel? That was such a good idea: what happened to that?

I'll get to that. But before I found out, I got sidetracked trying to find a way to check whether my kernel was up-to-date, so I could have it warn me of out-of-date kernels when I log in.

That turned out to be fairly easy using uname and a little shell pipery:

# Is the kernel running behind?
kernelvers=$(uname -a | awk '{ print $3; }')
latestvers=$(cd /boot; ls -1 vmlinuz-* | sort --version-sort | tail -1 | sed 's/vmlinuz-//')
if [[ $kernelvers != $latestvers ]]; then
    echo "======= Running kernel $kernelvers but $latestvers is available"
    echo "The kernel is up to date"

I put that in my .login. But meanwhile I discovered that that /boot/vmlinuz link still exists -- it just isn't enabled by default for some strange reason. That, of course, is the right way to make sure you're on the latest kernel, and you can do it with the linux-update-symlinks command.

linux-update-symlinks is called automatically when you install a new kernel -- but by default it updates symlinks in the root directory, /, which isn't much help if you're trying to boot off a separate /boot partition.

But you can configure it to notice your /boot partition. Edit /etc/kernel-img.conf and change link_in_boot to yes:

link_in_boot = yes

Then linux-update-symlinks will automatically update the /boot/vmlinuz link whenever you update the kernel, and whatever bootloader you prefer can point to that image. It also updates /boot/vmlinuz.old to point to the previous kernel in case you can't boot from the new one.

Update: To get linux-update-symlinks to update symlinks to reflect the current kernel, you need to reinstall the package for the current kernel, e.g. apt-get install --reinstall linux-image-4.18.0-3-amd64. Just apt-get install --reinstall linux-image-amd64 isn't enough.

Tags: , ,
[ 20:14 Aug 23, 2018    More linux/kernel | permalink to this entry | comments ]

Sat, 01 Oct 2016

Zsh magic: remove all raw photos that don't have a corresponding JPEG

Lately, when shooting photos with my DSLR, I've been shooting raw mode but with a JPEG copy as well. When I triage and label my photos (with pho and metapho), I use only the JPEG files, since they load faster and there's no need to index both. But that means that sometimes I delete a .jpg file while the huge .cr2 raw file is still on my disk.

I wanted some way of removing these orphaned raw files: in other words, for every .cr2 file that doesn't have a corresponding .jpg file, delete the .cr2.

That's an easy enough shell function to write: loop over *.cr2, change the .cr2 extension to .jpg, check whether that file exists, and if it doesn't, delete the .cr2.

But as I started to write the shell function, it occurred to me: this is just the sort of magic trick zsh tends to have built in.

So I hopped on over to #zsh and asked, and in just a few minutes, I had an answer:

rm *.cr2(e:'[[ ! -e ${REPLY%.cr2}.jpg ]]':)

Yikes! And it works! But how does it work? It's cheating to rely on people in IRC channels without trying to understand the answer so I can solve the next similar problem on my own.

Most of the answer is in the zshexpn man page, but it still took some reading and jumping around to put the pieces together.

First, we take all files matching the initial wildcard, *.cr2. We're going to apply to them the filename generation code expression in parentheses after the wildcard. (I think you need EXTENDED_GLOB set to use that sort of parenthetical expression.)

The variable $REPLY is set to the filename the wildcard expression matched; so it will be set to each .cr2 filename, e.g. img001.cr2.

The expression ${REPLY%.cr2} removes the .cr2 extension. Then we tack on a .jpg: ${REPLY%.cr2}.jpg. So now we have img001.jpg.

[[ ! -e ${REPLY%.cr2}.jpg ]] checks for the existence of that jpg filename, just like in a shell script.

So that explains the quoted shell expression. The final, and hardest part, is how to use that quoted expression. That's in section 14.8.7 Glob Qualifiers. (estring) executes string as shell code, and the filename will be included in the list if and only if the code returns a zero status.

The colons -- after the e and before the closing parenthesis -- are just separator characters. Whatever character immediately follows the e will be taken as the separator, and anything from there to the next instance of that separator (the second colon, in this case) is taken as the string to execute. Colons seem to be the character to use by convention, but you could use anything. This is also the part of the expression responsible for setting $REPLY to the filename being tested.

So why the quotes inside the colons? They're because some of the substitutions being done would be evaluated too early without them: "Note that expansions must be quoted in the string to prevent them from being expanded before globbing is done. string is then executed as shell code."

Whew! Complicated, but awfully handy. I know I'll have lots of other uses for that.

One additional note: section 14.8.5, Approximate Matching, in that manual page caught my eye. zsh can do fuzzy matches! I can't think offhand what I need that for ... but I'm sure an idea will come to me.

Tags: , , ,
[ 15:28 Oct 01, 2016    More linux/cmdline | permalink to this entry | comments ]

Fri, 04 Dec 2015

Distclean part 2: some useful zsh tricks

I wrote recently about a zsh shell function to run make distclean on a source tree even if something in autoconf is messed up. In order to save any arguments you've previously passed to configure or, my function parsed the arguments from a file called config.log.

But it might be a bit more reliable to use config.status -- I'm guessing this is the file that make uses when it finds it needs to re-run However, the syntax in that file is more complicated, and parsing it taught me some useful zsh tricks.

I can see the relevant line from config.status like this:

$ grep '^ac_cs_config' config.status
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

--enable-foo --disable-bar are options I added purely for testing. I wanted to make sure my shell function would work with multiple arguments.

Ultimately, I want my shell function to call --prefix=/usr/local/gimp-git --enable-foo --disable-bar The goal is to end up with $args being a zsh array containing those three arguments. So I'll need to edit out those quotes and split the line into an array.

Sed tricks

The first thing to do is to get rid of that initial ac_cs_config= in the line from config.status. That's easy with sed:

$ grep '^ac_cs_config' config.status | sed -e 's/ac_cs_config=//'
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

But since we're using sed anyway, there's no need to use grep to get the line: we can do it all with sed. First try:

sed -n '/^ac_cs_config/s/ac_cs_config=//p' config.status

Search for the line that starts with ac_cs_config (^ matches the beginning of a line); then replace ac_cs_config= with nothing, and p print the resulting line. -n tells sed not to print anything except when told to with a p.

But it turns out that if you give a sed substitution a blank pattern, it uses the last pattern it was given. So a more compact version, using the search pattern ^ac_cs_config, is:

sed -n '/^ac_cs_config=/s///p' config.status

But there's also another way of doing it:

sed '/^ac_cs_config=/!d;s///' config.status

! after a search pattern matches every line that doesn't match the pattern. d deletes those lines. Then for lines that weren't deleted (the one line that does match), do the substitution. Since there's no -n, sed will print all lines that weren't deleted.

I find that version more difficult to read. But I'm including it because it's useful to know how to chain several commands in sed, and how to use ! to search for lines that don't match a pattern.

You can also use sed to eliminate the double quotes:

sed '/^ac_cs_config=/!d;s///;s/"//g' config.status
'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
But it turns out that zsh has a better way of doing that.

Zsh parameter substitution

I'm still relatively new to zsh, but I got some great advice on #zsh. The first suggestion:

sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

I'll be using final print -rl - $args for all these examples: it prints an array variable with one member per line. For the actual distclean function, of course, I'll be passing the variable to, not printing it out.

First, let's look at the heart of that expression: the args=( ${(Q)${(z)${(Q)REPLY}}}.

The heart of this is the expression ${(Q)${(z)${(Q)x}}} The zsh parameter substitution syntax is a bit arcane, but each of the parenthesized letters does some operation on the variable that follows.

The first (Q) strips off a level of quoting. So:

$ x='"Hello world"'; print $x; print ${(Q)x}
"Hello world"
Hello world

(z) splits an expression and stores it in an array. But to see that, we have to use print -l, so array members will be printed on separate lines.

$ x="a b c"; print -l $x; print "....."; print -l ${(z)x}
a b c

Zsh is smart about quotes, so if you have quoted expressions it will group them correctly when assigning array members:

x="'a a' 'b b' 'c c'"; print -l $x; print "....."; print -l ${(z)x} 'a a' 'b b' 'c c' ..... 'a a' 'b b' 'c c'

So let's break down the larger expression: this is best read from right to left, inner expressions to outer.

${(Q) ${(z) ${(Q) x }}}
   |     |     |   \
   |     |     |    The original expression, 
   |     |     |   "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
   |     |     \
   |     |      Strip off the double quotes:
   |     |      '--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
   |     \
   |      Split into an array of three items
    Strip the single quotes from each array member,
    ( --prefix=/usr/local/gimp-git --enable-foo --disable-bar )

For more on zsh parameter substitutions, see the Zsh Guide, Chapter 5: Substitutions.

Passing the sed results to the parameter substitution

There's still a little left to wonder about in our expression, sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

The IFS= read -r seems to be a common idiom in zsh scripting. It takes standard input and assigns it to the variable $REPLY. IFS is the input field separator: you can split variables into words by spaces, newlines, semicolons or any other character you want. IFS= sets it to nothing. But because the input expression -- "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'" -- has quotes around it, IFS is ignored anyway.

So you can do the same thing with this simpler expression, to assign the quoted expression to the variable $x. I'll declare it a local variable: that makes no difference when testing it in the shell, but if I call it in a function, I won't have variables like $x and $args cluttering up my shell afterward.

local x=$(sed -n '/^ac_cs_config=/s///p' config.status); local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

That works in the version of zsh I'm running here, 5.1.1. But I've been warned that it's safer to quote the result of $(). Without quotes, if you ever run the function in an older zsh, $x might end up being set only to the first word of the expression. Second, it's a good idea to put "local" in front of the variable; that way, $x won't end up being set once you've returned from the function. So now we have:

local x="$(sed -n '/^ac_cs_config=/s///p' config.status)"; local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

You don't even need to use a local variable. For added brevity (making the function even more difficult to read! -- but we're way past the point of easy readability), you could say:

args=( ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}} ); print -rl - $args
or even
print -rl - ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}}
... but that final version, since it doesn't assign to a variable at all, isn't useful for the function I'm writing.

Tags: , , , ,
[ 13:25 Dec 04, 2015    More linux/cmdline | permalink to this entry | comments ]

Fri, 27 Nov 2015

Getting around make clean or make distclean aclocal failures

Keeping up with source trees for open source projects, it often happens that you pull the latest source, type make, and get an error like this (edited for brevity):

$ make
cd . && /bin/sh ./missing --run aclocal-1.14
missing: line 52: aclocal-1.14: command not found
WARNING: aclocal-1.14' is missing on your system. You should only need it if you modified acinclude.m4' or'. You might want to install the Automake' and Perl' packages. Grab them from any GNU archive site.

What's happening is that make is set up to run ./ (similar to running ./configure except it does some other stuff tailored to people who build from the most current source tree) automatically if anything has changed in the tree. But if the version of aclocal has changed since the last time you ran or configure, then running configure with the same arguments won't work.

Often, running a make distclean, to clean out all local configuration in your tree and start from scratch, will fix the problem. A simpler make clean might even be enough. But when you try it, you get the same aclocal error.

Whoops! make clean runs make, which triggers the rule that configure has to run before make, which fails.

It would be nice if the make rules were smart enough to notice this and not require configure or autogen if the make target is something simple like clean or distclean. Alas, in most projects, they aren't.

But it turns out that even if you can't run with your usual arguments -- e.g. ./ --prefix=/usr/local/gimp-git -- running ./ by itself with no extra arguments will often fix the problem.

This happens to me often enough with the GIMP source tree that I made a shell alias for it:

alias distclean="./ && ./configure && make clean"

Saving your configure arguments

Of course, this wipes out any arguments you've previously passed to autogen and configure. So assuming this succeeds, your very next action should be to run autogen again with the arguments you actually want to use, e.g.:

./ --prefix=/usr/local/gimp-git

Before you ran the distclean, you could get those arguments by looking at the first few lines of config.log. But after you've run distclean, config.log is gone -- what if you forgot to save the arguments first? Or what if you just forget that you need to re-run again after your distclean?

To guard against that, I wrote a somewhat more complicated shell function to use instead of the simple alias I listed above.

The first trick is to get the arguments you previously passed to configure. You can parse them out of config.log:

$ egrep '^  \$ ./configure' config.log
  $ ./configure --prefix=/usr/local/gimp-git --enable-foo --disable-bar

Adding a bit of sed to strip off the beginning of the command, you could save the previously used arguments like this:

    args=$(egrep '^  \$ ./configure' config.log | sed 's_^  \$ ./configure __')

(There's a better place for getting those arguments, config.status -- but parsing them from there is a bit more complicated, so I'll follow up with a separate article on that, chock-full of zsh goodness.)

So here's the distclean shell function, written for zsh:

distclean() {
    setopt localoptions errreturn

    args=$(egrep '^  \$ ./configure' config.log | sed 's_^  \$ ./configure __')
    echo "Saved args:" $args
    make clean

    echo "==========================="
    echo "Running ./ $args"
    sleep 3
    ./ $args

The setopt localoptions errreturn at the beginning is a zsh-ism that tells the shell to exit if there's an error. You don't want to forge ahead and run configure and make clean if your didn't work right. errreturn does much the same thing as the && between the commands in the simpler shell alias above, but with cleaner syntax.

If you're using bash, you could string all the commands on one line instead, with && between them, something like this: ./ && ./configure && make clean && ./ $args Or perhaps some bash user will tell me of a better way.

Tags: , , ,
[ 13:33 Nov 27, 2015    More programming | permalink to this entry | comments ]

Sat, 28 Dec 2013

Finding filenames in a disorganized directory

I've been scanning a bunch of records with Audacity (using as a guide Carla Schroder's excellent Book of Audacity and a Behringer UCA222 USB audio interface -- audacity doesn't seem able to record properly from the built-in sound card on any laptop I own, while it works fine with the Behringer.

Audacity's user interface isn't great for assembly-line recording of lots of tracks one after the other, especially on a laptop with a trackpad that doesn't work very well, so I wasn't always as organized with directory names as I could have been, and I ended up with a mess. I was periodically backing up the recordings to my desktop, but as I shifted from everything-in-one-directory to an organized system, the two directories got out of sync.

To get them back in sync, I needed a way to answer this question: is every file inside directory A (maybe in some subdirectory of it) also somewhere under subdirectory B? In other words, can I safely delete all of A knowing that anything in it is safely stored in B, even though the directory structures are completely different?

I was hoping for some clever find | xargs way to do it, but came up blank. So eventually I used a little zsh loop: one find to get the list of files to test, then for each of those, another find inside the target directory, then test the exit code of find to see if it found the file. (I'm assuming that if the songname.aup file is there, the songname_data directory is too.)

for fil in $(find AAA/ -name '*.aup'); do
  fil=$(basename $fil)
  find BBB -name $fil >/dev/null
  if [[ $? != 0 ]]; then
    echo $fil is not in BBB

Worked fine. But is there an easier way?

Tags: , , ,
[ 10:36 Dec 28, 2013    More linux/cmdline | permalink to this entry | comments ]

Sat, 24 Aug 2013

A nifty shell redirection trick: process substitution

I love shell pipelines, and flatter myself that I'm pretty good at them. But a discussion last week on the Linuxchix Techtalk mailing list on finding added lines in a file turned up a terrific bash/zsh shell redirection trick I'd never seen before:

join -v 2 <(sort A.txt) <(sort B.txt)

I've used backquotes, and their cognate $(), plenty. For instance, you can do things like PS1=$(hostname): or PS1=`hostname`: to set your prompt to the current hostname: the shell runs the hostname command, takes its output, and substitutes that output in place of the backquoted or parenthesized expression.

But I'd never seen that <(...) trick before, and immediately saw how useful it was. Backquotes or $() let you replace arguments to a command with a program's output -- they're great for generating short strings for programs that take all their arguments on the command line. But they're no good for programs that need to read a file, or several files. <(...) lets you take the output of a command and pass it to a program as though it was the contents of a file. And if you can do it more than once in the same command -- as in Little Girl's example -- that could be tremendously useful.

Playing with it to see if it really did what it looked like it did, and what other useful things I could do with it, I tried this (and it worked just fine):

$ diff <(echo hello; echo there) <(echo hello; echo world)
< there
> world
It acts as though I had two files, which each have "hello" as their first line; but one has "there" as the second line, while the other has "world". And diff shows the difference. I don't think there's any way of doing anything like that with backquotes; you'd need to use temp files.

Of course, I wanted to read more about it -- how have I gone all these years without knowing about this? -- and it looks like I'm not the only one who didn't know about it. In fact, none of the pages I found on shell pipeline tricks even mentioned it.

It turns out it's called "process substitution" and I found it documented in Chapter 23 of the Advanced Bash-Scripting Guide.

I tweeted it, and a friend who is a zsh master gave me some similar cool tricks. For instance, in zsh echo hi > >(cat) > >(cat -n) lets you pipe the output of a command to more than one other command.

That's zsh, but in bash (or zsh too, of course), you can use >() and tee to do the same thing: echo hi | tee >(cat) | cat -n

If you want a temp file to be created automatically, one you can both read and write, you can use =(foo) (zsh only?)

Great stuff! Some other pages that discuss some of these tricks:

Tags: , , ,
[ 19:23 Aug 24, 2013    More linux/cmdline | permalink to this entry | comments ]

Sat, 15 Jun 2013

Autocompleting xchat channel log filenames in zsh

Sometimes zsh is a little too smart for its own good.

Something I do surprisingly often is to complete the filenames for my local channel logs in xchat. Xchat gives its logs crazy filenames like /home/akkana/.xchat2/xchatlogs/FreeNode-#ubuntu-us-ca.log. They're hard to autocomplete -- I have to type something like: ~/.xc<tab>xc<tab>l<tab>Fr<tab>\#ub<tab>us<tab> Even with autocompletion, that's a lot of typing!

Bug zsh makes it even worse: I have to put that backslash in front of the hash, \#, or else zsh will see it either as a comment (unless I unsetopt interactivecomments, in which case I can't paste functions from my zshrc when I'm testing them); or as an extended regular expression (unless I unsetopt extendedglob). I don't want to unset either of those options: I use both of them.

Tonight I was fiddling with something else related to extendedglob, and was moved to figure out another solution to the xchat completion problem. Why not get zsh's smart zle editor to insert most of that annoying, not easily autocompletable string for me?

The easy solution was to bind it to a function key. I picked F8 for testing, and figured out its escape sequence by typing echo , then Ctrl-V, then hitting F8. It turns out to insert <ESC>[20~. So I made a binding:

bindkey -s '\e[20~' '~/.xchat2/xchatlogs/ \\\#^B^B^B'

When I press F8, that inserts the following string:

~/.xchat2/xchatlogs/ \#
                    ↑ (cursor ends up here)
... moving the cursor back three characters, so it's right before the space. The space is there so I can autocomplete the server name by typing something like Fr<TAB> for FreeNode. Then I delete the space (Ctrl-D), go to the end of the line (Ctrl-E), and start typing my channel name, like ubu<TAB>us<TAB>. I don't have to worry about typing the rest of the path, or the escaped hash sign.

That's pretty cool. But I wished I could bind it to a character sequence, like maybe .xc, rather than using a function key. (I could use my Crikey program to do that at the X level, but that's cheating; I wanted to do it within zsh.) You can't just use bindkey -s '.xch' '~/.xchat2/xchatlogs/ \\\#^B^B^B' because it's recursive: as soon as zsh inserts the ~/.xc part, that expands too, and you end up with ~/~/.xchat2/xchatlogs/hat2/xchatlogs/ \# \#.

The solution, though it's a lot more lines, is to use the special variables LBUFFER and RBUFFER. LBUFFER is everything left of the cursor position, and RBUFFER everything right of it. So I define a function to set those, then set a zle "widget" to that function, then finally bindkey to that widget:

function autoxchat()
zle -N autoxchat
bindkey ".xc" autoxchat

Pretty cool! The only down side: now that I've gone this far in zle bindings, I'm probably an addict and will waste a lot more time tweaking them.

Tags: , ,
[ 21:31 Jun 15, 2013    More linux/cmdline | permalink to this entry | comments ]

Wed, 15 Aug 2012

Getting ls to show symlinks (and stripping terminal slashes in shells)

The Linux file listing program, ls, has been frustrating me for some time with its ever-changing behavior on symbolic links.

For instance, suppose I have a symlink named Maps that points to a directory on another disk called /data/Maps. If I say ls ~/Maps, I might want to see where the link points:

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
or I might equally want to see the contents of the /data/Maps directory.

Many years ago, the Unix ls program magically seemed to infer when I wanted to see the link and what it points to, versus when I wanted to see the contents of the directory the link points to. I'm not even sure any more what the rule was; just that I was always pleasantly surprised that it did what I wanted. Now, in modern Linux, it usually manages to do the opposite of what I want. But the behavior has changed several times until, I confess, I'm no longer even sure of what I want it to do.

So if I'm not sure whether I usually want it to show the symlink or follow it ... why not make it do both?

There's no ls flag that will do that. But that's okay -- I can make a shell function to do what I want..

Current ls flags

First let's review man ls to see the relevant flags we do have, searching for the string "deref".

I find three different flags to tell ls to dereference a link: -H (dereference any link explicitly mentioned on the command line -- even though ls does that by default); --dereference-command-line-symlink-to-dir (do the same if it's a directory -- even though -H already does that, and even though ls without any flags also already does that); and -L (dereference links even if they aren't mentioned on the command line). The GNU ls maintainers are clearly enamored with dereferencing symlinks.

In contrast, there's one flag, -d, that says not to dereference links (when used in combination with -l). And -d isn't useful in general (you can't make it part of a normal ls alias) because -d also has another, more primary meaning: it also prevents you from listing the contents of normal, non-symlinked directories.

Solution: a shell function

Let's move on to the problem of how to show both the link information and the dereferenced file.

Since there's no ls flag to do it, I'll have to do it by looping over the arguments of my shell function. In a shell test, you can use -h to tell if a file is a symlink. So my first approach was to call ls -ld on all the symlinks to show what the point to:

ll() {
    /bin/ls -laFH $*
    for f in $*; do
        if [[ -h $f ]]; then
            echo -n Symlink:
            /bin/ls -ld $f

Terminally slashed

That worked on a few simple tests. But when I tried to use it for real I hit another snag: terminal slashes.

In real life, I normally run this with autocompletion. I don't type ll ~/Maps -- I'm more likely to type like ll Ma<tab> -- the tab looks for files beginning with Ma and obligingly completes it as Maps/ -- note the slash at the end.

And, well, it turns out /bin/ls -ld Maps/ no longer shows the symlink, but derefernces it instead -- yes, never mind that the man page says -d won't dereference symlinks. As I said, those ls maintainers really love dereferencing.

Okay, so if I want to not dereference, since there's no ls flag that means really don't dereference, I mean it -- my little zsh function needs to find a way of stripping any terminal slash on each directory name. Of course, I could do it with sed:

        f=`echo $f | sed 's/\/$//'`
and that works fine, but ... ick. Surely zsh has a better way?

In fact, there's a better way that even works in bash (thanks to zsh wizard Mikachu for this gem):


That "remove terminal slash" trick has already come in handy in a couple of other shell functions I use -- definitely a useful trick if you use autocompletion a lot.

Making the link line more readable

But wait: one more tweak, as long as I'm tweaking. That long ls -ld line,

lrwxrwxrwx   1 akkana users              12 Jun 17  2009 Maps -> /data/Maps/
is way too long and full of things I don't really care about (the permissions, ownership and last-modified date on a symlink aren't very interesting). I really only want the last three words,
Maps -> /data/Maps/

Of course I could use something like awk to get that. But zsh has everything -- I bet it has a clever way to separate words.

And indeed it does: arrays. The documentation isn't very clear and not all the array functions worked as the docs implied, but here's what ended up working: you can set an array variable by using parentheses after the equals sign in a normal variable-setting statement, and after that, you can refer to it using square brackets. You can even use negative indices, like in python, to count back from the end of an array. That made it easy to do what I wanted:

            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[-3,-1]

Hooray zsh! Though it turned out that -3 didn't work for directories with spaces in the name, so I had to use [9, -1] instead. The echo -E is to prevent strange things happening if there are things like backslashes in the filename.

The completed shell function

I moved the symlink-showing function into a separate function, so I can call it from several different ls aliases, and here's the final result:

show_symlinks() {
    for f in $*; do
        # Remove terminal slash.
        if [[ -h $f ]]; then
            line=( $(/bin/ls -ld $f ) )
            echo -E Symlink: $line[9,-1]

ll() {
    /bin/ls -laFH $*
    show_symlinks $*

Bash doesn't have arrays like zsh, so replace those two lines with

            echo -n 'Symlink: '
            /bin/ls -ld $f | cut -d ' ' -f 10-
and the rest of the function should work just fine.

Tags: , , ,
[ 20:22 Aug 15, 2012    More linux/cmdline | permalink to this entry | comments ]

Sat, 03 Sep 2011

List only directories

Fairly often, I want a list of subdirectories inside a particular directory. For instance, when posting blog entries, I may need to decide whether an entry belongs under "linux" or some sub-category, like "linux/cmdline" -- so I need to remind myself what categories I have under linux.

But strangely, Linux offers no straightforward way to ask that question. The ls command lists directories -- along with the files. There's no way to list just the directories. You can list the directories first, with the --group-directories-first option. Or you can flag the directories specially: ls -F appends a slash to each directory name, so instead of linux you'd see linux/. But you still have to pick the directories out of a long list of files. You can do that with grep, of course:

ls -1F ~/web/blog/linux | grep /
That's a one, not an ell: it tells ls to list files one per line. So now you get a list of directories, one per line, with a slash appended to each one. Not perfect, but it's a start.

Or you can use the find program, which has an option -type d that lists only directories. Perfect, right?

find ~/web/blog/linux -maxdepth 1 -type d

Except that lists everything with full pathnames: /home/akkana/web/blog/linux, /home/akkana/web/blog/linux/editors, /home/akkana/web/blog/linux/cmdline and so forth. Way too much noise to read quickly.

What I'd really like is to have just a list of directory names -- no slashes, no newlines. How do we get from ls or find output to that? Either we can start with find and strip off all the path information, either in a loop with basename or with a sed command; or start with ls -F, pick only the lines with slashes, then strip off those slashes. The latter sounds easier.

So let's go back to that ls -1F ~/web/blog/linux | grep / command. To strip off the slashes, you can use sed's s (substitute) command. Normally the syntax is sed 's/oldpat/newpat/'. But since slashes are the pattern we're substituting, it's better to use something else as the separator character. I'll use an underscore.

The old pattern, the one I want to replace, is / -- but I only want to replace the last slash on the line, so I'll add a $ after it, representing end-of-line. The new pattern I want instead of the slash is -- nothing.

So my sed argument is 's_/$__' and the command becomes:

ls -1F ~/web/blog/linux | grep / | sed 's_/$__'

That does what I want. If I don't want them listed one per line, I can fudge that using backquotes to pass the output of the whole command to the shell's echo command:

echo `ls -1F ~/web/blog/linux | grep / | sed 's_/$__'`

If you have a lot of directories to list and you want ls's nice columnar format, that's a little harder. You can ls the list of directories (the names inside the backquotes), ls `your long command` -- except that now that you've stripped off the path information, ls won't know where to find the files. So you'd have to change directory first:

cd ~/web/blog/linux; ls -d `ls -1F | grep / | sed 's_/$__'`

That's not so good, though, because now you've changed directories from wherever you were before. To get around that, use parentheses to run the commands inside a subshell:

(cd ~/web/blog/linux; ls -d `ls -1F | grep / | sed 's_/$__'`)

Now the cd only applies within the subshell, and when the command finishes, your own shell will still be wherever you started.

Finally, I don't want to have to go through this discovery process every time I want a list of directories. So I turned it into a couple of shell functions, where $* represents all the arguments I pass to the command, and $1 is just the first argument.

lsdirs() { 
  (cd $1; /bin/ls -d `/bin/ls -1F | grep / | sed 's_/$__'`)

lsdirs2() { 
  echo `/bin/ls -1F $* | grep / | sed 's_/$__'` 
I specify /bin/ls because I have a function overriding ls in my .zshrc. Most people won't need to, but it doesn't hurt.

Now I can type lsdirs ~/web/blog/linux and get a nice list of directories.

Update, shortly after posting: In zsh (which I use), there's yet another way: */ matches only directories. It appends a trailing slash to them, but *(/) matches directories and omits the trailing slash. So you can say

echo ~/web/blog/linux/*(/:t)
:t strips the directory part of each match. To see other useful : modifiers, type ls *(: then hit TAB.

Thanks to Mikachu for the zsh tips. Zsh can do anything, if you can just figure out how ...

Tags: , , ,
[ 11:22 Sep 03, 2011    More linux/cmdline | permalink to this entry | comments ]

Tue, 15 Mar 2011

Using grep to solve another Cartalk puzzler

It's another episode of "How to use Linux to figure out CarTalk puzzlers"! This time you don't even need any programming.

Last week's puzzler was A Seven-Letter Vacation Curiosity. Basically, one couple hiking in Northern California and another couple carousing in Florida both see something described by a seven-letter word containing all five vowels -- but the two things they saw were very different. What's the word?

That's an easy one to solve using basic Linux command-line skills -- assuming the word is in the standard dictionary. If it's some esoteric word, all bets are off. But let's try it and see. It's a good beginning exercise in regular expressions and how to use the command line.

There's a handy word list in /usr/share/dict/words, one word per line. Depending on what packages you have installed, you may have bigger dictionaries handy, but you can usually count on /usr/share/dict/words being there on any Linux system. Some older Unix systems may have it in /usr/dict/words instead.

We need a way to choose all seven letter words. That's easy. In a regular expression, . (a dot) matches one letter. So ....... (seven dots) matches any seven letters.

(There's a more direct way to do that: the expression .\{7\} will also match 7 letters, and is really a better way. But personally, I find it harder both to remember and to type than the seven dots. Still, if you ever need to match 43 characters, or 114, it's good to know the "right" syntax.)

Fine, but if you grep ....... /usr/share/dict/words you get a list of words with seven or more letters. See why? It's because grep prints any line where it finds a match -- and a word with nine letters certainly contains seven letters within it.

The pattern you need to search for is '^.......$' -- the up-caret ^ matches the beginning of a line, and the dollar sign $ matches the end. Put single quotes around the pattern so the shell won't try to interpret the caret or dollar sign as special characters. (When in doubt, it's always safest to put single quotes around grep patterns.)

So now we can view all seven-letter words: grep '^.......$' /usr/share/dict/words
How do we choose only the ones that contain all the letters a e i o and u?

That's easy enough to build up using pipelines, using the pipe character | to pipe the output of one grep into a different grep. grep '^.......$' /usr/share/dict/words | grep a sends that list of 7-letter words through another grep command to make sure you only see words containing an a.

Now tack a grep for each of the other letters on the end, the same way:
grep '^.......$' /usr/share/dict/words | grep a | grep e | grep i | grep o | grep u

Voilà! I won't spoil the puzzler, but there are two words that match, and one of them is obviously the answer.

The power of the Unix command line to the rescue!

Tags: , , , , ,
[ 11:00 Mar 15, 2011    More linux/cmdline | permalink to this entry | comments ]

Fri, 02 Apr 2010

Article: Making Bash Error Messages Friendlier

A while back I worked on an error handler for bash that made the shell a lot friendlier for newbies (or anyone else, really).

Linux Planet gave me the chance to write it up in more detail, explaining a bit more about how it works: Making Bash Error Messages Friendlier.

Tags: , ,
[ 17:17 Apr 02, 2010    More writing | permalink to this entry | comments ]

Fri, 27 Nov 2009

Tip: Bash remembering history across sessions

Two separate friends just had this problem, one of them a fairly experienced Linux user:

You're in bash, history works, but it's not remembered across sessions. Why?

Maybe the size of the history file somehow got set to zero?

Nope -- that's not it.

Maybe it's using the wrong file. In bash you can set $HISTFILE to point to different places; for instance, you can use that to maintain different histories per window, or per machine.

$ echo $HISTFILE
Nope, that's not it either.

The problem, for both people, turned out to be really simple:

$ ls -l $HISTFILE
-rw------- 1 root root 92 2007-08-20 14:03 /home/user/.bash_history

I'm not sure how it happens, but sometimes the .bash_history file becomes owned by root, and then as a normal user you can't update your history any more.

So a simple

and you're all set -- history across sessions should start working again.

Tags: , ,
[ 14:42 Nov 27, 2009    More linux/cmdline | permalink to this entry | comments ]

Sun, 08 Nov 2009

Friendlier error messages for shell newbies

Helping people get started with Linux shells, I've noticed they tend to make two common mistakes vastly more than any others:
  1. Typing a file path without a slash, like etc/fstab
  2. typing just a filename, without a command in front of it

The first boils down to a misunderstanding of how the Linux file system hierarchy works. (For a refresher, you might want to check out my Linux Planet article Navigating the Linux Filesystem.)

The second problem is due to forgetting the rules of shell grammar. Every shell sentence needs a verb, just like every sentence in English. In the shell, the command is the verb: what do you want to do? The arguments, if any, are the verb's direct object: What do you want to do it to?

(For grammar geeks, there's no noun phrase for a subject because shell commands are imperative. And yes, I ended a sentence with a preposition, so go ahead and feel superior if you believe that's incorrect.)

The thing is, both mistakes are easy to make, especially when you're new to the shell, perhaps coming from a "double-click on the file and let the computer decide what you should do with it" model. The shell model is a lot more flexible and (in my opinion) better -- you, not the computer, gets to decide what you should do with each file -- but it does take some getting used to.

But as a newbie, all you know is that you type a command and get some message like "Permission denied." Why was permission denied? How are you to figure out what the real problem was? And why can't the shell help you with that?

And a few days ago I realized ... it can! Bash, zsh and similar shells have a fairly flexible error handling mechanism. Ubuntu users have seen one part of this, where if you type a command you don't have installed, Ubuntu gives you a fancy error message suggesting what you might have meant and/or what package you might be missing:

$ catt /etc/fstab
No command 'catt' found, did you mean:
 Command 'cat' from package 'coreutils' (main)
 Command 'cant' from package 'swap-cwm' (universe)
catt: command not found

What if I tapped into that same mechanism and wrote a more general handler that could offer helpful suggestions when it looked like the user forgot the command or the leading slash?

It turns out that Ubuntu's error handler uses a ridiculously specific function called command_not_found_handle that can't be used for other errors. Some helpful folks I chatted with on #bash felt, as I did, that such a specific mechanism was silly. But they pointed me to a more general error trapping mechanism that turned out to work fine for my purposes.

It took some fussing and fighting with bash syntax, but I have a basic proof-of-concept. Of course it could be expanded to cover a lot more types of error cases -- and more types of files the user might want to open.

Here are some sample errors it catches:

$ schedule.html
bash: ./schedule.html: Permission denied

schedule.html is an HTML file. Did you want to run: firefox schedule.html

$ screenshot.jpg
bash: ./screenshot.jpg: Permission denied

screenshot.jpg is an image file. Did you want to run:
    pho screenshot.jpg
    gimp screenshot.jpg

$ .bashrc
bash: ./.bashrc: Permission denied

.bashrc is a text file. Did you want to run:
    less .bashrc
    vim .bashrc

$ ls etc/fstab
/bin/ls: cannot access etc/fstab: No such file or directory

Did you forget the leading slash?
etc/fstab doesn't exist, but /etc/fstab does.

You can find the code here: Friendly shell errors and of course I'm happy to take suggestions or contributions for how to make it friendlier to new shell users.

Tags: , , , ,
[ 15:07 Nov 08, 2009    More linux | permalink to this entry | comments ]

Thu, 12 Mar 2009

Linux Planet: Basic Shell Scripting

I'm beginning a programming series on Linux Planet, starting with a basic intro to shell scripting for people with no programming experience: Intro to Shell Programming: Writing a Simple Web Gallery

(For those inclined, digg and reddit links).

Tags: , ,
[ 23:08 Mar 12, 2009    More writing | permalink to this entry | comments ]

Sat, 28 Feb 2009

langgrep: search only in scripts of a specified language

I was making a minor tweak to my garmin script that uses gpsbabel to read in tracklogs and waypoints from my GPS unit, and I needed to look up the syntax of how to do some little thing in sh script. (One of the hazards of switching languages a lot: you forget syntax details and have to look things up a lot, or at least I do.)

I have quite a collection of scripts in various languages in my ~/bin (plus, of course, all the scripts normally installed in /usr/bin on any Linux machine) so I knew I'd have lots of examples. But there are scripts of all languages sharing space in those directories; it's hard to find just sh examples. For about the two-hundredth time, I wished, "Wouldn't it be nice to have a command that can search for patterns only in files that are really sh scripts?"

And then, the inevitable followup ... "You know, that would be really easy to write."

So I did -- a little python hack called langgrep that takes a language, grep arguments and a file list, looks for a shebang line and only greps the files that have a shebang matching the specified language.

Of course, while writing langgrep I needed langgrep, to look up details of python syntax for things like string.find (I can never remember whether it's string.find(s, pat) or s.find(pat); the python libraries are usually nicely object-oriented but strings are an exception and it's the former, string.find). I experimented with various shell options -- this is Unix, so of course there are plenty of ways of doing this in the shell, without writing a script. For instance:

grep find `egrep -l '#\\!.*python' *`
grep find `file * | grep python | sed 's/:.*//'`
i in foo; file $i|grep python && grep find $i; done    # in sh/bash
These are all pretty straightforward, but when I try to make them into tcsh aliases things get a lot trickier. tcsh lets you make aliases that take arguments, so you can use !:1 to mean the first argument, !2-$ to mean all the arguments starting with the second one. That's all very well, but when you put them into a shell alias in a file like .cshrc that has to be parsed, characters like ! and $ can mean other things as well, so you have to escape them with \. So the second of those three lines above turns into something like
alias greplang "grep \!:2-$ `file * | grep \!:1 | sed 's/:.*//'`"
except that doesn't work either, so it probably needs more escaping somewhere. Anyway, I decided after a little alias hacking that figuring out the right collection of backslash escapes would probably take just as long as writing a python script to do the job, and writing the python script sounded more fun.

So here it is: my langgrep script. (Awful name, I know; better ideas welcome!) Use it like this (if python is the language you're looking for, find is the search pattern, and you want -w to find only "find" as a whole word):

langgrep python -w find ~/bin/*

Tags: , ,
[ 10:57 Feb 28, 2009    More programming | permalink to this entry | comments ]

Thu, 26 Feb 2009

Linux Planet article: DotDot and Permissions

Probably the last in the commandline series (at least for a while, today's article covers the meaning of . and .. in Linux pathnames, and how to tell what permissions a file has.

Sharing Files in Linux and Understanding Pathnames

Tags: , ,
[ 22:26 Feb 26, 2009    More writing | permalink to this entry | comments ]

Thu, 12 Feb 2009

Navigating the Linux filesystem

My latest Linux Planet article covers how to find your way around the Linux filesystem in the command-line, for anyone who wants to graduate from file managers and start using the shell.

Navigating the Linux Filesystem (and the Digg link for those so inclined).

Tags: , ,
[ 18:42 Feb 12, 2009    More writing | permalink to this entry | comments ]

Mon, 22 Dec 2008

What is a shell, anyway?

Continuing the basic Linux command-line tutorial series, a discussion of the difference between a terminal window and a shell: The Linux Command Shell For Beginners: What is the Shell?

(Digg link, for those who digg).

Tags: , ,
[ 17:16 Dec 22, 2008    More writing | permalink to this entry | comments ]

Fri, 12 Dec 2008

Navigating the Linux filesystem

My latest Linux Planet article covers how to find your way around the Linux filesystem in the command-line, for anyone who wants to graduate from file managers and start using the shell.

Navigating the Linux Filesystem (and the Digg link for those so inclined).

Tags: , ,
[ 13:49 Dec 12, 2008    More writing | permalink to this entry | comments ]

Sun, 12 Oct 2008

More fun with regexps: Adding "[no output]" in shell logs

Someone on LinuxChix' techtalk list asked whether she could get tcsh to print "[no output]" after any command that doesn't produce output, so that when she makes logs to help her co-workers, they will seem clearer.

I don't know of a way to do that in any shell (the shell would have to capture the output of every command; emacs' shell-mode does that but I don't think any real shells do) but it seemed like it ought to be straightforward enough to do as a regular expression substitute in vi. You're looking for lines where a line beginning with a prompt is followed immediately by another line beginning with a prompt; the goal is to insert a new line consisting only of "[no output]" between the two lines.

It turned out to be pretty easy in vim. Here it is:

:%s/\(^% .*$\n\)\(% \)/\1[no results]\r\2/


starts a command
do the following command on every line of this file
start a global substitute command
start a "capture group" -- you'll see what it does soon
match only patterns starting at the beginning of a line
look for a % followed by a space (your prompt)
after the prompt, match any other characters until...
the end of the line, after which...
there should be a newline character
end the capture group after the newline character
start a second capture group
look for another prompt. In other words, this whole
expression will only match when a line starting with a prompt
is followed immediately by another line starting with a prompt.
end the second capture group
We're finally done with the mattern to match!
Now we'll start the replacement pattern.
Insert the full content of the first capture group
(this is also called a "backreference" if you want
to google for a more detailed explanation).
So insert the whole first command up to the newline
after it.
[no results]
After the newline, insert your desired string.
insert a carriage return here (I thought this should be
\n for a newline, but that made vim insert a null instead)
insert the second capture group (that's just the second prompt)
end of the substitute pattern

Of course, if you have a different prompt, substitute it for "% ". If you have a complicated prompt that includes time of day or something, you'll have to use a slightly more complicated match pattern to match it.

Tags: , , , ,
[ 14:34 Oct 12, 2008    More linux/editors | permalink to this entry | comments ]

Sun, 31 Aug 2008

Useful shell pipeline: Who checks in how much?

I wanted to get a list of who'd been contributing the most in a particular open source project. Most projects of any size have a ChangeLog file, in which check-ins have entries like this:
2008-08-26  Jane Hacker  <>

        * src/app/print.c: make sure the Portrait and Landscape
        * buttons update according to the current setting.

I wanted to take each entry, save the name of the developer checking in, then eventually count the number of times each name occurs (the number of times that developer checked in) and print them in order from most check-ins to least.

Getting the names is easy: for check-ins in the last 9 years, I just want the lines that start with "200". (Of course, if I wanted earlier check-ins I could make the match more general.)

grep "^200" ChangeLog

But now I want to trim the line so it includes only the contributor's name. A bit of sed geekery can do that: the date is a fixed format (four characters, a dash, two, dash, two, then two spaces, so "^....-..-.. " matches that pattern.

But I want to remove the email address part too (sometimes people use different email addresses when they check in). So I want a sed pattern that will match something at the front (to discard), something in the middle (keep that part) and something at the end (discard).

Here's how to do that in sed:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)<.*$/\1/'
In English, that says: "For each line in the ChangeLog that starts with 200, find a pattern at the beginning consisting of any four characters, a dash, two characters, dash, two characters, dash, and two spaces; then immediately after that, save all characters up to a < symbol; then throw away the < and any characters that follow until the end of the line."

That works pretty well! But it's not quite right: it includes the two spaces after the name as part of the name. In sed, \s matches any space character (like space or tab). So you'd think this should work:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\)\s+<.*$/\1/'
\s+ means it will require that at least one and maybe more space characters immediately before the < are also discarded. But it doesn't work. It turns out the reason is that the \(.*\) expression is "greedier" than the \s+: so the saved name expression grabs the first space, leaving only the second to the \s+.

The way around that is to make the name expression specify that it can't end with a space. \S is the term for "anything that's not a space character"; so the expression becomes

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s\+<.*$/\1/'
(the + turned out to need a backslash before it).

We have the list of names! Add a | sort on the end to sort them alphabetically -- that will make sure you get all the "Jane Hacker" lines listed together. But how to count them? The Unix program most frequently invoked after sort is uniq, which gets rid of all the repeated lines. On a hunch, I checked out the man page, man uniq, and found the -c option: "prefix lines by the number of occurrences". Perfect! Then just sort them by the number, from largest to smallest:

grep "^200" ChangeLog | sed 's/^....-..-..  \(.*\S\)\s+<.*$/\1/' | sort | uniq -c | sort -rn
And we're done!

Now, this isn't perfect since it doesn't catch "Checking in patch contributed by" attributions -- but those aren't in a standard format in most projects, so they have to be handled by hand.

Disclaimer: Of course, number of check-ins is not a good measure of how important or productive someone is. You can check in a lot of one-line fixes, or you can write an important new module and submit it for someone else to merge in. The point here wasn't to rank developers, but just to get an idea who was checking into the tree and how often.

Well, that ... and an excuse to play with nifty Linux shell pipelines.

Tags: , , , ,
[ 12:12 Aug 31, 2008    More linux | permalink to this entry | comments ]

Wed, 07 Nov 2007

Tried bash, went back to tcsh

I've been a tcsh user for many years. Back in the day, there were lots of reasons for preferring csh to sh, mostly having to do with command history. Most of those reasons are long gone -- modern bash and tcsh are vastly improved from those early shells, and borrow from each other, so the differences are fairly minor.

Back in July, I solved the last blocker that had been keeping me off bash, so I put some effort into migrating all my .cshrc settings into a .bashrc so I could give bash a fair shot. It almost won; but after four months, I've switched back to tcsh, due mostly to a single niggling bash bug that I just can't seem to solve. After all these years, the crucial difference is still history. Specifically, history amnesia: bash has an annoying habit of forgetting history commands just when I most want them back.

Say I type some longish command. After it runs, I hit return a couple of times, wait a while, do a couple of other things, then decide I want to call that command back from history so I can run something similar, maybe with the filename changed or a different flag. I ctrl-P or up-arrow ... and the command isn't there!

If I type history at this point, I'll see most of my command history ... with an empty line in place of the line I was hoping to repeat. The command is gone. My only option is to remember what I typed, and type it all again.

Nobody seems to know why this happens, and it's sporadic, doesn't happen every time. Friends have been able to reproduce it, so it's not just me or my weird settings. It drives me batty. It wouldn't be so bad except it always seems to happen on the tricky commands that I really didn't want to retype.

It's too bad, because otherwise I had bash nicely whipped into shape, and it does have some advantages over tcsh. Some of the tradeoffs:

tcsh wins

Of course, you bash users, set me straight if I missed out on some bash options that would have solved some of these problems. And especially if you have a clue about the evil disappearing history commands!

bash wins

Of course, bash and tcsh aren't the only shells around. From what I hear, zsh blends the good features of bash and tcsh. One of these days I'll try it and see.

Tags: , , , ,
[ 22:58 Nov 07, 2007    More linux | permalink to this entry | comments ]

Tue, 17 Jul 2007

Adventures with bash's word erase

I've been a happy csh/tcsh user for decades. But every now and then I bow to pressure and try to join the normal bash-using Linux world.

But I always come up against one problem right away: word erase (Control-W). For those who don't use ^W, suppose I type something like:

% ls /backups/images/trips/arizona/2007
Then I suddenly realize I want utah in 2007, not arizona. In csh, I can hit ^W twice and it erases the last two words, and I'm ready to type u<tab>. In bash, ^W erases the whole path leaving only "ls", so it's no help here. It may seem like a small thing, but I use word erase hundreds of times a day and it's hard to give it up. Google was no help, except to tell me I wasn't the only one asking.

Then the other day I was chatting about this issue with a friend who uses zsh for that reason (zsh is much more flexible at defining key bindings) and someone asked, "Is that like Meta-Delete?"

It turned out that Alt-Backspace (like many Linux applications, bash calls the Alt key "Meta", and Linux often confuses Delete and Backspace) did exactly what I wanted. Very promising!

But Alt-Backspace is not easy to type, since it's not reachable from the "home" typing position. What I needed, now that I knew bash and readline had the function, was a way to bind it to ^W.

Bash's binding syntax is documented, though the functions available don't seem to be. But bind -p | grep word gave me some useful information. It seems that \C-w was bound to "unix-word-rubout" (that was the one I didn't want) whereas "\e\C-?" was bound to "backward-kill-word". ("\e\C-?" is an obscure way of saying Meta-DEL: \e is escape, and apparently bash, like emacs, treats ESC followed by a key as the same as pressing Alt and the key simultaneously. And Control-question-mark is the Delete character in ASCII.)

So my task was to bind \C-w to backward-kill-word. It looked like this ought to work:

bind '\C-w:backward-kill-word'

... Except it didn't. bind -p | grep w showed that C-W was still bound to "unix-word-rubout".

It turned out that it was the terminal (stty) settings causing the problem: when the terminal's werase (word erase) character is set, readline hardwires that character to do unix-word-rubout and ignores any attempts to change it.

I found the answer in a bash bug report. The stty business was introduced in readline 5.0, but due to complaints, 5.1 was slated to add a way to override the stty settings. And happily, I had 5.2! So what was this new way override method? The posting gave no hint, but eventually I found it.

Put in your .inputrc:

set bind-tty-special-chars Off

And finally my word erase worked properly and I could use bash!

Tags: , , ,
[ 16:22 Jul 17, 2007    More linux | permalink to this entry | comments ]

Wed, 28 Feb 2007

Random Wallpaper

I was talking about desktop backgrounds -- wallpaper -- with some friends the other day, and it occurred to me that it might be fun to have my system choose a random backdrop for me each morning.

Finding backgrounds is no problem: I have plenty of images stored in ~/Backgrounds -- mostly photos I've taken over the years, with a smattering of downloads from sites like the APOD. So all I needed was a way to select one file at random from the directory.

This is Unix, so there's definitely a commandline way to do it, right? Well, surprisingly, I couldn't find an easy way that didn't involve any scripting. Some shells have a random number generator built in ($RANDOM in bash) but you still have to do some math on the result.

Of course, I could have googled, since I'm sure other people have written random-wallpaper scripts ... but what's the fun in that? If it has to be a script, I might as well write my own.

Rather than write a random wallpaper script, I wanted something that could be more generally useful: pick one random line from standard input and print it. Then I could pass it the output of ls -1 $HOME/Backgrounds, and at the same time I'd have a script that I could also use for other purposes, such as choosing a random quotation, or choosing a "flash card" question when studying for an exam.

The obvious approach is to read all of standard input into an array, count the lines, then pick a random number between one and $num_lines and print that array element. It took no time to whip that up in Python and it worked fine. But it's not very efficient -- what if you're choosing a line from a 10Mb file?

Then Sara Falamaki (thanks, Sara!) pointed me to a page with a neat Perl algorithm. It's Perl so it's not easy to read, but the algorithm is cute. You read through the input line by line, keeping track of the line number. For each line, the chance that this line should be the one printed at the end is the reciprocal of the line number: in other words, there's one chance out of $line_number that this line is the one to print.

So if there's only one line, of course you print that line; when you get to the second line, there's one chance out of two that you should switch; on the third, one chance out of three, and so on.

A neat idea, and it doesn't require storing the whole file in memory. In retrospect, I should have thought of it myself: this is basically the same algorithm I used for averaging images in GIMP for my silly Chix Stack Mars project, and I later described the method in the image stacking section of my GIMP book. To average images by stacking them, you give the bottom layer 100% opacity, the second layer 50% opacity, the third 33% opacity, and so on up the stack. Each layer makes an equal contribution to the final result, so what you see is the average of all layers.

The randomline script, which you can inspect here, worked fine, so I hooked it up to accomplish the original problem: setting a randomly chosen desktop background each day. Since I use a lightweight window manager (fvwm) rather than gnome or kde, and I start X manually rather than using gdm, I put this in my .xinitrc:

(xsetbg -fullscreen -border black `find $HOME/Backgrounds -name "*.*" | randomline`) &

Update: I've switched to using hsetroot, which is a little more robust than xsetbg. My new command is:

hsetroot -center `find -L $HOME/Backgrounds -name "*.*" | randomline`

So, an overlong article about a relatively trivial but nontheless nifty algorithm. And now I have a new desktop background each day. Today it's something prosaic: mud cracks from Death Valley. Who knows what I'll see tomorrow?

Update, years later: I've written a script for the whole job, randombg, because on my laptop I want to choose from a different set of backgrounds depending on whether I'm plugged in to an external monitor or using the lower resolution laptop display.
But meanwhile, I've just been pointed to the shuf command, which does pretty much what my randomline script did. So you don't actually need any scripts, just

hsetroot -fill `find ~/Images/Backgrounds/1680x1050/ -name '*.jpg' | shuf -n 1`

Tags: , ,
[ 14:02 Feb 28, 2007    More programming | permalink to this entry | comments ]

Fri, 29 Dec 2006

The Joys of Shell Pipelines ...

A friend called me for help with a sysadmin problem they were having at work. The problem: find all files bigger than one gigabyte, print all the filenames, add up all the sizes and print the total. And for some reason (not explained to me) they needed to do this all in one command line.

This is Unix, so of course it's possible somehow! The obvious place to start is with the find command, and man find showed how to find all the 1G+ files:

find / -size +1G

(Turns out that's a GNU find syntax, and BSD find, on OS X, doesn't support it. I left it to my friend to check man find for the OS X equivalent of -size _1G.)

But for a problem like this, it's pretty clear we'd need to get find to execute a program that prints both the filename and the size. Initially I used ls -ls, but Saz (who was helping on IRC) pointed out that du on a file also does that, and looks a bit cleaner. With find's unfortunate syntax, that becomes:

find / -size +1G -exec du "{}" \;

But now we needed awk, to collect and add up all the sizes while printing just the filenames. A little googling (since I don't use awk very often) and experimenting led to the final solution:

find / -size +1G -exec du "{}" \; | awk '{print $2; total += $1} END { print "Total is", total}'

Ah, the joys of Unix shell pipelines!

Update: Ed Davies suggested an easier way to do the same thing. turns out du will handle it all by itself: du -hc `find . -size +1G` Thanks, Ed!

Tags: , , , ,
[ 17:53 Dec 29, 2006    More linux | permalink to this entry | comments ]

Sun, 14 May 2006

Linkifying with Regular Expressions

I had a page of plaintext which included some URLs in it, like this:
Tour of the Hayward Fault

Technical Reports on Hayward Fault

I wanted to add links around each of the urls, so that I could make it part of a web page, more like this:

Tour of the Hayward Fault

Technical Reports on Hayward Fault
htt p://

Surely there must be a program to do this, I thought. But I couldn't find one that was part of a standard Linux distribution.

But you can do a fair job of linkifying just using a regular expression in an editor like vim or emacs, or by using sed or perl from the commandline. You just need to specify the input pattern you want to change, then how you want to change it.

Here's a recipe for linkifying with regular expressions.

Within vim:

:%s_\(https\=\|ftp\)://\S\+_<a href="&">&</a>_

If you're new to regular expressions, it might be helpful to see a detailed breakdown of why this works:

Tell vim you're about to type a command.
The following command should be applied everywhere in the file.
Do a global substitute, and everything up to the next underscore will represent the pattern to match.
This will be a list of several alternate patterns.
If you see an "http", that counts as a match.
Zero or one esses after the http will match: so http and https are okay, but httpsssss isn't.
Here comes another alternate pattern that you might see instead of http or https.
URLs starting with ftp are okay too.
We're done with the list of alternate patterns.
After the http, https or ftp there should always be a colon-slash-slash.
After the ://, there must be a character which is not whitespace.
There can be any number of these non-whitespace characters as long as there's at least one. Keep matching until you see a space.
Finally, the underscore that says this is the end of the pattern to match. Next (until the final underscore) will be the expression which will replace the pattern.
<a href="&">
An ampersand, &, in a substitute expression means "insert everything that was in the original pattern". So the whole url will be inserted between the quotation marks.
Now, outside the <a href="..."> tag, insert the matched url again, and follow it with a </a> to close the tag.
The final underscore which says "this is the end of the replacement pattern". We're done!

Linkifying from the commandline using sed

Sed is a bit trickier: it doesn't understand \S for non-whitespace, nor = for "zero or one occurrence". But this expression does the trick:
sed -e 's_\(http\|https\|ftp\)://[^ \t]\+_<a href="&">&</a>_' <infile.txt >outfile.html

Addendum: George Riley tells me about VST for Vim 7, which looks like a nice package to linkify, htmlify, and various other useful things such as creating HTML presentations. I don't have Vim 7 yet, but once I do I'll definitely check out VST.

Tags: , , , , ,
[ 13:40 May 14, 2006    More linux/editors | permalink to this entry | comments ]

Mon, 10 Oct 2005

How to Search Your Mozilla Cache

Ever want to look for something in your browser cache, but when you go there, it's just a mass of oddly named files and you can't figure out how to find anything?

(Sure, for whole pages you can use the History window, but what if you just want to find an image you saw this morning that isn't there any more?)

Here's a handy trick.

First, change directory to your cache directory (e.g. $HOME/.mozilla/firefox/blahblah/Cache).

Next, list the files of the type you're looking for, in the order in which they were last modified, and save that list to a file. Like this:

% file `ls -1t` | grep JPEG | sed 's/: .*//' > /tmp/foo

In English: ls -t lists in order of modification date, and -1 ensures that the files will be listed one per line. Pass that through grep for the right pattern (do a file * to see what sorts of patterns get spit out), then pass that through sed to get rid of everything but the filename. Save the result to a temporary file.

The temp file now contains the list of cache files of the type you want, ordered with the most recent first. You can now search through them to find what you want. For example, I viewed them with Pho:

pho `cat /tmp/foo`
For images, use whatever image viewer you normally use; if you're looking for text, you can use grep or whatever search you lke. Alternately, you could ls -lt `cat foo` to see what was modified when and cut down your search a bit further, or any other additional paring you need.

Of course, you don't have to use the temp file at all. I could have said simply:

pho `ls -1t` | grep JPEG | sed 's/: .*//'`
Making the temp file is merely for your convenience if you think you might need to do several types of searches before you find what you're looking for.

Tags: , , , , , , ,
[ 22:40 Oct 10, 2005    More tech/web | permalink to this entry | comments ]

Wed, 19 Jan 2005

Desktop Search -- What you need if you don't have grep

I've been surprised by the recent explosion in Windows desktop search tools. Why does everyone think this is such a big deal that every internet company has to jump onto the bandwagon and produce one, or be left behind?

I finally realized the answer this morning. These people don't have grep! They don't have any other way of searching out patterns in files.

I use grep dozens of times every day: for quickly looking up a phone number in a text file, for looking in my Sent mailbox for that url I mailed to my mom last week, for checking whether I have any saved email regarding setting up CUPS, for figuring out where in mozilla urlbar clicks are being handled.

Every so often, some Windows or Mac person is opining about how difficult commandlines are and how glad they are not to have to use them, and I ask them something like, "What if you wanted to search back through your mail folders to find the link to the cassini probe images -- e.g. lines that have both http:// and cassini in them?" I always get a blank look, like it would never occur to them that such a search would ever be possible.

Of course, expert users have ways of doing such searches (probably using command-line add-ons such as cygwin); and Mac OS X has the full FreeBSD commandline built in. And more recent Windows versions (Win2k and XP) now include a way to search for content in files (so in the Cassini example, you could search for http:// or cassini, but probably not both at once.) But the vast majority of Windows and Mac users have no way to do such a search, the sort of thing that Linux commandline users do casually dozens of times per day. Until now.

Now I see why desktop search is such a big deal.

But rather than installing web-based advertising-drive apps with a host of potential privacy and security implications ...

wouldn't it be easier just to install grep?

Tags: , , ,
[ 12:45 Jan 19, 2005    More tech | permalink to this entry | comments ]