langgrep: search only in scripts of a specified language (Shallow Thoughts)

Akkana's Musings on Open Source, Science, and Nature.

Sat, 28 Feb 2009

langgrep: search only in scripts of a specified language

I was making a minor tweak to my garmin script that uses gpsbabel to read in tracklogs and waypoints from my GPS unit, and I needed to look up the syntax of how to do some little thing in an sh script. (One of the hazards of switching languages a lot: you forget syntax details and have to keep looking things up, or at least I do.)

I have quite a collection of scripts in various languages in my ~/bin (plus, of course, all the scripts normally installed in /usr/bin on any Linux machine) so I knew I'd have lots of examples. But there are scripts of all languages sharing space in those directories; it's hard to find just sh examples. For about the two-hundredth time, I wished, "Wouldn't it be nice to have a command that can search for patterns only in files that are really sh scripts?"

And then, the inevitable followup ... "You know, that would be really easy to write."

So I did -- a little python hack called langgrep that takes a language, grep arguments and a file list, looks for a shebang line and only greps the files that have a shebang matching the specified language.
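
The idea described above fits in a few lines. What follows is not the actual langgrep script, only a minimal sketch of the same approach; the function names shebang_matches and langgrep are made up for illustration, and the match is a plain substring test on the first line (so "sh" would also match bash or tcsh scripts, much like a `#!.*sh` pattern would).

```python
import subprocess


def shebang_matches(path, lang):
    """Return True if the file's first line is a shebang that mentions lang."""
    try:
        with open(path, "rb") as f:
            first = f.readline(256)
    except OSError:
        # Directories, unreadable files, dangling symlinks: just skip them.
        return False
    # Substring test only: "sh" also matches "#!/bin/bash" (an assumption
    # of this sketch, not necessarily what the real script does).
    return first.startswith(b"#!") and lang.encode() in first


def langgrep(lang, grep_args, paths):
    """Run grep with grep_args over the paths whose shebang names lang."""
    matching = [p for p in paths if shebang_matches(p, lang)]
    if not matching:
        return 1  # the exit status grep itself uses for "nothing found"
    # -H keeps the filename prefix even when only one file is left.
    return subprocess.call(["grep", "-H", *grep_args, *matching])
```

Reading the file in binary mode avoids decoding errors on the compiled programs that share /usr/bin with the scripts.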

Of course, while writing langgrep I needed langgrep, to look up details of python syntax for things like string.find (I can never remember whether it's string.find(s, pat) or s.find(pat); the string module functions are a holdover from before strings had methods, and the one to use is the latter, s.find(pat), which is also the only form left in Python 3). I experimented with various shell options -- this is Unix, so of course there are plenty of ways of doing this in the shell, without writing a script. For instance:

grep find `egrep -l '#\!.*python' *`
grep find `file * | grep python | sed 's/:.*//'`
for i in *; do file $i|grep python && grep find $i; done    # in sh/bash

These are all pretty straightforward, but when I try to make them into tcsh aliases things get a lot trickier. tcsh lets you make aliases that take arguments, so you can use !:1 to mean the first argument, !:2-$ to mean all the arguments starting with the second one. That's all very well, but when you put them into a shell alias in a file like .cshrc that has to be parsed, characters like ! and $ can mean other things as well, so you have to escape them with \. So the second of those three lines above turns into something like

alias greplang "grep \!:2-$ `file * | grep \!:1 | sed 's/:.*//'`"

except that doesn't work either, so it probably needs more escaping somewhere. Anyway, I decided after a little alias hacking that figuring out the right collection of backslash escapes would probably take just as long as writing a python script to do the job, and writing the python script sounded more fun.

So here it is: my langgrep script. (Awful name, I know; better ideas welcome!) Use it like this (if python is the language you're looking for, find is the search pattern, and you want -w to find only "find" as a whole word):

langgrep python -w find ~/bin/*
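
A command line shaped like that has to be split into three parts: the language, the arguments handed to grep, and the file list. Here is one hedged way that split could work; split_args is a hypothetical helper, not taken from the real script, and it assumes every grep option is a single word starting with "-" (so options that take a separate value, like -e PATTERN or -A 3, would confuse it).

```python
def split_args(argv):
    """Split [lang, grep options..., pattern, files...] into three parts."""
    if len(argv) < 2:
        raise ValueError("usage: langgrep LANG [GREP_OPTIONS] PATTERN [FILE...]")
    lang, rest = argv[0], argv[1:]
    i = 0
    # Leading arguments that start with '-' are treated as grep options.
    while i < len(rest) and rest[i].startswith("-"):
        i += 1
    if i == len(rest):
        raise ValueError("no search pattern given")
    grep_args = rest[: i + 1]  # the options plus the pattern itself
    files = rest[i + 1 :]
    return lang, grep_args, files


lang, grep_args, files = split_args(["python", "-w", "find", "a.py", "b.sh"])
# lang == "python", grep_args == ["-w", "find"], files == ["a.py", "b.sh"]
```

The shell expands ~/bin/* before the script ever sees it, so the file list arrives as ordinary arguments.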

[ 09:57 Feb 28, 2009    More programming | permalink to this entry ]