Shallow Thoughts
Akkana's Musings on Open Source, Science, and Nature.
Sun, 12 Oct 2008
Someone on LinuxChix' techtalk list asked whether she could get
tcsh to print "[no output]" after any command that doesn't produce
output, so that when she makes logs to help her co-workers, they
will seem clearer.
I don't know of a way to do that in any shell (the shell would have
to capture the output of every command; emacs' shell-mode does that
but I don't think any real shells do) but it seemed like it ought
to be straightforward enough to do as a regular expression substitute
in vi. You're looking for lines where a line beginning with a prompt
is followed immediately by another line beginning with a prompt;
the goal is to insert a new line consisting only of "[no output]"
between the two lines.
It turned out to be pretty easy in vim. Here it is:
:%s/\(^% .*$\n\)\(% \)/\1[no results]\r\2/
Explanation:
- :
- starts a command
- %
- do the following command on every line of this file
- s/
- start a global substitute command
- \(
- start a "capture group" -- you'll see what it does soon
- ^
- match only patterns starting at the beginning of a line
- %
- look for a % followed by a space (your prompt)
- .*
- after the prompt, match any other characters until...
- $
- the end of the line, after which...
- \n
- there should be a newline character
- \)
- end the capture group after the newline character
- \(
- start a second capture group
- %
- look for another prompt. In other words, this whole
- expression will only match when a line starting with a prompt
- is followed immediately by another line starting with a prompt.
- \)
- end the second capture group
- /
- We're finally done with the mattern to match!
- Now we'll start the replacement pattern.
- \1
- Insert the full content of the first capture group
- (this is also called a "backreference" if you want
- to google for a more detailed explanation).
- So insert the whole first command up to the newline
- after it.
- [no results]
- After the newline, insert your desired string.
- \r
- insert a carriage return here (I thought this should be
- \n for a newline, but that made vim insert a null instead)
- \2
- insert the second capture group (that's just the second prompt)
- /
- end of the substitute pattern
Of course, if you have a different prompt, substitute it for "% ".
If you have a complicated prompt that includes time of day or
something, you'll have to use a slightly more complicated match
pattern to match it.
Tags: regexp, shells, linux, editors
[
13:34 Oct 12, 2008
More linux/editors |
permalink to this entry
]
Sun, 31 Aug 2008
I wanted to get a list of who'd been contributing the most in a
particular open source project. Most projects of any size have a
ChangeLog file, in which check-ins have entries like this:
2008-08-26 Jane Hacker <hacker@domain.org>
* src/app/print.c: make sure the Portrait and Landscape
* buttons update according to the current setting.
I wanted to take each entry, save the name of the developer checking
in, then eventually count the number of times each name occurs (the
number of times that developer checked in) and print them in order
from most check-ins to least.
Getting the names is easy: for check-ins in the last 9 years, I just
want the lines that start with "200". (Of course, if I wanted earlier
check-ins I could make the match more general.)
grep "^200" ChangeLog
But now I want to trim the line so it includes only the
contributor's name. A bit of sed geekery can do that: the date is a
fixed format (four characters, a dash, two, dash, two, then two
spaces, so "^....-..-.. " matches that pattern.
But I want to remove the email address part too
(sometimes people use different email addresses
when they check in). So I want a sed pattern that will match
something at the front (to discard), something in the middle (keep that part)
and something at the end (discard).
Here's how to do that in sed:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\)<.*$/\1/'
In English, that says: "For each line in the ChangeLog that starts
with 200, find a pattern at the beginning consisting of any four
characters, a dash, two characters, dash, two characters, dash, and
two spaces; then immediately after that, save all characters up to
a < symbol; then throw away the < and any characters that follow
until the end of the line."
That works pretty well! But it's not quite right: it includes the
two spaces after the name as part of the name. In sed, \s matches
any space character (like space or tab).
So you'd think this should work:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\)\s+<.*$/\1/'
\s+ means it will require that at least one and maybe more space
characters immediately before the < are also discarded.
But it doesn't work. It turns out the reason is that the \(.*\)
expression is "greedier" than the \s+: so the saved name expression
grabs the first space, leaving only the second to the \s+.
The way around that is to make the name expression specify that it
can't end with a space. \S is the term for "anything that's not a
space character"; so the expression becomes
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\S\)\s\+<.*$/\1/'
(the + turned out to need a backslash before it).
We have the list of names! Add a | sort on the end to
sort them alphabetically -- that will make sure you get all the
"Jane Hacker" lines listed together. But how to count them?
The Unix program most frequently invoked after sort
is uniq, which gets rid of all the repeated lines.
On a hunch, I checked out the man page, man uniq,
and found the -c option: "prefix lines by the number of occurrences".
Perfect! Then just sort them by the number, from largest to
smallest:
grep "^200" ChangeLog | sed 's/^....-..-.. \(.*\S\)\s+<.*$/\1/' | sort | uniq -c | sort -rn
And we're done!
Now, this isn't perfect since it doesn't catch "Checking in patch
contributed by susan@otherhost.com" attributions -- but those aren't in
a standard format in most projects, so they have to be handled by hand.
Disclaimer: Of course, number of check-ins is not a good measure of
how important or productive someone is. You can check in a lot of
one-line fixes, or you can write an important new module and submit
it for someone else to merge in. The point here wasn't to rank
developers, but just to get an idea who was checking into the tree
and how often.
Well, that ... and an excuse to play with nifty Linux shell pipelines.
Tags: shells, linux, pipelines, regexp
[
11:12 Aug 31, 2008
More linux |
permalink to this entry
]
Wed, 07 Nov 2007
I've been a tcsh user for many years. Back in the day, there were lots
of reasons for preferring csh to sh, mostly having to do with command
history. Most of those reasons are long gone -- modern bash and
tcsh are vastly improved from those early shells, and borrow from each
other, so the differences are fairly minor.
Back in July, I solved
the last blocker that had been keeping me off bash,
so I put some effort into migrating all my .cshrc settings into
a .bashrc so I could give bash a fair shot. It almost won; but
after four months, I've switched back to tcsh, due mostly to a single
niggling bash bug that I just can't seem to solve.
After all these years, the crucial difference is still history.
Specifically, history amnesia: bash has an annoying habit of
forgetting history commands just when I most want them back.
Say I type some longish command.
After it runs, I hit return a couple of times, wait a while, do
a couple of other things, then decide I want to call that command
back from history so I can run something similar, maybe with the
filename changed or a different flag. I ctrl-P or up-arrow ... and
the command isn't there!
If I type history at this point, I'll see most of my
command history ... with an empty line in place of the line I was
hoping to repeat. The command is gone. My only option is to remember
what I typed, and type it all again.
Nobody seems to know why this happens, and it's sporadic, doesn't
happen every time. Friends have been able to reproduce it, so it's
not just me or my weird settings. It drives me batty.
It wouldn't be so bad except it always seems to happen on the
tricky commands that I really didn't want to retype.
It's too bad, because otherwise I had bash nicely whipped into shape,
and it does have some advantages over tcsh. Some of the tradeoffs:
tcsh wins
- Totally reliable history: commands never disappear.
- History tab completion: typing
!a<TAB>
expands to the last command that started with a. In bash, I have
to type !a:p to see the command, then
!! to execute it.
- When I tab-complete a file and there are multiple matches, tcsh shows
them, or at least beeps (depending on configuration). In bash I have
to hit a second tab just in case there might be matches.
- When I tab-complete a directory, tcsh adds the / automatically.
(Arguable. I find I want the / roughly 3/4 of the time.)
- tcsh doesn't drop remote connections if I suspend (with ~^Z).
bash drops me within a minute or two, regardless of settings like
$TMOUT. Bash users tell me I could solve this by using screen,
but that seems like a heavyweight workaround when tcsh "just works".
- tcsh sees $REMOTEHOST and $DISPLAY automatically when I ssh.
bash doesn't: ssh -X helps, but I still need some tricky
code in .bash_profile.
- aliases can have arguments, e.g.
alias llth 'ls -laFt \!* | head'
In bash these have to be functions, which means they don't show
up typing "which" or "alias".
- Prompt settings are more flexible -- there are options like %B for
bold. In bash you have to get the terminal type and use the
ansi color escape sequances, which don't include bold.
- Easier command editing setup -- behaviors like
getting
word-erase to stop at punctuation
don't involve chasing through multiple semi-documented programs,
and the answer doesn't vary with version.
- Documentation -- tcsh's is mostly in man tcsh, bash's is
scattered all over man pages for various helper programs.
And it's hard to google for bash help because "bash" as a keyword
gets you lots of stuff not related to shells.
Of course, you bash users, set me straight if I missed out
on some bash options that would have solved some of these problems.
And especially if you have a clue about the evil disappearing
history commands!
bash wins
- You don't have to
rehash every time you add a program
or change your path. That's a real annoyance of tcsh, and I could
understand a person used to bash rejecting tcsh on this alone.
- History remembers entire multi-line commands, and shows them
with semicolons when you arrow back through history. Very nice.
tcsh only remembers the first line and you have to retype the rest.
- Functions: although I don't like having to use them instead of
aliases, they're certainly powerful and nice to have.
Of course, bash and tcsh aren't the only shells around.
From what I hear, zsh blends the good features of bash and tcsh.
One of these days I'll try it and see.
Tags: linux, shells, bash, csh
[
21:58 Nov 07, 2007
More linux |
permalink to this entry
]
Tue, 17 Jul 2007
I've been a happy csh/tcsh user for decades. But every now and then I
bow to pressure and try to join the normal bash-using Linux world.
But I always come up against one problem right away: word erase
(Control-W). For those who don't use ^W, suppose I type something like:
% ls /backups/images/trips/arizona/2007
Then I suddenly realize I want utah in 2007, not arizona.
In csh, I can hit ^W twice and it erases the last two words, and I'm
ready to type
u<tab>.
In bash, ^W erases the whole path leaving
only "ls", so it's no help here.
It may seem like a small thing, but I use word erase hundreds of
times a day and it's hard to give it up. Google was no help, except
to tell me I wasn't the only one asking.
Then the other day
I was chatting about this issue with a friend who uses zsh for that
reason (zsh is much more flexible at defining key bindings)
and someone asked, "Is that like Meta-Delete?"
It turned out that Alt-Backspace
(like many Linux applications, bash calls the Alt key "Meta",
and Linux often confuses Delete and Backspace)
did exactly what I wanted. Very promising!
But Alt-Backspace is not easy to type, since it's not reachable from
the "home" typing position.
What I needed, now that I knew bash and readline had the function,
was a way to bind it to ^W.
Bash's binding syntax is documented, though the functions available
don't seem to be. But bind -p | grep word gave me
some useful information. It seems that \C-w was bound to
"unix-word-rubout" (that was the one I didn't want) whereas "\e\C-?"
was bound to "backward-kill-word".
("\e\C-?" is an obscure way of saying Meta-DEL: \e is escape, and
apparently bash, like emacs, treats ESC followed by a key as the same
as pressing Alt and the key simultaneously. And Control-question-mark
is the Delete character in ASCII.)
So my task was to bind \C-w to backward-kill-word.
It looked like this ought to work:
bind '\C-w:backward-kill-word'
... Except it didn't.
bind -p | grep w
showed that C-W was still bound to "unix-word-rubout".
It turned out that it was the terminal (stty) settings causing
the problem: when the terminal's werase (word erase)
character is set, readline hardwires that character to do
unix-word-rubout and ignores any attempts to change it.
I found the answer in a
bash
bug report. The stty business was introduced in readline 5.0,
but due to complaints, 5.1 was slated to add a way to override
the stty settings. And happily, I had 5.2! So what was this new
way override method? The posting gave no hint, but eventually I found it.
Put in your .inputrc:
set bind-tty-special-chars Off
And finally my word erase worked properly and I could use bash!
Tags: linux, shells, bash
[
15:22 Jul 17, 2007
More linux |
permalink to this entry
]
Fri, 29 Dec 2006
A friend called me for help with a sysadmin problem they were having
at work. The problem: find all files bigger than one gigabyte, print
all the filenames, add up all the sizes and print the total.
And for some reason (not explained to me) they needed to do this
all in one command line.
This is Unix, so of course it's possible somehow!
The obvious place to start is with the find command,
and man find showed how to find all the 1G+ files:
find / -size +1G
(Turns out that's a GNU find syntax, and BSD find, on OS X, doesn't
support it. I left it to my friend to check man find for the
OS X equivalent of -size _1G.)
But for a problem like this, it's pretty clear we'd need to get find
to execute a program that prints both the filename and the size.
Initially I used ls -ls, but Saz (who was helping on IRC)
pointed out that du on a file also does that, and looks a
bit cleaner. With find's unfortunate syntax, that becomes:
find / -size +1G -exec du "{}" \;
But now we needed awk, to collect and add up all the sizes
while printing just the filenames. A little googling (since I don't
use awk very often) and experimenting led to the final solution:
find / -size +1G -exec du "{}" \; | awk '{print $2; total += $1} END { print "Total is", total}'
Ah, the joys of Unix shell pipelines!
Update: Ed Davies suggested an easier way to do the same thing.
turns out du will handle it all by itself: du -hc `find . -size +1G`
Thanks, Ed!
Tags: linux, shells, backups, pipelines
[
16:53 Dec 29, 2006
More linux |
permalink to this entry
]