Zsh magic: remove all raw photos that don't have a corresponding JPEG
Lately, when shooting photos with my DSLR, I've been shooting in raw mode but saving a JPEG copy as well. When I triage and label my photos (with pho and metapho), I use only the JPEG files, since they load faster and there's no need to index both. But that means I sometimes delete a .jpg file and leave its huge .cr2 raw file behind on my disk.
I wanted some way of removing these orphaned raw files: in other words, for every .cr2 file that doesn't have a corresponding .jpg file, delete the .cr2.
That's an easy enough shell function to write: loop over *.cr2, change the .cr2 extension to .jpg, check whether that file exists, and if it doesn't, delete the .cr2.
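A minimal sketch of that loop might look like the following. The function name rm_orphan_raws is hypothetical, it only looks in the current directory, and like the rest of this post it assumes lowercase .cr2 and .jpg extensions. It sticks to POSIX syntax, so it should behave the same in sh, bash or zsh.

```zsh
# Hypothetical helper: delete each *.cr2 in the current directory
# that has no .jpg with the same base name.
rm_orphan_raws() {
    for raw in *.cr2; do
        # With no matches, sh and bash pass the pattern through literally; skip it.
        [ -e "$raw" ] || continue
        jpg="${raw%.cr2}.jpg"   # img001.cr2 -> img001.jpg
        if [ ! -e "$jpg" ]; then
            rm -- "$raw"
        fi
    done
}
```

Run it from inside a photo directory. In zsh, a directory with no .cr2 files at all produces a "no matches found" error instead of a silent no-op, because zsh treats an unmatched glob as an error by default.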
But as I started to write the shell function, it occurred to me: this is just the sort of magic trick zsh tends to have built in.
So I hopped on over to #zsh and asked, and in just a few minutes, I had an answer:
rm *.cr2(e:'[[ ! -e ${REPLY%.cr2}.jpg ]]':)
Yikes! And it works! But how does it work? It's cheating to rely on people in IRC channels without trying to understand the answer well enough to solve the next similar problem on my own.
Most of the answer is in the zshexpn man page, but it still took some reading and jumping around to put the pieces together.
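First, we take all files matching the initial wildcard, *.cr2. Then we apply the expression in parentheses after the wildcard to each of them. (You might expect to need EXTENDED_GLOB set to use that sort of parenthetical expression, but according to zshexpn you don't: a trailing parenthesized list of glob qualifiers is recognized as long as the BARE_GLOB_QUAL option is set, which it is by default in zsh. EXTENDED_GLOB only adds an alternative (#q...) syntax for the same qualifiers.)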
The variable $REPLY is set to the filename the wildcard matched, so it will be set to each .cr2 filename in turn, e.g. img001.cr2.
The expression ${REPLY%.cr2} removes the .cr2 extension (% strips the shortest match of the pattern that follows it from the end of the value). Then we tack on a .jpg: ${REPLY%.cr2}.jpg. So now we have img001.jpg.
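You can try the same expansion on its own at a prompt. It is plain POSIX parameter expansion, so it works the same in sh and bash as in zsh; setting REPLY by hand here just stands in for what the glob qualifier does for each file.

```zsh
REPLY=img001.cr2
# % removes the shortest match of the pattern ".cr2" from the end of the value.
# The "." is literal: this is a glob pattern, not a regular expression.
base=${REPLY%.cr2}
jpg=${REPLY%.cr2}.jpg
echo "$base"   # img001
echo "$jpg"    # img001.jpg
```

A name that doesn't end in .cr2 comes through unchanged, so an uppercase IMG_0001.CR2 would become IMG_0001.CR2.jpg rather than IMG_0001.jpg.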
[[ ! -e ${REPLY%.cr2}.jpg ]] checks whether that .jpg file exists, just as it would in a shell script; because of the !, it succeeds (returns a zero status) only when the .jpg is missing.
So that explains the quoted shell expression.
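To watch that test work outside of a glob, you can set REPLY by hand in a scratch directory and look at the exit status. This sketch runs in bash as well as zsh, since both support [[ ]]; the filenames are made up.

```zsh
# Scratch directory: img001 has both files, img002 is an orphaned raw file.
cd "$(mktemp -d)"
touch img001.cr2 img001.jpg img002.cr2

REPLY=img001.cr2
[[ ! -e ${REPLY%.cr2}.jpg ]]
echo "$REPLY: status $?"    # img001.cr2: status 1 (JPEG exists, keep it)

REPLY=img002.cr2
[[ ! -e ${REPLY%.cr2}.jpg ]]
echo "$REPLY: status $?"    # img002.cr2: status 0 (no JPEG, an orphan)
```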
The final and hardest part is how to use that quoted expression.
That's in section 14.8.7 Glob Qualifiers.
The e qualifier, listed there as estring, executes string as shell code, and the filename will be included in the list if and only if the code returns a zero status.
The colons, one right after the e and one just before the closing parenthesis, are just separator characters. Whatever character immediately follows the e is taken as the separator, and anything from there to the next instance of that separator (the second colon, in this case) is taken as the string to execute. Colons seem to be the conventional choice, but you could use another character, as long as it doesn't appear in the string itself. The e qualifier is also what sets $REPLY to the filename being tested while that string runs.
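Since rm doesn't forgive mistakes, one cautious habit is to try the qualifier with print -rl first and switch back to rm only once the list looks right. The sketch below sets up a throwaway directory with made-up filenames and feeds that one line to a fresh zsh: -f skips your startup files so your own option settings can't change the result, and the quoted 'EOF' keeps the calling shell from touching the $ and the quotes. At an ordinary zsh prompt you'd type just the print line.

```zsh
# Throwaway directory: only img001 still has its JPEG.
cd "$(mktemp -d)"
touch img001.cr2 img001.jpg img002.cr2 img003.cr2

# Dry run: list what the qualifier selects instead of deleting it.
zsh -f <<'EOF'
print -rl -- *.cr2(e:'[[ ! -e ${REPLY%.cr2}.jpg ]]':)
EOF
```

This should print img002.cr2 and img003.cr2 and leave img001.cr2 off the list. If no file qualifies, zsh reports "no matches found"; adding the N qualifier, as in *.cr2(Ne:'...':), makes the pattern expand to nothing instead.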
So why the single quotes inside the colons? Without them, ${REPLY%.cr2} would be expanded too early: once, when the command line is first processed, before globbing has set REPLY to any filename. As the man page puts it: "Note that expansions must be quoted in the string to prevent them from being expanded before globbing is done. string is then executed as shell code."
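To see concretely what "too early" means, expand the same text with REPLY unset, as it would be if the shell substituted it while reading the command line rather than once per file. This is a simplified illustration of the timing problem, not a trace of exactly what zsh would do with the unquoted qualifier.

```zsh
unset REPLY
# What the expansion produces before globbing has set REPLY:
echo "${REPLY%.cr2}.jpg"    # .jpg
```

Instead of checking img001.jpg, img002.jpg and so on, the test would look for a file literally named .jpg every time.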
Whew! Complicated, but awfully handy. I know I'll have lots of other uses for that.
One additional note: section 14.8.5, Approximate Matching, in that manual page caught my eye. zsh can do fuzzy matches! I can't think offhand what I'd need that for ... but I'm sure an idea will come to me.
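For the curious, here is a small sketch of a fuzzy match in filename generation. The (#aN) globbing flag allows up to N errors, where an inserted, deleted, substituted or transposed character each counts as one, and like the other (#...) flags it needs EXTENDED_GLOB. The filenames are made up, and the check again runs in a fresh zsh -f inside a throwaway directory.

```zsh
cd "$(mktemp -d)"
touch img001.jpg img002.jpg img012.jpg

zsh -f <<'EOF'
setopt extended_glob
# Allow at most one error when matching img001.jpg.
print -rl -- (#a1)img001.jpg
EOF
```

This should list img001.jpg and img002.jpg (one substitution away) but not img012.jpg, which is two edits away.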
[ 15:28 Oct 01, 2016 | linux/cmdline ]