Shallow Thoughts : : cmdline
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Sun, 19 Mar 2023
I back up my computer to a local disk (well, several redundant local disks)
using rsync. (I don't particularly trust cloud providers,
and in any case our internet connection is very slow, especially for upload,
so waiting hours while the entire contents of my disk upload isn't appealing.)
To save space and time, I have a script that includes a list of files
and directories I don't need to back up: browser cache directories,
object files, build directories, generated files like thumbnails,
large video files, downloaded source, and so on.
I also have a list of files I do want to back up even though
they'd otherwise be excluded. For instance, I sometimes have local changes
in my GIMP source directory, outsrc/gimp-master/gimp/, even
though most of outsrc doesn't need to be backed up.
Or /blog/tags/build in my local mirror of the shallowsky
website, even though I have a rule that says directories named
build shouldn't usually be backed up.
I've been using rsync's --include and --exclude to handle this.
But I discovered yesterday that I'd been using them wrong, and some
things I thought were getting backed up, weren't.
It took some reading and experimenting before I figured out how
these rsync flags actually work — which doesn't seem to be
well explained anywhere.
Read more ...
Tags: backups, linux, cmdline, python
16:11 Mar 19, 2023
Mon, 30 Mar 2020
It was surprisingly hard to come up with a "D" to write about,
without descending into Data geekery (always a temptation).
Though you may decide I've done that anyway with today's topic.
Out for a scenic drive to shake off some of the house-bound cobwebs,
I got to thinking about how so many places are named after the Devil.
California was full of them -- the Devil's Punchbowl, the Devil's
Postpile, and so forth -- and nearly every western National Park
has at least one devilish feature.
How many are there really? Happily, there's an easy way to answer
questions like this: the
Geographic Names page on the USGS website,
which hosts the Geographic Names Information System (GNIS).
You can download entire place name files for a state, or
you can search for place name matches at:
GNIS Feature Search.
When I searched there for "devil", I got 1883 hits -- but many of them
don't actually include the word "Devil". What, are they taking lessons
from Google about searching for things that don't actually match the
search terms?
I decided I wanted to download the results so I could
count them more easily.
The page offers View & Print all or
Save as pipe "|" delimited file. I chose to save the file.
Read more ...
Tags: GIS, mapping, data, cmdline, linux
16:30 Mar 30, 2020
Thu, 31 Oct 2019
Someone on ##linux was talking about "bro pages", which turns out to
be a site that collects random short examples of how to use Linux
commands. It reminded me of
Command Line Magic,
a Twitter account I follow that gives sometimes entertaining or useful
command-line snippets.
I hadn't been to that page on the Twitter website in a while (I
usually use bitlbee for Twitter), and clicking through some of the
tweets on the "Who to follow" accounts took me to someone who'd made
a GNU
CoreUtils cheat sheet. I didn't really want the printed cheat
sheet, but I was interested in the commands used to generate it.
The commands involved downloading an HTML page and didn't work any
more -- the page was still there but its format had changed -- but
that got me to thinking about how it might be fun to generate
something that would show me a random command and its description,
starting not from coreutils but from the set of all commands I have
installed.
I can get a list of commands from the installed man pages in
/usr/share/man -- section 1, for basic commands, and section
8, for system-admin commands. (The other sections are for things
like library routines, system calls, files etc.)
So I can pick a random man page like this:
ls -1 /usr/share/man/man1/ /usr/share/man/man8 | shuf -n 1
which gives me a filename like
xlsfonts.1.gz.
The man pages are troff format, gzipped. You can run zcat on
them, but extracting the name and description still isn't entirely
trivial. In most cases, it comes right after the .SH NAME
line, so you could do something like
zcat $(ls -1 /usr/share/man/man1/* /usr/share/man/man8/* | shuf -n 1) | grep -A1 NAME | tail -1
(the * for the two directories causes ls to list the full pathname,
like
/usr/share/man/man1/xlsfonts.1.gz, instead of just the
filename, xlsfonts.1.gz).
But that doesn't work in every case: sometimes the description is more than
one line, or there's a line between the NAME line and the actual description.
A better way is to use apropos (man -k), which already knows how to
search through man pages and parse them to extract the command name and
description. For that, you need to
start with the filename (I'm going to drop those *s from the command since
I don't need the full pathname any more) and get rid of everything
after the first '.'.
You can do that with sed 's_\.[0-9].*__':
it looks for everything starting with a dot (\.) followed
by a digit ([0-9] -- sed doesn't understand \d)
followed by anything (.*) and replaces all of it with nothing,
the empty string.
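To see that substitution on its own, here is a small sketch. The helper name strip_man_suffix is made up for illustration, and python3.11.1.gz is a hypothetical file name chosen to show an edge case; the sed expression itself is the one described above.

```shell
# strip_man_suffix is a hypothetical helper name; the sed expression is
# the one from the post, with _ used as the delimiter of the s command.
strip_man_suffix() {
    printf '%s\n' "$1" | sed 's_\.[0-9].*__'
}

strip_man_suffix xlsfonts.1.gz      # prints xlsfonts
strip_man_suffix snap.8.gz          # prints snap
strip_man_suffix python3.11.1.gz    # prints python3
```

The last call shows a limitation: the match starts at the first dot that is followed by a digit, so a command whose own name contains such a dot gets cut short.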
Here's the full command:
apropos $(ls -1 /usr/share/man/man1/ /usr/share/man/man8 | shuf -n 1 | sed 's_\.[0-9].*__')
Sometimes it will give more than one command: for instance,
just now, testing it, it found /usr/share/man/man8/snap.8.gz,
pared that down to just snap, and apropos snap
found ten different commands. But that's unusual; most of the time
you'll just get one or two, and of course you could add another
| shuf -n 1 if you want to make sure you get only one line.
Update: man -f is a better solution: that will give a single
apropos-like description line for only the command picked by the
first shuf command.
man -f $(ls -1 /usr/share/man/man1/ /usr/share/man/man8 | shuf -n 1 | sed 's_\.[0-9].*__')
It's kind of a fun way to discover new commands you may not have
heard of. I'm going to put it in my .zlogin.
Tags: linux, cmdline
13:22 Oct 31, 2019
Sat, 01 Oct 2016
Lately, when shooting photos with my DSLR, I've been shooting raw mode
but with a JPEG copy as well. When I triage and label my photos (with
pho and metapho), I use only the JPEG files, since they load faster
and there's no need to index both. But that means that sometimes I
delete a .jpg file while the huge .cr2 raw file is still on my disk.
I wanted some way of removing these orphaned raw files: in other words,
for every .cr2 file that doesn't have a corresponding .jpg file, delete
the .cr2.
That's an easy enough shell function to write: loop over *.cr2,
change the .cr2 extension to .jpg, check whether that file exists,
and if it doesn't, delete the .cr2.
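For comparison, that plain loop might look roughly like this. It is only a sketch: the function name orphaned_raw is made up, it is written for a POSIX-style shell rather than zsh, and it prints the orphaned files instead of deleting them so it can be tried safely.

```shell
# orphaned_raw is a hypothetical name. It lists (rather than deletes) every
# .cr2 file in the current directory that has no matching .jpg next to it.
orphaned_raw() {
    for raw in *.cr2; do
        # If there are no .cr2 files at all, a POSIX shell leaves the
        # pattern unexpanded, so skip anything that doesn't exist.
        [ -e "$raw" ] || continue
        jpg="${raw%.cr2}.jpg"           # swap the extension
        [ -e "$jpg" ] || printf '%s\n' "$raw"
    done
}
```

Once the list looks right, piping it to a delete step (or replacing the printf with rm) would do the actual cleanup.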
But as I started to write the shell function, it occurred to me:
this is just the sort of magic trick zsh tends to have built in.
So I hopped on over to #zsh and asked, and in just a few minutes,
I had an answer:
rm *.cr2(e:'[[ ! -e ${REPLY%.cr2}.jpg ]]':)
Yikes! And it works! But how does it work? It's cheating to rely on people
in IRC channels without trying to understand the answer so I can solve
the next similar problem on my own.
Most of the answer is in
the zshexpn
man page, but it still took some reading and jumping around to put
the pieces together.
First, we take all files matching the initial wildcard, *.cr2.
We're going to apply to them the filename generation code expression
in parentheses after the wildcard. (A bare parenthesized qualifier like
this relies on the BARE_GLOB_QUAL option, which zsh sets by default;
EXTENDED_GLOB is only needed for the explicit (#q...) form.)
The variable $REPLY is set to the filename the wildcard
expression matched;
so it will be set to each .cr2 filename, e.g. img001.cr2.
The expression ${REPLY%.cr2} removes the .cr2 extension.
Then we tack on a .jpg: ${REPLY%.cr2}.jpg.
So now we have img001.jpg.
[[ ! -e ${REPLY%.cr2}.jpg ]] is true when that jpg file
does not exist, just like in a shell script.
So that explains the quoted shell expression.
The final, and hardest part, is how to use that quoted expression.
That's in section 14.8.7 Glob Qualifiers.
(estring) executes string as shell code, and the
filename will be included in the list if and only if the code returns
a zero status.
The colons -- after the e and before the closing parenthesis -- are
just separator characters. Whatever character immediately follows the
e will be taken as the separator, and anything from there to the next
instance of that separator (the second colon, in this case) is taken
as the string to execute. Colons seem to be the character to use by
convention, but you could use anything.
This is also the part of the expression responsible for setting $REPLY
to the filename being tested.
So why the quotes inside the colons? They're because some of the
substitutions being done would be evaluated too early without them:
"Note that expansions must be quoted in the string to prevent them
from being expanded before globbing is done. string is then executed
as shell code."
Whew! Complicated, but awfully handy. I know I'll have lots of other
uses for that.
One additional note: section 14.8.5, Approximate Matching, in that
manual page caught my eye. zsh can do fuzzy matches! I can't think
offhand what I need that for ... but I'm sure an idea will come to me.
Tags: zsh, shell, cmdline, imaging
15:28 Oct 01, 2016
Fri, 04 Dec 2015
I wrote recently about a zsh shell function to
run
make distclean on a source tree even if something in autoconf
is messed up. In order to save any arguments you've previously
passed to configure or autogen.sh, my function parsed the arguments
from a file called config.log.
But it might be a bit more reliable to use config.status --
I'm guessing this is the file that make
uses when it finds it needs to re-run autogen.sh.
However, the syntax in that file is more complicated,
and parsing it taught me some useful zsh tricks.
I can see the relevant line from config.status like this:
$ grep '^ac_cs_config' config.status
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
--enable-foo --disable-bar are options I added
purely for testing. I wanted to make sure my shell function would
work with multiple arguments.
Ultimately, I want my shell function to call
autogen.sh --prefix=/usr/local/gimp-git --enable-foo --disable-bar
The goal is to end up with $args being a zsh array containing those
three arguments. So I'll need to edit out those quotes and split the
line into an array.
Sed tricks
The first thing to do is to get rid of that initial ac_cs_config=
in the line from config.status. That's easy with sed:
$ grep '^ac_cs_config' config.status | sed -e 's/ac_cs_config=//'
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
But since we're using sed anyway, there's no need to use grep to
get the line: we can do it all with sed.
First try:
sed -n '/^ac_cs_config/s/ac_cs_config=//p' config.status
Search for the line that starts with ac_cs_config (^ matches
the beginning of a line);
then replace ac_cs_config= with nothing, and p
print the resulting line.
-n tells sed not to print anything except when told to with a p.
But it turns out that if you give a sed substitution a blank pattern,
it uses the last pattern it was given. So a more compact version,
using the search pattern ^ac_cs_config=, is:
sed -n '/^ac_cs_config=/s///p' config.status
But there's also another way of doing it:
sed '/^ac_cs_config=/!d;s///' config.status
! after a search pattern matches every line that doesn't match
the pattern. d deletes those lines. Then for lines that weren't
deleted (the one line that does match), do the substitution.
Since there's no -n, sed will print all lines that weren't deleted.
I find that version more difficult to read. But I'm including it
because it's useful to know how to chain several commands in sed,
and how to use ! to search for lines that don't match a pattern.
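To try the two forms side by side without a real source tree, something like the following should work. The temporary file stands in for config.status; its second line is made up purely so that there is something for sed to skip.

```shell
# Build a tiny stand-in for config.status. The ac_cs_config line is the
# one from the post; the second line is invented filler.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
some_other_setting="made-up line that should be ignored"
EOF

# Form 1: print nothing by default (-n), print only where s///p succeeds.
with_n=$(sed -n '/^ac_cs_config=/s///p' "$tmp")

# Form 2: delete every non-matching line, then substitute on what is left.
with_d=$(sed '/^ac_cs_config=/!d;s///' "$tmp")

printf '%s\n' "$with_n" "$with_d"
rm -f "$tmp"
```

Both variables should end up holding the same double-quoted argument string.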
You can also use sed to eliminate the double quotes:
sed '/^ac_cs_config=/!d;s///;s/"//g' config.status
'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
But it turns out that zsh has a better way of doing that.
Zsh parameter substitution
I'm still relatively new to zsh, but I got some great advice on #zsh.
The first suggestion:
sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args
I'll be using the final print -rl - $args for all these examples:
it prints an array variable with one member per line.
For the actual distclean function, of course, I'll be passing
the variable to autogen.sh, not printing it out.
First, let's look at the assignment,
args=( ${(Q)${(z)${(Q)REPLY}}} ).
The heart of it is the expression ${(Q)${(z)${(Q)x}}}.
The zsh parameter substitution syntax is a bit arcane, but each of
the parenthesized letters does some operation on the variable that follows.
The first (Q) strips off a level of quoting.
So:
$ x='"Hello world"'; print $x; print ${(Q)x}
"Hello world"
Hello world
(z) splits an expression and stores it in an array.
But to see that, we have to use print -l, so array members
will be printed on separate lines.
$ x="a b c"; print -l $x; print "....."; print -l ${(z)x}
a b c
.....
a
b
c
Zsh is smart about quotes, so if you have quoted expressions it will
group them correctly when assigning array members:
$ x="'a a' 'b b' 'c c'"; print -l $x; print "....."; print -l ${(z)x}
'a a' 'b b' 'c c'
.....
'a a'
'b b'
'c c'
So let's break down the larger expression: this is best read
from right to left, inner expressions to outer.
${(Q) ${(z) ${(Q) x }}}
   |     |     |  \
   |     |     |   The original expression,
   |     |     |   "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
   |     |     \
   |     |      Strip off the double quotes:
   |     |      '--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
   |     \
   |      Split into an array of three items
   \
    Strip the single quotes from each array member,
    ( --prefix=/usr/local/gimp-git --enable-foo --disable-bar )
Neat!
For more on zsh parameter substitutions, see the
Zsh
Guide, Chapter 5: Substitutions.
Passing the sed results to the parameter substitution
There's still a little left to wonder about in our expression,
sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args
The IFS= read -r seems to be a common idiom in zsh scripting.
It takes standard input and assigns it to the variable $REPLY. IFS is
the input field separator: you can split variables into words by
spaces, newlines, semicolons or any other character you
want. IFS= sets it to nothing. But because the input expression --
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'" --
has quotes around it, IFS is ignored anyway.
So you can do the same thing with this simpler expression, to
assign the quoted expression to the variable $x.
I'll declare it a local variable: that makes no difference
when testing it in the shell, but if I call it in a function, I won't
have variables like $x and $args cluttering up my shell afterward.
local x=$(sed -n '/^ac_cs_config=/s///p' config.status); local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args
That works in the version of zsh I'm running here, 5.1.1. But I've
been warned that it's safer to quote the result of $(). Without
quotes, if you ever run the function in an older zsh, $x might end up
being set only to the first word of the expression. So now we have:
local x="$(sed -n '/^ac_cs_config=/s///p' config.status)"; local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args
You don't even need to use a local variable. For added brevity (making
the function even more difficult to read! -- but we're way past the
point of easy readability), you could say:
args=( ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}} ); print -rl - $args
or even
print -rl - ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}}
... but that final version, since it doesn't assign to a variable at all,
isn't useful for the function I'm writing.
Tags: zsh, shell, regexp, gimp, programming
13:25 Dec 04, 2015
Fri, 15 May 2015
I have a bunch of devices that use VFAT filesystems. MP3 players,
camera SD cards, SD cards in my Android tablet. I mount them through
/etc/fstab, and the files always look executable, so when
I ls -F them, they all have asterisks after their names.
I don't generally execute files on these devices; I'd prefer the
files to have a mode that doesn't make them look executable.
I'd like the files to be mode 644 (or 0644 in most programming
languages, since it's an octal, or base 8, number). 644 in binary
is 110 100 100, or as the Unix ls command puts it, rw-r--r--.
There's a directive, fmask, that you can put in fstab
entries to control the mode of files when the device is mounted.
(Here's Wikipedia's long
umask article.)
But how do you get from the mode you want the files to be, 644,
to the mask?
The mask (which corresponds to the umask command)
represents the bits you don't want to have set. So, for instance,
if you don't want the world-execute bit (1) set, you'd put 1 in the mask.
If you don't want the world-write bit (2) set, as you likely don't, put
2 in the mask. So that's already a clue that I'm going to want the
rightmost digit to be 3: I don't want files mounted from my MP3 player
to be either world writable or executable.
But I also don't want to have to puzzle out the details of all nine bits
every time I set an fmask. Isn't there some way I can take the mode I
want the files to be -- 644 -- and turn it into the mask I'd need to
put in /etc/fstab or set as a umask?
Fortunately, there is. It seemed like it ought to be straightforward,
but it took a little fiddling to get it into a one-line command I can type.
I made it a shell function in my .zshrc:
# What's the complement of a number, e.g. the fmask in fstab to get
# a given file mode for vfat files? Sample usage: invertmask 755
invertmask() {
    python3 -c "print('0%o' % (~(0o777 & 0o$1) & 0o777))"
}
This takes whatever argument I give to it -- $1 -- and takes
only the three rightmost octal digits (the nine permission bits) from it,
(0o777 & 0o$1); 0o is how Python 3 writes an octal number. It takes
the bitwise NOT of that, ~. But the result of that is a negative
number, and we only want the three rightmost octal digits of the result,
(result) & 0o777, expressed as an octal number -- which
we can do in python by printing it as %o. Whew!
Here's a shorter, cleaner looking function that does the same thing,
though it's not as clear about what it's doing:
invertmask1() {
    python3 -c "print('0%o' % (0o777 - 0o$1))"
}
So now, for my MP3 player I can put this in /etc/fstab:
UUID=0000-009E /mp3 vfat user,noauto,exec,fmask=133,shortname=lower 0 0
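As a cross-check on that fmask=133, the same complement can be worked out in shell arithmetic alone. This is a sketch rather than the function from the post: the name invertmask_sh is made up, and it assumes a POSIX-style shell where a leading 0 marks an octal constant (zsh only does that with the OCTAL_ZEROES option set).

```shell
# invertmask_sh is a hypothetical name. ~ flips every bit of the mode,
# and "& 0777" keeps only the nine permission bits of the result.
invertmask_sh() {
    printf '%04o\n' $(( ~0$1 & 0777 ))
}

invertmask_sh 644    # prints 0133
invertmask_sh 755    # prints 0022
```

Bit by bit for 644: the mode is 110 100 100, its complement within nine bits is 001 011 011, and that reads as 133 in octal.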
Tags: linux, cmdline
10:27 May 15, 2015
Tue, 02 Sep 2014
I was using strace to figure out how to set up a program, lftp, and
a friend commented that he didn't know how to use it
and would like to learn. I don't use strace often, but when I do,
it's indispensable -- and it's easy to use. So here's a little tutorial.
My problem, in this case, was that I needed to find out what
configuration file I needed to modify in order to set up an alias
in lftp. The lftp man page tells you how to define an alias, but doesn't
tell you how to save it for future sessions; apparently you have
to edit the configuration file yourself.
But where? The man page suggested
a couple of possible config file locations -- ~/.lftprc and
~/.config/lftp/rc -- but neither of those existed. I wanted
to use the one that already existed. I had already set up bookmarks
in lftp and it remembered them, so it must have a config file already,
somewhere. I wanted to find that file and use it.
So the question was, what files does lftp read when it starts up?
strace lets you snoop on a program and see what it's doing.
strace shows you all system calls being used by a program.
What's a system call? Well, it's anything in section 2 of the Unix manual.
You can get a complete list by typing: man 2 syscalls
(you may have to install developer man pages first -- on Debian that's
the manpages-dev package). But the important thing is that most
file access calls -- open, read, chmod, rename, unlink (that's how you
remove a file), and so on -- are system calls.
You can run a program under strace directly:
$ strace lftp sitename
Interrupt it with Ctrl-C when you've seen what you need to see.
Pruning the output
And of course, you'll see tons of crap you're not interested in,
like rt_sigaction(SIGTTOU) and fcntl64(0, F_GETFL). So let's get rid
of that first. The easiest way is to use grep. Let's say I want to know
every file that lftp opens. I can do it like this:
$ strace lftp sitename |& grep open
I have to use |& instead of just | because strace prints its
output on stderr instead of stdout.
That's pretty useful, but it's still too much. I really don't care
to know about lftp opening a bazillion files in
/usr/share/locale/en_US/LC_MESSAGES, or libraries like
/usr/lib/i386-linux-gnu/libp11-kit.so.0.
In this case, I'm looking for config files, so I really only want to know
which files it opens in my home directory. Like this:
$ strace lftp sitename |& grep 'open.*/home/akkana'
In other words, show me just the lines that have the word "open"
followed later by the string "/home/akkana".
Digression: grep pipelines
Now, you might think that you could use a simpler pipeline with two greps:
$ strace lftp sitename |& grep open | grep /home/akkana
But that doesn't work -- nothing prints out. Why? Because grep buffers
its output when it's writing to a pipe rather than a terminal, so
when you pipe grep | grep, the first grep will wait
until it has collected quite a lot of output before it passes anything
along to the second.
(This comes up a lot with tail -f as well.)
You can avoid that with
$ strace lftp sitename |& grep --line-buffered open | grep /home/akkana
but that's too much to type, if you ask me.
Back to that strace | grep
Okay, whichever way you grep for open and your home directory,
it gives:
open("/home/akkana/.local/share/lftp/bookmarks", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.netrc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/akkana/.local/share/lftp/rl_history", O_RDONLY|O_LARGEFILE) = 5
open("/home/akkana/.inputrc", O_RDONLY|O_LARGEFILE) = 5
Now we're getting somewhere! The file where it's getting its bookmarks
is ~/.local/share/lftp/bookmarks -- and I probably can't use that
to set my alias.
But wait, why doesn't it show lftp trying to open those other config files?
Using script to save the output
At this point, you might be sick of running those grep pipelines over
and over. Most of the time, when I run strace, instead of piping it
through grep I run it under script to save the whole output.
script is one of those poorly named, ungoogleable commands, but it's
incredibly useful. It runs a subshell and saves everything that appears
in that subshell, both what you type and all the output, in a file.
Start script, then run lftp inside it:
$ script /tmp/lftp.strace
Script started on Tue 26 Aug 2014 12:58:30 PM MDT
$ strace lftp sitename
After the flood of output stops, I type Ctrl-D or Ctrl-C to exit lftp,
then another Ctrl-D to exit the subshell script is using.
Now all the strace output is in /tmp/lftp.strace and I can
grep in it, view it in an editor or anything I want.
So, what files is it looking for in my home directory and why don't
they show up as open attempts?
$ grep /home/akkana /tmp/lftp.strace
Ah, there it is! A bunch of lines like this:
access("/home/akkana/.lftprc", R_OK) = -1 ENOENT (No such file or directory)
stat64("/home/akkana/.lftp", 0xbff821a0) = -1 ENOENT (No such file or directory)
mkdir("/home/akkana/.config", 0755) = -1 EEXIST (File exists)
mkdir("/home/akkana/.config/lftp", 0755) = -1 EEXIST (File exists)
access("/home/akkana/.config/lftp/rc", R_OK) = 0
So I should have looked for access and stat as well as
open.
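A single grep can cover all three calls once the output is saved. The sketch below builds a small stand-in log from lines quoted in this post (so it runs without strace or lftp installed) and filters it; the regular expression is my own guess at a reasonable pattern, not something taken from the strace documentation.

```shell
# Stand-in for /tmp/lftp.strace, made from lines quoted in the post.
log=$(mktemp)
cat > "$log" <<'EOF'
open("/home/akkana/.inputrc", O_RDONLY|O_LARGEFILE) = 5
access("/home/akkana/.lftprc", R_OK) = -1 ENOENT (No such file or directory)
stat64("/home/akkana/.lftp", 0xbff821a0) = -1 ENOENT (No such file or directory)
mkdir("/home/akkana/.config", 0755) = -1 EEXIST (File exists)
EOF

# Keep open, access, stat and stat64 calls on paths under the home
# directory; the mkdir line is filtered out.
matches=$(grep -E '^(open|access|stat(64)?)\("/home/akkana' "$log")
printf '%s\n' "$matches"
rm -f "$log"
```

strace can also do some of this filtering itself: its -e trace=file option limits the output to system calls that take a filename argument, though it's worth checking the man page for the version you have.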
Now I have the list of files it's looking for. And, curiously,
it creates ~/.config/lftp if it doesn't exist already, even though
it's not going to write anything there.
So I created ~/.config/lftp/rc and put my alias there. Worked fine.
And I was able to edit my bookmark in ~/.local/share/lftp/bookmarks
later when I had a need for that. All thanks to strace.
Tags: linux, debugging, cmdline
13:06 Sep 02, 2014
Sat, 28 Dec 2013
I've been scanning a bunch of records with Audacity (using as a guide
Carla Schroder's excellent Book of Audacity and a
Behringer UCA222 USB audio interface -- audacity doesn't seem able to record
properly from the built-in sound card on any laptop I own, while it
works fine with the Behringer).
Audacity's user interface isn't great for assembly-line recording of
lots of tracks one after the other, especially on a laptop with a
trackpad that doesn't work very well, so I wasn't always as organized
with directory names as I could have been, and I ended up with a mess.
I was periodically backing up the recordings to my desktop, but as I
shifted from everything-in-one-directory to an organized system, the
two directories got out of sync.
To get them back in sync, I needed a way to answer this question:
is every file inside directory A (maybe in some subdirectory of it)
also somewhere under directory B? In other words, can I safely
delete all of A knowing that anything in it is safely stored in B,
even though the directory structures are completely different?
I was hoping for some clever find | xargs way to do it,
but came up blank. So eventually I used a little zsh loop:
one find to get the list of files to test, then for each of
those, another find inside the target directory, then test
whether that find printed anything. (find exits with status 0 even
when nothing matches, so its exit code can't say whether it found the file.)
(I'm assuming that if the songname.aup file is there, the songname_data
directory is too.)
for fil in $(find AAA/ -name '*.aup'); do
    fil=$(basename "$fil")
    if [[ -z $(find BBB -name "$fil") ]]; then
        echo "$fil is not in BBB"
    fi
done
Worked fine. But is there an easier way?
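One possible answer, offered as a rough sketch rather than a tested recipe for real Audacity projects: collect the .aup basenames from each tree, sort them, and let comm report the names that appear only in the first list. The function name missing_aup is made up, and it assumes file names without embedded newlines.

```shell
# missing_aup is a hypothetical name. It prints the basename of every
# *.aup file found under $1 that has no same-named file under $2.
missing_aup() {
    a=$(mktemp)
    b=$(mktemp)
    find "$1" -name '*.aup' -exec basename {} \; | sort -u > "$a"
    find "$2" -name '*.aup' -exec basename {} \; | sort -u > "$b"
    comm -23 "$a" "$b"      # keep only lines unique to the first list
    rm -f "$a" "$b"
}
```

Calling missing_aup AAA BBB would then print one line per file that is still only in AAA; no output means everything is accounted for. It runs find once per tree instead of once per file, which should matter more as the directories grow.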
Tags: shell, cmdline, linux, programming
10:36 Dec 28, 2013