Shallow Thoughts : : Jul
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Mon, 29 Jul 2013
Increasingly I'm seeing broken sites that send automated HTML mail
with headers claiming it's plain text.
To understand what's happening, you have to know about something called
MIME multipart/alternative.
MIME stands for Multipurpose Internet Mail Extensions:
it's the way mail encodes different types of attachments,
so you can attach images, music, PDF documents or whatever
with your email.
If you send a normal plain text mail message, you don't need MIME.
But as soon as you send anything else -- like an HTML message where
you've made a word bold, changed color or inserted images -- you need it.
MIME adds a Content-Type to the message saying "This is HTML
mail, so you need to display it as HTML when you receive it" or
"Here's a PDF attachment, so you need to display it in a PDF viewer".
The headers for these two cases would look like this:
Content-Type: text/html
Content-Type: application/pdf
A lot of mail programs, for reasons that have never been particularly
clear, like to send two copies of every mail message: one in plain
text, one in HTML. They're two copies of the same message --
it's just that one version has fancier formatting than the other.
The MIME header that announces this is
Content-Type: multipart/alternative
because the two versions, text and HTML, are alternative versions of the
same message. The recipient need only read one, not both.
Inside the multipart/alternative section there will be further
MIME headers, one saying
Content-Type: text/plain
,
where it puts the text of your message,
and one
Content-Type: text/html
, where it puts HTML
source code.
This mostly works fine for real mail programs (though it's a rather
silly waste of bandwidth, sending double copies of everything for no
particularly good reason, and personally I always configure the mailers
I use to send only one copy at a time). But increasingly I'm
seeing automated mail robots that send multipart/alternative mail,
but do it wrong: they send HTML for both parts, or they send a
valid HTML part and a blank text part.
Why don't the site owners notice the problem?
You wouldn't ever notice a problem if you use the default configuration
on most mailers, to show the HTML part if at all possible. But most mail
programs give you an option to show the text part if there is one.
That way, you don't have to worry about those people who like to send
messages in pink blinking text on a plaid background -- all you see is the text.
If your mailer is configured to show plain text, for most messages
you'll see just text -- no colors, no blinking, no annoyances.
But for mail sent by these misconfigured mail robots, what you'll
see is HTML source code.
I've seen this in several places -- lots of spammers do it (who cares?
I was going to delete the message anyway), and one of the local
astronomy clubs does it so I've long since stopped trying to read
their announcements.
But the latest place I've seen this is one that ought to know better:
Coursera. They apparently reconfigured their notification system
recently, and I started getting course notifications that look like this:
/* Client-specific Styles */
#outlook a{padding:0;} /* Force Outlook to provide a "view in browser" button.
*/
body{width:100% !important;} .ReadMsgBody{width:100%;}
.ExternalClass{width:100%;} /* Force Hotmail to display emails at full width */
body{-webkit-text-size-adjust:none;} /* Prevent Webkit platforms from changing
default text sizes. */
/* Reset Styles */
body{margin:0; padding:0;}
img{border:0; height:auto; line-height:100%; outline:none;
text-decoration:none;}
table td{border-collapse:collapse;}
#backgroundTable{height:100% !important; margin:0; padding:0; width:100%
!important;}
p {margin-top: 14px; margin-bottom: 14px;}
/* /\/\/\/\/\/\/\/\/\/\ STANDARD STYLING: PREHEADER /\/\/\/\/\/\/\/\/\/\ */
.preheaderContent div a:link, .preheaderContent div a:visited, /* Yahoo! Mail
Override */ .preheaderContent div a .yshortcuts /* Yahoo! Mail Override */{
color: #3b6e8f;
... and on and on like that. You get the idea.
It's unreadable, even by a geek who knows HTML pretty well.
It would be fine in the HTML part of the message -- but this is
what they're sending in the text/plain part.
I filed a bug, but Coursera doesn't have a lot of staff to respond to
bug reports and it might be quite some time before they fix this.
Meanwhile, I don't want to miss notifications for the algorithms
course I'm currently taking. So I needed a workaround.
How to work around the problem in mutt
I found one for mutt at
alternative_order
and folder-hook.
When in my "classes" folder, I use a folder hook to tell mutt to
prefer text/html format over text/plain, even though my default is text/plain.
Then you also need to add a default folder hook to set the default
back for every other folder -- mutt folder hooks are frustrating
in that way.
The two folder hooks look like this:
folder-hook . 'set unalternative_order *; alternative_order text/plain text'
# Prefer the HTML part but only for Coursera,
# since it sends HTML in the text part.
folder-hook =in/coursera 'unalternative_order *; alternative_order text/html'
alternative_order specifies which types you'd most like to read.
unalternative_order is a lot less clear; the documentation says
it "removes a mime type from the alternative_order list", but doesn't
say anything more than that. What's the syntax? What's the difference
between using unalternative_order or just re-setting alternative_order?
Why do I have to specify it with * in both places? No one seems to know.
So it's a little unsatisfying, and perhaps not the cleanest way.
But it does work around the bug for sites where you really need
a way to read the mail.
Update: I also found this
discussion
of alternative_order which gives a nice set of key bindings to
toggle interactively between the various formats.
It was missing some backslashes, so I had to fiddle with it slightly
to get it to work. Put this in .muttrc:
macro pager ,@aoh= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/html text/plain text;\
macro pager A ,@aot= 'toggle alternative order'<enter>\
<exit><display-message>"
macro pager ,@aot= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/plain text/html text;\
macro pager A ,@aoh= 'toggle alternative order'<enter>\
<exit><display-message>"
macro pager A ,@aot= "toggle alternative order"
Then just type A (capital A) to toggle between formats. If it doesn't
change the first time you type A, type another one and it should
redisplay. I've found it quite handy.
Tags: mutt, email, html
[
15:13 Jul 29, 2013
More tech/email |
permalink to this entry |
]
Wed, 24 Jul 2013
One more brief followup on that
comma
inserting sed pattern and its
followup:
$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
20,130,607,215,015
In the second article, I'd mentioned that the hardest part of the exercise
was figuring out where we needed backslashes.
Devdas (f3ew) asked on Twitter
whether I would still need all the backslash escapes even
if I put the pattern in a file -- in other worse, are the backslashes
merely to get the shell to pass special characters unchanged?
A good question, and I suspected the need for some of the backslashes
would disappear. So I tried this:
$ echo ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas
$ echo 20130607215015 | sed -f /tmp/commas
And it didn't work. No commas were inserted.
The problem, it turns out, is that my shell, zsh, changed both instances
of \b to an ASCII backspace, ^H. Editing the file fixes that, and so does
$ echo -E ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta' >/tmp/commas
But that only applies to echo: zsh doesn't do the \b -> ^H substitution
in the original command, where you pass the string directly as a sed argument.
Okay, with that straightened out, what about Devdas' question?
Surprisingly, it turns out that all the backslashes are still needed.
None of them go away when you echo > file
, so they
weren't there just to get special characters past the shell; and if
you edit the file and try removing some of the backslashes, you'll
see that the pattern no longer works. I had thought at least some of them,
like the ones before the \{ \}, were extraneous, but even those are
still needed.
Filtering unprintable characters
As long as I'm writing about regular expressions, I learned a nice
little tidbit last week. I'm getting an increasing
flood of Asian-language spams which my mail ISP doesn't filter out (they
use spamassassin, which is pretty useless for this sort of filtering).
I wanted a simple pattern I could pass to egrep (via procmail) that
would filter out anything with a run of more than 4 unprintable characters
in a row. [^[:print:]]{4,}
should do it, but it wasn't working.
The problem, it turns out, is the definition of what's printable.
Apparently when the default system character set is UTF-8, just about
everything is considered printable! So the trick is that you need to
set LC_ALL to something more restrictive, like C (which basically means
ASCII) to before :print: becomes useful for language-based filtering.
(Thanks to Mikachu for spotting the problem).
So in a terminal, you can do something like
LC_ALL=C egrep -v '[^[:print:]]' filename
In procmail it was a little harder; I couldn't figure out any way to
change LC_ALL from a procmail recipe; the only solution I came up
with was to add this to ~/.procmailrc:
export LC_ALL=C
It does work, though, and has cut the spam load by quite a bit.
Tags: zsh, regexp, sed, cmdline, grep
[
19:35 Jul 24, 2013
More linux/cmdline |
permalink to this entry |
]
Sat, 20 Jul 2013
Sometimes when I middleclick on a Firefox link to open it in a new tab,
I get an empty new tab. I hate that.
It happens most often on Javascript links. For instance, suppose a
website offers a Help link next to the link I'm trying to use.
I don't know what type of link it is; if it's
a normal link, to an HTML page, then it may open in my current tab,
overwriting the form I just spent five minutes filling out.
So I want to middleclick it, so it will open in a new tab.
On the other hand, if it's a Javascript link that pops up a new
help window, middleclicking won't work at all; all it does is open
an empty new tab, which I'll have to close.
A similar effect happens on PDF links; in that case, middleclicking
gives me the "What do you want to do with this?" dialog but
I also get a new tab that I have to close. (Though I'm
not sure what happens with Firefox's new built-in PDF reader.)
Anyway, since there seems to be no way of making middleclick just
do the sensible thing and open these links in a new tab like I asked,
it, I can do something almost as good: a user stylesheet that warns me when
I'm about to click on one of these special links. This rule changes
the cursor to a crosshair, and turns the link bold with colors of red
on yellow. Hard to miss!
I put this into userContent.css, inside the chrome
directory inside my profile:
/*
* Make it really obvious when links are javascript,
* since middleclicking javascript links doesn't do anything
* except open an empty new tab that then has to be closed.
*/
a:hover[href^="javascript"] {
cursor: crosshair; font-weight: bold;
text-decoration: blink;
color: red; background-color: yellow
!important
}
/*
* And the same for PDFs, for the same reason.
* Sadly, we can't catch all PDFs, just the ones where the actual
* filename ends in .pdf.
* Apparently there's no way to make a selector case insensitive,
* so we have separate cases for .pdf and .PDFb
*/
a:hover[href$=".pdf"], a:hover[href$=".PDF"] {
cursor: crosshair;
color: red; background-color: yellow
!important
}
In selectors, ^="javascript"
means "starts with javascript",
for links like javascript:do_something().
$=".pdf"
means "ends with .pdf".
If you want to match a string anywhere inside the href,
*=
means "contains".
What about that crosshair cursor?
Here are some of the cursors you can use:
Mozilla's
cursor documentation page. Don't trust the images on that page --
hover over each cursor to see what your actual browser shows.
You can also warn about links that would open a new window or tab.
If you prefer to keep control of that, rather than letting each web
page designer decide for you where each link should open, you
can control it with the
browser.link.open newwindow
preference. But whatever you do with that preference you can add a rule for
a:hover[target="_blank"]
to help you notice links that
are likely to open in a new tab.
You can even make these special links blink, with
text-decoration: blink
.
Assuming you're not a curmudgeon like I am who disables blinking
entirely by setting the "browser.blink_allowed" preference to false.
Tags: firefox, web, css, browsers, mozilla
[
20:26 Jul 20, 2013
More tech/web |
permalink to this entry |
]
Tue, 16 Jul 2013
Just a couple of tips for communicating with your BeagleBone Black
once you have it flashed with the latest Angstrom:
Configure the USB network
The Beaglebone Black running Angstrom has a wonderful feature:
when it's connected to your desktop via the mini USB cable, you
can not only mount it like a disk, but you can also set up networking
to it.
If the desktop is running Linux, you should have all the drivers you need.
But you might need a couple of udev rules to make the network device
show up. I ran the
mkudevrule.sh
from the official Getting Started page, which creates four rules
and then runs sudo udevadm control --reload-rules
to enable them
without needing to reboot your Linux machine.
Now you're ready to boot the BeagleBone Black running Angstrom.
Ideally you'll want it connected to your desktop machine by
the mini USB cable, but also plugged in to a separate power supply.
Your Linux machine should see it as a new network device, probably
eth1: run ifconfig -a
to see it.
The Beagle is already configured as 192.168.7.2. So you can talk
to it by configuring your desktop machine to be .1 on the same network:
ifconfig eth1 192.168.7.1
So now you can ssh from your desktop machine to the BBB,
or point your browser at the BBB's built in web pages.
Make your Linux machine a router for the Beaglebone
If you want the Beaglebone Black to have access to the net, there are
two more things you need to do.
First, on the BBB itself, run these two lines:
/sbin/route add default gw 192.168.7.1
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
You'll probably want to add these lines to the end of
/usr/bin/g-ether-load.sh on the BBB, so they'll be run
automatically every time you boot.
Then, back on your Linux host, do this:
sudo iptables -A POSTROUTING -t nat -j MASQUERADE
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward > /dev/null
Now you should be able to ping, ssh or otherwise use the BBB to get
anywhere on the net.
Once your network is running, you might want to run
/usr/bin/ntpdate -b -s -u pool.ntp.org
to set the time, since the BBB doesn't have a real-time clock (RTC).
Serial monitor
If you're serious about playing with hardware, you'll probably want
a serial cable, for those times when something goes wrong and your
board isn't talking properly over USB.
I use the Adafruit
console cable -- it's meant for Raspberry Pi but it works fine
with the BeagleBone, since they both use the same 3.3v logic levels.
It plugs in to the pins right next to the "P9" header, on the
power-supply-plug side of the board.
The header has six pins: plug the black wire (ground) into pin 1,
the one closest to the power plug and ethernet jack. Plug the green wire
(RXD) into pin 4, and the white wire (TXD) into 5, the next-to-last pin.
Do not plug in the red power wire -- leave it hanging.
It's 5 volts and might damage the BBB if you plug it in to the
wrong place.
In case my photo isn't clear enough (click for a larger image),
Here's a diagram made by a helpful person on #beagle:
BeagleBone
Black serial connections.
Once the cable is connected, now what? Easy:
screen /dev/ttyUSB0 115200
That requires read-write privileges on the serial device /dev/ttyUSB0;
if you get a Permission Denied type error, it probably means you need
to add yourself to group dialout. (Changing groups requires logging out
and logging back in, so if you're impatient, just run screen as root.)
I used the serial monitor while flashing my new Angstrom image
(which is how I found out about how it spends most of that hour
updating Gnome desktop stuff). On the Raspberry Pi, I was dependent
on the serial cable for all sorts of things while I worked on hardware
projects; on the BeagleBone, I suspect I may use the USB networking
feature a lot more. Still, the serial cable will be handy to have when
things go wrong, or if I use a different Linux distro (like Debian) that
doesn't enable the USB networking feature.
Tags: hardware, linux, beaglebone, maker
[
19:41 Jul 16, 2013
More hardware |
permalink to this entry |
]
Sat, 13 Jul 2013
I finally got a shiny new BeagleBone Black! This little board
looks like it should be just the ticket for robotics, a small, cheap,
low-power Linux device that can also talk to hardware like an Arduino,
with plenty of GPIO pins, analog, PWM, serial, I2C and all.
I plugged in the BeagleBone Black via the mini USB cable, and it
powered up and booted. It comes with a Linux distro, Angstrom, already
installed on its built-in flash memory. I had already known that it
would show up as a USB storage device -- you can mount it like a disk,
and read the documentation (already there on the filesystem) that way.
Quite a nice feature.
What I didn't know until I read the
Getting
Started guide was that it had an even slicker feature: it also
shows up as a USB network device.
All I had to do was run a script,
mkudevrule.sh,
to set up some udev rules, and then
ifconfig -a
on my desktop showed a new device named eth1.
The Beagle is already configured as 192.168.7.2, so I configured eth1
to be on the same network:
ifconfig eth1 192.168.7.1
and I was able to point my browser directly at a mini http server
running on the device, which gives links to all the built-in documentation.
Very slick and well thought out!
But of course, what I really wanted was to log in to the machine
itself. So I tried ssh 192.168.7.2
and ... nothing.
It turns out that the Angstrom that ships on current BBBs has a bug,
and ssh often doesn't work. The cure is to download a new Angstrom
image and re-flash the machine.
It was getting late in the evening, so I postponed that until the
following day. And a good thing I did: the flashing process turned out
to be very time consuming and poorly documented, at least for Linux users.
So here's how to do it.
Step 1: Use a separate power supply
Most of the Beaglebone guides recommend just powering
the BBB through its provided mini USB cable. Don't believe it.
At least, my first attempt at flashing failed, while repeating
exactly the same steps with the addition of an external power supply
worked just fine, and I've heard from other people who have had
similar problems trying to power the BBB through cable USB cable.
Fortunately, when I ordered the BBB I ordered a 2A power supply with
it. It's hard to believe that it ever really draws 2 amps, but that's
what Adafruit recommended, so that's what I ordered.
One caution: the BBB will start booting as soon as you
apply any power, whether from an external supply or the USB cable.
So it might be best to leave the USB cable disconnected during
the flashing process.
Get the eMMC-flasher image and copy it to the SD card
Download the image for the BeagleBone Black eMMC flasher from
Beagleboard
Latest Images.
They don't tell you the size of the image, but it's 369M.
The uncompress it. It's a .xz file, which I wasn't previously familiar
with, but I already had an uncompressor for it, unxz:
unxz BBB-eMMC-flasher-2013.06.20.img.xz
After uncompressing, it was 583M.
You'll need a microSD card to copy the image to. The Beagleboard folks
don't say how much space you need, but I found a few pages
talking about needing a 4G card. I'm not clear why you'd need that
for an image barely over half a gig, but 4G is what I happened to
have handy, so that's what I used.
Put the card in whatever adapter you need, plug it in to your Linux
box, and unmount it if got mounted automatically. Then copy the image
to the card -- just the base card device, not the first partition.
Replace X with the appropriate drive name (b in my case):
dd bs=1M if=BBB-eMMC-flasher-2013.06.20.img of=/dev/sdX
The copy will take quite a while.
Boot off the card
With the BBB powered off, insert the microSD card.
Find the "user boot" button. It's a tiny button right on top of the
microSD card reader. While holding it down, plug in your power supply
to power the BBB on. Keep holding the button down until you see all
four of the bright blue LEDs come on, then release the button.
Then wait. A long time. A really long time.
The LEDs should flash erratically during this period.
Most estimates I found on the web estimated 30-45 minutes to flash a
new version of Angstrom, but for me it took an hour and six minutes.
You'll know when it's done when the LEDs stop blinking erratically.
Either they'll all turn on steady (success) or they'll all go off
(failure).
Over an hour? Why so long?
I wondered that, of course, so in my second attempt at flashing, once
I had the serial cable plugged in, I ran ps periodically to see what
it was doing.
And for nearly half that time -- over 25 minutes -- what it was doing
was configuring Gnome.
Seriously. This Angstrom distribution for a tiny board half the size
of your hand runs a Gnome desktop -- and when it flashes its OS,
it doesn't just copy files, it runs Gnome configuration scripts for
every damn program on the system.
Okay. I'm a little less impressed with the Beagle's Angstrom setup now.
Though I still think this USB-ethernet thing is totally slick.
Tags: hardware, linux, beaglebone, maker
[
14:30 Jul 13, 2013
More hardware |
permalink to this entry |
]
Tue, 09 Jul 2013
A few days ago I wrote about a nifty
sed
script to insert commas into numbers that I dissected with the
help of Dana Jansens.
Once we'd figured it out, though, Dana thought this wasn't really the best
solution. For instance, what if you have a file that has some numbers
in it, but also has some digits mixed up with letters? Do you really
want to insert commas into every string of digits? What if you have
some license plates, like abc1234? Maybe it would be better to
restrict the change to digits that stand by themselves and
are obviously meant to be numbers. How much harder would that be?
More regexp fun! We kicked it around a bit, and came up with a solution:
$ echo abc20130607215015 | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
abc20,130,607,215,015
$ echo abc20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
abc20130607215015
$ echo 20130607215015 | sed ':a;s/\b\([0-9]\+\)\([0-9]\{3\}\)\b/\1,\2/;ta'
20,130,607,215,015
Breaking that down: \b
is any word boundary -- you could
also use \< to indicate that it's the start of a word, much like
\> was the end of a word.
\([0-9]\+\)
is any string of one or more digits, taken as
a group. The \( \)
part marks it as a group so we'll be
able to use it later.
\([0-9]\{3\}\)
is a string of exactly three digits: again,
we're using \( \)
to mark it as our second numbered group.
\b
is another word boundary (we could use \>),
to indicate that the group of three digits must come at the end
of a word, with only whitespace or punctuation following it.
/\1,\2/
: once we've matched the pattern -- a word break,
one or more digits, three digits and another word break -- we'll
replace it with this. \1 matches the first group we found -- that
was the string of one or more digits. \2 matches the second group,
the final trio of digits. And there's a comma in between.
We use the same :a; ;ta
trick as in the first example
to loop around until there are no more triplets to match.
The hardest part of this was figuring out what needed to be escaped
with backslashes. The one that really surprised me was the \+.
Although *
works in sed the same way it does in other
programs, matching zero or more repetitions of the preceding pattern,
sed uses \+
rather than +
for one or more
repetitions. It took us some fiddling to find all the places we needed
backslashes.
Tags: regexp, sed, cmdline
[
21:16 Jul 09, 2013
More linux/cmdline |
permalink to this entry |
]
Sun, 07 Jul 2013
Carla Schroder's recent article,
More Great Linux Awk, Sed, and Bash Tips and Tricks ,
had a nifty sed command I hadn't seen before to take a long number and
insert commas appropriately:
sed -i ':a;s/\B[0-9]\{3\}\gt;/,&/;ta' numbers.txt
.
Or, if you don't have a numbers.txt file, you can do something like
echo 20130607215015 | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
(I dropped the -i since that's for doing in-place edits of a file).
Nice! But why does it work?
It would be easy enough to insert commas after every third number,
but that doesn't work unless the number of digits is a multiple of three.
In other words, you don't want 20130607215015 to become
201,306,072,150,15 (note how the last group only has two digits);
it has to count in threes from the right if you want to end up
with 20,130,607,215,015.
Carla's article didn't explain it, and neither did any of the other
sites I found that mentioned this trick.
So, with some help from regexp wizard Dana Jansens (of
OpenBox fame), I've broken it down
into more easily understood bits.
Labels and loops
The first thing to understand is that this is actually several sed commands.
I was familiar with sed's basic substitute command, s/from/to/.
But what's the rest of it? The semicolons separate the commands, so
the whole sed script is:
:a
s/\B[0-9]\{3\}\>/,&/
ta
What this does is set up a label called a. It tries to do the
substitute command, and if the substitute succeeds (if something
was changed), then ta
tells it to loop back around to
label a, the beginning of the script.
So let's look at that substitute command.
The substitute
Sed's s/from/to/ (like the equivalent command in vim and many
other programs) looks for the first instance of the from pattern
and replaces it with the to pattern. So we're searching for
\B[0-9]\{3\}\>
and replacing it with
,&/
Clear as mud, right? Well, the to pattern is easy: &
matches whatever we just substituted (from), so this just
sticks a comma in front of ... something.
The from pattern, \B[0-9]\{3\}\>
, is a bit more
challenging. Let's break down the various groups:
-
\B
-
Matches anything that is not a word boundary.
-
[0-9]
-
Matches any digit.
-
\{3\}
-
Matches three repetitions of whatever precedes it (in this case, a digit).
-
\>
-
Matches a word boundary at the end of a word. This was the hardest part
to figure out, because no sed documentation anywhere bothers to mention
this pattern. But Dana knew it as a vim pattern, and it turns out it
does the same thing in sed even though the docs don't say so.
Okay, put them together, and the whole pattern matches any three digits
that are not preceded by a word boundary but which are
at the end of a word (i.e. they're followed by a word boundary).
Cool! So in our test number, 20130607215015, this matches the last
three digits, 015. It doesn't match any of the other digits because
they're not followed by a word end boundary.
So the substitute will insert a comma before the last three numbers.
Let's test that:
$ echo 20130607215015 | sed 's/\B[0-9]\{3\}\>/,&/'
20130607215,015
Sure enough!
How the loop works
So the substitution pattern just adds the last comma.
Once the comma is inserted, the ta
tells sed to go back
to the beginning (label :a) and do it again.
The second time, the comma that was just inserted is now a word
boundary, so the pattern matches the three digits before the comma,
215, and inserts another comma before them. Let's make sure:
$ echo 20130607215,015 | sed 's/\B[0-9]\{3\}\>/,&/'
20130607,215,015
So that's how the pattern manages to match triplets from right to left.
Dana later commented that this wasn't really the best solution -- what
if the string of digits is attached to other characters and isn't
really a number? I'll cover that in a separate article in a few days.
Update: Here's the smarter pattern,
Sed:
insert commas into numbers, but in a smarter way.
Tags: regexp, sed, cmdline
[
14:14 Jul 07, 2013
More linux/cmdline |
permalink to this entry |
]
Wed, 03 Jul 2013
[This a slight revision of my monthly "Shallow Sky" column in the
SJAA Ephemeris newsletter.
Looks like the Ephemeris no longer has an online HTML version,
just the PDF of the whole newsletter,
so I may start reposting my Ephemeris columns here more often.]
Last month I stumbled upon a loony moon book I hadn't seen before, one
that deserves consideration by all lunar observers.
The book is The Moon: Considered as a Planet, a World, and a Satellite
by James Nasmyth, C.E. and James Carpenter, F.R.A.S.
It's subtitled "with twenty-six illustrative plates of lunar objects,
phenomena, and scenery; numerous woodcuts &c." It was written in 1885.
Astronomers may recognize the name Nasmyth: his name is attached to a modified
Cassegrain focus design used in a lot of big observatory telescopes.
Astronomy was just a hobby for him, though; he was primarily a
mechanical engineer. His coauthor, James Carpenter, was an astronomer
at the Royal Greenwich Observatory.
The most interesting thing about their book is the plates illustrating
lunar features. In 1885, photography wasn't far enough along to get
good close-up photos of the moon through a telescope. But Nasmyth and
Carpenter wanted to show something beyond sketches. So they built
highly detailed models of some of the most interesting areas of the
moon, complete with all their mountains, craters and rilles, then
photographed them under the right lighting conditions for interesting
shadows similar to what you'd see when that area was on the terminator.
I loved the idea, since I'd worked on a similar but much less
ambitious project myself. Over a decade ago, before we were married,
Dave North got the idea
to make a 3-D model of the full moon that he could use for the SJAA
astronomy class. I got drafted to help. We started by cutting a 3-foot
disk of wood, on which we drew a carefully measured grid corresponding
to the sections in Rukl's Atlas of the Moon. Then, section by section,
we drew in the major features we wanted to incorporate. Once the
drawing was done, we mixed up some spackle -- some light, and some
with a little black paint in it for the mare areas -- and started
building up relief on top of the features we'd sketched. The project
was a lot of fun, and we use the moon model when giving talks
(otherwise it hangs on the living room wall).
Nasmyth and Carpenter's models cover only small sections of the moon --
Copernicus, Plato, the Apennines -- but in amazing detail. Looking at
their photos really is like looking at the moon at high magnification
on a night of great seeing.
So I had to get the book. Amazon has two versions, a paperback and a
hardcover. I opted for the paperback, which turns out to be scanned
from a library book (there's even a scan of the pocket where the book's
index card goes). Some of the scanning is good, but some of the plates
come out all black. Not very satisfying.
But once I realized that an 1885 book was old enough to be public domain,
I checked the web. I found two versions: one at Archive.org and one on
Google Books. They're scans from two different libraries; the Archive.org
scan is better, but the epub version I downloaded for my ebook reader
has some garbled text and a few key plates, like Clavius, missing.
The Google version is a much worse scan and I couldn't figure out if
they had an epub version. I suspect the hardcover on Amazon is likely
a scan from yet a fourth library.
At the risk of sounding like some crusty old Linux-head, wouldn't it
be nice if these groups could cooperate on making one GOOD version
rather than a bunch of bad ones?
I also discovered that the San Jose library has a copy. A REAL copy,
not a scan.
It gave me a nice excuse to take the glass elevator up
to the 8th floor and take in the view of San Jose.
And once I got it,
I scanned all the
moon sculpture plates myself.
Sadly, like the Archive.org ebook, the San Jose copy is missing Copernicus.
I wonder if vandals are cutting that page out of library copies?
That makes me wince even to think of it, but I know such things happen.
Whichever version you prefer, I'd recommend that lunies get hold of
a copy. It's a great introduction to planetary science, with
very readable discussions of how you measure things like the distance
and size of the moon. It's an even better introduction to lunar
observing: if you merely go through all of their descriptions of
interesting lunar areas and try to observe the features they mention,
you'll have a great start on a lunar observing program that'll keep
you busy for months. For experienced observers, it might give you a
new appreciation of some lunar regions you thought you already knew
well. Not at super-fine levels of detail -- no Alpine Valley rille --
but a lot of good discussion of each area.
Other parts of the book are interesting only from a historical
perspective. The physical nature of lunar features wasn't a settled
issue in 1885, but Nasmyth and Carpenter feel confident that all of
the major features can be explained as volcanism. Lunar craters are
the calderas of enormous volcanoes; mountain ranges are volcanic too,
built up from long cracks in the moon's crust, like the Cascades range
in the Pacific Northwest.
There's a whole chapter on "Cracks and Radiating Streaks", including a
wonderful plate of a glass ball with cracks, caused by deformation,
radiating from a single point. They actually did the experiment: they
filled a glass globe with water and sealed it, then "plunged it into a
warm bath". The cracks that resulted really do look a bit like Tycho's
rays (if you don't look TOO closely: lunar rays actually line up with
the edges of the crater, not the center).
It's fun to read all the arguments that are plausible, well reasoned
-- and dead wrong. The idea that craters are caused by meteorite
impacts apparently hadn't even been suggested at the time.
Anyway, I enjoyed the book and would definitely recommend it. The
plates and observing advice can hold their own against any modern
observing book, and the rest ... is a fun historical note.
Here are some places to get it:
Amazon:
Online:
Or, try your local public library -- they might have a real copy!
Tags: astronomy, moon, science, art, books
[
16:12 Jul 03, 2013
More science/astro |
permalink to this entry |
]