Shallow Thoughts : : Jun

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 24 Jun 2017

Mutt: Fixing Erroneous Charsets, part 632

Someone forwarded me a message from the Albuquerque Journal. It was all about "New Mexico\222s schools".

Sigh. I thought I'd gotten all my Mutt charset problems fixed long ago. My system locale is set to en_US.UTF-8, and accented characters in Spanish and in people's names usually show up correctly. But I do see this every now and then.

When I see it, I usually assume it's a case of incorrect encoding: whoever sent it perhaps pasted characters from a Windows Word document or something, and their mailer didn't properly re-encode them into the charset they were using to send the message.

In this case, the message had User-Agent: SquirrelMail/1.4.13. I suspect it came from a "Share this" link on the newspaper's website.

I used vim to look at the source of the message, and it had

Content-Type: text/plain; charset=iso-8859-1
For the bad characters, in vim I saw things like
New Mexico<92>s schools

I checked an old web page I'd bookmarked years ago that had a table of the iso-8859-1 characters, and sure enough, hex 0x92 was an apostrophe. What was wrong?

I got some help on the #mutt IRC channel, and, to make a long story short, that web table I was using was wrong. ISO-8859-1 doesn't include any characters in the range 8x-9x, as you can see on the Wikipedia ISO/IEC 8859-1.

What was happening was that the page was really cp1252: that's where those extra characters, like hex 92/octal 222 for an apostrophe, or hex 96/octal 226 for a dash (nitpick: that's an en dash, but it was used in a context that called for an em dash; if someone is going to use something other than the plain old ASCII dash - you'd think they'd at least use the right one. Sheesh!)

Anyway, the fix for this is to tell mutt when it sees iso-8859-1, use cp1252 instead:

charset-hook iso-8859-1 cp1252

Voilà! Now I could read the article about New Mexico's schools.

A happy find related to this: it turns out there's a better way of looking up ISO-8859 tables, and I can ditch that bookmark to the old, erroneous page. I've known about man ascii forever, but someone I'd never thought to try other charsets. Turns out man iso_8859-1 and man iso_8859-15 have built-in tables too. Nice!

(Sadly, man utf-8 doesn't give a table. Of course, that would be a long man page, if it did!)

Tags: , ,
[ 11:06 Jun 24, 2017    More linux | permalink to this entry | ]

Fri, 16 Jun 2017

Flycatchers Fledging, and The Buck Stops Here

We've had a pair of ash-throated flycatchers in the nest box I set up in the yard. I've been watching them bring bugs to the nest for a couple of weeks now, but this morning they've been acting unusual: fluttering around the corner of the house near my office window, calling to each other, not spending nearly as much time near the nest. I suspect one or more of the chicks may have fledged this morning, though I have yet to see more than two flycatchers at once. They still return to the nest box occasionally (one of them just delivered a big grasshopper), so not all the chicks have fledged yet. Maybe if I'm lucky I'll get to see one fledge.

I hope they're not too affected by the smoky air. We have two fires filling the air with smoke: the Bonita Fire, 50 miles north, and as of yesterday a new fire in Jemez Springs, only about half that distance. Yesterday my eyes were burning, my allergies were flaring up, and the sky was worse than the worst days in Los Angeles in the 70s. But it looks like the firefighters have gotten a handle on both fires; today is still smoky, with a major haze down in the Pojoaque Valley and over toward Albuquerque, but the sky above is blue and the smoke plume from Jemez Springs is a lot smaller and less dark than it was yesterday. Fingers crossed!

[Buck in velvet, drinking at the pond] And just a few minutes ago, a buck with antlers in velvet wandered into our garden to take a drink at the pond. Such a nice change from San Jose!

Tags: ,
[ 10:40 Jun 16, 2017    More nature | permalink to this entry | ]

Fri, 09 Jun 2017

Emacs: Typing dashes in html mode (update for Emacs 24)

Back in 2006, I wrote an article on making a modified copy of sgml-mode.el to make it possible to use double-dashed clauses -- like this -- in HTML without messing up auto-fill mode.

That worked, but the problem is that if you use your own copy of sgml-mode.el, you miss out on any other improvements to HTML and SGML mode. There have been some good ones, like smarter rewrap of paragraphs. I had previously tried lots of ways of customizing sgml-mode without actually replacing it, but never found a way.

Now, in emacs 24.5.1, I've found a easier way that seems to work. The annoying mis-indentation comes from the function sgml-comment-indent-new-line, which sets variables comment-start, comment-start-skip and comment-end and then calls comment-indent-new-line.

All I had to do was redefine sgml-comment-indent-new-line to call comment-indent-new-line without first defining the comment characters:

(defun sgml-comment-indent-new-line (&optional soft)
  (comment-indent-new-line soft))

Finding emacs source

I wondered if it might be better to call whatever underlying indent-new-line function comment-indent-new-line calls, or maybe just to call (newline-and-indent). But how to find the code of comment-indent-new-line?

Happily, describe-function (on C-h f, or if like me you use C-h for backspace, try F-1 h) tells you exactly what file defines a function, and it even gives you a link to click on to view the source. Wonderful!

It turned out just calling (newline-and-indent) wasn't enough, because sgml-comment-indent-new-line typically calls comment-indent-new-line when you've typed a space on the end of a line, and that space gets wrapped and then messes up indentation. But you can fix that by copying just a couple of lines from the source of comment-indent-new-line:

(defun sgml-comment-indent-new-line (&optional soft)
  (save-excursion (forward-char -1) (delete-horizontal-space))
  (delete-horizontal-space)
  (newline-and-indent))

That's a little longer than the other definition, but it's cleaner since comment-indent-new-line is doing all sorts of extra work you don't need if you're not handling comments. I'm not sure that both of the delete-horizontal-space lines are needed: the documentation for delete-horizontal-space says it deletes both forward and backward. But I have to assume they had a good reason for having both: maybe the (forward-char -1) is to guard against spurious spaces already having been inserted in the next line. I'm keeping it, to be safe.

Tags: ,
[ 11:16 Jun 09, 2017    More linux/editors | permalink to this entry | ]

Mon, 05 Jun 2017

HTML Email from Mutt

I know, I know. We use mailers like mutt because we don't believe in HTML mail and prefer plaintext. Me, too.

But every now and then a situation comes up where it would be useful to send something with emphasis. Or maybe you need to highlight changes in something. For whatever reason, every now and then I wish I had a way to send HTML mail.

I struggled with that way back, never did find a way, and ended up writing a Python script, htmlmail.py to send an HTML page, including images, as email.

Sending HTML Email

But just recently I found a neat mutt hack. It turns out it's quite easy to send HTML mail.

First, edit the HTML source in your usual mutt message editor (or compose the HTML some other way, and insert the file). Note: if there's any quoted text, you'll have to put a <pre> around it, or otherwise turn it into something that will display nicely in HTML.

Write the file and exit the editor. Then, in the Compose menu, type Ctrl-T to edit the attachment type. Change the type from text/plain to text/html.

That's it! Send it, and it will arrive looking like a regular HTML email, just as if you'd used one of them newfangled gooey mail clients. (No inline images, though.)

Viewing HTML Email

Finding out how easy that was made me wonder why the other direction isn't easier. Of course, I have my mailcap set up so that mutt uses lynx automatically to view HTML email:

text/html; lynx -dump %s; nametemplate=%s.html; copiousoutput

Lynx handles things like paragraph breaks and does in okay job of showing links; but it completely drops all emphasis, like bold, italic, headers, and colors. My terminal can display all those styles just fine. I've also tried links, elinks, and w3m, but none of them seem to be able to handle any text styling. Some of them will do bold if you run them interactively, but none of them do italic or colors, and none of them will do bold with -dump, even if you tell them what terminal type you want to use. Why is that so hard?

I never did find a solution, but it's worth noting some useful sites I found along the way. Like tips for testing bold, italics etc. in a terminal:, and for testing whether the terminal supports italics, which gave me these useful shell functions:

echo -e "\e[1mbold\e[0m"
echo -e "\e[3mitalic\e[0m"
echo -e "\e[4munderline\e[0m"
echo -e "\e[9mstrikethrough\e[0m"
echo -e "\e[31mHello World\e[0m"
echo -e "\x1B[31mHello World\e[0m"

ansi()          { echo -e "\e[${1}m${*:2}\e[0m"; }
bold()          { ansi 1 "$@"; }
italic()        { ansi 3 "$@"; }
underline()     { ansi 4 "$@"; }
strikethrough() { ansi 9 "$@"; }
red()           { ansi 31 "$@"; }

And in testing, I found that a lot of fonts didn't offer italics. One that does is Terminus, so if your normal font doesn't, you can run a terminal with Terminus: xterm -fn '-*-terminus-bold-*-*-*-20-*-*-*-*-*-*-*'

Not that it matters since none of the text-mode browsers offer italic anyway. But maybe you'll find some other use for italic in a terminal.

Tags: , ,
[ 18:28 Jun 05, 2017    More linux | permalink to this entry | ]