Shallow Thoughts : : linux

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 24 Jun 2017

Mutt: Fixing Erroneous Charsets, part 632

Someone forwarded me a message from the Albuquerque Journal. It was all about "New Mexico\222s schools".

Sigh. I thought I'd gotten all my Mutt charset problems fixed long ago. My system locale is set to en_US.UTF-8, and accented characters in Spanish and in people's names usually show up correctly. But I do see this every now and then.

When I see it, I usually assume it's a case of incorrect encoding: whoever sent it perhaps pasted characters from a Windows Word document or something, and their mailer didn't properly re-encode them into the charset they were using to send the message.

In this case, the message had User-Agent: SquirrelMail/1.4.13. I suspect it came from a "Share this" link on the newspaper's website.

I used vim to look at the source of the message, and it had

Content-Type: text/plain; charset=iso-8859-1

For the bad characters, in vim I saw things like

New Mexico<92>s schools

I checked an old web page I'd bookmarked years ago that had a table of the iso-8859-1 characters, and sure enough, hex 0x92 was an apostrophe. What was wrong?

I got some help on the #mutt IRC channel, and, to make a long story short, that web table I was using was wrong. ISO-8859-1 doesn't include any printable characters in the range 0x80-0x9F, as you can see on Wikipedia's ISO/IEC 8859-1 page.

What was happening was that the message was really cp1252: that's where those extra characters come from, like hex 92/octal 222 for an apostrophe, or hex 96/octal 226 for a dash. (Nitpick: that's an en dash, but it was used in a context that called for an em dash. If someone is going to use something other than the plain old ASCII dash, you'd think they'd at least use the right one. Sheesh!)

Anyway, the fix for this is to tell mutt that when it sees iso-8859-1, it should use cp1252 instead:

charset-hook iso-8859-1 cp1252

Voilà! Now I could read the article about New Mexico's schools.
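To see the difference between the two charsets outside of mutt, here is a minimal Python sketch (not part of the original fix). It decodes the same bytes both ways; the byte string is just the example phrase from the message, and the variable names are made up for illustration.

```python
# The raw bytes as they appeared in the message body: 0x92 is octal 222.
raw = b"New Mexico\x92s schools"

# Python's iso-8859-1 codec maps 0x92 to U+0092, an invisible C1 control code.
as_latin1 = raw.decode("iso-8859-1")

# cp1252 maps the same byte to U+2019, the right single quotation mark.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))   # the control character shows up as an escape
print(as_cp1252)         # the apostrophe displays properly
```

This is roughly what the charset-hook asks mutt to do: treat bytes labeled iso-8859-1 as if they were cp1252.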

A happy find related to this: it turns out there's a better way of looking up ISO-8859 tables, and I can ditch that bookmark to the old, erroneous page. I've known about man ascii forever, but somehow I'd never thought to try other charsets. Turns out man iso_8859-1 and man iso_8859-15 have built-in tables too. Nice!

(Sadly, man utf-8 doesn't give a table. Of course, that would be a long man page, if it did!)

[ 11:06 Jun 24, 2017    More linux | permalink to this entry | comments ]

Fri, 09 Jun 2017

Emacs: Typing dashes in html mode (update for Emacs 24)

Back in 2006, I wrote an article on making a modified copy of sgml-mode.el to make it possible to use double-dashed clauses -- like this -- in HTML without messing up auto-fill mode.

That worked, but the problem is that if you use your own copy of sgml-mode.el, you miss out on any other improvements to HTML and SGML mode. There have been some good ones, like smarter rewrap of paragraphs. I had previously tried lots of ways of customizing sgml-mode without actually replacing it, but never found a way.

Now, in emacs 24.5.1, I've found an easier way that seems to work. The annoying mis-indentation comes from the function sgml-comment-indent-new-line, which sets variables comment-start, comment-start-skip and comment-end and then calls comment-indent-new-line.

All I had to do was redefine sgml-comment-indent-new-line to call comment-indent-new-line without first defining the comment characters:

(defun sgml-comment-indent-new-line (&optional soft)
  (comment-indent-new-line soft))

Finding emacs source

I wondered if it might be better to call whatever underlying indent-new-line function comment-indent-new-line calls, or maybe just to call (newline-and-indent). But how to find the code of comment-indent-new-line?

Happily, describe-function (on C-h f, or if like me you use C-h for backspace, try F1 f) tells you exactly what file defines a function, and it even gives you a link to click on to view the source. Wonderful!

It turned out just calling (newline-and-indent) wasn't enough, because sgml-comment-indent-new-line typically calls comment-indent-new-line when you've typed a space on the end of a line, and that space gets wrapped and then messes up indentation. But you can fix that by copying just a couple of lines from the source of comment-indent-new-line:

(defun sgml-comment-indent-new-line (&optional soft)
  (save-excursion (forward-char -1) (delete-horizontal-space))
  (delete-horizontal-space)
  (newline-and-indent))

That's a little longer than the other definition, but it's cleaner since comment-indent-new-line is doing all sorts of extra work you don't need if you're not handling comments. I'm not sure that both of the delete-horizontal-space lines are needed: the documentation for delete-horizontal-space says it deletes both forward and backward. But I have to assume they had a good reason for having both: maybe the (forward-char -1) is to guard against spurious spaces already having been inserted in the next line. I'm keeping it, to be safe.

[ 11:16 Jun 09, 2017    More linux/editors | permalink to this entry | comments ]

Mon, 05 Jun 2017

HTML Email from Mutt

I know, I know. We use mailers like mutt because we don't believe in HTML mail and prefer plaintext. Me, too.

But every now and then a situation comes up where it would be useful to send something with emphasis. Or maybe you need to highlight changes in something. For whatever reason, every now and then I wish I had a way to send HTML mail.

I struggled with that way back, never did find a way, and ended up writing a Python script, htmlmail.py to send an HTML page, including images, as email.

Sending HTML Email

But just recently I found a neat mutt hack. It turns out it's quite easy to send HTML mail.

First, edit the HTML source in your usual mutt message editor (or compose the HTML some other way, and insert the file). Note: if there's any quoted text, you'll have to put a <pre> around it, or otherwise turn it into something that will display nicely in HTML.

Write the file and exit the editor. Then, in the Compose menu, type Ctrl-T to edit the attachment type. Change the type from text/plain to text/html.

That's it! Send it, and it will arrive looking like a regular HTML email, just as if you'd used one of them newfangled gooey mail clients. (No inline images, though.)
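What the Ctrl-T step changes is just the Content-Type label on the message part. As a rough illustration outside of mutt, here is a minimal Python sketch using the standard library's email package. It is not the htmlmail.py script mentioned above, and the addresses, subject, and function name are placeholders invented for the example.

```python
from email.message import EmailMessage


def build_html_message(sender, recipient, subject, html_body):
    """Build a single-part message whose body is labeled text/html."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # subtype="html" produces Content-Type: text/html instead of text/plain.
    msg.set_content(html_body, subtype="html")
    return msg


# Placeholder addresses, for illustration only.
msg = build_html_message("me@example.com", "you@example.com",
                         "Test", "<p>Some <b>bold</b> text</p>")
print(msg.get_content_type())
```

Like the mutt approach, this sketch sends only the HTML part, with no plaintext alternative and no inline images.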

Viewing HTML Email

Finding out how easy that was made me wonder why the other direction isn't easier. Of course, I have my mailcap set up so that mutt uses lynx automatically to view HTML email:

text/html; lynx -dump %s; nametemplate=%s.html; copiousoutput

Lynx handles things like paragraph breaks and does an okay job of showing links; but it completely drops all emphasis, like bold, italic, headers, and colors. My terminal can display all those styles just fine. I've also tried links, elinks, and w3m, but none of them seem to be able to handle any text styling. Some of them will do bold if you run them interactively, but none of them do italic or colors, and none of them will do bold with -dump, even if you tell them what terminal type you want to use. Why is that so hard?

I never did find a solution, but it's worth noting some useful sites I found along the way, like tips for testing bold, italics etc. in a terminal, and for testing whether the terminal supports italics, which gave me these useful shell commands and functions:

echo -e "\e[1mbold\e[0m"
echo -e "\e[3mitalic\e[0m"
echo -e "\e[4munderline\e[0m"
echo -e "\e[9mstrikethrough\e[0m"
echo -e "\e[31mHello World\e[0m"
echo -e "\x1B[31mHello World\e[0m"

ansi()          { echo -e "\e[${1}m${*:2}\e[0m"; }
bold()          { ansi 1 "$@"; }
italic()        { ansi 3 "$@"; }
underline()     { ansi 4 "$@"; }
strikethrough() { ansi 9 "$@"; }
red()           { ansi 31 "$@"; }

And in testing, I found that a lot of fonts didn't offer italics. One that does is Terminus, so if your normal font doesn't, you can run a terminal with Terminus: xterm -fn '-*-terminus-bold-*-*-*-20-*-*-*-*-*-*-*'

Not that it matters since none of the text-mode browsers offer italic anyway. But maybe you'll find some other use for italic in a terminal.
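The same escape sequences work from any language that can write to the terminal. Here is a small Python sketch mirroring the shell functions above; the function names are simply copied from the shell versions, and whether italic or strikethrough actually render still depends on the terminal and font.

```python
def ansi(code, text):
    """Wrap text in an SGR escape sequence, then reset all attributes."""
    return f"\x1b[{code}m{text}\x1b[0m"


def bold(text):
    return ansi(1, text)


def italic(text):
    return ansi(3, text)


def underline(text):
    return ansi(4, text)


def strikethrough(text):
    return ansi(9, text)


def red(text):
    return ansi(31, text)


print(bold("bold"), italic("italic"), underline("underline"),
      strikethrough("strikethrough"), red("Hello World"))
```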

[ 18:28 Jun 05, 2017    More linux | permalink to this entry | comments ]

Tue, 25 Apr 2017

Typing Greek letters

I'm taking a MOOC that includes equations involving Greek letters like epsilon. I'm taking notes online, in Emacs, using the iimage mode tricks for taking MOOC class notes in emacs that I worked out a few years back.

Iimage mode works fine for taking screenshots of the blackboard in the videos, but sometimes I'd prefer to just put the equations inline in my file. At first I was typing out things like E = epsilon * sigma * T^4 but that's silly, and of course the professor isn't spelling out the Greek letters like that when he writes the equations on the blackboard. There's got to be a way to type Greek letters on this US keyboard.

I know how to type things like accented characters using the "Multi key" or "Compose key". In /etc/default/keyboard I have XKBOPTIONS="ctrl:nocaps,compose:menu,terminate:ctrl_alt_bksp" which, among other things, sets the compose key to be my "Menu" key, which I never used otherwise. And there's a file, /usr/share/X11/locale/en_US.UTF-8/Compose, that includes all the built-in compose key sequences. I have a shell function in my .zshrc,

composekey() {
  grep -i -- "$1" /usr/share/X11/locale/en_US.UTF-8/Compose
}

so I can type something like composekey epsilon and find out how to type specific codes. But that didn't work so well for Greek letters. It turns out this is how you type them:

<dead_greek> <A>            : "Α"   U0391    # GREEK CAPITAL LETTER ALPHA
<dead_greek> <a>            : "α"   U03B1    # GREEK SMALL LETTER ALPHA
<dead_greek> <B>            : "Β"   U0392    # GREEK CAPITAL LETTER BETA
<dead_greek> <b>            : "β"   U03B2    # GREEK SMALL LETTER BETA
<dead_greek> <D>            : "Δ"   U0394    # GREEK CAPITAL LETTER DELTA
<dead_greek> <d>            : "δ"   U03B4    # GREEK SMALL LETTER DELTA
<dead_greek> <E>            : "Ε"   U0395    # GREEK CAPITAL LETTER EPSILON
<dead_greek> <e>            : "ε"   U03B5    # GREEK SMALL LETTER EPSILON

... and so forth. And this <dead_greek> key isn't actually defined in most US/English keyboard layouts. You can check whether it's defined for you with: xmodmap -pke | grep dead_greek

Of course you can use xmodmap to define a key to be <dead_greek>. I stared at my keyboard for a bit, and decided that, considering how seldom I actually need to type Greek characters, I didn't see the point of losing a key for that purpose (though if you want to, here's a thread on how to map <dead_greek> with xmodmap).

I decided it would make much more sense to map it to the compose key with a prefix, like 'g', that I don't need otherwise. I can do that in ~/.XCompose like this:

<Multi_key> <g> <A>            : "Α"   U0391    # GREEK CAPITAL LETTER ALPHA
<Multi_key> <g> <a>            : "α"   U03B1    # GREEK SMALL LETTER ALPHA
<Multi_key> <g> <B>            : "Β"   U0392    # GREEK CAPITAL LETTER BETA
<Multi_key> <g> <b>            : "β"   U03B2    # GREEK SMALL LETTER BETA
<Multi_key> <g> <D>            : "Δ"   U0394    # GREEK CAPITAL LETTER DELTA
<Multi_key> <g> <d>            : "δ"   U03B4    # GREEK SMALL LETTER DELTA
<Multi_key> <g> <E>            : "Ε"   U0395    # GREEK CAPITAL LETTER EPSILON
<Multi_key> <g> <e>            : "ε"   U03B5    # GREEK SMALL LETTER EPSILON
... and so forth.

And now I can type [MENU] g e and a lovely ε appears, at least in any app that supports Greek fonts, which is most of them nowadays.
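Typing all those ~/.XCompose lines by hand gets tedious, so here is a hedged Python sketch that generates them. The mapping covers only the four letter pairs shown above; extending it to the rest of the alphabet is left as an assumption to check against the <dead_greek> entries in the system Compose file. The function and variable names are made up for this example.

```python
import unicodedata

# Latin key -> Greek lowercase letter. Only the pairs shown above are listed;
# check the system Compose file before adding more.
LATIN_TO_GREEK = {"a": "α", "b": "β", "d": "δ", "e": "ε"}


def compose_line(key, char, prefix="g"):
    """Format one Compose entry: Multi_key, the prefix key, then the letter key."""
    name = unicodedata.name(char)  # e.g. GREEK SMALL LETTER ALPHA
    return (f'<Multi_key> <{prefix}> <{key}>            : "{char}"   '
            f'U{ord(char):04X}    # {name}')


def compose_lines(mapping=LATIN_TO_GREEK, prefix="g"):
    """Return capital and small entries for every pair in the mapping."""
    lines = []
    for key, lower in mapping.items():
        lines.append(compose_line(key.upper(), lower.upper(), prefix))
        lines.append(compose_line(key, lower, prefix))
    return lines


print("\n".join(compose_lines()))
```

The output can be pasted into ~/.XCompose. If that file doesn't already contain an include "%L" line, the system's default compose sequences may stop working, so that's worth checking first.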

[ 12:57 Apr 25, 2017    More linux | permalink to this entry | comments ]

Fri, 31 Mar 2017

Show mounted filesystems

Used to be that you could see your mounted filesystems by typing mount or df. But with modern Linux kernels, all sorts of things are implemented as virtual filesystems -- proc, /run, /sys/kernel/security, /dev/shm, /run/lock, /sys/fs/cgroup -- I have no idea what most of these things are except that they make it much more difficult to answer questions like "Where did that ebook reader mount, and did I already unmount it so it's safe to unplug it?" Neither mount nor df has a simple option to get rid of all the extraneous virtual filesystems and only show real filesystems.

http://unix.stackexchange.com/questions/177014/showing-only-interesting-mount-points-filtering-non-interesting-types had some suggestions that got me started:

mount -t ext3,ext4,cifs,nfs,nfs4,zfs
mount | grep -E --color=never  '^(/|[[:alnum:]\.-]*:/)'

Another answer there says it's better to use findmnt --df, but that still shows all the tmpfs entries (findmnt --df | grep -v tmpfs might do the job).

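And mounts of real block devices usually list a device path starting with /, so you can do mount | grep '^/', though that misses network mounts such as NFS or sshfs, whose sources don't start with a slash.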

But it also turns out that mount will accept a blacklist of types as well as a whitelist: -t notype1,notype2... I prefer the idea of excluding a blacklist of filesystem types versus restricting it to a whitelist; that way if I mount something unusual like curlftpfs that I forgot to add to the whitelist, or I mount a USB stick with a filesystem type I don't use very often (ntfs?), I'll see it.

On my system, this was the list of types I had to disable (sheesh!):

mount -t nosysfs,nodevtmpfs,nocgroup,nomqueue,notmpfs,noproc,nopstore,nohugetlbfs,nodebugfs,nodevpts,noautofs,nosecurityfs,nofusectl

df is easier: like findmnt, it excludes most of those filesystem types to begin with, so there are only a few you need to exclude:

df -hTx tmpfs -x devtmpfs -x rootfs

Obviously I don't want to have to type either of those commands every time I want to check my mount list. So I put this in my .zshrc. If you call mount or df with no args, it applies the filters; otherwise it passes your arguments through. Of course, you could make a similar function for findmnt.

# Mount and df are no longer useful to show mounted filesystems,
# since they show so much irrelevant crap now.
# Here are ways to clean them up:
mount() {
    if [[ $# -ne 0 ]]; then
        /bin/mount "$@"
        return
    fi

    # Else called with no arguments: we want to list mounted filesystems.
    /bin/mount -t nosysfs,nodevtmpfs,nocgroup,nomqueue,notmpfs,noproc,nopstore,nohugetlbfs,nodebugfs,nodevpts,noautofs,nosecurityfs,nofusectl
}

df() {
    if [[ $# -ne 0 ]]; then
        /bin/df "$@"
        return
    fi

    # Else called with no arguments: we want to list mounted filesystems.
    /bin/df -hTx tmpfs -x devtmpfs -x rootfs
}

Update: Chris X Edwards suggests lsblk or lsblk -o 'NAME,MOUNTPOINT'. It wouldn't have solved my problem because it only shows /dev devices, not virtual filesystems like sshfs, but it's still a command worth knowing about.
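The blacklist idea isn't tied to the mount command. As a minimal sketch, assuming the usual Linux /proc/mounts layout (device, mount point, type, options, one mount per line), the same filtering could look like this in Python. The sample text and the function name are invented for illustration; the type list is the one from the mount function above.

```python
# Virtual filesystem types to hide: the same list passed to mount -t above.
VIRTUAL_TYPES = {
    "sysfs", "devtmpfs", "cgroup", "mqueue", "tmpfs", "proc", "pstore",
    "hugetlbfs", "debugfs", "devpts", "autofs", "securityfs", "fusectl",
}

# Made-up sample in /proc/mounts format, so the sketch runs anywhere.
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime 0 0
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /media/ebook vfat rw,nosuid,nodev 0 0
"""


def real_mounts(mounts_text, excluded=VIRTUAL_TYPES):
    """Return (device, mountpoint, fstype) for lines whose type isn't excluded."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        if fstype not in excluded:
            result.append((device, mountpoint, fstype))
    return result


for device, mountpoint, fstype in real_mounts(SAMPLE):
    print(f"{device} on {mountpoint} type {fstype}")
```

On a real system, passing the contents of /proc/mounts instead of SAMPLE should give a list comparable to the filtered mount output. As with the shell version, an unfamiliar filesystem type still shows up, because nothing is whitelisted.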

[ 12:25 Mar 31, 2017    More linux | permalink to this entry | comments ]

Sat, 18 Feb 2017

Highlight and remove extraneous whitespace in emacs

I recently got annoyed with all the trailing whitespace I saw in files edited by Windows and Mac users, and in code snippets pasted from sites like StackOverflow. I already had my emacs set up to indent with only spaces:

(setq-default indent-tabs-mode nil)
(setq tabify nil)

and I knew about M-x delete-trailing-whitespace ... but after seeing someone else who had an editor set up to show trailing spaces, and tabs that ought to be spaces, I wanted that too.

To show trailing spaces is easy, but it took me some digging to find a way to control the color emacs used:

;; Highlight trailing whitespace.
(setq-default show-trailing-whitespace t)
(set-face-background 'trailing-whitespace "yellow")

I also wanted to show tabs, since code indented with a mixture of tabs and spaces, especially if it's Python, can cause problems. That was a little harder, but I eventually found it on the EmacsWiki: Show whitespace:

;; Also show tabs.
(defface extra-whitespace-face
  '((t (:background "pale green")))
  "Color for tabs and such.")

(defvar bad-whitespace
  '(("\t" . 'extra-whitespace-face)))

While I was figuring this out, I got some useful advice related to emacs faces on the #emacs IRC channel: if you want to know why something is displayed in a particular color, put the cursor on it and type C-u C-x = (the command what-cursor-position with a prefix argument), which displays lots of information about whatever's under the cursor, including its current face.

Once I had my colors set up, I found that a surprising number of files I'd edited with vim had trailing whitespace. I would have expected vim to be better behaved than that! But it turns out that to eliminate trailing whitespace, you have to program it yourself. For instance, here are some recipes to Remove unwanted spaces automatically with vim.
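For checking files outside of any editor, here is a small Python sketch that reports the same two problems the emacs settings above highlight: trailing whitespace and tab characters. The function name and the sample text are made up for the example.

```python
import re

# One or more spaces or tabs immediately before the end of a line.
TRAILING = re.compile(r"[ \t]+$")


def find_bad_whitespace(text):
    """Return (line_number, problem) pairs for trailing whitespace and tabs."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if TRAILING.search(line):
            problems.append((number, "trailing whitespace"))
        if "\t" in line:
            problems.append((number, "tab"))
    return problems


sample = "ok\nbad  \n\tindented\n"
for number, problem in find_bad_whitespace(sample):
    print(f"line {number}: {problem}")
```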

[ 16:41 Feb 18, 2017    More linux/editors | permalink to this entry | comments ]

Mon, 13 Feb 2017

Emacs: Initializing code files with a template

Part of being a programmer is having an urge to automate repetitive tasks.

Every new HTML file I create should include some boilerplate HTML, like <html><head></head><body></body></html>. Every new Python file I create should start with #!/usr/bin/env python, and most of them should end with an if __name__ == "__main__": clause. I get tired of typing all that, especially the dunderscores and slash-greater-thans.

Long ago, I wrote an emacs function called newhtml to insert the boilerplate code:

(defun newhtml ()
  "Insert a template for an empty HTML page"
  (interactive)
  (insert "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
          "<html>\n"
          "<head>\n"
          "<title></title>\n"
          "</head>\n\n"
          "<body>\n\n"
          "<h1></h1>\n\n"
          "<p>\n\n"
          "</body>\n"
          "</html>\n")
  (forward-line -11)
  (forward-char 7)
  )

The motion commands at the end move the cursor back to the point between the <title> and </title>, so I'm ready to type the page title. (I should probably have it prompt me, so it can insert the same string in title and h1, which is almost always what I want.)

That has worked for quite a while. But when I decided it was time to write the same function for python:

(defun newpython ()
  "Insert a template for an empty Python script"
  (interactive)
  (insert "#!/usr/bin/env python\n"
          "\n"
          "\n"
          "\n"
          "if __name__ == '__main__':\n"
          "\n"
          )
  (forward-line -4)
  )

... I realized that I wanted to be even more lazy than that. Emacs knows what sort of file it's editing -- it switches to html-mode or python-mode as appropriate. Why not have it insert the template automatically?

My first thought was to have emacs run the function upon loading a file. There's a function with-eval-after-load which supposedly can act based on file suffix, so I tried something like (with-eval-after-load ".py" (newpython)). But I found that it was never called, and couldn't find an example that actually worked. (As far as I can tell, that's because with-eval-after-load fires after Emacs loads a Lisp library or feature with that name, not when you visit a file with that suffix.)

But then I realized that I have mode hooks for all the programming modes anyway, to set up things like indentation preferences. Inserting some text at the end of the mode hook seems perfectly simple:

(add-hook 'python-mode-hook
          (lambda ()
            (electric-indent-local-mode -1)
            (font-lock-add-keywords nil bad-whitespace)
            (if (= (buffer-size) 0)
                (newpython))
            (message "python hook")
            ))

The (= (buffer-size) 0) test ensures this only happens if I open a new file. Obviously I don't want to be auto-inserting code inside existing programs!

HTML mode was a little more complicated. I edit some files, like blog posts, that use HTML formatting, and hence need html-mode, but they aren't standalone HTML files that need the usual HTML template inserted. For blog posts, I use a different file extension, so I can use the elisp string-suffix-p to test for that:

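  ;; string-suffix-p is like Python's endswith.
  ;; (buffer-file-name) is nil in buffers that aren't visiting a file,
  ;; so check it before testing the suffix.
  (if (and (= (buffer-size) 0)
           (buffer-file-name)
           (string-suffix-p ".html" (buffer-file-name)))
      (newhtml) )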

I may eventually find other files that don't need the template; if I need to, it's easy to add other tests, like the directory where the new file will live.

A nice timesaver: open a new file and have a template automatically inserted.

[ 09:52 Feb 13, 2017    More linux/editors | permalink to this entry | comments ]

Fri, 27 Jan 2017

Making aliases for broken fonts

A web page I maintain (originally designed by someone else) specifies Times font. On all my Linux systems, Times displays impossibly tiny, at least two sizes smaller than any other font that's ostensibly the same size. So the page is hard to read. I'm forever tempted to get rid of that font specifier, but I have to assume that other people in the organization like the professional look of Times, and that this pathologic smallness of Times and Times New Roman is just a Linux font quirk.

In that case, a better solution is to alias it, so that pages that use Times will choose some larger, more readable font on my system. How to do that was in this excellent, clear post: How To Set Default Fonts and Font Aliases on Linux.

It turned out Times came from the gsfonts package, while Times New Roman came from msttcorefonts:

$ fc-match Times
n021003l.pfb: "Nimbus Roman No9 L" "Regular"
$ dpkg -S n021003l.pfb
gsfonts: /usr/share/fonts/type1/gsfonts/n021003l.pfb
$ fc-match "Times New Roman"
Times_New_Roman.ttf: "Times New Roman" "Normal"
$ dpkg -S Times_New_Roman.ttf
dpkg-query: no path found matching pattern *Times_New_Roman.ttf*
$ locate Times_New_Roman.ttf
/usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf

(dpkg -S doesn't find the file because msttcorefonts is a package that downloads a bunch of common fonts from Microsoft. Debian can't distribute the font files directly due to licensing restrictions.)

Removing gsfonts fonts isn't an option; aside from some documents and web pages possibly not working right (if they specify Times or Times New Roman and don't provide a fallback), removing gsfonts takes gnumeric and abiword with it, and I do occasionally use gnumeric. And I like having the msttcorefonts installed (hey, gotta have Comic Sans! :-) ). So aliasing the font is a better bet.

Following Chuan Ji's page, linked above, I edited ~/.config/fontconfig/fonts.conf (I already had one, specifying fonts for the fantasy and cursive web families), and added these stanzas:

    <match>
        <test name="family"><string>Times New Roman</string></test>
        <edit name="family" mode="assign" binding="strong">
            <string>DejaVu Serif</string>
        </edit>
    </match>
    <match>
        <test name="family"><string>Times</string></test>
        <edit name="family" mode="assign" binding="strong">
            <string>DejaVu Serif</string>
        </edit>
    </match>

The page says to log out and back in, but I found that restarting firefox was enough. Now I can load up a page that specifies Times or Times New Roman and the text is easily readable.
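The two stanzas above differ only in the family name, so if more fonts ever need aliasing, the stanzas could be generated instead of copied. Here is a hedged Python sketch using the standard library's xml.etree.ElementTree; the function name is made up, ET.indent needs Python 3.9 or later, and the output still has to be pasted inside the existing fontconfig element of fonts.conf by hand.

```python
import xml.etree.ElementTree as ET


def alias_stanza(family, replacement):
    """Build a <match> element that replaces one font family with another."""
    match = ET.Element("match")
    test = ET.SubElement(match, "test", name="family")
    ET.SubElement(test, "string").text = family
    edit = ET.SubElement(match, "edit", name="family",
                         mode="assign", binding="strong")
    ET.SubElement(edit, "string").text = replacement
    return match


# The same two aliases as the stanzas above.
for family in ("Times New Roman", "Times"):
    stanza = alias_stanza(family, "DejaVu Serif")
    ET.indent(stanza, space="    ")  # pretty-print; Python 3.9+
    print(ET.tostring(stanza, encoding="unicode"))
```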

[ 14:47 Jan 27, 2017    More linux | permalink to this entry | comments ]