Viewing HTML mail messages from Mutt (or other command-line mailers)
Update: the script described in this article has been folded into another script called viewmailattachments.py.
Command-line mailers like mutt have one disadvantage: viewing HTML mail with embedded images. Without images, HTML mail is no problem -- run it through lynx, links or w3m. But if you want to see images in place, how do you do it?
Mutt can send a message to a browser like firefox ... but only the textual part of the message. The images don't show up.
That's because mail messages include images not as separate files, but as attachments within the same file, encoded in a format known as MIME (Multipurpose Internet Mail Extensions).
An image link in the HTML, instead of looking like <img src="picture.jpg">, will look something like <img src="cid:0635428E-AE25-4FA0-93AC-6B8379300161"> (Apple's Mail.app) or <img src="cid:1.3631871432@web82503.mail.mud.yahoo.com"> (Yahoo's webmail).
CID stands for Content ID, and refers to the ID of the image as it is encoded in MIME inside the message. GUI mail programs, of course, know how to decode this and show the image. Mutt doesn't.
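To see that structure from the sending side, here's a minimal sketch using the standard library's email.mime classes. The Content-ID (logo.1@example.com) and the image bytes (just the 8-byte PNG signature) are made-up placeholders. Per RFC 2392, a cid: URL is the Content-ID header value with its angle brackets removed, which is why the header and the link don't look quite the same.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Made-up Content-ID for illustration; real mailers generate unique ones.
CID = "logo.1@example.com"

# multipart/related bundles the HTML with the resources it refers to.
msg = MIMEMultipart("related")

# The HTML links to the image as "cid:" + Content-ID, without angle brackets.
msg.attach(MIMEText('<p>Hello <img src="cid:%s"></p>' % CID, "html"))

# Placeholder image data: only the PNG signature, not a real image.
img = MIMEImage(b"\x89PNG\r\n\x1a\n", _subtype="png")
# In the header itself, the Content-ID is wrapped in angle brackets.
img.add_header("Content-ID", "<%s>" % CID)
img.add_header("Content-Disposition", "inline", filename="logo.png")
msg.attach(img)

# Each part's type and Content-ID (None for parts without one).
for part in msg.walk():
    print(part.get_content_type(), part.get("Content-ID"))

# The whole message is one text file; the image is base64 inside it.
print(msg.as_string())
```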
A web search finds a handful of shell scripts that use the munpack program (part of the mpack package on Debian systems) to split off the files; then they use various combinations of sed and awk to try to view those files. Except that none of the scripts I found actually work for messages sent from modern mailers -- they don't decode the CID links properly.
I wasted several hours fiddling with various shell scripts, trying to adjust sed and awk commands to figure out the problem, when I had the usual epiphany that always eventually arises from shell script fiddling: "Wouldn't this be a lot easier in Python?"
Python's email package
Python has a package called email that knows how to list and unpack MIME attachments. Starting from the example near the bottom of its documentation page, it was easy to split off the various attachments and save them in a temp directory. The key is:
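import email

# msgfile is the path to the saved message file
with open(msgfile) as fp:
    msg = email.message_from_file(fp)

# walk() yields the message itself, then every sub-part, depth-first
for part in msg.walk():
    print(part.get_content_type())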
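Here's one way the split-and-save step might look; it's a sketch, not the author's code. It parses a small made-up multipart/related message from a string (in practice msgfile would be read with email.message_from_file as above), skips the multipart containers, and writes each leaf part's decoded payload into a fresh temporary directory. The function name save_parts and the part-N fallback names are hypothetical.

```python
import email
import os
import tempfile

# A tiny made-up multipart/related message; the base64 body is the PNG signature.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="XYZ"

--XYZ
Content-Type: text/html; charset="utf-8"

<p>Hello <img src="cid:logo.1@example.com"></p>
--XYZ
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-ID: <logo.1@example.com>
Content-Disposition: inline; filename="logo.png"

iVBORw0KGgo=
--XYZ--
"""


def save_parts(msg, tmpdir):
    """Write each non-container part's decoded payload into tmpdir.

    Returns the list of paths written, in walk() order.
    """
    paths = []
    for i, part in enumerate(msg.walk()):
        # multipart/* parts are just containers; walk() visits their children.
        if part.is_multipart():
            continue
        # Use the attachment's own name if it has one, else a made-up one.
        # basename() guards against names like "../../.bashrc".
        name = os.path.basename(part.get_filename() or "") or "part-%d" % i
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as fp:
            # decode=True undoes base64/quoted-printable and returns bytes.
            fp.write(part.get_payload(decode=True))
        paths.append(path)
    return paths


msg = email.message_from_string(RAW)
tmpdir = tempfile.mkdtemp()
paths = save_parts(msg, tmpdir)
for path in paths:
    print(path)
```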
That left the problem of how to match CIDs with filenames, and rewrite the links in the HTML message accordingly.
The documentation on the email package is a bit unclear, unfortunately. For instance, it doesn't give any hint about what object you'll get when iterating over a message with walk(), and if you check in Python 2, the parts are just type 'instance'. So what operations are legal on them? If you run help(part) in the Python console on one of the parts you get from walk(), it's generally class Message, so you can use the Message API, with functions like get_content_type(), get_filename() and get_payload().
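A quick way to see that API in action is a sketch like the following, which walks a made-up two-part message and calls a few Message methods on each part. On Python 3, isinstance() confirms that every part walk() yields is an email.message.Message.

```python
import email
from email.message import Message

# Made-up sample: an HTML body plus one base64 "image" (PNG signature only).
RAW = """\
Content-Type: multipart/related; boundary="XYZ"

--XYZ
Content-Type: text/html; charset="utf-8"

<p>Hello <img src="cid:logo.1@example.com"></p>
--XYZ
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-ID: <logo.1@example.com>
Content-Disposition: inline; filename="logo.png"

iVBORw0KGgo=
--XYZ--
"""

msg = email.message_from_string(RAW)
for part in msg.walk():
    # Every object walk() yields is an email.message.Message.
    assert isinstance(part, Message)
    print("type:", part.get_content_type(), "| filename:", part.get_filename())
    if part.is_multipart():
        # For a container, get_payload() returns the list of sub-parts.
        print("  sub-parts:", len(part.get_payload()))
    else:
        # For a leaf, get_payload(decode=True) returns the decoded bytes.
        print("  payload bytes:", len(part.get_payload(decode=True)))
```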
More usefully, a Message has a dictionary-style keys() method that lists the header fields it knows about for each attachment. part.keys() gets you a list like
['Content-Type', 'Content-Transfer-Encoding', 'Content-ID', 'Content-Disposition']
So by making a list relating part.get_filename() (with a made-up filename if it doesn't have one already) to part['Content-ID'], I'd have enough information to rewrite those links.
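Building that list might look like the sketch below. The function name cid_list and the attachment-N fallback names are made up for illustration; the sample message uses Yahoo's Content-ID spelling, and the Content-ID is stored exactly as the header gives it, angle brackets included.

```python
import email
import mimetypes

# Made-up sample in Yahoo's style: "Content-ID", and an image with no filename.
RAW = """\
Content-Type: multipart/related; boundary="XYZ"

--XYZ
Content-Type: text/html

<img src="cid:1.3631871432@web82503.mail.mud.yahoo.com">
--XYZ
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-ID: <1.3631871432@web82503.mail.mud.yahoo.com>

/9j/
--XYZ--
"""


def cid_list(msg):
    """Return [{'filename': ..., 'Content-ID': ...}] for parts with a Content-ID."""
    subfiles = []
    for i, part in enumerate(msg.walk()):
        # Exact-case test against keys(): fine for Yahoo's "Content-ID",
        # but it would miss Apple Mail's "Content-Id".
        if part.is_multipart() or "Content-ID" not in part.keys():
            continue
        filename = part.get_filename()
        if not filename:
            # Made-up name; guess_extension() can return None for odd types.
            ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
            filename = "attachment-%d%s" % (i, ext)
        # Stored as-is, angle brackets included.
        subfiles.append({"filename": filename, "Content-ID": part["Content-ID"]})
    return subfiles


msg = email.message_from_string(RAW)
for part in msg.walk():
    print(part.keys())   # header names, in the sender's spelling
print(cid_list(msg))
```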
Case-insensitive dictionary matching
But wait! It's not so simple. That list is from a Yahoo mail message, but if you try keys() on a part sent by Apple Mail, the key will instead be 'Content-Id'. Note the lower-case d: Id, instead of the ID that Yahoo used.
Unfortunately, Python dictionaries don't offer a way to look up a key case-insensitively, so I used a loop:
for k in part.keys():
    if k.lower() == 'content-id':
        print("Content ID is", part[k])
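As it happens, a Message isn't a plain dictionary here: the email.message documentation says header field names are matched case-insensitively, so part['Content-ID'], part.get('content-id') and the in operator all find Apple's 'Content-Id'. Only the list that keys() returns keeps the sender's spelling, which is what trips up an exact string comparison. A small sketch to check that, using a bare Message with one made-up header:

```python
from email.message import Message

part = Message()
# Header name spelled the way Apple Mail does it.
part["Content-Id"] = "<0635428E-AE25-4FA0-93AC-6B8379300161>"

# Item access, get() and "in" on a Message ignore the case of the name...
print(part["Content-ID"])
print(part.get("content-id"))
print("CONTENT-ID" in part)          # True

# ...but keys() is an ordinary list holding the sender's spelling,
# so an exact string test against it fails.
print(part.keys())
print("Content-ID" in part.keys())   # False
```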
Most mailers seem to put angle brackets around the content id, so that would print things like "Content ID is <14.3631871432@web82503.mail.mud.yahoo.com>". Those angle brackets have to be removed, since the CID links in the HTML file don't have them.
for k in part.keys():
    if k.lower() == 'content-id':
        if part[k].startswith('<') and part[k].endswith('>'):
            part[k] = part[k][1:-1]
But that didn't work -- the angle brackets were still there, even though if I printed part[k][1:-1] it printed without angle brackets. What was up?
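Assigning headers inside email.Message
It turned out that assigning to a header doesn't replace it. The email.message documentation says part[k] = value appends a new header with that name and leaves the existing one in place, so reading part[k] afterward still returned the original, bracketed value. Python doesn't throw an exception, so it looks as though nothing changed. The simplest fix was to make a local copy: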
for k in part.keys():
    if k.lower() == 'content-id':
        content_id = part[k]
        if content_id.startswith('<') and content_id.endswith('>'):
            content_id = content_id[1:-1]

and then save content_id, not part[k], in my list of filenames and CIDs.
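The header-assignment behavior is easy to demonstrate. This sketch (strip_brackets is a hypothetical helper name) shows that assigning a Content-ID header leaves two of them on the part, and that Message.replace_header() is the call that actually changes a header in place. Working on a local copy, as above, sidesteps the issue and also leaves the original message untouched.

```python
from email.message import Message

CID = "<1.3631871432@web82503.mail.mud.yahoo.com>"


def strip_brackets(content_id):
    """Return content_id without one pair of surrounding angle brackets."""
    content_id = content_id.strip()
    if content_id.startswith("<") and content_id.endswith(">"):
        return content_id[1:-1]
    return content_id


# Assigning to a header name that already exists ADDS another header.
appended = Message()
appended["Content-ID"] = CID
appended["Content-ID"] = strip_brackets(appended["Content-ID"])
print(appended.get_all("Content-ID"))   # both values are present
print(appended["Content-ID"])           # CPython returns the first: the original

# replace_header() rewrites the first matching header instead.
replaced = Message()
replaced["Content-ID"] = CID
replaced.replace_header("Content-ID", strip_brackets(replaced["Content-ID"]))
print(replaced.get_all("Content-ID"))   # just the stripped value
```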
Then the rest is easy. Assuming I've built up a list called subfiles containing dictionaries with 'filename' and 'Content-Id', I can do the substitution in the HTML source:
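import re

# In Python 3, get_payload(decode=True) returns bytes; decode to text first
charset = html_part.get_content_charset() or 'utf-8'
htmlsrc = html_part.get_payload(decode=True).decode(charset, 'replace')
for sf in subfiles:
    # re.escape() makes dots and other punctuation in the ID match literally
    htmlsrc = re.sub('cid: ?' + re.escape(sf['Content-Id']),
                     'file://' + sf['filename'], htmlsrc, flags=re.IGNORECASE)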
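Wrapped up as a function, the substitution might look like this sketch (rewrite_cids and the sample paths are made up). It assumes htmlsrc is already decoded text and that subfiles uses the 'filename' and 'Content-Id' keys described above, with angle brackets already stripped. Besides re.escape(), it adds a negative lookahead so one Content-ID can't match as a prefix of a longer one, and passes the replacement as a function so backslashes in a path aren't treated as regex escapes.

```python
import re

# The ID must be followed by something that can't be part of it
# (whitespace, quote, angle bracket, paren) or by the end of the text.
END_OF_ID = r"""(?![^\s"'<>()])"""


def rewrite_cids(htmlsrc, subfiles):
    """Replace cid: links in htmlsrc with file:// URLs for the saved parts."""
    for sf in subfiles:
        # re.escape() makes characters like '.' match literally;
        # " ?" tolerates an optional space after "cid:".
        pattern = "cid: ?" + re.escape(sf["Content-Id"]) + END_OF_ID
        url = "file://" + sf["filename"]
        # A callable replacement is inserted verbatim (no backslash processing).
        htmlsrc = re.sub(pattern, lambda m, url=url: url, htmlsrc,
                         flags=re.IGNORECASE)
    return htmlsrc


subfiles = [
    {"filename": "/tmp/parts/photo.jpg",
     "Content-Id": "1.3631871432@web82503.mail.mud.yahoo.com"},
    {"filename": "/tmp/parts/logo.png", "Content-Id": "logo.1@example.com"},
]
html = ('<img src="cid:1.3631871432@web82503.mail.mud.yahoo.com">'
        '<img src="CID: logo.1@example.com">')
print(rewrite_cids(html, subfiles))
```

Because of the escaping, a link like cid:logoX1@example.com is left alone rather than matching logo.1@example.com through a wildcard dot.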
Then all I have to do is hook it up to a key in my .muttrc:
macro index <F10> "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"
macro pager <F10> "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/viewhtmlmail.py\n" "View HTML in browser"
Works nicely! Here's the complete script: viewhtmlmail.
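For readers who want to see how the pieces fit together, here's a separate, hypothetical end-to-end sketch in Python 3; it is not viewhtmlmail.py. It assumes the muttrc macro has copied the message into the mbox file /tmp/mutttmpbox (it takes the last message, since copy-message can append to an existing file), saves parts that have a Content-ID into a temp directory, rewrites the cid: links, and opens the result with the standard webbrowser module. It works on the raw HTML bytes so the message's own charset declaration stays valid. All function names are made up.

```python
import email
import mailbox
import os
import re
import tempfile
import webbrowser

MBOX = "/tmp/mutttmpbox"  # where the muttrc macro copies the message


def last_message(path):
    """Return the last message in the mbox at path, or None if there isn't one."""
    if not os.path.exists(path):
        return None
    box = mailbox.mbox(path, create=False)
    try:
        keys = box.keys()
        return email.message_from_bytes(box.get_bytes(keys[-1])) if keys else None
    finally:
        box.close()


def html_with_local_images(msg, tmpdir):
    """Save Content-ID parts into tmpdir; return the HTML part as bytes with
    cid: links rewritten to file:// URLs, or None if there is no HTML part."""
    html_part, links = None, []
    for i, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue
        if html_part is None and part.get_content_type() == "text/html":
            html_part = part
            continue
        cid = part.get("Content-ID")  # Message lookups ignore header-name case
        if not cid:
            continue
        name = os.path.basename(part.get_filename() or "") or "part-%d" % i
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as fp:
            fp.write(part.get_payload(decode=True))
        links.append((str(cid).strip().strip("<>"), path))
    if html_part is None:
        return None
    html = html_part.get_payload(decode=True)
    for cid, path in links:
        pattern = b"cid: ?" + re.escape(cid.encode("ascii", "replace"))
        url = b"file://" + os.fsencode(path)
        html = re.sub(pattern, lambda m, url=url: url, html, flags=re.IGNORECASE)
    return html


def main():
    msg = last_message(MBOX)
    if msg is None:
        print("No message found in", MBOX)
        return
    tmpdir = tempfile.mkdtemp()
    html = html_with_local_images(msg, tmpdir)
    if html is None:
        print("No HTML part in this message")
        return
    outfile = os.path.join(tmpdir, "message.html")
    with open(outfile, "wb") as fp:
        fp.write(html)
    webbrowser.open("file://" + outfile)


if __name__ == "__main__":
    main()
```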
[ 11:49 Oct 07, 2013 More tech/email ]