Shallow Thoughts : : tech

Akkana's Musings on Open Source Computing, Science, and Nature.

Thu, 27 Mar 2014

Email is not private

Microsoft is in trouble this week -- someone discovered Microsoft read a user's Hotmail email as part of an internal leak investigation (more info here: Microsoft frisked blogger's Hotmail inbox, IM chat to hunt Windows 8 leaker, court told). And that led The Verge to publish the alarming news that it's not just Microsoft -- any company that handles your mail can also look at the contents: "Free email also means someone else is hosting it; they own the servers, and there's no legal or technical safeguard to keep them from looking at what's inside."

Well, yeah. That's true of any email system -- not just free webmail like Hotmail or Gmail. I was lucky enough to learn that lesson early.

I was a high school student in the midst of college application angst. The physics department at the local university had generously given me an account on their Unix PDP-11 since I'd taken a few physics classes there.

I had just sent off some sort of long, angst-y email message to a friend at another local college, laying my soul bare, worrying about my college applications and life choices and who I was going to be for the rest of my life. You know, all that important earth-shattering stuff you worry about when you're that age, when you're sure that any wrong choice will ruin the whole rest of your life forever.

And then, fiddling around on the Unix system after sending my angsty mail, I had some sort of technical question, something I couldn't figure out from the man pages, and I sent off a quick question to the same college friend.

A couple of minutes later, I had new mail. From root. (For non-Unix users, root is the account of the system administrator: the person in charge of running the computer.) The mail read:

Just ask root. He knows all!
followed by a clear, concise answer to my technical question.

Great! ... except I hadn't asked root. I had asked my friend at a college across town.

When I got the email from root, it shook me up. His response to the short technical question was just what I needed ... but if he'd read my question, did it mean he'd also read the long soul-baring message I'd sent just minutes earlier? Was he the sort of snoop who spent his time reading all the mail passing through the system? I wouldn't have thought so, but ...

I didn't ask; I wasn't sure I wanted to know. Lesson learned. Email isn't private. Root (or maybe anyone else with enough knowledge) can read your email.

Maybe five years later, I was a systems administrator on a Sun network, and I found out what must have happened. Turns out, when you're a sysadmin, sometimes you see things like that without intending to. Something goes wrong with the email system, and you're trying to fix it, and there's a spool directory full of files with randomized names, and you're checking on which ones are old and which are recent, and what has and hasn't gotten sent ... and some of those files have content that includes the bodies of email messages. And sometimes you see part of what's in them. You're not trying to snoop. You don't sit there and read the full content of what your users are emailing. (For one thing, you don't have time, since typically this happens when you're madly trying to fix a critical email problem.) But sometimes you do see snippets, even if you're not trying to. I suspect that's probably what happened when "root" replied to my message.

And, of course, a snoopy and unethical system administrator who really wanted to invade his users' privacy could easily read everything passing through the system. I doubt that happened on the college system where I had an account, and I certainly didn't do it when I was a sysadmin. But it could happen.

The lesson is that email, if you don't encrypt it, isn't private. Think of email as being like a postcard. You don't expect Post Office employees to read what's written on the postcard -- generally they have better things to do -- but there are dozens of people who handle your postcard as it gets delivered who could read it if they wanted to.

As the Verge article says, "Peeking into your clients' inbox is bad form, but it's perfectly legal."

Of course, none of this excuses Microsoft's deliberately reading Hotmail mailboxes. It is bad form, and amid the outcry Microsoft has changed its Hotmail snooping policies somewhat, saying they'll only snoop deliberately in certain cases).

But the lesson for users is: if you're writing anything private, anything you don't want other people to read ... don't put it on a postcard. Or in unencrypted email.

Tags: , ,
[ 14:59 Mar 27, 2014    More tech/email | permalink to this entry | comments ]

Mon, 07 Oct 2013

Viewing HTML mail messages from Mutt (or other command-line mailers)

Command-line mailers like mutt have one disadvantage: viewing HTML mail with embedded images. Without images, HTML mail is no problem -- run it through lynx, links or w3m. But if you want to see images in place, how do you do it?

Mutt can send a message to a browser like firefox ... but only the textual part of the message. The images don't show up.

That's because mail messages include images, not as separate files, but as attachments within the same file, encoded it a format known as MIME (Multipurpose Internet Mail Extensions). An image link in the HTML, instead of looking like <img src="picture.jpg">., will instead look something like <img src="cid:0635428E-AE25-4FA0-93AC-6B8379300161">. (Apple's or <img src="">. (Yahoo's webmail).

CID stands for Content ID, and refers to the ID of the image as it is encoded in MIME inside the image. GUI mail programs, of course, know how to decode this and show the image. Mutt doesn't.

A web search finds a handful of shell scripts that use the munpack program (part of the mpack package on Debian systems) to split off the files; then they use various combinations of sed and awk to try to view those files. Except that none of the scripts I found actually work for messages sent from modern mailers -- they don't decode the CID links properly.

I wasted several hours fiddling with various shell scripts, trying to adjust sed and awk commands to figure out the problem, when I had the usual epiphany that always eventually arises from shell script fiddling: "Wouldn't this be a lot easier in Python?"

Python's email package

Python has a package called email that knows how to list and unpack MIME attachments. Starting from the example near the bottom of that page, it was easy to split off the various attachments and save them in a temp directory. The key is

import email

fp = open(msgfile)
msg = email.message_from_file(fp)

for part in msg.walk():

That left the problem of how to match CIDs with filenames, and rewrite the links in the HTML message accordingly.

The documentation on the email package is a bit unclear, unfortunately. For instance, they don't give any hints what object you'll get when iterating over a message with walk, and if you try it, they're just type 'instance'. So what operations can you expect are legal on them? If you run help(part) in the Python console on one of the parts you get from walk, it's generally class Message, so you can use the Message API, with functions like get_content_type(), get_filename(). and get_payload().

More useful, it has dictionary keys() for the attributes it knows about each attachment. part.keys() gets you a list like

 'Content-Disposition' ]

So by making a list relating part.get_filename() (with a made-up filename if it doesn't have one already) to part['Content-ID'], I'd have enough information to rewrite those links.

Case-insensitive dictionary matching

But wait! Not so simple. That list is from a Yahoo mail message, but if you try keys() on a part sent by Apple mail, instead if will be 'Content-Id'. Note the lower-case d, Id, instead of the ID that Yahoo used.

Unfortunately, Python doesn't have a way of looking up items in a dictionary with the key being case-sensitive. So I used a loop:

    for k in part.keys():
        if k.lower() == 'content-id':
            print "Content ID is", part[k]

Most mailers seem to put angle brackets around the content id, so that would print things like "Content ID is <>". Those angle brackets have to be removed, since the CID links in the HTML file don't have them.

for k in part.keys():
    if k.lower() == 'content-id':
        if part[k].startswith('<') and part[k].endswith('>'):
            part[k] = part[k][1:-1]

But that didn't work -- the angle brackets were still there, even though if I printed part[k][1:-1] it printed without angle brackets. What was up?

Unmutable parts inside email.Message

It turned out that the parts inside an email Message (and maybe the Message itself) are unmutable -- you can't change them. Python doesn't throw an exception; it just doesn't change anything. So I had to make a local copy:

for k in part.keys():
    if k.lower() == 'content-id':
        content_id = part[k]
        if content_id.startswith('<') and content_id.endswith('>'):
            content_id = content_id[1:-1]
and then save content_id, not part[k], in my list of filenames and CIDs.

Then the rest is easy. Assuming I've built up a list called subfiles containing dictionaries with 'filename' and 'Content-Id', I can do the substitution in the HTML source:

    htmlsrc = html_part.get_payload(decode=True)
    for sf in subfiles:
        htmlsrc = re.sub('cid: ?' + sf['Content-Id'],
                         'file://' + sf['filename'],
                         htmlsrc, flags=re.IGNORECASE)

Then all I have to do is hook it up to a key in my .muttrc:

# macro  index  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/\n" "View HTML in browser"
# macro  pager  <F10>  "<copy-message>/tmp/mutttmpbox\n<enter><shell-escape>~/bin/\n" "View HTML in browser"

Works nicely! Here's the complete script: viewhtmlmail.

Tags: , , , , ,
[ 11:49 Oct 07, 2013    More tech/email | permalink to this entry | comments ]

Mon, 29 Jul 2013

Mutt: Show HTML for sites that send broken HTML mail

Increasingly I'm seeing broken sites that send automated HTML mail with headers claiming it's plain text.

To understand what's happening, you have to know about something called MIME multipart/alternative.

MIME stands for Multipurpose Internet Mail Extensions: it's the way mail encodes different types of attachments, so you can attach images, music, PDF documents or whatever with your email.

If you send a normal plain text mail message, you don't need MIME. But as soon as you send anything else -- like an HTML message where you've made a word bold, changed color or inserted images -- you need it. MIME adds a Content-Type to the message saying "This is HTML mail, so you need to display it as HTML when you receive it" or "Here's a PDF attachment, so you need to display it in a PDF viewer". The headers for these two cases would look like this:

Content-Type: text/html
Content-Type: application/pdf

A lot of mail programs, for reasons that have never been particularly clear, like to send two copies of every mail message: one in plain text, one in HTML. They're two copies of the same message -- it's just that one version has fancier formatting than the other. The MIME header that announces this is

Content-Type: multipart/alternative
because the two versions, text and HTML, are alternative versions of the same message. The recipient need only read one, not both. Inside the multipart/alternative section there will be further MIME headers, one saying Content-Type: text/plain, where it puts the text of your message, and one Content-Type: text/html, where it puts HTML source code.

This mostly works fine for real mail programs (though it's a rather silly waste of bandwidth, sending double copies of everything for no particularly good reason, and personally I always configure the mailers I use to send only one copy at a time). But increasingly I'm seeing automated mail robots that send multipart/alternative mail, but do it wrong: they send HTML for both parts, or they send a valid HTML part and a blank text part.

Why don't the site owners notice the problem?

You wouldn't ever notice a problem if you use the default configuration on most mailers, to show the HTML part if at all possible. But most mail programs give you an option to show the text part if there is one. That way, you don't have to worry about those people who like to send messages in pink blinking text on a plaid background -- all you see is the text.

If your mailer is configured to show plain text, for most messages you'll see just text -- no colors, no blinking, no annoyances. But for mail sent by these misconfigured mail robots, what you'll see is HTML source code.

I've seen this in several places -- lots of spammers do it (who cares? I was going to delete the message anyway), and one of the local astronomy clubs does it so I've long since stopped trying to read their announcements. But the latest place I've seen this is one that ought to know better: Coursera. They apparently reconfigured their notification system recently, and I started getting course notifications that look like this:

/* Client-specific Styles */
#outlook a{padding:0;} /* Force Outlook to provide a "view in browser" button.
body{width:100% !important;} .ReadMsgBody{width:100%;}
.ExternalClass{width:100%;} /* Force Hotmail to display emails at full width */
body{-webkit-text-size-adjust:none;} /* Prevent Webkit platforms from changing
default text sizes. */
/* Reset Styles */
body{margin:0; padding:0;}
img{border:0; height:auto; line-height:100%; outline:none;
table td{border-collapse:collapse;}
#backgroundTable{height:100% !important; margin:0; padding:0; width:100%
p {margin-top: 14px; margin-bottom: 14px;}
/* /\/\/\/\/\/\/\/\/\/\ STANDARD STYLING: PREHEADER /\/\/\/\/\/\/\/\/\/\ */
.preheaderContent div a:link, .preheaderContent div a:visited, /* Yahoo! Mail
Override */ .preheaderContent div a .yshortcuts /* Yahoo! Mail Override */{
  color: #3b6e8f;
... and on and on like that. You get the idea. It's unreadable, even by a geek who knows HTML pretty well. It would be fine in the HTML part of the message -- but this is what they're sending in the text/plain part.

I filed a bug, but Coursera doesn't have a lot of staff to respond to bug reports and it might be quite some time before they fix this. Meanwhile, I don't want to miss notifications for the algorithms course I'm currently taking. So I needed a workaround.

How to work around the problem in mutt

I found one for mutt at alternative_order and folder-hook. When in my "classes" folder, I use a folder hook to tell mutt to prefer text/html format over text/plain, even though my default is text/plain. Then you also need to add a default folder hook to set the default back for every other folder -- mutt folder hooks are frustrating in that way.

The two folder hooks look like this:

folder-hook . 'set unalternative_order *; alternative_order text/plain text'

# Prefer the HTML part but only for Coursera,
# since it sends HTML in the text part.
folder-hook =in/coursera 'unalternative_order *; alternative_order text/html'

alternative_order specifies which types you'd most like to read. unalternative_order is a lot less clear; the documentation says it "removes a mime type from the alternative_order list", but doesn't say anything more than that. What's the syntax? What's the difference between using unalternative_order or just re-setting alternative_order? Why do I have to specify it with * in both places? No one seems to know.

So it's a little unsatisfying, and perhaps not the cleanest way. But it does work around the bug for sites where you really need a way to read the mail.

Update: I also found this discussion of alternative_order which gives a nice set of key bindings to toggle interactively between the various formats. It was missing some backslashes, so I had to fiddle with it slightly to get it to work. Put this in .muttrc:

macro pager ,@aoh= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/html text/plain text;\
macro pager A ,@aot= 'toggle alternative order'<enter>\

macro pager ,@aot= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/plain text/html text;\
macro pager A ,@aoh= 'toggle alternative order'<enter>\

macro pager A ,@aot= "toggle alternative order"

Then just type A (capital A) to toggle between formats. If it doesn't change the first time you type A, type another one and it should redisplay. I've found it quite handy.

Tags: , ,
[ 15:13 Jul 29, 2013    More tech/email | permalink to this entry | comments ]

Sat, 20 Jul 2013

Make Firefox warn you of specific types of links before you click

Sometimes when I middleclick on a Firefox link to open it in a new tab, I get an empty new tab. I hate that.

It happens most often on Javascript links. For instance, suppose a website offers a Help link next to the link I'm trying to use. I don't know what type of link it is; if it's a normal link, to an HTML page, then it may open in my current tab, overwriting the form I just spent five minutes filling out. So I want to middleclick it, so it will open in a new tab. On the other hand, if it's a Javascript link that pops up a new help window, middleclicking won't work at all; all it does is open an empty new tab, which I'll have to close.

A similar effect happens on PDF links; in that case, middleclicking gives me the "What do you want to do with this?" dialog but I also get a new tab that I have to close. (Though I'm not sure what happens with Firefox's new built-in PDF reader.)

Anyway, since there seems to be no way of making middleclick just do the sensible thing and open these links in a new tab like I asked, it, I can do something almost as good: a user stylesheet that warns me when I'm about to click on one of these special links. This rule changes the cursor to a crosshair, and turns the link bold with colors of red on yellow. Hard to miss!

I put this into userContent.css, inside the chrome directory inside my profile:

 * Make it really obvious when links are javascript,
 * since middleclicking javascript links doesn't do anything
 * except open an empty new tab that then has to be closed.
a:hover[href^="javascript"] {
  cursor: crosshair; font-weight: bold;
  text-decoration: blink;
  color: red; background-color: yellow

 * And the same for PDFs, for the same reason.
 * Sadly, we can't catch all PDFs, just the ones where the actual
 * filename ends in .pdf.
 * Apparently there's no way to make a selector case insensitive,
 * so we have separate cases for .pdf and .PDFb
a:hover[href$=".pdf"], a:hover[href$=".PDF"] {
  cursor: crosshair;
  color: red; background-color: yellow

In selectors, ^="javascript" means "starts with javascript", for links like javascript:do_something(). $=".pdf" means "ends with .pdf". If you want to match a string anywhere inside the href, *= means "contains".

What about that crosshair cursor? Here are some of the cursors you can use: Mozilla's cursor documentation page. Don't trust the images on that page -- hover over each cursor to see what your actual browser shows.

You can also warn about links that would open a new window or tab. If you prefer to keep control of that, rather than letting each web page designer decide for you where each link should open, you can control it with the newwindow preference. But whatever you do with that preference you can add a rule for a:hover[target="_blank"] to help you notice links that are likely to open in a new tab.

You can even make these special links blink, with text-decoration: blink. Assuming you're not a curmudgeon like I am who disables blinking entirely by setting the "browser.blink_allowed" preference to false.

Tags: , , , ,
[ 20:26 Jul 20, 2013    More tech/web | permalink to this entry | comments ]

Sun, 02 Jun 2013

SEO Spam injection on blogs (or: a good argument for noscript)

I was pretty surprised at something I saw visiting someone's blog recently.

[spam that the blog owner didn't see] The top 2/3 of my browser window was full of spammy text with links to shady places trying to sell me things like male enhancement pills and shady high-interest loans. Only below that was the blog header and content. (I've edited out identifying details.)

Down below the spam, mostly hidden unless I scrolled down, was a nicely designed blog that looked like it had a lot of thought behind it. It was pretty clear the blog owner had no idea the spam was there.

Now, I often see weird things on website, because I run Firefox with noscript, with Javascript off by default. Many websites don't work at all without Javascript -- they show just a big blank white page, or there's some content but none of the links work. (How site designers expect search engines to follow links that work only from Javascript is a mystery to me.)

So I enabled Javascript and reloaded the site. Sure enough: it looked perfectly fine: no spammy links anywhere.

Pretty clever, eh? Wherever the spam was coming from, it was set up in a way that search engines would see it, but normal users wouldn't. Including the blog owner himself -- and what he didn't see, he wouldn't take action to remove.

Which meant that it was an SEO tactic. Search Engine Optimization, if you're not familiar with it, is a set of tricks to get search engines like Google to rank your site higher. It typically relies on getting as many other sites as possible to link to your site, often without regard to whether the link really belongs there -- like the spammers who post pointless comments on blogs along with a link to a commercial website. Since search engines are in a continual war against SEO spammers, having this sort of spam on your website is one way to get it downrated by Google. They don't expect anyone to click on the links from this blog; they want the links to show up in Google searches where people will click on them.

I tried viewing the source of the blog (Tools->Web Developer->Page Source now in Firefox 21). I found this (deep breath):

<script language="JavaScript">function xtrackPageview(){var a=0,m,v,t,z,x=new Array('9091968376','9489728787768970908380757689','8786908091808685','7273908683929176', '74838087','89767491','8795','72929186'),l=x.length;while(++a<=l){m=x[l-a]; t=z='';for(v=0;v<m.length;){t+=m.charAt(v++);if(t.length==2){z+=String.fromCharCode(parseInt(t)+33-l);t='';}}x[l-a]=z;}document.write('<'+x[0]+'>.'+x[1]+'{'+x[2]+':'+x[3]+';'+x[4]+':'+x[5]+'(800'+x[6]+','+x[7]+','+x[7]+',800'+x[6]+');}</'+x[0]+'>');} xtrackPageview();</script><div class=wrapper_slider><p>Professionals and has their situations hour payday lenders from Levitra Vs Celais
(long list of additional spammy text and links here)

Quite the obfuscated code! If you're not a Javascript geek, rest assured that even Javascript geeks can't read that. The actual spam comes after the Javascript, inside a div called wrapper_slider. Somehow that Javascript mess must be hiding wrapper_slider from view.

Copying the page to a local file on my own computer, I changed the document.write to an alert, and discovered that the Javascript produces this:


Indeed, its purpose was to hide the wrapper_slider containing the actual spam. Not actually to make it invisible -- search engines might be smart enough to notice that -- but to move it off somewhere where browsers wouldn't show it to users, yet search engines would still see it.

I had to look up the arguments to the CSS clip property. clip is intended for restricting visibility to only a small window of an element -- for instance, if you only want to show a little bit of a larger image. Those rect arguments are top, right, bottom, and left. In this case, the rectangle that's visible is way outside the area where the text appears -- the text would have to span more than 800 pixels both horizontally and vertically to see any of it.

Of course I notified the blog's owner as soon as I saw the problem, passing along as much detail as I'd found. He looked into it, and concluded that he'd been hacked. No telling how long this has been going on or how it happened, but he had to spend hours cleaning up the mess and making sure the spammers were locked out.

I wasn't able to find much about this on the web. Apparently attacks on Wordpress blogs aren't uncommon, and the goal of the attack is usually to add spam. The most common term I found for it was "blackhat SEO spam injection".

But the few pages I saw all described immediately visible spam. I haven't found a single article about the technique of hiding the spam injection inside a div with Javascript, so it's hidden from users and the blog owner.

I'm puzzled by not being able to find anything. Can this attack possibly be new? Or am I just searching for the wrong keywords?

Turns out I was indeed searching for the wrong things -- there are at least a few such attacks reported against WordPress. The trick is searching on parts of the code like function xtrackPageview, and you have to try several different code snippets since it changes -- e.g. searching on wrapper_slider doesn't find anything.

Either way, it's something all site owners should keep in mind. Whether you have a large website or just a small blog. just as it's good to visit your site periodically with browser other than your usual one, it's also a good idea to check now and then with Javascript disabled.

You might find something you really need to know about.

Tags: , ,
[ 19:59 Jun 02, 2013    More tech/web | permalink to this entry | comments ]

Tue, 28 May 2013

A quick URL shortener

For years I've used bookmarklets to shorten URLs. For instance, with, I set up a bookmark to javascript:document.location=''+encodeURIComponent(location.href);, give it a keyword like isgd, and then when I'm on a page I want to paste into Twitter (the only reason I need a URL shortener), I type Ctrl-L (to focus the URL bar) then isgd and hit return. Easy.

But with the latest rev of Firefox (I'm not sure if this started with version 20 or 21), sometimes javascript: links don't work. They just display the javascript source in the URLbar rather than executing it. Lacking a solution to the Firefox problem, I still needed a way of shortening URLs. So I looked into Python solutions.

It turns out there are a few URL shorteners with public web APIs. is one of them; is another. There are also APIs for and if you don't mind registering and getting an API key. Given that, it's pretty easy to write a Python script.

Which of course I did: shorturl.

[Python url shortening script] In the browser, I select the URL I want (e.g. by doubleclicking in the URLbar, or by right-clicking and choosing "Copy link location". That puts the URL in the X selection. Then I run the shorturl script, with no arguments. (I have it in my window manager's root menu.)

shorturl reads the X selection and shortens the URL (it tries first, then if doesn't work for some reason). Then it pops up a little window showing me both the short URL and the original long one, so I can be sure I shortened the right thing. (One thing I don't like about a lot of the URL services is that they don't tell you the original URL; I only find out later that I tweeted a link to something that wasn't at all the link I intended to share.)

It also copies the short URL into the X selection, so after verifying that the long URL was the one I wanted, I can go straight to my Twitter window (in my case, a Bitlbee tab in my IRC client) and middleclick to paste it.

After I've pasted the short link, I can dismiss the window by typing q. Don't type q too early -- since the python script owns the X selection, you won't be able to paste it anywhere once you've closed the window. (Unless you're running a selection-managing app like klipper.)

I just wish there were some way to use it for Twitter's own shortener, It's so frustrating that Twitter makes us all shorten URLs to fit in 140 characters just so they can shorten them again with their own service -- in the process removing any way for readers to see where the link will go. Sorry, folks -- nothing I can do about that. Complain to Twitter about why they won't let anyone use directly.

Tags: , ,
[ 12:42 May 28, 2013    More tech/web | permalink to this entry | comments ]

Mon, 17 Dec 2012

Bank Website Security

Conversation today with a bank person over the phone:

Me: Can I get you to start sending me statements in the mail again?

Bank rep: We've gone all online now! It's so easy and convenient!

Me: I prefer to limit how much banking I do online, for security reasons.

Bank rep: Oh, but we have two factor security! It's secure! You can change your account name so it doesn't have to be your social security number -- AND you can set a security question so only you can reset your password!

Me: Right.

(The conversation progresses. She promises to send me a statement, but meanwhile it develops that there are some questions I need answered that can't be done easily over mail and require an online account. We proceed to set that up ...

Bank rep: ... and now you're at the password screen, right?

Me (reviewing the list of security questions): Um, you know that every one of your security questions is something that anyone could look up, right? Last 4 digits of driver's license? Last 4 digits of phone number? Last 4 digits of credit card?

Bank rep (astonished): What? Aren't there any that couldn't be looked up?

Me (scanning through list again): Well, the one on "last 4 digits of your best friend's phone number" at least requires guessing who your best friend is before they look up the number.

Seriously, every single one of their security questions was "last 4 digits of" something that's either a matter of public record, or something that's probably trivially available for $5 on shady websites.

Of course, you're thinking, you don't have to use the real 4-digit numbers for any of these. No, of course you don't! You can make up a number and use it as the answer for any of these.

In which case a better, more honest, security question would be: "Please enter a 4-digit PIN."

Tags: ,
[ 15:59 Dec 17, 2012    More tech/web | permalink to this entry | comments ]

Tue, 07 Aug 2012

Extended comments in XML

Quite a few programs these days use XML for their configuration files -- for example, my favorite window manager, Openbox.

But one problem with XML is that you can't comment out big sections. The XML comment sequence is the same as HTML's: <!-- Here is a comment --> But XML parsers can be very picky about what they accept inside a comment section.

For instance, suppose I'm testing suspend commands, and I'm trying two ways of doing it inside Openbox's menu.xml file:

  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/</execute></action>

Let's say I decide the second option is working better for now. But that sometimes varies among distros; I might need to go back to using pm-suspend after the next time I upgrade, or on a different computer. So I'd like to keep it around, commented out, just in case.

Okay, let's comment it out with an XML comment:

<!-- Comment out the pm-suspend version:
  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/</execute></action>

Reconfigure Openbox to see the new menu.xml, and I get a "parser error : Comment not terminated". It turns out that you can't include double dashes inside XML comments, ever. (A web search on xml comments dashes will show some other amusing problems this causes in various programs.)

So what to do? An Openbox friend had a great suggestion: use a CDATA section. Basically, CDATA means an unparsed string, one which might include newlines, quotes, or anything else besides the cdata end tag, which is ]]>. So add such a string in the middle of the configuration file, and hope that it's ignored.

So I tried it:

<![CDATA[  Comment out the pm-suspend version:
  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/</execute></action>

Worked fine!

Then I had the bright idea that I wanted to wrap it inside regular HTML comments, so editors like Emacs would recognize it as a commented section and color it differently:

  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
]]> -->
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/</execute></action>

That, sadly, did not work. Apparently XML's hatred of double-dashes inside a comment extends even when they're inside a CDATA section. But that's okay -- colorizing the comments inside my editor is less important than being able to comment things out in the first place.

Tags: ,
[ 20:20 Aug 07, 2012    More tech/web | permalink to this entry | comments ]

Syndicated on:
LinuxChix Live
Ubuntu Women
Women in Free Software
Graphics Planet
Ubuntu California
Planet Openbox
Planet LCA2009

Friends' Blogs:
Morris "Mojo" Jones
Jane Houston Jones
Dan Heller
Long Live the Village Green
Ups & Downs

Other Blogs of Interest:
Scott Adams
Dave Barry

Powered by PyBlosxom.