Shallow Thoughts : tags : email

Akkana's Musings on Open Source Computing, Science, and Nature.

Sat, 08 Dec 2012

Decoding RFC 2047 email headers (like spam Subjects in other charsets)

Having not had much luck with spam filtering solutions like SpamAssassin, I'm forever having to add new spam filters by hand. For instance, after about the sixth time I get "President Waives Refi Requirement" or "Melt your fat! MUST WATCH this video now!" within a couple of hours, I'm pretty tired of it and don't want to see any more of them.

With mail filtering programs like procmail or maildrop, it's easy enough to match a pattern like "Subject:.*Refi Requirement" or "Subject:.*Melt your fat" and filter that message to a spam folder (or /dev/null).

But increasingly, I add patterns I'm seeing in spam messages, and yet the messages with those patterns keep coming in. Why? Because the spammers are using RFC 2047 to encode the subject into some other character set.

Here's how it works. A spammer sends a subject line that looks something like this:

Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?=

Mail programs are smart enough to decode this into:

Subject: Stop Overpaying for Printer Ink

but spam filtering programs often aren't, so your "printer ink" filter won't catch it. And if you look through your spam folder with tools like grep to see why it didn't get caught, or to find particularly spammy subjects that might call for a filter (grep Subject spamfolder | sort is pretty handy), these encoded subjects will be incognito.

I briefly tried setting up a filter that spam-filed anything with =? in the Subject line. But that's way too broad a brush -- there are legitimate reasons for using other charsets even in English-language email. It's relatively rare, but it happens. And some bots, notably the Adafruit forum notification bot and the bot that sends out announcements from my alma mater, unaccountably encode the charset even when they're sending mail entirely in US ASCII.

So what's really needed is not to filter out all messages that specify a charset, but to decode the Subject so the spam filter can see it and filter it accordingly.

How? I couldn't find any ready-made tool available for Linux that could decode RFC 2047 headers; but the Python email package makes decoding a one-line task. In the Python interpreter:

$ python
Python 2.7.3 (default, Aug  1 2012, 05:16:07) 
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> email.Header.decode_header("Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?=")
[('Subject:', None), ('Stop Overpaying for Printer Ink', 'utf-8')]
>>>

So it's easy to write a script that can pull headers out of email messages (files) and decode them. Just look for the line starting with the header you want to match -- e.g. "Subject:" -- and pass that line to email.Header.decode_header().
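
A minimal sketch of that decoding step, written for Python 3 (where the module is spelled email.header and decode_header() hands back bytes for the encoded pieces). The function name decode_header_line is made up for illustration, and this is not the decodemail.py script itself:

```python
from email.header import decode_header

def decode_header_line(line):
    """Decode one RFC 2047 header line into a readable text string."""
    pieces = []
    for text, charset in decode_header(line):
        if isinstance(text, bytes):
            try:
                # Pieces with no declared charset are treated as ASCII
                text = text.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Charset name Python doesn't recognize: fall back to ASCII
                text = text.decode("ascii", errors="replace")
        pieces.append(text)
    return "".join(pieces)

encoded = "Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?="
print(decode_header_line(encoded))
```

Run on the Subject line shown earlier, this should print the readable "Stop Overpaying for Printer Ink" subject, which is the form a pattern-matching filter needs to see.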

Only one snag. If the subject is longer than about 20 characters, spammers will often opt to split it up into multiple groups, sometimes even in different character sets. So for example, you might see something like this, spread over multiple lines:

Subject: =?windows-1252?Q?Earn_your_degree_=97_on_your_time?=
        =?windows-1252?Q?_and_terms?=

The script has to handle that too. If it's reading a header, it has to check the next line, and if that line begins with whitespace, treat it as more of the header.
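
One way the continuation-line handling could look, again as a Python 3 sketch under assumptions: unfolded_header and decode_words are hypothetical helper names, and the input is assumed to be a list of text lines read from a message file.

```python
from email.header import decode_header

def unfolded_header(lines, name="Subject"):
    """Return the named header as one line, joining folded continuations.

    Returns None if the header isn't found before the blank line that
    ends the header block."""
    prefix = name.lower() + ":"
    collected = None
    for line in lines:
        line = line.rstrip("\r\n")
        if collected is not None:
            # A line starting with whitespace continues the previous header
            if line[:1] in (" ", "\t"):
                collected.append(line.strip())
                continue
            break
        if line == "":
            break   # end of headers, stop before the body
        if line.lower().startswith(prefix):
            collected = [line]
    return " ".join(collected) if collected else None

def decode_words(header):
    """Decode every RFC 2047 encoded word, each in its own charset."""
    out = []
    for text, charset in decode_header(header):
        if isinstance(text, bytes):
            try:
                text = text.decode(charset or "ascii", errors="replace")
            except LookupError:   # charset name Python doesn't know
                text = text.decode("ascii", errors="replace")
        out.append(text)
    return "".join(out)

message = [
    "From: someone@example.com\n",
    "Subject: =?windows-1252?Q?Earn_your_degree_=97_on_your_time?=\n",
    "        =?windows-1252?Q?_and_terms?=\n",
    "To: me@example.com\n",
    "\n",
    "Body text\n",
]
subject = decode_words(unfolded_header(message))
print(subject)
```

Because each encoded word carries its own charset, decoding piece by piece also copes with subjects that switch character sets partway through.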

The resulting script, decodemail.py (on github), seems pretty handy and should be easy to plug into a mail filtering program.

[ 20:45 Dec 08, 2012    More programming | permalink to this entry | comments ]

Thu, 25 Aug 2011

Deleting email from a mail server with Python

How do you delete email from a mail server without downloading or reading it all?

Why? Maybe you got a huge load of spam and you need to delete it. Maybe you have your laptop set up to keep a copy of your mail on the server so you can get it on your desktop later ... but after a while you realize it's not worth downloading all that mail again. In my case, I use an ISP that keeps copies of all mail forwarded from one alias to another, so I periodically need to clean out the copies.

There are quite a few reasons you might want to delete mail without reading it ... so I was surprised to find that there didn't seem to be any easy way to do so.

But POP3 is a fairly simple protocol. How hard could it be to write a Python script to do what I needed?

Not hard at all, in fact. The poplib package does most of the work for you, encapsulating both the networking and the POP3 protocol. It even does SSL, so you don't have to send your password in the clear.

Once you've authenticated, you can list() messages, which gives you a status and a list of message numbers and sizes, separated by a space. Just loop through them and delete each one.

Here's a skeleton program to delete messages:

import poplib

server = "mail.example.com"
port = 995
user = "myname"
passwd = "seekrit"

pop = poplib.POP3_SSL(server, port)
pop.user(user)
pop.pass_(passwd)

poplist = pop.list()
# b'+OK' matches a Python 2 str as well as Python 3 bytes
if poplist[0].startswith(b'+OK') :
    msglist = poplist[1]
    # An empty list means the mailbox has no messages
    if not msglist :
        print("No messages for %s" % user)
    for msgspec in msglist :
        # msgspec is something like "3 3941",
        # msg number and size in octets
        msgnum = int(msgspec.split()[0])
        print("Deleting msg %d" % msgnum)
        pop.dele(msgnum)
else :
    print("Couldn't list messages: status %s" % poplist[0])
pop.quit()

Of course, you might want to add more error checking, loop through a list of users, etc. Here's the full script: deletemail.
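
As a rough idea of what that extra error checking and a loop over users might look like, here is a Python 3 sketch. It is not the deletemail script: delete_all and clean_mailboxes are hypothetical names, and accounts is assumed to be a list of (user, password) pairs. poplib reports protocol failures by raising poplib.error_proto, so that is what gets caught.

```python
import poplib

def delete_all(pop):
    """Mark every message on an authenticated POP3 connection for deletion.

    Returns the number of messages marked."""
    _response, msglist, _octets = pop.list()
    count = 0
    for msgspec in msglist:
        # Each entry is bytes like b"3 3941": message number, size in octets
        msgnum = int(msgspec.split()[0])
        pop.dele(msgnum)
        count += 1
    return count

def clean_mailboxes(server, accounts, port=995):
    """Delete all mail for each (user, password) pair in accounts."""
    for user, passwd in accounts:
        try:
            pop = poplib.POP3_SSL(server, port)
            pop.user(user)
            pop.pass_(passwd)
        except (poplib.error_proto, OSError) as err:
            print("Couldn't log in as %s: %s" % (user, err))
            continue
        try:
            print("%s: deleted %d messages" % (user, delete_all(pop)))
        except poplib.error_proto as err:
            print("%s: server error: %s" % (user, err))
        finally:
            # POP3 only removes the marked messages once the session quits
            pop.quit()
```

Keeping the deletion loop separate from the login code means it can be tried against a stand-in object before pointing it at a real mailbox.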

[ 16:41 Aug 25, 2011    More programming | permalink to this entry | comments ]

Tue, 09 Aug 2011

Changing your email address in Yahoo Groups

A while ago I switched ISPs, and maintaining a lot of email addresses got more complicated. So I decided to consolidate.

But changing your email address turns out to be tricky on some sites. For example, on Amazon it apparently requires a phone call to customer support (I haven't gotten around to it yet, but that's what their email support people told me to do).

Then there's Yahoo groups. I'm in quite a few groups, so when I made the switch, I went to groups.yahoo.com, added a valid address and made it my primary address. Great -- thought I was done.

Weeks later, it occurred to me that I hadn't been getting any mail from a bunch of groups I used to get mail from. I went to Yahoo groups and clicked around for five minutes trying to find something that would show me my email addresses. Eventually I gave up on that, went to one of the groups I hadn't been getting, and saw a notice at the top:

The email address you are using for this group is currently bouncing. More info here.

So naturally, I clicked on the More info here link, and got taken to a page that said:

Groups Error: No Permission

No Permission
You do not have permission to access this page.

Gosh, that's some helpful info, Yahoo!

So how do you really change it?

There are lots of ways to get to the Yahoo Groups "Manage your email addresses" page -- but it shows only the new address, listed as primary, and doesn't show the old address where it's actually trying to send all the mail. No way to delete it from there.

Now, you can Edit membership in any particular group: that shows both the old nonworking address (with the box checked) and the new one (check the box to change it). Great -- so I'm supposed to do that for all 25 or so groups I'm in? Seriously?

After much searching, I finally found an old discussion thread with a link to the Edit my groups page. Scroll down to the bottom and look for "Set all of the above to".

It's still not a one-step operation -- my groups are spread across three pages and there's no "View all on one page", and each time you submit a page, it takes you back to "View groups" mode so you have to click on the next page, then click on "Edit groups" again. Still, it's a heck of a lot faster than going through all the groups one by one.

In theory it's all changed now. But then, I thought that last time ... time will tell whether the mail actually starts flowing again.

Meanwhile, Yahoo developers: you might want to take a look at that "More info" page that just gives a permission error.

[ 17:58 Aug 09, 2011    More tech | permalink to this entry | comments ]

Sun, 27 Mar 2011

Automated mail: check the plaintext part (or don't send one)

Funny thing happened last week.

I'm on the mailing list for a volunteer group. Round about last December, I started getting emails every few weeks congratulating me on RSVPing for the annual picnic meeting on October 17.

This being well past October, when the meeting apparently occurred -- and considering I'd never heard of the meeting before, let alone RSVPed for it -- I couldn't figure out why I kept getting these notices.

After about the third time I got the same notice, I tried replying, telling them there must be something wrong with their mailer. I never got a reply, and a few weeks later I got another copy of the message about the October meeting.

I continued sending replies, getting nothing in return -- until last week, when I got a nice apologetic note from someone in the organization, and an explanation of what had happened. And the explanation made me laugh.

Seems their automated email system sends messages as multipart, both HTML and plaintext. Many user mailers do that; if you haven't explicitly set it to do otherwise, you yourself are probably sending out two copies of every mail you send, one in HTML and one in plain text.

But in this automated system, the plaintext part was broken. When it sent out new messages in HTML format, apparently for the plaintext part it was always attaching the same old message, this message from October. Apparently no one in the organization had ever bothered to check the configuration, or looked at the plaintext part, to realize it was broken. They probably didn't even know it was sending out multiple formats.

I have my mailer configured to show me plaintext in preference to HTML. Even if I didn't use a text mailer (mutt), I'd still use that setting -- Thunderbird, Apple Mail, Claws and many other mailers offer it. It protects you from lots of scams and phishing attacks, "web bugs" to track you, and people who think it's the height of style to send mail in blinking yellow comic sans on a red plaid background.

And reading the plaintext messages from this organization, I'd never noticed that the message had an HTML part, or thought to look at it to see if it was different.

It's not the first time I've seen automated mailers send multipart mail with the text part broken. An astronomy club I used to belong to set up a new website last year, and now all their meeting notices, which used to come in plaintext over a Yahoo groups mailing list, have a text part that looks like this actual example from a few days ago:

Subject: Members' Night at the Monthly Meeting
<p>&#60;&#115;&#116;&#121;&#108;&#101;&#32;&#116;&#121;&#112;&#101;&#61;&#34;&#1
16;&#101;&#120;&#116;&#47;&#99;&#115;&#115;&#34;&#62;@font-face {
  font-family: "MS 明朝";
}@font-face {
  font-family: "MS 明朝";
}@font-face {
  font-family: "Cambria";
}p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: Cambria; }a:link, span.MsoHyperlink { color: blue;
text-decoration: underline; }a:visited, span.MsoHyperlinkFollowed { color:
purple; text-decoration: underline; }.MsoChpDefault { font-family: Cambria;
}div.WordSection1 { page: WordSection1;
}&#60;&#47;&#115;&#116;&#121;&#108;&#101;&#62;
<p class="MsoNormal">Friday April 8<sup>th</sup> is members’ night at the
monthly meeting of the PAS.<span style="">&#160; </span>We are asking for
anyone, who has astronomical photographs that they would like to share, to
present them at the meeting.<span style="">&#160; </span>Each presenter will
have about 15 minutes to present and discuss his pictures.<span style=""> We
already have some presenters. &#160; </span></p>
<p class="MsoNormal">&#160;</p>
... on and on for pages full of HTML tags and no line breaks. I contacted the webmaster, but he was just using packaged software and didn't seem to grok that the software was broken and was sending HTML for the plaintext part as well as for the HTML part. His response was fairly typical: "It looks fine to me". I eventually gave up even trying to read their meeting announcements, and now I just delete them.

The silly thing about this is that I can read HTML mail just fine, if they'd just send HTML mail. What causes the problem is these automated systems that insist on sending both HTML and plaintext, but then the plaintext part is wrong. You'll see it on a lot of spam, too, where the plaintext portion says something like "Get a better mailer" (why? so I can see your phishing attack in all its glory?)

Folks, if you're setting up an automated email system, just pick one format and send it. Don't configure it to send multiple formats unless you're willing to test that all the formats actually work.

And developers, if you're writing an automated email system: don't use MIME multipart/alternative by default unless you're actually sending the same message in different formats. And if you must use multipart ... test it. Because your users, the administrators deploying your system for their organizations, won't know how to.
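
For what "test it" could mean in practice, here is a small Python 3 sketch using the standard email package. The names build_notice and plaintext_problems are hypothetical, and the checks are deliberately crude assumptions about what a broken text part looks like (leftover HTML, or a stale message that never mentions the thing being announced):

```python
from email.message import EmailMessage

def build_notice(subject, text_body, html_body):
    """Build a multipart/alternative message with text and HTML parts."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(text_body)                       # text/plain part
    msg.add_alternative(html_body, subtype="html")   # text/html part
    return msg

def plaintext_problems(msg, expected_phrase):
    """Return a list of problems found in the text/plain part."""
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ["no text/plain part"]
    text = part.get_content()
    problems = []
    lowered = text.lower()
    if "<p" in lowered or "<style" in lowered or "&#" in text:
        problems.append("plaintext part looks like HTML")
    if expected_phrase not in text:
        problems.append("plaintext part doesn't mention %r" % expected_phrase)
    return problems

notice = build_notice(
    "Members' Night at the Monthly Meeting",
    "Friday April 8th is members' night at the monthly meeting.",
    "<p>Friday April 8<sup>th</sup> is members' night.</p>",
)
print(plaintext_problems(notice, "April 8"))
```

A check like this, run on one outgoing message before a mailing goes out, would flag both failures described above: HTML pasted into the text part, and a text part that still carries an old announcement.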

[ 13:19 Mar 27, 2011    More tech/email | permalink to this entry | comments ]

Tue, 13 Apr 2010

"Joe-job" spam (forged From addresses)

I'm in a Yahoo group where a spammer just posted a message that looked like it was coming from someone in the group, so Yahoo allowed it.

The list owner posted a message about using good passwords so your account isn't hacked since that causes problems for everyone.

Of course, that's good advice and using good passwords is always a good idea. But I thought this sounded more like a Joe-job spam, in which the spammer forges the From address to look like it's coming from someone else.

Normal users encounter this in two ways:

  1. You start getting tons of bounce messages that look as though you sent spam to hundreds of people and they're refusing it.
  2. You see spam that looks like it came from a friend of yours, or spam on a mailing list that looks like it came from a legitimate member of that list.

Since this sort of attack is so common, I felt the victim didn't deserve being harangued about not having set up a good password. So I posted a short note to the list explaining about Joe-jobs. But to make the point, I forged the From address of the list owner. Indeed, it got through Yahoo and out to the list just fine:

[ ... ] the spam probably wasn't from a bad password. It was probably just a spammer forging the header to look like it's from a legitimate user. It's called a "joe-job": http://en.wikipedia.org/wiki/Joe-job

To illustrate, I've changed the From address on this message to look like it's coming from [listowner]. I have not hacked [listowner]'s account or guessed his password or anything else. If this works, and looks like it came from [listowner], then the spam could have been done the same way -- and there's no need to blame the owner of the account, or accuse them of having a bad password.

Why does this work? Why doesn't Yahoo just block messages from user@isp.com if the mail doesn't come from isp.com?

They can't! Many, many people don't send mail from the domains in their email addresses. In effect, people forge their From header all the time. Here are some examples:

If mailing lists rejected posts in all these cases, people would be pretty annoyed. So they don't. But that means that now and then, some Joe-job spam gets through to mailing lists. Unfortunately.
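
To make the problem with that "block it if the domains don't match" idea concrete, here is a small Python 3 illustration. The function from_domain_matches and the host names are hypothetical, and this is only a sketch of the naive check, not how Yahoo or any real list server decides anything:

```python
from email.utils import parseaddr

def from_domain_matches(from_header, sending_host):
    """Naive check: does the sending host belong to the From domain?"""
    _name, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    host = sending_host.lower()
    return bool(domain) and (host == domain or host.endswith("." + domain))

# Mail sent through the ISP's own server passes the check...
print(from_domain_matches("Some User <user@isp.com>", "mail.isp.com"))
# ...but the same person sending through a different provider's
# server fails it, even though nothing is forged.
print(from_domain_matches("Some User <user@isp.com>", "smtp.otherisp.example"))
```

The From header is just text supplied by the sender, so a check this strict would reject legitimate mail, while a looser one lets the occasional Joe-job through.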

(Update: The message that inspired this may very well have been a hacked-password case after all, based on the mail headers. But I found that a lot of people didn't know about Joe-jobbing, so I thought this was worth writing up anyway.)

[ 21:28 Apr 13, 2010    More tech/email | permalink to this entry | comments ]

Tue, 15 Dec 2009

Fetchmail without Postfix

I've been using fetchmail for a couple of years to get mail from the mail server to my local machine. But it had one disadvantage: it meant that I had to have postfix (or a similar large and complex MTA) configured and running on every machine I use, even the lightweight laptop.

I run procmail to filter my mail into folders -- Linuxchix mail into one folder, GIMP mailing lists into another, and so forth -- and it seemed like it ought to be possible for fetchmail to call procmail directly, without going through postfix.

I found several suggestions on the web -- for instance, fetchmail-procmail-sendmail -- but they didn't work for me. fetchmail downloaded each message, passed it to procmail, and procmail appended it to the relevant mailbox without the appropriate "From " header that mail programs need to tell when each new message starts.

Finally, on a tip from bma on #linuxchix and after a little experimentation, I added this line to ~/.fetchmailrc:

mda /usr/bin/procmail -f %F -m /home/username/.procmailrc
Works great! And it's a lot faster than going through postfix.
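
To see why that missing "From " line matters, here is a simplified Python 3 sketch of how a mail program finds message boundaries in an mbox file. The function split_mbox is a made-up name, and real mbox readers handle more cases (such as escaped "From " lines inside message bodies) than this does:

```python
def split_mbox(text):
    """Split mbox-format text into messages at lines starting with 'From '."""
    messages = []
    for line in text.splitlines(keepends=True):
        # A new message starts at each "From " separator line
        if line.startswith("From ") or not messages:
            messages.append("")
        messages[-1] += line
    return messages

with_separators = (
    "From alice@example.com Tue Dec 15 14:07:00 2009\n"
    "Subject: one\n\nbody one\n\n"
    "From bob@example.com Tue Dec 15 14:08:00 2009\n"
    "Subject: two\n\nbody two\n"
)
without_separators = "Subject: one\n\nbody one\n\nSubject: two\n\nbody two\n"

print(len(split_mbox(with_separators)))
print(len(split_mbox(without_separators)))
```

With the separator lines in place the two messages are found individually; without them, everything appended to the mailbox runs together as one message.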

[ 14:07 Dec 15, 2009    More tech/email | permalink to this entry | comments ]

Thu, 07 May 2009

Pruning those huge Spamassassin files

During a server backup, Dave complained that my .spamassassin directory was taking up 87MB. I had to agree, that seemed a bit excessive.

The only two large files were auto-whitelist at 42M and bayes_seen at 41M. Apparently these never get pruned by spamassassin.

Unfortunately, these are binary files, so you can't just edit them and remove the early stuff, and spamassassin doesn't seem to have any documentation on how to prune its data files. A thread on the Spamassassin Users list on managing Spamassassin data says it's okay to delete bayes_seen and it will be regenerated.

For pruning auto-whitelist, that same post suggests a program called check-whitelist that is only available in a spamassassin source tarball -- it's not installed as part of distro packages. Run this with --clean. But a search on the spamassassin.com wiki turns up an entry on AutoWhitelist that says you should use tools/sa-awlUtil instead (it doesn't say how to run it or where to get it -- presumably download a source tarball and then RTFSC -- read the source code?)

Really, I'm not sure auto whitelisting is such a good idea anyway, especially auto whitelist entries from several years ago, so I opted for a simpler solution: removing the auto-whitelist file at the same time that I removed bayes_seen. Indeed, both files were immediately generated as new mail came in, but they were now much smaller.
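
For anyone who'd rather script the check-and-remove step, a Python 3 sketch might look like the following. The function names are hypothetical, the directory is whatever holds the spamassassin data (~/.spamassassin in this case), and it assumes, as above, that it's acceptable to let both files be regenerated from scratch:

```python
import os

def large_files(directory, min_bytes):
    """Return (size, name) pairs for files of at least min_bytes, largest first."""
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) >= min_bytes:
            found.append((os.path.getsize(path), name))
    return sorted(found, reverse=True)

def remove_if_present(directory, names=("auto-whitelist", "bayes_seen")):
    """Delete the named files from directory; return the names removed."""
    removed = []
    for name in names:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed
```

Something like large_files(os.path.expanduser("~/.spamassassin"), 10 * 1024 * 1024) would list anything over 10MB before deciding what to remove.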

I've run for a few weeks since doing that, and I'm not noticing any difference in either the number of false positives or false negatives. (Both are, unfortunately, large enough to be noticeable, but that was true before the change as well.)

[ 19:38 May 07, 2009    More tech/email | permalink to this entry | comments ]
