Shallow Thoughts : tags : spam
Akkana's Musings on Open Source Computing, Science, and Nature.
Sun, 02 Jun 2013
I was pretty surprised at something I saw visiting someone's blog recently.
The top 2/3 of my browser window was full of spammy text with links to
shady places trying to sell me things like male enhancement pills
and shady high-interest loans.
Only below that was the blog header and content.
(I've edited out identifying details.)
Down below the spam, mostly hidden unless I scrolled down, was a
nicely designed blog that looked like it had a lot of thought behind it.
It was pretty clear the blog owner had no idea the spam was there.
Now, I often see weird things on websites, because I run Firefox with
there's some content but none of the links work. (How site designers
a mystery to me.)
perfectly fine: no spammy links anywhere.
Pretty clever, eh? Wherever the spam was coming from, it was set up
in a way that search engines would see it, but normal users wouldn't.
Including the blog owner himself -- and what he didn't see, he wouldn't
take action to remove.
Which meant that it was an SEO tactic.
Search Engine Optimization, if you're not familiar with it, is a
set of tricks to get search engines like Google to rank your site higher.
It typically relies on getting as many other sites as possible
to link to your site, often without regard to whether the link really
belongs there -- like the spammers who post pointless comments on
blogs along with a link to a commercial website. Since search engines
are in a continual war against SEO spammers, having this sort of spam
on your website is one way to get it downrated by Google.
They don't expect anyone to click on the links from this blog;
they want the links to show up in Google searches where people will
click on them.
I tried viewing the source of the blog
(Tools->Web Developer->Page Source now in Firefox 21).
I found this (deep breath):
(long list of additional spammy text and links here)
Copying the page to a local file on my own computer, I changed the
document.write to an
Indeed, its purpose was to hide the wrapper_slider
containing the actual spam.
Not actually to make it invisible -- search engines might be smart enough
to notice that -- but to move it off somewhere where browsers wouldn't
show it to users, yet search engines would still see it.
I had to look up the arguments to the CSS clip property.
It is intended for restricting visibility to only a small window of an
element -- for instance, if you only want to show a little bit of a
Those rect arguments are top, right, bottom, and left.
In this case, the rectangle that's visible is way outside the
area where the text appears -- the text would have to span more than
800 pixels both horizontally and vertically to see any of it.
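To convince myself of how the clipping arithmetic works, here's a minimal sketch in Python. The rect values and element sizes below are made up for illustration (they are not the numbers from the injected code), and visible_area is just a hypothetical helper name. It assumes the usual CSS 2.1 reading of clip: rect(top, right, bottom, left), where all four values are offsets from the element's top left corner.

```python
def visible_area(width, height, top, right, bottom, left):
    """Pixels of a width x height element that survive
    clip: rect(top, right, bottom, left).

    Assumption: all four values are offsets from the element's
    top-left corner, as in CSS 2.1.
    """
    # Intersect the clip rectangle with the element's own box.
    visible_w = max(0, min(right, width) - max(left, 0))
    visible_h = max(0, min(bottom, height) - max(top, 0))
    return visible_w * visible_h


# Hypothetical clip rectangle starting 800px down and 800px to the right.
print(visible_area(600, 400, 800, 900, 900, 800))    # 0: nothing shows
print(visible_area(1000, 1000, 800, 900, 900, 800))  # 10000: a 100x100 window
```

So unless the element is more than 800 pixels wide and more than 800 pixels tall, a browser shows none of it, even though the text is still right there in the page source.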
Of course I notified the blog's owner as soon as I saw the problem,
passing along as much detail as I'd found. He looked into it, and
concluded that he'd been hacked. No telling how long this had
been going on or how it happened, but he had to spend hours
cleaning up the mess and making sure the spammers were locked out.
I wasn't able to find much about this on the web. Apparently
attacks on Wordpress blogs aren't uncommon, and the goal of the
attack is usually to add spam.
The most common term I found for it was "blackhat SEO spam injection".
But the few pages I saw all described immediately visible spam.
I haven't found a single article about the technique of hiding the
and the blog owner.
I'm puzzled by not being able to find anything. Can this attack
possibly be new? Or am I just searching for the wrong keywords?
Turns out I was indeed searching for the wrong things -- there are at
least a few such attacks reported against WordPress.
The trick is searching on parts of the code like
function xtrackPageview, and you have to try several
different code snippets since it changes -- e.g. searching on
wrapper_slider doesn't find anything.
Either way, it's something all site owners should keep in mind,
whether you have a large website or just a small blog.
Just as it's good to visit your site periodically with a browser other
than your usual one, it's also a good idea to check now and then
You might find something you really need to know about.
[ 18:59 Jun 02, 2013 | More tech/web ]
Sat, 08 Dec 2012
Having not had much luck with spam filtering solutions like SpamAssassin,
I'm forever having to add new spam filters by hand. For instance, after
about the sixth time I get "President Waives Refi Requirement"
or "Melt your fat! MUST WATCH this video now!" within a couple of
hours, I'm pretty tired of it and don't want to see any more of them.
With mail filtering programs like procmail or maildrop, it's easy
enough to match a pattern like "Subject:.*Refi Requirement" or
"Subject:.*Melt your fat" and filter that message to a spam folder.
But increasingly, I add patterns I'm seeing in spam messages, and yet
the messages with those patterns keep coming in. Why? Because the
spammers are using RFC 2047
to encode the subject into some other character set.
Here's how it works. A spammer sends a subject line that looks
something like this:
Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?=
Mail programs are smart enough to decode this into:
Subject: Stop Overpaying for Printer Ink
but spam filtering programs often aren't, so your "printer ink" filter
won't catch it. And if you look through your spam folder with tools like
grep to see why it didn't get caught, or to find particularly spammy
subjects that might call for a filter
(grep Subject spamfolder | sort is pretty handy),
these encoded subjects will be incognito.
I briefly tried setting up a filter that spam-filed anything with =? in the
Subject line. But that's way too broad a brush -- not all people
there are legitimate reasons for using other charsets even in English
language email. It's relatively rare, but it happens. And some bots,
notably the Adafruit forum notification bot
and the bot that sends out announcements from my alma mater,
unaccountably encode the charset even when they're sending mail
entirely in US ASCII.
So what's really needed is not to filter out all messages that specify
a charset, but to decode the Subject so the spam filter can see it and
filter it accordingly.
How? I couldn't find any ready-made tool
available for Linux that could decode RFC 2047 headers; but the Python
email package makes decoding a one-line task.
In the Python interpreter:
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> email.Header.decode_header("Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?=")
[('Subject:', None), ('Stop Overpaying for Printer Ink', 'utf-8')]
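The session above is Python 2. For anyone on Python 3, here's a rough equivalent, as a sketch rather than the code from my script: the module is email.header (lowercase), decode_header returns bytes for encoded chunks, and decode_header_value is just a name I made up for the helper. It also shows why the decoding matters: a "printer ink" pattern misses the raw subject but matches the decoded one.

```python
import re
from email.header import decode_header


def decode_header_value(value):
    """Decode RFC 2047 encoded words in a header value into plain text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Assumption: for a charset name Python doesn't know,
                # fall back to ASCII with replacement characters.
                chunk = chunk.decode("ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


raw = "=?utf-8?B?U3RvcCBPdmVycGF5aW5nIGZvciBQcmludGVyIEluaw==?="
pattern = re.compile(r"printer ink", re.IGNORECASE)

print(bool(pattern.search(raw)))                       # False
print(decode_header_value(raw))                        # Stop Overpaying for Printer Ink
print(bool(pattern.search(decode_header_value(raw))))  # True
```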
So it's easy to write a script that can pull headers out of email
messages (files) and decode them. Just look for the line starting with
the header you want to match -- e.g. "Subject:" -- and pass that line
to email.Header.decode_header().
Only one snag. If the subject is longer than about 20 characters,
spammers will often opt to split it up into multiple groups, sometimes
even in different character sets. So for example, you might see
something like this, spread over multiple lines:
The script has to handle that too. If it's reading a header, it has to
check the next line, and if that line begins with whitespace, treat it
as more of the header.
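Here's a minimal sketch of that continuation-line handling, written for Python 3. It is not the actual decodemail.py; the function names and the sample message (including the split subject and the example.com address) are invented for illustration.

```python
from email.header import decode_header


def unfold_headers(lines):
    """Join folded continuation lines onto the header they continue."""
    headers = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            break  # a blank line ends the header section
        if line[0] in " \t" and headers:
            # Leading whitespace means "more of the previous header".
            headers[-1] += " " + line.strip()
        else:
            headers.append(line)
    return headers


def decode_value(value):
    """Decode RFC 2047 encoded words, possibly in several charsets."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# Hypothetical message: one subject split into two differently encoded groups.
message = [
    "From: someone@example.com\n",
    "Subject: =?utf-8?B?U3RvcCBPdmVycGF5aW5n?=\n",
    " =?iso-8859-1?Q?_for_Printer_Ink?=\n",
    "\n",
    "body text\n",
]

subject = None
for header in unfold_headers(message):
    name, _, value = header.partition(":")
    if name.lower() == "subject":
        subject = decode_value(value.strip())

print(subject)  # Stop Overpaying for Printer Ink
```

Whitespace between two adjacent encoded groups is dropped during decoding, which is why the space before "for" has to live inside the second group (the underscore in the Q encoding).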
The resulting script, decodemail.py
(on github), seems pretty handy, and it should be possible to plug it in
to a mail filtering program.
[ 20:45 Dec 08, 2012 | More programming ]
Sat, 24 Sep 2011
I suspect all technical people -- at least those with a web presence
-- get headhunter spam. You know, email saying you're perfect for a
job opportunity at "a large Fortune 500 company" requiring ten years'
experience with technologies you've never used.
Mostly I just delete it. But this one sent me a followup --
I hadn't responded the first time, so surely I hadn't seen it and
here it was again, please respond since I was perfect for it.
Maybe I was just in a pissy mood that night. But
look, I'm a programmer, not a DBA -- I had to look it up to verify
that I knew what DBA stood for. I've never used Oracle.
A "Production DBA with extensive Oracle experience" job is right out,
and there's certainly nothing in my resume that would suggest that's
my line of work.
So I sent a brief reply, asking,
Why do you keep sending this?
Why exactly do you think I'm a DBA or an Oracle expert?
Have you looked at my resume? Do you think spamming people
with jobs completely unrelated to their field will get many
responses or help your credibility?
I didn't expect a reply. But I got one:
I must say my credibility is most important and it's unfortunate
that recruiters are thought of as less than in these regards. And, I know it
is well deserved by many of them.
In fact, Linux and SQL experience is more important than Oracle in this
situation and I got your email address through the Peninsula Linux Users
Group site which is old info and doesn't give any information about its
members' skill or experience. I only used a few addresses to experiment with
to see if their info has any value. Sorry you were one of the test cases but
I don't think this is spamming and apologize for any inconvenience it caused
[name removed], PhD
A courteous reply. But it stunned me.
Harvesting names from old pages on a LUG website, then sending a
rather specific job description out to all the names harvested,
regardless of their skillset -- how could that possibly not be
considered spam? Isn't that practically the definition of spam?
And how could a recruiter expect to seem credible after sending this
sort of non-targeted mass solicitation?
To technical recruiters/headhunters: if you're looking for
good technical candidates, it does not help your case to spam people
with jobs that show you haven't read or understood their resume.
All it does is get you a reputation as a spammer. Then if you do, some
day, have a job that's relevant, you'll already have lost all credibility.
[ 20:30 Sep 24, 2011 | More tech ]
Tue, 13 Apr 2010
I'm in a Yahoo group where a spammer just posted a message that
looked like it was coming from someone in the group, so Yahoo allowed it.
The list owner posted a message about using good passwords so your
account isn't hacked since that causes problems for everyone.
Of course, that's good advice and using good passwords is always a good idea.
But I thought this sounded more like a Joe-job,
in which the spammer forges the From address to look like it's coming
from someone else.
Normal users encounter this in two ways:
- You start getting tons of bounce messages that look as though you
sent spam to hundreds of people and they're refusing it.
- You see spam that looks like it came from a friend of yours,
or spam on a mailing list that looks like it came from a
legitimate member of that list.
Since this sort of attack is so common, I felt the victim didn't
deserve being harangued about not having set up a good password.
So I posted a short note to the list explaining about Joe-jobs.
But to make the point, I forged the From address of the list owner.
Indeed, it got through Yahoo and out to the list just fine:
[ ... ] the spam probably
wasn't from a bad password. It was probably just a spammer forging
the header to look like it's from a legitimate user.
It's called a "joe-job": http://en.wikipedia.org/wiki/Joe-job
To illustrate, I've changed the From address on this message to
look like it's coming from Adam. I have not hacked [listowner]'s account
or guessed his password or anything else. If this works, and looks
like it came from [listowner], then the spam could have been done the same
way -- and there's no need to blame the owner of the account, or
accuse them of having a bad password.
Why does this work? Why doesn't Yahoo just block messages from
email@example.com if the mail doesn't come from isp.com?
They can't! Many, many people don't send mail from the domains in their
email addresses. In effect, people forge their From header all the time.
Here are some examples:
- You're using firstname.lastname@example.org, but you're using Thunderbird or Eudora
or Evolution or something to read and send mail from home.
- You're on your computer at home, but you're sending work-related
email from your work account.
- You're on your laptop, using Thunderbird or whatever, mailing
from email@example.com, but you're at a friend's house or a hotel or
conference or somewhere.
- You're sending mail from a public terminal somewhere (eek, do
people really type their mail info in to these things?)
- You're reading and sending mail from a mobile phone.
- You're sending mail from your own domain, firstname.lastname@example.org,
but you're at home or somewhere else other than wherever mydomain.com
is hosted.
If mailing lists rejected posts in all these cases, people would be
pretty annoyed. So they don't. But that means that now and then, some
Joe-job spam gets through to mailing lists. Unfortunately.
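To make that concrete, here's a sketch of the naive check people imagine Yahoo could run, and how it treats cases like the ones above. Everything in it is hypothetical: the function names, the addresses and the sending hosts are invented, and I have no idea what checks Yahoo actually performs.

```python
from email.utils import parseaddr


def from_domain(from_header):
    """Return the lowercased domain of the address in a From header."""
    _name, addr = parseaddr(from_header)
    if "@" not in addr:
        return ""
    return addr.rpartition("@")[2].lower()


def naive_check(from_header, sending_host):
    """Accept mail only if the sending host belongs to the From domain."""
    domain = from_domain(from_header)
    host = sending_host.lower()
    return bool(domain) and (host == domain or host.endswith("." + domain))


# Hypothetical (From header, host the mail was sent from) pairs.
cases = [
    ("Alice <alice@isp.com>", "mail.isp.com"),           # sent via the ISP itself
    ("Alice <alice@mydomain.com>", "smtp.homeisp.net"),  # own domain, sent from home
    ("Alice <alice@work.example>", "smtp.hotel.example"),  # work mail from a hotel
]

for from_header, host in cases:
    verdict = "accept" if naive_check(from_header, host) else "reject"
    print(verdict, from_header, "via", host)
```

Only the first case passes. The other two are perfectly legitimate mail, and a list that rejected them would be rejecting a lot of its own members.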
(Update: The message that inspired this may very
well have been a hacked-password case after all, based on the mail
headers. But I found that a lot of people didn't know about
Joe-jobbing, so I thought this was worth writing up anyway.)
[ 21:28 Apr 13, 2010 | More tech/email ]
Thu, 07 May 2009
During a server backup, Dave complained that my .spamassassin directory
was taking up 87MB. I had to agree, that seemed a bit excessive.
The only two large files were auto-whitelist at 42M and bayes_seen at 41M.
Apparently these never get pruned by spamassassin.
Unfortunately, these are binary files, so you can't just edit them
and remove the early stuff, and spamassassin doesn't seem to have any
documentation on how to prune its data files.
A thread on the Spamassassin Users list on
Spamassassin data says it's okay to delete bayes_seen
and it will be regenerated.
For pruning auto-whitelist, that same post suggests a program called
check-whitelist that is only available in a spamassassin source tarball
-- it's not installed as part of distro packages. Run this with
But a search on the spamassassin.com wiki turns up an entry on
that says you should use tools/sa-awlUtil instead (it doesn't
say how to run it or where to get it -- presumably download a source
tarball and then RTFSC -- read the source code?)
Really, I'm not sure auto whitelisting is such a good idea anyway,
especially auto whitelist entries from several years ago,
so I opted for a simpler solution: removing the auto-whitelist file
at the same time that I removed bayes_seen. Indeed, both files were
immediately generated as new mail came in, but they were now much smaller.
I've run for a few weeks since doing that, and I'm not noticing any
difference in either the number of false positives or false
negatives. (Both are, unfortunately, large enough to be noticeable,
but that was true before the change as well.)
[ 19:38 May 07, 2009 | More tech/email ]
Wed, 12 Nov 2008
I checked my Spam Assassin "probably" folder for the first time in too
long, and discovered that I was getting tons of false positives,
perfectly legitimate messages that were being filed as spam.
A little analysis of the X-Spam-Status: headers showed that all of
the misfiled messages (and lots of messages that didn't quite make it
over the threshold) were hitting a rule called DNS_FROM_SECURITYSAGE.
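That sort of analysis is easy to script. Here's a rough sketch that tallies which rules show up in a batch of X-Spam-Status: headers. The two sample headers are invented (scores, version number and all), and it assumes the usual layout where the rule names follow tests= as a comma-separated list that may be folded across lines.

```python
import re
from collections import Counter

# Capture everything after "tests=" up to the next "key=" field or the end.
TESTS_RE = re.compile(r"tests=([\w,\s]*?)(?=\s+\w+=|\s*$)")


def rules_hit(status_header):
    """Return the list of rule names named in one X-Spam-Status header."""
    match = TESTS_RE.search(status_header)
    if not match:
        return []
    return [name for name in re.split(r"[,\s]+", match.group(1)) if name]


def count_rules(status_headers):
    """Tally how often each rule appears across many headers."""
    counts = Counter()
    for header in status_headers:
        counts.update(rules_hit(header))
    return counts


# Made-up sample headers, folded the way they often are in a mailbox.
samples = [
    "X-Spam-Status: Yes, score=5.6 required=5.0 tests=BAYES_50,\n"
    "\tDNS_FROM_SECURITYSAGE,FORGED_RCVD_HELO autolearn=no version=3.2.5",
    "X-Spam-Status: No, score=3.9 required=5.0 tests=DNS_FROM_SECURITYSAGE,\n"
    "\tUNPARSEABLE_RELAY autolearn=no version=3.2.5",
]

for rule, hits in count_rules(samples).most_common():
    print(hits, rule)
```

A rule that tops the list for the misfiled messages is the first one worth looking up.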
It turned out that this rule is
obsolete and has been removed from Spam Assassin, but it hasn't
yet been removed from Debian, at least not from Etch.
So I filed a Debian bug. Or at least I think I did -- I got an
email acknowledgement from email@example.com but it didn't
include a bug number and Debian's
HyperEstraier based search engine
linked off the bug page
doesn't find it (I used reportbug).
Anyway, if you're getting lots of SECURITYSAGE false hits, edit
/usr/share/spamassassin/20_dnsbl_tests.cf and comment out the
lines for DNS_FROM_SECURITYSAGE and, while you're at it, the lines
for RCVD_IN_DSBL, which is also
obsolete. Just to be safe, you might also want to add
score DNS_FROM_SECURITYSAGE 0
in your .spamassassin/user_prefs (or equivalent systemwide file) as well.
Now if only I could figure out why it was setting
FORGED_RCVD_HELO and UNPARSEABLE_RELAY on messages from what seem
to be perfectly legitimate senders ...
[ 21:54 Nov 12, 2008 | More linux ]