Shallow Thoughts : : web

Akkana's Musings on Open Source Computing, Science, and Nature.

Sat, 21 Jun 2014

Mirror a website using lftp

I'm helping an organization with some website work. But I'm not the only one working on the website, and there's no version control. I wanted an easy way to make sure all my files were up to date before I start to work on one ... a way to mirror the website, or at least specific directories, to my local disk.

Normally I use rsync -av over ssh to mirror directories, but this website is on a server that only offers ftp access. I've been using ncftp to copy files up one by one, but although ncftp's manual says it has a mirror mode and I found a few web references to that, I couldn't find anything telling me how to activate it.

Making matters worse, there are some large files that I don't need to mirror. The first time I tried to use get * in ncftp to get one directory, it spent 15 minutes trying to download a huge powerpoint file, then stalled and lost the connection. There are some big .doc and .docx files, too. And ncftp doesn't seem to have a way to exclude specific files.

Enter lftp. It has a mirror mode (with documentation, even!) which includes a -X option to exclude files matching specified patterns.

lftp also has a -e option for passing commands -- like "mirror" -- on the command line. But the documentation doesn't say whether you can pass more than one command at a time. So it seemed safer to start up an lftp session and feed it a series of commands.
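If a single mirror command is all you need, a one-liner like this ought to work (an untested sketch with placeholder names; whether -e will also take a trailing bye after a semicolon is exactly the kind of thing the documentation leaves unclear):

lftp -u 'user,password' -e "mirror --only-newer -X '*.ppt' htdocs/thisdir ~/web/webmirror/thisdir; bye" ftp.example.com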

The session approach works nicely. Just set up the list of directories you want to mirror, and you can write a nice shell function to put in your .zshrc or .bashrc:

sitemirror() {
  commands=""
  # Build one mirror command per directory, excluding the huge files
  for dir in thisdir thatdir theotherdir
  do
    commands="$commands
mirror --only-newer -vvv -X '*.ppt' -X '*.doc*' -X '*.pdf' htdocs/$dir $HOME/web/webmirror/$dir"
  done

  echo "Commands to be run:"
  echo "$commands"    # quoted, to preserve the newlines between commands
  echo

  # Feed the whole command list to a single lftp session
  lftp <<EOF
open -u 'user,password' ftp.example.com
$commands
bye
EOF
}

Super easy -- all I do is type sitemirror and wait a little. Now I don't have any excuse for not being up to date.


Sat, 20 Jul 2013

Make Firefox warn you of specific types of links before you click

Sometimes when I middleclick on a Firefox link to open it in a new tab, I get an empty new tab. I hate that.

It happens most often on Javascript links. For instance, suppose a website offers a Help link next to the link I'm trying to use. I don't know what type of link it is; if it's a normal link, to an HTML page, then it may open in my current tab, overwriting the form I just spent five minutes filling out. So I want to middleclick it, so it will open in a new tab. On the other hand, if it's a Javascript link that pops up a new help window, middleclicking won't work at all; all it does is open an empty new tab, which I'll have to close.

A similar effect happens on PDF links; in that case, middleclicking gives me the "What do you want to do with this?" dialog but I also get a new tab that I have to close. (Though I'm not sure what happens with Firefox's new built-in PDF reader.)

Anyway, since there seems to be no way of making middleclick just do the sensible thing and open these links in a new tab like I asked, I can do something almost as good: a user stylesheet that warns me when I'm about to click on one of these special links. This rule changes the cursor to a crosshair and turns the link bold, red on yellow. Hard to miss!

I put this into userContent.css, inside the chrome directory inside my profile:

/*
 * Make it really obvious when links are javascript,
 * since middleclicking javascript links doesn't do anything
 * except open an empty new tab that then has to be closed.
 */
a:hover[href^="javascript"] {
  cursor: crosshair; font-weight: bold;
  text-decoration: blink;
  color: red; background-color: yellow
  !important
}

/*
 * And the same for PDFs, for the same reason.
 * Sadly, we can't catch all PDFs, just the ones where the actual
 * filename ends in .pdf.
 * Apparently there's no way to make a selector case insensitive,
 * so we have separate cases for .pdf and .PDF.
 */
a:hover[href$=".pdf"], a:hover[href$=".PDF"] {
  cursor: crosshair;
  color: red; background-color: yellow
  !important
}

In selectors, ^="javascript" means "starts with javascript", for links like javascript:do_something(). $=".pdf" means "ends with .pdf". If you want to match a string anywhere inside the href, *= means "contains".
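For instance, this variant (a sketch, untested) would also catch PDF links with something after the filename, like a query string:

/* Match ".pdf" anywhere in the URL, e.g. before a query string */
a:hover[href*=".pdf"] {
  cursor: crosshair;
}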

What about that crosshair cursor? Mozilla's cursor documentation page shows some of the other cursors you can use. Don't trust the images on that page -- hover over each cursor to see what your actual browser shows.

You can also warn about links that would open a new window or tab. If you prefer to keep control of that, rather than letting each web page designer decide for you where each link should open, you can control it with the browser.link.open_newwindow preference. But whatever you do with that preference, you can add a rule for a:hover[target="_blank"] to help you notice links that are likely to open in a new tab.
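Such a rule might look like this (a sketch following the same pattern as the rules above):

/* Warn about links that ask for a new window or tab */
a:hover[target="_blank"] {
  cursor: crosshair;
  color: red; background-color: yellow
  !important
}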

You can even make these special links blink, with text-decoration: blink. Assuming you're not a curmudgeon like me, who disables blinking entirely by setting the "browser.blink_allowed" preference to false.


Sun, 02 Jun 2013

SEO Spam injection on blogs (or: a good argument for noscript)

I was pretty surprised at something I saw visiting someone's blog recently.

[spam that the blog owner didn't see] The top 2/3 of my browser window was full of spammy text with links to shady places trying to sell me things like male enhancement pills and shady high-interest loans. Only below that was the blog header and content. (I've edited out identifying details.)

Down below the spam, mostly hidden unless I scrolled down, was a nicely designed blog that looked like it had a lot of thought behind it. It was pretty clear the blog owner had no idea the spam was there.

Now, I often see weird things on websites, because I run Firefox with noscript, with Javascript off by default. Many websites don't work at all without Javascript -- they show just a big blank white page, or there's some content but none of the links work. (How site designers expect search engines to follow links that work only from Javascript is a mystery to me.)

So I enabled Javascript and reloaded the site. Sure enough, it looked perfectly fine: no spammy links anywhere.

Pretty clever, eh? Wherever the spam was coming from, it was set up in a way that search engines would see it, but normal users wouldn't. Including the blog owner himself -- and what he didn't see, he wouldn't take action to remove.

Which meant that it was an SEO tactic. Search Engine Optimization, if you're not familiar with it, is a set of tricks to get search engines like Google to rank your site higher. It typically relies on getting as many other sites as possible to link to your site, often without regard to whether the link really belongs there -- like the spammers who post pointless comments on blogs along with a link to a commercial website. Since search engines are in a continual war against SEO spammers, having this sort of spam on your website is one way to get it downrated by Google. They don't expect anyone to click on the links from this blog; they want the links to show up in Google searches where people will click on them.

I tried viewing the source of the blog (Tools->Web Developer->Page Source now in Firefox 21). I found this (deep breath):

<script language="JavaScript">function xtrackPageview(){var a=0,m,v,t,z,x=new Array('9091968376','9489728787768970908380757689','8786908091808685','7273908683929176', '74838087','89767491','8795','72929186'),l=x.length;while(++a<=l){m=x[l-a]; t=z='';for(v=0;v<m.length;){t+=m.charAt(v++);if(t.length==2){z+=String.fromCharCode(parseInt(t)+33-l);t='';}}x[l-a]=z;}document.write('<'+x[0]+'>.'+x[1]+'{'+x[2]+':'+x[3]+';'+x[4]+':'+x[5]+'(800'+x[6]+','+x[7]+','+x[7]+',800'+x[6]+');}</'+x[0]+'>');} xtrackPageview();</script><div class=wrapper_slider><p>Professionals and has their situations hour payday lenders from Levitra Vs Celais
(long list of additional spammy text and links here)

Quite the obfuscated code! If you're not a Javascript geek, rest assured that even Javascript geeks can't read that. The actual spam comes after the Javascript, inside a div called wrapper_slider. Somehow that Javascript mess must be hiding wrapper_slider from view.
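It turns out the obfuscation is simple, though. Each entry in that x array is a string of two-digit character codes, shifted by a constant. Here's a rough Python equivalent of the decoding loop (my re-implementation for illustration, not the attacker's code):

codes = ['9091968376', '9489728787768970908380757689', '8786908091808685',
         '7273908683929176', '74838087', '89767491', '8795', '72929186']
offset = 33 - len(codes)    # len(codes) is 8, so each code is shifted by 25
for m in codes:
    # take the digits two at a time; each pair plus the offset is one character
    print(''.join(chr(int(m[i:i+2]) + offset) for i in range(0, len(m), 2)))

That prints style, wrapper_slider, position, absolute, clip, rect, px and auto: all the pieces the document.write needs.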

Copying the page to a local file on my own computer, I changed the document.write to an alert, and discovered that the Javascript produces this:

<style>.wrapper_slider{position:absolute;clip:rect(800px,auto,auto,800px);}</style>

Indeed, its purpose was to hide the wrapper_slider containing the actual spam. Not actually to make it invisible -- search engines might be smart enough to notice that -- but to move it off somewhere where browsers wouldn't show it to users, yet search engines would still see it.

I had to look up the arguments to the CSS clip property. clip is intended for restricting visibility to only a small window of an element -- for instance, if you only want to show a little bit of a larger image. Those rect arguments are top, right, bottom, and left. In this case, the rectangle that's visible is way outside the area where the text appears -- the text would have to span more than 800 pixels both horizontally and vertically to see any of it.
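A legitimate use would look something like this (a sketch; note that clip only applies to absolutely positioned elements, which is why the spammers' style sets position:absolute):

/* Show only the top-left 100x100 pixels of an element */
.thumbnail {
  position: absolute;
  clip: rect(0, 100px, 100px, 0);    /* top, right, bottom, left */
}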

Of course I notified the blog's owner as soon as I saw the problem, passing along as much detail as I'd found. He looked into it, and concluded that he'd been hacked. No telling how long it had been going on or how it happened, but he had to spend hours cleaning up the mess and making sure the spammers were locked out.

I wasn't able to find much about this on the web. Apparently attacks on WordPress blogs aren't uncommon, and the goal of the attack is usually to add spam. The most common term I found for it was "blackhat SEO spam injection".

But the few pages I saw all described immediately visible spam. I haven't found a single article about this technique of hiding the spam injection inside a div with Javascript, so that it's hidden from users and the blog owner.

I'm puzzled by not being able to find anything. Can this attack possibly be new? Or am I just searching for the wrong keywords?

Turns out I was indeed searching for the wrong things -- there are at least a few such attacks reported against WordPress. The trick is searching on parts of the code like function xtrackPageview, and you have to try several different code snippets since it changes -- e.g. searching on wrapper_slider doesn't find anything.

Either way, it's something all site owners should keep in mind, whether you have a large website or just a small blog. Just as it's good to visit your site periodically with a browser other than your usual one, it's also a good idea to check now and then with Javascript disabled.

You might find something you really need to know about.


Tue, 28 May 2013

A quick URL shortener

For years I've used bookmarklets to shorten URLs. For instance, with is.gd, I set up a bookmark to javascript:document.location='http://is.gd/create.php?longurl='+encodeURIComponent(location.href);, give it a keyword like isgd, and then when I'm on a page I want to paste into Twitter (the only reason I need a URL shortener), I type Ctrl-L (to focus the URL bar) then isgd and hit return. Easy.

But with the latest rev of Firefox (I'm not sure if this started with version 20 or 21), sometimes javascript: links don't work. They just display the javascript source in the URLbar rather than executing it. Lacking a solution to the Firefox problem, I still needed a way of shortening URLs. So I looked into Python solutions.

It turns out there are a few URL shorteners with public web APIs. is.gd is one of them; shorturl.com is another. There are also APIs for bit.ly and goo.gl if you don't mind registering and getting an API key. Given that, it's pretty easy to write a Python script.

Which of course I did: shorturl.

[Python url shortening script] In the browser, I select the URL I want (e.g. by doubleclicking in the URLbar, or by right-clicking and choosing "Copy link location"). That puts the URL in the X selection. Then I run the shorturl script, with no arguments. (I have it in my window manager's root menu.)

shorturl reads the X selection and shortens the URL (it tries is.gd first, then shorturl.com if is.gd doesn't work for some reason). Then it pops up a little window showing me both the short URL and the original long one, so I can be sure I shortened the right thing. (One thing I don't like about a lot of the URL services is that they don't tell you the original URL; I only find out later that I tweeted a link to something that wasn't at all the link I intended to share.)
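The heart of such a script is only a few lines. Here's a minimal sketch (not the actual shorturl script, which also handles the fallback service and the confirmation window; this assumes Python 3, the xsel utility, and is.gd's plain-text format=simple API):

import subprocess
import urllib.parse
import urllib.request

def shorten(url):
    # is.gd's simple API returns the shortened URL as plain text
    api = 'http://is.gd/create.php?format=simple&url=' \
          + urllib.parse.quote(url, safe='')
    return urllib.request.urlopen(api).read().decode().strip()

# the URL to shorten comes from the X primary selection
longurl = subprocess.check_output(['xsel', '-o']).decode().strip()
print(shorten(longurl), '<--', longurl)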

shorturl also copies the short URL into the X selection, so after verifying that the long URL was the one I wanted, I can go straight to my Twitter window (in my case, a Bitlbee tab in my IRC client) and middleclick to paste it.

After I've pasted the short link, I can dismiss the window by typing q. Don't type q too early -- since the python script owns the X selection, you won't be able to paste it anywhere once you've closed the window. (Unless you're running a selection-managing app like klipper.)

I just wish there were some way to use it for Twitter's own shortener, t.co. It's so frustrating that Twitter makes us all shorten URLs to fit in 140 characters just so they can shorten them again with their own service -- in the process removing any way for readers to see where the link will go. Sorry, folks -- nothing I can do about that. Complain to Twitter about why they won't let anyone use t.co directly.


Mon, 17 Dec 2012

Bank Website Security

Conversation today with a bank person over the phone:

Me: Can I get you to start sending me statements in the mail again?

Bank rep: We've gone all online now! It's so easy and convenient!

Me: I prefer to limit how much banking I do online, for security reasons.

Bank rep: Oh, but we have two factor security! It's secure! You can change your account name so it doesn't have to be your social security number -- AND you can set a security question so only you can reset your password!

Me: Right.

(The conversation progresses. She promises to send me a statement, but meanwhile it develops that there are some questions I need answered that can't be handled easily by mail and require an online account. We proceed to set that up ...)

Bank rep: ... and now you're at the password screen, right?

Me (reviewing the list of security questions): Um, you know that every one of your security questions is something that anyone could look up, right? Last 4 digits of driver's license? Last 4 digits of phone number? Last 4 digits of credit card?

Bank rep (astonished): What? Aren't there any that couldn't be looked up?

Me (scanning through list again): Well, the one on "last 4 digits of your best friend's phone number" at least requires guessing who your best friend is before they look up the number.

Seriously, every single one of their security questions was "last 4 digits of" something that's either a matter of public record, or something that's probably trivially available for $5 on shady websites.

Of course, you're thinking, you don't have to use the real 4-digit numbers for any of these. No, of course you don't! You can just make up a number and use that as the answer.

In which case a better, more honest, security question would be: "Please enter a 4-digit PIN."


Tue, 07 Aug 2012

Extended comments in XML

Quite a few programs these days use XML for their configuration files -- for example, my favorite window manager, Openbox.

But one problem with XML is that you can't comment out big sections. The XML comment sequence is the same as HTML's: <!-- Here is a comment -->. But XML parsers can be very picky about what they accept inside a comment section.

For instance, suppose I'm testing suspend commands, and I'm trying two ways of doing it inside Openbox's menu.xml file:

  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  </item>
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
  </item>

Let's say I decide the second option is working better for now. But that sometimes varies among distros; I might need to go back to using pm-suspend after the next time I upgrade, or on a different computer. So I'd like to keep it around, commented out, just in case.

Okay, let's comment it out with an XML comment:

<!-- Comment out the pm-suspend version:
  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  </item>
 -->
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
  </item>

I reconfigured Openbox to pick up the new menu.xml, and got a "parser error : Comment not terminated". It turns out that you can't include double dashes inside XML comments, ever. (A web search on xml comments dashes will show some other amusing problems this causes in various programs.)
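Any strict XML parser balks at this, not just Openbox's. For instance, a quick check with Python's xml.etree (a sketch; the exact error wording will differ from Openbox's):

import xml.etree.ElementTree as ET

# the "--" inside "--auto-quirks" makes the comment ill-formed XML,
# so this raises xml.etree.ElementTree.ParseError
ET.fromstring('<menu><!-- sudo pm-suspend --auto-quirks --></menu>')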

So what to do? An Openbox friend had a great suggestion: use a CDATA section. Basically, CDATA means an unparsed string, one which might include newlines, quotes, or anything else besides the cdata end tag, which is ]]>. So add such a string in the middle of the configuration file, and hope that it's ignored.

So I tried it:

<![CDATA[  Comment out the pm-suspend version:
  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  </item>
]]>
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
  </item>

Worked fine!

Then I had the bright idea that I wanted to wrap it inside regular HTML comments, so editors like Emacs would recognize it as a commented section and color it differently:

<!-- WARNING: THIS DOESN'T WORK:
<![CDATA[
  <item label="Sleep">
    <action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
  </item>
]]> -->
  <item label="Sleep">
    <action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
  </item>

That, sadly, did not work. Apparently XML's hatred of double dashes inside a comment applies even when they're nominally inside a CDATA section -- CDATA markup means nothing inside a comment. But that's okay -- colorizing the comments inside my editor is less important than being able to comment things out in the first place.


Tue, 24 Apr 2012

Firefox stopped accepting remote commands

When I upgraded to Firefox 11 a month or so ago, I got a surprise: I couldn't invoke firefox from other applications any more. Clicking on a link in an app such as xchat just gave me the Firefox Profile Manager dialog, instead of opening the link in the browser I was already running.

I couldn't find anything written about it, so I've been putting up with it, copying each link, then switching to the desktop where Firefox is running and middleclick-pasting it into the browser. But this morning I did a new round of searching, and finally found the answer, in bug 716110 and its duplicate, bug 716361.

Quoting from bug 716110:

[The developers] changed the -no-remote flag's behavior in a
surprising, backward incompatible way. Before, it just meant "start a
new instance." Now, it also means "don't listen for remote commands."
Apparently the change went in for Firefox 9, because of bug 650078.

Indeed, that was the problem. I have multiple Firefox profiles, so I use -no-remote -P profilename when I start Firefox, so each profile doesn't conflict with one that might already be running.

But with Firefox 9 or later, you can't do that. Instead, run your first, primary profile without -no-remote; then if you start up other profiles later, run them with -no-remote so they don't conflict with the first one. That works okay for my typical usage, fortunately: I have a main Firefox window I run all day, and only start up other profiles for short periods.
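In other words, something like this (the profile names here are hypothetical):

# primary profile: no -no-remote, so it still listens for remote commands
firefox -P default &

# later, secondary profiles: -no-remote keeps them from touching the first
firefox -no-remote -P testing &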

Since not everyone uses this model, fortunately an upcoming Firefox version will fix the problem by adding a new runtime flag, -new-instance, to do what -no-remote used to do: start up a window for a new profile, rather than talking to the running Firefox. Here's the new --help text:
-no-remote Do not accept or send remote commands; implies -new-instance.\n
-new-instance Open new instance, not a new window in running instance.\n

The web Command Line Options page doesn't seem to have been updated yet, but perhaps it will be when the Firefox with the fix is released.

Of course, it would have been much simpler if Firefox just honored the -P flag and used whatever profile it was given, as suggested by a commenter in bug 650078. But bsmedberg replies that the complexity of the code makes that difficult.

The new arguments look more sensible than the old -no-remote, though it's frustrating that it was so hard to find information about changes like this. All three bugs are filled with comments from people who, like me, lost a lot of time trying to figure out what broke and how to launch URLs remotely after the change. Thanks to Ryan for clarifying the issue and filing the bug to fix the problem, and to Jed, who added the new flag with his first Mozilla patch. Hooray for open source!


Sun, 09 Oct 2011

Disable Google's Instant mode, and Instant Previews

A group of us were commiserating about that widely-reviled feature, Google Instant. That's the thing that refreshes your Google search page while you're still typing, so you always feel like you have to type reallyreallyfasttofinishyourquerybeforeitupdates. Google lets you turn off Instant -- but only if you let them set and remember your cookies, meaning they can also track you across the web. Isn't there a more privacy-preserving way to get a simple Google page that doesn't constantly change as you change your search query?

Disable Instant

It turns out there is. Just add complete=0 to your search queries.

How do you do that? Well, in Firefox, I search in the normal URL bar. No need for a separate search field taking up space in the browser window; any time you type multiple terms (or a space followed by a single term) in Firefox's URLbar, it appends your terms to whatever you have set as the keyword.URL preference.

So go to about:config and search for keyword, then double-click on keyword.URL and make sure it's something like "http://www.google.com/search?complete=0&q=". Or if you want to make sure it won't be overridden, find your Firefox profile, edit user.js (create it if you don't have one already), and add a line like:

user_pref("keyword.URL", "http://www.google.com/search?complete=0&q=");

Show only pages matching the search terms

I use a slightly longer query, myself:

user_pref("keyword.URL", "http://www.google.com/search?complete=0&q=allintext%3A+"

Adding allintext: as the first word in any search query tells Google not to show pages that don't have the search terms as part of the page. You might think this would be the default ... but The Google Works in Mysterious Ways and it is Not Ours to Question.

Disable Instant Previews

Finally, Google recently changed their search page again to add a bunch of crap down the right side of the page which, if you accidentally mouse over it, loads a miniature preview of the page in a sidebar. You have to be very careful with your mouse to keep stuff you're not interested in from popping up all the time.

A moment's work with Firebug gave me the CSS classes I needed to hide. Edit chrome/userContent.css in your Firefox profile (create it if you don't already have one) and add this rule:

/* Turn off the "instant preview" annoying buttons in google search results */
.vspib, .vspii { display: none !important; }

Really, it's a darn shame that Google has gone from its origins as a clean, simple website to something like Facebook with things popping up all over that users have to bend over backward to disable. But that seems to be the way of the web. Good thing browsers are configurable!

