Shallow Thoughts : tags : web

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Thu, 11 Sep 2014

I don't use web forums, the kind you have to read online, because they don't scale. If you're only interested in one subject, then they work fine: you can keep a browser tab for your one or two web forums perennially open and hit reload every few hours to see what's new. If you're interested in twelve subjects, each of which has several different web forums devoted to it -- how could you possibly keep up with that? So I don't bother with forums unless they offer an email gateway, so they'll notify me by email when new discussions get started, without my needing to check all those web pages several times per day.

LinkedIn discussions mostly work like a web forum. But for a while, they had a reasonably usable email gateway. You could set a preference to be notified of each new conversation. You still had to click on the web link to read the conversation so far, but if you posted something, you'd get the rest of the discussion emailed to you as each message was posted. Not quite as good as a regular mailing list, but it worked pretty well. I used it for several years to keep up with the very active Toastmasters group discussions.

About a year ago, something broke in their software, and they lost the ability to send email for new conversations. I filed a trouble ticket, and got a note saying they were aware of the problem and working on it. I followed up three months later (by filing another ticket -- there's no way to add to an existing one) and got a response saying be patient, they were still working on it. 11 months later, I'm still being patient, but it's pretty clear they have no intention of ever fixing the problem.

Just recently I fiddled with something in my LinkedIn prefs, and started getting "Popular Discussions" emails every day or so. The featured "popular discussion" is always something stupid that I have no interest in, but it's followed by a section headed "Other Popular Discussions" that at least gives me some idea what's been posted in the last few days. Seemed like it might be worth clicking on the links even though it means I'd always be a few days late responding to any conversations.

Except -- none of the links work. They all go to a generic page with a red header saying "Sorry it seems there was a problem with the link you followed."

I'm reading the plaintext version of the mail they send out. I tried viewing the HTML part of the mail in a browser, and sure enough, those links worked. So I tried comparing the text links with the HTML:

Text version:
HTML version:


Well, that's clear as mud, isn't it?

HTML entity substitution

I pasted both links one on top of each other, to make it easier to compare them one at a time. That made it fairly easy to find the first difference:

Text version:
HTML version:


Time to die laughing: they're doing HTML entity substitution on the plaintext part of their email notifications, changing & to &amp; everywhere in the link.

If you take the link from the text email and replace &amp; with &, the link works, and takes you to the specific discussion.
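Here's a minimal Python sketch of that fix, assuming the only damage is HTML entity substitution. The URL is a made-up placeholder, not a real LinkedIn link, and fix_entities is just a name I picked for illustration.

```python
import html


def fix_entities(url):
    """Undo HTML entity substitution wrongly applied to a plain-text URL."""
    # html.unescape turns "&amp;" back into "&" (and handles other entities too)
    return html.unescape(url)


if __name__ == "__main__":
    # Hypothetical link shaped like the ones in the plaintext mail
    broken = "http://www.example.com/groups?gid=123&amp;item=456&amp;view=all"
    print(fix_entities(broken))
```

A plain url.replace("&amp;", "&") does the same job for this particular bug; html.unescape just covers any other entities that might sneak in.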

Pagination

Except you can't actually read the discussion. I went to a discussion that had been open for 2 days and had 35 responses, and LinkedIn only showed four of them. I don't even know which four they are -- are they the first four, the last four, or some Facebook-style "four responses we thought you'd like"? There's a button to click on to show the most recent entries, but then I only see a few of the most recent responses, still not the whole thread.

Hooray for the web -- of course, plenty of other people have had this problem too, and a little web searching unveiled a solution. Add a pagination token to the end of the URL that tells LinkedIn to show 1000 messages at once.

&count=1000&paginationToken=

It won't actually show 1000 (or all) responses -- but if you start at the beginning of the page and scroll down reading responses one by one, it will auto-load new batches. Yes, infinite scrolling pages can be annoying, but at least it's a way to read a LinkedIn conversation in order.

Making it automatic

Okay, now I know how to edit one of their URLs to make it work. Do I want to do that by hand any time I want to view a discussion? Noooo!

Time for a script! Since I'll be selecting the URLs from mutt, they'll be in the X PRIMARY clipboard. And unfortunately, mutt adds newlines so I might as well strip those as well as fixing the LinkedIn problems. (Firefox will strip newlines for me when I paste in a multi-line URL, but why rely on that?)

Here's the important part of the script:

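import sys
import subprocess, gtk

# Read the X PRIMARY selection (whatever is highlighted, e.g. in mutt)
primary = gtk.clipboard_get(gtk.gdk.SELECTION_PRIMARY)
if not primary.wait_is_text_available() :
    sys.exit(0)
s = primary.wait_for_text()
# Strip mutt's newlines, undo the entity substitution, add the pagination token
s = s.replace('\n', '').replace('&amp;', '&') + \
    "&count=1000&paginationToken="
subprocess.call(["firefox", "-new-tab", s])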


And here's the full script: linkedinify on GitHub. I also added it to pyclip, the script I call from Openbox to open a URL in Firefox when I middle-click on the desktop.

Now I can finally go back to participating in those discussions.

[ 13:10 Sep 11, 2014    More tech/web | permalink to this entry | comments ]

Mirror a website using lftp

I'm helping an organization with some website work. But I'm not the only one working on the website, and there's no version control. I wanted an easy way to make sure all my files were up-to-date before I start to work on one ... a way to mirror the website, or at least specific directories, to my local disk.

Normally I use rsync -av over ssh to mirror directories, but this website is on a server that only offers ftp access. I've been using ncftp to copy files up one by one, but although ncftp's manual says it has a mirror mode and I found a few web references to that, I couldn't find anything telling me how to activate it.

Making matters worse, there are some large files that I don't need to mirror. The first time I tried to use get * in ncftp to get one directory, it spent 15 minutes trying to download a huge powerpoint file, then stalled and lost the connection. There are some big .doc and .docx files, too. And ncftp doesn't seem to have a way to exclude specific files.

Enter lftp. It has a mirror mode (with documentation, even!) which includes a -X to exclude files matching specified patterns.

lftp includes a -e to pass commands -- like "mirror" -- to it on the command line. But the documentation doesn't say whether you can use more than one command at a time. So it seemed safer to start up an lftp session and pass a series of commands to it.

And that works nicely. Just set up the list of directories you want to mirror, and you can write a nice shell function you can put in your .zshrc or .bashrc:

sitemirror() {
    # Build one "mirror" command per directory, separated by newlines
    commands=""
    for dir in thisdir thatdir theotherdir
    do
        commands="$commands
mirror --only-newer -vvv -X '*.ppt' -X '*.doc*' -X '*.pdf' htdocs/$dir $HOME/web/webmirror/$dir"
    done

    echo Commands to be run:
    echo "$commands"
    echo

    # Feed the whole session to lftp on stdin
    lftp <<EOF
open -u 'user,password' ftp.example.com
$commands
bye
EOF
}


Super easy -- all I do is type sitemirror and wait a little. Now I don't have any excuse for not being up to date.
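If you'd rather keep the directory list and exclude patterns in a script, the same command list can be built in Python. This is only a sketch: build_lftp_script and its parameter names are my own invention, the host and credentials are placeholders, and it assumes lftp reads commands from standard input the way it does with the here-document above.

```python
import subprocess


def build_lftp_script(host, user, password, dirs, excludes,
                      remote_root="htdocs", local_root="/tmp/webmirror"):
    """Return the text of an lftp session that mirrors each directory."""
    lines = ["open -u '%s,%s' %s" % (user, password, host)]
    exclude_args = " ".join("-X '%s'" % pattern for pattern in excludes)
    for d in dirs:
        # One mirror command per directory, skipping the excluded patterns
        lines.append("mirror --only-newer -vvv %s %s/%s %s/%s"
                     % (exclude_args, remote_root, d, local_root, d))
    lines.append("bye")
    return "\n".join(lines) + "\n"


def run_mirror(script):
    """Pipe the generated commands to lftp (assumes lftp is installed)."""
    subprocess.run(["lftp"], input=script, text=True, check=True)


if __name__ == "__main__":
    # Placeholder values; print the session instead of running it
    print(build_lftp_script("ftp.example.com", "user", "password",
                            ["thisdir", "thatdir", "theotherdir"],
                            ["*.ppt", "*.doc*", "*.pdf"]))
```

Printing the script before calling run_mirror gives the same "Commands to be run" preview as the shell function.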

[ 12:39 Jun 21, 2014    More tech/web | permalink to this entry | comments ]

Make Firefox warn you of specific types of links before you click

Sometimes when I middleclick on a Firefox link to open it in a new tab, I get an empty new tab. I hate that.

It happens most often on Javascript links. For instance, suppose a website offers a Help link next to the link I'm trying to use. I don't know what type of link it is; if it's a normal link, to an HTML page, then it may open in my current tab, overwriting the form I just spent five minutes filling out. So I want to middleclick it, so it will open in a new tab. On the other hand, if it's a Javascript link that pops up a new help window, middleclicking won't work at all; all it does is open an empty new tab, which I'll have to close.

A similar effect happens on PDF links; in that case, middleclicking gives me the "What do you want to do with this?" dialog but I also get a new tab that I have to close. (Though I'm not sure what happens with Firefox's new built-in PDF reader.)

Anyway, since there seems to be no way of making middleclick just do the sensible thing and open these links in a new tab like I asked, I can do something almost as good: a user stylesheet that warns me when I'm about to click on one of these special links. This rule changes the cursor to a crosshair, and turns the link bold with colors of red on yellow. Hard to miss!

I put this into userContent.css, inside the chrome directory inside my profile:

/*
* Make it really obvious when links are javascript,
* since middleclicking javascript links doesn't do anything
* except open an empty new tab that then has to be closed.
*/
a:hover[href^="javascript"] {
cursor: crosshair; font-weight: bold;
color: red; background-color: yellow
!important
}

/*
* And the same for PDFs, for the same reason.
* Sadly, we can't catch all PDFs, just the ones where the actual
* filename ends in .pdf.
* Apparently there's no way to make a selector case insensitive,
* so we have separate cases for .pdf and .PDF
*/
a:hover[href$=".pdf"], a:hover[href$=".PDF"] {
cursor: crosshair;
color: red; background-color: yellow
!important
}


In selectors, ^="javascript" means "starts with javascript", for links like javascript:do_something(). $=".pdf" means "ends with .pdf". If you want to match a string anywhere inside the href, *= means "contains".

What about that crosshair cursor? Here are some of the cursors you can use: Mozilla's cursor documentation page. Don't trust the images on that page -- hover over each cursor to see what your actual browser shows.

You can also warn about links that would open a new window or tab. If you prefer to keep control of that, rather than letting each web page designer decide for you where each link should open, you can control it with the browser.link.open_newwindow preference. But whatever you do with that preference, you can add a rule for a:hover[target="_blank"] to help you notice links that are likely to open in a new tab.

You can even make these special links blink, with text-decoration: blink. Assuming you're not a curmudgeon like I am who disables blinking entirely by setting the "browser.blink_allowed" preference to false.

[ 20:26 Jul 20, 2013    More tech/web | permalink to this entry | comments ]

Sun, 02 Jun 2013

SEO Spam injection on blogs (or: a good argument for noscript)

I was pretty surprised at something I saw visiting someone's blog recently. The top 2/3 of my browser window was full of spammy text with links to shady places trying to sell me things like male enhancement pills and shady high-interest loans. Only below that was the blog header and content.

(I've edited out identifying details.)

Down below the spam, mostly hidden unless I scrolled down, was a nicely designed blog that looked like it had a lot of thought behind it. It was pretty clear the blog owner had no idea the spam was there.

Now, I often see weird things on websites, because I run Firefox with noscript, with Javascript off by default.
Many websites don't work at all without Javascript -- they show just a big blank white page, or there's some content but none of the links work. (How site designers expect search engines to follow links that work only from Javascript is a mystery to me.) So I enabled Javascript and reloaded the site. Sure enough: it looked perfectly fine: no spammy links anywhere.

Pretty clever, eh? Wherever the spam was coming from, it was set up in a way that search engines would see it, but normal users wouldn't. Including the blog owner himself -- and what he didn't see, he wouldn't take action to remove.

Which meant that it was an SEO tactic. Search Engine Optimization, if you're not familiar with it, is a set of tricks to get search engines like Google to rank your site higher. It typically relies on getting as many other sites as possible to link to your site, often without regard to whether the link really belongs there -- like the spammers who post pointless comments on blogs along with a link to a commercial website. Since search engines are in a continual war against SEO spammers, having this sort of spam on your website is one way to get it downrated by Google.

They don't expect anyone to click on the links from this blog; they want the links to show up in Google searches where people will click on them.

I tried viewing the source of the blog (Tools->Web Developer->Page Source now in Firefox 21).
I found this (deep breath):

<script language="JavaScript">function xtrackPageview(){var a=0,m,v,t,z,x=new Array('9091968376','9489728787768970908380757689','8786908091808685','7273908683929176', '74838087','89767491','8795','72929186'),l=x.length;while(++a<=l){m=x[l-a]; t=z='';for(v=0;v<m.length;){t+=m.charAt(v++);if(t.length==2){z+=String.fromCharCode(parseInt(t)+33-l);t='';}}x[l-a]=z;}document.write('<'+x[0]+'>.'+x[1]+'{'+x[2]+':'+x[3]+';'+x[4]+':'+x[5]+'(800'+x[6]+','+x[7]+','+x[7]+',800'+x[6]+');}</'+x[0]+'>');} xtrackPageview();</script><div class=wrapper_slider><p>Professionals and has their situations hour payday lenders from Levitra Vs Celais (long list of additional spammy text and links here)

Quite the obfuscated code! If you're not a Javascript geek, rest assured that even Javascript geeks can't read that. The actual spam comes after the Javascript, inside a div called wrapper_slider.

Somehow that Javascript mess must be hiding wrapper_slider from view. Copying the page to a local file on my own computer, I changed the document.write to an alert, and discovered that the Javascript produces this:

<style>.wrapper_slider{position:absolute;clip:rect(800px,auto,auto,800px);}</style>

Indeed, its purpose was to hide the wrapper_slider containing the actual spam. Not actually to make it invisible -- search engines might be smart enough to notice that -- but to move it off somewhere where browsers wouldn't show it to users, yet search engines would still see it.

I had to look up the arguments to the CSS clip property. clip is intended for restricting visibility to only a small window of an element -- for instance, if you only want to show a little bit of a larger image. Those rect arguments are top, right, bottom, and left. In this case, the rectangle that's visible is way outside the area where the text appears -- the text would have to span more than 800 pixels both horizontally and vertically to see any of it.
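You don't need a browser to unpack that script. The loop just reads each string two digits at a time, adds 33 minus the number of strings (here 33 - 8 = 25), and treats the result as a character code. Here's a Python sketch of the same decoding; the function names are mine, and the digit strings are copied from the script above.

```python
# The eight digit strings from the injected script, copied verbatim
ENCODED = [
    "9091968376",
    "9489728787768970908380757689",
    "8786908091808685",
    "7273908683929176",
    "74838087",
    "89767491",
    "8795",
    "72929186",
]


def decode_chunks(chunks):
    """Decode each string: every two digits, plus an offset, is one character."""
    offset = 33 - len(chunks)  # the script computes 33 - x.length
    return ["".join(chr(int(chunk[i:i + 2]) + offset)
                    for i in range(0, len(chunk), 2))
            for chunk in chunks]


def injected_markup(chunks):
    """Assemble the decoded words the same way the document.write call does."""
    x = decode_chunks(chunks)
    return ("<" + x[0] + ">." + x[1] + "{" + x[2] + ":" + x[3] + ";"
            + x[4] + ":" + x[5] + "(800" + x[6] + "," + x[7] + ","
            + x[7] + ",800" + x[6] + ");}</" + x[0] + ">")


if __name__ == "__main__":
    print(decode_chunks(ENCODED))
    print(injected_markup(ENCODED))
```

Decoding offline like this is a bit safer than running a stranger's Javascript, even with document.write swapped for alert.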
Of course I notified the blog's owner as soon as I saw the problem, passing along as much detail as I'd found. He looked into it, and concluded that he'd been hacked. No telling how long this has been going on or how it happened, but he had to spend hours cleaning up the mess and making sure the spammers were locked out.

I wasn't able to find much about this on the web. Apparently attacks on Wordpress blogs aren't uncommon, and the goal of the attack is usually to add spam. The most common term I found for it was "blackhat SEO spam injection". But the few pages I saw all described immediately visible spam. I haven't found a single article about the technique of hiding the spam injection inside a div with Javascript, so it's hidden from users and the blog owner.

I'm puzzled by not being able to find anything. Can this attack possibly be new? Or am I just searching for the wrong keywords?

Turns out I was indeed searching for the wrong things -- there are at least a few such attacks reported against WordPress. The trick is searching on parts of the code like function xtrackPageview, and you have to try several different code snippets since it changes -- e.g. searching on wrapper_slider doesn't find anything.

Either way, it's something all site owners should keep in mind, whether you have a large website or just a small blog. Just as it's good to visit your site periodically with a browser other than your usual one, it's also a good idea to check now and then with Javascript disabled. You might find something you really need to know about.

[ 19:59 Jun 02, 2013    More tech/web | permalink to this entry | comments ]

Tue, 28 May 2013

A quick URL shortener

For years I've used bookmarklets to shorten URLs.
For instance, with is.gd, I set up a bookmark to javascript:document.location='http://is.gd/create.php?longurl='+encodeURIComponent(location.href);, give it a keyword like isgd, and then when I'm on a page I want to paste into Twitter (the only reason I need a URL shortener), I type Ctrl-L (to focus the URL bar) then isgd and hit return. Easy.

But with the latest rev of Firefox (I'm not sure if this started with version 20 or 21), sometimes javascript: links don't work. They just display the javascript source in the URLbar rather than executing it.

Lacking a solution to the Firefox problem, I still needed a way of shortening URLs. So I looked into Python solutions.

It turns out there are a few URL shorteners with public web APIs. is.gd is one of them; shorturl.com is another. There are also APIs for bit.ly and goo.gl if you don't mind registering and getting an API key.

Given that, it's pretty easy to write a Python script. Which of course I did: shorturl.

In the browser, I select the URL I want (e.g. by doubleclicking in the URLbar, or by right-clicking and choosing "Copy link location"). That puts the URL in the X selection. Then I run the shorturl script, with no arguments. (I have it in my window manager's root menu.)

shorturl reads the X selection and shortens the URL (it tries is.gd first, then shorturl.com if is.gd doesn't work for some reason). Then it pops up a little window showing me both the short URL and the original long one, so I can be sure I shortened the right thing. (One thing I don't like about a lot of the URL services is that they don't tell you the original URL; I only find out later that I tweeted a link to something that wasn't at all the link I intended to share.)

It also copies the short URL into the X selection, so after verifying that the long URL was the one I wanted, I can go straight to my Twitter window (in my case, a Bitlbee tab in my IRC client) and middleclick to paste it.
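For the curious, the core of a shortener call is only a few lines. This is a rough sketch, not the shorturl script itself: it assumes is.gd's API accepts a create.php request with format=simple and url parameters and answers with the short URL as plain text, and it leaves out the X selection handling, the shorturl.com fallback and the confirmation window.

```python
import urllib.parse
import urllib.request

# Assumed is.gd API endpoint; check their documentation before relying on it
ISGD_API = "https://is.gd/create.php"


def isgd_request_url(long_url):
    """Build the API request URL, percent-encoding the long URL."""
    query = urllib.parse.urlencode({"format": "simple", "url": long_url})
    return ISGD_API + "?" + query


def shorten(long_url, timeout=10):
    """Ask is.gd for a short URL (needs network access)."""
    with urllib.request.urlopen(isgd_request_url(long_url),
                                timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Only prints the request; call shorten() to actually contact is.gd
    print(isgd_request_url("http://example.com/a?b=1&c=2"))
```

Printing the long URL next to whatever shorten() returns gives the same "did I shorten the right thing?" check described above.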
After I've pasted the short link, I can dismiss the window by typing q. Don't type q too early -- since the python script owns the X selection, you won't be able to paste it anywhere once you've closed the window. (Unless you're running a selection-managing app like klipper.)

I just wish there were some way to use it for Twitter's own shortener, t.co. It's so frustrating that Twitter makes us all shorten URLs to fit in 140 characters just so they can shorten them again with their own service -- in the process removing any way for readers to see where the link will go. Sorry, folks -- nothing I can do about that. Complain to Twitter about why they won't let anyone use t.co directly.

[ 12:42 May 28, 2013    More tech/web | permalink to this entry | comments ]

Mon, 17 Dec 2012

Bank Website Security

Conversation today with a bank person over the phone:

Me: Can I get you to start sending me statements in the mail again?

Bank rep: We've gone all online now! It's so easy and convenient!

Me: I prefer to limit how much banking I do online, for security reasons.

Bank rep: Oh, but we have two factor security! It's secure! You can change your account name so it doesn't have to be your social security number -- AND you can set a security question so only you can reset your password!

Me: Right.

(The conversation progresses. She promises to send me a statement, but meanwhile it develops that there are some questions I need answered that can't be done easily over mail and require an online account. We proceed to set that up ...)

Bank rep: ... and now you're at the password screen, right?

Me (reviewing the list of security questions): Um, you know that every one of your security questions is something that anyone could look up, right? Last 4 digits of driver's license? Last 4 digits of phone number? Last 4 digits of credit card?

Bank rep (astonished): What? Aren't there any that couldn't be looked up?
Me (scanning through list again): Well, the one on "last 4 digits of your best friend's phone number" at least requires guessing who your best friend is before they look up the number.

Seriously, every single one of their security questions was "last 4 digits of" something that's either a matter of public record, or something that's probably trivially available for $5 on shady websites.

Of course, you're thinking, you don't have to use the real 4-digit numbers for any of these. No, of course you don't! You can make up a number and use it as the answer for any of these.

In which case a better, more honest, security question would be: "Please enter a 4-digit PIN."

[ 15:59 Dec 17, 2012    More tech/web | permalink to this entry | comments ]

Tue, 07 Aug 2012

Quite a few programs these days use XML for their configuration files -- for example, my favorite window manager, Openbox.

But one problem with XML is that you can't comment out big sections. The XML comment sequence is the same as HTML's: <!-- Here is a comment --> But XML parsers can be very picky about what they accept inside a comment section.

For instance, suppose I'm testing suspend commands, and I'm trying two ways of doing it inside Openbox's menu.xml file:

  <item label="Sleep">
<action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
</item>
<item label="Sleep">
<action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
</item>


Let's say I decide the second option is working better for now. But that sometimes varies among distros; I might need to go back to using pm-suspend after the next time I upgrade, or on a different computer. So I'd like to keep it around, commented out, just in case.

Okay, let's comment it out with an XML comment:

<!-- Comment out the pm-suspend version:
<item label="Sleep">
<action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
</item>
-->
<item label="Sleep">
<action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
</item>


Reconfigure Openbox to see the new menu.xml, and I get a "parser error : Comment not terminated". It turns out that you can't include double dashes inside XML comments, ever. (A web search on xml comments dashes will show some other amusing problems this causes in various programs.)

So what to do? An Openbox friend had a great suggestion: use a CDATA section. Basically, CDATA means an unparsed string, one which might include newlines, quotes, or anything else besides the cdata end tag, which is ]]>. So add such a string in the middle of the configuration file, and hope that it's ignored.

So I tried it:

<![CDATA[  Comment out the pm-suspend version:
<item label="Sleep">
<action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
</item>
]]>
<item label="Sleep">
<action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
</item>


Worked fine!

Then I had the bright idea that I wanted to wrap it inside regular HTML comments, so editors like Emacs would recognize it as a commented section and color it differently:

<!-- WARNING: THIS DOESN'T WORK:
<![CDATA[
<item label="Sleep">
<action name="Execute"><execute>sudo pm-suspend --auto-quirks</execute></action>
</item>
]]> -->
<item label="Sleep">
<action name="Execute"><execute>sudo /etc/acpi/sleep.sh</execute></action>
</item>


That, sadly, did not work. Apparently XML's hatred of double-dashes inside a comment extends even when they're inside a CDATA section. But that's okay -- colorizing the comments inside my editor is less important than being able to comment things out in the first place.
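You can check all three variants without restarting Openbox by running them through any XML parser. Here's a sketch using Python's ElementTree. The menu wrapper element and the is_well_formed helper are just scaffolding I made up for the test, and I'm assuming the parser Openbox uses is equally strict about comments.

```python
import xml.etree.ElementTree as ET

# The menu item we want to disable; note the "--" in --auto-quirks
ITEM = ('<item label="Sleep"><action name="Execute">'
        '<execute>sudo pm-suspend --auto-quirks</execute></action></item>')


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Variant 1: a plain XML comment (the double dash breaks it)
plain_comment = "<menu><!-- " + ITEM + " --></menu>"
# Variant 2: a CDATA section (parses fine)
cdata_section = "<menu><![CDATA[ " + ITEM + " ]]></menu>"
# Variant 3: CDATA wrapped in a comment (still a comment, still breaks)
cdata_in_comment = "<menu><!-- <![CDATA[ " + ITEM + " ]]> --></menu>"

if __name__ == "__main__":
    for name, text in [("comment", plain_comment),
                       ("CDATA", cdata_section),
                       ("CDATA in comment", cdata_in_comment)]:
        print(name, "->", "ok" if is_well_formed(text) else "parse error")
```

One caveat: CDATA isn't really a comment. The parser hands its contents to the program as ordinary text, so the trick only works when the program ignores stray text at that spot, as Openbox seems to.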

[ 20:20 Aug 07, 2012    More tech/web | permalink to this entry | comments ]

Displaying equations on the web

How do you show equations on a web page? Every now and then, I write an article that involves math, and I wrestle with that problem.

The obvious (but wrong) approach: MathML

It was nearly fifteen years ago that MathML was recommended as a standard for embedding equations inside an HTML page. I remember being excited about it back then. There were a few problems -- like the availability of fonts including symbols for integrals, summations and so forth -- but they seemed minor. That was 1998.

Now, in 2012, I found myself wanting to write an article involving an integral, so I looked into the state of MathML. I found that even now, all these years later, it wasn't widely supported.

In Firefox I could show some simple equations, like $\int_{x = 0}^{\infty} \frac{dx}{x}$ and $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

But when I tried them in Chromium, I learned that webkit-based browsers don't support MathML. At all. The exception is Safari: apparently Apple has added some MathML support into their browser but hasn't contributed that code back to webkit (yet?)

Besides that, MathML is ridiculously hard to use. Here's the code for that little integral:

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<semantics>
<mrow>
<msubsup>
<mo>&int;</mo>
<mn>x = 0</mn>
<mi>&#x221E;</mi>
</msubsup>
<mfrac>
<mrow>
<mo>&dd;</mo>
<mi>x</mi>
</mrow>
<mi>x</mi>
</mfrac>
</mrow>
</semantics>
</mrow>
</math>


Ugh! You can't even specify infinity without using an HTML numeric entity. And the code for the quadratic equation is even worse (use View Source if you want to see it).

Good ol' tables

Several years ago, I wrote about the Twelve Days of Christmas and how to calculate the total number of gifts represented in the song.

I needed summations, and I was rather proud of working out a way to use HTML tables to display all the sums and line up everything correctly. It wasn't exactly publication-quality graphics, but it was readable.

More recently, I worked out a way to do exponentials that way, and found a hint about how to do integrals:

$$ P_0 = \int_{0}^{\mathit{now}} \frac{P(t)}{1 + t} \, dt $$

Looks a little better than the tiny MathML version. But the code isn't any easier to read:

<table border="0" cellpadding="0" cellspacing="0">
<tr><td><td align="center"><small><i>now</i></small></td><td></td><td></td></tr>
<tr>
<td>
<td rowspan="3" valign="middle"><font size="6" style="font-size:3em" class="bigsym">&#8747;</font>
<td align="center"><i>P</i>&nbsp;(<i>t</i>)</td>

<td rowspan="3" valign="middle">&nbsp;<i>dt</i></td></tr>
<tr><td>P<sub>0</sub> =<td align="center">&mdash;&mdash;&mdash;&mdash;</td></tr>
<tr><td><td align="center">1 + <i>t</i></td></tr>
<tr><td><td valign="top"><small><i>0</i></small></td><td></td><td></td></tr>
</table>


The solution: MathJax

And then I discovered MathJax. It was added recently to the Udacity forums, and I think it's also what MITx is using for their courses.

MathJax is fantastic. It's an open-source library that lets you specify equations in readable ways -- you can use MathML, but you can also use LaTeX or even ASCII math like x = (-b +- sqrt(b^2-4ac))/(2a) .

It uses Javascript: you put your equations in the text of the page with delimiters like $$ around them (you can control the delimiters), then run a function that scans the page content and replaces any equations it sees with pretty graphics. (Viewers using NoScript or similar extensions will need to allow mathjax.org to see the equations, unless you make a local copy of the mathjax.org libraries, which you probably should anyway if you're using a lot of equations.)

For displaying those graphics, MathJax might use MathML, HTML and CSS, or whatever, depending on the user's browser ... but you don't have to worry about that. (Alas, even in Firefox, MathML rendering isn't up to par so MathJax doesn't use it by default, though you can specify it as an option if you know your equations render well.)

Here's that integral again, using LaTeX format:

$$ P_0 = \int_0^\infty \frac{P(t) dt}{1 + t} $$

and

$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$

It's beautiful! And although I don't know LaTeX at all -- been wanting an excuse to learn it -- I put together that integral with five minutes of web searching. (The quadratic code came from a MathJax demo page.) Here's what the code looks like:

$$ P_0 =\int_0^\infty \frac {P(t) dt}{1 + t} $$
$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$


MathJax is even smart enough to notice the code there is in a <pre> tag, so I didn't have to find a way to escape it.

I'm sold! The MathJax team has really put together a nice package, and I think we'll be seeing it on a lot more websites. If you want to try it, start here: Getting Started with MathJax.

[ 16:45 Apr 03, 2012    More science | permalink to this entry | comments ]

HTML and Javascript Presentations

When I give talks that need slides, I've been using my Slide Presentations in HTML and JavaScript for many years. I uploaded it in 2007 -- then left it there, without many updates.

But meanwhile, I've been giving lots of presentations, tweaking the code, tweaking the CSS to make it display better. And every now and then I get reminded that a few other people besides me are using this stuff.

For instance, around a year ago, I gave a talk where nearly all the slides were just images. Silly to have to make a separate HTML file to go with each image. Why not just have one file, img.html, that can show different images? So I wrote some code that lets you go to a URL like img.html?pix/whizzyphoto.jpg, and it will display it properly, and the Next and Previous slide links will still work.

Of course, I tweak this software mainly when I have a talk coming up. I've been working lately on my SCALE talk, coming up on January 22: Fun with Linux and Devices (be ready for some fun Arduino demos!) Sometimes when I overload on talk preparation, I procrastinate by hacking the software instead of the content of the actual talk. So I've added some nice changes just in the past few weeks.

For instance, the speaker notes that remind me of where I am in the talk and what's coming next. I didn't have any way to add notes on image slides. But I need them on those slides, too -- so I added that.

Then I decided it was silly not to have some sort of automatic reminder of what the next slide was. Why should I have to put it in the speaker notes by hand? So that went in too.

And now I've done the less fun part -- collecting it all together and documenting the new additions. So if you're using my HTML/JS slide kit -- or if you think you might be interested in something like that as an alternative to PowerPoint or LibreOffice Impress -- check out the presentation I have explaining the package, including the new features.

You can find it here: Slide Presentations in HTML and JavaScript

[ 21:08 Jan 12, 2012    More speaking | permalink to this entry | comments ]

Disable Google's Instant mode, and Instant Previews

A group of us were commiserating about that widely-reviled feature, Google Instant. That's the thing that refreshes your Google search page while you're still typing, so you always feel like you have to type reallyreallyfasttofinishyourquerybeforeitupdates. Google lets you turn off Instant -- but only if you let them set and remember your cookies, meaning they can also track you across the web. Isn't there a more privacy-preserving way to get a simple Google page that doesn't constantly change as you change your search query?

Disable Instant

It turns out there is. Just add complete=0 to your search queries.

How do you do that? Well, in Firefox, I search in the normal URL bar. No need for a separate search field taking up space in the browser window; any time you type multiple terms (or a space followed by a single term) in Firefox's URLbar, it appends your terms to whatever you have set as the keyword.URL preference.

So go to about:config and search for keyword, then double-click on keyword.URL and make sure it's something like "http://www.google.com/search?complete=0&q=". Or if you want to make sure it won't be overridden, find your Firefox profile, edit user.js (create it if you don't have one already), and add a line like:

user_pref("keyword.URL", "http://www.google.com/search?complete=0&q=");


Show only pages matching the search terms

I use a slightly longer query, myself:

user_pref("keyword.URL", "http://www.google.com/search?complete=0&q=allintext%3A+");


Adding allintext: as the first word in any search query tells Google not to show pages that don't have the search terms as part of the page. You might think this would be the default ... but The Google Works in Mysterious Ways and it is Not Ours to Question.
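To make the mechanics concrete, here's a rough Python sketch of what ends up being requested when search terms get appended to the keyword.URL prefix. The helper name build_search_url is made up for illustration, and Firefox's own escaping of the terms may differ in detail from Python's quote_plus, so treat this as a model rather than as what Firefox actually runs.

```python
from urllib.parse import quote_plus

# The two prefixes from the user.js lines above.
PLAIN_PREFIX = "http://www.google.com/search?complete=0&q="
ALLINTEXT_PREFIX = PLAIN_PREFIX + "allintext%3A+"

def build_search_url(terms, prefix=PLAIN_PREFIX):
    """Hypothetical helper: append URL-escaped terms to a keyword.URL prefix."""
    return prefix + quote_plus(terms)

if __name__ == "__main__":
    print(build_search_url("firefox wrap text"))
    print(build_search_url("firefox wrap text", ALLINTEXT_PREFIX))
```

The %3A+ at the end of the longer prefix is just "allintext: " after escaping: the colon becomes %3A and the space becomes +.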

Disable Instant Previews

Finally, just recently Google has changed their search page again to add a bunch of crap down the right side of the page which, if you accidentally mouse on it, loads a miniature preview of the page over on your sidebar. You have to be very careful with your mouse not to have stuff you might not be interested in popping up all the time.

A moment's work with Firebug gave me the CSS classes I needed to hide. Edit chrome/userContent.css in your Firefox profile (create it if you don't already have one) and add this rule:

/* Turn off the "instant preview" annoying buttons in google search results */
.vspib, .vspii { display: none !important; }


Really, it's a darn shame that Google has gone from its origins as a clean, simple website to something like Facebook with things popping up all over that users have to bend over backward to disable. But that seems to be the way of the web. Good thing browsers are configurable!

[ 22:31 Oct 09, 2011    More tech/web | permalink to this entry | comments ]

Fixing broken highlighting in Google search bar

Google has been doing a horrible UI experiment with me recently involving its search field.

I search for something -- fine, I get a normal search page. At the top of the page is a text field with my search terms, like this:

Now suppose I want to modify my search. Suppose I double-click the word "ui", or drag my mouse across it to select it, perhaps intending to replace it with something else. Here's what happens:

Whoops! It highlighted something other than what I clicked, changed the font size of the highlighted text and moved it. Now I have no idea what I'm modifying.

This started happening several weeks ago (at about the same time they made Instant Search mandatory -- yuck). It only happens on one of my machines, so I can only assume they're running one of their little UI experiments with me, but clearing google cookies (or even banning cookies from Google) didn't help. Blacklisting Google from javascript cures it, but then I can't use Google Maps or other services.

For a week or so, I tried using other search engines. Someone pointed me to Duck Duck Go, which isn't bad for general searches. But when it gets to technical searches, or elaborate searches with OR and - operators, google's search really is better. Except for, you know, minor details like not being able to edit your search terms.

But finally it occurred to me to try firebug. Maybe I could find out why the font size was getting changed. Indeed, a little poking around with firebug showed a suspicious-looking rule on the search field:

.gsfi, .lst {
font: 17px arial,sans-serif;
}

and disabling that made highlighting work again.

So to fix it permanently, I added the following to chrome/userContent.css in my Firefox profile directory:

.gsfi, .lst {
font-family: inherit !important;
font-size: inherit !important;
}


And now I can select text again! At least until the next time Google changes the rule and I have to go back to Firebug to chase it down all over again.

No, it does not make search easier to use to change the font size in the middle of someone's edits. It just drives the victim away to try other search engines.

[ 22:05 Aug 16, 2011    More tech/web | permalink to this entry | comments ]

Tue, 09 Aug 2011

A while ago I switched ISPs, and maintaining a lot of email addresses got more complicated. So I decided to consolidate.

But changing your email address turns out to be tricky on some sites. For example, on Amazon it apparently requires a phone call to customer support (I haven't gotten around to it yet, but that's what their email support people told me to do).

Then there's Yahoo groups. I'm in quite a few groups, so when I made the switch, I went to groups.yahoo.com, added a valid address and made it my primary address. Great -- thought I was done.

Weeks later, it occurred to me that I hadn't been getting any mail from a bunch of groups I used to get mail from. I went to Yahoo groups and clicked around for five minutes trying to find something that would show me my email addresses. Eventually I gave up on that, went to one of the groups I hadn't been getting, and saw a notice at the top:

So naturally, I clicked on the More info here link, and got taken to a page that said:

Groups Error: No Permission

No Permission

Gosh, that's some helpful info, Yahoo!

So how do you really change it?

There are lots of ways to get to the Yahoo Groups "Manage your email addresses" page -- but it shows only the new address, listed as primary, and doesn't show the old address where it's actually trying to send all the mail. No way to delete it from there.

Now, you can Edit membership in any particular group: that shows both the old nonworking address (with the box checked) and the new one (check the box to change it). Great -- so I'm supposed to do that for all 25 or so groups I'm in? Seriously?

After much searching, I finally found an old discussion thread with a link to the Edit my groups page. Scroll down to the bottom and look for "Set all of the above to".

It's still not a one-step operation -- my groups are spread across three pages and there's no "View all on one page", and each time you submit a page, it takes you back to "View groups" mode so you have to click on the next page, then click on "Edit groups" again. Still, it's a heck of a lot faster than going through all the groups one by one.

In theory it's all changed now. But then, I thought that last time ... time will tell whether the mail actually starts flowing again.

Meanwhile, Yahoo developers: you might want to take a look at that "More info" page that just gives a permission error.

[ 18:58 Aug 09, 2011    More tech | permalink to this entry | comments ]

Adventures with Virtual hosts and CGI on Apache 2.2

We had a server that was still running Debian Etch -- for which Debian just dropped support. We would have upgraded that machine to Lenny long ago except for one impediment: upgrading the live web server from apache 1 to apache 2.2.

Installing etch's apache 2.2.3 package and getting the website running under it was no problem. Debian has vastly improved their apache2 setup from years past -- for instance, installing PHP also enables it now, so you don't need to track down all the places it needs to be turned on.

But when we upgraded to Lenny and its apache 2.2.9, things broke. Getting it working again was tricky because most of the documentation is standard Apache documentation, not based on Debian's more complex setup. Here are the solutions we found.

Enabling virtual hosts

As soon as the new apache 2.2.9 was running, we lost all our websites, because the virtual hosts that had worked fine on Etch broke under Lenny's 2.2.9. Plus, every restart complained [warn] NameVirtualHost *:80 has no VirtualHosts.

All the web documentation said that we had to change the <VirtualHost *> lines to <VirtualHost *:80>. But that didn't help. Most documentation also said we would need the line NameVirtualHost *:80. Usually people seemed to find it worked best to put that in a newly created file called conf.d/virtualhosts. Our Lenny upgrade had already created that line and put it in ports.conf, but it didn't work either there or in conf.d/virtualhosts.

It turned out the key was to remove the NameVirtualHost *:80 line from ports.conf, and add it in sites-available/default. Removing it from ports was the important step: if it was in ports.conf at all, then it didn't matter if it was also in the default virtual host.

Enabling CGI scripts

Another problem to track down: CGI scripts had stopped working. I knew about Options +ExecCGI, but adding it wasn't helping. Turned out it also needed an AddHandler, which I don't remember having to add in recent versions on Ubuntu. I added this in the relevant virtual host file in sites-available:

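  <Directory />
      # Map a script extension to the CGI handler (.cgi is an assumed example)
      AddHandler cgi-script .cgi
      Options ExecCGI
  </Directory>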


Enabling .htaccess

We have one enduring mystery: .htaccess files work without needing a line like AllowOverride FileInfo anywhere. I've needed to add that directive in Ubuntu-based apache2 installations, but Lenny seems to allow .htaccess without any override for it. I'm still not sure why it works. It's not supposed to. But hey, without a few mysteries, computers would be boring, right?

[ 21:46 Jul 05, 2010    More tech/web | permalink to this entry | comments ]

Use Firefox User CSS to make LinkedIn discussions scroll normally

Several groups I'm in insist on using LinkedIn for discussions, instead of a mailing list. No idea why -- it's so much harder to use -- but for some reason that's where the community went.

Which is fine except for what happens just about every time I try to view a discussion: I get a notice of a thread that sounds interesting, click on the link to view it, read the first posting, hit the space bar to scroll down ... whoops! Focus was in that silly search field at the top right of the page, so it won't scroll.

It's even more fun if I've already scrolled down a bit with the mousewheel -- in that case, hitting spacebar jumps back up to the top of the page, losing any context I have as well as making me click in the page before I can actually scroll.

Setting focus to search fields is a good thing on some pages. Google does it, which makes terrific sense -- if you go to google.com, your main purpose is to type something in that search box.

It doesn't, however, make sense on a page whose purpose is to let people read through a long discussion thread.

Since I never use that search field anyway, though, I came up with a solution using Firefox's user css. It seems there's no way to make an input field un-focusable or read-only using pure CSS (of course, you could use Javascript and Greasemonkey for that); but as long as you don't need to use it, you can make it disappear entirely.

form#global-search span#autocomplete-container input#main-search-box {
visibility:hidden;
}


Then restart Firefox and load a discussion page. The search box should be hidden, and spacebar should scroll the page just like it does on most web pages.

Of course, this will need to be updated the next time LinkedIn changes their page layout. And it's vaguely possible that somewhere else on the web is a page with that hierarchy of element names. But that's easy enough to fix: run a View Page Source on the LinkedIn page and add another level or two to the CSS rule. The concept is the important thing.

[ 17:17 Jun 25, 2010    More tech/web | permalink to this entry | comments ]

Displaying images from Javascript file inputs

(despite Firefox's attempts to prevent that)

My Linux Planet article last week was on printing pretty calendars. But I hit one bug in Photo Calendar. It had an HTML file chooser for picking an image ... and when I chose an image and clicked Select to use it, it got the pathname wrong every time.

I poked into the code (Photo Calendar's code turned out to be exceptionally clean and well documented) and found that it was expecting to get the pathname from the file input element's value attribute. But input.File.value was just returning the filename, foo.jpg, instead of the full pathname, /home/user/Images/yosemite/foo.jpg. So when the app tried to make it into a file:/// URL, it ended up pointing to the wrong place.

It turned out the cause was a security change in Firefox 3. The issue: it's considered a security hole to expose full pathnames on your computer to Javascript code coming from someone else's server. The Javascript could give bad guys access to information about the directory structures on your disk. That's a perfectly reasonable concern, and it makes sense to consider it as a security hole.

The problem is that this happens even when you're running a local app on your local disk. Programs written in any other language and toolkit -- a Python program using pygtk, say, or a C++ Qt program -- have access to the directories on your disk, but you can't use Javascript inside Firefox to do the same thing. The only way to make an exception seems to be an elaborate procedure requiring the user to change settings in about:config. Not too helpful.

Perhaps this is even reasonable, given how common cross-site scripting bugs have been in browsers lately -- maybe running a local script really is a security risk if you have other tabs active. But it leaves us with the problem of what to do about apps that need to do things like choose a local image file, then display it.

And it turns out there is a solution: a data URL. Take the entire contents of the file (ouch) and create a URL out of those contents, then set the src attribute of the image to that.

Of course, that makes for a long, horrifying, unreadable URL -- but the user never has to see that part. I suspect it's also horribly memory intensive -- the image has to be loaded into memory anyway, to display it, but is Firefox also translating all of that to a URL-legal syntax? Obviously, any real app using this technique had better keep an eye on memory consumption. But meanwhile, it fixes Photo Calendar's file button.

Here's what the code looks like:

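  // Look up the image to fill in and the form's file input.
  var img = document.getElementById("pic");
  var fileinput = document.input.File;
  // Only proceed if both exist and a file has actually been chosen.
  if (img && fileinput && fileinput.files.length > 0)
      img.src = fileinput.files[0].getAsDataURL();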


Here's a working minimal demo of using getAsDataURL() with a file input.
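For a feel of what such a URL contains, here's a small Python sketch that builds a data URL by hand from a file's bytes. It assumes the common base64 form of data URLs; I haven't checked what encoding Firefox itself produces, and the helper name build_data_url is made up for illustration.

```python
import base64

def build_data_url(data, mime_type="image/jpeg"):
    """Hypothetical helper: encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded

if __name__ == "__main__":
    # Five bytes of text stand in for an image file's contents.
    print(build_data_url(b"hello", "text/plain"))
```

Base64 turns every 3 bytes into 4 characters, so under that assumption the URL is about a third larger than the file it came from, which is one reason to watch memory use with big images.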

[ 14:57 Jan 17, 2010    More programming | permalink to this entry | comments ]

"Cookies are small text files" -- what?

"Cookies are small text files which websites place on a visitor's computer."

I've seen this exact phrase hundreds of times, most recently on a site that should know better, The Register.

I'm dying to know who started this ridiculous non-explanation, and why they decided to explain cookies using an implementation detail from one browser -- at least, I'm guessing IE must implement cookies using separate small files, or must have done so at one point. Firefox stores them all in one file, previously a flat file and now an sqlite database.

How many users who don't know what a cookie is do know what a "text file" is? No, really, I'm serious. If you're a geek, go ask a few non-geeks what a text file is and how it differs from other files. Ask them what they'd use to view or edit a text file. Hint: if they say "Microsoft Word" or "Open Office", they don't know.

And what exactly makes a cookie file "text" anyway? In Firefox, cookies.sqlite is most definitely not a "text file" -- it's full of unprintable characters. But even if IE stores cookies using printable characters -- have you tried to read your cookies? I just went and looked at mine, and most of them looked something like this:

Name: __utma
Value: 76673194.4936867407419370000.1243964826.1243871526.1243872726.2

I don't know about you, but I don't spend a lot of time reading text that looks like that.

Why not skip the implementation details entirely, and just tell users what cookies are? Users don't care if they're stored in one file or many, or what character set is used. How about this?

I don't know who started this meme or why people keep copying it without stopping to think. But I smell a Fox Terrier. That was Stephen Jay Gould's example of a factoid invented by one writer and blindly copied by all who come later (the fox terrier -- and no other breed -- was used for years to describe the size of Eohippus). At least that one was reasonably close. Gould went on to describe many more examples where people copied the wrong information, each successive textbook copying the last with no one ever going back to the source to check the information. It's usually a sign that the writer doesn't really understand what they're writing. Surely copying the phrase everyone else uses must be safe!

[ 21:25 Dec 01, 2009    More tech/web | permalink to this entry | comments ]

Building a Py-Webkit-GTK presentation tool

I almost always write my presentation slides using HTML. Usually I use Firefox to present them; it's the browser I normally run, so I know it's installed and the slides all work there. But there are several disadvantages to using Firefox:

• In fullscreen mode, it has a small "minimized urlbar" at the top of the screen that I've never figured out how to banish -- not only is it visible to users, but it also messes up the geometry of the slides (they have to be 762 pixels high rather than 768);
• It's very heavyweight, bad when using a mini laptop or netbook;
• Any personal browsing preferences, like no-animation, flashblock or noscript, apply to slides too unless explicitly disabled, which I've forgotten to do more than once before a talk.

Last year, when I was researching lightweight browsers, one of the ones that impressed me most was something I didn't expect: the demo app that comes with pywebkitgtk (package python-webkit on Ubuntu). In just a few lines of Python, you can create your own browser with any UI you like, with a fully functional content area. Their current demo even has tabs.

So why not use pywebkitgtk to create a simple fullscreen webkit-based presentation tool?

It was even simpler than I expected. Here's the code:

#!/usr/bin/env python
# python-gtk-webkit presentation program.
# Copyright (C) 2009 by Akkana Peck.
# Share and enjoy under the GPL v2 or later.
# Written for Python 2 with pygtk and pywebkitgtk.

import sys
import gobject
import gtk
import webkit

class WebBrowser(gtk.Window):
    def __init__(self, url):
        gtk.Window.__init__(self)
        self.fullscreen()

        # The WebView is the content area; it has to be added
        # to the window or nothing will be displayed.
        self._browser = webkit.WebView()
        self.add(self._browser)
        self.connect('destroy', gtk.main_quit)

        self._browser.open(url)
        self.show_all()

if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print "Usage:", sys.argv[0], "url"
        sys.exit(0)

    webbrowser = WebBrowser(sys.argv[1])
    gtk.main()


That's all! No navigation needed, since the slides include javascript navigation to skip to the next slide, previous, beginning and end. It does need some way to quit (for now I kill it with ctrl-C) but that should be easy to add.

Webkit and image buffering

It works great. The only problem is that webkit's image loading turns out to be fairly poor compared to Firefox's. In a presentation where most slides are full-page images, webkit clears the browser screen to white, then loads the image, creating a noticeable flash each time. Having the images in cache, by stepping through the slide show then starting from the beginning again, doesn't help much (these are local images on disk anyway, not loaded from the net). Firefox loads the same images with no flash and no perceptible delay.

I'm not sure if there's a solution. I asked some webkit developers and the only suggestion I got was to rewrite the javascript in the slides to do image preloading. I'd rather not do that -- it would complicate the slide code quite a bit solely for a problem that exists only in one library.

There might be some clever way to hack double-buffering in the app code. Perhaps something like catching the 'load-started' signal, switching to another gtk widget that's a static copy of the current page (if there's a way to do that), then switching back on 'load-finished'.

But that will be a separate article if I figure it out. Ideas welcome!

Update, years later: I've used this for quite a few real presentations now. Of course, I keep tweaking it: see my scripts page for the latest version.

[ 17:12 Nov 11, 2009    More programming | permalink to this entry | comments ]

Quick Firefox tip: Hide the "Additional plugins" bar

Dave just discovered a useful preference in Firefox.

So many pages give that annoying info bar at the top that says "Additional plugins are needed to view this page." It doesn't tell you which plugins, but for Linux users it's a safe bet that whatever they are, you can't get them. Why have the stupid nagbar taking up real estate on the page for something you can't do anything about?

Displaying the info bar is the right thing for Firefox to do, of course. Some users may love to go traipsing off installing random plugins to make sure they see every annoying bit of animation and sound on a page. But Dave's excellent discovery was that the rest of us can turn off that bar.

The preference is plugins.hide_infobar_for_missing_plugin and you can see it by going to about:config and typing missing. Then double-click the line, and you'll never see that nagbar again.

[ 12:09 Jul 14, 2009    More tech/web | permalink to this entry | comments ]

Newbie Greasemonkey script writing

I was reading a terrific article on the New York Times about Watching Whales Watching Us. At least, I was trying to read it -- but the NYT website forces font faces and sizes that, on my system, end up giving me a tiny font that's too small to read. Of course I can increase font size with Ctrl-+ -- but it gets old having to do that every time I load a NYT page.

The first step was to get Greasemonkey working on Firefox 3.5. "Update scripts" doesn't find a new script, and if you go to Greasemonkey's home page, the last entry is from many months ago and announces Firefox 3.1 support. But curiously, if you go to the Greasemonkey page on the regular Mozilla add-ons site, it does support 3.5.

I've had Greasemonkey for quite some time, but every time I try to write a script I have trouble getting started. There are dozens of Greasemonkey tutorials on the web, but most of them are oriented toward installing scripts and don't address "What do you type into the fields of the Greasemonkey New User Script dialog?"

Fortunately, I did find one that explained it: The beginner's guide to Greasemonkey scripting. I gave my script a name (NYT font) and a namespace (my own domain), added http://*nytimes.com/* for Includes, and nothing for Excludes.

Click OK, and Greasemonkey offers a "choose editor" dialog. I chose emacs, which mostly worked though the emacs window unaccountably came up with a split window that I had to dismiss with C-x 1.

Now what to type in the editor? Firebug came to the rescue here.

I went back to the NYT page with the too-small fonts and clicked on Firebug. The body style showed that they're setting

font-family: Georgia, serif
font-size: 84.5%


84.5%? Where does that come from? What happens if I change that to 100%? Fortunately, I can test that right there in the Firebug window. 100% made the fonts fairly huge, but 90% was about right.

I went back to greasemonkey's editor window and added:

document.body.style.fontSize = "90%";


Saved the file, and that was all I needed! Once I hit Reload on the NYT page I got a much more readable font size.

[ 12:30 Jul 12, 2009    More tech/web | permalink to this entry | comments ]

Wrapping plaintext files in Firefox

A friend pointed me to a story she'd written. It was online as a .txt file. Unfortunately, it had no line breaks, and Firefox presented it with a horizontal scrollbar and no option to wrap the text to fit in the browser window.

But I was sure that was a long-solved problem -- surely there must be a userContent.css rule or a bookmarklet to handle text with long lines. The trick was to come up with the right Google query. Like this one: firefox OR mozilla wrap text userContent OR bookmarklet

I settled on the simple CSS rule from Tero Karvinen's page on Making preformated <pre> text wrap in CSS3, Mozilla, Opera and IE:

pre {
white-space: -moz-pre-wrap !important;
}

Add it to chrome/userContent.css and you're done.

But some people might prefer not to apply the rule to all text. If you'd prefer a rule that can be applied at will, a bookmarklet would be better. Like the word wrap bookmarklet from Return of the Sasquatch or the one from Jesse Ruderman's Bookmarklets for Zapping Annoyances collection.

[ 11:47 Apr 08, 2008    More tech/web | permalink to this entry | comments ]

Firefox, caching, and fast Back/Forward buttons

I remember a few years ago the Mozilla folks were making a lot of noise about the "blazingly fast Back/Forward" that was coming up in the (then) next version of Firefox. The idea was that the layout engine was going to remember how the page was laid out (technically, there would be a "frame cache" as opposed to the normal cache which only remembers the HTML of the page). So when you click the Back button, Firefox would remember everything it knew about that page -- it wouldn't have to parse the HTML again or figure out how to lay out all those tables and images, it would just instantly display what the page looked like last time.

Time passed ... and Back/Forward didn't get faster. In fact, they got a lot slower. The "Blazingly Fast Back" code did get checked in (here's how to enable it) but somehow it never seemed to make any difference.

The problem, it turns out, is that the landing of bug 101832 added code to respect a couple of HTTP Cache-Control header settings, no-store and no-cache. There's also a third cache control header, must-revalidate, which is similar (the difference among the three settings is fairly subtle, and Firefox seems to treat them pretty much the same way).

Translated, that means that web servers, when they send you a page, can send some information along with the page that asks the browser "Please don't keep a local copy of this page -- any time you want it again, go back to the web and get a new copy."

There are pages for which this makes sense. Consider a secure bank site. You log in, you do your banking, you view your balance and other details, you log out and go to lunch ... then someone else comes by and clicks Back on your browser and can now see all those bank pages you were just viewing. That's why banks like to set no-cache headers.

But those are secure pages (https, not http). There are probably reasons for some non-secure pages to use no-cache or no-store ... um ... I can't think of any offhand, but I'm sure there are some.

But for most pages it's just silly. If I click Back, why wouldn't I want to go back to the exact same page I was just looking at? Why would I want to wait for it to reload everything from the server?

The problem is that modern Content Management Systems (CMSes) almost always set one or more of these headers. Consider the Linux.conf.au site. Linux.conf.au is one of the most clueful, geeky conferences around. Yet the software running their site sets

  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache

on every page. I'm sure this isn't intentional -- it makes no sense for a bunch of basically static pages showing information about a conference several months away. Drupal, the CMS used by LinuxChix, sets Cache-Control: must-revalidate -- again, pointless. All it does is make you afraid to click on links because then if you want to go Back it'll take forever. (I asked some Drupal folks about this and they said it could be changed with drupal_set_header).

(By the way, you can check the http headers on any page with: wget -S -O /dev/null http://... or, if you have curl, curl --head http://...)
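Once you have the headers, picking out the directives that matter here is simple. Here's a minimal Python sketch that splits a Cache-Control value into its directive names; the function name is mine, and it ignores complications like quoted values that real headers are allowed to carry.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into lowercase directive names."""
    directives = set()
    for part in value.split(","):
        # Drop any "=value" suffix, e.g. "post-check=0" -> "post-check".
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives

if __name__ == "__main__":
    header = "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    found = parse_cache_control(header)
    for name in ("no-store", "no-cache", "must-revalidate"):
        print(name, name in found)
```

Fed the Linux.conf.au header shown earlier, it reports all three of the settings discussed in this article as present.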

Here's an excellent summary of the options in an Opera developer's blog, explaining why the way Firefox handles caching is not only unfriendly to the user, but also wrong according to the specs. (Darn it, reading sensible articles like that makes me wish I weren't so deeply invested in Mozilla technology -- Opera cares so much more about the user experience.)

But, short of a switch to Opera, how could I fix it on my end? Google wasn't any help, but I figured that this must be a reported Mozilla bug, so I turned to Bugzilla and found quite a lot. Here's the scoop. First, the code to respect the cache settings (slowing down Back/Forward) was apparently added in response to bug 101832. People quickly noticed the performance problem, and filed 112564. (This was back in late 2001.) There was a long debate, but in the end, a fix was checked in which allowed no-cache http (non-secure) sites to cache and get a fast Back/Forward. This didn't help no-store and must-revalidate sites, which were still just as slow as ever.

Then a few months later, bug 135289 changed this code around quite a bit. I'm still getting my head around the code involved in the two bugs, but I think this update didn't change the basic rules governing which pages get revalidated.

(Warning: geekage alert for next two paragraphs. Use this fix at your own risk, etc.)

Unfortunately, it looks like the only way to fix this is in the C++ code. For folks not afraid of building Firefox, the code that handles the no-cache and no-store directives lives in nsDocShell::ShouldDiscardLayoutState (currently line 8224, but don't count on it). Its final line is:

    return (noStore || (noCache && securityInfo));

Change that to

    return ((noStore || noCache) && securityInfo);

and Back/Forward will get instantly faster, while still preserving security for https. (If you don't care about that security issue and want pages to cache no matter what, just replace the whole function with return PR_FALSE; )
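To see exactly which cases the one-line change affects, here's a small Python model of the two boolean expressions. It's only a truth table, not Mozilla code: the function names are mine, and the secure flag stands in for securityInfo being set, which I'm assuming means an https page.

```python
def should_discard_original(no_store, no_cache, secure):
    # Models: return (noStore || (noCache && securityInfo));
    return no_store or (no_cache and secure)

def should_discard_patched(no_store, no_cache, secure):
    # Models: return ((noStore || noCache) && securityInfo);
    return (no_store or no_cache) and secure

if __name__ == "__main__":
    # Print every combination with the original and patched results.
    for no_store in (False, True):
        for no_cache in (False, True):
            for secure in (False, True):
                print(no_store, no_cache, secure,
                      should_discard_original(no_store, no_cache, secure),
                      should_discard_patched(no_store, no_cache, secure))
```

The two versions agree on every secure page. They differ only when no-store is set on a non-secure page: the original discards the saved layout there, and the patched version keeps it.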

The must-revalidate setting is handled in a completely different place, in nsHttpChannel. However, for some reason, fixing nsDocShell also fixes Drupal pages which set only must-revalidate. I don't quite understand why yet. More study required. (End geekage.)

Any Mozilla folks are welcome to tell me why I shouldn't be doing this, or if there's a better way (especially if it's possible in a way that would work from an extension or preference). I'd also be interested in hearing from Drupal or other CMS folks defending why so many CMSes destroy the user experience like this. But please first read the Opera article referenced above, so that you understand why I and so many other users have complained about it. I'm happy to share any comments I receive (let me know if you want your comments to be public or not).

[ 20:32 Oct 20, 2007    More tech/web | permalink to this entry | comments ]

Make Amazon pages narrow enough to read

I like buying from Amazon, but it's gotten a lot more difficult since they changed their web page design to assume super-wide browser windows. On the browser sizes I tend to use, even if I scroll right I can't read the reviews of books, because the content itself is wider than my browser window. Really, what's up with the current craze of insisting that everyone upgrade their screen sizes then run browser windows maximized?

(I'd give a lot for a browser that had the concept of "just show me the page in the space I have". Opera has made some progress on this and if they got it really working it might even entice me away from Firefox, despite my preference for open source and my investment in Mozilla technology ... but so far it isn't better enough to justify a switch.)

I keep meaning to try the greasemonkey extension, but still haven't gotten around to it. Today, I had a little time, so I googled to see if anyone had already written a greasemonkey script to make Amazon readable. I couldn't find one, but I did find a page from last October trying to fix a similar problem on another website, which mentioned difficulties in keeping scripts working under greasemonkey, and offered a Javascript bookmarklet with similar functionality.

Now we're talking! A bookmarklet sounds a lot simpler and more secure than learning how to program Greasemonkey. So I grabbed the bookmarklet, a copy of an Amazon page, and my trusty DOM Inspector window and set about figuring out how to make Amazon fit in my window.

It didn't take long to realize that what I needed was CSS, not Javascript. Which is potentially a lot easier: "all" I needed to do was find the right CSS rules to put in userContent.css. "All" is in quotes because getting CSS to do anything is seldom a trivial task.

But after way too much fiddling, I did finally come up with a rule to get Amazon's Editorial Reviews to fit. Put this in chrome/userContent.css inside your Firefox profile directory (if you don't know where your profile directory is, search your disk for a file called prefs.js):

div#productDescription div.content { max-width: 90% !important; }


You can replace that 90% with a pixel measurement, like 770px, or with a different percentage.

I spent quite a long time trying to get the user reviews (a table with two columns) to fit as well, without success. I was trying things like:

#customerReviews > div.content > table > tbody > tr > td { max-width: 300px; min-width: 10px !important; }
div#customerReviews > div.content > table { margin-right: 110px !important; }

but nothing worked, and some of it (like the latter of those two lines) actually interfered with the div.content rule for reasons I still don't understand. (If any of you CSS gurus want to enlighten me, or suggest a better or more complete solution, or solutions that work with other web pages, I'm all ears!)

I'll try for a more complete solution some other time, but for now, I'm spending my July 4th celebrating my independence from Amazon's idea of the one true browser width.

[ 21:01 Jul 04, 2007    More tech/web | permalink to this entry | comments ]

Xkcd Search Bookmarklet

Today's topics are three: the excellent comic called xkcd, the use of google to search a site but exclude parts of that site, and, most important, the useful Mozilla technique called Bookmarklets.

I found myself wanting to show someone a particular xkcd comic (the one about dreams). Xkcd, for anyone who hasn't been introduced, is a wonderfully geeky, smart, and thoughtful comic strip drawn by Randall Munroe.

How to search for a comic strip? Xkcd has an archive page but that seems to have a fairly small subset of all the comics. But fortunately the comics also have titles and alt tags, which google can index.

But googling for dreams site:xkcd.com gets me lots of hits on xkcd's forum and blag pages (which I hadn't even known existed) rather than just finding the comic I wanted. After some fiddling, though, I managed to find a way to exclude all the fora and blag pages: google for xkcd dreams site:xkcd.com -site:forums.xkcd.com -site:fora.xkcd.com -site:blag.xkcd.com
Nifty!

In fact, it was so nifty that I decided I might want to use it again. Fortunately, Mozilla browsers like Firefox have a great feature called bookmarklets. Bookmarklets are like shell aliases in Linux: they let you assign an alias to a bookmark, then substitute in your own terms each time you use it.

That's probably not clear, so here's how it works in this specific case:

1. I did the google search I listed above, which gave me this long and seemingly scary URL: http://www.google.com/search?hl=en&q=xkcd+dreams+site%3Axkcd.com+-site%3Aforums.xkcd.com+-site%3Afora.xkcd.com+-site%3Ablag.xkcd.com&btnG=Search
2. Bookmarks->Bookmark this page. Unfortunately Firefox doesn't let you change any bookmark properties at the time you make the bookmark, so:
3. Bookmarks->Organize Bookmarks, find the new bookmark (down at the bottom of the list) and Edit->Properties...
4. Change the Name to something useful (I called it Xkcd search) then choose a simple word for the Keyword field. This is the "alias" you'll use for the bookmark. I chose xkcd.
5. In the Location field, find the term you want to be variable. In this case, that's "dreams", because I won't always be searching for the comic about dreams, I might want to search for anything. Change that term to %s.
(Note to non-programmers: %s is a term often used in programming languages to mean "replace the %s with a string I'll provide later.")
So now the Location looks like: http://www.google.com/search?hl=en&q=xkcd+%s+site%3Axkcd.com+-site%3Aforums.xkcd.com+-site%3Afora.xkcd.com+-site%3Ablag.xkcd.com&btnG=Search
6. Save the bookmarklet (click OK) and, optionally, drag it into a folder somewhere where it won't clutter up your bookmarks menu. You aren't ever going to be choosing this from the menu.
Now I had a new bookmarklet. To test it, I went to the urlbar in Firefox and typed:
xkcd "regular expressions"

Voila! The first hit was exactly the comic I wanted.
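The %s substitution is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not Firefox's actual code: the function name expand_keyword and the KEYWORDS table are made up for the example, and it assumes the browser URL-escapes the typed terms roughly the way quote_plus does.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table: keyword -> bookmark Location containing %s.
KEYWORDS = {
    "xkcd": ("http://www.google.com/search?hl=en&q=xkcd+%s"
             "+site%3Axkcd.com+-site%3Aforums.xkcd.com"
             "+-site%3Afora.xkcd.com+-site%3Ablag.xkcd.com&btnG=Search"),
}


def expand_keyword(typed):
    """Turn 'keyword some terms' into the bookmark URL with %s filled in."""
    keyword, _, terms = typed.partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None:
        return None  # not a known keyword
    # Use replace() rather than the % operator: the template is full of
    # other percent signs (%3A and so on) that must be left alone.
    return template.replace("%s", quote_plus(terms))


print(expand_keyword('xkcd "regular expressions"'))
```

Typing a keyword that isn't in the table returns None here; a real browser would presumably fall back to treating the text as a URL or a search.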

(You'll find many more useful bookmarklets by googling on bookmarklets.)

[ 22:13 Jun 30, 2007    More tech/web | permalink to this entry | comments ]

A Kitfox Extension

For a bit over a year I've been running a patched version of Firefox, which I call Kitfox, as my main browser. I patch it because there are a few really important features that the old Mozilla suite had which Firefox removed; for a long time this kept me from using Firefox (and I'm not the only one who feels that way), but when the Mozilla Foundation stopped supporting the suite and made Firefox the only supported option, I knew my only choice was to make Firefox do what I needed. The patches were pretty simple, but they meant that I've been building my own Firefox all this time.

Since all my changes were in JavaScript code, not C++, I knew this was probably all achievable with a Firefox extension. But I never got around to it; building the Mozilla source isn't that big a deal to me. I did it as part of my job for quite a few years, and my desktop machine is fast enough that it doesn't take that long to update and rebuild, then copy the result to my laptop.

But when I installed the latest Debian, "Etch", on the laptop, things got more complicated. It turns out Etch is about a year behind in its libraries. Programs built on any other system won't run on Etch. So I'd either have to build Mozilla on my laptop (a daunting prospect, with builds probably in the 4-hour range) or keep another system around for the purpose of building software for Etch. Not worth it. It was time to learn to build an extension.

There are an amazing number of good tutorials on the web for writing Firefox extensions (I won't even bother to link to any; just google firefox extension and make your own choices). They're all organized as step by step examples with sample code. That's great (my favorite type of tutorial) but it left my real question unanswered: what can you do in an extension? The tutorial examples all do simple things like add a new menu or toolbar button. None of them override existing Javascript, as I needed to do.

Canonical URL to the rescue. It's an extension that overrides one of the very behaviors I wanted to override: that of adding "www." to the beginning and ".com" or ".org" to the end of whatever's in the URLbar when you ctrl-click. (The Mozilla suite behaved much more usefully: ctrl-click opened the URL in a new tab, just like ctrl-clicking on a link. You never need to add www. and .com or .org explicitly because the URL loading code will do that for you if the initial name doesn't resolve by itself.) Canonical URL showed me that all you need to do is make an overlay containing your new version of the JavaScript method you want to override. Easy!

So now I have a tiny Kitfox extension that I can use on the laptop or anywhere else. Whee!

Since extensions are kind of a pain to unpack, I also made a source tarball which includes a simple Makefile: kitfox-0.1.tar.gz.

[ 11:59 May 27, 2007    More tech/web | permalink to this entry | comments ]

Feisty Fawn Versus Apache

In the last installment, I got the Visor driver working. My sitescooper process also requires that I have a local web server (long story), so I needed Apache. It was already there and running (curiously, Apache 1.3.34, not Apache 2), and it was no problem to point the DocumentRoot to the right place.

But when I tested my local site, I discovered that although I could see the text on my website, I couldn't see any of the images. Furthermore, if I right-clicked on any of those images and tried "View image", the link was pointing to the right place (http://localhost/images/foo.jpg). The file (/path/to/mysite/images/foo.jpg) existed with all the right permissions. What was going on?

/var/log/apache/error.log gave me the clue. When I was trying to view http://localhost/images/foo.jpg, apache was throwing this error:

 [error] [client 127.0.0.1] File does not exist: /usr/share/images/foo.jpg

/usr/share/images? Huh?

Searching for usr/share/images in /etc/apache/httpd.conf gave the answer. It turns out that Ubuntu, in their infinite wisdom, has decided that no one would ever want a directory called images in their webspace. Instead, they set up an alias so that any reference to /images gets redirected to /usr/share/images.

WTF?

Anyway, the solution is to comment out that stanza of httpd.conf:

<IfModule mod_alias.c>
#    Alias /icons/ /usr/share/apache/icons/
#
#    <Directory /usr/share/apache/icons>
#         Options Indexes MultiViews
#         AllowOverride None
#         Order allow,deny
#         Allow from all
#    </Directory>
#
#    Alias /images/ /usr/share/images/
#
#    <Directory /usr/share/images>
#         Options MultiViews
#         AllowOverride None
#         Order allow,deny
#         Allow from all
#    </Directory>
</IfModule>


I suppose it's nice that they provided an example for how to use mod_alias. But at the cost of breaking any site that has directories named /images or /icons? Is it just me, or is that a bit crazy?
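A quick way to spot this sort of collision before it bites is to compare the active Alias lines in the config against the names actually present in the DocumentRoot. Here's a rough Python sketch of that check. The function name shadowed_dirs is made up, both paths at the bottom are placeholders, and it only looks at plain Alias directives in a single file (no Include handling), so treat it as a starting point rather than a complete tool.

```python
import os
import re

# Matches uncommented lines like:  Alias /images/ /usr/share/images/
# (ScriptAlias and commented-out lines do not match.)
ALIAS_RE = re.compile(r'^\s*Alias\s+(\S+)\s+(\S+)', re.IGNORECASE)


def shadowed_dirs(conf_text, docroot_entries):
    """Return (url_prefix, target) pairs whose URL prefix hides a
    same-named entry in the DocumentRoot."""
    hits = []
    for line in conf_text.splitlines():
        m = ALIAS_RE.match(line)
        if not m:
            continue
        url_prefix, target = m.groups()
        if url_prefix.strip("/") in docroot_entries:
            hits.append((url_prefix, target))
    return hits


if __name__ == "__main__":
    conf = "/etc/apache/httpd.conf"   # assumption: adjust for your system
    docroot = "/path/to/mysite"       # placeholder DocumentRoot
    if os.path.exists(conf) and os.path.isdir(docroot):
        with open(conf) as f:
            entries = set(os.listdir(docroot))
            for prefix, target in shadowed_dirs(f.read(), entries):
                print("%s is aliased to %s" % (prefix, target))
```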

[ 22:55 May 13, 2007    More linux | permalink to this entry | comments ]

The Pesky "Unresponsive Script" Dialog

For quite some time, I've been seeing all too frequently the dialog in Firefox which says:
A script on this page may be busy, or it may have stopped responding. You can stop the script now, or continue to see if the script will complete.
[Continue] [Stop script]

Googling found lots of pages offering advice on how to increase the timeout for scripts from the default of 5 seconds to 20 or more (change the preference dom.max_script_run_time in about:config). But that seemed wrong. I was seeing the dialog on lots of pages where other people didn't see it, even on my desktop machine, which, while it isn't the absolute latest and greatest in supercomputing, still is plenty fast for basic web tasks.

The kicker came when I found the latest page that triggers this dialog: Firefox' own cache viewer. Go to about:cache and click on "List Cache Entries" under Disk cache device. After six or seven seconds I got an Unresponsive script dialog every time. So obviously this wasn't a problem with the web sites I was visiting.

Someone on #mozillazine pointed me to Mozillazine's page discussing this dialog, but it's not very useful. For instance, it includes advice like

To determine what script is running too long, open the Error Console and tell it to stop the script. The Error Console should identify the script causing the problem.
Error console? What's that? I have a JavaScript Console, but it doesn't offer any way to stop scripts. No one on #mozillazine seemed to have any idea where I might find this elusive Error console either. Later Update: turns out this is new with Firefox 2.0. I've edited the Mozillazine page to say so. Funny that no one on IRC knew about it.

But there's a long and interesting MozillaZine discussion of the problem in which it's clear that it's often caused by extensions (which the Mozillazine page had also suggested). I checked the suggested list of Problematic extensions, but I didn't see anything that looked likely.

So I backed up my Firefox profile and set to work, disabling my extensions one at a time. First was Adblock, since it appeared in the Problematic list, but removing it didn't help: I still got the Unresponsive script when viewing my cache.

The next try was Media Player Connectivity. Bingo! No more Unresponsive dialog. That was easy.

Media Player Connectivity never worked right for me anyway. It's supposed to help with pages that offer videos not as a simple video link, like movie.mpeg or movie.mov or whatever, but as an embedded object in the page which insists on a specific browser plug-in (like Apple's QuickTime or Microsoft's Windows Media Player).

Playing these videos in Firefox is a huge pain in the keister -- you have to View Source and crawl through the HTML trying to find the URL for the actual video. Media Player Connectivity is supposed to help by doing the crawl for you and presenting you with video links for any embedded video it finds. But it typically doesn't find anything, and its user interface is so inconsistent and complicated that it's hard to figure out what it's telling you. It also can't follow the playlists and .SMIL files that so many sites use now. So I end up having to crawl through HTML source anyway.

Too bad! Maybe some day someone will find a way to make it easier to view video on Linux Firefox. But at least I seem to have gotten rid of those Unresponsive Script errors. That should make for nicer browsing!

[ 13:07 May 05, 2007    More tech/web | permalink to this entry | comments ]

Enabling CGI and PHP on Apache2

Every time I do a system upgrade on my desktop machine, I end up with a web server that can't do PHP or CGI, and I have to figure out all over again how to enable all the important stuff. It's all buried in various nonobvious places. Following Cory Doctorow's "My blog, my outboard brain" philosophy, I shall record here the steps I needed this time, so next time I can just look them up:
1. Install apache2.
2. Install an appropriate mod-php package (or, alternately, a full fledged PHP package).
3. Edit /etc/apache2/sites-enabled/000-default, find the stanza corresponding to the default site, and change AllowOverride from None to something more permissive. This controls what's allowed through .htaccess files. For testing, use All; for a real environment you'll probably want something more fine grained than that.
4. While you're there, look for the Options line in the same stanza and add +ExecCGI to the end.
5. Edit /etc/apache2/apache2.conf and search for PHP. No, not the line that already includes index.php; keep going to the lines that look something like
#AddType application/x-httpd-php .php

Uncomment these. Now PHP should work. The next step is to enable CGI.
6. Still in /etc/apache2/apache2.conf, search for CGI. Eventually you'll get to
# To use CGI scripts outside /cgi-bin/:
#

7. Finally, disable automatic start of apache at boot time (I don't need a web server running on my workstation every day, only on days when I'm actually doing web development). I think some upcoming Ubuntu release may offer a way to do that through Upstart, but for now, I
mv /etc/init.d/apache /etc/noinit.d

(having previously created /etc/noinit.d for that purpose).
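Once the steps above are done, a throwaway script is a handy way to confirm that CGI really is enabled. The following is a minimal sketch, not part of the original notes: it assumes Python is installed, that the configuration maps .cgi files to the CGI handler, and that the file is saved under the DocumentRoot as something like test.cgi (a placeholder name) and made executable.

```python
#!/usr/bin/env python3
import os


def build_response(environ):
    """Build a minimal CGI response: a header, a blank line, then the body."""
    server = environ.get("SERVER_SOFTWARE", "unknown server")
    body = "CGI is working under %s\n" % server
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # Apache passes request details to CGI programs in the environment.
    print(build_response(os.environ), end="")
```

If the browser shows the script's source instead of its output, the handler or the ExecCGI option probably isn't in effect for that directory.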

[ 18:54 Mar 24, 2007    More tech/web | permalink to this entry | comments ]

New "Amabot" Phishing Scam Spoofing Amazon

I get tons of phishing scam emails spoofing Amazon. You know, the ones that say "Your Amazon account may have been compromised: please click here to log in and verify your identity", and if you look at the link, it goes to http://123.45.67.8/morestuff instead of http://www.amazon.com/morestuff. I get lots of similar phishing emails spoofing ebay and various banks.

http://www.amazon.com/gp/amabot/?pf_rd_url=http://211.75.237.149/%20%20/amazon/xec.php?cmd=sign-in

Check it out: they're actually using amazon.com, and Amazon has a 'bot called amabot that redirects you to somewhere else. Try this, for example: http://www.amazon.com/gp/amabot/?pf_rd_url=http://bn.com -- you start on Amazon's site and end up at Barnes & Noble.

When a family member got tricked by a phish email a few months ago (fortunately she became suspicious and stopped before revealing anything important) I gave her a quick lesson in how URLs work and how to recognize the host part. "If the host part isn't what you think it should be, it's probably a scam," I told her. That's pretty much the same as what Amazon says (#6 on their "Identifying Phishing or Spoofed E-mails" page). I guess now I need to teach her how to notice that there's another URL embedded in the original one, even when the original one goes to the right place. That's a bit more advanced. I suspect a lot of anti-phishing software uses the same technique and wouldn't have flagged this URL.
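Checking for "a URL inside the URL" can be automated, at least crudely. Here's a sketch in Python that looks through the query string for parameter values that are themselves http or https URLs pointing at a different host. The function name embedded_hosts is made up, and this is only a heuristic: a redirector could just as easily hide the target in the path or encode it some other way.

```python
from urllib.parse import urlsplit, parse_qsl


def embedded_hosts(url):
    """Return hosts of any URLs found in the query string of `url`
    that differ from the host of `url` itself."""
    parts = urlsplit(url)
    outer = parts.hostname
    found = []
    for _name, value in parse_qsl(parts.query, keep_blank_values=True):
        inner = urlsplit(value)
        if (inner.scheme in ("http", "https")
                and inner.hostname
                and inner.hostname != outer):
            found.append(inner.hostname)
    return found


phish = ("http://www.amazon.com/gp/amabot/"
         "?pf_rd_url=http://211.75.237.149/%20%20/amazon/xec.php?cmd=sign-in")
print(embedded_hosts(phish))
```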

I reported the phish to Amazon (so far, just an automated reply, but it hasn't been very long). I hope they look into this use of their amabot and consider whether such a major phishing target really needs a 'bot that can redirect anywhere on the net.

[ 11:34 Oct 24, 2006    More tech/web | permalink to this entry | comments ]

Internet Explorer under WINE

I've been updating some web pages with tricky JavaScript and CSS, and testing to see if they work in IE (which they never do) is a hassle involving a lot of pestering of long suffering friends.

I've always heard people talk about how difficult it is to get IE working on Linux under WINE. It works in Crossover Office (which is a good excuse to get Crossover: the company, Codeweavers, is a good open source citizen and has contributed lots of work to WINE, and I've bought from them in the past) but most people who try installing IE under regular WINE seem to have problems.

Today someone pointed me to IEs 4 Linux. It's a script that downloads IE and installs it under WINE. You need wine and cabextract installed. I was sure it couldn't be that simple, but it seemed easy enough to try.

It works great! Asked me a couple of questions, downloaded IE, installed it, gave me an easy-to-run link in ~/bin, and it runs fine. Now I can test my pages myself without pestering my friends. Good stuff!

[ 15:21 Sep 04, 2006    More tech/web | permalink to this entry | comments ]

Fri, 04 Aug 2006

Every time I click on a mailto link, Firefox wants to bring up Evolution. That's a fairly reasonable behavior (I'm sure Evolution is configured as the default mailer somewhere on my system even though I've never used it) but it's not what I want, since I have mutt running through a remote connection to another machine and that's where I'd want to send mail. Dismissing the dialog is an annoyance that I keep meaning to find a way around.

But I just learned about two excellent solutions:

First: network.protocol-handler.warn-external.mailto
Set this preference to TRUE (either by going to about:config and searching for mailto, then double-clicking on the line for this preference, or by editing the prefs.js or user.js file in your Firefox profile) and the next time you click on a mailto link, you'll get a confirmation dialog asking whether you really want to launch an external mailer.

"Ew! Cancelling a dialog every time is nearly as bad as cancelling the Evolution launch!" Never fear: this dialog has a "Don't show me this again" checkbox, so check it and click Cancel and Firefox will remember. From then on, clicks on mailto links will be treated as no-ops.

"But wait! It's going to be confusing having links that do nothing when clicked on. I'm not going to know why that happened!" Happily, there's a solution to that, too: you can set up a custom user style (in the chrome/userContent.css file in your profile directory) to show a custom icon when you mouse over any mailto link. Shiny!

[ 21:19 Aug 04, 2006    More tech/web | permalink to this entry | comments ]

Firefox for Presentations: Hiding the URLbar

I've long been an advocate of making presentations in HTML rather than using more complex presentation software such as PowerPoint, Open Office Presenter, etc. For one thing, those presentation apps are rather heavyweight for my poor slow laptop. For another, you can put an HTML presentation on the web and everyone can see it right away, without needing to download the whole presentation and fire up extra software to see it.

The problem is that Mozilla's fullscreen mode doesn't give you an easy way to get rid of the URL/navigation bar, so your presentations look like you're showing web pages in a browser. That's fine for some audiences, but in some cases it looks a bit unpolished.

In the old Mozilla suite, I solved the problem by having a separate profile which I used only for presentations, in which I customized my browser to show no urlbar. But having separate profiles means you always have to specify one when you start up, and you can't quickly switch into presentation mode from a running browser. Surely there was a better way.

After some fruitless poking in the source, I decided to ask around on IRC, and Derek Pomery (nemo) came up with a wonderful CSS hack to do it. Just add one line to your chrome/userChrome.css file.

In Firefox:

#toolbar-menubar[moz-collapsed=true] + #nav-bar { display: none !important; }


In Seamonkey:

#main-menubar[moz-collapsed=true] + #nav-bar { display: none !important; }


This uses a nice CSS trick I hadn't seen before, adjacent sibling selectors, to set the visibility of one item based on the state of a sibling which appears earlier in the DOM tree.

(A tip for using the DOM Inspector to find out the names of items in fullscreen mode: since the menus are no longer visible, use Ctrl-Shift-I to bring up the DOM Inspector window. Then File->Inspect a Window and select the main content window, which gets you the chrome of the window, not just the content. Then you can explore the XUL hierarchy.)

This one-line CSS hack turns either Firefox or Seamonkey into an excellent presentation tool. If you haven't tried using HTML for presentations, I encourage you to try it. You may find that it has a lot of advantages over dedicated presentation software.

Addendum: I probably should have mentioned that View->Toolbars->Navigation Controls turns off the toolbar if you just need it for a one-time presentation or can't modify userChrome.css. You have to do it before you flip to fullscreen, of course, since the menus won't be there afterward, and then again when you flip back. I wasn't happy with this solution myself because of the two extra steps required every time, particularly because the steps are awkward since they require using the laptop's trackpad.

[ 17:59 Apr 25, 2006    More tech/web | permalink to this entry | comments ]

Glancing Through Web Stats

I'm not very consistent about looking at the statistics on my web site. Every now and then I think of it, and take a look at who's been visiting, why, and with what, and it's always entertaining.

The first thing I do is take the apache log and run webalizer on it, to give me a breakdown of some of the "top" lists.

Of course, I'm extremely interested in the user agent list: which browsers are being used most often? As of last month, the Shallowsky list still has MSIE 6.0 in the lead ... but it's not as big a lead as it used to be, at 56.04%. Mozilla 5.0 (which includes all Gecko-based browsers, as far as I know, including Mozilla, Firefox, Netscape 6 and 7, Camino, etc.) is second with 20.31%. Next are four search engine 'bots, and then we're into the single digit percentages with a couple of old IE versions and Opera.

AvantGo (they're still around?) is number 11 with 0.37% -- interesting. It looks like they're grabbing the Hitchhiker's Guide to the Moon; then there are a bunch of lines like:

sync37.avantgo.com - - [05/Apr/2006:14:29:25 -0700] "GET / HTTP/1.0" 200 4549 "http://www.nineplanets.org/" "Mozilla/4.0 (compatible; AvantGo 6.0; FreeBSD)"

and I'm not sure how to read that (nineplanets.org is The Nine Planets, Bill Arnett's excellent and justifiably popular planetary site, and he and I have cross-links, but I'm not sure what that has to do with avantgo and my site). Not that it's a problem: of course, anyone is welcome to read my site on a PDA, via AvantGo or otherwise. I'm just curious.
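For picking apart lines like that one, it helps to split them into named fields. The sketch below assumes the log is in Apache's "combined" format, where the two quoted strings at the end are the Referer and User-Agent headers the client sent; if that holds, the line above says AvantGo's fetcher requested / and reported nineplanets.org as the page it came from. The names parse_line and top_agents are made up for the example.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def top_agents(lines, n=15):
    """Count user-agent strings, most common first, skipping unparseable lines."""
    agents = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields:
            agents[fields["agent"]] += 1
    return agents.most_common(n)


sample = ('sync37.avantgo.com - - [05/Apr/2006:14:29:25 -0700] '
          '"GET / HTTP/1.0" 200 4549 "http://www.nineplanets.org/" '
          '"Mozilla/4.0 (compatible; AvantGo 6.0; FreeBSD)"')
print(parse_line(sample)["referer"])
```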

Amusingly, the last user agent in the top fifteen is GIMP Layers, syndicating this blog.

Another interesting list is the search queries: what search terms did people use which led them to my site? Sometimes that's more interesting than other times: around Christmas, people were searching for "griffith park light show" and ending up at my lame collection of photos from a previous year's light show. I felt so sorry for them: Griffith Park never puts any information on the web so it's impossible to find out what hours and dates the light show will be open, so I know perfectly well why they were googling, and they certainly weren't getting any help from me. I would have put the information there if I'd known -- but I tried to find out and couldn't find it either.

But this month, no one is searching on anything unusual. The top searches leading to my site for the past two months are terms like birds, gimp plugins, linux powerpoint, mini laptops, debian chkconfig, san andreas fault, pandora, hummingbird pictures, fiat x1/9, jupiter's features, linux photo, and a rather large assortment of dirt bike queries. (I have very little dirt bike content on my site, but people must be desperate to find web pages on dirt bikes because those always show up very prominently in the search string list.)

Most popular pages are this blog (maybe just because of RSS readers), the Hitchhiker's Guide to the Moon, and bird photos, with an assortment of other pages covering software, linux tips, assorted photo collections, and, of course, dirt bikes.

That's most of what I can get from webalizer. Now it's time to look at the apache error logs. I have quite a few 404s (missing files). I can clean up some of the obvious ones, and others are coming from external sites I can't do anything about that for some reason link to filenames I deleted seven years ago; but how can I get a list of all the broken internal links on my site, so at least I can fix the errors that are my own fault?

Kathryn on Linuxchix pointed me to dead-links.com, a rather cool site. But it turns out it only looks for broken external links, not internal ones. That's useful, too, just not what I was after this time. Warning: if you try to save the page from firefox, it will start running all over again. You have to copy the content and paste it into a file if you want to save it.

But Kathryn and Val opined that wget was probably the way to go for finding internal links. Turns out wget has an option to delete each file after downloading it, so you can wget a whole site but not actually need to use the local space to duplicate the site. Use this command:

wget --recursive -nd -nv --delete-after --domains=domain.com http://domain.com/ 2>&1 | tee wget.out


Now open the resulting file in an editor and search repeatedly for ERROR to find all the broken links. Unfortunately the errors are on a separate line from the filenames they reference, so you can't just use a grep. wget also gets some things wrong: for instance, it tries to download the .class file of a Java applet inside a .jar, then reports an error when the class doesn't exist. (--reject .class might help that.) Still, it's not hard to skip past these errors, and wget does seem to be a fairly good way of finding broken internal links.
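Since grep can't pair an error with the URL on the line before it, a few lines of Python can. This sketch assumes the wget -nv output puts the failing URL on its own line ending with a colon, followed by a line containing ERROR and the status; check your own wget.out before trusting it, since the exact format may differ between wget versions. The function name broken_links is made up.

```python
def broken_links(log_text):
    """Pair each ERROR line with the most recent line that looks like a URL."""
    results = []
    last_url = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(("http://", "https://")) and line.endswith(":"):
            last_url = line[:-1]          # drop the trailing colon
        elif "ERROR" in line and last_url is not None:
            # Keep everything from "ERROR" on, e.g. "ERROR 404: Not Found."
            results.append((last_url, line[line.index("ERROR"):]))
            last_url = None
    return results


# Made-up sample in the assumed format.
sample_log = '''2006-04-14 21:40:00 URL:http://domain.com/ [4549/4549] -> "index.html" [1]
http://domain.com/old/page.html:
2006-04-14 21:40:01 ERROR 404: Not Found.
'''
for url, error in broken_links(sample_log):
    print(url, error)
```

To use it on real output, read wget.out into a string and pass that to broken_links instead of the sample.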

There's one more check left to do in the access log. But that's a longer story, and a posting for another day.

[ 21:43 Apr 14, 2006    More tech/web | permalink to this entry | comments ]

How to Search Your Mozilla Cache

Ever want to look for something in your browser cache, but when you go there, it's just a mass of oddly named files and you can't figure out how to find anything?

(Sure, for whole pages you can use the History window, but what if you just want to find an image you saw this morning that isn't there any more?)

Here's a handy trick.

First, change directory to your cache directory (e.g. $HOME/.mozilla/firefox/blahblah/Cache).

Next, list the files of the type you're looking for, in the order in which they were last modified, and save that list to a file. Like this:

% file $(ls -1t) | grep JPEG | sed 's/: .*//' > /tmp/foo

In English: ls -t lists in order of modification date, and -1 ensures that the files will be listed one per line. file identifies the type of each one. Pass that through grep for the right pattern (do a file * to see what sorts of patterns get spit out), then pass that through sed to get rid of everything but the filename. Save the result to a temporary file.

The temp file now contains the list of cache files of the type you want, ordered with the most recent first. You can now search through them to find what you want. For example, I viewed them with Pho:

pho $(cat /tmp/foo)

For images, use whatever image viewer you normally use; if you're looking for text, you can use grep or whatever search you like. Alternately, you could ls -lt $(cat /tmp/foo) to see what was modified when and cut down your search a bit further, or any other additional paring you need.

Of course, you don't have to use the temp file at all. I could have said simply:

pho $(file $(ls -1t) | grep JPEG | sed 's/: .*//')

Making the temp file is merely for your convenience if you think you might need to do several types of searches before you find what you're looking for.

[ 22:40 Oct 10, 2005    More tech/web | permalink to this entry | comments ]

Tue, 04 Oct 2005

Hacking Mozilla Extension Versions

Mozilla Firefox's model has always been to dumb down the basic app to keep it simple, and require everything else to be implemented as separately-installed extensions.

There's a lot to be said for this model, but aside from security (the need to download extensions of questionable parentage from unfamiliar sites) there's another significant down side: every time you upgrade your browser, all your extensions become disabled, and it may be months before they're updated to support the new Firefox version (if indeed they're ever updated). When you need extensions for basic functionality, like controlling cookies, or basic sanity, like blocking flash, the intervening months of partial functionality can be painful, especially when there's no reason for it (the extension API usually hasn't changed, merely the version string).

It turns out it's very easy to tweak your installed extensions to run under your current Firefox version.

1. Locate your profile directory (e.g. $HOME/firefox/blah.blah for Firefox on Linux).
2. Edit profiledirectory/extensions/*/install.rdf
3. Search for maxVersion.
4. Update it to your current version (as shown in the Tools->Extensions dialog).
5. Restart the browser.

Disclaimer: Obviously, if the Firefox API really has changed in a way that makes it incompatible with your installed extensions, this won't be enough. Your extensions may fail to work, crash your browser, delete all your files, or cause a massive meteorite to strike the earth causing global extinction. Consider this a temporary solution; do check periodically to see if there's a real extension update available.
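If you have a lot of extensions, the edit can be scripted. Here's a hedged Python sketch of the text substitution only. It assumes install.rdf writes the value as an element, <em:maxVersion>...</em:maxVersion>; some extensions may use a different form, in which case this will leave the file unchanged. The function name bump_max_version is made up, and it changes every maxVersion it finds, so back up your profile first.

```python
import re

# Assumed form: <em:maxVersion>1.0+</em:maxVersion>
MAXVER_RE = re.compile(r'(<em:maxVersion>)([^<]*)(</em:maxVersion>)')


def bump_max_version(rdf_text, new_version):
    """Return rdf_text with every em:maxVersion element set to new_version."""
    return MAXVER_RE.sub(
        lambda m: m.group(1) + new_version + m.group(3), rdf_text)


# Made-up fragment for illustration.
sample_rdf = ("<em:minVersion>0.9</em:minVersion>\n"
              "<em:maxVersion>1.0+</em:maxVersion>\n")
print(bump_max_version(sample_rdf, "1.5"))
```

To apply it, read each install.rdf under the extensions directory, pass the text through bump_max_version with the version shown in your browser, and write the result back.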

[ 19:47 Oct 04, 2005    More tech/web | permalink to this entry | comments ]

Changing User Agent to Pretend to be MSIE6

In the wake of the Hurricane Katrina devastation, one of FEMA's many egregious mistakes is that their web site requires IE 6 in order for victims to register for relief.

It's mostly academic. The Katrina victims who need help the most didn't own computers, have net access, or, in many cases, even know how to use the web. Even if they owned computers, those computers are probably underwater and their ISP isn't up.

Nevertheless, some evacuees, staying with friends or relatives, or using library or other public access computers, may need to register for help using FEMA's web site.

It turns out that it's surprisingly difficult to google for the answer to the seemingly simple question, "How do I make my browser spoof IE6?" Here's the simple answer.

Opera: offers a menu to do this, and always has.

Mozilla or Firefox: the easiest way is to install the User Agent Switcher extension. Install it, restart the browser and you get a user-agent switching menu which includes an IE6 option.

To change the user agent on Mozilla-based browsers without the extension:

1. Type about:config into the URL bar.
2. Right-click in the window (on Mac that's ctrl-click to get a context menu) and select New->String
3. Use general.useragent.override for the preference name, and Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) for the value.
I think this takes effect immediately, no need to restart the browser.
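
The same preference can also be set from a file, which isn't covered in the steps above: Mozilla-based browsers read a user.js file in the profile directory at startup, with one user_pref(...) line per preference. Here's a minimal sketch that just formats that line; the helper name user_pref_line is invented for the example, and unlike the about:config route this one presumably does need a browser restart.

```python
import json


def user_pref_line(name, value):
    """Format one user.js line; json.dumps supplies the quotes and escaping."""
    return "user_pref({}, {});".format(json.dumps(name), json.dumps(value))


IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
line = user_pref_line("general.useragent.override", IE6_UA)
print(line)
```

Append the printed line to user.js in the profile directory, and delete it again when you're done spoofing.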

Safari (thanks to Rick Moen on svlug):

1. Exit Safari. Open Terminal.
2. Type defaults write com.apple.Safari IncludeDebugMenu -boolean true
3. Restart Safari.
Safari's menu bar will now include Debug, which has an option to change the user agent.

If you do change your user agent, please change it back after you've finished whatever business required it. Otherwise, web site administrators will think you're another IE user, and then they'll use that as justification for making more ridiculous IE-only pages like FEMA's. The more visits they see from non-IE browsers, the more they'll realize that not everyone uses IE.

[ 13:35 Sep 11, 2005    More tech/web | permalink to this entry | comments ]

Catching Up on Firefox Regressions

I spent a little time this afternoon chasing down a couple of recent Firefox regressions that have been annoying me.

First, the business where you type a url into the urlbar and hit alt-Enter (ctrl-Enter in my Kitfox variant) to open it in a new tab, and then, when you go back to the old tab, you still see the new url in the urlbar, which doesn't match the page being displayed there.

That turns out to be bug 227826, which was fixed a week and a half ago. Hooray!

Reading that bug yielded a nice Mozilla tip I hadn't previously known: hitting ESC when focus is in the urlbar will revert the urlbar to what it should be, without needing to Reload.

The other annoyance I wanted to chase down is the new failure of firefox -remote to handle URLs with commas in them (as so many news stories have these days); quoting the url is no help, because it no longer handles quotes either. That means that trying to call a browser from another program such as an IRC client is doomed to fail for any complex url.

That turns out to be a side effect of the check-in for bug 280725, which had something to do with handling non-ASCII URLs on Windows. I've filed bug 298960 to cover the regression.
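
Until the regression is fixed, one possible workaround for programs that call the browser is to percent-encode the troublesome characters before handing over the url. This is a sketch under two assumptions: that the trouble comes from the comma separating the url from the directive inside openURL(...), and that the web server treats %2C the same as a literal comma (most do, but nothing guarantees it). The function name remote_open_command is hypothetical.

```python
def remote_open_command(url, browser="firefox", directive="new-tab"):
    """Build an argument list for -remote with commas and parens encoded."""
    # Percent-encode the characters that collide with openURL(...) syntax.
    safe_url = url.replace(",", "%2C").replace("(", "%28").replace(")", "%29")
    return [browser, "-remote", "openURL({},{})".format(safe_url, directive)]


cmd = remote_open_command("http://example.com/story/0,1,2.html")
print(cmd)
```

Passing the result to subprocess.call as a list, rather than as one shell string, also sidesteps the quoting problem, since no shell ever sees the url.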

That leaves only one (much more minor) annoyance: the way the selection color has changed, and quite often seems to give me white text on a dingy mustard yellow background. I think that's because of bug 56314, which apparently makes it choose a background color that's the reverse of the page's background, but which then doesn't seem to choose a contrasting foreground color.

It turns out you can override this if you don't mind specifying a single fixed set of selection colors (instead of having them change with the colors of every page). In userChrome.css (for the urlbar) and userContent.css (for page content):

::-moz-selection {
background-color: magenta;
color: white;
}

(obviously, pick any pair of colors which strikes your fancy).

[ 21:45 Jun 27, 2005    More tech/web | permalink to this entry | comments ]

Tue, 08 Feb 2005

Turns out the Novell Ad requires flash 7, and just runs partially (but with no errors explaining the problem) with flash 6. About 2/3 of the linux users I polled on #linuxchix had the same problem as I did (still on flash 6).

I installed flash 7.0r25, and now I get video and sound (albeit with the usual flash "way out of sync" problem), but mozilla 1.8a6 crashes when leaving the page (I filed a talkback report).

Still not a great face to show migrating customers. Oh, well, maybe it works better on Novell Linux ...

[ 18:33 Feb 08, 2005    More linux | permalink to this entry | comments ]

Novell Can't Manage a "Migrate to Linux" Page That Works In Linux?

Someone on IRC posted a link to a Novell ad trying to persuade people to migrate from Windows to Linux.

It's flash, so I saw the flash click-to-view button. I clicked it, and something downloaded and showed play controls (a percent-done slider and a pause button). The controls respond, but no video ever appears.

Thinking maybe it was a problem with click-to-view, I tried it in my debug profile, with mostly default settings. No dice: even without click-to-view, the page just plain doesn't work in Linux Mozilla. Didn't work in Firefox either (though I don't have a Firefox profile without click-to-view, admittedly). People on Windows and Mac report that it works on those platforms.

I thought to myself, Novell is trying to be pro-Linux, they'll probably want to know about this. So I went up one level to try to find a contact address (there isn't one on the migration page). I didn't find any email addresses but I did find a feedback link, so I clicked it. It popped up an empty window, which sat empty for a minute or two, then filled with "Novell Account: Mal-formed reply from origin s". Any text which might follow that is cut off, doesn't fit in the window size they specified.

What does Novell expect customers to think when they migrate one machine to Linux, start using it to surf the web, and discover that they can't even read Novell's own pro-Linux pages from Linux? What sort of impression is that going to make on someone considering migrating a whole shop?

Fortunately sites like Novell's which don't work in Linux and Mozilla are the exception, not the rule. I can surf most of the web just fine; it's only a few bad apples who can't manage to write cross-platform web pages. But someone early in the migration process doesn't know that. They're more likely to just stop right there.

[ 12:30 Feb 08, 2005    More linux | permalink to this entry | comments ]

Mozilla tip: highlight links that would open a new window

Investigating some of the disappointing recent regressions in Mozilla (in particular in handling links that would open new windows, bug 278429), I stumbled upon this useful little tidbit from manko, in the old bug 78037:

You can use CSS to make your browser give different highlighting for links that would open in a different window.

Put something like this in your [moz_profile_dir]/chrome/userContent.css:

a[target="_blank"] {
-moz-outline: 1px dashed invert !important;
/* links to open in new window */
}

a:hover[target="_blank"] {
color: red; background-color: yellow !important;
}

a[href^="http://"] {
-moz-outline: 1px dashed #FFCC00 !important;
/* links outside from current site */
}

a[href^="http://"][target="_blank"] {
-moz-outline: 1px dashed #FF0000 !important;
/* combination */
}


I questioned the use of outlines rather than colors, but then realized why manko uses outlines instead: it's better to preserve the existing colors used by each page, so that link colors go along with the page's background color.
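
If you're curious how many links on a given page those selectors would catch, the same two tests (a target of _blank, an href starting with http://) are easy to approximate outside the browser. This is a rough sketch using Python's standard html.parser, not a reimplementation of CSS matching; the class name NewWindowLinkFinder and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class NewWindowLinkFinder(HTMLParser):
    """Collect links matching a[target="_blank"] and a[href^="http://"]."""

    def __init__(self):
        super().__init__()
        self.new_window = []  # links that would open a new window
        self.absolute = []    # links whose href starts with http://

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if attrs.get("target") == "_blank":
            self.new_window.append(href)
        if href.startswith("http://"):
            self.absolute.append(href)


page = """
<a href="/local">local</a>
<a href="http://example.com/" target="_blank">popup</a>
<a href="http://example.org/">external</a>
"""
finder = NewWindowLinkFinder()
finder.feed(page)
print(finder.new_window)
print(finder.absolute)
```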

I tried adding a text-decoration: blink; to the a:hover style, but it didn't work. I don't know whether mozilla ignores blink, or if it's being overridden by the line I already had in userContent.css,

blink { text-decoration: none ! important; }

though I doubt that, since that should apply to the blink tag, not blink styles on other tags. In any case, the crosshair cursor should make new-window links sufficiently obvious, and I expect the blinking (even only on hover) would have gotten on my nerves before long.

Incidentally, for any web designers reading this (and who isn't, these days?), links that try to open new browser windows are a longstanding item on usability guru Jakob Nielsen's Top Ten Mistakes in Web Design, and he has a good explanation why. I'm clearly not the only one who hates them.

For a few other mozilla hacks, see my current userChrome.css and userContent.css.

[ 14:03 Jan 17, 2005    More tech/web | permalink to this entry | comments ]

Web pages with ugly fonts: Mozilla thinks they're Russian

For years I've been plagued by having web pages occasionally display in a really ugly font that looks like some kind of ancient OCR font blockily scaled up from a bitmap font.

For instance, look at the West Valley College page, or this news page.

I finally discovered today that pages look like this because Mozilla thinks they're in Cyrillic! In the case of West Valley, their server is saying in the http headers:

Content-Type: text/html; charset=WINDOWS-1251

-- WINDOWS-1251 is Cyrillic -- but the page itself specifies a Western character set:

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">

On my system, Mozilla believes the server instead of the page, and chooses a Cyrillic font to display the page in. Unfortunately, the Cyrillic font it chooses is extremely bad -- I have good ones installed, and I can't figure out where this bad one is coming from, or I'd terminate it with extreme prejudice. It's not even readable for pages that really are Cyrillic.

The easy solution for a single page is to use Mozilla's View menu: View->Character Encoding->Western (ISO-8859-1). Unfortunately, that has to be done again for each new link I click on the site; there seems to be no way to say "Ignore this server's bogus charset claims".

The harder way: I sent mail to the contact address on the server page, and filed bug 278326 on Mozilla's ignoring the page's meta tag (which you'd think would override the server's default), but it was closed with the claim that the standard requires that Mozilla give precedence to the server. (I wonder what IE does?)
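
To make the precedence rule concrete, here's a small sketch of the decision as described in the bug: take the charset from the server's Content-Type header if it supplies one, and fall back to the page's meta tag only if it doesn't. It illustrates the rule and isn't Mozilla's actual code; the function names and the simple regular expression for the meta tag are my own assumptions.

```python
import re
from email.message import Message

# Loose match for a charset inside a <meta ...> tag (an approximation).
META_CHARSET_RE = re.compile(r"<meta[^>]*charset=[\"']?([\w-]+)", re.IGNORECASE)


def header_charset(content_type):
    """Charset named in a Content-Type header value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html):
    """Charset named in the page's meta tag, lowercased, or None."""
    m = META_CHARSET_RE.search(html)
    return m.group(1).lower() if m else None


def effective_charset(content_type, html):
    """The server's header wins; the meta tag is only a fallback."""
    return header_charset(content_type) or meta_charset(html)


header = "text/html; charset=WINDOWS-1251"
page = '<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">'
print(header_charset(header), meta_charset(page), effective_charset(header, page))
```

With West Valley's header and meta tag, the header's Cyrillic charset is what comes out; only when the header names no charset at all does the page's own iso-8859-1 get used.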

At least that finally inspired me to install Mozilla 1.8a6, which I'd downloaded a few days ago but hadn't installed yet, to verify that it saw the same charset. It did, but almost immediately I hit a worse bug: now mozilla -remote always opens a new window, even if new-tab or no directive at all is specified. The release notes have nothing matching "remote", but someone had already filed bug 276808.

[ 20:15 Jan 13, 2005    More tech/web | permalink to this entry | comments ]

Mozilla Developer's Conference

The Mozilla Dev Conference yesterday went well. Shaver and Brendan showed off a new implementation they'd hacked up with Stuart allowing drawing into a graphics area from JavaScript, modelled after Apple's Canvas API. The API looked pretty simple from the code snippet they showed briefly, with commands for line, polygon, fill, and so forth. It also included full transparency support. This is all implemented in terms of Cairo.

Someone asked how this compared to SVG. The answer was to think of Canvas as an image you can change from JS -- simpler than an SVG document.

Brendan was funny, playing Vanna as Shaver did the brunt of the talking. "Ooh, that's pretty. What's that?"

Roc then gave a talk on "New Rendering Features for Gecko". Probably what attracted the most interest there was transparency: he has a new hack (not yet checked in) where you can add a parameter to a XUL window to make it transparent. X only supports 1-bit transparency, but in the Windows implementation XUL windows can be fully transparent.

He began his talk talking about Cairo and about the changed hardware expectations these days. He stated that everyone has 3D now, or at least, anyone who doesn't, doesn't care about rendering and doesn't expect much. I found that rather disturbing, given that I sure don't want to see rendering stop working well on my laptop, and I'd hate to see Mozilla ignore education, developing countries and other markets where open source on cheap hardware is starting to gain a strong foothold.

The other bothersome thing Roc talked about was high-res displays. He mentioned people at IBM and other places using 200dpi displays, which (as anyone who's used even 100dpi and has imperfect vision knows) leads to tiny text and other display problems on a lot of pages due to the ubiquity of page designers who use pixel-based sizing. Roc's answer to this was to have an automatic x2 or x3 zoom for people at high resolutions like 200dpi. This seems to me a very poor solution: text will either be too big or too small, and images will be scaled weirdly. Perhaps if it's implemented as a smart font size scaling, without any mandatory image scaling, it could be helpful. I wish more work were going into Mozilla's text scaling, rather than things like automatic 2x zooms. Maybe this will be part of the work. Guess I need to seek out the bugs and get involved before I worry too much about right or wrong solutions.

Then AaronL gave his accessibility talk, stressing that "accessibility helps everybody" and that the minimum everyone should do is check pages and new XUL objects for keyboard accessibility. He talked a bit about how screen reading software works, with a demo, color-blindness issues (don't ever use color as the only cue), and accessibility problems with the current fad of implementing fake menus using JS and DHTML (such menus are almost never accessible to screen reading software, and often can't be triggered with keyboard events either). Hopefully awareness of these issues will increase as legislation mandates better accessibility. Aaron's talk was unfortunately cut short because he was scheduled as the last talk before lunch; people seemed interested and there was a lot of information on his slides which got skipped due to time constraints.

After lunch, Nigel spoke on writing XUL applications, Bob Clary presented an automated site testing tool he'd written (which runs in Mozilla) to validate HTML, CSS and JS, roc spoke again on the question of how backwards compatible and quirk-compatible Mozilla should be, Myk presented his RSS reading addition to Thunderbird mail, Pav gave a longer demo of the Cairo Canvas, and several other demos were presented.

[ 11:30 Aug 07, 2004    More tech/web | permalink to this entry | comments ]
