Shallow Thoughts

Akkana's Musings on Open Source, Science, and Nature.

Tue, 05 Aug 2008

In Praise of Logical AND. In Censure of Invasive Cookies.

The tech press is in a buzz about the new search company, Cuil (pronounced "cool"). Most people don't like it much, but are using it as an excuse to rhapsodize about Google and why they took such a commanding lead in the search market: PageRank and huge data centers and all those other good things Google has.

Not to run down PageRank or other Google inventions -- Google does an excellent job at search these days (sometimes spam-SEO sites get ahead of them, but so far they've always caught up) -- but that's not how I remember it. Google's victory over other search engines was a lot simpler and more basic than that. What did they bring?

Logical AND.

Most of you have probably forgotten it since we take Google so for granted now, but back in the bad old days when search engines were just getting started, they all did it the wrong way. If you searched for red fish, pretty much all the early search engines would give you all the pages that had either red or fish anywhere in them. The more words you added, the less likely you were to find anything that was remotely related to what you wanted.

Google was the first search engine that realized the simple fact (obvious to all of us who were out there actually doing searches) that what people want when they search for multiple words is only the pages that have all the words -- the pages that have both red and fish. It was the search engine where it actually made sense to search for more than one word, the first where you could realistically narrow down your search to something fairly specific.
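To make the difference concrete, here is a minimal Python sketch of the two matching rules over a toy inverted index. The three "pages" and the function names are invented for illustration; this is not a description of how Google or any early engine was actually implemented.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search_or(index, query):
    """Early-engine behaviour: pages containing ANY of the query words."""
    results = set()
    for word in query.lower().split():
        results |= index.get(word, set())
    return results

def search_and(index, query):
    """Logical AND: only pages containing ALL of the query words."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical three-page "web", invented for this example.
PAGES = {
    "aquarium": "red fish and blue fish",
    "paint": "red paint for barns",
    "angling": "fish bait and tackle",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print("OR: ", sorted(search_or(index, "red fish")))
    print("AND:", sorted(search_and(index, "red fish")))
```

With OR, every extra word can only add pages to the result; with AND, every extra word can only remove them, which is why longer queries narrow the search instead of diluting it.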

Even today, most site searches don't do this right. Try searching for several keywords on your local college's web site, or on a retail site that doesn't license Google (or Yahoo or other major search engine) technology.

Logical AND. The killer boolean for search engines.

(I should mention that Dave, when he heard this, shook his head. "No. Google took over because it was the first engine that just gave you simple text that you could read, without spinning blinking images and tons of other crap cluttering up the page." He has a point -- that was certainly another big improvement Google brought, which hardly anybody else seems to have realized even now. Commercial sites get more and more cluttered, and nobody notices that Google, the industry leader, eschews all that crap and sticks with simplicity. I don't agree that's why they won, but it would be an excellent reason to stick with Google even if their search results weren't the best.)

So what about Cuil? I finally got around to trying it this morning, starting with a little "vanity google" for my name. The results were fairly reasonable, though oddly slanted toward TAC, a local astronomy group in which I was fairly active around ten years ago (three hits out of the first ten are TAC!)

Dave then started typing colors into Cuil to see what he would get, and found some disturbing results. He has Firefox' cookie preference set to "Ask me before setting a cookie" -- and it looks like Cuil loads pages in the background, setting cookies galore for sites you haven't ever seen or even asked to see. For every search term he thought of, Cuil popped up a cookie request dialog while he was still typing.

Searching for blu wanted to set a cookie for bluefish.something.
Searching for gre wanted to set a cookie for www.gre.ac.uk.
Searching for yel wanted to set a cookie for www.myyellow.com.
Searching for pra wanted to set a cookie for www.pvamu.edu.

Pretty creepy, especially when combined with Cuil's propensity (noted by every review I've seen so far, and it's true here too) for including porn and spam sites. We only noticed this because he happened to have the "Ask me" pref set. Most people wouldn't even know. Use Cuil and you may end up with a lot of cookies set from sites you've never even seen, sites you wouldn't want to be associated with. Better hope no investigators come crawling through your browser profile any time soon.

[ 10:10 Aug 05, 2008    More tech | permalink to this entry ]

Fri, 04 Jul 2008

Learning about Firefox 3 extensions

Oops! Right after I posted that last entry, I discovered that my little kitfox extension wasn't working as well as I'd thought. And the more I hacked it, the less well it worked, and the more I discovered was missing, like a chrome.manifest file (which firefox 2 hadn't seemed to need).

Eventually some very helpful folks on #extdev pointed me to Ted Mielczarek's excellent Extension Wizard. Give it some details about your extension (its name and version, your name, and a couple things you might want like a toolbar button, a prefs panel and a context menu) and it generates a zipped directory containing a bare bones extension, even including niceties like internationalized strings.

Even better, your new extension skeleton includes a readme that tells you how to leave the extension expanded while you work on it. That's quite a bit easier than building the XPI file and installing it each time.

So kitfox has a 0.3 version (in the unlikely event that anybody besides me wants it).

There's a project called fizzypop to develop and extend useful Mozilla dev tools like the Extension Wizard ... watch that space for more details.

[ 20:12 Jul 04, 2008    More tech/web | permalink to this entry ]

Making Firefox 3 livable

I finally broke down and spent the time to get Firefox 3 working properly for me ... meaning, mostly, finding replacement extensions for the bare minimum of what I need in a browser: control over cookies (specifically, enabling/disabling them for specific sites), flashblock, and blocking of animated images. I'd downloaded extensions for all those a few weeks ago, but I found that although Firefox 3.0 said the FF3 extensions were active, and Firefox 2 said the old ones were, neither set actually worked.

I decided to start from scratch: remove all extensions -- rm -rf .mozilla/firefox/extensions/* .mozilla/firefox/extensions.* plus apt-get remove firefox-2-dom-inspector -- then install a new set of Firefox 3 add-ons.

After much hunting (I sure wish addons.mozilla.org would offer a way to limit the view to only extensions that work with Firefox 3! Combing through 15 pages of extensions looking for the handful that will actually install gets old fast) I found the replacements I needed: CS Lite for the cookie controls, a newer Flashblock, and Custom Toolbar Buttons as a stopgap for image animation (though I suspect updating anidisable will be a better solution in the long run). This time, with the old firefox 2 extensions purged, the new ones took hold and worked.

I also added a nice extension called OpenBook that fixes the horrible Firefox "Add bookmark" dialog. You know: the one that has two nearly identical dropdown category menus side by side, with the bigger one giving you only a tiny subset of your bookmark categories, and the smaller one being the real one. The one that doesn't offer a space for keyword, so to set up a bookmarklet you have to Add Bookmark, OK, Organize Bookmarks, find the bookmark you just added, Ctrl-I to get the Bookmark info dialog, and finally you can add your keyword. OpenBook gives you a dialog where you can set the keyword to begin with, and it only gives you one menu to list categories so you aren't constantly tempted to click on the wrong one.

Now for the urlbar -- that new firefox 3 "smarter" urlbar that slows down typing in the middle of a word so it can pop up a big fancy window full of guesses (all wrong) about where I might be trying to go. Actually, even if the guesses were right, it wouldn't help, because I'd have to stop typing, search the list visually, then if one of the suggestions was right, move my hand to the mouse or the arrow keys to choose that suggestion. That takes way longer than just typing the url.

But I guess I don't mind unhelpful suggestions popping up as long as it doesn't mess up focus (preventing me from clicking or tabbing to other apps on my screen) or slow down typing. Firefox 3 seems to be handling the focus issue better than firefox 2 did, but the slowdown was quite noticeable on the poor old laptop. So I wanted a way to disable the behavior. A little googling revealed that the Firefox crew immodestly calls their new urlbar the "awesomebar", which aside from giggle factor also proves quite useful in googling: a search on firefox disable awesomebar reveals that I'm not the only one who doesn't like it, and got me several preferences I could tweak in about:config plus a couple of extensions to turn it off entirely. I won't try to summarize, since the best settings depend on your machine's spec, plus personal preference.

Making progress! Now the only issue was getting my urlbar tweaks working, so that typing <Ctrl-Return> after typing a URL opened the URL in a new tab instead of tacking on various silly extensions (oh, yes, of course I wanted to go to http://www.firefox disable awesomebar.com rather than googling for those terms in a new tab). Fortunately, it turned out that the javascript that runs the urlbar has changed very little since firefox 2, and I hardly needed to change anything to get my kitfox extension (v. 0.2) working in Firefox 3.

Only one more issue: this blog. The CSS that handles the right sidebar wasn't displaying right. Seems that Firefox 3 has changed something about its interpretation of CSS, so it was floating the right sidebar way down to the bottom of the page below the last content line. Eventually (after adding firefox-3.0-dom-inspector, another extension that had stopped working in the transition) I discovered the problem: the #content was set to width: 77% while the #rightsidebar's margin-left was at 76%. Apparently Firefox 2 rounded up as needed, whereas Firefox 3 just ignores the margin-left if it would overlap the content, and then floats the sidebar anywhere it thinks it can fit it. Fixing those percentages helped quite a bit, and I added an overflow-x: hidden (on a tip from a helpful person in #firefox) so that wide calendar doesn't hurt layout for narrow windows. I think it's working now ... any readers having problems with the layout in any browser, by all means let me know.

[ 11:04 Jul 04, 2008    More tech/web | permalink to this entry ]

Thu, 12 Jun 2008

Making Firefox default to Portrait printing

I discovered a handy tip for Linux Firefox' printing Page Setup today.

Normal web page printing uses "Portrait" mode: you read the page with the paper oriented so that it's taller than it is wide.

Once a week, I need to print a form from a club web site to bring to the meetings. It's a table that's much wider than it is tall, so I want to print it that way: in "Landscape" mode.

In Firefox 2 (at least on Linux), you can't do that from the Print dialog -- there's no Portrait/Landscape option. So you have to use a separate dialog, Page Setup, following these steps:

  1. Run Page Setup
  2. Change Portrait to Landscape
  3. Click OK
  4. Print (bring up the Print dialog and click OK)
  5. Run Page Setup
  6. Change Landscape to Portrait
  7. Click OK
Kind of a lot of steps just to print one landscape page! But if you forget, the next page you print from Firefox will be printed in Landscape mode and will take twice as many pages as it should (if you don't notice what's happening and dive for the printer's OFF switch in time, that being the only way to cancel a printing job once it hits the printer).

This morning, it finally occurred to me that Firefox was storing this setting somehow, most likely in prefs.js. If I could find the setting and force it in user.js (which takes precedence over prefs.js and is not updated by Firefox), I could make Firefox set itself back to Portrait every time it starts up. (prefs.js and user.js are both generally found in $HOME/.mozilla/firefox/).

Some greppery-pokery revealed the solution. I needed only to add a line in user.js that looks like this:

user_pref("print.printer_CUPS/Epson.print_orientation", 0);
and presto! my problem was solved.

Oddly, it's set separately for every printer you have defined, even though there's no way to set one printer to Landscape while another one is still on Portrait (the Page Setup dialog is global, and applies to every printer Firefox knows about). "Epson" is the CUPS name of my primary printer; replace that with your printer's name (as set in CUPS), and add a similar line for each printer you have. For the printers I've used, 0 is Portrait and 1 is Landscape, but you can verify that by typing:

grep orientation prefs.js | grep name

That command will also help you if you're not sure what printers you have defined, or you don't use CUPS but want to try this under a different print spooler. (Don't be misled by all the orientation prefs with "tmp" in the name.)
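If you have several printers, the user.js lines can be generated rather than typed by hand. The following is a rough Python sketch, assuming the orientation prefs in prefs.js all look like the one shown above (print.printer_NAME.print_orientation); the second printer name in the sample, CUPS/Laser, is made up for the example. Prefs whose names don't start with print.printer_ (such as the "tmp" ones) are deliberately skipped.

```python
import re

# Matches lines such as:
#   user_pref("print.printer_CUPS/Epson.print_orientation", 1);
# and captures the printer name ("CUPS/Epson").
ORIENTATION_RE = re.compile(
    r'user_pref\("print\.printer_(?P<printer>[^"]+?)\.print_orientation",\s*\d+\);'
)

def portrait_overrides(prefs_text):
    """Return user.js lines forcing Portrait (0) for each printer found."""
    printers = []
    for match in ORIENTATION_RE.finditer(prefs_text):
        printer = match.group("printer")
        if printer not in printers:  # keep first-seen order, no duplicates
            printers.append(printer)
    return [
        f'user_pref("print.printer_{p}.print_orientation", 0);'
        for p in printers
    ]

# Sample prefs.js content; "CUPS/Laser" is a hypothetical second printer.
SAMPLE = '''
user_pref("print.printer_CUPS/Epson.print_orientation", 1);
user_pref("print.printer_CUPS/Laser.print_orientation", 0);
user_pref("browser.startup.homepage", "about:blank");
'''

if __name__ == "__main__":
    for line in portrait_overrides(SAMPLE):
        print(line)
```

To use it on a real profile, read prefs.js into a string, pass it to portrait_overrides(), and append the resulting lines to user.js.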

As a minor digression, there's actually a secret pref that's supposed to give another way around the problem:

user_pref("print.whileInPrintPreview", true);
This lets you do all your printing from the Print Preview window, which offers its own Portrait and Landscape buttons. That would be a nice solution. Alas, the Portrait and Landscape buttons in that dialog currently don't work, and since this preference is undocumented and unmaintained, filing more bugs isn't likely to help.

(I should mention that this all pertains to Firefox 2. I haven't switched to Firefox 3 yet, so I don't know the state of its printing UI, or whether this preference is either helpful or effective there.)

[ 20:07 Jun 12, 2008    More tech/web | permalink to this entry ]

Tue, 10 Jun 2008

Is ODF a standard, or not?

I'm confused about ODF. Remember a few years back when we kept reading about how governments and schools and everybody else should switch away from proprietary formats, and therefore they should use software that supported the open international standard of ODF (Open Document Format)? Never mind that at the time, there was only one program in existence that could read ODF (the then brand new Open Office 2, usually present only on new installs of cutting-edge Linux distros). Honest, we were told, lots of other open source word processors (meaning basically Abiword and Kword -- are there others?) would soon add ODF support in their upcoming releases -- so everybody should stop using .doc now and switch to Open Office 2.0.

Fast forward a few years. Now all the open source word processors support ODF, no problem, and there's even a plug-in available for MS Word even though MS is fighting with their usual underhanded tactics to get their latest Word format (OOXML) blessed as a standard too.

Meanwhile, Open Office 3.0 is in beta ... and since it actually has comment support that's usable (you can write new comments and read existing ones, and even see where they were in the document), I downloaded a copy and have been using it.

OOo 3.0 beta has a lot of problems reading and writing to .doc format, as it turns out. If I save something in any of the (three?) available .doc formats, then read it back in, lots of the formatting will have disappeared or changed. And about half the time, .doc files that I write will crash Dave's copy of Word 2003 if he tries to read them (yes, crashing says more about the quality of Word 2003 than about the quality of OOo 3 ... until you try to explain to your editor why your documents cause Word to crash.)

Anyway, I decided that the way to go was to save my intermediates as ODF until they're ready to submit, at which point I'll export a copy to Word 2000 format. Sounds straightforward, right?

So today, I was on the laptop (which doesn't have OOo 3 beta) and I used Ubuntu's existing OOo 2.4.0 to read in one of those .odt files I'd been saving.

And I got this warning:

[ screenshot of OOo3 format warning ]

This document was created by a newer version of OpenOffice.org. It may contain features not supported by your current version.

Click 'Update Now...' to run online update and get the latest version of OpenOffice.org.

I'm a little confused now. Wasn't ODF (.odt) the format we were all supposed to use because it was an international standard, and therefore documents written by any program could be read by any other ODF-supporting program? And no one would be tied to any particular program or version?

I double-checked OOo 3's "Save as" file type menu, and the format I was using was:

ODF Text Document (.odt)
I don't see anything there about "OpenOffice 3.0 format (may not be readable by earlier versions or by other programs)." I just see the exact same string OOo 2.4.0 gives me for ODT -- for a format that apparently is not the same. Even Microsoft at least gives the option in Word of saving in formats that older Word versions used. It looks to me like ODF in OOo3 is a step backward, not a step forward. (In fact, it looks like OOo2 is reading the document just fine ... but I can't be sure after seeing that warning, and will have to check it very carefully before I send it anywhere.)

Can one of you ODF-enthusiasts please explain where I'm going wrong here, and why it makes sense to define an international standard format that's nevertheless different depending on what software wrote it? (I know this blog doesn't have comments, but I promise to publish here any comments I get if you say you want them published, and only then ... private comments are okay too.) Here's my contact page.

Update, Jun 11 2008: Two helpful replies this morning.

Markku Korkeala tells me that the ODF standard has more than one version. OpenOffice 2.x writes ODF 1.1, while OO 3 writes 1.2. He also points me to Rob Weir's ODF Validation for Dummies, which includes a long discussion on XML validation methods (specifically, validation of ODF vs. Microsoft's OOXML).

Harm Hilvers writes a longer reply with some more useful information, which I'll include here:

ODF is a standard, but it's a constantly developing one. This should not be a problem, since as we all know the first ISO ODF was already quite comprehensive and complete. Newer versions are making ODF even more complete. This should not mean that ODF should be versioned in the Save As dialogue (although there might be differences in the ODF version's features) if OOo gives a list of the things that don't work in the older OOo if a newer ODF document is opened.

Personally I don't use Linux, but Mac OS X, and I use both OOo 3 and Pages (from the iWork office suite). Every time I open a Word document in Pages, it gives me a list of things that weren't imported properly because certain features are missing. OOo should do the same I guess: if a newer ODF document is opened in OOo 2 and one of the newer features is used, it should tell the user exactly that. That's a lot better than just telling the user that something might be wrong (or not).

(Akkana here:) What an excellent solution! I agree with Harm: if OOo had either mentioned ODF version numbers, or said something like "This document may use the 'foo' feature, which is not implemented in this version of Open Office", it would have gone a long way toward making me feel better about using ODF.

I still consider it a problem, though, that OpenOffice doesn't give you any option to save in a more backwards compatible format, nor does either version of OOo give you any hints about what might be incompatible. If you're using OOo 3 and you have to send a document to someone using software that can only read ODF 1.1 or 1.0, there's no way of knowing how much of your document they'll be able to read.
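In the meantime, one way to at least find out which ODF version a file claims is to look inside it. The sketch below is in Python and rests on two assumptions that are worth verifying against the ODF specification: that an .odt file is a zip archive containing content.xml, and that the root element of content.xml carries an office:version attribute (older documents may omit it, in which case the function returns None). The make_sample_odt helper builds a tiny stand-in file in memory purely for demonstration; it is not a complete, valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed for the "office:" prefix in ODF files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

def odf_version(odt_file):
    """Return the office:version attribute of content.xml, or None if absent."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return root.get(f"{{{OFFICE_NS}}}version")

def make_sample_odt(version):
    """Build a minimal stand-in .odt in memory (demonstration only)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'office:version="{version}"/>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer

if __name__ == "__main__":
    print(odf_version(make_sample_odt("1.2")))
```

odf_version() also accepts a path to a file on disk. It only reports the declared version; it says nothing about which newer features the document actually uses.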

Thanks, both Markku and Harm, for the information.

[ 21:49 Jun 10, 2008    More tech | permalink to this entry ]

Tue, 08 Apr 2008

Wrapping plaintext files in Firefox

A friend pointed me to a story she'd written. It was online as a .txt file. Unfortunately, it had no line breaks, and Firefox presented it with a horizontal scrollbar and no option to wrap the text to fit in the browser window.

But I was sure that was a long-solved problem -- surely there must be a userContent.css rule or a bookmarklet to handle text with long lines. The trick was to come up with the right Google query. Like this one: firefox OR mozilla wrap text userContent OR bookmarklet

I settled on the simple CSS rule from Tero Karvinen's page on Making preformated <pre> text wrap in CSS3, Mozilla, Opera and IE:

pre {
 white-space: -moz-pre-wrap !important;
}
Add it to chrome/userContent.css and you're done.

But some people might prefer not to apply the rule to all text. If you'd prefer a rule that can be applied at will, a bookmarklet would be better. Like the word wrap bookmarklet from Return of the Sasquatch or the one from Jesse Ruderman's Bookmarklets for Zapping Annoyances collection.

[ 10:47 Apr 08, 2008    More tech/web | permalink to this entry ]

Sat, 20 Oct 2007

Firefox, caching, and fast Back/Forward buttons

I remember a few years ago the Mozilla folks were making a lot of noise about the "blazingly fast Back/Forward" that was coming up in the (then) next version of Firefox. The idea was that the layout engine was going to remember how the page was laid out (technically, there would be a "frame cache" as opposed to the normal cache which only remembers the HTML of the page). So when you click the Back button, Firefox would remember everything it knew about that page -- it wouldn't have to parse the HTML again or figure out how to lay out all those tables and images, it would just instantly display what the page looked like last time.

Time passed ... and Back/Forward didn't get faster. In fact, they got a lot slower. The "Blazingly Fast Back" code did get checked in (here's how to enable it) but somehow it never seemed to make any difference.

The problem, it turns out, is that the landing of bug 101832 added code to respect a couple of HTTP Cache-Control header settings, no-store and no-cache. There's also a third Cache-Control setting, must-revalidate, which is similar (the difference among the three settings is fairly subtle, and Firefox seems to treat them pretty much the same way).

Translated, that means that web servers, when they send you a page, can send some information along with the page that asks the browser "Please don't keep a local copy of this page -- any time you want it again, go back to the web and get a new copy."

There are pages for which this makes sense. Consider a secure bank site. You log in, you do your banking, you view your balance and other details, you log out and go to lunch ... then someone else comes by and clicks Back on your browser and can now see all those bank pages you were just viewing. That's why banks like to set no-cache headers.

But those are secure pages (https, not http). There are probably reasons for some non-secure pages to use no-cache or no-store ... um ... I can't think of any offhand, but I'm sure there are some.

But for most pages it's just silly. If I click Back, why wouldn't I want to go back to the exact same page I was just looking at? Why would I want to wait for it to reload everything from the server?

The problem is that modern Content Management Systems (CMSes) almost always set one or more of these headers. Consider the Linux.conf.au site. Linux.conf.au is one of the most clueful, geeky conferences around. Yet the software running their site sets

  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
on every page. I'm sure this isn't intentional -- it makes no sense for a bunch of basically static pages showing information about a conference several months away. Drupal, the CMS used by LinuxChix, sets Cache-Control: must-revalidate -- again, pointless. All it does is make you afraid to click on links because then if you want to go Back it'll take forever. (I asked some Drupal folks about this and they said it could be changed with drupal_set_header).

(By the way, you can check the http headers on any page with: wget -S -O /dev/null http://... or, if you have curl, curl --head http://...)
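Once you have the headers, picking out the relevant settings is easy to script. Here is a small Python sketch that splits a Cache-Control value into its directives and reports which of the three settings discussed above are present. It is a simplified parser written for this example: it splits on commas and does not handle quoted values that themselves contain commas.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header value into a {directive: value} dict.

    Directives without a value (e.g. no-store) map to None.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip() or None
    return directives

# The three settings discussed in this post.
BLOCKING = ("no-store", "no-cache", "must-revalidate")

def discourages_caching(header_value):
    """Return whichever of the BLOCKING directives appear in the header."""
    found = parse_cache_control(header_value)
    return [d for d in BLOCKING if d in found]

# The Cache-Control value quoted above from the Linux.conf.au site.
HEADER = "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"

if __name__ == "__main__":
    print(parse_cache_control(HEADER))
    print(discourages_caching(HEADER))
```

Feed it the Cache-Control line from the wget or curl output for any page you're curious about.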

Here's an excellent summary of the options in an Opera developer's blog, explaining why the way Firefox handles caching is not only unfriendly to the user, but also wrong according to the specs. (Darn it, reading sensible articles like that makes me wish I wasn't so deeply invested in Mozilla technology -- Opera cares so much more about the user experience.)

But, short of a switch to Opera, how could I fix it on my end? Google wasn't any help, but I figured that this must be a reported Mozilla bug, so I turned to Bugzilla and found quite a lot. Here's the scoop. First, the code to respect the cache settings (slowing down Back/Forward) was apparently added in response to bug 101832. People quickly noticed the performance problem, and filed 112564. (This was back in late 2001.) There was a long debate, but in the end, a fix was checked in which allowed no-cache http (non-secure) sites to cache and get a fast Back/Forward. This didn't help no-store and must-revalidate sites, which were still just as slow as ever.

Then a few months later, bug 135289 changed this code around quite a bit. I'm still getting my head around the code involved in the two bugs, but I think this update didn't change the basic rules governing which pages get revalidated.

(Warning: geekage alert for next two paragraphs. Use this fix at your own risk, etc.)

Unfortunately, it looks like the only way to fix this is in the C++ code. For folks not afraid of building Firefox, the code that handles the no-cache and no-store directives lives in nsDocShell::ShouldDiscardLayoutState. The final line of that function (currently line 8224, but don't count on it) is:

    return (noStore || (noCache && securityInfo));
Change that to
    return ((noStore || noCache) && securityInfo);
and Back/Forward will get instantly faster, while still preserving security for https. (If you don't care about that security issue and want pages to cache no matter what, just replace the whole function with return PR_FALSE; )

The must-revalidate setting is handled in a completely different place, in nsHttpChannel. However, for some reason, fixing nsDocShell also fixes Drupal pages which set only must-revalidate. I don't quite understand why yet. More study required. (End geekage.)
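For anyone wondering exactly what the one-line change to ShouldDiscardLayoutState alters, the two boolean expressions can be compared with a quick truth table. This Python sketch models only the logic of the return statement, not the surrounding Mozilla code; the `secure` flag stands in for securityInfo being set, which is assumed here to mean an https page.

```python
from itertools import product

def original_rule(no_store, no_cache, secure):
    """Original: return (noStore || (noCache && securityInfo));"""
    return no_store or (no_cache and secure)

def patched_rule(no_store, no_cache, secure):
    """Patched: return ((noStore || noCache) && securityInfo);"""
    return (no_store or no_cache) and secure

def differing_cases():
    """Return the (no_store, no_cache, secure) inputs where the rules disagree."""
    return [
        case
        for case in product([False, True], repeat=3)
        if original_rule(*case) != patched_rule(*case)
    ]

if __name__ == "__main__":
    for no_store, no_cache, secure in differing_cases():
        print(f"no_store={no_store} no_cache={no_cache} secure={secure}: "
              f"original discards, patched keeps")
```

The two rules disagree only when no-store is set on a non-secure page: the original discards the layout state there and the patched version keeps it. For secure pages the behaviour is identical.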

Any Mozilla folks are welcome to tell me why I shouldn't be doing this, or if there's a better way (especially if it's possible in a way that would work from an extension or preference). I'd also be interested in hearing from Drupal or other CMS folks defending why so many CMSes destroy the user experience like this. But please first read the Opera article referenced above, so that you understand why I and so many other users have complained about it. I'm happy to share any comments I receive (let me know if you want your comments to be public or not).

[ 19:32 Oct 20, 2007    More tech/web | permalink to this entry ]

Wed, 04 Jul 2007

Make Amazon pages narrow enough to read

I like buying from Amazon, but it's gotten a lot more difficult since they changed their web page design to assume super-wide browser windows. On the browser sizes I tend to use, even if I scroll right I can't read the reviews of books, because the content itself is wider than my browser window. Really, what's up with the current craze of insisting that everyone upgrade their screen sizes then run browser windows maximized?

(I'd give a lot for a browser that had the concept of "just show me the page in the space I have". Opera has made some progress on this and if they got it really working it might even entice me away from Firefox, despite my preference for open source and my investment in Mozilla technology ... but so far it isn't better enough to justify a switch.)

I keep meaning to try the greasemonkey extension, but still haven't gotten around to it. Today, I had a little time, so I googled to see if anyone had already written a greasemonkey script to make Amazon readable. I couldn't find one, but I did find a page from last October trying to fix a similar problem on another website, which mentioned difficulties in keeping scripts working under greasemonkey, and offered a Javascript bookmarklet with similar functionality.

Now we're talking! A bookmarklet sounds a lot simpler and more secure than learning how to program Greasemonkey. So I grabbed the bookmarklet, a copy of an Amazon page, and my trusty DOM Inspector window and set about figuring out how to make Amazon fit in my window.

It didn't take long to realize that what I needed was CSS, not Javascript. Which is potentially a lot easier: "all" I needed to do was find the right CSS rules to put in userContent.css. "All" is in quotes because getting CSS to do anything is seldom a trivial task.

But after way too much fiddling, I did finally come up with a rule to get Amazon's Editorial Reviews to fit. Put this in chrome/userContent.css inside your Firefox profile directory (if you don't know where your profile directory is, search your disk for a file called prefs.js):

div#productDescription div.content { max-width: 90% !important; }

You can replace that 90% with a pixel measurement, like 770px, or with a different percentage.

I spent quite a long time trying to get the user reviews (a table with two columns) to fit as well, without success. I was trying things like:

#customerReviews > div.content > table > tbody > tr > td { max-width: 300px; min-width: 10px !important; }
div#customerReviews > div.content > table { margin-right: 110px !important; }
but nothing worked, and some of it (like the latter of those two lines) actually interfered with the div.content rule for reasons I still don't understand. (If any of you CSS gurus want to enlighten me, or suggest a better or more complete solution, or solutions that work with other web pages, I'm all ears!)

I'll try for a more complete solution some other time, but for now, I'm spending my July 4th celebrating my independence from Amazon's idea of the one true browser width.

[ 20:01 Jul 04, 2007    More tech/web | permalink to this entry ]