"Open up!"

[an 8-10 minute Toastmasters speech for the USGS Geospeakers club]

My husband and I were headed to Moab, Utah! That's one of my favorite places in the world. It's so beautiful, the desert and all the canyons carved by the Colorado River. And the rocks! Layer upon layer of them, all laid down during the time of the dinosaurs.

I wanted to know more about the rocks. So I went to the USGS web site to look for a geologic map that I could take along on my trip. They had just what I was looking for -- but when I downloaded the file, it ended with dot S I D. Nothing on my computer could open it. What on earth was it?

Eventually I figured out that it was a CLOSED FORMAT, and today I'll be telling you about the difference between CLOSED and OPEN formats. What makes one better than the other, and why should you care?

First, what does "format" mean? Every file on your computer's disk is encoded in some way. It's just bits, zeroes and ones -- but those bits represent something. Maybe they represent an email, an article you're writing, or a book you're reading. It might be a song you're listening to, or a photo you took with your camera. It might be a web page. Whatever it is, it's encoded in some way that lets the computer know how to handle it -- TEXT, HTML, WORD, JPEG, EXCEL. That's the format.

What does it mean for a format to be closed or open?

An OPEN format is one that's publicly available. Everything you would need to know to read the format is published somewhere, so lots of people can write programs to read and write it. Some examples are text, HTML, JPEG and PDF.

In contrast, a CLOSED format -- also called a PROPRIETARY format -- is one that's controlled by a single company. You have to use software from that company to read or write it. For instance, Microsoft Word or PowerPoint, or Apple QuickTime video. If anybody else wants to write software to decode the file without paying for the privilege, the company that owns the format might sue.

"Sue?" you say. "Oh, come on. They wouldn't do that!"

That's probably what Dmitry thought. Dmitry Sklyarov was a programmer for a Russian company that makes ebooks accessible to blind people.

Dmitry discovered that Adobe's closed e-book format was very easy to decode, so he wrote a program that enabled blind people to read e-books they'd bought in the Adobe format.

He was invited to the US to present his findings at a conference in Las Vegas. But when he got there, he was arrested and thrown in jail. Dmitry spent 30 days in prison before protestors outside the Adobe building in San Jose finally convinced Adobe and the FBI to let him go home to his family -- and then only if he promised to testify against his company later.

But even if you're not worried about writing programs and getting sued, there are lots of other good reasons not to use closed formats.

The biggest reason is the risk that the format might disappear.

"Oh, come on!" you protest. "Companies like Microsoft and Adobe aren't going anywhere!" Oh yeah? That's what the genealogists thought about CommSoft. The company's program, called ROOTS, owned the genealogy market in the 80s, and everybody who was interested in tracing their family history used it. Nobody imagined it could ever disappear.

But in the mid 90s, competitors started springing up, and by 1997, CommSoft was hurting. They sold the company to Palladium, which sold it to Broderbund, which was purchased by The Learning Company, and so on ... and somewhere along the way, they stopped supporting the ROOTS software. Today there are thousands of family tree files in the ROOTS IV format that no available software can read.

But that's a specialized company. Microsoft and Adobe -- surely they're just "Too big to fail"?

Hmmm ... have you heard that about any other companies recently?

Even if the company doesn't disappear, they might just stop supporting the old format. Even if it's supposedly the same format!

For instance, in September of 2007, people using Microsoft Word 2003 updated to the latest service pack and got a surprise: they could no longer open any files written by Office versions from before 1997. All their old documents? Unreadable. There is a way to read them, though -- if you're comfortable making tricky edits to your Windows Registry. How many of you like editing your Windows Registry?

Did you have any PowerPoint 95 files? PowerPoint 2007 won't open them. Hope you still have a machine with 2003 or earlier! Or maybe you were a Mac user who used the MSWorks spreadsheet. Microsoft dropped their Mac support, and today those spreadsheets are unreadable.

Even if the format is still supported, that doesn't mean you'll be able to read it. Many closed formats include something called Digital Rights Management that controls your right to access the data.

For instance, if you bought any music from the Wal-Mart digital music store before February 2008 ... too bad! Wal-Mart used a format called Windows Media Audio, or WMA, which checks every time it's played to make sure you're still allowed to play the song. But late last year Wal-Mart shut down its license server -- so the software can't check any more, and the music just won't play. Microsoft's MSN music service has the same problem, since Microsoft shut down the service in April of 2008. Fortunately, most music sold now is in a more open format, MP3.

Jeff Rothenberg once said, "Digital information lasts forever -- or five years, whichever comes first."

But you can beat that if you use open, standard formats. Michael Carden of the National Archives of Australia says that open document standards are crucial to preserving files and ensuring they'll be usable. His group translates every document they receive into an open XML-based format.

That's why, increasingly, governments and school districts are specifying that all data should be exchanged in open formats. Countries like Germany, Holland and Norway are requiring open formats for government data. US states like Massachusetts, Connecticut, Oregon, Florida and even Texas have considered open format bills ... but so far, in the US, the corporate lobbyists always get there first and kill the bill.

But governments are where the issue is MOST important, since they safeguard data for all their citizens.

Like the USGS. Remember that geologic map I downloaded before my Utah trip? The one that ended with .sid? I googled it. It's a format called ... "Mister SID". Really, I'm not making that up! "Mister SID" files are read using software from a company called ... LizardTech. I'm not making that up either. LizardTech owns a patent on the format, so you can't legally use it without their software.

What happens to all that online USGS data if LizardTech fails? Or is LizardTech "too big to fail"?

There are so many good reasons for sticking to open formats like text, HTML, JPEG, PDF. I've prepared a handout that summarizes some of the popular closed formats, and open formats you can use instead.

So the next time you find yourself about to attach a Microsoft Word document to an email ... or to send a flyer out as a PowerPoint file ... or to use LizardTech formats on your public web site ... stop and think. Is this file controlled by a single company? Do I want to force all my readers to buy this company's software? Do I trust that company to support this format forever?

If you aren't sure about any of these questions -- please consider using an open format instead.