Shallow Thoughts : tags : ebook

Akkana's Musings on Open Source Computing, Science, and Nature.

Sat, 09 Jun 2012

Viewing and modifying epub ebook tags

My epub Books folder is starting to look like my physical bookshelf at home -- huge and overflowing with books I hope to read some day. Mostly free books from the wonderful Project Gutenberg and DRM-free books from publishers and authors who support that model.

With the Nook's standard library viewer that's impossible to manage. All you can do is sort all those books alphabetically by title or author and laboriously page through, some five books to a page, hoping the one you want will catch your eye. Worse, sometimes books show up in the author view but don't show up in the title view, or vice versa. I guess Barnes & Noble think nobody keeps more than ten or so books on their shelves.

Fortunately on my rooted Nook I have the option of using better readers, like FBreader and Aldiko, that let me sort by tags. If I want to read something about the Civil War, or Astronomy, or just relax with some Science Fiction, I can browse by keyword.

Well, in theory. In practice, tagging of ebooks is inconsistent and not very useful.

For instance, the Gutenberg tags for Othello are:

while the tags for Vanity Fair are

The Prince and the Pauper's tag list looks like:

while Captains Courageous looks like

I can understand wanting to tag details like this, but few of those tags are helpful when I'm browsing books on my little handheld device. I can't imagine sitting down to read and thinking, "Let's see, what books do I have on Interracial marriage? Or Saltwater fishing? No, on second thought I'd rather read some fiction set in the time of Edward VI, King of England, 1537-1553."

And of course, with over 90 books loaded on my ebook readers, it means I have hundreds of entries in my tags list, with few of them including more than one book.

Clearly what I needed to do was to change the tags on my ebooks.

Viewing and modifying epub tags

That ought to be simple, right? But ebooks are still a very young technology, and there's surprisingly little software devoted to them. Calibre can probably do it if you don't mind maintaining your whole book collection under calibre; but I like to be able to work on files one at a time or in small groups. And I couldn't find a program that would let me do that.

What to do? Well, epub is a fairly simple XML format, right? So modifying it with Python shouldn't be that hard.

Managing epub in Python

An epub file is a collection of XML files packaged in a zip archive. So I unzipped one of my epub books and poked around. I found the tags in a file called content.opf, inside a <metadata> tag. They look like this:

<dc:subject>Science fiction</dc:subject>

So I could use Python's zipfile module to access the content.opf file inside the zip archive, then use the xml.dom.minidom parser to get to the tags. Writing a script to display existing tags was very easy.
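Here's a minimal sketch of that tag-display idea, assuming Python 3. It is not the actual script: the function names are made up for illustration, it looks for the first .opf file rather than hard-coding content.opf (the file isn't always named that or stored at the top level), and it assumes the subject elements use the usual dc: prefix.

```python
import zipfile
import xml.dom.minidom


def find_opf_name(zf):
    """Return the name of the first .opf member in the archive, or None."""
    for name in zf.namelist():
        if name.endswith(".opf"):
            return name
    return None


def get_epub_tags(epub_path):
    """Return the text of every <dc:subject> element in the epub's OPF file."""
    with zipfile.ZipFile(epub_path) as zf:
        opf_name = find_opf_name(zf)
        if opf_name is None:
            return []
        dom = xml.dom.minidom.parseString(zf.read(opf_name))

    tags = []
    # minidom matches on the qualified name, so this assumes the "dc:" prefix.
    for node in dom.getElementsByTagName("dc:subject"):
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        tags.append(text.strip())
    return tags
```

Calling get_epub_tags("somebook.epub") returns a plain list of strings, which is easy to print one per line.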

What about replacing the old, unwieldy tag list with new, simple tags?

It's easy enough to add nodes in Python's minidom. So the trick is writing it back to the epub file. The zipfile module doesn't have a way to modify a zip file in place, so I created a new zip archive and copied files from the old archive to the new one, replacing content.opf with a new version.
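The copy-and-replace step might look something like this. It's a sketch under a few assumptions: Python 3, a hypothetical helper name (replace_zip_member), and a temporary file next to the original that gets swapped into place at the end.

```python
import os
import zipfile


def replace_zip_member(epub_path, member_name, new_bytes):
    """Rewrite epub_path so that member_name contains new_bytes."""
    tmp_path = epub_path + ".tmp"
    with zipfile.ZipFile(epub_path) as izf, \
         zipfile.ZipFile(tmp_path, "w") as ozf:
        for info in izf.infolist():
            if info.filename == member_name:
                data = new_bytes
            else:
                data = izf.read(info)
            # Passing the original ZipInfo keeps each member's name, date
            # and compression type, and preserves the member order.
            ozf.writestr(info, data)
    # Swap the new archive into place only after it was written completely.
    os.replace(tmp_path, epub_path)
```

Copying in the original order matters for epub, since readers expect the mimetype entry to stay first in the archive.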

Python's difficulty with character sets in XML

But I hit a snag in writing the new content.opf. Python's XML classes have a toprettyxml() method to write the contents of a DOM tree. Seemed simple, and that worked for several ebooks ... until I hit one that contained a non-ASCII character. Then Python threw a UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 606: ordinal not in range(128).

Of course, there are ways (lots of them) to encode that output string -- I could do

ozf.writestr(info, dom.toprettyxml().encode(encoding, 'xmlcharrefreplace'))
or
ozf.writestr(info, dom.toprettyxml(encoding=encoding))
Except ... what should I pass as the encoding? The content.opf file started with its encoding:
<?xml version='1.0' encoding='UTF-8'?>
but Python's minidom offers no way to get that information. In fact, none of Python's XML parsers seem to offer this.

Since you need a charset to avoid the UnicodeEncodeError, the only options are (1) always use a fixed charset, like utf-8, for content.opf, or (2) open content.opf and parse the charset line by hand after Python has already parsed the rest of the file. Yuck! So I chose the first option ... I can always revisit that if the utf-8 in content.opf ever causes problems.
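For the record, here's roughly what both options look like, as a sketch assuming Python 3 (where toprettyxml() returns bytes when given an encoding). The regular expression and the helper names declared_encoding and serialize are mine, not part of minidom; option 2 just peeks at the declaration at the top of the raw file.

```python
import re
import xml.dom.minidom

# Matches the encoding in a leading declaration such as
# <?xml version='1.0' encoding='UTF-8'?>
ENCODING_RE = re.compile(
    rb"""^\s*<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(xml_bytes, default="utf-8"):
    """Option 2: return the charset named in the XML declaration, if any."""
    match = ENCODING_RE.match(xml_bytes)
    return match.group(1).decode("ascii") if match else default


def serialize(dom, encoding="utf-8"):
    """Option 1 by default: serialize a minidom tree to bytes as utf-8."""
    # With an encoding argument, toprettyxml() returns bytes and writes
    # a matching XML declaration at the top.
    return dom.toprettyxml(encoding=encoding)
```

With these, serialize(dom) is the fixed-charset choice, and serialize(dom, declared_encoding(raw_bytes)) keeps whatever charset the original content.opf declared.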

The final script

Charset difficulties aside, though, I'm quite pleased with my epubtag.py script. It's very handy to be able to print tags on any .epub file, and after cleaning up the tags on my ebooks, it's great to be able to browse by category in FBreader. Here's the program: epubtag.py.

[ 12:05 Jun 09, 2012    More programming | permalink to this entry | comments ]

Sat, 12 May 2012

Downloading Adobe-protected books to a Nook using Linux

University of Chicago Press has a Carl Zimmer book, A Planet of Viruses, as their free monthly e-book.

I know Zimmer is a good writer, but the ebook, despite being free, is encumbered with Adobe's version of DRM, which unlocks via a Windows or Mac program. I use Linux, and wanted to read the book on a Nook. Was I out of luck?

Happily, the instruction page they sent when I signed up for the book included a section for Linux users. Hooray, U. Chicago! It said Adobe Digital Editions will run under Wine, the Windows compatibility layer. I'd been meaning to try that anyway, and a Carl Zimmer book seemed like the perfect excuse.

And overall, it worked pretty well, with only a few snags. Here are the steps I had to follow:

Authorizing a book using Adobe Digital Editions in Linux on Wine

  1. Install wine (on Ubuntu, I used apt-get install wine).
  2. Download the Adobe Digital Editions setup.exe.
  3. Run: wine setup.exe (this should install ADE inside your .wine directory).
  4. Copy the file, e.g. URLLink.acsm, into .wine/drive_c/My\ Documents/. Don't bother trying to open it with ADE -- the program won't open anything except PDF and epub. Curiously, the only ways to open the file from ADE are to drag the file onto the ADE window or to pass it as a commandline argument:
     wine start .wine/drive_c/My\ Documents/URLLink.acsm

Now ADE should download your book and display it. You can read it there, if you want. But you won't want to -- it's not a good reading interface.

Authorizing a device with Adobe Digital Editions under Wine

Now how do you get it into your ebook reader? ADE running under Wine doesn't recognize devices such as ebook readers, so nothing will be copied automatically. But you can copy it manually.

In theory, the drive letter should stay mapped, so you should be able to use it for opening future books. Just remember to mount your device to the same location before running ADE under wine.

[ 10:03 May 12, 2012    More linux | permalink to this entry | comments ]

Tue, 26 Jul 2011

Nook Touch: the good, the bad, and the crazy

I've been dying to play with an ebook reader, and this week my mother got a new Nook Touch. That's not its official name, but since Barnes & Noble doesn't seem interested in giving it a model name to tell it apart from the two older Nooks, that's the name the internet seems to have chosen for this new small model with the 6-inch touchscreen.

Here's a preliminary review, based on a few days of playing with it.

Nice size, nice screen

The Nook Touch feels very light. It's a little heavier than a paperback, but it's easy to hold, and the rubbery back feels nice in the hand. The touchscreen works well enough for book reading, though you wouldn't want to try to play video games or draw pictures on it.

It's very easy to turn pages, either with the hardware buttons on the bezel or a tap on the edges of the screen. Page changes are much faster than with older e-ink readers like the original Nook or the Sony Pocket: the screen still flashes black with each page change, but only very briefly.

I'd wondered how a non-backlit e-ink display would work in dim light, since that's one thing you can't test in stores. It turns out it's not as good as a paper book -- neither as sharp nor as contrasty -- but still readable with my normal dim bedside lighting.

Changing fonts, line spacing and margins is easy once you figure out that you need to tap on the screen to get to that menu. Navigating within a book is also via that tap-on-page menu -- it gives you a progress meter you can drag, or a "jump to page" option. That's a good thing, because sadly it turns out to be very important (see below).

Searching within books isn't terribly convenient. I wanted to figure out from the user manual how to set a bookmark, and I couldn't find anything that looked helpful in the user manual's table of contents, so I tried searching for "bookmark". The search results don't show much context, so I had to try them one at a time, and there's no easy way to go back and try the next match. (Turns out you set a bookmark by tapping in the upper right corner, and then the bookmark applies to the next several pages.)

Plan to spend some quality time reading the full-length manual (provided as a pre-installed ebook, naturally) learning tricks like this: a lot of the UI isn't very discoverable (though it's simple enough once you learn it) so you'll miss a lot if you rely on what you can figure out by tapping around.

Off to a tricky start with minor Wi-fi issues

When we first powered up, we hit a couple of problems right off with wireless setup.

First, it had no way to set a static IP address. The only way we could get the Nook connected was to enable DHCP on the router.

But even then it wouldn't connect. We'd re-type the network password and hit "Connect"; the "Connect" button would flash a couple of times, leaving an "incorrect password" message at the top of the screen. This error message never went away, even after going back to the screen with the list of networks available, so it wasn't clear whether it was retrying the connection or not.

Finally through trial and error we found the answer: to clear a failed connection, you have to "Forget" the network and start over. So go back to the list of wireless networks, choose the right network, then tap the "Forget" button. Then go back and choose the network again and proceed to the connect screen.

Connecting to a computer

The Nook Touch doesn't come with much in the way of starter books -- just two public-domain titles, plus its own documentation -- so the first task was to download a couple of Project Gutenberg books that Mom had been reading on her Treo.

The Nook uses a standard micro-USB cable for both charging and its USB connection. Curiously, it shows up as a USB device with no partitions -- you have to mount sdb, not sdb1. Gnome handled that and mounted it without drama. Copying epub books to the Nook was just a matter of cp or drag-and-drop -- easy.

Getting library books may be moot

One big goal for this device is reading ebooks from the public library, and I had hoped to report on that. But it turns out to be a more difficult proposition than expected. There are all the expected DRM issues to surmount, but before that, there's the task of finding an ebook that's actually available to check out, getting the library's online credentials straightened out, and so forth. So that will be a separate article.

The fatal flaw: forgetting its position

Alas, the review is not all good news. While poking around, reading a page here and there, I started to notice that I kept getting reset back to the beginning of a book I'd already started. What was up?

For a while I thought it was my imagination. Surely remembering one's place in a book you're reading is fundamental to a device designed from the ground up as a book reader. But no -- it clearly was forgetting where I'd left off. How could that be?

It turns out this is a known and well reported problem with what B&N calls "side-loaded" content -- i.e. anything you load from your computer rather than download from their bookstore. With side-loaded books, apparently connecting the Nook to a PC causes it to lose its place in the book you're reading! (also discussed here and here).

There's no word from Barnes & Noble about this on any of the threads, but people writing about it speculate that when the Nook makes a USB connection, it internally unmounts its filesystems -- and forgets anything it knew about what was on those filesystems.

I know B&N wants to drive you to their site to buy all your books ... and I know they want to keep you online checking in with their store at every opportunity. But some people really do read free books, magazines and other "side loaded" content. An ebook reader that can't handle that properly isn't much of a reader.

It's too bad. The Nook Touch is a nice little piece of hardware. I love the size and light weight, the daylight-readable touchscreen, the fast page flips. Mom is being tolerant about her new toy, since she likes it otherwise -- "I'll just try to remember what page I was on." But come on, Barnes & Noble: a dedicated ebook reader that can't remember where you left off reading your book? Seriously?

[ 19:46 Jul 26, 2011    More tech | permalink to this entry | comments ]

Wed, 30 Mar 2011

Reading, converting and editing EPUB ebooks

Since switching to the Archos 5 Android tablet for my daily feed reading, I've also been using it to read books in EPUB format.

There are tons of places to get EPUB ebooks -- I won't try to list them all, but Project Gutenberg is a good place to start. The next question was how to read them.

Reading EPUB books: Aldiko or FBReader

I've already mentioned Aldiko in my post on Android as an RSS reader. It's not so good for reading short RSS feeds, but it's excellent for ebooks.

But Aldiko has one fatal flaw: it insists on keeping its books in one place, and you can't change it. When I tried to add a big technical book, Aldiko spun for several minutes with no feedback, then finally declared it was out of space on the device. Frustrating, since I have a nearly empty 8-gigabyte micro-SD card and there's no way to get Aldiko to use it. Fiddling with symlinks didn't help.

A reader gave me a tip a while back that I should check out FBReader. I'd been avoiding it because of a bad experience with the early FBReader on the Nokia 770 -- but it's come a long way since then, and FBReaderJ, the Android port, works very nicely. It's as good a reader as Aldiko (except I wish the line spacing were more configurable). It has better navigation: I can see how far along in the book I am or jump to an arbitrary point, tasks Aldiko makes quite difficult. Most important, it lets me keep my books anywhere I want them. Plus it's open source.

Creating EPUB books: Calibre and ebook-convert

I hadn't had the tablet for long before I encountered an article that was only available as PDF. Wouldn't it be nice to read it on my tablet?

Of course, Android has lots of PDF readers. But most of them aren't smart about things like rewrapping lines or changing fonts and colors, so it's an unpleasant experience to try to read PDF on a five-inch screen. Could I convert the PDF to an EPUB?

Sadly, there aren't very many open-source options for handling EPUB. For converting from other formats, you have one choice: Calibre. It's a big complex GUI program for organizing your ebook library and a whole bunch of other things I would never want to do, and it has a ton of prerequisites, like Qt4. But the important thing is that it comes with a small Python script called ebook-convert.

ebook-convert has no built-in help -- it takes lots of options, but to find out what they are, you have to go to the ebook-convert page on Calibre's site. But here's all you typically need:

ebook-convert --authors "Mark Twain" --title "Huckleberry Finn" infile.pdf huckfinn.epub
Update: They've changed the syntax as of Calibre v. 0.7.44, and now it insists on having the input and output filenames first:
ebook-convert infile.pdf huckfinn.epub --authors "Mark Twain" --title "Huckleberry Finn"

Pretty easy; the only hard part is remembering that it's --authors and not --author.
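If you convert a lot of files, it's tempting to wrap that command line in a few lines of Python. This is only a sketch: build_convert_command and convert are hypothetical helper names, it uses just the two options shown above, and it assumes the newer filenames-first syntax and that ebook-convert is on your PATH.

```python
import subprocess


def build_convert_command(infile, outfile, authors=None, title=None):
    """Build an ebook-convert argument list, with the filenames first."""
    cmd = ["ebook-convert", infile, outfile]
    if authors:
        cmd += ["--authors", authors]   # note: --authors, not --author
    if title:
        cmd += ["--title", title]
    return cmd


def convert(infile, outfile, **metadata):
    """Run ebook-convert; raises CalledProcessError if the command fails."""
    subprocess.run(build_convert_command(infile, outfile, **metadata),
                   check=True)
```

Passing the arguments as a list means titles and author names with spaces don't need any shell quoting.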

Calibre (and ebook-convert) can take lots of different input formats, not just PDF. If you're converting ebooks, you need it. I wish ebook-convert was available by itself, so I could run it on a server; I took a quick stab at separating it, but even once I separated out the Qt parts it still required Python libraries not yet available on Debian stable. I may try again some day, but for now, I'll stick to running it on desktop systems.

Editing EPUB books: Sigil

But we're not quite done yet. Calibre and ebook-convert do a fairly good job, but they're not perfect. When I tried converting my GIMP book from a PDF, the chapter headings were a mess and there was no table of contents. And of course I wanted the cover page to be right, instead of the default Calibre image. I needed a way to edit it.

EPUB is an XML format, so in theory I could have fixed this with a text editor, but I wanted to avoid that if possible.

And I found Sigil. Wikipedia claims it's the only application that can edit EPUB books.

There's no sigil package in Ubuntu (though Arch has one), but it was very easy to install from the sigil website.

And it worked beautifully. I cleaned up the font problems at the beginnings of chapters, added chapter breaks where they were missing, and deleted headings that didn't belong. Then I had Sigil auto-generate a table of contents from headers in the document. I was also able to fix the title and put the real book cover on the title page.

It all worked flawlessly, and the ebook I generated with Sigil looks very nice and has working navigation when I view it in FBReaderJ (it's still too big for Aldiko to handle). Very impressive. If you've ever wanted to generate your own ebook, or edit one you already have, you should definitely check out Sigil.

[ 10:17 Mar 30, 2011    More tech | permalink to this entry | comments ]

Mon, 20 Dec 2010

Android tablet as an ebook/RSS reader

I reviewed my Archos 5 Android tablet last week, but I didn't talk much about my main use for it: offline reading of news, RSS feeds and ebooks.

I've been searching for years for something to replace the aging and unsupported Palm platform. I've been using Palms for many years to read daily news feeds; first on the proprietary Avantgo service, but mostly using the open source Plucker.

I don't normally have access to a network when I'm reading -- I might be a passenger in a car or train, in a restaurant, standing in line at the market, or in the middle of the Mojave desert. So I run a script once a day on a network-connected computer to gather up a list of feeds, package it up and transfer it to the mobile device, so I have something to read whenever I find spare time.

For years I used Sitescooper on the host end to translate HTML pages into a mobile format, and eventually became its primary maintainer. But that got cumbersome, and I wrote a simpler RSS feed reader, feedme.

But on the reader side, that still left me buying old PalmOS Clies on ebay. Surely there was a better option?

I've been keeping an eye on ebook readers and tablets for a while now. But the Plucker reader has several key features not available in most ebook reader apps:

  1. An easy, open-source way of automatically translating RSS and HTML pages into something the reader can understand;
  2. Delete documents after you've read them, without needing to switch to a separate application;
  3. Random access within a document, e.g. jump to the beginning or end, or 60% in;
  4. Follow links: nearly all RSS sites, whether news sites or blogs, are set up as an index page with links to individual story pages;
  5. Save external links if you click on them while offline, so you can fetch them later.

Most modern apps seem to assume either (a) that you'll be reading only books packaged commercially, or (b) that you're reading web pages and always have a net connection. Which meant that I'd probably have to roll my own; and that pointed to Android tablets rather than dedicated ebook readers.

Android as a reader

All the reviews I read pointed to Aldiko as the best e-reader on Android, so I installed it first thing. And indeed, it's a wonderful reader. The font is beautiful, and you can adjust size and color easily, including a two-click transition between configurable "day" and "night" schemes. It's easy to turn pages (something surprisingly difficult in most Android apps, since the OS seems to have no concept of "Page down"). It's easy to import new documents and easy to delete them after reading them.

So how about those other requirements? Not so good. Aldiko uses epub format, and it's possible (after much research) to produce those using ebook-convert, a command-line script you can get as part of the huge Calibre package. Alas, Calibre requires all sorts of extraneous packages like Qt even if you're never going to use the GUI; but once you get past that, the ebook-convert script works pretty well.

Except that links don't work, much. Sometimes they do, but mostly they do nothing. I don't know if this is a problem with Calibre's ebook-convert, Aldiko's reader, or the epub format itself, but you can't rely on links from the index page actually jumping anywhere. Aldiko also doesn't have a way to jump to a set point, so once you're inside a story you can't easily go back to the title page (sometimes BACK works, sometimes it doesn't).

And of course there's no way to save external links for later.

So Aldiko is a good book reader, but it wouldn't solve my feed-reading problem.

And that meant I had to write my own reader, and it was time to delve into the world of Android development. And it was surprisingly easy ... which I'll cover in a separate post. For now, I'll skip ahead and ruin the punch line by saying I have a lovely little feed-reading app, and my Archos and Android are working out great.

[ 14:14 Dec 20, 2010    More tech | permalink to this entry | comments ]

Wed, 15 Dec 2010

Archos 5 Android Tablet review


For the past couple weeks I've been using a small Android tablet, an Archos 5. I use it primarily as an ebook and RSS feed reader (more about that separately), though of course I've played with assorted games and other apps too.

I've been trying to wait for the slew of cheap Android tablets the media assure us is coming out any day now. Except "any day now" never turns into "now". And I wanted something suitable for reading: small enough to fit in a jacket pocket and hold in one hand, yet large enough to fit a reasonable amount of text on the screen. A 4-5-inch screen seemed ideal.

There's nothing in the current crop fitting that description, but there's a year-old model, the Archos 5. It has a 4.8-inch screen, plus some other nice hardware like GPS. And it seems to have a fair community behind it, at archosfans.com.

I have the 16G flash version. I've had it for a couple of weeks now and I'm very happy so far. I'm not sure I'd recommend it to a newbie (due to the Android Marketplace's ban on tablets -- see below), but it's a lovely toy for someone fairly tech savvy.

My review turned out quite long, too long for a blog post. So if you're interested in the details of what's good and what's bad, you'll find the details in my Archos 5 Android Tablet review.

[ 21:22 Dec 15, 2010    More tech | permalink to this entry | comments ]
