Shallow Thoughts : tags : ebook
Akkana's Musings on Open Source Computing, Science, and Nature.
Sat, 09 Jun 2012
My epub Books folder is starting to look like my physical bookshelf at
home -- huge and overflowing with books I hope to read some day.
Mostly free books from the wonderful
Project Gutenberg and
DRM-free books from publishers and authors who support that model.
With the Nook's standard library viewer that's impossible to manage.
All you can do is sort all those books alphabetically by title or author
and laboriously page through, some five books to a page, hoping the
one you want will catch your eye. Worse, sometimes books show up in
the author view but don't show up in the title view, or vice versa.
I guess Barnes & Noble think nobody keeps more than ten or so
books on their shelves.
Fortunately on my rooted Nook I have the option of using better
readers, like FBReader and Aldiko, that let me sort by tags.
If I want to read something about the Civil War, or Astronomy, or just
relax with some Science Fiction, I can browse by keyword.
Well, in theory. In practice, tagging of ebooks is inconsistent
and not very useful.
For instance, the Gutenberg tags for Othello are:
- Othello (Fictitious character) -- Drama
- Jealousy -- Drama
- Interracial marriage -- Drama
- Venice (Italy) -- Drama
- Muslims -- Drama
while the tags for Vanity Fair are
- England -- Fiction
- Married women -- Fiction
- Female friendship -- Fiction
- Social classes -- Fiction
- British -- Europe -- Fiction
- Waterloo, Battle of, Waterloo, Belgium, 1815 -- Fiction
The Prince and the Pauper's tag list looks like:
- Edward VI, King of England, 1537-1553 -- Fiction
- Impostors and imposture -- Fiction
- Social classes -- Fiction
- Poor children -- Fiction
- Lookalikes -- Fiction
- Princes -- Fiction
- Boys -- Fiction
- London (England) -- Fiction
- Historical fiction
while Captains Courageous looks like
- Sea stories
- Saltwater fishing -- Fiction
- Children of the rich -- Fiction
- Fishing boats -- Fiction
- Teenage boys -- Fiction
- Rescues -- Fiction
- Fishers -- Fiction
- Grand Banks of Newfoundland -- Fiction
I can understand wanting to tag details like this, but
few of those tags are helpful when I'm browsing books on
my little handheld device. I can't imagine sitting
down to read and thinking,
"Let's see, what books do I have on Interracial marriage? Or Saltwater
fishing? No, on second thought I'd rather read some fiction set in the
time of Edward VI, King of England, 1537-1553."
And of course, with over 90 books loaded on my ebook readers, it means
I have hundreds of entries in my tags list,
with few of them including more than one book.
Clearly what I needed to do was to change the tags on my ebooks.
Viewing and modifying epub tags
That ought to be simple, right? But ebooks are still a very young
technology, and there's surprisingly little software devoted to them.
Calibre can probably do it if you don't mind maintaining your whole
book collection under calibre; but I like to be able to work on files
one at a time or in small groups. And I couldn't find a program that
would let me do that.
What to do? Well, epub is a fairly simple XML format, right?
So modifying it with Python shouldn't be that hard.
Managing epub in Python
An epub file is a collection of XML files packaged in a zip archive.
So I unzipped one of my epub books and poked around. I found the tags
in a file called content.opf, inside a <metadata> tag.
They look like this:
So I could use Python's zipfile module
to access the content.opf file inside the zip archive, then use the
xml.dom.minidom parser to get to the tags. Writing a script to display existing tags
was very easy.
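Here's a minimal sketch of that display step, not the actual epubtags.py. It assumes the tags are stored as <dc:subject> elements (the standard Dublin Core field in an OPF file) and that the metadata file ends in .opf. The function names are made up for illustration.

```python
import zipfile
from xml.dom import minidom


def find_opf_name(zf):
    """Return the archive name of the first .opf file (content.opf in my books)."""
    for name in zf.namelist():
        if name.endswith(".opf"):
            return name
    raise ValueError("no .opf file found in archive")


def read_tags(epub):
    """Return the text of every <dc:subject> element in the book's OPF file.

    `epub` may be a filename or a file-like object; zipfile accepts both.
    """
    with zipfile.ZipFile(epub) as zf:
        dom = minidom.parseString(zf.read(find_opf_name(zf)))
    tags = []
    # Assumption: tags live in <dc:subject> elements, e.g.
    #   <dc:subject>Jealousy -- Drama</dc:subject>
    for node in dom.getElementsByTagName("dc:subject"):
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        tags.append(text.strip())
    return tags
```

Calling read_tags("othello.epub") (a hypothetical filename) would return the subject strings as a list, one per tag.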
What about replacing the old, unwieldy tag list with new, simple tags?
It's easy enough to add nodes in Python's minidom.
So the trick is writing it back to the epub file.
The zipfile module doesn't have a way to modify a zip file
in place, so I created a new zip archive and copied files from the
old archive to the new one, replacing content.opf with a new version.
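The copy-and-replace step might look something like this. It's a sketch under my own assumptions, not the code from the final script: copy_with_replacement is a made-up name, and it reuses each entry's ZipInfo so that names and compression settings carry over to the new archive.

```python
import zipfile


def copy_with_replacement(src, dst, member, new_data):
    """Copy every entry of zip `src` into a new zip `dst`,
    substituting new_data (bytes) for the entry named `member`.
    """
    with zipfile.ZipFile(src) as izf, zipfile.ZipFile(dst, "w") as ozf:
        for info in izf.infolist():
            if info.filename == member:
                data = new_data
            else:
                data = izf.read(info.filename)
            # Passing the ZipInfo keeps the original name, timestamp
            # and compression type for each entry.
            ozf.writestr(info, data)
```

Entries are written in their original order, which matters for epub because the mimetype file is supposed to come first in the archive.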
Python's difficulty with character sets in XML
But I hit a snag in writing the new content.opf.
Python's XML classes have a toprettyxml() method to write the contents
of a DOM tree. Seemed simple, and that worked for several ebooks ...
until I hit one that contained a non-ASCII character. Then Python threw
UnicodeEncodeError: 'ascii' codec can't encode character
u'\u2014' in position 606: ordinal not in range(128).
Of course, there are ways (lots of them) to encode that output string --
I could do
ozf.writestr(info, dom.toprettyxml().encode(encoding, 'xmlcharrefreplace'))
Except ... what should I pass as the encoding?
The content.opf file started with its encoding:
<?xml version='1.0' encoding='UTF-8'?>
but Python's minidom offers no way to get that information.
In fact, none of Python's XML parsers seem to offer this.
Since you need a charset to avoid the UnicodeEncodeError,
the only options are (1) always use a fixed charset, like utf-8,
for content.opf, or (2) open content.opf and parse the
charset line by hand after Python has already parsed the rest of the file.
Yuck! So I chose the first option ... I can always revisit that if the utf-8
in content.opf ever causes problems.
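For what it's worth, here's roughly what option (1) looks like. This sketch assumes Python 3, where toxml(encoding=...) returns bytes, and it assumes the tags are <dc:subject> elements inside an unprefixed <metadata> element. set_tags is a hypothetical helper name, not a function from the script.

```python
from xml.dom import minidom


def set_tags(opf_bytes, new_tags):
    """Return content.opf as UTF-8 bytes with all <dc:subject>
    elements replaced by one element per string in new_tags.
    """
    dom = minidom.parseString(opf_bytes)
    # Assumption: the metadata element has no namespace prefix.
    metadata = dom.getElementsByTagName("metadata")[0]

    # Remove the old tags.
    for old in list(dom.getElementsByTagName("dc:subject")):
        old.parentNode.removeChild(old)

    # Add the new, simpler ones.
    for tag in new_tags:
        node = dom.createElement("dc:subject")
        node.appendChild(dom.createTextNode(tag))
        metadata.appendChild(node)

    # Always write UTF-8, whatever the original declaration said.
    return dom.toxml(encoding="utf-8")
```

Because the result is already encoded bytes, it can be handed straight to writestr() without hitting the ascii codec.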
The final script
Charset difficulties aside, though, I'm quite pleased with my epubtags.py
script. It's very handy to be able to print tags on any .epub file,
and after cleaning up the tags on my ebooks, it's great to be
able to browse by category in FBReader. Here's the program:
[ 12:05 Jun 09, 2012 | More programming ]
Sat, 12 May 2012
University of Chicago Press has a
Carl Zimmer book,
A Planet of Viruses, as their free monthly e-book.
I know Zimmer is a good writer, but the ebook, despite being free, is
encumbered with Adobe's version of DRM, which unlocks via a Windows
or Mac program. I use Linux, and wanted to read the book on a Nook.
Was I out of luck?
Happily, the instruction page they sent when I signed up
for the book helpfully included a section for Linux users. Hooray,
U. Chicago! It said Adobe Digital Editions will run under Wine,
the Windows compatibility layer.
I'd been meaning to try that anyway, and a Carl Zimmer book seemed
like the perfect excuse.
And overall, it worked pretty well, with only a few snags.
Here are the steps I had to follow:
Authorizing a book using Adobe Digital Editions in Linux on Wine
- Install wine (on Ubuntu, I used
apt-get install wine).
- Download the Adobe Digital Editions setup.exe and run it under Wine
(this should install ADE inside your .wine directory).
- Copy the downloaded book file, e.g. URLLink.acsm, into .wine/drive_c/My\ Documents/
- Don't bother trying to open it with ADE -- the program won't open
anything except PDF and epub. Curiously, the only ways to get the
file into ADE are to drag the file onto the ADE window or to pass
it as a commandline argument:
wine start .wine/drive_c/My\ Documents/URLLink.acsm
- Now ADE should download your book and display it.
You can read it there, if you want. But you won't want to -- it's not
a good reading interface.
Authorizing a device with Adobe Digital Editions under Wine
Now how do you get it into your ebook reader?
ADE running under Wine doesn't recognize devices such as ebook readers,
so nothing will be copied automatically. But you can copy it manually.
- Plug in your ebook reader.
- Mount the device wherever you like -- /media/nook, /nook or whatever.
- With ADE not running (quit it if it's running),
map a drive letter to the mount point:
- Run winecfg
- Click the Drives tab
- Click Add...
- Choose a drive letter (I chose D:)
- Under Path: type in the mount point, like /nook
- Click Show Advanced
- Set the Type: to Floppy disk
- Click OK to save it
- Now the drive is mapped. Re-run ADE:
wine .wine/drive_c/Program\ Files/Adobe/Adobe\ Digital\ Editions/digitaleditions.exe
ADE should now see the device and ask you if you want to authorize it.
- In ADE, drag the chosen book onto the left sidebar entry for the device.
- umount the reader ... and now your new book should show up in the library.
In theory, the drive letter should stay mapped, so you should be able
to use it for opening future books.
Just remember to mount your device to the same location before
running ADE under wine.
[ 10:03 May 12, 2012 | More linux ]
Tue, 26 Jul 2011
I've been dying to play with an ebook reader, and this week my mother
got a new Nook Touch. That's not its official name,
but since Barnes & Noble doesn't seem interested in giving it a
model name to tell it apart from the two older Nooks, that's the name
the internet seems to have chosen for this new small model with the
touchscreen.
Here's a preliminary review, based on a few days of playing with it.
Nice size, nice screen
The Nook Touch feels very light. It's a little heavier than a
paperback, but it's easy to hold, and the rubbery back feels nice in
the hand. The touchscreen works well enough for book reading, though
you wouldn't want to try to play video games or draw pictures on it.
It's very easy to turn pages, either with the hardware buttons on the
bezel or a tap on the edges of the screen. Page changes are
much faster than with older e-ink readers like the original Nook or the
Sony Pocket: the screen still flashes black with each page change,
but only very briefly.
I'd wondered how a non-backlit e-ink display would work in dim light,
since that's one thing you can't test in stores. It turns out it's
not as good as a paper book -- neither as sharp nor as contrasty -- but
still readable with my normal dim bedside lighting.
Changing fonts, line spacing and margins is easy once you figure out
that you need to tap on the screen to get to that menu.
Navigating within a book is also via that tap-on-page menu -- it gives
you a progress meter you can drag, or a "jump to page" option. Which is
a good thing. This is sadly very important (see below).
Searching within books isn't terribly convenient. I wanted to figure
out from the user manual how to set a bookmark, and I couldn't find
anything that looked helpful in the user manual's table of contents,
so I tried searching for "bookmark". The search results don't show much
context, so I had to try them one at a time, and
there's no easy way to go back and try the next match.
(Turns out you set a bookmark by tapping in the upper right corner,
and then the bookmark applies to the next several pages.)
Plan to spend some quality time reading the full-length manual
(provided as a pre-installed ebook, naturally) learning tricks like this:
a lot of the UI isn't very discoverable (though it's simple enough
once you learn it) so you'll miss a lot if you rely on what you can
figure out by tapping around.
Off to a tricky start with minor Wi-fi issues
When we first powered up, we hit a couple of problems right off with
Wi-fi.
First, it had no way to set a static IP address. The only way we
could get the Nook connected was to enable DHCP on the router.
But even then it wouldn't connect. We'd re-type the network
password and hit "Connect"; the "Connect" button would flash
a couple of times, leaving an "incorrect password" message at the top
of the screen. This error message never went away, even after going
back to the screen with the list of networks available, so it wasn't
clear whether it was retrying the connection or not.
Finally through trial and error we found the answer: to clear a
failed connection, you have to "Forget" the network and start over.
So go back to the list of wireless networks, choose the right network,
then tap the "Forget" button. Then go back and choose the network
again and proceed to the connect screen.
Connecting to a computer
The Nook Touch doesn't come with much in the way of starter books --
just two public-domain titles, plus its own documentation -- so the
first task was to download a couple of
Project Gutenberg books that
Mom had been reading on her Treo.
The Nook uses a standard micro-USB cable for both charging and its
USB connection. Curiously, it shows up as a USB device with no
partitions -- you have to mount sdb, not sdb1. Gnome handled that
and mounted it without drama. Copying epub books to the Nook was just
a matter of cp or drag-and-drop -- easy.
Getting library books may be moot
One big goal for this device is reading ebooks from the public library,
and I had hoped to report on that.
But it turns out to be a more difficult proposition than expected.
There are all the expected DRM issues to surmount, but before that,
there's the task of finding an ebook that's actually available to
check out, getting the library's online credentials straightened
out, and so forth. So that will be a separate article.
The fatal flaw: forgetting its position
Alas, the review is not all good news. While poking around, reading
a page here and there, I started to notice that I kept getting reset
back to the beginning of a book I'd already started. What was up?
For a while I thought it was my imagination. Surely remembering one's
place in a book you're reading is fundamental to a device designed from
the ground up as a book reader. But no -- it clearly was forgetting
where I'd left off. How could that be?
It turns out this is a known and well reported problem
with what B&N calls "side-loaded" content -- i.e. anything
you load from your computer rather than download from their bookstore.
With side-loaded books, apparently connecting
the Nook to a PC causes it to lose its place in the book you're
reading!
There's no word from Barnes & Noble about this on any of the threads,
but people writing about it speculate that when the Nook makes a USB
connection, it internally unmounts its filesystems -- and
forgets anything it knew about what was on those filesystems.
I know B&N wants to drive you to their site to buy all your books
... and I know they want to keep you online checking in with their
store at every opportunity. But some people really do read free
books, magazines and other "side loaded" content. An ebook reader
that can't handle that properly isn't much of a reader.
It's too bad. The Nook Touch is a nice little piece of hardware. I love
the size and light weight, the daylight-readable touchscreen, the fast
page flips. Mom is being tolerant about her new toy, since she likes it
otherwise -- "I'll just try to remember what page I was on."
But come on, Barnes & Noble: a dedicated ebook reader
that can't remember where you left off reading your book? Seriously?
[ 19:46 Jul 26, 2011 | More tech ]
Wed, 30 Mar 2011
Since switching to the
Archos 5 Android tablet
for my daily feed reading, I've also been using it to read books in EPUB format.
There are tons of places to get EPUB ebooks -- I won't try
to list them all, but Project Gutenberg
is a good place to start. The next question was how to read them.
Reading EPUB books: Aldiko or FBReader
I've already mentioned Aldiko in my post on
Android as an RSS reader. It's not so good for reading short RSS feeds,
but it's excellent for ebooks.
But Aldiko has one fatal flaw: it insists on keeping its books in one
place, and you can't change it. When I tried to add a big
technical book, Aldiko spun for several minutes with no feedback,
then finally declared it was out of space on the device. Frustrating,
since I have a nearly empty 8-gigabyte micro-SD card and there's no
way to get Aldiko to use it. Fiddling with symlinks didn't help.
A reader gave me a tip a while back that I should check out FBReader.
I'd been avoiding it because of a bad experience with the
early FBReader on the Nokia 770 -- but it's come a long way since then,
and FBReaderJ, the Android port, works very nicely. It's as good a
reader as Aldiko (except I wish the line spacing were more
configurable). It has better navigation: I can see how far along in
the book I am or jump to an arbitrary point, tasks Aldiko makes quite
difficult. Most important, it lets me keep my books anywhere I want them.
Plus it's open source.
Creating EPUB books: Calibre and ebook-convert
I hadn't had the tablet for long before I encountered an article that was only
available as PDF. Wouldn't it be nice to read it on my tablet?
Of course, Android has lots of PDF readers. But most of them aren't
smart about things like rewrapping lines or changing fonts and colors,
so it's an unpleasant experience to try to read PDF on a five-inch screen.
Could I convert the PDF to an EPUB?
Sadly, there aren't very many open-source options for handling EPUB.
For converting from other formats, you have one choice: Calibre.
It's a big complex GUI program for organizing your ebook library and a
whole bunch of other things I would never want to do, and it has a ton
of prerequisites, like Qt4.
But the important thing is that it comes with a small Python
script called ebook-convert.
ebook-convert has no built-in help -- it takes lots of options,
but to find out what they are, you have to go to the
ebook-convert page on Calibre's site. But here's all you typically need:
ebook-convert --authors "Mark Twain" --title "Huckleberry Finn" infile.pdf huckfinn.epub
They've changed the syntax as of Calibre v. 0.7.44, and now it insists
on having the input and output filenames first:
ebook-convert infile.pdf huckfinn.epub --authors "Mark Twain" --title "Huckleberry Finn"
Pretty easy; the only hard part is remembering that it's --authors
and not --author.
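If you convert a lot of files, it's easy to wrap that command in a few lines of Python. This is only a sketch: it assumes ebook-convert is on your PATH and uses the newer filenames-first syntax, and both function names are made up for illustration.

```python
import subprocess


def build_convert_command(infile, outfile, authors=None, title=None):
    """Build an ebook-convert argument list, filenames first."""
    cmd = ["ebook-convert", infile, outfile]
    if authors:
        cmd += ["--authors", authors]   # note: --authors, not --author
    if title:
        cmd += ["--title", title]
    return cmd


def convert(infile, outfile, authors=None, title=None):
    """Run ebook-convert; raises CalledProcessError if it fails."""
    cmd = build_convert_command(infile, outfile, authors, title)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list means titles and author names with spaces don't need any shell quoting.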
Calibre (and ebook-convert) can take lots of different input formats,
not just PDF. If you're converting ebooks, you need it. I wish
ebook-convert was available by itself, so I could run it on a server;
I took a quick stab at separating it, but even once I separated out
the Qt parts it still required Python libraries not yet available on
Debian stable. I may try again some day, but for now, I'll stick to
running it on desktop systems.
Editing EPUB books: Sigil
But we're not quite done yet. Calibre and ebook-convert do a fairly
good job, but they're not perfect. When I tried converting
my GIMP book from a PDF,
the chapter headings were a mess and there was no table of contents.
And of course I wanted the cover page to be right, instead of the
default Calibre image. I needed a way to edit it.
EPUB is an XML format, so in theory I could have fixed this with a
text editor, but I wanted to avoid that if possible.
And I found Sigil.
Wikipedia claims it's the only
application that can edit EPUB books.
There's no sigil package in Ubuntu (though Arch has one), but it was
very easy to install from the sigil website.
And it worked beautifully. I cleaned
up the font problems at the beginnings of chapters, added chapter
breaks where they were missing, and deleted headings that didn't belong.
Then I had Sigil auto-generate a table of contents from headers in the
document. I was also able to fix the title and put the real book cover
on the title page.
It all worked flawlessly, and the ebook I generated with Sigil looks
very nice and has working navigation when I view it in FBReaderJ
(it's still too big for Aldiko to handle).
Very impressive. If you've ever wanted to generate your own ebook, or
edit one you already have, you should definitely check out Sigil.
[ 10:17 Mar 30, 2011 | More tech ]
Mon, 20 Dec 2010
I reviewed my
Archos 5 Android tablet
last week, but I didn't talk much about my main use for it:
offline reading of news, RSS feeds and ebooks.
I've been searching for years for something to replace the aging and
unsupported Palm platform. I've been using Palms for many years to
read daily news feeds; first on the proprietary Avantgo service,
but mostly using the open source Plucker.
I don't normally have access to a network when I'm reading -- I might
be a passenger in a car or train, in a restaurant, standing in line at
the market, or in the middle of the Mojave desert.
So I run a script once a day on a network-connected computer to gather
up a list of feeds, package it up and transfer it to the mobile
device, so I have something to read whenever I find spare time.
For years I used Sitescooper
on the host end to translate HTML
pages into a mobile format, and eventually became its primary maintainer.
But that got cumbersome, and I wrote a simpler RSS feed reader.
But on the reader side, that still left me buying old
PalmOS Clies on ebay. Surely there was a better option?
I've been keeping an eye on ebook readers and tablets for a while now.
But the Plucker reader has several key features not available
in most ebook reader apps:
- An easy, open-source way of automatically translating RSS and HTML
pages into something the reader can understand;
- Delete documents after you've read them, without needing to switch
to a separate application;
- Random access to the document, e.g. jump to the beginning or end;
- Follow links: nearly all RSS sites, whether news sites or blogs,
are set up as an index page with links to individual story pages;
- Save external links if you click on them while offline,
so you can fetch them later.
Most modern apps seem to assume either (a) that you'll be reading only
books packaged commercially, or (b) that you're reading web pages and
always have a net connection. Which meant that I'd probably have to
roll my own; and that pointed to Android tablets rather than dedicated
ebook readers.
Android as a reader
All the reviews I read pointed to Aldiko
as the best e-reader on Android,
so I installed it first thing. And indeed, it's a wonderful reader.
The font is beautiful, and you can adjust size and color easily,
including a two-click transition between configurable "day" and "night"
schemes. It's easy to turn pages (something surprisingly difficult
in most Android apps, since the OS seems to have no concept of
"Page down"). It's easy to import new documents and easy to delete
them after reading them.
So how about those other requirements? Not so good. Aldiko uses epub format,
and it's possible (after much research) to produce those using
ebook-convert, a command-line script you can get as part of the
huge Calibre package. Alas, Calibre requires all sorts of
extraneous packages like Qt even if you're never going to use the GUI;
but once you get past that, the ebook-convert script works pretty well.
Except that links don't work, much. Sometimes they do, but mostly they
do nothing. I don't know if this is a problem with Calibre's ebook-convert,
Aldiko's reader, or the epub format itself, but you can't rely on links
from the index page actually jumping anywhere. Aldiko also doesn't have
a way to jump to a set point, so once you're inside a story you can't
easily go back to the title page (sometimes BACK works, sometimes it doesn't).
And of course there's no way to save external links for later.
So Aldiko is a good book reader, but it wouldn't solve my feed-reading
problem.
And that meant I had to write my own reader, and it was time to delve
into the world of Android development. And it was surprisingly easy ...
which I'll cover in a separate post. For now, I'll skip ahead and
ruin the punch line by saying I have a lovely little feed-reading app,
and my Archos and Android are working out great.
[ 14:14 Dec 20, 2010 | More tech ]
Wed, 15 Dec 2010
For the past couple weeks I've been using a small Android tablet,
an Archos 5. I use it primarily as an ebook and RSS feed reader
(more about that separately), though of course I've played with
assorted games and other apps too.
I've been trying to wait for the slew of cheap Android tablets
the media assure us is coming out any day now. Except "any day now"
never turns into "now". And I wanted something suitable for reading:
small enough to fit in a jacket pocket and hold in one hand, yet
large enough to fit a reasonable amount of text on the screen.
A 4-5-inch screen seemed ideal.
There's nothing in the current crop fitting that description, but
there's a year-old model, the Archos 5. It has a 4.8-inch screen,
plus some other nice hardware like GPS.
And it seems to have a fair community behind it.
I have the 16G flash version.
I've had it for a couple of weeks now and I'm very happy so far.
I'm not sure I'd recommend it to a newbie (due to the Android Marketplace's
ban on tablets -- see below), but it's a lovely toy for someone fairly
technical.
My review turned out quite long, too long for a blog post.
So if you're interested in the details of what's good and what's bad,
you'll find the details in my
Archos 5 Android review.
[ 21:22 Dec 15, 2010 | More tech ]