I needed a way to translate modern web pages, with accented characters and curly quotes ("smart quotes") and em dashes and Euro symbols and all that jazz, to plain ASCII.
You'd think this would be an easy thing to do, but amazingly, I couldn't find a tool for it anywhere.
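The usual standard-library trick shows why it isn't easy. Unicode compatibility decomposition (NFKD) followed by an ASCII encode that ignores failures strips accents nicely, but characters with no decomposition, like curly quotes, em dashes and the Euro sign, silently vanish. This is a sketch to illustrate the problem, not part of ununicode; `naive_ascii` is a hypothetical name.

```python
import unicodedata


def naive_ascii(text):
    """Strip accents via NFKD, then drop anything still outside ASCII."""
    # NFKD splits "é" into "e" plus a combining accent (and "…" into "..."),
    # so the encode step below keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" silently discards every character ASCII can't represent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(naive_ascii("Caf\u00e9 \u201cquoted\u201d \u2014 5\u20ac"))
# prints: Cafe quoted  5
```

The quotes, the dash and the Euro sign are gone without a trace, which is exactly the kind of loss a translation table avoids.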
Download: ununicode is now on GitHub as part of FeedMe: ununicode.py.
Or get the archived copy on shallowsky.com: ununicode-0.4.py.
To install it, copy it somewhere in your Python path (I use ~/bin) and name it ununicode.py.
Now, before you get all angry at how we provincial Americans don't care about the rest of the world, I love UTF-8 and I know what ISO-8859-15 means. I even know how to insert Spanish characters into my email with vim and mutt.
But some software can't handle it, like my PalmOS PDA, which I use as an ebook reader. I download RSS feeds from the web using my RSS reader, FeedMe.
I wish I could get all those lovely UTF-8 and ISO-8859-15 characters to display properly on my PalmOS 4.11 Clie, but after years of trying I've concluded it can't be done. So I need to translate files to ASCII before running Plucker.
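The file-level step can be sketched with nothing but the standard library: a `str.translate` table for the characters NFKD can't help with, NFKD to strip accents, and `errors="replace"` so any leftovers turn into `?` instead of crashing the write. The function name `file_to_ascii`, the tiny table, and the replacement choices (such as `EUR` for the Euro sign) are assumptions for illustration, not ununicode's actual behavior.

```python
import unicodedata

# Hypothetical replacement table: str.maketrans accepts a dict whose values
# may be multi-character strings (e.g. the em dash becomes "--").
REPLACEMENTS = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u20ac": "EUR",                # Euro sign
})


def file_to_ascii(src_path, dst_path, src_encoding="utf-8"):
    """Write an ASCII-only copy of src_path to dst_path."""
    with open(src_path, encoding=src_encoding) as src:
        text = src.read().translate(REPLACEMENTS)
    # Decompose accented letters, then drop the combining accent marks.
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(ch))
    # Anything still non-ASCII is written as "?" by the "replace" handler.
    with open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write(text)
```

For an ISO-8859-15 page, pass `src_encoding="iso-8859-15"`, then hand the ASCII copy to Plucker.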
Enter ununicode. It's a small Python module. It knows about some characters, but, more important, it lets you keep an error log for characters it doesn't know how to translate, so you can add them as you find out about them.
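The idea behind that error log can be sketched in a few lines: look each non-ASCII character up in a table, fall back to stripping accents, and record anything still unknown so it can be added to the table later. This is a hypothetical illustration, not ununicode's source; the names `to_ascii` and `ASCII_MAP`, the `?` placeholder, and the log format are assumptions.

```python
import io
import unicodedata

# Hypothetical table of characters we know how to translate.
ASCII_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u20ac": "EUR",                # Euro sign
    "\u00a0": " ",                  # non-breaking space
}


def to_ascii(text, errfile=None):
    """Translate text to ASCII, logging characters we can't handle to errfile."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in ASCII_MAP:
            out.append(ASCII_MAP[ch])
        else:
            # Try stripping accents: NFKD turns "é" into "e" + combining mark.
            base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
            if base:
                out.append(base)
            else:
                # Unknown: substitute "?" and log the code point and name
                # so the table can be extended later.
                out.append("?")
                if errfile is not None:
                    errfile.write("Unknown character U+%04X %s\n"
                                  % (ord(ch), unicodedata.name(ch, "(unnamed)")))
    return "".join(out)


log = io.StringIO()
print(to_ascii("Caf\u00e9 \u201cquoted\u201d \u2014 5\u20ac \u2603", log))
# prints: Cafe "quoted" -- 5EUR ?
print(log.getvalue(), end="")
# prints: Unknown character U+2603 SNOWMAN
```

Logging the character's Unicode name, not just its code point, makes it easier to decide what ASCII replacement to add to the table.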
Once it's installed, use it like this:
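import ununicode
decoded = ununicode.toascii(line, errfile)

Arguments:
line: the text to translate to ASCII.
errfile: where to log characters ununicode doesn't know how to translate, so you can add them later.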
Here's a minimal test script for it: testununicode.py. I'll be adding more samples to this as I encounter them.