Sitescooper that finds new pages on my list of sites, downloads them, then runs Plucker to translate them into Plucker's open Palm-compatible ebook format.
Sitescooper has an elaborate series of rules for trying to get around the complicated formatting in modern HTML web pages. It has an elaborate cache system to figure out what it's seen before. When sites change their design (which most news sites seem to do roughly monthly), it means going in and figuring out the new format and writing a new Sitescooper site file. And it doesn't understand RSS, so you can't use the simplified RSS that most sites offer. Finally, it's no longer maintained; in fact, I was the last maintainer, after the original author lost interest.
Several weeks ago, bma tweeted about a Python RSS reader he'd hacked up using the feedparser package. His reader targeted email, not Palm, but finding out about feedparser was enough to get me started. So I wrote FeedMe (Carla Schroder came up with the all-important name).
I've been using it for a couple of weeks now and I'm very happy with the results. It's still quite rough, of course, but it's already producing better files than Sitescooper did, and it seems more maintainable. Time will tell.
Of course it needs to be made more flexible, adjusted so that it can produce formats besides Plucker, and so on. I'll get to it.
And the only site I miss now, because it doesn't offer an RSS feed, is Linux Planet. Maybe I'll find a solution for that eventually.
[ 21:08 Oct 20, 2009 More programming | permalink to this entry | ]