
FeedMe: a Lightweight Feed/RSS reader

Ever want to download RSS from news sites or blogs to your phone, PDA, ebook reader, or other handheld device to read offline?

FeedMe is a program for fetching stories from RSS or Atom feeds, from news sites, blogs or any other site that offers a feed. It saves simplified HTML files, or can translate to other formats such as EPUB, FB2, Plucker or plain text for devices that can't handle HTML.

About RSS

RSS stands for Really Simple Syndication. It's a way of keeping track of things you've already seen so you only read the ones that are new, and it's widely used on sites like newspapers and blogs that are constantly adding new content.

Google has recently re-added RSS support to Chrome: see "Chrome’s newest feature resurrects the ghost of Google Reader" (Popular Science, Nov 2, 2021) and "Google rediscovers RSS: tests new feature to ‘follow’ sites in Chrome on Android" (The Verge, May 2020).

A web search for "rss reader" will get you a list of other ways to follow an RSS feed. I recommend limiting the search to "Past year" since there are some older RSS readers that are no longer supported. Wikipedia also has a Comparison of feed aggregators ("feed aggregator" means more or less the same thing as "RSS reader").

Download Feedme

FeedMe is now maintained on GitHub: FeedMe. It's currently at 1.0b5.

You will need Python 3, Python's feedparser module (on Ubuntu or Debian, that's the package python3-feedparser) and lxml (package python3-lxml). Of course you can also install feedparser and lxml from PyPI if you prefer.

I know there are lots of other RSS readers -- but they all seem oriented toward converting RSS to mail to be read in an HTML-capable mail reader. I get enough mail, which I read in Mutt without much HTML support. I wanted a way to get it onto a handheld device in an easily readable, simple format that displays well on a small screen, without images and tables and stylesheets and javascript and all that other HTML cruft, and in a self-contained way so I can read it without needing a net connection or using up data allocation. You may remember older Palm apps that worked this way, like AvantGo or Sitescooper.

So I wrote FeedMe. I've been using it daily for many years, so I guess it works well enough for my purposes. Maybe it will work for you too.

FeedMe is sort of an RSS version of Sitescooper. By default, it produces HTML that's been simplified to work well on a small screen, but optionally it can convert pages to plaintext, EPUB, FB2 or Plucker format.

The feedme.conf file

The sample feedme.conf configuration file should be vaguely self-explanatory, though it doesn't contain every option.

Install your feedme.conf as ~/.config/feedme/feedme.conf (the usual Linux location), or as $XDG_CONFIG_HOME/feedme/feedme.conf if you have XDG_CONFIG_HOME set.
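
As a rough illustration of where the file is expected, here is a minimal Python sketch that works out the path. It assumes the usual XDG convention (XDG_CONFIG_HOME holds a full path, with ~/.config as the fallback); the function name config_path is hypothetical and this is not FeedMe's own lookup code.

```python
import os

def config_path(environ=None):
    """Return the expected location of feedme.conf.

    Assumed XDG convention: use $XDG_CONFIG_HOME if it is set and
    non-empty, otherwise fall back to ~/.config.
    """
    env = os.environ if environ is None else environ
    base = env.get("XDG_CONFIG_HOME") or os.path.join(
        os.path.expanduser("~"), ".config")
    return os.path.join(base, "feedme", "feedme.conf")

print(config_path())
```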

General options

The feedme.conf file should start with a set of default options in a section labeled [DEFAULT].

These options apply to the whole feedme process:

ascii
Convert all pages to plain ASCII. Useful for reading devices like Palm that can't display other character sets reliably.
dir
Where to save the collected pages. See save_days for how long they will be kept.
formats
Comma-separated list of output formats. Default "none", which will result in HTML output. Other options: epub, fb2, plucker.
verbose
Print lots of debugging chatter while feeding.
min_width
The minimum number of characters in an item link. Links shorter than this will be padded to this length (to make tapping easier). Default 25.
save_days
How long to retain feeds locally.
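
To see how [DEFAULT] options interact with per-feed sections, here is a small sketch using Python's standard configparser module. It assumes feedme.conf follows configparser conventions (the [DEFAULT] section name suggests so, but that is an assumption), and the option values, such as save_days = 7, are invented for illustration.

```python
import configparser

# Invented sample values, for illustration only.
SAMPLE = """
[DEFAULT]
dir = ~/feeds
formats = none
save_days = 7
min_width = 25

[Anytown Post-Dispatch]
url = http://example.com/feeds.rss
formats = epub
"""

# interpolation=None keeps any "%" characters in values literal.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)

feed = config["Anytown Post-Dispatch"]
print(feed["formats"])            # overridden in the feed section
print(feed["dir"])                # inherited from [DEFAULT]
print(feed.getint("save_days"))   # inherited, converted to an integer
```

With configparser, [DEFAULT] is not listed as a feed of its own; its values simply show up in every other section unless that section overrides them.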

Defining a new feed

The recommended way of adding a new feed is by creating (or linking) a new file in your feedme config directory, the same location where feedme.conf lives.

The siteconf directory here contains a collection of sample feeds. They aren't guaranteed to be up-to-date; sometimes I give up on a site and stop updating it. If you don't need to make any changes, the easiest way to use these files is to symlink them into your feedme config directory: for example, ln -s ~/src/feedme/siteconf/washington-post.conf ~/.config/feedme/

If you wish, you can also define feeds by adding sections directly to the feedme.conf file.

To define a new feed, start by setting a name, in square brackets, like [Anytown Post-Dispatch]. Set url to the site's RSS URL. Then go to the RSS page in a browser, click on a story, view the HTML source of the story, find the place where the actual story begins (it may be three quarters of the way down the page or even farther) and find something that indicates the start of the story, after all the sidebars and advertising and tracking JavaScript and other cruft. Save this as page_start. Optionally, scroll down to the end of the story and find a page_end as well. Your site file should look something like this:

[Anytown Post-Dispatch]
url = http://example.com/feeds.rss
page_start = <story>
page_end = </story>
(though page_start and page_end will probably be more complicated than that on most sites).
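
To get a feel for what page_start and page_end do, here is a minimal sketch that trims a page down to the part between the two patterns. The function extract_story is hypothetical, and details such as whether the start marker itself is kept are assumptions; FeedMe's real parser may behave differently.

```python
import re

def extract_story(html, page_start, page_end=None):
    """Return html from the first page_start match up to (but not
    including) the first page_end match that follows it."""
    start = re.search(page_start, html)
    if not start:
        return html                      # pattern not found: keep everything
    body = html[start.start():]
    if page_end:
        end = re.search(page_end, body)
        if end:
            body = body[:end.start()]
    return body

page = ("<html><nav>ads</nav><story><p>Real news.</p></story>"
        "<footer>more ads</footer></html>")
print(extract_story(page, "<story>", "</story>"))
```

Trying your patterns on a saved copy of a story like this can be quicker than re-running a whole feed while you tune them.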

At this point you can test it by running feedme on the name you just set:

feedme -n 'Anytown Post-Dispatch'

The -n (nocache) option tells feedme to fetch stories even if there's nothing new; while testing, you want that since otherwise, after the first time, feedme will tell you that there are no new stories to fetch.

Once you have a basic site, you can start tuning the site-specific options.

Basic options for specific feeds

Here are some basic options that can be set in the [DEFAULT] section of your feedme.conf, and can be overridden for specific feeds:

url
The RSS URL for the site.
levels
Level 1: only save the RSS page. Level 2: save sub-pages.
skip_images
Don't save images. Default true.
skip_links
For sites with levels=1 where you just want a single news feed and never want to click on anything (e.g. slashdot), this can eliminate distracting links that you might tap on accidentally while scrolling. (My husband wanted this. I doubt many other people will want it.)
allow_repeats
Some sites re-use the same URL and ID for new content: for instance, a site might have a page for the monthly photo contest that gets updated once a month. For these sites, set allow_repeats. The default is false, because many news sites like to make trivial tweaks to yesterday's story, so you see the same story day after day unless you filter out stories you've seen before.
simplify
Clean up the HTML: try to remove things like text colors and sizes that sometimes make stories unreadable.
continue_on_timeout
Normally, if one page times out, feedme will assume the site is down. On sites that link to content from many different URLs, set this to true.
encoding
Normally feedme will try to guess the encoding from the page. But some pages lie, so use this to override that.
nocache
Don't check whether we've seen an entry before: collect everything.
nonlocal_images
Normally feedme will ignore images from other domains (usually ads). But some sites link to images from all over; set this to true in that case.
block_nonlocal_images
If we're not allowing nonlocal images, setting block_nonlocal_images to true will rewrite the img src attribute of all nonlocal images to a bogus value. This will cause some "broken image" icons to show. Use it if you're on a fixed data plan and don't want to incur extra data charges downloading images while reading stories. Default false.
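
The interaction between nonlocal_images and block_nonlocal_images can be sketched as a small decision function. This is only an illustration: the names is_local_image and image_action are hypothetical, and comparing exact hostnames is a simplifying assumption (FeedMe's own notion of "same domain" may be looser).

```python
from urllib.parse import urljoin, urlparse

def is_local_image(page_url, img_src):
    """True if img_src resolves to the same host as page_url."""
    img_url = urljoin(page_url, img_src)     # resolves relative src values
    return urlparse(img_url).hostname == urlparse(page_url).hostname

def image_action(page_url, img_src, nonlocal_images=False,
                 block_nonlocal_images=False):
    """Return 'fetch', 'ignore' or 'block' for a single image."""
    if nonlocal_images or is_local_image(page_url, img_src):
        return "fetch"
    # Nonlocal image and nonlocal images are not allowed:
    # either leave the tag alone or rewrite its src to a bogus value.
    return "block" if block_nonlocal_images else "ignore"

story = "http://example.com/story.html"
print(image_action(story, "/img/a.jpg"))
print(image_action(story, "http://ads.example.net/b.gif",
                   block_nonlocal_images=True))
```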

Trickier Site Options

You may need some of these options for particular sites.

cookiefile
For this site, use cookies from a specific browser profile. Currently only Firefox's cookies.sqlite is supported.
max_srcset_size
On sites that use srcset for their images, how big do we want them to think our screen is? Integer, default 800.
rss_entry_size
Limit RSS entries to this many characters. This is for sites that put the entire story into the RSS page.
alt_domains
Normally, feedme will only fetch images from the same domain as the site being read, and if block_nonlocal_images is set, it will replace all other images with a dummy image. alt_domains specifies a list of domains that are acceptable sources of images for a given site, for sites that host images somewhere other than their normal domain.
single_page_pats
Sites that spread stories over many pages often have a naming convention in their "view as single page" URLs. This is a pattern, e.g. http://example.com/.*single.html. Feedme will look for this pattern in a story, and if it sees it, it will follow the link and use it instead of the original story link.
skip_pats
For sites that embed annoying things in the middle of stories: maybe video clips, or ads, or distracting pull-quotes. Anything matching one of the skip_pats will be deleted from the story.
skip_link_pats
Skip links with these patterns.
skip_title_pats
Skip anything with titles containing these patterns.
skip_content_pats
Skip anything whose content includes these patterns.
index_skip_content_pats
Skip anything where the index content includes these patterns.
when
When to check this site, if not always. May be a weekday, e.g. Sat, or a month date, e.g. 1 to check only on the first day of any month.
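
The when option could be interpreted along these lines. This is a sketch under assumptions: the function should_check is hypothetical, weekdays are matched on their first three letters, and any all-digit value is treated as a day of the month.

```python
import datetime

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def should_check(when, today=None):
    """Decide whether a feed with the given 'when' value is due today."""
    if not when:
        return True                        # no restriction: always check
    today = today or datetime.date.today()
    when = when.strip()
    if when.isdigit():
        return today.day == int(when)      # day of the month, e.g. "1"
    # Weekday name or abbreviation, e.g. "Sat" or "Saturday"
    return WEEKDAYS[today.weekday()] == when[:3].lower()

print(should_check("Sat"))
```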

A few options, like alt_domains, page_start, page_end, or any of the "pats" options, can understand multiple regular expression patterns. Specify them by putting them on separate lines (whitespace at the beginning is optional and ignored):

skip_pats = style=".*?"
  Follow us on.*?
skip_link_pats = http://www.site.com/articles/video/
  http://www.site.com/articles/podcasts/
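
Here is a short sketch of how such a multi-line value can be read and applied. It assumes the file is parsed with Python's configparser, which returns indented continuation lines as a single string with embedded newlines; the section name and the HTML snippet are invented, and the substitution loop is only a guess at how skip_pats might be used.

```python
import configparser
import re

SAMPLE = '''
[Example Site]
skip_pats = style=".*?"
  Follow us on.*?
'''

config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)

# Split the multi-line value into one pattern per line.
raw = config["Example Site"]["skip_pats"]
patterns = [p.strip() for p in raw.splitlines() if p.strip()]
print(patterns)

# Delete everything that matches any of the patterns.
html = '<p style="color: red">Big news today.</p><p>Follow us on</p>'
for pat in patterns:
    html = re.sub(pat, "", html)
print(html)
```

Note that a trailing non-greedy .*? matches as little as possible, so a pattern like the second one removes only the literal text unless something follows it in the pattern.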

Helper Modules

For sites that can't be simply downloaded, you can define a helper Python module. There are page_helpers that can fetch URLs -- for instance, if a site requires JavaScript, you can use a package like Selenium to fetch it -- and feed_helpers that can get a feed that doesn't have RSS. See helpers/README.md for details.

Output Formats

Feedme always produces HTML as an output format. If that's all you need, the default formats = none is fine.

Downloaded HTML will be put in ~/feeds/ (which must exist; you can specify a different location as dir in feedme.conf).

Feedme can then convert the HTML into one or more of three formats: epub, plucker, or fb2. To get them, set formats = epub (or plucker, or fb2). You can specify multiple formats, comma separated, e.g. epub,fb2.

You'll need to have appropriate conversion tools installed on your system: plucker for plucker format, calibre's ebook-convert for the other two.

FeedMe can optionally convert each page to plain ASCII, eliminating accented characters and such (displaying them used to be a problem on Palm PDAs, but this shouldn't be needed for most modern devices). For this option, set ascii="yes" in feedme.conf and install my ununicode module somewhere in your Python path.
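
For a sense of what this kind of conversion involves, here is a rough approximation using only Python's standard library. It is not the ununicode module and makes no claim to match its output; the function name to_ascii is hypothetical.

```python
import unicodedata

def to_ascii(text):
    """Approximate text in plain ASCII by stripping accents."""
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with errors="ignore" then drops everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Café déjà vu"))   # prints "Cafe deja vu"
```

Characters with no ASCII decomposition (curly quotes, for example) are simply dropped by this approach, which is one reason a dedicated mapping module is worth having.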

Warning: these conversions haven't been tested in a while, though they used to work fine. If you have problems, please file a bug or contact me.

Maintenance

Feedme uses three important locations:

Feedme's configuration file is ~/.config/feedme/feedme.conf.

Feedme's cache is ~/.cache/feedme/feedme.dat. This file should remain relatively small if you have a sane number of feeds, but it doesn't hurt to keep an eye on it. Feedme will also keep backup cache files for about a week, named by date; you can use these to go back to an earlier state in case you lost your feeds or accidentally deleted something.

~/feeds is where it stores the downloaded HTML by default (you can change this as dir in the feedme.conf). Stories are downloaded as sitename/number.html, e.g. ~/feeds/BBC_World_News/2.html. Stories older than save_days days (set in your feedme.conf) are cleaned out.
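
If you want to check what is due for cleaning, a dry-run listing along these lines may help. It is a sketch under assumptions: the function old_stories is hypothetical, it judges age by file modification time (FeedMe may decide differently), and the 7-day value is made up.

```python
import os
import time

def old_stories(feed_dir, save_days, now=None):
    """Yield paths of files under feed_dir not modified in save_days days."""
    now = time.time() if now is None else now
    cutoff = now - save_days * 24 * 60 * 60
    for root, _dirs, files in os.walk(feed_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                yield path

# Dry run: list what would be removed, without deleting anything.
for path in old_stories(os.path.expanduser("~/feeds"), save_days=7):
    print("would remove", path)
```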

If you save to formats beyond plain HTML, there may be other directories used for the converted files; for example, plucker files are created in ~/.plucker/feeds. This is never cleaned out by feedme, so you'll have to prune it yourself. When I used plucker as my feed reader, I had an alias that ran rm ~/.plucker/feedme/* to remove the previous day's plucks just before I ran feedme.

FeedMe's license is GPLv2 or (at your option) any later GPL version.

Thanks to Carla Schroder for the name suggestion!

Reading Your Feeds

By default, FeedMe fetches feeds to simplified HTML files on your local filesystem. You can read them there with a browser or any other app you prefer; or you can save to a format like EPUB to read on an ebook reader.

Currently, I run FeedMe daily on a web server; then I use an Android program I wrote, FeedViewer, to download the day's feeds to my phone and read them there.

I'm the first to confess that FeedViewer isn't very well documented, though, especially the process of downloading from a server. If you're trying to use FeedViewer and can't figure it out, let me know, otherwise I may never get around to documenting it.

