Ever want to download RSS from news sites or blogs to your phone, PDA, ebook reader, or other handheld device to read offline?
Feedme is a program for fetching stories from RSS or Atom feeds, from news sites, blogs or any other site that offers a feed. It saves simplified HTML files, or can translate to other formats such as Epub, FB2, Plucker or plain text for devices that can't handle HTML.
RSS stands for Really Simple Syndication. It's a way of keeping track of things you've already seen so you only read the ones that are new, and it's widely used on sites like newspapers and blogs that are constantly adding new content.
Google has re-added RSS support to Chrome after removing it earlier; see "Chrome’s newest feature resurrects the ghost of Google Reader" (Popular Science, Nov 2, 2021) and "Google rediscovers RSS: tests new feature to ‘follow’ sites in Chrome on Android" (The Verge, May 2020).
A web search for rss reader will get you a list of other ways to follow an RSS feed. I recommend limiting the search to "Past year" since there are some older RSS readers that are no longer supported. Wikipedia also has a Comparison of feed aggregators ("feed aggregator" means more or less the same thing as "RSS reader").
FeedMe is now maintained on GitHub: FeedMe. It's currently at 1.0b5.
You will need Python 3, Python's feedparser module (on Ubuntu or Debian, that's the package python3-feedparser) and lxml (package python3-lxml). Of course you can also install feedparser and lxml from PyPI if you prefer.
So I wrote FeedMe. I've been using it daily for many years, so I guess it works well enough for my purposes. Maybe it will work for you too.
FeedMe is sort of an RSS version of Sitescooper. By default, it produces HTML that's been simplified to work well on a small screen, but optionally it can convert pages to plaintext, EPUB, FB2 or Plucker format.
The sample feedme.conf configuration file should be vaguely self-explanatory, though it doesn't contain every option.
Install your feedme.conf as ~/.config/feedme/feedme.conf (the usual Linux location), or as $XDG_CONFIG_HOME/feedme/feedme.conf if you have XDG_CONFIG_HOME set.
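The lookup described above can be sketched in a few lines of Python. This is only an illustration, not FeedMe's own code: the function name feedme_config_path is made up, and it assumes XDG_CONFIG_HOME, when set, holds a full directory path that replaces ~/.config.

```python
import os
from pathlib import Path


def feedme_config_path(environ=None):
    """Return the expected feedme.conf location (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: XDG_CONFIG_HOME, when set, replaces the default ~/.config.
    base = env.get("XDG_CONFIG_HOME") or os.path.join(
        os.path.expanduser("~"), ".config")
    return Path(base) / "feedme" / "feedme.conf"


if __name__ == "__main__":
    print(feedme_config_path())
```

Running it prints the path where this sketch would expect your feedme.conf to live.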
The feedme.conf file should start with a set of default options
in a section labeled [DEFAULT].
These options apply to the whole feedme process:
The recommended way of adding a new feed is by creating (or linking) a new file in your feedme config directory, the same location where feedme.conf lives.
The siteconf directory here contains a collection of sample feeds.
They aren't guaranteed to be up-to-date; sometimes I give up on a site
and stop updating it. If you don't need to make any changes, the easiest
way to use these files is to symlink them into your feedme config directory:
ln -s ~/src/feedme/siteconf/washington-post.conf ~/.config/feedme/
If you wish, you can also define feeds by adding sections directly to the feedme.conf file.
To define a new feed, start by setting a name in square brackets.
Set url to the site's RSS URL.
Then go to the RSS page in a browser, click on a story,
view the HTML source of the story, find the place where the actual story
begins (it may be three quarters of the way down the page or even
farther) and find something that indicates the start of the story,
past any other cruft. Save this as page_start.
Optionally, scroll down to the end of the story and find a pattern
to use as page_end as well. Your site file should look something like this:
[Anytown Post-Dispatch]
url = http://example.com/feeds.rss
page_start = <story>
page_end = </story>
(though page_start and page_end will probably be more complicated than that on most sites).
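To make the role of page_start and page_end more concrete, here is a rough Python sketch of trimming a page down to the part between two patterns. It is not taken from FeedMe: the function name extract_story is hypothetical, and dropping the matched markers themselves (and keeping the whole page when page_start isn't found) are assumptions made for the example.

```python
import re


def extract_story(html, page_start, page_end=None):
    """Return the part of html between the two patterns (illustrative only)."""
    start = re.search(page_start, html)
    if start is None:
        return html              # pattern not found: keep the whole page
    body = html[start.end():]
    if page_end:
        end = re.search(page_end, body)
        if end is not None:
            body = body[:end.start()]
    return body


if __name__ == "__main__":
    page = "<html><nav>menu</nav><story><p>News!</p></story><footer/></html>"
    print(extract_story(page, "<story>", "</story>"))
```

Trying your candidate patterns against a saved copy of a story this way can be quicker than re-running a full fetch each time.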
At this point you can test it by running feedme on the name you just set:
feedme -n 'Anytown Post-Dispatch'
The -n (nocache) option tells feedme to fetch stories even if there's nothing new; while testing, you want that since otherwise, after the first time, feedme will tell you that there are no new stories to fetch.
Once you have a basic site, you can start tuning the site-specific options.
Here are some basic options that can be set in the
[DEFAULT] section of your feedme.conf,
and can be overridden for specific feeds:
These options aren't as easy to explain or understand, but you may need them for particular sites.
skip_pats = div class="advertisement"
    span class="junk"
The attribute (e.g. "junk") is a regular expression as used by BeautifulSoup.
A few options, like alt_domains, page_start, page_end, or any of the "pats" options, can understand multiple regular expression patterns. Specify them by putting them on separate lines (whitespace at the beginning is optional and ignored):
skip_pats = style=".*?"
    Follow us on.*?
skip_link_pats =
skip_link_pat = http://www.site.com/articles/video/
    http://www.site.com/articles/podcasts/
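As a sketch of how a multi-line option can turn into a list of patterns, the snippet below reads an INI-style sample with Python's standard configparser module and splits the value on line breaks. That FeedMe parses its files this way is an assumption, and the section name "Example Site" and the helper get_patterns are invented for the example.

```python
import configparser

SAMPLE = """
[Example Site]
url = http://example.com/feeds.rss
skip_pats = style=".*?"
    Follow us on.*?
"""


def get_patterns(config, section, option):
    """Split a multi-line option into a list of stripped, non-empty patterns."""
    raw = config.get(section, option, fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


# interpolation=None keeps any "%" in a pattern from being treated specially.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)

if __name__ == "__main__":
    print(get_patterns(config, "Example Site", "skip_pats"))
```

Continuation lines only need to be indented; the leading whitespace is stripped before the patterns are used.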
Feedme always produces HTML as an output format.
If that's all you need, the default
formats = none is fine.
Downloaded HTML will be put in ~/feeds/ (which must exist; you can specify a different location as dir in feedme.conf).
Feedme can then convert the HTML into one of three formats:
epub, plucker, or fb2.
To get them, set
formats = epub (or plucker, or fb2).
You can specify multiple formats, comma separated, e.g. epub,fb2.
You'll need to have appropriate conversion tools installed on your system: plucker for plucker format, calibre's ebook-convert for the other two.
FeedMe can optionally convert each page to plain ASCII, eliminating
accented characters and such (displaying them used to be a problem
on Palm PDAs, but shouldn't be needed for most modern devices).
For this option, set
ascii="yes" in feedme.conf and install my
somewhere in your python path.
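For a sense of what this kind of ASCII conversion involves, here is a small sketch using only Python's standard unicodedata module. It is not the module mentioned above and may behave differently from it; the name to_ascii is made up for the example.

```python
import unicodedata


def to_ascii(text):
    """Approximate text in plain ASCII by dropping accents (sketch only)."""
    # NFKD splits a character like "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops whatever has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Café déjà vu"))
```

Characters with no decomposition into ASCII (curly quotes, for instance) are simply dropped by this approach, so a real converter would probably want a substitution table as well.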
Warning: these conversions haven't been tested in a while, though they used to work fine. If you have problems, please file a bug or contact me.
Feedme uses three important directories:
Feedme's configuration file is ~/.config/feedme/feedme.conf.
~/feeds is where it stores the downloaded HTML by default
(you can change this as
dir in the feedme.conf).
Stories are downloaded as sitename/number.html, e.g.
~/feeds/BBC_World_News/2.html. Stories older than
save_days days (set in your feedme.conf) are cleaned out.
The third important directory is the cache directory (see below).
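If you want to check which files a save_days cutoff would affect, something like the following sketch may help. It is not FeedMe's cleanup code: the function name old_story_files is hypothetical, and using each file's modification time as its age is an assumption. It only lists files and never deletes anything.

```python
import os
import time


def old_story_files(feed_dir, save_days, now=None):
    """List files under feed_dir last modified more than save_days days ago."""
    now = time.time() if now is None else now
    cutoff = now - save_days * 24 * 60 * 60
    stale = []
    for root, _dirs, files in os.walk(feed_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    for path in old_story_files(os.path.expanduser("~/feeds"), save_days=7):
        print("would remove", path)
```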
If you save to formats beyond plain HTML, there may be other directories
used for the converted files; for example, plucker files are created in
This is never cleaned out by feedme, so you'll have to prune it yourself.
When I used plucker as my feed
reader, I had an alias that ran
to remove the previous day's plucks just before I ran feedme.
Feedme's cache is ~/.cache/feedme/feedme.dat. This file should remain relatively small if you have a sane number of feeds, but it doesn't hurt to keep an eye on it. Feedme will also keep backup cache files for about a week, named by date; you can use these to go back to an earlier state in case you lost your feeds or accidentally deleted something.
If you run feedme and something goes wrong, you can reset back to the previous cache file. For instance, if everything was fine when you ran feedme on Monday May 15, but something goes wrong on Tuesday, go to your cache directory and you'll see feedme.dat from Tuesday (a few minutes old) and feedme-23-05-15-Mon.dat, so the fix is:
cp feedme-23-05-15-Mon.dat feedme.dat
and then when you run feedme again, it will re-fetch all of Tuesday's stories as though the first bad run never happened.
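The same recovery can be scripted. The sketch below lists dated backups and copies a chosen one over feedme.dat; the helper names list_backups and restore_backup are invented here, and the filename pattern is inferred from the single example name feedme-23-05-15-Mon.dat, so treat it as an assumption.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme, inferred from feedme-23-05-15-Mon.dat.
BACKUP_RE = re.compile(r"feedme-\d\d-\d\d-\d\d-[A-Za-z]{3}\.dat")


def list_backups(cache_dir):
    """Return dated backup files in cache_dir, oldest first."""
    # yy-mm-dd names sort chronologically when compared as plain strings.
    return sorted((p for p in Path(cache_dir).iterdir()
                   if BACKUP_RE.fullmatch(p.name)),
                  key=lambda p: p.name)


def restore_backup(backup):
    """Copy the given backup over feedme.dat in the same directory."""
    target = Path(backup).with_name("feedme.dat")
    shutil.copy2(backup, target)
    return target


if __name__ == "__main__":
    cache_dir = Path.home() / ".cache" / "feedme"
    if cache_dir.is_dir():
        for backup in list_backups(cache_dir):
            print(backup.name)
```

The main section only prints the available backups; calling restore_backup on one of them is left as a deliberate manual step.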
FeedMe's license is GPLv2 or (at your option) any later GPL version.
Thanks to Carla Schroder for the name suggestion!
By default, FeedMe fetches feeds to simplified HTML files on your local filesystem. You can read them there with a browser or any other app you prefer; or you can save to a format like EPUB to read on an ebook reader.
Currently, I run FeedMe daily on a web server; then I use an Android program I wrote, FeedViewer, to download the day's feeds to my phone and read them there.
I'm the first to confess that FeedViewer isn't very well documented, though, especially the process of downloading from a server. If you're trying to use FeedViewer and can't figure it out, let me know, otherwise I may never get around to documenting it.