I wrote at length about my explorations into
selenium
to fetch stories from the New York Times (as a subscriber).
But I mentioned in Part III that there was a much easier way
to fetch those stories, as long as the stories didn't need JavaScript.
That way is to use normal file fetching (using urllib or requests),
but with a CookieJar object containing the cookies from a Firefox
session where I'd logged in.
Read more ...
Tags: programming, python, cookies, firefox, scraping
[
12:22 Dec 03, 2021
More programming |
permalink to this entry |
]
In my eternal quest for a decent RSS feed for top World/National news,
I decided to try subscribing to the New York Times online.
But when I went to try to add them to my RSS reader, I discovered
it wasn't that easy: their login page sometimes gives a captcha, so
you can't just set a username and password in the RSS reader.
A common technique for sites like this is to log in with a browser,
then copy the browser's cookies into your news reading program.
At least, I thought it was a common technique -- but when I tried
a web search, examples were surprisingly hard to find.
None of the techniques to examine or save browser cookies were all
that simple, so I ended up writing a
browser_cookies.py
Python script to extract cookies from chromium and firefox browsers.
Read more ...
Tags: web, programming, python, cookies, privacy
[
11:19 Mar 30, 2021
More programming |
permalink to this entry |
]