Shallow Thoughts : : programming

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Thu, 23 Jun 2022

Clicking through a Translucent Image Window

[transparent image viewer overlaid on top of topo map]

Five years ago, in "Clicking through a translucent window: using X11 input shapes", I wrote about how I used a translucent image window that allows click-through, positioned on top of PyTopo, to trace an image of an old map and create tracks or waypoints.

But the transimageviewer.py app that I wrote then was based on GTK2, which is now obsolete and has been removed from most Linux distro repositories. So when I found myself wanting GIS to help investigate a growing trail controversy in Pueblo Canyon, I discovered I didn't have a usable click-through image viewer.

[ 19:08 Jun 23, 2022    More programming | permalink to this entry | ]

Fri, 03 Dec 2021

Importing Cookies from a Firefox Profile in Python

I wrote at length about my explorations into selenium to fetch stories from the New York Times (as a subscriber). But I mentioned in Part III that there was a much easier way to fetch those stories, as long as the stories didn't need JavaScript.

That way is to use normal file fetching (using urllib or requests), but with a CookieJar object containing the cookies from a Firefox session where I'd logged in.
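
A minimal sketch of that approach, not the code from the article. It assumes Firefox's usual cookies.sqlite layout (a moz_cookies table with host, name, value, path, expiry and isSecure columns, with expiry in seconds since the epoch), and load_firefox_cookies is a hypothetical name. On Linux the file usually lives in ~/.mozilla/firefox/<profile>/, but the schema has changed across Firefox versions, so check yours before relying on it.

```python
import http.cookiejar
import os
import shutil
import sqlite3
import tempfile


def load_firefox_cookies(cookie_db):
    """Build an http.cookiejar.CookieJar from a Firefox cookies.sqlite file.

    Assumes the moz_cookies schema (host, name, value, path, expiry, isSecure)
    with expiry in seconds since the epoch; verify against your Firefox version.
    """
    # A running Firefox typically keeps the database locked, so query a copy.
    # Recent writes may still sit in the -wal journal, so copy that too.
    with tempfile.TemporaryDirectory() as tmpdir:
        copy = os.path.join(tmpdir, "cookies.sqlite")
        for suffix in ("", "-wal"):
            if os.path.exists(cookie_db + suffix):
                shutil.copyfile(cookie_db + suffix, copy + suffix)
        con = sqlite3.connect(copy)
        try:
            rows = con.execute(
                "SELECT host, name, value, path, expiry, isSecure"
                " FROM moz_cookies").fetchall()
        finally:
            con.close()

    jar = http.cookiejar.CookieJar()
    for host, name, value, path, expiry, is_secure in rows:
        dot = host.startswith(".")   # ".example.com" also matches subdomains
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=host, domain_specified=dot, domain_initial_dot=dot,
            path=path, path_specified=True,
            secure=bool(is_secure), expires=expiry,
            discard=False, comment=None, comment_url=None, rest={}))
    return jar
```

The jar then works with either library: urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)).open(url) for urllib, or requests.get(url, cookies=jar) for requests.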

[ 12:22 Dec 03, 2021    More programming | permalink to this entry | ]

Sat, 20 Nov 2021

Wikipedia: All Roads Lead to ... Philosophy?

At a recent LUG meeting, we were talking about various uses for web scraping, and someone brought up a Wikipedia game: start on any page, click on the first real link, then repeat on the page that comes up. The claim is that this chain always gets to Wikipedia's page on Philosophy.

We tried a few rounds, and sure enough, every page we tried did eventually get to Philosophy, usually via languages, which goes to communication, goes to discipline, action, intention, mental, thought, idea, philosophy.

It's a perfect game for a discussion of scraping. It should be an easy exercise to write a scraper to do this, right?
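
The chain-following half, at least, is short. A minimal sketch, assuming a hypothetical first_link(title) callback that fetches a page and returns the title of its first real link (or None); deciding what counts as the "first real link" is the scraping part, and isn't shown here. Loop detection matters because the "always gets to Philosophy" claim fails whenever a chain cycles.

```python
def follow_first_links(start, first_link, target="Philosophy", max_hops=100):
    """Follow the "first link" chain from start toward target.

    first_link(title) should return the title of the first real link on that
    page, or None if there isn't one. Returns (path, outcome), where outcome is
    "reached", "loop", "dead end" or "too long".
    """
    path = [start]
    seen = {start}
    for _ in range(max_hops):
        page = path[-1]
        if page == target:
            return path, "reached"
        nxt = first_link(page)
        if nxt is None:
            return path, "dead end"
        path.append(nxt)
        if nxt in seen:          # a cycle: this chain will never reach target
            return path, "loop"
        seen.add(nxt)
    return path, ("reached" if path[-1] == target else "too long")


# Offline example: a hand-made link table following the chain mentioned above,
# standing in for real page fetching.
links = {"Language": "Communication", "Communication": "Discipline",
         "Discipline": "Action", "Action": "Intention", "Intention": "Mental",
         "Mental": "Thought", "Thought": "Idea", "Idea": "Philosophy"}
path, outcome = follow_first_links("Language", links.get)
print(" -> ".join(path), f"({outcome})")
```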

[ 19:31 Nov 20, 2021    More programming | permalink to this entry | ]

Thu, 11 Nov 2021

Selenium: Handling Timeouts and Errors

This is part 3 of my selenium exploration trying to fetch stories from the NY Times (as a subscriber).

At the end of Part II, selenium was running on a server with the minimal number of X and GTK libraries installed.

But now that it can run unattended, there's another problem: there are all kinds of ways this can fail, and your script needs to handle those errors somehow.

Before diving in, I should mention that for my original goal, fetching stories from the NY Times as a subscriber, it turned out I didn't need selenium after all. Since handling selenium errors turned out to be so brittle (as I'll describe in this article), I'm now using requests combined with a Python CookieJar. I'll write about that in a future article. Meanwhile ...

Handling Errors and Timeouts

Timeouts are a particular problem with selenium, because there doesn't seem to be any reliable way to change them so the selenium script doesn't hang for ridiculously long periods.
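
One blunt fallback when selenium's own timeout settings can't be trusted (a swapped-in technique, not necessarily what the article ends up doing) is to run the whole scraping script as a child process and kill it after a hard deadline. A minimal sketch, POSIX only; run_with_deadline is a hypothetical helper. Starting the child in its own session lets the kill reach the browser and driver processes too, as long as they stay in that process group.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout):
    """Run cmd, returning (returncode, stdout), or None if it ran past timeout.

    POSIX only: the child gets its own session and process group, so on
    timeout the whole group is killed, including any browser or driver it
    started that stayed in that group.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                            start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass                  # the group already exited on its own
        proc.communicate()        # reap the child and drain its output pipe
        return None
```

For example, run_with_deadline(["python3", "fetch_stories.py"], 300) would give a hypothetical fetch_stories.py five minutes before killing it; a None result then becomes one more error for the calling script to log or retry.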

[ 12:07 Nov 11, 2021    More programming | permalink to this entry | ]

Sun, 07 Nov 2021

Configuring Selenium to Run Headless, Without a Desktop

This is part 2 of my selenium exploration trying to fetch stories from the NY Times (as a subscriber).

When we left off, I was learning the basics of selenium in order to fetch stories (as a subscriber) from the New York Times. Fetching stories was working properly, and all that remained was to put it in an automated script, then move it to a server where it could run automatically without my desktop machine needing to be on.

Unfortunately, that turned out to be the hardest part of the problem.
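
For reference, this is roughly what "headless" means in selenium 4 with Firefox: a minimal sketch, not the configuration from the article, that passes Firefox's -headless flag when there's no X11 or Wayland display. needs_headless and make_firefox_driver are hypothetical helper names. The flag only removes the need for a display; Firefox, geckodriver and the libraries they link against still have to be installed on the server.

```python
import os


def needs_headless(env=None):
    """True when there's no X11 or Wayland display to open a browser window on."""
    env = os.environ if env is None else env
    return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def make_firefox_driver():
    """Start Firefox under selenium 4, headless if no display is available."""
    # Imported here so needs_headless() works even where selenium isn't installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    if needs_headless():
        options.add_argument("-headless")   # Firefox's own headless flag
    return webdriver.Firefox(options=options)
```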

[ 12:18 Nov 07, 2021    More programming | permalink to this entry | ]

Tue, 02 Nov 2021

Web Scraping with Selenium in Python

This is part 1 of my selenium exploration.

At the New Mexico GNU & Linux User Group, currently meeting virtually on Jitsi, someone expressed interest in scraping websites. Since I do quite a bit of scraping, I offered to give a tutorial on scraping with the Python module BeautifulSoup.

"What about selenium?" he asked. Sorry, I said, I've never needed selenium enough to figure it out.

But then a week later, I found I did have a need.

[ 19:58 Nov 02, 2021    More programming | permalink to this entry | ]

Tue, 30 Mar 2021

Fetching Browser Cookies Programmatically

In my eternal quest for a decent RSS feed for top World/National news, I decided to try subscribing to the New York Times online. But when I tried to add their feeds to my RSS reader, I discovered it wasn't that easy: their login page sometimes presents a captcha, so you can't just give the RSS reader a username and password.

A common technique for sites like this is to log in with a browser, then copy the browser's cookies into your news reading program. At least, I thought it was a common technique -- but when I tried a web search, examples were surprisingly hard to find.

None of the techniques to examine or save browser cookies were all that simple, so I ended up writing a browser_cookies.py Python script to extract cookies from chromium and firefox browsers.
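
browser_cookies.py itself isn't reproduced here. As a sketch of the "copy the cookies into your news reading program" step, assuming the program can import Netscape-format cookies.txt files (curl's -b and wget's --load-cookies can; check your reader), Python's http.cookiejar.MozillaCookieJar writes that format. The input rows are assumed to be (host, name, value, path, expiry, is_secure) tuples, such as the host, name, value, path, expiry and isSecure columns of Firefox's moz_cookies table; save_cookies_txt is a hypothetical helper.

```python
import http.cookiejar


def save_cookies_txt(rows, filename):
    """Write (host, name, value, path, expiry, is_secure) tuples to a
    Netscape-format cookies.txt file and return the jar."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    for host, name, value, path, expiry, is_secure in rows:
        dot = host.startswith(".")   # ".example.com" also matches subdomains
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=host, domain_specified=dot, domain_initial_dot=dot,
            path=path, path_specified=True,
            secure=bool(is_secure), expires=int(expiry),
            discard=False, comment=None, comment_url=None, rest={}))
    # Keep discardable and expired cookies too rather than silently dropping
    # something the login depends on; the consuming program can filter them.
    jar.save(ignore_discard=True, ignore_expires=True)
    return jar
```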

[ 11:19 Mar 30, 2021    More programming | permalink to this entry | ]

Sat, 19 Dec 2020

Android Studio Workarounds, and Command-Line Gradle Builds

I got a new phone. (Not something that happens often.)

Fun, right? Well, partly, but also something I'd been dreading. I had a feeling that my ancient RSS reader, FeedViewer, which I use daily to read all my news feeds, probably wouldn't work under a modern Android (I wrote it for KitKat and it was last updated under Marshmallow). And that was correct.

[ 17:49 Dec 19, 2020    More programming | permalink to this entry | ]