Shallow Thoughts : : Nov

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 26 Nov 2021

Insert HTML Tags in Emacs

Emacs has various options for editing HTML, none of them especially good. I gave up on html-mode a while back because it had so many evil tendencies, like not ever letting you type a double dash without reformatting several lines around the current line as an HTML comment. I'm using web-mode now, which is better.

But there was one nice thing that html-mode had: quick key bindings for inserting tags. For instance, C-c i would insert the tag for italics, <i></i>, and if you had something selected it would make that italic: <i>whatever you had selected</i>.

It's a nice idea, but it was too smart (for some value of "smart" that means "dumb and annoying") for its own good. For instance, it loves to randomly insert things like newlines in places where they don't make any sense.

Read more ...

[ 19:07 Nov 26, 2021    More linux/editors ]

Sat, 20 Nov 2021

Wikipedia: All Roads Lead to ... Philosophy?

At a recent LUG meeting, we were talking about various uses for web scraping, and someone brought up a Wikipedia game: start on any page, click on the first real link, then repeat on the page that comes up. The claim is that this chain always gets to Wikipedia's page on Philosophy.

We tried a few rounds, and sure enough, every page we tried did eventually get to Philosophy, usually via languages, which goes to communication, then discipline, action, intention, mental, thought, idea, and finally philosophy.

It's a perfect game for a discussion of scraping. It should be an easy exercise to write a scraper to do this, right?
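Here is a rough sketch of what such a scraper might look like, using only the Python standard library. It is not the code from the full article. It assumes that "first real link" means the first /wiki/ link inside a paragraph that isn't inside parentheses and isn't a namespaced page like File: or Help:. The names FirstLinkParser, first_link, follow_chain and get_html are made up for this illustration, and get_html is whatever function you supply to fetch a page's HTML.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Remember the first /wiki/ link found inside a <p>, outside parentheses."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = 0
        self.paren_depth = 0
        self.first = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph += 1
        elif (tag == "a" and self.in_paragraph
              and self.paren_depth == 0 and self.first is None):
            href = dict(attrs).get("href") or ""
            # Assumption: article links look like /wiki/Title with no colon;
            # a colon marks namespaced pages such as File: or Help:.
            if href.startswith("/wiki/") and ":" not in href:
                self.first = href[len("/wiki/"):]

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph -= 1
            self.paren_depth = 0

    def handle_data(self, data):
        # Track parentheses in paragraph text so links inside them are skipped.
        if self.in_paragraph:
            for ch in data:
                if ch == "(":
                    self.paren_depth += 1
                elif ch == ")":
                    self.paren_depth = max(0, self.paren_depth - 1)


def first_link(html):
    """Return the title of the first qualifying link, or None."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first


def follow_chain(start, get_html, target="Philosophy", max_steps=100):
    """Follow first links from start; stop at target, a loop, or a dead end."""
    chain = [start]
    while chain[-1] != target and len(chain) <= max_steps:
        nxt = first_link(get_html(chain[-1]))
        if nxt is None or nxt in chain:
            break
        chain.append(nxt)
    return chain
```

To try it without touching the network, get_html can be a lookup into a dictionary of saved pages. Real Wikipedia pages have more wrinkles than this handles (italicized links, for one), so treat it as a starting point.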

Read more ...

[ 19:31 Nov 20, 2021    More programming ]

Mon, 15 Nov 2021

Removing Bad Autocompletes from Firefox's Location Bar

A priest, a minister, and a rabbit walk into a bar.
The bartender asks the rabbit what he'll have to drink.
"How should I know?" says the rabbit. "I'm only here because of autocomplete."

Firefox folks like to call the location bar/URL bar the "awesomebar" because of the suggestions it makes. Sometimes, those suggestions are pretty great; there are a lot of sites I don't bother to bookmark because I know they will show up as the first suggestion.

Other times, the "awesomebar" is not so awesome. It gets stuck on some site I never use, and there's seemingly no way to make Firefox forget that site.

Read more ...

[ 16:54 Nov 15, 2021    More tech/web ]

Thu, 11 Nov 2021

Selenium: Handling Timeouts and Errors

This is part 3 of my selenium exploration trying to fetch stories from the NY Times (as a subscriber).

At the end of part 2, selenium was running on a server with the minimal number of X and GTK libraries installed.

But now that it can run unattended, there's another problem: there are all kinds of ways this can fail, and your script needs to handle those errors somehow.

Before diving in, I should mention that for my original goal, fetching stories from the NY Times as a subscriber, it turned out I didn't need selenium after all. Since handling selenium errors turned out to be so brittle (as I'll describe in this article), I'm now using requests combined with a Python CookieJar. I'll write about that in a future article. Meanwhile ...

Handling Errors and Timeouts

Timeouts are a particular problem with selenium, because there doesn't seem to be any reliable way to change them so the selenium script doesn't hang for ridiculously long periods.
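One generic workaround, which may or may not be what the full article ends up doing, is to run the whole fetch script as a child process and let Python's subprocess module enforce a hard deadline. The sketch below assumes the scraper lives in its own script; the function name run_with_deadline is made up for this example.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd as a child process and return its stdout.

    Returns None if the command exits with an error or is still
    running after the given number of seconds.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=seconds, check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child by the time we get here.
        return None
    except subprocess.CalledProcessError:
        # The child exited with a nonzero status.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real scraper script: a child that just prints a line.
    print(run_with_deadline([sys.executable, "-c", "print('fetched')"], 60))
```

This only kills the script itself. Anything the script launched, such as the browser, may outlive it and need separate cleanup.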

Read more ...

[ 12:07 Nov 11, 2021    More programming ]

Sun, 07 Nov 2021

Configuring Selenium to Run Headless, Without a Desktop

This is part 2 of my selenium exploration trying to fetch stories from the NY Times (as a subscriber).

When we left off, I was learning the basics of selenium in order to fetch stories (as a subscriber) from the New York Times. Fetching stories was working properly, and all that remained was to put it in an automated script, then move it to a server where it could run automatically without my desktop machine needing to be on.

Unfortunately, that turned out to be the hardest part of the problem.

Read more ...

[ 12:18 Nov 07, 2021    More programming ]

Tue, 02 Nov 2021

Web Scraping with Selenium in Python

This is part 1 of my selenium exploration.

At the New Mexico GNU & Linux User Group, currently meeting virtually on Jitsi, someone expressed interest in scraping websites. Since I do quite a bit of scraping, I offered to give a tutorial on scraping with the Python module BeautifulSoup.

"What about selenium?" he asked. Sorry, I said, I've never needed selenium enough to figure it out.

But then a week later, I found I did have a need.

Read more ...

[ 19:58 Nov 02, 2021    More programming ]