Shallow Thoughts : : Nov
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Fri, 26 Nov 2021
Emacs has various options for editing HTML, none of them especially
good. I gave up on html-mode a while back because it had so many
evil tendencies, like not ever letting you type a double dash
without reformatting several lines around the current line as an
HTML comment. I'm using web-mode now, which is better.
But there was one nice thing that html-mode had: quick key bindings
for inserting tags. For instance, C-c i would insert the tag for italics,
<i></i>, and if you had something selected it would make that italic:
<i>whatever you had selected</i>.
It's a nice idea, but it was too smart (for some value of "smart"
that means "dumb and annoying") for its own good. For instance,
it loved to randomly insert things like newlines in places where they
don't make any sense.
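The tag-wrapping idea itself is simple string surgery. Here is a minimal sketch of it in Python rather than Emacs Lisp. `wrap_in_tag` is a hypothetical name, the `(start, end)` pair stands in for the editor's selection, and the sketch deliberately does none of html-mode's newline insertion.

```python
def wrap_in_tag(text, start, end, tag):
    """Wrap text[start:end] in <tag>...</tag>, like html-mode's C-c i for italics.

    With an empty selection (start == end), this just inserts an empty
    tag pair at that position. Returns the new text and a cursor position:
    inside the empty pair, or just after the closing tag if text was selected.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("selection out of range")
    opening, closing = f"<{tag}>", f"</{tag}>"
    new_text = text[:start] + opening + text[start:end] + closing + text[end:]
    if start == end:
        cursor = start + len(opening)
    else:
        cursor = end + len(opening) + len(closing)
    return new_text, cursor
```

For example, `wrap_in_tag("make this italic", 5, 9, "i")` gives `("make <i>this</i> italic", 16)`.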
Read more ...
Tags: editors, emacs, html, html-mode
[ 19:07 Nov 26, 2021 | More linux/editors ]
Sat, 20 Nov 2021
At a recent LUG meeting, we were talking about various uses for web
scraping, and someone brought up a Wikipedia game: start on any page,
click on the first real link, then repeat on the page that comes up.
The claim is that this chain always gets to Wikipedia's page on
Philosophy.
We tried a few rounds, and sure enough, every page we tried did
eventually get to Philosophy, usually via language, which leads to
communication, then discipline, action, intention, mental, thought,
idea, and finally philosophy.
It's a perfect game for a discussion of scraping. It should be an easy
exercise to write a scraper to do this, right?
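Here is a minimal sketch of such a scraper. It swaps in the standard library's `html.parser` for Beautiful Soup so it runs with no extra installs, and it approximates "first real link" as the first `/wiki/` link inside a `<p>` that isn't in parentheses and isn't a namespaced page such as `Help:` or `File:`. The names `FirstLinkParser`, `first_link`, `fetch_wikipedia` and `follow_chain` are hypothetical, and real Wikipedia markup (italicized links, infoboxes, odd page layouts) would likely need more rules than this.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class FirstLinkParser(HTMLParser):
    """Record the first /wiki/ link inside a <p> that isn't in parentheses."""

    def __init__(self):
        super().__init__()
        self.in_para = False     # currently inside a <p> element?
        self.paren_depth = 0     # unclosed "(" seen so far in this paragraph
        self.first_link = None   # article title of the first qualifying link

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_para = True
            self.paren_depth = 0
        elif tag == "a" and self.in_para and self.first_link is None:
            href = dict(attrs).get("href") or ""
            # Skip parenthesized links and namespaced pages like Help: or File:.
            if self.paren_depth == 0 and href.startswith("/wiki/") and ":" not in href:
                self.first_link = href[len("/wiki/"):].split("#")[0]

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_para = False

    def handle_data(self, data):
        if self.in_para:
            self.paren_depth = max(0, self.paren_depth + data.count("(") - data.count(")"))


def first_link(html):
    """Return the title of the first qualifying link in html, or None."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_link


def fetch_wikipedia(title):
    """Download an English Wikipedia article (needs network access)."""
    req = Request("https://en.wikipedia.org/wiki/" + title,
                  headers={"User-Agent": "first-link-game-sketch"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def follow_chain(start, fetch=fetch_wikipedia, target="Philosophy", max_steps=50):
    """Follow first links from start until target, a loop, or a dead end."""
    path = [start]
    while path[-1] != target and len(path) <= max_steps:
        nxt = first_link(fetch(path[-1]))
        if nxt is None or nxt in path:
            break
        path.append(nxt)
    return path
```

Calling `follow_chain("Language")` would fetch live pages; passing a different `fetch` function (say, a dictionary's `get` method over saved HTML) makes the chain logic easy to try offline.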
Read more ...
Tags: programming, python, scraping, beautiful soup, wikipedia
[ 19:31 Nov 20, 2021 | More programming ]
Mon, 15 Nov 2021
A priest, a minister, and a rabbit walk into a bar.
The bartender asks the rabbit what he'll have to drink.
"How should I know?" says the rabbit. "I'm only here because of autocomplete."
Firefox folks like to call the location bar/URL bar the "awesomebar"
because of the suggestions it makes. Sometimes, those suggestions
are pretty great; there are a lot of sites I don't bother to bookmark
because I know they will show up as the first suggestion.
Other times, the "awesomebar" is not so awesome. It gets stuck on some site
I never use, and there's seemingly no way to make Firefox forget that site.
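Firefox keeps its history in an SQLite file, `places.sqlite`, in the profile directory, so SQL is one way to see which entries the awesomebar is favoring. Here is a hedged, read-only sketch that lists URLs containing a fragment, highest frecency first. It assumes the `moz_places` table has `url`, `visit_count` and `frecency` columns (Mozilla's schema, which can change between versions); the function names are hypothetical, and it isn't necessarily what the full post does. Firefox locks the file while running, so querying a copy is safer.

```python
import sqlite3
from pathlib import Path


def open_places_readonly(path):
    """Open a places.sqlite file read-only, so exploratory queries can't modify it."""
    return sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)


def find_sticky_suggestions(conn, fragment, limit=10):
    """List (url, visit_count, frecency) for URLs containing fragment, top frecency first."""
    # Assumes Firefox's moz_places table with url, visit_count and frecency columns.
    rows = conn.execute(
        "SELECT url, visit_count, frecency FROM moz_places"
        " WHERE url LIKE ? ORDER BY frecency DESC LIMIT ?",
        (f"%{fragment}%", limit),
    )
    return rows.fetchall()
```

A site with a high frecency but a low visit count is a likely culprit for a suggestion that won't go away.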
Read more ...
Tags: web, firefox, sql
[ 16:54 Nov 15, 2021 | More tech/web ]
Thu, 11 Nov 2021
This is part 3 of my selenium exploration trying to fetch stories
from the NY Times (as a subscriber).
At the end of Part II, selenium was running on a server with the
minimal number of X and GTK libraries installed.
But now that it can run unattended, there's another problem:
there are all kinds of ways this can fail,
and your script needs to handle those errors somehow.
Before diving in, I should mention that for my original goal,
fetching stories from the NY Times as a subscriber,
it turned out I didn't need selenium after all.
Since handling selenium errors proved to be so brittle
(as I'll describe in this article), I'm now using requests
combined with a Python CookieJar. I'll write about that in a
future article. Meanwhile ...
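For the curious, here is a minimal sketch of the cookie-jar idea. It assumes the subscriber cookies have been exported from a browser to a Netscape-format `cookies.txt` file, and it uses the standard library's `urllib.request` in place of requests so it has no dependencies. `make_opener` and `fetch_page` are hypothetical names; this is not the code from the promised future article.

```python
import http.cookiejar
import urllib.request


def make_opener(cookie_file):
    """Return a urllib opener that sends the cookies saved in cookie_file, plus the jar."""
    # MozillaCookieJar reads the Netscape cookies.txt format.
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # ignore_discard=True keeps session cookies, which a login may depend on.
    jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_page(opener, url, timeout=30):
    """Fetch url; the timeout applies to each blocking socket operation."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Unlike selenium, a plain HTTP fetch like this has a timeout that is easy to set.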
Handling Errors and Timeouts
Timeouts are a particular problem with selenium,
because there doesn't seem to be any reliable way to change them
so the selenium script doesn't hang for ridiculously long periods.
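One blunt workaround, not something selenium itself provides and not necessarily what the full article recommends, is to run the selenium script as a separate process and kill it once it passes a hard deadline. This sketch assumes a POSIX system (it relies on `os.killpg`) and starts the child in its own session, so that browser and driver processes it spawns are killed with it, provided they stay in that process group. `run_with_deadline` is a hypothetical name.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd, killing its whole process group if it exceeds the deadline.

    Returns (returncode, stdout); returncode is None if the deadline hit.
    POSIX only, because it relies on process groups and os.killpg.
    """
    # start_new_session=True makes the child the leader of a new process
    # group (group ID == child's PID); children it spawns inherit that group
    # unless they start their own sessions.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                            start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=seconds)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited right at the deadline
        out, _ = proc.communicate()  # reap the child and collect partial output
        return None, out
```

A cron job could then call something like `run_with_deadline(["python3", "fetch_stories.py"], 300)` (the script name is hypothetical) and treat a `None` return code as a timeout.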
Read more ...
Tags: programming, python, scraping, selenium
[ 12:07 Nov 11, 2021 | More programming ]
Sun, 07 Nov 2021
This is part 2 of my selenium exploration trying to fetch stories
from the NY Times (as a subscriber).
When we left off, I was learning the basics of selenium in order to
fetch stories from the New York Times. Fetching stories was working properly,
and all that remained was to put it in an automated script, then
move it to a server where it could run automatically without my
desktop machine needing to be on.
Unfortunately, that turned out to be the hardest part of the problem.
Read more ...
Tags: programming, python, scraping, selenium
[ 12:18 Nov 07, 2021 | More programming ]
Tue, 02 Nov 2021
This is part 1 of my selenium exploration.
At the New Mexico GNU & Linux User Group,
currently meeting virtually on Jitsi, someone expressed interest in scraping
websites. Since I do quite a bit of scraping, I offered to give
a tutorial on scraping with the Python module BeautifulSoup.
"What about selenium?" he asked. Sorry, I said, I've never needed
selenium enough to figure it out.
But then a week later, I found I did have a need.
Read more ...
Tags: programming, python, scraping, selenium
[ 19:58 Nov 02, 2021 | More programming ]