It was surprisingly hard to come up with a "D" to write about, without descending into Data geekery (always a temptation). Though you may decide I've done that anyway with today's topic.
Out for a scenic drive to shake off some of the house-bound cobwebs, I got to thinking about how so many places are named after the Devil. California was full of them -- the Devil's Punchbowl, the Devil's Postpile, and so forth -- and nearly every western National Park has at least one devilish feature.
How many are there really? Happily, there's an easy way to answer questions like this: the Geographic Names page on the USGS website, which hosts the Geographic Names Information System (GNIS). You can download entire place name files for a state, or you can search for place name matches at: GNIS Feature Search.
When I searched there for "devil", I got 1883 hits -- but many of them don't actually include the word "Devil". What, are they taking lessons from Google about searching for things that don't actually match the search terms?
I decided I wanted to download the results so I could count them more easily. The page offers View & Print all or Save as pipe "|" delimited file. I chose to save the file.
Nifty Command-Line Tricks
The downloaded file is a format called CSV ("Comma Separated Values", though in this case they're separated by the pipe character, "|"), typically used in spreadsheets. I'm not really a spreadsheet person, and CSV files are just as easy to analyze using basic shell tools. Most Linux users are familiar with the power of the command line, but don't feel left out if you're not using Linux: the commands I'll show work fine on a Mac, and they probably work on Windows too if you use the Linux Subsystem for Windows.
I started with a basic count. I'd seen already, on the website's
search page, that a lot of the names didn't actually have "Devil"
in the name even though that's what I searched for, so that 1883
number is bogus. So I ran a
grep -i devil to pick out the place names that actually
do have "devil" in the name (-i means "ignore case", so it will find
devil as well as Devil). Then I piped the result through wc, word
count, using -l to count the number of matching lines:
grep -i devil GNIS_Devil.csv | wc -l
There are 1764 place names that actually do contain the word "Devil".
Hmm, I wonder which state is the most devilish? That's a little trickier using just shell commands. First we have to extract the state. Lines in the CSV file look like this:
"Devils Well"|"896970"|"Basin"|"De Baca"|"NM"|"340337N"|"1044154W"|"4400"|"Devils Well"|""|"13-NOV-1980"So the state part looks like
|"NM"|. How about if we look for anything that includes two uppercase letters between pipes and quotes?
grep -i devil GNIS_Devil.csv | sed 's/.*|\"\([A-Z][A-Z]\)\"|.*/\1/'gets a list of all the 2-letter state abbreviations.
But that's not too useful; what I want to know is how many times CA appears, how many times NM appears, etc. There's a nice trick for finding that: sort the list, so you'll see all the AKs first, followed by all the ALs, and so forth. Then you can use uniq -c to count how many times each state appears. So you'll see 45 AK, followed by 22 AL and so forth.
Almost there: but I'd like them in order by number, with the biggest number (the most devilish state) first. So I'll add one more sort at the end:
grep -i devil GNIS_Devil.csv | sed 's/.*|\"\([A-Z][A-Z]\)\"|.*/\1/' | sort | uniq -c | sort
Ta-da: CA is the most devilish state, with 221 devilish features, followed by Oregon with 187. 52 states are represented (including DC and GU (Guam), with one feature each. Here are the top ten:
221 CA 107 OR 97 TX 77 CO 75 WA 75 ID 64 UT 61 MT 52 NM 52 MO
Black Mesa / Black Mountain
Just for fun, as long as I still had the GNIS search page in my browser, I had another search in mind. A longstanding source of amusement when driving around Española is how there are two "Black Mesa"s a few miles apart, easily within site of each other. (Locals tend to use the Tewa name Tunyo, meaning "Spotted", for one, and one of several Spanish names, Mesa Prieta, meaning "Dark Mesa", for the other; though for Mesa Prieta I prefer one of the earlier Spanish names, Mesa Canoa, referring to its long and slender canoe-like shape.) Anyway, there's a local joke that the average distance between two features called "Black Mesa" in the state is how far you can see.
Not every state has mesas at all, but in states that don't, mountains fill the void, and "Black Mountain" is at least as popular a name as "Black Mesa".
How many are there really? I searched the GNIS site separately for Black Mesa and Black Mountain. Turns out that the US has 303 Black Mountains, and only 49 Black Mesas.
For mountains, CA is the big winner with 62 Black Mountains, versus AZ, NV and NM in a three-way tie for second place with 26 each. Mesas are far less common: there are 28 Black Mesas in AZ, 26 in NM, and a handful scattered among five other western states.
Also neat: New Mexico has the same number of Black Mesas and Black Mountains, and the number of Black Mesas or Mountains (the sum of the two) in the state is the same as the number of Devils. (Thanks to Dave for pointing that out.)
I suppose this still counts as data geekery. But it was a fun and easy exploration on a topic I've been curious about for a while!
[ 16:30 Mar 30, 2020 More linux/cmdline | permalink to this entry | comments ]