Shallow Thoughts : : Apr

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sat, 13 Apr 2013

Parsing NOAA historical weather data

We've been considering the possibility of moving out of the Bay Area to somewhere less crowded, somewhere in the desert southwest we so love to visit. But that also means moving to somewhere with much harsher weather.

How harsh? It's pretty easy to search for a specific location and get average temperatures. But what if I want to make a table to compare several different locations? I couldn't find any site that made that easy.

No problem, I say. Surely there's a Python library, I say. Well, no, as it turns out. There are Python APIs to get the current weather anywhere; but if you want historical weather data, or weather data averaged over many years, you're out of luck.

NOAA purports to have historical climate data, but the only dataset I found was spotty and hard to use. There's an FTP site containing directories by year; inside are gzipped files with names like 723710-03162-2012.op.gz. The first two numbers identify the station and the third is the year. There's a file at the top level called ish-history.txt that lists the station codes and their corresponding numbers. Not obvious!
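Here's a minimal sketch of pulling those file names apart. The field names stn and wban are my assumption, borrowed from the first two column headings inside the data files; the pattern assumes every file follows the six-digit, five-digit, four-digit shape of the example name.

```python
import re
from typing import NamedTuple


class GsodFile(NamedTuple):
    stn: str    # first station number; kept as a string
    wban: str   # second station number; a string so leading zeros survive
    year: int


# Assumed shape, taken from the single example name 723710-03162-2012.op.gz
_NAME_RE = re.compile(r"^(\d{6})-(\d{5})-(\d{4})\.op\.gz$")


def parse_gsod_filename(name):
    """Split a file name like 723710-03162-2012.op.gz into its three parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    return GsodFile(match.group(1), match.group(2), int(match.group(3)))
```

Keeping the station numbers as strings matters: 03162 would lose its leading zero as an integer and no longer match the file name.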

Once you figure out the station codes, the files themselves are easy to parse, with lines like

STN--- WBAN   YEARMODA    TEMP       DEWP      SLP        STP       VISIB      WDSP     MXSPD   GUST    MAX     MIN   PRCP   SNDP   FRSHTT
724945 23293  20120101    49.5 24    38.8 24  1021.1 24  1019.5 24    9.9 24    1.5 24    4.1  999.9    68.0    37.0   0.00G 999.9  000000

Each line represents one day (20120101 is January 1st, 2012), and the codes are explained in another file called GSOD_DESC.txt. For instance, MAX is the daily high temperature, and SNDP is snow depth.
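As an illustration, a line like the sample above can be parsed by splitting on whitespace. This is a sketch, not necessarily what my script does. It assumes the column order in the header shown above, with an observation count following each of TEMP, DEWP, SLP, STP, VISIB and WDSP. It also assumes that 9999.9, 999.9 and 99.99 mean "no data", and that a trailing letter or asterisk on a value is a flag; check both against GSOD_DESC.txt before relying on them.

```python
from datetime import datetime

# Token positions after line.split(), assuming the header order shown above.
FIELDS = {"TEMP": 3, "MAX": 17, "MIN": 18, "PRCP": 19, "SNDP": 20}

# Assumed "missing value" sentinels; verify against GSOD_DESC.txt.
MISSING = {"9999.9", "999.9", "99.99"}

EXPECTED_TOKENS = 22


def parse_gsod_line(line):
    """Return a dict for one day's record, or None for header/odd lines."""
    tokens = line.split()
    if len(tokens) != EXPECTED_TOKENS or not tokens[2].isdigit():
        return None
    record = {
        "stn": tokens[0],
        "wban": tokens[1],
        "date": datetime.strptime(tokens[2], "%Y%m%d").date(),
    }
    for name, index in FIELDS.items():
        # Strip an assumed trailing flag such as the G in 0.00G.
        raw = tokens[index].rstrip("*ABCDEFGHI")
        record[name] = None if raw in MISSING else float(raw)
    return record
```

Returning None for anything that doesn't look like a data line means the header row can be fed through the same function and simply skipped.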

[NOAA historical temp program] So all I needed was to decode the station names, download the right files and parse them. That took about a day to write (including a lot of time wasted futzing with mysterious incantations for matplotlib).

Little accessibility refresher: I showed it to Dave -- "Neat, look at this, San Jose is the blue pair, Flagstaff is green and Page is red." His reaction: "This makes no sense. They all look the same to me. I have no idea which is which." Oops -- right. Don't use color as your only visual indicator. I knew that, supposedly! So I added markers in different shapes for each site. (I wish somebody would teach that lesson to Google Maps, which uses color as its only indicator on the traffic layer, so it's useless for red-green colorblind people.)
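A sketch of the marker idea, for anyone fighting the same matplotlib incantations. The function names and the marker list are made up for this example, and the input format (a dict mapping each site to a pair of twelve-value lists, highs and lows) is an assumption. Each site gets its own marker shape, and the lows are dashed so the pair is distinguishable without color too.

```python
from itertools import cycle

# Marker shapes to hand out, one per site; repeats if there are more sites.
MARKERS = ["o", "s", "^", "D", "v", "*", "x"]


def assign_markers(sites):
    """Map each site name to a marker shape, in order."""
    shapes = cycle(MARKERS)
    return {site: next(shapes) for site in sites}


def plot_highs_lows(series, outfile):
    """Plot monthly highs and lows; series maps site -> (highs, lows)."""
    # Imported here so assign_markers() works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    markers = assign_markers(series)
    months = range(1, 13)
    fig, ax = plt.subplots()
    for site, (highs, lows) in series.items():
        (line,) = ax.plot(months, highs, marker=markers[site], label=site)
        # Same color and marker for the lows, but dashed.
        ax.plot(months, lows, marker=markers[site],
                color=line.get_color(), linestyle="--")
    ax.set_xlabel("Month")
    ax.set_ylabel("Temperature (F)")
    ax.legend()
    fig.savefig(outfile)
```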

Back to the data -- it turns out NOAA doesn't actually have that much historical data available for download. If you search on most of these locations, you'll find sites that claim to have historical temperatures dating back 50 years or more, sometimes back to the 1800s. But NOAA typically only has files starting at about 2005 or 2006. I don't know where sites are getting this older data, or how reliable it is.

Still, averages since 2006 are interesting to compare. Here's a run of noaatemps.py KSJC KFLG KSAF KLAM KCEZ KPGA KCNY. It's striking how moderate California weather is compared to any of these inland sites. No surprise there. What did surprise me was that Los Alamos, despite its high elevation, has more moderate weather than most of the others -- lower highs, higher lows. I was a bit disappointed at how sparse the site list was -- no site in Moab? Really? So I used Canyonlands Field instead.
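The averaging itself is simple. Here's a minimal sketch that lumps every day from every year into its calendar month and averages the highs and lows separately. It takes (date, high, low) tuples, which is a format chosen for this example rather than anything NOAA provides, and it skips days where a value is missing (None).

```python
from collections import defaultdict
from datetime import date


def monthly_averages(days):
    """Average daily highs and lows by calendar month across all years.

    days is an iterable of (date, high, low); high or low may be None.
    Returns {month: (mean_high, mean_low)} for months that had any data.
    """
    # Per month: [sum of highs, count of highs, sum of lows, count of lows]
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for day, high, low in days:
        acc = totals[day.month]
        if high is not None:
            acc[0] += high
            acc[1] += 1
        if low is not None:
            acc[2] += low
            acc[3] += 1
    result = {}
    for month in sorted(totals):
        hsum, hcount, lsum, lcount = totals[month]
        result[month] = (
            hsum / hcount if hcount else None,
            lsum / lcount if lcount else None,
        )
    return result
```

Counting highs and lows separately keeps one missing reading from throwing away the other half of that day's data.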

Anyway, it's fun for a data junkie to play around with, and it prints data on other weather factors, like precipitation and snowpack, although it doesn't plot them yet. The code is on my GitHub scripts page, under Weather.

Anyone found a better source for historical weather information? I'd love to have something that went back far enough to do some climate research, see what sites are getting warmer, colder, or seeing greater or lesser spreads between their extreme temperatures. The NOAA dataset obviously can't do that, so there must be something else that weather researchers use. Data on other countries would be interesting, too. Is there anything that's available to the public?

[ 22:57 Apr 13, 2013    More programming | permalink to this entry | ]

Fri, 05 Apr 2013

QuizCross

Watching people weave into and out of our lane while they texted on the freeway (where are all the cops who are supposed to be cracking down on that this week?), Dave came up with an idea: a competition where you drive some sort of course -- start with an autocross course, or maybe add twists like parallel parking -- while simultaneously texting. Your score is a combination of your time through the course, fewest pylons hit, and the accuracy of your texted replies.

He was thinking of a show we used to see at a pizza place we frequented a few years ago, "Cash Cab". The premise: there's a special taxi that drives around New York City rigged with video gear, and if it picks you up, you get a chance to play a "Who Wants to Be a Millionaire" style quiz show in the time it takes the driver to get you to your destination.

I have to admit, although Dave's combination of Cash Cab and autocross sounded intriguing, it didn't sound like something I'd actually want to do. Although I see plenty of drivers who seem to love the challenge of parallel parking or negotiating rush-hour traffic with one hand (or no hands!), it's not my thing.

But here's a modification that did sound fun to me: you wear a hands-free headset, and while you negotiate the course, someone asks you quiz-show type questions and you have to answer while you're driving the course. You can still use both hands to drive; just not your whole brain.

It's an exercise in concentration and filtering distractions. Can you figure out what part of the course needs your fullest attention, and which parts you might be able to take nearly as fast while thinking about the quiz question? It's a biathlon for motorheads.

The scientifically minded part of me wants to take a little extra time and add a free run through the course for each contestant at the beginning and end of the event, with no quiz questions. That way everybody gets a baseline time for the course, and it's easy to find out how much the distraction hurts our driving. Some studies say that a hands-free phone is just as distracting as a handheld one. Wouldn't you love to find out exactly how true that is for you?

I know it'll never happen -- it's hard enough to reserve autocross sites without the additional complications of an untried event format. But I'd sure love to try it. If any researchers with funding for distracted-driving studies are reading this and want to use the idea, count me in as either a helper or a study subject.

I'm calling it QuizCross. You heard it here first.

[ 20:55 Apr 05, 2013    More misc | permalink to this entry | ]