
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Tue, 12 Feb 2019

Reading the Output of a Weather Station Using Software Defined Radio

[Weather station] A while back, Dave ordered a weather station. His research pointed to the Ambient Weather WS-2000 as the best bang for the buck in terms of accuracy -- once it's calibrated, which is a time-consuming and exacting process of comparing readings against a known-good mercury thermometer, and one I suspect most weather station owners don't bother with.

It comes with a little 7" display console that sits indoors and reads the radio signal from the outside station as well as a second thermometer inside, then displays all the current weather data. It also uses wi-fi to report the data upstream to Ambient and, optionally, to a weather site such as Wunderground. (Which we did for a while, but now Wunderground is closing off their public API, so why give them data if they're not going to make it easy to share it?)

[Weather station console] Having the console readout and the Ambient "dashboard" is all very nice, but of course, being a data geek, I wanted a way to get the data myself, so I could plot it, save it or otherwise process it. And that's where Ambient falls short. The console, though it's already talking on wi-fi, gives you no way to get the data. They sell a separate unit called an "Observer" that provides a web page you can scrape, and we actually ordered one; but it turned out to be buggy and difficult to use, showing numbers substantially different from what the console showed and randomly failing to answer at all, so we ended up returning it for a refund.

The other way of getting the data is online. Ambient provides an API you can use for that purpose, if you email them for a key. It mostly works, but it sometimes lags considerably behind real time, and it seems crazy to have to beg for a key and then get data from a company website that originated in our own backyard.

What I really wanted to do was read the signal from the weather station directly. I'd planned for ages to look into how to do that, but I'm a complete newbie to software defined radio and wasn't sure where to start. Then one day I noticed an SDR discussion on the #raspberrypi IRC channel on Freenode where I often hang out. I jumped in, asked some questions, and got pointed in the right direction and referred to the friendly and helpful #rtlsdr Freenode channel.

An Inexpensive SDR Dongle

Update: Take everything that follows with a grain of salt. I got it working, everything was great -- then when I tried it the very next day after I wrote the article, none of it worked. At all. The SDR dongle no longer saw anything from the station, even though the station was clearly still sending to the console.

I never did get it working reliably, nor did I ever find out what the problem was, and in the end I gave up. Occasionally the dongle will see the weather station's output, but most of the time it doesn't. It might be a temperature sensitivity issue (though the dongle I bought is supposed to be temperature compensated). Or maybe it's gremlins. Whatever it is, be warned that although the information below might get you started, it probably won't get you a reliably working SDR solution. I wish I knew the answer.

[Raspberry Pi with SDR dongle] On the experts' advice, I ordered an RTL-SDR Blog R820T2 RTL2832U 1PPM TCXO SMA Software Defined Radio with 2x Telescopic Antennas on Amazon. This dongle apparently has better temperature compensation than cheaper alternatives, it came with a couple of different antenna options, and I was told it should work well with Linux using a program called rtl_433.

Indeed it did. The command to monitor the weather station is

rtl_433 -f 915M

rtl_433 already knows the protocol for the WS-2000, so I didn't even need to do any decoding or reverse engineering; it produces a running account of the periodic signals being broadcast from the station. rtl_433 also helpfully offers -F json and -F csv options, along with a few other formats. What a great program!

JSON turned out to be the easiest for me to use; initially I thought CSV would be more compact, but rtl_433's CSV format includes fields for every possible quantity a weather station could ever broadcast. When you think about it, that makes sense: once you're outputting CSV you can't add a new field in mid-stream, so you'd better be ready for anything. JSON, on the other hand, lets you report just whatever the weather station reports, and it's easy to parse from nearly any programming language.
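
Parsing that output from a script is easy. Here's a minimal sketch of the idea (not my actual client code; the field names match the JSON sample later in this article):

import json
import subprocess

# Run rtl_433 in JSON mode and handle each report as it arrives.
proc = subprocess.Popen(["rtl_433", "-f", "915M", "-F", "json"],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    try:
        report = json.loads(line)
    except json.JSONDecodeError:
        continue    # rtl_433 also prints occasional status lines
    print(report["time"], report.get("temperature_C"),
          report.get("humidity"))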

Testing the SDR Dongle

Full disclosure: at first, rtl_433 -f 915M wasn't showing me anything, and I still don't know why. Maybe I had a loose connection on the antenna, or maybe I got unlucky and the weather station picked the exact wrong time to take a vacation. But while I was debugging, I found another program that was very helpful for verifying that the SDR dongle itself was working: rtl_fm, which plays radio stations. The only trick is finding the right arguments, since the example from the man page just played static. Here's what worked for me:

rtl_fm -f 101.1M -M fm -g 20 -s 200k -A fast -r 32k -l 0 -E deemp | play -r 32k -t raw -e s -b 16 -c 1 -V1 -

That command plays the 101.1 FM radio station. (I had to do a web search to give me some frequencies of local radio stations; it's been a long time since I listened to normal radio.)

Once I knew the dongle was working, I needed to verify what frequency the weather station was using for its broadcasts. What I really wanted was something that would scan frequencies around 915M and tell me what it found. Everyone kept pointing me to a program called Gqrx. But it turns out Gqrx on Linux requires PulseAudio and absolutely refuses to work or install without it, even if you have no interest in playing audio. I didn't want to break my system's sound (I've never managed to get sound working reliably under PulseAudio), and although it's supposedly possible to build Gqrx without Pulse, it's a difficult build: I saw plenty of horror stories, and it requires Boost, always a sign that a build will be problematic. I fiddled with it a little but decided it wasn't a good time investment.

I eventually found a scanner that worked: RTLSDR-Scanner. It let me set limiting frequencies and scan between them, and by setting it to accumulate, I was able to verify that indeed, the weather station (or something else!) was sending a signal on 915 MHz. I guess by then, the original problem had fixed itself, and after that, rtl_433 started showing me signals from the weather station. It's not super polished, but it's the only scanner I've found that works without requiring PulseAudio.

That Puzzling Rainfall Number

One mystery remained to be solved. The JSON I was getting from the weather station looked like this (I've reformatted it for readability):

{
    "time" : "2019-01-11 11:50:12",
    "model" : "Fine Offset WH65B",
    "id" : 60,
    "temperature_C" : 2.200,
    "humidity" : 94,
    "wind_dir_deg" : 316,
    "wind_speed_ms" : 0.064,
    "gust_speed_ms" : 0.510,
    "rainfall_mm" : 90.678,
    "uv" : 324,
    "uvi" : 0,
    "light_lux" : 19344.000,
     "battery" : "OK",
    "mic" : "CRC"
}

This on a day when it hadn't rained in ages. What was up with that "rainfall_mm" : 90.678 ?

I asked on the rtl_433 list and got a prompt and helpful answer: it's a cumulative number, counting since some unspecified time in the past (possibly the last time the battery was changed?). So as long as I make a note of the rainfall_mm number, any change in that number means new rainfall.
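
So the bookkeeping is simple. Something like this (a sketch; rainfall_mm is the field from the JSON above):

last_total = None

def new_rainfall(report):
    """Given a parsed rtl_433 report, return rainfall (in mm)
       since the previous report, from the cumulative rainfall_mm."""
    global last_total
    total = report["rainfall_mm"]
    if last_total is None:
        last_total = total
        return 0.0
    delta = total - last_total
    last_total = total
    # If the number ever decreases, assume the counter was reset:
    return max(delta, 0.0)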

This being a snowy winter, I haven't been able to test that yet: the WS-2000 doesn't measure snowfall unless some snow happens to melt in the rain cup.

Some of the other numbers, like uv and uvi, are in mysterious unknown units and sometimes give results that make no sense (why doesn't uv go to zero at night? You're telling me that there's that much UV in starlight?), but I already knew that was an issue with the Ambient. It's not rtl_433's fault.

I notice that the numbers are often a bit different from what the Ambient API reports; apparently they do some massaging of the numbers, and the console has its own adjustment factors too. We'll have to do some more calibration with a mercury thermometer to see which set of numbers is right.

Anyway, cool stuff! It took no time at all to write a simple client for my WatchWeather web app that runs rtl_433 and monitors the JSON output. I already had WatchWeather clients collecting reports from Raspberry Pi Zero Ws sitting at various places in the house with temperature/humidity sensors attached; and now my WatchWeather page can include the weather station itself.

Meanwhile, we donated another weather station to the Los Alamos Nature Center, though it doesn't have the SDR dongle, just the normal Ambient console reporting to Wunderground.


Mon, 23 Jul 2018

Rain Song

We've been in the depths of a desperate drought. Last year's monsoon season never happened, and then the winter snow season didn't happen either.

Dave and I aren't believers in tropical garden foliage that requires a lot of water; but even piñons and junipers and other native plants need some water. You know it's bad when you find yourself carrying a watering can down to the cholla and prickly pear to keep them alive.

This year, the Forest Service closed all the trails for about a month -- too much risk of one careless cigarette-smoking hiker, or at least I think that was the reason (they never really explained it) -- and most of the other trail agencies followed suit. But then in early July, the forecasts started predicting the monsoon at last. We got some cloudy afternoons, some humid days (what qualifies as humid in New Mexico, anyway -- sometimes all the way up to 40%), and the various agencies opened their trails again. Which came as a surprise, because those clouds and muggy days didn't actually include any significant rain. Apparently mere air humidity is enough to mitigate a lot of the fire risk?

Tonight the skies finally let loose. When the thunder and lightning started in earnest, a little after dinner, Dave and I went out to the patio to soak in the suddenly cool and electric air and some spectacular lightning bolts while watching the hummingbirds squabble over territory. We could see rain to the southwest, toward Albuquerque, and more rain to the east, toward the Sangres, but nothing where we were.

Then a sound began -- a distant humming/roaring, like the tires of a big truck passing on the road. "Are we hearing rain approaching?" we both asked at the same time. Since moving to New Mexico we've gotten used to seeing rain from a long way away; and of course everyone has heard rain as it falls around them, either as a light pitter-patter or the louder sound of a real storm; but we'd never before been able to hear a rainstorm gradually moving toward us.

Sure enough, the sound got louder and louder, almost unbearably loud -- and then suddenly we were inundated with giant-sized drops, blowing way in past the patio roof to where we were sitting.

I've heard of rain dances, and songs sung to bring the rain, but I didn't know it could sing back.

We ran for the door, not soon enough. But that was okay; we didn't mind getting drenched. After a drought this long, water from the sky is cause only for celebration.

The squall dumped over a third of an inch in only a few minutes. (This according to our shiny new weather station with a sensitive tipping-bucket rain gauge that measures in hundredths of an inch.) Then it eased up to a light drizzle for a while, the lightning moved farther away, and we decided it was safe to run down the trail to "La Cienega" (Spanish for swamp) at the bottom of the property and see if any water had accumulated. Sure enough! Lake La Senda (our humorous moniker for a couple of little puddles that sometimes persist as long as a couple of days) was several inches deep. Across the road, we could hear a canyon tree frog starting to sing his ratchety song -- almost more welcome than the sound of the rain itself.

As I type this, we're reading a touch over half an inch and we're down to a light drizzle. The thunder has receded but there's still plenty of lightning.

More rain! Keep it coming!


Mon, 14 May 2018

Plotting the Jet Stream, or Other Winds, with ECMWF Data

I've been trying to learn more about weather from a friend who used to work in the field -- in particular, about New Mexico's notoriously windy spring. One of the reasons behind our spring winds relates to the location of the jet stream. But I couldn't find many good references showing how the jet stream moves throughout the year. So I decided to try to plot it myself -- if I could find the data. Getting weather data can be surprisingly hard.

In my search, I stumbled across Geert Barentsen's excellent Annual variations in the jet stream (video). It wasn't quite what I wanted -- it shows the position of the jet stream in December in successive years -- but the important thing is that he provides a Python script on GitHub that shows how he produced his beautiful animation.

[Sample jet stream image]

Well -- mostly. It turns out his data sources are no longer available, and he didn't go into much detail on where he got his data, saying only that it was from the ECMWF ERA re-analysis model (with a link that's now 404). That led me on a merry chase through the ECMWF website trying to figure out which part of which database I needed. ECMWF has lots of publicly available databases (and even more), plus Python libraries to access them and plenty of documentation; but somehow none of the documentation addresses questions like which database includes which variables or how to find and fetch the data you're after, and a lot of the sample code doesn't actually work. I ended up using the "ERA Interim, Daily" dataset and requesting data for only specific times and only the variables and pressure levels I was interested in. It's a great source of data once you figure out how to request it.

Sign up for an ECMWF API Key

Access ECMWF Public Datasets (there's also Access MARS and I'm not sure what the difference is), which has links you can click on to register for an API key.

Once you get the email with your initial password, log in using the URL in the email and change the password. That gave me a "next" button which took me to a page warning that the page was obsolete and that I should update whatever bookmark I had used to get there. That page also doesn't offer a link to the new page where you can get your key details, so go here: Your API key. The API Key page gives you some lines you can paste into ~/.ecmwfapirc.
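
If you're wondering, ~/.ecmwfapirc ends up as a little JSON blob along these lines (placeholder values here, obviously):

{
    "url"   : "https://api.ecmwf.int/v1",
    "key"   : "your-api-key-here",
    "email" : "you@example.com"
}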

You'll also have to accept the license terms for the databases you want to use.

Install the Python API

That sets you up to use the ECMWF api. They have a Web API and a Python library, plus some other Python packages, but after struggling with a bunch of Magics tutorial examples that mostly crashed or couldn't find data, I decided I was better off sticking to the basic Python downloader API and plotting the results with Matplotlib.

The Python data-fetching API works well. To install it, activate your preferred Python virtualenv or whatever you use for pip packages, then run the pip command shown at Web API Downloads (under "Click here to see the installation/update instructions..."). As always with pip packages, you'll have to decide on a Python version (they support both 2 and 3) and whether to use a virtualenv, the much-disrecommended sudo pip, pip3, etc. I used pip3 in a virtualenv and it worked fine.
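
The PyPI package is called ecmwf-api-client, so the incantation boils down to something like:

pip3 install ecmwf-api-client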

Specify a dataset and parameters

That's great, but how do you know which dataset you want to load?

There doesn't seem to be anything that just lists which datasets have which variables. The only way I found is to go to the Web API page for a particular dataset to see the form where you can request different variables. For instance, I ended up using the "interim-full-daily" database, where you can choose date ranges and lists of parameters. There are more choices in the sidebar: for instance, clicking on "Pressure levels" lets you choose from a list of barometric pressures ranging from 1000 all the way down to 1. No units are specified, but they're millibars, also known as hectopascals (hPa): 1000 is more or less the pressure at ground level, 250 is roughly where the jet stream is, and Los Alamos is roughly at 775 hPa (you can find charts of pressure vs. altitude on the web).

When you go to any of the Web API pages, it will show you a dialog suggesting you read about Data retrieval efficiency, which you should definitely do if you're expecting to request a lot of data, then click on the details for the database you're using to find out how data is grouped in "tape files". For instance, in the ERA-interim database, tapes are grouped by date, so if you're requesting multiple parameters for multiple months, request all the parameters for a given month together, rather than making one request for level 250, another request for level 1000, etc.

Once you've checked the boxes for the data you want, you can fetch the data via the web interface, or click on "View the MARS request" to get parameters you can plug into a Python script.

If you choose the Python script option as I did, you can start with the basic data retrieval example. Use the second example, the one that uses 'format' : "netcdf", which will (eventually) give you a file ending in .nc.
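
For reference, a request along those lines looks something like this -- a sketch, using the dates, levels and variables that show up later in this article; "View the MARS request" gives you the authoritative keywords for whatever boxes you actually checked (131.128, 132.128, 135.128 and 138.128 are the u, v, w and vorticity parameters):

from ecmwfapi import ECMWFDataServer

server = ECMWFDataServer()
server.retrieve({
    "class": "ei",                      # ERA Interim
    "dataset": "interim",
    "date": "2015-04-01/to/2015-04-30",
    "stream": "oper",
    "time": "12:00:00",
    "type": "an",                       # analysis
    "levtype": "pl",                    # pressure levels
    "levelist": "250/775/1000",
    "param": "131.128/132.128/135.128/138.128",  # u, v, w, vorticity
    "grid": "0.75/0.75",
    "format": "netcdf",
    "target": "filename.nc",
})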

Requesting a specific area

You can request only a limited area,

"area": "75/-20/10/60",
but they're not very forthcoming about the syntax, and it's particularly confusing since "75/-20/10/60" supposedly means "Europe". It's hard to figure out how those numbers correspond to Europe: read as latitudes and longitudes, Europe doesn't extend down to 10 degrees, let alone -20. The Post-processing keywords page gives more information: the order is North/West/South/East, which still makes no sense for Europe until you expand the Area examples tab on that page and discover that by "Europe" they mean Europe plus Saudi Arabia and most of North Africa.
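
Once you know the order, though, it's usable. A box around New Mexico, say, would be something like:

"area": "37/-109/31/-103",    # North/West/South/East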

Using the data: What's in it?

Once you have the data file, assuming you requested data in netcdf format, you can parse the .nc file with the netCDF4 Python module -- available as Debian package "python3-netcdf4", or via pip -- to read that file:

import netCDF4

data = netCDF4.Dataset('filename.nc')

But what's in that Dataset? Try running the preceding two lines in the interactive Python shell, then:

>>> for key in data.variables:
...   print(key)
... 
longitude
latitude
level
time
w
vo
u
v

You can find out more about a parameter, like its units, type, and shape (array dimensions). Let's look at "level":

>>> data['level']
<class 'netCDF4._netCDF4.Variable'>
int32 level(level)
    units: millibars
    long_name: pressure_level
unlimited dimensions: 
current shape = (3,)
filling on, default _FillValue of -2147483647 used

>>> data['level'][:]
array([ 250,  775, 1000], dtype=int32)

>>> type(data['level'][:])
<class 'numpy.ndarray'>

Levels has shape (3,): it's a one-dimensional array with three elements: 250, 775 and 1000. Those are the three levels I requested (from the web API, and again in my Python script). The units are millibars.

More complicated variables

How about something more complicated? u and v are the two components of wind speed.

>>> data['u']
<class 'netCDF4._netCDF4.Variable'>
int16 u(time, level, latitude, longitude)
    scale_factor: 0.002161405503194121
    add_offset: 30.095301438361684
    _FillValue: -32767
    missing_value: -32767
    units: m s**-1
    long_name: U component of wind
    standard_name: eastward_wind
unlimited dimensions: time
current shape = (30, 3, 241, 480)
filling on

u (v is the same) has a shape of (30, 3, 241, 480): it's a 4-dimensional array. Why? Looking at the numbers in the shape gives a clue. The second dimension has 3 rows: they correspond to the three levels, because there's a wind speed at every level. The first dimension has 30 rows: it corresponds to the dates I requested (the month of April 2015). I can verify that:
>>> data['time'].shape
(30,)

Sure enough, there are 30 times, so that's what the first dimension of u and v corresponds to. The other dimensions, presumably, are latitude and longitude. Let's check that:

>>> data['longitude'].shape
(480,)
>>> data['latitude'].shape
(241,)

Sure enough! So, although it would be nice if it actually told you which dimension corresponded with which parameter, you can probably figure it out. If you're not sure, print the shapes of all the variables and work out which dimensions correspond to what:

>>> for key in data.variables:
...   print(key, data[key].shape)

Iterating over times

data['time'] has all the times for which you have data (30 data points for my initial test of the days in April 2015). The easiest way to plot anything is to iterate over those values:

    # data is the netCDF4.Dataset opened above
    timeunits = data['time'].units
    cal = data['time'].calendar
    for i, t in enumerate(data['time'][:]):
        thedate = netCDF4.num2date(t, units=timeunits, calendar=cal)

Then you can use thedate like a datetime, calling thedate.strftime or whatever you need.

So that's how to access your data. All that's left is to plot it -- and in this case I had Geert Barentsen's script to start with, so I just modified it a little to work with slightly changed data format, and then added some argument parsing and runtime options.
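
If you just want a quick look before building a whole animation, a single-frame plot is only a few lines. Here's a sketch, assuming the dimension order worked out above and that level index 0 is the 250 hPa level, as it is in my data:

import netCDF4
import numpy as np
import matplotlib.pyplot as plt

data = netCDF4.Dataset('filename.nc')
lons = data['longitude'][:]
lats = data['latitude'][:]

# First time step, first pressure level (250 hPa, the jet stream level).
# netCDF4 applies scale_factor and add_offset for us by default.
u = data['u'][0, 0, :, :]
v = data['v'][0, 0, :, :]
windspeed = np.sqrt(u**2 + v**2)

plt.pcolormesh(lons, lats, windspeed)
plt.colorbar(label='wind speed (m/s)')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.title('Wind speed at 250 hPa')
plt.show()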

Converting to Video

I already wrote about how to take the still images the program produces and turn them into a video: Making Videos (that work in Firefox) from a Series of Images.

However, it turns out ffmpeg can't handle files that are named with timestamps, like jetstream-2017-06-14-250.png. It can only handle one sequential integer. So I thought, what if I removed the dashes from the name, and used names like jetstream-20170614-250.png with %8d? No dice: ffmpeg also has the limitation that the integer can have at most four digits.

So I had to rename my images. A shell command works: I ran this in zsh but I think it should work in bash too.

cd outdir
mkdir moviedir

i=1
for fil in *.png; do
  newname=$(printf "%04d.png" $i)
  ln -s ../$fil moviedir/$newname
  i=$((i+1))
done

ffmpeg -i moviedir/%4d.png -filter:v "setpts=2.5*PTS" -pix_fmt yuv420p jetstream.mp4

The -filter:v "setpts=2.5*PTS" part controls the playback speed: setpts multiplies each frame's presentation timestamp, so 2.5*PTS makes the video play 2.5 times slower than the default.

When I uploaded the video to YouTube, I got a warning, "Your videos will process faster if you encode into a streamable file format." I then spent half a day trying to find a combination of ffmpeg arguments that avoided that warning, and eventually gave up. As far as I can tell, the warning only affects the 20 seconds or so of processing that happens after the 5-10 minutes it takes to upload the video, so I'm not sure it's terribly important.

Results

Here's a video of the jet stream from 2012 to early 2018, and an earlier effort with a much longer 6.0x delay.

And here's the script, updated from the original Barentsen script and with a bunch of command-line options to let you plot different collections of data: jetstream.py on GitHub.


Sat, 13 Apr 2013

Parsing NOAA historical weather data

We've been considering the possibility of moving out of the Bay Area to somewhere less crowded, somewhere in the desert southwest we so love to visit. But that also means moving to somewhere with much harsher weather.

How harsh? It's pretty easy to search for a specific location and get average temperatures. But what if I want to make a table to compare several different locations? I couldn't find any site that made that easy.

No problem, I say. Surely there's a Python library, I say. Well, no, as it turns out. There are Python APIs to get the current weather anywhere; but if you want historical weather data, or weather data averaged over many years, you're out of luck.

NOAA purports to have historical climate data, but the only dataset I found was spotty and hard to use. There's an FTP site containing directories by year; inside are gzipped files with names like 723710-03162-2012.op.gz. The first two numbers are station numbers, and there's a file at the top level called ish-history.txt with a list of the station codes and corresponding numbers. Not obvious!

Once you figure out the station codes, the files themselves are easy to parse, with lines like

STN--- WBAN   YEARMODA    TEMP       DEWP      SLP        STP       VISIB      WDSP     MXSPD   GUST    MAX     MIN   PRCP   SNDP   FRSHTT
724945 23293  20120101    49.5 24    38.8 24  1021.1 24  1019.5 24    9.9 24    1.5 24    4.1  999.9    68.0    37.0   0.00G 999.9  000000

Each line represents one day (20120101 is January 1st, 2012), and the codes are explained in another file called GSOD_DESC.txt. For instance, MAX is the daily high temperature, and SNDP is snow depth.
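
Parsing one of those lines in Python is just a matter of splitting on whitespace. A quick sketch -- the field positions come from the sample line above, and the flag letters and missing-value sentinels like 999.9 are described in GSOD_DESC.txt:

import gzip

def parse_gsod_line(line):
    """Pull a few daily fields out of one GSOD data line."""
    def num(s):
        s = s.rstrip('*ABCDEFGHI')    # strip GSOD flag characters
        return None if s in ('999.9', '9999.9', '99.99') else float(s)
    f = line.split()
    return {
        'date': f[2],                 # YEARMODA, e.g. 20120101
        'high': num(f[17]),           # MAX, degrees F
        'low': num(f[18]),            # MIN, degrees F
        'precip': num(f[19]),         # PRCP, inches
        'snow_depth': num(f[20]),     # SNDP, inches
    }

with gzip.open('723710-03162-2012.op.gz', 'rt') as fp:
    next(fp)                          # skip the header line
    for line in fp:
        print(parse_gsod_line(line))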

[NOAA historical temp program] So all I needed was to decode the station names, download the right files and parse them. That took about a day to write (including a lot of time wasted futzing with mysterious incantations for matplotlib).

Little accessibility refresher: I showed it to Dave -- "Neat, look at this, San Jose is the blue pair, Flagstaff is green and Page is red." His reaction: "This makes no sense. They all look the same to me. I have no idea which is which." Oops -- right. Don't use color as your only visual indicator. I knew that, supposedly! So I added markers in different shapes for each site. (I wish somebody would teach that lesson to Google Maps, which uses color as its only indicator on the traffic layer, so it's useless for red-green colorblind people.)

Back to the data -- it turns out NOAA doesn't actually have that much historical data available for download. If you search on most of these locations, you'll find sites that claim to have historical temperatures dating back 50 years or more, sometimes back to the 1800s. But NOAA typically only has files starting at about 2005 or 2006. I don't know where sites are getting this older data, or how reliable it is.

Still, averages since 2006 are interesting to compare. Here's a run of noaatemps.py KSJC KFLG KSAF KLAM KCEZ KPGA KCNY. It's striking how moderate California weather is compared to any of these inland sites. No surprise there. More surprising was that Los Alamos, despite its high elevation, has more moderate weather than most of the others -- lower highs, higher lows. I was a bit disappointed at how sparse the site list was -- no site in Moab? Really? So I used Canyonlands Field instead.

Anyway, it's fun for a data junkie to play around with, and it prints data on other weather factors, like precipitation and snowpack, although it doesn't plot them yet. The code is on my GitHub scripts page, under Weather.

Anyone found a better source for historical weather information? I'd love to have something that went back far enough to do some climate research, see what sites are getting warmer, colder, or seeing greater or lesser spreads between their extreme temperatures. The NOAA dataset obviously can't do that, so there must be something else that weather researchers use. Data on other countries would be interesting, too. Is there anything that's available to the public?


Fri, 08 Feb 2013

Our massive winter snowstorm

[Hail in California]

I keep seeing references to the massive winter storm that's on the way, but I thought they were talking about New York. And then it started to hail ... and kept it up long enough that we actually got little lentil-sized hailstones piling up in the yard, looking almost like it snowed.

So of course we rushed outside to take pictures. For you folks outside California, hailstorms are something we see maybe every three or four years, and hail that doesn't melt immediately after hitting the ground is quite a bit rarer. So please excuse us our excitement over a little bit of frozen water ...

Here are a few photos: Hail in Burbank.
