Plotting Shapes with Python Basemap wwithout Shapefiles
In my article on Plotting election (and other county-level) data with Python Basemap, I used ESRI shapefiles for both states and counties.
But one of the election data files I found, OpenDataSoft's USA 2016 Presidential Election by county had embedded county shapes, available either as CSV or as GeoJSON. (I used the CSV version, but inside the CSV the geo data are encoded as JSON so you'll need JSON decoding either way. But that's no problem.)
Just about all the documentation I found on coloring shapes in Basemap assumed that the shapes were defined as ESRI shapefiles. How do you draw shapes if you have latitude/longitude data in a more open format?
As it turns out, it's quite easy, but it took a fair amount of poking around inside Basemap to figure out how it worked.
In the loop over counties in the US in the previous article,
the end goal was to create a matplotlib Polygon
and use that to add a Basemap patch
.
But matplotlib's Polygon wants map coordinates, not latitude/longitude.
If m is your basemap (i.e. you created the map with
m = Basemap( ... )
, you can translate coordinates like this:
(mapx, mapy) = m(longitude, latitude)
So once you have a region as a list of (longitude, latitude) coordinate pairs, you can create a colored, shaped patch like this:
for coord_pair in region: coord_pair[0], coord_pair[1] = m(coord_pair[0], coord_pair[1]) poly = Polygon(region, facecolor=color, edgecolor=color) ax.add_patch(poly)
Working with the OpenDataSoft data file was actually a little harder than
that, because the list of coordinates was JSON-encoded inside the CSV file,
so I had to decode it with json.loads(county["Geo Shape"])
.
Once decoded, it had some counties as a Polygon
list of
lists (allowing for discontiguous outlines), and others as
a MultiPolygon
list of list of lists (I'm not sure why,
since the Polygon format already allows for discontiguous boundaries)
And a few counties were missing, so there were blanks on the map, which show up as white patches in this screenshot. The counties missing data either have inconsistent formatting in their coordinate lists, or they have only one coordinate pair, and they include Washington, Virginia; Roane, Tennessee; Schley, Georgia; Terrell, Georgia; Marshall, Alabama; Williamsburg, Virginia; and Pike Georgia; plus Oglala Lakota (which is clearly meant to be Oglala, South Dakota), and all of Alaska.
One thing about crunching data files from the internet is that there are always a few special cases you have to code around. And I could have gotten those coordinates from the census shapefiles; but as long as I needed the census shapefile anyway, why use the CSV shapes at all? In this particular case, it makes more sense to use the shapefiles from the Census.
Still, I'm glad to have learned how to use arbitrary coordinates as shapes, freeing me from the proprietary and annoying ESRI shapefile format.
The code: Blue-red map using CSV with embedded county shapes
[ 09:36 Jan 19, 2017 More programming | permalink to this entry | ]