M is for Merging ("Dissolving") Several Geographic Shapes (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 29 May 2020

M is for Merging ("Dissolving") Several Geographic Shapes

[San Juan County Council Districts]
San Juan County Council districts
[San Juan County voting precincts]
San Juan County Council voting precincts

For this year's LWV NM Voter Guides at VOTE411.org, I've been doing a lot of GIS fiddling, since the system needs to know the voting districts for each race.

You would think it would be easy to find GIS for voting districts — surely that's public information? — but counties and the state are remarkably resistant to giving out any sort of data (they're happy to give you a PDF or a JPG), so finding the district data takes a lot of searching.

Often, when we finally manage to get GIS info, it isn't for what we want. For instance, for San Juan County, there's a file that claims to be County Commission districts (which would look like the image above left), but the shapes in the file are actually voting precincts (above right). A district is made up of multiple precincts; in San Juan, there are 77 precincts making up five districts.

In a case like that, you need some way of combining several shapes (a bunch of precincts) into one (a district).

GIS "Dissolving"

It turns out that the process of coalescing lots of small shapes into a smaller number of larger shapes is unintuitively called "dissolving".

Discovering that term made the web searches much easier. In particular, I found this StackExchange thread which suggested the excellent Python libraries Shapely, for dealing with shapes, and Fiona, for reading and writing the shapefile format, and that gave me enough to figure out the rest.

Here are the imports I'll use in these examples:

from shapely.geometry import shape, mapping
from shapely.ops import unary_union
import fiona
import itertools
from collections import OrderedDict

Use Fiona to Read the Shapefile

For instance, let's say you're reading an input file (like the San Juan County Commissioner district file I had to process yesterday) where the shapes are all voting precincts. You want to know the county commission districts, and each district may consist of several precincts.

Let's also say each precinct has an attribute called 'Comm_Dist' that give you the county commission district that precinct is in, and that's the quantity you do care about.

The goal is to take that input file of precinct shapes, find all the shapes with the same district number (Comm_Dist) and merge them together into one shape per district, with a new attribute DIST containing the district number. Then repeat for every district.

Just for fun, let's save the original precinct numbers too, in an attribute called 'COMPONENTS'. I hate to throw away information.

First, open the input and output files with Fiona. For the output file, you need to specify the key to use in the merge/dissolve: 'Comm_Dist', in this case.. Fiona's representation of the input file includes a property called meta, which includes important details like the coordinate reference system (CRS), but it's also where you specify the schema for the file, defining which attributes each shape will have: in this case, each shape has DIST and COMPONENTS.

indata = fiona.open("Clerk_Data.shp")

meta = indata.meta
meta['schema']['properties'] = OrderedDict([('DIST', 'str:100'),
                                            ('COMPONENTS', 'str:400')])
output = fiona.open("Commission_Districts", 'w', **meta)

indata is a list of 77 precincts. The name passed for the output is the name of a directory which fiona will create, which will hold the .shp plus all the supporting files a shapefile needs: .cpg, .dbf, .prj and .shx.

Group All Shapes in One District Together

Now you can use itertools.groupby() to cluster shapes together that share attributes, e.g. all shapes where ['Comm_Dist'] is all the same, but you have to sort by attribute first:

sorted_shapes = sorted(indata, key=lambda k: k['properties']['Comm_Dist'])

There are 77 precincts, but if you group them by 'Comm_Dist', you end up with only five groups, the commission districts. Loop over those five:

for key, group in itertools.groupby(sorted_shapes,
                      key=lambda x:x['properties']['Comm_Dist']):

Here's the tricky part. What groupby() gives you (here called group -- you can ignore key -- is an iterable. For instance, there are 17 precincts in Comm_Dist 1, so the first time through this loop, you'll get a group which, if you iterate over it, will give you 17 objects, each of which is a precinct shape from the original shapefile. Eesh! But you can use zip to rearrange that data, making a list of all the properties from all those shapes, and another list of all the geometries:

    properties, geom = zip(*[(feature['properties'],
                              shape(feature['geometry']))
                             for feature in group])

Finally, the Dissolve

Now, for the current group (a district), you have all the precinct geometries held in the list geom. You can pass that to shapely.ops.unary_union to merge all those precinct shapes into a single combined shape. That's the GIS dissolve.

    district_geometry = mapping(unary_union(geom))

Almost done. We just need to make a dictionary of attributes. I want the district number to be stored as attribute DIST: that's easy enough. Remember how I wanted to capture which precincts went into each district? The precinct number is features['OBJECTID'], so I can use that list of all properties in the group to make a sorted, comma-separated list of all the group's precincts:

    precincts = [ prop['OBJECTID'] for prop in properties ]
    precincts.sort()

    newproperties = {
        'DIST':       properties[0][args.inkey],
        'COMPONENTS': ','.join([ str(p) for p in precincts ])
    }

Now newproperties['COMPONENTS'] will be something like "1,2,3,4,5,6,7,8,9,10,11,12,13,14,21,22,26". That's what will go under the key ['properties'] for the new combined shape, which we can now write to the output shapefile:

    output.write({
        'geometry': mapping(unary_union(geom)),
        'properties': newproperties
    })

A program that puts this all together is shapemerge.py, and I run it like this:

shapemerge.py -k Comm_Dist -K DIST Clerk_Data.shp CommDists

Converting to GeoJSON

One last thing. Fiona writes ARCGIS Shapefile format. That's great since that's what VOTE411 uses; but given how much work we're putting into finding and massaging these data files, I want to be able to share them on the LWVNM District Maps page too. And I vastly prefer using GeoJSON for that page -- it plays well with Leaflet, it's a single file instead of five, and it's human readable if things go wrong.

Fiona is supposed to be able to write GeoJSON, but when I tried, it gave errors and wrote files that didn't work right. That's okay: it's easy to translate a shapefile to GeoJSON with GDAL's ogr2ogr command (package gdal-bin in Debian/Ubuntu):

ogr2ogr -f GeoJSON CommDists.json CommDists/

GDAL has Python bindings too, so I'm sure it would be easy to include that in the program, but it's so easy to do from the command line that I haven't bothered.

Downloadable Data

In addition to the VOTE411.org voter guide, I've collected all the districts I have so far on the LWV New Mexico Voting District Maps page, and you can download GeoJSON from there. I hope this will make life a little easier for the next person who needs New Mexico voting data.

Tags: ,
[ 17:43 May 29, 2020    More mapping | permalink to this entry | comments ]