Using Census Population Data (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Mon, 26 May 2025

Using Census Population Data

[Census interactive map showing population in blocks] A couple of us in the local League of Women Voters chapter have been talking about how our county's school board is elected. There are five school districts, to go with the five elementary schools ... but when it comes time to vote for the school board, the voting districts aren't the same as the school districts. For example, a parent whose kid goes to Barranca might be voting for the school board rep from the Aspen district. This confuses pretty much everybody.

Apparently the reason it's set up this way is that the voting districts need to have roughly equal population, which the actual school districts don't. That made us curious about how the populations of the actual school districts compared. But it turns out if you ask that question, no one has those numbers, or at least, we couldn't find anyone who would release them.

"No problem!" I chirped. "I can get population data from the Census website, and combine that with the GIS for the school districts!"

Little did I know, when I promised that, what a soul-sucking pit of despair the Census website is, and how difficult it is to get data out of it.

Finding Decennial Census Population Data

What I needed initially was data on the total population in each census block (a block is the smallest geographic unit the census deals with) as measured in the last decennial (once-a-decade) census, 2020.

I spent several hours fruitlessly clicking around census.gov and data.census.gov and web searching for terms like us census decadal population data block. I also tried the University of New Mexico's geographic data repository, UNM RGIS: sometimes that's a good source of data that's hard to get from the Feds, but not this time. Under Census and Demography there was a file Population by Tracts — but only for 2015. They didn't have anything new enough to reflect the 2020 census. Darn.

Eventually I found a video on the census website that helped: How to find a Geography Using Maps on data.census.gov. Here's what worked:

In Firefox you'll probably have to disable the popup blocker for this site, though I don't understand why, since the only popup involved is the normal download file chooser. In Firefox's Preferences, go to Privacy & Security, scroll down to Permissions and add https://data.census.gov to the Exceptions list.

Interactive Block Map

Before I talk about the downloaded data file, there's one other useful feature on this page: the Maps tab (also available by clicking Map instead of table on the P1 | RACE line).

If you're lucky, this will show you an interactive map where you can mouse around and see the census blocks (they highlight as you mouse over them): see the image at the top of this article. Click on a census block to see the population in that block. It's super useful, but most of the time when I clicked on Map, I got a map that only showed state borders, or county borders, or the JavaScript hung and I couldn't zoom in or move the map, and eventually Firefox popped up the "This page is slowing down Firefox. To speed up your browser, Stop this page. [Stop]" warning. But if you keep trying, you might eventually get a useful interactive map.

Back to the Data

[A screenshot of gnumeric showing the top left of the USGS population table] The download is a zip file that expands to two CSV files and a text file. The actual data is in the file named DECENNIALPL2020.P1-Data.csv. You don't strictly need the other two files.

The first row in the CSV is column headers, like "GEO_ID" and "P1_001N". The second row is long descriptions for each column, like " !!Total:" for total population (I don't know what the exclamation points or the extra space are for). Each row in the CSV (beyond the first two) corresponds to some geographic area.

The first column is a long code for a geographic area, like "1000000US350280005006014". 35 is the state code for New Mexico, and the last four digits are the block number. There's other information encoded there as well: compare it with the second column, which is a long descriptive string like "Block 6014, Block Group 6, Census Tract 5, Los Alamos County, New Mexico".
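Here's a minimal sketch of pulling that code apart in Python. It assumes the usual layout for block-level Census GEOIDs: after the "US", two digits of state, three of county, six of tract and four of block, with the block group being the first digit of the block number. The function name parse_block_geoid is just something I made up for illustration.

```python
def parse_block_geoid(geo_id):
    """Split a block-level GEO_ID such as '1000000US350280005006014'
    into its component codes.

    Assumes the 15 digits after 'US' are laid out as
    state (2) + county (3) + tract (6) + block (4).
    """
    # Everything after "US" should be the 15-digit block GEOID
    _prefix, sep, geoid = geo_id.partition("US")
    if not sep or len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"Not a block-level GEO_ID: {geo_id!r}")

    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        # The block group is the first digit of the block number
        "block_group": geoid[11],
        "block": geoid[11:15],
    }


parts = parse_block_geoid("1000000US350280005006014")
print(parts)
```

For the example above, that gives state 35, county 028, tract 000500, block group 6 and block 6014, which lines up with the descriptive string in the second column.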

The rest of the columns are data. Column 3, "P1_001N", is what I was after: the total population for that area (the column that row 2 describes as " !!Total:"). The other columns break down the block's population by various combinations of race.
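As a rough sketch, reading that file with Python's csv module might look something like this. It looks up the "GEO_ID" and "P1_001N" columns by name and skips the second row of long descriptions. The function name read_block_populations is my own invention, and opening the file as utf-8-sig is just a precaution in case the download starts with a byte-order mark; I haven't confirmed that it does.

```python
import csv
import os


def read_block_populations(fp):
    """Read an open P1 data CSV and return {GEO_ID: total population}."""
    reader = csv.reader(fp)

    header = next(reader)       # row 1: short column names
    next(reader)                # row 2: long descriptions, not needed here

    # Find the columns by name rather than assuming their positions
    id_col = header.index("GEO_ID")
    pop_col = header.index("P1_001N")

    populations = {}
    for row in reader:
        if not row:             # ignore any blank lines
            continue
        populations[row[id_col]] = int(row[pop_col])
    return populations


if __name__ == "__main__":
    filename = "DECENNIALPL2020.P1-Data.csv"
    if os.path.exists(filename):
        # utf-8-sig strips a byte-order mark if there is one (an assumption)
        with open(filename, newline="", encoding="utf-8-sig") as fp:
            pops = read_block_populations(fp)
        print(len(pops), "areas, total population", sum(pops.values()))
```

Keying the result on GEO_ID keeps the full geographic code around for matching against other data later.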

Shapefile for Census Blocks

Great! Now I just needed an up-to-date shapefile for census blocks. Fortunately, that was easier to find.

Now you have a whole bunch of shapefiles with an unknown naming convention. They have names that are all over the map; they all start with tl_2020_35 (the decennial year and state code), but after that are all sorts of names like 001_cousub10, 009_tabblock20, 043_linearwater, 045_bg10 and so on. (Each of which has seven different files, because of the way the ESRI shapefile format is designed.) Which one should you use?

I haven't found anything explaining the naming convention, but I knew I wanted all the blocks for Los Alamos County, so I tried

grep 'Los Alamos' *.xml
There were quite a few hits, but since I was looking for block shapes, I decided to limit myself to those with block in the name:
grep 'Los Alamos' *.xml | grep block
which narrowed it down to two shapefiles: tl_2020_35028_tabblock10 and tl_2020_35028_tabblock20. But which one was the right one?
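If you'd rather do that search from Python than with grep, here's a small sketch of the same idea. The function name find_metadata_files is hypothetical, and it isn't an exact equivalent of the grep pipeline: it only checks for "block" in the file name, where the second grep would also match "block" in the text of a matching line.

```python
from pathlib import Path


def find_metadata_files(directory, text, name_contains=""):
    """Return names of .xml files in directory whose contents include text
    and whose file names include name_contains."""
    matches = []
    for xml_path in sorted(Path(directory).glob("*.xml")):
        if name_contains not in xml_path.name:
            continue
        # errors="replace" avoids crashing on unexpected encodings
        if text in xml_path.read_text(errors="replace"):
            matches.append(xml_path.name)
    return matches


# Look in the current directory for block-related metadata
# files that mention Los Alamos
for name in find_metadata_files(".", "Los Alamos", "block"):
    print(name)
```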

[A QGIS screenshot showing two overlapping similar, but not identical, sets of block boundaries] I tried loading both files into QGIS, and found that they covered the same area and were very similar, but not identical. A few boundaries -- often in unpopulated areas, way out in National Forest land -- had shifted slightly.

My best guess is that tabblock10 is for the 2010 census, and that tabblock20 has the 2020 census blocks. I'm using tabblock20 for my calculations, though I don't think it will make much difference either way. If anyone knows better, please enlighten me!

Anyway, now I had a shapefile for blocks and a CSV that gave population per block number. Plus, of course, my school district file. The next step was to correlate them all. But this article has gotten long, so I'll write that up separately.

[ 10:39 May 26, 2025    More programming | permalink to this entry | ]
