Using Census Population Data
A couple of us in the local League of Women Voters chapter have been
talking about how our county's school board is elected.
There are five school districts, to go with the five elementary
schools ... but when it comes time to vote for the school board,
the voting districts aren't the same as the school districts.
For example, a parent whose kid goes to Barranca might be voting for
the school board rep from the Aspen district. This confuses pretty
much everybody.
Apparently the reason it's set up this way is that the voting districts need to have roughly equal population, which the actual school districts don't. That made us curious about how the populations of the actual school districts compared. But it turns out if you ask that question, no one has those numbers, or at least, we couldn't find anyone who would release them.
"No problem!" I chirped. "I can get population data from the Census website, and combine that with the GIS for the school districts!"
Little did I know, when I promised that, what a soul-sucking pit of despair the Census website is, and how difficult it is to get data out of it.
Finding Decadal Census Population Data
What I needed initially was the total population in each census block (a block is the smallest geographic unit the census deals with) as measured in the last decadal (once-a-decade) census, 2020. The Census Bureau's own term for it is the decennial census.
I spent several hours fruitlessly clicking around census.gov
and data.census.gov and web searching for terms like
"us census decadal population data block".
I also tried the University of New Mexico's geographic data repository,
UNM RGIS:
sometimes that's a good source of data that's hard to get from the Feds,
but not this time.
Under Census and Demography there was a file Population
by Tracts — but only for 2015. They didn't have
anything new enough to reflect the 2020 census. Darn.
Eventually I found a video on the census website that helped: How to find a Geography Using Maps on data.census.gov. Here's what worked:
- Start at data.census.gov
- Search for population.
- Click on the Filters button at the upper left if the filter sidebar isn't already visible.
- In the sidebar, under Geographies, click on Block. Choose your state and click on "All blocks", or narrow it down by county or census tract.
- Click on the X to close the geography window.
- On the far right edge of the P1 | RACE line, click on Table. You may get a notice that "Table is too large to display", but don't worry about that.
- Look for the link at the top right of the center pane: "Download Table Data". That adds a new line that says "0 selected".
- Click the checkbox to the left of "Decennial Census: P1 | RACE", then click Download.
In Firefox you'll probably have to disable popup blockers,
though I don't understand why they need to be unblocked
because the only popup involved is the normal download file chooser.
In Firefox's Preferences, go to Privacy & Security,
scroll down to Permissions
and add https://data.census.gov
in the Exceptions list.
Interactive Block Map
Before I talk about the downloaded data file, there's one other useful feature on this page: the Maps tab (also available by clicking Map instead of Table on the P1 | RACE line).
If you're lucky, this will show you an interactive map where you can mouse around and see the census blocks (they highlight as you mouse over them): see the image at the top of this article. Click on a census block to see the population in that block. It's super useful, but most of the time when I clicked on Map, I got a map that only showed state borders or county borders, or the JavaScript hung so that I couldn't zoom in or move the map, and eventually Firefox popped up its "This page is slowing down Firefox. To speed up your browser, stop this page." warning with a Stop button. But if you keep trying, you might eventually get a useful interactive map.
Back to the Data
The download is a zip file that expands to two CSV files and a text file.
The actual data is in the file named
DECENNIALPL2020.P1-Data.csv.
You don't strictly need the other two files.
The first row in the CSV is column headers, like "GEO_ID" and "P1_001N". The second row is long descriptions for each column, like " !!Total:" for total population (I don't know what the exclamation points or the extra space are for). Each row in the CSV beyond the first two corresponds to some geographic area.
The first column is a long code for a geographic area, like "1000000US350280005006014". 35 is the state code for New Mexico, and the last four digits are the block number. There's other information encoded there as well: compare it with the second column, which is a long descriptive string like "Block 6014, Block Group 6, Census Tract 5, Los Alamos County, New Mexico".
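To make the "other information encoded there" concrete, here is a minimal Python sketch that splits a block-level GEO_ID into its parts. It assumes the usual Census layout for the 15 digits after "US": 2 digits of state, 3 of county, 6 of tract, and 4 of block, with the block group being the first digit of the block number. The post doesn't spell out those widths, so treat them as an assumption; the names BlockId and parse_geo_id are made up for illustration.

```python
from typing import NamedTuple


class BlockId(NamedTuple):
    state: str   # 2-digit state code, e.g. "35" for New Mexico
    county: str  # 3-digit county code
    tract: str   # 6-digit tract code (assumed width)
    block: str   # 4-digit block number; its first digit is the block group


def parse_geo_id(geo_id: str) -> BlockId:
    """Split a block-level GEO_ID such as '1000000US350280005006014'."""
    # Everything after the "US" marker should be the 15-digit block code.
    _prefix, sep, code = geo_id.partition("US")
    if not sep or len(code) != 15 or not code.isdigit():
        raise ValueError(f"not a block-level GEO_ID: {geo_id!r}")
    return BlockId(code[:2], code[2:5], code[5:11], code[11:])


block = parse_geo_id("1000000US350280005006014")
print(block)
```

For the example ID this gives state 35, county 028, tract 000500 and block 6014, which lines up with "Block 6014, Block Group 6, Census Tract 5" in the descriptive column.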
The rest of the columns are data.
Column 3, "P1_001N", is what I was after: the total population for
that area (row 2 column 3 identifies that column as " !!Total:").
The other columns break down the block's population by various
combinations of race.
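Here's a rough Python sketch of pulling the population column out of that CSV, using only the standard csv module. It relies on the layout described above (header row, then a description row, then one row per area) and on the column names "GEO_ID" and "P1_001N". The function name read_block_populations is hypothetical, and I'm assuming every population value is a plain integer.

```python
import csv


def read_block_populations(lines):
    """Map GEO_ID -> total population from the lines of the P1 data CSV.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.reader(lines)
    header = next(reader)  # row 1: column names like GEO_ID, P1_001N
    next(reader)           # row 2: long descriptions, not data
    geo_col = header.index("GEO_ID")
    pop_col = header.index("P1_001N")
    populations = {}
    for row in reader:
        if not row:        # tolerate blank lines at the end of the file
            continue
        populations[row[geo_col]] = int(row[pop_col])
    return populations


# Usage with the downloaded file would look something like:
#   with open("DECENNIALPL2020.P1-Data.csv", newline="") as f:
#       populations = read_block_populations(f)
#   print(sum(populations.values()))
```

Keying the dictionary on the full GEO_ID rather than the 4-digit block number avoids collisions, since block numbers repeat from one tract to the next.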
Shapefile for Census Blocks
Great! Now I just needed an up-to-date shapefile for census blocks. Fortunately, that was easier to find.
- Go to TIGER/Line Shapefiles.
- Click on the 2020 tab if it isn't already selected.
- Scroll down to Web Interface, which takes you to another page where you can set Year to 2020, layer type to blocks, and click Submit.
- That takes you to another page where you can choose your state and click Download. (On the original TIGER/Line page there was also an option to download by state, but I found that the state menu didn't work: I'd choose New Mexico and as soon as I released the mouse button it would jump to a random different state.)
Now you have a whole bunch of shapefiles with no obvious naming convention. They all start with tl_2020_35 (the decennial year and state code), but after that come all sorts of names like 001_cousub10, 009_tabblock20, 043_linearwater, 045_bg10 and so on. (Each one consists of seven different files, because of the way the ESRI shapefile format is designed.) Which one should you use?
I haven't found anything explaining the naming convention, but I knew I wanted all the blocks for Los Alamos County, so I tried
grep 'Los Alamos' *.xml
There were quite a few hits, but since I was looking for block shapes, I decided to limit myself to those with block in the name:
grep 'Los Alamos' *.xml | grep block
which narrowed it down to two shapefiles: tl_2020_35028_tabblock10 and tl_2020_35028_tabblock20. But which one was the right one?
I tried loading both files into QGIS, and found that they covered the
same area and were very similar, but not identical. A few boundaries
-- often in unpopulated areas, way out in National Forest land
-- had shifted slightly.
My best guess is that tabblock10 is for the 2010 census, and that
tabblock20 has the 2020 census blocks.
I'm using tabblock20 for my calculations, though I don't think
it will make much difference either way.
If anyone knows better, please enlighten me!
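For what it's worth, the file names seen above look like they follow a pattern of tl_, year, state-plus-county code, and layer name. Here's a small Python sketch that splits names on that assumed pattern and picks out the block layers for one county. The pattern is only inferred from the examples in this post, the 35028 code for Los Alamos County comes from the GEO_ID shown earlier, and the helper names split_tiger_name and block_layers are invented for this example.

```python
def split_tiger_name(name):
    """Split e.g. 'tl_2020_35028_tabblock20.shp' into (year, fips, layer).

    Assumes the pattern tl_<year>_<fips>_<layer>, inferred from the
    file names in this post rather than from any official description.
    """
    stem = name.split(".", 1)[0]   # drop .shp, .dbf, .shp.xml, ...
    parts = stem.split("_", 3)
    if len(parts) != 4 or parts[0] != "tl":
        raise ValueError(f"unexpected file name: {name!r}")
    _, year, fips, layer = parts
    return year, fips, layer


def block_layers(names, county_fips):
    """Return the distinct block shapefile stems for one county code."""
    found = set()
    for name in names:
        try:
            year, fips, layer = split_tiger_name(name)
        except ValueError:
            continue               # skip files that don't fit the pattern
        if fips == county_fips and layer.startswith("tabblock"):
            found.add(f"tl_{year}_{fips}_{layer}")
    return sorted(found)


names = [
    "tl_2020_35028_tabblock10.shp",
    "tl_2020_35028_tabblock20.shp",
    "tl_2020_35028_tabblock20.dbf",
    "tl_2020_35001_cousub10.shp",
    "tl_2020_35043_linearwater.shp",
]
print(block_layers(names, "35028"))
```

This only narrows things down to the same two candidates the grep search found; it doesn't settle which one to use.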
Anyway, now I had a shapefile for blocks and a CSV that gave population per block number. Plus, of course, my school district file. The next step was to correlate them all. But this article has gotten long, so I'll write that up separately.
[ 10:39 May 26, 2025 | More programming ]