Finding the Bounding Box of a Dataset
Yesterday I wrote about PyTopo's difficulty in displaying a large all-US dam dataset. I've finally fixed the bug ... though it's still very slow to display the whole dataset: you have to wait a minute or more to redrew if it's zoomed all the way out to show the whole world.
Wait, the whole world? I thought this dataset was for all dams in the US?
Yes, in theory. In practice, not so much.
(I'm going to cheat a little bit and call this 30 Day Map Challenge Day 3: Polygons, because a bounding box is a polygon and I've spent so much time on this that I definitely don't have time to do any other 30 day projects today.)
It turns out the dataset has errors. In addition to dams in Alaska and
Hawaii (expected), it turns out the dataset includes coordinates
in South America,
on Guam, in Antarctica, and
in the ocean off the coast of West Africa.
I realized that when I started adding debug prints to PyTopo and
discovered the bounding box for the dataset went from latitude -72.8 to
71.3, and from longitude -176.6 to 144.7.
Most of these dams have names indicating they're clearly meant to be in US states. They're simply wrong coordinates (the Guam one might be real), but they make the dataset a bit more challenging to deal with.
First, how do you find out the bounding box? Sure, I could print what PyTopo was seeing, but given that I was in the middle of fixing a PyTopo bug, it would be nice to have a double-check. I loaded the dataset into QGIS (which is how I found out about the dams in Antarctica and off the coast of Africa), but I couldn't find a way to get it to just give me the numbers for the bounding box.
GDAL to the rescue.
ogrinfo -jsongave me a bunch of info about the dataset, and in particular if I looked in layers → dams → geometryFields → extent I could see the bounding box (though only by context could I tell that the order was min longitude, min latitude, max longitude, max latitude).
I wondered if there was a way to get it without reading JSON.
ogrinfo -summary all-us-dams.gpkg just said
INFO: Open of `all-us-dams.gpkg'
      using driver `GPKG' successful.
1: dams (Point)
but adding the layer name,
ogrinfo -summary all-us-dams.gpkg dams | grep Extentgave me
Extent: (-176.687300, -72.809420) - (144.706100, 71.266667)
That helped a little with my debugging. PyTopo still doesn't deal with the dataset very well, though I've fixed some bugs in its zooming code so at least now it can zoom out to the proper levels. Meanwhile, the WMS map tile servers I'm using (from Thunderforest and the USGS) mostly refuse requests at zoom levels like 2 and 3, which foiled my attempts to get a screenshot for this article. I managed to get a screenshot from QGIS, which then crashed before I could get much information on some of the outlying points. That makes me feel a little better about PyTopo's bugs. Obviously I need to make a smaller world-spanning dataset with only a small number of points if I want to get PyTopo working more solidly at low zoom levels.
[ 16:21 Nov 03, 2025 More mapping | permalink to this entry | ]