Maps with data#
In the second part of this lecture, we will use Python to produce another type of thematic map. But this instead of representing elevation data, we will visualize Census Bureau data.
We can automatically download Census Bureau data from the 2019 American Community Survey using CenPy
.
from cenpy import products
acs = products.ACS(2019)
Search for data#
Inspired by the article “Plumbing Poverty: Mapping Hot Spots of Racial and Geographic Inequality in U.S. Household Water Insecurity” by Shiloh Deitz and Katie Meehan (2019), we will search for data that determines whether household have adequate plumbing facilities.
The Census Bureau defines “complete plumbing facilities” as (1) piped hot and cold water, (2) a flush toilet, and (3) a bathtub or shower, all located within the housing unit and used only by occupants. We can find the variable code in the data tables by searching for the word “plumbing”.
acs.filter_tables('PLUMBING', by='description')
description | columns | |
---|---|---|
table_name | ||
B25016 | TENURE BY PLUMBING FACILITIES BY OCCUPANTS PER... | [B25016_001E, B25016_002E, B25016_003E, B25016... |
B25047 | PLUMBING FACILITIES FOR ALL HOUSING UNITS | [B25047_001E, B25047_002E, B25047_003E] |
B25048 | PLUMBING FACILITIES FOR OCCUPIED HOUSING UNITS | [B25048_001E, B25048_002E, B25048_003E] |
B25049 | TENURE BY PLUMBING FACILITIES | [B25049_001E, B25049_002E, B25049_003E, B25049... |
B25050 | PLUMBING FACILITIES BY OCCUPANTS PER ROOM BY Y... | [B25050_001E, B25050_002E, B25050_003E, B25050... |
B99259 | ALLOCATION OF PLUMBING FACILITIES | [B99259_001E, B99259_002E, B99259_003E] |
# Print list of tables
acs.filter_variables('B25047')
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|
B25047_003E | Estimate!!Total:!!Lacking complete plumbing fa... | PLUMBING FACILITIES FOR ALL HOUSING UNITS | int | B25047 | 0 | NaN | NaN | B25047_003EA,B25047_003M,B25047_003MA | NaN |
B25047_002E | Estimate!!Total:!!Complete plumbing facilities | PLUMBING FACILITIES FOR ALL HOUSING UNITS | int | B25047 | 0 | NaN | NaN | B25047_002EA,B25047_002M,B25047_002MA | NaN |
B25047_001E | Estimate!!Total: | PLUMBING FACILITIES FOR ALL HOUSING UNITS | int | B25047 | 0 | NaN | NaN | B25047_001EA,B25047_001M,B25047_001MA | NaN |
Download data#
We will focus our analysis on western United States and download data at the county level. Of course, a more local analysis could look at smaller spatial scales such as the tract or block group level.
# Download data
wa_plumbing = products.ACS(2019).from_state('Washington',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
or_plumbing = products.ACS(2019).from_state('Oregon',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ca_plumbing = products.ACS(2019).from_state('California',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nm_plumbing = products.ACS(2019).from_state('New Mexico',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ut_plumbing = products.ACS(2019).from_state('Utah',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nv_plumbing = products.ACS(2019).from_state('Nevada',
level='county',
variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[4], line 2
1 # Download data
----> 2 wa_plumbing = products.ACS(2019).from_state('Washington',
3 level='county',
4 variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
5 or_plumbing = products.ACS(2019).from_state('Oregon',
6 level='county',
7 variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
8 ca_plumbing = products.ACS(2019).from_state('California',
9 level='county',
10 variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:767, in ACS.from_state(self, state, variables, level, **kwargs)
766 def from_state(self, state, variables=None, level="tract", **kwargs):
--> 767 return self._from_name(state, variables, level, "States", **kwargs)
File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:723, in ACS._from_name(self, place, variables, level, layername, return_geometry, cache_name, strict_within, return_bounds, geometry_precision)
720 variables.append("GEO_ID")
722 caller = super(ACS, self)._from_name
--> 723 geoms, variables, *rest = caller(
724 place,
725 variables,
726 level,
727 layername,
728 return_geometry=return_geometry,
729 cache_name=cache_name,
730 strict_within=strict_within,
731 return_bounds=return_bounds,
732 geometry_precision=geometry_precision,
733 )
734 variables["GEOID"] = variables.GEO_ID.str.split("US").apply(lambda x: x[1])
735 return_table = geoms[["GEOID", "geometry"]].merge(
736 variables.drop("GEO_ID", axis=1), how="left", on="GEOID"
737 )
File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:389, in _Product._from_name(self, place, variables, level, layername, strict_within, return_bounds, geometry_precision, cache_name, replace_missing, return_geometry)
380 geoms, data = self._from_bbox(
381 env.to_crs(epsg=4326).total_bounds,
382 variables=variables,
(...)
386 replace_missing=replace_missing,
387 )
388 if strict_within:
--> 389 geoms = geopandas.sjoin(geoms, env[["geometry"]], how="inner", op="within")
390 if return_bounds:
391 return geoms, data, env
File /opt/miniconda3/envs/book/lib/python3.12/site-packages/geopandas/tools/sjoin.py:110, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, distance, on_attribute, **kwargs)
108 if kwargs:
109 first = next(iter(kwargs.keys()))
--> 110 raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
112 on_attribute = _maybe_make_list(on_attribute)
114 _basic_checks(left_df, right_df, how, lsuffix, rsuffix, on_attribute=on_attribute),
TypeError: sjoin() got an unexpected keyword argument 'op'
wa_plumbing.head()
plumbing = pd.concat([wa_plumbing, or_plumbing, ca_plumbing,
az_plumbing, ut_plumbing, id_plumbing,
nv_plumbing])
# Compute proportion of household with inadequate plumbing
plumbing['lack_plumbing_percent'] = plumbing['B25047_003E'] / plumbing['B25047_001E']
Simple choropleth map#
GeoPandas
provides a high-level interface to the Matplotlib
library for making maps. It is as easy as using the plot()
method on a GeoDataFrame
.
plumbing.plot('lack_plumbing_percent')
Improving the choropleth map#
This is useful for a quick look over the data but we would prefer to customize our plot. For instance, we should add a title, colorbar, and nicer colormap.
fig, ax = plt.subplots(figsize=(8,5))
plt.title('Lack of plumbing in the western United States', fontsize=10)
ax.set_axis_off()
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3%", pad=0.5,alpha=0.5)
plumbing.plot('lack_plumbing_percent', ax=ax, alpha=0.5, cmap='Greens',
edgecolor='k', legend=True, cax=cax, linewidth=0.1)
plt.ylabel('Proportion of households', fontsize=10)
plt.show()
Interactive plots#
Alongside static plots, GeoPandas
can produce interactive maps based on the Folium
library, which is itself based on Leaflet
.
m = plumbing.explore(
column="lack_plumbing_percent", # make choropleth based on column
scheme="naturalbreaks", # use mapclassify's natural breaks scheme
tooltip="lack_plumbing_percent",# show column value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="CartoDB positron", # use "CartoDB positron" tiles
cmap="Greens", # use "Greens" matplotlib colormap
style_kwds=dict(color="black", weight=0.5) # use black outline with weight of 1
)
m
Conclusions#
Maps made using programming don’t have to be terrible - it is possible to produce visually appealing maps in Python. They may never be quite as good as what professional cartographers can produce in Adobe Illustrator but they have other advantages. Once the code is written, we can quickly make edits or add another dataset. I hope this demo has inspired you to improve the quality of your Python plots!
Content in this lecture was inspired by Adam Symington and this tutorial.