Maps with data#

In the second part of this lecture, we will use Python to produce another type of thematic map. But this instead of representing elevation data, we will visualize Census Bureau data.

We can automatically download Census Bureau data from the 2019 American Community Survey using CenPy.

from cenpy import products
acs = products.ACS(2019)

Search for data#

Inspired by the article “Plumbing Poverty: Mapping Hot Spots of Racial and Geographic Inequality in U.S. Household Water Insecurity” by Shiloh Deitz and Katie Meehan (2019), we will search for data that determines whether household have adequate plumbing facilities.

The Census Bureau defines “complete plumbing facilities” as (1) piped hot and cold water, (2) a flush toilet, and (3) a bathtub or shower, all located within the housing unit and used only by occupants. We can find the variable code in the data tables by searching for the word “plumbing”.

acs.filter_tables('PLUMBING', by='description')
description columns
table_name
B25016 TENURE BY PLUMBING FACILITIES BY OCCUPANTS PER... [B25016_001E, B25016_002E, B25016_003E, B25016...
B25047 PLUMBING FACILITIES FOR ALL HOUSING UNITS [B25047_001E, B25047_002E, B25047_003E]
B25048 PLUMBING FACILITIES FOR OCCUPIED HOUSING UNITS [B25048_001E, B25048_002E, B25048_003E]
B25049 TENURE BY PLUMBING FACILITIES [B25049_001E, B25049_002E, B25049_003E, B25049...
B25050 PLUMBING FACILITIES BY OCCUPANTS PER ROOM BY Y... [B25050_001E, B25050_002E, B25050_003E, B25050...
B99259 ALLOCATION OF PLUMBING FACILITIES [B99259_001E, B99259_002E, B99259_003E]
# Print list of tables
acs.filter_variables('B25047')
label concept predicateType group limit predicateOnly hasGeoCollectionSupport attributes required
B25047_003E Estimate!!Total:!!Lacking complete plumbing fa... PLUMBING FACILITIES FOR ALL HOUSING UNITS int B25047 0 NaN NaN B25047_003EA,B25047_003M,B25047_003MA NaN
B25047_002E Estimate!!Total:!!Complete plumbing facilities PLUMBING FACILITIES FOR ALL HOUSING UNITS int B25047 0 NaN NaN B25047_002EA,B25047_002M,B25047_002MA NaN
B25047_001E Estimate!!Total: PLUMBING FACILITIES FOR ALL HOUSING UNITS int B25047 0 NaN NaN B25047_001EA,B25047_001M,B25047_001MA NaN

Download data#

We will focus our analysis on western United States and download data at the county level. Of course, a more local analysis could look at smaller spatial scales such as the tract or block group level.

# Download data
wa_plumbing = products.ACS(2019).from_state('Washington', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
or_plumbing = products.ACS(2019).from_state('Oregon', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ca_plumbing = products.ACS(2019).from_state('California', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nm_plumbing = products.ACS(2019).from_state('New Mexico', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ut_plumbing = products.ACS(2019).from_state('Utah', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nv_plumbing = products.ACS(2019).from_state('Nevada', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 2
      1 # Download data
----> 2 wa_plumbing = products.ACS(2019).from_state('Washington', 
      3                                           level='county',
      4                                           variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
      5 or_plumbing = products.ACS(2019).from_state('Oregon', 
      6                                           level='county',
      7                                           variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
      8 ca_plumbing = products.ACS(2019).from_state('California', 
      9                                           level='county',
     10                                           variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:767, in ACS.from_state(self, state, variables, level, **kwargs)
    766 def from_state(self, state, variables=None, level="tract", **kwargs):
--> 767     return self._from_name(state, variables, level, "States", **kwargs)

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:723, in ACS._from_name(self, place, variables, level, layername, return_geometry, cache_name, strict_within, return_bounds, geometry_precision)
    720 variables.append("GEO_ID")
    722 caller = super(ACS, self)._from_name
--> 723 geoms, variables, *rest = caller(
    724     place,
    725     variables,
    726     level,
    727     layername,
    728     return_geometry=return_geometry,
    729     cache_name=cache_name,
    730     strict_within=strict_within,
    731     return_bounds=return_bounds,
    732     geometry_precision=geometry_precision,
    733 )
    734 variables["GEOID"] = variables.GEO_ID.str.split("US").apply(lambda x: x[1])
    735 return_table = geoms[["GEOID", "geometry"]].merge(
    736     variables.drop("GEO_ID", axis=1), how="left", on="GEOID"
    737 )

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:389, in _Product._from_name(self, place, variables, level, layername, strict_within, return_bounds, geometry_precision, cache_name, replace_missing, return_geometry)
    380 geoms, data = self._from_bbox(
    381     env.to_crs(epsg=4326).total_bounds,
    382     variables=variables,
   (...)
    386     replace_missing=replace_missing,
    387 )
    388 if strict_within:
--> 389     geoms = geopandas.sjoin(geoms, env[["geometry"]], how="inner", op="within")
    390 if return_bounds:
    391     return geoms, data, env

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/geopandas/tools/sjoin.py:110, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, distance, on_attribute, **kwargs)
    108 if kwargs:
    109     first = next(iter(kwargs.keys()))
--> 110     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
    112 on_attribute = _maybe_make_list(on_attribute)
    114 _basic_checks(left_df, right_df, how, lsuffix, rsuffix, on_attribute=on_attribute),

TypeError: sjoin() got an unexpected keyword argument 'op'
wa_plumbing.head()
plumbing = pd.concat([wa_plumbing, or_plumbing, ca_plumbing, 
                      az_plumbing, ut_plumbing, id_plumbing, 
                      nv_plumbing])

# Compute proportion of household with inadequate plumbing
plumbing['lack_plumbing_percent'] = plumbing['B25047_003E'] / plumbing['B25047_001E']

Simple choropleth map#

GeoPandas provides a high-level interface to the Matplotlib library for making maps. It is as easy as using the plot() method on a GeoDataFrame.

plumbing.plot('lack_plumbing_percent')

Improving the choropleth map#

This is useful for a quick look over the data but we would prefer to customize our plot. For instance, we should add a title, colorbar, and nicer colormap.

fig, ax = plt.subplots(figsize=(8,5))

plt.title('Lack of plumbing in the western United States', fontsize=10)
ax.set_axis_off()
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3%", pad=0.5,alpha=0.5)
plumbing.plot('lack_plumbing_percent', ax=ax, alpha=0.5, cmap='Greens', 
              edgecolor='k', legend=True, cax=cax, linewidth=0.1)
plt.ylabel('Proportion of households', fontsize=10)
plt.show()

Interactive plots#

Alongside static plots, GeoPandas can produce interactive maps based on the Folium library, which is itself based on Leaflet.

m = plumbing.explore(
     column="lack_plumbing_percent", # make choropleth based on column
     scheme="naturalbreaks",         # use mapclassify's natural breaks scheme
     tooltip="lack_plumbing_percent",# show column value in tooltip (on hover)
     popup=True,                     # show all values in popup (on click)
     tiles="CartoDB positron",       # use "CartoDB positron" tiles
     cmap="Greens",                  # use "Greens" matplotlib colormap
     style_kwds=dict(color="black", weight=0.5) # use black outline with weight of 1
    )
m

Conclusions#

Maps made using programming don’t have to be terrible - it is possible to produce visually appealing maps in Python. They may never be quite as good as what professional cartographers can produce in Adobe Illustrator but they have other advantages. Once the code is written, we can quickly make edits or add another dataset. I hope this demo has inspired you to improve the quality of your Python plots!

Content in this lecture was inspired by Adam Symington and this tutorial.