Maps with data

Maps with data#

In the second part of this lecture, we will use Python to produce another type of thematic map. But this instead of representing elevation data, we will visualize Census Bureau data.

We can automatically download Census Bureau data from the 2019 American Community Survey using CenPy.

from cenpy import products
acs = products.ACS(2019)

Search for data#

Inspired by the article “Plumbing Poverty: Mapping Hot Spots of Racial and Geographic Inequality in U.S. Household Water Insecurity” by Shiloh Deitz and Katie Meehan (2019), we will search for data that determines whether household have adequate plumbing facilities.

The Census Bureau defines “complete plumbing facilities” as (1) piped hot and cold water, (2) a flush toilet, and (3) a bathtub or shower, all located within the housing unit and used only by occupants. We can find the variable code in the data tables by searching for the word “plumbing”.

acs.filter_tables('PLUMBING', by='description')

	description	columns
table_name
B25016	TENURE BY PLUMBING FACILITIES BY OCCUPANTS PER...	[B25016_001E, B25016_002E, B25016_003E, B25016...
B25047	PLUMBING FACILITIES FOR ALL HOUSING UNITS	[B25047_001E, B25047_002E, B25047_003E]
B25048	PLUMBING FACILITIES FOR OCCUPIED HOUSING UNITS	[B25048_001E, B25048_002E, B25048_003E]
B25049	TENURE BY PLUMBING FACILITIES	[B25049_001E, B25049_002E, B25049_003E, B25049...
B25050	PLUMBING FACILITIES BY OCCUPANTS PER ROOM BY Y...	[B25050_001E, B25050_002E, B25050_003E, B25050...
B99259	ALLOCATION OF PLUMBING FACILITIES	[B99259_001E, B99259_002E, B99259_003E]

# Print list of tables
acs.filter_variables('B25047')

	label	concept	predicateType	group	predicateOnly	hasGeoCollectionSupport	attributes	required
B25047_003E	Estimate!!Total:!!Lacking complete plumbing fa...	PLUMBING FACILITIES FOR ALL HOUSING UNITS	int	B25047	NaN	NaN	B25047_003EA,B25047_003M,B25047_003MA	NaN
B25047_002E	Estimate!!Total:!!Complete plumbing facilities	PLUMBING FACILITIES FOR ALL HOUSING UNITS	int	B25047	NaN	NaN	B25047_002EA,B25047_002M,B25047_002MA	NaN
B25047_001E	Estimate!!Total:	PLUMBING FACILITIES FOR ALL HOUSING UNITS	int	B25047	NaN	NaN	B25047_001EA,B25047_001M,B25047_001MA	NaN

Download data#

We will focus our analysis on western United States and download data at the county level. Of course, a more local analysis could look at smaller spatial scales such as the tract or block group level.

# Download data
wa_plumbing = products.ACS(2019).from_state('Washington', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
or_plumbing = products.ACS(2019).from_state('Oregon', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ca_plumbing = products.ACS(2019).from_state('California', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nm_plumbing = products.ACS(2019).from_state('New Mexico', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
az_plumbing = products.ACS(2019).from_state('Arizona', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ut_plumbing = products.ACS(2019).from_state('Utah', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
id_plumbing = products.ACS(2019).from_state('Idaho', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
nv_plumbing = products.ACS(2019).from_state('Nevada', 
                                          level='county',
                                          variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 2
# Download data
----> 2 wa_plumbing = products.ACS(2019).from_state('Washington', 
                                         level='county',
                                         variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
or_plumbing = products.ACS(2019).from_state('Oregon', 
                                         level='county',
                                         variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])
ca_plumbing = products.ACS(2019).from_state('California', 
                                         level='county',
                                         variables=['B25047_001E', 'B25047_002E', 'B25047_003E'])

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:767, in ACS.from_state(self, state, variables, level, **kwargs)
def from_state(self, state, variables=None, level="tract", **kwargs):
--> 767     return self._from_name(state, variables, level, "States", **kwargs)

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:723, in ACS._from_name(self, place, variables, level, layername, return_geometry, cache_name, strict_within, return_bounds, geometry_precision)
variables.append("GEO_ID")
caller = super(ACS, self)._from_name
--> 723 geoms, variables, *rest = caller(
   place,
   variables,
   level,
   layername,
   return_geometry=return_geometry,
   cache_name=cache_name,
   strict_within=strict_within,
   return_bounds=return_bounds,
   geometry_precision=geometry_precision,
)
variables["GEOID"] = variables.GEO_ID.str.split("US").apply(lambda x: x[1])
return_table = geoms[["GEOID", "geometry"]].merge(
   variables.drop("GEO_ID", axis=1), how="left", on="GEOID"
)

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/cenpy/products.py:389, in _Product._from_name(self, place, variables, level, layername, strict_within, return_bounds, geometry_precision, cache_name, replace_missing, return_geometry)
geoms, data = self._from_bbox(
   env.to_crs(epsg=4326).total_bounds,
   variables=variables,
   (...)
   replace_missing=replace_missing,
)
if strict_within:
--> 389     geoms = geopandas.sjoin(geoms, env[["geometry"]], how="inner", op="within")
if return_bounds:
   return geoms, data, env

File /opt/miniconda3/envs/book/lib/python3.12/site-packages/geopandas/tools/sjoin.py:110, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, distance, on_attribute, **kwargs)
if kwargs:
   first = next(iter(kwargs.keys()))
--> 110     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
on_attribute = _maybe_make_list(on_attribute)
_basic_checks(left_df, right_df, how, lsuffix, rsuffix, on_attribute=on_attribute),

TypeError: sjoin() got an unexpected keyword argument 'op'

wa_plumbing.head()

plumbing = pd.concat([wa_plumbing, or_plumbing, ca_plumbing, 
                      az_plumbing, ut_plumbing, id_plumbing, 
                      nv_plumbing])

# Compute proportion of household with inadequate plumbing
plumbing['lack_plumbing_percent'] = plumbing['B25047_003E'] / plumbing['B25047_001E']

Simple choropleth map#

GeoPandas provides a high-level interface to the Matplotlib library for making maps. It is as easy as using the plot() method on a GeoDataFrame.

plumbing.plot('lack_plumbing_percent')

Improving the choropleth map#

This is useful for a quick look over the data but we would prefer to customize our plot. For instance, we should add a title, colorbar, and nicer colormap.

fig, ax = plt.subplots(figsize=(8,5))

plt.title('Lack of plumbing in the western United States', fontsize=10)
ax.set_axis_off()
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3%", pad=0.5,alpha=0.5)
plumbing.plot('lack_plumbing_percent', ax=ax, alpha=0.5, cmap='Greens', 
              edgecolor='k', legend=True, cax=cax, linewidth=0.1)
plt.ylabel('Proportion of households', fontsize=10)
plt.show()

Interactive plots#

Alongside static plots, GeoPandas can produce interactive maps based on the Folium library, which is itself based on Leaflet.

m = plumbing.explore(
     column="lack_plumbing_percent", # make choropleth based on column
     scheme="naturalbreaks",         # use mapclassify's natural breaks scheme
     tooltip="lack_plumbing_percent",# show column value in tooltip (on hover)
     popup=True,                     # show all values in popup (on click)
     tiles="CartoDB positron",       # use "CartoDB positron" tiles
     cmap="Greens",                  # use "Greens" matplotlib colormap
     style_kwds=dict(color="black", weight=0.5) # use black outline with weight of 1
    )
m

Conclusions#

Maps made using programming don’t have to be terrible - it is possible to produce visually appealing maps in Python. They may never be quite as good as what professional cartographers can produce in Adobe Illustrator but they have other advantages. Once the code is written, we can quickly make edits or add another dataset. I hope this demo has inspired you to improve the quality of your Python plots!

Content in this lecture was inspired by Adam Symington and this tutorial.