Gridded data analysis#
In this demo, we will be investigating the extent of shrubification in Alaska using the National Land Cover Database (NLCD). “Shrubification” describes the expansion of shrubs across the Arctic. Shrubs are woody plants with diverse growth forms but are often the tallest plants occupying tundra ecosystems and can form dense thickets with closed canopies in suitable habitats.
Studies indicate that warming temperatures, changes in snow cover, altered disturbance regimes (due to permafrost thaw, tundra fires, anthropogenic activities or changes in herbivory intensity) are all contributing to general increases in shrub abundance which will have wide-ranging consequences for the ecosystems and climate of the region.

National Land Cover Database#
The National Land Cover Database (NLCD) provides gridded land cover at a 30 m resolution with a 20-class legend (in Alaska) based on a modified Anderson Level II classification system.
The database is designed to provide cyclical updates of United States land cover every 2-3 years since 2001.
Enables systematic documententation of land cover change for monitoring and trend assessments.

Gridded data#
Gridded (or raster) data represent a matrix of cells (or pixels) organized into rows and columns (or a grid). Grid cells can represent data that changes continuously across a landscape (surface) such as elevation, air temperature, or reflectance data from satellite imaging platforms. Grid cells can also represent discrete data such as vegetation type or land cover.

Libraries#
The main library for accessing and analyzing gridded data in Python is called rasterio
which builds on the popular Geographic Raster Abstraction Library or GDAL. It supports read/write access for over 160 gridded data formats (e.g. GeoTIFF, NetCDF4) and includes methods for finding dataset information, reprojecting, resampling, format conversion, and mosaicking.

Let’s get started…
# Import libraries
import numpy as np
import rasterio
import pandas as pd
import matplotlib.pyplot as plt
Open dataset#
The data is formatted as a GeoTIFF (.tif
) file which is one of the many formats that can be read by rasterio
. We can open the dataset using the function open()
which accepts a path string and returns a dataset object.
Note
The data for this demo can be accessed here.
src = rasterio.open('data/nlcd_2016_ak.tif')
src
<open DatasetReader name='data/nlcd_2016_ak.tif' mode='r'>
Dataset attributes#
The dataset object contains a number of attributes which can be explored using the following methods. Remember that a raster band is an array of values representing a single variable in 2D space. All bands of a dataset have the same number of rows and columns.
print(f"Number of bands: {src.count}")
print(f"Width: {src.width}")
print(f"Height: {src.height}")
print(f"Data type: {src.dtypes}")
Number of bands: 1
Width: 3991
Height: 2868
Data type: ('uint8',)
Georeferencing#
Like vector data, pixels in raster data can be mapped to regions on the Earth’s surface. Like GeoPandas
, we can display the coordinate reference system of our data using the crs
method.
src.crs
CRS.from_wkt('PROJCS["WGS_1984_Albers",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",50],PARAMETER["longitude_of_center",-154],PARAMETER["standard_parallel_1",55],PARAMETER["standard_parallel_2",65],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')
The output from this function can be a little overwelming but there is some useful information contained here. For example it looks like the data are projected using an Albers projection system and the units are in meters.
We can display some more specific information about a dataset’s projection using the transform
method. This function returns the spatial resolution of the dataset (i.e. the dimensions that each pixel of our dataset represents on the ground) and the coordinates of the top left corner.
src.transform
Affine(30.000052000454264, 0.0, -351410.251360936,
0.0, -29.996428936149588, 1939259.112990518)
Alternatively, we can display the bounding box of our dataset using the bounds()
method.
src.bounds
BoundingBox(left=-351410.251360936, bottom=1853229.354801641, right=-231680.043827123, top=1939259.112990518)
Reading raster data#
Now that we have some basic information, we can go ahead and import our data as a NumPy N-D array using the read()
function. Data from each band can be accessed by an index number.
Note
Note that bands are indexed from 1 due to a GDAL convention.
nlcd_2016 = src.read(1)
nlcd_2016
array([[42, 42, 42, ..., 52, 52, 41],
[42, 42, 42, ..., 52, 52, 41],
[42, 42, 42, ..., 41, 41, 52],
...,
[11, 11, 11, ..., 90, 11, 90],
[11, 11, 11, ..., 90, 90, 90],
[11, 11, 31, ..., 52, 52, 90]], shape=(2868, 3991), dtype=uint8)
type(nlcd_2016)
numpy.ndarray
Plot data#
We can have a look at the data using matplotlib
. Since the dataset is so big (eight billion pixels!), we will just plot a subset of the data.
fig, ax = plt.subplots(figsize=(16,8))
im = ax.imshow(nlcd_2016, cmap='tab20b')
ax.set_title("Alaska land cover", fontsize=14)
Text(0.5, 1.0, 'Alaska land cover')

Single land cover type#
It would be useful to identify what some of these colors mean. To do this we can use matrix indexing to find the value of a pixel at a given location (i.e. array[row, column]
). In this case we find the land cover at y (or row) = 500 and x (or column) = 3000.
nlcd_2016[500, 3000]
np.uint8(52)
The land cover type is 52 meaning shrub/scrub.

fig, ax = plt.subplots(figsize=(16,8))
im = ax.imshow(nlcd_2016, cmap='tab20b')
ax.scatter(3000, 500, s=100, color='k')
ax.set_title("Alaska land cover", fontsize=14)
Text(0.5, 1.0, 'Alaska land cover')

Frequency of land cover type#
Clearly shrubs (class 52) cover a large portion the land cover in this region of Alaska. To quantify how much area they cover, it would be useful to count the occurrence of each land cover class. We can do that using NumPy’s unique()
function, making sure we set return_counts
to True
.
unique, counts = np.unique(nlcd_2016, return_counts=True)
dict(zip(unique, counts))
{np.uint8(11): np.int64(2378247),
np.uint8(12): np.int64(35),
np.uint8(22): np.int64(2445),
np.uint8(23): np.int64(87),
np.uint8(31): np.int64(59650),
np.uint8(41): np.int64(59832),
np.uint8(42): np.int64(1456210),
np.uint8(43): np.int64(146517),
np.uint8(51): np.int64(2517179),
np.uint8(52): np.int64(1936912),
np.uint8(71): np.int64(3609),
np.uint8(72): np.int64(1007993),
np.uint8(90): np.int64(1161732),
np.uint8(95): np.int64(715740)}
# Count number of land pixels
land_pixels = nlcd_2016.size
# Convert to DataFrame
df_2016 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)),
columns=['lc', 'count_2016', 'fraction_2016'])
df_2016
lc | count_2016 | fraction_2016 | |
---|---|---|---|
0 | 11 | 2378247 | 20.777634 |
1 | 12 | 35 | 0.000306 |
2 | 22 | 2445 | 0.021361 |
3 | 23 | 87 | 0.000760 |
4 | 31 | 59650 | 0.521134 |
5 | 41 | 59832 | 0.522724 |
6 | 42 | 1456210 | 12.722227 |
7 | 43 | 146517 | 1.280051 |
8 | 51 | 2517179 | 21.991418 |
9 | 52 | 1936912 | 16.921896 |
10 | 71 | 3609 | 0.031530 |
11 | 72 | 1007993 | 8.806364 |
12 | 90 | 1161732 | 10.149510 |
13 | 95 | 715740 | 6.253086 |
This is very useful. With reference to the NLCD land cover classes, our table shows that the most dominant type of land cover is dwarf shrub (51) at 22% althoug normal shrubs (52) are not far behind with 17%. We also have some evergreen forest (42) at 13% and woody wetlands (90) at 10%.

Aggregate land cover classes#
Since we are interested in all kinds of shrubs, we will combine the the dwarf shrubs and shrub/scrub classes. The most simple way of doing this is to re-assign all grid cells classified as shrub/scrub (52) to dwarf shrubs (51) using a mask.
# Re-assign land cover class
nlcd_2016[nlcd_2016 == 52] = 51
# Check there are no more grid cells classified as normal shrubs
unique, counts = np.unique(nlcd_2016, return_counts=True)
# Convert to DataFrame
df_2016 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)),
columns=['lc', 'count_2016', 'fraction_2016'])
df_2016
lc | count_2016 | fraction_2016 | |
---|---|---|---|
0 | 11 | 2378247 | 20.777634 |
1 | 12 | 35 | 0.000306 |
2 | 22 | 2445 | 0.021361 |
3 | 23 | 87 | 0.000760 |
4 | 31 | 59650 | 0.521134 |
5 | 41 | 59832 | 0.522724 |
6 | 42 | 1456210 | 12.722227 |
7 | 43 | 146517 | 1.280051 |
8 | 51 | 4454091 | 38.913313 |
9 | 71 | 3609 | 0.031530 |
10 | 72 | 1007993 | 8.806364 |
11 | 90 | 1161732 | 10.149510 |
12 | 95 | 715740 | 6.253086 |
Land cover change#
The big question here is have these shrubs always occupied a large portion of this region of Alaska? To answer that we could perform the same analysis on a previous NLCD dataset from 2001.
# Open
src_2001 = rasterio.open('data/nlcd_2001_ak.tif')
# Read
nlcd_2001 = src_2001.read(1)
# Re-assign land cover class
nlcd_2001[nlcd_2001 == 52] = 51
# Count occurrence in each class
unique, counts = np.unique(nlcd_2001, return_counts=True)
# Convert to DataFrame
df_2001 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)),
columns=['lc', 'count_2001', 'fraction_2001'])
df_2001
lc | count_2001 | fraction_2001 | |
---|---|---|---|
0 | 11 | 2375777 | 20.756054 |
1 | 12 | 24 | 0.000210 |
2 | 22 | 2436 | 0.021282 |
3 | 23 | 91 | 0.000795 |
4 | 31 | 58014 | 0.506841 |
5 | 41 | 57582 | 0.503067 |
6 | 42 | 1461918 | 12.772095 |
7 | 43 | 144954 | 1.266395 |
8 | 51 | 4454788 | 38.919403 |
9 | 72 | 1009936 | 8.823339 |
10 | 90 | 1162063 | 10.152402 |
11 | 95 | 718605 | 6.278116 |
Combining the DataFrames should allow us to compute the difference between 2001 and 2016
df = pd.merge(df_2001, df_2016, on=['lc'])
df
lc | count_2001 | fraction_2001 | count_2016 | fraction_2016 | |
---|---|---|---|---|---|
0 | 11 | 2375777 | 20.756054 | 2378247 | 20.777634 |
1 | 12 | 24 | 0.000210 | 35 | 0.000306 |
2 | 22 | 2436 | 0.021282 | 2445 | 0.021361 |
3 | 23 | 91 | 0.000795 | 87 | 0.000760 |
4 | 31 | 58014 | 0.506841 | 59650 | 0.521134 |
5 | 41 | 57582 | 0.503067 | 59832 | 0.522724 |
6 | 42 | 1461918 | 12.772095 | 1456210 | 12.722227 |
7 | 43 | 144954 | 1.266395 | 146517 | 1.280051 |
8 | 51 | 4454788 | 38.919403 | 4454091 | 38.913313 |
9 | 72 | 1009936 | 8.823339 | 1007993 | 8.806364 |
10 | 90 | 1162063 | 10.152402 | 1161732 | 10.149510 |
11 | 95 | 718605 | 6.278116 | 715740 | 6.253086 |
df['change'] = (df['count_2016'] - df['count_2001'])
df['change_percent'] = (((df['count_2016'] - df['count_2001']) / df['count_2001']) * 100)
df
lc | count_2001 | fraction_2001 | count_2016 | fraction_2016 | change | change_percent | |
---|---|---|---|---|---|---|---|
0 | 11 | 2375777 | 20.756054 | 2378247 | 20.777634 | 2470 | 0.103966 |
1 | 12 | 24 | 0.000210 | 35 | 0.000306 | 11 | 45.833333 |
2 | 22 | 2436 | 0.021282 | 2445 | 0.021361 | 9 | 0.369458 |
3 | 23 | 91 | 0.000795 | 87 | 0.000760 | -4 | -4.395604 |
4 | 31 | 58014 | 0.506841 | 59650 | 0.521134 | 1636 | 2.820009 |
5 | 41 | 57582 | 0.503067 | 59832 | 0.522724 | 2250 | 3.907471 |
6 | 42 | 1461918 | 12.772095 | 1456210 | 12.722227 | -5708 | -0.390446 |
7 | 43 | 144954 | 1.266395 | 146517 | 1.280051 | 1563 | 1.078273 |
8 | 51 | 4454788 | 38.919403 | 4454091 | 38.913313 | -697 | -0.015646 |
9 | 72 | 1009936 | 8.823339 | 1007993 | 8.806364 | -1943 | -0.192388 |
10 | 90 | 1162063 | 10.152402 | 1161732 | 10.149510 | -331 | -0.028484 |
11 | 95 | 718605 | 6.278116 | 715740 | 6.253086 | -2865 | -0.398689 |
Conclusion#
Oh dear - the extent of shrubs did not increase between 2001 and 2016. One takeaway from this could be that shrubification is not happening in this region of the Arctic. But the percentage change in shrubs was very small (-0.02%) and could well be smaller than the uncertainty in our land cover product. For example, if just 1% of pixels were incorrectly classified as shrubs in 2001 our result would be insignificant.
Other, perhaps more significant changes, that occurred between 2001 and 2016 include an increase in deciduous forest (41) by 3.9%, a decrease in evergreen forest (42) by 0.4%, and a decrease in emergent herbaceous wetlands (95) by 0.4%.
Direction of land cover change#
It would be interesting to know how grid cells changed between 2001 and 2016. We can do this by masking the 2016 land cover data with specific grid cells from the 2001 land cover data. For example, in the code below we identify grid cells in 2016 that were classified as evergreen forest (class 42) in 2001. We then compute what land cover they are by masking the 2016 land cover data and using the unique()
function.
mask = nlcd_2016[nlcd_2001 == 42]
unique, counts = np.unique(mask, return_counts=True)
forestchange_df = pd.DataFrame(list(zip(unique, counts, (counts/mask.shape[0])*100)),
columns=['lc', 'count', 'fraction'])
forestchange_df
lc | count | fraction | |
---|---|---|---|
0 | 11 | 308 | 0.021068 |
1 | 42 | 1455641 | 99.570633 |
2 | 51 | 4273 | 0.292287 |
3 | 71 | 1672 | 0.114370 |
4 | 95 | 24 | 0.001642 |
It looks like most of the evergreen forest (99.6%) did not change between 2001 and 2016. When it did, it was most likely to change to shrub (51) which may indicate some “shrubification” in Alaska. Let’s finish this section by plotting a map showing where evergreen forest was replaced by shrub.
forest2shrub = (nlcd_2001 == 42) & ((nlcd_2016 == 51))
fig, ax = plt.subplots(figsize=(16,8))
im = ax.imshow(forest2shrub.astype(int), cmap='Blues')
ax.set_title("Fores to shrubs in Alaska between 2001 and 2016", fontsize=14)
Text(0.5, 1.0, 'Fores to shrubs in Alaska between 2001 and 2016')

At this scale, it’s difficult to see what’s going on. Let’s zoom in a little…
fig, ax = plt.subplots(figsize=(16,8))
im2 = ax.imshow(forest2shrub.astype(int)[1000:1750,2750:3750], cmap='Blues')
ax.set_title("Forest to shrubs in Alaska between 2001 and 2016", fontsize=14)
Text(0.5, 1.0, 'Forest to shrubs in Alaska between 2001 and 2016')
