Gridded data analysis#

In this demo, we will be investigating the extent of shrubification in Alaska using the National Land Cover Database (NLCD). “Shrubification” describes the expansion of shrubs across the Arctic. Shrubs are woody plants with diverse growth forms but are often the tallest plants occupying tundra ecosystems and can form dense thickets with closed canopies in suitable habitats.

Studies indicate that warming temperatures, changes in snow cover, and altered disturbance regimes (due to permafrost thaw, tundra fires, anthropogenic activities, or changes in herbivory intensity) are all contributing to a general increase in shrub abundance, which will have wide-ranging consequences for the ecosystems and climate of the region (e.g. soil, permafrost, caribou).

../../_images/shrubs.jpeg

National Land Cover Database#

The National Land Cover Database (NLCD) provides gridded land cover at a 30 m resolution with a 20-class legend (in Alaska) based on a modified Anderson Level II classification system.

The database is designed to provide cyclical updates of United States land cover every 2-3 years, beginning in 2001.

This enables systematic documentation of land cover change for monitoring and trend assessments.

../../_images/nlcd-classes1.jpg

Gridded data#

Gridded (or raster) data represent a matrix of cells (or pixels) organized into rows and columns (or a grid). Grid cells can represent data that changes continuously across a landscape (surface) such as elevation, air temperature, or reflectance data from satellite imaging platforms. Grid cells can also represent discrete data such as vegetation type or land cover.

../../_images/raster-matrix.png
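For example, a raster can be represented in Python as a simple 2-D NumPy array (a toy sketch; the values below are made up):

import numpy as np

# A tiny "raster": a 3 x 3 grid of elevation values (continuous data)
elevation = np.array([[120.5, 121.0, 122.3],
                      [119.8, 120.7, 121.9],
                      [118.9, 120.1, 121.2]])

# The same structure can hold discrete data, e.g. land cover class codes
land_cover = np.array([[42, 42, 52],
                       [42, 52, 52],
                       [11, 11, 52]])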

Libraries#

The main library for accessing and analyzing gridded data in Python is rasterio, which builds on the popular Geographic Raster Abstraction Library (GDAL). It supports read/write access for over 160 gridded data formats (e.g. GeoTIFF, NetCDF4) and includes methods for retrieving dataset information, reprojecting, resampling, format conversion, and mosaicking.

../../_images/gdal.png
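As a taste of those capabilities, a typical reprojection with rasterio looks something like the sketch below (adapted from the standard pattern in rasterio's warp module; the input/output paths are placeholders):

import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

# Reproject a single-band GeoTIFF to WGS 84 (paths are placeholders)
dst_crs = 'EPSG:4326'
with rasterio.open('input.tif') as src_ds:
    transform, width, height = calculate_default_transform(
        src_ds.crs, dst_crs, src_ds.width, src_ds.height, *src_ds.bounds)
    meta = src_ds.meta.copy()
    meta.update({'crs': dst_crs, 'transform': transform,
                 'width': width, 'height': height})
    with rasterio.open('output_wgs84.tif', 'w', **meta) as dst_ds:
        reproject(source=rasterio.band(src_ds, 1),
                  destination=rasterio.band(dst_ds, 1),
                  src_transform=src_ds.transform,
                  src_crs=src_ds.crs,
                  dst_transform=transform,
                  dst_crs=dst_crs,
                  resampling=Resampling.nearest)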

Let’s get started…

# Import libraries
import numpy as np
import rasterio
import pandas as pd
import matplotlib.pyplot as plt

Open dataset#

The data are formatted as a GeoTIFF (.tif) file, one of the many formats rasterio can read. We can open the dataset using rasterio.open(), which accepts a path string and returns a dataset object.

Note

The data for this demo can be accessed here.

src = rasterio.open('data/nlcd_2016_ak.tif')
src
<open DatasetReader name='data/nlcd_2016_ak.tif' mode='r'>
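As an aside, rasterio datasets are often opened with a with statement so the file is closed automatically when we are done; here we keep src open because we will use it throughout the demo:

# Equivalent, with automatic clean-up of the file handle
with rasterio.open('data/nlcd_2016_ak.tif') as ds:
    print(ds.name)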

Dataset attributes#

The dataset object contains a number of attributes that we can inspect as follows. Remember that a raster band is an array of values representing a single variable in 2-D space; all bands of a dataset have the same number of rows and columns.

print(f"Number of bands: {src.count}")
print(f"Width: {src.width}")
print(f"Height: {src.height}")
print(f"Data type: {src.dtypes}")
Number of bands: 1
Width: 3991
Height: 2868
Data type: ('uint8',)
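Many of these attributes (plus the georeferencing information covered below) are also bundled into a single dictionary-like profile attribute, which is handy when creating a new file with the same metadata:

# All of the key metadata in one place
src.profile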

Georeferencing#

Like vector data, pixels in raster data can be mapped to regions on the Earth’s surface. As in GeoPandas, we can display the coordinate reference system of our data, here using the crs attribute.

src.crs
CRS.from_wkt('PROJCS["WGS_1984_Albers",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",50],PARAMETER["longitude_of_center",-154],PARAMETER["standard_parallel_1",55],PARAMETER["standard_parallel_2",65],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')

The output from this attribute can be a little overwhelming, but it contains some useful information. For example, the data are projected using an Albers Equal Area projection and the units are meters.

We can display more specific georeferencing information using the transform attribute. It returns an affine transform that encodes the spatial resolution of the dataset (i.e. the dimensions each pixel represents on the ground) and the coordinates of the top-left corner.

src.transform
Affine(30.000052000454264, 0.0, -351410.251360936,
       0.0, -29.996428936149588, 1939259.112990518)
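This affine transform is the link between array indices (row, column) and map coordinates (x, y). rasterio's xy() and index() methods convert between the two; for example:

# Map coordinates of the center of the top-left pixel...
x, y = src.xy(0, 0)

# ...and back from map coordinates to (row, column) indices
row, col = src.index(x, y)
print(x, y, row, col)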

Alternatively, we can display the bounding box of our dataset using the bounds attribute.

src.bounds
BoundingBox(left=-351410.251360936, bottom=1853229.354801641, right=-231680.043827123, top=1939259.112990518)
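Since the CRS units are meters, the bounding box tells us that the dataset covers roughly 120 km x 86 km on the ground:

# Extent in kilometers (CRS units are meters)
width_km = (src.bounds.right - src.bounds.left) / 1000
height_km = (src.bounds.top - src.bounds.bottom) / 1000
print(f"{width_km:.0f} km x {height_km:.0f} km")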

Reading raster data#

Now that we have some basic information, we can go ahead and read our data into a NumPy N-D array using the read() method. Data from each band can be accessed by an index number.

Note

Note that bands are indexed from 1 due to a GDAL convention.

nlcd_2016 = src.read(1)
nlcd_2016
array([[42, 42, 42, ..., 52, 52, 41],
       [42, 42, 42, ..., 52, 52, 41],
       [42, 42, 42, ..., 41, 41, 52],
       ...,
       [11, 11, 11, ..., 90, 11, 90],
       [11, 11, 11, ..., 90, 90, 90],
       [11, 11, 31, ..., 52, 52, 90]], shape=(2868, 3991), dtype=uint8)
type(nlcd_2016)
numpy.ndarray
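For rasters too large to fit in memory, read() also accepts a window argument so that only a block of the data is loaded; a quick sketch (the offsets and block size here are arbitrary):

from rasterio.windows import Window

# Read a 512 x 512 block starting at column 2000, row 1000
block = src.read(1, window=Window(2000, 1000, 512, 512))
block.shape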

Plot data#

We can have a look at the data using matplotlib. The full NLCD product contains billions of pixels, but our file covers only a small region of Alaska (around 11.4 million pixels), so we can plot the whole array.

fig, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(nlcd_2016, cmap='tab20b')
ax.set_title("Alaska land cover in 2016", fontsize=12)
Text(0.5, 1.0, 'Alaska land cover in 2016')
../../_images/8779a677ddb27926eb9034f850db9ab4d681697889136905b185753f66677669.png

Single land cover type#

It would be useful to identify what some of these colors mean. To do this we can use matrix indexing to find the value of a pixel at a given location (i.e. array[row, column]). In this case we find the land cover at y (or row) = 500 and x (or column) = 3000.

nlcd_2016[500, 3000]
np.uint8(52)

The land cover type is 52, meaning shrub/scrub.

../../_images/nlcd-shrubs.png
fig, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(nlcd_2016, cmap='tab20b')
ax.scatter(3000, 500, s=100, color='k')
ax.set_title("Alaska land cover in 2016", fontsize=12)
Text(0.5, 1.0, 'Alaska land cover in 2016')
../../_images/3087f751684d2a198427024fdd37f71b9a7a15e5e17364b73ae06e7bb0497717.png

Frequency of land cover type#

Clearly shrubs (class 52) make up a large portion of the land cover in this region of Alaska. To quantify how much area they cover, it would be useful to count the occurrence of each land cover class. We can do that using NumPy’s unique() function, making sure we set return_counts to True.

unique, counts = np.unique(nlcd_2016, return_counts=True)
dict(zip(unique, counts))
{np.uint8(11): np.int64(2378247),
 np.uint8(12): np.int64(35),
 np.uint8(22): np.int64(2445),
 np.uint8(23): np.int64(87),
 np.uint8(31): np.int64(59650),
 np.uint8(41): np.int64(59832),
 np.uint8(42): np.int64(1456210),
 np.uint8(43): np.int64(146517),
 np.uint8(51): np.int64(2517179),
 np.uint8(52): np.int64(1936912),
 np.uint8(71): np.int64(3609),
 np.uint8(72): np.int64(1007993),
 np.uint8(90): np.int64(1161732),
 np.uint8(95): np.int64(715740)}
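As an aside, because the classes are small unsigned integers, np.bincount returns the same counts (indexed by class value) and is typically faster:

# Counts indexed by class value; classes not present get a count of zero
class_counts = np.bincount(nlcd_2016.ravel())
class_counts[52]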
# Total number of pixels in the grid
land_pixels = nlcd_2016.size

# Convert to DataFrame
df_2016 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)), 
                       columns=['lc', 'count_2016', 'fraction_2016'])
df_2016
lc count_2016 fraction_2016
0 11 2378247 20.777634
1 12 35 0.000306
2 22 2445 0.021361
3 23 87 0.000760
4 31 59650 0.521134
5 41 59832 0.522724
6 42 1456210 12.722227
7 43 146517 1.280051
8 51 2517179 21.991418
9 52 1936912 16.921896
10 71 3609 0.031530
11 72 1007993 8.806364
12 90 1161732 10.149510
13 95 715740 6.253086

This is very useful. With reference to the NLCD land cover classes, our table shows that the dominant land cover type is dwarf shrub (51) at 22%, although shrub/scrub (52) is not far behind at 17%. We also have some evergreen forest (42) at 13% and woody wetlands (90) at 10%.

../../_images/nlcd-classes1.jpg

Aggregate land cover classes#

Since we are interested in all kinds of shrubs, we will combine the dwarf shrub and shrub/scrub classes. The simplest way of doing this is to re-assign all grid cells classified as shrub/scrub (52) to dwarf shrub (51) using a boolean mask.

# Re-assign land cover class
nlcd_2016[nlcd_2016 == 52] = 51
# Check there are no more grid cells classified as shrub/scrub
unique, counts = np.unique(nlcd_2016, return_counts=True)

# Convert to DataFrame
df_2016 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)), 
                       columns=['lc', 'count_2016', 'fraction_2016'])
df_2016
lc count_2016 fraction_2016
0 11 2378247 20.777634
1 12 35 0.000306
2 22 2445 0.021361
3 23 87 0.000760
4 31 59650 0.521134
5 41 59832 0.522724
6 42 1456210 12.722227
7 43 146517 1.280051
8 51 4454091 38.913313
9 71 3609 0.031530
10 72 1007993 8.806364
11 90 1161732 10.149510
12 95 715740 6.253086
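Note that boolean-mask assignment modifies the array in place. Had we wanted to keep the original array intact, np.where returns a new array instead (a sketch shown for illustration only, since the shrub/scrub cells have already been re-assigned at this point; the variable name is arbitrary):

# Equivalent, but returns a new array rather than modifying nlcd_2016
nlcd_2016_combined = np.where(nlcd_2016 == 52, 51, nlcd_2016)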

Land cover change#

The big question here is: have these shrubs always occupied such a large portion of this region of Alaska? To answer that, we can perform the same analysis on the NLCD dataset from 2001.

# Open
src_2001 = rasterio.open('data/nlcd_2001_ak.tif')

# Read
nlcd_2001 = src_2001.read(1)

# Re-assign land cover class
nlcd_2001[nlcd_2001 == 52] = 51
# Count occurrence in each class
unique, counts = np.unique(nlcd_2001, return_counts=True)

# Convert to DataFrame
df_2001 = pd.DataFrame(list(zip(unique, counts, (counts/land_pixels)*100)),
                       columns=['lc', 'count_2001', 'fraction_2001'])
df_2001
lc count_2001 fraction_2001
0 11 2375777 20.756054
1 12 24 0.000210
2 22 2436 0.021282
3 23 91 0.000795
4 31 58014 0.506841
5 41 57582 0.503067
6 42 1461918 12.772095
7 43 144954 1.266395
8 51 4454788 38.919403
9 72 1009936 8.823339
10 90 1162063 10.152402
11 95 718605 6.278116

Merging the two DataFrames allows us to compute the change between 2001 and 2016.

df = pd.merge(df_2001, df_2016, on=['lc'])
df
lc count_2001 fraction_2001 count_2016 fraction_2016
0 11 2375777 20.756054 2378247 20.777634
1 12 24 0.000210 35 0.000306
2 22 2436 0.021282 2445 0.021361
3 23 91 0.000795 87 0.000760
4 31 58014 0.506841 59650 0.521134
5 41 57582 0.503067 59832 0.522724
6 42 1461918 12.772095 1456210 12.722227
7 43 144954 1.266395 146517 1.280051
8 51 4454788 38.919403 4454091 38.913313
9 72 1009936 8.823339 1007993 8.806364
10 90 1162063 10.152402 1161732 10.149510
11 95 718605 6.278116 715740 6.253086
df['change'] = (df['count_2016'] - df['count_2001'])
df['change_percent'] = (((df['count_2016'] - df['count_2001']) / df['count_2001']) * 100)
df
lc count_2001 fraction_2001 count_2016 fraction_2016 change change_percent
0 11 2375777 20.756054 2378247 20.777634 2470 0.103966
1 12 24 0.000210 35 0.000306 11 45.833333
2 22 2436 0.021282 2445 0.021361 9 0.369458
3 23 91 0.000795 87 0.000760 -4 -4.395604
4 31 58014 0.506841 59650 0.521134 1636 2.820009
5 41 57582 0.503067 59832 0.522724 2250 3.907471
6 42 1461918 12.772095 1456210 12.722227 -5708 -0.390446
7 43 144954 1.266395 146517 1.280051 1563 1.078273
8 51 4454788 38.919403 4454091 38.913313 -697 -0.015646
9 72 1009936 8.823339 1007993 8.806364 -1943 -0.192388
10 90 1162063 10.152402 1161732 10.149510 -331 -0.028484
11 95 718605 6.278116 715740 6.253086 -2865 -0.398689

Shrubification?#

Oh dear - the extent of shrubs did not increase between 2001 and 2016. One takeaway could be that shrubification is not happening in this region of the Arctic. But the percentage change in shrubs was tiny (-0.02%), well below the uncertainty we would expect in a land cover product like this.

Other, perhaps more significant, changes between 2001 and 2016 include a 3.9% increase in deciduous forest (41), a 0.4% decrease in evergreen forest (42), and a 0.4% decrease in emergent herbaceous wetlands (95).

Direction of land cover change#

It would be interesting to know how grid cells changed between 2001 and 2016. We can find out by indexing the 2016 land cover data with a boolean mask derived from the 2001 data. In the code below, we select the 2016 values of all grid cells that were classified as evergreen forest (class 42) in 2001, then count their 2016 classes using the unique() function.

mask = nlcd_2016[nlcd_2001 == 42]
unique, counts = np.unique(mask, return_counts=True)
forestchange_df = pd.DataFrame(list(zip(unique, counts, (counts/mask.shape[0])*100)),
                 columns=['lc', 'count', 'fraction'])
forestchange_df
lc count fraction
0 11 308 0.021068
1 42 1455641 99.570633
2 51 4273 0.292287
3 71 1672 0.114370
4 95 24 0.001642
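The same masking logic extends to all classes at once: pandas can build a complete class-to-class transition matrix from the flattened arrays (a sketch; rows are 2001 classes, columns are 2016 classes):

# Full 2001 -> 2016 transition matrix (pixel counts)
transitions = pd.crosstab(nlcd_2001.ravel(), nlcd_2016.ravel(),
                          rownames=['2001'], colnames=['2016'])
transitions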

It looks like most of the evergreen forest (99.6%) did not change between 2001 and 2016. Where it did change, it most often became shrub (51). Let’s finish this section by plotting a map showing where evergreen forest was replaced by shrub.

forest2shrub = (nlcd_2001 == 42) & (nlcd_2016 == 51)
fig, ax = plt.subplots(figsize=(8,8))
im = ax.imshow(forest2shrub.astype(int), cmap='Blues')
ax.set_title("Forest to shrubs in Alaska between 2001 and 2016", fontsize=12)
Text(0.5, 1.0, 'Forest to shrubs in Alaska between 2001 and 2016')
../../_images/2b48d70c367756a3f42cccb727b2ec41e8c70ad64b3c6ad0bdcba50ac7f9e2a5.png

At this scale, it’s difficult to see what’s going on. Let’s zoom in a little…

fig, ax = plt.subplots(figsize=(8,8))
im2 = ax.imshow(forest2shrub.astype(int)[1000:1750,2750:3750], cmap='Blues')
ax.set_title("Forest to shrubs in Alaska between 2001 and 2016", fontsize=12)
Text(0.5, 1.0, 'Forest to shrubs in Alaska between 2001 and 2016')
../../_images/70179bdc85f394f9d73eba4458a8be1e358e3472559c6e2bbdd2a29a836630a0.png

Neighborhood operations#

So far, we have focused on local operations, treating individual grid cells as independent observations that are not influenced by each other. In reality, the land cover of a given grid cell is likely to be more similar to that of adjacent grid cells than to that of distant ones.

Tobler’s First Law of Geography is a fundamental principle of spatial analysis that states:

Everything is related to everything else, but near things are more related than distant things

We can exploit this concept using neighborhood operations. For example, let’s say our land cover dataset contained missing values or outliers. We could identify (and replace) these values by comparing them against the values of surrounding grid cells. A median filter is a common approach for filtering outliers.

../../_images/median-filter.png

The scipy.ndimage package provides many functions for applying these neighborhood operations. We define the window size (also termed a kernel) using the size parameter, and the function automatically slides (or convolves) this window across the entire grid.

../../_images/window.png

Note

We chose a large window size to demonstrate the median_filter output: size=15 corresponds to 15 x 15 pixels, or a 450 x 450 m window.

from scipy.ndimage import median_filter

# Apply 15x15 median filter
nlcd_2001_median = median_filter(nlcd_2001, size=15)

# Plot
fig, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(nlcd_2001_median, cmap='tab20b')
../../_images/6d059de2a538e636b3d2505712fe296086a930fb3f2725a372734828814d335b.png

The median filter reduces the differences between a grid cell and its surrounding grid cells. These types of filters are therefore called low-pass filters.

Note

We could use any window size or shape. Likewise, there is almost no limit to the different operations we could apply to grid cell values in our window.

A classic 3×3 Gaussian kernel (the binomial approximation) provides a weighted mean of neighboring values and can be defined as:

\[\begin{split} G = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \end{split}\]

In contrast, a high-pass filter emphasizes differences between a grid cell and its neighbors. The classic one is a Laplacian filter:

\[\begin{split} H = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix} \end{split}\]

NumPy and SciPy make applying such kernels straightforward. As a sketch, scipy.ndimage.convolve slides a kernel over an array directly (here over synthetic random data, since our categorical land cover classes would not be meaningful):
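from scipy.ndimage import convolve

# Laplacian (high-pass) kernel
laplacian = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=float)

# Apply to synthetic continuous data
demo = np.random.rand(50, 50)
edges = convolve(demo, laplacian)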

Since our land cover classes are categorical rather than continuous, it doesn’t make sense to apply a Gaussian or Laplacian filter to our data. But for continuous data, we could also implement the Gaussian smoothing ourselves using generic_filter, which applies a custom function to each window:

from scipy.ndimage import generic_filter

# Define a 3x3 Gaussian-style kernel
gaussian_kernel = np.array([
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1]
], dtype=float)

# Normalize so the weights sum to 1
gaussian_kernel = gaussian_kernel / gaussian_kernel.sum()

# Define function for generic_filter (windows arrive flattened)
def gaussian_filter_func(window):
    window = window.reshape((3, 3))
    return np.sum(window * gaussian_kernel)

# Apply the filter to synthetic continuous data
data = np.random.rand(50, 50)
smoothed = generic_filter(data, function=gaussian_filter_func, size=3)

Let’s demonstrate one more time, this time using a custom diversity index that simply counts the number of unique land cover classes in each 15 x 15 window.

from scipy.ndimage import generic_filter

def diversity_metric(window):
    values = window.astype(int)
    unique = np.unique(values[values > 0])
    return len(unique)

div_map = generic_filter(nlcd_2001, diversity_metric, size=15)
# Plot
fig, ax = plt.subplots(figsize=(12,6))
im = ax.imshow(div_map)
../../_images/aaa58bc0df71095034c025a759cb569891ff01739e0a4e2258b10fc66e6b0e0a.png