Climate data analysis#

In this demo, we will be investigating the world’s climate. The data contains gridded monthly air temperature and precipitation for the 1959-2021 time period from the Copernicus Climate Change Service.

../../_images/climate.jpeg

xarray#

  • The best Python package for this task is xarray which introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, a bit like Pandas.

  • xarray can be used read, write, and analyze any scientific datasets stored in .nc or .hdf format.

  • Almost all climate data (and some remote sensing data) are stored in these formats.

xarray
# Import package
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

# Read data
xds = xr.open_dataset('data/world_climate.nc')

Note

The data for this demo can be accessed here.

Basic information about Dataset#

type(xds)
xarray.core.dataset.Dataset
xds.dims
FrozenMappingWarningOnValuesAccess({'longitude': 1440, 'latitude': 721, 'time': 756})
xds.coords
Coordinates:
  * longitude  (longitude) float32 6kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 6kB 1959-01-01 1959-02-01 ... 2021-12-01

All information can be printed using the keys() function.

xds.keys
<bound method Mapping.keys of <xarray.Dataset> Size: 13GB
Dimensions:    (longitude: 1440, latitude: 721, time: 756)
Coordinates:
  * longitude  (longitude) float32 6kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 6kB 1959-01-01 1959-02-01 ... 2021-12-01
Data variables:
    t2m        (time, latitude, longitude) float64 6GB ...
    mtpr       (time, latitude, longitude) float64 6GB ...
Attributes:
    Conventions:  CF-1.6
    history:      2023-01-28 21:51:01 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...>

Note

t2m stands for 2 m above the surface, the standard height used by temperature sensors. It is a common metric used in climatology but note that it is different from the surface temperature which would be the temperature of the ground surface.

Since we are using an interactive notebook, a more convenient way of finding information about our dataset is to just execute the variable name.

xds
<xarray.Dataset> Size: 13GB
Dimensions:    (longitude: 1440, latitude: 721, time: 756)
Coordinates:
  * longitude  (longitude) float32 6kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 6kB 1959-01-01 1959-02-01 ... 2021-12-01
Data variables:
    t2m        (time, latitude, longitude) float64 6GB ...
    mtpr       (time, latitude, longitude) float64 6GB ...
Attributes:
    Conventions:  CF-1.6
    history:      2023-01-28 21:51:01 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...

Basic information about DataArray#

type(xds['latitude'])
xarray.core.dataarray.DataArray
xds['latitude']
<xarray.DataArray 'latitude' (latitude: 721)> Size: 3kB
array([ 90.  ,  89.75,  89.5 , ..., -89.5 , -89.75, -90.  ],
      shape=(721,), dtype=float32)
Coordinates:
  * latitude  (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
Attributes:
    units:      degrees_north
    long_name:  latitude

Print as a NumPy array

type(xds['latitude'].values)
numpy.ndarray

First item in array

xds['latitude'].values[0]
np.float32(90.0)

Last item in array

xds['latitude'].values[-1]
np.float32(-90.0)

Shape of DataArray

xds['latitude'].values.shape
(721,)

Time#

type(xds['time'].values[0])
numpy.datetime64
period = xds['time'].values[-1] - xds['time'].values[0]
period
np.timedelta64(1985472000000000000,'ns')
period.astype('timedelta64[Y]')
np.timedelta64(62,'Y')

Convert to integer

period.astype('timedelta64[Y]').astype(int)
np.int64(62)

Plot#

fig, ax = plt.subplots(figsize=(10,6))

im1 = ax.imshow(xds['t2m'][0,:,:], cmap='RdYlBu_r')

ax.set_title("Air temperature", fontsize=14)
divider = make_axes_locatable(ax)
cax = divider.append_axes('right', size='5%', pad=0.05)
fig.colorbar(im1, cax=cax, orientation='vertical')
<matplotlib.colorbar.Colorbar at 0x12f454290>
../../_images/1d33a8f47d08354ea8e70381004339ce4bb2ba785f9acc7a2280dc42b10d85a7.png

Stats#

Compute mean air temperature for entire period

temp = xds['t2m']
mean_temp = temp.mean(['time'])
mean_temp
<xarray.DataArray 't2m' (latitude: 721, longitude: 1440)> Size: 8MB
array([[258.49005406, 258.49005406, 258.49005406, ..., 258.49005406,
        258.49005406, 258.49005406],
       [258.49639085, 258.49655877, 258.49666606, ..., 258.49590574,
        258.49609931, 258.49620893],
       [258.54803436, 258.54841452, 258.54879001, ..., 258.54686356,
        258.54714576, 258.54754691],
       ...,
       [228.08867964, 228.0895729 , 228.09131278, ..., 228.08614213,
        228.08758348, 228.08856303],
       [227.98272217, 227.98401892, 227.98524336, ..., 227.98061613,
        227.98129249, 227.98197351],
       [227.51711896, 227.51711896, 227.51711896, ..., 227.51711896,
        227.51711896, 227.51711896]], shape=(721, 1440))
Coordinates:
  * longitude  (longitude) float32 6kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
mean_temp.shape
(721, 1440)
mean_temp[300, 100]
<xarray.DataArray 't2m' ()> Size: 8B
array(297.00346626)
Coordinates:
    longitude  float32 4B 25.0
    latitude   float32 4B 15.0
# Plot
fig, ax = plt.subplots(figsize=(10,6))

im1 = ax.imshow(xds['t2m'][0,:,:], cmap='RdYlBu_r')
ax.scatter(100, 300, s=100, c='k')

ax.set_title("Air temperature", fontsize=14)
divider = make_axes_locatable(ax)
cax = divider.append_axes('right', size='5%', pad=0.05)
fig.colorbar(im1, cax=cax, orientation='vertical')
<matplotlib.colorbar.Colorbar at 0x14e7cd250>
../../_images/6becf1bf4ad97569b891266a15b418895a5b2cab25869a1b0c085b894ea08809.png

Indexing multi-dimensional datasets#

Since our xarray dataset is aware of the latitude and longitude coordinates, we can index values conveniently.

t = xds['t2m'].sel(latitude=27.7, longitude=85.3, method='nearest')
print('The mean annual air temperature in Kathmandu is %.2f F' %((t.mean('time').values - 273.15) * 9/5 + 32))
The mean annual air temperature in Kathmandu is 62.49 F

Convert to Pandas DataFrame

t.to_dataframe()
longitude latitude t2m
time
1959-01-01 85.25 27.75 282.517155
1959-02-01 85.25 27.75 283.446361
1959-03-01 85.25 27.75 288.469712
1959-04-01 85.25 27.75 293.018762
1959-05-01 85.25 27.75 294.681458
... ... ... ...
2021-08-01 85.25 27.75 295.180443
2021-09-01 85.25 27.75 294.954754
2021-10-01 85.25 27.75 292.877707
2021-11-01 85.25 27.75 287.104996
2021-12-01 85.25 27.75 283.973557

756 rows × 3 columns

Where is the coldest place on Earth (1959-2021)?#

To find the coldest place on Earth we have to find the grid cell with the lowest temperature. The argmin() function returns the indices of the minimum values of an array.

min_value = mean_temp.argmin()
print(min_value)
<xarray.DataArray 't2m' ()> Size: 8B
array(978122)

Perhaps unexpectedly, argmin() returns a single number instead of a row/column pair. We can use NumPy’s unravel_index() to convert this 1D index to 2D coordinates. It just needs to know the shape of our original 2D DataArray.

low_idx = np.unravel_index(min_value, mean_temp.shape)
print(low_idx)
(np.int64(679), np.int64(362))
cold = mean_temp[low_idx[0], low_idx[1]].values
print('Coldest place on Earth is %.2f F' % ((cold - 273.15) * 9/5 + 32))
Coldest place on Earth is -64.64 F
fig, ax1 = plt.subplots(figsize=(10,6))

im1 = ax1.imshow(xds['t2m'][1,:,:], cmap='RdYlBu_r')

ax1.set_title("Coldest place on Earth (1959-2021)", fontsize=14)
ax1.scatter(low_idx[1], low_idx[0], s=100, color='k')

divider = make_axes_locatable(ax1)
cax = divider.append_axes('right', size='5%', pad=0.05)
fig.colorbar(im1, cax=cax, orientation='vertical')
<matplotlib.colorbar.Colorbar at 0x14e863680>
../../_images/fa3e7487edc173a2e9e6c774fcc0077c8e6d1622c41541c23b41c26bce5e06d1.png

Which was the hottest month on Earth (1959-2021)?#

To find the hottest month on Earth, we need to preserve the time dimension of our data. Instead we need average over the latitude and longitude dimensions.

hot = xds['t2m'].mean(['longitude','latitude'])
hot['time'][hot.argmax()].values
np.datetime64('2019-07-01T00:00:00.000000000')

Which was July 2019!!

Which was the hottest year on Earth (1959-2021)?#

There are a couple of ways to find the hottest year. The first is to use the groupby function.

temp_yearly = xds['t2m'].groupby('time.year').mean()
hot = temp_yearly.mean(['longitude','latitude'])
hot['year'][hot.argmax()].values
array(2016)

Note

Since we grouped by year, the time interval dimension was renamed to year.

Alternatively, we could use the resample function.

temp_yearly = xds['t2m'].resample(time="Y").mean()
/opt/miniconda3/envs/book/lib/python3.12/site-packages/xarray/groupers.py:487: FutureWarning: 'Y' is deprecated and will be removed in a future version, please use 'YE' instead.
  self.index_grouper = pd.Grouper(
hot = temp_yearly.mean(['longitude','latitude'])
hot['time'][hot.argmax()].values
np.datetime64('2016-12-31T00:00:00.000000000')