Working with ECOSTRESS Evapotranspiration Data¶

This tutorial demonstrates how to work with the ECOSTRESS Evapotranspiration PT-JPL Daily L3 Global 70m Version 1 (ECO3ETPTJPL.001) data product in Python.¶

The Land Processes Distributed Active Archive Center (LP DAAC) distributes the Ecosystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) data products. The ECOSTRESS mission is tasked with measuring the temperature of plants in to better understand how much water plants need and how they respond to stress. ECOSTRESS products are archived and distributed in the HDF5 file format as swath-based products.

In this tutorial, you will use Python to perform a swath to grid conversion to project the swath data unto a grid with a defined coordinate reference system (CRS), compare ECOSTRESS data with ground-based AmeriFlux flux tower observations, and export science dataset (SDS) layers as GeoTIFF files that can be loaded into a GIS and/or Remote Sensing software program.

Example: Converting a swath ECO3ETPTJPL.001 HDF5 file into a GeoTIFF with a defined CRS and comparing ECOSTRESS Evapotranspiration (ET) with ground-based ET observations from an AmeriFlux flux tower location in California.¶

Data Used in the Example:¶

Data Product: ECOSTRESS Evapotranspiration PT-JPL Daily L3 Global 70m Version 1 (ECO3ETPTJPL.001)
- Science Dataset (SDS) layers:
  - ETinst
  - ETinstUncertainty
Data Product: ECOSTRESS Geolocation Daily L1B Global 70m Version 1 (ECO1BGEO.001)
- Science Dataset (SDS) layers:
  - Latitude
  - Longitude
Data Product: AmeriFlux Ground Observations for Flux Tower US-CZ3: Sierra Critical Zone, Sierra Transect, Sierran Mixed Conifer, P301
- Variables:
  - Latent Heat (W/m$^{2}$)

Topics Covered:¶

Getting Started
1a. Import Packages
1b. Set Up the Working Environment
1c. Retrieve Files
Importing and Interpreting Data
2a. Open an ECOSTRESS HDF5 File and Read File Metadata
2b. Subset SDS Layers
Performing Swath2grid Conversion
3a. Import Geolocation File
3b. Define Projection and Output Grid
3c. Read SDS Metadata
3d. Perform K-D Tree Resampling
3e. Basic Image Processing
Exporting Results
4a. Set Up a Dictionary
4b. Define CRS and Export as GeoTIFFs
Combining ECOSTRESS and AmeriFlux Tower Data
5a. Loading Tables with Pandas
5b. Locate ECOSTRESS Pixel from Lat/Lon Coordinates
Visualizing Data
6a. Create Colormap
6b. Plot ET Data
6c. Exporting an Image
Comparing Observations
7a. Calculate Distribution of ECOSTRESS Data
7b. Visualize Ground Observations
7c. Combine ECOSTRESS and Ground Observations

Before Starting this Tutorial:¶

Dependencies:¶

This tutorial was tested using Python 3.6.7.
A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at the link provided.
Libraries: h5py, numpy, pandas, matplotlib, datetime, pyproj, pyresample, gdal
The Geospatial Data Abstraction Library (GDAL) is required.

For a complete list of packages and a yml file that can be used to create your own environment, check out windows-py.yml or mac-py.yml.¶

Need help setting up an environment? check out the Python Environment Setup section in the README.

Example Data:¶

This tutorial uses an ECO3ETPTJPL.001 (and accompanying ECO1BGEO.001) observation from August 05, 2018. Until the data are released publicly, you can download the files directly via NASA Earthdata Search using the query in the link provided. Note that NASA Earthdata Search will only work for Early Adopters. You can alternatively download the granules below from the following location: https://ecostress.jpl.nasa.gov/mar21workshop (follow the Workshop Training Material link).

ECOSTRESS_L1B_GEO_00468_007_20180805T220314_0502_02.h5
ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02.h5

The AmeriFlux Latent Heat data used in Section 4 can be downloaded via a csv file

Source Code used to Generate this Tutorial:¶

Jupyter Notebook

If you are simply looking to batch process/perform the swath2grid conversion for ECOSTRESS files, be sure to check out the ECOSTRESS Swath to Grid Conversion Script.¶

NOTE: This tutorial was developed specifically for the ECOSTRESS Evapotranspiration PT-JPL Level 3, Version 1 HDF5 files and will need to be adapted to work with other ECOSTRESS products.

1. Getting Started

1a. Import Packages¶

Import the python packages required to complete this tutorial.¶

# Import packages
import h5py
import os
from os.path import join
import pyproj
import numpy as np
import pandas as pd
import datetime
from dateutil import parser
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from pyresample import geometry as geom
from pyresample import kd_tree as kdt
from osgeo import gdal, gdal_array, gdalconst, osr

# Set plots inside of the Jupyter Notebook
%matplotlib inline

1b. Set Up the Working Environment¶

The input directory is defined as the current working directory. Note that you will need to have the jupyter notebook and example data (.h5 and .csv) stored in this directory in order to execute the tutorial successfully.¶

# Current working directory will be set as the input directory
inDir = os.getcwd() + os.sep                                                     
print("input directory:\n{}".format(inDir))

# Set output directory
outDir = os.path.normpath(os.path.split(inDir)[0] + os.sep + 'output') + os.sep  
print("output directory:\n{}".format(outDir))

# Create output directory
if not os.path.exists(outDir): 
    os.makedirs(outDir)

input directory:
C:\ECOSTRESS\
output directory:
C:\ECOSTRESS\output\

Make sure that the ECOSTRESS .h5 data files, and Ameriflux ET data file (.csv) are located in the input directory printed above.¶

1c. Retrieve Files¶

Make sure that the ECO1BGEO and ECO3ETPTJPL .h5 files listed in the directions have been downloaded to the `inDir` defined above to follow along in the tutorial.¶

# List directory contents and create lists of ECOSTRESS HDF5 files (GEO, ET)
geoList = [file for file in os.listdir() if file.endswith('.h5') and 'GEO' in file]
print("geolocation:\n{}".format("\n".join(geoList)))
ecoList = [file for file in os.listdir() if file.endswith('.h5') and 'GEO' not in file]
print("products:\n{}".format("\n".join(ecoList)))

geolocation:
ECOSTRESS_L1B_GEO_00468_007_20180805T220314_0502_02.h5
products:
ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02.h5

The standard format for ECOSTRESS filenames is as follows:¶

ECOSTRESS_L3_ET_PT-JPL: Product Type
00468: Orbit number; starting at start of mission, ascending equatorial crossing
007: Scene ID; starting at first scene of first orbit
20180805T220314: Date and time of data start: YYYYMMDDThhmmss
0502: Build ID of software that generated product, Major+Minor (2+2 digits)
02: Product version number (2 digits)

2. Importing and Interpreting Data

2a. Open an ECOSTRESS HDF5 File¶

Read in an ECOSTRESS HDF5 file using the `h5py` package.¶

f = h5py.File(ecoList[0])             # Read in ECOSTRESS HDF5 file
ecoName = ecoList[0].split('.h5')[0]  # Keep original filename
print(ecoName)

ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02

2b. Subset SDS Layers and read SDS Metadata¶

Identify and generate a list of all the SDS layers in the HDF5 file.¶

# Create a list of all SDS inside of the .h5 file
eco_objs = []
f.visit(eco_objs.append)
ecoSDS = [str(obj) for obj in eco_objs if isinstance(f[obj], h5py.Dataset)] 
for dataset in ecoSDS[0:10]: 
    print(dataset)

EVAPOTRANSPIRATION PT-JPL/ETcanopy
EVAPOTRANSPIRATION PT-JPL/ETdaily
EVAPOTRANSPIRATION PT-JPL/ETinst
EVAPOTRANSPIRATION PT-JPL/ETinstUncertainty
EVAPOTRANSPIRATION PT-JPL/ETinterception
EVAPOTRANSPIRATION PT-JPL/ETsoil
L3_ET_PT-JPL Metadata/AncillaryFileAerosolOpticalDepth
L3_ET_PT-JPL Metadata/AncillaryFileAirTemperatureNWP
L3_ET_PT-JPL Metadata/AncillaryFileAirTemperatureRS
L3_ET_PT-JPL Metadata/AncillaryFileAlbedo

Below, subset the SDS list to the two layers needed for comparison with the ground-based AmeriFlux data, `ETinst` and `ETinstUncertainty`.¶

# Subset list to ETinst and ETinstUncertainty
sds = ['ETinst', 'ETinstUncertainty']
ecoSDS = [dataset for dataset in ecoSDS if dataset.endswith(tuple(sds))]
for dataset in ecoSDS:
    print(dataset.split('/')[-1])

ETinst
ETinstUncertainty

3. Performing Swath2grid Conversion

Resample the native ECOSTRESS swath data to a grid with defined coordinate reference system (CRS).¶

3a. Import Geolocation File¶

The `latitude` and `longitude` arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

# Find the matching ECO1BGEO file from the file list
geo = [geoFile for geoFile in geoList if ecoList[0][-37:] in geoFile]
print(geo[0])
print(ecoList[0])

ECOSTRESS_L1B_GEO_00468_007_20180805T220314_0502_02.h5
ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02.h5

Read in the ECO1BGEO file, search for the `latitude` and `longitude` SDS, and import into Python as arrays.¶

# Open Geo File
g = h5py.File(geo[0])
geo_objs = []
g.visit(geo_objs.append)

# Search for lat/lon SDS inside data file
latSD = [str(obj) for obj in geo_objs if isinstance(g[obj], h5py.Dataset) and '/latitude' in obj]
lonSD = [str(obj) for obj in geo_objs if isinstance(g[obj], h5py.Dataset) and '/longitude' in obj]

# Open SDS as arrays
lat = g[latSD[0]].value.astype(np.float)
lon = g[lonSD[0]].value.astype(np.float)

# Read the array dimensions
dims = lat.shape
print(dims)

(5632, 5400)

3b. Define Projection and Output Grid¶

The `latitude` and `longitude` arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

The following sections use the pyresample package to resample the ECOSTRESS swath dataset to a grid using nearest neighbor method. This process begins by defining the swath dimensions using the lat/lon arrays below.¶

# Set swath definition from lat/lon arrays
swathDef = geom.SwathDefinition(lons=lon, lats=lat)
swathDef.corners

[(-118.72374893600771, 34.220901494817554),
 (-121.8667573806392, 36.84466762740315),
 (-118.81986958831025, 39.05335446928084),
 (-115.71680093109661, 36.355977323511624)]

Define the coordinates in the middle of the swath, which are used to calculate an estimate of the output rows/columns for the gridded output.¶

# Define the lat/lon for the middle of the swath
mid = [int(lat.shape[1] / 2) - 1, int(lat.shape[0] / 2) - 1]
midLat, midLon = lat[mid[0]][mid[1]], lon[mid[0]][mid[1]]
midLat, midLon

(36.63135938975165, -118.8874589751626)

Below, `pyproj.Proj` is used to perform a cartographic transformation by defining an Azimuthal Equidistant projection centered on the midpoint of the swath. Once the projection is defined, convert the lower left and upper right corners of the lat/lon arrays to a location (in meters) in the new projection. Lastly, measure the distance between the corners and divide by 70 (meters), the nominal pixel size that we are aiming for. Azimuthal Equidistant projection was chosen here based on the following characteristics of this projection:¶

Units in meters (necessary for defining 70 m pixels)
Distances between all points are proportionally correct from center point
Azimuth (direction) are correct from the center point

# Define AEQD projection centered at swath center
epsgConvert = pyproj.Proj("+proj=aeqd +lat_0={} +lon_0={}".format(midLat, midLon))

# Use info from AEQD projection bbox to calculate output cols/rows/pixel size
llLon, llLat = epsgConvert(np.min(lon), np.min(lat), inverse=False)
urLon, urLat = epsgConvert(np.max(lon), np.max(lat), inverse=False)
areaExtent = (llLon, llLat, urLon, urLat)
cols = int(round((areaExtent[2] - areaExtent[0]) / 70))  # 70 m pixel size
rows = int(round((areaExtent[3] - areaExtent[1]) / 70))

Use number of rows and columns generated above from the AEQD projection to set a representative number of rows and columns in the Geographic area definition, which will then be translated to degrees below, then take the smaller of the two pixel dims to determine output size and ensure square pixels.¶

# Define Geographic projection
epsg, proj, pName = '4326', 'longlat', 'Geographic'

# Define bounding box of swath
llLon, llLat, urLon, urLat = np.min(lon), np.min(lat), np.max(lon), np.max(lat)
areaExtent = (llLon, llLat, urLon, urLat)

# Create area definition with estimated number of columns and rows
projDict = {'proj': proj, 'datum': 'WGS84', 'units': 'degree'}
areaDef = geom.AreaDefinition(epsg, pName, proj, projDict, cols, rows, areaExtent)
areaDef

Area ID: 4326
Description: Geographic
Projection ID: longlat
Projection: {'datum': 'WGS84', 'proj': 'longlat', 'units': 'degree'}
Number of columns: 7844
Number of rows: 7667
Area extent: (-121.8667573806392, 34.220901494817554, -115.71680093109661, 39.05335446928083)

Below, square the pixels by setting the pixel size to the smaller of the x any y values output by the `AreaDefinition`, then use the pixel size to recalculate the number of output cols/rows.¶

# Square pixels and calculate output cols/rows
ps = np.min([areaDef.pixel_size_x, areaDef.pixel_size_y])
cols = int(round((areaExtent[2] - areaExtent[0]) / ps))
rows = int(round((areaExtent[3] - areaExtent[1]) / ps))

# Set up a new Geographic area definition with the refined cols/rows
areaDef = geom.AreaDefinition(epsg, pName, proj, projDict, cols, rows, areaExtent)

Below, use `pyresample kd_tree`'s get_neighbour_info to create arrays with information on the nearest neighbor to each grid point.¶

This is the most computationally heavy task in the swath2grid conversion and using `get_neighbour_info` speeds up the process if you plan to resample multiple SDS within an ECOSTRESS product (compute once instead of for every SDS).¶

# Get arrays with information about the nearest neighbor to each grid point 
index, outdex, indexArr, distArr = kdt.get_neighbour_info(swathDef, areaDef, 210, neighbours=1)

Above, the function is comparing the swath and area definitions to locate the nearest neighbor (neighbours=1). 210 is the `radius_of_influence`, or the radius used to search for the nearest neighboring pixel in the swath (in meters).¶

3c. Read SDS Metadata¶

List the attributes for the `ETinst` layer, which can then be used to define the fill value and scale factor.¶

# Read in ETinst and print out SDS attributes
s = ecoSDS[0]
ecoSD = f[s].value 
for attr in f[s].attrs:
    if type(f[s].attrs[attr]) == np.ndarray:
        print(f'{attr} = {f[s].attrs[attr][0]}')
    else:
        print(f'{attr} = {f[s].attrs[attr].decode("utf-8")}')

_FillValue = nan
units = W/m^2
long_name = Instantaneous Latent Heat Flux
valid_max = 2000.0
valid_min = 0.0

Extract the scale factor, add offset and fill value from the SDS metadata.¶

# Read SDS attributes and define fill value, add offset, and scale factor if available
try:
    fv = int(f[s].attrs['_FillValue'])
except KeyError:
    fv = None
except ValueError:
    fv = f[s].attrs['_FillValue'][0]
try:
    sf = f[s].attrs['_Scale'][0]
except:
    sf = 1
try:
    add_off = f[s].attrs['_Offset'][0]
except:
    add_off = 0
try:
    units = f[s].attrs['units'].decode("utf-8")
except:
    units = 'none'

3d. Perform K-D Tree Resampling¶

Remember that the resampling has been split into two steps. In section 3b. arrays containing the nearest neighbor to each grid point were created. The second step is to use those arrays to retrieve a resampled result.¶

# Perform K-D Tree nearest neighbor resampling (swath 2 grid conversion)
ETgeo = kdt.get_sample_from_neighbour_info('nn', areaDef.shape, ecoSD, index, outdex, indexArr, fill_value=fv)

Above, resample the swath `ecoSD` array using nearest neighbor (already calculated in section 3b. and defined above as the `index`, `outdex`, and `indexArr`), and also set the fill value that was defined in section 3c.¶

Below, define the geotransform for the output (upper left x, horizontal pixel size, rotation, upper left y, rotation, vertical pixel size).¶

# Define the geotransform 
gt = [areaDef.area_extent[0], ps, 0, areaDef.area_extent[3], 0, -ps]
gt

[-121.8667573806392,
 0.0006302925491669857,
 0,
 39.05335446928083,
 0,
 -0.0006302925491669857]

3e. Basic Image Processing¶

Apply the scale factor and add offset and set the fill value defined in the previous section on the resampled data.¶

ETgeo = ETgeo * sf + add_off            # Apply Scale Factor and Add Offset
ETgeo[ETgeo == fv * sf + add_off] = fv  # Set Fill Value

Rerun steps 3c - 3e for `ETinstUncertainty`.¶

s = ecoSDS[1]
ecoSD = f[s].value
try:
    fv = int(f[s].attrs['_FillValue'])
except KeyError:
    fv = None
except ValueError:
    fv = f[s].attrs['_FillValue'][0]
try:
    sf = f[s].attrs['_Scale'][0]
except:
    sf = 1
try:
    add_off = f[s].attrs['_Offset'][0]
except:
    add_off = 0
UNgeo = kdt.get_sample_from_neighbour_info('nn', areaDef.shape, ecoSD, index, outdex, indexArr, fill_value=fv)
UNgeo = UNgeo * sf + add_off
UNgeo[UNgeo == fv * sf + add_off] = fv

4. Exporting Results

4a. Set Up a Dictionary¶

In this section, create a dictionary containing each of the arrays that will be exported as GeoTIFFs.¶

# Set up dictionary of arrays to export
outFiles = {'ETinst': ETgeo, 'ETinstUncertainty': UNgeo}

4b. Define CRS and Export as GeoTIFFs¶

Now that the data have been imported and resampled into a gridded raster array, export the results as GeoTIFFs using a `for` loop in this section.¶

# Loop through each item in dictionary created above
for file in outFiles:
    
    # Set up output name
    outName = join(outDir, '{}_{}.tif'.format(ecoName, file))
    print("output file:\n{}\n".format(outName))
    
    # Get driver, specify dimensions, define and set output geotransform
    height, width = outFiles[file].shape
    driv = gdal.GetDriverByName('GTiff')
    dataType = gdal_array.NumericTypeCodeToGDALTypeCode(outFiles[file].dtype)
    d = driv.Create(outName, width, height, 1, dataType)
    d.SetGeoTransform(gt)
        
    # Create and set output projection, write output array data
    # Define target SRS
    srs = osr.SpatialReference()
    srs.ImportFromEPSG(int(epsg))
    d.SetProjection(srs.ExportToWkt())
    srs.ExportToWkt()
    
    # Write array to band
    band = d.GetRasterBand(1)
    band.WriteArray(outFiles[file])
    
    # Define fill value if it exists, if not, set to mask fill value
    if fv is not None and fv != 'NaN':
        band.SetNoDataValue(fv)
    else:
        try:
            band.SetNoDataValue(outFiles[file].fill_value)
        except AttributeError:
            pass
    band.FlushCache()
    d, band = None, None

output file:
C:\ECOSTRESS\output\ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02_ETinst.tif

output file:
C:\ECOSTRESS\output\ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02_ETinstUncertainty.tif

5. Combining ECOSTRESS and AmeriFlux Tower Data

5a. Loading Tables with Pandas¶

In this section, begin by highlighting how to open a csv file using the `pandas` package.¶

The AmeriFlux tower data was provided by Mike Goulden for the AmeriFlux US-CZ3 tower. The csv includes half-hourly observations of Latent Heat (W/m$^{2}$) for the same day as the ECOSTRESS observation.

# Import csv file with AmeriFlux data and drop NaNs
towerData = pd.read_csv('tower_data.csv')
towerData = towerData.dropna()

Next, use the `parser` package and a lambda function to go through each time stamp and reformat to date and time objects.¶

# Define a lambda function to use the parser packgage to convert each time stamp to a datetime object
towerData["Date/Time"] = towerData["Time"].apply(lambda x: parser.parse(x))
towerData["Time"] = towerData["Date/Time"].apply(lambda x: datetime.time(x.hour, x.minute))
towerData = towerData[["Date/Time", "Time", "LE"]]
towerData

5b. Locate ECOSTRESS Pixel from Lat/Lon Coordinates¶

Calculate the gridded pixel nearest to the tower location.¶

towerLat, towerLon = 37.0674, -119.1951  # AmeriFlux US-CZ3 tower location

# Calculate tower lat/lon distance from upper left corner, then divide by pixel size to find x,y pixel location
Tcol = int(round((towerLon - gt[0]) / gt[1]))
Trow = int(round((towerLat - gt[3]) / gt[5]))

# Print ET at the tower location
ETgeo[Trow, Tcol]

162.27896

6. Visualizing Data

6a. Create a Colormap¶

Before plotting the ET data, set up an Evapotranspiration color map using `LinearSegmentedColormap` from the `matplotlib` package.¶

# Create a colormap for the ET data
ETcolors = ["#f6e8c3", "#d8b365", "#99974a", "#53792d", "#6bdfd2", "#1839c5"]
ETcmap = LinearSegmentedColormap.from_list("ET", ETcolors)

6b. Calculate Local Overpass Time¶

ECOSTRESS observation times are reported in Universal Time Coordinated (UTC). Below, grab the observation time from the filename and convert to local time using the longitude location of the tower.¶

# Grab UTC time of observation from file name
ecoTime = ecoName.split('_')[-3]
ecoTime

'20180805T220314'

Next, convert UTC observation time to local overpass time.¶

observationTime = parser.parse(ecoTime)
solarOverpass = observationTime + datetime.timedelta(hours=(np.radians(towerLon) / np.pi * 12))
overpass = datetime.time(solarOverpass.hour, solarOverpass.minute)
date = observationTime.strftime('%Y-%m-%d')

6c. Plot ET Data¶

In this section, begin by highlighting the functionality of the `matplotlib` plotting package. First, make a plot of the entire gridded ET output. Next, zoom in on the tower location and add some additional parameters to the plot. Finally, export the completed plot to an image file.¶

title = 'ECO3ETPTJPL Evapotranspiration'
SDSname = ecoSDS[0].split("/")[-1]
fig = plt.figure(figsize=(9.7,7.6))                                                       # Set the figure size (x,y)
fig.suptitle(f'{title} ({ecoSDS[0].split("/")[-1]})\n{date} at {overpass}', fontsize=22)  # Add title for the plots
plt.axis('off')                                                                           # Remove axes from plot
im = plt.imshow(ETgeo, cmap=ETcmap);                                                        # Plot array using colormap
plt.scatter(Tcol, Trow, color="black", marker='x')                                        # Plot tower location

# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction=0.05, pad=0.004, label=f'ET ({units})', shrink=0.6).outline.set_visible(True)

6d. Exporting an Image¶

Zoom in to get a closer look at the region surrounding the AmeriFlux tower by creating a subset.¶

# Set a Radius and calculate subset region from flux tower location (row, col)
radius = 1700
ETsubset = ETgeo[(Trow - radius):(Trow + radius + 1), (Tcol - radius):(Tcol + radius + 1)]

Make another plot, this time zoomed in to the tower location. Export the plot as a .png file.¶

fig = plt.figure(figsize=(14,12))                                                 # Set the figure size (x,y)
fig.suptitle(f'{title} ({SDSname})\n{date} at {overpass}', fontsize=26)           # Add title for the plots
plt.axis('off')                                                                   # Remove axes from plot
im = plt.imshow(ETsubset, cmap=ETcmap);                                             # Plot array using colormap
plt.scatter(ETsubset.shape[0]/2, ETsubset.shape[1]/2, color="black", marker='x')  # Tower is in middle of subset

# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction=0.05, pad=0.004, label=f'ET ({units})', shrink=0.6).outline.set_visible(True)

# Set up file name and export to png file
figure_filename = join(outDir, "{}_{}.png".format(ecoName, SDSname))
print("figure filename: {}".format(figure_filename))
fig.savefig(figure_filename, dpi=300)
plt.show()

figure filename: C:\ECOSTRESS\output\ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02_ETinst.png

7. Comparing Observations

7a. Calculate Distribution of ECOSTRESS Data¶

First, collect a 3x3 grid centered on the flux tower pixel as a subset to calculate statistics on.¶

# Subset data to 3x3 grid surrounding flux tower for both layers
ETfootprint = ETgeo[(Trow - 1):(Trow + 2), (Tcol - 1):(Tcol + 2)] 
UNfootprint = UNgeo[(Trow - 1):(Trow + 2), (Tcol - 1):(Tcol + 2)] 
print(ETfootprint)

[[155.45839 132.1322  132.1322 ]
 [162.27896 162.27896 181.59097]
 [170.64388 170.89914 175.31497]]

In case the 3x3 grid contains missing values, use `np.nanmedian` to ignoring missing values and calculate the measure of central tendency.¶

ETmedian = np.nanmedian(ETfootprint)
UNmedian = np.nanmedian(UNfootprint)
print(f"Median ET: {ETmedian:0.3f} \nUncertainty: {UNmedian:0.3f}")

Median ET: 162.279 
Uncertainty: 161.677

Next, generate a probability density function for the 3x3 grid of ET values.¶

pd.DataFrame({title: ETfootprint.flatten()}).plot.kde()                                   # Pandas Kernel Density Estimate
plt.title(f'Probability Density of {SDSname} Surrounding US-CZ3\n{date} at {overpass}');  # Title
plt.xlabel(f'{SDSname} ({units})');                                                       # X-axis label

7b. Visualize Ground Observations¶

Next, examine the series of eddy covariance observations from the AmeriFlux US-CZ3 dataset.¶

fig = plt.figure(figsize=(12,7))                                                                      # Set fig size (x,y)
ax = fig.add_subplot(111)                                                                             # Create a subplot
ax.plot(towerData.Time, towerData.LE, 'k', lw=2.5, color='black')                                     # Plot as a black line
ax.set_title(f'Time Series of Eddy Covariance Data at Site US-CZ3', fontsize=20, fontweight='bold');  # Set Title
ax.set_ylabel('Latent Heat (W/m^2)', fontsize=18);                                                    # Y-axis label
ax.set_xlabel(f'Time of Day ({date})', fontsize=18);                                                  # X-axis label
ax.set_xticks(np.array(towerData.Time));                                                              # Set the x ticks
ax.set_xticklabels(towerData.Time, rotation=45,fontsize=12);                                          # Set x tick labels

Above, we can see the daily range in Latent Heat as captured by the eddy covariance observations on the flux tower.¶

7c. Combine ECOSTRESS and Ground Observations¶

Finally, compare the ECOSTRESS Evapotranspiration and uncertainty with the time series of observations from the flux tower.¶

# Set the figure size, create a subplot
fig = plt.figure(1, figsize=(12, 7))
ax = fig.add_subplot(111)

# Plot the flux tower observatisons followed by the ecostress median ET and median uncertainty
ax.plot(towerData.Time, towerData.LE, 'k', lw=2.5, color='black')
ax.plot(overpass, ETmedian, 'bo', ms=10, color='darkgreen')
ax.errorbar(overpass, ETmedian, yerr=UNmedian, lw=2.0, c='lightgreen', fmt='o', capsize=3, capthick=2)

# Set x/y axes and labels
ax.set_xlabel(f'Time of Day ({date})', fontsize=18);
ax.set_xticks(np.array(towerData.Time));
ax.set_xticklabels(towerData.Time, rotation=45,fontsize=12);
ax.set_ylabel("Latent Heat Flux (Wm2)", fontsize=16, fontweight='bold')

# Add a title and export figure as png file
ax.set_title(f"Time Series of Eddy Covariance Data at Site US-CZ3\n vs. {title} ({SDSname})", fontsize=22)
figure_filename = join(outDir, "{}_{}_vs_fluxtower.png".format(ecoName, SDSname))
print("figure filename: {}".format(figure_filename))
fig.savefig(figure_filename, bbox_inches='tight')

figure filename: C:\ECOSTRESS\output\ECOSTRESS_L3_ET_PT-JPL_00468_007_20180805T220314_0502_02_ETinst_vs_fluxtower.png

Citations¶

Fisher, J.B., Halverson, G.H., Jewell, L., Freepartner, R., Cawse-Nicholson, K., Freeborn, D., (2018). ECOSTRESS Evapotranspiration PT-JPL Daily L3 Global 70m Version 1 [Data set]. NASA EOSDIS Land Processes DAAC. doi: 10.5067/ECOSTRESS/ECO3ETPTJPL.001

Goulden, M., (2018). AmeriFlux US-CZ3 Sierra Critical Zone, Sierra Transect, Sierran Mixed Conifer, P301, doi:10.17190/AMF/1419512.

Krehbiel, C., and Halverson, G.H., (2019). Working with ECOSTRESS Evapotranspiration Data [Jupyter Notebook]. Retrieved from https://git.earthdata.nasa.gov/projects/LPDUR/repos/tutorial-ecostress/browse

Contact Information

Material written by Cole Krehbiel$^{1}$ and Gregory Halverson$^{2}$

Contact: LPDAAC@usgs.gov

Voice: +1-866-573-3222

Organization: Land Processes Distributed Active Archive Center (LP DAAC)

Website: https://lpdaac.usgs.gov/

Date last modified: 03-13-2019

$^{1}$Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{3}$.

$^{3}$LP DAAC Work performed under NASA contract NNG14HH33I.

$^{2}$ Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

	Date/Time	Time	LE
1	2018-08-05 09:30:00	09:30:00	18.027752
3	2018-08-05 10:00:00	10:00:00	68.495970
5	2018-08-05 10:30:00	10:30:00	89.050031
7	2018-08-05 11:00:00	11:00:00	154.564199
9	2018-08-05 11:30:00	11:30:00	118.685090
11	2018-08-05 12:00:00	12:00:00	107.071870
13	2018-08-05 12:30:00	12:30:00	189.551249
15	2018-08-05 13:00:00	13:00:00	129.985112
19	2018-08-05 14:00:00	14:00:00	139.651745
21	2018-08-05 14:30:00	14:30:00	155.416112
23	2018-08-05 15:00:00	15:00:00	124.511936
25	2018-08-05 15:30:00	15:30:00	106.666146
27	2018-08-05 16:00:00	16:00:00	176.321112
29	2018-08-05 16:30:00	16:30:00	203.363058
31	2018-08-05 17:00:00	17:00:00	166.714904
33	2018-08-05 17:30:00	17:30:00	124.236027
35	2018-08-05 18:00:00	18:00:00	96.232450
37	2018-08-05 18:30:00	18:30:00	133.121365
39	2018-08-05 19:00:00	19:00:00	44.660266
41	2018-08-05 19:30:00	19:30:00	11.618052
43	2018-08-05 20:00:00	20:00:00	44.201492

Working with ECOSTRESS Evapotranspiration Data¶

This tutorial demonstrates how to work with the ECOSTRESS Evapotranspiration PT-JPL Daily L3 Global 70m Version 1 (ECO3ETPTJPL.001) data product in Python.¶

Example: Converting a swath ECO3ETPTJPL.001 HDF5 file into a GeoTIFF with a defined CRS and comparing ECOSTRESS Evapotranspiration (ET) with ground-based ET observations from an AmeriFlux flux tower location in California.¶

Data Used in the Example:¶

Topics Covered:¶

Before Starting this Tutorial:¶

Dependencies:¶

For a complete list of packages and a yml file that can be used to create your own environment, check out windows-py.yml or mac-py.yml.¶

Example Data:¶

Source Code used to Generate this Tutorial:¶

If you are simply looking to batch process/perform the swath2grid conversion for ECOSTRESS files, be sure to check out the ECOSTRESS Swath to Grid Conversion Script.¶

1. Getting Started

1a. Import Packages¶

Import the python packages required to complete this tutorial.¶

1b. Set Up the Working Environment¶

The input directory is defined as the current working directory. Note that you will need to have the jupyter notebook and example data (.h5 and .csv) stored in this directory in order to execute the tutorial successfully.¶

Make sure that the ECOSTRESS .h5 data files, and Ameriflux ET data file (.csv) are located in the input directory printed above.¶

1c. Retrieve Files¶

Make sure that the ECO1BGEO and ECO3ETPTJPL .h5 files listed in the directions have been downloaded to the inDir defined above to follow along in the tutorial.¶

The standard format for ECOSTRESS filenames is as follows:¶

2. Importing and Interpreting Data

2a. Open an ECOSTRESS HDF5 File¶

Read in an ECOSTRESS HDF5 file using the h5py package.¶

2b. Subset SDS Layers and read SDS Metadata¶

Identify and generate a list of all the SDS layers in the HDF5 file.¶

Below, subset the SDS list to the two layers needed for comparison with the ground-based AmeriFlux data, ETinst and ETinstUncertainty.¶

3. Performing Swath2grid Conversion

Resample the native ECOSTRESS swath data to a grid with defined coordinate reference system (CRS).¶

3a. Import Geolocation File¶

The latitude and longitude arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

Read in the ECO1BGEO file, search for the latitude and longitude SDS, and import into Python as arrays.¶

3b. Define Projection and Output Grid¶

The latitude and longitude arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

The following sections use the pyresample package to resample the ECOSTRESS swath dataset to a grid using nearest neighbor method. This process begins by defining the swath dimensions using the lat/lon arrays below.¶

Define the coordinates in the middle of the swath, which are used to calculate an estimate of the output rows/columns for the gridded output.¶

Below, square the pixels by setting the pixel size to the smaller of the x any y values output by the AreaDefinition, then use the pixel size to recalculate the number of output cols/rows.¶

Below, use pyresample kd_tree's get_neighbour_info to create arrays with information on the nearest neighbor to each grid point.¶

This is the most computationally heavy task in the swath2grid conversion and using get_neighbour_info speeds up the process if you plan to resample multiple SDS within an ECOSTRESS product (compute once instead of for every SDS).¶

Above, the function is comparing the swath and area definitions to locate the nearest neighbor (neighbours=1). 210 is the radius_of_influence, or the radius used to search for the nearest neighboring pixel in the swath (in meters).¶

3c. Read SDS Metadata¶

List the attributes for the ETinst layer, which can then be used to define the fill value and scale factor.¶

Extract the scale factor, add offset and fill value from the SDS metadata.¶

3d. Perform K-D Tree Resampling¶

Remember that the resampling has been split into two steps. In section 3b. arrays containing the nearest neighbor to each grid point were created. The second step is to use those arrays to retrieve a resampled result.¶

Above, resample the swath ecoSD array using nearest neighbor (already calculated in section 3b. and defined above as the index, outdex, and indexArr), and also set the fill value that was defined in section 3c.¶

Below, define the geotransform for the output (upper left x, horizontal pixel size, rotation, upper left y, rotation, vertical pixel size).¶

3e. Basic Image Processing¶

Apply the scale factor and add offset and set the fill value defined in the previous section on the resampled data.¶

Rerun steps 3c - 3e for ETinstUncertainty.¶

4. Exporting Results

4a. Set Up a Dictionary¶

In this section, create a dictionary containing each of the arrays that will be exported as GeoTIFFs.¶

4b. Define CRS and Export as GeoTIFFs¶

Now that the data have been imported and resampled into a gridded raster array, export the results as GeoTIFFs using a for loop in this section.¶

5. Combining ECOSTRESS and AmeriFlux Tower Data

5a. Loading Tables with Pandas¶

In this section, begin by highlighting how to open a csv file using the pandas package.¶

Next, use the parser package and a lambda function to go through each time stamp and reformat to date and time objects.¶

5b. Locate ECOSTRESS Pixel from Lat/Lon Coordinates¶

Calculate the gridded pixel nearest to the tower location.¶

6. Visualizing Data

6a. Create a Colormap¶

Before plotting the ET data, set up an Evapotranspiration color map using LinearSegmentedColormap from the matplotlib package.¶

6b. Calculate Local Overpass Time¶

ECOSTRESS observation times are reported in Universal Time Coordinated (UTC). Below, grab the observation time from the filename and convert to local time using the longitude location of the tower.¶

Next, convert UTC observation time to local overpass time.¶

6c. Plot ET Data¶

In this section, begin by highlighting the functionality of the matplotlib plotting package. First, make a plot of the entire gridded ET output. Next, zoom in on the tower location and add some additional parameters to the plot. Finally, export the completed plot to an image file.¶

6d. Exporting an Image¶

Zoom in to get a closer look at the region surrounding the AmeriFlux tower by creating a subset.¶

Make another plot, this time zoomed in to the tower location. Export the plot as a .png file.¶

7. Comparing Observations

7a. Calculate Distribution of ECOSTRESS Data¶

First, collect a 3x3 grid centered on the flux tower pixel as a subset to calculate statistics on.¶

In case the 3x3 grid contains missing values, use np.nanmedian to ignoring missing values and calculate the measure of central tendency.¶

Next, generate a probability density function for the 3x3 grid of ET values.¶

7b. Visualize Ground Observations¶

Next, examine the series of eddy covariance observations from the AmeriFlux US-CZ3 dataset.¶

Above, we can see the daily range in Latent Heat as captured by the eddy covariance observations on the flux tower.¶

7c. Combine ECOSTRESS and Ground Observations¶

Make sure that the ECO1BGEO and ECO3ETPTJPL .h5 files listed in the directions have been downloaded to the `inDir` defined above to follow along in the tutorial.¶

Read in an ECOSTRESS HDF5 file using the `h5py` package.¶

Below, subset the SDS list to the two layers needed for comparison with the ground-based AmeriFlux data, `ETinst` and `ETinstUncertainty`.¶

The `latitude` and `longitude` arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

Read in the ECO1BGEO file, search for the `latitude` and `longitude` SDS, and import into Python as arrays.¶

The `latitude` and `longitude` arrays from the ECO1BGEO product for the same ECOSTRESS orbit/scene ID are needed to perform the swath2grid conversion on the ECO3ETPT-JPL file.¶

Below, square the pixels by setting the pixel size to the smaller of the x any y values output by the `AreaDefinition`, then use the pixel size to recalculate the number of output cols/rows.¶

Below, use `pyresample kd_tree`'s get_neighbour_info to create arrays with information on the nearest neighbor to each grid point.¶

This is the most computationally heavy task in the swath2grid conversion and using `get_neighbour_info` speeds up the process if you plan to resample multiple SDS within an ECOSTRESS product (compute once instead of for every SDS).¶

Above, the function is comparing the swath and area definitions to locate the nearest neighbor (neighbours=1). 210 is the `radius_of_influence`, or the radius used to search for the nearest neighboring pixel in the swath (in meters).¶

List the attributes for the `ETinst` layer, which can then be used to define the fill value and scale factor.¶

Above, resample the swath `ecoSD` array using nearest neighbor (already calculated in section 3b. and defined above as the `index`, `outdex`, and `indexArr`), and also set the fill value that was defined in section 3c.¶

Rerun steps 3c - 3e for `ETinstUncertainty`.¶

Now that the data have been imported and resampled into a gridded raster array, export the results as GeoTIFFs using a `for` loop in this section.¶

In this section, begin by highlighting how to open a csv file using the `pandas` package.¶

Next, use the `parser` package and a lambda function to go through each time stamp and reformat to date and time objects.¶

Before plotting the ET data, set up an Evapotranspiration color map using `LinearSegmentedColormap` from the `matplotlib` package.¶

In this section, begin by highlighting the functionality of the `matplotlib` plotting package. First, make a plot of the entire gridded ET output. Next, zoom in on the tower location and add some additional parameters to the plot. Finally, export the completed plot to an image file.¶

In case the 3x3 grid contains missing values, use `np.nanmedian` to ignoring missing values and calculate the measure of central tendency.¶