Masking, Visualizing, and Plotting AppEEARS Output GeoTIFF Time Series

This tutorial demonstrates how to use Python to explore time series data in GeoTIFF format generated from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler.

AppEEARS allows users to select GeoTIFF as an output file format for Area Requests. This tutorial provides step-by-step instructions for how to filter data in AppEEARS based on specific quality indicators and land cover types, re-create AppEEARS output visualizations and statistics, and generate additional plots to visualize time series data in Python.


Case Study: An Analysis of Field Crop Yield Estimates vs. MODIS Enhanced Vegetation Index (EVI) Data

Reference:

Johnson, D. M. (2016). A comprehensive assessment of the correlations between field crop yields and commonly used MODIS products. International Journal of Applied Earth Observation and Geoinformation, 52, 65-81, https://doi.org/10.1016/j.jag.2016.05.010.

What: The study provides an assessment of correlations between field crop yield estimates and MODIS products.
Why: To explore the range of temperature and vegetation products available at regional-scale (MODIS) to document the relationships between remotely sensed measurements and yield for a variety of crop types.
How: The study analyzed correlation and timing of several MODIS products against field crop yields for ten crops at the county-level.
Where: Conterminous United States, MODIS Tiles: h09-v04,v05,v06; h10-v04,v05,v06; h11-v04,v05
When: April-October 2008-2013
AppEEARS Request Parameters:

  • Date (Recurring): April 16 to October 16, 2008-2013
  • Data layers:
    • Terra MODIS Vegetation Indices
    • Combined MODIS Land Cover Type
  • Location: Iowa, United States
  • Output File Format: GeoTIFF
  • Output Projection: Geographic (EPSG: 4326)

Field Crop Yield Estimates Request Parameters: This tutorial uses annual crop yields statistics from the United States Department of Agriculture National Agricultural Statistics Service (USDA NASS). The statistics used include annual surveys of corn and soybean yields by year for the state of Iowa, with yield measured in BU/Acre. The data are available for download through the QuickStats platform.


Topics Covered:

  1. Getting Started
    1a. Setting Up the Working Environment
    1b. Opening a GeoTIFF (.tif) File
    1c. Reading GeoTIFF Metadata
    1d. Applying Scale Factors
  2. Filtering Data
    2a. Reading Quality Lookup Tables (LUTs) from CSV Files
    2b. Masking by Quality
    2c. Masking by Land Cover Type
  3. Visualizations and Statistics
    3a. Visualizing GeoTIFF Files
    3b. Generating Statistics
    3c. Visualizing Statistics
  4. Working with Time Series (Automation)
    4a. Batch Processing
    4b. Visualizing MODIS Time Series
    4c. Generating Statistics
    4d. Visualizing Statistics
    4e. Exporting Masked GeoTIFF Files

AppEEARS Information:

To access AppEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For information on how to make an AppEEARS request, visit: https://lpdaacsvc.cr.usgs.gov/appeears/help

Statistics:

The statistics covered in this tutorial are the same as the statistics available in AppEEARS. These include box plot statistics, such as the total number of values/pixels (n), the median, and the interquartile range (IQR) of the sample data for values within a feature from a selected date/layer. Also calculated are the Lower 1.5 IQR, which represent the lowest datum still within 1.5 IQR of the lower quartile, and the Upper 1.5 IQR, which represent the highest datum still within 1.5 IQR of the upper quartile. Additional statistics generated for each layer/observation/feature combination include min/max/range, mean, standard deviation, and variance. Frequency distributions also will be calculated for the categorical land cover type 1 layer.


Files Used in this Tutorial:

Source Code used to Generate this Tutorial:

NOTE: You will need to have GDAL installed on your OS to execute this tutorial on your own.


1. Getting Started

Import the python libraries needed to run this tutorial. If you are missing a package, you will need to download it to execute the notebook.

In [1]:
# Import libraries
import os
import glob
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
import pandas as pd
import datetime as dt

1a. Setting Up the Working Environment

Next, set your input/working directory, and create an output directory for the results.

In [2]:
# Set input directory, and change working directory
inDir = 'C:/Users/ckrehbiel/Documents/appeears-tutorials/GeoTIFF/'            # IMPORTANT: Update to reflect directory on your OS
os.chdir(inDir)                                                               # Change to working directory
outDir = os.path.normpath(os.path.split(inDir)[0] + os.sep + 'output') + '\\' # Create and set output directory
if not os.path.exists(outDir): os.makedirs(outDir)

If you plan to execute this tutorial on your own OS, `inDir` above needs to be changed.


1b. Opening a GeoTIFF (.tif) File

You will need to download the MOD13Q1.006 and MCD12Q1.051 zip files and unzip them into the inDir directory to follow along in the tutorial.

Search for the GeoTIFF files in the working directory, create a list of the files, and print the list of MCD12Q1.051 files.

In [3]:
EVIFiles = glob.glob('MOD13Q1.006__250m_16_days_EVI_**.tif') # Search for and create a list of EVI files
LCTFiles =  glob.glob('MCD12Q1.051_Land_Cover_Type**.tif')   # Search for and create a list of LCT files
LCTFiles
Out[3]:
['MCD12Q1.051_Land_Cover_Type_1_doy2008001_aid0001.tif',
 'MCD12Q1.051_Land_Cover_Type_1_doy2009001_aid0001.tif',
 'MCD12Q1.051_Land_Cover_Type_1_doy2010001_aid0001.tif',
 'MCD12Q1.051_Land_Cover_Type_1_doy2011001_aid0001.tif',
 'MCD12Q1.051_Land_Cover_Type_1_doy2012001_aid0001.tif',
 'MCD12Q1.051_Land_Cover_Type_1_doy2013001_aid0001.tif']

The products used here include the version 5.1 MODIS Combined Yearly Land Cover Type 1 (MCD12Q1.051) and version 6 MODIS Terra 16-day Vegetation Indices (MOD13Q1.006) products. Start with the first MOD13Q1.006 GeoTIFF file.

Notice below that the list index for EVIFiles is set to 0, indicating that we want the first file in the list. Read the file in by calling the Open() function from the Geospatial Data Abstraction library (GDAL).

In [4]:
EVI = gdal.Open(EVIFiles[0])                    # Read file in, starting with MOD13Q1 version 6
EVIBand = EVI.GetRasterBand(1)                  # Read the band (layer)
EVIData = EVIBand.ReadAsArray().astype('float') # Import band as an array with type float

1c. Reading GeoTIFF Metadata

Below, read the metadata in the GeoTIFF file. Start by using the AppEEARS output filename for some basic metadata.

In [5]:
print(EVIFiles[0])
MOD13Q1.006__250m_16_days_EVI_doy2008113_aid0001.tif

Above is an example of an AppEEARS Output file name. The standard format is as follows:

Product.Version_layer_doyYYYYDDD_aid####.tif

Below, use the split() function to split the AppEEARS filename string into product.version, layer, area id (aid) and date of the observation, converting from YYYYDDD to MM/DD/YYYY.

In [6]:
# File name metadata:
productId = EVIFiles[0].split('_')[0]                                          # First: product name
layerId = EVIFiles[0].split(productId + '_')[1].split('_doy')[0]               # Second: layer name
yeardoy = EVIFiles[0].split(layerId+'_doy')[1].split('_aid')[0]                # Third: date
aid = EVIFiles[0].split(yeardoy+'_')[1].split('.tif')[0]                       # Fourth: unique ROI identifier (aid)
date = dt.datetime.strptime(yeardoy, '%Y%j').strftime('%m/%d/%Y')              # Convert YYYYDDD to MM/DD/YYYY
print('Product Name: {}\nLayer Name: {}\nDate of Observation: {}'.format(productId, layerId, date))
Product Name: MOD13Q1.006
Layer Name: _250m_16_days_EVI
Date of Observation: 04/22/2008

Below, use the GDAL function GetMetadata() to read the metadata stored within the GeoTIFF file. Store the information as variables to be used at the end of the tutorial for exporting the masked arrays back to GeoTIFF files.

In [7]:
# File Metadata
EVI_meta = EVI.GetMetadata()                   # Store metadata in dictionary
rows, cols = EVI.RasterYSize, EVI.RasterXSize  # Number of rows,columns

# Projection information
geotransform = EVI.GetGeoTransform()
proj= EVI.GetProjection() 

# Band metadata
EVIFill = EVIBand.GetNoDataValue()            # Returns fill value
EVIStats = EVIBand.GetStatistics(True, True)  # returns min, max, mean, and standard deviation
EVI = None                                    # Close the GeoTIFF file
print('Min EVI: {}\nMax EVI: {}\nMean EVI: {}\nSD EVI: {}'.format(EVIStats[0],EVIStats[1], EVIStats[2], EVIStats[3]))
Min EVI: -1002.0
Max EVI: 7570.0
Mean EVI: 2097.7074928952597
SD EVI: 960.9360144243974

Notice above that the EVI data are unscaled.


1d. Applying Scale Factors

Next, search for the scale factor in the metadata and apply it to the unscaled array.

In [8]:
scaleFactor = float(EVI_meta['scale_factor'])  # Search the metadata dictionary for the scale factor 
units = EVI_meta['units']                      # Search the metadata dictionary for the units
EVIData[EVIData == EVIFill] = np.nan           # Set the fill value equal to NaN for the array
EVIScaled = EVIData * scaleFactor              # Apply the scale factor using simple multiplication

# Generate statistics on the scaled data
EVIStats_sc = [np.nanmin(EVIScaled), np.nanmax(EVIScaled), np.nanmean(EVIScaled), np.nanstd(EVIScaled)] # Create a list of stats
print('Min EVI: {}\nMax EVI: {}\nMean EVI: {}\nSD EVI: {}'.format(EVIStats_sc[0],EVIStats_sc[1], EVIStats_sc[2], EVIStats_sc[3]))
Min EVI: -0.1277
Max EVI: 0.8963000000000001
Mean EVI: 0.21003303941694598
SD EVI: 0.09669416460347872

Above, notice the statistics have changed and appear to fall within a normal distribution (-1 to 1) for EVI values.


2. Filtering Data

2a. Reading Quality Lookup Tables (LUTs) from CSV Files

In this section, take the quality lookup table (LUT) CSV file from an AppEEARS request, read it into Python using the pandas library, and use it to help set the custom quality masking parameters.

The quality LUT for your request (files ending in lookup.csv) can be found under Supporting Files on the Download Area Sample page:

For more information on how to retrieve the quality LUTs from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.

Once you have saved the MOD13Q1.006 Quality Lookup Table to the inDir directory, search the directory for the LUT and VI Quality GeoTIFFs.

In [9]:
lut = glob.glob('**lookup.csv')                                        # Search for look up table 
qualityFiles =glob.glob('MOD13Q1.006__250m_16_days_VI_Quality**.tif')  # Search the directory for the associated quality .tifs
quality = gdal.Open(qualityFiles[0])                                   # Open the first quality file
qualityData = quality.GetRasterBand(1).ReadAsArray()                   # Read in as an array
quality = None                                                         # Close the quality file

Below, use the read_csv function from the pandas library to read the CSV file in as a pandas dataframe.

In [10]:
v6_QA_lut = pd.read_csv(lut[0])     # Read in the lut
v6_QA_lut.head()                    # print the first few rows of the pandas dataframe
Out[10]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2057 VI produced, but check other QA Decreasing quality Climatology No No No Land (Nothing else but land) No No
1 2058 Pixel produced, but most probably cloudy Decreasing quality Climatology No No No Land (Nothing else but land) No No
2 2061 VI produced, but check other QA Decreasing quality Climatology No No No Land (Nothing else but land) No No
3 2062 Pixel produced, but most probably cloudy Decreasing quality Climatology No No No Land (Nothing else but land) No No
4 2112 VI produced with good quality Highest quality Low No No No Land (Nothing else but land) No No

Above shows the quality information contained in the LUT. "Value" refers to the pixel value, followed by each specific bit-word beginning with MODLAND. AppEEARS decodes each binary quality bit-field into more meaningful information for the user.


2b. Masking by Quality

This section demonstrates how to take the list of quality values imported above, and mask the quality and EVI arrays to include only pixels where the quality value is in the list of 'good quality' values based on the parameters defined below.

Quality parameters used to define the masking process:

1. MODLAND = 0 or 1

0 = good quality 
1 = check other QA  

2. VI usefulness <= 11

0000    Highest quality = 0  
0001    Lower quality = 1  
0010    Decreasing quality = 2  
...  
1011   Decreasing quality = 11      

3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0

0 = No

4. Aerosol Quantity = 1 or 2

1 = low aerosol
2 = average aerosol  

Below, use the pixel quality flag parameters outlined above to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters desired.

In [11]:
# Include good quality based on MODLAND
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(['VI produced with good quality', 'VI produced, but check other QA'])]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(['Low','Average'])]   # Include low or average aerosol
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]           # Include where adjacent cloud not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]                      # Include where mixed clouds not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]                   # Include where possible shadow not detected
v6_QA_lut
Out[11]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
4 2112 VI produced with good quality Highest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced with good quality Lower quality Low No No No Land (Nothing else but land) No No
13 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
16 2185 VI produced, but check other QA Decreasing quality Average No No No Land (Nothing else but land) No No
19 2189 VI produced, but check other QA Decreasing quality Average No No No Land (Nothing else but land) No No
66 4160 VI produced with good quality Highest quality Low No No No Ocean coastlines and lake shorelines No No
69 4164 VI produced with good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
72 4168 VI produced with good quality Decreasing quality Low No No No Ocean coastlines and lake shorelines No No
75 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
78 4233 VI produced, but check other QA Decreasing quality Average No No No Ocean coastlines and lake shorelines No No
119 6208 VI produced with good quality Highest quality Low No No No Shallow inland water No No
122 6212 VI produced with good quality Lower quality Low No No No Shallow inland water No No
125 6216 VI produced with good quality Decreasing quality Low No No No Shallow inland water No No
129 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
131 6281 VI produced, but check other QA Decreasing quality Average No No No Shallow inland water No No
Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it includes all rows where the MODLAND column is either 'VI_produced with good quality' or 'VI produced, but check other QA'. However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

Next, demonstrate how to take the list of possible values (based on table of good quality values above), and mask the quality array. Ultimately, use the masked quality array to mask the EVI array to include only pixels where the quality value is in the list of 'good quality' values.

In [12]:
goodQuality = list(v6_QA_lut['Value']) # Retrieve list of possible QA values from the quality dataframe
print(goodQuality)
[2112, 2116, 2181, 2185, 2189, 4160, 4164, 4168, 4229, 4233, 6208, 6212, 6216, 6277, 6281]

So above, only pixels with those specific _250m_16_days_VI_Quality values will be included after masking.

Below, mask the EVI array using the quality mask. Notice that invert is set to "True" here, meaning that you do not want to exclude the list of values above, but rather exclude all other values.

In [13]:
EVI_masked = np.ma.MaskedArray(EVIScaled, np.in1d(qualityData, goodQuality, invert = True))    # Apply QA mask to the EVI data

2c. Masking by Land Cover Type

Now, take the quality filtered data and apply a mask by land cover type, only keeping those pixels defined as 'croplands' in the MCD12Q1.051 Land Cover Type 1 file. We use Scipy to resample the LCT array (500 m resolution) down to 250 m resolution to match the MOD13Q1.006 EVI array.

In [14]:
LCT = gdal.Open(LCTFiles[0])                              # Open the 2008 land cover type 1 file
LCTData = LCT.GetRasterBand(1).ReadAsArray()              # Read the file as an array
year =LCTFiles[0].split('_doy')[1].split('_aid')[0][0:4]  # Grab the year from the file name
LCT_resampled = scipy.ndimage.zoom(LCTData, 2, order=0)   # Resample by a factor of 2 with nearest neighbor interpolation

# Cropland =12 for Land Cover Type 1 (https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mcd12q1)
cropland = 12 

# Resampling leads to 1 extra row and column, so delete first row and last column to align arrays
LCT_resampled = np.delete(LCT_resampled, 0, 0)
LCT_resampled = np.delete(LCT_resampled, 3121, 1)
EVI_crops = np.ma.MaskedArray(EVI_masked, np.in1d(LCT_resampled, cropland, invert = True)) # Mask array, only include Croplands
LCT = None                                                                                 # Close the file

3. Visualization and Statistics

In [15]:
# Set matplotlib plots inline
%matplotlib inline 

3a. Visualizing GeoTIFF Files

In this section, begin by highlighting the functionality of the matplotlib plotting package. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Finally, export the completed plot to an image file.

In [16]:
plt.imshow(EVIScaled);  # Visualize a basic plot of the scaled EVI data

Above, adding ; to the end of the call suppresses the text output from appearing in the notebook. The output is a basic matplotlib plot of the scaled EVI array showing the state of Iowa, United States.

Next, add some additional parameters to the visualization.

In [17]:
plt.figure(figsize = (10,7.5))    # Set the figure size (x,y)
plt.axis('off')                   # Remove the axes' values

# Plot the array, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(EVIScaled, vmin = np.nanmin(EVIScaled), vmax = np.nanmax(EVIScaled), cmap = 'YlGn');

Above, notice a better visual representation of the data. However, it still is missing important items, such as a title and a colormap legend.

Plot the quality-filtered, land cover-masked EVI array, add a title/subtitle and colormap legend, and export to a .png file in the output directory.

In [18]:
fig = plt.figure(figsize=(10,6.5))                                                                 # Set the figure size
plt.axis('off')                                                                                    # Remove the axes' values
fig.suptitle('MODIS Version 6 EVI Quality-Filtered Cropland Data', fontsize=16, fontweight='bold') # Make a figure title
ax = fig.add_subplot(111)                                                                          # Make a subplot
fig.subplots_adjust(top=3.8)                                                                       # Adjust spacing
ax.set_title('Iowa: {}'.format(date), fontsize=12, fontweight='bold')                              # Add figure subtitle
ax1 = plt.gca()                                                                                    # Get current axes

# Plot the masked data, using a colormap and setting a custom linear stretch based on the min/max EVI values
im = ax1.imshow(EVI_crops, vmin=EVI_crops.min(), vmax=EVI_crops.max(), cmap='YlGn');

# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction=0.047, pad=0.004, label='EVI', shrink=0.6).outline.set_visible(True)

# Set up file name and export to png file
fig.savefig('{}{}.png'.format(outDir, (EVIFiles[0].split('.tif'))[0]), bbox_inches='tight')