# Masking, Visualizing, and Plotting AppEEARS Output GeoTIFF Time Series¶

### This tutorial demonstrates how to use Python to explore time series data in GeoTIFF format generated from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler.¶

AppEEARS allows users to select GeoTIFF as an output file format for Area Requests. This tutorial provides step-by-step instructions for how to filter data in AppEEARS based on specific quality indicators and land cover types, re-create AppEEARS output visualizations and statistics, and generate additional plots to visualize time series data in Python.

### Case Study: An Analysis of Field Crop Yield Estimates vs. MODIS Enhanced Vegetation Index (EVI) Data¶

Reference:

Johnson, D. M. (2016). A comprehensive assessment of the correlations between field crop yields and commonly used MODIS products. International Journal of Applied Earth Observation and Geoinformation, 52, 65-81, https://doi.org/10.1016/j.jag.2016.05.010.

What: The study provides an assessment of correlations between field crop yield estimates and MODIS products.
Why: To explore the range of temperature and vegetation products available at regional-scale (MODIS) to document the relationships between remotely sensed measurements and yield for a variety of crop types.
How: The study analyzed correlation and timing of several MODIS products against field crop yields for ten crops at the county-level.
Where: Conterminous United States, MODIS Tiles: h09-v04,v05,v06; h10-v04,v05,v06; h11-v04,v05
When: April-October 2008-2013
AppEEARS Request Parameters:

• Date (Recurring): April 16 to October 16, 2008-2013
• Data layers:
• Terra MODIS Vegetation Indices
• Combined MODIS Land Cover Type
• Location: Iowa, United States
• Output File Format: GeoTIFF
• Output Projection: Geographic (EPSG: 4326)

Field Crop Yield Estimates Request Parameters: This tutorial uses annual crop yields statistics from the United States Department of Agriculture National Agricultural Statistics Service (USDA NASS). The statistics used include annual surveys of corn and soybean yields by year for the state of Iowa, with yield measured in BU/Acre. The data are available for download through the QuickStats platform.

# Topics Covered:¶

1. Getting Started
1a. Setting Up the Working Environment
1b. Opening a GeoTIFF (.tif) File
1d. Applying Scale Factors
2. Filtering Data
2a. Reading Quality Lookup Tables (LUTs) from CSV Files
2c. Masking by Land Cover Type
3. Visualizations and Statistics
3a. Visualizing GeoTIFF Files
3b. Generating Statistics
3c. Visualizing Statistics
4. Working with Time Series (Automation)
4a. Batch Processing
4b. Visualizing MODIS Time Series
4c. Generating Statistics
4d. Visualizing Statistics

### AppEEARS Information:¶

To access AppEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For information on how to make an AppEEARS request, visit: https://lpdaacsvc.cr.usgs.gov/appeears/help

### Statistics:¶

The statistics covered in this tutorial are the same as the statistics available in AppEEARS. These include box plot statistics, such as the total number of values/pixels (n), the median, and the interquartile range (IQR) of the sample data for values within a feature from a selected date/layer. Also calculated are the Lower 1.5 IQR, which represent the lowest datum still within 1.5 IQR of the lower quartile, and the Upper 1.5 IQR, which represent the highest datum still within 1.5 IQR of the upper quartile. Additional statistics generated for each layer/observation/feature combination include min/max/range, mean, standard deviation, and variance. Frequency distributions also will be calculated for the categorical land cover type 1 layer.

### Source Code used to Generate this Tutorial:¶

NOTE: Be aware that any reprojection of data from its source projection to a different projection will inherently change the data from its original format. All reprojections use the [GDAL](http://www.gdal.org/) [gdalwarp](http://www.gdal.org/gdalwarp.html) function with the PROJ.4 string listed above. For additional information, see the AppEEARS [help documentation](https://lpdaacsvc.cr.usgs.gov/appeears/help).

# 1. Getting Started

#### Import the python libraries needed to run this tutorial. If you are missing a package, you will need to download it to execute the notebook.¶

In [1]:
# Import libraries
import os
import glob
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
import pandas as pd
import datetime as dt


## 1a. Setting Up the Working Environment¶

#### Next, set your input/working directory, and create an output directory for the results.¶

In [2]:
# Set input directory, and change working directory
inDir = 'C:/Users/ckrehbiel/Documents/appeears-tutorials/GeoTIFF/'            # IMPORTANT: Update to reflect directory on your OS
os.chdir(inDir)                                                               # Change to working directory
outDir = os.path.normpath(os.path.split(inDir)[0] + os.sep + 'output') + '\\' # Create and set output directory
if not os.path.exists(outDir): os.makedirs(outDir)


If you plan to execute this tutorial on your own OS, inDir above needs to be changed.

## 1b. Opening a GeoTIFF (.tif) File¶

#### Search for the GeoTIFF files in the working directory, create a list of the files, and print the list of MCD12Q1.051 files.¶

In [3]:
EVIFiles = glob.glob('MOD13Q1.006__250m_16_days_EVI_**.tif') # Search for and create a list of EVI files
LCTFiles =  glob.glob('MCD12Q1.051_Land_Cover_Type**.tif')   # Search for and create a list of LCT files
LCTFiles

Out[3]:
['MCD12Q1.051_Land_Cover_Type_1_doy2008001_aid0001.tif',
'MCD12Q1.051_Land_Cover_Type_1_doy2009001_aid0001.tif',
'MCD12Q1.051_Land_Cover_Type_1_doy2010001_aid0001.tif',
'MCD12Q1.051_Land_Cover_Type_1_doy2011001_aid0001.tif',
'MCD12Q1.051_Land_Cover_Type_1_doy2012001_aid0001.tif',
'MCD12Q1.051_Land_Cover_Type_1_doy2013001_aid0001.tif']

#### Notice below that the list index for EVIFiles is set to 0, indicating that we want the first file in the list. Read the file in by calling the Open() function from the Geospatial Data Abstraction library (GDAL).¶

In [4]:
EVI = gdal.Open(EVIFiles[0])                    # Read file in, starting with MOD13Q1 version 6
EVIBand = EVI.GetRasterBand(1)                  # Read the band (layer)
EVIData = EVIBand.ReadAsArray().astype('float') # Import band as an array with type float


#### Below, read the metadata in the GeoTIFF file. Start by using the AppEEARS output filename for some basic metadata.¶

In [5]:
print(EVIFiles[0])

MOD13Q1.006__250m_16_days_EVI_doy2008113_aid0001.tif


## Product.Version_layer_doyYYYYDDD_aid####.tif¶

#### Below, use the split() function to split the AppEEARS filename string into product.version, layer, area id (aid) and date of the observation, converting from YYYYDDD to MM/DD/YYYY.¶

In [6]:
# File name metadata:
productId = EVIFiles[0].split('_')[0]                                          # First: product name
layerId = EVIFiles[0].split(productId + '_')[1].split('_doy')[0]               # Second: layer name
yeardoy = EVIFiles[0].split(layerId+'_doy')[1].split('_aid')[0]                # Third: date
aid = EVIFiles[0].split(yeardoy+'_')[1].split('.tif')[0]                       # Fourth: unique ROI identifier (aid)
date = dt.datetime.strptime(yeardoy, '%Y%j').strftime('%m/%d/%Y')              # Convert YYYYDDD to MM/DD/YYYY
print('Product Name: {}\nLayer Name: {}\nDate of Observation: {}'.format(productId, layerId, date))

Product Name: MOD13Q1.006
Layer Name: _250m_16_days_EVI
Date of Observation: 04/22/2008


#### Below, use the GDAL function GetMetadata() to read the metadata stored within the GeoTIFF file. Store the information as variables to be used at the end of the tutorial for exporting the masked arrays back to GeoTIFF files.¶

In [7]:
# File Metadata
rows, cols = EVI.RasterYSize, EVI.RasterXSize  # Number of rows,columns

# Projection information
geotransform = EVI.GetGeoTransform()
proj= EVI.GetProjection()

EVIFill = EVIBand.GetNoDataValue()            # Returns fill value
EVIStats = EVIBand.GetStatistics(True, True)  # returns min, max, mean, and standard deviation
EVI = None                                    # Close the GeoTIFF file
print('Min EVI: {}\nMax EVI: {}\nMean EVI: {}\nSD EVI: {}'.format(EVIStats[0],EVIStats[1], EVIStats[2], EVIStats[3]))

Min EVI: -1002.0
Max EVI: 7570.0
Mean EVI: 2097.7074928952597
SD EVI: 960.9360144243974


## 1d. Applying Scale Factors¶

#### Next, search for the scale factor in the metadata and apply it to the unscaled array.¶

In [8]:
scaleFactor = float(EVI_meta['scale_factor'])  # Search the metadata dictionary for the scale factor
units = EVI_meta['units']                      # Search the metadata dictionary for the units
EVIData[EVIData == EVIFill] = np.nan           # Set the fill value equal to NaN for the array
EVIScaled = EVIData * scaleFactor              # Apply the scale factor using simple multiplication

# Generate statistics on the scaled data
EVIStats_sc = [np.nanmin(EVIScaled), np.nanmax(EVIScaled), np.nanmean(EVIScaled), np.nanstd(EVIScaled)] # Create a list of stats
print('Min EVI: {}\nMax EVI: {}\nMean EVI: {}\nSD EVI: {}'.format(EVIStats_sc[0],EVIStats_sc[1], EVIStats_sc[2], EVIStats_sc[3]))

Min EVI: -0.1277
Max EVI: 0.8963000000000001
Mean EVI: 0.21003303941694598
SD EVI: 0.09669416460347872


# 2. Filtering Data

## 2a. Reading Quality Lookup Tables (LUTs) from CSV Files¶

#### Once you have saved the MOD13Q1.006 Quality Lookup Table to the inDir directory, search the directory for the LUT and VI Quality GeoTIFFs.¶

In [9]:
lut = glob.glob('**lookup.csv')                                        # Search for look up table
qualityFiles =glob.glob('MOD13Q1.006__250m_16_days_VI_Quality**.tif')  # Search the directory for the associated quality .tifs
quality = gdal.Open(qualityFiles[0])                                   # Open the first quality file
quality = None                                                         # Close the quality file


#### Below, use the read_csv function from the pandas library to read the CSV file in as a pandas dataframe.¶

In [10]:
v6_QA_lut = pd.read_csv(lut[0])     # Read in the lut
v6_QA_lut.head()                    # print the first few rows of the pandas dataframe

Out[10]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2057 VI produced, but check other QA Decreasing quality Climatology No No No Land (Nothing else but land) No No
1 2058 Pixel produced, but most probably cloudy Decreasing quality Climatology No No No Land (Nothing else but land) No No
2 2061 VI produced, but check other QA Decreasing quality Climatology No No No Land (Nothing else but land) No No
3 2062 Pixel produced, but most probably cloudy Decreasing quality Climatology No No No Land (Nothing else but land) No No
4 2112 VI produced with good quality Highest quality Low No No No Land (Nothing else but land) No No

### Quality parameters used to define the masking process:¶

#### 1. MODLAND = 0 or 1¶

0 = good quality
1 = check other QA


#### 2. VI usefulness <= 11¶

0000    Highest quality = 0
0001    Lower quality = 1
0010    Decreasing quality = 2
...
1011   Decreasing quality = 11


#### 3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0¶

0 = No


#### 4. Aerosol Quantity = 1 or 2¶

1 = low aerosol
2 = average aerosol  

#### Below, use the pixel quality flag parameters outlined above to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters desired.¶

In [11]:
# Include good quality based on MODLAND
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(['VI produced with good quality', 'VI produced, but check other QA'])]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(['Low','Average'])]   # Include low or average aerosol
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]           # Include where adjacent cloud not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]                      # Include where mixed clouds not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]                   # Include where possible shadow not detected
v6_QA_lut

Out[11]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
4 2112 VI produced with good quality Highest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced with good quality Lower quality Low No No No Land (Nothing else but land) No No
13 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
16 2185 VI produced, but check other QA Decreasing quality Average No No No Land (Nothing else but land) No No
19 2189 VI produced, but check other QA Decreasing quality Average No No No Land (Nothing else but land) No No
66 4160 VI produced with good quality Highest quality Low No No No Ocean coastlines and lake shorelines No No
69 4164 VI produced with good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
72 4168 VI produced with good quality Decreasing quality Low No No No Ocean coastlines and lake shorelines No No
75 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
78 4233 VI produced, but check other QA Decreasing quality Average No No No Ocean coastlines and lake shorelines No No
119 6208 VI produced with good quality Highest quality Low No No No Shallow inland water No No
122 6212 VI produced with good quality Lower quality Low No No No Shallow inland water No No
125 6216 VI produced with good quality Decreasing quality Low No No No Shallow inland water No No
129 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
131 6281 VI produced, but check other QA Decreasing quality Average No No No Shallow inland water No No
Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it includes all rows where the MODLAND column is either 'VI_produced with good quality' or 'VI produced, but check other QA'. However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

#### Next, demonstrate how to take the list of possible values (based on table of good quality values above), and mask the quality array. Ultimately, use the masked quality array to mask the EVI array to include only pixels where the quality value is in the list of 'good quality' values.¶

In [12]:
goodQuality = list(v6_QA_lut['Value']) # Retrieve list of possible QA values from the quality dataframe
print(goodQuality)

[2112, 2116, 2181, 2185, 2189, 4160, 4164, 4168, 4229, 4233, 6208, 6212, 6216, 6277, 6281]


#### Below, mask the EVI array using the quality mask. Notice that invert is set to "True" here, meaning that you do not want to exclude the list of values above, but rather exclude all other values.¶

In [13]:
EVI_masked = np.ma.MaskedArray(EVIScaled, np.in1d(qualityData, goodQuality, invert = True))    # Apply QA mask to the EVI data


# 2c. Masking by Land Cover Type¶

#### Now, take the quality filtered data and apply a mask by land cover type, only keeping those pixels defined as 'croplands' in the MCD12Q1.051 Land Cover Type 1 file. We use Scipy to resample the LCT array (500 m resolution) down to 250 m resolution to match the MOD13Q1.006 EVI array.¶

In [14]:
LCT = gdal.Open(LCTFiles[0])                              # Open the 2008 land cover type 1 file
year =LCTFiles[0].split('_doy')[1].split('_aid')[0][0:4]  # Grab the year from the file name
LCT_resampled = scipy.ndimage.zoom(LCTData, 2, order=0)   # Resample by a factor of 2 with nearest neighbor interpolation

# Cropland =12 for Land Cover Type 1 (https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mcd12q1)
cropland = 12

# Resampling leads to 1 extra row and column, so delete first row and last column to align arrays
LCT_resampled = np.delete(LCT_resampled, 0, 0)
LCT_resampled = np.delete(LCT_resampled, 3121, 1)
LCT = None                                                                                 # Close the file


# 3. Visualization and Statistics

In [15]:
# Set matplotlib plots inline
%matplotlib inline


## 3a. Visualizing GeoTIFF Files¶

#### In this section, begin by highlighting the functionality of the matplotlib plotting package. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Finally, export the completed plot to an image file.¶

In [16]:
plt.imshow(EVIScaled);  # Visualize a basic plot of the scaled EVI data


#### Above, adding ; to the end of the call suppresses the text output from appearing in the notebook. The output is a basic matplotlib plot of the scaled EVI array showing the state of Iowa, United States.¶

In [17]:
plt.figure(figsize = (10,7.5))    # Set the figure size (x,y)
plt.axis('off')                   # Remove the axes' values

# Plot the array, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(EVIScaled, vmin = np.nanmin(EVIScaled), vmax = np.nanmax(EVIScaled), cmap = 'YlGn');


#### Plot the quality-filtered, land cover-masked EVI array, add a title/subtitle and colormap legend, and export to a .png file in the output directory.¶

In [18]:
fig = plt.figure(figsize=(10,6.5))                                                                 # Set the figure size
plt.axis('off')                                                                                    # Remove the axes' values
fig.suptitle('MODIS Version 6 EVI Quality-Filtered Cropland Data', fontsize=16, fontweight='bold') # Make a figure title
ax = fig.add_subplot(111)                                                                          # Make a subplot