Working with AppEEARS NetCDF-4 Output Data in Python

Details:

Published: Dec. 6, 2017

Working with AppEEARS NetCDF-4 Output Data in Python

This tutorial demonstrates how to work with data output from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler in Python.
AppEEARS allows users to select NetCDF-4 as an output file format for Area Requests. This tutorial provides step-by-step instructions for filtering NetCDF-4 files based on specific quality and land cover type determinations.

Case Study:

Throughout the tutorial, the following study is used as an example of how and why quality filtering is necessary in order to arrive at scientifically sound conclusions when working with remote sensing data. The tutorial follows similar data and methods of this study as closely as possible to demonstrate how to work with AppEEARS NetCDF-4 output files in Python.
  • Samanta, A., Ganguly, S., Hashimoto, H., Devadiga, S., Vermote, E., Knyazikhin, Y., Nemani, R.R., and Myneni, R.B., 2010, Amazon forests did not green‐up during the 2005 drought: Geophysical Research Letters, v. 37, no. 5.

What: The study tries to reproduce reports from previous studies showing greening of Amazon forests during the 2005 drought.
Why: To demonstrate the importance of considering remote sensing data quality, and reconcile contradicting reports of increased tree mortality and biomass burning with reports of anomalous Amazon forest greening.
How: The authors replicated a previous study claiming that Amazon forests were greening during the 2005 drought, with very specific quality and land cover type screening in their methods. They then used the quality-filtered data to calculate standard anomalies of EVI in order to arrive at their conclusions.
Where: Amazon Rainforest, from latitude: 20° S to 10° N, and longitude: 80° W to 45° W
When: July-September 2000-2008
AppEEARS Request Parameters:
Request time series data for MODIS Enhance Vegetation Indices (EVI) and Land Cover.

  • Date: July 01, 2005 to September 30, 2005
  • Data layers:
    • Terra MODIS Vegetation Indices
    • Combined MODIS Land Cover Type
      • MCD12Q1.051, 500m, Yearly: 'Land_Cover_Type_1'
  • Location: 20° S to 10° N, and longitude: 80° W to 45° W
  • File Format: NetCDF-4
  • Projection: Geographic (EPSG: 4326)

AppEEARS Information:

To access AppEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For information on how to make an AppEEARS request, visit: https://lpdaacsvc.cr.usgs.gov/appeears/help

Topics Covered:

  1. Setting Up the Working Environment
  2. Downloading AppEEARS Output in Python from Text File
  3. Opening a NetCDF-4 (.nc) File
  4. Reading NetCDF-4 File Metadata
  5. Reading Quality Lookup Tables (LUTs) from CSV Files
  6. Setting Custom QA/QC Filtering (Masking)
  7. Visualizing NetCDF-4 Data
  8. Masking by Land Cover Type
  9. Visualizing NetCDF-4 Data Using Basemap
  10. Generating Statistics on Quality Filtered Data
  11. Automating QA/QC Filtering
  12. Visualizing Time Series
  13. Exporting NetCDF-4 Files

Prerequisites:

  • Python 2.7
Libraries:
  • urllib2
  • shutil
  • os
  • glob
  • numpy
  • netCDF4
  • matplotlib
  • pandas
  • warnings
  • Basemap
  • scipy

NOTE: You will need to have cURL installed on your OS in order to perform topic two (downloading). If you are unable to download cURL, you can skip step two and download the files needed to run the rest of the tutorial in the section below.

The AppEEARS files used in this tutorial can be downloaded from:

The source code used to generate this tutorial can be downloaded from:


1. Setting Up A Working Environment

First, import the python libraries needed to run this tutorial. If you are missing any of the packages, you will need to download them in order to run the entire notebook.
# Set matplotlib plots inline
%matplotlib inline

# Import libraries
import urllib2
import shutil
import os
import glob
import numpy as np
import netCDF4
from netCDF4 import Dataset
Next, set your input/working directory, and create an output directory for the results.
# Set input directory, and change working directory
in_dir = 'C:/Users/ckrehbiel/Documents/appeears-tutorials/' # IMPORTANT: Need to update this to reflect your working directory
os.chdir(in_dir)

# Create and set output directory
out_dir = os.path.normpath((os.path.split(in_dir)[0] + os.sep + 'output/' ))+ '\\'
if not os.path.exists(out_dir):
    os.makedirs(out_dir)

2. Downloading AppEEARS Output in Python from Text File

Now take an AppEEARS output download text file and download all of your NetCDF-4 output files.

AppEEARS Download Textfile Example

For more information on how to retrieve the output download text file from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.
If you plan to use the files for the use case in this tutorial, you will need to download the text file of AppEEARS output file links and make sure that the text file is located in your input directory.
# Search for, open, and read text file containing links to AppEEARS output files
download_file = open(glob.glob('**.txt')[0],'r')
download_link = download_file.readlines()

# Loop through text file and download all AppEEARS outputs
for i in range(len(download_link)):  
    file_name = download_link[i].split('/')[-1].strip()

    # Download the file
    response = urllib2.urlopen(download_link[i])
    out_file = open(file_name, 'wb')
    shutil.copyfileobj(response, out_file)
    print('Downloaded file: '+ file_name + ' (' + str(i+1) + " of " + str(len(download_link)) + ')')
    out_file.close()

# Once you have finished downloading the files, remove variables not needed for the remainder of the tutorial
del download_link, file_name, download_file
Downloaded file: MCD12Q1.051_aid0001.nc (1 of 2)
Downloaded file: MOD13A2.006_aid0001.nc (2 of 2)
Notice that two files were downloaded. For an AppEEARS Area Sample request, if NetCDF-4 is selected as the output format, outputs will be grouped into .nc files by product and by feature.

3. Opening a NetCDF-4 (.nc) File

Search for the NetCDF-4 files that you just downloaded in your working directory, create a list of the files, and print the list.
# Now that we have files downloaded, search for downloaded files
file_list = glob.glob('**.nc')
file_list
['MCD12Q1.051_aid0001.nc', 'MOD13A2.006_aid0001.nc']
Notice that the request includes the version 5.1 MODIS Combined Yearly Land Cover Type (MCD12Q1.051) and version 6 MODIS Terra 16-day Vegetation Indices (MOD13A2.006) products. Start with the MOD13A2 version 6 NetCDF file.
Notice below that the file_list index is set to 1, to select the MOD13A2.006.nc file above [0,1]. Read the file in by calling the Dataset() function from the NetCDF4 library.
# Read file in, starting with MOD13A2 version 6
file_in = Dataset(file_list[1],'r', format = 'NETCDF4')

4. Reading NetCDF-4 File Metadata

Below, look at the variables and metadata in the NetCDF-4 file.
# Print a list of variables in file
list(file_in.variables)
[u'crs',
 u'time',
 u'lat',
 u'lon',
 u'_1_km_16_days_EVI',
 u'_1_km_16_days_VI_Quality']
_1_km_16_days_EVI and _1_km_16_days_VI_Quality are the data layers focused on for this tutorial, which are Enhanced Vegetation Index (EVI) and the corresponding quality data that were used in the case study.
Below, read in the metadata for the EVI dataset.
# Grab variable and variable metadata
v6_info = file_in.variables['_1_km_16_days_EVI']
v6_info
<type 'netCDF4._netCDF4.Variable'>
int16 _1_km_16_days_EVI(time, lat, lon)
    _FillValue: -3000
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: -2000
    valid_max: 10000
    long_name: 1 km 16 days EVI
    units: EVI
    scale_factor_err: 0.0
    add_offset_err: 0.0
    calibrated_nt: 5
    scale_factor: 0.0001
    add_offset: 0.0
unlimited dimensions: 
current shape = (6, 3600, 4200)
filling on
Above, notice the useful metadata, including things like units, scale factor, and fill value.
Below, set the dataset arrays to variables by adding [:].
# import the EVI and VI Quality datasets
v6_EVI = file_in.variables['_1_km_16_days_EVI'][:]
v6_QA = file_in.variables['_1_km_16_days_VI_Quality'][:]


# File dimensions
file_in.dimensions.values()
[<type 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 6,
 <type 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 3600,
 <type 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 4200]
Above, notice the NetCDF-4 file dimensions. 'time' indicates that the file includes 6 timesteps or layers (observations).
Next, grab the min/max values from the latitude and longitude arrays that are included in the NetCDF-4 files.
# Set lat and lon min/max for EVI data
lat_EVI_min = file_in.variables['lat'][:].min()
lat_EVI_max = file_in.variables['lat'][:].max()
lon_EVI_min = file_in.variables['lon'][:].min()
lon_EVI_max = file_in.variables['lon'][:].max()

5. Reading Quality Lookup Tables (LUTs) from CSV Files

In this section, take the quality lookup table (LUT) CSV file from an AppEEARS request, read it into Python using the pandas library, and use it to help set the quality masking parameters.
The quality LUT for your request (files ending in lookup.csv) can be found under Supporting Files on the Download Area Sample page:

Example of AppEEARS quality LUT csv files

For more information on how to retrieve the quality LUTs from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.
Next, search the working directory for the quality LUT CSV file.
# Search for look up table 
lut = glob.glob('**lookup.csv')
lut
['MOD13A2-006-1-km-16-days-VI-Quality-lookup.csv']
Below, import the pandas library and use the pd.read_csv function to read the CSV file in as a pandas dataframe.
import pandas as pd

# Read in the lut
v6_QA_lut = pd.read_csv(lut[0])
v6_QA_lut
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2050 Pixel produced, but most probably cloudy Higest quality Climatology No No No Land (Nothing else but land) No No
1 2057 VI produced, but check other QA Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
2 2058 Pixel produced, but most probably cloudy Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
3 2061 VI produced, but check other QA Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
4 2062 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
6 2114 Pixel produced, but most probably cloudy Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
8 2118 Pixel produced, but most probably cloudy Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
11 2182 Pixel produced, but most probably cloudy Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
13 2186 Pixel produced, but most probably cloudy Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
14 2241 VI produced, but check other QA Higest quality High No No No Land (Nothing else but land) No No
15 2242 Pixel produced, but most probably cloudy Higest quality High No No No Land (Nothing else but land) No No
16 2253 VI produced, but check other QA Decreasing quality (0011) High No No No Land (Nothing else but land) No No
17 2254 Pixel produced, but most probably cloudy Decreasing quality (0011) High No No No Land (Nothing else but land) No No
18 2257 VI produced, but check other QA Decreasing quality (0100) High No No No Land (Nothing else but land) No No
19 2258 Pixel produced, but most probably cloudy Decreasing quality (0100) High No No No Land (Nothing else but land) No No
20 2317 VI produced, but check other QA Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
21 2318 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
22 2321 VI produced, but check other QA Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
23 2322 Pixel produced, but most probably cloudy Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
24 2372 VI produced, good quality Lower quality Low Yes No No Land (Nothing else but land) No No
25 2374 Pixel produced, but most probably cloudy Lower quality Low Yes No No Land (Nothing else but land) No No
26 2376 VI produced, good quality Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
27 2378 Pixel produced, but most probably cloudy Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
28 2433 VI produced, but check other QA Higest quality Average Yes No No Land (Nothing else but land) No No
29 2441 VI produced, but check other QA Decreasing quality (0010) Average Yes No No Land (Nothing else but land) No No
... ... ... ... ... ... ... ... ... ... ...
287 39309 VI produced, but check other QA Decreasing quality (0011) Average Yes No No Shallow inland water No Yes
288 39313 VI produced, but check other QA Decreasing quality (0100) Average Yes No No Shallow inland water No Yes
289 39317 VI produced, but check other QA Decreasing quality (0101) Average Yes No No Shallow inland water No Yes
290 39361 VI produced, but check other QA Higest quality High Yes No No Shallow inland water No Yes
291 39362 Pixel produced, but most probably cloudy Higest quality High Yes No No Shallow inland water No Yes
292 39381 VI produced, but check other QA Decreasing quality (0101) High Yes No No Shallow inland water No Yes
293 39385 VI produced, but check other QA Decreasing quality (0110) High Yes No No Shallow inland water No Yes
294 39386 Pixel produced, but most probably cloudy Decreasing quality (0110) High Yes No No Shallow inland water No Yes
295 39389 VI produced, but check other QA Decreasing quality (0111) High Yes No No Shallow inland water No Yes
296 39390 Pixel produced, but most probably cloudy Decreasing quality (0111) High Yes No No Shallow inland water No Yes
297 39938 Pixel produced, but most probably cloudy Higest quality Climatology No No Yes Shallow inland water No Yes
298 39966 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology No No Yes Shallow inland water No Yes
299 39970 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology No No Yes Shallow inland water No Yes
300 40022 Pixel produced, but most probably cloudy Decreasing quality (0101) Low No No Yes Shallow inland water No Yes
301 40026 Pixel produced, but most probably cloudy Decreasing quality (0110) Low No No Yes Shallow inland water No Yes
302 40066 Pixel produced, but most probably cloudy Higest quality Average No No Yes Shallow inland water No Yes
303 40090 Pixel produced, but most probably cloudy Decreasing quality (0110) Average No No Yes Shallow inland water No Yes
304 40094 Pixel produced, but most probably cloudy Decreasing quality (0111) Average No No Yes Shallow inland water No Yes
305 40130 Pixel produced, but most probably cloudy Higest quality High No No Yes Shallow inland water No Yes
306 40162 Pixel produced, but most probably cloudy Decreasing quality (1000) High No No Yes Shallow inland water No Yes
307 40166 Pixel produced, but most probably cloudy Decreasing quality (1001) High No No Yes Shallow inland water No Yes
308 40194 Pixel produced, but most probably cloudy Higest quality Climatology Yes No Yes Shallow inland water No Yes
309 40222 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology Yes No Yes Shallow inland water No Yes
310 40226 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology Yes No Yes Shallow inland water No Yes
311 40230 Pixel produced, but most probably cloudy Decreasing quality (1001) Climatology Yes No Yes Shallow inland water No Yes
312 40386 Pixel produced, but most probably cloudy Higest quality High Yes No Yes Shallow inland water No Yes
313 40422 Pixel produced, but most probably cloudy Decreasing quality (1001) High Yes No Yes Shallow inland water No Yes
314 40426 Pixel produced, but most probably cloudy Decreasing quality (1010) High Yes No Yes Shallow inland water No Yes
315 51199 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Shallow ocean Yes Yes
316 63487 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Moderate or continental ocean Yes Yes

317 rows × 10 columns

Scroll through the table above to see the quality information contained in the LUT. Value refers to the pixel values, followed by the specific bit beginning with MODLAND. AppEEARS decodes each binary quality bit-field into more meaningful information for the user.

The case study used the following quality parameters to define their screening process:

1. MODLAND_QA flag = 0 or 1
    0 = good quality 
    1 = check other QA
2. VI usefulness <= 11
    0000    Highest quality = 0  
    0001    Lower quality = 1  
    0010    Decreasing quality = 2  
    ...  
    1011   Decreasing quality = 11
3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0
    0 = No
4. Aerosol Quantity = 1 or 2
    1 = low aerosol
    2 = average aerosol
Below, use the pixel quality flag parameters outlined above in order to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters outlined above.

Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it is including all rows where the MODLAND column is either 'VI_produced, good quality' or 'VI produced, but check other QA'.

However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

# Include good quality based on MODLAND
modland = ['VI produced, good quality', 'VI produced, but check other QA']
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason\/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

# Include low or average aerosol
AQ = ['Low','Average']
v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(AQ)]

# Include where adjacent cloud, mixed clouds, or possible shadow were not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]
v6_QA_lut
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
64 4160 VI produced, good quality Higest quality Low No No No Ocean coastlines and lake shorelines No No
66 4164 VI produced, good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
68 4225 VI produced, but check other QA Higest quality Average No No No Ocean coastlines and lake shorelines No No
69 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
71 4233 VI produced, but check other QA Decreasing quality (0010) Average No No No Ocean coastlines and lake shorelines No No
104 6208 VI produced, good quality Higest quality Low No No No Shallow inland water No No
106 6212 VI produced, good quality Lower quality Low No No No Shallow inland water No No
108 6273 VI produced, but check other QA Higest quality Average No No No Shallow inland water No No
110 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
112 6281 VI produced, but check other QA Decreasing quality (0010) Average No No No Shallow inland water No No

6. Setting Custom QA/QC Filtering (Masking)

This section demonstrates how to take the list of possible values (based on our quality requirements) above, and mask the EVI and quality arrays to only include pixels where the quality value is in the list of 'good quality' values based on the parameters from the case study.
# Print list of possible QA values based on parameters above
v6_QA_mask = list(v6_QA_lut['Value'])
del v6_QA_lut

v6_QA_mask
[2112,
 2116,
 2177,
 2181,
 2185,
 4160,
 4164,
 4225,
 4229,
 4233,
 6208,
 6212,
 6273,
 6277,
 6281]
So above, only pixels with those specific _1_km_16_days_VI_Quality values will be included after masking.
Below, mask the first observation (index 0) from the EVI array. Notice that invert is set to True here, meaning that you don't want to exclude the list of values above, but rather exclude every other value.
# Apply QA mask to the EVI data 
v6_EVI_masked = np.ma.MaskedArray(v6_EVI[0], np.in1d(v6_QA[0], v6_QA_mask, invert = True))

7. Visualizing NetCDF-4 Data

In this section, begin by highlighting the functionality of matplotlib plotting. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Lastly, export the finished plot to an image file.
import matplotlib.pyplot as plt

# Visualize a basic plot
plt.imshow(v6_EVI[0]); # Adding ';' to the end of the call suppresses the text output from showing in the notebook 

png

Above is a basic matplotlib plot of the EVI array over the Amazon basin.
Next, add some additional parameters to the visualization.
# Set the figure size
plt.figure(figsize = (10,10))

# Remove the axes' values
plt.axis('off')

# Plot the original data, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(v6_EVI[0], vmin = v6_EVI[0].min(), vmax = v6_EVI[0].max(), cmap = 'YlGn');

png

Above, notice a better visualization of the data, but it is still missing important items such as a title and a colormap legend.
Below, plot the quality filtered data, add a figure title/subtitle and colormap legend, and export the result to a .png file in the output directory.
# Set the figure size
fig = plt.figure(figsize = (10,10))

# Remove the axes' values
plt.axis('off')

# Make a figure title
fig.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI) Quality Filtered Data', fontsize = 12, fontweight = 'bold')

# Make a figure subtitle
ax = fig.add_subplot(111)
fig.subplots_adjust(top = 2.2)
ax.set_title('Amazon Rainforest: 07-12-2005')
ax1 = plt.gca()

# Plot the masked data
im = ax1.imshow(v6_EVI_masked, vmin = -0.2, vmax = 1.0, cmap = 'YlGn');

# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction = 0.047, pad = 0.02, label = 'EVI').outline.set_visible(True)

# Set up file name and export to png file
fig.savefig('{}{}_ndvi.png'.format(out_dir, (file_list[1].split('.nc'))[0]))

png


8. Masking by Land Cover Type

The case study limited the analysis to "forest" land covers. The example below demonstrates how to limit your analysis based on land cover.
# Use MCD12Q1.051 to define land cover types
file_list
['MCD12Q1.051_aid0001.nc', 'MOD13A2.006_aid0001.nc']
# set to 'MCD12Q1.051_aid0001.nc' and open
lct_file = Dataset(file_list[0],'r', format = 'NETCDF4')

# print a list of variables in file
list(lct_file.variables)
[u'crs', u'time', u'lat', u'lon', u'Land_Cover_Type_1', u'Land_Cover_Type_QC']
# Show the variable metadata
lct_file.variables['Land_Cover_Type_1']
&lt;type 'netCDF4._netCDF4.Variable'&gt;
int16 Land_Cover_Type_1(time, lat, lon)
    _FillValue: 255
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: 0
    valid_max: 254
    long_name: Land_Cover_Type_1
    units: class number
    water: 0
    evergreen_needleleaf_forest: 1
    evergreen_broadleaf_forest: 2
    deciduous_needleleaf_forest: 3
    deciduous_broadleaf_forest: 4
    mixed_forests: 5
    closed_shrubland: 6
    open_shrublands: 7
    woody_savannas: 8
    savannas: 9
    grasslands: 10
    permanent_wetlands: 11
    croplands: 12
    urban_and_built_up: 13
    cropland_natural_vegetation_mosaic: 14
    snow_and_ice: 15
    barren_or_sparsely_vegetated: 16
    unclassified: -2
unlimited dimensions: 
current shape = (1, 7200, 8400)
filling on
Above, notice the class number assignments. Below, mask the data to only include forest land cover types (LCT).
In the study, the researchers used the MOD12Q1 Collection 5 Land Cover product to identify forest pixels in their study area. This tutorial uses the newer MCD12Q1 Version 5.1 Land Cover Type product to only include pixels in our study region that are classified as one of five forest types:
  • evergreen_needleleaf_forest: 1
  • evergreen_broadleaf_forest: 2
  • deciduous_needleleaf_forest: 3
  • deciduous_broadleaf_forest: 4
  • mixed_forests: 5
# Open the Land_Cover_Type_1 dataset
lct = lct_file.variables['Land_Cover_Type_1'][:][0]

# Make a list of the forest land cover types outlined above
lct_forest = [1,2,3,4,5]

# Mask the dataset to only include forest LCTs
lct_masked_forest = np.ma.MaskedArray(lct, np.in1d(lct, lct_forest, invert = True))
Before using the newly created forest LCT mask to filter out non-forest pixels from the EVI data, the 500 m LCT data will need to be resampled to the same resolution as the EVI data (1000 m).
Below, the scipy library is used to resample the data from 500 m to 1000 m using nearest neighbor resampling.
import scipy.ndimage

# Resample by a factor of 2 with nearest neighbor interpolation
resampled_lct = scipy.ndimage.zoom(lct, 0.5, order=0)

# Next, apply QA mask to the EVI data
v6_EVI_masked_forest = np.ma.MaskedArray(v6_EVI_masked, np.in1d(resampled_lct, lct_forest, invert = True))

del lct
Visualize the (1) original data, (2) quality masked data, & (3) quality masked forest-only data by creating 3 plots & plotting them side by side.
# Show before/after quality screening
figure, axes = plt.subplots(nrows=1, ncols=3, figsize=(40, 10), sharey=True)
figure.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI): 07-12-2005', fontsize=32, fontweight='bold')

# Turn off axes' values
axes[0].axis('off'), axes[1].axis('off'), axes[2].axis('off')

# Plot the original data
axes[0].set_title('Original Data', fontsize=20, fontweight='bold')
axes[0].imshow(v6_EVI[0], vmin = -0.2, vmax = 1.0, cmap = 'YlGn')

# Plot the quality filtered data
axes[1].set_title('Quality Filtered Data', fontsize=20, fontweight='bold')
axes[1].imshow(v6_EVI_masked, vmin = -0.2, vmax = 1.0, cmap = 'YlGn')

# Plot the forest-only data
axes[2].set_title('Quality Filtered Forest Data', fontsize=20, fontweight='bold')
axes[2].imshow(v6_EVI_masked_forest, vmin = -0.2, vmax = 1.0, cmap = 'YlGn');

png


9. Visualizing NetCDF-4 Data Using Basemap

Next, import a matplotlib library called Basemap, which allows you to create geographically referenced plots and add other useful layers such as country boundaries and coastlines.
In the first Basemap example, define a colormap for the LCT data and plot with coastlines and country borders.
from matplotlib.colors import LinearSegmentedColormap

# Define colormap
lct_cmap = ((0.149,0.451,0),(0.220,0.659,0),(0.298,0.902,0),(0.639,1,0.451),(0.266,0.537,0.439))
cmap1 = LinearSegmentedColormap.from_list("LCT", lct_cmap, N=5, gamma=1.0)

del lct_cmap

# Remove Matplotlib deprecation warnings when importing Basemap
import warnings
warnings.simplefilter("ignore")
Explanation of 'Basemap()' parameters:
    projection = the desired projection (EPSG: 4326 here)
    ellps = the desired datum (WGS84 here)
    llcrnrlat= lower left bounding box latitude
    urcrnrlat= upper right bounding box latitude
    llcrnrlon= lower left bounding box longitude
    urcrnrlon= upper right bounding box longitude
    resolution= resolution of ancillary boundary layers in basemap plot, 'i' here = intermediate resolution
For more information, please see the Basemap documentation.
Below, import Basemap, set up the lat/lon grid, generate the Basemap, draw the desired borders, and ultimately plot the lct (forest only) data, assigning it to the colormap generated above.
# Import Basemap library
from mpl_toolkits.basemap import Basemap ;

# Set up the figure
fig = plt.figure(figsize=(8,8))

# Set the extent values
x0, y0 = (lon_EVI_min, lat_EVI_min)
x1, y1 = (lon_EVI_max, lat_EVI_max)

# Create Basemap
m = Basemap(projection='cyl', ellps = 'WGS84',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l');

# Draw coastlines and country boundaries, zorder is used to define the order of elements
m.drawcoastlines(zorder = 10), m.drawcountries(zorder = 10)
m.drawmapboundary(fill_color='lightblue'), m.fillcontinents(color='white')

# Plot the LCT data
plt.imshow(lct_masked_forest, vmin = 1 , vmax = 5, cmap = cmap1, zorder =5, origin = 'upper', extent= (x0, x1, y0, y1))

# Add a title
plt.title("Forest Land Cover Types of the Amazon Basin (2005)")
plt.show()

warnings.simplefilter("ignore")
lct_file.close()
del lct_masked_forest

png


10. Generating Statistics on Quality Filtered Data

Below, look at the mean EVI values for the Amazon region from July 12, 2005. Notice that differences arise based on (1) quality filtering (or lack thereof) and (2) land cover type used in calculating the mean EVI.
# Call np.mean() and print results
print('Version 6 all quality mean EVI: ' + str(float(format((np.mean(v6_EVI[0])), '.4f'))))
print('Version 6 quality filtered mean EVI: ' + str(float(format((np.mean(v6_EVI_masked)), '.4f'))))
print('Version 6 quality filtered mean (forests) EVI: ' + str(float(format((np.mean(v6_EVI_masked_forest)), '.4f'))))
Version 6 all quality mean EVI: 0.4078
Version 6 quality filtered mean EVI: 0.4072
Version 6 quality filtered mean (forests) EVI: 0.4995

11. Automating QA/QC Filtering

Here loop through and mask arrays by quality for multiple time steps in a NetCDF-4 file.
# Make a copy of the original NetCDF-4 file
v6_EVI_final = v6_EVI
Below, use a 'for' loop and indexing (i) in order to loop through each layer and filter based on quality and land cover type.
# Loop through all timesteps (observations) in the file, mask out poor quality and exclude non-forest pixels
for i in range(len(file_in.dimensions['time'])):

    # Next, apply QA mask to the EVI data
    v6_EVI_masked = np.ma.MaskedArray(v6_EVI[i], np.in1d(v6_QA[i], v6_QA_mask, invert = True));

    # Next, apply QA mask to the EVI data
    v6_EVI_final[i] = np.ma.MaskedArray(v6_EVI_masked, np.in1d(resampled_lct, lct_forest, invert = True));
del v6_EVI, v6_EVI_masked, v6_QA, resampled_lct

12. Visualizing Time Series

Below, set up subplots to visualize all 6 timesteps (observations) on the same plot using Basemap.
# Set up subplots with 2 rows and 3 columns
figure, axes = plt.subplots(nrows=2, ncols=3, figsize=(25, 10), sharey=True, sharex=True)

# Add a title for the series of plots
figure.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI): 07-12-2005 to 09-30-2005', fontsize=32, fontweight='bold')

# Set the subplot titles
axes[0][0].set_title('07-12-2005',fontsize=20,fontweight='bold'),axes[0][1].set_title('07-28-2005',fontsize=20,fontweight='bold')
axes[0][2].set_title('08-13-2005',fontsize=20,fontweight='bold'),axes[1][0].set_title('08-29-2005',fontsize=20,fontweight='bold')
axes[1][1].set_title('09-14-2005',fontsize=20,fontweight='bold'),axes[1][2].set_title('09-30-2005',fontsize=20,fontweight='bold')

# Plot each observation
axes[0][0].imshow(v6_EVI_final[0],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))
axes[0][1].imshow(v6_EVI_final[1],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))
axes[0][2].imshow(v6_EVI_final[2],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))
axes[1][0].imshow(v6_EVI_final[3],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))
axes[1][1].imshow(v6_EVI_final[4],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))
axes[1][2].imshow(v6_EVI_final[5],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', extent= (x0, x1, y0, y1))

# Set up the basemap for each subplot
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[0][0])
m2.drawcoastlines(zorder = 10), m2.drawcountries(zorder = 10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white')
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[0][1])
m2.drawcoastlines(zorder = 10), m2.drawcountries(zorder = 10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white')
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[0][2])
m2.drawcoastlines(zorder = 10), m2.drawcountries(zorder = 10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white')
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[1][0])
m2.drawcoastlines(zorder = 10), m2.drawcountries(zorder = 10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white')
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[1][1])
m2.drawcoastlines(zorder = 10), m2.drawcountries(zorder = 10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white')
m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',ellps ='WGS84',ax=axes[1][2])
m2.drawcoastlines(zorder=10),m2.drawcountries(zorder=10)
m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white');

png


13. Exporting NetCDF-4 Files

Lastly, demonstrate how to copy the original NetCDF-4 file, insert the quality and LCT-masked data, and save to a new NetCDF-4 file.
# Create output file
v6_file_out = Dataset(out_dir+file_list[1].split('.nc')[0] + '_masked.nc','w', format = 'NETCDF4')

#Copy dimensions of the input file
file_in.dimensions.items()
v6_file_out.createDimension('time', len(file_in.dimensions['time']))
v6_file_out.createDimension('lat', len(file_in.dimensions['lat']))
v6_file_out.createDimension('lon', len(file_in.dimensions['lon']))

# Copy variables from the input file
v6_variables = list(file_in.variables)
Below, use a 'for' loop to iterate through each variable, copy the input file attributes and dimensions, replace the original EVI data with our QA and LCT-masked data, and export to a new file.
# loop through each variable
for j in range(len(v6_variables)):
    # Copy the input variable 
    in_var = file_in.variables[v6_variables[j]]
    # Create the output variable
    out_var = v6_file_out.createVariable(v6_variables[j], in_var.datatype, in_var.dimensions)
    # Copy variable attributes
    out_var.setncatts({k: in_var.getncattr(k) for k in in_var.ncattrs()})
    # replace the original EVI data with our quality and forest masked data
    if v6_variables[j] == '_1_km_16_days_EVI':
        out_var[:] = v6_EVI_final[:]
    else: 
        out_var[:] = in_var[:]
# close the input and output files
v6_file_out.close()
file_in.close()

Contact Information

Material written by Cole Krehbiel1

Contact: LPDAAC@usgs.gov
Voice: +1-605-594-6116
Organization: Land Processes Distributed Active Archive Center (LP DAAC)
Website: https://lpdaac.usgs.gov/
Date last modified: 12-06-2017

1Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, USA. Work performed under USGS contract G15PD00403 for LP DAAC2.

2LP DAAC Work performed under NASA contract NNG14HH33I.

Relevant Products

Product Long Name
MOD13A2.006 MODIS/Terra Vegetation Indices 16-Day L3 Global 1 km SIN Grid

Tools

Name Filters Description
AppEEARS Search, Subset, Decode Quality

The Application for Extracting and Exploring Analysis Ready Samples (AρρE…