Working with AppEEARS NetCDF-4 Output Data in Python

Details:

Published: Dec. 6, 2017

Working with AppEEARS NetCDF-4 Output Data in Python


Objective:

This tutorial demonstrates how to work with data output from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler in Python.

AppEEARS allows users to select NetCDF-4 as an output file format for Area Requests. This tutorial provides step-by-step instructions for filtering NetCDF-4 files based on specific quality and land cover type determinations.

Case Study:

Throughout the tutorial, the following study is used as an example of how and why quality filtering is necessary in order to arrive at scientifically sound conclusions when working with remote sensing data. The tutorial follows similar data and methods of this study as closely as possible to demonstrate how to work with AppEEARS NetCDF-4 output files in Python.

  • Samanta, A., Ganguly, S., Hashimoto, H., Devadiga, S., Vermote, E., Knyazikhin, Y., Nemani, R.R., and Myneni, R.B., 2010, Amazon forests did not green‐up during the 2005 drought: Geophysical Research Letters, v. 37, no. 5.

What: The study tries to reproduce reports from previous studies showing greening of Amazon forests during the 2005 drought.
Why: To demonstrate the importance of considering remote sensing data quality, and reconcile contradicting reports of increased tree mortality and biomass burning with reports of anomalous Amazon forest greening.
How: The authors replicated a previous study claiming that Amazon forests were greening during the 2005 drought, with very specific quality and land cover type screening in their methods. They then used the quality-filtered data to calculate standard anomalies of EVI in order to arrive at their conclusions.
Where: Amazon Rainforest, from latitude: 20° S to 10° N, and longitude: 80° W to 45° W
When: July-September 2000-2008

AppEEARS Request Parameters:

Request time series data for MODIS Enhance Vegetation Indices (EVI) and Land Cover.

  • Date: July 01, 2005 to September 30, 2005
  • Data layers:
    • Terra MODIS Vegetation Indices
    • Combined MODIS Land Cover Type
  • Location: 20° S to 10° N, and longitude: 80° W to 45° W
  • File Format: NetCDF-4
  • Projection: Geographic (EPSG: 4326)

AppEEARS Information:

To access AppEEARS, visit: https://appeears.earthdatacloud.nasa.gov/
For information on how to make an AppEEARS request, visit: https://appeears.earthdatacloud.nasa.gov/help


Topics Covered:

  1. Setting Up the Working Environment
  2. Downloading AppEEARS Output in Python from Text File
  3. Opening a NetCDF-4 (.nc) File
  4. Reading NetCDF-4 File Metadata
  5. Reading Quality Lookup Tables (LUTs) from CSV Files
  6. Setting Custom QA/QC Filtering (Masking)
  7. Visualizing NetCDF-4 Data
  8. Masking by Land Cover Type
  9. Visualizing NetCDF-4 Data Using Basemap
  10. Generating Statistics on Quality Filtered Data
  11. Automating QA/QC Filtering
  12. Visualizing Time Series
  13. Exporting NetCDF-4 Files

Prerequisites:

A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at the link provided. + Python Version 3.6.1
+ shutil
+ urllib
+ os
+ glob
+ numpy
+ netCDF4 + matplotlib + pandas
+ warnings
+ Basemap
+ scipy

NOTE: You will need to have cURL installed on your OS in order to perform topic two (downloading). If you are unable to download cURL, you can skip step two and download the files needed to run the rest of the tutorial in the section below.


Procedures:

1. The AppEEARS files used in this tutorial can be downloaded from:

2. Copy/clone/download the AppEEARS Tutorial repo, or the desired tutorial from the LP DAAC Data User Resources Repository.

3. The source code used to generate this tutorial can be downloaded from:


Python Environment Setup

1. It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here. Once you have Conda installed, Follow the instructions below to successfully setup a Python environment on Windows, MacOS, or Linux.

2. Setup

  • Using your preferred command line interface (command prompt, terminal, cmder, etc.) type the following to successfully create a compatible python environment:
  • conda create -n appeearstutorial -c conda-forge python=3.7 netCDF4 numpy pandas matplotlib=3.1.3 scipy basemap
  • conda activate appeearstutorial
  • jupyter notebook

If you do not have jupyter notebook installed, you may need to run:
- conda install jupyter notebook

TIP: Having trouble activating your environment, or loading specific packages once you have activated your environment? Try the following:

Type: 'conda update conda'

If you prefer to not install Conda, the same setup and dependencies can be achieved by using another package manager such as pip and the requirements.txt file listed above.
Additional information on setting up and managing Conda environments.

Still having trouble getting a compatible Python environment set up? Contact LP DAAC User Services.


## 1. Setting Up A Working Environment

First, import the python libraries needed to run this tutorial. If you are missing any of the packages, you will need to download them in order to run the entire notebook.

# Set matplotlib plots inline
%matplotlib inline

# Import libraries
import urllib
import shutil
import os
import glob
import numpy as np
from netCDF4 import Dataset
Next, set your input/working directory, and create an output directory for the results.
# Set input directory, and change working directory
in_dir = input("Main input directory: ")                     # IMPORTANT: Need to enter your working directory
os.chdir(in_dir)

# Create and set output directory
out_dir = os.path.normpath((os.path.split(in_dir)[0] + os.sep + 'output/' ))+ '\\'
if not os.path.exists(out_dir):
    os.makedirs(out_dir)

# 2. Downloading AppEEARS Output in Python from Text File
Now take an AppEEARS output download text file and download all of your NetCDF-4 output files.
For more information on how to retrieve the output download text file from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.
If you plan to use the files for the use case in this tutorial, you will need to download the text file of AppEEARS output file links and make sure that the text file is located in your input directory.
# Search for, open, and read text file containing links to AppEEARS output files
download_file = open(glob.glob('**.txt')[0],'r')
download_link = download_file.readlines()

# Loop through text file and download all AppEEARS outputs
for i in range(len(download_link)):  
    file_name = download_link[i].split('/')[-1].strip()

    # Download the file
    response = urllib.request.urlopen(download_link[i])
    out_file = open(file_name, 'wb')
    shutil.copyfileobj(response, out_file)
    print('Downloaded file: '+ file_name + ' (' + str(i+1) + " of " + str(len(download_link)) + ')')
    out_file.close()

# Once you have finished downloading the files, remove variables not needed for the remainder of the tutorial
del download_link, file_name, download_file
Downloaded file: MCD12Q1.006_500m_aid0001.nc (1 of 2)
Downloaded file: MOD13A2.006_1km_aid0001.nc (2 of 2)

Notice that two files were downloaded. For an AppEEARS Area Sample request, if NetCDF-4 is selected as the output format, outputs will be grouped into .nc files by product and by feature.


# 3. Opening a NetCDF-4 (.nc) File

Search for the NetCDF-4 files that you just downloaded in your working directory, create a list of the files, and print the list.

# Now that we have files downloaded, search for downloaded files
file_list = glob.glob('**.nc')
file_list
['MCD12Q1.051_aid0001.nc', 'MOD13A2.006_aid0001.nc']

Notice that the request includes the version 6 MODIS Combined Yearly Land Cover Type (MCD12Q1.06) and version 6 MODIS Terra 16-day Vegetation Indices (MOD13A2.006) products. Start with the MOD13A2 version 6 NetCDF file.

Notice below that the file_list index is set to 1, to select the MOD13A2.006.nc file above [0,1]. Read the file in by calling the Dataset() function from the NetCDF4 library.

# Read file in, starting with MOD13A2 version 6
file_in = Dataset(file_list[1],'r', format = 'NETCDF4')

# 4. Reading NetCDF-4 File Metadata
Below, look at the variables and metadata in the NetCDF-4 file.
# Print a list of variables in file
list(file_in.variables)
['crs', 'time', 'lat', 'lon', '_1_km_16_days_EVI', '_1_km_16_days_VI_Quality']

_1_km_16_days_EVI and _1_km_16_days_VI_Quality are the data layers focused on for this tutorial, which are Enhanced Vegetation Index (EVI) and the corresponding quality data that were used in the case study.

Below, read in the metadata for the EVI dat

# Grab variable and variable metadata
v6_info = file_in.variables['_1_km_16_days_EVI']
print(v6_info)

# File dimensions
file_in.dimensions.values()
<class 'netCDF4._netCDF4.Variable'>
int16 _1_km_16_days_EVI(time, lat, lon)
    _FillValue: -3000
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: -2000
    valid_max: 10000
    long_name: 1 km 16 days EVI
    units: EVI
    scale_factor_err: 0.0
    add_offset_err: 0.0
    calibrated_nt: 5
    scale_factor: 0.0001
    add_offset: 0.0
unlimited dimensions: 
current shape = (7, 3600, 4200)
filling on
dict_values([<class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 7, <class 
'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 3600, <class 'netCDF4._netCDF4.Dimension'>: name = 
'lon', size = 4200])

Above, notice the useful metadata, including things like units, scale factor, and fill value.

  • Note that the netCDF4 module automatically applies scale_factor and add_offset values.

The time variables are days since 2000-01-01. Below, set the dates to a variable.

from netCDF4 import num2date

times = file_in.variables["time"]                            # import the time variable 
dates = num2date(times[:], times.units)                      # get the time info
dates = [date.strftime('%Y-%m-%d') for date in dates]        # get the list of datetime
print(dates)

['2005-06-26', '2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30']

For this tutorial, we use observations from July to September 2005. Below, the date outside this time span is removed.

dates.remove('2005-06-26') print(dates)

['2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30']

Below, set the dataset arrays to variables by adding [:].

# import the EVI and VI Quality datasets
v6_EVI = file_in.variables['_1_km_16_days_EVI'][1:]          # select the EVI layers from July to September 
print(v6_EVI.shape)
v6_QA = file_in.variables['_1_km_16_days_VI_Quality'][1:]   # select the EVI quality layers from July to September
print(v6_QA.shape)
(6, 3600, 4200)
(6, 3600, 4200)

Above, notice the NetCDF-4 file dimensions. 'time' indicates that the file includes 6 timesteps or layers (observations).

Next, grab the min/max values from the latitude and longitude arrays that are included in the NetCDF-4 files.

# Set lat and lon min/max for EVI data
lat_EVI_min = file_in.variables['lat'][:].min()
lat_EVI_max = file_in.variables['lat'][:].max()
lon_EVI_min = file_in.variables['lon'][:].min()
lon_EVI_max = file_in.variables['lon'][:].max()

# 5. Reading Quality Lookup Tables (LUTs) from CSV Files

In this section, take the quality lookup table (LUT) CSV file from an AppEEARS request, read it into Python using the pandas library, and use it to help set the quality masking parameters.

The quality LUT for your request (files ending in lookup.csv) can be found under Supporting Files on the Download Area Sample page:

For more information on how to retrieve the quality LUTs from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.

Next, search the working directory for the quality LUT CSV file.

# Search for look up table 
lut = glob.glob('**lookup.csv')
lut
['MOD13A2-006-1-km-16-days-VI-Quality-lookup.csv']

Below, import the pandas library and use the pd.read_csv function to read the CSV file in as a pandas dataframe.

import pandas as pd
# Read in the lut
v6_QA_lut = pd.read_csv(lut[0])
v6_QA_lut
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2050 Pixel produced, but most probably cloudy Higest quality Climatology No No No Land (Nothing else but land) No No
1 2057 VI produced, but check other QA Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
2 2058 Pixel produced, but most probably cloudy Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
3 2061 VI produced, but check other QA Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
4 2062 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
6 2114 Pixel produced, but most probably cloudy Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
8 2118 Pixel produced, but most probably cloudy Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
11 2182 Pixel produced, but most probably cloudy Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
13 2186 Pixel produced, but most probably cloudy Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
14 2241 VI produced, but check other QA Higest quality High No No No Land (Nothing else but land) No No
15 2242 Pixel produced, but most probably cloudy Higest quality High No No No Land (Nothing else but land) No No
16 2253 VI produced, but check other QA Decreasing quality (0011) High No No No Land (Nothing else but land) No No
17 2254 Pixel produced, but most probably cloudy Decreasing quality (0011) High No No No Land (Nothing else but land) No No
18 2257 VI produced, but check other QA Decreasing quality (0100) High No No No Land (Nothing else but land) No No
19 2258 Pixel produced, but most probably cloudy Decreasing quality (0100) High No No No Land (Nothing else but land) No No
20 2317 VI produced, but check other QA Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
21 2318 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
22 2321 VI produced, but check other QA Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
23 2322 Pixel produced, but most probably cloudy Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
24 2372 VI produced, good quality Lower quality Low Yes No No Land (Nothing else but land) No No
25 2374 Pixel produced, but most probably cloudy Lower quality Low Yes No No Land (Nothing else but land) No No
26 2376 VI produced, good quality Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
27 2378 Pixel produced, but most probably cloudy Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
28 2433 VI produced, but check other QA Higest quality Average Yes No No Land (Nothing else but land) No No
29 2441 VI produced, but check other QA Decreasing quality (0010) Average Yes No No Land (Nothing else but land) No No
... ... ... ... ... ... ... ... ... ... ...
287 39309 VI produced, but check other QA Decreasing quality (0011) Average Yes No No Shallow inland water No Yes
288 39313 VI produced, but check other QA Decreasing quality (0100) Average Yes No No Shallow inland water No Yes
289 39317 VI produced, but check other QA Decreasing quality (0101) Average Yes No No Shallow inland water No Yes
290 39361 VI produced, but check other QA Higest quality High Yes No No Shallow inland water No Yes
291 39362 Pixel produced, but most probably cloudy Higest quality High Yes No No Shallow inland water No Yes
292 39381 VI produced, but check other QA Decreasing quality (0101) High Yes No No Shallow inland water No Yes
293 39385 VI produced, but check other QA Decreasing quality (0110) High Yes No No Shallow inland water No Yes
294 39386 Pixel produced, but most probably cloudy Decreasing quality (0110) High Yes No No Shallow inland water No Yes
295 39389 VI produced, but check other QA Decreasing quality (0111) High Yes No No Shallow inland water No Yes
296 39390 Pixel produced, but most probably cloudy Decreasing quality (0111) High Yes No No Shallow inland water No Yes
297 39938 Pixel produced, but most probably cloudy Higest quality Climatology No No Yes Shallow inland water No Yes
298 39966 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology No No Yes Shallow inland water No Yes
299 39970 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology No No Yes Shallow inland water No Yes
300 40022 Pixel produced, but most probably cloudy Decreasing quality (0101) Low No No Yes Shallow inland water No Yes
301 40026 Pixel produced, but most probably cloudy Decreasing quality (0110) Low No No Yes Shallow inland water No Yes
302 40066 Pixel produced, but most probably cloudy Higest quality Average No No Yes Shallow inland water No Yes
303 40090 Pixel produced, but most probably cloudy Decreasing quality (0110) Average No No Yes Shallow inland water No Yes
304 40094 Pixel produced, but most probably cloudy Decreasing quality (0111) Average No No Yes Shallow inland water No Yes
305 40130 Pixel produced, but most probably cloudy Higest quality High No No Yes Shallow inland water No Yes
306 40162 Pixel produced, but most probably cloudy Decreasing quality (1000) High No No Yes Shallow inland water No Yes
307 40166 Pixel produced, but most probably cloudy Decreasing quality (1001) High No No Yes Shallow inland water No Yes
308 40194 Pixel produced, but most probably cloudy Higest quality Climatology Yes No Yes Shallow inland water No Yes
309 40222 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology Yes No Yes Shallow inland water No Yes
310 40226 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology Yes No Yes Shallow inland water No Yes
311 40230 Pixel produced, but most probably cloudy Decreasing quality (1001) Climatology Yes No Yes Shallow inland water No Yes
312 40386 Pixel produced, but most probably cloudy Higest quality High Yes No Yes Shallow inland water No Yes
313 40422 Pixel produced, but most probably cloudy Decreasing quality (1001) High Yes No Yes Shallow inland water No Yes
314 40426 Pixel produced, but most probably cloudy Decreasing quality (1010) High Yes No Yes Shallow inland water No Yes
315 51199 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Shallow ocean Yes Yes
316 63487 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Moderate or continental ocean Yes Yes

317 rows × 10 columns

Scroll through the table above to see the quality information contained in the LUT. Value refers to the pixel values, followed by the specific bit beginning with MODLAND. AppEEARS decodes each binary quality bit-field into more meaningful information for the user.

The case study used the following quality parameters to define their screening process:

1. MODLAND_QA flag = 0 or 1
0 = good quality 1 = check other QA
2. VI usefulness <= 11
0000 Highest quality = 0
0001 Lower quality = 1
0010 Decreasing quality = 2
...
1011 Decreasing quality = 11
3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0
0 = No
4. Aerosol Quantity = 1 or 2
1 = low aerosol 2 = average aerosol

Below, use the pixel quality flag parameters outlined above in order to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters outlined above.

Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it is including all rows where the MODLAND column is either 'VI_produced, good quality' or 'VI produced, but check other QA'.

However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

# Include good quality based on MODLAND
modland = ['VI produced, good quality', 'VI produced, but check other QA']
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason\/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

# Include low or average aerosol
AQ = ['Low','Average']
v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(AQ)]

# Include where adjacent cloud, mixed clouds, or possible shadow were not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]
v6_QA_lut
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
64 4160 VI produced, good quality Higest quality Low No No No Ocean coastlines and lake shorelines No No
66 4164 VI produced, good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
68 4225 VI produced, but check other QA Higest quality Average No No No Ocean coastlines and lake shorelines No No
69 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
71 4233 VI produced, but check other QA Decreasing quality (0010) Average No No No Ocean coastlines and lake shorelines No No
104 6208 VI produced, good quality Higest quality Low No No No Shallow inland water No No
106 6212 VI produced, good quality Lower quality Low No No No Shallow inland water No No
108 6273 VI produced, but check other QA Higest quality Average No No No Shallow inland water No No
110 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
112 6281 VI produced, but check other QA Decreasing quality (0010) Average No No No Shallow inland water No No

# 6. Setting Custom QA/QC Filtering (Masking)

This section demonstrates how to take the list of possible values (based on our quality requirements) above, and mask the EVI and quality arrays to only include pixels where the quality value is in the list of 'good quality' values based on the parameters from the case study.

# Print list of possible QA values based on parameters above
v6_QA_mask = list(v6_QA_lut['Value'])
del v6_QA_lut

v6_QA_mask
[2112,
 2116,
 2177,
 2181,
 2185,
 4160,
 4164,
 4225,
 4229,
 4233,
 6208,
 6212,
 6273,
 6277,
 6281]

So above, only pixels with those specific _1_km_16_days_VI_Quality values will be included after masking.

Below, mask the first observation (index 0) from the EVI array. Notice that invert is set to True here, meaning that you don't want to exclude the list of values above, but rather exclude every other value.

# Apply QA mask to the EVI data 
v6_EVI_masked = np.ma.MaskedArray(v6_EVI[0], np.in1d(v6_QA[0], v6_QA_mask, invert = True))

# 7. Visualizing NetCDF-4 Data

In this section, begin by highlighting the functionality of matplotlib plotting. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Lastly, export the finished plot to an image file.

import matplotlib.pyplot as plt

# Visualize a basic plot
plt.figure(figsize = (10,10))                # Set the figure size
plt.axis('off')                              # Remove the axes' values

# Plot the original data, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(v6_EVI[0], vmin = -0.2 , vmax = 1, cmap = 'YlGn');        # Adding ';' to the end of the call suppresses the text output from showing in the notebook 

Vegetation data over Brazil

Above, notice a better visualization of the data, but it is still missing important items such as a title and a colormap legend.
Below, plot the quality filtered data, add a figure title/subtitle and colormap legend, and export the result to a .png file in the output directory.
# Set the figure size
fig = plt.figure(figsize = (10,10))

# Remove the axes' values
plt.axis('off')

# Make a figure title
fig.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI) Quality Filtered Data', fontsize = 12, fontweight = 'bold')

# Make a figure subtitle
ax = fig.add_subplot(111)
fig.subplots_adjust(top = 2.2)
ax.set_title('Amazon Rainforest: 07-12-2005')
ax1 = plt.gca()

# Plot the masked data
im = ax1.imshow(v6_EVI_masked, vmin = -0.2, vmax = 1, cmap = 'YlGn');

# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction = 0.047, pad = 0.02, label = 'EVI').outline.set_visible(True)

# Set up file name and export to png file
import warnings
warnings.filterwarnings("ignore") 
fig.savefig('{}{}_evi.png'.format(out_dir, (file_list[1].split('.nc'))[0]))

EVI data over the Amazon Rainforest.


# 8. Masking by Land Cover Type
The case study limited the analysis to "forest" land covers. The example below demonstrates how to limit your analysis based on land cover.
# Use MCD12Q1.006 to define land cover types
file_list
['MCD12Q1.006_500m_aid0001.nc', 'MOD13A2.006_1km_aid0001.nc']
# set to 'MCD12Q1.006_aid0001.nc' and open
lct_file = Dataset(file_list[0],'r', format = 'NETCDF4')

# print a list of variables in file
list(lct_file.variables)
['crs', 'time', 'lat', 'lon', 'LC_Type1', 'QC']
# Show the variable metadata
lct_file.variables['LC_Type1']
<class 'netCDF4._netCDF4.Variable'>
int16 LC_Type1(time, lat, lon)
    _FillValue: 255
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: 1
    valid_max: 17
    long_name: Land_Cover_Type_1
    Evergreen_Needleleaf_Forests: 1
    Evergreen_Broadleaf_Forests: 2
    Deciduous_Needleleaf_Forests: 3
    Deciduous_Broadleaf_Forests: 4
    Mixed_Forests: 5
    Closed_Shrublands: 6
    Open_Shrublands: 7
    Woody_Savannas: 8
    Savannas: 9
    Grasslands: 10
    Permanent_Wetlands: 11
    Croplands: 12
    Urban_and_Built_up_Lands: 13
    Cropland_Natural_Vegetation_Mosaics: 14
    Permanent_Snow_and_Ice: 15
    Barren: 16
    Water_Bodies: 17
    Unclassified: -1
unlimited dimensions: 
current shape = (1, 7200, 8400)
filling on
Above, notice the class number assignments. Below, mask the data to only include forest land cover types (LCT).
In the study, the researchers used the MOD12Q1 Collection 5 Land Cover product to identify forest pixels in their study area. This tutorial uses the newer MCD12Q1 Version 5.1 Land Cover Type product to only include pixels in our study region that are classified as one of five forest types:
  • evergreen_needleleaf_forest: 1
  • evergreen_broadleaf_forest: 2
  • deciduous_needleleaf_forest: 3
  • deciduous_broadleaf_forest: 4
  • mixed_forests: 5
# Open the Land_Cover_Type_1 dataset
lct = lct_file.variables['LC_Type1'][:][0]

# Make a list of the forest land cover types outlined above
lct_forest = [1,2,3,4,5]

# Mask the dataset to only include forest LCTs
lct_masked_forest = np.ma.MaskedArray(lct, np.in1d(lct, lct_forest, invert = True))

Before using the newly created forest LCT mask to filter out non-forest pixels from the EVI data, the 500 m LCT data will need to be resampled to the same resolution as the EVI data (1000 m).

Below, the scipy library is used to resample the data from 500 m to 1000 m using nearest neighbor resampling.

import scipy.ndimage

# Resample by a factor of 2 with nearest neighbor interpolation
resampled_lct = scipy.ndimage.zoom(lct, 0.5, order=0)

# Next, exclude non-Forest land covers from the EVI data
v6_EVI_masked_forest = np.ma.MaskedArray(v6_EVI_masked, np.in1d(resampled_lct, lct_forest, invert = True))

del lct
Visualize the (1) original data, (2) quality masked data, & (3) quality masked forest-only data by creating 3 plots & plotting them side by side.
# Show before/after quality screening
figure, axes = plt.subplots(nrows=1, ncols=3, figsize=(40, 10), sharey=True)
figure.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI): 07-12-2005', fontsize=32, fontweight='bold')

# Turn off axes' values
axes[0].axis('off'), axes[1].axis('off'), axes[2].axis('off')

# Plot the original data
axes[0].set_title('Original Data', fontsize=20, fontweight='bold')
axes[0].imshow(v6_EVI[0], vmin = -0.2, vmax = 1.0, cmap = 'YlGn')

# Plot the quality filtered data
axes[1].set_title('Quality Filtered Data', fontsize=20, fontweight='bold')
axes[1].imshow(v6_EVI_masked, vmin = -0.2, vmax = 1.0, cmap = 'YlGn')

# Plot the forest-only data
axes[2].set_title('Quality Filtered Forest Data', fontsize=20, fontweight='bold')
axes[2].imshow(v6_EVI_masked_forest, vmin = -0.2, vmax = 1.0, cmap = 'YlGn');

EVI data over the Amazon Rainforest.


# 9. Visualizing NetCDF-4 Data Using Basemap
Next, import a matplotlib library called Basemap, which allows you to create geographically referenced plots and add other useful layers such as country boundaries and coastlines.
In the first Basemap example, define a colormap for the LCT data and plot with coastlines and country borders.
from matplotlib.colors import LinearSegmentedColormap

# Define colormap
lct_cmap = ((0.149,0.451,0),(0.220,0.659,0),(0.298,0.902,0),(0.639,1,0.451),(0.266,0.537,0.439))
cmap1 = LinearSegmentedColormap.from_list("LCT", lct_cmap, N=5, gamma=1.0)

del lct_cmap

# Remove Matplotlib deprecation warnings when importing Basemap
import warnings
warnings.simplefilter("ignore")
Explanation of 'Basemap()' parameters:
    projection = the desired projection (EPSG: 4326 here)
    ellps = the desired datum (WGS84 here)
    llcrnrlat= lower left bounding box latitude
    urcrnrlat= upper right bounding box latitude
    llcrnrlon= lower left bounding box longitude
    urcrnrlon= upper right bounding box longitude
    resolution= resolution of ancillary boundary layers in basemap plot, 'i' here = intermediate resolution
For more information, please see the Basemap documentation.
Below, import Basemap, set up the lat/lon grid, generate the Basemap, draw the desired borders, and ultimately plot the lct (forest only) data, assigning it to the colormap generated above.
# Import Basemap library
from mpl_toolkits.basemap import Basemap ;

# Set up the figure
fig = plt.figure(figsize=(8,8))

# Set the extent values
x0, y0 = (lon_EVI_min, lat_EVI_min)
x1, y1 = (lon_EVI_max, lat_EVI_max)

# Create Basemap
m = Basemap(projection='cyl', ellps = 'WGS84',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l');

# Draw coastlines and country boundaries, zorder is used to define the order of elements
m.drawcoastlines(zorder = 10), m.drawcountries(zorder = 10)
m.drawmapboundary(fill_color='lightblue'), m.fillcontinents(color='white')

# Plot the LCT data
plt.imshow(lct_masked_forest, vmin = 1 , vmax = 5, cmap = cmap1, zorder =5, origin = 'upper', extent= (x0, x1, y0, y1))

# Add a title
plt.title("Forest Land Cover Types of the Amazon Basin (2005)")
plt.show()

warnings.simplefilter("ignore")
lct_file.close()
del lct_masked_forest

Forest Land Cover Type over Amazon.


# 10. Generating Statistics on Quality Filtered Data
Below, look at the mean EVI values for the Amazon region from July 12, 2005. Notice that differences arise based on (1) quality filtering (or lack thereof) and (2) land cover type used in calculating the mean EVI.
# Call np.mean() and print results
print('Version 6 all quality mean EVI: ' + str(float(format((np.mean(v6_EVI[0])), '.4f'))))
print('Version 6 quality filtered mean EVI: ' + str(float(format((np.mean(v6_EVI_masked)), '.4f'))))
print('Version 6 quality filtered mean (forests) EVI: ' + str(float(format((np.mean(v6_EVI_masked_forest)), '.4f'))))
Version 6 all quality mean EVI: 0.4078
Version 6 quality filtered mean EVI: 0.4072
Version 6 quality filtered mean (forests) EVI: 0.5003

# 11. Automating QA/QC Filtering
Here loop through and mask arrays by quality for multiple time steps in a NetCDF-4 file.
# Make a copy of the original NetCDF-4 file
v6_EVI_final = v6_EVI
Below, use a 'for' loop and indexing (i) in order to loop through each layer and filter based on quality and land cover type.
# Loop through all timesteps (observations) in the file, mask out poor quality and exclude non-forest pixels
for i in range(len(dates)):

    # Next, apply QA mask to the EVI data
    v6_EVI_masked = np.ma.MaskedArray(v6_EVI[i], np.in1d(v6_QA[i], v6_QA_mask, invert = True));

    # Next, # Next, exclude npn-Forest land covers from the EVI data
    v6_EVI_final[i] = np.ma.MaskedArray(v6_EVI_masked, np.in1d(resampled_lct, lct_forest, invert = True));
del v6_EVI, v6_EVI_masked, v6_QA, resampled_lct

# 12. Visualizing Time Series
Below, set up subplots to visualize all 6 timesteps (observations) on the same plot using Basemap.
# Set up subplots with row and column numbers
figure, axes = plt.subplots(nrows=int(len(dates)/3), ncols=3, figsize=(25, 10), sharey=True, sharex=True)
# Add a title for the series of plots
figure.suptitle('MODIS Version 6 Enhanced Vegetation Index (EVI): 2005-07-12 to 2005-09-30', fontsize=32, fontweight='bold')

for i in range(int(len(dates)/3+1)):
    for j in range(0,3):
        try:
            axes[i][j].set_title(dates[3*i+j],fontsize=20,fontweight='bold')
            # Plot each observation
            axes[i][j].imshow(v6_EVI_final[3*i+j],vmin = -0.2, vmax = 1.0, cmap = 'YlGn',zorder = 5, origin = 'upper', 
                              extent= (x0, x1, y0, y1))
            # Set up the basemap for each subplot
            m2 =Basemap(projection='cyl',llcrnrlat=-20,urcrnrlat=13,llcrnrlon=-83,urcrnrlon=-34,resolution='l',
                        ellps ='WGS84',ax=axes[i][j])
            m2.drawcoastlines(zorder=10)
            m2.drawcountries(zorder=10)
            m2.drawmapboundary(fill_color='lightblue'),m2.fillcontinents(color='white');
        except IndexError:
            pass

MODIS V6 EVI


# 13. Exporting NetCDF-4 Files
Lastly, demonstrate how to copy the original NetCDF-4 file, insert the quality and LCT-masked data, and save to a new NetCDF-4 file.
# Create output file
v6_file_out = Dataset(out_dir+file_list[1].split('.nc')[0] + '_masked.nc','w', format = 'NETCDF4')

#Copy dimensions of the input file
print(file_in.dimensions.items())
v6_file_out.createDimension('time', len(dates))
v6_file_out.createDimension('lat', len(file_in.dimensions['lat']))
v6_file_out.createDimension('lon', len(file_in.dimensions['lon']))

# Copy variables from the input file
v6_variables = list(file_in.variables)
Below, use a 'for' loop to iterate through each variable, copy the input file attributes and dimensions, replace the original EVI data with our QA and LCT-masked data, and export to a new file.
# loop through each variable
for j in range(len(v6_variables)):
    # Copy the input variable 
    in_var = file_in.variables[v6_variables[j]]
    # Create the output variable
    out_var = v6_file_out.createVariable(v6_variables[j], in_var.datatype, in_var.dimensions)
    # Copy variable attributes
    out_var.setncatts({k: in_var.getncattr(k) for k in in_var.ncattrs()})
    # replace the original EVI data with our quality and forest masked data
    if v6_variables[j] == '_1_km_16_days_EVI':
        out_var[:] = v6_EVI_final[:]
    elif v6_variables[j] == 'time':
            out_var[:] = in_var[1:]
    elif v6_variables[j] =='_1_km_16_days_VI_Quality':
        out_var[:] = in_var[1:]
    else: 
        out_var[:] = in_var[:]
# close the input and output files
v6_file_out.close()
file_in.close()

# Contact Information ##### Material written by LP DAAC1 **Contact:** **Voice:** +1-605-594-6116 **Organization:** Land Processes Distributed Active Archive Center (LP DAAC) **Website:** **Date last modified:** 05-10-2022 1Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, USA. Work performed under USGS contract G15PD00403 for LP DAAC2. 2LP DAAC Work performed under NASA contract NNG14HH33I.

Relevant Products

Product Long Name
MOD13A2.006 MODIS/Terra Vegetation Indices 16-Day L3 Global 1 km SIN Grid
MCD12Q1.006 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500 m SIN Grid

Tools

Name Filters Description
AppEEARS Decode Quality, Order, Search, Subset

The Application for Extracting and Exploring Analysis Ready Samples (AρρE…