Working with AppEEARS NetCDF-4 Output Data in Python


Objective:

This tutorial demonstrates how to work with data output from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler in Python.

AppEEARS allows users to select NetCDF-4 as an output file format for Area Requests. This tutorial provides step-by-step instructions for filtering NetCDF-4 files based on specific quality and land cover type determinations.

Case Study:

Throughout the tutorial, the following study is used as an example of how and why quality filtering is necessary in order to arrive at scientifically sound conclusions when working with remote sensing data. The tutorial follows similar data and methods of this study as closely as possible to demonstrate how to work with AppEEARS NetCDF-4 output files in Python.

  • Samanta, A., Ganguly, S., Hashimoto, H., Devadiga, S., Vermote, E., Knyazikhin, Y., Nemani, R.R., and Myneni, R.B., 2010, Amazon forests did not green‐up during the 2005 drought: Geophysical Research Letters, v. 37, no. 5.

What: The study tries to reproduce reports from previous studies showing greening of Amazon forests during the 2005 drought.
Why: To demonstrate the importance of considering remote sensing data quality, and reconcile contradicting reports of increased tree mortality and biomass burning with reports of anomalous Amazon forest greening.
How: The authors replicated a previous study claiming that Amazon forests were greening during the 2005 drought, with very specific quality and land cover type screening in their methods. They then used the quality-filtered data to calculate standard anomalies of EVI in order to arrive at their conclusions.
Where: Amazon Rainforest, from latitude: 20° S to 10° N, and longitude: 80° W to 45° W
When: July-September 2000-2008

AppEEARS Request Parameters:

Request time series data for MODIS Enhance Vegetation Indices (EVI) and Land Cover.

  • Date: July 01, 2005 to September 30, 2005
  • Data layers:
    • Terra MODIS Vegetation Indices
    • Combined MODIS Land Cover Type
  • Location: 20° S to 10° N, and longitude: 80° W to 45° W
  • File Format: NetCDF-4
  • Projection: Geographic (EPSG: 4326)

AppEEARS Information:

To access AppEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For information on how to make an AppEEARS request, visit: https://lpdaacsvc.cr.usgs.gov/appeears/help


Topics Covered:

  1. Setting Up the Working Environment
  2. Downloading AppEEARS Output in Python from Text File
  3. Opening a NetCDF-4 (.nc) File
  4. Reading NetCDF-4 File Metadata
  5. Reading Quality Lookup Tables (LUTs) from CSV Files
  6. Setting Custom QA/QC Filtering (Masking)
  7. Visualizing NetCDF-4 Data
  8. Masking by Land Cover Type
  9. Visualizing NetCDF-4 Data Using Basemap
  10. Generating Statistics on Quality Filtered Data
  11. Automating QA/QC Filtering
  12. Visualizing Time Series
  13. Exporting NetCDF-4 Files

Prerequisites:

A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at the link provided.

  • Python Version 3.6.1
    • shutil
    • urllib
    • os
    • glob
    • numpy
    • netCDF4
    • matplotlib
    • pandas
    • warnings
    • Basemap
    • scipy

NOTE: You will need to have cURL installed on your OS in order to perform topic two (downloading). If you are unable to download cURL, you can skip step two and download the files needed to run the rest of the tutorial in the section below.


Procedures:

1. The AppEEARS files used in this tutorial can be downloaded from:

2. Copy/clone/download the AppEEARS Tutorial repo, or the desired tutorial from the LP DAAC Data User Resources Repository.

3. The source code used to generate this tutorial can be downloaded from:


Python Environment Setup

2. Setup

  • Using your preferred command line interface (command prompt, terminal, cmder, etc.) type the following to successfully create a compatible python environment:
    • conda create -n appeearstutorial -c conda-forge python=3.7 netCDF4 numpy pandas matplotlib=3.1.3 scipy basemap
    • conda activate appeearstutorial
    • jupyter notebook

If you do not have jupyter notebook installed, you may need to run:

  • conda install jupyter notebook

TIP: Having trouble activating your environment, or loading specific packages once you have activated your environment? Try the following:

Type: 'conda update conda'

If you prefer to not install Conda, the same setup and dependencies can be achieved by using another package manager such as pip and the requirements.txt file listed above.
Additional information on setting up and managing Conda environments.

Still having trouble getting a compatible Python environment set up? Contact LP DAAC User Services.


1. Setting Up A Working Environment

First, import the python libraries needed to run this tutorial. If you are missing any of the packages, you will need to download them in order to run the entire notebook.

In [2]:
# Set matplotlib plots inline
%matplotlib inline

# Import libraries
import urllib
import shutil
import os
import glob
import numpy as np
from netCDF4 import Dataset

Next, set your input/working directory, and create an output directory for the results.

In [53]:
# Set input directory, and change working directory
in_dir = input("Main input directory: ")                     # IMPORTANT: Need to enter your working directory
os.chdir(in_dir)

# Create and set output directory
out_dir = os.path.normpath((os.path.split(in_dir)[0] + os.sep + 'output/' ))+ '\\'
if not os.path.exists(out_dir):
    os.makedirs(out_dir)
Main input directory: C:\appeears-tutorials\

2. Downloading AppEEARS Output in Python from Text File

Now take an AppEEARS output download text file and download all of your NetCDF-4 output files.

AppEEARS Download Textfile Example

For more information on how to retrieve the output download text file from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.

In [6]:
# Search for, open, and read text file containing links to AppEEARS output files
download_file = open(glob.glob('**.txt')[0],'r')
download_link = download_file.readlines()

# Loop through text file and download all AppEEARS outputs
for i in range(len(download_link)):  
    file_name = download_link[i].split('/')[-1].strip()
    
    # Download the file
    response = urllib.request.urlopen(download_link[i])
    out_file = open(file_name, 'wb')
    shutil.copyfileobj(response, out_file)
    print('Downloaded file: '+ file_name + ' (' + str(i+1) + " of " + str(len(download_link)) + ')')
    out_file.close()
    
# Once you have finished downloading the files, remove variables not needed for the remainder of the tutorial
del download_link, file_name, download_file
Downloaded file: MCD12Q1.006_500m_aid0001.nc (1 of 2)
Downloaded file: MOD13A2.006_1km_aid0001.nc (2 of 2)

Notice that two files were downloaded. For an AppEEARS Area Sample request, if NetCDF-4 is selected as the output format, outputs will be grouped into .nc files by product and by feature.


3. Opening a NetCDF-4 (.nc) File

Search for the NetCDF-4 files that you just downloaded in your working directory, create a list of the files, and print the list.

In [11]:
# Now that we have files downloaded, search for downloaded files
file_list = glob.glob('**.nc')
file_list
Out[11]:
['MCD12Q1.006_500m_aid0001.nc', 'MOD13A2.006_1km_aid0001.nc']

Notice that the request includes the version 6 MODIS Combined Yearly Land Cover Type (MCD12Q1.06) and version 6 MODIS Terra 16-day Vegetation Indices (MOD13A2.006) products. Start with the MOD13A2 version 6 NetCDF file.

Notice below that the file_list index is set to 1, to select the MOD13A2.006.nc file above [0,1]. Read the file in by calling the Dataset() function from the NetCDF4 library.

In [12]:
# Read file in, starting with MOD13A2 version 6
file_in = Dataset(file_list[1],'r', format = 'NETCDF4')

4. Reading NetCDF-4 File Metadata

Below, look at the variables and metadata in the NetCDF-4 file.

In [13]:
# Print a list of variables in file
list(file_in.variables)
Out[13]:
['crs', 'time', 'lat', 'lon', '_1_km_16_days_EVI', '_1_km_16_days_VI_Quality']

_1_km_16_days_EVI and _1_km_16_days_VI_Quality are the data layers focused on for this tutorial, which are Enhanced Vegetation Index (EVI) and the corresponding quality data that were used in the case study.

Below, read in the metadata for the EVI dataset.

In [14]:
# Grab variable and variable metadata
v6_info = file_in.variables['_1_km_16_days_EVI']
print(v6_info)

# File dimensions
file_in.dimensions.values()
<class 'netCDF4._netCDF4.Variable'>
int16 _1_km_16_days_EVI(time, lat, lon)
    _FillValue: -3000
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: -2000
    valid_max: 10000
    long_name: 1 km 16 days EVI
    units: EVI
    scale_factor_err: 0.0
    add_offset_err: 0.0
    calibrated_nt: 5
    scale_factor: 0.0001
    add_offset: 0.0
unlimited dimensions: 
current shape = (7, 3600, 4200)
filling on
Out[14]:
dict_values([<class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 7, <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 3600, <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 4200])

Above, notice the useful metadata, including things like units, scale factor, and fill value.

  • Note that the netCDF4 module automatically applies scale_factor and add_offset values.

The time variables are days since 2000-01-01. Below, set the dates to a variable.

In [15]:
from netCDF4 import num2date

times = file_in.variables["time"]                            # import the time variable 
dates = num2date(times[:], times.units)                      # get the time info
dates = [date.strftime('%Y-%m-%d') for date in dates]        # get the list of datetime
print(dates)
['2005-06-26', '2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30']

For this tutorial, we use observations from July to September 2005. Below, the date outside this time span is removed.

In [16]:
dates.remove('2005-06-26')
print(dates)
['2005-07-12', '2005-07-28', '2005-08-13', '2005-08-29', '2005-09-14', '2005-09-30']

Below, set the dataset arrays to variables by adding [:].

In [17]:
# import the EVI and VI Quality datasets
v6_EVI = file_in.variables['_1_km_16_days_EVI'][1:]          # select the EVI layers from July to September 
print(v6_EVI.shape)
v6_QA = file_in.variables['_1_km_16_days_VI_Quality'][1:]   # select the EVI quality layers from July to September
print(v6_QA.shape)
(6, 3600, 4200)
(6, 3600, 4200)

Above, notice the NetCDF-4 file dimensions. 'time' indicates that the file includes 6 timesteps or layers (observations).

Next, grab the min/max values from the latitude and longitude arrays that are included in the NetCDF-4 files.

In [18]:
# Set lat and lon min/max for EVI data
lat_EVI_min = file_in.variables['lat'][:].min()
lat_EVI_max = file_in.variables['lat'][:].max()
lon_EVI_min = file_in.variables['lon'][:].min()
lon_EVI_max = file_in.variables['lon'][:].max()

5. Reading Quality Lookup Tables (LUTs) from CSV Files

In this section, take the quality lookup table (LUT) CSV file from an AppEEARS request, read it into Python using the pandas library, and use it to help set the quality masking parameters.

The quality LUT for your request (files ending in lookup.csv) can be found under Supporting Files on the Download Area Sample page:

For more information on how to retrieve the quality LUTs from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.

Next, search the working directory for the quality LUT CSV file.

In [19]:
# Search for look up table 
lut = glob.glob('**lookup.csv')
lut
Out[19]:
['MOD13A2-006-1-km-16-days-VI-Quality-lookup.csv']

Below, import the pandas library and use the pd.read_csv function to read the CSV file in as a pandas dataframe.

In [20]:
import pandas as pd

# Read in the lut
v6_QA_lut = pd.read_csv(lut[0])
v6_QA_lut
Out[20]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2050 Pixel produced, but most probably cloudy Highest quality Climatology No No No Land (Nothing else but land) No No
1 2057 VI produced, but check other QA Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
2 2058 Pixel produced, but most probably cloudy Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
3 2061 VI produced, but check other QA Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
4 2062 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
... ... ... ... ... ... ... ... ... ... ...
314 40386 Pixel produced, but most probably cloudy Highest quality High Yes No Yes Shallow inland water No Yes
315 40422 Pixel produced, but most probably cloudy Decreasing quality (1001) High Yes No Yes Shallow inland water No Yes
316 40426 Pixel produced, but most probably cloudy Decreasing quality (1010) High Yes No Yes Shallow inland water No Yes
317 51199 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Shallow ocean Yes Yes
318 63487 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Moderate or continental ocean Yes Yes

319 rows × 10 columns

Scroll through the table above to see the quality information contained in the LUT. Value refers to the pixel values, followed by the specific bit beginning with MODLAND. AppEEARS decodes each binary quality bit-field into more meaningful information for the user.

The case study used the following quality parameters to define their screening process:

1. MODLAND_QA flag = 0 or 1

    0 = good quality 
    1 = check other QA  

2. VI usefulness <= 11

    0000    Highest quality = 0  
    0001    Lower quality = 1  
    0010    Decreasing quality = 2  
    ...  
    1011   Decreasing quality = 11      

3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0

    0 = No

4. Aerosol Quantity = 1 or 2

    1 = low aerosol
    2 = average aerosol  

Below, use the pixel quality flag parameters outlined above in order to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters outlined above.

Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it is including all rows where the MODLAND column is either 'VI_produced, good quality' or 'VI produced, but check other QA'.

However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

In [21]:
# Include good quality based on MODLAND
modland = ['VI produced, good quality', 'VI produced, but check other QA']
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason\/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

# Include low or average aerosol
AQ = ['Low','Average']
v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(AQ)]

# Include where adjacent cloud, mixed clouds, or possible shadow were not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]
v6_QA_lut
Out[21]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
5 2112 VI produced, good quality Highest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Highest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
64 4160 VI produced, good quality Highest quality Low No No No Ocean coastlines and lake shorelines No No
66 4164 VI produced, good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
68 4225 VI produced, but check other QA Highest quality Average No No No Ocean coastlines and lake shorelines No No
69 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
71 4233 VI produced, but check other QA Decreasing quality (0010) Average No No No Ocean coastlines and lake shorelines No No
104 6208 VI produced, good quality Highest quality Low No No No Shallow inland water No No
106 6212 VI produced, good quality Lower quality Low No No No Shallow inland water No No
108 6273 VI produced, but check other QA Highest quality Average No No No Shallow inland water No No
110 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
112 6281 VI produced, but check other QA Decreasing quality (0010) Average No No No Shallow inland water No No

6. Setting Custom QA/QC Filtering (Masking)

This section demonstrates how to take the list of possible values (based on our quality requirements) above, and mask the EVI and quality arrays to only include pixels where the quality value is in the list of 'good quality' values based on the parameters from the case study.

In [22]:
# Print list of possible QA values based on parameters above
v6_QA_mask = list(v6_QA_lut['Value'])
del v6_QA_lut

v6_QA_mask
Out[22]:
[2112,
 2116,
 2177,
 2181,
 2185,
 4160,
 4164,
 4225,
 4229,
 4233,
 6208,
 6212,
 6273,
 6277,
 6281]

So above, only pixels with those specific _1_km_16_days_VI_Quality values will be included after masking.

Below, mask the first observation (index 0) from the EVI array. Notice that invert is set to True here, meaning that you don't want to exclude the list of values above, but rather exclude every other value.

In [23]:
# Apply QA mask to the EVI data 
v6_EVI_masked = np.ma.MaskedArray(v6_EVI[0], np.in1d(v6_QA[0], v6_QA_mask, invert = True))

7. Visualizing NetCDF-4 Data

In this section, begin by highlighting the functionality of matplotlib plotting. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Lastly, export the finished plot to an image file.

In [24]:
import matplotlib.pyplot as plt

# Visualize a basic plot
plt.figure(figsize = (10,10))                # Set the figure size
plt.axis('off')                              # Remove the axes' values

# Plot the original data, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(v6_EVI[0], vmin = -0.2 , vmax = 1, cmap = 'YlGn');        # Adding ';' to the end of the call suppresses the text output from showing in the notebook