Working with AppEEARS NetCDF-4 Output Data in Python¶

This tutorial demonstrates how to work with data output from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) Area Sampler in Python.¶

AppEEARS allows users to select NetCDF-4 as an output file format for Area Requests. This tutorial provides step-by-step instructions for filtering NetCDF-4 files based on specific quality and land cover type determinations.

Case Study:¶

Throughout the tutorial, the following study is used as an example of how and why quality filtering is necessary in order to arrive at scientifically sound conclusions when working with remote sensing data. The tutorial follows similar data and methods of this study as closely as possible to demonstrate how to work with AppEEARS NetCDF-4 output files in Python.¶

  • Samanta, A., Ganguly, S., Hashimoto, H., Devadiga, S., Vermote, E., Knyazikhin, Y., Nemani, R.R., and Myneni, R.B., 2010, Amazon forests did not green‐up during the 2005 drought: Geophysical Research Letters, v. 37, no. 5.

What: The study tries to reproduce reports from previous studies showing greening of Amazon forests during the 2005 drought.
Why: To demonstrate the importance of considering remote sensing data quality, and reconcile contradicting reports of increased tree mortality and biomass burning with reports of anomalous Amazon forest greening.
How: The authors replicated a previous study claiming that Amazon forests were greening during the 2005 drought, with very specific quality and land cover type screening in their methods. They then used the quality-filtered data to calculate standard anomalies of EVI in order to arrive at their conclusions.
Where: Amazon Rainforest, from latitude: 20° S to 10° N, and longitude: 80° W to 45° W
When: July-September 2000-2008
AppEEARS Request Parameters:
Request time series data for MODIS Enhance Vegetation Indices (EVI) and Land Cover.

  • Date: July 01, 2005 to September 30, 2005
  • Data layers:
    • Terra MODIS Vegetation Indices
    • Combined MODIS Land Cover Type
  • Location: 20° S to 10° N, and longitude: 80° W to 45° W
  • File Format: NetCDF-4
  • Projection: Geographic (EPSG: 4326)

AppEEARS Information:¶

To access AppEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For information on how to make an AppEEARS request, visit: https://lpdaacsvc.cr.usgs.gov/appeears/help

Topics Covered:¶

  1. Setting Up the Working Environment
  2. Downloading AppEEARS Output in Python from Text File
  3. Opening a NetCDF-4 (.nc) File
  4. Reading NetCDF-4 File Metadata
  5. Reading Quality Lookup Tables (LUTs) from CSV Files
  6. Setting Custom QA/QC Filtering (Masking)
  7. Visualizing NetCDF-4 Data
  8. Masking by Land Cover Type
  9. Visualizing NetCDF-4 Data Using Basemap
  10. Generating Statistics on Quality Filtered Data
  11. Automating QA/QC Filtering
  12. Visualizing Time Series
  13. Exporting NetCDF-4 Files

Prerequisites:¶

  • Python 2.7

Libraries:¶

  • urllib2
  • shutil
  • os
  • glob
  • numpy
  • netCDF4
  • matplotlib
  • pandas
  • warnings
  • Basemap
  • scipy

NOTE: You will need to have cURL installed on your OS in order to perform topic two (downloading). If you are unable to download cURL, you can skip step two and download the files needed to run the rest of the tutorial in the section below.

The AppEEARS files used in this tutorial can be downloaded from:¶

The source code used to generate this tutorial can be downloaded from:¶

1. Setting Up A Working Environment¶

First, import the python libraries needed to run this tutorial. If you are missing any of the packages, you will need to download them in order to run the entire notebook.¶

In [1]:
# Set matplotlib plots inline
%matplotlib inline

# Import libraries
import urllib2
import shutil
import os
import glob
import numpy as np
import netCDF4
from netCDF4 import Dataset

Next, set your input/working directory, and create an output directory for the results.¶

In [2]:
# Set input directory, and change working directory
in_dir = 'C:/Users/ckrehbiel/Documents/appeears-tutorials/' # IMPORTANT: Need to update this to reflect your working directory
os.chdir(in_dir)

# Create and set output directory
out_dir = os.path.normpath((os.path.split(in_dir)[0] + os.sep + 'output/' ))+ '\\'
if not os.path.exists(out_dir):
    os.makedirs(out_dir)

2. Downloading AppEEARS Output in Python from Text File¶

Now take an AppEEARS output download text file and download all of your NetCDF-4 output files.¶

AppEEARS Download Textfile Example

For more information on how to retrieve the output download text file from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.¶

In [3]:
# Search for, open, and read text file containing links to AppEEARS output files
download_file = open(glob.glob('**.txt')[0],'r')
download_link = download_file.readlines()

# Loop through text file and download all AppEEARS outputs
for i in range(len(download_link)):  
    file_name = download_link[i].split('/')[-1].strip()
    
    # Download the file
    response = urllib2.urlopen(download_link[i])
    out_file = open(file_name, 'wb')
    shutil.copyfileobj(response, out_file)
    print('Downloaded file: '+ file_name + ' (' + str(i+1) + " of " + str(len(download_link)) + ')')
    out_file.close()
    
# Once you have finished downloading the files, remove variables not needed for the remainder of the tutorial
del download_link, file_name, download_file
Downloaded file: MCD12Q1.051_aid0001.nc (1 of 2)
Downloaded file: MOD13A2.006_aid0001.nc (2 of 2)

Notice that two files were downloaded. For an AppEEARS Area Sample request, if NetCDF-4 is selected as the output format, outputs will be grouped into .nc files by product and by feature.

3. Opening a NetCDF-4 (.nc) File¶

Search for the NetCDF-4 files that you just downloaded in your working directory, create a list of the files, and print the list.¶

In [4]:
# Now that we have files downloaded, search for downloaded files
file_list = glob.glob('**.nc')
file_list
Out[4]:
['MCD12Q1.051_aid0001.nc', 'MOD13A2.006_aid0001.nc']

Notice that the request includes the version 5.1 MODIS Combined Yearly Land Cover Type (MCD12Q1.051) and version 6 MODIS Terra 16-day Vegetation Indices (MOD13A2.006) products. Start with the MOD13A2 version 6 NetCDF file.¶

Notice below that the file_list index is set to 1, to select the MOD13A2.006.nc file above [0,1]. Read the file in by calling the Dataset() function from the NetCDF4 library.¶

In [5]:
# Read file in, starting with MOD13A2 version 6
file_in = Dataset(file_list[1],'r', format = 'NETCDF4')

4. Reading NetCDF-4 File Metadata¶

Below, look at the variables and metadata in the NetCDF-4 file.¶

In [6]:
# Print a list of variables in file
list(file_in.variables)
Out[6]:
[u'crs',
 u'time',
 u'lat',
 u'lon',
 u'_1_km_16_days_EVI',
 u'_1_km_16_days_VI_Quality']

_1_km_16_days_EVI and _1_km_16_days_VI_Quality are the data layers focused on for this tutorial, which are Enhanced Vegetation Index (EVI) and the corresponding quality data that were used in the case study.¶

Below, read in the metadata for the EVI dataset.¶

In [7]:
# Grab variable and variable metadata
v6_info = file_in.variables['_1_km_16_days_EVI']
v6_info
Out[7]:
<type 'netCDF4._netCDF4.Variable'>
int16 _1_km_16_days_EVI(time, lat, lon)
    _FillValue: -3000
    coordinates: time lat lon
    grid_mapping: crs
    valid_min: -2000
    valid_max: 10000
    long_name: 1 km 16 days EVI
    units: EVI
    scale_factor_err: 0.0
    add_offset_err: 0.0
    calibrated_nt: 5
    scale_factor: 0.0001
    add_offset: 0.0
unlimited dimensions: 
current shape = (6, 3600, 4200)
filling on

Above, notice the useful metadata, including things like units, scale factor, and fill value.¶

Below, set the dataset arrays to variables by adding [:].¶

In [8]:
# import the EVI and VI Quality datasets
v6_EVI = file_in.variables['_1_km_16_days_EVI'][:]
v6_QA = file_in.variables['_1_km_16_days_VI_Quality'][:]


# File dimensions
file_in.dimensions.values()
Out[8]:
[<type 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 6,
 <type 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 3600,
 <type 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 4200]

Above, notice the NetCDF-4 file dimensions. 'time' indicates that the file includes 6 timesteps or layers (observations).¶

Next, grab the min/max values from the latitude and longitude arrays that are included in the NetCDF-4 files.¶

In [9]:
# Set lat and lon min/max for EVI data
lat_EVI_min = file_in.variables['lat'][:].min()
lat_EVI_max = file_in.variables['lat'][:].max()
lon_EVI_min = file_in.variables['lon'][:].min()
lon_EVI_max = file_in.variables['lon'][:].max()

5. Reading Quality Lookup Tables (LUTs) from CSV Files¶

In this section, take the quality lookup table (LUT) CSV file from an AppEEARS request, read it into Python using the pandas library, and use it to help set the quality masking parameters.¶

The quality LUT for your request (files ending in lookup.csv) can be found under Supporting Files on the Download Area Sample page:¶

Example of AppEEARS quality LUT csv files

For more information on how to retrieve the quality LUTs from your request, visit the AppEEARS help pages, under Area Samples -> Downloading Your Request.¶

Next, search the working directory for the quality LUT CSV file.¶

In [10]:
# Search for look up table 
lut = glob.glob('**lookup.csv')
lut
Out[10]:
['MOD13A2-006-1-km-16-days-VI-Quality-lookup.csv']

Below, import the pandas library and use the pd.read_csv function to read the CSV file in as a pandas dataframe.¶

In [11]:
import pandas as pd

# Read in the lut
v6_QA_lut = pd.read_csv(lut[0])
v6_QA_lut
Out[11]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
0 2050 Pixel produced, but most probably cloudy Higest quality Climatology No No No Land (Nothing else but land) No No
1 2057 VI produced, but check other QA Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
2 2058 Pixel produced, but most probably cloudy Decreasing quality (0010) Climatology No No No Land (Nothing else but land) No No
3 2061 VI produced, but check other QA Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
4 2062 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology No No No Land (Nothing else but land) No No
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
6 2114 Pixel produced, but most probably cloudy Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
8 2118 Pixel produced, but most probably cloudy Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
11 2182 Pixel produced, but most probably cloudy Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
13 2186 Pixel produced, but most probably cloudy Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
14 2241 VI produced, but check other QA Higest quality High No No No Land (Nothing else but land) No No
15 2242 Pixel produced, but most probably cloudy Higest quality High No No No Land (Nothing else but land) No No
16 2253 VI produced, but check other QA Decreasing quality (0011) High No No No Land (Nothing else but land) No No
17 2254 Pixel produced, but most probably cloudy Decreasing quality (0011) High No No No Land (Nothing else but land) No No
18 2257 VI produced, but check other QA Decreasing quality (0100) High No No No Land (Nothing else but land) No No
19 2258 Pixel produced, but most probably cloudy Decreasing quality (0100) High No No No Land (Nothing else but land) No No
20 2317 VI produced, but check other QA Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
21 2318 Pixel produced, but most probably cloudy Decreasing quality (0011) Climatology Yes No No Land (Nothing else but land) No No
22 2321 VI produced, but check other QA Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
23 2322 Pixel produced, but most probably cloudy Decreasing quality (0100) Climatology Yes No No Land (Nothing else but land) No No
24 2372 VI produced, good quality Lower quality Low Yes No No Land (Nothing else but land) No No
25 2374 Pixel produced, but most probably cloudy Lower quality Low Yes No No Land (Nothing else but land) No No
26 2376 VI produced, good quality Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
27 2378 Pixel produced, but most probably cloudy Decreasing quality (0010) Low Yes No No Land (Nothing else but land) No No
28 2433 VI produced, but check other QA Higest quality Average Yes No No Land (Nothing else but land) No No
29 2441 VI produced, but check other QA Decreasing quality (0010) Average Yes No No Land (Nothing else but land) No No
... ... ... ... ... ... ... ... ... ... ...
287 39309 VI produced, but check other QA Decreasing quality (0011) Average Yes No No Shallow inland water No Yes
288 39313 VI produced, but check other QA Decreasing quality (0100) Average Yes No No Shallow inland water No Yes
289 39317 VI produced, but check other QA Decreasing quality (0101) Average Yes No No Shallow inland water No Yes
290 39361 VI produced, but check other QA Higest quality High Yes No No Shallow inland water No Yes
291 39362 Pixel produced, but most probably cloudy Higest quality High Yes No No Shallow inland water No Yes
292 39381 VI produced, but check other QA Decreasing quality (0101) High Yes No No Shallow inland water No Yes
293 39385 VI produced, but check other QA Decreasing quality (0110) High Yes No No Shallow inland water No Yes
294 39386 Pixel produced, but most probably cloudy Decreasing quality (0110) High Yes No No Shallow inland water No Yes
295 39389 VI produced, but check other QA Decreasing quality (0111) High Yes No No Shallow inland water No Yes
296 39390 Pixel produced, but most probably cloudy Decreasing quality (0111) High Yes No No Shallow inland water No Yes
297 39938 Pixel produced, but most probably cloudy Higest quality Climatology No No Yes Shallow inland water No Yes
298 39966 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology No No Yes Shallow inland water No Yes
299 39970 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology No No Yes Shallow inland water No Yes
300 40022 Pixel produced, but most probably cloudy Decreasing quality (0101) Low No No Yes Shallow inland water No Yes
301 40026 Pixel produced, but most probably cloudy Decreasing quality (0110) Low No No Yes Shallow inland water No Yes
302 40066 Pixel produced, but most probably cloudy Higest quality Average No No Yes Shallow inland water No Yes
303 40090 Pixel produced, but most probably cloudy Decreasing quality (0110) Average No No Yes Shallow inland water No Yes
304 40094 Pixel produced, but most probably cloudy Decreasing quality (0111) Average No No Yes Shallow inland water No Yes
305 40130 Pixel produced, but most probably cloudy Higest quality High No No Yes Shallow inland water No Yes
306 40162 Pixel produced, but most probably cloudy Decreasing quality (1000) High No No Yes Shallow inland water No Yes
307 40166 Pixel produced, but most probably cloudy Decreasing quality (1001) High No No Yes Shallow inland water No Yes
308 40194 Pixel produced, but most probably cloudy Higest quality Climatology Yes No Yes Shallow inland water No Yes
309 40222 Pixel produced, but most probably cloudy Decreasing quality (0111) Climatology Yes No Yes Shallow inland water No Yes
310 40226 Pixel produced, but most probably cloudy Decreasing quality (1000) Climatology Yes No Yes Shallow inland water No Yes
311 40230 Pixel produced, but most probably cloudy Decreasing quality (1001) Climatology Yes No Yes Shallow inland water No Yes
312 40386 Pixel produced, but most probably cloudy Higest quality High Yes No Yes Shallow inland water No Yes
313 40422 Pixel produced, but most probably cloudy Decreasing quality (1001) High Yes No Yes Shallow inland water No Yes
314 40426 Pixel produced, but most probably cloudy Decreasing quality (1010) High Yes No Yes Shallow inland water No Yes
315 51199 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Shallow ocean Yes Yes
316 63487 Pixel not produced due to other reasons than c... Not useful for any other reason/not processed High Yes Yes Yes Moderate or continental ocean Yes Yes

317 rows × 10 columns

Scroll through the table above to see the quality information contained in the LUT. Value refers to the pixel values, followed by the specific bit beginning with MODLAND. AppEEARS decodes each binary quality bit-field into more meaningful information for the user.¶

The case study used the following quality parameters to define their screening process:¶

1. MODLAND_QA flag = 0 or 1¶

    0 = good quality 
    1 = check other QA  

2. VI usefulness <= 11¶

    0000    Highest quality = 0  
    0001    Lower quality = 1  
    0010    Decreasing quality = 2  
    ...  
    1011   Decreasing quality = 11      

3. Adjacent cloud detected, Mixed Clouds, and Possible shadow = 0¶

    0 = No

4. Aerosol Quantity = 1 or 2¶

    1 = low aerosol
    2 = average aerosol  

Below, use the pixel quality flag parameters outlined above in order to filter the data. Pandas makes this easier by allowing you to search the dataframe by column and value, and only keep the specific values that relate to the parameters outlined above.¶

Notice in the line: v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)], it is including all rows where the MODLAND column is either 'VI_produced, good quality' or 'VI produced, but check other QA'.

However, in the line: v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)], by adding the tilde in front of the command, the pandas dataframe will exclude the values in the VIU variable.

In [12]:
# Include good quality based on MODLAND
modland = ['VI produced, good quality', 'VI produced, but check other QA']
v6_QA_lut = v6_QA_lut[v6_QA_lut['MODLAND'].isin(modland)]

# Exclude lower quality VI usefulness
VIU =["Lowest quality","Quality so low that it is not useful","L1B data faulty","Not useful for any other reason\/not processed"]
v6_QA_lut = v6_QA_lut[~v6_QA_lut['VI Usefulness'].isin(VIU)]

# Include low or average aerosol
AQ = ['Low','Average']
v6_QA_lut = v6_QA_lut[v6_QA_lut['Aerosol Quantity'].isin(AQ)]

# Include where adjacent cloud, mixed clouds, or possible shadow were not detected
v6_QA_lut = v6_QA_lut[v6_QA_lut['Adjacent cloud detected'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Mixed Clouds'] == 'No' ]
v6_QA_lut = v6_QA_lut[v6_QA_lut['Possible shadow'] == 'No' ]
v6_QA_lut
Out[12]:
Value MODLAND VI Usefulness Aerosol Quantity Adjacent cloud detected Atmosphere BRDF Correction Mixed Clouds Land/Water Mask Possible snow/ice Possible shadow
5 2112 VI produced, good quality Higest quality Low No No No Land (Nothing else but land) No No
7 2116 VI produced, good quality Lower quality Low No No No Land (Nothing else but land) No No
9 2177 VI produced, but check other QA Higest quality Average No No No Land (Nothing else but land) No No
10 2181 VI produced, but check other QA Lower quality Average No No No Land (Nothing else but land) No No
12 2185 VI produced, but check other QA Decreasing quality (0010) Average No No No Land (Nothing else but land) No No
64 4160 VI produced, good quality Higest quality Low No No No Ocean coastlines and lake shorelines No No
66 4164 VI produced, good quality Lower quality Low No No No Ocean coastlines and lake shorelines No No
68 4225 VI produced, but check other QA Higest quality Average No No No Ocean coastlines and lake shorelines No No
69 4229 VI produced, but check other QA Lower quality Average No No No Ocean coastlines and lake shorelines No No
71 4233 VI produced, but check other QA Decreasing quality (0010) Average No No No Ocean coastlines and lake shorelines No No
104 6208 VI produced, good quality Higest quality Low No No No Shallow inland water No No
106 6212 VI produced, good quality Lower quality Low No No No Shallow inland water No No
108 6273 VI produced, but check other QA Higest quality Average No No No Shallow inland water No No
110 6277 VI produced, but check other QA Lower quality Average No No No Shallow inland water No No
112 6281 VI produced, but check other QA Decreasing quality (0010) Average No No No Shallow inland water No No

6. Setting Custom QA/QC Filtering (Masking)¶

This section demonstrates how to take the list of possible values (based on our quality requirements) above, and mask the EVI and quality arrays to only include pixels where the quality value is in the list of 'good quality' values based on the parameters from the case study.¶

In [13]:
# Print list of possible QA values based on parameters above
v6_QA_mask = list(v6_QA_lut['Value'])
del v6_QA_lut

v6_QA_mask
Out[13]:
[2112,
 2116,
 2177,
 2181,
 2185,
 4160,
 4164,
 4225,
 4229,
 4233,
 6208,
 6212,
 6273,
 6277,
 6281]

So above, only pixels with those specific _1_km_16_days_VI_Quality values will be included after masking.¶

Below, mask the first observation (index 0) from the EVI array. Notice that invert is set to True here, meaning that you don't want to exclude the list of values above, but rather exclude every other value.¶

In [14]:
# Apply QA mask to the EVI data 
v6_EVI_masked = np.ma.MaskedArray(v6_EVI[0], np.in1d(v6_QA[0], v6_QA_mask, invert = True))

7. Visualizing NetCDF-4 Data¶

In this section, begin by highlighting the functionality of matplotlib plotting. First, make a basic plot of the EVI data. Next, add some additional parameters to the plot. Lastly, export the finished plot to an image file.¶

In [15]:
import matplotlib.pyplot as plt

# Visualize a basic plot
plt.imshow(v6_EVI[0]); # Adding ';' to the end of the call suppresses the text output from showing in the notebook 

Above is a basic matplotlib plot of the EVI array over the Amazon basin.¶

Next, add some additional parameters to the visualization.¶

In [16]:
# Set the figure size
plt.figure(figsize = (10,10))

# Remove the axes' values
plt.axis('off')

# Plot the original data, using a colormap and setting a custom linear stretch based on the min/max EVI values
plt.imshow(v6_EVI[0], vmin = v6_EVI[0].min(), vmax = v6_EVI[0].max(), cmap = 'YlGn');