Working with Daily NASA VIIRS Surface Reflectance Data


Objective:

This tutorial demonstrates how to work with the daily S-NPP NASA VIIRS Surface Reflectance (VNP09GA.001) data product.

The Land Processes Distributed Active Archive Center (LP DAAC) distributes the NASA-produced surface reflectance data products from the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor aboard the joint NOAA/NASA Suomi-National Polar-orbiting Partnership (S-NPP) satellite. The S-NPP NASA VIIRS Surface Reflectance products are archived and distributed in the HDF-EOS5 file format. The Hierarchical Data Format version 5 (HDF5) is a NASA selected format of choice for standard science product archival and distribution and is the underlying format for HDF-EOS5. HDF-EOS is the standard format and I/O library for the Earth Observing System (EOS) Data and Information System (EOSDIS). HDF-EOS5 extends the capabilities of the HDF5 storage format, adding support for the various EOS data types (e.g., point, swath, grid) into the HDF5 framework.

In this tutorial, you will use Python to define the coordinate reference system (CRS) and export science dataset (SDS) layers as GeoTIFF files that can be loaded into a GIS and/or Remote Sensing software program.


Example: Converting VNP09GA files into quality-filtered vegetation indices and examining the trend in vegetation greenness during July 2018 to observe drought in Europe.

Data Used in the Example:

  • Data Product: VIIRS/NPP Surface Reflectance Daily L2G Global 1km and 500m SIN Grid Version 1 (VNP09GA.001)
    • Science Dataset (SDS) layers:
      • SurfReflect_M3_1
      • SurfReflect_M4_1
      • SurfReflect_M5_1
      • SurfReflect_M7_1

Topics Covered:

  1. Getting Started
    1a. Import Packages
    1b. Set Up the Working Environment
    1c. Retrieve Files
  2. Importing and Interpreting Data
    2a. Open a VIIRS HDF5-EOS File and Read File Metadata
    2b. Subset SDS Layers and read SDS Metadata
  3. Generating an RGB Composite
    3a. Apply a Scale Factor
    3b. Create an Image
  4. Quality Filtering
    4a. Decode Bit Values
    4b. Apply Quality and Land/Water Masks
  5. Functions
    5a. Defining Functions
    5b. Executing Functions
  6. Visualizing Results
    6a. Plot Quality-Filtered VIs
    6b. Scatterplot Comparison of NDVI vs. EVI
  7. Georeferencing
  8. Exporting Results
    8a. Set Up a Dictionary
    8b. Export Masked Arrays as GeoTIFFs
  9. Working with Time Series
    9a. Automation of Steps 2-5, 7-8
    9b. Time Series Analysis
    9c. Time Series Visualization

Prerequisites:

A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at the link provided.


Procedures:

NOTE: This tutorial was developed for NASA VIIRS Daily Surface Reflectance HDF-EOS5 files and should only be used for the VNP09GA.001 product.


Python Environment Setup

2. Setup

  • Using your preferred command line interface (command prompt, terminal, cmder, etc.) type the following to successfully create a compatible python environment:
    • conda create -n viirstutorial -c conda-forge --yes python=3.7 h5py numpy pandas matplotlib scikit-image gdal
    • conda activate viirstutorial
    • jupyter notebook

If you do not have jupyter notebook installed, you may need to run:

  • conda install jupyter notebook

TIP: Having trouble activating your environment, or loading specific packages once you have activated your environment? Try the following:

Type: 'conda update conda'

If you prefer to not install Conda, the same setup and dependencies can be achieved by using another package manager such as pip and the requirements.txt file listed above.
Additional information on setting up and managing Conda environments.

Still having trouble getting a compatible Python environment set up? Contact LP DAAC User Services.


1. Getting Started

1a. Import Packages

Import the python packages required to complete this tutorial.

In [ ]:
import h5py, os
import numpy as np
import matplotlib.pyplot as plt
from osgeo import gdal, gdal_array
import datetime as dt
import pandas as pd
from skimage import exposure

1b. Set Up the Working Environment

Define the input directory containing the VNP09GA.001 files, and create an output directory to store the results from this tutorial.

Change the inDir variable below to the path of the directory where your data is stored (e.g., C:/Data/VIIRS/VNP09A1/).

In [ ]:
inDir = input("Main input directory: ")                                # Update to dir on your OS containing VIIRS files
os.chdir(inDir)                                                         # Change to working directory
outDir = os.path.normpath(os.path.split(inDir)[0]+os.sep+'output')+'\\' # Set output directory
if not os.path.exists(outDir): os.makedirs(outDir)                      # Create output directory

If you plan to execute this tutorial on your own OS, `inDir` above needs to be changed.


1c. Retrieve Files

In this section, a VNP09GA .h5 file has been downloaded to the inDir defined above.

Make sure that you download the file into the inDir directory defined above to follow along in the tutorial.

In [ ]:
inFile = 'VNP09GA.A2018183.h18v03.001.2018184074906.h5' # File used in example
outName = inFile.rsplit('.', 1)[0]                      # Parse out the file extension, keep file name
outName

The standard format for VIIRS filenames is as follows:

VNP09GA: Product Short Name
A2018183: Julian Date of Acquisition (AYYYYDDD)
h18v03: Tile Identifier (horizontalXXverticalYY)
001: Collection Version
2018184074906: Julian Date of Production (YYYYDDDHHMMSS)

Below, take the Julian Date of Acquisition and convert to calendar date using the datetime package.

In [ ]:
yeardoy = inFile.split('.')[1][1:]                                  # Split filename to retrieve acquisition date
date = dt.datetime.strptime(yeardoy, '%Y%j').strftime('%m/%d/%Y')   # Convert YYYYDDD to MM/DD/YYYY
date

2. Importing and Interpreting Data

2a. Open a VIIRS HDF-EOS5 File and Read File Metadata

Read in a VNP09GA HDF-EOS5 file using the h5py package.

In [ ]:
f = h5py.File(inFile, "r") # Read in VIIRS HDF-EOS5 file and set to variable
In [ ]:
list(f.keys()) # List groups in file

The VNP09GA HDF-EOS5 file contains two groups in which data and metadata are stored.

First, the HDFEOS group contains the VNP09GA SDS layers.

In [ ]:
list(f['HDFEOS']) # List contents of HDFEOS group

Continue into the GRIDS directory.

In [ ]:
grids = list(f['HDFEOS']['GRIDS']) # List contents of GRIDS directory
grids

TheVNP09GA data product provides data for three imagery bands (I1, I2, I3) at nominal 500 meter resolution (~ 463 meter) and nine moderate-resolution bands (M1, M2, M3, M4, M5, M7, M8, M10, M11) at nominal 1 kilometer (km) (~926 meter) resolution. This example uses the nominal 1 km M bands.

In [ ]:
list(f['HDFEOS']['GRIDS']['VNP_Grid_1km_2D']) # List contents of 1km directory

Below, list the SDS contained in the HDF-EOS5 file at 1 km resolution.

In [ ]:
list(f['HDFEOS']['GRIDS']['VNP_Grid_1km_2D']['Data Fields']) # List Data Fields

Second, the HDFEOS INFORMATION group contains the metadata for the file. The StructMetadata.0 object stores the file metadata.

In [ ]:
list(f['HDFEOS INFORMATION']) # List contents of HDFEOS group

Below, read the file metadata and store as a list.

In [ ]:
fileMetadata = f['HDFEOS INFORMATION']['StructMetadata.0'][()].split()   # Read file metadata
fileMetadata = [m.decode('utf-8') for m in fileMetadata]                 # Clean up file metadata
fileMetadata[0:33]                                                       # Print a subset of the entire file metadata record

Identify all the objects in the VNP09GA HDF-EOS5 file below.

In [ ]:
h5_objs = []            # Create empty list
f.visit(h5_objs.append) # Walk through directory tree, retrieve objects and append to list
h5_objs

2b. Subset SDS Layers and read SDS Metadata

Identify and generate a list of all the SDS layers in the HDF-EOS5 file.

In [ ]:
# Search for SDS with 1km or 500m grid
all_datasets = [obj for grid in grids for obj in h5_objs if isinstance(f[obj],h5py.Dataset) and grid in obj] 
all_datasets

Read in the near infrared (SurfReflect_M7), red (SurfReflect_M5), green (SurfReflect_M4), and blue (SurfReflect_M3) SDS layers from the HDF-EOS5 file. These layers will be used later in the tutorial to calculate daily vegetation indices.

In [ ]:
r = f[[a for a in all_datasets if 'M5' in a][0]] # M5 = Red
g = f[[a for a in all_datasets if 'M4' in a][0]] # M4 = Green
b = f[[a for a in all_datasets if 'M3' in a][0]] # M3 = Blue
n = f[[a for a in all_datasets if 'M7' in a][0]] # M7  = NIR

List the attributes for one of the SDS.

In [ ]:
list(r.attrs) # List attributes

Extract the scale factor and fill value from the SDS metadata.

In [ ]:
scaleFactor = r.attrs['Scale'][0]    # Set scale factor to a variable
fillValue = r.attrs['_FillValue'][0] # Set fill value to a variable

3. Generating an RGB Composite

3a. Apply a Scale Factor

Extract the data for each SDS and apply the scale factor defined in the previous section. The scale factor is the same for all surface reflectance M-bands, so you can simply use the scale factor from the red band for the other bands.

In [ ]:
red = r[()] * scaleFactor
green = g[()] * scaleFactor
blue = b[()] * scaleFactor
nir = n[()] * scaleFactor 

Create an RGB array using np.dstack() and set the fill values equal to nan.

In [ ]:
rgb = np.dstack((red,green,blue))            # Create RGB array
rgb[rgb == fillValue * scaleFactor] = 0      # Set fill value equal to 0

3b. Create an Image

Before filtering the data for poor quality observations such as clouds, visualize the non-filtered RGB image using the matplotlib package.

In [ ]:
# Set plots inside of the Jupyter Notebook
%matplotlib inline 

Below, use the skimage package's exposure module (http://scikit-image.org/docs/dev/api/skimage.exposure.html) to apply a contrast stretch and gamma correction to the RGB composite for a better looking image.

In [ ]:
p2, p98 = np.percentile(rgb, (2, 98))                              # Calculate 2nd,98th percentile for updating min/max vals
rgbStretched = exposure.rescale_intensity(rgb, in_range=(p2, p98)) # Perform contrast stretch on RGB range
rgbStretched = exposure.adjust_gamma(rgbStretched, 0.5)            # Perform Gamma Correction

Note: stretching the array above inherently changes the values of the array, and those values should be used for visualization purposes only.

Take the stretched RGB composite and plot the results.

In [ ]:
    fig = plt.figure(figsize =(10,10))                           # Set the figure size
    ax = plt.Axes(fig,[0,0,1,1]) 
    ax.set_axis_off()                                            # Turn off axes
    fig.add_axes(ax)
    ax.imshow(rgbStretched, interpolation='bilinear', alpha=0.9) # Plot a natural color RGB
    fig.savefig('{}{}_RGB.png'.format(outDir, outName))          # Export natural color RGB as jpeg

Above, the natural color RGB composite shows some cloud cover over VIIRS tile h18v03. This tile covers parts of England, France, Belgium, the Netherlands, Germany, Poland, Denmark, Norway, and Sweden. Notice the dry conditions (tan-brown) particularly evident over Denmark, Germany, and Poland.

In [ ]:
del rgbStretched # Remove the rgb array (no longer needed)

4. Quality Filtering

This section demonstrates how to decode bit values, determine which values to keep and which values to mask, and how to apply masks to the VNP09GA data arrays. Begin by importing the Quality Flag 5 from the VNP09GA HDF-EOS5 file.

In [ ]:
qf = f[[a for a in all_datasets if 'QF5' in a][0]][()] # Import QF5 SDS

Surface Reflectance Quality Flag QF5 contains an overall quality determination (good/bad) for each 1 km surface reflectance band (M1-M7).

Table 14. Surface Reflectance Quality Flags (QF5).

Above are the bit field mappings for Quality Flag layer 5. This table can also be found in the VIIRS SR Version 1.6 User Guide under Section 3.3, Table 14 (pp. 34-35). Below, use the 4th, 5th, 6th, and 7th bits to assure that the overall quality for the bands used in this example (M3-5, M7) are set to 0, or Good.

Note: Remember that bits are read from right to left (bit 0 is the Least Significant Bit).


4a. Decode Bit Values

Below, set the number of bits and create a list of all possible combinations (bit values). Based on the table above, only include pixels where the QF5 layer values are equal to 0 for bit numbers 4-7, meaning that the overall quality for bands M3-M5 and M7 are set to Good.

In [ ]:
bits = 8                        # Define number of bits
vals = list(range(0,(2**bits))) # Generate a list of all possible bit values 
goodQF = []                     # Create an empty list used to store bit values where bits 4-7 = 0

Next, loop through all possible values, convert numeric to binary, find values where bits 4-7 = 0000, and add them to the list of good values.

In [ ]:
for v in vals:  
    bitVal = format(vals[v],'b').zfill(bits)             # Convert to binary based on values and # of bits defined above:
    if bitVal[0:4] == '0000':                            # Keep if bit-words 5-8 to all = 0
        goodQF.append(vals[v])                           # Append to list
        print('\n' + str(vals[v]) + ' = ' + str(bitVal)) # print good quality values

Notice above, all of the values determined to be good quality contain 0000 reading left to right, which represent bits 4-7.


4b. Apply Quality and Land/Water Masks

Use np.ma.MaskedArray() and the list of good quality values to apply a QA mask to the surface reflectance data from the QF5 layer.

In [ ]:
# Apply mask to each SR SDS based on the qf layer and the list of good quality values generated above
red = np.ma.MaskedArray(red, np.in1d(qf, goodQF, invert = True))    
green = np.ma.MaskedArray(green, np.in1d(qf, goodQF, invert = True))    
blue = np.ma.MaskedArray(blue, np.in1d(qf, goodQF, invert = True))    
nir = np.ma.MaskedArray(nir, np.in1d(qf, goodQF, invert = True))    

Below, run through the same steps as above, this time using the Surface Reflectance Quality Flag QF2, which contains a land/water mask that can be used to mask out water pixels in the surface reflectance arrays. More information on QF2 can be found in Section 3.3, Table 11 (pp. 32-33) in the VIIRS SR Version 1.6 User Guide.

In [ ]:
qf2 = f[[a for a in all_datasets if 'QF2' in a][0]][()]   # Import QF5 SDS
land = []                                                 # Create an empty list used to store bit values classified as land

Below, exclude each value where bits 0-2 are classified as:

010: Inland water
011: Sea Water

In [ ]:
for v in vals:  
    bitVal = format(vals[v],'b').zfill(bits)           # Convert to binary based on values and # of bits defined above:
    if bitVal[-3:] != '010' and bitVal[-3:] != '011' : # Keep all values NOT equal to inland or sea water
        land.append(vals[v])                           # Append to list
        
# Apply mask to each SR SDS based on the qf2 layer and the list of land values generated above
red = np.ma.MaskedArray(red, np.in1d(qf2, land, invert = True))    
green = np.ma.MaskedArray(green, np.in1d(qf2, land, invert = True))    
blue = np.ma.MaskedArray(blue, np.in1d(qf2, land, invert = True))    
nir = np.ma.MaskedArray(nir, np.in1d(qf2, land, invert = True))    

5. Functions

5a. Defining Functions

Below, define two functions for calculating the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI).

In Python, to create a function, first define (def) the function which includes a name ndviCalc() and add variable names for the inputs needed to be passed into the function (red, nir). Adding return on the next line means that every time that the ndviCalc() function is executed, it will return the results of the equation ((nir - red)/(nir + red)) based on the input provided (red, nir).

In [ ]:
# Define a function to calculate NDVI
def ndviCalc(red, nir):
    return ((nir - red)/(nir + red))

# Define a function to calculate EVI
def eviCalc(red, nir, blue):
    return (2.5 * (nir - red)/(nir + 6 * red - 7.5 * blue + 1))  

5b. Executing Functions

Next, apply the functions to the masked surface reflectance data by passing the bands needed to calculate each Vegetation Index.

In [ ]:
ndvi = ndviCalc(red, nir)                                   # Calculate NDVI from red/NIR bands
evi = eviCalc(red, nir, blue)                               # Calculate EVI from red, NIR, and blue bands
ndvi[np.where(np.logical_or(ndvi < 0, ndvi > 1)) ] = np.nan # Exclude VIs outside of range
evi[np.where(np.logical_or(evi < 0, evi > 1)) ] = np.nan    # Exclude VIs outside of range

6. Visualizing Results

6a. Plot Quality-Filtered VIs

In this section, begin by highlighting the functionality of the matplotlib plotting package. First, make basic plots of the NDVI and EVI results. Next, combine both plots into a single image and add some additional parameters to the plot. Finally, export the completed plot to an image file.

In [ ]:
fig = plt.figure(figsize=(10,10)) # Set the figure size (x,y)
fig.set_facecolor("black")        # Set the background color to black
plt.axis('off')                   # Remove axes from plot
plt.imshow(ndvi, cmap = 'YlGn');  # Plot the array using a color map

Above, adding ; to the end of the call suppresses the text output from appearing in the notebook. The output is a basic matplotlib plot of the quality filtered, water masked NDVI array centered over Germany and Denmark in Europe. Below, create the same plot for EVI.

In [ ]:
fig = plt.figure(figsize=(10,10)) # Set the figure size (x,y)
fig.set_facecolor("black")        # Set the background color to black
plt.axis('off')                   # Remove axes from plot
plt.imshow(evi, cmap = 'YlGn');   # Plot the array using a color map

Next, add some additional parameters to the visualization and combine both plots into a single figure. Export the figure as a .png file to the output directory created earlier in the tutorial.

In [ ]:
t = 'VIIRS Vegetation Indices'                                                           # Set title to a variable
figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), sharey=True, sharex=True) # Set subplots with 1 row & 2 columns
axes[0].axis('off'), axes[1].axis('off')                                                 # Turn off axes' values
figure.set_facecolor("black")                                                            # Set the background color to black
figure.suptitle('{}: {}'.format(t, date), fontsize=24, fontweight='bold', color="white") # Add a title for the plots
axes[0].set_title('NDVI', fontsize=18, fontweight='bold', color="white")                 # Set the first subplot title
axes[1].set_title('EVI', fontsize=18, fontweight='bold', color="white")                  # Set the second subplot title
axes[0].imshow(ndvi, vmin=0.1, vmax=0.9, cmap='YlGn');                                   # Plot original data
axes[1].imshow(evi, vmin=0.1, vmax=0.9, cmap='YlGn');                                    # Plot Quality-filtered data
figure.savefig('{}VIIRS_VIs.png'.format(outDir), facecolor = "black")                    # Set up filename, export to png 

6b. Scatterplot Comparison of NDVI vs. EVI

Plot all NDVI vs. EVI values for the quality filtered, land-only data derived from VIIRS Surface Reflectance on July 2, 2018.

In [ ]:
t = 'VIIRS Surface Reflectance-derived EVI vs. NDVI'                    # Set title string to variable
figure, axes = plt.subplots(figsize=(10,8))                             # Set the figure size
figure.suptitle('{}: {}'.format(t, date),fontsize=22,fontweight='bold') # Add a title
figure.set_facecolor("white")                                           # Set background color to white
axes.set_ylabel('EVI',fontsize=18,fontweight='bold')                    # Set y axis labels
axes.set_xlabel('NDVI',fontsize=18,fontweight='bold'),                  # Set x axis labels
axes.scatter(ndvi,evi, alpha=1, c='darkgreen', s=1);                    # Plot values for NDVI and EVI

7. Georeferencing

In this section, identify the upper-left longitude and latitude coordinates from the metadata, the output pixel size, and the proj.4 string that will later be used to export the quality filtered, land-only VIIRS Vegetation Indices data that were created earlier in the tutorial as GeoTIFFs.

In [ ]:
ulc = [i for i in fileMetadata if 'UpperLeftPointMtrs' in i][0]    # Search file metadata for the upper left corner of the file
ulcLon = float(ulc.split('=(')[-1].replace(')', '').split(',')[0]) # Parse metadata string for upper left corner lon value
ulcLat = float(ulc.split('=(')[-1].replace(')', '').split(',')[1]) # Parse metadata string for upper left corner lat value

Currently, VIIRS HDF-EOS5 files do not contain information regarding the spatial resolution of the dataset within, so here set to the standard nominal resolution of 1 km for this product.

In [ ]:
yRes, xRes = -926.6254330555555,  926.6254330555555 # Define the x and y resolution   
geoInfo = (ulcLon, xRes, 0, ulcLat, 0, yRes)        # Define geotransform parameters

The VNP09GA.001 product is referenced to a projected Sinusoidal coordinate system identical to the one used for the Moderate Resolution Imaging Spectroradiometer (MODIS) collection, defined below.

In [ ]:
prj = 'PROJCS["unnamed",\
GEOGCS["Unknown datum based upon the custom spheroid", \
DATUM["Not specified (based on custom spheroid)", \
SPHEROID["Custom spheroid",6371007.181,0]], \
PRIMEM["Greenwich",0],\
UNIT["degree",0.0174532925199433]],\
PROJECTION["Sinusoidal"], \
PARAMETER["longitude_of_center",0], \
PARAMETER["false_easting",0], \
PARAMETER["false_northing",0], \
UNIT["Meter",1]]'
print(prj)

8. Exporting Results

8a. Set Up a Dictionary

In this section, create a dictionary containing each of the arrays that will be exported as GeoTIFFS, including the Surface Reflectance bands, the derived vegetation indices, and quality flag layers used above and define a band name for each that will be used to set the output filename.

In [ ]:
params = {'red':{'data':red, 'band': 'M5'}, 'green':{'data':green, 'band': 'M4'},'blue':{'data':blue, 'band': 'M3',},
          'ndvi':{'data':ndvi, 'band': 'NDVI'},'evi':{'data':evi, 'band': 'EVI'}, 'qf':{'data':qf, 'band': 'QF'},
          'qf2':{'data':qf2, 'band': 'QF2'}, 'rgb':{'data':rgb, 'band': 'RGB'}}

8b. Export Masked Arrays as GeoTIFFs

Now that the data have been imported, subset, filtered, and functions have been executed, steps to export the results as masked GeoTIFFs using a for loop are provided in this section.

In [ ]:
# Loop through each item in dictionary created above
for p in params:
    try: 
        data = params[p]['data']                                                            # Define the array to be exported
        data[data.mask == True] = fillValue                                                 # Set masked values = to fill value
    except: AttributeError
    outputName = os.path.normpath('{}{}_{}.tif'.format(outDir, outName, params[p]['band'])) # Generate output filename
    nRow, nCol = data.shape[0], data.shape[1]                                               # Define rows/cols from array size
    dataType = gdal_array.NumericTypeCodeToGDALTypeCode(data.dtype)                         # Define output data type
    driver = gdal.GetDriverByName('GTiff')                                                  # Select GDAL GeoTIFF driver
    if p == 'rgb':                                                                          # Diff process for exporting RGB
        data = params[p]['data']                                                            # Define the array to be exported
        dataType = gdal_array.NumericTypeCodeToGDALTypeCode(data.dtype)                     # Define output data type
        options = ['PHOTOMETRIC=RGB', 'PROFILE=GeoTIFF']                                    # Set options to RGB TIFF
        outFile = driver.Create(outputName, nCol, nRow, 3, dataType, options=options)       # Specify parameters of the GeoTIFF
        for b in range(data.shape[2]):                                                      # loop through each band (3)
            outFile.GetRasterBand(b+1).WriteArray(rgb[:,:,b])                               # Write to output bands 1-3
            outFile.GetRasterBand(b+1).SetNoDataValue(0)                                    # Set fill value for each band
        outFile.GetRasterBand(1).SetColorInterpretation(gdal.GCI_RedBand)                   # Define red band
        outFile.GetRasterBand(2).SetColorInterpretation(gdal.GCI_GreenBand)                 # Define green band
        outFile.GetRasterBand(3).SetColorInterpretation(gdal.GCI_BlueBand)                  # Define blue band
    else:
        outFile = driver.Create(outputName, nCol, nRow, 1, dataType)                        # Specify parameters of the GeoTIFF
        band = outFile.GetRasterBand(1)                                                     # Get band 1
        band.WriteArray(data)                                                               # Write data array to band 1
        band.FlushCache                                                                     # Export data
        band.SetNoDataValue(float(fillValue))                                               # Set fill value
    outFile.SetGeoTransform(geoInfo)                                                        # Set Geotransform
    outFile.SetProjection(prj)                                                              # Set projection
    outFile = None                                                                          # Close file
    print('Processing: {}'.format(outputName))                                              # Print the progress

9. Working with Time Series

Here, automate steps 2-5, 7-8 from above to get data for the entire month of July for visualization and statistical analysis of the progression of drought conditions in Europe.

The cell below will find all the .h5 files in that directory, create a list of all the file names, and print the results.

In [ ]:
fileList = [file for file in os.listdir() if file.endswith('.h5') and file.startswith('VNP09GA')] # Search for .h5 files in current directory
for f in fileList: print(f)                                                                       # Print files in list
In [ ]:
date = [] # Create empty list to store dates of each file
i = 0     # Set up iterator for automation in cell block below

9a. Automation of Steps 2-5, 7-8

This section provides steps to loop through each daily VNP09GA file from July 2018 and import the data, scale the data, apply a land/water mask and filter by quality, calculate VIs, georeference, and export as GeoTIFFs.

Ultimately, near the end of the code block below, append each file into a 3D array containing all NDVI observations for July 2018.

In [ ]:
for t in fileList:
    yeardoy = t.split('.')[1][1:]                                                                  # Split name,retrieve ob date
    outName = t.rsplit('.', 1)[0]                                                                  # Keep filename for outname
    date1 = dt.datetime.strptime(yeardoy,'%Y%j').strftime('%m/%d/%Y')                              # Convert date
    date.append(date1)                                                                             # Append to list of dates
    f = h5py.File(os.path.normpath(t),"r")                                                             # Read in VIIRS HDF-EOS5 file
    h5_objs = []                                                                                   # Create empty list
    f.visit(h5_objs.append)                                                                        # Retrieve obj append to list
    fileMetadata = f['HDFEOS INFORMATION']['StructMetadata.0'][()].split()                         # Read file metadata
    fileMetadata = [m.decode('utf-8') for m in fileMetadata]                                       # Clean up file metadata
    allSDS = [o for grid in grids for o in h5_objs if isinstance(f[o],h5py.Dataset) and grid in o] # Create list of SDS in file
    r = f[[a for a in allSDS if 'M5' in a][0]]                                                     # Open SDS M5 = Red
    g = f[[a for a in allSDS if 'M4' in a][0]]                                                     # Open SDS M4 = Green
    b = f[[a for a in allSDS if 'M3' in a][0]]                                                     # Open SDS M3 = Blue
    n = f[[a for a in allSDS if 'M7' in a][0]]                                                     # Open SDS M7  = NIR
    red = r[()] * scaleFactor                                                                      # Apply scale factor
    green = g[()] * scaleFactor                                                                    # Apply scale factor
    blue = b[()] * scaleFactor                                                                     # Apply scale factor
    nir = n[()] * scaleFactor                                                                      # Apply scale factor
    rgb = np.dstack((red,green,blue))                                                              # Create RGB array
    rgb[rgb == fillValue * scaleFactor] = 0                                                        # Set fill value equal to nan
    qf = f[[a for a in allSDS if 'QF5' in a][0]][()]                                               # Import QF5 SDS
    red = np.ma.MaskedArray(red, np.in1d(qf, goodQF, invert = True))                               # Apply QF mask to red data
    green = np.ma.MaskedArray(green, np.in1d(qf, goodQF, invert = True))                           # Apply QF mask to green data
    blue = np.ma.MaskedArray(blue, np.in1d(qf, goodQF, invert = True))                             # Apply QF mask to blue data
    nir = np.ma.MaskedArray(nir, np.in1d(qf, goodQF, invert = True))                               # Apply QF mask to NIR data
    qf2 = f[[a for a in allSDS if 'QF2' in a][0]][()]                                              # Import QF2 SDS                                                                  # Append to list
    red = np.ma.MaskedArray(red, np.in1d(qf2, land, invert = True))                                # Apply QF mask to red data
    green = np.ma.MaskedArray(green, np.in1d(qf2, land, invert = True))                            # Apply QF mask to green data
    blue = np.ma.MaskedArray(blue, np.in1d(qf2, land, invert = True))                              # Apply QF mask to blue data
    nir = np.ma.MaskedArray(nir, np.in1d(qf2, land, invert = True))                                # Apply QF mask to NIR data
    ndvi = ndviCalc(red, nir)                                                                      # Calculate NDVI
    evi = eviCalc(red, nir, blue)                                                                  # Calculate EVI
    ndvi[np.where(np.logical_or(ndvi < 0, ndvi > 1)) ] = np.nan                                    # Set fill value equal to nan
    evi[np.where(np.logical_or(evi < 0, evi > 1)) ] = np.nan                                     # Set fill value equal to nan
    p2, p98 = np.percentile(rgb, (2, 98))                                                          # Calc val at 2nd/98th % 
    rgbStretched = exposure.rescale_intensity(rgb, in_range=(p2, p98))                             # Contrast stretch RGB range
    rgbStretched = exposure.adjust_gamma(rgbStretched, 0.5)                                        # Perform Gamma Correction
    fig = plt.figure(figsize =(10,10))                                                             # Set the figure size
    ax = plt.Axes(fig,[0,0,1,1]) 
    ax.set_axis_off()                                                                              # Turn off axes
    fig.add_axes(ax)
    ax.imshow(rgbStretched, interpolation='bilinear', alpha=0.9)                                   # Plot a natural color RGB
    fig.savefig('{}{}_RGB.png'.format(outDir, outName))                                            # Export  RGB as png
    plt.close(fig)
    params = {'red':{'data':red, 'band': 'M5'}, 'green':{'data':green, 'band': 'M4'},              # Create dict for each layer
              'blue':{'data':blue, 'band': 'M3',}, 'ndvi':{'data':ndvi, 'band': 'NDVI'},
              'evi':{'data':evi, 'band': 'EVI'}, 'qf':{'data':qf, 'band': 'QF'},
              'qf2':{'data':qf2, 'band': 'QF2'}, 'rgb':{'data':rgb, 'band': 'RGB'}}
    for p in params:
        try: 
            data = params[p]['data']                                                               # Define array to be exported
            data[data.mask == True] = fillValue                                                    # Masked values = fill value
        except: AttributeError
        outputName = os.path.normpath('{}{}_{}.tif'.format(outDir, outName, params[p]['band']))    # Generate output filename
        nRow, nCol = data.shape[0], data.shape[1]                                                  # Define row/col from array
        dataType = gdal_array.NumericTypeCodeToGDALTypeCode(data.dtype)                            # Define output data type
        driver = gdal.GetDriverByName('GTiff')                                                     # Select GDAL GeoTIFF driver
        if p == 'rgb':                                                                             # Diff for exporting RGBs
            data = params[p]['data']                                                               # Define the array to export
            dataType = gdal_array.NumericTypeCodeToGDALTypeCode(data.dtype)                        # Define output data type
            options = ['PHOTOMETRIC=RGB', 'PROFILE=GeoTIFF']                                       # Set options to RGB TIFF
            outFile = driver.Create(outputName, nCol, nRow, 3, dataType, options=options)          # Specify parameters of GTIFF
            for b in range(data.shape[2]):                                                         # loop through each band (3)
                outFile.GetRasterBand(b+1).WriteArray(rgb[:,:,b])                                  # Write to output bands 1-3
                outFile.GetRasterBand(b+1).SetNoDataValue(0)                                       # Set fill val for each band
            outFile.GetRasterBand(1).SetColorInterpretation(gdal.GCI_RedBand)                      # Define red band
            outFile.GetRasterBand(2).SetColorInterpretation(gdal.GCI_GreenBand)                    # Define green band
            outFile.GetRasterBand(3).SetColorInterpretation(gdal.GCI_BlueBand)                     # Define blue band
        else:
            outFile = driver.Create(outputName, nCol, nRow, 1, dataType)                           # Specify parameters of GTIFF
            band = outFile.GetRasterBand(1)                                                        # Get band 1
            band.WriteArray(data)                                                                  # Write data array to band 1
            band.FlushCache                                                                        # Export data
            band.SetNoDataValue(float(fillValue))                                                  # Set fill value
        outFile.SetGeoTransform(geoInfo)                                                           # Set Geotransform
        outFile.SetProjection(prj)                                                                 # Set projection
        outFile = None                                                                             # Close file
    if i == 0:                                                                                     # Set up 3d raster stack
        ndviTS = ndvi[np.newaxis,:,:]                                                              # Add 3rd axis for stacking 
    else:
        ndvi  = ndvi[np.newaxis,:,:]                                                               # Add a third axis
        ndviTS = np.ma.append(ndviTS,ndvi, axis=0)                                                 # Append to 'master' 3d array
    print('Processed file: {} of {}'.format(i+1,len(fileList)))                                    # Print the progress
    i += 1                                                                                         # increase iterator by one

Notice above that the for loop also stacked the monthly observations into a three-dimensional array that we will use to generate time series visualizations in the final section.

In [ ]:
ndviTS = np.ma.MaskedArray(ndviTS, np.in1d(ndviTS, fillValue, invert = False)) # Mask fill values in NDVI time series (3d array)

9b. Time Series Analysis

This section provides steps for how to calculate basic statistics on the NDVI time series created above, export the results to a .csv file, and visualize the mean and standard deviation of NDVI over Central Europe to gain a better understanding of the progression of drought in the region.

First set a pandas dataframe to store the statistics.

In [ ]:
# Create dataframe with column names
stats_df = pd.DataFrame(columns=['File Name', 'Dataset', 'Date', 'Count', 'Minimum', 'Maximum', 'Range','Mean','Median',
                        'Upper Quartile', 'Lower Quartile', 'Upper 1.5 IQR', 'Lower 1.5 IQR', 'Standard Deviation', 'Variance'])
Statistics: The statistics calculated below include box plot statistics, such as the total number of values/pixels (n), the median, and the interquartile range (IQR) of the sample data for values within a feature from a selected date/layer. Also calculated are the `Lower 1.5 IQR`, which represent the lowest datum still within 1.5 IQR of the lower quartile, and the `Upper 1.5 IQR`, which represent the highest datum still within 1.5 IQR of the upper quartile. Additional statistics generated on each layer/observation/feature combination include min/max/range, mean, standard deviation, and variance.

Below, use Numpy functions to calculate statistics on the quality-masked, land-only data. The format( , '4f') command will ensure that the statistics are rounded to four decimal places.

In [ ]:
i = 0                                                                     # Set up iterator
for n in ndviTS:                                                          # Loop through each layer (day) in NDVI time series
    ndvi = n.data[~n.mask]                                                # Exclude poor quality and water (masked) pixels
    ndvi = ndvi[~np.isnan(ndvi)]                                          # Exclude nan pixel values
    n = len(ndvi)                                                         # Count of array
    min_val = float(format((np.min(ndvi)), '.4f'))                        # Minimum value in array
    max_val = float(format((np.max(ndvi)), '.4f'))                        # Maximum value in array
    range_val = (min_val, max_val)                                        # Range of values in array
    mean = float(format((np.mean(ndvi)), '.4f'))                          # Mean of values in array
    std = float(format((np.std(ndvi)), '.4f'))                            # Standard deviation of values in array
    var = float(format((np.var(ndvi)), '.4f'))                            # Variance of values in array
    median = float(format((np.median(ndvi)), '.4f'))                      # Median of values in array
    quartiles = np.percentile(ndvi, [25, 75])                             # 1st (25) & 3rd (75) quartiles of values in array 
    upper_quartile = float(format((quartiles[1]), '.4f'))                      
    lower_quartile = float(format((quartiles[0]), '.4f'))                      
    iqr = quartiles[1] - quartiles[0]                                     # Interquartile range
    iqr_upper = upper_quartile + 1.5 * iqr                                # 1.5 IQR of the upper quartile
    iqr_lower = lower_quartile - 1.5 * iqr                                # 1.5 IQR of the lower quartile
    top = float(format(np.max(ndvi[ndvi <= iqr_upper]), '.4f'))           # Highest datum within 1.5 IQR of upper quartile
    bottom = float(format(np.min(ndvi[ndvi>=iqr_lower]),'.4f'))           # Lowest datum within 1.5 IQR of lower quartile
    d = {'File Name':(fileList[i]),'Dataset': 'NDVI','Date': date[i],
         'Count':n,'Minimum':min_val,'Maximum':max_val,'Range':range_val, 
         'Mean':mean,'Median':median,'Upper Quartile':upper_quartile,
         'Lower Quartile':lower_quartile,'Upper 1.5 IQR':top, 
         'Lower 1.5 IQR':bottom,'Standard Deviation':std, 'Variance':var} 
    stats_df = stats_df.append(d, ignore_index=True)                      # Append the dictionary to the dataframe
    i = i + 1                                                             # Add one to iterator
In [ ]:
# Remove variables that are no longer needed
del d, n, min_val, max_val, range_val, mean, std, var, median, quartiles, upper_quartile 
del bottom,  axes, yeardoy, lower_quartile, iqr, iqr_upper, iqr_lower, top

Next, export the Pandas dataframe containing the statistics to a csv file.

In [ ]:
stats_df.to_csv('{}{}-Statistics.csv'.format(outDir, outName), index = False) # Export statistics to CSV

Use the previously calculated statistics to plot a time series of Mean NDVI for the month of July 2018. Taking the figure one step further, plot the standard deviation as a confidence interval around the mean NDVI as a time series.

In [ ]:
fig = plt.figure(1, figsize=(15, 10))                                                           # Set the figure size
fig.set_facecolor("white")                                                                      # Set the background color
dates = np.arange(0,len(ndviTS),1)                                                              # Arrange all dates
ax = fig.add_subplot(111)                                                                       # Create a subplot
ax.plot(stats_df['Mean'], 'k', lw=2.5, color='black')                                           # Plot as a black line
ax.plot(stats_df['Mean'], 'bo', ms=10, color='lightgreen')                                      # Plot as a green circle

# Add and subtract the std dev from the mean to form the upper and lower intervals for the plot, then plot linear regression
ax.fill_between(dates,stats_df.Mean-stats_df['Standard Deviation'],stats_df.Mean+stats_df['Standard Deviation'],color='#254117') 
ax.plot(np.unique(dates), np.poly1d(np.polyfit(dates, stats_df['Mean'], 1))(np.unique(dates)), lw=3.5, c='yellow'); 
ax.set_xticks((np.arange(0,len(fileList))))                                                     # Set the x ticks
ax.set_xticklabels(date, rotation=45,fontsize=12)                                               # Set the x tick labels
ax.set_yticks((np.arange(0.4,1.01, 0.1)))                                                       # Arrange the y ticks
ax.set_yticklabels(np.arange(0.4,1.01, 0.1).round(1),fontsize=12,fontweight='bold')                      # Set the Y tick labels
ax.set_xlabel('Date',fontsize=16,fontweight='bold')                                             # Set x-axis label
ax.set_ylabel("{}({})".format('NDVI', 'unitless'),fontsize=16,fontweight='bold')                # Set y-axis label
ax.set_title('VIIRS Surface Reflectance-derived NDVI: July 2018',fontsize=22,fontweight='bold') # Set title
fig.savefig('{}{}_{}_Mean_TS.png'.format(outDir, outName, 'NDVI'), bbox_inches='tight')         # Set up filename and export

Above, based on the linear regression trend line, it appears that NDVI is decreasing throughout the month of July 2018 over Germany, Denmark, and the surrounding countries. This could indicate that the drought conditions were persisting throughout the month.


9c. Time Series Visualization

Now import the imageio package to create a gif of the daily RGB composites from earlier in the tutorial to visually observe drought conditions taking place over the month of July 2018.

In [ ]:
import imageio                                                                 # Import package to create gif
gifList = [file for file in os.listdir(outDir) if file.endswith('RGB.png')]    # Search output dir for rgb png files 
images = []                                                                    # Create empty list
for filename in gifList:images.append(imageio.imread(outDir + filename))       # Append images to list
imageio.mimsave(outDir + 'VIIRS_RGB_July2018.gif', images, 'GIF', duration=1) # Export gif, edit speed using 'duration'

Lastly, visualize the gif created above in the Jupyter Notebook.

In [ ]:
from IPython.display import Image, display               # Import packages   
with open(outDir + 'VIIRS_RGB_July2018.gif','rb') as f:  # Open gif
    display(Image(data=f.read(), format='png'))          # Display gif

Note: If the gif won't display, try increasing the notebook data rate limit when starting a new jupyter notebook: `jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`


Citations

  • Vermote, E., Franch, B., Claverie, M. (2016). VIIRS/NPP Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V001 [Data set]. NASA EOSDIS Land Processes DAAC. doi: 10.5067/VIIRS/VNP09GA.001

Contact Information

Material written by Cole Krehbiel$^{1}$

Contact: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)
Website: https://lpdaac.usgs.gov/
Date last modified: 09-11-2018

$^{1}$Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{2}$.

$^{2}$LP DAAC Work performed under NASA contract NNG14HH33I.