Using the AρρEEARS API in a Landsat ARD Workflow - Getting Started


Objective

The intent of this tutorial is to familiarize Landsat Analysis Ready Data (ARD) users with the AρρEEARS application programming interface (API) with demonstrations on how the API, and the services it provides, can be leveraged in an analysis workflow.


Use Case

This tutorial was developed using a real-world use case for a project being completed by the NASA DEVELOP Node at the Marshall Space Flight Center. NASA DEVELOP is a program aimed at integrating NASA Earth observations with society to foster future innovation and cultivate the professionals of tomorrow by addressing diverse environmental issues today.

The example use case comes from a project titled, "Utilizing NASA Earth Observations to Assess Coastline Replenishment Initiatives and Shoreline Risk along Delaware's Coasts". The group is partnering with the Delaware Department of Natural Resources and Environmental Control, Division of Watershed Stewardship for this project. The goals of the project include identifying areas of current and potential shoreline loss along the coast of Delaware, assessing current restoration efforts, and creating time-series coastline maps.

Example: Submit an AppEEARS area request for a portion of the Delaware coast along the Prime Hook National Wildlife Refuge to extract Landsat Analysis Ready Data for the years before and after Hurricane Sandy. The outputs will be used to generate false color composite time series to visualize changes to the coastline during the time period.

AρρEEARS Information

To access AρρEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/

For comprehensive documentation of the full functionality of the AρρEEARS API, please see the AρρEEARS API Documentation: https://lpdaacsvc.cr.usgs.gov/appeears/api/

Throughout the exercise, specific sections of the API documentation can be accessed by clicking the hyperlinked text.

Setup and Dependencies

It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, follow the instructions below to set up a Python environment on macOS or Windows.

This Python Jupyter Notebook tutorial has been tested using Python versions 3.6, 3.6.6, and 3.7.

Conda was used to create the Python environment.

  • Option 1: Download the environment.yml file:
    • Open the environment.yml file and change the prefix (last line) to the directory on your OS where you want to create the environment (e.g., C:\Username\Anaconda3\envs\ardtutorial), then save the environment file.
    • Using Command Prompt, Anaconda Prompt, Cmder, Terminal, or your preferred command line interface, navigate to the directory where you saved the environment.yml file.
    • Type conda env create -f environment.yml
    • Type activate ardtutorial
  • Option 2: Install each package separately
    • On Windows or macOS, create the environment: conda create -n ardtutorial python=3.6
  • If you already have Conda installed on your OS, it is recommended that you update to the latest version:
    conda update -n base -c defaults conda
  • Required Python packages are installed from the conda-forge channel. Add conda-forge to your channels with:
    conda config --add channels conda-forge
  • Activate the newly created environment using the command: activate ardtutorial
  • The packages required for this exercise are listed below.
    • requests
      conda install requests
    • pandas
      conda install pandas
    • geopandas
      conda install geopandas
    • xarray
      conda install xarray
    • numpy
      conda install numpy
    • netcdf4
      conda install netcdf4
    • holoviews
      conda install holoviews
    • pyviz   NOTE: PyViz is installed from the pyviz channel, not conda-forge.
      conda install -c pyviz hvplot

      If you encounter an issue installing hvplot with conda, try pip install pyviz hvplot

Next, download the Jupyter Notebook and example shapefile to get started.


1. Getting Started

1.1 Login to AρρEEARS/Earthdata

To submit a request, you must first log in to the AρρEEARS API using your Earthdata Login credentials. We'll use the getpass package to conceal your Earthdata Login username and password. When executed, the code below will prompt you to enter your username followed by your password and store them as variables.

Import the required packages and set the input/working directory to run this Jupyter Notebook locally.

In [1]:
# Import required Python packages
import requests
import getpass
import time
import os
import cgi
import json
import pandas as pd
import geopandas as gpd
import xarray
import numpy as np
import hvplot.xarray
import holoviews as hv
In [2]:
# Set input directory, change working directory
inDir = os.getcwd() + os.sep  # Set input directory to the current working directory
os.chdir(inDir)               # Change to working directory

If you have downloaded the tutorial materials to a different directory than the Jupyter Notebook, `inDir` above needs to be changed.
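For example, a minimal sketch (the path shown is hypothetical; substitute the location of your downloaded materials):

# Example only: point inDir at the folder containing the tutorial materials (hypothetical path)
inDir = r'C:\Users\yourname\ardtutorial' + os.sep
os.chdir(inDir)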

Now log in to the AρρEEARS API. Use the getpass package to enter your NASA Earthdata Login username and password. When prompted after executing the code block below, enter your username followed by your password.

In [3]:
# Enter Earthdata login credentials
username = getpass.getpass('Earthdata Username:')
password = getpass.getpass('Earthdata Password:')
Earthdata Username:········
Earthdata Password:········
In [4]:
API = 'https://lpdaacsvc.cr.usgs.gov/appeears/api'  # Set the AρρEEARS API to a variable

Use the requests package to post your username and password. A successful login will provide you with a token to be used later in this tutorial to submit a request. For more information or if you are experiencing difficulties, please see the API Documentation.

In [5]:
# Insert API URL, call login service, provide credentials & return json
login_response = requests.post(f"{API}/login", auth=(username, password)).json() 
del username, password
login_response
Out[5]:
{'token_type': 'Bearer',
 'token': 'JC_MNTgi3qgv0635fRUeSArIDfEBzKmX0hW1W3u_zkQXoYxDZ3BFKQ5sN7twpYJbN73k27C66_XN3KzIvOe3wA',
 'expiration': '2019-10-18T17:51:50Z'}

Above, you should see a Bearer Token. The Bearer Token is needed to leverage the AρρEEARS API via HTTP request methods (e.g., POST and GET). Notice that this token will expire approximately 48 hours after being acquired.

In [6]:
# Assign the token to a variable
token = login_response['token']
head = {'Authorization': f"Bearer {token}"} 
head
Out[6]:
{'Authorization': 'Bearer JC_MNTgi3qgv0635fRUeSArIDfEBzKmX0hW1W3u_zkQXoYxDZ3BFKQ5sN7twpYJbN73k27C66_XN3KzIvOe3wA'}

2. Submit an Area Request

The Tasks service, among other things (see below), is used to submit and manage requests (e.g., via POST and GET methods) in the AρρEEARS system. Each call to the Tasks service is associated with your user account, so each call to this service requires an authentication token. The submit task API call provides a way to submit a new request. It accepts data via JSON, query string, or a combination of both. In the example below, we will compile a JSON payload and submit a request.
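As a quick check that your token is working, you can list the task records already associated with your account; a minimal sketch using the list tasks call from the API documentation:

# List the tasks associated with your account (requires the Bearer token header from section 1.1)
my_tasks = requests.get(f"{API}/task", headers=head).json()
len(my_tasks)   # number of task records returned for this account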

2.1 Import a shapefile

In this example, we are interested in a portion of the coastline of Delaware. We will use the Geopandas package to import a shapefile that contains the Prime Hook National Wildlife Refuge (with a 6 km buffer applied to the refuge boundaries). The shapefile was extracted from the United States Fish and Wildlife Service Cadastral Database and can be downloaded here.

In [7]:
primehookNWR = gpd.read_file('PrimeHookNWR_6kmBuffer.shp')  # Import shapefile using geopandas
primehookNWR.head()
Out[7]:
OBJECTID CMPXNAME ORGNAME ORGCODE IFWS LIT GISACRES DOCACRES MAXACRES APPTYPE ... RSL_TYPE FWSREGION LABELNAME CreatedBy CreatedDat ModifiedBy ModifiedDa SHAPE_Leng SHAPE_Area geometry
0 663.0 COASTAL DELAWARE NATIONAL WILDLIFE REFUGE COMPLEX PRIME HOOK NATIONAL WILDLIFE REFUGE 51560 630 PMH 10251.121256 None None 0 ... NWR 5 Prime Hook NWR R5 2019/03/29 18:50:17.000 R5 2019/03/29 18:50:17.000 0.673839 0.00431 POLYGON ((-75.36337470754277 38.81103856447815...

1 rows × 21 columns

Geopandas imports the shapefile as a GeoDataFrame.

In [8]:
type(primehookNWR)
Out[8]:
geopandas.geodataframe.GeoDataFrame

Convert the GeoDataFrame into an object with GeoJSON structure. Use the GeoDataFrame's to_json method together with json.loads to make the conversion.

In [9]:
primehookNWR = json.loads(primehookNWR.to_json())
type(primehookNWR)
Out[9]:
dict

The primehookNWR variable is now a Python dictionary that matches the GeoJSON structure.
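As a quick sanity check, the dictionary's top-level keys should match the GeoJSON FeatureCollection structure that appears in the compiled task payload in section 2.2:

# A GeoJSON FeatureCollection has 'type' and 'features' keys at the top level
list(primehookNWR.keys())   # expected: ['type', 'features']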

2.2 Compile the JSON payload to submit to AρρEEARS

Many of the required items needed in the AρρEEARS API request payload have multiple options. For example, AρρEEARS has several projections that can be selected for the output. We can use the AρρEEARS API to find out what projections are available. In this example, we are explicitly assigning our projection to the proj variable. To find out how to use the AρρEEARS API to list the available options for each parameter, check out the AρρEEARS API Tutorials produced by the LP DAAC.
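As a brief illustration, here is a minimal sketch of listing two of those options directly with the API, using the spatial and product services from the API documentation (variable names are illustrative, and the response fields follow the API docs):

# List the output projections supported by AppEEARS (spatial service)
projections = requests.get(f"{API}/spatial/proj").json()
print([p['Name'] for p in projections])          # 'albers_ard_conus' should appear in this list

# List the layers available for the Landsat 7 CONUS ARD product (product service)
le07_layers = requests.get(f"{API}/product/CU_LE07.001").json()
print(list(le07_layers.keys()))                  # includes SRB3, SRB4, SRB5, and PIXELQA among others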

In [10]:
task_name = 'Prime Hook NWR Coastline Time Series'    # User-defined name of the task
task_type = 'area'                                    # Type of task, area or point
proj = "albers_ard_conus"                             # Set output projection 
outFormat = 'netcdf4'                                 # Set output file format type
startDate = '01-01-2012'                              # Start of the date range for which to extract data: MM-DD-YYYY
endDate = '05-31-2013'                                # End of the date range for which to extract data: MM-DD-YYYY
recurring = False                                     # Specify True for a recurring date range

# Define the products and layers desired
prodLayer = [{
        "layer": "SRB3",
        "product": "CU_LE07.001"
      },
      {
        "layer": "SRB4",
        "product": "CU_LE07.001"
      },
      {
        "layer": "SRB5",
        "product": "CU_LE07.001"
      } ]

Compile the JSON to be submitted as an area request.

Notice that primehookNWR, the shapefile converted to GeoJSON with the geopandas and json packages in section 2.1, is inserted as the geo parameter.

In [11]:
task = {
    'task_type': task_type,
    'task_name': task_name,
    'params': {
        'dates': [{
            'startDate': startDate,
            'endDate': endDate
        }],
        'layers': prodLayer,
        'output': {
            'format': {'type': outFormat},
            'projection': proj
        },
        'geo': primehookNWR,
    }
}
In [12]:
task
Out[12]:
{'task_type': 'area',
 'task_name': 'Prime Hook NWR Coastline Time Series',
 'params': {'dates': [{'startDate': '01-01-2012', 'endDate': '05-31-2013'}],
  'layers': [{'layer': 'SRB3', 'product': 'CU_LE07.001'},
   {'layer': 'SRB4', 'product': 'CU_LE07.001'},
   {'layer': 'SRB5', 'product': 'CU_LE07.001'}],
  'output': {'format': {'type': 'netcdf4'}, 'projection': 'albers_ard_conus'},
  'geo': {'type': 'FeatureCollection',
   'features': [{'id': '0',
     'type': 'Feature',
     'properties': {'APPTYPE': 0,
      'CMPXNAME': 'COASTAL DELAWARE NATIONAL WILDLIFE REFUGE COMPLEX',
      'COMMENTS': None,
      'CreatedBy': 'R5',
      'CreatedDat': '2019/03/29 18:50:17.000',
      'DOCACRES': None,
      'FWSREGION': '5',
      'GISACRES': 10251.1212562,
      'IFWS': '630',
      'LABELNAME': 'Prime Hook NWR',
      'LIT': 'PMH',
      'MAXACRES': None,
      'ModifiedBy': 'R5',
      'ModifiedDa': '2019/03/29 18:50:17.000',
      'OBJECTID': 663.0,
      'ORGCODE': 51560,
      'ORGNAME': 'PRIME HOOK NATIONAL WILDLIFE REFUGE',
      'RSL_TYPE': 'NWR',
      'SHAPE_Area': 0.004309666381901,
      'SHAPE_Leng': 0.673838515118319},
     'geometry': {'type': 'Polygon',
      'coordinates': [[[-75.36337470754277, 38.81103856447815],
        [-75.36684623839793, 38.814108983317865],
        [-75.3673902689009, 38.814685229317845],
        [-75.37213118078465, 38.82008155061829],
        [-75.37284602989138, 38.820961841618285],
        [-75.37292676194586, 38.82109893561042],
        [-75.38066020861933, 38.827932193520105],
        [-75.39219500392865, 38.84338449352398],
        [-75.39629008173304, 38.85706813639797],
        [-75.39629181975268, 38.85717690239801],
        [-75.3957166937612, 38.86016931314761],
        [-75.39560970457556, 38.86153318416018],
        [-75.39533229692118, 38.86313141416013],
        [-75.39516125646422, 38.8630600595839],
        [-75.39495484732245, 38.86413450064249],
        [-75.39452423754125, 38.865111843642474],
        [-75.39394349009319, 38.864842895005374],
        [-75.392281229942, 38.87043549542343],
        [-75.38065240158467, 38.87764473640972],
        [-75.37765249667939, 38.878582856409714],
        [-75.37565666108746, 38.87908165806733],
        [-75.37993532030686, 38.88538957223554],
        [-75.38091881903999, 38.89290373897693],
        [-75.38752740166262, 38.90230192270185],
        [-75.38865193805547, 38.90864871779438],
        [-75.39664573522931, 38.91770508787822],
        [-75.40309960193864, 38.933748308413904],
        [-75.39951192355599, 38.946177978529846],
        [-75.39541068178005, 38.95101910705229],
        [-75.3952595221568, 38.95121471318857],
        [-75.39505501708528, 38.95147935753647],
        [-75.39174520950266, 38.955862326060846],
        [-75.39160983610694, 38.95604157818668],
        [-75.39149029069716, 38.95620713473624],
        [-75.38044293645724, 38.96372041087951],
        [-75.36218992586565, 38.96637644517485],
        [-75.338461593418, 38.96392376673482],
        [-75.31151062746424, 38.956594592901276],
        [-75.28844891186671, 38.946980985228315],
        [-75.28487729899736, 38.94559764215446],
        [-75.28407271165516, 38.945156309057126],
        [-75.28389652142155, 38.94508284312066],
        [-75.28276784365292, 38.94444054459322],
        [-75.25673153738597, 38.93015437556376],
        [-75.24936007013217, 38.92457781835633],
        [-75.24900916991201, 38.92468234061907],
        [-75.24693765332951, 38.92323838561905],
        [-75.23017273069657, 38.90999253006259],
        [-75.22901241264326, 38.90894094306264],
        [-75.22388016970439, 38.903701341853534],
        [-75.22383231076932, 38.90370030420148],
        [-75.22284668861646, 38.90273670157089],
        [-75.21903259593331, 38.899865999148666],
        [-75.21665211976224, 38.89773897914867],
        [-75.21528897479907, 38.896504624639675],
        [-75.21384717424414, 38.89518132663969],
        [-75.21153772302593, 38.893009970816664],
        [-75.21012214930715, 38.89164576081666],
        [-75.20656714899361, 38.88780902134303],
        [-75.20083037904622, 38.8834351743125],
        [-75.19998739353547, 38.88274849331251],
        [-75.19803124680682, 38.88105215581955],
        [-75.19718865819998, 38.88035343505429],
        [-75.19556341341132, 38.87896538341083],
        [-75.19539183005413, 38.878821150887774],
        [-75.19525877166946, 38.87870519177715],
        [-75.19379836690909, 38.8774578290123],
        [-75.1931311575245, 38.8768704700123],
        [-75.1931523914744, 38.876869415388356],
        [-75.19136815626119, 38.87531427161658],
        [-75.19070772023588, 38.87471674461661],
        [-75.18416545153744, 38.86836111380578],
        [-75.18286263256987, 38.86699438680577],
        [-75.18217548120434, 38.86611661621794],
        [-75.18176121969094, 38.8656984309756],
        [-75.1817307297651, 38.8656671859769],
        [-75.18165719937241, 38.86559808893511],
        [-75.181436386547, 38.8653836139351],
        [-75.1813142872777, 38.86523994138351],
        [-75.18023068745065, 38.86411338363665],
        [-75.17917089348448, 38.86299530963662],
        [-75.17941995507302, 38.86301076074576],
        [-75.16893890850983, 38.85067210064778],
        [-75.16795395776212, 38.84851382561647],
        [-75.16588318610651, 38.8456091204332],
        [-75.15189698482118, 38.83433208501529],
        [-75.14407112364663, 38.82860885674428],
        [-75.14359481028913, 38.828208628744285],
        [-75.13894757659892, 38.82413588038378],
        [-75.13878563653914, 38.823987619383786],
        [-75.13753991912834, 38.82283255194136],
        [-75.13726367106122, 38.82257309094132],
        [-75.13591834193934, 38.82129110612918],
        [-75.13560400459733, 38.82098714012919],
        [-75.13109127185336, 38.81617884309415],
        [-75.12980228729339, 38.81488001999468],
        [-75.12944986882404, 38.81449441899468],
        [-75.12931412318115, 38.814331371452376],
        [-75.12658525857468, 38.81207060674312],
        [-75.1217529360361, 38.80648164581689],
        [-75.12125255221777, 38.80644253125006],
        [-75.12085259768305, 38.80600499525007],
        [-75.11488834007781, 38.79853927060031],
        [-75.11439117964657, 38.79796391961275],
        [-75.11416267392795, 38.79763070650588],
        [-75.11346232274528, 38.79675379340342],
        [-75.11294812207349, 38.795997065403405],
        [-75.11236915762088, 38.79513004163086],
        [-75.11184264241105, 38.794327373630864],
        [-75.11094839579836, 38.792922598522495],
        [-75.11087947757758, 38.79281090652248],
        [-75.11047167303944, 38.79214995310461],
        [-75.11047407549478, 38.7921504512502],
        [-75.10686783884115, 38.78710616638148],
        [-75.10282918480794, 38.77578366908748],
        [-75.10281009334716, 38.77565001308747],
        [-75.10258932033632, 38.77265295557403],
        [-75.10258916512706, 38.772430134574016],
        [-75.10288908225453, 38.76912110166511],
        [-75.10292697700967, 38.7689205046651],
        [-75.10599311405831, 38.76217581758729],
        [-75.10609182310033, 38.76205063658731],
        [-75.11249109315828, 38.75883979886808],
        [-75.11160130451658, 38.758349549798],
        [-75.11194313876092, 38.75804000379796],
        [-75.11299293379287, 38.757156017019646],
        [-75.11337277273772, 38.756858283019696],
        [-75.11853109528975, 38.75381456451581],
        [-75.1188805911068, 38.7536594895158],
        [-75.12101502019148, 38.752804686992214],
        [-75.12164922988899, 38.752576045256426],
        [-75.12240995434887, 38.75230117861517],
        [-75.12256737421336, 38.752407341197994],
        [-75.12992921394077, 38.75009350917912],
        [-75.13021795256428, 38.75003954117915],
        [-75.13817401914797, 38.74911112451129],
        [-75.13839437443568, 38.74909886151126],
        [-75.14365061098313, 38.74920970609597],
        [-75.1480993366586, 38.74896963416003],
        [-75.14839570252418, 38.74897505616004],
        [-75.15257668827905, 38.74914056124854],
        [-75.15286545972168, 38.74915788324855],
        [-75.16063129198899, 38.74988775926459],
        [-75.16100369246898, 38.749934644264556],
        [-75.17573111371289, 38.75255698578695],
        [-75.17609598123266, 38.752639532786986],
        [-75.1828712964025, 38.75445627980677],
        [-75.19246843089034, 38.75625425708399],
        [-75.20673733630748, 38.75352872055439],
        [-75.21172128148393, 38.75393540916266],
        [-75.21328642865045, 38.753199992895],
        [-75.23249561943436, 38.751863205981245],
        [-75.25636389378245, 38.755339964980344],
        [-75.26449208240261, 38.75779701401932],
        [-75.28645993367856, 38.764144817134856],
        [-75.31472020828446, 38.776540607396015],
        [-75.32675478965666, 38.78376793722171],
        [-75.32691387638198, 38.78384725575695],
        [-75.32719603882136, 38.78403288649795],
        [-75.34050216572057, 38.79202120798539],
        [-75.3424031658148, 38.79334954698539],
        [-75.3471075263183, 38.797128623862285],
        [-75.34923293570701, 38.79852606346754],
        [-75.35301620573651, 38.8018742058872],
        [-75.36075397005767, 38.80808724229037],
        [-75.36337470754277, 38.81103856447815]]]}}]}}}

The task object is what we will submit to the AρρEEARS system.

2.3 Submit a task request

We will now submit our task object to AρρEEARS using the submit task API call.

In [13]:
# Post json to the API task service, return response as json
task_response = requests.post(f"{API}/task", json=task, headers=head)
task_response.json()   
Out[13]:
{'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944', 'status': 'pending'}

A task ID is generated for each request and is returned in the response. Task IDs are unique for each request and are used to check request status, explore request details, and list files generated for the request.

In [14]:
task_id = task_response.json()['task_id']
task_id
Out[14]:
'f87fabbc-ed3a-4694-ac8c-381ca19e0944'

2.4 Get task status

We can use the Status service to retrieve information on the status of all task requests that are currently being processed for your account. We will use the task status API call with our task_id to get information on the request we just submitted.

In [15]:
status_response = requests.get(f"{API}/status/{task_id}", headers=head)
status_response.json()
Out[15]:
{'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944',
 'updated': '2019-10-16T17:52:07.005627',
 'user_id': 'appeears.testing@gmail.com',
 'progress': {'details': [{'desc': 'Initializing',
    'step': 1,
    'pct_complete': 0},
   {'desc': 'Downloading', 'step': 2, 'pct_complete': 0},
   {'desc': 'Subsetting', 'step': 3, 'pct_complete': 0},
   {'desc': 'Generating Files', 'step': 4, 'pct_complete': 0},
   {'desc': 'Extracting Stats', 'step': 5, 'pct_complete': 0},
   {'desc': 'Finalizing', 'step': 6, 'pct_complete': 0}],
  'summary': 0},
 'status_id': '32f4e320-34d2-49fb-bb78-93c186b2bb46',
 'status_type': 'task'}

For longer-running requests, we can periodically ping the API to check the status of our submitted request using the snippet below. Once the request is complete, we can move on to downloading the request contents.

In [16]:
# Ping the API every 20 seconds until the request status is 'done'
starttime = time.time()
status = requests.get(f"{API}/task/{task_id}", headers=head).json()['status']
while status != 'done':
    print(status)
    time.sleep(20.0 - ((time.time() - starttime) % 20.0))
    status = requests.get(f"{API}/task/{task_id}", headers=head).json()['status']
print(status)
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
processing
done

3. Download a Request

The Bundle service provides information about completed tasks (i.e., tasks that have a status of done). A bundle will be generated containing all of the files that were created as part of the task request.

3.1 List files associated with the request

The list files API call lists all of the files contained in the bundle which are available for download.

In [17]:
bundle = requests.get(f"{API}/bundle/{task_id}").json()    # Call API and return bundle contents for the task_id as json
bundle
Out[17]:
{'files': [{'sha256': 'f3a4c478f2e9b909461a931d23960053618da4ecf62558631c8b0d0a46b293b8',
   'file_id': 'c233c086-8144-427d-bb05-1352f64edce5',
   'file_name': 'CU_LE07.001_30m_aid0001.nc',
   'file_size': 32861099,
   'file_type': 'nc'},
  {'sha256': 'c15c2b7a0948f72528e503ee9c42a98bca44c4d68a74e282cf4555bcf105150a',
   'file_id': '542ae6bc-1649-4683-ac38-2c3819c2f8e7',
   'file_name': 'CU-LE07-001-PIXELQA-lookup.csv',
   'file_size': 431,
   'file_type': 'csv'},
  {'sha256': 'da83b403c07cccaf530e2223f03e5cad123ede8666ebc5c7b360a79607778328',
   'file_id': 'd4e3df6d-cf2c-4178-9fe9-5b8e5a9c888a',
   'file_name': 'CU-LE07-001-PIXELQA-Statistics-QA.csv',
   'file_size': 2557,
   'file_type': 'csv'},
  {'sha256': 'eb01c569e55c1ee96bc3167a034c51c2ea859c95934cbd548445e0e86011ad5d',
   'file_id': '9d2bd22e-537c-4739-b4f9-309b2d7275e0',
   'file_name': 'CU-LE07-001-Statistics.csv',
   'file_size': 11247,
   'file_type': 'csv'},
  {'sha256': 'eb09ce34946d5a3e040d7248d992bbb739d3aac86afe44ff291b33521abf6a3f',
   'file_id': 'e745bc8c-1535-4ee5-b7d5-a4fa4f625351',
   'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-granule-list.txt',
   'file_size': 23426,
   'file_type': 'txt'},
  {'sha256': '193009e3ea6bc262ddc996900d8ee10cf2273afd127a11a19c2286b712197221',
   'file_id': '9c348438-0b25-4db3-a2b0-4916bbe432bb',
   'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-request.json',
   'file_size': 8223,
   'file_type': 'json'},
  {'sha256': 'd7e0f6e0fe382a270a66b1f6aaad8179e4bdf61bfa6f753a86060afa06f5295f',
   'file_id': 'a828d544-bf52-41df-8a5e-36e81f76249f',
   'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-CU-LE07-001-metadata.xml',
   'file_size': 22288,
   'file_type': 'xml'},
  {'sha256': '64a28d62bf1fa1cf72ce230b746478f4b16b75575bdb21169aab6f7a578ffb54',
   'file_id': 'ca60b3af-6c76-4a4f-a79c-f554ad08cb84',
   'file_name': 'README.txt',
   'file_size': 23871,
   'file_type': 'txt'}],
 'created': '2019-10-16T17:52:06.964636',
 'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944',
 'updated': '2019-10-16T18:00:12.852903',
 'bundle_type': 'area'}

3.2 Download files in a request

The download file API call gives us the information needed to download all, or a subset, of the files available for a request. Just as the task has a task_id to identify it, each file in the bundle has a unique file_id, which should be used for any operation on that specific file. The Content-Type and Content-Disposition headers are returned when accessing each file and give more details about the file's format and the filename to use when saving it.

The bundle variable we created has more information than we need to download the files. We will first create a Python dictionary to hold the file_id and associated file_name for each file.

In [18]:
files = {}
for f in bundle['files']: 
    files[f['file_id']] = f['file_name']    # Fill dictionary with file_id as keys and file_name as values
files
Out[18]:
{'c233c086-8144-427d-bb05-1352f64edce5': 'CU_LE07.001_30m_aid0001.nc',
 '542ae6bc-1649-4683-ac38-2c3819c2f8e7': 'CU-LE07-001-PIXELQA-lookup.csv',
 'd4e3df6d-cf2c-4178-9fe9-5b8e5a9c888a': 'CU-LE07-001-PIXELQA-Statistics-QA.csv',
 '9d2bd22e-537c-4739-b4f9-309b2d7275e0': 'CU-LE07-001-Statistics.csv',
 'e745bc8c-1535-4ee5-b7d5-a4fa4f625351': 'Prime-Hook-NWR-Coastline-Time-Series-granule-list.txt',
 '9c348438-0b25-4db3-a2b0-4916bbe432bb': 'Prime-Hook-NWR-Coastline-Time-Series-request.json',
 'a828d544-bf52-41df-8a5e-36e81f76249f': 'Prime-Hook-NWR-Coastline-Time-Series-CU-LE07-001-metadata.xml',
 'ca60b3af-6c76-4a4f-a79c-f554ad08cb84': 'README.txt'}

Now we will use the file_ids from the dictionary to download the files into an output directory.

In [19]:
# Set up output directory on local machine
outDir = f'{inDir}Outputs/'
if not os.path.exists(outDir):
    os.makedirs(outDir)

Use the files dictionary and a for loop to automate downloading all of the output files into the output directory.

In [20]:
for file_id in files:
    download_response = requests.get(f"{API}/bundle/{task_id}/{file_id}", stream=True)                               # Get a stream to the bundle file
    filename = os.path.basename(cgi.parse_header(download_response.headers['Content-Disposition'])[1]['filename'])   # Parse the name from the Content-Disposition header
    filepath = os.path.join(outDir, filename)                                                                        # Create output file path
    with open(filepath, 'wb') as f:                                                                                  # Write the file to the output directory
        for data in download_response.iter_content(chunk_size=8192):
            f.write(data)
print("Downloading complete!")
Downloading complete!

4. Explore AρρEEARS Outputs

Now that we have downloaded all the files from our request, let's start to check out our data! In our AρρEEARS request, we set the output format to 'netcdf4'. As a result, we have only one output data file. We will open the dataset as an xarray Dataset and start to explore.

4.1 Open and explore data using xarray

Xarray extends and combines much of the core functionality of the Pandas and NumPy libraries, making it well suited to handling multi-dimensional (N-dimensional) datasets that contain labels (e.g., variable names or dimension names). Let's open the netCDF file containing our data as an xarray object.

In [21]:
ds = xarray.open_dataset(f'{outDir}CU_LE07.001_30m_aid0001.nc')  # Open the L7 ARD output NC4 file from AppEEARS
ds
Out[21]:
<xarray.Dataset>
Dimensions:  (time: 102, xdim: 974, ydim: 711)
Coordinates:
  * time     (time) object 2012-01-05 00:00:00 ... 2013-05-31 00:00:00
  * ydim     (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
  * xdim     (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
    crs      int8 ...
    SRB3     (time, ydim, xdim) float32 ...
    SRB4     (time, ydim, xdim) float32 ...
    SRB5     (time, ydim, xdim) float32 ...
    PIXELQA  (time, ydim, xdim) float64 ...
Attributes:
    title:        CU_LE07.001 for aid0001
    Conventions:  CF-1.6
    institution:  Land Processes Distributed Active Archive Center (LP DAAC)
    source:       AppEEARS v2.30
    references:   See README.txt
    history:      See README.txt

Xarray has two fundamental data structures. A Dataset holds multiple variables that potentially share the same coordinates and global metadata for the file (see above). A DataArray contains a single multi-dimensional variable and its coordinates, attributes, and metadata. Data values can be pulled out of the DataArray as a numpy.ndarray using the values attribute.

In [22]:
type(ds)
Out[22]:
xarray.core.dataset.Dataset
In [23]:
type(ds.SRB3)
Out[23]:
xarray.core.dataarray.DataArray
In [24]:
type(ds.SRB3.values)
Out[24]:
numpy.ndarray

We can also pull out information for each coordinate item (e.g., xdim, ydim, time). Here we pull out the time coordinate.

In [25]:
ds['time']
Out[25]:
<xarray.DataArray 'time' (time: 102)>
array([cftime.DatetimeJulian(2012, 1, 5, 0, 0, 0, 0, 2, 5),
       cftime.DatetimeJulian(2012, 1, 7, 0, 0, 0, 0, 4, 7),
       cftime.DatetimeJulian(2012, 1, 14, 0, 0, 0, 0, 4, 14),
       cftime.DatetimeJulian(2012, 1, 16, 0, 0, 0, 0, 6, 16),
       cftime.DatetimeJulian(2012, 1, 30, 0, 0, 0, 0, 6, 30),
       cftime.DatetimeJulian(2012, 2, 1, 0, 0, 0, 0, 1, 32),
       cftime.DatetimeJulian(2012, 2, 6, 0, 0, 0, 0, 6, 37),
       cftime.DatetimeJulian(2012, 2, 8, 0, 0, 0, 0, 1, 39),
       cftime.DatetimeJulian(2012, 2, 15, 0, 0, 0, 0, 1, 46),
       cftime.DatetimeJulian(2012, 2, 22, 0, 0, 0, 0, 1, 53),
       cftime.DatetimeJulian(2012, 3, 2, 0, 0, 0, 0, 3, 62),
       cftime.DatetimeJulian(2012, 3, 11, 0, 0, 0, 0, 5, 71),
       cftime.DatetimeJulian(2012, 3, 18, 0, 0, 0, 0, 5, 78),
       cftime.DatetimeJulian(2012, 3, 20, 0, 0, 0, 0, 0, 80),
       cftime.DatetimeJulian(2012, 3, 27, 0, 0, 0, 0, 0, 87),
       cftime.DatetimeJulian(2012, 4, 3, 0, 0, 0, 0, 0, 94),
       cftime.DatetimeJulian(2012, 4, 5, 0, 0, 0, 0, 2, 96),
       cftime.DatetimeJulian(2012, 4, 10, 0, 0, 0, 0, 0, 101),
       cftime.DatetimeJulian(2012, 4, 12, 0, 0, 0, 0, 2, 103),
       cftime.DatetimeJulian(2012, 4, 19, 0, 0, 0, 0, 2, 110),
       cftime.DatetimeJulian(2012, 4, 21, 0, 0, 0, 0, 4, 112),
       cftime.DatetimeJulian(2012, 4, 26, 0, 0, 0, 0, 2, 117),
       cftime.DatetimeJulian(2012, 4, 28, 0, 0, 0, 0, 4, 119),
       cftime.DatetimeJulian(2012, 5, 5, 0, 0, 0, 0, 4, 126),
       cftime.DatetimeJulian(2012, 5, 7, 0, 0, 0, 0, 6, 128),
       cftime.DatetimeJulian(2012, 5, 12, 0, 0, 0, 0, 4, 133),
       cftime.DatetimeJulian(2012, 5, 21, 0, 0, 0, 0, 6, 142),
       cftime.DatetimeJulian(2012, 5, 23, 0, 0, 0, 0, 1, 144),
       cftime.DatetimeJulian(2012, 5, 28, 0, 0, 0, 0, 6, 149),
       cftime.DatetimeJulian(2012, 6, 6, 0, 0, 0, 0, 1, 158),
       cftime.DatetimeJulian(2012, 6, 13, 0, 0, 0, 0, 1, 165),
       cftime.DatetimeJulian(2012, 6, 15, 0, 0, 0, 0, 3, 167),
       cftime.DatetimeJulian(2012, 6, 22, 0, 0, 0, 0, 3, 174),
       cftime.DatetimeJulian(2012, 6, 24, 0, 0, 0, 0, 5, 176),
       cftime.DatetimeJulian(2012, 6, 29, 0, 0, 0, 0, 3, 181),
       cftime.DatetimeJulian(2012, 7, 1, 0, 0, 0, 0, 5, 183),
       cftime.DatetimeJulian(2012, 7, 8, 0, 0, 0, 0, 5, 190),
       cftime.DatetimeJulian(2012, 7, 10, 0, 0, 0, 0, 0, 192),
       cftime.DatetimeJulian(2012, 7, 15, 0, 0, 0, 0, 5, 197),
       cftime.DatetimeJulian(2012, 7, 17, 0, 0, 0, 0, 0, 199),
       cftime.DatetimeJulian(2012, 7, 24, 0, 0, 0, 0, 0, 206),
       cftime.DatetimeJulian(2012, 7, 31, 0, 0, 0, 0, 0, 213),
       cftime.DatetimeJulian(2012, 8, 2, 0, 0, 0, 0, 2, 215),
       cftime.DatetimeJulian(2012, 8, 9, 0, 0, 0, 0, 2, 222),
       cftime.DatetimeJulian(2012, 8, 11, 0, 0, 0, 0, 4, 224),
       cftime.DatetimeJulian(2012, 8, 16, 0, 0, 0, 0, 2, 229),
       cftime.DatetimeJulian(2012, 8, 18, 0, 0, 0, 0, 4, 231),
       cftime.DatetimeJulian(2012, 8, 25, 0, 0, 0, 0, 4, 238),
       cftime.DatetimeJulian(2012, 8, 27, 0, 0, 0, 0, 6, 240),
       cftime.DatetimeJulian(2012, 9, 1, 0, 0, 0, 0, 4, 245),
       cftime.DatetimeJulian(2012, 9, 10, 0, 0, 0, 0, 6, 254),
       cftime.DatetimeJulian(2012, 9, 12, 0, 0, 0, 0, 1, 256),
       cftime.DatetimeJulian(2012, 9, 17, 0, 0, 0, 0, 6, 261),
       cftime.DatetimeJulian(2012, 9, 19, 0, 0, 0, 0, 1, 263),
       cftime.DatetimeJulian(2012, 9, 26, 0, 0, 0, 0, 1, 270),
       cftime.DatetimeJulian(2012, 10, 3, 0, 0, 0, 0, 1, 277),
       cftime.DatetimeJulian(2012, 10, 5, 0, 0, 0, 0, 3, 279),
       cftime.DatetimeJulian(2012, 10, 12, 0, 0, 0, 0, 3, 286),
       cftime.DatetimeJulian(2012, 10, 14, 0, 0, 0, 0, 5, 288),
       cftime.DatetimeJulian(2012, 10, 21, 0, 0, 0, 0, 5, 295),
       cftime.DatetimeJulian(2012, 10, 30, 0, 0, 0, 0, 0, 304),
       cftime.DatetimeJulian(2012, 11, 4, 0, 0, 0, 0, 5, 309),
       cftime.DatetimeJulian(2012, 11, 6, 0, 0, 0, 0, 0, 311),
       cftime.DatetimeJulian(2012, 11, 20, 0, 0, 0, 0, 0, 325),
       cftime.DatetimeJulian(2012, 11, 22, 0, 0, 0, 0, 2, 327),
       cftime.DatetimeJulian(2012, 11, 29, 0, 0, 0, 0, 2, 334),
       cftime.DatetimeJulian(2012, 12, 6, 0, 0, 0, 0, 2, 341),
       cftime.DatetimeJulian(2012, 12, 15, 0, 0, 0, 0, 4, 350),
       cftime.DatetimeJulian(2012, 12, 22, 0, 0, 0, 0, 4, 357),
       cftime.DatetimeJulian(2012, 12, 24, 0, 0, 0, 0, 6, 359),
       cftime.DatetimeJulian(2012, 12, 31, 0, 0, 0, 0, 6, 366),
       cftime.DatetimeJulian(2013, 1, 2, 0, 0, 0, 0, 1, 2),
       cftime.DatetimeJulian(2013, 1, 7, 0, 0, 0, 0, 6, 7),
       cftime.DatetimeJulian(2013, 1, 9, 0, 0, 0, 0, 1, 9),
       cftime.DatetimeJulian(2013, 1, 18, 0, 0, 0, 0, 3, 18),
       cftime.DatetimeJulian(2013, 1, 23, 0, 0, 0, 0, 1, 23),
       cftime.DatetimeJulian(2013, 1, 25, 0, 0, 0, 0, 3, 25),
       cftime.DatetimeJulian(2013, 2, 1, 0, 0, 0, 0, 3, 32),
       cftime.DatetimeJulian(2013, 2, 3, 0, 0, 0, 0, 5, 34),
       cftime.DatetimeJulian(2013, 2, 10, 0, 0, 0, 0, 5, 41),
       cftime.DatetimeJulian(2013, 2, 17, 0, 0, 0, 0, 5, 48),
       cftime.DatetimeJulian(2013, 2, 19, 0, 0, 0, 0, 0, 50),
       cftime.DatetimeJulian(2013, 2, 24, 0, 0, 0, 0, 5, 55),
       cftime.DatetimeJulian(2013, 2, 26, 0, 0, 0, 0, 0, 57),
       cftime.DatetimeJulian(2013, 3, 5, 0, 0, 0, 0, 0, 64),
       cftime.DatetimeJulian(2013, 3, 14, 0, 0, 0, 0, 2, 73),
       cftime.DatetimeJulian(2013, 3, 23, 0, 0, 0, 0, 4, 82),
       cftime.DatetimeJulian(2013, 3, 28, 0, 0, 0, 0, 2, 87),
       cftime.DatetimeJulian(2013, 3, 30, 0, 0, 0, 0, 4, 89),
       cftime.DatetimeJulian(2013, 4, 6, 0, 0, 0, 0, 4, 96),
       cftime.DatetimeJulian(2013, 4, 8, 0, 0, 0, 0, 6, 98),
       cftime.DatetimeJulian(2013, 4, 13, 0, 0, 0, 0, 4, 103),
       cftime.DatetimeJulian(2013, 4, 15, 0, 0, 0, 0, 6, 105),
       cftime.DatetimeJulian(2013, 4, 22, 0, 0, 0, 0, 6, 112),
       cftime.DatetimeJulian(2013, 4, 24, 0, 0, 0, 0, 1, 114),
       cftime.DatetimeJulian(2013, 5, 1, 0, 0, 0, 0, 1, 121),
       cftime.DatetimeJulian(2013, 5, 8, 0, 0, 0, 0, 1, 128),
       cftime.DatetimeJulian(2013, 5, 10, 0, 0, 0, 0, 3, 130),
       cftime.DatetimeJulian(2013, 5, 15, 0, 0, 0, 0, 1, 135),
       cftime.DatetimeJulian(2013, 5, 17, 0, 0, 0, 0, 3, 137),
       cftime.DatetimeJulian(2013, 5, 26, 0, 0, 0, 0, 5, 146),
       cftime.DatetimeJulian(2013, 5, 31, 0, 0, 0, 0, 3, 151)], dtype=object)
Coordinates:
  * time     (time) object 2012-01-05 00:00:00 ... 2013-05-31 00:00:00
Attributes:
    standard_name:  time
    axis:           T

The cftime.DatetimeJulian format of the time coordinate is a little problematic for some plotting libraries and analysis routines. We are going to convert the time coordinate to the more usable datetime64 format.

In [26]:
import warnings
warnings.filterwarnings('ignore')
datetimeindex = ds.indexes['time'].to_datetimeindex()  # Convert to datetime64
ds['time'] = datetimeindex                             # Set converted index to dataset time coordinate

4.2 Visualize Time Series Data

Below, use the hvPlot and holoviews packages to create an interactive time series plot of the Landsat 7 ARD data.

In [27]:
hv_ds = hv.Dataset(ds) # Convert to holoviews dataset

Plot the holoviews dataset as a four-dimensional RGB false color composite, defining the x, y, and time dimensions from the coordinates, plus a fourth dimension that assigns each data variable to a band of the RGB composite.

Here we are using a Landsat 7 false color composite combination of:

  • R = B4 (NIR)
  • G = B5 (SWIR 1)
  • B = B3 (Red)

This false color combination was chosen because it highlights land-water boundaries and is useful in the analysis of soil conditions. Since we are interested in analyzing the coastal shoreline, this combination maximizes our ability to delineate between land and water features and highlight the shoreline. Vegetation will appear in shades of green/orange/brown, with increasingly saturated soils appearing in very dark colors. Water will appear very dark, almost black.
In [28]:
# Use the .to() method to plot the holoviews dataset as an RGB, defining the x/y/z dimensions and data variables used
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
Out[28]:

Above is the interactive, multidimensional (t, x, y) plot of our gridded data. Move the slider on the right to visualize the different time slices.

Notice in the visualization above that many time slices contain only fill values. This is explained by the fact that the Landsat 7 data have been tiled and gridded into an Analysis Ready Data stack: for the observations containing only fill values, Landsat 7 data exist within the ARD tile but fall outside the extent of our region of interest (ROI). Below, remove all observations that contain only fill values.

In [29]:
# Set up a list to store time slices that contain non-fill value data by going through each time slice for a variable
goodTimes = []
for i in range(len(ds.time)):
    if np.nanmean(ds.SRB4[i,:,:]) > 0:                 # Only keep time slices where mean reflectance is greater than 0
        goodTimes.append(ds.SRB4[i,:,:].time.values)   # Append time value for valid observation to list
In [30]:
goodTimes
Out[30]:
[numpy.datetime64('2012-01-14T00:00:00.000000000'),
 numpy.datetime64('2012-01-30T00:00:00.000000000'),
 numpy.datetime64('2012-02-15T00:00:00.000000000'),
 numpy.datetime64('2012-03-02T00:00:00.000000000'),
 numpy.datetime64('2012-03-18T00:00:00.000000000'),
 numpy.datetime64('2012-04-03T00:00:00.000000000'),
 numpy.datetime64('2012-04-19T00:00:00.000000000'),
 numpy.datetime64('2012-05-05T00:00:00.000000000'),
 numpy.datetime64('2012-06-06T00:00:00.000000000'),
 numpy.datetime64('2012-06-22T00:00:00.000000000'),
 numpy.datetime64('2012-07-08T00:00:00.000000000'),
 numpy.datetime64('2012-07-24T00:00:00.000000000'),
 numpy.datetime64('2012-08-09T00:00:00.000000000'),
 numpy.datetime64('2012-09-10T00:00:00.000000000'),
 numpy.datetime64('2012-09-26T00:00:00.000000000'),
 numpy.datetime64('2012-10-12T00:00:00.000000000'),
 numpy.datetime64('2012-11-29T00:00:00.000000000'),
 numpy.datetime64('2012-12-15T00:00:00.000000000'),
 numpy.datetime64('2012-12-31T00:00:00.000000000'),
 numpy.datetime64('2013-02-01T00:00:00.000000000'),
 numpy.datetime64('2013-02-17T00:00:00.000000000'),
 numpy.datetime64('2013-03-05T00:00:00.000000000'),
 numpy.datetime64('2013-04-06T00:00:00.000000000'),
 numpy.datetime64('2013-05-08T00:00:00.000000000')]

Use xarray's powerful indexing method to pull out the time coordinates in the goodTimes list.

In [31]:
ds = ds.sel(time=goodTimes)

Plot the holoviews dataset again without the empty time slices.

In [32]:
hv_ds = hv.Dataset(ds)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
Out[32]:

How about cloudy observations? Below is an example of how to filter out cloudy or poor quality pixels from the image time series.

5. Quality Filtering

When available, AρρEEARS extracts and returns quality assurance (QA) data for each data file returned regardless of whether the user requests it. This is done to ensure that the user possesses the information needed to determine the usability and usefulness of the data they get from AρρEEARS. The Quality service from the AρρEEARS API can be leveraged to create masks that filter out undesirable data values.

In [33]:
ds
Out[33]:
<xarray.Dataset>
Dimensions:  (time: 24, xdim: 974, ydim: 711)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-14 2012-01-30 ... 2013-05-08
  * ydim     (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
  * xdim     (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
    crs      int8 ...
    SRB3     (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
    SRB4     (time, ydim, xdim) float32 ...
    SRB5     (time, ydim, xdim) float32 ...
    PIXELQA  (time, ydim, xdim) float64 ...
Attributes:
    title:        CU_LE07.001 for aid0001
    Conventions:  CF-1.6
    institution:  Land Processes Distributed Active Archive Center (LP DAAC)
    source:       AppEEARS v2.30
    references:   See README.txt
    history:      See README.txt

Notice that the xarray Dataset contains a data array/variable called PIXELQA, which has the same dimensions/coordinates as the SRB# data arrays/variables. We can use the quality array to create a mask of poor-quality data. We'll use the Quality service to decode the quality assurance information.
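Before looping over every unique value, here is a minimal sketch of what the Quality service returns when decoding a single PIXELQA value (66 is one of the values present in this request, as shown below):

# Decode a single PIXELQA value with the Quality service; the full decoding loop follows in section 5.1
decoded = requests.get(f"{API}/quality/CU_LE07.001/PIXELQA/66").json()
decoded['Cloud']['description'], decoded['Cloud Shadow']['description']   # ('No', 'No') for value 66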

We'll use the following criteria to mask out poor quality data:

  • Cloud (Cloud) == No
  • Cloud Shadow (CS) == No

5.1 Decode quality values

We do not want to decode the same value multiple times, so let's extract all of the unique data values from the PIXELQA xarray DataArray.

In [34]:
# Quality Filtering
quality_values = pd.DataFrame(np.unique(ds.PIXELQA.values), columns=['value']).dropna()
quality_values
Out[34]:
value
0 66.0
1 68.0
2 72.0
3 80.0
4 96.0
5 112.0
6 130.0
7 132.0
8 136.0
9 144.0
10 160.0
11 176.0
12 224.0

The following function decodes the data values from the PIXELQA xarray DataArray using the Quality service.

In [35]:
def qualityDecode(qualityservice_url, product, qualitylayer, value):
    req = requests.get(f"{qualityservice_url}/{product}/{qualitylayer}/{value}")
    return(req.json())

Now we will create an empty dataframe to store the decoded quality information for the masking criteria we identified above.

In [36]:
quality_desc = pd.DataFrame(columns=['value', 'Cloud_bits', 'Cloud_description', 'CS_bits', 'CS_description'])

The for loop below goes through all of the unique quality data values, decodes them using the quality service, and appends the quality descriptions to our empty dataframe.

In [37]:
for index, row in quality_values.iterrows():
    decode_int = qualityDecode(f'{API}/quality',
                               "CU_LE07.001",
                               'PIXELQA',
                               str(int(row['value'])))
    quality_info = decode_int
    df = pd.DataFrame({'value': int(row['value']),
                       'Cloud_bits': quality_info['Cloud']['bits'], 
                       'Cloud_description': quality_info['Cloud']['description'], 
                       'CS_bits': quality_info['Cloud Shadow']['bits'],
                       'CS_description': quality_info['Cloud Shadow']['description']}, index=[index])

    quality_desc = quality_desc.append(df)
In [38]:
quality_desc
Out[38]:
value Cloud_bits Cloud_description CS_bits CS_description
0 66 0b0 No 0b0 No
1 68 0b0 No 0b0 No
2 72 0b0 No 0b1 Yes
3 80 0b0 No 0b0 No
4 96 0b1 Yes 0b0 No
5 112 0b1 Yes 0b0 No
6 130 0b0 No 0b0 No
7 132 0b0 No 0b0 No
8 136 0b0 No 0b1 Yes
9 144 0b0 No 0b0 No
10 160 0b1 Yes 0b0 No
11 176 0b1 Yes 0b0 No
12 224 0b1 Yes 0b0 No

5.2 Create and apply quality mask

Now we have a dataframe with all of the quality information we need to create a quality mask. Next, we'll identify the quality categories that we would like to keep.

In [39]:
# Only keep observations where cloud AND cloud shadow both = no (meaning, there are no clouds/shadows present)
mask_values = quality_desc[((quality_desc['Cloud_description'] == 'No') &
                           (quality_desc['CS_description'] == 'No'))]
mask_values
Out[39]:
value Cloud_bits Cloud_description CS_bits CS_description
0 66 0b0 No 0b0 No
1 68 0b0 No 0b0 No
3 80 0b0 No 0b0 No
6 130 0b0 No 0b0 No
7 132 0b0 No 0b0 No
9 144 0b0 No 0b0 No

Let's apply the mask to our xarray dataset, keeping only the values that we have deemed acceptable.

In [40]:
dsMasked = ds.where(ds['PIXELQA'].isin(mask_values['value']))
dsMasked
Out[40]:
<xarray.Dataset>
Dimensions:  (time: 24, xdim: 974, ydim: 711)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-14 2012-01-30 ... 2013-05-08
  * ydim     (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
  * xdim     (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
    crs      (time, ydim, xdim) float64 nan nan nan nan nan ... nan nan nan nan
    SRB3     (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
    SRB4     (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
    SRB5     (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
    PIXELQA  (time, ydim, xdim) float64 nan nan nan nan nan ... nan nan nan nan
Attributes:
    title:        CU_LE07.001 for aid0001
    Conventions:  CF-1.6
    institution:  Land Processes Distributed Active Archive Center (LP DAAC)
    source:       AppEEARS v2.30
    references:   See README.txt
    history:      See README.txt

Filter out any additional observations that may be returning only fill values after applying the cloud mask.

In [41]:
goodTimes = []
for i in range(len(dsMasked.time)):
    if np.nanmean(dsMasked.SRB4[i,:,:]) > 0:
        goodTimes.append(dsMasked.SRB4[i,:,:].time.values)
dsFinal = dsMasked.sel(time=goodTimes)

5.3 Plot quality filtered data

Using the same plotting functionality from above, let's see how our data looks when we mask out the undesirable pixels.

In [42]:
hv_ds = hv.Dataset(dsFinal)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
Out[42]:

This tutorial provides a template to use for your own research workflows. Leveraging the AρρEEARS API for extracting and formatting analysis ready data and importing it directly into Python means that you can keep your entire research workflow in a single software program, from start to finish.

Contact Information

Material written by Cole Krehbiel$^{1}$ & Aaron Friesz$^{2}$

    Contact: LPDAAC@usgs.gov
    Voice: +1-605-594-6116
    Organization: Land Processes Distributed Active Archive Center (LP DAAC)
    Website: https://lpdaac.usgs.gov/
    Date last modified: 01-03-2020
$^{1}$Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{3}$.

$^{2}$KBR Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{3}$.

$^{3}$LP DAAC work performed under NASA contract NNG14HH33I.