This tutorial was developed using a real-world use case for a project being completed by the NASA DEVELOP Node at the Marshall Space Flight Center. NASA DEVELOP is a program that integrates NASA Earth observations with society to foster future innovation and cultivate the professionals of tomorrow by addressing diverse environmental issues today.
The example use case comes from a project titled "Utilizing NASA Earth Observations to Assess Coastline Replenishment Initiatives and Shoreline Risk along Delaware's Coasts". The group is partnering with the Delaware Department of Natural Resources and Environmental Control, Division of Watershed Stewardship for this project. The goals of the project include identifying areas of current and potential shoreline loss along the coast of Delaware, assessing current restoration efforts, and creating time-series coastline maps.
To access AρρEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For comprehensive documentation of the full functionality of the AρρEEARS API, please see the AρρEEARS API Documentation: https://lpdaacsvc.cr.usgs.gov/appeears/api/
Throughout the exercise, specific sections of the API documentation can be accessed by clicking the hyperlinked text.
It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, follow the instructions below to set up a Python environment on macOS or Windows.
This Python Jupyter Notebook tutorial has been tested using Python versions 3.6, 3.6.6, and 3.7.
Conda was used to create the Python environment. The quickest way to recreate it is with the provided environment.yml file, then activate the environment:
conda env create -f environment.yml
activate ardtutorial
Alternatively, create and configure the environment manually:
conda create -n ardtutorial python=3.6
conda update -n base -c defaults conda
conda config --add channels conda-forge
activate ardtutorial
conda install requests
conda install pandas
conda install geopandas
conda install xarray
conda install numpy
conda install netcdf4
conda install holoviews
conda install -c pyviz hvplot
If you encounter an issue installing hvplot with conda, try installing it with pip instead:
pip install hvplot
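To confirm the environment is set up correctly, you can run a quick sanity check. This is a minimal sketch, assuming only the packages installed above; it imports each dependency and prints its version.
# Sanity check: import each dependency and print its version
import requests, pandas, geopandas, xarray, numpy, netCDF4, holoviews, hvplot
for pkg in (requests, pandas, geopandas, xarray, numpy, netCDF4, holoviews, hvplot):
    print(pkg.__name__, pkg.__version__)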
If this is your first time using the AρρEEARS API, you must first enable API access by following the instructions provided below after signing in with your NASA Earthdata Login.
To enable access to the AρρEEARS API, navigate to the AρρEEARS website. Click the Sign In button in the top right portion of the AρρEEARS landing page screen.
Once you are signed in, click the Manage User icon in the top right portion of the AρρEEARS landing page screen and select Settings.
Select the Enable API box to gain access to the AρρEEARS API.
To submit a request, you must first log in to the AρρEEARS API using your Earthdata Login credentials. We'll use the getpass package to conceal our Earthdata Login username and password. When executed, the code below will prompt you to enter your username followed by your password, and it will store them as variables.
# Import required Python packages
import requests
import getpass
import time
import os
import cgi
import json
import pandas as pd
import geopandas as gpd
import xarray
import numpy as np
import hvplot.xarray
import holoviews as hv
# Set input directory, change working directory
inDir = os.getcwd() + os.sep # Set input directory to the current working directory
os.chdir(inDir) # Change to working directory
Use the getpass package to enter your NASA Earthdata Login username and password. When prompted after executing the code block below, enter your username followed by your password.
# Enter Earthdata login credentials
username = getpass.getpass('Earthdata Username:')
password = getpass.getpass('Earthdata Password:')
API = 'https://lpdaacsvc.cr.usgs.gov/appeears/api' # Set the AρρEEARS API endpoint to a variable (no trailing slash, since the calls below append paths such as /login)
Use the requests package to post your username and password. A successful login will provide you with a token to be used later in this tutorial to submit a request. For more information, or if you are experiencing difficulties, please see the API Documentation.
# Insert API URL, call login service, provide credentials & return json
login_response = requests.post(f"{API}/login", auth=(username, password)).json()
del username, password
login_response
# Assign the token to a variable
token = login_response['token']
head = {'Authorization': f"Bearer {token}"}
head
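Note that the token is only valid for a limited time (roughly 48 hours, per the API documentation). When you are completely finished with the API, you can invalidate the token with the logout service; a minimal sketch, left commented out so it is not run mid-tutorial:
# Optional: invalidate the token when finished (the logout service returns HTTP 204 on success)
# logout_response = requests.post(f"{API}/logout", headers=head)
# print(logout_response.status_code)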
The Tasks service, among other things (see below), is used to submit requests (e.g., POST and GET) to the AρρEEARS system. Each call to the Tasks service is associated with your user account, so every call to this service requires an authentication token. The submit task API call provides a way to submit a new request. It accepts data via JSON, query string, or a combination of both. In the example below, we will compile a JSON payload and submit a request.
In this example, we are interested in a portion of the coastline of Delaware. We will use the geopandas package to import a shapefile that contains the Prime Hook National Wildlife Refuge (with a 6 km buffer applied to the refuge boundaries). The shapefile was extracted from the United States Fish and Wildlife Service Cadastral Database and can be downloaded here.
primehookNWR = gpd.read_file('PrimeHookNWR_6kmBuffer.shp') # Import shapefile using geopandas
primehookNWR.head()
geopandas reads the shapefile in as a GeoDataFrame.
type(primehookNWR)
Convert the GeoDataFrame into an object with GeoJSON structure: serialize it with the to_json() method, then parse the result with json.loads().
primehookNWR = json.loads(primehookNWR.to_json())
type(primehookNWR)
The primehookNWR variable is now a Python dictionary that matches the GeoJSON structure.
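As a quick check, we can inspect the top-level structure of the dictionary. This is a minimal sketch assuming the standard GeoJSON FeatureCollection layout that geopandas produces:
# Inspect the GeoJSON structure: a FeatureCollection holding a list of features
print(primehookNWR['type']) # 'FeatureCollection'
print(len(primehookNWR['features'])) # Number of features in the collection
print(primehookNWR['features'][0]['geometry']['type']) # Geometry type of the first feature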
Many of the required items needed in the AρρEEARS API request payload have multiple options. For example, AρρEEARS has several projections that can be selected for the output. We can use the AρρEEARS API to find out what projections are available. In this example, we are explicitly assigning our projection to the proj variable. To find out how to use the AρρEEARS API to list the available options for each parameter, check out the AρρEEARS API Tutorials produced by the LP DAAC.
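For instance, the supported output projections can be listed with the API's spatial service. A short sketch, assuming the /spatial/proj endpoint described in the AρρEEARS API documentation (no authentication required):
# Request the list of projections AppEEARS supports for output
projections = requests.get(f"{API}/spatial/proj").json()
print([p['Name'] for p in projections]) # 'albers_ard_conus' should appear in this list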
task_name = 'Prime Hook NWR Coastline Time Series' # User-defined name of the task
task_type = 'area' # Type of task, area or point
proj = "albers_ard_conus" # Set output projection
outFormat = 'netcdf4' # Set output file format type
startDate = '01-01-2012' # Start of the date range for which to extract data: MM-DD-YYYY
endDate = '05-31-2013' # End of the date range for which to extract data: MM-DD-YYYY
recurring = False # Specify True for a recurring date range
# Define the products and layers desired
prodLayer = [
    {"layer": "SRB3", "product": "CU_LE07.001"},
    {"layer": "SRB4", "product": "CU_LE07.001"},
    {"layer": "SRB5", "product": "CU_LE07.001"}
]
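If you are unsure which layers a product offers, the product service can list them. A brief sketch, assuming the /product/{product_id} endpoint from the AρρEEARS API documentation, which returns a dictionary keyed by layer name:
# Request the available layers for the Landsat 7 ARD product
le07_layers = requests.get(f"{API}/product/CU_LE07.001").json()
print(list(le07_layers.keys())) # Layer names, e.g. the SRB# surface reflectance bands and PIXELQA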
Notice that primehookNWR, the shapefile transformed to GeoJSON via the geopandas and json packages in section 2.1 above, is inserted as the geo parameter.
task = {
    'task_type': task_type,
    'task_name': task_name,
    'params': {
        'dates': [{
            'startDate': startDate,
            'endDate': endDate
        }],
        'layers': prodLayer,
        'output': {
            'format': {'type': outFormat},
            'projection': proj
        },
        'geo': primehookNWR
    }
}
task
The task object is what we will submit to the AρρEEARS system.
We will now submit our task object to AρρEEARS using the submit task API call.
# Post json to the API task service, return response as json
task_response = requests.post(f"{API}/task", json=task, headers=head)
task_response.json()
A task ID is generated for each request and is returned in the response. Task IDs are unique for each request and are used to check request status, explore request details, and list files generated for the request.
task_id = task_response.json()['task_id']
task_id
We can use the Status service to retrieve information on the status of all task requests that are currently being processed for your account. We will use the task status API call with our task_id to get information on the request we just submitted.
status_response = requests.get(f"{API}/status/{task_id}", headers=head)
status_response.json()
For longer-running requests, we can periodically ping the API to check the status of our submitted request using the snippet below. Once the request is complete, we can move on to downloading its contents.
# Use a while statement to ping the API every 20 seconds until a response of 'done' is returned
starttime = time.time()
while requests.get(f"{API}/task/{task_id}", headers=head).json()['status'] != 'done':
    print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
    time.sleep(20.0 - ((time.time() - starttime) % 20.0))
print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
The list files API call lists all of the files contained in the bundle that are available for download.
bundle = requests.get(f"{API}/bundle/{task_id}").json() # Call API and return bundle contents for the task_id as json
bundle
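The raw JSON can be hard to scan. As an optional step (a small sketch, assuming each entry in bundle['files'] is a flat dictionary of file attributes), the file list can be viewed as a pandas DataFrame:
# Optional: view the bundle's file list as a table, one row per downloadable file
bundle_df = pd.DataFrame(bundle['files'])
bundle_df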
The download file API call gives us the information needed to download all, or a subset, of the files available for a request. Just as the task has a task_id to identify it, each file in the bundle has a unique file_id, which should be used for any operation on that specific file. The Content-Type and Content-Disposition headers are returned when accessing each file to give more details about the file's format and the filename to use when saving it.
The bundle variable we created has more information than we need to download the files. We will first create a Python dictionary to hold the file_id and associated file_name for each file.
files = {}
for f in bundle['files']:
    files[f['file_id']] = f['file_name'] # Fill dictionary with file_id as keys and file_name as values
files
Now we will use the file_ids from the dictionary to download the files into an output directory.
# Set up output directory on local machine
outDir = f'{inDir}Outputs/'
if not os.path.exists(outDir):
    os.makedirs(outDir)
Use the files dictionary and a for loop to automate downloading all of the output files into the output directory.
for file_id in files:
    download_response = requests.get(f"{API}/bundle/{task_id}/{file_id}", stream=True) # Get a stream to the bundle file
    filename = os.path.basename(cgi.parse_header(download_response.headers['Content-Disposition'])[1]['filename']) # Parse the filename from the Content-Disposition header
    filepath = os.path.join(outDir, filename) # Create output file path
    with open(filepath, 'wb') as f: # Write the file to the output directory
        for data in download_response.iter_content(chunk_size=8192):
            f.write(data)
print("Downloading complete!")
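To verify that everything arrived, a quick listing of the output directory (a minimal check, assuming Outputs/ was empty beforehand) can be compared against the files dictionary:
# Verify the downloads by comparing the output directory contents against the bundle
downloaded = sorted(os.listdir(outDir))
print(downloaded)
print(len(downloaded) == len(files)) # True if every bundle file was downloaded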
Now that we have downloaded all the files from our request, let's check out our data! In our AρρEEARS request, we set the output format to 'netcdf4'; as a result, we have a single output data file. We will open the dataset as an xarray Dataset and start to explore.
xarray extends and combines much of the core functionality of both pandas and NumPy, making it very good at handling multi-dimensional (N-dimensional) datasets that contain labels (e.g., variable or dimension names). Let's open the NetCDF file containing our data as an xarray object.
ds = xarray.open_dataset(f'{outDir}CU_LE07.001_30m_aid0001.nc') # Open the L7 ARD output NC4 file from AppEEARS
ds
xarray has two fundamental data structures. A Dataset holds multiple variables that potentially share the same coordinates, along with global metadata for the file (see above). A DataArray contains a single multi-dimensional variable and its coordinates, attributes, and metadata. Data values can be pulled out of a DataArray as a numpy.ndarray using the values attribute.
type(ds)
type(ds.SRB3)
type(ds.SRB3.values)
We can also pull out information for each coordinate item (e.g., lat, lon, time). Here we pull out the time coordinate.
ds['time']
The cftime.DatetimeJulian format of the time coordinate is a little problematic for some plotting libraries and analysis routines. We are going to convert the time coordinate to the more usable datetime64 format.
import warnings
warnings.filterwarnings('ignore') # Suppress the non-standard calendar conversion warning
datetimeindex = ds.indexes['time'].to_datetimeindex() # Convert to datetime64
ds['time'] = datetimeindex # Set the converted index as the dataset's time coordinate
hv_ds = hv.Dataset(ds) # Convert to holoviews dataset
Here we are using a Landsat 7 false color composite combination of:
- R = B4 (NIR)
- G = B5 (SWIR 1)
- B = B3 (RED)
This false color combination was chosen because it highlights land-water boundaries and is useful for analyzing soil conditions. Since we are interested in analyzing the coastal shoreline, this combination maximizes our ability to delineate between land and water features and highlights the shoreline. Vegetation appears in shades of green/orange/brown, with increasingly saturated soils appearing very dark. Water appears very dark, almost black.
# Use the .to() method to plot the holoviews dataset as an RGB, defining the x/y/z dimensions and data variables used
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
Above, we visualize the multidimensional (t, x, y) plot of our gridded data. Move the slider on the right to step through the different time slices.
# Set up a list to store time slices that contain non-fill-value data by going through each time slice for a variable
goodTimes = []
for i in range(len(ds.time)):
    if np.nanmean(ds.SRB4[i,:,:]) > 0: # Only keep time slices where mean reflectance is greater than 0
        goodTimes.append(ds.SRB4[i,:,:].time.values) # Append the time value of each valid observation to the list
goodTimes
Use xarray's powerful indexing to pull out the time coordinates contained in the goodTimes list.
ds = ds.sel(time=goodTimes)
hv_ds = hv.Dataset(ds)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
When available, AρρEEARS extracts and returns quality assurance (QA) data for each data file returned regardless of whether the user requests it. This is done to ensure that the user possesses the information needed to determine the usability and usefulness of the data they get from AρρEEARS. The Quality service from the AρρEEARS API can be leveraged to create masks that filter out undesirable data values.
ds
Notice that the xarray Dataset contains a data array/variable called PIXELQA, which has the same dimensions/coordinates as the SRB# data arrays/variables. We can use this quality array to create a mask of poor-quality data. We'll use the Quality service to decode the quality assurance information, applying the following criteria to mask out poor-quality data:
- Cloud (Cloud) == No
- Cloud Shadow (CS) == No
We do not want to decode the same value multiple times, so let's extract all of the unique data values from the PIXELQA xarray DataArray.
# Quality Filtering
quality_values = pd.DataFrame(np.unique(ds.PIXELQA.values), columns=['value']).dropna()
quality_values
The following function decodes a data value from the PIXELQA xarray DataArray using the Quality service.
def qualityDecode(qualityservice_url, product, qualitylayer, value):
    req = requests.get(f"{qualityservice_url}/{product}/{qualitylayer}/{value}")
    return req.json()
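For comparison, because PIXELQA is a bit-packed band, the same two flags can also be checked locally with bitwise operations. This is a hedged sketch assuming the standard Landsat Collection 1 ARD PixelQA bit layout (bit 3 = cloud shadow, bit 5 = cloud); the Quality service above remains the authoritative decoder.
# Local alternative: decode the cloud (bit 5) and cloud shadow (bit 3) flags with bitwise math
def qualityDecodeLocal(value, cloud_bit=5, shadow_bit=3):
    value = int(value)
    return {'Cloud': bool(value & (1 << cloud_bit)), # True if the cloud bit is set
            'Cloud Shadow': bool(value & (1 << shadow_bit))} # True if the cloud shadow bit is set

print(qualityDecodeLocal(66)) # 66 = 0b1000010: neither flag set -> {'Cloud': False, 'Cloud Shadow': False}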
Now we will create an empty dataframe to store the decoded quality information for the masking criteria we identified above.
quality_desc = pd.DataFrame(columns=['value', 'Cloud_bits', 'Cloud_description', 'CS_bits', 'CS_description'])
The for loop below goes through all of the unique quality data values, decodes them using the quality service, and appends the quality descriptions to our empty dataframe.
for index, row in quality_values.iterrows():
    decode_int = qualityDecode(f'{API}/quality', "CU_LE07.001", 'PIXELQA', str(int(row['value'])))
    quality_info = decode_int
    df = pd.DataFrame({'value': int(row['value']),
                       'Cloud_bits': quality_info['Cloud']['bits'],
                       'Cloud_description': quality_info['Cloud']['description'],
                       'CS_bits': quality_info['Cloud Shadow']['bits'],
                       'CS_description': quality_info['Cloud Shadow']['description']}, index=[index])
    quality_desc = quality_desc.append(df)
quality_desc
Now we have a dataframe with all of the quality information we need to create a quality mask. Next, we'll identify the quality categories that we would like to keep.
# Only keep observations where cloud AND cloud shadow both = no (meaning, there are no clouds/shadows present)
mask_values = quality_desc[((quality_desc['Cloud_description'] == 'No') &
(quality_desc['CS_description'] == 'No'))]
mask_values
Let's apply the mask to our xarray dataset, keeping only the values that we have deemed acceptable.
dsMasked = ds.where(ds['PIXELQA'].isin(mask_values['value']))
dsMasked
# Re-screen the time slices: keep only those that still contain valid data after masking
goodTimes = []
for i in range(len(dsMasked.time)):
    if np.nanmean(dsMasked.SRB4[i,:,:]) > 0:
        goodTimes.append(dsMasked.SRB4[i,:,:].time.values)
dsFinal = dsMasked.sel(time=goodTimes) # Keep only the good time slices
Using the same plotting functionality from above, let's see how our data looks when we mask out the undesirable pixels.
hv_ds = hv.Dataset(dsFinal)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
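Finally, if you want to keep the cleaned result for later analysis, xarray can write the masked, time-filtered dataset back out to NetCDF. A minimal sketch; the output filename is arbitrary:
# Optional: save the quality-masked, time-filtered dataset to a new NetCDF-4 file
dsFinal.to_netcdf(f'{outDir}CU_LE07.001_30m_aid0001_masked.nc')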