Published: Oct. 16, 2019
The intent of this tutorial is to familiarize Landsat Analysis Ready Data (ARD) users with the AρρEEARS application programming interface (API) and to demonstrate how the API, and the services it provides, can be leveraged in an analysis workflow.
This tutorial was developed using a real-world use case for a project being completed by the NASA DEVELOP Node at the Marshall Space Flight Center. NASA DEVELOP is a program that integrates NASA Earth observations with society to foster future innovation and cultivate the professionals of tomorrow by addressing diverse environmental issues today.
The example use case comes from a project titled, "Utilizing NASA Earth Observations to Assess Coastline Replenishment Initiatives and Shoreline Risk along Delaware's Coasts". The group is partnering with the Delaware Department of Natural Resources and Environmental Control, Division of Watershed Stewardship for this project. The project's goals include identifying areas of current and potential shoreline loss along the coast of Delaware, assessing current restoration efforts, and creating time-series coastline maps.
To access AρρEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/
For comprehensive documentation of the full functionality of the AρρEEARS API, please see the AρρEEARS API Documentation: https://lpdaacsvc.cr.usgs.gov/appeears/api/
Throughout the exercise, specific sections of the API documentation can be accessed by clicking the hyperlinked text.
It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, follow the instructions below to set up a Python environment on macOS or Windows.
This Python Jupyter Notebook tutorial has been tested using Python versions 3.6, 3.6.6, and 3.7.
Conda was used to create the Python environment.
- Option 1: Download the environment.yml file:
- Open the environment.yml file and change the prefix (last line) to the directory on your OS where you want to create the environment (ex: C:\Username\Anaconda3\envs\ardtutorial) and save the environment file.
- Using Command Prompt, Anaconda Prompt, Cmder, Terminal, or your preferred command line interface, navigate to the directory where you saved the environment.yml file.
- Type conda env create -f environment.yml
- Type activate ardtutorial
- Option 2: Create the environment manually by entering the commands below:
conda create -n ardtutorial python=3.6
conda update -n base -c defaults conda
conda config --add channels conda-forge
activate ardtutorial
conda install requests
conda install pandas
conda install geopandas
conda install xarray
conda install numpy
conda install netcdf4
conda install holoviews
conda install -c pyviz hvplot
If you encounter an issue installing hvplot with conda, try installing it from PyPI instead:
pip install hvplot
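As an optional sanity check (assuming the environment above was created and activated successfully), you can confirm that the required packages import cleanly and print their versions:
# Optional: verify the environment by importing the required packages and printing versions
import requests, pandas, geopandas, xarray, numpy, netCDF4, holoviews, hvplot
for pkg in (requests, pandas, geopandas, xarray, numpy, netCDF4, holoviews, hvplot):
    print(pkg.__name__, pkg.__version__)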
To submit a request, you must first log in to the AρρEEARS API using your Earthdata login credentials. We'll use the getpass package to conceal our Earthdata username and password. When executed, the code below will prompt you to enter your username followed by your password and store them as variables.
# Import required Python packages
import requests
import getpass
import time
import os
import cgi
import json
import pandas as pd
import geopandas as gpd
import xarray
import numpy as np
import hvplot.xarray
import holoviews as hv
# Set input directory, change working directory
inDir = os.getcwd() + os.sep # Set input directory to the current working directory
os.chdir(inDir) # Change to working directory
# Enter Earthdata login credentials
username = getpass.getpass('Earthdata Username:')
password = getpass.getpass('Earthdata Password:')
Earthdata Username:········
Earthdata Password:········
API = 'https://lpdaacsvc.cr.usgs.gov/appeears/api' # Set the AρρEEARS API endpoint to a variable (no trailing slash; it is added in each request below)
Next, we use the requests package to post your username and password to the login service. A successful login will provide you with a token to be used later in this tutorial to submit a request. For more information, or if you are experiencing difficulties, please see the API Documentation.
# Insert API URL, call login service, provide credentials & return json
login_response = requests.post(f"{API}/login", auth=(username, password)).json()
del username, password
login_response
{'token_type': 'Bearer',
'token': 'JC_MNTgi3qgv0635fRUeSArIDfEBzKmX0hW1W3u_zkQXoYxDZ3BFKQ5sN7twpYJbN73k27C66_XN3KzIvOe3wA',
'expiration': '2019-10-18T17:51:50Z'}
# Assign the token to a variable
token = login_response['token']
head = {'Authorization': f"Bearer {token}"}
head
{'Authorization': 'Bearer JC_MNTgi3qgv0635fRUeSArIDfEBzKmX0hW1W3u_zkQXoYxDZ3BFKQ5sN7twpYJbN73k27C66_XN3KzIvOe3wA'}
The Tasks service, among other things (see below), is used to submit requests (e.g., POST and GET) to the AρρEEARS system. Each call to the Tasks service is associated with your user account, so each call to this service requires an authentication token. The submit task API call provides a way to submit a new request. It accepts data via JSON, query string, or a combination of both. In the example below, we will compile a JSON payload and submit a request.
In this example, we are interested in a portion of the coastline of Delaware. We will use the geopandas package to import a shapefile that contains the Prime Hook National Wildlife Refuge (with a 6 km buffer applied to the refuge boundaries). The shapefile was extracted from the United States Fish and Wildlife Service Cadastral Database and can be downloaded here.
primehookNWR = gpd.read_file('PrimeHookNWR_6kmBuffer.shp') # Import shapefile using geopandas
primehookNWR.head()
 | OBJECTID | CMPXNAME | ORGNAME | ORGCODE | IFWS | LIT | GISACRES | DOCACRES | MAXACRES | APPTYPE | ... | RSL_TYPE | FWSREGION | LABELNAME | CreatedBy | CreatedDat | ModifiedBy | ModifiedDa | SHAPE_Leng | SHAPE_Area | geometry
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 663.0 | COASTAL DELAWARE NATIONAL WILDLIFE REFUGE COMPLEX | PRIME HOOK NATIONAL WILDLIFE REFUGE | 51560 | 630 | PMH | 10251.121256 | None | None | 0 | ... | NWR | 5 | Prime Hook NWR | R5 | 2019/03/29 18:50:17.000 | R5 | 2019/03/29 18:50:17.000 | 0.673839 | 0.00431 | POLYGON ((-75.36337470754277 38.81103856447815... |
1 rows × 21 columns
Geopandas imports the shapefile as a GeoDataFrame.
type(primehookNWR)
geopandas.geodataframe.GeoDataFrame
Convert the GeoDataFrame into an object with GeoJSON structure by serializing it with the to_json method and parsing the result with json.loads.
primehookNWR = json.loads(primehookNWR.to_json())
type(primehookNWR)
dict
The primehookNWR variable is now a Python dictionary that matches the GeoJSON structure.
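A quick look at the dictionary confirms the GeoJSON structure, a FeatureCollection holding a list of features:
print(primehookNWR['type'])           # 'FeatureCollection'
print(len(primehookNWR['features']))  # 1 feature: the buffered refuge boundary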
Many of the parameters required in the AρρEEARS API request payload have multiple valid options. For example, AρρEEARS offers several projections for the output. We can use the AρρEEARS API to find out which projections are available. In this example, we explicitly assign our projection to the proj variable. To learn how to use the AρρEEARS API to list the available options for each parameter, check out the AρρEEARS API Tutorials produced by the LP DAAC.
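For instance, here is a minimal sketch of listing the supported output projections with the spatial service (the /spatial/proj endpoint and the 'Name' field follow the AρρEEARS API documentation at the time of writing; verify against the current docs):
# List the projections AppEEARS supports for output files
projs = requests.get(f"{API}/spatial/proj").json()
print([p['Name'] for p in projs])  # 'albers_ard_conus' appears in this list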
task_name = 'Prime Hook NWR Coastline Time Series' # User-defined name of the task
task_type = 'area' # Type of task, area or point
proj = "albers_ard_conus" # Set output projection
outFormat = 'netcdf4' # Set output file format type
startDate = '01-01-2012' # Start of the date range for which to extract data: MM-DD-YYYY
endDate = '05-31-2013' # End of the date range for which to extract data: MM-DD-YYYY
recurring = False # Specify True for a recurring date range
# Define the products and layers desired
prodLayer = [{
"layer": "SRB3",
"product": "CU_LE07.001"
},
{
"layer": "SRB4",
"product": "CU_LE07.001"
},
{
"layer": "SRB5",
"product": "CU_LE07.001"
} ]
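If you are unsure of the exact layer names for a product, the product service can list them. This is a sketch assuming the /product/{product_id} endpoint described in the API documentation, which returns a dictionary keyed by layer name:
# List the available layers for the Landsat 7 ARD product
layers = requests.get(f"{API}/product/CU_LE07.001").json()
print(list(layers.keys()))  # Includes 'SRB3', 'SRB4', 'SRB5', 'PIXELQA', etc.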
Notice that primehookNWR, the shapefile transformed to GeoJSON via the geopandas and json packages in section 2.1 above, is inserted as the geo parameter of the task object below.
task = {
'task_type': task_type,
'task_name': task_name,
'params': {
'dates': [
{
'startDate': startDate,
'endDate': endDate
}],
'layers': prodLayer,
'output': {
'format': {
'type': outFormat},
'projection': proj},
'geo': primehookNWR,
}
}
task
{'task_type': 'area',
'task_name': 'Prime Hook NWR Coastline Time Series',
'params': {'dates': [{'startDate': '01-01-2012', 'endDate': '05-31-2013'}],
'layers': [{'layer': 'SRB3', 'product': 'CU_LE07.001'},
{'layer': 'SRB4', 'product': 'CU_LE07.001'},
{'layer': 'SRB5', 'product': 'CU_LE07.001'}],
'output': {'format': {'type': 'netcdf4'}, 'projection': 'albers_ard_conus'},
'geo': {'type': 'FeatureCollection',
'features': [{'id': '0',
'type': 'Feature',
'properties': {'APPTYPE': 0,
'CMPXNAME': 'COASTAL DELAWARE NATIONAL WILDLIFE REFUGE COMPLEX',
'COMMENTS': None,
'CreatedBy': 'R5',
'CreatedDat': '2019/03/29 18:50:17.000',
'DOCACRES': None,
'FWSREGION': '5',
'GISACRES': 10251.1212562,
'IFWS': '630',
'LABELNAME': 'Prime Hook NWR',
'LIT': 'PMH',
'MAXACRES': None,
'ModifiedBy': 'R5',
'ModifiedDa': '2019/03/29 18:50:17.000',
'OBJECTID': 663.0,
'ORGCODE': 51560,
'ORGNAME': 'PRIME HOOK NATIONAL WILDLIFE REFUGE',
'RSL_TYPE': 'NWR',
'SHAPE_Area': 0.004309666381901,
'SHAPE_Leng': 0.673838515118319},
'geometry': {'type': 'Polygon',
'coordinates': [[[-75.36337470754277, 38.81103856447815],
[-75.36684623839793, 38.814108983317865],
[-75.3673902689009, 38.814685229317845],
[-75.37213118078465, 38.82008155061829],
[-75.37284602989138, 38.820961841618285],
[-75.37292676194586, 38.82109893561042],
[-75.38066020861933, 38.827932193520105],
[-75.39219500392865, 38.84338449352398],
[-75.39629008173304, 38.85706813639797],
[-75.39629181975268, 38.85717690239801],
[-75.3957166937612, 38.86016931314761],
[-75.39560970457556, 38.86153318416018],
[-75.39533229692118, 38.86313141416013],
[-75.39516125646422, 38.8630600595839],
[-75.39495484732245, 38.86413450064249],
[-75.39452423754125, 38.865111843642474],
[-75.39394349009319, 38.864842895005374],
[-75.392281229942, 38.87043549542343],
[-75.38065240158467, 38.87764473640972],
[-75.37765249667939, 38.878582856409714],
[-75.37565666108746, 38.87908165806733],
[-75.37993532030686, 38.88538957223554],
[-75.38091881903999, 38.89290373897693],
[-75.38752740166262, 38.90230192270185],
[-75.38865193805547, 38.90864871779438],
[-75.39664573522931, 38.91770508787822],
[-75.40309960193864, 38.933748308413904],
[-75.39951192355599, 38.946177978529846],
[-75.39541068178005, 38.95101910705229],
[-75.3952595221568, 38.95121471318857],
[-75.39505501708528, 38.95147935753647],
[-75.39174520950266, 38.955862326060846],
[-75.39160983610694, 38.95604157818668],
[-75.39149029069716, 38.95620713473624],
[-75.38044293645724, 38.96372041087951],
[-75.36218992586565, 38.96637644517485],
[-75.338461593418, 38.96392376673482],
[-75.31151062746424, 38.956594592901276],
[-75.28844891186671, 38.946980985228315],
[-75.28487729899736, 38.94559764215446],
[-75.28407271165516, 38.945156309057126],
[-75.28389652142155, 38.94508284312066],
[-75.28276784365292, 38.94444054459322],
[-75.25673153738597, 38.93015437556376],
[-75.24936007013217, 38.92457781835633],
[-75.24900916991201, 38.92468234061907],
[-75.24693765332951, 38.92323838561905],
[-75.23017273069657, 38.90999253006259],
[-75.22901241264326, 38.90894094306264],
[-75.22388016970439, 38.903701341853534],
[-75.22383231076932, 38.90370030420148],
[-75.22284668861646, 38.90273670157089],
[-75.21903259593331, 38.899865999148666],
[-75.21665211976224, 38.89773897914867],
[-75.21528897479907, 38.896504624639675],
[-75.21384717424414, 38.89518132663969],
[-75.21153772302593, 38.893009970816664],
[-75.21012214930715, 38.89164576081666],
[-75.20656714899361, 38.88780902134303],
[-75.20083037904622, 38.8834351743125],
[-75.19998739353547, 38.88274849331251],
[-75.19803124680682, 38.88105215581955],
[-75.19718865819998, 38.88035343505429],
[-75.19556341341132, 38.87896538341083],
[-75.19539183005413, 38.878821150887774],
[-75.19525877166946, 38.87870519177715],
[-75.19379836690909, 38.8774578290123],
[-75.1931311575245, 38.8768704700123],
[-75.1931523914744, 38.876869415388356],
[-75.19136815626119, 38.87531427161658],
[-75.19070772023588, 38.87471674461661],
[-75.18416545153744, 38.86836111380578],
[-75.18286263256987, 38.86699438680577],
[-75.18217548120434, 38.86611661621794],
[-75.18176121969094, 38.8656984309756],
[-75.1817307297651, 38.8656671859769],
[-75.18165719937241, 38.86559808893511],
[-75.181436386547, 38.8653836139351],
[-75.1813142872777, 38.86523994138351],
[-75.18023068745065, 38.86411338363665],
[-75.17917089348448, 38.86299530963662],
[-75.17941995507302, 38.86301076074576],
[-75.16893890850983, 38.85067210064778],
[-75.16795395776212, 38.84851382561647],
[-75.16588318610651, 38.8456091204332],
[-75.15189698482118, 38.83433208501529],
[-75.14407112364663, 38.82860885674428],
[-75.14359481028913, 38.828208628744285],
[-75.13894757659892, 38.82413588038378],
[-75.13878563653914, 38.823987619383786],
[-75.13753991912834, 38.82283255194136],
[-75.13726367106122, 38.82257309094132],
[-75.13591834193934, 38.82129110612918],
[-75.13560400459733, 38.82098714012919],
[-75.13109127185336, 38.81617884309415],
[-75.12980228729339, 38.81488001999468],
[-75.12944986882404, 38.81449441899468],
[-75.12931412318115, 38.814331371452376],
[-75.12658525857468, 38.81207060674312],
[-75.1217529360361, 38.80648164581689],
[-75.12125255221777, 38.80644253125006],
[-75.12085259768305, 38.80600499525007],
[-75.11488834007781, 38.79853927060031],
[-75.11439117964657, 38.79796391961275],
[-75.11416267392795, 38.79763070650588],
[-75.11346232274528, 38.79675379340342],
[-75.11294812207349, 38.795997065403405],
[-75.11236915762088, 38.79513004163086],
[-75.11184264241105, 38.794327373630864],
[-75.11094839579836, 38.792922598522495],
[-75.11087947757758, 38.79281090652248],
[-75.11047167303944, 38.79214995310461],
[-75.11047407549478, 38.7921504512502],
[-75.10686783884115, 38.78710616638148],
[-75.10282918480794, 38.77578366908748],
[-75.10281009334716, 38.77565001308747],
[-75.10258932033632, 38.77265295557403],
[-75.10258916512706, 38.772430134574016],
[-75.10288908225453, 38.76912110166511],
[-75.10292697700967, 38.7689205046651],
[-75.10599311405831, 38.76217581758729],
[-75.10609182310033, 38.76205063658731],
[-75.11249109315828, 38.75883979886808],
[-75.11160130451658, 38.758349549798],
[-75.11194313876092, 38.75804000379796],
[-75.11299293379287, 38.757156017019646],
[-75.11337277273772, 38.756858283019696],
[-75.11853109528975, 38.75381456451581],
[-75.1188805911068, 38.7536594895158],
[-75.12101502019148, 38.752804686992214],
[-75.12164922988899, 38.752576045256426],
[-75.12240995434887, 38.75230117861517],
[-75.12256737421336, 38.752407341197994],
[-75.12992921394077, 38.75009350917912],
[-75.13021795256428, 38.75003954117915],
[-75.13817401914797, 38.74911112451129],
[-75.13839437443568, 38.74909886151126],
[-75.14365061098313, 38.74920970609597],
[-75.1480993366586, 38.74896963416003],
[-75.14839570252418, 38.74897505616004],
[-75.15257668827905, 38.74914056124854],
[-75.15286545972168, 38.74915788324855],
[-75.16063129198899, 38.74988775926459],
[-75.16100369246898, 38.749934644264556],
[-75.17573111371289, 38.75255698578695],
[-75.17609598123266, 38.752639532786986],
[-75.1828712964025, 38.75445627980677],
[-75.19246843089034, 38.75625425708399],
[-75.20673733630748, 38.75352872055439],
[-75.21172128148393, 38.75393540916266],
[-75.21328642865045, 38.753199992895],
[-75.23249561943436, 38.751863205981245],
[-75.25636389378245, 38.755339964980344],
[-75.26449208240261, 38.75779701401932],
[-75.28645993367856, 38.764144817134856],
[-75.31472020828446, 38.776540607396015],
[-75.32675478965666, 38.78376793722171],
[-75.32691387638198, 38.78384725575695],
[-75.32719603882136, 38.78403288649795],
[-75.34050216572057, 38.79202120798539],
[-75.3424031658148, 38.79334954698539],
[-75.3471075263183, 38.797128623862285],
[-75.34923293570701, 38.79852606346754],
[-75.35301620573651, 38.8018742058872],
[-75.36075397005767, 38.80808724229037],
[-75.36337470754277, 38.81103856447815]]]}}]}}}
The task object is what we will submit to the AρρEEARS system.
We will now submit our task object to AρρEEARS using the submit task API call.
# Post json to the API task service, return response as json
task_response = requests.post(f"{API}/task", json=task, headers=head)
task_response.json()
{'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944', 'status': 'pending'}
A task ID is generated for each request and is returned in the response. Task IDs are unique for each request and are used to check request status, explore request details, and list files generated for the request.
task_id = task_response.json()['task_id']
task_id
'f87fabbc-ed3a-4694-ac8c-381ca19e0944'
We can use the Status service to retrieve information on the status of all task requests that are currently being processed for your account. We will use the task status API call with our task_id to get information on the request we just submitted.
status_response = requests.get(f"{API}/status/{task_id}", headers=head)
status_response.json()
{'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944',
'updated': '2019-10-16T17:52:07.005627',
'user_id': 'appeears.testing@gmail.com',
'progress': {'details': [{'desc': 'Initializing',
'step': 1,
'pct_complete': 0},
{'desc': 'Downloading', 'step': 2, 'pct_complete': 0},
{'desc': 'Subsetting', 'step': 3, 'pct_complete': 0},
{'desc': 'Generating Files', 'step': 4, 'pct_complete': 0},
{'desc': 'Extracting Stats', 'step': 5, 'pct_complete': 0},
{'desc': 'Finalizing', 'step': 6, 'pct_complete': 0}],
'summary': 0},
'status_id': '32f4e320-34d2-49fb-bb78-93c186b2bb46',
'status_type': 'task'}
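Since the Status service tracks every request being processed for your account, you can also list all statuses at once. Below is a sketch assuming the collection form of the status endpoint described in the API documentation:
# List status information for all of your currently processing requests
all_statuses = requests.get(f"{API}/status", headers=head).json()
print([(s.get('task_id'), s.get('status_type')) for s in all_statuses])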
For longer running requests we can gently ping the API to get the status of our submitted request using the snippet below. Once the request is complete, we can move on to downloading our request contents.
# Use a while statement to ping the API every 20 seconds until a response of 'done' is returned
starttime = time.time()
while requests.get(f"{API}/task/{task_id}", headers=head).json()['status'] != 'done':
    print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
    time.sleep(20.0 - ((time.time() - starttime) % 20.0))
print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
processing
processing
processing
...
done
The Bundle service provides information about completed tasks (i.e., tasks that have a status of done). A bundle will be generated containing all of the files that were created as part of the task request.
The list files API call lists all of the files contained in the bundle which are available for download.
bundle = requests.get(f"{API}/bundle/{task_id}").json() # Call API and return bundle contents for the task_id as json
bundle
{'files': [{'sha256': 'f3a4c478f2e9b909461a931d23960053618da4ecf62558631c8b0d0a46b293b8',
'file_id': 'c233c086-8144-427d-bb05-1352f64edce5',
'file_name': 'CU_LE07.001_30m_aid0001.nc',
'file_size': 32861099,
'file_type': 'nc'},
{'sha256': 'c15c2b7a0948f72528e503ee9c42a98bca44c4d68a74e282cf4555bcf105150a',
'file_id': '542ae6bc-1649-4683-ac38-2c3819c2f8e7',
'file_name': 'CU-LE07-001-PIXELQA-lookup.csv',
'file_size': 431,
'file_type': 'csv'},
{'sha256': 'da83b403c07cccaf530e2223f03e5cad123ede8666ebc5c7b360a79607778328',
'file_id': 'd4e3df6d-cf2c-4178-9fe9-5b8e5a9c888a',
'file_name': 'CU-LE07-001-PIXELQA-Statistics-QA.csv',
'file_size': 2557,
'file_type': 'csv'},
{'sha256': 'eb01c569e55c1ee96bc3167a034c51c2ea859c95934cbd548445e0e86011ad5d',
'file_id': '9d2bd22e-537c-4739-b4f9-309b2d7275e0',
'file_name': 'CU-LE07-001-Statistics.csv',
'file_size': 11247,
'file_type': 'csv'},
{'sha256': 'eb09ce34946d5a3e040d7248d992bbb739d3aac86afe44ff291b33521abf6a3f',
'file_id': 'e745bc8c-1535-4ee5-b7d5-a4fa4f625351',
'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-granule-list.txt',
'file_size': 23426,
'file_type': 'txt'},
{'sha256': '193009e3ea6bc262ddc996900d8ee10cf2273afd127a11a19c2286b712197221',
'file_id': '9c348438-0b25-4db3-a2b0-4916bbe432bb',
'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-request.json',
'file_size': 8223,
'file_type': 'json'},
{'sha256': 'd7e0f6e0fe382a270a66b1f6aaad8179e4bdf61bfa6f753a86060afa06f5295f',
'file_id': 'a828d544-bf52-41df-8a5e-36e81f76249f',
'file_name': 'Prime-Hook-NWR-Coastline-Time-Series-CU-LE07-001-metadata.xml',
'file_size': 22288,
'file_type': 'xml'},
{'sha256': '64a28d62bf1fa1cf72ce230b746478f4b16b75575bdb21169aab6f7a578ffb54',
'file_id': 'ca60b3af-6c76-4a4f-a79c-f554ad08cb84',
'file_name': 'README.txt',
'file_size': 23871,
'file_type': 'txt'}],
'created': '2019-10-16T17:52:06.964636',
'task_id': 'f87fabbc-ed3a-4694-ac8c-381ca19e0944',
'updated': '2019-10-16T18:00:12.852903',
'bundle_type': 'area'}
The download file API call gives us the information needed to download all, or a subset of, the files available for a request. Just as the task has a task_id to identify it, each file in the bundle has a unique file_id, which should be used for any operation on that specific file. The Content-Type and Content-Disposition headers are returned when accessing each file, giving more details about the file's format and the filename to use when saving it. The bundle variable we created has more information than we need to download the files, so we will first create a Python dictionary to hold the file_id and associated file_name for each file.
files = {}
for f in bundle['files']:
    files[f['file_id']] = f['file_name'] # Fill dictionary with file_id as keys and file_name as values
files
{'c233c086-8144-427d-bb05-1352f64edce5': 'CU_LE07.001_30m_aid0001.nc',
'542ae6bc-1649-4683-ac38-2c3819c2f8e7': 'CU-LE07-001-PIXELQA-lookup.csv',
'd4e3df6d-cf2c-4178-9fe9-5b8e5a9c888a': 'CU-LE07-001-PIXELQA-Statistics-QA.csv',
'9d2bd22e-537c-4739-b4f9-309b2d7275e0': 'CU-LE07-001-Statistics.csv',
'e745bc8c-1535-4ee5-b7d5-a4fa4f625351': 'Prime-Hook-NWR-Coastline-Time-Series-granule-list.txt',
'9c348438-0b25-4db3-a2b0-4916bbe432bb': 'Prime-Hook-NWR-Coastline-Time-Series-request.json',
'a828d544-bf52-41df-8a5e-36e81f76249f': 'Prime-Hook-NWR-Coastline-Time-Series-CU-LE07-001-metadata.xml',
'ca60b3af-6c76-4a4f-a79c-f554ad08cb84': 'README.txt'}
Now we will download the files using the file_ids from the dictionary into an output directory.
# Set up output directory on local machine
outDir = f'{inDir}Outputs/'
if not os.path.exists(outDir):
    os.makedirs(outDir)
We will use the files dictionary and a for loop to automate downloading all of the output files into the output directory.
for file in files:
    download_response = requests.get(f"{API}/bundle/{task_id}/{file}", stream=True) # Get a stream to the bundle file
    filename = os.path.basename(cgi.parse_header(download_response.headers['Content-Disposition'])[1]['filename']) # Parse the name from the Content-Disposition header
    filepath = os.path.join(outDir, filename) # Create output file path
    with open(filepath, 'wb') as f: # Write file to destination directory
        for data in download_response.iter_content(chunk_size=8192):
            f.write(data)
print("Downloading complete!")
Downloading complete!
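Because the bundle listing includes a SHA-256 checksum for each file, we can optionally verify the downloads. This is a minimal sketch using Python's hashlib, assuming the saved filenames match the bundle's file_name entries (as they do for this request):
import hashlib

# Verify each downloaded file against the checksum reported in the bundle
for f in bundle['files']:
    with open(os.path.join(outDir, f['file_name']), 'rb') as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    print(f['file_name'], 'OK' if digest == f['sha256'] else 'MISMATCH')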
Now that we have downloaded all the files from our request, let's start to check out our data! In our AρρEEARS request, we set the output format to 'netcdf4'. As a result, we have only one output data file, which we will open as an xarray Dataset and explore. Xarray extends and combines much of the core functionality from both the pandas library and NumPy, making it very good at handling multi-dimensional (N-dimensional) datasets that contain labels (e.g., variable names or dimension names). Let's open the netCDF file with our data as an xarray object.
ds = xarray.open_dataset(f'{outDir}CU_LE07.001_30m_aid0001.nc') # Open the L7 ARD output NC4 file from AppEEARS
ds
<xarray.Dataset>
Dimensions: (time: 102, xdim: 974, ydim: 711)
Coordinates:
* time (time) object 2012-01-05 00:00:00 ... 2013-05-31 00:00:00
* ydim (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
* xdim (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
crs int8 ...
SRB3 (time, ydim, xdim) float32 ...
SRB4 (time, ydim, xdim) float32 ...
SRB5 (time, ydim, xdim) float32 ...
PIXELQA (time, ydim, xdim) float64 ...
Attributes:
title: CU_LE07.001 for aid0001
Conventions: CF-1.6
institution: Land Processes Distributed Active Archive Center (LP DAAC)
source: AppEEARS v2.30
references: See README.txt
history: See README.txt
Xarray has two fundamental data structures. A Dataset holds multiple variables that potentially share the same coordinates, along with global metadata for the file (see above). A DataArray contains a single multi-dimensional variable and its coordinates, attributes, and metadata. Data values can be pulled out of a DataArray as a numpy.ndarray using the values attribute.
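For example, pulling the values out of the SRB3 DataArray yields a NumPy array whose shape mirrors the (time, ydim, xdim) dimensions shown above:
ds.SRB3.values.shape  # (102, 711, 974): time, ydim, xdim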
type(ds)
xarray.core.dataset.Dataset
type(ds.SRB3)
xarray.core.dataarray.DataArray
type(ds.SRB3.values)
numpy.ndarray
We can also pull out information for each coordinate (e.g., xdim, ydim, time). Here we pull out the time coordinate.
ds['time']
<xarray.DataArray 'time' (time: 102)>
array([cftime.DatetimeJulian(2012, 1, 5, 0, 0, 0, 0, 2, 5),
cftime.DatetimeJulian(2012, 1, 7, 0, 0, 0, 0, 4, 7),
cftime.DatetimeJulian(2012, 1, 14, 0, 0, 0, 0, 4, 14),
cftime.DatetimeJulian(2012, 1, 16, 0, 0, 0, 0, 6, 16),
cftime.DatetimeJulian(2012, 1, 30, 0, 0, 0, 0, 6, 30),
cftime.DatetimeJulian(2012, 2, 1, 0, 0, 0, 0, 1, 32),
cftime.DatetimeJulian(2012, 2, 6, 0, 0, 0, 0, 6, 37),
cftime.DatetimeJulian(2012, 2, 8, 0, 0, 0, 0, 1, 39),
cftime.DatetimeJulian(2012, 2, 15, 0, 0, 0, 0, 1, 46),
cftime.DatetimeJulian(2012, 2, 22, 0, 0, 0, 0, 1, 53),
cftime.DatetimeJulian(2012, 3, 2, 0, 0, 0, 0, 3, 62),
cftime.DatetimeJulian(2012, 3, 11, 0, 0, 0, 0, 5, 71),
cftime.DatetimeJulian(2012, 3, 18, 0, 0, 0, 0, 5, 78),
cftime.DatetimeJulian(2012, 3, 20, 0, 0, 0, 0, 0, 80),
cftime.DatetimeJulian(2012, 3, 27, 0, 0, 0, 0, 0, 87),
cftime.DatetimeJulian(2012, 4, 3, 0, 0, 0, 0, 0, 94),
cftime.DatetimeJulian(2012, 4, 5, 0, 0, 0, 0, 2, 96),
cftime.DatetimeJulian(2012, 4, 10, 0, 0, 0, 0, 0, 101),
cftime.DatetimeJulian(2012, 4, 12, 0, 0, 0, 0, 2, 103),
cftime.DatetimeJulian(2012, 4, 19, 0, 0, 0, 0, 2, 110),
cftime.DatetimeJulian(2012, 4, 21, 0, 0, 0, 0, 4, 112),
cftime.DatetimeJulian(2012, 4, 26, 0, 0, 0, 0, 2, 117),
cftime.DatetimeJulian(2012, 4, 28, 0, 0, 0, 0, 4, 119),
cftime.DatetimeJulian(2012, 5, 5, 0, 0, 0, 0, 4, 126),
cftime.DatetimeJulian(2012, 5, 7, 0, 0, 0, 0, 6, 128),
cftime.DatetimeJulian(2012, 5, 12, 0, 0, 0, 0, 4, 133),
cftime.DatetimeJulian(2012, 5, 21, 0, 0, 0, 0, 6, 142),
cftime.DatetimeJulian(2012, 5, 23, 0, 0, 0, 0, 1, 144),
cftime.DatetimeJulian(2012, 5, 28, 0, 0, 0, 0, 6, 149),
cftime.DatetimeJulian(2012, 6, 6, 0, 0, 0, 0, 1, 158),
cftime.DatetimeJulian(2012, 6, 13, 0, 0, 0, 0, 1, 165),
cftime.DatetimeJulian(2012, 6, 15, 0, 0, 0, 0, 3, 167),
cftime.DatetimeJulian(2012, 6, 22, 0, 0, 0, 0, 3, 174),
cftime.DatetimeJulian(2012, 6, 24, 0, 0, 0, 0, 5, 176),
cftime.DatetimeJulian(2012, 6, 29, 0, 0, 0, 0, 3, 181),
cftime.DatetimeJulian(2012, 7, 1, 0, 0, 0, 0, 5, 183),
cftime.DatetimeJulian(2012, 7, 8, 0, 0, 0, 0, 5, 190),
cftime.DatetimeJulian(2012, 7, 10, 0, 0, 0, 0, 0, 192),
cftime.DatetimeJulian(2012, 7, 15, 0, 0, 0, 0, 5, 197),
cftime.DatetimeJulian(2012, 7, 17, 0, 0, 0, 0, 0, 199),
cftime.DatetimeJulian(2012, 7, 24, 0, 0, 0, 0, 0, 206),
cftime.DatetimeJulian(2012, 7, 31, 0, 0, 0, 0, 0, 213),
cftime.DatetimeJulian(2012, 8, 2, 0, 0, 0, 0, 2, 215),
cftime.DatetimeJulian(2012, 8, 9, 0, 0, 0, 0, 2, 222),
cftime.DatetimeJulian(2012, 8, 11, 0, 0, 0, 0, 4, 224),
cftime.DatetimeJulian(2012, 8, 16, 0, 0, 0, 0, 2, 229),
cftime.DatetimeJulian(2012, 8, 18, 0, 0, 0, 0, 4, 231),
cftime.DatetimeJulian(2012, 8, 25, 0, 0, 0, 0, 4, 238),
cftime.DatetimeJulian(2012, 8, 27, 0, 0, 0, 0, 6, 240),
cftime.DatetimeJulian(2012, 9, 1, 0, 0, 0, 0, 4, 245),
cftime.DatetimeJulian(2012, 9, 10, 0, 0, 0, 0, 6, 254),
cftime.DatetimeJulian(2012, 9, 12, 0, 0, 0, 0, 1, 256),
cftime.DatetimeJulian(2012, 9, 17, 0, 0, 0, 0, 6, 261),
cftime.DatetimeJulian(2012, 9, 19, 0, 0, 0, 0, 1, 263),
cftime.DatetimeJulian(2012, 9, 26, 0, 0, 0, 0, 1, 270),
cftime.DatetimeJulian(2012, 10, 3, 0, 0, 0, 0, 1, 277),
cftime.DatetimeJulian(2012, 10, 5, 0, 0, 0, 0, 3, 279),
cftime.DatetimeJulian(2012, 10, 12, 0, 0, 0, 0, 3, 286),
cftime.DatetimeJulian(2012, 10, 14, 0, 0, 0, 0, 5, 288),
cftime.DatetimeJulian(2012, 10, 21, 0, 0, 0, 0, 5, 295),
cftime.DatetimeJulian(2012, 10, 30, 0, 0, 0, 0, 0, 304),
cftime.DatetimeJulian(2012, 11, 4, 0, 0, 0, 0, 5, 309),
cftime.DatetimeJulian(2012, 11, 6, 0, 0, 0, 0, 0, 311),
cftime.DatetimeJulian(2012, 11, 20, 0, 0, 0, 0, 0, 325),
cftime.DatetimeJulian(2012, 11, 22, 0, 0, 0, 0, 2, 327),
cftime.DatetimeJulian(2012, 11, 29, 0, 0, 0, 0, 2, 334),
cftime.DatetimeJulian(2012, 12, 6, 0, 0, 0, 0, 2, 341),
cftime.DatetimeJulian(2012, 12, 15, 0, 0, 0, 0, 4, 350),
cftime.DatetimeJulian(2012, 12, 22, 0, 0, 0, 0, 4, 357),
cftime.DatetimeJulian(2012, 12, 24, 0, 0, 0, 0, 6, 359),
cftime.DatetimeJulian(2012, 12, 31, 0, 0, 0, 0, 6, 366),
cftime.DatetimeJulian(2013, 1, 2, 0, 0, 0, 0, 1, 2),
cftime.DatetimeJulian(2013, 1, 7, 0, 0, 0, 0, 6, 7),
cftime.DatetimeJulian(2013, 1, 9, 0, 0, 0, 0, 1, 9),
cftime.DatetimeJulian(2013, 1, 18, 0, 0, 0, 0, 3, 18),
cftime.DatetimeJulian(2013, 1, 23, 0, 0, 0, 0, 1, 23),
cftime.DatetimeJulian(2013, 1, 25, 0, 0, 0, 0, 3, 25),
cftime.DatetimeJulian(2013, 2, 1, 0, 0, 0, 0, 3, 32),
cftime.DatetimeJulian(2013, 2, 3, 0, 0, 0, 0, 5, 34),
cftime.DatetimeJulian(2013, 2, 10, 0, 0, 0, 0, 5, 41),
cftime.DatetimeJulian(2013, 2, 17, 0, 0, 0, 0, 5, 48),
cftime.DatetimeJulian(2013, 2, 19, 0, 0, 0, 0, 0, 50),
cftime.DatetimeJulian(2013, 2, 24, 0, 0, 0, 0, 5, 55),
cftime.DatetimeJulian(2013, 2, 26, 0, 0, 0, 0, 0, 57),
cftime.DatetimeJulian(2013, 3, 5, 0, 0, 0, 0, 0, 64),
cftime.DatetimeJulian(2013, 3, 14, 0, 0, 0, 0, 2, 73),
cftime.DatetimeJulian(2013, 3, 23, 0, 0, 0, 0, 4, 82),
cftime.DatetimeJulian(2013, 3, 28, 0, 0, 0, 0, 2, 87),
cftime.DatetimeJulian(2013, 3, 30, 0, 0, 0, 0, 4, 89),
cftime.DatetimeJulian(2013, 4, 6, 0, 0, 0, 0, 4, 96),
cftime.DatetimeJulian(2013, 4, 8, 0, 0, 0, 0, 6, 98),
cftime.DatetimeJulian(2013, 4, 13, 0, 0, 0, 0, 4, 103),
cftime.DatetimeJulian(2013, 4, 15, 0, 0, 0, 0, 6, 105),
cftime.DatetimeJulian(2013, 4, 22, 0, 0, 0, 0, 6, 112),
cftime.DatetimeJulian(2013, 4, 24, 0, 0, 0, 0, 1, 114),
cftime.DatetimeJulian(2013, 5, 1, 0, 0, 0, 0, 1, 121),
cftime.DatetimeJulian(2013, 5, 8, 0, 0, 0, 0, 1, 128),
cftime.DatetimeJulian(2013, 5, 10, 0, 0, 0, 0, 3, 130),
cftime.DatetimeJulian(2013, 5, 15, 0, 0, 0, 0, 1, 135),
cftime.DatetimeJulian(2013, 5, 17, 0, 0, 0, 0, 3, 137),
cftime.DatetimeJulian(2013, 5, 26, 0, 0, 0, 0, 5, 146),
cftime.DatetimeJulian(2013, 5, 31, 0, 0, 0, 0, 3, 151)], dtype=object)
Coordinates:
* time (time) object 2012-01-05 00:00:00 ... 2013-05-31 00:00:00
Attributes:
standard_name: time
axis: T
The cftime.DatetimeJulian format of the time coordinate is a little problematic for some plotting libraries and analysis routines, so we will convert the time coordinate to the more usable datetime64 format.
import warnings
warnings.filterwarnings('ignore')
datetimeindex = ds.indexes['time'].to_datetimeindex() # Convert the CFTimeIndex to a DatetimeIndex (datetime64)
ds['time'] = datetimeindex # Set the converted index as the dataset's time coordinate
We will use the hvPlot and HoloViews packages to create an interactive time series plot of the Landsat 7 ARD data.
hv_ds = hv.Dataset(ds) # Convert the xarray Dataset to a holoviews dataset
Here we are using a Landsat 7 false color composite combination of:
- R = B4 (NIR)
- G = B5 (SWIR 1)
- B = B3 (RED)
# Use the .to() method to plot the holoviews dataset as an RGB, defining the x/y/z dimensions and data variables used
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
Above is an interactive, multidimensional (t, x, y) plot of our gridded data. Move the slider on the right to step through the different time slices.
# Set up a list to store time slices that contain non-fill value data by going through each time slice for a variable
goodTimes = []
for i in range(len(ds.time)):
    if np.nanmean(ds.SRB4[i,:,:]) > 0: # Only keep time slices where mean reflectance is greater than 0
        goodTimes.append(ds.SRB4[i,:,:].time.values) # Append time value for valid observation to list
goodTimes
[numpy.datetime64('2012-01-14T00:00:00.000000000'),
numpy.datetime64('2012-01-30T00:00:00.000000000'),
numpy.datetime64('2012-02-15T00:00:00.000000000'),
numpy.datetime64('2012-03-02T00:00:00.000000000'),
numpy.datetime64('2012-03-18T00:00:00.000000000'),
numpy.datetime64('2012-04-03T00:00:00.000000000'),
numpy.datetime64('2012-04-19T00:00:00.000000000'),
numpy.datetime64('2012-05-05T00:00:00.000000000'),
numpy.datetime64('2012-06-06T00:00:00.000000000'),
numpy.datetime64('2012-06-22T00:00:00.000000000'),
numpy.datetime64('2012-07-08T00:00:00.000000000'),
numpy.datetime64('2012-07-24T00:00:00.000000000'),
numpy.datetime64('2012-08-09T00:00:00.000000000'),
numpy.datetime64('2012-09-10T00:00:00.000000000'),
numpy.datetime64('2012-09-26T00:00:00.000000000'),
numpy.datetime64('2012-10-12T00:00:00.000000000'),
numpy.datetime64('2012-11-29T00:00:00.000000000'),
numpy.datetime64('2012-12-15T00:00:00.000000000'),
numpy.datetime64('2012-12-31T00:00:00.000000000'),
numpy.datetime64('2013-02-01T00:00:00.000000000'),
numpy.datetime64('2013-02-17T00:00:00.000000000'),
numpy.datetime64('2013-03-05T00:00:00.000000000'),
numpy.datetime64('2013-04-06T00:00:00.000000000'),
numpy.datetime64('2013-05-08T00:00:00.000000000')]
Use xarray's powerful indexing to select only the time slices contained in the goodTimes list.
ds = ds.sel(time=goodTimes)
hv_ds = hv.Dataset(ds)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
When available, AρρEEARS extracts and returns quality assurance (QA) data for each data file returned regardless of whether the user requests it. This is done to ensure that the user possesses the information needed to determine the usability and usefulness of the data they get from AρρEEARS. The Quality service from the AρρEEARS API can be leveraged to create masks that filter out undesirable data values.
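Before building the full decoding workflow below, you can preview what the Quality service returns by decoding a single PIXELQA value. The product/layer/value endpoint pattern here matches the qualityDecode function defined later; 66 is one of the values that occurs in our data:
# Decode one PIXELQA value to inspect the structure of the response
requests.get(f"{API}/quality/CU_LE07.001/PIXELQA/66").json()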
ds
<xarray.Dataset>
Dimensions: (time: 24, xdim: 974, ydim: 711)
Coordinates:
* time (time) datetime64[ns] 2012-01-14 2012-01-30 ... 2013-05-08
* ydim (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
* xdim (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
crs int8 ...
SRB3 (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
SRB4 (time, ydim, xdim) float32 ...
SRB5 (time, ydim, xdim) float32 ...
PIXELQA (time, ydim, xdim) float64 ...
Attributes:
title: CU_LE07.001 for aid0001
Conventions: CF-1.6
institution: Land Processes Distributed Active Archive Center (LP DAAC)
source: AppEEARS v2.30
references: See README.txt
history: See README.txt
Notice that the xarray Dataset contains a data variable called PIXELQA with the same dimensions and coordinates as the SRB# variables. We can use this quality array to create a mask of poor-quality data, and the Quality service to decode the quality assurance information. We'll use the following criteria to mask out poor-quality data:
- Cloud (Cloud) == No
- Cloud Shadow (CS) == No
We do not want to decode the same value multiple times, so let's first extract all of the unique data values from the PIXELQA xarray DataArray.
# Quality Filtering
quality_values = pd.DataFrame(np.unique(ds.PIXELQA.values), columns=['value']).dropna()
quality_values
 | value
---|---
0 | 66.0 |
1 | 68.0 |
2 | 72.0 |
3 | 80.0 |
4 | 96.0 |
5 | 112.0 |
6 | 130.0 |
7 | 132.0 |
8 | 136.0 |
9 | 144.0 |
10 | 160.0 |
11 | 176.0 |
12 | 224.0 |
The following function decodes the data values from the PIXELQA xarray DataArray using the Quality service.
def qualityDecode(qualityservice_url, product, qualitylayer, value):
    req = requests.get(f"{qualityservice_url}/{product}/{qualitylayer}/{value}")
    return req.json()
Now we will create an empty dataframe to store the decoded quality information for the masking criteria we identified above.
quality_desc = pd.DataFrame(columns=['value', 'Cloud_bits', 'Cloud_description', 'CS_bits', 'CS_description'])
The for loop below goes through all of the unique quality data values, decodes them using the quality service, and appends the quality descriptions to our empty dataframe.
for index, row in quality_values.iterrows():
    decode_int = qualityDecode(f'{API}/quality',
                               'CU_LE07.001',
                               'PIXELQA',
                               str(int(row['value'])))
    quality_info = decode_int
    df = pd.DataFrame({'value': int(row['value']),
                       'Cloud_bits': quality_info['Cloud']['bits'],
                       'Cloud_description': quality_info['Cloud']['description'],
                       'CS_bits': quality_info['Cloud Shadow']['bits'],
                       'CS_description': quality_info['Cloud Shadow']['description']}, index=[index])
    quality_desc = quality_desc.append(df)
quality_desc
 | value | Cloud_bits | Cloud_description | CS_bits | CS_description
---|---|---|---|---|---
0 | 66 | 0b0 | No | 0b0 | No |
1 | 68 | 0b0 | No | 0b0 | No |
2 | 72 | 0b0 | No | 0b1 | Yes |
3 | 80 | 0b0 | No | 0b0 | No |
4 | 96 | 0b1 | Yes | 0b0 | No |
5 | 112 | 0b1 | Yes | 0b0 | No |
6 | 130 | 0b0 | No | 0b0 | No |
7 | 132 | 0b0 | No | 0b0 | No |
8 | 136 | 0b0 | No | 0b1 | Yes |
9 | 144 | 0b0 | No | 0b0 | No |
10 | 160 | 0b1 | Yes | 0b0 | No |
11 | 176 | 0b1 | Yes | 0b0 | No |
12 | 224 | 0b1 | Yes | 0b0 | No |
Now we have a dataframe with all of the quality information we need to create a quality mask. Next, we'll identify the quality categories that we would like to keep.
# Only keep observations where cloud AND cloud shadow both = no (meaning, there are no clouds/shadows present)
mask_values = quality_desc[((quality_desc['Cloud_description'] == 'No') &
(quality_desc['CS_description'] == 'No'))]
mask_values
 | value | Cloud_bits | Cloud_description | CS_bits | CS_description
---|---|---|---|---|---
0 | 66 | 0b0 | No | 0b0 | No |
1 | 68 | 0b0 | No | 0b0 | No |
3 | 80 | 0b0 | No | 0b0 | No |
6 | 130 | 0b0 | No | 0b0 | No |
7 | 132 | 0b0 | No | 0b0 | No |
9 | 144 | 0b0 | No | 0b0 | No |
Let's apply the mask to our xarray dataset, keeping only the values that we have deemed acceptable.
dsMasked = ds.where(ds['PIXELQA'].isin(mask_values['value']))
dsMasked
<xarray.Dataset>
Dimensions: (time: 24, xdim: 974, ydim: 711)
Coordinates:
* time (time) datetime64[ns] 2012-01-14 2012-01-30 ... 2013-05-08
* ydim (ydim) float64 1.963e+06 1.963e+06 ... 1.942e+06 1.942e+06
* xdim (xdim) float64 1.755e+06 1.755e+06 ... 1.784e+06 1.784e+06
Data variables:
crs (time, ydim, xdim) float64 nan nan nan nan nan ... nan nan nan nan
SRB3 (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
SRB4 (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
SRB5 (time, ydim, xdim) float32 nan nan nan nan nan ... nan nan nan nan
PIXELQA (time, ydim, xdim) float64 nan nan nan nan nan ... nan nan nan nan
Attributes:
title: CU_LE07.001 for aid0001
Conventions: CF-1.6
institution: Land Processes Distributed Active Archive Center (LP DAAC)
source: AppEEARS v2.30
references: See README.txt
history: See README.txt
goodTimes = []
for i in range(len(dsMasked.time)):
    if np.nanmean(dsMasked.SRB4[i,:,:]) > 0:
        goodTimes.append(dsMasked.SRB4[i,:,:].time.values)
dsFinal = dsMasked.sel(time=goodTimes)
Using the same plotting functionality from above, let's see how our data looks when we mask out the undesirable pixels.
hv_ds = hv.Dataset(dsFinal)
timeSeries = hv_ds.to(hv.RGB, kdims=["xdim", "ydim"], dynamic=True, vdims=["SRB4","SRB5","SRB3"])
timeSeries.opts(width=600, height=600)
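Finally, when you are done with the API, you can invalidate your token by calling the logout service. This is a sketch based on the API documentation, where a 204 status code indicates a successful logout (tokens also expire on their own, as shown by the 'expiration' field earlier):
# Log out of the AppEEARS API to invalidate the token
logout_response = requests.post(f"{API}/logout", headers=head)
print(logout_response.status_code)  # 204 == success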