The Harmonized Landsat and Sentinel-2 (HLS) project produces seamless, harmonized surface reflectance data from the Operational Land Imager (OLI) and Multi-Spectral Instrument (MSI) aboard the Landsat-8 and Sentinel-2 Earth-observing satellites, respectively. The aim is to produce seamless products through a common set of processing steps: atmospheric correction, cloud and cloud-shadow masking, geographic co-registration and common gridding, bidirectional reflectance distribution function (BRDF) normalization, and spectral bandpass adjustment. The result is global observation of the Earth's surface every 2-3 days at 30-meter spatial resolution. One of the major applications that will benefit from HLS is agricultural assessment and monitoring, which serves as the use case for this tutorial.
NASA's Land Processes Distributed Active Archive Center (LP DAAC) archives and distributes HLS products in the LP DAAC Cumulus cloud archive as Cloud Optimized GeoTIFFs (COGs). This tutorial demonstrates how to query and subset HLS data using the NASA Common Metadata Repository (CMR) SpatioTemporal Asset Catalog (STAC) application programming interface (API). Because these data are stored as COGs, you will learn how to load subsets of individual files into memory for just the bands you are interested in--a paradigm shift from the more common workflow, where you would need to download a .zip/HDF file containing every band over the entire scene/tile. This tutorial also covers how to process HLS data (quality filtering and EVI calculation), visualize and "stack" the scenes over a region of interest into an xarray data array, calculate statistics for an EVI time series, and export the results as a comma-separated values (CSV) file--providing you with all of the information you need for your area of interest without having to download the source data files. The Enhanced Vegetation Index (EVI) is a vegetation index similar to NDVI that has been found to be more sensitive to ground cover below the vegetated canopy and to saturate less over areas of dense green vegetation.
This tutorial was developed around an example use case: crop monitoring over a single large farm field in northern California. The goal is to observe HLS-derived mean EVI over that farm field without downloading the entirety of the HLS source data.
This tutorial will show how to use the CMR-STAC API to:
- investigate the HLS collection available in the cloud,
- search for and subset to the specific time period, bands (layers), and region of interest for our use case,
- load subsets of the desired COGs into a Jupyter Notebook directly from the cloud,
- quality filter and calculate EVI,
- stack and visualize the time series, and
- export a CSV of statistics on the EVI of the single farm field.
It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, follow the instructions below to set up a Python environment on Linux, macOS, or Windows.
This Python Jupyter Notebook tutorial has been tested using Python version 3.7. Conda was used to create the Python environment.
Using your preferred command line interface (command prompt, terminal, cmder, etc.), type the following to create a compatible Python environment:
conda create -n hlstutorial -c conda-forge --yes python=3.7 gdal rasterio shapely geopandas geoviews holoviews xarray matplotlib cartopy scikit-image hvplot pyepsg
conda activate hlstutorial
jupyter notebook
If you do not have jupyter notebook installed, you may need to run:
conda install jupyter notebook
TIP: Having trouble activating your environment, or loading specific packages once it is activated? Try the following:
Type:
conda update conda
or
conda update --all
If you prefer not to install Conda, the same setup and dependencies can be achieved by using another package manager such as pip.
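If you go the pip route, the same dependencies can be listed in a requirements file (a sketch; the package names mirror the conda command above, and pinned versions will vary by platform):

```text
gdal
rasterio
shapely
geopandas
geoviews
holoviews
xarray
matplotlib
cartopy
scikit-image
hvplot
pyepsg
notebook
```

installed with pip install -r requirements.txt. Note that GDAL and rasterio can be difficult to build via pip on some platforms; the conda-forge binaries are usually the smoother path.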
You will also need a netrc file containing your NASA Earthdata Login credentials; see the Setting up a netrc File section in the README. The repository containing all of the required files is located at: https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-tutorial/browse
import os
from datetime import datetime
import requests as r
import numpy as np
import pandas as pd
import geopandas as gp
from skimage import io
import matplotlib.pyplot as plt
from osgeo import gdal
import rasterio as rio
from rasterio.mask import mask
from rasterio.enums import Resampling
import pyproj
from pyproj import Proj
from shapely.ops import transform
import xarray as xr
import geoviews as gv
from cartopy import crs
import hvplot.xarray
import holoviews as hv
gv.extension('bokeh', 'matplotlib')
# Set Up Working Environment
inDir = os.getcwd()
os.chdir(inDir)
STAC is a specification that provides a common language for interpreting geospatial information in order to standardize indexing and discovering data.
Four STAC Specifications:
- STAC API
- STAC Catalog
- STAC Collection
- STAC Item
In the section below, we will walk through an example of each. For additional information, check out: https://stacspec.org/.
stac = 'https://cmr.earthdata.nasa.gov/stac/' # CMR-STAC API Endpoint
stac_response = r.get(stac).json() # Call the STAC API endpoint
for s in stac_response: print(s)
id title stac_version description links
print(f"You are now using the {stac_response['id']} API (STAC Version: {stac_response['stac_version']}). {stac_response['description']}")
print(f"There are {len(stac_response['links'])} STAC catalogs available in CMR.")
You are now using the cmr-stac API (STAC Version: 1.0.0-beta.1). This is the landing page for CMR-STAC. Each provider link below contains a STAC endpoint. There are 42 STAC catalogs available in CMR.
For this tutorial, we will be using the LP DAAC cloud catalog: LPCLOUD.
stac_lp = [s for s in stac_response['links'] if 'LP' in s['title']] # Search for only LP-specific catalogs
# LPCLOUD is the STAC catalog we will be using and exploring today
lp_cloud = r.get([s for s in stac_lp if s['title'] == 'LPCLOUD'][0]['href']).json()
for l in lp_cloud: print(f"{l}: {lp_cloud[l]}")
id: LPCLOUD title: LPCLOUD description: Root catalog for LPCLOUD stac_version: 1.0.0-beta.1 links: [{'rel': 'self', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD', 'title': 'Root endpoint for this provider', 'type': 'application/json'}, {'rel': 'root', 'href': 'https://cmr.earthdata.nasa.gov/stac/', 'title': 'CMR-STAC Root', 'type': 'application/json'}, {'rel': 'collections', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections', 'title': 'Collections for this provider', 'type': 'application/json'}, {'rel': 'search', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search', 'title': 'STAC Search endpoint for this provider', 'type': 'application/json'}]
lp_links = lp_cloud['links']
for l in lp_links: print(f"{l['href']} is the {l['title']}")
https://cmr.earthdata.nasa.gov/stac/LPCLOUD is the Root endpoint for this provider https://cmr.earthdata.nasa.gov/stac/ is the CMR-STAC Root https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections is the Collections for this provider https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search is the STAC Search endpoint for this provider
lp_collections = [l['href'] for l in lp_links if l['rel'] == 'collections'][0] # Set collections endpoint to variable
collections_response = r.get(f"{lp_collections}").json() # Call collections endpoint
print(f"This collection contains {collections_response['description']} ({len(collections_response['collections'])} available)")
This collection contains All collections provided by LPCLOUD (1 available)
collections = collections_response['collections']
collections[0]
{'id': 'C1711924822-LPCLOUD', 'short_name': 'HLSS30', 'stac_version': '1.0.0-beta.1', 'license': 'not-provided', 'title': 'HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30 m V1.5', 'description': 'PROVISIONAL - The Harmonized Landsat and Sentinel-2 (HLS) data have not been validated for their science quality and should not be used in science research or applications. The HLS project provides consistent surface reflectance data from the Operational Land Imager (OLI) aboard the joint NASA/USGS Landsat 8 satellite and the Multi-Spectral Instrument (MSI) aboard the European Union’s Copernicus Sentinel-2A and Sentinel-2B satellites. The combined measurement enables global observations of the land every 2-3 days at 30 meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, illumination and view angle normalization, and spectral bandpass adjustment. \r\n\r\nThe HLSS30 product provides 30 m Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) and is derived from Sentinel-2A and Sentinel-2B MSI data products. The HLSS30 and HLSL30 products are gridded to the same resolution and MGRS tiling system, and thus are “stackable” for time series analysis.\r\nThe HLSS30 product is provided in Cloud Optimized GeoTIFF (COG) format, and each band is distributed as a separate COG. There are 13 bands included in the HLSS30 product along with four angle bands and a quality assessment (QA) band. 
For a more detailed description of the individual bands provided in the HLSS30 product, please see the User Guide (https://lpdaac.usgs.gov/documents/770/HLS_User_Guide_V15_provisional.pdf).', 'links': [{'rel': 'self', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD', 'title': 'Info about this collection', 'type': 'application/json'}, {'rel': 'provider', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD', 'title': 'Root for this provider', 'type': 'application/json'}, {'rel': 'stac', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?collections=C1711924822-LPCLOUD', 'title': 'STAC Search this collection', 'type': 'application/json'}, {'rel': 'cmr', 'href': 'https://cmr.earthdata.nasa.gov/search/granules.json?collection_concept_id=C1711924822-LPCLOUD', 'title': 'CMR Search this collection', 'type': 'application/json'}, {'rel': 'items', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items', 'title': 'Granules in this collection', 'type': 'application/json'}, {'rel': 'overview', 'href': 'https://cmr.earthdata.nasa.gov/search/concepts/C1711924822-LPCLOUD.html', 'title': 'HTML metadata for collection', 'type': 'text/html'}, {'rel': 'metadata', 'href': 'https://cmr.earthdata.nasa.gov/search/concepts/C1711924822-LPCLOUD.xml', 'title': 'Native metadata for collection', 'type': 'application/xml'}, {'rel': 'metadata', 'href': 'https://cmr.earthdata.nasa.gov/search/concepts/C1711924822-LPCLOUD.umm_json', 'title': 'JSON metadata for collection', 'type': 'application/json'}], 'extent': {'crs': 'http://www.opengis.net/def/crs/OGC/1.3/CRS84', 'spatial': {'bbox': [[-180, -90, 180, 90]]}, 'trs': 'http://www.opengis.net/def/uom/ISO-8601/0/Gregorian', 'temporal': {'interval': [['2014-04-03T00:00:00.000Z', None]]}}}
The Concept ID is used to query by a specific product, so be sure to save the Concept ID for the HLS S30 V1.5 product below:
# Search available collections for HLS and print them out
hls_collections = [c for c in collections if 'HLS' in c['title']]
for h in hls_collections: print(f"{h['title']} ({h['short_name']}) has a Concept ID of: {h['id']}")
HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30 m V1.5 (HLSS30) has a Concept ID of: C1711924822-LPCLOUD
s30 = [h for h in hls_collections if h['short_name'] == 'HLSS30'][0] # Grab HLSS30 collection
for s in s30['extent']: print(f"{s}: {s30['extent'][s]}") # Check out the extent of this collection
crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84 spatial: {'bbox': [[-180, -90, 180, 90]]} trs: http://www.opengis.net/def/uom/ISO-8601/0/Gregorian temporal: {'interval': [['2014-04-03T00:00:00.000Z', None]]}
print(f"HLS S30 Start Date is: {s30['extent']['temporal']['interval'][0][0]}")
s30_id = s30['id']
HLS S30 Start Date is: 2014-04-03T00:00:00.000Z
# Below, go through all links in the collection and return the link containing the items endpoint
s30_items = [s['href'] for s in s30['links'] if s['rel'] == 'items'][0] # Set items endpoint to variable
s30_items
'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items'
s30_items_response = r.get(f"{s30_items}").json() # Call items endpoint
s30_item = s30_items_response['features'][0] # select first item (10 items returned by default)
s30_item
{'type': 'Feature', 'id': 'G1953488623-LPCLOUD', 'stac_version': '1.0.0-beta.1', 'stac_extensions': ['eo'], 'collection': 'C1711924822-LPCLOUD', 'geometry': {'type': 'Polygon', 'coordinates': [[[-148.7983336, -18.1685097], [-148.7884973, -17.1765228], [-149.0134917, -17.1744029], [-149.2552935, -18.1636899], [-148.7983336, -18.1685097]]]}, 'bbox': [-149.255293, -18.16851, -148.788497, -17.174403], 'links': [{'rel': 'self', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items/G1953488623-LPCLOUD'}, {'rel': 'parent', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items'}, {'rel': 'collection', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD'}, {'rel': 'root', 'href': 'https://cmr.earthdata.nasa.gov/stac/'}, {'rel': 'provider', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD'}], 'properties': {'datetime': '2020-02-14T20:10:01.402Z', 'start_datetime': '2020-02-14T20:10:01.402Z', 'end_datetime': '2020-02-14T20:10:01.402Z', 'eo:cloud_cover': 6}, 'assets': {'B8A': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B8A.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B8A.tif'}, 'B02': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B02.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B02.tif'}, 'SZA': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.SZA.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.SZA.tif'}, 'B08': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B08.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B08.tif'}, 'B11': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B11.tif', 'href': 
'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B11.tif'}, 'B09': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B09.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B09.tif'}, 'B07': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B07.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B07.tif'}, 'VZA': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.VZA.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.VZA.tif'}, 'B03': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B03.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B03.tif'}, 'SAA': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.SAA.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.SAA.tif'}, 'VAA': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.VAA.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.VAA.tif'}, 'Fmask': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.Fmask.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.Fmask.tif'}, 'B05': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B05.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B05.tif'}, 'B10': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B10.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B10.tif'}, 'B04': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B04.tif', 'href': 
'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B04.tif'}, 'B12': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B12.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B12.tif'}, 'B06': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B06.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B06.tif'}, 'B01': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.B01.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.B01.tif'}, 'browse': {'name': 'Download HLS.S30.T06KTF.2020045T200849.v1.5.jpg', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-public/HLSS30.015/HLS.S30.T06KTF.2020045T200849.v1.5.jpg', 'type': 'image/jpeg'}, 'metadata': {'href': 'https://cmr.earthdata.nasa.gov/search/concepts/G1953488623-LPCLOUD.xml', 'type': 'application/xml'}}}
# Print metadata attributes from this observation
print(f"The ID for this item is: {s30_item['id']}")
print(f"It was acquired on: {s30_item['properties']['datetime']}")
print(f"over: {s30_item['bbox']} (Lower Left, Upper Right corner coordinates)")
print(f"It contains {len(s30_item['assets'])} assets")
print(f"and is {s30_item['properties']['eo:cloud_cover']}% cloudy.")
The ID for this item is: G1953488623-LPCLOUD It was acquired on: 2020-02-14T20:10:01.402Z over: [-149.255293, -18.16851, -148.788497, -17.174403] (Lower Left, Upper Right corner coordinates) It contains 20 assets and is 6% cloudy.
for i, s in enumerate(s30_items_response['features']):
print(f"Item at index {i} is {s['properties']['eo:cloud_cover']}% cloudy.")
Item at index 0 is 6% cloudy. Item at index 1 is 100% cloudy. Item at index 2 is 3% cloudy. Item at index 3 is 1% cloudy. Item at index 4 is 43% cloudy. Item at index 5 is 35% cloudy. Item at index 6 is 9% cloudy. Item at index 7 is 87% cloudy. Item at index 8 is 52% cloudy. Item at index 9 is 18% cloudy.
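Rather than reading the printout above by eye, the least-cloudy item can also be selected programmatically with min() over the returned features (a sketch; the stand-in list below mimics the 'eo:cloud_cover' values printed above):

```python
# Stand-in features list mimicking the structure of s30_items_response['features']
features = [
    {'properties': {'eo:cloud_cover': 6}},
    {'properties': {'eo:cloud_cover': 100}},
    {'properties': {'eo:cloud_cover': 3}},
    {'properties': {'eo:cloud_cover': 1}},
]

# Index of the observation with the lowest cloud cover
item_index = min(range(len(features)),
                 key=lambda i: features[i]['properties']['eo:cloud_cover'])
print(item_index)  # → 3
```

On the real response you would pass s30_items_response['features'] in place of the stand-in list.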
Set item_index below to whichever observation is the least cloudy above.
item_index = 3 # Indexing starts at 0 in Python, so here select the fourth item in the list at index 3
s30_item = s30_items_response['features'][item_index] # Grab the next item in the list
print(f"The ID for this item is: {s30_item['id']}")
print(f"It was acquired on: {s30_item['properties']['datetime']}")
print(f"over: {s30_item['bbox']} (Lower Left, Upper Right corner coordinates)")
print(f"It contains {len(s30_item['assets'])} assets")
print(f"and is {s30_item['properties']['eo:cloud_cover']}% cloudy.")
The ID for this item is: G1947759912-LPCLOUD It was acquired on: 2020-04-25T20:00:36.934Z over: [-69.00128, 80.920299, -62.006726, 81.956919] (Lower Left, Upper Right corner coordinates) It contains 16 assets and is 1% cloudy.
print("The following assets are available for download:")
for a in s30_item['assets']: print(a)
The following assets are available for download: B09 B06 B8A B11 B08 B02 B05 B01 Fmask B07 B10 B12 B03 B04 browse metadata
s30_item['assets']['browse']
{'name': 'Download HLS.S30.T19XEL.2020116T195901.v1.5.jpg', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-public/HLSS30.015/HLS.S30.T19XEL.2020116T195901.v1.5.jpg', 'type': 'image/jpeg'}
Below, use the skimage package to load the browse image into memory and matplotlib to quickly visualize it.
image = io.imread(s30_item['assets']['browse']['href']) # Load jpg browse image into memory
# Basic plot of the image
plt.figure(figsize=(10,10))
plt.imshow(image)
plt.show()
# Remove unnecessary variables
del image, s30_items, s30_items_response, stac_lp, stac_response
del a, collections, collections_response, h, hls_collections, l, lp_cloud, lp_collections, s, s30, s30_id, s30_item
lp_search = [l['href'] for l in lp_links if l['rel'] == 'search'][0] # Define the search endpoint
search_response = r.get(f"{lp_search}").json() # Send GET request to retrieve items
print(f"{len(search_response['features'])} items found!")
10 items found!
lim = 100
search_query = f"{lp_search}?&limit={lim}" # Add in a limit parameter to retrieve 100 items at a time.
search_response = r.get(search_query).json() # send GET request to retrieve first 100 items in the STAC collection
print(f"{len(search_response['features'])} items found!")
100 items found!
Next, read the farm field region of interest into geopandas. You will need to have downloaded the Field_Boundary.geojson from the repo, and it must be stored in the current working directory in order to continue.
# Bring in the farm field region of interest
field = gp.read_file('Field_Boundary.geojson')
field
| | geometry |
|---|---|
| 0 | POLYGON ((-122.05172 39.91309, -122.06227 39.9... |
If an error is thrown, update the file path to the location where the file is stored on your OS, e.g.:
field = gp.read_file('C:/Username/HLS-Tutorial/Field_Boundary.geojson')
and try again.
fieldShape = field['geometry'][0] # Define the geometry as a shapely polygon
fieldShape
Plot the farm field boundary using geoviews (combining plots using *) with a basemap layer.
# Use geoviews to combine a basemap with the shapely polygon of our Region of Interest (ROI)
base = gv.tile_sources.EsriImagery.opts(width=650, height=500)
farmField = gv.Polygons(fieldShape).opts(line_color='yellow', color=None)
base * farmField
bbox = f'{fieldShape.bounds[0]},{fieldShape.bounds[1]},{fieldShape.bounds[2]},{fieldShape.bounds[3]}' # Defined from ROI bounds
search_query2 = f"{search_query}&bounding_box={bbox}" # Add bbox to query
search_response = r.get(search_query2).json() # Send request
print(f"{len(search_response['features'])} items found!")
20 items found!
Next, filter by time of interest using the datetime parameter. Here we have set the time period of interest to all of 2020. Additional information on setting temporal searches can be found in the NASA CMR Documentation.
date_time = "2020-01-01T00:00:00Z/2020-12-31T23:31:12Z" # Define start time period / end time period
search_query3 = f"{search_query2}&datetime={date_time}" # Add to query that already includes bounding_box
search_response = r.get(search_query3).json()
print(f"{len(search_response['features'])} items found!")
20 items found!
hls_items = search_response['features'] # Start a list of items for our use case
# This section is skipped for now, see note above.
# Next, narrow it down to our product of interest:
#collection = s30_item['collection']
#search_query4 = f"{search_query3}&concept_id={collection}"
#search_response = r.get(search_query4).json()#['links']
# Now add the L30 items of interest:
#for c in collections: print(f"{c['title']}: {c['id']}")
#collection = 'C1711972753-LPCLOUD'
#search_query4 = f"{search_query3}&concept_id={collection}"
#l30_items = r.get(search_query4).json()['features']
#hls_items = items + l30_items
del bbox, date_time, field, lim, lp_links, lp_search, search_query, search_query2, search_query3, search_response # Remove
# GDAL configurations used to successfully access LP DAAC Cloud Assets via vsicurl
gdal.SetConfigOption("GDAL_HTTP_UNSAFESSL", "YES")
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','YES')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF')
h = hls_items[0]
h
{'type': 'Feature', 'id': 'G1947763353-LPCLOUD', 'stac_version': '1.0.0-beta.1', 'stac_extensions': ['eo'], 'collection': 'C1711924822-LPCLOUD', 'geometry': {'type': 'Polygon', 'coordinates': [[[-121.7203579, 39.6545573], [-121.7016593, 40.6435572], [-122.678039, 40.6504077], [-122.9750523, 39.6616043], [-121.7203579, 39.6545573]]]}, 'bbox': [-122.975052, 39.654557, -121.701659, 40.650408], 'links': [{'rel': 'self', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items/G1947763353-LPCLOUD'}, {'rel': 'parent', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items'}, {'rel': 'collection', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD'}, {'rel': 'root', 'href': 'https://cmr.earthdata.nasa.gov/stac/'}, {'rel': 'provider', 'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD'}], 'properties': {'datetime': '2020-07-08T19:03:27.893Z', 'start_datetime': '2020-07-08T19:03:27.893Z', 'end_datetime': '2020-07-08T19:03:27.893Z', 'eo:cloud_cover': 2}, 'assets': {'B08': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B08.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B08.tif'}, 'B07': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B07.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B07.tif'}, 'B05': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B05.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B05.tif'}, 'B12': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B12.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B12.tif'}, 'B04': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B04.tif', 'href': 
'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B04.tif'}, 'B8A': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B8A.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B8A.tif'}, 'B10': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B10.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B10.tif'}, 'B06': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B06.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B06.tif'}, 'B09': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B09.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B09.tif'}, 'B03': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B03.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B03.tif'}, 'B11': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B11.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B11.tif'}, 'B01': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B01.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B01.tif'}, 'Fmask': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.Fmask.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.Fmask.tif'}, 'B02': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.B02.tif', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B02.tif'}, 'browse': {'name': 'Download HLS.S30.T10TEK.2020190T184919.v1.5.jpg', 'href': 'https://lpdaac.earthdata.nasa.gov/lp-prod-public/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.jpg', 
'type': 'image/jpeg'}, 'metadata': {'href': 'https://cmr.earthdata.nasa.gov/search/concepts/G1947763353-LPCLOUD.xml', 'type': 'application/xml'}}}
- "narrow" NIR = B8A
- Red = B04
- Blue = B02
- Quality = Fmask
evi_band_links = []
# Define which HLS product is being accessed
if h['assets']['browse']['href'].split('/')[4] == 'HLSS30.015':
evi_bands = ['B8A', 'B04', 'B02', 'Fmask'] # NIR RED BLUE Quality for S30
else:
evi_bands = ['B05', 'B04', 'B02', 'Fmask'] # NIR RED BLUE Quality for L30 (not yet released)
# Subset the assets in the item down to only the desired bands
for a in h['assets']:
if any(b == a for b in evi_bands):
evi_band_links.append(h['assets'][a]['href'])
for e in evi_band_links: print(e)
https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B04.tif https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B8A.tif https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.Fmask.tif https://lpdaac.earthdata.nasa.gov/lp-prod-protected/HLSS30.015/HLS.S30.T10TEK.2020190T184919.v1.5.B02.tif
image = io.imread(h['assets']['browse']['href']) # Load jpg browse image into memory
# Basic plot of the image
plt.figure(figsize=(10,10))
plt.imshow(image)
plt.show()
del image # Remove the browse image
Before loading the COGs into memory, run the cell below to check and make sure that you have a netrc file set up with your NASA Earthdata Login credentials, which will be needed to access the HLS files in the cells that follow. If you do not have a netrc file set up on your OS, the cell below should prompt you for your NASA Earthdata Login username and password.
# AUTHENTICATION CONFIGURATION
from netrc import netrc
from subprocess import Popen
from getpass import getpass
urs = 'urs.earthdata.nasa.gov' # Earthdata URL to call for authentication
prompts = ['Enter NASA Earthdata Login Username \n(or create an account at urs.earthdata.nasa.gov): ',
'Enter NASA Earthdata Login Password: ']
# Determine if netrc file exists, and if so, if it includes NASA Earthdata Login Credentials
try:
netrcDir = os.path.expanduser("~/.netrc")
netrc(netrcDir).authenticators(urs)[0]
del netrcDir
# Below, create a netrc file and prompt user for NASA Earthdata Login Username and Password
except FileNotFoundError:
homeDir = os.path.expanduser("~")
Popen('touch {0}.netrc | chmod og-rw {0}.netrc | echo machine {1} >> {0}.netrc'.format(homeDir + os.sep, urs), shell=True)
Popen('echo login {} >> {}.netrc'.format(getpass(prompt=prompts[0]), homeDir + os.sep), shell=True)
Popen('echo password {} >> {}.netrc'.format(getpass(prompt=prompts[1]), homeDir + os.sep), shell=True)
del homeDir
# Determine OS and edit netrc file if it exists but is not set up for NASA Earthdata Login
except TypeError:
homeDir = os.path.expanduser("~")
Popen('echo machine {1} >> {0}.netrc'.format(homeDir + os.sep, urs), shell=True)
Popen('echo login {} >> {}.netrc'.format(getpass(prompt=prompts[0]), homeDir + os.sep), shell=True)
Popen('echo password {} >> {}.netrc'.format(getpass(prompt=prompts[1]), homeDir + os.sep), shell=True)
del homeDir
del urs, prompts
Load the desired bands directly from the cloud using rasterio.
# Use vsicurl to load the data directly into memory (be patient, may take a few seconds)
for e in evi_band_links:
if e.rsplit('.', 2)[-2] == evi_bands[0]: # NIR index
nir = rio.open(e)
elif e.rsplit('.', 2)[-2] == evi_bands[1]: # red index
red = rio.open(e)
elif e.rsplit('.', 2)[-2] == evi_bands[2]: # blue index
blue = rio.open(e)
elif e.rsplit('.', 2)[-2] == evi_bands[3]: # Fmask index
fmask = rio.open(e)
print("The COGs have been loaded into memory!")
The COGs have been loaded into memory!
NOTE: If you run into issues accessing the files above, verify your credentials in the Setting up a netrc File section in the README.
Next, take the shapely polygon and convert it from lat/lon (EPSG: 4326) into the native projection of HLS, UTM (aligned to the Military Grid Reference System). This must be done in order to use the Region of Interest (ROI) to subset the COG that is being pulled into memory--it must be in the native projection of the data being extracted.
geo_CRS = Proj('+proj=longlat +datum=WGS84 +no_defs', preserve_units=True) # Source coordinate system of the ROI
utm = pyproj.Proj(nir.crs) # Destination coordinate system
project = pyproj.Transformer.from_proj(geo_CRS, utm) # Set up the transformation
fsUTM = transform(project.transform, fieldShape) # Apply reprojection
Now, subset the bands to the ROI using rasterio. This greatly reduces the amount of data that needs to be loaded into memory.
nir_array, nir_transform = rio.mask.mask(nir, [fsUTM], crop=True) # Extract the data for the ROI and clip to that bbox
plt.imshow(nir_array[0]); # Quick visual to confirm that it worked
red_array, _ = rio.mask.mask(red,[fsUTM],crop=True)
blue_array, _ = rio.mask.mask(blue,[fsUTM],crop=True)
print('Data is loaded into memory!')
Data is loaded into memory!
del a, e, evi_band_links, evi_bands # Remove variables that are no longer needed
nir.meta
{'driver': 'GTiff', 'dtype': 'int16', 'nodata': -9999.0, 'width': 3660, 'height': 3660, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(30.0, 0.0, 499980.0, 0.0, -30.0, 4500000.0)}
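The transform in the metadata above is an affine mapping from array indices to map coordinates. As a quick illustrative sketch (reusing the six parameters shown above), a column/row pair converts to UTM coordinates like this:

```python
# Affine parameters from the metadata above: Affine(a, b, c, d, e, f)
# x = a*col + b*row + c ;  y = d*col + e*row + f
a, b, c = 30.0, 0.0, 499980.0
d, e, f = 0.0, -30.0, 4500000.0

def pixel_to_map(col, row):
    """Upper-left corner of pixel (col, row) in map (UTM) coordinates."""
    return (a * col + b * row + c, d * col + e * row + f)

origin = pixel_to_map(0, 0)  # upper-left corner of the tile
one_in = pixel_to_map(1, 1)  # one 30 m pixel right and one down
```

Note the negative e term: row indices increase downward while northing decreases, which is why the y pixel size is -30.0.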
Apply the scale factor from the file metadata to each band, and set the nodata values to nan.

# Grab scale factor from metadata and apply to each band
nir_scaled = nir_array[0] * nir.scales[0]
red_scaled = red_array[0] * red.scales[0]
blue_scaled = blue_array[0] * blue.scales[0]
# Set all nodata values to nan
nir_scaled[nir_array[0]==nir.nodata] = np.nan
red_scaled[red_array[0]==red.nodata] = np.nan
blue_scaled[blue_array[0]==blue.nodata] = np.nan
def evi(red, blue, nir):
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
evi_scaled = evi(red_scaled, blue_scaled, nir_scaled) # Generate EVI array
eviDate = h['properties']['datetime'].split('T')[0] # Set the observation date to a variable
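As a quick sanity check on the formula, the evi function above can be evaluated on a single set of scaled reflectance values (the numbers here are made up for illustration, not taken from the scene):

```python
def evi(red, blue, nir):
    # EVI = 2.5 * (NIR - Red) / (NIR + 6*Red - 7.5*Blue + 1)
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical surface reflectance for a moderately vegetated pixel
sample = evi(red=0.08, blue=0.05, nir=0.45)
```

A healthy vegetated pixel should land well inside the 0-1 range used for the visualizations below.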
Plot the EVI data using matplotlib.

# Ignore matplotlib warnings
import warnings
warnings.filterwarnings('ignore')
fig = plt.figure(figsize = (10,7.5)) # Set the figure size (x,y)
plt.axis('off') # Remove the axes' values
fig.suptitle('HLS S30-derived EVI Data', fontsize=14, fontweight='bold') # Make a figure title
ax = fig.add_subplot(111) # Make a subplot
ax.set_title(f'Tehama County, CA: {eviDate}', fontsize=12, fontweight='bold') # Add figure subtitle
ax1 = plt.gca() # Get current axes
# Plot the array, using a colormap and setting a custom linear stretch
im = plt.imshow(evi_scaled, vmin=0, vmax=1, cmap='YlGn');
# Add a colormap legend
plt.colorbar(im, orientation='horizontal', fraction=0.047, pad=0.009, label='EVI', shrink=0.6).outline.set_visible(True)
fmask_array, _ = rio.mask.mask(fmask, [fsUTM], crop=True) # Load in the Quality data
bitword_order = (1, 1, 1, 1, 1, 1, 2) # set the number of bits per bitword
num_bitwords = len(bitword_order) # Define the number of bitwords based on your input above
total_bits = sum(bitword_order) # Should be 8, 16, or 32 depending on datatype
- Cloud == 0 (No Cloud)
- Cloud shadow == 0 (No Cloud shadow)
- Snow/ice == 0 (No Snow/ice present)
- Water == 0 (No Water present)
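The string-based decoding below walks through every bit word explicitly, which is useful for seeing exactly what each Fmask value contains. The same good-quality test can also be expressed with a single bitwise mask. A sketch, assuming the layout defined by bitword_order above, where Bit Words 2, 4, 5, and 6 correspond to bits 1, 3, 4, and 5 counted from the least-significant bit:

```python
# Bits 1 (cloud), 3 (cloud shadow), 4 (snow/ice), and 5 (water) must all be 0
QA_MASK = 0b00111010

def is_good_quality(v):
    """True if none of the cloud/shadow/snow/water bits are set."""
    return v & QA_MASK == 0

clear = is_good_quality(0)     # all bits off -> good quality
cloudy = is_good_quality(255)  # every bit on -> filtered out
```

This gives the same goodQuality membership as the loop below, without the intermediate binary strings.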
qVals = list(np.unique(fmask_array)) # Create a list of unique values that need to be converted to binary and decoded
all_bits = list()
goodQuality = []
for v in qVals:
    all_bits = []
    bits = total_bits
    i = 0
    # Convert to binary based on the values and # of bits defined above:
    bit_val = format(v, 'b').zfill(bits)
    print('\n' + str(v) + ' = ' + str(bit_val))
    all_bits.append(str(v) + ' = ' + str(bit_val))
    # Go through & split out the values for each bit word based on input above:
    for b in bitword_order:
        prev_bit = bits
        bits = bits - b
        i = i + 1
        if i == 1:
            bitword = bit_val[bits:]
        elif i == num_bitwords:
            bitword = bit_val[:prev_bit]
        else:
            bitword = bit_val[bits:prev_bit]
        print(' Bit Word ' + str(i) + ': ' + str(bitword))
        all_bits.append(' Bit Word ' + str(i) + ': ' + str(bitword))
    # Bit Words 2, 4, 5, 6 are used. All four should equal 0 if no clouds or cloud
    # shadows are present and the pixel is not snow/ice/water
    if int(all_bits[2].split(': ')[-1]) + int(all_bits[4].split(': ')[-1]) + \
       int(all_bits[5].split(': ')[-1]) + int(all_bits[6].split(': ')[-1]) == 0:
        goodQuality.append(v)
0 = 00000000
 Bit Word 1: 0
 Bit Word 2: 0
 Bit Word 3: 0
 Bit Word 4: 0
 Bit Word 5: 0
 Bit Word 6: 0
 Bit Word 7: 00

255 = 11111111
 Bit Word 1: 1
 Bit Word 2: 1
 Bit Word 3: 1
 Bit Word 4: 1
 Bit Word 5: 1
 Bit Word 6: 1
 Bit Word 7: 11
goodQuality
[0]
evi_band = np.ma.MaskedArray(evi_scaled, np.in1d(fmask_array, goodQuality, invert=True)) # Apply QA mask to the EVI data
evi_band = np.ma.filled(evi_band, np.nan) # Set masked data to nan
originalName = nir.name.rsplit('/', 1)[-1] # Grab the original HLS S30 granule name
originalName
'HLS.S30.T10TEK.2020190T184919.v1.5.B8A.tif'
HLS.S30: Product Short Name
T10TEK: MGRS Tile ID (T+5-digits)
2020190T184919: Julian Date and Time of Acquisition (YYYYDDDTHHMMSS)
v1.5: Product Version
B8A: Spectral Band
.tif: Data Format (Cloud Optimized GeoTIFF)

For additional information on HLS naming conventions, be sure to check out the HLS Overview Page.
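Because the components are dot-delimited, the granule name can be parsed with standard-library tools, using the same '%Y%jT%H%M%S' pattern that the time-series stacking step below relies on. A small sketch with the filename shown above:

```python
from datetime import datetime

name = 'HLS.S30.T10TEK.2020190T184919.v1.5.B8A.tif'
parts = name.split('.')

product = '.'.join(parts[0:2])                          # 'HLS.S30'
tile = parts[2]                                         # MGRS Tile ID
acquired = datetime.strptime(parts[3], '%Y%jT%H%M%S')   # Julian date/time of acquisition
version = '.'.join(parts[4:6])                          # 'v1.5' (splits across two pieces)
band = parts[6]                                         # spectral band
```

Note that day-of-year 190 in 2020 (a leap year) resolves to July 8, matching the acquisition dates seen later in the time series.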
outName = f"{originalName.split('.B')[0]}_EVI.tif" # Generate output name from the original filename
outName
'HLS.S30.T10TEK.2020190T184919.v1.5_EVI.tif'
# Create output GeoTIFF with overviews
evi_tif = rio.open(outName, 'w', driver='GTiff', height=evi_band.shape[0], width=evi_band.shape[1], count=1,
                   dtype=str(evi_band.dtype), crs=nir.crs, transform=nir_transform)
evi_tif.write(evi_band, 1) # Write the EVI band to the newly created GeoTIFF
evi_tif.build_overviews([2, 4, 8], Resampling.average) # Calculate overviews
evi_tif.update_tags(ns='rio_overview', resampling='average') # Update tags
evi_tif.overviews(1) # Print overviews
[2, 4, 8]
evi_tif.close() # Close file
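Conceptually, a level-2 overview with average resampling stores a half-resolution copy of the raster in which each output pixel is roughly the mean of a 2x2 block of input pixels. A NumPy sketch of that block-mean idea (illustrative only; not rasterio/GDAL's exact resampling kernel):

```python
import numpy as np

a = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a full-resolution raster

# Group the array into 2x2 blocks and average each block -> the level-2 overview grid
ov2 = a.reshape(2, 2, 2, 2).mean(axis=(1, 3))
```

Levels 4 and 8 repeat the same reduction, which is what lets COG readers fetch a coarse preview without touching the full-resolution pixels.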
# Open the newly created GeoTIFF, add tiling and compression, and export as a COG
with rio.open(outName) as src:
    kwds = src.profile
    kwds['tiled'] = True
    kwds['compress'] = 'LZW'
    with rio.open(outName, 'w', **kwds) as dst:
        dst.write(evi_band, 1)
nir.close(), red.close(), fmask.close(), blue.close()
del nir_array, red_array, blue_array, red_scaled, blue_scaled, nir_scaled, i, originalName, outName, prev_bit, qVals, v
del all_bits, b, bit_val, bits, bitword, eviDate, evi_band, evi_scaled, fmask_array, goodQuality, h
len(hls_items)
20
# Now put it all together and loop through each of the files, visualize, calculate statistics on EVI, and export
for j, h in enumerate(hls_items):
    evi_band_links = []
    if h['assets']['browse']['href'].split('/')[4] == 'HLSS30.015':
        evi_bands = ['B8A', 'B04', 'B02', 'Fmask']  # NIR, Red, Blue, Fmask
    else:
        evi_bands = ['B05', 'B04', 'B02', 'Fmask']  # NIR, Red, Blue, Fmask
    for a in h['assets']:
        if any(b == a for b in evi_bands):
            evi_band_links.append(h['assets'][a]['href'])

    # Use vsicurl to load the data directly into memory (be patient, may take a few seconds)
    for e in evi_band_links:
        if e.rsplit('.', 2)[-2] == evi_bands[0]:    # NIR index
            nir = rio.open(e)
        elif e.rsplit('.', 2)[-2] == evi_bands[1]:  # Red index
            red = rio.open(e)
        elif e.rsplit('.', 2)[-2] == evi_bands[2]:  # Blue index
            blue = rio.open(e)
        elif e.rsplit('.', 2)[-2] == evi_bands[3]:  # Fmask index
            fmask = rio.open(e)

    # Load data and scale
    nir_array, nir_transform = rio.mask.mask(nir, [fsUTM], crop=True)
    red_array, _ = rio.mask.mask(red, [fsUTM], crop=True)
    blue_array, _ = rio.mask.mask(blue, [fsUTM], crop=True)
    nir_scaled = nir_array[0] * nir.scales[0]
    red_scaled = red_array[0] * red.scales[0]
    blue_scaled = blue_array[0] * blue.scales[0]
    nir_scaled[nir_array[0] == nir.nodata] = np.nan
    red_scaled[red_array[0] == red.nodata] = np.nan
    blue_scaled[blue_array[0] == blue.nodata] = np.nan

    # Generate EVI
    evi_scaled = evi(red_scaled, blue_scaled, nir_scaled)

    # Quality filter the data
    fmask_array, _ = rio.mask.mask(fmask, [fsUTM], crop=True)
    qVals = list(np.unique(fmask_array))
    all_bits = list()
    goodQuality = []
    for v in qVals:
        all_bits = []
        bits = total_bits
        i = 0
        # Convert to binary based on the values and # of bits defined above:
        bit_val = format(v, 'b').zfill(bits)
        all_bits.append(str(v) + ' = ' + str(bit_val))
        # Go through & split out the values for each bit word based on input above:
        for b in bitword_order:
            prev_bit = bits
            bits = bits - b
            i = i + 1
            if i == 1:
                bitword = bit_val[bits:]
            elif i == num_bitwords:
                bitword = bit_val[:prev_bit]
            else:
                bitword = bit_val[bits:prev_bit]
            all_bits.append(' Bit Word ' + str(i) + ': ' + str(bitword))
        # Bit Words 2, 4, 5, 6 are used. All should equal 0 if no clouds or cloud
        # shadows are present and the pixel is not snow/ice/water
        if int(all_bits[2].split(': ')[-1]) + int(all_bits[4].split(': ')[-1]) + \
           int(all_bits[5].split(': ')[-1]) + int(all_bits[6].split(': ')[-1]) == 0:
            goodQuality.append(v)
    evi_band = np.ma.MaskedArray(evi_scaled, np.in1d(fmask_array, goodQuality, invert=True))  # Apply QA mask to the EVI data
    evi_band = np.ma.filled(evi_band, np.nan)
    outName = f"{nir.name.rsplit('/', 1)[-1].split('.B')[0]}_EVI.tif"

    # Create output GeoTIFF with overviews
    evi_tif = rio.open(outName, 'w', driver='GTiff', height=evi_band.shape[0], width=evi_band.shape[1], count=1,
                       dtype=str(evi_band.dtype), crs=nir.crs, transform=nir_transform)
    evi_tif.write(evi_band, 1)
    evi_tif.build_overviews([2, 4, 8], Resampling.average)
    evi_tif.update_tags(ns='rio_overview', resampling='average')
    evi_tif.close()

    # Add tiling and compression, then export as a COG
    with rio.open(outName) as src:
        kwds = src.profile
        kwds['tiled'] = True
        kwds['compress'] = 'LZW'
        with rio.open(outName, 'w', **kwds) as dst:
            dst.write(evi_band, 1)
    print(f"Processing file {j+1} of {len(hls_items)}")
Processing file 1 of 20
Processing file 2 of 20
Processing file 3 of 20
Processing file 4 of 20
Processing file 5 of 20
Processing file 6 of 20
Processing file 7 of 20
Processing file 8 of 20
Processing file 9 of 20
Processing file 10 of 20
Processing file 11 of 20
Processing file 12 of 20
Processing file 13 of 20
Processing file 14 of 20
Processing file 15 of 20
Processing file 16 of 20
Processing file 17 of 20
Processing file 18 of 20
Processing file 19 of 20
Processing file 20 of 20
# Remove variables that are no longer needed and close the files that were read in memory
del all_bits, b, bit_val, bits, bitword, bitword_order, blue_array, blue_scaled, e, evi_band, evi_band_links, evi_bands
del evi_scaled, fmask_array, goodQuality, h, hls_items, i, nir_array, nir_scaled, nir_transform, num_bitwords, outName
del prev_bit, qVals, red_array, red_scaled, stac, total_bits, v
nir.close(), red.close(), fmask.close(), blue.close()
(None, None, None, None)
Xarray extends and combines much of the core functionality from both the pandas library and NumPy, which makes it very good at handling multi-dimensional (N-dimensional) datasets that contain labels (e.g., variable names or dimension names).

List the files in the current working directory.
eviFiles = [o for o in os.listdir() if o.endswith('EVI.tif')] # List EVI COGs
eviFiles
['HLS.S30.T10TEK.2020190T184919.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020193T185919.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020273T190109.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020275T185221.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020278T190241.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020280T185249.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020283T190319.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020285T185321.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020288T190351.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020290T185359.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020295T185431.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020298T190451.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020300T185459.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020303T190519.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020305T185531.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020308T190551.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020310T185559.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020313T190619.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020315T185621.v1.5_EVI.tif', 'HLS.S30.T10TEK.2020320T185649.v1.5_EVI.tif']
for i, e in enumerate(eviFiles):
    time = datetime.strptime(e.rsplit('.v1.5', 1)[0].rsplit('.', 1)[-1], '%Y%jT%H%M%S')  # Grab acquisition time from filename
    # Need to set up the xarray data array for the first file
    if i == 0:
        eviStack = xr.open_rasterio(e)  # Open file using rasterio
        eviStack = eviStack.squeeze(drop=True)
        eviStack.coords['time'] = np.array(time)  # Define time coordinate
        eviStack = eviStack.rename({'x':'lon', 'y':'lat', 'time':'time'})  # Rename coordinates
        eviStack = eviStack.expand_dims(dim='time')
    else:
        eviS = xr.open_rasterio(e)
        eviS = eviS.squeeze(drop=True)
        eviS.coords['time'] = np.array(time)
        eviS = eviS.rename({'x':'lon', 'y':'lat', 'time':'time'})
        eviS = eviS.expand_dims(dim='time')
        eviStack = xr.concat([eviStack, eviS], dim='time')  # Concatenate the new array to the data array
eviStack.name = 'EVI'
eviStack.name = 'EVI'
Xarray has two fundamental data structures. A Dataset holds multiple variables that potentially share the same coordinates, along with global metadata for the file (see above). A DataArray contains a single multi-dimensional variable and its coordinates, attributes, and metadata. Data values can be pulled out of a DataArray as a numpy.ndarray using the values attribute.
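A minimal, self-contained DataArray mirrors the structure of the EVI stack built above; the 2x2x2 values, coordinates, and dates here are made up for illustration:

```python
import numpy as np
import xarray as xr

times = np.array(['2020-07-08', '2020-07-11'], dtype='datetime64[ns]')
da = xr.DataArray(
    np.array([[[0.2, 0.4], [0.6, 0.8]],
              [[0.3, 0.5], [0.7, 0.9]]]),
    dims=('time', 'lat', 'lon'),
    coords={'time': times,
            'lat': [4418535.0, 4418505.0],
            'lon': [580155.0, 580185.0]},
    name='EVI',
)

first = da.isel(time=0).values  # one time slice, pulled out as a numpy.ndarray
mean_per_time = da.mean(('lat', 'lon'))  # reduce over space, one value per date
```

The eviStack DataArray behaves exactly the same way, just with more pixels and time steps.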
eviStack
<xarray.DataArray 'EVI' (time: 20, lat: 59, lon: 38)> array([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]) Coordinates: * lat (lat) float64 4.419e+06 4.419e+06 4.418e+06 ... 4.417e+06 4.417e+06 * lon (lon) float64 5.802e+05 5.802e+05 5.802e+05 ... 5.812e+05 5.813e+05 * time (time) datetime64[ns] 2020-07-08T18:49:19 ... 2020-11-15T18:56:49 Attributes: transform: (30.0, 0.0, 580140.0, 0.0, -30.0, 4418550.0) crs: +init=epsg:32610 res: (30.0, 30.0) is_tiled: 1 nodatavals: (nan,) scales: (1.0,) offsets: (0.0,) AREA_OR_POINT: Area
The result is a single data array, EVI, that has lat (y) and lon (x) coordinates, as well as a z dimension, time. This allows us to plot and visualize the HLS S30-derived EVI data as a time series, sequentially by time and in geographic space using the lat/lon coordinates.

# Set the x, y, and z (groupby) dimensions, add a colormap/bar and other parameters.
title = 'HLS-derived EVI over an agricultural field in northern California'
eviStack.hvplot(x='lon', y='lat', groupby='time', cmap='YlGn', width=600,
                height=600, colorbar=True).opts(clim=(0.0, 1.0), title=title)
ecrs = crs.epsg(eviStack.crs.split(':')[-1]) # Define CRS
# Use holoviews to create a plot grouped by time and with a specific stretch and colorbar/map applied and shown
hls_stack = eviStack.hvplot(x='lon', y='lat', crs=ecrs, groupby='time', cmap='YlGn',
                            colorbar=True).opts(clim=(0.0, 1.0), title=title)
base * hls_stack # Add in the basemap created earlier
eviStack.sel(time='2020-07-08 18:49:19') # Select a single time slice and pull out of xarray data array
<xarray.DataArray 'EVI' (lat: 59, lon: 38)> array([[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]) Coordinates: * lat (lat) float64 4.419e+06 4.419e+06 4.418e+06 ... 4.417e+06 4.417e+06 * lon (lon) float64 5.802e+05 5.802e+05 5.802e+05 ... 5.812e+05 5.813e+05 time datetime64[ns] 2020-07-08T18:49:19 Attributes: transform: (30.0, 0.0, 580140.0, 0.0, -30.0, 4418550.0) crs: +init=epsg:32610 res: (30.0, 30.0) is_tiled: 1 nodatavals: (nan,) scales: (1.0,) offsets: (0.0,) AREA_OR_POINT: Area
sTime = '2020-07-08T18:49:19.000000000' # Single date
eviStack.sel(time=sTime).hvplot.box(by=['time'], rot=45, box_fill_color='lightblue', padding=0.1, width=450, height=350)
# Select all observations from 2020 and plot each as a boxplot showing the distribution of EVI values for this field
eviStack.sel(time='2020').hvplot.box('EVI', by=['time'], rot=45, box_fill_color='lightblue', padding=0.1, width=800, height=450)
# xarray allows you to easily calculate a number of statistics
evi_min = eviStack.min(('lat', 'lon'))
evi_max = eviStack.max(('lat', 'lon'))
evi_mean = eviStack.mean(('lat', 'lon'))
evi_sd = eviStack.std(('lat', 'lon'))
evi_count = eviStack.count(('lat', 'lon'))
evi_median = eviStack.median(('lat', 'lon'))
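Each reduction above collapses the lat and lon dimensions, leaving one value per time step, and xarray skips NaN (quality-masked) pixels by default. The same per-date statistics can be sketched with plain NumPy nan-aware reductions; the tiny two-date array here is made up for illustration:

```python
import numpy as np

# Two time slices of a 2x2 EVI grid, with some quality-masked (NaN) pixels
stack = np.array([[[0.2, np.nan],
                   [0.6, 0.8]],
                  [[0.3, 0.5],
                   [np.nan, 0.9]]])

mean_ts = np.nanmean(stack, axis=(1, 2))          # mean EVI per time step
count_ts = np.sum(~np.isnan(stack), axis=(1, 2))  # valid-pixel count per time step
```

The count statistic is worth keeping alongside the others: a date where most pixels were masked out has a much less trustworthy mean.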
We now have the mean and standard deviation for each time slice, as well as the maximum and minimum values. Let's do some plotting! We will use the hvPlot package to create simple but interactive charts/plots. Hover your cursor over the visualization to see the data values.
evi_mean.hvplot.line()
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Combine line plots for different statistics
stats = (evi_mean.hvplot.line(height=350, width=450, line_width=1.5, color='red', grid=True, padding=0.05).opts(title='Mean') +
         evi_sd.hvplot.line(height=350, width=450, line_width=1.5, color='red', grid=True, padding=0.05).opts(title='Standard Deviation') +
         evi_max.hvplot.line(height=350, width=450, line_width=1.5, color='red', grid=True, padding=0.05).opts(title='Max') +
         evi_min.hvplot.line(height=350, width=450, line_width=1.5, color='red', grid=True, padding=0.05).opts(title='Min')).cols(2)
stats
Finally, create a pandas dataframe with the statistics, and export it to a CSV file.

# Create pandas dataframe from dictionary
df = pd.DataFrame({'Min EVI': evi_min, 'Max EVI': evi_max,
                   'Mean EVI': evi_mean, 'Standard Deviation EVI': evi_sd,
                   'Median EVI': evi_median, 'Count': evi_count})
df.index = eviStack.time.data # Set the observation date as the index
df.to_csv('HLS-Derived_EVI_Stats.csv', index=True) # Export to CSV
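Round-tripping through CSV preserves the date index, so the exported statistics can be re-loaded later for further analysis. A small self-contained sketch with made-up statistics and an in-memory buffer standing in for the file:

```python
import io
import pandas as pd

# Hypothetical per-date statistics, indexed by observation date
df = pd.DataFrame({'Mean EVI': [0.41, 0.38], 'Count': [2101, 2099]},
                  index=pd.to_datetime(['2020-07-08', '2020-07-11']))

buf = io.StringIO()           # stands in for 'HLS-Derived_EVI_Stats.csv'
df.to_csv(buf, index=True)

# Read it back, parsing the first column as the date index
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0, parse_dates=True)
```

Passing parse_dates=True on re-import restores a DatetimeIndex, so time-based selection and resampling keep working on the reloaded table.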
del eviStack