Published: Oct. 14, 2021
This tutorial demonstrates how to interact with CMR-STAC in R.
This tutorial will teach you how to navigate and explore NASA's Common Metadata Repository (CMR) SpatioTemporal Asset Catalog (STAC) to learn about the datasets available through the LP DAAC Cumulus cloud archive.
R and RStudio are required to execute this tutorial. Installation details can be found here.
This tutorial has been tested on Windows using R Version 4.1.0 and RStudio version 1.4.1717.
Clone or download the HLS_Tutorial_R Repository from the LP DAAC Data User Resources Repository.
When you open this Rmarkdown notebook in RStudio, you can click the little green "Play" button in each grey code chunk to execute the code. The result can be printed either in the R Console or inline in the RMarkdown notebook, depending on your RStudio preferences.
Check the version of R by typing version into the console and the version of RStudio by typing RStudio.Version() into the console, and update them if needed.
Windows
Install and load installr:
install.packages("installr");library(installr)
Copy/Update the existing packages to the new R installation:
updateR()
Open RStudio, go to Help > Check for Updates to install newer version of RStudio (if available).
Mac
Required packages:
httr
jsonlite
purrr
DT
dplyr
magrittr
xml2
Run the cell below to identify any missing packages to install, and then load all of the required packages.
packages <- c('httr','purrr','jsonlite','DT','magrittr', 'xml2', 'dplyr')
new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='http://cran.rstudio.com/') else print('All required packages are installed.')
invisible(lapply(packages, library, character.only = TRUE))
[1] "All required packages are installed."
STAC is short for Spatiotemporal Asset Catalog, a series
of specifications that provide a common language for interpreting geospatial
information in order to standardize indexing and discovery of spatiotemporal assets
(files containing information about the Earth across space and time).
There are four specifications that work both independently and together:
1) STAC Catalog
2) STAC Collection
3) STAC Item
4) STAC API
The STAC API specification builds on top of the three core specifications mentioned above. All these
specifications are intended to be used together, yet are designed in a way that
each piece is small, self-contained, and reusable in other contexts.
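To make the relationship between these specifications more concrete, here is a minimal sketch of what a STAC Catalog could look like when built in R and serialized with jsonlite. The id, description, and example.com URLs are made up for illustration and do not come from CMR-STAC.

```r
library(jsonlite)

# Hypothetical minimal STAC Catalog: the id, description, and URLs are invented
catalog <- list(
  type = "Catalog",
  stac_version = "1.0.0",
  id = "example-catalog",
  description = "An illustrative catalog that links to one child collection.",
  links = list(
    list(rel = "self",  href = "https://example.com/stac/catalog.json"),
    list(rel = "child", href = "https://example.com/stac/collection.json")
  )
)

# Serialize the list to JSON text
catalog_json <- jsonlite::toJSON(catalog, auto_unbox = TRUE, pretty = TRUE)
cat(catalog_json)

# Parsing the JSON back simplifies the links array into a data frame
parsed <- jsonlite::fromJSON(catalog_json)
parsed$links$rel
```

The links array is what ties Catalogs, Collections, and Items together: each entry names a relationship (rel) and the URL (href) to follow.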
The Common Metadata Repository (CMR) is a metadata system that catalogs Earth Science data and associated metadata records. NASA's CMR-STAC Application Programming Interface (API) is a translation API for STAC users who want to access and search through CMR's vast metadata holdings using STAC keywords.
The CMR-STAC API contains endpoints that enable the querying of STAC items.
Assign the CMR-STAC URL to a static variable.
CMR_STAC_URL <- 'https://cmr.earthdata.nasa.gov/stac/'
Connect to the CMR-STAC landing page, which contains all the available data providers and their STAC endpoints. In this tutorial, the httr package is used to navigate the CMR-STAC API.
cmr_cat <- httr::GET(CMR_STAC_URL) %>% # Request and retrieve the info from CMR-STAC URL
httr::content()
cat('You are using',cmr_cat$title,'version',cmr_cat$stac_version,".", cmr_cat$description,sep=" ")
You are using NASA CMR STAC Proxy version 1.0.0 . This is the landing page for CMR-STAC. Each provider link contains a STAC endpoint.
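The request above assumes the server responds successfully. As a hedged sketch, a small wrapper can stop with an error when the HTTP status indicates a failure before the body is parsed. The helper name get_stac is hypothetical and not part of httr or of this tutorial's code, and the example call needs network access, so it is left commented out.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: fetch a STAC endpoint and fail loudly on HTTP errors
get_stac <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # raises an R error for 4xx and 5xx responses
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt)      # simplifies arrays of objects into data frames
}

# Example call (requires network access):
# cmr_cat <- get_stac("https://cmr.earthdata.nasa.gov/stac/")
```

Note that this helper parses the text with jsonlite::fromJSON(), so the result is simplified (for example, links becomes a data frame) rather than the nested list returned by httr::content() with its default settings.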
Here, jsonlite is used to change the format of the content returned from our request, and the DT package is used to make the returned information more readable. The providers' names and URL links are found in the title and href fields, respectively.
cmr_cat_links <- cmr_cat$links %>%
jsonlite::toJSON(auto_unbox = TRUE) %>%
jsonlite::fromJSON() %>%
as.data.frame()
DT::datatable(cmr_cat_links)
The data frame above shows all the data providers with their associated STAC catalog endpoints. Notice that the CMR-STAC API contains many different endpoints, not just from NASA LP DAAC but also from other NASA ESDIS DAACs. Use the title field to identify the data provider you are interested in. The data product used in this tutorial is hosted in the LP DAAC Cumulus Cloud space (LPCLOUD).
Assign LPCLOUD to the provider variable and get this provider's endpoint from the CMR catalog using the URL in the href field.
provider <- 'LPCLOUD'
lpcloud_cat_link <- cmr_cat_links[which(cmr_cat_links$title == provider), 'href']
lpcloud_cat_link
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
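The lookup above returns an empty result without warning if the provider name is misspelled. The following offline sketch shows the same list-to-data-frame round trip on a small stand-in for cmr_cat$links and adds an explicit check. The helper provider_href and the EXAMPLE_PROVIDER entry are hypothetical; only the LPCLOUD URL is taken from the output above.

```r
library(jsonlite)

# Stand-in for cmr_cat$links; the second entry is invented for illustration
links <- list(
  list(rel = "child", title = "LPCLOUD",
       href = "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"),
  list(rel = "child", title = "EXAMPLE_PROVIDER",
       href = "https://example.com/stac/EXAMPLE_PROVIDER")
)

# Same round trip as in the tutorial: nested list -> JSON -> data frame
links_df <- as.data.frame(
  jsonlite::fromJSON(jsonlite::toJSON(links, auto_unbox = TRUE))
)

# Hypothetical helper: return the href for a provider or stop if it is missing
provider_href <- function(df, provider) {
  href <- df$href[df$title == provider]
  if (length(href) == 0) stop("Provider not found: ", provider)
  href[1]
}

provider_href(links_df, "LPCLOUD")
```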
A STAC Catalog contains a JSON file of links that organize all the available collections. Below, connect to the LPCLOUD STAC Catalog endpoint using the httr package and print the information contained in the Catalog.
lpcloud_cat <- httr::GET(lpcloud_cat_link) %>%
httr::content()
lpcloud_cat <- lpcloud_cat %>%
jsonlite::toJSON(auto_unbox = TRUE) %>%
jsonlite::fromJSON()
DT::datatable(lpcloud_cat$links)
The LPCLOUD STAC catalog includes URL links to the root, collections, search, and child STAC Catalogs. The data frame above also shows the available collections in the LPCLOUD catalog.
A STAC Collection is an extension of a STAC Catalog containing additional information that describes the STAC Items in that Collection.
Get the URL link to the STAC Collections.
lpcloud_col_link <- lpcloud_cat$links[which(lpcloud_cat$links$rel == 'collections'),'href']
lpcloud_col_link
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections"
Next, get the content describing the collections within the LPCLOUD Catalog. Important information such as data collection ID, title, description, and links to collection endpoints is provided here.
lpcloud_collection <- httr::GET(lpcloud_col_link) %>%
httr::content()
lpcloud_collection <- lpcloud_collection %>%
jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
Print the collections within the LPCLOUD STAC catalog.
lpcloud_collection_df <- jsonlite::fromJSON(lpcloud_collection)$collections
lpcloud_collection_df$id
[1] "ASTGTM.v003" "HLSL30.v2.0" "HLSL30.v1.5" "HLSS30.v1.5" "HLSS30.v2.0"
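The IDs printed above combine a product short name and a version, separated by ".v". As a small sketch (the regular expression is an assumption based only on the IDs shown here, not a documented CMR rule), the two parts can be split apart like this:

```r
# Collection IDs copied from the output above
ids <- c("ASTGTM.v003", "HLSL30.v2.0", "HLSS30.v1.5")

# Assumed pattern: <short name>.v<version>
pattern <- "^(.*)\\.v(.+)$"

parts <- data.frame(
  id         = ids,
  short_name = sub(pattern, "\\1", ids),  # text before ".v"
  version    = sub(pattern, "\\2", ids),  # text after ".v"
  stringsAsFactors = FALSE
)
parts
```

Splitting the ID this way can be handy when you want to list every available version of one product before choosing which collection to query.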
In CMR, the Collection ID is used to query by a specific product, so be sure to save the ID for a collection you are interested in. For instance, the Collection ID for ASTER Global Digital Elevation Model V003 is ASTGTM.v003. Note that the "id" shortname is in the format productshortname.vVVV (where VVV = product version).
Here, get the URL link to the ASTGTM.v003 STAC Collection. If you are interested in querying a different LPCLOUD product, swap out the shortname assigned to the collection variable below.
collection <- 'ASTGTM.v003' # USER INPUT
col_links <- lpcloud_collection_df$links[which(lpcloud_collection_df$id == collection)] %>%
as.data.frame()
astgtm_URL <- col_links[which(col_links$rel == 'self'), 'href']
astgtm_URL
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
The STAC Collection metadata for any collection contains information that applies to every STAC Item and asset that it contains. Get the content of the ASTGTM.v003 collection URL and print the collection description.
astgtm_collection <- httr::GET(astgtm_URL) %>%
httr::content()
astgtm_collection <- astgtm_collection %>%
jsonlite::toJSON(auto_unbox = TRUE) %>%
jsonlite::fromJSON()
cat(astgtm_collection$description)
The ASTER Global Digital Elevation Model (GDEM) Version 3 (ASTGTM) provides a global digital elevation model (DEM) of land areas on Earth at a spatial resolution of 1 arc second (approximately 30 meter horizontal posting at the equator).
The development of the ASTER GDEM data products is a collaborative effort between National Aeronautics and Space Administration (NASA) and Japan’s Ministry of Economy, Trade, and Industry (METI). The ASTER GDEM data products are created by the Sensor Information Laboratory Corporation (SILC) in Tokyo.
The ASTER GDEM Version 3 data product was created from the automated processing of the entire ASTER Level 1A (https://doi.org/10.5067/ASTER/AST_L1A.003) archive of scenes acquired between March 1, 2000, and November 30, 2013. Stereo correlation was used to produce over one million individual scene based ASTER DEMs, to which cloud masking was applied. All cloud screened DEMs and non-cloud screened DEMs were stacked. Residual bad values and outliers were removed. In areas with limited data stacking, several existing reference DEMs were used to supplement ASTER data to correct for residual anomalies. Selected data were averaged to create final pixel values before partitioning the data into 1 degree latitude by 1 degree longitude tiles with a one pixel overlap. To correct elevation values of water body surfaces, the ASTER Global Water Bodies Database (ASTWBD) (https://doi.org/10.5067/ASTER/ASTWBD.001) Version 1 data product was also generated.
The geographic coverage of the ASTER GDEM extends from 83° North to 83° South. Each tile is distributed in GeoTIFF format and projected on the 1984 World Geodetic System (WGS84)/1996 Earth Gravitational Model (EGM96) geoid. Each of the 22,912 tiles in the collection contain at least 0.01% land area.
Provided in the ASTER GDEM product are layers for DEM and number of scenes (NUM). The NUM layer indicates the number of scenes that were processed for each pixel and the source of the data.
While the ASTER GDEM Version 3 data products offer substantial improvements over Version 2, users are advised that the products still may contain anomalies and artifacts that will reduce its usability for certain applications.
Improvements/Changes from Previous Versions
• Expansion of acquisition coverage to increase the amount of cloud-free input scenes from about 1.5 million in Version 2 to about 1.88 million scenes in Version 3.
• Separation of rivers from lakes in the water body processing.
• Minimum water body detection size decreased from 1 km2 to 0.2 km2.
We can also get the spatial and temporal extent information. Below, we can see this collection has a global spatial extent. ASTER GDEM is a single, static dataset that incorporates observations from March 2000 to November 2013.
astgtm_collection$extent %>%
jsonlite::toJSON(auto_unbox = TRUE)
{"spatial":{"bbox":[[-180,-83,180,82]]},"temporal":{"interval":[["2000-03-01T00:00:00.000Z","2013-11-30T23:59:59.999Z"]]}}
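To work with the extent programmatically rather than reading it from the JSON text, the values can be pulled into named R objects. This is a sketch that parses the JSON string printed above; the variable names are illustrative, and the bounding box order (west, south, east, north) follows the STAC convention for two-dimensional extents.

```r
library(jsonlite)

# Extent JSON copied from the output above
extent_json <- '{"spatial":{"bbox":[[-180,-83,180,82]]},"temporal":{"interval":[["2000-03-01T00:00:00.000Z","2013-11-30T23:59:59.999Z"]]}}'
extent <- jsonlite::fromJSON(extent_json)

# The bbox array is simplified to a 1 x 4 matrix; take the first row
bbox <- extent$spatial$bbox[1, ]
names(bbox) <- c("west", "south", "east", "north")

# The temporal interval is a 1 x 2 character matrix; keep only the date part
interval <- as.Date(substr(extent$temporal$interval[1, ], 1, 10))

bbox
interval
```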
The STAC Collection also includes useful links. You can visit all the items within this collection using the Items URL.
DT::datatable(astgtm_collection$links)
Get the URL to the ASTGTM.v003 Items.
items_url <- astgtm_collection$links[which(astgtm_collection$links$rel == 'items'), 'href']
items_url
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items"
A STAC Item represents data and metadata assets that are spatiotemporally coincident.
Below, query the STAC Items within the ASTGTM.v003 STAC Collection and print the first item in the collection.
astgtm_items <- httr::GET(items_url) %>%
httr::content(as = "text") %>%
jsonlite::fromJSON()
F1 <- astgtm_items$features[1,] %>%
jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
F1
[
{
"type": "Feature",
"id": "ASTGTMV003_N03E008",
"stac_version": "1.0.0",
"stac_extensions": [],
"collection": "ASTGTM.v003",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[7.9999, 2.9999],
[9.0001, 2.9999],
[9.0001, 4.0001],
[7.9999, 4.0001],
[7.9999, 2.9999]
]
]
},
"bbox": [7.9999, 2.9999, 9.0001, 4.0001],
"links": [
{
"rel": "self",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items/ASTGTMV003_N03E008"
},
{
"rel": "parent",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
},
{
"rel": "collection",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
},
{
"rel": "root",
"href": "https://cmr.earthdata.nasa.gov/stac/"
},
{
"rel": "provider",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.json"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.umm_json"
}
],
"properties": {
"datetime": "2000-03-01T00:00:00.000Z",
"start_datetime": "2000-03-01T00:00:00.000Z",
"end_datetime": "2013-11-30T23:59:59.000Z"
},
"assets": {
"003/ASTGTMV003_N03E008_dem": {
"title": "Download ASTGTMV003_N03E008_dem.tif",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_dem.tif"
},
"003/ASTGTMV003_N03E008_num": {
"title": "Download ASTGTMV003_N03E008_num.tif",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_num.tif"
},
"browse": {
"title": "Download ASTGTMV003_N03E008.1.jpg",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E008.1.jpg",
"type": "image/jpeg"
},
"metadata": {
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.xml",
"type": "application/xml"
},
"003/ASTGTMV003_N02E022_dem": {},
"003/ASTGTMV003_N02E022_num": {},
"003/ASTGTMV003_N00W065_dem": {},
"003/ASTGTMV003_N00W065_num": {},
"003/ASTGTMV003_N01E009_dem": {},
"003/ASTGTMV003_N01E009_num": {},
"003/ASTGTMV003_N02E009_dem": {},
"003/ASTGTMV003_N02E009_num": {},
"003/ASTGTMV003_N03E021_dem": {},
"003/ASTGTMV003_N03E021_num": {},
"003/ASTGTMV003_N01E021_dem": {},
"003/ASTGTMV003_N01E021_num": {},
"003/ASTGTMV003_N01E042_dem": {},
"003/ASTGTMV003_N01E042_num": {},
"003/ASTGTMV003_N01W069_dem": {},
"003/ASTGTMV003_N01W069_num": {},
"003/ASTGTMV003_N01W080_dem": {},
"003/ASTGTMV003_N01W080_num": {}
}
}
]
Notice that the number of items matching our request is far more than what is returned.
cat(astgtm_items$context$matched, 'items matched your request but', astgtm_items$context$returned, 'items are returned.')
22912 items matched your request but 10 items are returned.
This is because the response is paginated. By default, the STAC API returns the first 10 records. To explore more items, you can add ?page=n (where n is the page number, e.g. ?page=2) to the URL link and submit another request. Below, submit a request that returns the records on the second page.
page_2_url <- paste0(items_url, '?page=2')
astgtm_page2_items <- httr::GET(page_2_url) %>%
httr::content(as = "text") %>%
jsonlite::fromJSON()
astgtm_page2_items$features[1,] %>%
jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
[
{
"type": "Feature",
"id": "ASTGTMV003_N03E042",
"stac_version": "1.0.0",
"stac_extensions": [],
"collection": "ASTGTM.v003",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[41.9999, 2.9999],
[43.0001, 2.9999],
[43.0001, 4.0001],
[41.9999, 4.0001],
[41.9999, 2.9999]
]
]
},
"bbox": [41.9999, 2.9999, 43.0001, 4.0001],
"links": [
{
"rel": "self",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items/ASTGTMV003_N03E042"
},
{
"rel": "parent",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
},
{
"rel": "collection",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
},
{
"rel": "root",
"href": "https://cmr.earthdata.nasa.gov/stac/"
},
{
"rel": "provider",
"href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.json"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.umm_json"
}
],
"properties": {
"datetime": "2000-03-01T00:00:00.000Z",
"start_datetime": "2000-03-01T00:00:00.000Z",
"end_datetime": "2013-11-30T23:59:59.000Z"
},
"assets": {
"003/ASTGTMV003_N03E042_dem": {
"title": "Download ASTGTMV003_N03E042_dem.tif",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E042_dem.tif"
},
"003/ASTGTMV003_N03E042_num": {
"title": "Download ASTGTMV003_N03E042_num.tif",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E042_num.tif"
},
"browse": {
"title": "Download ASTGTMV003_N03E042.1.jpg",
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E042.1.jpg",
"type": "image/jpeg"
},
"metadata": {
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.xml",
"type": "application/xml"
},
"003/ASTGTMV003_N00W061_dem": {},
"003/ASTGTMV003_N00W061_num": {},
"003/ASTGTMV003_N02W066_dem": {},
"003/ASTGTMV003_N02W066_num": {},
"003/ASTGTMV003_N02W069_dem": {},
"003/ASTGTMV003_N02W069_num": {},
"003/ASTGTMV003_N01E022_dem": {},
"003/ASTGTMV003_N01E022_num": {},
"003/ASTGTMV003_N01E026_dem": {},
"003/ASTGTMV003_N01E026_num": {},
"003/ASTGTMV003_N02W064_dem": {},
"003/ASTGTMV003_N02W064_num": {},
"003/ASTGTMV003_N01W064_dem": {},
"003/ASTGTMV003_N01W064_num": {},
"003/ASTGTMV003_N01E027_dem": {},
"003/ASTGTMV003_N01E027_num": {},
"003/ASTGTMV003_N00E006_dem": {},
"003/ASTGTMV003_N00E006_num": {}
}
}
]
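Given the counts reported above (22912 matched, 10 returned per request), the number of pages and the URL for any page can be worked out without sending a request. This is a minimal sketch; page_url is a hypothetical helper, and the page size of 10 is the default described above.

```r
items_url <- "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items"

matched  <- 22912  # astgtm_items$context$matched in the output above
per_page <- 10     # default number of records per page

# Number of requests needed to see every item at the default page size
n_pages <- ceiling(matched / per_page)
n_pages

# Hypothetical helper: build the URL for a given page number
page_url <- function(base_url, page) paste0(base_url, "?page=", page)

# URLs for the first three pages
page_urls <- vapply(1:3, function(p) page_url(items_url, p), character(1))
page_urls
```

At the default page size this collection spans 2292 pages, which is one reason a targeted search is usually preferable to walking through every page.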
The STAC Item ID (CMR Granule ID) is the unique identifier assigned to each granule within a data collection. Within each STAC Item are assets, which include the downloadable and streamable URLs to data files along with other asset objects. Below, the first Granule ID is used to get the downloadable data files.
items_df <- jsonlite::fromJSON(F1)
item <- items_df$assets # Get the assets for the first Item
assets <- purrr::map_df(items_df$assets, data.frame, .id = 'asset')
assets
The links found in the href field can be used to download each specific asset.
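Before downloading, it can help to keep only the asset links you actually need. The sketch below filters a small asset table down to GeoTIFF files and derives a local file name for each. The table is rebuilt by hand from the first item shown earlier so that the example runs offline, and asset_tbl is an illustrative name.

```r
# Asset names and links copied from the first item printed earlier
asset_tbl <- data.frame(
  asset = c("003/ASTGTMV003_N03E008_dem", "003/ASTGTMV003_N03E008_num", "browse"),
  href = c(
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_dem.tif",
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_num.tif",
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E008.1.jpg"
  ),
  stringsAsFactors = FALSE
)

# Keep only links that end in .tif
tif_assets <- asset_tbl[grepl("\\.tif$", asset_tbl$href), ]

# Use the last path component as the local file name
tif_assets$file_name <- basename(tif_assets$href)
tif_assets[, c("asset", "file_name")]
```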
In the previous section, we learned how to navigate and explore the STAC Catalog. In this section, we use the CMR-STAC Search endpoint to query items and their associated assets faster and more precisely. With the CMR-STAC Search endpoint, we can specify the collection(s), the area of interest, the time period of interest, and other parameters to identify the STAC Items that meet our criteria. Visit here for more information on search query parameters.
Use the following code to find the search endpoint link within the LPCLOUD catalog.
lpcloud_search_URL <- lpcloud_cat$links[which(lpcloud_cat$links$rel == 'search'),'href']
lpcloud_search_URL
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search"
[2] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search"
Next, define the search parameters.
- Query by collection: Collection IDs should be defined as a list.
- Spatial Querying via Bounding Box: A bounding box given as the coordinates of the LL (lower left) and UR (upper right) corners, in that order.
- Temporal Querying: Time period of interest should be specified as YYYY-MM-DDTHH:MM:SSZ/YYYY-MM-DDTHH:MM:SSZ.
collections <- list('ASTGTM.v003')
datetime <- '2000-01-01T00:00:00Z/2001-01-31T23:59:59Z' #YYYY-MM-DDTHH:MM:SSZ/YYYY-MM-DDTHH:MM:SSZ
bbox <- '-122.0622682571411,39.897234301806,-122.04918980598451,39.91309383703065' # LL and UR Coordinates
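Typing these strings by hand is error-prone, so here is a hedged sketch of two small helpers that assemble them from R values. make_bbox and make_datetime are hypothetical names, and the coordinates are rounded versions of the ones above, used only for illustration.

```r
# Hypothetical helper: build "minLon,minLat,maxLon,maxLat" from LL and UR corners
make_bbox <- function(ll, ur) {
  # Each corner is c(longitude, latitude); LL must not exceed UR
  stopifnot(length(ll) == 2, length(ur) == 2, ll[1] <= ur[1], ll[2] <= ur[2])
  paste(c(ll, ur), collapse = ",")
}

# Hypothetical helper: build "start/end" in YYYY-MM-DDTHH:MM:SSZ form (UTC)
make_datetime <- function(start, end) {
  fmt <- "%Y-%m-%dT%H:%M:%SZ"
  paste0(format(start, fmt, tz = "UTC"), "/", format(end, fmt, tz = "UTC"))
}

# Rounded corner coordinates for illustration
bbox_example <- make_bbox(ll = c(-122.06, 39.89), ur = c(-122.04, 39.92))

datetime_example <- make_datetime(
  as.POSIXct("2000-01-01 00:00:00", tz = "UTC"),
  as.POSIXct("2001-01-31 23:59:59", tz = "UTC")
)

bbox_example
datetime_example
```

The check inside make_bbox catches the common mistake of passing the corners in the wrong order.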
Create a search body object from our search parameters.
body <- list(limit=100,
datetime=datetime,
bbox= bbox,
collections= collections)
Notice the limit parameter in the body object. This parameter allows us to adjust the number of records returned during a request (default = 10).
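To see roughly what the search body looks like as JSON, it can be serialized with jsonlite. This is only an approximation of the request payload for inspection, not a capture of what httr sends. It also illustrates why the collection IDs are wrapped in list(): with auto_unbox = TRUE, a length-one character vector becomes a plain string, while a list stays a JSON array.

```r
library(jsonlite)

# Same search parameters as defined above
body <- list(
  limit = 100,
  datetime = '2000-01-01T00:00:00Z/2001-01-31T23:59:59Z',
  bbox = '-122.0622682571411,39.897234301806,-122.04918980598451,39.91309383703065',
  collections = list('ASTGTM.v003')
)

# A list of collection IDs is kept as a JSON array
body_json <- jsonlite::toJSON(body, auto_unbox = TRUE)
body_json

# For comparison: a bare character vector of length one is unboxed to a string
scalar_json <- jsonlite::toJSON(list(collections = 'ASTGTM.v003'), auto_unbox = TRUE)
scalar_json
```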
Next, submit a query to the STAC Search endpoint using a POST request.
search_req <- httr::POST(lpcloud_search_URL[1], body = body, encode = "json") %>%
httr::content(as = "text") %>%
jsonlite::fromJSON()
names(search_req)
[1] "type" "stac_version" "numberMatched" "numberReturned"
[5] "features" "links" "context"
Let's see how many STAC Items, or granules, intersect with our search parameters.
cat("The number of STAC Items matched your query is ", search_req$numberMatched, 'and ', search_req$numberReturned, 'Items are returned.')
The number of STAC Items matched your query is 1 and 1 Items are returned.
Next, create a data frame with the returned information, including granule ID, datetime properties, and the downloadable URL links to the assets.
granule_list <- list()
n <- 1
for (row in row.names(search_req$features)) {
  f <- search_req$features[row, ]
  for (b in f$assets) {
    df <- data.frame(Collection = f$collection,
                     Granule_ID = f$id,
                     Datetime = f$properties$datetime,
                     Asset_Link = b$href, stringsAsFactors = FALSE)
    granule_list[[n]] <- df
    n <- n + 1
  }
}
search_df <- do.call(rbind, granule_list)
DT::datatable(search_df)
The CMR-STAC Search endpoint allows users to quickly search for STAC Items that meet their specific spatial, temporal, and data product requirements. Now that you have learned how to navigate and explore the CMR-STAC catalog, check out the HLS_tutorial to learn how to interact with HLS data specifically.