amadeus

amadeus is a mechanism for data, environments, and user setup for common environmental and weather datasets in R. amadeus has been developed to improve access to, and the utility of, large-scale, publicly available environmental data in R.

See the peer-reviewed publication, Amadeus: Accessing and analyzing large scale environmental data in R, for a full description and details.

Cite amadeus as:

Manware, M., Song, I., Marques, E. S., Kassien, M. A., Clark, L. P., & Messier, K. P. (2025). Amadeus: Accessing and analyzing large scale environmental data in R. Environmental Modelling & Software, 186, 106352.

Installation

amadeus can be installed from CRAN with install.packages or from GitHub with pak.

install.packages("amadeus")

pak::pak("NIEHS/amadeus")

Download

download_data accesses and downloads raw geospatial data from a variety of open-source data repositories. The function is a wrapper that calls source-specific download functions, each of which accounts for the source's unique combination of URL, file naming conventions, and data types. Download functions cover the following sources:

Data Source	File Type	Data Genre	Spatial Extent	Function Suffix
Climatology Lab TerraClimate	netCDF	Meteorology	Global	`_terraclimate`
Climatology Lab GridMet	netCDF	Climate, Water	Contiguous United States	`_gridmet`
Köppen-Geiger Climate Classification	GeoTIFF	Climate Classification	Global	`_koppen_geiger`
MRLC¹ Consortium National Land Cover Database (NLCD)	GeoTIFF	Land Use	United States	`_nlcd`
USDA CropScape Cropland Data Layer (CDL)	GeoTIFF	Land Use, Agriculture	United States	`_cropscape`
NASA² Moderate Resolution Imaging Spectroradiometer (MODIS)	HDF	Atmosphere, Meteorology, Land Use, Satellite	Global	`_modis`
NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2)	netCDF	Atmosphere, Meteorology	Global	`_merra2`
NASA SEDAC³ UN WPP-Adjusted Population Density	GeoTIFF, netCDF	Population	Global	`_population`
NASA SEDAC Global Roads Open Access Data Set	Shapefile, Geodatabase	Roadways	Global	`_groads`
USGS⁴ Hydrologic Unit Codes (HUC)	Geodatabase, Shapefile	Hydrology	United States	`_huc`
NASA Goddard Earth Observing System Composition Forecasting (GEOS-CF)	netCDF	Atmosphere, Meteorology	Global	`_geos`
EDGAR Emissions Database for Global Atmospheric Research	netCDF, TXT	Emissions	Global	`_edgar`
NOAA Hazard Mapping System Fire and Smoke Product	Shapefile, KML	Wildfire Smoke	North America	`_hms`
NOAA GOES Aerosol Detection Product (ADP)	netCDF	Atmosphere, Satellite	Americas, Pacific	`_goes`
NOAA NCEP⁵ North American Regional Reanalysis (NARR)	netCDF	Atmosphere, Meteorology	North America	`_narr`
PRISM Climate Group	netCDF, ASCII Grid, GRIB2	Meteorology, Climate	Contiguous United States	`_prism`
[Drought indices (SPEI, EDDI, USDM)](https://droughtmonitor.unl.edu)	netCDF, ASCII Grid, Shapefile	Drought	Global, Contiguous United States	`_drought`
US EPA⁶ Air Data Pre-Generated Data Files	CSV	Air Pollution	United States	`_aqs`
IMPROVE aerosol monitoring program	TXT (pipe-delimited)	Air Pollution, Aerosols	United States	`_improve`
US EPA Ecoregions	Shapefile	Climate Regions	North America	`_ecoregions`
US EPA National Emissions Inventory (NEI)	CSV	Emissions	United States	`_nei`
US EPA Toxic Release Inventory (TRI) Program	CSV	Chemicals, Pollution	United States	`_tri`
USGS⁴ Global Multi-resolution Terrain Elevation Data (GMTED2010)	ESRI ASCII Grid	Elevation	Global	`_gmted`

See the "download_data" vignette for a detailed description of source-specific download functions.
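The wrapper pattern described above can be illustrated with a minimal base R sketch. It is not the amadeus implementation: the `_stub` functions and `download_data_sketch()` are hypothetical names invented for illustration, and the real source-specific functions take more arguments and actually fetch files.

```r
# Hypothetical stand-ins for source-specific download functions.
# They only report what they would do; nothing is downloaded.
download_narr_stub <- function(...) "narr: would download netCDF files"
download_hms_stub <- function(...) "hms: would download Shapefile or KML files"

# Illustrative wrapper: look up the stub matching `dataset_name`
# and forward the remaining arguments to it.
download_data_sketch <- function(dataset_name, ...) {
  stubs <- list(narr = download_narr_stub, hms = download_hms_stub)
  if (!dataset_name %in% names(stubs)) {
    stop("Unsupported dataset_name: ", dataset_name)
  }
  stubs[[dataset_name]](...)
}

result <- download_data_sketch("narr", year = 2022, variable = "weasd")
print(result)
```

The point of the design is that callers only need to know a dataset name, while source-specific details stay inside each download function.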

For TRI, download_tri() can retrieve EPA annual basic data files for the nationwide dataset (jurisdiction = "US"), individual states or territories (jurisdiction = "AZ", "NC", etc.), and the tribal file (jurisdiction = "tbl").

NASA Earthdata authentication with `setup_nasa_token()`

Many NASA-hosted datasets require an Earthdata Login bearer token. In amadeus, this includes modis, merra2, geos, and population (NASA SEDAC). Use setup_nasa_token() to store the token before calling the corresponding download_*() functions. See vignette("protected_datasets", package = "amadeus") for more detail.

setup_nasa_token() supports three storage methods: method = "renviron" writes NASA_EARTHDATA_TOKEN to ~/.Renviron for persistent personal use; method = "file" writes a local token file such as ~/.nasa_earthdata_token; and method = "session" uses Sys.setenv() for the current R session only.

setup_nasa_token()                              # prompts interactively
setup_nasa_token(method = "renviron", token = "<your_token>")

Never commit Earthdata tokens to git or include them in shared scripts. Prefer method = "renviron" on personal machines, and method = "session" for shared systems or CI jobs where the token is supplied from a CI secret.
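For the CI case, one possible approach is to read the token from an environment variable that the CI system fills from a secret, then register it for the session only. This is a sketch under assumptions: the variable name EARTHDATA_TOKEN_SECRET and the helper read_ci_token() are hypothetical, and passing token together with method = "session" is assumed to work the same way as the method = "renviron" call shown above.

```r
# Hypothetical helper: fetch a token from an environment variable that the
# CI system populates from a secret. The variable name is an assumption.
read_ci_token <- function(var = "EARTHDATA_TOKEN_SECRET") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is empty or unset.")
  }
  token
}

# Register the token for this session only when amadeus is installed and the
# secret is actually present, so the script is safe to source anywhere.
if (requireNamespace("amadeus", quietly = TRUE) &&
      nzchar(Sys.getenv("EARTHDATA_TOKEN_SECRET"))) {
  amadeus::setup_nasa_token(method = "session", token = read_ci_token())
}
```

Because the token never appears as a literal in the script, the script itself can be committed without exposing the credential.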

Example use of download_data using NOAA NCEP North American Regional Reanalysis's (NARR) "weasd" (Daily Accumulated Snow at Surface) variable.

directory <- "/  EXAMPLE  /  FILE  /  PATH  /"
download_data(
  dataset_name = "narr",
  year = 2022,
  variable = "weasd",
  directory_to_save = directory,
  acknowledgement = TRUE,
  download = TRUE,
  hash = TRUE
)

Downloading requested files...
Requested files have been downloaded.
[1] "5655d4281b76f4d4d5bee234c2938f720cfec879"

list.files(file.path(directory, "weasd"))

[1] "weasd.2022.nc"

Process

process_covariates imports and cleans raw geospatial data (downloaded with download_data), and returns a single SpatRaster or SpatVector into the user's R environment. process_covariates "cleans" the data by defining interpretable layer names, ensuring a coordinate reference system is present, and managing `time` data (if applicable).

To avoid errors when using process_covariates, do not edit the raw downloaded data files. Passing user-generated or edited data into process_covariates may result in errors because the underlying functions are adapted to each source's raw data file type.

Example use of process_covariates using the downloaded "weasd" data.

weasd_process <- process_covariates(
  covariate = "narr",
  date = c("2022-01-01", "2022-01-05"),
  variable = "weasd",
  path = file.path(directory, "weasd"),
  extent = NULL
)

Detected monolevel data...
Cleaning weasd data for 2022...
Returning daily weasd data from 2022-01-01 to 2022-01-05.

weasd_process

class       : SpatRaster
dimensions  : 277, 349, 5  (nrow, ncol, nlyr)
resolution  : 32462.99, 32463  (x, y)
extent      : -16231.49, 11313351, -16231.5, 8976020  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs
source      : weasd.2022.nc:weasd
varname     : weasd (Daily Accumulated Snow at Surface)
names       : weasd_20220101, weasd_20220102, weasd_20220103, weasd_20220104, weasd_20220105
unit        :         kg/m^2,         kg/m^2,         kg/m^2,         kg/m^2,         kg/m^2
time        : 2022-01-01 to 2022-01-05 UTC
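The layer names in the output above follow a variable_YYYYMMDD pattern. As a small illustration, the base R sketch below parses names copied from that output into a variable prefix and a Date vector, then picks the layers inside a narrower window. Whether other sources use the same naming pattern is not confirmed here, so treat the regular expressions as assumptions tied to this NARR example.

```r
# Layer names copied from the printed SpatRaster above.
layer_names <- c(
  "weasd_20220101", "weasd_20220102", "weasd_20220103",
  "weasd_20220104", "weasd_20220105"
)

# Split each name into its variable prefix and its date suffix.
variable <- sub("_[0-9]{8}$", "", layer_names)
layer_dates <- as.Date(sub("^.*_", "", layer_names), format = "%Y%m%d")

# Select the layers falling inside a narrower date window.
keep <- layer_dates >= as.Date("2022-01-02") &
  layer_dates <= as.Date("2022-01-03")
selected <- layer_names[keep]
print(selected)
```

With terra loaded, the selected names could then be used to subset the raster, for example weasd_process[[selected]].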

Calculate Covariates

calculate_covariates stems from the beethoven project's need for various types of data extracted at precise locations. calculate_covariates, therefore, extracts data from the "cleaned" SpatRaster or SpatVector object at user-defined locations. Users can choose to buffer the locations. The function returns a data.frame, sf, or SpatVector with data extracted at all locations for each layer or row in the SpatRaster or SpatVector object, respectively.

Example of calculate_covariates using processed "weasd" data.

locs <- data.frame(id = "001", lon = -78.8277, lat = 35.95013)
weasd_covar <- calculate_covariates(
  covariate = "narr",
  from = weasd_process,
  locs = locs,
  locs_id = "id",
  radius = 0,
  geom = "sf"
)

Detected `data.frame` extraction locations...
Calculating weasd covariates for 2022-01-01...
Calculating weasd covariates for 2022-01-02...
Calculating weasd covariates for 2022-01-03...
Calculating weasd covariates for 2022-01-04...
Calculating weasd covariates for 2022-01-05...
Returning extracted covariates.

weasd_covar

Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 8184606 ymin: 3523283 xmax: 8184606 ymax: 3523283
Projected CRS: unnamed
   id       time     weasd_0                geometry
1 001 2022-01-01 0.000000000 POINT (8184606 3523283)
2 001 2022-01-02 0.000000000 POINT (8184606 3523283)
3 001 2022-01-03 0.000000000 POINT (8184606 3523283)
4 001 2022-01-04 0.000000000 POINT (8184606 3523283)
5 001 2022-01-05 0.001953125 POINT (8184606 3523283)
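The result is in long format, with one row per location and time step, which makes simple summaries straightforward. The sketch below rebuilds the printed values as a plain data.frame (the geometry column is left out) and computes a per-location mean and the first date with a non-zero value. The object names are illustrative, and the time column is assumed here to be a Date.

```r
# Values copied from the printed output above, as a plain data.frame.
# For the real sf object, sf::st_drop_geometry() gives a comparable table.
weasd_table <- data.frame(
  id = "001",
  time = as.Date("2022-01-01") + 0:4,
  weasd_0 = c(0, 0, 0, 0, 0.001953125)
)

# Mean over the five days for each location id.
site_mean <- aggregate(weasd_0 ~ id, data = weasd_table, FUN = mean)

# First date with a non-zero value.
first_snow <- min(weasd_table$time[weasd_table$weasd_0 > 0])

print(site_mean)
print(first_snow)
```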

Computational considerations

amadeus builds on terra and exactextractr, which are C++-backed and efficient for individual raster, vector, and extraction operations. For large spatial or temporal domains, however, the cumulative wall-clock cost of many process_*() or calculate_*() calls can still be significant.

These workloads are often embarrassingly parallel across dates, variables, or location chunks. See vignette("computational_considerations", package = "amadeus") for examples using sequential baselines, process-level parallelism, and reproducible pipeline tools.
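As a rough illustration of chunking by date, the sketch below splits a daily date sequence into monthly chunks, runs a sequential baseline, and then repeats the same work with forked processes from the base parallel package. process_chunk() is a hypothetical placeholder for a process_*() plus calculate_*() call on one chunk. This is not taken from the vignette.

```r
# Daily dates for one quarter, split into one chunk per calendar month.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day")
chunks <- split(dates, format(dates, "%Y-%m"))

# Hypothetical worker: stands in for a process_*() plus calculate_*() call
# on one chunk. Here it only reports the chunk's date range and length.
process_chunk <- function(chunk_dates) {
  data.frame(
    start = min(chunk_dates),
    end = max(chunk_dates),
    n_days = length(chunk_dates)
  )
}

# Sequential baseline.
sequential <- lapply(chunks, process_chunk)

# Process-level parallelism via forking; forking is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_res <- parallel::mclapply(chunks, process_chunk, mc.cores = cores)

summary_table <- do.call(rbind, parallel_res)
print(summary_table)
```

In a real workflow it is usually safer for each worker to receive file paths and dates and to open the raster itself, since terra objects hold external pointers that do not transfer between processes without extra handling.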

Calculate_* buffer radius information

locs are first projected to crs(from); buffering then uses that projected geometry.
radius is interpreted in the distance units of that CRS.
Most calculate_*() documentation describes radius in meters, and output column names often encode the radius (sometimes zero-padded).
For radius = 0, most code paths perform point extraction with no buffer, but a few helper paths create a tiny fallback buffer (1 or 1e-6) to support weighted or exact extraction.
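The project-then-buffer order can be reproduced directly with terra to see why the CRS units matter. The sketch below is an approximation of the idea, not the internal amadeus code: it assumes the input locations are longitude/latitude in EPSG:4326, reuses the NARR CRS string from the process example, and only runs the terra calls when terra is installed.

```r
# NARR coordinate reference system, copied from the SpatRaster printed earlier.
narr_crs <- paste(
  "+proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50",
  "+x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs"
)
locs <- data.frame(id = "001", lon = -78.8277, lat = 35.95013)

if (requireNamespace("terra", quietly = TRUE)) {
  # Build points in longitude/latitude (assumed EPSG:4326), then project
  # them to the raster CRS before buffering.
  pts <- terra::vect(locs, geom = c("lon", "lat"), crs = "EPSG:4326")
  pts_proj <- terra::project(pts, narr_crs)

  # The projected CRS uses meters, so width = 1000 means a 1 km buffer.
  buf <- terra::buffer(pts_proj, width = 1000)
  print(terra::geomtype(buf))
}
```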

Connecting Health Outcomes Research Data Systems

The amadeus package has been developed as part of the National Institute of Environmental Health Sciences' (NIEHS) Connecting Health Outcomes Research Data Systems (CHORDS) program. CHORDS aims to "build and strengthen data infrastructure for patient-centered outcomes research on environment and health" by providing curated data, analysis tools, and educational resources. As the CHORDS project comes to an end in FY26, it is being absorbed into the larger NIH Health and Extreme Weather program and the NIH Accelerator program (https://www.niehs.nih.gov/research/programs/chords/hew-data).

Future Development, Maintenance, and Opportunities for Contribution

amadeus is being actively developed and maintained by the SET group at NIEHS. Future development will focus on expanding the number of data sources and datasets covered, improving the efficiency of download and processing functions, and adding new functionality for calculating covariates and analyzing data.

PI-driven datasets: Many datasets are created by individual researchers. To expand the number of datasets covered by amadeus, we will be adding functions to access and process these datasets. If you are an environmental health researcher with a dataset that you would like to see added to amadeus, please reach out via the issues tab on GitHub and add the tag "new dataset" to your issue.
More options for covariate calculations: Developing the best exposure metric for a given research question is an active area of research in environmental health. To support this research, we will be adding new options for calculating covariates from the processed data. If you have a method for calculating covariates that you would like to see added to amadeus, please reach out via the issues tab on GitHub and add the tag "new covariate calculation" to your issue.
Bug fixes: As with any software, bugs may arise as users interact with the package. We will actively monitor the issues tab on GitHub for bug reports and work to fix reported bugs in a timely manner. If you encounter a bug while using amadeus, please report it via the issues tab on GitHub and add the tag "bug" to your issue.

Additional Resources

The following R packages can also be used to access environmental and weather data in R, but each differs from amadeus in the data sources covered or type of functionality provided.

Package	Source
`dataRetrieval`	USGS Hydrological Data and EPA Water Quality Data
`daymetr`	Daymet
`ecmwfr`	ECMWF Reanalysis v5 (ERA5)
`rNOMADS`	NOAA Operational Model Archive and Distribution System
`sen2r`⁷	Sentinel-2
`eddi`	EDDI
`heat`	[Harmonized Environmental Exposure Aggregation Tools](https://github.com/echolab-stanford)

Contribution and AI use

The long-term sustainability and continued development of amadeus rely on contributions from agentic AI products. GitHub Copilot is currently being used to assist with code development, documentation, and testing. To ensure the quality and reliability of the package, all contributions are reviewed and extensively tested by the maintainers before being merged into the main branch.

To add or edit functionality for new data sources or datasets, open a pull request into the main branch with a detailed description of the proposed changes. Pull requests must pass all status checks, after which they will be approved or rejected by the amadeus authors.

Use Issues to notify the authors of bugs, questions, or recommendations. Identify each issue with the appropriate label to help ensure a timely response.

Footnotes

1. Multi-Resolution Land Characteristics
2. National Aeronautics and Space Administration
3. Socioeconomic Data and Applications Center
4. United States Geological Survey
5. National Centers for Environmental Prediction
6. United States Environmental Protection Agency
7. Archived; no longer maintained.
