This registry exists to help people discover and share datasets that are available via AWS resources. Learn more about sharing data on AWS.
See all usage examples for datasets listed in this registry tagged with sustainability.
You are currently viewing a subset of data tagged with sustainability.
If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.
Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
The Sentinel-2 mission is a land monitoring constellation of two satellites that provide high resolution optical imagery and provide continuity for the current SPOT and Landsat missions. The mission provides a global coverage of the Earth's land surface every 5 days, making the data of great use in on-going studies. L1C data are available from June 2015 globally. L2A data are available from April 2017 over wider Europe region and globally since December 2018.
An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite.
This project creates a S3 repository with imagery acquired by the China-Brazil Earth Resources Satellite (CBERS). The image files are recorded and processed by Instituto Nacional de Pesquisa Espaciais (INPE) and are converted to Cloud Optimized Geotiff format in order to optimize its use for cloud based applications. The repository contains all CBERS-4 MUX, AWFI, PAN5M and PAN10M scenes acquired since the start of the satellite mission and is daily updated with new scenes.
A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.
Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.
The eBird Status and Trends project generates estimates of bird occurrence and abundance at a high spatiotemporal resolution. This dataset represents the primary modeled results from the analysis workflow and are designed for further analysis, synthesis, visualization, and exploration.
The Community Earth System Model (CESM) Large Ensemble Numerical Simulation (LENS) dataset includes a 40-member ensemble of climate simulations for the period 1920-2100 using historical data (1920-2005) or assuming the RCP8.5 greenhouse gas concentration scenario (2006-2100), as well as longer control runs based on pre-industrial conditions. The data comprise both surface (2D) and volumetric (3D) variables in the atmosphere, ocean, land, and ice domains. The total data volume of the original dataset is ~500TB, which has traditionally been stored as ~150,000 individual CF/NetCDF files on disk or magnetic tape made available through the NCAR Climate Data Gateway for download or via web services. NCAR has copied a subset (currently ~70 TB) of CESM LENS data to Amazon S3 as part of the AWS Public Datasets Program. To optimize for large-scale analytics we have represented the data as ~275 Zarr stores format accessible through the Python Xarray library. Each Zarr store contains a single physical variable for a given model run type and temporal frequency (monthly, daily, 6-hourly).
OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3.
Sentinel-1 is a pair of European radar imaging (SAR) satellites launched in 2014 and 2016. Its 6 days revisit cycle and ability to observe through clouds makes it perfect for sea and land monitoring, emergency response due to environmental disasters, and economic applications. GRD data is available globally since January 2017.
Input data for the GEOS-Chem Chemical Transport Model. Including the NASA/GMAO MERRA-2 and GEOS-FP meteorological products, the HEMCO emission inventories, and other small data such as model initial conditions.
Select products from the Moderate Resolution Imaging Spectroradiometer (MODIS) managed by the U.S. Geological Survey and NASA.
The Wind Integration National Dataset (WIND) is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.
Global, aggregated physical air quality data from public data sources provided by government, research-grade and other sources. These awesome groups do the hard work of measuring these data and publicly sharing them, and our community makes them more universally-accessible to both humans and machines.
SILO is a database of Australian climate data from 1889 to the present. It provides continuous, daily time-step data products in ready-to-use formats for research and operational applications. Gridded SILO data in annual NetCDF format are on AWS. Point data are available from the SILO website.
This dataset contains ground motion velocity and acceleration seismic waveforms recorded by the Southern California Seismic Network (SCSN) and archived at the Southern California Earthquake Data Center (SCEDC).
The goal of the USGS 3D Elevation Program (3DEP) is to collect elevation data in the form of light detection and ranging (LiDAR) data over the conterminous United States, Hawaii, and the U.S. territories, with data acquired over an 8-year period. This dataset provides two realizations of the 3DEP point cloud data. The first resource is a public access organization provided in Entwine Point Tiles format, which a lossless, full-density, streamable octree based on LASzip (LAZ) encoding. The second resource is a Requester Pays of the same data in LAZ (Compressed LAS) format. Resource names in both buckets correspond to the USGS project names.
This dataset contains soil infrared spectral data and paired soil property reference measurements for georeferenced soil samples that were collected through the Africa Soil Information Service (AfSIS) project, which lasted from 2009 through 2018. In this release, we include data collected during Phase I (2009-2013.) Georeferenced samples were collected from 19 countries in Sub-Saharan African using a statistically sound sampling scheme, and their soil properties were analyzed using both conventional soil testing methods and spectral methods (infrared diffuse reflectance spectroscopy). The two types of data can be paired to form a training dataset for machine learning, such that certain soil properties can be well-predicted through less expensive spectral techniques.
Global and high-resolution regional atmospheric models from Météo-France.
ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It utilizes the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF's Integrated Forecast System (IFS) Cycle 41r2. The dataset provides all essential atmospheric meteorological parameters like, but not limited to, air temperature, pressure and wind at different altitudes, along with surface parameters like rainfall, soil moisture content and sea parameters like sea-surface temperature and wave height. ERA5 provides data at a considerably higher spatial and temporal resolution than its legacy counterpart ERA-Interim. ERA5 consists of high resolution version with 31 km horizontal resolution, and a reduced resolution ensemble version with 10 members. It is currently available since 2008, but will be continuously extended backwards, first until 1979 and then to 1950. Learn more about ERA5 in Jon Olauson's paper ERA5: The new champion of wind power modelling?.
GOES satellites (GOES-16 & GOES-17) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface. The satellites orbit high enough to allow for a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" for severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track their movements.
The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. This "leaf-on" imagery andtypically ranges from 60 centimeters to 100 centimeters in resolution and is available from the naip-analytic Amazon S3 bucket as 4-band (RGB + NIR) imagery in MRF format, on naip-source Amazon S3 bucket as 4-band (RGB + NIR) in uncompressed Raw GeoTiff format and naip-visualization as 3-band (RGB) Cloud Optimized GeotTiff format. NAIP data is delivered at the state level; every year, a number of states receive updates, with an overall update cycle of two or three years. More details on NAIP
A collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth's surface.
The National Solar Radiation Database (NSRDB) is a serially complete collection of hourly and half-hourly values of the three most common measurements of solar radiation – global horizontal, direct normal, and diffuse horizontal irradiance — and meteorological data. These data have been collected at a sufficient number of locations and temporal and spatial scales to accurately represent regional solar radiation climates.
Grillo has developed an IoT-based earthquake early-warning system, with sensors currently deployed in Mexico, Chile and Costa Rica, and is now opening its entire archive of unprocessed accelerometer data to the world to encourage the development of new algorithms capable of rapidly detecting and characterizing earthquakes in real time.
Meteorological data reusers now have an exciting opportunity to sample, experiment and evaluate Met Office atmospheric model data, whilst also experiencing a transformative method of requesting data via Restful APIs on AWS. For information about the data see the Met Office website. For examples of using the data check out the examples repository. If you need help and support using the data please raise an issue on the examples repository.Please note: Met Office continuously improves and updates its operational forecast models. Our next update becomes effective soon. Please find the details here.
Earth & Atmospheric Sciences at Cornell University has created a public data lake of climate data. The data is stored in columnar storage formats (ORC) to make it straightforward to query using standard tools like Amazon Athena or Apache Spark. The data itself is originally intended to be used for building decision support tools for farmers and digital agriculture. The first dataset is the historical NDFD / NDGD data distributed by NCEP / NOAA / NWS. The NDFD (National Digital Forecast Database) and NDGD (National Digital Guidance Database) contain gridded forecasts and observations at 2.5km resolution for the Contiguous United States (CONUS). There are also 5km grids for several smaller US regions and non-continguous territories, such as Hawaii, Guam, Puerto Rico and Alaska. NOAA distributes archives of the NDFD/NDGD via its NOAA Operational Model Archive and Distribution System (NOMADS) in Grib2 format. The data has been converted to ORC to optimize storage space and to, more importantly, simplify data access via standard data analytics tools.
Global Historical Climatology Network - Daily is a dataset from NOAA that contains daily observations over global land areas. It contains station-based measurements from land-based stations worldwide, about two thirds of which are for precipitation measurement only. Other meteorological elements include, but are not limited to, daily maximum and minimum temperature, temperature at the time of observation, snowfall and snow depth. It is a composite of climate records from numerous sources that were merged together and subjected to a common suite of quality assurance reviews. Some data are more than 175 years old. The data is in CSV format. Each file corresponds to a year from 1763 to present and is named as such.
Air Quality is a global SILAM atmospheric composition and air quality forecast performed on a daily basis for > 100 species and covering the troposphere and the stratosphere. The output produces 3D concentration fields and aerosol optical thickness. The data are unique: 20km resolution for global AQ models is unseen worldwide.
An ongoing collection of radiation and air quality measurements taken by devices involved in the Safecast project.
The S1 Single Look Complex (SLC) dataset contains Synthetic Aperture Radar (SAR) data in the C-Band wavelength. The SAR sensors are installed on a two-satellite (Sentinel-1A and Sentinel-1B) constellation orbiting the Earth with a combined revisit time of six days, operated by the European Space Agency. The S1 SLC data are a Level-1 product that collects radar amplitude and phase information in all-weather, day or night conditions, which is ideal for studying natural hazards and emergency response, land applications, oil spill monitoring, sea-ice conditions, and associated climate change effects.
U.S. Census Bureau American Community Survey (ACS) Public Use Microdata Sample (PUMS) available in a linked data format using the Resource Description Framework (RDF) data model.
The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by the GFS at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which is used by the operational forecasters who predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid point for forecasts between one week and two weeks.
High resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact assessments.
COSMO-D2 high-resolution, short-range numerical weather prediction model for Germany and adjacent countries; regular grid with 2.2km resolution and 65 vertical levels; updated at 00UTC and every following 3h; forecast range 27h (45h for 03UTC); selection of commonly used parameters
COSMO-D2 EPS high-resolution, short-range numerical weather ensemble prediction model for Germany and adjacent countries; 20 ensemble members, regular grid with 2.2km resolution and 65 vertical levels; updated at 00UTC and every following 3h; forecast range 27h (45h for 03UTC); selection of commonly used parameters; ensemble members are bundled in joint grib files
ICON global numerical weather prediction model; average resolution of 13km with 90 vertical levels; udpated at 00UTC and every following 6h with a forecast range of 120h (180h for 00UTC and 12UTC); selection of commonly used parameters
ICON global EPS ensemble prediction model; 40 ensemble members; average resolution of 40km; updated at 00UTC and every following 6h with a forecast range of 120h (extended to 180h for 00UTC and 12UTC); selection of commonly used parameters; ensemble members are bundled in joint grib files
ICON-EU regional numerical weather prediction model; european nesting region with increased resolution of approximately 6.5km with 60 vertical levels; updated at 00UTC and every following 3h with 120h forecast range; selection of commonly used parameters
ICON-EU EPS regional ensemble weather prediction model; 40 ensemble members; European nesting region with increased resolution of approximately 20km; updated at 00UTC and every following 3h with 120h forecast range; selection of commonly used parameters; ensemble members are bundled in joint grib files
This dataset contains historical and projected dynamically downscaled climate data for the State of Alaska and surrounding regions at 20km spatial resolution and hourly temporal resolution. This data was produced using the Weather Research and Forecasting (WRF) model (Version 3.5). We downscaled both ERA-Interim historical reanalysis data (1979-2015) and both historical and projected runs from 2 GCM’s from the Coupled Model Inter-comparison Project 5 (CMIP5): GFDL-CM3 and NCAR-CCSM4 (historical run: 1970-2005 and RCP 8.5: 2006-2100).
Detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.
GSOD is a collection of daily weather measurements (temperature, wind speed, humidity, pressure, and more) from 9000+ weather stations around the world.
HIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.
Population data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines CIESIN’s Gridded Population of the World using machine learning models on high-resolution worldwide Digital Globe satellite imagery. CIESIN population counts aggregated from worldwide census data are allocated to blocks where imagery appears to contain buildings.
In order to support NOAA's homeland security and emergency response requirements, the National Geodetic Survey Remote Sensing Division (NGS/RSD) has the capability to acquire and rapidly disseminate a variety of spatially-referenced datasets to federal, state, and local government agencies, as well as the general public. Remote sensing technologies used for these projects have included lidar, high-resolution digital cameras, a film-based RC-30 aerial camera system, and hyperspectral imagers. Examples of rapid response initiatives include acquiring high resolution images with the Emerge/Applanix Digital Sensor System (DSS).
The Global Ensemble Forecast System (GEFS), previously known as the GFS Global ENSemble (GENS), is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental Prediction (NCEP) started the GEFS to address the nature of uncertainty in weather observations, which is used to initialize weather forecast models. The GEFS attempts to quantify the amount of uncertainty in a forecast by generating an ensemble of multiple forecasts, each minutely different, or perturbed, from the original observations. With global coverage, GEFS is produced four times a day with weather forecasts going out to 16 days.
The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by the GFS at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which is used by the operational forecasters who predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid point for forecasts between one week and two weeks. Both the current version and the FV3-based parallel version of the GFS being tested to become the new operational model at a future date are available.
Global Historical Climatology Network - Hourly is a dataset from NOAA that contains daily observations over global land areas. It contains station-based measurements from land-based stations worldwide, about two thirds of which are for precipitation measurement only. Other meteorological elements include, but are not limited to, daily maximum and minimum temperature, temperature at the time of observation, snowfall and snow depth. It is a composite of climate records from numerous sources that were merged together and subjected to a common suite of quality assurance reviews. Some data are more than 175 years old. The data is in CSV format. Each file corresponds to a year from 1763 to present and is named as such.
Global Hydro-Estimator provides a global mosaic imagery of rainfall estimates from multi-geostationary satellites, which currently includes GOES-16, GOES-15, Meteosat-8, Meteosat-11 and Himawari-8. The GHE products include: Instantaneous rain rate, 1 hour, 3 hour, 6 hour, 24 hour and also multi-day rainfall accumulation.
The HRRR is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-h period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh.
The Integrated Surface Database (ISD) consists of global hourly and synoptic observations compiled from numerous sources into a gzipped fixed width format. ISD was developed as a joint activity within Asheville's Federal Climate Complex. The database includes over 35,000 stations worldwide, with some having data as far back as 1901, though the data show a substantial increase in volume in the 1940s and again in the early 1970s. Currently, there are over 14,000 "active" stations updated daily in the database. The total uncompressed data volume is around 600 gigabytes; however, it continues to grow as more data are added. ISD includes numerous parameters such as wind speed and direction, wind gust, temperature, dew point, cloud data, sea level pressure, altimeter setting, station pressure, present weather, visibility, precipitation amounts for various time periods, snow depth, and various other elements as observed by each station.
The NOAA National Water Model Reanalysis dataset contains output from multi-decade retrospective simulations. These simulations used observed rainfall as input and ingested other required meteorological input fields from a weather reanalysis dataset. The output frequency and fields available in this historical NWM dataset differ from those contained in the real-time forecast model. One application of this dataset is to provide historical context to current real-time streamflow, soil moisture and snowpack NWM conditions. The reanalysis data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and 3-hourly land surface output. The long-term dataset can also be used in the development of end user applications which require a long baseline of data for system training or verification purposes. This dataset contains output from two retrospective simulations. Currently there are two versions of the dataset: A 25-year (January 1993 through December 2017) retrospective simulation using version 1.2 of the National Water Model, and a 26-year (January 1993 through December 2018) retrospective simulation using version 2.0 of the National Water Model.
The National Water Model (NWM) is a water resources model that simulates and forecasts water budget variables, including snowpack, evapotranspiration, soil moisture and streamflow, over the entire continental United States (CONUS). The model, launched in August 2016, is designed to improve the ability of NOAA to meet the needs of its stakeholders (forecasters, emergency managers, reservoir operators, first responders, recreationists, farmers, barge operators, and ecosystem and floodplain managers) by providing expanded accuracy, detail, and frequency of water information. It is operated by NOAA’s Office of Water Prediction. This bucket contains a four-week rollover of the Short Range Forecast model output and the corresponding forcing data for the model. The model is forced with meteorological data from the High Resolution Rapid Refresh (HRRR) and the Rapid Refresh (RAP) models. The Short Range Forecast configuration cycles hourly and produces hourly deterministic forecasts of streamflow and hydrologic states out to 18 hours.
The Operational Forecast System (OFS) has been developed to serve the maritime user community. OFS was developed in a joint project of the NOAA/National Ocean Service (NOS)/Office of Coast Survey, the NOAA/NOS/Center for Operational Oceanographic Products and Services (CO-OPS), and the NOAA/National Weather Service (NWS)/National Centers for Environmental Prediction (NCEP) Central Operations (NCO). OFS generates water level, water current, water temperature, water salinity (except for the Great Lakes) and wind conditions nowcast and forecast guidance four times per day.
OSMLR a linear referencing system built on top of OpenStreetMap. OSM has great information about roads around the world and their interconnections, but it lacks the means to give a stable identifier to a stretch of roadway. OSMLR provides a stable set of numerical IDs for every 1 kilometer stretch of roadway around the world. In urban areas, OSMLR IDs are attached to each block of roadways between significant intersections.
GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America.
Data from the Moderate Resolution Imaging Spectroradiometer (MODIS), managed by the U.S. Geological Survey and NASA. Five products are included: MCD43A4 (MODIS/Terra and Aqua Nadir BRDF-Adjusted Reflectance Daily L3 Global 500 m SIN Grid), MOD11A1 (MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1 km SIN Grid), MYD11A1 (MODIS/Aqua Land Surface Temperature/Emissivity Daily L3 Global 1 km SIN Grid), MOD13A1 (MODIS/Terra Vegetation Indices 16-Day L3 Global 500 m SIN Grid), and MYD13A1 (MODIS/Aqua Vegetation Indices 16-Day L3 Global 500 m SIN Grid). MCD43A4 has global coverage, all time (~21 years). The other products have ~11 years of global coverage. All data files are in single-band cloud-optimized GeoTIFF (COG) format.
Pre and post event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. Also incudes crowdsourced damage assessments for major, sudden onset disasters.