The Registry of Open Data on AWS is now available on AWS Data Exchange
All datasets on the Registry of Open Data are now discoverable on AWS Data Exchange alongside 3,000+ existing data products from category-leading data providers across industries. Explore the catalog to find open, free, and commercial data sets. Learn more about AWS Data Exchange

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

See all usage examples for datasets listed in this registry tagged with biodiversity.


Search datasets (currently 13 matching datasets)

You are currently viewing a subset of data tagged with biodiversity.


Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.


Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

MERRA-2 tavg1_2d_slv_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree

agricultureair temperatureatmospherebiodiversityclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfoceansopendapwater

M2T1NXSLV (or tavg1_2d_slv_Nx) is an hourly time-averaged 2-dimensional data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of meteorology diagnostics at popularly used vertical levels, such as air temperature at 2-meter (or at 10-meter, 850hPa, 500 hPa, 250hPa), wind components at 50-meter (or at 2-meter, 10-meter, 850 hPa, 500hPa, 250 hPa), sea level pressure, surface pressure, and total precipitable water vapor (or ice water, liquid water). The data field is time-stamped with the central time of an hour starting from 00:30 UTC, e.g.: 00:30, 01:30, … , 23:30 UTC.MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with the latency of ~3 weeks after the end of a month. Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.Questions: If you have a question, please read "MERRA-2 File Specification Document", “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the ”Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov). Read our doc on how to get AWS Credentials to retrieve this data: Details →

Usage examples

See 16 usage examples →

MERRA-2 inst3_3d_aer_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Aerosol Mixing Ratio 0.625 x 0.5 degree

agricultureair qualityatmospherebiodiversitycarbonclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NVAER (or inst3_3d_aer_Nv) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of aerosol mixing ratio parameters at 72 model layers, such as dust, sulphur dioxide, sea salt, black carbon, and organic carbon. The data field is available every three hour starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. Section 4.2 of the MERRA-2 File Specification document provides pressure values nominal for a 1000 hPa surface pressure and refers to the top edge of the layer. The lev=1 is for the top layer, and lev=72 is for the bottom (or surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with the latency of ~3 weeks after the end of a month. Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.Questions: If you have a question, please read "MERRA-2 File Specification Document", “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the ”Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov). Read our doc on how to get AWS Credentials to retrieve this data: Details →

Usage examples

See 15 usage examples →

MERRA-2 inst3_3d_asm_Np: 3d,3-Hourly,Instantaneous,Pressure-Level,Assimilation,Assimilated Meteorological Fields

agricultureair temperatureatmospherebiodiversityclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NPASM (or inst3_3d_asm_Np) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of meteorological parameters at 42 pressure levels, such as temperature, wind components, vertical pressure velocity, water vapor, ozone mass mixing ratio, and layer height. The data field is available every three hours starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. The information on the pressure levels can be found in the section 4.2 of the MERRA-2 File Specification document. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with the latency of ~3 weeks after the end of a month. Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.Questions: If you have a question, please read "MERRA-2 File Specification Document", “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the ”Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov). Read our doc on how to get AWS Credentials to retrieve this data: Details →

Usage examples

See 15 usage examples →

MERRA-2 inst3_3d_asm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Assimilated Meteorological Fields 0.625 x 0.5 degree

agricultureair temperatureatmospherebiodiversityclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NVASM (or inst3_3d_asm_Nv) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of meteorological parameters at 72 model layers, such as temperature, wind components, vertical pressure velocity, water vapor, and layer height. The data field is available every three hour starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. Section 4.2 of the MERRA-2 File Specification document provides pressure values nominal for a 1000 hPa surface pressure and refers to the top edge of the layer. The lev=1 is for the top layer, and lev=72 is for the bottom (or surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with the latency of ~3 weeks after the end of a month. Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.Questions: If you have a question, please read "MERRA-2 File Specification Document", “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the ”Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov). Read our doc on how to get AWS Credentials to retrieve this data: Details →

Usage examples

See 15 usage examples →

NOAA Water-Column Sonar Data Archive

biodiversityearth observationecosystemsenvironmentalgeospatialmappingoceans

Water-column sonar data archived at the NOAA National Centers for Environmental Information.

Details →

Usage examples

See 8 usage examples →

Pacific Ocean Sound Recordings

acousticsbiodiversitybiologyclimatecoastaldeep learningecosystemsenvironmentalmachine learningmarine mammalsoceansopen source software

This project offers passive acoustic data (sound recordings) from a deep-ocean environment off central California. Recording began in July 2015, has been nearly continuous, and is ongoing. These resources are intended for applications in ocean soundscape research, education, and the arts.

Details →

Usage examples

See 6 usage examples →

Sea Around Us Global Fisheries Catch Data

biodiversityecosystemsfisheriesmarine

The project presents Sea Around Us Global Fisheries Catch Data aggregated at EEZ level. The data are computed from reconstructed catches from various official fisheries statistics, scientific, technical and policy reports about the fisheries, and includes estimation of discards, unreported and illegal catch data from all maritime countries and major territories of the world.This project was the result of a work between Sea Around Us and the CIC programme, a collaborative programme between the University of British Columbia (UBC) and AWS.

Details →

Usage examples

See 5 usage examples →

SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)

biodiversitybiologyecosystemsimage processingmultimediawildlife

The SiPeCaM goal is to create a data source that allows to evaluate changes in the biodiversity state, considering key aspect of how does the ecosystem behaves.

Details →

Usage examples

See 4 usage examples →

iSDAsoil

agricultureanalyticsbiodiversityconservationdeep learningfood securitygeospatialmachine learningsatellite imagery

iSDAsoil is a resource containing soil property predictions for the entire African continent, generated using machine learning. Maps for over 20 different soil properties have been created at 2 different depths (0-20 and 20-50cm). Soil property predictions were made using machine learning coupled with remote sensing data and a training set of over 100,000 analyzed soil samples. Included in this dataset are images of predicted soil properties, model error and satellite covariates used in the mapping process.

Details →

Usage examples

See 4 usage examples →

CZ Grand Challenges - Imaging MIT Licensed data and models

biodiversitybioinformaticsbiologybiomolecular modelingbrain imagescell biologycell imagingcziimaginglife sciencesmachine learningmicroscopymodelproteinzarr

This dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at CZI and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Details →

Usage examples

See 6 usage examples →

CZ Grand Challenges - Transcriptomic MIT Licensed data and models

biodiversitybiologybiomolecular modelingcell biologyczihdf5life sciencesmachine learningmodelproteintranscriptomics

This dataset contains a transcriptomics biological data and models. The models embed transcriptomic data and facilitate transcriptomic analysis. The data is sourced and curated by a team of experts at CZI and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Details →

Usage examples

See 6 usage examples →

National Herbarium of NSW

agriculturebiodiversitybiologyclimatedigital preservationecosystemsenvironmental

The National Herbarium of New South Wales is one of the most significant scientific, cultural and historical botanical resources in the Southern hemisphere. The 1.43 million preserved plant specimens have been captured as high-resolution images and the biodiversity metadata associated with each of the images captured in digital form. Botanical specimens date from year 1770 to today, and form voucher collections that document the distribution and diversity of the world's flora through time, particularly that of NSW, Austalia and the Pacific.The data is used in biodiversity assessment, syste...

Details →

Usage examples

See 3 usage examples →

Biodiversity Heritage Library Metadata and Page Images

biodiversitybioinformaticslife sciences

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access.

Details →

Usage examples

See 5 usage examples →

GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002

biodiversitycarbondatacenterearth observationenergyglobalhdficelandland coverlidarmetadataorbiturbanwater

The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth’s carbon cycle and biodiversity. The GEDI instrument produces high resolution laser ranging observations of the 3-dimensional structure of the Earth. GEDI is attached to the International Space Station (ISS) and collects data globally between 51.6° N and 51.6° S latitudes at the highest resolution and densest sampling of any light detection and ranging (lidar) instrument in orbit to date. Each GEDI Version 2 granule encompasses one-fourth of an ISS orbit and includes georeferenced metadata to allow for spatial querying and subsetting.The GEDI instrument was removed from the ISS and placed into storage on March 17, 2023. No data were acquired during the hibernation period from March 17, 2023, to April 24, 2024. GEDI has since been reinstalled on the ISS and resumed operations as of April 26, 2024.The purpose of the GEDI Level 2A Geolocated Elevation and Height Metrics product (GEDI02_A) is to provide waveform interpretation and extracted products from each GEDI01_B received waveform, including ground elevation, canopy top height, and relative height (RH) metrics. The methodology for generating the GEDI02_A product datasets is adapted from the Land, Vegetation, and Ice Sensor (LVIS) algorithm. The GEDI02_A product is provided in HDF5 format and has a spatial resolution (average footprint) of 25 meters.The GEDI02_A data product contains 156 layers for each of the eight beams, including ground elevation, canopy top height, relative return energy metrics (e.g., canopy vertical structure), and many other interpreted products from the return waveforms. Additional information for the layers can be found in the GEDI Level 2A Dictionary.Known Issues

  • Data acquisition gaps: GEDI data acquisitions were suspended on December 19, 2019 (2019 Day 353) and resumed on January 8, 2020 (2020 Day 8).
  • Incorrect Reference Ground Track (RGT) number in the filename for select GEDI files: GEDI Science Data Products for six orbits on August 7, 2020, and November 12, 2021, had the incorrect RGT number in the filename. There is no impact to the science data, but users should reference this document for the correct RGT numbers.
  • Known Issues: Section 8 of the User Guide provides additional information on known issues.
Improvements/Changes from Previous Versions

SeeFar V0

biodiversityclimatecoastalearth observationenvironmentalgeospatialglobalmachine learningmappingnatural resourcesatellite imagerysustainability

A collection of multi-resolution satellite images from both public and commercial satellites. The dataset is specifically curated for training geospatial foundation models.

Details →

Usage examples

See 2 usage examples →

Sounds of Central African landscapes

biodiversitybiologyecosystemsgeospatiallandlife sciencesnatural resourcesurvey

Archival soundscapes recorded in the rainforest landscapes of Central Africa, with a focus on the vocalizations of African forest elephants (Loxodonta cyclotis).

Details →

Usage examples

See 2 usage examples →

Global Biodiversity Information Facility (GBIF) Species Occurrences

biodiversitybioinformaticsconservationearth observationlife sciences

The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken a...

Details →

Usage examples

See 1 usage example →

Orcasound - bioacoustic data for marine conservation

biodiversitybiologycoastalconservationdeep learningecosystemsenvironmentalgeospatiallabeledmachine learningmappingoceansopen source softwaresignal processing

Live-streamed and archived audio data (~2018-present) from underwater microphones (hydrophones) containing marine biological signals as well as ambient ocean noise. Hydrophone placement and passive acoustic monitoring effort prioritizes detection of orca sounds (calls, clicks, whistles) and potentially harmful noise. Geographic focus is on the US/Canada critical habitat of Southern Resident killer whales (northern CA to central BC) with initial focus on inland waters of WA. In addition to the raw lossy or lossless compressed data, we provide a growing archive of annotated bioacoustic bouts.

Details →

Usage examples

See 1 usage example →

Australasian Genomes

biodiversitybiologyconservationgeneticgenomiclife sciencestranscriptomicswildlife

Australasian Genomes is the genomic data repository for the Threatened Species Initiative (TSI) and the ARC Centre for Innovations in Peptide and Protein Science (CIPPS). This repository contains reference genomes, transcriptomes, resequenced genomes and reduced representation sequencing data from Australasian species. Australasian Genomes is managed by the Australasian Wildlife Genomics Group (AWGG) at the University of Sydney on behalf of our collaborators within TSI and CIPPS.

Details →

CZ Grand Challenges - Imaging BSD licensed data and models

biodiversitybioinformaticsbiologybiomolecular modelingbrain imagescell biologycell imagingcziimaginglife sciencesmachine learningmicroscopymodelproteinzarr

This dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at CZI and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Details →

Usage examples

See 3 usage examples →

Genome Ark

biodiversitybioinformaticsbiologyconservationgeneticgenomiclife sciences

The Genome Ark hosts genomic information for the Vertebrate Genomes Project (VGP) and other related projects. The VGP is an international collaboration that aims to generate complete and near error-free reference genomes for all extant vertebrate species. These genomes will be used to address fundamental questions in biology and disease, to identify species most genetically at risk for extinction, and to preserve genetic information of life.

Details →

MetaGraph Sequence Indexes

analysis ready databiodiversitybioinformaticsbiologyfastagenomegenomicgraphinformation retrievallife sciencesmedicinemetagenomicsmicrobiometranscriptomicswhole exome sequencingwhole genome sequencing

The MetaGraph Sequence Indexes dataset comprises full-text searchable index files for raw sequencing data hosted in major public repositories. These include the European Nucleotide Archive (ENA) managed by the European Bioinformatics Institute (EMBL-EBI), the Sequence Read Archive (SRA) maintained by the National Center for Biotechnology Information (NCBI), and the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA).All index files can be used with the MetaGraph framework for sequence search. Indexes can be jointly used for aggregated search in the cloud or can be individually downloaded...

Details →

Usage examples

See 3 usage examples →

OpenWings OpenData

biodiversityfastqgeneticgenomelife sciencesmuseumwildlife

DNA sequence data of UCE loci collected from the world's bird species (n=10,560).

Details →

Usage examples

See 3 usage examples →

University of British Columbia Sunflower Genome Dataset

agriculturebiodiversitybioinformaticsbiologyfood securitygeneticgenomiclife scienceswhole genome sequencing

This dataset captures Sunflower's genetic diversity originating from thousands of wild, cultivated, and landrace sunflower individuals distributed across North America.The data consists of raw sequences and associated botanical metadata, aligned sequences (to three different reference genomes), and sets of SNPs computed across several cohorts.

Details →

iNaturalist Licensed Observation Images

biodiversitybioinformaticsconservationearth observationlife sciences

iNaturalist is a community science effort in which participants share observations of living organisms that they encounter and document with photographic evidence, location, and date. The community works together reviewing these images to identify these observations to species. This collection represents the licensed images accompanying iNaturalist observations.

Details →

Ocean Biodiversity Information System (OBIS) species occurrence data

biodiversitycoastalconservationecosystemsenvironmentalgeospatiallife sciencesoceanswater

The Ocean Biodiversity Information System (OBIS) was founded in 2000 under the Census of Marine Life. It is now a programme component of the International Oceanographic Data and Information Exchange (IODE) programme of the Intergovernmental Oceanographic Commission (IOC) of UNESCO. OBIS aims to be the most comprehensive data and information gateway on the diversity, distribution and abundance of marine life to support its Member States in achieving a healthy and resilient ocean ecosystem. The OBIS network consists of over 30 regional and thematic nodes, and provides access to more than 5,000 d...

Details →

Usage examples

See 1 usage example →

OceanOmics

biodiversitybioinformaticsbiologyconservationgeneticgenomiclife sciences

Minderoo Foundation OceanOmics aims to establish environmental DNA (eDNA) as a tool to measure, understand, and protect oceans. OceanOmics mainly generates two types of data: eDNA sequencing data (metabarcoding, metagenomics), and genome assembly data (marine vertebrates).

Details →

Usage examples

See 1 usage example →