Usage examples for all datasets listed in the Registry of Open Data on AWS.


The Cancer Genome Atlas

Tools & Applications
Publications

Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

Tools & Applications
Publications

Common Crawl

Tutorials
Tools & Applications
Publications

Gabriella Miller Kids First Pediatric Research Program (Kids First)

Tools & Applications
Publications

Sentinel-2

Tutorials
Tools & Applications
Publications

Sudachi Language Resources

Tutorials
Tools & Applications
Publications

USGS Landsat

Tutorials
Tools & Applications
Publications

Foldingathome COVID-19 Datasets

Tutorials
Tools & Applications
Publications

Genome Aggregation Database (gnomAD)

Tools & Applications
Publications

NEXRAD on AWS

Tutorials
Tools & Applications
Publications

Terrain Tiles

Tutorials
Tools & Applications
Publications

Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery

Tutorials
Tools & Applications
Publications

NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17

Tutorials
Tools & Applications
Publications

Allen Cell Imaging Collections

Tutorials
Tools & Applications
Publications

International Neuroimaging Data-Sharing Initiative (INDI)

Tutorials
Tools & Applications
Publications

SpaceNet

Tutorials
Tools & Applications
Publications

IRS 990 Filings

Tutorials
Tools & Applications

Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)

Tutorials
Tools & Applications
Publications

RADARSAT-1

Tutorials
Tools & Applications

Sentinel-2 Cloud-Optimized GeoTIFFs

Tutorials
Tools & Applications
Publications

CBERS on AWS

Tutorials
Tools & Applications
Publications

Department of Energy's Open Energy Data Initiative (OEDI)

Tools & Applications
Publications

Open NeuroData

Tutorials
Tools & Applications
Publications

PubSeq - Public Sequence Resource

Tutorials
Tools & Applications
Publications

Cancer Cell Line Encyclopedia (CCLE)

Tools & Applications
Publications

DOE's Water Power Technology Office's (WPTO) US Wave dataset

Tools & Applications
Publications

NOAA Water-Column Sonar Data Archive

Tutorials
Tools & Applications
Publications

NREL Wind Integration National Dataset

Tutorials
Tools & Applications
Publications

World Bank - Light Every Night

Tutorials
Tools & Applications
Publications

Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)

Tools & Applications
Publications

ICGC on AWS

Tutorials
Publications

OpenAQ

Tutorials
Tools & Applications
Publications

Radiant MLHub

Tutorials
Tools & Applications
Publications

1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 and 3.7

Tutorials
Tools & Applications
Publications

BossDB Open Neuroimagery Datasets

Tutorials
Tools & Applications
Publications

Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)

Tools & Applications
Publications

Global Database of Events, Language and Tone (GDELT)

Tutorials
Tools & Applications

Low Altitude Disaster Imagery (LADI) Dataset

Tutorials
Tools & Applications
Publications

NYU Langone & FAIR FastMRI Dataset

Tutorials
Publications

New York City Taxi and Limousine Commission (TLC) Trip Record Data

Tutorials

PoroTomo

Tutorials
Publications

Southern California Earthquake Data

Tutorials
Publications

USGS 3DEP LiDAR Point Clouds

Tutorials
Tools & Applications
Publications

COVID-19 Data Lake

Tutorials
Tools & Applications

CoMMpass from the Multiple Myeloma Research Foundation

Tools & Applications
Publications

Coupled Model Intercomparison Project 6

Tutorials
Publications

Digital Earth Africa Landsat Collection 2 Level 2

Tutorials
Tools & Applications

ECMWF ERA5 Reanalysis

Tutorials

First Street Foundation (FSF) Flood Risk Summary Statistics

Tools & Applications
Publications

NIH NCBI Sequence Read Archive (SRA) on AWS

Tutorials
Tools & Applications
Publications

NOAA Rapid Refresh Forecast System (RRFS) Ensemble [Prototype]

Publications

Normalized Difference Urban Index (NDUI)

Tutorials
Tools & Applications
Publications

OpenStreetMap on AWS

Tutorials
Tools & Applications

Ozone Monitoring Instrument (OMI) / Aura NO2 Tropospheric Column Density

Tutorials
Tools & Applications

Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud

Tools & Applications
Publications

SondeHub Radiosonde Telemetry

Tutorials
Tools & Applications
Publications

3000 Rice Genomes Project

Tools & Applications
Publications

Basic Local Alignment Search Tool (BLAST) Databases

Tools & Applications
Publications

Community Earth System Model Large Ensemble (CESM LENS)

Tutorials
Tools & Applications
Publications

Digital Earth Africa Sentinel-2 Level-2A

Tutorials
Tools & Applications

Encyclopedia of DNA Elements (ENCODE)

Tutorials
Publications

GEOS-Chem Input Data

Tutorials
Publications

Genome in a Bottle on AWS

Tools & Applications
Publications

JMA Himawari-8

Publications

NA-CORDEX - North American component of the Coordinated Regional Downscaling Experiment

Tools & Applications
Publications

OpenCell on AWS

Tools & Applications
Publications

Refgenie reference genome assets

Tutorials
Tools & Applications
Publications

SILO climate data on AWS

Tutorials
Tools & Applications

Sentinel-1

Tools & Applications

Sentinel-3

Tutorials
Tools & Applications
Publications

Sentinel-5P Level 2

Tutorials
Tools & Applications
Publications

UK Biobank Pan-Ancestry Summary Statistics

Tutorials
Tools & Applications
Publications

Yale-CMU-Berkeley (YCB) Object and Model Set

Publications

iSDAsoil

Tutorials
Tools & Applications
Publications

Allen Ivy Glioblastoma Atlas

Tutorials
Tools & Applications
Publications

Allen Mouse Brain Atlas

Tutorials
Tools & Applications
Publications

Beat Acute Myeloid Leukemia (AML) 1.0

Tools & Applications
Publications

COVID-19 Harmonized Data

Tutorials
Tools & Applications

Clinical Trial Sequencing Project - Diffuse Large B-Cell Lymphoma

Tools & Applications
Publications

Deutsche Börse Public Dataset

Tutorials
Tools & Applications

Digital Earth Africa Sentinel-1 Radiometrically Terrain Corrected

Tutorials
Tools & Applications

Distributed Archives for Neurophysiology Data Integration (DANDI)

Tools & Applications

Finnish Meteorological Institute Weather Radar Data

Tutorials

Foundation Medicine Adult Cancer Clinical Dataset (FM-AD)

Tools & Applications
Publications

Global Seasonal Sentinel-1 Interferometric Coherence and Backscatter Data Set

Tutorials
Publications

Japanese Tokenizer Dictionaries

Tutorials
Tools & Applications
Publications

MIMIC-III (‘Medical Information Mart for Intensive Care’)

Tutorials
Tools & Applications

Medical Segmentation Decathlon

Tutorials
Tools & Applications
Publications

NASA NEX

Tutorials
Tools & Applications
Publications

NOAA Global Ensemble Forecast System (GEFS) Re-forecast

Tutorials
Publications

NOAA Global Historical Climatology Network Daily (GHCN-D)

Tutorials

NREL National Solar Radiation Database

Tools & Applications
Publications

National Herbarium of NSW

Tutorials
Publications

OpenEEW

Tutorials
Tools & Applications

Serratus: Ultra-deep Search for Novel Viruses - Versioned Data Release

Tools & Applications
Publications

Sophos/ReversingLabs 20 Million malware detection dataset

Tutorials
Tools & Applications
Publications

Storm EVent ImageRy (SEVIR)

Tutorials
Tools & Applications

The Human Microbiome Project

Publications

Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin

Tools & Applications

1940 Census Population Schedules, Enumeration District Maps, and Enumeration District Descriptions

Tutorials
Tools & Applications

4D Nucleome (4DN)

Tutorials

Africa Soil Information Service (AfSIS) Soil Chemistry

Tutorials
Publications

Amazon Bin Image Dataset

Publications

Atmospheric Models from Météo-France

Tools & Applications

Broad Genome References

Tutorials
Tools & Applications

Cancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical Cancer

Tools & Applications
Publications

Cell Organelle Segmentation in Electron Microscopy (COSEM) on AWS

Publications

Cloud Indexes for Bowtie, Kraken, HISAT, and Centrifuge

Tutorials
Publications

ComStock

Tutorials

Copernicus Digital Elevation Model (DEM)

Tools & Applications

DigitalCorpora

Publications

Hubble Space Telescope Public Data

Tutorials
Publications

NAIP on AWS

Tools & Applications

NOAA Climate Forecast System (CFS)

Publications

NOAA High-Resolution Rapid Refresh (HRRR) Model

Tutorials

NOAA National Water Model Reanalysis

Tutorials
Publications

NOAA Operational Forecast System (OFS)

Tools & Applications
Publications

NOAA World Ocean Database (WOD)

Publications

National Archives Catalog

Tutorials
Tools & Applications

National Cancer Institute Center for Cancer Research - Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression

Tools & Applications
Publications

Open City Model (OCM)

Tutorials

Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset

Tools & Applications
Publications

Pancreatic Cancer Organoid Profiling

Tools & Applications
Publications

RAPID NRT Flood Maps

Publications

REDASA COVID-19 Open Data

Tools & Applications
Publications

Rapid7 FDNS ANY Dataset

Tutorials

RarePlanes

Tools & Applications
Publications

Sentinel-1 SLC dataset for South and Southeast Asia, Taiwan, Korea and Japan

Tutorials
Publications

Sounds of Central African landscapes

Publications

Terra Fusion Data Sampler

Tutorials
Tools & Applications

UK Met Office Atmospheric Deterministic and Probabilistic Forecasts

Tutorials

UniProt

Tutorials

1000 Genomes

Publications

A2D2: Audi Autonomous Driving Dataset

Tutorials

AI2 Diagram Dataset (AI2D)

Publications

AI2 Meaningful Citations Data Set

Publications

AI2 Reasoning Challenge (ARC) 2018

Publications

ARPA-E PERFORM Forecast data

Tools & Applications

AWS iGenomes

Tools & Applications

Allen Brain Observatory - Visual Coding AWS Public Data Set

Tutorials

Amazon-PQA

Publications

Answer Reformulation

Publications

Automatic Speech Recognition (ASR) Error Robustness

Publications

CIViC (Clinical Interpretation of Variants in Cancer)

Publications

CMIP6 GCMs downscaled using WRF

Tutorials

COVID-19 Genome Sequence Dataset

Tools & Applications

Cell Painting Image Collection

Publications

Conformational Space of Short Peptides

Tutorials

CoversBR

Tutorials

Crowdsourced Bathymetry

Tutorials

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

Publications

Discrete Reasoning Over the content of Paragraphs (DROP)

Publications

Enriched Topical-Chat Dataset for Knowledge-Grounded Dialogue Systems

Publications

Ford Multi-AV Seasonal Dataset

Tutorials

GATK Test Data

Tools & Applications

Geosnap Data, Center for Geospatial Sciences

Tools & Applications

Helpful Sentences from Reviews

Publications

Human Cancer Models Initiative (HCMI) Cancer Model Development Center

Tools & Applications

Human PanGenomics Project

Publications

Humor Detection from Product Question Answering Systems

Publications

IDEAM - Colombian Radar Network

Tutorials

Image classification - fast.ai datasets

Tools & Applications

LOFAR ELAIS-N1 cycle 2 observations on AWS

Publications

Low Context Named Entity Recognition (NER) Datasets with Gazetteer

Publications

Multilingual Named Entity Recognition (NER) Datasets with Gazetteer

Publications

NIH NCBI PMC Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS

Tutorials

NOAA Emergency Response Imagery

Publications

NOAA Global Forecast System (GFS)

Publications

NOAA Integrated Surface Database (ISD)

Tutorials

NOAA National Digital Forecast Database (NDFD)

Publications

NOAA S-111 Surface Water Currents Data

Tutorials

NOAA/PMEL Ocean Climate Stations Moorings

Publications

Natural Earth

Publications

New Jersey Statewide Digital Aerial Imagery Catalog

Tutorials

New Jersey Statewide LiDAR

Tutorials

Ohio State Cardiac MRI Raw Data (OCMR)

Tutorials

Oxford Nanopore Technologies Benchmark Datasets

Tutorials

PASS: Perturb-and-Select Summarizer for Product Reviews

Publications

Pre- and post-purchase product questions

Publications

QIIME 2 User Tutorial Datasets

Tutorials

Quoref

Publications

Reasoning Over Paragraph Effects in Situations (ROPES)

Publications

SILAM Air Quality

Tutorials

Safecast

Tools & Applications

Sentinel-2 L2A 120m Mosaic

Tools & Applications

Speedtest by Ookla Global Fixed and Mobile Network Performance Maps

Tutorials

Tabula Muris

Publications

The Multilingual Amazon Reviews Corpus

Publications

Transiting Exoplanet Survey Satellite (TESS)

Publications

U.S. Census ACS PUMS

Tutorials

Voices Obscured in Complex Environmental Settings (VOiCES)

Tutorials

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

Publications

Xiph.Org Test Media

Tutorials

ZINC Database

Publications

iHART Whole Genome Sequencing Data Set

Publications

AgricultureVision

Publications

Binding DB

Tutorials
Publications

ChEMBL

Tutorials
Publications

ClinVar

Tutorials
Publications

Covid Job Impacts - US Hiring Data Since March 1 2020

Tutorials
Tools & Applications

Open Targets

Tutorials
Publications

1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 and 3.7 - Data Lakehouse Ready

Tutorials

COVID-19 Open Research Dataset (CORD-19)

Tools & Applications

Corn Kernel Counting Dataset

Publications

Genome Aggregation Database (gnomAD) - Data Lakehouse ready

Tutorials

Google Brain Genomics Sequencing Dataset for Benchmarking and Development

Publications

High-Order Accurate Direct Numerical Simulation of Flow over an MTU-T161 Low Pressure Turbine Blade

Publications

Longitudinal Nutrient Deficiency

Publications

MODIS MYD13A1, MOD13A1, MYD11A1, MOD11A1, MCD43A4

Tools & Applications

OpenSurfaces

Publications

Swiss Public Transport Stops

Tools & Applications

If you want to add a dataset or usage example to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.
