Registry of Open Data on AWS

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

Get started using data quickly by viewing all tutorials with associated SageMaker Studio Lab notebooks.

See all usage examples for datasets listed in this registry.

See datasets from EPA, Allen Institute for Artificial Intelligence (AI2), Biohub, Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative.


Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.

Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

The Human Sleep Project

bioinformatics, deep learning, life sciences, machine learning, medicine, neurophysiology, neuroscience

The Human Sleep Project (HSP) sleep physiology dataset is a growing collection of clinical polysomnography (PSG) recordings. Beginning with PSG recordings from ~15K patients evaluated at the Massachusetts General Hospital, the HSP will grow over the coming years to include data from >200K patients, as well as people evaluated outside of the clinical setting. This data is being used to develop CAISR (Complete AI Sleep Report), a collection of deep neural networks, rule-based algorithms, and signal processing approaches designed to provide better-than-human detection of conventional PSG...

Usage examples

Applications of a Capacitor-Based Respiratory Position Sensing Device: Implications for Radiation Therapy. Austin Journal of Medical Oncology 2014;1(2). PMCID: PMC6956860. by Weng Y, Westover MB, Speier C, Sharp G, Bianchi MT, Westover KD.
Automated Sleep Apnea Quantification Based on Respiratory Movement. International Journal of Medical Sciences 2014; 11(8):796-802. PMCID: PMC4057486. by Bianchi MT, Lipoma T, Darling C, Alameddine Y, Westover MB.
Sleep EEG-based Brain Age Index is a Biomarker for Dementia. JAMA Network Open. 2020 Sep 1;3(9):e2017357. doi: 10.1001/jamanetworkopen.2020.17357. PMID: 32986106; PMCID: PMC7522697. by Ye E, Sun H, Leone MJ, Paixao L, Thomas RJ, Lam AD, et al.
HIV Increases Sleep-based Brain Age Despite Antiretroviral Therapy. SLEEP. 2021 Mar 30:zsab058. doi: 10.1093/sleep/zsab058. Epub ahead of print. PMCID: PMC8361332. by Leone MJ*, Sun H*, Boutros CL, Liu L, Ye E, Sullivan L, et al.
Decision modeling in sleep apnea: the critical roles of pre-test probability, cost of untreated OSA, and time horizon. Journal of Clinical Sleep Medicine. 2016 Mar;12(3):409-418. PMCID: PMC4773629. by Moro M, Westover MB, Kelly J, Bianchi MT.

See 37 usage examples →

Common Crawl

encyclopedic, internet, natural language processing, web archive

A corpus of web crawl data composed of over 300 billion web pages.

Usage examples

Mapping languages: The Corpus of Global Language Use by Jonathan Dunn
Language is not all you need: aligning perception with language models by Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, et al
C4Corpus: Multilingual Web-Size Corpus with Free License by Ivan Habernal, Omnia Zayed, Iryna Gurevych
Web Data Commons - RDFa, microdata, and microformat data sets by Christian Bizer, Robert Meusel, Anna Primpeli
Index fun by Philippe Suter

See 36 usage examples →

The Cancer Genome Atlas

cancer, genomic, life sciences, STRIDES, whole genome sequencing

The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer. TCGA has analyzed matched tumor and normal tissues from 11,000 patients, allowing for the comprehensive characterization of 33 cancer types and subtypes, including 10 rare cancers. The dataset contains open Clinical Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantificati...

Usage examples

TCGA Cancers Selected for Study by National Cancer Institute
Genomic and Functional Approaches to Understanding Cancer Aneuploidy by Alison M. Taylor, Juliann Shih, et al.
Molecular Characterization and Clinical Relevance of Metabolic Expression Subtypes in Human Cancers by Xinxin Peng, Zhongyuan Chen, et al.
Cancer Genomics Cloud by Seven Bridges
The Immune Landscape of Cancer by Vésteinn Thorsson, David L. Gibbs, et al.

See 29 usage examples →

CCRS MODIS albedo over Canada | Albédo MODIS du CCT couvrant le Canada

analysis ready data, broadband, cog, earth observation, satellite imagery

Time series of 10-day spectral and broadband albedo products derived at 250-m spatial resolution over Canadian territory and neighboring areas, produced at the Canada Centre for Remote Sensing (CCRS) since February 2000 using MODIS L1B C6.1 swath imagery as input. The imagery for all spectral bands was downscaled and re-projected into the Lambert Conformal Conic (LCC) projection at 250-m spatial resolution. The area size is 5,700 km x 4,800 km (22,800 pixels x 19,200 lines). Séries temporelles de produits d’albédo spectral et à large bande générés à des intervalles de 10 jours avec une résolut...
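The grid dimensions quoted in the description above follow directly from the stated extent and pixel size. A minimal sketch of that arithmetic, assuming a regular grid with square 250-m pixels (the function name grid_shape is illustrative and not part of any CCRS tooling):

```python
# Values taken from the dataset description; the helper itself is hypothetical.
EXTENT_KM = (5700, 4800)   # east-west and north-south extent in kilometres
RESOLUTION_M = 250         # pixel size in metres


def grid_shape(extent_km, resolution_m):
    """Return (pixels per line, number of lines) for a regular grid."""
    width_km, height_km = extent_km
    pixels = width_km * 1000 // resolution_m
    lines = height_km * 1000 // resolution_m
    return pixels, lines


pixels, lines = grid_shape(EXTENT_KM, RESOLUTION_M)
print(pixels, lines)
```

The result matches the 22,800 x 19,200 grid stated in the description.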

Usage examples

Landfast ice mapping using MODIS clear-sky composites: application for the Banks Island coastline in Beaufort Sea and comparison with Canadian Ice Service data by Trishchenko, A.P., Luo, Y.
Annual mapping of large Forest disturbances across Canada's forests using 250 m MODIS imagery from 2000 to 2011 by Guindon, L., Bernier, P.Y., Beaudoin, A., Pouliot, D., Villemaire, P., Hall, R.J., Latifovic, R., St-Amant, R.
Perennial snow and ice variations (2000-2008) in the Arctic circumpolar land area from satellite observations by Fontana F.M.A., Trishchenko A.P., Luo Y., Khlopenkov K.V., Nussbaumer S.U., Wunderle S.
Warm season snow/ice probability maps from modis and viirs sensors over Canada by Trishchenko, Alexander P., Ungureanu, Calin
CCRS MODIS albedo over Canada on GEO.ca | Albédo MODIS du CCT couvrant le Canada sur GEO.ca by Canada Centre for Remote Sensing | Centre canadien de télédétection

See 24 usage examples →

Foldingathome COVID-19 Datasets

alchemical free energy calculations, biomolecular modeling, coronavirus, COVID-19, foldingathome, health, life sciences, molecular dynamics, protein, SARS-CoV-2, simulations, structural biology

Folding@home is a massively distributed computing project that uses biomolecular simulations to investigate the molecular origins of disease and accelerate the discovery of new therapies. Run by the Folding@home Consortium, a worldwide network of research laboratories focusing on a variety of different diseases, Folding@home seeks to address problems in human health on a scale that is infeasible by any other means, sharing the results of these large-scale studies with the research community through peer-reviewed publications and publicly shared datasets. During the COVID-19 epidemic, Folding@h...

Usage examples

See 24 usage examples →

Sentinel-2

agriculture, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. L1C data are available globally from June 2015. L2A data are available from November 2016 over the Europe region and globally since January 2017.

Usage examples

How to Work with Landsat and Sentinel-2 on AWS with Python by Martin D. Maas
Integrate imagery from the Sentinel-2 archive into your own apps, maps, and analysis with the Sentinel-2 image service by Esri
Planet Insights Platform by Planet
EO Browser by Sinergise
Coral-spawn slicks: Reflectance spectra and detection using optical satellite data by Hiroya Yamano, Asahi Sakuma, Saki Harii

See 24 usage examples →

Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

cancer, genomic, life sciences, STRIDES, whole genome sequencing

Therapeutically Applicable Research to Generate Effective Treatments (TARGET) is the collaborative effort of a large, diverse consortium of extramural and NCI investigators. The goal of the effort is to accelerate molecular discoveries that drive the initiation and progression of hard-to-treat childhood cancers and facilitate rapid translation of those findings into the clinic. TARGET projects provide comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of childhood cancers. The dataset contains open Clinical Supplement, Biospecimen...

Usage examples

MicroRNA Expression-Based Model Indicates Event-Free Survival in Pediatric Acute Myeloid Leukemia by Lim EL, Trinh DL, Ries RE, et al.
ISB Cancer Genomics Cloud by Institute for Systems Biology
CSF3R mutations have a high degree of overlap with CEBPA mutations in pediatric AML by Maxson JE, Ries RE, Wang YC, et al.
Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia by Yang JJ, Cheng C, Devidas M, et al.
Identification and analyses of extra-cranial and cranial rhabdoid tumor molecular subgroups reveal tumors with cytotoxic T cell infiltration by Hye-Jung E. Chun, Pascal D. Johann, Katy Milne et al.

See 24 usage examples →

USGS Landsat

agriculture, cog, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

This joint NASA/USGS program provides the longest continuous space-based record of Earth’s land in existence. Every day, Landsat satellites provide essential information to help land managers and policy makers make wise decisions about our resources and our environment. Data is provided for Landsats 1, 2, 3, 4, 5, 7, 8, and 9 (excludes Landsat 6). As of June 28, 2023 (announcement), the previous single SNS topic arn:aws:sns:us-west-2:673253540267:public-c2-notify was replaced with three new SNS topics for different types of scenes.
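The SNS topic named above is identified by an Amazon Resource Name (ARN), a colon-separated string of the form arn:partition:service:region:account-id:resource. As a rough sketch, the fields can be split out as follows. The Arn type and parse_arn helper are illustrative names, not part of any AWS SDK, and the ARN shown is the retired topic quoted in the description, used only as sample input:

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Fields of a colon-separated Amazon Resource Name (hypothetical helper type)."""
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields; raise ValueError if it is malformed."""
    # maxsplit=5 keeps any colons inside the resource field intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


# The previous (now replaced) Landsat notification topic from the description.
old_topic = parse_arn("arn:aws:sns:us-west-2:673253540267:public-c2-notify")
print(old_topic.service, old_topic.region, old_topic.resource)
```

The names of the three replacement topics are not given in the description, so they are not shown here.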

Usage examples

See 23 usage examples →

Allen Cell Imaging Collections

biology, cell biology, cell imaging, Homo sapiens, image processing, life sciences, machine learning, microscopy

This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below:

Field of view or cropped images of cells
Segmentations of structures in the images (e.g., boundaries of cells, DNA, other intracellular structures, etc.)
Processed versions of the above images and segmentations
Machine learning predictions and labels of the data listed above
Models trained on the previously listed data
Additional supporting non-image data related to the above listed data types (e.g., gene expression data, whole genome sequenc

...

Usage examples

See 20 usage examples →

Sudachi Language Resources

natural language processing

Japanese dictionaries and pre-trained models (word embeddings and language models) for natural language processing. SudachiDict is the dictionary for a Japanese tokenizer (morphological analyzer) Sudachi. chiVe is Japanese pretrained word embeddings (word vectors), trained using the ultra-large-scale web corpus NWJC by National...

Usage examples

SudachiPy Tutorial by Works Applications
analysis-sudachi Tutorial by Works Applications
Sudachi Tutorial by Works Applications
Kintoki: Dependency Parser by Works Applications
analysis-sudachi: Sudachi plugin for Elasticsearch by Works Applications

See 20 usage examples →

CELLxGENE Discover Census

Biohub, bioinformatics, cell biology, life sciences, single-cell transcriptomics, transcriptomics

CELLxGENE Discover (cellxgene.cziscience.com) is a free-to-use platform for the exploration, analysis, and retrieval of single-cell data. CELLxGENE Discover hosts the largest aggregation of standardized single-cell data from the major human and mouse tissues, with modalities that include gene expression, chromatin accessibility, DNA methylation, and spatial transcriptomics. This year, CELLxGENE Discover has made available all of its human and mouse RNA single-cell data through Census (https://chanzuckerberg.github.io/cellxgene-census/) – a free-to-use service with an API and data that allows f...

Usage examples

Out-of-core (incremental) mean and variance calculation by Biohub
Understanding and filtering out duplicate cells by Biohub
Normalizing full-length gene sequencing data by Biohub
scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI. by Haotian Cui, et al.
CELLxGENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data by Biohub Single-Cell Biology et al.

See 19 usage examples →

Gabriella Miller Kids First Pediatric Research Program (Kids First)

cancer, genetic, genomic, Homo sapiens, life sciences, pediatric, STRIDES, structural birth defect, whole genome sequencing

The NIH Common Fund's Gabriella Miller Kids First Pediatric Research Program’s (“Kids First”) vision is to “alleviate suffering from childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of these diseases and by supporting data sharing within the pediatric research community.” The program continues to generate and share whole genome sequence data from thousands of children affected by these conditions, ranging from rare pediatric cancers, such as osteosarcoma, to more prevalent diagnoses, such as congenital heart defects. In 2018, Kids Fi...

Usage examples

Genome-wide Enrichment of De Novo Coding Mutations in Orofacial Cleft Trios. by Madison R Bishop, Kimberly K Diaz Perez, et al.
CAVATICA by Seven Bridges Genomics
Kids First DRC Portal by Kids First DRC
Development and Clinical Validation of a Large Fusion Gene Panel for Pediatric Cancers. by Fengqi Chang, Fumin Lin, et al.
Whole genome sequencing of orofacial cleft trios from the Gabriella Miller Kids First Pediatric Research Consortium identifies a new locus on chromosome 21. by Nandita Mukhopadhyay, Madison Bishop, et al.

See 19 usage examples →

NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 & 19

agriculture, disaster response, earth observation, geospatial, meteorological, satellite imagery, weather

NEW GOES-19 Data!! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distri...

Usage examples

Visualize GOES-16 in Python using Xarray by Hamed Alemohammad
Observations of lightning in relation to transitions in volcanic activity during the 3 June 2018 Fuego Eruption by Christopher J. Schultz, Virginia P. Andrews, Kimberly D. Genareau, and Aaron R. Naeger
Beginner’s Guide to GOES-R Series Data by Danielle Losos
Billions of Birds Migrate. Where Do They Go? by National Geographic
Comparison of Lightning Forecasts from the High-Resolution Rapid Refresh Model to Geostationary Lightning Mapper Observations by Brian K. Blaylock, and John D. Horel

See 19 usage examples →

Sentinel-2 Cloud-Optimized GeoTIFFs

agriculture, cog, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. This dataset is the same as the Sentinel-2 dataset, except the JP2K files were converted into Cloud-Optimized GeoTIFFs (COGs). Additionally, SpatioTemporal Asset Catalog metadata is provided in a JSON file alongside the data, and a STAC API called Earth-search is freely available t...

Usage examples

Monitoring Lake Mead drought using the new Amazon SageMaker geospatial capabilities by Xiong Zhou, Anirudh Viswanathan, Erran Li, Trenton Lipscomb, and Xingjian Shi
SITS - Satellite Image Time Series Analysis for Earth Observation Data Cubes by e-Sensing
rio-tiler-pds by Vincent Sarago, et al.
Monitoring of methane (CH4) emission point sources on AWS by Janosch Woschitz, Karsten Schroer
STAC, COG, Python and QGIS by Andrew Cutts

See 19 usage examples →

Terrain Tiles

agriculture, disaster response, earth observation, elevation, geospatial

A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.

Usage examples

See 19 usage examples →

NASA Prediction of Worldwide Energy Resources (POWER)

agriculture, air quality, analytics, archives, atmosphere, climate, climate model, data assimilation, deep learning, earth observation, energy, environmental, forecast, geoscience, geospatial, global, history, imaging, industry, machine learning, machine translation, metadata, meteorological, model, netcdf, opendap, radiation, satellite imagery, solar, statistics, sustainability, time series forecasting, water, weather, zarr

NASA's goal in Earth science is to observe, understand, and model the Earth system to discover how it is changing, to better predict change, and to understand the consequences for life on Earth. The Applied Sciences Program, within the Earth Science Division of the NASA Science Mission Directorate, serves individuals and organizations around the globe by expanding and accelerating societal and economic benefits derived from Earth science, information, and technology research and development.

The Prediction Of Worldwide Energy Resources (POWER) Project, funded through the Applied Sciences Program at ...

Usage examples

About the Prediction Of Worldwide Energy Resources (POWER) Project ArcGIS StoryMap. by The POWER Project
Evaluation of Satellite-Based, Modeled-Derived Daily Solar Radiation Data for the Continental United States by White, J. W., G. Hoogenboom, P. W. Wilkens, P. W. Stackhouse, and J. M. Hoel
Accessing and Subsetting POWER data, from AWS S3 using Python. by The POWER Project
CEOS contributions to informing energy management and policy decision making using space-based Earth observations by Eckman, R. S., and P. W. Stackhouse
A solar azimuth formula that renders circumstantial treatment unnecessary without compromising mathematical rigor: Mathematical setup, application and extension of a formula based on the subsolar point and atan2 function by Zhang, T., P. W. Stackhouse, B. Macpherson, and J. C. Mikovitz

See 18 usage examples →

NEXRAD on AWS

agriculture, earth observation, meteorological, natural resource, weather

Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network....

Usage examples

Level 2 Interface Control Document for Message Data Formats: Build 18 by NOAA ROC
Unlocking the Potential of NEXRAD Data through NOAA’s Big Data Partnership by Steve Ansari and Stephen Del Greco
Observed Concentric Eyewalls of Supertyphoon Hinnamnor by Sachie Kanada and Akira Nishii
Level 2 Interface Control Document for Transfer: Build 18 by NOAA ROC
Seasonal abundance and survival of North America’s migratory avifauna determined by weather radar by Adriaan M. Dokter, Andrew Farnsworth, Daniel Fink, Viviana Ruiz-Gutierrez, Wesley M. Hochachka, Frank A. La Sorte, Orin J. Robinson, Kenneth V. Rosenberg & Steve Kelling

See 18 usage examples →

1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4

bam, bioinformatics, biology, cram, genetic, genomic, genotyping, life sciences, machine learning, population genetics, short read sequencing, structural variation, tertiary analysis, variant annotation, whole genome sequencing

Overview

This dataset contains alignment files and small variant (includes single nucleotide variants (SNV) and indels), copy number variant (CNV), short tandem repeat (i.e., repeat expansion; STR), structural variant (SV) and other variant call files from the 1000 Genomes Project (1KGP) Phase 3 dataset (3,202 individuals, 602 trios) using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software. All DRAGEN analyses were performed in the cloud using the Illumina Connected Analytics bioinformatics platform powered by Amazon Web Services (see 'Data solution empowering population genomics' for more infor

...

Usage examples

Data solution empowering population genomics by Illumina Inc. (2021)
Unveiling Illumina Connected Annotations: A breakthrough in genomic annotation by Illumina Inc. (2024)
Demystifying the versions of GRCh38/hg38 reference genomes, how they are used in DRAGEN and their impact on accuracy by Illumina Inc. (2021)
DRAGEN on AWS by Illumina Inc.
DRAGEN Bio-IT Platform by Illumina Inc.

See 17 usage examples →

Cell Painting Gallery

bioinformatics, biology, cancer, cell biology, cell imaging, cell painting, chemical biology, computer vision, csv, deep learning, fluorescence imaging, genetic, high-throughput imaging, image processing, image-based profiling, imaging, life sciences, machine learning, medicine, microscopy, organelle

The Cell Painting Gallery is a collection of image datasets created using the Cell Painting assay. The images of cells are captured by microscopy imaging, and reveal the response of various labeled cell components to whatever treatments are tested, which can include genetic perturbations, chemicals or drugs, or different cell types. The datasets can be used for diverse applications in basic biology and pharmaceutical research, such as identifying disease-associated phenotypes, understanding disease mechanisms, and predicting a drug’s activity, toxicity, or mechanism of action (Chandrasekaran et al 2020). This collection is maintained ...

Usage examples

Image-based Profiling Handbook - for processing image-based profiling datasets using CellProfiler and pycytominer by Multiple Authors
Multiplex Cytological Profiling Assay to Measure Diverse Cellular States by Gustafsdottir SM, Ljosa V, Sokolnicki KL, Wilson JA, Walpita D, Kemp MM, Seiler KP, Carrel HA, Golub TR, Schreiber SL, Clemons PA, Carpenter AE, and Shamji AF
Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes by Bray M-A, Singh S, Han H, Davis CT, Borgeson B, Hartland C, Kost-Alimova M, Gustafsdottir SM, Gibson CC, & Carpenter AE
Scientific Community Image Forum - for software-oriented aspects of scientific imaging including analysis, processing, and acquisition by Multiple Authors
Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling by Wawer MJ, Li K, Gustafsdottir SM, Ljosa V, Bodycombe NE, Marton MA, Sokolnicki KL, Bray M-A, Kemp MM, Winchester E, Taylor B, Grant GB, Hon CSK, Duvall JR, Wilson JA, Bittker JA, Dancik V, Narayan R, Subramanian A, Winckler W, Golub TR, Carpenter AE, Shamji AF, Schreiber SL, & Clemons PA

See 17 usage examples →

Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE)

cog, earth observation, geophysics, geospatial, global, ice, netcdf, satellite imagery, stac, zarr

The Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project has a singular mission: to accelerate ice sheet and glacier research by producing globally comprehensive, high resolution, low latency, temporally dense, multi-sensor records of land ice and ice shelf change while minimizing barriers between the data and the user. ITS_LIVE data currently consists of NetCDF Level 2 scene-pair ice flow products posted to a standard 120 m grid derived from Landsat 4/5/7/8/9, Sentinel-2 optical scenes, and Sentinel-1 SAR scenes. We have processed all land-ice intersecting image pai...

Usage examples

ITS_LIVE_TOOL: A package designed to aid users working with the ITS_LIVE data by Victor Devaux-Chupin and Emma Marshall
Widespread slowdown in thinning rates of West Antarctic Ice Shelves by Paolo, F. S., A.S. Gardner, C.A. Greene, J. Nilsson, M.P. Schodlok, N.-J. Schlegel, and H.A. Fricker
ITS_LIVE Data Access Portal by NSIDC
Increased West Antarctic and unchanged East Antarctic ice discharge over the last 7 years by Gardner, A. S., G. Moholdt, T. Scambos, M. Fahnstock, S. Ligtenberg, M. van den Broeke, and J. Nilsson
ITS_LIVE Point Data Access by Maria Liukis, Alex S. Gardner, Luis A. López, Mark Fahnestock, and Joseph H. Kennedy

See 16 usage examples →

MERRA-2 tavg1_2d_slv_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree

agriculture, air temperature, atmosphere, biodiversity, climate, coastal, datacenter, ecosystems, global, hydrology, ice, land, metadata, netcdf, oceans, opendap, water

M2T1NXSLV (or tavg1_2d_slv_Nx) is an hourly time-averaged 2-dimensional data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of meteorology diagnostics at popularly used vertical levels, such as air temperature at 2-meter (or at 10-meter, 850 hPa, 500 hPa, 250 hPa), wind components at 50-meter (or at 2-meter, 10-meter, 850 hPa, 500 hPa, 250 hPa), sea level pressure, surface pressure, and total precipitable water vapor (or ice water, liquid water). The data field is time-stamped with the central time of an hour starting from 00:30 UTC, e.g.: 00:30, 01:30, … , 23:30 UTC. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) us...
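The time-stamping convention described above (each hourly average labelled with the centre of its hour) can be illustrated with a short sketch. The function name and the example date are assumptions for illustration only and do not come from the MERRA-2 documentation:

```python
from datetime import datetime, timedelta, timezone


def hourly_center_times(year, month, day):
    """Return the 24 central timestamps (00:30 ... 23:30 UTC) for one day."""
    first = datetime(year, month, day, 0, 30, tzinfo=timezone.utc)
    return [first + timedelta(hours=h) for h in range(24)]


# Hypothetical example date.
stamps = hourly_center_times(2020, 1, 1)
print(stamps[0].strftime("%H:%M"), stamps[-1].strftime("%H:%M"), len(stamps))
```

Each timestamp marks the midpoint of the hour being averaged, so the 00:30 record covers 00:00 to 01:00 UTC.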

Usage examples

Design and implementation of components in the Earth System Modeling Framework by Collins, N., G. Theurich, C. DeLuca, M. Suarez, A. Trayanov, V. Balaji, P. Li, W. Yang, C. Hill, and A. da Silva
Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances by Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts
Documentation and Validation of the Goddard Earth Observing System (GEOS) Data Assimilation System - Version 4 by Bloom, S., A. da Silva, D. Dee, M. Bosilovich, J.-D. Chern, S. Pawson, S. Schubert, M. Sienkiewicz, I. Stajner, W.-W. Tan, M.-L. Wu
Land Surface Precipitation in MERRA-2. by Reichle, R.H., Q. Liu, R.D. Koster, C.S. Draper, S.P.P. Mahanama, and G.S. Partyka
The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). by Gelaro, R., W. McCarty, M. J. Suárez, R. Todling, A. Molod, L. Takacs, C. A. Randles, A. Darmenov, M. G. Bosilovich, R. Reichle, et al.

See 16 usage examples →

ESA WorldCover

agriculture, cog, disaster response, earth observation, geospatial, land cover, land use, machine learning, mapping, natural resource, satellite imagery, stac, sustainability, synthetic aperture radar

The European Space Agency (ESA) WorldCover product provides global land cover maps for 2020 & 2021 at 10 m resolution based on Copernicus Sentinel-1 and Sentinel-2 data. The WorldCover product comes with 11 land cover classes and has been generated in the framework of the ESA WorldCover project, part of the 5th Earth Observation Envelope Programme (EOEP-5) of the European Space Agency. A first version of the product (v100), containing the 2020 map was released in October 2021. The 2021 map was released in October 2022 using an improved algorithm (v200). The WorldCover 2020 and 2021 maps we...

Usage examples

ESA WorldCover 10 m 2021 v200 by Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S., Lesiv, M., Herold, M., Tsendbazar, N.E., Xu, P., Ramoino, F., Arino, O.
ESA Viewer 2021 by ESA
Release of the 10 m WorldCover map by Ruben Van De Kerchove
TerraScope Viewer by TerraScope
The world's most populated and greenest megacities (and how we found out) by Michael Dangermond, Emily Meriam

See 15 usage examples →

Genome Aggregation Database (gnomAD)

bioinformatics, genetic, genomic, life sciences, population, population genetics, short read sequencing, whole genome sequencing

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. The summary data provided here are released for the benefit of the wider scientific community without restriction on use. The v4.1 data set (GRCh38) spans 730,947 exome sequences and 76,215 whole-genome sequences from unrelated individuals of diverse ancestries, sequenced as part of various disease-specific and population genetic studies. The gnomAD Principal Investigators and team can be found...

Usage examples

Evaluating potential drug targets through human loss-of-function genetic variation. Nature 581, 459–464 (2020) by Minikel, E. V., Karczewski, K. J., Martin, H. C., Cummings, B. B., Whiffin, N., Rhodes, D., Alföldi, J., Trembath, R. C., van Heel, D. A., Daly, M. J., Genome Aggregation Database Production Team, Genome Aggregation Database Consortium, Schreiber, S. L., & MacArthur, D. G.
A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020) by Collins, R. L., Brand, H., Karczewski, K. J., Zhao, X., Alföldi, J., Francioli, L. C., Khera, A. V., Lowther, C., Gauthier, L. D., Wang, H., Watts, N. A., Solomonson, M., O’Donnell-Luria, A., Baumann, A., Munshi, R., Walker, M., Whelan, C., Huang, Y., Brookings, T., ... Talkowski, M. E.
Technical artifact drives apparent deviation from Hardy-Weinberg equilibrium at CCR5-∆32 and other variants in gnomAD. bioRxiv (p. 784157) by Karczewski, K. J., Gauthier, L. D., Daly, M. J.
gnomAD v3.0 by Laurent Francioli, Daniel MacArthur
Hail utilities for gnomAD by gnomAD Production Team

See 15 usage examples →

GeoNet Aotearoa New Zealand Data

broadband, coastal, Continuously Operating Reference Station (CORS), earthquakes, geophysics, geoscience, GNSS, GPS, oceans, RINEX, seismology

GeoNet provides geological hazard information for Aotearoa New Zealand. This dataset contains data and products recorded by the GeoNet sensor network.

GNSS (Global Navigation Satellite System) data include raw data in proprietary and Receiver Independent Exchange Format (RINEX) and local tie-in surveys conducted during equipment changes. More details can be found on the GeoNet geodetic page.
Coastal gauge data include relative measurements of sea level made by tsunami monitoring gauges. Raw and quality control data are provided in CREX format (Character Form for the Representation and eXchange of meteorological data), mo...

Usage examples

The New Zealand National Seismograph Network by T. Petersen, K. Gledhill, M. Chadwick, N. Gale, J. Ristau
GeoNet Seismic Digital Waveform dataset by GNS Science - Te Pū Ao
GeoNet datasets available via AWS Open Data Programme by GNS Science - Te Pū Ao
Available GeoNet GNSS event high rate data by GNS Science - Te Pū Ao
GeoNet Deep-ocean Assessment and Reporting of Tsunami (DART) Dataset by GNS Science - Te Pū Ao

See 15 usage examples →

MERRA-2 inst3_3d_aer_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Aerosol Mixing Ratio 0.625 x 0.5 degree

agricultureair qualityatmospherebiodiversitycarbonclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NVAER (or inst3_3d_aer_Nv) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of aerosol mixing ratio parameters at 72 model layers, such as dust, sulphur dioxide, sea salt, black carbon, and organic carbon. The data field is available every three hours starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. Section 4.2 of the MERRA-2 File Specification document provides pressure values nominal for a 1000 hPa surface pressure and refers to the top edge of the layer. lev=1 is the top layer, and lev=72 is the bottom (or surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced ...
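
The time and level conventions described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the date is arbitrary, and the mapping from `lev` to an array index assumes the levels are stored in order from lev=1 to lev=72, which should be checked against the actual files.

```python
from datetime import datetime, timedelta, timezone

N_LEVELS = 72  # model layers in this collection, per the description above


def synoptic_times(day):
    """Return the eight 3-hourly UTC timestamps (00:00 ... 21:00) for a date."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return [start + timedelta(hours=3 * i) for i in range(8)]


def lev_to_index(lev):
    """Map a 1-based lev value (1 = top, 72 = surface) to a 0-based index.

    Assumes levels are stored in ascending lev order (an assumption, not
    something stated in the dataset description).
    """
    if not 1 <= lev <= N_LEVELS:
        raise ValueError(f"lev must be in 1..{N_LEVELS}, got {lev}")
    return lev - 1


times = synoptic_times(datetime(2020, 1, 1))  # arbitrary example date
print([t.strftime("%H:%M") for t in times])
print(lev_to_index(1), lev_to_index(72))
```

The same conventions would apply to the other model-level MERRA-2 collections listed here, under the same ordering assumption.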

Usage examples

The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). by Gelaro, R., W. McCarty, M. J. Suárez, R. Todling, A. Molod, L. Takacs, C. A. Randles, A. Darmenov, M. G. Bosilovich, R. Reichle, et al.
Land Surface Precipitation in MERRA-2. by Reichle, R.H., Q. Liu, R.D. Koster, C.S. Draper, S.P.P. Mahanama, and G.S. Partyka
Numerical aspects of the application of recursive filters to variational statistical analysis. Part II: Spatially inhomogeneous and anisotropic general covariances by Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts
The MERRA-2 aerosol reanalysis, 1980 onward. Part II: Evaluation and case studies. by Buchard V., C. A. Randles, A. M. da Silva, A. Darmenov, P. R. Colarco, R. Govindaraju, R. Ferrare, J. Hair, A. J. Beyersdorf, L. D. Ziemba, H. Yu
Assessment of MERRA-2 Land Surface Hydrology Estimates. by Reichle, R. H., C. S. Draper, Q. Liu, M. Girotto, S. P. P. Mahanama, R. D. Koster, and G. J. M. De Lannoy

See 15 usage examples →

MERRA-2 inst3_3d_asm_Np: 3d,3-Hourly,Instantaneous,Pressure-Level,Assimilation,Assimilated Meteorological Fields

agricultureair temperatureatmospherebiodiversityclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NPASM (or inst3_3d_asm_Np) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of meteorological parameters at 42 pressure levels, such as temperature, wind components, vertical pressure velocity, water vapor, ozone mass mixing ratio, and layer height. The data field is available every three hours starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. Information on the pressure levels can be found in Section 4.2 of the MERRA-2 File Specification document. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observi...

Usage examples

Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances by Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts
A catchment-based approach to modeling land surface processes in a GCM, Part 1, Model Structure by Koster, R. D., M. J. Suarez, A. Ducharne, M. Stieglitz, and P. Kumar
MERRA-2: Initial Evaluation of the Climate by Bosilovich, M. G., S. Akella, L. Coy, R. Cullather, C. Draper, R. Gelaro, R. Kovach, Q. Liu, A. Molod, P. Norris, K. Wargan, W. Chao, R. Reichle, L. Takacs, Y. Vikhliaev, S. Bloom, A. Collow, S. Firth, G. Labow, G. Partyka, S. Pawson, O. Reale, S. D. Schubert, and M. Suarez
The MERRA-2 aerosol reanalysis, 1980 onward. Part II: Evaluation and case studies. by Buchard V., C. A. Randles, A. M. da Silva, A. Darmenov, P. R. Colarco, R. Govindaraju, R. Ferrare, J. Hair, A. J. Beyersdorf, L. D. Ziemba, H. Yu
Three-dimensional variational analysis with spatially inhomogeneous covariances by Wu, W.-S., R.J. Purser and D.F. Parrish

See 15 usage examples →

MERRA-2 inst3_3d_asm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Assimilated Meteorological Fields 0.625 x 0.5 degree

agricultureair temperatureatmospherebiodiversityclimatecoastaldatacenterecosystemsglobalhydrologyicelandmetadatanetcdfopendapwater

M2I3NVASM (or inst3_3d_asm_Nv) is an instantaneous 3-dimensional 3-hourly data collection in Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of meteorological parameters at 72 model layers, such as temperature, wind components, vertical pressure velocity, water vapor, and layer height. The data field is available every three hours starting from 00:00 UTC, e.g.: 00:00, 03:00, … , 21:00 UTC. Section 4.2 of the MERRA-2 File Specification document provides pressure values nominal for a 1000 hPa surface pressure and refers to the top edge of the layer. lev=1 is the top layer, and lev=72 is the bottom (or surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satell...

Usage examples

Documentation and Validation of the Goddard Earth Observing System (GEOS) Data Assimilation System - Version 4 by Bloom, S., A. da Silva, D. Dee, M. Bosilovich, J.-D. Chern, S. Pawson, S. Schubert, M. Sienkiewicz, I. Stajner, W.-W. Tan, M.-L. Wu
Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances by Purser, R. J., W.-S. Wu, D. F. Parrish, and N. M. Roberts
The MERRA-2 Aerosol Reanalysis, 1980 Onward. Part I: System Description and Data Assimilation Evaluation. by Randles, C. A., A. M. da Silva, V. Buchard, P. R. Colarco, A. Darmenov, R. Govindaraju, A. Smirnov, B. Holben, R. Ferrare, J. Hair, Y. Shinozuka, and C. J. Flynn
The MERRA-2 aerosol reanalysis, 1980 onward. Part II: Evaluation and case studies. by Buchard V., C. A. Randles, A. M. da Silva, A. Darmenov, P. R. Colarco, R. Govindaraju, R. Ferrare, J. Hair, A. J. Beyersdorf, L. D. Ziemba, H. Yu
Three-dimensional variational analysis with spatially inhomogeneous covariances by Wu, W.-S., R.J. Purser and D.F. Parrish

See 15 usage examples →

NOAA Joint Polar Satellite System (JPSS)

agricultureclimatemeteorologicalweather

Near Real Time JPSS data is now flowing! See bucket information on the right side of this page to access products!
Satellites in the JPSS constellation gather global measurements of atmospheric, terrestrial and oceanic conditions, including sea and land surface temperatures, vegetation, clouds, rainfall, snow and ice cover, fire locations and smoke plumes, atmospheric temperature, water vapor and ozone. JPSS delivers key observations for the Nation's essential products and services, including forecasting severe weather like hurricanes, tornadoes and blizzards days in advance, and assessin...

Usage examples

JPSS Annual Science Digest 2022 by NOAA
Python scripts to download and display JPSS NODD datasets by Amy Huff and Rebekah Esmaili
JPSS Proving Ground by NOAA
GOES-R/JPSS Short Course on Making Beautiful Images of NOAA Satellite Data using Python from the 2023 Annual Meeting of American Meteorological Society by Amy Huff and Rebekah Esmaili
JPSS_AWS Timeselect by Mya Sears

See 15 usage examples →

SpaceNet

computer visiondisaster responseearth observationgeospatialmachine learningsatellite imagery

SpaceNet launched in August 2016 as an open innovation project offering a repository of freely available imagery with co-registered map features. Before SpaceNet, computer vision researchers had minimal options to obtain free, precision-labeled, and high-resolution satellite imagery. Today, SpaceNet hosts datasets developed by its own team, along with datasets from projects like IARPA’s Functional Map of the World (fMoW).

Usage examples

Getting Started with SpaceNet Data by Adam Van Etten
Accelerating Ukraine Intelligence Analysis with Computer Vision on Synthetic Aperture Radar Imagery by Ritwik Gupta, Colorado Reed, Anja Rohrbach, and Trevor Darrell
The SpaceNet 7 Multi-Temporal Urban Development Challenge: Dataset Release by Adam Van Etten
SpaceNet: Winning Implementations and New Imagery Release by Todd Stavish
SpaceNet 6: Dataset Release by Jake Shermeyer

See 15 usage examples →

The Singapore Nanopore Expression Data Set

bambioinformaticsfast5fastafastqgenomiclife scienceslong read sequencingshort read sequencingtranscriptomics

The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short read RNA-Seq. The SG-NEx core data includes 5 of the most commonly used cell lines and it is extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 ...

Usage examples

Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. by Christopher Hendra et al.
Accessing the SG-NEx dataset on AWS by Ying Chen
nf-core/nanoseq: A nanopore DNA and RNA-Seq demultiplexing, QC, alignment and analysis pipeline by Chelsea Sawyer et al.
xPore: Identification of differential RNA modification from direct RNA-Seq data by Ploy Pratanwanich et al.
Basecalling and analysing SG-NEx samples in S/BLOW5 format by Hasindu Gamaarachchi

See 15 usage examples →

2021 Amazon Last Mile Routing Research Challenge Dataset

amazon.scienceanalyticsdeep learninggeospatiallast milelogisticsmachine learningoptimizationroutingtransportationurban

The 2021 Amazon Last Mile Routing Research Challenge was an innovative research initiative led by Amazon.com and supported by the Massachusetts Institute of Technology’s Center for Transportation and Logistics. Over a period of 4 months, participants were challenged to develop innovative machine learning-based methods to enhance classic optimization-based approaches to solve the travelling salesperson problem, by learning from historical routes executed by Amazon delivery drivers. The primary goal of the Amazon Last Mile Routing Research Challenge was to foster innovative applied research in r...

Usage examples

The Driver-Aide Problem: Coordinated Logistics for Last-Mile Delivery by S. Raghavan, Rui Zhang
Code repository used for the 2021 Amazon Routing Research Challenge (this repository is included for reference and documentation purposes only, you do not need to install it to access the data) by CAVE Lab, MIT Center for Transportation and Logistics
AWS Last Mile Route Sequence Optimization by Chen Wu, Yin Song, Verdi March, Eden Duthi
Does parking matter? The impact of parking time on last-mile delivery optimization by Sara Reed, Ann Melissa Campbell, Barrett W. Thomas
Integrating driver behavior into last-mile delivery routing - Combining machine learning and optimization in a hybrid decision support framework by Peter Dieter, Matthew Caron, Guido Schryen

See 17 usage examples →

Distributed Archives for Neurophysiology Data Integration (DANDI)

biologycalcium imagingcell imagingelectrophysiologyhdf5life sciencesneuroimagingneurophysiologyneurosciencezarr

DANDI is a public archive of neurophysiology datasets, including raw and processed data, and associated software containers. Datasets are shared according to Creative Commons CC0 or CC-BY licenses. This US BRAIN Initiative supported archive provides a broad range of cellular neurophysiology data including intracellular and extracellular electrophysiology, optophysiology, calcium imaging, fiber photometry, behavioral time-series, and images from immunostaining experiments, from over 20 species. Data is organized using community standards: NWB - Neurodata Without Borders, BIDS - Brain Imaging Data Structure, NGFF - Next Generation File Format for...

Usage examples

A comparison of neuroelectrophysiology databases by Subash P, Gray A, Bhattacharyya B, et al.
Neurosift - Interactive NWB Viewer by Flatiron Institute
DANDI User Guide - Downloading and Using Data by DANDI Project
DANDI JupyterHub by DANDI Project
Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools by Magland JF, Ly R, Rübel O, Dichter B

See 14 usage examples →

Digital Earth Africa Global Mangrove Watch

coastalcogdeafricaearth observationgeospatialland covernatural resourcesatellite imagerystacsustainability

The Global Mangrove Watch (GMW) dataset is a result of the collaboration between Aberystwyth University (U.K.), solo Earth Observation (soloEO; Japan), Wetlands International, the World Conservation Monitoring Centre (UNEP-WCMC) and the Japan Aerospace Exploration Agency (JAXA). The primary objective of producing this dataset is to provide countries lacking a national mangrove monitoring system with first-cut mangrove extent and change maps, to help safeguard against further mangrove forest loss and degradation. The Global Mangrove Watch dataset (version 2) consists of a global baseline map of ...

Usage examples

Climate Next: How data and community can save Zanzibar’s mangroves by Amazon Staff
Digital Earth Africa Global Mangrove Watch Notebook by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors
Zanzibar: The Essential Mangrove | Climate Next by AWS by Amazon Web Services (AWS)

See 13 usage examples →

Digital Earth Africa Landsat Collection 2 Level 2

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystac

Digital Earth Africa (DE Africa) provides free and open access to a copy of Landsat Collection 2 Level-2 products over Africa. These products are produced and provided by the United States Geological Survey (USGS). The Landsat series of Earth Observation satellites, jointly led by USGS and NASA, has been continuously acquiring images of the Earth’s land surface since 1972. DE Africa provides data from the Landsat 5, 7 and 8 satellites, including historical observations dating back to the late 1980s and regularly updated new acquisitions. New Level-2 Landsat 7 and Landsat 8 data are available after 15...

Usage examples

Digital Earth Africa Explorer (LS5 Surface Reflectance) by Digital Earth Africa Contributors
Introduction to DE Africa by Dr Fang Yuan
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by ESRI
Digital Earth Africa web services by Digital Earth Africa Contributors

See 13 usage examples →

Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery

biologyfluorescence imagingimage processingimaginglife sciencesmicroscopyneurobiologyneuroimagingneuroscience

This data set, made available by Janelia's FlyLight project, consists of fluorescence images of Drosophila melanogaster driver lines, aligned to standard templates, and stored in formats suitable for rapid searching in the cloud. Additional data will be added as it is published.

Usage examples

NeuronBridge by Jody Clements, Rob Svirskas, Hideo Otsuna, Cristian Goina, Konrad Rokicki
Scaling Neuroscience Research on AWS by Konrad Rokicki
A GAL4-Driver Line Resource for Drosophila Neurobiology by Arnim Jenett, Gerald M Rubin, Teri-TB Ngo, David Shepherd, Christine Murphy, Heather Dionne, Barret D Pfeiffer, Amanda Cavallaro, Donald Hall, Jennifer Jeter, Nirmala Iyer, Dona Fetter, Joanna H Hausenfluck, Hanchuan Peng, Eric T Trautman, Robert R Svirskas, Eugene W Myers, Zbigniew R Iwinski, Yoshinori Aso, Gina M DePasquale, Adrianne Enos, Phuson Hulamm, Shing Chun Benny Lam, Hsing-Hsi Li, Todd R Laverty, Fuhui Long, Lei Qu, Sean D Murphy, Konrad Rokicki, Todd Safford, Kshiti Shaw, Julie H Simpson, Allison Sowell, Susana Tae, Yang Yu, Christopher T Zugates
FlyLight Project Website by Geoffrey Meissner
Color depth MIP mask search: a new tool to expedite Split-GAL4 creation by Hideo Otsuna, Masayoshi Ito, Takashi Kawase

See 13 usage examples →

RADIANT Public Data

cancergeneticgenomicHomo sapienslife sciencesmedical imagingpediatricradiologytranscriptomicswhole genome sequencing

The Real-time Analysis and Discovery in Integrated And Networked Technologies (RADIANT) initiative seeks to develop an extensible, federated framework for rapid exchange of multimodal clinical and research data on behalf of accelerated discovery and patient impact. Coordination and implementation of initial RADIANT deployments will leverage a network of more than 35 partnered health care systems and participating patient families within the Children’s Brain Tumor Network (CBTN) and the Pediatric Neuro-Oncology Consortium (PNOC). This data set is composed of public multi-modal data provisio...

Usage examples

The children's brain tumor network (CBTN) - Accelerating research in pediatric central nervous system tumors through collaboration and open science. by Jena V Lilly, Jo Lynne Rokita, Jennifer L Mason, et al.
CAVATICA by Seven Bridges Genomics
PedcBioPortal by cBioPortal
Use of External Control Cohorts in Pediatric Brain Tumor Clinical Trials by Ashley S Margol, Annette M Molinaro, Arzu Onar-Thomas, et al.
RADIANT Source Code by RADIANT Team

See 13 usage examples →

Digital Earth Africa - Copernicus Global Land Service - Lake Water Quality

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystacwater

The Copernicus Global Land Service – Lake Water Quality products offer a comprehensive, satellite-derived monitoring system for assessing key water quality indicators in major large lakes, typically those greater than 50 hectares. These datasets are generated using optical satellite sensors, primarily Sentinel-2 MSI and Sentinel-3 OLCI, with earlier archives derived from Envisat MERIS. Spanning multiple spatial resolutions (100 m and 300 m) and temporal scales (10-day composites), they support both near-real-time and retrospective assessments of inland water quality. Key parameters include surf...

Usage examples

Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Lake Water Quality 2019-2024 (raster 100 m), 10-daily – version 1) by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors

See 11 usage examples →

Digital Earth Africa CHIRPS Rainfall

agricultureclimatecogdeafricaearth observationfood securitygeospatialmeteorologicalsatellite imagerystacsustainability

Digital Earth Africa (DE Africa) provides free and open access to a copy of the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) monthly and daily products over Africa. The CHIRPS rainfall maps are produced and provided by the Climate Hazards Center in collaboration with the US Geological Survey, and use both rain gauge and satellite observations. The CHIRPS-2.0 Africa Monthly dataset is regularly indexed to DE Africa from the CHIRPS monthly data. The CHIRPS-2.0 Africa Daily dataset is likewise indexed from the CHIRPS daily data. Both products have been converted to clou...

Usage examples

Rainfall - Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) by Digital Earth Africa Contributors
The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes by Chris Funk, Pete Peterson, Martin Landsfeld, Diego Pedreros, James Verdin, Shraddhanand Shukla, Gregory Husak, James Rowland, Laura Harrison, Andrew Hoell and Joel Michaelsen
Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Explorer (CHIRPS daily rainfall) by Digital Earth Africa Contributors

See 11 usage examples →

Digital Earth Africa Coastlines

climatecoastaldeafricaearth observationgeospatialsatellite imagerysustainability

Africa's long and dynamic coastline is subject to a wide range of pressures, including extreme weather and climate, sea level rise and human development. Understanding how the coastline responds to these pressures is crucial to managing this region, from social, environmental and economic perspectives. The Digital Earth Africa Coastlines (provisional) is a continental dataset that includes annual shorelines and rates of coastal change along the entire African coastline from 2000 to the present. The product combines satellite data from the Digital Earth Africa program with tidal modelling t...

Usage examples

Introducing the Digital Earth Africa Coastlines Service by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors

See 11 usage examples →

Digital Earth Africa GeoMAD

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystac

GeoMAD is the Digital Earth Africa (DE Africa) surface reflectance geomedian and triple Median Absolute Deviation data service. It is a cloud-free composite of satellite data compiled over specific timeframes. The geomedian component combines measurements collected over the specified timeframe to produce one representative, multispectral measurement for every pixel unit of the African continent. The end result is a comprehensive dataset that can be used to generate true-colour images for visual inspection of anthropogenic or natural landmarks. The full spectral dataset can be used to develop m...
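
To make the geomedian idea concrete, the sketch below approximates a geometric median for a single pixel from several multispectral observations. It is an illustration only and is not presented as DE Africa's implementation: it uses the standard Weiszfeld iteration, the function name and reflectance values are hypothetical, and skipping observations that coincide with the current estimate is a simplification.

```python
import math


def geometric_median(points, iterations=100, tol=1e-9):
    """Approximate the geometric median of equal-length vectors (Weiszfeld iteration)."""
    dim = len(points[0])
    # Start from the per-band arithmetic mean.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            dist = math.dist(p, estimate)
            if dist < tol:
                # Estimate sits on an observation; skip it to avoid dividing by zero.
                continue
            w = 1.0 / dist
            weight_sum += w
            for d in range(dim):
                weighted[d] += w * p[d]
        if weight_sum == 0.0:
            break
        new_estimate = [v / weight_sum for v in weighted]
        shift = math.dist(new_estimate, estimate)
        estimate = new_estimate
        if shift < tol:
            break
    return estimate


# Hypothetical three-band reflectances for one pixel over four dates;
# the last observation mimics a bright, cloud-contaminated outlier.
observations = [
    [0.10, 0.20, 0.30],
    [0.12, 0.21, 0.29],
    [0.09, 0.22, 0.31],
    [0.90, 0.90, 0.90],
]
median = geometric_median(observations)
mean = [sum(p[d] for p in observations) / len(observations) for d in range(3)]
print("geomedian:", median)
print("mean:     ", mean)
```

In this toy case the geomedian stays close to the three consistent observations while the arithmetic mean is pulled toward the outlier, which is the property that makes a geomedian useful for building cloud-free composites.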

Usage examples

Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Landsat 8 GeoMAD) by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Sentinel-2 Semi-Annual GeoMAD) by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Sentinel-2 Annual GeoMAD) by Digital Earth Africa Contributors

See 11 usage examples →

Digital Earth Africa Sentinel-2 Level-2A Surface Reflectance Collection 1

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystac

The Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched 7 March 2017). The two satellites have the same orbit, but 180° apart for optimal coverage and data delivery. Their combined data is used in the Digital Earth Africa Sentinel-2 product. Together, they cover all Earth’s land surfaces, large islands, inland and coastal waters every 3-5 days. Sentinel-2 data is tiered by level of pre-processing. Level-0, Level-1A and Level-1B data contain raw data fr...

Usage examples

Use Sentinel-2-C1 data in the Open Data Cube by Alex Leith
Digital Earth Africa Geoportal by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors
Introduction to DE Africa by Dr Fang Yuan
Downloading and streaming data using STAC metadata by Digital Earth Africa Contributors

See 11 usage examples →

Digital Earth Africa Water Observations from Space

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystacwater

Water Observations from Space (WOfS) is a service that draws on satellite imagery to provide historical surface water observations of the whole African continent. WOfS allows users to understand the location and movement of inland and coastal water present in the African landscape. It shows where water is usually present; where it is seldom observed; and where inundation of the surface has been observed by satellite. They are generated using the WOfS classification algorithm on Landsat satellite data. There are several WOfS products available for the African continent including scene-level dat...

Usage examples

Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Water Observations from Space All-Time Summary) by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Water Observations from Space) by Digital Earth Africa Contributors
Analysing effects of drought on inundation extent and vegetation cover dynamics in the Okavango Delta by Kelebogile Mfundisi, Kenneth Mubea, Fang Yuan, Chad Burton and Edward Boamah

See 11 usage examples →

International Neuroimaging Data-Sharing Initiative (INDI)

Homo sapiensimaginglife sciencesmagnetic resonance imagingneuroimagingneuroscience

This bucket contains multiple neuroimaging datasets that are part of the International Neuroimaging Data-Sharing Initiative. Raw human and non-human primate neuroimaging data include 1) Structural MRI; 2) Functional MRI; 3) Diffusion Tensor Imaging; 4) Electroencephalogram (EEG). In addition to the raw data, preprocessed data is also included for some datasets. A complete list of the available datasets can be seen in the documentation link provided below.

Usage examples

The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. by A. Di Martino, C-G Yan, ..., M.P. Milham
Downloading FCP-INDI Neuroimaging Data from Amazon S3 by INDI
Making data sharing work: The FCP/INDI experience by M. Mennes, B.B. Biswal, F.X. Castellanos, M.P. Milham
Accelerating the Evolution of Nonhuman Primate Neuroimaging by M.P. Milham, C. Petkov
The Healthy Brain Network Serial Scanning Initiative: a resource for evaluating inter-individual differences and their reliabilities across scan conditions and sessions by D. O'Connor, N.V. Potler, ..., M.P. Milham

See 11 usage examples →

Low Altitude Disaster Imagery (LADI) Dataset

aerial imagerycoastalcomputer visiondisaster responseearth observationearthquakesgeospatialimage processingimaginginfrastructurelandmachine learningmappingnatural resourceseismologytransportationurbanwater

The Low Altitude Disaster Imagery (LADI) Dataset consists of human and machine annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015-2023. Two key distinctions are the low altitude, oblique perspective of the imagery and disaster-related features, which are rarely featured in computer vision benchmarks and datasets.

Usage examples

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains by George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, Georges Quenot
An overview on the evaluated video retrieval tasks at TRECVID 2022 by George Awad, Keith Curtis, Asad Butt, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Eliot Godard, Lukas Diduch, Jeffrey Liu, Yvette Graham, Georges Quenot
Evaluating Multiple Video Understanding and Retrieval Tasks at TRECVID 2021 by George Awad, Asad Butt, Keith Curtis, Jonathan G. Fiscus, Afzal A. Godil, Yooyoung Lee, Andrew Delgado, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Yvette Graham, Gareth Jones, Georges Quenot
Train and Deploy an Image Classifier for Disaster Response by Jianyu Mao, Kiana Harris, Nae-Rong Chang, Caleb Pennell, Yiming Ren
Video Testing at the FirstNet Innovation and Test Lab Using a Public Safety Dataset by Chris Budny, Jeffrey Liu, Andrew Weinert

See 11 usage examples →

Maxar Open Data Program

cogdisaster responseearth observationgeospatialsatellite imagerystac

Pre- and post-event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. These images are generated using the Maxar ARD pipeline, tiled on an organized grid in analysis-ready cloud-optimized formats.

Usage examples

Maxar ARD SDK (max-ard) by Maxar Open Data
MGP Xpress by MGP Xpress
Using Data from Earth Observation to Support Sustainable Development Indicators: An Analysis of the Literature and Challenges for the Future by Ana Andries, Stephen Morse, Richard J. Murphy, Jim Lynch, and Emma R. Woolliams
ARD and Command Line Tools by Maxar Open Data
Visualizing Maxar Open Data with SageMaker Studio Lab by Qiusheng Wu

See 11 usage examples →

NOAA Operational Forecast System (OFS)

climatecoastaldisaster responseenvironmentalmeteorologicaloceanswaterweather

ANNOUNCEMENTS: [NOS OFS Version Updates and Implementation of Upgraded Oceanographic Forecast Modeling Systems for Lakes Superior and Ontario; Effective October 25, 2022](https://www.weather.gov/media/notification/pdf2/scn22-91_nos_loofs_lsofs_v3.pdf)

For decades, mariners in the United States have depended on NOAA's Tide Tables for the best estimate of expected water levels. These tables provide accurate predictions of the astronomical tide (i.e., the change in water level due to the gravitational effects of the moon and sun and the rotation of the Earth); however, they cannot predict water-level changes due to wind, atmospheric pressure, and river flow, which are often significan...

Usage examples

Delaware Bay and River OFS Flyer by NOAA
Chesapeake Bay OFS Flyer by NOAA
Technical Implementation Notice for Chesapeake Bay OFS by NOAA
Technical Implementation Notice for Delaware River and Bay OFS by NOAA
Tampa Bay OFS Flyer by NOAA

See 11 usage examples →

Open Targets

bioinformaticsbiologydrug discoverygeneticgenomiclife sciencesprotein

The Open Targets Platform is a comprehensive data integration tool that supports systematic identification and prioritisation of potential therapeutic drug targets. By integrating publicly available datasets including data generated by the Open Targets experimental and informatics research programmes, the Platform provides data and services to assist in the task of therapeutic hypothesis building.

Usage examples

See 11 usage examples →

The Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) Dataset

bambioinformaticsbiologycancergeneticgenomicHomo sapienslife sciencesshort read sequencingtranscriptomicswhole exome sequencingwhole genome sequencing

This dataset consists of whole genome sequencing (WGS), whole exome sequencing (WES), and RNA sequencing files generated from ~1000 cancer cell lines described in Ghandi et al., 2019.

Usage examples

The Cancer Dependency Map (DepMap) by Arafeh, Shibue, Dempster et al.
The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks by Ben Guebila, Wang, Lopes-Ramos et al.
Partial gene suppression improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal by Krill-Burger, Dempster, Borah et al.
Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors by Sanders, Chandra, Zebarjadi et al.
Cancer Cell Line Encyclopedia (CCLE) by Ghandi, Huang, Jané-Valbuena et al.

See 11 usage examples →

Alliance of Genome Resources

bioinformaticsbiologyCaenorhabditis elegansDanio rerioDrosophila melanogasterfastagene expressiongeneticgenomegenomicHomo sapienslife sciencesMus musculusproteinRattus norvegicustranscriptomicsvcf

The Alliance of Genome Resources is a consortium that integrates genomic, genetic, and molecular data from leading model organism databases including Drosophila melanogaster, Caenorhabditis elegans, Danio rerio (zebrafish), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Xenopus laevis and Xenopus tropicalis (frogs), and human reference data. The Alliance provides comprehensive datasets including gene annotations, disease associations, expression data (bulk and single-cell RNA-Seq), protein and genetic interactions, orthology relationships, variants and alleles...

Usage examples

WormBase - C. elegans Database by WormBase Consortium
Xenbase - Xenopus Database by Xenbase
SGD - Saccharomyces Genome Database by SGD
Alliance of Genome Resources Portal - unified model organism research platform by Alliance of Genome Resources Consortium
FlyBase - Drosophila Database by FlyBase Consortium

See 10 usage examples →

CBERS on AWS

agriculture, cog, disaster response, earth observation, geospatial, imaging, satellite imagery, stac

Imagery acquired by the China-Brazil Earth Resources Satellite (CBERS), 4 and 4A. The image files are recorded and processed by Instituto Nacional de Pesquisas Espaciais (INPE) and are converted to Cloud Optimized GeoTIFF format to optimize their use in cloud-based applications. Contains all CBERS-4 MUX, AWFI, PAN5M and PAN10M scenes acquired since the start of the satellite mission and is updated daily with new scenes. CBERS-4A MUX Level 4 (Orthorectified) scenes are being ingested starting from 04-13-2021. CBERS-4A WFI Level 4 (Orthorectified) scenes are being ingested starting from ...

Usage examples

STAC V1.0.0 endpoint by Scitekno
CBERS static STAC catalog served by stac-browser by Radiant Earth
cbers-tiler by Mapbox
Using Remote Sensing Images and Cloud Services on AWS to Improve Land Use and Cover Monitoring by K. R. Ferreira, et al.
Forest Monitor by Brazil Datacube, INPE

See 10 usage examples →

Digital Earth Africa Sentinel-1 Radiometrically Terrain Corrected

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, synthetic aperture radar

DE Africa’s Sentinel-1 backscatter product is developed to be compliant with the CEOS Analysis Ready Data for Land (CARD4L) specifications. The Sentinel-1 mission, composed of a constellation of two C-band Synthetic Aperture Radar (SAR) satellites, is operated by the European Space Agency (ESA) as part of the Copernicus Programme. The mission currently collects data every 12 days over Africa at a spatial resolution of approximately 20 m. Radar backscatter measures the amount of microwave radiation reflected back to the sensor from the ground surface. This measurement is sensitive to surface rough...

Usage examples

Digital Earth Africa web services by Digital Earth Africa Contributors
Water detection with Sentinel-1 by Madeleine Seehaber
Introduction to DE Africa by Dr Fang Yuan
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa Explorer by Digital Earth Africa Contributors

See 10 usage examples →

Digital Earth Africa Sentinel-2 Level-2A

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

The Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched 7 March 2017). The two satellites have the same orbit, but 180° apart for optimal coverage and data delivery. Their combined data is used in the Digital Earth Africa Sentinel-2 product. Together, they cover all Earth’s land surfaces, large islands, inland and coastal waters every 3-5 days. Sentinel-2 data is tiered by level of pre-processing. Level-0, Level-1A and Level-1B data contain raw data fr...

Usage examples

Digital Earth Africa Explorer by Digital Earth Africa Contributors
Introduction to DE Africa by Dr Fang Yuan
Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors

See 10 usage examples →

Garvan Institute Long Read Sequencing Benchmark Data

bioinformatics, genomic, life sciences, long read sequencing

The dataset contains reference samples that will be useful for benchmarking and comparing bioinformatics tools for genome analysis. Examples include: NA12878 (HG001) and NA24385 (HG002) sequenced on an Oxford Nanopore Technologies (ONT) PromethION using the latest R10.4.1 flowcells; and, UHR RNA (direct-RNA) on an ONT PromethION using the latest RNA004 flowcells. Raw signal data output by the sequencer is provided for these datasets in BLOW5 format, and can be rebasecalled when basecalling software updates bring accuracy and feature improvements over the years. Raw signal data is not only for ...

Usage examples

Slow5curl: library and tool for accessing remote BLOW5 files. by Wong, B., et al.
Flexible and efficient handling of nanopore sequencing signal data with slow5tools. by Samarakoon, H., Ferguson, J.M., Jenner, S.P. et al.
Directly processing on an s3fs mount by Hasindu Gamaarachchi
Slow5lib: toolkit slow5lib is a software library for reading & writing SLOW5 files. by Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al.
Fast nanopore sequencing data analysis with SLOW5. by Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al.

See 10 usage examples →

IBL Neuropixels Brainwide Map on AWS

life sciences, Mus musculus, neurophysiology, neuroscience, open source software

Electrophysiological recordings of mouse brain activity acquired during a decision-making task.

Usage examples

See 10 usage examples →

Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)

climate, earth observation, environmental, natural resource, oceans, satellite imagery, water, weather

A global, gap-free, gridded, daily 1 km Sea Surface Temperature (SST) dataset created by merging multiple Level-2 satellite SST datasets. Those input datasets include the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 (AMSR-2) on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. Data are available fro...

Usage examples

THREDDS server by PO.DAAC
GHRSST Data Processing Specification by GHRSST Project
Web discovery service by PO.DAAC
A multi-scale high-resolution analysis of global sea surface temperature by Chin, Toshio Michael, Jorge Vazquez-Cuervo, and Edward M. Armstrong
Improving our knowledge about the oceans by providing cloud-based access to large datasets by Chelle Gentemann

See 10 usage examples →

NHGRI AnVIL Project

biology, gene expression, genome, genomic, Homo sapiens, life sciences

The NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL) Project (https://anvilproject.org/) is the National Human Genome Research Institute's cloud-based platform for genomic data sharing and analysis. AnVIL hosts widely used human genome reference datasets generated through NHGRI-funded research. AnVIL on Open Data on AWS provides public access to open-access datasets available through AnVIL. The project is a collaborative effort involving NHGRI, the Broad Institute, Johns Hopkins University, the University of California Santa Cruz, Vanderbilt University Medical Center, Brigh...

Usage examples

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation by Mikhail Kolmogorov, Kimberley J. Billingsley, Mira Mastoras, Melissa Meredith, Jean Monlong, Ryan Lorig-Roach, Mobin Asri, Pilar Alvarez Jerez, Laksh Malik, Ramita Dewan, Xylena Reed, Rylee M. Genner, Kensuke Daida, Sairam Behera, Kishwar Shafin, Trevor Pesout, Jeshuwin Prabakaran, Paolo Carnevali, Jianzhi Yang, Arang Rhie, Sonja W. Scholz, Bryan J. Traynor, Karen H. Miga, Miten Jain, Winston Timp, Adam M. Phillippy, Mark Chaisson, Fritz J. Sedlazeck, Cornelis Blauwendraat, Benedict Paten
The complete sequence of a human Y chromosome by Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, Paul W. Hook, Sergey Koren, Mikko Rautiainen, Ivan A. Alexandrov, Jamie Allen, Mobin Asri, Andrey V. Bzikadze, Nae-Chyun Chen, Chen-Shan Chin, Mark Diekhans, Paul Flicek, Giulio Formenti, Arkarachai Fungtammasan, Carlos Garcia Giron, Erik Garrison, Ariel Gershman, Jennifer L. Gerton, Patrick G. S. Grady, Andrea Guarracino, Leanne Haggerty, Reza Halabian, Nancy F. Hansen, Robert Harris, Gabrielle A. Hartley, William T. Harvey, Marina Haukness, Jakob Heinz, Thibaut Hourlier, Robert M. Hubley, Sarah E. Hunt, Stephen Hwang, Miten Jain, Rupesh K. Kesharwani, Alexandra P. Lewis, Heng Li, Glennis A. Logsdon, Julian K. Lucas, Wojciech Makalowski, Christopher Markovic, Fergal J. Martin, Ann M. Mc Cartney, Rajiv C. McCoy, Jennifer McDaniel, Brandy M. McNulty, Paul Medvedev, Alla Mikheenko, Katherine M. Munson, Terence D. Murphy, Hugh E. Olsen, Nathan D. Olson, Luis F. Paulin, David Porubsky, Tamara Potapova, Fedor Ryabov, Steven L. Salzberg, Michael E. G. Sauria, Fritz J. Sedlazeck, Kishwar Shafin, Valery A. Shepelev, Alaina Shumate, Jessica M. Storer, Likhitha Surapaneni, Angela M. Taravella Oill, Françoise Thibaud-Nissen, Winston Timp, Marta Tomaszkiewicz, Mitchell R. Vollger, Brian P. Walenz, Allison C. Watwood, Matthias H. Weissensteiner, Aaron M. Wenger, Melissa A. Wilson, Samantha Zarate, Yiming Zhu, Justin M. Zook, Evan E. Eichler, Rachel J. O’Neill, Michael C. Schatz, Karen H. Miga, Kateryna D. Makova, Adam M. Phillippy
The Human Pangenome Project: a global resource to map genomic diversity by Ting Wang, Lucinda Antonacci-Fulton, Kerstin Howe, Heather A. Lawson, Julian K. Lucas, Adam M. Phillippy, Alice B. Popejoy, Mobin Asri, Caryn Carson, Mark J. P. Chaisson, Xian Chang, Robert Cook-Deegan, Adam L. Felsenfeld, Robert S. Fulton, Erik P. Garrison, Nanibaa’ A. Garrison, Tina A. Graves-Lindsay, Hanlee Ji, Eimear E. Kenny, Barbara A. Koenig, Daofeng Li, Tobias Marschall, Joshua F. McMichael, Adam M. Novak, Deepak Purushotham, Valerie A. Schneider, Baergen I. Schultz, Michael W. Smith, Heidi J. Sofia, Tsachy Weissman, Paul Flicek, Heng Li, Karen H. Miga, Benedict Paten, Erich D. Jarvis, Ira M. Hall, Evan E. Eichler, David Haussler, the Human Pangenome Reference Consortium
Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) by Michael C. Schatz, Anthony A. Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, Robert J. Carroll, Alessandro Culotti, Kyle Ellrott, Jeremy Goecks, Robert L. Grossman, Ira M. Hall, Kasper D. Hansen, Jonathan Lawson, Jeffrey T. Leek, Anne O’Donnell Luria, Stephen Mosher, Martin Morgan, Anton Nekrutenko, Brian D. O’Connor, Kevin Osborn, Benedict Paten, Candace Patterson, Frederick J. Tan, Casey Overby Taylor, Jennifer Vessio, Levi Waldron, Ting Wang, Kristin Wuichet, AnVIL Team
Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing by Sam Kovaka, Shujun Ou, Katharine M. Jenike, Michael C. Schatz

See 13 usage examples →

New Zealand Imagery

aerial imagery, cog, earth observation, geospatial, satellite imagery, stac

The New Zealand Imagery dataset consists of New Zealand's publicly owned aerial and satellite imagery, which is freely available to use under an open licence. The dataset ranges from the latest high-resolution aerial imagery down to 5cm in some urban areas to lower resolution satellite imagery that provides full coverage of mainland New Zealand, Chathams and other offshore islands. It also includes historical imagery that has been scanned from film, orthorectified (removing distortions) and georeferenced (correctly positioned) to create a unique and crucial record of changes to the New Zea...

Usage examples

ArcGIS Workflows: NZ Imagery from a public AWS S3 bucket by Eagle Technology
A Boundary Regulated Network for Accurate Roof Segmentation and Outline Extraction by Guangming Wu, Zhiling Guo, Xiaodan Shi, Qi Chen, Yongwei Xu, Ryosuke Shibasaki and Xiaowei Shao
LINZ Data Service by Toitū Te Whenua Land Information New Zealand
LINZ Topographic Workflows - Bulk imagery processing with Kubernetes and Argo Workflows by Toitū Te Whenua Land Information New Zealand
LINZ Basemaps by Toitū Te Whenua Land Information New Zealand

See 10 usage examples →

RADARSAT-1

agriculture, cog, disaster response, earth observation, geospatial, global, ice, satellite imagery, synthetic aperture radar

Developed and operated by the Canadian Space Agency, RADARSAT-1 is Canada's first commercial Earth observation satellite.

Usage examples

See 10 usage examples →

Catalina Sky Survey (CSS) subset data on AWS

astronomy, object detection, planetary, survey

Raw data from the Catalina Sky Survey, which discovers Near Earth Objects (NEOs) that could potentially impact Earth.

Usage examples

Overview of Catalina Sky Survey PDS Archive by R. Seaman, C. Neese, J. Stone, E. Christensen
Transform Tool by Planetary Data System (PDS) Engineering Node
Astropy by The Astropy Developers
FITS image compression programs (fpack & funpack) by Rob Seaman (NOAO), William Pence (NASA/GSFC), Rick White (STScI).
Catalina Sky Survey Operations and Processing by R. Seaman, E. Christensen, A. Gibbs, S. Larson, F. Shelly

See 9 usage examples →

Department of Energy's Open Energy Data Initiative (OEDI)

energy, environmental, geospatial, lidar, model, solar

Data released under the Department of Energy's (DOE) Open Energy Data Initiative (OEDI). The Open Energy Data Initiative aims to improve and automate access to high-value energy data sets across the U.S. Department of Energy’s programs, offices, and national laboratories. OEDI aims to make data actionable and discoverable by researchers and industry to accelerate analysis and advance innovation.

Usage examples

Rooftop Solar Photovoltaic Technical Potential in the United States: A Detailed Assessment by Pieter Gagnon, Robert Margolis, Jennifer Melius, Caleb Phillips, and Ryan Elmore
Tracking the Sun Tool by Lawrence Berkeley National Laboratory (LBNL)
On the Use of Coupled Wind, Wave, and Current Fields in the Simulation of Loads on Bottom-Supported Offshore Wind Turbines during Hurricanes by E. Kim, L. Manuel, M. Curcic, S. S. Chen, C. Phillips, P. Veers
The Distributed Generation Market Demand Model (dGen): Documentation by B. Sigrin, M. Gleason, R. Preus, I. Baring-Gould, R. Margolis
NSRDB Viewer by National Renewable Energy Laboratory (NREL)

See 9 usage examples →

Digital Earth Africa ALOS PALSAR, ALOS-2 PALSAR-2 and JERS-1

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, synthetic aperture radar

The ALOS/PALSAR annual mosaic is a global 25 m resolution dataset that combines data from many images captured by JAXA’s PALSAR and PALSAR-2 sensors on the ALOS-1 and ALOS-2 satellites respectively. This product contains radar measurements in L-band and in HH and HV polarizations. It has a spatial resolution of 25 m and is available annually for 2007 to 2010 (ALOS/PALSAR) and 2015 to 2020 (ALOS-2/PALSAR-2). The JERS annual mosaic is generated from images acquired by the SAR sensor on the Japanese Earth Resources Satellite-1 (JERS-1) satellite. This product contains radar measurements in L-band and H...

Usage examples

Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by Digital Earth Africa Contributors
Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Introduction to DE Africa by Dr Fang Yuan
Digital Earth Africa Sandbox by Digital Earth Africa Contributors

See 9 usage examples →

Digital Earth Africa Cropland Extent Map (2019)

agriculture, cog, deafrica, earth observation, food security, geospatial, satellite imagery, stac, sustainability

Digital Earth Africa's cropland extent map (2019) shows the estimated location of croplands in Africa for the period January to December 2019. Cropland is defined as: "a piece of land of minimum 0.01 ha (a single 10m x 10m pixel) that is sowed/planted and harvest-able at least once within the 12 months after the sowing/planting date." This definition will exclude non-planted grazing lands and perennial crops which can be difficult for satellite imagery to differentiate from natural vegetation. This provisional cropland extent map has a resolution of 10m, and was built using Cope...

Usage examples

Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Co-Production of a 10-m Cropland Extent Map for Continental Africa using Sentinel-2, Cloud Computing, and the Open-Data-Cube by Chad Burton, Fang Yuan, Chong Ee-Faye, Meghan Halabisky, David Ongo, Fatou Mar, Victor Addabor, Bako Mamane and Sena Adimou
Cropland Extent is now available for the entire African continent by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors

See 9 usage examples →

Digital Earth Africa Fractional Cover

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, sustainability

Fractional cover (FC) describes the landscape in terms of coverage by green vegetation, non-green vegetation (including deciduous trees during autumn, dry grass, etc.) and bare soil. It provides insight into how areas of dry vegetation and/or bare soil and green vegetation are changing over time. The product is derived from Landsat satellite data, using an algorithm developed by the Joint Remote Sensing Research Program. Digital Earth Africa's FC service has two components. Fractional Cover is estimated from each Landsat scene, providing measurements from individual days. Fractional Cover...

Usage examples

Digital Earth Africa Training by Digital Earth Africa Contributors
Introduction to DE Africa by Dr Fang Yuan
Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Digital Earth Africa Explorer (Fractional Cover) by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors

See 9 usage examples →

Digital Earth Africa Monthly Normalised Difference Vegetation Index (NDVI) Anomaly

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

Digital Earth Africa’s Monthly NDVI Anomaly service provides an estimate of vegetation condition, for each calendar month, against the long-term baseline condition measured for the month from 1984 to 2020 in the NDVI Climatology. A standardised anomaly is calculated by subtracting the long-term mean from an observation of interest and then dividing the result by the long-term standard deviation. Positive NDVI anomaly values indicate vegetation is greener than average conditions, and are usually due to increased rainfall in a region. Negative values indicate additional plant stress relative to t...
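
The standardised anomaly described above can be sketched in a few lines of Python. This is a minimal illustration for a single pixel, not the service's actual implementation: the function name, the sample NDVI values, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def standardised_anomaly(observation, baseline):
    """Return (observation - baseline mean) / baseline standard deviation."""
    mu = mean(baseline)
    # Population standard deviation is an assumption; the description
    # above does not say which estimator the service uses.
    sigma = pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; anomaly is undefined")
    return (observation - mu) / sigma


# Hypothetical NDVI values for one pixel, same calendar month, across baseline years.
baseline_ndvi = [0.40, 0.50, 0.60]

# An observation above the long-term mean yields a positive anomaly (greener than average).
greener = standardised_anomaly(0.60, baseline_ndvi)

# An observation below the long-term mean yields a negative anomaly.
drier = standardised_anomaly(0.40, baseline_ndvi)
```

A positive result corresponds to the "greener than average" case in the description, and a negative result to the below-average case.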

Usage examples

Mean NDVI and Anomalies by Digital Earth Africa Contributors
Digital Earth Africa Training by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors

See 9 usage examples →

KyFromAbove on AWS

aerial imagery, cog, disaster response, dtm, earth observation, elevation, geopackage, geospatial, lidar, mapping, stac, tiff, tiles

The KyFromAbove initiative is focused on building and maintaining a current basemap for Kentucky that can meet the needs of its users at the state, federal, local, and regional level. A common basemap, including current color leaf-off aerial photography and elevation data (LiDAR), reduces the cost of developing GIS applications, promotes data sharing, and adds efficiencies to many business processes. All basemap data acquired through this effort is made available in the public domain. KyFromAbove acquires aerial imagery and LiDAR during leaf-off conditions in the Commonwealth. The imagery typic...

Usage examples

KyTopo! Kentucky's New Topographic Map Series by Kent Anness
KyFromAbove Catalog Explorer by Boyd Shearer, Ky Div. of Geographic Information
KyFromAbove is democratizing key data in Kentucky by Matt Collins
KyFromAbove County Bulk Download by Ian Horn
KyFromAbove Explorer by NV5

See 9 usage examples →

Louisiana Watershed Initiative (LWI) Model Data

bathymetry, climate, coastal, disaster response, elevation, floods, forecast, geospatial, hydrologic model, hydrology, infrastructure, land cover, land use, mapping, meteorological, model, open source software, precipitation, simulations, sustainability, water, weather

Geographic (land cover, land elevation, etc.), meteorologic (pluvial, wind, etc.), hydrologic (fluvial, tidal, etc.), hydrodynamic (water surface elevations, flow velocities), and built environment (structures, levees, floodgates, culverts) data used as inputs to and outputs from numerical modeling software for the prediction of flood risk in stochastic and probabilistic frameworks. This data was collected from open sources, such as the National Oceanic and Atmospheric Administration (NOAA) or the United States Geological Survey (USGS). The format of these data is modified to su...

Usage examples

See 9 usage examples →

NREL Wind Integration National Dataset

environmental, geospatial, meteorological

Released to the public as part of the Department of Energy's Open Energy Data Initiative, the Wind Integration National Dataset (WIND) is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

Usage examples

WTK-LED: The WIND Toolkit Long-Term Ensemble Dataset by Caroline Draxl, Jiali Wang, Lindsay Sheridan, et al.
Wind Visualization by Jordan Perr-Sauer
Power from wind: Open data on AWS by Caleb Phillips, Caroline Draxl, John Readey, Jordan Perr-Sauer
Validation of Power Output for the WIND Toolkit by J. King, Andrew Clifton, Bri-Mathias Hodge
Wind Prospector by Paul Edwards

See 9 usage examples →

Open NeuroData

array tomography, biology, electron microscopy, image processing, life sciences, light-sheet microscopy, magnetic resonance imaging, neuroimaging, neuroscience

This bucket contains multiple neuroimaging datasets (as Neuroglancer Precomputed Volumes) across multiple modalities and scales, ranging from nanoscale (electron microscopy), to microscale (cleared lightsheet microscopy and array tomography), and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include segmentations and meshes.

Usage examples

A Community-Developed Open-Source Computational Ecosystem for Big Neuro Data by J. T. Vogelstein, E. Perlman, B. Falk, A. Baden, W. Gray Roncal, V. Chandrashekhar, F. Collman, S. Seshamani, J. L. Patsolic, K. Lillaney, M. Kazhdan, R. Hider, D. Pryor, J. Matelsky, T. Gion, P. Manavalan, B. Wester, M. Chevillet, E. T. Trautman, K. Khairy, E. Bridgeford, D. M. Kleissas, D. J. Tward, A. K. Crow, B. Hsueh, M. A. Wright, M. I. Miller, S. J. Smith, R. J. Vogelstein, K. Deisseroth, and R. Burns
The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience by R. Burns, W. G. Roncal, D. Kleissas, K. Lillaney, P. Manavalan, E. Perlman, D. R. Berger, D. D. Bock, K. Chung, L. Grosenick, N. Kasthuri, N. C. Weiler, K. Deisseroth, M. Kazhdan, J. Lichtman, R. C. Reid, S. J. Smith, A. S. Szalay, J. T. Vogelstein, and R. J. Vogelstein.
From cosmos to connectomes: The evolution of data-intensive science by R. Burns, J. T. Vogelstein, and A. S. Szalay
CloudVolume by William Silversmith
To the Cloud! A Grassroots Proposal to Accelerate Brain Science Discovery by J. T. Vogelstein, B. Mensh, M. Häusser, N. Spruston, A. C. Evans, K. Kording, K. Amunts, C. Ebell, J. Muller, M. Telefont, S. Hill, S. P. Koushika, C. Calì, P. A. Valdés-Sosa, P. B. Littlewood, C. Koch, S. Saalfeld, A. Kepecs, H. Peng, Y. O. Halchenko, G. Kiar, M. M. Poo, J. B. Poline, M. P. Milham, A. P. Schaffer, R. Gidron, H. Okano, V. D. Calhoun, M. Chun, D. M. Kleissas, R. J. Vogelstein, E. Perlman, R. Burns, R. Huganir, and M. I. Miller

See 9 usage examples →

PubSeq - Public Sequence Resource

bam, bioinformatics, biology, coronavirus, COVID-19, fast5, fasta, fastq, genetic, genomic, health, json, life sciences, long read sequencing, medicine, MERS, metadata, open source software, RDF, SARS, SARS-CoV-2, SPARQL

COVID-19 PubSeq is a free and open online bioinformatics public sequence resource with on-the-fly analysis of sequenced SARS-CoV-2 samples that allows for a quick turnaround in identification of new virus strains. PubSeq allows anyone to upload sequence material in the form of FASTA or FASTQ files with accompanying metadata through the web interface or REST API.

Usage examples

See 9 usage examples →

Southern California Earthquake Data

earth observation, earthquakes, seismology

This dataset contains ground motion velocity and acceleration seismic waveforms recorded by the Southern California Seismic Network (SCSN) and archived at the Southern California Earthquake Data Center (SCEDC). A Distributed Acoustic Sensing (DAS) dataset is included.

Usage examples

Southern California Earthquake Data Now Available in the AWS Cloud by Ellen Yu; Aparna Bhaskaran; Shang‐Lin Chen; Zachary E. Ross; Egill Hauksson; Robert W. Clayton
SeisNoise.jl GPU Computing Tutorial - Another example of accessing data s3://scedc-pds for ambient noise cross-correlation by Tim Clements
Getting Started with SCEDC AWS Public Dataset by Ellen Yu
Using Lambda to Process Seismograms by Shang-Lin Chen
Using Amazon API Gateway and Lambda to window and decimate waveforms by Shang-Lin Chen

See 9 usage examples →

Steinegger Lab Datasets

bioinformatics, life sciences, metagenomics, open source software, protein, protein folding

The Steinegger Lab Dataset comprises biological databases and resources critical for protein sequence and structure analysis, developed to support ColabFold, MMseqs2, and Foldseek/Foldcomp—three high-performance computational tools widely used in bioinformatics. The MMseqs2 dataset serves as the backbone for our fast structure prediction tool, ColabFold, and includes UniRef30, BFD, and the ColabFold environmental databases. These datasets are specifically designed for the rapid generation of multiple sequence alignments (MSAs), which are essential for high-accuracy structure prediction. Beyond ...

Usage examples

ColabFold: Making protein folding accessible to all by Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M
ColabFold User Guide by Mirdita M and Ovchinnikov S
ColabFold Tutorial by Ovchinnikov S, Mirdita M and Steinegger M
Foldseek Search Server by van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, et al.
Foldseek User Guide by Mirdita M and Steinegger M

See 9 usage examples →

USGS 3DEP LiDAR Point Clouds

agriculture, disaster response, elevation, geospatial, lidar, stac

The goal of the USGS 3D Elevation Program (3DEP) is to collect elevation data in the form of light detection and ranging (LiDAR) data over the conterminous United States, Hawaii, and the U.S. territories, with data acquired over an 8-year period. This dataset provides two realizations of the 3DEP point cloud data. The first resource is a public access organization provided in Entwine Point Tiles format, which is a lossless, full-density, streamable octree based on LASzip (LAZ) encoding. The second resource is a Requester Pays bucket of the original, Raw LAZ (Compressed LAS) 1.4 3DEP format, and more co...

Usage examples

USGS 3DEP Lidar Point Cloud Now Available as Amazon Public Dataset by Department of the Interior, U.S. Geological Survey
OpenTopography access to 3DEP lidar point cloud data by OpenTopography
Extracting buildings and roads from AWS Open Data using Amazon SageMaker by Yunzhi Shi, Tianyu Zhang, and Xin Chen
Statewide USGS 3DEP Lidar Topographic Differencing Applied to Indiana, USA by Chelsea Phipps Scott, Matthew Beckley, Minh Phan, Emily Zawacki, Christopher Crosby, Viswanath Nandigam, and Ramon Arrowsmith
Using Lambda Layers with USGS 3DEP LiDAR Point Clouds by Howard Butler

See 9 usage examples →

World Bank - Light Every Night

cog, disaster response, earth observation, satellite imagery, stac

Light Every Night - World Bank Nighttime Light Data – provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012-2020 and the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992-2013. The underlying data are sourced from the NOAA National Centers for Environmental Information (NCEI) archive. Additional processing by the University of Michigan enables access in Cloud Optimized GeoTIFF format (COG) and search using the Spatial Temporal Asset Catalog (STAC) standard. The data is...

Usage examples

Detection of Rural Electrification in Africa using DMSP-OLS Night Lights Imagery. International Journal of Remote Sensing by Brian Min, Kwawu Mensan Gaba, Ousmane Fall Sarr, Alassane Agalassou.
Power and the Vote - Elections and Electricity in the Developing World. Cambridge. 2015. by Brian Min
Nighttime lights compositing using the VIIRS day-night band - Preliminary results. Proceedings of the Asia-Pacific Advanced Network 35 (2013)70-86. by Kimberly Baugh, Feng-Chi Hsu, Christopher D. Elvidge, and Mikhail Zhizhin.
Mainstreaming Disruptive Technologies in Energy. World Bank Report. 2019 by Kwawu Mensan Gaba, Brian Min, Olaf Veerman, Kimberly Baugh
High Resolution Electricity Access Indicators (HREA) - Settlement-level measures of electricity access, reliability, and usage. by Brian Min, Zachary O'Keeffe

See 9 usage examples →

nuScenes

autonomous vehicles, computer vision, lidar, robotics, transportation, urban

Public large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.

Usage examples

nuScenes lidarseg and panoptic tutorial by Motional
nuScenes prediction tutorial by Motional
nuScenes devkit tutorial by Motional
Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking by Whye Kit Fong, Rohit Mohan, Juana Valeria Hurtado, Lubing Zhou, Holger Caesar, Oscar Beijbom, Abhinav Valada
nuScenes Map Expansion Tutorial by Motional

See 9 usage examples →

ArcticDEM

cog, earth observation, elevation, geospatial, mapping, open source software, satellite imagery, stac

ArcticDEM - 2m GSD Digital Elevation Models (DEMs) and mosaics from 2007 to the present. The ArcticDEM project seeks to fill the need for high-resolution time-series elevation data in the Arctic. The time-dependent nature of the strip DEM files allows users to perform change detection analysis and to compare observations of topography data acquired in different seasons or years. The mosaic DEM tiles are assembled from multiple strip DEMs with the intention of providing a more consistent and comprehensive product over large areas. ArcticDEM data is constructed from in-track and cross-track high...
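
The change detection mentioned above amounts to differencing two co-registered elevation grids acquired at different times. The sketch below shows the idea on tiny in-memory grids. It is an illustration only: the function name, the sample elevations, and the nodata marker of -9999 are assumptions, and real strip DEMs would first need to be read from GeoTIFF and aligned to a common grid.

```python
NODATA = -9999.0  # assumed nodata marker; check the metadata of the files you use


def elevation_change(earlier, later, nodata=NODATA):
    """Return the per-cell difference (later - earlier) for two equal-shape grids.

    Cells where either grid holds the nodata marker are returned as None.
    """
    if len(earlier) != len(later) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(earlier, later)
    ):
        raise ValueError("grids must have the same shape")
    return [
        [None if nodata in (a, b) else b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(earlier, later)
    ]


# Hypothetical 2 x 2 elevation grids (metres) for the same footprint in two years.
dem_earlier = [[100.0, 102.0], [NODATA, 98.0]]
dem_later = [[99.0, 102.5], [97.0, 98.0]]

# Negative values indicate surface lowering, positive values indicate a gain.
change = elevation_change(dem_earlier, dem_later)
```

Returning None for masked cells keeps missing data from being mistaken for a large elevation change.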

Usage examples

Future Evolution of Greenland's Marine-Terminating Outlet Glaciers by Ginny A. Catania, Leigh A. Stearns, Twila A. Moon, Ellen M. Enderlin, R. H. Jackson
PGC Dynamic STAC API Tutorial by Polar Geospatial Center
The surface extraction from TIN based search-space minimization (SETSM) algorithm by Myoung-Jong Noh, Ian M. Howat
Dynamic ice loss from the Greenland Ice Sheet driven by sustained glacier retreat by Michalea D. King, Ian M. Howat, Salvatore G. Candela, Myoung J. Noh, Seongsu Jeong, Brice P. Y. Noël, Michiel R. van den Broeke, Bert Wouters, Adelaide Negrete
Automatic relative RPC image model bias compensation through hierarchical image matching for improving DEM quality by Myoung-Jong Noh, Ian M. Howat

See 8 usage examples →

Boreas Autonomous Driving Dataset

autonomous vehicles, computer vision, lidar, robotics

This autonomous driving dataset includes data from a 128-beam Velodyne Alpha-Prime lidar, a 5MP Blackfly camera, a 360-degree Navtech radar, and post-processed Applanix POS LV GNSS data. This dataset was collected in various weather conditions (sun, rain, snow) over the course of a year. The intended purpose of this dataset is to enable benchmarking of long-term all-weather odometry and metric localization across various sensor types. In the future, we hope to also support an object detection benchmark.

Usage examples

Project Lidar onto Camera Frames (Jupyter notebook) by Keenan Burnett
Need for Speed: Fast Correspondence-Free Lidar Odometry Using Doppler Velocity by D J Yoon, K Burnett, J Laconte, Y Chen, H Vhavle, S Kammel, J Reuther, T D Barfoot
Introduction to Visualizing Sensor Types (Jupyter notebook) by Keenan Burnett
Radar odometry combining probabilistic estimation and unsupervised feature learning by K Burnett, D J Yoon, A P Schoellig, T D Barfoot
Picking up speed: Continuous-time Lidar-only odometry using doppler velocity measurements by Y Wu, D J Yoon, K Burnett, S Kammel, Y Chen, H Vhavle, T D Barfoot

See 8 usage examples →

CHAMMI-75

biology, cell imaging, fluorescence imaging, high-throughput imaging, imaging, life sciences, machine learning, microscopy

Quantifying cell morphology using images and machine learning models has proven to be a powerful tool to study the response of cells to treatments. However, the models used to quantify cellular morphology are typically trained with a single microscopy imaging type and under controlled experimental conditions. This results in specialized models that cannot be reused across biological studies because the technical specifications do not match (e.g., different number of channels), or because the target experimental conditions are out of distribution. We have created CHAMMI-75, a large-scale dat...

Usage examples

CHAMMI-75 Source Code by Vidit Agrawal
CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images by Vidit Agrawal, John Peters, Tyler N. Thompson, Mohammad Vali Sanian, Chau Pham, Nikita Moshkov, Arshad Kazi, Aditya Pillai, Jack Freeman, Byunguk Kang, Samouil L. Farhi, Ernest Fraenkel, Ron Stewart, Lassi Paavolainen, Bryan A. Plummer, Juan C. Caicedo
MorphEm Model (Trained on CHAMMI-75) by Vidit Agrawal, John Peters, Juan Caicedo
CHAMMI Benchmarking Source Code by Chau Pham
Running CHAMMI-75 Evaluation Benchmarks by Vidit Agrawal, Juan Caicedo

See 8 usage examples →

CMIP6 GCMs downscaled using WRF

agriculture, atmosphere, climate, earth observation, environmental, model, oceans, simulations, weather

High-resolution historical and future climate simulations from 1980-2100

Usage examples

Downscaling file descriptions, directory structure, and data access by Stefan Rahimi & Lei Huang
Memo on the Development and Availability of Dynamically Downscaled Projections Using WRF by Stefan Rahimi
Memo on the Evaluation of Downscaled GCMs Using WRF by Stefan Rahimi
Memorandum on Evaluating Global Climate Models for Studying Regional Climate Change in California by Will Krantz, David Pierce, Naomi Goldenson, Daniel Cayan
An Overview of the Western United States Dynamically Downscaled Dataset (WUS-D3) by Rahimi, S., Huang, L., Norris, J., Hall, A., Goldenson, N., Krantz, W., Bass, B., Thackeray, C., Lin, H., Chen, D., Dennis, E., Collins, E., Lebo, Z. J., Slinskey, E., Graves, S., Biyani, S., Wang, B., Cropper, S., and the UCLA Center for Climate Science Team

See 8 usage examples →

Cancer Cell Line Encyclopedia (CCLE)

cancer, genetic, genomic, Homo sapiens, life sciences, STRIDES, transcriptomics, whole genome sequencing

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to genomic data, visualization and analysis for over 1100 cancer cell lines. This dataset contains RNA-Seq Aligned Reads, WXS Aligned Reads, and WGS Aligned Reads data.

Usage examples

The landscape of cancer cell line metabolism by Li, H. et al.
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity by Barretina Caponigro Stransky et al.
Next-generation characterization of the Cancer Cell Line Encyclopedia by Ghandi, M., Huang F. et al.
Pharmacogenomic agreement between two cancer cell line data sets by The Cancer Cell Line Encyclopedia Consortium & The Genomics of Drug Sensitivity in Cancer Consortium
Cancer Genomics Cloud by Seven Bridges

See 8 usage examples →

Co-Produced Climate Data to Support California's Resilience Investments

atmosphere, climate, climate model, earth observation, geoscience, geospatial, meteorological, simulations, weather, zarr

Downscaled future and historical climate projections for California and her environs in support of California's Fifth Climate Assessment

Usage examples

See 8 usage examples →

DOE's Water Power Technology Office's (WPTO) US Wave dataset

earth observation, energy, geospatial, meteorological, water

Released to the public as part of the Department of Energy's Open Energy Data Initiative, this is the highest resolution publicly available long-term wave hindcast dataset that – when complete – will cover the entire U.S. Exclusive Economic Zone (EEZ).

Usage examples

High-Resolution Regional Wave Hindcast for the U.S. West Coast by Yang, Zhaoqing; Wu, Wei-Cheng; Wang, Taiping; Castrucci, Luca
SWAN Cycle III version 41.31A by The SWAN team
Development and validation of a regional-scale high-resolution unstructured model for wave energy resource characterization along the US East Coast by Allahdadi, M.N., B. Gunawan, J. Lai, R. He, V.S. Neary
Development and validation of a high-resolution regional wave hindcast model for U.S. West Coast wave resource characterization by Wu, Wei-Cheng; Wang, Taiping; Yang, Zhaoqing; Garcia Medina, Gabriel
High-resolution hindcasts for U.S. wave energy resource characterization by Yang, Z. and V.S. Neary

See 8 usage examples →

Digital Earth Africa Normalised Difference Vegetation Index (NDVI) Climatology

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac

Digital Earth Africa’s NDVI climatology product represents the long-term average baseline condition of vegetation for every Landsat pixel over the African continent. Both mean and standard deviation NDVI climatologies are available for each calendar month. Some key features of the product are:

NDVI climatologies were developed using harmonized Landsat 5, 7, and 8 satellite imagery.
Mean and standard deviation NDVI climatologies are produced for each calendar month, using a temporal baseline period from 1984-2020 (inclusive).
Datasets have a spatial

...
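As a rough illustration of how a mean and standard deviation climatology can be used, the sketch below computes NDVI from near-infrared and red reflectance with the standard formula (NIR - Red) / (NIR + Red), then expresses one observation as a standardized anomaly against a monthly baseline. The reflectance values, the three-year baseline, and the choice of population standard deviation are all assumptions made for the example, not details of the Digital Earth Africa product.

```python
from statistics import mean, pstdev
from typing import List, Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalised Difference Vegetation Index; None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def standardized_anomaly(observed: float, clim_mean: float, clim_std: float) -> Optional[float]:
    """Number of standard deviations the observation sits from the baseline mean."""
    if clim_std == 0:
        return None
    return (observed - clim_mean) / clim_std


# Hypothetical January NDVI values for one pixel across baseline years.
january_baseline: List[float] = [0.5, 0.6, 0.7]
clim_mean = mean(january_baseline)
clim_std = pstdev(january_baseline)  # assumption: population standard deviation

# Hypothetical surface reflectance for a new January observation.
observed = ndvi(nir=0.6, red=0.2)
anomaly = standardized_anomaly(observed, clim_mean, clim_std)
print(observed, anomaly)
```

A negative anomaly means the pixel is less green than is typical for that calendar month; a value near zero means conditions are close to the long-term baseline.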

Usage examples

Digital Earth Africa Sandbox by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors
Mean NDVI and Anomalies by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by Digital Earth Africa Contributors

See 8 usage examples →

Logan Unitigs and Contigs of the Sequence Read Archive (SRA) on AWS

fasta, genetic, genomic, life sciences, metagenomics, STRIDES, transcriptomics, whole exome sequencing, whole genome sequencing

This repository is a re-analysis of the NCBI Sequence Read Archive (SRA), December 2023 freeze, to make it more accessible. The SRA is an open access database of biological sequences, containing raw data from high-throughput DNA and RNA sequencing platforms. It is the largest database of public DNA sequences worldwide, containing a wealth of genomic diversity across all living organisms. This repository contains Logan, a set of compressed FASTA files for all individual SRA accessions, in the form of unitigs and contigs. Borrowing methods from the realm of genome assembly, unitigs preserve near...

Usage examples

Downloading, mapping many contigs to a gene of interest by Rayan Chikhi
Logan - Planetary-Scale Genome Assembly Surveys Life’s Diversity by Chikhi R., Lemane T., Loll-Krippleber R., et al. (2025)
Logan Search by Pierre Peterlongo
Minia 3 by Rayan Chikhi
Open Virome by Artem Babaian

See 8 usage examples →

NIH Roadmap Epigenomics

bioinformatics, biology, epigenomics, genetic, genomic, life sciences

The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The project has generated high-quality, genome-wide maps of several key histone modifications, chromatin accessibility, DNA methylation and mRNA expression across hundreds of human cell types and tissues. To see what data is available, please check the directory listing: https://roadmapepigenomics.s3.us-west-2.amazonaws.com/index.html.
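For scripted browsing, a minimal sketch along these lines builds an Amazon S3 ListObjectsV2 request URL for the bucket named in the address above and extracts object keys from the XML that such a request returns. It makes no network call. Whether this bucket permits anonymous listing is an assumption, and the sample response and its keys are invented purely to exercise the parser.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket: str, region: str, prefix: str = "", max_keys: int = 1000) -> str:
    """Build a virtual-hosted-style ListObjectsV2 URL for a bucket."""
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_text: str) -> List[str]:
    """Return the object keys found in a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    return [el.findtext(f"{S3_NS}Key") for el in root.iter(f"{S3_NS}Contents")]


# Hypothetical response body; the keys are placeholders, not real objects.
SAMPLE_XML = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>roadmapepigenomics</Name>"
    "<Contents><Key>example/a.txt</Key></Contents>"
    "<Contents><Key>example/b.txt</Key></Contents>"
    "</ListBucketResult>"
)

url = list_url("roadmapepigenomics", "us-west-2")
keys = parse_keys(SAMPLE_XML)
print(url)
print(keys)
```

Fetching `url` with any HTTP client and passing the body to `parse_keys` would complete the loop; large listings are paginated, so a real script would also follow the continuation token in the response.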

Usage examples

Chromatin architecture reorganization during stem cell differentiation by Jesse R. Dixon, Inkyung Jung, Siddarth Selvaraj, Yin Shen, Jessica E. Antosiewicz-Bourget, Ah Young Lee, Zhen Ye, Audrey Kim, Nisha Rajagopal, Wei Xie, Yarui Diao, Jing Liang, Huimin Zhao, Victor V. Lobanenkov, Joseph R. Ecker, James A. Thomson & Bing Ren
Navigation of Roadmap data using Roadmap web portal by Anshul Kundaje Lab
Integrative analysis of 111 reference human epigenomes by Roadmap Epigenomics Consortium, Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour et al., Ting Wang, Manolis Kellis
Human body epigenome maps reveal noncanonical DNA methylation variation by Matthew D. Schultz, Yupeng He, John W. Whitaker, Manoj Hariharan, Eran A. Mukamel, Danny Leung, Nisha Rajagopal, Joseph R. Nery, Mark A. Urich, Huaming Chen, Shin Lin, Yiing Lin, Inkyung Jung, Anthony D. Schmitt, Siddarth Selvaraj, Bing Ren, Terrence J. Sejnowski, Wei Wang & Joseph R. Ecker
WashU Epigenome Browser update 2019 by Daofeng Li, Silas Hsu, Deepak Purushotham, Renee L Sears and Ting Wang

See 8 usage examples →

NOAA Water-Column Sonar Data Archive

biodiversity, earth observation, ecosystems, environmental, geospatial, mapping, oceans

Water-column sonar data archived at the NOAA National Centers for Environmental Information.

Usage examples

Building an Accessible Archive for Water Column Sonar Data by Carrie Wall
pyEcholab - an open-source, python-based toolkit for reading, processing, plotting, and exporting fisheries acoustic echosounder data by Rick Towler, Chuck Anderson, Veronica Martinez, Pamme Crandell
Increasing the accessibility of acoustic data through global access and imagery by Carrie Wall, Michael Jech, and Susan McLean
Reading and Plotting Bottom Data by Carrie Wall
Reading and Plotting Processed CSV Data by Carrie Wall

See 8 usage examples →

New Zealand Elevation

cog, earth observation, elevation, geospatial, stac

The New Zealand Elevation dataset consists of New Zealand's publicly owned digital elevation models and digital surface models, which are freely available to use under an open licence. The dataset contains 1m resolution grids derived from LiDAR data. Point clouds are not included in the initial release. All of the elevation files are Cloud Optimised GeoTIFFs using LERC compression for the main grid and LERC compression with lower max_z_error for the overviews. These elevation files are accompanied by ...

Usage examples

Underpinning Terroir with Data: Integrating Vineyard Performance Metrics with Soil and Climate Data to Better Understand Within-Region Variation in Marlborough, New Zealand by R. G. V. Bramley, J. Ouzman, A. P. Sturman, G. J. Grealish, C. E. M. Ratcliff and M. C. T. Trought
Browsing the s3://nz-elevation bucket by Toitū Te Whenua Land Information New Zealand
Stress Detection in New Zealand Kauri Canopies with WorldView-2 Satellite and LiDAR Data by Jane J. Meiforth, Henning Buddenbaum, Joachim Hill, James D. Shepherd and John R. Dymond
LINZ Topographic Workflows - Bulk elevation data processing with Kubernetes and Argo Workflows by Toitū Te Whenua Land Information New Zealand
Dairy farming exposure and impacts from coastal flooding and sea level rise in Aotearoa-New Zealand by Heather Craig, Alec Wild and Ryan Paulik

See 8 usage examples →

Northern California Earthquake Data

earth observation, earthquakes, seismology

This dataset contains various types of digital data relating to earthquakes in central and northern California. Time series data come from broadband, short period, and strong motion seismic sensors, GPS, and other geophysical sensors.

Usage examples

Open Access to Decades of NCSN Waveforms at the Northern California Earthquake Data Center by Doug Neuhauser; Fred Klein; Stephane Zuzlewski; David Oppenheimer; Gray Jensen
Robust Distributed Earthquake Monitoring with CISN software in Northern California by Doug Neuhauser; Stephane Zuzlewski; Peter Lombard; Lynn Dietz; Jim Luetgert
Improved Data Access From the Northern California Earthquake Data Center by Doug Neuhauser; Fred Klein; Stephane Zuzlewski; David Oppenheimer; Gray Jensen
The Northern California Earthquake Data Center: Seismic and Geophysical Data for Northern California and Beyond by Doug Neuhauser; Fred Klein; Stephane Zuzlewski; Lind Gee; David Oppenheimer
Collaborative Projects at the Northern California Earthquake Data Center (NCEDC) by Doug Neuhauser; David Oppenheimer; Stephane Zuzlewski; Lind Gee; Mark Murray

See 8 usage examples →

Open CEDA by Watershed

carbon, climate, EEIO, scope 3, spend-based models, supply chain

CEDA is a multi-regional Environmentally-Extended Input-Output (EEIO) model developed to support a wide range of environmental systems analyses, including corporate carbon accounting and sustainable spend analysis. CEDA provides unparalleled global coverage and granularity, representing 95% of the world's GDP across 148 countries and 400 sectors, enabling robust and geographically comprehensive Scope 3 greenhouse gas (GHG) measurement. Open CEDA is the publicly available version of CEDA, now easy to download and available for free for all use cases. For more information please visit our w...
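A spend-based Scope 3 estimate of the kind an EEIO model supports generally multiplies spend in each sector by that sector's emission factor and sums the results. The sketch below shows only that arithmetic; the sector names, the factors (in kg CO2e per USD), and the spend figures are hypothetical and are not taken from CEDA.

```python
from typing import Dict


def spend_based_emissions(spend_by_sector: Dict[str, float], factors: Dict[str, float]) -> float:
    """Sum spend * emission factor over sectors; raise if a sector has no factor."""
    total = 0.0
    for sector, spend in spend_by_sector.items():
        if sector not in factors:
            raise KeyError(f"no emission factor for sector: {sector}")
        total += spend * factors[sector]
    return total


# Hypothetical emission factors in kg CO2e per USD of spend.
example_factors = {"steel": 1.5, "software": 0.25}

# Hypothetical annual spend in USD.
example_spend = {"steel": 1000.0, "software": 4000.0}

total_kg_co2e = spend_based_emissions(example_spend, example_factors)
print(total_kg_co2e)
```

In practice the factors would also vary by country and year, so the lookup key would be wider than a sector name alone.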

Usage examples

A Consumption-Based Greenhouse Gas Inventory of San Francisco Bay Area Neighborhoods, Cities and Counties by Christopher M. Jones & Daniel M. Kammen, 2015
For a tutorial please download the CEDA Methodology Documentation on the openceda.org website. by Watershed Technology
Converting University Spending to Greenhouse Gas Emissions - A Supply Chain Carbon Footprint Analysis of UC Berkeley by Kelley L. Doyle, 2012
Are services better for climate change? by Sangwon Suh, 2006
Greening Government Procurement - Turning Uncle Sam Into an Eco-Friendly Consumer by James Badham, 2014

See 8 usage examples →

Radiant MLHub

cog, earth observation, environmental, geospatial, labeled, machine learning, satellite imagery, stac

Radiant MLHub is an open library for geospatial training data that hosts datasets generated by Radiant Earth Foundation's team as well as other training data catalogs contributed by Radiant Earth’s partners. Radiant MLHub is open to anyone to access, store, register and/or share their training datasets for high-quality Earth observations. All of the training datasets are stored using a SpatioTemporal Asset Catalog (STAC) compliant catalog and exposed through a common API. Training datasets include pairs of imagery and labels for different types of machine learning problems including image ...

Usage examples

See 8 usage examples →

Reference Elevation Model of Antarctica (REMA)

cog, earth observation, elevation, geospatial, mapping, open source software, satellite imagery, stac

The Reference Elevation Model of Antarctica - 2m GSD Digital Elevation Models (DEMs) and mosaics from 2009 to the present. The REMA project seeks to fill the need for high-resolution time-series elevation data in the Antarctic. The time-dependent nature of the strip DEM files allows users to perform change detection analysis and to compare observations of topography data acquired in different seasons or years. The mosaic DEM tiles are assembled from multiple strip DEMs with the intention of providing a more consistent and comprehensive product over large areas. REMA data is constructed from in...

Usage examples

REMA Explorer by Polar Geospatial Center & ESRI
Automatic relative RPC image model bias compensation through hierarchical image matching for improving DEM quality by Myoung-Jong Noh, Ian M. Howat
OpenTopography access to REMA by OpenTopography
The Reference Elevation Model of Antarctica by Ian M. Howat, Claire Porter, Benjamin E. Smith, Myoung-Jong Noh, Paul Morin
The surface extraction from TIN based search-space minimization (SETSM) algorithm by Myoung-Jong Noh, Ian M. Howat

See 8 usage examples →

Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription (TaRGET)

bioinformatics, biology, environmental, epigenomics, genetic, genomic, life sciences

The TaRGET (Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription) Program is a research consortium funded by the National Institute of Environmental Health Sciences (NIEHS). The goal of the collaboration is to address the role of environmental exposures in disease pathogenesis as a function of epigenome perturbation, including understanding the environmental control of epigenetic mechanisms and assessing the utility of surrogate tissue analysis in mouse models of disease-relevant environmental exposures.

Usage examples

Comparison of differential accessibility analysis strategies for ATAC-seq data by Gontarz P, Fu S, Xing X, Liu S, Miao B et al.
Finding and Downloading TaRGET II Data files by TaRGET-DCC
Epigenetic biomarkers and preterm birth by Park B, Khanam R, Vinayachandran V, et al.
Metabolic effects of air pollution exposure and reversibility by Rajagopalan S, Park B, Palanivel R, et al.
The role of environmental exposures and the epigenome in health and disease. by Perera BPU, Faulk C, Svoboda LK, Goodrich JM, Dolinoy DC.

See 8 usage examples →

U.S. Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure High Throughput Transcriptomics Data

bioinformatics, fastq, gene expression, transcriptomics

High-throughput transcriptomics (HTTr) data generated by US EPA Office of Research and Development, Center for Computational Toxicology and Exposure (CCTE), Biomolecular and Computational Toxicology Division. All data is generated using TempO-Seq targeted RNA-seq technology from in vitro cell culture systems.

Usage examples

High-Throughput Transcriptomics of Water Extracts Detects Reductions in Biological Activity with Water Treatment Processes by Rogers J., Leusch F., Chambers B., Daniels K., Everett L., Judson R., et al
High-Throughput Transcriptomics Screen of ToxCast Chemicals in U-2 OS Cells by Bundy J., Everett L., Rogers J., Nyffeler J., Byrd G., Culbreth M., et al
Exploring the Effects of Experimental Parameters and Data Modeling Approaches on In Vitro Transcriptomic Point-of-Departure Estimates by Harrill J., Everett L., Haggard D., Bundy J., Willis C., Shah I., et al
Signature analysis of high-throughput transcriptomics screening data for mechanistic inference and chemical grouping by Harrill J., Everett L., Haggard D., Word L., Bundy J., Chambers B., et al
HTTr pipeline by Logan Everett

See 8 usage examples →

ASTER L1T Cloud-Optimized GeoTIFFs

cog, earth observation, geospatial, mining, natural resource, satellite imagery, sustainability

The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data contains calibrated at-sensor radiance, which corresponds with the ASTER Level 1B (AST_L1B), that has been geometrically corrected, and rotated to a north-up UTM projection. The AST_L1T is created from a single resampling of the corresponding ASTER L1A (AST_L1A) product. The precision terrain correction process incorporates GLS2000 digital elevation data with derived ground control points (GCPs) to achieve topographic accuracy for all daytim...

Usage examples

ASTER L1T Algorithm Theoretical Basis by David Meyer, Dawn Siemonsma, Barbara Brooks, and Lowell Johnson
ASTER L1T Product Specification by USGS EROS Data Center
Latitude-Longitude to Path-Row conversion by USGS
EarthDaily EarthOne Platform by EarthDaily
Working with ASTER L1T Visible and Near Infrared (VNIR) Data in R by Cole Krehbiel

See 7 usage examples →

BossDB Open Neuroimagery Datasets

calcium imaging, electron microscopy, imaging, life sciences, light-sheet microscopy, magnetic resonance imaging, neuroimaging, neuroscience, volumetric imaging, x-ray, x-ray microtomography, x-ray tomography

This data ecosystem, Brain Observatory Storage Service & Database (BossDB), contains several neuro-imaging datasets across multiple modalities and scales, ranging from nanoscale (electron microscopy), to microscale (cleared lightsheet microscopy and array tomography), and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include dense segmentation and meshes.

Usage examples

A Community-Developed Open-Source Computational Ecosystem for Big Neuro Data by J. T. Vogelstein, E. Perlman, B. Falk, A. Baden, W. Gray Roncal, V. Chandrashekhar, F. Collman, S. Seshamani, J. L. Patsolic, K. Lillaney, M. Kazhdan, R. Hider, D. Pryor, J. Matelsky, T. Gion, P. Manavalan, B. Wester, M. Chevillet, E. T. Trautman, K. Khairy, E. Bridgeford, D. M. Kleissas, D. J. Tward, A. K. Crow, B. Hsueh, M. A. Wright, M. I. Miller, S. J. Smith, R. J. Vogelstein, K. Deisseroth, and R. Burns
The Block Object Storage Service (bossDB): A Cloud-Native Approach for Petascale Neuroscience Discovery by Robert Hider Jr., Dean M. Kleissas, Derek Pryor, Timothy Gion, Luis Rodriguez, Jordan Matelsky, William Gray-Roncal, Brock Wester
intern: Integrated Toolkit for Extensible and Reproducible Neuroscience by Jordan K Matelsky, Luis Rodriguez, Daniel Xenes, Timothy Gion, Robert Hider Jr., Brock Wester, William Gray-Roncal
Data access and download by Jordan Matelsky
Get To Know A Dataset - BossDB by BossDB Team

See 7 usage examples →

CIViC (Clinical Interpretation of Variants in Cancer)

cancer, genetic, genomic, life sciences, vcf

Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. In the context of cancer this might involve the identification of specific mutations shown to predict response to a targeted therapy. The biomedical literature describing these associations is large and growing rapidly. Currently these interpretations exist largely in private or encumbered databases resulting in extensive repetition of effort. Realizing precision medicine will require this information to be centralized, debated and interpret...

Usage examples

See 7 usage examples →

Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)

cancer, genomic, life sciences, STRIDES, transcriptomics

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. CPTAC-2 is the Phase II of the CPTAC Initiative (2011-2016). Datasets contain open RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantification, and miRNA Expression Quantification data.

Usage examples

Proteomic analysis of colon and rectal carcinoma using standard and customized databases by Slebos RJ, Wang X, Wang X, Zhang B, Tabb DL, Liebler DC
Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer by Hui Zhang, Tao Liu, Zhen Zhang, Samuel H. Payne, Bai Zhang, Jason E. McDermott, Jian-Ying Zhou, Vladislav A. Petyuk, Li Chen, Debjit Ray, Shisheng Sun, Feng Yang, Lijun Chen, Jing Wang, Punit Shah, Seong Won Cha, Paul Aiyetan, Sunghee Woo, Yuan Tian, Marina A. Gritsenko, Therese R. Clauss, Caitlin Choi, Matthew E. Monroe, Stefani Thomas, Song Nie, Chaochao Wu, Ronald J. Moore, Kun-Hsing Yu, David L. Tabb, David Fenyö, Vineet Bafna, Yue Wang, Henry Rodriguez, Emily S. Boja, Tara Hiltke, Robert C. Rivers, Lori Sokoll, Heng Zhu, Ie-Ming Shih, Leslie Cope, Akhilesh Pandey, Bing Zhang, Michael P. Snyder, Douglas A. Levine, Richard D. Smith, Daniel W. Chan, Karin D. Rodland, the CPTAC Investigators
Proteomic Data Commons by National Cancer Institute
CPTAC Data Portal by National Cancer Institute
Genomic Data Commons by National Cancer Institute

See 7 usage examples →

Coupled Model Intercomparison Project 6

agriculture, atmosphere, climate, earth observation, environmental, model, oceans, simulations, weather

The sixth phase of global coupled ocean-atmosphere general circulation model ensemble.

Usage examples

Comparing CMIP6 Zarr vs NetCDF Holdings by Aparna Radhakrishnan
CMIP6 Data Informational Session Video Recording by ASDI
Getting Started with CMIP6 Data by Aparna Radhakrishnan
Analyze terabyte-scale geospatial datasets with Dask and Jupyter on AWS by Ethan Fahy and Zac Flamig
Finding CMIP6 data using intake-esm and plotting time series for points by Zac Flamig

See 7 usage examples →

Earth Observation Data Cubes for Brazil

cog, earth observation, geoscience, geospatial, image processing, open source software, satellite imagery, stac

Earth observation (EO) data cubes produced from analysis-ready data (ARD) of CBERS-4, Sentinel-2 A/B and Landsat-8 satellite images for Brazil. The datacubes are regular in time and use a hierarchical tiling system. Further details are described in Ferreira et al. (2020).

Usage examples

Using Python to Access Image Collections Data (Jupyter notebook) by Brazil Data Cube
Tile Map Service to view Image Collections from Brazil Data Cube Catalog by Brazil Data Cube
Earth Observation Data Cubes for Brazil: Requirements, Methodology and Products. by K. R. Ferreira, et al.
rstac - R library to query and download Image Collections from Brazil Data Cube Catalog on Amazon S3 by Brazil Data Cube
Building Earth Observation Data Cubes on AWS by Ferreira, K R; Queiroz, G R; Marujo, R F B; Costa, R W.

See 7 usage examples →

GEOS-Chem Input Data

air quality, atmosphere, chemistry, climate, environmental, meteorological, model, weather

Input data for the GEOS-Chem Chemical Transport Model, includes NASA/GMAO MERRA-2 and GEOS-FP meteorological products, chemistry input data, emissions input data, and other smaller datasets such as model initial conditions.

Usage examples

See 7 usage examples →

GEOS-Chem Nested Input Data

air quality, atmosphere, chemistry, climate, environmental, meteorological, model, weather

Input data for nested-grid simulations using the GEOS-Chem Chemical Transport Model. This includes the NASA/GMAO MERRA-2 and GEOS-FP meteorological products, the HEMCO emission inventories, and other small data such as model initial conditions.

Usage examples

See 7 usage examples →

Global Database of Events, Language and Tone (GDELT)

disaster response, events

This project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, emotions, quotes, images and events driving our global society every second of every day.

Usage examples

See 7 usage examples →

IBL Neuropixels Reproducible Ephys Data on AWS

life sciences, Mus musculus, neurophysiology, neuroscience, open source software

Electrophysiological recordings acquired using Neuropixels probes in different mice and labs, targeting the same brain locations (including posterior parietal cortex, hippocampus, and thalamus).

Usage examples

See 7 usage examples →

ICGC on AWS

bam, cancer, genetic, genomic, life sciences, vcf

The International Cancer Genome Consortium (ICGC) coordinates projects with the common aim of accelerating research into the causes and control of cancer. The PanCancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in whole genomes from ICGC. More than 2,400 consistently analyzed genomes corresponding to over 1,100 unique ICGC donors are now freely available on Amazon S3 to credentialed researchers subject to ICGC data sharing policies.

Usage examples

See 7 usage examples →

Materials Project Data

chemistry, cloud computing, data assimilation, digital assets, digital preservation, energy, environmental, free software, genome, HPC, information retrieval, infrastructure, json, machine learning, materials science, molecular dynamics, molecule, open source software, physics, post-processing, x-ray crystallography

Materials Project is an open database of computed materials properties aiming to accelerate materials science research. The resources in this OpenData dataset contain the raw, parsed, and build data products.

Usage examples

Atomate2 by Materials Project
MP API python client by Materials Project
The Materials Project: A materials genome approach to accelerating materials innovation by Jain A., Ong S.P., Hautier G., et al.
FireWorks by Materials Project
Getting Started with the Materials API (MAPI) by Materials Project

See 7 usage examples →

NOAA National Water Model CONUS Retrospective Dataset

agriculture, climate, disaster response, environmental, transportation, weather

The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those contained in the real-time operational NWM forecast model. Additionally, note that no streamflow or other data assimilation is performed within any of the NWM retrospective simulations.

One application of this dataset is to provide historical context to current near real-time streamflow, soil moisture and snowpack conditions. The retrospective data can be used to infer flow frequencies and perform tempor...

Usage examples

Explore Repository of Tutorials on National Water Model V2.1 Retrospective Dataset in Zarr by James McCreight
Explore the National Water Model V2.1 Retrospective Dataset in Zarr by James McCreight, Ishita Srivastava, Rich Signell
Processing the 250 TB NWM dataset with Coiled, Dask, and Xarray by Sarah Johnson (Coiled)
Simulating storm surge and compound flooding events with a creek-to-ocean model: Importance of baroclinic effects by Fei Ye, et al.
NOAA's National Water Model: Advancing Operational Hydrology Through Continental-scale Modeling. by Brian Cosgrove, David Gochis, Trey Flowers, Aubrey Dugger, Fred Ogden, Tom Graziano, Ed Clark, et al; 2024

See 7 usage examples →

OpenAQ

air quality, cities, environmental, geospatial

Global, aggregated physical air quality data from public data sources provided by government, research-grade and other sources. These awesome groups do the hard work of measuring these data and publicly sharing them, and our community makes them more universally-accessible to both humans and machines.

Usage examples

See 7 usage examples →

OpenAerialMap on AWS

aerial imagery, cog, disaster response, earth observation, satellite imagery

OpenAerialMap is a collection of high-resolution openly licensed satellite and aerial imagery.

Usage examples

See 10 usage examples →

Scottish Public Sector LiDAR Dataset

cities, coastal, cog, elevation, environmental, lidar, urban

This dataset is Lidar data that has been collected by the Scottish public sector and made available under the Open Government Licence. The data are available as point cloud (LAS format or in LAZ compressed format), along with the derived Digital Terrain Model (DTM) and Digital Surface Model (DSM) products as Cloud optimized GeoTIFFs (COG) or standard GeoTIFF. The dataset contains multiple subsets of data which were each commissioned and flown in response to different organisational requirements. The details of each can be found at https://remotesensingdata.gov.scot/data#/list

Usage examples

Towards National Archaeological Mapping. Assessing Source Data and Methodology - A Case Study from Scotland by Łukasz Banaszek, Dave Cowley and Mike Middleton
New light on medieval settlement in lowland Scotland by Dave Cowley and Piers Dixon
Scottish Remote Sensing Portal by Scottish Government and Joint Nature Conservation Committee (JNCC)
LiDAR Tutorial using R by Michal Michalski
Making LiGHT Work of Large Area Survey? Developing Approaches to Rapid Archaeological Mapping and the Creation of Systematic National-scaled Heritage Data by Dave Cowley, Łukasz Banaszek, George Geddes, Angela Gannon, Mike Middleton and Kirsty Millican

See 7 usage examples →

SnpEff & SnpSift Genomic Variant Annotation Databases

bioinformatics, cancer, genetic, genome, genomic, life sciences, protein, structural variation, transcriptomics, variant annotation, vcf, whole exome sequencing, whole genome sequencing

SnpEff is a variant annotation and effect prediction tool that annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes). It supports over 38,000 genomes and provides comprehensive genomic databases for variant annotation. The databases include reference genomes, gene annotations, protein sequences, and regulatory elements from trusted sources like ENSEMBL, RefSeq, and UCSC. SnpSift complements SnpEff by providing tools to annotate genomic variants using databases, filter large genomic datasets, and manipulate annotated variants. Together, these ...

Usage examples

See 7 usage examples →

nuPlan

autonomous vehicles, lidar, robotics, transportation, urban

nuPlan is the world's first large-scale planning benchmark for autonomous driving.

Usage examples

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles by Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, Sammy Omari
nuPlan Advanced Model Training by Motional
nuPlan Framework by Motional
nuPlan devkit by Motional
nuPlan Planner Tutorial by Motional

See 7 usage examples →

10m Annual Land Use Land Cover (9-class)

cog, earth observation, environmental, geospatial, land cover, land use, machine learning, mapping, planetary, satellite imagery, stac, sustainability

This dataset, produced by Impact Observatory, Microsoft, and Esri, displays a global map of land use and land cover (LULC) derived from ESA Sentinel-2 imagery at 10 meter resolution for the years 2017 - 2023. Each map is a composite of LULC predictions for 9 classes throughout the year in order to generate a representative snapshot of each year. This dataset was generated by Impact Observatory, which used billions of human-labeled pixels (curated by the National Geographic Society) to train a deep learning model for land classification. Each global map was produced by applying this model to ...

Usage examples

‘Very Dire’: Devastated by Floods, Pakistan Faces Looming Food Crisis by New York Times
These maps from satellite data show how much Earth has changed in only five years by Fast Company
Global land use / land cover with Sentinel 2 and deep learning by K. Karra, C. Kontgis, Z. Statman-Weil, J. C. Mazzariello, M. Mathis and S. P. Brumby
View the dataset on UN Biodiversity Lab Map Viewer by United Nations Development Programme
The world's most populated and greenest megacities (and how we found out) by Esri

See 6 usage examples →

AIRS/Aqua L1C Infrared (IR) resampled and corrected radiances V6.7 (AIRICRAD) at GES DISC

atmosphere, climate, datacenter, earth observation, global, hdf, metadata, opendap, orbit

The Atmospheric Infrared Sounder (AIRS) is a grating spectrometer (R = 1200) aboard the second Earth Observing System (EOS) polar-orbiting platform, EOS Aqua. In combination with the Advanced Microwave Sounding Unit (AMSU) and the Humidity Sounder for Brazil (HSB), AIRS constitutes an innovative atmospheric sounding group of visible, infrared, and microwave sensors. The AIRS Infrared (IR) level 1C data set contains AIRS infrared calibrated and geolocated radiances in W/m2/micron/ster. This data set is generated from AIRS level 1B data. The spectral coverage of L1C data is from 3.74 to 15.4 µm....

Usage examples

Validation of the Atmospheric Infrared Sounder radiative transfer algorithm by Strow, L.L, Hannon, S.E, De-Souza Machado, S., Motteler, H.E., and Tobin, D.C.
AIRS version 6.6 and version 7 level-1C products by Evan M. Manning, L. Larrabee Strow, and Hartmut H. Aumann
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
AIRS Level-1C and applications to cross-calibration with MODIS and CrIS by Evan M. Manning, Hartmut H. Aumann, and Ali Behrangi
Radiometric Stability Validation of 17 Years of AIRS Data Using Sea Surface Temperatures by Aumann, H.H., Broberg, S., Manning, E., and Pagano, T.

See 6 usage examples →

Argoverse

autonomous vehicles, computer vision, geospatial, lidar, robotics

Home of the Argoverse datasets. Public datasets supported by detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them. This bucket includes the following datasets:

Argoverse 1 (AV1)

Motion Forecasting
Tracking

Argoverse 2 (AV2)

Motion Forecasting
Lidar
Sensor

Trust, but Verify (TbV)

Map Change Detection

Usage examples

conda-forge package for `av2` by Argoverse Authors
Argoverse: 3D Tracking and Forecasting With Rich Maps by Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, James Hays
PyPi package for `av2` by Argoverse Authors
Argoverse 2 API by Argoverse Authors
Trust, but Verify: Cross-Modality Fusion for HD Map Change Detection by John Lambert, James Hays

See 6 usage examples →

Capella Space Synthetic Aperture Radar (SAR) Open Dataset

cog, computer vision, earth observation, geospatial, image processing, satellite imagery, stac, synthetic aperture radar

Open Synthetic Aperture Radar (SAR) data from Capella Space. Capella Space is an information services company that provides on-demand, industry-leading, high-resolution synthetic aperture radar (SAR) Earth observation imagery. Through a constellation of small satellites, Capella provides easy access to frequent, timely, and flexible information affecting dozens of industries worldwide. Capella's high-resolution SAR satellites are matched with unparalleled infrastructure to deliver reliable global insights that sharpen our understanding of the changing world – improving decisions ...

Usage examples

Scaling GEO Images in QGIS by Capella Space
Analyzing LiDAR and SAR data with Capella Space and TileDB by Stavros Papadopoulos
Single Look Complex data reader for Capella SLC images - python module to convert Capella SLC data into an amplitude image. by Capella Space
Radar Generalized Image Quality Equation Applied to Capella Open Dataset by Wade Schwartzkopf, Jason Brown, Gordon Farquharson, Craig Stringham, Michael Duersch, Jordan Heemskerk
Python SDK for api.capellaspace.com by Capella Space

See 6 usage examples →

Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)

cancer, genomic, life sciences, STRIDES, transcriptomics

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. CPTAC-3 is the Phase III of the CPTAC Initiative. The dataset contains open RNA-Seq Gene Expression Quantification data.

Usage examples

Genomic Data Commons by National Cancer Institute
CPTAC Data Portal by National Cancer Institute
Integrated Proteogenomic Characterization of Clear Cell Renal Cell Carcinoma by Clark DJ, Dhanasekaran SM, Petralia F, Pan J, Song X, Hu Y, da Veiga Leprevost F, Reva B, Lih TM, Chang HY, Ma W, Huang C, Ricketts CJ, Chen L, Krek A, Li Y, Rykunov D, Li QK, Chen LS, Ozbek U, Vasaikar S, Wu Y, Yoo S, Chowdhury S, Wyczalkowski MA, Ji J, Schnaubelt M, Kong A, Sethuraman S, Avtonomov DM, Ao M, Colaprico A, Cao S, Cho KC, Kalayci S, Ma S, Liu W, Ruggles K, Calinawan A, Gümüş ZH, Geizler D, Kawaler E, Teo GC, Wen B, Zhang Y, Keegan S, Li K, Chen F, Edwards N, Pierorazio PM, Chen XS, Pavlovich CP, Hakimi AA, Brominski G, Hsieh JJ, Antczak A, Omelchenko T, Lubinski J, Wiznerowicz M, Linehan WM, Kinsinger CR, Thiagarajan M, Boja ES, Mesri M, Hiltke T, Robles AI, Rodriguez H, Qian J, Fenyö D, Zhang B, Ding L, Schadt E, Chinnaiyan AM, Zhang Z, Omenn GS, Cieslik M, Chan DW, Nesvizhskii AI, Wang P, Zhang H; Clinical Proteomic Tumor Analysis Consortium
Evaluation of NCI-7 Cell Line Panel as a Reference Material for Clinical Proteomics by Clark DJ, Hu Y, Bocik W, Chen L, Schnaubelt M, Roberts R, Shah P, Whiteley G, Zhang H
Proteomic Data Commons by National Cancer Institute

See 6 usage examples →

CryoET Data Portal

Biohub, cell biology, cryo electron tomography, electron tomography, life sciences, machine learning, segmentation, structural biology

Cryo-electron tomography (cryoET) is a powerful technique for visualizing 3D structures of cellular macromolecules at near atomic resolution in their native environment. Observing the inner workings of cells in context enables better understanding about the function of healthy cells and the changes associated with disease. However, the analysis of cryoET data remains a significant bottleneck, particularly the annotation of macromolecules within a set of tomograms, which often requires a laborious and time-consuming process of manual labelling that can take months to complete. Given the current...

Usage examples

See 6 usage examples →

DCR Office of Resilience Planning – Public File Repository

coastal, floods

The Virginia Department of Conservation and Recreation’s Office of Resilience Planning maintains this public file repository to provide access to flood resilience open data products. The repository is designed to house public data produced for the Virginia Coastal Resilience Master Plan (CRMP), Virginia Flood Protection Master Plan (VFPMP), and other purposes. At present, the repository hosts only data products produced for the CRMP Phase II (2025) and Phase I (2021).

Usage examples

See 6 usage examples →

DE Africa Waterbodies Monitoring Service

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, water

The Digital Earth Africa continental Waterbodies Monitoring Service identifies more than 700,000 water bodies from over three decades of satellite observations. This service maps persistent and seasonal water bodies and the change in their water surface area over time. Mapped water bodies may include, but are not limited to, lakes, ponds, man-made reservoirs, wetlands, and segments of some river systems. On a local, regional, and continental scale, this service helps improve our understanding of surface water dynamics and water availability and can be used for monitoring water bodies such as we...

Usage examples

Waterbodies Monitoring Service by Digital Earth Africa Contributors
Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by Digital Earth Africa Contributors
Digital Earth Africa Sandbox by Digital Earth Africa Contributors

See 6 usage examples →

ESM Atlas — Protein Features and Structures

Biohub, bioinformatics, life sciences, machine learning, metagenomics, protein, structural biology

The ESM Atlas is a large-scale public dataset of computational outputs generated by ESMC and ESMFold2, derived from a deduplicated set of over 6.8 billion publicly available protein sequences spanning all domains of life, including viral proteins and previously unannotated sequences representing metagenomic dark matter sampled from a wide range of biomes. The dataset includes two primary components: sparse autoencoder (SAE) features for ~6.8 billion proteins, capturing interpretable biological representations from the ESMC 6B model, and predicted three-dimensional protein structures for ~1....

Usage examples

See 6 usage examples →

EarthDEM

cog, earth observation, elevation, geospatial, mapping, open source software, satellite imagery, stac

EarthDEM - 2m GSD Digital Elevation Models (DEMs) and mosaics from 2002 to the present. The EarthDEM project seeks to fill the need for high-resolution time-series elevation data in non-polar regions. The time-dependent nature of the strip DEM files allows users to perform change detection analysis and to compare observations of topography data acquired in different seasons or years. The mosaic DEM tiles are assembled from multiple strip DEMs with the intention of providing a more consistent and comprehensive product over large areas. EarthDEM data is constructed from in-track and cross-track ...

Usage examples

NASA CSDA SmallSat Data Explorer by NASA Commercial Smallsat Acquisition Program
Automated stereo-photogrammetric DEM generation at high latitudes: Surface Extraction with TIN-based Search-space Minimization (SETSM) validation and demonstration over glaciated regions by Myoung-Jong Noh, Ian M. Howat
The surface extraction from TIN based search-space minimization (SETSM) algorithm by Myoung-Jong Noh, Ian M. Howat
Multi-Source EO for Dynamic Wetland Mapping and Monitoring in the Great Lakes Basin by Michael J. Battaglia, Sarah Banks, Amir Behnamian, Laura Bourgeau-Chavez, Brian Brisco, Jennifer Corcoran, Zhaohua Chen, Brian Huberty, James Klassen, Joseph Knight, Paul Morin, Kevin Murnaghan, Keith Pelletier, Lori White
GLARS Data Viewer by Great Lakes Alliance for Remote Sensing

See 6 usage examples →

GeoJSON Files for Geo-TIDE

electricity, energy, environmental, geospatial, supply chain, sustainability, transportation

GeoJSON files for the MIT Climate & Sustainability Consortium's Geospatial Trucking Industry Decarbonization Explorer

Usage examples

See 6 usage examples →

HYCOM-OceanTrack Integrated HYCOM Eulerian Fields and Lagrangian Trajectories Dataset

drifters, Eulerian, HYCOM, Lagrangian, numerical particle, ocean circulation, ocean currents, ocean sea surface height, ocean simulation, ocean velocity, oceans

A combined dataset of simulated ocean sea surface height, near-surface velocities, and particle trajectories from a global 1/25th degree HYbrid Coordinate Ocean Model (HYCOM) 1-year run.

Usage examples

See 6 usage examples →

HYbrid Coordinate Ocean Model Global Ocean Forecast System Reanalysis

global, oceans

Global Ocean Forecasting System (GOFS) 3.1 output on the GLBv0.08 grid. The resolution is 0.08° between 40°S and 40°N and 0.04° poleward of these latitudes. The temporal frequency is 3-hourly. This data was created by the Naval Research Laboratory: Ocean Dynamics and Prediction Branch.
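As a rough illustration of what these angular resolutions mean on the ground, the sketch below converts the grid step to an approximate north-south distance. It takes the description at face value (0.08° within 40°S to 40°N, 0.04° poleward); treating the boundary at exactly 40° as part of the coarser band, the spherical-Earth radius, and the function names are assumptions made for this example and are not part of the dataset documentation.

```python
import math

# Mean Earth radius in km; a spherical-Earth assumption for a rough estimate.
EARTH_RADIUS_KM = 6371.0


def grid_step_deg(lat_deg: float) -> float:
    """Grid step from the description: 0.08 deg within 40S-40N, 0.04 deg poleward.

    Treating |lat| == 40 as part of the 0.08 deg band is an assumption.
    """
    return 0.08 if abs(lat_deg) <= 40.0 else 0.04


def meridional_spacing_km(lat_deg: float) -> float:
    """Approximate north-south distance spanned by one grid step (arc length)."""
    return math.radians(grid_step_deg(lat_deg)) * EARTH_RADIUS_KM


if __name__ == "__main__":
    for lat in (0.0, 30.0, 60.0):
        print(f"lat {lat:5.1f}: ~{meridional_spacing_km(lat):.2f} km per grid step")
```

Under these assumptions one grid step is roughly 9 km in the tropics and mid-latitudes and roughly 4.5 km poleward of 40°.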

Usage examples

HYCOM-tools repository by Dr. Alan Wallcraft
Variational Data Assimilation for the Global Ocean by James A. Cummings, Ole Martin Smedstad
Validation Test Report for the Improved Synthetic Ocean Profile (ISOP) System, Part I: Synthetic Profile Methods and Algorithm by NAVAL RESEARCH LAB STENNIS DETACHMENT STENNIS SPACE CENTER MS OCEANOGRAPHY DIV
Operational multivariate ocean data assimilation by James A. Cummings
HYCOM-examples repository by Dr. Alexandra Bozec

See 6 usage examples →

IBL Behavioral Data on AWS

life sciences, Mus musculus, neurophysiology, neuroscience, open source software

Behavioral data of mice performing a decision-making task, associated with the 2020 publication of the IBL.

Usage examples

See 6 usage examples →

NOAA Rapid Refresh Forecast System (RRFS) [Prototype]

agriculture, climate, meteorological, weather

The Rapid Refresh Forecast System (RRFS) is the National Oceanic and Atmospheric Administration’s (NOAA) next generation convection-allowing, rapidly-updated ensemble prediction system, currently scheduled for operational implementation in 2026. The operational configuration will feature a 3 km grid covering North America and include deterministic forecasts every hour out to 18 hours, with deterministic and ensemble forecasts to 60 hours four times per day at 00, 06, 12, and 18 UTC. The RRFS will provide guidance to support forecast interests including, but not limited to, aviation, severe convective weather, renewable energy, heavy precipitation, and winter weather on timescales where rapidly-updated guidance is particularly useful....
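The cycle schedule described above can be sketched as a small lookup. This is only an illustration of the planned operational configuration as stated in the description (hourly deterministic runs to 18 hours; deterministic and ensemble runs to 60 hours at 00, 06, 12, and 18 UTC); the function and constant names are hypothetical and do not come from any RRFS software.

```python
# Cycles (UTC hours) that, per the description, run out to 60 hours
# and include ensemble forecasts. The names here are hypothetical.
EXTENDED_CYCLES_UTC = (0, 6, 12, 18)


def forecast_length_hours(cycle_hour_utc: int) -> int:
    """Forecast length for a given hourly cycle, following the description."""
    if not 0 <= cycle_hour_utc <= 23:
        raise ValueError("cycle hour must be in 0..23")
    return 60 if cycle_hour_utc in EXTENDED_CYCLES_UTC else 18


def has_ensemble(cycle_hour_utc: int) -> bool:
    """True for the four daily cycles described as including ensemble forecasts."""
    if not 0 <= cycle_hour_utc <= 23:
        raise ValueError("cycle hour must be in 0..23")
    return cycle_hour_utc in EXTENDED_CYCLES_UTC


if __name__ == "__main__":
    for hour in range(24):
        kind = "deterministic + ensemble" if has_ensemble(hour) else "deterministic"
        print(f"{hour:02d} UTC: {forecast_length_hours(hour)} h, {kind}")
```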

Usage examples

A Limited Area Modeling Capability for the Finite-Volume Cubed-Sphere (FV3) Dynamical Core and Comparison With a Global Two-Way Nest by Black, T. L., J. A. Abeles, B. T. Blake, D. Jovic, E. Rogers, X. Zhang, E. A. Aligo, L. C. Dawson, Y. Lin, E. Strobach, P. C. Shafran, and J. R. Carley
Community modeling framework underpinning the RRFS - The UFS Short Range Weather Application by UFS Community
Prototype UFS-Based Rapid Refresh Forecast System (RRFS) on the Cloud by Holt, C., D. Abdi, J. A. Abeles, J. R. Carley, C. W. Harrop, R. Panda, S. Trahan, and C. R. Alexander
Status and Opportunities with the Rapid Refresh Forecast System by Carley J. R. and C. R. Alexander
Assessment of the data assimilation framework for the Rapid Refresh Forecast System v0.1 and impacts on forecasts of a convective storm case study by Banos, I. H., W. D. Mayfield, G. Ge, L. F. Sapucci, J. R. Carley, and L. Nance

See 6 usage examples →

NYU Langone & FAIR FastMRI Dataset

biology, health, image processing, imaging, life sciences, magnetic resonance imaging, neurobiology, neuroimaging

This dataset contains deidentified raw k-space data and DICOM image files of over 1,500 knees and 6,970 brains.

Usage examples

See 6 usage examples →

New York City Taxi and Limousine Commission (TLC) Trip Record Data

cities, transportation, urban

Data of trips taken by taxis and for-hire vehicles in New York City. Note: access to this dataset is free; however, direct S3 access requires an AWS account. Anonymous downloads are accessible from the dataset's documentation webpage listed below.

Usage examples

Build a Real-time Stream Processing Pipeline with Apache Flink on AWS by Steffen Hausmann
Optimizing data for analysis with Amazon Athena and AWS Glue by Manav Sehgal
Exploring data with Python and Amazon S3 Select by Manav Sehgal
Machine learning on distributed Dask using Amazon SageMaker and AWS Fargate by Ram Vittal
Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications by Steffen Hausmann

See 6 usage examples →

Open Bioinformatics Reference Data for Galaxy

bioinformatics, biology, genetic, genomic, life sciences, reference index

This dataset provides genomic reference data and software packages for use with Galaxy and Bioconductor applications. The reference data is available for hundreds of reference genomes and has been formatted for use with a variety of tools. The available configuration files make this data easily incorporable with a local Galaxy server without additional data preparation. Additionally, Bioconductor's AnnotationHub and ExperimentHub data are provided for use via R packag...

Usage examples

Accessible, curated metagenomic data through ExperimentHub by Edoardo Pasolli, Lucas Schiffer, Paolo Manghi, Audrey Renson, Valerie Obenchain, Duy Tin Truong, Francesco Beghini, Faizan Malik, Marcel Ramos, Jennifer B Dowd, Curtis Huttenhower, Martin Morgan, Nicola Segata, and Levi Waldron
Bioconductor by Bioconductor Project
Galaxy by Galaxy Project
Using Open Bio Ref Data with Galaxy and Bioconductor by Enis Afgan, Alexandru Mahmoud, Nuwan Goonasekera
TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages by Tiago C. Silva, Antonio Colaprico, Catharina Olsen, Fulvio D'Angelo, Gianluca Bontempi, Michele Ceccarelli, Houtan Noushmehr

See 6 usage examples →

OpenEEW

disaster response, earth observation, earthquakes

Grillo has developed an IoT-based earthquake early-warning system, with sensors currently deployed in Mexico, Chile, Puerto Rico and Costa Rica, and is now opening its entire archive of unprocessed accelerometer data to the world to encourage the development of new algorithms capable of rapidly detecting and characterizing earthquakes in real time.

Usage examples

OpenEEW library for Python by Grillo
NepalEEW: Testing the feasibility of an Earthquake Early Warning System in Nepal by Vaclav M. Kuna
Detect earthquakes in Chile in real time — from anywhere in the world by Michael Allman
Evaluation of the Grillo sensor, a low-cost accelerometer for IoT-based Real-time seismology by Vaclav Matej Kuna, Diego Melgar, Andres Meira
How Grillo Built a Low-Cost Earthquake Early Warning System on AWS by Marcia Villalba

See 6 usage examples →

Pacific Ocean Sound Recordings

acoustics, biodiversity, biology, climate, coastal, deep learning, ecosystems, environmental, machine learning, marine mammals, oceans, open source software

This project offers passive acoustic data (sound recordings) from a deep-ocean environment off central California. Recording began in July 2015, has been nearly continuous, and is ongoing. These resources are intended for applications in ocean soundscape research, education, and the arts.

Usage examples

New Passive Acoustic Monitoring in Monterey Bay National Marine Sanctuary by Ryan et al. (2016)
Seal bomb noise as a potential threat to Monterey Bay harbor porpoise by Simonis et al. (2020)
Humpback whale song occurrence reflects ecosystem variability in feeding and migratory habitat of the northeast Pacific by Ryan et al. (2019)
Animal-borne metrics enable acoustic detection of blue whale migration by Oestreich et al. (2020)
Reduction of Low-Frequency Vessel Noise in Monterey Bay National Marine Sanctuary During the COVID-19 Pandemic by Ryan et al. (2021)

See 6 usage examples →

Planette C3S Seasonal Forecast Data

climate, earth observation, weather

The C3S seasonal forecast dataset provides global, daily, probabilistic forecasts of the Earth system, enabling users to assess the likelihood of future climate states. These forecasts are particularly valuable for studying slowly evolving climate patterns such as El Niño, La Niña, and the North Atlantic Oscillation (NAO), which can be predicted with greater skill than the chaotic atmosphere. This dataset is derived from the Copernicus Climate Change Service (C3S) archive and includes SEAS5 hindcasts (1981-2016) and forecasts (2017-present) at 1°x1° global resolution. More models from the...

Usage examples

xarray by xarray Developers
zarr-python by zarr Developers
C3S Seasonal Forecasts Documentation by Copernicus Climate Change Service
SEAS5: The new ECMWF seasonal forecast system by Johnson, S. J., et al.
Accessing C3S Seasonal Forecast Data with Python by Aodhan Sweeney-Jaramillo

See 6 usage examples →

Planette ERA5 Archive

climate, earth observation, weather

The ERA5 archive provides a comprehensive record of global weather and climate from 1940 to present, with multiple temporal aggregations for flexible analysis. This dataset is derived from the ECMWF/Copernicus ERA5 reanalysis and includes daily means, 7-day rolling means, and monthly/seasonal aggregations at 0.25°×0.25° global resolution. The Planette ERA5 archive stores this data in cloud-native format (Zarr with icechunk) for efficient access and analysis. The dataset includes essential atmospheric variables at both surface and pressure levels, enabling a wide range of climate analyses, ...

Usage examples

The ERA5 global reanalysis by Hersbach, H., et al.
ERA5 Documentation by ECMWF
icechunk by earth-mover
zarr-python by zarr Developers
xarray by xarray Developers

See 6 usage examples →

PoroTomo

geospatial, geothermal, image processing, seismology

Released to the public as part of the Department of Energy's Open Energy Data Initiative, these data represent vertical and horizontal distributed acoustic sensing (DAS) data collected as part of the Poroelastic Tomography (PoroTomo) project funded in part by the Office of Energy Efficiency and Renewable Energy (EERE), U.S. Department of Energy.

Usage examples

PoroTomo DAS Data Processing Tutorial for hdf5 Files by Nicole Taverna and Michael Rossol
Ground motion response to an ML 4.3 earthquake using co-located distributed acoustic sensing and seismometer arrays by Herbert F Wang, Xiangfang Zeng, Douglas E Miller, Dante Fratta, Kurt L Feigl, Clifford H Thurber, Robert J Mellors
DAS and DTS at Brady Hot Springs: Observations about Coupling and Coupled Interpretations by Douglas E. Miller, Thomas Coleman, Xiangfang Zeng, Jeremy R. Patterson, Elena C. Reinnisch, Michael A. Cardiff, Herbert F. Wang, Dante Fratta, Whitney Trainor-Guitton, Clifford H. Thurber, Michelle Robertson, Kurt Feigl, and The PoroTomo Team
PoroTomo DAS Data Processing Tutorial for hdf5 Files via HSDS and h5pyd by Michael Rossol and Nicole Taverna
PoroTomo DAS Data Processing Tutorial for SEG-Y Files by Nicole Taverna and Ross Ring-Jarvi

See 6 usage examples →

Public Utility Data Liberation Project

economics, electricity, energy, energy modeling, environmental, geospatial, government records, industrial, industry, infrastructure, market data, parquet, regulatory, solar, sqlite, us, utilities

The Public Utility Data Liberation Project (PUDL) provides analysis-ready U.S. energy system data in bulk for programmatic use. Sources include the U.S. Energy Information Administration (EIA), the Environmental Protection Agency (EPA), the Federal Energy Regulatory Commission (FERC), the Pipeline and Hazardous Materials Safety Administration (PHMSA), the Securities and Exchange Commission (SEC). The primary focus is on the electricity sector, with additional data on the natural gas system and energy company financial reporting.

Usage examples

See 6 usage examples →

RCM CEOS Analysis Ready Data | Données prêtes à l'analyse du CEOS pour le MCR

agriculture, analysis ready data, ceos, disaster response, earth observation, geospatial, satellite imagery, stac, sustainability, synthetic aperture radar

The RADARSAT Constellation Mission (RCM) is Canada's third generation of Earth observation satellites. Launched on June 12, 2019, the three identical satellites work together to bring solutions to key challenges for Canadians. As part of ongoing Open Government efforts, NRCan produces CEOS analysis ready data (ARD) of Canada's landmass using a 30 m Compact-Polarization standard coverage, every 12 days. RCM CEOS-ARD (POL) is the first ever polarimetric dataset approved by the CEOS committee. Previously, users were stuck ordering, downloading and processing RCM images (level 1) on their own, often wit...

Usage examples

See 6 usage examples →

RarePlanes

computer vision, deep learning, earth observation, geospatial, labeled, machine learning, satellite imagery

RarePlanes is a unique open-source machine learning dataset from CosmiQ Works and AI.Reverie that incorporates both real and synthetically generated satellite imagery. The RarePlanes dataset specifically focuses on the value of AI.Reverie synthetic data to aid computer vision algorithms in their ability to automatically detect aircraft and their attributes in satellite imagery. Although other synthetic/real combination datasets exist, RarePlanes is the largest openly-available very high resolution dataset built to test the value of synthetic data from an overhead perspective. The real portion ...

Usage examples

Notebook for training and testing YOLTv4 by Adam Van Etten
RarePlanes Codebase by Thomas Hossler and Jacob Shermeyer
Getting Started with YOLTv4 for Object Detection in Imagery: Getting Training Data by Sophia Parafina
RarePlanes: Synthetic Data Takes Flight by Jacob Shermeyer, Thomas Hossler, Adam Van Etten, Daniel Hogan, Ryan Lewis, Daeil Kim
Announcing YOLTv4: Improved Satellite Imagery Object Detection by Adam Van Etten

See 6 usage examples →

Sentinel-1 Monthly Mosaic

agriculture, cog, deafrica, disaster response, earth observation, geospatial, natural resource, satellite imagery, stac, synthetic aperture radar

Synthetic Aperture Radar (SAR) sensors have the advantage of operating at wavelengths not impeded by cloud cover and can acquire data over a site during the day or night. The Sentinel-1 mission, part of the Copernicus joint initiative by the European Commission (EC) and the European Space Agency (ESA), provides reliable and repeated wide-area monitoring using its SAR instrument. Sentinel-1 Monthly Mosaics are an analysis-ready product of individual Sentinel-1 acquisitions. Sentinel-1 monthly mosaics are generated from Radiometric Terrain Corrected (RTC) backscatter data, with variations from changi...

Usage examples

Digital Earth Africa Notebook Repo by Digital Earth Africa Contributors
Digital Earth Africa Explorer by Digital Earth Africa Contributors
Digital Earth Africa web services by Digital Earth Africa Contributors
Digital Earth Africa Map by Digital Earth Africa Contributors
Digital Earth Africa Geoportal by Digital Earth Africa Contributors

See 6 usage examples →

Serratus: Ultra-deep Search for Novel Viruses - Versioned Data Release

bam, COVID-19, genetic, genomic, life sciences, MERS, SARS, SARS-CoV-2, virus

Serratus is a collaborative open science project for ultra-rapid discovery of known and unknown coronaviruses in response to the COVID-19 pandemic through re-analysis of publicly available genomic data. Our resulting vertebrate viral alignment data is explorable via the Serratus Explorer and directly accessible on Amazon S3.

Usage examples

Serratus Explorer by Serratus Team
Diversification of mammalian deltaviruses by host shifting by Bergner L.M., Orton R.J., et al (2021)
Petabase-scale sequence alignment catalyses viral discovery by Edgar R., Taylor J., Lin V., et al (2021)
coronaSPAdes. From biosynthetic gene clusters to RNA viral assemblies by Meleshko D., Hajirasouliha I., and Korobeynikov A. (2021)
Tantalus: An R Package for exploration of Serratus data by Serratus Team

See 6 usage examples →

Solar Dynamics Observatory (SDO) Machine Learning Dataset

machine learning, NASA SMD AI

The v1 dataset includes AIA/HMI observations from 2010-2018 and v2 includes AIA/HMI observations from 2010-2020 in all 10 wavebands (94A, 131A, 171A, 193A, 211A, 304A, 335A, 1600A, 1700A, 4500A), with 512x512 resolution and 6-minute cadence; HMI vector magnetic field observations in Bx, By, and Bz components, with 512x512 resolution and 12-minute cadence; and EVE observations in 39 wavelengths from 2010-05-01 to 2014-05-26, with 10-second cadence.

Usage examples

A Machine-learning Data Set Prepared from the NASA Solar Dynamics Observatory Mission by Galvez, Richard; Fouhey, David F.; Jin, Meng; Szenicer, Alexandre; et al
Scripts for generating the SDOMLv2 dataset by Jin, Meng
Scripts for generating the SDOMLv1 dataset by Fouhey, David F.; Jin, Meng
ML applications based on the SDOMLv2 dataset by Wright, Paul J.
ML applications based on the SDOMLv1 dataset by Salvatelli, Valentina; dos Santos, Luiz F. G.; Bose, Souvik; Neuberg, Brad; Cheung, Mark C. M.; Janvier, Miho; Jin, Meng; Gal, Yarin; Boerner, Paul; Baydin, Atılım Güneş

See 6 usage examples →

The MIT Supercloud Dataset

cloud computing, datacenter, energy, HPC, workload analysis

Collection of parsed datacenter logs and time series data of hardware utilization from the MIT Supercloud system.

Usage examples

Generating the labelled dataset by Matthew Weiss
AI-Enabling Workloads on Large-Scale GPU-Accelerated System: Characterization, Opportunities, and Implications by Baolin Li, Rohin Arora, Siddharth Samsi, et al.
The MIT Supercloud Dataset by Siddharth Samsi, Matthew Weiss, David Bestor, et al.
The MIT Supercloud Workload Classification Challenge by Benny J. Tang, Qiqi Chen, Matthew L. Weiss, et al.

See 6 usage examples →

Wildfire Projections to Support Climate Resilience

agriculture, climate, climate model, climate projections, disaster response, electricity, energy, environmental, geospatial, meteorological, solar, sustainability, weather

Wildfire projections for California and its environs in support of California's Fifth Climate Assessment, supported with historical weather observations and renewable energy capacity profiles for grid operations.

Usage examples

See 6 usage examples →

2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File

census, differential privacy, disclosure avoidance, ethnicity, group quarters, hispanic, housing, housing units, latino, noisy measurements, population, race, redistricting, voting age

The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-06-30) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9 , and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations, that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Char...

Usage examples

DAS 2020 Redistricting Production Code Release by U.S. Census Bureau (Public GitHub repository for the 2020 Census DAS, vintaged as of the commit used to produce the official production run of the Redistricting product. The zCDP framework NMFs were generated in a for-internal-use-only pickled (https://docs.python.org/3/library/pickle.html; https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.pickleFile.html) form as a byproduct of the use of this code. A stand-alone script was developed and used to convert these internal-use NMFs into the Parquet format used in this product (that script is not yet publicly available).)
Computing Confidence Intervals Using the 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) (Jupyter notebook explaining how to calculate estimates and confidence intervals from the noisy measurement files) by Cumings-Menon, R., Hawes, M., and Spence, M. (2023)
The 2020 Census Disclosure Avoidance System Topdown Algorithm by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
Geographic Spines in the 2020 Census Disclosure Avoidance System by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
2010 Census Summary File 1 Technical Documentation by U.S. Census Bureau

See 5 usage examples →

2020 Census Demographic and Housing Characteristics (DHC) Noisy Measurement File

census, differential privacy, disclosure avoidance, ethnicity, group quarters, housing, housing units, noisy measurements, population, race, redistricting, voting age

The 2020 Census Demographic and Housing Characteristics Noisy Measurement File is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022], and implemented in primitives.py). The 2020 Census Demographic and Housing Characteristics Noisy Measurement File includes zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T. [2016]) noisy measurements, implemented via the discrete Gaussian mechanism (Cannone C., et al. [2023]), which added positive or negative integer-valued noise to each of the resulting counts. These ar...
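As a rough illustration of how a noisy measurement of this kind might be interpreted, the sketch below builds a confidence interval around a single noisy count. It assumes the added noise is zero-mean with a known variance and approximates the discrete Gaussian by a continuous normal distribution; the function name, the count, and the variance are hypothetical and are not taken from the file's schema or from the Census Bureau notebook.

```python
from statistics import NormalDist


def confidence_interval(noisy_count, noise_variance, level=0.95):
    """Return (low, high) for the underlying count.

    Assumption: the noise is zero-mean with the given variance and is
    well approximated by a continuous normal distribution.
    """
    if noise_variance < 0:
        raise ValueError("noise_variance must be non-negative")
    # Two-sided critical value, roughly 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * noise_variance ** 0.5
    return noisy_count - half_width, noisy_count + half_width


# Hypothetical values, for illustration only.
low, high = confidence_interval(noisy_count=1042, noise_variance=25.0)
print(f"95% interval: [{low:.1f}, {high:.1f}]")
```

The interval widens with the square root of the noise variance, so measurements that received less privacy-loss budget would be expected to yield wider intervals.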

Usage examples

Computing Confidence Intervals Using the 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) (Jupyter notebook explaining how to calculate estimates and confidence intervals from the noisy measurement files) by Cumings-Menon, R., Hawes, M., and Spence, M. (2023)
DAS 2020 DHC Production Code Release by U.S. Census Bureau
The 2020 Census Disclosure Avoidance System Topdown Algorithm by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
Geographic Spines in the 2020 Census Disclosure Avoidance System by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
2020 Census Demographic and Housing Characteristics File Technical Documentation by U.S. Census Bureau. Note that the zCDP framework NMFs were generated in a for-internal-use-only pickled (https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.pickleFile.html) form as a byproduct of the use of this code. A stand-alone script was developed and used to convert these internal-use NMFs into the Parquet format used in this product (that script is not yet publicly available).

See 5 usage examples →

3000 Rice Genomes Project

agriculture, food security, genetic, genomic, life sciences

The 3000 Rice Genomes Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries.

Usage examples

Rice Galaxy: an open resource for plant science by Juanillas V et al (2019)
RiceGalaxy by International Rice Research Institute
Structural variants in 3000 rice genomes by Fuentes RR et al (2019)
Tracking the origin of two genetic components associated with transposable element bursts in domesticated rice by Chen J et al (2019)
Identification and Allele Combination Analysis of Rice Grain Shape-Related Genes by Genome-Wide Association Study by Meng B et al (2022)

See 5 usage examples →

AWS Public Blockchain Data

blockchain, web3

The AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compressed Parquet files, partitioned by date for efficient querying.
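To make the idea of date partitioning concrete, here is a minimal sketch that enumerates one storage prefix per day, so a query only has to touch the days it needs. The bucket name, table path, and the Hive-style `date=YYYY-MM-DD` layout are assumptions made for illustration; consult the dataset documentation for the actual layout.

```python
from datetime import date, timedelta


def partition_prefixes(base, start, end):
    """Yield one prefix per day in [start, end], inclusive.

    Assumption: partitions are laid out Hive-style as date=YYYY-MM-DD.
    """
    if end < start:
        raise ValueError("end must not be before start")
    day = start
    while day <= end:
        yield f"{base.rstrip('/')}/date={day.isoformat()}/"
        day += timedelta(days=1)


# Hypothetical bucket and table path, for illustration only.
prefixes = list(
    partition_prefixes(
        "s3://example-bucket/example-table",
        date(2024, 1, 30),
        date(2024, 2, 1),
    )
)
for prefix in prefixes:
    print(prefix)
```

Restricting a read to a handful of such prefixes is what lets query engines skip the rest of the data.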

...

Usage examples

Unlocking XRP Ledger Data: Comprehensive Analysis with AWS Public Blockchain Datasets by Simon Goldberg, Everton Fraga
New datasets added to the AWS Public Blockchain Datasets — available for analytics and research by Everton Fraga, Simon Goldberg
FEDS Notes - Primary and Secondary Markets for Stablecoins by Cy Watsky, Jeffrey Allen, Hamzah Daud, Jochen Demuth, Daniel Little, Megan Rodden, Amber Seira
Access Bitcoin and Ethereum open datasets for cross-chain analytics by Oliver Steffmann, Bhaskar Ravat, Sreeji Gopal, and Stefan Dicker
Exploring Arbitrum Data: Analyze L2 Activity with AWS Public Blockchain Datasets by Simon Goldberg, Everton Fraga

See 5 usage examples →

Amazonia EO satellite on AWS

agriculture, cog, disaster response, earth observation, geospatial, imaging, satellite imagery, stac, sustainability

Imagery acquired by the Amazonia-1 satellite. The image files are recorded and processed by Instituto Nacional de Pesquisas Espaciais (INPE) and converted to Cloud Optimized GeoTIFF format to optimize their use in cloud-based applications. WFI Level 4 (Orthorectified) scenes are being ingested daily starting from 08-29-2022; the complete Level 4 archive will be ingested by the end of October 2022.

Usage examples

The Evolution of ASDI's Data Infrastructure by Sean Harkins
Amazonia 1 stactools package by Frederico Liporace
Keeping a SpatioTemporal Asset Catalog (STAC) Up To Date with SNS/SQS by Frederico Liporace
STAC V1.0.0 endpoint by Frederico Liporace
Amazonia 1 stactools-pipeline by Frederico Liporace

See 5 usage examples →

Argo marine floats data and metadata from Global Data Assembly Centre (Argo GDAC)

chemical biology, chemistry, climate, datacenter, digital assets, geochemistry, geophysics, geoscience, marine, netcdf, oceans

Argo is an international program to observe the interior of the ocean with a fleet of profiling floats drifting in the deep ocean currents (https://argo.ucsd.edu). Argo GDAC is a dataset of 5 billion in situ ocean observations from 18,000 profiling floats (4,000 active), collected since the program started 20 years ago. The Argo GDAC dataset is a collection of 18,000 NetCDF files. It is a major asset for ocean and climate science and a contributor to IOCCP reports.

Usage examples

See 5 usage examples →

Automated Segmentation of Intracellular Substructures in Electron Microscopy (ASEM) on AWS

biology, cell biology, computer vision, electron microscopy, imaging, life sciences, microscopy, segmentation

The Automated Segmentation of intracellular substructures in Electron Microscopy (ASEM) project provides deep learning models trained to segment structures in 3D images of cells acquired by Focused Ion Beam Scanning Electron Microscopy (FIB-SEM). Each model is trained to detect a single type of structure (mitochondria, endoplasmic reticulum, Golgi apparatus, nuclear pores, clathrin-coated pits) in cells prepared via chemical fixation (CF) or high-pressure freezing and freeze substitution (HPFS). You can use our open source pipeline to load a model and predict a class of sub-cellular structur...

Usage examples

Deep neural network automated segmentation of cellular structures in volume electron microscopy by Benjamin Gallusser, Giorgio Maltese, Giuseppe Di Caprio, Tegy John Vadakkan, Anwesha Sanyal, Elliott Somerville, Mihir Sahasrabudhe, Justin O’Connor, Martin Weigert, Tom Kirchhausen
Data layout and how to view by kirchhausenlab
ASEM Colab Notebook (Interactive Demo) by Patrick Stock
How to use models by kirchhausenlab
TK Lab Data Explorer by Patrick Stock

See 5 usage examples →

CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset

atmosphere, climate, climate model, data assimilation, forecast, geoscience, geospatial, land, meteorological, weather, zarr

This is a cloud-hosted subset of the CAM6+DART (Community Atmosphere Model version 6 Data Assimilation Research Testbed) Reanalysis dataset. These data products are designed to facilitate a broad variety of research using the NCAR CESM 2.1 (National Center for Atmospheric Research's Community Earth System Model version 2.1), including model evaluation, ensemble hindcasting, data assimilation experiments, and sensitivity studies. They come from an 80-member ensemble reanalysis of the global troposphere and stratosphere using DART and CAM6. The data products represent states of the atmospher...

Usage examples

Analyzing large climate model ensembles in the cloud by Joe Hamman, NCAR
A new CAM6 + DART reanalysis with surface forcing from CAM6 to other CESM models by Raeder, K., Hoar, T.J., El Gharamti, M. et al (2021)
Rendered (static) version of Jupyter Notebook by Brian Bonnlander, NCAR
Intake-ESM Catalog by Brian Bonnlander, NCAR
Jupyter Notebook and other documentation and tools for DART Reanalysis on AWS by NCAR Science at Scale team

See 5 usage examples →

CAncer MEtastases in LYmph nOdes challeNge (CAMELYON) Dataset

cancer, computational pathology, computer vision, deep learning, grand-challenge.org, histopathology, life sciences

"This dataset contains all the data for the CAncer MEtastases in LYmph nOdes challeNge or CAMELYON. CAMELYON was the first challenge using whole-slide images in computational pathology and aimed to help pathologists identify breast cancer metastases in sentinel lymph nodes. Lymph node metastases are extremely important to find, as they indicate that the cancer is no longer localized and systemic treatment might be warranted. Searching for these metastases in H&E-stained tissue is difficult and time-consuming and AI algorithms can play a role in helping make this faster and more accura...

Usage examples

See 5 usage examples →

CESM-HR

climate, climate model, climate projections, CMIP6, ocean circulation, ocean currents, ocean sea surface height, ocean simulation, ocean velocity

This dataset provides several global fields describing the state of atmosphere, ocean, land and ice from a high-resolution (0.1° for the ocean/ice models and 0.25° for the land/atmosphere models) numerical earth system model, the Community Earth System Model (CESM, https://www.cesm.ucar.edu/). Texas A&M University (TAMU) and National Center for Atmospheric Research together with international partners collaboratively carried out a large set of high-resolution climate simulations, including a 500-year long preindustrial control simulation (PI-CTRL) described here. The CESM uses dynamic equation...

Usage examples

An Unprecedented Set of High‐Resolution Earth System Simulations for Understanding Multiscale Interactions in Climate Variability and Change by Chang et al. (2020)
CESM-HR Tutorial by Jaison Kurian
CESM-HR Tools and Applications by Jaison Kurian
High-resolution modelling identifies the Bering Strait’s role in amplified Arctic warming by Xu et al. (2024)
Uncertain future of sustainable fisheries environment in eastern boundary upwelling zones under climate change by Chang et al. (2023)

See 5 usage examples →

CMAS Data Warehouse

air quality, climate, environmental, geospatial, meteorological

CMAS Data Warehouse on AWS collects and disseminates meteorology, emissions and air quality model input and output for Community Multiscale Air Quality (CMAQ) Model Applications. This dataset is available as part of the AWS Open Data Program, therefore egress fees are not charged to either the host or the person downloading the data. This S3 bucket is maintained as a public service by the University of North Carolina's CMAS Center, the US EPA’s Office of Research and Development, and the US EPA’s Office of Air and Radiation. Metadata and DOIs for datasets included in the CMAS Data Wareho...

Usage examples

See 5 usage examples →

Caenorhabditis Natural Diversity Resource

bam, bioinformatics, biology, Caenorhabditis elegans, fastq, gatk-sv, genetic maps, genome, genome wide association study, genomic, life sciences, short read sequencing, variant annotation, vcf

The Caenorhabditis Natural Diversity Resource (CaeNDR) is a data repository and analysis hub of wild strains of the selfing Caenorhabditis species C. elegans, C. briggsae, and C. tropicalis from around the world. It facilitates discovery of genetic variation across all three species through genome-wide association mappings that correlate genotype with phenotype and identify genetic variation underlying quantitative traits.

Usage examples

FAQ - AWS API by Erik Andersen
Data Releases - C. tropicalis by Erik Andersen
Data Releases - C. elegans by Erik Andersen
Data Releases - C. briggsae by Erik Andersen
CaeNDR, the Caenorhabditis Natural Diversity Resource by Crombie TA, McKeown R, Moya ND, Evans KS, Widmayer SJ, LaGrassa V, et al.

See 5 usage examples →

CoMMpass from the Multiple Myeloma Research Foundation

cancer, genetic, genomic, life sciences, STRIDES, whole genome sequencing

The Relating Clinical Outcomes in Multiple Myeloma to Personal Assessment of Genetic Profile study is the Multiple Myeloma Research Foundation (MMRF)’s landmark personalized medicine initiative. CoMMpass is a longitudinal observation study of around 1000 newly diagnosed myeloma patients receiving various standard approved treatments. The MMRF’s vision is to track the treatment and results for each CoMMpass patient so that someday the information can be used to guide decisions for newly diagnosed patients. CoMMpass checked on patients every 6 months for 8 years, collecting tissue samples, gene...

Usage examples

"Interim Analysis Of The MMRF CoMMpass Trial: a Longitudinal Study In Multiple Myeloma Relating Clinical Outcomes To Genomic and Immunophenotypic Profiles" by Keats JJ, Craig DW, Liang W, Venkata Y, Kurdoglu A, Aldrich J, Auclair D, Allen K, Harrison B, Jewell S, Kidd PG, Correll M, Jagannath S, Siegel DS, Vij R, Orloff G, Zimmerman TM, MMRF CoMMpass Network, Capone W, Carpten J, Lonial S.
"Molecular Predictors of Outcome and Drug Response in Multiple Myeloma: An Interim Analysis of the Mmrf CoMMpass Study" by Jonathan J Keats, PhD, Gil Speyer, Austin Christofferson, Christophe Legendre, PhD, Jessica Aldrich, Megan Russell, Lori Cuyugan, Jonathan Adkins, Alex Blanski, Meghan Hodges, Dan Rohrer, Sundar Jagannath, MD, Ravi Vij, MD, Gregory Orloff, MD, Todd Zimmerman, MD, Ruben Niesvizky, MD, Darla Liles, MD, Joseph W. Fay, Jeffrey L. Wolf, MD, Robert M Rifkin, Norma C Gutierrez, MD PhD, Mmrf CoMMpass Network, Jennifer Yesil, MS, Mary Derome, MS, Seungchan Kim, PhD, Winnie Liang, PhD, Pamela G. Kidd, MD, Scott Jewell, PhD, John David Carpten, PhD, Daniel Auclair, PhD, Sagar Lonial, MD FACP
Genomic Data Commons by National Cancer Institute
"Identification of Initiating Trunk Mutations and Distinct Molecular Subtypes: An Interim Analysis of the Mmrf Commpass Study" by Jonathan J Keats, PhD, Gil Speyer, Legendre Christophe, Christofferson Austin, Kristi Stephenson, BS, Ahmet Kurdoglu, Megan Russell, Aldrich Jessica, Cuyugan Lori, Jonathan Adkins, Jackie McDonald, Adrienne Helland, Alex Blanski, Meghan Hodges, Dan Rohrer, Sundar Jagannath, MD, David Siegel, MD PhD, Ravi Vij, MD MBA, Gregory Orloff, MD, Todd Zimmerman, MD, Ruben Niesvizky, MD, Darla Liles, MD, Joseph W. Fay, Jeffrey L. Wolf, MD PhD, Robert M. Rifkin, Norma C Gutierrez, The MMRF CoMMpass Network, Jen Toups, Mary Derome, MS, Winnie Liang, PhD, Seunchan Kim, Daniel Auclair, PhD, Pamela G. Kidd, MD, Scott Jewell, PhD, John David Carpten, PhD, Sagar Lonial, MD
"Interim Analysis of the Mmrf Commpass Trial: Identification of Novel Rearrangements Potentially Associated with Disease Initiation and Progression" by Sagar Lonial, MD, Venkata D Yellapantula, Winnie Liang, PhD, Ahmet Kurdoglu, BS, Jessica Aldrich, MSc, Christophe M. Legendre, MD, Kristi Stephenson, Jonathan Adkins, Jackie McDonald, Adrienne Helland, Megan Russell, Austin Christofferson, Lori Cuyugan, Dan Rohrer, Alex Blanski, Meghan Hodges, Mmrf CoMMpass Network, Mary Derome, Daniel Auclair, PhD, Pamela G. Kidd, MD, Scott Jewell, PhD, David Craig, PhD, John Carpten, PhD, Jonathan J. Keats, PhD

See 5 usage examples →

Community Earth System Model Large Ensemble (CESM LENS)

atmosphere, climate, climate model, geospatial, ice, land, model, oceans, sustainability, zarr

The Community Earth System Model (CESM) Large Ensemble Numerical Simulation (LENS) dataset includes a 40-member ensemble of climate simulations for the period 1920-2100 using historical data (1920-2005) or assuming the RCP8.5 greenhouse gas concentration scenario (2006-2100), as well as longer control runs based on pre-industrial conditions. The data comprise both surface (2D) and volumetric (3D) variables in the atmosphere, ocean, land, and ice domains. The total data volume of the original dataset is ~500TB, which has traditionally been stored as ~150,000 individual CF/NetCDF files on disk o...

Usage examples

Analyzing large climate model ensembles in the cloud by Joe Hamman, NCAR
Rendered (static) version of Jupyter Notebook by Anderson Banihirwe, NCAR
Jupyter Notebook and other documentation and tools for CESM LENS on AWS by NCAR Science at Scale team
The Community Earth System Model (CESM) Large Ensemble Project: A Community Resource for Studying Climate Change in the Presence of Internal Climate Variability by Kay et al. (2015), Bull. AMS, 96, 1333-1349
Urban Climate Explorer by Zhonghua Zheng

See 5 usage examples →

Daylight Map Distribution of OpenStreetMap

disaster response, geospatial, mapping, osm

Daylight is a complete distribution of global, open map data that’s freely available with support from community and professional mapmakers. Meta combines the work of global contributors to projects like OpenStreetMap with quality and consistency checks from Daylight mapping partners to create a free, stable, and easy-to-use street-scale global map.

The Daylight Map Distribution contains a validated subset of the OpenStreetMap database. In addition to the standard OpenStreetMap PBF format, Daylight is available in two parquet formats that are optimized for AWS Athena including geometries (Points, LineStrings, Polygons, or Multi...

Usage examples

Querying the Daylight OpenStreetMap Distribution with Amazon Athena by Jennings Anderson and Mike Jeffe
Daylight Earth Tables (NACIS 2022) by Jennings Anderson, Jonah Adkins, Jacob Wasserman
Loading the Daylight Map Distribution OpenStreetMap Features into AWS Athena by Jennings Anderson
Daylight Earth Tables by Daylight Contributors
Increasing OpenStreetMap Data Accessibility with the Analysis-Ready Daylight Distribution of OpenStreetMap: A Demonstration of Cloud-Based Assessments of Global Building Completeness by Jennings Anderson and Timmera Whaley Omidire

See 5 usage examples →

EEGDash on AWS

deep learning, life sciences, machine learning, neuroimaging, neuroscience

The EEG-DaSh (EEG Data Sharing) data archive is a large-scale data-sharing resource for magnetoencephalography and electroencephalography (MEEG) data hosted at the Swartz Center for Computational Neuroscience (SCCN), UC San Diego. It provides curated, BIDS-formatted datasets for neuroscience research, machine learning, and deep learning applications. The archive spans three S3 buckets: (1) the EEGDash bucket for data served through the EEGDash platform, (2) the NEMAR archive containing datasets contributed through the NEMAR (Neuroelectromagnetic Data Archive and Tools Resource) platform, which...

Usage examples

Deep Learning on EEGDash Data example by SCCN
EEGLAB - An open source environment for electrophysiological signal processing by SCCN
eegdash on pypi.python.org - Python module to query and download EEGDash data from Amazon S3 by Young Truong
NEMAR - Neuroelectromagnetic Data Archive and Tools Resource by SCCN
NEMAR: an open access data, tools, and compute resource operating on neuroelectromagnetic data by A. Delorme, D. Truong, C. Youn, T. Mullen, A. Smetanin, S. Bhatt, S. Makeig

See 5 usage examples →

ESA WorldCover Sentinel-1 and Sentinel-2 10m Annual Composites

agriculture, cog, disaster response, earth observation, geospatial, land cover, land use, machine learning, mapping, natural resource, satellite imagery, stac, sustainability, synthetic aperture radar

The WorldCover 10m Annual Composites were produced, as part of the European Space Agency (ESA) WorldCover project, from the yearly Copernicus Sentinel-1 and Sentinel-2 archives for both years 2020 and 2021. These global mosaics consist of four product composites: a Sentinel-2 RGBNIR yearly median composite for bands B02, B03, B04, B08; a Sentinel-2 SWIR yearly median composite for bands B11 and B12; a Sentinel-2 NDVI yearly percentiles composite (NDVI 90th, NDVI 50th, NDVI 10th percentiles); and a Sentinel-1 GAMMA0 yearly median composite for bands VV, VH and VH/VV (power scaled). Each product is...
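For readers unfamiliar with percentile composites, the following sketch computes NDVI for one pixel across a year of observations and reduces the series to its 10th, 50th, and 90th percentiles. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red), with B04 as red and B08 as near-infrared; the reflectance values are made up, and the actual WorldCover processing chain (cloud masking, compositing rules) is not described here and may well differ.

```python
from statistics import quantiles


def ndvi(red, nir):
    """Normalized difference vegetation index for one observation."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def ndvi_percentiles(red_series, nir_series):
    """Reduce a per-pixel time series to 10th/50th/90th NDVI percentiles."""
    values = [ndvi(r, n) for r, n in zip(red_series, nir_series)]
    # n=10 returns the nine decile cut points; needs at least two values.
    deciles = quantiles(values, n=10, method="inclusive")
    return {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8]}


# Hypothetical reflectances for a single pixel over several dates.
red = [0.10, 0.08, 0.06, 0.05, 0.07, 0.09]
nir = [0.20, 0.30, 0.45, 0.50, 0.40, 0.25]
print(ndvi_percentiles(red, nir))
```

Percentiles are less sensitive than a single-date value to occasional outliers such as residual cloud or shadow, which is one common motivation for this kind of composite.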

Usage examples

ESA WorldCover 10 m 2021 v200 by Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S., Lesiv, M., Herold, M., Tsendbazar, N.E., Xu, P., Ramoino, F., Arino, O.
WorldCover Viewer by VITO
Release of the 10 m WorldCover map by Ruben Van De Kerchove
Exploring the datasets by VITO
ESA WorldCover 10 m 2021 v200 - Product User Manual by VITO

See 5 usage examples →

End-Use Load Profiles for the U.S. Building Stock

cities, climate, energy, energy modeling, geospatial, metadata, model, open source software, sustainability, utilities

The U.S. Department of Energy (DOE) funded a three-year project, End-Use Load Profiles for the U.S. Building Stock, that culminated in this publicly available dataset of calibrated and validated 15-minute resolution load profiles for all major residential and commercial building types and end uses, across all climate regions in the United States. These EULPs were created by calibrating the ResStock and ComStock physics-based building stock models using many different measured datasets, as described here. This dataset includes load profiles for both the baseline building stock and the building ...

Usage examples

Web-Based Data Viewer for Commercial Load Profiles by National Renewable Energy Laboratory (NREL)
End-Use Load Profiles for the U.S. Building Stock by E. Wilson, A. Parker, A. Fontanini, et al.
BuildStockQuery: a python library designed to simplify and streamline the process of querying datasets generated by ResStock and ComStock by National Renewable Energy Laboratory (NREL)
Tutorials, how-to guides, explanations, and reference material for users of the ComStock dataset by National Renewable Energy Laboratory (NREL)
Web-Based Data Viewer for Residential Load Profiles by National Renewable Energy Laboratory (NREL)

See 5 usage examples →

GPM IMERG Early Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHHE) at GES DISC

atmosphere, climate, datacenter, forecast, global, hdf, hydrology, land, metadata, opendap, radar, water

Version 07B is the current version of the IMERG data sets. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2021 version of the Goddard Profiling Algorithm (GPROF2021), then gridded, intercalibrated to the GPM Combined Ku Radar-Radiome...

Usage examples

Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
How to Read IMERG Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.

See 5 usage examples →

GPM IMERG Final Precipitation L3 1 month 0.1 degree x 0.1 degree V07 (GPM_3IMERGM) at GES DISC

atmosphere, climate, datacenter, forecast, global, hdf, hydrology, land, metadata, opendap, radar, water

Version 07B is the current version of the IMERG data sets. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2021 version of the Goddard Profiling Algorithm (GPROF2021), then gridded, intercalibrated to the GPM Combined Ku Radar-Radiome...

Usage examples

How to Read IMERG Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao

See 5 usage examples →

GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHH) at GES DISC

atmosphere, climate, datacenter, forecast, global, hdf, hydrology, land, metadata, opendap, radar, water

Version 07B is the current version of the IMERG data sets. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2021 version of the Goddard Profiling Algorithm (GPROF2021), then gridded, intercalibrated to the GPM Combined Ku Radar-Radiome...

Usage examples

Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
How to Read IMERG Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider

See 5 usage examples →

GPM IMERG Late Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHHL) at GES DISC

atmosphere, climate, datacenter, forecast, global, hdf, hydrology, land, metadata, opendap, radar, water

Version 07B is the current version of the IMERG data sets. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2021 version of the Goddard Profiling Algorithm (GPROF2021), then gridded, intercalibrated to the GPM Combined Ku Radar...

Usage examples

How to Read IMERG Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao
Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.

See 5 usage examples →

Global 30m Height Above Nearest Drainage (HAND)

agriculture, cog, disaster response, elevation, geospatial, hydrology, satellite imagery, stac

Height Above Nearest Drainage (HAND) is a terrain model that normalizes topography to the relative heights along the drainage network and is used to describe the relative soil gravitational potentials or the local drainage potentials. Each pixel value represents the vertical distance to the nearest drainage. The HAND data provides near-worldwide land coverage at 30 meters and was produced from the 2021 release of the Copernicus GLO-30 Public DEM as distributed in the Registry of Open Data on AWS.
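Because each pixel stores a vertical distance to the nearest drainage, a common first step is to flag pixels that sit within some height of the drainage network. The following is a minimal sketch of that thresholding, assuming the HAND values have already been read into a nested list of meters; the function name, the sample tile, the threshold, and the nodata value are all hypothetical and are not taken from the dataset documentation.

```python
def low_lying_mask(hand_m, threshold_m, nodata=None):
    """Return a grid of booleans: True where a pixel's HAND value
    (vertical distance to the nearest drainage, in meters) is at or
    below threshold_m. Pixels equal to the nodata value are False."""
    mask = []
    for row in hand_m:
        mask.append([
            (value != nodata) and (value <= threshold_m)
            for value in row
        ])
    return mask


# Hypothetical 3x3 tile of HAND values in meters (illustrative only).
tile = [
    [0.0, 1.5, 7.2],
    [0.4, 3.0, 12.8],
    [2.1, 5.5, 20.0],
]

# Flag pixels no more than 2.5 m above the nearest drainage.
mask = low_lying_mask(tile, threshold_m=2.5)
for row in mask:
    print(row)
```

In practice the grid would come from one of the dataset's Cloud Optimized GeoTIFF tiles, and the threshold would depend on the application.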

Usage examples

See 5 usage examples →

Global Seasonal Sentinel-1 Interferometric Coherence and Backscatter Data Set

agriculture, cog, earth observation, earthquakes, ecosystems, environmental, geology, geophysics, geospatial, global, infrastructure, mapping, natural resource, satellite imagery, synthetic aperture radar, urban

This data set is the first-of-its-kind spatial representation of multi-seasonal, global SAR repeat-pass interferometric coherence and backscatter signatures. Global coverage comprises all land masses and ice sheets from 82 degrees northern to 79 degrees southern latitude. The data set is derived from high-resolution multi-temporal repeat-pass interferometric processing of about 205,000 Sentinel-1 Single-Look-Complex data acquired in Interferometric Wide-Swath mode (Sentinel-1 IW mode) from 1-Dec-2019 to 30-Nov-2020. The data set was developed by Earth Big Data LLC and Gamma Remote Sensing AG, ...

Usage examples

Global seasonal Sentinel-1 interferometric coherence and backscatter data set by Josef Kellndorfer, Oliver Cartus, Marco Lavalle, Christophe Magnard, Pietro Milillo, Shadi Oveisgharan, Batu Osmanoglu, Paul A. Rosen, Urs Wegmüller
Jupyter Notebook to access and visualize sub regions of the global data set by Josef Kellndorfer
Generating Global Temporal Coherence Maps from one year of Sentinel-1 C-band data, ESA Fringe 2021 Poster (YouTube) by Oliver Cartus, Josef Kellndorfer, Shadi Oveisgharan, Batu Osmanoglu, Paul Rosen, Urs Wegmüller
Webinar: The new era of SAR Time Series Analysis and Visualization: Cloud meets Big SAR Data. IEEE GRSS Bay Area Chapter (Dec. 3rd 2021) by Josef Kellndorfer
Jupyter Notebook to access and visualize global mosaics of the global data set by Josef Kellndorfer

See 5 usage examples →

High resolution, annual cropland and landcover maps for selected African countries

agriculture, cog, deep learning, labeled, land cover, machine learning, satellite imagery

High resolution, annual cropland and landcover maps for selected African countries developed by Clark University's Agricultural Impacts Research Group using various machine learning approaches applied to Planet imagery, including field boundary and cultivated frequency maps, as well as multi-class land cover.

Usage examples

Accessing and downloading data by Rahebe Abedi
A super-ensemble approach to map land cover types with high resolution over data-sparse African savanna landscapes by Song et al. (2023)
High resolution, annual maps of field boundaries for smallholder-dominated croplands at national scales by Estes et al. (2022)
Final report-Phase 2: Creating next generation field boundary and crop type maps: Rigorous multi-scale groundtruth provides sustainable extension services for smallholders by Wussah et al. (2022)
Final report-Phase 1: Creating open agricultural maps and ground truth data to better deliver farm extension services by Estes et al. (2022)

See 5 usage examples →

IBL Neuropixels Brainwide Map on AWS

autism spectrum disorder, life sciences, Mus musculus, neurophysiology, neuroscience, open source software

Electrophysiological recordings of mouse brain activity acquired during a decision-making task in multiple mouse models of autism.

Usage examples

See 5 usage examples →

JMA Himawari-8/9

agriculture, disaster response, earth observation, geospatial, meteorological, satellite imagery, weather

Himawari-9, stationed at 140.7E, owned and operated by the Japan Meteorological Agency (JMA), is a geostationary meteorological satellite, with Himawari-8 as on-orbit back-up, that provides constant and uniform coverage of east Asia, and the west and central Pacific regions from around 35,800 km above the equator with an orbit corresponding to the period of the earth’s rotation. This allows JMA weather offices to perform uninterrupted observation of environmental phenomena such as typhoons, volcanoes, and general weather systems. Archive data back to July 2015 is available for Full Disk (AHI-L...

Usage examples

Introduction of Himawari-8/9 (pdf file) by JMA
Himawari-8 on AWS (pdf file) by ASDI
Himawari-8: Enabling access to key weather data by Manan Dalal, Jena Kent
Identifying the Causes of Pyrocumulonimbus (PyroCb) by Emiliano Díaz Salas-Porras, Kenza Tazi, Ashwin Braude, Daniel Okoh, Kara D. Lamb, Duncan Watson-Parris, Paula Harder, and Nis Meinert
Himawari-8 Advanced Himawari Imager Data on AWS (pdf file) by NOAA NESDIS

See 5 usage examples →

MONKEY

cancer, classification, computational pathology, computer vision, deep learning, digital pathology, grand-challenge.org, histopathology, imaging, life sciences, machine learning, medical image computing, medical imaging

This dataset contains the training data for the Machine learning for Optimal detection of iNflammatory cells in the KidnEY (MONKEY) challenge. The MONKEY challenge focuses on the automated detection and classification of inflammatory cells, specifically monocytes and lymphocytes, in kidney transplant biopsies using Periodic acid-Schiff (PAS) stained whole-slide images (WSI). It contains 80 WSI, collected from 4 different pathology institutes, with annotated regions of interest. For each WSI up to 3 different PAS scans and one IHC slide scan are available. This dataset and challenge support th...

Usage examples

See 5 usage examples →

Meta-Organized Stimuli And fMRI Imaging data for Computational modeling (MOSAIC)

brain images, brain models, hdf5, machine learning, neuroimaging, neuroscience

This extensible dataset, MOSAIC, aggregates individual functional magnetic resonance imaging (fMRI) datasets by leveraging a shared preprocessing pipeline and stimulus curation procedure. This dataset aggregation procedure achieves the scale necessary for neural network training and the diversity needed for generalizable results.

Usage examples

Run a synthetic localizer experiment using MOSAIC's brain-optimized models (Jupyter notebook) by Benjamin Lahner
MOSAIC Python package (mosaic-dataset) by Mayukh Deb
Preprocess fMRI datasets with MOSAIC shared pipeline by Benjamin Lahner
Load HDF5 file (Jupyter notebook) by Benjamin Lahner
Download MOSAIC data, visualize fMRI responses, load and run brain-optimized models (Jupyter notebook) by Mayukh Deb

See 5 usage examples →

NASA / USGS Lunar Orbiter Laser Altimeter Cloud Optimized Point Cloud

elevation, lidar, planetary, stac

The Lunar Orbiter Laser Altimeter (LOLA) has collected and released almost 7 billion individual laser altimeter returns from the lunar surface. This dataset includes individual altimetry returns scraped from the Planetary Data System (PDS) LOLA Reduced Data Record (RDR) Query Tool, V2.0. Data are organized in 15° x 15° (longitude/latitude) sections, compressed and encoded into the Cloud Optimized Point Cloud (COPC) file format, and collected into a Spatio-Temporal Asset Catalog (STAC) collection for query and analysis. The data are in latitude, longitude, and radius (X, Y, Z) format with the p...

Usage examples

See 5 usage examples →
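
The LOLA description above says the returns are organized in 15-degree by 15-degree longitude/latitude sections. As a rough sketch of how a point could be mapped to the section that contains it, the function below computes section bounds under two assumptions that are not confirmed by the dataset documentation: longitudes run from -180 to 180 degrees, and section edges fall on multiples of 15 degrees. The function name is hypothetical, and actual file naming should be taken from the STAC collection.

```python
import math

TILE_DEG = 15  # section size stated in the dataset description


def tile_bounds(lon_deg, lat_deg):
    """Return (lon_min, lat_min, lon_max, lat_max) for the 15 x 15 degree
    section containing a point.

    Assumes longitudes in [-180, 180] and a grid aligned to multiples
    of 15 degrees; both are illustrative assumptions."""
    if not (-180.0 <= lon_deg <= 180.0 and -90.0 <= lat_deg <= 90.0):
        raise ValueError("longitude or latitude out of range")
    lon_min = math.floor(lon_deg / TILE_DEG) * TILE_DEG
    lat_min = math.floor(lat_deg / TILE_DEG) * TILE_DEG
    # Points exactly on the upper edges belong to the last section.
    lon_min = min(lon_min, 180 - TILE_DEG)
    lat_min = min(lat_min, 90 - TILE_DEG)
    return (lon_min, lat_min, lon_min + TILE_DEG, lat_min + TILE_DEG)


print(tile_bounds(23.4, -8.0))
```

If the collection uses a 0 to 360 degree longitude convention instead, the input would need to be shifted before calling the function.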

NASA Earth Exchange (NEX) Data Collection

climate, CMIP5, natural resource, sustainability

A collection of downscaled climate change projections, derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al. 2012] and across the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs) [Meinshausen et al. 2011]. The NASA Earth Exchange group maintains the NEX-DCP30 (CMIP5), NEX-GDDP (CMIP5), and LOCA (CMIP5). NOTE: The S3 bucket location for this dataset changed on 5/6/2026.

Usage examples

Explainable deep learning for insights in El Nino and river flows by Yumin Liu, Kate Duffy, Jennifer G. Dy, and Auroop R. Ganguly
Statistical downscaling using Localized Constructed Analogs (LOCA) by Pierce, D. W., D. R. Cayan, and B. L. Thrasher (2014)
Potential changes in cooling degree day under different global warming levels and shared socioeconomic pathways in West Africa by Oluwarotimi Delano Thierry Odou, Heidi Heinrichs Ursula, Rabani Adamou, Thierry Godjo, and Mounkaila S Moussa
Downscaled Climate Projections Suitable for Resource Management, Eos Trans. AGU, 94(37), 321 by Thrasher, B., J. Xiong, W. Wang, F. Melton, A. Michaelis and R. Nemani (2013)
Climate Downscaling Using YNet: a Deep Convolutional Network with Skip Connections and Fusion by Yumin Liu, Auroop Ganguly, and Jennifer Dy

See 5 usage examples →

NIH NCBI Sequence Read Archive (SRA) on AWS

bam, cram, fastq, genetic, genomic, life sciences, STRIDES, transcriptomics, whole exome sequencing, whole genome sequencing

The Sequence Read Archive (SRA), produced by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) at the National Institutes of Health (NIH), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms. The SRA provides open access to these biological sequence data to support the research community's efforts to enhance reproducibility and make new discoveries by comparing data sets. Buckets in this registry contain public SRA data in the original (user submitted) format from select high value and newly-rel...

Usage examples

See 5 usage examples →

NOAA National Air Quality Forecast Capability (NAQFC) Regional Model Guidance

agriculture, climate, disaster response, environmental, meteorological, weather

The National Air Quality Forecasting Capability (NAQFC) dataset contains model-generated air quality (AQ) forecast guidance from three different prediction systems. The first system is a coupled weather and atmospheric chemistry numerical forecast model, known as the Air Quality Model (AQM). It is used to produce forecast guidance for ozone (O3) and particulate matter that is less than or equal to 2.5 micrometers in diameter (PM2.5). Prior to May 14, 2024, AQM predictions were derived using the EPA’s Community Multiscale Air Quality (CMAQ) model, driven by meteorological fields from NCEP’s operational weather forecast models, ...

Usage examples

Improving NOAA NAQFC PM2.5 predictions with a bias correction approach (2017, Wea. and Forecasting, 32(2), 407–421) by Huang, J., McQueen, J., Wilczak, J., Djalalova, I., Stajner, I., Shafran, P., Allured, D., Lee, P., Pan, L., Tong, D., Huang, H.-C., DiMego, G., Upadhayay, S., & Delle Monache, L.
Development and evaluation of an advanced National Air Quality Forecasting Capability using the NOAA Global Forecast System version 16 (2022, Geosci. Model Dev., 15, 3281–3313) by Campbell, P.C., and Coauthors
Development of the next-generation air quality prediction system in the UFS framework: Enhancing predictability of wildfire air quality impacts (2024, Bull. Amer. Meteor. Soc., in review) by Huang, J.P., I. Stajner, R. Montuoro, F. Yang, K. Wang, H.-C. Huang, C.-H. Jeon, B. Curtis, J. McQueen, H. Liu, B. Baker, D. Tong, Y. Tang, P. Campbell, G. Grell, G. Frost, R. Schwantes, S. Wang, S. Kondragunta, F. Li, and Y. Jung
Using VIIRS fire radiative power data to simulate biomass burning emissions, plume rise and smoke transport in a real-time air quality modeling system (Proc. 2017 IEEE Int. Geoscience and Remote Sensing Symp. (IGARSS), Fort Worth, TX, IEEE, 2806–2808) by Ahmadov, R., and Coauthors
An empirically derived emission algorithm for wind-blown dust (J. Geophys. Res., 115, D16212) by Draxler, R. R., P. Ginoux, and A. F. Stein

See 5 usage examples →

New Zealand Coastal Elevation

coastal, cog, earth observation, elevation, geospatial, lidar, stac

The New Zealand Coastal Elevation dataset consists of New Zealand's publicly owned coastal digital elevation models, which are freely available to use under an open licence. The data consists of bare earth (DEM) data that traverses the coastal zone, including the seabed down to approximately 25m in depth. Data is provided as nationally consistent 1m resolution tiles derived from LiDAR surveys. All of the coastal elevation files are Cloud Optimised GeoTIFFs using LERC compression for the main grid and LERC compression with lower max_z_error for the overviews. These elevation files are accomp...

Usage examples

See 5 usage examples →

Normalized Difference Urban Index (NDUI)

earth observation, geospatial, satellite imagery, urban

NDUI is combined with cloud shadow-free Landsat Normalized Difference Vegetation Index (NDVI) composite and DMSP/OLS Night Time Light (NTL) to characterize global urban areas at a 30 m resolution, and it can greatly enhance urban areas, which can then be easily distinguished from bare lands including fallows and deserts. With the capability to delineate urban boundaries and, at the same time, to present sufficient spatial details within urban areas, the NDUI has the potential for urbanization studies at regional and global scales.

Usage examples

An example of using ndui data with AWS sagemaker tools by Yifang Wang
Building a Better Urban Picture: Combining Day and Night Remote Sensing Imagery by Qingling Zhang, Bin Li, David Thau, and Rebecca Moore
Automated extraction of urban built-up areas with NDUI using Python and Google Earth Engine by Yifang Wang
Global DMSP images Correction by Yifang Wang
A Robust Method to Generate a Consistent Time Series From DMSP/OLS Nighttime Light Data by Qingling Zhang, Bhartendu Pandey, and Keren C. Seto

See 5 usage examples →

OME-Zarr Open SciVis Datasets

biology, computed tomography, image processing, imaging, life sciences, magnetic resonance imaging, neuroimaging, neuroscience, volumetric imaging, zarr

This project provides the Open SciVis Datasets in a chunked, highly-compressed, multi-scale format, encodes metadata in JSON according to the OME-Zarr specification, and hosts the datasets on AWS S3 through the AWS Open Data Program, aiming to serve as a web-based resource for the scientific visualization community to enhance reproducibility and facilitate testing and development of OME-Zarr tools.

Usage examples

OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies by Josh Moore, Chris Allan, Sébastien Besson, Jean-Marie Burel, Erin Diel, David Gault, Kevin Kozlowski, Dominik Lindner, Melissa Linkert, Trevor Manz, Will Moore, Constantin Pape, Christian Tischer & Jason R. Swedlow
Open SciVis Datasets by Pavol Klacansky
OME-Zarr: a cloud-optimized bioimaging file format with international community support by Josh Moore, Daniela Basurto-Lozada, Sébastien Besson, John Bogovic, Jordão Bragantini, Eva M. Brown, Jean-Marie Burel, Xavier Casas Moreno, Gustavo de Medeiros, Erin E. Diel, David Gault, Satrajit S. Ghosh, Ilan Gold, Yaroslav O. Halchenko, Matthew Hartley, Dave Horsfall, Mark S. Keller, Mark Kittisopikul, Gabor Kovacs, Aybüke Küpcü Yoldaş, Koji Kyoda, Albane le Tournoulx de la Villegeorges, Tong Li, Prisca Liberali, Dominik Lindner, Melissa Linkert, Joel Lüthi, Jeremy Maitin-Shepard, Trevor Manz, Luca Marconato, Matthew McCormick, Merlin Lange, Khaled Mohamed, William Moore, Nils Norlin, Wei Ouyang, Bugra Özdemir, Giovanni Palla, Constantin Pape, Lucas Pelkmans, Tobias Pietzsch, Stephan Preibisch, Martin Prete, Norman Rzepka, Sameeul Samee, Nicholas Schaub, Hythem Sidky, Ahmet Can Solak, David R. Stirling, Jonathan Striebel, Christian Tischer, Daniel Toloudis, Isaac Virshup, Petr Walczysko, Alan M. Watson, Erin Weisbart, Frances Wong, Kevin A. Yamauchi, Omer Bayraktar, Beth A. Cimini, Nils Gehlenborg, Muzlifah Haniffa, Nathan Hotaling, Shuichi Onami, Loic A. Royer, Stephan Saalfeld, Oliver Stegle, Fabian J. Theis & Jason R. Swedlow
Read and Visualize in Python by Matt McCormick
A list of tools and libraries with OME-Zarr support by NGFF community

See 5 usage examples →

OPERA Dynamic Surface Water Extent from Harmonized Landsat Sentinel-2 product (Version 1)

cog, datacenter, earth observation, ice, land, land cover, metadata, surface water, water

This dataset contains Level-3 Dynamic OPERA surface water extent product version 1. The data are validated surface water extent observations beginning April 2023. Known issues and caveats on usage are described under Documentation. The input dataset for generating each product is the Harmonized Landsat-8 and Sentinel-2A/B/C (HLS) product version 2.0. HLS products provide surface reflectance (SR) data from the Operational Land Imager (OLI) aboard the Landsat 8 satellite and the MultiSpectral Instrument (MSI) aboard the Sentinel-2A/B/C satellites. The surface water extent products are distributed ov...

Usage examples

Improved Automated Detection of Subpixel-Scale Inundation—Revised Dynamic Surface Water Extent (DSWE) Partial Surface Water Tests by Jones, John W
Access DSWx-HLS S3 by M. Grace Bato
Working with OPERA Dynamic Surface Water Extent (DSWx) Data by Nicholas Tarpinian
Getting Started with OPERA DSWx Product by K. Devlin and M. Grace Bato
Stream and Viz DSWx-HLS via Direct HTTPS by M. Grace Bato

See 5 usage examples →

Open-Meteo Weather API Database

agriculture, climate, earth observation, meteorological, weather

Open-Meteo integrates weather models from reputable national weather services, offering a swift and efficient weather API. Real-time weather forecasts are unified into a time-series database that provides historical and future weather data for any location worldwide. Through Open-Meteo on AWS Open Data, you can download the Open-Meteo weather database and analyze weather data locally. Docker images are provided to download data and to expose an HTTP API endpoint. Using Open-Meteo SDKs, you can seamlessly integrate weather data into your Python, TypeScript, Swift, Kotlin, or Java applications. T...

Usage examples

Zippenfenig, P. (2023). Open-Meteo.com Weather API [Computer software] by Patrick Zippenfenig
Accessing 80 Years Historical Weather Data from ERA5 by Open-Meteo
Run Your Own Weather API by Open-Meteo
Open-Meteo API source-code by Open-Meteo
Open-Meteo API documentation by Open-Meteo

See 5 usage examples →

OpenStreetMap on AWS

disaster response, geospatial, mapping, osm

OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3 in both standard formats (OSM PBF, XML) and cloud-native formats optimized for analytics workloads.

Usage examples

See 5 usage examples →

Overture Maps Foundation Open Map Data

geospatial, global, mapping, osm, parquet, transportation

Overture is a collaboratively built, global, open map data project for developers who build map services or use geospatial data. Overture Open Map Data contains data that are standardized under the themes of Admins, Base, Buildings, Places, and Transportation. Overture also includes a Global Entity Reference System (GERS) which encodes map data to a shared universal reference. Beginning with the Overture 2023-11-14-alpha.0 release, the data is available as cloud-native GeoParquet files.

Usage examples

Accessing Overture Maps Data by Overture Maps Foundation
Overture Data Schema by Overture Maps Foundation
Building Heights: From open USGS lidar to open Overture maps by Overture Maps Foundation
Working With Overture Data: A Step-by-Step Guide by Jennings Anderson
Global Entity Reference System by Overture Maps Foundation

See 5 usage examples →

Ozone Monitoring Instrument (OMI) / Aura NO2 Tropospheric Column Density

air quality, atmosphere, earth observation, environmental, geospatial, satellite imagery

NO2 tropospheric column density, screened for CloudFraction < 30%, provided as a global daily composite at 0.25 degree resolution for the temporal range of 2004 to May 2020. Original archive data in HDF5 has been processed into a Cloud-Optimized GeoTIFF (COG) format. Quality Assurance - This data has been validated by the NASA Science Team at Goddard Space Flight Center. Cautionary Note: https://airquality.gsfc.nasa.gov/caution-interpretation.
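
To illustrate what a cloud-fraction screen of this kind does, here is a minimal sketch that keeps only observations whose cloud fraction is strictly below 30% and averages what remains. It is not the NASA processing chain; the function names, the sample values, and their units are hypothetical.

```python
def screen_by_cloud_fraction(no2_values, cloud_fractions, max_cloud_fraction=0.30):
    """Replace each NO2 value with None unless its paired cloud fraction
    is strictly below max_cloud_fraction (0.30 corresponds to 30%)."""
    if len(no2_values) != len(cloud_fractions):
        raise ValueError("inputs must have the same length")
    return [
        no2 if cf < max_cloud_fraction else None
        for no2, cf in zip(no2_values, cloud_fractions)
    ]


def mean_of_valid(values):
    """Average the values that survived screening; None if none did."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical observations for one grid cell (arbitrary units).
no2 = [2.0, 4.0, 6.0, 8.0]
cloud = [0.10, 0.30, 0.25, 0.90]

screened = screen_by_cloud_fraction(no2, cloud)
print(screened)
print(mean_of_valid(screened))
```

The distributed COG files are already screened, so a step like this would only matter when working from unscreened source data.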

Usage examples

See 5 usage examples →

Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud

cities, elevation, geospatial, land, lidar, mapping, urban

The objective of the Mapa 3D Digital da Cidade (M3DC) of the São Paulo City Hall is to publish LiDAR point cloud data. The initial data was acquired in 2017 by aerial surveying and future data will be added. This publicly accessible dataset is provided in the Entwine Point Tiles format as a lossless octree, full density, based on LASzip (LAZ) encoding.

Usage examples

Entwine by Hobu, Inc.
Describing the Vertical Structure of Informal Settlements on the Basis of LiDAR Data – A Case Study for Favelas (Slums) in Sao Paulo City by S. C. L. Ribeiro, M. Jarzabek-Rychard, J. P. Cintra, H.-G. Maas
PDAL - Point Data Abstraction Library by PDAL Contributors
LAStools by rapidlasso GmbH, GERMANY
Fusion by US Department of Agriculture - Forest Service

See 5 usage examples →

Protein Data Bank 3D Structural Biology Data

amino acid, archives, bioinformatics, biomolecular modeling, cell biology, chemical biology, COVID-19, electron microscopy, electron tomography, enzyme, life sciences, molecule, nuclear magnetic resonance, pharmaceutical, protein, protein template, SARS-CoV-2, structural biology, x-ray crystallography

The "Protein Data Bank (PDB) archive" was established in 1971 as the first open-access digital data archive in biology. It is a collection of three-dimensional (3D) atomic-level structures of biological macromolecules (i.e., proteins, DNA, and RNA) and their complexes with one another and various small-molecule ligands (e.g., US FDA approved drugs, enzyme co-factors). For each PDB entry (unique identifier: 1abc or PDB_0000001abc) multiple data files contain information about the 3D atomic coordinates, sequences of biological macromolecules, information about any small molecules/ligan...

Usage examples

Protein Data Bank: the single global archive for 3D macromolecular structure data by wwPDB consortium
PDB 101 by RCSB PDB
Get to Know a Dataset: Protein Data Bank 3D Structural Biology Data by RCSB PDB
Announcing the worldwide Protein Data Bank by Berman, H., Henrick, K. & Nakamura, H.
File Download Services by RCSB PDB

See 5 usage examples →

RACECAR Dataset

autonomous racing, autonomous vehicles, computer vision, GNSS, image processing, lidar, localization, object detection, object tracking, perception, radar, robotics

The RACECAR dataset is the first open dataset for full-scale and high-speed autonomous racing. Multi-modal sensor data has been collected from fully autonomous Indy race cars operating at speeds of up to 170 mph (273 kph). Six teams who raced in the Indy Autonomous Challenge during 2021-22 have contributed to this dataset. The dataset spans 11 interesting racing scenarios across two race tracks which include solo laps, multi-agent laps, overtaking situations, high-accelerations, banked tracks, obstacle avoidance, pit entry and exit at different speeds. The data is organized and released in bot...

Usage examples

RACECAR Tutorials - ROS2 Localization by Amar Kulkarni
RACECAR--The Dataset for High-Speed Autonomous Racing by Amar Kulkarni, John Chrosniak, Emory Ducote, Florian Sauerbeck, Andrew Saba, Utkarsh Chirimar, John Link, Marcello Cellina, and Madhur Behl
RACECAR Tutorials - ROS2 Visualization by Amar Kulkarni, Utkarsh Chirimar
RACECAR Tutorials - nuScenes by John Chrosniak
rosbag2nuscenes conversion library by John Chrosniak, Emory Ducote, John Link, Madhur Behl

See 5 usage examples →

SPARC: Datasets bridging the body and the brain

bioinformatics, electrophysiology, life sciences, microscopy, neurophysiology, neuroscience

The SPARC Datasets comprise a collection of scientific data that is focused on bridging the body and the brain. The datasets focus on neural connectivity, organ innervation and detailed anatomical mapping of the peripheral nervous system. SPARC datasets distinguish themselves from other data resources through their multi-modal approach to scientific data and integrate molecular, imaging, time-series and other data types associated with the interaction between the peripheral nervous system and organs. SPARC data provides a unique integrated effort to develop next generation mapping of anatomical ...

Usage examples

The SPARC DRC: Building a Resource for the Autonomic Nervous System Community by Osanlouy M, Bandrowski A, de Bono B, Brooks D, Cassara A, Christie R, Ebrahimi N, Gillespie T, Grethe J, Guercio L, Heal M, Lin M, Kuster N, Martone M, Neufeld E, Nickerson D, Soltani E, Tappan S, Wagenaar J, Zhuang K, Hunter P
Downloading large scale SPARC datasets by The SPARC Data and Resource Center
Using sparc.client for data movement in SPARC by The SPARC Data and Resource Center
The Pennsieve Data Management Platform by Joost Wagenaar
SODA Tool by Bhavesh Patel

See 8 usage examples →

SPARTAN Data

air quality, environmental

SPARTAN (Surface PARTiculate mAtter Network) measures and provides surface ambient particulate matter (PM2.5 and PM10) concentration and the chemical composition around the world, with the purpose of connecting ground-based PM2.5 and satellite remote sensing.

Usage examples

SPARTAN: A global network to evaluate and enhance satellite-based estimates of ground-level particulate matter for global health applications by Graydon Snider, et al.
Tutorial on using SPARTAN data on AWS by Haihui Zhu
A Global-Scale Mineral Dust Equation by Xuan Liu, et al.
Elemental Characterization of Ambient Particulate Matter for a Globally Distributed Monitoring Network: Methodology and Implications by Xuan Liu, et al.
Variation in global chemical composition of PM2.5: emerging results from SPARTAN by Graydon Snider, et al.

See 5 usage examples →

Sanborn Maps Data Package

archives, cities, computer vision, conservation, cultural preservation, culture, demographics, digital assets, geospatial, history, housing, land use, mapping, urban

The dataset contains metadata records for 50,600 maps from the Sanborn Fire Insurance Maps collection and their corresponding 440,048 JPEG images. The Sanborn collection at the Library of Congress includes over fifty thousand editions of fire insurance maps comprising almost seven hundred thousand individual sheets. The Library of Congress holdings represent the largest extant collection of maps produced by the Sanborn Map Company.

Usage examples

Introduction to the Collection by Walter W. Ristow
Sanborn Atlas Volume Finder by Julie Stoner and Meagan Snow, Geography and Map Division, Library of Congress
Fire Insurance Maps at the Library of Congress: A Resource Guide by Julie Stoner, Reference Librarian, Geography and Map Division, Library of Congress
Sanborn Map Data Python Tutorial (Jupyter notebook) by Library of Congress
README data cover sheet by Library of Congress

See 5 usage examples →

Sea Around Us Global Fisheries Catch Data

biodiversity, ecosystems, fisheries, marine

The project presents Sea Around Us Global Fisheries Catch Data aggregated at EEZ level. The data are computed from reconstructed catches from various official fisheries statistics and from scientific, technical and policy reports about the fisheries, and include estimates of discards and of unreported and illegal catches from all maritime countries and major territories of the world. This project was the result of work between Sea Around Us and the CIC programme, a collaborative programme between the University of British Columbia (UBC) and AWS.

Usage examples

See 5 usage examples →

Sofar Spotter Archive

climate, environmental, meteorological, oceans, sustainability, weather

This dataset includes archival hourly data from the Sofar Spotter buoy global network (https://weather.sofarocean.com/) from 2019 to March 2022.

Usage examples

Exploring Bulk Variables from the Spotter Archive by Isabel A. Houghton
Analyzing Spotter data with CloudDrift by Milan Curcic
Performance Characteristics of “Spotter,” a Newly Developed Real-Time Wave Measurement Buoy (2019) by K. Raghukumar, G. Chang, F. Spada, C. Jones, T. Janssen, A. Gans
Performance Statistics of a Real-Time Pacific Ocean Weather Sensor Network (2021) by I. Houghton, P. Smit, D. Clark, C. Dunning, A. Fisher, N. Nidzieko, P. Chamberlain, T. Janssen
Exploring Wave Spectra Variables from the Spotter Archive by Isabel A. Houghton

See 5 usage examples →

SondeHub Radiosonde Telemetry

climate, environmental, GPS, weather

SondeHub Radiosonde telemetry contains global radiosonde (weather balloon) data captured by SondeHub from our participating radiosonde_auto_rx receiving stations. radiosonde_auto_rx is an open source project aimed at receiving and decoding telemetry from airborne radiosondes using software-defined-radio techniques, enabling study of the telemetry and sometimes recovery of the radiosonde itself. Currently 313 receiver stations are providing data for an average of 384 radiosondes a day. The data within this repository contains received telemetry frames, including radiosonde type, GPS position, a...

Usage examples

STM32 Development Boards (literally) Falling From The Sky (How to submit data) by Mark Jessop & Michaela Wheeler
Loading example notebooks into SageMaker by Michaela Wheeler
Using pysondehub to read radiosonde data by Michaela Wheeler
pysondehub by Sondehub
Using Athena to read radiosonde data by Michaela Wheeler

See 5 usage examples →

The Human Connectome Project

biology, imaging, life sciences, neurobiology, neuroimaging, neuroscience

The Human Connectome Project (HCP Young Adult, HCP-YA) is mapping the healthy human connectome by collecting and freely distributing neuroimaging and behavioral data on 1,200 normal young adults, aged 22-35.

Usage examples

The Human Connectome Project: A retrospective by Elam JS, Glasser MF, Harms MP, Sotiropoulos SN, Andersson JL, Burgess GC, Curtiss SW, et al.
The minimal preprocessing pipelines for the Human Connectome Project by Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, Xu J, Jbabdi S, et al.
The Human Connectome Workbench by The Human Connectome Project
The WU-Minn Human Connectome Project: an overview. by Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil, K, and the WU-Minn HCP Consortium.
Exploring the Human Connectome by The Human Connectome Project

See 5 usage examples →

Vermont Open Geospatial on AWS

aerial imagery, earth observation, elevation, geospatial, land cover, lidar

The State of Vermont has partnered with Amazon's Open Data Initiative to make a wide range of geospatial data available in the public domain. Vermont acquires aerial imagery and LiDAR during leaf-off conditions. The imagery typically ranges from 30-centimeter to 15-centimeter in resolution and is available from Vermont's Amazon S3 bucket in a Cloud Optimized GeoTIFF (COG) format. LiDAR data has been acquired and is available as USGS Quality Level-1 (QL1) and Level-2 (QL2) compliant datasets in COG format. Geospatial datasets derived from imagery and/or lidar are also available as COGs, ...

Usage examples

Imagery Program FAQs by Vermont Center for Geographic Information
Vermont AWS S3 Open Data Bucket Browser - Elevation by Vermont Center for Geographic Information
Vermont AWS S3 Open Data Bucket Browser - Landcover by Vermont Center for Geographic Information
Imagery Page - Vermont Open Geodata Portal by Vermont Center for Geographic Information
Elevation Page - Vermont Open Geodata Portal by Vermont Center for Geographic Information

See 8 usage examples →

A region-wide, multi-year set of crop field boundary labels for Africa

agriculture, cog, labeled, land cover, machine learning, satellite imagery

Crop field boundaries digitized in Planet imagery collected across Africa between 2017 and 2023, developed by Farmerline, Spatial Collective, and the Agricultural Impacts Research Group at Clark University, with support from the Lacuna Fund (Estes et al., 2024; ...

Usage examples

Generalization enhancement strategies to enable cross-year cropland mapping with convolutional neural networks trained using historical samples by Khallaghi et al. (2025)
Instructions on data access and label-making demonstration notebook by Lyndon Estes
A platform for crowdsourcing the creation of representative, accurate landcover maps by Estes et al. (2016)
Technical report on label development and processing by Wussah et al. (2023)
A region-wide, multi-year set of crop field boundary labels for Africa by Wussah et al. (2023)

See 7 usage examples →

APEX-CONNECTS

analysis ready databrain imagesbrain modelsimaginginfrastructurejsonlife sciencesmachine learningmetadatamicroscopyneuroimagingneuroscienceniftizarr

The BRAIN Initiative Connectivity Across Scales (CONNECTS) program is working to create detailed maps of brain wiring across different species and scales, using advanced imaging technologies. APEX supports this effort by serving as a central hub that brings together and coordinates data and tools from research focused on brain connectivity in humans and animals. Together, these efforts aim to improve our understanding of how the brain is structured and functions.

Usage examples

See 4 usage examples →

ASKAP Radio Telescope

archivesastronomy

ASKAP is the CSIRO’s newest radio telescope. It is situated at Inyarrimanha Ilgari Bundara, the CSIRO Murchison Radio-astronomy Observatory, on Wajarri Yamaji Country in the Murchison region of Western Australia, about 800 km north of Perth. ASKAP consists of 36 dishes, each 12 m in diameter, spread up to 6 km apart. It uses a new technology called Phased Array Feeds (PAFs), which allows it to see more of the sky at once. This novel technology allows ASKAP to achieve extremely high survey speed, making it one of the best instruments in the world for mapping the sky at radio wavelengths. Initial dataset...

Usage examples

Rapid ASKAP Continuum Survey (RACS) Home Page by CSIRO, ATNF
ASKAP System Description paper by Hotan, A. et al.
ASKAP Publication List by various, list maintained by CSIRO, ATNF
CSIRO ASKAP Science Data Archive User Guide by CSIRO, ATNF

See 4 usage examples →

BUSCO Datasets

assemblybacteriabioinformaticsgenomiclife sciencesmetagenomicsopen source softwareproteinvirus

Lineage datasets for use with the BUSCO software package. Each dataset contains HMM profiles for clade-specific, universal, single-copy marker genes. Datasets are available across the archaea, bacteria, eukaryota and virus domains. The repository also includes the data files needed for phylogenetic placement of an input assembly.

Usage examples

BUSCO - assessing genomic data quality and beyond. by Mosè Manni, Matthew R. Berkeley, Mathieu Seppey, Evgeny M. Zdobnov
BUSCO Update - Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. by Mosè Manni, Matthew R Berkeley, Mathieu Seppey, Felipe A Simão, Evgeny M Zdobnov
OrthoDB and BUSCO update - annotation of orthologs with wider sampling of genomes. by Fredrik Tegenfeldt, Dmitry Kuznetsov, Mosè Manni, Matthew Berkeley, Evgeny M Zdobnov, Evgenia V Kriventseva
BUSCO - from QC to gene prediction and phylogenomics by Matthew Berkeley

See 4 usage examples →

Basic Local Alignment Search Tool (BLAST) Databases

bioinformaticsbiologygeneticgenomichealthlife sciencesproteinreference indexSTRIDEStranscriptomics

A centralized repository of pre-formatted BLAST databases created by the National Center for Biotechnology Information (NCBI).

Usage examples

BLAST+ Docker by NCBI BLAST
BLAST+: Architecture and Applications by Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, Thomas L Madden
Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs by S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, D J Lipman
BLAST on the Cloud with NCBI’s ElasticBLAST by Sixing Huang

See 4 usage examples →

Chalmers Cloud Ice Climatology

atmosphereclimatedeep learningenvironmentalexplorationgeophysicsgeosciencegeospatialglobaliceplanetarysatellite imageryzarr

The Chalmers Cloud Ice Climatology (CCIC) is a novel, deep-learning-based climate record of ice-particle concentrations in the atmosphere. CCIC results are available at high spatial and temporal resolution (0.07° / 3 h from 1983, 0.036° / 30 min from 2000) and are thus ideally suited for evaluating high-resolution weather and climate models or studying individual weather systems.

Usage examples

CCIC on PyPI -- software for training and processing the CCIC retrievals and to read CCIC Zarr files by Simon Pfreundschuh
The Chalmers Cloud Ice Climatology: Retrieval implementation and validation by Adrià Amell, Simon Pfreundschuh, and Patrick Eriksson
Reading CCIC data by Adrià Amell & Simon Pfreundschuh
Storm tracking with CCIC by Julia Kukulies

See 4 usage examples →

Community Earth System Model v2 Large Ensemble (CESM2 LENS)

atmosphereclimateclimate modelgeospatialicelandmodeloceanssustainabilityzarr

The US National Center for Atmospheric Research partnered with the IBS Center for Climate Physics in South Korea to generate the CESM2 Large Ensemble which consists of 100 ensemble members at 1 degree spatial resolution covering the period 1850-2100 under CMIP6 historical and SSP370 future radiative forcing scenarios. Data sets from this ensemble were made downloadable via the Climate Data Gateway on June 14th, 2021. NCAR has copied a subset (currently ~500 TB) of CESM2 LENS data to Amazon S3 as part of the AWS Public Datasets Program. To optimize for large-scale analytics we have represented ...

Usage examples

Jupyter Notebook and other documentation and tools for CESM2-LE on AWS by NCAR Science at Scale team
Ubiquity of human-induced changes in climate variability by Rodgers et al. (2021), Earth Syst. Dynam., 12, 1–19, 2021
Rendered (static) version of Jupyter Notebook by Maxwell Grover, NCAR
Urban Climate Explorer by Zhonghua Zheng

See 4 usage examples →

Community coral reef image classification training data

coastalconservationcoral reefcsvglobalmachine learningmarineparquetsurvey

Community-sourced repository of coral reef image classification training data, including continually updated confirmed annotations from MERMAID.

Usage examples

See 4 usage examples →

Data to Science Catalog

aerial imageryagriculturecogdsmdtmearth observationgeospatialhigh-throughput imagingimage processinglidarmappingstactiff

A user-generated geospatial data collection maintained by the Data to Science platform. Contributions vary by project, but typically include cloud-optimized datasets such as Cloud-Optimized GeoTIFFs (COGs) and Cloud-Optimized Point Clouds (COPCs), designed for efficient streaming, visualization, and analysis in modern geospatial applications.

Usage examples

Data to Science Web Application by GDSL
D2S Browser by GDSL
Data to Science User Manual by GDSL
Data to Science Notebook Tutorials by GDSL

See 4 usage examples →

Demand-Side Grid (dsgrid) Toolkit

data assimilationelectricityenergyenergy modelingindustrialmeteorologicalsolartransportation

Projects that use the dsgrid toolkit assemble bottom-up descriptions of electricity demand and related data that are highly resolved geographically, temporally, and sectorally. Typically modelers describe multiple scenarios of future energy use at hourly resolution, suitable for inclusion in long-term power system planning models, i.e., capacity expansion and production cost models.

Usage examples

dsgrid Project Standard Scenarios for the TEMPO Project by Elaine Hale
GitHub Repository for Working with the dsgrid Projects by Elaine Hale
Demand-Side Grid Toolkit by Elaine Hale
The Demand-side Grid (dsgrid) Model Documentation by Elaine Hale, Henry Horsey, Brandon Johnson, et al.
Highly Resolved Projections of Passenger Electric Vehicle Charging Loads for the Contiguous United States by Arthur Yip, Christopher Hoehne, Paige Jadun, Catherine Ledna, Elaine Hale, and Matteo Muratori

See 7 usage examples →

ECMWF real-time forecasts

air temperatureatmospheremeteorologicalnear-surface air temperaturenear-surface relative humiditynear-surface specific humidityprecipitationweather

These products are a subset of the ECMWF real-time forecast data and are made available to the public free of charge. They are based on the medium-range (high-resolution and ensembles) forecast models. Note: The ECMWF Open Data Portal provides a rolling archive (most recent forecast runs), while the AWS replica bucket is updated as new data are published and may retain older data conventions/versions over time.

Usage examples

See 4 usage examples →

Earth Radio Occultation

atmosphereclimateearth observationglobalsignal processingweather

This is an updating archive of radio occultation (RO) data using the transmitters of the Global Navigation Satellite Systems (GNSS) as generated and processed by the COSMIC DAAC (ucar), the Jet Propulsion Laboratory (jpl) of the California Institute of Technology, and the Radio Occultation Meteorology Satellite Application Facility (romsaf). The contributions for ucar and romsaf are currently active.

This dataset is funded by the NASA Earth Science Data Systems and the Advancing Collaborative Connections for Earth System Science (ACCESS) 2019 program.

Usage examples

Tutorial Demonstrations for Working with GNSS Radio Occultation Data in AWS Open Data Registry archive by Stephen Leroy, Amy McVey, Hailing Zhang, Chi Ao
Observing Earth's atmosphere with radio occultation measurements using the Global Positioning System by Kursinski, E.R. et al.
Utilities for Handling Data in the AWS Open Data Registry archive of GNSS Radio Occultation Data by Stephen Leroy, Amy McVey
A technical description of atmospheric sounding by GPS occultation by Hajj, G.A. et al.

See 4 usage examples →

Encyclopedia of DNA Elements (ENCODE)

bioinformaticsbiologygeneticgenomiclife sciences

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a ...

Usage examples

See 4 usage examples →

Epigenomes of the Human Pangenome Reference Consortium (HPRC) Release 2

bioinformaticsbiologyepigenomicsgeneticgenomiclife sciences

The Human Pangenome Reference Consortium (HPRC) Release 2 represents a landmark achievement in genomics, providing high-quality phased genome assemblies from over 200 individuals with comprehensive functional genomics data. The HPRC Epigenome Browser provides researchers a way to explore all epigenomics data generated by release 2. The HPRC Epigenome Browser (HPRCEB) is a modern, interactive web portal that democratizes access to HPRC Release 2 epigenomics data through an intuitive interface supporting genome selection, data visualization, and bulk download capabilities. The portal integrates ...

Usage examples

WashU Epigenome Browser update 2025 by Chanrung Seng, Shane Liu, Wenjin Zhang, Xiaoyu Zhuo, Daofeng Li, Ting Wang
"Get To Know A Dataset: HPRC Epigenome" by HPRC Epigenome Browser
"Modbed track: Visualization of modified bases in single-molecule sequencing" by Daofeng Li, Xiaoyu Zhuo, Jessica K. Harrison, Shane Liu, Ting Wang
A draft human pangenome reference by Liao, WW., Asri, M., Ebler, J. et al.

See 4 usage examples →

Epilepsy.Science

bioinformaticselectrophysiologylife sciencesmedicineneuroscience

Epilepsy.Science comprises a set of datasets focused on epilepsy research that span both clinical and pre-clinical data. Datasets are contributed by the epilepsy research community and published using a standardized structure and metadata. Clinical datasets include de-identified subject information, EEG, and clinical imaging.

Usage examples

Submitting a dataset proposal by Pennsieve
The Pennsieve Data Management Platform by Joost Wagenaar
Pennsieve Open Repositories by Pennsieve
The Epilepsy.Science Portal by Joost Wagenaar, Brandon Westover, Kathryn Davis, Nishant Sinha, Brian Litt

See 4 usage examples →

FoMo - A Multi-Season Dataset for Robot Navigation in Forêt Montmorency

autonomous vehiclesbenchmarkcomputer visionenvironmentalextreme weathergeospatialGNSSIMUlidarlocalizationmappingmeteorologicalperceptionradarRINEXroboticssignal processing

The FoMo dataset is a multi-season collection recorded in a boreal forest environment, featuring deep snow, off-road terrain, steep slopes, and highly variable weather. It provides synchronized multi-modal sensor data—including two lidars (RoboSense and Leishen), an FMCW radar (Navtech), stereo and monocular cameras, dual IMUs, wheel odometry, power data, calibration sequences, and precise ground-truth trajectories via GNSS-PPK fusion. Designed to support research on robust robot autonomy under adverse conditions, FoMo includes repeated traversals of six trajectories of varying complexity for ...

Usage examples

See 4 usage examples →

GEOGLOWS Hydrological Model Version 2

geopackagehydrographyhydrologic modelhydrologysimulationszarr

GEOGLOWS is the Group on Earth Observation's Global Water Sustainability Program. It coordinates efforts from public and private entities to make application-ready river data more accessible and sustainably available to underdeveloped regions. The GEOGLOWS Hydrological Model provides a retrospective and daily forecast of global river discharge at 7 million river sub-basins. The stream network is a hydrologically conditioned subset of the TDX-Hydro streams and basins data produced by the United States National Geospatial-Intelligence Agency. The daily forecast provides 3-hourly average discharge ...

Usage examples

See 4 usage examples →

GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDE) at GES DISC

atmosphereclimatecoastaldatacenterglobalhydrologylandmetadatanetcdfopendap

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is a NASA product estimating global surface precipitation rates at a high resolution of 0.1° every half-hour beginning in 2000. It is part of the joint NASA-JAXA Global Precipitation Measurement (GPM) mission, using the GPM Core Observatory satellite as the standard to combine precipitation observations from an international constellation of satellites using advanced techniques. IMERG can be used for globa...

Usage examples

Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao

See 4 usage examples →

GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDF) at GES DISC

climatecoastaldatacenterglobalhydrologyicelandmetadatanetcdfopendap

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is a NASA product estimating global surface precipitation rates at a high resolution of 0.1° every half-hour beginning in 2000. It is part of the joint NASA-JAXA Global Precipitation Measurement (GPM) mission, using the GPM Core Observatory satellite as the standard to combine precipitation observations from an international constellation of satellites using advanced techniques. IMERG can be used for global-scale applic...

Usage examples

Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao

See 4 usage examples →

GPM IMERG Late Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDL) at GES DISC

atmosphereclimatecoastaldatacenterglobalhydrologylandmetadatanetcdfopendap

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. The Integrated Multi-satellitE Retrievals for GPM (IMERG) is a NASA product estimating global surface precipitation rates at a high resolution of 0.1° every half-hour beginning in 2000. It is part of the joint NASA-JAXA Global Precipitation Measurement (GPM) mission, using the GPM Core Observatory satellite as the standard to combine precipitation observations from an international constellation of satellites using advanced techniques. IMERG can be used for ...

Usage examples

How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
Calculation of Gridded Precipitation Data for the Global Land-Surface Using In-Situ Gauge Observations by Rudolf, B., and U. Schneider
Precipitation Estimation from Remotely Sensed Imagery Using an Artificial Neural Network Cloud Classification System by Hong, Y., K. L. Hsu, S. Sorooshian, and X. Gao
Kalman Filter Based CMORPH by Joyce, R. J., P. Xie, and J. E. Janowiak

See 4 usage examples →

Genome in a Bottle on AWS

geneticgenomiclife sciencesreference indexvcf

Several reference genomes to enable translation of whole human genome sequencing to clinical practice. On 11/12/2020 these data were updated to reflect the most up-to-date GIAB release.

Usage examples

Extensive sequencing of seven human genomes to characterize benchmark reference materials by Zook J et al (2016)
High-coverage, long-read sequencing of Han Chinese trio reference samples by Wang Y et al (2019)
The Genome in a Bottle GitHub Project by Genome In A Bottle Consortium
GA4GH Benchmarking Tools by GA4GH Benchmarking Team

See 4 usage examples →

High Resolution Canopy Height Maps by WRI and Meta

aerial imageryagricultureclimatecogearth observationgeospatialimage processingland covermachine learningsatellite imagery

Global and regional Canopy Height Maps (CHM). Created using machine learning models on high-resolution worldwide Maxar satellite imagery.

Usage examples

Every tree counts: Large-scale mapping of canopy height at the resolution of individual trees by Jamie Tolan, Camille Couprie, and Tracy Johns
Sub-meter resolution canopy height maps using self-supervised learning and a vision transformer trained on Aerial and GEDI Lidar by Jamie Tolan, Hung-I Yang, Ben Nosarzewski, Guillaume Couairon, Huy Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, Theo Moutakanni, Piotr Bojanowski, Tracy Johns, Brian White, Tobias Tiecke, Camille Couprie
Global Canopy Height on Earth Engine by Meta and WRI
Using Artificial Intelligence to Map the Earth’s Forests by Jamie Tolan, Camille Couprie, John Brandt, Justine Spore, Tobias Tiecke, Tracy Johns and Patrick Nease

See 4 usage examples →

IDEAM - Colombian Radar Network

agricultureearth observationmeteorologicalnatural resourceweather

Historical and one-day delay data from the IDEAM radar network.

Usage examples

Ciencia de Datos Hidrometeorológicos con Python by Alfonso Ladino, Nicole Rivera, Max Grover
Read and plot Sigmet files available on AWS using Xradar by Alfonso Ladino
Guia de como explorar y plotear los archivos de radar utilizando el lenguaje de programación Python by IDEAM
Specific Differential Phase (KDP) retrieval methods comparison by Alfonso Ladino, Max Grover

See 4 usage examples →

International Skin Imaging Collaboration (ISIC) Archive

biologycancerclassificationcomputational pathologydicomgrand-challenge.orghealthHomo sapiensimaginglife sciencesmachine learningmedical image computingmedical imagingmedicinemicroscopysegmentation

A public-access archive of skin lesion images, supporting teaching, research, and the development and evaluation of diagnostic algorithms.

Usage examples

International Skin Imaging Collaboration - Designated Diagnoses (ISIC-DX): Consensus terminology for lesion diagnostic labeling by Scope A, Liopyris K, Weber J, Barnhill R, Braun R, Curiel-Lewandrowski C, et al
The SLICE-3D dataset: 400,000 skin lesion image crops extracted from 3D TBP for skin cancer detection by Kurtansky N, D'Alessandro B, Gillis M, Betz-Stablein B, Cerminara S, Garcia R, et al
ISIC Archive Gallery by International Skin Imaging Collaboration (ISIC)
A patient-centric dataset of images and metadata for identifying melanomas using clinical context by Rotemberg V, Kurtansky N, Betz-Stablein B, Caffery L, Chousakos E, Codella N, et al
ISIC Archive Data Dictionary by International Skin Imaging Collaboration (ISIC)

See 7 usage examples →

JAXA / USGS / NASA Kaguya/SELENE Terrain Camera Digital Terrain Models

cogelevationplanetarystac

The Japan Aerospace EXploration Agency (JAXA) SELenological and ENgineering Explorer (SELENE) mission’s Kaguya spacecraft was launched on September 14, 2007, and science operations around the Moon started on October 20, 2007. The primary mission, in a circular polar orbit 100 km above the surface, lasted from October 20, 2007 until October 31, 2008. An extended mission was then conducted in lower orbits (averaging 50 km above the surface) from November 1, 2008 until the SELENE mission ended with Kaguya impacting the Moon on June 10, 2009. These data are digital terrain models derived using the NASA A...

Usage examples

See 4 usage examples →

JAXA / USGS / NASA Kaguya/SELENE Terrain Camera Observations

cogplanetarysatellite imagerystac

The Japan Aerospace EXploration Agency (JAXA) SELenological and ENgineering Explorer (SELENE) mission’s Kaguya spacecraft was launched on September 14, 2007, and science operations around the Moon started on October 20, 2007. The primary mission, in a circular polar orbit 100 km above the surface, lasted from October 20, 2007 until October 31, 2008. An extended mission was then conducted in lower orbits (averaging 50 km above the surface) from November 1, 2008 until the SELENE mission ended with Kaguya impacting the Moon on June 10, 2009. These data were collected in monoscopic observing mode. To cre...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – Global gridded percentiles

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides percentile weather forecasts. The grid resolution is approximately 20km and covers the whole globe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles and probabilities.
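To illustrate the difference between the percentile and probability representations mentioned above, here is a minimal sketch in plain Python. It is not the IMPROVER method, which blends and calibrates several models. It only shows, under simple assumptions, how a percentile and an exceedance probability can each be read off a raw set of ensemble values. The function names and the temperature values are hypothetical and chosen purely for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional index of the requested percentile within the sorted members.
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def exceedance_probability(values, threshold):
    """Fraction of ensemble members strictly above a threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > threshold) / len(values)


# Illustrative (made-up) air temperatures in degrees Celsius from five ensemble members.
members = [11.0, 12.0, 13.0, 14.0, 15.0]

median_temperature = percentile(members, 50)              # a "percentile" style value
chance_above_13 = exceedance_probability(members, 13.0)   # a "probability" style value
```

A percentile product answers "what value is not exceeded by q% of the distribution", while a probability product answers "how likely is it that a given threshold is exceeded". Both are summaries of the same underlying forecast distribution.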

This info is correct as of April 2026, but some things (like the number of sites, parameters and timesteps) may change in future.

How perce...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – Global gridded probabilities

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides gridded probabilistic weather forecasts. The grid resolution is approximately 20km and covers the whole globe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles and probabilities.

This info is correct as of April 2026, but some things (like the number of si...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – Global spot percentiles

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides percentile weather forecasts for 5,956 sites (or spots) across the globe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles and probabilities.

T...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – Global spot probabilities

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides probabilistic weather forecasts for 5,956 sites (or spots) across the globe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as per...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – UK Spot Percentiles

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides percentile weather forecasts for 7,213 sites (or spots) across the United Kingdom, Ireland and parts of Western Europe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles ...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – UK Spot Probabilities

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides probabilistic weather forecasts for 7,213 sites (or spots) across the United Kingdom, Ireland and parts of Western Europe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific)...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – UK gridded percentiles

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides gridded percentile weather forecasts. The grid resolution is approximately 2km and covers the UK and parts of Western Europe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles and probabilities.

This info is correct as of April 2026, but some things (like the number of sites, parameters and timesteps) may change in future.

How percentiles work
Ensemble forecasts show a range of...

Usage examples

See 4 usage examples →

Met Office Blended Probabilistic Forecast – UK gridded probabilities

air temperatureatmosphereforecastgeosciencegeospatialmodelnear-surface air temperaturenear-surface relative humiditynetcdfweather

This product provides gridded probabilistic weather forecasts. The grid resolution is approximately 2km and covers the UK and parts of Western Europe. It is produced by the Met Office IMPROVER Blended Probabilistic Forecast system. It is available in NetCDF format.

Blended Probabilistic Forecast data is derived from the Met Office's operational NWP (Numerical Weather Prediction) ensembles and nowcasts. To give more reliable predictions, these are then blended and calibrated using the IMPROVER pipeline, and verified using spread–skill and reliability checks.

This is 1 of 8 Blended Probabilistic Forecast products published by the Met Office on the Registry of Open Data on AWS. Data is available for the Global and UK domains, as gridded and spot (site-specific), and represented as percentiles and probabilities.

This info is correct as of April 2026, but some things (like the number of sites, parameters and timesteps) may change in future.

How probabilities work
Ensemble fo...
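As a rough illustration of how a probability can be read off an ensemble, the sketch below counts the fraction of members that exceed a threshold. The member values, the threshold, and the function name `exceedance_probability` are all hypothetical, and the calibrated IMPROVER probabilities are produced by a more involved pipeline than this.

```python
import numpy as np

def exceedance_probability(members, threshold):
    """Return the fraction of ensemble members strictly above the threshold."""
    members = np.asarray(members, dtype=float)
    return float(np.mean(members > threshold))

# Hypothetical ensemble of temperatures (deg C) at one grid point.
members = [8.0, 9.0, 10.0, 11.0, 12.0]

# Two of the five members are above 10.5, so the raw probability is 0.4.
prob = exceedance_probability(members, 10.5)
print(f"P(temperature > 10.5 C) = {prob:.1f}")
```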

Usage examples

See 4 usage examples →

Met Office Global Ensemble Prediction System (MOGREPS-G) on a 30-day rolling archive

air temperature, atmosphere, forecast, geoscience, geospatial, global, meteorological, model, near-surface air temperature, near-surface relative humidity, netcdf, weather

THIS DATASET IS CHANGING

Files uploaded from late January 2026 onward will contain changes including:

precision changes
new parameters
changes to existing parameters e.g. adding vertical levels and timesteps
the height_asl_on_pressure_levels parameter will be replaced by geopotential_height_on_pressure_levels

Please check your systems are prepared for these changes.
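One way to prepare for the parameter rename is to look up the height parameter by either name. The sketch below is a minimal example of that idea; the helper `pick_height_parameter` and the way the list of available parameter names is obtained are assumptions, and only the two parameter names come from the notice above.

```python
# Parameter names taken from the change notice.
OLD_NAME = "height_asl_on_pressure_levels"
NEW_NAME = "geopotential_height_on_pressure_levels"

def pick_height_parameter(available):
    """Return whichever height parameter name is present, preferring the new one.

    `available` is any iterable of parameter names, for example names parsed
    from file listings (how they are collected is left to the caller).
    """
    names = set(available)
    for candidate in (NEW_NAME, OLD_NAME):
        if candidate in names:
            return candidate
    raise KeyError("neither the old nor the new height parameter was found")

# Hypothetical listings before and after the change.
before = ["temperature_at_surface", OLD_NAME]
after = ["temperature_at_surface", NEW_NAME]

print(pick_height_parameter(before))
print(pick_height_parameter(after))
```

Preferring the new name means the same code keeps working across the changeover, including any period in which both names might be visible in the rolling archive.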

A numerical weather prediction model that produces forecasts for the whole globe up to a week ahead. The data uses an equirectangular latitude-longitude projection with a grid resolution of 20km. The data is available as NetCDF files. It's offered on a fre...

Usage examples

See 4 usage examples →

Met Office Global and Regional Ensemble Prediction System - UK (MOGREPS-UK) on a 30-day rolling archive

air temperature, atmosphere, forecast, geoscience, geospatial, global, meteorological, model, near-surface air temperature, near-surface relative humidity, netcdf, weather

THIS DATASET IS CHANGING

Files uploaded from late January 2026 onward will contain changes including:

precision changes
new parameters
changes to existing parameters e.g. adding vertical levels and timesteps
the height_asl_on_pressure_levels parameter will be replaced by geopotential_height_on_pressure_levels

Please check your systems are prepared for these changes.

A numerical weather prediction model that produces forecasts for the UK for the next 5 days. Parameters including temperature, pressure, wind, humidity, etc. are forecast at grid points separated by about 2.2 km, and the model has multiple vertical...

Usage examples

See 4 usage examples →

Molecular Profiling to Predict Response to Treatment (phs001965)

cancer, genomic, life sciences, STRIDES, whole genome sequencing

The Molecular Profiling to Predict Response to Treatment (MP2PRT) program is part of the NCI's Cancer Moonshot Initiative. The aim of this program is the retrospective characterization and analysis of biospecimens collected from completed NCI-sponsored trials of the National Clinical Trials Network and the NCI Community Oncology Research Program. This study, titled "Identification of Genetic Changes Associated with Relapse and/or Adaptive Resistance in Patients Registered as Favorable Histology Wilms Tumor on AREN03B2", performs genomic characterization (WGS 30X, Total RNAseq, mi...

Usage examples

Genomic Data Commons by National Cancer Institute
Genetic changes associated with relapse in favorable histology Wilms tumor: A Children's Oncology Group AREN03B2 study by Samantha Gadd, Vicki Huff, et al.
Finding the way to Wilms tumor by comparing the primary and relapse tumor samples by Filippo Spreafico, Sara Ciceri, et al.
Childhood Cancer Data Initiative Data Catalog by National Cancer Institute

See 4 usage examples →

Mouse Brain Anatomy: MouseLight Imagery

biology, fluorescence imaging, image processing, imaging, life sciences, microscopy, neurobiology, neuroimaging, neuroscience

This data set, made available by Janelia's MouseLight project, consists of images and neuron annotations of the Mus musculus brain, stored in formats suitable for viewing and annotation using the HortaCloud cloud-based annotation system.

Usage examples

MouseLight NeuronBrowser by Tiago A. Ferreira, Jayaram Chandrashekar
HortaCloud by David Schauder, Donald J. Olbris, Jody Clements, Cristian Goina, Robert R. Svirskas, Konrad Rokicki
MouseLight Project Website by Tiago A. Ferreira, Jayaram Chandrashekar
Reconstruction of 1,000 Projection Neurons Reveals New Cell Types and Organization of Long-Range Connectivity in the Mouse Brain by Johan Winnubst, Erhan Bas, Tiago A. Ferreira, Zhuhao Wu, Michael N. Economo, Patrick Edson, Ben J. Arthur, Christopher Bruns, Konrad Rokicki, David Schauder, Donald J. Olbris, Sean D. Murphy, David G. Ackerman, Cameron Arshadi, Perry Baldwin, Regina Blake, Ahmad Elsayed, Mashtura Hasan, Daniel Ramirez, Bruno Dos Santos, Monet Weldon, Amina Zafar, Joshua T. Dudman, Charles R. Gerfen, Adam W. Hantman, Wyatt Korff, Scott M. Sternson, Nelson Spruston, Karel Svoboda, Jayaram Chandrashekar

See 4 usage examples →

NA-CORDEX - North American component of the Coordinated Regional Downscaling Experiment

atmosphere, climate, climate model, geospatial, land, model, sustainability, zarr

The NA-CORDEX dataset contains regional climate change scenario data and guidance for North America, for use in impacts, decision-making, and climate science. The NA-CORDEX data archive contains output from regional climate models (RCMs) run over a domain covering most of North America using boundary conditions from global climate model (GCM) simulations in the CMIP5 archive. These simulations run from 1950–2100 with a spatial resolution of 0.22°/25km or 0.44°/50km. This AWS S3 version of the data includes selected variables converted to Zarr format from the original NetCDF. Only daily data a...

Usage examples

Jupyter Notebook and other documentation and tools by Brian Bonnlander, Seth McGinnis (NCAR)
Intake-ESM Catalog by Brian Bonnlander (NCAR)
The NA-CORDEX dataset, version 1.0. NCAR Climate Data Gateway, Boulder CO (2017) by Mearns, Linda O., et al.
Rendered (static) version of Jupyter Notebook by Brian Bonnlander (NCAR)

See 4 usage examples →

NAIP on AWS

aerial imagery, agriculture, cog, earth observation, geospatial, natural resource, regulatory

The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. This "leaf-on" imagery typically ranges from 30 centimeters to 100 centimeters in resolution. It is available from the naip-analytic Amazon S3 bucket as 4-band (RGB + NIR) imagery in MRF format, from the naip-source Amazon S3 bucket as 4-band (RGB + NIR) imagery in uncompressed raw GeoTIFF format, and from the naip-visualization Amazon S3 bucket as 3-band (RGB) imagery in Cloud Optimized GeoTIFF format. More details on NAIP

Usage examples

VoyagerSearch showing off Batch + NAIP by Voyager
EOS Land Viewer by Earth Observing System
Individual Tree Detection in Large-Scale Urban Environments using High-Resolution Multispectral Imagery by Jonathan Ventura, Milo Honsberger, Cameron Gonsalves, Julian Rice, Camille Pawlak, Natalie L.R. Love, Skyler Han, Viet Nguyen, Keilana Sugano, Jacqueline Doremus, G. Andrew Fricker, Jenn Yost, Matt Ritter
Urban Tree Detection by Jonathan Ventura

See 4 usage examples →

NASA / USGS Controlled Europa DTMs

cog, planetary, satellite imagery, stac

Knowledge of a planetary surface’s topography is necessary to understand its geology and enable landed mission operations. The Solid State Imager (SSI) on board NASA’s Galileo spacecraft acquired more than 700 images of Jupiter’s moon Europa. Although moderate- and high-resolution coverage is extremely limited, repeat coverage of a small number of sites enables the creation of digital terrain models (DTMs) via stereophotogrammetry. Here we provide stereo-derived DTMs of five sites on Europa. The sites are the bright band Agenor Linea, the crater Cilix, the crater Pwyll, pits and chaos adjacent...

Usage examples

See 4 usage examples →

NASA / USGS Controlled THEMIS Mosaics

cog, planetary, satellite imagery, stac

These data are infrared image mosaics, tiled to the Mars quadrangle, generated using Thermal Emission Imaging System (THEMIS) images from the 2001 Mars Odyssey orbiter mission. The mosaic is generated at the full resolution of the THEMIS infrared dataset, which is approximately 100 meters/pixel. The mosaic was absolutely photogrammetrically controlled to an improved Viking MDIM network that was developed by the USGS Astrogeology processing group using the Integrated Software for Imagers and Spectrometers. Image-to-image alignment precision is subpixel (i.e., <100m). These 8-bit, qualitative d...

Usage examples

See 4 usage examples →

NASA / USGS Europa Controlled Observation Mosaics

cog, planetary, satellite imagery, stac

The Solid State Imager (SSI) on NASA's Galileo spacecraft acquired more than 500 images of Jupiter's moon, Europa. These images vary from relatively low-resolution hemispherical imaging, to high-resolution targeted images that cover a small portion of the surface. Here we provide a set of 92 image mosaics generated from minimally processed, projected Galileo images with photogrammetrically improved locations on Europa's surface.

These images provide users with nearly the entire Galileo Europa imaging dataset at its native resolution and with improved relative image locations. The S...

Usage examples

See 4 usage examples →

NASA / USGS Europa Controlled Observations

cog, planetary, satellite imagery, stac

The Solid State Imager (SSI) on NASA's Galileo spacecraft acquired more than 500 images of Jupiter's moon, Europa. These images vary from relatively low-resolution hemispherical imaging, to high-resolution targeted images that cover a small portion of the surface. Here we provide a set of 481 minimally processed, projected Galileo images with photogrammetrically improved locations on Europa's surface. These individual images were subsequently used as input into a set of 92 observation mosaics.

These images provide users with nearly the entire Galileo Europa imaging dataset at its nativ...

Usage examples

See 4 usage examples →

NASA / USGS Mars Reconnaissance Orbiter (MRO) Context Camera (CTX) Targeted DTMs

cog, elevation, planetary, satellite imagery, stac

As of March 2023, the Mars Reconnaissance Orbiter (MRO) High Resolution Imaging Science Experiment (HiRISE) sensor has collected more than 5000 targeted stereopairs. During HiRISE acquisition, the Context Camera (CTX) also collects lower resolution, higher spatial extent context images. These CTX acquisitions are also targeted stereopairs. This data set contains targeted CTX DTMs and orthoimages, created using the NASA Ames Stereo Pipeline. These data have been created using relatively controlled CTX images that have been globally bundle adjusted using the USGS Integrated Software for Imagers and Spectro...

Usage examples

See 4 usage examples →

NASA / USGS Released HiRISE Digital Terrain Models

cog, planetary, satellite imagery, stac

These data are digital terrain models (DTMs) created by multiple different institutions and released to the Planetary Data System (PDS) by the University of Arizona. The data are processed from the Planetary Data System (PDS) stored JP2 files, map projected, and converted to Cloud Optimized GeoTiffs (COGs) for efficient remote data access. These data are controlled to the Mars Orbiter Laser Altimeter (MOLA). Therefore, they are a proxy for the geodetic coordinate reference frame. These data are not guaranteed to co-register with uncontrolled products (e.g., the uncontrolled High Resolution ...

Usage examples

See 4 usage examples →

NASA / USGS Uncontrolled HiRISE RDRs

cog, planetary, satellite imagery, stac

These data are red and color Reduced Data Record (RDR) observations collected and originally processed by the High Resolution Imaging Science Experiment (HiRISE) team. The data are processed from the Planetary Data System (PDS) stored RDRs, map projected, and converted to Cloud Optimized GeoTiffs (COGs) for efficient remote data access. These data are not photogrammetrically controlled and use a priori NAIF SPICE pointing. Therefore, these data will not co-register with controlled data products. Data are released using simple cylindrical (planetocentric positive East, center longitude 0, -180...

Usage examples

See 4 usage examples →

NOAA Global Forecast System (GFS)

agriculture, climate, disaster response, environmental, meteorological, weather

NOTE - Upgrade NCEP Global Forecast System to v16.3.0 - Effective November 29, 2022 See notification HERE

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by the GFS at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which is used by the operational forecasters who predict weather out to 16...

Usage examples

NOAA Global Forecast System (GFS) quickstart notebook on AWS by Benoit de Chateauvieux
GFS Warm Restart Files Additional Information by Fanglin Yang
Validation and Application of the Accu-Waves Operational Platform for Wave Forecasts at Ports by Christos Makris et al.
Toward the Next Generation of Microwave Sounders: Benefits of a Low-Earth Orbit Hyperspectral Microwave Instrument in All-Weather Conditions using AI by Eric S. Maddy

See 4 usage examples →

NOAA Global Historical Climatology Network Daily (GHCN-D)

agriculture, climate, meteorological, weather

UPDATE TO GHCN PREFIXES - The NODD team is working on improving performance and access to the GHCNd data and will be implementing an updated prefix structure. For more information on the prefix changes, please see the "READ ME on the NODD Github". If you have questions, comments, or feedback, please reach out to nodd@noaa.gov with GHCN in the subject line.

Global Historical Climatology Network - Daily is a dataset from NOAA that contains daily observations over global land areas. It contains station-based measurements ...

Usage examples

Natural and Socioeconomic Conditions Influence Tick-Borne Encephalitis Cases in Russia by Lantian Zhang, Linsheng Yang, Li Wang, Lijuan Gu, Hairong Li, Svetlana Malkhazova
Explore & Visualize 200+ Years of Global Temperature by Kapil Sreedharan
Visualize over 200 years of global climate data using Amazon Athena and Amazon QuickSight by Conor Delaney
Calculating growing degree days using AWS Registry of Open Data by Karen Hildebrand and Zac Flamig

See 4 usage examples →

NOAA High-Resolution Rapid Refresh (HRRR) Model

agriculture, climate, disaster response, environmental, weather

The HRRR is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-h period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh.

The HRRR ZARR formatted data was originally generated by the University of Utah under a grant provided by NOAA. They are continuing to publish ZARR versions of HRRR data. For information about data in s3://hrrrzarr/, please contact ...

Usage examples

Using Cloud Computing to Analyze Model Output Archived in Zarr Format by Taylor A. Gowan, John D. Horel, Alexander A. Jacques, and Adair Kovac
Using the U.S. Climate Reference Network to Identify Biases in Near- and Sub-Surface Meteorological Fields in the High-Resolution Rapid Refresh (HRRR) Weather Prediction Model by Temple R. Lee, Ronald D. Leeper, Tim Wilson, Howard Diamond, Tilden P. Meyers, and David D. Turner
HRRR-B Python package: download and read HRRR grib2 files by Brian Blaylock
The HRRR Zarr Archive Managed by MesoWest by Taylor Gowan

See 4 usage examples →

NOAA's Coastal Ocean Reanalysis (CORA) Dataset: 1979-2022

agriculture, climate, disaster response, environmental, oceans, transportation, weather

NOAA's Coastal Ocean Reanalysis (CORA) for the Gulf, East Coast/Atlantic, and Caribbean (GEC) is produced using verified hourly water levels from the National Ocean Service’s Center for Operational Oceanographic Products & Services (CO-OPS). ADvanced CIRCulation Model (ADCIRC) and Simulating WAves Nearshore (SWAN) models are coupled to model coastal water levels and nearshore waves. Hourly water level observations are used for data assimilation and validation to improve the accuracy of modeled water levels and wave datasets.

Additional Details:
Metadata associated with model domain and time span:

Timeseries - 1979 to 2022
Size - Approx. 44.6 TB
Domain - Lat 5.8 to 45.8 ; Long -98.0 to -53.8

...

Usage examples

Assessment of water levels from 43 years of NOAA’s Coastal Ocean Reanalysis (CORA) for the Gulf of Mexico and East Coasts by Rose, Linta; Widlansky, Matthew J.; Feng, Xue; Thompson, Thompson; Asher, Taylor G.; Dusek, Gregory; Blanton, Blanton; Luettich, Richard A. Jr.; Callahan, John; Brooks, William; Keeney, Analise; Haddad, Jana; Sweet, William; Genz, Ayesha; Hovenga, Paige; Marra, John & Tilson, Jeffrey
NOAA Technical Report NOS CO-OPS 108: NOAA’s Coastal Ocean Reanalysis: Gulf of Mexico, Atlantic, and Caribbean (January 2025) by Keeney, Analise; Dusek, Gregory; Callahan, John; Ratcliff, John; Jima, Tigist; Brooks, William; Marcy, Doug; Blanton, Brian; Tilson, Jeffrey; Asher, Taylor G.; Luettich, Richard A.; Widlansky, Matthew J.; Rose, Linta; Morse, Cheryl; Haddad, Jana; & Waring, Blake
Coastal Ocean Reanalysis Use cases by NOAA's Center for Operational Oceanographic Products and Services
Using Python to Access Coastal Ocean Reanalysis (CORA) Data by NOAA's Center for Operational Oceanographic Products and Services

See 4 usage examples →

NREL National Solar Radiation Database

earth observation, energy, geospatial, meteorological, solar

Released to the public as part of the Department of Energy's Open Energy Data Initiative, the National Solar Radiation Database (NSRDB) is a serially complete collection of hourly and half-hourly values of the three most common measurements of solar radiation (global horizontal, direct normal, and diffuse horizontal irradiance) and meteorological data. These data have been collected at a sufficient number of locations and temporal and spatial scales to accurately represent regional solar radiation climates.

Usage examples

The National Solar Radiation Data Base (NSRDB) by Manajit Sengupta, Yu Xie, Anthony Lopez, Aron Habte, Galen Maclaurin, James Shelby
HSDS Examples by Caleb Phillips, Caroline Draxl, John Readey, Jordan Perr-Sauer, Michael Rossol
NSRDB Viewer by Manajit Sengupta, Yu Xie, Anthony Lopez, Aron Habte, Galen Maclaurin, James Shelby, Paul Edwards
Physics-guided machine learning for improved accuracy of the National Solar Radiation Database by Grant Buster, Mike Bannister, Aron Habte, Dylan Hettinger, Galen Maclaurin, Michael Rossol, Manajit Sengupta, Yu Xie

See 4 usage examples →

OPERA Radiometric Terrain Corrected SAR Backscatter from Sentinel-1 validated product (Version 1)

coastal, earth observation, geoscience, global, hdf, ice, land, metadata, oceans, orbit, radar, sentinel-1, soil moisture, synthetic aperture radar, tiff, xml

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Radiometric Terrain Corrected (RTC) SAR Backscatter from Sentinel-1 (S1) validated product consists of radar backscatter normalized with respect to the topography. The product maps signals related to the physical properties of ground scattering objects, such as surface roughness and soil moisture and/or vegetation. The OPERA RTC-S1 product is derived from Copernicus Sentinel-1 Interferometric Wide (IW) Single Look Complex (SLC) data with a near global scope and temporal sampling coincident with the availability of S1...

Usage examples

RTC Landslide Example by K. Venkataramani
Thermal Denoising of Products Generated by the S-1 IPF by Riccardo Piantanida
Load, Mosaic, and Visualize OPERA RTC-S1 Data by K. Venkataramani
An Area-Based Projection Algorithm for SAR Radiometric Terrain Correction and Geocoding by Gustavo H. X. Shiroma, Marco Lavalle, and Sean M. Buckley

See 4 usage examples →

OpenAlex dataset

graph, json, metadata, scholarly communication

An open, comprehensive index of scholarly papers, citations, authors, institutions, and journals.

Usage examples

Download snapshot by OurResearch
OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts by Jason Priem, Heather Piwowar, Richard Orr
Getting citation data from OpenAlex by DOI (Jupyter notebook) by Jens Peter Anderson
Data Analysis and Knowledge Graph with OpenAlex dataset by Gabriel Bella Martini

See 4 usage examples →

OpenCell on AWS

Biohub, biology, cell biology, cell imaging, computer vision, fluorescence imaging, imaging, life sciences, machine learning, microscopy

The OpenCell project is a proteome-scale effort to measure the localization and interactions of human proteins using high-throughput genome engineering to endogenously tag thousands of proteins in the human proteome. This dataset consists of the raw confocal fluorescence microscopy images for all tagged cell lines in the OpenCell library. These images can be interpreted both individually, to determine the localization of particular proteins of interest, and in aggregate, by training machine learning models to classify or quantify subcellular localization patterns.

Usage examples

See 4 usage examples →

OpenFold3 Training Data

life sciences, msa, open source software, openfold, protein, protein folding, protein template

This dataset contains MSAs and predicted structures used to train OpenFold3 preview, an open-source, all-atom ligand, RNA and protein structure prediction software. This includes:

PDB - 245k structures and alignments from the RCSB Protein Data Bank - https://www.rcsb.org/
Long monomer distillation set - ~13 million long (sequence length >= 200 amino acids) monomers from the MGNIFY database - https://www.ebi.ac.uk/metagenomics/.
Short monomer distillation set - 400k short (sequence length < 200 amino acids) monomers from the MGNIFY database - https://www.ebi.ac.uk/metagenomics/.
Disordered

...

Usage examples

Deploying OpenFold3 with NVIDIA NIMs on Brev & AWS EC2 by Glòria Macià
OpenFold3-preview2 Technical Report by The OpenFold3 Team
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization by Ahdritz, Gustaf; Bouatta, Nazim; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O'Donnell, Timothy J, et al
Looking at an OpenFold3 MSA in a Browser-Based Notebook on Scigantic by Scigantic

See 4 usage examples →

Refgenie reference genome assets

bioinformatics, biology, genetic, genomic, infrastructure, life sciences, single-cell transcriptomics, transcriptomics, whole genome sequencing

Pre-built refgenie reference genome data assets used for aligning and analyzing DNA sequence data.

Usage examples

See 4 usage examples →

SILO climate data on AWS

agriculture, climate, earth observation, environmental, meteorological, model, sustainability, water, weather

SILO is a database of Australian climate data from 1889 to the present. It provides continuous, daily time-step data products in ready-to-use formats for research and operational applications. SIL...

Usage examples

Using relative humidity grids with xarray from s3 by Richard Scott
NetCDF Operators to calculate seasonal means by SILO
Convert NetCDF to ESRI ArcASCII or GeoTIFF by SILO
Python script to calculate a regional mean by SILO

See 4 usage examples →

Sea Surface Temperature Daily Analysis: European Space Agency Climate Change Initiative product version 2.1

climate, earth observation, environmental, geospatial, global, oceans

Global daily-mean sea surface temperatures, presented on a 0.05° latitude-longitude grid, with gaps between available daily observations filled by statistical means, spanning late 1981 to recent time. Suitable for large-scale oceanographic, meteorological, and climatological applications, such as evaluating or constraining environmental models or case studies of marine heat wave events. Includes temperature uncertainty information and auxiliary information about land-sea fraction and sea-ice coverage. For reference and citation see: www.nature.com/articles/s41597-019-0236-x.

Usage examples

Working with surftemp-sst data - Tutorial 1 - Getting started by Niall McCarroll
Working with surftemp-sst data - Tutorial 2 - Analysing Marine Heatwaves by Niall McCarroll
Satellite-based time-series of sea-surface temperature since 1981 for climate applications (2019). by Merchant, C.J., Embury, O., Bulgin, C.E., Block, T., Corlett, G.K., Fiedler, E., Good, S.A., Mittaz, J., Rayner, N.A., Berry, D., Eastwood, S., Taylor, M., Tsushima, Y., Waterfall, A., Wilson, R. and Donlon, C.
Adjusting for desert-dust-related biases in a climate data record of sea surface temperature (2020). by Merchant, C.J. and Embury, O.

See 4 usage examples →

Sentinel-2 L2A 120m Mosaic

agriculture, cog, earth observation, geospatial, machine learning, natural resource, satellite imagery

The Sentinel-2 L2A 120m mosaic is a derived product that contains best pixel values for 10-day periods, modelled by removing cloudy pixels and then interpolating among the remaining values. Because some parts of the world have lengthy cloudy periods, clouds may remain in some areas. The actual modelling script is available here.
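The general idea of dropping cloudy observations and interpolating among the rest can be sketched for a single pixel's time series. This is only an illustration under stated assumptions: the function name `fill_cloudy`, the sample values, and the use of linear interpolation in time are hypothetical and are not taken from the actual modelling script.

```python
import numpy as np

def fill_cloudy(values, cloudy):
    """Replace cloudy samples in a pixel time series by linear interpolation.

    values: reflectance samples ordered in time.
    cloudy: boolean flags, True where the sample is cloud-contaminated.
    If every sample is cloudy, the series is returned as NaN (no clear data).
    """
    values = np.asarray(values, dtype=float)
    cloudy = np.asarray(cloudy, dtype=bool)
    clear = ~cloudy
    if not clear.any():
        return np.full(values.shape, np.nan)
    t = np.arange(values.size)
    # np.interp holds the edge values constant outside the clear range.
    return np.interp(t, t[clear], values[clear])

# Hypothetical series: the bright samples at positions 1 and 3 are clouds.
series = [0.2, 0.9, 0.4, 0.95, 0.6]
flags = [False, True, False, True, False]
filled = fill_cloudy(series, flags)
print(filled)
```

A pixel with no clear observation in the period has nothing to interpolate from, which is one way to picture why long cloudy periods can leave residual artefacts in a mosaic.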

Usage examples

See 4 usage examples →

Sentinel-3

cog, earth observation, environmental, geospatial, land, oceans, satellite imagery, stac

This data set consists of observations from the Sentinel-3 satellite of the European Commission’s Copernicus Earth Observation Programme. Sentinel-3 is a polar orbiting satellite that completes 14 orbits of the Earth a day. It carries the Ocean and Land Colour Instrument (OLCI) for medium resolution marine and terrestrial optical measurements, the Sea and Land Surface Temperature Radiometer (SLSTR), the SAR Radar Altimeter (SRAL), the MicroWave Radiometer (MWR) and the Precise Orbit Determination (POD) instruments. The satellite was launched in 2016 and entered routine operational phase in 201...

Usage examples

See 4 usage examples →

Sentinel-5P Level 2

air quality, atmosphere, cog, earth observation, environmental, geospatial, satellite imagery, stac

This data set consists of observations from the Sentinel-5 Precursor (Sentinel-5P) satellite of the European Commission’s Copernicus Earth Observation Programme. Sentinel-5P is a polar orbiting satellite that completes 14 orbits of the Earth a day. It carries the TROPOspheric Monitoring Instrument (TROPOMI), a spectrometer that senses ultraviolet (UV), visible (VIS), near-infrared (NIR) and short-wave infrared (SWIR) light to monitor ozone, methane, formaldehyde, aerosol, carbon monoxide, nitrogen dioxide and sulphur dioxide in the atmosphere. The satellite was launched in October 2017 and entered ro...

Usage examples

See 4 usage examples →

SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)

biodiversity, biology, ecosystems, image processing, multimedia, wildlife

The goal of SiPeCaM is to create a data source for evaluating changes in the state of biodiversity, considering key aspects of how the ecosystem behaves.

Usage examples

Sitios Permanente de la Calibración y Monitoreo de la Biodiversidad by Michael Schmidt et al.
Sample search query on November video files for cumulus 92, using Alfresco. by Carolina Acosta
Sample search query on November audio files for cumulus 92, using Alfresco. by Carolina Acosta
Sample search query on all image files for cumulus 92, using Alfresco. by Carolina Acosta

See 4 usage examples →

Speedtest by Ookla Global Fixed and Mobile Network Performance Maps

analytics, broadband, cities, civic, disaster response, geospatial, global, government spending, infrastructure, internet, mapping, network traffic, parquet, regulatory, telecommunications, tiles

Global fixed broadband and mobile (cellular) network performance, allocated to zoom level 16 Web Mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is provided in Shapefile format and in Apache Parquet with geometries represented in Well Known Text (WKT), projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile. Measurements are filtered to results containing GPS-quality location accuracy.
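To see which zoom level 16 tile a given location would fall into, the standard Web Mercator (XYZ) tile formula can be used, as sketched below. The function name `lonlat_to_tile` is hypothetical, and the assumption that the dataset's tiles follow the common XYZ indexing convention should be checked against the dataset documentation.

```python
import math

def lonlat_to_tile(lon, lat, zoom=16):
    """Convert WGS84 longitude/latitude (degrees) to XYZ tile indices.

    Uses the widely used Web Mercator tiling formula; tile (0, 0) is the
    top-left tile of the world map.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y

# The point at longitude 0, latitude 0 sits at the centre of the tile grid.
print(lonlat_to_tile(0.0, 0.0))
```

Grouping measurements by the resulting (x, y) pair is one simple way to aggregate point data onto the same kind of grid the dataset describes.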

Usage examples

How to deliver performant GIS desktop applications with Amazon AppStream 2.0 by Ethan Fahy and Spencer DeBrosse
Bootstrapping Dask on 1000 cores with AWS Fargate by Imri Paran
Launching Lonboard - A Python library for extremely fast geospatial vector data visualization in Jupyter by Kyle Barron
New Year, Great Data: The Best Ookla Open Data Projects We’ve Seen So Far by Katie Jolly

See 4 usage examples →

Storm EVent ImageRy (SEVIR)

meteorological, satellite imagery, weather

Collection of spatially and temporally aligned GOES-16 ABI satellite imagery, NEXRAD radar mosaics, and GOES-16 GLM lightning detections.

Usage examples

Towards a More Realistic and Detailed Deep-Learning-Based Radar Echo Extrapolation Method by Yuan Hu, Lei Chen, Zhibin Wang, Xiang Pan and Hao Li
Using Generators for SEVIR data by Mark Veillette
sevir -- python utilities for working with SEVIR dataset by Mark Veillette
Introduction to SEVIR by Mark Veillette

See 4 usage examples →

Synthea synthetic patient generator data in OMOP Common Data Model

bioinformatics, health, life sciences, natural language processing, us

The Synthea generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and gov...

Usage examples

OHDSIonAWS by James Wiggins
Predict patient health outcomes using OHDSI and machine learning on AWS by James Wiggins
Map clinical notes to the OMOP Common Data Model and healthcare ontologies using Amazon Comprehend Medical by James Wiggins
Create data science environments on AWS for health analysis using OHDSI by James Wiggins

See 4 usage examples →

The Impact of Genomic Variation on Function Consortium (IGVF)

bioinformatics, biology, genetic, genomic, life sciences

The IGVF (Impact of Genomic Variation on Function) Consortium aims to understand how genomic variation affects genome function, which in turn impacts phenotype. The NHGRI is funding this collaborative program that brings together teams of investigators who will use state-of-the-art experimental and computational approaches to model, predict, characterize and map genome function, how genome function shapes phenotype, and how these processes are affected by genomic variation. These joint efforts will produce a catalog of the impact of genomic variants on genome function and phenotypes.
The Da...

Usage examples

See 4 usage examples →

UK Biobank Linkage Disequilibrium Matrices

genetic, genome wide association study, genomic, life sciences, population genetics

Linkage disequilibrium (LD) matrices of UK Biobank participants of British ancestry, based on imputed genotypes.

Usage examples

PolyFun Wiki by Omer Weissbrod
Functionally informed fine-mapping and polygenic localization of complex trait heritability by Weissbrod et al.
Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores by Weissbrod et al.
PolyFun and PolyPred software by Omer Weissbrod

See 4 usage examples →

UK Biobank Pan-Ancestry Summary Statistics

genetic, genome wide association study, genomic, life sciences, population genetics

A multi-ancestry analysis of 7,221 phenotypes using a generalized mixed model association testing framework, spanning 16,119 genome-wide association studies. We provide standard meta-analysis across all populations and with a leave-one-population-out approach for each trait. The data are provided in tsv format (per phenotype) and Hail MatrixTable (all phenotypes and variants). Metadata is provided in phenotype and variant manifests.

Usage examples

Hail Tutorials by Hail Team
Pan-ancestry genetic analysis of the UK Biobank by Pan UKBB Team
Hail by Hail Team
Hail on AWS Quick Start by Amazon Web Services and PrivoIT

See 4 usage examples →

Version 2 High Resolution Canopy Height Maps by WRI and Meta

aerial imagery, agriculture, climate, cog, earth observation, geospatial, image processing, land cover, machine learning, satellite imagery

Version 2 Global and regional Canopy Height Maps (CHMv2). Created using machine learning models on high-resolution worldwide Vantor satellite imagery.

Usage examples

Get To Know A Dataset - CHMv2 by Meta
DINOv3 by Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
CHMv2: Improvements in Global Canopy Height Mapping using DINOv3 by John Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, Jessica Ertel, Justine Spore, Huy V. Vo, Michael Ramamonjisoa, Patrick Labatut, Piotr Bojanowski, and Camille Couprie
Global Canopy Height on Earth Engine by Meta and WRI

See 4 usage examples →

Yale-CMU-Berkeley (YCB) Object and Model Set

robotics

This project primarily aims to facilitate performance benchmarking in robotics research. The dataset provides mesh models, RGB, RGB-D and point cloud images of over 80 objects. The physical objects are also available via the YCB benchmarking project. The data are collected by two state-of-the-art systems: UC Berkeley's scanning rig and the Google scanner. The UC Berkeley scanning rig data provide meshes generated with Poisson reconstruction, meshes generated with volumetric range image integration, textured versions of both meshes, Kinbody files for using the meshes with OpenRAVE, 600 ...

Usage examples

See 4 usage examples →

iSDAsoil

agriculture, analytics, biodiversity, conservation, deep learning, food security, geospatial, machine learning, satellite imagery

iSDAsoil is a resource containing soil property predictions for the entire African continent, generated using machine learning. Maps for over 20 different soil properties have been created at 2 different depths (0-20 cm and 20-50 cm). Soil property predictions were made using machine learning coupled with remote sensing data and a training set of over 100,000 analyzed soil samples. Included in this dataset are images of predicted soil properties, model error and satellite covariates used in the mapping process.

Usage examples

iSDAsoil liming demo app on Observable by Jamie Collinson
iSDAsoil Python tutorial by Matt Miller
iSDAsoil homepage - view soil property maps online by iSDA
African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning by Tomislav Hengl, Matthew A. E. Miller, Josip Križan, Keith D. Shepherd, Andrew Sila, Milan Kilibarda, Ognjen Antonijević, Luka Glušica, Achim Dobermann, Stephan M. Haefele, Steve P. McGrath, Gifty E. Acquah, Jamie Collinson, Leandro Parente, Mohammadreza Sheykhmousa, Kazuki Saito, Jean-Martial Johnson, Jordan Chamberlin, Francis B. T. Silatsa, Martin Yemefack, John Wendt, Robert A. MacMillan, Ichsani Wheeler & Jonathan Crouch

See 4 usage examples →

real-changesets

disaster response, geospatial, mapping, osm

real-changesets is an augmented representation of OpenStreetMap changesets in JSON format. It contains the current and the previous version of each feature in a changeset. It is primarily used by OSMCha, the main OpenStreetMap validation tool, to visualize a changeset and help the user understand what was changed on the map. The real-changesets are created by combining the changeset metadata with the augmented diff generated by Overpass.

Usage examples

See 4 usage examples →

1000 Genomes

fastq, genetic, genomic, life sciences, whole genome sequencing

The 1000 Genomes Project is an international collaboration which has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype context. The final phase of the project sequenced more than 2500 individuals from 26 different populations around the world and produced an integrated set of phased haplotypes with more than 80 million variants for these individuals.

Usage examples

See 3 usage examples →

AG-LOAM Dataset

agriculture, lidar, localization, mapping, robotics

The AG-LOAM dataset has been released to facilitate the evaluation of LiDAR-based odometry algorithms in agricultural environments.

It was collected by a wheeled mobile robot at the Agricultural Experimental Station of the University of California, Riverside, during Winter 2022 and Winter 2023.
It provides LiDAR point cloud data captured using a Velodyne VLP-16 sensor, along with ground-truth trajectories obtained from an RTK-GPS system.
It consists of 18 sequences collected over three phases, covering diverse planting environments, terrain conditions, path patterns, and robot motion profiles.
It

...

Usage examples

Source code of the LiDAR-only odometry and mapping system by Hanzhe Teng et al.
Adaptive LiDAR Odometry and Mapping for Autonomous Agricultural Mobile Robots in Unmanned Farms by Hanzhe Teng, Yipeng Wang, Dimitrios Chatziparaschis, Konstantinos Karydis

See 3 usage examples →

AI3 Protein-Ligand Binding Affinity Dataset

health, life sciences, machine learning, molecular dynamics, pharmaceutical, protein, simulations

The rapid advancement of computing technologies, particularly artificial intelligence (AI), has revolutionized various domains, including drug discovery. Curated datasets are crucial for developing reliable, generalizable, and accurate models for practical applications. Generating experimental data on a large scale is an expensive and arduous process. In domains such as medical diagnostics where real-life data is hard to obtain, synthetic data has been shown to be extremely valuable. We, teams from IIIT Hyderabad, Intel, AWS, and Insilico Medicine, have performed physics-based calculations (mo...

Usage examples

See 3 usage examples →

ASF SAR Data Products for Disaster Events

cog, disaster response, geospatial, satellite imagery, stac

Synthetic Aperture Radar (SAR) data is a powerful tool for monitoring and assessing disaster events and can provide valuable insights for researchers, scientists, and emergency response teams. The Alaska Satellite Facility (ASF) curates this collection of (primarily) SAR and SAR-derived satellite data products from a variety of data sources for disaster events.

Usage examples

See 3 usage examples →

AdaptiveFlow Ligand Libraries

bioinformatics, life sciences, medicine, pharmaceutical, structural biology

AdaptiveFlow Versions of Ligand Libraries in Ready-To-Dock Format

Usage examples

See 3 usage examples →

Allen Ivy Glioblastoma Atlas

biology, cancer, computer vision, gene expression, genetic, glioblastoma, Homo sapiens, image processing, imaging, life sciences, machine learning, neurobiology

This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each tissue section is adjacent to another section that was stained with a reagent useful for identifying histological features of the tumor. Each of these types of images has been completely annotated for tumor features by a machine learning process trained by expert medical doctors.

Usage examples

See 3 usage examples →

Allen Mouse Brain Atlas

biology, gene expression, genetic, image processing, imaging, life sciences, Mus musculus, neurobiology, transcriptomics

The Allen Mouse Brain Atlas is a genome-scale collection of cellular resolution gene expression profiles using in situ hybridization (ISH). Highly methodical data production methods and comprehensive anatomical coverage via dense, uniformly spaced sampling facilitate data consistency and comparability across >20,000 genes. The use of an inbred mouse strain with minimal animal-to-animal variance allows one to treat the brain essentially as a complex but highly reproducible three-dimensional tissue array. The entire Allen Mouse Brain Atlas dataset and associated tools are available through an...

Usage examples

See 3 usage examples →

Beat Acute Myeloid Leukemia (AML) 1.0

cancer, genetic, genomic, Homo sapiens, life sciences, STRIDES

Beat AML 1.0 is a collaborative research program involving 11 academic medical centers that worked collectively to better understand drugs and drug combinations that should be prioritized for further development within clinical and/or molecular subsets of acute myeloid leukemia (AML) patients. Beat AML 1.0 provides the largest-to-date dataset on primary acute myeloid leukemia samples, offering genomic, clinical, and drug response data. This dataset contains open Clinical Supplement and RNA-Seq Gene Expression Quantification data. This dataset also contains controlled Whole Exome Sequencing (WXS) and R...

Usage examples

Genomic Data Commons by National Cancer Institute
Functional Genomic Landscape of Acute Myeloid Leukemia by Jeffrey W. Tyner, Cristina E. Tognon, Dan Bottomly et al.
Clinical resistance to crenolanib in acute myeloid leukemia due to diverse molecular mechanisms by Zhang H, Savage S, Schultz AR, Bottomly D, White L, Segerdell E, et al.

See 3 usage examples →

Blended TROPOMI+GOSAT Satellite Data Product for Atmospheric Methane

climate, environmental, satellite imagery

A dataset of satellite retrievals of atmospheric methane that extends from 30 April 2018 to present.

Usage examples

A blended TROPOMI+GOSAT satellite data product for atmospheric methane using machine learning to correct retrieval biases by N. Balasus, D.J. Jacob, A. Lorente, J.D. Maasakkers, R.J. Parker, H. Boesch, Z. Chen, M.M. Kelp, H. Nesser, D.J. Varon
Downloading the full data record from AWS by Nicholas Balasus
Plotting one month of blended TROPOMI+GOSAT methane retrievals by Nicholas Balasus

See 3 usage examples →

BraiDyn-BC: Cued lever-pull task dataset

calcium imaging, imaging, life sciences, Mus musculus, neuroscience, video

The BraiDyn-BC (Brain Dynamics underlying emergence of Behavioral Change) Database offers an extensive, multimodal dataset that links wide-field calcium imaging of the mouse neocortex to comprehensive behavioral measurements during a behavioral task. As one of the contents in this database, we newly provide a dataset that includes 15 sessions spanning two weeks of motor skill learning, in which 25 mice were trained to pull a lever to obtain water rewards. Simultaneous high-speed videography captures body, facial, and eye movements, and environmental parameters are monitored. The dataset also ...

Usage examples

A multimodal dataset linking wide-field calcium imaging to behavior changes in mice during an operant lever-pull task by Kondo M, Sehara K, Harukuni R, Aoki R, Sugimoto S, Tanaka YR, Matsuzaki M, Nakae K
A set of libraries used for generating the dataset by Keisuke Sehara, Ryo Aoki, Shoya Sugimoto
Detailed usage tutorials on Google Colab by Keisuke Sehara

See 3 usage examples →

COBRA

cancer, computational pathology, computer vision, deep learning, histopathology, life sciences

This page describes the COBRA (Classification Of Basal cell carcinoma, Risky skin cancers and Abnormalities) skin pathology dataset, which comprises over 7,000 histopathology whole-slide images related to the diagnosis of basal cell carcinoma skin cancer, the most commonly diagnosed cancer. The dataset includes biopsies and excisions and is divided into four groups. The first group contains about 2,500 BCC biopsies with subtype labels, while the second group includes 2,500 non-BCC biopsies with different types of skin dysplasia. The third group has 1,000 labelled risky cancer biopsies, includin...

Usage examples

See 3 usage examples →

COVID-19 Harmonized Data

coronavirus, COVID-19, life sciences

A harmonized collection of the core data pertaining to COVID-19 reported cases by geography, in a format prepared for analysis.

Usage examples

See 3 usage examples →

CanElevation - LiDAR Point Clouds

elevation, floods, geospatial, land, lidar, urban

The LiDAR Point Clouds product is part of the CanElevation Series created to support the National Elevation Data Strategy implemented by NRCan. This product contains point clouds from various airborne LiDAR acquisition projects conducted in Canada. These airborne LiDAR acquisition projects may have been conducted by NRCan or by various partners. The LiDAR point cloud data is licensed under an open government license and has been incorporated into the National Elevation Data Strategy. Point cloud files are distributed by LiDAR acquisition project without integration between projects. The point cloud files are distributed using the compressed .LAZ / Cloud Optimized Point Cloud (COPC) format. The COPC open format is an octree reorganization of the data inside a .LAZ 1.4 file. It allows efficient use and visualization rendering...

Usage examples

See 3 usage examples →

Canopy Tree Height Map for the Amazon Forest (mean height composite 2020-2024) by CTrees.org

cog, conservation, deep learning, earth observation, environmental, geospatial, image processing, land cover, lidar, satellite imagery

Mean canopy tree height for the Amazon Forest over the period 2020-2024 at 4.78 m spatial resolution. Created using a deep learning model on high-resolution Planet imagery from Norway's International Climate and Forest Initiative (NICFI) Satellite Data Program. From the original research paper https://doi.org/10.48550/arXiv.2501.10600

Usage examples

Is this the largest tree in the Amazon? A Q&A with CTrees scientist Fabien Wagner by Rachel Kovinsky
How to download the CTrees Amazon Canopy Height Map by Fabien H Wagner
High Resolution Tree Height Mapping of the Amazon Forest using Planet NICFI Images and LiDAR-Informed U-Net Model by Fabien H Wagner, Ricardo Dalagnol, Griffin Carter, Mayumi CM Hirye, Shivraj Gill, Le Bienfaiteur Sagang Takougoum, Samuel Favrichon, Michael Keller, Jean PHB Ometto, Lorena Alves, Cynthia Creze, Stephanie P George-Chacon, Shuang Li, Zhihua Liu, Adugna Mullissa, Yan Yang, Erone G Santos, Sarah R Worden, Martin Brandt, Philippe Ciais, Stephen C Hagen, Sassan Saatchi

See 3 usage examples →

Cell Organelle Segmentation in Electron Microscopy (COSEM) on AWS

cell biology, computer vision, electron microscopy, imaging, life sciences, organelle

High resolution images of subcellular structures.

Usage examples

Whole-cell organelle segmentation in volume electron microscopy by Lisa Heinrich, Davis Bennett, David Ackerman, Woohyun Park, Jon Bogovic, Nils Eckstein, et al.
Enhanced FIB-SEM systems for large-volume 3D imaging by C. Shan Xu, Kenneth J. Hayworth, Zhiyuan Lu, Patricia Grob, Ahmed M. Hassan, José G. García-Cerdán, Krishna K. Niyogi, Eva Nogales, Richard J. Weinberg, Harald F. Hess.
Correlative three-dimensional super-resolution and block-face electron microscopy of whole vitreously frozen cells by David P. Hoffman, Gleb Shtengel, C. Shan Xu, Kirby R. Campbell, Melanie Freeman, Lei Wang, Daniel E. Milkie, H. Amalia Pasolli, Nirmala Iyer, John A. Bogovic, Daniel R. Stabley, Abbas Shirinifard, Song Pang, David Peale, Kathy Schaefer, Wim Pomp, Chi-Lun Chang, Jennifer Lippincott-Schwartz, Tom Kirchhausen, David J. Solecki, Eric Betzig, Harald F. Hess

See 3 usage examples →

CitrusFarm Dataset

agriculture, computer vision, IMU, lidar, localization, mapping, robotics

CitrusFarm is a multimodal agricultural robotics dataset that provides both multispectral images and navigational sensor data for localization, mapping and crop monitoring tasks.

It was collected by a wheeled mobile robot in the Agricultural Experimental Station at the University of California, Riverside, in the summer of 2023.
It offers a total of nine sensing modalities, including stereo RGB, depth, monochrome, near-infrared and thermal images, as well as wheel odometry, LiDAR, IMU and GPS-RTK data.
It comprises seven sequences collected from three citrus tree fields, featuring various tree spe

...

Usage examples

Python scripts used in the data collection and post-processing by Hanzhe Teng et al.
Python script to download this dataset by Hanzhe Teng et al.
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms by Hanzhe Teng, Yipeng Wang, Xiaoao Song and Konstantinos Karydis

See 3 usage examples →

Clinical Trial Sequencing Project - Diffuse Large B-Cell Lymphoma

cancer, genomic, life sciences, STRIDES, transcriptomics, whole genome sequencing

The goal of the project is to identify recurrent genetic alterations (mutations, deletions, amplifications, rearrangements) and/or gene expression signatures. The National Cancer Institute (NCI) used whole genome sequencing and/or whole exome sequencing in conjunction with transcriptome sequencing. The samples were processed and submitted for genomic characterization using pipelines and procedures established within The Cancer Genome Atlas (TCGA) project.

Usage examples

Genetics and Pathogenesis of Diffuse Large B Cell Lymphoma by Roland Schmitz, Ph.D., George W. Wright, Ph.D., Da Wei Huang, M.D., Calvin A. Johnson, Ph.D., James D. Phelan, Ph.D., James Q. Wang, Ph.D., Sandrine Roulland, Ph.D., Monica Kasbekar, Ph.D., Ryan M. Young, Ph.D., Arthur L. Shaffer, Ph.D., Daniel J. Hodson, M.D., Ph.D., Wenming Xiao, Ph.D., et al.
A multiprotein supercomplex controlling oncogenic signalling in lymphoma by Phelan JD, Young RM, Webster DE, Roulland S, Wright GW, Kasbekar M, Shaffer AL 3rd, Ceribelli M, Wang JQ, Schmitz R, Nakagawa M, Bachy E, Huang DW, Ji Y, Chen L, Yang Y, Zhao H, Yu X, Xu W, Palisoc MM, Valadez RR, Davies-Hill T, Wilson WH, Chan WC, Jaffe ES, Gascoyne RD, Campo E, Rosenwald A, Ott G, Delabie J, Rimsza LM, Rodriguez FJ, Estephan F, Holdhoff M, Kruhlak MJ, Hewitt SM, Thomas CJ, Pittaluga S, Oellerich T, Staudt LM
Genomic Data Commons by National Cancer Institute

See 3 usage examples →

DeepDrug Protein Embeddings Bank (DPEB)

bioinformatics, life sciences, machine learning, protein, structural biology

DPEB is a multimodal database of human protein embeddings integrating four biologically complementary representations—AlphaFold2, BioEmbeddings, ESM-2, and ProtVec—designed for enhanced protein-protein interaction prediction and functional classification.

Usage examples

See 3 usage examples →

Department of Energy’s Geothermal Data Repository (GDR) Data Lake

energy, geothermal

Data released from projects funded by the Department of Energy's Geothermal Technologies Office (DOE GTO) that are too large or complex to be conveniently accessed by traditional means. The GDR data lake aims to improve and automate access to high-value geothermal data sets, making data actionable and discoverable by researchers and industry to accelerate analysis and advance innovation. This data lake is a sister data lake to the Department of Energy’s Open Energy Data Initiative (OEDI) Data Lake.

Usage examples

The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications by J. Ajo-Franklin, V. Rodrigues Tribaldos, A. Nayak, et al.
Dark Fiber Jupyter Notebook Tutorial by A. Nayak, A. Lowney
The Geothermal Data Repository: Ten Years of Supporting the Geothermal Industry with Open Access to Geothermal Data by J. Weers, A. Anderson, N. Taverna

See 3 usage examples →

EURO-CORDEX - European component of the Coordinated Regional Downscaling Experiment

atmosphere, climate, climate model, geospatial, model, zarr

The EURO-CORDEX dataset contains regional climate model data for Europe, for use in impacts, decision-making, and climate science. Currently, the bucket contains monthly datasets of 2m air temperature downscaled from CMIP5 global model datasets using different regional climate models.

Usage examples

Beyond ESGF - Regional climate model datasets in the cloud on AWS S3 by Lars Buntemeyer
Intake-ESM Catalog by Lars Buntemeyer
Jupyter Book by Lars Buntemeyer

See 3 usage examples →

Exceptional Responders Initiative

cancer, epigenomics, genomic, life sciences, STRIDES, transcriptomics, whole exome sequencing, whole genome sequencing

The Exceptional Responders Initiative is a pilot study to investigate the underlying molecular factors driving exceptional treatment responses of cancer patients to drug therapies. Study researchers will examine molecular profiles of tumors from patients either enrolled in a clinical trial for an investigational drug(s) and who achieved an exceptional response relative to other trial participants, or who achieved an exceptional response to a non-investigational chemotherapy. An exceptional response is defined as achievement of either a complete response or a partial response for at least 6 mon...

Usage examples

Genomic Data Commons by National Cancer Institute
GDC Legacy Archive by National Cancer Institute
The Exceptional Responders Initiative: Feasibility of a National Cancer Institute Pilot Study by Barbara A. Conley, Lou Staudt, et al.

See 3 usage examples →

Finnish Meteorological Institute Weather Radar Data

agriculture, earth observation, meteorological, weather

The up-to-date weather radar data from the FMI radar network is available as Open Data. The data contain both single-radar data and composites over Finland in GeoTIFF and HDF5 formats. Available composite parameters consist of radar reflectivity (DBZ), rainfall intensity (RR), and precipitation accumulation of 1, 12, and 24 hours. Single-radar parameters consist of radar reflectivity (DBZ), radial velocity (VRAD), rain classification (HCLASS), and cloud top height (ETOP 20). Raw volume data from single radars are also provided in HDF5 format with ODIM 2.3 conventions. Radar data becomes avail...

Usage examples

Processing HDF5 data with python by Roope Tervo
Handling data with QGIS by Markus Peura
Processing GeoTIFF data with python by Roope Tervo

See 3 usage examples →

Foundation Medicine Adult Cancer Clinical Dataset (FM-AD)

cancer, genomic, life sciences

The Foundation Medicine Adult Cancer Clinical Dataset (FM-AD) is a study conducted by Foundation Medicine Inc (FMI). Genomic profiling data for approximately 18,000 adult patients with a diverse array of cancers was generated using FoundationOne, FMI's commercially available, comprehensive genomic profiling assay. This dataset contains open Clinical and Biospecimen data.

Usage examples

High-Throughput Genomic Profiling of Adult Solid Tumors Reveals Novel Insights into Cancer Pathogenesis by Ryan J. Hartmaier, Lee A. Albacker, Juliann Chmielecki, Mark Bailey, Jie He, Michael E. Goldberg, Shakti Ramkissoon, James Suh, Julia A. Elvin, Samuel Chiacchia, Garrett M. Frampton, Jeffrey S. Ross, Vincent Miller, Philip J. Stephens and Doron Lipson
Genomic Data Commons by National Cancer Institute
Targeted next-generation sequencing of advanced prostate cancer identifies potential therapeutic targets and disease heterogeneity by Beltran H, Yelensky R, Frampton GM, Park K, Downing SR, MacDonald TY, Jarosz M, Lipson D, Tagawa ST, Nanus DM, Stephens PJ, Mosquera JM, Cronin MT, Rubin MA

See 3 usage examples →

GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1)

datacenter, earth observation, global, ice, metadata, oceans, parquet, us, water

A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave S...

Usage examples

Using Sea Surface Temperature and Sea Surface Height Data for Hurricane Helene by Julie Sanchez
MUR Sea Surface Temperature Analysis of Washington State by Zoë Walschots
A multi-scale high-resolution analysis of global sea surface temperature by Chin, T.M, J. Vazquez-Cuervo, and E.M. Armstrong

See 3 usage examples →

Global Cache of Japan

atmosphere, climate, climate model, climate projections, climate risk, earth observation, forecast, hydrology, meteorological, oceans, radar, satellite imagery, space weather, weather

Global real-time Earth system data deemed by the World Meteorological Organisation (WMO) as essential for provision of services for the protection of life and property and for the well-being of all nations. Data is sourced from all WMO Member countries / territories and retained for 24 hours. JMA operates this Global Cache service, curating and publishing the dataset on behalf of WMO.

Usage examples

WIS 2.0 video for 19th World Meteorological Congress by WMO Secretariat
Guide to the WMO Information System (WMO-No. 1061), Volume II, WMO Information System 2.0 by World Meteorological Organisation
Manual on the WMO Information System (WMO-No. 1060), Volume II, WMO Information System 2.0 by World Meteorological Organisation

See 3 usage examples →

Golden Retriever Lifetime Study: Whole genome genotyping of Golden Retrievers on Axiom HD Arrays

genome, genotyping, golden retriever lifetime study, life sciences, morris animal foundation

Morris Animal Foundation’s Golden Retriever Lifetime Study is a longitudinal, prospective study following 3044 golden retrievers. The Study’s purpose is to identify the nutritional, environmental, lifestyle and genetic risk factors for cancer and other diseases. The Golden Oldie’s study enrolled an additional cohort of golden retrievers that had reached the age of 12 years or older and had not yet been diagnosed with a malignant cancer. This population can be used as a control group for conditions with high mortality in younger age. This dataset contains the data for ~1.1 million genetic marke...

Usage examples

The Golden Retriever Lifetime Study: establishing an observational cohort study with translational relevance for human health by Michael K. Guy, Rodney L. Page, Wayne A. Jensen, Patricia N. Olson, J. David Haworth, Erin E. Searfoss, and Diane E. Brown
GRLS GWAS Tutorial by Tamer Mansour
Cohort profile: The Golden Retriever Lifetime Study (GRLS) by Julia Labadie, Brenna Swafford, Mara DePena, Kathy Tietje, Rodney Page, Janet Patterson-Kane

See 3 usage examples →

Human and Mammalian Brain Atlas

biology, gene expression, Homo sapiens, life sciences, Mus musculus, neurobiology, non-human primate, single-cell transcriptomics

Human and Mammalian Brain Atlas (HMBA) is a major atlas of the BRAIN Initiative Cell Atlas Network (BICAN) that proposes to establish a comprehensive, highly granular cell atlas in complete adult human, macaque, and marmoset brains that links brain structure, function and cellular architecture. Release artifacts have been made available in this OpenData bucket to enable utilization along with their paper publications by the neuroscience community.

Usage examples

See 3 usage examples →

I-CARE: International Cardiac Arrest REsearch consortium Electroencephalography Database

bioinformatics, deep learning, life sciences, machine learning, medicine, neurophysiology, neuroscience

The International Cardiac Arrest REsearch consortium (I-CARE) Database includes baseline clinical information and continuous electroencephalography (EEG) recordings from 1,020 comatose patients with a diagnosis of cardiac arrest who were admitted to an intensive care unit at seven academic hospitals in the U.S. and Europe. Patients were monitored with 18 bipolar EEG channels over hours to days for the diagnosis of seizures and for neurological prognostication. Long-term neurological function was determined using the Cerebral Performance Category scale.

Usage examples

The International Cardiac Arrest Research (I-CARE) Consortium Electroencephalography Database by Amorim E, Zheng WL, Ghassemi MM, Aghaeeaval M, Kandhare P, Karukonda V, et al.
WFDB Software Package by Moody, G., Pollard, T., & Moody, B.
I-CARE: International Cardiac Arrest REsearch consortium Electroencephalography Database by Amorim E, Zheng WL, Ghassemi MM, Aghaeeaval M, Kandhare P, Karukonda V, et al.

See 3 usage examples →

Imaging MIT Licensed data and models

biodiversity, Biohub, bioinformatics, biology, biomolecular modeling, brain images, cell biology, cell imaging, imaging, life sciences, machine learning, microscopy, model, protein, zarr

This dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at Biohub and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Usage examples

Documentation for CELL-Diff by Biohub
Quickstart Tutorial for SubCell by Biohub
Documentation for SubCell by Biohub
SubCell: Vision foundation models for microscopy capture single-cell biology by Ankit Gupta, Zoe Wefers, Konstantin Kahnert, Jan N Hansen, William D. Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera Navarro, Theofanis Karaletsos, Emma Lundberg
CELL-Diff: Unified diffusion modeling for protein sequences and microscopy images by Zheng Dihan, Bo Huang

See 6 usage examples →

Indiana Statewide Digital Aerial Imagery Catalog

aerial imagery, agriculture, cog, earth observation, geospatial, imaging, mapping, natural resource, sustainability

The State of Indiana Geographic Information Office and IOT Office of Technology manage a series of digital orthophotography dating back to 2005. Every year's worth of imagery is available as Cloud Optimized GeoTIFF (COG) files, original GeoTIFF, and other compressed deliverables such as ECW and MrSID. Additionally, each imagery year is organized into a tile grid scheme covering the entire geography of Indiana. All years of imagery are tiled from a 5,000 ft grid or sub tiles depending upon the resolution of the imagery. The naming of the tiles reflects the lower left coordinate from the...

Usage examples

ArcGIS Online Indiana Orthoimagery Viewer by Indiana Geographic Information Office (IGIO)
Recording of 2025 - 2028 Indiana Orthoimagery Program Presentation by Indiana Geographic Information Office (IGIO)
IGIO Imagery Opendata S3 Browser by Indiana Geographic Information Office (IGIO)

See 3 usage examples →

Indiana Statewide Elevation Catalog

agriculture, earth observation, geospatial, imaging, lidar, mapping, natural resource, sustainability

The State of Indiana Geographic Information Office and IOT Office of Technology manage a series of digital LiDAR LAS files stored in AWS, dating back to the 2011-2013 collection and including the NRCS-funded 2016-2020 collection. These LiDAR datasets are available as uncompressed LAS files, for cloud storage and access. Each year's data is organized into a tile grid scheme covering the entire geography of Indiana, ensuring easy access and efficient processing. The tiles' naming reflects each tile's lower left coordinate, facilitating accurate data management and retrieval. The AWS ...

Usage examples

ArcGIS Online Indiana Lidar Viewer by Indiana Geographic Information Office (IGIO)
Recording of 2025 - 2028 Indiana Imagery and Elevation Program Presentation by Indiana Geographic Information Office (IGIO)
IGIO Elevation Opendata S3 Browser by Indiana Geographic Information Office (IGIO)

See 3 usage examples →

Japanese Tokenizer Dictionaries

csv, japanese, natural language processing

Japanese Tokenizer Dictionaries for use with MeCab.

Usage examples

See 3 usage examples →

Kraken2 NCBI RefSeq Complete V205 database on AWS

benchmark, bioinformatics, life sciences, metagenomics, microbiome

Database for use with Kraken2 (taxonomic annotation of metagenomic sequencing reads), including all NCBI RefSeq genomes available in release V205.

Usage examples

Using an Amazon Machine Image for analysing samples with Kraken2 by Robyn Wright
From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools by Robyn J. Wright, Andre M. Comeau and Morgan G.I. Langille
Kraken2 by Derrick Wood, Jennifer Lu and Ben Langmead

See 3 usage examples →

MIMIC-III (‘Medical Information Mart for Intensive Care’)

bioinformatics, health, life sciences, natural language processing, us

MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework. The MIMIC-I...

Usage examples

Perform biomedical informatics without a database using MIMIC-III data and Amazon Athena by James Wiggins, Alistair Johnson
Building predictive disease models using Amazon SageMaker with Amazon HealthLake normalized data by Ujjwal Ratan, Nihir Chadderwala, and Parminder Bhatia
MIMIC-code GitHub repository by Alistair Johnson

See 3 usage examples →

Medical Segmentation Decathlon

computed tomography, health, imaging, life sciences, magnetic resonance imaging, medicine, nifti, segmentation

With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of medical imaging are commonly validated on a small number of tasks, limiting our understanding of the generalisability of the proposed contributions. A model which works out-of-the-box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging is also missing a fully open source and comprehensive benchmark for general purpose algorithmic validati...

Usage examples

A large annotated medical image dataset for the development and evaluation of segmentation algorithms by Simpson A. L., Antonelli M., Bakas S., Bilello M., Farahana K., van Ginneken B., et al
MONAI: Getting Started by MONAI Development Team
Pytorch-Integrated MSD Data Loader by MONAI Development Team

See 3 usage examples →

Met Office Global Deterministic 10km on a 2-year rolling archive

air temperature, atmosphere, forecast, geoscience, geospatial, model, near-surface air temperature, near-surface relative humidity, netcdf, weather

THIS DATASET IS CHANGING

Files uploaded from late January 2026 onward will contain changes including:

precision changes
new parameters
changes to existing parameters e.g. adding vertical levels and timesteps
the height_asl_on_pressure_levels parameter will be replaced by geopotential_height_on_pressure_levels

Please check your systems are prepared for these changes.

A numerical weather prediction forecast for the whole globe, with a resolution of approximately 0.09 degrees i.e. 10km (2,560 x 1,920 grid points). The data is available as NetCDF files. It's offered on a free, unsupported basis, so we don't recommend using it ...
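The quoted resolution can be sanity-checked from the grid dimensions. The sketch below assumes, purely for illustration, that the 2,560 x 1,920 points evenly span 360 degrees of longitude and 180 degrees of latitude, and it uses a round figure of about 111 km per degree of latitude; neither assumption is taken from the dataset documentation.

```python
N_LON, N_LAT = 2560, 1920      # grid points quoted in the description
KM_PER_DEGREE_LAT = 111.0      # approximate; an assumption for this sketch


def grid_spacing_degrees(n_lon=N_LON, n_lat=N_LAT):
    """Return (longitude, latitude) spacing in degrees for an even global grid."""
    return 360.0 / n_lon, 180.0 / n_lat


dlon, dlat = grid_spacing_degrees()
print(f"longitude spacing: {dlon:.5f} degrees")
print(f"latitude spacing:  {dlat:.5f} degrees")
print(f"latitude spacing:  {dlat * KM_PER_DEGREE_LAT:.1f} km")
```

Under these assumptions the latitude spacing comes out near 0.09 degrees, or roughly 10 km, which agrees with the description; the longitude spacing in degrees is larger.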

Usage examples

See 3 usage examples →

Met Office Global Ocean model on a 2-year rolling archive

forecast, geoscience, geospatial, global, marine, model, netcdf, ocean sea surface height, oceans, weather

The Global Ocean component of the Met Office Global Coupled Atmosphere-Land-Ocean-Ice system which has been running in operations since May 2022. The system provides a global physical analysis and coupled forecast products providing 3D daily mean fields of temperature and salinity, zonal and meridional velocities; 2D daily mean fields of sea surface height, bottom temperature, mixed layer depth, sea ice fraction, sea ice thickness and sea ice zonal and meridional velocities; and instantaneous hourly fields for sea surface height, sea surface temperature and surface currents. The Met Office Glo...

Usage examples

Iris by Iris Contributors
Coupled forecasting development by Met Office
Ocean models by Met Office

See 3 usage examples →

Met Office Global Wave model on a 2-year rolling archive

forecast, geoscience, geospatial, global, marine, model, netcdf, ocean sea surface height, oceans, weather

The Met Office runs global wave forecast models to support marine safety and operational decision making. Met Office configurations are developed to be run using the community wave model WAVEWATCH III™. The global wave configuration is designed to generate accurate forecasts for open waters of the world’s oceans and larger seas. The Met Office wave models are forced using wind data from the Met Office Global Atmospheric Hi-Res Model. The global wave model is run to provide a five-day outlook for wave characteristics defining height, period and direction of waves within a given sea-state. The ...

Usage examples

Ocean models by Met Office
Iris by Iris Contributors
Coupled forecasting development by Met Office

See 3 usage examples →

Met Office NWS Ocean model on a 2-year rolling archive

forecast, geoscience, geospatial, marine, model, netcdf, ocean sea surface height, oceans, weather

The Northwest European continental shelf physical ocean model predicts temperature, salinity and circulation for waters surrounding the UK. Ocean physics analysis provides a 6-day forecast for the North-West European Atlantic shelf at 1.5km resolution:

33 depth levels
Currents
Salinity
Temperature
Mixing Layer Thickness

Usage examples

Ocean models by Met Office
Coupled forecasting development by Met Office
Iris by Iris Contributors

See 3 usage examples →

Met Office NWS Wave model on a 2-year rolling archive

forecast, geoscience, geospatial, marine, model, netcdf, ocean sea surface height, oceans, weather

Northwest European continental shelf regional wave model predicting sea-state and various sea and swell wave characteristics for waters surrounding the UK. The Met Office runs global and regional wave forecast models to support marine safety and operational decision making. Met Office configurations are developed to be run using the community wave model WAVEWATCH III™. The global wave configuration is designed to generate accurate forecasts for open waters of the world's oceans and larger seas, whilst regional configurations are run in order to improve accuracy closer to the coast. The Met...

Usage examples

Iris by Iris Contributors
Coupled forecasting development by Met Office
Ocean models by Met Office

See 3 usage examples →

Met Office UK Deterministic (UKV) 2km on a 2-year rolling archive

air temperature, atmosphere, forecast, geoscience, geospatial, model, near-surface air temperature, near-surface relative humidity, netcdf, weather

THIS DATASET IS CHANGING

Files uploaded from late January 2026 onward will contain changes including:

precision changes
new parameters
changes to existing parameters e.g. adding vertical levels and timesteps
the height_asl_on_pressure_levels parameter will be replaced by geopotential_height_on_pressure_levels

Please check your systems are prepared for these changes.
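One way to prepare for the parameter rename listed above is to look up whichever name is present rather than hard-coding one. The sketch below is a plain-Python illustration; the function name and the example sets of available names are hypothetical, and only the two height parameter names come from the notice.

```python
OLD_NAME = "height_asl_on_pressure_levels"
NEW_NAME = "geopotential_height_on_pressure_levels"


def resolve_height_parameter(available_names):
    """Pick the height parameter name present in a collection of names.

    Prefers the new name, falls back to the old one, and raises KeyError
    if neither is available.
    """
    for name in (NEW_NAME, OLD_NAME):
        if name in available_names:
            return name
    raise KeyError(f"neither {NEW_NAME!r} nor {OLD_NAME!r} found")


# Hypothetical listings of parameter names before and after the change.
before = {"temperature_at_surface", OLD_NAME}
after = {"temperature_at_surface", NEW_NAME}
print(resolve_height_parameter(before))
print(resolve_height_parameter(after))
```

Because the archive is rolling, older and newer files may coexist for some time, so a lookup of this kind may be more robust than switching names on a fixed date.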

A high-resolution gridded weather forecast for the UK, with a resolution of 0.018 degrees, projected on to a 2km horizontal grid. The data is available as NetCDF files. It's offered on a free, unsupported basis, so we don’t recommend using it for any critical business purpos...

Usage examples

See 3 usage examples →

Multiview Extended Video with Activities (MEVA)

computer vision, urban, us, video

The Multiview Extended Video with Activities (MEVA) dataset consists of video data of human activity, both scripted and unscripted, collected with roughly 100 actors over several weeks. The data was collected with 29 cameras with overlapping and non-overlapping fields of view. The current release consists of about 328 hours (516GB, 4259 clips) of video data, as well as 4.6 hours (26GB) of UAV data. Other data includes GPS tracks of actors, camera models, and a site map. We have also released annotations for roughly 184 hours of data. Further updates are planned.

Usage examples

ActEV: Activities in Extended Video by National Institute of Standards and Technology (NIST)
MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection by Kellie Corona, Katie Osterdahl, Roderic Collins, Anthony Hoogs
TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos by Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

See 3 usage examples →

NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6)

air temperature, climate, climate model, climate projections, CMIP6, cog, earth observation, environmental, global, model, NASA Center for Climate Simulation (NCCS), near-surface relative humidity, near-surface specific humidity, netcdf, precipitation

The NEX-GDDP-CMIP6 dataset is comprised of global downscaled climate scenarios derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 6 (CMIP6) and across two of the four "Tier 1" greenhouse gas emissions scenarios known as Shared Socioeconomic Pathways (SSPs). The CMIP6 GCM runs were developed in support of the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6). This dataset includes downscaled projections from ScenarioMIP model runs for which daily scenarios were produced and distributed...

Usage examples

NASA and ASDI announce no-cost access to important climate dataset on the AWS Cloud by Dr. Manil Maskey and Ana Pinheiro Privette
NASA Global Daily Downscaled Projections, CMIP6 by Thrasher, B., Wang, W., Michaelis, A., Melton, F., Lee, T. and Nemani, R.
NEX-GDDP-CMIP6 Dashboard by NASA

See 3 usage examples →

NASA High Energy Astrophysics Mission Data

archives, astronomy, datacenter, imaging, satellite imagery, x-ray

NASA data for high energy astrophysics (generally x-ray and gamma-ray domains) is made available here by the High Energy Astrophysics Science Archive Research Center. The HEASARC hosts the full data archives of over 30 different missions spanning 50 years. The data archive for each mission will contain a range of data types from spacecraft housekeeping and raw photon event list data up to high level science-ready products such as images, light curves (time series), and energy spectra.

This is a relatively modest total data volume but contains significant complexity and heterogeneity among the different missions. Data provided here are stored in the Flexible Imag...

Usage examples

See 3 usage examples →

NASA Legacy Archive for Microwave Background Data Analysis (LAMBDA)

archives, astronomy, datacenter, imaging, satellite imagery

NASA data for cosmic microwave background (CMB) analysis is made available here by the Legacy Archive for Microwave Background Data Analysis (LAMBDA), which is a part of NASA's High Energy Astrophysics Science Archive Research Center (HEASARC). LAMBDA hosts the data archives of over 30 different CMB missions spanning 30+ years. The data archive for each mission may contain a range of data types from low-level time-ordered data to high level science-ready products such as sky maps and angular power spectra. Also provided in consistent formats are a variety of full sky maps in complementary ...

Usage examples

See 3 usage examples →

NASA SOHO/LASCO2 comet challenge on AWS

astronomy, machine learning, NASA SMD AI

The SOHO/LASCO data set (prepared for the challenge hosted in Topcoder) provided here comes from the instrument’s C2 telescope and comprises approximately 36,000 images spread across 2,950 comet observations. The human eye is a very sensitive tool and it is the only tool currently used to reliably detect new comets in SOHO data - particularly comets that are very faint and embedded in the instrument background noise. Bright comets can be easily detected in the LASCO data by relatively simple automated algorithms, but the majority of comets observed by the instrument are extremely faint, noise-...

Usage examples

Topcoder NASA Comet Discovery: A Recap by TopCoder
Winners Selected for the NASA SOHO Comet Search with Artificial Intelligence Open-Science Challenge by Denise Hill
Topcoder Challenge Finds Two New Comets For NASA by Annika Nagy

See 3 usage examples →

NASA Space Biology Open Science Data Repository (OSDR)

bioinformatics, biology, GeneLab, genomic, imaging, life sciences, space biology

NASA’s Space Biology Open Science Data Repository (OSDR) introduces a one-stop site where users can explore and contribute a variety of NASA open science biological data. This site consolidates data from the Ames Life Sciences Data Archive (ALSDA) and GeneLab and includes information about the broader NASA Open Science and Open Data initiatives, all at one centralized location. Our mission is to maximize the utilization of the valuable biological research resources and enable new discoveries.

OSDR introduces access to data generated from spaceflight and space relevant experiments that explore ...

Usage examples

Advancing the Integration of Biosciences Data Sharing to Further Enable Space Exploration by Ryan T. Scott, Kirill Grigorev, Graham Mackintosh, Samrawit G. Gebre, Christopher E. Mason, Martha E. Del Alto, Sylvain V. Costes
NASA GeneLab: interfaces for the exploration of space omics data by Daniel C Berrios, Jonathan Galazka, Kirill Grigorev, Samrawit Gebre, Sylvain V Costes
GeneLab: Omics database for spaceflight experiments by Shayoni Ray, Samrawit Gebre, Homer Fogle, Daniel C Berrios, Peter B Tran, Jonathan M Galazka, Sylvain V Costes

See 3 usage examples →

NEXRAD ARCO - Analysis-Ready Cloud-Optimized Weather Radar

earth observation, meteorological, radar, weather, zarr

NEXRAD Level II weather radar data converted to FAIR-compliant, analysis-ready cloud-optimized (ARCO) format using Zarr v3 and Icechunk V2. Hierarchically organized by Volume Coverage Pattern (VCP) and sweep, enabling instant time-series access to polarimetric variables (DBZH, ZDR, RHOHV, PHIDP, VELOCITY) without downloading individual files. Currently includes KLOT (Chicago, IL) with continuous updates.

Usage examples

Radar DataTree: A FAIR and Cloud-Native Framework for Scalable Weather Radar Archives by Alfonso Ladino-Rincon and Stephen W. Nesbitt
Get To Know A Dataset: NEXRAD-ARCO by Alfonso Ladino-Rincon
radar-datatree - Examples and tutorials for accessing ARCO weather radar data by Alfonso Ladino-Rincon

See 3 usage examples →

NIFS Large Helical Device (LHD) Experiment

analytics, anomaly detection, archives, computed tomography, datacenter, digital assets, electricity, energy, fluid dynamics, image processing, physics, post-processing, radiation, signal processing, source code, turbulence, video, x-ray, x-ray tomography

The Large Helical Device (LHD), owned and operated by the National Institute for Fusion Science (NIFS), is one of the world's largest plasma confinement devices and employs a heliotron magnetic configuration generated by superconducting coils. The objectives are to conduct academic research on the confinement of steady-state, high-temperature, high-density plasmas, core plasma physics, and fusion reactor engineering, which are necessary to develop future fusion reactors. All the archived data of the LHD plasma diagnostics are available since the beginning of the LHD experiment, starte...

Usage examples

See 3 usage examples →

NOAA - hourly position, current, and sea surface temperature from drifters

climate, environmental, meteorological, oceans, sustainability, weather

This dataset includes hourly sea surface temperature and current data collected by satellite-tracked surface drifting buoys ("drifters") of the NOAA Global Drifter Program. The Drifter Data Assembly Center (DAC) at NOAA’s Atlantic Oceanographic and Meteorological Laboratory (AOML) has applied quality control procedures and processing to edit these observational data and obtain estimates at regular hourly intervals. The data include positions (latitude and longitude), sea surface temperatures (total, diurnal, and non-diurnal components) and velocities (eastward, northward) with accompanying uncertainty estimates. Metadata include identification numbe...

Usage examples

A global surface drifter dataset at hourly resolution (2016) by Elipot, S., R. Lumpkin, R. C. Perez, J. M. Lilly, J. J. Early, and A. M. Sykulski
A Dataset of Hourly Sea Surface Temperature From Drifting Buoys (2022) by Elipot, S., A. Sykulski, R. Lumpkin, L. Centurioni, and M. Pazos
Working with GDP hourly data using python and xarray, a CloudDrift notebook by Shane Elipot

See 3 usage examples →

NOAA Emergency Response Imagery

aerial imagery, climate, cog, disaster response, weather

In order to support NOAA's homeland security and emergency response requirements, the National Geodetic Survey Remote Sensing Division (NGS/RSD) has the capability to acquire and rapidly disseminate a variety of spatially-referenced datasets to federal, state, and local government agencies, as well as the general public. Remote sensing technologies used for these projects have included lidar, high-resolution digital cameras, a film-based RC-30 aerial camera system, and hyperspectral imagers. Examples of rapid response initiatives include acquiring high resolution images with the Emerge/App...

Usage examples

Using Emergency and Pre-Event Imagery by Jon Sellars
Open data helps recovery in the aftermath of devastating weather events by Jena Kent
ERI notebook using SageMaker Studio Lab (SMSL) by Mya Sears

See 3 usage examples →

NOAA GFS - dynamical.org Icechunk Zarr

atmosphere, climate, forecast, meteorological, weather, zarr

The Global Forecast System (GFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model that generates data for dozens of atmospheric and land-soil variables, including temperatures, winds, precipitation, soil moisture, and atmospheric ozone concentration. The system couples four separate models (atmosphere, ocean model, land/soil model, and sea ice) that work together to depict weather conditions.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

NOAA GFS analysis - Weather analysis from the Global Forecast System (GFS) oper

...

Usage examples

See 3 usage examples →

NOAA Global Ensemble Forecast System (GEFS) Re-forecast

agriculture, climate, meteorological, weather

NOAA has generated a multi-decadal reanalysis and reforecast data set to accompany the next-generation version of its ensemble prediction system, the Global Ensemble Forecast System, version 12 (GEFSv12). Accompanying the real-time forecasts are “reforecasts” of the weather, that is, retrospective forecasts spanning the period 2000-2019. These reforecasts are not as numerous as the real-time data; they were generated only once per day, from 00 UTC initial conditions, and only 5 members were provided, with the following exception. Once weekly, an 11-member reforecast was generated, and these ex...

Usage examples

See 3 usage examples →

NOAA Multi-Radar/Multi-Sensor System (MRMS)

agriculture, climate, meteorological, weather

The MRMS system was developed to produce severe weather, transportation, and precipitation products that support better decision-making and improve hazardous weather forecasts and warnings, along with hydrology, aviation, and numerical weather prediction.

MRMS is a system with fully-automated algorithms that quickly and intelligently integrate data streams from multiple radars, surface and upper air observations, lightning detection systems, satellite observations, and forecast models. Numerous two-dimensional multiple-sensor products offer assistance for hail, wind, tornado, quantitative precipitation estimations, c...

Usage examples

Multi-Radar Multi-Sensor (MRMS) Severe Weather and Aviation Products: Initial Operating Capabilities by Travis M. Smith, Valliappa Lakshmanan, Gregory J. Stumpf, Kiel L. Ortega, Kurt Hondl, Karen Cooper, Kristin M. Calhoun, Darrel M. Kingfield, Kevin L. Manross, Robert Toomey, Jeff Brogden
Collection of Jupyter Notebooks using Python for working with MRMS Data by Project Pythia Community
Multi-Radar Multi-Sensor (MRMS) Quantitative Precipitation Estimation: Initial Operating Capabilities by Jian Zhang, Kenneth Howard, Carrie Langston, Brian Kaney, Youcun Qi, Lin Tang, Heather Grams, Yadong Wang, Stephen Cocks, Steven Martinaitis, Ami Arthur, Karen Cooper, Jeff Brogden, David Kitzmiller

See 3 usage examples →

NOAA North American Multi-Model Ensemble (NMME)

climate, meteorological, weather

The North American Multi-Model Ensemble (NMME) is an experimental multi-model seasonal forecasting system consisting of coupled models from US modeling centers including NOAA/NCEP, NOAA/GFDL, NCAR, NASA, and Canada's ECCC.

The development of an NMME operational predictive capability was recommended in the US National Academies report "Assessment of Intraseasonal to Interannual Climate Prediction and Predictability". Indeed, a national effort is required to meet the specific tailored regional prediction and decision support needs of a large community. The multi-model ens...

Usage examples

NMME Articles by Multiple publications listed in the provided link
Real-time Usage by NWS Climate Prediction Center NMME
NMME: Meeting Future Needs Workshop Report by Christine Bassett, Jessie Carman, D.K. Kang, and Mark Olsen

See 3 usage examples →

NapierOne Mixed File Dataset

computer forensics, computer security, cyber security, digital forensics, malware, mixed file dataset, ransomware

NapierOne is a modern cybersecurity mixed file data set, primarily aimed at, but not limited to, ransomware detection and forensic analysis. The dataset contains over 500,000 distinct files, representing 44 distinct popular file types. It was designed to address the known deficiency in research reproducibility and improve consistency by facilitating research replication and repeatability. The data set was inspired by the Govdocs1 data set and it is intended that ‘NapierOne’ be used as a complement to this original data set. An investigation was performed with the goal of determining the common...

Usage examples

NapierOne - A modern mixed file data set alternative to Govdocs1 by Simon R. Davies, Richard Macfarlane, William J. Buchanan
napierOne use examples by s.davies
Exploring the Need For an Updated Mixed File Research Data Set by Simon R. Davies, Richard Macfarlane, William J. Buchanan

See 3 usage examples →

National Cancer Institute Imaging Data Commons (IDC) Collections

cancer, digital pathology, fluorescence imaging, image processing, imaging, life sciences, machine learning, medical imaging, microscopy, radiology

Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. IDC hosts a growing number of imaging collections that are contributed either by funded US National Cancer Institute (NCI) data collection activities or by individual researchers. Image data hosted by IDC is stored in DICOM format.

Usage examples

See 3 usage examples →

National Climate Database (NCDB)

climate projections, CMIP5, CMIP6, earth observation, energy, geospatial, meteorological, solar

The National Climate Database (NCDB) seeks to be the definitive source of climate data for energy applications. The goal of the NCDB is to provide unbiased high temporal and spatial resolution climate data needed for renewable energy modeling. The NCDB seeks to maintain the inherent relationship between the various parameters that are needed to model solar, wind, hydrology and load and provide data for multiple important climate scenarios.

Usage examples

NCDB Website by NREL NCDB Team
Regridding uncertainty for statistical downscaling of solar radiation by Maggie D. Bailey, Douglas Nychka, Manajit Sengupta, Aron Habte, Yu Xie, Soutir Bandyopadhyay
NCDB HSDS Examples by Reid Olson

See 3 usage examples →

National Herbarium of NSW

agriculture, biodiversity, biology, climate, digital preservation, ecosystems, environmental

The National Herbarium of New South Wales is one of the most significant scientific, cultural and historical botanical resources in the Southern Hemisphere. The 1.43 million preserved plant specimens have been captured as high-resolution images and the biodiversity metadata associated with each of the images captured in digital form. Botanical specimens date from year 1770 to today, and form voucher collections that document the distribution and diversity of the world's flora through time, particularly that of NSW, Australia and the Pacific. The data is used in biodiversity assessment, syste...

Usage examples

See 3 usage examples →

ONT Methylation Benchmarking Datasets

bam, benchmark, bioinformatics, epigenomics, genomic, life sciences, long read sequencing

ONT Methylation Benchmarking Datasets are generated to benchmark existing methylation-calling tools on the Oxford Nanopore sequencing platform using their recent R10.4.1 flowcell chemistry. The collection spans a diverse range of species, including bacteria (E. coli, H. pylori J99, H. pylori 26695, A. variabilis, T. denticola), plants (Rice, Arabidopsis), and mammals (mouse, human). In addition, the dataset includes EMSeq data for E. coli, plant, and mouse samples, which can serve as ground truth for methylation studies. It also provides unmethylated whole-genome amplified (WGA) DNA for H. pylori 26695 and...

Usage examples

Methylation calling using ONT methylation benchmarking dataset by Onkar Kulkarni
Comprehensive benchmarking of tools for nanopore-based detection of DNA methylation by Kulkarni et al.
Running Benchmarking Pipeline (Nextflow/Snakemake) on an Example Dataset using AWS by Onkar Kulkarni

See 3 usage examples →

OPERA Land Surface Disturbance Annual from Harmonized Landsat Sentinel-2 product (Version 1)

cog, earth observation, environmental, global, land, land cover, land use

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Land Surface Disturbance Annual from Harmonized Landsat Sentinel-2 (HLS) product Version 1 summarizes the DIST-ALERT data product into an annual vegetation disturbance data product. Vegetation disturbance is mapped when there is an indicated decrease in vegetation cover within an HLS Version 2 pixel. The product also provides auxiliary generic disturbance information as determined from the variations of the reflectance through the DIST-ALERT scenes to provide information about more general disturbance trends. The DIS...

Usage examples

Visualizing and Analyzing the OPERA DIST-ANN-HLS Product to Visualize Wildfire Impact in Northern Quebec by C. Speed and M. Grace Bato
Visualization and Exploration of OPERA DIST-ANN-HLS Product Layers by C. Speed and M. Grace Bato
Visualizing and Analyzing the OPERA DIST-ANN-HLS Product to Explore Land-Use Change in Brazil by C. Speed and M. Grace Bato

See 3 usage examples →

OPERA Radiometric Terrain Corrected SAR Backscatter from Sentinel-1 Static Layers validated product (Version 1)

coastal, cog, earth observation, geoscience, global, ice, land, metadata, oceans, orbit, radar, sentinel-1, synthetic aperture radar, tiff, xml

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Radiometric Terrain Corrected (RTC) SAR Backscatter from Sentinel-1 (S1) Static Layers (RTC-S1-STATIC) validated product contains static radar geometry layers associated with the OPERA Radiometric Terrain Corrected (RTC) SAR Backscatter from Sentinel-1 (S1) (RTC-S1) validated product. Due to the S1 mission’s narrow orbital tube, radar-geometry layers such as incidence angle, local incidence angle, number of looks, and RTC Area Normalization Factor (ANF) vary slightly over time for each position on the ground, and th...

Usage examples

RTC Landslide Example by K. Venkataramani
Load, Mosaic, and Visualize OPERA RTC-S1 Data by K. Venkataramani
An Area-Based Projection Algorithm for SAR Radiometric Terrain Correction and Geocoding by Gustavo H. X. Shiroma, Marco Lavalle, and Sean M. Buckley

See 3 usage examples →

Open City Model (OCM)

cities, events, geospatial

Open City Model is an initiative to provide CityGML data for all the buildings in the United States. By using other open datasets in conjunction with our own code and algorithms, it is our goal to provide 3D geometries for every US building.

Usage examples

See 3 usage examples →

Open Human Genome Library

bioinformatics, biology, genomic, life sciences

The Open Human Genome Library (OpenHGL) is a collection of high-quality de novo human assemblies that are publicly available in genomic databases (e.g. NCBI and CNCB) or from individual research papers. It provides consistent naming and uniform formats across datasets, supporting efficient subsequence retrieval and approximate string search.

Usage examples

Using OpenHGL data by Heng Li
BWT construction and search at the terabase scale by Heng Li
AGC: compact representation of assembled genomes with fast queries and updates by Sebastian Deorowicz, Agnieszka Danek, Heng Li

See 3 usage examples →

Open VLF: Scientific Open Data Initiative for CRAAM's SAVNET and AWESOME VLF Data.

archives, astronomy, atmosphere, global, life sciences, open source software, signal processing

This platform is maintained by CRAAM (Mackenzie Radio Astronomy and Astrophysics Center), a research center operated by UPM (Mackenzie Presbyterian University) and INPE (National Institute for Space Research), to provide public and free access for researchers, students, and the interested public to VLF (Very Low Frequency) data from CRAAM's antenna systems. Amazon AWS supports all data stored through the AWS Open Data Program. Very Low Frequency (VLF) signals can be used for navigation services, communication with submarines, and are a powerful tool to study the low-altitude Earth's io...

Usage examples

All source code, program utilities, and file layouts. by Kauffmann, DHV; Santiago, LS; Oliveira, R de.
Open VLF platform by CRAAM
Open VLF: Scientific Open Data Initiative for CRAAM's SAVNET and AWESOME VLF Data (2023). by Kauffmann, DHV; Santiago, LS; Raulin, JP; Correia, E; Oliveira, R de.

See 3 usage examples →

OpenProteinSet

alphafold, life sciences, msa, open source software, openfold, protein, protein folding, protein template

Multiple sequence alignments (MSAs) for 140,000 unique Protein Data Bank (PDB) chains and 16,000,000 UniClust30 clusters. Template hits are also provided for the PDB chains and 270,000 UniClust30 clusters chosen for maximal diversity and MSA depth. MSAs were generated with HHBlits (-n3) and JackHMMER against MGnify, BFD, UniRef90, and UniClust30 while templates were identified from PDB70 with HHSearch, all according to procedures outlined in the supplement to the AlphaFold 2 Nature paper, Jumper et al. 2021. We expect the database to be broadly useful to structural biologists training or valid...

Usage examples

Run inference at scale for OpenFold, a PyTorch-based protein folding ML model, using Amazon EKS by Shubha Kumbadakone, Ankur Srivastava, and Sachin Kadyan
OpenProteinSet: Training data for structural biology at scale by Ahdritz, Gustaf; Bouatta, Nazim; Kadyan, Sachin; Jarosch, Lukas; Berenberg, Daniel; Fisk, Ian, et al
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization by Ahdritz, Gustaf; Bouatta, Nazim; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O'Donnell, Timothy J, et al

See 3 usage examples →

OpenRoboCare Multi-Modal Expert Demonstration Dataset for Robot-Assisted Caregiving

computer vision, health, life sciences, machine learning, robotics

A comprehensive multimodal dataset capturing real-world caregiving routines from 21 occupational therapists performing 15 daily caregiving tasks. The dataset includes synchronized RGB-D video, tactile sensing, eye-gaze tracking, pose annotations, and action labels across 315 sessions totaling 19.8 hours of expert demonstrations. Data modalities include anonymized RGB images, depth maps, 44-sensor tactile readings, 2D/3D pose tracking, temporal action annotations, and first/third-person videos, enabling research in robot learning from demonstration, multimodal perception, and safe human-robot i...
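The session count quoted above is consistent with each therapist performing each task once, although the description does not state this explicitly. The sketch below checks that arithmetic and derives an average session length; the one-session-per-therapist-per-task reading is an assumption made for illustration.

```python
THERAPISTS = 21
TASKS = 15
SESSIONS = 315
TOTAL_HOURS = 19.8

# Assumption: one session per therapist per task.
expected_sessions = THERAPISTS * TASKS
average_minutes = TOTAL_HOURS * 60 / SESSIONS

print(expected_sessions == SESSIONS)
print(f"average session length: {average_minutes:.2f} minutes")
```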

Usage examples

OpenRoboCare: A Multimodal Multi-Task Expert Demonstration Dataset for Robot Caregiving by Liang X, Liu Z, Lin K, et al.
OpenRoboCare Dataset Viewer by Cornell University EmPRISE Lab
Get To Know A Dataset: OpenRoboCare by Cornell University EmPRISE Lab

See 3 usage examples →

PD12M

artdeep learningimage processinglabeledmachine learningmedia

PD12M is a collection of 12.4 million CC0/PD image-caption pairs for the purpose of training generative image models.

Usage examples

PD12M: A Large-Scale Image Captioning Dataset by Jordan Meyer, Nick Padgett, Laura Exline, Cullen Miller
Datasheet by Spawning
Working with the Metadata by Spawning
Downloading Images by Spawning
Hugging Face Dataset by Spawning

See 6 usage examples →

Pohang Canal Dataset: A Multimodal Maritime Dataset for Autonomous Navigation in Restricted Waters

autonomous vehiclescomputer visionlidarmarine navigationrobotics

This dataset presents a multi-modal maritime dataset acquired in restricted waters in Pohang, South Korea. The sensor suite is composed of three LiDARs (one 64-channel LiDAR and two 32-channel LiDARs), a marine radar, two visual cameras used as a stereo camera, an infrared camera, an omnidirectional camera with 6 directions, an AHRS, and a GPS with RTK. The dataset includes the sensor calibration parameters and SLAM-based baseline trajectory. It was acquired while navigating a 7.5 km route that includes a narrow canal area, inner and outer port areas, and a near-coastal area. The aim of this d...

Usage examples

ROS package for LiDAR to image of Pohang Canal Dataset by Dongha Chung
Ros Message Player for Pohang Dataset by Dongha Chung
Pohang Canal Dataset: A Multimodal Maritime Dataset for Autonomous Navigation in Restricted Waters by Dongha Chung, Jonghwi Kim, Changyu Lee, Jinwhan Kim

See 3 usage examples →

ProteinGym

bioinformaticsbiologydeep learninglife sciencesmachine learningprotein

ProteinGym is a benchmark suite for assessing the performance of protein fitness prediction and design models. It comprises a large curated collection of 200+ high-throughput experimental assays (~3M mutated sequences), as well as clinical annotations from experts about the pathogenicity of mutants in over 3k human genes.

Usage examples

Scoring ProteinGym assays with TranceptEVE by Daniel Ritter
ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design by Pascal Notin, et al.
ProteinGym website by Pascal Notin & Daniel Ritter

See 3 usage examples →

QIIME 2 Tutorial Data

bioinformaticsbiologyecosystemsenvironmentalgeneticgenomichealthlife sciencesmetagenomicsmicrobiome

QIIME 2 (pronounced “chime two”) is a microbiome multi-omics bioinformatics and data science platform that is trusted, free, open source, extensible, and community developed and supported.

Usage examples

See 3 usage examples →

Rain over Africa

agricultureanalysis ready dataatmosphereclimatedeep learningearth observationgeophysicsgeosciencehydrologymachine learningprecipitationsatellite imageryweatherzarr

The Rain over Africa (RoA) dataset consists of spaceborne estimates of precipitation over Africa, derived using only geostationary imagery and obtained through a convolutional and quantile regression neural network. The dataset also contains some uncertainty estimates.

Usage examples

How to use the data by Adrià Amell
Reading RoA data by Adrià Amell
Probabilistic near real-time retrievals of Rain over Africa using deep learning by Adrià Amell, Lilian Hee, Simon Pfreundschuh, and Patrick Eriksson

See 3 usage examples →

SPaRCNet data:Seizures, Rhythmic and Periodic Patterns in ICU Electroencephalography

bioinformaticsdeep learninglife sciencesmachine learningmedicineneurophysiologyneuroscience

The IIIC dataset includes 50,697 labeled EEG samples from 2,711 patients and 6,095 EEGs that were annotated by physician experts from 18 institutions. These samples were used to train SPaRCNet (Seizures, Periodic and Rhythmic Continuum patterns Deep Neural Network), a computer program that classifies IIIC events with an accuracy matching clinical experts.

Usage examples

Development of Expert-Level Classification of Seizures and Rhythmic and Periodic Patterns During EEG Interpretation by Jing J, Ge W, Hong S, Fernandes MB, Lin Z, Yang C, et al.
SPaRCNet data:Seizures, Rhythmic and Periodic Patterns in ICU Electroencephalography by Jing, J., Ge, W., Struck, A. F., Fernandes, M., Hong, S., An, S., et al.
IIIC-SPaRCNet Github Repository by Brain Data Science Platform (BDSP)

See 3 usage examples →

STOIC2021 Training

computed tomographycomputer visioncoronavirusCOVID-19grand-challenge.orgimaginglife sciencesSARS-CoV-2

The STOIC project collected Computed Tomography (CT) images of 10,735 individuals suspected of being infected with SARS-CoV-2 during the first wave of the pandemic in France, from March to April 2020. For each patient in the training set, the dataset contains binary labels for COVID-19 presence, based on RT-PCR test results, and COVID-19 severity, defined as intubation or death within one month from the acquisition of the CT scan. This S3 bucket contains the training sample of the STOIC dataset as used in the STOIC2021 challenge on grand-challenge.org.

Usage examples

STOIC2021 Challenge by Diagnostic Image Analysis Group, Radboudumc, Nijmegen
How Well Do Self-Supervised Models Transfer to Medical Imaging? by Anton J, Castelli L, Chan MF, Outthers M, Tang WH, Cheung V, et al.
Study of Thoracic CT in COVID-19: The STOIC Project by Revel MP, Boussouar S, de Margerie-Mellon C, Saab I, Lapotre T, Mompoint D, et al.

See 3 usage examples →

Sentinel-1

agriculturecogdisaster responseearth observationgeospatialsatellite imagerysynthetic aperture radar

Sentinel-1 is a pair of European radar imaging (SAR) satellites launched in 2014 and 2016. Its 6-day revisit cycle and ability to observe through clouds make it well suited for sea and land monitoring, emergency response to environmental disasters, and economic applications. This dataset represents the global Sentinel-1 GRD archive, from the beginning to the present, converted to cloud-optimized GeoTIFF format.

Usage examples

See 3 usage examples →

Sentinel-2 ACOLITE-DSF Aquatic Reflectance for the Conterminous United States

cogearth observationgeospatialnatural resourcesatellite imagerywater

Aquatic reflectance produced with the dark spectrum fitting (DSF) algorithm as implemented in the Atmospheric Correction for OLI “lite” (ACOLITE) software (version 20221114.0). Aquatic reflectance is defined here as unitless water-leaving radiance reflectance and represents the ratio of water-leaving radiance (units of watts per square meter per steradian per nanometer) to downwelling irradiance (units of watts per square meter per nanometer) multiplied by pi.
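The definition above reduces to a one-line formula. A minimal Python sketch follows, assuming scalar inputs in the units stated; the function name `aquatic_reflectance` and the sample values are hypothetical and are not part of ACOLITE or the dataset.

```python
import math


def aquatic_reflectance(water_leaving_radiance, downwelling_irradiance):
    """Return unitless aquatic reflectance: pi * Lw / Ed.

    water_leaving_radiance: W m^-2 sr^-1 nm^-1
    downwelling_irradiance: W m^-2 nm^-1
    """
    if downwelling_irradiance <= 0:
        raise ValueError("downwelling irradiance must be positive")
    return math.pi * water_leaving_radiance / downwelling_irradiance


# Illustrative values only, not taken from the dataset.
rho_w = aquatic_reflectance(0.005, 1.2)
print(f"{rho_w:.4f}")
```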

Usage examples

tutorial.zip by S.D. Ducar
GLOBUS Access Point by T.V. King, et al.
Sentinel-2 ACOLITE-DSF Aquatic Reflectance for the Conterminous United States by T.V. King, et al.

See 3 usage examples →

Software Heritage Graph Dataset

digital preservationfree softwareopen source softwaresource code

Software Heritage is the largest existing public archive of software source code and accompanying development history. The Software Heritage Graph Dataset is a fully deduplicated Merkle DAG representation of the Software Heritage archive. The dataset links together file content identifiers, source code directories, Version Control System (VCS) commits tracking evolution over time, up to the full states of VCS repositories as observed by Software Heritage during periodic crawls. The dataset’s contents come from major development forges (including GitHub and GitLab), FOSS distributions (e.g., Deb...
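To illustrate why a Merkle DAG is deduplicated by construction, here is a minimal Python sketch. It is a simplified toy, not the identifier scheme Software Heritage actually uses; the helper names (`content_id`, `directory_id`, `add_file`, `add_directory`) and the hashing layout are assumptions made for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify file content by the hash of its bytes."""
    return hashlib.sha1(data).hexdigest()


def directory_id(entries: dict) -> str:
    """Identify a directory by hashing its sorted (name, child id) pairs."""
    h = hashlib.sha1()
    for name in sorted(entries):
        h.update(f"{name}\0{entries[name]}\n".encode())
    return h.hexdigest()


store = {}  # node id -> payload; identical nodes are stored only once


def add_file(data: bytes) -> str:
    node_id = content_id(data)
    store.setdefault(node_id, data)
    return node_id


def add_directory(entries: dict) -> str:
    node_id = directory_id(entries)
    store.setdefault(node_id, dict(entries))
    return node_id


# Two "repositories" with identical contents collapse to the same nodes.
readme = add_file(b"hello\n")
repo_a = add_directory({"README": readme})
repo_b = add_directory({"README": add_file(b"hello\n")})
print(repo_a == repo_b)  # True
print(len(store))        # 2 (one file node, one directory node)
```

Because a node's identifier depends only on its content and its children's identifiers, identical files or directories seen in different repositories map to a single node.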

Usage examples

The SWH-Graph module by The Software Heritage team
The Software Heritage Graph Dataset by Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli
Using the Software Heritage Graph Dataset by The Software Heritage team

See 3 usage examples →

Sophos/ReversingLabs 20 Million malware detection dataset

cyber securitydeep learninglabeledmachine learning

A dataset intended to support research on machine learning techniques for detecting malware. It includes metadata and EMBER-v2 features for approximately 10 million benign and 10 million malicious Portable Executable files, with disarmed but otherwise complete files for all malware samples. All samples are labeled using Sophos in-house labeling methods, have features extracted using the EMBER-v2 feature set, as well as metadata obtained via the pefile python library, detection counts obtained via ReversingLabs telemetry, and additional behavioral tags that indicate the rough behavior of the sam...

Usage examples

SOREL-20M quickstart by Richard Harang
SOREL-20M dataset interface code by Richard Harang and Ethan M Rudd
SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection by Richard Harang and Ethan M Rudd

See 3 usage examples →

State of Colorado Imagery

aerial imagerygeospatialimagingmapping

The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.

Usage examples

See 3 usage examples →

TESS-GAIA Light Curve (TGLC)

astronomy

TESS-Gaia Light Curve (TGLC) is a PSF-based TESS full-frame image (FFI) light curve product. Using Gaia DR3 as priors, the team forward models the FFIs with the effective point spread function to remove contamination from nearby stars. The resulting light curves show a photometric precision closely tracking the pre-launch prediction of the noise level: TGLC's photometric precision consistently reaches ≲2% at 16th TESS magnitude even in crowded fields, demonstrating excellent decontamination and deblending power.

Usage examples

TESS-Gaia Light Curve, A PSF-based TESS FFI Light-curve Product by Han, T. and Brandt, T.
TIKE - a free, Jupyter-based cloud platform to access and analyze MAST's AWS timeseries data by MAST Staff
TGLC in the Cloud Jupyter Notebook by MAST Staff

See 3 usage examples →

The Human Microbiome Project

amino acidfastafastqgeneticgenomiclife sciencesmetagenomicsmicrobiome

The NIH-funded Human Microbiome Project (HMP) is a collaborative effort of over 300 scientists from more than 80 organizations to comprehensively characterize the microbial communities inhabiting the human body and elucidate their role in human health and disease. To accomplish this task, microbial community samples were isolated from a cohort of 300 healthy adult human subjects at 18 specific sites within five regions of the body (oral cavity, airways, urogenital tract, skin, and gut). Targeted sequencing of the 16S bacterial marker gene and/or whole metagenome shotgun sequencing was performe...

Usage examples

The Human Microbiome Project by Peter J. Turnbaugh, Ruth E. Ley, Micah Hamady, Claire M. Fraser-Liggett, Rob Knight & Jeffrey I. Gordon
Strains, functions and dynamics in the expanded Human Microbiome Project by Jason Lloyd-Price, Anup Mahurkar, Gholamali Rahnavard, Jonathan Crabtree, Joshua Orvis, A. Brantley Hall, et al.
New microbe genomic variants in patients fecal community following surgical disruption of the upper human gastrointestinal tract by Ranjit Kumar, Jayleen Grams, Daniel I. Chu, David K. Crossman, Richard Stahl, Peter Eipers, et al

See 3 usage examples →

Transcriptomic MIT Licensed data and models

biodiversityBiohubbiologybiomolecular modelingcell biologyhdf5life sciencesmachine learningmodelproteintranscriptomics

This dataset contains transcriptomic biological data and models. The models embed transcriptomic data and facilitate transcriptomic analysis. The data is sourced and curated by a team of experts at Biohub and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Usage examples

Documentation for Transcriptformer by Biohub
scGenePT Perturbation Prediction Tutorial by Biohub
A Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-cell Model by Pearce, J. D., et al.
Documentation for scGenePT by Biohub
scGenePT: Is language all you need for modeling single-cell perturbations? by Ana-Maria Istrate, Donghui Li, Theofanis Karaletsos

See 6 usage examples →

UCSF Renal Mass CT Dataset

cancercomputed tomographylife sciencesmedical imagingmedicineradiology

This dataset provides a set of 831 3D multiphase CT exams of renal masses, registered across phases, with annotations identifying the masses.

Usage examples

See 3 usage examples →

Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin

genome wide association studygenomiclife scienceslofteevep

VEP determines the effect of genetic variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. The European Bioinformatics Institute produces the VEP tool/db and releases updates every 1 to 6 months. The latest release contains 267 genomes from 232 species containing 5567663 protein coding genes. This dataset hosts the last 5 releases for human, rat, and zebrafish. It also hosts the required reference files for the Loss-Of-Function Transcript Effect Estimator (LOFTEE) plugin, as it is commonly used with VEP.

Usage examples

See 3 usage examples →

Vesuvius Challenge - CT Scans of Herculaneum Papyri

computed tomographycultural preservationdigital preservationhistoryimage processingimagingvolumetric imagingx-rayx-ray microtomographyx-ray tomographyzarr

This dataset contains reconstructed micro-CT volumes of carbonized Herculaneum papyri produced as part of the Vesuvius Challenge. The scanned scroll library survived the eruption of Mount Vesuvius in AD 79. It is the only intact library known to have survived from antiquity. The volumetric reconstructions are distributed as OME-Zarr multiscale datasets to support research in virtual unwrapping, segmentation, and text recovery of ancient scrolls. Deciphering these scrolls could forever change our understanding of Roman history.

Usage examples

See 3 usage examples →

WIS2 Global Cache on AWS

atmosphereclimateearth observationforecastgeosciencehydrologymeteorologicalmodeloceansweather

Global real-time Earth system data deemed by the World Meteorological Organisation (WMO) as essential for provision of services for the protection of life and property and for the well-being of all nations. Data is sourced from all WMO Member countries / territories and retained for 24 hours. Met Office and NOAA operate this Global Cache service, curating and publishing the dataset on behalf of WMO.

Usage examples

Guide to the WMO Information System (WMO-No. 1061), Volume II, WMO Information System 2.0 by World Meteorological Organisation
Manual on the WMO Information System (WMO-No. 1060), Volume II, WMO Information System 2.0 by World Meteorological Organisation
WIS 2.0 video for 19th World Meteorological Congress by WMO Secretariat

See 3 usage examples →

Wind AI Bench

benchmarkenergymachine learning

This data lake contains multiple datasets related to fundamental problems in wind energy research. This includes data for wind plant power production for various layouts/wind flow scenarios, data for two- and three-dimensional flow around different wind turbine airfoils/blades, wind turbine noise production, among others. The purpose of these datasets is to establish a standard benchmark against which new AI/ML methods can be tested, compared, and deployed. Details regarding the generation and formatting of the data for each dataset are included in the metadata as well as example noteboo...

Usage examples

Wind AI Bench FLORIS PlayGen by Dakota Ramos, Andrew Glaws
Airfoil 9k by Dakota Ramos, Andrew Glaws
Airfoil 2k by Dakota Ramos, Andrew Glaws

See 3 usage examples →

run_dbcan CAZyme and CGC annotation database on AWS

benchmarkbioinformaticslife sciencesmetagenomicsmicrobiome

Database for use with run_dbcan (CAZyme and CGC annotation), including CAZyme, Transporter, Transcription factor, Signaling Transduction Protein, Sulfatase, Peptidase, and Polysaccharide utilization Loci.

Usage examples

run_dbcan Documentation by Xinpeng Zhang; Haidong Yi; Yanbin Yin
dbCAN3: automated carbohydrate-active enzyme and substrate annotation by Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin
run_dbcan by Xinpeng Zhang; Haidong Yi; Jinfang Zheng; Le Huang; Qiwei Ge; Yanbin Yin

See 3 usage examples →

1940 Census Population Schedules, Enumeration District Maps, and Enumeration District Descriptions

1940 censusarchivescensusdemographynara

The 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. The 1940 census population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 2, 2012. The 1940 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. The coverage is nationwide and inclu...

Usage examples

1940 Census on the AWS Registry of Open Data by National Archives and Records Administration
National Archives 1940 Census by National Archives and Records Administration

See 2 usage examples →

1950 Census Population Schedules, Enumeration District Maps, and Enumeration District Descriptions

1950 censusarchivescensusdemographynara

The 1950 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1950, although some persons were missed. The 1950 census population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 1, 2022. The 1950 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. The coverage is nationwide and inclu...

Usage examples

National Archives 1950 Census by National Archives and Records Administration
1950 Census on the AWS Registry of Open Data by National Archives and Records Administration

See 2 usage examples →

2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File

censusdifferential privacydisclosure avoidanceethnicitygroup quartershispanichousinghousing unitslatinonoisy measurementspopulationraceredistrictingvoting age

The 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9 , and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations, that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. ...

Usage examples

Geographic Spines in the 2020 Census Disclosure Avoidance System by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
The 2020 Census Disclosure Avoidance System Topdown Algorithm by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.

See 2 usage examples →

2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File

censusdifferential privacydisclosure avoidanceethnicitygroup quartershousinghousing unitsnoisy measurementspopulationraceredistrictingvoting age

The 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File (NMF) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9, and implemented in the DAS 2020 Redistricting Production Code). The NMF was generated using the Census Bureau's implementation of the Discrete Gaussian Mechanism, calibrated to satisfy zero-Concentrated Differential Privacy with bounded neighbors.

The NMF values, called noisy measurements, are the output of applying the Discrete Gaussian Mechanism to ...
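As a rough illustration of what a discrete Gaussian noisy measurement looks like, the Python sketch below adds integer-valued Gaussian noise to a count. It is a simplified toy, not the Census Bureau's implementation: the sampler truncates the support rather than sampling exactly, and the names `sample_discrete_gaussian`, `zcdp_rho`, the count, and the value of sigma are all hypothetical. The relation rho = sensitivity^2 / (2 sigma^2) is the standard zero-Concentrated Differential Privacy guarantee for discrete Gaussian noise with scale parameter sigma.

```python
import math
import random


def sample_discrete_gaussian(sigma, rng, tail=12):
    """Draw an integer x with probability proportional to exp(-x^2 / (2 sigma^2)).

    Simplification: the support is truncated to +/- tail * sigma, so this
    approximates the discrete Gaussian rather than sampling it exactly.
    """
    bound = max(1, math.ceil(tail * sigma))
    support = range(-bound, bound + 1)
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in support]
    return rng.choices(support, weights=weights, k=1)[0]


def zcdp_rho(sigma, sensitivity=1.0):
    """zCDP parameter for discrete Gaussian noise: sensitivity^2 / (2 sigma^2)."""
    return sensitivity ** 2 / (2.0 * sigma ** 2)


rng = random.Random(0)
true_count = 120  # hypothetical tabulated count
noisy = true_count + sample_discrete_gaussian(sigma=3.0, rng=rng)
print(noisy, zcdp_rho(3.0))
```

Noisy measurements produced this way stay integer-valued but can be negative or inconsistent across tables, which is why they are described as an intermediate output rather than a finished data product.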

Usage examples

Geographic Spines in the 2020 Census Disclosure Avoidance System by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.
The 2020 Census Disclosure Avoidance System Topdown Algorithm by Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., Zhuravlev, P.

See 2 usage examples →

4D Nucleome (4DN)

bioinformaticsbiologygeneticgenomicimaginglife sciences

The goal of the National Institutes of Health (NIH) Common Fund’s 4D Nucleome (4DN) program is to study the three-dimensional organization of the nucleus in space and time (the 4th dimension). The nucleus of a cell contains DNA, the genetic “blueprint” that encodes all of the genes a living organism uses to produce proteins needed to carry out life-sustaining cellular functions. Understanding the conformation of the nuclear DNA and how it is maintained or changes in response to environmental and cellular cues over time will provide insights into basic biology as well as aspects of human health...

Usage examples

See 2 usage examples →

A Global Drought and Flood Catalogue from 1950 to 2016

floodsglobalnear-surface air temperaturenear-surface specific humiditynetcdfprecipitation

Hydrological extremes, in the form of droughts and floods, have impacts on a wide range of sectors including water availability, food security, and energy production, among others. Given continuing large impacts of droughts and floods and the expectation for significant regional changes projected in the future, there is an urgent need to provide estimates of past events and their future risk, globally. However, current estimates of hydrological extremes are not robust and accurate enough, due to lack of long-term data records, standardized methods for event identification, geographical inconsi...

Usage examples

A Global Drought and Flood Catalogue from 1950 to 2016 by He, X., M. Pan, Z. Wei, E. F. Wood, and J. Sheffield
Data introduction included in the GDFC by Xiaogang He

See 2 usage examples →

ASL 1000

machine learningvideo

This dataset provides a high-fidelity collection of American Sign Language (ASL) videos annotated with 2D landmarks for hands, pose, and face. The data is designed to support advanced research and development in ASL recognition, translation, gesture analysis, and computer animation. The annotations for this dataset were generated using an automated data pipeline to pre-annotate keyframes from the source videos. As a final, critical step, all automated annotations were subsequently reviewed and meticulously corrected by human labellers to ensure the highest level of accuracy and reliability, maki...

Usage examples

A Scalable Data Pipeline for ASL Media Processing and Annotation by Khanh Nguyen, Radha Sri-Tharan, Arun George Zachariah, Pratyusha Maiti, Latasha Grainger, Amariah Arias, Dnaijsha Minor, Jesse Oliver, Jenna Diamond, Dina Yared, Darragh Hanley, Christof Henkel, Alex Starnes, Kevin Schlichter, Suseella Panguluri, Michael Boone, Jeff Fisher, Nikki Pope, Anders Ahrensbach Mikkelsen, Nicole Maisonville, Oscar Örnberg, Jason Liang, Julianne Fong
ASL Data Pipeline by NVIDIA

See 2 usage examples →

Africa Soil Information Service (AfSIS) Soil Chemistry

agricultureenvironmentalfood securitylife sciencesmachine learning

This dataset contains soil infrared spectral data and paired soil property reference measurements for georeferenced soil samples that were collected through the Africa Soil Information Service (AfSIS) project, which lasted from 2009 through 2018. In this release, we include data collected during Phase I (2009-2013). Georeferenced samples were collected from 19 countries in Sub-Saharan Africa using a statistically sound sampling scheme, and their soil properties were analyzed using both conventional soil testing methods and spectral methods (infrared diffuse reflectance spectroscopy). The two ...

Usage examples

See 2 usage examples →

AgricultureVision

aerial imageryagriculturecomputer visiondeep learningmachine learning

Agriculture-Vision aims to be a publicly available large-scale aerial agricultural image dataset that is high-resolution, multi-band, and with multiple types of patterns annotated by agronomy experts. The original dataset affiliated with the 2020 CVPR paper includes 94,986 512x512 images sampled from 3,432 farmlands with nine types of annotations: double plant, drydown, endrow, nutrient deficiency, planter skip, storm damage, water, waterway and weed cluster. All of these patterns have substantial impacts on field conditions and the final yield. These farmland images were captured between 201...

Usage examples

The 2nd International Workshop and Prize Challenge on Agriculture-Vision, Challenges & Opportunities for Computer Vision in Agriculture by Humphrey Shi, Naira Hovakimyan, Jennifer Hobbs, Ed Delp, Melba Crawford, Zhen Li, David Clifford, Jim Yuan, Mang Tik Chiu, Xingqian Xu
Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis by Mang Tik Chiu, Xingqian Xu, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S. Huang, Honghui Shi

See 2 usage examples →

Allen Institute for Brain Science - Synaptic Physiology Public Data Set

electrophysiologyHomo sapienslife sciencesMus musculusneurobiologysignal processing

This is a large-scale survey that describes the physiology (strength, kinetics, and short term plasticity) of thousands of synapses from patch clamp experiments in mouse visual cortex and human middle temporal gyrus.

Usage examples

aisynphys python package for accessing synaptic physiology data by Campagnola L., Seeman S., et al.
Local connectivity and synaptic dynamics in mouse and human neocortex by Campagnola L., Seeman S., et al.

See 2 usage examples →

Allen Institute for Neural Dynamics - Extracellular Electrophysiology Compression Benchmark

electrophysiologylife sciencesMus musculusneurobiologysignal processing

Extracellular electrophysiology data is growing at a remarkable pace. This data, collected with Neuropixels probes by the Allen Institute and the International Brain Lab, can be used to benchmark throughput rates and storage ratios of various data compression algorithms.
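The kind of benchmark described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is synthetic 16-bit noise rather than a real recording, zlib stands in for whichever compressors are under test, "storage ratio" is taken to mean raw size divided by compressed size, and the `benchmark` helper is hypothetical.

```python
import random
import struct
import time
import zlib


def benchmark(compress, raw: bytes):
    """Return (storage ratio, throughput in MB/s) for one compressor."""
    start = time.perf_counter()
    compressed = compress(raw)
    elapsed = time.perf_counter() - start
    ratio = len(raw) / len(compressed)
    throughput = len(raw) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


# Synthetic stand-in for a recording: 16-bit samples of small-amplitude noise.
rng = random.Random(0)
samples = [int(rng.gauss(0, 20)) for _ in range(100_000)]
raw = struct.pack(f"<{len(samples)}h", *samples)

for level in (1, 6, 9):
    ratio, mbps = benchmark(lambda b, lv=level: zlib.compress(b, lv), raw)
    print(f"zlib level {level}: ratio {ratio:.2f}, {mbps:.1f} MB/s")
```

Timing a single call is noisy; a fuller benchmark would repeat each measurement and also time decompression.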

Usage examples

See 2 usage examples →

Allen Institute for Neural Dynamics - Extracellular Electrophysiology Hybrid Evaluation Benchmark

electrophysiologylife sciencesMus musculusneurobiologysignal processing

Evaluation of spike sorting methods is a challenging task, as it requires both ground-truth data and a variety of sorting algorithms to compare against. This dataset contains a set of hybrid data specifically designed for benchmarking spike sorting methods.

Usage examples

See 2 usage examples →

Animal Tracking - Acoustic Telemetry - Quality controlled detections

biologymarine mammalsoceans

Since 2007, the Integrated Marine Observing System’s Animal Tracking Facility (formerly known as the Australian Animal Tracking And Monitoring System (AATAMS)) has established a permanent array of acoustic receivers around Australia to detect the movements of tagged marine animals in coastal waters. Simultaneously, the Animal Tracking Facility developed a centralised national database (https://animaltracking.aodn.org.au/) to encourage collaborative research across the Australian research community and provide unprecedented opportunities to monitor broad-scale animal movements. The resulting da...

Usage examples

See 2 usage examples →

Astrophysics Division Galaxy Segmentation Benchmark Dataset

astronomymachine learningNASA SMD AIsegmentation

Pan-STARRS imaging data and associated labels for galaxy segmentation into galactic centers, galactic bars, spiral arms and foreground stars derived from citizen scientist labels from the Galaxy Zoo: 3D project.

Usage examples

Galaxy Zoo: 3D – crowdsourced bar, spiral, and foreground star masks for MaNGA target galaxies by Karen L Masters, Coleman Krawczyk, Shoaib Shamsi, Alexander Todd, Daniel Finnegan, Matthew Bershady, Kevin Bundy, Brian Cherinka, Amelia Fraser-McKelvie, Dhanesh Krishnarao, Sandor Kruk, Richard R Lane, David Law, Chris Lintott, Michael Merrifield, Brooke Simmons, Anne-Marie Weijmans, Renbin Yan
Pan-STARRS Pixel Processing: Detrending, Warping, Stacking by C. Z. Waters, E. A. Magnier, P. A. Price, K. C. Chambers, W. S. Burgett, P. W. Draper, H. A. Flewelling, K. W. Hodapp, M. E. Huber, R. Jedicke, N. Kaiser, R.-P. Kudritzki, R. H. Lupton, N. Metcalfe, A. Rest, W. E. Sweeney, J. L. Tonry, R. J. Wainscoat, and W. M. Wood-Vasey

See 2 usage examples →

Atmospheric Models from Météo-France

agricultureclimatedisaster responseearth observationenvironmentalmeteorologicalmodelweather

Global and high-resolution regional atmospheric models from Météo-France.

ARPEGE World covers the entire world at a base horizontal resolution of 0.5° (~55km) between grid points and predicts weather up to 114 hours into the future.
ARPEGE Europe covers Europe and North Africa at a base horizontal resolution of 0.1° (~11km) between grid points and predicts weather up to 114 hours into the future.
AROME France covers France at a base horizontal resolution of 0.025° (~2.5km) between grid points and predicts weather up to 42 hours into the future.
AROME France HD covers France and neighborhood a

...

Usage examples

Windy.com by Windy
Windguru.cz by Windguru

See 2 usage examples →

Aurora Multi-Sensor Dataset

autonomous vehiclescomputer visiondeep learningimage processinglidarmachine learningmappingroboticstraffictransportationurbanweather

The Aurora Multi-Sensor Dataset is an open, large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of Pittsburgh, PA, USA by Aurora (via Uber ATG) in collaboration with the University of Toronto. The de-identified dataset contains rich metadata, such as weather and semantic segmentation, and spans all four seasons, rain, snow, overcast and sunny days, different times of day, and a variety of traffic conditions.
The Aurora Multi-Sensor Dataset contains data from a 64-beam Velodyne HDL-64E LiDAR s...

Usage examples

"Pit30M: A benchmark for global localization in the age of self-driving cars", in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4477-4484) by Martinez, J., Doubov, S., Fan, J., Bârsan, I. A., Wang, S., Máttyus, G., Urtasun, R.
Introduction to Visualizing Sensor Types (Jupyter notebook) by Andrei Bârsan (note: Aurora makes no representations as to the accuracy or functionality of the tutorial)

See 2 usage examples →

Biodiversity Heritage Library Metadata and Page Images

biodiversitybioinformaticslife sciences

The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. BHL operates as a worldwide consortium of natural history, botanical, research, and national libraries working together to digitize the natural history literature held in their collections and make it freely available for open access.

Usage examples

See 5 usage examples →

Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset

fluorescence imaging, GeneLab, genetic, genetic maps, life sciences, microscopy, NASA SMD AI

Fluorescence microscopy images of individual nuclei from mouse fibroblast cells irradiated with Fe particles or X-rays, with fluorescent foci indicating 53BP1 positivity, a marker of DNA damage. These are maximum intensity projections of 9-layer microscopy Z-stacks.

Usage examples

Dose, LET and Strain Dependence of Radiation-Induced 53BP1 Foci in 15 Mouse Strains Ex Vivo Introducing Novel DNA Damage Metrics by Sébastien Penninckx, Egle Cekanaviciute, Charlotte Degorre, Elodie Guiet, Louise Viger, Stéphane Lucas, Sylvain V. Costes
NASA SMD AI Workshop Report by SMD Artificial Intelligence (AI) Initiative

See 2 usage examples →

Biological and Physical Sciences (BPS) RNA Sequencing Benchmark Training Dataset

gene expression, GeneLab, genetic, genetic maps, life sciences, NASA SMD AI, space biology

RNA sequencing data from spaceflown and control mouse liver samples, sourced from NASA GeneLab and augmented with a generative adversarial network.

Usage examples

NASA SMD AI Workshop Report by SMD Artificial Intelligence (AI) Initiative
Adversarial generation of gene expression data by Ramon Viñas, Helena Andrés-Terré, Pietro Liò, Kevin Bryson

See 2 usage examples →

Brain Encoding Response Generator (BERG)

brain models, computer vision, deep learning, life sciences, machine learning, neuroimaging, neuroscience

Brain Encoding Response Generator (BERG) is a resource consisting of multiple pre-trained encoding models of the brain and an accompanying Python package to generate accurate in silico neural responses to arbitrary stimuli with just a few lines of code.

Usage examples

In-Silico fMRI Data Tutorial by Alessandro Gifford
Quickstart Tutorial by Domenic Bersch
The Brain Encoding Response Generator by Alessandro Gifford
Brain Encoding Response Generator (BERG) by Alessandro Gifford
In-Silico EEG Data Tutorial by Alessandro Gifford

See 5 usage examples →

Brain/MINDS Marmoset Connectivity Resource on AWS

brain images, imaging, life sciences, microscopy, neurobiology, neuroimaging, neuroscience, nifti, non-human primate

Brain/MINDS Marmoset Connectivity Resource (BMCR) is a resource that provides access to anterograde and retrograde neuronal tracer data, made available by the Brain/MINDS project. It is currently restricted to injections into the prefrontal cortex of a marmoset brain but is planned to include injections into entire cortical areas and representative subcortical brain regions.

Usage examples

The Brain/MINDS Marmoset Connectivity Resource - An open-access platform for cellular-level tracing and tractography in the primate brain by H. Skibbe, M.F. Rachmadi, K. Nakae, C. E. Gutierrez, J. Hata, H. Tsukada, C. Poon, K. Doya, P. Majka, M. G. P. Rosa, M. Schlachter, H. Okano, T. Yamamori, S. Ishii, M. Reisert, A. Watakabe.
Local and long-distance organization of prefrontal cortex circuits in the marmoset brain by Watakabe A, Skibbe H, Nakae K, Abe H, Ichinohe N, Rachmadi MF, Wang J, Takaji M, Mizukami H, Woodward A, Gong R, Hata J, Van Essen DC, Okano H, Ishii S, Yamamori T.
Marmoset PFC connectome by Akiya Watakabe, Henrik Skibbe and Tetsuo Yamamori
Explorer Tutorials by Henrik Skibbe
BMCR website by Henrik Skibbe

See 5 usage examples →

BrainGlobe Atlases

biology, digital preservation, Homo sapiens, image processing, imaging, life sciences, light-sheet microscopy, magnetic resonance imaging, medical imaging, microscopy, Mus musculus, neurobiology, neuroimaging, neuroscience, Rattus norvegicus, volumetric imaging, zarr

BrainGlobe provides an archive and standardised interface to anatomical atlases from multiple species. This dataset includes these atlases, and other data (e.g. sample neuroanatomy data) to allow the greatest use of the atlases.

Usage examples

See 2 usage examples →

BrainSeq - Neurogenomics to Drive Novel Target Discovery for Neuropsychiatric Disorders

gene expression, genotyping, life sciences, transcriptomics

This ambitious project seeks to characterize the genetic and epigenetic regulation of multiple facets of transcription in distinct brain regions across the human lifespan in samples of major neuropsychiatric disorders and controls. Initially focused on schizophrenia and mood disorders, the goal of this consortium is to elucidate the underlying molecular mechanisms of genetic associations with the goal of identifying novel therapeutic targets. The consortium currently consists of seven pharmaceutical companies and a not-for-profit medical research institution working as a precompetitive team to...

Usage examples

See 2 usage examples →

Broad Genome References

bioinformatics, biology, cancer, genetic, genomic, Homo sapiens, life sciences, reference index

Broad-maintained human genome reference builds hg19/hg38 and decoy references.

Usage examples

Advancing NGS quality control to enable measurement of actionable mutations in circulating tumor DNA by Willey J. C., Morrison T. B., Austermiller B., Crawford E. E., et al (2021)
Using Amazon FSx for Lustre for Genomics Workflows on AWS by W. Lee Pang

See 2 usage examples →

COVID-19 Data Lake

amazon.science, bioinformatics, biology, coronavirus, COVID-19, health, life sciences, medicine, MERS, SARS

A centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Globally, there are several efforts underway to gather this data, and we are working with partners to make this crucial data freely available and keep it up-to-date. Hosted on the AWS cloud, we have seeded our curated data lake with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and rela...

Usage examples

See 5 usage examples →

CanElevation - Canada Digital Elevation Models

canada, dem, dsm, dtm, elevation, geospatial, land, stac

The Canadian DEM represents the current coverage of elevation data available. This dataset includes a Digital Terrain Model (DTM), a Digital Surface Model (DSM) and other derived products. This dataset includes 1 m, 2 m and 30 m DEMs. The 1 m and 2 m products are a combination of DEM data generated from airborne LiDAR and optical digital images. The 30 m DEM integrates data from the Copernicus DEM acquired during the TanDEM-X Mission with DEM data derived from airborne lidar, and provides complete coverage of Canada.

Usage examples

Descriptor: Medium Resolution Digital Elevation Model From Natural Resources Canada’s CanElevation Series (MRDEM-30) by H. McGrath et al.
Cloud-Optimized Geospatial Data Access by NRCan

See 2 usage examples →

Cancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical Cancer

cancer, genomic, life sciences, STRIDES, transcriptomics

The Cancer Genome Characterization Initiatives (CGCI) program supports cutting-edge genomics research of adult and pediatric cancers. CGCI investigators develop and apply advanced sequencing methods that examine genomes, exomes, and transcriptomes within various types of tumors. The program includes the Burkitt Lymphoma Genome Sequencing Project (BLGSP) and the HIV+ Tumor Molecular Characterization Project - Cervical Cancer (HTMCP-CC). The dataset contains open Clinical Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform Expression Quantificati...

Usage examples

Genomic Data Commons by National Cancer Institute
Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma by Grande B. M., Gerhard D. S., Jiang A., Griner N. B., Abramson J. S., Alexander T. B., et al.

See 2 usage examples →

Cell Painting Image Collection

biology, cell imaging, cell painting, fluorescence imaging, high-throughput imaging, imaging, life sciences, microscopy

The Cell Painting Image Collection is a collection of freely downloadable microscopy image sets. Cell Painting is an unbiased high-throughput imaging assay used to analyze perturbations in cell models. In addition to the images themselves, each set includes a description of the biological application and some type of "ground truth" (expected results). Researchers are encouraged to use these image sets as reference points when developing, testing, and publishing new image analysis algorithms for the life sciences. We hope that this data set will lead to a better understanding of w...

Usage examples

See 2 usage examples →

Cloud Indexes for Bowtie, Kraken, HISAT, and Centrifuge

bioinformatics, biology, genomic, life sciences, mapping, medicine, reference index, whole genome sequencing

Genomic tools use reference databases as indexes to operate quickly and efficiently, analogous to how web search engines use indexes for fast querying. Here, we aggregate genomic, pan-genomic and metagenomic indexes for analysis of sequencing data.

Usage examples

Table of contents for tutorials for constituent tools by Ben Langmead
Reducing reference bias using multiple population reference genomes by Chen et al (2020)

See 2 usage examples →

Collection of open nation-scale LiDAR datasets

earth observation, geoscience, geospatial, land cover, lidar, mapping, survey

The goal of this project is to collect all publicly available large-scale LiDAR datasets and archive them in a uniform fashion for easy access and use. Initial efforts to collect the datasets are concentrated on Europe and will in future be expanded to the USA and other regions, striving for global coverage. Every dataset includes files in the original data format and translated to COPC format. For faster browsing, we include an overview file that includes a small subset of data points from every dataset file in a single COPC file.

Usage examples

See 2 usage examples →

Consented Activities of People

activity detection, activity recognition, computer vision, labeled, machine learning, privacy, video

The Consented Activities of People (CAP) dataset is a fine-grained activity dataset for visual AI research curated using the Visym Collector platform.

Usage examples

Visym Collector by Visym Labs & Systems & Technology Research
OpenFAD - Open Fine Grained Activity Detection Challenge by Visym Labs & NIST

See 2 usage examples →

Copernicus Digital Elevation Model (DEM)

agriculture, cog, disaster response, earth observation, elevation, geospatial, satellite imagery

The Copernicus DEM is a Digital Surface Model (DSM) which represents the surface of the Earth including buildings, infrastructure and vegetation. We provide two instances of Copernicus DEM named GLO-30 Public and GLO-90. GLO-90 provides worldwide coverage at 90 meters. GLO-30 Public provides limited worldwide coverage at 30 meters because a small subset of tiles covering specific countries are not yet released to the public by the Copernicus Programme. Note that in both cases ocean areas do not have tiles; there, one can assume height values equal to zero. Data is provided as Cloud Optimized Ge...
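
The zero-height convention for missing ocean tiles could be handled along the following lines. This is only a sketch: the 1°×1° tile keying, the `tiles` dictionary, and the per-tile sampler callables are illustrative assumptions, not part of the dataset's documented interface.

```python
import math

def tile_key(lat, lon):
    """Integer (lat, lon) of the south-west corner of the tile containing a point.

    Assumes tiles are 1 degree by 1 degree (an assumption of this sketch).
    """
    return (math.floor(lat), math.floor(lon))

def elevation_at(tiles, lat, lon):
    """Return the height in metres at (lat, lon).

    `tiles` is a hypothetical dict mapping tile keys to callables that
    sample a loaded tile. A missing tile is treated as ocean, so the
    height is taken to be zero, as the dataset description suggests.
    """
    sampler = tiles.get(tile_key(lat, lon))
    if sampler is None:
        return 0.0
    return sampler(lat, lon)
```

A caller would populate `tiles` only for the tiles it has actually downloaded, so a point over open ocean falls through to the zero default.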

Usage examples

See 2 usage examples →

CoversBR

copyright monitoring, cover song identification, live song identification, music, music features dataset, music information retrieval, music recognition

CoversBR is the first large audio database with predominantly Brazilian music for the tasks of Cover Song Identification (CSI) and Live Song Identification (LSI). Due to copyright restrictions, audio of the songs cannot be made available; however, metadata and feature files are publicly accessible. Audio streams captured from radio and TV channels for the live song identification task will be made public. CoversBR is composed of metadata and features extracted from 102298 songs, distributed in 26366 groups of covers/versions, with an average of 3.88 versions per group. The entire collecti...
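
As a quick consistency check, the stated average can be reproduced from the two counts given in the description. The variable names are illustrative only.

```python
# Counts quoted in the dataset description.
songs = 102298
groups = 26366

# Average number of versions per cover group.
average_versions = songs / groups
print(f"{average_versions:.2f}")  # prints 3.88
```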

Usage examples

See 2 usage examples →

Covid Job Impacts - US Hiring Data Since March 1 2020

COVID-19, economics, financial markets, hiring, market data

This dataset provides daily updates on the volume of US job listings filtered by geography, industry, job family and role, normalized to pre-COVID levels. These data files feed the business intelligence visuals at covidjobimpacts.greenwich.hr, a public-facing site hosted by Greenwich.HR and OneModel Inc. Data is derived from online job listings tracked continuously, calculated daily and published nightly. On average, data from 70% of all new US jobs are captured, and the dataset currently contains data from 3.3 million hiring organizations. Data for each filter segment is represented as the 7-day ...

Usage examples

Documentation of dataset schemas by Greenwich.HR
CovidJobImpacts.Greenwich.HR - online visualization of daily hiring data and weekly unemployment data including links to recorded discussions using the data by Greenwich.HR and OneModel Inc.

See 2 usage examples →

Cryo-EM SPA Workflow Records

life sciences, machine learning, structural biology

The “Cryo-EM SPA Workflow Records” contains all outputs of all processing steps involved in cryogenic electron microscopy (cryo-EM) single particle analysis (SPA), including both intermediate and final output data. The primary focus will be on data generated by RELION and CryoSPARC, two widely used software packages for cryo-EM SPA. These records will be archived systematically. To ensure the data remains reproducible while minimizing storage demands, large files that can be regenerated will be excluded prior to registration. The aim is to retain only the essential metadata, processing ...

Usage examples

See 2 usage examples →

DNAStack COVID19 SRA Data

bam, bioinformatics, coronavirus, COVID-19, fasta, fastq, genetic, genomic, global, health, life sciences, long read sequencing, SARS-CoV-2, vcf, virus, whole genome sequencing

The Sequence Read Archive (SRA) is the primary archive of high-throughput sequencing data, hosted by the National Institutes of Health (NIH). The SRA represents the largest publicly available repository of SARS-CoV-2 sequencing data. This dataset was created by DNAstack using SARS-CoV-2 sequencing data sourced from the SRA. Where possible, raw sequence data were processed by DNAstack through a unified bioinformatics pipeline to produce genome assemblies and variant calls. The use of a standardized workflow to produce this harmonized dataset allows public data generated using different methodol...

Usage examples

Viral lineage assignment by Heather Ward
Viral AI by DNAstack

See 2 usage examples →

Danish Meteorological Institute (DMI) Reanalysis dataset v0.5

air temperature, atmosphere, geospatial, global, land, meteorological, model, near-surface air temperature, near-surface relative humidity, near-surface specific humidity, water, weather, zarr

DANRA is a high-resolution meteorological reanalysis dataset for Denmark and Northwestern Europe covering the period September 1990 to December 2023.

Usage examples

See 2 usage examples →

Dendritic Consortium Multimodal Dataset

brain images, brain models, electron microscopy, electrophysiology, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, neurophysiology, neuroscience, simulation neuroscience, single neuron models

The Dendritic Consortium provides a multimodal dataset integrating calcium and voltage imaging, electrophysiology, electron microscopy, proteomics, and computational models of Baz1a pyramidal neurons in the mouse primary visual cortex (V1).

Usage examples

See 2 usage examples →

DigitalCorpora

computer forensics, computer security, CSI, cyber security, digital forensics, image processing, imaging, information retrieval, internet, intrusion detection, machine learning, machine translation, text analysis

Disk images, memory dumps, network packet captures, and files for use in digital forensics research and education. All of this information is accessible through the digitalcorpora.org website, and made available at s3://digitalcorpora/. Some of these datasets implement scenarios that were performed by students, faculty, and others acting in persona. As such, the information is synthetic and may be used without prior authorization or IRB approval.
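
For readers without S3 tooling, an `s3://` location such as the one above can be rewritten as a virtual-hosted-style HTTPS URL. The sketch below assumes the bucket permits anonymous HTTPS reads, which the listing does not state; the helper name and the object key used in the usage note are hypothetical.

```python
from urllib.parse import quote, urlparse

def s3_uri_to_https(uri):
    """Convert an s3://bucket/key URI to a virtual-hosted-style HTTPS URL.

    Only rewrites the string; it does not check that the object exists
    or that anonymous access is permitted.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    key = parsed.path.lstrip("/")
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"https://{parsed.netloc}.s3.amazonaws.com/{quote(key)}"
```

For example, `s3_uri_to_https("s3://digitalcorpora/example/file.txt")` yields `https://digitalcorpora.s3.amazonaws.com/example/file.txt`, where `example/file.txt` is a made-up key used purely for illustration.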

Usage examples

See 2 usage examples →

Downscaled Climate Data for Alaska (v1.1, August 2023)

agriculture, climate, coastal, earth observation, environmental, sustainability, weather

This dataset contains historical and projected dynamically downscaled climate data for the State of Alaska and surrounding regions at 20 km spatial resolution and hourly temporal resolution. Select variables are also summarized at daily resolution. This data was produced using the Weather Research and Forecasting (WRF) model (Version 3.5). We downscaled ERA-Interim historical reanalysis data (1979-2015) as well as historical and projected runs from two GCMs from the Coupled Model Intercomparison Project 5 (CMIP5): GFDL-CM3 and NCAR-CCSM4 (historical run: 1970-2005 and RCP 8.5: 2006-2100)....

Usage examples

Dynamical Downscaling of ERA-Interim Temperature and Precipitation for Alaska by Peter A. Bieniek, Uma S. Bhatt, John E. Walsh, T. Scott Rupp, Jing Zhang, Jeremy R. Krieger, and Rick Lader
Atmospheric Circulation Drivers of Extreme High Water Level Events at Foggy Island Bay, Alaska by Peter A. Bieniek, Li Erikson, and Jeremy Kasper

See 2 usage examples →

E11bio PRISM

bioinformatics, biology, brain images, cell imaging, computer vision, fluorescence imaging, high-throughput imaging, image processing, imaging, ion channels, life sciences, machine learning, microscopy, morphological reconstructions, Mus musculus, neurobiology, neuroimaging, neuroscience, protein, segmentation, zarr

This dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing), a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction. Neurons in the mouse hippocampal CA3 were transduced with a library of adeno-associated viruses (AAVs) encoding diverse “protein bits”, small epitope tags that act as combinatorial barcodes. Tissue was then processed with an expansion microscopy protocol, physically enlarging the sample ~5× to achieve an effective voxel size of ~35 × 3...

Usage examples

See 2 usage examples →

ECMWF AIFS Single - dynamical.org Icechunk Zarr

atmosphere, climate, forecast, meteorological, weather, zarr

The Artificial Intelligence Forecasting System (AIFS) is a data driven forecast model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). This is the non-ensemble configuration of AIFS that produces a single forecast trace. AIFS is trained on ECMWF's ERA5 re-analysis and ECMWF's operational numerical weather prediction (NWP) analyses.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

Usage examples

See 2 usage examples →

EMBER Open Datasets

activity detection, activity recognition, analytics, bioinformatics, brain images, brain models, cloud computing, computer vision, deep learning, electrophysiology, GPS, h5, hdf5, Homo sapiens, json, life sciences, localization, machine learning, magnetic resonance imaging, Mus musculus, neurobiology, neuroimaging, neurophysiology, neuroscience, non-human primate, signal processing, speech processing, zarr

This is data from the Ecosystem for Multi-modal Brain-behavior Experimentation and Research (EMBER). It contains time series behavioral and neuroscience data from animal and deidentified human subjects across multiple modalities.

Usage examples

Get To Know A Dataset - EMBER by EMBER Team
Mapping the landscape of social behavior by Ugne Klibaite, Tianqing Li, Diego Aldarondo, Jumana F Akoad, Bence P Ölveczky, Timothy W Dunn.

See 2 usage examples →

EMory BrEast Imaging Dataset (EMBED)

bias, biology, cancer, health, imaging, life sciences, mammography, x-ray

EMBED is a racially diverse mammography dataset containing 3.4M screening and diagnostic images from 110,000 patients collected from 2013 to 2020, with equal representation of black and white women. The dataset comprises 2D, synthetic 2D (C-view), and 3D (digital breast tomosynthesis, i.e. DBT) images. It contains 60,000 annotated lesions linked to structured imaging descriptors and ground truth pathologic outcomes grouped into six severity classes. This release represents 20% of the total 2D and C-view dataset and is available for research use. DBT, US, and MRI exams will be added at a ...

Usage examples

See 2 usage examples →

Emory Knee Radiograph (MRKR) dataset

bioinformatics, biology, computer vision, csv, health, imaging, labeled, life sciences, machine learning, medical image computing, medical imaging, radiology, x-ray

The Emory Knee Radiograph (MRKR) dataset is a large, demographically diverse collection of 503,261 knee radiographs from 83,011 patients, 40% of whom are African American. This dataset provides imaging data in DICOM format along with detailed clinical information, including patient-reported pain scores, diagnostic codes, and procedural codes, which are not commonly available in similar datasets. The MRKR dataset also features imaging metadata such as image laterality, view type, and presence of hardware, enhancing its value for research and model development. MRKR addresses significant gaps ...

Usage examples

Example Notebook by Emory-HITI
Emory Knee Radiograph Dataset by Brandon Price, Jason Adleberg, Kaesha Thomas, Zach Zaiman, Aawez Mansuri, Beatrice Brown-Mulry, Chima Okecheukwu, Judy Gichoya, Hari Trivedi.

See 2 usage examples →

FLAb: Fitness Landscapes for Antibodies

life sciences, machine learning, protein, protein template

FLAb is the largest publicly available therapeutic antibody dataset designed to train and benchmark protein AI models. It provides open-access, high-quality developability data on diverse therapeutic properties, including expression, thermostability, immunogenicity, aggregation, polyreactivity, binding affinity, and pharmacokinetics.

Usage examples

See 2 usage examples →

Ford Multi-AV Seasonal Dataset

autonomous vehicles, computer vision, lidar, mapping, robotics, transportation, urban, weather

This research presents a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles on different days and at different times during 2017-18. The vehicles were manually driven on an average route of 66 km in Michigan that included a mix of driving scenarios such as the Detroit Airport, freeways, city-centres, a university campus and suburban neighbourhoods. Each vehicle used in this data collection is a Ford Fusion outfitted with an Applanix POS-LV inertial measurement unit (IMU), four HDL-32E Velodyne 3D-lidar scanners, 6 Point Grey 1.3 MP Cameras arranged on the...

Usage examples

Autonomous Driving Data Service (ADDS) by Ajay Vohra, Amazon
Ford AV Dataset Tutorial by Ford Motor Company

See 2 usage examples →

GATK Structural Variation (SV) Data

bioinformatics, biology, cromwell, gatk-sv, genetic, genomic, life sciences, structural variation

This dataset holds the data needed to run a structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data in AWS.

Usage examples

Structural Variant Analysis on AWS with Amazon FSx for Lustre by Goldfinch Bio and Loka Inc.
AWS Setup & Execution by Goldfinch Bio and Loka Inc.

See 2 usage examples →

GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002

biodiversity, carbon, datacenter, earth observation, energy, global, hdf, ice, land, land cover, lidar, metadata, orbit, urban, water

The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth’s carbon cycle and biodiversity. The GEDI instrument produces high resolution laser ranging observations of the 3-dimensional structure of the Earth. GEDI is attached to the International Space Station (ISS) and collects data globally between 51.6° N and 51.6° S latitudes at the highest resolution and densest sampling of any light detection and ranging (lidar) instrument in orbit to date. Each GEDI Version 2 granule encompasses one-fourth of an ISS orbit and includes georeferenced metadata to allow for spatial querying and subsetting. The GEDI instrument was removed fro...
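
Before running a spatial query, it can be useful to check whether a site falls inside the latitude band quoted above. The following is a minimal sketch; the function and constant names are illustrative and do not come from any GEDI tooling.

```python
# Latitude limit quoted in the description (51.6 degrees N to 51.6 degrees S).
GEDI_MAX_ABS_LATITUDE = 51.6

def within_gedi_coverage(lat):
    """Return True if a latitude (degrees, north positive) lies in the GEDI band."""
    return abs(lat) <= GEDI_MAX_ABS_LATITUDE
```

A site at 60° N, for instance, would fail this check, so no GEDI footprints should be expected there.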

Usage examples

See 2 usage examples →

GHRSST Level 2P Global Sea Surface Skin Temperature from the MODIS on the NASA Terra satellite (GDS2)

atmosphere, datacenter, earth observation, global, land, marine, metadata, netcdf, oceans, orbit

NASA produces skin sea surface temperature (SST) products from the Infrared (IR) channels of the Moderate-resolution Imaging Spectroradiometer (MODIS) onboard the Terra satellite. Terra was launched by NASA on December 18, 1999, into a sun synchronous, polar orbit with a daylight descending node at 10:30 am, to study the global dynamics of the Earth atmosphere, land and oceans. The MODIS captures data in 36 spectral bands at a variety of spatial resolutions. Two SST products can be present in these files. The first is a skin SST produced for both day and night observations, derived from the l...

Usage examples

Direct S3 Access tutorial by PODAAC
A decade of sea surface temperature from MODIS by Kilpatrick, K.A., Podestá, G., Walsh, S., Williams, E., Halliwell, V., Szczodrak, M., Brown, O.B., Minnett, P.J., & Evans, R.

See 2 usage examples →

GPM DPR Precipitation Profile L2A 1.5 hours 5 km V07 (GPM_2ADPR) at GES DISC

atmosphere, contamination, datacenter, earth observation, global, hdf, metadata, opendap, radar, water

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. 2ADPR provides single- and dual-frequency-derived precipitation estimates from the Ku and Ka radars of the Dual-Frequency Precipitation Radar (DPR) on the core GPM spacecraft. The output consists of three main classes of precipitation products: those derived from the Ku-band frequency over a wide swath (245 km), those derived from the Ka-band frequency over a narrow swath (125 km), and those derived from the dual-frequency data over the narrow swath. The Ka-ban...

Usage examples

How to Read IMERG Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.
How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.

See 2 usage examples →

Genomic Characterization of Metastatic Castration Resistant Prostate Cancer

cancer, genomic, life sciences, STRIDES, whole genome sequencing

Biopsies of castration resistant prostate cancer metastases were subjected to whole genome sequencing (WGS), along with RNA-sequencing (RNA-Seq). The overarching goal of the study is to illuminate molecular mechanisms of acquired resistance to therapeutic agents, and particularly androgen signaling inhibitors, in the treatment of metastatic castration resistant prostate cancer (mCRPC). This study is made available on AWS via the NIH STRIDES Initiative.

Usage examples

Genomic characterization of metastatic castration-resistant prostate cancer patients undergoing PSMA radioligand therapy: A single-center experience by Swayamjeet Satapathy, Chandan K Das, et al.
Genomic Data Commons by National Cancer Institute

See 2 usage examples →

Genoxus Annotation

genetic, genomic, life sciences, variant annotation, whole exome sequencing, whole genome sequencing

Genoxus Annotation is a harmonized and curated collection of human genetic variant databases designed to support accurate and scalable variant annotation. Variant annotation following genetic testing such as whole genome sequencing (WGS) or whole exome sequencing (WES) is a critical step in identifying and interpreting disease-associated genetic factors. As sequencing technologies continue to generate large volumes of genomic data, robust and well-structured annotation resources are essential for translating raw variant calls into clinically meaningful insights. Genoxus Annotation v1.0 integrat...

Usage examples

See 2 usage examples →

Harvard Electroencephalography Database

bioinformatics, deep learning, life sciences, machine learning, medicine, neurophysiology, neuroscience

The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH).

Usage examples

Harvard Electroencephalography Database by Zafar, S., Loddenkemper, T., Lee, J. W., Cole, A., Goldenholz, D., Peters, J., et al.
Harvard-EEG-Database-Tools by Brain Data Science Platform (BDSP)

See 2 usage examples →

Harvard-Emory ECG Database

bioinformatics, deep learning, life sciences, machine learning, medicine, neurophysiology, neuroscience

The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.

Usage examples

Harvard Electroencephalography Database by Moura Junior, V.; Reyna, M.; Hong, S.; Gupta, A.; Ghanta, M.; Sameni, R., et al.
WFDB Software Package by Moody, G., Pollard, T., & Moody, B.

See 2 usage examples →

Hecatomb Databases

bioinformatics, genetic, genomic, life sciences, metagenomics, virus, whole genome sequencing

Preprocessed databases for use with the Hecatomb pipeline for viral and phage sequence annotation.

Usage examples

See 2 usage examples →

Human Cell Atlas

biology, cell biology, cell imaging, gene expression, genome, genomic, Homo sapiens, life sciences, Mus musculus, single-cell transcriptomics, transcriptomics

The Human Cell Atlas (HCA) is a collaborative community of international scientists. Our mission is to create comprehensive reference maps of all the cells in the human body as a basis for both understanding human health and diagnosing, monitoring, and treating disease. The HCA registry has more than one thousand member scientists from hundreds of institutions around the world. The project is steered and governed by an Organizing Committee, co-chaired by Aviv Regev and Sarah Teichmann.

Usage examples

The Human Cell Atlas: towards a first draft atlas by Various authors
The Human Cell Atlas from a cell census to a unified foundation model by Jennifer E. Rood, Samantha Wynne, Lucia Robson, Anna Hupalowska, John Randell, Sarah A. Teichmann & Aviv Regev
The Human Cell Atlas: towards a first draft atlas by Various authors
The Human Cell Atlas White Paper by Aviv Regev, Sarah Teichmann, Orit Rozenblatt-Rosen, Michael Stubbington, Kristin Ardlie, Ido Amit, Paola Arlotta, Gary Bader, Christophe Benoist, Moshe Biton, Bernd Bodenmiller, Benoit Bruneau, Peter Campbell, Mary Carmichael, Piero Carninci, Leslie Castelo-Soccio, Menna Clatworthy, Hans Clevers, Christian Conrad, Roland Eils, Jeremy Freeman, Lars Fugger, Berthold Goettgens, Daniel Graham, Anna Greka, Nir Hacohen, Muzlifah Haniffa, Ingo Helbig, Robert Heuckeroth, Sekar Kathiresan, Seung Kim, Allon Klein, Bartha Knoppers, Arnold Kriegstein, Eric Lander, Jane Lee, Ed Lein, Sten Linnarsson, Evan Macosko, Sonya MacParland, Robert Majovski, Partha Majumder, John Marioni, Ian McGilvray, Miriam Merad, Musa Mhlanga, Shalin Naik, Martijn Nawijn, Garry Nolan, Benedict Paten, Dana Pe'er, Anthony Philippakis, Chris Ponting, Steve Quake, Jayaraj Rajagopal, Nikolaus Rajewsky, Wolf Reik, Jennifer Rood, Kourosh Saeb-Parsy, Herbert Schiller, Steve Scott, Alex Shalek, Ehud Shapiro, Jay Shin, Kenneth Skeldon, Michael Stratton, Jenna Streicher, Henk Stunnenberg, Kai Tan, Deanne Taylor, Adrian Thorogood, Ludovic Vallier, Alexander van Oudenaarden, Fiona Watt, Wilko Weicher, Jonathan Weissman, Andrew Wells, Barbara Wold, Ramnik Xavier, Xiaowei Zhuang, Human Cell Atlas Organizing Committee
The network effect: studying COVID-19 pathology with the Human Cell Atlas by Sarah Teichmann, Aviv Regev

See 5 usage examples →

IWMI DIWASA Blue ET for Africa

evapotranspirationground waterirrigated croplandsurface waterwater

Blue evapotranspiration (Blue ET) is the portion of ET derived from blue water sources, including surface water (rivers, lakes, reservoirs) and groundwater used for irrigation. It is a key component of blue water fluxes in water accounting. Blue ET consists of evaporation from irrigated fields, transpiration from irrigated crops, and water lost from artificial storage. It helps assess water productivity in irrigated agriculture, quantify consumptive water use, and support sustainable water resource management, particularly in water-scarce regions.

Usage examples

See 2 usage examples →

Indexes for Kaiju

bioinformaticsbiologygenomiclife sciencesmetagenomicsmicrobiomereference indexwhole genome sequencing

This dataset comprises pre-built indexes for the bioinformatics software Kaiju, which is used for taxonomic classification of metagenomic sequencing data. Various indexes for different source reference databases are available.

Usage examples

Fast and sensitive taxonomic classification for metagenomics with Kaiju by Peter Menzel et al (2016)
Quickstart Tutorial for downloading the index files and running Kaiju. by Peter Menzel

See 2 usage examples →

Indian Supreme Court Judgments

legal data

This dataset contains judgments from the Indian Supreme Court, downloaded from the eCourts website. It contains judgments from 1950 to 2025, along with raw metadata in JSON format and structured metadata in Parquet format. Judgments are available in both English and regional Indian languages, packaged as zip files for easier download.

Usage examples

See 2 usage examples →

Integrative Analysis of Lung Adenocarcinoma in Environment and Genetics Lung cancer Etiology (Phase 2)

cancerepigenomicsgenomiclife sciencesSTRIDESwhole exome sequencingwhole genome sequencing

We performed whole genome sequencing and whole exome sequencing of 31 lung adenocarcinoma (LUAD) samples from the Environment And Genetics in Lung cancer Etiology (EAGLE) study. The EAGLE study is made available on AWS via the NIH STRIDES Initiative (https://aws.amazon.com/blogs/publicsector/aws-and-national-institutes-of-health-collaborate-to-accelerate-discoveries-with-strides-initiative/).

Usage examples

See 2 usage examples →

James Webb Space Telescope (JWST)

astronomy

The James Webb Space Telescope (JWST) is the world's next flagship infrared observatory led by NASA with its partners, ESA (European Space Agency), and CSA (Canadian Space Agency). JWST offers scientists the opportunity to observe galaxy evolution, the formation of stars and planets, exoplanetary systems, and our own solar system, in ways never before possible.

Usage examples

MAST Archive Manual for JWST Data by MAST, STScI
Getting Started with JWST Data (JDox) by STScI

See 2 usage examples →

LEarning biOchemical Prostate cAncer Recurrence from histopathology sliDes challenge (LEOPARD) Dataset

cancercomputational pathologycomputer visiondeep learninggrand-challenge.orghistopathologylife sciences

This dataset contains all the data for the LEarning biOchemical Prostate cAncer Recurrence from histopathology sliDes challenge, or LEOPARD. Prostate cancer, impacting 1.4 million men annually, is a prevalent malignancy (H. Sung et al., 2021). A substantial number of these individuals undergo prostatectomy as the primary curative treatment. The efficacy of this surgery is assessed, in part, by monitoring the concentration of prostate-specific antigen (PSA) in the bloodstream. While the role of PSA in prostate cancer screening is debatable (W. F. Clark et al., 2018; E. A. M. Heijnsdijk et al., 2018), it serves as a valuable biomarker for postprostatectomy follow-up in patients. Following successful surgery, PSA concentration is typically undetectable (<0.1 ng/mL) within 4-6 weeks (S. S. Goonewardene et al., 2014). However, approximately 30% of patie...

Usage examples

See 2 usage examples →

Multi-Anatomy Post-Surgical Magnetic Resonance Dataset (MAPSMR)

life sciencesmachine learningmagnetic resonance imagingmedical imaging

The MAPSMR dataset is a multi-organ, post-surgical MRI benchmark dataset focused on organ absence and altered anatomy after common abdominal and pelvic surgeries. The dataset includes cases such as cholecystectomy, prostatectomy, nephrectomy, colectomy, hepatectomy, and related procedures, with annotations identifying surgically absent organs and post-treatment anatomical changes.

Usage examples

Decipher-MR: a vision-language foundation model for 3D MRI representations by Zhijian Yang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaminathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan Bas
Get to know a dataset - GEHCAI-MAPSMR by https://github.com/fastestimator

See 2 usage examples →

NASA Physical Sciences Informatics (PSI)

chemistryfluid dynamicsmaterials sciencephysicsspace biology

NASA's Physical Sciences Research Program, along with its predecessors, has conducted significant fundamental and applied research in the physical sciences. The International Space Station (ISS) is an orbiting laboratory that provides an ideal facility to conduct long-duration experiments in the near absence of gravity and allows continuous and interactive research similar to Earth-based laboratories. This enables scientists to pursue innovations and discoveries not currently achievable by other means. NASA's Physical Sciences Research Program also benefits from collaborations with several ...

Usage examples

PSI System Demonstration by Cheryl Payne
Researcher's Guide to Physical Sciences Informatics System by NASA ISS Program Science Office

See 2 usage examples →

NIH NLM NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS

csvlife sciencesSTRIDEStxtxml

PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS based on PMCID and version number:

The PMC Open Access (OA) Subset, which includes all articles in PMC that are available for reuse based on terms specified by the publisher. The majority of avai...

Usage examples

Extracting insights from PubMed articles using Amazon Q Business by Bharath Gunapati and Stefan Mationg
Accessing PMC Article Datasets Using Amazon Web Services by NCBI PMC

See 2 usage examples →

NOAA Analysis of Record for Calibration (AORC) Dataset

agricultureclimatedisaster responseenvironmentaltransportationweather

...

Usage examples

Explore the AORC 1.1 dataset in Zarr by Michael AuCoin
The Office of Water Prediction's Analysis of Record for Calibration, version 1.1: Dataset description and precipitation evaluation (09 July 2023). J. Am. Water Resour. Assoc., 59 (6). 1246-1272. by Greg Fall, David Kitzmiller, Sandra Pavlovic, Ziya Zhang, Nathan Patrick, Michael St. Laurent, Carl Trypaluk, Wanru Wu, and Dennis Miller

See 2 usage examples →

NOAA Climate Forecast System (CFS)

agricultureclimatemeteorologicalweather

The Climate Forecast System (CFS) is a model representing the global interaction between Earth's oceans, land, and atmosphere. Produced by several dozen scientists under guidance from the National Centers for Environmental Prediction (NCEP), this model offers hourly data with a horizontal resolution down to one-half of a degree (approximately 56 km) around Earth for many variables. CFS uses the latest scientific approaches for taking in, or assimilating, observations from data sources including surface observations, upper air balloon observations, aircraft observations, and satellite obser...

Usage examples

The NCEP Climate Forecast System Version 2 by Saha, Suranjana, and Coauthors
The NCEP Climate Forecast System Reanalysis by Saha, Suranjana, and Coauthors

See 2 usage examples →

NOAA GEFS - dynamical.org Icechunk Zarr

atmosphereclimateforecastmeteorologicalweatherzarr

The Global Ensemble Forecast System (GEFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model. GEFS creates 31 separate forecasts (ensemble members) to describe the range of forecast uncertainty.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

NOAA GEFS forecast, 35 day - Weather forecasts from the Global Ensemble Forecast System (GEFS) operated by NOAA NWS NCEP.
NOAA GEFS analysis - Weather analysis from the Global Ensemble Forecast System (GEFS) operated by

...

Usage examples

See 2 usage examples →

NOAA Global Forecast System (GFS) netCDF Formatted Data

agricultureclimatedisaster responseenvironmentalmeteorologicalweather

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The GFS data files stored here can be immediately used for OAR/ARL’s NOAA-EPA Atmosphere-Chemistry Coupler Cloud (NACC-Cloud) tool, and are in a Network Common Data Form (netCDF), which is a very common format used across the scientific community. These particular GFS files contain a comprehensive number of global atmosphere/land variables at a relatively high spati...

Usage examples

NOAA’s Global Forecast System Data in the Cloud for Community Air Quality Modeling. Atmosphere 2023, 14, 1110. by Campbell, P.C.; Jiang, W.; Moon, Z.; Zinn, S.; Tang, Y.
U.S. EPA’s application of NOAA’s GFS netCDF data on AWS and NACC-Cloud: 'Expedited modeling of burn events results (EMBER): A screening-level dataset of 2023 ozone fire impacts in the US.' by Simon et al.

See 2 usage examples →

NOAA HRRR - dynamical.org Icechunk Zarr

atmosphereclimateforecastmeteorologicalweatherzarr

The High-Resolution Rapid Refresh (HRRR) is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-h period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

NOAA HRRR forecast, 48 hour - Weather forecasts from the High-Resolution Rapid Refresh (HRRR) model operated by NOAA NWS NCEP.

Usage examples

NOAA HRRR forecast, 48 hour - Quickstart by dynamical.org
NOAA HRRR analysis - Quickstart by dynamical.org

See 2 usage examples →

NOAA JISAO’s Seasonal Coastal Ocean Prediction of the Ecosystem (J-SCOPE)

chemistryclimatecoastalmarineoceans

J-SCOPE (JISAO’s Seasonal Coastal Ocean Prediction of the Ecosystem) is funded by NOAA and presented by NANOOS. This project aims to provide experimental seasonal forecasts (six to nine months) of upper ocean properties, based on operational simulations by NOAA's Climate Forecast System (CFS) model, and dynamical downscaling with a high-resolution version of the Regional Ocean Modeling System (ROMS) that includes a state-of-the-art biogeochemical module. Forecasts of specific oceanic properties crucial to the nearshore and coastal marine ecosystem, such as upwelling, pH, mixed layer depth, oxygen concentration, and plankton distributions, are anticipated with updates on a monthly basis. For more information about the forecast system, please read Siedlecki et al. 2016. The Regional Ocean Modeling System (R...

Usage examples

J-SCOPE About the Model by NOAA J-SCOPE
J-SCOPE Peer Reviewed Publications by Multiple publications available through the provided link

See 2 usage examples →

NOAA S-104 Water Level Data

coastalhydrographymarine navigationoceanswater

S-104 is a data and metadata encoding specification that is part of the S-100 Universal Hydrographic Data Model, an international standard for hydrographic data. This collection of data contains water level forecast guidance from NOAA's Global Surge and Tide Operational Forecast System 2-D (STOFS-2D-Global), an operational hydrodynamic nowcast and forecast modeling system for global water level conditions. These datasets are encoded as HDF-5 files conforming to the S-104 specification, and are geospatially subset into individual tiles conforming to the NOAA/OCS Nautical Product Tiling Sche...

Usage examples

See 2 usage examples →

NOAA Unified Forecast System Subseasonal to Seasonal Prototypes

agricultureclimatedisaster responseenvironmentalmeteorologicaloceansweather

The Unified Forecast System Subseasonal to Seasonal prototypes consist of reforecast data from the UFS atmosphere-ocean coupled model experimental prototype versions 5, 6, 7, and 8 produced by the Medium Range and Subseasonal to Seasonal Application team of the UFS-R2O project. The UFS prototypes are the first dataset released to the broader weather community for analysis and feedback as part of the development of the next generation operational numerical weather prediction system from NWS. The datasets include all the major weather variables for atmosphere, land, ocean, sea ice, and ocean wav...

Usage examples

The impact of tropical SST biases on the S2S precipitation forecast skill over the Contiguous United States in the UFS global coupled model by Hedanqiu Bai, Bin Li, Avichal Mehra, Jessica Meixner, Shrinivas Moorthi, Sulagna Ray, Lydia Stefanova, Jiande Wang, Jun Wang, Denise Worthen, Fanglin Yang, and Cristiana Stan
Advances in Seasonal Predictions of Arctic Sea Ice With NOAA UFS by Jieshun Zhu, Wanqiu Wang, Yanyun Liu, Arun Kumar, and David DeWitt

See 2 usage examples →

NOAA World Ocean Database (WOD)

climateoceans

The World Ocean Database (WOD) is the largest uniformly formatted, quality-controlled, publicly available historical subsurface ocean profile database. From Captain Cook's second voyage in 1772 to today's automated Argo floats, global aggregation of ocean variable information including temperature, salinity, oxygen, nutrients, and others vs. depth allows for study and understanding of the changing physical, chemical, and to some extent biological state of the World's Oceans. Browse the bucket via the AWS S3 explorer: https://noaa-wod-pds.s3.amazonaws.com/index.html
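
For scripted browsing, the bucket behind the explorer link can in principle be listed over plain HTTPS with the standard S3 ListObjectsV2 query parameters. The sketch below is a minimal illustration, assuming the bucket permits anonymous listing (the explorer page suggests it does, but this is not confirmed here). The function names are illustrative and the prefix layout inside the bucket is not assumed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket endpoint taken from the explorer URL in the description above.
BUCKET_URL = "https://noaa-wod-pds.s3.amazonaws.com/"
# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def listing_url(prefix=""):
    """Build a ListObjectsV2 URL for one 'folder' level under prefix."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": "/"}
    )
    return f"{BUCKET_URL}?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, object_keys) from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_data)
    prefixes = [
        e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def list_bucket(prefix=""):
    """Fetch and parse the first page of a listing (needs network access)."""
    with urllib.request.urlopen(listing_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_bucket()` with no argument would return the top-level prefixes and objects. Only the first page of results is read, so a listing with more than 1,000 entries would need continuation-token handling that this sketch omits.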

Usage examples

The World Ocean Database Introduction by Tim P. Boyer, Olga K. Baranova, Carla Coleman, Hernan E. Garcia, Alexandra Grodsky, Ricardo A. Locarnini, Alexey V. Mishonov, Christopher R. Paver, James R. Reagan, Dan Seidov, Igor V. Smolyar, Katharine W. Weathers, Melissa M. Zweng
The World Ocean Database User's Manual by Hernan E. Garcia, Tim P. Boyer, Ricardo A. Locarnini, Olga K. Baranova, Melissa M. Zweng

See 2 usage examples →

NUVIEW - Multi-State Geospatial Data

demdisaster responsegeospatiallidarnatural resourcesatellite imagerysustainability

NUVIEW hosts and manages a unified collection of geospatial datasets from multiple U.S. states and agencies (LiDAR, orthophoto imagery, DEM/DSM, and derivative products). Data are organized in a single S3 bucket with a logical sub-folder hierarchy: /state_or_agency_product_type/acqusition_project_name/.... All assets are cloud-optimized (COG GeoTIFFs, COPC (Cloud Optimized Point Cloud) LAZ point clouds, etc.) and available under open licenses.

Usage examples

See 2 usage examples →

National Archives Catalog

archivesgovernment recordsnaranational archives catalog

The National Archives Catalog dataset contains all of the descriptions; authority records; digitized and electronic records; and tags, transcriptions and comments for NARA’s archival holdings available in the Catalog.

Usage examples

National Archives Catalog on the AWS Registry of Open Data by National Archives and Records Administration
National Archives Catalog by National Archives and Records Administration

See 2 usage examples →

National Cancer Institute Center for Cancer Research - Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression

cancergenomiclife sciences

The study describes integrative analysis of genetic lesions in 574 diffuse large B cell lymphomas (DLBCL) involving exome and transcriptome sequencing, array-based DNA copy number analysis and targeted amplicon resequencing. The dataset contains open RNA-Seq Gene Expression Quantification data.

Usage examples

Genomic Data Commons by National Cancer Institute
Genetics and Pathogenesis of Diffuse Large B Cell Lymphoma by Roland Schmitz, Ph.D., George W. Wright, Ph.D., Da Wei Huang, M.D., et al.

See 2 usage examples →

Nighttime-Fire-Flare

anomaly detectionclassificationdisaster responseearth observationenvironmentalNASA SMD AIsatellite imagerysocioeconomicurban

Detection of nighttime combustion (fire and gas flaring) from daily top of atmosphere data from NASA's Black Marble VNP46A1 product using VIIRS Day/Night Band and VIIRS thermal bands.

Usage examples

Potentially underestimated gas flaring activities—a new approach to detect combustion using machine learning and NASA's Black Marble product suite by Srija Chakraborty, Tomohiro Oda, Virginia Kalb, Zhuosen Wang, Miguel O Román
NASA SMD AI Workshop Report by SMD Artificial Intelligence Machine Learning (AI/ML) Working Group

See 2 usage examples →

OPERA Coregistered Single-Look Complex from Sentinel-1 Static Layers validated product (Version 1)

coastalearth observationhdficelandmetadataoceansorbitradarsentinel-1synthetic aperture radarxml

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Coregistered Single-Look Complex (CSLC) from Sentinel-1 (S1) Static Layers (CSLC-S1-STATIC) validated product contains static radar geometry layers associated with the OPERA Coregistered Single-Look Complex (CSLC) from Sentinel-1 (S1) validated product. Due to the S1 mission’s narrow orbital tube, radar-geometry layers vary only slightly over time for each position on the ground, and are therefore considered static. These static layers are provided separately from the OPERA CSLC-S1 product, as they are produced only once ...

Usage examples

Generate interferograms and map the lava flow emplacement using OPERA CSLC-S1 by M. Grace Bato
Generate interferograms without the need to download OPERA CSLC-S1 products locally by M. Grace Bato and K. Devlin

See 2 usage examples →

OPERA Coregistered Single-Look Complex from Sentinel-1 validated product (Version 1)

coastalearth observationhdficelandmetadataoceansorbitradarsentinel-1synthetic aperture radarxml

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Coregistered Single-Look Complex (CSLC) from Sentinel-1 validated product consists of Single Look Complex (SLC) images which contain both amplitude and phase information of the complex radar return. The amplitude is primarily determined by ground surface properties (e.g., terrain slope, surface roughness, and physical properties), and phase primarily represents the distance between the radar and ground targets corrected for the geometrical distance between the two based on the knowledge from Digital Elevation Model a...

Usage examples

Generate interferograms without the need to download OPERA CSLC-S1 products locally by M. Grace Bato and K. Devlin
Generate interferograms and map the lava flow emplacement using OPERA CSLC-S1 by M. Grace Bato

See 2 usage examples →

OPERA Dynamic Surface Water Extent from Sentinel-1 (Version 1)

cogdatacenterearth observationgloballandorbitradarsentinel-1surface waterwater

This dataset contains the Level-3 OPERA Dynamic Surface Water Extent from Sentinel-1 (DSWx-S1) product version 1. DSWx-S1 provides near-global geographical mapping of surface water extent over land at a spatial resolution of 30 meters on the Military Grid Reference System (MGRS) grid, with a temporal revisit frequency between 6-12 days. Using Sentinel-1 radar observations, DSWx-S1 maps open inland water bodies greater than 3 hectares and 200 meters in width, irrespective of cloud conditions and daylight illumination that often pose challenges to optical sensors. Forward production of the...

Usage examples

Working with OPERA Dynamic Surface Water Extent (DSWx) Data by Nicholas Tarpinian
Generate Flood Maps without downloading OPERA DSWx-S1 products locally by S. Sangha

See 2 usage examples →

OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 product (Version 1)

cogearth observationenvironmentalgloballandland coverland usesatellite imagery

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 (HLS) product Version 1 maps vegetation disturbance alerts that are derived from data collected by Landsat 8 and Landsat 9 Operational Land Imager (OLI) and Sentinel-2A, Sentinel-2B, and Sentinel-2C Multi-Spectral Instrument (MSI). A vegetation disturbance alert is detected at 30 meter (m) spatial resolution when there is an indicated decrease in vegetation cover within an HLS pixel. The Level-3 data product also provides additional information about more ...

Usage examples

Getting Started with OPERA DIST-ALERT-HLS Products by R. Dhillon and M. Grace Bato
Getting Started with OPERA DIST Product by M. Grace Bato and R. Dhillon

See 2 usage examples →

OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 provisional product (Version 0)

cogearth observationenvironmentalgloballandland coverland use

The OPERA_L3_DIST-ALERT-HLS Version 0 data product was decommissioned on April 25, 2025. Users are encouraged to use the OPERA_L3_DIST-ALERT-HLS V1 data product which was released on March 14, 2024, and has achieved stage 1 validation. The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 (HLS) provisional data product Version 0 maps vegetation disturbance alerts from data collected by Landsat 8 and Landsat 9 Operational Land Imager (OLI) and Sentinel-2A, Sentinel-2B, and Sentinel-2C Multi-Spectral Instrum...

Usage examples

Getting Started with OPERA DIST-ALERT-HLS Products by R. Dhillon and M. Grace Bato
Getting Started with OPERA DIST Product by M. Grace Bato and R. Dhillon

See 2 usage examples →

OS-Climate Physrisk

climate riskextreme weatherhazardhazard indicatorperilphysical

Collection of adapted and derived hazard indicator datasets optimized for running physical climate risk analyses.

Usage examples

See 5 usage examples →

Ocean Radar - Newcastle site - Sea water velocity - Delayed mode

ocean currentsocean velocityoceans

The Newcastle (NEWC) HF ocean radar system covers an area of the Central Coast, New South Wales, an area subject to the variability of the East Australian Current (EAC) and its coupling with coastal winds, tides, and waves. In this area the EAC separates from the coast and generates several eddies which control the larval species and the higher marine species and ecosystems in which they forage. The NEWC HF ocean radar system consists of two SeaSonde crossed loop direction finding stations located at Sea Rocks (32.441575 S 152.539022 E) and Red Head (33.010245 S 151.727059 E). These radars ope...

Usage examples

See 2 usage examples →

OpenCRAVAT

geneticgenomiclife sciencessqlitetertiary analysisvariant annotation

OpenCRAVAT is a modular variant annotation tool developed by KarchinLab at Johns Hopkins. This dataset is a mirror of the OpenCRAVAT store available at https://store.opencravat.org. You can configure OpenCRAVAT to use this mirror by editing the "cravat-system.yml" file. The path to this file is in the first output line of the command "oc config system". In that file, change the value of "store_url" to "https://opencravat-store-aws.s3.amazonaws.com".
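
The edit described above can also be scripted. The following is a minimal sketch, assuming "store_url" appears as a top-level key on its own line in "cravat-system.yml". The helper name `set_store_url` is illustrative and is not part of OpenCRAVAT. It rewrites the text of the file and leaves reading and writing the file at the path reported by "oc config system" to the caller.

```python
import re

# Mirror URL quoted in the dataset description above.
MIRROR_URL = "https://opencravat-store-aws.s3.amazonaws.com"

# Matches a top-level "store_url: ..." line (assumed layout of the file).
_STORE_URL_LINE = re.compile(r"^store_url\s*:.*$", flags=re.MULTILINE)


def set_store_url(yaml_text, url=MIRROR_URL):
    """Return yaml_text with its top-level store_url entry set to url."""
    new_line = f"store_url: {url}"
    if _STORE_URL_LINE.search(yaml_text):
        # Replace the existing entry; a callable avoids escape handling.
        return _STORE_URL_LINE.sub(lambda _m: new_line, yaml_text, count=1)
    # No existing entry: append one on its own line.
    needs_newline = bool(yaml_text) and not yaml_text.endswith("\n")
    return f"{yaml_text}{chr(10) if needs_newline else ''}{new_line}\n"
```

A line-based substitution is used instead of a YAML parser so that comments and key order in the file are preserved. It would not handle a "store_url" key nested under another mapping.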

Usage examples

Changing the OpenCRAVAT store url by Kyle Moad
OpenCRAVAT by KarchinLab

See 2 usage examples →

Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset

cancergenomiclife sciences

The OHSU-CNL study offers whole exome and RNA sequencing of a cohort of 100 cases with rare hematologic malignancies such as chronic neutrophilic leukemia (CNL), atypical chronic myeloid leukemia (aCML), and unclassified myelodysplastic syndrome/myeloproliferative neoplasms (MDS/MPN-U). This dataset contains open RNA-Seq Gene Expression Quantification data.

Usage examples

Genomic landscape of neutrophilic leukemias of ambiguous diagnosis by Zhang H, Wilmot B, Bottomly D et al.
Genomic Data Commons by National Cancer Institute

See 2 usage examples →

PALSAR-2 ScanSAR Turkey & Syria Earthquake (L2.1 & L1.1)

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystacsustainabilitysynthetic aperture radar

JAXA has responded to the Earthquake events in Turkey and Syria by conducting emergency disaster observations and providing data as requested by the Disaster and Emergency Management Authority (AFAD), Ministry of Interior in Turkey, through Sentinel Asia and the International Disaster Charter. Additional information on the event and dataset can be found here. The 25 m PALSAR-2 ScanSAR is normalized backscatter data of PALSAR-2 broad area observation mode with observation width of 350 km. Polarization data are stored as 16-bit digital numbers (DN). The DN values can be converted to gamma naught...
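
The description is cut off before it states the DN conversion, so the sketch below does not reproduce the dataset's own text. It assumes the calibration JAXA commonly publishes for PALSAR-2 products, gamma naught (dB) = 10 * log10(DN^2) + CF with CF = -83.0 dB, and it assumes DN = 0 marks no-data. Both assumptions should be checked against the dataset documentation before use.

```python
import math

# Calibration factor in dB. ASSUMPTION: value commonly published by JAXA
# for PALSAR-2 products; confirm it in the dataset documentation.
CALIBRATION_FACTOR_DB = -83.0


def dn_to_gamma0_db(dn, cf_db=CALIBRATION_FACTOR_DB):
    """Convert a 16-bit digital number (DN) to gamma naught in dB.

    Returns None for DN <= 0, treated here as no-data (assumption).
    """
    if dn <= 0:
        return None
    return 10.0 * math.log10(dn ** 2) + cf_db
```

Under these assumptions a DN of 10 maps to 10 * log10(100) - 83.0 = -63.0 dB.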

Usage examples

ALOS series Open and Free Data by JAXA EORC
ALOS-2 observations of earthquakes in southeastern Turkey in 2023 by JAXA EORC

See 2 usage examples →

Pancreatic Cancer Organoid Profiling

cancergeneticgenomiclife sciencesSTRIDEStranscriptomicswhole genome sequencing

This study generated a collection of patient-derived pancreatic normal and cancer organoids, which were sequenced using Whole Genome Sequencing (WGS), Whole Exome Sequencing (WXS), and RNA-Seq, along with matched tumor and normal tissue where available. The study provides a valuable resource for pancreatic cancer researchers. The dataset contains open RNA-Seq Gene Expression Quantification data and controlled WGS/WXS/RNA-Seq Aligned Reads, WXS Annotated Somatic Mutation, WXS Raw Somatic Mutation, and RNA-Seq Splice Junction Quantification.

Usage examples

Organoid Profiling Identifies Common Responders to Chemotherapy in Pancreatic Cancer by Tiriac H, Belleau P, Engle DD, Plenker D, Deschênes A, Somerville TD, et al.
Genomic Data Commons by National Cancer Institute

See 2 usage examples →

RAPID NRT Flood Maps

agriculturedisaster responseearth observationenvironmentalwater

Near real-time and archival high-resolution (10 m) flood inundation data over the contiguous United States, developed from the Sentinel-1 SAR imagery archive (2016-current) using the automated Radar Produced Inundation Diary (RAPID) algorithm.

Usage examples

Near Real-Time Nonobstructed Flood Inundation Mapping by Synthetic Aperture Radar by Xinyi Shen, Emmanouil N. Anagnostou, George H. Allen, G. Robert Brakenridge, Albert J. Kettner
Inundation Extent Mapping by Synthetic Aperture Radar: A Review by Xinyi Shen, Dacheng Wang, Kebiao Mao, Emmanouil Anagnostou, and Yang Hong

See 2 usage examples →

REDASA COVID-19 Open Data

coronavirusCOVID-19information retrievallife sciencesnatural language processingtext analysis

The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in our paper. The first S3 bucket listed in Resources contains a large collection of medical documents in text format extracted from the CORD-19 dataset, plus other sources deemed relevant by the REDASA consortium. The second S3 bucket contains a series of documents surfaced by Amazon Kendra that were considered relevant for each medical question asked. The final S3 bucket contains the GroundTruth annotations cr...

Usage examples

Curadr - Curation Platform by REDASA Consortium, Imperial College London
Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study by Uddhav Vaghela, Simon Rabinowicz, Paris Bratsos, Guy Martin, Epameinondas Fritzilas, et al.

See 2 usage examples →

RNA structure by fragmentation frequency

bioinformaticsgenomiclife sciencestranscriptomics

The fragSTRUC project develops software to extract RNA secondary structure information from Illumina datasets, exploiting the fact that divalent ions in standard RNA-seq library preparation fragment sequences at non-base-paired regions of RNA.

Usage examples

Accessing the fragSTRUC dataset on AWS by Yuk Kei Wan and Leonard Schärfen
fragSTRUC: RNA structure by fragmentation frequency by Yuk Kei Wan and Leonard Schärfen

See 2 usage examples →

Reference Indexes for krepp

bioinformaticslife sciencesmetagenomicsmicrobiomereference index

krepp is an alignment-free method for estimating distances and phylogenetic placement of individual reads to many thousands of reference genomes in a scalable manner using k-mers. This dataset includes k-mer-based indexes consisting of ultra-large reference genome sets that can be efficiently analyzed using krepp.

Usage examples

See 2 usage examples →

Reference data for HiFi human WGS

genetichealthHomo sapienslife scienceslong read sequencingmappingvariant annotationvcfwhole genome sequencing

Reference data bundle for analyzing HiFi human whole genome sequencing data

Usage examples

See 2 usage examples →

SatPM2.5

air qualityatmosphereenvironmentalhealthnetcdf

Fine particulate matter (PM2.5) concentrations are estimated using information from satellite-, simulation- and monitor-based sources. Aerosol optical depth from multiple satellites (MODIS, VIIRS, MISR, and SeaWiFS) and their respective retrievals (Dark Target, Deep Blue, MAIAC) is combined with simulation (GEOS-Chem) based upon their relative uncertainties as determined using ground-based sun photometer (AERONET) observations to produce geophysical estimates that explain most of the variance in ground-based PM2.5 measurements. A subsequent statistical fusion incorporates additional inf...

Usage examples

See 2 usage examples →

Satellogic EarthView dataset

cogcomputer visionearth observationgeospatialimage processingsatellite imagerystac

The Satellogic EarthView dataset includes high-resolution satellite images captured over all continents. The dataset is organized in Hive partition format and hosted by AWS. The dataset can be accessed via the STAC browser or the AWS CLI. Each item of the dataset corresponds to a specific region and date, with some of the regions revisited for additional data. The dataset provides Top-of-Atmosphere (TOA) reflectance values across four spectral bands (Red, Green, Blue, Near-Infrared) at a Ground Sample Distance (GSD) of 1 meter, accompanied by comprehensive metadata such as off-nadir angles, sun elevation,...
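
In a Hive-partitioned layout, each path segment of the form name=value encodes one partition column. The sketch below shows how such an object key could be decoded. The partition names in the usage note (zone, region, date) are hypothetical placeholders chosen to mirror the "region and date" wording above, not the dataset's confirmed schema.

```python
def parse_hive_partitions(key):
    """Extract name=value partition segments from a Hive-style object key."""
    partitions = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        # Keep only segments that look like "name=value".
        if sep and name:
            partitions[name] = value
    return partitions
```

For a hypothetical key such as "data/zone=10N/region=example/date=2022-09-01/item.tif", the function would return the zone, region, and date values as a dictionary, which can then be used to filter items before downloading anything.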

Usage examples

Explore Satellogic EarthView in SageMaker Studio Lab (SMSL) by Javier Marin
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision by Velázquez, Diego and Rodríguez, Pau and Alonso, Sergio and Gonfaus, Josep M. and González, Jordi and Richarte, Gerardo and Marín, Javier and Bengio, Yoshua and Lacoste, Alexandre

See 2 usage examples →

SeeFar V0

biodiversityclimatecoastalearth observationenvironmentalgeospatialglobalmachine learningmappingnatural resourcesatellite imagerysustainability

A collection of multi-resolution satellite images from both public and commercial satellites. The dataset is specifically curated for training geospatial foundation models.

Usage examples

SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models by Lowman, J., Zheng, K. L., Fraser, R., The, J. V. G., & Valipour, M.
Getting Started with SeeFar for Multi-Resolution Geospatial Analysis by James Lowman - Coastal Carbon

See 2 usage examples →

Sentinel-1 SLC dataset for South and Southeast Asia, Taiwan, Korea and Japan

disaster response, earth observation, environmental, geospatial, satellite imagery, synthetic aperture radar

The S1 Single Look Complex (SLC) dataset contains Synthetic Aperture Radar (SAR) data in the C-Band wavelength. The SAR sensors are installed on a two-satellite (Sentinel-1A and Sentinel-1B) constellation orbiting the Earth with a combined revisit time of six days, operated by the European Space Agency. The S1 SLC data are a Level-1 product that collects radar amplitude and phase information in all-weather, day or night conditions, which is ideal for studying natural hazards and emergency response, land applications, oil spill monitoring, sea-ice conditions, and associated climate change effec...

Usage examples

Sentinel-1 Opendataset Wiki and Tutorials by Earth Observatory of Singapore
Rapid flood and damage mapping using synthetic aperture radar in response to Typhoon Hagibis, Japan by Cheryl W. J. Tay, Sang-Ho Yun, Shi Tong Chin, Alok Bhardwaj, Jungkyo Jung & Emma M. Hill

See 2 usage examples →

Somatic Mosaicism across Human Tissues (SMaHT)

bam, bioinformatics, biology, genetic, genomic, imaging, life sciences, whole genome sequencing

The Somatic Mosaicism across Human Tissues (SMaHT) project is an NIH Common Fund consortium (2023-) that aims to comprehensively characterize somatic variation ("mosaicism") in normal human tissues. While most genetic studies have relied on blood-derived DNA, SMaHT captures the full spectrum of DNA variation across cell types, tissues, and organs from phenotypically normal individuals to better understand the role of somatic mosaicism in human development, aging, and disease progression. Researchers in the consortium develop and apply experimental and computational methods, paired with th...

Usage examples

Somatic Mosaicism across Human Tissues Data Portal by SMaHT Data Analysis Center (DAC)
The Somatic Mosaicism across Human Tissues Network by Coorens T, Oh J, Choi Y, Lim N, Zhao B, Voshall A et al.

See 2 usage examples →

Sounds of Central African landscapes

biodiversity, biology, ecosystems, geospatial, land, life sciences, natural resource, survey

Archival soundscapes recorded in the rainforest landscapes of Central Africa, with a focus on the vocalizations of African forest elephants (Loxodonta cyclotis).

Usage examples

Listen to the rainforest chorus that's helping scientists protect African elephants by Amazon Staff
You can now hear rainforest sounds worldwide - here's why that matters by Rachel Fobar

See 2 usage examples →

State of Colorado Elevation Data

geospatial, imaging, mapping

The State of Colorado has gathered public historical elevation data.

Usage examples

See 2 usage examples →

Sub-Meter Canopy Tree Height of California in 2020 by CTrees.org

aerial imagery, cog, conservation, deep learning, earth observation, environmental, geospatial, image processing, land cover

Canopy Tree Height maps for California in 2020, created by applying a deep learning model to very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) of the United States Department of Agriculture (USDA).

Usage examples

Sub-Meter Tree Height Mapping of California using Aerial Images and LiDAR-Informed U-Net Model by Fabien H Wagner, Sophia Roberts, Alison L Ritz, Griffin Carter, Ricardo Dalagnol, Samuel Favrichon, Mayumi CM Hirye, Martin Brandt, Philippe Ciais and Sassan Saatchi
Canopy Height Unlocked: California Forest Resources Detailed in New Tree-level Map by Daniel Melling

See 2 usage examples →

TESS-SPOC

astronomy

The data products for the TESS-SPOC FFI targets are the same as for the TESS two-minute cadence targets: calibrated target pixel files, simple aperture photometry (SAP) flux time series, presearch data conditioning corrected (PDCSAP) flux time series, and cotrending basis vectors (CBV). Since TESS-SPOC relies on FFIs, data are sampled at the FFI cadence.

Usage examples

TESS Science Processing Operations Center FFI Target List Products by Caldwell et al.
TIKE: a free, Jupyter-based cloud platform to access and analyze MAST's AWS timeseries data by MAST Staff

See 2 usage examples →

TIGER Training

cancer, computational pathology, computer vision, deep learning, grand-challenge.org, histopathology, life sciences

"This dataset contains the training data for the Tumor InfiltratinG lymphocytes in breast cancER or TIGER challenge. TIGER is the first challenge on fully automated assessment of tumor-infiltrating lymphocytes (TILs) in breast cancer histopathology slides. TILs are proving to be an important biomarker in cancer patients as they can play a part in killing tumor cells, particularly in some types of breast cancer. Identifying and measuring TILs can help to better target treatments, particularly immunotherapy, and may result in lower levels of other more aggressive treatments, including chemo...

Usage examples

See 2 usage examples →

Tabula Sapiens

Biohub, biology, encyclopedic, genetic, genomic, health, life sciences, medicine, single-cell transcriptomics

Tabula Sapiens is a benchmark, first-draft human cell atlas of over 1.1M cells from 28 organs of 24 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Taking the organs from the same individual controls for genetic background, age, environment, and epigenetic effects, and allows detailed analysis and comparison of cell types that are shared between tissues. Our work creates a detailed portrait of cell types as well as their distribution and variation in gene expression across tissues and within the endothelial, epithelial, stromal and immune compartments. We...

Usage examples

The Tabula Sapiens: a multiple organ single cell transcriptomic atlas of humans by The Tabula Sapiens Consortium
Tabula Sapiens reveals transcription factor expression, senescence effects, and sex-specific features in cell types from 28 human organs and tissues by The Tabula Sapiens Consortium, Stephen R Quake

See 2 usage examples →

Terra Fusion Data Sampler

geospatial, satellite imagery

The Terra Basic Fusion dataset is a fused dataset of the original Level 1 radiances from the five Terra instruments. They have been fully validated to contain the original Terra instrument Level 1 data. Each Level 1 Terra Basic Fusion file contains one full Terra orbit of data and is typically 15 – 40 GB in size, depending on how much data was collected for that orbit. It contains instrument radiance in physical units; radiance quality indicator; geolocation for each IFOV at its native resolution; sun-view geometry; observation time; and other attributes/metadata. It is stored in HDF5, conformed to CF conventions, and ...

Usage examples

Basic Terra fusion product algorithm theoretical basis and data specifications by Zhao, Guangu; Yang, Muqun; Clipp, Landon; Gao, Yizhao; Lee, Joe H.
TerraFusion GitHub by University of Illinois

See 2 usage examples →

Transiting Exoplanet Survey Satellite (TESS)

astronomy

The Transiting Exoplanet Survey Satellite (TESS) is a multi-year survey that has discovered exoplanets in orbit around bright stars across the entire sky using high-precision photometry. The survey also enables a wide variety of stellar astrophysics, solar system science, and extragalactic variability studies. More information about TESS is available at MAST and the TESS Science Support Center.

Usage examples

TESS Archive Manual by MAST Staff
TIKE - a free, Jupyter-based cloud platform to access and analyze MAST's AWS TESS data by MAST Staff

See 2 usage examples →

Tropical Cyclone Precipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED)

atmosphere, earth observation, environmental, geophysics, geoscience, global, meteorological, model, netcdf, precipitation, satellite imagery, weather

The Tropical Cyclone Precipitation, Infrared, Microwave and Environmental Dataset (TC PRIMED) is a dataset centered around passive microwave observations of global tropical cyclones from low-Earth-orbiting satellites. TC PRIMED is a compilation of tropical cyclone data from various sources, including 1) tropical cyclone information from the National Oceanic and Atmospheric Administration (NOAA) National Weather Service National Hurricane Center (NHC) and Central Pacific Hurricane Center (CPHC) and the U.S. Department of Defense Joint Typhoon Warning Center, 2) low-Earth-orbiting satellite obse...

Usage examples

NOAA Center for Artificial Intelligence (NCAI) TC PRIMED Learning Journey Jupyter Notebooks by Naufal Razin (CSU/CIRA), Kathy Haynes (CSU/CIRA), Chris Slocum (NOAA/STAR)
tcprimedapi by TC PRIMED Development Team

See 2 usage examples →

UMASSD-FVCOM-GOM3-Hindcast

oceans

The Finite Volume Community Ocean Model (FVCOM) was used to simulate ocean water levels, velocity, temperature and salinity over a multi-decadal period (1984-present) in the waters of the Northeast US including the Gulf of Maine. The model was configured and run by Dr. Changsheng Chen, Director of the Marine Ecosystems Dynamics Modeling Laboratory in the School for Marine Science & Technology at the University of Massachusetts Dartmouth. The triangular mesh has a varying horizontal resolution from several hundred meters inshore to several kilometers offshore, and 45 terrain-following ...

Usage examples

An Unstructured Grid, Finite Volume, Three Dimensional, Primitive Equations Ocean Model with Application to Coastal Ocean and Estuaries by Changsheng Chen, Hedong Liu, and Robert C. Beardsley
FVCOM Explorer Notebook by Rich Signell

See 2 usage examples →

USGS COAWST (Coupled Ocean Atmosphere Wave and Sediment Transport) Forecast Model Archive, US East and Gulf Coasts

oceans

The COAWST modeling system has been used to simulate ocean, wave and sediment transport processes along the US East Coast and Gulf of Mexico. The grid has a horizontal resolution of approximately 5 km and is resolved with 16 vertical terrain-following levels. The model has been executed on a daily basis since August 2009 with outputs written every hour. This archive contains model output from 2009-08-21 to 2022-06-17.

Usage examples

COAWST Icechunk Creation and Explorer Notebooks by Rich Signell
Coupled-Ocean-Atmosphere-Wave-Sediment Transport (COAWST) Modeling System, U.S. Geological Survey Software Release, 23 April 2019 by Warner, J.C., Ganju, N.K., Sherwood, C.R., Kalra, T.S., Aretxabaleta, A., He, R., Zambon, J., and Kumar, N.

See 2 usage examples →

UniProt

bioinformatics, biology, chemistry, enzyme, graph, life sciences, molecule, protein, RDF, SPARQL

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB Swiss Institute of Bioinformatics and PIR are committed to the long-term preservation of the UniProt databases.

Usage examples

Exploring the UniProt protein knowledgebase with AWS Open Data and Amazon Neptune by Eric Greene, Rafa Xu, Yuan Shi (AWS)
UniProt SPARQL by Swiss-Prot Group at SIB Swiss Institute of Bioinformatics

See 2 usage examples →

Whiffle WINS50 Open Data on AWS

atmosphere, electricity, meteorological, model, sustainability, turbulence, weather, zarr

Large Eddy Simulation (LES) data of the Winds of the North Sea in 2050 (WINS50) project.

Usage examples

Investigating energy production and wake losses of multi-gigawatt offshore wind farms with atmospheric large-eddy simulation by Baas, P., Verzijlbergh, R., van Dorp, P., and Jonker, H
Whiffle WINS50 Tutorials by Whiffle

See 2 usage examples →

Zwicky Transient Facility (ZTF)

astronomy, object detection, parquet, survey

The Zwicky Transient Facility (ZTF) is a time-domain astronomy survey that uses the Palomar 48-inch Schmidt telescope and a custom-built wide-field camera to image the night sky in three photometric filters (g, r, and i). It is a fully automated survey aimed at a systematic exploration of optical transient phenomena. It completes a scan of the observable northern sky approximately every three nights.

Usage examples

See 2 usage examples →

3-Band Cryo Data | Wide-field Infrared Survey Explorer (WISE)

astronomy, imaging, satellite imagery, survey

The Wide-field Infrared Survey Explorer (WISE) was a NASA Medium Explorer satellite in low-Earth orbit that conducted an all-sky astronomical imaging survey over four infrared bands from 2010-2011. The 3-Band Cryo Data Release contains 3.4, 4.6 and 12 micron (W1, W2, W3) imaging data that were acquired between 6 Aug and 29 Sept 2010 while the detectors were cooled by the inner cryogen tank following the exhaustion of the outer tank.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

3DCoMPaT: Composition of Materials on Parts of 3D Things

computer vision, machine learning

3D CoMPaT is a richly annotated large-scale dataset of rendered compositions of Materials on Parts of thousands of unique 3D Models. This dataset primarily focuses on stylizing 3D shapes at part-level with compatible materials. Each object with the applied part-material compositions is rendered from four equally spaced views as well as four randomized views. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. We present two variations of this task and adapt state-of-the-art 2D/3D deep learning met...

Usage examples

3DCoMPaT: Composition of Materials on Parts of 3D Things by Yuchen Li, Ujjwal Upadhyay, Habib Slim, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka & Mohamed Elhoseiny

See 1 usage example →

A2D2: Audi Autonomous Driving Dataset

autonomous vehicles, computer vision, deep learning, lidar, machine learning, mapping, robotics

An open multi-sensor dataset for autonomous driving research. This dataset comprises semantically segmented images, semantic point clouds, and 3D bounding boxes. In addition, it contains unlabelled 360 degree camera images, lidar, and bus data for three sequences. We hope this dataset will further facilitate active research and development in AI, computer vision, and robotics for autonomous driving.

Usage examples

Autonomous Driving Data Service (ADDS) by Ajay Vohra, Amazon

See 1 usage example →

ABoVE: Bias-Corrected IMERG Monthly Precipitation for Alaska and Canada, 2000-2020

atmosphere, cog, earth observation, global, land, radar

This dataset is a modification to the Integrated Multi-satellitE Retrievals for GPM (IMERG) Final Run microwave-only, daily precipitation Version 06 data. It provides bias-corrected IMERG monthly precipitation data for Alaska and Canada from June 2000 through December 2020 in Cloud-Optimized GeoTIFF (*.tif) format. Data are provided in the units of mm/day. NASA's IMERG data product is one of the most advanced satellite precipitation products with a 0.1-degree spatial resolution and near global coverage. This dataset bias-corrected IMERG's HQprecipitation precipitation estimates, which ...
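
Because the monthly values are expressed as a rate in mm/day, a monthly accumulation has to be derived by multiplying by the number of days in the month. The sketch below illustrates that conversion under the assumption that each value is a monthly mean rate; the function name `monthly_total_mm` and the sample rate are hypothetical and not part of the dataset or its tooling.

```python
import calendar


def monthly_total_mm(rate_mm_per_day: float, year: int, month: int) -> float:
    """Convert a monthly mean precipitation rate (mm/day) to a monthly total (mm).

    Assumes the input is the mean daily rate for the given calendar month.
    """
    # monthrange returns (weekday of first day, number of days in month)
    days_in_month = calendar.monthrange(year, month)[1]
    return rate_mm_per_day * days_in_month


# Hypothetical example value: a mean rate of 2.5 mm/day in June 2000
print(monthly_total_mm(2.5, 2000, 6))
```

Using the calendar module rather than a fixed 30-day month keeps February and leap years consistent across the 2000-2020 record.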

Usage examples

Accessing Data through ORNL DAAC Web Services by ORNL DAAC

See 1 usage example →

AI2 Diagram Dataset (AI2D)

machine learning

4,817 illustrative diagrams for research on diagram understanding and associated question answering.

Usage examples

A Diagram is Worth a Dozen Images by Aniruddha Kembhavi, Michael Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi

See 1 usage example →

AI2 Meaningful Citations Data Set

csv, machine learning

630 paper annotations

Usage examples

Identifying Meaningful Citations by Marco Valenzuela, Vu A. Ha, Oren Etzioni

See 1 usage example →

AI2 Reasoning Challenge (ARC) 2018

csv, json, machine learning

7,787 multiple choice science questions and associated corpora

Usage examples

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge by Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord

See 1 usage example →

AIRS/Aqua L1B Infrared (IR) geolocated and calibrated radiances V005 (AIRIBRAD) at GES DISC

atmosphere, datacenter, earth observation, global, hdf, ice, land, metadata, opendap, orbit

WARNING: On 2021/09/23 the EOS Aqua executed a Deep Space Maneuver (DSM). In the DSM, the spacecraft is turned such that the normal Earth field of regard is deep space. The thermal impact of the DSM caused a shift of the centroids of spectral response functions (SRF) of about 1% of the width of the SRF, equivalent to a frequency shift of 9 parts per million. This shift is reflected in the “spectral_freq” parameter (observed frequencies) in the L1b v5 files for each 6 minute granule. The magnitude of the effect on brightness temperatures (BT) depends on the spectral gradient of each channel. Max...
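
As a rough illustration of the 9 parts-per-million figure, the sketch below computes the magnitude of the shift for a channel at a given nominal frequency. The function name `frequency_shift` and the example channel value of 1000.0 are assumptions for illustration only; they are not taken from the AIRS documentation, and the result is expressed in whatever unit the input frequency uses.

```python
PPM_SHIFT = 9e-6  # fractional shift quoted in the warning (9 parts per million)


def frequency_shift(channel_frequency: float, fraction: float = PPM_SHIFT) -> float:
    """Return the magnitude of the shift for a channel at the given frequency.

    The sign of the shift is not stated in the warning, so only the
    magnitude is computed here.
    """
    return abs(channel_frequency) * fraction


# Hypothetical channel at a nominal frequency of 1000.0 (arbitrary units)
print(frequency_shift(1000.0))
```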

Usage examples

How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.

See 1 usage example →

ARCO-OCEAN

analysis ready data, atmosphere, climate, hydrology, ice, machine learning, oceans, physics, zarr

ARCO-OCEAN is an analysis-ready cloud-optimized dataset providing physical properties of the ocean, waves, and sea ice for a period of about 28 years between the 1st of January 1993 and the 30th of June 2021. The dataset also includes atmospheric and hydrological variables that would be needed as boundary conditions and used to drive a numerical simulation. The dataset is the result of collecting, processing, merging and optimizing for the cloud different data sources, all retrospective analyses (reanalyses) or hindcasts of different Earth system components. The dataset has been designed with ...

Usage examples

Computing the Oceanic El Nino Index (ONI) with Xarray and ARCO-OCEAN by OGS

See 1 usage example →

ARPA-E PERFORM Forecast data

energy, environmental, geospatial, model, solar

The ARPA-E PERFORM Program is an ARPA-E funded program that aims to use time-coincident power and load data to develop innovative management systems that represent the relative delivery risk of each asset and balance the collective risk of all assets across the grid. A risk-driven paradigm allows operators to: (i) fully understand the true likelihood of maintaining a supply-demand balance and system reliability, (ii) optimally manage the system, and (iii) assess the true value of essential reliability services. This paradigm shift is critical for all power systems and is essential for grids wi...

Usage examples

ARPA-E PERFORM by ARPA-E

See 1 usage example →

ASTER Level 1T Precision Terrain Corrected Registered At-Sensor Radiance V004

cog, earth observation, global, land, orbit

The Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data contains calibrated at-sensor radiance, which corresponds with the ASTER Level 1B (AST_L1B) that has been geometrically corrected and rotated to a north-up UTM projection. The AST_L1T is created from a single resampling of the corresponding ASTER L1A (AST_L1A) product. The bands available in the AST_L1T depend on the bands in the AST_L1A and can include up to three Visible and Near Infrared (VNIR) bands, six Shortwave Infrared (SWIR) bands, and five Thermal Infrared (TIR...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

ATLAS/ICESat-2 L2A Global Geolocated Photon Data V006

atmosphere, datacenter, earth observation, global, hdf, ice, land, water

This data set (ATL03) contains height above the WGS 84 ellipsoid (ITRF2014 reference frame), latitude, longitude, and time for all photons downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory. The ATL03 product was designed to be a single source for all photon data and ancillary information needed by higher-level ATLAS/ICESat-2 products. As such, it also includes spacecraft and instrument parameters and ancillary data not explicitly required for ATL03. Read our doc on how to get AWS Creden...

Usage examples

Accessing and working with ICESat-2 data in the cloud by Andy Barrett, Jennifer Roebuck, Amy Steiker

See 1 usage example →

ATLAS/ICESat-2 L3A Land and Vegetation Height V006

atmosphere, datacenter, earth observation, global, hdf, ice, land

This data set (ATL08) contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m data segments, which typically contain more than 100 signal photons. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory. Read our doc on how to get AWS Credentials to retrieve this data: https://data.nsidc.earthdatacloud.nasa.gov/s3credentialsREADME

Usage examples

Accessing and working with ICESat-2 data in the cloud by Andy Barrett, Jennifer Roebuck, and Amy Steiker.

See 1 usage example →

Aging Mouse Brain Epigenetic

bam, cram, fastq, genetic, genomic, life sciences, transcriptomics, whole exome sequencing, whole genome sequencing

Aging is a major risk factor for neurodegenerative diseases, yet underlying epigenetic mechanisms remain unclear. Here, we generated a comprehensive single-nucleus cell atlas of brain aging across multiple brain regions, comprising 132,551 single-cell methylomes and 72,666 joint chromatin conformation-methylome nuclei. Integration with companion transcriptomic and chromatin accessibility data yielded a cross-modality taxonomy of 36 major cell types.

Usage examples

Cell-type-specific transposable element demethylation and TAD remodeling in the aging mouse brain by Zeng, Q., Wei, T., Klein, A., Bartlett, A., Liu, H., Nery, J.R., Castanon, R., Osteen, J., Johnson, N.D., Wang, W., Ding, W., Chen, H., Altshul, J., Kenworthy, M., Valadon, C., Owens, W., Wu, Z., Amaral, M.L., Song, Báez-Becerra, Tatiana, Cho, S., Chen, C., Willier, J., Cao, S., Rink, J., Lee, J., Barcoma, A., Arzavala, J., Emerson, N., Lu, Y.R., Ren, B., Behrens, Margarita, Ecker, J.R.

See 1 usage example →

All-Sky Data | Wide-field Infrared Survey Explorer (WISE)

astronomy, imaging, satellite imagery, survey

The Wide-field Infrared Survey Explorer (WISE) was a NASA Medium Explorer satellite in low-Earth orbit that conducted an all-sky astronomical imaging survey over four infrared bands from 2010-2011. The All-Sky Release includes all data taken during the WISE full cryogenic mission phase, 7 January 2010 to 6 August 2010, in the 3.4, 4.6, 12, and 22 micron bands (i.e., W1, W2, W3, W4) that were processed with improved calibrations and reduction algorithms.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

AllWISE Data | Wide-field Infrared Survey Explorer (WISE)

astronomy, imaging, object detection, parquet, satellite imagery, survey

The Wide-field Infrared Survey Explorer (WISE) was a NASA Medium Explorer satellite in low-Earth orbit that conducted an all-sky astronomical imaging survey over four infrared bands from 2010-2011. The AllWISE Data Release combines data from all cryogenic and post-cryogenic survey phases and provides a comprehensive view of the mid-infrared sky. The Images Atlas includes 18,240 FITS image sets at 3.4, 4.6, 12 and 22 microns. The Source Catalog contains position, apparent motion, and flux information for over 747 million objects detected on the Atlas Images.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

Allen Brain Observatory - Visual Coding AWS Public Data Set

electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing

The Allen Brain Observatory – Visual Coding is a large-scale, standardized survey of physiological activity across the mouse visual cortex, hippocampus, and thalamus. It includes datasets collected with both two-photon imaging and Neuropixels probes, two complementary techniques for measuring the activity of neurons in vivo. The two-photon imaging dataset features visually evoked calcium responses from GCaMP6-expressing neurons in a range of cortical layers, visual areas, and Cre lines. The Neuropixels dataset features spiking activity from distributed cortical and subcortical brain regions, c...

Usage examples

Use the Allen Brain Observatory – Visual Coding on AWS by Nika Keller, David Feng

See 1 usage example →

Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data

electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing

The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.

Usage examples

AIND Open Data Access by David Feng, Saskia de Vries

See 1 usage example →

Analysis Ready Sentinel-1 Backscatter Imagery

agriculture, cog, disaster response, earth observation, environmental, geospatial, satellite imagery, stac, synthetic aperture radar

The Sentinel-1 mission is a constellation of C-band Synthetic Aperture Radar (SAR) satellites from the European Space Agency launched since 2014. These satellites collect observations of radar backscatter intensity day or night, regardless of the weather conditions, making them enormously valuable for environmental monitoring. These radar data have been processed from original Ground Range Detected (GRD) scenes into a Radiometrically Terrain Corrected, tiled product suitable for analysis. This product is available over the Contiguous United States (CONUS) since 2017 when Sentinel-1 data becam...

Usage examples

Compare Cloud-Optimized Geotiffs from Amazon Sustainability Data Initiative (ASDI) hosted on S3 using SageMaker Studio Lab (SMSL) by Gianfranco Rapino

See 1 usage example →

Astrophysics Division Galaxy Morphology Benchmark Dataset

astronomy, machine learning, NASA SMD AI, satellite imagery

Hubble Space Telescope imaging data and associated identification labels for galaxy morphology derived from citizen scientist labels from the Galaxy Zoo: Hubble project.

Usage examples

Galaxy Zoo: morphological classifications for 120 000 galaxies in HST legacy imaging by Kyle W. Willett, Melanie A. Galloway, Steven P. Bamford, Chris J. Lintott, Karen L. Masters, Claudia Scarlata, B. D. Simmons, Melanie Beck, Carolin N. Cardamone, Edmond Cheung, Edward M. Edmondson, Lucy F. Fortson, Roger L. Griffith, Boris Häußler, Anna Han, Ross Hart, Thomas Melvin, Michael Parrish, Kevin Schawinski, R. J. Smethurst, Arfon M. Smith

See 1 usage example →

CANOE (Canadian Aquatic Navigation for Observation of the Environment) Dataset

autonomous vehicles, computer vision, lidar, radar, robotics

This autonomous marine navigation dataset includes data from a 360-degree Navtech radar, a 128-beam Ouster OS1 lidar with integrated IMU, a Teledyne Bumblebee stereo camera, Oculus M3000d imaging sonar, motor inputs, and GNSS. This dataset was collected on a lake and reservoir in Ontario, Canada. The intended purpose of this dataset is to enable the development and benchmarking of autonomous navigation algorithms in aquatic environments. In the future, we hope to release localization and odometry benchmarks.

Usage examples

Get to Know a Dataset: Canoe by Mia Thomas

See 1 usage example →

CHIMERA

cancer, computational pathology, computer vision, deep learning, digital pathology, grand-challenge.org, histopathology, life sciences, machine learning, medical image computing, medical imaging

This dataset contains the training data for the CHIMERA - Combining HIstology, Medical imaging (radiology) and molEcular data for medical pRognosis and diAgnosis challenge. The CHIMERA Challenge aims to advance precision medicine in cancer care by addressing the critical need for multimodal data integration. Despite significant progress in AI, integrating transcriptomics, pathology, and radiology across clinical departments remains a complex challenge. Clinicians are faced with large, heterogeneous datasets that are difficult to analyze effectively. AI has the potential to unify multimodal dat...

Usage examples

CHIMERA Challenge by Computational Pathology Group Radboudumc, Nijmegen

See 1 usage example →

CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data Model

amazon.science, bioinformatics, health, life sciences, natural language processing, us

DE-SynPUF is provided here as 1,000-person (1k), 100,000-person (100k), and 2,300,000-person (2.3m) data sets in the OMOP Common Data Model format. The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information. The purposes of the DE-SynPUF are to:

allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data;
train researchers on the use and complexity of conducting anal

...

Usage examples

Map clinical notes to the OMOP Common Data Model and healthcare ontologies using Amazon Comprehend Medical by James Wiggins
Create data science environments on AWS for health analysis using OHDSI by James Wiggins
OHDSIonAWS by James Wiggins
Predict patient health outcomes using OHDSI and machine learning on AWS by James Wiggins

See 4 usage examples →

COVID-19 Genome Sequence Dataset

bam, bioinformatics, biology, coronavirus, COVID-19, cram, fastq, genetic, genomic, health, life sciences, MERS, SARS, STRIDES, transcriptomics, virus, whole genome sequencing

This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in the parquet format, making it easier to search and filter using the Amazon Athena Service. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six ho...

Usage examples

Download SRA sequence data using Amazon Web Services (AWS) by NCBI SRA

See 1 usage example →

COVID-19 Open Research Dataset (CORD-19)

coronavirus, COVID-19, life sciences, MERS, SARS

Full-text and metadata dataset of COVID-19 and coronavirus-related research articles optimized for machine readability.

Usage examples

COVID-19 Open Research Dataset Challenge (CORD-19) by Kaggle

See 1 usage example →

Common Screens

encyclopedic, internet, natural language processing

A corpus of web screenshots and associated metadata covering over 70 million websites.

Usage examples

IAB Text Classification by Common Screens

See 1 usage example →

Community Earth System Model v2 ARISE (CESM2 ARISE)

atmosphere, climate, climate model, geospatial, ice, land, model, oceans, sustainability

Data from ARISE-SAI Experiments with CESM2

Usage examples

Coming Soon by NCAR

See 1 usage example →

Conformational Space of Short Peptides

amino acidbioinformaticsbiomolecular modelinglife sciencesmolecular dynamicsproteinstructural biology

Co-managed by Toyoko and the Structural Biology Group at the Universidad Nacional de Quilmes, this dataset allows us to explore the conformational space of all possible peptides using the 20 common amino acids. It consists of a collection of exhaustive molecular dynamics simulations of tripeptides and pentapeptides.

Usage examples

Intro to Conformational Space of Short Peptides by Sebastian Bassi and Virginia Gonzalez

See 1 usage example →

Corn Kernel Counting Dataset

agriculturecomputer visionmachine learning

Dataset associated with the March 2021 Frontiers in Robotics and AI paper "Broad Dataset and Methods for Counting and Localization of On-Ear Corn Kernels", DOI: 10.3389/frobt.2021.627009

Usage examples

Broad Dataset and Methods for Counting and Localization of On-Ear Corn Kernels by Jennifer Hobbs, Vachik Khachatryan, Barathwaj Anandan, Harutyun Hovhannisyan, David Wilson

See 1 usage example →

Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling Dataset

climatecoastaldisaster responseenvironmentalmeteorologicaloceanssustainabilitywaterweather

The University of Wisconsin Probabilistic Downscaling (UWPD) is a statistically downscaled dataset based on the Coupled Model Intercomparison Project Phase 5 (CMIP5) climate models. UWPD consists of three variables: daily precipitation, maximum temperature, and minimum temperature. The spatial resolution is 0.1° x 0.1° for the United States and southern Canada east of the Rocky Mountains.

The downscaling methodology is not deterministic. Instead, to properly capture unexplained variability and extreme events, the methodology predicts a spatially and temporally varying Probability Density Function (PDF) for each variable. Statistics such as the mean, me...

Usage examples

Assessment Report: Analysis of Impact of Nonstationary Climate on NOAA Atlas 14 Estimates by NOAA

See 1 usage example →

Crowdsourced Bathymetry

earth observationoceans

Community provided bathymetry data collected in collaboration with the International Hydrographic Organization.

Usage examples

Crowdsourced Bathymetry Data (CSB) Visualization by David Neufeld

See 1 usage example →

DWD ICON-EU - dynamical.org Icechunk Zarr

atmosphereclimateforecastmeteorologicalweatherzarr

ICON-EU is a regional weather forecast model operated by Deutscher Wetterdienst (DWD), Germany's national meteorological service. ICON-EU is a nested configuration of DWD's global ICON (Icosahedral Non-hydrostatic) model that provides high-resolution forecasts over Europe.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

DWD ICON-EU for...

Usage examples

DWD ICON-EU forecast, 5 day — Quickstart by dynamical.org

See 1 usage example →

Danish Meteorological Institute (DMI) Open Data Forecasts

air temperatureatmosphereforecastmeteorologicalmodelnear-surface air temperaturenear-surface relative humiditynear-surface specific humidityocean circulationocean currentsocean sea surface heightocean simulationocean velocityoceanstime series forecastingweather

DMI forecast data come from several models, each containing a different set of parameters relating to a specific domain such as ocean (WAM), storm flooding (DKSS), or weather (HARMONIE).

Usage examples

Guide to processing forecast data with Python by Danish Meteorological Institute

See 1 usage example →

Defense Meteorological Satellite Program (DMSP) Auroral Particle Flux

earth observationgeospatialsolarspace weather

The United States Air Force (USAF) Defense Meteorological Satellite Program (DMSP) SSJ precipitating particle instrument measures the in-situ total flux and energy distribution of electrons and ions in low Earth orbit. These precipitating particles are of interest for space weather operations and research, in part because they produce aurora during normal and very strong geomagnetic storms. This dataset contains both sensor-level raw data (as detailed in Redmon et al. 2017) and a high-level machine-learning-ready data product.

Usage examples

ssj_latbin: Create and read ML-ready DMSP particle precipitation data by Liam Kilcommons

See 1 usage example →

Discrete Reasoning Over the content of Paragraphs (DROP)

machine learningnatural language processing

The DROP dataset contains 96k question-answer pairs (QAs) over 6.7k paragraphs, split between train (77k QAs), development (9.5k QAs), and a hidden test partition (9.5k QAs).

Usage examples

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs by Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

See 1 usage example →

ECMWF AIFS ENS - dynamical.org Icechunk Zarr

atmosphereclimateforecastmeteorologicalweatherzarr

The Artificial Intelligence Forecasting System (AIFS) is a data-driven forecast model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). AIFS-ENS is the ensemble configuration of AIFS, containing 51 ensemble members. AIFS is trained on ECMWF's ERA5 reanalysis and ECMWF's operational numerical weather prediction (NWP) analyses.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

ECMWF A...

Usage examples

ECMWF AIFS ENS forecast — Quickstart by dynamical.org

See 1 usage example →

ECMWF IFS ENS - dynamical.org Icechunk Zarr

atmosphereclimateforecastmeteorologicalweatherzarr

The Integrated Forecasting System (IFS) is a global forecast model developed by ECMWF. ENS is an ensemble configuration of IFS, containing 51 ensemble members. IFS consists of a numerical model of the Earth system, which includes an atmospheric model at its heart, coupled with models of other Earth system components such as the ocean. The data assimilation system combines the latest weather observations with a recent forecast to obtain the best possible estimate of the current state of the Earth system.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

Usage examples

ECMWF IFS ENS forecast, 15 day, 0.25 degree — Quickstart by dynamical.org

See 1 usage example →

ERA5-for-WRF Open Data on AWS

atmosphereelectricitymeteorologicalmodelsustainabilityweather

ERA5 reanalysis data on AWS, preprocessed for use with the Weather Research and Forecasting (WRF) model.

Usage examples

ERA5-for-WRF Tutorials by Veer Renewables

See 1 usage example →

East Coast Community Ocean Forecast System (ECCOFS)

coastalenvironmentalforecastmarineoceansweather

The East Coast Community Ocean Forecast System (ECCOFS) is a data assimilating ocean analysis and forecast system being developed by Rutgers University, the University of California Santa Cruz, Fathom Science Inc., and the National Ocean Service (NOS) of NOAA for transition to operations at NCEP in 2028. The ECCOFS domain spans the eastern seaboard of North America and Intra-Americas Seas from the Grand Banks of Newfoundland in the north to the mouth of the Orinoco River, Venezuela, in the south. ECCOFS will complement the existing WCOFS (West Coast Operational Forecast System) to achieve complete forecast coverage of U.S. territori...

Usage examples

ROMS model outputs in AWS can be readily visualized and analyzed in Python using the xroms tools by XROMS

See 1 usage example →

End of Term Web Archive Dataset

archivesinternetnatural language processingweb archive

The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, 2020 and 2024. Data from these web crawls have been made openly available in several formats in this dataset.

Usage examples

Moving the End of Term Web Archive to the Cloud to Encourage Research Use and Reuse by Mark Phillips and Sawood Alam

See 1 usage example →

Ensemble Meteorological Dataset for Planet Earth, EM-Earth

atmospheremeteorologicalnear-surface air temperaturenetcdfprecipitation

EM-Earth provides data for precipitation, mean air temperature, air temperature range, and dew-point temperature at 0.1° spatial resolution over global land areas from 1950 to 2019. EM-Earth provides hourly/daily deterministic estimates, and daily probabilistic estimates (25 ensemble members), to meet the diverse requirements of hydrometeorological applications.

Usage examples

EM-Earth: The Ensemble Meteorological Dataset for Planet Earth by Guoqiang Tang, Martyn P. Clark & Simon Michael Papalexiou

See 1 usage example →

Essential-Web v1.0: 24T tokens of organized web data

machine learningnatural language processingtext analysisweb archive

A 24-trillion-token dataset in which every document is annotated with a twelve-category taxonomy covering topic, format, content complexity, and quality.

Usage examples

Essential-Web v1.0: 24T tokens of organized web data by Andrew Hojel, Michael Pust, Tim Romanski, Yash Vanjani, Ritvik Kapila, Mohit Parmar et al.

See 1 usage example →

GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1

earth observationecosystemsglobalhdflandland coverlidaropendap

This dataset contains Global Ecosystem Dynamics Investigation (GEDI) Level 4A (L4A) Version 2 predictions of the aboveground biomass density (AGBD; in Mg/ha) and estimates of the prediction standard error within each sampled geolocated laser footprint. In this version, the granules are in sub-orbits. The algorithm setting group selection used for GEDI02_A Version 2 has been modified for Evergreen Broadleaf Trees in South America to reduce false positive errors resulting from the selection of waveform modes above ground elevation as the lowest mode. The footprints are located within the global ...

Usage examples

Searching and Downloading GEDI L4A Dataset by Rupesh Shrestha

See 1 usage example →

Geosnap Data, Center for Geospatial Sciences

demographicsgeospatialurban

This bucket contains multiple datasets (as Quilt packages) created by the Center for Geospatial Sciences (CGS) at the University of California-Riverside. The data in this bucket contains the following:

Tabular and geographic data from the US Census
Land Cover imagery collected from Multi-Resolution Land Characteristics Consortium
Road network data processed from OpenStreetMap

Usage examples

Geosnap User Guide by Eli Knaap

See 1 usage example →

Global Biodiversity Information Facility (GBIF) Species Occurrences

biodiversitybioinformaticsconservationearth observationlife sciences

The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken a...

Usage examples

GBIF and Apache-Spark on AWS tutorial by John Waller

See 1 usage example →

HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0

atmospherecogdatacenterearth observationgeospatialglobalicelandmetadataorbitsatellite imagerystacsurface watertileswaterxml

The Harmonized Landsat Sentinel-2 (HLS) project provides consistent surface reflectance (SR) and top of atmosphere (TOA) brightness data from a virtual constellation of satellite sensors. The Operational Land Imager (OLI) is housed aboard the joint NASA/USGS Landsat 8 and Landsat 9 satellites, while the Multi-Spectral Instrument (MSI) is mounted aboard Europe’s Copernicus Sentinel-2A, Sentinel-2B, and Sentinel-2C satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, illumination and view angle normalization...

Usage examples

Getting Started with Cloud-Native HLS Data in Python by Mahsa Jami, Erik A. Bolch, Cole K. Krehbiel, Aaron M. Friesz, Brianna M. Lind

See 1 usage example →

HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0

cogdatacenterearth observationgeospatialglobalhdficelandmetadataorbitsatellite imagerystacsurface watertileswaterxml

The Harmonized Landsat Sentinel-2 (HLS) project provides consistent surface reflectance data from the Operational Land Imager (OLI) aboard the joint NASA/USGS Landsat 8 satellite and the Multi-Spectral Instrument (MSI) aboard Europe’s Copernicus Sentinel-2A, Sentinel-2B, and Sentinel-2C satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, illumination and view angle normalization, and spectral bandpass adjustment. The HLSS30 product provides 30-m Nadir Bidirectio...

Usage examples

Getting Started with Cloud-Native HLS Data in Python by Mahsa Jami, Erik A. Bolch, Cole K. Krehbiel, Aaron M. Friesz, Brianna M. Lind

See 1 usage example →

High Resolution Population Density Maps + Demographic Estimates by CIESIN and Meta

aerial imagerydemographicsdisaster responsegeospatialimage processingmachine learningpopulationsatellite imagery

Population data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines CIESIN’s Gridded Population of the World using machine learning models on high-resolution worldwide Maxar satellite imagery. CIESIN population counts aggregated from worldwide census data are allocated to blocks where imagery appears to contain buildings.

Usage examples

Investigating environmental characteristics of US cities using publicly available ASDI datasets by Darren Ko

See 1 usage example →
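
To give a sense of scale for the 1 arcsecond blocks mentioned in the description above, the sketch below converts an angular block size into an approximate ground distance. It assumes a spherical Earth with a mean radius of 6,371 km, a simplification that is not part of the dataset description, and the function name is hypothetical.

```python
import math
from typing import Tuple

# Mean Earth radius in metres; a spherical approximation (assumption).
EARTH_RADIUS_M = 6_371_000


def arcsecond_block_size_m(latitude_deg: float, arcseconds: float = 1.0) -> Tuple[float, float]:
    """Return the approximate (north-south, east-west) extent in metres of an angular block."""
    # One arcsecond is 1/3600 of a degree.
    angle_rad = math.radians(arcseconds / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    # Meridians converge toward the poles, so east-west extent shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = arcsecond_block_size_m(0.0)
print(f"1 arcsecond at the equator: about {ns:.1f} m by {ew:.1f} m")
```

Under these assumptions a block is roughly 30 m on a side at the equator and becomes narrower in the east-west direction at higher latitudes.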

High-Order Accurate Direct Numerical Simulation of Flow over a MTU-T161 Low Pressure Turbine Blade

computational fluid dynamicsgreen aviationlow-pressure turbineturbulence

The archive comprises snapshot, point-probe, and time-average data produced via a high-fidelity computational simulation of turbulent air flow over a low pressure turbine blade, which is an important component in a jet engine. The simulation was undertaken using the open source PyFR flow solver on over 5000 Nvidia K20X GPUs of the Titan supercomputer at Oak Ridge National Laboratory under an INCITE award from the US DOE. The data can be used to develop an enhanced understanding of the complex three-dimensional unsteady air flow patterns over turbine blades in jet engines. This could in turn le...

Usage examples

High-Order Accurate Direct Numerical Simulation of Flow over a MTU-T161 Low Pressure Turbine Blade by A. S. Iyer, Y. Abe, B. C. Vermeire, P. Bechlars, R. D. Baier, A. Jameson, F. D. Witherden, and P. E. Vincent

See 1 usage example →

Human Cancer Models Initiative (HCMI) Cancer Model Development Center

cancergenomiclife sciencesSTRIDESwhole genome sequencing

The Human Cancer Models Initiative (HCMI) is an international consortium that is generating novel, next-generation, tumor-derived culture models annotated with genomic and clinical data. HCMI-developed models and related data are available as a community resource. The NCI is contributing to the initiative by supporting four Cancer Model Development Centers (CMDCs). CMDCs are tasked with producing next-generation cancer models from clinical samples. The cancer models include tumor types that are rare, originate from patients from underrepresented populations, lack precision therapy, or lack ca...

Usage examples

Genomic Data Commons by National Cancer Institute

See 1 usage example →

Human PanGenomics Project

cramfast5fastqgeneticgenomiclife sciences

This dataset includes sequencing data, assemblies, and analyses for the offspring of ten parent-offspring trios.

Usage examples

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes by Shafin et al (2020)

See 1 usage example →

ICEYE Synthetic Aperture Radar (SAR) Open Dataset

computer visiondisaster responseearth observationgeospatialimage processingsatellite imagerystacsynthetic aperture radar

ICEYE operates the world’s largest constellation of synthetic aperture radar (SAR) satellites, delivering unmatched access to persistent, high-resolution Earth observation data regardless of time of day or weather conditions. The ICEYE Open Dataset makes a curated selection of SAR imagery publicly available to promote research, innovation, and education in the geospatial community. ICEYE’s constellation enables rapid revisit rates and flexible imaging modes, unlocking insights into natural disasters, climate monitoring, infrastructure, and more. Learn more at www.iceye.com.

Usage examples

ICEYE Product Documentation by ICEYE

See 1 usage example →

IGP Brick Kilns Bangladesh

air qualityenergyenvironmentalinfrastructure

This dataset includes detailed information about brick kilns, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Brick Kiln Data and Emission Calculation by APAD

See 1 usage example →

IGP Brick Kilns India

air qualityenergyenvironmentalinfrastructure

This dataset includes detailed information about brick kilns, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Brick Kiln Data and Emission Calculation by APAD

See 1 usage example →

IGP Brick Kilns Pakistan

air qualityenergyenvironmentalinfrastructure

This dataset includes detailed information about brick kilns, their locations, capacities, emissions, and other relevant attributes around the Pakistani Gangetic Plain.

Usage examples

Analyzing Brick Kiln Data and Emission Calculation by APAD

See 1 usage example →

IGP Cement Plants

air qualityenvironmentalindustrialinfrastructure

This dataset includes detailed information about cement plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Cement Plant Data by APAD

See 1 usage example →

IGP Paper and Pulp Plant

air qualityenvironmentalindustrialinfrastructure

This dataset includes detailed information about paper and pulp plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Paper and Pulp Plant Data by APAD

See 1 usage example →

IGP Power Generation Plant

air qualityenergyenvironmentalinfrastructure

This dataset includes detailed information about power generation plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Power Generation Plant Data and Emission Calculation by APAD

See 1 usage example →

IGP Steel Plants

air qualityenvironmentalindustrialinfrastructure

This dataset includes detailed information about steel plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Steel Plant Data by APAD

See 1 usage example →

IGP Waste Management Data

air qualityenvironmentalinfrastructuresustainability

This dataset includes detailed information about waste management sites, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Waste Management Data by APAD

See 1 usage example →

IWMI DIWASA Rainfed and Irrigated Cropland Map for Africa

agriculturecropland partitioningirrigated croplandland coverland userainfed cropland

A framework integrating the Budyko model has been developed to distinguish between rainfed and irrigated cropland areas across Africa. This expands on remote sensing land cover products available for agricultural water studies in Africa and thereby helps address the need for deeper insights into cropland patterns. Validation against an independent dataset revealed an overall accuracy of 73% with high precision and specificity scores. These results validate the framework’s effectiveness in identifying irrigated areas while minimizing errors in misclassifying rainfed areas as irrigated.

Usage examples

Cropland percentage by iwmiwaplus
Water use in Awash basin by A. Owusu, K. Akpoti, M. Leh, N. Velpuri
A framework for disaggregating remote-sensing cropland into rainfed and irrigated classes at continental scale by Owusu, A., Kagone, S., Leh, M., Velpuri, N. M., Gumma, M. K., Ghansah, B., Thilina-Prabhath, P., Akpoti, K., Mekonnen, K., Tinonetsana, P., & Mohammed, I.
Rainfed and Irrigated Cropland Areas for Africa by Owusu, A., Kagone, S., Leh, M., and Velpuri, N.M.

See 4 usage examples →

Image classification - fast.ai datasets

computer visiondeep learningmachine learning

Some of the most important datasets for image classification research, including CIFAR 10 and 100, Caltech 101, MNIST, Food-101, Oxford-102-Flowers, Oxford-IIIT-Pets, and Stanford-Cars. This is part of the fast.ai datasets collection hosted by AWS for the convenience of fast.ai students. See the documentation link for citation and license details for each dataset.

Usage examples

Oxford-IIIT Pet Image Classification on Amazon SageMaker by AWS

See 1 usage example →

Japan Prefectures, 3D Point Cloud Data

disaster responseelevationgeospatialjapaneselandlidarmapping

This dataset comprises high-precision 3D point cloud data that covers all prefectures throughout Japan. The data is produced through aerial laser surveys, airborne laser bathymetry, and mobile mapping systems, representing the culmination of many years of dedicated effort. This data will be visualized and analyzed for use in infrastructure maintenance, disaster prevention measures, and autonomous vehicle driving.

Usage examples

Tutorial of handling LAS format point cloud data by AIGID

See 1 usage example →

Kanagawa, 3D Point Cloud Data

disaster responseelevationgeospatialjapaneselandlidarmapping

This dataset comprises high-precision 3D point cloud data that encompasses the entire Kanagawa prefecture in Japan. The data is produced through aerial laser survey, airborne laser bathymetry and mobile mapping systems, the culmination of many years of dedicated effort. This data will be visualized and analyzed for use in infrastructure maintenance, disaster prevention measures and autonomous vehicle driving.

Usage examples

Tutorial of handling LAS format point cloud data by AIGID

See 1 usage example →

Knowledge Portal Network Bottom-line Genetic Associations

geneticgenome wide association studylife sciences

At the Knowledge Portal Network, we aggregate and analyze genetic association results for a wide range of diseases and traits. For any given disease, a large number of individual genetic association datasets may have been generated. To make these results more interpretable, we meta-analyze all datasets for each phenotype, using a method that we term "bottom-line integrative analysis". Here we provide the bottom-line summary statistic files for public download.

Usage examples

Leveraging type 1 diabetes human genetic and genomic data in the T1D knowledge portal by Kudtarkar P, Costanzo MC, Sun Y, Jang D, Koesterer R, Mychaleckyj JC, et al.
Tutorial: Use cases for the Knowledge Portal Network bottom-line genetic associations by Jason Flannick
The Type 2 Diabetes Knowledge Portal: An open access genetic resource dedicated to type 2 diabetes and related traits by Costanzo MC, von Grotthuss M, Massung J, Jang D, Caulkins L, Koesterer R, et al.
Cardiovascular Disease Knowledge Portal: A Community Resource for Cardiovascular Disease Research by Costanzo MC, Roselli C, Brandes M, Duby M, Hoang Q, Jang D, et al.

See 4 usage examples →

Korea Meteorological Administration (KMA) GK-2A Satellite Data

agriculturedisaster responseearth observationgeospatialmeteorologicalsatellite imageryweather

The Geo-KOMPSAT-2A (GK2A) is the new-generation geostationary meteorological satellite (located at 128.2°E) of the Korea Meteorological Administration (KMA). The main mission of the GK2A is to observe atmospheric phenomena over the Asia-Pacific region. The Advanced Meteorological Imager (AMI) on GK2A scans the Earth full disk every 10 minutes and the Korean Peninsula area every 2 minutes at high spatial resolution in 4 visible channels and 12 infrared channels. In addition, the AMI is capable of flexible target-area scanning, which is useful for monitoring severe weather events such as typhoon...

Usage examples

GK2A Full Fact Sheet by KMA

See 1 usage example →

LGND Clay v1.5 Sentinel-2

computer visionearth observationimagingmachine learningsatellite imagery

A global dataset of Clay v1.5 embeddings for Sentinel-2.

Usage examples

Applying geospatial foundation models to reforestation site selection by Amazon Sustainability Data Initiative

See 1 usage example →

LOFAR ELAIS-N1 cycle 2 observations on AWS

astronomyimagingsurvey

These data correspond to the International LOFAR Telescope observations of the sky field ELAIS-N1 (16:10:01 +54:30:36) during cycle 2 of observations. There are 11 runs of about 8 hours each, plus the corresponding observations of the calibration targets before and after the target field. The data are measurement sets (MS) containing the cross-correlated data and metadata, divided into 371 frequency sub-bands per target centred at ~150 MHz.

Usage examples

Calibration of LOFAR ELAIS-N1 data in the Amazon cloud by J. Sabater

See 1 usage example →

Land/Sea static mask relevant to IMERG precipitation 0.1x0.1 degree V2 (GPM_IMERG_LandSeaMask) at GES DISC

atmospherecoastaldatacentergloballandmetadatanetcdfopendap

Version 2 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 2. This land sea mask originated from the NOAA group at SSEC in the 1980s. It was originally produced at 1/6 deg resolution, and then regridded for the purposes of GPCP, TMPA, and IMERG precipitation products. NASA code 610.2, Terrestrial Information Systems Laboratory, restructured this land sea mask to match the IMERG grid, and converted the file to CF-compliant netCDF4. Version 2 was created in May, 2019 to resolve detected inaccuracies in coastal regions. Users sho...

Usage examples

How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha

See 1 usage example →

Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)

analyticsblockchainclimatecommercecopyright monitoringcsvfinancial marketsgovernancegovernment spendingjsonmarket datasocioeconomicstatisticstransparencyxml

The Legal Entity Identifier (LEI) is a 20-character, alphanumeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO). It connects to key reference information that enables clear and unique identification of legal entities participating in financial transactions. Each LEI contains information about an entity’s ownership structure and thus answers the questions of ‘who is who’ and ‘who owns whom’. Simply put, the publicly available LEI data pool can be regarded as a global directory, which greatly enhances transparency in the global ma...

Usage examples

AWS hosts new open dataset to help businesses identify climate finance risks and investments by AWS Public Sector Blog Team

See 1 usage example →
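
As a rough illustration of the identifier format described above, the sketch below checks that a string is 20 alphanumeric characters long and verifies its final two check digits. The description only states the length and the ISO 17442 basis. The ISO/IEC 7064 MOD 97-10 check-digit scheme applied here, the function names, and the sample prefix are assumptions added for illustration, so the standard itself should be consulted before relying on this.

```python
import re

# Shape assumed here: 18 uppercase alphanumeric characters followed by 2 check digits.
_LEI_PATTERN = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def _to_number(text: str) -> int:
    """Map 0-9 to themselves and A-Z to 10-35, then read the digits as one integer."""
    return int("".join(str(int(ch, 36)) for ch in text))


def lei_check_digits(base18: str) -> str:
    """Compute the two check digits for an 18-character prefix (MOD 97-10)."""
    remainder = _to_number(base18 + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(candidate: str) -> bool:
    """Return True if the string has the expected shape and its check digits match."""
    if not _LEI_PATTERN.fullmatch(candidate):
        return False
    return _to_number(candidate) % 97 == 1


# Hypothetical 18-character prefix, used only to demonstrate the arithmetic.
prefix = "ABCD00EXAMPLE12345"
example = prefix + lei_check_digits(prefix)
print(example, is_valid_lei(example))
```

The sample identifier is constructed rather than taken from the LEI data pool, so it demonstrates only the checksum arithmetic and does not refer to a real legal entity.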

LongBench - cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approaches

bambenchmarkbioinformaticscancerfastqlife scienceslong read sequencingshort read sequencingsingle-cell transcriptomicsvcf

LongBench is a comprehensive benchmark dataset of the latest long-read transcriptomics technologies from Oxford Nanopore (ON) and Pacific Biosciences, alongside a comparison with next-generation sequencing from Illumina. We generated bulk and single-cell libraries from lung cancer cell lines which include different cancer subtypes to capture real biological variation. To further compare and assess sequencing platform performance, Sequins and SIRVs (Set 4) synthetic spike-ins have been included.

Usage examples

Benchmarking long-read DE gene and transcript analysis with edgeR by Yupei You

See 1 usage example →

Longitudinal Nutrient Deficiency

aerial imageryagriculturecomputer visiondeep learningmachine learning

Dataset associated with the 2021 AAAI paper "Detection and Prediction of Nutrient Deficiency Stress using Longitudinal Aerial Imagery". The dataset contains 3 image sequences of aerial imagery from 386 farm parcels which have been annotated for nutrient deficiency stress.

Usage examples

Detection and Prediction of Nutrient Deficiency Stress using Longitudinal Aerial Imagery by Saba Dadsetan, Gisele Rose, Naira Hovakimyan, Jennifer Hobbs

See 1 usage example →

MAN TruckScenes

autonomous vehiclescomputer visiondeep learningGPSIMUlidarlogisticsmachine learningobject detectionobject trackingperceptionradarroboticstransportation

A large-scale multimodal dataset for autonomous trucking. Sensor data was recorded with a heavy truck from MAN equipped with 6 lidars, 6 radars, 4 cameras and a high-precision GNSS. MAN TruckScenes allows the research community, for the first time, to come into contact with truck-specific challenges such as trailer occlusions, novel sensor perspectives, and terminal environments. It comprises more than 740 scenes of 20 s each, recorded under a wide range of environmental conditions. Bounding boxes are available for 27 object classes, 15 attributes, and a range of more than 230 m. The scenes are t...

Usage examples

TruckScenes devkit tutorial by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin
PyPi package by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin
MANTruckScenes: A multimodal dataset for autonomous trucking in diverse conditions by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin, et al
TruckScenes devkit by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin

See 4 usage examples →

MODIS MYD13A1, MOD13A1, MYD11A1, MOD11A1, MCD43A4

agriculturedisaster responsegeospatialnatural resourcesatellite imagery

Data from the Moderate Resolution Imaging Spectroradiometer (MODIS), managed by the U.S. Geological Survey and NASA. Five products are included: MCD43A4 (MODIS/Terra and Aqua Nadir BRDF-Adjusted Reflectance Daily L3 Global 500 m SIN Grid), MOD11A1 (MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1 km SIN Grid), MYD11A1 (MODIS/Aqua Land Surface Temperature/Emissivity Daily L3 Global 1 km SIN Grid), MOD13A1 (MODIS/Terra Vegetation Indices 16-Day L3 Global 500 m SIN Grid), and MYD13A1 (MODIS/Aqua Vegetation Indices 16-Day L3 Global 500 m SIN Grid). MCD43A4 has global coverage, all...

Usage examples

Astraea Earth OnDemand by Astraea, Inc.

See 1 usage example →

MODIS/Aqua Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V061

datacenterearth observationgeospatialglobalhdficelandopendapsatellite imagery

The MYD09GA Version 6.1 product provides an estimate of the surface spectral reflectance of Aqua Moderate Resolution Imaging Spectroradiometer (MODIS) Bands 1 through 7, corrected for atmospheric conditions such as gasses, aerosols, and Rayleigh scattering. Provided along with the 500 meter (m) surface reflectance, observation, and quality bands are a set of ten 1 kilometer observation bands and geolocation flags. The reflectance layers from the MYD09GA are used as the source data for many of the MODIS land products.

Known Issues

Prior to the Aqua MODIS launch, Band 6 exhibited several anomalous detectors. Band 6 performanc...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

MODIS/Aqua Surface Reflectance Daily L2G Global 250m SIN Grid V061

datacenter, earth observation, geospatial, global, hdf, ice, land, opendap

The MYD09GQ Version 6.1 product provides an estimate of the surface spectral reflectance of Aqua Moderate Resolution Imaging Spectroradiometer (MODIS) 250 meter (m) bands 1 and 2, corrected for atmospheric conditions such as gasses, aerosols, and Rayleigh scattering. Along with the 250 m bands are the Quality Assurance (QA) layer and five observation layers. This product is intended to be used in conjunction with the quality and viewing geometry information of the 500 m product (MYD09GA).

Known Issues

Prior to the Aqua MODIS launch, Band 6 exhibited several anomalous detectors. Band 6 performance degraded seriously after la

...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

MODIS/Terra Calibrated Radiances 5-Min L1B Swath 500m

atmosphere, datacenter, earth observation, environmental, global, hdf, metadata, opendap, orbit

The MODIS/Terra Calibrated Radiances 5Min L1B Swath 500m data set contains calibrated and geolocated at-aperture radiances for 7 discrete bands located in the 0.45 to 2.20 micron region of the electromagnetic spectrum. These data are generated from the MODIS Level 1A scans of raw radiance and in the process converted to geophysical units of W/(m^2 um sr). Additional data are provided including quality flags, error estimates and calibration data. Visible, shortwave infrared, and near infrared measurements are only made during the daytime (except band 26), while radiances for the thermal infrared region (bands 20-25, 27-36) are measured continuously. Channels 1 and 2 have 250 m resolution, channels 3 through 7 have 500 m resolution. However, for the MODIS L1B 500 m product, ...

Usage examples

MODIS Level 1B - Calibrated Radiances - Natural Color RGB - 500m by EUMETSAT

See 1 usage example →

MODIS/Terra Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V061

atmosphere, earth observation, evapotranspiration, geospatial, global, hdf, land, land cover, opendap, water

The MOD16A2 Version 6.1 Evapotranspiration/Latent Heat Flux product is an 8-day composite dataset produced at 500 meter (m) pixel resolution. The algorithm used for the MOD16 data product collection is based on the logic of the Penman-Monteith equation, which includes inputs of daily meteorological reanalysis data along with Moderate Resolution Imaging Spectroradiometer (MODIS) remotely sensed data products such as vegetation property dynamics, albedo, and land cover. Provided in the MOD16A2 product are layers for composited Evapotranspiration (ET), Latent Heat Flux (LE), Potential ET (PET) and Potential LE (PLE) along with a quality control layer. Two low resolution browse images, ET and LE, are also available for each MO...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →
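
For reference, the Penman-Monteith equation mentioned in the MOD16A2 description is commonly written in the textbook form below. This is a sketch of the general equation only; the exact formulation and inputs used by the MOD16 algorithm are not given in this listing and may differ.

```latex
\lambda E = \frac{\Delta\,(R_n - G) + \rho_a c_p \,\dfrac{e_s - e_a}{r_a}}
                 {\Delta + \gamma\left(1 + \dfrac{r_s}{r_a}\right)}
```

Here \lambda E is the latent heat flux, \Delta the slope of the saturation vapor pressure curve, R_n the net radiation, G the soil heat flux, \rho_a the air density, c_p the specific heat of air, e_s - e_a the vapor pressure deficit, r_a the aerodynamic resistance, r_s the surface resistance, and \gamma the psychrometric constant.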

MODIS/Terra Surface Reflectance 8-Day L3 Global 500m SIN Grid V061

earth observation, geospatial, global, hdf, land, opendap, satellite imagery

The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra MOD09A1 Version 6.1 product provides an estimate of the surface spectral reflectance of Terra MODIS Bands 1 through 7 corrected for atmospheric conditions such as gasses, aerosols, and Rayleigh scattering. Along with the seven 500 meter (m) reflectance bands are two quality layers and four observation bands. For each pixel, a value is selected from all the acquisitions within the 8-day composite period. The criteria for the pixel choice include cloud and solar zenith. When several acquisitions meet the criteria the pixel with the minimum channel 3 (blue) value is used.

Known Issues

For complete information about known issues please refer to the

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →
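
The MOD09A1 description says that, among acquisitions meeting the cloud and solar zenith criteria, the one with the minimum channel 3 (blue) value is used. A minimal sketch of that selection rule for a single pixel might look as follows. The `Observation` structure, the `passes_screen` flag, and the choice to return `None` when nothing qualifies are all assumptions made for illustration; this is not the production compositing code.

```python
from typing import NamedTuple, Optional, Sequence


class Observation(NamedTuple):
    """One acquisition of a single pixel in the 8-day window (hypothetical structure)."""
    day: int             # day index within the composite period
    blue: float          # band 3 (blue) surface reflectance
    passes_screen: bool  # True if the cloud / solar zenith criteria are met


def select_composite(observations: Sequence[Observation]) -> Optional[Observation]:
    """Return the screened observation with the lowest blue reflectance."""
    candidates = [obs for obs in observations if obs.passes_screen]
    if not candidates:
        return None  # assumption: no value when no acquisition qualifies
    return min(candidates, key=lambda obs: obs.blue)


# Made-up values for illustration.
window = [
    Observation(day=0, blue=0.12, passes_screen=True),
    Observation(day=3, blue=0.04, passes_screen=True),
    Observation(day=5, blue=0.02, passes_screen=False),  # fails the screen, so excluded
]
chosen = select_composite(window)
print(chosen.day)  # prints 3
```

Low blue reflectance is a common proxy for clear, haze-free conditions, which is presumably why it serves as the tie-breaker.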

MODIS/Terra Surface Reflectance Daily L2G Global 1km and 500m SIN Grid V061

datacenter, earth observation, geospatial, global, hdf, ice, land, opendap, satellite imagery

The MOD09GA Version 6.1 product provides an estimate of the surface spectral reflectance of Terra Moderate Resolution Imaging Spectroradiometer (MODIS) Bands 1 through 7, corrected for atmospheric conditions such as gasses, aerosols, and Rayleigh scattering. Provided along with the 500 meter (m) surface reflectance, observation, and quality bands are a set of ten 1 kilometer (km) observation bands and geolocation flags. The reflectance layers from the MOD09GA are used as the source data for many of the MODIS land products.

Known Issues

For complete information about known issues please refer to the Details →

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

MODIS/Terra Surface Reflectance Daily L2G Global 250m SIN Grid V061

datacenter, earth observation, geospatial, global, hdf, ice, land, opendap

The MOD09GQ Version 6.1 product provides an estimate of the surface spectral reflectance of Terra Moderate Resolution Imaging Spectroradiometer (MODIS) 250 meter (m) bands 1 and 2, corrected for atmospheric conditions such as gasses, aerosols, and Rayleigh scattering. Along with the 250 m surface reflectance bands are the Quality Assurance (QA) layer and five observation layers. This product is intended to be used in conjunction with the quality and viewing geometry information of the 500 m product (MOD09GA).

Known Issues

For complete information about known issues please refer to the MODIS/VIIRS Land Quality A

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061

datacenter, earth observation, geospatial, global, hdf, ice, land, opendap, satellite imagery

The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Indices (MOD13Q1) Version 6.1 data are generated every 16 days at 250 meter (m) spatial resolution as a Level 3 product. The MOD13Q1 product provides two primary vegetation layers. The first is the Normalized Difference Vegetation Index (NDVI) which is referred to as the continuity index to the existing National Oceanic and Atmospheric Administration-Advanced Very High Resolution Radiometer (NOAA-AVHRR) derived NDVI. The second vegetation layer is the Enhanced Vegetation Index (EVI), which has improved sensitivity over high biomass regions. The algorithm chooses the best available pixel value from all the acquisitions from the 16 day period. The cri...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →
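
The NDVI layer named in the MOD13Q1 description follows the standard normalized-difference definition, (NIR - red) / (NIR + red). The sketch below applies that definition to plain reflectance values. The sample reflectances are made up, and the function does not handle the product's stored integer scaling or fill values, which are not described in this listing.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when red + NIR reflectance is zero")
    return (nir - red) / total


# Illustrative reflectance values, not taken from any MODIS file.
dense_vegetation = ndvi(red=0.05, nir=0.45)
bare_soil = ndvi(red=0.25, nir=0.30)
print(round(dense_vegetation, 2), round(bare_soil, 2))  # prints 0.8 0.09
```

Values near 1 indicate dense green vegetation, while values near 0 are typical of bare ground.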

MODIS/Terra+Aqua BRDF/Albedo Albedo Daily L3 Global - 500m V061

earth observation, geospatial, global, hdf, land, opendap, satellite imagery, tiles

The Moderate Resolution Imaging Spectroradiometer (MODIS) MCD43A3 Version 6.1 Albedo Model dataset is produced daily using 16 days of Terra and Aqua MODIS data at 500 meter (m) resolution. Data are temporally weighted to the ninth day of the 16-day period, which is reflected in the Julian date in the file name. Users are urged to use the band specific quality flags to isolate the highest quality full inversion results for their own science applications as described in the User Guide. The MCD43A3 provides black-sky albedo (directional hemispherical reflectance) and white-sky albedo (bihemispherical reflectance) data at local solar noon fo...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →
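
Because the MCD43 file names carry the Julian date of the ninth day of the retrieval period, it can help to turn that date into a calendar date and an approximate input window. The sketch below assumes the usual MODIS land product naming pattern with an `AYYYYDDD` token, and assumes the window runs from 8 days before to 7 days after the nominal date. The file name is hypothetical.

```python
import re
from datetime import date, timedelta

# "AYYYYDDD" token: four-digit year followed by three-digit day of year.
_DATE_TOKEN = re.compile(r"\.A(\d{4})(\d{3})\.")


def nominal_date(file_name: str) -> date:
    """Convert the year and day-of-year token in a file name to a calendar date."""
    match = _DATE_TOKEN.search(file_name)
    if match is None:
        raise ValueError(f"no AYYYYDDD token in {file_name!r}")
    year, day_of_year = int(match.group(1)), int(match.group(2))
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def retrieval_window(center: date) -> tuple[date, date]:
    """Assumed 16-day window that has `center` as its ninth day."""
    return center - timedelta(days=8), center + timedelta(days=7)


# Hypothetical file name, for illustration only.
name = "MCD43A3.A2021009.h08v05.061.0000000000000.hdf"
center = nominal_date(name)
start, end = retrieval_window(center)
print(center, start, end)  # prints 2021-01-09 2021-01-01 2021-01-16
```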

MODIS/Terra+Aqua BRDF/Albedo Model Parameters Daily L3 Global - 500m V061

earth observation, geospatial, global, hdf, land, opendap, tiles

The Moderate Resolution Imaging Spectroradiometer (MODIS) MCD43A1 Version 6.1 Bidirectional Reflectance Distribution Function and Albedo (BRDF/Albedo) Model Parameters dataset is produced daily using 16 days of Terra and Aqua MODIS data at 500 meter (m) resolution. Data are temporally weighted to the ninth day of the retrieval period which is reflected in the Julian date in the file name. MCD43A1 provides the three model weighting parameters (isotropic, volumetric, and geometric) used to derive the Albedo (MCD43A3) and Nadir BRDF-Adjusted Reflectance (NBAR) (MCD43A4) products. Users are urged to use the band specific qualit...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

MODIS/Terra+Aqua BRDF/Albedo Nadir BRDF-Adjusted Ref Daily L3 Global - 500m V061

earth observation, geospatial, global, hdf, land, opendap, satellite imagery, tiles

The Moderate Resolution Imaging Spectroradiometer (MODIS) MCD43A4 Version 6.1 Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset is produced daily using 16 days of Terra and Aqua MODIS data at 500 meter (m) resolution. The view angle effects are removed from the directional reflectances, resulting in a stable and consistent NBAR product. Data are temporally weighted to the ninth day which is reflected in the Julian date in the file name. Users are urged to use the band specific quality flags to isolate the highest quality full inversion results for their own science applications as described in the User Guide. The MCD43A4 provides NBAR and simplified mandatory quality layers for MODIS bands 1 through 7. Essential q...

Usage examples

Download Files from S3 Using boto3 by LPDAAC

See 1 usage example →

Marginal Build Emissions Rates (MBERs) for Electricity

carbon, climate, csv, electricity, energy, energy modeling, environmental

The Climate TRACE coalition has developed and maintains free global hourly Build Margin data, also known as MBERs, that are compliant with the Greenhouse Gas Protocol's Project Protocol electricity sector guidance, Guidelines for Grid-Connected Electricity Projects ("GHGP Guidelines").

Usage examples

MBER Orientation and Tutorial by Climate TRACE

See 1 usage example →

Mars Spectrometry 2: Gas Chromatography for the Sample Analysis at Mars Data (SAM) Instrument

analytics, archives, deep learning, machine learning, NASA SMD AI, planetary

NASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collect data and build evidence towards answering if Mars ever had livable environmental conditions. These rovers can collect rock and soil samples and can take measurements that can be used to determine their chemical makeup.

Because communication between rovers and Earth is severely constrained, with limited transfer rates and short daily communication windows, scientists have a limited time to analyze the data and make difficult inferences about the chemistry in order to prioritize the next operations and send thos...

Usage examples

Mars Spectrometry: Mars Spectrometry 2 (Challenge Results) by DrivenData team and NASA partners

See 1 usage example →

Mars Spectrometry: Detect Evidence for Past Habitability

analytics, archives, deep learning, machine learning, NASA SMD AI, planetary

NASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collect data and build evidence towards answering if Mars ever had livable environmental conditions. These rovers can collect rock and soil samples and can take measurements that can be used to determine their chemical makeup.

Because communication between rovers and Earth is severely constrained, with limited transfer rates and short daily communication windows, scientists have a limited time to analyze the data and make difficult inferences about the chemistry in order to prioritize the next operations and send thos...

Usage examples

Mars Spectrometry: Detect Evidence for Past Habitability (Challenge Results) by DrivenData team and NASA partners

See 1 usage example →

Met Office UK Earth System Model (UKESM1) ARISE-SAI geoengineering experiment data

atmosphere, climate, climate model, CMIP6, geospatial, ice, land, model, oceans, sustainability

Data from the UK Earth System Model (UKESM1) ARISE-SAI experiment. The UKESM1 ARISE-SAI experiment explores the impacts of geoengineering via the injection of sulphur dioxide (SO2) into the stratosphere in order to keep global mean surface air temperature near 1.5 °C above the pre-industrial climate. Data includes a five-member ensemble of simulations with SO2 injection plus a five-member ensemble of SSP2-4.5 simulations from CMIP6 to serve as a reference data set.

Usage examples

Examples coming soon by MetOffice

See 1 usage example →

Met Office UK Land Surface Observations

atmosphere, csv, geospatial, precipitation, weather

Land surface weather observations for 31 parameters from over 250 locations across the Met Office UK land observation network. The data is available as CSV files. You can use it to monitor the latest weather affecting a specific location so you can plan for your business or operations.

The observations are produced every minute and transmitted to the Amazon Registry of Open Data every hour. They’re available for a rolling 7-day period (168 hours).
All locations in the observation network are within the bounding box:

-15 (West)
48 (South)
5 (East)
62 (North)

On average they are about 40 km apart,

...

Usage examples

UK Land Surface Observations Usage Examples by Met Office

See 1 usage example →
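
The bounding box and the rolling 7-day (168 hour) availability quoted in the Land Surface Observations description can be expressed as two small checks. This is a sketch under stated assumptions: the box edges are treated as inclusive, and the retention test simply compares timestamps; how the rolling window is actually enforced in the bucket is not described here.

```python
from datetime import datetime, timedelta, timezone

# Bounding box quoted in the dataset description, in degrees.
WEST, SOUTH, EAST, NORTH = -15.0, 48.0, 5.0, 62.0
RETENTION = timedelta(hours=168)  # rolling 7-day period


def in_network_bbox(lat: float, lon: float) -> bool:
    """True if a point lies inside the observation network's bounding box."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


def still_available(observed_at: datetime, now: datetime) -> bool:
    """True if an observation time falls within the rolling retention window."""
    return timedelta(0) <= now - observed_at <= RETENTION


print(in_network_bbox(51.5, -0.1))  # roughly London: prints True
print(in_network_bbox(40.4, -3.7))  # roughly Madrid: prints False

now = datetime(2024, 1, 8, 12, tzinfo=timezone.utc)
print(still_available(datetime(2024, 1, 2, 12, tzinfo=timezone.utc), now))    # 6 days old: prints True
print(still_available(datetime(2023, 12, 31, 12, tzinfo=timezone.utc), now))  # 8 days old: prints False
```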

Met Office UK Marine Observations

atmosphere, csv, geospatial, precipitation, weather

Marine surface weather observations for 32 parameters from 69 locations across the Met Office marine observation network. Observations are available for a rolling 7-day period (168 hours). The data is available as CSV files.

The data comes from moored buoys, light vessels and ships with automatic weather stations onboard. Buoys and light vessels are static and you can view their locations on the Met Office Marine Observations page. You can use the data to monitor the latest weather affecting a specific marine location so you can plan for your business or operations.

Join the Met Office researc...

Usage examples

UK Marine Observations Usage Examples by Met Office

See 1 usage example →

Met Office UK Radar Observations on a 2-year rolling archive

atmosphere, geospatial, h5, hdf5, precipitation, radar, weather

The United Kingdom Composite, Surface Rain Rate Estimate is an international radar composite produced by the Met Office (UK). This is a composite surface rain rate estimate product, derived from radar reflectivity and provided in HDF5 format, from stations covering the United Kingdom.

Usage examples

Radar by Met Office

See 1 usage example →

Multi-robot, Multi-Sensor, Multi-Environment Event Dataset (M3ED)

autonomous vehicles, computer vision, deep learning, event camera, global shutter camera, GNSS, GPS, h5, hdf5, IMU, lidar, machine learning, perception, robotics, RTK

M3ED is the first multi-sensor event camera (EC) dataset focused on high-speed dynamic motions in robotics applications. M3ED provides high-quality synchronized data from multiple platforms (car, legged robot, UAV), operating in challenging conditions such as off-road trails, dense forests, and performing aggressive flight maneuvers. M3ED also covers demanding operational scenarios for EC, such as high egomotion and multiple independently moving objects. M3ED includes high-resolution stereo EC (1280×720), grayscale and RGB cameras, a high-quality IMU, a 64-beam LiDAR, and RTK localization.

Usage examples

M3ED: Multi-Robot, Multi-Sensor, Multi-Environment Event Dataset by Chaney K, Cladera F, et al.

See 1 usage example →

MultiCoNER Datasets

natural language processing

MultiCoNER 1 is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions. MultiCoNER 2 is a large multilingual dataset (12 languages) for fine-grained Named Entity Recognition. Its fine-grained taxonomy contains 36 NE classes, representing real-world challenges for NER, where named entities, apart from the surface form, context represents a critical role in disti...

Usage examples

Dynamic Gazetteer Integration in Multilingual Models for Cross-Lingual and Cross-Domain Named Entity Recognition by Besnik Fetahu, Anjie Fang, Oleg Rokhlenko and Shervin Malmasi
MultiCoNER: A Large-scale Multilingual Dataset for Complex Named Entity Recognition by Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko
Gazetteer Enhanced Named Entity Recognition for Code-Mixed Web Queries by Besnik Fetahu, Anjie Fang, Oleg Rokhlenko and Shervin Malmasi
GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input by Tao Meng, Anjie Fang, Oleg Rokhlenko and Shervin Malmasi

See 4 usage examples →

My School Today

education, geospatial, infrastructure, schools

This database provides estimates of walking travel time of school-aged populations to schools recorded in OpenStreetMap. Population counts of male and female students are sorted into 3 groups of travel time - under 30 minutes, 30-60 minutes, and over 60 minutes. It covers the African continent and is aggregated by first-level administrative divisions.
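
The three travel-time groups described above can be sketched as a simple binning function. The handling of the exact 30 and 60 minute boundaries is an assumption (both are placed in the middle group here), and the sample walking times are made up; the dataset itself ships pre-aggregated counts.

```python
from collections import Counter


def travel_time_group(minutes: float) -> str:
    """Assign a walking travel time to one of the three groups in the description."""
    if minutes < 0:
        raise ValueError("travel time cannot be negative")
    if minutes < 30:
        return "under 30 minutes"
    if minutes <= 60:  # assumption: both boundary values fall in the middle group
        return "30-60 minutes"
    return "over 60 minutes"


# Hypothetical walking times in minutes for a handful of students.
sample_times = [12, 30, 45, 60, 61, 95]
counts = Counter(travel_time_group(t) for t in sample_times)
print(counts["under 30 minutes"], counts["30-60 minutes"], counts["over 60 minutes"])  # prints 1 3 2
```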

Usage examples

Map a School by SDGs Today

See 1 usage example →

NCEP/CPC L3 Half Hourly 4km Global (60S - 60N) Merged IR V1 (GPM_MERGIR) at GES DISC

atmosphere, climate, datacenter, forecast, global, metadata, netcdf, opendap

These data originate from NOAA/NCEP. The NOAA Climate Prediction Center/NCEP/NWS is making the data available originally in binary format, in a weekly rotating archive. The NASA GES DISC acquires the binary files as they become available, converts them into CF (Climate and Forecast) convention-compliant netCDF-4 format, and stores the product in a permanent archive. The original record started from February, 2000, but in June, 2025 it was extended back to January, 1998. The leading edge of data availability is delayed by about 24 hours from real-time to abide by international data exchange agreements between NOAA and EUMETSAT (the METEOSAT data providers). The data ...

Usage examples

How to Access GES DISC Data Using Python by James Acker, Jerome Alfred, Helen Amos, Chris Battisto, Thomas Hearty, Alexis Hunzinger, Lena Iredell, Christoph Keller, Binita KC, Carlee Loeser, Ariana Louise, Kristan Morgan, Dieu My T. Nguyen, Dana Ostrenga, Xiaohua Pan, Kanan Patel, Brianna R. Pagán, Andrey Savtchenko, Elliot Sherman, Suhung Shen, Jian Su, Joseph Wysk, Rupesh Shrestha.

See 1 usage example →

NEOWISE Post-Cryo Data | Wide-field Infrared Survey Explorer (WISE)

astronomy, imaging, satellite imagery, survey

The Wide-field Infrared Survey Explorer (WISE) was a NASA Medium Explorer satellite in low-Earth orbit that conducted an all-sky astronomical imaging survey over four infrared bands from 2010-2011. The NEOWISE Post-Cryo Data Release contains 3.4 and 4.6 micron (W1 and W2) imaging data that were acquired between 29 September 2010 and 1 February 2011 following the exhaustion of the inner and outer cryogen tanks.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

NEOWISE Reactivation Data | Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE)

astronomy, imaging, object detection, parquet, satellite imagery, survey

The Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE) is a NASA Medium-class Explorer satellite in low-Earth orbit conducting an all-sky astronomical imaging survey over two infrared bands. The NEOWISE Reactivation mission began in 2013 when the original WISE satellite was brought out of hibernation to learn more about the population of near-Earth objects and comets that could pose an impact hazard to the Earth. The data is also used to study a wide range of astrophysical phenomena in the time domain including brown dwarfs, supernovae and active galactic nuclei.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

NOAA Coastal Lidar Data

climate, disaster response, elevation, geospatial, lidar, stac

Lidar (light detection and ranging) is a technology that can measure the 3-dimensional location of objects, including the solid earth surface. The data consists of a point cloud of the positions of solid objects that reflected a laser pulse, typically from an airborne platform. In addition to the position, each point may also be attributed by the type of object it reflected from, the intensity of the reflection, and other system dependent metadata. The NOAA Coastal Lidar Data is a collection of lidar projects from many different sources and agencies, geographically focused on the coastal areas of the ...

Usage examples

OpenTopography access and processing of NOAA Coastal Lidar Data by OpenTopography

See 1 usage example →

NOAA Global Surface Summary of Day

agriculture, climate, environmental, natural resource, regulatory, weather

Global Surface Summary of the Day is derived from the Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are, at the time of this writing, at the Version 8 software level. Over 9000 stations' data are typically available. The daily elements included in the dataset (as available from each station) are:
Mean t...

Usage examples

ML Demo: Predicting Air Quality w/ ASDI NOAA + OpenAQ Datasets in SageMaker Studio Lab (SMSL) by Aaron Soto

See 1 usage example →

NOAA HYSPLIT-compatible meteorological data archives

agriculture, climate, disaster response, environmental, meteorological, weather

The HYSPLIT model is a complete system for computing simple air parcel trajectories, as well as complex transport, dispersion, chemical transformation, and deposition simulations. HYSPLIT continues to be one of the most extensively used atmospheric transport and dispersion models in the atmospheric sciences community. A common application is a back trajectory analysis to determine the origin of air masses and establish source-receptor relationships. HYSPLIT has also been used in a variety of simulations describing the atmospheric transport, dispersion, and deposition of pollutants and hazardou...

Usage examples

HYSPLIT Basic Tutorial by NOAA ARL HYSPLIT

See 1 usage example →

NOAA Integrated Surface Database (ISD)

agriculture, climate, meteorological, weather

The Integrated Surface Database (ISD) consists of global hourly and synoptic observations compiled from numerous sources into a gzipped fixed width format. ISD was developed as a joint activity within Asheville's Federal Climate Complex. The database includes over 35,000 stations worldwide, with some having data as far back as 1901, though the data show a substantial increase in volume in the 1940s and again in the early 1970s. Currently, there are over 14,000 "active" stations updated daily in the database. The total uncompressed data volume is around 600 gigabytes; however, it ...

Usage examples

NOAA Integrated Surface Database (ISD) Example Notebook by Zac Flamig

See 1 usage example →

NOAA MRMS - dynamical.org Icechunk Zarr

atmosphere, climate, forecast, meteorological, weather, zarr

The NOAA Multi-Radar/Multi-Sensor System (MRMS) integrates data from multiple radars and radar networks, surface observations, numerical weather prediction (NWP) models, and climatology to generate seamless, high spatio-temporal resolution mosaics at low latency focused on hail, wind, tornado, quantitative precipitation estimations, convection, icing, and turbulence.

These datasets have been translated to cloud-optimized Icechunk Zarr format by dynamical.org.

NOAA MRMS CONUS analysis, hourly - Hourly precipitation analysis from the Multi-Radar Multi-Sensor (MRMS) system operated by NOAA NWS

...

Usage examples

NOAA MRMS CONUS analysis, hourly — Quickstart by dynamical.org

See 1 usage example →

NOAA Multi-Year Reanalysis of Remotely Sensed Storms (MYRORSS)

agriculture, earth observation, meteorological, natural resource, sustainability, weather

The Multi-Year Reanalysis of Remotely Sensed Storms (MYRORSS) consists of radar reflectivity data run through the Multi-Radar, Multi-Sensor (MRMS) framework to create a three-dimensional radar volume on a quasi-Cartesian latitude-longitude grid across the entire contiguous United States. The radar reflectivity grid is also combined with hourly forecast model analyses to produce derived products such as echo top heights and hail size estimates. Radar Doppler velocity data was also processed into two azimuthal shear layer products. The source radar data was from the NEXRAD Level-II archive and t...

Usage examples

Comprehensive radar data for the contiguous United States: Multi-year reanalysis of remotely sensed storms. by Williams, S. S., K. L. Ortega, T. M. Smith, and A. E. Reinhart

See 1 usage example →

NOAA National Digital Forecast Database (NDFD)

agriculture, climate, meteorological, weather

The National Digital Forecast Database (NDFD) is a suite of gridded forecasts of sensible weather elements (e.g., cloud cover, maximum temperature). Forecasts prepared by NWS field offices working in collaboration with the National Centers for Environmental Prediction (NCEP) are combined in the NDFD to create a seamless mosaic of digital forecasts from which operational NWS products are generated. The most recent data is under the opnl and expr prefixes. A copy is also placed under the wmo prefix. The wmo prefix is structured like so: wmo/<parameter>/<year>/<month>/<day...

Usage examples

NDFD Product Spreadsheet (excel file) by NOAA MDL

See 1 usage example →

NOAA National Water Model Short-Range Forecast

agriculture, climate, disaster response, environmental, transportation, weather

The National Water Model (NWM) is a water resources model that simulates and forecasts water budget variables, including snowpack, evapotranspiration, soil moisture and streamflow, over the entire continental United States (CONUS). The model, launched in August 2016, is designed to improve the ability of NOAA to meet the needs of its stakeholders (forecasters, emergency managers, reservoir operators, first responders, recreationists, farmers, barge operators, and ecosystem and floodplain managers) by providing expanded accuracy, detail, and frequency of water information. It is operated by NOA...

Usage examples

Harmonic Oscillator Seasonal Trend (HOST) Model for Hydrological Drought Pattern Identification and Analysis by K. Raczyński, J. Dyer

See 1 usage example →

NOAA S-111 Surface Water Currents Data

oceans, water

S-111 is a data and metadata encoding specification that is part of the S-100 Universal Hydrographic Data Model, an international standard for hydrographic data. This collection of data contains surface water currents forecast guidance from NOAA/NOS Operational Forecast Systems, a set of operational hydrodynamic nowcast and forecast modeling systems, for various U.S. coastal waters and the Great Lakes. The collection also contains surface current forecast guidance output from the NCEP Global Real-Time Ocean Forecast System (GRTOFS) for some offshore areas. These datasets are encoded as HDF-5 f...

Usage examples

NOAA Precision Marine Navigation Program: Developing Next-Gen Data Svcs for the Maritime Community by NOAA

See 1 usage example →

NOAA U.S. Climate Normals

agriculture, climate, meteorological, sustainability, weather

The U.S. Climate Normals are a large suite of data products that provide information about typical climate conditions for thousands of locations across the United States. Normals act both as a ruler to compare today’s weather and tomorrow’s forecast, and as a predictor of conditions in the near future. The official normals are calculated for a uniform 30 year period, and consist of annual/seasonal, monthly, daily, and hourly averages and statistics of temperature, precipitation, and other climatological variables from almost 15,000 U.S. weather stations.

NCEI generates the official U.S. norma...

Usage examples

Investigating environmental characteristics of US cities using publicly available ASDI datasets using SageMaker Studio Lab (SMSL) by Darren Ko

See 1 usage example →

NOAA Wave Ensemble Reforecast

agriculture, climate, meteorological, weather

This is a 20-year global wave reforecast generated by the WAVEWATCH III model (https://github.com/NOAA-EMC/WW3) forced by GEFSv12 winds (https://noaa-gefs-retrospective.s3.amazonaws.com/index.html). The wave ensemble was run with one cycle per day (at 03Z), a spatial resolution of 0.25° × 0.25° and a temporal resolution of 3 hours. There are five ensemble members (control plus four perturbed members) and, once a week (Wednesdays), the ensemble is expanded to eleven members. The forecast range is 16 days and, once a week (Wednesdays), it extends to 35 days. More information about the wave modeling, wave grids and calibration can be found in the WAVEWATCH III regtest ww3_ufs1.3 (Details →

Usage examples

Examples and scripts to support users to download, visualize, and process the WAVEWATCH III output files by Ricardo Campos

See 1 usage example →
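
The weekly pattern in the Wave Ensemble Reforecast description (five members and 16 days on most days, eleven members and 35 days on Wednesdays) can be summarized in a small helper. This is only a sketch of the schedule as described; the function name, the returned dictionary keys, and the example dates are illustrative and say nothing about how files are laid out in the bucket.

```python
from datetime import date

WEDNESDAY = 2  # date.weekday() counts Monday as 0


def cycle_config(cycle_date: date) -> dict:
    """Ensemble size and forecast range for the single daily 03Z cycle."""
    extended = cycle_date.weekday() == WEDNESDAY
    return {
        "cycle_hour_utc": 3,
        "members": 11 if extended else 5,
        "forecast_days": 35 if extended else 16,
        "step_hours": 3,
    }


# Arbitrary example dates: 2 January 2019 was a Wednesday.
print(cycle_config(date(2019, 1, 2))["members"])        # prints 11
print(cycle_config(date(2019, 1, 3))["forecast_days"])  # prints 16
```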

NOAA nClimGrid and Livneh Gridded Historical Climate Observation Thresholds

agriculture, climate, environmental, meteorological, weather

Livneh and nClimGrid are gridded observed historical climatology data that were used in the LOCA2 and STAR-ESDM downscaling process of global climate models as part of the 5th National Climate Assessment. The original Livneh and nClimGrid daily temperature and precipitation observations have been converted to a series of decision-relevant thresholds as part of the U.S. Climate Resilience Information System (CRIS). These thresholds, such as days with extreme heat or precipitation, have been calculated to match the future projections from LOCA2 and STAR, also available in CRIS.

Usage examples

U.S. CRIS Resources by U.S. CRIS

See 1 usage example →

NOAA/PMEL Ocean Climate Stations Moorings

climate, environmental, oceans, weather

The mission of the Ocean Climate Stations (OCS) Project is to make meteorological and oceanic measurements from autonomous platforms. Calibrated, quality-controlled, and well-documented climatological measurements are available on the OCS webpage and the OceanSITES Global Data Assembly Centers (GDACs), with near-realtime data available prior to release of the complete, downloaded datasets.

OCS measurements served through the Big Data Program come from OCS high-latitude moored buoys located in the Kuroshio Extension (32°N 145°E) and the Gulf of Alaska (50°N 145°W). Initiated in 2004 and 20...

Usage examples

OCS publications - All OCS-relevant publications are updated at the URL below. by PMEL

See 1 usage example →

NSF NCAR Curated ECMWF Reanalysis 5 (ERA5)

atmosphere, climate, data assimilation, forecast, geoscience, geospatial, land, meteorological, model, netcdf, weather

NSF NCAR is providing a NetCDF-4 structured version of the 0.25 degree atmospheric ECMWF Reanalysis 5 (ERA5) to the AWS ODSP. ERA5 is produced using high-resolution forecasts (HRES) at 31 kilometer resolution (one fourth the spatial resolution of the operational model) and a 62 kilometer resolution ten member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF's Integrated Forecast System (IFS) with 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in...

Usage examples

The ERA5 global reanalysis by Hersbach et al 2020

See 1 usage example →

NYUMets Brain Dataset

biology, cancer, computer vision, health, image processing, imaging, life sciences, machine learning, magnetic resonance imaging, medical imaging, medicine, neurobiology, neuroimaging, segmentation

This dataset contains 8,000+ brain MRIs of 2,000+ patients with brain metastases.

Usage examples

Longitudinal deep neural networks for assessing metastatic brain cancer on a massive open benchmark. by Link et al (2023)

See 1 usage example →

National Herbarium of Israel

biodiversitybiologyclimatedigital preservationenvironmentalimage processingimaginglife sciences

Our collection encompasses approximately one million vascular plant specimens from the Mediterranean and Middle East biodiversity hotspot, representing flora from Israel, Jordan, Hermon, Sinai, Egypt, the Caucasus, Arabia, North Africa, and throughout the Mediterranean basin. This scientifically significant repository includes published voucher specimens, original specimens used for "Flora Palaestina" illustrations, and critical references for the Israeli gene bank collections. The ongoing digitization process captures high-resolution images of each specimen while systematically inco...

Usage examples

How to use AWS S3 bucket to explore our public images dataset by Eyal Ben-Hur

See 1 usage example →

Natural Earth

earth observationgeospatialglobalmappingpopulationtiles

Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.

Usage examples

Natural Earth Vector (2009) by Nathaniel Vaughn Kelso, Tom Patterson

See 1 usage example →

New Jersey Statewide Digital Aerial Imagery Catalog

aerial imagerycogearth observationgeospatialimagingmapping

The New Jersey Office of GIS, NJ Office of Information Technology manages a series of 11 digital orthophotography and scanned aerial photo maps collected at various years ranging from 1930 to 2017. Each year’s worth of imagery is available as Cloud Optimized GeoTIFF (COG) files, and some years are also available as compressed MrSID and/or JP2 files. Additionally, each year of imagery is organized into a tile grid scheme covering the entire geography of New Jersey. Many years share the same tiling grid while others have unique grids as defined by the project at the time.

Usage examples

Visualize Imagery Changes by stephanie.bosits@tech.nj.gov

See 1 usage example →

New Jersey Statewide LiDAR

elevationgeospatiallidarmapping

Elevation datasets in New Jersey have been collected over several years as several discrete projects. Each project covers a geographic area, which is a subsection of the entire state, and has differing specifications based on the available technology at the time and project budget. The geographic extent of one project may overlap that of a neighboring project. Each of the 18 projects contains deliverable products such as LAS (Lidar point cloud) files, unclassified/classified, tiled to cover project area; relevant metadata records or documents, most adhering to the Federal Geographic Data Com...

Usage examples

3D Visualization by stephanie.bosits@tech.nj.us

See 1 usage example →

OPERA Surface Displacement from Sentinel-1 validated product (Version 1)

earth observationlandmetadatanetcdforbitradarsentinel-1synthetic aperture radarxmlzarr

The Level-3 OPERA Sentinel-1 Surface Displacement (DISP) product is generated through interferometric time-series analysis of Level-2 Coregistered Sentinel-1 Single Look Complex (CSLC) datasets. Using a hybrid Persistent Scatterer (PS) and Distributed Scatterer (DS) approach, this product quantifies Earth's surface displacement in the radar line-of-sight. The DISP products enable the detection of anthropogenic and natural surface changes, including subsidence, tectonic deformation, and landslides. The OPERA DISP suite comprises complementary datasets derived from Sentinel-1 and NISAR input...

Usage examples

Inspect DISP-S1 Layers by M. Grace Bato

See 1 usage example →

Ohio State Cardiac MRI Raw Data (OCMR)

Homo sapiensimage processingimaginglife sciencesmagnetic resonance imagingsignal processing

OCMR is an open-access repository that provides multi-coil k-space data for cardiac cine. The fully sampled MRI datasets are intended for quantitative comparison and evaluation of image reconstruction methods. The free-breathing, prospectively undersampled datasets are intended to evaluate their performance and generalizability qualitatively.

Usage examples

OCMR Tutorial by Chong Chen

See 1 usage example →

OpenNeuro

biologyimaginglife sciencesneurobiologyneuroimagingneuroscience

OpenNeuro is a database of openly-available brain imaging data. The data are shared according to a Creative Commons CC0 license, providing a broad range of brain imaging data to researchers and citizen scientists alike. The database primarily focuses on functional magnetic resonance imaging (fMRI) data, but also includes other imaging modalities including structural and diffusion MRI, electroencephalography (EEG), and magnetoencephalography (MEG). OpenfMRI is a project of the Center for Reproducible Neuroscience at Stanford University. Development of the OpenNeuro resource has been funded by th...

Usage examples

Accessing UCLA Consortium for Neuropsychiatric Phenomics from OpenNeuro with Scigantic by Scigantic

See 1 usage example →

OpenSurfaces

computer vision

A large database of annotated surfaces created from real-world consumer photographs.

Usage examples

OpenSurfaces: A Richly Annotated Catalog of Surface Appearance by Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala

See 1 usage example →

OpenUniverse 2024 Simulated Roman & Rubin Images

astronomyimagingobject detectionparquetsatellite imagerysimulationssurvey

This release consists of simulated data products designed to mimic observations of the same region of the sky as seen by two astronomical facilities: the Nancy Grace Roman Telescope and the Vera C. Rubin Observatory.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

Orcasound - bioacoustic data for marine conservation

biodiversitybiologycoastalconservationdeep learningecosystemsenvironmentalgeospatiallabeledmachine learningmappingoceansopen source softwaresignal processing

Live-streamed and archived audio data (~2018-present) from underwater microphones (hydrophones) containing marine biological signals as well as ambient ocean noise. Hydrophone placement and passive acoustic monitoring effort prioritizes detection of orca sounds (calls, clicks, whistles) and potentially harmful noise. Geographic focus is on the US/Canada critical habitat of Southern Resident killer whales (northern CA to central BC) with initial focus on inland waters of WA. In addition to the raw lossy or lossless compressed data, we provide a growing archive of annotated bioacoustic bouts.

Usage examples

Github for our open source projects by Orcasound open source community

See 1 usage example →

Oxford Nanopore Technologies Benchmark Datasets

bioinformaticsbiologyfast5fastqgenomicHomo sapienslife scienceswhole genome sequencing

The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support (1) exploration of the characteristics of nanopore sequence data, (2) assessment and reproduction of performance benchmarks, and (3) development of tools and methods. The data deposited showcase DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly-available reference samples (e.g. Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts to describe sample and data provenance.

Usage examples

ONT Dataset Tutorials by EPI2MELabs

See 1 usage example →

PALSAR-2 ScanSAR CARD4L (L2.2)

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystacsustainabilitysynthetic aperture radar

The 25 m PALSAR-2 ScanSAR is normalized backscatter data of PALSAR-2 broad area observation mode with observation width of 350 km. The SAR imagery was ortho-rectified and slope-corrected using the ALOS World 3D - 30 m (AW3D30) Digital Surface Model. Polarization data are stored as 16-bit digital numbers (DN). The DN values can be converted to gamma naught values in decibels (dB) using the following equation: γ0 = 10*log10(DN^2) - 83.0 dB. CARD4L stands for CEOS Analysis Ready Data for Land; Level 2.2 data are ortho-rectified and radiometrically terrain-corrected. This datase...
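As a minimal sketch of the DN-to-dB conversion given in the description, the snippet below applies the equation with the DN term read as DN squared. The function name, the sample values, and the choice to treat non-positive DN values as missing data are assumptions made for illustration; they are not taken from the dataset documentation.

```python
import math

# Calibration offset in dB, taken from the equation in the dataset description.
CALIBRATION_OFFSET_DB = -83.0


def dn_to_gamma0_db(dn):
    """Convert one digital number (DN) to gamma naught in dB.

    Returns None for DN <= 0, because log10 is undefined there.
    Treating such pixels as "no data" is an assumption of this sketch.
    """
    if dn <= 0:
        return None
    return 10.0 * math.log10(dn ** 2) + CALIBRATION_OFFSET_DB


# Hypothetical sample values, not real pixels from the dataset.
for dn in (0, 1000, 10000):
    print(dn, dn_to_gamma0_db(dn))
```

For whole rasters, the same expression can be applied elementwise with an array library after masking the non-positive pixels.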

Usage examples

ALOS series Open and Free Data by JAXA EORC

See 1 usage example →

PALSAR-2 ScanSAR Flooding in Rwanda (L2.1)

agriculturecogdeafricadisaster responseearth observationgeospatialnatural resourcesatellite imagerystacsustainabilitysynthetic aperture radar

Torrential rainfall triggered flooding and landslides in many parts of Rwanda. The hardest-hit districts were Ngororero, Rubavu, Nyabihu, Rutsiro and Karongi. According to reports, 14 people have died in Karongi, 26 in Rutsiro, 18 in Rubavu, 19 in Nyabihu and 18 in Ngororero. Rwanda National Police reported that the Mukamira-Ngororero and Rubavu-Rutsiro roads are impassable due to flooding and landslide debris. UNITAR on behalf of United Nations Office for the Coordination of Humanitarian Affairs (OCHA) / Regional Office for Southern & Eastern Africa in cooperation with Rwanda Space Agency ...

Usage examples

ALOS series Open and Free Data by JAXA EORC

See 1 usage example →

PALSAR-2 ScanSAR Tropical Cyclone Mocha (L2.1)

agriculturecogdisaster responseearth observationgeospatialnatural resourcesatellite imagerystacsustainabilitysynthetic aperture radar

Tropical Cyclone Mocha began to form in the Bay of Bengal on 11 May 2023 and continues to intensify as it moves towards Myanmar and Bangladesh. Cyclone Mocha is the first storm to form in the Bay of Bengal this year and is expected to hit several coastal areas in Bangladesh on 14 May with wind speeds of up to 175 km/h. It later made landfall on the coast between Cox’s Bazar (Bangladesh) and Kyaukphyu (Myanmar), near Sittwe (Myanmar). At most, catastrophic damage-causing winds were possible, especially in the areas of Rakhine State and Chin State, and severe damage-causing winds are possible in the ...

Usage examples

ALOS series Open and Free Data by JAXA EORC

See 1 usage example →

Quoref

machine learningnatural language processing

24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs).

Usage examples

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning by Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner

See 1 usage example →

RSNA Abdominal Trauma Detection (RSNA-ABT)

computed tomographycomputer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiologyx-ray tomography

Blunt force abdominal trauma is among the most common types of traumatic injury, with the most frequent cause being motor vehicle accidents. Abdominal trauma may result in damage and internal bleeding of the internal organs, including the liver, spleen, kidneys, and bowel. Detection and classification of injuries are key to effective treatment and favorable outcomes. A large proportion of patients with abdominal trauma require urgent surgery. Abdominal trauma often cannot be diagnosed clinically by physical exam, patient symptoms, or laboratory tests. Prompt diagnosis of abdominal trauma using...

Usage examples

The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset by Rudie, Jeffrey D.

See 1 usage example →

RSNA Abdominal Traumatic Injury CT (RATIC)

computed tomographycomputer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiologyx-ray tomography

Blunt force abdominal trauma is among the most common types of traumatic injury, with the most frequent cause being motor vehicle accidents. Abdominal trauma may result in damage and internal bleeding of the internal organs, including the liver, spleen, kidneys, and bowel. Detection and classification of injuries are key to effective treatment and favorable outcomes. A large proportion of patients with abdominal trauma require urgent surgery. Abdominal trauma often cannot be diagnosed clinically by physical exam, patient symptoms, or laboratory tests. Prompt diagnosis of abdominal trauma using...

Usage examples

The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset by Rudie, Jeffrey D.

See 1 usage example →

RSNA Cervical Spine Fracture Detection (RSNA-CSF) Dataset

computed tomographycomputer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiologyx-ray tomography

Over 1.5 million spine fractures occur annually in the United States alone resulting in over 17,730 spinal cord injuries annually. The most common site of spine fracture is the cervical spine. There has been a rise in the incidence of spinal fractures in the elderly and in this population, fractures can be more difficult to detect on imaging due to degenerative disease and osteoporosis. Imaging diagnosis of adult spine fractures is now almost exclusively performed with computed tomography (CT). Quickly detecting and determining the location of any vertebral fractures is essential to prevent ne...

Usage examples

The RSNA Cervical Spine Fracture CT Dataset by Ming, Hui Lin

See 1 usage example →

RSNA Intracranial Aneurysm Detection Dataset (RSNA-ICA)

computer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiology

The Radiological Society of North America Intracranial Aneurysm Detection (RSNA-ICA) dataset is a collection of over 4,000 CT brain scans annotated by a cohort of over 40 volunteer radiologists from RSNA and the American Society of Neuroradiology to show the presence and location of intracranial aneurysms. It also includes a set of about 200 imaging studies that are annotated with AI-generated segmentations highlighting abnormalities. The imaging data was provided by 18 institutions. Initially compiled in 2025 for the RSNA Intracranial Aneurysm Detection AI Challenge hosted on Kaggle competiti...

Usage examples

The RSNA Intracranial Aneurysm Detection Dataset by Authors, Various

See 1 usage example →

RSNA Intracranial Hemorrhage Detection

computed tomographycomputer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiologyx-ray tomography

RSNA assembled this dataset in 2019 for the RSNA Intracranial Hemorrhage Detection AI Challenge (https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/). De-identified head CT studies were provided by four research institutions. A group of over 60 volunteer expert radiologists recruited by RSNA and the American Society of Neuroradiology labeled over 25,000 exams for the presence and subtype classification of acute intracranial hemorrhage.

Usage examples

Construction of a Machine Learning Dataset through Collaboration: The RSNA 2019 Brain CT Hemorrhage Challenge by Rudie, Jeffrey D.

See 1 usage example →

RSNA Lumbar Spine Degenerative Classification Dataset (RSNA-LSDD)

computer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiology

The Radiological Society of North America Lumbar Spine Degenerative Classification dataset (RSNA-LSDD) is a collection of over 2,600 magnetic resonance imaging (MR) scans of the lumbar spine annotated by a cohort of about 60 volunteer radiologists recruited by the RSNA, the American Society for Spine Radiology and the American Society of Neuroradiology to identify the location and severity of five degenerative conditions across the five intervertebral disc levels (L1/L2, L2/L3, L3/L4, L4/L5, and L5/S1). The imaging data, comprising over 8,500 image series (Sagittal “T2”, Axial T2 and Sagittal ...

Usage examples

The RSNA Lumbar Spine Degenerative Classification Dataset by Authors, Various

See 1 usage example →

RSNA Pulmonary Embolism Detection

computed tomographycomputer visioncsvlabeledlife sciencesmachine learningmedical image computingmedical imagingradiologyx-ray tomography

RSNA assembled this dataset in 2020 for the RSNA STR Pulmonary Embolism Detection AI Challenge (https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection/). With more than 12,000 CT pulmonary angiography (CTPA) studies contributed by five international research centers, it is the largest publicly available annotated PE dataset. RSNA collaborated with the Society of Thoracic Radiology to recruit more than 80 expert thoracic radiologists who labeled the dataset with detailed clinical annotations.

Usage examples

The RSNA Pulmonary Embolism CT Dataset by Colak, Errol

See 1 usage example →

Reasoning Over Paragraph Effects in Situations (ROPES)

jsonmachine learningnatural language processing

14k QA pairs over 1.7K paragraphs, split between train (10k QAs), development (1.6k QAs) and a hidden test partition (1.7k QAs).

Usage examples

Reasoning Over Paragraph Effects in Situations by Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner

See 1 usage example →

SENTINEL-1A_DUAL_POL_GRD_HIGH_RES

agriculturecoastalearth observationearthquakesecosystemsicelandland coverland usemetadataoceansradarsentinel-1stacsurface watersynthetic aperture radartiffurbanwater

Sentinel-1A Dual-pol ground projected high and full resolution images. Read our doc on how to get AWS Credentials to retrieve this data: https://sentinel1.asf.alaska.edu/s3credentialsREADME

Usage examples

Interferometric Synthetic Aperture Radar Tutorial by LiveEO

See 1 usage example →

SENTINEL-1A_SLC

coastalearthquakesecosystemsicelandland coverland usemetadataoceansorbitradarsentinel-1stacsurface watersynthetic aperture radartiffurbanwater

Sentinel-1A slant-range product. Read our doc on how to get AWS Credentials to retrieve this data: https://sentinel1.asf.alaska.edu/s3credentialsREADME

Usage examples

Interferometric Synthetic Aperture Radar Tutorial by LiveEO

See 1 usage example →

SENTINEL-1B_DUAL_POL_GRD_HIGH_RES

agriculturecoastalearthquakesecosystemsicelandland coverland usemetadataoceansradarsentinel-1stacsurface watersynthetic aperture radartiffurbanwater

Sentinel-1B Dual-pol ground projected high and full resolution images. Read our doc on how to get AWS Credentials to retrieve this data: https://sentinel1.asf.alaska.edu/s3credentialsREADME

Usage examples

Interferometric Synthetic Aperture Radar Tutorial by LiveEO

See 1 usage example →

SENTINEL-1B_SLC

agriculturecoastalearthquakesecosystemsicelandland coverland usemetadataoceansorbitradarsentinel-1stacsurface watersynthetic aperture radartiffurbanwater

Sentinel-1B slant-range product. Read our doc on how to get AWS Credentials to retrieve this data: https://sentinel1.asf.alaska.edu/s3credentialsREADME

Usage examples

Interferometric Synthetic Aperture Radar Tutorial by LiveEO

See 1 usage example →

SILAM Air Quality

air qualityclimateearth observationmeteorologicalweather

Air Quality is a global SILAM atmospheric composition and air quality forecast performed on a daily basis for > 100 species and covering the troposphere and the stratosphere. The output produces 3D concentration fields and aerosol optical thickness. The data are unique: 20km resolution for global AQ models is unseen worldwide.

Usage examples

Simple examples by Roope Tervo

See 1 usage example →

SPHEREx Quick Release (QR): An All-Sky Spectral Survey

astronomyimagingobject detectionsatellite imagerysurvey

The Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer (SPHEREx) is a NASA Astrophysics Medium-class Explorer (MIDEX) mission launched in March 2025. During its planned two-year mission, SPHEREx will perform the first ever all-sky spectral survey in the optical to near-infrared (0.75-5 microns). SPHEREx data will be used to probe inflation and the early universe, trace the history of galactic light production, and investigate the origin of planetary systems and biogenic ices, in addition to contributing to many other astrophysics research topics. IRSA ...

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

Safecast

air qualityclimateenvironmentalgeospatialradiation

An ongoing collection of radiation and air quality measurements taken by devices involved in the Safecast project.

Usage examples

Safecast Map by Nick Dolezal

See 1 usage example →

Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD)

biologycell biologycell imagingepigenomicsgene expressionhistopathologyHomo sapiensimaginglife sciencesmedicinemicroscopyneurobiologyneurosciencesingle-cell transcriptomicstranscriptomics

The Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) consortium strives to gain a deep molecular and cellular understanding of the early pathogenesis of Alzheimer's disease and is funded by the National Institute on Aging (NIA U19AG060909). The SEA-AD datasets available here comprise single cell profiling (transcriptomics and epigenomics) and quantitative neuropathology. To explore gene expression and chromatin accessibility information, the single-cell profiling data includes: snRNAseq and snATAC-seq data from the SEA-AD donor cohort (aged brains which span the spectrum of Alzhe...

Usage examples

Seattle Alzheimer’s Disease Brain Cell Atlas by Lein, E et al.

See 1 usage example →

Sentinel-1 SLC dataset for Germany

disaster responseearth observationenvironmentalgeospatialsatellite imagerysustainabilitysynthetic aperture radar

The Sentinel-1 Single Look Complex (SLC) unzipped dataset contains Synthetic Aperture Radar (SAR) data from the European Space Agency’s Sentinel-1 mission. Unlike the zipped data provided by ESA, this dataset allows direct access to the individual swaths required for a given study area, thus drastically reducing the storage and download time requirements of a project. Since the data is stored on S3, users can use the boto3 library and its S3 get_object method to read the entire content of an object into memory for processing, without having to download it to disk. The Sentinel-1 ...
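A minimal sketch of the in-memory read described here is shown below. It assumes an S3 client created with boto3 (for example `boto3.client("s3")`) and valid access to the bucket. The helper name `read_object_bytes`, the optional byte-range argument, and the bucket and key placeholders in the comments are hypothetical and not taken from the dataset documentation.

```python
def read_object_bytes(client, bucket, key, byte_range=None):
    """Read an S3 object (or an inclusive byte range of it) into memory.

    `client` is expected to be a boto3 S3 client; any object exposing a
    compatible get_object(**kwargs) method also works.
    `byte_range` is an optional (start, end) tuple of inclusive offsets.
    """
    kwargs = {"Bucket": bucket, "Key": key}
    if byte_range is not None:
        start, end = byte_range
        # Standard HTTP Range header syntax, passed via get_object's Range parameter.
        kwargs["Range"] = f"bytes={start}-{end}"
    response = client.get_object(**kwargs)
    # response["Body"] is a streaming body; read() returns the content as bytes.
    return response["Body"].read()


# Example call (placeholders only; substitute the real bucket and key):
#   import boto3
#   client = boto3.client("s3")
#   data = read_object_bytes(client, "<bucket-name>", "<path/to/swath-file>")
```

Requesting a byte range rather than the whole object can help when only part of a large file is needed, at the cost of handling offsets yourself.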

Usage examples

Interferometric Synthetic Aperture Radar Tutorial by LiveEO

See 1 usage example →

Single-Cell Atlas of Human Blood During Healthy Aging

life sciencesproteinsingle-cell transcriptomics

Comprehensive, large-scale single-cell profiling of healthy human blood at different ages is one of the critical pending tasks required to establish a framework for systematic understanding of human aging. Here, using single-cell RNA/TCR/BCR-seq with protein feature barcoding (20 antibodies), we profiled 317 samples from 166 healthy individuals aged 25 to 85 years old drawn over a 3-year period. The dataset, spanning ~2 million cells, describes 50 subpopulations of blood immune cells, with 14 subpopulations changing with age, including a novel NKG2C+ CD8 Tcm population that decreases with age. We desc...

Usage examples

Single-cell atlas of healthy human blood unveils age-related loss of NKG2C+GZMB−CD8+ memory T cells and accumulation of type 2 memory T cells by Terekhova M, Swain A, Bohacova P, Aladyeva E, Arthur L, et al

See 1 usage example →

SpaceEye-T VVHR EO Open Data

disaster responseearth observationgeospatialimage processingsatellite imagery

SpaceEye-T satellite collects the highest resolution optical imagery among the commercial satellites, 25 cm resolution. The Open Data features various satellite images around the world for end users to experience the power of VVHR optical data.

Usage examples

SpaceEye-T Data Manual by SI-Imaging Services

See 1 usage example →

Spitzer Enhanced Imaging Products (SEIP) Super Mosaics

astronomyimagingsatellite imagerysurvey

Spitzer was an infrared astronomy space telescope with imaging from 3 to 160 microns and spectroscopy from 5 to 37 microns, launched into an Earth-trailing solar orbit as the last of NASA's Great Observatories. The SEIP Super Mosaics include data from the four channels of IRAC (3.6, 4.5, 5.8, 8 microns) and the 24 micron channel of MIPS. Data from multiple programs are combined where appropriate. Cryogenic Release v3.0 includes Spitzer data taken during commissioning and cryogenic operations, including calibration data.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

Sup3rCC

air temperatureclimate modelenergysolar

Released to the public as part of the Department of Energy's Open Energy Data Initiative, these data represent a serially complete collection of hourly 4km wind, solar, temperature, humidity, and pressure fields for the Continental United States under climate change scenarios. Sup3rCC is downscaled Global Climate Model (GCM) data. For example, the initial file set tagged "sup3rcc_conus_mriesm20_ssp585_r1i1p1f1" is downscaled from MRI ESM 2.0 for climate change scenario SSP5 8.5 and variant label r1i1p1f1. The downscaling process is performed using a generative machine learning...

Usage examples

Using the Sup3rCC Data by Grant Buster

See 1 usage example →

Swiss Public Transport Stops

citiesgeospatialinfrastructuremappingtraffictransportation

The basic geo-data set for public transport stops comprises public transport stops in Switzerland and additional selected geo-referenced public transport locations that are of operational or structural importance (operating points).

Usage examples

Map Viewer by Swiss Geoportal

See 1 usage example →

Synthea Coherent Data Set

bioinformaticscsvdicomgenomichealthimaginglife sciencesmedicine

This is a synthetic data set that includes FHIR resources, DICOM images, genomic data, physiological data (i.e., ECGs), and simple clinical notes. FHIR links all the data types together.

Usage examples

The “Coherent Data Set”: Combining Patient Data and Imaging in a Comprehensive Synthetic Health Record. by Walonoski J, Hall D, Bates KM, Farris MH, Dagher J, Downs ME, Sivek RT, Wellner B, Gregorowicz A, Hadley M, Campion FX, Levine L, Wacome K, Emmer G, Kemmer A, Malik M, Hughes J, Granger E, Russell S.

See 1 usage example →

Tabula Muris

Biohubbiologyencyclopedicgenomichealthlife sciencesmedicine

Tabula Muris is a compendium of single cell transcriptomic data from the model organism Mus musculus comprising more than 100,000 cells from 20 organs and tissues. These data represent a new resource for cell biology, reveal gene expression in poorly characterized cell populations, and allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as T-lymphocytes and endothelial cells from different anatomical locations. Two distinct technical approaches were used for most organs: one approach, microfluidic droplet-based 3’-end counting, enabled the s...

Usage examples

Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. by Tabula Muris Consortium (2019)

See 1 usage example →

Tabula Muris Senis

Biohubbiologyencyclopedicgenomichealthlife sciencesmedicinesingle-cell transcriptomics

Tabula Muris Senis is a comprehensive compendium of single cell transcriptomic data from the model organism Mus musculus comprising more than 500,000 cells from 18 organs and tissues across the mouse lifespan. We discovered cell-specific changes occurring across multiple cell types and organs, as well as age related changes in the cellular composition of different organs. Using single-cell transcriptomic data we were able to assess cell type specific manifestations of different hallmarks of aging, such as senescence, changes in the activity of metabolic pathways, depletion of stem-cell populat...

Usage examples

Fast queries of scRNAseq datasets with Amazon Athena by Andrew Ang, James Golden, Lisa McFerrin, and Lee Pang

See 1 usage example →

The JWST Advanced Extragalactic Survey JADES

astronomy

JADES is an infrared imaging and multi-object spectroscopy survey focused on two deep fields: the Hubble Deep Field (GOODS-N) and Hubble Ultra Deep Field (GOODS-S). JADES conducted NIRCam imaging in 8-10 bands, covering about 42 square arcminutes to very deep limits (fainter than 30th magnitude) with an average of about 100 hours of total exposure time, and then another 167 square arcminutes to a typical exposure time of 25 hrs. Coordinated parallels with the MIRI instrument extend this imaging further into the infrared in smaller regions. JADES then performed extensive NIRSpec spectroscopy ...

Usage examples

TIKE: a free, Jupyter-based cloud platform to access and analyze MAST's AWS timeseries data by MAST Staff

See 1 usage example →

U.S. Census ACS PUMS

censusstatisticssurvey

U.S. Census Bureau American Community Survey (ACS) Public Use Microdata Sample (PUMS) available in a linked data format using the Resource Description Framework (RDF) data model.

Usage examples

Setting up Blazegraph on EC2 by data.world

See 1 usage example →

UCSF Primary Central Nervous System Lymphoma MRI Dataset

brain imagescancerlife sciencesmagnetic resonance imagingmedical imagingmedicineneuroimagingradiology

This BIDS-formatted dataset provides multimodal brain MRI data from 150 patients with primary central nervous system lymphoma (PCNSL), including T1-weighted, contrast-enhanced T1-weighted, FLAIR, and ADC sequences. The dataset includes expert-annotated lesion segmentations with radiomic features, along with anonymized clinical data including demographics, diagnosis history, and medications.

Usage examples

PCNSL Data Access Tutorial by Michael Francis Romano

See 1 usage example →

UK Biobank Pharma Proteomics Project (UKB-PPP)

genome wide association studylife sciencespopulation genetics

The UKB-PPP is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,219 UKB participants. As part of a collaborative analysis across the thirteen UKB-PPP partners, we conducted comprehensive protein quantitative trait loci (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 85% are newly discovered, in addition to ancestry-specific pQTL mapping in non-Europeans. We identify independent secondary associations in 87% of cis and 30% of trans loci, expanding the catalogue ...

Usage examples

Plasma proteomic associations with genetics and health in the UK Biobank by Sun B, Chiou J, Traylor M, Benner C, Hsu Y, Richardson T, et al

See 1 usage example →

Unblurred Coadds of the Wide-field Infrared Survey Explorer (unWISE)

astronomyobject detectionparquetsurvey

unWISE is a reprocessing of Wide-field Infrared Survey Explorer (WISE) data which preserves the native angular resolution and is optimized for forced photometry. WISE was a NASA satellite producing all-sky imaging in four infrared bands centered at 3.4, 4.6, 12 and 22 microns (W1, W2, W3, and W4) starting in 2010 until the coolant was exhausted in 2011. It was reactivated in 2013 as NEOWISE and continued imaging in W1 and W2 until 2024.

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

Virtual Shizuoka, 3D Point Cloud Data

bathymetrydisaster responseelevationgeospatialjapaneselandlidarmapping

This dataset comprises high-precision 3D point cloud data that encompasses the entire Shizuoka prefecture in Japan, covering 7,200 out of its 7,777 square kilometers. The data is produced through aerial laser survey, airborne laser bathymetry and mobile mapping systems, the culmination of many years of dedicated effort. This data will be visualized and analyzed for use in infrastructure maintenance, disaster prevention measures and autonomous vehicle driving.

Usage examples

Tutorial of handling LAS format point cloud data by AIGID

See 1 usage example →

VitalDB

biologyhealthlife sciencesmedicinesignal processing

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients.

Usage examples

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients by Hyung-Chul Lee, Yoonsang Park, Soo Bin Yoon, Seong Mi Yang, Dongnyeok Park, and Chul-Woo Jung

See 1 usage example →

Voices Obscured in Complex Environmental Settings (VOiCES)

automatic speech recognitiondenoisingmachine learningspeaker identificationspeech processing

VOiCES is a speech corpus recorded in acoustically challenging settings, using distant microphone recording. Speech was recorded in real rooms with various acoustic features (reverb, echo, HVAC systems, outside noise, etc.). Adversarial noise, either television, music, or babble, was concurrently played with clean speech. Data was recorded using multiple microphones strategically placed throughout the room. The corpus includes audio recordings, orthographic transcriptions, and speaker labels.

Usage examples

Getting started with VOiCES data by M.A. Barrios

See 1 usage example →

World Bank Climate Change Knowledge Portal (CCKP)

climate, climate model, climate projections, CMIP6, earth observation, netcdf

CCKP provides open access to a comprehensive suite of climate and climate change resources derived from the latest generation of climate data archives. Products are based on a consistent and transparent approach with a systematic way of pre-processing the raw observed and model-based projection data to enable inter-comparable use across a broad range of applications. Climate products consist of basic climate variables as well as a large collection (70+) of more specialized, application-orientated variables and indices across different scenarios. Precomputed data can be extracted per specified ...

Usage examples

World Bank Climate Change Knowledge Portal observed and projected climate datasets by World Bank

See 1 usage example →

Xiph.Org Test Media

computer vision, image processing, imaging, media, movies, multimedia, video

Uncompressed video used for video compression and video processing research.

Usage examples

Encoding video with AV1 on EC2 by Thomas Daede

See 1 usage example →

ZINC Database

biology, chemical biology, life sciences, molecular docking, pharmaceutical, protein

3D models for molecular docking screens.

Usage examples

ZINC Database by John Irwin

See 1 usage example →

iHART Whole Genome Sequencing Data Set

autism spectrum disorder, bam, genetic, genomic, life sciences, vcf, whole genome sequencing

iHART is the Hartwell Foundation’s Autism Research and Technology Initiative. This release contains whole genome data from over 1000 families with 2 or more children with autism; the biomaterials were provided by the Autism Genetic Resource Exchange (AGRE).

Usage examples

Inherited and De Novo Genetic Risk for Autism Impacts Shared Networks by Ruzzo et al. (2020)

See 1 usage example →

recount3

bioinformatics, biology, cancer, csv, gene expression, genetic, genomic, Homo sapiens, life sciences, Mus musculus, neuroscience, transcriptomics

recount3 is an online resource consisting of RNA-seq gene, exon, and exon-exon junction counts as well as coverage bigWig files for 8,679 and 10,088 different studies for human and mouse respectively. It is the third generation of the ReCount project and part of recount.bio. recount2 is also included for historical purposes. The pipeline used to generate the data in recount3 (but not recount2) is available here.

Usage examples

recount3 quick start guide by Leonardo Collado-Torres

See 1 usage example →

(EXPERIMENTAL) NOAA FourCastNet Global Forecast System (FourCastNetGFS) (EXPERIMENTAL)

agriculture, climate, disaster response, environmental, meteorological, weather

The FourCastNet Global Forecast System (FourCastNetGFS) is an experimental system set up by the National Centers for Environmental Prediction (NCEP) to produce medium range global forecasts. The model runs on a 0.25 degree latitude-longitude grid (about 28 km) and 13 pressure levels. The model produces forecasts 4 times a day at 00Z, 06Z, 12Z and 18Z cycles. Major atmospheric and surface fields including temperature, wind components, geopotential height, relative humidity and 2 meter temperature and 10 meter winds are available. The products are 6 hourly forecasts up to 10 days. The data format is ...

2020 Redistricting Data File Least Squares Estimates

census, confidence intervals, differential privacy, disclosure avoidance, ethnicity, group quarters, housing, housing units, least squares, noisy measurements, population, race, redistricting, voting age

The 2020 Redistricting Data File Least Squares Estimates data product provides count estimates, and their standard deviations, for each tabulation that was published as part of the persons universe of the 2020 Redistricting Data File for the US, state, county, and tract geographic levels. These estimates are computed using the generalized least squares (GLS) estimator using as input the publicly available 2020 Census persons universe noisy measurement files for both the Redistricting Data File and the Demographic and Housing Characteristics File. The algorithm used to compute this estimate is desc...

A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018)

cyber security, internet, intrusion detection, network traffic

This dataset is the result of a collaborative project between the Communications Security Establishment (CSE) and The Canadian Institute for Cybersecurity (CIC) that uses the notion of profiles to generate cybersecurity datasets in a systematic manner. It includes a detailed description of intrusions along with abstract distribution models for applications, protocols, or lower level network entities. The dataset includes seven different attack scenarios, namely Brute-force, Heartbleed, Botnet, DoS, DDoS, Web attacks, and infiltration of the network from inside. The attacking infrastructure incl...

AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore

machine learning, natural language processing

9092 crowd-sourced science questions and 68 tables of curated facts

AI2 Tablestore (November 2015 Snapshot)

machine learning, natural language processing

68 tables of curated facts

Aristo Mini Corpus

csv, json, machine learning

1,197,377 science-relevant sentences

Aristo Tuple KB

machine learning, natural language processing

294,000 science-relevant tuples

Australasian Genomes

biodiversity, biology, conservation, genetic, genomic, life sciences, transcriptomics, wildlife

Australasian Genomes is the genomic data repository for the Threatened Species Initiative (TSI) and the ARC Centre for Innovations in Peptide and Protein Science (CIPPS). This repository contains reference genomes, transcriptomes, resequenced genomes and reduced representation sequencing data from Australasian species. Australasian Genomes is managed by the Australasian Wildlife Genomics Group (AWGG) at the University of Sydney on behalf of our collaborators within TSI and CIPPS.

Baby Open Brains (BOBs) Repository on AWS

life sciences, magnetic resonance imaging, neuroimaging, neuroscience, nifti, pediatric, segmentation

Manually curated and reviewed infant brain segmentations and accompanying T1w and T2w images for a range of 1-9 month old participants from the Baby Connectome Project (BCP)

Usage examples

BIBsnet by Timothy J. Hendrickson et al.
BIBSNet: A Deep Learning Baby Image Brain Segmentation Network for MRI Scans by Timothy J. Hendrickson et al.
View or Download the BOBS Repository by Lucille A. Moore

See 3 usage examples →

BioLiP

bioinformatics, chemistry, life sciences, molecular docking, molecule, protein, structural biology

BioLiP is a semi-manually curated database for high-quality, biologically relevant ligand-protein binding interactions. The structure data are collected primarily from the Protein Data Bank (PDB), with biological insights mined from literature and other specific databases. BioLiP aims to construct the most comprehensive and accurate database for serving the needs of ligand-protein docking, virtual ligand screening and protein function annotation.

Usage examples

BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions by Jianyi Yang, Ambrish Roy, and Yang Zhang
BioLiP2: an updated structure database for biologically relevant ligand-protein interactions by Chengxin Zhang, Xi Zhang, Peter L Freddolino, and Yang Zhang
BioLiP API usage by Zhang Lab

See 3 usage examples →

CAFE60 reanalysis

climate, sustainability

The CSIRO Climate retrospective Analysis and Forecast Ensemble system: version 1 (CAFE60v1) provides a large ensemble retrospective analysis of the global climate system from 1960 to present with sufficiently many realizations and at spatio-temporal resolutions suitable to enable probabilistic climate studies. Using a variant of the ensemble Kalman filter, 96 climate state estimates are generated over the most recent six decades. These state estimates are constrained by monthly mean ocean, atmosphere and sea ice observations such that their trajectories track the observed state while enabling ...

CCAFS-Climate Data

agriculture, climate, food security, sustainability

High resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact assessments.

COCO - Common Objects in Context - fast.ai datasets

computer vision, deep learning, machine learning

COCO is a large-scale object detection, segmentation, and captioning dataset. This is part of the fast.ai datasets collection hosted by AWS for convenience of fast.ai students. If you use this dataset in your research please cite arXiv:1405.0312 [cs.CV].

COVID-19 Molecular Structure and Therapeutics Hub

bioinformatics, biology, coronavirus, COVID-19, life sciences, molecular docking, pharmaceutical

Aggregating critical information to accelerate drug discovery for the molecular modeling and simulation community. A community-driven data repository and curation service for molecular structures, models, therapeutics, and simulations related to computational research related to therapeutic opportunities for COVID-19 (caused by the SARS-CoV-2 coronavirus).

CRC-SAS/SISSA historical seasonal and subseasonal forecast database

agriculture, earth observation, forecast, hydrology, meteorological, natural resource, weather

Within the framework of the Drought Information System for Southern South America (Sistema de Información de Sequías del Sur de Sudamérica, SISSA), a database of subseasonal and seasonal forecasts has been developed, with both corrected and uncorrected data, to support the study of predictability at different scales and to feed models in sectors such as agriculture and hydrology.

The database contains daily data for 2000-2019 (uncorrected) and 2010-2019 (corrected) for several variables, including mean, maximum, and minimum temperature, as well as rainfall, mean wind, and other variables intended to feed hydrological and crop models.

The database covers the entire area of the Regional Climate Center for Southern South America (Centro Regional del Clima para el sur de sudamérica, CRC-SAS), extending from Bolivia and central-southern Brazil to Patagonia and including member countries such as Chile, Argentina, Brazil, Paraguay, Uruguay, and Bolivia.

The database was generated ...

CarbonPDF

csv, environmental, industry, information retrieval, product comparison

A carbon question-answering (QA) dataset specifically designed to facilitate the extraction and analysis of data from real-world carbon reports of computing products. The dataset features annotated metadata, a variety of numerical reasoning tasks, and structured derivations to ensure accurate processing of fragmented and inconsistent information.

CartoStore

bioinformatics, genomic, life sciences, spatial omics, spatial transcriptomics

Cross-Platform Repository for High-resolution Spatial Transcriptomics Datasets.

Usage examples

Cartloader Documentation by Hyun Min Kang and Weiqiu Cheng
Example CartoStore Repository for Xenium Breast Cancer Dataset by Hyun Min Kang and Weiqiu Cheng
CartoStore Overview by Hyun Min Kang and Weiqiu Cheng

See 3 usage examples →

Central Weather Administration OpenData

climate, earth observation, earthquakes, satellite imagery, weather

Various kinds of weather raw data and charts from Central Weather Administration.

Central Weather Bureau OpenData

climate, earth observation, earthquakes, satellite imagery, weather

Various kinds of weather raw data and charts from Central Weather Bureau.

Clinical Ultrasound Image Repository

life sciences, machine learning, medical imaging, medicine

Generic Clinical Ultrasound Data from Random Subjects acquired for Clinical Reasons, to be used for Developing Artificial Intelligence Applications. This dataset is complete with 2000 studies from 2000 subjects (one third each from abdominal, cardiac, and OB/GYN cases)

Cloud to Street - Microsoft Flood and Clouds Dataset

cog, computer vision, deep learning, earth observation, floods, geospatial, machine learning, satellite imagery, synthetic aperture radar

This dataset consists of chips of Sentinel-1 and Sentinel-2 satellite data. Each Sentinel-1 chip contains a corresponding label for water and each Sentinel-2 chip contains a corresponding label for water and clouds. Data is stored in folders by a unique event identifier as the folder name. Within each event folder there are subfolders for Sentinel-1 (s1) and Sentinel-2 (s2) data. Each chip is contained in its own sub-folder with the folder name being the source image id, followed by a unique chip identifier consisting of a hyphenated set of 5 numbers. All bands of the satellite data, as well a...

Community Multiscale Air Quality (CMAQ) 2019 3D Gridded and Column data from the EPA's Air Quality Time Series (EQUATES) Project

air quality, atmosphere, model

The data are part of EPA’s Air Quality Time Series (EQUATES) Project. The data consist of hourly gridded pollutant concentration estimates from the Community Multiscale Air Quality (CMAQ) model version 5.3.2 (https://doi.org/10.15139/S3/F2KJSK) for January 1 – December 31, 2019. Model data are provided for two spatial domains: the Northern Hemisphere (108 km x 108 km horizontal grid spacing) and the Contiguous United States including parts of Canada and Mexico (12 km x 12 km horizontal grid spacing). Two types of hourly data are provided: three-dimensional air pollutant concentrations and vert...

DARPA Invisible Headlights Dataset

autonomous vehicles, broadband, computer vision, lidar, machine learning, segmentation, us

The DARPA Invisible Headlights Dataset is a large-scale multi-sensor dataset annotated for autonomous navigation in challenging off-road environments. It features simultaneously collected off-road imagery from multispectral, hyperspectral, polarimetric, and broadband sensors spanning wavelengths from the visible spectrum to long-wave infrared and provides aligned LiDAR data for ground-truth shape. Camera calibrations, LiDAR registrations, and traversability annotations for a subset of the data are available.

DHARANI Developing Human-Brain Atlas

brain images, computer vision, life sciences, microscopy, neurobiology, segmentation

We introduce DHARANI, the first online platform with three-dimensional (3D) histological reconstructions of the developing human brain from 14 to 24 gestational weeks (GW) across five fetal brains. DHARANI features 5132 Nissl, hematoxylin and eosin stained, 20 µm coronal and sagittal sections, postmortem MRI, and a neuroanatomical atlas with 466 annotated sections covering ∼500 brain structures. It is accessible online at https://brainportal.humanbrain.in/publicview/index.html. The 3D reconstruction enables a volumetric view of the fetal brain, allowing visualization in all three planes ak...

Usage examples

See 3 usage examples →

Department of Energy's Marine Energy Data Lake

energy, marine, water

Data released from projects funded by the Department of Energy's Water Power Technologies Office (DOE WPTO) that are too large or complex to be conveniently accessed by traditional means. The Marine Energy data lake aims to improve and automate access of high-value MHK data sets, making data actionable and discoverable by researchers and industry to accelerate analysis and advance innovation. This data lake is a sister-data lake to the Department of Energy’s Open Energy Data Initiative (OEDI) data lake.

District of Columbia - Classified Point Cloud LiDAR

cities, disaster response, geospatial, us-dc

LiDAR point cloud data for Washington, DC is available for anyone to use on Amazon S3. This dataset, managed by the Office of the Chief Technology Officer (OCTO), through the direction of the District of Columbia GIS program, contains tiled point cloud data for the entire District along with associated metadata.

ENHANCE.PET 1.6k - Whole-/Total-Body [18F]FDG-PET/CT with CT-Derived Segmentations

cancer, life sciences, medical imaging, nifti, radiology, segmentation

Open, multi-center dataset of 1,597 whole-/total-body FDG-PET/CT studies with 130 CT-derived, expert-verified anatomical segmentations per scan (~250 GB). Provided as anonymized NIfTI (PET, CT, labels) with spreadsheet metadata. Designed for segmentation benchmarking, multi-organ analysis, radiomics, and PET/CT AI research.

Usage examples

See 3 usage examples →

EPA Dynamically Downscaled Ensemble (EDDE) Version 1

agriculture, air quality, air temperature, atmosphere, climate, climate model, climate projections, CMIP5, CMIP6, ecosystems, elevation, environmental, Eulerian, events, floods, fluid dynamics, geoscience, geospatial, hdf5, health, HPC, hydrology, infrastructure, land cover, land use, meteorological, model, near-surface air temperature, near-surface relative humidity, near-surface specific humidity, netcdf, open source software, physics, post-processing, precipitation, radiation, simulations, us, water, weather

The data are a subset of the EPA Dynamically Downscaled Ensemble (EDDE), Version 1. EDDE is a collection of physics-based modeled data that represent 3D atmospheric conditions for historical and future periods under different scenarios. The EDDE Version 1 datasets cover the contiguous United States at a horizontal grid spacing of 36 kilometers at hourly increments. EDDE Version 1 includes simulations that have been dynamically downscaled from multiple global climate models (GCMs) under both mid- and high-emission scenarios from the Fifth Coupled Model Intercomparison Project (CMIP5) using the...

EPA Dynamically Downscaled Ensemble (EDDE) Version 2

agriculture, air quality, air temperature, atmosphere, climate, climate model, climate projections, CMIP5, CMIP6, ecosystems, elevation, environmental, Eulerian, events, floods, fluid dynamics, geoscience, geospatial, hdf5, health, HPC, hydrology, infrastructure, land cover, land use, meteorological, model, near-surface air temperature, near-surface relative humidity, near-surface specific humidity, netcdf, open source software, physics, post-processing, precipitation, radiation, simulations, us, water, weather

The data are a subset of the EPA Dynamically Downscaled Ensemble (EDDE), Version 2. EDDE is a collection of physics-based modeled data that represent 3D atmospheric conditions for historical and future periods under different scenarios. The EDDE Version 2 datasets cover the contiguous United States at a horizontal grid spacing of 12 kilometers at hourly increments. EDDE Version 2 will include simulations that have been dynamically downscaled from multiple global climate models (GCMs) under multiple emission scenarios from the Sixth Coupled Model Intercomparison Project (CMIP6) using the Weath...

EPA Hourly Prognostic Meteorological Data

air quality, environmental, meteorological, regulatory, weather

The data are hourly outputs from the Weather Research and Forecasting (WRF) model generated by the EPA's Office of State Air Partnerships (OSAP), Air Quality Assessment Division, Air Quality Modeling Branch. These data were generated at a 12-km resolution over the Continental United States (12US), beginning with the year 2021 and continuing annually through 2023. These files are intended for use in a broad range of air quality applications, but specifically may be used in dispersion modeling applications that would benefit from the use of the Mesoscale Model Interface (MMIF) tool (https:/...

EPA Risk-Screening Environmental Indicators

environmental

Detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.

Epoch of Reionization Dataset

astronomy

The data are from observations with the Murchison Widefield Array (MWA), a Square Kilometer Array (SKA) precursor in Western Australia. This particular dataset is from the Epoch of Reionization project, which is a key science driver of the SKA. Nearly 2PB of such observations have been recorded to date; this is a small subset that has been exported from the MWA data archive in Perth and made available to the public on AWS. The data were taken to detect signatures of the first stars and galaxies forming and the effect of these early stars and galaxies on the evolution of the u...

GATK Test Data

bioinformatics, biology, cancer, genetic, genomic, life sciences

The GATK test data resource bundle is a collection of files for resequencing human genomic data with the Broad Institute's Genome Analysis Toolkit (GATK).

GLAD Landsat ARD

agriculture, cog, earth observation, geospatial, natural resource, satellite imagery

The Landsat Analysis Ready Data (ARD) created by the Global Land Analysis and Discovery Lab (GLAD) at the University of Maryland serves as a spatially and temporally consistent input for land cover mapping and change detection at global to local scales. The GLAD ARD represents a 16-day time series of globally consistent, tiled Landsat normalized surface reflectance from 1997 to the present, operationally updated every 16 days. Only data from 2020 to the present are available on AWS; older data are available through the UMD API.

Usage examples

Landsat analysis ready data for global land cover and land cover change mapping by Potapov, P., Hansen, M.C., Kommareddy, I., Kommareddy, A., Turubanova, S., Pickens, A., Adusei, B., Tyukavina A., Ying, Q.
GLAD Landsat ARD Tools by Peter Potapov
GLAD ARD Data Format by Peter Potapov

See 3 usage examples →

GX database for NCBI Foreign Contamination Screen (FCS) Tool Suite

assembly, bioinformatics, biology, contamination, fasta, genetic, genome, health, life sciences, STRIDES, whole genome sequencing

Sequence database used by FCS-GX (Foreign Contamination Screen - Genome Cross-species aligner) to detect contamination from foreign organisms in genome sequences.

Galaxy Evolution Explorer Satellite (GALEX)

astronomy

The Galaxy Evolution Explorer Satellite (GALEX) was a NASA mission led by the California Institute of Technology, whose primary goal was to investigate how star formation in galaxies evolved from the early universe up to the present. GALEX used microchannel plate detectors to obtain direct images in the near-UV (NUV) and far-UV (FUV), and a grism to disperse light for low resolution spectroscopy.

Genome Ark

biodiversity, bioinformatics, biology, conservation, genetic, genomic, life sciences

The Genome Ark hosts genomic information for the Vertebrate Genomes Project (VGP) and other related projects. The VGP is an international collaboration that aims to generate complete and near error-free reference genomes for all extant vertebrate species. These genomes will be used to address fundamental questions in biology and disease, to identify species most genetically at risk for extinction, and to preserve genetic information of life.

Google Satellite Embedding V1

aerial imagery, earth observation, imaging, machine learning, satellite imagery

COG (Cloud-Optimized GeoTIFF) files that together contain the AlphaEarth Foundations annual Satellite Embedding dataset. It contains the annual embeddings for the years from 2018 to 2024, inclusive.

Gretel Synthetic Safety Alignment Dataset

ai safety, machine learning, natural language processing, synthetic data

A comprehensive dataset designed for aligning language models with safety and ethical guidelines. Contains 8,361 curated triplets of prompts, responses, and safe responses across various risk categories. Each entry includes safety scores, judge reasoning, and harm probability assessments, making it valuable for model alignment, testing, and benchmarking.

Usage examples

See 3 usage examples →

Grid Algorithms and Data Analytics Library (GADAL)

energy, environmental, model, sustainability

The aim of this project is to create an easy-to-use platform where various types of analytics can be performed on a wide range of electrical grid datasets. The aim is to establish an open-source library of algorithms that universities, national labs and other developers can contribute to which can be used on both open-source and proprietary grid data to improve the analysis of electrical distribution systems for the grid modeling community. OEDI Systems Integration (SI) is a grid algorithms and data analytics API created to standardize how data is sent between different modules that are run as...

Gulfwide Avian Colony Monitoring Survey Photos

biology, conservation, ecosystems, environmental, labeled, object detection

For this project, The Water Institute (the Institute) and subcontractor Colibri Ecological Consulting, LLC (Colibri) utilized established methods and protocols capable of assessing changes of colonial waterbird populations and their important habitats within individual states and the broader northern Gulf of Mexico region. Data collection activities included: Aerial Photographic Nest Surveys: Implementation of fixed-wing aircraft surveys intended to assess waterbird colonies and document associated nesting within select portions of the northern Gulf of Mexico. Additional detail is provide...

Guy's Breast Cancer Lymph Nodes (GRAPE)

biology, breast cancer, cancer, computational pathology, histopathology, life sciences

This is a retrospective dataset of 1523 H&E-stained whole slide images (WSI) of lymph nodes from breast cancer patients. The cohort consisted of 177 patients (122 LN-positive - metastasis was reported in at least 1 LN - and 55 LN-negative patients) with invasive breast carcinoma treated between 1984 and 2002 at Guy’s Hospital London, UK. Slides were scanned and digitised at 40x magnification (0.23 µm/pixel) using a NanoZoomer H.T2.0 2.0-HT (Hamamatsu Photonics UK, Ltd, Welwyn Garden City, UK). WSIs are in .ndpi format.

HIRLAM Weather Model

agriculture, climate, earth observation, meteorological, weather

HIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.

High Resolution Downscaled Climate Data for Southeast Alaska

agriculture, climate, coastal, earth observation, environmental, sustainability, weather

This dataset contains historical and projected dynamically downscaled climate data for the Southeast region of the State of Alaska at 1 km and 4 km spatial resolution and hourly temporal resolution. Select variables are also summarized at daily resolution. These data were produced using the Weather Research and Forecasting (WRF) model (Version 4.0). We downscaled Climate Forecast System Reanalysis (CFSR) historical reanalysis data (1980-2019) as well as historical and projected runs from two GCMs from the Coupled Model Inter-comparison Project 5 (CMIP5): GFDL-CM3 and NCAR-CCSM4 (historical ru...

Homeland Security and Infrastructure US Cities

disaster response, elevation, geospatial, lidar

The U.S. Cities elevation data collection program supported the US Department of Homeland Security Homeland Security and Infrastructure Program (HSIP). As part of the HSIP Program, there were 133+ U.S. cities that had imagery and LiDAR collected to provide the Homeland Security, Homeland Defense, and Emergency Preparedness, Response and Recovery (EPR&R) community with common operational, geospatially enabled baseline data needed to analyze threat, support critical infrastructure protection and expedite readiness, response and recovery in the event of a man-made or natural disaster. As a pa...

Hubble Space Telescope

astronomy

The Hubble Space Telescope (HST) is one of the most productive scientific instruments ever created. This dataset contains calibrated and raw data for all currently active instruments on HST: ACS, COS, STIS, WFC3, and FGS.

Hybrid statistical-dynamic downscaling based on multi-model ensembles in Southeast Asia

climate, netcdf, precipitation

GCMs under CMIP6 have been widely used to investigate climate change impacts and put forward associated adaptation and mitigation strategies. However, the relatively coarse spatial resolutions (usually 100~300km) preclude their direct applications at regional scales, which are exactly where the analysis (e.g., hydrological model simulation) is performed. To bridge this gap, a typical approach is to ‘refine’ the information from GCMs through regional climate downscaling experiments, which can be conducted statistically, dynamically, or a combination thereof. Statistical downscaling establishes ...

ISERV

earth observation, environmental, geospatial, satellite imagery

ISS SERVIR Environmental Research and Visualization System (ISERV) was a fully-automated prototype camera aboard the International Space Station that was tasked to capture high-resolution Earth imagery of specific locations at 3-7 frames per second. In the course of its regular operations during 2013 and 2014, ISERV's camera acquired images that can be used primarily in environmental and disaster management.

Image localization - fast.ai datasets

computer vision, deep learning, machine learning

Some of the most important datasets for image localization research, including Camvid and PASCAL VOC (2007 and 2012). This is part of the fast.ai datasets collection hosted by AWS for convenience of fast.ai students. See documentation link for citation and license details for each dataset.

Imaging BSD licensed data and models

biodiversity, Biohub, bioinformatics, biology, biomolecular modeling, brain images, cell biology, cell imaging, imaging, life sciences, machine learning, microscopy, model, protein, zarr

This dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at Biohub and is made available as part of these datasets only when it is not publicly accessible or requires transformations to support model training.

Usage examples

Documentation for Cytoland by Biohub
Cytoland: robust virtual staining of landmark organelles by Liu, Hirata-Miyasaki, et al.
Quickstart Tutorial for Cytoland by Biohub

See 3 usage examples →

InRad COVID-19 X-Ray and CT Scans

bioinformatics, coronavirus, COVID-19, health, life sciences, medicine, SARS

This dataset is a collection of anonymized thoracic radiographs (X-Rays) and computed tomography (CT) scans of patients with suspected COVID-19. Images are accompanied by a positive or negative diagnosis for SARS-CoV-2 infection via RT-PCR. These images were provided by Hospital das Clínicas da Universidade de São Paulo, Hospital Sirio-Libanes, and by Laboratory Fleury.

K2 Mission Data

astronomy

The K2 mission observed 100 square degrees for 80 days each across 20 different pointings along the ecliptic, collecting high-precision photometry for a selection of targets within each field. The mission began when the original Kepler mission ended due to loss of the second reaction wheel in 2013.

KITTI Vision Benchmark Suite

autonomous vehicles, computer vision, deep learning, machine learning, robotics

Dataset and benchmarks for computer vision research in the context of autonomous driving. The dataset has been recorded in and around the city of Karlsruhe, Germany using the mobile platform AnnieWay (VW station wagon) which has been equipped with several RGB and monochrome cameras, a Velodyne HDL 64 laser scanner as well as an accurate RTK corrected GPS/IMU localization unit. The dataset has been created for computer vision and machine learning research on stereo, optical flow, visual odometry, semantic segmentation, semantic instance segmentation, road segmentation, single image depth predic...

Kepler Mission Data

astronomy

The Kepler mission observed the brightness of more than 180,000 stars near the Cygnus constellation at a 30 minute cadence for 4 years in order to find transiting exoplanets, study variable stars, and find eclipsing binaries.

MIMIC-IV Clinical Database Demo

The Medical Information Mart for Intensive Care (MIMIC)-IV database comprises deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access r...

MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset

The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
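
As a rough sizing aid, the recording parameters quoted above (12 leads, 10 seconds, 500 Hz) determine how many samples each ECG holds. The sketch below only does that arithmetic; the function names are illustrative and are not part of any MIMIC tooling, and the on-disk file format is not assumed.

```python
# Recording parameters as stated in the dataset description.
LEADS = 12
DURATION_S = 10
SAMPLE_RATE_HZ = 500


def samples_per_lead(duration_s: int = DURATION_S, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples recorded on a single lead."""
    return duration_s * rate_hz


def samples_per_ecg(leads: int = LEADS) -> int:
    """Total number of samples across all leads of one ECG."""
    return leads * samples_per_lead()


print(samples_per_lead())  # 5000
print(samples_per_ecg())   # 60000
```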

MISR Level 1B2 Ellipsoid Data V004

atmosphereclimatecyclone typhoon hurricanedatacenterearth observationgloballandopendaporbit

MI1B2E_004 is the Multi-angle Imaging SpectroRadiometer (MISR) Level 1B2 Ellipsoid Data Version 4 product. It contains Ellipsoid-projected Top-of-Atmosphere (TOA) Radiance, resampled at the surface and topographically corrected, as well as geometrically corrected by PGE22. Data collection for this product is ongoing. MISR itself is an instrument designed to view Earth with cameras pointed in 9 different directions. As the instrument flies overhead, each piece of Earth's surface below is successively imaged by all 9 cameras, in each of 4 wavelengths (blue, green, red, and near-infrared). The...

MegaScenes

benchmarkcomputer visiondeep learninginternet

The MegaScenes Dataset is an extensive collection of around 430k scenes, featuring over 100k structure-from-motion reconstructions and over 2 million registered images. MegaScenes includes a diverse array of scenes, such as minarets, building interiors, statues, bridges, towers, religious buildings, and natural landscapes. The images of these scenes are captured under varying conditions, including different times of day, various weather and illumination, and from different devices with distinct camera intrinsics.

Usage examples

MegaScenes: Scene-Level View Synthesis at Scale by Tung J., Chou G., Cai R., Yang G., Zhang K., Wetzstein G., et al.

See 3 usage examples →

MetaGraph Sequence Indexes

analysis ready databiodiversitybioinformaticsbiologyfastagenomegenomicgraphinformation retrievallife sciencesmedicinemetagenomicsmicrobiometranscriptomicswhole exome sequencingwhole genome sequencing

The MetaGraph Sequence Indexes dataset comprises full-text searchable index files for raw sequencing data hosted in major public repositories. These include the European Nucleotide Archive (ENA) managed by the European Bioinformatics Institute (EMBL-EBI), the Sequence Read Archive (SRA) maintained by the National Center for Biotechnology Information (NCBI), and the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA). All index files can be used with the MetaGraph framework for sequence search. Indexes can be jointly used for aggregated search in the cloud or can be individually downloaded...

Usage examples

Usage within AWS by Oleksandr Kulkov
A global metagenomic map of urban microbiomes and antimicrobial resistance by Danko D, Bezdan D, Afshin EE, Ahsanuddin S, Bhattacharya C, Butler DJ, Chng KE, Donnellan D, Hecht J, Jackson K, Kuchin K, Karasikov M, Lyons A, Mak L, Meleshko D, Mustafa H, et al.
CloudFormation stack with a Step Function for dataset queries via AWS Batch by Oleksandr Kulkov

See 3 usage examples →

Metagenomic reference libraries for Slacken

bioinformaticsbiologygenomiclife sciencesmetagenomicsmicrobiome

Metagenomic indexes for use with the Slacken taxonomic classification tool.

Usage examples

Slacken by Johan Nyström-Persson, Nishad Bapatdhar
Precise and scalable metagenomic profiling with sample-tailored minimizer libraries by Johan Nyström-Persson, Nishad Bapatdhar and Samik Ghosh
Classifying metagenomic samples on AWS ElasticMapReduce by Johan Nyström-Persson

See 3 usage examples →

Model Benchmarking

benchmarkBiohubbiologybiomolecular modelingcell biologylife sciencesmachine learningmodel

This dataset includes data and models relevant to benchmarking multimodal biological models. The data has been sourced and curated by a team of experts at Biohub and is provided as part of these datasets only when it is not publicly available or requires transformation to support effective model benchmarking.

Usage examples

Tabula Sapiens reveals transcription factor expression, senescence effects, and sex-specific features in cell types from 28 human organs and tissues by Tabula Sapiens Consortium et al.
The molecular evolution of spermatogenesis across mammals by Murat, F., et al.
Evaluating SubCell and Related Imaging Models by Biohub

See 3 usage examples →

Multimedia Commons

computer visionmachine learningmultimediavideo

The Multimedia Commons is a collection of audio and visual features computed for the nearly 100 million Creative Commons-licensed Flickr images and videos in the YFCC100M dataset from Yahoo! Labs, along with ground-truth annotations for selected subsets. The International Computer Science Institute (ICSI) and Lawrence Livermore National Laboratory are producing and distributing a core set of derived feature sets and annotations as part of an effort to enable large-scale video search capabilities. They have released this feature corpus into the public domain, under Creative Commons License 0, s...

NASA 1993_AN_NASA Project

elevationice

This data set contains spot elevation measurements of Arctic, Greenland, Antarctic, and Patagonia sea ice and ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation....

NASA 1993_GR_NASA Project

elevationiceradar

This data set contains depth sounder measurements of ice elevation, ice surface, ice bottom, and ice thickness over Greenland and Antarctica, acquired by the Multichannel Coherent Radar Depth Sounder (MCoRDS)....

NASA 2007_GR_NASA Project

elevationicelidar

This data set contains surface elevation data over Greenland measured by the NASA Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter....

NASA 2008_AN_UTIG Project

climateelevationiceradar

This data set contains vertical acceleration values for Antarctica using the BGM-3 Gravimeter. The data were collected by scientists working on the International Collaborative Exploration of the Cryosphere through Airborne Profiling (ICECAP) project, which is funded by the National Science Foundation (NSF) and the Natural Environment Research Council (NERC) with additional support from NASA Operation IceBridge....

NASA 2009_AK_NASA Project

elevationicelidar

This data set represents a collection of orthorectified images that were created using the NASA Ames Stereo Pipeline. The final images were obtained by processing stereo images from the IceBridge DMS L0 Raw Imagery data set, along with NASA's Land, Vegetation, and Ice Sensor (LVIS) and Airborne Topographic Mapper (ATM) lidar data from the IceBridge LVIS L2 Geolocated Surface Elevation Product and IceBridge ATM L1B Elevation and Return Strength data sets, respectively. The closely related data set IceBridge DMS L3 Ames Stereo Pipeline Photogrammetric DEM provides the corresponding digital e...

NASA 2009_AK_UAF Project

icelidar

This data set contains flight reports from NASA Operation IceBridge Greenland, Arctic, Antarctic, and Alaska missions. Flight reports contain information on region, mission, aircraft model, flight data, purpose of flight, and on-board sensors. The flight reports were collected as part of Operation IceBridge funded aircraft survey campaigns. The corresponding flight lines can be found in the IceBridge L1B Thinned Flight Lines (IPFLT1B) data set....

NASA 2009_AN_CRESIS Project

elevationiceradar

This data set contains depth sounder measurements of ice elevation, ice surface, ice bottom, and ice thickness for Greenland and Antarctica taken from the Multichannel Coherent Radar Depth Sounder (MCoRDS). The data were collected as part of Operation IceBridge funded aircraft survey campaigns....

NASA 2009_AN_NASA Project

earth observationelevationicelidarradar

This data set contains radar echograms taken over Greenland and Antarctica using the Center for Remote Sensing of Ice Sheets (CReSIS) Accumulation Radar instrument. The data were collected as part of Operation IceBridge funded campaigns....

NASA 2009_GR_NASA Project

iceradar

This data set contains Greenland ice thickness measurements acquired using the Pathfinder Advanced Radar Ice Sounder (PARIS). The data were collected as part of Operation IceBridge funded campaigns....

NASA 2010_AN_NASA Project

earth observationelevationiceradar

This data set contains Level-3 tomographic ice thickness measurements and ice thickness errors over areas of Greenland and Antarctica. Two of the data files additionally provide bed elevation measurements. The data were derived from measurements taken by the Center for Remote Sensing of Ice Sheets (CReSIS) Multichannel Coherent Radar Depth Sounder (MCoRDS) instrument and were collected as part of NASA Operation IceBridge funded campaigns....

NASA 2010_AN_UTIG Project

climateelevationicelidar

This data set contains geolocated surface elevation measurements captured over Antarctica using the Sigma Space Mapping Photon Counting Lidar and Riegl Laser Altimeter. The data were collected by scientists working on the International Collaborative Exploration of the Cryosphere through Airborne Profiling (ICECAP) project, which was funded by the National Science Foundation (NSF), the Antarctic Climate and Ecosystems Collaborative Research Center, and the Natural Environment Research Council (NERC) with additional support from NASA Operation IceBridge....

NASA 2010_GR_NASA Project

elevationice

This data set contains reprocessed images depicting labels that indicate the sea ice surface category, created by processing IceBridge DMS L0 Raw Imagery with the Open Source Sea-ice Processing Algorithm. The images are provided as TIFF files (.tif). Additional metadata are provided as CSV text files (.csv), which are available as a single zip file named RDSISC4_metadata.zip. An orthorectified version of this data set is available as IceBridge-Related DMS-Derived L4 Sea Ice Surface Cover Classification Orthorectified Images....

NASA 2011_AN_NASA Project

elevationice

This data set contains surface temperature measurements of Arctic and Antarctic sea ice and land ice acquired by the Heitronics KT19.85 Series II Infrared Radiation Pyrometer. The instrument is operated by the Wallops Flight Facility (WFF) as part of the ATM instrument suite. The data were collected as part of the Operation IceBridge funded survey campaigns....

NASA 2011_AN_UTIG Project

climateelevationicelidar

This data set contains geolocated photon elevations captured over Antarctica using the Sigma Space photon counting lidar. The data were collected by scientists working on the International Collaborative Exploration of the Cryosphere through Airborne Profiling (ICECAP) project, which was funded by the National Science Foundation (NSF), the Antarctic Climate and Ecosystems Collaborative Research Center, and the Natural Environment Research Council (NERC) with additional support from NASA Operation IceBridge....

NASA 2012_AN_NASA Project

earth observationiceradar

This data set contains radar echograms taken from the Center for Remote Sensing of Ice Sheets (CReSIS) ultra Multichannel Coherent Radar Depth Sounder (MCoRDS) over land and sea ice in the Arctic and Antarctic....

NASA 2012_AN_UTIG Project

ice

This data set contains vertical acceleration values for Antarctica using the CMG 1A dynamic gravity meter. The data were collected by scientists working on the Investigating the Cryospheric Evolution of the Central Antarctic Plate (ICECAP) project, which is funded by the National Science Foundation (NSF) and the Natural Environment Research Council (NERC) with additional support from NASA Operation IceBridge....

NASA 2012_GR_GBMF Project

This data set contains Greenland and Antarctica gravity measurements taken from the Sander Geophysics AIRGrav airborne gravity system....

NASA 2013_AK_UAF Project

elevationicelidarradar

This data set contains radar echograms acquired by the University of Alaska Fairbanks High-Frequency Radar Sounder over select glaciers in Alaska....

NASA 2013_AN_NASA Project

elevationicelidar

This data set contains spot elevation measurements of Arctic and Antarctic sea ice, and Greenland, Antarctic Peninsula, and West Antarctic region ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation. The data were collected as part of Operation IceBridge funded aircraft survey campaigns....

NASA 2015_AK_UAF Project

elevationicelidarradar

This data set contains radar echograms acquired by the Arizona Radio-Echo Sounder (ARES) over select glaciers in Alaska....

NASA 2015_AN_NASA Project

icelidar

This data set contains geotagged images captured by NASA Digital Mapping Cameras, which were mounted alongside the Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter....

NASA 2017_AN_NASA Project

elevationice

This data set contains spot elevation measurements with corresponding waveforms of Arctic and Antarctic sea ice, and Greenland, Antarctic Peninsula, and West Antarctic region ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation. The data were collected as part of Operation IceBridge funded aircraft survey campaigns....

NASA 2017_GR_NASA Project

elevationicelidar

This data set contains gravity measurements taken over Greenland and Antarctica by the Lamont-Doherty Earth Observatory (LDEO) Gravimeter Suite. The data were collected as part of Operation IceBridge funded campaigns....

NASA 2018_AN_NASA Project

elevationicelidar

This data set contains geolocated waveforms of Greenland, Arctic, and Antarctic sea ice measured by the Airborne Topographic Mapper (ATM) near-infrared (NIR) lidar. The data complement, and are intended to be used with, the IceBridge Narrow Swath ATM L1B Elevation and Return Strength with Waveforms data, which are measured at green wavelength. The data were acquired as part of aircraft survey campaigns funded by Operation IceBridge....

NASA ABLE-2 Project

carbonlidar

ABLE-2A_Aerosol_AircraftInSitu_Electra_Data is the in-situ aerosol data collected onboard the NASA Electra aircraft during the Amazon Boundary Layer Experiment - 2A (ABLE-2A) suborbital campaign. Data collection for this product is complete. From 1983-2001, NASA conducted a collection of field campaigns as part of the Global Tropospheric Experiment (GTE). Among those were the Amazon Boundary Layer Experiment (ABLE 2) campaigns. ABLE 2 was divided into two sub-campaigns, ABLE 2A (dry season) and ABLE 2B (wet season). ABLE 2A took place from July-August 1985, while ABLE 2B took place from April-...

NASA ABLE-3 Project

carbonclimatelidar

ABLE-3A_TraceGas_AircraftInSitu_Electra_Data is the in-situ trace gas data collected onboard the NASA Electra aircraft during the Arctic Boundary Layer Expedition - 3A (ABLE-3A) suborbital campaign. Data using grab samples, gas chromatography, and Laser Induced Fluorescence (LIF) are featured in this collection. Data collection for this product is complete. From 1983-2001, NASA conducted a collection of field campaigns as part of the Global Tropospheric Experiment (GTE). Among those were the Arctic Boundary Layer Expedition (ABLE 3) campaigns. ABLE 3 was broken into two sub-campaigns: ABLE 3A ...

NASA ABoVE Project

atmospherecarbonclimatecogearth observationelevationgeospatialhydrologyiceland coverlidarnetcdfoceansprecipitationradarsatellite imagerysoil moisturestacweather

This document presents the Concise Experiment Plan for NASA's Arctic-Boreal Vulnerability Experiment (ABoVE) to serve as a guide to the Program as it identifies the research to be conducted under this study. Research for ABoVE will link field-based, process-level studies with geospatial data products derived from airborne and satellite remote sensing, providing a foundation for improving the analysis and modeling capabilities needed to understand and predict ecosystem responses and societal implications. The ABoVE Concise Experiment Plan (ACEP) outlines the conceptual basis for the Field C...

NASA ACCP Project

carbon

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA ACRIM III Project

atmosphereclimatesatellite imagery

Launch and mission info for NASA's AcrimSat Earth satellite, which for 14 years monitored solar radiation and its effects on Earth's atmosphere and climate change....

NASA ACT-America Project

atmospherecarbonearth observationelevationhdfhdf5lidarnetcdfweather

The ACT-America Campaign Catalog provides information about the airborne campaigns of the Atmospheric Carbon and Transport (ACT-America) project. ACT-America advanced atmospheric greenhouse gas inversions to a high level of accuracy and precision through new methods and models that improved knowledge of atmospheric transport, prior flux models, and space-based observations. The catalog compiles flight details for the five campaigns conducted during Summer 2016, Winter 2017, Fall 2017, Spring 2018, and Summer 2019 (2016-05-27 to 2019-07-26) across three regions of the eastern and central United...

NASA AEHYP Project

climateearth observation

The Airborne Hyperspectral Reflectance Indian Cave Nebraska Multi-Day (AEHYPICNE1M) data are from the Indian Cave Forest Global Earth Observatory (ForestGeo) plot in Indian Cave State Park in southeastern Nebraska. The data have a spatial resolution of 1 meter (m) and fall in the spectral range of 400-1000 nanometers (nm). The data can be used by researchers in developing new capabilities for the remote sensing of forest diversity and function, as well as a global biodiversity monitoring system. The Indian Cave ForestGeo plot airborne data were collected on September 6, 2019, and August 4, 202...

NASA AQUARIUS SAC-D Project

iceoceanssatellite imagery

The version 5.0 Aquarius CAP Level 2 product contains the fourth release of the AQUARIUS/SAC-D orbital/swath data based on the Combined Active Passive (CAP) algorithm. CAP is a P.I. produced dataset developed and provided by JPL. This Level 2 dataset contains sea surface salinity (SSS), wind speed and wind direction data derived from 3 different radiometers and the onboard scatterometer. The CAP algorithm simultaneously retrieves the salinity, wind speed and direction by minimizing the sum of squared differences between model and observations. The main improvements in CAP V5.0 relative to the ...
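
The retrieval idea described above, choosing salinity and wind values that minimize the sum of squared differences between model and observations, can be illustrated with a toy example. This is a minimal sketch under loud assumptions: the linear `forward_model`, its coefficients, and the grid search are hypothetical stand-ins and do not represent the actual CAP algorithm or its geophysical model.

```python
from itertools import product


def forward_model(salinity, wind_speed):
    """Hypothetical toy forward model: maps (salinity, wind speed) to two simulated observations."""
    return (0.5 * salinity + 0.1 * wind_speed,
            0.2 * salinity + 1.5 * wind_speed)


def cost(params, observed):
    """Sum of squared differences between modeled and observed values."""
    predicted = forward_model(*params)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def retrieve(observed, salinities, wind_speeds):
    """Return the (salinity, wind_speed) grid point with the smallest cost."""
    return min(product(salinities, wind_speeds),
               key=lambda params: cost(params, observed))


# Candidate grids (assumed ranges, for illustration only).
salinities = [30 + 0.5 * i for i in range(21)]   # 30.0 to 40.0
wind_speeds = [0.5 * i for i in range(41)]       # 0.0 to 20.0

# Simulate noise-free observations from a known state, then recover that state.
observed = forward_model(35.0, 7.5)
print(retrieve(observed, salinities, wind_speeds))
```

A grid search keeps the sketch dependency-free; a production retrieval would more plausibly use a continuous optimizer and a physically based model.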

NASA ARCSIX Project

atmosphereearth observationicelidarprecipitationradar

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA ASCENDS Airborne Project

carbonhdflidar

This dataset provides in situ airborne measurements of atmospheric carbon dioxide (CO2) over California and Nevada on February 10-11, 2016. Measurements were taken onboard a DC-8 aircraft during this Active Sensing of CO2 Emissions over Nights, Days and Seasons (ASCENDS) airborne deployment. CO2 was measured with NASA's Atmospheric Vertical Observations of CO2 in the Earth's Troposphere (AVOCET) instrument while over California and Nevada. The objective of this deployment was to assess the performance of the 2016 version of the CO2 Sounder LiDAR. The two flights were flown to compare r...

NASA ASIA-AQ Project

earth observationlidarsatellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA ASTER GED Project

hdfhdf5icesatellite imagery

The AG1kmB Version 3 dataset was decommissioned as of December 14, 2016. Users are encouraged to use the ASTER Global Emissivity Dataset 1-kilometer AG1km dataset in HDF5. The Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Emissivity Dataset (GED) land surface temperature and emissivity (LST&E) data products are generated using the ASTER Temperature Emissivity Separation (TES) algorithm with a Water Vapor Scaling (WVS) atmospheric correction method using Moderate Resolution Imaging Spectroradiometer (MODIS) MOD07 atmospheric profiles and the MODerate sp...

NASA ATDD Project

elevationhdficeoceanssatellite imagery

This is a subset of the AMSR-E rain rate product along the CloudSat field of view track. The goal of the subset is to select and return AMSR-E data that are within ±100 km across the CloudSat track. Thus the resultant subset swath is 45 pixels cross-track. Apart from that, all efforts are made to preserve the original HDF-EOS formatting of the source full-sized data. The Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) instrument on the NASA EOS Aqua satellite provides global passive microwave measurements of terrestrial, oceanic, and atmospheric variables for the investigation of ...

NASA ATLAS Project

atmospheresatellite imagery

The Shuttle Solar Backscatter Ultraviolet (SSBUV) Level-2 Ozone data are available for eight space shuttle missions flown between 1989 and 1996. SSBUV, a successor to the SBUV flown on the Nimbus-7 satellite, is nearly identical to the SBUV/2 instruments flying on the NOAA satellites. Data are available in the ASCII AMES text format. Ozone profiles of the upper atmosphere and total column ozone values are available for the following time periods: Flight #1: 1989 October 19, 20, 21. Flight #2: 1990 October 7, 8, 9. Flight #3: 1991 August 3, 4, 5, 6. Flight #4: 1992 March 29, 31. Flight #5: 1993...

NASA ATOMIC Project

atmospherenetcdfoceans

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA ATom Project

atmospherecarbonclimateicenetcdfoceansprecipitationsatellite imagery

This dataset provides observations collected during eleven airborne campaigns from 2006–2017 and associated input and output from nine widely used chemical transport models (CTMs). The airborne campaigns include ARCTAS-A, ARCTAS-B, ATom-1 and ATom-2, CalNex, DC3, INTEX-B, KORUS-AQ, MILAGRO, SEAC4RS, and WINTER, and they sampled mainly tropospheric air over the conterminous U.S. and the state of Alaska, Mexico, Canada, Greenland, and South Korea and remote areas over the Arctic, Pacific, Southern, and Atlantic Oceans. The CTMs are the AM4.1, CCSM4, GEOS-5, GEOS-Chem TOMAS, GEOS-Chem v10, ...

NASA AVIRIS Project

atmospherecarbongeospatialicenetcdf

This dataset provides attributed geospatial and tabular information for identifying and querying flight lines of interest for the Airborne Visible InfraRed Imaging Spectrometer-Classic (AVIRIS-C) and Airborne Visible InfraRed Imaging Spectrometer-Next Generation (AVIRIS-NG) Facility Instrument collections. It includes attributed shapefile and GeoJSON files containing polygon representations of individual flight lines for all years and separate KMZ files for each year. These files allow users to visualize and query flight line locations using Geographic Information System (GIS) software. Tables...

NASA AVISO Project

climatesatellite imagery

This dataset contains absolute dynamic topography (similar to sea level but with respect to the geoid) binned and averaged monthly on 1 degree grids. The coverage is from October 1992 to December 2010. These data were provided by AVISO (French space agency data provider) to support the CMIP5 (Coupled Model Intercomparison Project Phase 5) under the World Climate Research Program (WCRP) and were first made available via the JPL Earth System Grid. The dynamic topography is derived from sea surface height measured by several satellites including Envisat, TOPEX/Poseidon, Jason-1 and OSTM/Jason-2, ...

NASA AVUELO Project

atmospherenetcdfoceans

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA AfriSAR Project

carbonearth observationelevationicelidarradar

This data set contains geotagged images collected over Gabon, Africa. The images were taken by the NASA Digital Mapping Camera paired with the Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter. The data were collected as part of a NASA campaign, in collaboration with the European Space Agency (ESA) mission AfriSAR....

NASA AirMOSS Project

carbonprecipitationradarsoil moisture

This dataset provides in situ measurements of soil temperature, moisture, and conductivity, along with measured tree diameter at breast height (DBH) and total tree height, collected at the Harvard Forest, Petersham, Massachusetts, USA, during October 2012 and July - August 2013. These measurements were collected in support of the Airborne Microwave Observatory of Subcanopy and Subsurface (AirMOSS) project to validate root-zone soil measurements and carbon flux model estimates....

NASA Applications Technology Satellite Project

satellite imagery

GVHRRATS6IMIR is the Geosynchronous Very High Resolution Radiometer (GVHRR) Black and White Infrared Images on 70mm Film data product from the sixth Applications Technology Satellite (ATS-6). This set of IR imagery (10.5 to 12.5 micrometer, with an 11 km footprint at the sub-satellite point) was originally produced on commercial image-generation equipment from digital tapes and was made available on 70-mm film, from which they were later scanned to digital TIFF image files. Each TIFF scan contains 2 or 3 pictures, and there are several hundred scans from an original 70 mm film roll which are c...

NASA Aqua Project

atmospherecarbonclimateearth observationelevationhdficeland covernetcdfoceansprecipitationsatellite imagerysoil moistureweather

AIRS is a facility instrument whose goal is to support climate research and improve weather forecasting. Launched into Earth orbit on May 4, 2002, the Atmospheric Infrared Sounder, AIRS, moves climate research and weather prediction into the 21st century....

NASA Aura Project

atmospherecarbonclimateelevationhdfhdf5iceland covernetcdfsatellite imagery

Aura (Latin for breeze) obtains measurements of ozone, aerosols and key gases throughout the atmosphere....

NASA BOREAS Project

atmospherecarbonclimateearth observationhydrologyiceland coverlidaroceansprecipitationradarsatellite imagerysoil moistureweather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA BigFoot Project

carbonearth observationland coversatellite imagery

The BigFoot project gathered field data for selected EOS Land Validation Sites in North America from 1999 to 2003. Data collected and derived for varying intervals at the BigFoot sites and archived with this data set include FPAR, nitrogen content, allometry equations, root biomass, LAI, tree biomass, soil respiration, NPP, landcover images, and vegetation inventories. Each site is representative of one or two distinct biomes, including the Arctic tundra; boreal evergreen needleleaf forest; temperate cropland, grassland, and deciduous broadleaf forest; desert grassland and shrubland. The project co...

NASA BioSCape Project

elevationiceland coverlidarnetcdfsatellite imagery

BioSCape...

NASA Biodiversity Project

elevationland coversatellite imagery

This dataset contains vegetation canopy metrics for the Greater Kruger National Park region of South Africa for 2007-2010 and 2015-2024. Metrics include relative height 98th percentile (RH98), fractional canopy cover, and foliage height diversity. They were derived by modeling a sample of Global Ecosystem Dynamics Investigation (GEDI) Level 2A Elevation and Height Metrics and L...

NASA CALIPSO Project

atmosphereclimateearth observationhdficelidarnetcdfsatellite imageryweather

Earth Orbiter...

NASA CAR Project

atmospherecarbonclimateearth observationiceoceanssatellite imageryweather

CAR will fly in 2022-2025 for NASA’s Student Airborne Science Activation (SaSa) project. GSFC scientists and engineers will operate CAR together with...

NASA CARAFE Project

carbon

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA CARVE Project

atmospherecarbonelevationicenetcdfsatellite imageryweather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA CERES Project

atmosphereclimateelevationhdficenetcdfoceanssatellite imagery

CER_BDS_Terra-FM2_Edition4 is the Clouds and the Earth's Radiant Energy System (CERES) Bidirectional Scans (BDS) Terra Flight Model 2 (FM2) Edition 4 data product, which is collected using the CERES-FM2 instrument on the Terra platform. CER_BDS_Terra-FM2_Edition4 includes geolocated and calibrated Top of the Atmosphere (TOA) filtered radiances and other instrument data. Data collection for this product is ongoing. Each CERES BDS data product contains twenty-four hours of Level-1B data for each CERES scanner instrument mounted on each spacecraft. BDS includes samples of normal and short Ear...

NASA CIOSS Project

The Gridded Altimeter Fields with Enhanced Coastal Coverage data product contains Sea Surface Height Anomalies (SSHA or SLA) and zonal and meridional geostrophic velocities for the US west coast encompassing 35.25 deg-48.5 deg N latitude and 227.75 deg-248.5 deg E longitude. This annually updated data product extends from October 14, 1992 through November 4, 2009. SSHA and current velocities are derived from the AVISO quarter degree DT UPD MSLA version 3.0 grids, 0.75 deg and greater away from the coast. Values within 0.75 deg of the coast are derived from tide gauge observations and interpola...

NASA CITE-2 Project

oceans

CITE-2_Aerosol_AircraftInSitu_Electra_Data is the in-situ trace gas data collected onboard the NASA Electra aircraft during the Chemical Instrumentation Test and Evaluation - 2 (CITE-2) suborbital campaign. Data using forward scattering spectrometers are featured in this collection. Data collection for this product is complete. During 1983-2001, NASA conducted a collection of field campaigns as a part of the Global Tropospheric Experiment (GTE) to develop advanced instrumentation to measure critical atmospheric trace gases and quantify their sources, sinks, and distribution. Among those were the CITE mis...

NASA CITE-3 Project

atmospherecarbonoceans

CITE-3_Aerosol_AircraftInSitu_Electra_Data is the in-situ trace gas data collected onboard the NASA Electra aircraft during the Chemical Instrumentation Test and Evaluation - 3 (CITE-3) suborbital campaign. Data using chemiluminescence are featured in this collection. Data collection for this product is complete. During 1983-2001, NASA conducted a collection of field campaigns as a part of the Global Tropospheric Experiment (GTE) to develop advanced instrumentation to measure critical atmospheric trace gases and quantify their sources, sinks, and distribution. Among those were the Chemical Instrumentatio...

NASA CLASIC07 Project

land coverradarsatellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA CMS Project

atmospherecarbonclimatecogearth observationelevationgeospatialhydrologyiceland coverlidarnetcdfoceansprecipitationradarsatellite imagerysoil moistureweather

This dataset provides gridded estimates of aboveground biomass (AGB) for live dry woody vegetation density in the form of both stock for the baseline year 2003 and annual change in stock from 2003 to 2016. Data are at a spatial resolution of approximately 500 m (463.31 m; 21.47 ha) for three geographies: the biogeographical limit of the Amazon Basin, the country of Mexico, and a Pantropical belt from 40 degrees North to 30 degrees South latitudes. Estimates were derived from a multi-step modeling approach that combined field measurements with co-located LiDAR data from NASA ICESat Geoscience L...
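As a quick check of the stated grid size, the per-pixel area follows from squaring the side length. The sketch below assumes square pixels with a 463.31 m side; the function name is illustrative and is not part of the dataset or its tooling.

```python
def pixel_area_ha(side_m: float) -> float:
    """Area in hectares of a square pixel with the given side length in meters."""
    square_meters = side_m ** 2
    return square_meters / 10_000.0  # 1 hectare = 10,000 square meters


# Side length quoted in the dataset description above.
area = pixel_area_ha(463.31)
print(f"{area:.2f} ha")
```

Rounded to two decimals, the result agrees with the 21.47 ha figure quoted in the description.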

NASA COMEX Project

hdf, oceans, satellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA COWVR-TEMPEST/STP-H8 Project

hdf, hdf5, oceans, satellite imagery

This data set includes satellite-based observations of calibrated, geo-located antenna temperature and brightness temperatures, along with the sensor telemetry used to derive those values. Brightness temperatures are derived from the microwave band frequencies 18.7 GHz, 23.8 GHz, and 34.5 GHz. This product is best suited for a cal/val user or sensor expert. These level 1c measurements make up the temperature sensor data record (TSDR) from the COWVR (Compact Ocean Wind Vector Radiometer) sensor aboard the International Space Station (ISS), starting in January 2022 forward-streaming to PO.DAAC till ...

NASA CSDA Project

geospatial, ice, satellite imagery

The GeoEye-1 Level 1B Multispectral 4-Band L1B Satellite Imagery collection contains satellite imagery acquired from Maxar Technologies (formerly known as DigitalGlobe) by the Commercial Smallsat Data Acquisition (CSDA) Program. Imagery is collected by the GeoEye-1 satellite using the GeoEye-1 Imaging System across the global land surface from September 2008 to the present. This satellite imagery is in the visible and near-infrared waveband range with data in the blue, green, red, and near-infrared wavelengths. The imagery has a spatial resolution of 1.84m at nadir (1.65m before summer 2013) a...

NASA CWIC Project

oceans, precipitation, satellite imagery

The MODIS Near Real Time (NRT) product, MOD09Q1N, provides Band 1 and 2 data at 250 meter resolution in a daily rolling 8-day gridded level-3 product in the Sinusoidal projection. Each MOD09Q1N pixel contains the best possible L2G observation during an 8-day period as selected on the basis of high observation coverage, low view angle, the absence of clouds or cloud shadow, and aerosol loading. Science Data Sets provided for this product include reflectance values for Bands 1 and 2 and a quality rating....

NASA CYGNSS Project

atmosphere, climate, ice, netcdf, oceans, precipitation, radar, satellite imagery, soil moisture, weather

This dataset contains the version 1.0 CYGNSS level 3 ocean microplastic concentration data record, which provides 18 netCDF files, each containing one month of daily gridded maps of microplastic number density (#/km^2). Microplastic concentration number density is indirectly estimated by an empirical relationship between ocean surface roughness and wind speed (Evans and Ruf, 2021). User caution is advised in regions containing independent, non-correlative factors affecting ocean surface roughness, such as anomalous atmospheric conditions within the Intertropical Convergence Zone, biogenic surf...

NASA Climate Project

atmosphere, carbon, climate, earth observation, ice, netcdf, precipitation, satellite imagery, weather

This dataset provides two 30-year climate normal data products for conditions during the last glacial maximum (LGM; ~18,000 years ago) and a modern time period (1975-2005) for the entire state of Alaska. The first set of products are monthly climate variable averages at 60 m resolution, including: minimum, maximum, and average temperatures, total precipitation, total surface radiation, rain, snow, potential evapotranspiration (PET), actual evapotranspiration (AET), and water deficit. The second set of products are annual summary climate variable averages for the same variables (excepting avera...

NASA DIS Project

atmosphere, climate, geospatial, ice, oceans

The ORNL DAAC Spatial Data Access Tool (SDAT) is a suite of Web-based applications that enable users to visualize and download spatial data in user-selected spatial/temporal extents, file formats, and projections. SDAT incorporates Open Geospatial Consortium (OGC) standard Web services, including Web Coverage Service (WCS), Web Map Service (WMS), and Web Feature Service (WFS). The SDAT provides ORNL DAAC-archived data sets and additional relevant data products including agriculture, atmosphere, biosphere, climate indicators, human dimensions, land surface, oceans, terrestrial hydrosphere data ...

NASA DSCOVR Project

atmosphere, climate, earth observation, hdf, hdf5, ice, oceans, satellite imagery, weather

The Deep Space Climate Observatory (DSCOVR) National Institute of Standards and Technology Advanced Radiometer (NISTAR) was explicitly designed to measure the global daytime radiation budget for an entire hemisphere using active cavity radiometers for three channels: total (0.2 - 100 um), SW (0.2 - 4.0 um), and near-infrared (0.7 - 4.0 um). To derive the Earth Radiation Budget (ERB) from NISTAR measurements, the Short Wave (SW) radiances need to be unfiltered first before they can be subtracted from the total to yield the Long Wave (LW) (4 - 100 um) radiances. Additionally, the Earth's ...

NASA Daymet Project

climate, ice, netcdf, precipitation, weather

This dataset provides annual climate summaries derived from Daymet Version 4 R1 daily data at a 1 km x 1 km spatial resolution for five Daymet variables: minimum and maximum temperature, precipitation, vapor pressure, and snow water equivalent. Annual averages are provided for minimum and maximum temperature, vapor pressure, and snow water equivalent, and annual totals are provided for the precipitation variable. Each data file is provided as a single year by variable and covers the same period of record as the Daymet V4 R1 daily data. The annual climatology files are derived from the larger d...

NASA Delta-X Project

atmosphere, carbon, cog, earth observation, elevation, ice, land cover, lidar, netcdf, oceans, radar, satellite imagery, stac, weather

This dataset contains estimates of forest aboveground biomass (AGB) across the Atchafalaya and Terrebonne Basins, Louisiana, US. AGB was derived from AVIRIS-NG surface reflectance and UAVSAR products. L2B BRDF-adjusted surface reflectance was produced after applying atmospheric correction to L2 Hemispherical-Directional surface reflectance from NASA's AVIRIS-NG instrument. A polarimetric decomposition of the UAVSAR Level 1 (L1) Single Look Complex (SLC) stack product was used. To estimate AGB, local pixel reflectance spectra and radar scattering component pixels coincident with in situ for...

NASA ECCO Project

atmosphere, climate, ice, oceans, satellite imagery

This dataset provides ancillary data for the ECCO Version 4 Release 4 (V4r4) ocean and sea-ice state estimate, and is intended for expert users to reproduce the state estimate. The ancillary data include documentation files, files required to initialize the model, forcing fields, binary input grid files, observational data used to constrain the model, model equivalent of observed profiles, files related to atmospheric flux-forced experiments, and some script files. Estimating the Circulation and Climate of the Ocean (ECCO) state estimates are dynamically and kinematically-consistent reconstruc...

NASA ECOSTRESS Project

atmosphere, carbon, earth observation, ice, land cover, satellite imagery

The ECO1BRAD Version 1 data product was decommissioned on May 21, 2025. Users are encouraged to use the ECO_L1CT_RAD Version 2 and ECO_L1CG_RAD Version 2 data products. The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) mission measures the temperature of plants to better understand how much water plants need and how they respond to stress. ECOSTRESS is attached to the International Space Station (ISS) and collects data globally between 52 degrees N and 52 degrees S latitudes. The ECO1BRAD Version 1 data product provides at-sensor calibrated radiance values retri...

NASA EMIT Project

carbon, cog, elevation, netcdf, radar

The Earth Surface Mineral Dust Source Investigation (EMIT) instrument measures surface mineralogy, targeting the Earth’s arid dust source regions. EMIT is installed on the International Space Station (ISS) and uses imaging spectroscopy to take mineralogical measurements of sunlit regions of interest between 52° N latitude and 52° S latitude. An interactive map showing the regions being investigated, current and forecasted data coverage, and additional data resources can be found on the VSWIR Imaging Spectroscopy Interface for Open Science (VISIONS) EMIT Open Data Portal. The EMIT Level 1B At-S...

NASA EOS LAND VAL Project

atmosphere, carbon, climate, earth observation, ice, land cover, netcdf, radar, satellite imagery, soil moisture

This data set provides field measurements of diameter, tree height, and crown dimensions for 1,513 trees in 30 plots at the La Selva Biological Station in Costa Rica. Fourteen of these plots were in undisturbed primary forest, six were in primary forest which had been selectively logged, seven were secondary forests, and three were abandoned pastures reverting to forest. The diameter and height data were used to calculate aboveground biomass for each of the 30 plots. The crown measurements were used to estimate a vertical profile for each plot, showing the vegetation volume in 1 meter incremen...

NASA EOS Project

atmosphere, earth observation, oceans, satellite imagery

AM1EPHNE is the Terra Near Real Time (NRT) 2-hour spacecraft Extrapolated ephemeris data file in native format. The file name format is the following: AM1EPHNE.Ayyyyddd.hhmm.vvv.yyyydddhhmmss where from left to right: E = Extrapolated; N = Native format; A = AM1 (Terra); yyyy = data year, ddd = Julian data day, hh = data hour, mm = data minute; vvv = Version ID; yyyy = production year, ddd = Julian production day, hh = production hour, mm = production minute, and ss = production second. Data set information: http://modis.gsfc.nasa.gov/sci_team/...
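The file name layout described above can be split into its fields with a short parser. This is a minimal sketch, not an official tool: the function name, the returned keys, and the sample file name are hypothetical, and it assumes the Version ID is exactly three alphanumeric characters and that "Julian day" means day of year.

```python
import re
from datetime import datetime

# AM1EPHNE.Ayyyyddd.hhmm.vvv.yyyydddhhmmss
NAME_PATTERN = re.compile(
    r"^AM1EPHNE\.A(?P<data_day>\d{7})\.(?P<data_time>\d{4})"
    r"\.(?P<version>[0-9A-Za-z]{3})\.(?P<production>\d{13})$"
)


def parse_am1ephne_name(name: str) -> dict:
    """Split an AM1EPHNE file name into data time, version, and production time."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an AM1EPHNE file name: {name!r}")
    production = match["production"]
    return {
        # %j is the day of year (001-366); a space separates date and time digits.
        "data_time": datetime.strptime(
            f"{match['data_day']} {match['data_time']}", "%Y%j %H%M"
        ),
        "version": match["version"],
        "production_time": datetime.strptime(
            f"{production[:7]} {production[7:]}", "%Y%j %H%M%S"
        ),
    }


# Hypothetical file name, made up for illustration only.
info = parse_am1ephne_name("AM1EPHNE.A2024045.1200.002.2024045131502")
print(info["data_time"], info["version"], info["production_time"])
```

Names that do not match the layout raise a `ValueError`, which keeps unrelated files from being silently misread.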

NASA EOSDIS Project

atmosphere, netcdf, satellite imagery

This is the version 3 Atmospheric Trace Molecule Spectroscopy (ATMOS) Level 1 product containing spectra and runlog (i.e. ) information in a netCDF format. ATMOS is an infrared spectrometer (a Fourier transform interferometer) designed to derive vertical concentrations of various trace gases in the atmosphere, particularly the ozone depleting chlorine and fluorine based molecules. The transmission spectra are ratioed from ATMOS high sun observations, on a scale of 0 to 1. Data files also include time, geolocation and other information. The data were collected during four space shuttle missions...

NASA ESDIS Project

carbon

This is a global simulation of black carbon (BC) aerosol concentrations and daily deposition (wet+dry) from the FLEX-ible PARTicle (FLEXPART) Lagrangian particle dispersion model version 10.4. The FLEXPART model code is open source and freely available....

NASA FIFE Project

atmosphere, carbon, climate, earth observation, elevation, hydrology, ice, lidar, oceans, precipitation, radar, satellite imagery, soil moisture, stac, weather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA FLASHFLUX Project

atmosphere, climate, hdf, satellite imagery

FLASH_SSF_Aqua-FM3-MODIS_Version4A is the Fast Longwave And Shortwave Radiative Fluxes (FLASHFlux) Clouds and Radiative Swath (SSF) Aqua-FM3-MODIS data in HDF Version 4A data product. This product consists of Low latency (< 5 days from observation) Top-of-Atmosphere (TOA) fluxes and parameterized surface radiative fluxes at Clouds and the Earth's Radiant Energy System (CERES) Single Scanner Footprint (SSF) level for quick-look purposes. Data collection for this product is in progress. FLASHFlux data are a product line of the CERES project designed to process and release TOA and surface ...

NASA FLDAS Project

climate, precipitation, satellite imagery, soil moisture

This dataset contains a series of land surface parameters simulated from the Noah 3.6.1 model in the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS), adapted from Land Information System (LIS7). The dataset contains 28 parameters at 0.10 degree spatial resolution from January 2019 to present. The temporal resolution is monthly and the spatial coverage is global (60S, 180W, 90N, 180E). The simulation was forced by a combination of the Global Data Assimilation System (GDAS) data and Climate Hazards Group InfraRed Precipitation with Station Preliminary ...

NASA Fluxnet Project

atmosphere, carbon, climate, soil moisture

CO2 and water vapor fluxes and ecosystem characteristics were measured at 24 sites along a 317-km transect from the Arctic coast to the latitudinal treeline in Alaska during the growing seasons of 1994-1996. The sites were stratified to sample the ranges of climate, physiography, soil moisture, and vegetation type within the region. Our main objective was to understand what factors control variations in CO2 and water vapor exchange across the region. We therefore developed a spatially extensive approach of documenting fluxes for 1-2 weeks at each of the sites in order to study as many sites as...

NASA G-LiHT Project

climate, elevation, lidar

Goddard’s LiDAR, Hyperspectral, and Thermal Imager (G-LiHT) mission is a portable, airborne imaging system that aims to simultaneously map the composition, structure, and function of terrestrial ecosystems. G-LiHT primarily focuses on a broad diversity of forest communities and ecoregions in North America, mapping aerial swaths over the Conterminous United States (CONUS), Alaska, Puerto Rico, and Mexico. The purpose of G-LiHT’s Aerial Orthomosaic data product (GLORTHO) is to provide orthorectified high-resolution aerial photography. This data is provided as a supplement to other G-LiHT data pr...

NASA GCOM-W Project

earth observation, satellite imagery, soil moisture

AMSR2/GCOM-W1 downscaled surface soil moisture (LPRM) L2B V001 is a Level 2 (swath) data set. Its land surface parameters, surface soil moisture, land surface (skin) temperature, and vegetation water content, are derived from passive microwave remote sensing data from the Advanced Microwave Scanning Radiometer 2 (AMSR2), using the Land Parameter Retrieval Model (LPRM). Each swath is packaged with associated geolocation fields. The data set covers the period from May 2012, when the Japan Aerospace Exploration Agency (JAXA) Global Change Observation Mission-1st Water GCOM-W1 satellite was launch...

NASA GEDI Project

carbon, climate, elevation, hdf, hdf5, ice, land cover, lidar, satellite imagery

GEDI Version 1 data products were decommissioned on February 15, 2022. Users are advised to use the improved GEDI01_B Version 2 data product. The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth’s carbon cycle and biodiversity. The GEDI instrument produces high resolution laser ranging observations of the 3-dimensional structure of the Earth. GEDI is attached to the International Space Station and collects data globally between 51.6 degrees N and 51.6 degrees S...

NASA GEOS-3 Project

oceans, satellite imagery

These data consist of Geos-3 altimeter measurements produced by NOAA/NODC/Laboratory for Satellite Altimetry. The dataset contains 5,006,956 altimetric sea surface heights and supporting information such as sea state, wind speed, Schwiderski ocean tide height, and Cartwright solid-tide height. Corrections for altimeter bias, wet and dry tropospheric delays, and electromagnetic bias are not included. The corrections in this dataset (tides and even orbit height) are old and not very accurate. This dataset should only be used by those with an expertise in altimetry. Measurements are compressed to ...

NASA GFSAD Project

precipitation, satellite imagery

The Landsat-Derived Global Irrigated-Cropland Product Level 1 2020 (LGRIP30_L1_IRRI) Version 2 data provides high-resolution, 30 meter (m) cropland data to assist and address food and water security issues of the twenty-first century. As an extension of the Global Food Security-support Analysis Data (GFSAD) project, LGRIP_L1_IRRI V2 maps agricultural lands by dividing them into 32 irrigated cropland types and calculates applicable cropland areas across the globe. LGRIP data are produced using Landsat 8 time-series satellite sensor data for the 2019 through 2021 time period to create a nominal ...

NASA GHISA Project

ice

The Global Hyperspectral Imaging Spectral-library of Agricultural crops (GHISA) is a comprehensive compilation, collation, harmonization, and standardization of hyperspectral signatures of agricultural crops of the world. This hyperspectral library of agricultural crops is developed for all major world crops and was collected by United States Geological Survey (USGS) and partnering volunteer agencies from around the world. Crops include wheat, rice, barley, corn, soybeans, cotton, sugarcane, potatoes, chickpeas, lentils, and pigeon peas, which together occupy about 65% of all global cropland a...

NASA GHRSST Project

atmosphere, climate, earth observation, hydrology, ice, netcdf, oceans, precipitation, satellite imagery, soil moisture, weather

CNR MED Sea Surface Temperature provides daily gap-free maps (L4) at 0.0625 deg. x 0.0625 deg. horizontal resolution over the Black Sea. The data are obtained from infra-red measurements collected by satellite radiometers and statistical interpolation. It is the CMEMS sea surface temperature nominal operational product for the Black Sea....

NASA GLDAS Project

ice, netcdf, oceans, precipitation, weather

NASA Global Land Data Assimilation System Version 2 (GLDAS-2) has three components: GLDAS-2.0, GLDAS-2.1, and GLDAS-2.2. GLDAS-2.0 is forced entirely with the Princeton meteorological forcing input data and provides a temporally consistent series from 1948 through 2014. GLDAS-2.1 is forced with a combination of model and observation data from 2000 to present. GLDAS-2.2 product suites use data assimilation (DA), whereas the GLDAS-2.0 and GLDAS-2.1 products are "open-loop" (i.e., no data assimilation). The choice of forcing data, as well as DA observation source, variable, and scheme, ...

NASA GOES Project

climate, earth observation, ice, netcdf, oceans, satellite imagery

The ABI G16 Deep Blue L3 Daily Aerosol Data, 1 x 1 degree grid product, short-name AERDB_D3_ABI_G16, is derived from the L2 (AERDB_L2_ABI_G16) input data. Each D3 ABI/GOES-16 product is produced daily at 1 x 1-degree horizontal resolution. In general, in this daily L3 (identified in the short-name as D3) aggregated product, each data field represents the arithmetic mean of all cells whose latitude and longitude place them within the bounds of each grid element. Another statistic like standard deviation is also provided in some cases. The final retrievals used in the aggregation process are Quali...

NASA GPCP Project

climate, ice, oceans, precipitation, satellite imagery, weather

These data are transitioned to a state of permanent preservation. They are available upon request. More advanced datasets have been developed since. One recommended replacement is the GPCP (doi: 10.5067/DBVUO4KQHXTK) product developed under the MEaSUREs project. The Arkin and Janowiak GPI (GOES Precipitation Index) was the infrared-based monthly rainfall estimate produced by the early GPCP (Global Precipitation Climatology Project) algorithms. The infrared observations from geostationary satellites (GOES, GMS, Meteosat) are used to produce these monthly mean rainfall totals on a 2.5 deg by 2.5...

NASA GPM Project

atmosphere, climate, earth observation, ice, netcdf, oceans, precipitation, radar, satellite imagery, weather

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. The 'CLIM' products differ from their 'regular' counterparts (without the 'CLIM' in the name) by the ancillary data they use. They are Climate-Reference products, which require homogeneous ancillary data over the climate time series. Hence, the ECMWF-Interim (European Centre for Medium-Range Weather Forecasts, 2-3 months lag behind the regular production) reanalysis is used as ancillary data to derive surface and atmospheric conditions r...

NASA GRACE Project

atmosphere, climate, hydrology, ice, netcdf, oceans, satellite imagery, soil moisture

The monthly land mass grids contain water mass anomalies given as equivalent water thickness derived from GRACE & GRACE-FO time-variable gravity observations during the specified timespan, and relative to the specified time-mean reference period. The Equivalent water thickness represents the total terrestrial water storage anomalies from soil moisture, snow, surface water (incl. rivers, lakes, reservoirs etc.), as well as groundwater and aquifers. A glacial isostatic adjustment (GIA) correction has been applied, and standard corrections for geocenter (degree-1), C20 (degree-20) and C30 (de...

NASA GRACE-DA-DM Project

netcdf, satellite imagery, soil moisture

Scientists at NASA Goddard Space Flight Center generate groundwater and soil moisture drought indicators each week. They are based on terrestrial water storage observations derived from GRACE satellite data and integrated with other observations, using a sophisticated numerical model of land surface water and energy processes. This data product is the GRACE Data Assimilation for Drought Monitoring (GRACE-DA-DM) U.S. Version 4.0 and supersedes the GRACE-DA-DM Version 2.0. The GRACE-DA-DM U.S. V4.0 is based on the Catchment Land Surface Model (CLSM) Fortuna 2.5 version simulation that w...

NASA GRACE-FO Project

atmosphere, climate, ice, netcdf, oceans, satellite imagery, soil moisture

This data set is produced by the Center for Space Research (CSR) GRACE-FO (Gravity Recovery and Climate Experiment Follow-On) program and derives the terrestrial water storage anomaly given as equivalent water thickness. These monthly grids are derived from GRACE-FO time-variable gravity observations during the specified timespan, and relative to the specified time-mean reference period. This quantity represents the total terrestrial water storage anomalies from soil moisture, snow, surface water (incl. rivers, lakes, reservoirs etc.), as well as groundwater and aquifers. A glacial isostatic adjus...

NASA GSESA Project

satellite imagery

The Global Surface Emissivity Spectral Atlas (GSESA) database contains global, monthly climatology infrared emissivity functional Empirical Orthogonal Function (EOF) scores in 0.25 x 0.25 latitude-longitude resolution. An eigenvector file and a reader file allow customers to produce emissivity spectra. The emissivity functional EOF scores were developed using the Infrared Atmospheric Sounding Interferometer (IASI) instrument on the METOP-A, METOP-B, and METOP-C satellites for the period 2007-07-01 to 2025-01-31. An inversion scheme, dealing with cloudy as well as cloud-free radiances observed ...

NASA HAQAST Project

atmosphere, carbon, climate, netcdf, oceans, satellite imagery

Our mission is to put the power of NASA’s satellites down to earth and in your hands. HAQAST is a collaborative team that works in partnership with public health and air quality agencies to use NASA data and tools for the public benefit. Here, you can learn about our team, partnerships, and newsworthy achievements. You…...

NASA HLS Project

atmosphere, cog, elevation, geospatial, ice, satellite imagery, stac

The HLSL30 V1.5 data product was decommissioned on January 4, 2022. Users are encouraged to use the improved HLSL30 V2 data product. The Harmonized Landsat Sentinel-2 (HLS) project provides consistent surface reflectance (SR) and top of atmosphere (TOA) brightness data from the Operational Land Imager (OLI) aboard the joint NASA/USGS Landsat 8 satellite and the Multi-Spectral Instrument (MSI) aboard Europe’s Copernicus Sentinel-2A and Sentinel-2B satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include at...

NASA HRAC Project

elevation, precipitation, satellite imagery

This is a dataset that enhances the accuracy and spatial resolution of the TMPA monthly product (3B43) for hydrometeorological applications. About 9,200 gauge measurements are compared with the 3B43 product at 0.25° x 0.25° spatial resolution across the CONUS. A strong relationship is observed between the bias and land surface elevation, in which 3B43 underestimates the true precipitation at elevations above 1,500 m amsl. Satellite data is resampled to elevation data at ~1km grid size and a correction function is applied to reduce bias in the data. Accordingly, a High-Resolution Altitud...

NASA HWHYP Project

The Headwall Hyperspectral Reflectance data are for the Long-Term Ecological Research (LTER) at Cedar Creek Ecosystem Science Reserve (CCESR), Minnesota (HWHYPCCMN1MM). The reflectance products are at 1 millimeter (mm) spatial resolution in the 400 to 1,000 nanometer spectral range. This dataset can be used to understand the optical diversity-biodiversity relationship and investigate the spatial sensitivity of the relationship at local scales. A Headwall Series E imaging spectrometer was mounted on a tram system to collect designated plots for the biodiversity (BioDIV) experiment at the LTER C...

NASA Hydroclimatology Project

climate, elevation, hydrology, ice, netcdf, precipitation, satellite imagery

The Global Monthly River Discharge Data Set (RivDIS) contains monthly averaged discharge measurements for 1,018 stations located throughout the world from 1807-1991. The period of record varies widely from station to station with a mean of 21.5 years. The data are derived from the published UNESCO archives for river discharge, and checked against information obtained from the Global Runoff Center in Koblenz, Germany through the U.S. National Geophysical Data Center in Boulder, Colorado....

NASA INTEXB Project

climate, satellite imagery

INTEX-NA is a two phase experiment that aims to understand the transport and transformation of gases and aerosols on transcontinental/intercontinental scales and assess their impact on air quality and climate. The primary constituents of interest are ozone and precursors, aerosols and precursors, and the long-lived greenhouse gases. The first phase (INTEX-A) was completed in the summer of 2004 and the second phase (INTEX-B) is to be performed in the spring of 2006. This document is intended to provide an update on the goals of INTEX-B and define its implementation strategy. The scientific goal...

NASA ISLSCP II Project

atmosphere, carbon, climate, elevation, hydrology, ice, land cover, oceans, precipitation, satellite imagery, weather

This data set contains the calculated net ocean-air carbon dioxide (CO2) flux and sea-air CO2 partial pressure (pCO2) difference. The estimates are based on approximately one million measurements made for the pCO2 in surface waters of the global ocean since the International Geophysical Year, 1956-1959. Only the ocean water pCO2 values measured using direct gas-seawater equilibration methods were used. The results represent the climatological distributions under non-El Nino conditions. Since the measurements were made in different years, during which the atmospheric pCO2 was increasing, they w...

NASA ISS_RapidScat Project

climate, earth observation, hdf, ice, netcdf, oceans, radar

This dataset contains the ISS-RapidScat Version 2.0 Level 1B geo-located Sigma-0 measurements and antenna pulse "egg" and "slice" geometries as derived from ephemeris and the Level 1A dataset. The pulse "egg" represents the complete footprint of the pulse, which has a spatial geometry of approximately 25 km by 35 km. There are 8 slices that constitute the range-binned components of a pulse each of which has a spatial geometry of approximately 25 km by 7 km. The orientation of the long dimension of the slices varies with the rotation of the antenna and thus does not align with the along/across track orientation of the wind vector grid in...

NASA JASON-1 Project

ice, netcdf, oceans, radar, satellite imagery

The enhanced Jason-1 Microwave Radiometer (JMR) corrections contain better wet tropospheric path delay corrections along with better land, rain and ice flagging for coastal regions than those found in the Jason-1 Geophysical Data Records (GDR). The enhanced corrections can be used in place of the GDR wet troposphere correction to provide more accurate Sea Surface Height Anomalies for coastal regions....

NASA JASON-3 Project

climate, satellite imagery

This is a near real time dataset that provides a GPS based orbit and Sea Surface Height Anomalies (SSHA) from that orbit. It is similar to the Jason-3 Operation Geophysical Data Record (OGDR) that is distributed at NOAA (http://www.nodc.noaa.gov/sog/jason/), but includes the GPS orbit and SSHA as two additional variables. It has a 5 hour time lag due to the time needed to calculate the GPS orbit and SSHA. The GPS orbits have been shown to be more accurate than the DORIS orbits on a near real time scale and therefore produce a more accurate SSHA.
Forward stream transitioned from processing bas...

NASA JPSS Project

atmosphere, carbon, climate, earth observation, elevation, hdf, hdf5, ice, oceans, satellite imagery, weather

This High-Resolution (0.1 x 0.1 degree) Level 3 daily Aerosol Optical Depth (AOD) product is generated by combining two Visible Infrared Imaging Radiometer Suite (VIIRS) operational algorithms, namely Deep Blue (DB) and Dark Target (DT), on board the NOAA-20 satellite. This dataset is provided in daily files ranging from 2018-02-17 to the present. The spatial coverage is global and the dataset is gridded at 0.1 x 0.1 degree spatial resolution. The data are generated using Level 2 AOD retrieved using DT and DB algorithms. The product provides multiple options for using data either from DT or DB...

NASA LANCE Project

ice, satellite imagery, soil moisture

ATL13QL is the quick look version of ATL13 and is based on the same algorithms that generate the ATL13 final data products. Once final ATL13 files are available, the corresponding ATL13QL files are removed. ATL13QL provides along-track surface water products for inland water bodies, defined as lakes, reservoirs, bays, estuaries, rivers, and a 7 km near-shore buffer. Data parameters include surface water height statistics and related parameters including significant wave height, transect slope, subsurface signal attenuation, and shallow water bathymetry. Water surface heights are provided as bo...

NASA LBA-ECO Project

atmosphere, carbon, climate, cog, earth observation, elevation, geospatial, hdf, hydrology, ice, land cover, netcdf, oceans, precipitation, radar, satellite imagery, soil moisture, weather

This data set provides measurements from the Amazonian Aerosol Characterization Experiment (AMAZE-08) carried out during the wet season from February 4 to March 21, 2008 in the central Amazon Basin. Aerosol and atmospheric samples and measurements were collected at Tower TT34 located 60 km NNW of downtown Manaus, and at Tower K34, located 1.6 km from the TT34 site. Physical characterization of aerosols included size, mass, and number distributions and light scattering properties. Chemical characterization included mass concentrations of organics, major anions and cations, and trace metals. Aer...

NASA LPJ Project

carbon, climate, cog, earth observation, soil moisture, weather

Exploring Greenhouse Gas Data; Driving Sustainable Strategies through Powerful Analysis...

NASA Landslide Project

precipitation, satellite imagery, soil moisture

The Landslide Hazard Assessment for Situational Awareness (LHASA) model identifies locations with high potential for landslide occurrence at a daily temporal resolution. LHASA combines satellite‐based precipitation estimates with a landslide susceptibility map derived from information on slope, geology, road networks, fault zones, and forest loss. When rainfall is considered to be extreme and susceptibility values are moderate to very high, a “nowcast” is issued to indicate the times and places where landslides are more probable. Although the model could be run every half hour, this archive co...

NASA Low-Cost Sensor AQ Project

satellite imagery

Low-Cost-Sensors-AQ_AirQino is the ground site data collected by the AirQino sensor network as part of the Low-Cost Air Quality Sensor Harmonization Database. Data collection for this product is ongoing. The Low-Cost Sensor AQ Harmonization aims to harmonize sensor networks by amalgamating measurements from a multitude of networks into one open access framework. Currently, data from 10 unique US-based sensor networks have been collected for redistribution in the archive. Data from each network is reformatted in a common data format with metadata embedded to streamline data processing. A key fe...

NASA MASTER Project

climate, earth observation, hdf, hydrology, lidar, oceans, radar, satellite imagery

This dataset includes Level 1B (L1B) data products from the MODIS/ASTER Airborne Simulator (MASTER) instrument. The spectral data were collected during 7 flights aboard a DOE B-200 aircraft over Baja California, Mexico, and Nevada, U.S., on 1999-04-23 to 1999-05-05. Data products include L1B georeferenced multispectral imagery of calibrated radiance in 50 bands covering wavelengths of 0.460 to 12.879 micrometers at approximately 20-meter spatial resolution. The L1B file format is HDF-4. In addition, the dataset includes flight paths, spectral band information, instrument configuration, ancilla...

NASA MAS_eMAS Project

earth observation, hdf, ice, satellite imagery

The Enhanced Moderate Resolution Imaging Spectroradiometer (MODIS) Airborne Simulator (eMAS) instrument is maintained and operated by the Airborne Sensor Facility at NASA Ames Research Center in Mountain View, California, under the oversight of the EOS Project Science Office at NASA Goddard. Prior to 1995, the MAS was deployed on NASA's ER-2 and C-130 aircraft platforms using a 12-channel, 8-bit data system that somewhat constrained the full benefit of having a 50-channel scanning spectrometer. Beginning in January 1995, a 50-channel, 16-bit digitizer was used on the ER-2 platform, whic...

NASA MERRA-2 Observation Project

ice

Global Modeling and Assimilation Office Research...

NASA MERRA-2 Project

atmosphere, carbon, ice, oceans, precipitation, satellite imagery, weather

Global Modeling and Assimilation Office Research...

NASA MEaSUREs/HOMaGE Project

ice, netcdf, oceans, satellite imagery

This data set contains the monthly Global Ocean Mass Anomalies (goma) since 04/2002, as measured by the GRACE and GRACE Follow-On (G/GFO) satellite missions. The data are averaged over the global ocean domain, at monthly intervals (note: data gaps exist). This file contains the goma time series based on the spherical harmonic gravity fields provided by the G/GFO SDS centers: JPL, CSR, GFZ. The data are frequently updated as new monthly observations are acquired by the GFO mission. The processing of the spherical harmonics gravity field coefficients is as follows: (1) GAD + GSM: the monthly de-...

NASA MEaSUREs/OSWV Project

oceans, precipitation, satellite imagery

This dataset contains model output interpolated in space and time to observations from the MetOp-A ASCAT (ASCAT-A) instrument (a satellite-based scatterometer), representing the first science quality release of these data (post-provisional after v1.0) funded under the MEaSUREs program. These auxiliary fields are included to complement those scatterometer observations, specifically for the ASCATA_ESDR_L2_WIND_STRESS_V1.1 dataset. Model variables include: i) ocean surface wind fields from ERA-5 short-term forecast (removed from the analyses times to reduce impacts from assimilated scatterometer retr...

NASA MISR Project

atmosphere, climate, elevation, hdf, ice, netcdf, oceans

MIANACP_1 is the Multi-angle Imaging SpectroRadiometer (MISR) Aerosol Climatology Product version 1. It is 1) the microphysical and scattering characteristics of pure aerosol upon which routine retrievals are based, 2) mixtures of pure aerosol to be compared with MISR observations, and 3) the likelihood value assigned to each mode geographically. The ACP describes mixtures of up to three component aerosol types from a list of eight components in varying proportions. ACP component aerosol particle data quality depends on the ACP input data, which are based on aerosol particles described in the ...

NASA MISST Project

ice, netcdf, oceans, satellite imagery

The Saildrone Arctic 2021 dataset presents a unique collection of high-quality, near real-time, multivariate surface ocean and atmospheric observations obtained through the deployment of Saildrone, an innovative wind- and solar-powered uncrewed surface vehicle (USV). Saildrone is capable of extended missions lasting up to 12 months, covering vast distances at typical speeds of 3-5 knots. It operates autonomously, relying solely on wind propulsion, while its navigation can be remotely guided from land. The 2021 Saildrone Arctic campaign featured two Saildrone USVs deployed during a 76-day cruise in th...

NASA MOPITT Project

carbon

MOP03J_109 is the Measurements Of Pollution In The Troposphere (MOPITT) Beta CO gridded daily means (Near and Thermal Infrared Radiances) version 109 product. It is an unvalidated beta product subject to recalibration and contains daily mean gridded versions of the daily Level 2 CO profile and total column retrievals. The averaging kernels associated with each retrieval are also gridded and included in the Level 3 files. Data collection for this product is ongoing. For a description of the file contents, refer to the File Spec Document. The MOPITT Level 2 Data Quality Statement contains additional in...

NASA MULTI-TASTE Project

atmosphere, satellite imagery

The Medium Resolution Imaging Spectrometer (MERIS) is one of 10 sensors deployed in March of 2002 on board the polar-orbiting Envisat-1 environmental research satellite by the European Space Agency (ESA). The MERIS instrument is a moderate-resolution wide field-of-view push-broom imaging spectroradiometer capable of sensing in the 390 nm to 1040 nm spectral range. Being a programmable instrument, it had the unique capability of selectively adjusting the width and location of its 15 bands through ground command. The instrument has a 68.5-degree field of view and a swath width of 1150 kilometers, pr...

NASA MULTI_NASA Project

earth observation, elevation, ice, oceans, radar, satellite imagery

This data set contains surface elevations from retracked CryoSat-2 waveforms, as well as model fitting parameters used to retrack the waveform. The primary data set used in the production of these data come from the ESA CryoSat-2 satellite....

NASA MetOp Project

climate, hdf, ice, netcdf, oceans, radar, satellite imagery

This dataset represents the first historically reprocessed Level 2 coastal ocean surface wind vector climate data record from the Advanced Scatterometer (ASCAT) on MetOp-A sampled on a 12.5 km grid. This coastal dataset utilizes a spatial box filter to generate a spatial average of the Sigma-0 retrievals from the Level 1B dataset and obtains additional winds near the coast. Since the full resolution L1B Sigma-0 retrievals are used, all non-sea retrievals are discarded prior to the Sigma-0 averaging. Each box average Sigma-0 is then used to compute the vector cell wind using the same CMOD7 geophysical model function as in the operational OSI SAF ASCAT wind vector datasets. With this enhanced coastal retrieval, winds are computed as close to ~15 km from ...
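
The box-averaging step described above can be illustrated with a small Python sketch. It is not the project's processing code: the regular grid layout, the function name `box_average_sigma0`, the `half_width` parameter, and the assumption that Sigma-0 values are in linear units are all introduced here for illustration.

```python
from typing import List, Optional


def box_average_sigma0(
    sigma0: List[List[float]],
    is_sea: List[List[bool]],
    row: int,
    col: int,
    half_width: int = 1,
) -> Optional[float]:
    """Mean of sea-only Sigma-0 values in a square box centred on (row, col).

    Returns None when the box contains no sea retrievals.
    """
    values = []
    for r in range(max(0, row - half_width), min(len(sigma0), row + half_width + 1)):
        for c in range(max(0, col - half_width), min(len(sigma0[r]), col + half_width + 1)):
            if is_sea[r][c]:  # discard non-sea retrievals before averaging
                values.append(sigma0[r][c])
    return sum(values) / len(values) if values else None
```

Discarding land-flagged values before the mean is what lets a box that straddles the coastline still yield a usable average; the real retrieval works on full-resolution L1B footprints rather than a tidy grid.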

NASA Model Archive Project

atmosphere, carbon, climate, hydrology, ice, netcdf, precipitation, soil moisture, weather

This model product provides: (1) the source code for the updated Berkeley-Dalhousie Soil Nitric Oxide (NO) Parameterization module (BDSNP, Version 1.0) as implemented with the Community Multi-scale Air Quality model (CMAQ, Version 5.0.2), (2) module input data from historical and new sources of maps for soil biome type, fertilizer, and arid and non-arid climates, and (3) sample CMAQ simulation outputs for three BDSNP module NO parameterizations (standard, historical, and newer inputs). The simulations use a 12-km spatial grid resolution for CMAQ modeling covering the conterminous United States...

NASA NACP Project

atmosphere, carbon, climate, earth observation, elevation, hdf, ice, land cover, lidar, netcdf, oceans, precipitation, radar, satellite imagery, weather

This dataset provides estimates of hourly carbon dioxide (CO2) emissions from the combustion of fossil fuels at 1-km resolution for the coterminous United States (CONUS) covering the years 2012 through 2017. Emissions from the ACES model are reported for ten distinct emissions source sectors: Airports and Aircraft, Commercial Buildings, Electric Power Generation facilities, Industrial point and non-point sources, Commercial Marine Vessels, Nonroad vehicles and equipment, Oil and Gas wells and facilities, Onroad vehicles, Railway engines and yards, and Residential buildings. All emissions are r...

NASA NASA-SSH Project

climate, oceans, radar, satellite imagery

This dataset contains the Global Mean Sea Level (GMSL) trend generated from the Integrated Multi-Mission Ocean Altimeter Data for Climate Research Version 5.2. The GMSL trend is a 1-dimensional time series of globally averaged Sea Surface Height Anomalies (SSHA) from TOPEX/Poseidon, Jason-1, OSTM/Jason-2, Jason-3, and Sentinel-6A that covers September 1992 to present with a lag of up to 4 months. The data are reported as variations relative to a 20-year TOPEX/Jason collinear mean. Bias adjustments and cross-calibrations were applied to ensure SSHA data are consistent across the missions; Glaci...

NASA NCA-LDAS Project

climate, earth observation, hydrology, ice, precipitation, satellite imagery, soil moisture

The National Climate Assessment - Land Data Assimilation System, or NCA-LDAS, is a terrestrial water reanalysis in support of the United States Global Change Research Program's NCA activities. NCA-LDAS features high resolution, gridded, daily time series data products of terrestrial water and energy balance stores, states, and fluxes over the continental U.S., derived from land surface hydrologic modeling with multivariate assimilation of satellite Environmental Data Records (EDRs). The overall goal is to provide the highest quality terrestrial hydrology products that enable improved scien...

NASA NEESPI NASA Project

climate, ice, satellite imagery, soil moisture

The dataset contains global monthly-mean soil moisture statistics (average values) for 1 by 1 degree grid cells. The source for the data is AMSR-E daily estimates of soil moisture (AE_Land3.002: AMSR-E/Aqua Daily L3 Surface Soil Moisture, Interpretive Parameters, QC EASE-Grids. Version 2 ). The dataset covers the time period from 2002-10-01 to 2011-09-30....
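
As a rough illustration of how monthly-mean statistics can be derived from daily estimates per grid cell, here is a minimal Python sketch. It is not the code used to build this dataset: the record layout, the `(row, col)` cell index, the use of `None` for missing days, and the function name `monthly_means` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# (row, col) index of a 1 x 1 degree grid cell; the indexing scheme is hypothetical.
Cell = Tuple[int, int]


def monthly_means(
    daily: Iterable[Tuple[date, Cell, Optional[float]]]
) -> Dict[Tuple[int, int, Cell], float]:
    """Average daily values per (year, month, cell), skipping missing (None) values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, cell, value in daily:
        if value is None:
            continue  # a day with no valid estimate does not contribute
        key = (day.year, day.month, cell)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Cells with no valid daily value in a month simply do not appear in the result, which avoids inventing a fill value.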

NASA NEWS Project

earth observation, netcdf, oceans

The world's premier catalyst for understanding Earth as a unified and dynamic system, empowering humanity through transformative insights into Earth system science....

NASA NIMBUS-7 Project

earth observation, ice, oceans, satellite imagery

NIMBUS7_NFOV_MLCE data are Nimbus 7 Narrow Field of View (NFOV) Maximum Likelihood Cloud Estimation (MLCE) Data in Native Format. The NIMBUS7_NFOV_MLCE data set uses the Nimbus-7 measurements and the MLCE algorithm for better regional and temporal resolution. The Earth Radiation Budget (ERB) parameters, derived from the Nimbus-7 scanner measurements, were rederived in 1990 using a Maximum Likelihood Cloud Estimation (MLCE) algorithm similar, but not identical, to the Earth Radiation Budget Experiment (ERBE) algorithm. Daily and monthly means are presented on two commensurate equal area world gr...

NASA NLDAS Project

climate, ice, netcdf, precipitation, radar, satellite imagery, soil moisture, weather

This data set contains thirty-eight fields simulated from the Mosaic land-surface model (LSM) for Phase 2 of the North American Land Data Assimilation System (NLDAS-2). The data are in 1/8th degree grid spacing and range from Jan 1979 to the present. The temporal resolution is hourly. The file format is netCDF (converted from the GRIB format). Mosaic was developed by Koster and Suarez (1994, 1996) to account for subgrid vegetation variability with a tile approach. Each vegetation tile carries its own energy and water balance and soil moisture and temperature. Each tile has three soil layers, w...

NASA NOAA - SPACE WEATHER PROGRAM Project

climate, hdf, hdf5, ice, netcdf, oceans, satellite imagery, weather

The Long-Term Data Record (LTDR) produces, validates, and distributes a global land surface climate data record (CDR) that uses both mature and well-tested algorithms in concert with the best-available polar-orbiting satellite data from past to the present. The CDR is critically important to studying global climate change. The LTDR project is unique in that it serves as a bridge that connects data derived from the NOAA Advanced Very High Resolution Radiometer (AVHRR), the EOS Moderate resolution Imaging Spectroradiometer (MODIS), the Suomi National Polar-orbiting Partnership (SNPP) Visible Inf...

NASA NOAA CLIMATE DATA RECORD (CDR) PROGRAM Project

atmosphere, climate, ice, netcdf, oceans

The Smith & Reynolds Extended Reconstructed Sea Surface Temperature (ERSST) Level 4 dataset provides a historical reconstruction of monthly global ocean surface temperatures and temperature anomalies over a 2 degree spatial grid since 1854 from in-situ observations based on a consistent statistical methodology that accounts for uneven sampling distributions over time and related observational biases. Version 5 of this dataset implements release 3.0 of ICOADS (International Comprehensive Ocean-Atmosphere Data Set) and is supplemented by monthly GTS (Global Telecommunications Ship and buoy) ...

NASA NOBM Project

carbon, ice, oceans

Global Modeling and Assimilation Office Research...

NASA NOPP_MISST Project

ice, netcdf, oceans, satellite imagery

The Saildrone Arctic 2019 dataset presents a unique collection of high-quality, near real-time, multivariate surface ocean and atmospheric observations obtained through the deployment of Saildrone, an innovative wind- and solar-powered uncrewed surface vehicle (USV). Saildrone is capable of extended missions lasting up to 12 months, covering vast distances at typical speeds of 3-5 knots. It operates autonomously, relying solely on wind propulsion, while its navigation can be remotely guided from land. The 2019 Saildrone Arctic campaign featured six Saildrone USVs (jointly funded by NOAA and NASA) dep...

NASA NPP Project

atmosphere, carbon, climate, cog, earth observation, elevation, ice, land cover, precipitation, satellite imagery, soil moisture, stac, weather

This data set contains two files (.txt). One file contains stand characteristics, soil characteristics, biomass distribution, and production allocation data measured during the 1984 growing season in four lodgepole pine stands (Pinus contorta var. latifolia) located near Canal Flats, British Columbia, Canada (50.2 N -115.5 W Elevation 1,300-1,380 m). The second file contains climate data from a nearby weather station at Kananaskis Boundary, Alberta (50.98 N -115.12 W Elevation 1,463 m). Two lodgepole pine stands were growing on xeric sites and two stands were growing on mesic sites. The stands...

NASA NPP-JPSS Project

atmosphere, carbon, climate, earth observation, elevation, hdf, hdf5, ice, land cover, netcdf, oceans, satellite imagery

The Advanced Technology Microwave Sounder (ATMS) Level 1B data files contain brightness temperature measurements along with ancillary spacecraft, instrument, and geolocation data of the ATMS instrument on the Joint Polar Satellite System-1 (JPSS-1) platform. This platform is also known as NOAA-20 (National Oceanic and Atmospheric Administration). The ATMS is a 22-channel mm-wave radiometer. The ATMS will measure upwelling radiances in six frequency bands centered at 23 GHz, 31 GHz, 50-58 GHz, 89 GHz, 166 GHz, and 183 GHz. The ATMS is a total power radiometer, with “through-the-antennaR...

NASA NRL Coriolis Project

ice, satellite imagery, soil moisture

WindSat/Coriolis surface soil moisture (LPRM) L2 V001 is a Level 2 (swath) data set. Its land surface parameters, surface soil moisture, land surface (skin) temperature, and vegetation water content, are derived from polarimetric microwave radiometer data from WindSat, onboard the Naval Research Laboratory's Coriolis satellite, using the Land Parameter Retrieval Model (LPRM). Each swath is packaged with associated geolocation fields. The data set covers the period from February 2003 to July 2012. The LPRM is based on a forward radiative transfer model to retrieve surface soil moisture and ...

NASA NSCAT Project

ice, oceans, precipitation

The NASA Scatterometer (NSCAT) Level 2.5 high-resolution reduced MGDR contains only wind vector data (sigma-0 is excluded) in 25 km wind vector cell (WVC) swaths which contain daily data from ascending and descending passes. Wind vectors are accurate to within 2 m/s (vector speed) and 20 degrees (vector direction). Wind vectors are not considered valid in rain-contaminated regions; rain flags and precipitation information are not provided. Data is flagged where measurements are either missing or ambiguous. In the presence of land or sea ice, wind values are set to 0. Wind vectors are processed...

NASA Nimbus Project

atmosphere, carbon, ice, oceans, satellite imagery, soil moisture, weather

The Nimbus-4 BUV Level-1 Dark Current Study Master Data is derived from the BUV Level 1 Radiance (RUT) product and contains the geophysical indices and classification, geographic and geomagnetic coordinates, solar magnetic parameters and angles; monochromator and photometer pulse count and analog data, and energetic trapped particles. There is a one-to-one correspondence between this product and the dark current working data files; the difference is that the working product data have been filtered. The data were originally created on IBM 360 machines and archived on magnetic tapes. The data have been...

NASA OCO Project

carbon

Version 9r is the current version of the data set. Older versions will no longer be available and are superseded by Version 9r. The ACOS Lite files contain bias-corrected XCO2 along with other select fields aggregated as daily files. Orbital granules of the ACOS Level 2 standard product (ACOS_L2S) are used as input. The "ACOS" data set contains Carbon Dioxide (CO2) column averaged dry air mole fraction for all soundings for which retrieval was attempted. These are the highest-level products made available by the OCO Project, using TANSO-FTS spectral radiances. The GOSAT team at JAXA ...

NASA OCO-2 Project

atmosphere, carbon, ice

This is the Gridded Daily OCO-2 Carbon Dioxide assimilated dataset. The OCO-2 mission provides the highest quality space-based XCO2 retrievals to date. However, the instrument data are characterized by large gaps in coverage due to OCO-2’s narrow 10-km ground track and an inability to see through clouds and thick aerosols. This global gridded dataset is produced using a data assimilation technique commonly referred to as state estimation within the geophysical literature. Data assimilation synthesizes simulations and observations, adjusting the state of atmospheric constituents like CO2 to ref...

NASA OCO-3 Project

atmosphere, carbon

The ECOSTRESS/OCO-3 (“ECOCO3”) data set consists of spatially and temporally co-located observations from the ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) and the Orbiting Carbon Observatory 3 (OCO-3) instruments currently operating on the International Space Station. Land Surface Temperature, Evapotranspiration, and Water Use Efficiency data products from ECOSTRESS, and Solar-Induced Chlorophyll Fluorescence and Dry-Air Column Mole Fraction CO2 from OCO-3 are matched in space and time over 3°x3° areas across the globe where OCO-3 performs Snapshot Area Map a...

NASA OMG Project

ice, oceans, weather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA OSCAR Project

oceans, satellite imagery

Ocean Surface Current Analyses Real-time (OSCAR) is a global surface current database and NASA funded research project. OSCAR ocean mixed layer velocities are calculated from satellite-sensed sea surface height gradients, ocean vector winds, and sea surface temperature gradients using a simplified physical model for geostrophy, Ekman, and thermal wind dynamics. Daily averaged surface currents are provided on a global 0.25 x 0.25 degree grid as an average over an assumed well-mixed top 30 m of the ocean from 1993 to present day. OSCAR currents are provided at three quality levels: final, interi...

NASA OTTER Project

atmosphere

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA PACE-PAX Project

earth observation, lidar, oceans, satellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA PEM-Tropics Project

carbon, climate, cog, earth observation, lidar, satellite imagery

PEM-Tropics-A_Aerosol_AircraftInSitu_DC8_Data is the in-situ aerosol data collected onboard the DC-8 aircraft during the Pacific Exploratory Mission (PEM) Tropics A suborbital campaign. Data utilizing condensation nuclei counters (CNC) is featured in this collection. Data collection for this product is complete. From 1983-2001, NASA conducted a collection of field campaigns as part of the Global Tropospheric Experiment (GTE). Among those was PEM, which intended to improve the scientific understanding of human influence on tropospheric chemistry. Part of the PEM field campaigns were focused on ...

NASA PEM-West Project

carbon, lidar, satellite imagery

PEM-West-A_Aerosol_AircraftInSitu_DC8_Data is the in-situ aerosol data collected onboard the DC-8 aircraft during the Pacific Exploratory Mission (PEM) West A suborbital campaign. Data utilizing Optical Particle Counters (OPC) and ion chromatography are featured in this collection. Data collection for this product is complete. During 1983-2001, NASA conducted a collection of field campaigns as a part of the Global Tropospheric Experiment (GTE) for developing advanced instrumentation to quantify atmospheric trace gases’ sources, sinks, and distribution. Among those was PEM, which intended to im...

NASA POES Project

atmosphere, satellite imagery

The version 8 SBUV/2 NOAA-11 ozone data were first released at the 2004 Quadrennial Ozone Symposium on DVD. The DVD contained all of the SBUV/2 data from NOAA-9, NOAA-11 and NOAA-16 satellites as well as SBUV data from the Nimbus-7 satellite. The DVD is no longer available; however, all the data are available online from the NASA GES DISC. The NOAA-11 SBUV/2 v8 data are available from 1988-12-01 to 2001-03-27. The instrument spatial resolution is 180 km x 180 km footprint at nadir. The ozone profiles are made at 21 pressure levels between 1000 and 0.1 hPa. Each data file contains a day's worth ...

NASA PREFIRE Project

atmosphere, climate, cog, elevation, ice, netcdf, satellite imagery

Polar Radiant Energy in the Far InfraRed Experiment (PREFIRE) Atmospheric Properties from PREFIRE Satellite 1 COG (PREFIRE_SAT1_2B-ATM_COG) is retrieved from data collected by the PREFIRE Thermal Infrared Spectrometer (TIRS-PREFIRE) aboard PREFIRE-SAT1. Dual CubeSats each carry a PREFIRE Thermal Infrared Spectrometer (TIRS-PREFIRE), a push broom spectrometer with 63 channels measuring mid- and far-infrared (FIR) radiation from approximately 5 to 53 µm. Most polar emissions are in the FIR but have not been measured on a large scale. PREFIRE aims to fill knowledge gaps in the global energy budge...

NASA PROVE Project

land cover, satellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA QUIKSCAT Project

earth observation, ice, oceans, radar

This dataset consists of the version 2 Level 2B science-quality ocean surface wind vector retrievals from the Oceansat-2 scatterometer (OSCAT), which was designed and launched by the Indian Space Research Organization (ISRO) on 23 September 2009. This Level 2B dataset is produced by the Jet Propulsion Laboratory (JPL) QuikSCAT Project in cooperation with ISRO. The retrievals are provided on a non-uniform grid within the swath at 12.5 km pixel resolution. This resolution is achieved through a slice composite technique in which high resolution slice measurements from L1B data are composited into a ...

NASA ROSES Project

satellite imagery

The Reconstructed Sea Level dataset contains sea level anomalies derived from satellite altimetry and tide gauges. The satellite altimetric record provides accurate measurements of sea level with near-global coverage, but it has a relatively short time span, since 1993. Tide gauges have measured sea level over the last 200 years, with some records extending back to 1807, but they only provide regional coverage, not global. Combining satellite altimetry with tide gauges, using a technique known as sea level reconstruction, results in a dataset with the record length of the tide gauges and the n...

NASA S-MODE Project

atmosphere, carbon, elevation, geospatial, ice, lidar, netcdf, oceans, radar, weather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SAFARI 2000 Project

atmosphere, carbon, climate, cog, earth observation, elevation, hdf, hydrology, ice, land cover, lidar, netcdf, oceans, precipitation, satellite imagery, soil moisture, weather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SAGE III-ISS Project

atmosphere, hdf, hdf5, ice, netcdf

g3btmnc_6 is the Stratospheric Aerosol and Gas Experiment III (SAGE III) on the International Space Station (ISS) (SAGE III/ISS) Level 1 Monthly Solar Event Species Profiles (NetCDF) V6 data product. It contains pixel group transmission profiles for a month of solar events (the last day of each month is omitted). Launched on February 19, 2017 on a SpaceX Falcon 9 from Kennedy Space Center, SAGE III-ISS is the second instrument from the SAGE III project and is externally mounted on the ISS. This ISS-based instrument uses a technique known as occultation, which involves looking at the lig...

NASA SAGE III-M3M Project

hdf, satellite imagery

Level 1B pixel group transmission profiles for a single solar event....

NASA SARAL Project

oceans, satellite imagery

These data are near-real-time (NRT) (within 7-9 hours of measurement) sea surface height anomalies (SSHA) from the AltiKa altimeter onboard the Satellite with ARgos and ALtiKa (SARAL). SARAL is a French (CNES)/Indian (ISRO) collaborative mission to measure sea surface height using the Ka-band AltiKa altimeter and was launched February 25, 2013. The major difference between these data and the Operational Geophysical Data Record (OGDR) data produced by the project is that the orbit from SARAL has been adjusted using SSHA differences with those from the OSTM/Jason-2 GPS-OGDR-SSHA product at inter-...

NASA SARP Project

oceans, satellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SASSIE Project

atmosphere, climate, earth observation, ice, netcdf, oceans, radar

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SBG Project

This report summarizes the community discussions at the 2024 NASA Surface Biology and Geology (SBG) Technical Interchange Meeting (TIM) from May 29 to 31, 2024, in Washington, D.C., US. The report provides the broader science community with information on the current state of the SBG mission, SBG-relevant science and applications, and future work needed. The SBG mission includes a visible to shortwave infrared (VSWIR) spectrometer and a thermal infrared (TIR) radiometer as outlined in the 2017-2027 National Academies' Decadal Survey. Over 150 in-person and 200 online community members from...

NASA SCP Project

climate, earth observation, radar, satellite imagery

This European Remote Sensing (ERS) Sigma-0 dataset is generated by the Scatterometer Climate Record Pathfinder (SCP) project at Brigham Young University (BYU) and is generated using a Scatterometer Image Reconstruction (SIR) technique developed by Dr. David Long at BYU. The dataset provides SIR processed Sigma-0 data from the ERS-1 C-band scatterometer, which is also known as the Active Microwave Instrument (AMI). AMI is a multimode radar operating at a frequency of 5.3 GHz (C-band), using vertically polarized antennas for both transmission and reception. The SIR technique results in an enhanc...

NASA SEAWINDS Project

atmosphere, earth observation, ice, oceans, satellite imagery

The WindSat Polarimetric Radiometer, launched on January 6, 2003 aboard the Department of Defense Coriolis satellite, was designed to measure the ocean surface wind vector from space. It was developed by the Naval Research Laboratory (NRL) Remote Sensing Division and the Naval Center for Space Technology for the U.S. Navy and the National Polar-orbiting Operational Environmental Satellite System (NPOESS) Integrated Program Office (IPO). The dataset contains the Level 1C WindSat Top of the Atmosphere (TOA) TB processed by RSS. The WindSat radiances are turned into TOA TB after correction for hot an...

NASA SHIFT Project

climate, cog, earth observation, hdf, hdf5, ice, netcdf, oceans

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SIF-ESDR Project

carbon, climate, netcdf, oceans, satellite imagery

This dataset provides global solar-induced chlorophyll fluorescence (SIF) estimates at a 0.05-degree resolution (approximately 5 km at the equator) for each month from January 2003 through December 2017. SIF data (740 nm) was retrieved from the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) and Global Ozone Monitoring Experiment 2 (GOME-2) instruments onboard the MetOp-A satellite. The data were downscaled to 0.05 degrees using the Random Forest algorithm and predictors from Moderate Resolution Imaging Spectroradiometer (MODIS) and Modern-Era Retrospective an...
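
The conversion from an angular grid resolution to a ground distance, mentioned above for the 0.05-degree grid, can be checked with a short Python sketch. The spherical-Earth approximation, the mean radius value, and the function name `grid_spacing_km` are assumptions of this example rather than part of the dataset documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def grid_spacing_km(resolution_deg: float, latitude_deg: float = 0.0) -> float:
    """East-west width in km of a grid cell of the given angular size at a latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    # Meridians converge toward the poles, so cell width shrinks with cos(latitude).
    return resolution_deg * km_per_degree * math.cos(math.radians(latitude_deg))
```

Under these assumptions, `grid_spacing_km(0.05)` comes out at roughly 5.6 km at the equator, in line with the "approximately 5 km" quoted in the description, and the cell width halves by 60 degrees latitude.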

NASA SISTER Project

ice

The Space-based Imaging Spectroscopy and Thermal pathfindER (SISTER) activity originated in support of the NASA Earth System Observatory's Surface Biology and Geology (SBG) mission to develop prototype workflows with community algorithms and generate prototype data products envisioned for SBG. SISTER focused on developing a data system that is open, portable, scalable, standards-compliant, and reproducible. This collection contains EXPERIMENTAL workflows and sample data products, including (a) the Common Workflow Language (CWL) process file and a Jupyter Notebook that run the entire SISTER...

NASA SMAPVEX08 Project

land cover, radar, satellite imagery, soil moisture

This data set includes several parameters that were obtained from field surveys as part of the Soil Moisture Active Passive Validation Experiment 2008 (SMAPVEX08)....

NASA SMAPVEX12 Project

ice, land cover, radar, satellite imagery, soil moisture

This data set contains in situ soil moisture data collected with coring devices at several agricultural sites as part of the Soil Moisture Active Passive Validation Experiment 2012 (SMAPVEX12)....

NASA SMAPVEX15 Project

soil moisture

This data set contains brightness temperatures obtained by the Passive Active L-band System (PALS) aircraft instrument. The data were collected as part of SMAPVEX15, the Soil Moisture Active Passive Validation Experiment 2015....

NASA SMAPVEX16 Manitoba Project

land cover, precipitation, soil moisture

This data set contains in situ measurements of soil moisture and bulk density collected for the Soil Moisture Active Passive Validation Experiment 2016 Manitoba (SMAPVEX16 Manitoba) campaign....

NASA SMERGE Project

climate, satellite imagery, soil moisture

Smerge-Noah-CCI root zone soil moisture 0-40 cm L4 daily 0.125 x 0.125 degree V2.0 is a multi-decadal root-zone soil moisture product. Smerge is developed by merging the North American Land Data Assimilation System (NLDAS) land surface model output with surface satellite retrievals from the European Space Agency Climate Change Initiative. The data have a 0.125 degree resolution at a daily time-step, covering the entire continental United States and spanning nearly four decades (January 1979 to May 2019). This data product contains root-zone soil moisture of the 0-40 cm layer, Climate Change Init...

NASA SNF Project

atmosphereearth observationicesatellite imageryweather

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA SNWG/OPERA Project

cogearth observationelevationiceland coverradarsatellite imagery

This dataset contains Level-3 Dynamic OPERA surface water extent product version 1. The data are validated surface water extent observations beginning April 2023. Known issues and caveats on usage are described under Documentation. The input dataset for generating each product is the Harmonized Landsat-8 and Sentinel-2A/B/C (HLS) product version 2.0. HLS products provide surface reflectance (SR) data from the Operational Land Imager (OLI) aboard the Landsat 8 satellite and the MultiSpectral Instrument (MSI) aboard the Sentinel-2A/B/C satellites. The surface water extent products are distributed ove...

NASA SORCE Project

atmosphereclimateicenetcdfsatellite imagery

The SOlar Radiation and Climate Experiment (SORCE) […]...

NASA SPURS Project

netcdfoceansprecipitationradarsatellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA STAQS Project

earth observationlidarsatellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA Saildrone Baja Project

netcdfoceanssatellite imagery

Saildrone is a wind- and solar-powered unmanned surface vehicle (USV) capable of long-distance deployments lasting up to 12 months and providing high-quality, near real-time, multivariate surface ocean and atmospheric observations while transiting at typical speeds of 3-5 knots. The drone is autonomous in that it may be guided remotely from land while being completely wind driven. The Saildrone Baja campaign was a 60-day cruise from San Francisco Bay, down along the US/Mexico coast to Guadalupe Island and back again over the period 11 April 2018 to 11 June 2018. Repeat surveys were taken around...

NASA Sentinel-3A Project

atmosphereearth observationnetcdfoceanssatellite imagery

The OLCI+SLSTR/Sentinel-3A L2 Surface Reflectance and Aerosol parameters over Land product with shortname S3A_SY_2_SYN, is generated by combining data acquired by the Ocean and Land Colour Instrument (OLCI) and the Sea and Land Surface Temperature Radiometer (SLSTR), on-board SENTINEL-3. The OLCI is a push-broom imaging spectrometer that measures solar radiation reflected by the Earth at a ground spatial resolution of around 300m, over all surfaces, in 21 spectral bands whereas the SLSTR is a dual scan temperature radiometer. The principal objective of SLSTR products is to provide global and r...

NASA Sentinel-3B Project

atmosphereearth observationnetcdfoceanssatellite imagery

The OLCI+SLSTR/Sentinel-3B L2 Surface Reflectance and Aerosol parameters over Land product with shortname S3A_SY_2_SYN, is generated by combining data acquired by the Ocean and Land Colour Instrument (OLCI) and the Sea and Land Surface Temperature Radiometer (SLSTR), on-board SENTINEL-3. The OLCI is a push-broom imaging spectrometer that measures solar radiation reflected by the Earth at a ground spatial resolution of around 300m, over all surfaces, in 21 spectral bands whereas the SLSTR is a dual scan temperature radiometer. The principal objective of SLSTR products is to provide global and r...

NASA Sentinel-5P Project

atmospherecarbonclimatecogicenetcdfoceanssatellite imagery

Sentinel-5P: Unveiling mission goals, applications, sensor insights, and product details, including advanced processing algorithms....

NASA Sentinel-6 Project

radarsatellite imagery

Provides reprocessed L1A high resolution (HR) non-time critical (NTC; 60-day latency) altimetry intermediate outputs from the Poseidon-4 SAR altimeter on the Sentinel-6A Michael Freilich spacecraft, which are geo-located bursts of Ku-band echoes (at ~140 Hz) with all instrument calibrations applied and full rate complex waveforms for delay/Doppler or HR processing. The S6A NTC product is analogous to the Jason-3 GDR product....

NASA SnowEx Project

atmosphereelevationiceland coverlidarradarsatellite imagerysoil moisture

This data set provides 3 m gridded, bare-earth elevations (excluding trees) that are used as the baseline for the Airborne Snow Observatory (ASO) snow-on products. The data were collected during snow-free conditions as part of the NASA/JPL ASO aircraft survey campaigns....

NASA Soil Project

atmospherecarbonclimateelevationhydrologyicelidarnetcdfoceansprecipitationsatellite imagerysoil moisture

This data set provides the concentrations of soil microbial biomass carbon (C), nitrogen (N) and phosphorus (P), soil organic carbon, total nitrogen, and total phosphorus at biome and global scales. The data were compiled from a comprehensive survey of publications from the late 1970s to 2012 and include 3,422 data points from 315 papers. These data are from soil samples collected primarily at 0-15 cm depth with some from 0-30 cm. In addition, data were compiled for soil microbial biomass concentrations from soil profile samples to depths of 100 cm. Sampling site latitude and longitude were av...

NASA Suomi-NPP Project

atmospherecarbonclimatehdfhdf5icenetcdfoceansprecipitationsatellite imageryweather

The objective of this limited edition data collection is to examine the ammonia products generated by the ESSPA (Earth System Science Profiling Algorithm) algorithm from the Cross-track Infrared Sounder (CrIS) instruments. The CrIS instrument used for this product is deployed on board the Suomi National Polar-orbiting Partnership (SNPP) platform and uses the Normal Spectral Resolution (NSR) data. The CrIS instrument is a Fourier transform spectrometer with a total of 1305 NSR infrared sounding channels covering the longwave (655-1095 cm-1), midwave (1210-1750 cm-1), and shortwave (2155-2550 cm...

NASA TCTE Project

atmosphere

The TSI Calibration Transfer Experiment (TCTE) includes a Total Irradiance Monitor to measure total solar irradiance (TSI). This new instrument is similar to the one that has provided data from NASA’s SORCE mission since 2003, and its TSI measurements will overlap with those of the SORCE mission. TCTE was launched on Nov. 19, 2013 from NASA’s Wallops Flight Facility as […]...

NASA TIROS Project

atmospherecarbonsatellite imagery

TIROS-2 Medium-Resolution Scanning Radiometer Level 1 Final Meteorological Radiation Data (FMRT) product contains radiances expressed in five infrared/visible wavelength regions, expressed in either equivalent blackbody temperature (IR channels 1 and 2) or effective radiant emittance (visible channels 3 and 5). The data will trace an elliptical, parabolic, or hyperbolic pattern on the ground due to the rotating of the instrument about the satellite spin axis. There is one orbit per file. The data were originally written on IBM 7094 machines, and these have been recovered from magnetic tapes, r...

NASA TOMS Project

atmospherehdfsatellite imagery

The Earth Probe (EP) Total Ozone Mapping Spectrometer (TOMS) version 8 daily ground station overpass data product contains total column ozone, UV aerosol index, Lambertian effective surface reflectivity (Rayleigh corrected), and sulfur dioxide index values. The overpass data files contain the data derived from the best-matched TOMS field-of-view (FOV) to a site for every day the TOMS instrument was operational. The data are stored in an ASCII format. TOMS data were produced by the Laboratory for Atmospheres at NASA Goddard Space Flight Center (Code 614)....

NASA TOPEX/POSEIDON Project

oceans

The Sensor Data Record (SDR) is similar to the GDR product except that it also contains waveforms, which are required for retracking. This is an expert-level product. If you do not need the waveforms, then the GDR should suit your needs....

NASA TOVS Pathfinder Project

atmospherehdficenetcdfoceansprecipitationsatellite imagery

The Microwave Sounding Unit (MSU) Lower Troposphere Deep Layer Temperature product (MSULTT) provides gridded lower tropospheric temperatures derived from MSU instruments on several different platforms. The temperatures are derived using a combination of MSU channels 2 and 3 which has an averaging kernel that peaks near 500 hectopascals. The algorithm is based on Spencer and Christy (1990) with the LIMB 93 limb correction based on latitude, longitude, month, and scan angle. The MSU instruments measure the thermal emission of radiation by molecular oxygen at four frequencies near 60 GHz. North ...

NASA TRACE-A Project

atmospherelidaroceanssatellite imagery

TRACE-A_Sondes_Data is the balloonsonde and ozonesonde data collected during the Transport and Atmospheric Chemistry near the Equator - Atlantic (TRACE-A) suborbital campaign. Data collection for this product is complete. The TRACE-A mission was a part of NASA’s Global Tropospheric Experiment (GTE) – an assemblage of missions conducted from 1983-2001 with various research goals and objectives. TRACE-A was conducted in the Atlantic from September 21 to October 24, 1992. TRACE-A had the objective of determining the cause and source of the high concentrations of ozone that accumulated over the At...

NASA TRACE-P Project

atmospherelidarsatellite imagery

TRACE-P_Sondes_Data is the balloonsonde and ozonesonde data collected during the Transport and Chemical Evolution over the Pacific (TRACE-P) suborbital campaign. Data collection for this product is complete. The NASA TRACE-P mission was a part of NASA’s Global Tropospheric Experiment (GTE) – an assemblage of missions conducted from 1983-2001 with various research goals and objectives. TRACE-P was a multi-organizational campaign with NASA, the National Center for Atmospheric Research (NCAR), and several US universities. TRACE-P deployed its payloads in the Pacific between the months of March an...

NASA TRMM Project

atmosphereearth observationelevationhdfhdf5icenetcdfoceansprecipitationradarsatellite imagerysoil moistureweather

This is a new (GPM-formatted) TRMM product. The equivalent old TRMM legacy product is TRMM_2H31. Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07. Estimating vertical profiles of latent heating released by precipitating cloud systems is one of the key objectives of TRMM, together with accurately measuring the horizontal distribution of tropical rainfall. The method uses TRMM PR information [precipitation-top height (PTH), precipitation rates at the surface and melting level, and rain type] to select heating prof...

NASA TROPESS Project

carbonnetcdfsatellite imagery

The TROPESS AIRS-Aqua and OMI-Aura L2 Ozone for Forward Stream, Standard Product contains the vertical distribution of the retrieved atmospheric state of ozone (O3), formal uncertainties, and diagnostic information measured by the AIRS instrument on the EOS Aqua satellite and the OMI instrument on the EOS Aura satellite. The forward stream standard product is global for the time period from 2021-02-01 to present. The NASA TRopospheric Ozone and Precursors from Earth System Sounding (TROPESS) project uses an optimal estimation algorithm, known as the MUlti-SpEctra, MUlti-SpEcies, Multi-SEnsors...

NASA TROPICS (EVI-3) Project

icenetcdfoceansprecipitationsatellite imageryweather

The "Time-Resolved Observations of Precipitation structure and storm Intensity with a Constellation of Smallsats" (TROPICS) mission has a goal of providing nearly all-weather observations of three-dimensional temperature and humidity, as well as cloud ice and precipitation horizontal structure, at high temporal resolution to conduct high-value science investigations of tropical cyclones. The mission comprises a constellation of five identical Space Vehicles (SVs) conforming to the 3U form factor and hosting a passive microwave spectrometer payload. Each SV hosts an identical high-per...

NASA Terra Project

atmospherecarbonclimatecogearth observationelevationhdfhydrologyiceland covernetcdfoceanssatellite imagerystac

The AST14DEM Version 3 data product was decommissioned on December 15, 2025. Users are encouraged to use the AST14DEM Version 4 data product. The Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Digital Elevation Model (AST14DEM) product is generated using bands 3N (nadir-viewing) and 3B (backward-viewing) of an ASTER Level 1A image acquired by the Visible and Near Infrared (VNIR) sensor. The VNIR subsystem includes two independent telescope assemblies that facilitate the generation of stereoscopic data. The band 3 stereo pair is acquired in the spectral range of 0.78 and 0.86 ...

NASA TransCom Project

atmospherecarboniceoceans

The Atmospheric Tracer Transport Model Intercomparison Project (TransCom) was created to quantify and diagnose the uncertainty in inversion calculations of the global carbon budget that results from errors in simulated atmospheric transport, the choice of measured atmospheric carbon dioxide data used, and the inversion methodology employed. Under the third phase of TransCom (TransCom 3), surface-atmosphere CO2 fluxes were estimated from an intercomparison of 16 different atmospheric tracer transport models and model variants in order to assess the contribution of uncertainties in transport to ...

NASA UARS Project

atmosphereicesatellite imagerystacweather

The UARS Correlative assimilation data from NOAA's National Meteorological Center (NMC) consists of daily model runs at 12 GMT as a means of providing an independent analysis for comparison with data from the UARS instruments. The NMC data product includes temperature (Kelvin), humidity (%), geopotential height (m), and zonal and meridional wind components (m/s). Geopotential height and atmospheric temperature data are derived from two analysis systems: 1) tropospheric fields from 1000 to 100 mb, and 2) stratospheric analyses from 70 to 0.4 mb. The tropospheric fields are the 12 GMT gridde...

NASA VEMAP Project

atmospherecarbonclimateiceoceansprecipitation

The Vegetation/Ecosystem Modeling and Analysis Project (VEMAP) is an ongoing multi-institutional, international effort addressing the response of biogeography and biogeochemistry to environmental variability in climate and other drivers in both space and time domains. The objectives of VEMAP are the intercomparison of biogeochemistry models and vegetation-type distribution models (biogeography models) and determination of their sensitivity to changing climate, elevated atmospheric carbon dioxide concentrations, and other sources of altered forcing. The VEMAP data set includes three georeferencin...

NASA Vegetation Project

atmospherecarbonclimatecogearth observationelevationgeospatialiceland coverlidarnetcdfprecipitationradarsatellite imagerystacweather

This global data set of photosynthetic rates and leaf nutrient traits was compiled from a comprehensive literature review. It includes estimates of Vcmax (maximum rate of carboxylation), Jmax (maximum rate of electron transport), leaf nitrogen content (N), leaf phosphorus content (P), and specific leaf area (SLA) data from both experimental and ambient field conditions, for a total of 325 species and treatment combinations. Both the original published Vcmax and Jmax values as well as estimates at standard temperature are reported. The maximum rate of carboxylation (Vcmax) and the maximum rate ...

NASA WDTS Project

satellite imagery

An inventory of NASA's airborne and field campaigns for Earth Science...

NASA WELD Project

atmospheregeospatialhdfland coversatellite imagery

WELDLCLUC.015 was decommissioned on December 2, 2019. The Web-Enabled Landsat Data (WELD) 5-year Land Cover Land Use Change (LCLUC) is a composite of 30 meter (m) land use land change product for the contiguous United States (CONUS). The data were generated from five years of consecutive growing season WELD weekly composite inputs from April 15, 2006, to November 17, 2010. WELD data are created using Landsat Enhanced Thematic Mapper Plus (ETM+) Terrain Corrected data. This product includes data about tree cover loss and bare ground gain, which are composited over the five-year period. WELD LCLUC is dis...

NASA WLDAS Project

elevationhydrologyiceprecipitationsoil moisture

The Western Land Data Assimilation System (WLDAS), developed at Goddard Space Flight Center (GSFC) and funded by the NASA Western Water Applications Office, provides water managers and stakeholders in the western United States with a long-term record of near-surface hydrology for use in drought assessment and water resources planning. WLDAS leverages advanced capabilities in land surface modeling and data assimilation to furnish a system that is customized for stakeholders’ needs in the region. WLDAS uses NASA’s Land Information System (LIS) to configure and drive the Noah Multiparameterizatio...

NASA amsr-2 Project

iceoceansprecipitationsatellite imagerysoil moisture

This AMSR Unified global ocean data set reports integrated water vapor and cloud liquid water content in the atmospheric column, plus 10-meter sea surface wind speeds. The data are derived from AMSR-E and AMSR2 brightness temperature observations that have been resampled by the Japan Aerospace Exploration Agency (JAXA) to facilitate an intercalibrated (i.e., “unified”) AMSR-E/AMSR2 data record. Ancillary files, including product history, quality assessment (QA), and file-specific metadata are also available....

NASA amsr-e Project

earth observationiceoceansprecipitationsatellite imagerysoil moisture

These Level-3 Snow Water Equivalent (SWE) data sets contain SWE data and quality assurance flags mapped to Northern and Southern Hemisphere 25 km Equal-Area Scalable Earth Grids (EASE-Grids)....

NASA aster Project

elevationoceans

This dataset contains a simulated rasterized water surface elevation and inundation-extent product to be provided by the Surface Water and Ocean Topography (SWOT) mission. SWOT will provide global coverage, but this simulated subset focuses on the North American continent. This is a derived product through resampling the upstream dataset L2_HR_PIXC_V1 and L2_HR_PIXCVEC_V1 onto a uniform grid over the North American continent. A uniform grid is superimposed onto the pixel cloud from the source products, and all pixel-cloud samples within each grid cell are aggregated to produce a single value pe...
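The gridding step described above can be sketched roughly as follows. This is only an illustration of binning point samples onto a uniform grid, not the SWOT processing code: the `rasterize` function, the `(x, y, height)` sample layout, the cell size, and the choice of a plain mean as the aggregation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical sample layout: (x, y, water surface elevation)
Sample = Tuple[float, float, float]


def rasterize(samples: Iterable[Sample], cell_size: float) -> Dict[Tuple[int, int], float]:
    """Bin point samples onto a uniform grid and average each cell."""
    cells = defaultdict(list)
    for x, y, height in samples:
        col = int(x // cell_size)  # grid column containing the sample
        row = int(y // cell_size)  # grid row containing the sample
        cells[(row, col)].append(height)
    # One aggregated value per occupied cell (mean is an assumption here).
    return {cell: mean(values) for cell, values in cells.items()}


points = [
    (10.0, 10.0, 2.0),
    (90.0, 40.0, 4.0),
    (150.0, 20.0, 7.0),
]
grid = rasterize(points, cell_size=100.0)
```

The first two points fall into the same 100-unit cell and are averaged, while the third lands in the neighbouring cell on its own.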

NASA chirps Project

climateprecipitationweather

This data set provides downscaled six-hourly atmospheric forcings from European Centre for Medium-Range Weather Forecasts (ECMWF) and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) precipitation from 2003 to 2019 at a spatial resolution of ~1km across High Mountain Asia....

NASA icesat Project

elevationicelidaroceansradarsatellite imagery

Level-1A altimetry data (GLAH01) include the transmitted and received waveform from the altimeter. Each data granule has an associated browse product....

NASA icesat-2 Project

atmosphereclimateelevationiceoceansradarsatellite imagery

These Level 1B time-ordered telemetry data are used for system-level, quality control analysis by the Advanced Topographic Laser Altimeter System (ATLAS) ICESat-2 Science Investigator-led Processing System (SIPS). They also provide source data for the Level 2 products and the Precision Orbit Determination (POD) and Precision Pointing Determination (PPD) computations....

NASA landsat Project

satellite imagery

This data set contains shapefiles of Greenland’s glacial termini and basins for the years 1972 to 2019. These vector data were created from Landsat 1-8 satellite imagery using the Calving Front Machine (CALFIN), an automated processing workflow utilizing neural networks for extracting calving fronts from satellite images of marine-terminating glaciers....

NASA landsat-7 Project

iceoceansradarsatellite imagery

This data set, part of NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, provides a complete 15 m resolution image mosaic of the Greenland ice sheet, derived from USGS Landsat 7 ETM+ imagery and Canadian Space Agency's (CSA) RADARSAT-1 imagery from the years 1999 to 2002. Additional bands (some at 30 m resolution) are provided for each tile in the mosaic and are useful for understanding surface properties, such as snow grain size, bedrock outcrops, mapping layering in the snow, and blue ice or lake-filled regions, during the spring and sum...

NASA landsat-8 Project

iceradarsatellite imagery

This data set, part of the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, consists of mean monthly velocity maps for selected glacier outlet areas. The maps are generated by tracking visible features between optical image pairs acquired by the Landsat 4 and 5 Thematic Mapper (TM), the Landsat 7 Enhanced Thematic Mapper Plus (ETM+), the Landsat 8 Operational Land Imager (OLI), and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). See Greenland Ice Mapping Project (GIMP) for related data....

NASA measures Project

climateelevationicenetcdfoceansradarsatellite imagery

This data set, part of the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, provides a daily record of Arctic sea ice characteristics for the years 1979 through 2012 derived from passive microwave brightness temperatures. Characteristics include the location of sea ice cover, sea ice age, day of melt onset, and status of melt onset. Data are gridded in the 25 km Equal-Area Scalable Earth Grid (EASE-Grid) 2.0 and provided as netCDF files....

NASA modis-aqua Project

climateicesatellite imagery

This global Level-3 data set (MYD10A1F) provides daily cloud-free snow cover derived from the MODIS/Aqua Snow Cover Daily L3 Global 500m SIN Grid data set (MYD10A1). Grid cells in MYD10A1 which are obscured by cloud cover are filled by retaining clear-sky views of the surface from previous days. A separate parameter is provided which tracks the number of days in each cell since the last clear-sky observation. Each data granule contains a 10° x 10° tile projected to the 500 m sinusoidal grid. The terms "Version 61" and "Collection 6.1" are used interchangeably in reference t...
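The gap-filling idea described above can be illustrated with a small sketch. It is not the MYD10A1F production algorithm: the `gap_fill` function, the use of `None` as a cloud marker, and the flattened per-day grids are hypothetical choices made for this example. It retains the latest clear-sky value for each cell and counts the days since that observation.

```python
from typing import List, Optional, Tuple

CLOUD = None  # hypothetical marker for a cloud-obscured cell


def gap_fill(
    days: List[List[Optional[int]]],
) -> List[Tuple[List[Optional[int]], List[Optional[int]]]]:
    """Fill cloud-obscured cells with the most recent clear-sky value.

    `days` holds one flattened grid per day; each cell is a snow-cover
    value or CLOUD. For each day, returns the filled grid and the number
    of days since each cell was last seen under clear sky (None if the
    cell has never been observed clearly).
    """
    n_cells = len(days[0])
    last_value: List[Optional[int]] = [None] * n_cells
    age: List[Optional[int]] = [None] * n_cells
    results = []
    for grid in days:
        for i, value in enumerate(grid):
            if value is not CLOUD:
                last_value[i] = value  # fresh clear-sky observation
                age[i] = 0
            elif age[i] is not None:
                age[i] += 1  # keep the earlier value, now one day older
        results.append((list(last_value), list(age)))
    return results


observations = [
    [80, 0, CLOUD],
    [CLOUD, 10, CLOUD],
    [CLOUD, CLOUD, 55],
]
filled = gap_fill(observations)
```

In this toy input the first cell is cloud-covered on the second and third days, so it keeps its day-one value while its age counter grows to 2.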

NASA modis-terra Project

atmospherecarbonclimateearth observationelevationiceland coverlidarnetcdfoceansprecipitationradarsatellite imagerysoil moistureweather

Mission Objectives: The Surface Water and Ocean Topography (SWOT) mission aims to provide valuable data and information about the world's oceans and its terrestrial surface water such as lakes, rivers, and wetlands. SWOT is being developed jointly by NASA and Centre National D'Etudes Spatiales (CNES), with contributions from the Canadian Space Agency (CSA) and United Kingdom Space Agency (UKSA)....

NASA omi Project

netcdfoceansradar

This dataset was produced by the Adaptive Sampling of Rain and Ocean Salinity from Autonomous Seagliders (NASA grant NNX17AK07G) project, an investigation to develop tools and strategies to better measure the structure and variability of upper-ocean salinity in rain-dominated environments. From October 2019 to January 2020, three Seagliders were deployed near Guam (14°N 144°E). The Seaglider is an autonomous profiler measuring salinity and temperature in the upper ocean. The three gliders sampled in an adaptive formation to capture the patchiness of the rain and the corresponding oceanic respo...

NASA opera Project

iceoceanssatellite imagerysoil moisture

This data set is an inventory of some 2800 landslides that occurred in the High Mountain Asia (HMA) study area between 5 January 2007 and 31 December 2018 (plus one event from 28 January 1990). The catalog includes dates and locations of landslides, plus additional characteristics such as event triggers, country, length and area of the slide, and the number of injuries and fatalities. The events in this catalog represent an HMA-specific subset of the Cooperative Open Online Landslide Repository (COOLR), a project that was created to build a more robust, publicly available inventory of landslid...

NASA pace Project

iceoceansradarsatellite imagerysoil moisture

The data set consists of weekly gridded Level-3 products of Aquarius L-band radiometer brightness temperature (TB) observations and Sea Surface Salinity (SSS) retrievals from the Aquarius/Satélite de Aplicaciones Científicas (SAC-D) mission, developed collaboratively between the U.S. National Aeronautics and Space Administration (NASA) and Argentina's space agency, Comisión Nacional de Actividades Espaciales (CONAE)....

NASA sentinel-1 Project

earth observationelevationiceoceansradarsatellite imagerysoil moisture

This data set contains annual surface melt onset and freeze onset dates across all glaciers in the Hindu Kush Himalayas (HKH) retrieved from time series synthetic aperture radar (SAR) imagery. The data set was based on analysis of C-band Sentinel-1 A/B SAR time series, comprising 32,741 Sentinel-1 A/B SAR images. The duration of annual glacier surface melt was determined for 105,432 mapped glaciers (83,102 km² glacierized area) during the calendar years 2017-2020....

NASA sentinel-2 Project

icesatellite imagery

This data set, part of the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, consists of surface velocity estimates for selected Greenland Ice Sheet outlet glaciers. Velocity fields were generated by tracking visible features in optical images acquired by the U.S. Geological Survey (USGS) Landsat 8 Operational Land Imager (OLI) and the European Space Agency (ESA) Copernicus Sentinel-2A and Sentinel-2B satellites....

NASA smap Project

iceland coverradarsoil moisture

This data set contains data obtained by the Passive Active L- and S-band (PALS) microwave aircraft instrument that are matched up with a variety of soil moisture campaign data. The data were collected as part of four different campaigns: Southern Great Plains 1999 (SGP99), Cloud and Land Surface Interaction Campaign 2007 (CLASIC07), Soil Moisture Experiment 2002 (SMEX02), and the SMAP Validation Experiment 2008 (SMAPVEX08)....

NASA swot Project

elevationoceans

This dataset provides a simulated water surface elevation product that resembles the Ka-band Interferometer (KaRIn) measurements by the Surface Water and Ocean Topography (SWOT) mission. SWOT will provide global coverage, but this simulated subset focuses on the North American continent. The simulated SWOT KaRIn swaths span 128 km in the cross-swath direction with a 20-km nadir gap. This product is complementary to the L2_HR_PIXC_V1 product. It provides a less noisy, height-constrained geolocation (latitude, longitude, and height) of the L2_HR_PIXC_V1 pixels. In addition, this product provides...

NASA tempo Project

icesatellite imagerysoil moisture

The mountains of Nepal are one of the most hazardous environments in the world, with frequent landslides caused by tectonic activity, extreme rainfall and infrastructure development. As a landlocked country, Nepal relies on proper functioning of major transportation networks such as the highways to sustain and improve the livelihoods of the population. Every year there are reports of landslides blocking the highways, especially during the rainy season; however, the frequency and location of landslides along the highway corridors are not well reported. RapidEye satellite imagery was used to cre...

NASA tes Project

carbonclimateelevationhdfhdf5hydrologyiceland coverlidarnetcdfoceansprecipitationradarsatellite imagerysoil moistureweather

This data set contains Level-2 global soil moisture estimates derived from the NASA Aquarius passive microwave radiometer on the Satélite de Aplicaciones Científicas (SAC-D)....

NASA viirs-jpss Project

icesatellite imagery

This data set contains daily ‘cloud-free’ snow cover produced from the VIIRS/JPSS-2 Snow Cover Daily L3 Global 375m SIN Grid, Version 2 snow cover product. A cloud-gap-filled algorithm is utilized to replace ‘cloud-covered’ pixels with ‘cloud-free’ pixels for the purpose of estimating the snow cover that may exist under current cloud cover. The data are provided daily and mapped to a 375 m sinusoidal grid....

NASA viirs-snpp Project

icesatellite imagery

This data set contains daily ‘cloud-free’ snow cover produced from the VIIRS/JPSS-1 Snow Cover Daily L3 Global 375m SIN Grid, Version 2 snow cover product. A cloud-gap-filled algorithm is utilized to replace ‘cloud-covered’ pixels with ‘cloud-free’ pixels for the purpose of estimating the snow cover that may exist under current cloud cover. The data are provided daily and mapped to a 375 m sinusoidal grid....

NLP - fast.ai datasets

deep learningmachine learningnatural language processing

Some of the most important datasets for NLP, with a focus on classification, including IMDb, AG-News, Amazon Reviews (polarity and full), Yelp Reviews (polarity and full), Dbpedia, Sogou News (Pinyin), Yahoo Answers, Wikitext 2 and Wikitext 103, and ACL-2010 French-English 10^9 corpus. This is part of the fast.ai datasets collection hosted by AWS for convenience of fast.ai students. See documentation link for citation and license details for each dataset.

NOAA / NGA Satellite Computed Bathymetry Assessment-SCuBA

agriculturebathymetryclimatedisaster responseenvironmentaloceanstransportationweather

One of the National Geospatial-Intelligence Agency’s (NGA) and the National Oceanic and Atmospheric Administration’s (NOAA) missions is to ensure the safety of navigation on the seas by maintaining the most current information and the highest quality services for U.S. and global transport networks. To achieve this mission, we need accurate coastal bathymetry over diverse environmental conditions. The SCuBA program focused on providing critical information to improve existing bathymetry resources and techniques with two specific objectives. The first objective was to validate National Aeronautics and Space Administration’...

NOAA 3-D Surge and Tide Operational Forecast System for the Atlantic Basin (STOFS-3D-Atlantic)

climatecoastaldisaster responseenvironmentalglobalmarine navigationmeteorologicaloceanssustainabilitywaterweather

NOTICE - The Coast Survey Development Laboratory (CSDL) in NOAA/National Ocean Service (NOS)/Office of Coast Survey is upgrading the Surge and Tide Operational Forecast System (STOFS, formerly ESTOFS) to Version 2.1. A Service Change Notice (SCN) has been issued.

NOAA's Surge and Tide Operational Forecast System: Three-Dimensional Component for the Atlantic Basin (STOFS-3D-Atlantic). STOFS-3D-Atlantic runs daily (at 12 UTC) to provide users with 24-hour nowcasts (analyses of near present conditions) and up to 96-hour forecast guidance of water level conditions, and 2- and 3...

NOAA Atmospheric Climate Data Records

agricultureclimatemeteorologicalsustainabilityweather

NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC).

Climate Data Records are created by merging data from surface, atmosphere, and space-based systems across decades. NOA...

NOAA Cloud Optimized Zarr Reference Files (Kerchunk)

climatecoastaldisaster responseenvironmentalmeteorologicaloceanswaterweather

This repository contains references to datasets published to the NOAA Open Data Dissemination Program. These reference datasets serve as index files to the original data by mapping to the Zarr V2 specification. When multidimensional model output is read through Zarr, data can be lazily loaded (i.e., retrieving only the data chunks needed for processing) and data reads can be scaled horizontally to optimize object storage read performance.

The process used to optimize the data is called kerchunk. RPS runs the workflow in their AWS cloud environment every time a new data notification is received from a relevant source data bucket.
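As a rough illustration of the lazy loading described above, the sketch below resolves Zarr chunk keys through a kerchunk-style reference set, where each key maps to a `[url, offset, length]` triple, and reads only the byte range it needs. The bucket URL, the variable name `zeta`, and the in-memory `FAKE_OBJECT` are hypothetical stand-ins; a real reader would issue range requests against object storage rather than slice a local byte string.

```python
import json

# Hypothetical stand-in for one remote model output file: an 8-byte header
# followed by three 8-byte chunks packed back to back.
FAKE_OBJECT = b"HEADER--" + b"chunk-00" + b"chunk-01" + b"chunk-02"

# Kerchunk-style reference set: each Zarr key maps either to inline metadata
# (a string) or to [url, byte_offset, byte_length] in the original file.
# The URL and variable name are made up for illustration.
REFERENCES = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "zeta/0": ["s3://example-bucket/model.nc", 8, 8],
        "zeta/1": ["s3://example-bucket/model.nc", 16, 8],
        "zeta/2": ["s3://example-bucket/model.nc", 24, 8],
    },
}


def read_range(url, offset, length):
    """Simulate a byte-range request; a real reader would fetch from `url`."""
    return FAKE_OBJECT[offset:offset + length]


def load_chunk(references, key):
    """Resolve one Zarr key through the references, reading only its bytes."""
    entry = references["refs"][key]
    if isinstance(entry, str):
        # Inline metadata is stored in the reference file itself.
        return entry.encode()
    url, offset, length = entry
    return read_range(url, offset, length)


# Only the bytes of the requested chunk are touched; the others stay unread.
print(load_chunk(REFERENCES, "zeta/1"))
```

Because every chunk is addressed independently, separate workers could request different keys in parallel, which is the horizontal scaling the description refers to.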

These are the current datasets being cloud-optimized. Refer to those pages for file naming conventions and other information regarding the specific model implementations:

NOAA Continuously Operating Reference Stations (CORS) Network (NCN)

broadcast ephemerisContinuously Operating Reference Station (CORS)earth observationgeospatialGNSSGPSmappingNOAA CORS Network (NCN)post-processingRINEXsurvey

The NOAA Continuously Operating Reference Stations (CORS) Network (NCN), managed by NOAA/National Geodetic Survey (NGS), provides Global Navigation Satellite System (GNSS) data, supporting three-dimensional positioning, meteorology, space weather, and geophysical applications throughout the United States. The NCN is a multi-purpose, multi-agency cooperative endeavor, combining the efforts of hundreds of government, academic, and private organizations. The stations are independently owned and operated. Each agency shares its GNSS/GPS carrier phase and code range measurements and station metada...

NOAA EAGLE (Experimental AI Global and Limited-Area Ensemble) Global Deterministic and Ensemble Forecasts

agricultureclimatedisaster responseenvironmentalmeteorologicalweather

...

NOAA Fundamental Climate Data Records (FCDR)

agricultureclimatemeteorologicalsustainabilityweather

NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC).

Climate Data Records are created by merging data from surface, atmosphere, and space-based systems across decades. NOA...

NOAA Global Data Assimilation (DA) Test Data

agricultureclimatedisaster responseenvironmentalmeteorologicalweather

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth Modeling System. It supports multiple applications with different forecast durations and spatial domains. The Global Data Assimilation System (GDAS) Application (App) is being used as the basis for uniting the Global Workflow and Global Forecast System (GFS) model with Joint Effort for Data assimilation Integration (JEDI) capabilities.

The National Centers for Environmental Prediction (NCEP) use GDAS to interpolate data from various observing systems and instruments onto a three-dimensional grid. GDAS obtain...

NOAA Global Ensemble Forecast System (GEFS)

agricultureclimatemeteorologicalweather

The Global Ensemble Forecast System (GEFS), previously known as the GFS Global ENSemble (GENS), is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental Prediction (NCEP) started the GEFS to address the nature of uncertainty in weather observations, which are used to initialize weather forecast models. The GEFS attempts to quantify the amount of uncertainty in a forecast by generating an ensemble of multiple forecasts, each minutely different, or perturbed, from the original observations. With global coverage, GEFS is produced fo...
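The ensemble idea can be sketched with a toy model. The example below is not GEFS code: it assumes a chaotic logistic map as a stand-in for the forecast model, a hypothetical perturbation size, and that all 21 members are perturbed. It shows how slightly different starting values lead to a spread of outcomes, and how the mean and standard deviation of that spread summarize forecast uncertainty.

```python
import random
import statistics


def toy_model(x0, steps=20, r=3.9):
    """Chaotic logistic map standing in for a forecast model (not GEFS physics)."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def run_ensemble(analysis, members=21, perturbation=1e-3, seed=0):
    """Run `members` forecasts, each started from a slightly perturbed analysis."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    forecasts = []
    for _ in range(members):
        x0 = analysis + rng.uniform(-perturbation, perturbation)
        forecasts.append(toy_model(x0))
    return forecasts


forecasts = run_ensemble(analysis=0.4)
ensemble_mean = statistics.mean(forecasts)
ensemble_spread = statistics.pstdev(forecasts)  # larger spread = less certainty
print(f"mean={ensemble_mean:.3f} spread={ensemble_spread:.3f}")
```

A small spread would suggest the members agree and the forecast is relatively certain; a large spread would suggest the outcome is sensitive to the initial observations.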

NOAA Global Hydro Estimator (GHE) / Enterprise Rain Rate

agriculturemeteorologicalwaterweather

NOTE - The legacy on-premises version of the Global Hydroestimator (GHE) is being retired. It is being replaced by the global Enterprise Rain Rate algorithm. You can find Enterprise Rain Rate products in the new bucket listed under the Resources section.

Global Hydro-Estimator provides global mosaic imagery of rainfall estimates from multiple geostationary satellites, which currently include GOES-16, GOES-15, Meteosat-8, Meteosat-11 and Himawari-8. The GHE products include instantaneous rain rate and 1-hour, 3-hour, 6-hour, 24-hour, and multi-day rainfall accumulations.

NOAA Global Mosaic of Geostationary Satellite Imagery (GMGSI)

agricultureclimatemeteorologicalweather

NOAA/NESDIS Global Mosaic of Geostationary Satellite Imagery (GMGSI) visible (VIS), shortwave infrared (SIR), longwave infrared (LIR) imagery, and water vapor imagery (WV) are composited from data from several geostationary satellites orbiting the globe, including the GOES-East and GOES-West Satellites operated by U.S. NOAA/NESDIS, the Meteosat-10 and Meteosat-9 satellites from the Meteosat Second Generation (MSG) series of satellites operated by the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT), and the Himawari-9 satellite operated by the Japan Meteorological ...

NOAA Global Real-Time Ocean Forecast System (Global RTOFS)

climatecoastaldisaster responseenvironmentalglobalmeteorologicaloceanswaterweather

NOAA is soliciting public comment on potential changes to the Real Time Ocean Forecast System (RTOFS) through March 27, 2024. Please see the Public Notice at https://www.weather.gov/media/notification/pdf_2023_24/pns24-12_rtofs_v2.4.0.pdf

NOAA's Global Real-Time Ocean Forecast System (Global RTOFS) provides users with nowcasts (analyses of near present conditions) and forecast guidance up to eight days of ocean temperature and salinity, water velocity, sea surface elevation, sea ice coverage and sea ice thickness.

The Global Operational Real-Time Ocean Forecast System (Global RTOFS) is based on an eddy resolving 1/12° global HYCOM (HYbrid Coor...

NOAA Global Surge and Tide Operational Forecast System 2-D (STOFS-2D-Global)

climatecoastaldisaster responseenvironmentalglobalmeteorologicaloceanswaterweather

NOTICE - The Coast Survey Development Laboratory (CSDL) in NOAA/National Ocean Service (NOS)/Office of Coast Survey has upgraded the Surge and Tide Operational Forecast System (STOFS, formerly ESTOFS) to Version 2.1. A Service Change Notice (SCN) has been issued and can be found "HERE"

NOAA's Global Surge and Tide Operational Forecast System 2-D (STOFS-2D-Global) provides users with nowcasts (analyses of near present conditions) and forecast guidance of water level conditions for the entire globe. STOFS-2D-Global has been developed to serve the marine navigation, weather forecasting, an...

NOAA Historical Maps and Charts

coastalgeospatialhistorymappingsurvey

Historical Charts are not for Navigation. The collection primarily consists of historic charts and maps produced by NOAA's Coast Survey and its predecessors, especially the U.S. Coast and Geodetic Survey and the U.S. Lake Survey (previously under the Department of War). The collection also includes bathymetric maps, land sketches, Civil War battle maps, aeronautical charting from the 1930s to the 1950s, and other drawings and photographs.

NOAA Hurricane Analysis and Forecast System (HAFS)

agricultureclimatemeteorologicalweather

The last several hurricane seasons have been active, with records being set for the number of tropical storms and hurricanes in the Atlantic basin. These record-breaking seasons underscore the importance of accurate hurricane forecasting. Imperative to increased forecasting skill for hurricanes is the development of the Hurricane Analysis and Forecast System (HAFS). To accelerate improvements in hurricane forecasting, this project has the following goals:

To improve the HAFS. The HAFS is NOAA’s next-generation multi-scale numerical model, with data assimilation package and ocean coupling, which will provide an op

...

NOAA NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis

agricultureclimatemeteorologicalweather

The NOAA NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis is a curated joint observation archive containing Earth system data from 1979 to present prepared by teams at NOAA's Physical Sciences Laboratory and NASA's Global Modeling and Assimilation Office. The goal is to foster collaboration across organizations and develop the ability for direct comparison of Earth System reanalysis results. Providing a singular dataset for observation input use will allow reanalyses to be compared on their unique development qualities by removing the variation from using different...

NOAA National Bathymetric Source Data

bathymetryearth observationmarine navigationmodeloceansoceans

The National Bathymetric Source (NBS) project creates and maintains high-resolution bathymetry composed of the best available data. This project enables the creation of next-generation nautical charts while also providing support for modeling, industry, science, regulation, and public curiosity. Primary sources of bathymetry include NOAA and U.S. Army Corps of Engineers hydrographic surveys and topographic bathymetric (topo-bathy) lidar (light detection and ranging) data. Data submitted through the NOAA Office of Coast Survey’s external source data process are also included, with gaps...

NOAA National Blend of Models (NBM)

agricultureclimatecogmeteorologicalweather

The National Blend of Models (NBM) is a nationally consistent and skillful suite of calibrated forecast guidance based on a blend of both NWS and non-NWS numerical weather prediction model data and post-processed model guidance. The goal of the NBM is to create a highly accurate, skillful and consistent starting point for the gridded forecast.

NOAA National Blend of Models (NBM) Parallel

agricultureclimatedisaster responseenvironmentalmeteorologicalweather

The National Blend of Models (NBM) is a nationally consistent and skillful suite of calibrated forecast guidance based on a blend of both NWS and non-NWS numerical weather prediction model data and post-processed model guidance. The goal of the NBM is to create a highly accurate, skillful and consistent starting point for the gridded forecast. This dataset contains data from the current parallel version of the NBM which is a test version, featuring many changes, that is a candidate to be implemented into operations following a careful vetting process.

NOAA North American Mesoscale Forecast System (NAM)

agricultureclimatemeteorologicalweather

The North American Mesoscale Forecast System (NAM) is one of the National Centers For Environmental Prediction’s (NCEP) major models for producing weather forecasts. NAM generates multiple grids (or domains) of weather forecasts over the North American continent at various horizontal resolutions. Each grid contains data for dozens of weather parameters, including temperature, precipitation, lightning, and turbulent kinetic energy. NAM uses additional numerical weather models to generate high-resolution forecasts over fixed regions, and occasionally to follow significant weather events like hur...

NOAA Oceanic Climate Data Records

agricultureclimatemeteorologicaloceanssustainabilityweather

NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC).

Climate Data Records are created by merging data from surface, atmosphere, and space-based systems across decades. NOA...

NOAA Office of Coast Survey - Hydrographic Survey Data

bathymetryearth observationmarine navigationmodeloceansoceans

Founded in 1807, NOAA’s Office of Coast Survey is the nation’s first scientific agency and today is responsible for supporting nearly $5.4 trillion in economic activity through providing advanced marine navigation services. The Office of Coast Survey collects and qualifies hydrographic, bathymetric, and topographic data, from NOAA platforms and many other data providers. These data and associated deliverables are posted here for various users to access, including but not limited to the "National Bathymetric Source Program" for incorporation into compilations of the best available bat...

NOAA Rapid Refresh (RAP)

agricultureclimatemeteorologicalweather

The Rapid Refresh (RAP) is a NOAA/NCEP operational weather prediction system comprised primarily of a numerical forecast model and analysis/assimilation system to initialize that model. It covers North America and is run with a horizontal resolution of 13 km and 50 vertical layers. The RAP was developed to serve users needing frequently updated short-range weather forecasts, including those in the US aviation community and US severe weather forecasting community. The model is run for every hour of the day; it is integrated to 51 hours for the 03/09/15/21 UTC cycles and to 21 hours for every ot...

NOAA Real-Time Mesoscale Analysis (RTMA) / Unrestricted Mesoscale Analysis (URMA)

agricultureclimatemeteorologicalweather

The Real-Time Mesoscale Analysis (RTMA) is a NOAA National Centers For Environmental Prediction (NCEP) analysis/assimilation system with high spatial and temporal resolution for near-surface weather conditions. Its main component is the NCEP/EMC Gridpoint Statistical Interpolation (GSI) system applied in two-dimensional variational mode to assimilate conventional and satellite-derived observations.

The RTMA was developed to support NDFD operations and provide field forecasters with high quality analyses for nowcasting, situational awareness, and forecast verification purposes. The system produces ...

NOAA S-102 Bathymetric Surface Data

bathymetryhydrographymarine navigationoceansseafloorwater

S-102 is a data and metadata encoding specification that is part of the S-100 Universal Hydrographic Data Model, an international standard for hydrographic data exchange. This collection of data contains bathymetric surfaces from the NOAA/NOS/OCS National Bathymetric Source for various U.S. coastal and offshore waters and the Great Lakes. These datasets are encoded as HDF5 files conforming to the S-102 specification.

NOAA Severe Weather Data Inventory (SWDI)

agricultureclimatemeteorologicalweather

The Storm Events Database is an integrated database of severe weather events across the United States from 1950 to this year, with information about a storm event's location, azimuth, distance, impact, and severity, including the cost of damages to property and crops. It contains data documenting: The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. Rare, unusual, weather phenomena that generate media attention, such as snow flurries in South Florida or the S...

NOAA Space Weather Follow-On Mission Geostationary Operational Environmental Satellite (GOES) 19

satellite imageryspace weather

The National Oceanic and Atmospheric Administration (NOAA) Geostationary Operational Environmental Satellite 19 (GOES-19) is the fourth and final satellite in the Geostationary Operational Environmental Satellites (GOES) – R Series, the Western Hemisphere’s most sophisticated weather-observing and environmental monitoring system.
The GOES-R Series provides advanced imagery and atmospheric measurements, real-time mapping of lightning activity, and space weather observations. As a part of the Space Weather Follow On (SWFO) Mission, the GOES-19 spacecraft contains a Compact Coronagraph-1 (...

NOAA Space Weather Forecast and Observation Data

climatemeteorologicalsolarweather

Space weather forecast and observation data is collected and disseminated by NOAA’s Space Weather Prediction Center (SWPC) in Boulder, CO. SWPC produces forecasts for multiple space weather phenomenon types and the resulting impacts to Earth and human activities. A variety of products are available that provide these forecast expectations, and their respective measurements, in formats that range from detailed technical forecast discussions to NOAA Scale values to simple bulletins that give information in layman's terms. Forecasting is the prediction of future events, based on analysis and...

NOAA Terrestrial Climate Data Records

agricultureclimatemeteorologicalsustainabilityweather

NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC).

Climate Data Records are created by merging data from surface, atmosphere, and space-based systems across decades. NOA...

NOAA U.S. Climate Gridded Dataset (NClimGrid)

agricultureclimatemeteorologicalweather

The NOAA Monthly U.S. Climate Gridded Dataset (NClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature, minimum temperature, average temperature and precipitation. Each file provides monthly values in a 5x5 lat/lon grid for the Continental United States. Data is available from 1895 to the present. On an annual basis, approximately one year of "final" nClimGrid will be submitted to replace the initially supplied "preliminary" data for the same time period. Users should be sure to ascertain which level of data is required for their resear...

NOAA Unified Forecast System (UFS) Coastal Model

climatedisaster responseelevationgeospatiallidarstac

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth Modeling System. The UFS Coastal application is a project under development by NOAA and NCAR, which supports coastal forecasting requirements based on UFS standards. The coupling infrastructure for UFS Coastal App is currently being developed based on a fork of the ufs-weather-model (UFS-WM), with additional coastal model-components including SCHISM, ADCIRC, ROMS, and FVCOM, as well as additional infrastructure to support coastal coupling of WW3 and CICE. The model-level repository contains the model code and external submo...

NOAA Unified Forecast System (UFS) Global Ensemble Forecast System (GEFS) Version 13 Replay

agricultureclimatemeteorologicalweather

The NOAA Unified Forecast System (UFS) / Global Ensemble Forecast System version 13 (GEFSv13) Replay dataset supports the retrospective forecast archive in preparation for GEFSv13 / GFSv17. It includes a range of atmospheric and oceanic variables—such as temperature, humidity, winds, salinity, and currents—covering global conditions at a nominal horizontal resolution of ¼ degree, enabling detailed weather analysis.

The dataset was generated by replaying the coupled UFS model against pre-existing external reanalyses; ERA5 for atmospheric data and ORAS5 for ocean and ice dynamics. Each simulation stream...

NOAA Unified Forecast System (UFS) Hierarchical Testing Framework (HTF)

agricultureclimatedisaster responseenvironmentalmeteorologicaloceansweather

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth Modeling System. The Hierarchical Testing Framework (HTF) serves as a comprehensive toolkit designed to enhance the testing capabilities within UFS repositories. It aims to standardize and simplify the testing process across various UFS Weather Model (WM) components and associated modules, aligning with the Hierarchical System Development (HSD) approach and NOAA baseline operational metrics.

The HTF provides a structured methodology for test case design and execution, which enh...

NOAA Unified Forecast System (UFS) Land Data Assimilation (DA) System

agricultureclimatemeteorologicalweather

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth modeling system. It supports multiple applications covering different forecast durations and spatial domains. The Land Data Assimilation (DA) System is an offline version of the Noah Multi-Physics (Noah-MP) land surface model (LSM) used in the UFS Weather Model (WM). Its data assimilation framework uses Joint Effort for Data assimilation Integration (JEDI) software (https://www.jcsda.org/jcsda-project-jedi). The offline Noah-MP LSM is a stand-alone, uncoupled model used to execute land surface simu...

NOAA Unified Forecast System (UFS) Marine Reanalysis: 1979-2019

agricultureclimatemeteorologicalweather

The NOAA UFS Marine Reanalysis is a global sea ice ocean coupled reanalysis product produced by the marine data assimilation team of the UFS Research-to-Operation (R2O) project. Underlying forecast and data assimilation systems are based on the UFS model prototype version-6 and the Next Generation Global Ocean Data Assimilation System (NG-GODAS) release of the Joint Effort for Data assimilation Integration (JEDI) Sea Ice Ocean Coupled Assimilation (SOCA). Covering the 40 year reanalysis time period from 1979 to 2019, the data atmosphere option of the UFS coupled global atmosphere ocean sea ice (DAT...

NOAA Unified Forecast System Short-Range Weather (UFS SRW) Application

agricultureclimatemeteorologicalweather

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth Modeling System. It supports multiple applications with different forecast durations and spatial domains. The UFS Short-Range Weather (SRW) Application figures among these applications. It targets predictions of atmospheric behavior on a limited spatial domain and on time scales from minutes to several days. The SRW Application includes a prognostic atmospheric model, pre-processor, post-processor, and community workflow for running the system end-to-end. The "SRW Application User's Guide...

NOAA Unified Forecast System Weather Model (UFS-WM) Regression Tests

agricultureclimatemeteorologicalweather

The Unified Forecast System (UFS) is a community-based, coupled, comprehensive Earth Modeling System. The ufs-weather-model (UFS-WM) is the model source of the UFS for NOAA’s operational numerical weather prediction applications. The UFS-WM Regression Test (RT) is the testing software to ensure that previously developed and tested capabilities in UFS-WM still work after code changes are integrated into the system. It is required that UFS-WM RTs are performed successfully on the required Tier-1 platforms whenever code changes are made to the UFS-WM. The results of the UFS-WM RTs are summarized i...

NOAA Wang Sheeley Arge (WSA) Enlil

climatemeteorologicalsolarweather

The WSA-Enlil heliospheric model provides critical information regarding the propagation of solar Coronal Mass Ejections (CMEs) and transient structures within the heliosphere. Two distinct models comprise the WSA-Enlil modeling system: 1) the Wang-Sheeley-Arge (WSA) semi-empirical solar coronal model, and 2) the Enlil magnetohydrodynamic (MHD) heliospheric model. MHD modeling of the full domain (solar photosphere to Earth) is extremely computationally demanding due to the large parameter space and resulting characteristic speeds within the system. To reduce the computational burden and improve the timeliness (and he...

NOAA Whole Atmosphere Model-Ionosphere Plasmasphere Electrodynamics (WAM-IPE) Forecast System (WFS)

climatemeteorologicalsolarweather

The coupled Whole Atmosphere Model-Ionosphere Plasmasphere Electrodynamics (WAM-IPE) Forecast System (WFS) is developed and maintained by the NOAA Space Weather Prediction Center (SWPC). The WAM-IPE model provides a specification of ionosphere and thermosphere conditions with real-time nowcasts and forecasts up to two days in advance in response to solar, geomagnetic, and lower atmospheric forcing. The WAM is an extension of the Global Forecast System (GFS) with a spectral hydrostatic dynamical core utilizing an enthalpy thermodynamic variable to 150 vertical levels on a hybrid pressure-sigma grid, with a model t...

Nanopore Reference Human Genome

geneticgenomiclife scienceswhole genome sequencing

This dataset includes the sequencing and assembly of a reference standard human genome (GM12878) using the MinION nanopore sequencing instrument with the R9.4 1D chemistry.

Natural Scenes Dataset

computer visionimage processingimaginglife sciencesmachine learningmagnetic resonance imagingneuroimagingneurosciencenifti

Here, we collected and pre-processed a massive, high-quality 7T fMRI dataset that can be used to advance our understanding of how the brain works. A unique feature of this dataset is the massive amount of data available per individual subject. The data were acquired using ultra-high-field fMRI (7T, whole-brain, 1.8-mm resolution, 1.6-s TR). We measured fMRI responses while each of 8 participants viewed 9,000–10,000 distinct, color natural scenes (22,500–30,000 trials) in 30–40 weekly scan sessions over the course of a year. Additional measures were collected including resting-state data, retin...

ONS Open Data Portal

electricityenergyhydrography

The ONS Open Data Portal, produced by the National Operator of the Electric System (ONS), gathers historical data from the Brazilian electricity sector in an easy and democratic way, with the main objective of facilitating and improving the access and consumption of this type of content by all users and audiences. The Portal gathers consolidated data on energy generation (hydraulic, thermal, solar and wind), areas of national and international energy exchange and energy load, information about equipment such as transmission lines, generating units, converters and others; and hydrological data obta...

OSAP 2022 Modeling Platform

air qualityenvironmentalmeteorologicalregulatoryweather

The data are part of the 2022 Modeling Platform used to support regulatory actions and technical analyses conducted by the EPA's Office of State Air Partnerships (OSAP). Specifically, this data includes Weather Research and Forecasting Model (v4.4.2) simulations conducted at a 12-km resolution over the Continental United States (12US). MCIP-processed and wrfcamx-processed files (12US1 domain) are also available as part of this dataset to assist in emissions processing and photochemical modeling. These files may be used in downstream applications to generate emissions, photochemical mode...

Open Food Facts Images

image processingmachine learning

A dataset of all images of Open Food Facts, the biggest open dataset of food products in the world.

Open Observatory of Network Interference (OONI)

internet

A free software, global observation network for detecting censorship, surveillance and traffic manipulation on the internet.

OpenWings OpenData

biodiversityfastqgeneticgenomelife sciencesmuseumwildlife

DNA sequence data of UCE loci collected from the world's bird species (n=10,560).

Usage examples

phyluce by Brant Faircloth
Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales by BC Faircloth, JE McCormack, NG Crawford, MG Harvey, RT Brumfield, TC Glenn
Tutorial I - UCE Phylogenomics by Brant Faircloth

Opioid Industry Documents Archive (OIDA) Data on AWS

archiveslife sciencespharmaceuticaltext analysistxt

The OIDA Data on AWS contain the metadata, documents, and extracted text for all of the documents in the UCSF-JHU Opioid Industry Documents Archive, a growing corpus of internal corporate records and other documents arising from the opioid industry.

PROJ datum grids

geospatialmapping

Horizontal and vertical adjustment datasets for coordinate transformation to be used by PROJ 7 or later. PROJ is a generic coordinate transformation software that transforms geospatial coordinates from one coordinate reference system (CRS) to another. This includes cartographic projections as well as geodetic transformations.
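To make the idea of a cartographic projection concrete, here is a minimal sketch of one such transformation, the spherical Web Mercator projection (the formula behind EPSG:3857), written with the Python standard library only. It does not call PROJ and does not use the datum grids in this dataset; the function names are hypothetical and chosen for illustration.

```python
import math

# WGS 84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection: Web Mercator metres back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


# Round trip for an arbitrary point (longitude, latitude in degrees).
x, y = lonlat_to_web_mercator(-71.06, 42.36)
lon, lat = web_mercator_to_lonlat(x, y)
print(round(lon, 6), round(lat, 6))
```

A geodetic transformation between datums is a different operation: it shifts coordinates using correction values, which is where adjustment grids such as the ones in this dataset come in.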

Pan-STARRS PS1 Survey

astronomy

The PS1 surveys used a 1.8 meter telescope and its 1.4 Gigapixel camera to image the sky in five broadband filters. The largest of these surveys provides coverage of the entire sky north of -30 degrees declination, with approximately 10 observation epochs across 3 years in each filter.

Physionet

biologylife sciences

PhysioNet offers free web access to large collections of recorded physiologic signals (PhysioBank) and related open-source software (PhysioToolkit).

Provision of Web-Scale Parallel Corpora for Official European Languages (ParaCrawl)

machine translationnatural language processing

ParaCrawl is a set of large parallel corpora to/from English for all official EU languages, produced by a broad web crawling effort. State-of-the-art methods are applied for the entire processing chain from identifying web sites with translated text all the way to collecting, cleaning and delivering parallel corpora that are ready as training data for CEF.AT and translation memories for DG Translation.

RSNA Screening Mammography Breast Cancer Detection (RSNA-SMBC) Dataset

breast cancercancercomputer visioncsvlabeledlife sciencesmachine learningmammographymedical image computingmedical imagingradiology

According to the WHO, breast cancer is the most commonly occurring cancer worldwide. In 2020 alone, there were 2.3 million new breast cancer diagnoses and 685,000 deaths. Yet breast cancer mortality in high-income countries has dropped by 40% since the 1980s when health authorities implemented regular mammography screening in age groups considered at risk. Early detection and treatment are critical to reducing cancer fatalities, and your machine learning skills could help streamline the process radiologists use to evaluate screening mammograms. Currently, early detection of breast cancer requi...

SMN Hi-Res Weather Forecast over Argentina

earth observationmeteorologicalnatural resourceweather

The Servicio Meteorológico Nacional de Argentina (SMN-Arg), the National Meteorological Service of Argentina, shares its deterministic forecasts generated with WRF 4.0 (Weather Research and Forecasting) initialized at 00 and 12 UTC every day.

This forecast includes some key hourly surface variables (2 m temperature, 2 m relative humidity, 10 m wind magnitude and direction, and precipitation), along with daily minimum and maximum temperature.

The forecast covers Argentina, Chile, Uruguay, Paraguay and parts of Bolivia and Brazil in a Lambert conformal projection, with 4 km...

SUCHO Ukrainian Cultural Heritage Web Archives

cultural preservationinternetukraine

The dataset contains web archives of Open Access collections of digitised cultural heritage from more than 3,000 websites of Ukrainian cultural institutions, such as museums, libraries or archives. The web archives have been produced by SUCHO, which is a volunteer group of more than 1,300 international cultural heritage professionals (librarians, archivists, researchers, programmers) who have joined forces to save as much digitised cultural heritage as possible during the 2022 invasion of Ukraine, before the servers hosting it are destroyed, damaged or go offline for any other reason. The web archives...

Sentinel Near Real-time Canada Mirror | Miroir Sentinel temps quasi réel du Canada

agriculturedisaster responseearth observationgeospatialsatellite imagerystacsustainabilitysynthetic aperture radar

The official Government of Canada (GC) 🍁 Near Real-time (NRT) Sentinel Mirror connected to the EU Copernicus programme, focused on Canadian coverage. In 2015, Canada joined the Sentinel collaborative ground segment which introduced an NRT Sentinel mirror site for users and programs inside the Government of Canada (GC). In 2022, the Commission signed a Copernicus Arrangement with the Canadian Space Agency with the aim to share each other’s satellite Earth Observation data on the basis of reciprocity. Further to this arrangement as well as ongoing Open Government efforts, the private mirror was made ope...

Smithsonian Open Access

art, culture, encyclopedic, history, museum

The Smithsonian’s mission is the "increase and diffusion of knowledge," and the institution has been collecting since 1846. The Smithsonian, through its efforts to digitize its multidisciplinary collections, has created millions of digital assets and related metadata describing the collection objects. On February 25th, 2020, the Smithsonian released over 2.8 million CC0 interdisciplinary 2-D and 3-D images, related metadata, and research data from researchers across the Smithsonian. The 2.8 million "open access" collections are a subset of the Smithsonian’s 155 million objects,...

SocialGene RefSeq Databases

amino acid, bioinformatics, chemical biology, genomic, graph, metagenomics, microbiome, pharmaceutical, protein

Precomputed SocialGene Neo4j graph databases of various sizes built from RefSeq genomes and MIBiG BGCs.

Usage examples

See 3 usage examples →

Surya Bench

heliophysics, machine learning, solar

This dataset provides machine learning (ML)-ready solar data curated from NASA’s Solar Dynamics Observatory (SDO), covering observations from May 13, 2010, to July 31, 2024. It includes Level-1.5 processed data from:

Atmospheric Imaging Assembly (AIA)
Helioseismic and Magnetic Imager (HMI)

The dataset is designed to facilitate large-scale ML applications in heliophysics, such as solar activity forecasting, unsupervised representation learning, and scientific foundation model development.

Textbook Question Answering (TQA)

machine learning

1,076 textbook lessons, 26,260 questions, 6,229 images

The Genome Modeling System

genetic, genomic, life sciences

The Genome Institute at Washington University has developed a high-throughput, fault-tolerant analysis information management system called the Genome Modeling System (GMS), capable of executing complex, interdependent, and automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. GMS includes a full system image with software and services, expandable from one workstation to a large compute cluster.

The Massively Multilingual Image Dataset (MMID)

computer vision, machine learning, machine translation, natural language processing

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, and parallel to the word's translation into English (and corresponding images).

The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset

cancer, life sciences, magnetic resonance imaging, medical imaging, medicine, radiology

The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) dataset is a public, clinical, multimodal brain MRI dataset consisting of 560 brain MRIs from 412 patients with expert annotations of 5,136 brain metastases. Data consists of registered and skull-stripped T1 post-contrast, T1 pre-contrast, FLAIR and subtraction (T1 pre-contrast - T1 post-contrast) images and voxelwise segmentations of enhancing brain metastases in NIfTI format.

UCSC Genome Browser Sequence and Annotations

bioinformatics, biology, genetic, genomic, life sciences

The UCSC Genome Browser is an online graphical viewer for genomes, a genome browser, hosted by the University of California, Santa Cruz (UCSC). The interactive website offers access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. This dataset is a copy of the MySQL tables in MyISAM binary and tab-separated format and all binary files in custom formats, sometimes referred to as 'gbdb' files. Data from the UCSC Genome Browser is free and open for use by anyone. However, every genome...

USearch Molecules

biology, chemical biology, life sciences, pharmaceutical

Collection of 7 billion small molecules in SMILES notation with 28 billion fingerprints, including MACCS, ECFP4, FCFP4, and PubChem, with pre-constructed USearch indexes over them.
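Fingerprints such as MACCS or ECFP4 are bit vectors, and a common way to compare two of them is the Tanimoto (Jaccard) coefficient: the number of shared set bits divided by the number of bits set in either. The sketch below illustrates that comparison with a brute-force scan. It does not use the USearch indexes shipped with the dataset, and the 8-bit fingerprints and molecule names are made up for illustration.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit fingerprints stored as ints."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return bin(fp_a & fp_b).count("1") / union


def top_k(query: int, library: dict, k: int = 2) -> list:
    """Brute-force nearest neighbours: score every entry, keep the k best."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 8-bit fingerprints; real fingerprints have hundreds or thousands of bits.
library = {"mol_a": 0b10110010, "mol_b": 0b10110000, "mol_c": 0b01001101}
query = 0b10110010

nearest = top_k(query, library, k=2)
print(nearest)
```

A linear scan like this does not scale to billions of molecules, which is the motivation for distributing pre-constructed approximate nearest-neighbour indexes alongside the fingerprints.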

Umbra Synthetic Aperture Radar (SAR) Open Data

earth observation, geospatial, image processing, satellite imagery, stac, synthetic aperture radar

Umbra satellites generate the highest resolution Synthetic Aperture Radar (SAR) imagery ever offered from space, up to 16-cm resolution. SAR can capture images at night, through cloud cover, smoke and rain. SAR is unique in its abilities to monitor changes. The Open Data Program (ODP) features over twenty diverse time-series locations that are updated frequently, allowing users to experiment with SAR's capabilities. We offer single-looked spotlight mode in either 16cm, 25cm, 35cm, 50cm, or 1m resolution, and multi-looked spotlight mode. The ODP also features an assorted collection of over ...

University of British Columbia Sunflower Genome Dataset

agriculture, biodiversity, bioinformatics, biology, food security, genetic, genomic, life sciences, whole genome sequencing

This dataset captures the sunflower's genetic diversity, originating from thousands of wild, cultivated, and landrace sunflower individuals distributed across North America. The data consists of raw sequences and associated botanical metadata, aligned sequences (to three different reference genomes), and sets of SNPs computed across several cohorts.

VENUS L2A Cloud-Optimized GeoTIFFs

activity detection, agriculture, cog, disaster response, earth observation, environmental, geospatial, image processing, land cover, natural resource, satellite imagery, stac

The Venµs science mission is a joint research mission undertaken by CNES and ISA, the Israel Space Agency. It aims to demonstrate the effectiveness of high-resolution multi-temporal observation optimised through Copernicus, the global environmental and security monitoring programme. Venµs was launched from the Centre Spatial Guyanais by a VEGA rocket during the night of August 1 to 2, 2017. Thanks to its multispectral camera (12 spectral bands in the visible and near-infrared ranges, with spectral characteristics provided here), it acquires imagery every 1-2 days over 100+ areas at...

Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021

The electrocardiogram (ECG) is a non-invasive representation of the electrical activity of the heart. Although the twelve-lead ECG is the standard diagnostic screening system for many cardiological issues, the limited accessibility of twelve-lead ECG devices provides a rationale for smaller, lower-cost, and easier to use devices. While single-lead ECGs are limiting [1], reduced-lead ECG systems hold promise, with evidence that subsets of the standard twelve leads can capture useful information [2], [3], [4] and even be comparable to twelve-lead ECGs in some limited contexts. In 2017 we challen...

ZEST: ZEroShot learning from Task descriptions

machine learning, natural language processing

ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.

iNaturalist Licensed Observation Images

biodiversity, bioinformatics, conservation, earth observation, life sciences

iNaturalist is a community science effort in which participants share observations of living organisms that they encounter and document with photographic evidence, location, and date. The community works together reviewing these images to identify these observations to species. This collection represents the licensed images accompanying iNaturalist observations.

stdpopsim species resources

genetic maps, life sciences, population genetics, recombination maps, simulations

Contains all resources (genome specifications, recombination maps, etc.) required for species-specific simulation with the stdpopsim package. These resources originally come from a variety of other consortia and published work but are consolidated here for ease of access and use. If you are interested in adding a new species to the stdpopsim resource, please raise an issue on the stdpopsim GitHub page to have the necessary files added here.

ABEJA CC JA

internet, japanese, natural language processing, web archive

A large Japanese language corpus created through preprocessing Common Crawl data

Usage examples

Building a Large-Scale Japanese Corpus from Common Crawl and Its Preprocessing by Kyo Hattori
Tutorial of ABEJA CC JA dataset by Kyo Hattori

See 2 usage examples →

Amazon Bin Image Dataset

amazon.science, computer vision, machine learning

The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations.

Usage examples

See 2 usage examples →

Digital Earth Pacific Mangroves Extent and Density

climate, earth observation, environmental, geoscience, geospatial

The Pacific Mangroves beta version product is an extension of the Global Mangrove Watch (GMW v3, 2020), which shows the extent of mangrove ecosystems across Pacific Island Countries and Territories (PICTs). The change in mangrove extent was further classified into three categories: closed (high-density), open (lower density) and non-mangrove. This was used as the baseline training layer where mangrove categories between 2016 and 2022 were analysed.

Usage examples

Digital Earth Pacific Open Data Access Documentation by digitalearthpacific
Digital Earth Pacific Map by Digital Earth Pacific Contributors

See 2 usage examples →

Digital Earth Pacific Water Observations from Space (WOfS)

earth observation, environmental, geoscience, geospatial, water

The Water Observations from Space (WOfS) beta version product is an annual summary of the temporal and spatial extent of surface water over landscapes. In essence, it highlights where water is usually present and where it is rarely present. The results are visualised to compare points in time spanning a year, a season or multiple years. The dataset extends back to 2013.

Usage examples

Digital Earth Pacific Open Data Access Documentation by digitalearthpacific
Digital Earth Pacific Map by Digital Earth Pacific Contributors

See 2 usage examples →

Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2010 Census Proof of Concept)

age, approximate monte carlo, approximate monte carlo replicates, census, demographic and housing characteristics file, dhc, differential privacy, disclosure avoidance, ethnicity, group quarters, hispanic, household type, housing, housing units, latino, microdata, noisy measurements, population, race, redistricting, relation-to-householder, single year of age, voting age

The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2, ..., PPMF25) are a set of microdata files intended for use in estimating the magnitude of error(s) introduced by the 2020 Decennial Census Disclosure Avoidance System (DAS) into the Redistricting and DHC products. The PPMF0 was created by executing the 2020 DAS TopDown Algorithm (TDA) using the confidential 2010 Census Edited File (CEF) as the initial input; the replicates were then created by executing the 2020 DAS TDA repeatedly with the PPMF0 as its initial input. Inspired by analogy to the use of bootstrap methods in non-private conte...
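The idea of using replicates to gauge error can be sketched in a few lines. The example below is a simplified illustration, not the authors' published procedure: it assumes that the spread of a statistic across the replicates, measured around the PPMF0 value, approximates the error of the PPMF0 value around the confidential value, and it builds a normal-approximation interval from that. The function name, the counts, and the 90% z-value are all assumptions made for illustration.

```python
import math


def amc_interval(published: float, replicates: list, z: float = 1.645):
    """Return (low, high, rmse) for a statistic tabulated from the seed file.

    published  -- statistic computed from the seed file (PPMF0)
    replicates -- the same statistic computed from each replicate file
    z          -- normal quantile; 1.645 gives a nominal 90% interval
    """
    if not replicates:
        raise ValueError("at least one replicate value is required")
    # Mean squared deviation of the replicates from the seed value.
    mse = sum((r - published) ** 2 for r in replicates) / len(replicates)
    rmse = math.sqrt(mse)
    return published - z * rmse, published + z * rmse, rmse


# Hypothetical counts for one geography and one table cell.
low, high, rmse = amc_interval(100.0, [97.0, 103.0, 96.0, 104.0])
print(round(low, 2), round(high, 2), round(rmse, 3))
```

The paper and notebook listed under the usage examples describe the actual estimator, which may include adjustments that this sketch omits.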

Usage examples

Estimating Confidence Intervals Using Approximate Monte Carlo Simulation Iterations (Jupyter Notebook) by Ashmead, R., Hawes, M. B., Pritts, M., Zhuravlev, P., Keller, S. A.
An Approximate Monte Carlo Simulation Method for Estimating Uncertainty and Constructing Confidence Intervals for 2020 Census Statistics by Ashmead, R., Hawes, M. B., Pritts, M., Zhuravlev, P., Keller, S. A.

See 2 usage examples →

Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2020 Census Production Run)

2020 census, age, approximate monte carlo, approximate monte carlo replicates, census, decennial census, demographic and housing characteristics file, dhc, differential privacy, disclosure avoidance, ethnicity, group quarters, hispanic, household type, housing, housing units, latino, microdata, noisy measurements, population, race, redistricting, relation-to-householder, single year of age, voting age

The 2020 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2, ..., PPMF50) are a set of microdata files intended for use in estimating the magnitude of error(s) introduced by the 2020 Census Disclosure Avoidance System (DAS) into the 2020 Census Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile.

The PPMF0 was the source of the publicly released, official 2020 Census data products referenced above, and was cr...

Usage examples

An Approximate Monte Carlo Simulation Method for Estimating Uncertainty and Constructing Confidence Intervals for 2020 Census Statistics by Ashmead, R., Hawes, M. B., Pritts, M., Zhuravlev, P., Keller, S. A.
Estimating Confidence Intervals Using Approximate Monte Carlo Simulation Iterations (Jupyter Notebook) by Ashmead, R., Hawes, M. B., Pritts, M., Zhuravlev, P., Keller, S. A.

See 2 usage examples →

GenomeKit genomic data

bioinformatics, genome, genomic, Homo sapiens, life sciences, Mus musculus, non-human primate, open source software, Rattus norvegicus, variant annotation

GenomeKit is Deep Genomics’ Python library for fast and easy access to genomic resources such as sequence, data tracks, and annotations. The goal is to let machine learning researchers build data sets easily, and to be creative about how those data sets are designed. Out of the box, GenomeKit provides access to pre-built optimized genomic data files that are required for its operation.

Usage examples

See 2 usage examples →

IWMI DIWASA Green ET for Africa

evapotranspiration, interception loss, rainfed cropland, soil moisture, water

Green evapotranspiration (Green ET) is the portion of ET derived from green water, which includes soil moisture and rainfall used by vegetation. It represents a key component of green water fluxes in water accounting. Green ET consists of evaporation from soil moisture in non-irrigated areas, transpiration from rainfed crops and natural vegetation, and interception losses from precipitation on vegetation. It plays a crucial role in rainfed agriculture, drought monitoring, and sustainable water management by tracking how rainfall supports plant growth.

Usage examples

See 2 usage examples →

Indian High Court Judgments

legal data

This dataset contains judgments from the Indian High Courts, downloaded from the ecourts website. It contains judgments of 25 high courts, along with raw metadata (in JSON format) and structured metadata (in Parquet format). Judgments from the website are further compressed to optimize for size (care has been taken to avoid any loss of data in either content or visual appearance). Tar files are also made available in addition to the individual PDF files to make bulk download easier.

Usage examples

See 2 usage examples →

Landsat Geometric Median and Absolute Deviations (GeoMAD) over the Pacific.

earth observation, geoscience, geospatial

The GeoMAD is derived from Landsat surface reflectance data. The data are masked for cloud, shadows and other image artefacts using the associated pixel quality product to help provide as clear a set of observations as possible from which to calculate the medians.
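A geometric median treats each clear observation of a pixel as a point in band space and finds the point that minimises the sum of distances to all of them, which makes it less sensitive to residual outliers than a per-band mean. The sketch below uses Weiszfeld's iteration, one standard way to compute it; the product's own implementation is not described here, and the reflectance values are invented.

```python
import math


def geometric_median(points, iterations=200, tol=1e-9, eps=1e-12):
    """Approximate the geometric median of equal-length vectors (Weiszfeld)."""
    dim = len(points[0])
    # Start from the per-band mean.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        weight_sum = 0.0
        numerator = [0.0] * dim
        for p in points:
            # Clamp the distance so a point sitting on the estimate
            # gets a large finite weight instead of dividing by zero.
            weight = 1.0 / max(math.dist(p, estimate), eps)
            weight_sum += weight
            for d in range(dim):
                numerator[d] += weight * p[d]
        updated = [n / weight_sum for n in numerator]
        converged = math.dist(updated, estimate) < tol
        estimate = updated
        if converged:
            break
    return estimate


# Hypothetical two-band reflectances for one pixel: three clear
# observations and one unmasked bright outlier.
observations = [(0.1, 0.2), (0.1, 0.2), (0.1, 0.2), (0.9, 0.9)]
median = geometric_median(observations)
print([round(x, 4) for x in median])
```

In this example the per-band mean is pulled toward the outlier, while the geometric median stays with the three consistent observations.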

Usage examples

Digital Earth Pacific Map by Digital Earth Pacific Contributors
Digital Earth Pacific Open Data Access Documentation by digitalearthpacific

See 2 usage examples →

Marine Animal - Satellite Relay Tagging - Quality controlled profiles

biology, chemical biology, chemistry, marine mammals, oceans

CTD (Conductivity-Temperature-Depth)-Satellite Relay Data Loggers (CTD-SRDLs) are used to explore how marine animal behaviour relates to their oceanic environment. Loggers developed at the University of St Andrews Sea Mammal Research Unit transmit data in near real-time via the Argo satellite system. Data represented here was collected in the Southern Ocean, from elephant, fur and Weddell seals. In 2024 data was added from flatback and olive ridley turtles, from a pilot study co-funded by the Royal Australian Navy in collaboration with the Australian Institute of Marine Science and Indigenous ...

Usage examples

See 2 usage examples →

Moorings - Hourly time-series product

chemistry, ocean velocity, oceans

The Integrated Marine Observing System (IMOS) has moorings across both its National Mooring Network and Deep Water Moorings facilities. The National Mooring Network facility comprises a series of national reference stations and regional moorings designed to monitor particular oceanographic phenomena in Australian coastal ocean waters. The Deep Water Moorings facility (formerly known as the Australian Bluewater Observing System) provides the coordination of national efforts in the sustained observation of open ocean properties with particular emphasis on observations important to climate and ...

Usage examples

See 2 usage examples →

NASA SOTERIA Simulation Testbed Data

life sciences, neuroimaging, transportation, workload analysis

Commercial pilot simulation data during safety-of-flight scenarios.

Usage examples

Python Processing Code by Tyler Fettrow
SOTERIA Simulation - Experimental Methods, Data Processing, and Data Quality by Tyler Fettrow, Chad Stephens, Lance Prinzel, Jon Holbrook, Sepher Bastami, Michael Stewart, Kathryn Ballard, Daniel Kiggins

See 2 usage examples →

National Mooring Network - CTD profiles

chemistry, oceans

This collection includes conductivity-temperature-depth (CTD) profiles obtained at the National Reference Stations (NRS) as part of the water sampling program. The instruments used also measure dissolved oxygen, fluorescence, and turbidity. The collection also includes practical salinity, water density and artificial chlorophyll concentration, as computed from the measured properties. The data are processed in delayed mode, with automated quality control applied. The National Reference Station network is designed to provide baseline information, at timescales relevant to human response, that i...

Usage examples

See 2 usage examples →

Ocean Gliders - Delayed mode

chemistry, ocean currents, ocean velocity, oceans

The Australian National Facility for Ocean Gliders (ANFOG), with IMOS/NCRIS funding, deploys a fleet of eight gliders around Australia. The data represented by this record are presented in delayed mode. The underwater ocean glider represents a technological revolution for oceanography. Autonomous ocean gliders can be built relatively cheaply, are controlled remotely and are reusable, allowing them to make repeated subsurface ocean observations at a fraction of the cost of conventional methods. The data retrieved from the glider fleet will contribute to the study of the major boundary current s...

Usage examples

See 2 usage examples →

Ocean Radar - Bonney coast site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Bonney Coast (BONC) HF ocean radar system covers an area of the Bonney Coast, South Australia, which has a recurring annual upwelling feature near to the coast that significantly changes the ecosystem from one of warm water originating in Western Australia, to one dominated by cold upwelling water from off the continental shelf. The dynamics of this area and the relationship between ocean circulation, chemistry and sediments control the larval species and the higher marine species and ecosystems in which they forage. The data from this site provide linking observations between the Southe...

Usage examples

See 2 usage examples →

Ocean Radar - Capricorn bunker group site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the continental shelf. This is an area where the East Australian Current (EAC) meanders as it moves south from the Swain Reefs and loses touch with the western land boundary. The area is dynamic with warm EAC water recirculating and being wind-driven northwards along the coast inside the GBR lagoon. The recirculating warm water contrasts with the upwelling tendency of the parts of the EAC which contin...

Usage examples

See 2 usage examples →

Ocean Radar - Capricorn bunker group site - Wave - Delayed mode

ocean currents, oceans

The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the continental shelf. This is an area where the East Australian Current (EAC) meanders as it moves south from the Swain Reefs and loses touch with the western land boundary. The area is dynamic with warm EAC water recirculating and being wind-driven northwards along the coast inside the GBR lagoon. The recirculating warm water contrasts with the upwelling tendency of the parts of the EAC which contin...

Usage examples

See 2 usage examples →

Ocean Radar - Capricorn bunker group site - Wind - Delayed mode

meteorological, ocean currents, oceans

The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the continental shelf. This is an area where the East Australian Current (EAC) meanders as it moves south from the Swain Reefs and loses touch with the western land boundary. The area is dynamic with warm EAC water recirculating and being wind-driven northwards along the coast inside the GBR lagoon. The recirculating warm water contrasts with the upwelling tendency of the parts of the EAC which contin...

Usage examples

See 2 usage examples →

Ocean Radar - Coffs Harbour site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to the north it is forming from the westwards subtropical jet, and to the south it forms eddies and eventually the warm water moves eastwards across the Tasman Sea, forming a front with the cold water of the Southern Ocean. The connection between coastal and continental shelf waters is fundamental to the understanding of the anthropogenic impact on the coastal ocean and the role of the ocean in mitig...

Usage examples

See 2 usage examples →

Ocean Radar - Coffs Harbour site - Wave - Delayed mode

ocean currents, oceans

The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to the north it is forming from the westwards subtropical jet, and to the south it forms eddies and eventually the warm water moves eastwards across the Tasman Sea, forming a front with the cold water of the Southern Ocean. The connection between coastal and continental shelf waters is fundamental to the understanding of the anthropogenic impact on the coastal ocean and the role of the ocean in mitig...

Usage examples

See 2 usage examples →

Ocean Radar - Coffs Harbour site - Wind - Delayed mode

meteorological, ocean currents, oceans

The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to the north it is forming from the westwards subtropical jet, and to the south it forms eddies and eventually the warm water moves eastwards across the Tasman Sea, forming a front with the cold water of the Southern Ocean. The connection between coastal and continental shelf waters is fundamental to the understanding of the anthropogenic impact on the coastal ocean and the role of the ocean in mitig...

Usage examples

See 2 usage examples →

Ocean Radar - Coral coast site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Coral Coast (CORL) HF ocean radar system covers an area of the Western Australian coast subject to the variability of the Leeuwin Current (LC) and its coupling with coastal winds, tides, and waves. In this area the LC generates several eddies which control the larval species and the higher marine species and ecosystems in which they forage. The CORL HF ocean radar system consists of two SeaSonde crossed loop direction finding stations located at Dongara (29.283 S, 114.920 E) and Green Head (30.073 S, 114.967 E). These radars operate at a frequency of 4.463 MHz, with...

Usage examples

See 2 usage examples →

Ocean Radar - Northwest shelf site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Northwest Shelf (NWA) HF ocean radar system covers an area which includes the Ningaloo Peninsula and the Ningaloo Reef to the west. The Ningaloo Reef is one of the longest and most pristine reefs in the world. The reef is rich in marine biodiversity, with whale sharks, turtles and fish aggregations, and high primary and secondary productions which are controlled by the physical oceanographic processes. The NWA HF ocean radar is a WERA phased array system with 12-element receive arrays located at the Jurabi Turtle Centre (21.8068 S, 114.1015 E) and Point Billie (22.5432 S, 113.690 E). Th...

Usage examples

See 2 usage examples →

Ocean Radar - Rottnest shelf site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region with whale and fish aggregations, and high primary and secondary productions which are controlled by the physical oceanographic processes. Combined with the dynamics of the Perth Canyon is the dominant Leeuwin Current which produces a wake on the leeward side of Rottnest Island. This is a topographically induced up-welling and associated primary and secondary productivity. The region is influ...

Usage examples

See 2 usage examples →

Ocean Radar - Rottnest shelf site - Wave - Delayed mode

ocean currents, oceans

The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region with whale and fish aggregations, and high primary and secondary productions which are controlled by the physical oceanographic processes. Combined with the dynamics of the Perth Canyon is the dominant Leeuwin Current which produces a wake on the leeward side of Rottnest Island. This is a topographically induced up-welling and associated primary and secondary productivity. The region is influ...

Usage examples

See 2 usage examples →

Ocean Radar - Rottnest shelf site - Wind - Delayed mode

meteorological, ocean currents, oceans

The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region with whale and fish aggregations, and high primary and secondary productions which are controlled by the physical oceanographic processes. Combined with the dynamics of the Perth Canyon is the dominant Leeuwin Current which produces a wake on the leeward side of Rottnest Island. This is a topographically induced up-welling and associated primary and secondary productivity. The region is influ...

Usage examples

See 2 usage examples →

Ocean Radar - South Australian gulfs site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic region where warm water from the remnants of the Leeuwin current is moving from the west, and water with varying density is exchanging with Spencer Gulf and the Gulf of St Vincent. Upwelling events occur from the deep ocean on the south side of the observation area. This is a key ocean area for aquaculture and fishing, and is a major shipping thoroughfare. The data from this HF ocean radar syste...

Usage examples

See 2 usage examples →

Ocean Radar - South Australian gulfs site - Wave - Delayed mode

ocean currents, oceans

The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic region where warm water from the remnants of the Leeuwin current is moving from the west, and water with varying density is exchanging with Spencer Gulf and the Gulf of St Vincent. Upwelling events occur from the deep ocean on the south side of the observation area. This is a key ocean area for aquaculture and fishing, and is a major shipping thoroughfare. The data from this HF ocean radar syste...

Usage examples

See 2 usage examples →

Ocean Radar - South Australian gulfs site - Wind - Delayed mode

meteorological, ocean currents, oceans

The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic region where warm water from the remnants of the Leeuwin current is moving from the west, and water with varying density is exchanging with Spencer Gulf and the Gulf of St Vincent. Upwelling events occur from the deep ocean on the south side of the observation area. This is a key ocean area for aquaculture and fishing, and is a major shipping thoroughfare. The data from this HF ocean radar syste...

Usage examples

See 2 usage examples →

Ocean Radar - Turquoise coast site - Sea water velocity - Delayed mode

ocean currents, ocean velocity, oceans

The Turquoise Coast (TURQ) HF ocean radar system covers the area of shelf between Seabird and Jurien Bay and is the logical continuation of major research efforts to understand the role of the Leeuwin Current System (Leeuwin Current, the Leeuwin Undercurrent and Capes Current) in controlling not only the physical system but also its links to both pelagic and benthic ecosystems. In contrast to eastern ocean basins, which are highly productive, Western Australia experiences an oligotrophic environment. The Leeuwin Current is a shallow (<300 m deep), narrow band (<100 km wide) of warm, lo...

Usage examples

See 2 usage examples →

OceanCurrent - Gridded sea level anomaly - Near real time

ocean sea surface height, ocean velocity, oceans

Gridded (adjusted) sea level anomaly (GSLA), gridded sea level (GSL) and surface geostrophic velocity (UCUR, VCUR) for the Australasian region. GSLA is mapped using optimal interpolation of detided, de-meaned, inverse-barometer-adjusted altimeter and tide gauge estimates of sea level. GSL is GSLA plus an estimate of the departure of mean sea level from the geoid – mean sea level (over 18 years of model time) of Ocean Forecasting Australia Model version 3 (OFAM3). The geostrophic velocities are derived from GSLA and the mean surface velocity from OFAM3. The altimeter data window for input to the ...
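The link between a sea level anomaly field and surface geostrophic velocity is the standard geostrophic balance, u = -(g/f) ∂η/∂y and v = (g/f) ∂η/∂x, with f = 2Ω sin(latitude). The sketch below applies it with centred differences on a small regular grid. It assumes a single latitude for the whole grid and computes only the anomaly part; it does not add the OFAM3 mean velocity mentioned above and is not the product's processing code. The function name and the sample grid are hypothetical.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
OMEGA = 7.2921e-5   # Earth's rotation rate, rad/s


def geostrophic_velocity(eta, lat_deg, dx, dy):
    """Return (u, v) in m/s at interior grid points; edges are left as None.

    eta     -- sea level anomaly in metres, eta[j][i], j northward, i eastward
    lat_deg -- one representative latitude for the grid (an assumption)
    dx, dy  -- grid spacing in metres
    """
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))
    if abs(f) < 1e-6:
        raise ValueError("geostrophic balance breaks down near the equator")
    ny, nx = len(eta), len(eta[0])
    u = [[None] * nx for _ in range(ny)]
    v = [[None] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            deta_dx = (eta[j][i + 1] - eta[j][i - 1]) / (2.0 * dx)
            deta_dy = (eta[j + 1][i] - eta[j - 1][i]) / (2.0 * dy)
            u[j][i] = -(G / f) * deta_dy
            v[j][i] = (G / f) * deta_dx
    return u, v


# Hypothetical 3x3 grid at 30 S with sea level rising 1 cm per 10 km eastward.
dx = dy = 10_000.0
eta = [[i * dx * 1e-6 for i in range(3)] for _ in range(3)]
u, v = geostrophic_velocity(eta, lat_deg=-30.0, dx=dx, dy=dy)
print(u[1][1], v[1][1])
```

Because f is negative in the Southern Hemisphere, an eastward-rising sea surface gives a southward (negative v) flow in this example.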

Usage examples

See 2 usage examples →

Pacific Coastlines Change

coastal, earth observation, environmental, geoscience, geospatial

The Pacific Coastlines beta product includes coastline change detection since the year 2000 for Pacific Island Countries and Territories (PICTs). The product will provide ongoing monitoring of coastline change, giving insight into processes including erosion (where landmass area decreases) and accretion or deposition (where landmass area increases).

Usage examples

Digital Earth Pacific Map by Digital Earth Pacific Contributors
Digital Earth Pacific Open Data Access Documentation by digitalearthpacific

See 2 usage examples →

Platinum Pedigree

bioinformatics, genomic, genotyping, Homo sapiens, life sciences, long read sequencing, whole genome sequencing

The Platinum Pedigree Consortium (PCC) is a collaborative project to create a comprehensive reference for human genetic variation using a four-generation, 28-member family (CEPH-1463). We employed five different short and long-read sequencing technologies to generate phased assemblies and characterize both inherited and de novo variation, including at some of the most difficult to genotype genomic regions such as tandem repeats, centromeres, and the Y chromosome. This extensive "truth set" is publicly available and can be used to test and benchmark new algorithms and technologies to ...

Usage examples

See 2 usage examples →

SSL4EO S12 Landsat Multi Product Dataset

satellite imagery

This dataset combines SSL4EO-S12 and SSL4EO-L to create a multi-view dataset for multi-modal fusion using self-supervised learning for earth observation.

Usage examples

See 2 usage examples →

Satellite - Altimetry calibration and validation

chemistry, ocean currents, oceans

High precision satellite altimeter missions including TOPEX/Poseidon (T/P), Jason-1 and now OSTM/Jason-2, have contributed fundamental advances in our understanding of regional and global ocean circulation and its role in the Earth's climate and regional applications. These altimeter satellites essentially observe the height of the global oceans – as such, they have become the tool of choice for scientists to measure sea level rise – both at regional and global scales as well as giving information about ocean currents and large- and small-scale variability. The determination of changes in ...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (Carder model)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the Carder method implemented in the SeaDAS processing software l2gen and described in Carder K. L., Chen F. R., Lee Z. P., Hawes S. K. and Cannizzaro J. P. (2003), MODIS Ocean Science Team Algorithm Theoretical Basis Docume...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (GSM model)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the Garver-Siegel-Maritorena (GSM) method implemented in the SeaDAS processing software l2gen and described in “Chapter 11, and references therein, of IOCCG Report 5, 2006, (http://ioccg.org/wp-content/uploads/2015/10/ioc...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (OC3 model)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OC3 method recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OC3 algorithm is described at http://oceancolor.gsfc.nasa.gov/cms/atbd/chlor_a (and links th...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (OCI model)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OCI method (Hu et al 2012, doi: 10.1029/2011jc007395) recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OCI algorithm is described at https://oceancolor....

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (GSM model)

biology, oceans, satellite imagery

The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the Garver-Siegel-Maritorena (GSM) method implemented in the SeaDAS processing software l2gen and described in “Chapter 11, and references therein, of IOCCG Report 5, 2006, (http://ioccg.org/wp-content/uploads/2015/10/ioc...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (OC3 model)

biology, oceans, satellite imagery

The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OC3 method recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OC3 algorithm is described at http://oceancolor.gsfc.nasa.gov/cms/atbd/chlor_a (and links ...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (OCI model)

biology, oceans, satellite imagery

The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OCI method (Hu et al 2012, doi: 10.1029/2011jc007395) recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OCI algorithm is described at https://oceancolo...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (GSM model)

biology, oceans, satellite imagery

The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the Garver-Siegel-Maritorena (GSM) method implemented in the SeaDAS processing software l2gen and described in “Chapter 11, and references therein, of IOCCG Report 5, 2006, (http://ioccg.org/wp-content/uploads/2015/10/ioccg...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (OC3 model)

biology, oceans, satellite imagery

The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OC3 method recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OC3 algorithm is described at http://oceancolor.gsfc.nasa.gov/cms/atbd/chlor_a (and links th...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (OCI model)

biology, oceans, satellite imagery

The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. There are multiple retrieval algorithms for estimating Chl-a. These data use the OCI method (Hu et al 2012, doi: 10.1029/2011jc007395) recommended by the NASA Ocean Biology Processing Group and implemented in the SeaDAS processing software l2gen. The OCI algorithm is described at https://oceancolor....

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Diffuse attenuation coefficient (k490)

oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the diffuse attenuation coefficient (Kd) at 490nm wavelength which provides information on how light is attenuated in the water column. It is defined as the scaling length of the exponential decrease of the downwelling irradiance and has units (m^-1). The MODIS K490 product estimates Kd at 490nm wavelength, using a semi-empirical model based on the ratio of water leaving radiances at 490nm and 555nm....
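
As a rough illustration of how a Kd value can be read, the sketch below assumes a vertically uniform water column in which downwelling irradiance decays as E(z) = E(0) * exp(-Kd * z). The function names and the example Kd value are hypothetical and are not part of the product.

```python
import math


def irradiance_fraction(kd_490: float, depth_m: float) -> float:
    """Fraction of surface downwelling irradiance at 490 nm remaining at depth_m.

    Assumes Kd (m^-1) is constant with depth, which is a simplification.
    """
    return math.exp(-kd_490 * depth_m)


def e_folding_depth(kd_490: float) -> float:
    """Depth (m) at which irradiance falls to 1/e of its surface value."""
    return 1.0 / kd_490


# Hypothetical value, for illustration only.
kd = 0.05  # m^-1
fraction_at_20m = irradiance_fraction(kd, 20.0)  # equals exp(-1) for this Kd
```

Under this assumption, clearer water (smaller Kd) gives a deeper e-folding depth.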

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Nanoplankton fraction (OC3 model and Brewin et al 2012 algorithm)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. An empirical relationship is then used to compute an estimate of the relative abundance of three phytoplankton size classes (micro, nano and picoplankton). The methods used to decompose chl_oc3 are described by Brewin et al in two papers in 2010 and 2012. The two methods, denoted Brewin2010at and Br...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Net Primary Productivity (GSM model and Eppley-VGPM algorithm)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. Modelling is then used to compute an estimate of the Net Primary Productivity (NPP). The model used is based on the standard vertically generalised production model (VGPM). The VGPM is a "chlorophyll-based" model that estimates net primary production from chlorophyll using a temperature-de...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Net Primary Productivity (OC3 model and Eppley-VGPM algorithm)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. Modelling is then used to compute an estimate of the Net Primary Productivity (NPP). The model used is based on the standard vertically generalised production model (VGPM). The VGPM is a "chlorophyll-based" model that estimates net primary production from chlorophyll using a temperature-de...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Optical Water Type (Moore et al 2009 algorithm)

oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These measurements at discrete wavelengths represent the spectrum of light leaving the water surface, and the shape of the spectrum is characteristic of the water optical properties. Moore et al. (2009) applied a clustering technique to spectra to identify 8 sets of discrete optical water types. This product "owt_csiro" is produced using a CSIRO implementation of the Moore et al. algorithm, and testing shows that it closely reproduces the res...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - MODIS - 1 day - Picoplankton fraction (OC3 model and Brewin et al 2012 algorithm)

biology, oceans, satellite imagery

The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the concentration of chlorophyll-a (Chl-a), most typically due to phytoplankton, present in the water. An empirical relationship is then used to compute an estimate of the relative abundance of three phytoplankton size classes (micro, nano and picoplankton). The methods used to decompose chl_oc3 are described by Brewin et al in two papers in 2010 and 2012. The two methods, denoted Brewin2010at and Br...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - NOAA20 - 1 day - Diffuse attenuation coefficient (k490)

biology, oceans, satellite imagery

The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the diffuse attenuation coefficient (Kd) at 490nm wavelength which provides information on how light is attenuated in the water column. It is defined as the scaling length of the exponential decrease of the downwelling irradiance, and has units (m^-1). The VIIRS K490 product estimates Kd at 490nm wavelength, using a semi-empirical model based on the ratio of water leaving radiances at 490nm and 555...

Usage examples

See 2 usage examples →

Satellite - Ocean Colour - SNPP - 1 day - Diffuse attenuation coefficient (k490)

biology, oceans, satellite imagery

The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the diffuse attenuation coefficient (Kd) at 490nm wavelength which provides information on how light is attenuated in the water column. It is defined as the scaling length of the exponential decrease of the downwelling irradiance, and has units (m^-1). The VIIRS K490 product estimates Kd at 490nm wavelength, using a semi-empirical model based on the ratio of water leaving radiances at 490nm and 555nm...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Multi sensor - 1 day - Day and night time

oceans, satellite imagery

This is a multi-sensor SSTfnd L3S product for a single 24 hour period, derived using sea surface temperature retrievals from the VIIRS sensor on the Suomi-NPP satellite and JPSS series of satellites, and AVHRR sensor on the NOAA and Metop series of Polar-orbiting satellites. The sensors and satellite platforms contributing to each file are listed in the sensor and platform global attributes in the file header. The SSTfnd is derived by adding a constant 0.17 degC to the NOAA AVHRR SSTskin observations, and 0 degC to the Metop and VIIRS SSTsubskin observations, after rejecting observations with ...
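
The constant-offset step described above can be sketched as follows. This is a minimal illustration, not the producer's code: the dictionary keys and the function name are hypothetical labels, and the observation-rejection step that precedes the offset is not reproduced because its criteria are not given here.

```python
# Offsets in degC taken from the description above:
# 0.17 for NOAA AVHRR SSTskin, 0 for Metop and VIIRS SSTsubskin.
# The keys are hypothetical labels, not attribute values read from the files.
FOUNDATION_OFFSET_DEGC = {
    "noaa_avhrr_skin": 0.17,
    "metop_avhrr_subskin": 0.0,
    "viirs_subskin": 0.0,
}


def to_sstfnd(sst_degc: float, source: str) -> float:
    """Convert one retrieval to a foundation SST estimate by adding a constant offset."""
    return sst_degc + FOUNDATION_OFFSET_DEGC[source]
```

For instance, a NOAA AVHRR skin retrieval of 20.00 degC maps to about 20.17 degC, while a VIIRS subskin retrieval is passed through unchanged.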

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Multi sensor - 3 day - Day and night time

oceans, satellite imagery

This is a multi-sensor SSTfnd L3S product for a single 72 hour period, derived using sea surface temperature retrievals from the VIIRS sensor on the Suomi-NPP satellite and JPSS series of satellites, and AVHRR sensor on the NOAA and Metop series of Polar-orbiting satellites. The sensors and satellite platforms contributing to each file are listed in the sensor and platform global attributes in the file header. The SSTfnd is derived by adding a constant 0.17 degC to the NOAA AVHRR SSTskin observations, and 0 degC to the Metop and VIIRS SSTsubskin observations, after rejecting observations with ...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Single sensor - 1 day - Day and night time

oceans, satellite imagery

This is a single-sensor multi-satellite SSTfnd product for a single 24 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.02deg x 0.02deg cylindrical equidistant projected map over the region 70°E to 170°W, 20°N to 70°S. Each grid cell contains the 24 hour average of all the highest available quality SSTs that overlap with that cell, weighted by the area of overlap. The diagram at https://help.aodn.org.au/satellite-data-product-information/ indicates where this product fits within the GHRSST suite of NOAA/AVHRR ...
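
The per-cell averaging described above amounts to an area-weighted mean. The sketch below shows that calculation for a single grid cell. The function name and inputs are hypothetical, and selecting the "highest available quality" observations is assumed to have happened beforehand.

```python
from typing import Sequence


def area_weighted_mean(ssts: Sequence[float], overlap_areas: Sequence[float]) -> float:
    """Mean of SST values weighted by the area each observation overlaps the grid cell."""
    if len(ssts) != len(overlap_areas):
        raise ValueError("ssts and overlap_areas must have the same length")
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    weighted_sum = sum(sst * area for sst, area in zip(ssts, overlap_areas))
    return weighted_sum / total_area
```

With two observations of 20.0 and 22.0 degC overlapping the cell by 1 and 3 area units, the result is 21.5 degC: the observation covering more of the cell counts for more.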

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Single sensor - 1 day - Day and night time - Southern Ocean

oceans, satellite imagery

This is a single-sensor SSTfnd product for a single 24 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.02deg x 0.02deg cylindrical equidistant projected map over the Southern Ocean region 3°E to 158°W, 27°S to 78°S. Each grid cell contains the 24 hour average of all the highest available quality SSTs that overlap with that cell, weighted by the area of overlap. The diagram at https://help.aodn.org.au/satellite-data-product-information/ indicates where this product fits within the GHRSST suite of NOAA/AVHRR pro...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Single sensor - 1 month - Day time

oceans, satellite imagery

This is a single-sensor multi-satellite SSTskin product for 1 month of consecutive day-time periods, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.02deg x 0.02deg cylindrical equidistant projected map over the region 70°E to 170°W, 20°N to 70°S. Each grid cell contains the 1 month average of all the highest available quality SSTs that overlap with that cell, weighted by the area of overlap. The diagram at https://help.aodn.org.au/satellite-data-product-information/ indicates where this product fits within the GHRSST suite...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Single sensor - 6 day - Day and night time

oceans, satellite imagery

This is a single-sensor multi-satellite SSTfnd product for a 144 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.02deg x 0.02deg cylindrical equidistant projected map over the region 70°E to 170°W, 20°N to 70°S. Each grid cell contains the 144 hour average of all the highest available quality SSTs that overlap with that cell, weighted by the area of overlap. The diagram at https://help.aodn.org.au/satellite-data-product-information/ indicates where this product fits within the GHRSST suite of NOAA/AVHRR products. The SSTfnd is derived by adding a constant 0.17 de...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 3 - Single sensor - Himawari-8 - 1 day - Night time

oceans, satellite imagery

This is a regional GHRSST level 3 collated (L3C) dataset on a 0.02-degree rectangular grid over the Australasian domain (70E to 190E, 70S to 20N) based on retrievals from the AHI imager on board the Himawari-8 satellite. The Bureau of Meteorology (Bureau) produces Integrated Marine Observing System (IMOS) satellite SST products in the International Group for High Resolution SST (GHRSST) GDS2 file formats for Himawari-8 in real time and delayed mode. This product is composed of reprocessed multi-swath SSTskin retrievals obtained from compositing IMOS Himawari-8 hourly L3C files over the night (before...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 4 - Multi sensor - Global Australian

oceans, satellite imagery

An International Group for High-Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis, produced daily on an operational basis at the Australian Bureau of Meteorology using optimal interpolation (OI) on a global 0.25 degree grid. This Global Australian Multi-Sensor SST Analysis (GAMSSA) v1.0 system blends infra-red SST observations from the Advanced Very High Resolution Radiometer (AVHRR) on NOAA and METOP polar-orbiting satellites, microwave SST observations from the Advanced Microwave Scanning Radiometer-2 (AMSR-2) on GCOM-W, and in situ data from ships, and dr...

Usage examples

See 2 usage examples →

Satellite - Sea surface temperature - Level 4 - Multi sensor - Regional Australian

oceans, satellite imagery

An International Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis, produced daily on an operational basis at the Australian Bureau of Meteorology using optimal interpolation (OI) on a regional 1/12 degree grid over the Australian region (20N - 70S, 60E - 170W). This Regional Australian Multi-Sensor SST Analysis (RAMSSA) v1.0 system blends infra-red SST observations from the Advanced Very High Resolution Radiometer (AVHRR) on NOAA and METOP polar-orbiting satellites, microwave SST observations from the Advanced Microwave Scanning Radiometer-2 (...

Usage examples

See 2 usage examples →

Sentinel-1 Mean and Median Annual Mosaic

climate, earth observation, environmental, geoscience, geospatial

Sentinel-1 carries a Synthetic Aperture Radar (SAR) that operates in the C-band. This platform offers SAR data day and night and in all weather conditions.

Usage examples

Digital Earth Pacific Open Data Access Documentation by digitalearthpacific
Digital Earth Pacific Map by Digital Earth Pacific Contributors

See 2 usage examples →

Sentinel-1 Precise Orbit Determination (POD) Products

auxiliary data, disaster response, earth observation, earthquakes, floods, geophysics, sentinel-1, synthetic aperture radar

Sentinel-1 Precise Orbit Determination (POD) products contain auxiliary data on satellite position and velocity for the European Space Agency's (ESA) Sentinel-1 mission. Sentinel-1 is a C-band Synthetic Aperture Radar (SAR) satellite constellation first launched in 2014 as part of the European Union's Copernicus Earth Observation programme. POD products are a necessary auxiliary input for nearly all Sentinel-1 data processing workflows.

This dataset is a mirror of the Sentinel-1 Orbits dataset hosted in the Copernicus Data Space Ecosystem (CDSE). New files are added within 20 minutes of their publ...

Usage examples

See 2 usage examples →

Sentinel-2 Geometric Median and Absolute Deviations (GeoMAD) over the Pacific

earth observation, geoscience, geospatial

The Geometric Median and Absolute Deviations (GeoMAD) product is a cloud-free annual mosaic that uses a more robust method of determining the median observation than a simple median. Along with the median observation, the GeoMAD produces three measures of variance, or absolute deviations, which help to show how the data change over the time period. For example, some areas, such as desert, will change very little, whereas crop land will change more. All of these values are useful in understanding what is happening in the area covered by the GeoMAD.
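
The description does not say how the geometric median is computed. One common approach is Weiszfeld's iteration, sketched here for a small list of multi-band observations of a single pixel. This is an illustrative assumption rather than the product's implementation, and the function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def geometric_median(
    points: Sequence[Sequence[float]],
    iterations: int = 100,
    eps: float = 1e-9,
) -> List[float]:
    """Approximate the geometric median of points using Weiszfeld's iteration."""
    dims = len(points[0])
    # Start from the per-band mean of the observations.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    for _ in range(iterations):
        # Weight each observation by the inverse of its distance to the estimate;
        # the eps floor avoids division by zero when the estimate sits on a point.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        new_estimate = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ]
        if math.dist(new_estimate, estimate) < eps:
            return new_estimate
        estimate = new_estimate
    return estimate
```

Unlike a per-band mean, the result is pulled only weakly towards an outlying observation (for example a residual cloud), which is what makes it a robust choice for compositing.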

Usage examples

Digital Earth Pacific Open Data Access Documentation by digitalearthpacific
Digital Earth Pacific Map by Digital Earth Pacific Contributors

See 2 usage examples →

Ships of Opportunity - Air-sea fluxes - Meteorological and flux - Delayed mode

air temperature, atmosphere, meteorological, oceans, radiation

Enhancement of Measurements on Ships of Opportunity (SOOP)-Air Sea Flux sub-facility collects underway meteorological and oceanographic observations during scientific and Antarctic resupply voyages in the oceans adjacent to Australia. Data product is quality controlled bulk air-sea fluxes and input observations. Research Vessel Real Time Air-Sea Fluxes, equips the Marine National Facility (MNF) (Research Vessels Southern Surveyor and Investigator), the Australian Antarctic Division (Research and Supply Vessels Aurora Australis and Nuyina), and Research Vessel Tangaroa with "climate qualit...

Usage examples

See 2 usage examples →

Ships of Opportunity - Air-sea fluxes - Meteorological and sea surface temperature - Real time

air temperature, atmosphere, meteorological, oceans, precipitation, radiation

Enhancement of Measurements on Ships of Opportunity (SOOP)-Air Sea Flux sub-facility collects underway meteorological and oceanographic observations during scientific and Antarctic resupply voyages in the oceans adjacent to Australia. Data product is quality controlled observations. Research Vessel Real Time Air-Sea Fluxes, equips the Marine National Facility (MNF) (Research Vessels Southern Surveyor and Investigator), the Australian Antarctic Division (Research and Supply Vessels Aurora Australis and Nuyina), and Research Vessel Tangaroa with "climate quality" meteorological measure...

Usage examples

See 2 usage examples →

Ships of Opportunity - Biogeochemical sensors - Delayed mode

atmosphere, chemistry, meteorological, oceans

The IMOS Ship of Opportunity Underway CO2 Measurements group is a research and data collection project working within the IMOS Ship of Opportunity Multi-Disciplinary Underway Network sub-facility. The CO2 group samples critical regions of the Southern Ocean and the Australian shelf waters that have a major impact on CO2 uptake by the ocean. These are regions where biogeochemical cycling is predicted to be particularly sensitive to a changing climate. The pCO2 Underway System measures the fugacity of carbon dioxide (fCO2) along with other variables such as sea surface salinity (SSS) and sea surface t...

Usage examples

See 2 usage examples →

Ships of Opportunity - Expendable bathythermographs - Delayed mode

oceans

IMOS Ship of Opportunity Underway Expendable Bathythermographs (XBT) group is a research and data collection project working within the IMOS Ship of Opportunity Multi-Disciplinary Underway Network sub-facility. Five major (HRX) high-resolution XBT lines provide boundary to boundary profiling and closely spaced sampling to resolve mesoscale eddies, fronts and boundary currents. The lines are repeated 4 times per year by an on-board technician. The routes sample each major boundary current system using available commercial vessel traffic. All of the transects transmit data in real-time. The data...

Usage examples

See 2 usage examples →

Ships of Opportunity - Expendable bathythermographs - Real time

oceans

XBT real-time data is available through the IMOS portal. Data is acquired by technicians who ride the ships of opportunity in order to perform high density sampling along well established transit lines. The data acquisition system used is the Quoll developed by Turo Technology. Collected data is stored in NetCDF files, with real-time data messages (JJVV bathy messages) created on the ship and sent to shore by Iridium SBD. This is inserted onto the GTS by our colleagues at the Australian Bureau of Meteorology. The full resolution data is collected from the ship and returned for processing t...

Usage examples

See 2 usage examples →

Ships of Opportunity - Fisheries vessels - Real time

oceans

Fisheries Vessels as Ships of Opportunities (FishSOOP) is an IMOS Sub-Facility working with fishers to collect real-time temperature and depth data by installing equipment on a network of commercial fishing vessels using a range of common fishing gear. Every day, fishing vessels operate broadly across the productive areas of Australia’s Exclusive Economic Zone where we have few subsurface ocean measurements. The FishSOOP Sub-Facility is utilising this observing opportunity to cost-effectively increase the spatial and temporal resolution of subsurface temperature data in Australia’s inshore, she...

Usage examples

See 2 usage examples →

Ships of Opportunity - Sea surface temperature - 1-minute average data products

air temperature, atmosphere, oceans

The Sea Surface Temperature (SST) sub-facility produces 1-minute average data products. Observed data are 1-minute median SST values and are retrieved from the vessel once an hour. High-resolution 1-minute median data are available in delayed mode approximately every 3 months; these were produced using visual (manual) inspection of the quality flags after the data had been run through the automated quality control software at the Bureau of Meteorology. The data products are produced from data from 3 ships: L'Astrolabe, Xutra Buhm and Wana Buhm, within this sub-facility w...

Usage examples

See 2 usage examples →

Ships of Opportunity - Tropical research vessels - Real time

chemistry, oceans

The research vessels (RV Cape Ferguson and RV Solander) of the Australian Institute of Marine Science (AIMS) routinely record along-track (underway) measurements of near-surface water temperature, salinity, chlorophyll (fluorescence) and turbidity (NTU) during scientific operations in the tropical waters of northern Australia, particularly the Great Barrier Reef (GBR). All data records include sampling time (UTC), position (Latitude, Longitude) and water depth (under keel). Data are recorded at 10 second intervals. Data are measured with a Seabird SBE38 thermometer, Seabird SBE21 thermosalinog...

Usage examples

See 2 usage examples →

Wave buoys observations - Real time

oceans

Buoys provide integral wave parameters. Buoy data from the following organisations contribute to the National Wave Archive: Manly Hydraulics Laboratory (part of the NSW Department of Planning and Environment (DPE), which has assumed function of the former NSW Office of Environment and Heritage (OEH)); Bureau of Meteorology; Western Australia Department of Transport (DOT); the Queensland Department of Environment and Science (DES); the Integrated Marine Observing System (IMOS); Gippsland Ports; the NSW Nearshore Wave Data Program from the NSW Department of Planning and Environment (DPE); the Un...

Usage examples

See 2 usage examples →

YouTube 8 Million - Data Lakehouse Ready

amazon.science, computer vision, labeled, machine learning, parquet, video

This dataset provides both the original .tfrecords and a Parquet representation of the YouTube 8 Million dataset. YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This dataset also includes the YouTube-8M Segments data from June 2019. This dataset is 'Lakehouse Ready', meaning you can query this data in-place straight out of...

Usage examples

YouTube 8 Million by Google Research
Data Lake as Code Deployment Guide by AWS Industry Blueprints Team

See 2 usage examples →

1KG-ONT-VIENNA panel

fast5fastqgeneticgenomiclife scienceswhole genome sequencing

The 1KG-ONT-VIENNA panel comprises medium coverage ONT sequencing data for 1,019 samples from the 1000 Genomes Project collection, structural variants, and their haplotype context.

Usage examples

Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project by Siegfried Schloissnig, Samarendra Pani, Bernardo Rodriguez-Martin, Jana Ebler, Carsten Hain, Vasiliki Tsapalou, Arda Söylev, Patrick Hüther, Hufsah Ashraf, Timofey Prodanov, Mila Asparuhova, Sarah Hunt, Tobias Rausch, Tobias Marschall, Jan O Korbel

See 1 usage example →

AWS iGenomes

agricultureamazon.sciencebiologyCaenorhabditis elegansDanio reriogeneticgenomicHomo sapienslife sciencesMus musculusRattus norvegicusreference index

Common reference genomes hosted on AWS S3. Can be used when aligning and analysing raw DNA sequencing data.

Usage examples

nf-core analysis pipelines by Phil Ewels

See 1 usage example →

AllTheBacteria

assemblybacteriabioinformaticsfastagenomiclife sciencesmicrobial genomicsshort read sequencingwhole genome sequencing

All bacterial isolate whole-genome sequencing data from INSDC, uniformly assembled, quality-controlled, annotated, and searchable.

Usage examples

AllTheBacteria - all bacterial genomes assembled, available and searchable by Hunt M, Lima L, Anderson D, Hawkey J, Shen W, Lees J, Iqbal I

See 1 usage example →

Amazon-PQA

amazon.sciencemachine learningnatural language processing

Amazon product questions and their answers, along with the public product information.

Usage examples

Answering Product-Questions by Utilizing Questions from Other Contextually Similar Products by Ohad Rozen, David Carmel, Avihai Mejer, Vitaly Mirkis, and Yftah Ziser

See 1 usage example →

Answer Reformulation

amazon.sciencemachine learningnatural language processing

Original StackExchange answers and their voice-friendly Reformulation.

Usage examples

Voice-based Reformulation of Community Answers by Simone Filice, Nachshon Cohen & David Carmel

See 1 usage example →

Automatic Speech Recognition (ASR) Error Robustness

amazon.sciencedeep learningmachine learningnatural language processingspeech recognition

Sentence classification datasets with ASR Errors.

Usage examples

Using Phoneme Representations to Build Predictive Models Robust to ASR Errors by Anjie Fang, Simone Filice, Nut Limsopatham and Oleg Rokhlenko

See 1 usage example →

BodyM Dataset

amazon.sciencecomputer visiondeep learning

The first large public body measurement dataset including 8978 frontal and lateral silhouettes for 2505 real subjects, paired with height, weight and 14 body measurements. The following artifacts are made available for each subject.

Subject Height
Subject Weight
Subject Gender
Two black-and-white silhouette images of subject standing in frontal and side pose respectively with full body in view.
14 body measurements in cm - {ankle girth, arm-length, bicep girth, calf girth, chest girth, forearm girth, height, hip girth, leg-length, shoulder-breadth,

...

Usage examples

Human Body Measurement Estimation with Adversarial Augmentation by Nataniel Ruiz, Miriam Bellver, Timo Bolkart, Ambuj Arora, Ming C. Lin, Javier Romero and Raja Bala

See 1 usage example →

Boltz-1 Training Data

deep learninglife sciencesmolecular dockingopen source softwareprotein folding

This is the data used to train the Boltz-1 model. It contains the following datasets:

Our pre-processed version of the Protein Data Bank
Our pre-processed version of the multiple sequence alignment data for each protein chain
The raw multiple sequence alignment data.
A pre-computed symmetry file for symmetry correction during training

Usage examples

Boltz-1: Democratizing Biomolecular Interaction Modeling by J Wohlwend, G Corso, S Passaro, M Reveiz, K Leidal, W Swiderski, T Portnoi, I Chinn, J Silterra, T Jaakkola, R Barzilay

See 1 usage example →

Clay Model v0 Embeddings

aerial imagerycomputer visionearth observationimagingmachine learningsatellite imagery

Machine learning model embeddings dataset providing pre-computed feature representations for satellite and aerial imagery analysis.

Usage examples

Revolutionizing earth observation with geospatial foundation models on AWS by Karsten Schroer, Bishesh Adhikari, and Iza Moise

See 1 usage example →

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

amazon.scienceconversation datamachine learningnatural language processing

This bucket contains the checkpoints used to reproduce the baseline results reported in the DialoGLUE benchmark hosted on EvalAI (https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview). The associated scripts for using the checkpoints are located here: https://github.com/alexa/dialoglue. The associated paper describing the benchmark and checkpoints is here: https://arxiv.org/abs/2009.13570. The provided checkpoints include the CONVBERT model, a BERT-esque model trained on a large open-domain conversational dataset. It also includes the CONVBERT-DG and BERT-DG checkpoints descri...

Usage examples

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue by Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur

See 1 usage example →

Enriched Topical-Chat Dataset for Knowledge-Grounded Dialogue Systems

amazon.scienceconversation datamachine learningnatural language processing

This dataset provides extra annotations on top of the publicly released Topical-Chat dataset (https://github.com/alexa/Topical-Chat) which will help in reproducing the results in our paper "Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems" (https://arxiv.org/abs/2005.12529?context=cs.CL). The dataset contains 5 files: train.json, valid_freq.json, valid_rare.json, test_freq.json and test_rare.json. Each of these files will have additional annotations on top of the original Topical-Chat dataset. These specific annotations are: dialogue act annotations a...

Usage examples

Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems by Behnam Hedayatnia, Karthik Gopalakrishnan, Seokhwan Kim, Yang Liu, Mihail Eric & Dilek Hakkani-Tur

See 1 usage example →

Euclid Quick Release 1 (Q1)

astronomyimagingobject detectionsatellite imagerysurvey

Euclid launched in July 2023 as a European Space Agency (ESA) mission with involvement by NASA. The primary science goals of Euclid are to better understand the composition and evolution of the dark Universe. The Euclid mission will provide space-based imaging and spectroscopy as well as supporting ground-based imaging to achieve these primary goals. These data will be archived by multiple global repositories, including IRSA, where they will support transformational work in many areas of astrophysics. Euclid Quick Release 1 (Q1) consists of ~30 TB of imaging, spectroscopy, and catalogs coverin...

Usage examples

Notebook Tutorials by Caltech/IPAC-IRSA

See 1 usage example →

GRAF Reforecast

atmospherecloud amountERA5forecastgeosciencegeospatialmodelMPASnear-surface air temperaturenear-surface relative humidityprecipitationvisibilityweatherwind speedszarr

A zarr-formatted dataset of 1836 reforecast cases (approx. 5 years) from The Weather Company GRAF (Global high-Resolution Atmospheric Forecasting) model, a version of the National Center for Atmospheric Research (NCAR) Model for Predictions Across Scales (MPAS). GRAF is global, but the configuration for this reforecast had a mesh refinement to approx. 4 km over the US, Caribbean Basin, and Europe, and 15 km elsewhere. This model was designed to run much of its computation on graphics processing units, with this development assisted by NVIDIA. The 1836 cases (approx. 5 years) were generated fr...

Usage examples

Global reforecasts from MPAS “GRAF” with mesh refinement over the US and Europe by Thomas M. Hamill, Raghu Raj Prasanna Kumar, Karthik Kashinath, Carl Ponder, Mike Pritchard, Tao Ge, Akshay Subramanian, Jaideep Pathak, John Wong, Brett Wilt, Peter Neilley

See 1 usage example →

Gaia DR3

astronomy

Gaia DR3 data were originally released by the European Space Agency in December 2020. This HATS-formatted catalog was produced by the LSST Interdisciplinary Network for Collaboration and Computing. The GAIA HATS Datasets are specifically designed for efficient spatial cross-matching with other HATS-format catalogs, whether within the same archive or across distributed archive data centers. This enables astronomers to perform complex analyses, such as identifying correlations or overlaps between datasets from different surveys. Users can leverage LSDB (Large-Scale Database), a scalable spatial ...

Usage examples

Dark Energy Survey / Gaia DR3 Crossmatch by LSDB Collaboration

See 1 usage example →

Global Carbon Budget Data

climatelandoceans

The Global Carbon Budget (GCB) is recognised globally as the most comprehensive report on global carbon emissions and sinks. This dataset, updated every year, includes estimates of land and ocean carbon fluxes from the suite of models used in the report.

Usage examples

Global Carbon Budget 2023 by Pierre Friedlingstein, Michael O’Sullivan, Matthew W. Jones, Robbie M. Andrew, Luke Gregor, Judith Hauck, Corinne Le Quéré, Ingrid T. Luijkx, Are Olsen, Glen P. Peters, Wouter Peters, Julia Pongratz, Clemens Schwingshackl, Stephen Sitch, Josep G. Canadell, Philippe Ciais, Rob B. Jackson, Simone Alin, Ramdane Alkama, Almut Arneth, Vivek K. Arora, Nicholas R. Bates, Meike Becker, Nicolas Bellouin, Henry C. Bittig, Laurent Bopp, Frédéric Chevallier, Louise P. Chini, Margot Cronin, Wiley Evans, Stefanie Falk, Richard A. Feely, Thomas Gasser, Marion Gehlen, Thanos Gkritzalis, Lucas Gloege, Giacomo Grassi, Nicolas Gruber, Özgür Gürses, Ian Harris, Matthew Hefner, Richard A. Houghton, George C. Hurtt, Yosuke Iida, Tatiana Ilyina, Atul K. Jain, Annika Jersild, Koji Kadono, Etsushi Kato, Daniel Kennedy, Kees Klein Goldewijk, Jürgen Knauer, Jan Ivar Korsbakken, Peter Landschützer, Nathalie Lefèvre, Keith Lindsay, Junjie Liu, Zhu Liu, Gregg Marland, Nicolas Mayot, Matthew J. McGrath, Nicolas Metzl, Natalie M. Monacci, David R. Munro, Shin-Ichiro Nakaoka, Yosuke Niwa, Kevin O’Brien, Tsuneo Ono, Paul I. Palmer, Naiqing Pan, Denis Pierrot, Katie Pocock, Benjamin Poulter, Laure Resplandy, Eddy Robertson, Christian Rödenbeck, Carmen Rodriguez, Thais M. Rosan, Jörg Schwinger, Roland Séférian, Jamie D. Shutler, Ingunn Skjelvan, Tobias Steinhoff, Qing Sun, Adrienne J. Sutton, Colm Sweeney, Shintaro Takao, Toste Tanhua, Pieter P. Tans, Xiangjun Tian, Hanqin Tian, Bronte Tilbrook, Hiroyuki Tsujino, Francesco Tubiello, Guido R. van der Werf, Anthony P. Walker, Rik Wanninkhof, Chris Whitehead, Anna Wranne, Rebecca Wright, Wenping Yuan, Chao Yue, Xu Yue, Sönke Zaehle, Jiye Zeng, Bo Zheng

See 1 usage example →

Google Brain Genomics Sequencing Dataset for Benchmarking and Development

amazon.sciencebioinformaticsfastqgeneticgenomiclife scienceslong read sequencingshort read sequencingwhole exome sequencingwhole genome sequencing

To facilitate benchmarking and development, the Google Brain group has sequenced 9 human samples covering the Genome in a Bottle truth sets on different sequencing instruments, sequencing modalities (Illumina short read and Pacific BioSciences long read), sample preparation protocols, and for whole genome and whole exome capture. The original source of these data is gs://google-brain-genomics-public.

Usage examples

An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development by Baid G., Nattestad M., Kolesnikov A., Goel S., Yang H., Chang P., and Carroll A (2020)

See 1 usage example →

Helpful Sentences from Reviews

amazon.scienceinformation retrievaljsonnatural language processingtext analysis

A collection of sentences extracted from customer reviews labeled with their helpfulness score.

Usage examples

Identifying Helpful Sentences in Product Reviews by Iftah Gamzu et al (2021)

See 1 usage example →

Humor Detection from Product Question Answering Systems

amazon.sciencemachine learningnatural language processing

This dataset provides labeled humor detection from product question answering systems. The dataset contains 3 csv files: Humorous.csv containing the humorous product questions, Non-humorous-unbiased.csv containing the non-humorous product questions from the same products as the humorous one, and...

Usage examples

Humor Detection in Product Question Answering Systems. by Yftah Ziser, Elad Kravi & David Carmel

See 1 usage example →

Humor patterns used for querying Alexa traffic

amazon.sciencedialogmachine learningnatural language processing

Humor patterns used for querying Alexa traffic when creating the taxonomy described in the paper "“Alexa, Do You Want to Build a Snowman?” Characterizing Playful Requests to Conversational Agents" by Shani C., Libov A., Tolmach S., Lewin-Eytan L., Maarek Y., and Shahaf D. (CHI LBW 2022). These patterns correspond to the researchers' hypotheses regarding what humor types are likely to appear in Alexa traffic. These patterns were used for querying Alexa traffic to evaluate these hypotheses.

Usage examples

“Alexa, Do You Want to Build a Snowman?” Characterizing Playful Requests to Conversational Agents by Shani C., Libov A., Tolmach S., Lewin-Eytan L., Maarek Y., and Shahaf D.

See 1 usage example →

IGP Coal Plant

air qualityenergyenvironmentalinfrastructure

This dataset includes detailed information about coal power plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.

Usage examples

Analyzing Coal Plant Data and Emission Calculation by APAD

See 1 usage example →

Learning to Rank and Filter - community question answering

amazon.sciencemachine learningnatural language processing

This dataset provides product related questions and answers, including answers' quality labels, as part of the paper 'IR Evaluation and Learning in the Presence of Forbidden Documents'.

Usage examples

IR Evaluation and Learning in the Presence of Forbidden Documents by David Carmel, Nachshon Cohen, Amir Ingber & Elad Kravi

See 1 usage example →

Low Context Name Entity Recognition (NER) Datasets with Gazetteer

amazon.sciencenatural language processing

See https://lowcontext-ner-gaz.s3.amazonaws.com/readme.html

Usage examples

GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input by Tao Meng, Anjie Fang, Oleg Rokhlenko and Shervin Malmasi

See 1 usage example →

Multi Token Completion

amazon.sciencemachine learningnatural language processing

This dataset provides masked sentences and multi-token phrases that were masked-out of these sentences. We offer 3 datasets: a general purpose dataset extracted from the Wikipedia and Books corpora, and 2 additional datasets extracted from pubmed abstracts. As for the pubmed data, please be aware that the dataset does not reflect the most current/accurate data available from NLM (it is not being updated). For these datasets, the columns provided for each datapoint are as follows: text- the original sentence span- the span (phrase) which is masked out span_lower- the lowercase version of span r...

Usage examples

Simple and Effective Multi-Token Completion from Masked Language Models by Oren Kalinsky, Guy Kushilevitz, Alex Libov & Yoav Goldberg

See 1 usage example →

Multilingual Name Entity Recognition (NER) Datasets with Gazetteer

amazon.sciencenatural language processing

Name Entity Recognition datasets containing short sentences and queries with low-context, including LOWNER, MSQ-NER, ORCAS-NER and Gazetteers (1.67 million entities). This release contains the multilingual versions of the datasets in Low Context Name Entity Recognition (NER) Datasets with Gazetteer.

Usage examples

Gazetteer Enhanced Named Entity Recognition for Code-Mixed Web Queries by Besnik Fetahu, Anjie Fang, Oleg Rokhlenko and Shervin Malmasi

See 1 usage example →

Ocean Biodiversity Information System (OBIS) species occurrence data

biodiversitycoastalconservationecosystemsenvironmentalgeospatiallife sciencesoceanswater

The Ocean Biodiversity Information System (OBIS) was founded in 2000 under the Census of Marine Life. It is now a programme component of the International Oceanographic Data and Information Exchange (IODE) programme of the Intergovernmental Oceanographic Commission (IOC) of UNESCO. OBIS aims to be the most comprehensive data and information gateway on the diversity, distribution and abundance of marine life to support its Member States in achieving a healthy and resilient ocean ecosystem. The OBIS network consists of over 30 regional and thematic nodes, and provides access to more than 5,000 d...

Usage examples

Querying OBIS occurrence data using Amazon Athena by Pieter Provoost

See 1 usage example →

OceanOmics

biodiversitybioinformaticsbiologyconservationgeneticgenomiclife sciences

Minderoo Foundation OceanOmics aims to establish environmental DNA (eDNA) as a tool to measure, understand, and protect oceans. OceanOmics mainly generates two types of data: eDNA sequencing data (metabarcoding, metagenomics), and genome assembly data (marine vertebrates).

Usage examples

Case-studies on using OceanOmics genomes and eDNA data by Philipp Bayer

See 1 usage example →

PASS: Perturb-and-Select Summarizer for Product Reviews

amazon.sciencenatural language processingtext analysis

A collection of product review summaries automatically generated by PASS for 32 Amazon products from the FewSum dataset.

Usage examples

PASS: Perturb-and-Select Summarizer for Product Reviews by Nadav Oved and Ran Levy (2021)

See 1 usage example →

PersonPath22

amazon.sciencecomputer vision

PersonPath22 is a large-scale multi-person tracking dataset containing 236 videos captured mostly from static-mounted cameras, collected from sources where we were given the rights to redistribute the content and participants have given explicit consent. Each video has ground-truth annotations including both bounding boxes and tracklet-ids for all the persons in each frame.

Usage examples

Large scale Real-world Multi-Person Tracking by Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, Joseph Tighe

See 1 usage example →

Phrase Clustering Dataset (PCD)

amazon.sciencejsonnatural language processing

This dataset is part of the paper "McPhraSy: Multi-Context Phrase Similarity and Clustering" by DN Cohen et al (2022). The purpose of PCD is to evaluate the quality of semantic-based clustering of noun phrases. The phrases were collected from the Amazon Review Dataset (https://nijianmo.github.io/amazon/).

Usage examples

McPhraSy: Multi context phrase similarity and clustering by Amir DN Cohen, Hila Gonen, Ori Shapira, Ran Levy, and Yoav Goldberg

See 1 usage example →

Poseidon 3D Seismic, Australia

explorationgeophysicsseismology

Near, mid, far, full stack (with AGC) imaged 3D seismic data. We also include the decimated stacking velocity field. The dataset is used in oil and gas exploration. Survey size is approximately 2,900 km². Coordinate system used is: GDA94 / MGA Zone 51 Petrel, 700004. Datasets are converted to open-source MDIO format (v1 specification). Original SEG-Y files are licensed as CC BY 3.0 AU and are downloaded from Google Drive accessed via SEG Open Data Wiki. The datasets are available courtesy of ConocoPhillips and Geoscience Australia. Raw data can be requested from Geoscience Australia's NOPIMS s...

Usage examples

MDIO by TGS

See 1 usage example →

Pre- and post-purchase product questions

amazon.sciencemachine learningnatural language processing

This dataset provides product related questions, including their textual content and gap, in hours, between purchase and posting time. Each question is also associated with related product details, including its id and title.

Usage examples

"Did you buy it already?", Detecting Users Purchase-State From Their Product-Related Questions by Lital Kuchy, David Carmel, Thomas Huet & Elad Kravi

See 1 usage example →

Product Comparison Dataset for Online Shopping

amazon.sciencemachine learningnatural language processingonline shoppingproduct comparison

The Product Comparison dataset for online shopping is a new, manually annotated dataset with about 15K human generated sentences, which compare related products based on one or more of their attributes (the first such data we know of for product comparison). It covers ∼8K product sets, their selected attributes, and comparison texts.

Usage examples

Generating Explainable Product Comparisons for Online Shopping by Nikhita Vedula, Marcus Collins, Eugene Agichtein and Oleg Rokhlenko

See 1 usage example →

PyEnvs and CallArgs

code completionmachine learning

PyEnvs is a collection of 2814 permissively licensed Python packages along with their isolated development environments. Paired with a program analyzer (e.g. Jedi Language Server), it supports querying for project-related information. CallArgs is a dataset built on top of PyEnvs for function call argument completion. It provides function definition, implementation, and usage information for each function call instance.

Usage examples

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion by Hengzhi Pei, Jinman Zhao, Leonard Lausen, Sheng Zha, George Karypis

See 1 usage example →

Shopping Humor Generation

amazon.sciencecommercenatural language processing

This dataset provides a set of non-shoppable items, which are items that can't be purchased via a virtual assistant (love, vampires, etc). In addition, for each non-shoppable item, the dataset contains humorous responses generated by two different large language models, and a template-based generation solution that uses a commonsense knowledge graph called ConceptNet. Finally, each row contains a score provided by human annotators that judge how funny each response is. The columns provided for each datapoint are as follows: question- purchase request of the non-shoppable item answer- a ge...

Usage examples

Evaluating Humorous Response Generation to Playful Shopping Requests by Natalie Shapira, Oren Kalinsky, Alex Libov, Chen Shani, Sofia Tolmach

See 1 usage example →

Visual Anomaly (VisA)

amazon.scienceanomaly detectionclassificationfewshotindustrialsegmentation

Largest Visual Anomaly detection dataset containing objects from 12 classes in 3 domains across 10,821 images (9,621 normal and 1,200 anomaly). Both image and pixel level annotations are provided.

Usage examples

SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation by Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer

See 1 usage example →

VoiSeR

amazon.scienceinformation retrievalmachine learningnatural language processing

Voice-based refinements of product search

Usage examples

VoiSeR: A New Benchmark for Voice-Based Search Refinement by Simone Filice, Giuseppe Castellucci, Marcus Collins, Eugene Agichtein & Oleg Rokhlenko

See 1 usage example →

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

amazon.sciencemachine learningnatural language processing

This dataset provides how-to articles from wikihow.com and their summaries, written as a coherent paragraph. The dataset itself is available at wikisum.zip, and contains the article, the summary, the wikihow url, and an official fold (train, val, or test). In addition, human evaluation results are available at wikisum-human-eval...

Usage examples

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation by Nachshon Cohen, Oren Kalinsky, Yftah Ziser & Alessandro Moschitti

See 1 usage example →

Wizard of Tasks

amazon.scienceconversation datadialogmachine learningnatural language processing

Wizard of Tasks (WoT) is a dataset containing conversations for Conversational Task Assistants (CTAs). A CTA is a conversational agent whose goal is to help humans to perform real-world tasks. A CTA can help in exploring available tasks, answering task-specific questions and guiding users through step-by-step instructions. WoT contains about 550 conversations with ~18,000 utterances in two domains, i.e., Cooking and Home Improvement.

Usage examples

Wizard of Tasks: A Novel Conversational Dataset for Solving Real-World Tasks in Conversational Settings by Jason Ingyu Choi, Saar Kuzi, Nikhita Vedula, Jie Zhao, Giuseppe Castellucci, Marcus Collins, Shervin Malmasi, Oleg Rokhlenko and Eugene Agichtein

See 1 usage example →

mirrulations

government records

The regulations.gov website allows users to view proposed rules and supporting documents for the federal rule-making process. In addition, users can post and view comments about those proposed rules. The site contains about 27 million pieces of text and binary data, but the API that provides access only allows a user to obtain one thousand items per hour. As a result, it would take approximately 3 years to download all the data. Mirrulations (MIRRor of regULATIONS.gov) is a system that uses a collection of donated API keys to create a mirror of the data. In addition, for each pdf in the da...
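The three-year figure quoted above follows from simple arithmetic on the stated item count and rate limit. A minimal sketch of that calculation is given here; the function name and the `keys` parameter are hypothetical, and the assumption that the limit applies per key and that keys can be used in parallel is ours, not something the description states.

```python
ITEMS = 27_000_000       # approximate item count stated in the description
ITEMS_PER_HOUR = 1_000   # stated API limit for a single user

def download_years(items, items_per_hour, keys=1):
    """Estimate years needed to fetch every item at a fixed hourly rate."""
    # Assumption: each key gets its own hourly quota and keys run in parallel.
    hours = items / (items_per_hour * keys)
    return hours / (24 * 365)

single_key_years = download_years(ITEMS, ITEMS_PER_HOUR)
print(f"{single_key_years:.2f} years with one key")
```

Under the same assumption, calling `download_years(ITEMS, ITEMS_PER_HOUR, keys=10)` shows how pooling keys would shorten the estimate proportionally.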

Usage examples

Extracting Comment Sentiment from Public Comments by Ben Coleman

See 1 usage example →

AI Weather Prediction (AIWP) Model Reforecasts

environmentalmeteorologicalweather

This is an archive of pure AI-based weather prediction reforecasts produced collaboratively between the Cooperative Institute for Research in the Atmosphere (CIRA) and the NOAA Global Systems Laboratory (NOAA-GSL).

Currently, FourCastNetv2-small, Pangu-Weather, and GraphCast are included, with more models to come. Each of these models has been initialized with both NOAA GFS (directories with no extension) and ECMWF IFS initial conditions (directories ending in "_IFS"). The datasets are updated with near-real-time data twice per day (00Z and 12Z initializations).
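The directory naming convention described above can be expressed as a small helper. This is only a sketch: the function name and the example directory names are hypothetical, and only the "_IFS" suffix rule is taken from the description.

```python
def initial_conditions(directory_name):
    """Return the initial-condition source implied by a directory name."""
    # Per the description: directories ending in "_IFS" use ECMWF IFS
    # initial conditions; directories with no extension use NOAA GFS.
    return "ECMWF IFS" if directory_name.endswith("_IFS") else "NOAA GFS"

# Hypothetical directory names, for illustration only.
for name in ["EXAMPLE_MODEL", "EXAMPLE_MODEL_IFS"]:
    print(name, "->", initial_conditions(name))
```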

FourCastNetv2-small and Pangu-Weather are available from 10/2020 to present...

Airborne Object Tracking Dataset

amazon.sciencecomputer visiondeep learningmachine learning

Airborne Object Tracking (AOT) is a collection of 4,943 flight sequences of around 120 seconds each, collected at 10 Hz in diverse conditions. There are 5.9M+ images and 3.3M+ 2D annotations of airborne objects in the sequences. There are 3,306,350 frames without labels as they contain no airborne objects. For images with labels, there are on average 1.3 labels per image. All airborne objects in the dataset are labelled.
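As a rough consistency check, the stated image count can be approximated from the number of sequences, their nominal length, and the frame rate. The sketch below treats every sequence as exactly 120 seconds long, which is an assumption, since the description says "around 120 seconds".

```python
SEQUENCES = 4_943
SECONDS_PER_SEQUENCE = 120   # nominal length; actual sequences vary
FRAME_RATE_HZ = 10

# Frames per sequence times number of sequences.
approx_images = SEQUENCES * SECONDS_PER_SEQUENCE * FRAME_RATE_HZ
print(approx_images)  # 5931600, in line with the stated "5.9M+ images"
```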

Amazon Berkeley Objects Dataset

amazon.sciencecomputer visiondeep learninginformation retrievalmachine learningmachine translation

Amazon Berkeley Objects (ABO) is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. 8,222 listings come with turntable photography (also referred to as "spin" or "360°-View" images), as sequences of 24 or 72 images, for a total of 586,584 images in 8,209 unique sequences. For 7,953 products, the collection also provides high-quality 3D models, as glTF 2.0 files.

Amazon Seller Contact Intent Sequence

amazon.scienceHawkes Processmachine learningtemporal point process

When sellers need help from Amazon, such as how to create a listing, they often reach out to Amazon seller support through email, chat or phone. For each contact, we assign an intent so that we can manage the request more easily. The data we present in this release includes 548k contacts with 118 intents from 70k sellers sampled from recent years. There are 3 columns: 1. De-identified seller id - seller_id_anon; 2. Noisy inter-arrival time between contacts, in hours - interarrival_time_hr_noisy; 3. An integer that represents the contact intent - contact_intent. Note that, to balance ...
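The three columns listed above lend themselves to a per-seller sequence of (gap, intent) events, which is the usual input for temporal point process models. The sketch below assumes a CSV file with a header row and rows already in chronological order for each seller; the file layout, the sample rows, and the function name are all made up for illustration. Only the column names come from the description.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the real file layout and values may differ.
SAMPLE = """seller_id_anon,interarrival_time_hr_noisy,contact_intent
s1,0.0,12
s1,5.5,7
s2,0.0,12
"""

def load_sequences(handle):
    """Group contacts into one ordered (gap_hours, intent) list per seller."""
    sequences = defaultdict(list)
    for row in csv.DictReader(handle):
        event = (float(row["interarrival_time_hr_noisy"]),
                 int(row["contact_intent"]))
        sequences[row["seller_id_anon"]].append(event)
    return dict(sequences)

sequences = load_sequences(io.StringIO(SAMPLE))
print(sequences)
```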

Blue Brain Open Data

brain imagesbrain modelselectrophysiologyion channelslife sciencesmicrocircuit modeling and simulationmorphological reconstructionsMus musculusneurosciencesimulation neurosciencesingle neuron models

The Blue Brain Open Data represents an extensive neuroscience dataset encompassing a diverse range of data types, including experimental, model, and simulation data, along with images and videos depicting reconstructed neurons and brain regions.

Clay v1.5 NAIP-2

aerial imageryagricultureenvironmentalland usenatural resource

National Agriculture Imagery Program (NAIP) dataset providing high-resolution aerial imagery for agricultural monitoring, land use analysis, and natural resource management.

Clay v1.5 Sentinel-2

agricultureearth observationenvironmentalland usesatellite imagery

Sentinel-2 satellite imagery dataset providing high-resolution optical data for land monitoring, agriculture, and environmental applications.

FashionLocalTriplets

amazon.sciencecomputer visionmachine learning

Fine-grained localized visual similarity and search for fashion.

Google Books Ngrams

amazon.sciencenatural language processing

N-grams are fixed-size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
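The sliding-window procedure described above can be sketched in a few lines. This is a minimal illustration that assumes simple whitespace tokenization; the function name `ngrams` is hypothetical and is not the tooling used to build the dataset.

```python
def ngrams(text, n):
    """Yield n-grams (tuples of n consecutive words) from text."""
    tokens = text.split()  # assumption: whitespace tokenization
    # Slide a window of width n across the tokens, one step at a time.
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

bigrams = list(ngrams("the quick brown fox", 2))
print(bigrams)
```

A text with fewer than n tokens yields no n-grams at all, which is why the range above can be empty.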

MWIS VR Instances

amazon.sciencegraphtraffictransportation

Large-scale node-weighted conflict graphs for maximum weight independent set solvers

Registry of Open Data on AWS

amazon.sciencejsonmetadata

The Registry of Open Data on AWS contains publicly available datasets that are available for access from AWS resources. Note that datasets in this registry are available via AWS resources, but they are not provided by AWS; these datasets are owned and maintained by a variety of government organizations, researchers, businesses, and individuals. This dataset contains derived forms of the data in https://github.com/awslabs/open-data-registry that have been transformed for ease of use with machine interfaces. Curren...

Spatiam Corporation National Lab Research Announcement International Space Station Technology Demonstration

network traffictelecommunications

Experiment data for the Spatiam DTN Network Platform Technology demonstration carried out to test a Delay and Disruption Tolerant Network between the ISS and Earth.

TSBench

benchmarkdeep learningmachine learningmeta learningtime series forecasting

TSBench comprises thousands of benchmark evaluations for time series forecasting methods. It provides various metrics (i.e. measures of accuracy, latency, number of model parameters, ...) of 13 time series forecasting methods across 44 heterogeneous datasets. Time series forecasting methods include both classical and deep learning methods, while several hyperparameter settings are evaluated for the deep learning methods. In addition to the tabular data providing the metrics, TSBench includes the probabilistic forecasts of all evaluated methods for all 44 datasets. While the tabular data is smal...