The Registry of Open Data on AWS is now available on AWS Data Exchange
All datasets on the Registry of Open Data are now discoverable on AWS Data Exchange alongside 3,000+ existing data products from category-leading data providers across industries. Explore the catalog to find open, free, and commercial data sets. Learn more about AWS Data Exchange

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

See all usage examples for datasets listed in this registry tagged with image processing.


Search datasets (currently 13 matching datasets)

You are currently viewing a subset of data tagged with image processing.


Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.


Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

Allen Cell Imaging Collections

biologycell biologycell imagingHomo sapiensimage processinglife sciencesmachine learningmicroscopy

This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below:

  1. Field of view or cropped images of cells
  2. Segmentations of structures in the images (e.g., boundaries of cells, DNA, other intracellular structures, etc.)
  3. Processed versions of the above images and segmentations
  4. Machine learning predictions and labels of the data listed above
  5. Models trained on the previously listed data
  6. Additional supporting non-image data related to the above listed data types (e.g., gene expression data, whole genome sequencing data, features derived from the images or model predictions, metadata)
  7. Simulation, analysis, and visualization data of in silico cell structures, cells, and cell populations
Extern...

Details →

Usage examples

See 20 usage examples →

Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery

biologyfluorescence imagingimage processingimaginglife sciencesmicroscopyneurobiologyneuroimagingneuroscience

This data set, made available by Janelia's FlyLight project, consists of fluorescence images of Drosophila melanogaster driver lines, aligned to standard templates, and stored in formats suitable for rapid searching in the cloud. Additional data will be added as it is published.

Details →

Usage examples

See 13 usage examples →

Low Altitude Disaster Imagery (LADI) Dataset

aerial imagerycoastalcomputer visiondisaster responseearth observationearthquakesgeospatialimage processingimaginginfrastructurelandmachine learningmappingnatural resourceseismologytransportationurbanwater

The Low Altitude Disaster Imagery (LADI) Dataset consists of human and machine annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015-2023. Two key distinctions are the low altitude, oblique perspective of the imagery and disaster-related features, which are rarely featured in computer vision benchmarks and datasets.

Details →

Usage examples

See 11 usage examples →

Open NeuroData

array tomographybiologyelectron microscopyimage processinglife scienceslight-sheet microscopymagnetic resonance imagingneuroimagingneuroscience

This bucket contains multiple neuroimaging datasets (as Neuroglancer Precomputed Volumes) across multiple modalities and scales, ranging from nanoscale (electron microscopy), to microscale (cleared lightsheet microscopy and array tomography), and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include segmentations and meshes.

Details →

Usage examples

See 9 usage examples →

Earth Observation Data Cubes for Brazil

cogearth observationgeosciencegeospatialimage processingopen source softwaresatellite imagerystac

Earth observation (EO) data cubes produced from analysis-ready data (ARD) of CBERS-4, Sentinel-2 A/B and Landsat-8 satellite images for Brazil. The datacubes are regular in time and use a hierarchical tiling system. Further details are described in Ferreira et al. (2020).

Details →

Usage examples

See 7 usage examples →

Capella Space Synthetic Aperture Radar (SAR) Open Dataset

cogcomputer visionearth observationgeospatialimage processingsatellite imagerystacsynthetic aperture radar

Open Synthetic Aperture Radar (SAR) data from Capella Space. Capella Space is an information services company that provides on-demand, industry-leading, high-resolution synthetic aperture radar (SAR) Earth observation imagery. Through a constellation of small satellites, Capella provides easy access to frequent, timely, and flexible information affecting dozens of industries worldwide. Capella's high-resolution SAR satellites are matched with unparalleled infrastructure to deliver reliable global insights that sharpen our understanding of the changing world – improving decisions ...

Details →

Usage examples

See 6 usage examples →

PoroTomo

geospatialgeothermalimage processingseismology

Released to the public as part of the Department of Energy's Open Energy Data Initiative, these data represent vertical and horizontal distributed acoustic sensing (DAS) data collected as part of the Poroelastic Tomography (PoroTomo) project funded in part by the Office of Energy Efficiency and Renewable Energy (EERE), U.S. Department of Energy.

Details →

Usage examples

See 6 usage examples →

RACECAR Dataset

autonomous racingautonomous vehiclescomputer visionGNSSimage processinglidarlocalizationobject detectionobject trackingperceptionradarrobotics

The RACECAR dataset is the first open dataset for full-scale and high-speed autonomous racing. Multi-modal sensor data has been collected from fully autonomous Indy race cars operating at speeds of up to 170 mph (273 kph). Six teams who raced in the Indy Autonomous Challenge during 2021-22 have contributed to this dataset. The dataset spans 11 interesting racing scenarios across two race tracks which include solo laps, multi-agent laps, overtaking situations, high-accelerations, banked tracks, obstacle avoidance, pit entry and exit at different speeds. The data is organized and released in bot...

Details →

Usage examples

See 5 usage examples →

High Resolution Canopy Height Maps by WRI and Meta

aerial imageryagricultureclimatecogearth observationgeospatialimage processingland covermachine learningsatellite imagery

Global and regional Canopy Height Maps (CHM). Created using machine learning models on high-resolution worldwide Maxar satellite imagery.

Details →

Usage examples

See 4 usage examples →

Mouse Brain Anatomy: MouseLight Imagery

biologyfluorescence imagingimage processingimaginglife sciencesmicroscopyneurobiologyneuroimagingneuroscience

This data set, made available by Janelia's MouseLight project, consists of images and neuron annotations of the Mus musculus brain, stored in formats suitable for viewing and annotation using the HortaCloud cloud-based annotation system.

Details →

Usage examples

See 4 usage examples →

SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)

biodiversitybiologyecosystemsimage processingmultimediawildlife

The SiPeCaM goal is to create a data source that allows to evaluate changes in the biodiversity state, considering key aspect of how does the ecosystem behaves.

Details →

Usage examples

See 4 usage examples →

Allen Ivy Glioblastoma Atlas

biologycancercomputer visiongene expressiongeneticglioblastomaHomo sapiensimage processingimaginglife sciencesmachine learningneurobiology

This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each tissue section is adjacent to another section that was stained with a reagent useful for identifying histological features of the tumor. Each of these types of images has been completely annotated for tumor features by a machine learning process trained by expert medical doctors.

Details →

Usage examples

See 3 usage examples →

Allen Mouse Brain Atlas

biologygene expressiongeneticimage processingimaginglife sciencesMus musculusneurobiologytranscriptomics

The Allen Mouse Brain Atlas is a genome-scale collection of cellular resolution gene expression profiles using in situ hybridization (ISH). Highly methodical data production methods and comprehensive anatomical coverage via dense, uniformly spaced sampling facilitate data consistency and comparability across >20,000 genes. The use of an inbred mouse strain with minimal animal-to-animal variance allows one to treat the brain essentially as a complex but highly reproducible three-dimensional tissue array. The entire Allen Mouse Brain Atlas dataset and associated tools are available through an...

Details →

Usage examples

See 3 usage examples →

NIFS Large Helical Device (LHD) Experiment

analyticsanomaly detectionarchivescomputed tomographydatacenterdigital assetselectricityenergyfluid dynamicsimage processingphysicspost-processingradiationsignal processingsource codeturbulencevideox-rayx-ray tomography

The Large Helical Device (LHD), owned and operated by the National Institute for Fusion Science (NIFS), is one of the world's largest plasma confinement device which employs a heliotron magnetic configuration generated by the superconducting coils. The objectives are to conduct academic research on the confinement of steady-state, high-temperature, high-density plasmas, core plasma physics, and fusion reactor engineering, which are necessary to develop future fusion reactors. All the archived data of the LHD plasma diagnostics are available since the beginning of the LHD experiment, starte...

Details →

Usage examples

See 3 usage examples →

National Cancer Institute Imaging Data Commons (IDC) Collections

cancerdigital pathologyfluorescence imagingimage processingimaginglife sciencesmachine learningmicroscopyradiology

Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. IDC hosts a growing number of imaging collections that are contributed by either funded US National Cancer Institute (NCI) data collection activities, or by the individual researchers.Image data hosted by IDC is stored in DICOM format.

Details →

Usage examples

See 3 usage examples →

PD12M

artdeep learningimage processinglabeledmachine learningmedia

PD12M is a collection of 12.4 million CC0/PD image-caption pairs for the purpose of training generative image models.

Details →

Usage examples

See 6 usage examples →

Aurora Multi-Sensor Dataset

autonomous vehiclescomputer visiondeep learningimage processinglidarmachine learningmappingroboticstraffictransportationurbanweather

The Aurora Multi-Sensor Dataset is an open, large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of Pittsburgh, PA, USA by Aurora (via Uber ATG) in collaboration with the University of Toronto. The de-identified dataset contains rich metadata, such as weather and semantic segmentation, and spans all four seasons, rain, snow, overcast and sunny days, different times of day, and a variety of traffic conditions.
The Aurora Multi-Sensor Dataset contains data from a 64-beam Velodyne HDL-64E LiDAR sensor and seven 1920x1200-pixel resolution cameras including a forward-facing stereo pair and five wide-angle lenses covering a 360-degree view around the vehicle.
This data can be used to develop and evaluate large-scale long-term approaches to autonomous vehicle localization. Its size and diversity make it suitable for a wide range of research areas such as 3D reconstruction, virtual tourism, HD map construction, and map compression, among others.
The data was first presented at the International Conference on Intelligent Robots an
...

Details →

Usage examples

See 2 usage examples →

DigitalCorpora

computer forensicscomputer securityCSIcyber securitydigital forensicsimage processingimaginginformation retrievalinternetintrusion detectionmachine learningmachine translationtext analysis

Disk images, memory dumps, network packet captures, and files for use in digital forensics research and education. All of this information is accessible through the digitalcorpora.org website, and made available at s3://digitalcorpora/. Some of these datasets implement scenarios that were performed by students, faculty, and others acting in persona. As such, the information is synthetic and may be used without prior authorization or IRB approval. Details of these datasets can be found at Details →

Usage examples

See 2 usage examples →

Satellogic EarthView dataset

cogcomputer visionearth observationgeospatialimage processingsatellite imagerystac

Satellogic EarthView dataset includes high-resolution satellite images captured over all continents. The dataset is organized in Hive partition format and hosted by AWS. The dataset can be accessed via STAC browser or aws cli. Each item of the dataset corresponds to a specific region and date, with some of the regions revisited for additional data. The dataset provides Top-of-Atmosphere (TOA) reflectance values across four spectral bands (Red, Green, Blue, Near-Infrared) at a Ground Sample Distance (GSD) of 1 meter, accompanied by comprehensive metadata such as off-nadir angles, sun elevation,...

Details →

Usage examples

See 2 usage examples →

Sub-Meter Canopy Tree Height of California in 2020 by CTrees.org

aerial imagerycogconservationdeep learningearth observationenvironmentalgeospatialimage processingland cover

Canopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Department of Agriculture (USDA).

Details →

Usage examples

See 2 usage examples →

Allen Brain Observatory - Visual Coding AWS Public Data Set

electrophysiologyimage processingimaginglife sciencesMus musculusneurobiologyneuroimagingsignal processing

The Allen Brain Observatory – Visual Coding is a large-scale, standardized survey of physiological activity across the mouse visual cortex, hippocampus, and thalamus. It includes datasets collected with both two-photon imaging and Neuropixels probes, two complementary techniques for measuring the activity of neurons in vivo. The two-photon imaging dataset features visually evoked calcium responses from GCaMP6-expressing neurons in a range of cortical layers, visual areas, and Cre lines. The Neuropixels dataset features spiking activity from distributed cortical and subcortical brain regions, c...

Details →

Usage examples

See 1 usage example →

Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data

electrophysiologyimage processingimaginglife sciencesMus musculusneurobiologyneuroimagingsignal processing

The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.

Details →

Usage examples

See 1 usage example →

High Resolution Population Density Maps + Demographic Estimates by CIESIN and Meta

aerial imagerydemographicsdisaster responsegeospatialimage processingmachine learningpopulationsatellite imagery

Population data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines CIESIN’s Gridded Population of the World using machine learning models on high-resolution worldwide Maxar satellite imagery. CIESIN population counts aggregated from worldwide census data are allocated to blocks where imagery appears to contain buildings.

Details →

Usage examples

See 1 usage example →

NYUMets Brain Dataset

biologycancercomputer visionhealthimage processingimaginglife sciencesmachine learningmagnetic resonance imagingmedical imagingmedicineneurobiologyneuroimagingsegmentation

This dataset contains 8,000+ brain MRIs of 2,000+ patients with brain metastases.

Details →

Usage examples

See 1 usage example →

Ohio State Cardiac MRI Raw Data (OCMR)

Homo sapiensimage processingimaginglife sciencesmagnetic resonance imagingsignal processing

OCMR is an open-access repository that provides multi-coil k-space data for cardiac cine. The fully sampled MRI datasets are intended for quantitative comparison and evaluation of image reconstruction methods. The free-breathing, prospectively undersampled datasets are intended to evaluate their performance and generalizability qualitatively.

Details →

Usage examples

See 1 usage example →

Xiph.Org Test Media

computer visionimage processingimagingmediamoviesmultimediavideo

Uncompressed video used for video compression and video processing research.

Details →

Usage examples

See 1 usage example →

Natural Scenes Dataset

computer visionimage processingimaginglife sciencesmachine learningmagnetic resonance imagingneuroimagingneurosciencenifti

Here, we collected and pre-processed a massive, high-quality 7T fMRI dataset that can be used to advance our understanding of how the brain works. A unique feature of this dataset is the massive amount of data available per individual subject. The data were acquired using ultra-high-field fMRI (7T, whole-brain, 1.8-mm resolution, 1.6-s TR). We measured fMRI responses while each of 8 participants viewed 9,000–10,000 distinct, color natural scenes (22,500–30,000 trials) in 30–40 weekly scan sessions over the course of a year. Additional measures were collected including resting-state data, retin...

Details →

Open Food Facts Images

image processingmachine learning

A dataset of all images of Open Food Facts, the biggest open dataset of food products in the world.

Details →

Umbra Synthetic Aperture Radar (SAR) Open Data

earth observationgeospatialimage processingsatellite imagerystacsynthetic aperture radar

Umbra satellites generate the highest resolution Synthetic Aperture Radar (SAR) imagery ever offered from space, up to 16-cm resolution. SAR can capture images at night, through cloud cover, smoke and rain. SAR is unique in its abilities to monitor changes. The Open Data Program (ODP) features over twenty diverse time-series locations that are updated frequently, allowing users to experiment with SAR's capabilities. We offer single-looked spotlight mode in either 16cm, 25cm, 35cm, 50cm, or 1m resolution, and multi-looked spotlight mode. The ODP also features an assorted collection of over ...

Details →

VENUS L2A Cloud-Optimized GeoTIFFs

activity detectionagriculturecogdisaster responseearth observationenvironmentalgeospatialimage processingland covernatural resourcesatellite imagerystac

The Venµs science mission is a joint research mission undertaken by CNES and ISA, the Israel Space Agency. It aims to demonstrate the effectiveness of high-resolution multi-temporal observation optimised through Copernicus, the global environmental and security monitoring programme. Venµs was launched from the Centre Spatial Guyanais by a VEGA rocket, during the night from 2017, August 1st to 2nd. Thanks to its multispectral camera (12 spectral bands in the visible and near-infrared ranges, with spectral characteristics provided here), it acquires imagery every 1-2 days over 100+ areas at...

Details →