This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.
You are currently viewing a subset of data tagged with image processing.
If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.
Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.
Tags: biology, cell biology, cell imaging, Homo sapiens, image processing, life sciences, machine learning, microscopy
This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below:
Tags: bioinformatics, biology, cancer, cell biology, cell imaging, cell painting, chemical biology, computer vision, csv, deep learning, fluorescence imaging, genetic, high-throughput imaging, image processing, image-based profiling, imaging, life sciences, machine learning, medicine, microscopy, organelle
The Cell Painting Gallery is a collection of image datasets created using the Cell Painting assay. The images of cells are captured by microscopy imaging, and reveal the response of various labeled cell components to whatever treatments are tested, which can include genetic perturbations, chemicals or drugs, or different cell types. The datasets can be used for diverse applications in basic biology and pharmaceutical research, such as identifying disease-associated phenotypes, understanding disease mechanisms, and predicting a drug’s activity, toxicity, or mechanism of action (Chandrasekaran et al. 2020). This collection is maintained by the Carpenter–Singh lab and the Cimini lab at the Broad Institute. A human-friendly listing of datasets, instructions for accessing them, and other documentation is at the corresponding GitHub page abou...
Tags: biology, fluorescence imaging, image processing, imaging, life sciences, microscopy, neurobiology, neuroimaging, neuroscience
This data set, made available by Janelia's FlyLight project, consists of fluorescence images of Drosophila melanogaster driver lines, aligned to standard templates, and stored in formats suitable for rapid searching in the cloud. Additional data will be added as it is published.
Tags: aerial imagery, coastal, computer vision, disaster response, earth observation, earthquakes, geospatial, image processing, imaging, infrastructure, land, machine learning, mapping, natural resource, seismology, transportation, urban, water
The Low Altitude Disaster Imagery (LADI) Dataset consists of human- and machine-annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015 to 2023. Two key distinctions are the low-altitude, oblique perspective of the imagery and its disaster-related features, which are rarely represented in computer vision benchmarks and datasets.
Tags: array tomography, biology, electron microscopy, image processing, life sciences, light-sheet microscopy, magnetic resonance imaging, neuroimaging, neuroscience
This bucket contains multiple neuroimaging datasets (as Neuroglancer Precomputed Volumes) across multiple modalities and scales, ranging from nanoscale (electron microscopy), to microscale (cleared lightsheet microscopy and array tomography), and mesoscale (structural and functional magnetic resonance imaging). Additionally, many of the datasets include segmentations and meshes.
Tags: cog, earth observation, geoscience, geospatial, image processing, open source software, satellite imagery, stac
Earth observation (EO) data cubes produced from analysis-ready data (ARD) of CBERS-4, Sentinel-2 A/B and Landsat-8 satellite images for Brazil. The datacubes are regular in time and use a hierarchical tiling system. Further details are described in Ferreira et al. (2020).
Tags: cog, computer vision, earth observation, geospatial, image processing, satellite imagery, stac, synthetic aperture radar
Open Synthetic Aperture Radar (SAR) data from Capella Space. Capella Space is an information services company that provides on-demand, industry-leading, high-resolution synthetic aperture radar (SAR) Earth observation imagery. Through a constellation of small satellites, Capella provides easy access to frequent, timely, and flexible information affecting dozens of industries worldwide. Capella's high-resolution SAR satellites are matched with unparalleled infrastructure to deliver reliable global insights that sharpen our understanding of the changing world – improving decisions ...
Tags: biology, health, image processing, imaging, life sciences, magnetic resonance imaging, neurobiology, neuroimaging
This dataset contains deidentified raw k-space data and DICOM image files of over 1,500 knees and 6,970 brains.
Tags: geospatial, geothermal, image processing, seismology
Released to the public as part of the Department of Energy's Open Energy Data Initiative, these data represent vertical and horizontal distributed acoustic sensing (DAS) data collected as part of the Poroelastic Tomography (PoroTomo) project funded in part by the Office of Energy Efficiency and Renewable Energy (EERE), U.S. Department of Energy.
Tags: autonomous racing, autonomous vehicles, computer vision, GNSS, image processing, lidar, localization, object detection, object tracking, perception, radar, robotics
The RACECAR dataset is the first open dataset for full-scale, high-speed autonomous racing. Multi-modal sensor data was collected from fully autonomous Indy race cars operating at speeds of up to 170 mph (273 km/h). Six teams who raced in the Indy Autonomous Challenge during 2021-22 contributed to this dataset. The dataset spans 11 racing scenarios across two race tracks, including solo laps, multi-agent laps, overtaking situations, high accelerations, banked tracks, obstacle avoidance, and pit entry and exit at different speeds. The data is organized and released in bot...
Tags: aerial imagery, agriculture, climate, cog, earth observation, geospatial, image processing, land cover, machine learning, satellite imagery
Global and regional Canopy Height Maps (CHM). Created using machine learning models on high-resolution worldwide Maxar satellite imagery.
Tags: biology, fluorescence imaging, image processing, imaging, life sciences, microscopy, neurobiology, neuroimaging, neuroscience
This data set, made available by Janelia's MouseLight project, consists of images and neuron annotations of the Mus musculus brain, stored in formats suitable for viewing and annotation using the HortaCloud cloud-based annotation system.
Tags: biodiversity, biology, ecosystems, image processing, multimedia, wildlife
The goal of SiPeCaM is to create a data source for evaluating changes in the state of biodiversity, considering key aspects of how ecosystems behave.
Tags: biology, cancer, computer vision, gene expression, genetic, glioblastoma, Homo sapiens, image processing, imaging, life sciences, machine learning, neurobiology
This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each tissue section is adjacent to another section that was stained with a reagent useful for identifying histological features of the tumor. Each of these types of images has been completely annotated for tumor features by a machine learning process trained by expert medical doctors.
Tags: biology, gene expression, genetic, image processing, imaging, life sciences, Mus musculus, neurobiology, transcriptomics
The Allen Mouse Brain Atlas is a genome-scale collection of cellular resolution gene expression profiles using in situ hybridization (ISH). Highly methodical data production methods and comprehensive anatomical coverage via dense, uniformly spaced sampling facilitate data consistency and comparability across >20,000 genes. The use of an inbred mouse strain with minimal animal-to-animal variance allows one to treat the brain essentially as a complex but highly reproducible three-dimensional tissue array. The entire Allen Mouse Brain Atlas dataset and associated tools are available through an...
Tags: analytics, anomaly detection, archives, computed tomography, datacenter, digital assets, electricity, energy, fluid dynamics, image processing, physics, post-processing, radiation, signal processing, source code, turbulence, video, x-ray, x-ray tomography
The Large Helical Device (LHD), owned and operated by the National Institute for Fusion Science (NIFS), is one of the world's largest plasma confinement devices, employing a heliotron magnetic configuration generated by superconducting coils. Its objectives are to conduct academic research on the confinement of steady-state, high-temperature, high-density plasmas, core plasma physics, and fusion reactor engineering, which are necessary to develop future fusion reactors. All the archived data of the LHD plasma diagnostics are available since the beginning of the LHD experiment, starte...
Tags: cancer, digital pathology, fluorescence imaging, image processing, imaging, life sciences, machine learning, microscopy, radiology
Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. IDC hosts a growing number of imaging collections contributed either by funded US National Cancer Institute (NCI) data collection activities or by individual researchers. Image data hosted by IDC is stored in DICOM format.
Tags: art, deep learning, image processing, labeled, machine learning, media
PD12M is a collection of 12.4 million CC0/PD image-caption pairs for the purpose of training generative image models.
Tags: autonomous vehicles, computer vision, deep learning, image processing, lidar, machine learning, mapping, robotics, traffic, transportation, urban, weather
The Aurora Multi-Sensor Dataset is an open, large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of Pittsburgh, PA, USA by Aurora (via Uber ATG) in collaboration with the University of Toronto. The de-identified dataset contains rich metadata, such as weather and semantic segmentation, and spans all four seasons, rain, snow, overcast and sunny days, different times of day, and a variety of traffic conditions.
The Aurora Multi-Sensor Dataset contains data from a 64-beam Velodyne HDL-64E LiDAR sensor and seven 1920x1200-pixel resolution cameras including a forward-facing stereo pair and five wide-angle lenses covering a 360-degree view around the vehicle.
This data can be used to develop and evaluate large-scale long-term approaches to autonomous vehicle localization. Its size and diversity make it suitable for a wide range of research areas such as 3D reconstruction, virtual tourism, HD map construction, and map compression, among others.
The data was first presented at the International Conference on Intelligent Robots an...
Tags: computer forensics, computer security, CSI, cyber security, digital forensics, image processing, imaging, information retrieval, internet, intrusion detection, machine learning, machine translation, text analysis
Disk images, memory dumps, network packet captures, and files for use in digital forensics research and education. All of this information is accessible through the digitalcorpora.org website and made available at s3://digitalcorpora/. Some of these datasets implement scenarios that were performed by students, faculty, and others acting in persona. As such, the information is synthetic and may be used without prior authorization or IRB approval. Details of each dataset are documented at digitalcorpora.org.
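Because the corpus lives in a public S3 bucket, any object's s3:// URI maps to a plain HTTPS URL via S3's standard virtual-hosted-style addressing. A minimal sketch of that mapping (the object key shown is hypothetical, for illustration only):

```python
from urllib.parse import urlparse

def s3_to_https(s3_uri: str) -> str:
    """Map an s3:// URI to its virtual-hosted-style HTTPS URL."""
    parsed = urlparse(s3_uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}"

# Hypothetical key, for illustration only
print(s3_to_https("s3://digitalcorpora/README.txt"))
# https://digitalcorpora.s3.amazonaws.com/README.txt
```

The same objects can also be fetched anonymously with the AWS CLI using its no-sign-request option.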
Tags: cog, computer vision, earth observation, geospatial, image processing, satellite imagery, stac
The Satellogic EarthView dataset includes high-resolution satellite images captured over all continents. The dataset is organized in Hive partition format and hosted on AWS. It can be accessed via a STAC browser or the AWS CLI. Each item of the dataset corresponds to a specific region and date, with some regions revisited for additional data. The dataset provides Top-of-Atmosphere (TOA) reflectance values across four spectral bands (Red, Green, Blue, Near-Infrared) at a Ground Sample Distance (GSD) of 1 meter, accompanied by comprehensive metadata such as off-nadir angles, sun elevation,...
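Hive partitioning encodes metadata as key=value path segments in each object key, which makes items filterable by prefix. A minimal parser sketch (the partition keys and object key below are hypothetical, not the dataset's actual layout):

```python
def parse_hive_partitions(key: str) -> dict[str, str]:
    """Extract key=value partition segments from an object key."""
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            k, v = segment.split("=", 1)
            parts[k] = v
    return parts

# Hypothetical object key, for illustration only
print(parse_hive_partitions("region=foo/date=2023-01-15/item.tif"))
# {'region': 'foo', 'date': '2023-01-15'}
```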
Tags: aerial imagery, cog, conservation, deep learning, earth observation, environmental, geospatial, image processing, land cover
Canopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Department of Agriculture (USDA).
Tags: electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing
The Allen Brain Observatory – Visual Coding is a large-scale, standardized survey of physiological activity across the mouse visual cortex, hippocampus, and thalamus. It includes datasets collected with both two-photon imaging and Neuropixels probes, two complementary techniques for measuring the activity of neurons in vivo. The two-photon imaging dataset features visually evoked calcium responses from GCaMP6-expressing neurons in a range of cortical layers, visual areas, and Cre lines. The Neuropixels dataset features spiking activity from distributed cortical and subcortical brain regions, c...
Tags: electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.
Tags: aerial imagery, demographics, disaster response, geospatial, image processing, machine learning, population, satellite imagery
Population data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines CIESIN’s Gridded Population of the World using machine learning models on high-resolution worldwide Maxar satellite imagery. CIESIN population counts aggregated from worldwide census data are allocated to blocks where imagery appears to contain buildings.
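The 1-arcsecond gridding can be illustrated with a short sketch; the block indexing below is a hypothetical scheme chosen for illustration, not necessarily the dataset's actual block identifiers:

```python
import math

ARCSEC_PER_DEG = 3600  # 1 arcsecond = 1/3600 of a degree

def block_index(lat: float, lon: float) -> tuple[int, int]:
    """Floor a coordinate to the 1-arcsecond block containing it."""
    return (math.floor(lat * ARCSEC_PER_DEG),
            math.floor(lon * ARCSEC_PER_DEG))

print(block_index(40.5, -74.25))  # (145800, -267300)
```

Each block is roughly 30 m on a side at the equator (and narrower east-west at higher latitudes), which is why building footprints detected in high-resolution imagery can be resolved to individual blocks.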
Tags: biology, cancer, computer vision, health, image processing, imaging, life sciences, machine learning, magnetic resonance imaging, medical imaging, medicine, neurobiology, neuroimaging, segmentation
This dataset contains 8,000+ brain MRIs of 2,000+ patients with brain metastases.
Tags: Homo sapiens, image processing, imaging, life sciences, magnetic resonance imaging, signal processing
OCMR is an open-access repository that provides multi-coil k-space data for cardiac cine imaging. The fully sampled MRI datasets are intended for quantitative comparison and evaluation of image reconstruction methods. The free-breathing, prospectively undersampled datasets are intended for qualitative evaluation of the performance and generalizability of those methods.
Tags: computer vision, image processing, imaging, media, movies, multimedia, video
Uncompressed video used for video compression and video processing research.
Tags: computer vision, image processing, imaging, life sciences, machine learning, magnetic resonance imaging, neuroimaging, neuroscience, nifti
Here, we collected and pre-processed a massive, high-quality 7T fMRI dataset that can be used to advance our understanding of how the brain works. A unique feature of this dataset is the massive amount of data available per individual subject. The data were acquired using ultra-high-field fMRI (7T, whole-brain, 1.8-mm resolution, 1.6-s TR). We measured fMRI responses while each of 8 participants viewed 9,000–10,000 distinct, color natural scenes (22,500–30,000 trials) in 30–40 weekly scan sessions over the course of a year. Additional measures were collected including resting-state data, retin...
Tags: image processing, machine learning
A dataset of all images from Open Food Facts, the world's largest open dataset of food products.
Tags: earth observation, geospatial, image processing, satellite imagery, stac, synthetic aperture radar
Umbra satellites generate the highest-resolution Synthetic Aperture Radar (SAR) imagery ever offered from space, up to 16 cm resolution. SAR can capture images at night and through cloud cover, smoke, and rain, and is uniquely suited to monitoring change. The Open Data Program (ODP) features over twenty diverse time-series locations that are updated frequently, allowing users to experiment with SAR's capabilities. We offer single-looked spotlight mode at 16 cm, 25 cm, 35 cm, 50 cm, or 1 m resolution, and multi-looked spotlight mode. The ODP also features an assorted collection of over ...
Tags: activity detection, agriculture, cog, disaster response, earth observation, environmental, geospatial, image processing, land cover, natural resource, satellite imagery, stac
The Venµs science mission is a joint research mission undertaken by CNES and ISA, the Israel Space Agency. It aims to demonstrate the effectiveness of high-resolution multi-temporal observation optimised through Copernicus, the global environmental and security monitoring programme. Venµs was launched from the Centre Spatial Guyanais by a VEGA rocket during the night of 1-2 August 2017. Thanks to its multispectral camera (12 spectral bands in the visible and near-infrared ranges, with spectral characteristics provided here), it acquires imagery every 1-2 days over 100+ areas at...