Amazon Web Services is collaborating with the Allen Institute to share open datasets to enable faster breakthroughs in human health.
Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
biology, cell biology, cell imaging, Homo sapiens, image processing, life sciences, machine learning, microscopy
This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below:
biology, cancer, computer vision, gene expression, genetic, glioblastoma, Homo sapiens, image processing, imaging, life sciences, machine learning, neurobiology
This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each tissue section is adjacent to another section that was stained with a reagent useful for identifying histological features of the tumor. Each of these types of images has been completely annotated for tumor features by a machine learning process trained by expert medical doctors.
biology, gene expression, genetic, image processing, imaging, life sciences, Mus musculus, neurobiology, transcriptomics
The Allen Mouse Brain Atlas is a genome-scale collection of cellular resolution gene expression profiles using in situ hybridization (ISH). Highly methodical data production methods and comprehensive anatomical coverage via dense, uniformly spaced sampling facilitate data consistency and comparability across >20,000 genes. The use of an inbred mouse strain with minimal animal-to-animal variance allows one to treat the brain essentially as a complex but highly reproducible three-dimensional tissue array. The entire Allen Mouse Brain Atlas dataset and associated tools are available through an...
biology, gene expression, Homo sapiens, life sciences, Mus musculus, neurobiology, non-human primate, single-cell transcriptomics
The Human and Mammalian Brain Atlas (HMBA) is a major atlas of the BRAIN Initiative Cell Atlas Network (BICAN). It aims to establish a comprehensive, highly granular cell atlas of complete adult human, macaque, and marmoset brains that links brain structure, function, and cellular architecture. Release artifacts are available in this OpenData bucket so that the neuroscience community can use them alongside the associated publications.
electrophysiology, Homo sapiens, life sciences, Mus musculus, neurobiology, signal processing
This is a large-scale survey that describes the physiology (strength, kinetics, and short-term plasticity) of thousands of synapses from patch-clamp experiments in mouse visual cortex and human middle temporal gyrus.
electrophysiology, life sciences, Mus musculus, neurobiology, signal processing
Extracellular electrophysiology data is growing at a remarkable pace. This data, collected with Neuropixels probes by the Allen Institute and the International Brain Lab, can be used to benchmark throughput rates and storage ratios of various data compression algorithms.
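As a rough illustration of what such a benchmark measures, the sketch below times a few general-purpose codecs from the Python standard library and reports a compression ratio (original size divided by compressed size) and a compression throughput. Everything in it is an assumption made for illustration: the input is a synthetic int16 random walk rather than a real recording, the helper names `make_synthetic_trace` and `benchmark` are hypothetical, and the codecs shown are not necessarily the ones evaluated with this dataset.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def make_synthetic_trace(n_samples=50_000, seed=0):
    """Return little-endian int16 bytes of a bounded random walk.

    This is a stand-in for a raw recording channel, not real probe data.
    """
    rng = random.Random(seed)
    value = 0
    samples = []
    for _ in range(n_samples):
        value += rng.randint(-8, 8)
        value = max(-32768, min(32767, value))  # clamp to the int16 range
        samples.append(value)
    return struct.pack(f"<{n_samples}h", *samples)


def benchmark(data, codecs):
    """Measure compression ratio and compression throughput for each codec."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Lossless round trip: the decompressed bytes must match the input.
        assert decompress(packed) == data
        results[name] = {
            "ratio": len(data) / len(packed),
            "throughput_mb_s": (len(data) / 1e6 / elapsed) if elapsed > 0 else float("inf"),
        }
    return results


# Standard-library codecs only; chosen for availability, not because the
# dataset authors used them.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

if __name__ == "__main__":
    trace = make_synthetic_trace()
    for name, stats in benchmark(trace, CODECS).items():
        print(f"{name}: ratio={stats['ratio']:.2f}, "
              f"throughput={stats['throughput_mb_s']:.1f} MB/s")
```

Timings from a single short run like this are noisy; a more careful comparison would repeat each measurement and use real recordings.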
electrophysiology, life sciences, Mus musculus, neurobiology, signal processing
Evaluation of spike sorting methods is a challenging task, as it requires both ground-truth data and a variety of sorting algorithms to compare against. This dataset contains a set of hybrid data specifically designed for benchmarking spike sorting methods.
electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing
The Allen Brain Observatory – Visual Coding is a large-scale, standardized survey of physiological activity across the mouse visual cortex, hippocampus, and thalamus. It includes datasets collected with both two-photon imaging and Neuropixels probes, two complementary techniques for measuring the activity of neurons in vivo. The two-photon imaging dataset features visually evoked calcium responses from GCaMP6-expressing neurons in a range of cortical layers, visual areas, and Cre lines. The Neuropixels dataset features spiking activity from distributed cortical and subcortical brain regions, c...
electrophysiology, image processing, imaging, life sciences, Mus musculus, neurobiology, neuroimaging, signal processing
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.
biology, cell biology, cell imaging, epigenomics, gene expression, histopathology, Homo sapiens, imaging, life sciences, medicine, microscopy, neurobiology, neuroscience, single-cell transcriptomics, transcriptomics
The Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) consortium strives to gain a deep molecular and cellular understanding of the early pathogenesis of Alzheimer's disease and is funded by the National Institute on Aging (NIA U19AG060909). The SEA-AD datasets available here comprise single-cell profiling (transcriptomics and epigenomics) and quantitative neuropathology. To explore gene expression and chromatin accessibility information, the single-cell profiling data includes: snRNA-seq and snATAC-seq data from the SEA-AD donor cohort (aged brains which span the spectrum of Alzhe...