Somatic Mosaicism across Human Tissues (SMaHT)

bam bioinformatics biology genetic genomic imaging life sciences whole genome sequencing

Description

The Somatic Mosaicism across Human Tissues (SMaHT) project is an NIH Common Fund consortium (2023-) that aims to comprehensively characterize somatic variation ("mosaicism") in normal human tissues. While most genetic studies have relied on blood-derived DNA, SMaHT captures the full spectrum of DNA variation across cell types, tissues, and organs from phenotypically normal individuals to better understand the role of somatic mosaicism in human development, aging, and disease progression.

Researchers in the consortium develop and apply experimental and computational methods, paired with state-of-the-art sequencing technologies, to accurately detect even rare mutations (frequency < 1%) in subpopulations of cells. In addition to generating the production data across ~20 tissue types from 150 post-mortem donors, SMaHT also produces datasets from cell line and tissue homogenate samples, which are used to benchmark and develop new technologies and computational tools for mosaic variant detection.

The resulting data include high-coverage whole-genome and transcriptome data generated with both short-read and long-read sequencing technologies from multiple platforms (e.g., Illumina, PacBio, Oxford Nanopore Technologies, Ultima Genomics). SMaHT will also generate comprehensive genome-wide catalogs of somatic variants. We anticipate that this resource will be valuable not only for researchers studying somatic mosaicism, but also for the broader scientific community interested in large-scale WGS data from normal human tissues.

More about the SMaHT project: program announcement, https://commonfund.nih.gov/smaht, and https://smaht.org/. More about the data portal: https://data.smaht.org/. Types of data generated: https://data.smaht.org/about/consortium/data

Update Frequency

Bi-annually

License

NIH Genomic Data Sharing Policy - https://gdc.cancer.gov/access-data/data-access-policies

Documentation

https://data.smaht.org/docs

Managed By

SMaHT Data Analysis Center (DAC)

Contact

smhelp@hms-dbmi.atlassian.net

How to Cite

Somatic Mosaicism across Human Tissues (SMaHT) was accessed on DATE from https://registry.opendata.aws/smaht. The SMaHT datasets were generated as part of the NIH Common Fund consortium initiative, Somatic Mosaicism across Human Tissues (SMaHT). The SMaHT datasets are submitted under dbGaP studies (http://www.ncbi.nlm.nih.gov/gap), with study accession numbers phs004193 for the SMaHT Benchmarking data and phs004194 for the SMaHT Production data. The datasets were provided by the SMaHT Data Analysis Center (DAC) [1UM1DA058230] on behalf of the SMaHT Network. More information about the SMaHT Network is available online at https://smaht.org/, about the SMaHT Data Portal at https://data.smaht.org/, and about the types of data generated by the Network at https://data.smaht.org/about/consortium/data.

Resources on AWS

  • Description
    SMaHT Open-Access Data - Publicly available data files without restriction, including aligned reads from WGS and RNA-Seq, as well as variants identified from cell line samples that are commercially available without restriction. Somatic (non-inherited) variants from donor tissue samples are also open-access data.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::smaht-open-data-public
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://smaht-open-data-public/
  • Description
    SMaHT Controlled Access Data - Controlled-access data files, including aligned reads from WGS and RNA-Seq, as well as germline (inherited) variants from donor tissue samples. Access to these data is managed through dbGaP.
    Resource type
    S3 Bucket Controlled Access
    Amazon Resource Name (ARN)
    arn:aws:s3:::smaht-open-data-protected
    AWS Region
    us-east-1
  • Description
    Amazon SNS topic that publishes notifications when public access data is added for this dataset.
    Resource type
    SNS Topic
    Amazon Resource Name (ARN)
    arn:aws:sns:us-east-1:874962955096:smaht-open-data-public-object_created
    AWS Region
    us-east-1
  • Description
    Amazon SNS topic that publishes notifications when new controlled access data is added for this dataset.
    Resource type
    SNS Topic
    Amazon Resource Name (ARN)
    arn:aws:sns:us-east-1:874962955096:smaht-open-data-protected-object_created
    AWS Region
    us-east-1
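
The open-access bucket above can also be listed without the AWS CLI. The following is a minimal sketch, assuming the bucket allows anonymous listing over the standard S3 REST endpoint (as the `--no-sign-request` command above suggests). It uses only the Python standard library; the helper names `build_list_url`, `parse_listing`, and `list_public_objects` are illustrative and are not part of any SMaHT or AWS tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket name and region are taken from the resource list above.
BUCKET = "smaht-open-data-public"
REGION = "us-east-1"

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL for the open-access bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_body):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_body)
    results = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", namespaces=S3_NS))
        results.append((key, size))
    return results


def list_public_objects(prefix="", max_keys=20):
    """Fetch one page of the listing anonymously (requires network access)."""
    with urllib.request.urlopen(build_list_url(prefix, max_keys)) as response:
        return parse_listing(response.read())
```

Calling `list_public_objects(max_keys=5)` should return up to five `(key, size)` pairs from the top of the bucket. This sketch reads a single page only; larger listings would need to follow the continuation token in the response. The controlled-access bucket is not reachable this way, since access to it is managed through dbGaP.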
