Genome Aggregation Database (gnomAD) - Data Lakehouse Ready

bioinformatics biology biotech blueprint genetic genomic life sciences parquet population genetics vcf whole genome sequencing

Description

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. You can sign up for the gnomAD mailing list. This dataset was derived from the summary data of gnomAD release 3.1 and is available on the Registry of Open Data on AWS, ready for enrollment into Data Lake as Code.

Update Frequency

Not updated

License

MIT; terms of use

Documentation

https://github.com/aws-samples/data-lake-as-code/tree/roda#readme

Managed By

Amazon Web Services

Contact

https://github.com/aws-samples/data-lake-as-code/issues

How to Cite

Genome Aggregation Database (gnomAD) - Data Lakehouse Ready was accessed on DATE from https://registry.opendata.aws/gnomad-data-lakehouse-ready.

Resources on AWS

  • Description
    Parquet representations of gnomAD summary data aggregated from gnomAD release 3.1
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::aws-roda-hcls-datalake/gnomad
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://aws-roda-hcls-datalake/gnomad/
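
The same anonymous listing can be done from Python. The sketch below assumes `boto3` is installed. The helper names `split_s3_arn` and `list_prefix` are hypothetical and not part of this dataset or Data Lake as Code. The code turns the ARN above into a bucket and prefix, then lists that prefix without credentials. Passing `botocore.UNSIGNED` as the signature version is the SDK counterpart of the CLI's `--no-sign-request` flag.

```python
"""Sketch: anonymously list s3://aws-roda-hcls-datalake/gnomad/ from Python.

Assumption: boto3 is installed; helper names are hypothetical.
"""


def split_s3_arn(arn: str) -> tuple[str, str]:
    """Split an S3 ARN like arn:aws:s3:::bucket/prefix into (bucket, prefix)."""
    marker = "arn:aws:s3:::"
    if not arn.startswith(marker):
        raise ValueError(f"not an S3 ARN: {arn}")
    bucket, _, key = arn[len(marker):].partition("/")
    # End the prefix with "/" so the listing is scoped to that "folder",
    # matching the trailing slash in the CLI example.
    prefix = key.rstrip("/") + "/" if key else ""
    return bucket, prefix


def list_prefix(bucket: str, prefix: str, region: str = "us-east-1") -> list[str]:
    """Return sub-prefixes and object keys directly under s3://bucket/prefix."""
    # Imported lazily so split_s3_arn works even where boto3 is absent.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests: no AWS account or credentials are needed.
    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    entries: list[str] = []
    # Delimiter="/" groups deeper keys into CommonPrefixes, like `aws s3 ls`.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        entries.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
        entries.extend(obj["Key"] for obj in page.get("Contents", []))
    return entries
```

For example, `list_prefix(*split_s3_arn("arn:aws:s3:::aws-roda-hcls-datalake/gnomad"))` should return roughly what the CLI command above prints, as full keys rather than the CLI's relative names.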