
Genome Aggregation Database (gnomAD) - Data Lakehouse Ready

bioinformatics biology biotech blueprint genetic genomic life sciences parquet population genetics vcf whole genome sequencing


Amazon is no longer hosting this Data Lakehouse Ready dataset


The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. This dataset was derived from summary data from gnomAD release 3.1, which is available on the Registry of Open Data on AWS, and is prepared for ready enrollment into the Data Lake as Code.

Update Frequency

Not updated


License

MIT; terms of use


Managed By

See all datasets managed by Amazon Web Services.


How to Cite

Genome Aggregation Database (gnomAD) - Data Lakehouse Ready was accessed on DATE from

Usage Examples


Resources on AWS

  • Description
    Parquet representations of gnomAD summary data aggregated from gnomAD release 3.1
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    AWS Region
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://aws-roda-hcls-datalake/gnomad/
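
The same anonymous listing can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: it assumes the bucket is reachable through the generic virtual-hosted S3 endpoint (the region is not stated on this page), and the helper names `build_list_url` and `parse_listing` are hypothetical. Because Amazon is no longer hosting this dataset, the request itself may fail; the URL construction and XML parsing are the parts that carry over to other public buckets.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, prefix, delimiter="/"):
    """Build an unsigned ListObjectsV2 URL (hypothetical helper).

    Assumes the generic virtual-hosted endpoint; a regional endpoint
    may be needed depending on where the bucket lives.
    """
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, object_keys) from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_data)
    prefixes = [
        e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


if __name__ == "__main__":
    url = build_list_url("aws-roda-hcls-datalake", "gnomad/")
    try:
        # No credentials are sent, mirroring --no-sign-request.
        with urllib.request.urlopen(url, timeout=30) as resp:
            prefixes, keys = parse_listing(resp.read())
        for name in prefixes + keys:
            print(name)
    except OSError as exc:
        # Expected if the dataset is no longer hosted or the endpoint differs.
        print(f"Listing failed: {exc}")
```

Only the first page of results is read here; a bucket with more than 1,000 entries under the prefix would require following the continuation token returned in the response.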
