Genome Aggregation Database (gnomAD) - Data Lakehouse Ready

bioinformatics biology biotech blueprint genetic genomic life sciences parquet population genetics vcf whole genome sequencing


The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. This dataset was derived from the summary data of gnomAD release 3.1, which is available on the Registry of Open Data on AWS, and is ready to be enrolled into the Data Lake as Code.

Update Frequency

Not updated


License

MIT; terms of use


Managed By

Amazon Web Services


How to Cite

Genome Aggregation Database (gnomAD) - Data Lakehouse Ready was accessed on DATE from



Resources on AWS

  • Description: Parquet representations of gnomAD summary data aggregated from gnomAD release 3.1
    Resource type: S3 Bucket
    Amazon Resource Name (ARN):
    AWS Region:
    AWS CLI Access (No AWS account required):
      aws s3 ls s3://aws-roda-hcls-datalake/gnomad/ --no-sign-request
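The CLI command above can also be issued from Python. The following is a minimal sketch, assuming boto3/botocore are installed and S3 is reachable; `split_s3_uri` and `list_public_prefix` are hypothetical helper names, not part of this dataset entry. It uses an unsigned client (`signature_version=UNSIGNED`), the SDK counterpart of `--no-sign-request`, and a `/` delimiter so that, like `aws s3 ls` without `--recursive`, it returns only the immediate children of the prefix. Because this entry does not state the bucket's AWS Region, the sketch leaves the region to the SDK's default configuration; pass `region_name` to `boto3.client` if your environment needs it.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_public_prefix(uri: str) -> list[str]:
    """List the immediate children of an S3 prefix without AWS credentials.

    Hypothetical helper mirroring:
        aws s3 ls s3://aws-roda-hcls-datalake/gnomad/ --no-sign-request
    """
    # Imported lazily so split_s3_uri can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, prefix = split_s3_uri(uri)
    # Unsigned (anonymous) requests: the SDK equivalent of --no-sign-request.
    # No region_name is passed; the SDK's default configuration applies.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")

    entries: list[str] = []
    # Delimiter="/" groups deeper keys into CommonPrefixes, much as
    # `aws s3 ls` without --recursive shows "PRE" lines for sub-directories.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        entries.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        entries.extend(obj["Key"] for obj in page.get("Contents", []))
    return entries
```

For example, `list_public_prefix("s3://aws-roda-hcls-datalake/gnomad/")` should return the same top-level entries the CLI command prints, as full key prefixes rather than the shortened names `aws s3 ls` displays.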
