Genome Aggregation Database (gnomAD) - Data Lakehouse ready

bioinformatics, biology, biotech, blueprint, genetic, genomic, life sciences, parquet, population genetics, vcf, whole genome sequencing

Description

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. gnomAD also has a mailing list that users can sign up for. This dataset was derived from summary data from gnomAD release 3.1, available on the Registry of Open Data on AWS for ready enrollment into the Data Lake as Code.

Update Frequency

Not updated

License

MIT; terms of use

Documentation

https://github.com/aws-samples/data-lake-as-code/tree/roda#readme

Managed By

Amazon Web Services

Contact

https://github.com/aws-samples/data-lake-as-code/issues

Usage Examples

Tutorials

Resources on AWS

  • Description
    Parquet representations of gnomAD summary data aggregated from gnomAD release 3.1
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::aws-roda-hcls-datalake/gnomad
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://aws-roda-hcls-datalake/gnomad/ --no-sign-request
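
The CLI listing above can also be approximated from Python without the AWS CLI or an AWS account. The following is a minimal sketch using only the standard library. It assumes the bucket answers unsigned S3 ListObjectsV2 requests over HTTPS at the virtual-hosted endpoint (which is what `--no-sign-request` suggests, but is not confirmed by this page). The function names are hypothetical helpers, not part of any AWS SDK, and only the first page of results is read.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket and prefix are taken from the ARN and CLI command above.
BUCKET = "aws-roda-hcls-datalake"
PREFIX = "gnomad/"

# XML namespace used in S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket=BUCKET, prefix=PREFIX, delimiter="/"):
    """Build an unsigned ListObjectsV2 URL (assumed endpoint form)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, object_keys) from a ListBucketResult document.

    Accepts str or bytes. Sub-prefixes play the role of the "PRE" lines
    printed by `aws s3 ls`.
    """
    root = ET.fromstring(xml_data)
    prefixes = [
        e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def fetch_listing(url=None, timeout=30):
    """Fetch and parse one page of the listing (requires network access)."""
    with urllib.request.urlopen(url or build_list_url(), timeout=timeout) as resp:
        return parse_listing(resp.read())
```

Calling `fetch_listing()` returns the sub-prefixes and object keys found directly under `gnomad/`. Larger listings are paginated by S3, so a complete walk would need to follow the continuation token, which this sketch omits.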
