bioinformatics biology biotech blueprint genetic genomic life sciences parquet population genetics vcf whole genome sequencing

Deprecated

Amazon is no longer hosting this Data Lakehouse Ready dataset

Description

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. Sign up for the gnomAD mailing list here. This dataset was derived from summary data from gnomAD release 3.1, available on the Registry of Open Data on AWS for ready enrollment into the Data Lake as Code.

Update Frequency

Not updated

License

MIT; terms of use

Documentation

https://github.com/aws-samples/data-lake-as-code/tree/roda#readme

Managed By

Amazon Web Services

Contact

https://github.com/aws-samples/data-lake-as-code/issues

How to Cite

Genome Aggregation Database (gnomAD) - Data Lakehouse Ready was accessed on DATE from https://registry.opendata.aws/gnomad-data-lakehouse-ready.

Usage Examples

Tutorials

Sample Queries on the 1000 Genomes, gnomAD and ClinVar data Lake by Sujaya Srinivasan
Amazon Athena, AWS Glue

Resources on AWS

Description

Parquet representations of gnomAD summary data aggregated from gnomAD release 3.1

Resource type

S3 Bucket

Amazon Resource Name (ARN)

arn:aws:s3:::aws-roda-hcls-datalake/gnomad

AWS Region

us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://aws-roda-hcls-datalake/gnomad/
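
The same anonymous listing can be sketched in Python. This is a minimal sketch, not an official client: it assumes the third-party boto3 package is installed, and the helper names parse_s3_arn and list_prefix_anonymously are hypothetical, introduced here for illustration. Because the dataset is marked as deprecated and no longer hosted, the listing call may return nothing or raise an error.

```python
from typing import List, Tuple


def parse_s3_arn(arn: str) -> Tuple[str, str]:
    """Split an S3 ARN into (bucket, key_prefix). Hypothetical helper."""
    marker = "arn:aws:s3:::"
    if not arn.startswith(marker):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Everything before the first "/" is the bucket; the rest is the key prefix.
    bucket, _, key_prefix = arn[len(marker):].partition("/")
    return bucket, key_prefix


def list_prefix_anonymously(bucket: str, key_prefix: str,
                            region: str = "us-east-1") -> List[str]:
    """List entries directly under a prefix without AWS credentials.

    Rough equivalent of `aws s3 ls --no-sign-request s3://bucket/prefix/`.
    Assumes boto3 is installed; it is imported lazily so that parse_s3_arn
    can be used without it.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # UNSIGNED disables request signing, so no AWS account is needed.
    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    response = s3.list_objects_v2(Bucket=bucket,
                                  Prefix=key_prefix.rstrip("/") + "/",
                                  Delimiter="/")
    # Sub-"directories" come back as CommonPrefixes, plain objects as Contents.
    folders = [p["Prefix"] for p in response.get("CommonPrefixes", [])]
    objects = [o["Key"] for o in response.get("Contents", [])]
    return folders + objects


# ARN and region are taken from the resource description above.
bucket, key_prefix = parse_s3_arn("arn:aws:s3:::aws-roda-hcls-datalake/gnomad")
print(f"s3://{bucket}/{key_prefix}/")  # prints s3://aws-roda-hcls-datalake/gnomad/
```

Calling list_prefix_anonymously(bucket, key_prefix) would then attempt the listing over the network. The call returns only the first page of results (up to 1,000 entries), which is enough for a quick look at the top-level layout.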