GATK Structural Variation (SV) Data

bioinformatics, biology, cromwell, gatk-sv, genetic, genomic, life sciences, structural variation

Description

This dataset holds the data needed to run a structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data in AWS.

Update Frequency

Every 3 months

License

https://github.com/LokaHQ/aws-open-data/blob/main/LICENSE

Documentation

https://github.com/LokaHQ/aws-open-data/blob/main/gatk-sv/documentation.md

Managed By

Loka Inc.

Contact

awsdata@loka.com

How to Cite

GATK Structural Variation (SV) Data was accessed on DATE from https://registry.opendata.aws/gatk-sv-data.

Resources on AWS

  • Description
    This dataset contains, among other things, the following data:
      • Illumina short-read whole-genome CRAMs or BAMs, aligned to hg38 with bwa-mem. BAMs must also be indexed.
      • Indexed GVCFs produced by GATK HaplotypeCaller, or a jointly genotyped VCF.
      • Family structure definitions file in PED format.
      • Reference files from the human reference genome hg38.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::gatk-sv-data-us-east-2
    AWS Region
    us-east-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://gatk-sv-data-us-east-2/
  • Description
    This dataset contains, among other things, the following data:
      • Illumina short-read whole-genome CRAMs or BAMs, aligned to hg38 with bwa-mem. BAMs must also be indexed.
      • Indexed GVCFs produced by GATK HaplotypeCaller, or a jointly genotyped VCF.
      • Family structure definitions file in PED format.
      • Reference files from the human reference genome hg38.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::gatk-sv-data-us-east-1
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://gatk-sv-data-us-east-1/
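
For readers who prefer to browse the buckets without installing the AWS CLI, the anonymous listing above can be approximated with a plain HTTPS request to the S3 ListObjectsV2 REST endpoint. The following is a minimal sketch using only the Python standard library. It assumes the buckets allow unsigned list requests (as the `--no-sign-request` commands suggest); the function names `build_list_url`, `parse_listing`, and `list_bucket` are illustrative and not part of any AWS SDK.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, region, prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urllib.parse.urlencode(
        {
            "list-type": "2",   # selects the ListObjectsV2 API
            "delimiter": "/",   # group keys into "folders"
            "max-keys": str(max_keys),
            "prefix": prefix,
        }
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (folders, objects) from a ListObjectsV2 XML response.

    folders is a list of common prefixes; objects is a list of
    (key, size_in_bytes) tuples.
    """
    root = ET.fromstring(xml_text)
    folders = [
        p.text for p in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    objects = [
        (c.findtext(f"{S3_NS}Key"), int(c.findtext(f"{S3_NS}Size")))
        for c in root.findall(f"{S3_NS}Contents")
    ]
    return folders, objects


def list_bucket(bucket, region, prefix=""):
    """Fetch and parse one page of an anonymous bucket listing."""
    url = build_list_url(bucket, region, prefix)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_listing(resp.read())
```

Under these assumptions, `list_bucket("gatk-sv-data-us-east-1", "us-east-1")` would return the top-level prefixes and objects of the us-east-1 bucket, and the same call with `"gatk-sv-data-us-east-2"` and `"us-east-2"` would cover the other one. Only the first page of results is returned; larger listings would need the continuation token from the response.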