bam, bioinformatics, biology, cram, genetic, genomic, genotyping, life sciences, machine learning, population genetics, short read sequencing, structural variation, tertiary analysis, variant annotation, whole genome sequencing
Overview
This dataset contains alignment files and variant call files generated from the 1000 Genomes Project (1KGP) Phase 3 dataset (3,202 individuals, 602 trios) using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software. The variant call files cover small variants (single nucleotide variants (SNVs) and indels), copy number variants (CNVs), short tandem repeats (STRs, i.e., repeat expansions), structural variants (SVs), and other variant types. All DRAGEN analyses were performed in the cloud using the Illumina Connected Analytics bioinformatics platform powered by Amazon Web Services (see 'Data solution empowering population genomics' for more information). The v3.7.6, v4.2.7, and v4.4.7 datasets include results from trio small variant, de novo structural variant, and de novo copy number variant calling on 602 trio families composed of members of the 1KGP Phase 3 dataset. Trio repeat expansion calling was included in the v3.7.6 dataset only. Joint cohort analysis was also performed on the entire 1KGP sample dataset for the v3.7.6, v4.0.3, v4.2.7, and v4.4.7 re-analyses using DRAGEN Iterative gVCF Genotyper v3.8.3, v4.2.0, v4.2.7, and v4.4.7, respectively (see 'Genotyping variants at population scale using DRAGEN gVCF Genotyper' and 'Population Genotyping').
DRAGEN Versions
Starting with the v4.0.3 reanalysis, annotation using Illumina Connected Annotations (also known as Illumina Annotation Engine or Nirvana) was included as part of the analysis (see the Illumina Connected Annotations documentation for more information). For the v4.0.3, v4.2.7, and v4.4.7 datasets, annotation was performed on the merged small variant VCF generated by the DRAGEN Iterative gVCF Genotyper for the entire 1KGP cohort. For v4.2.7 and v4.4.7, annotation was also performed on the merged CNV, SV, and STR VCFs for the entire cohort.
Files may be updated following changes to the 1000 Genomes Project dataset or the release of select new DRAGEN features or offerings.
TBD
See all datasets managed by Illumina, Inc.
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4 was accessed on DATE from https://registry.opendata.aws/ilmn-dragen-1kgp.
ARN: arn:aws:s3:::1000genomes-dragen
Region: us-west-2
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen/

ARN: arn:aws:s3:::1000genomes-dragen-3.7.6
Region: us-west-2
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen-3.7.6/

ARN: arn:aws:s3:::1000genomes-dragen-v3.7.6
Region: us-east-1
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen-v3.7.6/

ARN: arn:aws:s3:::1000genomes-dragen-v4.0.3
Region: us-east-1
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen-v4.0.3/

ARN: arn:aws:s3:::1000genomes-dragen-v4-2-7
Region: us-east-1
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen-v4-2-7/

ARN: arn:aws:s3:::1000genomes-dragen-v4-4-7
Region: us-east-1
AWS CLI: aws s3 ls --no-sign-request s3://1000genomes-dragen-v4-4-7/
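The six buckets listed above can be walked in one pass. The following is a minimal sketch, assuming the AWS CLI is installed; the bucket names and regions are copied from the resource list, while the helper names region_for_bucket and list_cmd and the RUN switch are hypothetical and exist only for this illustration. By default the script only prints each listing command; setting RUN=1 executes them, which requires network access.

```shell
#!/usr/bin/env bash
# Sketch: print (or, with RUN=1, execute) the anonymous listing command
# for every bucket in the resource list. Helper names are illustrative.

BUCKETS="1000genomes-dragen 1000genomes-dragen-3.7.6 1000genomes-dragen-v3.7.6 1000genomes-dragen-v4.0.3 1000genomes-dragen-v4-2-7 1000genomes-dragen-v4-4-7"

# Map a bucket name to the region given for it in the resource list.
region_for_bucket() {
  case "$1" in
    1000genomes-dragen|1000genomes-dragen-3.7.6) echo "us-west-2" ;;
    1000genomes-dragen-v*) echo "us-east-1" ;;
    *) return 1 ;;  # unknown bucket
  esac
}

# Build the listing command for one bucket, pinning its region explicitly.
list_cmd() {
  region="$(region_for_bucket "$1")" || return 1
  echo "aws s3 ls --no-sign-request --region ${region} s3://$1/"
}

for bucket in $BUCKETS; do
  cmd="$(list_cmd "$bucket")" || continue
  if [ "${RUN:-0}" = "1" ]; then
    $cmd          # run the listing (no AWS account needed)
  else
    echo "$cmd"   # dry run: show the command only
  fi
done
```

Passing --region explicitly is a design choice rather than a requirement stated by the dataset; it keeps the command independent of whatever default region is configured locally.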