COVID-19 Genome Sequence Dataset

bam bioinformatics biology coronavirus COVID-19 cram fastq genetic genomic health life sciences MERS SARS STRIDES transcriptomics virus whole genome sequencing

Description

This repository, part of the ACTIV TRACE initiative, houses a comprehensive collection of datasets related to SARS-CoV-2. SARS-CoV-2 Sequence Read Archive (SRA) files are processed with a pipeline optimized to identify genetic variations in viral samples, and the results are presented in the Variant Call Format (VCF). Each VCF file corresponds to the accession ID of its parent SRA run. The data is also available in Parquet format, which makes it easier to search and filter with Amazon Athena. The SARS-CoV-2 Variant Calling Pipeline is designed to process new data every six hours, and the AWS ODP bucket is updated daily.
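
Since the variant calls are distributed as VCF, a short reader can help when inspecting a downloaded file. The following is a minimal sketch that assumes plain-text, tab-separated VCF with the eight standard fixed columns; the names `Variant` and `parse_vcf` and the record in `SAMPLE` are illustrative only and are not taken from this dataset.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    """The subset of a VCF record that this sketch keeps."""
    chrom: str
    pos: int
    ref: str
    alt: List[str]
    info: Dict[str, str]


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # "##" meta lines and the "#CHROM" header row
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        info_map: Dict[str, str] = {}
        if info != ".":
            for item in info.split(";"):
                key, _, value = item.partition("=")
                info_map[key] = value  # flag-style keys get an empty value
        yield Variant(chrom, int(pos), ref, alt.split(","), info_map)


# Illustrative record only (hypothetical values, not copied from the dataset).
SAMPLE = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=100\n"
)

for variant in parse_vcf(SAMPLE.splitlines()):
    print(variant.chrom, variant.pos, variant.ref, variant.alt, variant.info)
```

For a real file, pass an open file handle instead of `SAMPLE.splitlines()`. Compressed files would first need to be opened with a suitable decompressor such as `gzip.open(path, "rt")`.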

Update Frequency

Daily

License

NIH Genomic Data Sharing Policy

Documentation

https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/

Managed By

National Library of Medicine (NLM)

Contact

https://support.nlm.nih.gov/support/create-case/

How to Cite

COVID-19 Genome Sequence Dataset was accessed on DATE from https://registry.opendata.aws/ncbi-covid-19.

Usage Examples

Tools & Applications

Resources on AWS

  • Description: Genomic sequence reads of SARS-CoV-2 and related coronaviridae, organized by NCBI accession. Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit.
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::sra-pub-sars-cov2
    AWS Region: us-east-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://sra-pub-sars-cov2/
  • Description: Metadata for sra-pub-sars-cov2 in an Athena-queryable format
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1
    AWS Region: us-east-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://sra-pub-sars-cov2-metadata-us-east-1/
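
The same unsigned listing can also be done without the AWS CLI, because public buckets answer plain HTTPS requests. The sketch below uses only the Python standard library and the S3 ListObjectsV2 REST call. The helper names (`listing_url`, `parse_listing`, `list_bucket`) are hypothetical, and it assumes the bucket is reachable at the generic `<bucket>.s3.amazonaws.com` endpoint and permits anonymous listing, as the `--no-sign-request` commands above suggest.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 response bodies.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(bucket, prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL for a public bucket."""
    query = urllib.parse.urlencode({
        "list-type": "2",        # selects the V2 listing API
        "delimiter": "/",        # group keys into folder-like prefixes
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_body):
    """Return (folders, files) parsed from a ListObjectsV2 XML body.

    folders is a list of prefix strings; files is a list of (key, size) tuples.
    """
    root = ET.fromstring(xml_body)
    folders = [
        node.findtext("s3:Prefix", namespaces=S3_NS)
        for node in root.findall("s3:CommonPrefixes", S3_NS)
    ]
    files = [
        (
            node.findtext("s3:Key", namespaces=S3_NS),
            int(node.findtext("s3:Size", namespaces=S3_NS)),
        )
        for node in root.findall("s3:Contents", S3_NS)
    ]
    return folders, files


def list_bucket(bucket, prefix="", max_keys=20):
    """Fetch and parse one page of a public bucket listing (needs network)."""
    url = listing_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_listing(response.read())
```

Calling `list_bucket("sra-pub-sars-cov2")` should return the top-level folders and files of the sequence bucket, and passing `prefix="run/"` narrows the listing to one folder. Only one page of results is fetched; a fuller version would follow the `NextContinuationToken` value in the response to page through large folders.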
