COVID-19 Genome Sequence Dataset

bam, bioinformatics, biology, coronavirus, COVID-19, cram, fastq, genetic, genomic, health, life sciences, MERS, SARS, STRIDES, transcriptomics, virus, whole genome sequencing

Description

A centralized sequence repository for all strains of the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI). It includes both the original sequences submitted by the principal investigator and SRA-processed sequences, which require the SRA Toolkit for analysis.

Update Frequency

Hourly

License

NIH Genomic Data Sharing Policy

Documentation

https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/

Managed By

National Library of Medicine (NLM)

Contact

https://support.nlm.nih.gov/support/create-case/

Resources on AWS

  • Description
    Genomic sequence reads of SARS-CoV-2 and related Coronaviridae, organized by NCBI accession. Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-sars-cov2
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://sra-pub-sars-cov2/ --no-sign-request
  • Description
    Metadata for sra-pub-sars-cov2 in an Athena-queryable format
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://sra-pub-sars-cov2-metadata-us-east-1/ --no-sign-request
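
The listing commands above can be extended to a single accession. The following is a minimal sketch, not an official NCBI or AWS script. It assumes that objects sit under <folder>/<accession>/ inside the bucket, which the description ("organized by NCBI accession") suggests but does not spell out, so confirm the layout with a plain aws s3 ls first. The function names sra_prefix, sra_list, and sra_fetch are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: anonymous access to the sra-pub-sars-cov2 bucket.
# ASSUMPTION: keys follow <folder>/<accession>/...; check with `aws s3 ls` first.

BUCKET="sra-pub-sars-cov2"

# Print the S3 prefix for one accession in one folder ("run" or "sra-src").
sra_prefix() {
  local folder="$1" accession="$2"
  case "$folder" in
    run|sra-src) ;;
    *) echo "unknown folder: $folder" >&2; return 1 ;;
  esac
  printf 's3://%s/%s/%s/\n' "$BUCKET" "$folder" "$accession"
}

# List the objects stored for an accession; no AWS account is needed.
sra_list() {
  local prefix
  prefix="$(sra_prefix "$1" "$2")" || return 1
  aws s3 ls "$prefix" --no-sign-request
}

# Copy every object for an accession into a local directory.
sra_fetch() {
  local prefix
  prefix="$(sra_prefix "$1" "$2")" || return 1
  aws s3 cp "$prefix" "$3" --recursive --no-sign-request
}
```

After sourcing the file, sra_list run SRR0000000 would list the .sra objects for that accession and sra_fetch sra-src SRR0000000 ./reads would download the original submission files; SRR0000000 is a placeholder, not a real accession. Files fetched from the run folder are in .sra format, so they would still need an SRA Toolkit tool such as fasterq-dump before downstream analysis.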
