COVID-19 Genome Sequence Dataset

bam bioinformatics biology coronavirus COVID-19 cram fastq genetic genomic health life sciences MERS SARS STRIDES transcriptomics virus whole genome sequencing

Description

A centralized sequence repository for all records containing sequence data associated with the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Included are both the original sequences submitted by the principal investigator and SRA-processed sequences, which require the SRA Toolkit for analysis. Additionally, submitter-provided metadata from the associated BioSample and BioProject records is available alongside NCBI-calculated data: k-mer-based taxonomy analysis results, contiguous assemblies (contigs) with associated statistics such as contig length, BLAST results for the assembled contigs, contig annotation, BLAST databases of contigs and their annotated peptides, and VCF files generated for each record relative to the SARS-CoV-2 RefSeq record. Finally, metadata is also made available in Parquet format to facilitate searching and filtering with Amazon Athena.

Update Frequency

Hourly

License

NIH Genomic Data Sharing Policy

Documentation

https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/

Managed By

National Library of Medicine (NLM)


Contact

https://support.nlm.nih.gov/support/create-case/

Resources on AWS

  • Description
    Genomic sequence reads of SARS-CoV-2 and related Coronaviridae, organized by NCBI accession. Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-sars-cov2
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://sra-pub-sars-cov2/ --no-sign-request
  • Description
    Metadata for sra-pub-sars-cov2 in an Athena-queryable format
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1
    AWS Region
    us-east-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://sra-pub-sars-cov2-metadata-us-east-1/ --no-sign-request
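Both buckets above can be listed anonymously with the AWS CLI. For scripts where the CLI or an AWS SDK is not available, the following is a minimal sketch, assuming the buckets permit unsigned listing (as the `--no-sign-request` commands imply) and that the virtual-hosted endpoint `https://<bucket>.s3.amazonaws.com/` is reachable. It calls the S3 ListObjectsV2 REST API with plain HTTPS GET requests and parses the XML response. The helper names `list_url`, `parse_listing`, and `list_bucket` are hypothetical and not part of any NCBI or AWS tooling.

```python
"""Sketch: anonymous listing of a public S3 bucket over HTTPS, roughly
equivalent to `aws s3 ls s3://<bucket>/ --no-sign-request`.
Uses only the Python standard library (no AWS SDK or credentials)."""

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket, prefix="", delimiter="/", token=None, max_keys=1000):
    """Build an unsigned ListObjectsV2 request URL (hypothetical helper).

    Assumes the virtual-hosted endpoint https://<bucket>.s3.amazonaws.com/
    works for these us-east-1 buckets.
    """
    params = {
        "list-type": "2",           # selects the ListObjectsV2 API
        "prefix": prefix,           # e.g. "run/" or "sra-src/"
        "delimiter": delimiter,     # "/" groups keys into folder-like prefixes
        "max-keys": str(max_keys),  # page size; S3 returns at most 1000
    }
    if token:
        params["continuation-token"] = token
    return f"https://{bucket}.s3.amazonaws.com/?" + urllib.parse.urlencode(params)


def parse_listing(xml_bytes):
    """Parse one ListObjectsV2 page into (folders, objects, next_token).

    folders:    common prefixes (shown as "PRE" lines by `aws s3 ls`)
    objects:    (key, size_in_bytes) tuples
    next_token: continuation token, or None when the listing is complete
    """
    root = ET.fromstring(xml_bytes)
    folders = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    objects = [
        (c.findtext("s3:Key", namespaces=S3_NS),
         int(c.findtext("s3:Size", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", namespaces=S3_NS) == "true"
    next_token = (root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
                  if truncated else None)
    return folders, objects, next_token


def list_bucket(bucket, prefix="", delimiter="/"):
    """Yield ("PRE", prefix, None) and ("OBJ", key, size) entries,
    following continuation tokens until the listing is exhausted."""
    token = None
    while True:
        with urllib.request.urlopen(list_url(bucket, prefix, delimiter, token)) as resp:
            folders, objects, token = parse_listing(resp.read())
        for folder in folders:
            yield ("PRE", folder, None)
        for key, size in objects:
            yield ("OBJ", key, size)
        if token is None:
            return
```

For example, `list_bucket("sra-pub-sars-cov2")` would enumerate the top level of the read bucket, and `list_bucket("sra-pub-sars-cov2", prefix="run/")` would presumably descend into the run folder described above (the exact key layout below each folder is not specified here). Because `list_bucket` is a generator that keeps following continuation tokens, wrapping it in `itertools.islice` is a simple way to inspect only the first few entries of a large folder. Downloading or analyzing the .sra files themselves still requires the SRA Toolkit.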
