bam bioinformatics biology coronavirus COVID-19 cram fastq genetic genomic health life sciences MERS SARS STRIDES transcriptomics virus whole genome sequencing
A centralized sequence repository for all records containing sequences associated with the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Included are both the original sequences submitted by the principal investigator and SRA-processed sequences that require the SRA Toolkit for analysis. Additionally, submitter-provided metadata included in the associated BioSample and BioProject records is available alongside NCBI-calculated data, such as k-mer-based taxonomy analysis results, contiguous assemblies (contigs) and associated statistics such as contig length, BLAST results for the assembled contigs, contig annotation, BLAST databases of contigs and their annotated peptides, and VCF files generated for each record relative to the SARS-CoV-2 RefSeq record. Finally, metadata is also made available in Parquet format to facilitate search and filtering using the AWS Athena service.
Hourly
NIH Genomic Data Sharing Policy
https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/
National Library of Medicine (NLM)
https://support.nlm.nih.gov/support/create-case/
COVID-19 Genome Sequence Dataset was accessed on DATE from https://registry.opendata.aws/ncbi-covid-19.
Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit.
arn:aws:s3:::sra-pub-sars-cov2
us-east-1
aws s3 ls --no-sign-request s3://sra-pub-sars-cov2/
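The listing command above can be narrowed to one of the two folders described earlier. The following is a minimal sketch, assuming the sra-src and run folders sit at the bucket root; the helper names sra_prefix_uri and list_sra_folder are hypothetical and not part of the AWS CLI or the SRA Toolkit.

```shell
#!/usr/bin/env bash
# Bucket name as listed on this page.
BUCKET="sra-pub-sars-cov2"

# Build the S3 URI for a top-level folder in the bucket.
# Assumption: "sra-src" and "run" are prefixes at the bucket root.
sra_prefix_uri() {
  printf 's3://%s/%s/\n' "$BUCKET" "$1"
}

# List one folder anonymously, using the same flags as the command above.
# Requires the AWS CLI and network access.
list_sra_folder() {
  aws s3 ls --no-sign-request "$(sra_prefix_uri "$1")"
}
```

After sourcing the snippet, `list_sra_folder sra-src` would list original-format submissions and `list_sra_folder run` would list the .sra files, provided the assumed layout holds.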
arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1
us-east-1
aws s3 ls --no-sign-request s3://sra-pub-sars-cov2-metadata-us-east-1/
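Each bucket on this page is given both as an ARN and as an s3:// URI. As a small illustration of how the two forms relate, the sketch below derives the URI from an S3 bucket ARN; the function name arn_to_s3_uri is hypothetical and is not an AWS CLI command.

```shell
#!/usr/bin/env bash
# Convert an S3 bucket ARN (arn:aws:s3:::<bucket>) into an s3:// URI.
# Prints an error and returns non-zero for anything that is not an S3 bucket ARN.
arn_to_s3_uri() {
  local arn="$1"
  case "$arn" in
    arn:aws:s3:::*)
      # Strip the fixed ARN prefix, leaving only the bucket name.
      printf 's3://%s/\n' "${arn#arn:aws:s3:::}"
      ;;
    *)
      echo "not an S3 bucket ARN: $arn" >&2
      return 1
      ;;
  esac
}
```

The result can be passed straight to the listing command, for example `aws s3 ls --no-sign-request "$(arn_to_s3_uri arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1)"`.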