bam bioinformatics biology coronavirus COVID-19 cram fastq genetic genomic health life sciences MERS SARS STRIDES transcriptomics virus whole genome sequencing
This repository, part of the ACTIV TRACE initiative, houses a comprehensive collection of datasets related to SARS-CoV-2. SARS-CoV-2 Sequence Read Archive (SRA) files are processed with a pipeline optimized to identify genetic variations in viral samples, and the results are presented in the Variant Call Format (VCF). Each VCF file is named for the accession ID of its parent SRA run. The data is also available in Parquet format, which makes it easier to search and filter with Amazon Athena. The SARS-CoV-2 Variant Calling Pipeline is designed to process new data every six hours, and the AWS ODP bucket is updated daily.
Daily
NIH Genomic Data Sharing Policy
https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/
National Library of Medicine (NLM)
https://support.nlm.nih.gov/support/create-case/
COVID-19 Genome Sequence Dataset was accessed on DATE from https://registry.opendata.aws/ncbi-covid-19.
Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit.
arn:aws:s3:::sra-pub-sars-cov2
us-east-1
aws s3 ls --no-sign-request s3://sra-pub-sars-cov2/
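The listing command above can be narrowed to one of the two folders named in the description. The following is a minimal sketch, not an official NCBI or AWS script: the helper names `sra_folder_uri` and `list_sra_folder` are hypothetical, and it assumes only that `sra-src` and `run` are top-level prefixes in the bucket, as the description states.

```shell
#!/bin/sh
# Bucket name taken from the resource listing above.
BUCKET="sra-pub-sars-cov2"

# Hypothetical helper: print the S3 URI for one of the two folders
# named in the description (sra-src = original submissions, run = .sra files).
sra_folder_uri() {
  case "$1" in
    sra-src|run) printf 's3://%s/%s/\n' "$BUCKET" "$1" ;;
    *) echo "unknown folder: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: list the first N entries (default 10) of a folder.
# Requires the AWS CLI and network access; no AWS credentials are needed
# because the request is sent unsigned.
list_sra_folder() {
  uri=$(sra_folder_uri "$1") || return 1
  aws s3 ls --no-sign-request "$uri" | head -n "${2:-10}"
}
```

For example, `list_sra_folder sra-src 5` would show the first five entries under the original-submission folder. Piping through `head` keeps the output manageable, since a full listing of a public sequencing bucket can be very long.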
arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1
us-east-1
aws s3 ls --no-sign-request s3://sra-pub-sars-cov2-metadata-us-east-1/