analysis ready data, biodiversity, bioinformatics, biology, fasta, genome, genomic, graph, information retrieval, life sciences, medicine, metagenomics, microbiome, transcriptomics, whole exome sequencing, whole genome sequencing
The MetaGraph Sequence Indexes dataset comprises full-text searchable index files for raw sequencing data hosted in major public repositories. These include the European Nucleotide Archive (ENA), managed by the European Bioinformatics Institute (EMBL-EBI); the Sequence Read Archive (SRA), maintained by the National Center for Biotechnology Information (NCBI); and the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA). All index files can be used with the MetaGraph framework for sequence search. The indexes can be used jointly for aggregated search in the cloud, or downloaded individually for search on local hardware.
Updated continuously as new sequencing data becomes available.
Documentation of the dataset is available at https://github.com/ratschlab/metagraph-open-data
Documentation of the MetaGraph framework is available at https://metagraph.ethz.ch/
Biomedical Informatics Lab, ETH Zurich, Switzerland
Please open an issue under https://github.com/ratschlab/metagraph-open-data/issues
MetaGraph Sequence Indexes was accessed on DATE from https://registry.opendata.aws/metagraph.
Karasikov M, Mustafa H, Danciu D, Zimmermann M, Barber C, Raetsch G, Kahles A. Indexing All Life’s Known Biological Sequences. Preprint (2024). doi: 10.1101/2020.10.01.322164
arn:aws:s3:::metagraph
eu-west-1
aws s3 ls --no-sign-request s3://metagraph/
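The same top-level listing can be approximated without the AWS CLI. The following is a minimal sketch, assuming the bucket accepts unsigned S3 REST `ListObjectsV2` requests (as the `--no-sign-request` flag above suggests). The bucket name and region come from this page; the helper names `build_list_url`, `parse_listing`, and `list_bucket` are hypothetical and are not part of MetaGraph or any AWS SDK.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "metagraph"   # from the ARN arn:aws:s3:::metagraph
REGION = "eu-west-1"   # region listed for this bucket
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}  # S3 response XML namespace


def build_list_url(prefix="", delimiter="/", max_keys=1000):
    """Build an unsigned ListObjectsV2 URL for the bucket (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "list-type": "2",
        "prefix": prefix,
        "delimiter": delimiter,
        "max-keys": str(max_keys),
    })
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, [(key, size_in_bytes), ...]) from a listing response."""
    root = ET.fromstring(xml_data)
    prefixes = [
        p.findtext("s3:Prefix", namespaces=S3_NS)
        for p in root.findall("s3:CommonPrefixes", S3_NS)
    ]
    objects = [
        (c.findtext("s3:Key", namespaces=S3_NS),
         int(c.findtext("s3:Size", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    return prefixes, objects


def list_bucket(prefix=""):
    """Fetch one page of the listing anonymously; requires network access."""
    with urllib.request.urlopen(build_list_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_bucket()` should return the top-level prefixes and objects, comparable to the CLI command above. Each response holds at most 1000 entries, and this sketch does not follow continuation tokens, so larger prefixes would need pagination.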