bioinformatics, life sciences, metagenomics, open source software, protein, protein folding
The Steinegger Lab Dataset comprises biological databases and resources critical for protein sequence and structure analysis, developed to support ColabFold, MMseqs2, and Foldseek/Foldcomp, three high-performance computational tools widely used in bioinformatics.

The MMseqs2 dataset serves as the backbone for our fast structure prediction tool, ColabFold, and includes UniRef30, BFD, and the ColabFold environmental databases. These datasets are specifically designed for the rapid generation of multiple sequence alignments (MSAs), which are essential for high-accuracy structure prediction. Beyond MSA generation, these resources allow for fast taxonomic and functional annotation, supporting a wide range of bioinformatics applications.

The Foldseek dataset includes preprocessed databases such as the AlphaFold Database (AFDB), PDB, SwissProt, and CATH, specifically designed for protein structure similarity searches. These datasets encompass the majority of both experimental and predicted structural resources, supporting analyses for monomers and multimers alike.
Occasionally, when new data is available
For the MMseqs2/ColabFold dataset, please see https://colabfold.mmseqs.com
For the Foldseek dataset, please see https://search.foldseek.com
Steinegger Lab, Seoul National University
Steinegger Lab Datasets was accessed on DATE from https://registry.opendata.aws/steineggerlab.
If you're using MMseqs2, please cite:
"Steinegger M and Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology (2017), doi: 10.1038/nbt.3988"
If you're using Foldseek, please cite:
"van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, and Steinegger M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology (2023), doi: 10.1038/s41587-023-01773-0"
If you're using ColabFold, please cite:
"Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, and Steinegger M. ColabFold: Making protein folding accessible to all. Nature Methods (2022), doi: 10.1038/s41592-022-01488-1"
arn:aws:s3:::steineggerlab
us-east-1
aws s3 ls --no-sign-request s3://steineggerlab/
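
The listing command above can also be approximated without the AWS CLI. The following is a minimal Python sketch, assuming the bucket permits anonymous listing over HTTPS (as the `--no-sign-request` flag suggests) and using the standard S3 ListObjectsV2 REST call with a virtual-hosted URL built from the bucket name and region given above. The function names `build_list_url`, `parse_listing`, and `list_bucket` are illustrative and not part of any Steinegger Lab tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "steineggerlab"   # from the ARN above
REGION = "us-east-1"       # from the region above
# XML namespace used by S3 list responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix="", max_keys=100):
    """Build an unsigned ListObjectsV2 URL for one 'directory' level."""
    query = urllib.parse.urlencode({
        "list-type": "2",      # selects the ListObjectsV2 API
        "delimiter": "/",      # group keys by top-level 'folder'
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, [(key, size_in_bytes), ...]) from a list response."""
    root = ET.fromstring(xml_data)
    prefixes = [
        node.findtext("s3:Prefix", namespaces=S3_NS)
        for node in root.findall("s3:CommonPrefixes", S3_NS)
    ]
    objects = [
        (node.findtext("s3:Key", namespaces=S3_NS),
         int(node.findtext("s3:Size", namespaces=S3_NS)))
        for node in root.findall("s3:Contents", S3_NS)
    ]
    return prefixes, objects


def list_bucket(prefix=""):
    """Fetch and parse one page of the listing (requires network access)."""
    with urllib.request.urlopen(build_list_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_bucket()` would return the top-level prefixes and objects of the bucket, much like the CLI command. The sketch reads only a single page of results; longer listings would need to follow the continuation token returned in the response.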