
AllTheBacteria

assembly bacteria bioinformatics fasta genomic life sciences microbial genomics short read sequencing whole genome sequencing

Description

All bacterial isolate whole-genome sequencing data from INSDC, uniformly assembled, quality-controlled, annotated, and searchable.

Update Frequency

The current release covers all SRA bacterial isolate data up to August 2024. The collection will be updated occasionally, with no fixed schedule.

License

MIT License

Documentation

https://allthebacteria.org

Managed By

European Bioinformatics Institute


Contact

https://github.com/AllTheBacteria/AllTheBacteria/issues

How to Cite

AllTheBacteria was accessed on DATE from https://registry.opendata.aws/allthebacteria.

Usage Examples

Publications

Resources on AWS

  • Description
    Individual, compressed genome assemblies in .fasta format in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-assemblies
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-assemblies/
  • Description
    Phylogenetically-compressed, batched xz archives of all genome assemblies in .fasta format in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-phylogeneticbatches
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-phylogeneticbatches/
  • Description
    Metadata for each genome assembly, including taxonomic information, in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-metadata
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-metadata/
  • Description
    A LexicMap index of all genome assemblies. This can be used for efficient sequence alignment against all genomes.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-lexicmap
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-lexicmap/
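
The four listing commands above differ only in the bucket name, so they can be generated from one small helper. The following is a minimal sketch, not an official tool: the function names `atb_ls_cmd` and `atb_ls_all` and the variables `ATB_REGION` and `ATB_BUCKETS` are hypothetical, and the optional prefix argument is an illustrative assumption. The bucket names and region are the ones listed above. The helpers only print the commands, so nothing touches the network until the output is run.

```shell
#!/usr/bin/env bash
# Region and bucket names copied from the resource list above.
ATB_REGION="eu-west-2"
ATB_BUCKETS="allthebacteria-assemblies allthebacteria-phylogeneticbatches allthebacteria-metadata allthebacteria-lexicmap"

# Hypothetical helper: print the anonymous listing command for one bucket.
# $1 = bucket name, $2 = optional key prefix (assumption; empty lists the bucket root).
atb_ls_cmd() {
  printf 'aws s3 ls --no-sign-request --region %s s3://%s/%s\n' "$ATB_REGION" "$1" "${2:-}"
}

# Hypothetical helper: print one listing command per bucket.
atb_ls_all() {
  local bucket
  for bucket in $ATB_BUCKETS; do
    atb_ls_cmd "$bucket"
  done
}
```

After sourcing the snippet, `atb_ls_all` prints the four commands for review, and `atb_ls_all | sh` executes them. This requires the AWS CLI to be installed but, as noted above, no AWS account.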
