Registry of Open Data on AWS

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

See all usage examples for datasets listed in this registry tagged with fast5.

Search datasets (currently 13 matching datasets)

You are currently viewing a subset of data tagged with fast5.

Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.

Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

The Singapore Nanopore Expression Data Set

bambioinformaticsfast5fastafastqgenomiclife scienceslong read sequencingshort read sequencingtranscriptomics

The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short read RNA-Seq. The SG-NEx core data includes 5 of the most commonly used cell lines and it is extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 ...

Details →

Usage examples

Differential RNA modification analysis tutorial with xPore by Yu Song Chuah
Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data. by Yuk Kei Wan et al.
m6Anet: Identification of m6A from a single direct RNA-Seq sample by Christopher Hendra et al.
RNA modification analysis tutorial with m6Anet by Yu Song Chuah
Bambu: Transcript discovery and quantification using long read RNA-Seq data by Ying Chen, Andre Sim et al.

See 15 usage examples →

PubSeq - Public Sequence Resource

bambioinformaticsbiologycoronavirusCOVID-19fast5fastafastqgeneticgenomichealthjsonlife scienceslong read sequencingmedicineMERSmetadataopen source softwareRDFSARSSARS-CoV-2SPARQL

COVID-19 PubSeq is a free and open online bioinformatics public sequence resource with on-the-fly analysis of sequenced SARS-CoV-2 samples that allows for a quick turnaround in identification of new virus strains. PubSeq allows anyone to upload sequence material in the form of FASTA or FASTQ files with accompanying metadata through the web interface or REST API.

Details →

Usage examples

See 9 usage examples →

Human PanGenomics Project

cramfast5fastqgeneticgenomiclife sciences

This dataset includes sequencing data, assemblies, and analyses for the offspring of ten parent-offspring trios.

Details →

Usage examples

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes by Shafin et al (2020)

See 1 usage example →

Oxford Nanopore Technologies Benchmark Datasets

bioinformaticsbiologyfast5fastqgenomicHomo sapienslife scienceswhole genome sequencing

The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support, 1) Exploration of the characteristics of nanopore sequence data. 2) Assessment and reproduction of performance benchmarks 3) Development of tools and methods. The data deposited showcases DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly-available reference samples (e.g. Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts to describe sample and data provenance.

Details →

Usage examples

ONT Dataset Tutorials by EPI2MELabs

See 1 usage example →

1KG-ONT-VIENNA panel

fast5fastqgeneticgenomiclife scienceswhole genome sequencing

The 1KG-ONT-VIENNA panel comprises medium coverage ONT sequencing data for 1.019 samples from the 1000 Genomes Project collection, structural variants, and their haplotype context.

Details →

Usage examples

Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project by Siegfried Schloissnig, Samarendra Pani, Bernardo Rodriguez-Martin, Jana Ebler, Carsten Hain, Vasiliki Tsapalou, Arda Söylev, Patrick Hüther, Hufsah Ashraf, Timofey Prodanov, Mila Asparuhova, Sarah Hunt, Tobias Rausch, Tobias Marschall, Jan O Korbel

See 1 usage example →