
The Singapore Nanopore Expression Data Set

bam, bioinformatics, fast5, fasta, fastq, genomic, life sciences, long read sequencing, short read sequencing, transcriptomics


The Singapore Nanopore Expression (SG-NEx) project is an international collaboration to generate reference transcriptomes and a comprehensive benchmark data set for long read Nanopore RNA-Seq. Transcriptome profiling is done using PCR-cDNA sequencing (PCR-cDNA), amplification-free cDNA sequencing (direct cDNA), direct sequencing of native RNA (direct RNA), and short read RNA-Seq. The SG-NEx core data includes 5 of the most commonly used cell lines and is extended with additional cell lines and samples that cover a broad range of human tissues. All core samples are sequenced with at least 3 high quality replicates. For a subset of samples, spike-in RNAs are used and matched m6A profiling data is available.

Update Frequency

Datasets will be updated periodically as additional data are generated.


License

CC BY-NC 4.0


Managed By

The Genome Institute of Singapore



SG-NEx team

How to Cite

The Singapore Nanopore Expression Data Set was accessed on DATE from In addition, please cite Chen et al. A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. bioRxiv (2021). doi: when referencing the SG-NEx data in publications.

Usage Examples

Tools & Applications

Resources on AWS

  • Description
    Nanopore long read RNA-Seq data and matched short read RNA-Seq data from the Singapore Nanopore Expression Project (SG-NEx). The data includes raw signal data (fast5), basecalled reads (fastq), aligned reads (bam), processed data for RNA modification detection (json), reference genome annotation files (gtf and fa), and sample metadata (txt).
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    AWS Region
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://sg-nex-data/
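
The same anonymous listing can also be sketched without the AWS CLI, using only the Python standard library. This is a minimal sketch, not an official client. It assumes the bucket answers unsigned ListObjectsV2 requests on the virtual-hosted endpoint `https://sg-nex-data.s3.amazonaws.com/`. The helper names (`build_list_url`, `parse_listing`, `classify`, `list_bucket`) are hypothetical. The extension-to-type table is taken from the resource description above, and the way individual keys in the bucket are named is an assumption.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

# Data types named in the resource description, keyed by file extension.
DATA_TYPES = {
    "fast5": "raw signal data",
    "fastq": "basecalled reads",
    "bam": "aligned reads",
    "json": "processed data for RNA modification detection",
    "gtf": "reference genome annotation",
    "fa": "reference genome annotation",
    "txt": "sample metadata",
}


def build_list_url(bucket, prefix=""):
    """Build an unsigned ListObjectsV2 URL (the endpoint form is an assumption)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "delimiter": "/", "prefix": prefix}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub-prefixes, [(key, size_in_bytes), ...]) from a list response."""
    root = ET.fromstring(xml_data)
    prefixes = [
        p.findtext(f"{S3_NS}Prefix") for p in root.iter(f"{S3_NS}CommonPrefixes")
    ]
    objects = [
        (c.findtext(f"{S3_NS}Key"), int(c.findtext(f"{S3_NS}Size")))
        for c in root.iter(f"{S3_NS}Contents")
    ]
    return prefixes, objects


def classify(key):
    """Map an object key to a data type, skipping suffixes such as .gz."""
    parts = key.lower().rsplit("/", 1)[-1].split(".")
    for ext in reversed(parts[1:]):
        if ext in DATA_TYPES:
            return DATA_TYPES[ext]
    return "other"


def list_bucket(bucket="sg-nex-data", prefix=""):
    """Fetch the first page of the listing over HTTPS (needs network access)."""
    with urllib.request.urlopen(build_list_url(bucket, prefix)) as response:
        return parse_listing(response.read())


if __name__ == "__main__":
    prefixes, objects = list_bucket()
    for p in prefixes:
        print("PRE", p)
    for key, size in objects:
        print(size, key, "->", classify(key))
```

Only the first page of results is read here. A complete listing would need to follow the continuation token that S3 returns for truncated responses, which this sketch leaves out for brevity.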
