REDASA COVID-19 Open Data

coronavirus, COVID-19, information retrieval, life sciences, natural language processing, text analysis

Description

The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in our paper. The first S3 bucket listed under Resources on AWS contains a large collection of medical documents in text format, extracted from the CORD-19 dataset and from other sources deemed relevant by the REDASA consortium. The second S3 bucket contains the documents that Amazon Kendra surfaced as relevant to each medical question asked. The final S3 bucket contains the GroundTruth annotations created by our curator community.

Update Frequency

Yearly updates

License

CC-BY-4.0

Documentation

https://github.com/PanSurg/redasa-sample-data/blob/master/open-data.md

Managed By

REDASA Consortium, Imperial College London, UK

Contact

redasa-open-data@imperial.ac.uk

Resources on AWS

  • Description: This is the raw data repository containing a common crawl of CORD-19 papers and other sources identified by the REDASA Project.
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::pansurg-curation-raw-open-data
    AWS Region: eu-west-2
    AWS CLI Access (No AWS account required): aws s3 ls s3://pansurg-curation-raw-open-data/ --no-sign-request
  • Description: For all the questions curated during the REDASA project, we created a Kendra index. The documents available in this S3 bucket were surfaced by the Kendra index as being relevant to the research medical question.
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::pansurg-curation-workflo-kendraqueryresults50d0eb-open-data
    AWS Region: eu-west-2
    AWS CLI Access (No AWS account required): aws s3 ls s3://pansurg-curation-workflo-kendraqueryresults50d0eb-open-data/ --no-sign-request
  • Description: An S3 bucket that contains the final curation data in GroundTruth format.
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::pansurg-curation-final-curations-open-data
    AWS Region: eu-west-2
    AWS CLI Access (No AWS account required): aws s3 ls s3://pansurg-curation-final-curations-open-data/ --no-sign-request
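
For users who prefer to browse the buckets from a script rather than the AWS CLI, the listing commands above can be approximated in Python. The following is a minimal sketch using only the standard library: it assumes the buckets still permit anonymous listing (as the `--no-sign-request` commands imply) and that they are reachable at the standard virtual-hosted S3 endpoint for eu-west-2. The bucket names and region are taken from the entries above; the helper names `list_url`, `parse_keys`, and `list_objects`, and the short labels used as dictionary keys, are hypothetical and chosen for illustration only.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Region and bucket names are copied from the Resources on AWS entries.
# The short labels ("raw", "kendra", "curations") are illustrative only.
REGION = "eu-west-2"
BUCKETS = {
    "raw": "pansurg-curation-raw-open-data",
    "kendra": "pansurg-curation-workflo-kendraqueryresults50d0eb-open-data",
    "curations": "pansurg-curation-final-curations-open-data",
}

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket, prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL (assumes anonymous listing is allowed)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{REGION}.amazonaws.com/?{query}"


def parse_keys(xml_payload):
    """Extract (key, size in bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_payload)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_objects(bucket, prefix="", max_keys=20):
    """Fetch one page of object keys from a public bucket (requires network access)."""
    with urllib.request.urlopen(list_url(bucket, prefix, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())
```

Calling `list_objects(BUCKETS["curations"])` would return the first page of keys in the final curations bucket, if the bucket is still available. The sketch does not follow continuation tokens, so it only ever returns a single page of at most `max_keys` entries.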
