Amazon Web Services is collaborating with the Allen Institute for Artificial Intelligence to distribute benchmark datasets used for solving the hardest problems in artificial intelligence.
If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.
Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
4,817 illustrative diagrams for research on diagram understanding and associated question answering.
630 paper annotations
7,787 multiple choice science questions and associated corpora
The DROP dataset contains 96k Question and Answer pairs (QAs) over 6.7K paragraphs, split between train (77k QAs), development (9.5k QAs) and a hidden test partition (9.5k QAs).
24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs).
14k QA pairs over 1.7K paragraphs, split between train (10k QAs), development (1.6k QAs) and a hidden test partition (1.7k QAs).
9092 crowd-sourced science questions and 68 tables of curated facts
ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.