
Allen Institute for Artificial Intelligence (AI2)

Amazon Web Services is collaborating with the Allen Institute for Artificial Intelligence to distribute benchmark datasets used for solving the hardest problems in artificial intelligence.


Datasets (currently 13)


Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.


Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

AI2 Diagram Dataset (AI2D)

machine learning

4,817 illustrative diagrams for research on diagram understanding and associated question answering.

AI2 Meaningful Citations Data Set

csv, machine learning

630 paper annotations

AI2 Reasoning Challenge (ARC) 2018

csv, json, machine learning

7,787 multiple choice science questions and associated corpora

COVID-19 Open Research Dataset (CORD-19)

coronavirus, COVID-19, life sciences, MERS, SARS

Full-text and metadata dataset of COVID-19 and coronavirus-related research articles optimized for machine readability.

Discrete Reasoning Over the content of Paragraphs (DROP)

machine learning, natural language processing

The DROP dataset contains 96K question-answer (QA) pairs over 6.7K paragraphs, split between train (77K QAs), development (9.5K QAs), and a hidden test partition (9.5K QAs).
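
As a quick sanity check on the split sizes quoted above, the following minimal Python sketch sums the rounded figures and reports each split's share of the total. The names `drop_splits` and `split_fractions` are hypothetical, introduced only for illustration, and the counts are the rounded values from the description, not exact dataset sizes.

```python
# Rounded split sizes as quoted in the DROP description (not exact counts).
drop_splits = {"train": 77_000, "development": 9_500, "test": 9_500}


def split_fractions(splits):
    """Return each split's share of the total, rounded to three decimals."""
    total = sum(splits.values())
    return {name: round(count / total, 3) for name, count in splits.items()}


# The three rounded splits should add up to the quoted 96K total.
total = sum(drop_splits.values())
print(f"total QA pairs (rounded): {total}")

# Report each split as a percentage of the total.
for name, share in split_fractions(drop_splits).items():
    print(f"{name}: {share:.1%}")
```

The same helper can be applied to any of the other listed datasets that quote per-split counts, bearing in mind that rounded figures will not always sum exactly to the rounded total.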

Quoref

machine learning, natural language processing

24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs).

Reasoning Over Paragraph Effects in Situations (ROPES)

json, machine learning, natural language processing

14K QA pairs over 1.7K paragraphs, split between train (10K QAs), development (1.6K QAs), and a hidden test partition (1.7K QAs).

AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore

machine learning, natural language processing

9,092 crowd-sourced science questions and 68 tables of curated facts

AI2 Tablestore (November 2015 Snapshot)

machine learning, natural language processing

68 tables of curated facts

Aristo Mini Corpus

csv, json, machine learning

1,197,377 science-relevant sentences

Aristo Tuple KB

machine learning, natural language processing

294,000 science-relevant tuples

Textbook Question Answering (TQA)

machine learning

1,076 textbook lessons, 26,260 questions, 6,229 images

ZEST: ZEroShot learning from Task descriptions

machine learning, natural language processing

ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
