The Registry of Open Data on AWS is now available on AWS Data Exchange
All datasets on the Registry of Open Data are now discoverable on AWS Data Exchange, alongside 3,000+ existing data products from category-leading data providers across industries. Explore the catalog to find open, free, and commercial datasets.

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

See all usage examples for datasets listed in this registry tagged with amazon.science.



You are currently viewing a subset of data tagged with amazon.science.


Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are neither provided nor maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine whether a dataset may be used for your application.


Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

2021 Amazon Last Mile Routing Research Challenge Dataset

Tags: amazon.science, analytics, deep learning, geospatial, last mile, logistics, machine learning, optimization, routing, transportation, urban

The 2021 Amazon Last Mile Routing Research Challenge was a research initiative led by Amazon.com and supported by the Massachusetts Institute of Technology’s Center for Transportation and Logistics. Over a period of 4 months, participants were challenged to develop machine learning-based methods that enhance classic optimization-based approaches to the travelling salesperson problem by learning from historical routes executed by Amazon delivery drivers. The primary goal of the Amazon Last Mile Routing Research Challenge was to foster innovative applied research in r...

Details →

Usage examples

See 2 usage examples →
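The challenge description refers to classic optimization-based approaches to the travelling salesperson problem. As a rough illustration only, the sketch below shows one such classic construction heuristic, nearest neighbour, which builds a tour by always moving to the closest unvisited stop. It is not a method from the challenge or its participants, and the list of Euclidean `(x, y)` coordinates is a hypothetical input format, not the dataset's actual schema.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: each stop is an (x, y) point in a plane.
Point = Tuple[float, float]


def nearest_neighbour_tour(stops: Sequence[Point], start: int = 0) -> List[int]:
    """Build a tour (list of stop indices) by repeatedly visiting the closest unvisited stop."""
    if not stops:
        return []
    unvisited = set(range(len(stops))) - {start}
    tour = [start]
    while unvisited:
        current = stops[tour[-1]]
        # Greedy step: choose the unvisited stop with the smallest Euclidean distance.
        nxt = min(unvisited, key=lambda i: math.dist(current, stops[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(stops: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour, including the return leg to the starting stop."""
    n = len(tour)
    return sum(math.dist(stops[tour[i]], stops[tour[(i + 1) % n]]) for i in range(n))


if __name__ == "__main__":
    stops = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    tour = nearest_neighbour_tour(stops)
    print(tour, tour_length(stops, tour))
```

Greedy construction like this is fast but can produce noticeably suboptimal tours; in practice it is usually a starting point for improvement heuristics or exact solvers.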

Amazon Bin Image Dataset

Tags: amazon.science, computer vision, machine learning

The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations.

Details →

Usage examples

See 2 usage examples →

Amazon-PQA

Tags: amazon.science, machine learning, natural language processing

Amazon product questions and their answers, along with the public product information.

Details →

Usage examples

See 1 usage example →

Answer Reformulation

Tags: amazon.science, machine learning, natural language processing

Original StackExchange answers and their voice-friendly reformulations.

Details →

Usage examples

See 1 usage example →

Automatic Speech Recognition (ASR) Error Robustness

Tags: amazon.science, deep learning, machine learning, natural language processing, speech recognition

Sentence classification datasets with ASR errors.

Details →

Usage examples

See 1 usage example →

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

Tags: amazon.science, conversation data, machine learning, natural language processing

This bucket contains the checkpoints used to reproduce the baseline results reported in the DialoGLUE benchmark hosted on EvalAI (https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview). The associated scripts for using the checkpoints are located here: https://github.com/alexa/dialoglue. The associated paper describing the benchmark and checkpoints is here: https://arxiv.org/abs/2009.13570. The provided checkpoints include the CONVBERT model, a BERT-esque model trained on a large open-domain conversational dataset. It also includes the CONVBERT-DG and BERT-DG checkpoints descri...

Details →

Usage examples

See 1 usage example →

Enriched Topical-Chat Dataset for Knowledge-Grounded Dialogue Systems

Tags: amazon.science, conversation data, machine learning, natural language processing

This dataset provides extra annotations on top of the publicly released Topical-Chat dataset (https://github.com/alexa/Topical-Chat) that help reproduce the results in our paper "Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems" (https://arxiv.org/abs/2005.12529?context=cs.CL). The dataset contains 5 files: train.json, valid_freq.json, valid_rare.json, test_freq.json, and test_rare.json. Each of these files adds annotations on top of the original Topical-Chat dataset. These specific annotations are: dialogue act annotations a...

Details →

Usage examples

See 1 usage example →

Helpful Sentences from Reviews

Tags: amazon.science, information retrieval, json, natural language processing, text analysis

A collection of sentences extracted from customer reviews labeled with their helpfulness score.

Details →

Usage examples

See 1 usage example →

Humor Detection from Product Question Answering Systems

Tags: amazon.science, machine learning, natural language processing

This dataset provides labeled data for humor detection in product question answering systems. The dataset contains 3 CSV files: Humorous.csv, containing the humorous product questions; Non-humorous-unbiased.csv, containing the non-humorous product questions from the same products as the humorous ones; and...

Details →

Usage examples

See 1 usage example →

Humor patterns used for querying Alexa traffic

Tags: amazon.science, dialog, machine learning, natural language processing

Humor patterns used for querying Alexa traffic when creating the taxonomy described in the paper “‘Alexa, Do You Want to Build a Snowman?’ Characterizing Playful Requests to Conversational Agents” by Shani C., Libov A., Tolmach S., Lewin-Eytan L., Maarek Y., and Shahaf D. (CHI LBW 2022). These patterns correspond to the researchers' hypotheses about which humor types are likely to appear in Alexa traffic, and they were used to query Alexa traffic to evaluate those hypotheses.

Details →

Usage examples

See 1 usage example →

Learning to Rank and Filter - community question answering

Tags: amazon.science, machine learning, natural language processing

This dataset provides product-related questions and answers, including answer quality labels, as part of the paper 'IR Evaluation and Learning in the Presence of Forbidden Documents'.

Details →

Usage examples

See 1 usage example →

Multilingual Name Entity Recognition (NER) Datasets with Gazetteer

Tags: amazon.science, natural language processing

Named entity recognition datasets containing short, low-context sentences and queries, including LOWNER, MSQ-NER, ORCAS-NER, and Gazetteers (1.67 million entities). This release contains the multilingual versions of the datasets in Low Context Name Entity Recognition (NER) Datasets with Gazetteer.

Details →

Usage examples

See 1 usage example →

PASS: Perturb-and-Select Summarizer for Product Reviews

Tags: amazon.science, natural language processing, text analysis

A collection of product review summaries automatically generated by PASS for 32 Amazon products from the FewSum dataset.

Details →

Usage examples

See 1 usage example →

Pre- and post-purchase product questions

Tags: amazon.science, machine learning, natural language processing

This dataset provides product-related questions, including their text and the gap, in hours, between purchase time and posting time. Each question is also associated with details of the related product, including its ID and title.

Details →

Usage examples

See 1 usage example →

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

Tags: amazon.science, machine learning, natural language processing

This dataset provides how-to articles from wikihow.com and their summaries, each written as a coherent paragraph. The dataset itself is available at wikisum.zip and contains the article, the summary, the wikiHow URL, and an official fold (train, val, or test). In addition, human evaluation results are available at wikisum-human-eval...

Details →

Usage examples

See 1 usage example →

Airborne Object Tracking Dataset

Tags: amazon.science, computer vision, deep learning, machine learning

Airborne Object Tracking (AOT) is a collection of 4,943 flight sequences of around 120 seconds each, collected at 10 Hz in diverse conditions. There are 5.9M+ images and 3.3M+ 2D annotations of airborne objects in the sequences. There are 3,306,350 frames without labels because they contain no airborne objects. For images with labels, there are on average 1.3 labels per image. All airborne objects in the dataset are labelled.

Details →

Amazon Berkeley Objects Dataset

Tags: amazon.science, computer vision, deep learning, information retrieval, machine learning, machine translation

Amazon Berkeley Objects (ABO) is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. 8,222 listings come with turntable photography (also referred to as "spin" or "360º-View" images), as sequences of 24 or 72 images, for a total of 586,584 images in 8,209 unique sequences. For 7,953 products, the collection also provides high-quality 3D models as glTF 2.0 files.

Details →

FashionLocalTriplets

Tags: amazon.science, computer vision, machine learning

Fine-grained localized visual similarity and search for fashion.

Details →

MWIS VR Instances

Tags: amazon.science, graph, traffic, transportation

Large-scale node-weighted conflict graphs for maximum weight independent set solvers.

Details →
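To make concrete what a maximum weight independent set (MWIS) solver computes on a node-weighted conflict graph, here is a minimal sketch of a simple greedy heuristic that repeatedly picks the node with the highest weight divided by (remaining degree + 1), then removes it and its neighbours. This is an illustrative baseline, not the solver used to build or evaluate these instances, and the dictionary-based graph representation is a hypothetical input format rather than the dataset's file format.

```python
from typing import Dict, Hashable, Set

# Hypothetical representation: node weights plus an undirected adjacency map
# (if u is in adj[v], then v is also in adj[u]).
Weights = Dict[Hashable, float]
Adjacency = Dict[Hashable, Set[Hashable]]


def greedy_mwis(weights: Weights, adj: Adjacency) -> Set[Hashable]:
    """Greedy heuristic for maximum weight independent set (not guaranteed optimal)."""
    remaining = set(weights)
    chosen: Set[Hashable] = set()
    while remaining:
        # Score each remaining node by weight / (degree in the remaining graph + 1).
        best = max(
            remaining,
            key=lambda v: weights[v] / (len(adj.get(v, set()) & remaining) + 1),
        )
        chosen.add(best)
        # Drop the chosen node and all its neighbours so no conflicting node can be picked.
        remaining -= {best} | adj.get(best, set())
    return chosen


def is_independent(nodes: Set[Hashable], adj: Adjacency) -> bool:
    """True if no two nodes in the set share an edge."""
    return all(not (adj.get(v, set()) & nodes) for v in nodes)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with a lighter middle node.
    w = {0: 4.0, 1: 3.0, 2: 4.0}
    g = {0: {1}, 1: {0, 2}, 2: {1}}
    sol = greedy_mwis(w, g)
    print(sorted(sol), sum(w[v] for v in sol), is_independent(sol, g))
```

Greedy rules like this run quickly on large graphs but can miss the optimum; large benchmark instances such as these are typically used to compare exact solvers and stronger heuristics against such baselines.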