Tags: amazon.science, json, natural language processing
This dataset, the Phrase Clustering Dataset (PCD), is part of the paper "McPhraSy: Multi-Context Phrase Similarity and Clustering" by D. N. Cohen et al. (2022). PCD is intended for evaluating the quality of semantic-based clustering of noun phrases. The phrases were collected from the [Amazon Review Dataset](https://nijianmo.github.io/amazon/).
Update frequency: Not updated
This data is available for anyone to use under the terms of the CDLA-permissive license, which is available here
https://amazon-phrase-clustering.s3.amazonaws.com/readme.md
Post any questions to re:Post and use the AWS Open Data tag.
Phrase Clustering Dataset (PCD) was accessed on DATE from https://registry.opendata.aws/pcd.
arn:aws:s3:::amazon-phrase-clustering
us-west-2
aws s3 ls --no-sign-request s3://amazon-phrase-clustering/
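
The same unsigned listing can be sketched in Python with only the standard library, for cases where the AWS CLI is not installed. This is a minimal sketch, not an official client: the helper names (`object_url`, `list_url`, `list_keys`) are hypothetical, and it assumes the bucket allows anonymous listing through the regional S3 REST endpoint, as the `--no-sign-request` command above suggests. It reads a single page of results (S3 returns at most 1000 keys per request) and does not follow continuation tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BUCKET = "amazon-phrase-clustering"  # from the ARN above
REGION = "us-west-2"                 # from the region listed above

# XML namespace used by S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def object_url(key: str) -> str:
    """Build the public HTTPS URL for an object key (hypothetical helper)."""
    return f"https://{BUCKET}.s3.amazonaws.com/{quote(key)}"


def list_url(prefix: str = "") -> str:
    """Build an unsigned ListObjectsV2 request URL (hypothetical helper)."""
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def list_keys(prefix: str = "") -> list[str]:
    """Return the object keys on the first result page (no pagination).

    Assumes anonymous listing is permitted; requires network access.
    """
    with urlopen(list_url(prefix)) as resp:
        root = ET.fromstring(resp.read())
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


# Example usage (performs a network request):
#   for key in list_keys():
#       print(key, object_url(key))
```

Building URLs is kept separate from the network call so the addresses can be inspected before any request is made.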