amazon.science natural language processing
N-grams are fixed-size tuples of items. In this dataset the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five items (here, five words). The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
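The sliding-window procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to build the dataset: the function name `sliding_ngrams` is hypothetical, and splitting on whitespace is an assumption, since the tokenization applied to the Google Books text is not described here.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_ngrams(tokens: Iterable[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield one n-gram per new token once the window holds n tokens.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A bounded deque drops the oldest token automatically when a new one arrives.
    window = deque(maxlen=n)
    for token in tokens:
        window.append(token)
        if len(window) == n:
            yield tuple(window)


# Assumption: whitespace tokenization stands in for the real tokenizer.
text = "the quick brown fox jumps over"
five_grams = list(sliding_ngrams(text.split(), 5))
for gram in five_grams:
    print(" ".join(gram))
```

A six-word input yields two 5-grams, because the window emits a record for each token that arrives after the window is full. Inputs shorter than n produce no records.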
Not updated
Creative Commons Attribution 3.0 Unported License
http://books.google.com/ngrams/
Not managed
https://books.google.com/ngrams
Google Books Ngrams was accessed on DATE from https://registry.opendata.aws/google-ngrams.
arn:aws:s3:::datasets.elasticmapreduce/ngrams/books/
us-east-1
aws s3 ls --no-sign-request s3://datasets.elasticmapreduce/ngrams/books/