encyclopedic, machine learning, internet
A corpus of web crawl data composed of over 5 billion web pages.
Monthly
This data is available for anyone to use under the Common Crawl Terms of Use
http://commoncrawl.org/the-data/get-started/
http://commoncrawl.org/connect/contact-us/
arn:aws:s3:::commoncrawl
us-east-1
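The resource entry above can be turned into usable addresses with a few lines of code. The following is a minimal sketch, assuming the entry names the S3 bucket `commoncrawl` in region `us-east-1`. The helper names (`bucket_from_arn`, `s3_uri`, `https_url`) and the object key `example/key.txt` are hypothetical and used only for illustration. The sketch builds address strings and makes no network requests; whether a given object can be read anonymously is not something it checks.

```python
def bucket_from_arn(arn: str) -> str:
    """Return the bucket name from an S3 ARN of the form arn:aws:s3:::bucket[/key]."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # The resource part may be "bucket" or "bucket/key"; keep only the bucket.
    bucket = parts[5].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket


def s3_uri(bucket: str, key: str = "") -> str:
    """Build an s3:// URI, the form accepted by most S3 command-line and SDK tools."""
    return f"s3://{bucket}/{key}"


def https_url(bucket: str, region: str, key: str = "") -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in the bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


if __name__ == "__main__":
    # Values taken from the resource entry; the key below is a hypothetical placeholder.
    bucket = bucket_from_arn("arn:aws:s3:::commoncrawl")
    print(s3_uri(bucket, "example/key.txt"))
    print(https_url(bucket, "us-east-1", "example/key.txt"))
```

Real object keys should be taken from the documentation linked above rather than guessed.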