encyclopedic, internet, natural language processing
A corpus of web screenshots and associated metadata covering over 70 million websites.
Monthly
Attribution 4.0 International (CC BY 4.0)
https://commonscreens.com/?page_id=1492
See all datasets managed by Common Screens.
Common Screens was accessed on DATE
from https://registry.opendata.aws/comonscreens.
arn:aws:s3:::common-screens
us-west-2
aws s3 ls --no-sign-request s3://common-screens/
dqh5x5k6xg3n1.cloudfront.net
us-west-2
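
The unsigned `aws s3 ls` listing above can also be approximated without the AWS CLI. The sketch below is a minimal, standard-library-only illustration that queries the S3 `ListObjectsV2` REST endpoint anonymously. It assumes the bucket allows unauthenticated listing (as `--no-sign-request` implies) and that the virtual-hosted URL follows the usual `bucket.s3.region.amazonaws.com` pattern. The function names `build_list_url`, `parse_listing`, and `list_top_level` are hypothetical helpers, not part of any Common Screens tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket name and region are taken from the resource entries above.
BUCKET = "common-screens"
REGION = "us-west-2"

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix="", max_keys=10):
    """Build an anonymous ListObjectsV2 URL (assumed virtual-hosted style)."""
    query = urllib.parse.urlencode({
        "list-type": "2",
        "delimiter": "/",   # group keys by "directory", like `aws s3 ls`
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (prefixes, keys) found in a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    prefixes = [el.text for el in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_top_level(prefix="", max_keys=10):
    """Fetch one page of the listing; requires network access."""
    with urllib.request.urlopen(build_list_url(prefix, max_keys), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_top_level()` would return one page of top-level prefixes and object keys. Pagination via continuation tokens is left out to keep the sketch short.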