encyclopedic, internet, natural language processing
A corpus of web screenshots and associated metadata covering over 70 million websites.
Monthly
Attribution 4.0 International (CC BY 4.0)
https://commonscreens.com/?page_id=1492
Managed by Common Screens.
Common Screens was accessed on DATE from https://registry.opendata.aws/comonscreens.
arn:aws:s3:::common-screens
us-west-2
aws s3 ls --no-sign-request s3://common-screens/
dqh5x5k6xg3n1.cloudfront.net
us-west-2
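
The unsigned listing command above can also be approximated without the AWS CLI. The following is a minimal Python sketch, assuming the bucket permits anonymous listing over the standard S3 REST endpoint (as the `--no-sign-request` flag suggests). The helper names `build_list_url`, `parse_listing`, and `list_bucket` are hypothetical and are not part of any Common Screens tooling; the bucket's actual key layout is not documented here, so no prefixes are assumed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "common-screens"  # bucket name taken from the ARN listed above
REGION = "us-west-2"       # region listed above
# XML namespace used by S3 ListObjectsV2 responses
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(prefix="", max_keys=20, bucket=BUCKET, region=REGION):
    """Build an unsigned ListObjectsV2 URL (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "list-type": "2",      # request the V2 listing API
        "delimiter": "/",      # group keys by top-level "folder"
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (common_prefixes, object_keys) from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_text)
    prefixes = [e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def list_bucket(prefix="", max_keys=20):
    """Fetch one page of the listing; assumes anonymous access is allowed."""
    url = build_list_url(prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling `list_bucket()` should return roughly what `aws s3 ls --no-sign-request s3://common-screens/` prints: the top-level prefixes and any objects stored at the bucket root. Only the first page of results is fetched; pagination via continuation tokens is left out of this sketch.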