End of Term Web Archive Dataset

Tags: archives, internet, natural language processing, web archive

Description

The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential terms. The EOT has thus far preserved websites at the end of the terms in 2008, 2012, 2016, and 2020. Data from these web crawls are openly available in this dataset in several formats.

Update Frequency

Every four years, after each U.S. presidential election

License

There are no restrictions on the use, access, and/or download of data from the End of Term Web Archive Dataset. We request that you cite the End of Term Web Archive project when using the data provided from this dataset.

Creative Commons Zero

Documentation

https://eotarchive.org/data/

Managed By

End of Term Web Archive

Contact

Mark Phillips (mark.phillips@unt.edu), Sawood Alam (sawood@archive.org)

How to Cite

End of Term Web Archive Dataset was accessed on DATE from https://registry.opendata.aws/eot-web-archive.

Resources on AWS

  • Description: Web Archive Crawl Data (WARC and ARC formats)
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::eotarchive
    AWS Region: us-east-1
    AWS CLI Access (no AWS account required): aws s3 ls --no-sign-request s3://eotarchive/
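For programmatic access, here is a minimal Python sketch. It assumes the boto3 SDK is installed for the listing step. The names list_eot_objects, iter_warc_headers, and warc_headers_from_bytes are hypothetical helpers, not part of any published EOT tooling.

list_eot_objects is the SDK counterpart of the CLI command above. It uses unsigned requests, so no AWS credentials are needed. The WARC reader uses only the standard library and handles WARC files, not the older ARC format. It reads each record's named headers and skips the payload using the record's Content-Length. gzip.decompress also handles .warc.gz files built from one gzip member per record. This page does not describe how object keys in the bucket are organized, so inspect a listing before picking files.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

BUCKET = "eotarchive"  # from the ARN arn:aws:s3:::eotarchive
REGION = "us-east-1"


def list_eot_objects(prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (key, size_in_bytes) for objects under `prefix` (hypothetical helper).

    SDK counterpart of `aws s3 ls --no-sign-request s3://eotarchive/`.
    Needs boto3 and network access, but no AWS credentials.
    """
    # Imported here so the WARC helpers below work without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests give anonymous access to the public bucket.
    s3 = boto3.client("s3", region_name=REGION,
                      config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the named headers of each record in an uncompressed WARC stream.

    Sketch only: handles WARC (not ARC) and does not handle folded
    (continuation) header lines. The version line, e.g. "WARC/1.0",
    is stored under the made-up key "_version".
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {"_version": line.strip().decode("ascii")}
        while True:
            hline = stream.readline()
            if hline.strip() == b"":
                break  # an empty line (or EOF) ends the header block
            name, _, value = hline.decode("utf-8", errors="replace").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length is mandatory in WARC; skip the payload by byte count.
        stream.read(int(headers["Content-Length"]))
        yield headers


def warc_headers_from_bytes(data: bytes) -> List[Dict[str, str]]:
    """Parse a whole .warc or .warc.gz file already loaded into memory."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        # gzip.decompress joins concatenated members (one per record).
        data = gzip.decompress(data)
    return list(iter_warc_headers(io.BytesIO(data)))
```

As an example, suppose you fetch one file with aws s3 cp --no-sign-request. Then warc_headers_from_bytes(open(path, "rb").read()) returns one dictionary per record, with standard WARC fields such as WARC-Type and WARC-Target-URI as keys. Crawl files can be very large. Passing gzip.open(path, "rb") to iter_warc_headers streams records instead of loading the whole file into memory.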