
Software Heritage Graph Dataset

digital preservation, free software, open source software, source code

Description

Software Heritage is the largest existing public archive of software source code and accompanying development history. The Software Heritage Graph Dataset is a fully deduplicated Merkle DAG representation of the Software Heritage archive. The dataset links together file content identifiers, source code directories, and Version Control System (VCS) commits that track evolution over time, up to the full states of VCS repositories as observed by Software Heritage during periodic crawls. The dataset's contents come from major development forges (including GitHub and GitLab), FOSS distributions (e.g., Debian), and language-specific package managers (e.g., PyPI). Crawling information is also included, recording when and where each archived source code artifact was observed in the wild.
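To make the "fully deduplicated Merkle DAG" idea concrete, here is a minimal toy sketch in Python. It is an illustration only: the `MerkleStore` class, the `node_id` helper, and the payload layouts are hypothetical and are not the identifier scheme or schema that Software Heritage actually uses. The point it shows is that each node's identifier is derived from its own content plus the identifiers of its children, so identical files and directories collapse into a single node.

```python
import hashlib
from typing import Dict, List, Tuple


def node_id(kind: str, payload: bytes) -> str:
    """Derive an identifier from the node type and its payload (toy scheme)."""
    return hashlib.sha1(kind.encode("ascii") + b"\0" + payload).hexdigest()


class MerkleStore:
    """Hypothetical in-memory store; nodes are keyed by their content-derived id."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[str, bytes]] = {}

    def _put(self, kind: str, payload: bytes) -> str:
        nid = node_id(kind, payload)
        # Same payload -> same id, so a duplicate is never stored twice.
        self.nodes.setdefault(nid, (kind, payload))
        return nid

    def add_content(self, data: bytes) -> str:
        return self._put("content", data)

    def add_directory(self, entries: Dict[str, str]) -> str:
        # Sort entries so the same name -> child mapping always hashes the same.
        lines = [f"{name} {child}" for name, child in sorted(entries.items())]
        return self._put("directory", "\n".join(lines).encode("utf-8"))

    def add_revision(self, directory_id: str, parents: List[str], message: str) -> str:
        header = f"tree {directory_id}\n" + "".join(f"parent {p}\n" for p in parents)
        return self._put("revision", (header + "\n" + message).encode("utf-8"))


store = MerkleStore()
license_id = store.add_content(b"MIT License\n")
dir_a = store.add_directory(
    {"LICENSE": license_id, "main.py": store.add_content(b"print('a')\n")}
)
# The second snapshot re-adds the same LICENSE bytes; no new node is created.
dir_b = store.add_directory(
    {"LICENSE": store.add_content(b"MIT License\n"),
     "main.py": store.add_content(b"print('b')\n")}
)
rev_a = store.add_revision(dir_a, [], "initial commit")
rev_b = store.add_revision(dir_b, [rev_a], "change output")
print(len(store.nodes))  # 7 nodes: 3 contents, 2 directories, 2 revisions
```

Because a revision's identifier covers its directory and its parents, two repositories that share history also share the corresponding nodes, which is what lets an archive-wide graph stay deduplicated.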

Update Frequency

Data is updated yearly

License

Creative Commons Attribution 4.0 International. By accessing the dataset, you agree to the Software Heritage Ethical Charter for using the archive data, the terms of use for bulk access, and the Software Heritage principles for large language models.

Documentation

https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html

Managed By

Software Heritage


Contact

aws@softwareheritage.org

How to Cite

Software Heritage Graph Dataset was accessed on DATE from https://registry.opendata.aws/software-heritage.

Resources on AWS

  • Description: Software Heritage Graph Dataset
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::softwareheritage
    AWS Region: us-east-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://softwareheritage/
  • Description: S3 Inventory files
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::softwareheritage-inventory
    AWS Region: us-east-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://softwareheritage-inventory/
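For readers without the AWS CLI, the unsigned listing above can be approximated with the Python standard library alone. The sketch below assumes the bucket answers anonymous requests to the S3 REST `ListObjectsV2` operation at the virtual-hosted endpoint `https://<bucket>.s3.amazonaws.com/`; the helper names `list_url`, `parse_listing`, and `fetch_listing` are illustrative and not part of any AWS or Software Heritage tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket: str, prefix: str = "", delimiter: str = "/") -> str:
    """Build an unsigned ListObjectsV2 URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data: Union[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (sub-prefixes, object keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    prefixes = [
        e.text or "" for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    keys = [e.text or "" for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def fetch_listing(bucket: str, prefix: str = "") -> Tuple[List[str], List[str]]:
    """Fetch one page of results over HTTPS (requires network access)."""
    with urllib.request.urlopen(list_url(bucket, prefix)) as resp:
        return parse_listing(resp.read())
```

Calling `fetch_listing("softwareheritage")` would return the top-level prefixes and keys of the first result page. Pagination through continuation tokens is left out to keep the sketch short, so large prefixes will be truncated.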
