art, deep learning, image processing, labeled, machine learning, media
PD12M is a collection of 12.4 million public domain (PD) and CC0-licensed image-caption pairs for training generative image models.
The data will be updated as infringing works or captions are discovered and as improved provenance information is acquired.
https://cdla.dev/permissive-2-0/
https://huggingface.co/datasets/Spawning/PD12M
Spawning
PD12M was accessed on DATE from https://registry.opendata.aws/pd12m.
arn:aws:s3:::pd12m
us-west-2
aws s3 ls --no-sign-request s3://pd12m/
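The same unsigned listing can be sketched in Python with only the standard library, for readers who do not have the AWS CLI installed. This is a minimal sketch, not an official client. It assumes the bucket answers anonymous S3 ListObjectsV2 requests over HTTPS at the virtual-hosted endpoint derived from the ARN and region above, as the `--no-sign-request` command suggests. The helper names (`bucket_from_arn`, `listing_url`, `parse_listing`, `list_top_level`) are hypothetical and chosen for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET_ARN = "arn:aws:s3:::pd12m"   # ARN given in this entry
REGION = "us-west-2"                # region given in this entry
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def bucket_from_arn(arn):
    """Extract the bucket name from an S3 bucket ARN."""
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    return arn[len(prefix):].split("/", 1)[0]


def listing_url(bucket, region, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL (assumes virtual-hosted addressing)."""
    query = urllib.parse.urlencode({
        "list-type": "2",     # selects ListObjectsV2
        "delimiter": "/",     # group keys by top-level "folder"
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (common_prefixes, object_keys) from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_text)
    prefixes = [e.text for e in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_top_level(arn=BUCKET_ARN, region=REGION, max_keys=10):
    """Fetch one page of the bucket's top-level listing (needs network access)."""
    url = listing_url(bucket_from_arn(arn), region, max_keys=max_keys)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_top_level()` returns one page of top-level prefixes and keys. The bucket's actual layout is not described in this entry, so inspect the result before building paths from it.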