The Massively Multilingual Image Dataset (MMID)

computer vision, machine learning, machine translation, natural language processing

Description

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, each word is stored alongside images that represent it and alongside its English translation (with that translation's corresponding images).

Update Frequency

Language data is added as it is ready for distribution.

License

See citation instructions at http://multilingual-images.org

Documentation

https://multilingual-images.org/doc.html

Managed By

Penn NLP

Contact

mmid-users@googlegroups.com

Resources on AWS

  • Description
    Images for words in various languages, packaged in .tar archives by language.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::mmid-pds
    AWS Region
    us-east-1
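
The ARN and region above are enough to work out where the bucket lives. The following is a minimal Python sketch, not an official client: it extracts the bucket name from the ARN and builds a listing URL for the bucket's regional S3 REST endpoint. The helper names (bucket_from_arn, listing_url, list_keys) are hypothetical, and the sketch assumes the bucket permits anonymous listing, which this entry does not state. The layout of keys inside the bucket is not assumed; see the documentation linked above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Values taken from the resource entry above.
ARN = "arn:aws:s3:::mmid-pds"
REGION = "us-east-1"


def bucket_from_arn(arn):
    """Return the bucket name from an S3 bucket ARN (arn:aws:s3:::bucket)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Drop any object-key suffix ("bucket/key") if present.
    return parts[5].split("/", 1)[0]


def listing_url(bucket, region, prefix="", max_keys=20):
    """Build a ListObjectsV2 URL for the bucket's regional endpoint."""
    query = {"list-type": "2", "max-keys": str(max_keys)}
    if prefix:
        query["prefix"] = prefix
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{urlencode(query)}"


def list_keys(url):
    """Fetch one page of a listing and return its object keys.

    Needs network access and assumes anonymous listing is allowed.
    """
    with urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    ns = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
    return [el.text for el in root.findall("s3:Contents/s3:Key", ns)]


if __name__ == "__main__":
    url = listing_url(bucket_from_arn(ARN), REGION)
    print(url)
    # Uncomment to query the bucket (assumption: public listing is enabled).
    # for key in list_keys(url):
    #     print(key)
```

The .tar archives can be large, so it may be worth listing keys and checking sizes before downloading anything.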
