The Massively Multilingual Image Dataset (MMID)

Tags: computer vision, machine learning, machine translation, natural language processing

Description

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, every word is stored alongside the images that represent it, and alongside its English translation (together with that translation's images).

Update Frequency

Language data is added as it is ready for distribution.

Managed By

Penn NLP. See all datasets managed by Penn NLP.

Contact

mmid-users@googlegroups.com

How to Cite

The Massively Multilingual Image Dataset (MMID) was accessed on DATE from https://registry.opendata.aws/mmid.

Resources on AWS

Description

Images for words in various languages, packaged in .tar archives, one archive per language.
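Once a per-language archive has been downloaded, it can be inspected and unpacked with standard `tar`. The snippet below is a minimal sketch. It assumes the archive is an ordinary local .tar file. The helper names `mmid_list` and `mmid_extract` are hypothetical, and the internal layout of the archives is not described here.

```shell
#!/bin/sh
# Minimal sketch for inspecting and unpacking one MMID per-language archive.
# Assumption: the archive is a standard .tar file already downloaded locally.
# The helper names below are hypothetical, not part of any MMID tooling.

# mmid_list ARCHIVE
# Print the member paths stored in ARCHIVE without extracting anything.
mmid_list() {
    tar -tf "$1"
}

# mmid_extract ARCHIVE DEST
# Unpack ARCHIVE into DEST, creating DEST first if it does not exist.
mmid_extract() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -x: extract, -f: read the named archive file, -C: change into DEST first
    tar -xf "$archive" -C "$dest"
}
```

For example, `mmid_list LANG.tar | head` previews an archive, and `mmid_extract LANG.tar ./mmid/LANG` unpacks it. Here `LANG.tar` is a placeholder for whatever archive name the bucket listing actually shows. Listing first is cheap and shows how the archive is organized before committing disk space to a full extraction.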

Resource type

S3 Bucket

Amazon Resource Name (ARN)

arn:aws:s3:::mmid-pds

AWS Region

us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://mmid-pds/
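The same unsigned access also works for recursive listing and for downloading individual objects. The following is a sketch that wraps the AWS CLI in small shell functions. It assumes the AWS CLI is installed. The function names are hypothetical, and because object key names are not documented on this page, a real key should be taken from the listing output.

```shell
#!/bin/sh
# Sketch of anonymous (unsigned) access to the public mmid-pds bucket.
# Assumption: the AWS CLI is installed; object keys are not documented here,
# so list the bucket first and substitute a real key. Function names are
# hypothetical helpers, not part of the AWS CLI.

MMID_BUCKET="mmid-pds"

# mmid_uri [KEY] -> print the s3:// URI for KEY (bucket root if KEY is empty)
mmid_uri() {
    printf 's3://%s/%s\n' "$MMID_BUCKET" "${1:-}"
}

# mmid_ls [PREFIX] -> recursively list objects with readable sizes and a total
mmid_ls() {
    aws s3 ls --no-sign-request --recursive --human-readable --summarize \
        "$(mmid_uri "${1:-}")"
}

# mmid_fetch KEY [DEST] -> download one object (default: current directory)
mmid_fetch() {
    aws s3 cp --no-sign-request "$(mmid_uri "$1")" "${2:-.}"
}
```

A typical session runs `mmid_ls | tail -n 3` to see the object count and total size, then `mmid_fetch KEY ./downloads/` for a single language archive. `--no-sign-request` tells the CLI to skip credential lookup entirely, which is why no AWS account is needed.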