
Japanese Tokenizer Dictionaries

csv, japanese, natural language processing

Description

Japanese Tokenizer Dictionaries for use with MeCab.

Update Frequency

Infrequently (typically less than once a year)

License

Versions of UniDic offered here are available under the GPL/LGPL/BSD license. IPADic is offered under a unique BSD-like license; see the link below.

https://github.com/polm/ipadic-py/blob/master/ipadic/dicdir/COPYING

Documentation

This dataset includes dictionaries for tokenization and morphological analysis of Japanese for use with MeCab. This includes NINJAL's UniDic, a modified smaller version of UniDic for situations that require it, and the legacy IPADic dictionary.

Managed By

Cotonoha


Contact

polm@cotonoha.io

How to Cite

Japanese Tokenizer Dictionaries was accessed on DATE from https://registry.opendata.aws/cotonoha-dic.

Resources on AWS

  • Description: Dictionary Files
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::cotonoha-dic
    AWS Region: ap-northeast-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://cotonoha-dic/
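
For readers without the AWS CLI, the same anonymous listing can likely be done over plain HTTPS. The following is a minimal sketch using only the Python standard library, assuming the bucket permits unauthenticated list requests through the S3 REST endpoint (as the CLI command above suggests). The function names `list_url`, `parse_keys`, and `list_keys` are hypothetical helpers introduced here, and the object keys actually stored in the bucket are not stated on this page.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket name and region are taken from the resource listing above.
BUCKET = "cotonoha-dic"
REGION = "ap-northeast-1"

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(prefix="", max_keys=100):
    """Build an unsigned ListObjectsV2 URL for the bucket (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_keys(xml_data):
    """Extract object keys from a ListBucketResult XML document (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [el.findtext(f"{S3_NS}Key") for el in root.iter(f"{S3_NS}Contents")]


def list_keys(prefix="", max_keys=100):
    """Fetch one page of object keys; assumes anonymous listing is allowed."""
    with urllib.request.urlopen(list_url(prefix, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())


if __name__ == "__main__":
    # Prints whatever keys the bucket returns; no particular names are assumed.
    for key in list_keys():
        print(key)
```

This sketch reads only the first page of results; a bucket with more objects than `max_keys` would need the continuation token from the response to fetch further pages.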
