Sudachi Language Resources

natural language processing

Description

Japanese dictionaries and word embeddings for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of pretrained Japanese word embeddings (word vectors), trained on the ultra-large-scale web corpus NWJC from the National Institute for Japanese Language and Linguistics, with the text analyzed by Sudachi.

Update Frequency

The dictionaries are updated every few months to add neologisms and fix existing entries.

License

Apache-2.0

Documentation

https://worksapplications.github.io/Sudachi/

Managed By

Works Applications


Contact

sudachi@worksap.co.jp

Resources on AWS

  • Description
    SudachiDict: Binary format of the morphological analysis dictionaries
    chiVe: Pretrained word embeddings in various formats
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sudachi
    AWS Region
    ap-northeast-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://sudachi/ --no-sign-request
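
The same unsigned listing can also be done without the AWS CLI. The following is a minimal sketch in Python, assuming the bucket accepts anonymous ListObjectsV2 requests over HTTPS (which the --no-sign-request command above suggests, but which is not confirmed here). The function names and the max_keys default are illustrative, not part of any Sudachi tooling.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "sudachi"           # bucket name from the ARN above
REGION = "ap-northeast-1"    # region listed above
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}  # S3 XML namespace


def listing_url(prefix="", max_keys=50):
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urllib.parse.urlencode({
        "list-type": "2",
        "delimiter": "/",      # group keys by top-level "directory"
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (prefixes, objects) from a ListBucketResult XML document.

    objects is a list of (key, size_in_bytes) tuples.
    """
    root = ET.fromstring(xml_data)
    prefixes = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    objects = [
        (c.findtext("s3:Key", namespaces=S3_NS),
         int(c.findtext("s3:Size", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    return prefixes, objects


def list_bucket(prefix=""):
    """Fetch and parse one page of the listing (requires network access)."""
    with urllib.request.urlopen(listing_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())


# Example call (not executed here because it needs network access):
# prefixes, objects = list_bucket()
```

This sketch reads only the first page of results. A longer listing would need to follow the NextContinuationToken element in the response.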
