natural language processing
Japanese dictionaries and word embeddings for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of pretrained Japanese word embeddings (word vectors), trained on the ultra-large-scale web corpus NWJC by the National Institute for Japanese Language and Linguistics, with the corpus text analyzed by Sudachi.
The dictionaries are updated every few months to add neologisms and fixes for existing words.
Apache-2.0
https://worksapplications.github.io/Sudachi/
See all datasets managed by Works Applications.
arn:aws:s3:::sudachi
ap-northeast-1
aws s3 ls s3://sudachi/ --no-sign-request
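The same anonymous listing can be sketched in Python with only the standard library, for environments where the AWS CLI is not installed. This is a minimal sketch, not an official client: the function names (`list_url`, `parse_listing`, `list_bucket`) are hypothetical, and it assumes the bucket allows unsigned listing over the standard S3 virtual-hosted-style endpoint, as the `--no-sign-request` command above suggests. It reads only the first page of results and does not follow continuation tokens.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "sudachi"          # bucket name taken from arn:aws:s3:::sudachi
REGION = "ap-northeast-1"   # region stated in this entry
# XML namespace used by S3 list responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(prefix="", delimiter="/"):
    """Build an unsigned ListObjectsV2 URL for the bucket (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (common prefixes, object keys) from a ListBucketResult document."""
    root = ET.fromstring(xml_data)
    prefixes = [e.text for e in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_bucket(prefix=""):
    """Fetch the first page of the listing anonymously (needs network access)."""
    with urllib.request.urlopen(list_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

Calling `list_bucket()` returns the top-level "folders" and objects, roughly what the `aws s3 ls` command prints; passing one of the returned prefixes back in as `prefix` descends one level.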