Company
Date Published
Author
Kiju Kim • Jim Ferenczi
Word count
2025
Language
-
Hacker News points
None

Summary

In this blog post, Kiju Kim and Jim Ferenczi introduce Nori, the official Elasticsearch plugin for Korean language analysis. Nori is built on a Lucene module that was initially derived from the Japanese morphological analyzer. It segments Korean text using the mecab-ko-dic dictionary, which is converted into a compressed binary format that is structured for fast lookups and is significantly smaller than the original. To choose a segmentation, the plugin uses the Viterbi algorithm to find the most likely path through the candidate tokens, computing transition costs dynamically, which improves throughput. In benchmarks against other Korean analysis plugins, Seunjeon and Arirang, Nori delivers higher indexing throughput, handling over 3000 documents per second without significant failures. The article closes by noting the ongoing effort to improve language support in Lucene and Elasticsearch, and encourages readers to try Nori using the latest Elasticsearch documentation and releases.
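
To make the Viterbi step concrete, here is a minimal Python sketch of lowest-cost segmentation over a dictionary lattice. It is not Nori's implementation: the lexicon entries, tags, word costs, and connection costs below are all hypothetical values chosen for illustration, and the sketch does not handle words missing from the dictionary, which a real analyzer has to do.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> list of (tag, word_cost).
# Lower cost means "more likely". These numbers are invented.
LEXICON: Dict[str, List[Tuple[str, int]]] = {
    "가": [("NNG", 500)],
    "곡": [("NNG", 500)],
    "역": [("NNG", 200)],
    "가곡": [("NNG", 100)],
    "곡역": [("NNG", 900)],
}

# Hypothetical connection (transition) costs between adjacent tags.
# "BOS" and "EOS" mark the beginning and end of the text.
CONNECTION: Dict[Tuple[str, str], int] = {
    ("BOS", "NNG"): 0,
    ("NNG", "NNG"): 50,
    ("NNG", "EOS"): 0,
}


def segment(text: str) -> Tuple[List[Tuple[str, str]], int]:
    """Return the lowest-cost list of (surface, tag) tokens and its total cost."""
    n = len(text)
    # best[pos][tag] = (cost of the cheapest path ending at pos with this tag,
    #                   back pointer (start, previous_tag, surface))
    best: List[dict] = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)

    for start in range(n):
        if not best[start]:
            continue  # no path reaches this position
        for end in range(start + 1, n + 1):
            surface = text[start:end]
            for tag, word_cost in LEXICON.get(surface, []):
                for prev_tag, (prev_cost, _) in best[start].items():
                    conn = CONNECTION.get((prev_tag, tag))
                    if conn is None:
                        continue  # this tag pair is not allowed
                    cost = prev_cost + conn + word_cost
                    if tag not in best[end] or cost < best[end][tag][0]:
                        best[end][tag] = (cost, (start, prev_tag, surface))

    # Close the path with the end-of-text transition.
    final = None
    for tag, (cost, _) in best[n].items():
        conn = CONNECTION.get((tag, "EOS"))
        if conn is None:
            continue
        if final is None or cost + conn < final[0]:
            final = (cost + conn, tag)
    if final is None:
        raise ValueError("no segmentation found with this lexicon")

    # Walk the back pointers from the end to recover the tokens.
    total, tag = final
    tokens: List[Tuple[str, str]] = []
    pos = n
    while pos > 0:
        _, (start, prev_tag, surface) = best[pos][tag]
        tokens.append((surface, tag))
        pos, tag = start, prev_tag
    tokens.reverse()
    return tokens, total


if __name__ == "__main__":
    print(segment("가곡역"))
```

With these invented costs, the path 가곡 + 역 (100 + 50 + 200 = 350) beats 가 + 곡역 (500 + 50 + 900 = 1450), so the sketch picks the two-token split. The point is only to show how word costs and transition costs combine along a path; the actual cost values come from the dictionary.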