Korean-Wikipedia words frequency list

This page provides a list of Korean words ordered by frequency.

The list is available here (ZIP format, around 26 megabytes):

This list was generated by a simple Perl program from a dump of the Korean-language Wikipedia retrieved on December 1, 2014.

Each line contains two fields: a Korean word and the number of occurrences of that word in Wikipedia. The list is sorted by number of occurrences, in descending order.
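A minimal Python sketch for reading the list, assuming that the two fields are separated by whitespace (a tab or spaces), that the archive contains a single UTF-8 text file, and that its member name is unknown (so the first member is taken). The function names are hypothetical.

```python
import io
import zipfile
from typing import Iterable, Iterator, List, Tuple


def parse_frequency_list(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs, one per non-blank line.

    Assumption: fields are whitespace-separated; the separator is not
    specified on this page.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        word, count = line.split(None, 1)
        yield word, int(count)


def load_from_zip(source) -> List[Tuple[str, int]]:
    """Parse the first member of a ZIP archive (path or binary file object)."""
    with zipfile.ZipFile(source) as zf:
        member = zf.namelist()[0]  # assumption: single-file archive
        with zf.open(member) as raw:
            # "utf-8-sig" also accepts a leading byte-order mark, if any.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig")
            return list(parse_frequency_list(text))


def is_descending(pairs: List[Tuple[str, int]]) -> bool:
    """Check the documented ordering: counts never increase down the list."""
    counts = [count for _, count in pairs]
    return all(a >= b for a, b in zip(counts, counts[1:]))
```

Loading the whole list into memory is fine for a one-off look; for a 26 MB archive, iterating over `parse_frequency_list` directly avoids holding every pair at once.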

Words are defined in a very simple way: each maximal run of consecutive hangeul characters is considered a word. A word ends when the parser encounters a space, a punctuation mark, or any other character that the Unicode standard does not define as hangeul.

For example, '오신', '전', '계십니까' and '행동이야말로' are words.
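This word definition can be sketched in Python with a regular expression. Python's `re` module has no `\p{Hangul}` class, so the character ranges below are an assumption approximating the main Unicode Hangul blocks, not the original Perl code. Ties in the sorted output are broken alphabetically here; the original tie order is not documented.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed approximation of "characters defined as hangeul in Unicode".
HANGUL_RUN = re.compile(
    "["
    "\u1100-\u11ff"  # Hangul Jamo
    "\u3130-\u318f"  # Hangul Compatibility Jamo
    "\ua960-\ua97f"  # Hangul Jamo Extended-A
    "\uac00-\ud7af"  # Hangul Syllables
    "\ud7b0-\ud7ff"  # Hangul Jamo Extended-B
    "\uffa0-\uffdc"  # halfwidth Hangul letters
    "]+"
)


def hangul_words(text: str) -> List[str]:
    """Return maximal runs of Hangul; any other character ends a word."""
    return HANGUL_RUN.findall(text)


def frequency_list(texts: Iterable[str]) -> List[Tuple[str, int]]:
    """Count words over all texts, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(hangul_words(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

As the examples above show, particles and endings stay attached to the stem under this definition: 행동이야말로 counts as one word, distinct from 행동.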

Use at your own risk: this is not a scientific work! Its only purpose was to provide a source for my HangulDrill program. However, if you find it useful, you may freely use this list under the terms of the license used by the Korean Wikipedia: Creative Commons Attribution-ShareAlike 3.0 Unported License (https://creativecommons.org/licenses/by-sa/3.0/).

If you do, it would be nice to credit the source, for example as follows:

Source: Korean Wikipedia words frequency list (http://hanguk.thbz.org/koreanwordlist)

Thierry Bézecourt, February 2015.