Guidance to Pre-tokenization for SacreBLEU: Meta-Evaluation in Korean

Ahrii Kim; Jinhyun Kim

doi:10.20944/preprints202201.0018.v1

Submitted: 02 January 2022

Posted: 04 January 2022


Abstract

SacreBLEU, by incorporating a text normalization step into its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a meaningful result without customized pre-tokenization. This paper therefore examines the influence of different pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment varies consistently with the token type. Some tokenization schemes even reduce the reliability of the metric, and MeCab is no exception. In guiding the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
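To make the token levels concrete, the following is a minimal sketch of word-, character-, and Jamo-level pre-tokenization using only the Python standard library. It is an illustration and not the authors' pipeline: the function name `pretokenize` and the `level` labels are hypothetical, the Jamo split assumes Unicode NFD decomposition of precomposed Hangul syllables, and the morpheme and subword levels are omitted because they require external tools (for example, MeCab for morphemes).

```python
import unicodedata


def pretokenize(sentence: str, level: str) -> str:
    """Re-space a Korean sentence so that splitting on whitespace
    yields tokens at the requested level (hypothetical helper)."""
    words = sentence.split()
    if level == "word":
        # Whitespace-delimited words (eojeol), normalized to single spaces.
        return " ".join(words)
    if level == "char":
        # One token per syllable block; original word boundaries are dropped.
        return " ".join(ch for word in words for ch in word)
    if level == "jamo":
        # NFD splits each precomposed Hangul syllable into its conjoining
        # Jamo (initial consonant, vowel, optional final consonant).
        return " ".join(
            ch for word in words for ch in unicodedata.normalize("NFD", word)
        )
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    example = "나는 학교에 간다"
    for level in ("word", "char", "jamo"):
        tokens = pretokenize(example, level).split()
        print(level, len(tokens), tokens)
```

The pre-tokenized hypotheses and references would then be scored with SacreBLEU's own tokenization turned off, so that only the chosen token level affects the n-gram matching. Whether word boundaries should be preserved with a marker at the character level is a design choice that this sketch does not address.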

Keywords: NMT Evaluation; Meta-Evaluation; SacreBLEU; Korean

Subject: Computer Science and Mathematics - Computer Science

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
