Preprint

Exploiting Linguistic Knowledge for Low-Resource Neural Machine Translation

This version is not peer-reviewed

Submitted: 29 February 2020
Posted: 02 March 2020

Abstract
Exploiting the linguistic knowledge of the source language for neural machine translation (NMT) has recently achieved impressive performance on many large-scale language pairs. However, since the Turkish→English machine translation task is low-resource and the source-side Turkish is morphologically rich, only limited bilingual corpora and linguistic resources are available to further improve NMT performance. Focusing on these issues, we propose a multi-source NMT approach that models word features in parallel with external linguistic features, using two separate encoders to explicitly incorporate linguistic knowledge into the NMT model. We extend the word embedding layer of the knowledge-based encoder to accommodate each word’s linguistic annotations in context. Moreover, we share all parameters across encoders to enhance the NMT model’s representation of the source language. Experimental results show that our proposed approach achieves substantial improvements of up to 2.4 and 1.1 BLEU in the Turkish→English and English→Turkish machine translation tasks, respectively, which points to a promising way to utilize source-side linguistic knowledge for low-resource NMT.
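The abstract describes extending the word embedding layer so that each word vector also carries its linguistic annotation (e.g. a POS or morphology tag), with encoder parameters shared across the word stream and the knowledge stream. The paper's exact architecture and dimensions are not given here, so the following is only a minimal NumPy sketch of that idea under assumed toy sizes; `embed` concatenates a word embedding with a feature embedding, and a single shared projection stands in for the shared encoder parameters.

```python
import numpy as np

# Hypothetical sizes; the abstract does not specify any dimensions.
VOCAB, FEAT_VOCAB, DIM = 1000, 50, 8

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(VOCAB, DIM))       # word embedding table
feat_emb = rng.normal(size=(FEAT_VOCAB, DIM))  # linguistic-feature embedding table

def embed(word_ids, feat_ids):
    """Extended embedding: concatenate each word's embedding with the
    embedding of its linguistic annotation (POS/morphology tag)."""
    return np.concatenate([word_emb[word_ids], feat_emb[feat_ids]], axis=-1)

# Stand-in for shared encoder parameters: the same projection would be
# applied to both encoders' inputs so they learn a common representation.
W_shared = rng.normal(size=(2 * DIM, DIM))

def encode(x):
    return np.tanh(x @ W_shared)

sent_words = np.array([3, 17, 42])  # toy source word ids
sent_feats = np.array([1, 4, 2])    # toy linguistic-annotation ids
h = encode(embed(sent_words, sent_feats))
print(h.shape)  # (3, 8): one DIM-sized state per source token
```

In a real system the shared projection would be a full recurrent or Transformer encoder and the embeddings would be trained jointly with the translation objective; this sketch only illustrates how annotations enter through the embedding layer.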
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.