A peer-reviewed article of this preprint also exists.
Abstract
A novel method for finding linear mappings among word embeddings for several languages, taking as pivot a shared, universal embedding space, is proposed in this paper. Previous approaches learn translation matrices between two specific languages, but this method learn translation matrices between a given language and a shared, universal space. The system was first trained on bilingual, and later on multilingual corpora as well. In the first case two different training data were applied; Dinu’s English-Italian benchmark data, and English-Italian translation pairs extracted from the PanLex database. In the second case only the PanLex database was used. The system performs on English-Italian languages with the best setting significantly better than the baseline system of Mikolov et al. [1], and it provides a comparable performance with the more sophisticated systems of Faruqui and Dyer [2] and Dinu et al. [3]. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
Keywords:
Subject:
Computer Science and Mathematics - Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Alerts
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.