Preprint
Article

Towards a Universal Semantic Dictionary

This version is not peer-reviewed.

Submitted: 25 July 2019

Posted: 29 July 2019

A peer-reviewed article of this preprint also exists.

Abstract
This paper proposes a novel method for finding linear mappings among word embeddings for several languages, using a shared, universal embedding space as the pivot. Previous approaches learn translation matrices between two specific languages, whereas this method learns translation matrices between a given language and a shared, universal space. The system was first trained on bilingual and later on multilingual corpora. In the bilingual case, two different training sets were used: Dinu’s English-Italian benchmark data, and English-Italian translation pairs extracted from the PanLex database. In the multilingual case, only the PanLex database was used. With its best setting, the system performs significantly better on the English-Italian language pair than the baseline system of Mikolov et al. [1], and its performance is comparable to that of the more sophisticated systems of Faruqui and Dyer [2] and Dinu et al. [3]. By exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
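The pivot idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's training procedure: it assumes orthogonal mapping matrices fitted by an iterated Procrustes alignment, synthetic toy embeddings in place of real word vectors and PanLex translation pairs, and cosine nearest-neighbour retrieval in the shared space. All function and variable names (`procrustes`, `learn_pivot_maps`, `translate`, `toy`) are hypothetical.

```python
import numpy as np

def procrustes(X, U):
    """Orthogonal matrix T minimising ||X @ T - U||_F (orthogonal Procrustes)."""
    A, _, Bt = np.linalg.svd(X.T @ U)
    return A @ Bt

def learn_pivot_maps(embeddings, n_iter=5):
    """Learn one matrix per language that maps it into a shared space.

    embeddings: dict lang -> (n_pairs, dim) array whose rows are aligned,
    i.e. row i in every language encodes the same meaning.
    Assumption: the shared space is initialised from the first language and
    re-estimated as the mean of all mapped languages in every iteration.
    """
    langs = list(embeddings)
    U = embeddings[langs[0]].copy()
    maps = {}
    for _ in range(n_iter):
        maps = {l: procrustes(X, U) for l, X in embeddings.items()}
        U = np.mean([embeddings[l] @ maps[l] for l in langs], axis=0)
    return maps

def translate(i, src, tgt, embeddings, maps):
    """Index of the target word closest (by cosine) to source word i in the shared space."""
    q = embeddings[src][i] @ maps[src]
    C = embeddings[tgt] @ maps[tgt]
    sims = (C @ q) / (np.linalg.norm(C, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

# Toy data: two "languages" that are noisy rotations of one latent space.
rng = np.random.default_rng(0)
n, d = 50, 10
Z = rng.normal(size=(n, d))

def random_rotation():
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

toy = {lang: Z @ random_rotation() + 0.01 * rng.normal(size=(n, d))
       for lang in ("en", "it")}

maps = learn_pivot_maps(toy)
precision_at_1 = np.mean([translate(i, "en", "it", toy, maps) == i
                          for i in range(n)])
```

Because every language gets its own matrix into the shared space, adding a further language to this sketch means adding one more entry to `embeddings` rather than learning a new matrix for every language pair.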
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
