Preprint
Article

What Do We Learn from Word Associations? Evaluating Machine Learning Algorithms for the Extraction of Contextual Word Meaning in Natural Language Processing

Altmetrics

Downloads

2055

Views

1070

Comments

0

Submitted:

04 May 2018

Posted:

07 May 2018

Read the latest preprint version here

Alerts
Abstract
You should know the words by the company they keep!” has been one of the most famous slogans attribute to John Rubert Firth, 1957. This has ignited a whole school in linguistic research known as the British empiricist contextualism. Sixty years later, many un- or semi-supervised machine learning algorithms have been successfully designed and implemented aiming at extracting word meaning from within the context of a text corpus. These algorithms treat words, more or less, as vectors of real numbers representing frequencies of word occurrences within context and word meaning as positions of words in a high-dimensional vector space models. Word associations, in turn, are treated as calculated distances among them. With the rise of Deep Learning (DL) and other artificial neural networks based architectures, learning the positioning of words and extracting word associations as measured by their distances has further improved. In this paper, however, we revisited the main stream of algorithmic approaches and set the stage for a partly cross-disciplinary evaluation framework to judge about the nature of the extracted word associations by state-of-the-art machine learning algorithms. Our preliminary results, which are based on word associations extracted from the application of DL framework on a Google News text corpus, and comparisons with human created word association lists, provide some insights into the inherited limitations in interpreting the type of word associations and underpinning relations between words with inevitable consequences in other areas, such as extraction of knowledge graphs or image understanding.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

© 2024 MDPI (Basel, Switzerland) unless otherwise stated