In this paper, we study the effects of applying classical clustering algorithms, such as k-Means, to free-text recommender systems. A typical recommender system may face problems when the number of items in its database grows from a few to hundreds. Currently, one of the most prominent techniques for scaling such a system is clustering; however, clustering may have a negative impact on the accuracy of the system when applied without taking the underlying items into consideration. In this work, we build a conceptual text recommender system and use k-Means to partition its search space into different groups. We study how varying the number of clusters affects its performance in light of two measurements: recommendation time and precision. We also analyze whether this clustering is affected by the text representation we use. All the techniques used in this study use word embeddings to represent the documents. One of the main findings of this work is that, by using clustering, we can improve the recommendation time by up to almost 30 times without greatly affecting the initial accuracy. Another interesting finding is that increasing the number of clusters does not translate directly into a linear improvement in performance.
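The idea of partitioning the search space can be illustrated with a minimal sketch. It is not the system built in this work: the toy two-dimensional vectors stand in for document embeddings, and the function names (`kmeans`, `recommend`), the evenly spaced centroid initialization, and the use of squared Euclidean distance are all assumptions made for illustration. The point it shows is that, once items are clustered, a query is compared against the k centroids and then only against the items of the nearest cluster, rather than against the whole database.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's algorithm.

    Assumption of this sketch: centroids start from k evenly spaced
    items, which keeps the example deterministic. Practical
    implementations usually use a randomized scheme such as k-means++.
    """
    n = len(vectors)
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(vectors, centroids)


def recommend(query, vectors, centroids, labels, top_n=1):
    """Search only the cluster whose centroid is nearest to the query."""
    nearest = min(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for i, l in enumerate(labels) if l == nearest]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:top_n]


# Hypothetical toy "embeddings": two well-separated groups of documents.
docs = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
        [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]]
centroids, labels = kmeans(docs, k=2)
best = recommend([5.0, 5.1], docs, centroids, labels, top_n=1)
```

In this sketch the query is scored against two centroids and three documents instead of all six documents. The trade-off mentioned above is also visible: an item that is relevant but falls in a different cluster is never considered, which is how clustering can reduce precision.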
Subject: Computer Science and Mathematics - Information Systems
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.