Preprint · Article · Version 1 · This version is not peer-reviewed
Unsupervised Modeling of E-Customers’ Profiles: Multiple Correspondence Analysis with Hierarchical Clustering of Principal Components and Machine Learning Classifiers
Version 1: Received: 4 November 2024 / Approved: 5 November 2024 / Online: 6 November 2024 (12:23:35 CET)
How to cite:
Vrhovac, V.; Orošnjak, M.; Ristić, K.; Sremcev, N.; Jocanović, M.; Spajić, J.; Brkljač, N. Unsupervised Modeling of E-Customers’ Profiles: Multiple Correspondence Analysis with Hierarchical Clustering of Principal Components and Machine Learning Classifiers. Preprints 2024, 2024110363. https://doi.org/10.20944/preprints202411.0363.v1
APA Style
Vrhovac, V., Orošnjak, M., Ristić, K., Sremcev, N., Jocanović, M., Spajić, J., & Brkljač, N. (2024). Unsupervised Modeling of E-Customers’ Profiles: Multiple Correspondence Analysis with Hierarchical Clustering of Principal Components and Machine Learning Classifiers. Preprints. https://doi.org/10.20944/preprints202411.0363.v1
Chicago/Turabian Style
Vrhovac, V., Jelena Spajić and Nebojša Brkljač. 2024. "Unsupervised Modeling of E-Customers’ Profiles: Multiple Correspondence Analysis with Hierarchical Clustering of Principal Components and Machine Learning Classifiers." Preprints. https://doi.org/10.20944/preprints202411.0363.v1
Abstract
The rapid growth of e-commerce has transformed customer behaviours, demanding deeper insights into how demographic factors shape online user preferences. To understand the impact of these changes, this study performs a threefold analysis. First, the study investigates how demographic factors (e.g., age, gender, education, income) influence e-customer preferences in Serbia. Using a sample of n = 906 respondents, we test conditional dependencies between demographics and four user preferences: “purchase frequency”, “the most important property when buying for the first time”, “the most important property before repeating a purchase”, and “reasons for quitting an online purchase”. Of the 24 hypotheses tested, the null hypothesis is rejected in 8 (p < 0.05), with statistically significant associations of demographics with purchase frequency (p < 0.01) and with reasons for quitting the purchase (p < 0.01). However, although the reported test statistics indicate an association, they do not show how interactions between categories shape e-customer profiles. The second part therefore applies MCA-HCPC (Multiple Correspondence Analysis with Hierarchical Clustering on Principal Components) to identify user profiles. The analysis reveals three main clusters: (1) young female unemployed e-customers driven mainly by customer reviews; (2) retirees and older adults with infrequent purchases, hesitant to buy without experiencing the product in person; (3) employed, highly educated, male midlife adults who prioritise fast and accurate delivery over price. In the third stage, the study uses the identified clusters as labels for Machine Learning (ML) classification with the following algorithms: Gradient Boosting Machine (GBM), Decision Tree (DT), k-Nearest Neighbors (kNN), Gaussian Naïve Bayes (GNB), Random Forest (RF) and Support Vector Machine (SVM). The results suggest high classification performance of GBM (AUROC = 0.994), RF (AUROC = 0.994) and SVM (AUROC = 0.902) in identifying user profiles. Lastly, Permutation Feature Importance (PFI) suggests that age, work status, education, and income are the main determinants of e-customer profiles and are therefore the most relevant for developing marketing strategies.
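The first stage of the analysis, testing whether a demographic variable and a preference variable are associated, can be illustrated with a minimal sketch. The abstract does not name the test statistic, so this example assumes a Pearson chi-square test of independence and, to stay within the standard library, estimates the p-value by permutation rather than from the chi-square distribution. The variable names and the toy counts are hypothetical and are not taken from the study's data.

```python
import random
from collections import Counter


def chi_square_statistic(xs, ys):
    """Return the Pearson chi-square statistic and degrees of freedom
    for two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate the p-value by shuffling ys, which breaks any association
    with xs while preserving both marginal distributions."""
    rng = random.Random(seed)
    observed_stat, _ = chi_square_statistic(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        stat, _ = chi_square_statistic(xs, shuffled)
        if stat >= observed_stat:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data (NOT from the study): 80 respondents,
# two age groups crossed with two purchase-frequency levels.
toy_counts = {
    ("young", "frequent"): 30,
    ("young", "rare"): 10,
    ("older", "frequent"): 10,
    ("older", "rare"): 30,
}
age_group, purchase_frequency = [], []
for (age, freq), count in toy_counts.items():
    age_group += [age] * count
    purchase_frequency += [freq] * count

stat, dof = chi_square_statistic(age_group, purchase_frequency)
p_value = permutation_p_value(age_group, purchase_frequency)
print(f"chi2 = {stat:.2f}, dof = {dof}, permutation p = {p_value:.4f}")
```

A small p-value here only indicates that the two variables are not independent; it says nothing about which categories drive the association, which is the gap the MCA-HCPC stage is meant to address.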
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.