Preprint
Article

Predict the Shopping Trip (Online and Offline) Using a Combination of a Gray Wolf Optimization Algorithm (GWO) and a Deep Convolutional Neural Network: A Case Study of Tehran, Iran

Submitted:

08 September 2023

Posted:

12 September 2023

Abstract
Online and offline shopping trips have different impacts on various aspects of urban life, such as e-commerce, transportation systems, and sustainability. Therefore, it is important to evaluate the factors that influence the choice between them. We use a hybrid machine learning model that combines a gray wolf optimization (GWO) algorithm and a deep convolutional neural network (CNN) to estimate the shopping trip type based on a survey of 1,000 active e-commerce users who placed successful orders through both online and offline services in the last 20 days of 2021 in areas 2 and 5 of Tehran. The GWO algorithm performs feature selection and hyperparameter tuning for the CNN, a deep learning model widely used for image recognition and classification. The results show that our model achieves an accuracy of 97.81% with an MSE of 0.325 by selecting seven out of ten features. The most important features are delivery cost, delivery time, product price, and car ownership. In addition, a comparison with other methods showed that the proposed algorithm reached an accuracy of 97.81%, whereas the single deep learning model, MLP neural network, decision tree, and KNN models reached 95.63%, 90.0%, 86.49%, and 80.16%, respectively.
Keywords: 
Subject: Engineering  -   Transportation Science and Technology

1. Introduction

Many statistics show that, after business trips, shopping trips have the highest demand. Therefore, optimizing, replacing, or modifying such trips through appropriate policies and planning based on information and communication technology (ICT) can have a positive effect on transportation systems [1,2]. Travel demand and trip generation, and their effect on the urban transportation system, are of special importance in transportation planning, especially in a country like Iran, which faces severe traffic congestion in many of its cities and provinces. Studies show that ICT has completely changed the way people live and has had a wide influence on society worldwide, from work and business to shopping and entertainment. The management of information resources is of great importance in different parts of Iran, especially in the e-commerce sector. In fact, ICT refers to advanced information and telecommunication technology used in various sectors (government and public agencies, transportation, telecommunications, banking, insurance, hotels, engineering units, and other businesses) that rely on electronic exchanges in their activities.
Customers are the main driver of business dynamics on the internet. Although online business has grown significantly compared to previous years, some users still do not trust online shopping and prefer to visit a physical store for the products they need (offline shopping trips). Therefore, attracting positive opinions from customers, in addition to motivating managers, leads to indirect marketing of their products. Perhaps these conditions have led the Industrial Management Organization to rate the Internet companies active in this sector according to the opinions and views of customers, in order to strengthen users’ trust in e-commerce [3].
The type of shopping trip depends on whether the purchase is made online or offline. For Internet-based suppliers, the logistics system is the most important part of the network; it covers transportation, handling, processing, and access to logistics information for coordinating transportation and delivery, ordering and manufacturing processes, order changes, production scheduling, logistics plans, and warehousing operations. In offline purchases, on the other hand, the types of trips and the customers’ use of transportation systems raise environmental issues in addition to economic ones. For this reason, estimating the type of trip is the main topic of this paper. Because a large number of features related to shopping trips are collected in both online and offline shopping, it is necessary to use new methods based on artificial intelligence and computing technologies [4].
In the era of e-commerce, information about online shopping trips, mainly from private customers, is collected through online platforms, which creates a dynamic and uncertain environment because customer addresses are not known in advance. In addition, the number of orders is often much higher than in the traditional distribution services considered in routing problems, and customers demand a wide range of trips [5].
Since online customers often expect fast delivery, their orders should be processed immediately upon registration. All operations required to deliver orders to customers (order processing and pickup, long-term shipping, and delivery) are compressed into a short period of time. As a result, orders often arrive at the distribution system after existing (previous) trips have already started the distribution process, so they need to be integrated into the current delivery plan [6].
In online shopping, the logistics system generates a wide range of trips depending on customer addresses. The most important issue is delivering orders to the most distant customers (optimization of batched trips), which sometimes leads to customer complaints because of delays or the failure of the product to arrive at the promised time. This factor creates many challenges in choosing between online and offline shopping trips. In general, the advantages of online shopping from the customers’ point of view are saving time, saving travel costs, using product discounts, buying at any hour of the day, not standing in line, and avoiding congestion. On the other hand, online shopping has disadvantages such as the risk of fraud, delays in delivery, the lack of physical examination of the product, hidden delivery fees, and the long procedure for returning a product, which lead many customers to visit a physical store for their needs. Naturally, each of these two types of shopping (online and offline) involves shopping trips that must be evaluated. It is therefore necessary to predict the type of shopping trip, made either by the customer (offline trips) or by companies supplying products over the Internet (online trips), using the factors that influence customers’ choice between online and offline shopping. This need motivates the use of methods based on artificial intelligence and machine learning. By identifying and prioritizing the features affecting the generation of online and offline shopping trips, we can play a significant role in optimizing delivery costs, reducing pollutant emissions, reducing urban traffic, increasing user satisfaction, and contributing to sustainable development. Table 1 shows the number of online and offline shopping trips in one day in Tehran [7].
Figure 1 and Figure 2 show the distribution of online shopping trips across the 22 districts of Tehran.
Estimating the number of online and offline shopping trips is one of the most important challenges, and this research addresses it by applying machine learning models.

2. Literature Review

Because of the importance of estimating the number of online and offline shopping trips, various studies have been conducted on estimating shopping trips.
Shao et al. [6] assessed the effects of physical and virtual accessibility on e-commerce based on the geographic location of buyers. They used a spatial autoregressive model (SAC) to examine how physical and virtual accessibility influence the spatial distribution of online shopping trips in 276 provincial-level cities in China. The results indicate that both physical access (measured by the relative number of shopping centers and public transportation systems) and virtual access (measured by the percentage of broadband subscribers and the relative number of delivery points) increase online shopping trips.
Dong et al. [8] applied a machine learning approach to estimate customer behavior for a large multipurpose online store between October and November 2019. They found that the pipeline and random forest algorithms had the highest performance with 96% accuracy. They also showed that the indicators of busyness and product price comparison had the greatest impact on increasing the online shopping trip intention.
Xiong [9] examined consumer behavior in online shopping in the context of artificial intelligence and digital economy. The main focus of this paper is on the factors that affect the online shopping trip intention within a day. Based on the data collected by a questionnaire, the paper found that online shopping trip was prevalent among all age groups in China, with young people being the majority.
Xiahou and Harada [10] explored the online and offline shopping behavior using machine learning techniques and longitudinal and multidimensional data variables. They proposed a churn user prediction model based on the combination of k-means customer segmentation and support vector machine (SVM). The results indicated that the online shopping trip intention was higher than the offline one. They also found that the SVM method had higher accuracy than the logistic regression method.
Lee et al. [11] applied and compared different machine learning algorithms to predict online shopping trip conversion using 374,749 online consumer behavior records from the Google product store. They found that the extreme gradient boosting ensemble model was the most suitable method for predicting online shopping trip conversion, and that oversampling was the best method for reducing the bias caused by data imbalance.
Espinoza et al. [12] examined consumer behavior in online and offline shopping trips in the context of the coronavirus pandemic. They used primary data from a structured questionnaire and an online survey to collect 200 heterogeneous types of products, and they investigated the factors that influenced people’s purchase choices. They found that the respondents’ skill level in using the Internet, among various technological factors, had a significant effect on their preference for the mode of shopping trip. They also found that factors such as quick product information, wider product selection, better prices and discounts influenced customers to choose online shopping trips, while faster delivery time and reliability and accuracy of product quality influenced consumers to choose offline shopping trips.
Chawla et al. [13] used artificial neural networks to predict offline shopping trip demand for an American retail company. They developed a comparative forecasting mechanism based on ANN and ANFIS techniques to handle the trip demand forecasting problem under fuzzy conditions. They evaluated the results and showed that the ANFIS method was more effective than the ANN structure in producing more reliable forecasts for their case study.
Shi et al. [14] proposed an approach to improve support service decision-making by predicting offline shopping trip interactions and intentions in real time using historical time series data. They analyzed real-time consumer behavior data of offline customers. They confirmed that context-aware interaction could greatly enhance consumers’ shopping experience in the offline scenario. A summary of the literature review is shown in Table 2.
The literature review shows that, despite the importance of the problem and the research done in this field, most studies focused on customer behavior and demand estimation, and some aspects have not received enough attention. Moreover, the choice between online and offline shopping has not been treated as a multi-criteria decision-making problem; the two modes have only been investigated and analyzed separately. Therefore, the contributions of this paper can be stated as follows:
  • Using a deep learning approach tuned by gray wolf optimization to estimate online and offline shopping trips
  • Examining and prioritizing the factors that affect the generation of online and offline shopping trips
Based on the detailed analysis of previous studies, some of which were discussed in this section, the main method of this research is a deep learning model, specifically a deep convolutional neural network (CNN), that classifies shopping trip features in order to detect the type of shopping trip. The model takes numerical input vectors, transforms them with several neurons in each layer, and passes the output vector to the next layer. The effectiveness of a deep learning model comes from its high capacity to learn complex nonlinear patterns from input data. However, as the complexity of the layers increases, the efficiency of the model decreases. One of the most important techniques for addressing this problem is to optimize the deep learning model through hyperparameter tuning and feature selection with meta-heuristic algorithms.
Most high-dimensional data sets, such as time series data sets, contain redundant features, outliers, and noisy data, so feature selection is one of the most important steps for increasing classification performance and reducing operating costs. Finding the optimal features is a challenging problem, especially in a large search space. The main goal of feature selection is to find the smallest set of critical features that provides enough insight to describe the data set. In general, feature selection can save processing time, make data interpretation easier, avoid the curse of dimensionality, and reduce overfitting. Removing misleading, irrelevant, and redundant features decreases the computational load and speeds up data processing.
Among meta-heuristic algorithms, the gray wolf algorithm is of great importance because it searches the solution space extensively. In gray wolf hunting, three groups of wolves (alpha, beta, and delta) coordinate the hunt, which here corresponds to identifying relevant features. The alpha wolf leads, followed by beta and delta, and these three are assumed to know the approximate position of the prey. Therefore, the three best solutions found up to a given iteration are stored, and the other wolves update their positions in the decision space according to them. In this way, the selection of features in the search space is refined continuously. The positions of these three vectors determine the wolf update mechanism, which selects the best features from the three best solutions. This mechanism makes the GWO algorithm highly efficient in feature selection.

3. Methodology

This study surveyed 1,000 active e-commerce users in Tehran’s 2nd and 5th regions who had placed successful orders through online and offline services in the last 20 days of the year. The sampling technique was purposive. The factors affecting the mode of travel for purchase, namely age, gender, marital status, car ownership, delivery cost, delivery time, product price, income, employment status, and education level, were measured by questionnaires. Based on the data frequency and Cochran’s formula calculations, 500 questionnaires were collected from online shoppers by sending text messages and 500 questionnaires were collected from offline shoppers by handing them out in shopping centers in these two regions. All the data were transformed into numerical values and served as the algorithm input. The general procedure of the proposed method is shown as a flowchart in Figure 3.
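As an illustration of the kind of calculation Cochran’s formula involves, the following minimal sketch computes a minimum sample size. The confidence level (z = 1.96), the proportion (p = 0.5), the margin of error (e = 0.05) and the function name `cochran_sample_size` are assumptions chosen for the example; the paper does not report the values it used.

```python
import math

def cochran_sample_size(z=1.96, p=0.5, e=0.05, population=None):
    """Minimum sample size from Cochran's formula.

    z, p and e are assumed illustrative values, not the paper's settings.
    If `population` is given, the finite-population correction is applied.
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Large (effectively infinite) population versus a finite one.
print(cochran_sample_size())
print(cochran_sample_size(population=10_000))
```

Under these assumed settings the minimum is a few hundred respondents, so a sample of 500 per group would exceed it.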
Gray Wolf Optimization (GWO) is applied to set the hyperparameters and select the optimal features after the data set has been read and pre-processed. The GWO algorithm emulates the hunting phases of wolves, which include three parts: stalking and encircling the prey, harassing the prey until it stops moving, and finally attacking the prey. Each wolf $i$ in the search space has a position vector $W_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ that corresponds to the $n$ dimensions of the problem. The position of each wolf is evaluated by a fitness function suited to the problem definition. The best wolf is denoted alpha (α), the second best beta (β), and the third best delta (δ). During the hunting (optimization) process, the positions of the wolves are updated based on the positions of these three wolves [19].
The gray wolf optimizer is implemented as follows:
  • Inputs:
    N: number of wolves (population members)
    T: number of iterations of the algorithm
    F: fitness function (according to the problem definition)
  • Output:
    Alpha wolf (α): the best solution obtained
  • Randomly create a population of wolves.
  • Initialize the coefficient vectors A, C and a according to Eqs. (1) and (2). Vector A has random values in the interval [−a, a], which models divergence: when |A| > 1, search agents (wolves) are forced to move away from the prey, and when |A| < 1, they attack it. Vector C contains random values in the interval [0, 2], which helps the agents avoid being trapped in local optima. The components of vector a decrease linearly from 2 to 0 over the iterations of the algorithm.
  • Calculate the fitness value of each search agent (wolf).
  • Among the population, choose the best solution as the alpha wolf, the second best as the beta wolf, and the third best as the delta wolf.
  • If the maximum number of iterations has been reached, return the alpha wolf.
  • Update the positions of the wolves according to the positions of the alpha, beta and delta wolves using Eqs. (3) to (5).
  • Update the coefficient vectors A and C and go to step 3 [19].
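$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{1}$$
$$\vec{C} = 2\vec{r}_2 \tag{2}$$
$$\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}|, \quad \vec{D}_\beta = |\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}|, \quad \vec{D}_\delta = |\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}| \tag{3}$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \tag{4}$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{5}$$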
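The update rules above can be sketched in a few lines of Python. This is a minimal continuous GWO under stated assumptions, not the authors’ implementation: the toy objective `sphere`, the function name `gwo_minimize`, the bounds, the population size and the iteration count are all hypothetical choices made for illustration, and the sketch minimizes the objective rather than maximizing a classification fitness.

```python
import random

def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)

def gwo_minimize(fitness, dim, n_wolves=20, n_iter=100, lower=-10.0, upper=10.0, seed=0):
    """Basic continuous GWO that minimizes `fitness` over [lower, upper]^dim."""
    rng = random.Random(seed)
    # Randomly create the initial population of wolves.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    best = list(min(wolves, key=fitness))
    for t in range(n_iter):
        # Rank the pack; the three best wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if fitness(alpha) < fitness(best):
            best = list(alpha)  # remember the best solution seen so far
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        updated = []
        for x in wolves:
            new_x = []
            for j in range(dim):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a       # coefficient A
                    C = 2.0 * rng.random()               # coefficient C
                    D = abs(C * leader[j] - x[j])        # distance to the leader
                    estimates.append(leader[j] - A * D)  # position suggested by the leader
                value = sum(estimates) / 3.0             # average of the three suggestions
                new_x.append(min(max(value, lower), upper))  # keep inside the bounds
            updated.append(new_x)
        wolves = updated
    final = min(wolves, key=fitness)
    return final if fitness(final) < fitness(best) else best

solution = gwo_minimize(sphere, dim=3)
print(solution, sphere(solution))
```

Keeping a separate best-so-far copy is a design choice of this sketch: because every wolf, including the leaders, moves at each iteration, the best position could otherwise be lost.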
In this algorithm, N is the number of wolves and D is the number of decision variables (dimensions) of the problem. The pack of gray wolves is represented by an N×D matrix, where each row is a potential solution. In the proposed model, N is the number of records and D is the number of features in the data set. The population is defined by Eq. 6. The gray wolf algorithm operates on the data set as follows: each column of the matrix corresponds to one feature, there are N rows, and each row is characterized by its D features:
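$$\text{Population of GWO} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nd} \end{bmatrix} \tag{6}$$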
Each $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, $i = 1, 2, \ldots, n$, in the set is a potential solution in the solution space. A pack of wolves has a group of attacking wolves, which are the elements of a solution. The attacking wolves in a pack move together as a unit, seeking a place with rich resources. A pack of wolves reaches an optimal solution when it finds an ideal position. The objective function evaluates each pack of wolves according to Eq. 7.
$$fit_i = 1 - \frac{Obj_i - worst(Obj)}{best(Obj) - worst(Obj)} \tag{7}$$
In Eq. 7, $fit_i$ is the fitness of pack $i$ and $Obj_i$ is its objective function value. Each pack of wolves is measured by distance criteria. The worst and best parameters are the lowest and highest values among the wolf packs relative to the prey. The proposed model changes the gray wolf algorithm from continuous to discrete: continuous values are converted to binary using a V-shaped transfer function based on the hyperbolic tangent, as in Eqs. 8 and 9. The algorithm then finds the best combination of features by searching the feature space.
$$y_k = |\tanh(x_k)| \tag{8}$$
$$x_{ij} = \begin{cases} 0, & \text{if } rand < y_k \\ 1, & \text{otherwise} \end{cases} \tag{9}$$
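The conversion from a continuous wolf position to a binary feature mask can be sketched as follows. The function name `binarize_position` and the example values are hypothetical; the rule itself follows the V-shaped hyperbolic tangent transfer described above, where a component becomes 0 when a uniform random number falls below $|\tanh(x_k)|$ and 1 otherwise.

```python
import math
import random

def binarize_position(position, rng):
    """Map a continuous position vector to a 0/1 feature mask."""
    mask = []
    for x in position:
        y = abs(math.tanh(x))                      # V-shaped transfer value in [0, 1]
        mask.append(0 if rng.random() < y else 1)  # 0 drops the feature, 1 keeps it
    return mask

rng = random.Random(1)
# x = 0 gives y = 0, so the feature is always kept;
# a very large |x| gives y = 1, so the feature is always dropped.
print(binarize_position([0.0, 50.0, -50.0, 0.3], rng))
```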
The proposed model uses the gray wolf algorithm to select a subset of features that leads to the optimal value. Eq. 10 defines the fitness function for feature selection for each pack of wolves. In Eq. 10, |n| is the total number of features and |S| is the number of selected features. Accuracy is the classification accuracy in percent, and the parameters δ and ρ have fixed values of 1 and 99, respectively.
$$Fitness = \delta \cdot Accuracy + \rho \cdot \frac{|n| - |S|}{|n|} \tag{10}$$
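As a small worked example, the fitness of a candidate feature subset can be computed directly from this definition. The function name `selection_fitness` is hypothetical, the defaults take δ = 1 and ρ = 99 as stated in the text, and the input values (97.81% accuracy with 7 of 10 features selected) are borrowed from the reported results purely to illustrate the arithmetic.

```python
def selection_fitness(accuracy, n_total, n_selected, delta=1.0, rho=99.0):
    """Feature-selection fitness: weighted accuracy plus a reward for dropping features."""
    return delta * accuracy + rho * (n_total - n_selected) / n_total

# Accuracy is given in percent, as in the text.
print(selection_fitness(97.81, n_total=10, n_selected=7))
# Keeping all features earns no reduction reward.
print(selection_fitness(97.81, n_total=10, n_selected=10))
```

With these weights, the second term rewards smaller subsets, so two subsets with equal accuracy are ranked by how many features they discard.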
The data set is split into two parts: training (80% of samples) and testing (20% of samples). The training data builds the model, and the testing data evaluates it by assigning labels and classes to the records. After extracting features from the data, the CNN classifies the records according to the labels learned from the training data. The Convolutional Neural Network (CNN) is one of the most important deep learning methods; it trains multiple layers in a powerful way and is very effective and widely used in computer vision applications. A CNN generally has three main types of layers: convolution layers, pooling layers and fully connected layers. Each layer performs a different task. Training a CNN has two phases: a feed-forward phase and a back-propagation phase. A CNN usually has two parts. The first part alternates convolution and pooling operations to generate deep features from the raw data. The second part connects the features to a classifier. Figure 4 shows a typical CNN architecture for classification with two convolution layers and two pooling layers [20].
Max pooling is usually preferred because it can lead to faster convergence, better generalization and a good selection of invariant features. Another layer in deep neural networks is the fully connected layer. This layer converts the 2D feature maps from the pooling stage into a 1D feature vector. The fully connected layer acts like a traditional artificial neural network and contains about 90% of the CNN parameters. It produces a vector of a specified size that can be used for classification or further processing [21].
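The 80/20 split can be sketched with the standard library alone. The function name `split_records`, the shuffling step and the fixed seed are assumptions made for the example; the paper does not say how records were assigned to the two parts.

```python
import random

def split_records(records, test_fraction=0.2, seed=42):
    """Shuffle the records and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# 1,000 record identifiers standing in for the survey responses.
train, test = split_records(range(1000))
print(len(train), len(test))
```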
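To make the layer sequence concrete, here is a minimal forward pass through one convolution, one max pooling step and one fully connected unit. It is a sketch only: it uses 1D operations on a short numeric vector (an assumption that suits tabular survey features), and the input values, kernel, weights and function names are hypothetical rather than taken from the trained model.

```python
def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation, as in most CNN libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense(values, weights, bias=0.0):
    """Single fully connected unit: weighted sum of the flattened features."""
    return sum(v * w for v, w in zip(values, weights)) + bias

features = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0]   # seven illustrative inputs
feature_map = relu(conv1d(features, [1.0, -1.0]))  # convolution + activation
pooled = max_pool1d(feature_map, size=2)           # keeps the strongest responses
score = dense(pooled, [0.5, 0.5, 0.25])            # input to the classifier
print(feature_map, pooled, score)
```

In a real CNN the kernel and dense weights are learned by back-propagation; here they are fixed so that each step can be followed by hand.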

4. Results

The first step is to present the descriptive statistics of the statistical population. Table 3 shows this information.
The gray wolf algorithm evaluates the data, which consists of 10 features describing the behavior of 1,000 people. Table 4 shows the outcome of applying the gray wolf optimization (GWO) algorithm.
The proposed approach estimates the type of shopping trip after setting the hyperparameters. Table 5 shows the CNN algorithm architecture used after setting the hyperparameters.
With the above settings, Table 6 presents the results of trip type prediction for online and offline purchases.
The next step is to compare the results obtained by the proposed algorithm with other models, namely a single CNN, K-nearest neighbor (KNN), decision tree (DT), and MLP neural network. Table 7 shows the result of this comparison.
The results in Table 7 show that the proposed algorithm performs better than the other algorithms.

5. Conclusion

Online shopping is very popular today, as many customers use smartphones and handheld devices to search for and buy products online. However, the type of shopping trip varies depending on whether the purchase is made online or offline. The logistics system, which integrates transportation, ordering, manufacturing, order changes, production scheduling, logistics plans, and warehousing operations, is a crucial part of the supply chain for online products. The type of travel for offline purchases affects not only the economy but also the environment, as customers use different transportation systems. Therefore, the main topic of this article is to estimate the type of travel for online and offline shopping using time series data and an integrated approach of gray wolf optimization and deep learning. The data were collected from 1,000 users who had placed successful orders through online and offline services in the last 20 days of the year. According to the data frequency and Cochran’s formula calculations, 500 questionnaires were sent to online shoppers by text message and 500 questionnaires were distributed to offline shoppers in shopping centers in areas 2 and 5 of Tehran. The qualitative data were quantified and labeled, and then used as the model input. The candidate factors influencing the type of shopping trip were age, gender, marital status, car ownership, delivery cost, delivery time, product price, income, employment status, and education level. The gray wolf algorithm adjusted the hyperparameter values for the CNN model, which processed and classified the data to estimate the type of shopping trip. The proposed model selected 7 features as the final features: marital status, car ownership, delivery cost, delivery time, product price, income, and employment status. The model predicted the type of trip with an accuracy of 97.81%.
The proposed method outperformed the other models, namely a single CNN, K-nearest neighbor (KNN), decision tree (DT), and MLP neural network, whose accuracies were 95.63%, 90.12%, 86.49% and 80.16%, respectively.

Author Contributions

Conceptualization, MH.D., A.N., and T.A.; methodology, MH.D., A.N., and T.A.; software, MH.D.; validation, MH.D., A.N., and T.A.; formal analysis, MH.D.; investigation, MH.D.; resources, MH.D., A.N., and T.A.; data curation, MH.D.; writing - original draft preparation, MH.D.; writing - review and editing, MH.D.; visualization, MH.D.; supervision, A.N.; project administration, MH.D.; funding acquisition, A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, C.; Wang, Y.; Lv, X.; Li, H. To buy or not to buy? The effect of time scarcity and travel experience on tourists’ impulse buying. Ann. Tour. Res. 2020, 86, 103083.
  2. Candra, S.; Nita, S.; Loang, O.K.; Basmantra, I.N.; Wong, N.K. Consumer Buying Behavior in Online Travel Agent: A Preliminary Finding. In 2022 International Conference on Information Management and Technology (ICIMTech), 2022 Aug 11 (pp. 24-27). IEEE.
  3. Khrais, L.T. Role of Artificial Intelligence in Shaping Consumer Demand in E-Commerce. Future Internet 2020, 12, 226.
  4. Bansal, A.; Srivastava, P. Factors affecting consumer buying behavior of online travel agencies. Elementary Education Online 2021, 20, 2958 https://1017051/ilkonline202101331.
  5. Archetti, C.; Bertazzi, L. Recent challenges in Routing and Inventory Routing: E-commerce and last-mile delivery. Networks 2020, 77, 255–268.
  6. Shao, R.; Derudder, B.; Witlox, F. The geography of e-shopping in China: On the role of physical and virtual accessibility. J. Retail. Consum. Serv. 2021, 64, 102753.
  7. Periodic report of Urban Traffic and Transportation Organization 2020. Available online: https://www.ictte.
  8. Dong, Y.; Tang, J.; Zhang, Z. Integrated Machine Learning Approaches for E-commerce Customer Behavior Prediction. In 2022 7th International Conference on Financial Innovation and Economic Development (ICFIED 2022) (pp. 1008-1015). Atlantis Press.
  9. Xiong, Y. The Impact of Artificial Intelligence and Digital Economy Consumer Online Shopping Behavior on Market Changes. Discret. Dyn. Nat. Soc. 2022, 2022, 1–12.
  10. Xiahou, X.; Harada, Y. B2C E-Commerce Customer Churn Prediction Based on K-Means and SVM. J. Theor. Appl. Electron. Commer. Res. 2022, 17, 458–475.
  11. Lee, J.; Jung, O.; Lee, Y.; Kim, O.; Park, C. A Comparison and Interpretation of Machine Learning Algorithm for the Prediction of Online Purchase Conversion. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 1472–1491.
  12. Wong, K.X.; Wang, Y.; Wang, R.; Wang, M.; Oh, Z.J.; Lok, Y.H.; Khan, N.; Khan, F. Consumer behavior analysis on online and offline shopping during pandemic situation. Journal of Accounting Finance in Asia Pacific (IJAFAP) 2021, 4, 75–87.
  13. Chawla, A.; Singh, A.; Lamba, A.; Gangwani, N.; Soni, U. Demand forecasting using artificial neural networks: A case study of american retail corporation. In Applications of Artificial Intelligence Techniques in Engineering; Springer: Singapore, 2019; pp. 79–89.
  14. Shi, F.; Guegan, C.G. Adapted Decision Support Service Based on the Prediction of Offline Consumers’ Real-Time Intention and Devices Interactions. In 42nd Annual Computer Software and Applications Conference (COMPSAC), 2018 (Vol. 2, pp. 266-271). IEEE.
  15. Jiang, H.; He, M.; Xi, Y.; Zeng, J. Machine-Learning-Based User Position Prediction and Behavior Analysis for Location Services. Information 2021, 12, 180.
  16. Lee, R.J.; Sener, I.N.; Mokhtarian, P.L.; Handy, S.L. Relationships between the online and in-store shopping frequency of Davis, California residents. Transp. Res. Part A: Policy Pr. 2017, 100, 40–52.
  17. Zubaidi, S.L.; Al-Bugharbee, H.; Ortega-Martorell, S.; Gharghan, S.K.; Olier, I.; Hashim, K.S.; Al-Bdairi, N.S.S.; Kot, P. A Novel Methodology for Prediction Urban Water Demand by Wavelet Denoising and Adaptive Neuro-Fuzzy Inference System Approach. Water 2020, 12, 1628.
  18. Punia, S.; Nikolopoulos, K.; Singh, S.P.; Madaan, J.K.; Litsiou, K. Deep learning with long short-term memory networks and random forests for demand forecasting in multi-channel retail. Int. J. Prod. Res. 2020, 58, 4964–4979.
  19. Kamel, S.R.; YaghoubZadeh, R.; Kheirabadi, M. Improving the performance of support-vector machine by selecting the best features by Gray Wolf algorithm to increase the accuracy of diagnosis of breast cancer. J. Big Data 2019, 6, 90.
  20. Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional neural networks for time series classification. J. Syst. Eng. Electron. 2017, 28, 162–169.
  21. Zhang, W.; Yu, Y.; Qi, Y.; Shu, F.; Wang, Y. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning. Transp. A: Transp. Sci. 2019, 15, 1688–1711.
Figure 1. Percentage of online shopping trips in the 22 districts of Tehran.
Figure 2. Number of online shopping trips in the 22 districts of Tehran.
Figure 3. Flowchart of the proposed method.
Figure 4. Typical architecture of a CNN network.
Table 1. Number of trips with the purpose of shopping in one day in Tehran [7].

| Type | Number | Percent (%) |
|---|---|---|
| Online shopping trip | 52,621 | 4.91 |
| Offline shopping trip | 1,018,652 | 95.09 |
| Total | 1,071,273 | 100.00 |
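The percentage column of Table 1 follows directly from the trip counts. As a quick arithmetic check, the short sketch below recomputes the shares; the variable names are illustrative and do not come from the paper.

```python
# Trip counts copied from Table 1; the dictionary keys are illustrative labels.
trips = {"online": 52_621, "offline": 1_018_652}

total = sum(trips.values())
# Share of each trip type, in percent, rounded to two decimals as in the table.
shares = {kind: round(100 * count / total, 2) for kind, count in trips.items()}

print(total)
print(shares)
```

The recomputed total and shares agree with the values reported in the table.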
Table 2. Summary of literature review.

| Author | Objective | Variable | Method | Result |
|---|---|---|---|---|
| [6] | Evaluation of physical and virtual accessibility based on geographic location | Number of physical accesses, public transportation and virtual access | General Spatial Model (SAC) | Increasing physical and virtual access increases shopping trips |
| [8] | Predicting customer behavior in online shopping trips | User behavioral data, such as purchasing new products and staying loyal to a particular product | Pipeline and random forest algorithms | The prediction accuracy was 96%; user ID and user session were the two most important features for predicting the user's next purchase |
| [9] | Assessing the impact of artificial intelligence and consumer online shopping behavior in the digital economy on market changes | Number of trips by age in online services | Statistical analysis based on a questionnaire | Purchases were forecast through the end of 2022 and found to be increasing |
| [10] | Predicting offline shopping trips for customer churn | User behavioral characteristics, such as return rate and conversion rate | K-means clustering, support vector machine (SVM) and logistic regression | SVM had higher accuracy than the other two methods |
| [11] | Comparison and interpretation of different machine learning algorithms for predicting online shopping conversion | Recorded data and user return rate | Neural networks, Extreme Gradient Boosting model | The gradient boosting model, applied with the aim of repeated and effective advertising to users, gave accurate predictions of the change from offline to online purchases |
| [12] | Analysis of consumer behavior in online and offline shopping trips during the coronavirus epidemic | Consumers’ behavior, their relative familiarity with the Internet, and delivery time | Chi-square test and Cronbach’s alpha test | The ability to use the Internet, the variety of products, and the delivery time were identified as indicators influencing online shopping trips |
| [13] | Forecasting offline shopping demand in an American retail company | Number of offline purchases for a product | Artificial neural networks (ANN), ANFIS model | The ANFIS method was more effective than the ANN structure in forecasting trip demand |
| [14] | Predicting offline consumer interactions and intent in real time | User behavioral characteristics, such as return rate and conversion rate | Time series algorithms | Only the accuracy of the algorithm in predicting interactions was investigated |
| [15] | Customer location prediction and machine-learning-based behavior analysis for location services in e-commerce | Location of stores, location of users | LSTM neural network | Adding the speed-accuracy index increased the accuracy of the algorithm |
| [16] | Comparison and interpretation of different machine learning algorithms for predicting online shopping trips | Recorded data and user return rate | Neural networks, Extreme Gradient Boosting model | The gradient boosting model, applied with the aim of repeated and effective advertising to users, gave accurate predictions of the change from offline to online shopping trips |
| [17] | Online and offline shopping trip demand forecasting | Annual sales of a particular product | Artificial Neural Network (ANN), Fuzzy Neural Network (FNN) | The Fuzzy Neural Network was more useful and more accurate for demand forecasting |
| [18] | Online shopping trip demand forecasting in multi-channel retailing | Number of online purchases compared with the number of offline purchases | Deep learning based on long short-term memory (LSTM) networks and random forests | Statistically, the method proposed in that study gave better results than the methods it reviewed |
Table 3. Demographics of obtained data.

| Variable | Sub-variable | Number | Percent (%) |
|---|---|---|---|
| Age | Between 18 and 25 years | 350 | 35.00 |
| | Between 25 and 30 years | 321 | 32.10 |
| | Between 30 and 35 years | 198 | 19.80 |
| | Above 35 years | 131 | 13.10 |
| Gender | Male | 532 | 53.20 |
| | Female | 468 | 46.80 |
| Marital status | Married | 415 | 41.50 |
| | Single | 585 | 58.50 |
| Car ownership status | Car owner | 486 | 48.60 |
| | No car | 514 | 51.40 |
| Income level | Less than 5 million tomans | 56 | 5.60 |
| | Between 5 and 10 million tomans | 225 | 22.50 |
| | Between 10 and 15 million tomans | 413 | 41.30 |
| | More than 15 million tomans | 306 | 30.60 |
| Employment status | Self-employed | 182 | 18.20 |
| | Student | 246 | 24.60 |
| | Full-time employee | 292 | 29.20 |
| | Part-time employee | 229 | 22.90 |
| | Retired | 36 | 3.60 |
| | No job | 10 | 1.00 |
| | Others | 5 | 0.50 |
| Education | Diploma and sub-diploma | 89 | 8.90 |
| | Associate degree | 186 | 18.60 |
| | Bachelor’s degree | 394 | 39.40 |
| | Master’s degree | 298 | 29.80 |
| | Above master’s degree (doctorate and higher) | 33 | 3.30 |
Table 4. The result of the GWO algorithm in feature selection.

| Number of original features | Number of selected features | Accuracy (%) | Solution time (s) |
|---|---|---|---|
| 10 | 7 | 95.68 | 8.69 |
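Table 4 reports that GWO kept 7 of the 10 candidate features. The paper does not list its implementation, so the following is only a minimal sketch of how a gray wolf optimizer can drive feature selection. Everything in it is an assumption made for illustration: the synthetic data, the nearest-centroid classifier standing in for the CNN, the size penalty, the 0.5 threshold that turns a continuous wolf position into a feature mask, and the names `fitness` and `gwo_select`.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_te, y_te, penalty=0.01):
    """Hold-out accuracy of a nearest-centroid classifier (a stand-in for the
    CNN, assumed here) minus a small penalty per selected feature."""
    if not mask.any():
        return 0.0
    tr, te = X_tr[:, mask], X_te[:, mask]
    c0, c1 = tr[y_tr == 0].mean(axis=0), tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(te - c0, axis=1)
    d1 = np.linalg.norm(te - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == y_te).mean() - penalty * mask.sum()

def gwo_select(X_tr, y_tr, X_te, y_te, n_wolves=8, n_iter=30, seed=0):
    """Gray wolf optimizer over continuous positions in [0, 1]^d; a feature is
    selected when its coordinate exceeds 0.5 (illustrative binarization)."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pos = rng.random((n_wolves, d))
    best_mask, best_fit = None, -np.inf
    for t in range(n_iter):
        fits = np.array([fitness(p > 0.5, X_tr, y_tr, X_te, y_te) for p in pos])
        order = np.argsort(fits)[::-1]            # best wolves first
        if fits[order[0]] > best_fit:
            best_fit, best_mask = fits[order[0]], pos[order[0]] > 0.5
        leaders = pos[order[:3]].copy()           # alpha, beta, delta wolves
        a = 2.0 * (1 - t / n_iter)                # shrinks linearly from 2 toward 0
        new_pos = np.zeros_like(pos)
        for leader in leaders:
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            new_pos += leader - A * np.abs(C * leader - pos)
        pos = np.clip(new_pos / 3.0, 0.0, 1.0)    # average of the three pulls
    return best_mask, best_fit

# Synthetic data (assumed): 10 features, only the first three are informative.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 10))
X[:, :3] += 1.5 * y[:, None]

mask, score = gwo_select(X[:300], y[:300], X[300:], y[300:])
print(mask.astype(int), round(float(score), 3))
```

In the paper's setting the fitness would instead be the classification accuracy obtained on the survey data, and the same search could also carry hyperparameters in extra position coordinates.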
Table 5. Initial settings of the CNN model.

| Parameter | Value |
|---|---|
| Initial learning rate | 0.005 |
| Learning rate reduction factor | 0.2 |
| Number of repetitions | 40 |
| Solver | Adam |
| Network architecture | AlexNet |
| Training function | trainlm |
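To illustrate how the first three settings in Table 5 interact, the sketch below applies a piecewise (step) decay, which is one common reading of a "reduction factor". The table does not state how often the rate is reduced, so the drop period of 10 repetitions is a purely hypothetical value, as is the function name `learning_rate`.

```python
def learning_rate(epoch, initial=0.005, drop_factor=0.2, drop_period=10):
    """Step decay: multiply the rate by drop_factor every drop_period epochs.
    initial and drop_factor come from Table 5; drop_period is an assumption."""
    return initial * drop_factor ** (epoch // drop_period)

# Rate used at each of the 40 repetitions listed in Table 5.
schedule = [learning_rate(epoch) for epoch in range(40)]

for epoch in (0, 10, 20, 30):
    print(epoch, schedule[epoch])
```

Under these assumptions the rate starts at 0.005 and is reduced by a factor of five three times before training ends.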
Table 6. Performance evaluation of the proposed method.

| Metric | Value |
|---|---|
| Accuracy (%) | 97.81 |
| MSE | 0.325 |
| RMSE | 0.570 |
Table 7. Comparison of accuracy in trip type prediction.

| Metric | Proposed Method (GWO & CNN) | Deep model (CNN) | DT | MLP | KNN |
|---|---|---|---|---|---|
| Accuracy (%) | 97.81 | 95.63 | 86.49 | 90.12 | 80.16 |
| MSE | 0.325 | 0.9604 | 6.8864 | 1.6391 | 10.0567 |
| RMSE | 0.570 | 0.98 | 2.6242 | 1.2803 | 3.1712 |
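Since RMSE is by definition the square root of MSE, the last two rows of Table 7 can be cross-checked against each other. The sketch below does this for the five models; the dictionary layout and names are illustrative only.

```python
import math

# (MSE, RMSE) pairs as reported in Table 7.
reported = {
    "GWO & CNN": (0.325, 0.570),
    "CNN": (0.9604, 0.98),
    "DT": (6.8864, 2.6242),
    "MLP": (1.6391, 1.2803),
    "KNN": (10.0567, 3.1712),
}

# Recompute RMSE from each reported MSE and find the largest discrepancy.
recomputed = {name: math.sqrt(mse) for name, (mse, _) in reported.items()}
max_gap = max(abs(recomputed[name] - rmse) for name, (_, rmse) in reported.items())

for name, (mse, rmse) in reported.items():
    print(f"{name:10s} sqrt(MSE) = {recomputed[name]:.4f}  reported RMSE = {rmse}")
print(f"largest gap: {max_gap:.6f}")
```

The reported RMSE values match the square roots of the reported MSE values to within rounding.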
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.