2.1. Time Series Forecasting
Nowadays, statistical-learning-based methods are less frequently the focal point of research; instead, they often serve as a baseline. In [
29], the authors introduced the spARIMA (
self-paced autoregressive integrated moving average) concept to address performance instability caused by noisy samples. Prior to applying the forecasting model, the authors employed a data difficulty ranking and then utilized the SPL (
self-paced learning) regime to gradually increase the complexity of samples used in training. Consequently, spARIMA outperformed both ARIMA and online ARIMA. Although statistical models offer significantly greater interpretability than machine learning and deep learning models, they struggle to reveal nonlinear dependencies and multi-dimensional, complex patterns, which limits their applicability to data originating from cloud environments.
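For reference, the statistical baseline itself requires only a few lines of code; the sketch below fits a plain ARIMA model with statsmodels on synthetic data (the order, horizon, and data are illustrative assumptions, and the self-paced training regime of spARIMA is not reproduced).

```python
# Minimal ARIMA baseline sketch (statsmodels); the (p, d, q) order and
# forecast horizon are illustrative, not values from the cited study.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))  # synthetic random-walk series

model = ARIMA(series, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=24)       # 24-step-ahead point forecast
print(forecast[:5])
```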
In [
13], the authors introduced a hybrid deep neural network for volatility forecasting. They expanded the data preprocessing stage by employing an encoding framework to transform one-dimensional time-series into GAF (
gramian angular field) images, thereby enabling the subsequent application of a CNN (
convolutional neural network). While deep learning may seem appealing since it bypasses the manual feature engineering stage, it requires large, reliable, and representative historical datasets. This necessarily limits the applicability of this approach in newly established, small or medium-sized cloud environments facing data scarcity.
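The GAF encoding itself is a compact transformation: the series is rescaled to [-1, 1], mapped to polar angles, and pairwise angular sums form a 2-D image that a CNN can consume. A minimal sketch of the summation variant (GASF), with illustrative rescaling and window length:

```python
# Gramian Angular Summation Field (GASF) sketch: encode a 1-D window
# as a 2-D image suitable for a CNN. Rescaling choices are illustrative.
import numpy as np

def gasf(window: np.ndarray) -> np.ndarray:
    # Rescale to [-1, 1] so that arccos is defined.
    x = 2 * (window - window.min()) / (window.max() - window.min() + 1e-12) - 1
    phi = np.arccos(np.clip(x, -1, 1))          # polar angles
    return np.cos(phi[:, None] + phi[None, :])  # pairwise angular sums

image = gasf(np.sin(np.linspace(0, 6 * np.pi, 64)))
print(image.shape)  # (64, 64) single-channel "image"
```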
RNNs (
recurrent neural networks), alongside their variants including LSTM (
long short-term memory) and GRU (
gated recurrent unit) cells, have become a state-of-the-art solution for processing long sequences [
56]. Despite the variety of architectures used for forecasting, researchers are increasingly focusing on the general stability of predictive modeling. Therefore, in [
32], the authors introduced CNN-FCM (
CNN-fuzzy cognitive maps) to address the scenario where samples in a dataset are not independently and identically distributed. Although the approach combines latent feature extraction with neural networks and clustering algorithms (such as K-means and fuzzy C-means) applied to the embedded sequences, its major limitation is the lack of domain-specific metrics for automatically determining the optimal number of clusters. Only standard metrics were utilized, whereas domain-specific measures are critical in the cloud computing context.
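For context, the standard (non-domain-specific) procedure for selecting the number of clusters that the study relied on looks roughly as follows; the placeholder embeddings and candidate range are assumptions for illustration.

```python
# Choosing k with a generic metric (silhouette); domain-specific measures
# for cloud workloads would have to be layered on top of this.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 16))  # placeholder latent features

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    scores[k] = silhouette_score(embeddings, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```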
In [
59], the authors noted that conventional deep learning models often overlook the distinctive characteristics of time series data, particularly trends and seasonality. To address this, they developed an information fusion transfer mechanism designed to enhance the exchange of extracted long-term characteristics between data sequences. Their DFNet (
decomposition fusion network) architecture replaced the traditional self-attention mechanism and incorporated a sequence decomposition block as a pre-processing step, improving forecasting accuracy by up to 60% compared to the Informer model; its major limitation, however, was performance degradation when handling irregular and noisy data with outliers. Subsequently, in [
31], the authors concentrated on enhancing generalization abilities with machine learning and deep learning models under conditions of data scarcity. The researchers employed various data augmentation techniques, including adding random noise, sequence permutation, and scaling. This not only helped mitigate overfitting but also improved the performance of ResNet (
residual network). However, a limitation of this study is the lack of evaluation and insights into how the hybrid approach – utilizing multiple data augmentation techniques simultaneously – performs on time series-specific architectures, such as LSTM or GRU networks.
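The augmentation techniques in question are simple to express; the sketch below applies jittering, scaling, and segment permutation to a synthetic series, with illustrative hyperparameters.

```python
# Simple time-series augmentations: jittering, scaling, and segment permutation.
# Hyperparameters (sigma, scale range, n_segments) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.03):
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, low=0.8, high=1.2):
    return x * rng.uniform(low, high)

def permute_segments(x, n_segments=4):
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])

x = np.sin(np.linspace(0, 4 * np.pi, 128))
augmented = [jitter(x), scale(x), permute_segments(x)]
```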
In the domain of time series forecasting, jointly addressing interdependencies among multiple metrics yields superior outcomes. Consequently, in [
6], the authors applied multivariate analysis alongside the implementation of the GFM (global forecasting model) concept [
23]. Although this study aimed to explore the factors affecting overall GFM performance based on simulated datasets from various domains, the model selection criterion was based solely on evaluation metrics.
Appropriate domain-level measures that would allow for a comprehensive assessment of the solutions’ properties were absent. Moreover, the conclusions drawn were related to the specific architectures implementing the GFM concept rather than to the GFM concept itself. The concept of GFM was further elaborated upon in [
4], where the authors introduced an LSTM-MSNet (
LSTM-multi-seasonal net) model designed for dealing with multiple seasonal patterns. However, evaluation metrics were found to be satisfactory only in the case of homogeneous datasets, which constitutes a limitation of this study in the context of cloud computing, where historical data from multiple virtual machines is heterogeneous.
In [
5], the authors decided to follow the strategy of employing individual LSTM models for each set of similar time series. Common approaches to determining similarities among time series include distance-based, feature-based, and model-based methods [
30]. Consequently, the researchers in [
5] favored a feature-based approach, where extracted predictors were subsequently fed as input to algorithms such as K-means, DBSCAN (
density-based spatial clustering of applications with noise), and PAM (
partitioning around medoids). Unfortunately, the use of handcrafted features has two main drawbacks: it requires domain expertise and can result in predictors that negatively impact the model’s performance. The authors did not conduct a study on feature importance or dynamic feature selection. Additionally, manual feature enrichment does not scale as the number of time series grows, since it necessitates expert intervention, and the ranking of features itself can change over time.
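A feature-based similarity pipeline of this kind reduces each series to a small handcrafted vector before clustering; the particular descriptors below are illustrative and not the predictors used in the cited study.

```python
# Feature-based similarity sketch: handcrafted descriptors per series,
# then K-means on the feature matrix. Feature choice is illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def describe(series: np.ndarray) -> np.ndarray:
    slope = np.polyfit(np.arange(len(series)), series, 1)[0]   # linear trend
    lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]          # lag-1 autocorrelation
    return np.array([series.mean(), series.std(), slope, lag1])

rng = np.random.default_rng(0)
dataset = [np.cumsum(rng.normal(size=200)) for _ in range(50)]  # 50 synthetic series
features = StandardScaler().fit_transform(np.vstack([describe(s) for s in dataset]))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
```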
Leveraging low-dimensional data representations proves to be successful across various domains. In [
35], the authors presented the unsupervised Signal2Vec method for universal time series representation. However, a limitation of this method is that it considers the entire time series rather than sequences, meaning that all data samples, including older ones, are given equal weights. This approach is not ideal for similarity analysis in cloud computing, where the most recent resource usage patterns are the most crucial. Furthermore, in [
11], the authors introduced the HyVAE (
hybrid variational autoencoder) approach, aiming to incorporate the learning of both local patterns and temporal dynamics of data sequences. Subsequently, in [
27], the authors introduced the FEAT (
feature-aware multivariate time-series representation learning) framework. FEAT utilized both timestamp-wise and feature-wise embeddings, combined with data augmentation techniques and a decoder layer to flexibly extract low-dimensional representations of signals. However, in [
51], the authors found that a sequence-to-sequence LSTM network produced topologically correct embeddings of the time series sequences in the hidden space, effectively capturing the state of the underlying Rössler system and thus eliminating the need for a dedicated solution for semantic sequence representation. Unfortunately, all the aforementioned approaches compare embedding models based solely on first-level metrics, such as reconstruction accuracy, and lack the domain-specific metrics that should support model selection, which we introduce in our research.
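A sequence-to-sequence LSTM autoencoder of the kind referenced above can be sketched compactly: the encoder's final hidden state serves as the fixed-size embedding of each sequence. Layer sizes and sequence lengths below are illustrative assumptions.

```python
# Sequence-to-sequence LSTM autoencoder sketch: the encoder's final hidden
# state acts as a low-dimensional embedding of a sequence. Sizes are illustrative.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features=1, embedding_dim=16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, embedding_dim, batch_first=True)
        self.decoder = nn.LSTM(embedding_dim, embedding_dim, batch_first=True)
        self.output = nn.Linear(embedding_dim, n_features)

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)            # h: (1, batch, embedding_dim)
        z = h[-1]                              # one embedding per sequence
        repeated = z.unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded), z         # reconstruction and embedding

model = LSTMAutoencoder()
batch = torch.randn(8, 48, 1)                  # 8 sequences of length 48
reconstruction, embeddings = model(batch)
print(embeddings.shape)                        # torch.Size([8, 16])
```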
As stated in [
32], the features extracted by deep learning architectures generally outperform conventionally designed predictors. Accordingly, in [
33], the authors aimed to optimize multi-horizon time series forecasting performance by leveraging stacked autoencoders to generate contextual embeddings. Those embeddings were then utilized in discriminative clustering. For each homogeneous cluster, the researchers applied a separate TCN (
temporal convolutional network) model. Unfortunately, the researchers emphasized solely evaluation metrics, overlooking vital domain-specific measures in both the clustering and prediction phases. In our research, we aim to improve data efficiency and address the risks of direct forecast utilization. In [
16], the authors addressed the quadratic time complexity of DTW (
dynamic time warping), a representative distance-based similarity method. They introduced the first exact algorithm whose running time depends solely on the input coding lengths. Nevertheless, a limitation of the study lies in applying the method to time series similarity, a context in which feature-based approaches are preferred for their semantic representativeness.
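For reference, the quadratic cost targeted in that work stems from the classic dynamic program below; this is the textbook O(nm) formulation of DTW, not the exact algorithm proposed in the cited study.

```python
# Classic O(n*m) dynamic-time-warping distance: the quadratic baseline
# whose complexity the cited work improves upon.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

print(dtw(np.array([0., 1., 2., 1.]), np.array([0., 1., 1., 2., 1.])))
```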
The third approach to time series similarity has been explored in [
24]. The authors employed concept-based model extraction, matched time series to concepts, and conducted pairwise comparisons between concepts to determine similarities. However, this approach required creating additional models for each time series solely for the purpose of determining similarity, which introduces significant overhead.
In the domain of processing long sequences, the attention mechanism has emerged as revolutionary [
48,
52]. It not only forms the cornerstone of the transformer architecture, but also serves as an additional building block that can extend state-of-the-art models, such as LSTM networks [
55]. However, complexity does not always lead to better results, both in terms of evaluation metrics and domain-level metrics. In [
53], the authors highlighted a drawback of using the vanilla transformer model for multivariate forecasting – the inability to exploit spatial dependencies between variables. This limitation was addressed by the transformer-based LSTF (
long sequence time series forecasting) model. Noticeably, researchers tend to focus on developing simpler alternatives, as demonstrated in [
57], where multiple frequency-domain MLPs (
multi-layer perceptrons) proved to be more effective learners than a single transformer. In the context of cloud environments, small models capable of quick adaptation appear to be particularly attractive. Frequent adaptations mean that cost-awareness becomes crucial for all processes related to the machine learning model lifecycle, such as retraining [
34]. Cloud environments are indeed subject to rapid changes, and employing a model that lags in adapting to these dynamics could prove impractical. However, a common limitation of these approaches is their focus on high accuracy rather than on the trade-off between efficiency and responsiveness, neglecting the critical importance of
minimizing training time and model complexity for timely online adjustments.
2.2. Cloud Resource Usage Optimization
Optimizing resource utilization in a cloud environment is a highly intricate and multifaceted process. While our primary focus in this study lies in the application of time series forecasting, there exist equally sophisticated paths such as task scheduling optimization [
10], virtual machine placement [
47] and task allocation optimization [
17]. In the realm of multi-objective approaches, various heuristics and genetic algorithms are commonly researched [
10,
22], along with their hybrid counterparts [
19]. Despite the growing interest in machine learning-based optimization, it is essential to consider interpretability, a property facilitated by employing rule-based systems [
8]. In [
9], the authors introduced an intelligent rule-based metaheuristic for task scheduling in time-critical applications. Cloud services are diverse, and almost any resource-intensive process can be refined towards greater cost-efficiency while maintaining the required high QoS (quality of service) and QoE (
quality of experience).
Many cloud services incorporate autoscaling [
45], where a reactive approach proves inadequate due to the time delay between underprovisioning detection and the provisioning of additional resources [
15]. Consequently, a proactive approach leveraging time series forecasting with a GRU network was examined in [
58]. Furthermore, in [
50], the authors enhanced the built-in Knative autoscaler by employing models such as ARIMA, LR (
linear regression), LSTM, and BiLSTM (
bidirectional LSTM). This achieved both downtime minimization and a 14%–20% reduction in resource usage. In [
25], the authors introduced ProHPA (
Proactive horizontal pod autoscaler), utilizing a BiLSTM network with an attention mechanism for multivariate workload forecasting. ProHPA demonstrated significant improvements, resulting in 23.39% and 42.52% reductions in CPU (
central processing unit) and RAM (
random access memory) utilization, respectively. This approach can perform poorly with new machines and those with diverse usage patterns due to the limited amount of data that supports scaling decisions. Consequently, a common limitation of the research on demand-based predictive autoscaling presented by the authors is its reliance on a dedicated model for each virtual machine – an approach based on the LFM (local forecasting model) concept – which works against effective data usage. Additionally, the authors do not address mitigating the risks of incorrect forecasts, which is a focus of our research.
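At its core, a proactive policy of this kind feeds the usual utilization-based scaling rule with a forecast instead of the last observation. The sketch below is a minimal illustration; the target utilization, replica bounds, and rounding are assumed values, not parameters of the cited autoscalers.

```python
# Proactive scaling sketch: turn a utilization forecast into a replica count
# ahead of time. Target utilization and bounds are illustrative.
import math

def desired_replicas(current_replicas: int,
                     forecast_utilization: float,   # e.g. predicted CPU, 0..1
                     target_utilization: float = 0.6,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    raw = current_replicas * forecast_utilization / target_utilization
    return max(min_replicas, min(max_replicas, math.ceil(raw)))

# Scale out before the predicted peak instead of reacting to it.
print(desired_replicas(current_replicas=4, forecast_utilization=0.9))  # -> 6
```

Because the replica count is derived from the predicted utilization, capacity is provisioned before demand materializes rather than after a threshold breach.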
In [
1], the authors introduced a sparse auto-encoder to retrieve low-dimensional workload representations. Subsequently, they employed a GRU network for CPU prediction over short-term horizons. This solution outperformed the RNN network in terms of learning stability. In [
60], the authors introduced the entropy-optimized variational mode decomposition transformer – VMDSETformer. By decomposing the original time series and assessing the complexity of each component separately using structured entropy, VMDSETformer outperformed the LSTM network in short-term prediction. Unfortunately, the time horizons evaluated may limit the ability to make informed operational decisions. Long-term forecasting proves to be much more useful in the context of dynamic resource reservation planning.
In [
40], the authors introduced an SA-LTPS (
self-adapting long-term prediction system) designed to optimize resource utilization for cloud-native applications. SA-LTPS comprised the RF (
random forest) model for weekly predictions, enhanced by the PSO (
particle swarm optimization) algorithm. The authors noted the inherent risk in forecasts; hence, an hourly co-routine was employed to monitor discrepancies between actual and forecasted usage. As a result, infrastructure costs were reduced by as much as 76–89% compared to scenarios without SA-LTPS and by 30–61% compared to those with an active autoscaling mechanism in Azure Cloud. Unfortunately, SA-LTPS required a running copy of the application to populate the sparse QoS table. Furthermore, SA-LTPS only covered univariate predictions, necessitating a dedicated system instance for each resource. While SA-LTPS addresses many critical aspects, its major limitations are limited scalability and high operational costs.
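A forecast-verification loop of the kind described above can be sketched compactly; the MAPE-based error measure and the tolerance below are illustrative assumptions, not SA-LTPS internals.

```python
# Forecast-monitoring sketch in the spirit of an hourly verification loop:
# if recent forecast error drifts beyond a tolerance, trigger re-planning.
# The error metric and threshold are illustrative.
import numpy as np

def needs_replanning(actual: np.ndarray, forecast: np.ndarray,
                     tolerance: float = 0.15) -> bool:
    # Mean absolute percentage error over the monitored window.
    mape = np.mean(np.abs(actual - forecast) / np.maximum(actual, 1e-9))
    return mape > tolerance

actual = np.array([0.52, 0.61, 0.70, 0.66])     # observed CPU utilization
forecast = np.array([0.45, 0.50, 0.52, 0.50])   # values from the long-term plan
if needs_replanning(actual, forecast):
    print("discrepancy too high -> refresh forecast and reservation plan")
```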
In [
3], the authors presented a method for locally predicting scientific workflow runtimes; however, they did not apply it in the context of resource reservations. This omission prevents assessing the effectiveness of the solution in the context of FinOps. Furthermore, in [
36], the authors leveraged long-term predictions to estimate resource demand in high-performance computing using XGBoost (
extreme gradient boosting). This work was extended in [
37], translating predicted demands into dynamic resource reservation plans, with neural networks and a TFT (
temporal fusion transformer) included in the comparative study, which allowed for a 31.4% improvement in RMSE (root mean square error) over the baseline model – Holt-Winters seasonal smoothing. However, both approaches required separate models for each scientific workflow or virtual machine, limiting their scalability in general-purpose cloud environments. Additionally, in [
38], the authors highlighted the crucial role of exploratory data analysis in time series forecasting. They incorporated diverse methods for achieving multi-step prediction within the proposed model’s architecture. Statistical tests and the analysis of multiple seasonal patterns provided valuable insights prior to the modeling process. However, a limitation of this study is the lack of context regarding FinOps and the estimation of resource reservation plans.
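To make the notion of a dynamic reservation plan concrete, the sketch below converts a long-term demand forecast into per-hour reservations with a simple relative safety buffer; the buffer policy and rounding are illustrative assumptions, not the mechanism used in the cited studies.

```python
# Reservation-plan sketch: convert a long-term demand forecast into hourly
# CPU reservations with a safety buffer. The buffer policy is illustrative.
import numpy as np

def reservation_plan(forecast_cores: np.ndarray,
                     buffer_ratio: float = 0.2,
                     min_cores: float = 1.0) -> np.ndarray:
    # Over-provision each step by a fixed relative margin, then round up
    # to whole cores so the plan can be submitted to the provider.
    padded = forecast_cores * (1.0 + buffer_ratio)
    return np.maximum(np.ceil(padded), min_cores)

forecast = np.array([2.1, 2.4, 3.6, 5.2, 4.8, 3.0])  # predicted cores per hour
print(reservation_plan(forecast))                    # [3. 3. 5. 7. 6. 4.]
```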
Data from cloud environments usually contains a small fraction of outliers among a large number of time-series samples. Given the high cost of data labeling, the focus primarily shifts towards unsupervised anomaly detection methods. In [
39], the authors concentrated on anomaly detection within the context of long-term resource usage planning. They proposed an algorithm called WHA (
weighted hybrid algorithm) that combines the SMA (
simple moving average), Kalman filter, and Savitzky-Golay filter for identifying
outliers. The evaluation was conducted using a prediction system composed of three modules: a metric collection module, an anomaly detection module, and a resource usage prediction module based on LSTM. The real-life test dataset included historical usage metrics from over 1,700 virtual machines. The results indicated that the WHA outperformed a static approach employing an LA (
limit-based algorithm) and a DLA (
dynamic limit-based algorithm) in terms of minimizing the average underestimation of anomalies. Furthermore, the method achieved a substantial cost reduction of up to 52.09% for Google Cloud services. In [
39], the authors increased the confidence of classification through a weighted mechanism, but their study was limited to considering pointwise univariate anomalies. In contrast, our research considers hierarchical structures and also focuses on pattern-wise outliers, including multivariate ones. In [
26], the authors achieved increased stability of verdicts through an ensemble-based algorithm called AERF (
adaptive ensemble random fuzzy), used to discover anomalous events during infrastructure operation and report them to the global event collector before such events occur. AERF was supported by the RFRB (
random fuzzy rule-based) method. However, despite the dynamic weighted strategy it proposes, it is limited to labeled data only, which is a significant constraint on its implementation in the context of cloud environments. In [
28], the authors introduced the FS-ADAPT (
few-shot time-series anomaly detection framework with unsupervised domain adaptation) concept, which comprised two stages: a dueling triplet adversarial network and an incremental adaptation module. This framework addressed the target imbalance problem through few-shot learning, while unsupervised domain adaptation was employed to train models on data from one or more source domains and subsequently apply the acquired knowledge to unlabeled data from the target domain in operation. However, a limitation of this approach was the need for representative labeled data from multiple source domains. In contrast, our focus is on fully unsupervised methods for anomaly detection, aiming to enhance precision through hierarchical and weighted structures. In [
2], the authors focused on unsupervised anomaly detection through a novel concept called USAD (
unsupervised anomaly detection for multivariate time series). This approach featured a double autoencoder architecture within a two-phase adversarial training framework, which achieved a 24.09% increase in outlier detection accuracy compared to non-adversarial training regimes. However, a limitation of this approach is that integrating it into a lightweight data processing pipeline may introduce significant complexity due to the double autoencoder setup. Furthermore, in [
54], the authors introduced a novel unsupervised anomaly detection method targeting long-term seasonal patterns in data, called FCVAE (
frequency-enhanced conditional variational autoencoder). However, a crucial limitation is that it is exclusively applicable to univariate time series. For more reliable detection of abnormal virtual machine conditions, multiple factors need to be considered simultaneously, which suggests that multivariate approaches might offer a more comprehensive solution. Despite this, the FCVAE outperformed baseline methods by up to 14.14% in terms of the best F1 score for univariate cases.
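As a minimal illustration of the unsupervised, residual-based detection principle shared by the smoothing-filter and reconstruction-error approaches above, the sketch below smooths a univariate signal and flags points with unusually large residuals; the filter settings and the MAD-based threshold are illustrative assumptions, not the WHA weighting scheme or the FCVAE model.

```python
# Unsupervised residual-based outlier sketch: smooth the signal, then flag
# points whose residual exceeds a robust threshold. Filter parameters and
# the MAD-based rule are illustrative.
import numpy as np
from scipy.signal import savgol_filter

def flag_outliers(series: np.ndarray, window: int = 25, k: float = 4.0) -> np.ndarray:
    smoothed = savgol_filter(series, window_length=window, polyorder=3)
    residual = series - smoothed
    mad = np.median(np.abs(residual - np.median(residual))) + 1e-12
    return np.abs(residual) > k * 1.4826 * mad       # boolean anomaly mask

rng = np.random.default_rng(0)
cpu = np.sin(np.linspace(0, 8 * np.pi, 500)) + rng.normal(0, 0.05, 500)
cpu[[100, 300]] += 1.5                               # injected spikes
print(np.where(flag_outliers(cpu))[0])
```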
Counteracting inappropriate VM configurations and insecure allocations promotes the efficient use of cloud resources and infrastructure. Consequently, in [
42], the authors proposed a novel MR-TPM (
multiple risks analysis-based virtual machine threat prediction model) to proactively predict potential security threats related to virtual machine instances using the XGBoost model. Evaluated across various resource allocation policies, the solution achieved a reduction in cybersecurity threats by up to 88.9%. Additionally, MR-TPM incorporated workload prediction using a neural network-based approach. However, its evaluation was conducted using threat traces from sources such as Google Cluster, without focusing on scenarios of historical data scarcity, and thus excluding cases involving small or medium-sized environments. It also did not include a study on anomaly detection, which is an area covered by our research. Furthermore, unauthorized access to sensitive data contributes to excessive power consumption. Consequently, in [
43], the authors proposed the ETP-WE (
emerging virtual machine threat prediction and dynamic workload estimation based resource allocation) framework, which predicts both threats and resource usage in real time. Their approach achieved reductions in security threats, power consumption, and the number of active servers by up to 86.9%, 66.67%, and 30%-80%, respectively, while improving resource utilization by 60%-75%. However, the article lacked crucial FinOps context, as it did not provide estimated cost savings from the model’s application or the costs associated with running the solution in real time in a cloud environment. In contrast, we advocate for incorporating a FinOps-driven approach. Similarly, in [
20], the authors addressed the challenge of identifying malicious entities responsible for data misuse. They introduced a model based on a quantum neural network that predicts potential malicious data disclosures, effectively enhancing system security by up to 33.28% compared to similar solutions. In [
21], the authors enhanced privacy preservation in cloud environments and improved model privacy accuracy by up to 15.89%, particularly for sensitive medical data. Additionally, in [
46], the authors presented a novel PPMD (
privacy-preserving model based on differential privacy) approach, which divides data into sensitive and non-sensitive partitions and injects noise into the sensitive segments. Further classification tasks employed in PPMD operation were evaluated using various machine learning algorithms including neural networks. Finally, the authors achieved an improvement of up to 16% in accuracy compared to similar security-oriented solutions.
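The noise-injection step at the heart of such privacy-preserving schemes can be illustrated with the standard Laplace mechanism; the column partitioning, epsilon, and sensitivity below are illustrative assumptions and do not reproduce the PPMD partitioning logic.

```python
# Differential-privacy-style noise injection sketch: perturb only the columns
# marked as sensitive with Laplace noise. Epsilon and sensitivity are illustrative.
import numpy as np

def privatize(data: np.ndarray, sensitive_cols, epsilon: float = 1.0,
              sensitivity: float = 1.0) -> np.ndarray:
    rng = np.random.default_rng(0)
    noisy = data.astype(float).copy()
    scale = sensitivity / epsilon                     # Laplace mechanism scale
    for col in sensitive_cols:
        noisy[:, col] += rng.laplace(0.0, scale, size=len(data))
    return noisy

records = np.array([[34, 120.0, 1], [51, 180.0, 0], [29, 95.0, 1]], dtype=float)
print(privatize(records, sensitive_cols=[1]))         # only column 1 perturbed
```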
2.3. Summary
Recent advancements in cloud resource usage optimization emphasize the significance of data-driven and machine learning-based techniques. Despite the considerable focus on leveraging diverse machine learning architectures for processing data sequences, there is a growing awareness of the
risk of inaccurate predictions, which necessitates improving their stability to ensure reliable decision-making.
Cloud FinOps principles include proactively shaping resource and cost management strategies. Consequently, concept-centric and model-agnostic solutions targeting enhanced resource utilization have become particularly crucial. Unfortunately, many studies, especially those focusing on deep learning, tend to assume access to large amounts of historical data, which may not be the case for early-stage cloud environments. Consequently, we observe a research gap in data-driven solutions that are applicable to both small and large environments, remain scalable during phases of dynamic growth and do not generate significant operational costs or management overhead.
Among the commonly researched approaches is the GFM, which might be too broad to fully capture the unique characteristics of all time series within the dataset. Conversely, optimization methods requiring a separate model for each virtual machine (LFM) could escalate costs as the environment expands. Additionally, while solutions aimed at identifying similarities between time series and applying predictive modeling to sets of similar sequences are being developed, these often encounter scalability limitations, overlook the key FinOps context, lack important domain-level metrics, and fail to pay sufficient attention to the contextual application of multivariate forecasts for long-term decision-making.
Furthermore, there is a notable trend towards favoring smaller models that can swiftly adapt to variable conditions. This preference is understandable, especially considering that the models achieving the best forecasts in terms of evaluation metrics may not always lead to the best resource reservation plans [
37]. However, it is worth noting that predictive autoscaling, which typically leans towards the use of smaller models, may overlook the possibility of cloud resources being unavailable during provisioning requests. Consequently, embracing forecasting within the framework of dynamic resource reservation not only facilitates more informed decision-making and enhanced budget planning but also supports demand-based and FinOps-aware resource management, while simultaneously minimizing the risk of resource shortages.
Additionally, in
Table 1 we highlight some of the most important features in the context of this research and categorize the articles that exhibit them. Among these features are machine learning-based, statistical learning-based, clustering-enabled, anomaly detection-aware, resource reservation, FinOps-aware, and forecasting optimization (with a focus on increased stability and scalability of the process rather than solely on achieving better accuracy).