1. Introduction
Ransomware has emerged as a particularly pernicious form of attack, encrypting the digital assets of its victims and rendering them inaccessible [1,2]. This form of cyber attack directly challenges the core principles of digital security, often referred to as the CIA Triad: the confidentiality, integrity, and availability of information [3,4]. The disruptive impact of ransomware has its roots in the latter part of the 20th century, most notably with the emergence of the AIDS Trojan towards the end of the 1980s [5,6]. This early form of ransomware was notorious for encrypting file names and masking directories on the infected system's primary storage [6,7]. This initial foray into the encryption of user data set the stage for the sophisticated ransomware attacks that are prevalent today [1,8]. Contemporary ransomware typically relies on deception, enticing unsuspecting users into downloading a seemingly benign payload that, in reality, harbors the ransomware [9,10]. Once this malicious payload is activated, it encrypts vital documents and files across various storage media, including local hard drives, removable media, and networked storage shared among multiple users [11]. In the aftermath of this encryption, the perpetrators demand financial compensation, often in the form of cryptocurrency, for a decryption key capable of restoring the files to their original, usable state [6,12]. The ransom demand, the hallmark of this cyber extortion, is not merely a monetary burden on the victims but also signifies a severe breach of their security and privacy [1,13]. As the digital sphere continues to evolve, ransomware has grown increasingly sophisticated, leveraging varied attack vectors to infiltrate and compromise systems [1,14]. This relentless progression requires a continuous and rigorous approach to cybersecurity, particularly the development of dynamic and resilient strategies to counteract these threats [1,15,16].
While the digital security industry has made significant progress, identifying and mitigating the threats posed by ransomware, particularly emerging strains, continues to present a substantial obstacle [17]. Static analysis, predicated on the detection of binary signatures without running the suspect software, is now frequently outmaneuvered by the elusive tactics of ransomware developers, such as obfuscation methods that cloak malicious code [9,10]. In contrast, dynamic analysis has risen to prominence, offering a more robust avenue for probing the real-time conduct of ransomware within a secure setting [11,18]. This method hinges on the observation of Windows API calls, the fundamental mechanism through which ransomware interacts with and solicits services from the operating system to execute its objectives [19,20]. However, the sheer volume and complexity of Windows API call logs can hinder the efficacy of this analytical approach [13,21]. The dense and disordered nature of these traces inflates computational overhead and erodes the ability to accurately pinpoint and characterize malicious activity [14,22]. This increased resource consumption, coupled with the difficulty of discerning the malicious from the benign, has driven researchers and cybersecurity professionals to seek more advanced and nuanced methods of detection and analysis [15,23,24]. As ransomware evolves, so too must the tools and techniques employed to detect it, underscoring the need for continuous innovation in cybersecurity [25,26]. Windows API call analysis, while a critical element in the behavioral representation of ransomware, demands an approach that can effectively distill and interpret the vast datasets generated during an attack [27,28]. Analyzed with precision, these calls can reveal the distinctive patterns and tactics employed by ransomware, yet the task is akin to finding a needle in a haystack given the extensive and cluttered nature of the captured information [6,29]. Thus, the ongoing struggle within the cybersecurity industry is not only to enhance the capability to detect ransomware but also to refine the detection process itself, ensuring that it is both efficient and accurate [30,31].
The central objective of this inquiry is to forge an advanced methodology for the selection and detection of ransomware-specific features, employing a refinement process grounded in the sequence of Windows API calls in conjunction with a dynamic programming technique [9,32] (Figure 1). This approach aspires to curtail the number of evaluations necessary to develop an optimal feature set, thereby augmenting the efficiency of the detection mechanism [17,18]. A comparison of the proposed Enhanced minimum Redundancy maximum Relevance (EmRmR) technique against traditional methodologies is anticipated to demonstrate the enhanced computational speed and precision of evaluation that EmRmR offers [33,34]. A suite of machine learning classifiers, including Decision Trees, K-Nearest Neighbors, Logistic Regression, Random Forests, and Support Vector Machines, has been enlisted to gauge the effectiveness of the refined feature sets [19,20,35]. The ultimate aim is a formidable and resilient framework capable of accurately detecting ransomware intrusions [36,37]. This study has analyzed the sequences of Windows API calls that are instrumental in the operation of ransomware, in order to refine the features most indicative of such software [38]. By streamlining these features, the study seeks to eliminate extraneous data that may obscure the true nature of ransomware activity [10,39,40]. The EmRmR method, a cornerstone of this research, has been crafted to exclude redundant information while preserving the most critical data points, ensuring that the most relevant features are highlighted [2,41]. This process is essential not only for the accurate identification of ransomware but also for the swift analysis of its behavior, which is paramount in a landscape where detection speed is often critical [1,12]. The selected classifiers, known for their robustness, have been deployed to validate the effectiveness of the feature selection process [15,22]. Each classifier has been rigorously trained and tested on a diverse array of data samples, with the intention of calibrating the detection system to recognize the subtle nuances of ransomware behavior [23,25]. Through this comprehensive evaluation, the classifiers are expected to exhibit heightened sensitivity to the presence of ransomware, providing a crucial layer of defense against this ever-evolving digital threat [6,26,27].
The major contributions of this study include:
We examined the latest definitions of ransomware.
We surveyed studies on the static and dynamic analysis of ransomware and critiqued their shortcomings.
We listed and discussed the major insights of this survey.
The rest of the article is organized as follows:
Section 2 reviews related work.
Section 3 presents our methodology.
Section 4 reports the experiments.
Section 5 discusses the main implications of our study.
Section 6 concludes and outlines future research directions.
4. Results
This section presents the results of the study.
4.1. Classifier Performance Analysis Based on N-Gram Feature Lengths
The efficacy of various classifiers was evaluated for n-gram lengths of 2, 3, 4, and 5, derived from each Windows API call sequence. This analysis used the train-test split methodology, which bifurcates the dataset into a training set and a testing set. Prior to model construction, the dataset was divided such that 80.7% constituted the training set and 19.3% the testing set. Classifier performance was measured by accuracy and by the Area Under the Receiver Operating Characteristic Curve (AUC).
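The two preprocessing steps described above, extracting fixed-length n-grams from an API call trace and splitting the samples for evaluation, can be sketched as follows. This is an illustrative, self-contained sketch, not the authors' implementation; the function names are invented, and the fixed 80.7%/19.3% ratio mirrors the split reported here.

```python
import random

def extract_ngrams(api_calls, n):
    """Slide a window of length n over an API call trace and
    return the ordered list of n-grams as tuples."""
    return [tuple(api_calls[i:i + n]) for i in range(len(api_calls) - n + 1)]

def train_test_split(samples, test_ratio=0.193, seed=42):
    """Shuffle and split samples, mirroring the ~80.7%/19.3% split
    described in the text (ratio and seed are illustrative)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]

trace = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtClose"]
print(extract_ngrams(trace, 3))
# [('NtOpenFile', 'NtReadFile', 'NtWriteFile'), ('NtReadFile', 'NtWriteFile', 'NtClose')]
```

Each trace thus yields a bag of n-gram tuples that can be vectorized (e.g., by frequency) before being fed to the classifiers.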
The Support Vector Machine (SVM) classifier, when utilizing a 3-gram approach, achieved the highest accuracy, ranging from 66.7% to 98.7%, with a reduced false-positive rate of approximately 0.0267. The robustness of this classifier remained consistent with 4-gram and 5-gram feature sequences. According to the data presented in Table 2, the k-Nearest Neighbors (kNN) and Decision Tree (DT) algorithms showed nearly identical accuracy. With a 2-gram feature set these classifiers exhibited lower accuracy, whereas a significant increase was noted with the application of a 3-gram sequence.
The Random Forest (RF) classifier displayed the least accurate results across the spectrum of classifiers when the models were subjected to various lengths of n-grams. Specifically, the accuracy was moderately satisfactory at 67.7% for 2-grams and 78.7% for 3-grams, but declined to 61.7% and 59.7% for 4-grams and 5-grams, respectively. This decrease could be attributed to the increased computational complexity required by the RF classifier to construct the model with larger n-gram sizes, leading to overfitting during the training phase and consequently a lack of generalization to new features during testing. In contrast, Logistic Regression (LR) performed remarkably well on 2-gram features, with an accuracy of 72.3% and a lower false-positive rate of 0.0413, which can be ascribed to the classifier's lower computational demands at smaller n-gram sizes.
Table 3.
Classifier Performance Evaluation with Varied N-Gram Feature Lengths.
| n-gram Length | SVM   | kNN   | DT    | RF    |
|---------------|-------|-------|-------|-------|
| 2-gram        | 66.1% | 58.1% | 63.1% | 67.1% |
| 3-gram        | 98.1% | 88.1% | 86.1% | 78.1% |
| 4-gram        | 96.1% | 76.1% | 74.1% | 61.1% |
| 5-gram        | 97.1% | 78.1% | 76.1% | 59.1% |
The Receiver Operating Characteristic (ROC) curves show that overall AUC values decreased markedly when 2-gram features were used for model training. This is likely because shorter n-grams cannot encapsulate ransomware behavior adequately, which inflates false-positive alerts. Broadening the feature dimensionality by expanding the n-gram window to 3 produced a dramatic rise in AUC values for the classifiers. As seen in Figure (c), a marginal decline in AUC, averaging 0.167%, was observed for n = 4. Furthermore, Figure (d) presents a comparative analysis of the ROC curves for the longer n = 5 sequences across the classifiers, indicating a significant drop in AUC values; such extended n-gram sequences appear less effective at differentiating ransomware behavior from benign processes. Moreover, longer n-gram sequences directly increase the training time required for the model.
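The AUC values discussed above can be computed directly from classifier scores via the rank-statistic definition of AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The helper below is a minimal sketch of that computation, not the evaluation code used in this study.

```python
def auc_score(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive receives a higher score than a randomly
    chosen negative (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative score pairs are correctly ordered:
print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

This pairwise formulation is equivalent to integrating the ROC curve and avoids having to choose an explicit decision threshold.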
4.2. Enhanced Analysis of Classifier Efficacy Based on Variable N-Gram Feature Sizes
The objective of this investigation was to assess the classifiers’ effectiveness when applied to an array of n-gram feature sizes, utilizing a tenfold cross-validation method. A notable limitation observed with the train-test splitting approach is the potential for class imbalance post-dataset division, which may lead to an erroneous estimate of the holdout error rate. To circumvent this issue, tenfold cross-validation was employed, which serves to mitigate overfitting and more accurately evaluate the model’s performance. In this method, the dataset was thoroughly shuffled and partitioned into ten equal-sized segments. During each of the ten iterations, nine segments were utilized to develop the model, and the remaining segment was used for evaluation.
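The tenfold partitioning scheme described above can be sketched in a few lines. This is an illustrative generator, not the authors' implementation; the function name and seed are assumptions.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and partition them into k near-equal folds;
    each fold serves once as the evaluation set while the remaining
    k-1 folds form the training set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Averaging the per-fold metrics over all ten iterations gives a more stable performance estimate than a single holdout split, which is precisely the motivation stated above.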
The Support Vector Machine (SVM), when employing a 3-gram scheme, demonstrated remarkable accuracy levels, varying from 66.1% to 98.1%, accompanied by a diminished false-positive rate of approximately 0.0261. The SVM’s consistency was notable across the board, with the 4-gram and 5-gram feature sequences showing stable performance metrics. According to
Table 3, k-Nearest Neighbors (kNN) and Decision Tree (DT) algorithms exhibited commensurate levels of accuracy. The 2-gram feature set resulted in lower accuracy rates for these classifiers, which noticeably improved upon the adoption of a 3-gram sequence.
Random Forest (RF) classifiers yielded the least precise outcomes among the spectrum of classifiers when evaluated across diverse n-gram lengths. In particular, the RF classifier's accuracy was moderately acceptable at 67.1% for 2-grams and 78.1% for 3-grams, with a downward trend to 61.1% and 59.1% for 4-grams and 5-grams, respectively. Such a trend suggests an escalation in computational demands as the n-gram size increases, potentially causing overfitting during the training phase and a subsequent inability to generalize to new features in the testing phase. In contrast, Logistic Regression (LR) performed notably well on 2-gram features, with an accuracy of 72.3% and a lower false-positive rate of 0.0413, attributable to its lesser computational requirements for smaller n-gram sizes.
5. Discussions
This section provides an in-depth interpretation of the ransomware detection results across four key dimensions. The first three subsections discuss the major insights regarding the efficacy of feature engineering strategies, comparative classifier performance, and the role of enhanced feature selection techniques. The final subsection examines the limitations of the current study.
5.1. Efficacy of N-Gram Based Feature Engineering
The n-gram based feature engineering approach proved instrumental in constructing an informative behavioral profile for both ransomware and benign files. Shorter length 2-grams were unable to adequately encapsulate the intricacies of ransomware behavior, resulting in suboptimal detection rates. This limitation necessitated an expansion of the n-gram window to 3-grams, which dramatically enhanced feature dimensionality. The 3-gram sequences enabled a more thorough characterization of ransomware behavioral attributes, significantly boosting detection accuracy levels. However, further lengthening n-grams beyond 3 led to a decline in efficacy, implying that excessive n-gram sizes fail to provide additional discriminative power. Overall, 3-gram feature sequences offered an optimal balance between dimensionality and informational content regarding ransomware activities. The study demonstrates the importance of meticulous feature engineering to extract indicators with high relevance towards identifying ransomware. Mindful tuning of n-gram sizes is essential to construct feature vectors that maximize detection performance.
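The intuition behind the superiority of 3-grams can be made concrete with a toy example: two API call traces that traverse exactly the same call-to-call transitions, but in a different order, share every 2-gram yet differ at the 3-gram level. The traces below are invented purely for illustration and do not come from the study's dataset.

```python
def ngram_set(calls, n):
    """Set of distinct n-grams occurring in an API call trace."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

# Two hypothetical traces visiting the same API-to-API transitions
# in a different order.
trace_a = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtOpenFile", "NtWriteFile"]
trace_b = ["NtOpenFile", "NtWriteFile", "NtOpenFile", "NtReadFile", "NtWriteFile"]

assert ngram_set(trace_a, 2) == ngram_set(trace_b, 2)  # indistinguishable at n = 2
assert ngram_set(trace_a, 3) != ngram_set(trace_b, 3)  # separable at n = 3
```

At the same time, each extra unit of window length multiplies the space of possible n-grams by the number of distinct API calls, which is consistent with the diminishing returns and overfitting observed beyond n = 3.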
5.2. Comparative Analysis of Classifier Performance
The spectrum of classifiers exhibited varying levels of accuracy across the different n-gram feature lengths. SVM consistently outperformed other models, underscoring its suitability for discerning ransomware behaviors within the high-dimensional feature space produced by n-grams. SVM’s maximal margin property enables an effective separation between benign and ransomware classes. The kNN and DT classifiers also showed competency, although their accuracy was inferior compared to SVM. In contrast, RF classifiers displayed subpar outcomes, which further deteriorated as n-gram sizes increased. The elevated feature dimensionality likely overwhelmed RF’s decision tree constituents, resulting in overfitting. Furthermore, RF’s bagging approach failed to ameliorate the overfitting problem. The study highlights SVM’s superiority in modeling ransomware behaviors from Windows API call traces. Tailoring classifiers to the intricacies of feature engineering choices is imperative for maximizing detection efficacy.
5.3. Relevance of Enhanced Feature Selection
The integration of feature selection techniques such as mRMR helped to refine the core indicators differentiating ransomware from benign entities. By pruning redundant and non-informative features from the n-gram vectors, mRMR reduced complexity and retained only the most decisive attributes. This process of enhancement enabled easier interpretation of how specific API call sequences characterize ransomware operations. The mRMR decreased computational overhead by diminishing the volume of features fed into classifiers. The improved efficiency facilitated quicker model training to rapidly adapt to new ransomware strains. Overall, the strategic application of feature selection is integral to elevate both the speed and accuracy of ransomware detection frameworks.
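The classic mRMR criterion underlying this discussion can be sketched compactly: greedily select the feature maximizing mutual information with the label (relevance) minus mean mutual information with the already-selected features (redundancy). The code below is a minimal sketch of standard greedy mRMR over discrete features, not the paper's EmRmR variant; the feature names and toy data are invented for illustration.

```python
from collections import Counter
from math import log

def mutual_info(xs, ys):
    """Mutual information (in nats) between two discrete value sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_select(features, labels, k):
    """Greedy mRMR: at each step pick the feature with the highest
    relevance-minus-mean-redundancy score (ties broken by dict order).
    `features` maps a feature name to its list of discrete values."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            rel = mutual_info(features[name], labels)
            red = (sum(mutual_info(features[name], features[s])
                       for s in selected) / len(selected)) if selected else 0.0
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: "rel" tracks the label closely, "dup" duplicates it exactly,
# "noise" is label-independent. mRMR keeps "rel", then prefers the
# non-redundant "noise" over the redundant "dup".
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "rel":   [0, 0, 1, 1, 0, 0, 1, 0],
    "dup":   [0, 0, 1, 1, 0, 0, 1, 0],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr_select(features, labels, 2))  # ['rel', 'noise']
```

The redundancy penalty is what discards the duplicated feature in this example, which is exactly the pruning behavior credited above with reducing classifier complexity.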
5.4. Limitations of the Study
While providing valuable insights, the study possesses certain limitations that warrant acknowledgement. The dataset relied exclusively on Windows PE files, limiting the extension of the findings to other platforms. Additionally, the behavioral scope was confined to API call sequences, without considering other run-time interactions such as file-encryption activity. There is a need for continued research with expanded datasets covering diverse file types; execution in sandboxes could reveal supplementary dynamic indicators beyond API calls, and testing against adversarial evasion attempts could further validate the detection framework's resilience.

This study provided a rigorous examination of feature engineering and selection strategies for optimizing ransomware detection from Windows API calls. The techniques discussed will serve as a foundation for developing robust anti-ransomware systems. Future work should focus on building ensemble classifiers that synthesize the strengths of multiple algorithms. With continuous enhancement, ransomware detection will grow more potent in safeguarding against cyber extortion.