1. Introduction
Modern Denial of Service (DoS) attacks are becoming larger and more sophisticated every year, which imposes the need for robust defense mechanisms utilizing cutting-edge technologies [
1,
2,
3]. With ML emerging as a promising tool in not only defence but also attack execution, it becomes necessary to explore its potential usage by attackers and to stay ahead of evolving threats. This study investigates the feasibility of leveraging ML for DoS attack detection by generating behavioral telemetry data using Long-Short Term Memory (LSTM) network [
4] and Bezier Curves and crafting spoofing requests to make the analyzed traffic appear legitimate.
While previous research has explored ML-based approaches for DoS attack detection and mitigation [
5,
6], few have examined the potential of generating realistic attack traffic through ML itself. Those mainly revolve around adversarial ML attacks using generative adversarial networks (GANs) that can successfully evade an ML-based Intrusion Detection System (IDS) [
7,
8]. This gap presents an opportunity to deepen our understanding of attacker capabilities and develop proactive countermeasures.
This research contributes to the field by:
Demonstrating the potential of LSTM networks to generate realistic behavioral telemetry data that can be used for DoS attack simulations.
Evaluating the effectiveness of spoofing requests in masking attack traffic and challenging current detection methods.
Analyzing the economic feasibility of ML-based DoS attacks and highlighting the need for proactive countermeasures despite current limitations.
Our findings aim to advance the development of ML-based DoS attack detection systems by providing valuable insights into attack methodologies and potential blind spots in existing defenses.
2. Related Work
In paper [
9], the authors propose an adversarial DBN-LSTM method for detecting and defending DDoS attacks. The key point of the system is the adversarial DBN-LSTM anomaly detection method. It can generate several adversarial samples and build adversarial datasets using the GAN model. Moreover, other proposal have been presented to improve the performance of the system:
implementation of DBN for data dimensionality reduction,
implementation of LSTM to extract sample timing features to detect IP flow records and identify adversarial DDoS attacks.
The simulation experiments show that the proposed method can effectively detect DDoS attacks and improves system sensitivity to adversarial attacks. In this paper, we propose some new techniques to protect systems against the ML-based DoS attacks.
The authors of [
10] presented a CGAN-based intrusion detection system against DDoS attacks on IoT networks. The core of the system is the mechanism which generates synthetic traffic mapping known patterns and completely new network discriminator network to detect anomalies. The obtained results confirm that the generated dataset significantly improved the detection effectiveness. An important element of the entire solution is the mechanism based on deep learning classifiers. In our proposal, we use different ML-based models to detect specific DoS attacs.
In [
11], the LSTM-based system was proposed to detect and prevent DDoS attacks in a public cloud networks. The system was designed based on a signature-based attack detection approach. The accuracy rate is on the level of 99.83% according to the CICDDoS2019 data set. The simulation results confirm that the prevention part of the system obtained a performance as good as for the previous studies conducted with different DL algorithms on the same and different datasets. In our approach, we use the LSTM-based system with data generated by behavioral telemetry. This is unique for currently known studies.
The review of currently known deep learning based approaches for detecting DDoS attacks is presented in [
12]. The review is provided classifying the papers into five main research areas:
types of DDoS attack detection deep learning approaches,
methodologies, strengths, and weaknesses of existing deep learning approaches for DDoS attacks detection,
benchmarked datasets and classes of attacks in used datasets,
the preprocessing strategies, hyperparameter values, experimental setups, and performance metrics,
the research gaps, and future directions.
Another review which presents different approaches to detecting DDoS attacks, using machine learning techniques is presented in [
13]. Analyzed techniques, such as k-means, K-Nearest Neighbors, and Naive Bayes used in intrusion detection systems and flow-based intrusion detection systems were considered for review. In the review the high-speed network accuracy evaluation factors are highlighted. They provide a detailed DDoS attack taxonomy, and classifies detection techniques. Sevelat types of attacs were considered, e.g., Zero-Day Attacks, Reflection Attacks, DNS Amplification or SYN Flood.
The mechanism proposed in our paper has not been noticed in the analyzed literature presented in both review papers presented above.
3. Testing Environment
The testing environment simulates a simple login/register website utilizing JavaScript listeners for background data collection. Upon a user’s visit to the application, a continuous stream of requests to "telemetry.js" is initiated. This script captures both mouse position and keyboard key press timestamps, sending them to the server via POST requests. The pseudocode in Algorithm 1 outlines the implemented logic.
Algorithm 1 Telemetry Data Collection |
upon Document is fully loaded do
on KeyDown event on window do
Current timestamp
Call SendTelemetryData with ’keyboard’ and
end on
on MouseMove event on window do
Current timestamp
Mouse X coordinate from event
Mouse Y coordinate from event
Call SendTelemetryData with ’mouse’, , ,
end on
function SendTelemetryData(eventType, timestamp, x, y)
Send POST request to ’/track_telemetry’ with event data
on Response received do
Log response message
end on
end function
end upon
|
Collected data is stored in two dedicated databases:
-
Telemetry Database, which captures user interactions with the website, including:
Event ID (unique identifier),
IP address and source port,
Event type (keyboard or mouse),
Timestamp,
Mouse X and Y coordinates (applicable only for mouse events);
-
Request Database, which logs information about individual requests made to the website, such as:
Request ID (unique identifier),
Telemetry data hash (linking request to interaction),
IP address and source port,
Timestamp (time of current request),
HTTP headers, status code, method, hostname, path,
Interaction start time (when was first GET request to critical endpoint sent).
To enhance security, user behavior data (telemetry) is collected only on critical pages like login and register. This allows us to focus verification efforts on areas with heightened risk of bot activity. Specifically, a unique telemetry hash is calculated after credentials are submitted via a POST request.
The hash generation process, outlined in Algorithm 2, considers two key factors:
Mouse movement: We capture a string representing mouse position points throughout the interaction,
Keystroke cadence: The average time between key presses is calculated based on timestamps collected since the page loaded successfully.
Algorithm 2 Telemetry Data Hash |
function SendRequestData
Current datetime
Request headers as dictionary
’N/A’
Initialize database cursor
if Request method is ’POST’ then
Retrieve telemetry data between interaction start and current timestamp
Fetched data
if is not empty then
Initialize , , ,
for each entry d in do
if is ’mouse’ then
Append mouse coordinates to
else if is ’keyboard’ then
Add to
end if
end for
Calculate average time between key presses
Append average time to
Compute hash of
end if
Insert request data into the database
end if
Commit changes to the database
Close database connection
end function
|
3.1. Attack Execution
The attack begins by stealthily gathering crucial information about the target login page. This involves pinpointing the exact positions of input boxes and the submit button, as meticulously outlined in Algorithm 3.
Algorithm 3 Get Initial Position of Element |
function GetInitialPosition(host, path, element_id, offset)
Initialize headless Chrome browser with fullscreen and SSL error ignore options
Navigate browser to "https://{host}{path}"
Find the element on the webpage by its ID
Get the screen dimensions using PyAutoGUI
Retrieve browser window position
Calculate the center coordinates of the element
Adjust coordinates to be within screen limits and apply offset
Quit the browser
return Screen coordinates (screen_x, screen_y)
end function
|
Armed with this knowledge, the attack commences by sending a carefully crafted stream of POST requests. Each request bears a deceptive payload: meticulously crafted mouse position data, designed to mimic genuine human interaction. This process, detailed in Algorithm 4, aims to evade detection by blending seamlessly with typical user behavior.
Algorithm 4 Send Telemetry Request |
function SendTelemetryRequest(event_type, timestamp, x (optional), y (optional))
Define path as ’/track_telemetry’
Construct data as a JSON string with event_type, timestamp, x, and y
Define headers with necessary information including host, session, and browser details
Call send_post_request with path, data, and headers
end function
|
Mouse positions are calculated, leveraging Bézier curves, widely used in computer graphics, game development, and image processing, to generate realistic mouse trajectories. These curves offer flexibility and smooth transitions, crucial for mimicking natural user behavior. Important to understand is that those curves can be exchanged with new ML models, such as "SapiAgent", to generate even more human-like mouse trajectories [
14]. However, for simplicity, we will operate on the Bézier curves. They are defined by Bernstein polynomials as shown in Equation
1 [
15],
where:
n - the degree of the Bézier Curve (number of control points minus one),
t - the parameter of the Curve (position along the curve ranging between 0 to 1),
the binomial coefficients are given by Expression
2:
We strategically choose three control points for each curve:
This point selection process is elucidated in Algorithm 5.
Algorithm 5 Generate Random Point on Line |
- 1:
function GenerateRandomPointOnLine(point_a, point_b, randomness)
- 2:
Random number between 0 and 1
- 3:
- 4:
- 5:
Calculate the directional difference , between and
- 6:
Determine perpendicular direction ,
- 7:
Normalize the perpendicular direction
- 8:
Calculate based on
- 9:
Adjust x, y coordinates by applying in the perpendicular direction
- 10:
return x, y
- 11:
end function
|
By leveraging Equation 1, we numerically calculate the Bézier curve points, as outlined in Algorithm 6.
Algorithm 6 Generate Bézier Curve Points |
- 1:
function GenerateBezierCurve(control_points, num_points)
- 2:
Create a list of evenly spaced values between 0 and 1
- 3:
Initialize as a zero matrix
- 4:
Number of control points minus 1
- 5:
for to do
- 6:
for to n do
- 7:
Calculate Bernstein polynomial
- 8:
- 9:
end for
- 10:
end for
- 11:
return
- 12:
end function
|
To visualize the generated curves and enhance understanding, we present them in
Figure 1.
Once the simulated mouse reaches its intended input field, the attack shifts its focus to simulating keyboard activity. This attack leverages the assumption that attackers can readily build a database of keyboard dynamics for their use. To simulate real user behavior, a simple keylogger was implemented. The code samples 128 random passwords from the "rockyou.txt" file [
16], one at a time. For each password, it meticulously records every keystroke, capturing both the exact moment the key is pressed (key_down) and released (key_up). This precise timing data continues until the user presses "Enter", signaling the completion of the typed word. Captured data is then stored in a comma-separated values (CSV) file. The structure of this database is outlined in
Table 1.
Each row uniquely identifies a keystroke with a timestamp (recorded in milliseconds), and associates it with the typed word (using an internal Word_ID) and the corresponding ASCII code.
A Long Short-Term Memory (LSTM) model serves as the backbone for predicting the intervals between keystrokes. This choice leverages the inherent sequential nature of keystroke timing data, allowing the model to capture and learn crucial temporal dependencies.
Figure 2 visually represents the model’s structure. The model loss is shown on the figure number
Figure 3.
Specifically, the model uses 34 input features, representing the maximum password length from the generated database. This allows the model to understand the potential range of keystroke sequences. An LSTM layer then processes this information, effectively extracting relevant patterns and dependencies from the data. Finally, the output layer utilizes these learned patterns to predict the time a key should be held for, as well as the interval between the current and next keystroke.
Algorithm 7 precisely predicts key press and release timings, ensuring a natural typing cadence. These meticulously timed events are then transmitted to the server, further solidifying the illusion of human presence.
Algorithm 7 Type In Current Input Field |
function TypeInCurrentField(word)
global timestamp
times ← predict_hold_and_release_times(word)
for each time in times do
offset ← create time delta from time[0]
send_telemetry_request("keyboard", timestamp + offset)
increment timestamp by offset and additional time from time[1]
end for
end function
|
After successfully filling each input field, the attack seamlessly resumes its simulated mouse movements, guiding the cursor towards the submit button. Upon reaching this critical element, the attack culminates in the final submit button click. To maximize the likelihood of success, the attack relentlessly repeats this intricate sequence, relentlessly attempting passwords from the expansive "rockyou.txt" file. This unwavering persistence ensures that no potential password combination is overlooked.
4. Results
We examined the request data, focusing on the following key columns:
ID,
Telemetry Hash,
IP Address,
Port,
Timestamp.
To illustrate the differences introduced by behavioral telemetry, we present excerpts of request data from two sets of simulations:
Without Behavioral Telemetry: As shown in
Table 2, requests lack unique telemetry hashes. Bot traffic resulted in generating likewise hashes, making it potentially easier to mark as fraudulent.
With Behavioral Telemetry:
Table 3 demonstrates the presence of randomized telemetry hashes, indicating the traffic was more sophisticated in the way behavioral telemetry was used. These unique identifiers, generated based on user behavior simulated during the attack, add a layer of complexity and potentially enhance the attack’s ability to evade detection.
Executing the attack yielded several clues about its impact on the server. First one being the frequent number of 500 Internal Server Errors being logged. These responses suggest the server struggled to handle the influx of requests, potentially overloading resources. Another suggestion is the "ConnectionResetError" seen at the peak of traffic received by the server. This error indicates the server actively terminated connections, likely as a defense mechanism.
To gauge the attack’s resource consumption, we instrumented the server with dedicated threads. One thread continuously recorded Central Processing Unit (CPU) usage at a specified interval, while the other tracked memory usage.
Figure 4 reveals how the attack affected CPU utilization. Unlike legitimate traffic, which exhibits periods of lower usage (subfigure_b), CPU usage during the attack remained consistently high (subfigure_a). This indicates the server struggled to process the influx of requests, likely exceeding its capacity. Furthermore, the lack of clear data at the end of the attack period (subfigure_a) suggests that resources were saturated, preventing complete file writing. As a result, the recorded data had to be truncated.
While CPU usage offered clear insights,
Figure 5 shows no significant differences in memory consumption on the web application server during the attack (subfigure_a) compared to normal use (subfigure_b). This suggests the attack primarily impacted CPU resources. However, we expect the changes to be reflected in storage utilization on the Windows MySQL server, which requires further investigation.
Figure 6 compares the CPU usage of the attacker’s machine with the server throughout the attack. While the server experienced sustained high usage, the attacker’s CPU usage dropped to 0% just when the server shut down.
While the attack demonstrably impacted server resources, its efficiency from the attacker’s perspective is questionable. Many other DoS techniques offer a better "workload-result ratio," meaning they achieve similar disruption with less resource investment. However, this advantage comes at the cost of easier detection.
Table 4 illustrates a captured request sent during the attack. Generally, for the application itself, headers appear legitimate and do not differ from a typical request sent by a legitimate user.
5. Discussion
This study examined a DoS attack utilizing spoofed telemetry data, aiming to mimic legitimate user behavior and evade detection. Our analysis revealed:
Deceptive Nature: The attack exploited the system’s reliance on telemetry data for user identification and resource allocation. This highlights the inherent vulnerabilities of trust-based security models in complex systems.
Challenges in Detection: The attack’s ability to blend in with typical user behavior necessitates advanced detection methods that move beyond simple anomaly-based approaches.
However, while using controlled simulations ensured data collection, it may not fully capture the complexities of real-world attack scenarios. Additionally, the analysis focused on a specific attack type, and the effectiveness of deception might vary with different approaches.
Our findings corroborate previous research highlighting the growing sophistication of DoS attacks and their increasing use of deception techniques. Moreover, the challenges in detecting deception-based attacks resonate with calls for advanced threat detection methods that combine behavior analysis with anomaly detection.
Author Contributions
Conceptualization, P.Nędza; methodology, P.Nędza; software, P.Nędza; validation, J.Domżał; formal analysis, p.Nędza; investigation, P.Nędza.; resources, P.Nędza; data curation, P.Nędza; writing—original draft preparation, P.Nędza; writing—review and editing, J.Domżał; visualization, P.Nędza; supervision, J.Domżał; project administration, J.Domżał; funding acquisition, P.Nędza. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.
Acknowledgments
Not applicable.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
DoS |
Denial of Service |
ML |
Machine Learning |
LSTM |
Long-Short Term Memory |
GAN |
Generative Adversarial Network |
IDS |
Intrusion Detection System |
CSV |
Comma-Separated Values |
CPU |
Central Processing Unit |
References
- Kiner, E.; April, T. Google mitigated the largest DDoS attack to date, peaking above 398 million rps. Available online: https://cloud.google.com/blog/products/identity-security/google-cloud-mitigated-largest-ddos- attack-peaking-above-398-million-rps (Accessed on 23 November 2023).
- Protecting Your Business From Cyber Attacks. The State of DDoS Attacks DDoS Insights From Q1 & Q2, 2023. Accessible online: https://www.zayo.com/resources/truth-and-trends-of-ddos-attacks/?utm_source=newsroom_website&utm_medium=press-release&utm_content=ddos-insights-report-2023&utm_campaign=2023_ddos_insights_report (Accessed on 24 November 2023).
- Yoachimik, O.; Pacheco, J., DDoS threat report for 2023 Q3. Accessible online: https://blog.cloudflare.com/ddos-threat-report-2023-q3 (Accessed on 24 November 2023).
- Lindemann, B.; Müller, T.; Vietz, H.; Jazdi, N.; Weyrich, M. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021, 99, 650–655. [Google Scholar] [CrossRef]
- L. Haripriya; M. A. Jabbar, Role of Machine Learning in Intrusion Detection System: Review. In Proceedings of 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 30 September 2018, pp. 925-929. [CrossRef]
- Han, J.; Pak, W. Hierarchical LSTM-Based Network Intrusion Detection System Using Hybrid Classification. Appl. Sci. 2023, 13, 3089. [Google Scholar] [CrossRef]
- M. Usama, M. Asim, S. Latif, J. Qadir and Ala-Al-Fuqaha, Generative Adversarial Networks For Launching and Thwarting Adversarial Attacks on Network Intrusion Detection Systems. In Proceedings of 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24-28 July 2019 pp. 78-83. [CrossRef]
- Zhang, C.; Costa-Pérez, X.; Patras, P. Adversarial Attacks Against Deep Learning-Based Network Intrusion Detection Systems and Defense Mechanisms. IEEE/ACM Transactions on Networking 2022, 30, 1294–1311. [Google Scholar] [CrossRef]
- Chen, L.; Wang, Z.; Huo, R.; Huang, T. An Adversarial DBN-LSTM Method for Detecting and Defending against DDoS Attacks in SDN Environments. Algorithms 2023, 16, 197. [Google Scholar] [CrossRef]
- Alabsi, B.A.; Anbar, M.; Rihan, S.D.A. Conditional Tabular Generative Adversarial Based Intrusion Detection System for Detecting Ddos and Dos Attacks on the Internet of Things Networks. Sensors 2023, 23, 5644. [Google Scholar] [CrossRef]
- Aydın, H.; Orman, Z.; Ali Aydın, M. A long short-term memory (LSTM)-based distributed denial of service (DDoS) detection and defense system design in public cloud network environment. Computers & Security, Volume 118, 2022, 102725, ISSN 0167-4048. [CrossRef]
- Mittal, M., Kumar, K. & Behal, S. Deep learning approaches for detecting DDoS attacks: a systematic review. Soft Comput 27, 13039–13075 (2023). [CrossRef]
- Haseeb-ur-rehman, R.M.A.; Aman, A.H.M.; Hasan, M.K.; Ariffin, K.A.Z.; Namoun, A.; Tufail, A.; Kim, K.-H. High-Speed Network DDoS Attack Detection: A Survey. Sensors 2023, 23, 6850. [Google Scholar] [CrossRef] [PubMed]
- M. Antal; K. Buza; N. Fejer, SapiAgent: A Bot Based on Deep Learning to Generate Human-Like Mouse Trajectories. IEEE Access 2021, 9, 124396–124408. [CrossRef]
- S. Karateke; R. Akalin; M. Gümüş, The Evolution of Bézier Curves in Computer- Aided Geometric Design (CAGD): A Systematic Review. In Akpınar, A. (ed.), Research on Mathematics and Science- II. Özgür Publications: Istanbul, Türkiye, 2023; pp. 1-15. [CrossRef]
- Kali Linux. Rockyou. Accessible online: https://www.kali.org/tools/wordlists/ (Accessed on 14 February 2024).
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).