Machine Learning Forensics: State of the Art in the Use of Machine Learning Techniques for Digital Forensic Investigations within Smart Environments

Laila Tageldin; H.S. Venter

doi:10.20944/preprints202306.1660.v1

Submitted: 21 June 2023

Posted: 23 June 2023

Abstract

Owing to the wide variety of internet of things (IoT) devices within smart environments, conventional digital forensic investigation (DFI) faces many challenges in these environments. These challenges include heterogeneity, distribution, and massive amounts of data, which exceed the capacity of human digital forensic (DF) investigators to deal with them within a short period of time. Furthermore, they significantly slow down or even incapacitate the conventional DFI process. With the increasing frequency of digital crimes, better and more sophisticated DFI procedures are urgently needed, particularly in such environments. Since machine learning (ML) techniques might be a viable option in certain situations, this paper presents the integration of ML into DF. It also explores the potential further use of ML techniques in DF in smart environments to reduce manual human effort, as well as what to expect from future ML applications to the conventional DFI process.

Keywords: IoT devices; smart environments; digital forensics; machine learning techniques

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Currently, smart environments offer various technologies and services such as smart transport systems, smart vehicles, smart homes, smart urban lighting, integrated travel ticketing, smart energy grids, and smart sensors [1]. These technologies strongly depend on the use of small electronic chips and electromechanical devices (i.e. IoT devices) such as sensors, wireless technologies, radio frequency identification (RFID) devices, localisation technologies, and near-field communication devices [1].

The wide variety of IoT devices used within smart environments makes it very difficult to perform digital forensics (DF) in this environment. The challenge for DF professionals and practitioners is that standard industrial DF equipment, whose capabilities were built around conventional computing operating systems, cannot cope with the smart environment due to its complex, heterogeneous, and distributed nature [2].

The problem raised in this paper is that few, if any, reliable DF applications or DF directives currently exist to retrieve data from IoT devices in the event of a digital attack, an active investigation, or a litigation request within smart environments. The DF of IoT technologies is the missing piece of the puzzle that is required to successfully investigate IoT devices in smart environments [3]. Thus, researchers and practitioners in the DF field are working hard to define new techniques and tools to improve DF capabilities for coping with this problem.

The numerous challenges that face traditional digital forensic investigation (DFI) in smart environments result from the heterogeneity, distribution, and huge amounts of data involved. This exceeds the capabilities of human DF investigators to cope with all these challenges in a short time. It severely slows down or even incapacitates the conventional DFI process. Due to the rapid pace at which digital crimes are committed, better and more intelligent DFI techniques are sorely needed, especially in smart environments. Machine learning (ML) techniques might offer a solution to these challenges [4].

The importance of ML in DFIs should not be underestimated, since such intelligent technologies have the potential to support and significantly enhance the conventional DFI process. ML technologies can potentially assist in the automation of the manual DFI processes when significant volumes and a large variety of data must be analysed. Using more intelligent techniques will increase the chances of identifying and successfully investigating cybercrimes in modern smart environments. This will help DF specialists get to the root cause much faster and more efficiently [5].

For all the reasons mentioned above, ML holds great potential for DFIs. However, it is a foreign field to most DF investigators and the scope for new research is vast. That being said, there exists a small corpus of research where ML technology was used to investigate digital crimes [4].

ML techniques, which are often used to predict behaviour, rely on pattern recognition to help investigators analyse huge amounts of data. They seek to learn from historical data so as to predict future behaviour. Therefore, by using ML techniques, investigators may gain the capability to recognise patterns of criminal activity and to learn from the historical data when, where, and how a cybercrime probably took place.

The remainder of this paper is structured as follows. Section 2 provides some background on digital forensics, the ISO/IEC 27043 international standard on the DFI process, smart environments, and ML. Section 3 presents the state-of-the-art ML techniques used in digital forensics. Section 4 discusses the role of ML techniques in the DFI process and gives future direction on the use of ML in this process. The paper is concluded in Section 5.

2. Background

This section deals with digital forensics, the internationally standardised DFI process, smart environments, and machine learning, which are the key concepts the reader needs to be familiar with in this paper.

2.1. Digital Forensics (DF)

DF forms part of the greater field of forensic science. DF investigators are responsible for retrieving and investigating data on digital devices. As new and updated platforms increasingly work with IoT and cloud technologies in smart environments, industry and practitioners are struggling to develop DF tools and procedures that keep up with the challenges involved. These technologies may even be embedded electronics or computing systems with specific functionality that exist as part of a larger platform [3].

A systematic and standardised process that has been developed to perform DFI was captured in the international standard, “ISO/IEC 27043:2015 – Incident investigation principles and processes” [6], which is briefly elaborated on in the next section.

2.2. ISO/IEC 27043

The ISO/IEC 27043 international standard was initially proposed by Valjarevic and Venter [6] to handle DF incident investigation principles and processes. Figure 1 shows a high-level overview of this ISO/IEC 27043 international standard.

The conventional DF process (i.e. the process that had been followed before ISO/IEC 27043 was introduced) was only concerned with the initialisation, acquisitive, and investigative processes. Moreover, it consisted of various disparate process models that were not harmonised. Therefore, Valjarevic and Venter [6] considered all relevant models and other standards so as to address the disparities and harmonise them into a single standardised model, known as ISO/IEC 27043. In addition to the harmonisation effort, Valjarevic and Venter [6] added the readiness and concurrent process classes.

However, since it has not been tailored for IoT and smart environments, performing the ISO/IEC 27043 DFI process within the smart environment is still challenging, due to the wide variety of IoT devices that exist within this environment. The next section briefly describes the smart environments to allow the reader to understand the solutions proposed by recent research.

2.3. Smart Environments

The smart environment comprises various types of smart devices, sensors, and computers that are connected to the internet and embedded in numerous objects within such an environment. Smart environments have grown rapidly as networks of internet-enabled devices, also known as IoT devices [7]. Currently, IoT devices are adopted in almost all parts of our lives, for instance in home temperature management, smart lighting, smart appliances, smart sensors, and smart cities [7].

Although a smart environment may improve our quality of life, it also provides a new set of data previously untapped and with tremendous forensic value, due to the huge amount of data generated in this environment [4]. The rapid pace at which digital crimes are conceptualised and committed makes it essential to develop better and more intelligent DFI techniques, especially in smart environments. ML techniques might offer a solution to these challenges [4], [27], [28], [29].

However, researchers still argue that DF for smart environments remains a work in progress, since an international standard for implementing smart-city infrastructure has not yet been completed. Meanwhile, this situation gives law enforcement organisations and investigators an opportunity to swiftly expand their DF solutions and capabilities [27], [29].

The following section presents a brief background of ML techniques.

2.4. Machine Learning (ML)

The application of ML in the field of DF has given rise to a new discipline known as machine learning forensics (MLF), which has the capacity to detect criminal patterns, anticipate criminal activities (e.g. where and when crimes are likely to occur), and automate DF investigative procedures. To conduct MLF, an adapted DF framework is required, which must be capable of capturing and analysing data in smart environments – regardless of whether devices in this smart environment are connected to the internet via wired or wireless networking interfaces [9].

Furthermore, ML is an approach to artificial intelligence (AI) that allows a system to learn on its own from experience and example, rather than from programming. In other words, ML is used to describe a system that continually learns and makes decisions based on data rather than programming [4]. ML is not only utilised for AI goals such as simulating human behaviour, but also to minimise human effort and time spent on complex and time-consuming jobs. Particular machine learning techniques include supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning is a method of developing AI by training a computer program on labelled input data for a certain output. The model is trained until it recognises the underlying patterns and correlations between the input data and the output labels, allowing it to produce appropriate labels when given previously unseen data. Supervised learning excels at classification and regression problems. Its goal is to make sense of data in the context of a given topic [10]. Supervised learning was proposed by some researchers to improve DFIs in smart environments, as discussed in Section 3.

In contrast to supervised learning, unsupervised learning is presented with unlabelled data and is designed to detect patterns or similarities on its own. Unsupervised learning techniques are of two types, clustering and association, which uncover previously unknown patterns in data and help to find features that can be useful for categorisation [10].

Reinforcement learning differs from both supervised and unsupervised ML techniques. Supervised and unsupervised techniques are distinguished from each other by the presence or absence of data labelling, whereas reinforcement learning is a subfield of machine learning concerned with how intelligent agents should behave in a given environment. When the system being represented is autonomous and not affected by an external actor, Markov models are utilised [11]. Markov chains are the simplest type of Markov model and are used to represent systems in which all states are observable. A Markov chain describes all possible states and the transitions between them. One application of this type of model is prediction, a probabilistic technique that uses Markov models to predict the future behaviour of some variable based on its current state, and it can be used in many domains.

ML has a substantial influence on DF and has various applications in this sector. These applications can improve the overall efficiency of DFIs by finding trends, patterns, similarities, anomalies, and other characteristics in digital evidence. As a result, forensic professionals can produce leads and solve crimes in less time and with fewer resources. These advancements lead to the second major contribution of ML applications, which is a reduction in the cost of a DFI [22].

Section 3 presents a table that contains overviews of research papers using ML techniques in DF as proposed by different researchers between 2018 and 2023. Section 4 later presents (also in table form) the contribution of ML in the DFI process, in order to identify the gaps in the reviewed papers and suggest high-level solutions.

3. State-of-the-Art Use of Machine Learning Techniques in Digital Forensics

Due to the challenges that traditional DFIs face in smart environments (i.e. the heterogeneity, distribution, and huge amounts of data that exceed human capabilities to manage all these threads in a short time), ML seems to be the best solution for these environments [4]. The literature reviewed in this section is intended to confirm this statement. Table 1 presents how ML techniques can be further used to support DF in smart environments and to reduce the manual effort and time required of investigators. Moreover, these technologies can automate the laborious DFI operations of analysing huge amounts and wide ranges of data, which increases the likelihood of successfully detecting and investigating cybercrime. This would greatly aid DF professionals in rapidly and effectively determining the fundamental causes of incidents [5].

Table 1 presents MLF from research papers published between 2018 and 2023. Research shows that the main issues that face DF investigators in the smart environment are the large volume of data, and attack and violation detection. The proposed solutions are summarised in Figure 2 and Figure 3.

Figure 2 summarises the applications of MLF that were reported in research papers from 2018 to 2023 to serve as proposed solutions for dealing with the large amounts of data generated in smart environments. The following list explains the elements of Figure 2 in more detail:

The IoTDots framework is proposed as a solution to deal with the large amounts of data collected by IoT devices and sensors.
A methodology for the automatic prioritisation of suspicious file artefacts is proposed as a solution to deal with the growing volume of data and the manual retrieval of suspicious files.
Intelligent methods to automate problem-solving are proposed as a solution to deal with the massive amounts of data that must be analysed for digital evidence.
Automation that uses ML techniques for classification and AI techniques for prioritising suspicious devices is proposed as a solution to deal with the growing number of cases needing DF competence and the large volumes of data to be processed.
Automatic text analysis to detect online sexual predatory chats is proposed as a solution to deal with the growth of cybercrime targeting minors, the large volume of data, and a DFI process that is done primarily by hand.
The "VERITAS" mechanism to automatically collect and extract forensic evidence from smart environments is proposed as a solution to deal with the large amounts of data that are generated in smart environments.

Figure 3 summarises the applications of ML in DF as proposed in research published between 2018 and 2023 for detecting data attacks and violations in smart environments. The following list explains Figure 3 in more detail:

An intelligent intrusion detection system to detect regular and malicious attacks on data created in smart environments was proposed as a solution to deal with the simple or complex attacks that face IoT networks in particular.
A blockchain-assisted shared audit framework for identifying data-scavenging attacks in virtualised resources was proposed as a solution to deal with attacks and violation detection in smart environments.
An intelligent forensic analysis mechanism was proposed as a solution to deal with the probability of continual attacks on IoT devices and the low processing power and memory of these devices.

The following section discusses the impact of MLF on the DFI process.

4. The Impact of MLF on the DFI Process

As can be seen in Section 3, Table 1 presents a review of research papers that examined the contribution of ML techniques to DF in smart environments. It also identifies digital forensic issues that each of the reviewed papers addresses, and proposes solutions that are based on machine learning to improve the DFI process.

Table 2 summarises the role of ML techniques in the DFI process. The column headers present the paper reference number, the ML techniques used, and the main ISO/IEC 27043:2015 process class headings.

Table 2 presents the role of ML techniques in the ISO/IEC 27043:2015 DFI processes and highlights gaps where ML techniques may be used to improve the processes. An “X” in a particular cell indicates that the specific ML technique was applied in the processes indicated. The techniques contributed mainly to the initialisation and investigative processes of the ISO/IEC 27043:2015 set of standards and there was a lack of application of ML techniques in the other process areas of this standard.

Applying ML and AI techniques to the areas of ISO/IEC 27043:2015 that are not yet covered can automate and improve the DFI process, since these areas are currently performed mostly manually. For example, the Markov Chain model (see Table 2 [12]) already automates the analysis process through two main components referred to as the ‘modifier’ and ‘analyzer’ components. The ‘modifier’ component examines smart applications in search of forensically significant information, then modifies the smart application by introducing specialised logs and sending them to a specialised logs database. The ‘analyzer’ component uses data processing and Markov Chain models on the logs database to learn the status of the smart environment and the users’ activity during the time of the forensic analysis so as to identify possible security violations from people, devices, or smart apps. However, the technique presented in [12] does not focus on the automation of any of the other ISO/IEC 27043:2015 processes. The remainder of Table 2 can be interpreted in a similar fashion.
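To make the idea of a Markov-chain-based analyser more concrete, the following minimal Python sketch estimates first-order transition probabilities from logged state sequences and flags transitions that were rarely or never seen. It is an illustration only and not the implementation described in [12]; the state names, the function names `fit_transitions` and `flag_unlikely`, and the 0.05 threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def flag_unlikely(sequence, transitions, threshold=0.05):
    """Return the transitions whose learned probability is below the threshold."""
    flagged = []
    for current, nxt in zip(sequence, sequence[1:]):
        prob = transitions.get(current, {}).get(nxt, 0.0)
        if prob < threshold:
            flagged.append((current, nxt, prob))
    return flagged


# Hypothetical log-derived states of a smart home (illustrative, not from [12]).
history = [
    ["door_locked", "door_unlocked", "motion_hall", "lights_on"],
    ["door_locked", "door_unlocked", "motion_hall", "lights_on"],
    ["door_locked", "door_unlocked", "lights_on"],
]
model = fit_transitions(history)

# Motion is observed while the door was never reported as unlocked.
observed = ["door_locked", "motion_hall", "lights_on"]
suspicious = flag_unlikely(observed, model)
print(suspicious)
```

In this toy data the transition from `door_locked` directly to `motion_hall` never occurs in the history, so it is the only one flagged. A flagged transition is a lead for the investigator to examine and not, by itself, proof of a violation.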

This section furthermore proposes the integration of ML techniques into the ISO/IEC 27043:2015 processes that are not currently covered by ML techniques. While several digital forensic process models currently exist, Table 2 explores the integration of ML techniques into ISO/IEC 27043:2015, since this standard represents the de facto DFI process, owing to its widespread acceptance and the ability to integrate new digital forensic methods into its existing processes. Such integration can improve efficiency and reduce time and human effort by automating the manual tasks of the DFI process. For example, Krivchenkov et al. [14] proposed that intelligent methods for intrusion detection or real-time intrusion prevention be used to support DFI through two main techniques: rule-based and anomaly-based. Rule-based techniques mostly utilise databases that include predefined rules to detect known intrusions. The most widespread use of intelligent techniques in this field is connected to the creation of new rules or the optimisation of an enabled set of rules. In terms of intelligent approaches, anomaly detection may be thought of as a conventional clustering and outlier identification problem. Because a detected anomaly is not always proof of intrusion and might be attributed to odd but proper user behaviour, this strategy typically has a high false alarm rate.
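The difference between the two approaches can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions and is not taken from [14]: the rule database `RULES`, the event fields, and the use of a z-score over outgoing byte counts as the outlier test are all hypothetical choices made for this sketch.

```python
from statistics import mean, stdev

# Hypothetical rule database: known-bad combinations of port and payload marker.
RULES = [
    {"name": "telnet_default_login", "port": 23, "marker": "admin:admin"},
    {"name": "shell_injection", "port": 80, "marker": ";wget "},
]


def rule_based_alerts(event):
    """Return the names of all predefined rules that the event matches."""
    return [
        rule["name"]
        for rule in RULES
        if event["port"] == rule["port"] and rule["marker"] in event["payload"]
    ]


def anomaly_alert(value, baseline, z_threshold=3.0):
    """Flag a value more than z_threshold standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


event = {"port": 23, "payload": "login admin:admin", "bytes_out": 950}
baseline_bytes = [100, 120, 110, 95, 105, 115, 98, 102]  # past traffic of the device

known = rule_based_alerts(event)                           # matches a known signature
unusual = anomaly_alert(event["bytes_out"], baseline_bytes)  # far outside the baseline
print(known, unusual)
```

The rule-based check can only recognise intrusions that someone has already described, whereas the anomaly check reacts to anything unusual. A legitimate but rare action, such as a large firmware upload, would also exceed the threshold, which illustrates why the anomaly-based strategy tends to produce false alarms.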

Adam and Varol [20] also proposed an intelligent framework based on clustering and classification. The model learns from past crimes, and when a new crime is registered the investigator enters some of the crime information, such as the crime type, location, and time. The clustering process then automatically groups the new crime with previous similar crimes in the system using the k-nearest neighbour and crime-matching classification algorithms. In this way, the investigator can gain insights during the pre-investigation process by exploring the new crime alongside the previous similar crimes with which it has been clustered.
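A minimal sketch of the k-nearest-neighbour step is given below. It is not the framework of [20]: the case records, the group labels, and the hand-crafted `distance` function (which mixes a crime-type penalty, spatial distance, and time of day) are hypothetical and serve only to show how a newly registered crime could be matched to similar past crimes.

```python
from collections import Counter
from math import hypot

# Hypothetical past cases: (crime_type, x_km, y_km, hour_of_day, assigned_group).
PAST_CASES = [
    ("burglary", 1.0, 1.2, 2, "series_A"),
    ("burglary", 1.1, 0.9, 3, "series_A"),
    ("burglary", 0.8, 1.0, 1, "series_A"),
    ("fraud", 5.0, 5.5, 14, "series_B"),
    ("fraud", 5.2, 5.1, 15, "series_B"),
]


def distance(a, b):
    """Combine a type mismatch penalty, spatial distance and circular hour gap."""
    type_penalty = 0.0 if a[0] == b[0] else 5.0
    spatial = hypot(a[1] - b[1], a[2] - b[2])
    hour_gap = abs(a[3] - b[3])
    hour_gap = min(hour_gap, 24 - hour_gap) / 12.0  # scaled to the range 0..1
    return type_penalty + spatial + hour_gap


def match_case(new_case, past_cases, k=3):
    """Return the majority group among the k most similar past cases."""
    nearest = sorted(past_cases, key=lambda case: distance(new_case, case))[:k]
    votes = Counter(case[4] for case in nearest)
    return votes.most_common(1)[0][0], nearest


# The investigator enters the crime type, location and time of the new crime.
new_case = ("burglary", 0.9, 1.1, 2)
group, neighbours = match_case(new_case, PAST_CASES)
print(group)
```

The weights in `distance` are arbitrary here; in practice they would need to be chosen or learned so that no single attribute dominates the comparison.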

Koroniotis et al. [23] proposed a forensic framework for cyber-attacks against smart satellite networks and indicated that ML and deep learning algorithms are effective for discovering, identifying, and tracing such attacks.

Ferreira et al. [24] used a well-structured and realistic dataset to test ML and deep learning techniques that can be used in the DF analysis process to detect multimedia content manipulation. The dataset was technically validated with convolutional neural network (CNN) and support vector machine (SVM) algorithms, and the authors concluded that SVM required less processing time than CNN. This matters because one of the goals of incorporating ML techniques into DFI is to automate processes and reduce the time they consume. ML techniques also aim to overcome issues of complexity, consistency, correlation, and data volume, as the evidence is gathered from several sources [26].

Shahbazi and Byun [25] proposed a DF analysis system based on natural language processing (NLP) techniques and the blockchain for social media data, a significant source of digital evidence that can support various DFIs. NLP is used for data collection, text analysis, and evaluation, while the blockchain is used for securing the analysed data and protecting it against attacks.

Moreover, recent research indicated that ML algorithms are also useful for drone data analysis, since they can generate judgments and predictions about the likelihood of an event occurring by examining varied datasets of varying volumes. Commonalities between data items can be discovered by grouping similar data samples into a single cluster and then visualising the data clusters, which can then be labelled. It thus becomes feasible to forecast the trajectory of drones in flight and to cluster trajectories as either legitimate flight paths or compromised ones [21].
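The clustering step described above can be sketched with a plain k-means implementation. The example is an assumption-laden illustration and not the method of [21]: each flight is reduced to two hypothetical features (mean deviation from the planned route in metres and mean altitude change rate in metres per second), and the initial centroids are picked by hand so that the run is deterministic.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            for point in points
        ]
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical per-flight features: (route deviation in m, altitude change rate in m/s).
flights = [(2.0, 0.5), (3.0, 0.4), (2.5, 0.6), (40.0, 3.0), (45.0, 3.5)]

# Hand-picked starting centroids (an assumption, to keep the example deterministic).
labels, centroids = k_means(flights, [flights[0], flights[-1]])
print(labels)
```

The algorithm only separates the flights into two groups; deciding that the group with large route deviations represents compromised flights is the labelling step that remains with the investigator.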

Thus, the state of the art of applying ML techniques in DFI within smart environments mainly involves the initialisation and investigative processes (see Table 2). ML techniques should therefore be applied more prominently in the readiness, acquisitive, and concurrent processes of DFI.

Readiness processes in smart environments will be improved by applying ML techniques to enable automation and pre-incident prediction and detection. Also, a thorough awareness of the setups and of the various data sources and types will greatly reduce the time necessary for DFIs. The automation of the digital forensic readiness (DFR) processes will enable the automatic capturing and saving of digital evidence from smart environments, based on pre-defined rules, by using a rule-based classifier and association rules. This should provide proactive and preventive methods for automation in such an environment.

Furthermore, DFR principles already assure the forensic soundness of the information gathered, making it appropriate for litigation. Therefore, ML techniques could be applied in the DFR process by making use of techniques such as noise-resistant algorithms, support vector machines, and neural networks. By making use of such ML techniques, investigators could deduce the rules of classification from existing and historical datasets and scenarios in order to train the readiness model. Clustering could then be applied to improve the accuracy of the classification and allow the model to make decisions by itself.
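As a rough illustration of rule-based evidence capture for readiness, the sketch below applies pre-defined rules to incoming events and, when a rule fires, stores the event together with a SHA-256 digest so that later tampering can be detected. Everything in it is hypothetical: the rules in `CAPTURE_RULES` are hand-written for the example, whereas the discussion above envisages deducing such rules from historical data, and the digest is only one small ingredient of forensic soundness.

```python
import hashlib
import json

# Hypothetical pre-defined capture rules (hand-written here, not learned).
CAPTURE_RULES = [
    {
        "name": "door_opened_at_night",
        "condition": lambda e: e["device"] == "door_sensor"
        and e["state"] == "open"
        and e["hour"] < 6,
    },
    {
        "name": "camera_disabled",
        "condition": lambda e: e["device"] == "camera" and e["state"] == "offline",
    },
]


def capture_if_relevant(event):
    """Return an evidence record with an integrity digest if any rule fires."""
    fired = [rule["name"] for rule in CAPTURE_RULES if rule["condition"](event)]
    if not fired:
        return None  # nothing forensically relevant according to the rules
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return {
        "rules": fired,
        "event": event,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


night_event = {"device": "door_sensor", "state": "open", "hour": 3}
day_event = {"device": "door_sensor", "state": "open", "hour": 14}

record = capture_if_relevant(night_event)   # captured: a rule fires
ignored = capture_if_relevant(day_event)    # None: no rule fires
print(record["rules"], ignored)
```

Recomputing the digest over the stored event at a later time and comparing it with the recorded value gives a simple integrity check on the captured evidence.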

ML techniques are mainly used for prediction and classification. Therefore, the acquisitive and concurrent processes in ISO/IEC 27043:2015 can be automated using ML techniques. This will also help the readiness and initialisation processes of the standard to predict and detect incidents, for example by using decision trees and neural networks.

On the other hand, although the incorporation of ML techniques can be powerful for DFI, a lack of interpretability and inadequate training data may lead to weak and poorly understood models [26].

5. Conclusion

By presenting an overview of MLF research papers from 2018 until 2023, this paper shows how ML techniques have recently been used across different areas of the DFI process in smart environments. Common challenges for DF in these environments were also highlighted. Intelligent technologies such as ML have the potential to aid DFI, mainly by facilitating the automation of manual DFI processes. Accordingly, numerous research papers reviewed here applied ML techniques in DF in a bid to improve the efficiency of the DFI process by means of automation that decreases the investigator’s manual effort. The paper also highlighted what to expect in the future from MLF applications. Finally, it discussed the role of ML techniques in the DFI process as advocated in ISO/IEC 27043:2015. This was done to highlight gaps that need more attention and where ML techniques can also be applied to improve the current DFI process.

References

[1] D. Popescul and L. Radu, "Data Security in Smart Cities: Challenges and Solutions", Informatica Economica, vol. 20, no. 1/2016, pp. 29-38, 2016.
[2] D. Quick and K. Choo, "Big forensic data management in heterogeneous distributed systems: quick analysis of multimedia forensic data", Software: Practice and Experience, 2016.
[3] S. Watson and A. Dehghantanha, "Digital forensics: the missing piece of the Internet of Things promise", Computer Fraud & Security, vol. 2016, no. 6, pp. 5-8, 2016.
[4] X. Du et al., "SoK", Proceedings of the 15th International Conference on Availability, Reliability and Security, 2020.
[5] V. Kebande, R. Ikuesan, N. Karie, S. Alawadi, K. Choo and A. Al-Dhaqm, "Quantifying the need for supervised machine learning in conducting live forensic analysis of emergent configurations (ECO) in IoT environments", Forensic Science International: Reports, vol. 2, p. 100122, 2020.
[6] A. Valjarevic and H. Venter, "A Comprehensive and Harmonized Digital Forensic Investigation Process Model", Journal of Forensic Sciences, vol. 60, no. 6, pp. 1467-1483, 2015.
[7] M. Conti, A. Dehghantanha, K. Franke and S. Watson, "Internet of Things security and forensics: Challenges and opportunities", Future Generation Computer Systems, vol. 78, pp. 544-546, 2018.
[8] A. Valjarević, H. Venter and R. Petrović, "ISO/IEC 27043:2015 – Role and application", in 24th Telecommunications forum TELFOR, Serbia, Belgrade, 2016.
[9] M. Qadir and A. Varol, "The role of machine learning in Digital Forensics", 2020 8th International Symposium on Digital Forensics and Security (ISDFS), 2020.
[10] Goni, J. Mishion Gumpy, T. Umar Maigari, M. Muhammad and A. Saidu, "Cybersecurity and Cyber Forensics: Machine Learning Approach", Machine Learning Research, vol. 5, no. 4, p. 46, 2020.
[11] S. Iqbal and S. Abed Alharbi, "Advancing Automation in Digital Forensic Investigations Using Machine Learning Forensics", Digital Forensic Science, 2020.
[12] L. Babun, A. Sikder, A. Acar and A. Uluagac, "IoTDots: A Digital Forensics Framework for Smart Environments", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1809.00745.
[13] X. Du and M. Scanlon, "Methodology for the automated metadata-based classification of incriminating digital forensic artefacts", Proceedings of the 14th International Conference on Availability, Reliability and Security, pp. 1-8, 2019. Link: https://bit.ly/2Oqh6u6.
[14] A. Krivchenkov, B. Misnevs and D. Pavlyuk, "Intelligent Methods in Digital Forensics: State of the Art", Lecture Notes in Networks and Systems, pp. 274-284, 2019.
[15] L. Babun, A. Sikder, A. Acar and S. Uluagac, "The Truth Shall Set Thee Free: Enabling Practical Forensic Capabilities in Smart Environments", Proceedings 2022 Network and Distributed System Security Symposium, 2022.
[16] P. Shakeel, S. Baskar, H. Fouad, G. Manogaran, V. Saravanan and C. Montenegro-Marin, "Internet of things forensic data analysis using machine learning to identify roots of data scavenging", Future Generation Computer Systems, vol. 115, pp. 756-768, 2021.
[17] Ngejane, J. Eloff, T. Sefara and V. Marivate, "Digital forensics supported by machine learning for the detection of online sexual predatory chats", Forensic Science International: Digital Investigation, vol. 36, p. 301109, 2021.
[18] G. Kalnoor and S. Gowrishankar, "IoT-based smart environment using intelligent intrusion detection system", Soft Computing, vol. 25, no. 17, pp. 11573-11588, 2021.
[19] M. Mazhar et al., "Forensic Analysis on Internet of Things (IoT) Device Using Machine-to-Machine (M2M) Framework", Electronics, vol. 11, no. 7, p. 1126, 2022.
[20] Y. Adam and C. Varol, "Intelligence in digital forensics process", 2020 8th International Symposium on Digital Forensics and Security (ISDFS), 2020.
[21] Z. Baig, M. A. Khan, N. Mohammad and G. B. Brahim, "Drone forensics and machine learning: Sustaining the investigation process", Sustainability, vol. 14, no. 8, p. 4861, 2022.
[22] A. Jarrett and K. R. Choo, "The impact of automation and artificial intelligence on Digital Forensics", WIREs Forensic Science, vol. 3, no. 6, 2021.
[23] N. Koroniotis, N. Moustafa and J. Slay, "A new intelligent satellite deep learning network forensic framework for Smart Satellite Networks", Computers and Electrical Engineering, vol. 99, p. 107745, 2022.
[24] S. Ferreira, M. Antunes and M. E. Correia, "A dataset of photos and videos for Digital Forensics Analysis Using Machine Learning Processing", Data, vol. 6, no. 8, p. 87, 2021.
[25] Z. Shahbazi and Y.-C. Byun, "NLP-based digital forensic analysis for online social network based on system security", International Journal of Environmental Research and Public Health, vol. 19, no. 12, p. 7027, 2022.
[26] Y. A. Balushi, H. Shaker and B. Kumar, "The use of machine learning in digital forensics: Review paper", Proceedings of the 1st International Conference on Innovation in Information Technology and Business (ICIITB 2022), pp. 96-113, 2023.
[27] Y. C. Tok and S. Chattopadhyay, "Identifying threats, cybercrime and digital forensic opportunities in smart city infrastructure via threat modeling", Forensic Science International: Digital Investigation, vol. 45, p. 301540, 2023.
[28] Gumusbas, T. Yıldırım, A. Genovese and F. Scotti, "A comprehensive survey of databases and deep learning methods for cybersecurity and Intrusion Detection Systems", IEEE Systems Journal, vol. 15, no. 2, pp. 1717-1731, 2021.
[29] Hussein Ali Sahib, M. Y. Alsudani, M. K. Ali, Haydar Qassim Abbas, K. Moorthy and Myasar Mundher Adnan, "Proposed intelligence systems based on digital Forensics: Review paper", vol. 80, pp. 2647-2651, Jan. 2023.
[30] Salih and N. Dabagh, "Digital Forensic Tools: A literature review", Journal of Education and Science, vol. 32, no. 1, pp. 109-124, 2023.
[31] Palmese, A. E. C. Redondi and M. Cesana, "Feature-Sniffer: Enabling IoT Forensics in OpenWrt based Wi-Fi Access Points", arXiv.org, Feb. 14, 2023. https://arxiv.org/abs/2302.06991 (accessed June 5, 2023).

Figure 1. High-level overview of the ISO/IEC 27043 international standard [8]

Figure 2. MLF solutions for large amounts of data in smart environments

Figure 3. MLF solutions for attack and violation detection in smart environments

Table 1. Machine learning digital forensics reviewed in research papers between 2018 and 2023.

Ref. No.	Paper title	Problem statement of the paper	The solution proposed by the paper
[12]	IoTDots: A Digital Forensics Framework for Smart Environments	The amount of data collected by IoT devices and sensors is immense and contains valuable forensic evidence. This data can help identify and prevent unauthorised access within smart environments.	A new framework known as IoTDots is designed to help protect the data collected by various smart devices and applications. It features two main components: the IoTDots Analyzer and the IoTDots Modifier. The former scans the source code of the applications and detects forensic information. The latter automatically inserts tracking logs and reports the results.
[13]	Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts	One of the most discussed challenges in DFI is the growing volume of data. Since the majority of file artefacts on seized devices are usually irrelevant to the investigation, manually retrieving suspicious files relevant to the investigation is very difficult.	To reduce the amount of manual analysis required in DFI, this research proposed a methodology for the automatic prioritising of suspicious file artefacts. Rather than providing the final analysis results, this methodology aims to predict and recommend the artefacts that are likely to be suspicious. A supervised machine learning approach is used, which makes use of previously processed case results.
[14]	Intelligent Methods in Digital Forensics: State of the Art	One of the main issues with DF is that a massive amount of data must be analysed for digital evidence. The primary goal of this work is to improve this difficult forensic process by employing intelligent methods for analysing digital evidence.	The ability of computers to learn a specific task from data is known as "intelligent methods", which also includes data mining, machine learning, soft computing, and traditional artificial intelligence. This term is commonly used to express ways to automate problem solving in DF. In support of DF, two main intelligent approaches are utilised, namely rule-based and anomaly-based.
[15]	The Truth Shall Set Thee Free: Enabling Practical Forensic Capabilities in Smart Environments	The interaction of devices, users, and apps in smart environments creates a large amount of data. Such data contains valuable forensic information about smart environment events and activities. However, current smart platforms lack any digital forensic capability for identifying, tracing, storing, and analysing data generated in these environments.	"VERITAS", a novel and practical DF capability for smart environments, is introduced in this paper. The Collector and Analyzer are the two main components of VERITAS. The Collector employs mechanisms to automatically collect forensically relevant data from the smart environment. The Analyzer then uses a First Order Markov Chain model to extract valuable and usable forensic evidence from the collected data for the purposes of a forensic investigation.
[16]	Internet of Things Forensic Data Analysis using Machine Learning to Identify Roots of Data Scavenging	To detect attacks and violations in an IoT environment, intensive data analysis and computational intelligence are required. On such platforms, advanced computer systems based on machine learning and smart computing are used to identify adversaries.	To discover and report the presence of adversaries, DF requires intensive data analysis, such as retrieving and verifying system logs and evaluating blockchain information. This paper proposed a blockchain-assisted shared audit framework to analyse DF data in an IoT environment. It was created to identify the source and cause of data scavenging attacks in virtualised resources. It uses blockchain technology to manage access logs and controls. Access log data is examined using logistic regression and cross-validation to check the consistency of adversary event detection.
[5]	Quantifying the Need for Supervised Machine Learning in Conducting Live Forensic Analysis of Emergent Configurations (ECO) in IoT Environments	In an IoT system, particularly in the case of emergent configurations (ECOs), data might be dynamic, making it difficult to classify information during live forensics. In this sense, live forensics refers to a forensic investigation that is done in near real-time.	A conceptual framework based on supervised machine learning techniques was proposed in this paper. One of the advantages of using supervised ML techniques in live forensics is the ability of such techniques to predict possible events based on past occurrences. In addition, an automated feature identification was used to prevent redundancy throughout the feature selection and elimination.
[4]	SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation	The number of cases needing DF competence and the volume of data to be processed have overburdened digital forensic investigators. Many large data challenges are considered as having been solved by artificial intelligence. Automated evidence processing based on artificial intelligence techniques holds considerable potential for speeding up the digital forensic analysis process while improving case-processing capacity.	In DFI, automation uses ML techniques for classification. ML techniques can obtain important information for investigations more efficiently by exploiting existing digital evidence-processing knowledge. Also, digital evidence triage was developed for the prompt detection, processing and interpretation of digital evidence. Currently, with AI techniques, the investigator determines the priority of device gathering and processing at a crime scene.
[17]	Digital Forensics Supported by Machine Learning for the Detection of Online Sexual Predatory Chats	With the growth of cybercrime that targets minors, chat logs can be examined to detect harmful behaviour and report it to law enforcement authorities. This can make a significant difference in protecting young people on social media platforms from being abused by cyber predators. Since DFI is done primarily by hand, the enormous volume and variety of data make the task difficult for DF investigators.	To enable the automatic detection of harmful conversations in chat logs, the approach suggested in this research uses a DF process model supported by ML methodologies. ML has previously been used successfully in the field of text analysis to detect online sexual predatory conversations.
[18]	IoT-based Smart Environment Using Intelligent Intrusion Detection System	One of the most fundamental characteristics of any smart device in an IoT network is its ability to acquire a large set of data and then send the obtained data to the destination (receiver) server through the internet. IoT-based networks are therefore particularly vulnerable to simple or sophisticated attacks, which must be detected early in the data transmission process in order to protect the network.	The primary purpose of this study was to design and build an intelligent intrusion detection system using machine learning models so that attacks on the IoT network can be detected. The approach was designed to account for both regular and malicious attacks on data created in an IoT smart environment.
[19]	Forensic Analysis on Internet of Things (IoT) Device Using Machine-to-Machine (M2M) Framework	The adaptability of IoT devices raises the probability of continual attacks on them. Due to the low processing power and memory of IoT devices, security researchers have found it challenging to preserve records of diverse attacks performed on these devices during a DFI.	An intelligent forensic analysis mechanism is proposed for the automatic detection of attacks on IoT devices based on the machine-to-machine framework. The proposed mechanism combines several ML techniques and different forensic analysis tools to detect different types of attacks. Furthermore, a third-party logging server overcomes the problem of evidence gathering. To assess the effect and type of attacks and violations, forensic analysis is performed on the logs using a forensic server.
[30]	Digital Forensic Tools: A Literature Review	The growth of IoT devices that produce a large amount of forensic data presents the main challenge of IoT forensics.	A smart fridge was selected as the IoT device to examine and investigate. The dataset was examined using two ML algorithms, Bayes net and decision stump, each of which represents a distinct idea. The decision stump is a simple, one-level version of the decision tree ML technique. The Bayes net is useful for estimating the likelihood that any one of several recognised causes contributed to the occurrence of an event. The validation results indicate that the Bayes net algorithm is more accurate than the decision stump.
[31]	Feature-Sniffer: Enabling IoT Forensics in OpenWrt based Wi-Fi Access Points	The recognised challenges of IoT forensics and smart environments create a great opportunity to develop new forensic tools that make the task of forensic investigators easier and that can be used for acquiring, preserving, and analysing forensic data.	A user-friendly tool was proposed for WiFi-enabled smart devices used in smart environment scenarios, giving forensic investigators, network administrators, and data scientists access to various features of network traffic in a few simple steps. The proposed tool allows network traffic features to be computed in real time on any WiFi access point running the OpenWrt firmware, avoiding the time-consuming tasks of dumping network traffic and implementing the procedures needed to analyse the captured traffic.
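
As an illustration of the first-order Markov chain analysis summarised for [15] in Table 1, the following minimal sketch estimates transition probabilities between smart-environment events and scores how typical a new event sequence is. It is not the VERITAS implementation: the event names, the function names fit_transitions and avg_log_likelihood, and the use of add-one smoothing are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log


def fit_transitions(sequences, smoothing=1.0):
    """Estimate first-order transition probabilities P(next | current).

    `sequences` is a list of event-name lists (hypothetical smart-home logs).
    Additive smoothing keeps unseen transitions at a small non-zero probability.
    """
    states = sorted({event for seq in sequences for event in seq})
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for cur in states:
        total = sum(counts[cur].values()) + smoothing * len(states)
        probs[cur] = {
            nxt: (counts[cur][nxt] + smoothing) / total for nxt in states
        }
    return probs


def avg_log_likelihood(seq, probs, floor=1e-6):
    """Average log-probability per transition; lower means less typical.

    Transitions involving states never seen in training fall back to `floor`.
    """
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    total = sum(log(probs.get(cur, {}).get(nxt, floor)) for cur, nxt in steps)
    return total / len(steps)


# Hypothetical training data: routine event sequences from a smart home.
routine = [
    ["door_unlock", "motion_hall", "light_on"],
    ["door_unlock", "motion_hall", "light_on"],
    ["motion_hall", "light_on", "door_unlock"],
]
model = fit_transitions(routine)

typical = ["door_unlock", "motion_hall", "light_on"]
unusual = ["light_on", "door_unlock", "door_unlock"]
print(avg_log_likelihood(typical, model))
print(avg_log_likelihood(unusual, model))
```

In this sketch, a sequence whose average log-likelihood falls well below that of routine activity would be flagged for closer inspection by an investigator. The threshold for flagging is a design choice that would have to be calibrated on real data.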

Table 2. The role of ML techniques in the DFI processes of ISO/IEC 27043:2015

Reference No.	Used ML technique	Readiness processes	Initialisation processes	Investigative processes
[12]	Markov Chain model			X
[17]	Logistic regression			X
[5]	Supervised machine learning	X		X
[13]	Supervised machine learning			X
[14]	Unsupervised identification		X	X
[15]	Markov Chain model			X
[16]	Logistic regression		X	X
[18]	Markov Chain model		X	
[19]	Decision tree algorithm			X

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.