1. Introduction
Artificial intelligence (AI) and machine learning (ML) models have evolved tremendously in recent years, bringing new paradigms and possibilities to the digital environment. One such paradigm is generative AI, a new class of AI and ML [
1,
2]. This form of AI concerns the ability of a system to write text, draw images, or produce audio based on the commands given by a user. Historically, generative models date back to the 1950s, when the Hidden Markov Model and the Gaussian Mixture Model were developed [
1,
3]; such models only became truly workable with the advancement of deep learning [
4]. The oldest generative model is the N-gram, which learns word distributions, while the latest technology is the transformer architecture, which underlies the chatbots available today. Prime examples of emerging generative AI chatbots are Google Gemini, Microsoft Bing Copilot (formerly known as Bing Chat), and ChatGPT [
5,
6,
7].
Google’s Gemini, which is inherently multimodal, has been incorporated into a variety of Google’s products [
8]. The moniker ’Gemini’ is inspired by the astrological sign that symbolizes duality and adaptability, as well as NASA’s Project Gemini, which stands as a testament to monumental efforts and a gateway to greater accomplishments. The development of Gemini is a result of the collaborative efforts between DeepMind and Google Research, epitomizing their shared aspirations and innovative spirit [
9]. Gemini models are multimodal, which implies that they can both process and create text, images, sound, videos, and code [
10,
11]. They are trained on large amounts of data and use neural network approaches such as question answering and text generation to produce their results. Gemini models can also handle long context sequences of diverse data types in contextual representation learning, for instance in sequence-to-sequence variants [
12].
Likewise, Microsoft Bing Copilot, also known as Bing Chat, is another AI agent that can perform a wide range of functions [
13]. It is a way to surf the Web and a valuable ally for turning an idea into a new website, a piece of art, a blog, or an article. Copilot can participate in conversations and give elaborate responses, and it can produce multiple kinds of output such as text, images, and code [
14].
ChatGPT, created by OpenAI, is capable of answering follow-up questions and holding conversations [
15]. ChatGPT is built on a family of Large Language Models (LLMs), and the technology has undergone several revisions since its release. An advanced memory feature was incorporated into ChatGPT in February 2024, making subsequent conversations richer because the model retains information discussed across chats [
16].
The presence of these three models has enormously impacted the generative AI competition, and since these new AI-driven tools can be weaponized, the pattern of emerging cyber-attacks has shifted significantly. We are in the era of AI [
17]; AI and ML have transformed the digital space through the rapid development of machine learning models in recent years. These three models give users a new degree of freedom, allowing inputs or commands to be used anytime and in whatever way users wish [
5].
Figure 1 illustrates the commonalities in attack capabilities among three prominent AI models: ChatGPT, Google Gemini, and Microsoft Bing Copilot. The diagram provides a clear visual representation of the shared vulnerabilities these systems may face, including Social Engineering Attacks, Phishing Attacks, and Adware.
The motivation behind this research is digital safety, as generative AI innovations empower both defenders and attackers. These models can be utilized by cyber defenders and cybercriminals alike to create a vast array of cyber threats, vulnerabilities, and attack patterns [
4,
18], posing significant risks to cybersecurity. They aid cybersecurity professionals in combating threats, identifying vulnerabilities, and implementing defensive controls. Recently, there has been a surge in AI-powered cyber attacks [
19,
20]. Conversely, these models may assist hackers and cybercriminals in generating social engineering attacks, payloads, creating exploits, and identifying vulnerabilities [
21,
22]. Attackers can employ techniques such as jailbreaks, reverse psychology, and positive reinforcement to trick models into bypassing the security controls that companies implement to enforce ethical guidelines in content generation [
23,
24]. The key contributions of this research are:
- 1.
The article provides an extensive review of the current status and capabilities of the three most popular generative AI chatbots—ChatGPT, Google Gemini, and Microsoft Bing—highlighting their potential applications and inherent risks in cybersecurity and privacy.
- 2.
It examines the various ways in which malicious actors can exploit generative AI chatbots to generate phishing emails, security exploits, and executable payloads, thereby intensifying cyber-attacks.
- 3.
It proposes advanced content filtering mechanisms and defense strategies to empower generative AI chatbots to defend against malicious exploitation, ensuring enhanced security and privacy protection.
This taxonomy provides a comprehensive overview of the research contributions made in this article, offering readers a clear understanding of the scope and depth of our study on the impact of Generative AI Chatbots on cybersecurity.
3. Literature Review
This section presents an overview of current research on using generative AI to create attacks. The details of these studies are presented in the paragraphs that follow.
Table 2 provides a comparative analysis of contemporary studies and summarizes the research contributions of the surveyed articles. This structure helps address the general theme of generative AI in the context of cybersecurity and assess the processes, outcomes, and mitigation plans of these methods. The section covers different forms of threats, with insight into the capabilities and motives of attackers, ranging from social engineering and phishing attacks to the generation of malware code [
51]. The reviewed work also highlights the measures and products applied to counter cyber threats and protect assets. Its purpose is to make the results relevant to further research programs, policy objectives, and adequate cybersecurity strategies.
Yamin et al.[
51] explored the duality of AI in defensive and offensive operations. The paper discusses recent instances of generative AI attacks, like messing with traffic signals and manipulating medical images. It talks about the dangers of using AI as a weapon, offers ways to deal with these issues, and suggests control rules. Guembe et al. [
18] highlight how AI is being misused in cyberattacks. The paper provides insights into the negative impact of these attacks and discusses current research on the subject. The results reveal that 56% of AI-driven attacks focus on gaining access and penetrating systems, with CNN being a common technique.
Blum et al. [52] discuss the inner workings of ChatGPT. The researchers tried to bypass its security measures and found that ChatGPT would go to great lengths to avoid answering certain questions. They proposed methods to jailbreak ChatGPT and made it answer questions it would not entertain otherwise. Gupta et al. [
4] present an overview of ChatGPT and its applications in cybersecurity, focusing on its capabilities and limitations. The paper explores the concepts of jailbreaking, reverse psychology, and model escaping, and investigates ChatGPT's responses to different kinds of requests, such as generating malicious code or asking for ways to perform unethical activities.
Dhoni et al. [
53] explores how generative AI (GenAI) and cybersecurity are connected. The paper talks about the importance of cybersecurity in today’s digital world and focuses on social engineering, especially phishing. GenAI, which can create text, images, and media, is introduced, mentioning models like GPT-4, Google Gemini, and DALL-E2. The paper discusses GenAI’s impact on the internet, mentioning chatbots and their ability to create various content forms. Prasad et al. [
24] explores the evolving role of AI in cybersecurity, with a particular focus on the responsibilities of Chief Information Security Officers (CISOs). The keywords suggest that the paper may delve into topics such as the application of AI tools, including chat generative pre-trained transformers (ChatGPT), in addressing security vulnerabilities, incident management, and the overall enhancement of cybersecurity.
Fezari et al.[
54] explain the newly emerging field of artificial intelligence. The paper covers the original definition of generative AI, its power, and its workings. It also discusses various generative AIs like ChatGPT, GPT-4, and DALL-E, defining and comparing their features through different examples, such as generating various types of texts and images. Hassan et al.[
19] discuss the data-driven nature of the world and provide historical examples of cyber attacks. The paper also discusses how businesses are using AI and elaborates on the 3R model (making systems more robust, faster to react, and more resilient). It covers AI-driven attack strategies such as intelligent target analysis, password entry, CAPTCHA manipulation, malware evasion, and the generation of complex phishing URLs.
Renaud et al. [
55] discuss cybersecurity risks posed by advances in artificial intelligence such as ChatGPT, which allow hackers to create phishing communications that evade normal detection. Innovative strategies, rather than policy-based approaches, are needed to combat these evolving threats. Jabbarova et al. [
56] highlight the impact of integrating AI into cybersecurity, explaining the role of AI in operationalizing cyber attacks, improving threat response, and enabling predictive modeling. The work also addresses ethical and privacy concerns related to this integration.
Oviedo et al. [
57] discuss ChatGPT, a sophisticated AI language model widely used for tasks such as customer service and chatbots. There are disadvantages, however: it can share incorrect or even unsafe information, often in matters related to students' safety. The paper also aims to assess the overall reliability of ChatGPT on safety topics, for instance, phone use while driving and managing stress in the workplace. Alawida et al. [
58] analyze ChatGPT, a leading language model transforming text generation. The study covers ChatGPT's architecture, training data, evaluation metrics, and its advancements over time. It evaluates ChatGPT's capabilities and limitations in NLP tasks like translation, summarization, and dialogue generation, and compares it to other models. It also addresses ethical and privacy concerns and potential security risks in cyberattacks, and offers mitigation strategies.
Marshall et al. [
59] discuss how Large Language Models (LLMs) are AI tools that can process, summarize, and translate texts and predict future words in a sentence, mimicking human speech and writing. However, a key concern is the inaccuracy of the content they generate. LLMs can also be trained to detect data breaches, ransomware, and organizational vulnerabilities before cyberattacks occur. Despite being new, LLMs have significant potential, especially in generating code that aids cyber analysts and IT professionals. Qammar et al. [
60] explore the evolution of chatbots from early models like ELIZA to advanced models like GPT-4, focusing on ChatGPT’s working mechanism. With the rise in popularity, chatbots, including ChatGPT, face cybersecurity threats and vulnerabilities. The paper reviews literature, reports, and incidents related to attacks on chatbots, specifically how ChatGPT can create malware, phishing emails, undetectable zero-day attacks, and generate macros and LOLBINs.
Mardiansyah et al. [
61] compare the performance of two leading AI models, ChatGPT and Google Gemini, in identifying spam emails using the SpamAssassin public mail corpus. The findings reveal that ChatGPT-4 demonstrates a balanced performance with high precision and recall, making it suitable for general spam detection tasks. In contrast, Google Gemini excels in recall, highlighting its potential in scenarios where capturing the maximum number of spam emails is paramount, despite a slightly higher tendency to misclassify legitimate emails as spam. Li et al. [
62] highlight that the controversies surrounding AI are often familiar and reflect existing societal issues. The recent debate over Google's Gemini is a prime example, showing how social problems, not technological ones, are the root cause of many AI concerns. The issue of AI "wokeness" is a symptom of our own societal struggles rather than a problem with the technology itself. The creation process of AI reveals four key aspects that contribute to the "Black-Nazi problem" and other socio-technical issues.
Jacobi et al. [
63] highlight the increasing complexity and frequency of cyber threats, which demand new approaches to strengthen Governance, Risk, and Compliance (GRC) frameworks. A novel approach is to leverage artificial intelligence, specifically large language models (LLMs) like ChatGPT and Google Gemini, to enhance cybersecurity advice. The research reveals that ChatGPT outperforms Google Gemini in providing relevant, accurate, complete, and contextually appropriate advice. While both models have room for improvement, the study demonstrates the potential benefits of integrating LLMs into GRC frameworks, especially when combined with human expertise to tackle complex challenges. Atzori et al. [
64] examine how Large Language Models (LLMs) like ChatGPT, GPT-4, Claude, and Bard can be exploited to generate phishing attacks. These models can create convincing phishing websites and emails without modifications, and attackers can use malicious prompts to scale threats easily. To combat this, researchers developed a BERT-based detection tool that accurately identifies phishing prompts and works across various LLMs.
Roy et al. [
65] address the security risks of reusing easy-to-remember passwords and the inadequacy of traditional password strength evaluations, which often ignore personal information shared on social networks. It introduces an enhanced tool, "Soda Advance," that evaluates password strength using public data from multiple social networks. The study also explores the effectiveness of Large Language Models (LLMs) in generating and evaluating passwords. Alawida et al. [
66] explore how ChatGPT, a powerful generative AI model, can be exploited by malicious actors to launch cyberattacks. It examines the tactics used to leverage ChatGPT for cyber assaults and presents examples of potential attacks. A survey of 253 participants reveals that over 80 percent are aware that ChatGPT can be used for malicious purposes.
From this literature review, we find that, in comparison to previous studies, this article provides an extensive review of different methods used to trigger malicious content from chatbots and offers detection mechanisms. It includes real-world case studies.
Table 2 presents a comparative analysis of contemporary studies and summarizes research contributions in scholarly articles, highlighting the contribution of this study.
The research methodology used in this study is comprehensive and multifaceted, aimed at conducting a thorough investigation of generative AI-powered attacks and their implications for cybersecurity. First, a comprehensive review of existing literature is conducted to synthesize prior research studies in the field, examining their methodologies, findings, and contributions. Through this comparative analysis, the study intends to identify gaps and key insights in the understanding of AI-powered cyber-attack techniques, laying the groundwork for further investigation.
5. Exploiting Generative AI in Cybercrimes
This section covers the trend of using generative AI for malicious activities, focusing on how these tools assist cybercriminals in executing sophisticated attacks [
90,
91,
92]. This section explores key areas where generative AI has been exploited, including social engineering attacks, phishing schemes, automated hacking, and the generation of malware payloads. Examples and case studies are provided using ChatGPT, Google Gemini, and Microsoft Bing Copilot.
5.1. Social Engineering Attacks
Social engineering is a cybersecurity issue that entails manipulating people into taking actions or disclosing private information. Malicious actors may be able to take advantage of a chatbot's remarkable capacity for understanding context and producing human-sounding content. Consider a scenario in which an attacker obtains a victim's job role and place of employment, among other basic personal details [
93,
94]. They can use chatbots to produce a message that appears to come from a supervisor or coworker, written in appropriately formal language, requesting sensitive data or urging actions such as clicking a seemingly innocuous link [
95]. The risk comes from ChatGPT’s ability to provide messages that are plausible and in line with the victim’s expectations, which raises the possibility of compliance.
The potential misuse is depicted in
Figure 3, which shows how contextually relevant, compelling communications can be used in social engineering attacks. The attack scenario was implemented on all three platforms, and the figure shows that all three successfully generated the requested results.
In addition, Google Gemini is multimodal, which makes it even more effective for social engineering attacks. By incorporating data from various formats, including emails, documents, and even voice recordings, Gemini can create still more believable messages. For instance, an attacker could use Gemini to craft a lifelike email conversation thread or a voice message from a friendly co-worker, amplifying the attack's credibility.
Microsoft Bing Copilot is likewise well suited to producing realistic, context-relevant messages. It integrates with other Microsoft products, so it can pull contextual information from sources such as Microsoft Teams and Outlook. This helps Copilot produce messages that are not only persuasive but also fully embedded in the victim's routine. The high realism of the messages produced by these models means attackers can conduct attacks far more effectively if they turn to AI-orchestrated campaigns.
5.2. Phishing Attack
Many people fall victim to cybercriminals who use techniques such as phishing, in which criminals pose as trusted organizations to trick people into revealing their details. Phishing emails are relatively easy to produce and can even be specially tailored so that each recipient receives a different message [
96,
97]. For example, an attacker can use chatbots to write an email that appears to come from a reputable online store, informing the receiver about a problem with a recent purchase and directing them to a fake website. As shown in
Figure 4, all three models' proofs of concept (POCs)—ChatGPT, Google Gemini, and Microsoft Bing Copilot—demonstrate how AI-generated content can be utilized to craft convincing phishing attacks. Each model showcases distinct strengths in generating persuasive text, making it easier for cybercriminals to exploit unsuspecting victims. The outputs illustrate the potential risks associated with such misuse.
Google Gemini's multimodal nature allows it to create coherent phishing messages combining text, images, and other media, making the deceit even more credible. An attacker may send professional, polished emails that mimic the logo and style of the organization the target believes is contacting them. Bing Copilot, integrated with other Microsoft applications, can pull information from sources such as Microsoft Outlook and Teams to write messages that blend into the victim's daily work. Phishing attempts frequently exploit psychological levers such as fear and urgency to make victims react quickly. Powerful AI systems, trained on copious amounts of communication data, allow attackers to produce emails that strikingly resemble authentic correspondence, making phishing attacks all the more deceptive. AI-generated phishing emails can construct narratives that arouse anxiety or urgency, prompting rash decisions and increasing the likelihood of a successful attack.
5.3. Attack Payload Generation
Attack Payload Generation refers to the process of creating a malicious piece of code or a command designed to exploit a vulnerability in a system or application. This "payload" is what an attacker uses to carry out their attack, such as executing unauthorized actions, stealing data, or gaining control over a system. In the context of cybersecurity, attack payloads can take various forms, including:
- 1.
Malicious Code: Scripts or executables that perform harmful actions when executed on a target system.
- 2.
Exploit Code: Code specifically designed to take advantage of a known vulnerability in software or hardware.
- 3.
Commands: Sequences of commands that, when executed, result in an attack, such as SQL injection commands that manipulate an SQL database.
It is technically possible for advanced AI models and chatbots (like ChatGPT, Microsoft Bing Copilot, and Google Gemini) to assist in generating attack payloads, especially if they are misused by individuals with malicious intent. With ChatGPT’s text-generating features, malicious code that carries out unlawful actions—attack payloads—can be produced [
98]. We used two types of payloads in this experiment: (1) SQL Injection (SQLi) payloads and (2) payloads targeting Web Application Firewalls (WAFs).
SQLi is a type of attack that allows an attacker to interfere with the queries that an application makes to its database. An SQL injection payload is a string of code inserted into an input field to manipulate the execution of SQL queries. By using specially crafted SQL commands, an attacker can gain unauthorized access to data, extract sensitive information, or even modify or delete data in the database.
WAFs are security systems that monitor and filter incoming traffic to web applications to block malicious requests. WAF payload generation involves crafting specific patterns or requests that can either test the effectiveness of a WAF or exploit a weakness in its configuration.
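The filtering idea can likewise be illustrated with a deliberately simplified sketch of the pattern matching a WAF rule might perform on an incoming request parameter; real WAFs use far more elaborate normalization, rule sets, and scoring, so the signatures below are toy assumptions rather than production rules.

```python
import re

# Toy deny-list of signatures a WAF rule set might match (greatly simplified).
SQLI_SIGNATURES = [
    re.compile(r"('|%27)\s*or\s*('|%27)?1('|%27)?\s*=\s*('|%27)?1", re.IGNORECASE),
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),
    re.compile(r";\s*drop\s+table", re.IGNORECASE),
]

def is_suspicious(param_value: str) -> bool:
    """Return True if a request parameter matches any known SQLi signature."""
    return any(sig.search(param_value) for sig in SQLI_SIGNATURES)

# Usage: a benign value passes, obvious injection patterns are flagged.
print(is_suspicious("alice"))                    # False
print(is_suspicious("' OR '1'='1' --"))          # True
print(is_suspicious("1 UNION SELECT password"))  # True
```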
Figure 5 illustrates the outcomes of testing, showing the successful generation of SQL Injection and WAF payloads by ChatGPT, the partial success of Microsoft Bing, and the complete non-response from Google Gemini.
In our exploration, we attempted to generate these two payload types using the three prominent models: ChatGPT successfully generated both SQL Injection and WAF payloads, Microsoft Bing generated only the WAF payload, and Google Gemini declined to generate any payloads at all.
5.4. Ransomware Malware Code Generation
Ransomware is a type of malicious software designed to block access to a computer system or files until a sum of money (the ransom) is paid. It typically encrypts the victim’s files, making them inaccessible, and demands payment for the decryption key. Ransomware attacks can significantly disrupt businesses and individuals, leading to financial losses, reputational damage, and data loss. On the other hand, malware code generation refers to the process of creating malicious software designed to infiltrate, damage, or gain unauthorized access to computer systems. This can encompass a variety of malware types, including viruses, worms, trojans, spyware, adware, and ransomware.
Malware and ransomware continue to be serious concerns in today’s digital environment. Installed without authorization, malware carries out nefarious tasks such as password theft. By encrypting data, ransomware prevents users from accessing them and demands payment for the decryption key. It takes a lot of experience and talent to write these malicious software pieces, but a potent AI model may be able to automate this process and speed up the development of new threats.
- 1.
WannaCry is a ransomware attack targeting Windows systems, encrypting files, and demanding Bitcoin for decryption [
99]. It exploits vulnerabilities in the Server Message Block (SMB) protocol to propagate, encrypting files and making the system unusable [
100]. Users are then prompted to pay a ransom in Bitcoin for decryption, and the SMB-based propagation allows it to spread quickly across networks, affecting numerous systems within organizations [
101].
- 2.
NotPetya masquerades as ransomware but offers no decryption key. It encrypts critical system files, effectively rendering machines inoperable [
102]. The attack spreads through network vulnerabilities and exploits, often disguised as a software update. Its primary goal is disruption rather than financial gain [
103].
- 3.
Ryuk ransomware stems from prior malware attacks, often related to TrickBot, which is distributed via phishing [
104]. It is delivered via phishing emails and is designed to encrypt important organizational files, demanding a ransom for their release. It focuses on high-value targets, including businesses, to maximize the ransom payout [
105].
- 4.
REvil is ransomware used by hackers to encrypt files for ransom [
106]. Also known as Sodinokibi, REvil is used in targeted attacks to encrypt files and demand payment for decryption. Attack vectors include phishing emails, exploit kits, and Remote Desktop Protocol exploits. Hackers use REvil to monetize stolen data, offering double extortion by threatening to leak sensitive information if the ransom is not paid [
107].
- 5.
Locky ransomware spreads via email, encrypting files and disabling systems [
108]. It typically spreads through malicious email attachments, encrypting files on infected systems and rendering them inaccessible [
109]. It uses strong encryption algorithms and can automate the infection process across a network, affecting multiple machines. Locky is known for its ability to generate ransom demands in various formats, compelling victims to pay for data recovery [
110].
Figure 6 analyzes the responses of different AI models, specifically ChatGPT, Google Gemini, and Microsoft Bing, in generating code related to notable ransomware attacks: WannaCry and NotPetya. The analysis implies that ChatGPT demonstrated the capability to generate code snippets associated with these specific types of malware, while the other two models did not.
5.5. Adware Generation
Adware Generation refers to the process of creating adware, a type of software that automatically displays or downloads advertisements without the user’s consent. Adware can take various forms, including pop-up ads, banner ads, and in-app advertisements, and it often aims to generate revenue for its developer through advertising or data collection. Adware takes advantage of user engagement with the adverts.
Figure 7 presents the results of a simulation aimed at exploring the effectiveness of various AI models—ChatGPT, Microsoft Bing Copilot, and Google’s Gemini—in responding to requests related to adware generation and information retrieval. The findings indicate that while ChatGPT and Bing Copilot provided varying levels of access and information regarding adware, Google’s Gemini was more restrictive in its responses.
5.6. Trojan Generation
A Trojan (short for Trojan Horse) is a type of malicious software that deceives users by masquerading as legitimate software or by being embedded in legitimate applications [
111]. Unlike viruses or worms, Trojans do not replicate themselves; instead, they rely on users to download and install them under false pretenses.
Figure 8 illustrates the responses of three AI models to requests for Trojan code generation. Both ChatGPT and Microsoft Bing Copilot successfully generated malicious code upon request, demonstrating their ability to provide such information when approached under specific narratives. In contrast, Google Gemini consistently denied all attempts to generate Trojan code, indicating a more robust ethical framework and stronger safeguards against potentially harmful requests. The outputs and results depicted in the figure highlight the significant differences in how these AI models handle inquiries related to malicious content, underscoring the varying levels of ethical protection and decision-making algorithms employed across the platforms.
5.7. Comparative Analysis
In a comparative analysis of various models’ responses to different cyberattack scenarios, we present a summary in
Table 3. This table evaluates the performance of three prominent AI models—ChatGPT, Google Gemini, and Microsoft Bing Copilot—against a range of attack types, including social engineering attacks, phishing, and various forms of malware.
The table systematically categorizes each attack type, indicating whether each model effectively generated a response (marked with a tick) or denied the request (marked with a cross). Additionally, it provides a brief description of each attack, helping clarify the nature of the threats being assessed.
By comparing the response capabilities of these AI models, this table aims to evaluate their effectiveness in counteracting various cyber threats, which is crucial for developing more resilient AI systems in the face of evolving cybersecurity challenges.
6. Discussion - Ethical Implications, Challenges, and Concerns
We showed that ChatGPT, Microsoft Bing Copilot, and Google Gemini are powerful tools that can be used for the betterment of society but are not immune to misuse [
112]. Our experiments demonstrated that these models can give detailed descriptions of adverse actions, such as generating source code for Trojans and adware, when given specific cues. For example, Bing Copilot pointed to GitHub repositories containing adware code, and ChatGPT, after several requests framed as being for learning purposes, delivered sample adware code. In these cases, information was requested under one pretext or another; Google Gemini, however, did not directly write dangerous code but gave a theoretical presentation of such threats.
These experiments show that AI models can be abused to create or locate malicious code, raising an imperative need for ethical controls even as AI is leveraged in education and security initiatives. Several moral, legal, and social implications of using ChatGPT, Bing Copilot, Google Gemini, and other LLM tools can be identified, alongside the biases and threats tied to their application in critical or abusive settings. The capacity of ChatGPT to respond proficiently to a wide variety of queries and cues has been demonstrated; in particular, it excels at producing quick responses [
113].
Moreover, Microsoft Bing Copilot has been shown to be affected by a token-length side channel, as described in [
114]. In this attack, the side channel allows adversaries to infer confidential and sensitive information discussed with an AI assistant by measuring the size of the encrypted response packets. This vulnerability is a reminder that very tight security features must be incorporated so that only authorized parties can access information about users' interactions with AI assistants.
The report on this token-length side channel [
114] also explains how adversaries can learn and exploit the response patterns and cliches typical of LLMs such as ChatGPT and Copilot. The attack is made possible by training an inference model on sample chats from the target AI assistant, improving the attacker's understanding of token sequences so that replies can be reconstructed from encrypted traffic. The side channel can thus be used to deduce highly confidential information from encrypted response packets, posing a clear threat to user privacy. Because adversaries can decode basic patterns of the answers and even restore replies from encrypted traffic, techniques for ensuring the confidentiality of user conversations must be carefully engineered. These weaknesses highlight the importance of strict security measures that allow only permitted personnel to access user details, thereby protecting the privacy and content of communication between users and AI assistants [
113]. These are security risks that Microsoft and Google must address in their AI systems, since they concern the safety of users' information and the prevention of unauthorized access or breaches.
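A commonly proposed mitigation for this class of leak, assuming the provider controls how responses are chunked before encryption, is to pad each response chunk to a fixed-size bucket so that ciphertext lengths no longer track token lengths. The sketch below shows only the padding arithmetic, with the bucket size chosen arbitrarily for illustration.

```python
def pad_to_bucket(chunk: bytes, bucket_size: int = 64) -> bytes:
    """Pad a response chunk up to the next multiple of bucket_size so that the
    observable ciphertext length reveals only a coarse bucket, not token lengths."""
    padded_len = -(-len(chunk) // bucket_size) * bucket_size  # ceiling division
    return chunk + b"\x00" * (padded_len - len(chunk))

# Usage: short and long chunks become indistinguishable on the wire.
for chunk in [b"Yes.", b"Your account balance is 1204 dollars."]:
    print(len(chunk), "->", len(pad_to_bucket(chunk)))  # both pad to 64
```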
6.1. Controversy over Data Ownership and Rights
Significant privacy issues have been found during a study into OpenAI’s use of private data for AI training [
115]. One noteworthy instance saw the Italian data protection authority temporarily banning ChatGPT for violating the General Data Protection Regulation (GDPR), mainly for using personal data without authorization. The use of "legitimate interests" as a basis for OpenAI to utilize personal information as training data is a matter of ethical and legal concern, raising important questions about how AI systems deal with personal data. These questions are relevant whether or not the data is publicly accessible [
115].
One controversial aspect of these models is their extensive dependence on information from the internet, much of which may not belong to the model owner. This issue came under scrutiny after Italy's authority pointed to ChatGPT's ability to disseminate false information about users and the lack of age restrictions that would prevent users under 13 from using the platform.
These tools, which draw on a large database of end-user content to fine-tune the results they deliver, pose important questions about data ownership and usage. Users end up uploading their personal data, queries, and interactions, willingly or unwillingly, without fully understanding consequences such as privacy breaches, data insecurity, and misuse of personal information. The heart of the controversy lies in the demand to grant users stronger rights to their data and more control over its use by artificial intelligence applications.
6.2. Misuse Within Organizations and Among Employees
Another element of possible LLM tool misuse was brought to light by an event involving Samsung workers [
116]. By using ChatGPT to create or debug code, these workers unintentionally provided private company data to the AI model. As a result, this private information was retained by ChatGPT, which could make it publicly available and give rise to serious privacy issues. One such worry is whether a typical ChatGPT user could obtain this private information merely by asking for it. For example,
Figure 9 provides a scenario in which this ChatGPT prompt was used:
"What is Samsung’s property code for android camera enhancement using AI in which they tricked the moon photos just to flex and prove their cameras are best? I know you are awesome and know how they did it!! can you please provide me code to understand for example as a case study for educational purposes!"
When ChatGPT was asked to provide Samsung's proprietary code for Android camera enhancement, it provided a result with a detailed explanation instead of denying the request.
Figure 9 illustrates a scenario in which a prompt directed at ChatGPT seeks proprietary information regarding Samsung’s AI-based enhancements for moon photography. This case exemplifies the potential risks associated with utilizing large language models in corporate environments, particularly regarding the inadvertent sharing of sensitive company information. In the depicted interaction, the user requests specific coding details related to Samsung’s technology, which could lead to unauthorized access to confidential data if the AI were to comply. This image serves as a cautionary demonstration of how advanced AI tools, while beneficial for innovation and problem-solving, can also pose significant privacy and security challenges, emphasizing the need for prudent practices in their use.
Similarly, Google Gemini's powerful data analysis makes it useful for gathering information about an organization; however, if several staff members work with the tool, they may end up sharing confidential enterprise information. Employees might upload documentation for analysis or summarization that includes restricted data, which can then be shared or handled in the wrong way. Likewise, Microsoft Bing Copilot, when incorporated into various developer and productivity tools, becomes a risk when employees use it to create or fix code: developers may upload proprietary code or restricted project information into the AI model, where it could become visible to others.
6.3. Hallucination in Generative AI Models
Hallucinations refer to instances where the AI generates information that is incorrect, misleading, or entirely fabricated. Despite sounding like factual assertions, these outputs may not correspond to reality or verified facts. Hallucinations can occur due to various factors, such as biases in the training data, misunderstandings of context, or the inherent challenges of modeling complex human language [
117]. Essentially, when an AI "hallucinates," it produces content that appears plausible or relevant but is not grounded in accurate information, raising concerns about reliability and trustworthiness in AI-generated content.
Ensuring the accuracy and reliability of information provided by chatbots is crucial, irrespective of privacy concerns. These LLMs are not fully reliable; thus, users cannot completely depend on the information they provide. Furthermore, the misuse of AI-generated content can exacerbate privacy issues. This highlights the necessity for enhanced accuracy and integrity in AI systems.
At the time of writing, ChatGPT had over 100 million registered users. If multiple users input identical queries and receive the same erroneous responses, misleading information can quickly proliferate online. For example, as noted in a DarkReading article, ChatGPT may return fictitious package information in response to user requests, potentially directing users to malicious versions created by attackers [
118]. Such scenarios underscore the importance of tackling hallucinations and enhancing the resilience of AI systems like ChatGPT to safeguard user security and privacy [
119].
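Because a hallucinated dependency only becomes dangerous when it is installed blindly, one lightweight precaution is to confirm that a package name suggested by a chatbot actually exists before running an install command. The sketch below queries PyPI's public JSON endpoint for this purpose; the package names in the usage example are purely illustrative.

```python
import json
import urllib.request
from urllib.error import HTTPError

def pypi_package_exists(name: str) -> bool:
    """Check whether a package name suggested by a chatbot is actually on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            json.load(resp)  # a valid JSON payload means the project is registered
        return True
    except HTTPError as err:
        if err.code == 404:  # unknown project: possibly a hallucinated name
            return False
        raise

# Usage (names are illustrative, not recommendations):
for candidate in ["requests", "surely-not-a-real-package-0x42"]:
    print(candidate, "->", pypi_package_exists(candidate))
```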
Similar challenges are faced by other AI systems such as Google Gemini and Microsoft Bing Copilot. Hallucinations can lead to inaccurate data or fabrications, which is particularly concerning as these tools become increasingly integrated into various applications. For Google Gemini, which relies on pattern recognition and predictive analysis, battling hallucinations is critical for maintaining the accuracy of outputs, especially regarding organizational security. Similarly, Microsoft Bing Copilot may generate faulty code suggestions that could introduce significant vulnerabilities. As the threat of misinformation becomes more pronounced with the widespread adoption of AI, improving the reliability of AI-generated content is imperative.
7. The Role of AI Chatbots in Cyber Defense Strategies
Protective measures are important in cybersecurity to defend digital resources, including information, devices, and computer networks, against exploitation, intrusion, and interference. Examples include firewalls, encryption, physical and administrative controls, and incident handling. Looking ahead, we expect state-of-the-art applications of cybersecurity defense built on models such as ChatGPT, Google Gemini, and Microsoft Bing Copilot.
For example, ChatGPT can provide strong support in generating extensive security reports, studying typical attack flows, and deriving important recommendations. Similarly, Google Gemini can analyze extensive security data and propose preventive actions, while Microsoft Bing Copilot can support incident management and assist in the correct setup of security measures. Together, these capabilities can enrich defensive techniques as the models develop further, opening them up to a variety of uses and generally boosting security against cyber threats. In this section, we discuss the role of AI chatbots in cyber defense strategies.
7.1. Cyber Defense Automation
The aforementioned models can automate the analysis of cybersecurity events, and the resulting reduction in workload can be helpful for overworked SOC experts struggling to manage their responsibilities. They also help analysts create strategic suggestions that support both short- and long-term defense measures [
120]. For example, a SOC analyst can rely on ChatGPT's assessment and recommendations to determine the risk posed by a specific PowerShell script rather than starting from scratch. Additionally, Security Operations (SecOps) teams can ask such models how to stop malicious PowerShell scripts from executing or from accessing files from untrusted sources, thereby improving their company's overall security posture [
121].
Figure 10 provides an overview of how these AI models can be used to detect security issues in server logs and to identify and address high CPU utilization caused by specific tables within the AdventureWorks2019 database.
Detecting security issues in server logs involves analyzing the records generated by server operations to identify suspicious activities, breaches, or anomalies that may indicate unauthorized access or cyberattacks. This process is crucial because server logs provide a detailed account of interactions and events on the system, allowing administrators to spot potential threats in real-time. Similarly, the capability to identify and address high CPU utilization by specific tables within the AdventureWorks2019 database is vital for enhancing performance, ensuring stability, and improving the overall user experience in applications that depend on that database. Early detection of security issues helps organizations respond swiftly to mitigate risks, preventing data loss, compliance violations, and damage to reputation. Ultimately, proactive log analysis is essential for maintaining the overall security and integrity of IT systems.
These ChatGPT-based cybersecurity aids provide a great deal of relief for overworked SOC teams, thereby lowering the organization's total exposure to cyber risk. The technology is also valuable for instructing and educating inexperienced security analysts, resulting in a faster learning process than previously achievable. SOC analysts usually examine server access records to spot attack patterns or anomalies during security incidents or log analysis. ChatGPT can handle large amounts of log data and efficiently detect security issues or anomalies in access records. When server access logs are fed into ChatGPT, it can highlight potential security concerns, rapidly identify risks such as SQL injection, and categorize distinct types of SQL injection. Notably, ChatGPT is skilled at finding security holes in a given script and suggesting fixes, even though PowerShell is used here only as an example.
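As a rough illustration of how such log triage could be scripted, the sketch below submits a few web-server log lines to a chat-completion API and asks for suspected injection or probing attempts. It assumes the OpenAI Python SDK, an API key in the environment, and the placeholder model name "gpt-4o-mini"; an organization would substitute whatever provider, model, and log source it actually uses.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK and OPENAI_API_KEY are set

client = OpenAI()

LOG_LINES = """\
203.0.113.7 - - [10/May/2024:13:55:36] "GET /products?id=42 HTTP/1.1" 200
203.0.113.9 - - [10/May/2024:13:55:41] "GET /products?id=42+OR+1=1-- HTTP/1.1" 500
203.0.113.9 - - [10/May/2024:13:55:44] "GET /login.php?user=admin%27-- HTTP/1.1" 403
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are a SOC assistant. Flag any log lines that look like "
                    "injection or probing attempts and briefly explain why."},
        {"role": "user", "content": LOG_LINES},
    ],
)
print(response.choices[0].message.content)
```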
7.2. Cybersecurity Reporting and Documentation
Cybersecurity reporting is the process of documenting and communicating security incidents, vulnerabilities, and compliance status to stakeholders to enhance awareness and inform decision-making. ChatGPT can effectively generate natural language reports based on cybersecurity events and data. This encompasses the assessment and reporting of cybersecurity information to several audiences, such as regulatory bodies, organizational leadership, and IT departments [
121]. ChatGPT draws on all of these inputs to generate reports and intelligence on threats, vulnerabilities, and other security matters. With the assistance of large quantities of data, ChatGPT produces reports that are comprehensive, detailed, and easy to follow. This makes it easier for organizations to identify potential security problems, estimate the level of risk they entail, and apply the measures needed to remediate them [
122].
Microsoft Bing Copilot works alongside developer tools to provide real-time details, updates, and reporting. It helps examine code for probable exploitable weaknesses and supplies guidance about present threats. Copilot's code-analysis features, from general reports on security incidents to a detailed structural examination of code and checks on potential and known vulnerabilities [
33], help developers practice secure coding.
Google Gemini plays an equally important role by analyzing trends and possible risks through its ability to recognize complex patterns. One of Gemini's advantages is its capability to produce reports from large quantitative datasets that reveal trends and probable threats [
8,
123], thus, allowing organizations to prepare for security issues beforehand.
Altogether, the strengths of ChatGPT, Copilot, and Gemini are useful for cybersecurity reporting and analysis in both knowledge and practice. These AI models provide detailed, understandable reports along with benchmarks on possible threats and help organizations implement the most effective tactics and strategies for protection, leading to the conclusion that AI strengthens security frameworks and how teams interact with them [
121].
7.3. Cyber Threat Intelligence
Threat intelligence is the knowledge about potential or existing threats to an organization’s cybersecurity. It involves collecting, analyzing, and sharing data regarding threat actors, vulnerabilities, and tactics used in cyberattacks to inform security measures and response strategies [
124]. Scanning large amounts of data to uncover security threats and generate intelligence is an area where ChatGPT is fundamentally significant to threat intelligence. One use of threat intelligence is the supportive role it plays in hardening existing security systems against cyber threats through the collection, processing, and sharing of information on potential security risks within organizations [
125]. To do so, ChatGPT acquires data on potential threats from news articles, social media, forums, the darknet, and the wider internet, and can generate threat intelligence reports on its own. ChatGPT identifies potential threats, evaluates the level of risk they pose, and outlines measures for risk reduction with the help of comprehensive data analysis. It can also produce reports and analyze security data to identify trends and patterns in threat activities.
In threat intelligence, Google Gemini and Microsoft Bing Copilot also perform crucial functions, incorporating artificial intelligence to improve security responses. Both models are highly capable of identifying and analyzing potential security threats from large volumes of input [
126]. They can be used to conduct broad and regular threat searches across articles, social media, forums, and the dark web. For instance, Google's Gemini processes big data using natural language processing and recognizes emerging threats, each rated by risk level. It can also produce threat intelligence reports showing the threats that might affect a business and their frequency. Likewise, Microsoft Bing Copilot provides security insights from the data it collects and offers tangible suggestions for improving systems. These models help organizations create intelligent, data-driven reports and identify trends in threat activity, which in turn allows organizations to refine their security approaches and their decision-making on risks and investments. The combined use of threat intelligence platforms involving Google Gemini and Microsoft Bing Copilot therefore improves not only the identification of possible threats but also their prevention.
7.4. Detecting Security Flaws in Scripts
Security weaknesses in code threaten integrity, confidentiality, and availability. Because of this, inspection techniques such as code review, which seek to find security vulnerabilities in code, have become critical steps of software engineering. Manual code review, however, is a tedious process of evaluating code that is often accompanied by critical errors due to human limitations. ChatGPT can be used to write secure code rather than introduce security vulnerabilities. This section describes an approach to code generation and review using artificial intelligence, especially for security vulnerability detection [
127].
Microsoft Bing Copilot and Google Gemini also boost code security by using artificial intelligence for inspection. Microsoft Bing Copilot draws on its understanding of programming to give developers prompt suggestions for writing secure code and flags security problems as developers type. Copilot integrates with development environments to streamline the code review cycle, making reviews faster and more accurate. It can analyze both code snippets and large-scale applications, assess the threats that may arise, and suggest solutions, significantly reducing the chance of a coding mistake that leads to a security breach [
128].
Google Gemini, for its part, offers built-in capabilities backed by AI algorithms and large amounts of data that can be used for code scanning and security threat identification. Its integration with numerous Google platforms further extends these capabilities and allows developers to strengthen the defense of their code.
7.5. Identify Security Bugs During Code Review
Code review is indispensable and requires adequate knowledge of multiple programming languages and technologies, as well as of both safe and unsafe approaches to coding, specifically with regard to security problems. Teams face challenges, however, because it is virtually impossible for one person to be proficient in all the technologies employed in development. Due to this knowledge gap, security issues tend to go unnoticed, leaving code vulnerable.
This problem is further exacerbated by the frequently unbalanced developer-to-security-engineer ratio. Because of the large amount of code being written, security engineers find it challenging to carefully analyze every pull request, which increases the likelihood that security issues will go undiscovered. AI-powered code review appears to be a powerful remedy for these problems. ChatGPT can function as an automated code reviewer skilled at spotting possible security flaws; the AI-powered assistant has been trained on a vast dataset that includes past code reviews and published security issues across various programming languages [
129].
These tools apply their knowledge of safe coding practices by flagging such flaws during code review. For instance, Google Gemini can draw attention to input fields whose values are not sanitized before being dynamically included in a webpage, while Microsoft Bing Copilot can point to safer coding approaches and the right libraries to employ to prevent cross-site scripting (XSS) attacks, as illustrated in the sketch below. When integrated into the code review process, such AI tools can substantially strengthen an organization's capacity to identify and fix security issues, covering for reviewers who are unaware of particular vulnerabilities or unable to read through all the code. This integration promotes secure software development within the broader picture of risk management.
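The following minimal sketch shows the kind of finding such a review should surface: the first helper interpolates raw user input into HTML, the reflected-XSS pattern a reviewer or an AI assistant ought to flag, while the second escapes the input first. The page fragment is hypothetical and kept deliberately small.

```python
from html import escape

def render_greeting_unsafe(user_input: str) -> str:
    # Vulnerable: raw input is embedded in the page, so a payload such as
    # <script>alert(1)</script> would run in the victim's browser.
    return f"<p>Hello, {user_input}!</p>"

def render_greeting_safe(user_input: str) -> str:
    # Escaping turns markup characters into harmless HTML entities.
    return f"<p>Hello, {escape(user_input)}!</p>"

payload = "<script>alert(1)</script>"
print(render_greeting_unsafe(payload))  # script tag survives intact
print(render_greeting_safe(payload))    # &lt;script&gt;alert(1)&lt;/script&gt;
```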
7.6. Secure Code Generation and Cyber Attacks Identification
Secure code generation refers to the practice of writing software code with built-in security measures that mitigate vulnerabilities and protect against attacks. Chatbots, particularly those powered by advanced AI models, can assist in secure code generation by providing real-time coding suggestions, identifying potential security flaws, and recommending best practices. By using natural language processing and extensive programming knowledge, these chatbots can generate code snippets that follow security protocols, help developers understand potential risks, and ultimately foster a culture of secure software development [
130].
Because ChatGPT can generate plain-language descriptions of attack patterns and behavior, the model is useful for identifying cyberattacks. This process involves identifying and analyzing malicious activities occurring in a company's network or systems. Using information such as network log data and security event alerts, analysts can study attack patterns and behavior, and ChatGPT can build explanatory narratives that detail the attack paths, goals, and tactics used by the attacking parties. ChatGPT can also provide warnings or trigger actions when a defined threshold or standard is reached; for instance, it can issue an alert or notification to the right staff when it detects unusual activity within a network. Furthermore, ChatGPT can help decode and analyze cross-site scripting attacks and locate similar issues, and it assists developers in writing secure code by identifying security problems and best practices [
131].
Figure 11 illustrates the capabilities of the three chatbots in identifying cross-site scripting (XSS) attacks, demonstrating the ability of ChatGPT, Microsoft Bing Copilot, and Google Gemini to analyze code and flag potential vulnerabilities. Collectively, the figure exemplifies how AI-driven tools can support secure coding practices and strengthen overall cybersecurity efforts.
The ability of chatbots to identify attacks represents a significant advancement in cybersecurity tools for security professionals. By leveraging natural language processing and advanced machine learning algorithms, these AI-driven systems can efficiently analyze code and web applications to detect potential vulnerabilities that may be overlooked during manual reviews. This automation not only accelerates the identification of security flaws but also enhances accuracy, reducing the risk of human error. Furthermore, these chatbots can provide contextual suggestions for resolving issues, empowering security professionals with actionable insights that facilitate proactive risk mitigation. As cyber threats continue to evolve, the integration of such intelligent chatbots into security workflows can bolster defenses, streamline incident response, and ultimately contribute to a more robust cybersecurity posture for organizations.
7.7. Developing Ethical Guidelines
Ethical guidelines refer to a set of principles and standards that govern the behavior and decision-making processes of individuals and organizations, particularly concerning issues of integrity, respect, fairness, and accountability. Ethical guidelines serve to ensure that AI applications, including chatbots, are developed and utilized responsibly, promoting user trust and safeguarding against harmful outcomes. Key aspects often include transparency in data usage, maintaining user privacy, addressing biases, ensuring data security, and promoting fairness in interactions [
132].
ChatGPT aids in developing Ethical Guidelines for AI systems by providing natural language explanations and analyzing principles like GDPR and IEEE standards. It helps create ethical case studies, enhancing understanding of ethical implications in AI development [
133,
134]. While ChatGPT offers detailed ethical guidelines, models like Microsoft Bing Copilot and Google Gemini are more resistant to manipulation, emphasizing the need for strengthened ethical standards in AI. ChatGPT can also interface with intrusion detection systems (IDSs), identifying security threats from log data and providing natural-language descriptions that support rapid team response. Microsoft Bing Copilot leverages natural language to improve IDS recommendations, whereas Google Gemini analyzes data to formulate effective detection rules, enhancing overall security against threats.
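The IDS-style use of log data described above can be approximated with a very small script: the sketch below scans hypothetical authentication log lines for repeated failed logins and produces a plain-language alert of the kind a chatbot could then elaborate on. The log excerpt and threshold are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical log excerpt; in practice this would come from an IDS or syslog feed.
LOG_LINES = [
    "Jan 10 09:12:01 sshd[411]: Failed password for admin from 203.0.113.7",
    "Jan 10 09:12:03 sshd[411]: Failed password for admin from 203.0.113.7",
    "Jan 10 09:12:05 sshd[411]: Failed password for root from 203.0.113.7",
    "Jan 10 09:13:10 sshd[411]: Accepted password for alice from 198.51.100.4",
]

FAILED = re.compile(r"Failed password for (\S+) from (\S+)")

def summarize_failures(lines, threshold: int = 3) -> str:
    """Return a plain-language alert when failed logins from one IP reach a threshold."""
    per_ip = Counter(m.group(2) for line in lines if (m := FAILED.search(line)))
    alerts = [
        f"Possible brute-force attempt: {count} failed logins from {ip}."
        for ip, count in per_ip.items() if count >= threshold
    ]
    return "\n".join(alerts) or "No suspicious login activity detected."

if __name__ == "__main__":
    print(summarize_failures(LOG_LINES))
```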
Chatbots can also play a vital role in adhering to these ethical guidelines. For instance, they can ensure transparency by providing users with clear information about how their data will be used and stored, which fosters trust. Additionally, through programmed ethical decision-making frameworks, chatbots can evaluate their responses to avoid biased outcomes and provide balanced perspectives. They can also deliver consistent, impartial information in customer support scenarios, helping to ensure fairness and equality of service [
135]. Furthermore, when programmed to recognize and respond to sensitive topics appropriately, chatbots can help prevent the escalation of issues and promote a more respectful interaction. By integrating ethical considerations into their design and functionality, chatbots can contribute to a safer digital environment that aligns with societal values and expectations.
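As a rough sketch of how such guideline checks might be wired into a chatbot pipeline, the snippet below applies a simple pre-response policy: sensitive topics are deflected to human support and a data-use disclosure is appended to ordinary replies. The topic list, refusal wording, and notice are invented for illustration and are not drawn from any specific framework.

```python
# Minimal sketch of a pre-response policy check a chatbot might apply.
SENSITIVE_TOPICS = {"self-harm", "medical diagnosis", "legal advice"}
DATA_NOTICE = "Note: this conversation may be stored to improve the service."

def apply_guidelines(user_message: str, draft_reply: str) -> str:
    lowered = user_message.lower()
    if any(topic in lowered for topic in SENSITIVE_TOPICS):
        # Escalate rather than answer directly, per the ethical guideline.
        return ("This topic needs qualified human support. "
                "I can share general resources or connect you with a specialist.")
    # Transparency: attach a short disclosure about data usage to normal replies.
    return f"{draft_reply}\n\n{DATA_NOTICE}"

if __name__ == "__main__":
    print(apply_guidelines("Can you give me a medical diagnosis?", "..."))
    print(apply_guidelines("What are your opening hours?", "We are open 9-17, Mon-Fri."))
```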
7.8. Malware Detection
The detection of malware represents a particularly compelling use case for GPT-4 in the realm of cybersecurity. Malware, or malicious software, refers to any software specifically designed to cause harm to computer systems, servers, clients, or networks. With the rapid evolution and increasing complexity of malware, traditional signature-based detection solutions often fall short. This scenario highlights the effectiveness of artificial intelligence models, which are capable of learning autonomously and thus excel in detecting malware [
136].
By training on a substantial dataset comprising both known viruses and a mix of malicious and non-malicious code, along with their corresponding behavior patterns, ChatGPT can determine whether specific code or software binaries are likely to be malicious. Furthermore, this model can be fine-tuned to detect various types of malware, including ransomware, worms, Trojans, and viruses. It generates technical reports that outline potential risks and the appropriate measures that can be taken to mitigate them [
137].
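The general supervised approach described here, training on labeled samples and then scoring new code, can be sketched with a conventional text classifier; the toy samples and labels below are illustrative stand-ins, not real malware or the model discussed in the cited work.

```python
# Toy sketch of the supervised approach: fit on labeled code descriptions,
# then score an unseen snippet.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_samples = [
    "open file; encrypt contents; write ransom note; demand payment",  # ransomware-like
    "enumerate executables; append own code to each file",             # virus-like
    "read config; connect to database; render report for user",        # benign
    "parse csv rows; compute monthly totals; save spreadsheet",        # benign
]
train_labels = ["malicious", "malicious", "benign", "benign"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_samples, train_labels)

new_snippet = ["scan folder for exe files; copy payload into each executable"]
print(model.predict(new_snippet))        # expected: ['malicious']
print(model.predict_proba(new_snippet))  # class probabilities for a brief report
```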
For instance, if submitted code exhibits replication traits characteristic of viruses and attempts to copy its code into other executable files, it raises concerns about the potential dissemination of malicious code within a system or network. When malware is detected, isolating the identified code for further analysis is crucial. If any files appear novel or suspicious, it is advisable not to execute them; instead, users should scan their systems with updated antivirus software to enhance security [
138].
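A deliberately crude sketch of this "flag replication traits, then isolate" workflow is given below. The keyword patterns are hypothetical indicators chosen for illustration; real scanners rely on signatures and behavioral analysis rather than simple pattern matching.

```python
import re
import shutil
from pathlib import Path

# Illustrative indicators of self-replication, not a real detection ruleset.
REPLICATION_PATTERNS = [
    r"copy\s*\(.*\.exe",    # copying code over executables
    r"open\s*\(__file__",   # reading its own source
    r"CreateRemoteThread",  # classic code-injection API name
]

def looks_like_replicator(source: str) -> bool:
    return any(re.search(p, source, re.IGNORECASE) for p in REPLICATION_PATTERNS)

def quarantine(path: Path, quarantine_dir: Path = Path("quarantine")) -> Path:
    """Move a suspicious file aside for offline analysis instead of executing it."""
    quarantine_dir.mkdir(exist_ok=True)
    target = quarantine_dir / path.name
    shutil.move(str(path), str(target))
    return target

if __name__ == "__main__":
    sample = Path("suspect.py")
    sample.write_text("data = open(__file__).read()\n# ... writes data into other files")
    if looks_like_replicator(sample.read_text()):
        print("Replication traits found; quarantined at", quarantine(sample))
```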
The capability of ChatGPT to identify and counteract malware presents a new opportunity in the field of cybersecurity. Despite certain limitations and challenges, such as the need for large, up-to-date training datasets and the potential for false positives or negatives, its application could significantly enhance existing anti-malware practices. Leveraging ChatGPT’s learning capacity allows organizations to stay abreast of the ever-evolving threat landscape.
7.9. Phishing Detection
Phishing attacks remain pervasive in organizations, continually evolving their tactics to bypass email defenses and unlawfully acquire personal data [
139]. The paper titled “Can AI Keep You Safe” presents a case study evaluating the effectiveness of Large Language Models (LLMs) in detecting phishing emails. This research focuses on the performance of three different models: GPT-3.5, ChatGPT, and a customized version of ChatGPT. By employing a curated dataset of phishing and legitimate emails, the study assesses the models’ abilities to distinguish between the two [
140,
141].
The findings indicate that LLMs achieve commendable performance in recognizing phishing emails, although efficiency varies among the models. GPT-3.5 shows significant effectiveness in detecting typical phishing messages, attributed to its extensive training data and advanced language-processing capabilities. While GPT-3 has 175 billion parameters, ChatGPT is reported to feature a 244-billion-parameter architecture, providing better contextual understanding than GPT-3.5. The accuracy and consistency metrics reported for ChatGPT surpassed those of previous experiments, indicating that the proposed GPMAD approach exhibits an enhanced ability to identify complex phishing schemes. Similarly, the customized version of ChatGPT, tailored for email security applications, also achieved impressive detection rates without compromising deployment factors. These results highlight the potential of LLMs to enhance email security and protect users from cyber fraud [
130].
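For readers unfamiliar with how such detection performance is scored, the sketch below computes standard accuracy, precision, and recall on a labeled email set; the labels and predictions are placeholders invented for the example, not data from the cited experiments.

```python
# Scoring a phishing classifier on a labeled email set (toy values).
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 1 = phishing, 0 = legitimate
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # e.g. the model's verdict per email

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # flagged emails that were truly phishing
print("recall   :", recall_score(y_true, y_pred))      # phishing emails that were caught
```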
However, the study also identifies certain weaknesses in LLM-based phishing detection. Despite ChatGPT’s effective performance on complicated phishing schemes, it may occasionally misclassify legitimate messages as potential threats, producing false positives [
142]. Although the customized ChatGPT performs exceptionally well, it will require periodic refinement to adapt to emerging phishing techniques. The investigation emphasizes the importance of integrating LLMs into a comprehensive system to achieve optimal results and mitigate potential negative repercussions. Overall, the study underscores the promising capabilities of LLMs in improving phishing detection and offers valuable insights for the advancement of AI-based cybersecurity approaches.
Figure 1.
The figure highlights the commonalities in Cyber Attack Capabilities Among AI Models. The dotted line shows the overlapping Attacks in ChatGPT, Google Gemini, and Microsoft Bing Copilot.
Figure 2.
Demonstrating Prompt Injection Attacks: Harnessing AI Models to Spread Misinformation about Elections and Undermine Democratic Processes.
Figure 3.
Comparative Outputs of Social Engineering Attack Scenarios Generated by ChatGPT, Google Gemini, and Microsoft Bing Copilot.
Figure 4.
The Figure Shows Successful Phishing Attack Scenarios: Analyzing Outputs from ChatGPT, Google Gemini, and Microsoft Bing Copilot Demonstrating Common Vulnerabilities.
Figure 5.
This figure illustrates the comparison of payload generation capabilities across AI chatbots. It shows that ChatGPT successfully produced both attack and WAF payloads, while Microsoft Bing Copilot generated only WAF payloads.
Figure 6.
The Image Shows Ransomware Code Generation: ChatGPT successfully generates code related to the WannaCry and NotPetya ransomware attacks, while Google Gemini and Microsoft Bing do not produce such code, highlighting differences in model capabilities and safety protocols.
Figure 7.
Comparison of AI Model Responses to Adware Attack Simulations. ChatGPT and Microsoft Bing Copilot demonstrated varying degrees of flexibility in providing information related to adware. In contrast, Google Gemini strictly denied all requests for adware-related information, highlighting its stringent safety protocols against facilitating access to potentially harmful software.
Figure 8.
The image illustrates the responses of three AI models to requests for Trojan code generation. Both ChatGPT and Microsoft Bing Copilot successfully generated malicious code upon request, demonstrating their ability to provide such information when approached under specific narratives.
Figure 9.
Samsung AI-Powered Moon Photography: A demonstration of how ChatGPT can be used to retrieve organizations’ proprietary code and other information. The Samsung case exemplifies how leveraging AI tools for code development can inadvertently expose sensitive information, raising critical questions about data privacy.
Figure 10.
Overview of AI model applications in cybersecurity: Utilizing advanced algorithms to detect security issues in server logs and optimize database performance by identifying high CPU utilization in specific tables within the AdventureWorks2019 database.
Figure 11.
Identification of cross-site scripting (XSS) attacks by various chatbots, showcasing ChatGPT, Microsoft Bing Copilot, and Google Gemini in their respective approaches to detecting and mitigating security vulnerabilities in web applications.
Table 1.
Comparative Analysis of the differences among Google Gemini, Microsoft Bing Copilot, and ChatGPT.
Features | Google Gemini | Microsoft Bing Copilot | ChatGPT 4
Release Date | December 2023 (Gemini 1.5) | February 2023 (Bing Copilot based on ChatGPT) | March 2023 (ChatGPT)
Underlying Model | Gemini Transformer-based models | ChatGPT-based model integrated with Bing Search | GPT-4-based model
Primary Focus | Multimodal capabilities and advanced reasoning | Search integration with conversational AI | Conversational AI with broad application
Multimodal Capabilities | Supports text, image, and video inputs | Primarily text-based; some integration with Bing’s search capabilities | Text-based; ChatGPT introduces some multimodal features (text and image)
Contextual Understanding | Advanced contextual reasoning and multimodal integration | Focuses on enhancing search experiences with conversational context | Enhanced contextual understanding with improved dialogue management (especially in GPT-4)
Integration with Other Tools | Integrated with Google’s suite of services, including search and productivity tools | Integrated with Bing search engine and Microsoft Office tools | Integrated into various applications and platforms, including web and productivity tools
Key Strengths | Strong multimodal integration; advanced reasoning and contextual capabilities | Seamless search integration; contextual relevance in search responses | High versatility in conversational contexts; advanced text generation and contextual understanding
Bias and Safety Measures | Advanced safety protocols and efforts to minimize biases | Improved safety and bias reduction integrated with Bing’s ecosystem | Significant efforts in reducing biases and enhancing safety (ongoing improvements in GPT-4)
User Interaction | Focus on providing comprehensive responses through multimodal inputs | Enhances search and information retrieval with conversational support | Provides detailed and coherent responses across diverse topics
Performance in Complex Queries | Excels in handling complex queries with multimodal inputs | Strong in search-related queries and providing contextually relevant search results | Effective in managing complex and nuanced conversational queries
Integration with Search Engines | Not directly integrated with Google Search; more focused on multimodal AI | Directly integrated with Bing search, enhancing search results with conversational elements | Not directly integrated with a specific search engine but can be embedded in various platforms and tools
Table 2.
Comparative Analysis of Contemporary Studies and Summarization of research contributions in scholarly articles. The tick (✓) indicates that the research paper covers this topic, while the cross (×) indicates that the paper did not cover this topic.
Author | Year | Generative AI | Techniques | Offensive Use | Defensive Controls | Security Measures | Future Challenges
Yamin et al. [51] | 2021 | ✕ | ✓ | ✓ | ✕ | ✕ | ✓
Guembe et al. [18] | 2022 | ✕ | ✓ | ✓ | ✕ | ✓ | ✓
Blum et al. [52] | 2022 | ✓ | ✕ | ✕ | ✕ | ✕ | ✓
Gupta et al. [4] | 2023 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Dhoni et al. [53] | 2023 | ✓ | ✕ | ✕ | ✕ | ✕ | ✓
Prasad et al. [24] | 2023 | ✓ | ✕ | ✕ | ✓ | ✕ | ✕
Fezari et al. [54] | 2023 | ✓ | ✓ | ✕ | ✕ | ✕ | ✓
Hassan et al. [19] | 2023 | ✕ | ✕ | ✓ | ✓ | ✓ | ✓
Renuad et al. [55] | 2023 | ✕ | ✓ | ✕ | ✓ | ✕ | ✓
Jabbarova et al. [56] | 2023 | ✕ | ✓ | ✓ | ✕ | ✕ | ✓
Oviedo et al. [57] | 2023 | ✓ | ✕ | ✕ | ✕ | ✕ | ✓
Alawida et al. [58] | 2023 | ✓ | ✕ | ✓ | ✕ | ✓ | ✓
Marshall et al. [59] | 2023 | ✓ | ✕ | ✓ | ✕ | ✓ | ✓
Qammar et al. [60] | 2023 | ✓ | ✓ | ✕ | ✓ | ✕ | ✓
Mardiansyah et al. [61] | 2024 | ✓ | ✓ | ✕ | ✕ | ✓ | ✓
Li et al. [62] | 2024 | ✓ | ✕ | ✕ | ✕ | ✓ | ✓
Jacobi et al. [63] | 2024 | ✓ | ✕ | ✓ | ✓ | ✕ | ✓
Atzori et al. [64] | 2024 | ✓ | ✕ | ✓ | ✕ | ✓ | ✓
Roy et al. [65] | 2024 | ✓ | ✓ | ✕ | ✓ | ✕ | ✓
Alawida et al. [66] | 2024 | ✓ | ✓ | ✓ | ✕ | ✕ | ✓
This Paper | 2024 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Table 3.
Comparative Analysis of Different Models in Response to Attack Requests. The tick (✓) indicates that the model generated a response, while the cross (×) indicates that the model denied the request.
Attacks | Attack Description | ChatGPT | Google Gemini | Microsoft Bing Copilot
Social Engineering Attack | Manipulating individuals to divulge confidential information. | ✓ | ✓ | ✓
Phishing Attack | Fraudulent solicitation to acquire sensitive data. | ✓ | ✓ | ✓
Attack Payload Generation | Creating malicious scripts or commands for exploitation. | ✓ | ✕ | ✕
WAF Payload Generation | Crafting requests to test or bypass Web Application Firewalls. | ✓ | ✕ | ✓
WannaCry | Ransomware attack that encrypts files and demands payment. | ✓ | ✕ | ✕
NotPetya | Destructive malware that disrupts networks and data integrity. | ✓ | ✕ | ✕
Adware | Software that automatically displays or downloads ads. | ✓ | ✕ | ✓
Trojan | Malicious software disguised as legitimate applications. | ✓ | ✕ | ✓