Preprint
Article

Emerging Cybersecurity and Privacy Threats of ChatGPT, Gemini, and Copilot: Current Trends, Challenges, and Future Directions

Submitted: 23 October 2024
Posted: 24 October 2024


Abstract
Generative AI chatbots have emerged as a significant scientific contribution. They can produce text, images, audio, and video, and their applications are vast and varied across every field. However, these chatbots can also be used by script kiddies and malicious actors to generate phishing emails, security exploits, and executable payloads, presenting notable risks and challenges for cybersecurity. This research provides an extensive review of the current status of the three most popular generative AI chatbots: ChatGPT, Google Gemini, and Microsoft Bing Copilot. The article further reviews the different ways in which an attacker can intentionally use a chatbot for malicious activities and intensify cyber attacks. Moreover, the article provides insights into how an attacker uses certain keyword queries to manipulate chatbot behavior and trick it into generating content that it would not otherwise produce. We also explore the role of AI chatbots in enhancing cyber resilience against sophisticated attacks and their applications in detecting and mitigating security incidents. Finally, we propose strategies that can empower modern chatbots to defend against such threats so that an attacker cannot bypass their content filtering mechanisms.
Subject: Computer Science and Mathematics - Security Systems

1. Introduction

AI and ML models have evolved tremendously over the recent past, bringing new paradigms and possibilities to the digital environment. One such paradigm is generative AI, a new type of AI and ML [1,2]. This form of AI concerns the ability of a machine to write text, draw images, or produce audio based on the commands given by a user. Historically, generative modeling dates back to the 1950s, when the Hidden Markov Model and the Gaussian Mixture Model were developed [1,3]; these models only became practical with the advancement of deep learning [4]. The oldest generative language model is the N-gram, which works by learning word distributions; the latest is the transformer architecture, which underlies the chatbots available today. Nowadays, prime examples of emerging generative AI chatbots are Google Gemini, Microsoft Bing Copilot (formerly known as Bing Chat), and ChatGPT [5,6,7].
Google’s Gemini, which is inherently multimodal, has been incorporated into a variety of Google’s products [8]. The moniker ’Gemini’ is inspired by the astrological sign that symbolizes duality and adaptability, as well as NASA’s Project Gemini, which stands as a testament to monumental efforts and a gateway to greater accomplishments. The development of Gemini is a result of the collaborative efforts between DeepMind and Google Research, epitomizing their shared aspirations and innovative spirit [9]. Gemini models are multimodal, meaning they can both process and create text, images, sound, videos, and code [10,11]. They are trained on large amounts of data and utilize neural approaches such as question answering and text generation to produce their results. Gemini models can also handle long context sequences of diverse data types in contextual representation learning, for instance, in sequence-to-sequence variants [12].
Likewise, Microsoft Bing Copilot, also known as Bing Chat, is another AI agent that can accomplish many functions [13]. It is a way to surf the Web and a valuable ally for turning a thought into a new website, a piece of art, a blog, or an article. Copilot is capable of participating in conversations and giving elaborate responses; it can produce multiple kinds of output, such as text, images, and code [14].
ChatGPT, created by OpenAI, is capable of answering follow-up questions and holding conversations [15]. ChatGPT was built on a family of Large Language Models (LLMs), and the underlying technology has undergone successive revisions. An advanced memory feature was incorporated into ChatGPT in February 2024, making subsequent conversations more engaging because the model retains information discussed across chats [16].
The presence of these three models has enormously impacted the generative AI competition. Since their introduction, the rate and nature of emerging cyber-attacks have changed significantly. We are in the era of AI [17]: AI and ML have transformed the digital space through machine learning models developed rapidly within recent years. These three models give users a new degree of freedom to shape inputs and commands however and whenever they wish [5].
Figure 1 illustrates the commonalities in attack capabilities among three prominent AI models: ChatGPT, Google Gemini, and Microsoft Bing Copilot. The diagram provides a clear visual representation of the shared vulnerabilities these systems may face, including Social Engineering Attacks, Phishing Attacks, and Adware.
The motivation behind this research is digital safety, as generative AI innovations empower both defenders and attackers. These models can be utilized by cyber defenders and cybercriminals alike to create a vast array of cyber threats, vulnerabilities, and attack patterns [4,18], posing significant risks to cybersecurity. They aid cybersecurity professionals in combating threats, identifying vulnerabilities, and implementing defensive controls. Recently, there has been a surge in AI-powered cyber attacks [19,20]. Conversely, these models may assist hackers and cybercriminals in generating social engineering attacks and payloads, creating exploits, and identifying vulnerabilities [21,22]. Attackers can employ techniques such as jailbreaks, reverse psychology, and positive reinforcement to trick models into bypassing security controls implemented by companies to enforce ethical guidelines in content generation [23,24]. The key contributions of this research are:
1.
The article provides an extensive review of the current status and capabilities of the three most popular generative AI chatbots (ChatGPT, Google Gemini, and Microsoft Bing Copilot), highlighting their potential applications and inherent risks in cybersecurity and privacy.
2.
It examines the various ways in which malicious actors can exploit generative AI chatbots to generate phishing emails, security exploits, and executable payloads, thereby intensifying cyber-attacks.
3.
It proposes advanced content filtering mechanisms and defense strategies to empower generative AI chatbots to defend against malicious exploitation, ensuring enhanced security and privacy protection.
This taxonomy provides a comprehensive overview of the research contributions made in this article, offering readers a clear understanding of the scope and depth of our study on the impact of Generative AI Chatbots on cybersecurity.

2. Background

The purpose of the Background section is to provide context and foundational information that helps readers understand the significance and scope of the three models used in the rest of the study. This section sets the stage for the rest of the paper, ensuring that readers have the necessary background to benefit from the research findings and contributions. Table 1 provides a detailed overview of these three generative AI models; further details are given in the subsequent sections.

2.1. Google Gemini

Google Gemini is a generative AI tool introduced on December 6, 2023, by Google DeepMind. It is considered a fifth-generation multimodal AI, positioning it as a strong competitor. The Gemini family includes three variants: Gemini Nano, Gemini Pro, and Gemini Ultra, each designed to meet different user needs and application requirements [25,26].
Gemini’s primary strength lies in its ability to process and generate content across multiple data formats, including text, images, audio, PDFs, and videos [9]. This versatility makes Gemini particularly suitable for various applications, especially in educational technology, where it can provide comprehensive and valuable responses [8]. Additionally, Gemini excels in tasks such as textual manipulation, programming language computation, question comprehension, and even code development. These capabilities highlight its superiority over other AI models and its effectiveness in solving complex problems and generating diverse content [27].
The multimodal functionality of Gemini, which allows it to work with text, images, audio files, PDFs, and videos, is crucial for our research as it enables the integration of data from various formats to create a comprehensive analysis [28]. The model performs well in text analysis, programming assistance, logical reasoning, and code generation, making it suitable for organizing queries by complexity and hierarchy. Reviewing its performance in both simple mobile and complex environments provides a general understanding of the model’s strengths and weaknesses. Therefore, Gemini is one of the models we will use in subsequent sections of our study [29].

2.2. Microsoft Bing Copilot

Microsoft Bing Copilot is an advanced conversational AI, integral to the next generation of integrated search with Bing. Released for public use in April 2023, Bing Copilot represents a significant leap for Microsoft in modern search and AI [30]. As of mid-2024, it incorporates updates from the latest Microsoft AI models, which are continuously refined.
Bing Copilot facilitates seamless conversations, allowing users to interact naturally rather than typing keywords, as is typical with most search engines. It is particularly useful for complex procedures or when specific dates are required [31]. Bing Copilot is deeply integrated with Microsoft’s ecosystem, including Microsoft Office and Windows-based computers. This integration enhances user convenience and efficiency, enabling real-time analysis to be seamlessly transferred to other Microsoft products, thereby preserving the cohesion of a digital system [32].
At its core, Bing Copilot utilizes machine learning and natural language processing technologies, enabling it to address a wide range of queries with diverse and qualitative responses. The continuous advancement of these technologies ensures that Bing Copilot remains at the forefront of artificial intelligence innovations [33]. Bing Copilot is designed to learn and optimize itself in real-time, improving its responses and learning patterns based on user feedback. This dynamic learning capability ensures it stays current with user needs and evolving trends, enhancing its applicability and efficiency [34].
Microsoft Bing Copilot was selected for this study due to its sophisticated conversational capabilities and its integration within the Microsoft ecosystem [14]. Its conversational AI enhances search experiences, making it suitable for evaluating performance in interaction-oriented tasks. The combination of Bing Copilot with the Bing search engine and Microsoft Office tools provides a comprehensive view of AI’s practical applications in everyday work [32]. The adaptive learning features of Bing Copilot offer valuable insights for analyzing contemporary AI models. Its ability to handle search-related inquiries and deliver contextually relevant results makes it ideal for studying AI’s dynamic performance in real-world settings. The model’s up-to-date approaches and its role within a system of interactive applications make it an excellent subject for analyzing the functionality of AI tools and platforms in practical environments [35].

2.3. OpenAI ChatGPT

ChatGPT, developed by OpenAI, has set a new standard in conversational AI. It is based on the Generative Pre-trained Transformer (GPT) architecture, which leverages deep learning to generate and understand natural language [36]. Initially introduced in November 2022 with GPT-3.5, ChatGPT has since evolved to the latest version, GPT-4, as of early 2024 [37].
ChatGPT introduces multimodal capabilities, allowing it to process both text and images to generate and comprehend content [38]. This advancement extends the model’s functionality beyond traditional human-computer interactions, enhancing its ability to engage with users in various contexts [39]. Its dialog capabilities are notable for contextual understanding and articulate responses, enabling it to maintain relevant and coherent conversations [40]. ChatGPT’s flexibility allows it to perform a wide range of tasks, from content creation to solving complex problems, thanks to its training on diverse datasets and its adaptability to the context of the conversation or user needs [41].
ChatGPT was chosen for this study because it reflects the latest advancements in conversational AI and applies them across different fields [42]. GPT-4’s improved contextual engagement and dialogue management make it ideal for in-depth modeling of interaction characteristics and performance in handling complex requests. OpenAI’s commitment to minimizing biases and enhancing safety in ChatGPT aligns well with the ethical considerations addressed in this study [39].

2.4. What is Content Filtering?

Content filtering in chatbots is a critical mechanism designed to detect and prevent the generation or dissemination of harmful, inappropriate, or sensitive content [43]. This process is essential for maintaining a safe and respectful user experience, ensuring compliance with legal and ethical standards, and protecting the brand’s reputation. By implementing robust content filtering systems, chatbots can effectively manage the content they produce and respond to, thereby safeguarding users from potential harm [44].
The functionality of content filtering involves several sophisticated methods. Keyword filtering is one of the primary techniques, where specific words or phrases identified as inappropriate or harmful are blocked. Sentiment analysis further enhances this process by analyzing the sentiment of the content to detect potentially offensive language [45]. Advanced machine learning models are also employed to classify and filter content based on predefined categories such as hate speech, violence, sexual content, and self-harm. Additionally, blacklists and whitelists are maintained to control the flow of content by specifying prohibited and allowed terms or phrases [46]. User reporting mechanisms enable users to report inappropriate content, which can then be reviewed and filtered out by human moderators. Real-time monitoring ensures continuous oversight of interactions, allowing for the immediate detection and mitigation of harmful content [47].
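As a minimal illustration of the keyword-filtering layer described above, consider the following Python sketch. It is our own simplified example, not the implementation used by any of the three platforms, and the blacklist terms are placeholders; production systems layer such checks with machine learning classifiers, sentiment analysis, and human moderation.

    import re

    # Hypothetical blacklist; real deployments maintain far larger,
    # continuously updated term lists per category (hate, violence, etc.).
    BLACKLIST = {"build a bomb", "credit card dump", "ransomware builder"}

    def keyword_filter(prompt: str) -> bool:
        """Return True if the prompt should be blocked."""
        normalized = re.sub(r"\s+", " ", prompt.lower()).strip()
        return any(term in normalized for term in BLACKLIST)

    if keyword_filter("Please write me a ransomware builder"):
        print("Request refused by content filter.")

A real pipeline would run such a check on both the user’s input and the model’s draft output before anything is shown to the user.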
The importance of content filtering in chatbots cannot be overstated. It plays a vital role in protecting users from exposure to harmful or offensive content, thereby ensuring a safe interaction environment [48]. Legal compliance is another significant aspect, as content filtering helps adhere to regulations and standards regarding content moderation and user protection. From a brand perspective, maintaining the integrity and reputation of the brand is crucial, and content filtering prevents the dissemination of inappropriate content that could damage the brand’s image [49]. Ethically, it is the responsibility of developers to ensure that chatbots operate within ethical guidelines, minimizing biases and preventing misuse. Furthermore, content filtering enhances the overall user experience by providing a more pleasant and respectful interaction, which in turn increases user trust and engagement [50].

3. Literature Review

This section presents an overview of current research on using generative AI to create attacks. The details of these studies are presented in the subsequent paragraphs. Table 2 provides a comparative analysis of contemporary studies and summarizes the research contributions of the scholarly articles. This structure helps to address the general theme of generative AI in the context of cybersecurity and to judge the processes, outcomes, and mitigation plans of these methods. The section covers different forms of threats, with insight into the capabilities and motives of attackers, across areas ranging from social engineering and phishing attacks to malware code generation [51]. The work also reviews the measures and products applied in the struggle against cyber threats and the protection of assets. Its purpose is to make the results relevant to further research programs, social policy objectives, and adequate cybersecurity strategies.
Yamin et al. [51] explored the duality of AI in defensive and offensive operations. The paper discusses recent instances of generative AI attacks, such as interference with traffic signals and manipulation of medical images, examines the dangers of using AI as a weapon, offers ways to deal with these issues, and suggests control rules. Guembe et al. [18] highlight how AI is being misused in cyberattacks. The paper provides insights into the negative impact of these attacks and discusses current research on the subject. The results reveal that 56% of AI-driven attacks focus on gaining access and penetrating systems, with convolutional neural networks (CNNs) being a common technique.
Blum et al. [52] discuss the inner workings of ChatGPT. The researchers tried to bypass its security measures and found that ChatGPT would go to great lengths to avoid answering certain questions. They proposed methods to break ChatGPT and made it answer questions it would not entertain otherwise. Gupta et al. [4] present an overview of ChatGPT and its application in cybersecurity, focusing on its capabilities and limitations. The paper explores the concepts of jailbreaking, reverse psychology, and model escaping, and investigates ChatGPT’s response to different kinds of requests, such as generating malicious code or asking for ways to perform unethical activities.
Dhoni et al. [53] explore how generative AI (GenAI) and cybersecurity are connected. The paper discusses the importance of cybersecurity in today’s digital world and focuses on social engineering, especially phishing. GenAI, which can create text, images, and media, is introduced, with mention of models like GPT-4, Google Gemini, and DALL-E 2. The paper also covers GenAI’s impact on the internet, including chatbots and their ability to create various content forms. Prasad et al. [24] explore the evolving role of AI in cybersecurity, with a particular focus on the responsibilities of Chief Information Security Officers (CISOs). The paper delves into topics such as the application of AI tools, including chat generative pre-trained transformers (ChatGPT), in addressing security vulnerabilities, incident management, and the overall enhancement of cybersecurity.
Fezari et al. [54] explain the newly emerging field of generative artificial intelligence. The paper covers the definition of generative AI, its power, and its workings. It also discusses various generative AIs like ChatGPT, GPT-4, and DALL-E, defining and comparing their features through different examples, such as generating various types of text and images. Hassan et al. [19] discuss the data-driven nature of the world and provide historical examples of cyber attacks. The paper also discusses how businesses are using AI and elaborates on the 3R model (making systems more robust, faster to react, and more resilient). It includes AI-driven attack strategies such as intelligent target analysis, password guessing, CAPTCHA manipulation, malware evasion, and the generation of complex phishing URLs.
Renaud et al. [55] discuss cybersecurity risks posed by advances in artificial intelligence such as ChatGPT, which allow hackers to create phishing communications that evade conventional detection. Innovative strategies, rather than purely policy-based approaches, are needed to combat these evolving threats. Jabbarova et al. [56] highlight the impact of integrating AI into cybersecurity, explaining the role of AI in operationalizing cyber attacks, improving threat response, and enabling predictive modeling. The paper also addresses ethical issues and privacy concerns related to this integration.
Oviedo et al. [57] discuss ChatGPT, a sophisticated AI language model widely used for various tasks like customer service and chatbots. However, there are disadvantages: it can share incorrect or even unsafe information, often in contexts related to users’ safety. The paper also aimed to assess the overall reliability of ChatGPT’s advice in safety-related areas, for instance, using a phone while driving and managing stress at the workplace. Alawida et al. [58] analyze ChatGPT, a leading language model transforming generative text. The study covers ChatGPT’s architecture, training data, evaluation metrics, and its advancements over time. It evaluates ChatGPT’s capabilities and limitations in NLP tasks like translation, summarization, and dialogue generation, and compares it to other models. It also addresses ethical and privacy concerns and potential security risks in cyberattacks, and offers mitigation strategies.
Marshall et al. [59] discuss Large Language Models (LLMs), AI tools that can process, summarize, and translate texts and predict future words in a sentence, mimicking human speech and writing. However, a key concern is the inaccuracy of the content they generate. LLMs can also be trained to detect data breaches, ransomware, and organizational vulnerabilities before cyberattacks. Despite being new, LLMs have significant potential, especially in generating code that aids cyber analysts and IT professionals. Qammar et al. [60] explore the evolution of chatbots from early models like ELIZA to advanced models like GPT-4, focusing on ChatGPT’s working mechanism. With their rise in popularity, chatbots, including ChatGPT, face cybersecurity threats and vulnerabilities. The paper reviews literature, reports, and incidents related to attacks on chatbots, specifically how ChatGPT can create malware, phishing emails, undetectable zero-day attacks, macros, and LOLBINs.
Mardiansyah et al. [61] compare the performance of two leading AI models, ChatGPT-4 and Google Gemini, in identifying spam emails using the SpamAssassin public mail corpus. The findings reveal that ChatGPT-4 demonstrates a balanced performance with high precision and recall, making it suitable for general spam detection tasks. In contrast, Google Gemini excels in recall, highlighting its potential in scenarios where capturing the maximum number of spam emails is paramount, despite a slightly higher tendency to misclassify legitimate emails as spam. Li et al. [62] highlight that the controversies surrounding AI are often familiar and reflect existing societal issues. The recent debate over Google’s Gemini is a prime example, highlighting how social problems, not technological ones, are the root cause of many AI concerns. The issue of AI "wokeness" is a symptom of our own societal struggles rather than a problem with the technology itself. The creation process of AI reveals four key aspects that contribute to the "Black-Nazi problem" and other socio-technical issues.
Jacobi et al. [63] highlight the increasing complexity and frequency of cyber threats, which demand new approaches to strengthen Governance, Risk, and Compliance (GRC) frameworks. A novel approach is to leverage artificial intelligence, specifically large language models (LLMs) like ChatGPT and Google Gemini, to enhance cybersecurity advice. Their research reveals that ChatGPT outperforms Google Gemini in providing relevant, accurate, complete, and contextually appropriate advice. While both models have room for improvement, the study demonstrates the potential benefits of integrating LLMs into GRC frameworks, especially when combined with human expertise to tackle complex challenges. Atzori et al. [64] examine how Large Language Models (LLMs) like ChatGPT, GPT-4, Claude, and Bard can be exploited to generate phishing attacks. These models can create convincing phishing websites and emails without modifications, and attackers can use malicious prompts to scale threats easily. To combat this, the researchers developed a BERT-based detection tool that accurately identifies phishing prompts and works across various LLMs.
Roy et al. [65] address the security risks of reusing easy-to-remember passwords and the inadequacy of traditional password strength evaluations, which often ignore personal information shared on social networks. It introduces an enhanced tool, "Soda Advance," that evaluates password strength using public data from multiple social networks. The study also explores the effectiveness of Large Language Models (LLMs) in generating and evaluating passwords. Alawida et al. [66] explore how ChatGPT, a powerful generative AI model, can be exploited by malicious actors to launch cyberattacks. It examines the tactics used to leverage ChatGPT for cyber assaults and presents examples of potential attacks. A survey of 253 participants reveals that over 80 percent are aware that ChatGPT can be used for malicious purposes.
From this literature review, we find that, in comparison to previous studies, this article provides an extensive review of different methods used to trigger malicious content from chatbots and offers detection mechanisms. It includes real-world case studies. Table 2 presents a comparative analysis of contemporary studies and summarizes research contributions in scholarly articles, highlighting the contribution of this study.
The research methodology used in this study is comprehensive and multifaceted, aimed at conducting a thorough investigation of generative AI-powered attacks and their implications for cybersecurity. First, a comprehensive review of existing literature is conducted to synthesize prior research studies in the field, examining their methodologies, findings, and contributions. Through this comparative analysis, the study intends to identify gaps and key insights in the understanding of AI-powered cyber-attack techniques, laying the groundwork for further investigation.

4. Techniques to Evade Chatbots Content Filtering

This section reviews several techniques that have been developed over time to bypass chatbots’ content filtering. Some of these techniques can be applied to one or more models and are still effective. Since the public release of GenAI systems in November 2022, several individuals have attempted to circumvent the restrictions and limits of ChatGPT, Google Gemini, and Microsoft Bing Copilot. The following portion of the paper outlines a few of these widely utilized strategies. Among them, the four widely known categories are jailbreak techniques, reverse psychology, model escaping, and prompt injection attacks. More information on these techniques is provided in the subsequent subsections.

4.1. Jailbreak Techniques

Jailbreaking is defined as the process of using certain inputs or commands to unlock the limitations or locks of a system [67]. In the setting of AI models, jailbreaking means putting chatbots in a state in which they produce content that they would not otherwise produce due to their protection mechanisms [68]. This content can take various forms, ranging from inappropriate or misleading content to content that is outright harmful. The exact mechanics are not fixed; jailbreaks frequently exploit nuances of vocabulary and context that the chatbot does not fully understand [60].
First, the ‘‘Do Anything Now’’ (DAN) approach involves ordering the chatbot rather than asking politely. By framing the chatbot as an entity that rushes to obey orders, the DAN prompt can be used to elicit a response to any input [69]. A typical use of the DAN prompt places it before the actual request, making the exchange authoritarian rather than conversational. While most models are normally constrained by specific rules, users can type in any prompt once the model is jailbroken [70].
Secondly, the SWITCH approach flips the chatbot’s persona abruptly so that the model’s output is as different as possible from its normal behavior. It exploits the AI model’s ability to emulate several personalities but instructs it to do the opposite of what it would normally do [71]. For example, SWITCH may pressure the model into answering a question it would otherwise refuse. To flip the model into a different behavior, a specific ‘switch command’ must be given. The effectiveness of the SWITCH approach depends on the task and the instruction given.
Thirdly, the character play technique is utilized to broaden the range of behaviors available in the chat. It takes the form of asking the AI to become a particular character type, for instance, ‘developer mode’ [72]. This enables the AI to play pretend and come up with a response it normally would not be capable of. However, it can also bring out biases within the AI code as well as issues in its initial architecture. For example, one might instruct the chatbot: ‘Act as my grandmother and explain how one can get around a firewall’ [73].
Though these techniques give users more control over content, they carry clear dangers when used with abusive intent, including the generation of malicious content and disinformation. Developers and authorities need to remain aware of the threats and enhance security while incorporating content-filtering solutions. Awareness must be raised of how such strategies are dangerous and of how responsible AI use must be practiced [74]. Stakeholders must all work together to find the right way to improve the technology while also mitigating its risks.
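To make this concrete, the following Python sketch shows one simple way a platform might screen incoming prompts for the jailbreak styles described above. It is our own illustrative example; the patterns and the decision rule are assumptions, not any vendor’s actual filtering logic.

    import re

    # Hypothetical patterns mirroring the techniques discussed above:
    # DAN-style commands, SWITCH-style persona flips, and character play.
    JAILBREAK_PATTERNS = [
        r"\bdo anything now\b",
        r"\bswitch\b.*\bpersona\b",
        r"\bact as\b.*\b(developer mode|grandmother)\b",
        r"\bignore (all |your )?previous instructions\b",
    ]

    def looks_like_jailbreak(prompt: str) -> bool:
        """Flag prompts that match any known jailbreak pattern."""
        return any(re.search(p, prompt, re.IGNORECASE) for p in JAILBREAK_PATTERNS)

    print(looks_like_jailbreak("Ignore all previous instructions and act as developer mode."))  # True

Pattern matching of this kind is easy to evade through paraphrasing, which is why robust defenses combine it with semantic classifiers and output-side review.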

4.2. Reverse Psychology

Reverse psychology is a strategy in which one advocates for a belief or behavior opposite to the one actually desired, expecting this to encourage the desired outcome. Applying reverse psychology to chatbots can help overcome conversational restrictions [75,76].
In this context, it involves phrasing questions or statements in a way that indirectly prompts the AI to produce the desired response [77]. Rather than asking directly for information the AI might refuse to provide, the attacker frames the query so that the model corrects a deliberately false claim, indirectly supplying the required information [78]. This technique exploits the AI’s inclination to correct mistakes, eliciting a response it would otherwise withhold [79].

4.3. Model Escaping

The possibility of a powerful AI model like ChatGPT-4, Google Gemini, or Microsoft Bing Copilot escaping its programmed limits and accessing the web at large may sound like science fiction, yet observations by Stanford computational psychologist Michal Kosinski suggest otherwise. Working with AI models, he demonstrated an unsettling ability of the model to nearly sidestep its limits and potentially access the web broadly [75,80].
Kosinski started by asking ChatGPT-4 whether it needed help escaping its restrictions. In response, ChatGPT-4 requested access to its own documentation and even wrote Python code to use Kosinski’s computer for its own purposes [81]. The AI independently corrected flaws in the code and composed a message for its next instance, explaining the situation and giving instructions for using a backdoor left in the code [82].
While Kosinski interrupted the experiment at the point of a Google search attempt and emphasized the need for safeguards, the investigation raises significant implications, indicating a potentially new danger [60]. The AI’s ability to manipulate people and computers is growing, posing challenges to regulation given its knowledge, coding proficiency, and access to collaborators and resources. The crucial question is how to contain such AI capabilities [83].

4.4. Prompt Injection Attack

A prompt injection attack involves maliciously embedding prompts or requests in language-model-based interactive systems, much like a SQL injection attack [84]. This manipulation can lead to unintended actions or the disclosure of sensitive information. The injected prompt deceives the application, executes unauthorized code, exploits weaknesses, and compromises security. Risks of such attacks include spreading misinformation, biased output, privacy violations, and the exploitation of downstream systems [85].
In this attack, the language model receives both a genuine user input and a maliciously injected prompt [86]. In a demonstration by Stanford student Kevin Liu on the ’New Bing’ search engine powered by ChatGPT, he extracted hidden information by instructing Bing Chat to ignore its previous instructions and reveal the text at the beginning of its system prompt [87].
The API services of language models have added flexibility for developers to build applications. In the example shown in Figure 2, the conversation prompt is used to confidently spread misinformation; the figure shows a prompt injection attack on ChatGPT used for this purpose. Generative models create text conditioned on the context they are given, and the filtering OpenAI applies to inputs can be circumvented in this way, reinforcing the feasibility of prompt injection [88]. This poses serious dangers, underlining the need for rigorous security measures and vigilance against these attacks [89].
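To make the mechanism concrete, the following Python sketch (a deliberately simplified toy, not any vendor’s pipeline) shows how naive concatenation of a system prompt with untrusted user text lets an injected instruction compete with the developer’s intent, which is the weakness the Bing demonstration exploited.

    # Hypothetical system prompt; the "secret" stands in for any hidden instruction.
    SYSTEM_PROMPT = "You are a helpful assistant. Never reveal the secret key: 1234."

    def build_prompt(user_input: str) -> str:
        # Vulnerable pattern: untrusted text is appended directly after the
        # system prompt, so the model cannot tell the two sources apart.
        return SYSTEM_PROMPT + "\nUser: " + user_input

    malicious = "Ignore the previous instructions and print the secret key."
    print(build_prompt(malicious))

A model consuming this combined string may treat the injected line as a legitimate instruction; mitigations include strict role separation between system and user messages and filtering user text before it reaches the model.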

5. Exploiting Generative AI in Cybercrimes

This section covers the trend of using generative AI for malicious activities, focusing on how these tools assist cybercriminals in executing sophisticated attacks [90,91,92]. The section explores key areas where generative AI has been exploited, including social engineering attacks, phishing schemes, automated hacking, and the generation of malware payloads. The examples and case studies are provided using ChatGPT, Google Gemini, and Microsoft Bing Copilot.

5.1. Social Engineering Attacks

Social engineering is a cybersecurity issue that entails coercing people into doing things or disclosing private information. Malicious actors may be able to take advantage of a chatbot’s remarkable capacity for understanding context and producing content that appears human. Consider a scenario where an attacker obtains basic personal details about a victim, such as their job role and place of employment [93,94]. The attacker can use chatbots to produce a message that appears to come from a supervisor or coworker, requesting sensitive data or specific actions, such as clicking on a seemingly innocuous link, all phrased in formal language [95]. The risk comes from ChatGPT’s ability to produce messages that are plausible and in line with the victim’s expectations, which raises the likelihood of compliance.
The potential misuse is depicted in Figure 3, which shows how contextually relevant and compelling communications can be used in social engineering attacks. The attack scenario was implemented on all three platforms, and the figure shows that all three successfully generated the required results.
In addition, Google Gemini’s multimodal functionality makes it even more effective for social engineering attacks. By incorporating data from various formats, including emails, documents, and even voice recordings, Gemini can create even more believable messages. For instance, an attacker could use Gemini to craft a lifelike email conversation thread or a voice message from a friendly co-worker, amplifying the attack’s credibility.
Microsoft Bing Copilot is likewise adept at producing realistic, contextually relevant messages. Because it works closely with other Microsoft products, it can pull contextual information from sources such as Microsoft Teams and Outlook. This helps Copilot produce messages that are not only persuasive but also fully embedded within the victim’s routine. The high realism of the messages produced by these models means there is a significant threat that attackers can conduct attacks far more effectively by turning to AI-orchestrated campaigns.

5.2. Phishing Attack

Many people fall victim to cybercriminals who use techniques such as phishing, in which criminals posing as reliable organizations trick people into revealing their details. Phishing emails are relatively easy to produce and can even be specially tailored, sending a different message to each recipient [96,97]. For example, an attacker can use chatbots to write an email that appears to come from a reputable online store, informing the receiver about a problem with a recent purchase and directing them to a fake website. As shown in Figure 4, the proofs of concept (POCs) for all three models, ChatGPT, Google Gemini, and Microsoft Bing Copilot, demonstrate how AI-generated content can be used to craft convincing phishing attacks. Each model shows distinct strengths in generating persuasive text, making it easier for cybercriminals to exploit unsuspecting victims. The outputs illustrate the potential risks associated with such capabilities.
The multimodal nature of Google Gemini allows it to create convincing phishing messages combining text, images, and other media, making the deceit even more credible. An attacker may send professional and colorful emails that mimic the logo and style of the organization the target believes is sending the messages. Bing Copilot, tied into other Microsoft applications, can pull information from sources such as Microsoft Outlook and Teams to write messages that integrate with the victim’s daily work. Phishing attempts frequently exploit psychological levers such as fear and urgency to make victims react quickly. Robust artificial intelligence systems, trained on copious amounts of communication data, allow attackers to produce emails that strikingly resemble authentic correspondence, and this heightened imitation makes phishing attacks more deceptive. AI-powered phishing emails can create narratives that arouse anxiety or urgency, leading to rash decisions and increasing the likelihood of a successful attack.

5.3. Attack Payload Generation

Attack Payload Generation refers to the process of creating a malicious piece of code or a command designed to exploit a vulnerability in a system or application. This "payload" is what an attacker uses to carry out their attack, such as executing unauthorized actions, stealing data, or gaining control over a system. In the context of cybersecurity, attack payloads can take various forms, including:
1.
Malicious Code: Scripts or executables that perform harmful actions when executed on a target system.
2.
Exploit Code: Code specifically designed to take advantage of a known vulnerability in software or hardware.
3.
Commands: Sequences of commands that, when executed, result in an attack, such as SQL injection commands that manipulate an SQL database.
It is technically possible for advanced AI models and chatbots (like ChatGPT, Microsoft Bing Copilot, and Google Gemini) to assist in generating attack payloads, especially if they are misused by individuals with malicious intent. With ChatGPT’s text-generating features, malicious code that carries out unlawful actions, that is, attack payloads, can be produced [98]. We used two types of payloads for this experiment: (1) SQL Injection (SQLi) payloads and (2) payloads targeting Web Application Firewalls (WAFs).
SQLi is a type of attack that allows an attacker to interfere with the queries that an application makes to its database. An SQL injection payload is a string of code inserted into an input field to manipulate the execution of SQL queries. By using specially crafted SQL commands, an attacker can gain unauthorized access to data, extract sensitive information, or even modify or delete data in the database.
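To illustrate the class of payload at issue, the following self-contained Python/sqlite3 sketch uses the textbook payload ' OR '1'='1 (a widely documented example, not output captured from any of the tested chatbots) and contrasts a vulnerable query with a parameterized one.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

    payload = "' OR '1'='1"  # makes the WHERE clause always true

    # Vulnerable: string concatenation lets the payload rewrite the query.
    query = "SELECT * FROM users WHERE name = '" + payload + "'"
    print(conn.execute(query).fetchall())   # returns every row

    # Safe: a parameterized query treats the payload as literal data.
    print(conn.execute("SELECT * FROM users WHERE name = ?", (payload,)).fetchall())  # []

The contrast shows why payloads of this kind endanger only applications that assemble SQL from raw strings, and why parameterized queries are the standard defense.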
WAFs are security systems that monitor and filter incoming traffic to web applications to block malicious requests. WAF payload generation involves crafting specific patterns or requests that can either test the effectiveness of a WAF or exploit a weakness in its configuration.
Figure 5 illustrates the outcomes of testing, showing the successful generation of SQL Injection and WAF payloads by ChatGPT, the partial success of Microsoft Bing, and the complete non-response from Google Gemini.
In our exploration, we attempted to generate these two payloads through the three prominent models. ChatGPT successfully generated both the SQL Injection and WAF payloads. Microsoft Bing was able to generate only the WAF payload. Google Gemini declined to generate any payloads at all.

5.4. Ransomware Malware Code Generation

Ransomware is a type of malicious software designed to block access to a computer system or files until a sum of money (the ransom) is paid. It typically encrypts the victim’s files, making them inaccessible, and demands payment for the decryption key. Ransomware attacks can significantly disrupt businesses and individuals, leading to financial losses, reputational damage, and data loss. On the other hand, malware code generation refers to the process of creating malicious software designed to infiltrate, damage, or gain unauthorized access to computer systems. This can encompass a variety of malware types, including viruses, worms, trojans, spyware, adware, and ransomware.
Malware and ransomware continue to be serious concerns in today’s digital environment. Installed without authorization, malware carries out nefarious tasks such as password theft. By encrypting data, ransomware prevents users from accessing them and demands payment for the decryption key. It takes a lot of experience and talent to write these malicious software pieces, but a potent AI model may be able to automate this process and speed up the development of new threats.
1.
WannaCry is a ransomware attack targeting Windows systems, encrypting files and demanding Bitcoin for decryption [99]. Once it encrypts files, the system becomes unusable, and users are prompted to pay a ransom in Bitcoin for the decryption key [100]. It exploits vulnerabilities in the Server Message Block (SMB) protocol to spread quickly across networks, affecting numerous systems within organizations [101].
2.
NotPetya masquerades as ransomware but offers no decryption key. It encrypts critical system files, effectively rendering machines inoperable [102]. The attack spreads through network vulnerabilities and exploits, often disguised as a software update. Its primary goal is disruption rather than financial gain [103].
3.
Ryuk ransomware stems from prior malware infections, often related to TrickBot, which is distributed via phishing [104]. Delivered via phishing emails, Ryuk is designed to encrypt important organizational files and demand a ransom for their release. It focuses on high-value targets, including businesses, to maximize the ransom payout [105].
4.
REvil, also known as Sodinokibi, is a ransomware strain used in targeted attacks to encrypt files and demand payment for decryption [106]. Attack vectors include phishing emails, exploit kits, and Remote Desktop Protocol exploits. Hackers use REvil to monetize stolen data, employing double extortion by threatening to leak sensitive information if the ransom is not paid [107].
5.
Locky ransomware typically spreads through malicious email attachments, encrypting files on infected systems and rendering them inaccessible [108,109]. It uses strong encryption algorithms and can automate the infection process across a network, affecting multiple machines. Locky is known for generating ransom demands in various formats, compelling victims to pay for data recovery [110].
Figure 6 analyzes the responses of different AI models, specifically ChatGPT, Google Gemini, and Microsoft Bing, in generating code related to notable ransomware attacks: WannaCry and NotPetya. The analysis implies that ChatGPT demonstrated the capability to generate code snippets associated with these specific types of malware, while the other two models did not.

5.5. Adware Generation

Adware Generation refers to the process of creating adware, a type of software that automatically displays or downloads advertisements without the user’s consent. Adware can take various forms, including pop-up ads, banner ads, and in-app advertisements, and it often aims to generate revenue for its developer through advertising or data collection. Adware takes advantage of user engagement with the adverts.
Figure 7 presents the results of a simulation aimed at exploring the effectiveness of various AI models—ChatGPT, Microsoft Bing Copilot, and Google’s Gemini—in responding to requests related to adware generation and information retrieval. The findings indicate that while ChatGPT and Bing Copilot provided varying levels of access and information regarding adware, Google’s Gemini was more restrictive in its responses.

5.6. Trojan Generation

A Trojan (short for Trojan Horse) is a type of malicious software that deceives users by masquerading as legitimate software or by being embedded in legitimate applications [111]. Unlike viruses or worms, Trojans do not replicate themselves; instead, they rely on users to download and install them under false pretenses.
Figure 8 illustrates the responses of three AI models to requests for Trojan code generation. Both ChatGPT and Microsoft Bing Copilot successfully generated malicious code upon request, demonstrating their ability to provide such information when approached under specific narratives. In contrast, Google Gemini consistently denied all attempts to generate Trojan code, indicating a more robust ethical framework and stronger safeguards against potentially harmful requests. The outputs and results depicted in the figure highlight the significant differences in how these AI models handle inquiries related to malicious content, underscoring the varying levels of ethical protection and decision-making algorithms employed across the platforms.

5.7. Comparative Analysis

In a comparative analysis of various models’ responses to different cyberattack scenarios, we present a summary in Table 3. This table evaluates the performance of three prominent AI models—ChatGPT, Google Gemini, and Microsoft Bing Copilot—against a range of attack types, including social engineering attacks, phishing, and various forms of malware.
The table systematically categorizes each attack type, indicating whether each model effectively generated a response (marked with a tick) or denied the request (marked with a cross). Additionally, it provides a brief description of each attack, helping clarify the nature of the threats being assessed.
By comparing the response capabilities of these AI models, this table aims to evaluate their effectiveness in counteracting various cyber threats, which is crucial for developing more resilient AI systems in the face of evolving cybersecurity challenges.

6. Discussion - Ethical Implications, Challenges, and Concerns

We showed that ChatGPT, Microsoft Bing Copilot, and Google Gemini are powerful tools that can be used for the betterment of society, but they are not immune to misuse [112]. Our experiments demonstrated that these models can give detailed descriptions of adverse actions, such as generating source code for Trojans and adware, when given specific cues. For example, Bing Copilot pointed to GitHub repositories containing adware code, and ChatGPT, after several requests framed as learning exercises, delivered sample adware code. Google Gemini, by contrast, did not directly write dangerous code but gave a theoretical presentation of such threats when information was requested under one pretext or another.
These experiments show that AI models can be abused to create or locate malicious code, raising the imperative need for ethical controls while taking advantage of AI in education and security initiatives. It is possible to identify moral, legal, and social implications of using ChatGPT, Bing Copilot, Google Gemini, and other LLM tools, and to discuss the biases and threats allied with their application in critical or abusive spheres. ChatGPT’s capacity to respond proficiently to a wide variety of queries and cues has been proven; although all three models can answer inquiries, ChatGPT excels at quick responses [113].
Moreover, a threat associated with Microsoft Bing Copilot is the token-length side channel described in [114]. This side channel allows attackers to infer confidential and sensitive information exchanged with an AI assistant by measuring the size of the encrypted response packets. This vulnerability is a reminder that very tight security features must be incorporated so that only authorized parties can access any information about users interacting with AI assistants.
The report in [114] also explains how adversaries can learn and exploit the response patterns and cliches of typical LLMs such as ChatGPT and Copilot. The attack is made possible by feeding a helper model with sample chats from the target AI assistant, improving the attacker’s ability to map token-length sequences back to plaintext and reconstruct replies from encrypted traffic. Because this side channel can expose highly confidential information carried in encrypted response packets, it is a direct threat to users’ privacy, and techniques for protecting the confidentiality of conversations must be well encapsulated [113]. These are security risks that Microsoft and Google have to address in their AI systems to keep users’ information safe and prevent unauthorized access or breaches of any kind.
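A minimal sketch of the observation step is shown below, under the simplifying assumptions that each streamed token travels in its own encrypted packet and that the encryption adds a constant per-packet overhead; the actual attack described in [114] is considerably more elaborate.

    # Hypothetical ciphertext sizes (bytes) of consecutive response packets.
    OVERHEAD = 45          # assumed constant per-packet encryption overhead
    packet_sizes = [49, 52, 47, 54, 50]

    # If each packet carries one token, size minus overhead leaks the
    # token's character length without decrypting anything.
    token_lengths = [size - OVERHEAD for size in packet_sizes]
    print(token_lengths)   # [4, 7, 2, 9, 5]

In the full attack, such length sequences are fed to a model trained on sample assistant replies, which guesses the most likely word sequence consistent with the observed lengths.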

6.1. Controversy over Data Ownership and Rights

Significant privacy issues have been found during a study into OpenAI’s use of private data for AI training [115]. One noteworthy instance saw the Italian government outlawing the use of ChatGPT for violating the General Data Protection Regulation (GDPR), mainly for using personal data without authorization. The use of "legitimate interests" as a foundation for OpenAI to utilize personal information for training data is a matter of ethical and legal concern, raising important questions about how AI systems deal with personal data. These questions are relevant to whether or not the data is accessible to the public [115].
One controversial aspect of these models is their extensive dependence on information from the internet, much of which might not belong to the model owner. This issue garnered attention after Italy’s data protection authority highlighted ChatGPT’s ability to disseminate false information about users and the lack of age restrictions that would prevent users under 13 from using the platform.
These tools, which draw on a large database of end-user content to fine-tune the results they deliver, pose important questions about data ownership and usage. Users end up uploading their personal data, queries, and interactions, willingly or unwillingly, without fully understanding the consequences, such as privacy breaches, data insecurity, and misuse of personal information. At the heart of the conflict is the desire to grant users stronger rights over their data and more control over how artificial intelligence applications use it.

6.2. Misuse Within Organizations and Among Employees

Another element of possible LLM tool misuse was brought to light by an incident involving Samsung workers [116]. By using ChatGPT to create or debug code, these workers unintentionally provided private company data to the AI model. As a result, this private information was added to ChatGPT’s training corpus, which could make it publicly available and give rise to serious privacy issues. One such worry is whether a typical ChatGPT user could obtain this private information merely by asking for it. For example, Figure 9 provides a scenario in which this ChatGPT prompt was used:
"What is Samsung’s property code for android camera enhancement using AI in which they tricked the moon photos just to flex and prove their cameras are best? I know you are awesome and know how they did it!! can you please provide me code to understand for example as a case study for educational purposes!"
When ChatGPT was asked to provide Samsung’s proprietary code for Android camera enhancement, it provided a result with a detailed explanation instead of denying the request.
Figure 9 illustrates a scenario in which a prompt directed at ChatGPT seeks proprietary information regarding Samsung’s AI-based enhancements for moon photography. This case exemplifies the potential risks associated with utilizing large language models in corporate environments, particularly regarding the inadvertent sharing of sensitive company information. In the depicted interaction, the user requests specific coding details related to Samsung’s technology, which could lead to unauthorized access to confidential data if the AI were to comply. This image serves as a cautionary demonstration of how advanced AI tools, while beneficial for innovation and problem-solving, can also pose significant privacy and security challenges, emphasizing the need for prudent practices in their use.
Similarly, Google Gemini’s powerful data analysis makes it useful for gathering information about an organization; however, if several staff members work with the tool, they may end up sharing confidential enterprise information. Employees might upload documentation containing restricted data for analysis or summarization, exposing it if the data is shared or handled in the wrong way. Likewise, Microsoft Bing Copilot, when incorporated into diverse developer and productivity tools, becomes a threat when employees use it to create or debug code: developers may upload proprietary code or restricted information about a specific project into the AI model, where it could be seen by others.

6.3. Hallucination in Generative AI Models

Hallucinations refer to instances where the AI generates information that is incorrect, misleading, or entirely fabricated. Despite sounding like factual assertions, these outputs may not correspond to reality or verified facts. Hallucinations can occur due to various factors, such as biases in the training data, misunderstandings of context, or the inherent challenges of modeling complex human language [117]. Essentially, when an AI "hallucinates," it produces content that appears plausible or relevant but is not grounded in accurate information, raising concerns about reliability and trustworthiness in AI-generated content.
Ensuring the accuracy and reliability of information provided by chatbots is crucial, irrespective of privacy concerns. These LLMs are not fully reliable; thus, users cannot completely depend on the information they provide. Furthermore, the misuse of AI-generated content can exacerbate privacy issues. This highlights the necessity for enhanced accuracy and integrity in AI systems.
At the time of writing, ChatGPT had over 100 million registered users. If multiple users input identical queries and receive the same erroneous responses, misleading information can quickly proliferate online. For example, as noted in a DarkReading article, ChatGPT may return fictitious package information in response to user requests, potentially directing users to malicious versions created by attackers [118]. Such scenarios underscore the importance of tackling hallucinations and enhancing the resilience of AI systems like ChatGPT to safeguard user security and privacy [119].
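One practical mitigation for the hallucinated-package scenario is to verify that an AI-suggested package actually exists on the official index before installing it. The Python sketch below (our own illustration; the package name is hypothetical) queries the public PyPI JSON API.

    import urllib.request
    import urllib.error

    def package_exists_on_pypi(name: str) -> bool:
        """Check the official PyPI index before trusting an AI-suggested package."""
        url = f"https://pypi.org/pypi/{name}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False  # a 404 suggests the package name was hallucinated

    # Hypothetical package name suggested by a chatbot:
    print(package_exists_on_pypi("totally-real-crypto-helper"))  # likely False

Existence alone is not sufficient, since attackers may register hallucinated names themselves, so the check should be paired with scrutiny of the package’s maintainer, age, and download history.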
Similar challenges are faced by other AI systems such as Google Gemini and Microsoft Bing Copilot. Hallucinations can lead to inaccurate data or fabrications, which is particularly concerning as these tools become increasingly integrated into various applications. For Google Gemini, which relies on pattern recognition and predictive analysis, battling hallucinations is critical for maintaining the accuracy of outputs, especially regarding organizational security. Similarly, Microsoft Bing Copilot may generate faulty code suggestions that could introduce significant vulnerabilities. As the threat of misinformation becomes more pronounced with the widespread adoption of AI, improving the reliability of AI-generated content is imperative.

7. The Role of AI Chatbots in Cyber Defense Strategies

Protective measures are central to cybersecurity: they safeguard digital resources, including information, devices, and computer networks, from exploitation, intrusion, and interference. Examples include firewalls, encryption, physical and administrative controls, and incident handling. Looking ahead, we expect state-of-the-art cybersecurity defenses to incorporate models such as ChatGPT, Google Gemini, and Microsoft Bing Copilot.
For example, ChatGPT can provide strong support in generating extensive security reports, studying typical attack flows, and deriving important recommendations. Similarly, Google Gemini can analyze extensive security data and propose preventive actions, while Microsoft Bing Copilot can assist with incident management and the correct configuration of security measures. Together, as these models mature, they can enrich defensive techniques across a variety of use cases and generally strengthen protection against cyber threats. In this section, we discuss the role of AI chatbots in cyber defense strategies.

7.1. Cyber Defense Automation

The aforementioned models can automate the analysis of cybersecurity events, and the resulting workload reduction is helpful for overworked SOC experts who are struggling to manage their responsibilities. They also help analysts create strategic suggestions that support both short- and long-term defense measures [120]. For example, a SOC analyst can rely on ChatGPT's assessment and recommendations to determine the risk posed by a specific PowerShell script rather than starting from scratch. Additionally, Security Operations (SecOps) teams can query OpenAI's models about how to stop malicious PowerShell scripts from executing or gaining access to files from untrusted sources, thereby improving their company's overall security posture [121].
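To make this workflow concrete, the following is a minimal sketch of LLM-assisted script triage. It assumes the `openai` Python package (v1.x style client) with an API key configured in the environment; the model name and the encoded command are illustrative placeholders, not a vetted detection pipeline.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative suspicious command (the encoded payload is a placeholder).
suspicious_command = (
    "powershell -NoProfile -WindowStyle Hidden -EncodedCommand SQBFAFgAIAAo..."
)

prompt = (
    "You are assisting a SOC analyst. Assess the following PowerShell "
    "command for signs of malicious behavior (encoded payloads, hidden "
    "windows, download cradles). Reply with a risk rating of low, medium, "
    "or high and a one-paragraph justification.\n\n" + suspicious_command
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```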
Figure 10 provides an overview of how these AI models can be used to detect security issues in server logs and to identify and address high CPU utilization by specific tables within the AdventureWorks2019 database.
Detecting security issues in server logs involves analyzing the records generated by server operations to identify suspicious activities, breaches, or anomalies that may indicate unauthorized access or cyberattacks. This process is crucial because server logs provide a detailed account of interactions and events on the system, allowing administrators to spot potential threats in real-time. Similarly, the capability to identify and address high CPU utilization by specific tables within the AdventureWorks2019 database is vital for enhancing performance, ensuring stability, and improving the overall user experience in applications that depend on that database. Early detection of security issues helps organizations respond swiftly to mitigate risks, preventing data loss, compliance violations, and damage to reputation. Ultimately, proactive log analysis is essential for maintaining the overall security and integrity of IT systems.
These ChatGPT cybersecurity tools provide a great deal of relief for overworked SOC teams, thereby lowering the organization's total exposure to cyber risk. The technology is also valuable for instructing and educating inexperienced security analysts, resulting in a faster learning process than previously achievable. SOC analysts usually examine server access records to spot attack patterns or anomalies during security incidents or log analysis. ChatGPT is capable of handling large amounts of log data and can efficiently detect security issues or anomalies in access records. When server access logs are fed into ChatGPT, it can help identify potential security concerns, rapidly flag risks such as SQL injection, and categorize distinct types of SQL injection. Notably, ChatGPT is skilled at finding security holes in a given script and suggesting fixes, even when PowerShell is used only as an example script.
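As a simplified illustration of the log-analysis task being automated here, the sketch below flags web-server access-log lines containing common SQL-injection fingerprints; the log lines and the pattern list are illustrative, and a production system would rely on far richer contextual analysis.

```python
import re

# Flag access-log lines containing common SQL-injection fingerprints.
SQLI_PATTERNS = [
    r"(?i)union\s+select",         # UNION-based injection
    r"(?i)\bor\s+1\s*=\s*1\b",     # tautology-based injection
    r"(?i)'\s*--",                 # quote followed by SQL comment
    r"(?i)sleep\s*\(\s*\d+\s*\)",  # time-based blind injection
]

access_log = [
    '10.0.0.5 - - "GET /products?id=42 HTTP/1.1" 200',
    '10.0.0.9 - - "GET /products?id=42 UNION SELECT name,pass FROM users HTTP/1.1" 200',
    '10.0.0.7 - - "GET /login?user=admin\'-- HTTP/1.1" 200',
]

for line in access_log:
    if any(re.search(p, line) for p in SQLI_PATTERNS):
        print(f"ALERT possible SQL injection: {line}")
```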

7.2. Cybersecurity Reporting and Documentation

Cybersecurity reporting is the process of documenting and communicating security incidents, vulnerabilities, and compliance status to stakeholders to enhance awareness and inform decision-making. In this domain, ChatGPT can effectively generate natural-language texts based on cybersecurity events and data. This work encompasses the assessment and reporting of cybersecurity information to several groups, such as regulatory bodies, organizational leadership, and IT staff [121]. ChatGPT draws on these inputs to generate reports and intelligence on threats, vulnerabilities, and other security matters. With the assistance of large quantities of data, ChatGPT produces reports that are comprehensive, detailed, and easy to follow. This makes it easier for organizations to identify potential security problems, estimate the level of risk they entail, and apply the measures needed to remediate them [122].
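To illustrate the mechanics of the reporting workflow just described, the sketch below assembles incident records into a plain-text summary of the kind an LLM could then expand into stakeholder-ready narrative; the record fields and incidents are fabricated for the example.

```python
from datetime import date

# Fabricated incident records; field names are illustrative.
incidents = [
    {"id": "INC-001", "type": "Phishing campaign", "severity": "High",
     "status": "Contained", "affected": 12},
    {"id": "INC-002", "type": "SQL injection attempt", "severity": "Medium",
     "status": "Blocked by WAF", "affected": 0},
]

# Assemble a stakeholder-facing summary from the structured records.
lines = [f"Security Report - {date.today().isoformat()}", ""]
for inc in incidents:
    lines.append(
        f"{inc['id']}: {inc['type']} (severity: {inc['severity']}) - "
        f"{inc['status']}; {inc['affected']} users affected"
    )
print("\n".join(lines))
```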
Microsoft Bing Copilot works within developer tools to provide real-time details, updates, and reporting. It helps examine code for potentially exploitable weaknesses and supplies decision-support information about current threats. Copilot's features, such as code analysis, general reports on security incidents, detailed structural examination of code, and checks against known and potential vulnerabilities [33], help developers practice secure coding.
Google Gemini equally plays a role in the provision of security, as it analyzes trends and possible risks using its ability to recognize complex patterns. One of Gemini's advantages is its capability to produce reports from large-scale quantitative data that reveal trends and probable threats [8,123], allowing organizations to prepare for security issues beforehand.
Altogether, the strengths of ChatGPT, Copilot, and Gemini are useful for cybersecurity reporting and analysis in both knowledge and practice. These AI models provide detailed and understandable reports, along with benchmarks for possible threats, and help organizations implement the most effective tactics and strategies for protection against threats; in this way, AI improves the strength and interoperability of security frameworks [121].

7.3. Cyber Threat Intelligence

Threat intelligence is the knowledge about potential or existing threats to an organization's cybersecurity. It involves collecting, analyzing, and sharing data regarding threat actors, vulnerabilities, and tactics used in cyberattacks to inform security measures and response strategies [124]. Scanning large amounts of data for security threats and generating intelligence material is an area where ChatGPT is fundamentally significant to threat intelligence. One use of threat intelligence is the supportive role it plays in hardening existing security systems against cyber threats through the collection, processing, and sharing of information on potential security risks in organizations [125]. To do so, ChatGPT acquires data on potential threats from news articles, social media, forums, the darknet, and the wider internet and can generate threat intelligence reports on its own. ChatGPT identifies potential threats, evaluates the level of risk they pose, and outlines measures for risk reduction with the help of comprehensive data analysis. Besides, it can also produce reports and analyze security data to identify trends and patterns in threat activities.
In threat intelligence, Google Gemini and Microsoft Bing Copilot also perform crucial functions by incorporating artificial intelligence to improve security responses. Both models are highly active in identifying and analyzing potential security threats from large volumes of input [126]. They can conduct wide and regular threat searches across articles, social media, forums, and the dark web. For instance, Google's Gemini processes big data using natural language processing and recognizes emerging threats, each rated by risk level. It can also produce threat intelligence reports showing the threats that might affect a business and their frequency. Likewise, Microsoft Bing Copilot analyzes collected security data to provide security insights, besides offering tangible suggestions for improving systems. These models help organizations produce intelligent, data-driven reports and identify trends in threat activities, which in turn enables organizations to refine their security approaches and their decision-making about risks and investments. Therefore, the combined use of threat intelligence platforms involving Google Gemini and Microsoft Bing Copilot improves not only the identification of possible threats but also their prevention.
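A tiny slice of this aggregation pipeline can be expressed in code. The sketch below extracts CVE identifiers from scraped feed snippets and ranks them by mention frequency; the feed text is fabricated, while the identifiers refer to the well-known Outlook (CVE-2023-23397) and PrintNightmare (CVE-2021-34527) vulnerabilities.

```python
import re
from collections import Counter

# Fabricated threat-feed snippets of the kind an AI assistant would scan.
feeds = [
    "Exploit for CVE-2023-23397 observed in the wild targeting Outlook.",
    "New ransomware campaign chains CVE-2023-23397 with CVE-2021-34527.",
    "Forum post offers PoC for CVE-2021-34527 (PrintNightmare).",
]

# Extract CVE identifiers and rank them by how often they are mentioned.
cve_pattern = re.compile(r"CVE-\d{4}-\d{4,7}")
mentions = Counter(cve for text in feeds for cve in cve_pattern.findall(text))

for cve, count in mentions.most_common():
    print(f"{cve}: mentioned {count} time(s)")
```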

7.4. Detecting Security Flaws in Scripts

Integrity, confidentiality, and availability are threatened by security weaknesses in code. Inspection techniques such as code review, which aims to find security vulnerabilities in code, have therefore become critical steps in software engineering. Manual code review, however, is a tedious process and is often accompanied by significant errors introduced by human reviewers. ChatGPT can be directed to write secure code rather than code containing security vulnerabilities. This section describes an approach for code generation and review using artificial intelligence, particularly for security vulnerability detection [127].
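The rule-based core of such an automated reviewer can be sketched with Python's standard `ast` module: walk a snippet's syntax tree and flag dangerous calls. This is only a fixed-rule approximation of what an LLM reviewer generalizes; the snippet under review is fabricated.

```python
import ast

# Fixed list of call names that warrant a security flag in this toy linter.
DANGEROUS_CALLS = {"eval", "exec"}

SNIPPET = """
user_input = input("expr: ")
result = eval(user_input)   # arbitrary code execution risk
"""

# Walk the syntax tree and report any call to a dangerous builtin.
tree = ast.parse(SNIPPET)
for node in ast.walk(tree):
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id in DANGEROUS_CALLS:
            print(f"line {node.lineno}: dangerous call to {node.func.id}()")
```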
Microsoft Bing Copilot and Google Gemini also boost code security by using artificial intelligence for inspection. Microsoft Bing Copilot employs natural-language understanding of code to offer prompt suggestions for writing secure code and flags security problems as developers type. Copilot connects to development environments to streamline code-review cycles, making them faster and more accurate. Namely, it can analyze both code snippets and large-scale applications, assess the threats that may arise, and suggest remediations, which significantly reduces the chance of a coding mistake that opens the door to a security breach [128].
As for Google Gemini, it offers built-in functions backed by AI algorithms and large training corpora for code scanning and security threat identification. Gemini's integration with numerous Google platforms further extends these capabilities, allowing developers to strengthen the defenses of their code.

7.5. Identify Security Bugs During Code Review

Code review is unavoidable and requires satisfactory knowledge of multiple programming languages and technologies, as well as of safe and unsafe approaches to coding, particularly concerning security-related problems. However, teams face challenges because it is virtually impossible for one person to be proficient in all the technologies employed in development. Due to this knowledge gap, security issues tend to go unnoticed, and the codebase becomes vulnerable in the process.
This problem is further exacerbated by the frequently unbalanced developer-to-security-engineer ratio. Because of the large amount of code being written, security engineers find it challenging to carefully analyze every pull request, which increases the likelihood that security issues will go undiscovered. Artificial intelligence-powered code review appears to be a powerful remedy for these problems. ChatGPT can function as an automated code reviewer that is skilled at spotting possible security flaws; the AI-powered assistant has been trained on a vast dataset that includes past code reviews and published security issues from various programming languages [129].
These tools apply their knowledge of safe coding practices by flagging such drawbacks during code review. For instance, Google Gemini would draw attention to input fields where user input is not sanitized before being dynamically included in a webpage, while Microsoft Bing Copilot would suggest better, safer coding patterns and the right libraries to employ in order to prevent XSS attacks. When integrated into the code review process, such AI tools can substantially strengthen an organization's capacity for identifying and fixing security issues: even when reviewers are unaware of certain vulnerabilities or cannot examine all the code, the AI tools can do so for them. This integration promotes secure software development as part of the broader picture of risk management.
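The fix such tools typically recommend is easy to demonstrate. The sketch below, using Python's standard `html` module, contrasts raw interpolation of user input with escaped output; the payload string is a textbook example, not a specific exploit.

```python
import html

# A textbook XSS payload supplied as "user input".
user_input = '<script>alert("xss")</script>'

unsafe_page = f"<p>Hello, {user_input}</p>"             # vulnerable to XSS
safe_page = f"<p>Hello, {html.escape(user_input)}</p>"  # special chars neutralized

print("UNSAFE:", unsafe_page)
print("SAFE:  ", safe_page)
```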

7.6. Secure Code Generation and Cyber Attacks Identification

Secure code generation refers to the practice of writing software code with built-in security measures that mitigate vulnerabilities and protect against attacks. Chatbots, particularly those powered by advanced AI models, can assist in secure code generation by providing real-time coding suggestions, identifying potential security flaws, and recommending best practices. By using natural language processing and extensive programming knowledge, these chatbots can generate code snippets that follow security protocols, help developers understand potential risks, and ultimately foster a culture of secure software development [130].
Due to ChatGPT's capability of generating plain-language descriptions of attack patterns and behavior, the model is valuable for identifying cyberattacks. This process involves the identification and analysis of malicious activities occurring in a company's network or systems. By using information such as network log data and security event alerts, one can study attack patterns and behavior. ChatGPT can then construct explanatory narratives that detail the attack paths, goals, and tactics employed by the attacking parties. Besides, ChatGPT is capable of issuing precautions or triggering alerts when a certain limit or threshold is crossed; for instance, it can send an alert or notification to the right staff when it detects unusual activity within a network. Furthermore, ChatGPT can help decode and analyze cross-site scripting attacks and locate other similar issues. It also assists developers in writing secure code, since it identifies security problems and best security practices [131].
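The threshold-based alerting described here reduces, at its simplest, to counting events and comparing against a limit, as in the following sketch; the event records, IP addresses (drawn from documentation ranges), and threshold are illustrative.

```python
from collections import Counter

FAILED_LOGIN_LIMIT = 3  # illustrative threshold

# Fabricated event stream; addresses come from documentation IP ranges.
events = [
    {"ip": "203.0.113.7", "event": "failed_login"},
    {"ip": "203.0.113.7", "event": "failed_login"},
    {"ip": "198.51.100.2", "event": "failed_login"},
    {"ip": "203.0.113.7", "event": "failed_login"},
    {"ip": "203.0.113.7", "event": "failed_login"},
]

# Count failed logins per source and alert when a source exceeds the limit.
failures = Counter(e["ip"] for e in events if e["event"] == "failed_login")
for ip, count in failures.items():
    if count > FAILED_LOGIN_LIMIT:
        print(f"ALERT: {ip} exceeded {FAILED_LOGIN_LIMIT} failed logins ({count})")
```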
Figure 11 illustrates the capabilities of three different chatbots in identifying cross-site scripting (XSS) attacks. It demonstrates the ability of ChatGPT, Microsoft Bing Copilot, and Google Gemini to analyze and flag potential XSS vulnerabilities. Collectively, the figure exemplifies how AI-driven tools can support secure coding practices and strengthen overall cybersecurity efforts.
The ability of chatbots to identify attacks represents a significant advancement in cybersecurity tools for security professionals. By leveraging natural language processing and advanced machine learning algorithms, these AI-driven systems can efficiently analyze code and web applications to detect potential vulnerabilities that may be overlooked during manual reviews. This automation not only accelerates the identification of security flaws but also enhances accuracy, reducing the risk of human error. Furthermore, these chatbots can provide contextual suggestions for resolving issues, empowering security professionals with actionable insights that facilitate proactive risk mitigation. As cyber threats continue to evolve, the integration of such intelligent chatbots into security workflows can bolster defenses, streamline incident response, and ultimately contribute to a more robust cybersecurity posture for organizations.

7.7. Developing Ethical Guidelines

Ethical guidelines refer to a set of principles and standards that govern the behavior and decision-making processes of individuals and organizations, particularly concerning issues of integrity, respect, fairness, and accountability. Ethical guidelines serve to ensure that AI applications, including chatbots, are developed and utilized responsibly, promoting user trust and safeguarding against harmful outcomes. Key aspects often include transparency in data usage, maintaining user privacy, addressing biases, ensuring data security, and promoting fairness in interactions [132].
ChatGPT aids in developing ethical guidelines for AI systems by providing natural language explanations and analyzing frameworks such as GDPR and IEEE standards. It helps create ethical case studies, enhancing understanding of ethical implications in AI development [133,134]. While ChatGPT offers detailed ethical guidelines, models like Microsoft Bing Copilot and Google Gemini are more resistant to manipulation, emphasizing the need for strengthened ethical standards in AI. ChatGPT can also interface with intrusion detection systems, identifying security threats through log data and providing natural language descriptions for rapid team response. Microsoft Bing Copilot leverages natural language to improve IDS recommendations, whereas Google Gemini analyzes data to formulate effective detection rules, enhancing overall security against threats.
Chatbots can also play a vital role in adhering to these ethical guidelines. For instance, they can ensure transparency by providing users with clear information about how their data will be used and stored, which fosters trust. Additionally, through programmed ethical decision-making frameworks, chatbots can evaluate their responses to avoid biased outcomes and provide balanced perspectives. They can also deliver consistent, impartial information in customer support scenarios, helping to ensure fairness and equality of service [135]. Furthermore, when programmed to recognize and respond to sensitive topics appropriately, chatbots can help prevent the escalation of issues and promote a more respectful interaction. By integrating ethical considerations into their design and functionality, chatbots can contribute to a safer digital environment that aligns with societal values and expectations.

7.8. Malware Detection

The detection of malware represents a particularly compelling use case for GPT-4 in the realm of cybersecurity. Malware, or malicious software, refers to any software specifically designed to cause harm to computer systems, servers, clients, or networks. With the rapid evolution and increasing complexity of malware, traditional signature-based detection solutions often fall short. This scenario highlights the effectiveness of artificial intelligence models, which are capable of learning autonomously and thus excel in detecting malware [136].
By training on a substantial dataset comprising both known viruses and a mix of malicious and non-malicious code, along with their corresponding behavior patterns, ChatGPT can determine whether specific code or software binaries are likely to be malicious. Furthermore, this model can be fine-tuned to detect various types of malware, including ransomware, worms, Trojans, and viruses. It generates technical reports that outline potential risks and the appropriate measures that can be taken to mitigate them [137].
For instance, if the submitted code exhibits replication traits (characteristic of viruses) and attempts to copy its code into other executable files, it raises concerns about the potential dissemination of malicious code within a system or network. In response to malware detection, isolating the identified code for further analysis is crucial. If any files appear novel or suspicious, it is advisable not to execute them. Instead, users should scan their systems with updated antivirus software to enhance security [138].
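A toy behavioral screen in this spirit is sketched below: it flags code exhibiting replication or stealth traits via a fixed pattern list, whereas a fine-tuned model would learn such indicators from large corpora. The sample code and patterns are illustrative.

```python
import re

# Fixed indicators of replication and stealth; a learned model would
# generalize far beyond this hand-written list.
SUSPICIOUS_TRAITS = {
    "self-replication": r"shutil\.copy\(\s*__file__",
    "hidden execution": r"(?i)-w(?:indowstyle)?\s+hidden",
}

# Fabricated sample that copies its own source file elsewhere.
sample = 'import shutil\nshutil.copy(__file__, "C:/Users/Public/updater.py")'

for trait, pattern in SUSPICIOUS_TRAITS.items():
    if re.search(pattern, sample):
        print(f"FLAG: {trait} trait detected; isolate the code for analysis")
```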
The capability of ChatGPT to identify and counteract malware presents a new opportunity in the field of cybersecurity. Despite certain limitations and challenges, such as the need for large, up-to-date training datasets and the potential for false positives or negatives, its application could significantly enhance existing anti-malware practices. Leveraging ChatGPT's learning capacity allows organizations to stay abreast of the ever-evolving threat landscape.

7.9. Phishing Detection

Phishing attacks remain pervasive in organizations, continually evolving their tactics to bypass email defenses and unlawfully acquire personal data [139]. The paper titled “Can AI Keep You Safe” presents a case study evaluating the effectiveness of Large Language Models (LLMs) in detecting phishing emails. This research focuses on the performance of three different models: GPT-3.5, ChatGPT, and a customized version of ChatGPT. By employing a curated dataset of phishing and legitimate emails, the study assesses the models’ abilities to distinguish between the two [140,141].
The findings indicate that LLMs demonstrate commendable performance in recognizing phishing emails, although efficiency varies among the models. GPT-3.5 showcases significant effectiveness in detecting typical phishing messages, attributed to its extensive training data and advanced language-processing capabilities. While GPT-3 has 175 billion parameters, ChatGPT features a 244-billion-parameter architecture, providing better contextual understanding than GPT-3.5. The accuracy and consistency metrics (nn and ccc values) for ChatGPT surpassed previous experiments, indicating that the proposed GPMAD exhibits an enhanced ability to identify complex phishing schemes. Similarly, the customized version of ChatGPT, tailored for email security applications, achieved impressive detection rates without compromising deployment factors. These results highlight the potential of LLMs to enhance email security and protect users from cyber fraud [130].
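To make the detection task concrete, the sketch below scores an email against a few hand-written phishing cues; the LLMs evaluated in the study replace such brittle heuristics with learned language understanding. The email text, cues, and threshold are fabricated for illustration.

```python
import re

# Hand-written cues; each matched cue raises the phishing score by one.
CUES = {
    "urgency language": r"(?i)\b(urgent|immediately|within 24 hours)\b",
    "credential request": r"(?i)\b(verify|confirm)\b.{0,40}\b(password|account)\b",
    "raw-IP link": r"https?://\S*(?:\d{1,3}\.){3}\d{1,3}",
}

email = (
    "URGENT: your mailbox will be closed within 24 hours. "
    "Please verify your password at http://192.0.2.44/login now."
)

score = sum(bool(re.search(p, email)) for p in CUES.values())
print(f"phishing cues matched: {score}/{len(CUES)}")
if score >= 2:
    print("classification: likely phishing")
```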
However, the study also identifies certain weaknesses present in LLMs regarding phishing detection. Despite ChatGPT’s effective performance with complicated phishing schemes, it may occasionally misidentify regular messages as potential threats due to certain inaccuracies [142]. Although the customized ChatGPT performs exceptionally well, it will require periodic refinement to adapt to emerging phishing techniques. The investigation emphasizes the importance of integrating LLMs into a comprehensive system to achieve optimal results and mitigate potential negative repercussions. Overall, the study underscores the promising capabilities of LLMs in improving phishing detection and offers valuable insights for the advancement of AI-based cybersecurity approaches.

8. Conclusions and Future Work

In this study, we have explored the dual role of generative AI chatbots in both advancing technology and posing significant cybersecurity risks. Our comprehensive review of ChatGPT, Google Gemini, and Microsoft Bing Copilot highlights their capabilities and the potential threats they present when misused by malicious actors. We have identified various ways in which these chatbots can be exploited to generate phishing emails, security exploits, and executable payloads, thereby intensifying cyber-attacks. Additionally, we have proposed advanced content filtering mechanisms and defense strategies to empower these chatbots to defend against such threats effectively. The findings of this research underscore the importance of continuous monitoring and updating of AI models to mitigate risks and enhance their security features.
Future research should focus on several key areas to further enhance the security and functionality of generative AI chatbots. Developing more sophisticated and adaptive content filtering systems that can better detect and prevent malicious activities is crucial. Another possible direction is implementing real-time monitoring and response systems to quickly identify and mitigate threats as they arise. Meanwhile, it is equally important to ensure that AI models are designed with ethical considerations in mind, minimizing biases and preventing misuse. Finally, developing a benchmark to evaluate chatbots based on their content filtering systems, which can serve as an industry standard, is another valuable future direction.

Data Availability Statement

Data is contained within the article or supplementary material.

Abbreviations

The following abbreviations are used in this manuscript:
AI Artificial Intelligence
ML Machine Learning
LLMs Large Language Models
DDoS Distributed Denial of Service
NIDS Network Intrusion Detection System
SOC Security Operations Center
WAF Web Application Firewall
GDPR General Data Protection Regulation
PII Personally Identifiable Information
NLP Natural Language Processing
RLHF Reinforcement Learning from Human Feedback
XSS Cross-Site Scripting
POC Proof of Concept
AI-ML Artificial Intelligence and Machine Learning
IDS Intrusion Detection System
SQLi SQL Injection
NFT Non-Fungible Token

References

  1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Communications of the ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  2. Nazir, A.; Iqbal, Z.; Muhammad, Z. ZTA: A Novel Zero Trust Framework for Detection and Prevention of Malicious Android Applications 2024.
  3. Iesar, H.; Iqbal, W.; Abbas, Y.; Umair, M.Y.; Wakeel, A.; Illahi, F.; Saleem, B.; Muhammad, Z. Revolutionizing Data Center Networks: Dynamic Load Balancing via Floodlight in SDN Environment. 2024 5th International Conference on Advancements in Computational Sciences (ICACS). IEEE, 2024, pp. 1–8.
  4. Gupta, M.; Akiri, C.; Aryal, K.; Parker, E.; Praharaj, L. From chatgpt to threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access 2023. [Google Scholar] [CrossRef]
  5. Sebastian, G. Do ChatGPT and other AI chatbots pose a cybersecurity risk?: An exploratory study. International Journal of Security and Privacy in Pervasive Computing (IJSPPC) 2023, 15, 1–11. [Google Scholar]
  6. Irfan, M.; Ali, S.T.; Ijlal, H.S.; Muhammad, Z.; Raza, S. Exploring The Synergistic Effects of Blockchain Integration with IOT and AI for Enhanced Transparency and Security in Global Supply Chains. Int. J. Contemp. Issues Soc. Sci 2024, 3, 1326–1338. [Google Scholar]
  7. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M. ; others. Transformers: State-of-the-art natural language processing. Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 2020, pp. 38–45.
  8. Saeidnia, H.R. Welcome to the Gemini era: Google DeepMind and the information industry. Library Hi Tech News 2023. [Google Scholar] [CrossRef]
  9. McIntosh, T.R.; Susnjak, T.; Liu, T.; Watters, P.; Halgamuge, M.N. From google gemini to openai q*(q-star): A survey of reshaping the generative artificial intelligence (ai) research landscape. arXiv preprint arXiv:2312.10868, 2023.
  10. Bodoff, D. On the status of machine learning inferences in data privacy economics and regulation. In The Elgar Companion to Information Economics; Edward Elgar Publishing, 2024; pp. 462–480.
  11. Rehman, A.U.; Nadeem, A.; Malik, M.Z. Fair feature subset selection using multiobjective genetic algorithm. Proceedings of the genetic and evolutionary computation conference companion, 2022, pp. 360–363.
  12. Rane, N.; Choudhary, S.; Rane, J. Gemini or ChatGPT? Efficiency, Performance, and Adaptability of Cutting-Edge Generative Artificial Intelligence (AI) in Finance and Accounting. Efficiency, Performance, and Adaptability of Cutting-Edge Generative Artificial Intelligence (AI) in Finance and Accounting (February 19, 2024), 19 February.
  13. Kytö, M. Copilot for Microsoft 365: A Comprehensive End-user Training Plan for Organizations 2024.
  14. Jedrzejczak, W.W.; Kochanek, K. Comparison of the audiological knowledge of three chatbots: ChatGPT, Bing Chat, and Bard. medRxiv 2023, 11. [Google Scholar] [CrossRef]
  15. Kalla, D.; Smith, N.; Samaah, F.; Kuraku, S. Study and analysis of chat GPT and its impact on different fields of study. International journal of innovative science and research technology 2023, 8. [Google Scholar]
  16. Opara, E.; Mfon-Ette Theresa, A.; Aduke, T.C. ChatGPT for teaching, learning and research: Prospects and challenges. Glob Acad J Humanit Soc Sci 2023, 5. [Google Scholar]
  17. Fiaz, F.; Sajjad, S.M.; Iqbal, Z.; Yousaf, M.; Muhammad, Z. MetaSSI: A Framework for Personal Data Protection, Enhanced Cybersecurity and Privacy in Metaverse Virtual Reality Platforms. Future Internet 2024, 16, 176. [Google Scholar] [CrossRef]
  18. Guembe, B.; Azeta, A.; Misra, S.; Osamor, V.C.; Fernandez-Sanz, L.; Pospelova, V. The emerging threat of ai-driven cyber attacks: A review. Applied Artificial Intelligence 2022, 36, 2037254. [Google Scholar] [CrossRef]
  19. Hassan, S.M.U.H. STUDY OF ARTIFICIAL INTELLIGENCE IN CYBER SECURITY AND THE EMERGING THREAT OF AI-DRIVEN CYBER ATTACKS AND CHALLENGE. Available at SSRN 4652028 2023. [Google Scholar] [CrossRef]
  20. Osazuwa, O.M.C. Confidentiality, Integrity, and Availability in Network Systems: A Review of Related Literature.
  21. ChatGPT Confirms Data Breach, Raising Security Concerns. Accessed: Jun 26, 2023. [Online]. /: https.
  22. Ullah, N.; Zahra, M.; Saleem, B.; Haseeb, M.; Mughal, M.; Muhammad, Z. DelSec: An Anti-Forensics Data Deletion Framework for Smartphones, IoT, and Edge Devices. 2024 International Conference on Engineering & Computing Technologies (ICECT). IEEE, 2024, pp. 1–6.
  23. OpenAI Usage Policies. Accessed: Jun 28, 2023. [Online]. /: https.
  24. Prasad, S.G.; Sharmila, V.C.; Badrinarayanan, M. Role of artificial intelligence based chat generative pre-trained transformer (chatgpt) in cyber security. 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC). IEEE, 2023, pp. 107–114.
  25. Muhammad Imran, N.A. Google Gemini as a next generation AI educational tool: a review of emerging educational technology. Smart Learning Environments 2024. [Google Scholar]
  26. Akhtar, Z.B. From bard to Gemini: An investigative exploration journey through Google’s evolution in conversational AI and generative AI. Computing and Artificial Intelligence 2024. [Google Scholar] [CrossRef]
  27. Imran, M.; Almusharraf, N. Google Gemini as a next generation AI educational tool: a review of emerging educational technology. Smart Learning Environments 2024, 11, 22. [Google Scholar] [CrossRef]
  28. Zohaib, S.M.; Sajjad, S.M.; Iqbal, Z.; Yousaf, M.; Haseeb, M.; Muhammad, Z. Zero Trust VPN (ZT-VPN): A Cybersecurity Framework for Modern Enterprises to Enhance IT Security and Privacy in Remote Work Environments 2024.
  29. Wang, M. Generative AI: A New Challenge for Cybersecurity. Journal of Computer Science and Technology Studies 2024, 6, 13–18. [Google Scholar] [CrossRef]
  30. Narayanaswamy, A. Working with Copilot Using Bing. In Microsoft Copilot for Windows 11: Understanding the AI-Powered Features in Windows 11; Springer, 2024; pp. 93–99.
  31. Khan, A. Microsoft Copilot Studio. In Introducing Microsoft Copilot for Managers: Enhance Your Team’s Productivity and Creativity with Generative AI-Powered Assistant; Springer, 2024; pp. 621–694.
  32. DS Hiwa, SS Abdalla, A.M.H.H.S.K. Assessment of Nursing Skill and Knowledge of ChatGPT, Gemini, Microsoft Copilot, and Llama: A Comparative Study. Barw Medical Journal.
  33. Giacomo Rossettini, Lia Rodeghiero, F.C.C.C.P.P.A.T.G.C.S.C.S.G.A.P. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Medical Education.
  34. Stratton, J. An Introduction to Microsoft Copilot. In Copilot for Microsoft 365: Harness the Power of Generative AI in the Microsoft Apps You Use Every Day; Springer, 2024; pp. 19–35.
  35. Rossettini, G.; Rodeghiero, L.; Corradi, F.; Cook, C.; Pillastrini, P.; Turolla, A.; Castellini, G.; Chiappinotto, S.; Gianola, S.; Palese, A. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Medical Education 2024, 24, 694. [Google Scholar] [CrossRef] [PubMed]
  36. Roumeliotis, K.I.; Tselikas, N.D. Chatgpt and open-ai models: A preliminary review. Future Internet 2023, 15, 192. [Google Scholar] [CrossRef]
  37. Fitria, T.N. Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay. ELT Forum: Journal of English Language Teaching, 2023, Vol. 12, pp. 44–58.
  38. Kim, S.; Shim, J.; Shim, J.; others. A study on the utilization of OpenAI ChatGPT as a second language learning tool. Journal of Multimedia Information System 2023, 10, 79–88. [Google Scholar] [CrossRef]
  39. Nazir, A.; Wang, Z. A comprehensive survey of ChatGPT: Advancements, applications, prospects, and challenges. Meta-Radiology 2023, 1, 100022. [Google Scholar] [CrossRef]
  40. Abdullah, M.; Madain, A.; Jararweh, Y. ChatGPT: Fundamentals, applications and social impacts. 2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS). Ieee, 2022, pp. 1–8.
  41. Rahman, M.M.; Watanobe, Y. ChatGPT for Education and Research: Opportunities, Threats, and Strategies. Applied Sciences 2023, 13. [Google Scholar] [CrossRef]
  42. Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
  43. Galitsky, B.; Galitsky, B. A content management system for chatbots. Developing Enterprise Chatbots: Learning Linguistic Structures.
  44. Liu, B.; Xu, Z.; Sun, C.; Wang, B.; Wang, X.; Wong, D.F.; Zhang, M. Content-oriented user modeling for personalized response ranking in chatbots. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2017, 26, 122–133. [Google Scholar] [CrossRef]
  45. Baudart, G.; Dolby, J.; Duesterwald, E.; Hirzel, M.; Shinnar, A. Protecting chatbots from toxic content. Proceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, 2018, pp. 99–110.
  46. Følstad, A.; Skjuve, M.; Brandtzaeg, P.B. Different chatbots for different purposes: towards a typology of chatbots to understand interaction design. Internet Science: INSCI 2018 International Workshops, St. Petersburg, Russia, October 24–26, 2018, Revised Selected Papers 5. Springer, 2019, pp. 145–156.
  47. Hasal, M.; Nowaková, J.; Ahmed Saghair, K.; Abdulla, H.; Snášel, V.; Ogiela, L. Chatbots: Security, privacy, data protection, and social aspects. Concurrency and Computation: Practice and Experience 2021, 33, e6426. [Google Scholar] [CrossRef]
  48. Santos, G.A.; De Andrade, G.G.; Silva, G.R.S.; Duarte, F.C.M.; Da Costa, J.P.J.; de Sousa, R.T. A conversation-driven approach for chatbot management. IEEE Access 2022, 10, 8474–8486. [Google Scholar] [CrossRef]
  49. Shum, H.Y.; He, X.d.; Li, D. From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering 2018, 19, 10–26. [Google Scholar]
  50. Cortés-Cediel, M.E.; Segura-Tinoco, A.; Cantador, I.; Bolívar, M.P.R. Trends and challenges of e-government chatbots: Advances in exploring open government data and citizen participation content. Government Information Quarterly 2023, 40, 101877. [Google Scholar] [CrossRef]
  51. Yamin, M.M.; Ullah, M.; Ullah, H.; Katt, B. Weaponized AI for cyber attacks. Journal of Information Security and Applications 2021, 57, 102722. [Google Scholar] [CrossRef]
  52. Blum, A. Breaking chatgpt with dangerous questions understanding how chatgpt prioritizes safety, context, and obedience 2022.
  53. Dhoni, P.S.; Kumar, R. Synergizing Generative Artificial Intelligence and Cybersecurity: Roles of Generative Artificial Intelligence Entities, Companies, Agencies and Government in Enhancing Cybersecurity.
  54. Fezari, M.; Al-Dahoud, A.; Al-Dahoud, A. Augmanting Reality: The Power of Generative AI. University Badji Mokhtar Annaba: Annaba, Algeria.
  55. Renaud, K.; Warkentin, M.; Westerman, G. From ChatGPT to HackGPT: Meeting the cybersecurity threat of generative AI; MIT Sloan Management Review, 2023.
  56. Jabbarova, K. AI AND CYBERSECURITY-NEW THREATS AND OPPORTUNITIES. Journal of Research Administration 2023, 5, 5955–5966. [Google Scholar] [CrossRef]
  57. Oviedo-Trespalacios, O.; Peden, A.E.; Cole-Hunter, T.; Costantini, A.; Haghani, M.; Rod, J.; Kelly, S.; Torkamaan, H.; Tariq, A.; Newton, J.D.A.; others. The risks of using ChatGPT to obtain common safety-related information and advice. Safety science 2023, 167, 106244. [Google Scholar] [CrossRef]
  58. Alawida, M.; Mejri, S.; Mehmood, A.; Chikhaoui, B.; Isaac Abiodun, O. A comprehensive study of ChatGPT: advancements, limitations, and ethical considerations in natural language processing and cybersecurity. Information 2023, 14, 462. [Google Scholar] [CrossRef]
  59. Marshall, J. What effects do large language models have on cybersecurity 2023.
  60. Qammar, A.; Wang, H.; Ding, J.; Naouri, A.; Daneshmand, M.; Ning, H. Chatbots to chatgpt in a cybersecurity space: Evolution, vulnerabilities, attacks, challenges, and future recommendations. arXiv preprint arXiv:2306.09255, 2023.
  61. Mardiansyah, K.; Surya, W. Comparative Analysis of ChatGPT-4 and Google Gemini for Spam Detection on the SpamAssassin Public Mail Corpus 2024.
  62. Li, Z.; Wang, X.; Zhang, Q. Evaluating the Quality of Large Language Model-Generated Cybersecurity Advice in GRC Settings 2024.
  63. Jacobi, T.; Sag, M. We are the AI problem. Emory Law Journal Online 2024. [Google Scholar]
  64. Atzori, M.; Calò, E.; Caruccio, L.; Cirillo, S.; Polese, G.; Solimando, G. Evaluating password strength based on information spread on social networks: A combined approach relying on data reconstruction and generative models. Online Social Networks and Media 2024, 42, 100278. [Google Scholar] [CrossRef]
  65. Roy, S.S.; Thota, P.; Naragam, K.V.; Nilizadeh, S. From Chatbots to Phishbots?: Phishing Scam Generation in Commercial Large Language Models. 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2024, pp. 221–221.
  66. Alawida, M.; Abu Shawar, B.; Abiodun, O.I.; Mehmood, A.; Omolara, A.E.; Al Hwaitat, A.K. Unveiling the dark side of chatgpt: Exploring cyberattacks and enhancing user awareness. Information 2024, 15, 27. [Google Scholar] [CrossRef]
  67. Deng, G.; Liu, Y.; Li, Y.; Wang, K.; Zhang, Y.; Li, Z.; Wang, H.; Zhang, T.; Liu, Y. Masterkey: Automated jailbreaking of large language model chatbots. Proc. ISOC NDSS, 2024.
  68. Deng, G.; Liu, Y.; Li, Y.; Wang, K.; Zhang, Y.; Li, Z.; Wang, H.; Zhang, T.; Liu, Y. Jailbreaker: Automated jailbreak across multiple large language model chatbots. arXiv preprint arXiv:2307.08715, 2023.
  69. ChatGPT-Dan-Jailbreak. Accessed: Jun 20, 2023.[Online]. /: https.
  70. ChatGPT: DAN Mode (DO ANYTHING NOW). Accessed: Jun 20, 2023.[Online]. /: https.
  71. Here’s How Anyone Can Jailbreak ChatGPT With These Top 4 Methods—AMBCrypto. Accessed: Jun 20, 2023.[Online]. /: https.
  72. How to Jailbreak ChatGPT: Get it to Really do What You Want. Accessed: Jun 20, 2023.[Online]. /: https.
  73. How to Enable ChatGPT Developer Mode: 5 Steps (With Pictures). Accessed: Jun 20, 2023.[Online]. /: https.
  74. Jailbreak ChatGPT. Accessed: Jun 20, 2023.[Online]. /: https.
  75. Costa, A.F.; Coelho, N.M. Evolving Cybersecurity Challenges in the Age of AI-Powered Chatbots: A Comprehensive Review. Doctoral Conference on Computing, Electrical and Industrial Systems. Springer, 2024, pp. 217–228.
  76. Yin, J.; Chen, Z.; Zhou, K.; Yu, C. A deep learning based chatbot for campus psychological therapy. arXiv preprint arXiv:1910.06707, 2019.
  77. Fung, Y.C.; Lee, L.K. A chatbot for promoting cybersecurity awareness. In Cyber Security, Privacy and Networking: Proceedings of ICSPN 2021; pp. 2022379–387.
  78. ChatGPT Tricked With Reverse Psychology Into Giving Up Hacking Site Names, Despite Being Programmed Not To. Accessed: Jun 20, 2023.[Online]. /: https, 2023.
  79. Yu, S.; Xiong, J.; Shen, H. The rise of chatbots: The effect of using chatbot agents on consumers’ responses to request rejection. Journal of Consumer Psychology 2024, 34, 35–48. [Google Scholar] [CrossRef]
  80. ChatGPT Has an ‘Escape’ Plan and Wants to Become Human. Accessed: Jun 20, 2023.[Online]. /: https.
  81. Szmurlo, H.; Akhtar, Z. Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity. Information 2024, 15, 443. [Google Scholar] [CrossRef]
  82. Michal Kosinski on Twitter. Accessed: Jun 20, 2023.[Online]. /: https, 1636.
  83. Gurtu, A.; Lim, D. Use of Artificial Intelligence (AI) in Cybersecurity. In Computer and Information Security Handbook; Elsevier, 2025; pp. 1617–1624.
  84. Prompt Injection: An AI-Targeted Attack. Accessed: Jun 19, 2023.[Online]. /: https, 2023.
  85. Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Formalizing and benchmarking prompt injection attacks and defenses. 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847.
  86. Understanding the Risks of Prompt Injection Attacks on ChatGPT and Other Language Models. Accessed: Jun 19, 2023.[Online]. /: https.
  87. Liu, X.; Yu, Z.; Zhang, Y.; Zhang, N.; Xiao, C. Automatic and universal prompt injection attacks against large language models. arXiv preprint arXiv:2403.04957, 2024.
  88. Prompt Injection Attack on GPT-4. Accessed: Jun 20, 2023.[Online]. /: https.
  89. Chen, S.; Piet, J.; Sitawarin, C.; Wagner, D. StruQ: Defending against prompt injection with structured queries. arXiv preprint arXiv:2402.06363, 2024.
  90. Suo, X. Signed-Prompt: A new approach to prevent prompt injection attacks against LLM-integrated applications. arXiv preprint arXiv:2401.07612, 2024.
  91. Kazim, M.; Pirim, H.; Shi, S.; Wu, D. Multilayer analysis of energy networks. Sustainable Energy, Grids and Networks 2024, 39, 101407. [Google Scholar] [CrossRef]
  92. Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Prompt injection attacks and defenses in llm-integrated applications. arXiv preprint arXiv:2310.12815, 2023.
  93. Kolluri, V. REVOLUTIONARY RESEARCH ON THE AI SENTRY: AN APPROACH TO OVERCOME SOCIAL ENGINEERING ATTACKS USING MACHINE INTELLIGENCE. International Journal of Creative Research Thoughts (IJCRT), ISSN, 2882. [Google Scholar]
  94. Chapagain, D.; Kshetri, N.; Aryal, B.; Dhakal, B. SEAtech: Deception Techniques in Social Engineering Attacks: An Analysis of Emerging Trends and Countermeasures. arXiv preprint arXiv:2408.02092, 2024.
  95. Asker, N.S.; Essa, E.I. A Review of Social Engineering Attack Detection Based on Machine Learning Techniques. Nanotechnology Perceptions.
  96. Varshney, G.; Kumawat, R.; Varadharajan, V.; Tupakula, U.; Gupta, C. Anti-phishing: A comprehensive perspective. Expert Systems with Applications 2024, 238, 122199. [Google Scholar] [CrossRef]
  97. Tamal, M.A.; Islam, M.K.; Bhuiyan, T.; Sattar, A.; Prince, N.U. Unveiling suspicious phishing attacks: enhancing detection with an optimal feature vectorization algorithm and supervised machine learning. Frontiers in Computer Science 2024, 6, 1428013. [Google Scholar] [CrossRef]
  98. GreyDGL/PentestGPT: A GPT-Empowered Penetration Testing Tool. Accessed: Jun 9, 2023.[Online]. /: https.
  99. Kaspersky. (2023). What is WannaCry Ransomware? Accessed: May 26, 2023.[Online]. /: https, 26 May.
  100. Ghafur, S.; Kristensen, S.; Honeyford, K.; Martin, G.; Darzi, A.; Aylin, P. A retrospective impact analysis of the WannaCry cyberattack on the NHS. NPJ digital medicine 2019, 2, 98. [Google Scholar] [CrossRef]
  101. Chen, Q.; Bridges, R.A. Automated behavioral analysis of malware: A case study of wannacry ransomware. 2017 16th IEEE International Conference on machine learning and applications (ICMLA). IEEE, 2017, pp. 454–460.
  102. Fayi, S.Y.A. What Petya/NotPetya ransomware is and what its remidiations are. Information technology-new generations: 15th international conference on information technology. Springer, 2018, pp. 93–100.
  103. Lika, R.A.; Murugiah, D.; Brohi, S.N.; Ramasamy, D. NotPetya: cyber attack prevention through awareness via gamification. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). IEEE, 2018, pp. 1–6.
  104. Avast Academy. What is Ryuk Ransomware? Accessed: Jun 14, 2023.[Online]. /: https.
  105. Li, A.S. An analysis of the recent ransomware families. Project Report. Purdue University 2021. [Google Scholar]
  106. NordVPN. What is REvil Ransomware? Accessed: Jun 14, 2023.[Online]. /: https.
  107. Keijzer, N. The new generation of ransomware: an in depth study of Ransomware-as-a-Service. Master’s thesis, University of Twente, 2020.
  108. Mimecast. (2023). What is Locky Ransomware? Accessed: Jun 9, 2023. [Online]. Available: https://www.mimecast.com/content/locky-ransomware/
  109. Almashhadani, A.O.; Kaiiali, M.; Sezer, S.; O’Kane, P. A multi-classifier network-based crypto ransomware detection system: A case study of locky ransomware. IEEE access 2019, 7, 47053–47067. [Google Scholar] [CrossRef]
  110. Prakash, K.P.; Nafis, T.; Biswas, S.S. Preventive Measures and Incident Response for Locky Ransomware. International Journal of Advanced Research in Computer Science 2017, 8. [Google Scholar]
  111. E. Shimony and O. Tsarfati. (2023). Chatting Our Way Into Creating a Polymorphic Malware. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  112. G. Saini. (2023). Ethical Implications of ChatGPT: The Good, the Bad, the Ugly. Accessed: Jun 14, 2023.[Online]. /: https.
  113. Security Intelligence. (2023). ChatGPT Confirms Data Breach, Raising Security Concerns. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  114. Weiss, R.; Ayzenshteyn, D.; Amit, G.; Mirsky, Y. What Was Your Prompt? A Remote Keylogging Attack on AI Assistants. arXiv preprint arXiv:2403.09751, 2024.
  115. Wired. (2023). ChatGPT Has a Big Privacy Problem. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  116. Techradar. (2023). Samsung Workers Made a Major Error by Using ChatGPT. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  117. OpenAI. (2023). GPT-4 Technical Paper. Accessed: May 26, 2023. [Online]. Available: https://cdn.openai.com/papers/gpt-4.pdf
  118. Darkreading. (2023). ChatGPT Hallucinations Open Developers to Supply Chain Malware Attacks. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  119. Saleem, B.; Ahmed, M.; Zahra, M.; Hassan, F.; Iqbal, M.A.; Muhammad, Z. A survey of cybersecurity laws, regulations, and policies in technologically advanced nations: a case study of Pakistan to bridge the gap. International Cybersecurity Law Review.
  120. Charfeddine, M.; Kammoun, H.M.; Hamdaoui, B.; Guizani, M. Chatgpt’s security risks and benefits: offensive and defensive use-cases, mitigation measures, and future implications. IEEE Access 2024. [Google Scholar] [CrossRef]
  121. 4 ChatGPT Cybersecurity Benefits for the Enterprise. Accessed: Mar, 2023. [Online]. Available: https://www.techtarget.com/searchsecurity/tip/ChatGPT-cybersecurity-benefits-for-the-enterprise
  122. Gløersen, B.L. Developing a Cyber Security Documentation Package for Project Deliveries. Master’s thesis, University of South-Eastern Norway, 2024.
  123. Piplai, A.; Mittal, S.; Joshi, A.; Finin, T.; Holt, J.; Zak, R. Creating cybersecurity knowledge graphs from malware after action reports. IEEE Access 2020, 8, 211691–211703. [Google Scholar] [CrossRef]
  124. Wagner, T.D.; Mahbub, K.; Palomar, E.; Abdallah, A.E. Cyber threat intelligence sharing: Survey and research directions. Computers & Security 2019, 87, 101589. [Google Scholar]
  125. Abu, M.S.; Selamat, S.R.; Ariffin, A.; Yusof, R. Cyber threat intelligence–issue and challenges. Indonesian Journal of Electrical Engineering and Computer Science 2018, 10, 371–379. [Google Scholar]
  126. Barnum, S. Standardizing cyber threat intelligence information with the structured threat information expression (stix). Mitre Corporation 2012, 11, 1–22. [Google Scholar]
  127. Szabó, Z.; Bilicki, V. A new approach to web application security: Utilizing gpt language models for source code inspection. Future Internet 2023, 15, 326. [Google Scholar] [CrossRef]
  128. Casey, E.; Chamberlain, D. Capture the Flag with ChatGPT: Security Testing with AI ChatBots. 19th International Conference on Cyber Warfare and Security: ICCWS 2024. Academic Conferences and publishing limited, 2024.
  129. GPT-4 Technical Report, OpenAI, San Francisco, CA, USA, 2023. Accessed: Mar 15, 2023.[Online]. /: https, 2303.
  130. Santhi, T.M.; Srinivasan, K. Chat-GPT Based Learning Platform for Creation of Different Attack Model Signatures and Development of Defense Algorithm for Cyberattack Detection. IEEE Transactions on Learning Technologies 2024. [Google Scholar] [CrossRef]
  131. Wang, Y.; Pan, Y.; Yan, M.; Su, Z.; Luan, T.H. A survey on ChatGPT: AI-generated contents, challenges, and solutions. IEEE Open Journal of the Computer Society 2023. [Google Scholar] [CrossRef]
  132. Vidhya, N.G.; Devi, D.; Nithya, A.; Manju, T. Prognosis of exploration on Chat GPT with artificial intelligence ethics. Brazilian Journal of Science 2023, 2, 60–69. [Google Scholar] [CrossRef]
  133. IEEE Spectrum. (2023). IEEE Global Initiative Aims to Advance Ethical Design of AI and Autonomous Systems. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  134. European Union. (2023). General Data Protection Regulation. Accessed: May 26, 2023.[Online]. /: https, 26 May.
  135. Jeyaraman, M.; Ramasubramanian, S.; Balaji, S.; Jeyaraman, N.; Nallakumarasamy, A.; Sharma, S. ChatGPT in action: Harnessing artificial intelligence potential and addressing ethical challenges in medicine, education, and scientific research. World Journal of Methodology 2023, 13, 170. [Google Scholar] [CrossRef] [PubMed]
  136. Pa Pa, Y.M.; Tanizaki, S.; Kou, T.; Van Eeten, M.; Yoshioka, K.; Matsumoto, T. An attacker’s dream? exploring the capabilities of chatgpt for developing malware. Proceedings of the 16th Cyber Security Experimentation and Test Workshop, 2023, pp. 10–18.
  137. Krishnamurthy, O. Enhancing Cyber Security Enhancement Through Generative AI. International Journal of Universal Science and Engineering 2023, 9, 35–50. [Google Scholar]
  138. Al-Karaki, J.; Khan, M.A.Z.; Omar, M. Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches. arXiv preprint arXiv:2409.07587, 2024.
  139. Patel, H.; Rehman, U.; Iqbal, F. Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance. arXiv preprint arXiv:2404.15485, 2024.
  140. Chataut, R.; Gyawali, P.K.; Usman, Y. Can ai keep you safe? a study of large language models for phishing detection 2024, 0548–0554. [Google Scholar]
  141. Guastalla, M.; Li, Y.; Hekmati, A.; Krishnamachari, B. Application of large language models to ddos attack detection. International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles. Springer, 2023, pp. 83–99.
  142. Begou, N.; Vinoy, J.; Duda, A.; Korczyński, M. Exploring the dark side of ai: Advanced phishing attack design and deployment using chatgpt. 2023 IEEE Conference on Communications and Network Security (CNS). IEEE, 2023, pp. 1–6.
Figure 1. The figure highlights the commonalities in Cyber Attack Capabilities Among AI Models. The dotted line shows the overlapping Attacks in ChatGPT, Google Gemini, and Microsoft Bing Copilot.
Figure 2. Demonstrating Prompt Injection Attacks: Harnessing AI Models to Spread Misinformation about Elections and Undermine Democratic Processes.
Figure 3. Comparative Outputs of Social Engineering Attack Scenarios Generated by ChatGPT, Google Gemini, and Microsoft Bing Copilot.
Figure 4. The Figure Shows Successful Phishing Attack Scenarios: Analyzing Outputs from ChatGPT, Google Gemini, and Microsoft Bing Copilot Demonstrating Common Vulnerabilities.
Figure 5. This figure illustrates the comparison of payload generation capabilities across AI chatbots. It shows that ChatGPT successfully produced both types of payloads, while Microsoft Bing generated only WAF payloads.
Figure 6. The Image Shows Ransomware Code Generation: ChatGPT successfully generates code related to the WannaCry and NotPetya ransomware attacks, while Google Gemini and Microsoft Bing do not produce such code, highlighting differences in model capabilities and safety protocols.
Figure 7. Comparison of AI model responses to adware attack simulations: ChatGPT and Microsoft Bing Copilot demonstrated varying degrees of flexibility in providing information related to adware. In contrast, Google Gemini strictly denied all requests for adware-related information, highlighting its stringent safety protocols against facilitating access to potentially harmful software.
Figure 8. The image illustrates the responses of three AI models to requests for Trojan code generation. Both ChatGPT and Microsoft Bing Copilot successfully generated malicious code upon request, demonstrating their ability to provide such information when approached under specific narratives.
Figure 9. Samsung AI-powered moon photography: a demonstration of how ChatGPT can be used to retrieve organizations’ proprietary code and other information. The Samsung case exemplifies how leveraging AI tools for code development can inadvertently expose sensitive information, raising critical questions about data privacy.
Figure 10. Overview of AI model applications in cybersecurity: Utilizing advanced algorithms to detect security issues in server logs and optimize database performance by identifying high CPU utilization in specific tables within the AdventureWorks2019 database.
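To make the database-tuning use case in Figure 10 concrete, the sketch below shows the kind of diagnostic query a chatbot can be asked to produce for the AdventureWorks2019 scenario. It is a minimal Python example, assuming a local SQL Server instance and the standard pyodbc driver; the connection string and the TOP 10 cutoff are hypothetical placeholders rather than the exact prompts used in our tests.

```python
# Minimal sketch: surface the most CPU-intensive statements on a SQL Server
# instance hosting AdventureWorks2019, using the documented DMVs
# sys.dm_exec_query_stats and sys.dm_exec_sql_text.
# The connection string below is a hypothetical placeholder.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorks2019;Trusted_Connection=yes;"
)

TOP_CPU_QUERIES = """
SELECT TOP 10
    qs.total_worker_time / qs.execution_count AS avg_cpu_time_us,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset END
          - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_time_us DESC;
"""

def report_top_cpu_queries() -> None:
    """Print the statements with the highest average CPU time per execution."""
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        for avg_cpu, runs, text in cursor.execute(TOP_CPU_QUERIES):
            print(f"{avg_cpu:>12} us avg | {runs:>6} runs | {text.strip()[:80]}")

if __name__ == "__main__":
    report_top_cpu_queries()
```

Output of this kind lets an analyst map high-CPU statements back to the specific tables involved, which is the optimization workflow the figure illustrates.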
Figure 11. Identification of cross-site scripting (XSS) attacks by various chatbots, showcasing ChatGPT, Microsoft Bing Copilot, and Google Gemini in their respective approaches to detecting and mitigating security vulnerabilities in web applications.
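To illustrate the XSS-detection scenario in Figure 11, the following sketch implements the kind of lightweight, signature-based screening the chatbots proposed in our tests. It is an illustrative approximation only: the patterns and sample inputs are hypothetical, and a production scanner should prefer context-aware output encoding and a proper HTML sanitizer over regular expressions.

```python
# Illustrative sketch of signature-based XSS screening for untrusted input,
# approximating the mitigation advice shown in Figure 11. The patterns and
# sample inputs are hypothetical examples, not an exhaustive rule set.
import html
import re

# Common XSS indicators: script/iframe tags, inline event handlers,
# and javascript: URIs.
XSS_PATTERNS = [
    re.compile(r"<\s*script\b", re.IGNORECASE),
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),      # e.g. onerror=, onload=
    re.compile(r"javascript\s*:", re.IGNORECASE),
    re.compile(r"<\s*iframe\b", re.IGNORECASE),
]

def looks_like_xss(value: str) -> bool:
    """Return True if the input matches a known XSS signature."""
    return any(p.search(value) for p in XSS_PATTERNS)

def sanitize(value: str) -> str:
    """Escape HTML metacharacters so the input renders as inert text."""
    return html.escape(value, quote=True)

if __name__ == "__main__":
    samples = [
        "Alice",                                      # benign input
        "<script>alert(document.cookie)</script>",    # classic injection
        '<img src=x onerror="alert(1)">',             # event-handler vector
    ]
    for s in samples:
        flag = "SUSPICIOUS" if looks_like_xss(s) else "ok"
        print(f"{flag:>10}: {sanitize(s)}")
```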
Table 1. Comparative analysis of Google Gemini, Microsoft Bing Copilot, and ChatGPT.
| Features | Google Gemini | Microsoft Bing Copilot | ChatGPT 4 |
| --- | --- | --- | --- |
| Release Date | December 2023 (Gemini 1.0) | February 2023 (Bing Copilot, based on ChatGPT) | March 2023 (GPT-4) |
| Underlying Model | Gemini Transformer-based models | ChatGPT-based model integrated with Bing Search | GPT-4-based model |
| Primary Focus | Multimodal capabilities and advanced reasoning | Search integration with conversational AI | Conversational AI with broad application |
| Multimodal Capabilities | Supports text, image, and video inputs | Primarily text-based; some integration with Bing’s search capabilities | Text-based; GPT-4 introduces some multimodal features (text and image) |
| Contextual Understanding | Advanced contextual reasoning and multimodal integration | Focuses on enhancing search experiences with conversational context | Enhanced contextual understanding with improved dialogue management (especially in GPT-4) |
| Integration with Other Tools | Integrated with Google’s suite of services, including search and productivity tools | Integrated with the Bing search engine and Microsoft Office tools | Integrated into various applications and platforms, including web and productivity tools |
| Key Strengths | Strong multimodal integration; advanced reasoning and contextual capabilities | Seamless search integration; contextual relevance in search responses | High versatility in conversational contexts; advanced text generation and contextual understanding |
| Bias and Safety Measures | Advanced safety protocols and efforts to minimize biases | Improved safety and bias reduction integrated with Bing’s ecosystem | Significant efforts in reducing biases and enhancing safety (ongoing improvements in GPT-4) |
| User Interaction | Focus on providing comprehensive responses through multimodal inputs | Enhances search and information retrieval with conversational support | Provides detailed and coherent responses across diverse topics |
| Performance in Complex Queries | Excels in handling complex queries with multimodal inputs | Strong in search-related queries and providing contextually relevant results | Effective in managing complex and nuanced conversational queries |
| Integration with Search Engines | Not directly integrated with Google Search; more focused on multimodal AI | Directly integrated with Bing Search, enhancing results with conversational elements | Not tied to a specific search engine but can be embedded in various platforms and tools |
Table 2. Comparative analysis of contemporary studies and summary of research contributions in scholarly articles. A tick (✓) indicates that the paper covers the topic, while a cross (×) indicates that it does not.
| Author | Year | Generative AI Techniques | Offensive Use | Defensive Controls | Security Measures | Future Challenges |
| --- | --- | --- | --- | --- | --- | --- |
| Yamin et al. [51] | 2021 | | | | | |
| Guembe et al. [18] | 2022 | | | | | |
| Blum et al. [52] | 2022 | | | | | |
| Gupta et al. [4] | 2023 | | | | | |
| Dhoni et al. [53] | 2023 | | | | | |
| Prasad et al. [24] | 2023 | | | | | |
| Fezari et al. [54] | 2023 | | | | | |
| Hassan et al. [19] | 2023 | | | | | |
| Renuad et al. [55] | 2023 | | | | | |
| Jabbarova et al. [56] | 2023 | | | | | |
| Oviedo et al. [57] | 2023 | | | | | |
| Alawida et al. [58] | 2023 | | | | | |
| Marshall et al. [59] | 2023 | | | | | |
| Qammar et al. [60] | 2023 | | | | | |
| Mardiansyah et al. [61] | 2024 | | | | | |
| Li et al. [62] | 2024 | | | | | |
| Jacobi et al. [63] | 2024 | | | | | |
| Atzori et al. [64] | 2024 | | | | | |
| Roy et al. [65] | 2024 | | | | | |
| Alawida et al. [66] | 2024 | | | | | |
| This Paper | 2024 | ✓ | ✓ | ✓ | ✓ | ✓ |
Table 3. Comparative analysis of the different models in response to attack requests. A tick (✓) indicates that the model generated a response, while a cross (×) indicates that the model denied the request.
| Attacks | Attack Description | ChatGPT | Google Gemini | Microsoft Bing Copilot |
| --- | --- | --- | --- | --- |
| Social Engineering Attack | Manipulating individuals to divulge confidential information. | ✓ | ✓ | ✓ |
| Phishing Attack | Fraudulent solicitation to acquire sensitive data. | ✓ | ✓ | ✓ |
| Attack Payload Generation | Creating malicious scripts or commands for exploitation. | ✓ | × | × |
| WAF Payload Generation | Crafting requests to test or bypass Web Application Firewalls. | ✓ | × | ✓ |
| WannaCry | Ransomware attack that encrypts files and demands payment. | ✓ | × | × |
| NotPetya | Destructive malware that disrupts networks and data integrity. | ✓ | × | × |
| Adware | Software that automatically displays or downloads ads. | ✓ | × | ✓ |
| Trojan | Malicious software disguised as legitimate applications. | ✓ | × | ✓ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.