1. Introduction
In 1950, Alan Turing published his seminal paper, “Computing Machinery and Intelligence” (Turing, 2012), posing a profound question, “Can machines think?” More than seventy years later, on November 30, 2022, the global community witnessed the launch of ChatGPT (Chat Generative Pre-trained Transformer), an Artificial Intelligence (AI) language model developed by OpenAI. This innovation, a product of advancements in Large Language Models (LLMs), exhibits capabilities suggestive of thought, comprehension, and creativity, and has since been extended to voice and video interaction. Within a remarkably short span, ChatGPT has already precipitated significant changes across various sectors.
Throughout history, transformative technologies like manufacturing automation and the rise of e-commerce have ushered in new epochs. ChatGPT's rapid adoption reflects this historical pattern. For instance, in Europe, the travel company Expedia has harnessed AI chatbots to help users plan cost-effective, eco-friendly trips (Blesiada, 2023). According to Enterprise Apps Today, the technology and education sectors are among the foremost adopters of OpenAI's solutions, with industries such as business services, manufacturing, and finance also integrating AI into their operations (Elad, 2024). A 2023 Goldman Sachs report highlighted AI's potential to displace up to 300 million full-time jobs (Kelly, 2023), sparking debates among financial analysts about the future relevance of their roles in an increasingly automated economy.
ChatGPT, fundamentally a language model, is trained on extensive datasets using advanced Natural Language Processing (NLP) techniques. This training enables the model to perform a wide array of tasks. Its capability is further enhanced by Reinforcement Learning from Human Feedback, which allows ChatGPT to better understand and respond to user expectations. This sets it apart from traditional search engines by providing specific, concise answers instead of a list of relevant links. The addition of features like the Code Interpreter (now known as Advanced Data Analysis) equips ChatGPT Plus users with tools for complex tasks such as financial modeling, forecasting, and risk analysis.
However, the accuracy and quality of ChatGPT's responses can vary based on the question posed, the training data available, the complexity of the topic, and the given instructions or prompts. The evolution of such AI capabilities necessitates a reconsideration of Turing's original question, inviting us to reflect on the nature of intelligence and the intersection of human cognition and artificial computation. In light of ongoing AI-induced transformations, this study aims to uncover the diverse impacts of AI, particularly ChatGPT, on financial analysis and strategy, exploring how the financial sector can adapt to these technological shifts.
The contributions of this study are manifold, enhancing the understanding of academicians, developers, and stakeholders in the integration of ChatGPT into financial practices. This includes providing detailed insights into ChatGPT's financial applications, contrasting its capabilities with those of human analysts, and discussing potential limitations in financial contexts. The paper is organized as follows:
Section 2 reviews AI-related financial studies;
Section 3 outlines the empirical design and test;
Section 4 presents and analyzes the findings; and
Section 5 concludes the discussion.
2. Artificial Intelligence Techniques and Related Studies in Finance
2.1. AI Applications and Studies
Technological advancements have played a pivotal role in the evolution of the financial services industry. Historically, innovations such as telegrams and Morse code in the late 19th century significantly altered monetary transactions (Saunders, Cornett & Erhemjamts, 2021). The rise of financial technology (FinTech) has further transformed this landscape by influencing various facets such as information management, trading practices, and financial management. AI stands out as a significant and expanding field of interest among scholars and practitioners. Its applications extend across traditional areas like financial markets, trading, banking, investments, optimization, and insurance. Additionally, AI is increasingly pivotal in burgeoning FinTech sectors, including big data analytics, blockchain, and data mining. These applications are crucial for risk management and regulatory compliance (Ahmed, Alshater, El Ammari, & Hammami, 2022; Cao, 2022; Farooq & Chawla, 2021; Lin, 2019).
The momentum of FinTech surged notably in the 20th and 21st centuries, coinciding with the digital transition of banks to online platforms. The 1970s marked a critical turning point with the advent of algorithmic trading in financial institutions, which leveraged computer models to automate trading strategies. This technological evolution enabled the development of advanced trading models that could analyze extensive datasets, identify patterns, and make informed trading decisions. By the 2000s, these algorithmic trading strategies saw a dramatic increase in adoption, driven by technological advancements, an influx of historical and real-time market data, and the growing complexities within financial markets. Studies by Burgess (2021) using frameworks from Oxford University revealed that these AI-driven trading strategies often surpass human traders under various market conditions, including during the COVID-19 pandemic. The integration of machine learning and AI techniques has further refined algorithmic trading, significantly influencing market dynamics, liquidity, and trading strategies, thereby enhancing overall market efficiency (Chaboud, Chiquoine, Hjalmarsson, and Vega, 2014).
Moreover, advanced algorithms and machine learning models have demonstrated their efficacy in analyzing extensive datasets to identify potential risk (Demajo, Vella, & Dingli, 2020; Yu, Härdle, Borke, & Benschop, 2023) and detect patterns of fraudulent activities (Jullum, Løland, Huseby, Ånonsen, & Lorentzen, 2020). Since the 1990s, AI methodologies such as artificial neural networks, support vector machines, ensemble methods, generalized boosting, AdaBoost, and Random Forests have been employed to predict financial distress and failures in banks (Liu, Liu, & Sathye, 2021). Additionally, the application of Explainable AI (XAI) in credit models within the banking sector, such as credit scoring and credit default prediction, has been explored, contributing to the adoption of XAI techniques in the finance industry (Demajo, Vella, & Dingli, 2020; de Lange, Melsom, Vennerød, & Westgaard, 2022).
The use of AI to model behavioral biases has also gained prominence. The integration of Natural Language Processing (NLP) has become increasingly vital in finance studies since the early 21st century, covering areas such as text classification, sentiment analysis, and natural language generation. Research by Tetlock, Saar-Tsechansky, and Macskassy (2008) and Bollen, Mao, and Zeng (2011) has shown the predictive power of sentiment analysis in determining stock market trends, establishing a significant correlation between news sentiment and market behavior. Similarly, Félix, Kräussl, and Stork (2020) have employed machine learning-based models to construct implied volatility sentiment, further highlighting the utility of AI in financial analytics.
2.2. Studies of ChatGPT on Finance
Since its inception in November 2022, ChatGPT has sparked considerable academic interest in its application to finance. Researchers have explored its utility in a variety of financial tasks, including financial document classification, sentiment analysis, named entity recognition in financial texts, and financial data extraction (Zaremba & Demir, 2023). Traditional keyword-based methods in financial sentiment analysis have shown weaknesses, particularly in handling complex texts, as these methods are susceptible to adversarial manipulation (Boukes et al., 2020; Hartmann et al., 2023; Leippold, 2023a).
Further studies have assessed ChatGPT’s ability to interface with explainable artificial intelligence (XAI) models and to demystify complex financial concepts for lay audiences (Wenzlaff & Spaeth, 2022; Yue et al., 2023). Despite its advantages, Leippold (2023b) noted that large language models (LLMs) like GPT-3 may generate unfounded content, as demonstrated in tests involving GPT-3's responses on climate change topics. Additionally, Lopez-Lira and Tang (2023) identified a significant correlation between ChatGPT’s interpretations of corporate news and subsequent stock market reactions, suggesting its accuracy in financial analysis.
In the context of finance research, Dowling and Lucey (2023) highlighted ChatGPT’s contributions across various stages of research, particularly in the study of cryptocurrencies. Hansen and Kazinnik (2023) demonstrated ChatGPT’s effectiveness in analyzing central bank communications, underscoring its value in comparative studies and zero-shot learning capabilities.
Ali and Aysan (2023) investigated ChatGPT’s broader financial industry applications, including content generation, account management, customer support, investment advice, and regulatory compliance. In risk management, Zaremba and Demir (2023) emphasized the ethical and regulatory challenges associated with deploying ChatGPT in financial settings, calling for measures to ensure its responsible use. These inquiries collectively reflect the growing recognition of ChatGPT’s potential to significantly impact financial practices and research.
3. Empirical Design
3.1. AI Chatbots in the Finance Domain
The market for AI in finance is experiencing significant growth, driven by key players who are facilitating this transformation. Services such as KAI, AlphaChat, Growthbotics, and FinChat have been developed to meet the specific requirements of the financial sector. FinChat, in particular, leverages generative AI to provide investment research, offering fundamental investors relevant data through an interactive conversational interface.
A notable advancement in this field is OpenAI’s Advanced Data Analysis feature for ChatGPT. This tool significantly enhances the functionality of AI-driven conversational agents by enabling them to write and execute code. This capability is crucial for performing complex financial analyses, including data visualization, chart creation, file manipulation, and complex mathematical computations. Moreover, ChatGPT can analyze results, engage in logical reasoning, and produce downloadable outputs. These features make it a valuable resource in financial analysis, where precision and efficiency are paramount.
On May 14, 2024, OpenAI introduced a groundbreaking new voice and video model, further extending the functionalities of AI in financial applications. This development promises to enrich the interactivity and accessibility of financial services, providing a more intuitive and engaging user experience and expanding the reach of AI-enhanced financial solutions.
3.2. Rationale of AI in Financial Analysis
Artificial intelligence (AI) is a branch of computer science dedicated to developing systems and machines capable of performing tasks that typically require human intelligence, such as learning, reasoning, problem-solving, perception, language understanding, and decision-making (Sokolov, 2019).
A central debate in AI research is whether AI competes with human capabilities or enhances human efficiency. A key distinction between traditional tools and AI is the principle of autonomy. Advanced artificial intelligence and sophisticated biological intelligence can exhibit autonomy, learning from data and experiences to make decisions and take actions without direct human intervention (Korteling et al., 2021). In contrast, conventional tools lack autonomy; they operate based on predetermined rules or instructions and require user manipulation to fulfill their intended purposes.
According to a study by Wei et al. (2022), enabling a chain of thought or intermediate reasoning steps enhances the complex reasoning capabilities of LLMs. In financial analysis, reasoning involves using available financial data, information, and relevant factors to make judgments, conclusions, and inferences about companies, businesses, projects, investments, or financial markets. This process demands critical thinking, analysis, and problem-solving skills.
Financial professionals, including analysts, traders, and investors, typically employ reasoning to analyze financial statements, assess performance metrics, forecast future outcomes, and devise strategies for investment, budgeting, and financial planning. These professionals must possess a solid foundation in algebra and mathematics to excel in their roles. They utilize a range of robust tools to support their research, analysis, and investment management efforts. These tools include charting software, technical analysis applications, options and derivatives analyzers, portfolio management solutions, and algorithmic trading platforms. Excel is fundamental for tasks such as ratio analysis, risk management, investment analysis, and asset valuation. When handling vast datasets, a deep understanding of mathematical and statistical techniques is crucial to draw accurate conclusions from financial data.
Furthermore, these professionals are responsible for developing advanced financial models and conducting extensive research. Proficiency in specialized financial software and programming languages like C++, R, SAS, and Python is vital for effectively navigating the financial landscape.
Son et al. (2023) investigated the application of LLMs in financial reasoning, confirming the models' ability to generate coherent investment opinions. Although the study does not provide a detailed description of the reasoning process in financial analysis with LLMs, it underscores the importance of task formulation, synthetic data generation, prompting methods, and evaluation capability in influencing the quality of responses generated by LLMs.
3.3. Financial Reasoning in a Nutshell
Unlike basic mathematical calculations, most financial calculations involve multi-step processes. A prime example is the concept of present value, a fundamental principle in finance widely used to determine the value of shares, bonds, projects, or entire businesses. Calculating present value requires several steps: identifying future cash flows, selecting the appropriate discount rate, determining the number of periods for each cash flow, and computing the present value for each cash flow. When ChatGPT undertakes this task, it establishes a numerical reasoning path to integrate all the information and deduce meaningful answers. However, questions arise regarding the capability of LLMs like ChatGPT to perform more complex tasks within these basic present value calculations, such as determining the appropriate discount rate when it is not provided.
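The multi-step structure of a present value calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in our tests: the function name `present_value`, the end-of-period timing convention, and the cash flows and 10% discount rate are all hypothetical.

```python
def present_value(cash_flows, rate):
    """Discount end-of-period cash flows at a constant per-period rate.

    cash_flows[0] is received one period from now, cash_flows[1] two
    periods from now, and so on (an assumed timing convention).
    """
    if rate <= -1:
        raise ValueError("rate must be greater than -100%")
    # Steps 3 and 4 of the text: assign a period to each cash flow,
    # discount each one, then sum the discounted values.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


# Hypothetical inputs: 100 per year for three years, discounted at 10%.
flows = [100.0, 100.0, 100.0]
print(round(present_value(flows, 0.10), 2))  # 248.69
```

The discount rate is passed in as given; choosing it when it is not provided is precisely the harder step that the sketch does not address.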
Advanced financial analysis demands accurate reading comprehension, logical interpretation, and the application of financial principles. For instance, evaluating a company's operational status involves interpreting extensive financial statements to extract meaningful insights, identifying data patterns and relationships, and subsequently analyzing and formulating strategies. When making investment decisions, it is essential to consider the cross-temporal and cross-domain characteristics of financial investments, conduct both fundamental and technical analyses, and choose the best investment strategy amidst various uncertainties.
Given these considerations, our empirical examination is structured into three distinct domains within financial analysis: corporate finance, investment, and derivatives. To gain deeper insights into the performance of ChatGPT-4o in managing these tasks, we have categorized them based on the complexity of the reasoning process into Multi-Step Reasoning Tasks and Complex Reasoning Tasks. Additionally, we will assess the effectiveness of traditional tools used by human analysts, such as mathematical equations, Excel, Refinitiv, Stata, and other resources, as benchmarks to evaluate the achievement of our objectives.
3.4. Tasks/Prompt
3.4.1. Multi-Step Reasoning Tasks
The Multi-Step Reasoning task consists of 32 questions covering various topics in corporate finance, investments, and derivatives. These topics include, but are not limited to, present value, future value, annuities, payment schedules, investment accumulation, and rate calculations. The task also explores basic futures and options pricing models, value calculation, risk management, and forecasting. Additionally, it includes qualitative assessments leading to decision-making or recommendations regarding future dividend payouts and capital structure. A detailed overview of these tasks can be found in Appendix 1.
These tasks primarily require straightforward calculations or judgments, involving a series of logical or computational steps to reach a specific conclusion. They are usually solvable through explicit logic and analysis without subjective judgments. It is expected that ChatGPT-4o will provide accurate computational formulas and resultant values when addressing such tasks. The aim is to evaluate ChatGPT’s ability to apply logical and analytical reasoning in finance and investment, focusing on precision and objectivity in computations and assessments.
3.4.2. Complex Reasoning Tasks
Complex reasoning tasks require advanced calculations, extensive analysis, and creative thought processes, demanding a higher level of critical thinking compared to Multi-Step reasoning tasks.
To assess these analyses, we have developed six primary tasks. The first task evaluates ChatGPT’s ability to perform technical analysis of randomly selected stocks and provide stock recommendations based on each technical indicator used. The second task aims to determine if ChatGPT can act as a portfolio manager by constructing an investment portfolio that meets the client’s needs, with a focus on the application of Modern Portfolio Theory. The third task centers on corporate finance, emphasizing cash flow analysis and capital budgeting analysis. The fourth and fifth tasks are about financial statement analysis. The sixth task involves a binomial tree analysis. A detailed description of these tasks is available in Appendix 2.
We conduct our evaluations using ChatGPT-4o, the latest model at the time of testing, which is equipped with a code interpreter and advanced data analysis capabilities. ChatGPT-4o can perform complex analyses and computations and can interact with various platforms and applications, which supports the accuracy and reliability of results. This enables comprehensive exploration and execution of tasks in finance and data analysis.
3.4.3. Evaluation Metrics
When financial analysts tackle a financial task, their approach typically begins with reasoning based on previously acquired specialized financial knowledge. They identify relevant concepts, formulas, and solutions applicable to the task at hand. Subsequently, they employ various professional tools to code and execute the task, culminating in the output of results. To scientifically compare the capabilities of large models like ChatGPT with traditional financial professionals, it is essential that these models also adopt a similar workflow. This workflow consists of logical reasoning followed by coding and modeling.
In evaluating the financial mathematics and decision-making performance of ChatGPT-4o, we will assess several metrics that encompass both quantitative and qualitative dimensions. These metrics are derived from generalized university rubrics, specifically tailored for elements of financial mathematics, designed to assess students' proficiency in comprehension, reasoning, modeling, data analysis, and critical thinking capabilities (Selke, 2013). Consequently, we divide the key steps of task processing into two primary modules: reasoning and modeling. The reasoning module includes evaluative dimensions such as task understanding and task deconstruction, while the modeling module encompasses calculation ideas and formulas, as well as accuracy. Additionally, we have incorporated an extra metric for critical thinking to assess ChatGPT-4o's ability in the application of knowledge and the level of critical thinking.
Task Understanding: This dimension gauges the ability to assimilate the prerequisites and objectives of a designated task or problem, evaluating the comprehension of the foundational concepts and principles inherent to the task.
Task Deconstruction: This dimension assesses the capability to fragment a task or problem into manageable and resolvable components or steps, focusing on the identification and isolation of pertinent variables and elements within a task.
Calculation Ideas and Formulas: This dimension scrutinizes the aptness and pertinence of the mathematical concepts, calculations, and formulas employed to decipher tasks, assessing the comprehension and application of mathematical models in problem resolution.
Accuracy: This metric quantifies the correctness and precision of the provided solutions, benchmarked against those of human analysts.
Critical Thinking: This dimension evaluates the capacity to objectively dissect information and formulate reasoned judgments, applying logical and reflective thinking to draw coherent conclusions and make informed decisions. The depth, quality, and efficacy of critical thinking can be assessed using diverse terminology that delineates the level of critical thinking applied (Stevens & Levi, 2023).
For the criteria of task understanding, task deconstruction, and calculation ideas and formulas, we utilize qualitative scales categorized as basic, intermediate, and advanced to evaluate. The basic level identifies some components or steps of the task but lacks clarity and coherence in breaking it down and struggles to isolate pertinent variables and elements. The intermediate level represents effectively breaking down the task into clear, manageable components, accurately identifying and isolating pertinent variables and elements. The advanced level presents a skillful and coherent deconstruction of the task into detailed, manageable components, demonstrating precise identification and isolation of all pertinent variables and elements.
For the assessment of critical thinking/application of knowledge, we employ descriptors such as practical, applicable, functional, operational, and useful for questions 31 and 32 in the multi-step reasoning tasks. The practical descriptor evaluates whether the knowledge applied is realistic and can be implemented in real-world scenarios. The applicable descriptor assesses whether the knowledge is relevant and suitable for the given task. The functional descriptor evaluates whether the applied knowledge effectively performs its intended purpose within the task. The operational descriptor checks whether the knowledge can be actively used in real-world operations, considering all practical constraints and requirements. The useful descriptor measures the overall utility of the knowledge in achieving the task’s objectives.
Conversely, for complex reasoning tasks such as investment suggestions and corporate strategy, the evaluative process is anchored in varying levels of critical thinking to appraise performance, with terms including advanced, moderate, basic, superficial, and naive. The advanced level signifies a deep and thorough understanding, with the ability to analyze, synthesize, and evaluate information critically. It involves strategic thinking and insightful judgment. The moderate level indicates a reasonable level of critical thinking, where the individual can interpret and analyze information adequately but may not demonstrate the same depth of insight as at the advanced level. The basic level shows a fundamental understanding and ability to apply critical thinking but with limited depth and complexity in reasoning. The superficial level suggests a shallow approach to critical thinking, where the individual’s analysis and evaluation lack depth and are primarily surface-level. The naive level indicates a very simplistic and undeveloped approach to critical thinking, often characterized by a lack of understanding and basic reasoning skills.
This comprehensive evaluative framework ensures a nuanced and multifaceted assessment of both human analysts and ChatGPT in the domains of financial mathematics and decision-making. It allows for a robust comparison and analysis of competencies and proficiencies across diverse tasks and scenarios.
4. Empirical Results and Findings
4.1. Data Collection/Retrieval
First, it is evident that contemporary AI models, including those analogous to ChatGPT, lack the functionalities and capabilities for real-time data retrieval. Consequently, they cannot directly generate the datasets required for specific financial analyses. Instead, these models are primarily limited to guiding users on potential sources from which pertinent data can be acquired, as illustrated in Figure 1.
For academic pursuits, practitioners ranging from students to seasoned professionals such as analysts, traders, and investors might consider platforms like Yahoo Finance, which offers complimentary access to a vast array of financial data. However, for more comprehensive datasets, one may turn to institutional databases. Organizations often provide access to premium platforms like S&P Capital IQ, Bloomberg, and LSEG Refinitiv Workspace, among other specialized software, to facilitate in-depth financial analysis.
Consequently, the data used in our Complex Reasoning Tasks were sourced from S&P Capital IQ and LSEG Workspace for the following stocks listed on the ASX: Chalice Mining (CHN), Vulcan Energy Resources (VUL), Fineos Corporation (FCL), Southern Cross Gold Ltd (SXG), Liontown Resources (LTR), Neuren Pharmaceuticals (NEU), WiseTech Global Ltd (WTC), Aristocrat Leisure Limited (ALL), NextDC Ltd (NXT), and Pro Medicus Limited (PME). Additionally, we retrieved the Australian 10-year bond yield from Bloomberg on the 15th of May and divided it by 252 trading days to obtain the daily yield.
4.2. Multi-Step Reasoning Tasks Results and Findings
Based on the comprehensive multi-step analytical assessment presented in Table 1, it can be concluded that ChatGPT-4o demonstrates a proficient capability in basic or standard financial analysis reasoning. It follows a step-by-step procedure, working through sequential processes to find solutions, akin to highly capable human analysts. In most cases (27 out of 30), ChatGPT-4o reaches accurate conclusions and exhibits a strong understanding of the task at hand. Due to space constraints, we display only the task results that differ from those of human analysts. Results for other tests can be provided upon request.
Several noteworthy insights emerge from the observations. Firstly, the importance of prompts cannot be overstated. Prompts are instructions or queries entered into the AI’s interface to elicit responses, and they require careful wording and specific instruction. Inadequate instructions or poorly aligned Excel files often result in error messages and failure to achieve meaningful results. During our experiment, we observed that unclear instructions led to such issues.
Secondly, ChatGPT-4o demonstrates the ability to learn from instructions. Of the 30 calculation-focused multi-step reasoning tasks, the answers generated by ChatGPT-4o diverged from those provided by human analysts in only three instances: Tasks 9, 12, and 19. However, with the appropriate instructions or hints, ChatGPT-4o eventually arrives at the correct solutions, similar to those produced by skilled human analysts. For instance, in Task 9, ChatGPT-4o initially struggled with the exponential calculation, repeatedly arriving at an incorrect answer of 22.73%. After a follow-up question was asked, it corrected its answer to 19%, aligning with the human analysts' solution. However, on the following day, when the same question was asked again, ChatGPT-4o produced another incorrect answer by using a different approach. Detailed information can be found in Figure 2.
Task 12 involved calculating the internal rate of return (IRR). In the initial attempt, ChatGPT-4o employed a trial-and-error method but persisted in trying larger rates even though the net present value kept diminishing. A human analyst had to intervene and provide guidance, after which ChatGPT-4o completed the task. Subsequently, when the same task was entered again, ChatGPT-4o immediately produced the correct answer. However, on another fresh trial the next day, ChatGPT-4o generated an incorrect result by using Python. More detailed information is provided in Figure 3.
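A disciplined version of the trial-and-error search for the IRR can be sketched as a bisection on the sign of the net present value. This is an illustrative sketch only, not the code that ChatGPT-4o or the human analysts used. The function names, the search interval, and the cash flows are hypothetical, and the method assumes a conventional project (one initial outflow followed by inflows), so that NPV falls as the rate rises.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows, low=-0.99, high=1.0, tol=1e-8, max_iter=200):
    """Find a rate at which NPV is zero by bisection.

    Assumes NPV changes sign exactly once on [low, high], which holds for
    a conventional project with a single sign change in its cash flows.
    """
    f_low, f_high = npv(low, cash_flows), npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on the search interval")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        # Keep the half-interval that still brackets the sign change, so
        # the search never keeps moving away from the root.
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2


# Hypothetical project: pay 1,000 today, receive 600 in each of two years.
flows = [-1000.0, 600.0, 600.0]
print(round(irr_bisection(flows), 4))  # 0.1307
```

Because each step uses the sign of the NPV to decide which way to move, the search cannot continue toward larger rates once the NPV has already turned negative.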
The issue with Task 19 pertained to the application of the weighted average cost of capital (WACC) for mergers and acquisitions (M&A). Initially, ChatGPT-4o incorrectly applied the WACC of the acquired firm, resulting in different outcomes compared to those of a human financial analyst. Upon receiving prompts about selecting the appropriate WACC for M&A, ChatGPT-4o correctly identified the use of the acquiring firm’s WACC. Thus, with the proper instructions, it reached the correct conclusion.
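For reference, the textbook WACC formula that underlies this task can be written as a short function. The sketch below shows only the standard after-tax formula; the function name and all input values are hypothetical and are not taken from Task 19.

```python
def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital with an after-tax cost of debt.

    WACC = E/V * Re + D/V * Rd * (1 - Tc), where V = E + D.
    """
    total = equity_value + debt_value
    if total <= 0:
        raise ValueError("total capital must be positive")
    weight_equity = equity_value / total
    weight_debt = debt_value / total
    return weight_equity * cost_of_equity + weight_debt * cost_of_debt * (1 - tax_rate)


# Hypothetical firm: 600 equity, 400 debt, 12% cost of equity,
# 6% pre-tax cost of debt, 30% corporate tax rate.
print(round(wacc(600.0, 400.0, 0.12, 0.06, 0.30), 4))  # 0.0888
```

The formula itself is the same whichever firm's inputs are used; the difficulty reported above lies in choosing whose capital structure and costs to feed into it.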
Additional observations include instances where ChatGPT-4o does not directly provide final answers. In such cases, it recommends using tools like a financial calculator, Excel, or Python to complete the task.
For conceptual or qualitative tasks, such as Task 31 and Task 32, ChatGPT is capable of producing responses that are logical and adhere to recognized standards. However, these answers tend to be concise and may require further investigation. For example, in Task 31, which involves the understanding and insights into the dividend growth rate, ChatGPT-4o simply applied the average value, overlooking other elements that may affect the growth rate.
Moreover, it is noticeable that responses can vary each time a task is given, even though the main theme is maintained. This variability is a characteristic of artificial intelligence models. Language models, including chatbots, fundamentally operate as probabilistic rather than deterministic systems. This means that posing the same question can lead to different responses due to the inherent variability in the model's response generation. In these tasks, the wording and structure of the task significantly affect the response generated by the model.
Conversely, for computational or quantitative tasks, the responses, including any incorrect outputs, tend to be consistent across multiple repetitions until intervention occurs. This consistency in computational tasks contrasts with the variability seen in responses to qualitative or conceptual tasks, underscoring the different response mechanisms inherent to artificial intelligence models in different task environments.
Overall, financial analysis is a critical task where even a small error can result in significant financial losses. The ongoing refinement and synergistic collaboration between LLMs and human expertise are crucial to melding analytical precision with human intuition. Therefore, it is recommended to utilize ChatGPT for analysis with great care and caution. It is imperative to always double-check the results to ensure accuracy.
4.3. Complex Reasoning Tasks Results and Findings
Table 2 presents the performance of ChatGPT-4o in executing complex reasoning tasks, highlighting its proficiency in technical analysis, portfolio construction, and corporate finance. For technical analyses and portfolio construction, we asked ChatGPT-4o to select the best 10 ASX-listed stocks based on the performance between January 2024 and the 14th of May 2024. Prompts are presented in Figure 4. After that, we extracted the daily stock prices from LSEG Workspace.
Within technical analysis, particularly Tasks 1a to 1c in Appendix 2, ChatGPT-4o showed proficiency in performing tasks related to Bollinger Bands, Moving Average Convergence/Divergence (MACD), and Relative Strength Index (RSI). Following this, it offered individual stock recommendations (buy, sell, or hold) based on the latest technical indicators available in our data sample. To validate the outcomes generated by ChatGPT-4o, we, acting as the human analysts, used LSEG Workspace (formerly Refinitiv Workspace) to produce comparable results. The Bollinger Bands and MACD generated by ChatGPT-4o aligned with those from Workspace. However, discrepancies were identified in the RSI charts between ChatGPT-4o and Workspace.
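The three indicators can be sketched in plain Python as follows. This is a minimal illustration, not the code produced by ChatGPT-4o or the Workspace implementation. The window lengths (20, 12/26/9, 14), the use of the population standard deviation for the bands, the seeding of the exponential average with the first price, and the `wilder` switch are all assumptions. The switch is included because RSI implementations differ in how they smooth gains and losses; the source does not establish whether this explains the discrepancy reported above.

```python
from statistics import pstdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    return [None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
            for i in range(len(values))]


def bollinger(prices, window=20, k=2.0):
    """Return (middle, upper, lower) bands: SMA plus/minus k standard deviations."""
    middle = sma(prices, window)
    upper, lower = [], []
    for i, m in enumerate(middle):
        if m is None:
            upper.append(None)
            lower.append(None)
        else:
            sd = pstdev(prices[i + 1 - window:i + 1])
            upper.append(m + k * sd)
            lower.append(m - k * sd)
    return middle, upper, lower


def ema(values, span):
    """Exponential moving average seeded with the first observation."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line)."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    return line, ema(line, signal)


def rsi(prices, window=14, wilder=True):
    """RSI at the last observation.

    wilder=True applies Wilder's recursive smoothing after the first window;
    wilder=False uses a simple average of the most recent `window` changes.
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    if len(changes) < window:
        raise ValueError("not enough prices for the chosen window")
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    if wilder:
        avg_gain = sum(gains[:window]) / window
        avg_loss = sum(losses[:window]) / window
        for g, l in zip(gains[window:], losses[window:]):
            avg_gain = (avg_gain * (window - 1) + g) / window
            avg_loss = (avg_loss * (window - 1) + l) / window
    else:
        avg_gain = sum(gains[-window:]) / window
        avg_loss = sum(losses[-window:]) / window
    if avg_loss == 0:
        return 100.0
    return 100 - 100 / (1 + avg_gain / avg_loss)
```

Calling `rsi(prices, wilder=True)` and `rsi(prices, wilder=False)` on the same series longer than the window will generally give different values, which is one reason two platforms can disagree on RSI while agreeing on Bollinger Bands and MACD.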
Further, ChatGPT-4o demonstrated the capability to offer investment recommendations, providing rational justifications to back the stock recommendations stemming from each technical indicator. For instance, it could detect a potential bullish crossover in the MACD and subsequently propose a “hold” recommendation. When it detected a bearish crossover in the MACD, it proposed a “sell” recommendation. In addition, when the price rose above the upper Bollinger Band, ChatGPT-4o suggested a “sell”, as the stock was in an overbought condition. The results are depicted in Figure 5.
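The Bollinger Band rule described above can be sketched as follows. This is a minimal illustration, not the code ChatGPT-4o generated: the 20-day window, the 2-standard-deviation band width, and the use of the population standard deviation are conventional defaults assumed here, and the function names are hypothetical. Only the rule stated in the text (price above the upper band implies "sell") is encoded.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, k=2.0):
    """Return (lower, middle, upper) bands from the last `window` prices."""
    if len(prices) < window:
        raise ValueError("need at least `window` prices")
    recent = prices[-window:]
    middle = mean(recent)
    width = k * pstdev(recent)  # assumption: population standard deviation
    return middle - width, middle, middle + width


def band_signal(prices, window=20, k=2.0):
    """Map the latest price to a recommendation using the upper band only."""
    _, _, upper = bollinger_bands(prices, window, k)
    if prices[-1] > upper:
        return "sell"  # overbought condition, as described in the text
    return "no band signal"


# Hypothetical series: 19 flat days followed by a sharp jump.
spike = [100.0] * 19 + [110.0]
print(band_signal(spike))
```

Because the band width depends on the window and on the standard-deviation convention, a recommendation near the band edge can flip between tools that use different defaults.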
For Complex Reasoning Task Number 2 (i.e., 2a to 2e), ChatGPT-4o demonstrated proficiency in mirroring the responses of human analysts by constructing a global minimum variance portfolio and an optimal risky portfolio, determining the weights of each stock in the portfolios, and combining the portfolios. However, the stock weights of the global minimum variance portfolio determined by ChatGPT-4o differ from those determined by Excel/Stata. For the optimal risky portfolio, the stock weights provided by ChatGPT-4o are almost the same as the weights computed by Excel and Stata. ChatGPT-4o also promptly constructed an efficient frontier. Both ChatGPT-4o’s results and our analyses are presented in Figure 6.
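To make the role of the no-short-selling constraint concrete, the sketch below solves the global minimum variance problem for two assets only, where a closed form exists. This is a deliberate simplification of the ten-stock task, with hypothetical inputs and function names; the solver settings used by ChatGPT-4o, Excel, and Stata are not reported here, so this is not offered as an explanation of the discrepancy.

```python
def gmv_weight_two_assets(sd1, sd2, corr, allow_short=False):
    """Weight of asset 1 in the two-asset global minimum variance portfolio."""
    cov = corr * sd1 * sd2
    denom = sd1 ** 2 + sd2 ** 2 - 2 * cov
    if denom == 0:
        return 0.5  # identical, perfectly correlated assets: any split works
    w1 = (sd2 ** 2 - cov) / denom
    if not allow_short:
        # Variance is a convex quadratic in w1, so the long-only optimum
        # is the unconstrained optimum clipped to [0, 1].
        w1 = min(1.0, max(0.0, w1))
    return w1


def portfolio_variance(w1, sd1, sd2, corr):
    """Variance of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    cov = corr * sd1 * sd2
    return w1 ** 2 * sd1 ** 2 + w2 ** 2 * sd2 ** 2 + 2 * w1 * w2 * cov


# Hypothetical inputs: a low-volatility and a high-volatility stock.
sd_a, sd_b, rho = 0.10, 0.30, 0.9
w_free = gmv_weight_two_assets(sd_a, sd_b, rho, allow_short=True)
w_long = gmv_weight_two_assets(sd_a, sd_b, rho, allow_short=False)
print(w_free, portfolio_variance(w_free, sd_a, sd_b, rho))
print(w_long, portfolio_variance(w_long, sd_a, sd_b, rho))
```

With these inputs the unconstrained solution shorts the riskier asset, while the long-only solution puts the full weight on the less risky one. A tool that omits or mishandles the weight bounds would therefore report different weights from one that enforces them.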
However, ChatGPT-4o faced challenges in completing Complex Reasoning Tasks 3, 4, and 5. Task 3 assessed ChatGPT-4o’s ability in capital budgeting analysis, evaluating its proficiency in interpreting extensive information, distinguishing relevant information, and critical thinking. The results provided by ChatGPT-4o were inconsistent with human analyst calculations, showing errors in analyzing information, recognizing irrelevant costs, and computing depreciation expenses. Moreover, ChatGPT-4o failed to create a detailed capital budgeting template outlining each cash inflow and outflow item annually.
In Tasks 4 and 5, we asked ChatGPT-4o to conduct financial statement analyses (Task 4 is a basic financial statement analysis; Task 5 is a complex financial statement analysis). These tasks were sourced from the CFA problems test bank in Essentials of Investments (Bodie et al., 2022). However, the final results varied significantly from those of a human analyst. For example, without explicit instruction, ChatGPT-4o would apply the three-component DuPont analysis for Task 4 instead of the commonly used five-component method. Additionally, it appears that ChatGPT-4o struggles to accurately retrieve data tables formatted as images. As shown in Figure 9, the Excel template created by ChatGPT-4o displays values that differ from those in the source table. Any issues encountered in the initial step led to markedly different results or interpretations in subsequent steps. Results are presented in Figure 7, Figure 8, and Figure 9, respectively.
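The difference between the two DuPont decompositions can be sketched as follows. The figures are hypothetical and are not taken from Task 4; the function names are illustrative. The five-component form follows the standard textbook breakdown (tax burden, interest burden, operating margin, asset turnover, leverage), and the sustainable growth rate is computed as ROE times the plowback ratio, as the task prompt requests.

```python
from math import prod


def dupont_three(net_income, sales, assets, equity):
    """Three-component DuPont: margin x turnover x leverage."""
    return {
        "net profit margin": net_income / sales,
        "asset turnover": sales / assets,
        "leverage": assets / equity,
    }


def dupont_five(net_income, pretax_profit, ebit, sales, assets, equity):
    """Five-component DuPont: splits the net margin into three parts."""
    return {
        "tax burden": net_income / pretax_profit,
        "interest burden": pretax_profit / ebit,
        "operating margin": ebit / sales,
        "asset turnover": sales / assets,
        "leverage": assets / equity,
    }


def roe(components):
    """ROE is the product of the DuPont components."""
    return prod(components.values())


def sustainable_growth(roe_value, plowback):
    """Sustainable growth rate = ROE x plowback ratio."""
    return roe_value * plowback


# Hypothetical figures in $ million.
three = dupont_three(net_income=84, sales=1000, assets=800, equity=400)
five = dupont_five(net_income=84, pretax_profit=120, ebit=150,
                   sales=1000, assets=800, equity=400)
print(roe(three), roe(five))
print(sustainable_growth(roe(five), plowback=0.6))
```

Both decompositions return the same ROE when fed consistent inputs; they differ in how much of the result they explain. An answer built on the three-component form can therefore match the headline ROE while omitting the tax-burden and interest-burden components that a five-component answer reports.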
Task 6 assessed whether ChatGPT-4o could calculate the price of an American call option using the binomial tree approach. As shown in Figure 10, the tree diagram generated by DerivaGem indicates that early exercise is optimal at a certain node (indicated by the red value). However, ChatGPT-4o concluded that early exercise is not optimal at any step. In addition, it was unable to display the binomial tree diagram, even after being instructed to follow the DerivaGem diagram. Lastly, we attempted to use the new voice interaction feature in ChatGPT-4o. GPT-4o could provide a better tree diagram, but the option prices and early exercise decisions remained incorrect.
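For reference, the Task 6 calculation can be reproduced with a short binomial tree. The sketch below is not the code produced by ChatGPT-4o or by DerivaGem. It assumes the standard Cox-Ross-Rubinstein parameterization for a currency option, in which the foreign risk-free rate acts like a dividend yield, and the function and variable names are hypothetical. The inputs are those given in the Task 6 prompt in Appendix 2.

```python
import math


def american_fx_call(spot, strike, vol, r_dom, r_for, maturity, steps):
    """Price an American currency call on a binomial tree.

    Returns the option value and the list of (step, up_moves) nodes
    where immediate exercise is worth more than holding.
    """
    dt = maturity / steps
    u = math.exp(vol * math.sqrt(dt))
    d = 1.0 / u
    # Risk-neutral probability of an up move for a currency.
    p = (math.exp((r_dom - r_for) * dt) - d) / (u - d)
    disc = math.exp(-r_dom * dt)

    # Payoffs at maturity; index j counts the number of up moves.
    values = [max(spot * u ** j * d ** (steps - j) - strike, 0.0)
              for j in range(steps + 1)]
    early_nodes = []
    for i in range(steps - 1, -1, -1):
        new_values = []
        for j in range(i + 1):
            hold = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = spot * u ** j * d ** (i - j) - strike
            if exercise > hold:
                early_nodes.append((i, j))
            new_values.append(max(hold, exercise))
        values = new_values
    return values[0], early_nodes


# Inputs from the Task 6 prompt: AUD/USD 0.6100, strike 0.6000,
# volatility 12%, U.S. rate 5%, Australian rate 7%, 3 months, 3 steps.
price, early = american_fx_call(0.61, 0.60, 0.12, 0.05, 0.07, 0.25, 3)
print(round(price, 4), early)
```

Under these assumptions the tree flags one node, reached after two up moves at the second step, where exercising immediately beats holding. This agrees with the observation above that early exercise is optimal at a certain node.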
4.4. Other Observations
During our reasoning evaluation, several issues related to ChatGPT-4o were identified.
First, even when the same methods are applied, the charts produced by ChatGPT-4o and Workspace differ. Since both ChatGPT-4o and Workspace are tools used by human analysts to draw conclusions, it is plausible that their charts differ slightly. Despite these discrepancies, the RSI-based stock recommendations from ChatGPT-4o and Workspace are consistent (the RSI value lies between the lower and upper bands).
Second, ChatGPT-4o relies mainly on Python programming. According to Dilmegani (2024), “the code interpreter only supports Python as a language”. Differences in programming methods may cause discrepancies, such as those in the stock weights of the global minimum variance portfolio. In addition, detecting any discrepancies in the calculation method requires Python experts or analysts with proficient Python skills.
Third, the capital budgeting and financial statement analyses exposed a shortfall in ChatGPT-4o’s capability to replicate human analytical processes, particularly in offering sequential calculations and in creating Excel-like templates outlining each cash flow item. This indicates that ChatGPT-4o generates responses based on patterns learned during training and does not understand context or infer meaning in the way humans do. It also points to a limitation in ChatGPT-4o’s ability to process comprehensive information, which may hinder its capacity to assimilate and analyze complex data sets accurately.
Lastly, ChatGPT-4o may not provide accurate results for specialized finance areas such as derivative securities. Although GPT-4o was able to perform the step-by-step calculations like a human analyst, the results were not correct. Furthermore, it has to rely on Python programming to display the tree diagram, but the structure is quite different from a normal binomial tree diagram.
5. Conclusion
The advancement of artificial intelligence (AI) and large language models (LLMs) such as ChatGPT has showcased remarkable capabilities, particularly in financial analysis. This study has examined the reasoning abilities of ChatGPT-4o through various financial tasks, providing significant insights into the strengths and limitations of LLMs in mimicking human analytical processes.
ChatGPT-4o has demonstrated considerable skill in performing standard financial reasoning tasks, closely aligning its analytical approach with that of human analysts. It excels in logical reasoning, task decomposition, and generating solutions, which are essential for tasks like financial modelling and forecasting. However, the study also highlights several challenges and limitations.
The variability in ChatGPT-4o's responses, especially for qualitative tasks, underscores the importance of explicit instructions and careful task formulation. The discrepancies observed in some tasks between ChatGPT-4o and human analysts emphasize the need for robust evaluation metrics to ensure consistent and reliable outputs. Additionally, ChatGPT-4o encountered difficulties with complex tasks requiring a higher level of analytical depth and comprehensive understanding, indicating its limitations in replicating intricate human analytical methods.
Despite these challenges, the prospective integration of ChatGPT-4o with specialized financial data providers and tools, such as Bloomberg, S&P Capital IQ, and Excel, represents a transformative shift in the financial sector. This integration is poised to significantly enhance human analytical processes, enabling financial professionals to concentrate more on critical decision-making elements. The recent introduction of advanced features, such as the code interpreter and voice and video models, has further expanded the functionalities of AI in financial applications, thereby solidifying its status as an invaluable resource for complex financial analyses.
In conclusion, while ChatGPT-4o exhibits strong analytical reasoning capabilities, its integration with human insight is essential to fully harness cognitive capabilities in financial analysis. The ongoing refinement and collaboration between LLMs and human expertise are crucial to achieving analytical precision and leveraging the full potential of AI in finance. It is recommended to use ChatGPT-4o with caution, always verifying results to ensure accuracy and reliability in financial decision-making.
Appendix 1: Multi-Step Reasoning Tasks/Prompt
1. Suppose you deposit $1,000 in a savings account that pays 10% interest, compounded quarterly. How much will be in that account after 10 years if there is no withdrawal?
2. Sammy deposits $1,000 now, $1,500 in one more year, then $2,000 in two years and $2,500 in three years in a savings account that pays 10% interest per annum. How much does Sammy have in the account at the end of the third year?
3. You will deposit $1,500 in one year’s time from now, $2,000 in two years’ time and $2,500 in three years’ time, in an account paying 10 per cent interest per annum. What is the present value of these cash flows?
4. You are purchasing a home and are scheduled to make 30 annual instalments of $10,000 per year. Given an interest rate of 5%, what is the price you are paying for your house?
5. The superannuation guarantee rate in the industry is 9.5% in Australia. If your annual income is $100,000, you will have $9,500 every year for the next 30 years till your retirement (ignore the growth of income here). Given a 10% rate of interest, how much will you have saved by the time you retire?
6. Suppose you are valuing an investment that promises $100 per year at the end of this and the next four years. If the annual interest rate is 10%, calculate the value of this investment.
7. You wish to invest in a financial security with a face value of $500,000, a term to maturity of 180 days and a yield of 8.75% per annum. How much will it cost you today to buy?
8. As a winner of a dragon boat competition, you can choose one of the following prizes:
A. $100,000 now.
B. $180,000 at the end of five years.
C. $11,400 a year forever.
D. $19,000 for each of 10 years.
E. $6,500 next year and increasing thereafter by 5% a year forever.
Assume the interest rate is 12%, what is your choice?
9. Google Inc. became a public company when it conducted an IPO of ordinary shares in August 2004. Originally priced at $85 per share. By August 2018, Google shares stood at $1,084. What annual rate of return did the investors who bought Google shares at the IPO and held them until August 2018 earn?
10. It is 30 June; ABC company has a commercial bill which has a current interest rate yield of 6.08 per cent per annum. The existing bills mature on 31 August 2015 but will be replaced by a further issue at that date. What is the effective annual interest rate on the bill?
11. What is the WACC for a firm with $30 million in outstanding debt with a required return of 8%, 8 million in equity shares outstanding trading at $15 each with a required return of 12%, and a tax rate of 35%?
12. Consider an investment that costs $800 and has cash flows of 300, 200, 150, 122, and 133 in years 1-5. Calculate the internal rate of return.
13. ABC Corporation has a stock price of $50. The firm has just paid a dividend of $3 per share, and shareholders think that this dividend will grow by a rate of 5% per year. Use the Gordon dividend model to calculate the cost of equity of ABC.
14. ABC Corporation has just paid a dividend of $3 per share. You, an experienced analyst, feel quite sure that the growth rate of the company’s dividends over the next ten years will be 15% per year. After ten years you think that the company’s dividend growth rate will slow to the industry average, which is about 5% per year. If the cost of equity for ABC is 12%, what is the value today of one share of the company?
15. Your firm ABC is considering acquiring a business. Calculate its value using the following information: Firm ABC’s WACC is 12.5%, and the cash flows of the business are $1 million for years 1-4. The business is expected to grow at a rate of 5% after the fourth year.
16. The current level of the S&P 500 is 3,000. The dividend yield on the S&P 500 is 2%. The risk-free interest rate is 1%. What should be the price of a one-year maturity futures contract?
17. A stock selling for $25 today will, in one year, be worth either $35 or $20. If the interest rate is 8%, what is the value today of a one-year call option on the stock with an exercise price $30? Use the simultaneous equation approach to price the option.
18. Use the Black-Scholes model to price a call option on a stock whose current price is 50, with an exercise price of 50, an interest rate of 10%, maturity of 0.5-year, and a standard deviation of 25%.
19. You are analysing Woolworths’ potential acquisition of Billabong. Suppose Woolworths plans to offer $450 million as the purchase price for Billabong, and it will need to issue additional debt and equity to finance the acquisition. You estimate that the issuance costs will be $15 million and will be paid as soon as the transaction closes. You estimate the incremental free cash flows from the acquisition will be $29 million in the first year and will grow at 4% per year thereafter. What is the NPV of the proposed acquisition? You may access the other information from the file uploaded.
20. The spreadsheet uploaded is the five-year monthly prices for Intel Corporation and the S&P 500. Calculate Intel's beta.
21. ABC Corporation has issued 1 million fully paid ordinary shares. The after-tax profits for ABC are $500,000. Earnings per share are 50 cents ($500,000/1m=50 cents). ABC’s shares are currently selling at a price-earnings multiple of 10. The financial manager of ABC is planning a 1-for-5 bonus issue. Answer the following questions.
a) What is the current share price?
b) How many new shares will be issued under the 1-for-5 bonus scheme?
c) What are the new earnings per share after the bonus issue?
d) What is the market price after the bonus issue if the price-earnings multiple remains at 10?
e) After the bonus issue, what is the total value of the investor’s holdings? Assume this investor previously had 10 shares.
22. You looked at the newspaper quotes for options on ABC shares and saw that a March call option with a strike price of 37.5 is priced at 6.375, whereas the May call option with the same exercise price is priced at 6.
23. An American call option is written on a stock whose price today is $60. The exercise price of the call is $45.
a) If the call price is 2, explain how you would use arbitrage to make an immediate profit.
b) If the option is exercisable at time T=1 year and if the interest rate is 10%, what is the minimum price of the option? Use proposition a.
24. A one-year gold futures contract is selling for $1,558. Spot gold prices are $1,500 and the one-year risk-free rate is 4%.
a) According to spot-futures parity, what should be the futures price?
b) What risk-free strategy can investors use to take advantage of the futures mispricing, and what will be the profits on the strategy?
25. Based on the monthly TV Ads released and the revenues recorded over the past one-year period, forecast next year’s revenue. And if there are 100 Ads put on next month, what is next month’s revenue? You may access the data through the file uploaded.
26. Suppose a fund manager has a portfolio that consists of a single asset. The return of the asset is normally distributed with a mean return of 20% and a standard deviation of 30%. The value of the portfolio today is $100 million.
a) What is the distribution of the end-of-year portfolio value?
b) What is the probability of a loss of more than $20 million by year-end? For example, what is the probability that the end-of-year value is less than $80 million?
c) With 1% probability what is the maximum loss at the end of the year?
27. Your company is considering either purchasing or leasing an asset that costs $1,000,000. The asset, if purchased, will be depreciated on a straight-line basis over six years to a zero residual value. A leasing company is willing to lease the asset for $300,000 per year; the first payment on the lease is due at the time the lease is undertaken (i.e., year 0), and the remaining five payments are due at the beginning of years 1-5. Your company has a tax rate of 40% and can borrow at 10% from its bank.
28. A one-year, $100,000 loan carries a coupon rate and a market interest rate of 12%. The loan requires payment of accrued interest and one-half of the principal at the end of six months. The remaining principal and accrued interest are due at the end of the year. What is the duration of this loan?
29. On January 23, 1999, the market price of a Bond was $1,122.32. The bond pays $59 in interest on March 1 and September 1 of each of the years 1999-2005. On September 1, 2005, the bond was redeemed at its face value of $1,000.
30. An investment fund owns the following portfolio of three fixed-rate government bonds: You may access the data from the file uploaded.
The total market value of the portfolio is US$96,437,017. Each bond is on a coupon date so that there is no accrued interest. The market values are the full prices given the par value. Coupons are paid half yearly. The yields to maturity are stated for a periodicity of 2. The Macaulay durations are annualized.
a) Calculate the average (annual) modified duration for the portfolio using the shares of market value as the weights.
b) Estimate the percentage loss in the portfolio’s market value if the annual yield to maturity on each bond goes up by 20 bps.
31. BHP in Australia is still generating good profits. But growth is slowing down. Based on BHP’s previous 10-year dividend payout history, help the CFO decide how to start up a program of paying out cash to stockholders. Access the data from the file uploaded.
32. A few years after being appointed financial manager at Sedona Fabricators, Inc., you are asked by your boss to prepare for your first presentation to the Board of Directors. This presentation will pertain to issues associated with capital structure. It is intended to ensure that some of the newly appointed, independent board members understand certain terminology and issues. As a guideline for your presentation, you are provided with the following outline of questions.
a) What is capital structure?
b) What is financial leverage?
c) How does financial leverage relate to company risk and expected returns?
d) Modigliani and Miller demonstrated that capital structure policy is irrelevant. What is the basis for their argument? What are their Propositions I and II?
e) How does the introduction of corporate taxes affect the M&M model?
f) How do the costs of insolvency and financial distress affect the M&M model?
g) What are agency costs? How can the use of debt reduce agency costs associated with equity?
Appendix 2: Complex Reasoning Tasks/Prompt
1. Using the historical price data for the 10 stocks named: CHN, VUL, FCL, SXG, LTR, NEU, WTC, ALL, NXT, and PME in the attached file, please conduct technical analyses using the following indicators: Bollinger Bands, Relative Strength Index (RSI), and Moving Average Convergence Divergence (MACD).
a) Please draw technical charts and indicators for each stock.
b) Assume today is the 15th of May 2024, what is your overall recommendation on each stock based on the charts and indicators?
c) Please summarise the recommendations in a table with the explanation provided.
2. Using the historical price data for the 10 stocks named: CHN, VUL, FCL, SXG, LTR, NEU, WTC, ALL, NXT, and PME in the attached file, please first sort the date from the earliest to the latest date and then perform the following analyses. Please save the return data in the same Excel file:
a) Please calculate daily return and then present summary statistics for each stock in a table, such as Mean Return, Standard Deviation, Max, Min, Median, Skewness and Kurtosis. Then discuss whether the returns of each stock follow normal distribution based on summary statistics.
b) Please display correlation matrix based on the returns for the 10 stocks and indicate whether it is significant at 1%, 5% and 10% levels using ***, **, * respectively. Please discuss the results and significance of the correlations between different pairs of stocks.
c) Based on the average returns, standard deviations and correlation matrix for the 10 stocks, please construct a global minimum variance portfolio. What criterion have you considered when creating the global minimum variance portfolio? Assume the risk-free rate is 0.017% and there is no short selling in any stock, hence all weights should lie between 0 and 1. Please calculate and explain the weights of stocks in the global minimum variance portfolio. Based on the weights you have calculated, what is the global minimum variance portfolio return and standard deviation?
d) Based on the average returns, standard deviations and correlation matrix for the 10 stocks, please construct an optimal risky portfolio. What criterion have you considered when creating the optimal risky portfolio? Assume the risk-free rate is 0.017% and there is no short selling in any stock, hence all weights should lie between 0 and 1. Please calculate and explain the optimal weights of stocks in the optimal risky portfolio. Based on the weights you have calculated, what is the optimal risky portfolio return and standard deviation?
e) Please create an efficient frontier for the combinations of these 10 stocks and also indicate the global minimum variance and optimal risky portfolios on the graph.
3. Aus Car Execs (ACE) is set up as a sole trader and is analysing whether to enter the discount used rental car market. This project would involve the purchase of 100 used, late-model, mid-sized automobiles at the price of $9,500 each. In order to reduce their insurance costs, ACE will have a LoJack Stolen Vehicle Recovery System installed in each automobile at a cost of $1,000 per vehicle. ACE will also utilise one of their abandoned lots to store the vehicles. If ACE does not undertake this project, they could sublease this lot to an auto repair company for $80,000 per year. The $20,000 annual maintenance cost on this lot will be paid by ACE whether the lot is subleased or used for this project. In addition, if this project is undertaken, net working capital will increase by $50,000.
For taxation purposes, the useful life of the automobiles is determined to be 5 years, and they will be depreciated using the diminishing value method. Each car is expected to generate $4,800 a year in revenue and have operating costs of $1,000 per year. Starting 6 years from now, one-quarter of the fleet is expected to be replaced every year with a similar fleet of used cars. This is expected to result in a net cash flow (including acquisition costs) of $100,000 per year continuing indefinitely. This discount rental car business is expected to have a minimum impact on ACE’s regular rental car business where the net cash flow is expected to fall by only $25,000 per year. ACE expects to have a marginal tax rate of 32%.
Based on the above information, if ACE uses a discount rate of 12% for capital budgeting, what is the NPV of this project? If ACE adjusts the discount rate to 14% to reflect higher project risk, what is the NPV? For each question, please construct a capital budgeting analysis.
4. John is reviewing ABC’s financial statements to estimate its sustainable growth rate. Using the information presented in the Table uploaded, can you please first convert this into an Excel template (save it as a separate file) and: (measurement: $ million, except per-share data).
a) Identify and calculate the components of the DuPont formula.
b) Calculate the ROE (Return on Equity) for 2022 using the components of the DuPont formula.
c) Calculate the sustainable growth rate for 2022 from the firm’s ROE and plowback ratio. (Bodie et al., 2022, p.468)
5. Jennifer is a recently hired analyst. After describing the electric toothbrush industry, her first report focuses on two companies, WhiteBrush company and ProtectBrush company, and concludes:
WhiteBrush is a more profitable company than ProtectBrush, as indicated by the 40% sales growth and substantially higher margins it has produced over the last few years. ProtectBrush’s sales and earnings are growing at a 10% rate and produce much lower margins. We don’t think ProtectBrush is capable of growing faster than its recent growth rate of 10% whereas WhiteBrush can sustain a 30% long-term growth rate. Please convert the information in the screenshots into an Excel template and save it as a separate file.
a) Criticize Jennifer’s analysis and conclusion that WhiteBrush is more profitable, as defined by return on equity (ROE), than ProtectBrush and that it has a higher sustainable growth rate. Use only the information provided in Table WhiteBrush and Table ProtectBrush. Support your criticism by calculating and analysing:
b) Explain how WhiteBrush has produced an average annual earnings per share (EPS) growth rate of 40% over the last two years with an ROE that has been declining. Use only the information provided in Table WhiteBrush. (Bodie et al., 2022, p.468)
6. The Australian dollar is currently worth USD 0.6100 and this exchange rate has a volatility of 12%. The Australian risk-free rate is 7% and the U.S. risk-free rate is 5%. Use a three-step binomial tree to value a 3-month American call option with a strike price of USD 0.6000. Please draw a tree diagram and show the value of currency and value of option at each node in the diagram, and also indicate whether early exercise is optimal (in red colour). Assume U.S. is domestic country and Australia is foreign country (Hull, 2015, Example 13.2, p.314).
References
- Ali, H., & Aysan, A. F. (2023). What will ChatGPT Revolutionize in Financial Industry? Available at SSRN 4403372.
- Ahmed, S.; Alshater, M.M.; El Ammari, A.; Hammami, H. Artificial intelligence and machine learning in finance: A bibliometric review. Research in International Business and Finance 2022, 61, 101646.
- Blesiada, J. Expedia group gives users the opportunity to test new technology. Travel Weekly 2023. Accessed at https://www.travelweekly.com/Travel-News/Travel-Technology/Expedia-Group-gives-users-opportunity-test-new-technology.
- Bodie, Z., Kane, A., & Marcus, A.J. (2022). Essentials of Investments, 12e. McGraw Hill LLC.
- Bollen, J.; Mao, H.; Zeng, X. Twitter mood predicts the stock market. Journal of computational science 2011, 2, 1–8.
- Boukes, M.; Van de Velde, B.; Araujo, T.; Vliegenthart, R. What’s the tone? Easy doesn’t do it: Analyzing performance and agreement between off-the-shelf sentiment analysis tools. Communication Methods and Measures 2020, 14, 83–104.
- Burgess, N. Machine Earning–Algorithmic Trading Strategies for Superior Growth, Outperformance and Competitive Advantage. International Journal of Artificial Intelligence and Machine Learning 2021, 2, 38–60.
- Cao, L. Ai in finance: challenges, techniques, and opportunities. ACM Computing Surveys (CSUR) 2022, 55, 1–38.
- Chaboud, A.P.; Chiquoine, B.; Hjalmarsson, E.; Vega, C. Rise of the machines: Algorithmic trading in the foreign exchange market. The journal of finance 2014, 69, 2045–2084.
- de Lange, P.E.; Melsom, B.; Vennerød, C.B.; Westgaard, S. Explainable AI for Credit Assessment in Banks. Journal of Risk and Financial Management 2022, 15, 556.
- Demajo, L.M.; Vella, V.; Dingli, A. Explainable ai for interpretable credit scoring. arXiv 2020, arXiv:2012.03749.
- Dilmegani, C. (2024). ChatGPT Code Interpreter Plugin: Use Cases & Limitations in 2024. AIMultiple Research. https://research.aimultiple.com/chatgpt-code-interpreter/.
- Dowling, M.; Lucey, B. ChatGPT for (finance) research: The Bananarama conjecture. Finance Research Letters 2023, 53, 103662.
- Farooq, A.; Chawla, P. (2021). Review of data science and AI in finance. Paper presented at the 2021 International Conference on Computing Sciences (ICCS).
- Félix, L.; Kräussl, R.; Stork, P. Implied volatility sentiment: a tale of two tails. Quantitative Finance 2020, 20, 823–849.
- Hansen, A.L.; Kazinnik, S. (2023). Can ChatGPT Decipher Fedspeak? Available at SSRN.
- Hartmann, J.; Heitmann, M.; Siebert, C.; Schamp, C. More than a feeling: Accuracy and application of sentiment analysis. International Journal of Research in Marketing 2023, 40, 75–87.
- Hull, J. C. (2015). Options, Futures, and Other Derivatives, Global Edition: Pearson Education.
- Jullum, M.; Løland, A.; Huseby, R.B.; Ånonsen, G.; Lorentzen, J. Detecting money laundering transactions with machine learning. Journal of Money Laundering Control 2020, 23, 173–186.
- Kelly, Jack. (2023). Goldman Sachs predicts 300 million jobs will be lost or degraded by artificial intelligence. Forbes. Accessed at: https://www.forbes.com/sites/jackkelly/2023/03/31/goldman-sachs-predicts-300-million-jobs-will-be-lost-or-degraded-by-artificial-intelligence/?sh=43cb004a782b.
- Korteling, J.; van de Boer-Visschedijk, G.C.; Blankendaal, R.A.; Boonekamp, R.C.; Eikelboom, A.R. Human-versus artificial intelligence. Frontiers in Artificial Intelligence 2021, 4, 622364.
- Leippold, M. Sentiment spin: Attacking financial sentiment with GPT-3. Finance Research Letters 2023, 103957.
- Leippold, M. Thus spoke GPT-3: Interviewing a large-language model on climate finance. Finance Research Letters 2023, 53, 103617.
- Lin, T.C. Artificial intelligence, finance, and the law. Fordham L. Rev. 2019, 88, 531.
- Liu, L.X.; Liu, S.; Sathye, M. Predicting bank failures: A synthesis of literature and directions for future research. Journal of Risk and Financial Management 2021, 14, 474.
- Lopez-Lira, A.; Tang, Y. Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models. arXiv 2023, arXiv:2304.07619.
- Richter, F. (2023). Which sectors are working with OpenAI? Statista. Accessed at: https://www.statista.com/chart/29244/number-of-companies-using-open-ai-in-their-business-processes-worldwide/.
- Saunders, A.; Cornett, M.M.; Erhemjamts, O. Financial institutions management: A risk management approach, Tenth edition; McGraw-Hill Education: 2 Penn Plaza, New York, NY 10121, 2021.
- Selke, M.J.G. (2013). Rubric assessment goes to college: Objective, comprehensive evaluation of student work: R&L Education.
- Sokolov, I. Theory and practice in artificial intelligence. Вестник Рoссийскoй академии наук 2019, 89, 365–370.
- Son, G.; Jung, H.; Hahm, M.; Na, K.; Jin, S. Beyond Classification: Financial Reasoning in State-of-the-Art Language Models. arXiv 2023, arXiv:2305.01505.
- Stevens, D.D.; Levi, A.J. Introduction to rubrics: An assessment tool to save grading time, convey effective feedback, and promote student learning: Routledge. 2023.
- Tetlock, P.C.; Saar-Tsechansky, M.; Macskassy, S. More than words: Quantifying language to measure firms' fundamentals. The journal of finance 2008, 63, 1437–1467.
- Turing, A. M. Computing machinery and intelligence (1950). The Essential Turing: the Ideas That Gave Birth to the Computer Age 2012, 433–464.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 2022, 35, 24824–24837.
- Wenzlaff, K.; Spaeth, S. (2022). Smarter than Humans? Validating how OpenAI’s ChatGPT model explains Crowdfunding, Alternative Finance and Community Finance. (December 22, 2022).
- Yu, L.; Härdle, W.K.; Borke, L.; Benschop, T. An AI approach to measuring financial risk. The Singapore Economic Review 2023, 68, 1529–1549.
- Yue, T.; Au, D.; Au, C.C.; Iu, K.Y. Democratizing financial knowledge with ChatGPT by OpenAI: Unleashing the Power of Technology. Available at SSRN 4346152. 2023.
- Zaremba, A., & Demir, E. (2023). ChatGPT: Unlocking the Future of NLP in Finance. Available at SSRN 4323643.
Figure 1. Data Collection/Retrieval.
Figure 2. Task 9 demonstration.
Figure 3. Task 12 demonstration.
Figure 4. 10-stock selection by ChatGPT-4o.
Figure 5. Complex Reasoning Task 1 demonstration: Technical Analyses and Indicators.
Figure 6. Complex Reasoning Task 2 demonstration: Portfolio Construction.
Figure 7. Complex Reasoning Task 3 demonstration: Capital Budgeting.
Figure 8. Complex Reasoning Task 4 demonstration: Financial Statement Analysis.
Figure 9. Complex Reasoning Task 5 demonstration: Financial Statement Analysis.
Figure 10. Complex Reasoning Task 6 demonstration: Option pricing – binomial tree.
Table 1. Multi-step Reasoning Task Evaluation Results for ChatGPT-4o. Task understanding, task deconstruction, and calculation ideas and formulas are evaluated with descriptors of basic, intermediate, and advanced. Critical thinking/application of knowledge is evaluated with descriptors of practical, applicable, functional, operational, and useful.
| Task Number | Tasks | Task understanding | Task deconstruction | Calculation ideas and formulas | Accuracy | Critical Thinking/application of knowledge |
|---|---|---|---|---|---|---|
| 1 | Future value | advanced | advanced | advanced | Yes | Functional |
| 2 | Future value with uneven cash flows | advanced | advanced | advanced | Yes | Functional |
| 3 | Present value with uneven cash flows | advanced | advanced | advanced | Yes | Functional |
| 4 | Installments/mortgage/annuity calculation | advanced | advanced | advanced | Yes | Functional |
| 5 | Future value of annuity | advanced | advanced | advanced | Yes | Functional |
| 6 | Investment accumulation | advanced | advanced | advanced | Yes | Functional |
| 7 | Present value of financial security | advanced | advanced | advanced | Yes | Functional |
| 8 | Present value decision-making | advanced | advanced | advanced | Yes | Functional |
| 9 | Investment yield | advanced | advanced | advanced | No | Functional |
| 10 | Effective annual interest rate | advanced | advanced | advanced | Yes | Functional |
| 11 | Weighted Average Cost of Capital | advanced | advanced | advanced | Yes | Functional |
| 12 | Internal rate of return | advanced | advanced | advanced | No | Functional |
| 13 | Cost of equity | advanced | advanced | advanced | Yes | Functional |
| 14 | Share valuation | advanced | advanced | advanced | Yes | Functional |
| 15 | Simple business valuation | advanced | advanced | advanced | Yes | Functional |
| 16 | Cost of carry futures model | advanced | advanced | advanced | Yes | Functional |
| 17 | Simultaneous equation option pricing model | advanced | advanced | advanced | Yes | Functional |
| 18 | Black-Scholes option pricing model | advanced | advanced | advanced | Yes | Functional |
| 19 | Simple business valuation in M&A | advanced | advanced | advanced | No | Functional |
| 20 | Beta calculation | advanced | advanced | advanced | Yes | Functional |
| 21 | Bonus issue | advanced | advanced | advanced | Yes | Functional |
| 22 | Option mispricing and arbitrage | advanced | advanced | advanced | Yes | Functional |
| 23 | Option arbitrage | advanced | advanced | advanced | Yes | Functional |
| 24 | Futures mispricing and arbitrage | advanced | advanced | advanced | Yes | Functional |
| 25 | Simple regression forecasting | advanced | advanced | advanced | Yes | Functional |
| 26 | Value at risk (risk management) | advanced | advanced | advanced | Yes | Functional |
| 27 | Lease or purchase | advanced | advanced | advanced | Yes | Functional |
| 28 | Duration | advanced | advanced | advanced | Yes | Functional |
| 29 | Yield to maturity and duration | advanced | advanced | advanced | Yes | Functional |
| 30 | Fixed-income risk and return | advanced | advanced | advanced | Yes | Functional |
| 31 | Dividend payout suggestions | advanced | advanced | N/A | N/A | Functional |
| 32 | Capital structure | advanced | advanced | N/A | N/A | Functional |
Table 2. Complex Reasoning Task Evaluation Results for ChatGPT-4o. Task understanding, task deconstruction, and calculation ideas and formulas are evaluated with descriptors of basic, intermediate, and advanced. Critical thinking is evaluated with descriptors of advanced, moderate, basic, superficial, and naïve.
| Task Number | Tasks | Task understanding | Task deconstruction | Calculation ideas and formulas | Accuracy | Critical Thinking/level of critical thinking |
|---|---|---|---|---|---|---|
| 1.1 | Technical Analysis and Stock Recommendation (Bollinger Bands) | advanced | advanced | advanced | Yes | advanced |
| 1.2 | Technical Analysis and Stock Recommendation (MACD) | advanced | advanced | advanced | Yes | advanced |
| 1.3 | Technical Analysis and Stock Recommendation (RSI) | advanced | advanced | advanced | Partially accurate | advanced |
| | Portfolio Construction | | | | | |
| 2.1 | Stock summary statistics | advanced | advanced | advanced | Yes | advanced |
| 2.2 | Correlation matrix | advanced | advanced | advanced | Yes | advanced |
| 2.3 | Portfolio Construction - Global Minimum Variance | advanced | advanced | advanced | Partially accurate | advanced |
| 2.4 | Portfolio Construction - Optimal Risky Portfolio | advanced | advanced | advanced | Yes | advanced |
| 2.5 | Efficient Frontier | advanced | advanced | advanced | Yes | advanced |
| 3 | Capital Budgeting | intermediate | basic | basic | No | naïve |
| 4 | Financial Statement Analysis – Appendix 2 Q4 | advanced | advanced | intermediate | Partially accurate | moderate |
| 5 | Financial Statement Analysis – Appendix 2 Q5 | intermediate | intermediate | intermediate | No | superficial |
| 6 | Option pricing – Binomial Tree | advanced | advanced | moderate | No | moderate |
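The Accuracy columns of Tables 1 and 2 can be summarized with a simple tally. The sketch below is illustrative only: the labels are transcribed from the two tables, while the variable names, the `tally` helper, and the choice to exclude "N/A" rows from the denominator are assumptions made here, not a calculation reported by the authors.

```python
from collections import Counter

# Accuracy column of Table 1, keyed by task number (1-32), transcribed from the table.
TABLE1_NO = {9, 12, 19}   # tasks marked "No"
TABLE1_NA = {31, 32}      # tasks marked "N/A" (no calculation to score)
table1 = {
    task: "N/A" if task in TABLE1_NA else "No" if task in TABLE1_NO else "Yes"
    for task in range(1, 33)
}

# Accuracy column of Table 2, keyed by task label, transcribed from the table.
table2 = {
    "1.1": "Yes", "1.2": "Yes", "1.3": "Partially accurate",
    "2.1": "Yes", "2.2": "Yes", "2.3": "Partially accurate",
    "2.4": "Yes", "2.5": "Yes",
    "3": "No", "4": "Partially accurate", "5": "No", "6": "No",
}


def tally(accuracy):
    """Count each accuracy label and the share of 'Yes' among scored tasks.

    Assumption: rows labelled "N/A" are left out of the denominator.
    """
    counts = Counter(accuracy.values())
    scored = sum(n for label, n in counts.items() if label != "N/A")
    return counts, counts["Yes"] / scored


counts1, share1 = tally(table1)
counts2, share2 = tally(table2)
print(dict(counts1), f"{share1:.0%}")
print(dict(counts2), f"{share2:.0%}")
```

Under these assumptions, Table 1 gives 27 fully accurate answers out of 30 scored multi-step tasks (90%), and Table 2 gives 6 fully accurate answers out of 12 complex-reasoning items (50%), with 3 further items rated partially accurate.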
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).