Introduction
In the current digital world, information security breaches pose a significant threat to companies. The datasets described herein therefore contain data on information security breaches and related financial data of S&P500 companies. These datasets were originally collected to gain insight into the impact of information security breaches on overall firm performance and the moderating effects of insider involvement and breach identification. Data on the information security breaches affecting S&P500 companies between 2020 and 2023 has been collected by manual search of the Internet. The financial data, including stock data and several accounting measures, has been collected via Refinitiv (LSEG). This document provides a description of the datasets, the procedure for collecting the data and an initial version of the datasets. While these datasets are useful for examining the impact of information security breaches on firm performance, they are also useful for other research purposes, for example the identification of patterns and trends in information security breaches, the analysis of effects on other financial metrics or the development of predictive models.
The rest of the document is structured as follows. First, a specification table presenting a general overview of the characteristics of the datasets will be given. Then, the value of the data will be explained. The data description part describes in detail what the datasets contain. Lastly, the data collection process will be explained.
Specification Table
Table 1 presents an overview of the datasets, facilitating easy access to all relevant characteristics of the data. It includes details such as the subject area, data type, the acquisition of the data and the data format. Additionally, it outlines the data collection process and its parameters.
Value of the Data
The datasets record disclosed information security breaches of S&P500 companies between 2020 and 2023, together with financial stock data and accounting measures. These comprehensive datasets are vital for examining the impact of disclosed information security breaches on performance measures, shedding light on the connection between security breaches and firm performance. As the datasets focus on recent data, they are valuable for gaining new insights and identifying patterns and trends in information security breaches. Moreover, the availability of accounting metrics alongside stock data enables the examination and understanding of the overall economic consequences. Researchers across various fields such as finance, cyber security and IT can leverage this data to enhance their understanding of information security breaches. These insights are not only valuable for researchers, but can also inform investor decision making that incorporates cyber security risks. In addition, companies can utilize this information to enhance their risk management strategies and information security policies. Beyond examining the impact of information security breaches on firm performance, the datasets can be used for the analysis of trends over time, industry- or firm-specific comparisons and the investigation of other moderating factors such as firm characteristics. The event data facilitates the analysis of the impact on other metrics such as market share and brand reputation. Additionally, the data enables the development of predictive models of the occurrence and impact of information security breaches. Furthermore, comparative analysis of the datasets with other datasets or research findings in the field of information security enables researchers to identify commonalities, discrepancies and trends. All of these uses advance research in information security and contribute to a deeper understanding of information security breaches.
Data Description
Overall, three types of datasets have been collected. The first dataset centers on specific information security breach events. The second and third datasets focus on short term financial data and long term financial data, respectively. In the following subsections, each type of dataset will be described in detail.
1. Information Security Breach Data
The event dataset includes rows corresponding to each individual S&P500 company and columns representing various variables of interest, capturing crucial information regarding information security breaches among S&P500 companies.
Table 2 offers a comprehensive overview of the columns in the dataset, including their data types and descriptions. The primary key of the dataset is the ticker: a code of one to four letters that uniquely identifies each company. In addition to this ticker, the company name serves as another identifier for each company in the dataset, ensuring clarity. The dataset includes each company’s operating sector, categorized into one of 19 possible sectors. Focusing on the information security breaches, a boolean variable has been added, labeled yes when the firm experienced a disclosed information security breach between 2020 and 2023 and no when it did not. In case of a disclosed information security breach, the date of first disclosure has been added. If a firm did not experience a disclosed information security breach, the disclosure date is denoted as X. For some breaches a short description is added, such as “multiple breaches” if the firm experienced multiple disclosed breaches in the period, “no open source” when the disclosure was not publicly available, or for example “MOVEit” or “Cl0P” to specify the name of the breach. If the firm did not experience a disclosed information security breach or no description was added, this is again denoted as X. In the case of a security breach, the dataset includes the URL of the source where the first disclosure about the breach was found. For some breaches, an additional source has been added which has been used to obtain the URL of the initial disclosure. If no disclosed information security breach occurred or no additional source was utilized, this is denoted as X. In addition, a column has been added to indicate the level of insider involvement in case of a breach.
Table 3 outlines the possible values of insider involvement, along with a description of each value. If no information was disclosed regarding insider involvement during a breach, it is denoted as “unknown”. If there was no disclosed information security breach, it is denoted as X. Then, a column has been added to indicate who identified the information security breach.
Table 4 outlines the possible values of the identifier with a description for each value. Again, if no information was disclosed about the identifier, it is denoted as “unknown” and if there was no security breach, it is denoted as X. Two additional columns have been included to capture firm characteristics: the four-digit SIC code representing the firm’s industry and the firm size in thousands relative to industry averages. If data on these characteristics was unavailable, they are marked as “unknown”. In case of no information security breach, they are denoted as X. For each breached firm, an associated control firm has been identified. The unique ticker symbol of this control firm is provided in the control firm ticker column. In cases where no breach was disclosed, no control firm was needed, and the column is marked as X. If no suitable control firm could be determined, it is denoted as “unknown”. Finally, the dataset includes two columns specifying the range of stock data required, based on the date of disclosure. This range spans from 125 days before to 125 days after the first disclosure of the information security breach. In case of no security breach, both columns are denoted as X. Initially, data was obtained for 503 companies. Among them, 205 companies experienced a disclosed information security breach within the specified period. After sample filtering, the final sample comprises 97 companies that faced a disclosed information security breach between 2020 and 2023 and for which all information is available. The dataset includes all firms that experienced a disclosed information security breach, regardless of the completeness of information, as well as firms without a disclosed information security breach, as all may provide valuable insights for further analysis.
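The sentinel values and the stock data range described above can be handled along the following lines. This is a minimal sketch rather than the procedure used to build the dataset: the row layout, the field names, the ISO date format and the example rows are hypothetical, and the 125 days are assumed to be calendar days, since the text does not state whether calendar or trading days are meant.

```python
from datetime import date, timedelta
from typing import Optional

# Sentinels described in the text: X (no breach / not applicable) and "unknown".
MISSING = {"X", "unknown", ""}

def clean(value: str) -> Optional[str]:
    """Map sentinel values to None so they are not mistaken for data."""
    value = value.strip()
    return None if value in MISSING else value

def stock_window(disclosure: date, days: int = 125) -> tuple[date, date]:
    """Return (first stock date, last stock date) around a disclosure date.

    Assumption: calendar days; the source does not specify the day count basis.
    """
    delta = timedelta(days=days)
    return disclosure - delta, disclosure + delta

# Hypothetical rows, not taken from the dataset.
rows = [
    {"ticker": "AAA", "date_of_disclosure": "2021-03-15", "insider": "outsider"},
    {"ticker": "BBB", "date_of_disclosure": "X", "insider": "X"},
    {"ticker": "CCC", "date_of_disclosure": "2022-07-01", "insider": "unknown"},
]

windows = {}
for row in rows:
    disclosed = clean(row["date_of_disclosure"])
    if disclosed is None:
        continue  # no disclosed breach, so no stock window is needed
    windows[row["ticker"]] = stock_window(date.fromisoformat(disclosed))
```

Mapping both X and “unknown” to a single missing marker is convenient for numerical work, but it discards the distinction between “not applicable” and “not disclosed”; analyses that need that distinction should keep the raw column as well.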
2. Short Term Financial Data
The short term financial dataset consists of two files structured similarly: one for the breached firms and one for the control firms. Each row corresponds to a combination of a firm with a trading date for which stock data has been collected. The columns represent various stock-related variables of interest.
Table 5 offers a comprehensive overview of the columns in the dataset, including their data types and descriptions. Similar to the event dataset, the ticker symbol and company name are included. In addition, the date variable corresponds to the date for which stock data is recorded, aligning with the range provided by the first and last stock date in the event dataset (125 days before to 125 days after the event). The unique combination of the ticker and the date is the primary key in this dataset. Focusing on the stock-related variables, the closing price of the breached or control company’s stock has been included. Additionally, the dataset incorporates the net change in stock price compared to the previous trading day, whose sign indicates whether the difference is positive or negative. This variable is available for each day except the first date for which data is collected for a company, where it is left blank. The percentage change in the company’s stock price compared to the previous trading day has also been included, again except for the first date for which data is collected. Moreover, the dataset includes the opening price of the company’s stock on the given date, as well as the lowest and highest prices observed on that day. The dataset also includes the total number of traded shares for the company’s stock on a certain date, along with the total value of those traded shares. Furthermore, the flow of the company’s stock on a certain trading date is included, indicating the overall direction of trading activity by comparing inflow (stock bought) and outflow (stock sold) of the stock. A positive flow indicates net buying activity, when more shares were bought than sold on that specific date. Conversely, a negative flow indicates net selling activity, when more shares were sold than bought. The flow has a value of 0 on each company’s initial trading date. All monetary values are denoted in USD so that they can be compared easily.
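The relation between the closing price, the net change and the percentage change can be sketched as follows. The example assumes that both changes are computed from the previous trading day's closing price, which matches the column descriptions but is not confirmed as the exact vendor definition; the prices are hypothetical. The flow variable is not reproduced here because its calculation is not specified in the text.

```python
from typing import Optional

Change = tuple[Optional[float], Optional[float]]

def daily_changes(closes: list[float]) -> list[Change]:
    """Return (net change in USD, percentage change) for each trading day.

    The first day has no previous trading day, so both values are None,
    mirroring the blank cells in the dataset.
    """
    if not closes:
        return []
    changes: list[Change] = [(None, None)]
    for previous, current in zip(closes, closes[1:]):
        net = current - previous
        changes.append((net, net / previous * 100.0))
    return changes

closes = [100.0, 102.0, 99.96]  # hypothetical closing prices in USD
changes = daily_changes(closes)
```

Recomputing these two columns from the closing prices is a simple consistency check that can be run on the files before analysis.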
The short term financial dataset is supplemented by an additional dataset that contains the S&P500 market index data. This data has been collected to control for influences that are not specific to a single firm, such as events affecting an entire industry or the whole market, like the Covid-19 pandemic. The stock data for the breached and control firms spans from October 1, 2019 to May 1, 2024, and the market data has been gathered for the same period. The variables collected are similar to those in the short term financial dataset: ticker, date, stock close, stock net, percentage change, stock open, low, high, volume, turnover and flow, allowing for direct comparison of the breached firms with the market index.
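One simple way to use the index data for such a comparison is a market-adjusted return, in which the index's percentage change is subtracted from the firm's percentage change on the same date. The sketch below illustrates this common event study adjustment; it is not necessarily the adjustment used in the original research, and the dates and values are hypothetical.

```python
def market_adjusted(firm: dict[str, float], index: dict[str, float]) -> dict[str, float]:
    """Subtract the index's percentage change from the firm's on shared dates."""
    return {day: firm[day] - index[day] for day in sorted(firm) if day in index}

# Hypothetical percentage changes keyed by ISO date.
firm_pct = {"2021-03-15": -3.0, "2021-03-16": 1.0, "2021-03-17": 0.5}
index_pct = {"2021-03-15": -0.5, "2021-03-16": 0.25}

adjusted = market_adjusted(firm_pct, index_pct)
```

Dates that are missing from either series are dropped rather than filled, so gaps in the data do not produce spurious values.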
3. Long Term Financial Data
The long term financial dataset also comprises two similarly structured files: one for the breached firms and one for the control firms. Every row corresponds to a breached or control firm, while the columns represent various accounting metrics of interest for each firm. Table 6 offers a comprehensive overview of the columns in the long term financial dataset, including their data types and descriptions. The ticker symbol of the firm serves as the primary key, uniquely distinguishing each row. In addition, the company name has been included. For each ticker, the date of first disclosure of the information security breach, as recorded in the event dataset, has been included. Based on this date, the accounting metrics have been gathered. Specifically, data for each metric has been collected from financial statements spanning from two quarters before to two quarters after the disclosure. Consequently, for each firm, data from four different quarters has been collected. Each metric, in combination with a quarter (-2, -1, 1 or 2), is represented in a column. This timeframe was chosen to enable the measurement of the long term impact of disclosed information security breaches while minimizing the risk that other significant events affect the metrics. The long term financial dataset includes data on sales (S) and operating income (OI), reported in millions of USD. Additionally, it comprises return on assets (ROA) and return on equity (ROE), expressed as percentages and calculated pretax. Lastly, the dataset includes operating income divided by assets (OI/A) and operating income divided by sales (OI/S), given as ratios. These accounting measures were selected because they are most commonly used as firm performance indicators, reflecting firm profitability and efficiency (Bharadwaj, 2000; Ko & Dorantes, 2006). Negative values are indicated with a “-” sign and are shown in red text.
In case no data was available for a specific metric in a specific quarter, it is left blank.
Table 6.
Long term financial dataset overview.
Variable | Type | Description
Ticker | Categorical | Ticker symbol of the S&P500 company to uniquely identify the company
Company name | Text | Name of the S&P500 company
Date of breach | Date | Date of the information security breach event, used to determine the required quarterly data
S – 2 | Numerical | Sales in millions two quarters before the information security breach
S – 1 | Numerical | Sales in millions one quarter before the information security breach
S + 1 | Numerical | Sales in millions one quarter after the information security breach
S + 2 | Numerical | Sales in millions two quarters after the information security breach
OI – 2 | Numerical | Operating income in millions two quarters before the information security breach
OI – 1 | Numerical | Operating income in millions one quarter before the information security breach
OI + 1 | Numerical | Operating income in millions one quarter after the information security breach
OI + 2 | Numerical | Operating income in millions two quarters after the information security breach
ROA – 2 | Numerical | Pretax return on assets two quarters before the information security breach
ROA – 1 | Numerical | Pretax return on assets one quarter before the information security breach
ROA + 1 | Numerical | Pretax return on assets one quarter after the information security breach
ROA + 2 | Numerical | Pretax return on assets two quarters after the information security breach
ROE – 2 | Numerical | Pretax return on equity two quarters before the information security breach
ROE – 1 | Numerical | Pretax return on equity one quarter before the information security breach
ROE + 1 | Numerical | Pretax return on equity one quarter after the information security breach
ROE + 2 | Numerical | Pretax return on equity two quarters after the information security breach
OI/A – 2 | Numerical | Operating income divided by total assets two quarters before the information security breach
OI/A – 1 | Numerical | Operating income divided by total assets one quarter before the information security breach
OI/A + 1 | Numerical | Operating income divided by total assets one quarter after the information security breach
OI/A + 2 | Numerical | Operating income divided by total assets two quarters after the information security breach
OI/S – 2 | Numerical | Operating income divided by sales two quarters before the information security breach
OI/S – 1 | Numerical | Operating income divided by sales one quarter before the information security breach
OI/S + 1 | Numerical | Operating income divided by sales one quarter after the information security breach
OI/S + 2 | Numerical | Operating income divided by sales two quarters after the information security breach
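Because each metric and quarter combination is stored as its own column, analyses often start by reshaping a row into (metric, quarter offset, value) records. The sketch below shows one way to do this. It assumes the column names follow the pattern used in the table above (for example “S – 2” or “OI/S + 1”) and that unavailable values are read as empty strings or None; the example row is hypothetical.

```python
import re

# Metric name, then a sign (plus, hyphen or en dash), then the quarter number.
COLUMN = re.compile(r"^(S|OI|ROA|ROE|OI/A|OI/S)\s*([+\-–])\s*([12])$")

def to_long(row: dict[str, object]) -> list[tuple[str, int, float]]:
    """Turn one wide row into (metric, quarter offset, value) records."""
    records = []
    for name, value in row.items():
        match = COLUMN.match(name)
        if match is None or value in ("", None):
            continue  # identifier column, or a metric left blank as unavailable
        metric, sign, quarter = match.groups()
        offset = int(quarter) if sign == "+" else -int(quarter)
        records.append((metric, offset, float(value)))
    return records

# Hypothetical row, not taken from the dataset.
row = {
    "Ticker": "AAA",
    "S – 2": 1200.0,
    "S – 1": 1250.0,
    "S + 1": "",
    "OI – 2": 150.0,
    "OI/S – 2": 0.125,
}
long_rows = to_long(row)
```

In this example the OI/S value for quarter -2 equals 150 / 1200 = 0.125, which shows how the ratio columns can be cross-checked against the sales and operating income columns.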
Experimental Design, Materials and Methods
To obtain the three types of datasets described above, several data collection steps have been taken, which are described below.
2. Financial Data
For the short term financial data, stock price data has been obtained from the financial market data provider Refinitiv (LSEG). Stock data of the breached and control firms has been collected from 125 days before to 125 days after the disclosure of the information security breach, ensuring a reliable time frame for event studies (Campbell et al., 2003; Gordon, Loeb & Zhou, 2011; Tripathi & Mukhopadhyay, 2022). All data has been merged into two XLS files, one for the treatment firms and one for the control firms. In addition, the stock price data of the general market index (S&P500) has been collected from October 1, 2019 to May 1, 2024, matching the range of data collected for the breached and control firms.
For the long term financial data, several accounting measures are employed. Using accounting measures to assess firm performance is a widely adopted and accepted approach (Bharadwaj, 2000; Hunton, Lippincott & Reck, 2003; Ko & Dorantes, 2006). Consequently, data on four profit-based ratios has been gathered: Return on Assets (ROA), Return on Equity (ROE), Operating Income to Assets (OI/A) and Operating Income to Sales (OI/S). ROA and ROE are indicators of the profitability of a firm, while OI/A and OI/S serve as measures of the direct effect (Bharadwaj, 2000). Hence, these profit ratios are considered reliable measures of firm performance and have therefore been collected. In addition, data on sales and operating income has been gathered. Data regarding these performance measures has been collected from Refinitiv (LSEG). Quarterly financial performance data has been collected for both the treatment and control samples, focusing on two distinct timeframes: before and after the disclosed information security breach. For each treatment firm, the performance metrics spanning two quarters before to two quarters after the security breach have been collected based on the financial statements of the firm. Correspondingly, the performance metrics have been collected for each related control firm for the same quarters as for the treatment firm. The decision to limit the analysis to two quarters before and after the disclosed security breach was made to minimize the potential impact of other significant events unrelated to this research. Again, all data has been merged into two XLS files, one for the treatment firms and one for the control firms.
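Because each treatment firm is paired with a control firm observed over the same quarters, the two can be compared directly. The sketch below illustrates one possible comparison: the change in a metric from the pre-breach quarters to the post-breach quarters for the treatment firm, minus the same change for its control firm. This is an illustrative analysis under stated assumptions, not the method used in the original research, and the ROA values are hypothetical.

```python
from statistics import mean

def pre_post_change(values: dict[int, float]) -> float:
    """Mean of quarters +1 and +2 minus mean of quarters -2 and -1."""
    pre = mean(values[q] for q in (-2, -1))
    post = mean(values[q] for q in (1, 2))
    return post - pre

# Hypothetical pretax ROA (in percent) keyed by quarter offset.
treatment_roa = {-2: 4.0, -1: 4.0, 1: 3.0, 2: 3.0}
control_roa = {-2: 5.0, -1: 5.0, 1: 5.5, 2: 5.5}

# Negative values mean the treatment firm fared worse than its control.
relative_change = pre_post_change(treatment_roa) - pre_post_change(control_roa)
```

Here the treatment firm's ROA falls by 1.0 percentage point while the control firm's rises by 0.5, giving a relative change of -1.5 percentage points.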
Conclusions
In this document, a collection of datasets has been presented to advance research on information security breaches. Data regarding disclosed information security breaches has been collected through manual search of the Internet. In addition, short term and long term financial data has been retrieved from Refinitiv (LSEG). The breach dataset includes data on 504 companies, of which 97 experienced a disclosed information security breach and had all necessary data available. For these 97 firms, both short term and long term financial data has been collected. The datasets cover information security breaches affecting S&P500 companies from 2020 to 2023.
References
- Bharadwaj, A.S. A Resource-Based Perspective on Information Technology Capability and Firm Performance: An Empirical Investigation. MIS Q. 2000, 24, 169–196.
- Campbell, K.; Gordon, L.A.; Loeb, M.P.; Zhou, L. The Economic Cost of Publicly Announced Information Security Breaches: Empirical Evidence from the Stock Market. J. Comput. Secur. 2003, 11, 431–448.
- Gordon, L.A.; Loeb, M.P.; Zhou, L. The Impact of Information Security Breaches: Has There Been a Downward Shift in Costs? J. Comput. Secur. 2011, 19, 33–56.
- Hunton, J.E.; Lippincott, B.; Reck, J.L. Enterprise Resource Planning Systems: Comparing Firm Performance of Adopters and Nonadopters. Int. J. Account. Inf. Syst. 2003, 4, 165–184.
- Ko, M.; Dorantes, C. The Impact of Information Security Breaches on Financial Performance of the Breached Firms: An Empirical Investigation. J. Inf. Technol. Manag. 2006.
- Paul, S.; Das, S. Public Disclosure of Information Security Breach Incidents: Short-term Stock Market Reaction on Indian Listed Firms. J. Organ. Comput. Electron. Commer. 2024, 1–29.
- Tripathi, M.; Mukhopadhyay, A. Does Privacy Breach Affect Firm Performance? An Analysis Incorporating Event-Induced Changes and Event Clustering. Inf. Manag. 2022, 59.
Table 1.
Tabular description of all information in the datasets.
Subject | Information security / economics
Specific subject area | Information security breaches and firm performance measures
Type of data | Six Microsoft Excel documents (.xls)
How data were acquired | Manual Internet search using Google Chrome; financial data from Refinitiv Workspace (LSEG)
Data format | Raw
Parameters for data collection | Data has been collected on disclosed information security breaches between 2020 and 2023 to capture recent data and ensure relevance to recent trends and developments. This process involved sourcing from various highly visible media outlets, as these are likely to be followed by investors and consumers, which makes an effect more likely to be detectable. Specific keywords were used to identify relevant events. This research focused on S&P500 companies, as their security breaches are likely to attract attention and their data availability is high. Several data collection and filtering steps were taken to ensure data integrity.
Description of data collection | The event data collection process involved identifying disclosed information security breaches among S&P500 companies between 2020 and 2023. This was done by manually searching various highly visible media outlets, including technology platforms, news sites and firms’ websites. Four keywords were used to identify disclosed information security breaches. Initially, 205 breached firms were identified, with data collected on disclosure date, insider involvement and the identifier of the breach. Matching control firms were selected based on industry SIC codes and size similarity. Financial data, including stock data and several accounting measures, was obtained from Refinitiv (LSEG) for both short and long term impact assessment.
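The selection of control firms by SIC code and size similarity can be sketched as follows. The exact matching rule is not given in the text, so this example assumes an exact match on the four-digit SIC code and picks the candidate with the smallest absolute size difference; the field names and firms are hypothetical.

```python
from typing import Optional

def match_control(breached: dict, candidates: list[dict]) -> Optional[str]:
    """Return the ticker of the closest-sized candidate with the same SIC code.

    Returns None when no candidate shares the SIC code, which corresponds to
    the "unknown" entries in the control firm ticker column.
    """
    same_industry = [
        c for c in candidates
        if c["sic"] == breached["sic"] and c["ticker"] != breached["ticker"]
    ]
    if not same_industry:
        return None
    closest = min(same_industry, key=lambda c: abs(c["size"] - breached["size"]))
    return closest["ticker"]

# Hypothetical firms, not taken from the dataset.
breached = {"ticker": "AAA", "sic": "7372", "size": 900.0}
candidates = [
    {"ticker": "BBB", "sic": "7372", "size": 400.0},
    {"ticker": "CCC", "sic": "7372", "size": 1000.0},
    {"ticker": "DDD", "sic": "6021", "size": 905.0},
]

control = match_control(breached, candidates)
no_match = match_control({"ticker": "EEE", "sic": "1311", "size": 50.0}, candidates)
```

In this example DDD is closest in size but operates in a different industry, so CCC is selected instead.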
Table 2.
Information security breach dataset overview.
Variable | Type | Description
Ticker | Categorical | Ticker symbol of the S&P500 company to uniquely identify the company
Company name | Text | Name of the S&P500 company
Sector | Categorical | Sector in which the S&P500 company operates
Disclosed information security breach between 2020 and 2023 | Boolean | Indicates whether the company experienced a disclosed information security breach between 2020 and 2023 (yes or no)
Date of disclosure | Date | Date of first disclosure of the information security breach
Short description | Text | Brief description of the information security breach
Source | Text | URL of the first disclosure of the information security breach
Additional source | Text | Additional URL used to obtain the URL of the first disclosure of the information security breach
Insider involvement | Categorical | Involvement of insiders and outsiders in the information security breach (insider error, malicious insider or outsider)
Identifier of the breach | Categorical | Identifier of the information security breach (company itself, hacker or third party)
SIC code | Categorical | Four-digit Standard Industrial Classification (SIC) code of the breached firm
Firm size (in thousands) | Numerical | Size of the breached company in proportion to industry averages (given that they have more than 100 million euros operational revenue, more than 200 million euros total assets and more than 1000 employees)
Control firm ticker | Categorical | Ticker symbol of the related S&P500 control firm
First stock date | Date | Starting date from which stock prices of the breached and control firm are necessary, 125 days before the event date
Last stock date | Date | End date until which stock prices of the breached and control firm are necessary, 125 days after the event date
Table 3.
Insider involvement column specification.
Values | Description
Insider error | The information security breach was the result of an unintentional action or mistake by someone within the organization, for example through phishing
Malicious insider | The information security breach was caused by someone within the organization with malicious intent
Outsider | The information security breach was caused by an external hacker or hacker group who gained unauthorized access without the involvement of an insider
Table 4.
Identifier of the breach column specification.
Values | Description
Organization itself | The information security breach was identified internally by the breached organization
Hacker | The information security breach was identified by, or disclosed to the breached firm by, the external hacker who gained unauthorized access, for example through ransomware
Third party | The information security breach was identified by a benign third party, for example a cybersecurity firm or regulatory agency
Table 5.
Short term financial dataset overview.
Variable | Type | Description
Ticker | Categorical | Ticker symbol of the S&P500 company to uniquely identify the company
Company name | Text | Name of the S&P500 company
Date | Date | Date for which stock data is recorded
Stock close | Numerical | The closing price of the company’s stock on the given date in USD
Stock net | Numerical | The net change in the company’s stock price compared to the previous trading day in USD
Percentage change | Numerical | The percentage change in the company’s stock price compared to the previous trading day
Stock open | Numerical | The opening price of the company’s stock on the given date in USD
Low | Numerical | The lowest price of the company’s stock during the given trading date in USD
High | Numerical | The highest price of the company’s stock during the given trading date in USD
Volume | Numerical | The total number of shares traded for the company’s stock on the given date, including both buying and selling
Turnover | Numerical | The total value of shares traded for the company’s stock on the given date in USD
Flow | Numerical | The flow of the company’s stock on the given date, indicating overall direction of trading activity through a comparison of inflow (stock bought) and outflow (stock sold)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).