4.1. Data Sources
The armed conflict activity variables were sourced from the Armed Conflict Location and Event Data Project (ACLED) (Raleigh et al. 2010). We also retrieved geo-coded maps from the Humanitarian Data Exchange (Humanitarian Data Exchange, 2021, “Iraq - Subnational Administrative Boundaries,” accessed December 19, 2022, https://data.humdata.org/dataset/cod-ab-irq), the European Centre for Medium-Range Weather Forecasts of the Copernicus Climate Change Service (Muñoz-Sabater et al. 2021), NASA (Huffman et al. 2009; Running and Mu 2015; McNally et al. 2017), the Center for International Earth Science Information Network of Columbia University (Gridded Population of the World 2018), and MapSPAM (International Food Policy Research Institute 2019). The latter maps geo-code explanatory variables as grids. Our unit of analysis is the Iraqi municipality, and our sample includes all 294 Iraqi municipalities. Our time horizon runs from January 1, 2020, to January 1, 2022. The observations across this time horizon were aggregated into a single cross-section; we specify the aggregation methods below. We chose this time horizon because of the high availability of recent data. Whenever a grid was not available for the entire horizon, we sourced the grid for the corresponding variable across the portion of the horizon that was available.
Despite the rationale for including other explanatory variables pertaining to society and politics (Yin 2020; Adano et al. 2012; Raleigh 2010), Iraq is a data-scarce country, and such variables are generally unavailable for Iraq at an appropriate resolution. We discuss this unavailability in the context of our research limitations in the discussion section.
Conflict activity variables
We retrieved several armed conflict variables from ACLED (Raleigh et al. 2010). Values of the conflict activity variables are geo-coded as counts, each associated with specific geographical coordinates on a map. As conflict types, ACLED specifies battles, explosions/remote violence, violence against civilians, protests, riots, and strategic developments. ACLED also reports fatalities for each such conflict type (Raleigh et al. 2010); the reported fatalities should be understood in terms of ACLED’s additional qualifications (Raleigh et al. 2010).
Conflict Fatalities: Sourced from ACLED, our outcome variable was the count of total conflict fatalities. We proxy the outcome variable as the sum of counted fatalities that are attributed to each of the specific conflict event types.
Civilian Fatalities: From ACLED, we also sourced the count of civilian fatalities. Violence against civilians is specified as deliberate infliction of violence on unarmed non-combatants by an organized armed faction (Raleigh et al. 2010). These events include sexual violence attacks, abductions, or acts of forced disappearance (Raleigh et al. 2010). We sourced this variable as the count of fatalities for the conflict event that is designated as acts of violence against civilians.
Conflict Events: From ACLED, we also sourced conflict events. To proxy the counted conflict events, we summed counts for each of the specific conflict event types (i.e., battles, explosions/remote violence, violence against civilians, protests, riots, and strategic developments).
Explanatory Variables
Climatological Processes
Climate is the long-term weather pattern characterizing an area. Climatological conditions pertain to weather. We specified the climatological variables in accordance with Sakaguchi, Varughese, and Auld (2017). Because temperature, precipitation, and heat have been already hypothesized and empirically shown to be associated with and cause violent conflict (Hsiang, Burke, and Miguel 2013), we selected these variables to describe the weather conditions in Iraq.
Soil Temperature: Temperature is a physical quantity that expresses how hot matter is (Glickman and Zenk 2000; Inventing Temperature: Measurement and Scientific Progress 2005). We selected soil temperature, because it should impact the growth and cultivation of agricultural resources, as well as availability of water. From European Centre for Medium-Range Weather Forecasts’ ERA5-Land dataset (Muñoz-Sabater et al. 2021), we sourced soil temperature at the 28-100 cm depth: Corresponding to the resolution of 11132 square meters, a pixel on the map shows a value of temperature in Kelvins.
Precipitation Index: Precipitation is condensed water vapor that falls from clouds as raindrops, pellets or needles of ice, hail, or snowflakes (Glickman and Zenk 2000). We sourced this variable from NASA and Japan Aerospace Exploration Agency’s Integrated Multi-Satellite Retrievals for Global Precipitation Measurement Data as the quality index (The notion of quality refers to the degree of confidence in the index (Huffman, 2019)) of precipitation for monthly data (Huffman 2019): A pixel on the geo-coded map shows a value of precipitation in millimeters per 11132 square meters (Huffman et al. 2009).
Latent Energy: Also referred to as latent heat, latent energy refers to energy released from the Earth’s surface to the atmosphere that is associated with evaporation or condensation of water vapor at the Earth’s surface (Glickman and Zenk 2000), and, therefore, shows the climatological process that goes beyond temperature and precipitation, but impacts the physical surroundings. We sourced average latent heat flux (Running and Mu 2015) from NASA and United States Geological Survey’s MODIS 006 MOD16A2 dataset, where the flux pertains to the average latent heat that passes through matter. Corresponding to the resolution of 500 square meters, a pixel on the map shows a value in Joules.
Environmental scarcity of vital resources
Environmental scarcity refers to scarcity of vital resources for which human communities directly and vitally depend on their physical surroundings. Since scarcity of agricultural resources can catalyze violent conflict (Sakaguchi, Varughese, and Auld 2017), the earlier findings on rice production informed our proxy of crops in Iraq (Caruso, Petrarca, and Ricciuti 2016).
Rice Production: From Global Spatially Disaggregated Crop Production Statistics Data for 2010 Version 2.0 of MapSPAM, we sourced the production of rice for rainfed portion of the crop (International Food Policy Research Institute 2019). Corresponding to the resolution of 10000 square meters, a pixel on the map shows a value in metric tons. Given the constrained time horizon of the dataset, this variable was for 2017 as the most recent observation.
Demographics
Population Density: Since density of population has been found relevant for armed conflict activity (Raleigh and Hegre 2009), we sourced population density from Gridded Population of the World Version 4.11 of Center for International Earth Science Information Network at Columbia University (Gridded Population of the World 2018). Corresponding to the resolution of 927.67 square meters, a pixel on the map shows the estimated number of people per 30 arc-second grid cells. Given the constrained time horizon of the dataset, this variable was sourced for 2020 as the most recent observation.
From geo-coded observations to municipal values
All the variables are geo-coded. On each geo-coded map, values of the corresponding variable are associated with specific geographical coordinates. Since our unit of analysis is the Iraqi municipality, a municipal value of a given variable must be an aggregate of the geo-located values of that variable within that municipality. However, the geo-coded maps that store these variables do not contain the municipal borders. Thus, a shapefile with the municipal borders had to be overlaid on each such map if any geographical aggregation of pixel values was to be conducted by municipality. We sourced this shapefile from the UN OCHA Humanitarian Data Exchange. The shapefile stores the municipal borders as a geometry variable (i.e., a polygon). The 294 municipal borders are shown in Figure 5.
Figure 5.
Municipal borders in Iraq.
Furthermore, the conflict activity variables are geolocated. For each municipality, we counted the geolocated values of each conflict activity variable that were reported to have occurred within that municipality’s polygon. This procedure was applied to the counts of conflict events, civilian fatalities, and conflict fatalities. Figure 6 shows the count of conflict fatalities across the Iraqi municipalities.
Figure 6.
Civilian fatalities visualized across the Iraqi municipalities.
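The counting step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the polygon coordinates, the event records, and the function names `point_in_polygon` and `count_by_municipality` are hypothetical, and real municipal borders would normally be read from the shapefile with a GIS library.

```python
from collections import defaultdict

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_by_municipality(events, polygons):
    """Sum a per-event value (e.g., fatalities) over the events inside each polygon."""
    totals = defaultdict(int)
    for lon, lat, value in events:
        for name, polygon in polygons.items():
            if point_in_polygon(lon, lat, polygon):
                totals[name] += value
                break  # assume an event belongs to at most one municipality
    return dict(totals)

# Hypothetical toy data: two square "municipalities" and three geolocated events.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
events = [(0.5, 0.5, 3), (1.5, 0.2, 1), (0.2, 0.9, 2)]  # (lon, lat, fatalities)
fatalities = count_by_municipality(events, polygons)
```

The same function yields event counts if every event carries the value 1 instead of a fatality count.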
Finally, the available grids store values of the explanatory variables as pixels. Each such pixel is associated with specific geographical coordinates. We bounded the pixel values by the municipal polygons as in the previous case. Because the explanatory variables, unlike the conflict events, are not stored sparsely, we aggregated their values in such a way that each aggregation emphasized extreme values of the observed explanatory variable. Specifically, pixel values of precipitation were aggregated geographically as pixel maxima and temporally as municipal maxima. Pixel values of soil temperature were aggregated geographically as pixel sums and temporally as municipal sums. Pixel values of average latent heat flux were aggregated geographically as pixel maxima and temporally as municipal standard deviations. Moreover, pixel values of rice production were aggregated geographically as pixel minima and temporally as municipal minima. Finally, pixel values of population density were aggregated geographically as pixel maxima for the year 2020, the most recent observation. Exemplifying the output, Figure 7 shows the resulting map for population density.
Figure 7.
Population density visualized across 294 municipalities in Iraq.
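The two-step aggregation described above (first across pixels within a municipality, then across time) can be sketched as follows. The pixel values, the dictionary layout, and the function name `municipal_value` are hypothetical, and the use of the sample standard deviation for latent heat flux is an assumption, since the text does not state which variant was used.

```python
import statistics

# Hypothetical pixel values for ONE municipality: one list of pixels per month.
monthly_pixels = {
    "precipitation": [[1.0, 4.0, 2.0], [0.5, 3.0, 6.0]],
    "soil_temperature": [[290.0, 291.0], [295.0, 296.0]],
    "latent_heat_flux": [[10.0, 30.0], [20.0, 50.0]],
}

# (geographic reducer over pixels, temporal reducer over months), following the text.
reducers = {
    "precipitation": (max, max),
    "soil_temperature": (sum, sum),
    "latent_heat_flux": (max, statistics.stdev),  # sample std. dev. is an assumption
}

def municipal_value(variable):
    """Reduce pixels within each month, then reduce the monthly values over time."""
    spatial, temporal = reducers[variable]
    per_month = [spatial(pixels) for pixels in monthly_pixels[variable]]
    return temporal(per_month)

municipal = {name: municipal_value(name) for name in monthly_pixels}
```

Keeping the reducers in a lookup table makes the per-variable choices explicit and easy to audit against the description in the text.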
4.2. Methods
Causal methodology generally requires experimentation (Rubin 1974, 1978, 2005; Holland 1986). However, it is nowadays possible to infer and model causality even with non-experimental observations (Robins 1986; Spirtes, Glymour, and Scheines 1993; Hitchcock and Pearl 2001; Nichols 2007; Pearl 2009). Acknowledging the need for a non-experimental approach to environmental security, we argue that it is possible to unpack the black-box relationship at the core of the climate-conflict nexus. By applying causal methodology to non-experimental observations, the causal paths and effects behind the causal mechanism of climate-conflict linkages can be respectively disentangled and quantified. This is done in the three stages outlined in Table 1.
Table 1.
The Stages of Causal Inference.
| Causal Discovery | Causal Identification | Causal Estimation |
| --- | --- | --- |
| Retrieval of a causal structure from non-experimental observations. | Verification if a causal query has a unique answer and, if so, formulation of a causal effect as a quantity that is yet to be estimated (i.e., causal estimand). | Calculation of a causal effect as a quantity that has been estimated (i.e., causal estimate), and testing that its probability is not due to randomness. |
The three subsections that follow explain in stages how causality is respectively discovered, identified, and inferred from non-experimental observations. The subsections exemplify response to a causal query: “What is the magnitude of causal effect of soil temperature on the count of conflict fatalities?”
Causal Discovery
The purpose of causal discovery is to retrieve a causal structure from available observations (Malinsky and Danks 2017). Such structures can be modeled graphically (Pearl 2009). Each directed edge of such a causal graph represents causation between the node with an outgoing arrow and the node with an incoming arrow, respectively referred to as a cause and an effect (Pearl 2009). The graph on the left in Figure 8 contains a directed edge: Soil Temperature → Conflict Fatalities, indicating that a change in soil temperature causes a change in conflict fatalities. The graph in the middle, however, contains a bidirected edge: Soil Temperature ↔ Conflict Fatalities.
Figure 8.
Bidirected edges and causal cycle in a causal structure.
Each node between a node with only outgoing arrows (i.e., a root cause) and a node with only incoming arrows (i.e., an effect or outcome) is a mediating node (for instance, Population Density). Further, the graph on the right in Figure 8 is characterized by a causal cycle: Soil Temperature → Conflict Fatalities → Population Density → Soil Temperature. Despite recent theoretical advances (Bongers et al. 2021), the simplest conception of causality requires the absence of bidirected edges and causal cycles from a causal graph, as they respectively point to hidden common causes and reverse causality, both of which can confound causal inference (Pearl 2009). Graphs without bidirected edges and cycles are referred to as directed acyclic graphs (DAGs). The simplest conception of causality requires causal discovery to retrieve a DAG from available observations (Malinsky and Danks 2017).
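The acyclicity requirement can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm, a standard technique that is not named in the text; the function name `is_acyclic` and the second edge list (`dag_edges`) are hypothetical illustrations. The check covers directed cycles only; bidirected edges would have to be represented and rejected separately.

```python
from collections import deque

def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff all nodes can be ordered."""
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return ordered == len(nodes)

nodes = ["Soil Temperature", "Population Density", "Conflict Fatalities"]

# The causal cycle described in the text.
cyclic_edges = [
    ("Soil Temperature", "Conflict Fatalities"),
    ("Conflict Fatalities", "Population Density"),
    ("Population Density", "Soil Temperature"),
]

# A hypothetical acyclic alternative with Population Density as a mediating node.
dag_edges = [
    ("Soil Temperature", "Conflict Fatalities"),
    ("Soil Temperature", "Population Density"),
    ("Population Density", "Conflict Fatalities"),
]

cycle_is_dag = is_acyclic(nodes, cyclic_edges)
alternative_is_dag = is_acyclic(nodes, dag_edges)
```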
Following this logic of the causal discovery stage, we applied the Greedy Equivalence Search (GES) algorithm (Malinsky and Danks 2017) to retrieve the entire DAG from the available observations. The scoring criterion we used was the Bayesian Information Criterion. The output of GES was the likeliest DAG, given our observations (Malinsky and Danks 2017). The nodes of the DAG correspond to our armed conflict activity and explanatory variables. The edges correspond to the respective causal relationships between them (Malinsky and Danks 2017).
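For reference, the Bayesian Information Criterion in its standard form is given below. The symbols $\hat{L}$, $k$, and $n$ are introduced here for illustration only: $\hat{L}$ is the maximized likelihood of the model implied by a candidate graph, $k$ is its number of free parameters, and $n$ is the number of observations. In this sign convention a lower value is preferred, so the criterion rewards fit while penalizing additional edges.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```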
Causal Identification
The purpose of causal identification is to determine if, given a causal structure, the causal query has a unique answer (Shpitser and Pearl 2008). Otherwise, the query is unidentifiable. The identification also serves the purpose of formulating a quantity that is the unique answer to the query (Shpitser and Pearl 2008; Pearl 2009). A formula that enables quantification of that answer is referred to as an estimand (Lundberg, Johnson, and Stewart 2021).
Given the Soil Temperature → Conflict Fatalities arrow in Figure 9, a node such as population density opens an alternative path from soil temperature to conflict fatalities. If such a node is not explicitly accounted for, it is referred to as a confounder (Pearl 2009). If it is possible to exhaustively account for confounders, it is also possible to identify a causal query.
Figure 9.
Presence of a confounder in causal identification.
Given the probabilistic interpretation of causal graphs (Pearl 2009), let $P$ be a conditional probability distribution, let $F$, $T$, and $D$ respectively be the conflict fatalities, soil temperature, and population density variables, let $f$, $t$, and $d$ be the realized values of these variables respectively, and let $do(\cdot)$ be the operator that encodes interventions (Pearl 2012). If no variable were associated with soil temperature and conflict fatalities in Figure 9, then the unique answer to our causal query would have been $P(F=f \mid do(T=t)) = P(F=f \mid T=t)$. However, since population density opens an alternative causal path between these two variables, not accounting for population density may preclude the determination of a unique answer to our causal query. Hence, the confounding effect of population density must be eliminated from the answer if our query is to be identified; that is, the confounding effect must be marginalized. Thus, the causal graph in Figure 9 makes it possible to identify the unique answer to our causal query. Specifically, $P(F=f \mid do(T=t)) = \sum_{d} P(F=f \mid T=t, D=d)\,P(D=d)$ formulates the unique answer to the query.
This reasoning exemplifies an identification criterion known as backdoor criterion (Pearl 2009). For other identification methods, we refer the reader to Tian and Pearl (2002) and Shpitser and Pearl (2008). Having applied the above reasoning to the retrieved DAG, we respectively identified causal estimands of our explanatory variables, whenever access to our available variables made identification possible.
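The backdoor adjustment can be illustrated with a small numerical example. All probabilities below are hypothetical and chosen only for illustration; the variables are simplified to binary values, with D standing for population density, T for soil temperature, and F for the occurrence of at least one fatality. The name `p_f_do_t` is likewise an assumption of this sketch.

```python
# Hypothetical toy model with binary variables.
p_d = {0: 0.7, 1: 0.3}  # P(D = d)
p_f_given_t_d = {       # P(F = 1 | T = t, D = d)
    (0, 0): 0.10, (0, 1): 0.40,
    (1, 0): 0.20, (1, 1): 0.60,
}

def p_f_do_t(t):
    """Backdoor adjustment: sum over d of P(F=1 | T=t, D=d) * P(D=d)."""
    return sum(p_f_given_t_d[(t, d)] * p_d[d] for d in p_d)

# Interventional contrast between the two temperature settings.
effect = p_f_do_t(1) - p_f_do_t(0)
```

The adjustment weights each stratum of the confounder by its marginal probability rather than by its probability conditional on the treatment, which is what removes the confounding path.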
Causal Estimation and Hypothesis Testing
The last stage of causal inference includes estimation and hypothesis testing. In this stage, an estimator (i.e., a calculation method for estimation purposes), along with sampled observations, is applied to a causal estimand (Pearl 2009; Lundberg, Johnson, and Stewart 2021). This application results in an estimated causal quantity of causal effect, i.e., causal estimate. Eventually, an assessment is made if the estimate should be attributed to random error. Otherwise, the quantity is considered statistically significant.
The histogram in Figure 10 suggests that fatality counts can be modeled with a Poisson distribution, $P(F) \sim \mathrm{Poisson}(\lambda)$, where $\lambda$ is the expected value of fatality counts, $\lambda = E[F]$. The probability of an observed fatality count, $f$, is then stated as $P(F=f) = \frac{\lambda^{f} e^{-\lambda}}{f!}$, where $e$ is the base of the natural logarithm.
Figure 10.
Histogram of counted conflict fatalities.
Given the histogram in Figure 10, we can resort to Poisson regression to calculate causal estimates as Poisson regression coefficients. A Poisson coefficient quantifies the change in the log of the expected outcome count that is associated with a one-unit change in a single explanatory variable, holding all the other explanatory variables constant. If the two explanatory variables from Figure 9 can be assumed to combine linearly, then the expected value of fatality counts is stated as $\lambda = e^{\beta_0 + \beta_T t + \beta_D d}$, where $\beta_T$ is the Poisson coefficient for soil temperature, $\beta_D$ is the Poisson coefficient for population density, and $\beta_0$ is a constant. In log terms, $\log \lambda = \beta_0 + \beta_T t + \beta_D d$. The latter formula shows why commonly reported parameters of Poisson regression are stated in logs of expected counts.
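The Poisson probability and the log-linear expected count can be evaluated directly. In the sketch below, the coefficient values, the covariate values, and the function names `poisson_pmf` and `expected_count` are hypothetical; they are not estimates from the study.

```python
import math

def poisson_pmf(count, rate):
    """Probability of observing `count` events for a Poisson variable with mean `rate`."""
    return rate ** count * math.exp(-rate) / math.factorial(count)

def expected_count(soil_temp, pop_density, b0, b_temp, b_dens):
    """Log-linear mean: exp(b0 + b_temp * soil_temp + b_dens * pop_density)."""
    return math.exp(b0 + b_temp * soil_temp + b_dens * pop_density)

# Hypothetical coefficients and covariate values, chosen only for illustration.
rate = expected_count(soil_temp=2.0, pop_density=1.0, b0=0.1, b_temp=0.3, b_dens=0.2)

# Probabilities of observing 0, 1, ..., 59 fatalities at this expected count.
probabilities = [poisson_pmf(k, rate) for k in range(60)]
```

Taking the logarithm of `rate` recovers the linear predictor, which is why the coefficients are read on the log scale.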
Given the observed soil temperature and population density variables, the conditional probability of a single observed fatality count is stated as $P(F=f \mid T=t, D=d) = \frac{\lambda^{f} e^{-\lambda}}{f!}$, where $\lambda$ is the expected value of fatality counts. The previous subsection established that $P(F=f \mid do(T=t)) = \sum_{d} P(F=f \mid T=t, D=d)\,P(D=d)$, where $P(D=d)$ is the probability of the population density variable. The population density is equal to the population count per pixel, where a pixel is equal to 927.67 square meters (CIESIN - Columbia University, 2018). The histogram in Figure 11 shows that the population counts can also be modeled with a Poisson distribution, $P(D) \sim \mathrm{Poisson}(\mu)$, where $\mu$ is the expected value of population counts. Assuming the latter, the probability of an observed population count, $d$, is stated as $P(D=d) = \frac{\mu^{d} e^{-\mu}}{d!}$.
Figure 11.
Histogram of population counts.
If we apply an intervention that sets the soil temperature variable to value $t$, the probability of an observed fatality count, $f$, is stated as $P(F=f \mid do(T=t)) = \sum_{d} \frac{\lambda_{t,d}^{f} e^{-\lambda_{t,d}}}{f!} \cdot \frac{\mu^{d} e^{-\mu}}{d!}$, where $\lambda_{t,d}$ is the expected fatality count given $T=t$ and $D=d$, and $\mu$ is the expected population count. Assuming independently and identically distributed observations of fatality counts $f_1, \ldots, f_n$, the probability of an entire sample of such observations is stated as $\prod_{i=1}^{n} \sum_{d} P(F=f_i \mid T=t, D=d)\,P(D=d)$, where the intervention sets soil temperature to value $t$ across all the observations of fatality counts. The index $i$ is required for the calculation of the probability of the entire sample of independently and identically distributed observations of fatality counts, and the index $d$ is required for marginalizing the population counts that could otherwise confound the calculation of the estimate. To find the coefficients that maximize the probability of the entire sample of observations, we resort to maximum likelihood estimation (Bertsimas and Nohadani 2019).
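The marginalized probability and the sample likelihood can be sketched numerically. Everything specific below is an assumption made for illustration: the coefficient values, the expected population count `mu`, the truncation of the sum over population counts at `d_max`, and the function names. The maximization step itself is omitted; the sketch only evaluates the log-likelihood at one set of parameter values.

```python
import math

def poisson_pmf(count, rate):
    """Poisson probability of `count` given expected value `rate`."""
    return rate ** count * math.exp(-rate) / math.factorial(count)

def p_fatalities_do_temp(f, t, b0, b_temp, b_dens, mu, d_max=60):
    """Approximate sum over d of P(F=f | T=t, D=d) * P(D=d), truncated at d_max."""
    total = 0.0
    for d in range(d_max + 1):
        lam = math.exp(b0 + b_temp * t + b_dens * d)  # expected fatalities given t and d
        total += poisson_pmf(f, lam) * poisson_pmf(d, mu)
    return total

def sample_log_likelihood(counts, t, **params):
    """Log of the product of marginalized probabilities over i.i.d. observations."""
    return sum(math.log(p_fatalities_do_temp(f, t, **params)) for f in counts)

# Hypothetical parameter values and observations.
params = dict(b0=0.0, b_temp=0.2, b_dens=0.05, mu=3.0)
observed = [0, 1, 2, 1, 0]
log_lik = sample_log_likelihood(observed, t=1.0, **params)
```

A maximum likelihood routine would search over the parameter values for the set that makes `log_lik` as large as possible.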
A Poisson coefficient is interpreted as a difference in the logs of expected fatality counts. Specifically, let $\lambda_{t}$ and $\lambda_{t+1}$ designate the expected count of fatalities for soil temperature that the intervention respectively sets to $t$ and $t+1$. The difference between the subscripts $t+1$ and $t$ refers to a one-unit change in soil temperature $T$. The Poisson coefficient for soil temperature is then stated as $\beta_T = \log \lambda_{t+1} - \log \lambda_{t}$. Moreover, under this intervention, only the coefficient for soil temperature is interpreted causally. A one-unit change in soil temperature is expected to change the log of the expected fatality count by the Poisson coefficient for soil temperature, given that population density is held constant.
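This interpretation can be verified with a short calculation. The coefficient values and the name `expected_fatalities` are hypothetical, chosen only for illustration.

```python
import math

b0, b_temp, b_dens = 0.1, 0.3, 0.2  # hypothetical coefficients
density = 1.0                       # population density held constant

def expected_fatalities(temp):
    """Expected fatality count when the intervention sets soil temperature to `temp`."""
    return math.exp(b0 + b_temp * temp + b_dens * density)

# Difference in logs of expected counts for a one-unit change in soil temperature.
log_difference = math.log(expected_fatalities(3.0)) - math.log(expected_fatalities(2.0))

# Equivalent multiplicative reading: the ratio of expected counts.
rate_ratio = expected_fatalities(3.0) / expected_fatalities(2.0)
```

The log difference equals the coefficient regardless of the starting temperature or the constant density, and exponentiating the coefficient gives the multiplicative change in the expected count.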
Furthermore, if the real effect is hypothesized not to exist (the null hypothesis), the significance level is the probability of incorrectly rejecting the null hypothesis (also referred to as the probability of a Type I error). If the estimate’s probability under the null hypothesis (i.e., the $p$-value) is smaller than the selected significance level (typically set at 5%, 1%, or 0.1%), then that estimate is too improbable for the null hypothesis to hold. Otherwise, the inference fails to reject the null hypothesis. Specifically, if the null hypothesis states that the causal effect of soil temperature does not exist, and the $p$-value for the Poisson coefficient for soil temperature is less than, for instance, 1%, then the null hypothesis is rejected at the 1% level of statistical significance. It is in this case that the response to our query (“What is the magnitude of causal effect of soil temperature on the count of conflict fatalities?”) can be stated as the estimated quantity. Otherwise, the null hypothesis is not rejected, and the estimated quantity is treated as statistically indistinguishable from zero.
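The decision rule can be sketched with a Wald z-test, which is a common choice for regression coefficients but is an assumption here, since the text does not name the test statistic. The estimate and standard error below are hypothetical numbers, and `two_sided_p_value` is a name introduced for this sketch.

```python
import math

def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical coefficient estimate and standard error.
p_value = two_sided_p_value(estimate=0.30, standard_error=0.10)
reject_at_1_percent = p_value < 0.01
```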
Having formulated our causal estimands, we implemented our estimation as a generalized linear regression with a negative-binomial distribution rather than a Poisson distribution (Hilbe 2011). A generalization of the Poisson distribution, the negative-binomial distribution additionally makes it possible to account for overdispersion of conflict fatalities, which arises when the variance of conflict fatalities exceeds their mean and which may originate from conflict stickiness. However, the interpretation of the coefficients remains the same. By applying the maximum likelihood estimator (Bertsimas and Nohadani 2019) to our regression, we calculated the coefficients of our explanatory variables that yielded the maximum probability of our sample. We then tested their statistical significance. The estimation requires that our observations do not violate specific statistical and causal assumptions. We discuss these assumptions, their violations, and remedies in Section 6.
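Overdispersion can be diagnosed by comparing the sample variance of the counts with their mean. The counts below are hypothetical, and the method-of-moments value of `alpha` is only a rough diagnostic based on the NB2 variance function, Var(F) = mean + alpha * mean^2; it is not the maximum likelihood estimate used in the regression.

```python
import statistics

# Hypothetical fatality counts across ten municipalities.
counts = [0, 0, 1, 0, 2, 0, 15, 0, 3, 40]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance

# A Poisson model implies variance == mean; a larger variance signals overdispersion.
overdispersed = variance > mean

# Rough method-of-moments estimate of the NB2 dispersion parameter.
alpha = (variance - mean) / mean ** 2
```

When `alpha` is close to zero, the negative-binomial model reduces to the Poisson model; a clearly positive value supports the negative-binomial specification.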