2.1. Literature Review
2.2. Introduction
This section provides an overview of existing research and scholarly work related to thunderstorm frequency, agriculture, climate change, and machine learning. Crop productivity is an important component of food security and agricultural sustainability. Accurate modelling of crop yield is important for informed decision-making in agriculture, for resource allocation, and for understanding the impact of climate change on food production. In recent years, machine learning has gained popularity as a valuable tool for predicting a range of environmental phenomena, such as weather, tsunamis, and environmental pollution.
Machine Learning Algorithms
Machine learning offers a robust framework for handling the heterogeneous nature of agricultural data and the complex interactions between the various factors affecting crop yields. These factors include soil quality, weather conditions, seed variety, and farming practices. ML algorithms can process large datasets to identify patterns and relationships that are not immediately apparent to human analysts. Despite the potential of ML in agriculture, several challenges persist. Data quality and availability are often limiting factors: high-quality, well-curated data are required to train models effectively.
Additionally, the selection of relevant features and the interpretation of ML model outputs are important for providing actionable insights, and both present challenges. Feature selection, for instance, can be difficult, especially when dealing with high-dimensional data or datasets with a large number of variables. Feature selection aims to identify the most relevant and informative features (variables) that contribute to the predictive power of the model while removing irrelevant or redundant ones. As the number of features increases, the complexity of the model also increases, leading to higher computational costs and potential overfitting (Bellman, 1961). Irrelevant or redundant features can introduce noise and misleading patterns, making it harder for the model to learn the underlying relationships effectively (Guyon & Elisseeff, 2003). Models with a large number of features can become difficult to interpret, especially in domains where interpretability is essential, such as healthcare or finance (Ribeiro et al., 2016). In some cases, features may be highly correlated with each other, leading to multicollinearity (Dormann et al., 2013). This can make the model's parameter estimates unstable and make it difficult to determine the individual contribution of each feature. In certain domains, such as text mining or genomics, the number of features can be extremely large compared to the number of observations. Feature selection becomes crucial in these scenarios to identify the most informative features and avoid overfitting (Saeys et al., 2007).
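As a minimal sketch of one common remedy for the redundancy and multicollinearity problems described above, the snippet below computes pairwise Pearson correlations and greedily drops any feature whose absolute correlation with an already-kept feature exceeds a threshold. The threshold of 0.9, the function names, and the toy values are illustrative assumptions, not settings taken from the cited studies.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """Keep features in insertion order, skipping any whose |r| with an
    already-kept feature exceeds the threshold (hypothetical helper)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: "tcwv" is almost a copy of "tcw", so it is dropped.
features = {
    "tcw":  [40.0, 45.0, 50.0, 55.0, 60.0],
    "tcwv": [39.5, 44.8, 49.9, 55.2, 59.6],
    "cape": [300.0, 100.0, 900.0, 200.0, 700.0],
}
print(drop_correlated(features))  # → ['tcw', 'cape']
```

A greedy filter like this depends on feature order; which member of a correlated pair to keep is a modelling decision that domain knowledge should inform.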
Many advanced machine learning models, such as neural networks or ensemble methods, are often criticized for being “black boxes” (Khaki & Wang, 2019), meaning that it is difficult to understand their internal workings and the specific contribution of each feature to the final output. This lack of interpretability can be problematic in the agricultural domain, where stakeholders (farmers, policymakers, researchers) may want to understand the relationships and mechanisms driving crop productivity. According to Shahhosseini et al. (2020), interpreting the models becomes even more challenging when there are complex interactions and non-linear relationships between the features and the target variable (yield). Cao et al. (2021) likewise highlighted that the interpretability of machine learning models remains a challenge, which limits their ability to provide meaningful insights into the mechanisms and interactions linking environmental factors and crop growth processes.
Recent studies (e.g., Priyatikanto et al., 2023) have focused on improving the accuracy and reliability of ML models for crop yield prediction. For instance, a systematic review by Assous et al. (2023) highlighted the importance of selecting appropriate ML methods and features that can analyze large amounts of data and provide accurate results; the authors developed a sustainable ML model for crop yields in the Gulf countries, emphasizing the impact of variables such as rainfall, temperature changes, and nitrogen fertilizer.
Thunderstorms are meteorological events that have been studied widely because of their impact on agriculture and human activities. Doswell (2001) describes three key ingredients necessary for thunderstorm formation: moisture, instability, and a lifting mechanism. The intensity and frequency of thunderstorms are influenced by various factors, including temperature, humidity, and atmospheric dynamics (Markowski and Richardson, 2010). In West Africa, including Ghana, thunderstorms are often associated with the movement of the Intertropical Convergence Zone (ITCZ). Nicholas (2018) explains that the seasonal migration of the ITCZ plays a key role in determining the timing and intensity of thunderstorms in the region.
The relationship between thunderstorms and agricultural productivity is complex. While thunderstorms can provide the rainfall crops need, they can also cause significant damage. Rosenzweig et al. (2001) highlight how extreme precipitation events, often associated with thunderstorms, can lead to soil erosion, waterlogging, and nutrient leaching, all of which negatively affect crop yields. In Ghana specifically, Antwi-Agyei et al. (2014) found that extreme weather events, including intense thunderstorms, contribute to crop failures and food insecurity, particularly in northern Ghana, where Wa is located. Their study emphasizes the need for improved weather forecasting and agricultural adaptation strategies.
Advances in technology and data analysis have led to considerable improvements in thunderstorm modeling. Traditional approaches relied largely on numerical weather prediction (NWP) models. However, as Gijben et al. (2017) point out, machine learning techniques have shown promising results in improving the accuracy of thunderstorm predictions. Litta et al. (2012) demonstrated the effectiveness of artificial neural networks in predicting thunderstorm occurrence using parameters such as temperature, humidity, and wind speed; their model was more accurate than traditional statistical methods. In the African context, Thiaw et al. (2017) combined satellite data and machine learning algorithms to predict extreme precipitation events, including those associated with thunderstorms.
The value of thunderstorm predictions lies in their ability to inform agricultural decision-making. Crane et al. (2011) emphasize the importance of understanding local farming practices and decision-making processes when developing weather-based agricultural advisory services. In Ghana, Naab et al. (2019) found that farmers in the Upper West Region, where Wa is located, increasingly rely on weather forecasts for planting decisions. However, they also noted challenges in forecast interpretation and the need for more localized, user-friendly prediction tools.
Climate change is expected to alter thunderstorm patterns globally. Taylor et al. (2017) show that warming temperatures may lead to more intense thunderstorms in parts of Africa, including Ghana. This underscores the need for adaptable modeling approaches that can account for changing climate dynamics. Sylla et al. (2016) used regional climate models to project future rainfall patterns in West Africa; their projections indicate a potential increase in extreme precipitation events, including those associated with thunderstorms, particularly in the latter half of the 21st century.
The impact of extreme weather, including thunderstorms, on crop yield has been studied extensively. Lesk et al. (2016) conducted a global analysis of extreme weather disasters and their effects on crop production, finding that droughts and extreme heat substantially reduced national cereal production, whereas the impacts of floods and extreme cold were generally less severe. However, they noted that the effects varied regionally, emphasizing the need for localized studies. In West Africa, Roudier et al. (2011) reviewed the potential impact of climate change on crop yields, highlighting the vulnerability of rain-fed agriculture to changes in precipitation patterns, including those associated with thunderstorms.
The use of remote sensing and satellite data has greatly improved our ability to monitor and predict thunderstorm activity. Sorooshian et al. (2000) demonstrated the potential of satellite-based precipitation estimates for hydrological modeling and water resource management, which is particularly relevant for understanding the impact of thunderstorms on agriculture. Building on this line of work, Huffman et al. (2007) developed the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA), which has been widely used to study precipitation patterns in tropical regions, including West Africa.
In terms of agricultural adaptation to extreme weather events, Howden et al. (2007) emphasized the importance of developing crop varieties that are more resilient to climate variability. This approach could be particularly relevant in regions like Wa, where thunderstorms pose a risk to crop production. Additionally, Di Falco and Veronesi (2013) studied the role of crop diversification as an adaptation strategy in Ethiopia, finding that it significantly increased farm productivity in the face of climate variability.
The integration of indigenous knowledge with modern forecasting techniques presents an opportunity for more comprehensive and locally relevant thunderstorm prediction models. Cudjoe et al. (2014) explored perceptions and indigenous knowledge of climate change in Ghana, highlighting the potential for combining traditional and scientific approaches to weather forecasting. Similarly, Nyong et al. (2007) discussed the value of indigenous knowledge in climate change mitigation and adaptation strategies in the African Sahel.
While progress has been made in thunderstorm modeling and in understanding its agricultural impacts, several gaps remain. More localized studies are needed, particularly in regions like Wa, where the link between thunderstorms and agriculture is important but understudied. The development of user-friendly, localized prediction tools that integrate multiple data sources and account for climate change projections also remains a challenge.
Crop Productivity
Antwi-Agyei et al. (2012) examined the effects of climate variability and change on food security in several regions of Ghana, including the Northern Region. They highlighted the region's vulnerability to drought, unexpected heavy rainfall, and high temperatures, which have reduced crop productivity and led to low yields in certain crops. Cedric et al. (2022), in their work "Crops yield prediction based on machine learning models: Case of West African countries", examined the strong impact that climate and other factors have on crop yields, which are declining across agricultural zones in most regions of Ghana and other African countries.
Climate Change Impact on Agriculture
The impact of climate change on agriculture has been the subject of considerable research, since ensuring food security has become a global priority. The United Nations Framework Convention on Climate Change (UNFCCC, 2014) and the Intergovernmental Panel on Climate Change (IPCC, 2014) have highlighted the risks associated with changing climate patterns, including shifts in rainfall patterns and rising temperatures in Africa. These changes can significantly affect crop yields and agricultural practices. Lobell et al. (2012) conducted a study on the influence of climate change on global crop productivity in which Africa was highlighted. Their findings underscore the importance of modeling and predicting climate change impacts on crop productivity, in line with the objectives of this study.
Sustainable Agriculture and Food Security
Achieving sustainable agriculture and food security is a shared goal among data scientists, researchers, policymakers, and international organizations. The Ministry of Food and Agriculture and the Food and Agriculture Organization (2020) have consistently stressed the importance of sustainable practices to ensure food security in the face of economic growth and environmental change. Sustainable agricultural practices, such as soil management, water management, providing proper information to farmers, and making data available to researchers, play an important role in achieving better crop yields. Oikonomidis et al. (2023), in their research "Deep learning for crop yield prediction", offered insights into resilience in sustainable African agriculture. They emphasized the need to integrate modern technology and sustainable practices to ensure better crop yields and eliminate hunger.
Machine Learning in Agriculture
Machine learning (ML) has transformed various fields, including agriculture. ML techniques such as decision trees, support vector machines (SVM), and neural networks (NN) have shown promise in predicting crop yield (Cedric et al., 2022). Naveen et al. (2022) emphasized that the ability to analyse large datasets and capture complex relationships between climate and crop performance makes ML a powerful tool in agriculture and other fields.
Crop yield and climate parameters. Buenor et al. (2023) emphasized that climate parameters play a major role in crop growth and yield. Rainfall, temperature, sunlight, and humidity are among the key factors that influence crop productivity. Understanding how these parameters act and interact for specific crops is essential for effective modelling. ML algorithms excel at capturing non-linear relationships between climate parameters and crop yield, giving them an advantage over traditional statistical methods (Lontsi et al., 2022).
Subhadra et al. (2016) developed a model for corn and soybean yield forecasting that incorporates climatic factors by applying an artificial neural network (ANN). Using rainfall data and Maryland corn and soybean yield data, they predicted corn and soybean yields at state, regional, and local levels with both the ANN technique and a multiple linear regression model. Comparing the two techniques, they concluded that the ANN model gives more accurate yield predictions than multiple linear regression.
Crop-climate interaction modeling. Determining the climatic parameters that affect yields requires an understanding of the complex interactions between crop growth stages, environmental conditions, and management practices (Veenadhari et al., 2014).
Crop simulation models, such as DSSAT (Decision Support System for Agrotechnology Transfer) and APSIM (Agricultural Production Systems Simulator), can be useful tools for exploring these interactions and quantifying the impacts of climate variables on crop yields (Jones et al., 2003; Keating et al., 2003). Machine learning is a practical method that can provide better yield predictions based on many attributes. It is a subfield of Artificial Intelligence (AI) focused on learning from data: ML can discover information in datasets by identifying patterns and correlations. The models must be trained on datasets that represent outcomes from prior experience (Wigh et al., 2022).
Veenadhari et al. (2014) developed a website that serves as an interactive software tool for predicting the influence of climatic parameters on crop yields. They used the C4.5 algorithm, a popular decision tree algorithm for classification tasks in machine learning and data mining, to identify the most influential climatic parameter on the yields of selected crops in selected districts of Madhya Pradesh. The software indicates the relative influence of different climatic parameters on crop yield; other agro-input parameters responsible for crop yield are not considered in the tool, since their application varies across individual fields in space and time. Decision trees and decision rules were developed with the C4.5 algorithm and are displayed when the decision tree icon is selected. The underlying models were trained on a large dataset before being used to generate results. Using the software, the influence of climatic parameters on crop productivity was analysed for the predominant crops in the selected districts of Madhya Pradesh. For soybean, the most influential parameter in all the selected districts was cloud cover; for paddy it was rainfall, for maize it was maximum temperature, and for wheat it was minimum temperature. Their work aligns with the present study, in which crops are modeled against climate parameters.
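C4.5 chooses the attribute to split on by its gain ratio, the information gain of the split divided by the split information (which penalizes attributes with many distinct values). The sketch below computes the gain ratio for a categorical attribute; the "rainfall" levels and yield classes are hypothetical toy values, not data from Veenadhari et al. A full C4.5 implementation also searches thresholds on continuous attributes, handles missing values, and prunes the tree, all of which this sketch omits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of a categorical attribute w.r.t. the class labels."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Expected entropy remaining after splitting on the attribute
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: rainfall level perfectly separates the yield classes.
rainfall = ["low", "low", "high", "high"]
yield_class = ["poor", "poor", "good", "good"]
print(gain_ratio(rainfall, yield_class))  # → 1.0
```

An attribute that carries no information about the class, such as ["hot", "cold", "hot", "cold"] for the same labels, scores 0.0, so the tree would prefer to split on rainfall.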
Several agricultural, soil, and environmental factors, such as temperature, humidity, rainfall, moisture, and pH, affect agricultural production. Farmers continue to use the traditional methods they learned from their forefathers. However, these methods were developed when the climate was more stable and seasons followed a predictable schedule, and much has changed because of global warming and numerous other factors (Morales and Francisco, 2023). Existing methods address some of these issues but have several drawbacks, such as low spatial resolution, difficulty of real-time implementation, and limited accuracy. To address these concerns, a novel technique called interfused machine learning with an advanced stacking ensemble model has been proposed for accurate prediction of various crops. The impact of changing climatic conditions on crop productivity and yield is an important area of research. Studies have shown that rising temperatures can cause heat stress, negatively affecting crop growth and reducing yields (Lobell and Gourdji, 2012). In addition, shifts in precipitation patterns, including more frequent droughts and intense rainstorms, can reduce yields by limiting the water available to crops or by damaging crops through flooding (Lobell et al., 2011). A detailed review of these climatic impacts across crops and regions is necessary to understand the extent and mechanisms of these effects.
Ensemble modeling techniques. To improve the accuracy and robustness of crop production models, ensemble modeling techniques can be employed. These techniques combine multiple machine learning algorithms or models, leveraging their strengths and mitigating their weaknesses (Paudel et al., 2021). Ensemble methods such as boosting, bagging, and stacking can enhance the predictive performance of crop yield models.
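To illustrate bagging (bootstrap aggregating), one of the ensemble methods named above, the sketch below fits a simple least-squares line to each bootstrap resample of the training data and averages the resulting predictions. The linear base learner, the number of models, the seed, and the rainfall-yield values are illustrative assumptions; in practice the base learner is usually a high-variance model such as a decision tree, as in random forests.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    # If a resample contains a single distinct x, fall back to a flat line.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of n_models lines, each fit to a bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    return sum(preds) / len(preds)  # aggregate by averaging

# Hypothetical seasonal rainfall (mm) and yield (t/ha) values.
rainfall = [100.0, 150.0, 200.0, 250.0, 300.0]
yields = [1.0, 1.4, 2.1, 2.4, 3.0]
print(bagged_predict(rainfall, yields, 220.0))
```

Averaging over resamples mainly reduces variance; boosting instead fits models sequentially to the errors of earlier ones, and stacking trains a meta-model on the base models' outputs.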
An ML model can be descriptive or predictive, depending on the research topic and questions. Predictive models use past knowledge to predict what will happen in the future, whereas descriptive models help describe how things are now or what happened in the past (Bali et al., 2022). Machine learning can help predict agricultural yields and decide which crops to sow and what to do during the growing season. Several machine learning algorithms have been deployed to improve agricultural yield forecasting; crop yields have recently been predicted using approaches such as multivariate regression, decision trees, association rule mining, and artificial neural networks (Kavita & Mathur, 2021).
Spatial modeling and interpolation are also needed, because crop yield and climate data often exhibit spatial variability that models must capture (Raju et al., 2023). Methods such as kriging, inverse distance weighting, and regression-based techniques can be used to interpolate point-based data into continuous surfaces, enabling the inclusion of spatial dependencies in crop yield models (Li & Heap, 2011).
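Inverse distance weighting, one of the interpolation methods mentioned above, estimates the value at an unsampled location as a weighted average of observed values, with weights w_i = 1 / d_i^p, so nearer stations count more. A minimal sketch with the commonly used power p = 2 follows; the station coordinates and rainfall values are hypothetical, and planar Euclidean distance is assumed (latitude-longitude data would normally need projected or great-circle distances).

```python
import math

def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at target from (x, y, value) stations."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station: use its observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Two hypothetical rain gauges; the midpoint gets the plain average.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(stations, (1.0, 0.0)))  # → 15.0
```

Unlike kriging, IDW fits no model of spatial covariance, which makes it simple but means it gives no estimate of interpolation uncertainty.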
The application of machine learning and AI in precision agriculture has gained considerable attention, helped by newly developed libraries and frameworks. Abhinav et al. (2022) provide a detailed review of machine learning applications, including early disease diagnosis through image analysis, weather forecasting using time-series models, crop tracking using remote sensing data, and resource optimization through predictive models. These techniques can improve crop management, increase yields, and reduce resource waste. A predictive model is built from a range of features, and its parameters are estimated from historical data during the training phase. Machine learning models assume the output (crop yield) to be a non-linear function of the input variables (area and environmental factors) (Kavita & Mathur, 2021).
In another study, Paudel et al. (2021) combined agronomic principles of crop modeling with machine learning to design a machine learning baseline for large-scale crop yield prediction. Their baseline was a workflow emphasizing correctness, modularity, and reusability. Features were created from crop simulation outputs and from weather, remote sensing, and soil data in the MARS Crop Yield Forecasting System (MCYFS) database. In the proposed workflow, three machine learning algorithms, namely gradient boosting, support vector regression (SVR), and k-nearest neighbors, were used to predict the yields of soft wheat, spring barley, sunflower, sugar beet, and potato at the regional level in the Netherlands, Germany, and France.
Sun et al. proposed a novel multilevel deep learning model coupling a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) to extract both spatial and temporal features for crop yield prediction. The main aims of their work were to evaluate the performance of the proposed method for corn yield prediction in the US Corn Belt and to evaluate the influence of different datasets on the prediction task. They used time-series remote sensing data and soil property data as inputs, and their experiments predicted county-level corn yield in the US Corn Belt states from 2013 to 2016.
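Of the three algorithms in the Paudel et al. (2021) workflow described above, k-nearest neighbors is the simplest to state: the predicted yield for a new region-year is the mean yield of the k most similar training examples in feature space. The sketch below assumes Euclidean distance on features that have already been scaled, and k = 3; both are illustrative choices rather than settings reported by the authors, and the feature vectors are hypothetical.

```python
import math

def knn_regress(train_X, train_y, x_new, k=3):
    """Mean target of the k training points closest to x_new (Euclidean distance)."""
    dists = sorted((math.dist(x, x_new), y) for x, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical scaled feature vectors (e.g. rainfall, temperature) and yields.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 100.0]
print(knn_regress(train_X, train_y, (0.0, 0.0)))  # → 2.0
```

Because distances drive every prediction, features on very different scales (for example, rainfall in millimetres versus temperature in degrees) should be standardized first, or the largest-valued feature will dominate.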
Khaki et al. (2018) developed a deep neural network-based solution to predict the yield, check yield, and yield difference of corn hybrids from genotype and environmental (weather and soil) data, as part of the 2018 Syngenta Crop Challenge. Their model predicted with good accuracy, achieving a root mean square error (RMSE) of 12% of the average yield and 50% of the standard deviation for the validation dataset using predicted weather data. In a later paper, Khaki et al. (2020) implemented a hybrid model combining convolutional neural networks (CNNs), a fully connected layer, and recurrent neural networks (RNNs) to estimate corn and soybean yields. This model outperformed random forest (RF), deep fully connected neural networks (DFNN), and LASSO, with RMSEs of 9% and 8% of the respective average yields of corn and soybean. In this work, the CNNs extracted features from the weather and soil datasets, and the fully connected layer combined these high-level features and passed them, together with the yield data, to the RNN for prediction.
Charoen-Ung et al. (2018) proposed predictive learning models to classify sugarcane yield grade using input features such as plot characteristics, sugarcane characteristics, plot cultivation scheme, and rain volume. The machine learning models used were random forest and gradient boosting trees. Both outperformed two non-machine-learning models, with accuracies of 71.83% for random forest and 71.64% for gradient boosting trees. The authors also observed from the confusion matrices that both the machine learning and non-machine-learning models misclassified yield grade 3, and suggested investigating the cause in future work.
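The Khaki et al. results above report error as RMSE expressed as a percentage of the average yield, a normalization that makes errors comparable across crops with different yield levels. As a small worked sketch of that calculation (the observed and predicted values are made up for illustration):

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rmse_percent_of_mean(observed, predicted):
    """RMSE as a percentage of the mean observed value."""
    return 100.0 * rmse(observed, predicted) / (sum(observed) / len(observed))

observed = [10.0, 12.0, 8.0, 10.0]   # hypothetical yields
predicted = [11.0, 11.0, 9.0, 9.0]   # hypothetical model output
print(rmse(observed, predicted))                  # → 1.0
print(rmse_percent_of_mean(observed, predicted))  # → 10.0
```

Here every prediction is off by one unit, so the RMSE is 1.0, which is 10% of the mean observed yield of 10.0.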
Kaneko et al. recently proposed a crop yield study focusing on African countries. They applied a deep learning architecture to satellite image data to predict maize yield at the district level in six African countries: Ethiopia, Kenya, Malawi, Nigeria, Tanzania, and Zambia. Their model achieved an R² of 0.56. We take another direction by using climate, chemical, and agricultural parameters. The impacts of climate change are most evident in crop productivity, because this parameter represents the component of greatest concern to producers as well as consumers (Hatfield et al., 2015).
The use of ML in agriculture is promising, as it helps farmers, policymakers, and other agricultural stakeholders make informed decisions. Machine learning applications can optimize the use of resources in crop cultivation and harvesting and in livestock production. Proper on-farm management of pests and diseases can increase the quality of farm produce. Jhuria et al. (2013) used image processing to detect diseases and their spread on leaves and fruits, and to estimate the weight of mangoes. Abdulridha et al. (2018) employed ML to detect laurel wilt disease and distinguish affected leaves from healthy ones for effective disease management. Another use of ML in agriculture is crop yield prediction: forecasting crop yields improves crop management, irrigation scheduling, and the planning of labor for harvesting and storage (Alibabaei et al., 2021).
Explanation of the Conceptual Framework diagram
This conceptual framework shows the links between the atmospheric variables that contribute to thunderstorm formation and the impact of thunderstorms on agriculture. The framework is structured into three main components: independent variables, a dependent variable, and a mediating variable.
Independent Variables:
The framework identifies four groups of independent variables that are hypothesized to influence thunderstorm occurrence:
TCW, D2M: Total Column Water and Dew Point at 2 meters. These variables represent atmospheric moisture content, which is important for thunderstorm development.
TOTALX, CAPE: Total Totals Index (TOTALX), a stability index, and Convective Available Potential Energy (CAPE). These parameters indicate atmospheric instability, an important factor in thunderstorm formation.
TP, TCRW: Total Precipitation and Total Column Rain Water. These variables show the presence of water in the atmosphere, both as potential rain and as existing precipitation.
P88.162, CBH, TCWV, MCC: This group includes Vertical integral of eastward cloud liquid water flux (P88.162), Cloud Base Height (CBH), Total Column Water Vapor (TCWV), and Mesoscale Convective Complexes (MCC). These variables represent a mix of atmospheric conditions that can influence thunderstorm development or formation.
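Assuming TOTALX here is the Total Totals index (the meaning of the ERA5 short name totalx), it adds a lapse-rate term and a low-level moisture term: TT = (T850 - T500) + (Td850 - T500), where T850 and T500 are the air temperatures at 850 hPa and 500 hPa and Td850 is the dew point at 850 hPa. The sketch below evaluates that definition for a hypothetical sounding; it only illustrates what the variable measures, since reanalysis products supply TOTALX precomputed, and no interpretive thresholds are hard-coded because those vary by region.

```python
def total_totals(t850, td850, t500):
    """Total Totals index from 850 hPa temperature and dew point and 500 hPa temperature.

    Inputs may be in degrees Celsius or kelvin: only differences are used.
    """
    vertical_totals = t850 - t500   # lapse rate between 850 and 500 hPa
    cross_totals = td850 - t500     # low-level moisture relative to mid-level temperature
    return vertical_totals + cross_totals

# Hypothetical sounding: warm, moist 850 hPa layer under a cool 500 hPa layer.
print(total_totals(t850=20.0, td850=16.0, t500=-8.0))  # → 52.0
```

The two terms mirror the moisture and instability groups in the framework: a moist low level and a steep lapse rate both raise the index.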
Dependent Variable:
THUNDERSTORMS: This is the primary outcome of interest in the model. The framework posits that the occurrence and characteristics of thunderstorms are directly influenced by the independent variables listed above.
Mediating Variable:
AGRICULTURE: The framework shows that thunderstorms affect agriculture, positioning agriculture as a mediating variable. This indicates that the research considers not only the formation of thunderstorms but also their effects on agricultural systems.
This conceptual framework provides a comprehensive view of the thunderstorm prediction model, covering both the meteorological factors that contribute to thunderstorm formation and the broader impact of these storms on agricultural systems. It reflects the complexity of thunderstorm prediction by including a wide range of atmospheric variables and recognizes the practical importance of accurate forecasting for the agricultural sector. The framework aligns well with the feature importance results discussed, particularly the importance of moisture-related variables (TCW, TCRW) in predicting thunderstorms. It also provides a rationale for why some variables are important in the model, even if they showed lower feature importance in the statistical analysis. This conceptual model serves as a guide for understanding the structure of the predictive model and interpreting its results in broader environmental and economic contexts.
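The discussion above relies on feature importance scores. One model-agnostic way to obtain such scores is permutation importance: shuffle one feature's values across observations and measure how much the model's loss increases. The sketch below is an illustration of that technique only, not necessarily the procedure behind the results discussed in this study; the model is a hypothetical fitted predictor passed in as a Python function, mean squared error is an assumed default loss (a classification loss could be passed instead for thunderstorm occurrence), and the toy data are made up.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, loss=mse, n_repeats=10, seed=0):
    """Mean increase in loss when each feature column is randomly shuffled.

    predict: hypothetical fitted model, mapping a list of rows to predictions.
    X: list of rows (lists of feature values); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = loss(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(loss(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy model that uses only the first feature; the second should score zero.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0], [4.0, 0.0]]
y = [2.0, 4.0, 6.0, 8.0]
model = lambda rows: [2.0 * r[0] for r in rows]
print(permutation_importance(model, X, y))
```

Permutation importance can understate correlated features: if two variables such as TCW and TCWV carry similar information, shuffling one leaves the other available, so a low score does not by itself mean a variable is physically irrelevant.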