1. Introduction
Complex disorders affecting neurological processes are responsible for great health, social and economic costs worldwide. Despite their heterogeneity, they all pose a significant global burden: poor understanding of their causes and of the factors that aggravate them is the main reason diagnosis remains insufficient, and the limited effectiveness of medical treatment negatively impacts the well-being of those affected.
Autism Spectrum Disorder (ASD) (Hirota & King, 2023; Lord et al., 2018) is a phenotype that ranges from the most severe autism, in which social and communicative functions are very limited, to Asperger syndrome, characterized by mild symptoms. In all cases, the diagnostic features include rigid behavior, a pathologically restricted range of interests, and impaired attention and communication (Sharma et al., 2018). Several body systems are also affected, including the digestive (Martin et al., 2018; Srikantha & Mohajeri, 2019), immune (Heidari et al., 2021; Meltzer & Van de Water, 2017; Ormstad et al., 2018; Robinson-Agramonte et al., 2022), circulatory (Kealy et al., 2020; Mouridsen et al., 2016) and nervous (Doroszkiewicz et al., 2021; Matta et al., 2019; Sharon et al., 2016) systems. Microbiota (Kang et al., 2017) and genetic causes (Qiu et al., 2022) have been proposed to play a role in the early development of this disorder, and studies support that the risk of ASD rises when relatives are affected (Arberas & Ruggieri, 2019). These symptoms impair patients’ autonomy and the welfare of caregivers (Dissanayake et al., 2019; Ruzzo et al., 2019). The World Health Organization reported in 2023 that about one in one hundred children has this disorder and that prevalence has been rising in recent years. For this reason, and because specialists and researchers in this field do not agree on the origins and development of the condition (Silverman et al., 2010; Tseng et al., 2022), many research teams are devising strategies to uncover its etiology and the main factors needed to understand it.
Schizophrenia is a neurological disorder characterized by positive symptoms (hallucinations, lack of social skills and cognitive distortions) and negative symptoms (general apathy in social and work settings) (Khan et al., 2013). The disease is linked to increased vulnerability to cardiovascular (Barnett et al., 2007; Hagi et al., 2021), metabolic (Trubetskoy et al., 2022) and infectious (Fuglewicz et al., 2017) diseases, which raise the risk of early death. Furthermore, it is directly linked to higher suicide rates (Balhara & Verma, 2012; Carlborg et al., 2010; Sher & Kahn, 2019), and caregivers and people close to patients are also negatively affected socially, since patients suffer from diminished autonomy (Gulayín, 2022; Ribé et al., 2018). It affects about 24 million people worldwide (McGrath et al., 2008; Saha et al., 2005), or 0.32% of the population and 0.45% of adults (World Health Organization, 2022), and it tends to appear in late adolescence, probably in connection with neural pruning (Germann et al., 2021). To date, the origin of this disorder remains unknown (Pino et al., 2014). There is some consensus on the relevance of certain genetic factors in its onset (Trubetskoy et al., 2022), but these do not determine its origin (Vilain et al., 2013), since other factors such as social environment, drug abuse (including alcohol) (Häfner & an der Heiden, 1997; Janoutová et al., 2016; Stilo & Murray, 2019), and neural pruning, which is usual in adolescence, can have a decisive influence (Germann et al., 2021). Myelin sheaths (Valdés-Tovar et al., 2022) and central nervous system architecture (Bobilev et al., 2020; Cheng et al., 2022; Heckers & Konradi, 2002) have also been linked to this disorder. The most widely held view today is that schizophrenia is a multifactorial disorder (Kahn & Sommer, 2015; Morera-Fumero & Abreu-Gonzalez, 2013).
To clarify the causes of the disease, omics techniques such as transcriptomics have been applied (Gandal et al., 2018). Schizophrenia severely harms patients' quality of life, which generates medical and social interest that extends to the pharmaceutical industry, whose aim is to alleviate this suffering with drugs that minimize the side effects of available treatments (Krause et al., 2018), which usually interfere with patients' daily lives (Perlick et al., 2010). Thus, efficient research is crucial to solving the social and economic problems attached to this disease (Evans-Lacko et al., 2014).
Bipolar disorder (BD) (Smith et al., 2012; Tondo et al., 2017) is a neurological condition characterized by the alternation of manic episodes (euphoria, excessive joy, uncontrolled enthusiasm, etc.) with depressive ones (anhedonia, sadness, lack of interest in living, etc.) (Fagiolini et al., 2015). Genetic causes have been studied (Gandal et al., 2018), and some environmental factors, such as alcoholism and other types of drug abuse, have been proposed as causes of the disease (Aldinger & Schulze, 2017). The development of genomics and transcriptomics may help to understand the disorder and treat it efficiently. It affected 40 million people in 2019 (Institute of Health Metrics and Evaluation, 2022), and the suffering it causes raises the suicide rate among these patients (Beyer & Weisler, 2016; Clemente et al., 2015; Miller & Black, 2020). The disorder is still poorly understood, but some drugs, including lithium, have been reported to alleviate its symptoms (Katz et al., 2022; Malhi et al., 2017).
Major depressive disorder (MDD) (Filatova et al., 2021) is a neurological disease of unknown origin (Gómez Maquet et al., 2020), with more severe symptoms than common depression (Kennedy, 2008). Among these are anhedonia, sadness and lack of desire to live (Bauer et al., 2019). Genetic causes are considered, which has led to the development of transcriptomic and epigenetic studies (Kendall et al., 2021), but physiological and hormonal origins have also been reported, as well as environmental factors such as stress and psychological and social aspects (Suda & Matsuda, 2022). Because its origin remains unknown, it is classified as a complex disorder (Harder et al., 2022), and it places a great social and economic burden on the communities of those affected (Greenberg et al., 2021; Keshavarz et al., 2022). Worldwide prevalence is about 350 million people (Gutiérrez-Rojas et al., 2020; Smith et al., 2019), but there is little consensus on this figure. In fact, prevalence differs between regions (3% in Japan and 16.3% in the USA) (Gutiérrez-Rojas et al., 2020; Smith et al., 2019). As many as 850,000 suicides per year have been attributed to major depressive disorder (Li et al., 2022; Serra et al., 2022). Different techniques, such as omics and neuroimaging, and several biomarkers, such as certain fatty acids and miRNAs, have been used, but no consensus has been reached (Figueroa-Hall et al., 2020; Gadad et al., 2018; Zhou et al., 2019). Many drugs are currently available to treat this disease, drawing on the limited knowledge we have of the brain.
Despite their high prevalence worldwide, the origins of these disorders are still unknown. It is therefore necessary to apply techniques able to detect key factors for prevention and treatment, pointing towards their main causes and improving as much as possible the health and quality of life of these patients.
Advances in omics technologies, particularly microarray analysis, have revolutionized the comprehensive exploration of gene expression patterns associated with neurological conditions (Bettencourt et al., 2023; Legati et al., 2021; Xu et al., 2022). Microarray technology enables the simultaneous measurement of thousands of genes, providing deep insights into the dysregulated molecular pathways implicated in the pathogenesis of various diseases (Bryant et al., 2004; Copland et al., 2003; Krokidis & Vlamos, 2018; Rai et al., 2016; Ward, 2006). A critical aspect of microarray data analysis is the identification of differentially expressed genes (DEGs), which serve as key indicators in understanding disease mechanisms. Traditionally, these analyses have relied on ranking genes based on individual p-values; however, this approach does not always correlate with biological significance. In some cases, very small p-values, indicative of high statistical significance, may not correspond to biologically relevant signals, while larger p-values, often disregarded, could be linked to genes crucial for specific biological processes (Esteban & Wall, 2011). Classical microarray analysis methods typically utilize Welch’s t-test and linear models such as Empirical Bayes to identify DEGs by comparing gene expression levels between experimental groups or conditions (Jeffery et al., 2006; Selvaraj & Natarajan, 2011). However, these traditional approaches may miss significant gene expression changes, particularly in complex diseases like those affecting the brain, which are characterized by heterogeneous molecular profiles (Ganapathy et al., 2019; Villani & Marzetti, 2023).
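As a minimal illustration of the classical per-gene comparison described above, the following Python sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for each gene and ranks genes by the absolute statistic. The gene names and expression values are hypothetical, and the sketch covers neither the conversion to p-values nor Empirical Bayes moderation, both of which would normally be handled by a dedicated statistics package.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical log2 expression values: gene -> (case samples, control samples).
expression = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]),
    "GENE_B": ([5.1, 4.9, 5.0, 5.2, 4.8], [5.0, 5.1, 4.9, 5.0, 5.0]),
}

results = {gene: welch_t(cases, controls)
           for gene, (cases, controls) in expression.items()}

# Rank genes from the largest to the smallest absolute t statistic.
ranked = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)

for gene in ranked:
    t, df = results[gene]
    print(f"{gene}: t = {t:.3f}, df = {df:.2f}")
```

Each gene is tested in isolation here, which is precisely the property that the ranking-by-p-value criticism in the text refers to.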
To address the limitations of p-value-based approaches, which often result in the excessive suppression of biologically relevant signals due to multiple testing correction methods, more robust methodologies have been developed (Breitling & Herzyk, 2005; Cordero et al., 2007; Esteban & Wall, 2011). Notably, one such method integrates Game Theory, utilizing a computational index known as the Shapley value (Esteban & Wall, 2011). This approach provides a more nuanced assessment of gene significance by evaluating the cumulative contribution of each gene within the context of the entire gene set analyzed. The Shapley value quantifies the importance of each gene by considering its contribution in conjunction with the contributions of all other genes in the same experiment (Moretti & Patrone, 2008). By combining Game Theory with traditional statistical analyses, this methodology offers a powerful tool for enhancing the detection and interpretation of meaningful gene expression differences (Esteban & Wall, 2011).
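For reference, the standard game-theoretic definition of the Shapley value can be written as follows. The notation is introduced here for illustration and is not taken from the cited works: N is the set of n genes, v is a characteristic function that assigns a worth to every coalition S of genes, and \phi_i(v) is the Shapley value of gene i.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
\qquad i \in N .
```

The bracketed term is the marginal contribution of gene i to coalition S, and the weight is the probability that exactly the genes in S precede i when all genes are ordered uniformly at random. When v(\emptyset) = 0, the values \phi_i(v) sum to v(N) over all genes.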
We applied the microarray games methodology in this study, specifically harnessing Shapley values, to analyze gene expression data related to various neurological pathologies. This approach integrates Game Theory to improve the detection and functional analysis of genes involved in complex biological conditions, such as ASD, schizophrenia, bipolar disorder, and major depressive disorder (Esteban & Wall, 2011). By evaluating the average marginal contribution of each gene across all possible coalitions, this technique reveals critical insights into the genetic underpinnings of these diseases, potentially leading to innovative diagnostic and therapeutic strategies. The game-theoretic approach not only enhances the identification of key genetic players but also enriches our understanding of their biological roles within complex, multi-genic pathologies.
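The idea of averaging marginal contributions over all coalitions can be sketched in Python on a toy example. The sketch assumes the usual microarray-game setup as we understand it from Moretti and Patrone: a Boolean matrix marks which genes are abnormally expressed in each sample, and the worth of a coalition is the fraction of samples whose set of abnormal genes lies entirely inside it. The matrix `B` and all function names are hypothetical. The brute-force routine enumerates every coalition and is only feasible for a handful of genes; the second routine uses the equivalent shortcut in which each sample shares a weight of 1/k equally among its abnormal genes.

```python
from itertools import combinations
from math import factorial


def supports(B):
    """Columns of the Boolean matrix B (genes x samples) as sets of gene indices."""
    n_genes, n_samples = len(B), len(B[0])
    return [frozenset(i for i in range(n_genes) if B[i][j])
            for j in range(n_samples)]


def worth(S, sup):
    """v(S): fraction of samples whose non-empty support is contained in S."""
    return sum(1 for T in sup if T and T <= S) / len(sup)


def shapley_bruteforce(B):
    """Average marginal contribution of each gene over all coalitions."""
    n, sup = len(B), supports(B)
    phi = []
    for i in range(n):
        others = [g for g in range(n) if g != i]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size that excludes gene i.
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                total += w * (worth(S | {i}, sup) - worth(S, sup))
        phi.append(total)
    return phi


def shapley_closed_form(B):
    """Shortcut: each sample splits 1/k equally among the genes in its support."""
    n, sup = len(B), supports(B)
    phi = [0.0] * n
    for T in sup:
        for i in T:
            phi[i] += 1.0 / (len(T) * len(sup))
    return phi


# Hypothetical toy matrix: rows are genes g0..g2, columns are four samples.
B = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]

phi_bf = shapley_bruteforce(B)
phi_cf = shapley_closed_form(B)
print(phi_bf)
print(phi_cf)
```

On this toy matrix both routines give 11/24, 5/24 and 1/3 for the three genes, so the gene that is abnormal in the most samples, and often alone, receives the largest value.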
To achieve a comprehensive understanding of gene expression profiles associated with four prevalent neurological pathologies, we employed two distinct methods for microarray data analysis: (i) a conventional approach utilizing Welch’s t-test and Empirical Bayes methods, and (ii) a complementary analysis based on the Comparative Analysis of Shapley value (CASh) method, derived from Game Theory. Previous research (Castro-Martínez et al., 2024; Esteban & Wall, 2011) has demonstrated that the CASh method significantly increases the power to detect differentially expressed genes (DEGs), providing a more robust framework for analyzing complex biological data.
4. Discussion
Neurological pathologies inflict significant suffering and pose substantial burdens on millions of people worldwide. In recent years, the advent of omics technologies has enabled a comprehensive exploration of the molecular patterns associated with many of the most common neurological conditions. Microarray technology, which emerged nearly three decades ago with the goal of studying whole gene expression profiles, has since provided unprecedented insights into the dysregulated molecular pathways involved in disease pathogenesis (Ducray et al., 2007; Shai, 2006). In the present study, we analyzed data from nine datasets generated using Affymetrix microarray devices, including three datasets from Autism Spectrum Disorder, two from Schizophrenia, two from Bipolar Disorder, and two datasets encompassing samples from Schizophrenia, Bipolar Disorder, and Major Depressive Disorder.
Raw data were downloaded from the GEO public repository, and gene expression files were pre-processed, quality controlled, and normalized. To detect differentially expressed genes (DEGs), we employed two strategies: (i) a traditional approach using classical statistical t-tests and (ii) an alternative approach utilizing the CASh method (Moretti et al., 2008). The traditional t-test approach identified few DEGs, whereas the CASh method revealed a large number of statistically relevant genes across the nine datasets analyzed. The t-test identifies genes based on their differential expression between two conditions, considering a gene significant when its p-value falls below a pre-established threshold (0.05 adjusted p-value in our study). In contrast, the CASh method not only considers the expression of each gene under two conditions but also evaluates the contribution of each gene across all possible permutations using the Shapley value as a measure. This holistic approach considers the overall gene set rather than isolated gene expressions. However, a current limitation of the CASh method is that it does not explicitly account for potential confounding variables, which should be addressed in future applications (Cesari et al., 2018; Moretti et al., 2008, 2010; Sun et al., 2020). In summary, CASh offers a more nuanced understanding of gene interactions and their collective impact on disease pathophysiology.
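The comparison of Shapley values between two conditions can be illustrated with a small Python sketch. This is not the CASh implementation used in the study: as a simple stand-in for the resampling step, it uses a label-permutation test on the difference in Shapley values, and the per-sample sets of abnormally expressed genes, the function names, the number of permutations and the seed are all hypothetical.

```python
import random


def shapley_from_supports(columns, n_genes):
    """Shapley values of a microarray game given one support set per sample."""
    phi = [0.0] * n_genes
    for T in columns:
        for i in T:
            phi[i] += 1.0 / (len(T) * len(columns))
    return phi


def compare_shapley(cols_a, cols_b, n_genes, n_perm=2000, seed=0):
    """Permutation test on the per-gene difference in Shapley values.

    Assumption: sample labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    phi_a = shapley_from_supports(cols_a, n_genes)
    phi_b = shapley_from_supports(cols_b, n_genes)
    observed = [a - b for a, b in zip(phi_a, phi_b)]

    pooled = list(cols_a) + list(cols_b)
    exceed = [0] * n_genes
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two conditions at random
        perm_a = shapley_from_supports(pooled[:len(cols_a)], n_genes)
        perm_b = shapley_from_supports(pooled[len(cols_a):], n_genes)
        for i in range(n_genes):
            # Small tolerance guards against floating-point ties.
            if abs(perm_a[i] - perm_b[i]) >= abs(observed[i]) - 1e-12:
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    p_values = [(e + 1) / (n_perm + 1) for e in exceed]
    return observed, p_values


# Hypothetical supports: genes abnormally expressed in each sample.
cases = [{0}, {0, 1}, {0}, {0, 2}, {0}, {0, 1}]
controls = [{1}, {2}, {1, 2}, {2}, {1}, {2}]

observed, p_values = compare_shapley(cases, controls, n_genes=3)
for gene, (diff, p) in enumerate(zip(observed, p_values)):
    print(f"gene {gene}: difference = {diff:+.3f}, p = {p:.4f}")
```

In this toy example gene 0 is abnormal in every case sample and in no control sample, so its Shapley value differs by 0.75 between conditions and only relabelings that reproduce that exact split can match the observed difference.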
Interestingly, the functional enrichment analysis of the DEGs detected using the CASh method confirmed previous findings on the molecular bases of the neurological pathologies studied. For instance, processes related to cardiac muscle cell development in ASD samples are directly linked to vascular abnormalities observed in patients with this phenotype (Yao et al., 2006). In the case of Schizophrenia, the regulation of primary metabolic processes and glycine-tRNA ligase activity emerged as significant, which is particularly relevant given the metabolic issues associated with Schizophrenia (Von Hausswolff-Juhlin et al., 2009). Similarly, Bipolar Disorder was linked to several key terms in our study, including positive regulation of lipoprotein lipase activity, synapse, and phosphatidylcholine-sterol O-acyltransferase activator activity, which align with the known association of this disorder with altered fatty acids (Saunders et al., 2016). For Major Depressive Disorder, characterized by inflammation and neurological damage, we identified processes such as "wound healing spreading of cells" and "growth cone" as significant in the context of differential gene expression.
5. Conclusions and Limitations
This study highlights the power of Comparative Analysis of Shapley value (CASh) in revealing complex genetic insights into neurological disorders such as Autism Spectrum Disorder (ASD), Schizophrenia, Bipolar Disorder, and Major Depressive Disorder. CASh proved highly effective in identifying differentially expressed genes, many of which are missed by traditional statistical methods, offering a more nuanced understanding of the molecular mechanisms underlying these conditions. These findings open new opportunities for developing innovative diagnostic and therapeutic strategies that may shed light on the etiology of these complex conditions.
However, several limitations should be considered. The inherent complexity of microarray data—such as noise, batch effects, and variability in sample quality—can introduce biases that affect the accuracy of gene expression analysis, despite the rigorous preprocessing and normalization applied. Additionally, the reliance on public datasets may bring biases related to differences in data collection methods, patient selection, and experimental design, potentially limiting the generalizability of our results. To mitigate these issues, future studies should be conducted to validate the findings by using more diverse cohorts of patients.
Looking ahead, integrating CASh with complementary omics technologies, such as proteomics and metabolomics, promises a more comprehensive view of the pathophysiological processes in brain diseases. This combined approach could significantly improve the development of multi-marker panels, enhancing diagnostic accuracy. Longitudinal studies using CASh could also track disease progression and treatment responses, providing insights into how gene expression evolves over time in relation to disease states.
A further challenge is the computational intensity of CASh, particularly with large datasets. The method requires substantial computational resources, and interpreting Shapley values can be complex. Simplifying the approach—through algorithm optimization or data reduction—would make CASh more accessible for routine clinical and research applications. Additionally, CASh does not account for post-transcriptional modifications or protein-level interactions, which are critical for a complete understanding of disease mechanisms. Future work could address this by integrating CASh with proteomic and metabolomic data to offer deeper insights at the protein level.
Another key limitation is the lack of experimental validation of the identified differentially expressed genes. To confirm the biological relevance of these findings, future studies should incorporate in vitro functional assays, such as gene knockdown or overexpression experiments. Moreover, in vivo studies in animal models would help to further elucidate the roles of these genes in disease mechanisms and assess their potential as therapeutic targets.
Achieving the full potential of CASh will require strong interdisciplinary collaboration. Geneticists, neurologists, oncologists, and bioinformaticians must work together to conduct large-scale studies that validate and refine the gene signatures identified, translating these discoveries into practical clinical applications. By advancing our understanding of the genetic basis of neurological disorders, this research contributes to precision medicine approaches, ultimately improving patient outcomes and reducing the global burden of these conditions.