Integrated Neuro-Symbolic Analysis Pipeline for Cerebral Aneurysm Rupture Diagnosis

This version is not peer-reviewed

Submitted: 25 March 2024

Posted: 25 March 2024

Abstract
Detecting an intracranial aneurysm and diagnosing the imminence of its rupture before it occurs is critically important. However, this process is often time-consuming and challenging, and experts' conclusions frequently vary. Furthermore, current state-of-the-art models in medical image segmentation suffer from the limited availability of large annotated datasets for training, exacerbated by an immense data imbalance. In most segmentation tasks, 3D segmentation is preferred because it can leverage more contextual information. However, 3D models require more data, which is constrained by the aforementioned issues. Among current diagnosis models, those lacking geographic context are notably ineffective. To address these issues, the aim is to develop a model pipeline that accurately detects cerebral intracranial aneurysms and diagnoses rupture imminence from 3D magnetic resonance angiography (MRA) and tabular input, augmented with neuro-symbolic AI. Data fusion is proposed to tackle the lack of context in tabular models, leveraging geographic features extracted by a segmentation model. To enhance segmentation results and tackle the data imbalance in aneurysm segmentation, a two-stage model pipeline that extracts contextual geographic features before aneurysm segmentation is suggested. Segmentation results are further improved through ensemble techniques and novel preprocessing techniques. Finally, a neuro-symbolic component is introduced to enhance diagnostic performance and interpretability. Together, these steps form the multi-step Integrated Neuro-Symbolic Analysis Pipeline for Cerebral Aneurysm Rupture Diagnosis.
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning

1. Introduction

An intracranial aneurysm, also referred to as a cerebral or brain aneurysm, denotes a weak or thin section in the wall of a blood vessel within the brain. If left untreated, these aneurysms can leak or rupture, leading to subarachnoid hemorrhage (SAH). Before rupturing, the weakened section inflates, forming a bulge or balloon. Intracranial aneurysms affect approximately 2-5% of the population, and while not all of them rupture, those that do pose life-threatening risks. The likelihood of rupture varies based on factors such as the size, location, shape, and growth rate of the aneurysm. Ruptured intracranial aneurysms have a mortality rate of approximately 50%, and around half of all survivors experience neurological deficits such as cognitive impairment, memory loss, speech difficulties, or sensory impairments [1]. Moreover, intracranial aneurysms are responsible for about 80%–90% of nontraumatic subarachnoid hemorrhages.
The recognition and segmentation of aneurysms in modeling have traditionally relied heavily on manual intervention. These tasks require researchers to have broad knowledge of medical image interpretation. Because cerebrovascular anatomy is intricate, the process is prone to errors, often resulting in inconsistencies in modeling or inaccuracies in subsequent analyses [2]. Moreover, manual operation is labor-intensive, which makes patient-specific analysis for large groups of patients infeasible. Consequently, automating the process is crucial.
Despite the severity of intracranial aneurysm rupture, machine learning (ML) diagnosis faces significant challenges. Not only are patients with aneurysms relatively uncommon, but an aneurysm typically occupies less than 2% of the volume of a brain scan. This imbalance persists even after standard preprocessing and limits the performance of 3D models.
Detection of aneurysms before rupture remains challenging [3] because unruptured aneurysms often lack symptoms of comparable severity. Tabular models exploit the fact that genetic factors, family history, female sex, and age, among other factors, are associated with an increased risk of hemorrhage. However, these models lack information such as aneurysm site, size, and shape, which are more closely related to the risk of rupture, making them less effective beyond preliminary screening.
A solution is proposed to the aforementioned limitations. Our main contributions are as follows:
  • Incorporating the knowledge of domain experts in the Tabular Deep Learning process through neuro-symbolic AI, enhancing interpretability and performance.
  • Utilizing benefits of data fusion, integrating results of image data into a geographic contextual output within the Tabular Deep Learning process to improve spatial awareness.
  • Optimizing segmentation methods through the proposition of a staged 3D segmentation model pipeline, enhancing accuracy.
  • Improving segmentation results through the ensembling of U-Nets, improved data augmentation, and novel preprocessing techniques.
  • Introducing a multi-step Neuro-Symbolic Analysis Pipeline for Cerebral Aneurysm Rupture Diagnosis.
Though optimized for cerebral aneurysm diagnosis, the methodology and contributions can be extrapolated to other medical domains with similar issues, such as tumor diagnosis.

1.1. Dataset

Two datasets were run through the model pipeline. In this research, for aneurysm segmentation, a dataset provided by Tommaso Di Noto (Lausanne_TOF-MRA_Aneurysm_Cohort [4]) was used which consisted of 284 subjects (170 females / 127 healthy controls). The imaging data for these cases were obtained through MRA (Magnetic Resonance Angiography), a non-invasive medical imaging technique used to visualize blood vessels, including those in the brain, using MRI technology.
I am not aware of any papers that have been published on 3D segmentation using this dataset. This can be attributed to the fact that even state-of-the-art frameworks, such as nnU-Net, only achieve a validation Dice Score of 0.7.
For aneurysm segmentation, 131 cases remained after filtering. Training used 5-fold cross-validation, with 20% of the cases allocated to the validation set in each fold. A second dataset was derived manually from this one by removing the MRA data and segmenting the arteries instead. This produced 131 artery masks, and 5-fold cross-validation with 20% of the data reserved for validation was again employed.
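The fold construction described above can be sketched as follows. This is a minimal illustration using only the standard library; the case identifiers, the shuffling seed, and the round-robin assignment are assumptions, not details reported for the actual experiments.

```python
import random

def k_fold_splits(case_ids, k=5, seed=0):
    """Yield (train_ids, val_ids) pairs for k-fold cross-validation."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    # Distribute cases round-robin so fold sizes differ by at most one.
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, val

cases = [f"case_{n:03d}" for n in range(131)]  # hypothetical case identifiers
splits = list(k_fold_splits(cases))
for fold, (train, val) in enumerate(splits):
    print(f"fold {fold}: {len(train)} train / {len(val)} validation")
```

With 131 cases, each validation fold holds 26 or 27 cases, which is roughly the 20% stated in the text.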
The tabular model was trained on an in-house dataset of 895 original patients. To expand and augment the dataset, patients with multiple aneurysms were split into separate records, one per aneurysm, which yielded 1439 records. The primary columns included Sex, Ethnicity, Family History, Age, and Diplopia (double vision).
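The record-splitting step can be illustrated with a short sketch. The field names `patient_id`, `aneurysms`, and `ruptured` are hypothetical, as are the example values; only Sex and Age are column names taken from the text.

```python
def split_into_records(patients):
    """Return one record per aneurysm, copying the patient-level columns."""
    records = []
    for patient in patients:
        shared = {k: v for k, v in patient.items() if k != "aneurysms"}
        for aneurysm in patient["aneurysms"]:
            records.append({**shared, **aneurysm})
    return records

# Hypothetical patients; the nested "aneurysms" list is an assumed layout.
patients = [
    {"patient_id": 1, "Sex": "F", "Age": 61,
     "aneurysms": [{"ruptured": 1}, {"ruptured": 0}]},
    {"patient_id": 2, "Sex": "M", "Age": 47,
     "aneurysms": [{"ruptured": 0}]},
]
records = split_into_records(patients)
print(len(records))
```

Keeping an identifier such as `patient_id` on every record makes it possible to place all records of one patient in the same cross-validation fold, should that be desired.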

2. Related Work

As technology continues to advance, new techniques for patient diagnosis and treatment attract growing interest, whether to reduce prescreening time or to support analysis. Many approaches to aneurysm segmentation have been attempted. Numerous studies have explored non-learning approaches for aneurysm detection, utilizing enhancement filters and shape-based techniques. Automated rule-based schemes and geometric feature extraction methods have also been investigated.
Recent advancements in deep learning show promise for aneurysm detection. CNN-based frameworks utilizing intra-vasculature distance mapping and deep neural networks alongside maximum intensity projection algorithms have been suggested [5]. However, the effectiveness of these techniques may be limited by the size of datasets used for training or validation.
One prominent framework is nnU-Net [6], known for its state-of-the-art performance in medical segmentation tasks. However, its performance on the aneurysm segmentation task is significantly compromised by data limitations and imbalances, as previously discussed. For instance, Wenwen Yuan et al. [7] employed a dense convolution attention U-Net, the architecture on which the nnU-Net framework builds, but found that it lacked sensitivity to image details. Moreover, they adopted a two-dimensional input instead of a traditional 3D input, projecting the 3D images onto two-dimensional images in various directions based on voxel intensity. They then constructed a 2D CNN for feature detection, sacrificing valuable contextual information in the process.
Tommaso Di Noto [4] is the main source of comparison and frame of reference for segmentation as their dataset is used for segmentation training. One major technique they propose is to constrain the DL analysis only to the areas of the brain where aneurysm occurrence is plausible. This anatomically-informed approach aims at simulating the selective analysis that radiologists perform on the TOF MRA scans. However, the broad generalizations limit the performance of this technique.
In the realm of tabular models, Malik et al. [8] propose a model that achieves a precision of 0.73, recall of 0.67, F1 score of 0.64, and accuracy of 0.66. They compare these metrics to expert scores of 0.74, 0.73, 0.73, and 0.73 for precision, recall, F1 score, and accuracy, respectively. They use a rule-based system built on the Apriori algorithm to enhance interpretability but do not employ it to improve the model's scores.

3. Methods

The initial phase in developing an automated system for estimating aneurysm rupture risk is establishing a dependable method for accurately detecting and segmenting cerebral aneurysms. Only afterwards can geometric features be integrated into a tabular model or combined with external information to enhance model scores.

3.1. Preprocessing/Data Collection

Volumetric 3D data must be reduced in dimensionality and complexity as much as possible before training. Several preprocessing steps were therefore carried out for each subject before the MRA volumes were ready for segmentation. First, skull-stripping was performed. Second, N4 bias field correction was applied. Third, all volumes were resampled to the median voxel spacing.
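The third step can be made concrete with a small sketch that computes a per-axis median spacing and the grid size a volume would have after resampling to it. The spacings and the volume shape are hypothetical, and the interpolation itself (normally done by an imaging library) is not shown.

```python
from statistics import median

def median_spacing(spacings):
    """Per-axis median of the voxel spacings (in mm) across all volumes."""
    return tuple(median(axis) for axis in zip(*spacings))

def resampled_shape(shape, spacing, target_spacing):
    """Grid size that keeps the physical extent when the spacing changes."""
    return tuple(max(1, round(n * s / t))
                 for n, s, t in zip(shape, spacing, target_spacing))

# Hypothetical (x, y, z) spacings in mm for three volumes.
spacings = [(0.4, 0.4, 0.6), (0.3, 0.3, 0.5), (0.5, 0.5, 0.7)]
target = median_spacing(spacings)
new_shape = resampled_shape((512, 512, 160), (0.3, 0.3, 0.5), target)
print(target, new_shape)
```

Resampling every volume to a common spacing means that one voxel covers the same physical distance in every scan, so learned filters see structures at a consistent scale.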
The proposed method draws inspiration from Tommaso Di Noto's approach of confining the analysis to areas where aneurysm occurrence is plausible. However, instead of manually determining these areas around arterial locations with expert guidance, a mask of the cerebral arteries is fed into the aneurysm segmentation model as input, replacing the preprocessed MRA.
Because the original dataset lacked label masks for arteries, the thresholding tool of the 3D Slicer application [9] was used to manually generate noisy artery labels from the preprocessed MRA data, providing the data needed to train an artery segmentation model.
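Intensity thresholding of this kind can be sketched in a few lines. This is not the 3D Slicer implementation; the tiny volume and the cut-off of 200 are hypothetical and only show why the resulting labels are noisy: every bright voxel is labeled, whether or not it belongs to an artery.

```python
def threshold_mask(volume, lower, upper=float("inf")):
    """Label voxels whose intensity lies in [lower, upper] as 1, else 0."""
    return [[[1 if lower <= v <= upper else 0 for v in row]
             for row in plane]
            for plane in volume]

# Tiny hypothetical 2x2x3 volume; bright voxels stand in for vessels.
volume = [
    [[12, 250, 30], [8, 240, 15]],
    [[10, 35, 260], [9, 20, 245]],
]
artery_labels = threshold_mask(volume, lower=200)  # 200 is an illustrative cut-off
print(artery_labels)
```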

3.2. Geometric Feature Extraction

In order to improve the performance of the tabular model, geometric features need to be extracted, which can be achieved through volume segmentation. This process involves segmenting aneurysms to gather geometric attributes such as side_left, side_right, or size, which can then be utilized as columns in the tabular model.
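As a rough illustration of how such columns could be derived from a binary aneurysm mask, the sketch below counts foreground voxels and locates their centroid. The column name `size_mm3`, the `[x][y][z]` indexing, the midline index, and the assumption that x increases from left to right are all hypothetical; real image orientation conventions vary and would need to be checked.

```python
def aneurysm_features(mask, spacing, midline_x):
    """Derive simple geometric columns from a binary mask indexed [x][y][z]."""
    voxels = [(x, y, z)
              for x, plane in enumerate(mask)
              for y, row in enumerate(plane)
              for z, v in enumerate(row) if v]
    if not voxels:
        return None  # no aneurysm segmented
    voxel_volume = spacing[0] * spacing[1] * spacing[2]  # mm^3 per voxel
    centroid = tuple(sum(axis) / len(voxels) for axis in zip(*voxels))
    on_left = centroid[0] < midline_x  # assumes x increases from left to right
    return {
        "size_mm3": len(voxels) * voxel_volume,
        "side_left": int(on_left),
        "side_right": int(not on_left),
    }

# Hypothetical 4x2x2 mask with two foreground voxels in the last x-plane.
mask = [[[0, 0], [0, 0]] for _ in range(3)] + [[[1, 1], [0, 0]]]
features = aneurysm_features(mask, spacing=(0.5, 0.5, 0.5), midline_x=2)
print(features)
```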
Training an aneurysm model with a high Dice score first requires a high-performing artery segmentation model. The artery segmentation model used the manually generated noisy artery masks as ground truth labels and the original preprocessed MRA volumes as input. The aneurysm segmentation model, in turn, uses aneurysm masks as ground truth labels.
State-of-the-art medical segmentation frameworks, including nnU-Net [6], MONAI Label [10], and MedSAM [11], were evaluated on the noisy labels. Baseline tests favored nnU-Net, so further modifications to optimize the chosen metric, the Dice score, were made within the nnU-Net framework.
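For reference, the Dice score of a predicted mask P against a ground truth mask T is 2|P ∩ T| / (|P| + |T|). The sketch below, on hypothetical flattened masks with about 2% foreground, shows why Dice is a more informative metric than voxel accuracy under this kind of imbalance.

```python
def dice_score(pred, truth, eps=1e-8):
    """Dice = 2*|P & T| / (|P| + |T|) for flat binary masks."""
    intersection = sum(p & t for p, t in zip(pred, truth))
    return 2 * intersection / (sum(pred) + sum(truth) + eps)

def accuracy(pred, truth):
    """Fraction of voxels labeled correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth = [1] * 2 + [0] * 98        # 2 foreground voxels out of 100
all_background = [0] * 100        # predicts no aneurysm anywhere
half_found = [1, 0] + [0] * 98    # finds one of the two foreground voxels

print(accuracy(all_background, truth), dice_score(all_background, truth))
print(accuracy(half_found, truth), dice_score(half_found, truth))
```

A prediction that misses the aneurysm entirely still scores 98% accuracy but a Dice score of 0, whereas finding half of the foreground yields a Dice score of about 0.67.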

3.2.1. Artery Segmentation

nnU-Net [6] is a framework designed to automate the configuration of U-Net-based segmentation pipelines for specific datasets. Originally developed for the Medical Segmentation Decathlon, nnU-Net exhibited exceptional generalization across datasets. However, additional steps were taken to enhance its performance beyond state-of-the-art baselines.
The default encoder of nnU-Net, based on the convolutional U-Net architecture, extracts features through a series of convolutional layers followed by downsampling operations (for example, strided convolutions or pooling). This architecture is effective for capturing hierarchical features at different scales and levels of abstraction. Residual encoders, which incorporate residual connections, sometimes perform better because they mitigate the vanishing gradient problem and facilitate the training of deeper networks.
The proposed artery segmentation model is an ensemble of two nnU-Net pipelines, each taken from the best fold of a 5-fold cross-validation process. One pipeline uses the ConvNet architecture and the other the Residual Encoder architecture; both use Dice + Cross Entropy as the loss function. The two are combined by weighted averaging, which aims to improve generalization, particularly in the presence of manually generated noisy artery labels.
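A weighted average ensemble of two models can be sketched as below. The probability values and the weights 0.4 and 0.6 are hypothetical; the weights actually used are not stated in the text.

```python
def weighted_ensemble(prob_maps, weights, threshold=0.5):
    """Average per-voxel foreground probabilities, then binarize."""
    total = sum(weights)
    averaged = [sum(w * p for w, p in zip(weights, voxel)) / total
                for voxel in zip(*prob_maps)]
    return [int(p >= threshold) for p in averaged], averaged

# Hypothetical flattened probability maps from the two pipelines.
conv_probs = [0.9, 0.4, 0.2, 0.7]
resenc_probs = [0.8, 0.7, 0.1, 0.2]
mask, averaged = weighted_ensemble([conv_probs, resenc_probs], weights=[0.4, 0.6])
print(mask)
```

Averaging probabilities rather than binary masks lets a confident model outvote an uncertain one, which is one plausible reason an ensemble copes better with noisy labels.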

3.2.2. Aneurysm Segmentation

The aneurysm segmentation model leverages the artery segmentation model developed in the previous section. The original preprocessed MRA volumes are passed through the artery segmentation model, and the output masks are supplied to the aneurysm segmentation model as input, giving it a foundational understanding of the areas of interest. This step eliminates most areas where aneurysms cannot occur, focusing the model's attention on relevant regions and improving the efficiency and accuracy of aneurysm segmentation within the cerebral vasculature. Additionally, the models trained on the 5 folds were ensembled using weighted averaging.
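The data flow of the two stages can be summarized in a short sketch. The two `toy_*` functions are hypothetical stand-ins for the trained networks, working on a flattened list of intensities; they only illustrate that the second stage never sees the raw MRA.

```python
def two_stage_segmentation(mra_volume, artery_model, aneurysm_model):
    """Stage 1 extracts arteries; stage 2 segments aneurysms from that mask."""
    artery_mask = artery_model(mra_volume)
    aneurysm_mask = aneurysm_model(artery_mask)
    return artery_mask, aneurysm_mask

def toy_artery_model(volume):
    # Stand-in for the artery network: bright voxels count as vessel.
    return [int(v > 200) for v in volume]

def toy_aneurysm_model(artery_mask):
    # Stand-in for the aneurysm network: flag a voxel whose two
    # neighbours are also vessel, mimicking a local "bulge".
    n = len(artery_mask)
    return [int(0 < i < n - 1 and all(artery_mask[i - 1:i + 2]))
            for i in range(n)]

volume = [10, 250, 20, 240, 255, 230, 15]  # hypothetical intensities
arteries, aneurysms = two_stage_segmentation(
    volume, toy_artery_model, toy_aneurysm_model)
print(arteries, aneurysms)
```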

3.3. Neuro Symbolic AI

After segmenting the aneurysm and extracting relevant geographical information (such as location and volume), this geographical context can be incorporated into the patient data as tabular features, alongside features such as age, ethnicity, and presence of symptoms.
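In its simplest form, this fusion amounts to appending the image-derived columns to each patient record. The sketch below assumes dictionary-based records; the values and the `size_mm3` column name are hypothetical.

```python
def fuse_features(patient_record, image_features):
    """Append image-derived columns to a tabular record without overwriting."""
    clash = patient_record.keys() & image_features.keys()
    if clash:
        raise ValueError(f"duplicate columns: {sorted(clash)}")
    return {**patient_record, **image_features}

record = {"Sex": "F", "Age": 58, "Family History": 1, "Diplopia": 0}
image_features = {"side_left": 1, "side_right": 0, "size_mm3": 41.5}  # hypothetical
fused = fuse_features(record, image_features)
print(fused)
```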
Neurosymbolic AI combines aspects of both neural networks and symbolic reasoning. By integrating the pattern recognition capabilities of neural networks with the transparent reasoning of symbolic systems, neurosymbolic AI models offer more interpretable results [12]. In a medical context, this is invaluable, as explanations for diagnosis and treatment are essential.
Beyond improving transparency, neurosymbolic AI can enhance model performance. Symbolic representations of knowledge make models more data-efficient, leading to improved results compared to traditional neural networks in domains with limited data, such as medicine. Additionally, in domains such as healthcare, where pre-existing domain knowledge is vital, neurosymbolic AI can incorporate these relationships into the learning process, improving accuracy.
Through testing, the Tabular Transformer was selected as the base neural component of the neuro-symbolic model, as it outperformed alternatives such as simple logistic regression.
The Apriori algorithm was chosen as the symbolic component. Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. The initial step was to prepare the in-house dataset for ruleset generation. After basic preprocessing and conversion to a one-hot encoding, the in-house dataset had 54 features (columns). After the base rulesets were generated, unhelpful rules had to be filtered out.
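The mining step can be illustrated with a compact, standard-library version of Apriori. Each one-hot row is treated as the set of columns that are active; the item names, the five example rows, and the support and confidence thresholds are hypothetical and far smaller than the 54-column dataset described above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while level:
        kept = {s: support(s) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-item candidates, keeping only
        # those whose k-item subsets are all frequent (Apriori pruning).
        level = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        level = {c for c in level
                 if all(frozenset(s) in kept
                        for s in combinations(c, len(c) - 1))}
    return frequent

def rules_for(frequent, consequent, min_confidence):
    """Rules A -> consequent with confidence = support(A and consequent) / support(A)."""
    rules = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules

# Hypothetical one-hot rows, written as sets of active columns.
rows = [
    {"size_large", "loc_acom", "ruptured"},
    {"size_large", "loc_acom", "ruptured"},
    {"size_large", "loc_mca"},
    {"size_small", "loc_mca"},
    {"size_small", "loc_acom", "ruptured"},
]
frequent = apriori([frozenset(r) for r in rows], min_support=0.4)
rules = rules_for(frequent, "ruptured", min_confidence=0.9)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", consequent, confidence)
```

Restricting the consequent to the rupture label is one simple way to discard rules that say nothing about the prediction target.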
The 35701 mined rulesets were filtered down to 10 rulesets that were integrated into the model and affect its output whenever a case satisfies the corresponding item-set. These 10 were selected by ranking features by their correlation with rupture status (assessed through both expert advice and statistical analysis), so features such as location and size appear frequently among the item-sets.
At inference, the output of the final dense layer's sigmoid activation is adjusted directly when the case meets a ruleset.
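The text does not give the exact adjustment formula, so the following is only one possible reading: each satisfied rule shifts the sigmoid output by a fixed amount, and the result is clipped to [0, 1]. The rule contents, the `shift` values, and the additive form are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rule_adjusted_probability(logit, case_items, rules):
    """Shift the network's sigmoid output for every rule the case satisfies."""
    p = sigmoid(logit)
    fired = [r for r in rules if r["antecedent"] <= case_items]
    for rule in fired:
        p += rule["shift"]  # illustrative additive adjustment, not the paper's formula
    return min(1.0, max(0.0, p)), fired

# Hypothetical rules and case.
rules = [
    {"antecedent": {"loc_acom", "size_large"}, "shift": 0.15},
    {"antecedent": {"size_small"}, "shift": -0.05},
]
case = {"loc_acom", "size_large", "sex_female"}
probability, fired = rule_adjusted_probability(0.0, case, rules)
print(probability, [sorted(r["antecedent"]) for r in fired])
```

Returning the rules that fired alongside the probability is what makes the adjustment explainable: the clinician can see which item-set changed the score.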

4. Experimental Results

For convenience, the individual components of the Integrated Neuro-Symbolic Analysis Pipeline (INSAP), namely the artery segmentation, aneurysm segmentation, and rupture diagnosis models, are each referred to as INSAP in the comparisons below.

4.1. Segmentation

Through extensive experimentation on the Lausanne_TOF-MRA_Aneurysm_Cohort dataset, comparisons are made between different techniques. Our noisy artery to aneurysm segmentation model offers significant improvement compared to other state-of-the-art frameworks in terms of both artery segmentation and aneurysm segmentation.
Compared to other state-of-the-art segmentation frameworks, the proposed INSAP artery segmentation strategy achieves a higher Dice score. When masks or labels are created manually, as in artery segmentation here, dealing with noise is essential. INSAP's ensemble of two nnU-Net pipelines, one with a convolutional U-Net encoder and one with a residual encoder, provided a significant advantage in generalization and adaptability to noisy data over other frameworks such as MedSAM and MONAI Label.
By using the output of the proposed artery segmentation model as input, the INSAP aneurysm segmentation strategy achieved a significant jump in Dice score, from a training Dice score of 0.6507 and a mean validation Dice score of 0.6483 to a training Dice score of 0.9429 and a validation Dice score of 0.9057.

4.2. Aneurysm Rupture Diagnosis

Optimizing recall is vital in aneurysm rupture risk estimation because of the potential consequences of a false negative. Missing an aneurysm that is at risk of rupture could postpone intervention, which can be life-threatening for the patient. High recall, meaning that as many true positive cases as possible are correctly identified, is therefore the priority.
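The four reported metrics follow directly from the confusion counts, as the sketch below shows on hypothetical labels (1 = ruptured). Recall is the only one of them that is penalized by false negatives alone.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 3 true positives, 1 false negative, 2 false positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example the single missed rupture lowers recall to 0.75, while the two false alarms lower precision to 0.6 without affecting recall.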
By using a neuro-symbolic strategy based on association rule mining with the Apriori algorithm, the INSAP tabular classifier for aneurysm rupture classification achieves higher accuracy, precision, recall, and F1 score than the other strategies.

5. Conclusion

The following pipeline for cerebral aneurysm risk diagnosis is proposed: Integrated Neuro-Symbolic Analysis Pipeline for Cerebral Aneurysm Rupture Diagnosis.
For 3D artery segmentation on data derived from the Lausanne_TOF-MRA_Aneurysm_Cohort dataset, the standard implementation of nnU-Net outperforms established frameworks such as MedSAM and MONAI Label. The INSAP ensembled artery segmentation model in turn outperforms the standard nnU-Net, demonstrating the efficacy of the ensemble, particularly when confronted with noisy labels. Thus, for labels not made by experts (which are often noisy), the INSAP method is preferred.
For 3D aneurysm segmentation on the Lausanne_TOF-MRA_Aneurysm_Cohort dataset, the INSAP artery-to-aneurysm method outperforms a standardly trained nnU-Net, showing that 3D extracted arteries are more useful as input than the original preprocessed MRA volume. The results highlight the significance of incorporating anatomical context for segmentation accuracy.
On the in-house dataset, the neuro-symbolic model that used extracted geometric features outperformed the same model without them, indicating the importance of geometric context and neuro-symbolic AI for diagnostic performance.
Basic associative features alone are not enough for accurate preliminary screening for aneurysms. Geographic context is paramount; however, the interpretation of MRA scans by radiologists is time-consuming and may take from hours to days. The proposed INSAP pipeline streamlines these operations by automating MRA analysis and incorporating the results into a neuro-symbolic classification model, improving results over industry standards at every component.

Acknowledgements

Thanks to SMILES lab for their support in the 3D segmentation analysis part of the project along with the neuro-symbolic aspect. Below are the doctors and experts that gave invaluable feedback and validation for the project: K. Malik, University of Michigan; F. Alam, SMILES Lab Researcher; M. Irfan, SMILES Lab Graduate Research Assistant.

References

  1. Boulouis, G.; Rodriguez-Régent, C.; Rasolonjatovo, E.; Ben Hassen, W.; Trystram, D.; Edjlali-Goujon, M.; Meder, J.-F.; Oppenheim, C.; Naggara, O. Unruptured intracranial aneurysms: An updated review of current concepts for risk factors, detection and management. Rev. Neurol. 2017, 173, 542–551.
  2. Wiebers, D.O.; Whisnant, J.P.; Sundt, T.M.; O'Fallon, W.M. The significance of unruptured intracranial saccular aneurysms. J. Neurosurg. 1987, 66, 23–29.
  3. Hae, H.; Kang, S.-J.; Kim, W.-J.; Choi, S.-Y.; Lee, J.-G.; Bae, Y.; Cho, H.; Yang, D.H.; Kang, J.-W.; Lim, T.-H.; et al. Machine learning assessment of myocardial ischemia using angiography: Development and retrospective validation. PLOS Med. 2018, 15, e1002693.
  4. Di Noto, T.; Marie, G.; Tourbier, S.; Alemán-Gómez, Y.; Esteban, O.; Saliou, G.; Bach Cuadra, M.; Hagmann, P.; Richiardi, J. Lausanne_TOF-MRA_Aneurysm_Cohort. OpenNeuro 2022. [Dataset].
  5. Jerman, T.; Pernus, F.; Likar, B.; Spiclin, Z. Aneurysm detection in 3D cerebral angiograms based on intra-vascular distance mapping and convolutional neural networks. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, Australia, 18–21 April 2017; pp. 612–615.
  6. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2020, 18, 203–211.
  7. Cai, S.; Tian, Y.; Lui, H.; Zeng, H.; Wu, Y.; Chen, G. Dense-UNet: A novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network. Quant. Imaging Med. Surg. 2020, 10, 1275–1285.
  8. Malik, K.; Alam, F.; Santamaria, J.; Krishnamurthy, M.; Malik, G. Toward Grading Subarachnoid Hemorrhage Risk Prediction: A Machine Learning-Based Aneurysm Rupture Score. World Neurosurg. 2023, 172, e19–e38.
  9. Fedorov, A.; Beichel, R.; Kalpathy-Cramer, J.; Finet, J.; Fillion-Robin, J.-C.; Pujol, S.; Bauer, C.; Jennings, D.; Fennessy, F.; Sonka, M.; et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn. Reson. Imaging 2012, 30, 1323–1341.
  10. Diaz-Pinto, A.; Alle, S.; Nath, V.; Tang, Y.; Ihsani, A.; Asad, M.; Pérez-García, F.; Mehta, P.; Li, W.; Flores, M.; et al. MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images. arXiv 2022, arXiv:2203.12362.
  11. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 1–9.
  12. Sheth, A.; Roy, K.; Gaur, M. Neurosymbolic Artificial Intelligence (Why, What, and How). IEEE Intell. Syst. 2023, 38, 56–62.
  13. Shahzad, R.; Pennig, L.; Goertz, L.; Thiele, F.; Kabbasch, C.; Schlamann, M.; Krischek, B.; Maintz, D.; Perkuhn, M.; Borggrefe, J. Fully automated detection and segmentation of intracranial aneurysms in subarachnoid hemorrhage on CTA using deep learning. Sci. Rep. 2020, 10, 1–12.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.