This version is not peer-reviewed.
Submitted: 02 April 2024
Posted: 02 April 2024
A peer-reviewed article of this preprint also exists.
System | Aim | References | Artificial intelligence approach | Type of data analyzed | Outcome |
---|---|---|---|---|---|
Neurocritical Care | | | | | |
Electro-encephalography | Automated seizure detection | O'Shea et al. [27], Liu et al. [28], Gotman et al. [29], O'Shea et al. [30], Pavel et al. [31], Mathieson et al. [33] | DL- and SVM-based detection models [27,30], scored autocorrelation moment analysis [28], spectral analysis [29], and an algorithm for automated neonatal seizure recognition [31], evaluated with the AUROC | Continuous EEG data | The seizure detection algorithms achieved a 56% relative improvement, sensitivity of 81.3%-84%, specificity of 84.4%-98%, overall accuracy of 93.3%-98.5%, and a false detection rate of 0.04-1.7/h |
Electro-encephalography | Severity grading of neonatal HIE | Stevenson et al. [32], Raurale et al. [35], Moghadam et al. [36], Matic et al. [37], Pavel et al. [38] | ML classifiers for automated grading based on multi-class linear analysis [32], a quadratic time-frequency distribution with a CNN [35], an SVM, and multilayer feedforward or recurrent neural networks [36], evaluated with the Matthews correlation coefficient (MCC) [38] and the AUROC | Continuous EEG and clinical data | ML-based automated grading systems had an accuracy of 83%-97%. The clinical and qualitative-EEG model had an MCC of 0.470. Quantitative aEEG achieved MCC 0.381 and AUROC 0.696, and clinical plus quantitative aEEG achieved MCC 0.384 and AUROC 0.720 |
Electro-encephalography | Sleep stage classification | Ansari et al. [34] | CNN with an inception block (SINC) | EEG data | The SINC-based model significantly outperformed existing neonatal quiet sleep detection algorithms, with a mean Kappa of 0.75-0.77 |
Magnetic Resonance Imaging | Automated segmentation and quantification of the PLIC | Grubel et al. [44] | CNN-based algorithm comprising slice-selection modules and a multi-view segmentation model | MRI data | The method could identify a specific desired slice from the MRI volume data |
Magnetic Resonance Imaging | Combination of structural and functional networks | Ball et al. [47], Galdi et al. [48] | Multivariate analysis combining multiple imaging modalities [47], or morphometric similarity networks [48] | MRI and clinical data | The model confirmed the association between imaging markers of neuroanatomical abnormality and poor cognitive and motor outcomes. The regression model predicted postmenstrual age at scan with an absolute error of 0.70 weeks and an accuracy of 92% |
Magnetic Resonance Imaging | Generation of reliable and accurate segmentation | Makropoulos et al. [49], Ding et al. [50] | AI model using a framework for accurate intensity-based segmentation [49], evaluated with the Dice similarity coefficient (DSC) for each tissue type [50] | MRI data | The model achieved highly accurate results across a wide range of gestational ages. The dual-modality HyperDense-Net achieved the best DSC values, while the single-modality LiviaNET processed T2W better than T1W images. Both neural networks achieved previously reported performance |
Respiratory System | RDS severity | Ahmed et al. [52], Raimondi et al. [53] | ML model using attenuated total reflectance Fourier-transform infrared spectroscopy, principal component calibration, and PLSR [52], or an SVM regressor [53] | RDS biomarkers, the lecithin/sphingomyelin (L/S) ratio, and lung ultrasound analyzed both visually and by computer-assisted gray-scale analysis | A three-factor PLSR model of second-derivative spectra predicted L/S ratios with significant accuracy (R2 0.967). Visual assessment correlated with PaO2/FiO2 (r -0.55; p<0.0001) and the A-a gradient (r 0.59; p<0.0001). Oxygenation indices were also associated with the gray-scale analysis of lung ultrasound scans |
Respiratory System | Prediction of BPD | Verder et al. [51], Dai et al. [54], Leigh et al. [55], Xing et al. [56], Laughon et al. [60], Patel et al. [61] | Models using SVM [51], logistic regression [55], an XSEG-Net model combining digital image processing and human-computer interaction [56], the C statistic [60], and an RF algorithm [61], evaluated with the AUROC | Perinatal, clinical, genetic, laboratory, X-ray imaging, and demographic data | An algorithm combining perinatal data and gastric aspirate analysis achieved a sensitivity of 88% and a specificity of 91%. The predictive model combining BPD risk gene sets with basic clinical risk factors showed discrimination of AUROC 0.915. The AI models' performance showed AUROC 0.757-0.934. The deep CNN model had accuracy, precision, sensitivity, and specificity of 95.58%, 95.61%, 95.67%, and 96.98%, respectively. The C statistic for prediction was 0.793-0.854 |
Respiratory System | Extubation readiness | Mueller et al. [57], Precup et al. [58], Mikhno et al. [59] | ML approaches using an ANN [57] or SVM [58], with multivariate logistic regression and the AUROC | Multiple clinical and laboratory data, and measures of cardiorespiratory variability | The optimal models achieved an AUROC of 0.87-0.871, sensitivity of 70.1%, and specificity of 90%. The AI predictive models compared well with clinicians' expertise and accurately classified infants who would fail extubation |
Respiratory System | Automated detection of apneas | Varisco et al. [64] | Optimized detection algorithm using logistic regression and the AUROC | ECG, chest impedance, and oxygen saturation signal features | The apnea detection model returned an AUROC of 0.88-0.90. Feature relevance was highest for features derived from the chest impedance |
Ophthalmology | Automated diagnosis of ROP | Ataer-Cansizoglu et al. [66], Redd et al. [67], Wu et al. [68], Biten et al. [70], Brown et al. [71], Taylor et al. [72], Campbell et al. [73] | DL computer-based image analysis system (i-ROP) [66,67], occurrence (OC-Net) and severity (SE-Net) networks for ROP [68], telemedicine diagnoses [70], a deep CNN algorithm [71], and a quantitative severity scale for ROP [73], reporting the AUROC, accuracy, sensitivity, and specificity | Retinal images | The i-ROP system had 95% accuracy for detecting pre-plus and plus disease. i-ROP had an AUROC of 0.960, 94% sensitivity, 79% specificity, 13% positive predictive value, and 99.7% negative predictive value for detecting type 1 ROP. OC-Net had AUROC, accuracy, sensitivity, and specificity of 0.90, 52.8%, 100%, and 37.8%, respectively, and SE-Net 0.87, 68.0%, 100%, and 46.6%, respectively. Telemedicine had 78% sensitivity for zone I disease, 79% for plus disease, and 79% for type 2 ROP. The deep CNN algorithm had an AUROC of 0.94 for the diagnosis of normal and 0.98 for the diagnosis of plus disease, with a sensitivity of 93% and specificity of 94%. The AI-based quantitative severity scale for ROP had an AUROC of 0.98, with 100% sensitivity and 78% specificity |
Vital Signs | Detection of artifacts | Tsien et al. [74] | Decision tree induction model | Multiple physiologic data signals | A classification system evaluating physiologic data may be a viable approach to detecting artifacts |
Vital Signs | Prediction of overall morbidity | Saria et al. [75] | Prediction algorithm (PhysiScore) based on a physiological assessment score | Apgar score and standard signals recorded noninvasively on admission | PhysiScore had 86% sensitivity and 96% specificity in predicting overall morbidity, with an accuracy of 90%-100% in predicting morbidity related to infection and 96%-100% for cardiopulmonary events |
Vital Signs | Temperature detection | Lyra et al. [77] | Combination of DL-based algorithms and camera modalities | Thermographic recordings | The detector showed a precision of 0.82. The evaluation of the temperature extraction revealed an absolute error of 0.55 °C |
Gastrointestinal System | Prediction of spontaneous intestinal perforation | Son et al. [78] | AI model of an ANN using receiver operating characteristic analysis | Clinical data | The ANN models showed an AUROC of 0.8797-0.8832 for predicting intestinal perforation |
Gastrointestinal System | Prediction of postnatal growth failure | Han et al. [80] | ML models of extreme gradient boosting, random forest, SVM, and CNN, with multiple logistic regression | Clinical data | The model showed an AUROC of 0.74 and an accuracy of 0.68 |
Jaundice | Detection of jaundice | Althnian et al. [81], Guedalia et al. [82] | DL and ML models using a combined data analysis approach with the AUROC | Eye, skin, and fused images | DL models performed best with skin images. The ML model's diagnostic ability for jaundice risk was an AUROC of 0.748 |
Sepsis | Prediction of EOS | Stocker et al. [85] | ML in the form of a random forest classifier | Risk factors, clinical signs, and biomarkers | The ML model achieved an AUROC of 83.41% and an area under the precision-recall curve of 28.42% |
Sepsis | Prediction of LOS | Cabrera-Quiros et al. [76] | ML approaches of logistic regression, naive Bayes, and a nearest-mean classifier | Heart rate variability, respiration, and body motion data | Using a combination of all features, classification of LOS three hours before onset showed a mean accuracy of 0.79 and a mean precision of 0.82 |
Patent Ductus Arteriosus | Detection of PDA | Na et al. [86], Gomez-Quintana et al. [87] | ML algorithms including an RF (a decision tree-based method), an L-GBM (a low-bias model combining sequential weak models with a light computational algorithm), a multilayer perceptron (a feedforward ANN), and an SVM, with multiple logistic regression | Database of risk factors and heart sounds data | L-GBM achieved an accuracy of 0.77, AUROC of 0.82, and specificity of 0.84 at predicting PDA. The RF model achieved an accuracy of 0.85, AUROC of 0.82, and sensitivity of 0.97 in determining sPDA therapy. The ML system based on heart sounds reached an AUROC of 77% at detecting PDA |
Neurodevelopmental outcome | Detection of neonates with cognitive impairment | Wee et al. [97], Ali et al. [101], Krishnan et al. [102] | Clustering coefficients of individual structures using SVM and canonical correlation analysis [97], a self-training DNN [101], and ML using sparse reduced-rank regression [102] | DTI tractography, brain functional connectome and cognitive assessment data, genome-wide SNP-based genotypes, and neurodevelopmental scales | The clustering coefficients of the DTI tractography were associated with internalizing and externalizing behaviors at 24 and 48 months of age. The self-training DNN model achieved an accuracy of 71.0%, a specificity of 71.5%, a sensitivity of 70.4%, and an AUROC of 0.75. SNPs in PPARG were significantly overrepresented in introns or regulatory regions, with predicted effects including protein coding and nonsense-mediated decay |
Neurodevelopmental outcome | Detection of neonates at risk of language impairment | Vassar et al. [99], Valavani et al. [103] | Multivariate models with leave-one-out cross-validation and exhaustive feature selection [99], and an RF classifier [103] | MRI diffusion tensor imaging and neurodevelopmental scales | The DTI-based model had 89% sensitivity and 86% specificity for composite, 100% and 90% for expressive, and 100% and 90% for receptive language, respectively. The RF classifier achieved an accuracy of 91%, sensitivity of 86%, and specificity of 96% |
Neurodevelopmental outcome | Detection of neuromotor problems and risk of cerebral palsy | Balta et al. [104] | DeepLabCut tracking software using a k-means algorithm | Single videos of six points of interest (POIs) on the infant's upper body | The results suggested that such models may potentially be used for early identification of movement disorders |
Mortality | Prediction of mortality | Podda et al. [105], Ambalavanan et al. [107], Hsu et al. [109], Do et al. [110], Moreira et al. [111], Nascimento et al. [112] | ML algorithms including ANNs [105,107], an RF and a bagged classification and regression tree model [109], methods including ANN, RF, and SVM [110], and a linguistic fuzzy model with the minimum-of-Mamdani inference method [112], using logistic regression models and the AUROC | Maternal, perinatal, clinical, and laboratory data | The models had an AUROC of 0.85 for regression and 0.84 for neural networks. The RF model showed an AUROC of 0.939 for the prediction of neonates with respiratory failure, and the bagged classification and regression tree model demonstrated an AUROC of 0.915. Model performances equaled AUROC 0.845 for ANN, 0.826 for RF, and 0.631 for SVM. The fuzzy model captured the expert knowledge with strong correlation (r 0.96) |
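A few metrics recur throughout the table: sensitivity, specificity, accuracy, the Matthews correlation coefficient (MCC), and the AUROC. As an illustration only, here is a minimal, self-contained Python sketch computing them from scratch; the labels and scores are entirely hypothetical toy data, not drawn from any study cited above.

```python
import math

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and MCC from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn)                       # true positive rate
    spec = tn / (tn + fp)                       # true negative rate
    acc = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc

def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation: the
    probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: risk scores for 8 infants (1 = adverse outcome)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
sens, spec, acc, mcc = confusion_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"accuracy={acc:.2f} MCC={mcc:.2f} AUROC={auroc(y_true, scores):.2f}")
```

Note that sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUROC summarizes ranking performance across all thresholds, which is why studies often report both.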
Challenges of artificial intelligence | Areas of improvement |
---|---|
Quality of the dataset | AI tools require high-quality data for training. Studies should address limitations including small sample sizes, improper management of missing information, and heterogeneity across demographic subsets |
Model performance evaluation | Model performance should be continually evaluated on the entire dataset. Apart from the area under the receiver operating characteristic curve, additional performance metrics, such as the precision-recall curve, specificity/sensitivity, and calibration metrics, should be assessed |
Clinical impact and external validation | External validation is crucial because the tool's performance may degrade in a different dataset or in clinical practice due to overfitting to the training data. Also, the effectiveness of AI should be evaluated in terms of calibration and discrimination quality, as well as patient outcomes and the clinical workflow |
Comprehensibility | Bedside models should enhance intelligence, interpretability, and transparency |
Guidelines for critical evaluation, regulation, and oversight | Methodological guidance, critical appraisal, attention to medicolegal problems, and ongoing monitoring are required to guarantee the model's safe and effective usage |
Ethics | Informed consent, bias, patient privacy, and allocation are among the ethical issues of health AI, and negotiating their solutions can be challenging. Important decisions in neonatology often carry a complex and difficult ethical component, and multidisciplinary approaches are necessary for advancement |
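The additional metrics named in the evaluation row above, the precision-recall curve and calibration, can be sketched in a few lines. The following is an illustrative, from-scratch Python example on hypothetical outcomes and scores, not the method of any cited study.

```python
def precision_recall_points(y_true, scores):
    """Precision/recall pairs obtained by sweeping the decision
    threshold from the highest score downwards."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    n_pos = sum(y_true)
    points = []
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / n_pos))  # (precision, recall)
    return points

def calibration_bins(y_true, scores, n_bins=5):
    """Mean predicted risk vs. observed event rate per score bin; a
    well-calibrated model yields pairs close to the diagonal."""
    bins = [[] for _ in range(n_bins)]
    for t, s in zip(y_true, scores):
        bins[min(int(s * n_bins), n_bins - 1)].append((t, s))
    return [(sum(s for _, s in b) / len(b),   # mean predicted risk
             sum(t for t, _ in b) / len(b))   # observed event rate
            for b in bins if b]

# Hypothetical outcomes (1 = event) and model risk scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.8, 0.7, 0.6, 0.3, 0.1]
print(precision_recall_points(y_true, scores))
print(calibration_bins(y_true, scores))
```

Precision-recall curves are particularly informative for rare outcomes such as sepsis, where a high AUROC can coexist with a low area under the precision-recall curve, as in the EOS row of the first table.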