Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data

Vytautas Abromavičius; Darius Plonis; Deividas Tarasevičius; Artūras Serackis

doi:10.20944/preprints202005.0205.v1

Submitted: 11 May 2020

Posted: 12 May 2020


Abstract

The presented research addresses the problem of early detection of sepsis in patients in the Intensive Care Unit. The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data. The challenge organizers provided a labeled dataset of clinical records for training and verification of the algorithms. However, the relatively small number of records with sepsis, as defined by the Sepsis-3 clinical criteria, led to a highly unbalanced dataset (only 2% of the records carry a sepsis label). Such a strongly unbalanced dataset is a major challenge for machine learning model training and is poorly suited to training classical classifiers. To address these issues, a number of models were investigated. A solution that includes feature selection and data balancing techniques is proposed in this paper. In addition, several performance metrics were investigated. The results show that, for successful prediction, a specific model, with fewer or more predictors depending on the patient's length of stay in the Intensive Care Unit, should be applied.
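The effect of the 2% positive rate mentioned above can be illustrated with a minimal sketch. It uses synthetic labels only, not the challenge data, and the helper names (`make_labels`, `accuracy_and_recall`, `undersample`) are hypothetical. Random undersampling is shown as one common balancing technique; it is an assumption for illustration and is not claimed to be the balancing method used in the paper.

```python
import random


def make_labels(n=10_000, positive_rate=0.02, seed=0):
    """Build synthetic labels (1 = sepsis, 0 = no sepsis) at a given positive rate."""
    n_pos = round(n * positive_rate)
    labels = [1] * n_pos + [0] * (n - n_pos)
    random.Random(seed).shuffle(labels)
    return labels


def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def undersample(y_true, seed=0):
    """Return indices of a balanced subset: every positive record plus an
    equal number of randomly chosen negative records."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    kept_neg = random.Random(seed).sample(neg, len(pos))
    return sorted(pos + kept_neg)


labels = make_labels()

# A trivial classifier that never predicts sepsis looks accurate
# but detects no sepsis record at all.
always_negative = [0] * len(labels)
acc, rec = accuracy_and_recall(labels, always_negative)

# After undersampling, both classes are equally represented.
balanced_idx = undersample(labels)
balanced_rate = sum(labels[i] for i in balanced_idx) / len(balanced_idx)

print(f"accuracy={acc:.2f} recall={rec:.2f} balanced_rate={balanced_rate:.2f}")
```

On such data, plain accuracy rewards a model that ignores the minority class entirely, which is one reason to compare several performance metrics and to balance the training set before fitting a classifier.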

Keywords: Early detection; Sepsis; Evaluation metrics; Machine learning; Medical informatics; Feature extraction; PhysioNet challenge

Subject: Engineering - Electrical and Electronic Engineering

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
