Interpretable risk models for Sleep Apnea and Coronary diseases from structured and non-structured data

Carlos Anderson Oliveira Silva, Rafael Gonzalez-Otero, Michel Bessani, Liliana Otero Mendoza, Cristiano L. de Castro

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Machine learning-based risk models built from Electronic Health Records (EHR) can support medical decision-making. However, the lack of standardization of EHR data and the "black-box" nature of machine learning approaches have hindered their acceptance as support tools in clinical environments. This paper presents a method that predicts and explains the diagnosis of Atrial Fibrillation (AF), Sleep Apnea (SA), and Coronary Artery Disease (CAD). The proposed model is learned from patients' structured EHR data (commonly used screening variables) and non-structured EHR data (text drawn from medical reports). An embedding scheme for the variables, together with a labeling approach, is used to mimic an expert's ability to categorize the non-structured textual data. The method combines complex models, which predict the diseases, with the SHAP approach, which explains the predictions. A comparison of prediction models with different sets of input variables showed that using non-structured data improved the performance of CAD risk prediction models. The comparison also indicated that patients' medical histories are an important factor that should be considered during the data-driven learning process.
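
As an illustration of the kind of explanation SHAP provides, the sketch below computes exact Shapley values for a small toy risk score. It is not the authors' implementation: the scoring function, the three binary feature names, and the all-zero baseline are hypothetical, and practical SHAP tooling typically approximates these values rather than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep the patient's values; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical score over three binary inputs (assumed for illustration):
    # age_over_60, hypertension, report_mentions_chest_pain.
    age, htn, pain = z
    return 0.1 + 0.2 * age + 0.15 * htn + 0.3 * pain + 0.1 * htn * pain


patient = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["age_over_60", "hypertension", "chest_pain_in_report"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a per-feature explanation be read as a decomposition of one prediction. The interaction term between the two hypothetical features is split evenly between them.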

Original language: English
Article number: 116955
Journal: Expert Systems with Applications
Volume: 200
State: Published - 15 Aug 2022

Keywords

  • Coronary diseases
  • EHR data
  • SHAP
  • Sleep Apnea
  • Text embedding
  • Weak supervision
