Prediction of Pneumonia Mortality Risk and Cognitive Test Scores With Interpretable Machine Learning Models
Abstract
Adopting machine learning algorithms in medical practice is challenging because of their lack of transparency. This thesis shows that interpretable models can offer performance similar to, if not better than, that of traditional machine learning methods when applied to tabular data with interpretable features.
Confirming that the high-impact variables align with existing medical domain knowledge helps verify that the model learns from relevant patterns rather than from statistical coincidences or improperly constructed variables in the data set. High-impact variables that are not known to correlate with a given condition warrant further investigation by researchers, both to determine their relevance and to extend existing medical domain knowledge.
This thesis explores applying explainable machine learning practices to two problems: predicting pneumonia mortality risk and predicting future cognitive test scores.
In our first case study, we proposed a novel pneumonia risk prediction framework that uses an explainable boosting machine (EBM) model to predict patient mortality risk, with the goal of optimizing hospital resource usage. We pruned the feature set to retain only medically relevant features; this caused minimal performance loss, and the pruned model still outperformed other machine learning methods. The model outperformed all prior work on this task using the MIMIC-III dataset.
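For readers unfamiliar with explainable boosting machines: an EBM is a generalized additive model, optionally extended with a small number of pairwise interaction terms, which is what makes each feature's effect directly inspectable. For a binary outcome such as in-hospital mortality, its prediction takes the general form below. The notation is ours, not the thesis's, and whether the thesis model includes interaction terms is not stated here, so the pairwise sum is shown as optional (the set of pairs may be empty).

```latex
\operatorname{logit}\bigl(P(y = 1 \mid \mathbf{x})\bigr)
  = \beta_0
  + \sum_{j} f_j(x_j)
  + \sum_{(i,j) \in \mathcal{I}} f_{ij}(x_i, x_j)
```

Here $\beta_0$ is an intercept, each $f_j$ is a learned shape function of a single feature that can be plotted and checked against medical domain knowledge, and $\mathcal{I}$ is the (possibly empty) set of feature pairs allowed to interact. Pruning a feature simply removes its $f_j$ term from the sum.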
Our second case study focused on predicting future cognitive test scores using data from the Canadian Longitudinal Study on Aging. We pruned the large dataset, which had over 6,000 input variables, down to lightweight, explainable models with 25 features, with minimal performance loss from the feature pruning process. These results show that explainable machine learning models are a promising approach for predicting future cognitive test scores, which is a first step toward applying early preventive measures against irreversible cognitive decline due to dementia or Alzheimer's disease.
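The abstract does not say which criterion was used to reduce more than 6,000 variables to 25. One common approach is to rank features by an importance score from a fitted model, keep the top k, and compare a performance metric before and after pruning. The sketch below illustrates only that general idea under stated assumptions: the helper names `prune_to_top_k` and `relative_performance_loss`, the precomputed importance scores, and the feature names in the test are all hypothetical, not the thesis's actual procedure.

```python
from typing import Dict, List


def prune_to_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k features with the largest importance scores.

    Hypothetical helper: `importances` is assumed to be precomputed by some
    fitted model (for example, mean absolute contribution per feature).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending importance; break ties by name so results are deterministic.
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def relative_performance_loss(full_score: float, pruned_score: float) -> float:
    """Fractional drop in a higher-is-better metric after pruning (assumed metric)."""
    if full_score == 0:
        raise ValueError("full_score must be non-zero")
    return (full_score - pruned_score) / full_score
```

In practice the pruned model would be retrained on the selected features and evaluated on held-out data; a small relative loss would indicate that the discarded variables contributed little.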
Both case studies show that explainable machine learning on tabular data offers results similar to, if not better than, those of black-box models.

