Artificial Intelligence-Based Prediction of Diabetes Mellitus Using Health Checkup Data

Authors

  • K. V. Odunuga
  • R. E. Ochogwu
  • C. I. Osuji
  • O. E. Owoicho
  • F. B. Oredipe
  • O. A. Bamgbose
  • S. I. Okogu
  • M. A. Sunmola
  • A. O. Adebanjo

DOI:

https://doi.org/10.5281/zenodo.13748331

Keywords:

Data Security, DDoS Attack, cybersecurity and e-government

Abstract

Diabetes mellitus (DM) is a chronic condition characterized by persistently high blood glucose levels caused by relative or absolute insulin deficiency. Artificial intelligence (AI) could improve diabetes management by providing real-time health information to patients and providers, facilitating patient self-management, and enhancing interventions that target high-risk populations. This study builds a machine-learning model to predict diabetes mellitus in patients and uses quantitative analysis to assess the predictive power of machine-learning algorithms for the risk of developing diabetes. Data on 768 patients were obtained from Kaggle health data and used to train a predictive model in Jupyter Notebook. The variables include socio-demographic data, clinical measurements, medical history, and diabetes outcome. The data were simulated; thus, no actual participants were involved. The outcome distribution was 500 non-diabetic cases and 268 diabetic cases. Glucose had a mean of 69.11 and a range of 0 to 199. Body mass index (BMI) had a mean of 31.99 and a range of 0 to 67.10. Age had a mean of 33.24 and a range of 21 to 81. The features were standardized with `StandardScaler` to a mean of 0 and a standard deviation of 1, and the data were split into training (70%) and testing (30%) sets. A Support Vector Machine (SVM) with a linear kernel was used as the classifier and was evaluated by its precision and accuracy scores. The model achieved a training accuracy of 78.66% and a test accuracy of 77.27%. Its accuracy was high for the sample size used, and it could predict which patients had diabetes from the input parameters.
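The workflow described in the abstract (a 70/30 split, standardization with `StandardScaler`, and a linear-kernel SVM scored on accuracy and precision) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' notebook: the data are a synthetic stand-in generated with scikit-learn's `make_classification`, the count of 8 features and every `random_state` value are hypothetical choices, and the scaler is fitted on the training portion only, which may differ from the order used in the study. Its scores will therefore not reproduce the reported 78.66% and 77.27%.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 768-row dataset (assumption: 8 numeric features
# and roughly 65% negative / 35% positive labels, echoing the 500/268 split).
X, y = make_classification(
    n_samples=768,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    weights=[0.65, 0.35],
    random_state=0,
)

# Hold out 30% of the rows for testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Standardize each feature to mean 0 and standard deviation 1.
# Assumption: the scaler is fitted on the training rows only, so that no
# test-set statistics leak into training.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Support Vector Machine classifier with a linear kernel.
model = SVC(kernel="linear").fit(X_train_std, y_train)

# Accuracy on both sets, plus precision on the held-out set.
train_acc = accuracy_score(y_train, model.predict(X_train_std))
test_pred = model.predict(X_test_std)
test_acc = accuracy_score(y_test, test_pred)
test_prec = precision_score(y_test, test_pred, zero_division=0)

print(f"training accuracy: {train_acc:.4f}")
print(f"test accuracy:     {test_acc:.4f}")
print(f"test precision:    {test_prec:.4f}")
```

To try the same steps on real health checkup data, `X` and `y` would be replaced by the feature columns and the outcome column of the downloaded dataset; the rest of the sketch would stay the same.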

Published

2024-09-11