Comparative Analysis of Machine Learning Models for Diabetes Prediction

Pratiksha Patil; Deepali Lawand; Mohit Gambas; Deepak Gaikwad

doi:10.5281/zenodo.15867611

Authors

Pratiksha Patil Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Deepali Lawand Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Mohit Gambas Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Deepak Gaikwad Department of Physics, KTSP Mandal’s KMC College, Khopoli, Maharashtra, India

DOI:

https://doi.org/10.5281/zenodo.15867611

Keywords:

Diabetes Prediction, Machine Learning, Logistic Regression, Random Forest, Gradient Boosting, Linear Regression, Predictive Analytics

Abstract

Diabetes is a chronic health condition affecting millions worldwide, and early detection plays a vital role in effective disease management and prevention. In this study, we conduct a comparative analysis of four machine learning models—Logistic Regression, Random Forest, Gradient Boosting, and Linear Regression—applied to the Pima Indian Diabetes dataset obtained from Kaggle. The dataset comprises diagnostic measurements of female patients aged 21 and above of Pima Indian heritage. Each model is evaluated using key classification metrics, including accuracy, precision, recall, and F1-score. Among the models, Logistic Regression and Gradient Boosting achieved the highest accuracy of 75%, while Random Forest and Linear Regression showed slightly lower performance at 72% and 73.16%, respectively. The study highlights the effectiveness of ensemble methods and traditional classifiers in predicting diabetes outcomes and provides insight into their relative strengths for clinical decision support systems. These results suggest that machine learning can be a valuable tool in aiding early diagnosis and improving patient care strategies.

Downloads

Download data is not yet available.

References

International Diabetes Federation. (2021). IDF diabetes atlas. (10th ed.). Brussels, Belgium. Retrieved from: https://diabetesatlas.org/.

Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in medicine. New England Journal of Medicine, 380(14), 1347–1358.

Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56.

Kaggle. (n.d.). Pima Indians diabetes database. Retrieved May 6, 2025, from: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. (3rd ed.). Wiley.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Faraway, J. J. (2016). Linear models with R. (2nd ed.). CRC Press.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Comparative Analysis of Machine Learning Models for Diabetes Prediction

Authors

DOI:

Keywords:

Abstract

Downloads

References

Downloads

Published

How to Cite

Issue

Section

License

Most read articles by the same author(s)

Similar Articles

Make a Submission

Information

Abstracting & Indexing

Current Issue

Similar Articles

Pneumonia and Covid-19 Prediction using CNN

Applying Convolutional-GRU for Term Deposit Likelihood Prediction

The Application of the Combination of Number and Shape in the Process of Mathematics Learning

Unmanned Aerial Vehicle (UAV) “Drones” using Machine Learning

Assessing the Effects of Tax Elasticity on Government Spending

Are Analytics Skills in Curriculum Aligned with Skills in Job Description? A Knowledge Graph Approach

Performance Analysis of Multi-Body Modeled Washing Machines (MBomWM)

Learning Management System Built Using the MERN Stack

Usability of “iCollege” Learning Management System in University Environment

QoS-Aware Virtual Machine Migration in Cloud Environment- A Review