Comparative Analysis of Machine Learning Models for Diabetes Prediction

Authors

  • Pratiksha Patil Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
  • Deepali Lawand Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
  • Mohit Gambas Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
  • Deepak Gaikwad Department of Physics, KTSP Mandal’s KMC College, Khopoli, Maharashtra, India

DOI:

https://doi.org/10.5281/zenodo.15867611

Keywords:

Diabetes Prediction, Machine Learning, Logistic Regression, Random Forest, Gradient Boosting, Linear Regression, Predictive Analytics

Abstract

Diabetes is a chronic health condition affecting millions worldwide, and early detection plays a vital role in effective disease management and prevention. In this study, we conduct a comparative analysis of four machine learning models—Logistic Regression, Random Forest, Gradient Boosting, and Linear Regression—applied to the Pima Indian Diabetes dataset obtained from Kaggle. The dataset comprises diagnostic measurements of female patients aged 21 and above of Pima Indian heritage. Each model is evaluated using key classification metrics, including accuracy, precision, recall, and F1-score. Among the models, Logistic Regression and Gradient Boosting achieved the highest accuracy of 75%, while Random Forest and Linear Regression showed slightly lower performance at 72% and 73.16%, respectively. The study highlights the effectiveness of ensemble methods and traditional classifiers in predicting diabetes outcomes and provides insight into their relative strengths for clinical decision support systems. These results suggest that machine learning can be a valuable tool in aiding early diagnosis and improving patient care strategies.

Downloads

Download data is not yet available.

References

International Diabetes Federation. (2021). IDF diabetes atlas. (10th ed.). Brussels, Belgium. Retrieved from: https://diabetesatlas.org/.

Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in medicine. New England Journal of Medicine, 380(14), 1347–1358.

Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56.

Kaggle. (n.d.). Pima Indians diabetes database. Retrieved May 6, 2025, from: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. (3rd ed.). Wiley.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Faraway, J. J. (2016). Linear models with R. (2nd ed.). CRC Press.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Published

2025-06-26
CITATION
DOI: 10.5281/zenodo.15867611
Published: 2025-06-26

How to Cite

Patil, P., Lawand, D., Gambas, M., & Gaikwad, D. (2025). Comparative Analysis of Machine Learning Models for Diabetes Prediction. International Journal of Engineering and Management Research, 15(3), 89–93. https://doi.org/10.5281/zenodo.15867611

Issue

Section

Articles

Most read articles by the same author(s)

Similar Articles

<< < 5 6 7 8 9 10 11 12 13 14 > >> 

You may also start an advanced similarity search for this article.