Comparative Analysis of Machine Learning Models for Diabetes Prediction

Pratiksha Patil; Deepali Lawand; Mohit Gambas; Deepak Gaikwad

doi:10.5281/zenodo.15867611

Authors

Pratiksha Patil, Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Deepali Lawand, Department of Mathematics, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Mohit Gambas, Ramsheth Thakur College of Commerce and Science, Kharghar, Maharashtra, India
Deepak Gaikwad, Department of Physics, KTSP Mandal’s KMC College, Khopoli, Maharashtra, India

Keywords:

Diabetes Prediction, Machine Learning, Logistic Regression, Random Forest, Gradient Boosting, Linear Regression, Predictive Analytics

Abstract

Diabetes is a chronic health condition affecting millions worldwide, and early detection plays a vital role in effective disease management and prevention. In this study, we conduct a comparative analysis of four machine learning models (Logistic Regression, Random Forest, Gradient Boosting, and Linear Regression) applied to the Pima Indian Diabetes dataset obtained from Kaggle. The dataset comprises diagnostic measurements of female patients of Pima Indian heritage aged 21 and above. Each model is evaluated using key classification metrics, including accuracy, precision, recall, and F1-score. Logistic Regression and Gradient Boosting achieved the highest accuracy, 75%, while Random Forest and Linear Regression scored slightly lower, at 72% and 73.16%, respectively. The study highlights the effectiveness of ensemble methods and traditional classifiers in predicting diabetes outcomes and provides insight into their relative strengths for clinical decision support systems. These results suggest that machine learning can be a valuable tool in aiding early diagnosis and improving patient care strategies.
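
The comparison described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration, not the authors' code: the abstract gives no train/test split, hyperparameters, or preprocessing, so the 80/20 stratified split, the default model settings, and the 0.5 cut-off used to turn Linear Regression outputs into class labels are all assumptions. To keep the sketch self-contained it runs on a synthetic stand-in with the same shape as the Kaggle dataset (768 rows, eight features), so its scores will not match the reported figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Pima data (assumption): 768 rows, 8 features,
# roughly 35% positive cases. Replace with the real CSV to reproduce a study.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=0,
)

# Assumed 80/20 stratified split; the abstract does not state the split used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four models named in the abstract, with default-style settings (assumed).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Linear Regression": LinearRegression(),
}


def evaluate(model, threshold=0.5):
    """Fit on the training split and score on the test split."""
    model.fit(X_train, y_train)
    raw = np.asarray(model.predict(X_test))
    # Linear Regression returns continuous scores, so they are thresholded
    # (assumed cut-off). Classifier outputs are already 0/1 and are unchanged.
    y_pred = (raw >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }


results = {name: evaluate(model) for name, model in models.items()}

for name, scores in results.items():
    line = ", ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
    print(f"{name}: {line}")
```

Because the positive class is the minority, precision, recall, and F1 can differ noticeably between models with similar accuracy, which is presumably why the study reports all four metrics rather than accuracy alone.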


