Predictive Modelling for Customer Purchase Behaviour: A Logistic Regression Approach Based on Age and Estimated Salary

Authors

  • Selvakumar S, UG Student, Department of CSE AI & DS, VELS University, Chennai, Tamil Nadu, India
  • Yogeshwaramoorthi K, UG Student, Department of CSE AI & DS, VELS University, Chennai, Tamil Nadu, India
  • P.M.G. Jegathambal, Assistant Professor, Department of CSE, VELS University, Chennai, Tamil Nadu, India

DOI:

https://doi.org/10.5281/zenodo.17301302

Keywords:

Logistic Regression, Customer Purchase Behaviour, Insurance Analytics, Streamlit Application, Decision Boundary Visualization, Predictive Modelling

Abstract

Customer purchase prediction has become a critical requirement in the insurance industry, where businesses strive to maximize customer acquisition while minimizing marketing costs. Accurately forecasting whether a potential customer will purchase an insurance policy allows companies to focus on high-potential leads and optimize their marketing strategies. In this study, we propose a predictive modelling approach that uses logistic regression to classify customers based on two key demographic features: Age and Estimated Salary. A dataset of over 1,000 customer records was pre-processed, visualized, and divided into training and testing subsets using an 80:20 ratio. The logistic regression model was trained to identify the patterns that influence purchase decisions and to estimate the probability of policy adoption. To enhance usability, the trained model was deployed in a Streamlit-based web application that includes secure user authentication, interactive input fields, decision boundary visualization, and a leaderboard for tracking predictive outcomes. Experimental results show that the logistic regression model achieves an accuracy of approximately 90% while remaining highly interpretable through coefficient analysis and decision boundary visualization. This work highlights the potential of combining machine learning models with lightweight, interactive applications to support business analysts and decision-makers. The proposed framework offers a scalable, interpretable, and cost-effective solution for insurance companies seeking to strengthen customer targeting. Future work will focus on incorporating additional demographic and behavioural features, applying advanced ensemble models, and integrating large-scale real-world datasets to further improve prediction performance.
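The abstract describes a standard workflow: an 80:20 split, a logistic regression on Age and Estimated Salary, accuracy on the held-out subset, coefficient analysis, and a decision boundary. The sketch below illustrates that workflow with scikit-learn under stated assumptions; it is not the authors' code. The data are synthetic (generated from a made-up rule, not the authors' dataset), the features are standardized with StandardScaler as an assumed pre-processing step, and the helper boundary_salary is hypothetical. The model estimates P(purchase) = 1 / (1 + exp(-(b + w_age·z_age + w_sal·z_sal))), where z_age and z_sal are the standardized features, so the decision boundary (P = 0.5) is the straight line on which the linear term equals zero. Any accuracy it prints reflects the synthetic data only and says nothing about the reported figure of approximately 90%.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Synthetic stand-in data (ASSUMPTION: not the authors' dataset) ---
rng = np.random.default_rng(0)
n = 1000
age = rng.integers(18, 61, size=n)              # ages 18..60
salary = rng.integers(15_000, 150_001, size=n)  # salaries 15,000..150,000
# Hypothetical generating rule, used only to produce 0/1 purchase labels
true_logit = 0.15 * (age - 40) + 0.00004 * (salary - 80_000)
purchased = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

X = np.column_stack([age, salary]).astype(float)  # columns: Age, EstimatedSalary
y = purchased

# 80:20 train/test split, stratified so both subsets keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Standardize features (assumed pre-processing step), then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")

# Coefficients are on the standardized scale; the sign gives the direction of effect
scaler = model.named_steps["standardscaler"]
clf = model.named_steps["logisticregression"]
w_age, w_sal = clf.coef_[0]
b = clf.intercept_[0]
print(f"Age coef: {w_age:+.3f}, EstimatedSalary coef: {w_sal:+.3f}")


def boundary_salary(age_value):
    """Hypothetical helper: salary at which P(purchase) = 0.5 for a given age.

    The decision boundary is w_age*z_age + w_sal*z_sal + b = 0 in standardized
    units; solve for z_sal and map it back to the original salary scale.
    """
    z_age = (age_value - scaler.mean_[0]) / scaler.scale_[0]
    z_sal = -(b + w_age * z_age) / w_sal
    return scaler.mean_[1] + z_sal * scaler.scale_[1]


# Purchase probability for a hypothetical new customer (Age 35, salary 60,000)
p_buy = model.predict_proba(np.array([[35.0, 60_000.0]]))[0, 1]
print(f"P(purchase): {p_buy:.3f}")
print(f"Boundary salary at age 40: {boundary_salary(40):,.0f}")
```

Standardizing first expresses both coefficients per one standard deviation of their feature, which is what makes comparing them meaningful; without it, the salary coefficient would look tiny simply because salaries are measured in much larger units than ages.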

Published

2025-10-04

How to Cite

Selvakumar, S., Yogeshwaramoorthi, K., & Jegathambal, P. M. G. (2025). Predictive Modelling for Customer Purchase Behaviour: A Logistic Regression Approach Based on Age and Estimated Salary. International Journal of Engineering and Management Research, 15(5), 34–43. https://doi.org/10.5281/zenodo.17301302