Predictive Modelling for Customer Purchase Behaviour: A Logistic Regression Approach Based on Age and Estimated Salary
DOI: https://doi.org/10.5281/zenodo.17301302
Keywords: Logistic Regression, Customer Purchase Behaviour, Insurance Analytics, Streamlit Application, Decision Boundary Visualization, Predictive Modeling
Abstract
Customer purchase prediction has become a critical requirement in the insurance industry, where businesses strive to maximize customer acquisition while minimizing marketing costs. Accurately forecasting whether a potential customer will purchase an insurance policy allows companies to focus on high-potential leads and optimize their marketing strategies. In this study, we propose a predictive modelling approach that uses logistic regression to classify customers based on two demographic features: Age and Estimated Salary. A dataset of over 1,000 customer records was pre-processed, visualized, and divided into training and testing subsets in an 80:20 ratio. The logistic regression model was trained to identify patterns that influence purchase decisions and to estimate the probability of policy adoption. To enhance usability, the trained model was deployed in a Streamlit-based web application that includes secure user authentication, interactive input fields, decision boundary visualization, and a leaderboard to track predictive outcomes. Experimental results show that the logistic regression model achieves an accuracy of approximately 90% and remains highly interpretable through coefficient analysis and decision boundary visualization. This work highlights the potential of combining machine learning models with lightweight, interactive applications to support business analysts and decision-makers. The proposed framework offers a scalable, interpretable, and cost-effective solution for insurance companies seeking to strengthen customer targeting. Future work will focus on incorporating additional demographic and behavioral features, applying advanced ensemble models, and integrating large-scale real-world datasets to further improve prediction performance.
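The workflow summarized in the abstract (two features, an 80:20 split, a logistic regression that outputs a purchase probability) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the coefficients used to generate them are assumptions, and the model is fitted with plain batch gradient descent from the Python standard library so that the example has no dependencies. All function names are hypothetical.

```python
import math
import random

def make_synthetic_customers(n=1000, seed=0):
    """Generate (age, salary, purchased) rows. Purely illustrative data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 60)
        salary = rng.uniform(15_000, 150_000)
        # Assumed ground truth: older, higher-paid customers buy more often.
        z = 0.15 * (age - 40) + 0.00004 * (salary - 80_000)
        p = 1.0 / (1.0 + math.exp(-z))
        rows.append((age, salary, 1 if rng.random() < p else 0))
    return rows

def standardize(train, test):
    """Scale both features with the training-set mean and standard deviation."""
    stats = []
    for j in range(2):
        col = [r[j] for r in train]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std))

    def scale(rows):
        return [([(r[j] - stats[j][0]) / stats[j][1] for j in range(2)], r[2])
                for r in rows]

    return scale(train), scale(test)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability of purchase for one standardized feature pair."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def fit_logistic(data, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = predict_proba(w, b, x) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

rows = make_synthetic_customers()
random.Random(1).shuffle(rows)
split = int(0.8 * len(rows))  # 80:20 train/test split
train, test = standardize(rows[:split], rows[split:])

w, b = fit_logistic(train)
correct = sum((predict_proba(w, b, x) >= 0.5) == bool(y) for x, y in test)
accuracy = correct / len(test)
print(f"weights={w}, bias={b:.3f}, test accuracy={accuracy:.2f}")
```

The sign and size of each fitted weight indicate how the corresponding standardized feature shifts the purchase probability, which is the kind of coefficient analysis the abstract refers to. The decision boundary is the line where `w[0]*x[0] + w[1]*x[1] + b = 0`, that is, where the predicted probability equals 0.5. The accuracy printed here reflects only the synthetic data and says nothing about the figure reported in the paper.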
License
Copyright (c) 2025 Selvakumar S, Yogeshwaramoorthi K, P.M.G. Jegathambal

This work is licensed under a Creative Commons Attribution 4.0 International License.
Research articles in the International Journal of Engineering and Management Research are open-access articles published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/. This license allows you to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.






