Automation in Data Engineering: Implementing GitHub Actions for CI/CD in ETL Workflows

Authors

  • Ronakkumar Bathani, Sr. Data Engineer (Independent Researcher), Institute of Technology, Nirma University, India

DOI:

https://doi.org/10.5281/zenodo.13994660

Keywords:

GitHub, Data, ETL

Abstract

Automation in data engineering, particularly within ETL (Extract, Transform, Load) workflows, has become critical as data volumes increase. This paper demonstrates the use of GitHub Actions to implement Continuous Integration/Continuous Deployment (CI/CD) in ETL workflows. Our research focuses on automating the three ETL stages (data extraction, transformation, and loading) with GitHub Actions in order to reduce errors, raise deployment success rates, and minimize manual intervention. Results show a 33.33% reduction in average execution time, a 58.33% decrease in the error rate, and an 18.75% increase in the successful deployment rate. In addition, automation saved a total of 23 hours across ETL tasks. These findings highlight the importance of CI/CD in modern data engineering and the role of automation in making ETL workflows more reliable and efficient.
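
To make the idea of automating the three ETL stages more concrete, the following is a minimal Python sketch, not the paper's actual pipeline. It assumes each stage is written as a small, independently testable function, so that a CI job (for example, a GitHub Actions workflow that runs a test suite on every push) could exercise them before deployment. All function names, the CSV layout, and the in-memory `target` dictionary are hypothetical and chosen only for illustration.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: drop rows with a missing amount and cast the field types."""
    cleaned = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # skip incomplete records instead of failing the whole run
        cleaned.append({"id": int(row["id"]), "amount": float(amount)})
    return cleaned


def load(rows: list[dict], target: dict) -> int:
    """Load: upsert rows into a (hypothetical) in-memory target keyed by id."""
    for row in rows:
        target[row["id"]] = row["amount"]
    return len(rows)


def run_pipeline(raw_csv: str, target: dict) -> int:
    """Run the three stages in order and return the number of rows loaded."""
    return load(transform(extract(raw_csv)), target)


if __name__ == "__main__":
    # Illustrative input only; the second row has no amount and is dropped.
    sample = "id,amount\n1,10.5\n2,\n3,4.0\n"
    warehouse: dict = {}
    loaded = run_pipeline(sample, warehouse)
    print(f"rows loaded: {loaded}")
```

Keeping the stages as pure functions over plain data structures is one possible design choice: it lets an automated check run them against a small fixture without touching a real source system or warehouse.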

Published

2022-02-28

How to Cite

Ronakkumar Bathani. (2022). Automation in Data Engineering: Implementing GitHub Actions for CI/CD in ETL Workflows. International Journal of Engineering and Management Research, 12(1), 149–155. https://doi.org/10.5281/zenodo.13994660