OCR Text Extraction

Alan Jiju; Shaun Tuscano; Chetana Badgujar

doi:10.31033/ijemr.11.2.11

Authors

Alan Jiju, Student, Department of IT, Fr. Conceicao Rodrigues Institute of Technology, Vashi, Navi Mumbai, INDIA
Shaun Tuscano, Student, Department of IT, Fr. Conceicao Rodrigues Institute of Technology, Vashi, Navi Mumbai, INDIA
Chetana Badgujar, Assistant Professor, Department of IT, Fr. Conceicao Rodrigues Institute of Technology, Vashi, Navi Mumbai, INDIA

Keywords:

OpenCV, Optical Character Reader (OCR), Tesseract, Document Detection

Abstract

This research seeks a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be put to extensive use, for example in machine learning or statistical analysis. The research focuses on extracting the final bill amount, itinerary, date and similar fields from bills and invoices, since these documents encapsulate ample information about a user's purchases, likes, dislikes and so on. Optical Character Recognition (OCR) is a technology that provides full alphanumeric recognition of printed or handwritten characters in images. First, OpenCV is used to detect the bill or invoice in the image and to filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation to extract written text in various fonts and languages. The methodology proved highly accurate when tested on a variety of input images of bills and invoices.
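The pipeline summarized in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (preprocess_bill, ocr_bill, extract_fields), the blur kernel, the use of Otsu thresholding, and the regular expressions for the total and the date are all assumptions made for the example. The document-detection step (locating the bill in the photo) is omitted; only noise filtering, OCR, and field extraction are shown. The OpenCV and pytesseract imports are placed inside the functions that need them, so the text-parsing part can be tried without those packages installed.

```python
import re


def preprocess_bill(image_path):
    """Denoise and binarize a bill photo (assumed preprocessing steps)."""
    import cv2  # requires the opencv-python package

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # A small Gaussian blur suppresses speckle noise before thresholding.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # With THRESH_OTSU the threshold is chosen automatically; the 0 is ignored.
    _, binary = cv2.threshold(
        blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary


def ocr_bill(image_path):
    """Run Tesseract on the cleaned image and return the raw text."""
    import pytesseract  # requires pytesseract and a Tesseract installation

    return pytesseract.image_to_string(preprocess_bill(image_path))


# Hypothetical patterns: a total label followed by an amount with two
# decimals, and a numeric date such as 12/03/2021 or 12-03-21.
AMOUNT_RE = re.compile(
    r"\b(?:grand\s+total|total|amount\s+due)\b\D*?(\d[\d,]*\.\d{2})",
    re.IGNORECASE,
)
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")


def extract_fields(text):
    """Pull the final bill amount and the first date out of OCR text."""
    amounts = AMOUNT_RE.findall(text)
    date = DATE_RE.search(text)
    return {
        # The last labelled amount is taken as the final bill amount.
        "total": float(amounts[-1].replace(",", "")) if amounts else None,
        "date": date.group(1) if date else None,
    }
```

For a scanned bill, extract_fields(ocr_bill("bill.jpg")) would chain the steps, where "bill.jpg" is a placeholder path. Real bills vary widely in layout, so the patterns here would need tuning per bill format.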

Published

2021-04-30

How to Cite

Alan Jiju, Shaun Tuscano, & Chetana Badgujar. (2021). OCR Text Extraction. International Journal of Engineering and Management Research, 11(2), 83–86. https://doi.org/10.31033/ijemr.11.2.11

Issue

Vol. 11 No. 2 (2021): April Issue

Section

Articles

License

Research articles in the International Journal of Engineering and Management Research are open access articles published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/. This license allows you to share (copy and redistribute the material in any medium or format) and to adapt (remix, transform, and build upon the material) for any purpose, even commercially.
