Optical Character Recognition in Forms
Optical character recognition (OCR) extracts printed or handwritten text from a scanned document or image file and converts it into machine-readable text that can be edited, searched, or processed further. Businesses use it to automate data entry.
INTRODUCTION
This blog post is about the project I worked on during my pre-final semester. In this project, I used OCR to read data from images of different forms, such as an event registration form or a bank account form, and automatically fill in the user's details on a webpage. For this project, I used the following:
- Tesseract OCR: Character Recognition
- Flask: Web development
IMPLEMENTATION
1. Python Script
Using the OpenCV library and Tesseract, I wrote a Python script that takes a form image from the local system, scans it, and outputs the data for each form field.
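A script along these lines could look like the sketch below. It is only an illustration, not the project's actual code: it assumes the `pytesseract` wrapper is used to call Tesseract, that a simple Otsu threshold is enough preprocessing, and that the form prints each field as a "Label: value" line. The function names `ocr_form` and `parse_fields` are hypothetical.

```python
import re


def ocr_form(image_path):
    """Read a form image and return the raw text Tesseract recognises."""
    # Imported here so parse_fields() can be used without the OCR stack installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding (an assumed preprocessing step) yields a black-and-white image.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def parse_fields(text):
    """Turn 'Label: value' lines into a dict (assumed form layout)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

Calling `parse_fields(ocr_form("form.png"))` would then give a dictionary of field names and values. Real forms, especially handwritten ones, usually need layout-specific handling beyond this.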
2. Flask App
The data returned by the Python script is then passed to the webpage, which fills in the details automatically. The user then only needs to review the details and submit the form.
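As a rough sketch of how the Flask side might pre-fill a page, the example below renders one input per extracted field. The route names, the `form_image` upload field, the inline template, and the `extract_fields` placeholder (which returns fixed sample data instead of running OCR) are all assumptions made for illustration.

```python
from flask import Flask, jsonify, render_template_string, request

app = Flask(__name__)

# Hypothetical review page: each input is pre-filled with an extracted value.
REVIEW_PAGE = """
<form method="post" action="/submit">
  {% for label, value in fields.items() %}
  <label>{{ label }} <input name="{{ label }}" value="{{ value }}"></label><br>
  {% endfor %}
  <button type="submit">Submit</button>
</form>
"""


def extract_fields(image_bytes):
    """Placeholder for the OCR step; returns fixed sample data here."""
    return {"Name": "Jane Doe", "Email": "jane@example.com"}


@app.route("/upload", methods=["POST"])
def upload():
    # Read the uploaded image and show the extracted data for review.
    image = request.files["form_image"]
    fields = extract_fields(image.read())
    return render_template_string(REVIEW_PAGE, fields=fields)


@app.route("/submit", methods=["POST"])
def submit():
    # The reviewed (possibly corrected) values come back as ordinary form data.
    reviewed = request.form.to_dict()
    return jsonify(reviewed)
```

Because the values arrive in normal `<input>` elements, the user can correct any OCR mistake before submitting.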
WORKING
1. Upload the form image.
2. The uploaded form is scanned and the data for all form fields is extracted.
3. The data is returned to the webpage for review.
4. Submit the form. The user's details are saved to the database.
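The final step, saving the reviewed details, could be sketched as follows. The post does not say which database the project uses, so this example assumes SQLite through Python's built-in `sqlite3` module, with a hypothetical `user_details` table holding only a name and an email.

```python
import sqlite3


def save_details(conn, details):
    """Insert one reviewed form into a hypothetical user_details table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_details ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    # Parameter placeholders keep user-supplied text out of the SQL string.
    cur = conn.execute(
        "INSERT INTO user_details (name, email) VALUES (?, ?)",
        (details.get("name"), details.get("email")),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
row_id = save_details(conn, {"name": "Jane Doe", "email": "jane@example.com"})
```

In a real deployment the connection would point at a persistent database file or server, and the columns would match the fields of the actual form.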
USE CASES
This is very helpful in banks, where officials have to enter each new customer's details into the system manually. Instead, they can scan the form the customer filled in, and every detail is entered automatically. They only have to review the details and correct anything that needs changing.
We can also use this at events where handwritten forms are filled in and the data is later entered into a system.