Optical Character Recognition in Forms

Yash Chauhan
Nov 12, 2021


Optical character recognition (OCR) is a technology for automatically extracting printed or handwritten text from a scanned document or image file and converting it into a machine-readable form that can be processed further, for example edited or searched.

INTRODUCTION

This post describes the project I worked on throughout my pre-final semester. In it, I used OCR to read data from images of different forms, such as an event registration form or a bank account opening form, and to fill in the user's details on a webpage automatically. The project uses the following tools:

  1. Tesseract OCR: character recognition
  2. OpenCV: image processing
  3. Flask: web development

IMPLEMENTATION

  1. Python Script

Using the OpenCV library and Tesseract, I wrote a Python script that takes a form image from the local system, scans the form, and outputs the data entered in each form field.
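A minimal sketch of this kind of script is shown below. It assumes the opencv-python and pytesseract packages plus a local Tesseract install, and it assumes that the recognised text contains lines such as "Name: Jane Doe". The helper names `ocr_form_image` and `parse_form_text`, the Otsu thresholding step, and the "Label: value" layout are illustrative assumptions, not the author's actual code.

```python
import re

# Hypothetical layout: each printed label is followed by a colon and the
# filled-in value on the same line, e.g. "Name: Jane Doe".
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\S.*?)\s*$")


def ocr_form_image(path):
    """Read a form image and return the raw text Tesseract recognises.

    Needs opencv-python, pytesseract and Tesseract itself; the imports live
    inside the function so parse_form_text() can be used without them.
    """
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method chooses a global threshold automatically to separate
    # ink from paper before recognition.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def parse_form_text(text):
    """Turn 'Label: value' lines into a {field_name: value} dictionary.

    Lines without a colon, or with an empty value, are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            label, value = match.groups()
            # Normalise "Date of Birth" -> "date_of_birth" for use as a form key.
            key = "_".join(label.lower().split())
            fields[key] = value
    return fields
```

Usage would be `parse_form_text(ocr_form_image("form.png"))`. Forms whose fields are boxes rather than "Label: value" lines are often handled instead by cropping fixed regions of a known template before running OCR on each region; which approach fits depends on the form layout.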

2. Flask App

The data returned by the Python script is then passed to the webpage, which fills in the details automatically. The user then only needs to review the details and submit the form.
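The project does this step in Flask, most likely through a template. As an illustration only, the sketch below builds the pre-filled review form with the standard library, so it runs without Flask; `render_review_form` and the `/submit` action are hypothetical names. Every value is HTML-escaped because OCR output is untrusted text.

```python
from html import escape


def render_review_form(fields, action="/submit"):
    """Build an HTML form whose inputs are pre-filled with the OCR results."""
    rows = []
    for name, value in fields.items():
        # "date_of_birth" -> "Date Of Birth" for a readable label.
        label = name.replace("_", " ").title()
        rows.append(
            f"  <label>{escape(label)} "
            f'<input name="{escape(name)}" value="{escape(value)}"></label><br>'
        )
    rows.append('  <button type="submit">Submit</button>')
    return (
        f'<form method="post" action="{escape(action)}">\n'
        + "\n".join(rows)
        + "\n</form>"
    )
```

In a Flask view, this string could be returned directly from the route; writing the same markup as a Jinja template rendered with `render_template` is the more common choice, since Flask enables autoescaping for `.html` templates.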

WORKING

  1. Upload the form image.
(Figure: the form to be uploaded)

2. The uploaded form is scanned, and the data from all form fields is extracted.

3. The data is then returned to the webpage for review.

(Screenshots: selecting the form image with "Choose File"; the details filled in automatically after scanning)

Finally, the user submits the form, and the details are saved to the database.
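The post does not say which database the project uses. Assuming Python's built-in sqlite3 module, a minimal sketch of the save step could look like the code below. The two-table key/value schema and the names `init_db`, `save_submission`, and `load_submission` are hypothetical; the schema lets different kinds of forms (event registration, bank account, and so on) share the same tables.

```python
import sqlite3


def init_db(conn):
    """Create the hypothetical schema if it does not exist yet."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS submissions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS submission_fields (
            submission_id INTEGER NOT NULL REFERENCES submissions(id),
            field TEXT NOT NULL,
            value TEXT NOT NULL
        );
        """
    )


def save_submission(conn, fields):
    """Store one reviewed form submission and return its id."""
    # "with conn" commits on success and rolls back if an insert fails.
    with conn:
        cur = conn.execute("INSERT INTO submissions DEFAULT VALUES")
        submission_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO submission_fields (submission_id, field, value) "
            "VALUES (?, ?, ?)",
            [(submission_id, name, value) for name, value in fields.items()],
        )
    return submission_id


def load_submission(conn, submission_id):
    """Read a stored submission back as a {field: value} dictionary."""
    rows = conn.execute(
        "SELECT field, value FROM submission_fields WHERE submission_id = ?",
        (submission_id,),
    ).fetchall()
    return dict(rows)
```

A separate table per form type, with one column per field, would be simpler to query; the key/value layout trades that for not having to change the schema when a new kind of form is added.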

USE CASES

This is especially helpful in banks, where officials otherwise have to enter each new customer's details into the system manually. Instead, they can scan the form the customer filled in and have every field filled automatically; they then only need to review the details and correct anything that needs changing.

It can also be used at events where attendees fill in handwritten forms whose data then has to be entered into a system.
