
Create a model that can extract only specific data out of receipts or invoices?

Data Science Asked on February 25, 2021

I’m trying to build a model that is capable of identifying only some of the information on receipts and invoices.

All the documents are in image format, and each has a different structure.

Sample data: Click here

I have used PyPDF2 and pytesseract to extract text from the receipts, but that just returns all the text on a receipt. I tried regular expressions, but because the document layouts differ every time, they do not work reliably in this case.
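A minimal sketch of the OCR-plus-regex baseline described above, to make its limits concrete. The patterns, the `extract_fields` and `ocr_image` helpers, and the sample text are all hypothetical illustrations, not the code actually used; `ocr_image` assumes Pillow, pytesseract, and the Tesseract binary are installed.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: they assume the label starts its own line and the
# amount follows on the same line. Real receipts often break both assumptions.
TOTAL_RE = re.compile(r"(?im)^\s*(?:grand\s+)?total\b[^\d\n]*(\d+(?:[.,]\d{2})?)")
TAX_RE = re.compile(r"(?im)^\s*(?:sales\s+)?(?:tax|vat|gst)\b[^\d\n]*(\d+(?:[.,]\d{2})?)")
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")


def ocr_image(path: str) -> str:
    """Return all text found in an image (this is the 'everything' output)."""
    # Imported lazily so the regex part can be tried without OCR installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull total, tax and date out of raw OCR text; None when not found."""

    def first(pattern: "re.Pattern[str]") -> Optional[str]:
        match = pattern.search(ocr_text)
        return match.group(1) if match else None

    return {
        "total": first(TOTAL_RE),
        "tax": first(TAX_RE),
        "date": first(DATE_RE),
    }


# Made-up receipt text standing in for OCR output.
SAMPLE = """ACME STORE
Date: 25/02/2021
Subtotal 10.00
Tax 1.20
TOTAL: $11.20
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

This works on the tidy sample but fails as soon as a receipt labels the total differently ("Amount due"), puts the value on the next line, or prints a tax rate before the tax amount, which is the brittleness described above.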

I am looking to build a model that returns only certain fields from a receipt, such as the total price, date, and tax.

I could extract the values by hard-coding parsing rules, but I do not think that is optimal. Is there a way to build a model for this use case that identifies the required fields and captures their values? I am looking for a starting point for this project.
