TransWikia.com

Which object detection model gives the best results on text images when speed is not a concern?

Cross Validated Asked on November 21, 2021

I want to develop a model that crops the equations out of maths questions, because people like me spend a lot of time doing this manually for research purposes. Is this possible? And if so, which of the available object detection models will produce the best results on text images?

The candidates I know of are TensorFlow's Object Detection API, R-CNN, Fast R-CNN, Faster R-CNN, and YOLO (v1 to v5).

If there are any others, please suggest them. What I want to do is detect the equation regions, which are marked in gray in this image.

[Image: example question with the equation regions highlighted in gray]

Note: The gray region shown in the image is just for demonstration. My actual images are simple cropped questions from books, with a white background and black letters (in most of the books).

One Answer

Note that there are two problems in this case: segmentation and classification. A neural net might be a solution for both steps, because you can easily generate zillions of labelled test images. Nevertheless, a classic approach should yield comparable results with much less effort:

  1. Use a simple page segmentation algorithm like runlength smearing or bounding box merging for segmenting the image into regions.
  2. Classify each region with an arbitrary classifier. You can use a NN on all normalized input pixels for this, but other classifiers like kNN should also work with gradient histograms as features (the gradients are computed on quasi-grayscale images, which are generated from the onebit images by blurring). Gradient histograms were the state-of-the-art features before the renaissance of neural nets.
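To make step 2 more concrete, here is a minimal plain-Python sketch of gradient-histogram features combined with a kNN vote. It is only an illustration under stated assumptions: the function names (`box_blur`, `gradient_histogram`, `knn_classify`, `stripes`), the 3x3 mean blur, the 8 orientation bins, and the striped toy patterns are all hypothetical choices, not something prescribed above. In a real setup the training list would hold features of labelled region crops (for example "text" and "equation") rather than synthetic stripes.

```python
import math
from collections import Counter

def box_blur(img):
    """3x3 mean blur of a 0/1 image (list of rows); gives quasi-grayscale values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def gradient_histogram(img, bins=8):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    gray = box_blur(img)
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def knn_classify(sample, training, k=3):
    """training: list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def stripes(h, w, thickness, vertical):
    """Toy stand-in for a labelled region: a striped 0/1 image."""
    return [[1 if ((x if vertical else y) // thickness) % 2 == 0 else 0
             for x in range(w)] for y in range(h)]

# Hypothetical training set; real data would be labelled crops from the pages.
training = [
    (gradient_histogram(stripes(8, 8, 2, vertical=False)), "horizontal"),
    (gradient_histogram(stripes(12, 10, 3, vertical=False)), "horizontal"),
    (gradient_histogram(stripes(8, 8, 2, vertical=True)), "vertical"),
    (gradient_histogram(stripes(10, 12, 3, vertical=True)), "vertical"),
]

query = stripes(9, 9, 2, vertical=True)
print(knn_classify(gradient_histogram(query), training))
```

Because the histogram is normalized, regions of different sizes yield comparable feature vectors, which is what lets a plain distance-based classifier such as kNN work without resizing the crops.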

Out of curiosity, I have tried out step one with the Python library Gamera (gamera.sf.net), using the following code:

from gamera.core import *
init_gamera()

img = load_image("MathExpressionInputExample.png")
img = img.to_onebit()

img.remove_border()
segments = img.runlength_smearing()

# now you could process each segment (e.g. saving it to a file)
for seg in segments:
    pass  # placeholder: do some stuff with seg here

# visualize the result
color_ccs = img.graph_color_ccs(segments)
color_ccs.save_PNG("segments.png")

The result looks reasonable to me (note that the colors only indicate the segmentation, with adjacent segments having different colors):

Segmentation result
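For readers who want to see the idea behind runlength smearing without installing Gamera, the following is a simplified plain-Python sketch of the classic scheme: smear short white runs row-wise, smear them column-wise, AND the two results, then apply a final small row-wise smoothing and take the connected components as regions. This is not Gamera's implementation; the function names (`smear_rows`, `rlsa`, `bounding_boxes`), the thresholds, and the tiny toy page are assumptions made for illustration only.

```python
def smear_rows(img, max_gap):
    """Fill white runs of length <= max_gap that lie between two black pixels in a row."""
    out = [row[:] for row in img]
    for row in out:
        last_black = None
        for x, v in enumerate(row):
            if v:
                if last_black is not None and 0 < x - last_black - 1 <= max_gap:
                    for i in range(last_black + 1, x):
                        row[i] = 1
                last_black = x
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def rlsa(img, gap_x, gap_y, gap_smooth):
    """Row-wise smear AND column-wise smear, followed by a final row-wise smoothing."""
    horiz = smear_rows(img, gap_x)
    vert = transpose(smear_rows(transpose(img), gap_y))
    both = [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(horiz, vert)]
    return smear_rows(both, gap_smooth)

def bounding_boxes(mask):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of the 4-connected black components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack = [(x, y)]
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while stack:
                    cx, cy = stack.pop()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Toy onebit "page": '#' is black, '.' is white (hypothetical data).
rows = [
    "####...####",
    "#..#...#..#",
    "####...####",
    "...........",
    "...........",
    "#.#.#.#.#.#",
]
page = [[1 if c == "#" else 0 for c in line] for line in rows]

smeared = rlsa(page, gap_x=3, gap_y=1, gap_smooth=1)
for box in bounding_boxes(smeared):
    print(box)
```

In this toy page the row-wise smear alone would bridge the three-pixel gutter between the two upper blocks, but the gutter columns contain no black pixels, so the column-wise smear leaves them white and the AND keeps the blocks apart. The thresholds decide which gaps count as "inside a region", so on real scans they would need to be tuned to the character and line spacing.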

Answered by cdalitz on November 21, 2021
