TransWikia.com

Classifiers for Page Numbers Sequences

Data Science Asked by siya m on August 19, 2020

I have a dataset with the following columns: document ID, page numbers, and label.

documentid    pagenumbers          label
document1     1 23 26 45 48 76     fiction
document2     22 34 56 67          mystery
document3     61 78 82 99          science
document4     12 32                mystery

To explain the table:
Row 1 holds the data for document 1: it runs from page 1 to 23, resumes from page 26 to 45, then continues from page 48 to 76, where it ends. Document 1 contains stories belonging to the class fiction.

Similarly, row 2 holds the data for document 2: it runs from page 22 to 34 and then resumes from page 56 to 67. Document 2 contains stories belonging to the class mystery, and so on. The aim is to develop a classifier that can assign a document to a category (fiction, mystery, science) based on its page numbers.
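To make the structure concrete, here is a minimal sketch of how one row could be turned into (start, end) page ranges. It assumes the page numbers always come in start/end pairs, as in the rows described above; the name `parse_segments` is just a hypothetical helper for illustration.

```python
from typing import Dict, List, Tuple


def parse_segments(pagenumbers: str) -> List[Tuple[int, int]]:
    """Split a whitespace-separated page-number string into (start, end) pairs.

    Assumption: the numbers alternate start, end, start, end, ...
    """
    pages = [int(tok) for tok in pagenumbers.split()]
    if len(pages) % 2 != 0:
        raise ValueError(f"expected an even number of page numbers, got {len(pages)}")
    # Pair every even-indexed number (start) with the one after it (end).
    return list(zip(pages[0::2], pages[1::2]))


# The first three rows of the table, keyed by document ID.
rows: Dict[str, str] = {
    "document1": "1 23 26 45 48 76",
    "document2": "22 34 56 67",
    "document3": "61 78  82 99",
}
segments = {doc: parse_segments(p) for doc, p in rows.items()}
```

With this, `segments["document1"]` holds the three ranges 1 to 23, 26 to 45, and 48 to 76.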

I am looking for advice on what kind of classifiers could be used to classify a page-number series. This isn't a time series, so I am unsure whether I need complex models such as RNNs or LSTMs. Are there simpler models that can use a series such as page numbers as features?
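As a rough illustration of what "simpler" could mean here: each variable-length series could be reduced to a fixed-length set of summary features, which ordinary tabular classifiers (logistic regression, random forests, and so on) can consume. The sketch below is only one possible choice; the feature names, the inclusive page count, and the helper `summary_features` are all assumptions, and whether such features carry any signal for the label is untested.

```python
from statistics import mean
from typing import Dict, List


def summary_features(pages: List[int]) -> Dict[str, float]:
    """Reduce a start/end page series to a fixed-length feature dict.

    Assumption: pages alternate start, end, start, end, ...
    Assumption: a segment from s to e covers e - s + 1 pages (inclusive).
    """
    starts, ends = pages[0::2], pages[1::2]
    lengths = [e - s + 1 for s, e in zip(starts, ends)]
    # Pages skipped between the end of one segment and the start of the next.
    gaps = [s - e - 1 for e, s in zip(ends[:-1], starts[1:])]
    return {
        "n_segments": len(lengths),
        "first_page": starts[0],
        "last_page": ends[-1],
        "total_pages": sum(lengths),
        "mean_segment_length": mean(lengths),
        "total_gap": sum(gaps),
    }


# Document 1 from the table.
features = summary_features([1, 23, 26, 45, 48, 76])
```

Every document then yields the same six columns regardless of how many segments it has, so no sequence model is needed to handle the varying length.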

Because I was considering sequence classifiers, I am also thinking of padding the page-number sequences so that they all have the same length. Is this required?
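For reference, the padding I have in mind would look roughly like the sketch below. The helper `pad_sequences` is a hypothetical plain-Python function written for illustration, and the pad value 0 is an assumption that only works as a sentinel because no page number in the sample is 0.

```python
from typing import List


def pad_sequences(seqs: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


# The first three rows of the table as integer lists.
padded = pad_sequences([
    [1, 23, 26, 45, 48, 76],
    [22, 34, 56, 67],
    [61, 78, 82, 99],
])
```

Padding of this kind is typically needed when sequences are batched into a fixed-shape array for a sequence model; a model that is fed fixed-length summary statistics would not need it.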

Are there any graph-traversal-based ML algorithms that could be used? NetworkX provides features such as PageRank and centrality measures, and it would be interesting to explore such graph features.
I am looking for tips on any sequence-classifier libraries, or on NetworkX features, that might be useful for this problem. Any input would be helpful.
