Classifiers for Page Number Sequences

Data Science Asked by siya m on August 19, 2020

I have a dataset with the following columns: document id, page numbers, and label.

documentid    pagenumbers          label
document1     1 23 26 45 48 76     fiction
document2     22 34 56 67          mystery
document3     61 78  82 99         science
document      4 12 32              mystery

To explain the table above:
Row 1 holds the data for document 1: it runs from page 1 to 23, resumes from page 26 to 45, then continues from page 48 to 76, where it ends. Document 1 contains stories belonging to the class fiction.

Similarly, row 2 holds the data for document 2: it runs from page 22 to 34 and then resumes from page 56 to 67. Document 2 contains stories belonging to the class mystery, and so on. The aim is to develop a classifier that assigns a document to a category (fiction, mystery, science) based on its page numbers.

I am looking for advice on what kinds of classifiers could be used to classify a page number series. This isn't a time series, so I am unsure whether I need complex models such as RNNs or LSTMs. Are there simpler models that can use a series of values such as page numbers as features?
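To make the "page numbers as features" idea concrete, here is a minimal sketch that turns one page number string into a fixed set of summary features. It assumes, following the explanation of row 1, that the numbers come in start/end pairs, so every row must contain an even count of page numbers. The function names and the particular features chosen are illustrative only, not an established method.

```python
from statistics import mean


def parse_segments(pagenumbers):
    """Turn '1 23 26 45 48 76' into [(1, 23), (26, 45), (48, 76)].

    Assumption: the numbers are consecutive (start, end) pairs.
    """
    pages = [int(p) for p in pagenumbers.split()]
    if len(pages) % 2 != 0:
        raise ValueError("expected an even number of page numbers (start/end pairs)")
    return list(zip(pages[0::2], pages[1::2]))


def segment_features(pagenumbers):
    """Summarise a variable-length page sequence as fixed-length features."""
    segments = parse_segments(pagenumbers)
    # Number of pages covered by each segment, counting both endpoints.
    lengths = [end - start + 1 for start, end in segments]
    # Number of pages skipped between one segment and the next.
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(segments, segments[1:])]
    return {
        "n_segments": len(segments),
        "first_page": segments[0][0],
        "last_page": segments[-1][1],
        "total_pages": sum(lengths),
        "mean_segment_length": mean(lengths),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


features = segment_features("1 23 26 45 48 76")
print(features)
```

Because every document then yields the same number of features regardless of how many segments it has, the resulting rows could be passed to an ordinary tabular classifier without any sequence model.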

One thing I am also considering is padding the page number sequences so that they all have equal length, since I was thinking of using sequence classifiers. Is this required?
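For reference, the padding I have in mind would look roughly like the sketch below. Padding of this kind is typically only needed when a model expects fixed-length input, for example when sequences are batched for an RNN or LSTM. The helper name `pad_sequences` and the pad value 0 are my own assumptions; 0 is chosen because it does not occur as a page number in the sample data.

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


rows = ["1 23 26 45 48 76", "22 34 56 67", "61 78 82 99"]
sequences = [[int(p) for p in row.split()] for row in rows]
padded = pad_sequences(sequences)

for seq in padded:
    print(seq)
```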

Are there any graph-traversal-based ML algorithms that could be used? NetworkX provides features such as PageRank and centrality, and it would be interesting to explore such graph features. I am looking for tips on any sequence classifier libraries, or NetworkX features, that might be useful for my problem. Any input would be helpful.
