# Classifiers for Page Number Sequences

Data Science Asked by siya m on August 19, 2020

I have a dataset with the following columns: document id, page numbers, and label.

| documentid | pagenumbers | label |
|---|---|---|
| document1 | 1 23 26 45 48 76 | fiction |
| document2 | 22 34 56 67 | mystery |
| document3 | 61 78 82 99 | science |
| document | 4 12 32 | mystery |


Here is how to read the table:
Row 1 holds the data for document 1: it runs from page 1 to page 23, resumes from page 26 to 45, then continues from page 48 to 76, where it ends. Document 1 contains stories belonging to the class fiction.

Similarly, row 2 holds the data for document 2: it runs from page 22 to 34 and then resumes from page 56 to 67. Document 2 contains stories belonging to the class mystery, and so on. The aim is to develop a classifier that assigns a document to a category (fiction, mystery, science) based on its page numbers.
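To make the row format concrete, here is a minimal sketch of how a page-number string could be read as start/end pairs and summarized into a fixed-length feature set. The function names `parse_ranges` and `range_features`, the choice of features, and the inclusive page count are all assumptions for illustration, not part of the dataset.

```python
def parse_ranges(pagenumbers):
    """Turn '1 23 26 45' into [(1, 23), (26, 45)], assuming start/end pairs."""
    pages = [int(tok) for tok in pagenumbers.split()]
    if len(pages) % 2 != 0:
        raise ValueError("expected an even number of page numbers")
    return list(zip(pages[0::2], pages[1::2]))


def range_features(ranges):
    """Hypothetical fixed-length summary of a non-empty list of ranges."""
    # Assumption: both the start and end page belong to the segment.
    lengths = [end - start + 1 for start, end in ranges]
    # Gap between the end of one segment and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ranges, ranges[1:])]
    return {
        "n_segments": len(ranges),
        "total_pages": sum(lengths),
        "first_page": ranges[0][0],
        "last_page": ranges[-1][1],
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# The first three rows of the table above.
rows = {
    "document1": "1 23 26 45 48 76",
    "document2": "22 34 56 67",
    "document3": "61 78  82 99",
}
features = {doc: range_features(parse_ranges(p)) for doc, p in rows.items()}
```

Because every document ends up with the same number of features regardless of how many segments it has, a summary like this could be fed to an ordinary tabular classifier without any sequence model.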

I am looking for advice on what kind of classifiers could be used to classify a series of page numbers. This isn't a time series, so I am a bit confused about whether I need complex models such as RNNs or LSTMs. Are there simpler models that can use a series of values such as page numbers as features?

One thing I am also considering is padding the page-number sequences so that they all have equal length, since I was thinking of using sequence classifiers. Is this required?
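The padding idea can be sketched as follows. The helper `pad_sequences` is a hypothetical name written here in plain Python, and the pad value 0 is an assumption, chosen only because it is not a valid page number in the sample rows.

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


# The first three rows of the table, as raw page-number strings.
raw = ["1 23 26 45 48 76", "22 34 56 67", "61 78  82 99"]
sequences = [[int(tok) for tok in row.split()] for row in raw]
padded = pad_sequences(sequences)

for row in padded:
    print(row)
```

After padding, every row has six entries, so the rows can be stacked into a single rectangular array of the kind most classifiers expect.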

Are there any graph-traversal-based ML algorithms that could be used? NetworkX provides features such as PageRank and centrality, and it would be interesting to explore such graph features. Any input would be helpful.
I am also looking for tips on sequence classifier libraries or NetworkX features that might be useful for my problem.
