AnswerBun.com

Looking for feedback on my approach to splitting data into validation and test sets

Cross Validated Asked by s_am on September 22, 2020

I want to perform novelty detection with a one-class SVM (OCSVM) to detect network attacks, and I want to know how to split my dataset into validation and test sets. The dataset consists of five files, each generated on a different day. Every file contains normal data instances and one type of attack, except the first file, which contains only normal data instances.

Since I am doing novelty detection, I will train my model on the instances in the first file (normal data only). For the validation and test sets, I am thinking of splitting the data in each of the remaining four files into a validation part and a test part. I would then merge all the validation parts into one big file, do the same for the test parts, and order each merged file by date and time. Afterwards, I would convert all attack types into a single label (1) and the normal instances into (0).
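To make the procedure above concrete, here is a minimal sketch of it in plain Python. It is only an illustration under stated assumptions: the row layout (`timestamp`, `features`, `label`), the label string `"normal"`, the helper names, and the toy data are all hypothetical, and the per-class (stratified) split is one possible choice rather than something the plan above prescribes.

```python
import random
from datetime import datetime

def split_file(rows, val_fraction=0.5, seed=0):
    """Split one day's rows into (validation, test), separately per label."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    val, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * val_fraction)
        # With 2+ rows in a class, keep at least one row on each side.
        if len(group) >= 2:
            cut = min(max(cut, 1), len(group) - 1)
        val.extend(group[:cut])
        test.extend(group[cut:])
    return val, test

def finalize(rows):
    """Order rows by timestamp and map labels to 0 (normal) / 1 (any attack)."""
    ordered = sorted(rows, key=lambda r: r["timestamp"])
    return [
        {"timestamp": r["timestamp"], "features": r["features"],
         "label": 0 if r["label"] == "normal" else 1}
        for r in ordered
    ]

def build_sets(files, val_fraction=0.5, seed=0):
    """Split every file, merge the parts, then sort and binarize each merge."""
    val_all, test_all = [], []
    for rows in files:
        val, test = split_file(rows, val_fraction, seed)
        val_all.extend(val)
        test_all.extend(test)
    return finalize(val_all), finalize(test_all)

def make_rows(day, n_normal, n_attack, attack_name):
    """Hypothetical toy data: one row per minute, attacks one hour later."""
    rows = [{"timestamp": datetime(2020, 1, day, 0, i), "features": [i],
             "label": "normal"} for i in range(n_normal)]
    rows += [{"timestamp": datetime(2020, 1, day, 1, i), "features": [i],
              "label": attack_name} for i in range(n_attack)]
    return rows

# Two made-up "days", standing in for the four files that contain attacks.
files = [make_rows(2, 10, 2, "dos"), make_rows(3, 8, 4, "portscan")]
val_set, test_set = build_sets(files)
print(len(val_set), len(test_set))
```

Splitting per label inside each file means every attack type with at least two instances appears in both sets, which a purely random split of a file with very few attacks would not guarantee.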

I should mention that the number of attack instances differs from one file to another; for example, one file has only 11 attack instances and about 700,000 normal instances.
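As a rough illustration of why this imbalance matters for the split percentage, the arithmetic below shows how 11 attack instances would be divided for a given validation fraction, and how coarse recall becomes with so few positives. The function names and the fractions tried are assumptions chosen for the example, not recommendations.

```python
def attacks_per_split(n_attacks, val_fraction):
    """Return (validation, test) attack counts for one file."""
    n_val = int(n_attacks * val_fraction)  # round down
    return n_val, n_attacks - n_val

def recall_step(n_attacks_in_set):
    """Smallest possible change in recall when one more attack is missed."""
    return 1 / n_attacks_in_set

for fraction in (0.2, 0.5):
    n_val, n_test = attacks_per_split(11, fraction)
    print(fraction, n_val, n_test, recall_step(n_val), recall_step(n_test))
```

With only a handful of attacks on one side, a single missed or detected instance moves recall for that file by a large step, so any percentage has to be weighed against how many attack instances it leaves in each set.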

Is this a good approach to splitting the dataset? If so, what percentage should I use for the validation and test sets? Or is there a better way to do it?
