
Looking for feedback on my approach to splitting data into validation and test sets

Cross Validated. Asked by s_am on September 22, 2020

I want to perform novelty detection with a one-class SVM (OCSVM) to detect network attacks, and I want to know how to split my dataset into validation and test sets. The dataset consists of five files, each generated on a different day. Every file contains normal instances plus one type of attack, except the first file, which contains only normal instances.

Since I am doing novelty detection, I will train my model on the instances in the first file (which contains only normal data). For validation and testing, I am thinking of splitting each of the remaining four files into a validation part and a test part. I would then merge all the validation parts into one large file, do the same for the test parts, and sort both by date and time. Finally, I would map every attack type to a single label (1) and normal instances to (0).
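The procedure above can be sketched as follows. This is a minimal sketch, not a prescribed method: it assumes each record is a dict with a `timestamp` (a `datetime`) and a string `label`, that normal traffic is labelled with the hypothetical value `"BENIGN"`, and that each day is split chronologically (the earliest share of the day goes to validation). The question does not say how the per-file split is made, so the chronological cut is itself an assumption; the attack labels in the usage part are made up.

```python
from datetime import datetime

NORMAL = "BENIGN"  # hypothetical label used for normal traffic


def split_day_chronologically(records, val_frac):
    """Split one day's records: the earliest val_frac share goes to validation, the rest to test."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * val_frac)
    return ordered[:cut], ordered[cut:]


def to_binary(record):
    """Return a copy of the record with every attack type mapped to 1 and normal traffic to 0."""
    return {**record, "label": 0 if record["label"] == NORMAL else 1}


def build_val_test(attack_days, val_frac=0.5):
    """attack_days: per-day record lists for days 2-5 (day 1 is kept for training)."""
    val, test = [], []
    for records in attack_days:
        day_val, day_test = split_day_chronologically(records, val_frac)
        val.extend(day_val)
        test.extend(day_test)
    # Merge step: order the combined sets by date and time.
    val.sort(key=lambda r: r["timestamp"])
    test.sort(key=lambda r: r["timestamp"])
    return [to_binary(r) for r in val], [to_binary(r) for r in test]


# Tiny usage example with made-up records (two days, four records each).
def make_day(day, labels):
    return [{"timestamp": datetime(2020, 1, day, hour), "label": lab}
            for hour, lab in enumerate(labels)]


day2 = make_day(2, ["BENIGN", "BENIGN", "DoS", "BENIGN"])
day3 = make_day(3, ["PortScan", "BENIGN", "BENIGN", "BENIGN"])
val, test = build_val_test([day2, day3], val_frac=0.5)
print([r["label"] for r in val])   # [0, 0, 1, 0]
print([r["label"] for r in test])  # [1, 0, 0, 0]
```

One caveat with a chronological cut: if a day's attack happened in a narrow time window, all of that day's attacks can end up on one side of the split, so it is worth counting attacks per file in both parts after splitting.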

Note that the number of attack instances differs considerably from one file to another; for example, one file has only 11 attack instances and about 700,000 normal instances.
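With only 11 attacks in a file, an unlucky split can leave the validation or test part with almost none of them. One option is a per-file stratified split, sketched below under stated assumptions: records are dicts with a string `label`, normal traffic uses the hypothetical label `"BENIGN"`, and the "at least one attack on each side" rule is a hypothetical choice added for illustration. The usage part uses a scaled-down, made-up file (7,000 normal records instead of about 700,000) with a made-up attack label.

```python
import random

NORMAL = "BENIGN"  # hypothetical label used for normal traffic


def stratified_split(records, val_frac=0.5, seed=0):
    """Split one file so that validation and test each keep part of its normal and attack records.

    The attack share for validation is rounded down, but while the file has at
    least two attacks both sides are guaranteed at least one (a hypothetical rule).
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    normal = [r for r in records if r["label"] == NORMAL]
    attack = [r for r in records if r["label"] != NORMAL]
    rng.shuffle(normal)
    rng.shuffle(attack)

    n_val_normal = int(len(normal) * val_frac)
    n_val_attack = int(len(attack) * val_frac)
    if len(attack) >= 2:
        n_val_attack = min(max(n_val_attack, 1), len(attack) - 1)

    val = normal[:n_val_normal] + attack[:n_val_attack]
    test = normal[n_val_normal:] + attack[n_val_attack:]
    return val, test


def count_attacks(records):
    """Number of records whose label is not the normal label."""
    return sum(r["label"] != NORMAL for r in records)


# Scaled-down, made-up file: 11 attacks among 7,000 normal records.
records = [{"label": NORMAL} for _ in range(7000)] + [{"label": "DoS"} for _ in range(11)]
val, test = stratified_split(records, val_frac=0.5)
print(count_attacks(val), count_attacks(test))  # 5 6
```

The trade-off is that shuffling ignores time, so records from the same attack burst can land in both validation and test, which may make test scores look optimistic. scikit-learn's `train_test_split` with `stratify=` gives a proportional stratified split, but without the at-least-one rule used here.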

Is this a good way to split the dataset? If so, what percentages should I use for the validation and test sets? Or is there a better way to do it?
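Whatever percentages are chosen, the usual division of labour is that decisions (OCSVM's `nu` and `gamma`, or a score threshold) are made on the validation set only, and the test set is scored once with those decisions frozen. The sketch below illustrates this for a threshold. It assumes anomaly scores where higher means more anomalous, for example the negated output of scikit-learn's `OneClassSVM.decision_function`, which is positive for inliers and negative for outliers; the 0/1 labels follow the question's convention. Using F1 on the attack class as the selection criterion is an assumption, and the scores in the usage part are made up.

```python
def f1_at(scores, labels, threshold):
    """F1 for the attack class (label 1) when score >= threshold is flagged as an attack."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(val_scores, val_labels):
    """Choose the candidate threshold that maximizes F1 on the validation set only."""
    candidates = sorted(set(val_scores))
    return max(candidates, key=lambda t: f1_at(val_scores, val_labels, t))


# Made-up anomaly scores: the threshold is chosen on validation, then applied once to test.
val_scores = [0.1, 0.2, 0.9, 0.3, 0.8]
val_labels = [0, 0, 1, 0, 1]
threshold = pick_threshold(val_scores, val_labels)

test_scores = [0.05, 0.85, 0.7, 0.95]
test_labels = [0, 1, 1, 0]
print(threshold, f1_at(test_scores, test_labels, threshold))  # 0.8 0.5
```

With as few as 11 attacks in some files, metrics like this one are noisy on either set, which is one argument for keeping enough attacks on both sides when choosing the split percentages.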

