Data simulation using make_classification in Python

Question

I have a question about data simulation in Python. I work on the classification of imbalanced data and want to test the effectiveness of different methods on simulated data. Various articles and books use the make_classification function to generate such data. In that case the data is drawn from a normal distribution, so it is continuous rather than discrete. Is such data appropriate for classification research (SVM, decision trees)?

tkarahan · Answer

Nothing prevents you from doing this. For example, you can create data with make_classification and compare different algorithms by building models on it. You can also pass a random_state value to get the same data each time you call the function. Both SVMs and decision trees work with continuous data.
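A minimal sketch of that workflow, assuming scikit-learn is installed. The sample size, feature counts, 90/10 class weights, seed, and the choice of balanced accuracy as the metric are illustrative assumptions, not values from the question:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Simulate an imbalanced binary problem (roughly 90% class 0, 10% class 1).
# All sizes and the seed are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    weights=[0.9, 0.1],
    random_state=42,  # fixed seed: the same data on every call
)

# The features are continuous floats, which both models accept directly.
print(X.dtype, X.shape)

# Stratify so the train and test sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "SVM": SVC(class_weight="balanced", random_state=42),
    "Decision tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
}

# Fit each model and score it with a metric that accounts for imbalance.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = balanced_accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: balanced accuracy = {scores[name]:.3f}")
```

Plain accuracy can look good on imbalanced data even when the minority class is ignored, which is why this sketch reports balanced accuracy instead. Any other imbalance-aware metric would fit the same loop.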

Answered by tkarahan on July 3, 2021
