TransWikia.com

What is the best machine learning algorithm for large, noisy datasets with interactions between variables?

Data Science Asked on March 28, 2021

My initial thought was a neural network, but how can a neural network properly capture interactions between variables (e.g., x1 * x2) if each node is just a sum of the previous inputs?
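A worked identity shows why the "just a sum" intuition misses something. Each hidden node applies a nonlinear activation to its weighted sum, and a product can be built from sums passed through a nonlinearity. The squaring activation below is an assumption chosen purely for illustration: two hidden units and a linear output layer reproduce the interaction exactly. Common activations such as ReLU are piecewise linear, so they can only approximate a product, but the principle is the same.

```latex
\begin{aligned}
h_1 &= (x_1 + x_2)^2, \qquad h_2 = (x_1 - x_2)^2 \\
\hat{y} &= \tfrac{1}{4}\,h_1 - \tfrac{1}{4}\,h_2
        = \tfrac{1}{4}\left[(x_1^2 + 2x_1x_2 + x_2^2) - (x_1^2 - 2x_1x_2 + x_2^2)\right]
        = x_1 x_2
\end{aligned}
```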

Would a decision tree be better suited to capturing interactions between variables?
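To make the question concrete, here is a small scikit-learn sketch on hypothetical synthetic data (not the asker's dataset). The label depends only on a centered interaction (x1 - 0.5) * (x2 - 0.5) plus noise. It compares a plain linear model, a linear model given an explicit product column, and a single decision tree. The data-generating function, noise level, and tree hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(n=5000, noise=0.02, seed=0):
    """Fit three models to a synthetic label driven purely by an x1 * x2 interaction."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for percentile features: uniform on [0, 1].
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    # Centered interaction with no main effect: an additive linear model has nothing to fit.
    y = (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + rng.normal(0.0, noise, size=n)

    # Same inputs plus an explicit product column (a hand-made interaction feature).
    X_prod = np.column_stack([X, X[:, 0] * X[:, 1]])

    idx_train, idx_test = train_test_split(np.arange(n), test_size=0.2, random_state=seed)
    results = {}
    for name, model, features in [
        ("linear", LinearRegression(), X),
        ("linear_with_product", LinearRegression(), X_prod),
        ("tree", DecisionTreeRegressor(max_depth=8, min_samples_leaf=20, random_state=seed), X),
    ]:
        model.fit(features[idx_train], y[idx_train])
        results[name] = r2_score(y[idx_test], model.predict(features[idx_test]))
    return results


results = compare_models()
for name, score in results.items():
    print(f"{name:>20}: test R^2 = {score:.3f}")
```

The plain linear model scores near zero, while both the tree (which splits the feature space into regions) and the linear model with a product column recover most of the signal. A tree finds interactions on its own; a linear model needs them supplied as features.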

My dataset is large, with 400 features and 5,000,000 instances. All features are percentiles, and the label is also a percentile. The dataset is also quite noisy (customer data, predicting the likelihood of becoming a return customer).

3 Answers

Probabilistic Random Forest, a random forest variant that treats features and labels as probability distributions rather than fixed values, tends to work better than other algorithms on noisy datasets. But the data you are using also plays a major role in whether an algorithm will work. See the paper "Probabilistic Random Forest" for more details. Happy learning!

Answered by Shiv on March 28, 2021

Ensemble methods, whether boosting or bagging, often achieve better predictive accuracy than other methods. In my experience, GBM (i.e., a Gradient Boosting Regressor over decision trees) and LightGBM (which is faster) often give very accurate predictions.

Check out this diagram on choosing the right estimator.

Answered by Chong Lip Phang on March 28, 2021

I would make the following models:

  1. a null baseline model
  2. a linear regression model with the features most highly correlated with the label
  3. a linear regression model on polynomial features, after feature selection to keep only the top 10 or 20
  4. the same as #3, but with ridge regression
  5. a LightGBM model with the original features
  6. If you think you can still squeeze out some performance and it's worth the time/effort tradeoff, move to neural nets. As long as the network has a few layers, a decent number of nodes, and a non-linear activation (e.g., ReLU), it should be able to pick up interactions.

If something looks promising, go that direction.

Answered by jeffhale on March 28, 2021
