AnswerBun.com

Does having too many or too few training samples of a specific type hamper a neural network model?

I am analysing "Sherlock", a technique for detecting the semantic type of a column. In its training dataset, types with too many samples are capped at 15K per class, and types with fewer than 1K samples per class are excluded. What is the reason behind this? What are the disadvantages of having too many or too few samples of a specific type in the input of a neural network?

Data Science Asked on November 30, 2021

One Answer

Theoretically speaking, there is no inherent disadvantage to having a lot of data or very little data for a given class. The effect only shows up in the overall performance of your model. Based on the Sherlock paper, it seems that this was a choice they made for their preprocessing. This is their explanation:

"Certain types occur more frequently in the VizNet corpus than others. For example, description and city are more common than collection and continent. To address this heterogeneity, we limited the number of columns to at most 15K per class and excluded the 10% types containing less than 1K columns."

They did this to reduce the overall imbalance of their dataset.
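To make the preprocessing concrete, here is a minimal sketch of that kind of filtering. It is not the Sherlock authors' code: the function name `balance_by_class`, the `(column, label)` input format, and the use of random downsampling for the cap are all assumptions of mine, since the quoted passage only states the two thresholds.

```python
import random
from collections import Counter, defaultdict


def balance_by_class(samples, max_per_class=15_000, min_per_class=1_000, seed=0):
    """Cap frequent classes and drop rare ones (hypothetical helper).

    samples: iterable of (column, label) pairs.
    Returns a list of (column, label) pairs.
    """
    rng = random.Random(seed)

    # Group the columns by their semantic type label.
    by_label = defaultdict(list)
    for column, label in samples:
        by_label[label].append(column)

    balanced = []
    for label, columns in by_label.items():
        if len(columns) < min_per_class:
            continue  # too few examples: exclude the whole class
        if len(columns) > max_per_class:
            # Assumption: the cap is applied by random downsampling.
            columns = rng.sample(columns, max_per_class)
        balanced.extend((column, label) for column in columns)
    return balanced


# Toy data with scaled-down thresholds so the effect is easy to see.
toy = (
    [(f"c{i}", "city") for i in range(50)]
    + [(f"d{i}", "description") for i in range(12)]
    + [(f"k{i}", "continent") for i in range(3)]
)
result = balance_by_class(toy, max_per_class=20, min_per_class=5)
counts = Counter(label for _, label in result)
print(counts)
```

With the toy thresholds, the frequent class is cut down to the cap, the mid-sized class is kept whole, and the rare class disappears, which is the same shape of result the 15K and 1K limits would produce on the real corpus.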

Answered by Valentin Calomme on November 30, 2021
