TransWikia.com

Do too many or too few training samples of a specific feature hamper a neural network model?

Data Science Asked by Zannatul Ferdaus on August 26, 2021

I am analysing "Sherlock", a technique for detecting the semantic type of a column. In its training dataset, types with too many samples are capped at 15K per class, and types with fewer than 1K samples per class are excluded. What is the reason behind this? What are the disadvantages of having too many or too few samples of a specific type in the input of a neural network?

One Answer

Theoretically speaking, there is no inherent disadvantage to having a lot of data or very little; the amount you have simply shows up in the overall performance of your model. Based on the Sherlock paper, it seems that this was a choice they made for their preprocessing. This is their explanation:

Certain types occur more frequently in the VizNet corpus than others. For example, description and city are more common than collection and continent. To address this heterogeneity, we limited the number of columns to at most 15K per class and excluded the 10% types containing less than 1K columns

They did this to reduce the overall imbalance of their dataset.
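To make the quoted preprocessing step concrete, here is a minimal Python sketch of capping large classes and dropping small ones. It is not the Sherlock authors' code: the function name `balance_by_class`, the `(item, label)` input format, and the use of random subsampling to enforce the cap are all assumptions made for illustration. Only the 15K and 1K thresholds come from the paper.

```python
import random
from collections import defaultdict


def balance_by_class(samples, max_per_class=15_000, min_per_class=1_000, seed=0):
    """Cap large classes and drop small ones (hypothetical helper).

    samples: iterable of (item, label) pairs.
    Returns a list of (item, label) pairs.
    """
    # Group items by their class label.
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for label, items in by_label.items():
        if len(items) < min_per_class:
            continue  # exclude classes with fewer than min_per_class samples
        if len(items) > max_per_class:
            # Assumption: the cap is enforced by uniform random subsampling.
            items = rng.sample(items, max_per_class)
        balanced.extend((item, label) for item in items)
    return balanced


# Toy usage with small thresholds so the effect is easy to see.
toy = (
    [(i, "city") for i in range(50)]
    + [(i, "continent") for i in range(3)]
    + [(i, "name") for i in range(10)]
)
toy_balanced = balance_by_class(toy, max_per_class=20, min_per_class=5)
```

In the toy call, "city" is cut down to the cap, "continent" is dropped for falling under the minimum, and "name" passes through unchanged, so the largest class is at most four times the smallest instead of more than sixteen times.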

Answered by Valentin Calomme on August 26, 2021
