
Deep learning theory: why are hidden layers necessary?

Asked on Data Science, March 16, 2021

For this question, I’ll refer to the popular YouTube video by 3Blue1Brown on deep learning applied to handwritten digit recognition.

The video describes a neural network with the following layers (a code sketch of this architecture follows the list):

  1. Input (individual grayscale pixels)

  2. Small-scale feature determination (e.g., the top quarter of the loop in the number 9)

  3. Larger-scale features (e.g., the entire loop in the number 9)

  4. The output layer, which shows the probability of an input image being each of the numbers 0-9
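For concreteness, here is a minimal PyTorch sketch of that architecture. The 784-16-16-10 layer sizes follow the video; the framework, activation functions, and exact unit counts are illustrative choices rather than anything fundamental.

```python
import torch.nn as nn

# Sketch of the network described in the video: 28x28 = 784 grayscale pixels in,
# two small hidden layers for intermediate features, and 10 outputs (one per digit).
# Layer sizes follow the video's 784-16-16-10 layout; activations are illustrative.
model = nn.Sequential(
    nn.Flatten(),        # 28x28 image -> 784-dimensional vector (layer 1: input)
    nn.Linear(784, 16),  # layer 2: small-scale features (edges, arcs)
    nn.Sigmoid(),
    nn.Linear(16, 16),   # layer 3: larger-scale features (loops, lines)
    nn.Sigmoid(),
    nn.Linear(16, 10),   # layer 4: scores for the digits 0-9
    nn.Softmax(dim=1),   # turn scores into probabilities
)
```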

I’m also going to try to read through the entire wiki section here, and I’m currently on the neural networks page.

I particularly like the explanation of coefficients… “A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn; e.g. which input is most helpful in classifying data without error?”
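In code, that quoted description of a single node boils down to something like the following (a generic sketch; the function name and the sigmoid choice are mine, not the wiki’s):

```python
import numpy as np

# What one node computes, per the quoted description: a weighted sum of its
# inputs (the weights amplify or dampen each input) plus a bias, passed
# through a squashing non-linearity (a sigmoid here).
def node_output(x, w, b):
    z = np.dot(w, x) + b              # combine inputs with their coefficients
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
```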

Essentially, it’s saying that each input has some level of importance to each output, which to me raises the question… are component features/hidden layers necessary at all? In the handwriting example, couldn’t every input node be connected to every output node without the use of hidden layers? The idea is that all of the high-weight input pixels for a given output would still have high weights for that output, but the network would skip the feature/aggregation stages. Is this just a matter of training efficiency (i.e., to prevent duplication by effectively extracting the same features more than once)?

Also, do the connections between nodes need to be specifically chosen, i.e., do the number of nodes and the selection of connections between them have to be chosen intelligently?

Is it accurate to say that a sufficiently deep neural network is essentially finding the significance of all relevant combinations of input values, and that’s basically all it’s doing?

2 Answers

Every input node can be connected to every output node, but that wouldn't account for non-linearity in the features, or for the predictive relationships among them. A neural network without any hidden layers is just a regression. I think this is a great intro to deep learning.
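To make that concrete: with no hidden layers, the model collapses to a single linear map from pixels to digit scores, i.e. multinomial logistic (softmax) regression. A minimal sketch, assuming the same 784-pixel input and 10 outputs as the video:

```python
import torch.nn as nn

# With no hidden layers, the "network" is a single affine map from the 784
# input pixels straight to the 10 digit scores -- multinomial logistic
# (softmax) regression. Each output can only weight pixels independently;
# it cannot model interactions between pixels.
no_hidden_layers = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 10),
    nn.Softmax(dim=1),
)
```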

In answer to your second question: the number of nodes and their connectivity are hyperparameters that should be optimized. Also, people use dropout (i.e., randomly dropping out nodes during training) as a form of regularization.

Answered by fractalnature on March 16, 2021

There’s a brilliant free interactive book here that explains how neural networks work, if you want to understand them in more detail. The chapter I have linked to demonstrates that, as long as there is at least one hidden layer, a neural network can approximate any continuous function.
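As a rough illustration of that claim (my own sketch, not taken from the book), a network with a single hidden layer can be trained to fit a non-linear one-dimensional function such as sin(x); the width, activation, and optimizer settings below are arbitrary choices:

```python
import math
import torch
import torch.nn as nn

# Toy illustration of universal approximation: one hidden layer fitting
# y = sin(x) on [-pi, pi]. All hyperparameters are illustrative.
torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(f"final mean-squared error: {loss.item():.5f}")  # close to zero
```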

As fractalnature says above, if there were no hidden layers, each of your output neurons would actually be a generalised linear model that linearly combines the input features. Your neural network would effectively be a one-vs-rest classifier. In many cases this would perform well, and it would certainly be easier to train than a deep network, but it wouldn’t be able to achieve the same performance because it couldn’t model non-linear relationships between the features.
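The classic minimal example of such a non-linear relationship is XOR: no single linear layer can separate it, while one small hidden layer learns it easily. A hypothetical sketch (the layer sizes and optimizer settings are my own choices):

```python
import torch
import torch.nn as nn

# XOR cannot be separated by any model without a hidden layer, but a single
# small hidden layer with a non-linear activation learns it easily.
torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

def train(model, steps=5000):
    opt = torch.optim.Adam(model.parameters(), lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

no_hidden  = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
one_hidden = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1), nn.Sigmoid())

print("no hidden layer loss: ", train(no_hidden))   # stuck near ln(2) ≈ 0.693
print("one hidden layer loss:", train(one_hidden))  # drops close to 0
```

The no-hidden-layer model stays at chance-level loss no matter how long it trains, while the one-hidden-layer model drives the loss toward zero.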

Answered by Nicholas James Bailey on March 16, 2021
