Fast AI Lesson 4 - MNIST. Confused about multiplying weights by pixels?

Data Science – Asked on April 28, 2021

I’m on lesson 4 of the Fast AI "Deep Learning for Coders" course, and have been back through the same lesson a few times now but I don’t think I’m quite getting a few things. I want to have an understanding of what’s going on before moving on.

This lesson is on MNIST – and Jeremy is recognising 3s vs 7s. So he has 12000 images (ignoring mini-batches) of about 800 pixels each, and his tensor has a shape of (12000, 800). He randomly generates 800 weights, one for each pixel, plus a bias (whose purpose I don’t quite understand either).

So then for the first image, he multiplies each pixel’s value by the weight for that pixel. So if the pixel value is 0.9 and the weight is 0.1, 0.9 * 0.1 = 0.09. Or if the pixel value is 0.5 and the weight is 0.6, 0.5 * 0.6 = 0.3.

(and then a bias is added, and this is done for every image)
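
In code, my understanding of the computation is roughly this (my own sketch, not the exact notebook code, using the real pixel count of 784 rather than "about 800"):

    import torch

    # 12000 flattened images, 784 pixels each (28 * 28), values in [0, 1]
    imgs = torch.rand(12000, 784)

    weights = torch.randn(784)  # one random weight per pixel
    bias = torch.randn(1)       # a single bias

    # for each image: multiply every pixel by its weight, sum, add the bias
    preds = imgs @ weights + bias  # shape: (12000,)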

I don’t understand what the purpose or benefit is of multiplying these numbers together. If I wanted a score of how close our weight is to the pixel value, I’d do abs(value - weight).

And then adding these up would give me a score of how accurate my weights are.

This is where I’m confused. What am I missing?

2 Answers

The input images contain a total of 784 (28 * 28) pixels, meaning that the model has 784 inputs. If we then use one node in the next layer which is connected to all inputs, we also get 784 weights, as each pixel is connected to the node in the next layer. See also the following image:

[Image: a fully connected network with 3 input nodes, 4 hidden nodes, and 2 output nodes]

The input layer contains all the different inputs to our model (i.e. all pixels), and as you can see, all those inputs are connected to all nodes in the next layer. Each connection/line has a specific weight assigned to it; the number of weights for a fully connected layer is equal to n_input * n_output. In the fastai course this means that there are 784 weights (28 * 28 * 1), while in the pictured example there are 12 weights (3 * 4) between the first and second layer and 8 (4 * 2) between the second and third layer.
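
You can check these weight counts directly in PyTorch (a sketch I am adding for illustration; nn.Linear stores one weight per input-output pair):

    import torch.nn as nn

    # one node connected to all 784 pixel inputs -> 784 weights + 1 bias
    layer = nn.Linear(784, 1)
    print(layer.weight.shape)  # torch.Size([1, 784])
    print(layer.bias.shape)    # torch.Size([1])

    # the pictured 3-4-2 example: 3 * 4 = 12 weights, then 4 * 2 = 8 weights
    first = nn.Linear(3, 4)
    second = nn.Linear(4, 2)
    print(first.weight.numel(), second.weight.numel())  # 12 8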

The reason for the bias seems to be explained in the notebook itself: without the bias the function is not flexible enough. If all inputs are equal to zero, the output would also be zero, whereas it may need to be a number different from zero. The bias allows for this by shifting the function up and down; see also the b term in linear regression.
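
A quick way to see this (my own illustration, not from the notebook): with an all-zero input, a purely multiplicative model is stuck at zero, while the bias lets the output be anything:

    import torch

    x = torch.zeros(784)   # an all-black image
    w = torch.randn(784)
    b = torch.tensor(0.5)

    print((x * w).sum())      # tensor(0.) -- without a bias, output is forced to zero
    print((x * w).sum() + b)  # tensor(0.5000) -- the bias shifts the output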

Answered by Oxbowerce on April 28, 2021

Maybe thinking about the relation between neural networks and regression will help you understand the role of the weights. If you take a shallow, single-layer neural network, the output for a single input sample with $m$ features is:

$y = f\left(w_0 + \sum_{i=1}^{m} w_i x_i\right)$

where $w_0$ is the bias and $f$ is your activation function. You can recognize this as a non-linear regression formula, with the bias playing the role of the intercept; it reduces to linear regression when $f$ is the identity.

So the weights are just the parameters of this regression; that is why you multiply the input values by the weights.
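
For one sample, that formula looks like this in code (my own sketch; torch.sigmoid stands in for $f$):

    import torch

    x = torch.rand(784)   # one input sample, m = 784 features
    w = torch.randn(784)  # weights w_1 .. w_m
    w0 = torch.randn(1)   # the bias w_0 (the intercept)

    # y = f(w_0 + sum_i w_i * x_i)
    y = torch.sigmoid(w0 + (w * x).sum())

    # with f = identity, this is exactly linear regression:
    y_linear = w0 + (w * x).sum()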

A deep neural network just adds more layers to allow for arbitrary non-linearity (not limited to the function $f$). In fact, two layers already give the ability to fit arbitrary functions, provided enough neurons are present; but such networks might be practically impossible to fit to data, and this is where additional layers with some structure come to the rescue. That, however, is another story.

Your input features are the values of the pixels, so they get multiplied by the weights because that is how the model is formulated, just as in a regression. Hope this helps clarify.
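
To make the layering concrete, here is a minimal two-layer PyTorch model for the 3s-vs-7s task (my own sketch, assuming flattened 28 * 28 inputs; not the course's exact code):

    import torch.nn as nn

    # first Linear: weights times pixels plus bias; ReLU is the non-linearity f;
    # the second Linear is another regression stacked on top
    model = nn.Sequential(
        nn.Linear(28 * 28, 30),  # 784 inputs -> 30 hidden nodes
        nn.ReLU(),
        nn.Linear(30, 1),        # 30 hidden nodes -> 1 output score
    )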

Answered by andins on April 28, 2021
