Changing representation in a deep neural network

Say I have a neural net that outputs a vector of length 4 such as:

[0, 1, 2, 3]

Now say that the only way to calculate the loss is to convert this output to a one-hot vector matrix and pass that into the loss function:

[[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]

This is a hypothetical question (obviously, in practice the answer would apply not to this exact scenario, but to a more realistic, relevant one).

So, once we have calculated the loss using the one-hot vector matrix, is it still possible to backpropagate and train the network, even though two different representations were used? A more general question would be: if I convert representations between the output of the neural net and the loss function (output of neural net => some representation conversion => loss function), is it still possible to backpropagate and optimize?

Cross Validated Asked on December 26, 2020

It depends on how you do this.

• If you're using PyTorch and you do all of your operations using torch.Tensor objects, then the answer is "yes, you can backprop this loss correctly, because that's the whole point of using torch.Tensor objects." So, this code will work: loss = (torch.eye(x.numel()) @ x).sum() where x is your 4-vector. (Note that x.size is a method, not an attribute, and torch.dot only accepts 1-D tensors, so x.numel() and the @ matrix product are used here.) Just replace sum with whatever differentiable function(s) you need. (Rounding or other non-differentiable operations are not suitable.)

• If you dump your torch.Tensor objects to numpy, this is not possible, because a numpy.ndarray object neither records gradients nor has a backward method.
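To make the first point concrete, here is a minimal sketch of the identity-matrix conversion described above, showing that the gradient flows back through the representation change. The values and the sum-as-loss are just the hypothetical example from the question, not a real training objective.

```python
import torch

# Hypothetical 4-vector output of a network, with gradient tracking enabled.
x = torch.tensor([0.0, 1.0, 2.0, 3.0], requires_grad=True)

# Representation conversion: multiply by an identity matrix, then reduce to
# a scalar loss. Every step is a torch operation on torch.Tensor objects,
# so autograd records the whole chain through the conversion.
converted = torch.eye(x.numel()) @ x  # shape (4,), equal to x here
loss = converted.sum()

loss.backward()
print(x.grad)  # tensor([1., 1., 1., 1.]) — gradients reach x through the conversion
```

The same pattern holds for any differentiable conversion: as long as each step is a torch operation, `loss.backward()` populates `x.grad` and the network can be optimized as usual.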

Correct answer by Sycorax on December 26, 2020
