Cross Validated Asked by Rain on December 8, 2020

I am using a custom activation function. With SGD as the optimiser, the loss becomes NaN at some point during training unless the batch size is set to an excessively high value. With Adam as the optimiser, this happens immediately, regardless of batch size.

A reduced version of the code used to test this:

```
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class CustomActivation(tf.keras.layers.Layer):
    """Combines two tensors as sigmoid(0.5 * x * (1 + exp(2 * x * y)))."""

    def __init__(self):
        super().__init__()

    def call(self, inputs):
        x, y = inputs
        return 1.0 / (1.0 + tf.math.exp(-1 * (0.5 * x * (1 + tf.math.exp(2 * x * y)))))


# load data
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# divide to be between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0

# all-zero secondary input, one value per sample
stuff = np.zeros((60000, 1))
stuff2 = np.zeros((10000, 1))

# declare inputs
input1 = keras.Input(shape=(28, 28))
input2 = keras.Input(shape=(1,))

# flatten
flat1 = layers.Flatten()(input1)

# weight and output layers
primary_1 = layers.Dense(10)(flat1)
secondary_1 = layers.Dense(10)(input2)
out = CustomActivation()([primary_1, secondary_1])

# declare model
model = keras.Model(inputs=[input1, input2], outputs=out)
model.summary()

# train and test
opt = keras.optimizers.SGD(learning_rate=0.05)
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=["accuracy"],
)
model.fit([train_images, stuff], train_labels, batch_size=20480, epochs=10)
test_loss, test_acc = model.evaluate([test_images, stuff2], test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
```

If the secondary input is set to any value other than zeros, then even a batch size of 20480 is too small.
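One possible reading of this observation, sketched in plain Python rather than TensorFlow: with an all-zero secondary input, `secondary_1` outputs only its bias, which Keras initialises to zero by default, so y starts at 0 and the activation reduces to an ordinary sigmoid of x. Once y is nonzero, the argument of the sigmoid grows exponentially in x*y. The helper name `inner` is illustrative and not part of the original code.

```python
import math


def inner(x, y):
    """Argument of the sigmoid in CustomActivation.call."""
    return 0.5 * x * (1.0 + math.exp(2.0 * x * y))


# With y == 0 the exponential is exactly 1, so the argument is just x.
for x in (-2.0, 0.5, 3.0):
    print(x, inner(x, 0.0))

# With y != 0 the argument grows exponentially in x * y.
for x in (1.0, 3.0, 5.0, 7.0):
    print(x, inner(x, 1.0))
```

This only describes the start of training; as the bias of `secondary_1` is updated, y moves away from zero even with zero inputs.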

As this is the Fashion-MNIST dataset, there are no NaN values in the input. Running such a large batch size in a more complicated network is infeasible. Adding L2 or L1 regularisation does not make it possible to reduce the batch size.

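It seems like a classic case of exploding gradients. Writing $A = \frac{1}{2}x\left(1+\exp(2xy)\right)$ for the expression inside the sigmoid, its partial derivatives with respect to $x$ and $y$ are:

$$
\frac{\partial A}{\partial x}=\frac{1}{2}+\left(\frac{1}{2}+xy\right)\exp(2xy)
$$

and

$$
\frac{\partial A}{\partial y}=x^{2}\exp(2xy)
$$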
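A quick finite-difference check of the two derivatives, taking them as derivatives of the sigmoid's argument, 0.5 * x * (1 + exp(2 * x * y)), rather than of the full activation (which would carry an extra sigmoid-derivative factor). This is a minimal plain-Python sketch. The function names and the test point are arbitrary choices, and the float32 limit is included only to indicate roughly where `exp(2xy)` would overflow in single precision.

```python
import math


def inner(x, y):
    """Argument of the sigmoid in CustomActivation.call."""
    return 0.5 * x * (1.0 + math.exp(2.0 * x * y))


def d_inner_dx(x, y):
    return 0.5 + (0.5 + x * y) * math.exp(2.0 * x * y)


def d_inner_dy(x, y):
    return x * x * math.exp(2.0 * x * y)


def central_diff(f, x, y, wrt, h=1e-6):
    """Symmetric finite-difference estimate of df/dx or df/dy."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)


x, y = 0.3, 0.7  # arbitrary test point
err_x = abs(d_inner_dx(x, y) - central_diff(inner, x, y, "x"))
err_y = abs(d_inner_dy(x, y) - central_diff(inner, x, y, "y"))
print("error in d/dx:", err_x)
print("error in d/dy:", err_y)

# exp(t) exceeds the largest float32 once t passes this threshold.
FLOAT32_MAX = 3.4028235e38
overflow_at = math.log(FLOAT32_MAX)
print("exp(2xy) overflows float32 when 2xy >", overflow_at)
```

Both derivatives contain the factor exp(2xy), so they overflow at the same point as the forward pass does.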

However, it seems that as of TensorFlow 2, gradient clipping is deprecated, or at minimum no longer referenced in the documentation. What can be done to avoid running into NaN so quickly in training?
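For what it is worth, Keras optimizers in TensorFlow 2 do still appear to accept `clipnorm` and `clipvalue` keyword arguments, for example `keras.optimizers.SGD(learning_rate=0.05, clipnorm=1.0)`. The sketch below illustrates norm-based rescaling in plain Python, not the TensorFlow implementation, and shows why clipping alone may not be enough once a gradient is already infinite. The function names and the cap `limit=30.0` are assumptions made for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def inner_capped(x, y, limit=30.0):
    """Sigmoid argument with the exponent capped at `limit` (an assumed value)."""
    t = min(2.0 * x * y, limit)
    return 0.5 * x * (1.0 + math.exp(t))


# A large but finite gradient is scaled back to the requested norm.
clipped = clip_by_global_norm([3.0e6, 4.0e6], max_norm=1.0)
print(clipped)

# An infinite entry becomes inf * 0 = NaN, so clipping does not help
# once the exponential has already overflowed.
broken = clip_by_global_norm([math.inf, 1.0], max_norm=1.0)
print(broken)

# Capping the exponent keeps the forward value finite in the first place.
capped = inner_capped(10.0, 10.0)
print(capped)
```

Inside the layer, the same cap could presumably be written as `tf.math.exp(tf.minimum(2 * x * y, limit))`. Note that this changes the activation for large x*y and removes the gradient through the exponent beyond the cap, so it is a workaround, not an equivalent reformulation.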
