TransWikia.com

Metrics values are equal while training and testing a model

Data Science Asked by Amir_P on May 7, 2021

I’m working on a neural network model in Python, using Keras with the TensorFlow backend. The dataset contains pairs of sequences with a label that can be 1 or 0, and the positive-to-negative ratio in the dataset is 1 to 9. The model takes the two sequences as input and outputs a probability. At first my model's output was a Dense layer with one unit and a sigmoid activation, but then I changed the last layer to a Dense layer with two units and a softmax activation, and converted my labels with Keras's to_categorical function. After these changes, the model's metrics — accuracy, precision, recall, F1, and AUC — are all equal, with a high value that is wrong. Here is the implementation I used for those metrics:

from keras import backend as K
import tensorflow as tf

def recall(y_true, y_pred):
    # True positives over all actual positives, computed batch-wise.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision(y_true, y_pred):
    # True positives over all predicted positives, computed batch-wise.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1(y_true, y_pred):
    # Harmonic mean of the batch-wise precision and recall above.
    precisionValue = precision(y_true, y_pred)
    recallValue = recall(y_true, y_pred)
    return 2*((precisionValue*recallValue)/(precisionValue+recallValue+K.epsilon()))

def auc(y_true, y_pred):
    # tf.metrics.auc is the TF1 running-AUC metric; initializing the local
    # variables here is the usual workaround for using it as a Keras metric.
    auc = tf.metrics.auc(y_true, y_pred)[1]
    K.get_session().run(tf.local_variables_initializer())
    return auc
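For intuition, the same precision/recall formulas can be checked outside the graph with NumPy (a sketch; `recall_np` and `precision_np` are my names, not part of the question's code):

```python
import numpy as np

def recall_np(y_true, y_pred):
    # Same formula as above: true positives over all actual positives.
    y_pred = np.round(np.clip(y_pred, 0, 1))
    return np.sum(y_true * y_pred) / (np.sum(y_true) + 1e-7)

def precision_np(y_true, y_pred):
    # True positives over all predicted positives.
    y_pred = np.round(np.clip(y_pred, 0, 1))
    return np.sum(y_true * y_pred) / (np.sum(y_pred) + 1e-7)

# With one-hot targets and softmax outputs, each row has exactly one
# true 1 and (after rounding) one predicted 1, so the summed true
# positives equal the number of correctly classified rows, and
# precision, recall, and accuracy all collapse to the same number.
y_true = np.array([[1., 0.], [0., 1.], [1., 0.]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])  # last row wrong
```

On this toy batch both functions return 2/3, which is also the row-wise accuracy.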

And here is the training output:

Epoch 1/5
4026/4026 [==============================] - 17s 4ms/step - loss: 1.4511 - acc: 0.9044 - f1: 0.9044 - auc: 0.8999 - precision: 0.9044 - recall: 0.9044
Epoch 2/5
4026/4026 [==============================] - 15s 4ms/step - loss: 1.4573 - acc: 0.9091 - f1: 0.9091 - auc: 0.9087 - precision: 0.9091 - recall: 0.9091
Epoch 3/5
4026/4026 [==============================] - 15s 4ms/step - loss: 1.4573 - acc: 0.9091 - f1: 0.9091 - auc: 0.9083 - precision: 0.9091 - recall: 0.9091
Epoch 4/5
4026/4026 [==============================] - 15s 4ms/step - loss: 1.4573 - acc: 0.9091 - f1: 0.9091 - auc: 0.9090 - precision: 0.9091 - recall: 0.9091
Epoch 5/5
4026/4026 [==============================] - 15s 4ms/step - loss: 1.4573 - acc: 0.9091 - f1: 0.9091 - auc: 0.9085 - precision: 0.9091 - recall: 0.9091

After that I tested my model using predict and computed the metrics with sklearn's precision_recall_fscore_support function, and I got the same result again: the metrics are all equal, with a high value (0.93) that is wrong according to the confusion matrix I generated.
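An aggregate score can hide one class being missed entirely; sklearn can report per-class values instead. A sketch with made-up labels at the question's 1-to-9 ratio:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels at a 1:9 positive-to-negative ratio, with a
# model that always predicts the majority (negative) class.
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)

# average=None returns one value per class instead of a single
# aggregate, which exposes the minority class: recall for class 1 is 0.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], average=None, zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
```

Here the confusion matrix shows the single positive sample misclassified, even though overall accuracy is 90%.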
[confusion matrix image]

What am I doing wrong?

2 Answers

This may happen for several reasons:

  • Your model learned too fast (in less than one epoch). In that case, enlarge the dataset (add more data or use augmentation).
  • Your loss function doesn't work correctly and returns zero gradients on every call.
  • You are not training the model in training mode.
  • Your data is loaded incorrectly. Inspect what is actually fed to your model.

Answered by toodef on May 7, 2021

Your problem seems to be class imbalance: you have far more samples from one class than from the other. The optimizer, trying to minimize the loss, solves the problem by learning to always predict the majority class — it cheats. What you should do is give weights to the classes or samples according to their proportions, so that falsely predicting the minority class becomes more costly, while correctly predicting the majority class yields only a small reward.

You can find how to calculate class weights or sample weights in this answer: https://stackoverflow.com/a/44855957/10491777

And here is how to use them in Keras (these are snippets from my own code):

nn.fit(x_train, y_train, callbacks = [es], epochs=8000, batch_size=64, shuffle=True, validation_data=(x_dev, y_dev),
           sample_weight = sample_weights)

if you want to use sample weights. And similarly,

nn.fit(x_train, y_train, callbacks = [es], epochs=8000, batch_size=64, shuffle=True, validation_data=(x_dev, y_dev),
           class_weight=class_weight)

if you want to use class weights. There won't be any difference between the two approaches — just don't use both at once.
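For the class-weight route, one common way to build the dictionary is sklearn's `compute_class_weight` (a sketch; the 1:9 ratio mirrors the question's data):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical training labels at a 1:9 positive-to-negative ratio.
y_train = np.array([0] * 9 + [1])

# 'balanced' weights each class as n_samples / (n_classes * n_class_samples),
# so the rare class gets a proportionally larger weight.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1]), y=y_train)
class_weight = dict(enumerate(weights))  # here {0: 0.555..., 1: 5.0}
```

The resulting dict is what gets passed as the `class_weight` argument to `fit` above.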

Good luck!

Answered by Ugur MULUK on May 7, 2021
