Loss found not good enough

Asked by mechanics_physics on May 6, 2021

I'm trying to use a neural network to predict two different values from an input of 24 features. The results I've gotten so far are not good enough, so any suggestions would be appreciated; I've been stuck for some time. This is what I've done so far:

Input data:

I have an input that contains 24 different features (the total dataset has around 150,000 instances). I've tried standardizing the input, normalizing it, log-transforming it, and using PCA to reduce the dimensionality of the problem. Of these, PCA has proven to be the best option (using the first 5 principal components).
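
For reference, this is roughly the preprocessing pipeline I ended up with. It is only a sketch: it assumes the raw features are in a NumPy array X and the two targets in y, and that the split variable names match the ones used later.

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: (n_samples, 24) raw features, y: (n_samples, 2) targets
X_train, X_val, ynn_train, ynn_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize before PCA so large-scale features do not dominate the components
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))

xnn_train = pca.transform(scaler.transform(X_train))
xnn_val = pca.transform(scaler.transform(X_val))
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept by the 5 components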

To make sure the input is meaningful, I did a quick fit with a Random Forest Regressor and an Extra Trees Regressor to calculate the importance of each feature (after performing PCA). Compared to a random feature, all the features being used appear significant enough for the model.
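
The sanity check looked roughly like this (a sketch; the appended noise column and the variable names are only illustrative):

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Append a pure-noise column; every real feature should rank above it
rng = np.random.default_rng(0)
x_check = np.hstack([xnn_train, rng.normal(size=(xnn_train.shape[0], 1))])

for Model in (RandomForestRegressor, ExtraTreesRegressor):
    reg = Model(n_estimators=100, n_jobs=-1, random_state=0).fit(x_check, ynn_train)
    print(Model.__name__, reg.feature_importances_)  # last entry is the random feature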

Neural Network

For the neural network, I have tried a lot of things and arrived at the following architecture:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

# Glorot (Xavier) normal initializers for the hidden and output layers
initializer = tf.keras.initializers.GlorotNormal()
initializer_sm = tf.keras.initializers.GlorotNormal()

model_nn = keras.Sequential([
    keras.layers.Dense(128, input_dim=xnn_train.shape[1], activation='selu', kernel_initializer=initializer),
    keras.layers.Dense(256, activation='selu', kernel_initializer=initializer),
    keras.layers.Dense(256, activation='selu', kernel_initializer=initializer),
    keras.layers.Dense(2, activation='softplus', kernel_initializer=initializer_sm)
])

def custom_loss(y_true, y_pred):
    # Absolute percentage error (essentially Keras's built-in 'mape' loss)
    return (K.abs(y_true - y_pred) / y_true) * 100

epochs = 100
opt = keras.optimizers.Nadam()
model_nn.compile(optimizer=opt, loss=custom_loss)
history_nn = model_nn.fit(xnn_train, ynn_train, epochs=epochs, batch_size=1024,
                          validation_data=(xnn_val, ynn_val))

With the following results:

[results plot]

And these are the histograms of each predicted output compared with the real values:

[histograms of predicted vs. real values]

I have also tried adding 1D convolutional layers before the architecture shown above, but their only effect is that the loss converges faster.
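
For completeness, the convolutional variant looked roughly like this (a sketch: the PCA features are reshaped into a length-5 sequence with one channel, and the filter and kernel sizes are only illustrative):

model_cnn = keras.Sequential([
    # Treat the PCA components as a short 1D "sequence" with a single channel
    keras.layers.Reshape((xnn_train.shape[1], 1), input_shape=(xnn_train.shape[1],)),
    keras.layers.Conv1D(32, kernel_size=3, padding='same', activation='selu'),
    keras.layers.Conv1D(32, kernel_size=3, padding='same', activation='selu'),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='selu'),
    keras.layers.Dense(2, activation='softplus')
])
model_cnn.compile(optimizer=keras.optimizers.Nadam(), loss=custom_loss)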

One Answer

Your network is too large for this amount of data. Reduce the number of units in the layers and use plain 'relu' in all layers except the last one, where you should use softmax. Also consider reducing the number of layers.

I would recommend gradient-boosted decision tree libraries such as XGBoost, CatBoost, or LightGBM for this problem. They will give you the best results.
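
A minimal sketch of that route, using XGBoost wrapped with scikit-learn's MultiOutputRegressor so one model is fit per target (the hyperparameters here are only placeholders):

from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# One boosted model per target column; tree models need no scaling or PCA,
# so the raw 24 features could be fed in directly as well
gbm = MultiOutputRegressor(XGBRegressor(n_estimators=500, learning_rate=0.05))
gbm.fit(xnn_train, ynn_train)
pred_val = gbm.predict(xnn_val)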

Answered by Abhishek Verma on May 6, 2021
