
Is it better to use regularization methods for Neural Networks (L1/L2 & Dropout) separately or combined?

Data Science: Asked by machine_apprentice on March 15, 2021

I have been exploring different regularization approaches and observed that the most common are Dropout layers and L1/L2 weight regularization. I have seen many debates about whether it is better to combine these regularization methods or use them separately.

In my case I have implemented both approaches (combined and separate). Combining them has given me promising results: it has helped keep my models from overfitting entirely while generally improving their R² score.

Question:

Is it preferable to combine L1/L2 regularization with Dropout layers, or is it better to use them separately?

Example Code:

from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adam

def model_build(x_train):
    # Define inputs for the ANN
    input_layer = Input(shape=(x_train.shape[1],), name="Input")

    # Create hidden ANN layers
    dense_layer = BatchNormalization(name="Normalization")(input_layer)
    dense_layer = Dense(128, name="First_Layer", activation='relu',
                        kernel_regularizer=regularizers.l1(0.01))(dense_layer)
    #dense_layer = Dropout(0.08)(dense_layer)
    dense_layer = Dense(128, name="Second_Layer", activation='relu',
                        kernel_regularizer=regularizers.l1(0.00))(dense_layer)  # l1(0.0) applies no penalty
    #dense_layer = Dropout(0.05)(dense_layer)

    # Apply the output layer
    output = Dense(1, name="Output")(dense_layer)

    # Create the model (accepts the inputs and has a single output)
    model = Model(inputs=input_layer, outputs=output)

    # Compile the model ('lr' is deprecated in newer Keras; use 'learning_rate')
    model.compile(loss='mse', optimizer=Adam(learning_rate=0.01), metrics=['mse'])
    #model.compile(loss='mse', optimizer=AdaBound(lr=0.001, final_lr=0.1), metrics=['mse'])

    return model
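
Once built, the model can be trained as usual. A minimal usage sketch follows; x_train and y_train are assumed to be prepared NumPy arrays, and the training hyperparameters are illustrative placeholders:

model = model_build(x_train)
# Hypothetical training call; epochs/batch_size/validation_split are not tuned values
history = model.fit(x_train, y_train, epochs=50, batch_size=32, validation_split=0.2)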

One Answer

I am not sure there is a formal way to show which is best in which situation, since it depends on many factors such as your dataset and the architecture of your ANN; simply trying out different combinations is likely the best approach.

It is worth noting that Dropout actually does more than regularize: it makes the model more robust by forcing it to make predictions with different subsets of nodes.

L1/L2 regularization, on the other hand, reduces overfitting by adding a penalty on large weights to the loss (λ∑|w| for L1, λ∑w² for L2).
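
To make the combined approach concrete, here is a minimal sketch of a model that uses both techniques together. This is not the asker's tuned model; the L2 strength of 0.01 and Dropout rate of 0.1 are illustrative placeholders:

from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers

def combined_model(n_features):
    # Each hidden block pairs an L2 weight penalty with a Dropout layer;
    # the strengths/rates below are placeholders, not tuned values.
    inputs = Input(shape=(n_features,))
    x = Dense(128, activation='relu',
              kernel_regularizer=regularizers.l2(0.01))(inputs)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu',
              kernel_regularizer=regularizers.l2(0.01))(x)
    x = Dropout(0.1)(x)
    outputs = Dense(1)(x)
    return Model(inputs=inputs, outputs=outputs)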

Correct answer by Shiv on March 15, 2021
