
Improving accuracy of a logistic model

Data Science: Asked on March 27, 2021

I am trying to reproduce results from one paper, where authors minimized the following loss function
\begin{align}
\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i \in [n]} \log(1 + \exp(-y_i x_i^T w)) + \frac{\lambda}{2}\|w\|^2,
\end{align}

where $w$ is the weight vector and $\lambda$ is the regularization parameter, for the ijcnn1 dataset.
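For concreteness, this objective can be written as a small NumPy function; the sketch below is only illustrative and assumes labels $y_i \in \{-1, +1\}$. Note that Keras's regularizers.l2(c) adds $c\|w\|^2$ to the loss, so c = lambda/2 corresponds to the $\frac{\lambda}{2}\|w\|^2$ term.

import numpy as np

def regularized_logistic_loss(w, X, y, lam):
    # X: (n, d) features, y: (n,) labels in {-1, +1}, w: (d,) weights
    margins = y * (X @ w)                             # y_i * x_i^T w
    data_term = np.mean(np.log1p(np.exp(-margins)))   # average logistic loss
    return data_term + 0.5 * lam * np.dot(w, w)       # plus (lambda/2) * ||w||^2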

This dataset is notably imbalanced (roughly 90% of the labels are 0 and 10% are 1).
As a preprocessing step, I applied MinMaxScaler and StandardScaler.
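A minimal sketch of that scaling step with scikit-learn, assuming X_train and X_test are NumPy arrays (the scaler is fit on the training split only):

from sklearn.preprocessing import StandardScaler  # or MinMaxScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn scaling statistics from the training data
X_test = scaler.transform(X_test)        # apply the same statistics to the test data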

I use a model written in Keras, which looks simply like this:

from tensorflow import keras
from tensorflow.keras import optimizers, regularizers

# A single sigmoid unit with an L2 weight penalty, i.e. regularized logistic regression
model = keras.Sequential([
    keras.layers.Dense(1, activation="sigmoid", kernel_initializer='uniform', kernel_regularizer=regularizers.l2(1e-4), use_bias=True)
])
sgd = optimizers.SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=1000)

However, I manage to obtain only 91% accuracy at best.
Looking at the predictions, I observe that my model learns to predict almost everything as zero. I also tried using class_weight (see the sketch below), but it did not seem to help.
Does anybody have suggestions on how to obtain better results?
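For reference, a minimal sketch of how class_weight can be passed to fit; the specific weights here (roughly the inverse class frequencies) are only illustrative:

# Hypothetical weighting: up-weight the rare positive class (~10% of the samples)
class_weight = {0: 1.0, 1: 9.0}
model.fit(train_dataset, epochs=1000, class_weight=class_weight)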

One Answer

If you have a disproportionate amount of zeros, your model doesn't see enough positive examples to learn how to classify them correctly. Because it sees zeros almost all the time, it has probably learned to simply output zeros.

The main thing you can do to address this is to train your model on mini-batches built by sampling 0- and 1-observations with equal probability, which effectively gives the 1-observations greater weight than the zeros. That way, your model is fed balanced batches and can learn to classify both classes correctly. A sketch of this idea follows below.
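A minimal sketch of building such balanced mini-batches with tf.data, assuming X_train and y_train are NumPy arrays with 0/1 labels (the batch size is illustrative; on older TensorFlow versions tf.data.experimental.sample_from_datasets is needed instead):

import tensorflow as tf

# Split the training data by class and repeat each split indefinitely
ds_neg = tf.data.Dataset.from_tensor_slices((X_train[y_train == 0], y_train[y_train == 0])).repeat()
ds_pos = tf.data.Dataset.from_tensor_slices((X_train[y_train == 1], y_train[y_train == 1])).repeat()

# Draw each example ~50/50 from the two classes, so every batch is roughly balanced
balanced_ds = tf.data.Dataset.sample_from_datasets([ds_neg, ds_pos], weights=[0.5, 0.5]).batch(128)

# The resampled datasets repeat forever, so steps_per_epoch defines an epoch
model.fit(balanced_ds, epochs=100, steps_per_epoch=len(X_train) // 128)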

Answered by Leevo on March 27, 2021
