
Improving accuracy of a logistic model

Data Science: Asked on March 27, 2021

I am trying to reproduce results from one paper, where authors minimized the following loss function
\begin{align}
\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i \in [n]} \log(1 + \exp(-y_i x_i^T w)) + \frac{\lambda}{2}\|w\|^2,
\end{align}

where $w$ is the weight vector and $\lambda$ is the regularization parameter, for the ijcnn1 dataset.
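For concreteness, this objective can be written as a small NumPy function; the sketch below is only illustrative and assumes labels $y_i \in \{-1, +1\}$. Note that Keras's regularizers.l2(c) adds $c\|w\|^2$ to the loss, so c = lambda/2 corresponds to the $\frac{\lambda}{2}\|w\|^2$ term.

import numpy as np

def regularized_logistic_loss(w, X, y, lam):
    # X: (n, d) features, y: (n,) labels in {-1, +1}, w: (d,) weights
    margins = y * (X @ w)                             # y_i * x_i^T w
    data_term = np.mean(np.log1p(np.exp(-margins)))   # average logistic loss
    return data_term + 0.5 * lam * np.dot(w, w)       # plus (lambda/2) * ||w||^2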

This dataset is notably imbalanced (roughly 90% of the labels are 0 and 10% are 1).
As a preprocessing step, I applied MinMaxScaler and StandardScaler.
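A minimal sketch of that scaling step with scikit-learn, assuming X_train and X_test are NumPy arrays (the scaler is fit on the training split only):

from sklearn.preprocessing import StandardScaler  # or MinMaxScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn scaling statistics from the training data
X_test = scaler.transform(X_test)        # apply the same statistics to the test data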

I use a model written in Keras, which looks simply like this:

from tensorflow import keras
from tensorflow.keras import optimizers, regularizers

# A single sigmoid unit with an L2 weight penalty, i.e. regularized logistic regression
model = keras.Sequential([
    keras.layers.Dense(1, activation="sigmoid", kernel_initializer='uniform', kernel_regularizer=regularizers.l2(1e-4), use_bias=True)
])
sgd = optimizers.SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=1000)

However, I manage to obtain only 91% accuracy at best.
Looking at the predictions, I observe that my model learns to predict almost everything as zero. I also tried using class_weight (see the sketch below), but it did not seem to help.
Does anybody have suggestions on how to obtain better results?
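For reference, a minimal sketch of how class_weight can be passed to fit; the specific weights here (roughly the inverse class frequencies) are only illustrative:

# Hypothetical weighting: up-weight the rare positive class (~10% of the samples)
class_weight = {0: 1.0, 1: 9.0}
model.fit(train_dataset, epochs=1000, class_weight=class_weight)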

One Answer

If you have a disproportionate amount of zeros, your model doesn't see enough positive examples to learn how to classify them correctly. Because it sees zeros almost all the time, it has probably learned to simply output zeros.

The main thing you can do to address this is to train your model on mini-batches built by sampling 0- and 1-observations with equal probability, which effectively gives the 1-observations greater weight than the zeros. That way, your model is fed balanced batches and can learn to classify both classes correctly. A sketch of this idea follows below.
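A minimal sketch of building such balanced mini-batches with tf.data, assuming X_train and y_train are NumPy arrays with 0/1 labels (the batch size is illustrative; on older TensorFlow versions tf.data.experimental.sample_from_datasets is needed instead):

import tensorflow as tf

# Split the training data by class and repeat each split indefinitely
ds_neg = tf.data.Dataset.from_tensor_slices((X_train[y_train == 0], y_train[y_train == 0])).repeat()
ds_pos = tf.data.Dataset.from_tensor_slices((X_train[y_train == 1], y_train[y_train == 1])).repeat()

# Draw each example ~50/50 from the two classes, so every batch is roughly balanced
balanced_ds = tf.data.Dataset.sample_from_datasets([ds_neg, ds_pos], weights=[0.5, 0.5]).batch(128)

# The resampled datasets repeat forever, so steps_per_epoch defines an epoch
model.fit(balanced_ds, epochs=100, steps_per_epoch=len(X_train) // 128)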

Answered by Leevo on March 27, 2021
