
Dealing with low variation data

Data Science: asked by Ifier on June 27, 2021

So my current project involves using a neural network to try to predict the probability of a player getting a kill in a first-person shooter.

I've recorded a number of features that should be relevant (positions, proximity to other players, amount of health, etc.) over a fixed period (10 seconds) leading up to one of the players attempting a kill. That is to say, the training data is a set of sequences of the game-world state at discrete time steps, at the end of which one of the players achieves a kill. The data does include some overlap, which should be fine since the outputs are meant to represent the probability that each player will achieve a kill, not a hard prediction. During pre-processing, each recording is cut into smaller 5-second sequences with a sliding window, and these windows are used to train the network.

For the model I've attempted to use a convolutional neural network for multivariate time series classification (which I believe is the correct framing for this problem?) as detailed in this paper. The only modification I've made is adding a couple of fully-connected layers at the beginning of the model (applied to one time step of the sequence at a time, like a temporally shared layer) so that the model can learn a representation of the game state, since the convolutional layers only convolve through time and a significant fraction of the features are just one-hot encodings.
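A rough Keras sketch of the shape of the model (the layer widths, kernel sizes, and softmax output head here are placeholders for illustration, not the values from the paper or my exact configuration):

```python
from tensorflow.keras import layers, models

n_steps, n_features, n_players = 100, 678, 10

model = models.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    # Temporally shared dense layers: the same weights embed
    # each frame's game state independently
    layers.TimeDistributed(layers.Dense(128, activation="relu")),
    layers.TimeDistributed(layers.Dense(64, activation="relu")),
    # Convolutions along the time axis only
    layers.Conv1D(64, kernel_size=8, padding="same", activation="relu"),
    layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
    layers.GlobalAveragePooling1D(),
    # A distribution over which of the 10 players gets the kill
    layers.Dense(n_players, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```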

This brings me to the main issue: after a few iterations of training, the network tends to get stuck predicting almost identical probabilities for every sequence, regardless of differences in the input. After many epochs the network might switch to predicting a different label (and occasionally it will predict a second label for a very small subset of the training data, although this always disappears a couple of epochs later), but it remains very insensitive to changes in the input data. I believe this lack of variation in the output is related to the low variation in the input data.
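As a rough way to quantify the collapse (continuing the sketches above, so `model` and `X` are assumed from there): the standard deviation of each output across a batch of different inputs should be well above zero for a responsive network, but in my case it is essentially zero.

```python
# Predictions for a batch of distinct input windows
probs = model.predict(X[:256])                 # (batch, n_players)
print("per-output std across batch:", probs.std(axis=0))
print("max spread:", probs.std(axis=0).max())  # ~0 => constant output
```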

The data for each frame can be split into 11 sections: the first is 8 features (one-hot encodings of map information), and the remaining 10 are repeats of the same 67 features, one set per player. Of those 67 features, 36 are also one-hot encodings. Analysing the remaining 31 features, I have found that, aside from a couple of them, they tend to vary by less than 0.1 within each sequence (bearing in mind that the data has been normalised).
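The analysis itself was along these lines (a sketch continuing from the windowed array `X` above; the column indices for the continuous features are placeholders standing in for my real feature layout):

```python
import numpy as np

# Hypothetical indices of the 31 continuous features for one player
continuous_idx = np.r_[8:39]
cont = X[:, :, continuous_idx]

# How much each feature moves within a single 5-second window,
# averaged over all windows (data is already normalised)
within_window_range = (cont.max(axis=1) - cont.min(axis=1)).mean(axis=0)
low_var = (within_window_range < 0.1).sum()
print(f"{low_var} of {len(continuous_idx)} features vary by < 0.1 "
      "within a typical window")
```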

I’m fairly new to developing models and have never encountered this problem before, so any insight on how to deal with it would be greatly appreciated.

I will gladly supply additional information on my model or the data if needed.

Thank you for any help you can offer!
