TransWikia.com

What ML architecture fits fixed-length signal regression?

Data Science Asked by Shay on December 15, 2020

My problem is a regression problem:

How can I estimate a fish's weight from a fixed-length signal (80 data points) of the change in resistance as the fish swims through a gate with electrodes? Each signal is basically 4 seconds of the fish passing, logged at 20 Hz.

The signal is spike-shaped. Its height and width depend on the size of the fish, its speed, and its proximity to the gate's edges, and probably on other things such as water salinity and temperature.

I have a data set of 15 different weights, each with 20-110 samples, and each sample has 2 spikes, one for each of the 2 sets of electrodes I use for measurement (using 2 sets can help determine where the fish is heading).

Here is an example of the resistance readout from a 340 gram fish experiment:

[Figure: resistance readout of the 340 gram fish experiment]

And here is an example of the spikes extracted from the same 340 gram fish experiment:

[Figure: extracted spikes from the 340 gram fish experiment]

As you can see, there is significant variance, which led me to look for a neural network approach that can take such a signal as input and estimate the fish weight.

Do you know of a "state of the art" network that does this?
What would you try?
Maybe a different ML technique?

Thanks!

Edit:

The data presented is post-processed: I extract the spikes using the Python code below, so some of the noise is cleaned.
I’m not sure how to clean it any better, since the experimenter didn’t record when a fish went through the gate; all we have is the electrode signal from which to deduce that a fish passed through.

import numpy as np

# Extract fixed-length spike windows centered on each detected peak.
def get_spikes(data_series_elc1, data_series_elc2, signal_meta):
    window_size = int(signal_meta['freq'])*4   # 4 seconds of samples
    half_window = window_size // 2

    # Detection threshold: 90th percentile of the first electrode's signal
    p90 = np.quantile(data_series_elc1, 0.9)
    spikes = []
    i = 0
    while i < len(data_series_elc1)-half_window:
        if data_series_elc1[i] > p90:
            # find the next max and use it as the window center
            max_indx = np.argmax(data_series_elc1[i:i+window_size])
            start = i+max_indx-half_window
            end = i+max_indx+half_window
            spike_list = [[data_series_elc1[start:end]],[data_series_elc2[start:end]]]
            # keep only full-length windows (drops spikes cut off at either end)
            if len(spike_list[0][0])==window_size:
                spikes.append(spike_list)

            # skip past the window that was just examined
            i = end
        else:
            i = i+1
    print('Number of Spikes: ',len(spikes))
    return spikes

I also extract features such as max, width, integral and a Gaussian fit, but a linear regression model only gets me R^2 ≈ 0.6, which corresponds to a mean error of ~103 grams over all fish weights
[100., 144., 200., 275., 339., 340., 370., 390., 400., 404., 480., 500., 526., 700., 740., 800., 840.], which is quite a large error.
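For illustration, features of this kind could be computed along the following lines. This is only a minimal sketch, not the code behind the numbers above: the helper name `spike_features`, the median baseline subtraction and the half-maximum definition of width are all assumptions, and the demo spike is synthetic. The Gaussian fit is left out.

```python
import numpy as np

def spike_features(spike, freq=20.0):
    """Return (peak, width_seconds, integral) for one 1-D spike window."""
    x = np.asarray(spike, dtype=float)
    dt = 1.0 / freq
    # Assumption: the window median approximates the resting resistance level.
    x = x - np.median(x)
    peak = x.max()
    # Width: total time spent above half of the peak height (a crude FWHM).
    width = np.count_nonzero(x > 0.5 * peak) * dt
    # Integral: trapezoidal rule written out by hand.
    integral = dt * (x.sum() - 0.5 * (x[0] + x[-1]))
    return peak, width, integral

# Synthetic Gaussian-shaped spike, for illustration only (not real fish data).
t = np.arange(80) / 20.0
demo = 3.0 * np.exp(-0.5 * ((t - 2.0) / 0.3) ** 2)
peak, width, integral = spike_features(demo)
```

Stacking these three numbers per spike (and per electrode set) would give a small feature matrix that a linear model can be fit on.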

A vanilla fully connected neural network gets about the same.

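from tensorflow import keras
from tensorflow.keras import layers

# 80 input samples -> two hidden layers -> one regression output (weight)
model = keras.Sequential()
model.add(keras.Input(shape=(80,)))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(10, activation="relu"))
model.add(layers.Dense(1))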

So I’m looking to improve these results. Any ideas?

One Answer

One common approach for this type of data is to take the integral of the signal and learn a translation function from that integral to fish weight. Taking the integral reduces each signal to a single number.

You probably do not need a state of the art model. A general linear model would probably pick out a signal.
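As a rough sketch of that idea, the snippet below integrates each spike and fits a straight line from integral to weight with ordinary least squares. The function names, the trapezoidal integration and the synthetic spikes are assumptions made for illustration; real spikes would replace the generated array, and the fit quality on real data may be quite different.

```python
import numpy as np

def spike_integrals(spikes, freq=20.0):
    """Trapezoidal integral of each row of an (n_spikes, n_samples) array."""
    x = np.asarray(spikes, dtype=float)
    dt = 1.0 / freq
    return dt * (x.sum(axis=1) - 0.5 * (x[:, 0] + x[:, -1]))

def fit_linear(integrals, weights):
    """Least-squares fit of weight = slope * integral + intercept."""
    slope, intercept = np.polyfit(integrals, weights, deg=1)
    return slope, intercept

# Synthetic data, for illustration only: spike area grows with weight
# by construction, plus some amplitude noise.
rng = np.random.default_rng(0)
t = np.arange(80) / 20.0
weights = rng.choice([100.0, 340.0, 500.0, 840.0], size=200)
amplitude = weights / 100.0 + rng.normal(0.0, 0.5, size=200)
spikes = amplitude[:, None] * np.exp(-0.5 * ((t - 2.0) / 0.3) ** 2)

integrals = spike_integrals(spikes)
slope, intercept = fit_linear(integrals, weights)
predicted = slope * integrals + intercept

# Simple fit diagnostics on the training data.
mean_abs_error = np.mean(np.abs(predicted - weights))
residual = np.sum((weights - predicted) ** 2)
total = np.sum((weights - weights.mean()) ** 2)
r_squared = 1.0 - residual / total
```

If the relationship between integral and weight turns out to be curved, the same code can fit a low-degree polynomial by raising `deg`, which still keeps the model small.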

Answered by Brian Spiering on December 15, 2020
