Silhouette Score not robust when clustering time series with tslearn

Cross Validated Asked by bk_ on December 13, 2021

I have 40 univariate time series that I am clustering with tslearn.

To determine a reasonable number of clusters, I use the silhouette coefficient. However, I noticed that it is not robust at all, as its maximum falls at a different number of clusters from run to run.
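
For reference, the silhouette coefficient of a sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster; the score is the average over all samples. Below is a minimal pure-Python sketch computed from a precomputed distance matrix. The helper name `silhouette_from_distances` and the four toy points are illustrative assumptions, not part of tslearn.

```python
def silhouette_from_distances(dist, labels):
    """Mean silhouette coefficient from a symmetric distance matrix.

    Illustrative helper (not a tslearn function). A sample that is alone
    in its cluster contributes 0, which is the usual convention.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i in range(n):
        # mean distance from sample i to each cluster, excluding i itself
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: coefficient is 0
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


# toy data: four points on a line, forming two obvious groups
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_from_distances(dist, [0, 0, 1, 1])  # natural grouping
bad = silhouette_from_distances(dist, [0, 1, 0, 1])   # mixed grouping
print(good, bad)
```

The natural grouping scores close to 1 and the mixed grouping scores below 0, so the score depends entirely on the labels a particular clustering run produces.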

I use dynamic time warping (DTW) as the distance measure and apply a min-max transformation to preprocess the time series.
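
To make these two steps concrete, here is a small pure-Python sketch of min-max scaling followed by a textbook DTW distance. The function names `minmax` and `dtw_distance` are illustrative, and the squared pointwise cost with a final square root is my assumption about the convention tslearn follows; this is not tslearn's implementation.

```python
import math


def minmax(series):
    """Rescale a sequence to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in series]


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with squared pointwise cost."""
    cost = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[len(a)][len(b)])


# same shape, different level and length
a = [1, 2, 3, 4]
b = [10, 20, 20, 30, 40]
raw = dtw_distance(a, b)
scaled = dtw_distance(minmax(a), minmax(b))
print(raw, scaled)
```

After scaling, the two toy series differ only by a repeated value, which DTW absorbs by warping, so the scaled distance drops to zero while the raw distance stays large.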

I cannot share the data, but my df looks like this (just a small piece):

time       | value | label
2020-01-01 | 1.3   | 10000
2020-01-02 | 1.9   | 10000
2020-01-01 | 0.5   | 20000
2020-01-02 | 1.2   | 20000

My code:

# imports
from tslearn.clustering import TimeSeriesKMeans, silhouette_score
from sklearn.preprocessing import minmax_scale
import pandas as pd

# range of cluster counts to evaluate (2..9, matching the loop below)
min_n, max_n = 2, 9

# get list of time series (one per label), perform minmax-transformation
# df is the DataFrame shown above; 'label' identifies each series
ts = []
for ts_label in df['label'].unique():
    ts.append(minmax_scale(df.loc[df['label'] == ts_label, 'value']))

# loop through different configurations for # of clusters and store the respective values for silhouette:
sil_scores = []
for n in range(min_n, max_n + 1):
    km = TimeSeriesKMeans(n_clusters=n, metric="dtw")
    km.fit(ts)
    sil_scores.append(silhouette_score(ts, km.predict(ts), metric="dtw"))

# prepare resulting df
result_df = pd.DataFrame(data={
    "no_clusters": range(min_n, max_n + 1),
    "silhouette_score": sil_scores,
})

However, if I repeat this process several times, I get different results: the highest silhouette_score occurs at 2, 3, or 5 clusters (over 11 runs: four times at 2, five times at 3, and twice at 5).
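
The repetition described here can be made reproducible by giving every run an explicit seed and counting which cluster count wins. The sketch below is only an illustration: `tally_best_k` and `dtw_silhouette` are hypothetical helper names, and it assumes the run-to-run variation comes from the random centroid initialisation of `TimeSeriesKMeans`, which its `random_state` parameter controls.

```python
from collections import Counter


def tally_best_k(score_fn, ks, seeds):
    """Count, over several seeds, which k maximises score_fn(k, seed)."""
    wins = Counter()
    for seed in seeds:
        scores = {k: score_fn(k, seed) for k in ks}
        # ties go to the first (smallest) k in iteration order
        wins[max(scores, key=scores.get)] += 1
    return wins


def dtw_silhouette(ts, k, seed):
    """Silhouette score of one seeded DTW k-means fit (requires tslearn)."""
    # imported lazily so tally_best_k stays usable without tslearn installed
    from tslearn.clustering import TimeSeriesKMeans, silhouette_score

    km = TimeSeriesKMeans(n_clusters=k, metric="dtw", random_state=seed)
    labels = km.fit_predict(ts)
    return silhouette_score(ts, labels, metric="dtw")
```

With the list `ts` from the code above, `tally_best_k(lambda k, s: dtw_silhouette(ts, k, s), range(2, 10), range(11))` would return a `Counter` mapping each winning cluster count to the number of seeds for which it had the highest score, which is the same tally as the 11 manual runs but repeatable.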

Is there an error in my code or methodology, or is this a common problem with the silhouette score?

Thanks in advance
