TransWikia.com

Question about graphing the clusters in K means

Data Science Asked on May 13, 2021

I’ve used k-means to cluster my data. Before running k-means, I standardized the data with StandardScaler. Now I’m wondering how I can show the clusters on the original data. Scikit-learn gives me the labels for the standardized data, but I want to have the labels on the original data and plot the clusters in the original units.

3 Answers

Option 1:

Keep the original data and access it by index (the k-means labels are in the same row order as the input), then recompute the cluster means on the original data.
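A minimal sketch of Option 1, assuming NumPy and scikit-learn; the toy data and the variable names `original`, `scaled`, and `labels` are made up for illustration. The point is that `labels[i]` refers to row `i` of the unscaled array as well, so the cluster means can be recomputed directly in the original units:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two groups, two features on very different scales
rng = np.random.default_rng(0)
original = np.vstack([
    rng.normal([10.0, 1000.0], [1.0, 50.0], size=(50, 2)),
    rng.normal([20.0, 3000.0], [1.0, 50.0], size=(50, 2)),
])

# Cluster on a standardized copy; `original` itself is never modified
scaled = StandardScaler().fit_transform(original)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# labels[i] belongs to row i of both `scaled` and `original`,
# so boolean indexing recovers each cluster in the original units
original_means = np.array([original[labels == k].mean(axis=0) for k in range(2)])
print(original_means)
```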

Option 2:

Apply the inverse transformation. StandardScaler is a linear (affine) transformation, so it is reversible up to floating-point precision.
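A sketch of Option 2 applied to the centroids, again on hypothetical toy data. Because the standardization is affine and, at convergence, each k-means centroid is the mean of its cluster's scaled points, inverse-transforming `cluster_centers_` should give the same values as averaging the original points per cluster, apart from floating-point noise:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two well-separated groups in original units
rng = np.random.default_rng(1)
original = np.vstack([
    rng.normal([5.0, 200.0], [0.5, 10.0], size=(40, 2)),
    rng.normal([9.0, 600.0], [0.5, 10.0], size=(40, 2)),
])

scaler = StandardScaler().fit(original)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(original))

# Undo the scaling on the centroids: x = z * scale_ + mean_
centers_original = scaler.inverse_transform(km.cluster_centers_)

# Compare with the cluster means recomputed on the original data (Option 1)
recomputed = np.array([original[km.labels_ == k].mean(axis=0) for k in range(2)])
print(np.allclose(centers_original, recomputed))
```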

Answered by Has QUIT--Anony-Mousse on May 13, 2021

StandardScaler subtracts the mean from each variable and then divides by that variable's standard deviation. It's a common preprocessing step, especially for k-means, because the algorithm relies on Euclidean distances and therefore depends heavily on the scale of each variable.
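For illustration, the transformation can be reproduced by hand with NumPy. This assumes StandardScaler's defaults (`with_mean=True`, `with_std=True`); note that it uses the population standard deviation (`ddof=0`), which is also NumPy's default. The small matrix is made up; its two columns have very different ranges but end up with identical standardized values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second column is on a much larger scale than the first
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

scaler = StandardScaler().fit(X)

# z = (x - mean) / std, column by column, with the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaler.transform(X), manual))  # True
print(scaler.transform(X))
```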

If I understand correctly, you want to visualize the original data and color it using the labels from k-means. You could either add the labels to the original data (assuming the order of the records did not change):

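# concatenate takes a tuple of arrays, and the 1-D labels must be reshaped into a column
original_with_label = numpy.concatenate((original, labels.reshape(-1, 1)), axis=1)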
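A sketch of the full plotting workflow, assuming matplotlib for the figure; the three synthetic blobs and the output filename `clusters_original.png` are placeholders. The scatter plot uses the original (unscaled) coordinates and only borrows the colors from the k-means labels:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three blobs in original units
rng = np.random.default_rng(2)
original = np.vstack([
    rng.normal([1.0, 50.0], [0.2, 5.0], size=(30, 2)),
    rng.normal([3.0, 90.0], [0.2, 5.0], size=(30, 2)),
    rng.normal([5.0, 50.0], [0.2, 5.0], size=(30, 2)),
])

# Fit on the standardized data, but keep the labels for plotting the original
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(original)
)

# Append the labels as a last column; row order is unchanged
original_with_label = np.column_stack((original, labels))

fig, ax = plt.subplots()
ax.scatter(original[:, 0], original[:, 1], c=labels, cmap="viridis")
ax.set_xlabel("feature 0 (original units)")
ax.set_ylabel("feature 1 (original units)")
fig.savefig("clusters_original.png")
```

`numpy.column_stack` is used here because it accepts the 1-D label array directly, without a reshape.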

Or you could transform the data back to its original scale:

transformed_back_to_original = scalar_fit.inverse_transform(transformed_data)
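A quick check of the round trip, reusing the answer's variable names `scalar_fit` and `transformed_data` on a made-up array. It also spells out what `inverse_transform` computes: multiply by the fitted `scale_` and add back `mean_`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up original data
original = np.array([[1.0, 100.0], [2.0, 300.0], [4.0, 800.0]])

scalar_fit = StandardScaler().fit(original)
transformed_data = scalar_fit.transform(original)

transformed_back_to_original = scalar_fit.inverse_transform(transformed_data)
# The same inverse done by hand: x = z * scale_ + mean_
manual = transformed_data * scalar_fit.scale_ + scalar_fit.mean_

print(np.allclose(transformed_back_to_original, original), np.allclose(manual, original))  # True True
```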

Answered by Pieter on May 13, 2021

I think this is a really good tutorial for you to consider.

Towards the end, the author shows you how to map the index back to the cluster IDs.

# pair each row name in the index of `returns` with its cluster ID from `idx`
details = [(name, cluster) for name, cluster in zip(returns.index, idx)]

for detail in details:
    print(detail)
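The same idea with pandas, assuming the data lives in a DataFrame whose index holds the row names. The `returns` frame below is a hypothetical stand-in for the tutorial's data (tickers and numbers are invented), and `idx` holds the k-means labels as in the snippet above. Attaching the labels as a column of the unscaled frame keeps the names, the cluster IDs, and the original units together:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the tutorial's `returns` DataFrame: one row per ticker
returns = pd.DataFrame(
    {"mean_return": [0.010, 0.012, 0.011, 0.050, 0.052, 0.049],
     "volatility": [0.20, 0.21, 0.19, 0.60, 0.62, 0.58]},
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

# Cluster on the standardized values only
idx = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(returns)
)

# Row order is preserved, so the labels line up with the original index
clustered = returns.assign(cluster=idx)
print(clustered)

# Per-cluster means in the original units
print(clustered.groupby("cluster").mean())
```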

Answered by ASH on May 13, 2021
