Question about graphing the clusters in K means

Question

I've used k-means to cluster my data. Before clustering, I standardized the data with StandardScaler. Now I'm wondering how I can show the clusters on the original data. Scikit-learn gives me labels for the standardized data, but I want labels for the original data so that I can plot the clusters in the original units.

Has QUIT--Anony-Mousse · Answer

Option 1:

Keep the original data and access it by index (the labels come back in the same row order), then recompute the cluster means on the original values.

Option 2:

Apply the inverse transformation. StandardScaler is a linear (affine) transformation, so it is reversible up to some floating-point loss of precision.
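Both options can be sketched with plain NumPy. This is a minimal illustration, not a definitive recipe: the array `original` and the `labels` are made-up values standing in for your data and for what k-means would return, and the standardization is done by hand with the same arithmetic StandardScaler uses (per-feature mean and population standard deviation).

```python
import numpy as np

# Hypothetical original data: 6 rows, 2 features on very different scales.
original = np.array([
    [1.0, 100.0],
    [2.0, 110.0],
    [1.5, 105.0],
    [8.0, 900.0],
    [9.0, 950.0],
    [8.5, 925.0],
])

# Standardize by hand: subtract the per-feature mean, divide by the
# per-feature population standard deviation (ddof=0).
mean = original.mean(axis=0)
std = original.std(axis=0)
scaled = (original - mean) / std

# Hypothetical labels, in the same row order as `scaled` (and `original`).
labels = np.array([0, 0, 0, 1, 1, 1])
cluster_ids = np.unique(labels)

# Option 1: index the original rows by label and recompute the means.
centers_from_original = np.array(
    [original[labels == k].mean(axis=0) for k in cluster_ids]
)

# Option 2: take the means in scaled space and undo the scaling.
centers_scaled = np.array(
    [scaled[labels == k].mean(axis=0) for k in cluster_ids]
)
centers_inverse = centers_scaled * std + mean

print(centers_from_original)
print(centers_inverse)
```

Because the scaling is applied per feature with a fixed mean and standard deviation, the two sets of centers agree up to rounding, so either option gives the same picture.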

Pieter · Answer

StandardScaler subtracts the mean from each variable and then divides by its standard deviation. It's a common preprocessing step, especially for k-means, because the algorithm depends heavily on the scale of each feature.
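To see why the scale matters, here is a small sketch with three made-up points (age in years, income in dollars). The numbers are purely illustrative. On the raw values, income dominates the Euclidean distance and A is closer to B; after standardizing each column, A is closer to C.

```python
import numpy as np

# Hypothetical data: columns are (age in years, income in dollars).
points = np.array([
    [25.0, 50000.0],  # A
    [60.0, 50500.0],  # B
    [26.0, 50600.0],  # C
])

def distance(p, q):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(p - q))

# Distances on the raw values: income differences swamp age differences.
raw_ab = distance(points[0], points[1])
raw_ac = distance(points[0], points[2])

# Standardize each column (mean 0, population standard deviation 1).
scaled = (points - points.mean(axis=0)) / points.std(axis=0)

# Distances after standardizing: both features now contribute comparably.
scaled_ab = distance(scaled[0], scaled[1])
scaled_ac = distance(scaled[0], scaled[2])

print("raw:    A-B", raw_ab, " A-C", raw_ac)
print("scaled: A-B", scaled_ab, " A-C", scaled_ac)
```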

If I understand correctly, you want to visualize the original data using the labels from k-means. You could either add the labels to the original data (assuming the order of the records did not change):

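# concatenate takes a tuple of arrays; labels is 1-D, so reshape it into a column first
original_with_label = numpy.concatenate((original, labels.reshape(-1, 1)), axis=1)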

Or you could transform the data back to its original scale:

transformed_back_to_original = scalar_fit.inverse_transform(transformed_data)
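Putting these two suggestions together, an end-to-end version might look like the sketch below. The data is randomly generated for illustration, and the names `scaler`, `kmeans` and `plot_clusters` are hypothetical. It assumes scikit-learn and NumPy are installed; matplotlib is only imported if you call the plotting helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical original data: two well-separated groups, 20 rows each.
rng = np.random.default_rng(0)
original = np.vstack([
    rng.normal(loc=[1.0, 100.0], scale=[0.2, 5.0], size=(20, 2)),
    rng.normal(loc=[9.0, 900.0], scale=[0.2, 5.0], size=(20, 2)),
])

# Standardize, then cluster the standardized data.
scaler = StandardScaler()
transformed_data = scaler.fit_transform(original)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(transformed_data)

# The labels are in the same row order as the input, so they apply
# directly to the original rows.
original_with_label = np.concatenate(
    (original, labels.reshape(-1, 1)), axis=1
)

# Alternatively, undo the scaling to recover the original values.
transformed_back_to_original = scaler.inverse_transform(transformed_data)

def plot_clusters(data, cluster_labels):
    """Scatter the first two columns of `data`, colored by cluster label."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.scatter(data[:, 0], data[:, 1], c=cluster_labels)
    plt.xlabel("feature 0 (original units)")
    plt.ylabel("feature 1 (original units)")
    plt.show()
```

Calling `plot_clusters(original, labels)` should then draw the clusters in the original units, with one color per label.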

ASH · Answer

I think this is a really good tutorial for you to consider.
Towards the end, the author shows you how to map the index back to the cluster IDs.
details = [(name, cluster) for name, cluster in zip(returns.index, idx)]

for detail in details:
    print(detail)
