TransWikia.com

PCA for dimensionality reduction with simultaneous clustering

Data Science Asked on October 2, 2021

Let’s say I have a set of 3D points that lie more or less on a plane embedded in 3D space. Then I can use PCA to ‘compress’ these 3D points to 2D coordinates on that plane, such that they still approximate the original data well.
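To make the single-plane case concrete, here is a minimal sketch using NumPy. The data is synthetic and the names (`basis`, `points`, `components`) are made up for illustration; PCA is done through an SVD of the centered data rather than through any particular library's PCA class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points close to a random plane through the origin.
basis = np.linalg.qr(rng.normal(size=(3, 2)))[0]    # orthonormal 3x2 basis of the plane
coords = rng.normal(size=(200, 2))                  # true 2D coordinates on the plane
points = coords @ basis.T + 0.01 * rng.normal(size=(200, 3))

# PCA via SVD of the centered data; rows of vt are the principal directions.
mean = points.mean(axis=0)
_, _, vt = np.linalg.svd(points - mean, full_matrices=False)
components = vt[:2]                                 # keep the top two directions (2x3)

# Compress to 2D, then map back to 3D to measure what was lost.
compressed = (points - mean) @ components.T         # shape (200, 2)
reconstructed = compressed @ components + mean
rms_error = np.sqrt(np.mean(np.sum((points - reconstructed) ** 2, axis=1)))
print(rms_error)  # should be on the order of the noise level
```

The reconstruction error stays small here only because all points share one plane, which is exactly the assumption that breaks in the next step.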

Now let’s say half of the 3D points don’t lie close to that plane, but instead close to some other plane.

If I just do PCA and reduce to 2 dimensions, I won’t get a good approximation.

If, however, the algorithm could ‘see’ that some of the 3D points compress well onto one plane and others compress well onto another, it could label each point and do PCA separately for each set. Each point would then be stored as 2 coordinates plus one bit that says which set it belongs to, and this would approximate the original data much better.

What’s the name for a PCA-like algorithm that can also split the input data into at most N sets (probably with some penalty on the number of sets), such that dimensionality reduction on each set separately fits much better than reducing all data points together?

// Edit:
Adding an example. If one clustered only by distance in the high-dimensional space, one would arrive at the bad clustering, which has more clusters, each with a higher error when projected down.

The good example uses fewer clusters, and they project better onto their 2-dimensional subspaces (the green cluster can even be compressed to a 1D space).

[Figure: the bad clustering and the good clustering of the same points]

One Answer

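What you describe is known as subspace clustering: grouping points that lie near a union of low-dimensional subspaces and fitting one subspace per group.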
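As a rough illustration, here is a minimal sketch of one simple member of that family, usually called K-subspaces: alternate between fitting a PCA subspace to each cluster and reassigning each point to the subspace that reconstructs it best. It is not the only subspace clustering method and not necessarily the one meant above. The function names, the synthetic two-plane data, and the fixed `n_clusters` and `dim` are assumptions made for this example.

```python
import numpy as np

def fit_subspace(points, dim):
    """PCA fit: return the mean and the top `dim` principal directions."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:dim]

def residuals(points, mean, components):
    """Squared distance from each point to the affine subspace."""
    centered = points - mean
    projected = centered @ components.T @ components
    return np.sum((centered - projected) ** 2, axis=1)

def k_subspaces(points, n_clusters, dim, n_iter=50, n_restarts=10, seed=0):
    """Alternate per-cluster PCA fits and reassignment; keep the best restart."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_restarts):
        labels = rng.integers(n_clusters, size=len(points))  # random start
        for _ in range(n_iter):
            dists = np.full((len(points), n_clusters), np.inf)
            for k in range(n_clusters):
                members = points[labels == k]
                if len(members) > dim:  # skip clusters too small to fit
                    dists[:, k] = residuals(points, *fit_subspace(members, dim))
            new_labels = dists.argmin(axis=1)
            converged = np.array_equal(new_labels, labels)
            labels = new_labels
            if converged:
                break
        cost = dists.min(axis=1).sum()  # total squared reconstruction error
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

# Hypothetical data: 200 noisy points near the plane z = 0, 200 near x = 0.
rng = np.random.default_rng(1)
n = 200
plane_a = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.zeros(n)])
plane_b = np.column_stack([np.zeros(n), rng.normal(size=n), rng.normal(size=n)])
points = np.vstack([plane_a, plane_b]) + 0.01 * rng.normal(size=(2 * n, 3))

labels, cost = k_subspaces(points, n_clusters=2, dim=2)
single_cost = residuals(points, *fit_subspace(points, 2)).sum()
print(cost, single_cost)  # two fitted planes vs. one global PCA plane
```

This sketch fixes both the number of clusters and the subspace dimension in advance, and it can get stuck in local minima, which is why it uses random restarts. Choosing the number of sets with a penalty, or letting each cluster have its own dimension as in the question's 1D example, would need a model-selection step on top.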

Answered by Graph4Me Consultant on October 2, 2021
