Calculating eigenvalues from principal components and deciding on the number of principal components?

Question

I calculated PCs for my samples. The data frame below has samples as rows and PCs as columns. Is the following a valid approach for deciding how many PCs to keep for my regression analysis?

> head(a)
       PC1      PC2        PC3       PC4      PC5        PC6       PC7
1 -13.0692 3.825460 -2.8089500 -0.120865 -9.53690  2.2582600  0.975514
2 -13.0419 4.076040 -2.3597900  2.326170 -0.73101 -1.5689400  1.642810
3  -9.5570 4.270540 -0.9153700 -0.160893 -2.27807 -1.0854500 -0.551797
4 -11.4407 0.716765 -0.0932982 -1.229210  2.56851 -0.0708945  2.841000
5 -15.0062 6.971110 -2.9324700 -3.033660 -3.73211  1.8029200  0.712720
6 -13.8156 1.667130 -1.2647800  3.929120  4.12255  0.2541560  1.119040
    PC8      PC9      PC10
1 -2.220460  1.15324  3.677270
2 -2.552010 -2.57720  0.111892
3  0.360637  0.30142 -1.288880
4  1.391550 -5.13552 -1.975630
5  1.937330 -1.83419 -1.462170
6 -0.637011 -3.15796 -1.238350
...

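# Covariance matrix of the PC scores
a.cov <- cov(a)
# Eigendecomposition; the eigenvalues are the variances along each component
a.eigen <- eigen(a.cov)
# Proportion of variance explained (PVE) by each component
PVE <- a.eigen$values / sum(a.eigen$values)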

> PVE
  [1] 0.49967626 0.22981763 0.07138644 0.04307668 0.03680999 0.02830493
  [7] 0.02526709 0.02384502 0.02135397 0.02046199

So it seems that the first 4 PCs explain about 84% of the variance. Is this a valid way to decide how many PCs to keep?

eigenvalues pca r

phil · Answer

Yes, this is typically a good way to choose how many principal components to include in your model.
It can also help to visualize the eigenvalues: plot them from highest to lowest and look for the point where the curve flattens out, since the eigenvalues after that point add little to the information content.
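
A minimal sketch of that plot in base R, using the PVE values printed in the question. The 0.80 cutoff and the names `cum_PVE`, `threshold`, and `k` are illustrative assumptions, not a recommended rule. Also note that these proportions are relative to the 10 PCs in the data frame; if the original PCA produced more components, each PC's share of the total variance would be smaller.

```r
# Proportion of variance explained (PVE), copied from the question's output
PVE <- c(0.49967626, 0.22981763, 0.07138644, 0.04307668, 0.03680999,
         0.02830493, 0.02526709, 0.02384502, 0.02135397, 0.02046199)

# Cumulative PVE: share of variance captured by the first k PCs together
cum_PVE <- cumsum(PVE)

# Smallest number of PCs whose cumulative PVE reaches the cutoff
# (0.80 is an arbitrary illustrative threshold, not a recommendation)
threshold <- 0.80
k <- which(cum_PVE >= threshold)[1]

# Scree plot (left) and cumulative PVE (right), side by side
op <- par(mfrow = c(1, 2))
plot(PVE, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained",
     main = "Scree plot")
plot(cum_PVE, type = "b", ylim = c(0, 1),
     xlab = "Principal component",
     ylab = "Cumulative proportion",
     main = "Cumulative PVE")
abline(h = threshold, lty = 2)  # dashed line marks the chosen cutoff
par(op)                          # restore the previous plotting layout
```

In the scree plot, look for the "elbow" where the curve levels off; in the cumulative plot, the dashed line shows how many PCs are needed to reach the chosen cutoff. With these values the first four PCs together account for about 84% of the variance among the 10 PCs.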
