# Calculating eigenvalues from principal components and deciding on the number of principal components?

Cross Validated Asked on January 7, 2022

I calculated principal components for my samples; the data frame shown below has samples as rows and PCs as columns. Is the approach below a valid way to decide how many PCs to keep for my regression analysis?

```
> head(a)
       PC1      PC2        PC3       PC4      PC5        PC6       PC7
1 -13.0692 3.825460 -2.8089500 -0.120865 -9.53690  2.2582600  0.975514
2 -13.0419 4.076040 -2.3597900  2.326170 -0.73101 -1.5689400  1.642810
3  -9.5570 4.270540 -0.9153700 -0.160893 -2.27807 -1.0854500 -0.551797
4 -11.4407 0.716765 -0.0932982 -1.229210  2.56851 -0.0708945  2.841000
5 -15.0062 6.971110 -2.9324700 -3.033660 -3.73211  1.8029200  0.712720
6 -13.8156 1.667130 -1.2647800  3.929120  4.12255  0.2541560  1.119040
        PC8      PC9      PC10
1 -2.220460  1.15324  3.677270
2 -2.552010 -2.57720  0.111892
3  0.360637  0.30142 -1.288880
4  1.391550 -5.13552 -1.975630
5  1.937330 -1.83419 -1.462170
6 -0.637011 -3.15796 -1.238350
...
```

```
a.cov <- cov(a)
a.eigen <- eigen(a.cov)
PVE <- a.eigen$values / sum(a.eigen$values)

> PVE
 [1] 0.49967626 0.22981763 0.07138644 0.04307668 0.03680999 0.02830493
 [7] 0.02526709 0.02384502 0.02135397 0.02046199
```


So it seems that the first 4 PCs explain about 84% of my variance. Is this a valid way to decide on the number of PCs to keep?

Yes, this is a standard way to select how many principal components to include in your model.
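It may help to see why the approach works: the columns of a PC-score matrix are uncorrelated, so `cov(a)` is already (near-)diagonal and `eigen()` simply recovers each PC's variance; the PVE of PC *i* is then just var(PC*i*) divided by the total variance. A minimal sketch (in Python/NumPy with simulated data, since only the linear algebra matters; `X`, `scores`, and the sizes are illustrative):

```python
import numpy as np

# Simulated correlated data standing in for the original variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

# PC scores via eigendecomposition of the covariance matrix of X
cov_X = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov_X)
order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
scores = (X - X.mean(axis=0)) @ eigvecs[:, order]  # analogue of the data frame `a`

# The asker's route: eigenvalues of cov(scores), normalised to proportions
pve = np.linalg.eigvalsh(np.cov(scores, rowvar=False))[::-1]
pve = pve / pve.sum()

# Shortcut: PC scores are uncorrelated, so cov(scores) is (near-)diagonal
# and its eigenvalues are just the per-column variances of the scores
var_pve = scores.var(axis=0, ddof=1)
var_pve = var_pve / var_pve.sum()

print(np.allclose(pve, var_pve))  # the two routes agree
```

So running `eigen(cov(a))` on the scores is valid, but equivalent to taking the column variances of `a` directly.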

It can also help to visualize the eigenvalues. Plot them from largest to smallest (a scree plot) and look for the point where the curve flattens out, since eigenvalues beyond that point contribute little additional information.
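With the PVE values from the question, both rules can be checked numerically. A short sketch (Python/NumPy; the 85% threshold and the elbow heuristic below are illustrative choices, not fixed rules):

```python
import numpy as np

# PVE values from the question
pve = np.array([0.49967626, 0.22981763, 0.07138644, 0.04307668, 0.03680999,
                0.02830493, 0.02526709, 0.02384502, 0.02135397, 0.02046199])

cum_pve = np.cumsum(pve)
# Smallest number of components whose cumulative PVE reaches 85%
k85 = int(np.searchsorted(cum_pve, 0.85) + 1)

# A simple elbow heuristic: stop where the drop between successive
# PVE values becomes small relative to the first drop
drops = -np.diff(pve)
elbow = int(np.argmax(drops < 0.05 * drops[0]) + 1)

print(k85, elbow)  # 5 4
```

Note that 4 PCs give a cumulative PVE of about 84.4%, so strictly reaching 85% takes 5; the elbow heuristic here points to 4. Both answers are defensible, which is why the cutoff should be treated as a judgment call rather than a hard rule.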

Answered by phil on January 7, 2022
