# Calculating eigenvalues from principal components and deciding on the number of principal components

Cross Validated Asked on January 7, 2022

I calculated PCs for my samples; the data frame shown here has samples as rows and PCs as columns. Is the following a valid approach for deciding how many PCs to keep for my regression analysis?

```
> head(a)
       PC1      PC2        PC3       PC4      PC5        PC6       PC7
1 -13.0692 3.825460 -2.8089500 -0.120865 -9.53690  2.2582600  0.975514
2 -13.0419 4.076040 -2.3597900  2.326170 -0.73101 -1.5689400  1.642810
3  -9.5570 4.270540 -0.9153700 -0.160893 -2.27807 -1.0854500 -0.551797
4 -11.4407 0.716765 -0.0932982 -1.229210  2.56851 -0.0708945  2.841000
5 -15.0062 6.971110 -2.9324700 -3.033660 -3.73211  1.8029200  0.712720
6 -13.8156 1.667130 -1.2647800  3.929120  4.12255  0.2541560  1.119040
        PC8      PC9      PC10
1 -2.220460  1.15324  3.677270
2 -2.552010 -2.57720  0.111892
3  0.360637  0.30142 -1.288880
4  1.391550 -5.13552 -1.975630
5  1.937330 -1.83419 -1.462170
6 -0.637011 -3.15796 -1.238350
...
```

```
a.cov <- cov(a)
a.eigen <- eigen(a.cov)
PVE <- a.eigen$values / sum(a.eigen$values)

> PVE
 [1] 0.49967626 0.22981763 0.07138644 0.04307668 0.03680999 0.02830493
 [7] 0.02526709 0.02384502 0.02135397 0.02046199
```


So it seems that the first 4 PCs explain about 84% of the variance. Is this a valid way to decide on the number of PCs to keep?

Yes, typically this is a good way to select how many principal components to include in your model.
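As a rough sketch of that check, the cumulative proportion can be computed from the `PVE` vector printed in the question. The 0.80 cutoff and the names `cum_pve`, `threshold`, and `k` are assumptions made for illustration; the question does not fix a threshold.

```r
# PVE values copied from the output in the question
PVE <- c(0.49967626, 0.22981763, 0.07138644, 0.04307668, 0.03680999,
         0.02830493, 0.02526709, 0.02384502, 0.02135397, 0.02046199)

# Running total of variance explained by the first k PCs
cum_pve <- cumsum(PVE)
print(round(cum_pve, 3))

# Smallest k whose cumulative share reaches the threshold
# (0.80 is an arbitrary illustrative cutoff, not a rule)
threshold <- 0.80
k <- which(cum_pve >= threshold)[1]
print(k)
```

Note that these proportions are relative to the 10 PCs contained in `a`. If the original PCA produced more components than were kept in the data frame, the shares of the total variance would be smaller.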

It could also help to visualize the eigenvalues with a scree plot: plot them from highest to lowest and look for the point where the curve flattens out, since the eigenvalues after that point add little to the information content.
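A minimal sketch of such a plot in base R, reusing the `PVE` values from the question (the same call works on `a.eigen$values`, which `eigen()` already returns in decreasing order for a symmetric matrix). The `drops` vector is a hypothetical helper added here to show how much each successive component falls off; where to read the "elbow" remains a judgment call.

```r
# PVE values copied from the output in the question
PVE <- c(0.49967626, 0.22981763, 0.07138644, 0.04307668, 0.03680999,
         0.02830493, 0.02526709, 0.02384502, 0.02135397, 0.02046199)

# Scree plot: proportion of variance explained against component index
plot(seq_along(PVE), PVE, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained",
     main = "Scree plot")

# Size of the drop from each component to the next;
# small, similar drops indicate that the curve has flattened
drops <- -diff(PVE)
print(round(drops, 3))
```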

Answered by phil on January 7, 2022
