# Different formulations of within-class scatter matrix

Cross Validated Asked on December 15, 2020

Suppose we have a dataset $$X = \{x_1, x_2, \ldots, x_n\}$$ whose points lie in a $$d$$-dimensional feature space, and two classes $$c_1$$ and $$c_2$$: $$n_1$$ points of $$X$$ belong to class $$c_1$$ and the remaining $$n_2$$ points belong to class $$c_2$$, so $$n_1 + n_2 = n$$. Each point is projected to $$y_i = v^T x_i$$ for some vector $$v$$, and $$y_i$$ inherits the class label of $$x_i$$, so $$n_1$$ of the projected points belong to $$c_1$$ and the rest belong to $$c_2$$.
$$m_1$$ is the mean vector of class $$c_1$$ and $$m_2$$ is the mean vector of class $$c_2$$. $$S_1$$ and $$S_2$$ are the covariance matrices corresponding to classes $$c_1$$ and $$c_2$$.
Now, in the projected space, $$y_i = v^T x_i$$ for all $$i = 1, 2, \ldots, n$$. In this space, $$\mu_1$$ is the mean vector of class $$c_1$$ and $$\mu_2$$ is the mean vector of class $$c_2$$. $$s_1$$ and $$s_2$$ are the covariance matrices corresponding to classes $$c_1$$ and $$c_2$$.

I have to derive three things:

1) The within-class scatter is $$(\mu_1 - \mu_2)^2 + \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$.
2) The within-class scatter can also be written as $$\frac{1}{n_1 n_2} \sum_{y_i \in \text{class } c_1} \sum_{y_j \in \text{class } c_2} (y_i - y_j)^2$$.
   (Here, $$y_i \in \text{class } c_1$$ means $$y_i = v^T x_i$$ where the class label of $$x_i$$ is $$c_1$$, and $$y_j \in \text{class } c_2$$ means $$y_j = v^T x_j$$ where the class label of $$x_j$$ is $$c_2$$.)
3) The total scatter is $$\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$.
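
As a quick sanity check, expressions 1) and 2) can be compared numerically. This is only a sketch on made-up data: the samples `y1` and `y2` are hypothetical, and it assumes that $$s_k^2$$ denotes the sum of squared deviations of the projected class $$c_k$$ (a scatter rather than a variance), which the statement above does not fix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical projected samples y_i for the two classes (illustrative data only).
y1 = rng.normal(loc=2.0, scale=1.0, size=40)   # class c_1, n_1 = 40
y2 = rng.normal(loc=-1.0, scale=1.5, size=25)  # class c_2, n_2 = 25
n1, n2 = y1.size, y2.size

mu1, mu2 = y1.mean(), y2.mean()
# ASSUMPTION: s_k^2 is the sum of squared deviations (scatter), not a variance.
s1_sq = np.sum((y1 - mu1) ** 2)
s2_sq = np.sum((y2 - mu2) ** 2)

# Expression 1: squared mean difference plus the scaled scatters.
expr1 = (mu1 - mu2) ** 2 + s1_sq / n1 + s2_sq / n2
# Expression 2: average squared difference over all cross-class pairs,
# built with broadcasting into an (n1, n2) matrix of differences.
expr2 = np.sum((y1[:, None] - y2[None, :]) ** 2) / (n1 * n2)

print(np.isclose(expr1, expr2))
```

Under a different normalization of $$s_k^2$$ (for example a variance), the two numbers would generally not match, which is one way to find out which definition the exercise intends.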

According to the Fisher Linear Discriminant:
A) within-class scatter ($$S_w$$) = $$\sum_{x_i \in c_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in c_2} (x_i - m_2)(x_i - m_2)^T$$
B) $$\mu_1 = v^T m_1$$ and $$\mu_2 = v^T m_2$$
C) $$(n_1 s_1)^2 = v^T (n_1 S_1) v$$ and $$(n_2 s_2)^2 = v^T (n_2 S_2) v$$, where $$n_1 S_1 + n_2 S_2 = S_w$$
D) $$v = S_w^{-1} (m_1 - m_2)$$
E) $$S_1 = \sum_{x_i \in c_1} (x_i - m_1)(x_i - m_1)^T$$ and $$S_2 = \sum_{x_i \in c_2} (x_i - m_2)(x_i - m_2)^T$$
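
The quantities in A), B), D) and E) can be computed directly with NumPy. This is a minimal sketch on hypothetical random data (`X1`, `X2` and the helper `scatter` are illustrative names, not part of the question). It takes $$S_w = S_1 + S_2$$ as in A) and E), rather than the scaling written in C).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
# Hypothetical d-dimensional samples for the two classes (illustrative only).
X1 = rng.normal(size=(30, d)) + np.array([2.0, 0.0, 1.0])  # class c_1
X2 = rng.normal(size=(20, d))                               # class c_2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)  # class mean vectors

def scatter(X, m):
    """Sum of outer products (x_i - m)(x_i - m)^T over the rows of X, as in (E)."""
    D = X - m
    return D.T @ D

S1, S2 = scatter(X1, m1), scatter(X2, m2)
Sw = S1 + S2  # within-class scatter, as in (A) combined with (E)

# (D): v = Sw^{-1} (m1 - m2), solved without forming the inverse explicitly.
v = np.linalg.solve(Sw, m1 - m2)

# Projected samples and their means.
y1, y2 = X1 @ v, X2 @ v
mu1, mu2 = y1.mean(), y2.mean()

# (B): the projected means equal the projections of the class means.
print(np.isclose(mu1, v @ m1), np.isclose(mu2, v @ m2))
# The projected scatter of class c_1 equals the quadratic form v^T S_1 v.
print(np.isclose(np.sum((y1 - mu1) ** 2), v @ S1 @ v))
```

`np.linalg.solve` is used in place of an explicit inverse because it is generally the more numerically stable way to apply $$S_w^{-1}$$ to a vector.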
Now, for 1):
$$(\mu_1 - \mu_2)^2 + \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = (v^T m_1 - v^T m_2)^2 + \frac{v^T S_1 v}{n_1^2} + \frac{v^T S_2 v}{n_2^2}$$
Now, how do I introduce $$x_i$$ here to get $$S_w$$?
I have tried manipulating these expressions but could not reach the result.
Can anyone please give a hint on how to obtain these derivations? Any help would be appreciated.
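
One possible route, sketched under the assumption that the spread of the projected class $$c_k$$ is measured by its sum of squared deviations (the normalization of $$s_k^2$$ is not pinned down above): substituting $$y_i = v^T x_i$$ and $$\mu_k = v^T m_k$$ brings $$x_i$$ back in,

$$
\sum_{x_i \in c_k} (y_i - \mu_k)^2 = \sum_{x_i \in c_k} v^T (x_i - m_k)(x_i - m_k)^T v = v^T S_k v .
$$

For 2), write $$y_i - y_j = (y_i - \mu_1) - (y_j - \mu_2) + (\mu_1 - \mu_2)$$, square, and sum over all pairs. The cross terms vanish because $$\sum_{y_i \in c_1} (y_i - \mu_1) = 0$$ and $$\sum_{y_j \in c_2} (y_j - \mu_2) = 0$$, which leaves

$$
\frac{1}{n_1 n_2} \sum_{y_i \in c_1} \sum_{y_j \in c_2} (y_i - y_j)^2 = (\mu_1 - \mu_2)^2 + \frac{1}{n_1} \sum_{y_i \in c_1} (y_i - \mu_1)^2 + \frac{1}{n_2} \sum_{y_j \in c_2} (y_j - \mu_2)^2 .
$$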
