TransWikia.com

create and use weights in Python to perform weighted correlation and PCA

Data Science Asked by Zðkaria Bóüsserghine on September 26, 2020

I need help with the principal components’ analysis code below in 3 ways:

Write Code to create RIM (RAKE) weighting.
I’m trying to ensure that using the code below, I get exactly the same results than in SPSS.
If you change the weights below to 1, the two codes needs to yield the same results (weighting with 1 is the same than no weighting). However the two codes below do not give the same results if you change the weighing to 1 !
Here are the python codes I wrote:

Without weighting

import numpy as np
import pandas as pd
import math
import statistics
from sklearn.decomposition 
from sklearn.preprocessing import StandardScaler

d = {
'gender':[0,0,1,1,1,1,1,1],
'v1':[17,18,11,16,11,18,15,19],
'v2':[2.54,2.71,2.07,2.50,1.96,2.17,1.63,2.36],
'v3':[2.02,2.47,2.12,2.53,1.50,2.39,2.45,2.02],
'v4':[2.34,2.08,1.49,2.05,1.70,2.42,1.68,2.21]
    }
df = pd.DataFrame(data=d)

x = StandardScaler().fit_transform(df.loc[:, ['v1','v2','v3','v4']])
cor_mat1 = np.corrcoef(x.T)
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
eig_pairs.sort(key=lambda x: x[0], reverse=True)
FAC1_1 = x.dot(eig_pairs[0][1].reshape(4,1))

print(FAC1_1)   


**With weighting**

import numpy as np
import pandas as pd
import math
import statistics
from sklearn.preprocessing import StandardScaler

def m(x, w):
    return np.sum(x * w) / np.sum(w)
def cov(x,y, w):
    return np.sum(w * (x - m(x, w)) * (y - m(y, w))) / np.sum(w)
def corr(df, w):
    dt = pd.DataFrame(np.nan, index=df.columns, columns=df.columns)    
    for i in range(df.shape[1]-1):
        dt.iloc[i , i]=1
        for j in range(1,df.shape[1]):
            x=cov(df.iloc[: , i], df.iloc[: , j], w) / np.sqrt(cov(df.iloc[: , i],df.iloc[: , i], w) * cov(df.iloc[: , j],df.iloc[: , j], w))
            dt.iloc[i , j]=x
            dt.iloc[j , i]=x
    dt.iloc[df.shape[1]-1 , df.shape[1]-1]=1    
    return dt

d = {
'gender':[0,0,1,1,1,1,1,1],
'v1':[17,18,11,16,11,18,15,19],
'v2':[2.54,2.71,2.07,2.50,1.96,2.17,1.63,2.36],
'v3':[2.02,2.47,2.12,2.53,1.50,2.39,2.45,2.02],
'v4':[2.34,2.08,1.49,2.05,1.70,2.42,1.68,2.21],
'weight':[2,2,0.666666667,0.666666667,0.666666667,0.666666667,0.666666667,0.666666667]
    }
df = pd.DataFrame(data=d)
x = pd.DataFrame(df.loc[:, ['v1','v2','v3','v4']])

cor_mat1 = corr(x,df['weight'])
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
eig_pairs.sort(key=lambda x: x[0], reverse=True)
FAC1_1 = x.dot(eig_pairs[0][1].reshape(4,1))

print(FAC1_1)

Here is the SPSS syntax:

FACTOR
  /VARIABLES v1 v2 v3 v4
  /MISSING LISTWISE 
  /ANALYSIS v1 v2 v3 v4
  /PRINT INITIAL EXTRACTION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /SAVE REG(ALL)
  /METHOD=CORRELATION.

Spss results

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP