TransWikia.com

Efficient method of performing within matrix similarity

Data Science Asked on July 20, 2021

I want to compute the similarity between each entry in a dataset and every other entry that is labeled as class 1 (excluding the entry itself if its own label is 1). Consider a matrix of training data that has columns for ID and class/label, followed by a number of data columns.

ID   Label   var1   var2   var3   ...   varN
1    1       0.26   0.44   0.2    ...   0.11
2    0       0.13   0.34   0.14   ...   0.21
3    1       0.22   0.34   0.45   ...   0.57
4    1       0.45   0.13   0.67   ...   0.78
5    0       0.32   0.76   0.11   ...   0.67
...

There are several thousand rows like this. I want to compute the similarity between each row and every other row where Label==1. So for ID==1, I would like to compute its similarity with ID==3 and ID==4; for ID==2, its similarity with ID==1, ID==3, and ID==4; and so on for every single row.

Another way to think about this is: I have a matrix A, and I'm forming a matrix B which is a subset of A (i.e., the rows of A where Label==1). I want to compute the similarity between A and B, but the output matrix should exclude similarities where both entries are the same row (as indicated by ID).

Right now, I have this implemented as a for loop in R, which is unbearably slow (it takes around 10 minutes to execute for about 3000 rows).
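
For reference, the A-versus-B formulation above can be written without an explicit loop. The following is only a minimal sketch under stated assumptions: the similarity measure is taken to be cosine similarity (no measure is specified here), the toy data frame `dat` copies the five example rows with `varN` standing in for the elided columns, and the names `A_unit`, `B_unit`, `pos`, and `sim` are hypothetical.

```r
# Toy data in the layout described above; values copied from the example table.
dat <- data.frame(
  ID    = 1:5,
  Label = c(1, 0, 1, 1, 0),
  var1  = c(0.26, 0.13, 0.22, 0.45, 0.32),
  var2  = c(0.44, 0.34, 0.34, 0.13, 0.76),
  var3  = c(0.20, 0.14, 0.45, 0.67, 0.11),
  varN  = c(0.11, 0.21, 0.57, 0.78, 0.67)
)

# A: the feature columns of every row; pos flags the rows that form B.
A   <- as.matrix(dat[, setdiff(names(dat), c("ID", "Label"))])
pos <- dat$Label == 1

# Scale each row to unit length so that a plain matrix product gives
# cosine similarity (ASSUMPTION: cosine is the measure of interest).
# An all-zero row would produce NaN here and would need separate handling.
A_unit <- A / sqrt(rowSums(A^2))
B_unit <- A_unit[pos, , drop = FALSE]

# One matrix product instead of a double loop:
# sim[i, j] is the cosine similarity of row i of A and row j of B.
sim <- tcrossprod(A_unit, B_unit)
dimnames(sim) <- list(ID = dat$ID, ref_ID = dat$ID[pos])

# Exclude comparisons of a row with itself, matched by ID.
sim[outer(dat$ID, dat$ID[pos], "==")] <- NA

print(round(sim, 3))
```

The result has one row per entry of A and one column per Label==1 entry, with `NA` wherever the two IDs coincide. If a different measure is wanted, the same pattern (build the full A-by-B matrix in one vectorized call, then mask the matching IDs) should still apply, though the normalization step would change.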
