
Modelling the probability of class membership using k-NN and associated distances

Cross Validated Asked by Alexis Drakopoulos on November 26, 2021

I have a Euclidean space in which observations of the same class are close together and the classes usually do not overlap. I then use k-NN to classify new samples.

What I currently do is find the k nearest samples (L2 distance). Then, for each class, I sum the inverse L2 distances of the neighbours belonging to that class, so that closer neighbours count for more.

E.g. if my local neighbourhood has 3 samples with classes [0, 0, 1] and distances [1.5, 2, 1.2], then instead of simply picking class 1 because it is the closest, I get the scores [1.1667, 0.8333], which is [$\frac{1}{1.5} + \frac{1}{2}, \frac{1}{1.2}$]. I then choose the class with the largest value, so in this case the predicted class would be 0.
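The weighting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example; the function name `inverse_distance_scores` is made up here, and it assumes all distances are strictly positive (a neighbour at distance 0 would need special handling).

```python
from collections import defaultdict


def inverse_distance_scores(labels, distances):
    """Sum 1/distance over the neighbours of each class.

    Assumes every distance is > 0 (hypothetical helper, not from a library).
    """
    scores = defaultdict(float)
    for label, d in zip(labels, distances):
        scores[label] += 1.0 / d
    return dict(scores)


# The neighbourhood from the example: classes [0, 0, 1], distances [1.5, 2, 1.2]
scores = inverse_distance_scores([0, 0, 1], [1.5, 2.0, 1.2])

# Pick the class with the largest summed weight
predicted = max(scores, key=scores.get)

print({cls: round(v, 4) for cls, v in scores.items()})  # {0: 1.1667, 1: 0.8333}
print(predicted)  # 0
```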

I now want to develop an intuition for the uncertainty surrounding this decision, essentially getting a probability of belonging to each class.

If I were just looking at 1 class I might try to fit logistic regression, but I am wondering if anyone has any ideas as to what might be an elegant solution. For example, when the majority of neighbours are close and of the same class the probability should be high, but when the neighbourhood contains more conflicting classes it should be lower.

EDIT:

The solution proposed here (without distance weighting): https://stats.stackexchange.com/a/83607/217058 is interesting and could be extended. We can say we have the distances $D$ of the $K$ points in our neighbourhood, with $D_i \in D$ being the distances of points belonging to class $Y_i$. We can then define some measure of uncertainty as:
$$P(Y_i | x) = \frac{\sum_i \frac{1}{D_i} + s}{K + Cs}$$
and maybe pass that through softmax since it isn’t technically a probability yet.
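A rough sketch of this smoothed score, applied to the earlier example, is given below. The question does not define $C$ and $s$, so the code assumes $C$ is the number of classes and $s \ge 0$ is a smoothing constant. All function names are hypothetical. Both a plain normalisation (dividing by the total) and a softmax are shown, since either turns the scores into values that sum to 1.

```python
import math
from collections import defaultdict


def smoothed_scores(labels, distances, classes, s=1.0):
    """(sum of 1/d over neighbours in the class + s) / (K + C*s) per class.

    Assumptions: C = number of classes, s = smoothing constant,
    and all distances are > 0.
    """
    k = len(labels)
    c = len(classes)
    weight = defaultdict(float)
    for label, d in zip(labels, distances):
        weight[label] += 1.0 / d
    return {cls: (weight[cls] + s) / (k + c * s) for cls in classes}


def normalize(scores):
    """Divide by the total so the values sum to 1."""
    total = sum(scores.values())
    return {cls: v / total for cls, v in scores.items()}


def softmax(scores):
    """Softmax over the scores (shifted by the max for numerical stability)."""
    m = max(scores.values())
    exps = {cls: math.exp(v - m) for cls, v in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


scores = smoothed_scores([0, 0, 1], [1.5, 2.0, 1.2], classes=[0, 1], s=1.0)
probs_linear = normalize(scores)
probs_softmax = softmax(scores)
```

On this example the two class scores differ by less than 0.1, so the softmax output ends up closer to uniform than the plain normalisation. If sharper probabilities are wanted, a temperature parameter or the plain normalisation may be worth trying instead.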
