TransWikia.com

How to determine the bandwidth of a Gaussian kernel such that the k nearest points represent a certain % of the sum of weights

Cross Validated Asked by tzirtzi on November 24, 2021

I have a dataset to which I am applying a Gaussian kernel. I want to determine the bandwidth (sd) of the kernel such that, on average, the k nearest points will represent a specified proportion of the resulting weights. For example, say I have a hundred observations with m=0.5 and sd=0.25: how do I determine the bandwidth of a Gaussian kernel such that, for any given point, the 10 nearest points will on average represent 95% of the sum of all weights?

One Answer

I'm not exactly sure, but a rough approximation would be to choose the bandwidth so that, on average, 10 observations lie within 2 standard deviations (i.e. 2 bandwidths) of the query point, since about 95% of a Gaussian's mass lies within 2 standard deviations of its centre. The catch is that the region containing the 10 nearest neighbours adapts to the local density, whereas with a constant bandwidth the effective window of the Gaussian kernel is fixed, so in some places it will contain fewer, and in others many more, than 10 neighbours (especially near the boundaries). If your goal is to select a constant bandwidth, I would go the simple way:

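  • Select a range of candidate bandwidths and compute the kernel weights for each one
  • For each query point and each bandwidth, compute the share of the total weight carried by the k nearest points, and plot a histogram of these shares across query points
  • Select the bandwidth whose shares best match your target (e.g. an average share of 95%)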
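
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function names `knn_weight_share` and `pick_bandwidth`, the 1-D simulated data, the bandwidth grid, and the choice to exclude the query point from its own neighbours are all assumptions made for the example.

```python
import numpy as np

def knn_weight_share(x, h, k):
    """Per query point: share of the total Gaussian weight carried by its k nearest points."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])          # pairwise distances (1-D data assumed)
    np.fill_diagonal(d, np.inf)                  # assumption: a point is not its own neighbour
    logw = -0.5 * (d / h) ** 2                   # log of the unnormalised Gaussian weights
    logw -= logw.max(axis=1, keepdims=True)      # rescale each row to avoid underflow at small h
    w = np.exp(logw)
    w_desc = np.sort(w, axis=1)[:, ::-1]         # largest weights first = nearest points first
    return w_desc[:, :k].sum(axis=1) / w.sum(axis=1)

def pick_bandwidth(x, k, target, grid):
    """Return the grid bandwidth whose mean k-NN weight share is closest to the target."""
    mean_share = np.array([knn_weight_share(x, h, k).mean() for h in grid])
    best = int(np.argmin(np.abs(mean_share - target)))
    return grid[best], mean_share

# Hypothetical data resembling the question: 100 observations, mean 0.5, sd 0.25
rng = np.random.default_rng(0)
x = rng.normal(0.5, 0.25, size=100)
grid = np.geomspace(0.001, 0.5, 60)              # candidate bandwidths (an arbitrary choice)

h_best, mean_share = pick_bandwidth(x, k=10, target=0.95, grid=grid)
shares = knn_weight_share(x, h_best, k=10)       # per-point shares at the chosen bandwidth
counts, edges = np.histogram(shares, bins=10, range=(0.0, 1.0))
print("chosen bandwidth:", h_best)
print("mean share:", shares.mean(), "min share:", shares.min())
```

The per-point `shares` are what the histogram step refers to: their spread shows how far individual points deviate from the average target, which tends to be largest in sparse regions and near the boundaries. A finer grid, or a root-finder over the mean share, would give a more precise bandwidth.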

Answered by Akylas Stratigakos on November 24, 2021
