How to properly train your Self-Organizing Map?

Data Science Asked on December 4, 2020

I recently stumbled upon the Self-Organizing Map (SOM), an ANN architecture used to cluster high-dimensional data while simultaneously imposing a neighbourhood structure on it. It is trained through a competitive learning approach in which neurons compete to respond to a given input. The strongest-responding neuron, the best matching unit (BMU), is rewarded by being moved closer to the given input in data space, and its neighbours on the grid are moved along with it. However, within the literature and implementations, I find some deviations in how this training is implemented. Specifically, the adjustment of the BMU’s neighbours is attenuated by a neighbourhood function

θ(d, t) = exp(−d² / (2σ(t)²))

where d is the grid distance of a neuron to the BMU and σ(t) is a radius which is decreased during training. Effectively, this makes the neighbourhood that is adjusted along with the BMU shrink as training progresses. The deviations I find concern how the shrinking of σ(t) is implemented. Most explanations and blog posts describe an exponential decrease

σ(t) = σ₀ · exp(−t / λ)

where λ is a decay constant which can be tuned arbitrarily.
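To make this concrete, here is a minimal NumPy sketch of such a training loop, combining the Gaussian neighbourhood above with the exponential radius decay; the function name, the hyperparameter defaults and the decaying learning rate are illustrative assumptions, not taken from any particular library:

    import numpy as np

    def train_som(data, dims=(10, 10), n_steps=1000, sigma0=5.0, lam=1000.0, lr0=0.5):
        rng = np.random.default_rng(0)
        W = rng.random((dims[0], dims[1], data.shape[1]))  # one prototype vector per neuron
        gy, gx = np.mgrid[0:dims[0], 0:dims[1]]
        grid = np.stack([gy, gx], axis=-1).astype(float)   # grid coordinates of the neurons
        for t in range(n_steps):
            x = data[rng.integers(len(data))]              # pick a random training sample
            sigma = sigma0 * np.exp(-t / lam)              # exponential radius decay σ(t)
            lr = lr0 * np.exp(-t / lam)                    # learning rate decays the same way
            # BMU: the neuron whose prototype is closest to x in data space
            bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=-1)), dims)
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=-1)    # squared grid distance to the BMU
            theta = np.exp(-d2 / (2 * sigma ** 2))         # Gaussian neighbourhood θ(d, t)
            W += lr * theta[..., None] * (x - W)           # pull all neurons towards x
        return W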

Alternatively, I find that some implementations do not use this exponential decay at all, but instead a linear interpolation of the form

σ(t) = r · (1 − t / n)

where n is the number of training epochs and r is a radius which is set depending on the training phase. These implementations further distinguish explicitly between a ‘rough’ training phase, where

r = max(SOM.dims) / 2

with e.g. SOM.dims = (100, 100) for a 100×100 sized SOM, and a ‘fine-tuning’ training phase, where

r = max(SOM.dims) / 10
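A sketch of this two-phase schedule might look as follows; the epoch counts and the exact radius for each phase are illustrative assumptions, since implementations vary:

    def sigma_linear(t, n, r):
        # linear interpolation of the radius from r down to 0 over n epochs
        return r * (1.0 - t / n)

    som_dims = (100, 100)

    # ‘rough’ ordering phase: large radius derived from the map size
    n_rough, r_rough = 1000, max(som_dims) / 2
    rough_schedule = [sigma_linear(t, n_rough, r_rough) for t in range(n_rough)]

    # ‘fine-tuning’ phase: much smaller radius, adjusts only the close neighbourhood
    n_fine, r_fine = 10000, max(som_dims) / 10
    fine_schedule = [sigma_linear(t, n_fine, r_fine) for t in range(n_fine)]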

My problem is that I do not quite understand why there is this disagreement and what the ‘canonical’ way of training a SOM is. It certainly makes sense to divide the training into a ‘rough’ and a ‘fine-tuning’ phase, but it baffles me a bit that most newer descriptions drop this without further discussion and only consider a single training phase with exponential decay.

One Answer

An answer from Kohonen, the inventor of the self-organizing map, himself:

"The true mathematical form of σ(t) is not crucial, as long as its value is fairly large in the beginning of the process. Say, on the order of half of the diameter of the grid, whereafter it is gradually reduced to a fraction of it in about 1000 steps."

From: Kohonen, T., 2013. Essentials of the self-organizing map. Neural Networks, 37, pp. 52-65.
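Read as code, that recommendation might look like the sketch below; per the quote, the exact functional form (linear here) and the constants are free choices:

    import numpy as np

    def kohonen_sigma(t, grid_dims, n_ordering=1000, sigma_min=1.0):
        # Start at roughly half the diameter of the grid and shrink to a
        # small fraction of it within about 1000 steps; the exact form of
        # the decay is not crucial, so a simple linear shrink is used here.
        sigma0 = 0.5 * np.hypot(*grid_dims)  # half the grid diagonal
        if t >= n_ordering:
            return sigma_min                 # stay small after the ordering phase
        return sigma_min + (sigma0 - sigma_min) * (1.0 - t / n_ordering)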

Answered by Steve on December 4, 2020
