
How to understand the mapping function of a kernel?

Cross Validated, asked on December 5, 2021

A kernel function must satisfy two conditions. The first is symmetry, which is easy to understand intuitively: dot products are symmetric, so a kernel should be as well. The other condition is given below.

There exists a map $\varphi: \mathbb{R}^d \to H$, called the kernel feature map, into some high-dimensional feature space $H$ such that for all $x, x' \in \mathbb{R}^d$: $k(x, x') = \langle \varphi(x), \varphi(x') \rangle$.

I understand this to mean that there should exist a feature map that projects the data from the low-dimensional space into some higher-dimensional space, and that the kernel function computes the dot product in that space.
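As a concrete example (a minimal sketch; the degree-2 polynomial kernel and the map $\varphi$ below are illustrative choices, not part of the question): for $x \in \mathbb{R}^2$, the kernel $k(x, x') = \langle x, x' \rangle^2$ has the explicit feature map $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ into $\mathbb{R}^3$, which we can verify numerically:

```python
import numpy as np

def k(x, xp):
    """Homogeneous polynomial kernel of degree 2: k(x, x') = (x . x')^2."""
    return np.dot(x, xp) ** 2

def phi(x):
    """Explicit feature map for this kernel on R^2:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so <phi(x), phi(x')> = (x . x')^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(0)
x, xp = rng.normal(size=2), rng.normal(size=2)
print(k(x, xp))                 # kernel value computed in R^2
print(np.dot(phi(x), phi(xp)))  # same value via the map into R^3
```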

For example, the squared Euclidean distance can be expanded as

$d(x, y) = \sum_{i}(x_i - y_i)^2 = \langle x, x \rangle + \langle y, y \rangle - 2\langle x, y \rangle$

Looking at this in terms of the second condition, how do we know that no feature map exists for the (squared) Euclidean distance? What exactly are we looking for in a feature map, mathematically?
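A quick numerical check of the expansion above (the vectors below are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=5), rng.normal(size=5)

lhs = np.sum((x - y) ** 2)                            # sum_i (x_i - y_i)^2
rhs = np.dot(x, x) + np.dot(y, y) - 2 * np.dot(x, y)  # <x,x> + <y,y> - 2<x,y>
print(np.isclose(lhs, rhs))                           # True
```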

One Answer

To check whether a kernel $K$ is a valid kernel or not, $K$ must satisfy Mercer's condition.

Mercer's condition: the kernel $K$ is a valid kernel if and only if $K$ is positive semi-definite.

This is satisfied if, for every real vector $c$,

$c^\top K c \geq 0$

where $K$ is the Gram matrix (the matrix of pairwise kernel evaluations, $K_{ij} = k(x_i, x_j)$).

Equivalently,

$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i K_{ij} c_j \geq 0$

holding for all real $c$, which is exactly the statement that $K$ is positive semi-definite.
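A sketch of this check in NumPy, using the Gaussian (RBF) kernel as an example of a known valid kernel (the data and the choice of kernel are illustrative assumptions):

```python
import numpy as np

def rbf(x, xp, gamma=1.0):
    """Gaussian (RBF) kernel, a known valid kernel."""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                        # 20 points in R^3
K = np.array([[rbf(a, b) for b in X] for a in X])   # Gram matrix

# Mercer / PSD check: c^T K c >= 0 for arbitrary real c ...
for _ in range(1000):
    c = rng.normal(size=20)
    assert c @ K @ c >= -1e-10                      # tolerate float rounding

# ... equivalently, all eigenvalues of the symmetric K are >= 0
print(np.linalg.eigvalsh(K).min())                  # ~ >= 0 up to rounding
```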

Before getting to the intuition behind Mercer's condition and an intuitive proof, I would like to mention that checking the existence of a valid kernel $K$ via Mercer's condition has nothing to do with the feature map itself; however, it is crucial for the convergence of quadratic programs such as the SVM, or more generally for their convexity.


Intuition

$K$ is a symmetric matrix, so by the spectral theorem $K$ is diagonalizable (in other words, we can decompose it), and we can rewrite $K$ via its eigendecomposition:

$K = VDV^T$

where $V$ is an orthogonal matrix whose columns are the eigenvectors, and $D$ is a diagonal matrix with the eigenvalues $\lambda_i$ on the diagonal.
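A small sketch of this decomposition, using the linear kernel's Gram matrix as an illustrative example:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))
K = X @ X.T                         # Gram matrix of the linear kernel (symmetric)

eigvals, V = np.linalg.eigh(K)      # eigh exploits the symmetry of K
D = np.diag(eigvals)
print(np.allclose(K, V @ D @ V.T))  # True: K = V D V^T
print(eigvals)                      # all >= 0 (up to rounding) for a valid kernel
```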

If there exists an eigenvalue $\lambda_i < 0$,

then the kernel $K$ is not a valid kernel, because a negative eigenvalue means there is a point at which the Hessian is indefinite; in other words, the critical point is a saddle (the function is convex along some directions and concave along others), so the primal problem has no solution, and even the dual problem can be arbitrarily expensive to compute.

By Sylvester's criterion, $K$ has a negative eigenvalue if and only if it is not positive semi-definite.
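This also resolves the question above about the (squared) Euclidean distance: a minimal sketch (with arbitrary illustrative data) showing that its Gram matrix always has a negative eigenvalue, so no feature map $\varphi$ with $K_{ij} = \langle \varphi(x_i), \varphi(x_j) \rangle$ can exist:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))

# Treat the squared Euclidean distance itself as if it were a kernel
# and build its "Gram" matrix: K_ij = ||x_i - x_j||^2.
K = np.array([[np.sum((a - b) ** 2) for b in X] for a in X])

# The diagonal is all zeros, so trace(K) = sum of eigenvalues = 0;
# since K is not the zero matrix, it must have a negative eigenvalue,
# and therefore cannot be positive semi-definite.
print(np.linalg.eigvalsh(K).min())  # strictly negative
```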


Geometric intuition

The feature mapping by a kernel function, i.e., the inner product of the feature vectors of the points $x_i$, $x_j$, $K_{ij} = \langle \varphi(x_i), \varphi(x_j) \rangle$, is the same as measuring the similarity of two functions in the sense of a Hilbert space. Visually, the kernel function acts as a congruent transformation, i.e., a transformation in an isometric space. If there is a negative eigenvalue, then the transformation has occurred in the opposite direction in that space; in other words, the image is reflected across some axis.

Answered by 4.Pi.n on December 5, 2021
