Hypothesis vs Hyperplane in Machine Learning

Data Science Asked on January 11, 2021

I am finding it hard to understand the clear difference between Hypothesis and Hyperplane.

I know that a hypothesis is a candidate model that maps inputs to outputs after training, and that a hyperplane is the decision boundary in a classification algorithm.

But I can't seem to understand how the two are differentiated in equations.

Can someone help me understand their differences in equations, with some visualizations?

One Answer

Let's say you want to learn a specific mapping from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$.

The following elaboration assumes that you are referring to supervised learning.

Hyperplane

Hyperplanes play a key role in neural networks.

Consider the set $H_{v,d} := \{ x \in \mathbb{R}^{n} \mid \langle x,v \rangle = d \}$ for $v \in \mathbb{R}^{n}$ and $d \in \mathbb{R}$.

For $v = 0$, we have $H_{v,d} = \begin{cases} \emptyset & d \neq 0 \\ \mathbb{R}^{n} & d = 0 \end{cases}$

If $v \neq 0$, then $\dim(H_{v,d}) = n-1$ and $H_{v,d}$ is a hyperplane.

If $d = 0$, $H_{v,d}$ is a vector space (going through the origin); otherwise it is an affine space.

In general, a hyperplane is an affine subspace of co-dimension 1, which is of the form $H = v + U := \{ v+u \mid u \in U \}$, where $U$ is a subspace of dimension $\dim(U) = n-1$.
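To make the definition concrete, here is a minimal numpy sketch (my own illustration, not from the answer): evaluating $\langle x,v \rangle - d$ tells you whether a point lies on the hyperplane $H_{v,d}$ or on which side of it.

```python
import numpy as np

# Hypothetical example: the hyperplane H_{v,d} = {x | <x, v> = d} in R^2
# with v = (1, 1) and d = 1, i.e. the line x1 + x2 = 1.
v = np.array([1.0, 1.0])
d = 1.0

def side(x, v, d):
    """Signed evaluation <x, v> - d: zero on the hyperplane,
    positive on one side, negative on the other."""
    return np.dot(x, v) - d

print(side(np.array([0.5, 0.5]), v, d))  # 0.0  -> lies on H_{v,d}
print(side(np.array([1.0, 1.0]), v, d))  # 1.0  -> positive side
print(side(np.array([0.0, 0.0]), v, d))  # -1.0 -> negative side
```

This signed value is exactly what a binary classifier with a linear decision boundary thresholds at zero.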

For example, in a binary classification task, a hyperplane can be used to separate the two classes. A geometric explanation of the role of hyperplanes in neural networks can be found here.

In short, a neural network uses a hyperplane for each neuron (in the hidden or output layers) to define the output value of that neuron. All points on the same side of the hyperplane are either mapped to the same value (using a Heaviside step function), or the output depends on the distance to the hyperplane (e.g. using the sigmoid function). This understanding can be used to interpret the mapping of an input vector to a layer $l>1$. Essentially, a neural network learns an arrangement of hyperplanes, which defines regions. Each region is mapped to the same value (in the case of the Heaviside function), or again, the output further depends on the position within such a region (see here).
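The two activation behaviors described above can be sketched for a single neuron (an illustrative example with made-up weights, not code from the answer): the neuron's weights and threshold define a hyperplane $\langle x,w \rangle = d$, and the activation function decides whether the output is constant per side or varies with the signed value.

```python
import numpy as np

# Illustrative weights and threshold defining the neuron's hyperplane <x, w> = d.
w = np.array([2.0, -1.0])
d = 0.5

def heaviside_neuron(x, w, d):
    # Every point on the same side of the hyperplane maps to the same value.
    return 1.0 if np.dot(x, w) - d > 0 else 0.0

def sigmoid_neuron(x, w, d):
    # The output varies smoothly with the signed value <x, w> - d.
    z = np.dot(x, w) - d
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 1.0])            # <x, w> - d = 2 - 1 - 0.5 = 0.5 > 0
print(heaviside_neuron(x, w, d))    # 1.0
print(round(sigmoid_neuron(x, w, d), 3))  # 0.622
```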


Model Hypothesis

A model hypothesis usually refers to something else. We consider a set of model hypotheses $\mathcal{H}$, and each model $h \in \mathcal{H}$ may be used as the desired mapping. For example, we can define the set $$P[n] := \left\{ \begin{aligned} f\colon \mathbb{R}^{n} &\rightarrow \mathbb{R} \\ x &\mapsto \sum_{r = 0}^{n} \sum_{\substack{b_{1}+\ldots+b_{n} = r \\ b_{k} \in \mathbb{N},\, \forall k}} a_{b_{1},\ldots,b_{n}} \prod_{s = 1}^{n} x_{s}^{b_{s}} \end{aligned} \;\middle\vert\; a_{b_{1},\ldots,b_{n}} \in \mathbb{R} \right\}.$$

This set collects all multivariate polynomials in $n$ variables of degree at most $n$.

Then we can consider the hypothesis set $$\mathcal{H} := \left\{ \begin{aligned} f\colon \mathbb{R}^{n} &\rightarrow \mathbb{R}^{m} \\ x &\mapsto \begin{pmatrix} p_{1}(x) \\ \vdots \\ p_{m}(x) \end{pmatrix} \end{aligned} \;\middle\vert\; p_{1},\ldots,p_{m} \in P[n] \right\}.$$

Then, $h in mathcal{H}$ maps an input vector $x$ to $h(x) = y$, where each component $y_{u}$ is given by a multivariate polynomial function in $x$ of degree at most $n$.

The task of training is then to find the coefficients that result in an optimal hypothesis $h^{*} \in \mathcal{H}$, which is the model used to perform inference. Here, optimal means the best model within the set of all considered models in $\mathcal{H}$.
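As a minimal sketch of this selection process (my own example, assuming numpy): take $\mathcal{H}$ to be all univariate polynomials of degree at most 2, and let training pick the coefficients that minimize squared error via ordinary least squares.

```python
import numpy as np

# Hypothesis set H: univariate polynomials of degree <= 2, parameterized by
# their coefficients. Training selects h* by ordinary least squares.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 3 * x**2 - 2 * x + 1 + 0.01 * rng.standard_normal(50)  # noisy target

# Design matrix for the monomial basis {1, x, x^2}.
X = np.vander(x, N=3, increasing=True)

# The "optimal hypothesis" h* within H is given by these coefficients.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coeffs, 1))  # approximately [ 1. -2.  3.]

def h_star(t):
    """Inference with the selected hypothesis h*."""
    return coeffs @ np.array([1.0, t, t**2])
```

The point is that $\mathcal{H}$ (degree-2 polynomials) fixes *which* mappings are available, and training merely selects the coefficients.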

Likewise, for neural networks: if the "architecture" of the neural network is fixed, this defines the hypothesis set $\mathcal{H}$, where each $h \in \mathcal{H}$ uses the same architecture but with a specific choice of weights.

Training a neural network then delivers a model $h in mathcal{H}$, with "optimal" weights (in practice the weights are often not optimal).
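The same idea can be sketched for a fixed architecture (an illustrative example with made-up weights, not from the answer): the architecture below defines $\mathcal{H}$, and each concrete weight choice picks out one hypothesis $h \in \mathcal{H}$.

```python
import numpy as np

# A fixed one-hidden-layer architecture defines the hypothesis set H;
# each weight choice (W1, b1, W2, b2) selects one hypothesis h in H.
def make_hypothesis(W1, b1, W2, b2):
    def h(x):
        hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
        return W2 @ hidden + b2
    return h

# Two different weight choices -> two different hypotheses from the same H.
h1 = make_hypothesis(np.eye(2), np.zeros(2), np.ones((1, 2)), np.zeros(1))
h2 = make_hypothesis(2 * np.eye(2), np.zeros(2), np.ones((1, 2)), np.zeros(1))

x = np.array([1.0, 2.0])
print(h1(x))  # [3.]
print(h2(x))  # [6.]
```

Training would search over the weights, i.e. over $\mathcal{H}$, for the hypothesis that best fits the data.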

So in short, the set of model hypotheses $\mathcal{H}$ defines which mappings can be used, and training within supervised learning then selects the best hypothesis $h \in \mathcal{H}$.

Answered by Graph4Me Consultant on January 11, 2021
