
A question about dual representations, kernels and the notation used in Bishop's book

Asked on Data Science, February 13, 2021

I'm having a hard time with kernel functions and dual representations in Bishop's 'Pattern Recognition and Machine Learning'. Here is the page I'm trying to understand:

[screenshot of the page from Bishop, Section 6.1 on dual representations]

Of course, setting the gradient of $J(\mathbf{w})$ to $\mathbf{0}$ and solving for $\mathbf{w}$ leads to (6.3). Then he writes about the design matrix $\Phi$.
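For reference, here is the step I mean, as it appears on the page (Bishop's (6.2)–(6.4)):

$$J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left\{\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right\}^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w},$$

and setting $\nabla_{\mathbf{w}} J = \mathbf{0}$ gives

$$\mathbf{w} = -\frac{1}{\lambda}\sum_{n=1}^{N}\left\{\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right\}\boldsymbol{\phi}(\mathbf{x}_n) = \sum_{n=1}^{N} a_n\boldsymbol{\phi}(\mathbf{x}_n) = \Phi^T\mathbf{a}, \qquad a_n = -\frac{1}{\lambda}\left\{\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right\}.$$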

For me (without initially considering nonlinear transformations of the dataset), the design matrix $\mathbf{X}$ is an $N \times d$ matrix ($N$ = number of samples, $d$ = number of components/features):

$$\mathbf{X} = \begin{bmatrix} x_1^{(1)} & \dots & x_1^{(d)} \\ \vdots & & \vdots \\ x_N^{(1)} & \dots & x_N^{(d)} \end{bmatrix}$$

Considering a nonlinear transformation $\phi \colon \mathbb{R}^d \rightarrow \mathbb{R}^K$ such that
$$
\mathbb{R}^d \ni \mathbf{x} = (x^{(1)}, \dots, x^{(d)}) \mapsto \begin{bmatrix} \phi_1(\mathbf{x}) & \dots & \phi_K(\mathbf{x}) \end{bmatrix},
$$
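(As a toy example of what I have in mind, not something from the book: for $d = 2$ and $K = 3$ one could take $\phi(\mathbf{x}) = \begin{bmatrix} x^{(1)} & x^{(2)} & x^{(1)}x^{(2)} \end{bmatrix}$.)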

I'm assuming that the map $\phi$ takes a ROW vector and returns a ROW vector, because I originally assumed that the $d$-dimensional feature vectors were the rows of the matrix $\mathbf{X}$. Is this correct? If yes, why? This reasoning would lead me to consider the design matrix $\Phi$ as the $N \times K$ matrix

$$\Phi = \begin{bmatrix} \phi_1(\mathbf{x}_1) & \dots & \phi_K(\mathbf{x}_1) \\ \vdots & & \vdots \\ \phi_1(\mathbf{x}_N) & \dots & \phi_K(\mathbf{x}_N) \end{bmatrix}$$

So why does he write that the $n$-th row is given by $\boldsymbol{\phi}(\mathbf{x}_n)^T$?
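(If I recall correctly, Bishop's notation section states that all vectors are column vectors, so presumably $\boldsymbol{\phi}(\mathbf{x}_n)$ is the $K \times 1$ column $(\phi_1(\mathbf{x}_n), \dots, \phi_K(\mathbf{x}_n))^T$ and its transpose is the $1 \times K$ row $\begin{bmatrix} \phi_1(\mathbf{x}_n) & \dots & \phi_K(\mathbf{x}_n) \end{bmatrix}$, which would match the matrix I wrote above. Is that the intended reading?)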

Also, how does he obtain (6.5) by substitution? I really can't figure it out.
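To make this concrete, the substitution should be $\mathbf{w} = \Phi^T\mathbf{a}$ plugged into $J(\mathbf{w})$ above, and according to the page the result (6.5) is

$$J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T\Phi\Phi^T\Phi\Phi^T\mathbf{a} - \mathbf{a}^T\Phi\Phi^T\mathbf{t} + \frac{1}{2}\mathbf{t}^T\mathbf{t} + \frac{\lambda}{2}\mathbf{a}^T\Phi\Phi^T\mathbf{a},$$

where $\mathbf{t} = (t_1, \dots, t_N)^T$, but I can't reproduce the intermediate steps. Thank you for your help.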
