
Computing the matrix differential/derivative of the matrix $\rightarrow$ scalar function $\log\det(BCB^T)$

Mathematics Asked by xedg on December 3, 2021

I am trying to learn how to replicate the matrix calculus done in the following paper: https://arxiv.org/pdf/1811.11433.pdf. To learn how to do this, I am using the following book I found (https://www.mobt3ath.com/uplode/book/book-33765.pdf), by Karim Abadir and Jan Magnus.

I attempted to start by finding the differential of the function $H$ given below. However, it does not look like I am on the right track. Can someone tell me if my calculations below are correct so far? Or at least whether I am using the right book to understand the paper I listed? I noticed that the book uses the 'vec' operator to treat the Hessian of a matrix function as a matrix, while the paper uses an order-4 tensor, so I am not sure if I am using the right approach. Thanks for the help.

My work so far:

Let $H(B)=\log\det(BCB^T)$ where $B$ and $C$ are square matrices of dimension $n$ and $C$ is symmetric. Let $F(B)=BCB^T$ and $G(R)=\log\det R$ so that $H(B)=G(F(B))$.

\begin{align*}
dF &= (dB)\,CB^T + BC\,(dB)^T \hspace{0.4cm} dG(R) = \operatorname{Tr}[R^{-1}\,dR] \\
\\
dH &= \operatorname{Tr}[(BCB^T)^{-1}((dB)\,CB^T + BC\,(dB)^T)] &&\textbf{take transpose}\\
&= \operatorname{Tr}[(BC\,(dB)^T + (dB)\,CB^T)(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BC\,(dB)^T(BCB^T)^{-1}] + \operatorname{Tr}[(dB)\,CB^T(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BC\,(dB)^T(B^T)^{-1}C^{-1}B^{-1}] + \operatorname{Tr}[(dB)\,CB^T(B^T)^{-1}C^{-1}B^{-1}] &&\textbf{use cyclic property}\\
&= \operatorname{Tr}[(B^T)^{-1}(dB)^T] + \operatorname{Tr}[B^{-1}\,dB] = 2\operatorname{Tr}[B^{-1}\,dB]
\end{align*}

The corresponding total derivative is then $DH=2\,(\operatorname{vec}(B^{-1}))^T$ in the book's notation. Then I assume I would just 'unvectorize' this to get the derivative in the paper's notation? Is this a good start to calculating the gradient of the loss function in the paper I listed? Thanks.
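As a sanity check on the differential (my own addition, not part of the original question): assuming $B$ is invertible and $C$ is symmetric positive definite, so that $\det(BCB^T)>0$, a finite-difference test in Python/numpy should reproduce $dH = 2\operatorname{Tr}[B^{-1}dB]$ to first order. The function name `H` just mirrors the notation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))          # generic random matrix, assumed invertible
A = rng.standard_normal((n, n))
C = A @ A.T                              # symmetric positive definite, so det(BCB^T) > 0

def H(B):
    return np.log(np.linalg.det(B @ C @ B.T))

dB = 1e-6 * rng.standard_normal((n, n))          # small perturbation
dH_pred = 2 * np.trace(np.linalg.solve(B, dB))   # predicted: 2 Tr[B^{-1} dB]
dH_num = H(B + dB) - H(B)                        # numerical differential
print(dH_pred, dH_num)                           # should agree to first order in dB
```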

One Answer

First, calculate the gradient for the full matrix.
$$\begin{aligned}
X &= BCB^T = X^T \\
\phi &= \log\det X \\
d\phi &= X^{-T}:dX \\
&= X^{-1}:2\operatorname{sym}(dB\,CB^T) \\
&= 2X^{-1}BC:dB \\
\frac{\partial\phi}{\partial B} &= 2X^{-1}BC \\
\end{aligned}$$
Repeat the calculation for the diagonalized matrix.
$$\begin{aligned}
Y &= (I\odot X) = Y^T \\
\psi &= \log\det(Y) \\
d\psi &= 2Y^{-1}BC:dB \\
\frac{\partial\psi}{\partial B} &= 2Y^{-1}BC \\
\end{aligned}$$
The Pham cost function is a linear combination of these functions.
$$\begin{aligned}
{\cal L} &= \frac{\psi - \phi}{2} \\
\frac{\partial{\cal L}}{\partial B} &= \Big(Y^{-1}-X^{-1}\Big)BC \;\doteq\; G_{std} \qquad&\big({\rm standard\;gradient}\big) \\
\end{aligned}$$
However, rather than the standard gradient, the linked paper utilizes the relative gradient, which is defined in terms of a small perturbation matrix $(E)$.
$$\begin{aligned}
d{\cal L} &= {\cal L}(B+EB) - {\cal L}(B) \\
&= G_{std}:EB \\
&= G_{std}B^T:E \\
&= G:E \\
\\
G &= \Big(Y^{-1}-X^{-1}\Big)BCB^T \\
&= \Big(Y^{-1}-X^{-1}\Big)X \\
&= (Y^{-1}X-I) \\
\end{aligned}$$
This is the content of the first part of Eq. (3) on the second page, except there it is written in component form, i.e.
$$G_{ab} = \frac{X_{ab}}{X_{aa}} - \delta_{ab}$$
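Neither of the checks below appears in the answer itself; this is a hedged numerical sketch (the names `pham_loss`, `G_std`, and `G_rel` are mine) verifying the standard and relative gradients with finite differences, under the same assumptions as before ($C$ symmetric positive definite, $B$ invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T                                   # symmetric positive definite

def pham_loss(B):
    X = B @ C @ B.T
    phi = np.log(np.linalg.det(X))            # log det X
    psi = np.sum(np.log(np.diag(X)))          # log det (I ⊙ X)
    return 0.5 * (psi - phi)

X = B @ C @ B.T
Y_inv = np.diag(1.0 / np.diag(X))             # Y^{-1}, where Y = I ⊙ X
G_std = (Y_inv - np.linalg.inv(X)) @ B @ C    # standard gradient (Y^{-1} - X^{-1}) B C
G_rel = Y_inv @ X - np.eye(n)                 # relative gradient  Y^{-1} X - I

# standard gradient: L(B + dB) - L(B) ≈ <G_std, dB>
dB = 1e-6 * rng.standard_normal((n, n))
print(pham_loss(B + dB) - pham_loss(B), np.sum(G_std * dB))

# relative gradient: L(B + EB) - L(B) ≈ <G_rel, E>
E = 1e-6 * rng.standard_normal((n, n))
print(pham_loss(B + E @ B) - pham_loss(B), np.sum(G_rel * E))
```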


NB: The paper uses bra-ket notation for the Frobenius product, whereas I use a colon, e.g. $$A:B = \langle A|B\rangle = {\rm Tr}(A^TB)$$ because it's a lot easier to type (and it looks better).

The Kronecker-vec operation can flatten a matrix expression into a vector $${\rm vec}(AXB)=(B^T\otimes A)\,{\rm vec}(X) \;=\; Mx$$ Using the vec operation, a gradient matrix can be flattened into a long vector $$\begin{aligned} \frac{\partial\phi}{\partial X} &= G \quad&&({\rm matrix}) \\ d\phi &= G:dX \\ &= {\rm vec}(G):{\rm vec}(dX) \\ &= g:dx \\ \frac{\partial\phi}{\partial x} &= g \quad&&({\rm vector}) \\ \\ G,X &\in{\mathbb R}^{m\times n} \\ g,x &\in {\mathbb R}^{mn\times 1} \\ \end{aligned}$$ Similarly, a fourth-order Hessian tensor can be flattened into a large matrix $$\begin{aligned} {\cal H} &= \frac{\partial G}{\partial X} \in{\mathbb R}^{m\times n\times m\times n} \quad&&({\rm tensor}) \\ H &= \frac{\partial g}{\partial x} \in {\mathbb R}^{mn\times mn} \quad&&({\rm matrix}) \\ \end{aligned}$$
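For the vec identity specifically, here is a small illustrative check (mine, not the answerer's); the one subtlety is that $\rm vec$ is column-stacking, so numpy needs `order='F'`:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))

vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)             # (B^T ⊗ A) vec(X)
print(np.allclose(lhs, rhs))               # True
```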

Answered by greg on December 3, 2021
