
Minimizing the Schatten 1-norm over symmetric matrices.

Mathematics Asked by gene on January 25, 2021

Let $X$ be an $n \times n$ Hermitian matrix; then we can write $X = A + iB$ where $A$ is real symmetric and $B$ is real skew-symmetric. Let $S$ be the set of $n \times n$ symmetric matrices and define the Schatten $1$-norm (also known as the trace norm) as $\lVert M \rVert_1 = \sum_j \sigma_j(M)$, where $\sigma_j(M)$ are the singular values of $M$. To measure how close $X$ is to the subspace of symmetric matrices, we can define the distance measure
$$ D(X) = \min_{Y \in S} \lVert X - Y \rVert_1 . $$
Expanding $X$ into symmetric and skew-symmetric parts, we get
$$ D(X) = \min_{Y \in S} \lVert (A - Y) + iB \rVert_1 . $$
Intuitively, it seems that the minimum occurs at $Y = A$, in which case $D(X) = \lVert B \rVert_1$ is simply the $1$-norm of the imaginary part of $X$. How would one show that this quantity is minimized at $Y = A$?
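As a quick numerical sanity check of this intuition (a sketch assuming `numpy` is available; the helper `trace_norm` is my own, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build a random Hermitian X = A + iB with A real symmetric, B real skew-symmetric.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
N = rng.standard_normal((n, n))
B = (N - N.T) / 2
X = A + 1j * B

def trace_norm(M):
    """Schatten 1-norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

# Distance achieved by the conjectured minimizer Y = A ...
d_at_A = trace_norm(X - A)   # equals trace_norm(1j * B) = ||B||_1

# ... is never beaten by random symmetric perturbations of A.
for _ in range(200):
    P = rng.standard_normal((n, n))
    Y = A + (P + P.T) / 2
    assert trace_norm(X - Y) >= d_at_A - 1e-10

print(np.isclose(d_at_A, trace_norm(B)))  # True
```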

2 Answers

Edit: It seems that I let myself get carried away by specifics, and as a consequence completely missed the bigger picture. My old answer is kept at the bottom for historical purposes. Let it be a reminder to all — including myself — that accepted answers on this site can occasionally be clumsy, needlessly complicated, or potentially even incorrect (though the latter doesn't seem to apply here).



New answer — much shorter and more general:

We present the result in a much more general setting (which is probably closer to your intuition).

Definition. Let $X$ be a complex vector space. A complex conjugation is a conjugate-linear map $f : X \to X$ which is equal to its own inverse; in other words:

  • For all $x, y \in X$ and $\lambda, \mu \in \mathbb{C}$ we have $f(\lambda x + \mu y) = \overline{\lambda} f(x) + \overline{\mu} f(y)$;
  • For all $x \in X$ we have $f(f(x)) = x$.

Instead of $f$ we usually write $\bar{\ }\, : X \to X$. The conjugate of an element $x \in X$ is written $\overline{x}$. We say that $x$ is real if $\overline{x} = x$ holds. Every $x \in X$ can be uniquely expressed as $x = a + ib$ with $a$ and $b$ real. These $a$ and $b$ are given by $a = \text{Re}(x) := \tfrac{1}{2}(x + \overline{x})$ and $b = \text{Im}(x) := \tfrac{1}{2i}(x - \overline{x})$. The set of real elements of $X$ forms an $\mathbb{R}$-subspace (but not a $\mathbb{C}$-subspace!), which we denote by $\text{Re}(X)$.

If $(X, \lVert\,\cdot\,\rVert)$ is a normed space, then a complex conjugation $\bar{\ }\, : X \to X$ defines an isometry (of the underlying metric space) if and only if $\lVert \overline{x} \rVert = \lVert x \rVert$ holds for all $x \in X$.

Proposition. Let $(X, \lVert\,\cdot\,\rVert)$ be a complex normed space and let $\bar{\ }\, : X \to X$ be an isometric complex conjugation. Then for any $x \in X$ and $y \in \text{Re}(X)$ we have $$ \lVert x - y \rVert \;\geq\; \lVert x - \text{Re}(x) \rVert. $$

Proof. Write $x = a + ib$ with $a, b \in \text{Re}(X)$, given explicitly by $a := \text{Re}(x)$ and $b := \text{Im}(x)$. The result follows from a simple application of the triangle inequality. Indeed, we have
\begin{align}
2\lVert x - \text{Re}(x)\rVert \;=\; \lVert 2ib \rVert \;&=\; \lVert (a + ib) - y \;+\; y - (a - ib) \rVert \\[1ex]
&\leq\; \lVert (a + ib) - y \rVert + \lVert y - (a - ib) \rVert \\[1ex]
&=\; \lVert (a + ib) - y \rVert + \lVert (a - ib) - y \rVert \\[1ex]
&=\; \lVert (a + ib) - y \rVert + \big\lVert\, \overline{(a + ib) - y} \,\big\rVert \tag*{(since $a, b, y \in \text{Re}(X)$)} \\[1ex]
&=\; 2\lVert (a + ib) - y \rVert \tag*{(since $\bar{\ }\,$ is isometric)} \\[1ex]
&=\; 2\lVert x - y \rVert. \tag*{$\blacksquare$}
\end{align}
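The proposition can be spot-checked numerically in the setting relevant here: $X = M_n(\mathbb{C})$ with entrywise conjugation and the trace norm (a sketch assuming `numpy`; the helper `trace_norm` is mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def trace_norm(M):
    """Schatten 1-norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

# In M_n(C) with entrywise conjugation, the "real" elements are the real
# matrices, and Re(x) is the entrywise real part.
x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
baseline = trace_norm(x - x.real)

# ||x - y||_1 >= ||x - Re(x)||_1 for every real matrix y.
for _ in range(500):
    y = rng.standard_normal((n, n))
    assert trace_norm(x - y) >= baseline - 1e-10
```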

Now all that remains is to show that the entry-wise complex conjugation $\bar{\ }\, : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ preserves the trace norm. This is not so hard to do. For arbitrary matrices $A, B \in M_n(\mathbb{C})$ we have $\overline{AB} = \overline{A}\,\overline{B}$ and $\overline{A}^* = \overline{A^*}$, hence $$ \overline{A}^* \, \overline{A} \;=\; \overline{A^* A}. $$ Now it is easy to see that $\overline{|A|}$ is a positive semidefinite square root of $\overline{A^* A}$, so it follows that $\overline{|A|} = |\,\overline{A}\,|$ holds. Finally, we find $$ \lVert \overline{A} \rVert_1 \;=\; \text{tr}\big(\big|\,\overline{A}\,\big|\big) \;=\; \text{tr}\left(\overline{|A|}\right) \;=\; \overline{\text{tr}(|A|)} \;=\; \overline{\lVert A \rVert_1} \;=\; \lVert A \rVert_1. $$
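Both the intermediate identity $\overline{|A|} = |\,\overline{A}\,|$ and the resulting norm invariance are easy to verify numerically (a sketch assuming `numpy`; `abs_mat` is my helper for $|M| = (M^*M)^{1/2}$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def abs_mat(M):
    """|M| = (M^* M)^{1/2}, computed from the SVD M = U S V^H as V S V^H."""
    U, s, Vh = np.linalg.svd(M)
    return Vh.conj().T @ np.diag(s) @ Vh

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()

# conj(|A|) is the (unique) PSD square root of conj(A^* A), i.e. |conj(A)|.
assert np.allclose(np.conj(abs_mat(A)), abs_mat(np.conj(A)))

# Hence entrywise conjugation preserves the trace norm.
print(np.isclose(trace_norm(np.conj(A)), trace_norm(A)))  # True
```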


In summary: the entire result follows essentially from one application of the triangle inequality.



Old answer — needlessly complicated:

Your intuition is correct, but it's not particularly easy to prove this. We will need quite a few facts about the trace norm. Let $\mathbb{C}^{n \times n}$ denote the algebra of $n \times n$ complex matrices, and let $\lVert\,\cdot\,\rVert_\infty : \mathbb{C}^{n \times n} \to \mathbb{R}_{\geq 0}$ denote the operator norm (also known as the Schatten $\infty$-norm).


Proposition 1 ("non-commutative Hölder inequality", case $p = 1$ and $q = \infty$). For any $X, Y \in \mathbb{C}^{n \times n}$ we have $\lVert XY \rVert_1 \leq \lVert X \rVert_1 \cdot \lVert Y \rVert_\infty$.

Proof. See your favourite textbook on Schatten norms. (The only references I can produce at this time are books in functional analysis and operator theory, which are likely to be inappropriate due to being way too advanced.)


Proposition 2. For any $X \in \mathbb{C}^{n \times n}$ we have $|\text{tr}(X)| \leq \lVert X \rVert_1$.

Proof. Use this question and the triangle inequality.


Proposition 3. For any $X \in \mathbb{C}^{n \times n}$ there exists a unitary $U \in \mathbb{C}^{n \times n}$ such that $|X| = UX$.

Proof. This is (equivalent to) the polar decomposition.


Corollary 4. For any $X \in \mathbb{C}^{n \times n}$ we have $$ \lVert X \rVert_1 = \max_{\substack{Y \in \mathbb{C}^{n \times n} \\ \lVert Y \rVert_\infty \leq 1}} |\text{tr}(XY)|. $$ Furthermore, there exists some $U \in \mathbb{C}^{n \times n}$ with $\lVert U \rVert_\infty = 1$ such that $\lVert X \rVert_1 = \text{tr}(XU)$ holds (note the omission of the absolute value).

Proof. By Propositions 1 and 2, any $Y \in \mathbb{C}^{n \times n}$ with $\lVert Y \rVert_\infty \leq 1$ satisfies $$ |\text{tr}(XY)| \leq \lVert XY \rVert_1 \leq \lVert X \rVert_1 \cdot \lVert Y \rVert_\infty \leq \lVert X \rVert_1. $$ By Proposition 3, there is some unitary $U \in \mathbb{C}^{n \times n}$ such that $|X| = UX$ holds. Note that we have $\lVert U \rVert_\infty = 1$ since $U$ is unitary. Now we have $\lVert X \rVert_1 = \text{tr}(|X|) = \text{tr}(UX) = \text{tr}(XU)$.
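The extremizing $U$ of Corollary 4 can be constructed explicitly from the SVD, which makes for a concrete check (a sketch assuming `numpy`; the construction $U = V W^H$ is a standard way to realize the polar decomposition, not taken from the post):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

W, s, Vh = np.linalg.svd(X)        # X = W diag(s) V^H

# |X| = V diag(s) V^H, and U := V W^H is unitary with U X = |X|.
U = Vh.conj().T @ W.conj().T
absX = Vh.conj().T @ np.diag(s) @ Vh
assert np.allclose(U @ X, absX)

# The dual characterization: tr(XU) = ||X||_1, with ||U||_inf = 1.
assert np.isclose(np.trace(X @ U).real, s.sum())
assert np.isclose(np.linalg.svd(U, compute_uv=False).max(), 1.0)
```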


Now that we have all the ingredients, let's get on with the exercise at hand. Recall that we have $X = A + iB$ with $A$ real symmetric and $B$ real skew-symmetric, and we want to find a real symmetric matrix $Y$ minimising the distance $\lVert X - Y \rVert_1$.

Assume without loss of generality that $A = 0$ holds, so that we have $X = iB$. We prove that the minimum occurs at $Y = 0$. Choose some $U \in \mathbb{C}^{n \times n}$ satisfying $\lVert U \rVert_\infty \leq 1$ as well as $\lVert X \rVert_1 = \text{tr}(XU)$. Since $X^T = -X$, we have $$ \text{tr}(XU) = \text{tr}\big((XU)^T\big) = \text{tr}\big(U^T X^T\big) = -\text{tr}\big(U^T X\big) = -\text{tr}\big(X U^T\big), $$ so we may assume without loss of generality that $U$ is skew-symmetric (replace $U$ by $\tfrac{1}{2}(U - U^T)$ if necessary; the properties $\lVert U \rVert_\infty \leq 1$ and $\lVert X \rVert_1 = \text{tr}(XU)$ still hold).

Now let $Y$ be an arbitrary real symmetric matrix; then we have $$ \text{tr}(YU) = \text{tr}\big((YU)^T\big) = \text{tr}\big(U^T Y^T\big) = \text{tr}\big(U^T Y\big) = -\text{tr}(UY) = -\text{tr}(YU), $$ hence $\text{tr}(YU) = 0$. As a result, by Corollary 4 we have $$ \lVert X - Y \rVert_1 \geq \big|\text{tr}\big((X - Y)U\big)\big| = |\text{tr}(XU) - \text{tr}(YU)| = \big|\lVert X \rVert_1 - 0\big| = \lVert X \rVert_1, $$ which proves the result.
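The whole argument can be traced numerically: skew-symmetrize the extremizing $U$, check $\text{tr}(YU) = 0$, and confirm the lower bound (a sketch assuming `numpy`; the SVD-based construction of $U$ is my own illustration of Corollary 4):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
B = (M - M.T) / 2
X = 1j * B                       # the case A = 0, so X is purely "skew" part

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()

# U from Corollary 4: tr(XU) = ||X||_1 with ||U||_inf <= 1.
W, s, Vh = np.linalg.svd(X)
U = Vh.conj().T @ W.conj().T
U = (U - U.T) / 2                # skew-symmetrize; tr(XU) unchanged since tr(XU^T) = -tr(XU)

assert np.isclose(np.trace(X @ U).real, s.sum())
assert np.linalg.svd(U, compute_uv=False).max() <= 1 + 1e-10

# tr(YU) = 0 for every real symmetric Y, so ||X - Y||_1 >= |tr((X-Y)U)| = ||X||_1.
P = rng.standard_normal((n, n))
Y = (P + P.T) / 2
assert abs(np.trace(Y @ U)) < 1e-10
assert trace_norm(X - Y) >= s.sum() - 1e-10
```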


Closing remarks:

  1. It might be true that we can always choose a skew-symmetric unitary $U$ in the polar decomposition of a skew-symmetric self-adjoint matrix; I'm not entirely sure about that. In any case, proving that seemed like a hassle, so I decided on the above workaround.
  2. Note that we didn't really use that $Y$ was real, only that it was symmetric. As such, the result extends to the minimisation problem over all complex symmetric matrices. (Come to think of it, we didn't really use that $B$ was real either...)
  3. The minimum is not always unique. If $n$ is even, then the spectrum of $X$ is of the form $\{-\lambda_1, \lambda_1, -\lambda_2, \lambda_2, \ldots\}$ with $0 \leq \lambda_1 \leq \lambda_2 \leq \cdots$. If we have $\mu \in \mathbb{R}$ and $0 < |\mu| \leq \lambda_1$, then $Y = \mu I$ is also at minimal distance. After all, half of the eigenvalues are moved closer to zero and half are moved away from zero, so the sum of their absolute values remains the same.
  4. The same result holds if we replace the trace norm by the Hilbert–Schmidt norm $\lVert\,\cdot\,\rVert_2$ (also known as the Schatten 2-norm), and the proof is much easier in this case. I'll leave this as an exercise. ;-)
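Remark 3 (non-uniqueness for even $n$) is also easy to check numerically (a sketch assuming `numpy`; for a generic skew-symmetric $B$ the smallest $\lambda_1$ is strictly positive):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4                            # even dimension
M = rng.standard_normal((n, n))
B = (M - M.T) / 2
X = 1j * B                       # Hermitian, spectrum {-l1, l1, -l2, l2}

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()

eigs = np.linalg.eigvalsh(X)
l1 = np.abs(eigs).min()          # smallest |eigenvalue|
mu = 0.5 * l1                    # any mu with 0 < |mu| <= l1 works

# Y = mu*I is symmetric and achieves the same minimal distance ||X||_1.
assert np.isclose(trace_norm(X - mu * np.eye(n)), trace_norm(X))
```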

Correct answer by Josse van Dobben de Bruyn on January 25, 2021

If the distance is measured using the Frobenius norm, then the problem is almost trivial. (This post works through the exercise that Josse suggested at the end of his answer.)

Use a real symmetric matrix $A = A^T$ and a real skew-symmetric matrix $B = -B^T$ to construct the Hermitian matrix $$ X = A + iB \quad\implies\quad X = X^H, \quad X^T = X^* $$

Use an unconstrained matrix $U \in {\mathbb C}^{n \times n}$ to construct the complex symmetric matrix $$ Y = {\rm Sym}(U) \doteq \tfrac{1}{2}(U + U^T) \quad\implies\quad Y = Y^T, \quad Y^H = Y^* $$ Calculate the gradient of the objective with respect to $U$: $$\eqalign{ \phi &= \big\|X - Y\big\|_F^2 \;=\; (X - Y)^* : (X - Y) \\ d\phi &= (Y - X)^* : dY \;+\; conj \\ &= (Y - X)^* : {\rm Sym}(dU) \;+\; conj \\ &= {\rm Sym}(Y - X)^* : dU \;+\; conj \\ &= \big(Y - \tfrac{1}{2}(X + X^*)\big)^* : dU \;+\; conj \\ &= \big(Y - {\cal Re}(X)\big)^* : dU \;+\; conj \\ \frac{\partial \phi}{\partial U} &= \big(Y - {\cal Re}(X)\big)^*, \qquad \frac{\partial \phi}{\partial U^*} = Y - {\cal Re}(X) \\ }$$ where "$+\,conj$" denotes the complex conjugate of the first half of the expression.

Anyway, setting either gradient to zero yields $$ Y = {\cal Re}(X) = A $$


In the above, a colon is used to denote the trace product, i.e. $$\eqalign{ A:B &= {\rm tr}(A^T B) = \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \;=\; B:A \\ X^*:X &= {\rm tr}(X^H X) \;\doteq\; \big\|X\big\|_F^2 \\ }$$ It is interesting that the result obtained using the Frobenius norm is apparently the same as that of a much more difficult derivation involving the nuclear norm.
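The stationarity condition $Y = {\cal Re}(X) = A$ can be confirmed directly by comparing the candidate minimizer against random complex symmetric competitors (a sketch assuming `numpy`; the random search is my own illustration, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
N = rng.standard_normal((n, n))
B = (N - N.T) / 2
X = A + 1j * B

best = np.linalg.norm(X - A, 'fro')      # candidate minimizer Y = Re(X) = A

# Random complex symmetric competitors never do better.
for _ in range(300):
    P = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Y = (P + P.T) / 2
    assert np.linalg.norm(X - Y, 'fro') >= best - 1e-10

print(np.isclose(best, np.linalg.norm(B)))  # True: the distance is ||B||_F
```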

Answered by greg on January 25, 2021
