Relation between gradient norm and distance to optimizer

Asked on Mathematics by durdi on February 13, 2021

For context, this problem arose when I was trying to prove convergence rate bounds for convex optimization algorithms.

Let $f: \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^n$ is convex. Moreover, let $f$ be $\mu$-strongly convex and $G$-Lipschitz continuous, with $L$-Lipschitz continuous gradients (a.k.a. $L$-smooth). We can also assume that $\|x - y\|_2 \leq R$ for all $x, y \in \mathcal{X}$.

Define $x^* := \arg\min_{x \in \mathcal{X}} f(x)$. My goal is to prove the following statement:

$$
\frac{1}{2L}\|\nabla f(x)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \leq 0, \quad \forall x \in \mathcal{X}. \tag{1}
$$

So far, I have been able to show that it holds in the special case where $L = \mu > 0$ and $x^*$ belongs to the relative interior of $\mathcal{X}$ (so that $\nabla f(x^*) = 0$). To do so, I used the following argument:

\begin{align*}
\frac{1}{2L}\|\nabla f(x)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2
&\leq \frac{1}{2L}\|\nabla f(x) - \nabla f(x^*)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \\
&\leq \frac{L}{2}\|x - x^*\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \\
&= 0.
\end{align*}
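
As a sanity check on this special case, the chain of (in)equalities can be verified numerically; a minimal sketch, assuming the quadratic $f(x) = \frac{\mu}{2}\|x - x^*\|_2^2$ (for which $L = \mu$ and $\nabla f(x^*) = 0$) and randomly drawn test points, both of which are illustrative choices:

```python
import numpy as np

# Special case L = mu: f(x) = (mu/2) * ||x - x_star||^2, so that
# grad f(x) = mu * (x - x_star) and grad f(x_star) = 0.
mu = L = 2.0
rng = np.random.default_rng(0)
x_star = rng.normal(size=5)

def lhs(x):
    """Left-hand side of (1): (1/2L)||grad f(x)||^2 - (mu/2)||x - x*||^2."""
    grad = mu * (x - x_star)
    d = x - x_star
    return np.dot(grad, grad) / (2 * L) - (mu / 2) * np.dot(d, d)

# For this f the two terms cancel exactly, so the LHS should be
# identically zero up to floating-point error at every sample.
worst = max(lhs(rng.normal(size=5)) for _ in range(1000))
print(worst)
```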

QUESTION: Is it possible to prove that $(1)$ holds for general $L, \mu$? If not, are there other special cases/assumptions under which the statement holds?
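
Before attempting a general proof, $(1)$ can also be probed numerically on a concrete $\mu$-strongly convex, $L$-smooth function; a minimal sketch, assuming the diagonal quadratic $f(x) = \frac{1}{2}(L x_1^2 + \mu x_2^2)$ with $x^* = 0$ (an illustrative choice attaining these exact constants):

```python
import numpy as np

# Probe function: f(x) = (1/2) * (L*x1^2 + mu*x2^2) is mu-strongly convex
# and L-smooth, with x_star = 0 and grad f(x) = (L*x1, mu*x2).
L, mu = 10.0, 1.0

def lhs(x):
    """Left-hand side of (1) for this particular f."""
    grad = np.array([L * x[0], mu * x[1]])
    return np.dot(grad, grad) / (2 * L) - (mu / 2) * np.dot(x, x)

rng = np.random.default_rng(0)
# Largest value of the LHS over random samples; any positive value
# would be a counterexample to (1) for this choice of L and mu.
worst = max(lhs(rng.normal(size=2)) for _ in range(10_000))
print(worst)
```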
