Keep the gradient flowing

Optimization inequalities cheatsheet

Most proofs in optimization consist in using inequalities for a particular function class in some creative way. This is a cheatsheet with inequalities that I use most often. It considers class of functions that are convex, strongly convex and $L$-smooth.

Setting. $f$ is a function $\mathbb{R}^p \to \mathbb{R}$. Below are a set of inequalities verified when $f$ belongs to a particular class of functions and $x, y, z \in \mathbb{R}^p$ are arbitrary elements in its domain. For simplicity I'm assuming that functions are differentiable, but most of these are also true replacing the gradient with a sub-gradient.

$f$ is $L$-smooth

This is the class of functions that are differentiable and its gradient is Lipschitz continuous.

  1. $\|\nabla f(y) - \nabla f(x) \| \leq L \|x - y\|$
  2. $|f(y) - f(x) - \langle \nabla f(x), y - x\rangle| \leq \frac{L}{2}\|y - x\|^2$
  3. $\nabla^2 f(x) \preceq L\qquad \text{ (assuming $f$ is twice differentiable)} $

  1. is the definition of gradient Lipschitz.
  2. follows from Taylor's theorem with remainder.
  3. follows from using $y = x + \varepsilon s$ in a and taking the limit $\varepsilon \to 0$.

$f$ is convex

  1. $f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda)f(y)$ for all $\lambda \in [0, 1]$.
  2. $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle$
  3. $0 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
  4. $f(\mathbb{E}X) \leq \mathbb{E}[f(X)]$ where $X$ is a random variable (Jensen's inequality).
  5. $x = \text{prox}_{\gamma f}(x) + \gamma \text{prox}_{f^*/\gamma}(x/\gamma)$, where $f^*$ is the Fenchel conjugate and $\text{prox}_{\gamma f}(x)$ is the proximal operator of $\gamma f$. This identity is sometimes referred to as Moreau's decomposition

$f$ is both $L$-smooth and convex

  1. $\frac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
  2. $0 \leq f(y) - f(x) - \langle \nabla f(x), y - x\rangle \leq \frac{L}{2}\|x - y\|^2$
  3. $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle - \frac{1}{2 L}\|\nabla f(x) - \nabla f(y)\|^2$
  4. $f(x) \leq f(y) + \langle \nabla f(z), x - y \rangle + \frac{L}{2}\|x - z\|^2$ (three points descent lemma)

  1. Follows from using inequality 2.b with the function $f - \frac{\mu}{2}\|\cdot\|^2$.
  2. TODO
  3. The function $f$ being smooth implies that its Fenchel conjugate $f^\star$ is $\frac{1}{L}$-strongly convex, so using inequality 4.a with $f^\star$ we get \begin{equation} f^\star(\alpha) \leq f^\star(\beta) + \langle \nabla f^\star(\alpha), \alpha - \beta\rangle - \frac{1}{2L}\|\alpha - \beta\|^2 \end{equation} Using the Fenchel identities $f^\star(\nabla f(x)) = \langle \nabla f(x), x\rangle - f(x)$ and $\nabla f^\star(\nabla f(x)) = x$ with $\alpha = \nabla f(y)$ and $\beta = \nabla f(x)$ we get \begin{align} \langle \nabla f(y), y\rangle - f(y) \leq \langle \nabla f(x), x\rangle - f(x) + \langle y, \nabla f(y) - \nabla f(x)\rangle - \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2 \end{align} Rearranging terms gives the desired inequality.There are other ways to prove this inequality that don't require to use the heavy machinery of Fenchel conjugates, see for example Nesterov's book. Personally, I find Nesterov's proof rather "magical", as one needs to come up with the right proxy functions, and prefer the straightforwardness of the above one.

$f$ is $\mu$-strongly convex

This includes the set of functions $f$ such that $f - \frac{\mu}{2}\|\cdot\|^2$ is convex. It includes the set of convex functions with $\mu=0$. Here $x^*$ denotes the minimizer of $f$.

  1. $f(x) \leq f(y) + \langle \nabla f(x), x - y \rangle - \frac{\mu}{2}\|x - y\|^2$
  2. $f(x) \leq f(y) + \langle \nabla f(y), x - y\rangle + \frac{1}{2\mu}\|\nabla f(x) - \nabla f(y)\|^2$
  3. $\mu\|x - y\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
  4. $\frac{\mu}{2}\|x-x^*\|^2\leq f(x) - f(x^*)$
  5. $f(\alpha x + (1 - \alpha)y) \leq \alpha f(x) + (1 - \alpha)f(y) - \alpha(1 - \alpha)\frac{\mu}{2}\|x - y\|^2$
  1. Follows from using inequality 2.b with the function $f - \frac{\mu}{2}\|\cdot\|^2$.

$f$ is both $L$-smooth and $\mu$-strongly convex.

  1. $\frac{\mu L}{\mu + L}\|x - y\|^2 + \frac{1}{\mu + L}\|\nabla f(x) - \nabla f(y)\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
  2. $\mu \preceq \nabla^2 f(x) \preceq L \qquad \text{ (assuming $f$ is twice differentiable)}$
  3. $f(x) \leq f(y)+ \langle \nabla f(x), x - y\rangle - \frac{\mu}{2}\|x - y\|^2 - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2$
  1. TODO
  2. TODO
  3. We have that if $f$ is $L$-smooth and $\mu$-strongly convex then $f - \frac{\mu}{2}\|\cdot\|^2$ is $(L-\mu)$-smooth and convex. Using inequality 3.c on the function $f - \frac{\mu}{2}\|\cdot\|^2$ we then have \begin{align} f(x) &\leq f(y) - \frac{\mu}{2}\|y\|^2 + \frac{\mu}{2}\|x\|^2 + \langle \nabla f(x) - \mu x, x - y\rangle - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2 \\ &= f(y)+ \langle \nabla f(x), x - y\rangle - \frac{\mu}{2}\|x - y\|^2 - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2 \end{align}

Misc

Another great source of general inequalities is this cheatsheet by László Kozma.

References

Most of these inequalities appear in the Book: "Introductory lectures on convex optimization: A basic course" by Nesterov (2013, Springer Science & Business Media). Another good (and free) resource is the book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe.