Optimization inequalities cheatsheet
Most proofs in optimization consist in using inequalities for a particular function class in some creative way. This is a cheatsheet with the inequalities that I use most often. It considers the classes of functions that are convex, strongly convex and $L$-smooth.
Setting. $f$ is a function $\mathbb{R}^p \to \mathbb{R}$ and $x, y, z \in \mathbb{R}^p$ are arbitrary elements in its domain. Below is a set of inequalities that hold when $f$ belongs to a particular class of functions. For simplicity I assume that functions are differentiable, but most of these remain true when the gradient is replaced with a subgradient.
1. $f$ is $L$-smooth
This is the class of differentiable functions whose gradient is Lipschitz continuous with constant $L$.
- a. $\|\nabla f(y) - \nabla f(x) \| \leq L \|x - y\|$
- b. $|f(y) - f(x) - \langle \nabla f(x), y - x\rangle| \leq \frac{L}{2}\|y - x\|^2$
- c. $\nabla^2 f(x) \preceq L \cdot I\qquad \text{ (assuming $f$ is twice differentiable)} $
Proofs.
- a. is the definition of a Lipschitz continuous gradient.
- b. follows from Taylor's theorem with remainder.
- c. follows from using $y = x + \varepsilon s$ in a. and taking the limit $\varepsilon \to 0$.
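As a quick sanity check, here is a minimal numerical verification of a. and b. on a randomly generated quadratic, for which the smoothness constant is the largest eigenvalue of the Hessian (the data, seed and tolerance are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
B = rng.standard_normal((p, p))
A = B.T @ B                        # symmetric PSD Hessian
b = rng.standard_normal(p)
L = np.linalg.eigvalsh(A).max()    # smoothness constant = largest eigenvalue

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b

x, y = rng.standard_normal(p), rng.standard_normal(p)

# a. the gradient is L-Lipschitz
assert np.linalg.norm(grad(y) - grad(x)) <= L * np.linalg.norm(x - y) + 1e-10
# b. quadratic bound on the linearization error
assert abs(f(y) - f(x) - grad(x) @ (y - x)) <= L / 2 * np.linalg.norm(y - x) ** 2 + 1e-10
```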
2. $f$ is convex
- a. $f(\lambda x + (1 - \lambda)y) \leq \lambda f(x) + (1 - \lambda)f(y)$ for all $\lambda \in [0, 1]$.
- b. $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle$
- c. $0 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- d. $f(\mathbb{E}X) \leq \mathbb{E}[f(X)]$, where $X$ is a random variable (Jensen's inequality).
- e. $x = \text{prox}_{\gamma f}(x) + \gamma\, \text{prox}_{f^*/\gamma}(x/\gamma)$, where $f^*$ is the Fenchel conjugate of $f$ and $\text{prox}_{\gamma f}$ is the proximal operator of $\gamma f$. This identity is sometimes referred to as Moreau's decomposition; a numerical check follows this list.
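To make Moreau's decomposition concrete, here is a minimal sketch checking it for $f = \|\cdot\|_1$, whose proximal operator is soft-thresholding and whose conjugate is the indicator of the $\ell_\infty$ unit ball (the vector and step size below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
gamma = 0.7

# prox of gamma * ||.||_1 is soft-thresholding at level gamma
prox_l1 = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
# the conjugate of ||.||_1 is the indicator of the l-inf unit ball;
# its prox (for any positive scaling) is the projection onto that ball
proj_linf = np.clip(x / gamma, -1.0, 1.0)

# Moreau's decomposition: x = prox_{gamma f}(x) + gamma * prox_{f*/gamma}(x/gamma)
assert np.allclose(x, prox_l1 + gamma * proj_linf)
```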
3. $f$ is both $L$-smooth and convex
- a. $\frac{1}{L}\|\nabla f(x) - \nabla f(y)\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- b. $0 \leq f(y) - f(x) - \langle \nabla f(x), y - x\rangle \leq \frac{L}{2}\|x - y\|^2$
- c. $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle - \frac{1}{2 L}\|\nabla f(x) - \nabla f(y)\|^2$
- d. $f(x) \leq f(y) + \langle \nabla f(z), x - y \rangle + \frac{L}{2}\|x - z\|^2$ (three points descent lemma)
Proofs.
- a. follows from adding inequality c. to itself with the roles of $x$ and $y$ exchanged.
- b. the left inequality is 2.b rearranged; the right one follows from 1.b.
- c. the function $f$ being $L$-smooth implies that its Fenchel conjugate $f^\star$ is $\frac{1}{L}$-strongly convex, so using inequality 4.a with $f^\star$ (and $\mu = \frac{1}{L}$) we get
\begin{equation}
f^\star(\alpha) \leq f^\star(\beta) + \langle \nabla f^\star(\alpha), \alpha - \beta\rangle -
\frac{1}{2L}\|\alpha - \beta\|^2
\end{equation}
Using the Fenchel
identities $f^\star(\nabla f(x)) = \langle \nabla f(x), x\rangle - f(x)$ and $\nabla f^\star(\nabla
f(x)) = x$ with $\alpha = \nabla f(y)$ and $\beta = \nabla f(x)$ we get
\begin{align}
\langle \nabla f(y), y\rangle - f(y) \leq \langle \nabla f(x), x\rangle - f(x) + \langle y, \nabla f(y) -
\nabla f(x)\rangle - \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2
\end{align}
Rearranging terms gives the desired inequality.
There are other ways to prove this inequality that don't require the heavy machinery of Fenchel conjugates, see for example Nesterov's book. Personally, I find Nesterov's proof rather "magical", as one needs to come up with the right proxy functions, and I prefer the straightforwardness of the one above.
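As a complement, below is a small numerical check of inequalities a., c. and d. on a logistic loss, which is convex and $L$-smooth with $L = \|A\|_2^2 / 4$; the data, seed and tolerances are of course arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 5
A = rng.standard_normal((n, p))

# logistic loss: convex and L-smooth with L = ||A||_2^2 / 4
f = lambda x: np.sum(np.logaddexp(0, A @ x))
grad = lambda x: A.T @ (1.0 / (1.0 + np.exp(-A @ x)))
L = np.linalg.norm(A, 2) ** 2 / 4

x, y, z = (rng.standard_normal(p) for _ in range(3))
gx, gy = grad(x), grad(y)

# a. co-coercivity of the gradient
assert np.linalg.norm(gx - gy) ** 2 / L <= (gx - gy) @ (x - y) + 1e-10
# c. upper bound with the gradient-difference term
assert f(x) <= f(y) + gx @ (x - y) - np.linalg.norm(gx - gy) ** 2 / (2 * L) + 1e-10
# d. three points descent lemma
assert f(x) <= f(y) + grad(z) @ (x - y) + L / 2 * np.linalg.norm(x - z) ** 2 + 1e-10
```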
4. $f$ is $\mu$-strongly convex
This is the set of functions $f$ such that $f - \frac{\mu}{2}\|\cdot\|^2$ is convex; for $\mu = 0$ it coincides with the set of convex functions. Here $x^*$ denotes the minimizer of $f$.
- a. $f(x) \leq f(y) + \langle \nabla f(x), x - y \rangle - \frac{\mu}{2}\|x - y\|^2$
- b. $f(x) \leq f(y) + \langle \nabla f(y), x - y\rangle + \frac{1}{2\mu}\|\nabla f(x) - \nabla f(y)\|^2$
- c. $\mu\|x - y\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- d. $\frac{\mu}{2}\|x - x^*\|^2 \leq f(x) - f(x^*)$
- e. $f(\alpha x + (1 - \alpha)y) \leq \alpha f(x) + (1 - \alpha)f(y) - \alpha(1 - \alpha)\frac{\mu}{2}\|x - y\|^2$ for all $\alpha \in [0, 1]$.
Proofs.
- a. follows from using inequality 2.b with the function $f - \frac{\mu}{2}\|\cdot\|^2$.
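As before, a small numerical sanity check of c. and d., this time on a random strongly convex quadratic whose strong convexity constant is the smallest eigenvalue of its Hessian (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
B = rng.standard_normal((p, p))
A = B.T @ B + np.eye(p)            # positive definite Hessian
b = rng.standard_normal(p)
mu = np.linalg.eigvalsh(A).min()   # strong convexity constant

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b
x_star = np.linalg.solve(A, -b)    # minimizer: grad(x_star) = 0

x, y = rng.standard_normal(p), rng.standard_normal(p)

# c. strong monotonicity of the gradient
assert mu * np.linalg.norm(x - y) ** 2 <= (grad(x) - grad(y)) @ (x - y) + 1e-10
# d. quadratic growth around the minimizer
assert mu / 2 * np.linalg.norm(x - x_star) ** 2 <= f(x) - f(x_star) + 1e-10
```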
5. $f$ is both $L$-smooth and $\mu$-strongly convex
- a. $\frac{\mu L}{\mu + L}\|x - y\|^2 + \frac{1}{\mu + L}\|\nabla f(x) - \nabla f(y)\|^2 \leq \langle \nabla f(x) - \nabla f(y), x - y\rangle$
- b. $\mu \cdot I \preceq \nabla^2 f(x) \preceq L \cdot I \qquad \text{ (assuming $f$ is twice differentiable)}$
- c. $f(x) \leq f(y) + \langle \nabla f(x), x - y\rangle - \frac{\mu}{2}\|x - y\|^2 - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2$
Proofs.
- a. follows from applying inequality 3.a to the function $f - \frac{\mu}{2}\|\cdot\|^2$, which is $(L - \mu)$-smooth and convex, and rearranging terms.
- b. the upper bound is 1.c; the lower bound follows from using $y = x + \varepsilon s$ in 4.c and taking the limit $\varepsilon \to 0$.
- c. if $f$ is $L$-smooth and $\mu$-strongly convex, then $f - \frac{\mu}{2}\|\cdot\|^2$ is $(L - \mu)$-smooth and convex. Using inequality 3.c on this function we then have
\begin{align}
f(x) &\leq f(y) - \frac{\mu}{2}\|y\|^2 + \frac{\mu}{2}\|x\|^2 + \langle \nabla f(x) - \mu x, x - y\rangle - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2 \\
&= f(y) + \langle \nabla f(x), x - y\rangle - \frac{\mu}{2}\|x - y\|^2 - \frac{1}{2 (L - \mu)}\|\nabla f(x) - \nabla f(y) - \mu(x - y)\|^2
\end{align}
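Finally, a numerical check of inequality a. on the same kind of random positive definite quadratic, where $\mu$ and $L$ are the extreme eigenvalues of the Hessian (again just a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5
B = rng.standard_normal((p, p))
A = B.T @ B + np.eye(p)          # positive definite Hessian of f(x) = 0.5 x^T A x
eigs = np.linalg.eigvalsh(A)
mu, L = eigs.min(), eigs.max()   # strong convexity and smoothness constants

grad = lambda x: A @ x
x, y = rng.standard_normal(p), rng.standard_normal(p)
g, h = grad(x) - grad(y), x - y

# a. combined coercivity bound
lhs = mu * L / (mu + L) * (h @ h) + 1 / (mu + L) * (g @ g)
assert lhs <= g @ h + 1e-10
```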
Misc
Another great source of general inequalities is this cheatsheet by László Kozma.
References
Most of these inequalities appear in the book "Introductory Lectures on Convex Optimization: A Basic Course" by Yurii Nesterov (2013, Springer Science & Business Media). Another good (and free) resource is the book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe.