$$ \DeclareMathOperator*{\argmin}{{arg\,min}} \DeclareMathOperator*{\argmax}{{arg\,max}} \DeclareMathOperator*{\minimize}{{minimize}} $$

A unifying principle: surrogate optimization

We aim to solve an optimization problem of the form

\begin{equation}\label{eq:fw_objective} \minimize_{\boldsymbol{x} \in \RR^p} f(\boldsymbol{x}) ~. \end{equation}

A key idea is to start at an initial estimate $\xx_0$ and successively minimize an approximating function $Q_t(\xx)$: \begin{equation} \xx_{t+1} = \argmin_{\xx \in \RR^p} Q_t(\xx)~. \end{equation}

We will call $Q_t$ a surrogate function; it is also known as a merit function.
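The iteration above can be sketched in a few lines of code. This is a minimal illustration, not a library implementation: `surrogate_descent` and the example step function are hypothetical names, and the example assumes the surrogate around $\xx_t$ is minimized by a gradient step on $f(\xx) = \|\xx - \boldsymbol{1}\|^2$.

```python
import numpy as np

def surrogate_descent(minimize_surrogate, x0, n_iter=50):
    """Iterate x_{t+1} = argmin_x Q_t(x), where minimize_surrogate(x_t)
    returns the minimizer of the surrogate built around x_t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = minimize_surrogate(x)
    return x

# Hypothetical example: f(x) = ||x - 1||^2, with a surrogate whose
# minimizer is a gradient step of size 0.25 (an assumption).
step = lambda x: x - 0.25 * (2.0 * (x - np.ones_like(x)))
x_star = surrogate_descent(step, np.zeros(3))
```

Here each call to `minimize_surrogate` plays the role of the inner $\argmin$; different choices of surrogate will yield different algorithms in the sections below.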

How can we find such a surrogate function?

A good surrogate function should:

- approximate $f$ well, at least around the current iterate $\xx_t$;
- be easy to minimize, ideally in closed form.

Linear surrogates

The simplest surrogates we can think of are linear surrogates, that is, functions of the form $$ Q_t(\xx) = (\xx - \xx_t)^T \boldsymbol{b}_t + c_t~. $$

While simple, they are in general unbounded below, making their minimization problematic, although they do have uses in constrained optimization, where the minimization is restricted to a bounded set.
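The constrained case can be illustrated over the probability simplex, where a linear function always attains its minimum at a vertex; this is, for instance, the key subproblem in methods such as Frank-Wolfe. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def linear_surrogate_min_simplex(b):
    """argmin of Q(x) = b^T x over the probability simplex
    {x : x >= 0, sum(x) = 1}: the minimum is attained at the
    vertex e_i corresponding to the smallest coefficient of b."""
    x = np.zeros_like(b)
    x[np.argmin(b)] = 1.0
    return x

s = linear_surrogate_min_simplex(np.array([0.3, -1.2, 0.5]))
# s is the vertex selected by the smallest coefficient of b
```

Restricting the domain to a bounded set is what makes the linear surrogate's minimum well-defined.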

Quadratic surrogates

Slightly more complex are quadratic surrogates. These are functions of the form $$ Q_t(\xx) = (\xx - \xx_t)^T \boldsymbol{A} (\xx - \xx_t) + \boldsymbol{b}_t^T (\xx - \xx_t) + c_t~, $$ where $\boldsymbol{A}$ is a positive definite matrix, so that the minimizer exists and is unique.

Many classical methods arise as minimizers of quadratic surrogates: gradient descent, Newton's method, among others.
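To see how gradient descent arises, take $\boldsymbol{A} = \frac{1}{2\gamma} \boldsymbol{I}$ and $\boldsymbol{b}_t = \nabla f(\xx_t)$. Setting $\nabla Q_t(\xx) = \frac{1}{\gamma}(\xx - \xx_t) + \boldsymbol{b}_t = 0$ gives $\xx_{t+1} = \xx_t - \gamma \nabla f(\xx_t)$, a gradient step. A minimal numerical sketch, where the step size and test function are assumptions:

```python
import numpy as np

def quadratic_surrogate_step(grad_f, x, gamma):
    """Minimize Q_t(y) = (1/(2*gamma)) ||y - x||^2 + grad_f(x)^T (y - x) + c_t.
    Setting the gradient of Q_t to zero gives y = x - gamma * grad_f(x)."""
    return x - gamma * grad_f(x)

# Example: f(x) = 0.5 * ||x||^2, whose gradient is x (an assumption
# chosen so the minimizer is the origin).
grad_f = lambda x: x
x = np.array([4.0, -2.0])
for _ in range(100):
    x = quadratic_surrogate_step(grad_f, x, gamma=0.1)
```

Choosing $\boldsymbol{A}$ as (half) the Hessian $\frac{1}{2}\nabla^2 f(\xx_t)$ instead yields Newton's method, at the price of a more expensive surrogate minimization.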

Next: Gradient Descent

Previous: Introduction