$$ \DeclareMathOperator*{\argmin}{{arg\,min}} \DeclareMathOperator*{\argmax}{{arg\,max}} \DeclareMathOperator*{\minimize}{{minimize}} $$

A unifying principle: surrogate optimization

We aim to solve an optimization problem of the form

\begin{equation}\label{eq:fw_objective} \minimize_{\boldsymbol{x} \in \RR^p} f(\boldsymbol{x}) ~. \end{equation}

A key idea is to start at an initial estimate $\xx_0$ and successively minimize an approximating function $Q_t(\xx)$: \begin{equation} \xx_{t+1} = \argmin_{\xx \in \RR^p} Q_t(\xx)~. \end{equation}

We will call $Q_t$ a surrogate function; it is also known as a merit function.
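The iteration above can be sketched in a few lines of code. This is a minimal illustration, not a library implementation: `surrogate_descent` and the example step function are hypothetical names, and the example assumes the surrogate around $\xx_t$ is minimized by a gradient step on $f(\xx) = \|\xx - \boldsymbol{1}\|^2$.

```python
import numpy as np

def surrogate_descent(minimize_surrogate, x0, n_iter=50):
    """Iterate x_{t+1} = argmin_x Q_t(x), where minimize_surrogate(x_t)
    returns the minimizer of the surrogate built around x_t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = minimize_surrogate(x)
    return x

# Hypothetical example: f(x) = ||x - 1||^2, with a surrogate whose
# minimizer is a gradient step of size 0.25 (an assumption).
step = lambda x: x - 0.25 * (2.0 * (x - np.ones_like(x)))
x_star = surrogate_descent(step, np.zeros(3))
```

Here each call to `minimize_surrogate` plays the role of the inner $\argmin$; different choices of surrogate will yield different algorithms in the sections below.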

How can we find such a surrogate function?

A good surrogate function should:

- approximate $f$ well, at least around the current iterate $\xx_t$;
- be easy to minimize, ideally in closed form.

Linear surrogates

The simplest surrogates we can think of are linear surrogates, that is, functions of the form $$ Q_t(\xx) = (\xx - \xx_t)^T \boldsymbol{b}_t + c_t~. $$

While simple, they are in general unbounded below, making their minimization problematic, although they do have uses in constrained optimization, where the minimization is restricted to a bounded set.
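The constrained case can be illustrated over the probability simplex, where a linear function always attains its minimum at a vertex; this is, for instance, the key subproblem in methods such as Frank-Wolfe. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def linear_surrogate_min_simplex(b):
    """argmin of Q(x) = b^T x over the probability simplex
    {x : x >= 0, sum(x) = 1}: the minimum is attained at the
    vertex e_i corresponding to the smallest coefficient of b."""
    x = np.zeros_like(b)
    x[np.argmin(b)] = 1.0
    return x

s = linear_surrogate_min_simplex(np.array([0.3, -1.2, 0.5]))
# s is the vertex selected by the smallest coefficient of b
```

Restricting the domain to a bounded set is what makes the linear surrogate's minimum well-defined.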

Quadratic surrogates

Slightly more complex are quadratic surrogates. These are functions of the form $$ Q_t(\xx) = (\xx - \xx_t)^T \boldsymbol{A} (\xx - \xx_t) + \boldsymbol{b}_t^T (\xx - \xx_t) + c_t~, $$ where $\boldsymbol{A}$ is a positive definite matrix, so that the minimizer exists and is unique.

Many classical methods arise as minimizers of quadratic surrogates: gradient descent, Newton's method, among others.
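To see how gradient descent arises, take $\boldsymbol{A} = \frac{1}{2\gamma} \boldsymbol{I}$ and $\boldsymbol{b}_t = \nabla f(\xx_t)$. Setting $\nabla Q_t(\xx) = \frac{1}{\gamma}(\xx - \xx_t) + \boldsymbol{b}_t = 0$ gives $\xx_{t+1} = \xx_t - \gamma \nabla f(\xx_t)$, a gradient step. A minimal numerical sketch, where the step size and test function are assumptions:

```python
import numpy as np

def quadratic_surrogate_step(grad_f, x, gamma):
    """Minimize Q_t(y) = (1/(2*gamma)) ||y - x||^2 + grad_f(x)^T (y - x) + c_t.
    Setting the gradient of Q_t to zero gives y = x - gamma * grad_f(x)."""
    return x - gamma * grad_f(x)

# Example: f(x) = 0.5 * ||x||^2, whose gradient is x (an assumption
# chosen so the minimizer is the origin).
grad_f = lambda x: x
x = np.array([4.0, -2.0])
for _ in range(100):
    x = quadratic_surrogate_step(grad_f, x, gamma=0.1)
```

Choosing $\boldsymbol{A}$ as (half) the Hessian $\frac{1}{2}\nabla^2 f(\xx_t)$ instead yields Newton's method, at the price of a more expensive surrogate minimization.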

Next: Gradient Descent

Previous: Introduction