Notes on the Frank-Wolfe algorithm, Part I

Category: optimization
#optimization #Frank-Wolfe


This blog post is the first in a series discussing different theoretical and practical aspects of the Frank-Wolfe algorithm.

The Frank-Wolfe Algorithm
Example …
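As a concrete reference point for the series, here is a minimal sketch of the classical Frank-Wolfe iteration with the common step size $\gamma_t = 2/(t+2)$. It assumes the constraint set is the probability simplex, where the linear minimization oracle simply returns the vertex with the smallest gradient coordinate. The function names and the least-squares test problem are hypothetical and are not taken from the post.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex (assumed domain):
    argmin_{s in simplex} <grad, s> is the vertex e_i with i = argmin_i grad_i."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, n_iter=100):
    """Classical Frank-Wolfe with the open-loop step size 2 / (t + 2).
    Hypothetical helper written for illustration."""
    x = x0.copy()
    for t in range(n_iter):
        s = lmo_simplex(grad_f(x))        # vertex minimizing the linearization at x
        gamma = 2.0 / (t + 2.0)           # standard step-size schedule
        x = (1 - gamma) * x + gamma * s   # convex combination, so x stays feasible
    return x

# Toy problem: 0.5 * ||A x - b||^2 restricted to the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)

x = frank_wolfe(grad_f, np.ones(5) / 5, n_iter=500)
print(x, x.sum())  # iterates stay nonnegative and sum to 1
```

Each iterate is a convex combination of the previous iterate and a vertex, so no projection is ever needed. This projection-free property is the usual reason to prefer Frank-Wolfe on sets where projecting is expensive.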

Optimization inequalities cheatsheet

Category: optimization
#optimization #cheatsheet

Most proofs in optimization consist of using inequalities for a particular function class in some creative way. This is a cheatsheet of the inequalities that I use most often. It considers the classes of convex, strongly convex and $L$-smooth functions.

Setting. $f$ is a function $\mathbb{R}^p \to …
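For orientation, here are the three first-order inequalities behind these classes in their standard textbook form, for all $x, y \in \mathbb{R}^p$. This assumes $f$ is differentiable and writes $\mu > 0$ for the strong convexity constant, a symbol this excerpt does not fix. The last line is the quadratic upper bound implied by $\nabla f$ being $L$-Lipschitz.

$$
\begin{aligned}
&\text{convex:} && f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle \\
&\mu\text{-strongly convex:} && f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{\mu}{2}\|y - x\|^2 \\
&L\text{-smooth:} && f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{L}{2}\|y - x\|^2
\end{aligned}
$$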

A fully asynchronous variant of the SAGA algorithm

Category: optimization
#optimization #asynchronous #SAGA

My friend Rémi Leblond has recently uploaded our preprint on an asynchronous version of the SAGA optimization algorithm to arXiv.

The main contribution is a parallel (fully asynchronous, with no locks) variant of the SAGA algorithm. SAGA is a stochastic variance-reduced method for general optimization, especially well suited for problems …
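For readers new to SAGA, here is a minimal sequential sketch of the standard SAGA iteration applied to $\ell_2$-regularized least squares. It keeps one stored gradient $\alpha_i$ per sample and steps along $\nabla f_i(x) - \alpha_i + \frac{1}{n}\sum_j \alpha_j$. It is not the asynchronous variant from the preprint. The objective, the step size $1/(3L_{\max})$, the zero-initialized memory and the name `saga_ridge` are illustrative assumptions.

```python
import numpy as np

def saga_ridge(A, b, lam, n_epochs=100, seed=0):
    """Sequential SAGA (not the asynchronous variant) for the assumed objective
    (1/n) * sum_i [0.5*(a_i @ x - b_i)**2 + 0.5*lam*||x||^2]. Hypothetical helper."""
    n, p = A.shape
    rng = np.random.default_rng(seed)
    L_max = np.max(np.sum(A ** 2, axis=1)) + lam  # largest smoothness constant of the f_i
    step = 1.0 / (3.0 * L_max)

    def grad_i(x, i):
        # gradient of f_i(x) = 0.5*(a_i @ x - b_i)**2 + 0.5*lam*||x||^2
        return A[i] * (A[i] @ x - b[i]) + lam * x

    x = np.zeros(p)
    memory = np.zeros((n, p))          # last gradient stored for each sample
    mem_mean = memory.mean(axis=0)     # running average of the stored gradients
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        g = grad_i(x, i)
        # variance-reduced direction: an unbiased estimate of the full gradient
        x -= step * (g - memory[i] + mem_mean)
        mem_mean += (g - memory[i]) / n  # update the average in O(p) before overwriting
        memory[i] = g
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)
x_saga = saga_ridge(A, b, lam=0.1)
# closed-form minimizer: (A^T A / n + lam I) x = A^T b / n
x_star = np.linalg.solve(A.T @ A / 100 + 0.1 * np.eye(5), A.T @ b / 100)
print(np.max(np.abs(x_saga - x_star)))
```

Storing full gradient vectors costs $O(np)$ memory. For linear models like this one, storing one scalar residual per sample is a common cheaper alternative.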

Hyperparameter optimization with approximate gradient

Category: optimization
#machine learning #hyperparameters #HOAG

TL;DR: I describe a method for hyperparameter optimization by gradient descent.

Most machine learning models rely on at least one hyperparameter to control model complexity. For example, logistic regression commonly relies on a regularization parameter that controls the amount of $\ell_2$ regularization. Similarly, kernel methods have hyperparameters …
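Gradient-based hyperparameter optimization descends on the gradient of a validation loss with respect to the regularization parameter. As a hedged illustration, and not the approximate-gradient method of the post, here is an exact hypergradient for ridge regression obtained by implicit differentiation. The logistic loss is swapped for a squared loss so that the inner problem has a closed form, and $\lambda = e^{\theta}$ is parametrized in log-space to keep it positive. The training/validation matrices `A`, `A_val`, the function names and the step size are hypothetical.

```python
import numpy as np

def inner_solution(A, b, lam):
    """Ridge solution x(lam) = argmin 0.5*||A x - b||^2 + 0.5*lam*||x||^2 (assumed inner problem)."""
    H = A.T @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(H, A.T @ b), H

def val_loss_and_hypergrad(theta, A, b, A_val, b_val):
    """Validation loss and its exact derivative w.r.t. theta = log(lam). Hypothetical helper."""
    lam = np.exp(theta)
    x, H = inner_solution(A, b, lam)
    residual = A_val @ x - b_val
    loss = 0.5 * residual @ residual
    grad_x = A_val.T @ residual             # gradient of the validation loss in x
    # implicit function theorem on H x = A^T b:  x + H dx/dlam = 0  =>  dx/dlam = -H^{-1} x
    dloss_dlam = -grad_x @ np.linalg.solve(H, x)
    return loss, lam * dloss_dlam           # chain rule for lam = exp(theta)

# Synthetic data (hypothetical) and plain gradient descent on theta.
rng = np.random.default_rng(0)
A, A_val = rng.standard_normal((50, 10)), rng.standard_normal((50, 10))
w = rng.standard_normal(10)
b = A @ w + rng.standard_normal(50)
b_val = A_val @ w + rng.standard_normal(50)

theta = 0.0
for _ in range(100):
    loss, g = val_loss_and_hypergrad(theta, A, b, A_val, b_val)
    theta -= 0.1 * g                        # fixed step size, hand-picked for this toy problem
print(np.exp(theta), loss)
```

The cost of each hypergradient is dominated by solving the inner problem and one linear system in $A^\top A + \lambda I$. Solving these only approximately is a natural way to reduce that cost on large problems.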