On the Link Between Optimization and Polynomials, Part 6.

The Cost of Differentiating Through Optimization

By Fabian Pedregosa.

Category: optimization
#polynomials #bi-level

Sat 04 May 2024

Differentiating through optimization is a fundamental problem in hyperparameter optimization, dataset distillation, meta-learning and optimization as a layer, to name a few. In this blog post we'll look into one of the main approaches to differentiate through optimization: unrolled differentiation. With the help of polynomials, we'll be able to derive …

Optimization Nuggets: Stochastic Polyak Step-size, Part 2

Faster rates under strong convexity

By Fabian Pedregosa and Fabian Schaipp.

Category: optimization
#SGD #proofs

Sun 19 November 2023

This blog post discusses the convergence rate of the Stochastic Gradient Descent with Stochastic Polyak Step-size (SGD-SPS) algorithm for minimizing a finite sum objective. Building upon the proof of the previous post, we show that the convergence rate can be improved to O(1/t) under the additional assumption that …

Optimization Nuggets: Stochastic Polyak Step-size

A simple step-size tuner with optimal rates

By Fabian Pedregosa.

Category: optimization
#SGD #proofs

Fri 29 September 2023

The stochastic Polyak step-size (SPS) is a practical variant of the Polyak step-size for stochastic optimization. In this blog post, we'll discuss the algorithm and provide a simple analysis for convex objectives with bounded gradients.
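As a rough illustration of the idea, here is a minimal sketch of SGD with a Polyak-type step-size on a least-squares problem. It assumes one common form of the step-size, $(f_i(x) - f_i^*) / \|\nabla f_i(x)\|^2$, and a consistent linear system so that each $f_i^* = 0$. The function names, the `eps` safeguard and the test problem are illustrative assumptions, not necessarily the variant analyzed in the post.

```python
import numpy as np


def sps_step_size(loss_i, grad_i, loss_i_star=0.0, eps=1e-12):
    """Polyak-type step-size for one sampled loss.

    Computes (f_i(x) - f_i^*) / ||grad f_i(x)||^2. The small `eps` is an
    assumption added here to avoid dividing by zero at a stationary point.
    """
    return (loss_i - loss_i_star) / (np.dot(grad_i, grad_i) + eps)


def sgd_sps(A, b, n_steps=1000, seed=0):
    """SGD with the step-size above on f_i(x) = 0.5 * (a_i @ x - b_i)^2.

    Assumes A @ x = b has a solution, so the minimum of every f_i is 0.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(n_steps):
        i = rng.integers(A.shape[0])      # sample one data point uniformly
        residual = A[i] @ x - b[i]
        loss_i = 0.5 * residual ** 2      # value of the sampled loss
        grad_i = residual * A[i]          # gradient of the sampled loss
        x = x - sps_step_size(loss_i, grad_i) * grad_i
    return x
```

Note that no step-size has to be tuned by hand: the only extra information used is the optimal value of each sampled loss, which is zero in this interpolation setting.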

On the Convergence of the Unadjusted Langevin Algorithm

Insights From the Quadratic Model

By Fabian Pedregosa.

Category: sampling
#Langevin #diffusion

Wed 14 June 2023

The Langevin algorithm is a simple and powerful method to sample from a probability distribution. It's a key ingredient of some machine learning methods such as diffusion models and differentially private learning. In this post, I'll derive a simple convergence analysis of this method in the special case when the …
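For reference, the unadjusted Langevin update can be sketched in a few lines: a gradient step on the negative log-density plus Gaussian noise scaled by the square root of twice the step-size. The function name and arguments below are illustrative assumptions, and the standard Gaussian target in the usage note is chosen only because it is the simplest quadratic model.

```python
import numpy as np


def ula_sample(grad_f, x0, step_size, n_steps, seed=0):
    """Run the unadjusted Langevin algorithm targeting a density ~ exp(-f).

    Each entry of `x0` evolves as an independent chain when `grad_f`
    acts elementwise, which is the case in the usage note below.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step on f plus injected Gaussian noise.
        x = x - step_size * grad_f(x) + np.sqrt(2.0 * step_size) * noise
    return x
```

With `grad_f = lambda x: x` (so $f(x) = x^2/2$) and step-size $\gamma$, the iterates have stationary variance $1/(1 - \gamma/2)$ rather than 1, which shows the bias that the missing accept/reject correction introduces.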

The Russian Roulette: An Unbiased Estimator of the Limit

By Fabian Pedregosa.

Category: statistics
#nuggets #unbiased

Sat 15 October 2022

The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.

Stanislaw Ulam, Adventures of a Mathematician

The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …

Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search

By Fabian Pedregosa.

Category: optimization
#Frank-Wolfe #backtracking #line-search #step-size #adaptive

Fri 26 August 2022

Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …

On the Link Between Optimization and Polynomials, Part 5

Cyclical Step-sizes.

By Baptiste Goujaud and Fabian Pedregosa.

Category: optimization
#polynomials #acceleration

Fri 27 May 2022

Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?
Six: This time I bet no.
Baltar: You know, I've never known you to play the optimist. Why the change of heart?
Six: Mathematics. Law of averages. Let a complex …

Optimization Nuggets: Implicit Bias of Gradient-based Methods

Losses with Unique Finite Root.

By Fabian Pedregosa.

Category: learning theory
#implicit bias #proofs #nuggets

Mon 10 January 2022

When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …

Optimization Nuggets: Exponential Convergence of SGD

By Fabian Pedregosa.

Category: optimization
#SGD #proofs #nuggets

Wed 15 December 2021

This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.

On the Link Between Optimization and Polynomials, Part 4

Acceleration without Momentum.

By Fabian Pedregosa.

Category: optimization
#polynomials #acceleration

Tue 13 April 2021

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little-known fact is that simple gradient descent –no momentum– can achieve the same rate using only a well-chosen sequence of step-sizes. In this post we'll derive this method and through simulations discuss its practical …

On the Link Between Optimization and Polynomials, Part 3

A Hitchhiker's Guide to Momentum.

By Fabian Pedregosa.

Category: optimization
#polynomials #acceleration #momentum #Chebyshev

Tue 02 March 2021

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

On the Link Between Optimization and Polynomials, Part 2

Momentum: when Chebyshev meets Chebyshev.

By Fabian Pedregosa.

Category: optimization
#polynomials #acceleration #momentum #Chebyshev

Mon 21 December 2020

We can tighten the analysis of gradient descent with momentum through a combination of Chebyshev polynomials of the first and second kind. Following this connection, we'll derive one of the most iconic methods in optimization: Polyak momentum.

On the Link Between Polynomials and Optimization, Part 1

Residual Polynomials and the Chebyshev method.

By Fabian Pedregosa.

Category: optimization
#polynomials #acceleration #Chebyshev

Tue 07 April 2020

There's a fascinating link between minimization of quadratic functions and polynomials, a link that goes deep and allows us to phrase optimization problems in the language of polynomials and vice versa. Using this connection, we can tap into centuries of research in the theory of polynomials and shed new light on …

How to Evaluate the Logistic Loss and not NaN trying

By Fabian Pedregosa and Bart van Merriënboer.

Category: coding
#logistic regression #numerical stability

Fri 27 September 2019

A naive implementation of the logistic regression loss can result in numerical indeterminacy even for moderate values. This post takes a closer look into the source of these instabilities and discusses more robust Python implementations.
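To make the issue concrete, here is a small sketch contrasting a direct evaluation of $\log(1 + e^{-z})$ with one based on NumPy's `np.logaddexp`. The function names are illustrative, and this is only one possible way to stabilize the computation, not necessarily the implementation discussed in the post.

```python
import numpy as np


def logistic_loss_naive(z):
    """Direct evaluation of log(1 + exp(-z)); exp overflows for very negative z."""
    return np.log(1.0 + np.exp(-z))


def logistic_loss_stable(z):
    """Same quantity written as log(exp(0) + exp(-z)).

    np.logaddexp evaluates this without forming exp(-z) explicitly,
    so large negative z no longer overflows.
    """
    return np.logaddexp(0.0, -z)
```

For a very negative input such as `z = -1000.0`, the naive version overflows to infinity while the stable one returns the expected value of about 1000.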

Notes on the Frank-Wolfe Algorithm, Part II: A Primal-dual Analysis

By Fabian Pedregosa.

Category: optimization
#Frank-Wolfe #conditional gradient #convergence analysis

Sat 17 November 2018

This blog post extends the convergence theory from the first part of these notes on the Frank-Wolfe (FW) algorithm with convergence guarantees on the primal-dual gap which generalize and strengthen the convergence guarantees obtained in the first part.

Three Operator Splitting

By Fabian Pedregosa.

Category: optimization
#proximal splitting #three operator splitting #convergence analysis

Thu 06 September 2018

I discuss a recently proposed optimization algorithm: the Davis-Yin three operator splitting.

Notes on the Frank-Wolfe Algorithm, Part I

By Fabian Pedregosa.

Category: optimization
#Frank-Wolfe #conditional gradient #convergence analysis

Wed 21 March 2018

This blog post is the first in a series discussing different theoretical and practical aspects of the Frank-Wolfe algorithm.

Optimization inequalities cheatsheet

By Fabian Pedregosa.

Category: optimization
#optimization #cheatsheet

Wed 11 January 2017

Most proofs in optimization consist in using inequalities for a particular function class in some creative way. This is a cheatsheet with the inequalities that I use most often. It considers classes of functions that are convex, strongly convex and $L$-smooth.

A fully asynchronous variant of the SAGA algorithm

By Fabian Pedregosa.

Category: optimization
#optimization #asynchronous #SAGA

Wed 12 October 2016

My friend Rémi Leblond has recently uploaded to arXiv our preprint on an asynchronous version of the SAGA optimization algorithm.

The main contribution is to develop a parallel (fully asynchronous, no locks) variant of the SAGA algorithm. This is a stochastic variance-reduced method for general optimization, specially adapted for problems …

Hyperparameter optimization with approximate gradient

By Fabian Pedregosa.

Category: optimization
#machine learning #hyperparameters #HOAG

Wed 25 May 2016

TL;DR: I describe a method for hyperparameter optimization by gradient descent.

Most machine …

Lightning v0.1

By Fabian Pedregosa.

Category: software
#Python #scikit-learn #machine learning #lightning

Fri 25 March 2016

Announcing the first public release of lightning, a library for large-scale linear classification, regression and ranking in Python. The library was started a couple of years ago by Mathieu Blondel, who also contributed the vast majority of the source code. I recently joined its development and decided it was about time for …

scikit-learn-contrib, an umbrella for scikit-learn related projects.

By Fabian Pedregosa.

Category: software
#Python #scikit-learn #machine learning #lightning

Sun 06 March 2016

Together with other scikit-learn developers we've created an umbrella organization for scikit-learn-related projects named scikit-learn-contrib. The idea is for this organization to host projects that are deemed too specific or too experimental to be included in the scikit-learn codebase but still offer an API which is compatible with scikit-learn and …

SAGA algorithm in the lightning library

By Fabian Pedregosa.

Category: misc
#Python #scikit-learn #machine learning #lightning

Mon 22 February 2016

Recently I've implemented, together with Arnaud Rachez, the SAGA[¹] algorithm in the lightning machine learning library (which, by the way, has recently been moved to the new scikit-learn-contrib project). The lightning library uses the same API as scikit-learn but is particularly adapted to online learning. As for the SAGA …

On the consistency of ordinal regression methods

By Fabian Pedregosa.

Category: learning theory
#consistency #machine learning

Fri 09 October 2015

My latest work (with Francis Bach and Alexandre Gramfort) is on the consistency of ordinal regression methods. It has the wildly imaginative …

Holdout cross-validation generator

By Fabian Pedregosa.

Category: misc
#Python #scikit-learn #machine learning #model selection

Thu 20 August 2015

Cross-validation iterators in scikit-learn are simply generator objects, that is, Python objects that implement the __iter__ method and that for each call to this method return (or more precisely, yield) the indices or a boolean mask for the train and test set. Hence, implementing new cross-validation iterators that behave as …
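As a rough sketch of the generator idea described above, a holdout splitter only needs to yield a single pair of train and test indices. The function name, its parameters and the rounding rule are assumptions made for illustration, not the code from the post.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.25, seed=0):
    """Yield one (train_indices, test_indices) pair for a random holdout split.

    Hypothetical helper: the test set size is ceil(test_fraction * n_samples).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)        # shuffled sample indices
    n_test = int(np.ceil(test_fraction * n_samples))
    yield perm[n_test:], perm[:n_test]       # train indices, then test indices
```

Because scikit-learn accepts an iterable of `(train, test)` index arrays for its `cv` argument, something like `list(holdout_split(len(X)))` should be usable wherever a cross-validation iterator is expected.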