Householder matrices

Category: misc
#linear algebra #householder #QR

Householder matrices are square matrices of the form

$$ P = I - \beta v v^T$$

where $\beta$ is a scalar and $v$ is a vector. They have the useful property that, for suitably chosen $v$ and $\beta$, the product $P x$ zeroes out all of the coordinates of $x$ but the first, that is, $P x = \|x\| e_1$. The following code, given $x$, finds the values of $\beta$ and $v$ that satisfy that property. The algorithm can be found in several textbooks [1].

import numpy as np
from scipy import linalg

def house(x):
    """Given a vector x, compute a vector v with v[0] = 1
    and a scalar beta such that P = I - beta v v^T
    is orthogonal and P x = ||x|| e_1

    Parameters
    ----------
    x : array, shape (n,) or (n, 1)

    Returns
    -------
    beta : scalar
    v : array, shape (n, 1)
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, np.newaxis]
    sigma = linalg.norm(x[1:, 0]) ** 2
    v = np.vstack((1, x[1:]))
    if sigma == 0:
        beta = 0
    else:
        mu = np.sqrt(x[0, 0] ** 2 + sigma)
        if x[0, 0] <= 0:
            v[0, 0] = x[0, 0] - mu
        else:
            v[0, 0] = - sigma / (x[0, 0] + mu)
        beta = 2 * (v[0, 0] ** 2) / (sigma + v[0, 0] ** 2)
        v /= v[0, 0]
    return beta, v

As promised, this computes the parameters of $P$ such that $P x = \|x\| e_1$, exact to 15 decimal places:

>>> n = 5
>>> x = np.random.randn(n)
>>> beta, v = house(x)
>>> P = np.eye(n) - beta * np.dot(v, v.T)
>>> print np.round(P.dot(x) / linalg.norm(x), decimals=15) 
[ 1. -0. -0.  0. -0.]

This property is what makes Householder matrices useful in numerical analysis. It can be used, for example, to compute the QR decomposition of a given matrix. The idea is to successively zero out the sub-diagonal elements, leaving a triangular matrix at the end. In the first iteration we compute a Householder matrix $P_0$ such that $P_0 X$ has only zeros below the diagonal in the first column, then a Householder matrix $P_1$ such that $P_1 P_0 X$ also zeroes out the subdiagonal elements of the second column, and so on. At the end, $R = P_{n-2} \cdots P_1 P_0 X$ is an upper triangular matrix. Since each $P_i$ is orthogonal and its own inverse, the product $Q = P_0 P_1 \cdots P_{n-2}$ is again orthogonal and is the $Q$ factor in the QR decomposition $X = Q R$. A minimal sketch of this procedure is given below.
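
The following is a minimal sketch of Householder QR built on the house() function above; the name qr_householder and the explicit formation of each $P_i$ are illustrative choices, not taken from the original code (practical implementations apply the reflections without ever building full matrices).

def qr_householder(X):
    """Sketch: QR decomposition of a square matrix via Householder reflections."""
    X = np.array(X, dtype=np.float64)
    n = X.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        beta, v = house(X[k:, k])
        P = np.eye(n)
        P[k:, k:] -= beta * v.dot(v.T)   # Householder matrix acting on rows k..n-1
        X = P.dot(X)                     # zero out the subdiagonal of column k
        Q = Q.dot(P)                     # accumulate Q = P_0 P_1 ... P_{n-2}
    return Q, X                          # Q orthogonal, X now upper triangular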

If we choose $X$ to be a 20-by-20 random matrix, with colors representing different values,

[Figure: QR decomposition, initial random matrix]

we can see the effect of the Householder matrices being applied one by one until an upper triangular matrix is obtained:

[Figure: QR decomposition, successive Householder reflections]

A similar application of Householder matrices is to reduce a given symmetric matrix to tridiagonal form. This proceeds much as in the QR algorithm, except that now we multiply the matrix $X$ on both the left and the right by the Householder matrices. Also, in this case we seek Householder matrices that zero out the elements below the first subdiagonal, instead of all elements below the diagonal; a sketch follows below. This algorithm is used, for example, as a preprocessing step in most dense symmetric eigensolvers.
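
As a companion to the QR sketch, here is a minimal sketch of the tridiagonalization step, again reusing house(); the helper name tridiagonalize is made up, and a real implementation would work in place and exploit symmetry.

def tridiagonalize(A):
    """Sketch: reduce a symmetric matrix A to tridiagonal form."""
    A = np.array(A, dtype=np.float64)
    n = A.shape[0]
    for k in range(n - 2):
        beta, v = house(A[k+1:, k])
        P = np.eye(n)
        P[k+1:, k+1:] -= beta * v.dot(v.T)   # reflection acting on rows/cols k+1..n-1
        A = P.dot(A).dot(P)                  # similarity transform; P is symmetric
    return A                                  # tridiagonal up to round-off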


  1. "Matrix Computations" third edition, Golub & VanLoan (Algorithm 5.1.1). 

  2. Code to reproduce the figures can be found here; the source for the IPython notebook can be found here. 

Ridge regression path

Category: misc
#scikit-learn #scipy #linear algebra

Ridge coefficients for multiple values of the regularization parameter can be computed elegantly from the thin singular value decomposition (SVD) of the design matrix:

import numpy as np
from scipy import linalg

def ridge(A, b, alphas):
    """Return coefficients for regularized least squares

         min ||A x - b||^2 + alpha ||x||^2

    Parameters
    ----------
    A : array, shape (n, p)
    b : array, shape (n,)
    alphas : array, shape (k,)

    Returns
    -------
    coef : array, shape (p, k)
    """
    U, s, Vt = linalg.svd(A, full_matrices=False)
    d = s / (s ** 2 + alphas[:, np.newaxis])   # shape (k, p), one row per alpha
    return np.dot(d * U.T.dot(b), Vt).T

This can be used to efficiently compute what is known as the regularization path, that is, the coefficients plotted as a function of the regularization parameter (see the sketch below). Since the bottleneck of the algorithm is the singular value decomposition, computing the coefficients for additional values of the regularization parameter essentially comes for free.
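
As a usage sketch: the data, the alpha grid and the matplotlib calls below are made up for illustration and are not taken from the original code.

import matplotlib.pyplot as plt   # assumed available

rng = np.random.RandomState(0)
A = rng.randn(50, 10)
b = rng.randn(50)
alphas = np.logspace(-3, 3, 100)

coefs = ridge(A, b, alphas)       # shape (10, 100): one column per alpha
plt.plot(alphas, coefs.T)         # one line per coefficient
plt.xscale('log')
plt.xlabel('alpha')
plt.ylabel('coefficient value')
plt.show()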

A variant of this algorithm can then be used to compute the optimal regularization parameter in the sense of leave-one-out cross-validation; it is implemented in scikit-learn's RidgeCV (for which Mathieu Blondel has written an excellent post). This optimal parameter is marked with a vertical dotted line in the following picture; the full code can be found here.
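
For reference, a hedged sketch of how the scikit-learn estimator mentioned above is typically called; the alpha grid is arbitrary and A, b reuse the variables from the previous sketch.

from sklearn.linear_model import RidgeCV

reg = RidgeCV(alphas=np.logspace(-3, 3, 100))   # leave-one-out CV by default
reg.fit(A, b)
print(reg.alpha_)                                # selected regularization parameter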

Computing the vector norm

Category: misc
#linear algebra #norm #scipy

Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10.

Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves: either calling scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later, both have issues. Note: I use single-precision arithmetic for simplicity, but similar results hold for double precision.

Overflow and underflow

Both approaches behave terribly in the presence of very large or very small numbers. Take for example an array with a single entry:

In [0]: a = np.array([1e20], dtype=np.float32)
In [1]: a
Out[1]: array([1.00000002e+20], dtype=float32)
In [2]: scipy.linalg.norm(a)
Out[2]: inf
In [3]: np.sqrt(np.dot(a.T, a))
Out[3]: inf

That is, both methods return infinity. However, the correct answer is $10^{20}$, which fits comfortably in a single-precision float. Similar examples can be found where numbers underflow instead.


Again, scipy.linalg.norm behaves poorly as far as numerical stability is concerned. In the presence of entries of very different magnitudes, severe cancellation can occur. Take for example an array whose first entry is 10,000, followed by 10,000 ones:

a = np.array([1e4] + [1]*10000, dtype=np.float32)

In this case, scipy.linalg.norm will discard all the ones, producing

In [3]: linalg.norm(a) - 1e4
Out[3]: 0.0

when the correct answer is 0.5. In this case $\sqrt{a^T a}$ behaves much better, since the intermediate results of a single-precision dot product are accumulated in double precision (although if double precision is used to begin with, the results are not accumulated in quadruple precision):

In [4]: np.sqrt(np.dot(a.T, a)) - 1e4
Out[4]: 0.5


The BLAS function nrm2 does automatic scaling of its input, making it more stable and tolerant to overflow. Luckily, SciPy provides a mechanism to call some BLAS functions directly:

In [5]: nrm2, = scipy.linalg.get_blas_funcs(('nrm2',), (a,))

Using this function, no overflow occurs (hurray!)

In [95]: a = np.array([1e20], dtype=np.float32)
In [96]: nrm2(a)
Out[96]: 1.0000000200408773e+20

and, on the earlier array with entries of mixed magnitudes, stability is greatly improved

In [99]: nrm2(a) - 1e4
Out[99]: 0.49998750062513864

Update: as of scipy 0.10, this function is used by scipy.linalg.norm.


Computing the 2-norm of an array is a very cheap operation, so the computation is usually dominated by external factors, such as memory access latency or overhead in the Python/C layer. Experimental benchmarks on an array of size $10^7$ show that nrm2 is marginally slower than $\sqrt{a^T a}$, because the scaling has a cost, but it is also more stable and less prone to overflow and underflow. They also show that scipy.linalg.norm is the slowest (and numerically worst) of all three:

| Method                 | Time |
|------------------------|------|
| $\sqrt{a^T a}$         | 0.02 |
| BLAS nrm2(a)           | 0.02 |
| scipy.linalg.norm(a)   | 0.16 |
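
For completeness, here is a sketch of how such a benchmark could be run; the absolute numbers above are the ones measured for the post, and they will vary with hardware and BLAS implementation.

import timeit
import numpy as np
from scipy import linalg

a = np.random.randn(10 ** 7).astype(np.float32)
nrm2, = linalg.get_blas_funcs(('nrm2',), (a,))

candidates = [('sqrt(a.T a)', lambda: np.sqrt(np.dot(a, a))),
              ('BLAS nrm2(a)', lambda: nrm2(a)),
              ('scipy.linalg.norm(a)', lambda: linalg.norm(a))]

for label, func in candidates:
    t = timeit.timeit(func, number=10) / 10   # average time per call
    print(label, t)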