Keep the gradient flowing

Computing the vector norm

Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10.

Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves: calling scipy.linalg.norm(a) or computing $\sqrt{a^T a}$. As I learned later, both have issues. Note: I use single-precision arithmetic for simplicity, but similar results hold for double precision.

Overflow and underflow

Both approaches behave terribly in the presence of very large or very small numbers. Take for example an array with a single entry:

In [0]: a = np.array([1e20], dtype=np.float32)
In [1]: a
Out[1]: array([1.00000002e+20], dtype=float32)
In [2]: scipy.linalg.norm(a)
Out[2]: inf
In [3]: np.sqrt(np.dot(a.T, a))
Out[3]: inf

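That is, both methods return infinity. However, the correct answer is 10^20, which fits comfortably in a single-precision float (the largest finite single-precision value is about 3.4 × 10^38). It is the intermediate square, 10^40, that overflows. Similar examples can be found where numbers underflow.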
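
The underflow case mentioned above can be reproduced the same way. The following is a minimal sketch, not anything taken from SciPy: `naive_norm` and `scaled_norm` are hypothetical helper names, and dividing by the largest magnitude is just one simple way to keep the squares in range.

```python
import numpy as np

def naive_norm(a):
    # sqrt(a^T a): the squares are formed in the array's own precision
    with np.errstate(all="ignore"):
        return np.sqrt(np.dot(a, a))

def scaled_norm(a):
    # Hypothetical helper: divide by the largest magnitude first, so every
    # square lies in [0, 1] and cannot overflow
    scale = np.max(np.abs(a))
    if scale == 0:
        return scale
    b = a / scale
    return scale * np.sqrt(np.dot(b, b))

big = np.array([1e20], dtype=np.float32)
tiny = np.array([1e-30], dtype=np.float32)

print(naive_norm(big), scaled_norm(big))    # naive result overflows to inf
print(naive_norm(tiny), scaled_norm(tiny))  # naive result underflows to 0.0
```

The price of this version is an extra pass over the data to find the maximum.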


Stability

Again, scipy.linalg.norm behaves badly with respect to numerical stability. When the entries have very different magnitudes, severe loss of precision can occur: small terms are rounded away when they are added to a much larger running sum. Take for example an array whose first entry is 10,000, followed by 10,000 ones:

a = np.array([1e4] + [1]*10000, dtype=np.float32)

In this case, scipy.linalg.norm will discard all the ones, producing

In [3]: scipy.linalg.norm(a) - 1e4
Out[3]: 0.0

when the correct answer is approximately 0.5. In this case $\sqrt{a^T a}$ behaves much better, since the dot product of single-precision arrays is accumulated in double precision (with double-precision input, however, the accumulation is not carried out in quadruple precision, so it gains no such extra accuracy):

In [4]: np.sqrt(np.dot(a.T, a)) - 1e4
Out[4]: 0.5
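
To see why the precision of the accumulator matters, here is a small sketch that sums the squares one at a time and rounds the running total to a chosen type. `norm_with_accumulator` is a hypothetical helper written for illustration; it is not how NumPy or BLAS are implemented. Single precision carries a 24-bit significand, so near 10^8 consecutive representable values are 8 apart and adding 1 changes nothing.

```python
import numpy as np

def norm_with_accumulator(a, acc_dtype):
    # Sum the squares one by one, rounding the running total to acc_dtype
    total = acc_dtype(0)
    for x in a:
        total = acc_dtype(total + acc_dtype(x) * acc_dtype(x))
    return np.sqrt(total)

a = np.array([1e4] + [1] * 10000, dtype=np.float32)

print(norm_with_accumulator(a, np.float32) - 1e4)  # every 1 is rounded away
print(norm_with_accumulator(a, np.float64) - 1e4)  # close to 0.5
```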


Using BLAS

The BLAS function nrm2 automatically scales its input, which makes it more stable and more tolerant to overflow. Luckily, SciPy provides a mechanism to call some BLAS functions:

In [5]: nrm2, = scipy.linalg.get_blas_funcs(('nrm2',), (a,))

Using this function, no overflow occurs (hurray!):

In [95]: a = np.array([1e20], dtype=np.float32)
In [96]: nrm2(a)
Out[96]: 1.0000000200408773e+20

and stability is greatly improved. With a redefined as the array from the previous example (10,000 followed by 10,000 ones):

In [99]: nrm2(a) - 1e4
Out[99]: 0.49998750062513864
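
For intuition about what "automatic scaling" can look like, below is a pure-Python sketch in the spirit of the scale / sum-of-squares loop found in the classic reference BLAS nrm2. The name `nrm2_sketch` is hypothetical, the code works on Python floats (double precision), and an optimized BLAS may well use a different scheme.

```python
import math

def nrm2_sketch(values):
    # Invariant: sum of squares seen so far == scale**2 * ssq
    scale, ssq = 0.0, 1.0
    for x in values:
        ax = abs(float(x))
        if ax == 0.0:
            continue
        if ax > scale:
            # New largest magnitude: re-express the old sum relative to it
            ssq = 1.0 + ssq * (scale / ax) ** 2
            scale = ax
        else:
            ssq += (ax / scale) ** 2
    return scale * math.sqrt(ssq)

# The plain squares would overflow a double here
print(nrm2_sketch([1e200, 1e200]))
# The array from the stability example
print(nrm2_sketch([1e4] + [1.0] * 10000) - 1e4)
```

Only ratios of magnitude at most 1 are ever squared, which is why overflow is avoided in a single pass over the data.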

Update: as of scipy 0.10, this function is used by scipy.linalg.norm.


Timing

Computing the 2-norm of an array is a very cheap operation, so running time is usually dominated by external factors such as memory-access latency or overhead in the Python/C layer. Benchmarks on an array of size 10^7 show that nrm2 is marginally slower than $\sqrt{a^T a}$, because scaling has a cost, but it is also more stable and less prone to overflow and underflow. They also show that scipy.linalg.norm is the slowest (and numerically worst!) of all.

$\sqrt{a^T a}$    BLAS nrm2(a)    scipy.linalg.norm(a)
0.02              0.02            0.16
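
A comparison of this kind could be reproduced with something like the sketch below. It is not the script used for the numbers above: `time_norms` is a hypothetical helper, the input is random data, and timings depend on the machine and the BLAS in use. Since scipy.linalg.norm changed in scipy 0.10, the ranking on a current installation may differ.

```python
import timeit

import numpy as np
import scipy.linalg

def time_norms(n=10**7, repeat=3):
    # Random single-precision data; the values do not matter for timing
    a = np.random.rand(n).astype(np.float32)
    nrm2, = scipy.linalg.get_blas_funcs(("nrm2",), (a,))
    candidates = {
        "sqrt(a.T a)": lambda: np.sqrt(np.dot(a, a)),
        "BLAS nrm2(a)": lambda: nrm2(a),
        "scipy.linalg.norm(a)": lambda: scipy.linalg.norm(a),
    }
    # Best of `repeat` single runs, in seconds
    return {name: min(timeit.repeat(fn, number=1, repeat=repeat))
            for name, fn in candidates.items()}

for name, seconds in time_norms().items():
    print(f"{name:22s} {seconds:.3f} s")
```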