Update: a fast and stable norm was added to scipy.linalg in August
2011 and will be available in scipy 0.10 Last week I discussed with
Gael how we should compute the euclidean norm of a vector a using
SciPy. Two approaches suggest themselves, either calling
scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later,
both have issues. Note: I use single-precision arithmetic for
simplicity, but similar results hold for double-precision.
Overflow and underflow
Both approaches behave terribly in presence of big or small numbers.
Take for example an array with a single entry:
In [0]: a = np.array([1e20], dtype=np.float32)
In [1]: a
Out[1]: array([1.00000002e+20], dtype=float32)
In [2]: scipy.linalg.norm(a)
Out[2]: inf
In [3]: np.sqrt(np.dot(a.T, a))
Out[3]: inf
That is, both methods return Infinity. However, the correct answer is
10^20, which would comfortably fit in a single-precision
instruction. Similar examples can be found where numbers underflow.
Stability
Again, scipy.linalg.norm has a terrible behavior in what concerns
numerical stability. In presence of different magnitudes severe
cancellation can occur. Take for example and array with one 10.000 in
the first value and 10.000 ones behind:
a = np.array([1e4] + [1]*10000, dtype=np.float32)
In this case, scipy.linalg.norm will discard all the ones, producing
In [3]: linalg.norm(a) - 1e4
Out[3]: 0.0
when the correct answer is 0.5. In this case $\sqrt{a^T a}$ has a much
nicer behavior since results of a dot-product in single precision are
accumulated using double-precision (but if double-precision is used,
results won't be accumulated using quadruple-precision):
In [4]: np.sqrt(np.dot(a.T, a)) - 1e4
Out[4]: 0.5
BLAS BLAS BLAS ...
The BLAS function nrm2 does automatic scaling of parameters rendering
it more stable and tolerant to overflow. Luckily, scipy provides a
mechanism to call some BLAS functions:
In [5]: nrm2, = scipy.linalg.get_blas_funcs(('nrm2',), (a,))
Using this function, no overflow occurs (hurray!)
In [95]: a = np.array([1e20], dtype=np.float32)
In [96]: nrm2(a)
Out[96]: 1.0000000200408773e+20
and stability is greatly improved
In [99]: nrm2(a) - 1e4
Out[99]: 0.49998750062513864
Update: as of scipy 0.10, this function is used by scipy.linalg.norm .
Timing
Computing the 2-norm of an array is a very cheap operation, thus
computations are usually dominated by external factors, such as latency
of memory access or overhead in the Python/C layer. Experimental
benchmarks on an array of size 10^7 show that nrm2 is marginally slower
than $latex \sqrt{a^T a}$, because scaling has a cost, but is is also
more stable and less prone to overflow and underflow. It also shows that
scipy.linalg.norm is the slowest (and numerically worst!) of all.
$\sqrt{a^T a}$ |
BLAS nrm2(a) |
scipy.linalg.norm(a) |
0.02 |
0.02 |
0.16 |
There are comments.