In scipy's development version there's a new function closely related to the QR-decomposition of a matrix and to the least-squares solution of a linear system. What this function does is to compute the QR-decomposition of a matrix and then multiply the resulting orthogonal factor by another arbitrary matrix. In pseudocode:
def qr_multiply(X, Y): Q, R = qr(X) return dot(Q.T, Y)
but unlike this naive implementation, qr_multiply is able to do all this without explicitly computing the orthogonal Q matrix, resulting both in memory and time saving. In the following picture I measured the memory consumption as a function of time of running this computation on a 1.000 x 1.000 matrix X and a vector Y (full code can be found here):
It can be seen that not only qr_multiply is almost twice as fast as the naive approach, but also that the memory consumption is significantly reduced, since the orthogonal factor is never explicitly computed. Credit for implementing the qr_multiply function goes to Martin Teichmann.