fa.bianp.net

Memory plots with memory_profiler

Category: misc
#Python #memory #memory_profiler

Besides performing a line-by-line analysis of memory consumption, memory_profiler exposes some functions that allow to retrieve the memory consumption of a function in real-time, allowing e.g. to visualize the memory consumption of a given function over time.

The function to be used is memory_usage. The first argument specifies what code is to be monitored. This can represent either an external process or a Python function. In the case of an external process the first argument is an integer representing its process identifier (PID). In the case of a Python function, we need pass the function and its arguments to memory_usage. We do this by passing the tuple (f, args, kw) that specifies the function, its position arguments as a tuple and its keyword arguments as a dictionary, respectively. This will be then executed by memory_usage as f(*args, **kw).

Let's see this with an example. Take as function NumPy's pseudo-inverse function. Thus f = numpy.linalg.pinv and f takes one positional argument (the matrix to be inverted) so args = (a,) where a is the matrix to be inverted. Note that args must be a tuple consisting of the different arguments, thus the parenthesis around a. The third item is a dictionary kw specifying the keyword arguments. Here kw is optional and is omitted.

>>> from memory_profiler import memory_usage
>>> import numpy as np
# create a random matrix
>>> a = np.random.randn(500, 500)
>>> mem_usage = memory_usage((np.linalg.pinv, (a,)), interval=.01)
>>> print(mem_usage)
[57.02734375, 55.0234375, 57.078125, ...]

This has given me a list specifying at different time intervals (t0, t0 + .01, t0 + .02, ...) at which the measurements where taken. Now I can use that to for example plot the memory consumption as a function of time:

>>> import pylab as pl
>>> pl.plot(np.arange(len(mem_usage)) * .01, mem_usage, label='linalg.pinv')
>>> pl.xlabel('Time (in seconds)')
>>> pl.ylabel('Memory consumption (in MB)')
>>> pl.show()

Memory plot

This will give the memory usage of a single function across time, which might be interesting for example to detect temporaries that would be created during the execution.

Another use case for memory_usage would be to see how memory behaves as input data gets bigger. In this case we are interested in memory as a function of the input data. One obvious way we can do this is by calling the same function each time with a different input and take as memory consumption the maximum consumption over time. This way we will have a memory usage for each input.

>>> for i in range(1, 5):
...    A = np.random.randn(100 * i, 100 * i)
...    mem_usage = memory_usage((np.linalg.pinv, (A,)))
...    print max(mem_usage)

29.22
30.10
40.66
53.96

It is now possible to plot these results as a function of the dimensions.

import numpy as np
import pylab as pl
from memory_profiler import memory_usage

dims = np.linspace(100, 1000, 10)
pinv_mem = np.zeros(dims.size)

for i_dim, k in enumerate(dims):
    x = np.random.randn(k, k)
    tmp = memory_usage((np.linalg.pinv, (x,)), interval=.01)
    pinv_mem[i_dim] = np.max(tmp)

pl.plot(dims, pinv_mem, label='np.linalg.pinv')
pl.ylabel('Memory (in MB)')
pl.xlabel('Dimension of the square matrix')
pl.legend(loc='upper left')
pl.axis('tight')
pl.show()

Memory plot