Plot memory usage as a function of time

Category: misc
#memory_profiler #mprof #profile

One of the lesser known features of the memory_profiler package is its ability to plot memory consumption as a function of time. This was implemented by my friend Philippe Gervais, previously a colleague at INRIA and now at Google.

With this feature it is possible to generate very easily a plot of the memory consumption as a function of time. The result will be something like this:

where you can see the memory used (in the y-axis) as a function of time (x-axis). Furthermore, we have used two functions, test1 and test2, and it is possible to see with the colored brackets at what time do these functions start and finish.

This plot was generated with the following simple script:

import time

def test1():
    n = 10000
    a = [1] * n
    return a

def test2():
    n = 100000
    b = [1] * n
    return b

if __name__ == "__main__":

what happens here is that we have two functions, test1() and test2() in which we create two lists of different sizes (the one in test2 is bigger). We call time.sleep() for one second so that the function does not return too soon and so we have time to get reliable memory measurements.

The decorator @profile is optional and is useful so that memory_profiler knows when the function has been called so he can plot the brackets indicating that. If you don't put the decorator, the example will work just fine except that the brackets will not appear in your plot.

Suppose we have saved the script as test1.py. We run the script as

$ mprof run test1.py

where mprof is an executable provided by memory_profiler. If the above command was successful it will print something like this

$ mprof run test1.py
mprof: Sampling memory every 0.1s
running as a Python program...

The above command will create a .dat file on your current working directory, something like mprofile_20141108113511.dat. This file (you can inspect it, it's a text file) contains the memory measurements for your program.

You can now plot the memory measurements with the command

$ mprof plot

This will open a matplotlib window and show you the plot:

As you see, attention has been paid to the default values so that the plot it generates already looks decent without much effort. The not-so-nice-part is that, at least as of November 2014, if you want to customize the plot, well, you'll have to look and modify the mprof script. Some refactoring is still needed in order to make it easier to customize the plots (work in progress).

Different ways to get memory consumption or lessons learned from ``memory_profiler``

Category: misc
#Python #memory #memory_profiler

As part of the development of memory_profiler I've tried several ways to get memory usage of a program from within Python. In this post I'll describe the different alternatives I've tested.

The psutil library

psutil is a python library that provides an interface for retrieving information on running processes. It provides convenient, fast and cross-platform functions to access the memory usage of a Python module:

def memory_usage_psutil():
    # return the memory usage in MB
    import psutil
    process = psutil.Process(os.getpid())
    mem = process.get_memory_info()[0] / float(2 ** 20)
    return mem

The above function returns the memory usage of the current Python process in MiB. Depending on the platform it will choose the most accurate and fastest way to get this information. For example, in Windows it will use the C++ Win32 API while in Linux it will read from /proc, hiding the implementation details and proving on each platform a fast and accurate measurement.

If you are looking for an easy way to get the memory consumption within Python this in my opinion your best shot.

The resource module

The resource module is part of the standard Python library. It's basically a wrapper around getrusage, which is a POSIX standard but some methods are still missing in Linux . However, the ones we are interested seem to work fine in Ubuntu 10.04. You can get the memory usage with this function:

def memory_usage_resource():
    import resource
    rusage_denom = 1024.
    if sys.platform == 'darwin':
        # ... it seems that in OSX the output is different units ...
        rusage_denom = rusage_denom * rusage_denom
    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / rusage_denom
    return mem

In my experience this approach is several times faster than the one based in psutil as was the default way to get the memory usage that I used in memory_profiler from version 0.23 up to 0.26. I changed this behavior in 0.27 after a bug report by Philippe Gervais. The problem with this approach is that it seems to report results that are slightly different in some cases. Notably it seems to differ when objects have been recently liberated from the python interpreter.

In the following example, orphaned arrays are liberated by the python interpreter, which is correctly seen by psutil but not by resource:

mem_resource = []
mem_psutil = []
for i in range(1, 21):
    a = np.zeros((1000 * i, 100 * i))

Memory plot

By the way I would be delighted to be corrected if I'm doing something wrong or informed of a workaround if this exists (I've got the code to reproduce the figures 1)

querying ps directly

The method based on psutils works great but is not available by default on all Python systems. Because of this in memory_profiler we use as last resort something that's pretty ugly but works reasonably well when all else fails: invoking the system's ps command and parsing the output. The code is something like::

def memory_usage_ps():
    import subprocess
    out = subprocess.Popen(['ps', 'v', '-p', str(os.getpid())],
    vsz_index = out[0].split().index(b'RSS')
    mem = float(out[1].split()[vsz_index]) / 1024
    return mem

The main disadvantage of this approach is that it needs to fork a process for each measurement. For some tasks where you need to get memory usage very fast, like in line-by-line memory usage then this be a huge overhead on the code. For other tasks such as getting information of long-running processes, where the memory usage is anyway working on a separate process this is not too bad.


Here is a benchmark of the different alternatives presented above. I am plotting the time it takes the different approaches to make 100 measurements of the memory usage (lower is better). As can be seen the smallest one is resource (although it suffers from the issues described above) followed closely by psutil which is in my opinion the best option if you can count on it being installed on the host system and followed far away by ps which is roughly a hundred times slower than psutil.

Memory plot

  1. IPython notebook to reproduce the figures: html ipynb 

Memory plots with memory_profiler

Category: misc
#Python #memory #memory_profiler

Besides performing a line-by-line analysis of memory consumption, memory_profiler exposes some functions that allow to retrieve the memory consumption of a function in real-time, allowing e.g. to visualize the memory consumption of a given function over time.

The function to be used is memory_usage. The first argument specifies what code is to be monitored. This can represent either an external process or a Python function. In the case of an external process the first argument is an integer representing its process identifier (PID). In the case of a Python function, we need pass the function and its arguments to memory_usage. We do this by passing the tuple (f, args, kw) that specifies the function, its position arguments as a tuple and its keyword arguments as a dictionary, respectively. This will be then executed by memory_usage as f(*args, **kw).

Let's see this with an example. Take as function NumPy's pseudo-inverse function. Thus f = numpy.linalg.pinv and f takes one positional argument (the matrix to be inverted) so args = (a,) where a is the matrix to be inverted. Note that args must be a tuple consisting of the different arguments, thus the parenthesis around a. The third item is a dictionary kw specifying the keyword arguments. Here kw is optional and is omitted.

>>> from memory_profiler import memory_usage
>>> import numpy as np
# create a random matrix
>>> a = np.random.randn(500, 500)
>>> mem_usage = memory_usage((np.linalg.pinv, (a,)), interval=.01)
>>> print(mem_usage)
[57.02734375, 55.0234375, 57.078125, ...]

This has given me a list specifying at different time intervals (t0, t0 + .01, t0 + .02, ...) at which the measurements where taken. Now I can use that to for example plot the memory consumption as a function of time:

>>> import pylab as pl
>>> pl.plot(np.arange(len(mem_usage)) * .01, mem_usage, label='linalg.pinv')
>>> pl.xlabel('Time (in seconds)')
>>> pl.ylabel('Memory consumption (in MB)')
>>> pl.show()

Memory plot

This will give the memory usage of a single function across time, which might be interesting for example to detect temporaries that would be created during the execution.

Another use case for memory_usage would be to see how memory behaves as input data gets bigger. In this case we are interested in memory as a function of the input data. One obvious way we can do this is by calling the same function each time with a different input and take as memory consumption the maximum consumption over time. This way we will have a memory usage for each input.

>>> for i in range(1, 5):
...    A = np.random.randn(100 * i, 100 * i)
...    mem_usage = memory_usage((np.linalg.pinv, (A,)))
...    print max(mem_usage)


It is now possible to plot these results as a function of the dimensions.

import numpy as np
import pylab as pl
from memory_profiler import memory_usage

dims = np.linspace(100, 1000, 10)
pinv_mem = np.zeros(dims.size)

for i_dim, k in enumerate(dims):
    x = np.random.randn(k, k)
    tmp = memory_usage((np.linalg.pinv, (x,)), interval=.01)
    pinv_mem[i_dim] = np.max(tmp)

pl.plot(dims, pinv_mem, label='np.linalg.pinv')
pl.ylabel('Memory (in MB)')
pl.xlabel('Dimension of the square matrix')
pl.legend(loc='upper left')

Memory plot

line-by-line memory usage of a Python program

Category: misc
#python #memory_profiler

My newest project is a Python library for monitoring memory consumption of arbitrary process, and one of its most useful features is the line-by-line analysis of memory usage for Python code. I wrote a basic prototype six months ago after being surprised by the lack of related tools. I wanted to plot memory consumption of a couple of Python functions but did not find a python module to do the job. I came to the conclusion that there is no standard way to get the memory usage of the Python interpreter from within Python, so I resorted to reading for from /proc/$PID/statm. From there on I realized that one the fetching of memory is done, making a line-by-line report wouldn't be hard. Back to today. I've been using the line-by-line memory monitoring to diagnose poor memory management (hidden temporaries, unused allocation, etc.) for some time. It seems to work on two different computers, so full of confidence as I am, I'll write a blog post about it ...

How to use it?

The easiest way to get it is to install from the Python Package Index:

$ easy_install -U memory_profiler # pip install -U memory_profiler

but other options include fetching the latests from github or dropping it on your current working directory or somewhere else on your PYTHONPATH since it consist of a single file. Then next step is to write some python code to profile. It can be just about any function, but for the purpose of this blog post I'll create a function `my_func()` with mostly memory allocations and save it to a file named example.py:

def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':

Note that I've decorated the function with @profile. This tells the profiler to look into function my_func and gather the memory consumption for each line.