One of the lesser-known features of the memory_profiler package is its ability to plot memory consumption as a function of time. This feature was implemented by my friend Philippe Gervais, previously a colleague at INRIA and now at Google.
With this feature it is possible to very easily generate a plot of the memory consumption as a function of time. The result will be something like this:

where you can see the memory used (y-axis) as a function of time (x-axis). Furthermore, we have profiled two functions, test1 and test2, and the colored brackets show at what time each of these functions starts and finishes.
This plot was generated with the following simple script:
```python
import time

@profile
def test1():
    n = 10000
    a = [1] * n
    time.sleep(1)
    return a

@profile
def test2():
    n = 100000
    b = [1] * n
    time.sleep(1)
    return b

if __name__ == "__main__":
    test1()
    test2()
```
What happens here is that we have two functions, test1() and test2(), in which we create two lists of different sizes (the one in test2 is bigger). We call time.sleep() for one second so that the functions do not return too soon and we have time to get reliable memory measurements.
The @profile decorator is optional: it is useful so that memory_profiler knows when the functions have been called and can plot the brackets indicating that. If you don't use the decorator, the example will work just fine, except that the brackets will not appear in your plot.
Suppose we have saved the script as test1.py. We run it as

```
$ mprof run test1.py
```

where mprof is an executable provided by memory_profiler. If the above command was successful it will print something like this:

```
mprof: Sampling memory every 0.1s
running as a Python program...
```
The above command will create a .dat file in your current working directory, something like mprofile_20141108113511.dat. This file (you can inspect it, it's a plain-text file) contains the memory measurements for your program.
You can now plot the memory measurements with the command

```
$ mprof plot
```

This will open a matplotlib window and show you the plot:
As you can see, attention has been paid to the default values so that the generated plot already looks decent without much effort. The not-so-nice part is that, at least as of November 2014, if you want to customize the plot you'll have to look at and modify the mprof script itself. Some refactoring is still needed to make the plots easier to customize (work in progress).
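If you'd rather not patch mprof, one workaround is to parse the .dat file yourself and plot it with matplotlib directly. Below is a minimal sketch under the assumption that measurement lines have the form `MEM <memory in MiB> <unix timestamp>` (inspect your own .dat file to confirm; bookkeeping lines such as CMDLINE or FUNC records are simply skipped here):

```python
def parse_mprof_dat(path):
    # Assumed .dat format: one "MEM <mem_in_MiB> <timestamp>" line per
    # sample, plus bookkeeping lines (CMDLINE, FUNC, ...) that we ignore.
    times, mem = [], []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 3 and fields[0] == 'MEM':
                mem.append(float(fields[1]))
                times.append(float(fields[2]))
    t0 = times[0] if times else 0.0
    # return timestamps relative to the first sample
    return [t - t0 for t in times], mem

# usage sketch (file name is hypothetical):
# times, mem = parse_mprof_dat('mprofile_20141108113511.dat')
# pl.plot(times, mem)  # and customize the figure at will
```

From there you have plain Python lists and full control over the matplotlib figure, instead of whatever defaults mprof ships with.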
As part of the development of memory_profiler I've tried several ways to get the memory usage of a program from within Python. In this post I'll describe the different alternatives I've tested.
The psutil library
psutil is a Python library that provides an interface for retrieving information on running processes. It provides convenient, fast and cross-platform functions to access the memory usage of a process:

```python
import os
import psutil

def memory_usage_psutil():
    # return the memory usage in MiB
    process = psutil.Process(os.getpid())
    mem = process.get_memory_info()[0] / float(2 ** 20)  # index 0 is RSS
    return mem
```
The above function returns the memory usage of the current Python process in MiB. Depending on the platform it will choose the most accurate and fastest way to get this information. For example, on Windows it will use the Win32 API while on Linux it will read from /proc, hiding the implementation details and providing on each platform a fast and accurate measurement.

If you are looking for an easy way to get the memory consumption within Python, this is, in my opinion, your best shot.
The resource module
The resource module is part of the standard Python library. It's basically a wrapper around getrusage(2), which is a POSIX standard, but some fields are still missing in Linux. However, the ones we are interested in seem to work fine in Ubuntu 10.04. You can get the memory usage with this function:
```python
import resource
import sys

def memory_usage_resource():
    rusage_denom = 1024.
    if sys.platform == 'darwin':
        # ... it seems that in OSX the output is in different units ...
        rusage_denom = rusage_denom * rusage_denom
    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / rusage_denom
    return mem
```
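To see this function in action, here is a quick self-contained check (repeating the function for completeness). Keep in mind that ru_maxrss reports the *peak* resident set size, so the value never decreases:

```python
import resource
import sys

def memory_usage_resource():
    # peak RSS in MiB (ru_maxrss is in KiB on Linux, bytes on OSX)
    denom = 1024. if sys.platform != 'darwin' else 1024. ** 2
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / denom

before = memory_usage_resource()
data = [0] * (10 ** 7)  # a list of 10 million elements, ~80 MB of pointers
after = memory_usage_resource()
del data
print('peak before: %.1f MiB, after: %.1f MiB' % (before, after))
```

Even after `del data`, a subsequent call would still report the same peak, which is precisely the caveat discussed next.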
In my experience this approach is several times faster than the one based on psutil, and it was the default way to get the memory usage in memory_profiler from version 0.23 up to 0.26. I changed this behavior in 0.27 after a bug report by Philippe Gervais.
The problem with this approach is that it seems to report results that are slightly different in some cases. Notably, it seems to differ when objects have recently been freed by the Python interpreter.
In the following example, orphaned arrays are freed by the Python interpreter, which is correctly seen by psutil but not by resource:

```python
import numpy as np

mem_resource = []
mem_psutil = []
for i in range(1, 21):
    a = np.zeros((1000 * i, 100 * i))
    # record both measurements using the functions defined above
    mem_resource.append(memory_usage_resource())
    mem_psutil.append(memory_usage_psutil())
```
By the way, I would be delighted to be corrected if I'm doing something wrong, or informed of a workaround if one exists (I've got the code to reproduce the measurements).
The method based on psutil works great but is not available by default on all Python systems. Because of this, in memory_profiler we use as a last resort something that's pretty ugly but works reasonably well when all else fails: invoking the system's ps command and parsing the output. The code is:
```python
import os
import subprocess

def memory_usage_ps():
    out = subprocess.Popen(['ps', 'v', '-p', str(os.getpid())],
                           stdout=subprocess.PIPE).communicate()[0].split(b'\n')
    vsz_index = out[0].split().index(b'RSS')
    mem = float(out[1].split()[vsz_index]) / 1024
    return mem
```
The main disadvantage of this approach is that it needs to fork a process for each measurement. For tasks where you need to get the memory usage very fast, like line-by-line memory profiling, this can be a huge overhead on the code. For other tasks, such as getting information on long-running processes where the measuring happens anyway in a separate process, this is not too bad.
Here is a benchmark of the different alternatives presented above. I am plotting the time it takes each approach to make 100 measurements of the memory usage (lower is better). As can be seen, the fastest one is resource (although it suffers from the issues described above), followed by psutil, which is in my opinion the best option if you can count on it being installed on the host system, and followed far behind by ps, which is roughly a hundred times slower than psutil.
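A rough sketch to reproduce this kind of benchmark yourself (psutil omitted so that only the standard library and the ps executable are needed; add the psutil-based function analogously if you have it installed):

```python
import os
import resource
import subprocess
import sys
import time

def memory_usage_resource():
    # peak RSS in MiB (ru_maxrss is in KiB on Linux, bytes on OSX)
    denom = 1024. if sys.platform != 'darwin' else 1024. ** 2
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / denom

def memory_usage_ps():
    # fork a `ps` process and parse the RSS column (reported in KiB)
    out = subprocess.Popen(['ps', 'v', '-p', str(os.getpid())],
                           stdout=subprocess.PIPE).communicate()[0].split(b'\n')
    vsz_index = out[0].split().index(b'RSS')
    return float(out[1].split()[vsz_index]) / 1024

for func in (memory_usage_resource, memory_usage_ps):
    tic = time.time()
    for _ in range(100):
        func()
    print('%s: %.3f s for 100 measurements' % (func.__name__, time.time() - tic))
```

The per-call fork in the ps variant dominates its running time, which is what makes it so much slower than the syscall-based alternatives.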
Besides performing a line-by-line analysis of memory consumption, memory_profiler exposes some functions that allow you to retrieve the memory consumption of a function in real time, allowing you e.g. to visualize the memory consumption of a given function over time.
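Under the hood, this kind of real-time monitoring boils down to sampling the process's memory at fixed intervals while the code runs. Here is a minimal sketch of the idea using only the standard library (this is not memory_profiler's actual implementation, which is more careful and can also monitor external processes):

```python
import resource
import sys
import threading
import time

def sample_memory(func, interval=0.01):
    # Run func() while a background thread records the process's
    # peak RSS (in MiB) every `interval` seconds.
    denom = 1024. if sys.platform != 'darwin' else 1024. ** 2
    samples = []
    done = threading.Event()

    def poll():
        while not done.is_set():
            samples.append(
                resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / denom)
            time.sleep(interval)

    poller = threading.Thread(target=poll)
    poller.start()
    try:
        func()
    finally:
        done.set()
        poller.join()
    # take one final sample so very fast functions still get measured
    samples.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / denom)
    return samples

mem = sample_memory(lambda: [0] * (10 ** 7), interval=0.001)
print('%d samples, peak %.1f MiB' % (len(mem), max(mem)))
```

Because this sketch uses ru_maxrss, the samples only ever grow; memory_profiler's real sampling can also see memory being released.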
The function to be used is memory_usage. Its first argument specifies what is to be monitored: either an external process or a Python function. In the case of an external process, the first argument is an integer representing its process identifier (PID). In the case of a Python function, we need to pass the function and its arguments to memory_usage. We do this by passing the tuple (f, args, kw), which specifies the function, its positional arguments as a tuple and its keyword arguments as a dictionary, respectively. This will then be executed by memory_usage while it monitors the memory consumption.
Let's see this with an example. Take as function NumPy's pseudo-inverse function. Thus f = numpy.linalg.pinv, and f takes one positional argument (the matrix to be inverted), so args = (a,) where a is the matrix to be inverted. Note that args must be a tuple consisting of the different arguments; hence the parentheses around a. The third item is a dictionary kw specifying the keyword arguments; here kw is optional and is omitted.
```python
>>> from memory_profiler import memory_usage
>>> import numpy as np
>>> # create a random matrix
>>> a = np.random.randn(500, 500)
>>> mem_usage = memory_usage((np.linalg.pinv, (a,)), interval=.01)
>>> print(mem_usage)
[57.02734375, 55.0234375, 57.078125, ...]
```
This has given me a list with the measurements taken at regular time intervals (t0, t0 + .01, t0 + .02, ...). Now I can use that to, for example, plot the memory consumption as a function of time:
```python
>>> import pylab as pl
>>> pl.plot(np.arange(len(mem_usage)) * .01, mem_usage, label='linalg.pinv')
>>> pl.xlabel('Time (in seconds)')
>>> pl.ylabel('Memory consumption (in MB)')
```
This will give the memory usage of a single function across time, which might be interesting, for example, to detect temporaries created during the execution.
Another use case for memory_usage would be to see how memory behaves as the input data gets bigger. In this case we are interested in memory as a function of the input data. One obvious way to do this is to call the same function with inputs of different sizes and take as memory consumption the maximum consumption over time. This way we will have one memory measurement for each input.
```python
>>> for i in range(1, 5):
...     A = np.random.randn(100 * i, 100 * i)
...     mem_usage = memory_usage((np.linalg.pinv, (A,)))
...     print(max(mem_usage))
```
It is now possible to plot these results as a function of the dimension of the input matrix:
```python
import numpy as np
import pylab as pl
from memory_profiler import memory_usage

dims = np.linspace(100, 1000, 10).astype(int)  # randn needs integer sizes
pinv_mem = np.zeros(dims.size)

for i_dim, k in enumerate(dims):
    x = np.random.randn(k, k)
    tmp = memory_usage((np.linalg.pinv, (x,)), interval=.01)
    pinv_mem[i_dim] = np.max(tmp)

pl.plot(dims, pinv_mem, label='np.linalg.pinv')
pl.ylabel('Memory (in MB)')
pl.xlabel('Dimension of the square matrix')
pl.legend()
pl.show()
```
My newest project is a Python library for monitoring the memory consumption of arbitrary processes, and one of its most useful features is the line-by-line analysis of memory usage for Python code. I wrote a basic prototype six months ago after being surprised by the lack of related tools. I wanted to plot the memory consumption of a couple of Python functions but did not find a Python module to do the job. I came to the conclusion that there is no standard way to get the memory usage of the Python interpreter from within Python, so I resorted to reading from /proc/$PID/statm. From there on I realized that once the fetching of the memory usage is done, making a line-by-line report wouldn't be hard. Back to today. I've been using the line-by-line memory monitoring to diagnose poor memory management (hidden temporaries, unused allocations, etc.) for some time. It seems to work on two different computers, so, full of confidence as I am, I'll write a blog post about it ...
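For the curious, the /proc/$PID/statm approach mentioned above can be sketched as follows (Linux-only; statm reports sizes in pages, and the second field is the resident set size):

```python
import os

def memory_usage_statm():
    # /proc/self/statm fields: size resident shared text lib data dt (pages)
    page_size = os.sysconf('SC_PAGE_SIZE')  # bytes per page
    with open('/proc/self/statm') as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * page_size / float(2 ** 20)  # MiB

print('%.1f MiB resident' % memory_usage_statm())
```

This reads the resident set size of the current process; monitoring another PID just means opening /proc/&lt;pid&gt;/statm instead of /proc/self/statm.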
How to use it?
The easiest way to get it is to install it from the Python Package Index:

```
$ easy_install -U memory_profiler # pip install -U memory_profiler
```

but other options include fetching the latest sources from GitHub or dropping the file in your current working directory or somewhere else on your PYTHONPATH, since it consists of a single file.
The next step is to write some Python code to profile. It can be just about any function, but for the purpose of this blog post I'll create a function `my_func()` with mostly memory allocations and save it to a file:

```python
@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()
```
Note that I've decorated the function
with @profile. This tells the profiler to look into function my_func
and gather the memory consumption for each line.