How to profile Python applications inside a Docker container

November 21, 2019 - Jan Pieter Bruins Slot

In the following post I’ll explain how you can profile a running Python program in a Docker container using py-spy. py-spy is able to generate flame graphs, and it can give us profiling capabilities in order to debug our Python programs. Now, when your Python code is running in a Docker container it can be a bit more difficult to profile your code, and this post sets out to show you how this can be done with these tools.

1. Setting up

Before we can profile the Python application, we need to set up the tools, and example files, that we will be using in this project. You can inspect the finished result here.

1.1 Project

Let’s get started with the project outline by creating the following files and folders. The docker-compose.yml is optional, because I will show you how to use plain Docker as well.

$ tree -L 1 --dirsfirst
.
├── app/
├── pyspy/
└── docker-compose.yml

1.2 py-spy

In the pyspy/ folder we will create a Dockerfile that we’re going to use to build a Docker image that contains the py-spy program. The Dockerfile should contain the following:

# pyspy/Dockerfile
FROM python:3.6
RUN pip install py-spy
WORKDIR /profiles
ENTRYPOINT [ "py-spy" ]
CMD []

Let’s test if it is working:

$ cd pyspy/
$ docker build -t pyspy .
$ docker run -it pyspy

py-spy 0.3.0
Sampling profiler for Python programs

USAGE:
    py-spy <SUBCOMMAND>

OPTIONS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    record    Records stack trace information to a flamegraph, speedscope
              or raw file
    top       Displays a top like view of functions consuming CPU
    dump      Dumps stack traces for a target program to stdout
    help      Prints this message or the help of the given subcommand(s)

Cool, that worked! You can also install py-spy locally on your host system. If you do, be sure to read the documentation at https://github.com/benfred/py-spy on how to do this.

1.3 Python

Next, we want to create a Python program that we can profile. Create a new file named run.py in the app/ folder and copy the following contents into it.

# app/run.py
import random


def factorial(n):
    """Return n! computed iteratively."""
    result = 1
    for i in range(1, n + 1):
        result *= i

    return result


if __name__ == "__main__":
    while True:
        n = random.choice(range(1, 5))
        f = factorial(n=n)

        print("Factorial of {n} is {f}".format(n=n, f=f))

Next, create a Dockerfile in the app/ folder:

# app/Dockerfile
FROM python:3.6
WORKDIR /usr/src/app
COPY run.py .
CMD [ "python", "./run.py" ]

And again, let’s see if it is working:

$ cd app/
$ docker build -t app .
$ docker run -t app

Press Control-C to exit the program.

1.4 Docker Compose

I promised that I’d also show you how to use Docker Compose to orchestrate our containers. Create a docker-compose.yml file and add the following to it:

# docker-compose.yml
version: "3"
services:

  pyspy:
    build:
      context: pyspy/
    pid: "host"
    privileged: true
    volumes:
      - .:/profiles

  app:
    build:
      context: app/
    cap_add:
      - sys_ptrace

Now, we can build the containers like so:

$ docker-compose build

You’re probably wondering what pid, privileged and cap_add are used for. I’ll get to that in the next section.

2. Profiling the program

We’ve set everything up and we’re ready to try it out. First, let’s run the Python program. We need to add --cap-add sys_ptrace, because by default Docker containers do not have the SYS_PTRACE capability. Without it, Docker restricts the process_vm_readv system call that py-spy uses to directly read the memory of the Python program.

# docker
$ docker run --cap-add sys_ptrace --name py-app -t app

# docker-compose
$ docker-compose up app
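
If you want to confirm that the capability was actually granted, one option is to look at the effective capability mask that Linux exposes in /proc/<pid>/status. The snippet below is a minimal sketch under that assumption: has_sys_ptrace is a hypothetical helper name, and it relies on CAP_SYS_PTRACE being bit 19 of the mask, as defined in the Linux capability headers.

```python
CAP_SYS_PTRACE = 19  # bit index of CAP_SYS_PTRACE in the Linux capability mask


def has_sys_ptrace(status_text):
    """Return True if the CapEff mask in a /proc/<pid>/status dump includes SYS_PTRACE."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number after the field name.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_SYS_PTRACE))
    raise ValueError("no CapEff line found")
```

Inside a running container you could call it as has_sys_ptrace(open("/proc/self/status").read()).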

Before we can use py-spy, we need the PID of the Python program that is running in the container, because py-spy samples the process with that PID.

# docker
$ docker inspect --format '{{.State.Pid}}' py-app
26982

# docker-compose (the container name depends on your project folder; check docker ps)
$ docker inspect --format '{{.State.Pid}}' profile-python-docker_app_1
26982
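
If you would rather script this lookup than copy the PID by hand, the same docker inspect call can be wrapped in Python. This is only a sketch: inspect_pid_command and container_pid are hypothetical helper names, and it assumes the docker CLI is available on the host's PATH.

```python
import subprocess


def inspect_pid_command(container):
    """Build the docker inspect call that prints a container's host PID."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]


def container_pid(container, run=subprocess.run):
    """Return the host PID of the container's main process as an int."""
    result = run(
        inspect_pid_command(container),
        check=True,  # raise CalledProcessError if docker exits with an error
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str (also works on Python 3.6)
    )
    pid = int(result.stdout.strip())
    if pid == 0:
        # Docker reports PID 0 for a container that is not running.
        raise RuntimeError("container {!r} is not running".format(container))
    return pid
```

Calling container_pid("py-app") should then return the same number that the docker inspect command printed above. The run parameter exists only so the function can be exercised without Docker.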

Now that we have the PID, we can use it with py-spy to profile our Python application. Because this PID belongs to the host’s PID namespace, the py-spy container must be able to see it. We therefore run the py-spy container in the host’s PID namespace by adding the --pid=host flag to the run command. Additionally, we need to add the --privileged flag, which gives the container the same access to the host as processes running outside containers on the host.

# docker: change $(pwd) to the folder where you want the profiles to be saved
$ docker run \
    --pid=host \
    --privileged \
    -v "$(pwd)":/profiles \
    -it pyspy record -o myprofile.svg --pid 26982

# docker-compose
$ docker-compose run pyspy record -o myprofile.svg --pid 26982

You can now open the myprofile.svg file in a browser and inspect the flame graph that was created. Here is how to interpret the result:

[Figure: the myprofile.svg flame graph]

The horizontal axis represents the total number of samples collected. So the larger the area, the more time has been spent executing the associated function. The vertical axis represents the depth of the call stack. So the higher the peak, the deeper the call stack. Colors don’t represent anything specific; they’re just there to make a visual contrast. (source)
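
To make the relation between samples and widths concrete, here is a small sketch of the counting behind a flame graph. It is not py-spy's implementation: frame_widths is a hypothetical helper, and the sample stacks in the usage note are made up for illustration.

```python
from collections import Counter


def frame_widths(samples):
    """Count how many samples passed through each call path.

    Each sample is a call stack ordered from the outermost frame to the
    innermost one. The count for a call path corresponds to the width of
    its box in the flame graph, and the length of the path to its height.
    """
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths
```

With the illustrative samples [("<module>", "factorial"), ("<module>", "factorial"), ("<module>", "print"), ("<module>",)], the <module> box spans all four samples, while factorial spans two of them and print spans one.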

A cool feature of py-spy is that it can also write profiles in the speedscope format, which you can inspect with the online speedscope tool. Add the --format speedscope flag to the record command, and you’ll be able to import the resulting profile into speedscope.
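
If you switch between output formats often, it can help to assemble the record command in one place. The following is a minimal sketch that mirrors the docker run invocation from this section: pyspy_record_command is a hypothetical helper, and the pyspy image name and /profiles mount point are the ones chosen earlier in this post.

```python
import os


def pyspy_record_command(pid, output="myprofile.svg", fmt=None, profiles_dir=None):
    """Build the docker run argument list that records a profile with py-spy."""
    # Default to the current folder, like $(pwd) in the shell version.
    profiles_dir = profiles_dir or os.getcwd()
    cmd = [
        "docker", "run",
        "--pid=host",       # share the host's PID namespace
        "--privileged",     # give the container host-level access
        "-v", "{}:/profiles".format(profiles_dir),
        "-it", "pyspy",
        "record", "-o", output, "--pid", str(pid),
    ]
    if fmt is not None:
        # For example "speedscope"; py-spy writes a flame graph by default.
        cmd += ["--format", fmt]
    return cmd
```

For example, pyspy_record_command(26982, output="myprofile.json", fmt="speedscope") produces the speedscope variant of the command. The myprofile.json file name is just an example. You can pass the list to subprocess.run from a terminal, since the -it flags expect one.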

Conclusion

You now have a basic setup to profile your Python applications in a Docker container. Be sure to read up on all the features and options of py-spy, so that you can make full use of it when profiling your own applications.