Long lived Python scripts with Supervisor

By rcwd, Mon 29 April 2019, in category Python

cron, ops, supervisor

In a recent post to his blog, fellow Jersey Pythonista Dave Edwards explored how a Python application can be run on a schedule using Cron.

Cron is a beautifully simple solution to the problem of regularly calling the same application. It's present on pretty much every *nix system on the planet, and it has a simple syntax and a range of logging and alerting features which make it a solid addition to any admin's tool box.

That said, there are some limitations to the Cron solution. Firstly, it's limited in resolution, with 1 minute being the shortest recurring interval you can configure. Secondly, it has minimal support for preventing overlapping runs of the same job. Thirdly, it's almost completely oblivious to the state of the called command, i.e. if the script crashes out, Cron will just keep on trying (as it should) and it's on you, the developer, to review and diagnose the issue.
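To make the concurrency point concrete, here is a minimal sketch of the kind of guard you end up writing yourself when a Cron job might still be running as the next one starts. It uses an advisory file lock from the standard library. The lock path and the function names are my own placeholders, not anything Cron provides.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path writable by the cron user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "my_cron_job.lock")


def acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(job, path=LOCK_PATH):
    """Run job() unless another run holds the lock. Returns True if it ran."""
    lock = acquire_lock(path)
    if lock is None:
        return False
    try:
        job()
    finally:
        lock.close()  # closing the file releases the lock
    return True
```

A script called from Cron could wrap its work in run_exclusively(...) and simply exit when it returns False. It works, but it's boilerplate you have to remember to add to every job.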

And what if I want a long lived Python application? Not everything can fit into convenient 1 minute or time boxed intervals. What if my script has to maintain a connection to a remote server and react to events on an ad hoc basis? Sure, I could use cron to poll the server, but if it only offers a streaming connection (websockets anyone?) then the scheduled task pattern doesn't work.

Servicing Services

The server admins out there will be very familiar with the concept of a "Service" within the operating system. For Windows we have the originally named Windows Services, macOS has launchd, and Linux has System V init / rc.d and, more recently, systemd. All of these solutions allow developers of applications (not just Python) to write a small amount of configuration that instructs the operating system to start the application at a certain point in the OS life-cycle (on boot, on login, etc.) and to try to ensure it stays running.

All of the above systems take care of a number of key aspects of running applications in this way. They all manage the process(es) by virtue of a Process Identifier, they offer logging and output management, and they generally have parameters to control restarts and the number of attempted runs in the event of an error.
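To give a feel for what "manage the process and restart it on error" means, here is a toy sketch of that loop in Python. It is nothing like a real service manager, and keep_running, max_attempts and delay are names I've made up for illustration, but the moving parts (a pid, an exit code, a retry limit) are the same ones these systems expose as configuration.

```python
import subprocess
import sys
import time


def keep_running(command, max_attempts=3, delay=0.1):
    """Run command, restarting it after a non-zero exit.

    Gives up after max_attempts runs and returns the list of exit codes seen.
    """
    exit_codes = []
    for attempt in range(1, max_attempts + 1):
        process = subprocess.Popen(command)
        print("attempt {} started with pid {}".format(attempt, process.pid))
        exit_codes.append(process.wait())
        if exit_codes[-1] == 0:
            break  # clean exit, nothing to restart
        time.sleep(delay)  # short pause before the next attempt
    return exit_codes


if __name__ == "__main__":
    # A child process that always fails, so every attempt is used up.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(keep_running(failing))
```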

Enter the Supervisor

All of the above are wonderful but writing the service specifications can be tricky. They require that you know your way around the various run levels and complexities of an OS wide service management system, and are generally used to start critical software on your server so you really don't want to mess them up.

Fortunately for those of us working in Python, and running our applications on Linux, BSD or macOS, there is a simpler solution that gives us all of the benefits (and more) without quite so much tedious config: Supervisor.

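Supervisor consists of three main parts:

supervisord, the daemon that starts, monitors and restarts your processes
supervisorctl, the command line client used to talk to the daemon
a web GUI and XML-RPC interface for controlling the daemon remotely

We are only really concerned with the first two, as the web GUI and XML-RPC interfaces are beyond the scope of this post.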

I'm going to assume you have an Ubuntu 18.04 (or later) Server ready to rock. If not, you can spin one up on DigitalOcean, Linode, AWS Lightsail or wherever you get your VPS.

Installing Supervisor

Note: Supervisor is available as a Package on Ubuntu but, as is often the case, the version in the package repo is rather old (version 3.3.n vs the current version 4.0.2) so we are going to do a bit more work and install the latest and greatest.

Supervisor is distributed as a Python package and is compatible with Python 2.7 (which of course we're not going to use) and Python 3.4 and later. On a clean Ubuntu install we don't have the pip3 package needed to install other packages so we need to install it using ... a package manager:

sudo apt install python3-pip

Note: I prefer to use the system version of pip3 as we're using the system version of Python3. You could use the get-pip.py installer but please see the warning on the installer's page which pretty much tells you not to do this.

We now have pip3 so we can install Supervisor:

sudo pip3 install supervisor

The sudo is required as we're installing the Supervisor package globally. Normally this is a terrible idea but this isn't a standard package (indeed it's very unlikely to ever be called from your own code) and having it globally available makes life generally easier.

We need to create a config file or Supervisor won't start:

echo_supervisord_conf | sudo tee /etc/supervisord.conf

Let's create a directory to hold our Supervisor Programs and update the config file to reference it:

sudo mkdir /etc/supervisor.d

Now edit the config file:

sudo nano /etc/supervisord.conf

Scroll to the very end of the file and look for the lines:

;[include]
;files = relative/directory/*.ini

We want to uncomment (remove the semicolon), and update these lines to use the directory we just created:

[include]
files = /etc/supervisor.d/*.ini
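If you want to double check the edit, the config file is close enough to ini format that Python's configparser can usually read it. The sketch below is only a sanity check under that assumption. SAMPLE stands in for the real file, and included_patterns and matching_files are helper names I've invented, not part of Supervisor.

```python
import configparser
import glob

# Stand-in for the relevant parts of /etc/supervisord.conf after the edit.
SAMPLE = """\
[supervisord]
logfile=/tmp/supervisord.log ; main log file

[include]
files = /etc/supervisor.d/*.ini
"""


def included_patterns(config_text):
    """Return the glob patterns listed under an active [include] section."""
    parser = configparser.ConfigParser(
        interpolation=None,             # leave any % characters alone
        inline_comment_prefixes=(";",)  # the sample config uses ; for comments
    )
    parser.read_string(config_text)
    if not parser.has_option("include", "files"):
        return []  # section missing or still commented out
    # Supervisor accepts several space separated globs on this line.
    return parser.get("include", "files").split()


def matching_files(config_text):
    """Expand the include patterns into the files that currently exist."""
    files = []
    for pattern in included_patterns(config_text):
        files.extend(sorted(glob.glob(pattern)))
    return files


print(included_patterns(SAMPLE))
```

To check the real thing, read the text of /etc/supervisord.conf and pass it in instead of SAMPLE. An empty list means the include lines are still commented out.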

Starting Supervisor on Boot

We've got the latest version of Supervisor installed but we need to ensure the daemon starts with our server. This means a little more configuration is required to create the SystemD service to get everything working.

First off let's create the empty service file:

sudo touch /lib/systemd/system/supervisord.service

Now we can populate the file. The config below is based on the one supplied in the Supervisor GitHub repository. It's actually based on the CentOS config, as the Ubuntu script is for the older SysV init rather than systemd, and the CentOS file will work just fine with a couple of simple amendments:

# supervisord service for systemd
# Based on config by ET-CS (https://github.com/ET-CS)
[Unit]
Description=Supervisor daemon

[Service]
Type=forking
ExecStart=/usr/local/bin/supervisord
ExecStop=/usr/local/bin/supervisorctl $OPTIONS shutdown
ExecReload=/usr/local/bin/supervisorctl $OPTIONS reload
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

Copy the above onto your clipboard and paste it into our new service file:

sudo nano /lib/systemd/system/supervisord.service

Finally we can load and start the service:

sudo systemctl daemon-reload

sudo systemctl enable supervisord.service

sudo systemctl start supervisord.service

If everything has gone according to plan then great! If not, check the output of sudo journalctl -xe for hints on what's gone wrong ... and fix it!

Connect & Test

After all the work to install it, we're now ready to verify our Supervisor is working correctly and test it with a simple Python script.

Start by connecting to Supervisor using the command:

sudo supervisorctl

Note: the sudo is important here as a normal user is unlikely to have permission to access the SupervisorD socket.

Supervisorctl is a simple CLI interface with decent built in help. Let's start by confirming the version we have running. Type version and hit enter:

Supervisor CLI showing version number

If all is well, you should be looking at a version number, something like 4.0.2. Now let's try help:

Supervisor CLI showing help

These are the commands that are available within Supervisor. Don't worry too much about these at the moment as we are going to define a Program first so enter exit to return to the normal command line.

Your First Supervisor Program

Supervisor jobs or processes are referred to as Programs and are defined using a simple syntax, either in the main Supervisor config file or via individual files. When we edited supervisord.conf we added an include directive at the end which allows us to configure the various Programs as individual files, so let's create one now.

First we're going to need a Python application to run. This can be any long lived Python script but for the sake of this post I've created a simple application that will generate a random integer between 1 and 10 and sleep for that number of seconds, except when it generates a 10, in which case it will crash:

#!/usr/bin/env python3
# Sample Python script that generates a random sleep interval and will 
# occasionally crash - for testing Supervisor

import logging
from random import randint
from time import sleep

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(message)s', 
    level=logging.DEBUG
)


def main():
    while True:
        i = randint(1, 10)
        # trigger a crash if we get a 10
        if i == 10:
            logging.error('Generated {}. Application Crashing'.format(i))
            raise Exception('Application Crashing')
        else:
            logging.info('Generated {}. Sleeping'.format(i))
        sleep(i)


if __name__ == "__main__":
    print('Starting the simple test application')
    main()


Let's create this file in our home directory on the server:

nano ~/supervisor-test.py

Paste in the code from above and feel free to run it to see what the application does. Given the random nature of random numbers it might take a while before it crashes out.

Now we need to write a Supervisor Program to run this very useful script. Programs are written in ini format and start with the directive program and the name of your application.

[program:supervisor_test_app]

Keep your Program names simple! Firstly you'll need to type them into supervisorctl and secondly any excess punctuation can break the ini format.

Next we need to tell Supervisor what to run by adding a command directive:

[program:supervisor_test_app]
command=/usr/bin/python3 /home/rcwd/supervisor-test.py

And that is the simplest form of Supervisor Program we can create. Let's make a new file in our /etc/supervisor.d/ directory so we can load it:

sudo nano /etc/supervisor.d/supervisor_test.ini

Note: the .ini extension is required. When we added the include to the main config file we told it only to look for .ini files.

Paste in the content above, save the file, and jump back into supervisorctl (don't forget to sudo).

We need to run a couple of commands in Supervisor. First we need to run reread to load the new config file. Then we need to run add supervisor_test_app to add and start the Program. Finally we'll run status to check on the job:

Supervisor CLI showing reread, Program load and status

If all has gone to plan, you should see something like

supervisor_test_app     RUNNING     pid 13165, uptime 0:00:02

in response to the status command. Your pid and uptime will likely be different.
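If you ever want to check on a Program from a script rather than by eye, a status line of this shape is easy to pick apart. This is a rough sketch that only understands the RUNNING-style line shown above (other states print different descriptions), and parse_status_line is my own helper name, not a Supervisor API.

```python
import re

# Matches lines shaped like: "<name>  <STATE>  pid <n>, uptime <time>"
STATUS_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+pid (?P<pid>\d+), uptime (?P<uptime>.+)$"
)


def parse_status_line(line):
    """Return a dict describing a running Program, or None if the line differs."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # pid is more useful as a number
    return info


example = "supervisor_test_app     RUNNING     pid 13165, uptime 0:00:02"
print(parse_status_line(example))
```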

Is This Thing On?

Our application is running but are we getting any additional value from all of this work? Well let's see.

If you ran the application outside of Supervisor (or read the source) you'll see that we have a range of outputs from the script. We have the print statement on initialization, the logging messages and a traceback when the application crashes. If you ran the script in a terminal you'd expect to see all of this output echoed to your shell. But where does it go in Supervisor?

One of the neatest features offered by the standard config is the ability to redirect Standard Out and Standard Error to log files. Supervisor does this by default (at least based on the config we're using) and it creates both a core supervisord.log and a unique stdout and stderr log for each Program we've defined.

In our current config these are stored in /tmp. I'll leave it up to you to work out how to move these to /var/log (hint, read /etc/supervisord.conf) but for now we can work with the files in the current location.

Exit out of supervisorctl and list all Supervisor log files in /tmp:

ls -l /tmp/supervisor*.log

You should see three files, although the file names will differ slightly from the ones below:

List of Supervisor Log Files

Don't worry if you don't know what these mean. Let's take a look inside the files to get a real-time view of what's happening with our poor little test script:

sudo tail -f /tmp/supervisor*.log

Note: the sudo is important here as you won't have permission to access the files otherwise.

If you watch the logs for a short while you should see that:

Keep watching and you should see the app crash time and time again, and every time, Supervisor brings it back to life. It's a Pythonic Reincarnation Miracle!

Next Steps

Supervisor has a load of options, both for the core application and for defining Programs. I'd encourage you to review and edit /etc/supervisord.conf as the file is well commented and the default configuration likely needs some further tweaking.

For Program definition you should definitely read the documentation under Program:X Section Values, which details all of the options available to you. A couple of things to try:
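One way to experiment with those options is to generate the Program file from a script, so you can tweak values and regenerate. The sketch below is just that, a sketch. render_program is a helper I've made up. The autostart, autorestart and startretries keys are genuine Program options, but the values are only examples, so check the documentation before relying on them.

```python
import configparser
import io


def render_program(name, command, **options):
    """Return the ini text for a [program:name] section."""
    parser = configparser.ConfigParser(interpolation=None)
    section = "program:{}".format(name)
    parser[section] = {"command": command}
    # Extra options are written in alphabetical order after the command.
    for key, value in sorted(options.items()):
        parser[section][key] = str(value)
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_program(
    "supervisor_test_app",
    "/usr/bin/python3 /home/rcwd/supervisor-test.py",
    autostart="true",     # start the Program when supervisord starts
    autorestart="true",   # always bring it back, whatever the exit code
    startretries=3,       # attempts before Supervisor gives up on starting it
))
```

Save the output as a .ini file in /etc/supervisor.d/ and then run reread and update in supervisorctl to pick up the change.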

Also, please don't forget that having an application that comes back to life is no replacement for writing solid code in the first place. You should always monitor your logs and, where possible, fix issues that cause your applications to crash. Remember, exceptions should be exceptional.

If you have any questions or feedback, please feel free to reach out on Twitter but for now I hope this has been useful and may your applications live long (and prosper) 🖖🏻