Programming Dynamic Models in Python: Coding Efficient Dynamic Models

October 21, 2009 clsadmin1 Comment

In the next few tutorials, we’re going to transition to exploring how to model dynamics on a network.

The first tutorial was a bit of a blockbuster length-wise because there was a lot of ground to cover to get the basic model up and running. Moving forward, we’ll be able to go a bit more incrementally, adding elements to the basic model as we go. If you don’t feel comfortable with the original, go back and take a look at it and make sure that it makes sense to you before moving on.

We’re first going to deal with some of the efficiency issues in the first model. After this, we’ll make some basic changes to the architecture of the SIR program that makes it more amenable to contact patterns on a social network.

Finally, we’ll show you how to how to take the output of your epidemic model and generate animations like this one:

Blue nodes are exposed but uninfected, red nodes are infectious, and yellow ones have recovered.

The movie is a bit of a carrot to get you through the less flashy, but, I promise, important and actually interesting nuts and bolts of putting these kinds of models together.

This tutorial is going to cover the last two big things that we need to tackle before we get to the model of an outbreak on a network. So, here we go!

New Concepts

1. Arbitrarily Distributed Infectious Periods

First, we’re going to deal with the duration of the infectious period. The assumption of an exponentially distributed infectious period is unnecessarily restrictive for a general model of diffusion, and the way the original code goes about recovering individuals – drawing a random number on every step for every infectious individual – should strike you as both inelegant and computationally inefficient, particularly when the rate of recovery is slow and there are many infectious individuals.

In order to deal with this, we’re going to introduce two new tools. The first is the scipy.stats toolkit and the second is a neat (and very easy to use) data structure called a heap.

A heap is in very many ways what it sounds like: imagine a pile of trash in a landfill; the tires and rusting washing machines are on the bottom, while the pop cans and grocery store receipts are closer to the top.

As a programming tool, a heap is useful because it always keeps the smallest (or largest, depending on your preference) item at the top of the list. It also allows for linear-time insertion and removal of objects. This means that the time it takes to execute an action grows proportionally to the size of the list, so if it has N items, it takes N*C steps (where C is a constant) to process the list, and if it has 2*N items, it takes 2*N*C steps. Other ways of sorting could take N^2 or worse steps to do the same.

In our outbreak model, the top item on the heap is always going to be the time at which the next individual recovers. By doing this, we can avoid the loop in the first tutorial (and replicated in one implementation here) that checks whether each infectious individual is going to recover on each step.

Looping over everyone is the most intuitive way to check if they’re going to recover, but it’s very inefficient, especially when infectious periods are long and the population is large. It’s also problematic from a theoretical perspective, because it chains us to exponentially distributed recovery periods.

Exponentially distributed infectious periods make analytic sense for a deterministic model, but your disease or *insert diffusible here* may have a constant or normally distributed ‘infectious’ period.

By using a heap-based implementation, as you will see, we can use arbitrary recovery periods, and Python’s implementation of the heap is very straightforward – just a small twist on the usual list using the heapq module.

2. Object Oriented Programming

One of Python’s strengths is that it supports a style of programming that mixes the best of object-oriented programming (OOP) and procedural or imperative programming.

We won’t go too deep into the details of OOP here, but the real strength of OOP implementations are that they allow code to be easily re-used in other programs (Python’s all-powerful ‘import‘ statement really makes this true) and also forces some structure on what functions have access to what variables, etc.

Click Below to Review the Implementation and Commented Code!

Implementation

I’ll walk you through the steps of implementing and running the model below. If you aren’t familiar with how Python does OOP, go take a look here to get the basics down. Python OOP is very friendly, although its relative simplicity of implementation can be a double-edged sword when compared to more strongly-typed languages.

For example, you could fill a list with a combination of the ‘simpleSIRModel’ objects below, some integers, other Python lists, etc., but then you need to know what to do with every object as it comes off the list.

This isn’t a very steep hill to climb, but it can be tricky if this is your first rodeo. By contrast, in Java, you would need to implement something known as a polymorphic array and then cast objects back to their type as you take them off of the list. That kind of type-safe implementation forces you to be aware of what objects are in what data structures, but you pay a penalty in terms of speed of coding and flexibility.

By contrast, Python does some behind-the-scenes work to deal with lists of arbitrarily typed objects, but it makes your life a lot easier. I’ll discuss why you might want to do this after we walk through the code.

For this one, you’re definitely going to need Python 2.4+ and a working install of relatively recent versions of Scipy and Numpy.

(Mac users, just do yourself a favor and install the Enthought Python Distribution.)

I think the justification for using a heap of recovery times is straightforward: it’s flexible, may be faster, and lets us use arbitrary recovery periods. But why go all OOP?

If you take a look back at the model in the first tutorial, it should become clear that the main way you’re going to re-use that code is by cutting and pasting.

By contrast, this program can either be run using the Python interpreter (the code below “if __name__ == ‘__main__’:” only runs when it is used as a standalone program), or imported into another python script. We can then create multiple outbreak objects, subclass their methods, etc.

For example, I have used the model in the code example below to simulate outbreaks in a collection of households that were all exposed to a gastrointestinal pathogen called norovirus. In order to do this, I import this code and create a new ‘communityOutbreak’ object type that is filled with simpleSIRModel objects. When I call this object’s ‘run’ method, it calls the run method of each simpleSIRModel object contained within and records the results.

When we get to the networked outbreak model, I’ll show you another scenario where it can be helpful to import components from one model into another.

Ok, some food for thought for those of you playing along at home:

How could we better use objects here to use arbitrary recovery time distributions?
How could you make this even more flexible as a command-line program? Try to add command line arguments using either Python’s argv or getopt tools.
Try to run this a multiple times and output averages of the runs over time to get a sense of the model’s asymptotic behavior.