In this series of tutorials, we are going to focus on the theory and implementation of transmission models in some kind of population.
In epidemiology, it is common to model the transmission of a pathogen from one person to another. In the social sciences and law, we may be interested in thinking about the way in which individuals influence each other’s opinions, ideology and actions.
These two examples are different, but in many ways analogous: it is not difficult to imagine the influence that one individual has on another as being similar to the infectivity of a virus in the sense that both have the ability to change the state of an individual. One may go from being susceptible to being infected, or from unconvinced to convinced.
Additionally, social networks have become an important area of study for epidemiological modelers. We can imagine that the nature of the network is different than the ones we think about in the social sciences: when studying outbreaks of a sexually transmitted disease, one doesn’t care all that much about the friendship networks of the people involved, while this would be very important for understanding the impact of social influence on depression and anxiety.
As someone who spends a lot of time working in the space where epidemiology and sociology overlap, I end up thinking a lot about these models – and their potential application to new and different problems and am really excited to share them with a broader audience here. In this first tutorial, I’m going to introduce a simple Susceptible-Infected-Recovered (SIR) model from infectious disease epidemiology and show a simple, Pythonic implementation of it. We’ll work through the process of writing and optimizing this kind of model in Python and, in the final tutorials, will cover how to include a social network in the simulation model.
In order to use the example below, all you need to have installed is a current version of Python (2.4+ is probably best) and the excellent Python plotting package Matplotlib in order to view output. If you don’t have Matplotlib and don’t want to go and install it (although I guarantee you won’t regret it), just comment out import for Pylab and any lines related to plotting.
Model Assumptions
1. State Space / Markov Model
Before getting into the mechanics of the model, let’s talk about the theory and assumptions behind the model as it is implemented here:
The SIR model is an example of a ‘state space‘ model, and the version we’ll be talking about here is a discrete time, stochastic implementation that has the Markov property, which is to say that its state at time t+1 is only conditional on the parameters of the model and its state at time t.
For the uninitiated, in a state-space model, we imagine that each individual in the system can only be in one state at a time and transitions from state to state as a function of the model parameters, i.e., the infectivity of the pathogen or idea and the rate of recovery from infection…and the other states of the system. In other words, the system has endogenous dynamics. This is what makes it both interesting and in some ways difficult to work with.
In the SIR model, we assume that each infectious individual infects each susceptible individual at rate beta. So, if beta = .5, there is a 50% chance that each susceptible individual will be infected by an exposure to an infectious individual. For this reason, as the number of infected individuals in the system grows, the rate at which the remaining susceptible individuals is infected also grows until the pool of susceptible individuals is depleted and the epidemic dies out.
The other parameter we care about is gamma, or the rate of recovery. If gamma is also equal to .5, we assume that the average individual has a 50% chance of recovering on a given day, and the average duration of infectiousness will be 1/gamma, or 2 days.
We refer to the ratio beta/gamma as the basic reproductive ratio, or Ro (‘R naught’). When this number is less than one, we typically expect outbreaks to die out quickly. When this quantity is greater than one, we expect that the epidemic will grow and potentially saturate the whole population.
2. Homogeneous Mixing:
We’re assuming a world in which everyone has simultaneous contact with everyone else. In other words, we’re thinking of a totally connected social network. If you’re a regular reader of this blog, a social network enthusiast, or in some other way a thinking person, this assumption probably seems unreasonable. It turns out, however, that for many diseases, this assumption of homogeneous or ‘mass-action’ mixing, which was actually originally borrowed from chemistry, turns out to be a reasonable approximation.
For instance, if we are trying to approximate the transmission dynamics of a very infectious pathogen like measles in a city or town, we can usually overlook social network effects at this scale and obtain a very good fit to the data. This is because even very weak contacts can transmit measles, so that friendships and other types of close contacts are not good predictors of risk. Instead, we we are better off looking at a higher level of organization – the pattern of connection between towns and cities to understand outbreaks. In a social context, something like panic may be thought of as being super-infectious (for a really interesting study about the potential relationship between social panic and flu dynamics, see this paper by Josh Epstein).
This is, however, a generally problematic assumption for most problems of social influence, but an understanding of this most basic version of the model is necessary to move on to more complicated contact patterns.
3. Exponentially distributed infectious periods:
In the most basic implementation of the SIR model, we assume that each infectious individual has some probability of recovering on every step. If our model steps forwards in days and individuals have a .5 probability of recovery on each day, we should expect that the time to recovery follows an exponential distribution. This means that most people will be pretty close to the mean, but some will take a relatively long time to recover. This is accurate for a lot of cases, but definitely not for all. In some diseases, recovery times may be lognormal, power-law or bimodally disributed. For social models, the notion of an ‘infectious period’ may not make a tremendous amount of sense at all. But it allows for a very simple and transparent implementation, so we’ll use it here.
CLICK THROUGH TO SEE THE IMPLEMENTATION and RELEVANT PYTHON CODE!
Implementation
Ok, so now it’s time to move on to actually writing and running the model. If you have matplotlib installed, you should be able to just copy or save the code into a new file called ‘simpleSIR.py’ and then run it from the command line using the command ‘python simpleSIR.py’.
I’ll walk you through the substance of the model in the code:
And that’s really it for a basic SIR model. This implementation leaves a lot to be desired, but that’s not really the point of this tutorial.
In 36 lines of actual code (the vast majority of the program is comments), we implemented a stochastic, dynamic model and were able to take a look at the results.
This is really a strength of using a Python based approach: it’s simple, easy to read (for those of you who are Matlab users, it should look pretty familiar), and quick to put together. The code in this file took me about 20 minutes to write.
Your mileage may vary, but the point is that Python is great for getting up and going, prototyping your ideas and seeing what they look like. You can go very far with an all-Python model. If speed becomes a serious issue, you may think about blending in some C++ for the really computationally intensive methods.
For those of you playing along at home, before moving on to the next tutorials, spend some time with the following questions:
- Would this model scale well? Where are there potential bottlenecks in the code?
- What would an object-oriented implementation look like? Why would you want to try one?
- How would you change the assumptions behind the model to look more like a more social scientific problem you care about?
- What would have to (fundamentally) change in the code to be useful for thinking about outbreaks on a network?
Subsequent posts will cover the following topics (in roughly the following order):
#2) Tips and Tricks for writing more efficient model code. (And why you should care!)
#3) Writing Individual/Agent-Based models in Python
#4) Implementing an SIR Model on a Network in Python
#5) Animating networked outbreaks using rSonia
I looked at the code before reading the article and decided not to bother.
S I R N t .. ?
sList iList rList?
Where the f*ck did you learn to program?
@VLDR
Maybe you missed the part where we explain that this is part of a set of tutorials for those new to programming. Maybe you missed the part where we indicate that these variables are named to correspond with the variables from the differential equation model.
Or maybe you’re just a troll and missed a proper raising.
Unlike VLDR, I did read the article. I don’t like his tone very much, but he is right that your use of single letter variable names is bad form. Comments are useful for deciphering the code, but susceptibles = 1000 #number of susceptibles, etc. would not add much typing, and would greatly increase legibility.
Perhaps the most important core tenet of python is that code is read far more often than it is written, so legibility is of paramount importance.
Other than that, this is a good starting point for new programmers, and I am glad to see python taking such a large role in education nowadays (as opposed to icky icky java).
At my work, we write it first to get the logic working, then re-write to make the code standarized and optimized. So your python code is good.
What I found meaningful is how you used it. I now something I didn’t before, and thanks to you code, have a starting point should I use need it.
I’m looking forward to more.
Thanks!
LOL! Should have said I now know something more I didn’t before!
Thanks again.
@VLDR
There’s nothing wrong with this code. S, I and R are obvious acronyms for the numbers of susceptible, infected and recovered. It’s standard to do this (google “SIR model”).
I have to agree with @VLDR. To use long variable names in mathematical formula vcan obscure the formula and introduce mistakes. I am in favour of keeping the code similar to the mathematical formula (but maybe that is because I am a mathematician and write mathematical programs for a living). In other cases, then I think it is good practice to use long and descriptive variable names.