Economists are Prone to Fads, and the Latest is Machine Learning (via The Economist)

screen-shot-2016-11-29-at-1-47-57-pm
We started this blog (7 years ago) because we thought that there was insufficient attention to computational methods in law (NLP, ML, NetSci, etc.)  Over the years this blog has evolved to become mostly a blog about the business of law (and business more generally) and the world is being impacted by automation, artificial intelligence and more broadly by information technology. 

However, returning to our roots here — it is pretty interesting to see that the Economist has identified that #MachineLearning is finally coming to economics (pol sci + law as well).  

Social science generally (and law as a late follower of developments in social science) it is still obsessed with causal inference (i.e. diff in diff, regression discontinuity, etc.).  This is perfectly reasonable as it pertains to questions of evaluating certain aspects of public policy, etc.

However, there are many other problems in the universe that can be evaluated using tools from computer science, machine learning, etc. (and for which the tools of causal inference are not particularly useful). 

In terms of the set of econ papers using ML, my bet is that a significant fraction of those papers are actually from finance (where people are more interested in actually predicting stuff).   

In my 2013 article in Emory Law Journal called Quantitative Legal Prediction – I outline this distinction between causal inference and prediction and identify just a small set of the potential uses of predictive analytics in law.  In some ways, my paper is already somewhat dated as the set of use cases has only grown.  That said, the core points outlined therein remains fully intact …  

What A Difference 2 Percentage Points Makes and Nate Silver vs Huff Po Revisited

screen-shot-2016-11-11-at-10-37-03-am
There is likely to be lots of recriminations in the pollster space but I think one thing is pretty clear — there was not enough uncertainty in the various methods of poll aggregation.

I will highlight the Huffington Post (Huff Po) model because they had such hubris about their approach.  Indeed, Ryan Grimm wrote an attempted attack piece on the 538 model which stated “Silver is making a mockery of the very forecasting industry that he popularized.”  (Turns out his organization was the one making the mockery)

Nate Silver responded quite correctly that this Huff Po article is “so fucking idiotic and irresponsible.”  And it was indeed.
screen-shot-2016-11-11-at-11-08-37-am

Even after the election Huff Po is out there trying to characterize it as a black swan event.  It is *not* a black swan event.  Far from it … and among the major poll aggregators Five Thirty Eight was the closest because they had more uncertainty (which turns out was quite appropriate).  Specifically, the uncertainty that cascaded through 538’s model was truthful … and just because it resulted in big bounds didn’t mean that it was a bad model, because the reality is that the system in question was intrinsically unpredictable / stochastic.

From the 538 article cited above “Our finding, consistently, was that it was not very robust because of the challenges Clinton faced in the Electoral College, especially in the Midwest, and therefore our model gave a much better chance to Trump than other forecasts did.

Take a look again at the justification (explanation) from the Huffington Post:  “The model structure wasn’t the problem. The problem was that the data going into the model turned out to be wrong in several key places.”

Actually the model structure was the problem in so much as any aggregation model should try to characterize (in many ways) the level of uncertainty associated with a particular set of information that it is leveraging.

Poll aggregation (or any sort of crowd sourcing exercise) is susceptible to systemic bias. Without sufficient diversity of inputs, boosting and related methodological approaches are *not* able to remove systematic bias. However, one can build a meta-meta model whereby one attempts to address the systemic bias after undertaking the pure aggregation exercise.

So what is the chance that a set of polls have systematic error such that the true preferences of a group of voters is not reflected?  Could their be a Bradley type effect?  How much uncertainty should that possibility impose on our predictions?  These were the questions that needed better evaluation pre-election.

It is worth noting that folks were aware of the issue in theory but most of them discounted it to nearly zero.  Remember this piece in Vanity Fair which purported to debunk the Myth of the Secret Trump Voter (which is the exact systematic bias that appeared to undermine most polls)?

Let us look back to the dynamics of this election.  There was really no social stigma associated with voting for Hillary Clinton (in most social circles) and quite a bit (at least in certain social circles) with voting for Trump.

So while this is a set back for political science, I am hoping what comes from all of this is better science in this area (not a return to data free speculation (aka pure punditry)).

P.S. Here is one more gem from the pre-election coverage – Election Data Hero Isn’t Nate Silver. It’s Sam Wang (the Princeton Professor had HRC at more than a 99% chance of winning).  Turns out this was probably the worst performing model because it has basically zero model meta-uncertainty.

2016 General Counsel Summit Presented by The Economist Magazine


I am excited to speak tomorrow at the 2016 General Counsel Summit in London presented by The Economist Magazine.

It will be a short trip to London for me as I will be flying back on Thursday in time for final preparations for the Fin(Legal)Tech Conference that I am helping organize November 4th 2016 at Illinois Tech – Chicago Kent College of Law.

I look forward to seeing folks in London and Chicago later this week!