What A Difference 2 Percentage Points Makes and Nate Silver vs Huff Po Revisited

There are likely to be lots of recriminations in the pollster space, but I think one thing is pretty clear — there was not enough uncertainty in the various methods of poll aggregation.

I will highlight the Huffington Post (Huff Po) model because they had such hubris about their approach.  Indeed, Ryan Grim wrote an attempted attack piece on the 538 model which stated “Silver is making a mockery of the very forecasting industry that he popularized.”  (Turns out his organization was the one making the mockery.)

Nate Silver responded quite correctly that this Huff Po article is “so fucking idiotic and irresponsible.”  And it was indeed.

Even after the election, Huff Po is out there trying to characterize it as a black swan event.  It is *not* a black swan event.  Far from it … and among the major poll aggregators, FiveThirtyEight was the closest because it had more uncertainty (which turned out to be quite appropriate).  Specifically, the uncertainty that cascaded through 538’s model was truthful … and just because it resulted in big bounds didn’t mean that it was a bad model, because the reality is that the system in question was intrinsically unpredictable / stochastic.

From the 538 article cited above: “Our finding, consistently, was that it was not very robust because of the challenges Clinton faced in the Electoral College, especially in the Midwest, and therefore our model gave a much better chance to Trump than other forecasts did.”

Take a look again at the justification (explanation) from the Huffington Post:  “The model structure wasn’t the problem. The problem was that the data going into the model turned out to be wrong in several key places.”

Actually, the model structure was the problem, insofar as any aggregation model should try to characterize (in many ways) the level of uncertainty associated with the particular set of information it is leveraging.

Poll aggregation (or any sort of crowd sourcing exercise) is susceptible to systemic bias. Without sufficient diversity of inputs, boosting and related methodological approaches are *not* able to remove systematic bias. However, one can build a meta-meta model whereby one attempts to address the systemic bias after undertaking the pure aggregation exercise.
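To make the point concrete, here is a minimal Monte Carlo sketch — with made-up margins and error magnitudes, not any aggregator’s actual model — showing why a shared (systematic) error component across states produces far more uncertainty than the same amount of independent, state-by-state noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_states = 100_000, 10
margin = 0.03  # hypothetical 3-point polling lead in every swing state

# Case 1: polling errors are independent across states — they average out
indep_err = rng.normal(0, 0.03, size=(n_sims, n_states))
wins_indep = ((margin + indep_err) > 0).sum(axis=1) > n_states / 2

# Case 2: a shared systematic error hits every state at once
# (e.g., one demographic group is under-sampled everywhere)
shared = rng.normal(0, 0.03, size=(n_sims, 1))
local = rng.normal(0, 0.015, size=(n_sims, n_states))
wins_corr = ((margin + shared + local) > 0).sum(axis=1) > n_states / 2

print(f"P(win majority), independent errors: {wins_indep.mean():.3f}")
print(f"P(win majority), correlated errors:  {wins_corr.mean():.3f}")
```

With independent errors, a 3-point lead in every state translates into a near-certain majority; with a correlated error component, the win probability drops sharply — which is roughly the disagreement between the 99%-style models and 538.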

So what is the chance that a set of polls has systematic error such that the true preferences of a group of voters are not reflected?  Could there be a Bradley-type effect?  How much uncertainty should that possibility impose on our predictions?  These were the questions that needed better evaluation pre-election.

It is worth noting that folks were aware of the issue in theory but most of them discounted it to nearly zero.  Remember this piece in Vanity Fair which purported to debunk the Myth of the Secret Trump Voter (which is the exact systematic bias that appeared to undermine most polls)?

Let us look back at the dynamics of this election.  There was really no social stigma associated with voting for Hillary Clinton (in most social circles) and quite a bit of stigma (at least in certain social circles) associated with voting for Trump.

So while this is a setback for political science, I am hoping that what comes from all of this is better science in this area (not a return to data-free speculation (aka pure punditry)).

P.S. Here is one more gem from the pre-election coverage – Election Data Hero Isn’t Nate Silver. It’s Sam Wang (the Princeton professor had HRC at more than a 99% chance of winning).  Turns out this was probably the worst-performing model because it had basically zero model meta-uncertainty.

Econometrics (hereinafter Causal Inference) versus Machine Learning

Perhaps there is some hyperbolic language in here, but the basic idea is still intact … for law + economics / empirical legal studies, the causal inference versus machine learning point is expressed in detail in this paper called “Quantitative Legal Prediction.”  Mike Bommarito and I have made this point in these slides, these slides, these slides, etc.  Mike and I also make this point on Day 1 of our Legal Analytics class (which really could be called “machine learning for lawyers”).

Law on the Market? Evaluating the Securities Market Impact Of Supreme Court Decisions (Katz, Bommarito, Soellinger & Chen)

From the abstract: Do judicial decisions affect the securities markets in discernible and perhaps predictable ways? In other words, is there “law on the market” (LOTM)? This is a question that has been raised by commentators, but answered by very few in a systematic and financially rigorous manner. Using intraday data and a multiday event window, this large-scale event study seeks to determine the existence, frequency and magnitude of equity market impacts flowing from Supreme Court decisions.

We demonstrate that, while certainly not present in every case, “law on the market” events are fairly common. Across all cases decided by the Supreme Court of the United States between the 1999-2013 terms, we identify 79 cases where the share price of one or more publicly traded company moved in direct response to a Supreme Court decision. In the aggregate, over fifteen years, Supreme Court decisions were responsible for more than 140 billion dollars in absolute changes in wealth. Our analysis not only contributes to our understanding of the political economy of judicial decision making, but also links to the broader set of research exploring the performance in financial markets using event study methods.

We conclude by exploring the informational efficiency of law as a market by highlighting the speed at which information from Supreme Court decisions is assimilated by the market. Relatively speaking, LOTM events have historically exhibited slow rates of information incorporation for affected securities. This implies a market ripe for arbitrage where an event-based trading strategy could be successful.
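For readers unfamiliar with event study mechanics, here is a minimal market-model sketch using simulated daily returns and hypothetical parameters (this is not the paper’s intraday data or methodology): fit alpha and beta over an estimation window, then sum abnormal returns over the event window to get the cumulative abnormal return (CAR).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated daily returns: 120-day estimation window + 3-day event window
market = rng.normal(0.0005, 0.01, 123)
stock = 0.0002 + 1.2 * market + rng.normal(0, 0.008, 123)
stock[120:] += np.array([0.00, -0.04, -0.01])  # hypothetical decision-day shock

est_mkt, est_stk = market[:120], stock[:120]
evt_mkt, evt_stk = market[120:], stock[120:]

# Market model: fit beta/alpha on the estimation window (simple OLS)
beta, alpha = np.polyfit(est_mkt, est_stk, 1)

# Abnormal return = actual return minus the market-model expectation
abnormal = evt_stk - (alpha + beta * evt_mkt)
car = abnormal.sum()  # cumulative abnormal return over the event window

print(f"beta = {beta:.2f}, CAR over event window = {car:.3f}")
```

A CAR well outside the noise band of the estimation window is the signature of an LOTM event; the paper’s point about slow information incorporation is that, for Supreme Court decisions, that abnormal movement tends to unfold over the multiday window rather than instantaneously.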

Available on SSRN and arXiv

Even The Algorithms Think Obamacare’s Survival Is A Tossup (via 538.com)


Readers will probably observe that {Marshall+} is still a work in progress (for example, my colleague noted that {Marshall+} believes Justice Ginsburg would appear to be slightly more likely to vote to overturn the ACA than Justice Thomas).  While this probably will not prove to be correct in King v. Burwell, our method is rigorously backtested and designed to minimize errors across all predictions (not just in this specific case).  This optimization question is tricky for the model, and it will be the source of future model improvements.  I have preached the mantra Humans + Machines > Humans or Machines, and this problem is a good example.  The problem with exclusive reliance upon human experts is that they have cognitive biases, information-processing issues, etc.  The problem with models is that they generate errors that humans would not.

Anyway, the good thing about having a base model such as {Marshall+} is that we can begin to incorporate a range of additional information in an effort to create a {Marshall++} and beyond.    And on that front there is more to come …

The Utility of Text: The Case of Amicus Briefs and the Supreme Court (by Yanchuan Sim, Bryan Routledge & Noah A. Smith)

From the Abstract: “We explore the idea that authoring a piece of text is an act of maximizing one’s expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae (“friends of the court” separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis.”  (HT: R.C. Richards from Legal Informatics Blog)

This Computer Program Can Predict 7 out of 10 Supreme Court Decisions (via Vox.com)

The story is here.  Full-form interview with Mike + Josh is here.  (I unfortunately could not participate because I was teaching my ICPSR class.)  Our paper is available on SSRN and on the physics arXiv.

Predicting the Behavior of the Supreme Court of the United States: A General Approach (Katz, Bommarito & Blackman)

SCOTUS Prediction Model
From the abstract: “Building upon developments in theoretical and applied machine learning, as well as the efforts of various scholars including Guimera and Sales-Pardo (2011), Ruger et al. (2004), and Martin et al. (2004), we construct a model designed to predict the voting behavior of the Supreme Court of the United States. Using the extremely randomized tree method first proposed in Geurts, et al. (2006), a method similar to the random forest approach developed in Breiman (2001), as well as novel feature engineering, we predict more than sixty years of decisions by the Supreme Court of the United States (1953-2013). Using only data available prior to the date of decision, our model correctly identifies 69.7% of the Court’s overall affirm and reverse decisions and correctly forecasts 70.9% of the votes of individual justices across 7,700 cases and more than 68,000 justice votes. Our performance is consistent with the general level of prediction offered by prior scholars. However, our model is distinctive as it is the first robust, generalized, and fully predictive model of Supreme Court voting behavior offered to date. Our model predicts six decades of behavior of thirty Justices appointed by thirteen Presidents. With a more sound methodological foundation, our results represent a major advance for the science of quantitative legal prediction and portend a range of other potential applications, such as those described in Katz (2013).”
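The extremely randomized trees method is available off the shelf in scikit-learn.  Here is a minimal sketch with synthetic stand-in features and labels (the paper’s real features are engineered from the Supreme Court Database; this is not the authors’ code, just the classifier family they name):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for justice/case features; label 1 = reverse, 0 = affirm
n_votes, n_features = 5000, 20
X = rng.normal(size=(n_votes, n_features))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n_votes) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Extremely randomized trees (Geurts et al. 2006): like a random forest,
# but split thresholds are drawn at random rather than optimized per node
clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"out-of-sample accuracy: {acc:.3f}")
```

The key discipline in the paper carries over even to this toy version: evaluate only on data the model has not seen, just as the paper scores each decision using only information available before the decision date.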

You can access the current draft of the paper via SSRN or via the physics arXiv.  Full code is publicly available on Github.  See also the LexPredict site.  More on this to come soon …