A Rational But Ultimately Unsuccessful Critique of Nate Silver

This article is reasonable insofar as it offers a rational argument against Nate Silver’s work at 538 (rather than the ridiculous nonsense he had to endure from folks who are totally clueless – UnSkewedPolls.com, etc.). However, it is ultimately unsuccessful.

“Nate Silver didn’t nail it; the pollsters did.”  Not true. They both got it correct (or as accurate as can be judged when only one event is being modeled).

“To be fair, the art of averaging isn’t simple.”  Well, it is not just averaging. Pure averaging is totally stupid. This is weighting, and it is non-trivial because you need to build a notion of how much signal vs. noise to assign to each {pollster, time point} combination. Some of these polling outfits are totally disreputable and some have historic “house effects” (see, e.g., Rasmussen). With respect to time, the question is how much of the past is useful for predicting the future, so you need some sort of decay function to phase out the impact of prior data points (prior polls) on your current prediction.
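To make the weighting idea concrete, here is a minimal sketch of one way to combine polls with an exponential time-decay, a rough sample-size weight, and a simple house-effect correction. This is an illustration of the general technique, not 538’s actual model; the half-life, the house-effect numbers, and the pollster names are all made-up assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Poll:
    pollster: str
    days_old: int        # days elapsed since the poll was in the field
    sample_size: int
    dem_share: float     # reported two-party share for one candidate (in points)

# Hypothetical house effects in points (positive = pollster historically leans
# toward this candidate). These values are invented for the example.
HOUSE_EFFECTS = {"Pollster A": 1.5, "Pollster B": -0.5}

def weighted_average(polls, half_life_days=14.0):
    """Weight each poll by recency (exponential decay) and sample size,
    after subtracting the pollster's estimated house effect."""
    num = den = 0.0
    for p in polls:
        recency = 0.5 ** (p.days_old / half_life_days)   # halves every half_life_days
        precision = math.sqrt(p.sample_size)             # crude 1/variance proxy
        weight = recency * precision
        adjusted = p.dem_share - HOUSE_EFFECTS.get(p.pollster, 0.0)
        num += weight * adjusted
        den += weight
    return num / den

polls = [
    Poll("Pollster A", days_old=2,  sample_size=1200, dem_share=52.0),
    Poll("Pollster B", days_old=10, sample_size=800,  dem_share=49.5),
    Poll("Pollster C", days_old=21, sample_size=600,  dem_share=51.0),
]
print(round(weighted_average(polls), 2))
```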

It is correct to say that Nate Silver’s model cannot be validated in the traditional sense – he uses simulation – because on every day other than election day there is no way to execute a direct test of the model’s accuracy. Simulation is basically as good as we can do in an environment where there is only one event, and it is perfectly valid as a scientific endeavor. If folks want to complain and actually be taken seriously, they can come up with their own positive approach, and the scientific community can engage the competing claims. The Princeton Election Consortium, for example, is a good instance of a challenge to the 538 methodology.
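To illustrate what simulation buys you here, below is a minimal Monte Carlo sketch: given a forecast margin and an uncertainty for each state, draw thousands of hypothetical elections and count how often one candidate clears 270 electoral votes. The state numbers are invented for the example, and the draws are made independently across states, which is far cruder than anything 538 actually does (real models correlate state-level errors, among many other refinements).

```python
import random

# Hypothetical state-level forecasts: electoral votes, mean two-party margin
# (points, positive = candidate ahead), and standard deviation of that margin.
# The numbers are invented for illustration only.
STATES = {
    "Florida":   (29,  0.5, 3.5),
    "Ohio":      (18,  2.0, 3.5),
    "Virginia":  (13,  1.5, 3.0),
    "Colorado":  (9,   1.0, 3.0),
    "Safe blue": (237, 20.0, 1.0),
    "Safe red":  (232, -20.0, 1.0),
}

def win_probability(n_sims=20_000, threshold=270):
    """Fraction of simulated elections in which the candidate reaches 270 EVs."""
    wins = 0
    for _ in range(n_sims):
        ev = sum(votes for votes, mean, sd in STATES.values()
                 if random.gauss(mean, sd) > 0)
        if ev >= threshold:
            wins += 1
    return wins / n_sims

print(f"Simulated win probability: {win_probability():.1%}")
```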

No matter what, 538 is a hell of a lot better than the status quo practices that existed prior to its founding in early 2008. The level of jealousy directed toward Nate Silver is completely transparent. If you want to get all Popperian, go right ahead, but then you have little or nothing to say about this or most other prediction problems. This is what happened in quantitative finance / algo trading: the arbitrage went to those who were not worried about whether what they were doing was science or just engineering [insert Sheldon Cooper quote here].

One thing we can hope comes out of all this is that the data-free speculation that was undertaken prior to the election can be put to bed. I am talking about you, Dick Morris, Karl Rove, etc. – perhaps you guys should consider retirement and leave the arguments to the serious quants.

Natural Language Processing and Machine Learning for Electronic Discovery – Mike Bommarito Guest Lecture in Katz / Candeub Course

Yesterday I asked fellow Computational Legal Studies blogger Mike Bommarito to give an expert and fairly technical guest lecture in my e-Discovery seminar. Here is what Mike wrote over at his blog; the slides he generated for the class are featured below.

“This seminar, taught jointly between Professor Daniel Martin Katz and Professor Adam Candeub, is an excellent example of MSU’s strategic pivot to deliver practical, 21st-century skills to their students. The goal of the talk was to provide students with the ability to understand and communicate with their discovery and predictive coding software vendors and service providers with respect to the underlying mechanics of predictive coding. It was a pleasure to present to these students, and I would encourage anyone interested to follow up by email with any questions they might have.”
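For readers who have not seen it before, predictive coding is, at bottom, supervised text classification: attorneys label a seed set of documents as responsive or not, a model is trained on those labels, and the remaining documents are scored so review effort can be prioritized. The sketch below, using scikit-learn with invented example documents, is one minimal version of that workflow; it is not the method of any particular vendor and is not drawn from Mike’s slides.

```python
# Minimal predictive coding sketch: train on attorney-labeled documents,
# then rank unreviewed documents by predicted probability of responsiveness.
# Example documents and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

seed_docs = [
    "Re: merger due diligence schedule and document requests",
    "Lunch on Friday? The usual place works for me.",
    "Attached are the draft purchase agreement terms for review.",
    "Fantasy football picks for this week",
]
seed_labels = [1, 0, 1, 0]  # 1 = responsive, 0 = not responsive (attorney review)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_docs, seed_labels)

unreviewed = [
    "Please circulate the revised merger agreement to counsel.",
    "Company picnic moved to Saturday.",
]
scores = model.predict_proba(unreviewed)[:, 1]  # probability of responsiveness
for doc, score in sorted(zip(unreviewed, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {doc}")
```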

Homo Electronicus {via Law Technology News}

From the article: “We can’t imagine a competent lawyer not knowing how to find a document in a file folder or cabinet; yet, oddly, we can’t imagine a lawyer knowing how to fashion a competent electronically stored information search protocol or query a database. We barely expect lawyers to know what ESI protocols and databases are. We’ve set the bar too low for the Bar, and clients and judges are suffering as a consequence. … Part of the problem is that the practical education of lawyers has long depended upon veteran partners handing down the lore of lawyering to associates. But when it comes to e-discovery, veteran lawyers have nothing to share.”

“I say, let’s start learning to carry our own briefcases when it comes to digital evidence. Let’s stop kidding ourselves that this isn’t something we need to understand, and stop being so damned afraid to get our hands dirty with data or look like we might not be the smartest person in the room because we don’t know what goes on under the hood!”

Quick Response: In surveying the landscape of other law schools, it is quite correct to say that very few schools teach e-discovery at all, and most of those that do give the tech short shrift. I would just like to note, however, that I teach E-Discovery here at Michigan State University College of Law with my colleague Adam Candeub, and we do not skip out on the tech (or “e”) side of e-discovery.

We teach the tech in significant detail, in part because we have the technical skills that most law professors do not – please see HERE as well as here, here, here, here, here, here, here, here, here, here, here, here, and here, etc.

Our e-discovery course is designed to give our students a competitive advantage in the legal labor market. As this article makes clear, there is a clear arbitrage play here for entry-level lawyers to get a foothold, because most of the practicing bar (as well as most law students) are not very sophisticated when it comes to e-discovery.

I should note that when we proposed this class, my colleague and e-discovery co-professor Adam Candeub got into a bit of a dust-up over at Above the Law. Obviously, we ended up having the last word when ~15 thought leaders told them they were wrong and they had to publish the equivalent of a retraction. 🙂