Back to the Future in Legal Artificial Intelligence — Expert Systems, Data Science and the Need for Peer Reviewed Technical Scholarship

In the broader field of Artificial Intelligence (A.I.), there is a major divide between Data Driven A.I. and Rules Based A.I.  Of course, it is possible to combine these approaches, but let’s keep them separate for now.  Rules Based A.I. in the form of expert systems peaked in the late 1980s and culminated in the last A.I. Winter.  Aside from a few commercial examples such as TurboTax, the world moved on and Data Driven A.I. took hold.

But here in #LegalTech #LawTech #LegalAI #LegalAcademy, it seems more and more like we have gone ‘Back to the A.I. Future’ (and brought an IF-THEN back in the DeLorean).  Even in 2020, we see individuals and companies touting themselves for taking us back to the A.I. future.

There is nothing wrong with Expert Systems or Rules Based A.I. per se.  In law, the first expert system was created by Richard Susskind and Phillip Capper in the 1980s; Richard discussed it back at ReInventLaw NYC in 2014.  There are some use cases where Legal Expert Systems (Rules Based A.I.) are appropriate.  They make the most sense in the A2J context, and offerings such as A2J Author and Docassemble are good examples.  However, for many (most) problems, particularly those with a decent level of complexity, rules-based methods alone are really not appropriate.
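
To make the contrast concrete, here is a minimal sketch of what a rules-based (IF-THEN) legal expert system looks like in Python. The eligibility rule, thresholds, and function name are hypothetical, invented purely for illustration; real systems such as A2J Author and Docassemble wrap this kind of logic in guided interviews.

```python
# Minimal sketch of a rules-based (IF-THEN) legal expert system.
# The rule and thresholds below are hypothetical, for illustration only.

def fee_waiver_eligible(monthly_income: float, household_size: int,
                        receives_public_benefits: bool) -> bool:
    """Toy eligibility rule: every branch is an explicit, hand-coded IF-THEN."""
    if receives_public_benefits:
        return True  # categorical eligibility
    # Hypothetical income threshold scaled by household size.
    income_limit = 1500.0 + 550.0 * (household_size - 1)
    if monthly_income <= income_limit:
        return True
    return False

print(fee_waiver_eligible(1200.0, 1, False))  # True
print(fee_waiver_eligible(4000.0, 2, False))  # False
```

Every outcome is traceable to a rule a human wrote down, which is exactly why the approach works for well-bounded A2J intake questions and breaks down as the problem space grows.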

Data Science — mostly leveraging methods from Machine Learning (including Deep Learning) as well as Natural Language Processing (NLP) and allied computational methods (Network Science, etc.) — is the modern coin of the realm in both the commercial and academic spheres.
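
By contrast, a data-driven approach learns its decision boundary from labeled examples rather than hand-coded rules. Here is a minimal sketch using scikit-learn; the documents and labels are toy stand-ins, not real training data.

```python
# Minimal sketch of a data-driven approach: learn to route legal text
# from labeled examples rather than hand-coded rules.
# Toy documents/labels for illustration; a real system needs far more data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "The tenant failed to pay rent and the landlord seeks eviction.",
    "The employer terminated the plaintiff in violation of the contract.",
    "Landlord did not return the security deposit after the lease ended.",
    "Plaintiff alleges wrongful termination and breach of employment agreement.",
]
labels = ["housing", "employment", "housing", "employment"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

print(model.predict(["My landlord is trying to evict me over back rent."]))
# expected: ['housing']
```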

As the image above highlights, the broader A.I. world faces challenges associated with overhyped A.I. and faux expertise. #LegalAI also faces the problem of individuals and companies passing themselves off as “cutting edge AI experts” or as offering “cutting edge AI products” without an academic record or codebase to their name.

In the academy, we judge scholars on academic papers published in appropriate outlets.  For someone to be genuinely considered an {A.I. and Law Scholar, Computational Law Expert, NLP and Law Researcher}, that scholar should publish papers in technically oriented Peer Reviewed journals (*not* Law Reviews or trade publications alone).  On the Engineering or Computer Science side of the equation, it is possible to substitute a codebase (such as a major Python package or contribution) for peer reviewed papers.  For this field to be taken seriously within the broader academy (particularly by technically inclined faculty), we need more Peer Reviewed Technical Publications and more Codebases.  If we do not take ourselves seriously, how can we expect others to do so?

On the commercial side, we need more objectively verifiable technology offerings that are not in line with Andriy Burkov’s picture shown above.  This is one of the reasons that we open sourced the core version of ContraxSuite / LexNLP.
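
As one example of an objectively verifiable offering, here is a short sketch against the open source LexNLP extraction API. This assumes the lexnlp package is installed, and the function names follow its published documentation; treat the exact return values as illustrative.

```python
# Sketch of LexNLP's open source extraction API; assumes `pip install lexnlp`.
import lexnlp.extract.en.dates
import lexnlp.extract.en.amounts

text = ("This Agreement is dated as of June 1, 2017 and provides for "
        "a payment of $2,500,000 upon closing.")

# Both extractors return generators over matches found in the text.
print(list(lexnlp.extract.en.dates.get_dates(text)))      # extracted dates
print(list(lexnlp.extract.en.amounts.get_amounts(text)))  # extracted amounts
```

Anyone can run this against their own documents and check the results, which is precisely the kind of verifiability we are arguing for.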

NLLP Workshop 2020 — Legal Text Analysis Session — Video of Natural Legal Language Processing Workshop is Now on YouTube

The video of Session 1 (Legal Text Analysis) from the 2020 Natural Legal Language Processing (NLLP) Workshop is now available on YouTube.

Unfortunately, I was not available to participate as I was teaching class at the time of the workshop. However, Corinna Coupette and Dirk Hartung represented us well!

A copy of the paper presented is available here:
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3602098
arXiv: https://arxiv.org/abs/2005.07646

2nd Workshop on Natural Legal Language Processing (NLLP) – Co-Located at the broader 2020 KDD Virtual Conference

Today is the 2nd Workshop on Natural Legal Language Processing (NLLP), co-located with the broader 2020 KDD Virtual Conference. Corinna Coupette is presenting our paper ‘Complex Societies and the Growth of the Law’ as a Non-Archival Paper. NLLP is a strong scientific workshop (I gave one of the Keynote Addresses last year and found it to be a very good group of scholars and industry experts). More information is located here.

Revisiting Distance Measures for Dynamic Citation Networks – Published in Physica A


I was revisiting some of our older work for this Oslo event. This paper, published in Physica A, came early on our #LegalPhysics #LegalAnalytics path: “By applying our sink clustering method, we obtain a dendrogram of the network’s largest weakly connected component shown in Fig. 4. While not a major focus of the docket of the modern court, the early court elaborated a number of important legal concepts through the lens of these admiralty decisions. For example, the red group of cases engages questions of presidential power and the laws of war, as well as general interpretations of the Prize Acts of 1812. Meanwhile, the blue cluster engages questions surrounding tort liability, jurisdiction, and the burden of proof. However, despite their general topical relatedness, these two clusters of cases engage substantively different sub-questions, and are thus appropriately divided into separate clusters.”
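
For readers who want to replay the first step of that analysis, here is a minimal networkx sketch that builds a toy directed citation graph and extracts its largest weakly connected component. The edge list is invented for illustration, and the paper’s sink clustering method itself is more involved; this only shows the component-extraction step.

```python
# Toy sketch: largest weakly connected component of a citation network.
# Edge list is invented for illustration; the paper's sink clustering
# method operates on top of components like this one.
import networkx as nx

G = nx.DiGraph()
# An edge (a, b) means case a cites case b.
G.add_edges_from([
    ("Case_D", "Case_A"), ("Case_D", "Case_B"),
    ("Case_E", "Case_B"), ("Case_B", "Case_A"),
    ("Case_G", "Case_F"),  # a separate, smaller component
])

largest_wcc = max(nx.weakly_connected_components(G), key=len)
print(sorted(largest_wcc))
# ['Case_A', 'Case_B', 'Case_D', 'Case_E']
```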

Measuring the Temperature and Diversity of the U.S. Regulatory Ecosystem – Bommarito + Katz – Presentation at Stanford CodeX on April 5th 2017


We are excited to be giving a talk at Stanford the day before the Future Law Conference.  Our talk will be hosted by Stanford CodeX – The Center for Legal Informatics. If you are in the Bay Area, you can join us by signing up for free here.

The underlying paper is available here.  Some starter slides are here (start at slide 158), and we will be previewing the second paper in this three-part series.

Measuring the Temperature and Diversity of the U.S. Regulatory Ecosystem (Preprint on arXiv + SSRN)

From the Abstract:  Over the last 23 years, the U.S. Securities and Exchange Commission has required over 34,000 companies to file over 165,000 annual reports. These reports, the so-called “Form 10-Ks,” contain a characterization of a company’s financial performance and its risks, including the regulatory environment in which a company operates. In this paper, we analyze over 4.5 million references to U.S. Federal Acts and Agencies contained within these reports to build a mean-field measurement of temperature and diversity in this regulatory ecosystem. While individuals across the political, economic, and academic world frequently refer to trends in this regulatory ecosystem, there has been far less attention paid to supporting such claims with large-scale, longitudinal data. In this paper, we document an increase in the regulatory energy per filing, i.e., a warming “temperature.” We also find that the diversity of the regulatory ecosystem has been increasing over the past two decades, as measured by the dimensionality of the regulatory space and distance between the “regulatory bitstrings” of companies. This measurement framework and its ongoing application contribute an important step towards improving academic and policy discussions around legal complexity and the regulation of large-scale human techno-social systems.
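
As a toy illustration of the “regulatory bitstring” idea: each filing becomes a binary vector over a fixed universe of Acts and Agencies, and diversity can be summarized by average pairwise distance between those vectors. The vectors below and the use of plain Hamming distance are simplified stand-ins for the paper’s actual construction.

```python
# Toy sketch of "regulatory bitstrings": each filing is a binary vector
# indicating which Acts/Agencies it references. Vectors and plain Hamming
# distance are simplified stand-ins for the paper's measures.
from itertools import combinations

# Hypothetical universe of Acts; each position in a bitstring maps to one.
acts = ["Securities Act", "Clean Air Act", "FLSA", "HIPAA"]
filings = {
    "CompanyA": [1, 0, 1, 0],
    "CompanyB": [1, 1, 0, 0],
    "CompanyC": [0, 1, 1, 1],
}

def hamming(u, v):
    """Number of positions where two bitstrings differ."""
    return sum(a != b for a, b in zip(u, v))

pairs = list(combinations(filings.values(), 2))
avg_distance = sum(hamming(u, v) for u, v in pairs) / len(pairs)
print(f"average pairwise Hamming distance: {avg_distance:.2f}")  # 2.67
```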

Available in preprint on SSRN and on the physics arXiv.