Data Science & Machine Learning in Containers (or Ad Hoc vs Enterprise Grade Data Products)

As Mike Bommarito, Eric Detterman and I often discuss, one of the consistent themes in the Legal Tech / Legal Analytics space is the disconnect between what might be called ‘ad hoc’ data science and proper enterprise grade products / approaches (whether B2B or B2C). As part of the organizational maturity process, many organizations that decide they must ‘get data driven’ start with an ad hoc approach to doing data science. Over time, it becomes apparent that a more fundamental and robust undertaking is what is actually needed.

Similar dynamics exist within the academy as well. Many of the code repos out there would not be considered proper production grade data science pipelines. Among other things, this makes deployment, replication and/or extension quite difficult.

Anyway, this blog post outlines just some of these issues.

Complex Societies and the Growth of the Law – Published Today in Scientific Reports (Nature Research)

Access the Full Article via Scientific Reports (Nature Research). This article is part of a special compilation for Scientific Reports devoted to Social Physics.

ABSTRACT: While many informal factors influence how people interact, modern societies rely upon law as a primary mechanism to formally control human behaviour. How legal rules impact societal development depends on the interplay between two types of actors: the people who create the rules and the people to which the rules potentially apply. We hypothesise that an increasingly diverse and interconnected society might create increasingly diverse and interconnected rules, and assert that legal networks provide a useful lens through which to observe the interaction between law and society. To evaluate these propositions, we present a novel and generalizable model of statutory materials as multidimensional, time-evolving document networks. Applying this model to the federal legislation of the United States and Germany, we find impressive expansion in the size and complexity of laws over the past two and a half decades. We investigate the sources of this development using methods from network science and natural language processing. To allow for cross-country comparisons over time, based on the explicit cross-references between legal rules, we algorithmically reorganise the legislative materials of the United States and Germany into cluster families that reflect legal topics. This reorganisation reveals that the main driver behind the growth of the law in both jurisdictions is the expansion of the welfare state, backed by an expansion of the tax state. Hence, our findings highlight the power of document network analysis for understanding the evolution of law and its relationship with society.
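
To make the idea of a legal document network a bit more concrete, here is a minimal sketch in Python. It is not the pipeline from the paper. The section identifiers and cross-references are hypothetical toy data, and connected components are used only as a crude stand-in for the clustering step that the abstract describes.

```python
from collections import defaultdict

# Hypothetical toy data: each section id maps to the sections it explicitly cites.
cross_references = {
    "tax-1": ["tax-2"],
    "tax-2": ["tax-1", "welfare-1"],
    "welfare-1": ["welfare-2"],
    "welfare-2": [],
    "defense-1": [],
}


def build_undirected_graph(refs):
    """Turn directed cross-references into an undirected adjacency map."""
    graph = defaultdict(set)
    for source, targets in refs.items():
        graph[source]  # make sure sections without references still appear
        for target in targets:
            graph[source].add(target)
            graph[target].add(source)
    return graph


def count_edges(graph):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


def connected_components(graph):
    """Group sections that are linked by any chain of cross-references."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(component))
    return components


graph = build_undirected_graph(cross_references)
print(len(graph), "sections,", count_edges(graph), "cross-reference links")
for component in connected_components(graph):
    print(component)
```

Building one such graph per yearly snapshot and comparing the node and edge counts would give a rough picture of growth over time, which is the kind of question the article examines with far richer methods.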

It has been a real pleasure to work with my transatlantic colleagues Corinna Coupette (Max Planck Institute for Informatics), Janis Beckedorf (Heidelberg University) and Dirk Hartung (Bucerius Law School). We have other projects also in the works — so stay tuned!

LEGAL-BERT: The Muppets Straight Out of Law School

ABSTRACT: “BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.”

Congrats to all of the authors on their acceptance to the Empirical Methods in Natural Language Processing Conference in November.

In the legal scientific community, we are witnessing increasing efforts to connect general purpose NLP advances to domain specific applications within law. First, we saw word embeddings (e.g. word2vec), and now Transformers (e.g. BERT). (And don’t forget about GPT-3, etc.) Indeed, the development of LexNLP is centered around the idea that in order to have better performing Legal AI, we will need to connect broader NLP developments to the domain specific needs within law. Stay tuned!

Back to Future in Legal Artificial Intelligence — Expert Systems, Data Science and the Need for Peer Reviewed Technical Scholarship

In the broader field of Artificial Intelligence (A.I.) there is a major divide between Data Driven A.I. and Rules Based A.I. Of course, it is possible to combine these approaches but let’s keep them separate and easy for now. Rules Based A.I. in the form of expert systems peaked in the late 1980s and culminated in the last AI Winter. Absent a few commercial examples such as TurboTax, the world moved on and Data Driven A.I. took hold.

But here in #LegalTech #LawTech #LegalAI #LegalAcademy, it seems more and more like we have gone ‘Back to the A.I. Future’ (and brought an IF-THEN back in the DeLorean). Even in 2020, we see individuals and companies touting themselves for taking us Back to the A.I. Future.

There is nothing wrong with Expert Systems or Rules Based AI per se. In law, the first expert system was created by Richard Susskind and Phillip Capper in the 1980s. Richard discussed this back at ReInventLaw NYC in 2014. There are some use cases where Legal Expert Systems (Rules Based AI) are appropriate. For example, they make the most sense in the A2J context. Indeed, offerings such as A2J Author and Docassemble are good examples. However, for many (most) problems (particularly those with a decent level of complexity) such rule based methods alone are really not appropriate.

Data Science, mostly leveraging methods from Machine Learning (including Deep Learning) as well as Natural Language Processing (NLP) and other allied computational methods (Network Science, etc.), is the modern coin of the realm (both in the commercial and academic spheres).

As the image above highlights, the broader A.I. world faces challenges associated with overhyped AI and faux expertise. #LegalAI also faces the problem of individuals and companies passing themselves off as “cutting edge AI experts” or “offering cutting edge AI products” without an academic record or codebase to their name. 

In the academy, we judge scholars on academic papers published in appropriate outlets. In order for someone to be genuinely considered an {A.I. and Law Scholar, Computational Law Expert, NLP and Law Researcher} that scholar should publish papers in technically oriented Peer Reviewed journals (*not* Law Reviews or trade publications alone). On the Engineering or Computer Science side of the equation, it is possible to substitute a codebase (such as a major Python package or contribution) for peer reviewed papers. In order for this field to be taken seriously within the broader academy (particularly by technically inclined faculty), we need more Peer Reviewed Technical Publications and more Codebases. If we do not take ourselves seriously, how can we expect others to do so?

On the commercial side, we need more objectively verifiable technology offerings that are not in line with Andriy Burkov’s picture as shown above … this is one of the reasons that we Open Sourced the core version of ContraxSuite / LexNLP.

NLLP Workshop 2020 — Legal Text Analysis Session — Video of Natural Legal Language Processing Workshop is Now on YouTube

NLLP Workshop 2020 Session 1: Legal Text Analysis — Video of Natural Legal Language Processing Workshop is Now on YouTube.  

Unfortunately, I was not available to participate as I was teaching class at the time of the workshop. However, Corinna Coupette and Dirk Hartung represented us well!

A copy of the paper presented is available on arXiv.

2nd Workshop on Natural Legal Language Processing (NLLP) – Co-Located at the broader 2020 KDD Virtual Conference

Today is the 2nd Workshop on Natural Legal Language Processing (NLLP), which is co-located with the broader 2020 KDD Virtual Conference. Corinna Coupette is presenting our paper ‘Complex Societies and the Growth of the Law’ as a Non-Archival Paper. NLLP is a strong scientific workshop (I gave one of the Keynote Addresses last year and found it to be a very good group of scholars and industry experts). More information is located here.