natural language processing – Computational Legal Studies

OpenEDGAR: Open Source Software for SEC EDGAR Analysis is published in MIT Computational Law Report

November 20, 2020 (updated November 23, 2020) Daniel Katz

Today our paper, “OpenEDGAR: Open Source Software for SEC EDGAR Analysis,” was published in MIT Computational Law Report.

ABSTRACT: OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License at https://github.com/LexPredict/openedgar
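As a rough illustration of items (i) and (ii) in the abstract, the sketch below parses a few rows in the pipe-delimited layout that EDGAR uses for its master index files (CIK|Company Name|Form Type|Date Filed|Filename) and tallies filings by form type. It does not use OpenEDGAR's own API; the sample rows, the `IndexEntry` class, and the function names are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Made-up rows in the pipe-delimited layout of an EDGAR master index:
# CIK|Company Name|Form Type|Date Filed|Filename
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|EXAMPLE HOLDINGS INC|10-K|2020-02-14|edgar/data/1000001/0001000001-20-000001.txt
1000001|EXAMPLE HOLDINGS INC|8-K|2020-03-02|edgar/data/1000001/0001000001-20-000002.txt
1000002|SAMPLE ENERGY CORP|10-K|2020-02-28|edgar/data/1000002/0001000002-20-000001.txt
"""


@dataclass(frozen=True)
class IndexEntry:
    """One filing row from the index (hypothetical container, not OpenEDGAR's model)."""
    cik: int
    company: str
    form_type: str
    date_filed: str
    path: str


def parse_index(text: str) -> List[IndexEntry]:
    """Parse index rows, skipping the header and separator lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split("|")
        # Data rows have exactly five fields and a numeric CIK.
        if len(parts) != 5 or not parts[0].strip().isdigit():
            continue
        cik, company, form_type, date_filed, path = (p.strip() for p in parts)
        entries.append(IndexEntry(int(cik), company, form_type, date_filed, path))
    return entries


def count_by_form_type(entries: List[IndexEntry]) -> Counter:
    """Build a simple form-type table (form type -> number of filings)."""
    return Counter(entry.form_type for entry in entries)


entries = parse_index(SAMPLE_INDEX)
form_counts = count_by_form_type(entries)
print(form_counts)
```

In a real research database the parsed rows would be written to tables keyed by filer and form type rather than counted in memory.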

Data Science & Machine Learning in Containers (or Ad Hoc vs Enterprise Grade Data Products)

November 7, 2020 Daniel Katz

As Mike Bommarito, Eric Detterman and I often discuss, one of the consistent themes in the Legal Tech / Legal Analytics space is the disconnect between what might be called ‘ad hoc’ data science and proper enterprise grade products / approaches (whether B2B or B2C). As part of the organizational maturity process, many organizations that decide they must ‘get data driven’ start with an ad hoc approach to doing data science. Over time, it becomes apparent that a more fundamental and robust undertaking is what is actually needed.

Similar dynamics exist within the academy. Many of the code repos out there would not be considered proper production grade data science pipelines. Among other things, this makes deployment, replication and/or extension quite difficult.

Anyway, this blog post from Neptune.ai outlines just some of these issues.

Complex Societies and the Growth of the Law – Published Today in Scientific Reports (Nature Research)

October 30, 2020 Daniel Katz

Access the Full Article via Scientific Reports (Nature Research). This article is part of a special compilation for Scientific Reports devoted to Social Physics.

ABSTRACT: While many informal factors influence how people interact, modern societies rely upon law as a primary mechanism to formally control human behaviour. How legal rules impact societal development depends on the interplay between two types of actors: the people who create the rules and the people to which the rules potentially apply. We hypothesise that an increasingly diverse and interconnected society might create increasingly diverse and interconnected rules, and assert that legal networks provide a useful lens through which to observe the interaction between law and society. To evaluate these propositions, we present a novel and generalizable model of statutory materials as multidimensional, time-evolving document networks. Applying this model to the federal legislation of the United States and Germany, we find impressive expansion in the size and complexity of laws over the past two and a half decades. We investigate the sources of this development using methods from network science and natural language processing. To allow for cross-country comparisons over time, based on the explicit cross-references between legal rules, we algorithmically reorganise the legislative materials of the United States and Germany into cluster families that reflect legal topics. This reorganisation reveals that the main driver behind the growth of the law in both jurisdictions is the expansion of the welfare state, backed by an expansion of the tax state. Hence, our findings highlight the power of document network analysis for understanding the evolution of law and its relationship with society.
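To make the modelling idea in the abstract concrete, here is a minimal sketch of statutes as a time-evolving document network: each yearly snapshot holds sections (nodes) and explicit cross-references (directed edges), and growth is tracked by counting both. The section labels, years, and references are invented. The grouping step uses plain connected components as a crude stand-in; the paper's own clustering procedure is not reproduced here.

```python
from collections import defaultdict

# Invented snapshots: year -> list of (citing section, cited section).
SNAPSHOTS = {
    1994: [("tax-1", "tax-2"), ("welfare-1", "tax-1")],
    2018: [
        ("tax-1", "tax-2"),
        ("welfare-1", "tax-1"),
        ("welfare-2", "welfare-1"),
        ("env-1", "env-2"),
    ],
}


def network_size(edges):
    """Return (number of sections, number of cross-references)."""
    nodes = {section for edge in edges for section in edge}
    return len(nodes), len(edges)


def components(edges):
    """Group sections linked by references, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this section.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


for year, edges in sorted(SNAPSHOTS.items()):
    n_sections, n_refs = network_size(edges)
    print(year, n_sections, n_refs, components(edges))
```

Comparing the node and edge counts across snapshots gives a first, very coarse measure of growth in size and interconnection.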

It has been a real pleasure to work with my transatlantic colleagues Corinna Coupette (Max Planck Institute for Informatics), Janis Beckedorf (Heidelberg University) and Dirk Hartung (Bucerius Law School). We also have other projects in the works, so stay tuned!

LEGAL-BERT: The Muppets Straight Out of Law School

October 9, 2020 Daniel Katz

ABSTRACT: “BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.”

Congrats to all of the authors on their acceptance at the Empirical Methods in Natural Language Processing Conference in November.

In the legal scientific community, we are witnessing increasing efforts to connect general purpose NLP advances to domain specific applications within law. First, we saw word embeddings (e.g., word2vec) and now Transformers (e.g., BERT). (And don’t forget about GPT-3.) Indeed, the development of LexNLP is centered around the idea that in order to have better performing Legal AI, we will need to connect broader NLP developments to the domain specific needs within law. Stay tuned!

Back to Future in Legal Artificial Intelligence — Expert Systems, Data Science and the Need for Peer Reviewed Technical Scholarship

September 10, 2020 (updated October 6, 2020) Daniel Katz

In the broader field of Artificial Intelligence (A.I.) there is a major divide between Data Driven A.I. and Rules Based A.I. Of course, it is possible to combine these approaches, but let’s keep them separate for now. Rules Based A.I. in the form of expert systems peaked in the late 1980s and culminated in the last A.I. Winter. Aside from a few commercial examples such as TurboTax, the world moved on and Data Driven A.I. took hold.

But here in #LegalTech #LawTech #LegalAI #LegalAcademy, it seems more and more like we have gone ‘Back to the A.I. Future’ (and brought an IF-THEN back in the DeLorean). Even in 2020, we see individuals and companies touting themselves for taking us Back to the A.I. Future.

There is nothing wrong with Expert Systems or Rules Based AI per se. In law, the first expert system was created by Richard Susskind and Phillip Capper in the 1980s. Richard discussed this back at ReInventLaw NYC in 2014. There are some use cases where Legal Expert Systems (Rules Based AI) are appropriate. For example, they make the most sense in the A2J context. Indeed, offerings such as A2J Author and Docassemble are good examples. However, for many (most) problems (particularly those with a decent level of complexity) such rule-based methods alone are really not appropriate.
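For readers unfamiliar with the term, the sketch below shows what a rules-based (IF-THEN) system looks like in miniature: a guided-interview style triage of the kind that might suit an A2J setting. The rules, the income threshold, and the field names are entirely hypothetical. They do not reflect any real legal test or the internals of A2J Author or Docassemble.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Answers collected from a (hypothetical) intake interview."""
    household_size: int
    monthly_income: float
    has_eviction_notice: bool


# Hypothetical income ceiling per household member (not a real legal standard).
INCOME_LIMIT_PER_PERSON = 1500.0


def triage(applicant: Applicant) -> str:
    """Apply hand-written IF-THEN rules in a fixed order."""
    limit = INCOME_LIMIT_PER_PERSON * applicant.household_size
    if applicant.monthly_income > limit:
        return "refer to reduced-fee panel"
    if applicant.has_eviction_notice:
        return "priority intake"
    return "standard intake"


print(triage(Applicant(household_size=2, monthly_income=2400.0, has_eviction_notice=True)))
```

Every outcome here traces back to an explicit rule, which is what makes the approach workable for narrow, well-specified problems and brittle once the number of interacting conditions grows.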

Data Science, mostly leveraging methods from Machine Learning (including Deep Learning) as well as Natural Language Processing (NLP) and other allied computational methods (Network Science, etc.), is the modern coin of the realm (both in the commercial and academic spheres).

As the image above highlights, the broader A.I. world faces challenges associated with overhyped AI and faux expertise. #LegalAI also faces the problem of individuals and companies passing themselves off as “cutting edge AI experts” or “offering cutting edge AI products” without an academic record or codebase to their name.

In the academy, we judge scholars on academic papers published in appropriate outlets. In order for someone to be genuinely considered an {A.I. and Law Scholar, Computational Law Expert, NLP and Law Researcher}, that scholar should publish papers in technically oriented Peer Reviewed journals (*not* Law Reviews or trade publications alone). On the Engineering or Computer Science side of the equation, it is possible to substitute a codebase (such as a major Python package or contribution) for peer reviewed papers. In order for this field to be taken seriously within the broader academy (particularly by technically inclined faculty), we need more Peer Reviewed Technical Publications and more Codebases. If we do not take ourselves seriously, how can we expect others to do so?

On the commercial side, we need more objectively verifiable technology offerings that are not in line with Andriy Burkov’s picture as shown above … this is one of the reasons that we Open Sourced the core version of ContraxSuite / LexNLP.

NLLP Workshop 2020 — Legal Text Analysis Session — Video of Natural Legal Language Processing Workshop is Now on YouTube

September 8, 2020 (updated September 9, 2020) Daniel Katz

NLLP Workshop 2020 Session 1: Legal Text Analysis — Video of Natural Legal Language Processing Workshop is Now on YouTube.

Unfortunately, I was not available to participate as I was teaching class at the time of the workshop. However, Corinna Coupette and Dirk Hartung represented us well!

A copy of the paper presented is available here:
SSRN LINK: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3602098
arXiv LINK: https://arxiv.org/abs/2005.07646

2nd Workshop on Natural Legal Language Processing (NLLP) – Co-Located at the broader 2020 KDD Virtual Conference

August 24, 2020 (updated September 9, 2020) Daniel Katz

Today is the 2nd Workshop on Natural Legal Language Processing (NLLP), which is co-located with the broader 2020 KDD Virtual Conference. Corinna Coupette is presenting our paper ‘Complex Societies and the Growth of the Law’ as a Non-Archival Paper. NLLP is a strong scientific workshop (I gave one of the Keynote Addresses last year and found it to be a very good group of scholars and industry experts). More information is located here.

GPT-3 and Another Chat About the End of Lawyers

August 3, 2020 (updated September 9, 2020) Daniel Katz

Today I was quoted in LegalIT Insider discussing GPT-3 and what it might mean for the future of Legal Tech / Legal AI … “There [are] a few demos out there which look promising (but lots of things look good in a demo but are not robust in the end). I think the open question is always to what extent we can project any general advance in NLP to the domain specific challenges here in law-law land … On the positive side, I think every major and even minor advances present opportunities for us here in legal. It took a while but for example you see many of the products in our space taking advantage of Word Embedding [or transformer] methods (Word2Vec, BERT, ELMo, etc.)”

OpenAI GPT-3: Language Models are Few-Shot Learners

July 18, 2020 (updated September 9, 2020) Daniel Katz

< Go here to see Github / arXiv >

ContraxSuite’s Document Explorer UI: Supporting COVID-19 Contract Analysis

April 6, 2020 (updated September 9, 2020) Daniel Katz

< See Blog Post on Elevate Blog Site >

Technology Assisted COVID-19 Analysis & Thoughts on AI + Law Adoption

March 23, 2020 (updated September 9, 2020) Daniel Katz

Microsoft’s CodeBERT Ingests Public GitHub Repositories to Support Search for Code

February 25, 2020 (updated September 9, 2020) Daniel Katz

< Story via Venture Beat >

Managing LIBOR Risk through Technology-Assisted Review (WhitePaper)

July 20, 2019 (updated September 9, 2020) Daniel Katz

Managing LIBOR Risk through Technology-Assisted Review (Access WhitePaper) … Learn How ContraxSuite can help you remediate or repaper contracts for LIBOR!

Clifford Chance Selects ContraxSuite to Power its new Data Science Lab

July 16, 2019 (updated September 9, 2020) Daniel Katz

We are very happy that Clifford Chance has selected ContraxSuite and LexNLP to power its Legal Data Science Lab … Overall, I think there is increasing interest in flexible AI toolkits as opposed to individual point solutions. We are happy that a number of leading law firms, including Clifford Chance, have chosen to license our offerings, including but not limited to ContraxSuite and LexNLP.

#LegalTech #LegalAI #LegalData #LegalDataScience

See Coverage in Artificial Lawyer and Press Release

Keynote at the Workshop on Natural Legal Language Processing (NLLP) at the North American Chapter of the Association for Computational Linguistics (NAACL) Conference

June 7, 2019 (updated September 9, 2020) Daniel Katz

Today I gave the afternoon Keynote at the Workshop on Natural Legal Language Processing (NLLP), which was part of the larger North American Chapter of the Association for Computational Linguistics (NAACL) Conference here in Minneapolis. It was a strong group of interdisciplinary scientists, lawyers and engineers.

Daniel Martin Katz, Ron Dolin & Michael Bommarito, Legal Informatics, Cambridge University Press (2021) (Edited Volume) < Cambridge >

Corinna Coupette, Janis Beckedorf, Dirk Hartung, Michael Bommarito, & Daniel Martin Katz, Measuring Law Over Time: A Network Analytical Framework with an Application to Statutes and Regulations in the United States and Germany, 9 Front. Phys. 658463 (2021) < Frontiers in Physics > < Supplemental Material >

Daniel Martin Katz, Legal Innovation (Book Foreword) in Mapping Legal Innovation: Trends and Perspectives (Springer) (Antoine Masson & Gavin Robinson, eds.) (2021) < Springer >

Michael Bommarito, Daniel Martin Katz & Eric Detterman, LexNLP: Natural Language Processing and Information Extraction For Legal and Regulatory Texts in Research Handbook on Big Data Law (Edward Elgar Press) (Roland Vogl, ed.) (2021) < Edward Elgar > < Github > < SSRN > < arXiv >

Daniel Martin Katz, Corinna Coupette, Janis Beckedorf & Dirk Hartung, Complex Societies and the Growth of the Law, 10 Scientific Reports 18737 (2020) < Nature Research > < Supplemental Material >

Edward D. Lee, Daniel Martin Katz, Michael J. Bommarito II, Paul Ginsparg, Sensitivity of Collective Outcomes Identifies Pivotal Components, 17 Journal of the Royal Society Interface 167 (2020) < Journal of the Royal Society Interface > < Supplemental Material >

Michael Bommarito, Daniel Martin Katz & Eric Detterman, OpenEDGAR: Open Source Software for SEC EDGAR Analysis, MIT Computational Law Report (2020) < MIT Law > < Github >

J.B. Ruhl & Daniel Martin Katz, Mapping the Law with Artificial Intelligence in Law of Artificial Intelligence and Smart Machines (ABA Press) (2019) < ABA Press >

J.B. Ruhl & Daniel Martin Katz, Harnessing the Complexity of Legal Systems for Governing Global Challenges in Global Challenges, Governance, and Complexity (Edward Elgar) (2019) < Edward Elgar >

J.B. Ruhl & Daniel Martin Katz, Mapping Law’s Complexity with ‘Legal Maps’ in Complexity Theory and Law: Mapping an Emergent Jurisprudence (Taylor & Francis) (2018) < Taylor & Francis >

Michael Bommarito & Daniel Martin Katz, Measuring and Modeling the U.S. Regulatory Ecosystem, 168 Journal of Statistical Physics 1125 (2017) < J Stat Phys >

Daniel Martin Katz, Michael Bommarito & Josh Blackman, A General Approach for Predicting the Behavior of the Supreme Court of the United States, PLoS ONE 12(4): e0174698 (2017) < PLoS One >

J.B. Ruhl, Daniel Martin Katz & Michael Bommarito, Harnessing Legal Complexity, 355 Science 1377 (2017) < Science >

J.B. Ruhl & Daniel Martin Katz, Measuring, Monitoring, and Managing Legal Complexity, 101 Iowa Law Review 191 (2015) < SSRN >

Paul Lippe, Daniel Martin Katz & Dan Jackson, Legal by Design: A New Paradigm for Handling Complexity in Banking Regulation and Elsewhere in Law, 93 Oregon Law Review 831 (2015) < SSRN >

Paul Lippe, Jan Putnis, Daniel Martin Katz & Ian Hurst, How Smart Resolution Planning Can Help Banks Improve Profitability And Reduce Risk, Banking Perspective Quarterly (2015) < SSRN >

Daniel Martin Katz, The MIT School of Law? A Perspective on Legal Education in the 21st Century, Illinois Law Review 1431 (2014) < SSRN > < Slides >

Daniel Martin Katz & Michael Bommarito, Measuring the Complexity of the Law: The United States Code, 22 Journal of Artificial Intelligence & Law 1 (2014) < Springer > < SSRN >

Daniel Martin Katz, Quantitative Legal Prediction – or – How I Learned to Stop Worrying and Start Preparing for the Data Driven Future of the Legal Services Industry, 62 Emory Law Journal 909 (2013) < SSRN >

Daniel Martin Katz, Joshua Gubler, Jon Zelner, Michael Bommarito, Eric Provins & Eitan Ingall, Reproduction of Hierarchy? A Social Network Analysis of the American Law Professoriate, 61 Journal of Legal Education 76 (2011) < SSRN >

Michael Bommarito, Daniel Martin Katz & Jillian Isaacs-See, An Empirical Survey of the Written Decisions of the United States Tax Court (1990-2008), 30 Virginia Tax Review 523 (2011) < SSRN >

Daniel Martin Katz, Michael Bommarito, Julie Seaman, Adam Candeub, Eugene Agichtein, Legal N-Grams? A Simple Approach to Track the Evolution of Legal Language in Proceedings of JURIX: The 24th International Conference on Legal Knowledge and Information Systems (2011) < SSRN >

Daniel Martin Katz & Derek Stafford, Hustle and Flow: A Social Network Analysis of the American Federal Judiciary, 71 Ohio State Law Journal 457 (2010) < SSRN >

Michael Bommarito & Daniel Martin Katz, A Mathematical Approach to the Study of the United States Code, 389 Physica A 4195 (2010) < SSRN > < arXiv >

Michael Bommarito, Daniel Martin Katz & Jonathan Zelner, On the Stability of Community Detection Algorithms on Longitudinal Citation Data in Proceedings of the 6th Conference on Applications of Social Network Analysis (2010) < SSRN > < arXiv >

Michael Bommarito, Daniel Martin Katz, Jonathan Zelner & James Fowler, Distance Measures for Dynamic Citation Networks, 389 Physica A 4201 (2010) < SSRN > < arXiv >

Michael Bommarito, Daniel Martin Katz & Jonathan Zelner, Law as a Seamless Web? Comparing Various Network Representations of the United States Supreme Court Corpus (1791-2005) in Proceedings of the 12th International Conference on Artificial Intelligence and Law (2009) < SSRN >

Marvin Krislov & Daniel Martin Katz, Taking State Constitutions Seriously, 17 Cornell Journal of Law & Public Policy 295 (2008) < SSRN >

Daniel Martin Katz, Derek Stafford & Eric Provins, Social Architecture, Judicial Peer Effects and the ‘Evolution’ of the Law: Toward a Positive Theory of Judicial Social Structure, 23 Georgia State Law Review 975 (2008) < SSRN >

Daniel Martin Katz, Institutional Rules, Strategic Behavior and the Legacy of Chief Justice William Rehnquist: Setting the Record Straight on Dickerson v. United States, 22 Journal of Law & Politics 303 (2006) < SSRN >

Daniel Martin Katz, Michael Bommarito, Tyler Sollinger & James Ming Chen, Law on the Market? Abnormal Stock Returns and Supreme Court Decision-Making < SSRN > < arXiv > < Slides >

Daniel Martin Katz, Michael Bommarito & Josh Blackman, Crowdsourcing Accurately and Robustly Predicts Supreme Court Decisions < SSRN > < arXiv > < Slides >

Daniel Martin Katz & Michael Bommarito, Regulatory Dynamics Revealed by the Securities Filings of Registered Companies < Slides >

Pierpaolo Vivo, Daniel Martin Katz & J.B. Ruhl (Editors), The Physics of the Law: Legal Systems Through the Prism of Complexity Science, Special Collection for Frontiers in Physics (2021 Forthcoming) < Frontiers in Physics >

Corinna Coupette, Dirk Hartung, Janis Beckedorf, Maximilian Bother & Daniel Martin Katz, Law Smells – Defining and Detecting Problematic Patterns in Legal Drafting < SSRN >

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz & Nikolaos Aletras, LexGLUE: A Benchmark Dataset for Legal Language Understanding in English < arXiv > < SSRN >