Workshop on Natural Legal Language Processing (NLLP) at the North American Chapter of the Association for Computational Linguistics (NAACL) Conference

The First Workshop on Natural Legal Language Processing (NLLP) will take place as part of the larger North American Chapter of the Association for Computational Linguistics (NAACL) conference in Minneapolis in June 2019.  NAACL is one of the premier technical events in the field of NLP / Computational Linguistics.  Thus, I am very happy to be giving one of the keynotes at this workshop.  It is one more step toward making Legal Informatics and Legal AI / NLP a mainstream idea within the technically oriented portion of the academy.

I plan to highlight my work with Mike Bommarito and others, as well as to provide an overview of the state of the field from both a technical and a commercial perspective.

LexNLP: Natural Language Processing and Information Extraction For Legal and Regulatory Texts (Bommarito, Katz, Detterman)

Paper Abstract – LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at https://github.com/LexPredict/lexpredict-lexnlp
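As a toy illustration of the kind of structured extraction described in item (iii), a long-form US date extractor can be sketched in a few lines of plain Python. To be clear, this is not LexNLP's actual implementation (which handles far more formats, including relative dates, ranges, and non-US orderings); it only shows the shape of the problem:

```python
import re
from datetime import datetime

def get_dates(text):
    """Toy extractor for long-form US dates, e.g. 'June 3, 2019'.

    Illustrative only -- not LexNLP's real date extraction, which
    covers many more formats and edge cases.
    """
    pattern = r"\b([A-Z][a-z]+ \d{1,2}, \d{4})\b"
    for match in re.finditer(pattern, text):
        try:
            yield datetime.strptime(match.group(1), "%B %d, %Y").date()
        except ValueError:
            continue  # capitalized word that is not a month name
```

For real documents, the pre-trained extractors in the package are the right starting point; the above is merely a mental model for what "extracting structured information" means here.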

LexPredict Hackathon Challenge – Extracting Simple Contract Metadata

Beyond the specific prize attached to the upcoming hackathon event, we welcome anyone who (just for fun) would like to take a crack at this challenge.

Email us directly (Daniel Martin Katz or Mike Bommarito) if you would like to work on this challenge.

Our LexPredict Challenge is an opportunity to develop basic tools for processing contracts.

Specifically, you will use the sample contract data below to develop algorithms to:
(1) identify the parties to an agreement
(2) identify the effective date segment and the date itself
(3) identify termination clause segment(s) and date(s)

At LexPredict, we have built this simple technology (and other, more complex versions) for use in commercial applications.  This challenge is an opportunity to produce open source content which can be used by all (including in the Legal Analytics Course).

The Utility of Text: The Case of Amicus Briefs and the Supreme Court (by Yanchuan Sim, Bryan Routledge & Noah A. Smith)

From the Abstract: “We explore the idea that authoring a piece of text is an act of maximizing one’s expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae (“friends of the court” separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis.”  (HT: R.C. Richards from Legal Informatics Blog)

Computational Law Workshop @ Stanford CodeX

Today Mike Bommarito and I had the pleasure of participating in the Computational Law Workshop at Stanford CodeX.  It was a very strong group, with ~20 of the top global experts working in a true workshop format on the pressing technical issues in computational law.  It was a great exchange of ideas!