Workshop on Natural Legal Language Processing (NLLP) at North American Association of Computational Linguistics Conference

The First Workshop on Natural Legal Language Processing (NLLP) will take place as part of the larger North American Association of Computational Linguistics Conference in Minneapolis June 2019.  NAACL is one of the  premier technical events in the field of NLP / Computational Linguistics.  Thus, I am very happy to give one of the Keynotes at this workshop.  It is one more step toward making Legal Informatics and Legal AI / NLP a mainstream idea within the technically oriented portion of the academy. 

I plan to highlight both my work with Mike Bommarito and others as well as provide an overview of the state of the field from both a technical and commercial perspective.  

LexPredict Launches New User Interface for ContraxSuite

Our LexPredict Team is excited to announce the new ContraxSuite User Interface – See Press Release < HERE >

ContraxSuite has a wide range of user types across our various legal service delivery customers. Relevant users include legal data scientists, power users in legal information technology, professional review teams at legal process outsourcers, contract review units in corporate legal departments, as well as associates and partners in law firms. While the existing ContraxSuite user interface will still serve as the interface for our data scientist community, the new UI is designed to serve the needs of a much broader community of users.

Eric Detterman – VP and Global Head of Products and Solution Engineering at LexPredict noted, “The new ContraxSuite User Interface delivers the bells and whistles that many users expect from a modern app or software tool, including dynamic menus, helpful dialog boxes, and an easy, intuitive design.”   

OpenEDGAR: Open Source Software for SEC EDGAR Analysis (Michael Bommarito, Daniel Martin Katz & Eric Detterman)

Our next paper — OpenEDGAR – Open Source Software for SEC Edgar Analysis is now available.  This paper explores a range of #OpenSource tools we have developed to explore the EDGAR system operated by the US Securities and Exchange Commission (SEC).  While a range of more sophisticated extraction and clause classification protocols can be developed leveraging LexNLP and other open and closed source tools, we provide some very simple code examples as an illustrative starting point.

Click here for Paper:   < SSRN > < arXiv >
Access Codebase Here: < Github >

Abstract
OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License at https://github.com/LexPredict/openedgar

LexNLP: Natural Language Processing and Information Extraction For Legal and Regulatory Texts (Bommarito, Katz, Detterman)

Paper Abstract – LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at https://github.com/LexPredict/lexpredict-lexnlp

Exploring the Use of Text Classi€cation in the Legal Domain (via arXiv)

ABSTRACT:  In this paper, we investigate the application of text classi€cation methods to support law professionals. We present several experiments applying machine learning techniques to predict with high accuracy the ruling of the French Supreme Court and the law area to which a case belongs to. We also investigate the inƒuence of the time period in which a ruling was made on the form of the case description and the extent to which we need to mask information in a full case ruling to automatically obtain training and test data that resembles case descriptions. We developed a mean probability ensemble system combining the output of multiple SVM classi€ers. We report results of 98% average F1 score in predicting a case ruling, 96% F1 score for predicting the law area of a case, and 87.07% F1 score on estimating the date of a ruling

LexPredict Open Sources The 1910 Version of Black’s Law – The World’s Most Well Known Legal Dictionary is Now a Data Object


From the release
:  “At their core, many academic and commercial applications of natural language processing and machine learning can benefit from a controlled lexicon of expert-selected terms (i.e., a dictionary). This is especially true of highly technical language, such as legal text. However, after a search of the existing landscape, we were unable to find a high-quality open source or freely-available legal dictionary. Instead, the best existing versions, when available, exist under some form of restrictive licensing conditions.”

“Thus, in furtherance of both the legal profession as well as a range of legal technology providers and solutions, we are announcing another step in our broader open source plan that we outlined earlier this month. Namely, we are making available on Github the 1910 Version of Black’s Law (i.e., Black’s Law 2nd Edition) as a structured data object. This early version of arguably the premier legal dictionary is made available under the open source GPL license 3.0 which should allow both researchers and commercial providers to operate with limited restrictions.”

Click here to access the GitHub Repo.

Why We Are Open Sourcing ContraxSuite and Some Thoughts About Legal Tech and the Modern Information Economy

 

Today we here at LexPredict announce that we will be open sourcing our document analytics platform ContraxSuite (which works on a wide class of documents beyond just contracts).

From the Announcement – “Starting on August 1st, this code base and our public development roadmap will be hosted on Github under a permissive open-source licensing model that will allow most organizations to quickly and freely implement and customize their own contract and document analytics. Like Redhat does for Linux, we will provide support, customization, and data services to “cover the last mile” for those organizations who need it.

We believe that a very important future for law lies in its central role in facilitating and regulating the modern information economy. But unless we start treating law itself like the production of information, we’ll never get there. Before we can solve big problems with smart contracts, we need to start by structuring existing legacy contracts. We hope our actions today will help lawyers, companies, and other LegalTech providers accelerate the pace of improvement and innovation through more open collaboration.”    (click here for full announcement or access via Slideshare)