Our next paper — OpenEDGAR – Open Source Software for SEC Edgar Analysis is now available. This paper explores a range of #OpenSource tools we have developed to explore the EDGAR system operated by the US Securities and Exchange Commission (SEC). While a range of more sophisticated extraction and clause classification protocols can be developed leveraging LexNLP and other open and closed source tools, we provide some very simple code examples as an illustrative starting point.
Click here for Paper: < SSRN > < arXiv >
Access Codebase Here: < Github >
Abstract: OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License at https://github.com/LexPredict/openedgar