Archive

Posts Tagged ‘united states code’

Measuring the Complexity of the Law : The United States Code

August 2nd, 2010 dmartink No comments

Understanding the sources of complexity in legal systems is a matter long considered by legal commentators. In tackling the question, scholars have applied various approaches including descriptive, theoretical and, in some cases, empirical analysis. The list is long but would certainly include work such as Long & Swingen (1987), Schuck (1992), White (1992), Kaplow (1995), Epstein (1997), Kades (1997), Wright (2000) and Holz (2007). Notwithstanding the significant contributions made by these and other scholars, we argue that an extensive empirical inquiry into the complexity of the law still remains to be undertaken.

While certainly just a slice of the broader legal universe, the United States Code represents a substantively important body of law familiar to both legal scholars and laypersons. In published form, the Code spans many volumes. Those volumes feature hundreds of thousands of provisions and tens of millions of words. The United States Code is obviously complicated, however, measuring its size and complexity has proven be non-trivial.

In our paper entitled, A Mathematical Approach to the Study of the United States Code we hope to contribute to the effort by formalizing the United States Code as a mathematical object with a hierarchical structure, a citation network and an associated text function that projects language onto specific vertices.

In the visualization above, Figure (a) is the full United States Code visualized to the section level. In other words, each ring is a layer of a hierarchical tree that halts at the section level. Of course, many sections feature a variety of nested sub-sections, etc. For example, the well known 26 U.S.C. 501(c)(3) is only shown above at the depth of Section 501.  If we added all of these layers there would simply be additional rings. For those interested in the visualization of specific Titles of the United States Code … we have previously created fully zoomable visualizations of Title 17 (Copyright), Title 11 (Bankruptcy),  Title 26 (Tax) [at section depth], Title 26 (Tax) [Capital Gains & Losses] as well as specific pieces of legislation such as the original Health Care Bill – HR 3962.

In the visualization above, Figure (b) combines this hierarchical structure together with a citation network.  We have previously visualized the United States Code citation network and have a working paper entitled Properties of the United States Code Citation Network. Figure (b) is thus a realization of the full United States Code through the section level.

With this representation in place, it is possible to measure the size of the Code using its various structural features such as vertices V and its edges E.  It is possible to measure the full Code at various time snapshots and consider whether the Code is growing or shrinking. Using a limited window of data, we observe growth not only in the size of the code but also its network of dependancies (i.e. its citation network).

Of course, growth in the size United States Code alone is not necessarily analogous to an increase in complexity.  Indeed, while we believe in general the size of the code tends to contribute to “complexity,” some additional measures are needed.  Thus, our paper features structural measurements such as number of sections, section sizes, etc.

In addition, we apply the well known Shannon Entropy measure (borrowed from Information Theory) to evaluate the “complexity” of the message passing / language contained therein.  Shannon Entropy has a long intellectual history and has been used as a measure of complexity by many scholars.  Here is the formula for Shannon entropy:

For those interested in reviewing the full paper, it is forthcoming in Physica A: Statistical Mechanics and its Applications. For those not familiar, Physica A is a journal published by Elsevier and is a popular outlet for Econophysics and Quantitative Finance. A current draft of the paper is available on the SSRN and the physics arXiv

We are currently working on a follow up paper that is longer, more detailed and designed for a general audience.  Even if you have little or no interest in the analysis of the United States Code, we hope principles such as entropy, structure, etc. will prove useful in the measurement of other classes of legal documents including contracts, treaties, administrative regulations, etc.

Computational Legal Studies – The Interactive Gallery

May 17th, 2010 dmartink No comments

Click on the above picture and you will be taken to the Interactive Gallery of Computational Legal Studies. Once inside the gallery, click on any thumbnail to see the full size image. Each image features a link to supporting materials such as documentation and/or the underlying academic paper. We hope to add more content to gallery over the coming weeks and months — so please check back!  Please note that load time may vary depending upon your connection, machine, etc.

Computational Legal Studies Presentation Slides from the Law.gov Meetings

May 7th, 2010 dmartink No comments

Thanks to Carl Malamud and the good folks at the University of Colorado Law School and University of Texas Law School for allowing us to participate in their respective law.gov meetings. For those interested in governmental transparency, we believe that Carl Malamud’s on-going national conversation is very important. The video above represents a fixed spaced movie combining the majority of the slides we presented at the two meetings. If the video will not load, click here to access the YouTube Version of the Slides. Enjoy!

The United States Code — The Movie — Featuring Title 16 — Conservation

April 22nd, 2010 dmartink No comments

Above is a movie displaying Title 16 (Conservation) a subset of the content contained within the United States Code. At more than 2,400 pages (download it here), Title 16 is one of the larger titles in the US Code.  Yet, it is not the largest.  For example, Title 26 (Internal Revenue Code) and Title 42 (Public Health and Welfare) are far larger than the object displayed above.

Now, you might be wondering why we chose to generate this movie. We envisioned at least two purposes.

(1) The title of this blog is Computational Legal Studies.  One of our major goals to either develop or apply tools that scale to life in the era of Big Data. Given the scope of an object such as the United States Code, it is is clear that a significant class of potential analysis cannot reasonably be undertaken without the use of computational tools.  Thus, with respect to developing new insights, we believe computational linguistics, information theory, applied graph theory can be of great use.  For those interested, our new paper entitled A Mathematical Approach to the Study of the United States Code offers our initial exploration of the possibilities.

(2) We believe this movie can be a meaningful pedagogical device.  Many students enter law school and are dismayed when even in  statutory based classes they are not exclusively reviewing the black letter law. Given the scope of this and other large bodies of documents, any model of legal education cannot be exclusively be dedicated to teaching black letter law. Instead, such training is appropriately devoted to a mixture of existing legal rules as well as the development of information acquisition protocols that train students to navigate the relevant landscape.

The Structure and Complexity of the United States Code

January 29th, 2010 dmartink No comments

Mike and I have been working on a paper we hope to soon post to the SSRN entitled ” The Structure and Complexity of the United States Code.”  Yesterday, we presented a pre-alpha version of the paper in the Michigan Center for Political Studies Workshop For those who might be interested, the abstract for the working abstract for the paper is below. If you are interested in accessing documentation for the above visualization please click here.

“The United States Code is the substantively important body of information that collectively constitutes the federal statutory law of the United States.  The Code is a complied hierarchical document organized into fifty substantive titles including Bankruptcy (Title 11), Judiciary and Judicial Procedure (Title 28), Public Health, and Welfare (Title 42) and Tax (Title 26).  In addition to its hierarchical organization, the Code contains an extensive citation network where cross-references connect its provisions in a variety of novel manners.

Claims regarding complexity of the Code, in particular the Internal Revenue Code, are consistently part of the public discourse. Undoubtedly, the Code is complicated. However, quantifying its complexity is a far more difficult proposition.  While there have been some initial attempts to identify the size of certain pieces of the Code, few comprehensive or comparative investigations of the entire United States Code have been undertaken.

In this article, we ask how complex is the United States Code and in comparative terms which titles are the most and least complex? Employing a wide variety of approaches including techniques drawn from information theory, computer science, linguistics and applied graph theory, we develop and apply a series of distinct measures for the structural and linguistic complexity of the Code.  After developing these discrete approaches, we generate a composite measure and use it to comparatively score each of the Code’s titles. While we recognize other composite measures for size and complexity could legitimately be offered, we believe our interdisciplinary approach represents a significant advance and provides much needed rigor to questions of code complexity.”

Sen. Sheldon Whitehouse on the Senate Floor Discussing the Length of the Health Care Bill and Citing Harry Potter Number [Via Think Progress]

December 4th, 2009 dmartink 2 comments

Full Story over at Think Progress.  Our original post on the length of HR 3962 is here.  The subsequent NY Times Article on the Length of HR 3962 is here. Enjoy!

New Paper: Properties of the United States Code Citation Network

November 11th, 2009 mjbommar No comments

We have been working on a larger paper applying many concepts from structural analysis and complexity science to the study of bodies of statutory law such as the United States Code. To preview the broader paper, we’ve published to SSRN and arXiv a shorter, more technical analysis of the properties of the United States Code’s network of citations.

Click here to Download the Paper!

Abstract: The United States Code is a body of documents that collectively comprises the statutory law of the United States. In this short paper, we investigate the properties of the network of citations contained within the Code, most notably its degree distribution. Acknowledging the text contained within each of the Code’s section nodes, we adjust our interpretation of the nodes to control for section length. Though we find a number of interesting properties in these degree distributions, the power law distribution is not an appropriate model for this system.

Citation In-Degree
Citation In-Degree

Katz & Bommarito in the New York Times Discussing H.R. 3962

November 10th, 2009 dmartink No comments

NYT Rx Blog

If you click through on the link above you will be directed to the New York Times Rx Blog.  The full version of the article appears online while a shorter version appeared in today’s print edition. For those viewing the print edition, the story is located on page A20. This website is mentioned in both versions of the story!

Visualizing the Structure of H.R. 3962 — The Health Care Bill

November 9th, 2009 mjbommar No comments

HR 3962 Visual

In addition to the facts we have presented on HR 3962, we wanted to offer a visualization for the structure of the Bill. Like many other bills, HR 3962, is divided into Divisions, Titles, Subtitles, Parts, Subparts, Sections, Subsections, Clauses, and Subclauses. These hierarchical splits represent the drafters’ conception of its organization, and thus the relative size of these categories may provide an indication of both the importance of each section of the Bill as well as the overall size of the document. By clicking through the image below, you can navigate a zoomable representation of the structure of HR 3962 using Microsoft’s Seadragon zoom interface.  Many of the Divisions, Titles, Subtitles, Parts, and Subparts of the Bill are labeled. The balance are not labeled because they fell on an angle on the radial layout which rendered them impossible to read.

The graph is laid out in a radial manner with the center node labeled “H.R. 3962.” Legislation, the broader United States Code as well as many other classes of information are organized as hierarchical documents. H.R. 3962 is no different. For those less familiar with this type of documents, we thought it useful to provide a tutorial regarding (1) how to use this zoomable visualization (2) the correspondence between the visual and the Library of Congress version of H.R. 3962


How Do I Open/Navigate the Visualization?

(1) Open the Library of Congress version of H.R. 3962 in another browser window.

(2) Open the visualization by clicking on the large image above.

(3) Clicking on the image above will take you to the Seadragon platform. (Note: Load times will vary from machine to machine… so please be patient.)

(4) Seadragon allows for zoomable visualizations and for full screen viewing. Full screen is really the best way to go. If you run your mouse over the black box where the visual is located you will see four buttons in the southeast corner.  The “full screen” button is the last one on the right. Click the button and you will be taken to full screen viewing!

(5) Click to zoom in and out, hold the mouse down and drag the entire visual, etc. Now, you are ready to traverse the graph using this visualization as your very own “H.R. 3962 Magic Decoder Wheel.”


How Do I Understand the Visualization?

To introduce the substance of the visualization, we have color coded two separate examples right into the visualization.

Example 1: Bills such as HR 3962 often feature a “short title” provision at the very begining of the legislation.  For example, if you download the PDF copy of the bill, you can see the short title at the bottom of page 1 of the bill.  You can also see this in the Library of Congress version of H.R. 3962.

SECTION 1. SHORT TITLE; TABLE OF DIVISIONS, TITLES, AND SUBTITLES.
(a) Short Title- This Act may be cited as the `Affordable Health Care for America Act’.

Zoom in close to start in the center where the large node labeled “HR 3962.”  Notice the blue colorized path features the blue labels 1. and terminates with the label (a). The labels in the graph are the labels in the text above.  While this is a simple example, the precise logic defines the entire graph.

Example 2: This is a bit more difficult as it requires the traversal of several provisions in order to reach a terminal node.  In this case, the terminal node read as follows … “SEC. 401. INDIVIDUAL RESPONSIBILITY.For an individual’s responsibility to obtain acceptable coverage, see section 59B of the Internal Revenue Code of 1986 (as added by section 501 of this Act).”

DIVISION A–AFFORDABLE HEALTH CARE CHOICES
TITLE IV–SHARED RESPONSIBILITY
Subtitle A–Individual Responsibility
SEC. 401. INDIVIDUAL RESPONSIBILITY.

Again, zoom in close to start in the center--where the large node labeled “HR 3962.”  Notice the blue colorized path features the blue labels A and terminates with the label 401. In between the start and finish, there are stops at IV and A, respectfully.  Just as before, the labels in the graph are the labels in the text above.  The end user can follow the precise journey but without the visual by using the Library of Congress version of H.R. 3962.

Facts About the Length of H.R. 3962, the Affordable Health Care for America Act (AHCAA)

November 8th, 2009 mjbommar 5 comments

In light of last night’s vote on H.R. 3962, the Affordable Health Care for America Act, we decided to calculate a few numbers on the current bill. Based on the Library of Congress’s XML representation of the bill (which can be obtained here), we have calculated a number of linguistic and citation properties of the Bill. The House of Representative approved HR 3962 by a 220-215 margin. The New York Times features a useful analysis of the vote including a breakout by party and region here.

On the Sunday morning talk shows as well as in other outlets, there has been significant discussion regarding the size of H.R. 3962. Specifically, many critics have decried the length of the bill citing its 1990 pages. The bill is indeed 1990 pages as you can see if you choose to download a PDF copy of the bill.

The purpose of this post is to provide a perspective regarding the length of H.R. 3962. Those versed in the typesetting practices of the United States Congress know that the printed version of a bill contains a significant amount of whitespace including non-trivial space between lines, large headers and margins, an embedded table of contents, and large font. For example, consider page 12 of the printed version of H.R. 3962.  This page contains fewer than 150 substantive words.

We believe a simple page count vastly overstates the actual length of bill. Rather than use page counts, we counted the number of words contained in the bill and compared these counts to the number of words in the existing United States Code. In addition, we consider the number of text blocks in the bill– where a text block is a unit of text under a section, subsection, clause, or sub-clause.


Basic Information about the Length of H.R. 3962

Number of words in H.R. 3962 impacting substantive law:

  • 234,812 words (w/ generous calculation)

Number of total words in H.R. 3962: 363,086 words (w/ titles, tables of contents …)
Number of text blocks: 7,961
Average number of words per text block: 24.18
Average words per section: 267.03


Is this a Large or Small Number? Comparison to Harry Potter

Number of substantive words in H.R. 3962: 234,812 words
Harry Potter and the Order of the Phoenix - 257,000 words
Harry Potter and the Goblet of Fire - 190,000 words
Harry Potter and the Deathly Hallows – 198,000 words


Is this a Large or Small Number? Comparison to Other Legislation

Number of substantive words in Energy Bill of 2007: 157,835 words
Number of substantive words in Defense Authorization Act for 2010: 119,960 Words
H.R. 3962 is roughly 2x the Size of Medicare Rx Bill of 2003 (Given there is no public XML version of the bill, the Exact “Substantive Words” Number is not available)


Is this a Large or Small Number? Comparison to the Full U.S. Code

Size of the United States Code: 42+ Million Words
Relative Size of H.R. 3962: H.R. 3962 is roughly 1/2 of one percent of the size of the United States Code


Longest Sections in H.R. 3962

  • Sec 341. Availability Through Health Insurance Exchange
  • Sec 1222. Demonstration to promote access for Medicare beneficiaries with limited English proficiency by providing reimbursement for culturally and linguistically appropriate services.
  • Sec 1160: Implementation, and Congressional review, of proposal to revise Medicare payments to promote high value health care
  • Sec 305: Funding for the construction, expansion, and modernization of small ambulatory care facilities
  • Sec 1417: Nationwide program for national and State background checks on direct patient access employees of long-term care facilities and providers


Modifications of the Existing U.S. Code By H.R. 3962

Number of Strikeouts: 332
Number of Inserts: 390
Number of Re-designations: 65


Acts Most Cited By H.R. 3962

Social Security Act: 622 times
Public Health Service Act: 134 times
Affordable Health Care for America Act: 60 times
Indian Health Care Improvement Act: 56 times
Indian Self-Determination and Education Assistance Act: 45 times
Employee Retirement Income Security Act: 39 times
Medicare Prescription Drug, Improvement, and Modernization Act: 11 times
American Recovery and Reinvestment Act: 7


Sections of the U.S. Code Cited (Properly) Most By H.R. 3962

25 U.S.C. §450. Congressional statement of findings: 38
25 U.S.C. §13. Expenditure of appropriations by Bureau: 13
42 U.S.C. §1396a(a). State plans for medical assistance: 10
42 U.S.C. §1396d(a). Definitions: 7
42 U.S.C. §2004a. Sanitation facilities: 7

The Structure of the United States Code [With Zoomorama]

September 15th, 2009 dmartink 1 comment

United States Code Zoomorama

Above we offer the same visual of the United States Code (Titles 1-50) which we previously offered here … this time we are using Zoomorama.  Zoomorama is an alternative to Seadragon which we believe might perform better on certain machine configurations.

Essentially, we do not want people to miss out on the visualization simply because their computer does not feature the necessary software/plugins.  While some class of endusers still might not be able to view either version, we hope this alternative version will maximize the chances that it would be visable.

So, feel free to scroll over the visual using your mouse.  For optimal viewing, however, we believe the full screen visual is the best way to go. Click on the square icon in the upper lright corner to make the visual full size.  Click Here for the Zoomorama Instructions!

WP SlimStat