Happy Birthday to Computational Legal Studies

On March 17, 2009 we offered our first post here at the Computational Legal Studies Blog. It has been an exciting and fun year. Here are some of the highlights!
(1) 3-Dimensional Hi Def Visualization of the Early United States Supreme Court Citation Network
(2) Facts About the Length of H.R. 3962, the Affordable Health Care for America Act (AHCAA) (Discussed in NY Times Blog, NY Times, U.S. Senate Floor)
(3) Zoomable Visualization of the Campaign Contributions to Senators in the 110th Congress
(4) Visualizing Bank Failures (2008 – 2009)
(5) Who Owns America’s Debt? A Dynamic Perspective on the Major Foreign Holders of Treasury Securities [2002-Present]
(6) Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber
(7) Cash for Clunkers – Visualization and Analysis
(8) Dynamic Animation of the East Anglia Climate Research Unit Email Network
(9) The Structure of the United States Code
(10) Model of Intellectual Diffusion Upon the American Legal Academy

The Development of Structure in the Citation Network of the United States Supreme Court — Now in HD!

What are some of the key takeaway points?
(1) The Supreme Court’s increasing reliance upon its own decisions over the 1800-1830 window.
(2) The important role of maritime/admiralty law in the early years of the Supreme Court’s citation network. At least with respect to the Supreme Court’s citation network, these maritime decisions are the root of the Supreme Court’s jurisprudence.
(3) The increasing centrality of decisions such as Marbury v. Madison and Martin v. Hunter’s Lessee to the overall network.

The Development of Structure in the SCOTUS Citation Network

The visualization offered above is the largest weakly connected component of the citation network of the United States Supreme Court (1800-1829). Each time slice visualizes the aggregate network as of the year in question. In our paper entitled Distance Measures for Dynamic Citation Networks, we offer some thoughts on the early SCOTUS citation network. In reviewing the visual above, note: “[T]he Court’s early citation practices indicate a general absence of references to its own prior decisions. While the court did invoke well-established legal concepts, those concepts were often originally developed in alternative domains or jurisdictions. At some level, the lack of self-reference and corresponding reliance upon external sources is not terribly surprising. Namely, there often did not …
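For readers who want to see what "the aggregate network as of the year in question" and "largest weakly connected component" mean in practice, here is a minimal sketch in plain Python. It is not the code behind the visualization; the function names and the toy citation data are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def largest_weak_component(edges):
    """Return the node set of the largest weakly connected component."""
    # Weak connectivity ignores citation direction, so build an undirected adjacency map.
    adjacency = defaultdict(set)
    for citing, cited in edges:
        adjacency[citing].add(cited)
        adjacency[cited].add(citing)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects every node reachable from this start node.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def yearly_slices(citations, first_year, last_year):
    """Map each year to the largest weak component of all citations made up to that year."""
    slices = {}
    for year in range(first_year, last_year + 1):
        # Aggregate: keep every citation dated in or before the current year.
        edges = [(citing, cited) for y, citing, cited in citations if y <= year]
        slices[year] = largest_weak_component(edges)
    return slices

# Hypothetical toy data: (year of the citing decision, citing case, cited case).
toy_citations = [
    (1801, "B", "A"),
    (1802, "D", "C"),
    (1803, "E", "B"),
    (1803, "E", "A"),
]
slices = yearly_slices(toy_citations, 1801, 1803)
```

Each entry of `slices` corresponds to one frame of an animation: the component can only grow or be overtaken as later citations are added, which is why the early frames look sparse.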

United States Court of Appeals & Parallel Tag Clouds from IBM Research [Repost from 10/23]

Download the paper: Collins, Christopher; Viégas, Fernanda B.; Wattenberg, Martin. Parallel Tag Clouds to Explore Faceted Text Corpora. To appear in Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST), October, 2009. [Note: The Paper is 24.5 MB]

Here is the abstract: Do court cases differ from place to place? What kind of picture do we get by looking at a country’s collection of law cases? We introduce Parallel Tag Clouds: a new way to visualize differences amongst facets of very large metadata-rich text corpora. We have pointed Parallel Tag Clouds at a collection of over 600,000 US Circuit Court decisions spanning a period of 50 years and have discovered regional as well as linguistic differences between courts. The visualization technique combines graphical elements from parallel coordinates and traditional tag clouds to provide rich overviews of a document collection while acting as an entry point for exploration of individual texts. We augment basic parallel tag clouds with a details-in-context display and an option to visualize changes over a second facet of the data, such as time. We also address text mining challenges such as selecting the best words to visualize, and how to do so in reasonable time periods to maintain …

The Structure and Complexity of the United States Code

Mike and I have been working on a paper we hope to soon post to the SSRN entitled “The Structure and Complexity of the United States Code.” Yesterday, we presented a pre-alpha version of the paper in the Michigan Center for Political Studies Workshop. For those who might be interested, the working abstract for the paper is below. If you are interested in accessing documentation for the above visualization please click here.

“The United States Code is the substantively important body of information that collectively constitutes the federal statutory law of the United States. The Code is a compiled hierarchical document organized into fifty substantive titles including Bankruptcy (Title 11), Judiciary and Judicial Procedure (Title 28), Public Health and Welfare (Title 42) and Tax (Title 26). In addition to its hierarchical organization, the Code contains an extensive citation network where cross-references connect its provisions in a variety of novel manners. Claims regarding complexity of the Code, in particular the Internal Revenue Code, are consistently part of the public discourse. Undoubtedly, the Code is complicated. However, quantifying its complexity is a far more difficult proposition. While there have been some initial attempts to identify the size of certain …

GraphMovie: A Library for Generating Movies from Dynamic Graphs with igraph

Over the past few months, we’ve developed a library that simplifies generating dynamic network animations. We’ve used this library in visualizations like (1) Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber and (2) Dynamic Animation of the East Anglia Climate Research Unit Email Network. Prior to these visualizations, we used Sonia to produce animations like this one. While certainly a useful program for those without programming expertise, Sonia suffers from a number of issues that make it unusable for large graphs or graphs with many “slices.” Furthermore, in our experience rendering various movies, a number of platform issues with the Quicktime and Flash rendering engines have arisen. Fixing these problems is possible, but Sonia’s large Java codebase makes for a steep learning curve. As a result, we’ve decided to release this GraphMovie class so that others can use or possibly improve this library.

In order to use GraphMovie, you’ll need the following: python (tested with 2.6); igraph for network manipulation and visualization; Python Imaging Library for manipulating the image frames; and mencoder from the MPlayer package for encoding the image frames into a movie.

Here are the files, hosted on github: GraphMovie.py and GraphMovie_Example1.py.

GraphMovie: Example 1 …

Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber

Based on the Farouk1986 Gawaher data posted earlier this week, we have analyzed the communication network of the alleged Christmas Day Bomber, Umar Farouk Abdulmutallab. Using the handle “Farouk1986,” Abdulmutallab was a regular participant on the Islamic forum Gawaher.com. Several years prior to the Christmas Day incident, the alleged Christmas Day Bomber took part in a significant number of communications. Of course, these communications can be analyzed in a number of ways. For example, over at Zero Intelligence Agents, Drew Conway has already done some useful initial analysis. We sought to contribute our analysis of the time-evolving communication network contained within these posts. While more extensive documentation is available below, in reviewing the dynamic network visualization, consider the following observations:

#1) “Farouk1986” Entered an Existing Network Which Appeared to Increase the Salience of Religion in His Life

Although individuals in society may feel isolated or appear to be loners, the internet offers like-minded, potentially meaningful networks of people with whom to connect. The internet is full of communities of individuals whose interests are wide ranging, including topics such as Blizzard’s World of Warcraft, sports, culinary interests or religion. With …

Visualizing Bank Failures (2008 – 2009)

Three Takeaways

Acceleration: There were four failures in the first six months of 2008, followed by another 22 failures in the next six months. In 2009, there were 21 failures in the first three months of the year, followed by 138 from April to last Friday.

Magnitude: Failures in the past two years have cost the Deposit Insurance Fund an estimated $57B. The IndyMac failure of July 2008 alone accounted for $10B, followed by BankUnited at $4.9B and Guaranty Bank at $3B.

Spatial Correlation: There is a significant amount of spatial correlation in California, Georgia, Florida, Texas, and Illinois. These states account for 77% of the total costs to the Deposit Insurance Fund. Furthermore, most of the losses in California and Georgia were highly concentrated around a few urban centers.

The Movie

The movie below shows the location of bank failures, beginning in 2008 and concluding with the three failed banks from Friday, December 11, 2009. Each green circle corresponds to a bank failure, and the size of each circle corresponds logarithmically to the FDIC’s estimated cost for the Deposit Insurance Fund, as stated in the FDIC press releases. For failures with joint press releases, such as the …
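To make the logarithmic circle sizing concrete, here is a small sketch. The post does not state the exact mapping used in the movie, so the function name, the minimum radius, the scale factor, and the cost floor below are all assumptions chosen for illustration; only the three dollar figures come from the text above.

```python
import math

def circle_radius(cost_dollars, min_radius=2.0, scale=3.0, floor_cost=1e6):
    """Map an estimated cost to a marker radius that grows with log10(cost).

    min_radius, scale, and floor_cost are illustrative assumptions,
    not values taken from the original visualization.
    """
    # Costs at or below floor_cost all get the minimum radius; this also avoids log(0).
    cost = max(cost_dollars, floor_cost)
    return min_radius + scale * math.log10(cost / floor_cost)

# Estimated costs quoted in the post.
costs = {"IndyMac": 10e9, "BankUnited": 4.9e9, "Guaranty": 3e9}
radii = {name: circle_radius(cost) for name, cost in costs.items()}
```

A logarithmic scale keeps the largest failures from swamping the map: under these assumed parameters a tenfold increase in cost adds a fixed three units of radius instead of multiplying the circle's size by ten.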

Dynamic Animation of the East Anglia Climate Research Unit Email Network

STATIC SNAPSHOT TO DYNAMIC ANIMATION

In our prior post analyzing the email database of the Climate Research Unit at the University of East Anglia, we aggregated all emails over the relevant 1997-2009 time period into a single static visualization. Specifically, to build the network, we processed every email in the leaked data. Each email contains a sender and at least one recipient on the To:, Cc:, or Bcc: line. One obvious shortcoming associated with producing a static snapshot for a data set is that it often obscures the time-evolving dynamics of interaction which produced the full graph. To generate a dynamic picture, it is necessary to collect time-stamped network data. In the current case, this required acquisition of the date field for each of the emails. With this information, we used the same underlying data to generate a dynamic network animation for the 1997-2009 time window.

HOW TO INTERPRET THE MOVIE

Consistent with the approach offered in our prior visualization, each node represents an individual within the email dataset while each connection reflects the weighted relationship between those individuals. The movie posted above features the date in the upper left. As …
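The step from a static snapshot to time-stamped frames can be sketched in a few lines of plain Python. This is not the code used for the animation: the function name and the toy emails are hypothetical, and treating each sender-recipient pair as an undirected edge weighted by message count is an assumption about how the "weighted relationship" might be computed.

```python
from collections import Counter
from datetime import date

def weighted_edges(emails, cutoff):
    """Count sender-recipient pairs for all emails dated on or before cutoff."""
    weights = Counter()
    for sent_on, sender, recipients in emails:
        if sent_on > cutoff:
            continue  # this email has not happened yet in the current frame
        for recipient in recipients:
            # Assumption: relationships are undirected, so A->B and B->A share one key.
            weights[tuple(sorted((sender, recipient)))] += 1
    return weights

# Hypothetical toy data: (date, sender, recipients on the To:, Cc:, or Bcc: lines).
toy_emails = [
    (date(1997, 3, 1), "alice", ["bob"]),
    (date(1998, 6, 15), "bob", ["alice", "carol"]),
    (date(2001, 1, 9), "carol", ["alice"]),
]

# One cumulative frame per chosen year end; a movie would use a finer time step.
frames = {
    year: weighted_edges(toy_emails, date(year, 12, 31))
    for year in (1997, 1998, 2001)
}
```

The final frame reproduces the static snapshot, since by then every email has been counted; the earlier frames expose the order in which the relationships formed.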

Programming Dynamic Models in Python-Part 3: Outbreak on a Network

In this post, we will continue building on the basic models we discussed in the first and second tutorials. If you haven’t had a chance to take a look at them yet, definitely go back and at least skim them, since the ideas and code there form the backbone of what we’ll be doing here. In this tutorial, we will build a model that can simulate outbreaks of disease on a small-world network (although the code can support arbitrary networks). This tutorial represents a shift away from both: a) the mass-action mixing of the first two and b) the assumption of social homogeneity across individuals that allowed us to take some shortcuts to simplify model code and speed execution. Put another way, we’re moving more in the direction of individual-based modeling. When we’re done, your model should be producing plots that look like this: Red nodes are individuals who have been infected before the end of the run, blue nodes are never-infected individuals and green ones are the index cases who are infectious at the beginning of the run. And your model will be putting out interesting and unpredictable results such as these: In order to do this one, …
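As a rough preview of the idea, here is a self-contained sketch of an outbreak on a small-world network using only the standard library. It is not the tutorial's code: the function names, the Watts-Strogatz-style construction, the discrete-time infection and recovery rules, and every parameter value are assumptions made for illustration.

```python
import random

def small_world(n, k, p, rng):
    """Ring lattice of n nodes (k nearest neighbors each), rewired with probability p."""
    neighbors = {i: set() for i in range(n)}
    # Start from a ring: each node links to k/2 neighbors on each side.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Rewire each lattice edge with probability p to a random non-neighbor.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            if j not in neighbors[i] or rng.random() >= p:
                continue
            candidates = [c for c in range(n) if c != i and c not in neighbors[i]]
            if candidates:
                new = rng.choice(candidates)
                neighbors[i].remove(j)
                neighbors[j].remove(i)
                neighbors[i].add(new)
                neighbors[new].add(i)
    return neighbors

def run_outbreak(neighbors, index_cases, p_transmit, p_recover, rng, max_steps=1000):
    """Discrete-time outbreak; returns the set of nodes that were ever infected."""
    infectious = set(index_cases)
    ever_infected = set(index_cases)
    for _ in range(max_steps):
        if not infectious:
            break
        newly_infected = set()
        for node in sorted(infectious):
            for other in sorted(neighbors[node]):
                # Only never-infected neighbors can catch the disease.
                if other not in ever_infected and rng.random() < p_transmit:
                    newly_infected.add(other)
        recovered_now = {node for node in sorted(infectious) if rng.random() < p_recover}
        infectious = (infectious - recovered_now) | newly_infected
        ever_infected |= newly_infected
    return ever_infected

rng = random.Random(42)
graph = small_world(n=30, k=4, p=0.1, rng=rng)
index_cases = {0}
ever_infected = run_outbreak(graph, index_cases, p_transmit=0.3, p_recover=0.5, rng=rng)

# Same color scheme as described above: green index cases, red infected, blue never infected.
colors = {
    node: "green" if node in index_cases else "red" if node in ever_infected else "blue"
    for node in graph
}
```

Because infection travels only along edges, each individual's fate depends on who they are connected to, which is the departure from mass-action mixing described above.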

Visualizing the Structure of H.R. 3962 — The Health Care Bill

In addition to the facts we have presented on HR 3962, we wanted to offer a visualization for the structure of the Bill. Like many other bills, HR 3962 is divided into Divisions, Titles, Subtitles, Parts, Subparts, Sections, Subsections, Clauses, and Subclauses. These hierarchical splits represent the drafters’ conception of its organization, and thus the relative size of these categories may provide an indication of both the importance of each section of the Bill as well as the overall size of the document. By clicking through the image below, you can navigate a zoomable representation of the structure of HR 3962 using Microsoft’s Seadragon zoom interface. Many of the Divisions, Titles, Subtitles, Parts, and Subparts of the Bill are labeled. The balance are not labeled because they fell at an angle on the radial layout that rendered them impossible to read. The graph is laid out in a radial manner with the center node labeled “H.R. 3962.” Legislation, the broader United States Code, as well as many other classes of information are organized as hierarchical documents. H.R. 3962 is no different. For those less familiar with this type of document, we thought it useful to provide a tutorial regarding (1) how …

Hustle and Flow: A Social Network Analysis of the American Federal Judiciary [Repost from 3/25]

Written together with Derek Stafford from the University of Michigan Department of Political Science, Hustle and Flow: A Social Network Analysis of the American Federal Judiciary represents our initial foray into Computational Legal Studies. The full paper contains a number of interesting visualizations where we draw various federal judges together on the basis of their shared law clerks (1995-2004). The screen print above is a zoom of the very center of the network. Yellow Nodes represent Supreme Court Justices, Green Nodes represent Circuit Court Justices, Blue Nodes represent District Court Justices. There exist many high quality formal models of judicial decision making including those considering decisions rendered by judges in a judicial hierarchy, whistle blowing, etc. One component which might meaningfully contribute to the extant literature is the rigorous consideration of the social and professional relationships between jurists and the impacts (if any) these relationships impose upon outcomes. Indeed, from a modeling standpoint, we believe the “judicial game” is a game on a graph, one where an individual strategic jurist must take stock of the time-evolving social topology upon which he or she is operating. Even among judges of equal institutional rank, we observe jurists with widely variant levels of social authority (specifically social authority follows a power …
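One simple way to picture "drawing judges together on the basis of their shared law clerks" is a projection from clerk records onto judge pairs. The sketch below is an assumption about how such a network could be built, not the construction used in the paper: the function name and the toy clerkship data are hypothetical, and the edges are undirected counts of shared clerks.

```python
from collections import Counter
from itertools import combinations

def judge_network(clerkships):
    """Weight each pair of judges by the number of clerks who worked for both."""
    weights = Counter()
    for clerk, judges in clerkships.items():
        # Each clerk links every pair of judges they served; sorting gives a stable key.
        for judge_a, judge_b in combinations(sorted(set(judges)), 2):
            weights[(judge_a, judge_b)] += 1
    return weights

# Hypothetical toy data: clerk -> judges the clerk worked for.
toy_clerkships = {
    "clerk_1": ["Judge A", "Justice X"],
    "clerk_2": ["Judge A", "Justice X"],
    "clerk_3": ["Judge B", "Justice X"],
    "clerk_4": ["Judge B"],
}
network = judge_network(toy_clerkships)
```

In this toy example "Justice X" sits at the center because clerks arrive from two different judges, the kind of position that appears near the middle of a layout like the one described above.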

The Structure of the United States Code [w/ Zoomorama] [Repost]

Formally organized into 50 titles, the United States Code is the repository for federal statutory law. While each of the 50 titles defines a particular substantive domain, the structure within and across titles can be represented as a graph/network. In a series of prior posts, we offered visualizations at various “depths” for a number of well-known U.S.C. titles. Click here and click here for our two separate visualizations of the Tax Code (Title 26). Click here for our visualization of the Bankruptcy Code (Title 11). Click here for our visualization of Copyright (Title 17). While our prior efforts were devoted to displaying the structure of a given title of the US Code, the visualization above offers a complete view of the structure of the entire United States Code (Titles 1-50). Using Zoomorama, each title is labeled with its respective number. The small black dots are “vertices” representing all sections in the aggregate US Code (~37,500 total sections). Given the size of the total undertaking, in the visual above, every title is represented to the “section level.” As we described in earlier posts, a “section level” representation halts at the section and thus does not represent any subsection depth. For example, all …
