This is a very important paper by Jonathan Chang and David Blei. Suffice it to say, it has potential use in a wide class of social science applications. Click here to access related material on Professor Blei’s Princeton homepage. Click here for some slides (note: 7.0 MB). Check it out!
We have spent the past couple of days at the University of Pennsylvania, where we presented information about our efforts to compile a complete United States Supreme Court Corpus. As noted in the slides below, we are interested in creating a corpus containing not only every SCOTUS opinion, but also every SCOTUS disposition from 1791 to 2010. Slight variants of the slides below were presented at the Penn Computational Linguistics Lunch (CLunch) and the Linguistic Data Consortium (LDC). We really appreciated the feedback and are looking forward to continuing our work with the LDC. For those who might be interested, take a look at the slides embedded below or click on this link:
As we mentioned in previous posts, Seadragon is a really cool product. Please note that load times may vary depending upon your specific machine configuration as well as the strength of your internet connection. For those not familiar with how to operate it, please see below. In our view, Full Screen is the best way to go ….
“Though law is almost certainly a web, questions regarding its interconnectedness remain. Building upon themes of Maitland, Professor Solum has properly raised questions as to whether or not the web of law is “seamless”. By leveraging the tools of computer science and applied graph theory, we believe that an empirical evaluation of this question is at last possible. In that vein, consider Figure 9, which offers several possible topological locations that might be populated by components of the graphs discussed herein. We believe future research should consider the relevant information contained in the union, intersection, and complement of our citation and semantic networks.
While we leave a detailed substantive interpretation for subsequent work, it is worth broadly considering the information defined in Figure 9. For example, the intersect (∩) displayed in Figure 9 defines the set of cases that feature both semantic similarity and a direct citation linkage. In general, these are likely communities of well-defined topical domains. Of greater interest to an empirical evaluation of the law as a seamless web, is likely the magnitude and composition of the Citation Only and Semantic Only subsets. Subject to future empirical investigation, we believe the Citation Only components of the graph may represent the exact type of concept exportation to and from particular semantic domains that would indeed make the law a seamless web.”
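For readers who want to experiment with the subsets named in the passage above, here is a minimal sketch in Python. It is not code from the paper. It assumes each network is available as a list of case-to-case pairs, and it treats citation edges as undirected so that they can be compared with semantic edges. That simplification is ours, and the function names and case identifiers are hypothetical.

```python
def normalize(edges):
    """Treat each edge as an unordered pair of case identifiers (our assumption)."""
    return {frozenset(edge) for edge in edges}


def compare_networks(citation_edges, semantic_edges):
    """Split two edge lists into the subsets discussed in the quoted passage."""
    citation = normalize(citation_edges)
    semantic = normalize(semantic_edges)
    return {
        "union": citation | semantic,
        "intersection": citation & semantic,   # cited AND semantically similar
        "citation_only": citation - semantic,  # cited but not semantically similar
        "semantic_only": semantic - citation,  # similar but no direct citation
    }


if __name__ == "__main__":
    # Hypothetical case identifiers, for illustration only.
    citations = [("A", "B"), ("B", "C")]
    semantics = [("B", "A"), ("C", "D")]
    for name, edges in compare_networks(citations, semantics).items():
        print(name, sorted(sorted(edge) for edge in edges))
```

The sizes of the citation_only and semantic_only sets are the quantities the passage flags as most relevant to the seamless web question.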
In our paper Law as a Seamless Web, we offer a first-order method to generate case-to-case and opinionunit-to-opinionunit semantic networks. As constructed in the figure above, nodes represent cases decided between 1791 and 1865, while edges are drawn when two cases reach a certain threshold of semantic similarity. Except for the definition of edges, the process of constructing the semantic graph is identical to that of the citation graph we offered in the prior post. While computer science and computational linguistics offer a variety of possible semantic similarity measures, we chose to employ a commonly used measure. Here is a description from the paper:
“Semantic similarity measures are the focus of significant work in computational linguistics. Given the scope of the dataset, we have chosen a first-order method for calculating similarity. After lemmatizing the text of the case with WordNet, we store the nouns with the top N frequencies for each case or opinion unit. We define the similarity between two cases or opinion units A and B as the percentage of words that are shared between the top words of A and top words of B.
An edge exists between A and B in the set of edges if σ (A,B) exceeds some threshold. This threshold is the minimum similarity necessary for the graph to represent the presence of a semantic connection.”
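The procedure quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation. It assumes the WordNet lemmatization and noun extraction have already been done, so each case arrives as a list of lemmatized nouns. It also reads "percentage of words that are shared" as the size of the overlap between the two top-N sets divided by N, which is one plausible reading among several. All function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_nouns(nouns, n):
    """Return the set of the n most frequent nouns in a list of lemmatized nouns."""
    return {word for word, _ in Counter(nouns).most_common(n)}


def similarity(top_a, top_b, n):
    """Shared top words divided by n (our assumed reading of sigma(A, B))."""
    return len(top_a & top_b) / n


def semantic_edges(docs, n, threshold):
    """Draw an edge between two documents when their similarity exceeds threshold."""
    tops = {name: top_nouns(nouns, n) for name, nouns in docs.items()}
    return {
        (a, b)
        for a, b in combinations(sorted(tops), 2)
        if similarity(tops[a], tops[b], n) > threshold
    }


if __name__ == "__main__":
    # Toy noun lists standing in for lemmatized opinions.
    docs = {
        "case_1": ["tax", "tax", "income", "income", "court"],
        "case_2": ["tax", "income", "income", "duty", "tax"],
        "case_3": ["ship", "ship", "cargo", "cargo", "port"],
    }
    print(semantic_edges(docs, n=2, threshold=0.5))
```

Both N and the threshold are free parameters here. The quoted passage does not state the values used, so none are assumed.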
As this is a technical paper, it is slanted toward demonstrating proof of methodological concept rather than covering significant substantive ground. With that said, we do offer a hint of our broader substantive goal of detecting the spread of legal concepts between various topical domains. Specifically, with respect to enriching positive political theory, we believe the union, intersection and complement of the semantic and citation networks are really important. More on this point is forthcoming in a subsequent post…
We have recently posted Law as a Seamless Web? Comparison of Various Network Representations of the United States Supreme Court Corpus (1791-2005) to the SSRN. Given that this is the first of several posts about the paper, I will speak broadly and leave details for a subsequent post. From the abstract: “As research of judicial citation and semantic networks transitions from a strict focus on the structural characteristics of these networks to the evolutionary dynamics behind their growth, it becomes even more important to develop theoretically coherent and empirically grounded ideas about the nature of edges and nodes. In this paper, we move in this direction on several fronts …. Specifically, nodes represent whole cases or individual ‘opinion units’ within cases. Edges represent either citations or semantic connections.” The table below outlines several possible network representations for the USSC corpus.
The goal of the paper is to do some technical and conceptual work. It is a small slice of a broader project with James Fowler (UCSD) and James Spriggs (WashU). We recently presented findings from the primary project at the Networks in Political Science Conference. The main project is entitled The Development of Community Structure in the Supreme Court’s Network of Citations and we hope to have a version of this paper on the SSRN soon. In the meantime, we plan additional discussion of Law as a Seamless Web in the days to come.
Live from Barcelona, we are on the road at the International Association for Artificial Intelligence and Law. Henry Prakken has just delivered the keynote address and we will soon be giving our presentation. The conference is interesting as it embraces a wide range of topics and intellectual traditions. For example, there is a significant emphasis on ontological reasoning, computational models of argumentation and the use of XML schemas. In addition, there are a number of folks using graph theoretic techniques and applying them to the development of the law. It has been a nice few days and we have enjoyed our time here. Tomorrow, the trip continues….
In honor of Tax Day, we’ve produced a simple time series representation of the Supreme Court and tax. The above plot shows how often the word “tax” occurs in the cases of the Supreme Court for each year – that is, what proportion of all words in every case in a given year are the word “tax.” The data underneath includes non-procedural cases from 1790 to 2004. The arrows highlight important legislation and cases for income tax as well.
Make sure to click through the image to view the full size.
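For the curious, the yearly proportion described above could be computed along these lines. This is a minimal sketch, not the code behind the plot. It assumes the cases are available as (year, text) pairs and uses a simple lowercase alphabetic tokenizer. It counts only exact matches of "tax", so "taxes" and "taxation" are not included. The function name and the sample texts are hypothetical.

```python
import re
from collections import defaultdict


def word_share_by_year(cases, word="tax"):
    """Map each year to the share of all tokens in that year's cases equal to word."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in cases:
        # Crude tokenizer (our assumption): runs of letters, lowercased.
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in totals if totals[year]}


if __name__ == "__main__":
    # Toy texts for illustration, not real opinions.
    cases = [
        (1895, "The tax is a direct tax"),
        (1895, "No duty here"),
        (1913, "Income tax"),
    ]
    for year, share in sorted(word_share_by_year(cases).items()):
        print(year, round(share, 4))
```

Pooling tokens across all cases in a year, rather than averaging per-case rates, matches the description of the proportion of all words in every case in a given year.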