Slides from our Presentation at UPenn Computational Linguistics (CLUNCH) / Linguistic Data Consortium (LDC)

We have spent the past couple days at the University of Pennsylvania where we presented information about our efforts to compile a complete United States Supreme Court Corpus.  As noted in the slides below, we are interested in creating a corpus containing not only every SCOTUS opinion, but also every SCOTUS disposition from 1791-2010. Slight variants of the slides below were presented at the Penn Computational Linguistics Lunch (CLunch) and the Linguistic Data Consortium(LDC).  We really appreciated the feedback and are looking forward to continue our work with the LDC.  For those who might be interested, take a look at the slides embedded below or click on this link:

Large Scale ( 130,000 + ) Zoomable Visualization of a Twitter Network

Starting with the Michael Bommarito’s twitter handle mjbommar, we built this visualization by collecting Mike’s direct friends, friends-friends, friends-friends-friends, etc. until we decided to stop …. just after passing 130,000 total twitter handles.  Using the Fruchterman-Rheingold algorithm, we visualized a network where |V| = 130365, |E| = 197399.

Those interested in reviewing some other twitter visualizations, please consult Nathan Yau at Flowing Data who has collected some of his favorites.  To our knowledge, the visualization we offer above is one the larger visualizations of twitter that have been produced to date. When you zoom in, you will notice we have flagged some of the celebrity twitter users we detected in the mjbommar friends-friends-friends, etc. network.  For example, as shown above Ashton Kutcher (aplusk), Chad Ochocinco (OGOchOCinco) and RainnWilson (rainnwilson) are contained therein.

Given the budget limitations of this blog, we cannot host this visualization in house. However, if you click the picture above, you can access the visual from Seadragon … a zoomable visualization platform from Microsoft Labs.

The Senate Campaign Contribution Network: A Visualization Repost in Light of the Court’s Decision in Citizens United v. Federal Election Commission

Today’s decision in Citizens United v. Federal Election Commission has justifiably generated a significant amount of media / blogosphere coverage. For those not familiar with the Court’s decision, there is a full roundup of analysis available at SCOTUS Blog and Election Law Blog. In light of today’s decision we decided to repost highlights of our visualization of the campaign contribution network for the Senators of the 110th Congress. For those interested, the original post is offered here and the documentation is here. Also, there are variety of other related posts related to the 110th Congress available under this tag.  Suffice to say, in light of today’s decision, there is likely to be some significant changes to the contribution network of the 111th Congress (Second Session) ….

GraphMovie: A Library for Generating Movies from Dynamic Graphs with igraph

Over the past few months, we’ve developed a library for simply generating dynamic network animations. We’ve used this library in visualizations like (1) Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber and (2) Dynamic Animation of the East Anglia Climate Research Unit Email Network.  Prior to these visualizations, we’ve used Sonia to produce animations like this one. While certainly a useful program for those without programming expertise, Sonia suffers from a number of issues that make it unusable for large graphs or graphs with many “slices.”  Furthermore, in our experience rendering various movies a number of platform issues with the Quicktime and Flash rendering engines have arisen.  Fixing these problems is possible, but Sonia’s large Java codebase makes for a steep learning curve.  As a result, we’ve decided to release this GraphMovie class so that others can use or possibly improve this library.

In order to use the GraphMovie, you’ll need the following:

  • python (tested with 2.6)
  • igraph for network manipulation and visualization
  • Python Imaging Library for manipulating the image frames
  • mencoder from the MPlayer package for encoding the image frames into a movie

Here are the files, hosted on github:

GraphMovie: Example 1 from Computational Legal Studies on Vimeo.

GraphMovie: Example 2 from Computational Legal Studies on Vimeo.

Netflix Challenge for SCOTUS Prediction?

During our break from blogging, Ian Ayers offered a very interesting post over a Freakonomics entitled “Prediction Markets vs. Super Crunching: Which Can Better Predict How Justice Kennedy Will Vote?” In general terms, the post compares the well known statistical model offered by Martin-Quinn to the new Supreme Court Fantasy League created by Josh Blackman. We were particularly interested in a sentence located at end of the post … “[T]he fantasy league predictions would probably be more accurate if market participants had to actually put their money behind their predictions (as with intrade.com).”  This point is well taken. Extending the idea of having some “skin in the game,” we wondered what sort of intellectual returns could be generated for the field of quantitative Supreme Court prediction by some sort of Netflix style SCOTUS challenge.

The Martin-Quinn model has significantly advanced the field of quantitative analysis of the United States Supreme Court. However, despite all of the benefits the model has offered, it is unlikely to be the last word on the question. While only time will tell, an improved prediction algorithm might very well be generated through the application of ideas in machine learning and via incorporation of additional components such as text, citations, etc.

With significant financial sum at stake … even far less than the real Netflix challenge … it is certainly possible that a non-trivial mprovement could be generated. In a discussion among a few of us here at the Michigan CSCS lab, we generated the following non-exhaustive set of possible ground rules for a Netflix Style SCOTUS challenge:

  1. To be unseated, the winning team should be required to make a non-trivial improvement upon the out-of-sample historical success of the Martin-Quinn Model.
  2. To prevent overfitting, the authors of this non-trivial improvement should be required to best the existing model for some prospective period.
  3. All of those who submit agree to publish their code in a standard programming language (C, Java, Python, etc.) with reasonable commenting / documentation.

Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber

Based on the Farouk1986 Gawaher data posted earlier this week, we have analyzed the communication network of the alleged Christmas Day Bomber, Umar Farouk Abdulmutallab.

Using the handle “Farouk1986,” Abdulmutallab was a regular participant on the Islamic forum Gawaher.com. Several years prior to the Christmas Day incident, the alleged Christmas Day Bomber took part in a significant number of communications.  Of course, these communications can be analyzed in a number of ways.  For example, over at Zero Intelligence Agents, Drew Conway has already done some useful initial analysis. We sought to contribute our analysis of the time-evolving communication network contained within these posts.  While more extensive documentation is available below, in reviewing the dynamic network visualization, consider the following observations:

Click on the Full Screen Button! (4 Arrow Symbol in the Vimeo Bottom Banner)

#1) “Farouk1986” Entered an Existing Network Which Appeared to Increase the Salience of Religion in His Life

Although individuals in society may feel isolated or appear to be loners, the internet offers like minded, potentially meaningful networks of people with whom to connect. These internet is full of communities of individuals who interests are wide ranging—including topics such as Blizzard’s World of Warcraft, sports, culinary interests or religion.  With whatever prior beliefs he held, “Farouk1986” entered this subset of the broader Islamic online community in late 2004.  While it is not possible for us to make definitive conclusions, it appears that the community with whom he connected increased the salience of religion in his life.  In other words, through the internet “Farouk1986” experienced a reinforcing feedback and this likely primed him for further radicalization.

#2) The Network of “Farouk1986” Grows Increasingly Stable Once Established

“Farouk1986” increasingly communicated with the same set of individuals over the window in question.  Thus, while communication continued to flow through the network … the network, once established, remains fairly stable.  In other words, instead of being exposed to diverse sets of individuals, “Farouk1986” continued to communicate with the same individuals. In turn, those direct contacts also continued to communicate with the same individuals.

#3) Additional Streams of Data Would Enhance Analysis

The forum posts which serve as the data for this analysis are only a subset of the communication network experienced by Umar Farouk Abdulmutallab. Additional streams of relevant data would include phone records, emails, participation in other forums, etc. would likely enhance the granularity of our analysis. If you have access to such data and are legally authorized to share it-please feel free to contact us.

Background

For those not already familar with the case, Umar Farouk Abdulmutallab is charged with willful attempt to destroy an aircraft in connection with the December 25, 2009 Delta Flight 253 from Amsterdam to Detroit.  Like many, we wondered precisely what path led the son of a wealthy banker to find himself as a would be suicide bomber.  While these communications represent only a small portion of the broader picture, a number of illuminating analyses can still be conducted using this available information.

Using the handle “Farouk1986,” Umar Farouk Abdulmutallab was a regular participant on the popular Islamic forum Gawaher.com. Thus, as a small contribution to the broader analysis of the Christmas Day Incident, we have generated a basic visualization and analysis of the time-evolving structure of the Farouk1986 online communication network.

Filtration of the Gawaher.com Forum

As a major Islamic forum, Gawaher.com features a tremendous number of participants and a wide range of post on topics including Islamic culture, religion, international football, politics, etc.

Given our specific interest in the online behavior of Umar Farouk Abdulmutallab, we were most interested in analyzing the direct and indirect communication network associated with the handle “Farouk1986” (aka Umar Farouk Abdulmutallab).  Therefore, it was necessary to filter the broader universe of communication on Gawaher.com to the relevant subset.

A portion of this information is contained in publically available NEFA dataset.  While useful, we determined that this dataset alone did not include the information necessary for us to construct the Farouk1986 secondary/indirect communications network. In order to obtain a better understanding of this communication network, we retrieved every “topic” in which Farouk1986 participated at least once.  Each “topic” is comprised of one or more “posts” from one or more users.  Each “post” may be in response to another user’s “post.”  The NEFA data contains only posts made by Farouk1986 – our data contains the entire context within which his posts existed.

Building the Time-Evolving Network of Direct and Indirect Communications

Building from this underlying data, we sought to both visualize and analyze the time evolving structure of the “Farouk1986” communication network. For those not familiar with network visualization and analytic techniques—networks consist of both nodes and edges.

In the animation offered above, each “node” is an author.  The labels of all best the most central authors have been removed for visibility purposes. Each “edge” is a weighted connection between two authors, where the weight is the strength of connection between each individual. Thus, within the communication network, thicker edges represent more communications while thinner edges reflect fewer communications.

In the visualization, you will notice most nodes are colored black.  For purposes of ocular differentiation, the Farouk1986 node is colored red.  In addition, we color direct communications with Farouk1986 in red and communications not directly involving Farouk1986 in  black.

Given each forum post is datestamped, we can order the network such that the animation reflects the changing composition of the Farouk1986 online communication network. The datestamp is reflected in the upper left corner of animation.  Our analysis is limited to the 2004-2005 time period when Farouk1986 was a regular participant.

The network is visualized in each time step using the Kamada-Kawai Visualization Algorithm. Kamada-Kawai is spring embedded force directed placement algorithm commonly used to visualize networks similar to the one considered herein. In order to smooth the visual while not undercutting the qualitatively results, we apply linear interpolation between frames.

10 Most Central Participants in the Farouk1986 Network

The following are the ten most central participants in the network, as measured by weighted eigenvector centrality:

Author Centrality
Crystal Eyes 1
property_of_allah 0.84
Farouk1986 0.81
amani 0.69
Mansoor Ansari 0.61
sis Qassab 0.55
muslim mujahid 0.49
Arwa 0.43
sister in islam 0.31
Anj 0.29

Directions for Additional Analysis

(a) Computational Linguistic Analysis of the Underlying Posts

Over what substantive dimensions did these networks of direct and indirect communication form?

(b) Recursive Growth of the Network

Friends-Friends-Friends and so on….

(c) Complete Analysis of the Gawaher.com Forum

Were the patterns of communications by Farouk1986 noticeably different from other forum participants?

(d) Linkage of Content and Structure

What is the nature of information diffusion across the Gawaher.com?

How did this differ by substantive topic?

Gawaher Forum Content from Farouk1986, the Christmas Day Bomber

Drew Conway over at Zero Intelligence Agents brought to our attention the Farouk1986 data set provided by Evan Kohlman from the NEFA Foundation. For those not familiar, Umar Farouk Abdulmutallab, who is charged with willful attempt to destroy an aircraft in connection with the Christmas Day Flight 253 from Amsterdam to Detroit, made a  number of posts to the Islamic Forum Gawaher.com using the handle “Farouk1986.”

This data is useful and some good initial analysis has been offered at ZIA as well as other sites across the blogosphere. Moving beyond this initial analysis, we thought it would be helpful to analyze this set within the broader context of Umar Farouk Abdulmutallab’s posts. Thus, we downloaded the entire thread for each post in the NEFA data set — including content from many other authors.

Having this content allows us to understand the broader context of Abdulmutallab communications … including to what Abdulmutallab was responding and how others in the relevant community responded to his contributions.  We have parsed the data into threads and posts, and for each post, we have indicated the author, date, and content. For those interested in executing their own analysis, you can find an XML document with all this data here:  http://www-personal.umich.edu/~mjbommar/farouk.xml.

Feel free to use this data with proper attribution and keep your eyes posted for further analysis of the Abdulmutallab communications on this blog in the coming days.

And to all of our readers Happy New Year!

Happy Holidays!

Happy Holidays!

While enjoying the holidays, we are taking a short break from blogging.  In the meantime, we will keep our computers working hard … we hope to have some new and (hopefully) interesting content in 2010!