Archive

Posts Tagged ‘network analysis’

Gregory Todd Jones — Evolution of Complexity and “Rethinking Individuality” at TedX Atlanta

March 9th, 2010

As a member of the Society for Evolutionary Analysis in Law (SEAL), I have had the oppurtunity to see a number of interesting presentations by Gregory Todd Jones. Gregory is a Faculty Research Fellow and Adjunct Professor of Law at the Georgia State University College of Law as well as Senior Director of Research and Principal Scientist at the Network for Collaborative Problem Solving. Of particular interest to readers of this blog, he is also the founding director of the Computational Laboratory for Complex Adaptive Systems at Georgia State Law School.

Above is a recent talk by Gregory at the TedX Atlanta in which he (1) assembles a model of sustainability based on collaboration and (2) discusses species behavior … from slugs to chimpanzees.  If you are interested in learning more … Gregory has launched a really cool blog … Cooperation Science Blog … Check it out!

dmartink Uncategorized , ,

On the Stability of Community Detection Algorithms on Longitudinal Citation Data: Paper & New Code

March 4th, 2010

Last summer, Dan, Jon and I wrote our conference paper for ASNA 2009 entitled On the Stability of Community Detection Algorithms on Longitudinal Citation Data.  The purpose of this paper was to experimentally explore a number of properties of community detection algorithms, especially as applied to citation networks (c.f. Supreme Court, Tax Court, United States Code).

The model is a fairly simple discrete-time directed growing graph.  At time 0, we create a completely disconnected graph with some initial number of vertices.  The number of new vertices after time 0 is then modeled by a homogeneous Poisson process.  For each of these new vertices, we also model the number of edges per vertex as IID Poisson distributions.  The probability distribution over these edges is specified by a modified preferential attachment mechanism.

Most of our questions consider what we’ve called the average pairwise stability.  This can be thought of as answering the following question: “what is the probability that Alice and Bob are friends tomorrow if they were friends today?” Here, friendship corresponds to sharing the same community.  By asking this question for all pairs of vertices (dyads) for all time steps in which both vertices existed, we get a probability that a community dyad is preserved from step to step.  It is important to note that we’re not claiming all algorithms applied to all systems should have high average pairwise stability. In fact, for systems that involve dynamics like random rewiring, the only way to get high pairwise stability is to put all vertices in the same community or their own community at all steps – obviously trivial and unhelpful solutions in practice.

Given this model and this conception of stability, we want to perform the following experiments:

  1. How do the edge-betweenness, fast greedy, and leading eigenvector community detection algorithms compare in terms of their average pairwise stability…
    1. for varying levels of preferential attachment?
    2. for varying vertex and edge rates?
  2. Is there a significant tradeoff between the number of communities and the average pairwise stability of these community detection algorithms?

The answers are yes, yes, and yes!  You should read the paper for more details.

I’ve also produced some code to help you assess the average pairwise stability for your dataset.  The code requires igraph and is only in Python at this point due to an issue with R’s vertex label handling (which I can hopefully work around).  You can get the average pairwise stability methods on github and check out the example below:

mjbommar Uncategorized , ,

“The Enemy of My Enemy” – Steve Strogatz in the NY Times

February 17th, 2010

The Development of Structure in the Citation Network of the United States Supreme Court — Now in HD!

February 11th, 2010

What are some of the key takeaway points?

(1) The Supreme Court’s increasing reliance upon its own decisions over the 1800-1830 window.

(2) The important role of maritime/admiralty law in the early years of the Supreme Court’s citation network.  At least with respect to the Supreme Court’s citation network, these maritime decisions are the root of the Supreme Court’s jurisprudence.

(3) The increasing centrality of decisions such as Marbury v. MadisonMartin v. Hunter’s Lessee to the overall network.

The Development of Structure in the SCOTUS Citation Network

The visualization offered above is the largest weakly connected component of the citation network of the United States Supreme Court (1800-1829). Each time slice visualizes the aggregate network as of the year in question.

In our paper entitled Distance Measures for Dynamic Citation Networks, we offer some thoughts on the early SCOTUS citation network.  In reviewing the visual above note ….“[T]he Court’s early citation practices indicate a general absence of references to its own prior decisions. While the court did invoke well-established legal concepts, those concepts were often originally developed in alternative domains or jurisdictions. At some level, the lack of self-reference and corresponding reliance upon external sources is not terribly surprising. Namely, there often did not exist a set of established Supreme Court precedents for the class of disputes which reached the high court. Thus, it was necessary for the jurisprudence of the United States Supreme Court, seen through the prism of its case-to-case citation network, to transition through a loading phase. During this loading phase, the largest weakly connected component of the graph generally lacked any meaningful clustering. However, this sparsely connected graph would soon give way, and by the early 1820’s, the largest weakly connected component displayed detectable structure.”

What are the elements of the network?

What are the labels?

To help orient the end-user, the visualization highlights several important decisions of the United States Supreme Court offered within the relevant time period:

Marbury v. Madison, 5 U.S. 137 (1803) we labeled as ”Marbury”
Murray v. The Charming Betsey, 6 U.S. 64 (1804) we labeled as “Charming Betsey”
Martin v. Hunter’s Lessee, 14 U.S. 304 (1816) we labeled as “Martin’s Lessee”
The Anna Maria, 15 U.S. 327 (1817) we labeled as “Anna Maria”
McCulloch v. Maryland, 17 U.S. 316 (1819) we labeled as “McCulloch”

Why do cases not always enter the visualization when they are decided?

As we are interested in the core set of cases, we are only visualizing the largest weakly connected component of the United States Supreme Court citation network. Cases are not added until they are linked to the LWCC.  For example, Marbury v. Madison is not added to the visualization until a few years after it is decided.

How do I best view the visualization?

Given this is a high-definition video, it may take few seconds to load.  We believe that it is worth the wait.  In our view, the video is best consumed (1) Full Screen (2) HD On (3) Scaling Off.

Where can I find related papers?

Here is a non-exhaustive list of related scholarship:

Michael Bommarito, Daniel Katz, Jon Zelner & James Fowler, Distance Measures for Dynamic Citation Networks (Under Review).

Yonatan Lupu & James Fowler, The Strategic Content Model of Supreme Court Opinion Writing, APSA 2009 Toronto Meeting Paper.

Michael Bommarito, Daniel Katz & Jon Zelner, Law as a Seamless Web? Comparison of Various Network Representations of the United States Supreme Court Corpus (1791-2005) in Proceedings of the 12th Intl. Conference on Artificial Intelligence and Law (2009).

Frank Cross, Thomas Smith & Antonio Tomarchio, The Reagan Revolution in the Network of Law, 57 Emory L. J. 1227 (2008).

James Fowler & Sangick Jeon, The Authority of Supreme Court Precedent, 30 Soc. Networks 16 (2008).

Elizabeth Leicht, Gavin Clarkson, Kerby Shedden & Mark Newman, Large-Scale Structure of Time Evolving Citation Networks, 59 European Physics Journal B 75 (2007).

Thomas Smith, The Web of the Law, 44 San Diego L.R. 309 (2007).

James Fowler, Timothy R. Johnson, James F. Spriggs II, Sangick Jeon & Paul J. Wahlbeck, Network Analysis and the Law: Measuring the Legal Importance of Precedents at the U.S. Supreme Court, 15 Political Analysis, 324 (2007).

_

dmartink Uncategorized , , ,

The Development of Structure in the Citation Network of the United States Supreme Court

February 10th, 2010

What are some of the key takeaway points?

(1) The Supreme Court’s increasing reliance upon its own decisions over the 1800-1830 window.

(2) The important role of maritime/admiralty law in the early years of the Supreme Court’s citation network.  At least with respect to the Supreme Court’s citation network, these maritime decisions are the root of the Supreme Court’s jurisprudence.

(3) The increasing centrality of decisions such as Marbury v. MadisonMartin v. Hunter’s Lessee to the overall network.

The Development of Structure in the SCOTUS Citation Network

The visualization offered above is the largest weakly connected component of the citation network of the United States Supreme Court (1800-1829). Each time slice visualizes the aggregate network as of the year in question.

In our paper entitled Distance Measures for Dynamic Citation Networks, we offer some thoughts on the early SCOTUS citation network.  In reviewing the visual above note ….“[T]he Court’s early citation practices indicate a general absence of references to its own prior decisions. While the court did invoke well-established legal concepts, those concepts were often originally developed in alternative domains or jurisdictions. At some level, the lack of self-reference and corresponding reliance upon external sources is not terribly surprising. Namely, there often did not exist a set of established Supreme Court precedents for the class of disputes which reached the high court. Thus, it was necessary for the jurisprudence of the United States Supreme Court, seen through the prism of its case-to-case citation network, to transition through a loading phase. During this loading phase, the largest weakly connected component of the graph generally lacked any meaningful clustering. However, this sparsely connected graph would soon give way, and by the early 1820’s, the largest weakly connected component displayed detectable structure.”

What are the elements of the network?

What are the labels?

To help orient the end-user, the visualization highlights several important decisions of the United States Supreme Court offered within the relevant time period:

Marbury v. Madison, 5 U.S. 137 (1803) we labeled as ”Marbury”
Murray v. The Charming Betsey, 6 U.S. 64 (1804) we labeled as “Charming Betsey”
Martin v. Hunter’s Lessee, 14 U.S. 304 (1816) we labeled as “Martin’s Lessee”
The Anna Maria, 15 U.S. 327 (1817) we labeled as “Anna Maria”
McCulloch v. Maryland, 17 U.S. 316 (1819) we labeled as “McCulloch”

Why do cases not always enter the visualization when they are decided?

As we are interested in the core set of cases, we are only visualizing the largest weakly connected component of the United States Supreme Court citation network. Cases are not added until they are linked to the LWCC.  For example, Marbury v. Madison is not added to the visualization until a few years after it is decided.

How do I best view the visualization?

Those interested in viewing the full screen video—click on the full screen icon contained in the Vimeo bottom banner.  Check out the NEW Hi-Def (HD) version of the video!


dmartink Uncategorized , , ,

GraphMovie: A Library for Generating Movies from Dynamic Graphs with igraph

January 17th, 2010

Over the past few months, we’ve developed a library for simply generating dynamic network animations. We’ve used this library in visualizations like (1) Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber and (2) Dynamic Animation of the East Anglia Climate Research Unit Email Network.  Prior to these visualizations, we’ve used Sonia to produce animations like this one. While certainly a useful program for those without programming expertise, Sonia suffers from a number of issues that make it unusable for large graphs or graphs with many “slices.”  Furthermore, in our experience rendering various movies a number of platform issues with the Quicktime and Flash rendering engines have arisen.  Fixing these problems is possible, but Sonia’s large Java codebase makes for a steep learning curve.  As a result, we’ve decided to release this GraphMovie class so that others can use or possibly improve this library.

In order to use the GraphMovie, you’ll need the following:

  • python (tested with 2.6)
  • igraph for network manipulation and visualization
  • Python Imaging Library for manipulating the image frames
  • mencoder from the MPlayer package for encoding the image frames into a movie

Here are the files, hosted on github:

GraphMovie: Example 1 from Computational Legal Studies on Vimeo.

GraphMovie: Example 2 from Computational Legal Studies on Vimeo.

mjbommar Uncategorized , ,

Colbert Nation Begins 2010 w/ Riley Crane (Mon) & James Fowler (Thurs)

January 8th, 2010

Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber

January 6th, 2010

Based on the Farouk1986 Gawaher data posted earlier this week, we have analyzed the communication network of the alleged Christmas Day Bomber, Umar Farouk Abdulmutallab.

Using the handle “Farouk1986,” Abdulmutallab was a regular participant on the Islamic forum Gawaher.com. Several years prior to the Christmas Day incident, the alleged Christmas Day Bomber took part in a significant number of communications.  Of course, these communications can be analyzed in a number of ways.  For example, over at Zero Intelligence Agents, Drew Conway has already done some useful initial analysis. We sought to contribute our analysis of the time-evolving communication network contained within these posts.  While more extensive documentation is available below, in reviewing the dynamic network visualization, consider the following observations:

Click on the Full Screen Button! (4 Arrow Symbol in the Vimeo Bottom Banner)

#1) “Farouk1986″ Entered an Existing Network Which Appeared to Increase the Salience of Religion in His Life

Although individuals in society may feel isolated or appear to be loners, the internet offers like minded, potentially meaningful networks of people with whom to connect. These internet is full of communities of individuals who interests are wide ranging—including topics such as Blizzard’s World of Warcraft, sports, culinary interests or religion.  With whatever prior beliefs he held, “Farouk1986” entered this subset of the broader Islamic online community in late 2004.  While it is not possible for us to make definitive conclusions, it appears that the community with whom he connected increased the salience of religion in his life.  In other words, through the internet “Farouk1986” experienced a reinforcing feedback and this likely primed him for further radicalization.

#2) The Network of “Farouk1986″ Grows Increasingly Stable Once Established

“Farouk1986″ increasingly communicated with the same set of individuals over the window in question.  Thus, while communication continued to flow through the network … the network, once established, remains fairly stable.  In other words, instead of being exposed to diverse sets of individuals, “Farouk1986″ continued to communicate with the same individuals. In turn, those direct contacts also continued to communicate with the same individuals.

#3) Additional Streams of Data Would Enhance Analysis

The forum posts which serve as the data for this analysis are only a subset of the communication network experienced by Umar Farouk Abdulmutallab. Additional streams of relevant data would include phone records, emails, participation in other forums, etc. would likely enhance the granularity of our analysis. If you have access to such data and are legally authorized to share it-please feel free to contact us.

Background

For those not already familar with the case, Umar Farouk Abdulmutallab is charged with willful attempt to destroy an aircraft in connection with the December 25, 2009 Delta Flight 253 from Amsterdam to Detroit.  Like many, we wondered precisely what path led the son of a wealthy banker to find himself as a would be suicide bomber.  While these communications represent only a small portion of the broader picture, a number of illuminating analyses can still be conducted using this available information.

Using the handle “Farouk1986,” Umar Farouk Abdulmutallab was a regular participant on the popular Islamic forum Gawaher.com. Thus, as a small contribution to the broader analysis of the Christmas Day Incident, we have generated a basic visualization and analysis of the time-evolving structure of the Farouk1986 online communication network.

Filtration of the Gawaher.com Forum

As a major Islamic forum, Gawaher.com features a tremendous number of participants and a wide range of post on topics including Islamic culture, religion, international football, politics, etc.

Given our specific interest in the online behavior of Umar Farouk Abdulmutallab, we were most interested in analyzing the direct and indirect communication network associated with the handle “Farouk1986” (aka Umar Farouk Abdulmutallab).  Therefore, it was necessary to filter the broader universe of communication on Gawaher.com to the relevant subset.

A portion of this information is contained in publically available NEFA dataset.  While useful, we determined that this dataset alone did not include the information necessary for us to construct the Farouk1986 secondary/indirect communications network. In order to obtain a better understanding of this communication network, we retrieved every “topic” in which Farouk1986 participated at least once.  Each “topic” is comprised of one or more “posts” from one or more users.  Each “post” may be in response to another user’s “post.”  The NEFA data contains only posts made by Farouk1986 – our data contains the entire context within which his posts existed.

Building the Time-Evolving Network of Direct and Indirect Communications

Building from this underlying data, we sought to both visualize and analyze the time evolving structure of the “Farouk1986” communication network. For those not familiar with network visualization and analytic techniques—networks consist of both nodes and edges.

In the animation offered above, each “node” is an author.  The labels of all best the most central authors have been removed for visibility purposes. Each “edge” is a weighted connection between two authors, where the weight is the strength of connection between each individual. Thus, within the communication network, thicker edges represent more communications while thinner edges reflect fewer communications.

In the visualization, you will notice most nodes are colored black.  For purposes of ocular differentiation, the Farouk1986 node is colored red.  In addition, we color direct communications with Farouk1986 in red and communications not directly involving Farouk1986 in  black.

Given each forum post is datestamped, we can order the network such that the animation reflects the changing composition of the Farouk1986 online communication network. The datestamp is reflected in the upper left corner of animation.  Our analysis is limited to the 2004-2005 time period when Farouk1986 was a regular participant.

The network is visualized in each time step using the Kamada-Kawai Visualization Algorithm. Kamada-Kawai is spring embedded force directed placement algorithm commonly used to visualize networks similar to the one considered herein. In order to smooth the visual while not undercutting the qualitatively results, we apply linear interpolation between frames.

10 Most Central Participants in the Farouk1986 Network

The following are the ten most central participants in the network, as measured by weighted eigenvector centrality:

Author Centrality
Crystal Eyes 1
property_of_allah 0.84
Farouk1986 0.81
amani 0.69
Mansoor Ansari 0.61
sis Qassab 0.55
muslim mujahid 0.49
Arwa 0.43
sister in islam 0.31
Anj 0.29

Directions for Additional Analysis

(a) Computational Linguistic Analysis of the Underlying Posts

Over what substantive dimensions did these networks of direct and indirect communication form?

(b) Recursive Growth of the Network

Friends-Friends-Friends and so on….

(c) Complete Analysis of the Gawaher.com Forum

Were the patterns of communications by Farouk1986 noticeably different from other forum participants?

(d) Linkage of Content and Structure

What is the nature of information diffusion across the Gawaher.com?

How did this differ by substantive topic?

dmartink Uncategorized , , ,

Dynamic Animation of the East Anglia Climate Research Unit Email Network

December 2nd, 2009

 

FULL SCREEN FOR BETTER VIEWING!

Click on this icon to view the Movie in Full Screen Mode!Picture 4

STATIC SNAPSHOT TO DYNAMIC ANIMATION

In our prior post analyzing the email database of Climate Research Unit at the University of East Anglia, we aggregated all emails over the relevant 1997-2009 time period into a single static visualization. Specifically, to build the network, we processed every email in the leaked data. Each email contains a sender and at least one recipient on the To:, Cc:, or Bcc: line.

One obvious shortcoming associated with producing a static snapshot for data set, is that it often obscures the time evolving dynamics of interaction which produced the full graph.  To generate a dynamic picture, it is necessary to collect time stamped network data. In the current case, this required acquisition of the date field for each of the emails. With this information, we used the same underlying data to generate a dynamic network animation for the 1997-2009 time window.

HOW TO INTERPET THE MOVIE

Consistent with the approach offered in our prior visualization, each node represents an individual within the email dataset while each connection reflects the weighted relationship between those individuals. The movie posted above features the date in the upper left.  As time ticks forward, you will notice that the relative social relationships between individuals are updated with each new batch of emails.  In some periods, this updating has significant impact upon the broader network topology and at other time it imposes little structural consequences.

In each period, both new connections as well as new communications across existing connections are colored teal while the existing and dormant relationships remain white.  Among other things, this is useful because it identifies when a connection is established and which interactions are active at any given time period.

A SHORT VERSION AND A LONG VERSION

We have two separate versions of the movie.  The version above is a shorter version where roughly 13 years is displayed in under 2 minutes.  In the coming days, we will have a longer version of the movie which ticks a one email at a time. In both versions, each frame is rendered using the Kamada-Kawai layout algorithm. Then, the frames are threaded together using linear interpolation.

SELECTION EFFECTS

Issues of selection of confront many researchers. Namely, given the released emails are only a subset of the broader universe of emails authored over the relevant time window, it is important to remember that the data has been filtered and the impact of this filtration can not be precisely determined. Notwithstanding this issue, our assumption is that every email from a sender to a recipient represents a some level of relationship between them.  Furthermore, we assume that more emails sent between two people generally indicates a stronger relationship between those individuals.

DIMENSIONALITY

In our academic scholarship, we have confronted questions of dimensionality in network data. Simply put, analyzing network data drawn from high dimensional space can be really thorny. In the current context, a given email box likely contains emails on lots of subjects and reflects lots of people not relevant to the specific issue in question. Again, while we do not specifically know the manner in which the filter was applied, it is certainly possible that the filter actually served to mitigate issues of dimensionality.

ACCESS THE DATA

For those interested in searching the emails, the NY Times directs the end user to http://www.eastangliaemails.com/

dmartink Uncategorized , , ,

Bommarito, Katz, Zelner & Fowler – Distance Measures for Dynamic Citation Networks (Version 3.0 w/ Theoretical Model & SCOTUS Citation Network Application)

November 30th, 2009
Computational Legal Studies™