Archive

Archive for the ‘Uncategorized’ Category

On the Stability of Community Detection Algorithms on Longitudinal Citation Data: Paper & New Code

March 4th, 2010

Last summer, Dan, Jon and I wrote our conference paper for ASNA 2009 entitled On the Stability of Community Detection Algorithms on Longitudinal Citation Data.  The purpose of this paper was to experimentally explore a number of properties of community detection algorithms, especially as applied to citation networks (c.f. Supreme Court, Tax Court, United States Code).

The model is a fairly simple discrete-time directed growing graph.  At time 0, we create a completely disconnected graph with some initial number of vertices.  The number of new vertices after time 0 is then modeled by a homogeneous Poisson process.  For each of these new vertices, we also model the number of edges per vertex as IID Poisson distributions.  The probability distribution over these edges is specified by a modified preferential attachment mechanism.

Most of our questions consider what we’ve called the average pairwise stability.  This can be thought of as answering the following question: “what is the probability that Alice and Bob are friends tomorrow if they were friends today?” Here, friendship corresponds to sharing the same community.  By asking this question for all pairs of vertices (dyads) for all time steps in which both vertices existed, we get a probability that a community dyad is preserved from step to step.  It is important to note that we’re not claiming all algorithms applied to all systems should have high average pairwise stability. In fact, for systems that involve dynamics like random rewiring, the only way to get high pairwise stability is to put all vertices in the same community or their own community at all steps – obviously trivial and unhelpful solutions in practice.

Given this model and this conception of stability, we want to perform the following experiments:

  1. How do the edge-betweenness, fast greedy, and leading eigenvector community detection algorithms compare in terms of their average pairwise stability…
    1. for varying levels of preferential attachment?
    2. for varying vertex and edge rates?
  2. Is there a significant tradeoff between the number of communities and the average pairwise stability of these community detection algorithms?

The answers are yes, yes, and yes!  You should read the paper for more details.

I’ve also produced some code to help you assess the average pairwise stability for your dataset.  The code requires igraph and is only in Python at this point due to an issue with R’s vertex label handling (which I can hopefully work around).  You can get the average pairwise stability methods on github and check out the example below:

mjbommar Uncategorized , ,

Gary Flake: Is Pivot a Turning Point for Web Exploration? [TED 2010]

March 4th, 2010


Another in the series of talks at TED 2010 …. From the description … “Gary Flake demos Pivot, a new way to browse and arrange massive amounts of images and data online. Built on breakthrough Seadragon technology, it enables spectacular zooms in and out of web databases, and the discovery of patterns and links invisible in standard web browsing.”

dmartink Uncategorized ,

Putting The Credit Score Puzzle Together

March 3rd, 2010

The Data Deluge [Via The Economist]

March 1st, 2010

The cover story of this week’s Economist is entitled The Data Deluge. This is, of course, a favorite topic of the hosts of this blog. While a number of folks have already highlighted this trend, we are happy to see a mainstream outlet such the Economist reporting on the era of big data. Indeed, the convergence of rapidly increasing computing power, and decreasing data storage costs, on one side, and large scale data collection and digitization on the other … has already impacted practices in the business, government and scientific communities. There is ample reason to believe that more is on the way.

In our estimation, for the particular class of questions for which data is available, two major implications of the deluge are worth reiterating: (1) no need to make assumptions about the asymptotic performance of a particular sampling frames when population level data is readily available; and (2) what statistical sampling was to the 20th century, data filtering may very well be to the 21st ….

dmartink Uncategorized , ,

Dissipation and Displacement of Hotspots in Reaction-Diffusion Models of Crime [PNAS]

February 25th, 2010

From the abstract… “The mechanisms driving the nucleation, spread, and dissipation of crime hotspots are poorly understood. As a consequence, the ability of law enforcement agencies to use mapped crime patterns to design crime prevention strategies is severely hampered. We also lack robust expectations about how different policing interventions should impact crime. Here we present a mathematical framework based on reaction-diffusion partial differential equations for studying the dynamics of crime hotspots. The system of equations is based on empirical evidence for how offenders move and mix with potential victims or targets. Analysis shows that crime hotspots form when the enhanced risk of repeat crimes diffuses locally, but not so far as to bind distant crime together. Crime hotspots may form as either supercritical or subcritical bifurcations, the latter the result of large spikes in crime that override linearly stable, uniform crime distributions. Our mathematical methods show that subcritical crime hotspots may be permanently eradicated with police suppression, whereas supercritical hotspots are displaced following a characteristic spatial pattern. Our results thus provide a mechanistic explanation for recent failures to observe crime displacement in experimental field tests of hotspot policing.”

dmartink Uncategorized ,

Fraunhofer HHI’s New 7-Megapixel Immersive Cinema

February 23rd, 2010

We just thought this was an interesting peek into the not so distant future …. Enjoy!

dmartink Uncategorized ,

What’s Next? Judging Sequences of Binary Events [HT Paul Kedrosky]

February 21st, 2010

From the abstract … “The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler’s fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments, the authors conclude that in many everyday settings people have naive complex models of the mechanisms they believe generate observed events, and they rely on these models for explanations, predictions, and other inferences about event sequences. The authors next introduce an explanation-based, mental models framework for describing people’s beliefs about binary sequences, based on 4 perceived characteristics of the sequence generator: randomness, intentionality, control, and goal complexity. Furthermore, they propose a Markov process framework as a useful theoretical notation for the description of mental models and for the analysis of actual event sequences.”

dmartink Uncategorized

Blaise Aguera y Arcas Demos Augmented-Reality Maps [Ted 2010]

February 18th, 2010

“Blaise Aguera y Arcas is an architect at Microsoft Live Labs, architect of Seadragon, and the co-creator of Photosynth, a monumental piece of software capable of assembling static photos into an interactive three-dimensional space. This seamless patchwork of images can be viewed via multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look ….”

dmartink Uncategorized ,

“The Enemy of My Enemy” – Steve Strogatz in the NY Times

February 17th, 2010

Relational Topic Models for Document Networks — Chang & Blei

February 16th, 2010

This is a very important paper by Jonathan Chang and David Blei. Suffice to say, it has potential use in a wide class of social science applications.  Click here to access related material on Professor Blei’s Princeton homepage.  Click here for some slides (note 7.0 mb). Check it out!

dmartink Uncategorized , ,

Computational Legal Studies™