Visualizing the Gawaher Interactions of Umar Farouk Abdulmutallab, the Christmas Day Bomber

Based on the Farouk1986 Gawaher data posted earlier this week, we have analyzed the communication network of the alleged Christmas Day Bomber, Umar Farouk Abdulmutallab.

Using the handle “Farouk1986,” Abdulmutallab was a regular participant on the Islamic forum Gawaher.com. Several years prior to the Christmas Day incident, the alleged Christmas Day Bomber took part in a significant number of communications.  Of course, these communications can be analyzed in a number of ways.  For example, over at Zero Intelligence Agents, Drew Conway has already done some useful initial analysis. We sought to contribute our analysis of the time-evolving communication network contained within these posts.  While more extensive documentation is available below, in reviewing the dynamic network visualization, consider the following observations:


#1) “Farouk1986” Entered an Existing Network Which Appeared to Increase the Salience of Religion in His Life

Although individuals in society may feel isolated or appear to be loners, the internet offers like-minded, potentially meaningful networks of people with whom to connect. The internet is full of communities of individuals whose interests are wide-ranging, including topics such as Blizzard’s World of Warcraft, sports, culinary interests or religion. With whatever prior beliefs he held, “Farouk1986” entered this subset of the broader Islamic online community in late 2004. While it is not possible for us to draw definitive conclusions, it appears that the community with which he connected increased the salience of religion in his life. In other words, through the internet “Farouk1986” experienced a reinforcing feedback loop, and this likely primed him for further radicalization.

#2) The Network of “Farouk1986” Grows Increasingly Stable Once Established

“Farouk1986” increasingly communicated with the same set of individuals over the window in question. Thus, while communication continued to flow through the network, the network itself, once established, remained fairly stable. In other words, instead of being exposed to diverse sets of individuals, “Farouk1986” continued to communicate with the same individuals, and those direct contacts in turn continued to communicate with the same individuals.

#3) Additional Streams of Data Would Enhance Analysis

The forum posts that serve as the data for this analysis are only a subset of the communication network experienced by Umar Farouk Abdulmutallab. Additional streams of relevant data, such as phone records, emails and participation in other forums, would likely enhance the granularity of our analysis. If you have access to such data and are legally authorized to share it, please feel free to contact us.

Background

For those not already familiar with the case, Umar Farouk Abdulmutallab is charged with willful attempt to destroy an aircraft in connection with the December 25, 2009 Delta Flight 253 from Amsterdam to Detroit. Like many, we wondered precisely what path led the son of a wealthy banker to become a would-be suicide bomber. While these communications represent only a small portion of the broader picture, a number of illuminating analyses can still be conducted using this available information.

Using the handle “Farouk1986,” Umar Farouk Abdulmutallab was a regular participant on the popular Islamic forum Gawaher.com. Thus, as a small contribution to the broader analysis of the Christmas Day Incident, we have generated a basic visualization and analysis of the time-evolving structure of the Farouk1986 online communication network.

Filtration of the Gawaher.com Forum

As a major Islamic forum, Gawaher.com features a tremendous number of participants and a wide range of posts on topics including Islamic culture, religion, international football, politics, etc.

Given our specific interest in the online behavior of Umar Farouk Abdulmutallab, we were most interested in analyzing the direct and indirect communication network associated with the handle “Farouk1986” (aka Umar Farouk Abdulmutallab).  Therefore, it was necessary to filter the broader universe of communication on Gawaher.com to the relevant subset.

A portion of this information is contained in the publicly available NEFA dataset. While useful, we determined that this dataset alone did not include the information necessary for us to construct the Farouk1986 secondary/indirect communications network. In order to obtain a better understanding of this communication network, we retrieved every “topic” in which Farouk1986 participated at least once. Each “topic” is comprised of one or more “posts” from one or more users. Each “post” may be in response to another user’s “post.” The NEFA data contains only posts made by Farouk1986; our data contains the entire context within which his posts existed.
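The topic/post structure described above can be sketched as a small data model that turns each topic into (author, replied-to author) pairs. The class and field names, and the rule that a post without an explicit reply target is treated as answering the topic's opening post, are our illustrative assumptions rather than a description of Gawaher's markup or of our actual pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Post:
    """One forum post (hypothetical fields for illustration)."""
    post_id: int
    author: str
    reply_to: Optional[int] = None  # post_id this post responds to, if known


@dataclass
class Topic:
    """A topic is an ordered list of one or more posts."""
    title: str
    posts: List[Post] = field(default_factory=list)


def reply_pairs(topic: Topic) -> List[Tuple[str, str]]:
    """Return (author, replied-to author) pairs for one topic.

    Assumption: a post with no known reply target responds to the
    topic's opening post. Replies to one's own post are skipped.
    """
    if not topic.posts:
        return []
    by_id = {p.post_id: p for p in topic.posts}
    first = topic.posts[0]
    pairs = []
    for post in topic.posts[1:]:
        target = by_id.get(post.reply_to, first)
        if target.author != post.author:
            pairs.append((post.author, target.author))
    return pairs
```

Because every post in a retrieved topic is kept, pairs between other authors (the indirect communications) come out of the same pass as pairs involving Farouk1986.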

Building the Time-Evolving Network of Direct and Indirect Communications

Building from this underlying data, we sought to both visualize and analyze the time-evolving structure of the “Farouk1986” communication network. For those not familiar with network visualization and analytic techniques: networks consist of both nodes and edges.

In the animation above, each “node” is an author. The labels of all but the most central authors have been removed for visibility. Each “edge” is a weighted connection between two authors, where the weight reflects the strength of the connection between the two individuals. Thus, within the communication network, thicker edges represent more communications, while thinner edges reflect fewer communications.
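A minimal sketch of how reply pairs could be collapsed into weighted, colored edges. We assume here that an edge's weight is simply the number of interactions between two authors in either direction; the post does not spell out its exact weighting. The red/black rule mirrors the coloring described in the next paragraph.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

FOCAL = "Farouk1986"


def weighted_edges(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Collapse directed (sender, receiver) pairs into undirected edge weights.

    Assumption: weight = number of interactions between the two authors,
    regardless of direction.
    """
    weights: Counter = Counter()
    for a, b in pairs:
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights


def edge_color(edge: FrozenSet[str], focal: str = FOCAL) -> str:
    """Red for edges touching the focal author, black otherwise."""
    return "red" if focal in edge else "black"
```

Using a `frozenset` as the key makes (a, b) and (b, a) the same edge, which matches an undirected drawing where line thickness tracks the weight.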

In the visualization, you will notice most nodes are colored black. To make it easy to pick out, the Farouk1986 node is colored red. In addition, communications directly involving Farouk1986 are colored red, and communications not directly involving Farouk1986 are colored black.

Because each forum post is datestamped, we can order the network so that the animation reflects the changing composition of the Farouk1986 online communication network. The datestamp is shown in the upper left corner of the animation. Our analysis is limited to the 2004-2005 period, when Farouk1986 was a regular participant.
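One way to turn datestamped posts into animation frames is to accumulate edge weights up to each frame's date. This sketch assumes a cumulative network; the post does not say whether frames are cumulative or use a sliding window, and a sliding window would additionally drop interactions older than some cutoff. The `Interaction` tuple layout is hypothetical.

```python
from datetime import date
from typing import Dict, FrozenSet, List, Tuple

Interaction = Tuple[date, str, str]  # (datestamp, author, replied-to author)


def cumulative_snapshots(
    interactions: List[Interaction], step_dates: List[date]
) -> List[Dict[FrozenSet[str], int]]:
    """Edge weights accumulated up to and including each step date."""
    ordered = sorted(interactions)  # chronological order by datestamp
    snapshots = []
    weights: Dict[FrozenSet[str], int] = {}
    i = 0
    for cutoff in sorted(step_dates):
        # Fold in every interaction dated on or before this frame's cutoff.
        while i < len(ordered) and ordered[i][0] <= cutoff:
            _, a, b = ordered[i]
            if a != b:
                key = frozenset((a, b))
                weights[key] = weights.get(key, 0) + 1
            i += 1
        snapshots.append(dict(weights))  # copy so later frames do not mutate it
    return snapshots
```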

The network is visualized at each time step using the Kamada-Kawai visualization algorithm. Kamada-Kawai is a spring-embedded, force-directed placement algorithm commonly used to visualize networks similar to the one considered here. To smooth the animation without undercutting the qualitative results, we apply linear interpolation between frames.
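Each frame's Kamada-Kawai positions could come from a library routine such as networkx's `kamada_kawai_layout`; the sketch below covers only the smoothing step. It assumes each frame's layout is a dict mapping a node to (x, y), and that a node present in only one of two consecutive layouts is held at its known position, which is our simplification.

```python
from typing import Dict, List, Tuple

Layout = Dict[str, Tuple[float, float]]  # node -> (x, y)


def interpolate_layouts(start: Layout, end: Layout, n_between: int) -> List[Layout]:
    """Insert n_between linearly interpolated frames between two layouts.

    Nodes present in only one layout are held at their known position
    (an assumption; the original animation may treat them differently).
    """
    frames = []
    nodes = set(start) | set(end)
    for k in range(1, n_between + 1):
        t = k / (n_between + 1)  # fraction of the way from start to end
        frame = {}
        for node in nodes:
            x0, y0 = start.get(node, end.get(node))
            x1, y1 = end.get(node, start.get(node))
            frame[node] = ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
        frames.append(frame)
    return frames
```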

10 Most Central Participants in the Farouk1986 Network

The following are the ten most central participants in the network, as measured by weighted eigenvector centrality:

Author               Centrality
Crystal Eyes         1.00
property_of_allah    0.84
Farouk1986           0.81
amani                0.69
Mansoor Ansari       0.61
sis Qassab           0.55
muslim mujahid       0.49
Arwa                 0.43
sister in islam      0.31
Anj                  0.29
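Weighted eigenvector centrality scores an author highly when they are strongly connected to other highly scored authors. The values above appear to be scaled so the top author scores 1, and this sketch assumes that scaling. It uses plain power iteration on the undirected weight map; networkx's `eigenvector_centrality` with `weight="weight"` computes the same quantity but normalizes to unit Euclidean length instead.

```python
from typing import Dict, FrozenSet


def eigenvector_centrality(
    weights: Dict[FrozenSet[str], float],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Weighted eigenvector centrality by power iteration, scaled so max = 1."""
    nodes = sorted({n for edge in weights for n in edge})
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Start from the current scores (i.e. iterate with A + I); this has
        # the same leading eigenvector as A but avoids oscillation.
        new = dict(score)
        for edge, w in weights.items():
            if len(edge) != 2:
                continue  # ignore self-loops
            a, b = tuple(edge)
            new[a] += w * score[b]
            new[b] += w * score[a]
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score
```

On a star with one hub and three leaves, the hub scores 1 and each leaf about 0.577 (one over the square root of 3).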

Directions for Additional Analysis

(a) Computational Linguistic Analysis of the Underlying Posts

Over what substantive dimensions did these networks of direct and indirect communication form?

(b) Recursive Growth of the Network

Friends of friends of friends, and so on.
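Recursive growth of this kind is a breadth-first "snowball" expansion from a seed author. A minimal sketch, assuming the reply network is available as an adjacency map; in practice each additional hop would require retrieving more topics from the forum.

```python
from collections import deque
from typing import Dict, Set


def k_hop_neighborhood(adjacency: Dict[str, Set[str]], seed: str, k: int) -> Dict[str, int]:
    """Breadth-first 'friends of friends' expansion up to k hops.

    Returns each reached author mapped to their hop distance from the seed.
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, set()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist
```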

(c) Complete Analysis of the Gawaher.com Forum

Were the patterns of communications by Farouk1986 noticeably different from other forum participants?

(d) Linkage of Content and Structure

What is the nature of information diffusion across the Gawaher.com forum?

How did this differ by substantive topic?

Gawaher Forum Content from Farouk1986, the Christmas Day Bomber

Drew Conway over at Zero Intelligence Agents brought to our attention the Farouk1986 data set provided by Evan Kohlmann from the NEFA Foundation. For those not familiar, Umar Farouk Abdulmutallab, who is charged with willful attempt to destroy an aircraft in connection with the Christmas Day Flight 253 from Amsterdam to Detroit, made a number of posts to the Islamic forum Gawaher.com using the handle “Farouk1986.”

This data is useful and some good initial analysis has been offered at ZIA as well as other sites across the blogosphere. Moving beyond this initial analysis, we thought it would be helpful to analyze this set within the broader context of Umar Farouk Abdulmutallab’s posts. Thus, we downloaded the entire thread for each post in the NEFA data set — including content from many other authors.

Having this content allows us to understand the broader context of Abdulmutallab’s communications, including what Abdulmutallab was responding to and how others in the relevant community responded to his contributions. We have parsed the data into threads and posts, and for each post, we have indicated the author, date, and content. For those interested in executing their own analysis, you can find an XML document with all this data here: http://www-personal.umich.edu/~mjbommar/farouk.xml.
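A starting point for reading such a file with Python's standard library. We have not reproduced the document's actual schema here: the `thread`, `post`, `author`, `date` and `content` tag names and the `title` attribute are hypothetical, so check them against the downloaded file before use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout: the real tag names in farouk.xml may differ.
SAMPLE = """<threads>
  <thread title="Example topic">
    <post><author>user_a</author><date>2005-01-05</date><content>sample text</content></post>
    <post><author>user_b</author><date>2005-01-06</date><content>sample text</content></post>
  </thread>
</threads>"""


def load_posts(xml_text: str):
    """Yield (thread title, author, date, content) for every post."""
    root = ET.fromstring(xml_text)
    for thread in root.iter("thread"):
        title = thread.get("title", "")
        for post in thread.iter("post"):
            yield (
                title,
                post.findtext("author", default=""),
                post.findtext("date", default=""),
                post.findtext("content", default=""),
            )


posts = list(load_posts(SAMPLE))
print(Counter(author for _, author, _, _ in posts))  # posts per author
```

To read the downloaded file instead of the inline sample, parse it with `ET.parse("farouk.xml").getroot()` and iterate over that root in the same way.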

Feel free to use this data with proper attribution, and keep an eye on this blog for further analysis of the Abdulmutallab communications in the coming days.

And to all of our readers, Happy New Year!

Happy Holidays!

While enjoying the holidays, we are taking a short break from blogging.  In the meantime, we will keep our computers working hard … we hope to have some new and (hopefully) interesting content in 2010!

Visualizing Bank Failures ( 2008 – 2009 )

Three Takeaways

  1. Acceleration: There were four failures in the first six months of 2008, followed by another 22 failures in the next six months. In 2009, there were 21 failures in the first three months of the year, followed by 138 from April to last Friday.
  2. Magnitude: Failures in the past two years have cost the Deposit Insurance Fund an estimated $57B. The IndyMac failure of July 2008 accounted for $10B alone, followed by BankUnited at $4.9B and Guaranty Bank at $3B.
  3. Spatial Correlation: There is a significant amount of spatial correlation in California, Georgia, Florida, Texas, and Illinois. These states account for 77% of the total costs to the Deposit Insurance Fund. Furthermore, most of the losses in California and Georgia were highly concentrated around a few urban centers.

The Movie

The movie below shows the location of bank failures, beginning in 2008 and concluding with the three failed banks from Friday, December 11, 2009. Each green circle corresponds to a bank failure, and the size of each circle is scaled logarithmically to the FDIC’s estimated cost to the Deposit Insurance Fund, as stated in the FDIC press releases. For failures with joint press releases, such as the 9 banks that failed on October 30th, the circles are sized in proportion to their relative total deposits.
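A minimal sketch of the sizing rule described above. The post states only that sizes are logarithmic in the estimated cost and that joint press releases are split by relative deposits; the specific radius formula, its `base` and `scale` parameters, and the split function are our assumptions.

```python
import math
from typing import Dict


def split_joint_cost(total_cost: float, deposits: Dict[str, float]) -> Dict[str, float]:
    """Allocate a joint cost estimate across banks in proportion to deposits."""
    total_deposits = sum(deposits.values())
    return {bank: total_cost * d / total_deposits for bank, d in deposits.items()}


def circle_radius(cost: float, base: float = 2.0, scale: float = 1.0) -> float:
    """Radius growing with log10 of cost (hypothetical base and scale).

    Costs below 1 are clamped so the logarithm stays non-negative.
    """
    return base + scale * math.log10(max(cost, 1.0))
```

A logarithmic scale keeps IndyMac-sized failures from swamping the map while still letting small failures register as visible circles.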

Our visualization is similar to this one offered by the Wall Street Journal. To size the circles, the WSJ relied upon the value of assets at the time of failure. By contrast, our approach focuses upon the estimated impact on the Deposit Insurance Fund (DIF). In several instances, this alternative approach leads to a qualitatively different result from the WSJ’s. For example, consider the case of Washington Mutual. While many have characterized Washington Mutual’s failure as the largest in history, according to the FDIC press release the failure did not actually lead to a draw upon the Deposit Insurance Fund. By contrast, the FDIC’s estimated cost for the IndyMac Bank failure was substantial: the latest available estimate puts it at $10.7 billion.

Additional Background

As reported in a number of news outlets, Friday witnessed the failure of three more banks – Solutions Bank (Overland Park, KS), Valley Capital Bank (Mesa, AZ), and Republic Federal Bank (Miami, FL).

According to information obtained from the Federal Deposit Insurance Corporation (FDIC), there have been a total of 186 bank failures in the United States since 2000. Of these, 159 failures, or roughly 85%, have occurred in the past two years. The plot below displays the yearly failures since 2000. These 159 failures over the past two years have cost the Deposit Insurance Fund an estimated $57B.

[Figure: yearly U.S. bank failures since 2000]

In addition to the increase in the rate of bank failures, there has also been a substantial amount of spatial correlation between these failures. The table below shows the five states with the highest estimated total costs to the Deposit Insurance Fund since 2008. Together, these five states account for $44B of the total $57B in the past two years.

State         Estimated Cost to the Deposit Insurance Fund
California    $19.33B
Georgia       $9.29B
Florida       $6.77B
Texas         $4.56B
Illinois      $4.12B
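The 77% figure quoted in the takeaways follows directly from this table. A quick check, using the table's values and the approximate $57B two-year total from the text:

```python
# Figures (billions of dollars) from the table above.
STATE_COSTS = {
    "California": 19.33,
    "Georgia": 9.29,
    "Florida": 6.77,
    "Texas": 4.56,
    "Illinois": 4.12,
}
TOTAL_COST = 57.0  # approximate two-year total quoted in the text

top_five = sum(STATE_COSTS.values())
share = top_five / TOTAL_COST
print(f"Top five: ${top_five:.2f}B, {share:.0%} of the total")
# prints "Top five: $44.07B, 77% of the total"
```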

Who Owns America’s Debt? A Dynamic Perspective on the Major Foreign Holders of Treasury Securities [2002-Present]

Three Things to Notice

(1) China Passes Japan: This dynamic visual shows how, in the fall of 2008, China surpassed Japan as the top foreign holder of U.S. debt.

(2) The Rise of Russia: Notice how Russia becomes a significant holder of U.S. debt between late 2006 and mid-2007.

(3) The Increasing Amount of U.S. Debt Held Abroad: The pie chart is sized by the total debt held by the current top ten debt holders. As a function of U.S. expenditures over the relevant time period, this pie grows in nearly every time period. In the bottom right corner, we track the total debt held by the current top debt holders. Of course, this alone does not represent the complete picture, as additional U.S. debt is held by a variety of other countries. Therefore, we also track the grand total of all debt held abroad in the bottom right corner of the visual.
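A sketch of how the pie totals could be computed: fix the set of today's top holders, then sum their holdings in every earlier period. The data layout (period label mapped to country holdings), the country labels in the usage example, and the choice to scale pie *area* with the total are our assumptions, not a description of the Treasury files or of how the visual was built.

```python
import math
from typing import Dict, List

History = Dict[str, Dict[str, float]]  # period -> {country: holdings}


def current_top_holders(history: History, latest: str, n: int = 10) -> List[str]:
    """Countries with the n largest holdings in the latest period."""
    latest_holdings = history[latest]
    return sorted(latest_holdings, key=latest_holdings.get, reverse=True)[:n]


def pie_totals(history: History, countries: List[str]) -> Dict[str, float]:
    """Total held by a fixed set of countries in each period."""
    return {period: sum(h.get(c, 0.0) for c in countries) for period, h in history.items()}


def pie_radius(total: float, reference_total: float, reference_radius: float = 1.0) -> float:
    """Radius chosen so pie area is proportional to the total held (assumption)."""
    return reference_radius * math.sqrt(total / reference_total)
```

Usage with made-up numbers for three hypothetical countries A, B and C:

```python
history = {"2002-01": {"A": 1.0, "B": 3.0, "C": 2.0}, "2008-09": {"A": 5.0, "B": 4.0, "C": 1.0}}
top = current_top_holders(history, "2008-09", n=2)  # ["A", "B"]
totals = pie_totals(history, top)                   # {"2002-01": 4.0, "2008-09": 9.0}
```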

Dynamic Perspective on the Increasing Amount of American Debt Held Abroad

Focusing upon the “Major Foreign Holders of Treasury Securities,” we were interested in how today’s major debt holders acquired their top positions. The data used to generate the visual above is drawn from the United States Department of the Treasury. For those interested in replicating our results, the current data is located here and the historical data is located here.

Cash for Clunkers – Visualization and Analysis


Cash for Clunkers: A Dynamic Map of the Car Allowance Rebate System (CARS)

Some Background on the Car Allowance Rebate System (CARS)

From the official July 27, 2009 press release – “The National Highway Traffic Safety Administration (NHTSA) also released the final eligibility requirements to participate in the program.  Under the CARS program, consumers receive a $3,500 or $4,500 discount from a car dealer when they trade in their old vehicle and purchase or lease a new, qualifying vehicle. In order to be eligible for the program, the trade-in passenger vehicle must: be manufactured less than 25 years before the date it is traded in; have a combined city/highway fuel economy of 18 miles per gallon or less; be in drivable condition; and be continuously insured and registered to the same owner for the full year before the trade-in. Transactions must be made between now [July 27, 2009] and November 1, 2009 or until the money runs out.”

On August 6, 2009, Congress extended the program, adding $2 billion to the program’s initial allocation. For those interested in background, feel free to read the CNN report on the program extension.

On August 13, 2009, the Secretary of Transportation issued this press release, noting: “[T]he Department of Transportation today clarified that consumers who want to purchase new vehicles not yet on dealer lots can still be eligible for the CARS program. Dealers and consumers who have reached a valid purchase and sale agreement on a vehicle already in the production pipeline will be able to work with the manufacturer to receive the documentation needed to qualify for the program.”

On August 20, 2009, the Secretary announced the program would end on August 24, 2009 at 8pm EDT. While this remained the deadline for sales, dealers were given a small extension to file paperwork (noon on August 25, 2009). For those interested, all other press releases are available here.

The Cars.gov DataSet

The full data set is available for download here. From the Cars.gov website: “these reports contain the transaction level information entered by participating dealers for the 677,081 CARS transactions that were paid or approved for payment as of Friday, October 16, 2009 at 3:00PM EDT for a total of $2,850,162,500. Please note that confidential financial or commercial information and consumer information protected under the DOT privacy policy has been redacted.” The official cars.gov website offers additional caveats in its note to analysts. One important point: a statutory exemption allowed transactions to be submitted, pursuant to an amended rule, after the August 24, 2009 termination date. Here is the relevant language of the amended rule:

“To qualify for the exception process, a dealer must have been prevented from submitting an application for reimbursement due to a hardship caused by the agency. Specifically, a dealer may request an exception if the dealer was locked out of the CARS system, contacted NHTSA for a password reset prior to the announced deadline, but did not receive a password reset. A dealer also may request an exception if its timely transaction was rejected by the CARS system due to a duplicate State identification number, trade-in vehicle VIN, or new vehicle VIN that was never used for a submitted  CARS transaction, if the dealer contacted NHTSA prior to the announced deadline to resolve the issue but did not receive a resolution. Finally, a dealer may seek an exception if it was prevented from submitting a transaction by the announced deadline due to another hardship attributable to NHTSA’s action or inaction, upon submission of proof and justification satisfactory to the Administrator.”

For those who have downloaded the full set, the passage above explains why some transactions in the data fall outside the general CARS program window.
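A minimal sketch for separating in-window transactions from those outside the official July 27 to August 24, 2009 window. The column names (`zip`, `sale_date`), the ISO date format and the sample rows are hypothetical; check the header row and date format of the downloaded file before adapting it.

```python
import csv
import io
from datetime import date, datetime

PROGRAM_START = date(2009, 7, 27)
PROGRAM_END = date(2009, 8, 24)

# Hypothetical column names and made-up rows; not taken from the cars.gov file.
SAMPLE_CSV = """zip,sale_date
11111,2009-07-01
22222,2009-08-03
33333,2009-10-24
"""


def split_by_window(rows):
    """Separate transactions inside the official window from the rest."""
    inside, outside = [], []
    for row in rows:
        d = datetime.strptime(row["sale_date"], "%Y-%m-%d").date()
        (inside if PROGRAM_START <= d <= PROGRAM_END else outside).append(row)
    return inside, outside


inside, outside = split_by_window(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(inside), len(outside))  # prints "1 2"
```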

Dynamic Visualization of the Spatial Distribution of Sales

Each time step of the animation represents a day for which there is data in the official CARS dataset. While the program officially started on July 27, 2009, the dataset contains transactions undertaken during the pilot program as well as transactions undertaken pursuant to the exemption process described above. Thus, the movie begins with the first observation on July 1, 2009 and ends with the final transaction on October 24, 2009. Like a flip book, the movie is generated by threading together each daily time slice.

The Size and Color of Each Circle

Each circle represents a zip code in which one or more participating dealerships are located. The radius of a given circle is a function of the cumulative number of CARS-related sales in that zip code as of the date in question. On each day, a circle is colored if its zip code records at least one sale that day, and it is resized to reflect that day’s sales.
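The flip-book frames and the circle rules above can be sketched as one pass over the transactions: one frame per day that has data, with a running count per zip code and a flag for whether that zip had a sale that day. The `(date, zip)` input layout is hypothetical, and the square-root radius (circle area proportional to cumulative sales) is our choice; the post says only that the radius is a function of the sales count.

```python
import math
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

Sale = Tuple[date, str]  # (transaction date, dealer zip code)


def daily_frames(sales: List[Sale]) -> List[Tuple[date, Dict[str, int], Dict[str, bool]]]:
    """One frame per day with data: cumulative sales per zip and a 'lit' flag.

    A zip is lit on a day if it records at least one sale that day; its
    cumulative count (which drives the radius) only grows.
    """
    by_day: Dict[date, Counter] = defaultdict(Counter)
    for day, zip_code in sales:
        by_day[day][zip_code] += 1
    cumulative: Counter = Counter()
    frames = []
    for day in sorted(by_day):  # days without data produce no frame
        cumulative.update(by_day[day])
        lit = {z: z in by_day[day] for z in cumulative}
        frames.append((day, dict(cumulative), lit))
    return frames


def radius(count: int, scale: float = 1.0) -> float:
    """Hypothetical sizing: circle area proportional to cumulative sales."""
    return scale * math.sqrt(count)
```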

In the later days of the data window, particularly after the official August 25 termination of the program, daily sales are fairly negligible. However, as outlined in the dataset description above, each participating dealer that qualified for the exemption was allowed to submit transactions beyond the official program termination date. Notice that the cumulative percentage of sales approaches 100% by August 25th: virtually all sales occur during the official July 27, 2009 – August 24, 2009 window. Thus, while these stragglers cause certain circles to remain illuminated, the size of the circles is essentially fixed after August 24, 2009.

Some Things to Notice in the Visualization

In the lower left corner of the video, you will notice two charts. The chart on the left tracks each day’s contribution to total sales. The chart on the right represents the cumulative percentage of sales to date under the program. Not surprisingly, most of the transactions under the CARS program take place within the July 27, 2009 – August 24, 2009 window.
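The two charts can be derived from a single list of daily sales counts. A minimal sketch; the counts in the usage example are made up for illustration and are not CARS figures.

```python
from itertools import accumulate
from typing import List, Tuple


def share_and_cumulative(daily_counts: List[int]) -> Tuple[List[float], List[float]]:
    """Per-day share of all sales and the running cumulative percentage."""
    total = sum(daily_counts)
    if total == 0:
        zeros = [0.0] * len(daily_counts)
        return zeros, list(zeros)
    shares = [100.0 * c / total for c in daily_counts]
    cumulative = [100.0 * c / total for c in accumulate(daily_counts)]
    return shares, cumulative


shares, cumulative = share_and_cumulative([10, 30, 40, 20])  # made-up counts
print(shares)      # [10.0, 30.0, 40.0, 20.0]
print(cumulative)  # [10.0, 40.0, 80.0, 100.0]
```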

Within this window, the daily sales feature a variety of interesting trends. On each Sunday of the program (i.e., August 2nd, August 9th, August 16th and August 23rd), sales were significantly diminished. Not surprisingly, end-of-week and early-weekend sales tend to be the strongest.
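The day-of-week pattern can be checked by averaging daily sales per weekday. A minimal sketch, assuming daily totals keyed by `datetime.date`; the counts in the test are made up. `calendar.day_name` gives English names under the default locale.

```python
import calendar
from collections import defaultdict
from datetime import date
from typing import Dict


def mean_sales_by_weekday(daily: Dict[date, int]) -> Dict[str, float]:
    """Average daily sales for each weekday name."""
    buckets = defaultdict(list)
    for day, count in daily.items():
        buckets[calendar.day_name[day.weekday()]].append(count)
    return {name: sum(counts) / len(counts) for name, counts in buckets.items()}


# The four Sundays listed above really are Sundays (weekday() == 6).
assert all(date(2009, 8, d).weekday() == 6 for d in (2, 9, 16, 23))
```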

In the very early days of the program, a variety of media reports (e.g., here, here, here) highlighted the quickly diminishing resources under the program. Obviously, it is difficult to determine the underlying demand for the program. However, given the extent of the acceleration, it appears these reports contributed to the rapid depletion of the initial $1 billion allocated under the program. A similar but less pronounced form of herding also accompanied the last days of the CARS program.

The Supreme Court Open Infrastructure Project Meeting

Wash U CERL Meeting

Mike and I just spent a couple of days at Washington University’s Center for Empirical Research in the Law for a meeting related to the Supreme Court Open Infrastructure Project. The meeting featured a number of great folks with cool data projects. The discussion was very fruitful, and it is clear that the end product is going to offer a wide range of data-related resources. We look forward to contributing to the project in the months to come!