Tag: computer science
Copyright → Title 17 U.S. Code w/ Sea Dragon From Microsoft Labs
This is part of our ongoing visualizations of the United States Code. For previous posts visualizing other portions of the code, see Title 26 (Tax) and Title 11 (Bankruptcy). We wanted to test out the new Sea Dragon Visualizer from Microsoft Labs and thought Title 17 (Copyright) would be a fun way to give it a go. In this visual, each of the chapters under Title 17 is separately colored.
To use the visual, start in the center with the large label “Title 17 U.S.C.” and traverse the graph all the way out to any section or subsection. Sea Dragon should allow the user to smoothly zoom in and read any node. We love the interface.
In our view, the Full Screen Visual is the best. You can access it by clicking the Full Size Button on the far right. Also, if for whatever reason you zoom in too far, just use the Home Button to go back to the Full Image. Enjoy, but note that Sea Dragon relies upon Silverlight and JavaScript (so you might need to install Silverlight).
HarambeeNet @ Duke Computer Science
We enjoyed today’s discussion at the HarambeeNet Conference here in the Duke Computer Science Department. The conference is centered upon network science and computer science education. It features lots of interdisciplinary scholarship and applications of computer science techniques in novel domains.
We are looking forward to an interesting final day of discussion and hope to participate in allied future conferences.
YouTube Research — Robust Dynamic Classes Revealed by Measuring the Response Function of a Social System
Here at the CSCS Lab, we are working hard to finish up some projects. In the meantime, we wanted to highlight one of our favorite articles, one we previously highlighted on the blog. Some of you might ask, “What does this have to do with law or social science?” (1) We believe the taxonomy outlined in this article could potentially be applied to a wide set of social phenomena. (2) As we say around here, if you are not reading outside your discipline, you are far less likely to be able to innovate within your discipline. So we suggest you consider downloading this paper….
How Python can Turn the Internet into your Dataset: Part 1
As we covered earlier, Drew Conway over at Zero Intelligence Agents has gotten off to a great start with his first two tutorials on collecting and managing web data with Python. However, critics of such automated collection might argue that the cost of writing and maintaining this code is higher than the return for small datasets. Furthermore, someone still needs to manually enter the players of interest for this code to work.
To convince these remaining skeptics, I decided to put together an example where automated collection is clearly the winner.
Problem: Imagine you wanted to compare Drew’s NY Giants draft picks with the league as a whole. How would you go about obtaining data on the rest of the league’s players?
Human Solution: If you planned to do this the old-fashioned manual way, you would probably decide to collect the player data team-by-team. On the NFL.com website, the first step would thus be to find the list of team rosters:
http://www.nfl.com/players/search?category=team&playerType=current
Now, you’d need to click through to each team’s roster. For instance, if you’re from Ann Arbor, you might be a Lions fan…
http://www.nfl.com/players/search?category=team&filter=1540&playerType=current
This is the list of current players for the Detroit Lions. In order to collect the desired player info, however, you’d again have to follow the link to each player’s profile page. For instance, you might want to check out the Lions’ own first round pick:
http://www.nfl.com/players/matthewstafford/profile?id=STA134157
At last, you can copy down Stafford’s statistics. Simple enough, right? This might take all of 30 seconds with page load times and your spreadsheet entry.
The Lions have more than 70 players rostered (more than just active players); let’s assume this is representative. There are 32 teams in the NFL. By even a conservative estimate, there are over 2,000 players you’d need to collect data on. If each of the 2,000 players took 30 seconds, you’d need about 17 man-hours to collect the data. You might hand this data entry over to a team of bored undergrads or graduate students, but then you’d need to worry about double-coding and cost of labor. Furthermore, what if you wanted to extend this analysis to historical players as well? You’d better start looking for a source of funding…
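For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate in Python. The constants come straight from the numbers above; the function names are my own and purely illustrative.

```python
# Back-of-the-envelope cost of manual collection.
# All constants are the rough figures from the post, not measured values.
PLAYERS_PER_TEAM = 70     # assumed representative roster size (Lions)
TEAMS = 32                # teams in the NFL
SECONDS_PER_PLAYER = 30   # page loads plus spreadsheet entry


def estimate_players(players_per_team=PLAYERS_PER_TEAM, teams=TEAMS):
    """Estimate the league-wide player count from one team's roster size."""
    return players_per_team * teams


def manual_hours(players, seconds_per_player=SECONDS_PER_PLAYER):
    """Hours of manual data entry for the given number of players."""
    return players * seconds_per_player / 3600


if __name__ == "__main__":
    print(estimate_players())          # league-wide estimate
    print(round(manual_hours(2000)))   # hours for the conservative 2,000
```

The league-wide estimate is 70 × 32 = 2,240 players, and the conservative 2,000 players at 30 seconds each works out to roughly 16.7 hours.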
What if there was an easier way?
Python Solution:
The solution requires just 100 lines of code. An experienced Python programmer can produce this kind of code in half an hour over a beer at a place like Ashley’s. The program itself can download the entire data set in less than half an hour. In total, this data set is the product of less than an hour of total time.
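To give a flavor of the approach, here is a minimal sketch of the three-level walk described above (team list, then each roster, then each player profile). This is not the original 100-line program. It assumes that roster links can be recognized by `filter=` and profile links by `/profile?id=`, which I am inferring only from the example URLs above; the real page markup may differ. The names `LinkCollector`, `extract_links`, and `crawl_profiles` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Starting page from the post: the list of team rosters.
TEAM_LIST_URL = "http://www.nfl.com/players/search?category=team&playerType=current"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and return its text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url, must_contain):
    """Return unique absolute links whose URL contains `must_contain`."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href in parser.links:
        absolute = urljoin(base_url, href)
        if must_contain in absolute and absolute not in found:
            found.append(absolute)
    return found


def crawl_profiles(fetch_page=fetch, start_url=TEAM_LIST_URL):
    """Walk team list -> rosters -> profiles and return the profile URLs.

    The "filter=" and "/profile?id=" patterns are assumptions based on
    the example URLs in the post, not on the site's actual markup.
    """
    profile_urls = []
    start_html = fetch_page(start_url)
    for roster_url in extract_links(start_html, start_url, "filter="):
        roster_html = fetch_page(roster_url)
        for profile_url in extract_links(roster_html, roster_url, "/profile?id="):
            if profile_url not in profile_urls:
                profile_urls.append(profile_url)
    return profile_urls
```

Passing the page-fetching function in as an argument makes it easy to try the logic on saved pages before pointing it at a live site. A real run should also pause between requests and handle failed downloads.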
How long would it take your team of undergrads? Think about all the paperwork, explanations, formatting problems, delays, and cost…
The end result is a spreadsheet with the name, weight, age, height in inches, college, and NFL team for 2,520 players. This isn’t even the full list – for the purpose of this tutorial, players with missing data, e.g., unknown height, are not recorded.
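The cleaning step can be sketched as follows. This is an illustration under assumptions rather than the code that produced the spreadsheet: I am assuming heights arrive as feet-inches text such as "6-2" and that each scraped profile is a dictionary with the keys shown. The function names and the sample record are hypothetical.

```python
import csv
import io

# Output columns, matching the fields listed in the post.
FIELDS = ["name", "weight", "age", "height_inches", "college", "team"]


def height_to_inches(text):
    """Convert 'feet-inches' text such as '6-2' to inches.

    Returns None when the value is missing or not in that (assumed) format.
    """
    try:
        feet, inches = text.split("-")
        return int(feet) * 12 + int(inches)
    except (AttributeError, ValueError):
        return None


def clean_record(raw):
    """Build one output row, or return None if any field is missing."""
    record = {
        "name": raw.get("name"),
        "weight": raw.get("weight"),
        "age": raw.get("age"),
        "height_inches": height_to_inches(raw.get("height")),
        "college": raw.get("college"),
        "team": raw.get("team"),
    }
    if any(value in (None, "") for value in record.values()):
        return None
    return record


def write_players(raw_records, out_file):
    """Write complete records as CSV and return how many rows were kept."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for raw in raw_records:
        record = clean_record(raw)
        if record is not None:
            writer.writerow(record)
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up sample data: one complete record, one with unknown height.
    sample = [
        {"name": "Example Player", "weight": 220, "age": 24,
         "height": "6-2", "college": "Example State", "team": "Example Team"},
        {"name": "No Height", "weight": 200, "age": 30,
         "height": None, "college": "Example State", "team": "Example Team"},
    ]
    buffer = io.StringIO()
    print(write_players(sample, buffer))
    print(buffer.getvalue())
```

Dropping incomplete records keeps the tutorial simple. For a real analysis you might prefer to keep them and mark the missing values instead.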
You can view the spreadsheet here. In upcoming tutorials, I’ll cover how to visualize and analyze this data with both standard statistical models and network models.
In the meantime, think about which of these two solutions makes for a better world.
How to Use Python to Collect Data from the Web [From Drew Conway]
We wanted to highlight a couple of very interesting posts by Drew Conway of Zero Intelligence Agents. While not simple, the programming language Python offers significant returns upon investment. From a data acquisition standpoint, Python has made what seemed impossible quite possible. As a side note, this code looks like our first Bommarito-led Ann Arbor Python Club effort to download and process NBA Box Scores…. you know it is all about trying to win the fantasy league…!
The Rise of the Data Scientist [From Flowing Data]
Earlier in the month, there was a very interesting discussion over at Flowing Data entitled the Rise of the Data Scientist. We decided to highlight it in this post because it raises important issues regarding the relationship between Computational Legal Studies and other movements within law.
As we consider ourselves empiricists, we are strong supporters of the Empirical Legal Studies movement. For those not familiar, the vast majority of existing Empirical Legal Studies scholarship employs econometric techniques. For some substantive questions, these approaches are perfectly appropriate, while for others, we believe techniques such as network analysis, computational linguistics, etc. are better suited. Even when appropriately employed, as displayed above, we believe the use of traditional statistical approaches should be seen as nested within a larger process. Namely, for a certain class of substantive questions, there exist tremendous amounts of readily available data. Thus, on the front end, the use of computer science techniques such as web scraping and text parsing could help unlock existing large-N data sources, thereby improving the quality of inferences collectively produced. On the back end, the use of various methods of information visualization could democratize the scholarship by making the key insights available to a much wider audience.
It is worth noting that our commitment to Computational Legal Studies actually embraces a second important prong. From a mathematical modeling/formal theory perspective, at least for a certain range of questions, agent based models/computational models ≥ closed form analytical models. In other words, we are concerned that many paper & pencil game theoretic models fail to incorporate interactions between components or the underlying heterogeneity of agents. Alternatively, they demonstrate the existence of a P* without concern for whether such an equilibrium is obtained on a timescale of interest. In some instances, these complications do not necessarily matter, but in other cases they are deeply consequential.
Artificial Intelligence and Law — Barcelona 2009
Live from Barcelona, we are on the road at the conference of the International Association for Artificial Intelligence and Law. Henry Prakken has just delivered the keynote address and we will soon be giving our presentation. The conference is interesting as it embraces a wide range of topics and intellectual traditions. For example, there is a significant emphasis on ontological reasoning, computational models of argumentation, and the use of XML schemas. In addition, there are a number of folks using graph theoretic techniques and applying them to the development of the law. It has been a nice few days and we have enjoyed our time here. Tomorrow, the trip continues….
Data Mining the News — J. Kleinberg Work Discussed in MIT Tech Review
This short but cool article from MIT Technology Review discusses recent work by computer scientist Jon Kleinberg and his Cornell colleagues. This very nice visualization is the byproduct of their efforts at data mining more than 1 million online news items per day in the weeks leading up to the 2008 presidential election.