From the talk description: “Computers are fantastic at finding identical pieces of data, but terrible at finding similar data. Part of the problem is first defining the term similar in any given context. The relationships between similar pictures are different than the relationships between similar pieces of malware. This talk will explore the different kinds of similar, a scientific approach to finding similar things, and how these apply to computer forensics. Fuzzy hashing was just the beginning! Topics will include wavelet decomposition, control flow graphs, cosine similarity, and lots of other fun mathy stuffs which will make your life easier.”
I have long been interested in the “science of similarity” and its application to a variety of questions in law and the social sciences. Whether the subject is the sort of analogical reasoning described by legal scholars such as Edward Levi and Cass Sunstein, or cognitive biases such as the availability heuristic (Tversky & Kahneman, 1973), developments in the “science of similarity” are highly relevant to theorists across a wide range of sub-fields.
While there has been considerable skepticism about applying these principles (particularly among legal theorists), from our perspective computer science ∩ psychology/cognitive science appears to stand on the cusp of a new age in the “science of similarity.” I offer the slides above because I found them both interesting and useful. Stay tuned for more …