Forensic Hashing In Criminal And Civil Discovery
Published date | 18 May 2022
Subject Matter | Intellectual Property, Litigation, Mediation & Arbitration, Trade Secrets, Disclosure & Electronic Discovery & Privilege, Trials & Appeals & Compensation
Law Firm | Holland & Knight
Author | Mr Jacob W Schneider
After reading an earlier IP/Decode post about hashing, my friend Jenny Rossman reached out to explain how law enforcement was using hash values to fight the spread of child pornography. For over a decade, Jenny had been a sex crimes prosecutor in Florida. She, alongside law enforcement, had been using the technique to identify suspects and secure convictions. It is a brilliant use of hashing that is also worth considering in civil cases, particularly trade secret litigation.
Using Forensic Hashing to Fight Child Pornography
As I wrote in the earlier post, hashing can convert files to shorter strings of numbers and letters (the "hash value"). To demonstrate this, below is a set of five files that contain different content. I computed their unique hash values using the MD5 algorithm:
Filename | MD5 Hash Value
File1 | 585960c5cf6ed77c10d37e8dfa66629f
File2 | 994d6db8e10d41ac5cc49f15281a5bef
File3 | fec2a0796d37905dec5b9ef0b24045bf
File4 | a3d95a3899c1050c146cd05c054cebf8
File5 | 748f65d8e5d27d17dd2f142a7b712392
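For readers who want to see how a table like this could be produced, here is a minimal sketch in Python using the standard library's hashlib module. The function names md5_of_file and hash_table are my own illustrations, not part of any forensic tool, and the sketch assumes the files sit together in one ordinary directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hash value of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_table(directory):
    """Map each regular file's name in `directory` to its MD5 hash value."""
    return {
        entry.name: md5_of_file(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file()
    }
```

Calling hash_table on a folder holding File1 through File5 would return one filename-to-hash pair per file. Because the hash depends only on a file's content, renaming a file leaves its hash value unchanged.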
Law enforcement, along with private entities, has been using these unique hash values like fingerprints to identify illicit digital materials. In practice, if law enforcement knows from a previous investigation that File5 is child pornography, then File5's hash value can be used to identify other files with that same hash value. If there is a match, then there may be a crime. (U.S. v. Miller, 982 F.3d 412 (6th Cir. 2020), is a good read for those interested in how this practice implicates the Fourth Amendment.)
As I wrote in the previous post, the solution to speeding up nearly any search problem is hashing, and it provides the solution in this context as well. To find File5 on a suspect's computer, one need only run every file on the computer through the MD5 algorithm. Once those hash values are generated, one searches for File5's unique string: 748f65d8e5d27d17dd2f142a7b712392. Below are hash values for another set of randomized files that includes the illicit File5:
Filename | MD5 Hash Value
File6 | 01cadc70bb61741a28915dd336f878d0
File7 | 748f65d8e5d27d17dd2f142a7b712392
File8 | 8259db3e9b95531adae71e740ff362b0
File9 | d76c67896451dc0d920dc39ed8c802fb
File10 | cdf2d0112d601302ede03f6eafea0ad4
File7's MD5 hash value is the same as File5's, so we have a match. Due to the math behind the MD5 hash algorithm, the odds of File7's content differing from File5's, but still resulting in the same hash value, are almost...
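The search described above, hashing every file and looking for a known value, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any tool law enforcement actually uses: find_matches and KNOWN_HASHES are hypothetical names, and the set holds only File5's hash value from the first table.

```python
import hashlib
from pathlib import Path

# Hash values of previously identified files (here, only File5's value).
KNOWN_HASHES = {"748f65d8e5d27d17dd2f142a7b712392"}


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hash value of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root, known_hashes=KNOWN_HASHES):
    """Return (path, hash) pairs for files under `root` with a known hash."""
    known = {value.lower() for value in known_hashes}
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            value = md5_of_file(path)
        except OSError:
            # Skip files that cannot be read; a real tool would log these.
            continue
        if value in known:
            matches.append((path, value))
    return matches
```

Checking membership in a set takes roughly the same time whether the set holds one known hash value or millions, which is why this approach scales to large reference lists.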