A joint project of Law Technology News and Law.com Legal Technology


February 06, 2010

Clinching the Concept of Concept Search

As a frequent speaker, I live for the "aha" moment that lights the eyes of an audience.  It's that magical turning point when you've made a daunting technical topic accessible.  You can almost hear them say, "Thank you, thank you, thank you, for making clear to me something I've long wondered about but never fully grasped."

Yesterday, at an e-discovery conference in Austin, I watched Ed Fiducia of Inventus earn his "aha" moment describing concept search.  It's a challenging topic--one that entails shoving a host of different approaches under a broad rubric, and more math than the average lawyer wants to recall.  Then, explanations are often laced with--or should I say lacerated by?--marketing-speak.  But Ed hit the bull's-eye.

Ed wisely defused rampant technofear by tying his explanation to the immensely popular CSI television series (Las Vegas and New York, as Ed's not fond of David Caruso's trite, trademark take-off-the-sunglasses move). 

Rather than embrace the specifics of the various approaches to concept search, Ed tackled the concept of concept search, particularly document clustering and near de-duplication.  He began by reminding us that when the CSI team runs a fingerprint through the Automated Fingerprint Identification System (AFIS), the system doesn't check every aspect of the print, only the spatial relationships among distinctive features of its loops, whorls and arches.  That is, the computer compares a digitally recorded geometric analysis of the ridges at their points of termination and bifurcation to a database of geometric characteristics of other fingerprints.  The computer then assembles a list of likely matches and calculates a percentage estimate of each match's likelihood.  On television, this is often accompanied by a fanciful "100% match" along with a mug shot and rap sheet.

Ed's point was that we don't need to consider every nuance of a fingerprint to drastically reduce the universe of potential matches.  Instead, we can calculate a finite number of geometric values and plot those values to identify candidates for identicality.  Then, we look carefully at the candidates to gauge true matches.  This doesn't eliminate the need for human judgment, but it allows human review to be deployed efficiently.
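For readers who think in code, the "shortlist first, scrutinize second" idea can be sketched in a few lines of Python.  This is only an illustration under loud assumptions: the three-number feature vectors, the print names and the scoring formula are all invented for the example, and nothing here reflects how AFIS actually encodes or scores a print.

```python
import math

def similarity_percent(a, b):
    """Turn the distance between two feature vectors into a 0-100 score.

    The 100 / (1 + distance) formula is an arbitrary choice for this sketch.
    """
    return 100.0 / (1.0 + math.dist(a, b))

def shortlist(probe, database, top_n=3):
    """Rank every stored print against the probe and keep the best few."""
    scored = [(similarity_percent(probe, vec), name)
              for name, vec in database.items()]
    scored.sort(reverse=True)  # highest score first
    return [(name, round(score, 1)) for score, name in scored[:top_n]]

# Hypothetical database: each print reduced to three made-up geometric values.
database = {
    "print_A": (0.10, 0.80, 0.30),
    "print_B": (0.90, 0.20, 0.70),
    "print_C": (0.12, 0.79, 0.33),
    "print_D": (0.50, 0.50, 0.50),
}
probe = (0.11, 0.80, 0.31)

# Only these candidates go on to careful human comparison.
candidates = shortlist(probe, database, top_n=2)
print(candidates)
```

The cheap numeric comparison touches every record, while the expensive human look is saved for the handful of records that survive it.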

Applying this technique to documents, we plot words instead of whorls.  To lay the groundwork, Ed posited a world where all documents were composed of combinations of only three words, say "run," "home" and "cat."  Were we to analyze each document in terms of the number of instances of each word and plot these values on X, Y and Z axes, we'd have a crude measure of similarity.  If we factor in the spatial/geometric relationship of the words, we'd have a much more exact measure of similarity.  Plus, patterns would emerge, and we'd start to see similar documents cluster in geometric space.
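Ed's three-word world is small enough to try at home.  The sketch below counts "run," "home" and "cat" in each document, treats the counts as X, Y and Z coordinates, and compares documents by the angle between them (cosine similarity).  The sample sentences, the function names and the choice of cosine similarity are my assumptions for illustration; Ed did not name a particular measure, and commercial tools may well use something different.

```python
import math
import re
from collections import Counter

VOCAB = ("run", "home", "cat")  # the three-word world from the talk

def to_point(text):
    """Return the (run, home, cat) counts: the document's X, Y, Z position."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return tuple(counts[word] for word in VOCAB)

def cosine_similarity(a, b):
    """1.0 means the points lie in the same direction; 0.0 means no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with none of the three words matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical documents made only of the three words.
docs = {
    "a": "run home cat run",
    "b": "run run home cat home",
    "c": "cat cat cat home",
}
points = {name: to_point(text) for name, text in docs.items()}

print(points["a"])  # (2, 1, 1)
print(cosine_similarity(points["a"], points["b"]))  # high: similar word mix
print(cosine_similarity(points["a"], points["c"]))  # lower: mostly "cat"
```

Counts alone ignore word order, which is why this is the crude measure; factoring in where the words sit relative to one another would take more than three axes.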

Cluster map 

Focusing on clusters of similar documents makes review for responsiveness and privilege more efficient, in the same way that focusing on geometrically similar fingerprint candidates makes crime scene investigation more efficient.  And therein lies a leading concept behind concept search.
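To make the clustering step concrete, here is a deliberately simple grouping routine in Python.  It is a sketch, not any vendor's algorithm: the greedy one-pass approach, the 0.9 threshold, the document names and the (run, home, cat) count vectors are all assumptions chosen to keep the example short.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster(points, threshold=0.9):
    """Greedy single pass: a document joins the first cluster whose first
    member it resembles closely enough; otherwise it starts a new cluster."""
    clusters = []
    for name, vec in points.items():
        for group in clusters:
            leader_vec = points[group[0]]
            if cosine_similarity(vec, leader_vec) >= threshold:
                group.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters

# Hypothetical (run, home, cat) counts for five documents.
points = {
    "memo_1": (2, 1, 1),
    "memo_2": (2, 2, 1),
    "letter_1": (0, 1, 3),
    "letter_2": (0, 1, 4),
    "memo_3": (4, 2, 2),
}
groups = cluster(points)
print(groups)  # [['memo_1', 'memo_2', 'memo_3'], ['letter_1', 'letter_2']]
```

Each group can then be handed to a single reviewer.  Real clustering tools are more careful about how they pick cluster centers, but the payoff is the same: look-alike documents arrive together.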

Enabling a single reviewer to rapidly muster similar documents not only reduces the risk of inconsistent characterization and redaction, but also reveals similarities that might have been overlooked.  It's like shopping in a neighborhood where all the stores sell the same things--think the Diamond District in New York or the Goldfish Market in Hong Kong.  Having all the permutations at hand fosters smarter choices.

Nice work, Ed!
