A joint project of Law Technology News and Law.com Legal Technology

June 29, 2009

Over There: Where Angels Have No Fear to Tread

I am late to the party in discussing the case of Digicel et al v. Cable & Wireless, et al.  Others, including the extraordinary Chris Dale and the magnificent Sharon Nelson, long ago put their stamp on the case.  The peripatetic Sultan of Search, Jason Baron, even guest blogged it for the prolific Ralph Losey.  But as it was decided "Over There," and Sir Andrew Lloyd Webber hasn't set it to music, I paid it little heed.

But lately, I'm obsessed with sensible ways to improve keyword searches and practical means to test searches before they're trotted out against vast swaths of ESI.

Mr. Justice Morgan's opinion is the rare case where a jurist closely analyzed the efficacy and burden of particular keywords for electronic search--an undertaking that U.S. Magistrate Judge John Facciola artfully characterized as a fool's errand for lawyers and judges.  Still, once we change the "esses" to "zeds," there's much we Yanks can learn from the Digicel decision.

Digicel is a fight between mobile phone service providers operating in seven Caribbean markets and the island phone companies to which they're obliged to interconnect in order to offer service.  The claimants sought damages for the defendants' alleged foot dragging in offering interconnection.

The Defendants unilaterally selected and deployed ten keywords in their electronic searches; to wit: Digicel, interconnect, interconnection, licence, liberalise, liberalisation, strategy, competing, competitor, competition.

Let’s consider what’s amiss with these terms and where they can be tweaked to improve performance.

Digicel: Wouldn’t you think it likely that some would place a second “L” at the end of the name? At the very least, I’d test to assess that potential before running a broad search.

Interconnect and Interconnection:  In the manner searched, would there be any occurrence of “interconnection” that didn’t overlap with a hit for the root “interconnect?”  How would occurrences that word wrap with hyphenation be handled?

Licence:  Doubtless the English spelling is preferred in the collections searched, but it’s wise to anticipate that some may have employed the American spelling “license” and to search for it as well.  A wildcard character will fill the bill without hurting precision.

Liberalise and Liberalisation: Wouldn’t a search for the root “liberali!” make more sense in that it would grab both variants along with the American spelling?  It won’t hit on “liberal”, so it’s not likely to be significantly noisier.

Strategy: Perhaps they were seeking to steer clear of “strategic,” but here again, using the root makes more sense. Additionally, the word “strategy” is prone to transposition error. If it’s a crucial term, it’s wise to search for “startegy” as well.  A test against a sample helps decide.

Competing, Competitor and Competition: Doesn’t stemming make more sense here?  If you don’t want to grab items with “compete,” then just use the stem “competi!.”
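The transposition worry raised for “strategy” is easy to test before committing to extra terms. What follows is only a minimal sketch in Python: the function names and the three sample strings are hypothetical, and a real review platform will have its own way of running variant searches. It generates every variant produced by swapping two neighbouring letters and counts which of them actually occur in a sample.

```python
import re


def adjacent_transpositions(word):
    """Return every distinct variant made by swapping two neighbouring letters."""
    variants = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:  # swapping identical letters changes nothing
            variants.add(swapped)
    return sorted(variants)


def count_variant_hits(word, sample_docs):
    """Count whole-word, case-insensitive occurrences of each transposed variant."""
    counts = {}
    for variant in adjacent_transpositions(word):
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        hits = sum(len(pattern.findall(doc)) for doc in sample_docs)
        if hits:
            counts[variant] = hits
    return counts


# Hypothetical sample text, invented purely for illustration.
sample_docs = [
    "Our startegy for interconnection",
    "The strategy memo",
    "Startegy and stratgey revisited",
]
print(count_variant_hits("strategy", sample_docs))
# {'startegy': 2, 'stratgey': 1}
```

If a variant never turns up in a fair sample, there is little reason to add it to the list; if it does, the sample tells you roughly how much it is worth.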

With a little thought, our list of ten terms becomes:

Digicel!
Interconnect!
Licen*e
Liberali!
Strateg!
competi!

This list doesn’t cure every potential problem I mention, but it’s a better, faster approach that won’t materially boost the cost of review.
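To check that the shorter list really does cover the original ten keywords, the terms can be translated into regular expressions and tried against a handful of words. This is a sketch under stated assumptions, not a description of any particular search tool: it treats a trailing “!” as a root expander (any run of word characters) and “*” as exactly one character, and search platforms differ on both points. The helper names are hypothetical.

```python
import re

REVISED_TERMS = ["Digicel!", "Interconnect!", "Licen*e",
                 "Liberali!", "Strateg!", "competi!"]

ORIGINAL_KEYWORDS = ["Digicel", "interconnect", "interconnection", "licence",
                     "liberalise", "liberalisation", "strategy",
                     "competing", "competitor", "competition"]


def term_to_regex(term):
    """Translate a search term into a case-insensitive whole-word regex.

    Assumed syntax (tools vary): '!' expands to zero or more word
    characters, '*' stands for exactly one word character.
    """
    parts = []
    for ch in term:
        if ch == "!":
            parts.append(r"\w*")
        elif ch == "*":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def matches_any(word, terms=REVISED_TERMS):
    """True if the whole word is matched by at least one term."""
    return any(term_to_regex(t).fullmatch(word) for t in terms)


# Every original keyword should still be caught by the revised list.
print(all(matches_any(k) for k in ORIGINAL_KEYWORDS))   # True
# Variants the original list would have missed.
print(matches_any("Digicell"), matches_any("license"))  # True True
# Words the stems are meant to leave alone.
print(matches_any("liberal"), matches_any("compete"))   # False False
```

The same translation can be pointed at a sample of real documents to see how much extra noise each stem brings in before it is run at scale.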

With misgivings, the Court went on to require Defendants to run several other search terms against a much broader collection on the theory that, had the Defendants worked cooperatively with the Claimants at the outset, the additional terms should have been run along with those discussed above: delay, frustra*, impede and obstruct.

I'm not so sure.

Though the Defendants did a good job acquainting the Court with empirical data about the terms they ran, I see no indication that they undertook any testing to demonstrate to the Court that the proposed search terms were painfully overbroad.  Had they done so using a sampling of the data to be searched, I think the Court would have listened and ruled differently. 
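The kind of sampling test I have in mind need not be elaborate. Here is a rough sketch in Python, assuming the documents are already available as plain-text strings; the function name, the fixed seed and the four sample strings are hypothetical. It draws a reproducible random sample, reports the share of sampled documents a keyword hits, and hands back the hits so a reviewer can judge how many are actually relevant.

```python
import random
import re


def sample_hit_rate(documents, keyword, sample_size, seed=0):
    """Return (share of sampled documents hit, list of the hit documents).

    A fixed seed keeps the sample reproducible, so the other side
    (or the court) can re-run the same test.
    """
    rng = random.Random(seed)
    sample = rng.sample(documents, min(sample_size, len(documents)))
    if not sample:
        return 0.0, []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = [doc for doc in sample if pattern.search(doc)]
    return len(hits) / len(sample), hits


# Hypothetical documents, invented purely for illustration.
documents = [
    "Delay timer set to 30 ms",
    "Interconnection agreement draft",
    "Quarterly results attached",
    "Network delay report for the switch",
]
rate, hits = sample_hit_rate(documents, "delay", sample_size=4)
print(rate)       # 0.5
print(len(hits))  # 2
```

Multiplying the sampled hit rate by the size of the full collection gives a defensible estimate of the review burden a proposed term would impose.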

The term "delay" is particularly problematic.  On average, when run across an array of file types, the term "delay" can be expected to generate thousands of false hits for every potentially relevant one.  No one's established a threshold "false hit ratio" to disallow a keyword as unacceptably imprecise, but plowing through many thousands of hay straws for a single needle can't be what the Court intended.
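The "false hit ratio" is simple arithmetic once a sample of hits has been reviewed. A minimal sketch follows; the function name is hypothetical and the figures in the example are invented to illustrate the calculation, not results from any real collection.

```python
def false_hit_ratio(total_hits, relevant_hits):
    """Number of false hits reviewed for each relevant hit found."""
    if relevant_hits == 0:
        return float("inf")  # every hit was noise
    return (total_hits - relevant_hits) / relevant_hits


# Illustrative figures only: 5,000 hits reviewed, 2 judged relevant.
print(false_hit_ratio(5000, 2))  # 2499.0
```

A number like that, drawn from a documented sample, is far easier for a court to weigh than a bare assertion that a term is overbroad.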

Why should you believe me as to the burden?  Because I tested it.  Obviously, I don't have any of the parties' data, but I can test the term against data from, e.g., other telecommunications providers, a bare Windows installation or a Caribbean concern and prove that, in each instance, including the word "delay" without some Boolean restraint or other limitation will rake in huge numbers of irrelevant hits.

That's the power of testing search terms against sample data.  It equips you with the single most effective persuasion tool you can bring to court: credibility.

Comments

David Chaumette

Craig -- Your suggestions here are solid. Practitioners must get in the habit of testing their terms before running the big scale searches and, importantly, choosing vendors whose systems (and pricing structure) are most compatible with this kind of approach to production.

--Chaumette
