JASON BARON: The King of Search
One of the feature articles in this month's ABA Journal concerns e-discovery search and the work of Jason R. Baron with TREC Legal Track. The article, by Jason Krause, is now online and is entitled "In Search of the Perfect Search: A project closes in on a protocol to improve e-discovery results." It contains a good explanation of TREC Legal Track, the scientific research program into electronic search in the litigation context. Jason Baron started this annual research project in 2006 with information scientist Doug Oard, a giant in his field. Jason and Doug are shown in the ABA picture above, taken by Ron Aira. The Journal article is well written and makes a good supplement to my lengthier blog post on the subject, "Jason Baron on Search - How Do You Find Anything When You Have a Billion Emails?"
While almost every test found roughly 20 percent of potentially relevant documents, each different type of search basically found different documents. When testers threw different combinations of search technologies at the database, they were able to find roughly 78 percent of the total number of relevant documents.
Baron believes these paradoxical and confounding findings can be reconciled if “lawyers come to realize that to improve the results of searching, one needs to use a variety of available search methods and tools. No one off-the-shelf method will solve all of your e-discovery issues.”
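The arithmetic behind that finding is easy to see with a toy example. What follows is only an illustrative sketch in Python. The document IDs, the five method names, and the overlap pattern are all invented for illustration and are not TREC data. It shows how several methods that each find about 20 percent of the relevant documents can together find far more, provided they find mostly different documents.

```python
def recall(found, relevant):
    """Fraction of the relevant documents that appear in `found`."""
    return len(found & relevant) / len(relevant)


# Hypothetical toy collection: 100 relevant documents with IDs 0-99.
relevant = set(range(100))

# Hypothetical search methods. Each finds 20 relevant documents,
# and each overlaps only slightly with its neighbor.
method_hits = {
    "boolean": set(range(0, 20)),
    "concept": set(range(15, 35)),
    "clustering": set(range(30, 50)),
    "fuzzy": set(range(45, 65)),
    "ranked": set(range(60, 80)),
}

# Recall of each method on its own.
per_method = {name: recall(hits, relevant) for name, hits in method_hits.items()}

# Recall of all methods combined (the union of their result sets).
combined = recall(set().union(*method_hits.values()), relevant)

for name, value in per_method.items():
    print(f"{name}: {value:.0%}")
print(f"combined: {combined:.0%}")
```

In this made-up collection every method scores 20 percent alone and the union scores 80 percent. The real figure depends entirely on how much the methods' result sets overlap.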
...
So far the TREC Legal Track research has identified a couple of practices that improve on the baseline keyword search. To start, lawyers need to work with opposing counsel to identify good search terms and to negotiate proposed Boolean search strings.
And it is important to use sampling—testing to see whether the search engines are finding documents known to be relevant. That means deploying what e-discovery experts call iterative feedback loops. These involve a team of lawyers and other in-house experts conducting searches in stages, and conferring with counsel and experts from the opposing party to determine whether the process is working.
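The sampling check described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the TREC protocol: the function name `estimate_recall`, the document IDs, and the two rounds of search results are all hypothetical. The idea is simply to draw a random sample from documents already known to be relevant and count how many of them the search actually retrieved.

```python
import random


def estimate_recall(known_relevant, retrieved, sample_size, seed=0):
    """Estimate recall from a random sample of known-relevant documents.

    known_relevant: set of document IDs already judged relevant
    retrieved: set of document IDs returned by the search under test
    sample_size: how many known-relevant documents to check
    """
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    sample = rng.sample(sorted(known_relevant), sample_size)
    hits = sum(1 for doc_id in sample if doc_id in retrieved)
    return hits / sample_size


# Hypothetical data: 200 documents known to be relevant.
known_relevant = set(range(200))

# Round 1: the first search string finds half of them.
round_one = set(range(0, 200, 2))

# Round 2: after conferring and refining terms, the search finds more.
round_two = round_one | set(range(1, 200, 4))

print(estimate_recall(known_relevant, round_one, sample_size=50))
print(estimate_recall(known_relevant, round_two, sample_size=50))
```

Running the estimate after each round is one simple way to tell whether a revised search string is an improvement or just a change.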
Unfortunately, only a small minority of lawyers actually sit down with opposing counsel to negotiate document search strings. The iterative process employed in Legal Track is an aspirational model for the profession, which still largely treats discovery as adversarial.
The interactive task used experienced lawyers to review search results in a feedback loop, as happens in actual litigation. In some limited circumstances, advanced search technologies could beat Boolean in a head-to-head comparison. Previously, TREC researchers were able to find more documents than Boolean only by employing multiple search technologies together.
Oard says it’s not clear yet why this is happening, but the results are an improvement. “A lawyer can go to a judge and tell him with a straight face that, if well-implemented, our system is a reasonable alternative to Boolean.”
Baron expects to get more commercial participants and academic teams in the next year of TREC Legal Track, when the tests will target the online collection of documents from the Enron litigation. It is newer than the tobacco litigation database, which was made up primarily of scanned records, and should produce even more useful results.
...
TREC researchers warn that their work has not yet found that ultimate search method, but they have created a viable test environment. The TREC Legal Track is to the point where it may soon offer lawyers a common language and defined processes for search that can account for the inherent deficiencies of the technology.




The concepts coming out of NIST TREC are neither new nor particularly profound. Many practitioners who used search tools beginning in the late '80s realized the need to iterate through result sets to refine results relative to the responsiveness of the hits. It was an essential feature of the tools in BRS Search, dBTextworks and Summation as far back as 1996. What was manually executed then with care and forethought by an attorney is now subject to automation, but the same underlying principles apply.
What is novel is the idea that adversaries will agree on how to search through each other's drawers to find either exculpatory or damning evidence. Not likely.
Posted by: William Kellermann | April 06, 2009 at 11:49 AM
The difference is that these are not "concepts," as you put it, coming out of NIST TREC; these are test results. That is new. True, it is not surprising, but still many, if not most, litigators fail to do it.
Your attitude towards cooperation is commonplace among trial lawyers unfamiliar with e-discovery, but you are a tech lawyer, and must know that this attitude causes excessive e-discovery costs that do not benefit the client. Moreover, it is contrary to the rules of procedure, case law, and ethics. See, e.g., Judge Grimm's 'Mancia' and model ethics rules 3.2, 3.3, and 3.4.
Posted by: Ralph Losey | April 06, 2009 at 01:47 PM
It's "the SULTAN of Search," not the king of search. Man, I've had trouble getting that moniker to take hold! I guess Jason's e-mail running buddy, George W., needs to be the one to assign the nicknames that stick. ;-)
Posted by: Craig Ball | April 06, 2009 at 03:01 PM