A joint project of Law Technology News and Law.com Legal Technology

October 05, 2009

The Need for E-Discovery Standards: A Call From the Trenches

Most discussion about standards in electronic discovery focuses on the big-picture issues of scope, cost and cost shifting.

These are important questions eloquently argued in the courts. However, they overlook the mundane, pick-and-shovel e-discovery concerns that affect every case. I’m talking about the elementary technical issues of preservation, extraction, processing, review and production.

I’m talking about extracting data from electronic storage media, processing the data and its metadata into a document review platform, supporting the review and producing the data as discovery or evidence.

Outside the e-discovery world, the first stage of this process is known as Extract, Transform, Load (ETL). Identifying and overcoming the challenges of ETL have occupied computer scientists for decades. Principal obstacles to effective ETL include widely diverse and poorly documented storage repositories, asynchronous multimedia platforms, constantly evolving software, hardware and software anomalies, and human error, usually with respect to initial planning.

E-discovery vendors on the ground face those obstacles and more. Consider, as just a few of many examples, the following:

Mobile phones and PDAs: In some models, data can be extracted through forensic imaging. In others, such as many of those without SIM cards, data can only be pulled through live file extraction. My earlier blog posts discuss the difference between forensic imaging and live file extraction. In any case, the question is this: Should data extraction scope be defined by current technical capabilities, or should there be a single common standard – such as live files only – for those instances when mobile phones and PDAs are subject to e-discovery?

A multitude of file types: Extraction and processing applications address dozens, sometimes hundreds, of file types. These file types are usually associated with, and identified by, a particular file extension, such as .doc or .xls. However, custom extensions are easy to apply – documents I create might have a .epb file extension, for example – and it is also simple to apply a nonstandard extension to a particular file type (e.g., a .doc extension to a PDF file). These are often missed, or improperly processed, by extraction and processing software.

Computer forensics software in the hands of an experienced technician can identify documents by their actual file type without relying on the file extension, but doing so is costly and time consuming. What checks should be done for mislabeled or unusual file extensions? When are such checks required?
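One common way to check for a mislabeled extension is to compare a file's leading bytes (its signature, or "magic number") against what the extension implies. The following is a minimal sketch of that idea, not a description of how any particular processing tool works. The three signatures are the well-known ones for PDF, legacy Office compound files and ZIP containers; the function names and the extension table are assumptions made for illustration.

```python
from pathlib import PurePath

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),                               # PDF documents
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),   # legacy Office (.doc, .xls, .ppt)
    (b"PK\x03\x04", "zip"),                          # ZIP container (incl. .docx, .xlsx)
]

# Hypothetical table: which container each extension is expected to use.
EXPECTED = {
    ".pdf": "pdf",
    ".doc": "ole2", ".xls": "ole2", ".ppt": "ole2",
    ".docx": "zip", ".xlsx": "zip", ".zip": "zip",
}


def sniff_type(header: bytes):
    """Return the container type implied by the leading bytes, or None."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None


def check_extension(filename: str, header: bytes) -> str:
    """Classify a file as ok, mismatch, unknown-extension or unknown-content."""
    ext = PurePath(filename).suffix.lower()
    detected = sniff_type(header)
    expected = EXPECTED.get(ext)
    if detected is None:
        return "unknown-content"      # signature not in our short list
    if expected is None:
        return "unknown-extension"    # e.g. a custom extension such as .epb
    return "ok" if detected == expected else "mismatch"
```

For example, `check_extension("memo.doc", b"%PDF-1.4\n")` returns `"mismatch"` (a PDF carrying a .doc extension), and `check_extension("notes.epb", b"%PDF-1.4\n")` returns `"unknown-extension"`. A real check would need a far longer signature list and would still leave open the question of when such checks should be required.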

Metadata: Most of us think of metadata in basic terms such as the putative author, creation date, modification date, last-access date and so forth. However, metadata varies widely across data types. Microsoft Office documents, for example, have more than 100 metadata fields. It is also possible to create custom fields with many document types. Nearly all of these, such as the ubiquitous P-size and L-size, are nearly never important in civil litigation.

“Nearly never” is not, however, the same as “never.” Such data can be extracted, but it is not, as a rule, supported by processing software, which renders it unavailable at the attorney review level. Is it possible to agree on which metadata fields should be preserved and processed? When they should be processed? Which fields are important forensically? When all fields should be preserved?

Rapid technological change: Software is updated all the time. This affects how metadata is produced and the appearance of electronic documents. Processing software hasn’t kept up. It’s also inconsistent. For example, the last-access date on a Word 2007 document running in Windows Vista is affected differently than that of an Office XP Word document running on Vista. Both documents, however, are processed the same way, as if the metadata meant the same thing, when it does not. How should inconsistencies like this be addressed? What should the typical approach be?

Webmail: Screenshots of Web-based email services such as Hotmail are a common and inexpensive workaround to downloading actual Hotmail files. Which method is preferred? Is either method not preferred? As third-party cloud data repositories multiply, what constitutes best practices with regard to extraction methods will become a critical question.

Capture rates: What percentage capture rate is acceptable for processing software? Many files are often not processed by even the best technology, and must be laboriously hand processed. In a million-item processing job, a 1 percent miss rate equals 10,000 documents not processed and therefore unavailable for review. Is 99 percent acceptable? Is 98 percent? Note: If you think that the processing rate for your document review software is 100 percent, you’re kidding yourself.
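The arithmetic behind the capture-rate question is simple enough to sketch. The function name below is an assumption made for illustration, and the rate is passed as a whole-number percentage so the calculation stays in integers.

```python
def unprocessed_items(total_items: int, capture_percent: int) -> int:
    """Number of items left unprocessed at a given whole-percent capture rate."""
    if not 0 <= capture_percent <= 100:
        raise ValueError("capture_percent must be between 0 and 100")
    return total_items * (100 - capture_percent) // 100


# The million-item job from the text, at 99 and 98 percent capture.
for pct in (99, 98):
    print(pct, unprocessed_items(1_000_000, pct))
# prints "99 10000" then "98 20000"
```

Dropping from 99 to 98 percent doubles the hand-processing backlog, from 10,000 items to 20,000.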

Searching: Keyword searching, including keyword searches supported by “fuzzy” search techniques, is giving way to conceptual searching, which is the future of document search and review. Conceptual searching, however, involves proprietary algorithms and processes with a wide range of accuracy. What standards must conceptual searching meet to be accepted? How are these standards applied? When, if ever, is conceptual searching disallowed?

File format: In e-discovery today, most documents are produced in TIFF format. Putting aside the larger question of whether TIFF should be the standard for producing electronic documents, what about documents such as spreadsheets that don’t translate well into TIFF files? In what format should presentation-type documents be produced? As slide shows? As workbook copies with notes and presenters’ comments? How are native files to be tracked and authenticated as a best practice?

Today, e-discovery consultants decide many of these questions on their own or after consulting with litigation counsel. In essence, a consultant decides when it is and isn’t practical to extract files from a system, whether to image a particular hard drive and whether to put aside as unreadable a back-up tape from a set of tapes that must be searched.

Much of the time, the consultant makes the “right” decision, as subsequently decided by the court, the client or the opposing party. It’s a rare consultant, however, who won’t admit that adopting e-discovery standards would bring enormous benefit to the practical challenges of data extraction, processing and production.

I'll be discussing these and other issues in the future. Any of the problems mentioned above could fill an entire article. I look forward to working with the legal and technical community to address these “technical” standards – as opposed to the widely discussed “strategic” standards, which may ultimately be addressed by changes in the Federal Rules of Civil Procedure.

Eric P. Blank is the founder and managing attorney of Blank Law + Technology PS. His practice focuses on electronic discovery counseling, e-security response planning and implementation, investigations and computer forensics. Mr. Blank has conducted more than 300 investigations into computer and software-related torts and employee misconduct since 2001 and has frequently been a court-appointed special master or neutral in e-discovery matters.

Comments

Jonathan Maas

Interesting thoughts which I confess I have only briefly skimmed through whilst on a conference call. I think your rallying call is probably a very good one and I look forward to reading it properly when I have more time.

I am a founding member of LiST and would point you to our draft Data Exchange Protocol, which goes some way to addressing the issue of parties exchanging data at any stage of any type of matter: http://www.listgroup.org/publications.htm (sorry - it's not a link to a Viagra sales site!).

Jim Gardner

I disagree with the point about renamed file types often being missed. Any off-the-shelf eDiscovery processing tool worth anything can look into the file header to determine what it is and not rely on the extension alone. Anybody using a piece of software that does NOT do that should reconsider being in the eDiscovery business.

More to the point, isn't this why we have newly developed things such as the EDRM model and the ALSP? Trying to standardize electronic discovery and litigation support is nothing new. Folks have been trying (and it seems, fairly recently, succeeding) for years.
