Legal Technology News - E-Discovery and Compliance Blog

December 10, 2009

Will Smarter TIFFs from Microsoft Change E-Discovery?

I trust you know, dear reader, that the only thing dumber than a TIFF file is converting your entire e-discovery collection to TIFF images for review.  But, while wholesale TIFF conversion will forever be monumentally stupid and profligate, it appears TIFF files just acquired a few brain cells.

At risk of being revealed as the last kid on the block to figure this out, I learned today that Microsoft offers a way to smarten up TIFF images such that load files--those hinky, stinky electronic bills of lading that must accompany TIFF image productions to make them usable--may no longer be needed.

For those new to this topic, TIFF stands for Tagged Image File Format.  Think of a TIFF as a still photo of a document, either one snapshot per page (single page TIFF) or a snapshot of all the pages laid out on the floor (multipage TIFF).  I call TIFFs "dumb" because, unlike the native electronic versions of the documents they replace, TIFFs can't be searched electronically and don't function like native files.  To anthropomorphize, TIFFs are so dumb, they don't know what they say.  They're especially brain dead when used to replace spreadsheets or other formats which wither on the printed page.  To offset their low IQ, TIFFs need literate escorts in the form of load files carrying the document's textual content and metadata.

Of course, there's long been a way to pair an image of a document with its textual content and metadata.  It's called Portable Document Format or PDF, and it's made a boatload of cash for Adobe Systems.  Need to download a tax form from IRS.gov?  It'll be a PDF document.  Want to electronically file a pleading in federal court?  Be sure it's a PDF or the court's CM/ECF filing system won't accept it.  PDF is a pretty smart format.  It can even store video, audio and animation, or embed the binary source of any file.

So, if native formats are optimal and PDF is so smart, why does anyone still use TIFF for e-discovery?  That's the billion-dollar question.  Is it because of the clout of companies hawking entrenched TIFF-dependent tools and services from the horse-and-buggy time when e-discovery meant scanning and coding paper documents?  Or do lawyers so cling to paper and Bates numbers that they turn a blind eye to the staggering cost of their intransigence?  Whatever the reason, you can be sure it comes down to someone making more money from lesser technology.

As you gather, I'd pretty much written off TIFF as yesterday's news when today's news mentioned that Microsoft just released a document detailing the purpose and structure of three custom tags which the Microsoft Office Document Imaging (MODI) tool can embed in TIFF images.  It's long been possible to embed minimal metadata in a TIFF, e.g., identifying the scanner that made the image, but nothing of much value in e-discovery--certainly nothing to rival a PDF.  I'd paid little attention to MODI since it was first introduced because it seemed just another Microsoft bell and whistle that no one rang or blew.

Then, on December 9, 2009, Microsoft revealed the purpose of the catchy-named "Private Tag 37679" and--yowza!--it's a way to embed the text of a document into a TIFF image of the document in UTF-8 format!  Don't you just wish you had a little party horn to toot right about now?  A little digging revealed that Tag 37679 has been around for years but it was as undocumented as a Wal-Mart cleaning crew.  People could pretty obviously guess what the tag was for, but they couldn't be certain how to implement it.

That's nothing new with Microsoft.  They're notorious for embracing an open standard, adding a few undocumented tweaks and labeling it a "new" proprietary standard.  Star Trek fans liken Microsoft to the Borg, a race of cybernetic organisms that destroy other races by compulsory assimilation, against which, "resistance is futile."  Mentioning the Borg gives me an excuse to attach a picture of Borg Seven of Nine (played by actress Jeri Ryan).  C'mon, would you really prefer a picture of a gavel or the Microsoft logo?

To its credit, by finally and formally documenting Tag 37679 and two other private tags, Microsoft makes it feasible for others to develop tools to read and write text-searchable TIFFs.  Will anyone support the MODI TIFF format?  Doubtful, as a TIFF with embedded text is a far cry from native production or even PDFs.  TIFF will probably continue its inexorable slide into e-discovery oblivion.  But, just in case TIFF limps along, it's nice to know that there's a way--blessed by none other than Microsoft--to make TIFFs text-searchable without a load file.
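
For anyone tempted to build such a tool, here is a minimal Python sketch of the reading side.  It walks the image file directories of a classic TIFF using only the standard library and pulls out whatever bytes sit under tag 37679, decoding them as UTF-8.  The function names, the field type used by the demo builder and the choice to strip trailing NUL bytes are my own assumptions for illustration, not details taken from Microsoft's document.  The demo builder produces a bare test fixture, not a viewable image.

```python
import struct

MODI_PLAINTEXT_TAG = 37679  # private tag number named in the post

# Byte widths of the TIFF 6.0 field types (1 = BYTE ... 12 = DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_tag_bytes(tiff: bytes, wanted_tag: int) -> list[bytes]:
    """Return the raw value of wanted_tag from each IFD (page) that carries it."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if len(tiff) < 8 or order is None:
        raise ValueError("not a classic TIFF file")
    magic, ifd_offset = struct.unpack_from(order + "HI", tiff, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    found, seen = [], set()
    while ifd_offset and ifd_offset not in seen:  # guard against offset loops
        seen.add(ifd_offset)
        (entry_count,) = struct.unpack_from(order + "H", tiff, ifd_offset)
        for i in range(entry_count):
            pos = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, ftype, count = struct.unpack_from(order + "HHI", tiff, pos)
            if tag != wanted_tag:
                continue
            size = TYPE_SIZES.get(ftype, 1) * count
            if size <= 4:
                start = pos + 8  # small values live inside the entry itself
            else:
                (start,) = struct.unpack_from(order + "I", tiff, pos + 8)
            found.append(tiff[start:start + size])
        # The offset of the next IFD follows the last entry (0 means none).
        (ifd_offset,) = struct.unpack_from(
            order + "I", tiff, ifd_offset + 2 + 12 * entry_count)
    return found


def embedded_text(tiff: bytes) -> list[str]:
    """Decode tag 37679 from every page, assuming UTF-8 as the post describes."""
    return [raw.rstrip(b"\x00").decode("utf-8", errors="replace")
            for raw in read_tag_bytes(tiff, MODI_PLAINTEXT_TAG)]


def build_demo_tiff(text: str) -> bytes:
    """Hypothetical fixture: a header plus one IFD holding only the text tag."""
    payload = text.encode("utf-8")
    if len(payload) <= 4:
        value_field, tail = payload.ljust(4, b"\x00"), b""
    else:
        # Header (8) + entry count (2) + one entry (12) + next-IFD offset (4).
        value_field, tail = struct.pack("<I", 8 + 2 + 12 + 4), payload
    header = b"II" + struct.pack("<HI", 42, 8)
    # Field type 7 (UNDEFINED) is an assumption made for this demo.
    entry = struct.pack("<HHI", MODI_PLAINTEXT_TAG, 7, len(payload)) + value_field
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)
    return header + ifd + tail
```

Calling embedded_text(build_demo_tiff("some text")) round-trips the string.  Pointed at a real MODI-produced file, the same reader should return one string per page that carries the tag, provided the tag is stored the way this sketch assumes.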

Comments

I never understood the common instruction, often heard at e-discovery conferences, to convert all paper files and all electronic files to TIFF. Sometimes, a presenter will then direct the audience to run OCR on the resulting TIFF files. Grrr.

There is one advantage to TIFF over a multipage PDF: each page can be displayed in a software container and given its own annotations, descriptions, tags, etc., and can be used for pinpoint links. (Still hard to link to p. 43 of a PDF.) But a product like A-PDF Split will "bust up" a multi-page PDF into single-page PDFs (after Bates-stamping page numbers) if that is needed. For most e-discovery uses, though, the multipage PDF works much better.

PDF has supported linking not only to specific pages of a document BUT to specific areas on a document since version 1.2 (Acrobat 3). You can do it from inside the same PDF, from one PDF to another, or from any URL-aware program.

If you wish to link to a page, you add #page=x (where x is the page number) to the end of your URL. If you want to link to an area, then create the "named destination" in the PDF (this is equivalent to an HTML Anchor) and then use #nameddest=foo, where foo is the name you specified.

Works great and no need to split up the document!

Leonard Rosenthol
PDF Standards Architect
Adobe Systems
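
The URL convention described in the comment above can be sketched in a few lines of Python.  This is only an illustration: the helper names and the example.com address are hypothetical, and the named-destination form uses the nameddest parameter as spelled in Adobe's published PDF open parameters.  Whether a given viewer or browser honors the fragment is a separate matter.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _with_fragment(url: str, fragment: str) -> str:
    """Replace any existing #fragment on url with the given one."""
    return urlunsplit(urlsplit(url)._replace(fragment=fragment))


def pdf_page_link(url: str, page: int) -> str:
    """Link to a 1-based page number, e.g. ...exhibit.pdf#page=43."""
    if page < 1:
        raise ValueError("PDF page numbers start at 1")
    return _with_fragment(url, f"page={page}")


def pdf_named_dest_link(url: str, name: str) -> str:
    """Link to a named destination that already exists inside the PDF."""
    return _with_fragment(url, "nameddest=" + quote(name, safe=""))


# Hypothetical document address, used purely for illustration.
link = pdf_page_link("https://example.com/docs/exhibit.pdf", 43)
```

Here link ends in #page=43, so a single multipage PDF can be cited page by page without splitting it.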

>If you wish to link to a page, you add #page=x (where x is the page number) to the end of your URL.

Yeah, I know that is the standard direction. But it does not work from the programs I use, at least under Vista. It might work from others that I do not use, or under other OSs. And if it does not work for me on a regular basis, I suspect it might not work regularly for others as well. That supports the "still hard" comment.
