Will Smarter TIFFs from Microsoft Change E-Discovery?
I trust you know, dear reader, that the only thing dumber than a TIFF file is converting your entire e-discovery collection to TIFF images for review. But, while wholesale TIFF conversion will forever be monumentally stupid and profligate, it appears TIFF files just acquired a few brain cells.
At risk of being revealed as the last kid on the block to figure this out, I learned today that Microsoft offers a way to smarten up TIFF images such that load files--those hinky, stinky electronic bills of lading that must accompany TIFF image productions to make them usable--may no longer be needed.
For those new to this topic, TIFF stands for Tagged Image File Format. Think of a TIFF as a still photo of a document, either one snapshot per page (single-page TIFF) or a snapshot of all the pages laid out on the floor (multipage TIFF). I call TIFFs "dumb" because, unlike the native electronic versions of the documents they replace, TIFFs can't be searched electronically and don't function like native files. To anthropomorphize, TIFFs are so dumb, they don't know what they say. They're especially brain dead when used to replace spreadsheets or other formats that wither on the printed page. To offset their low IQ, TIFFs need literate escorts in the form of load files carrying the document's textual content and metadata.
Of course, there's long been a way to pair an image of a document with its textual content and metadata. It's called Portable Document Format or PDF, and it's made a boatload of cash for Adobe Systems. Need to download a tax form from IRS.gov? It'll be a PDF document. Want to electronically file a pleading in federal court? Be sure it's a PDF or the court's CM/ECF filing system won't accept it. PDF is a pretty smart format. It can even store video, audio and animation, or carry the binary source of any file as an attachment.
So, if native formats are optimum and PDF is so smart, why does anyone still use TIFF for e-discovery? That's the billion dollar question. Is it because of the clout of companies hawking entrenched TIFF-dependent tools and services from the horse-and-buggy time when e-discovery meant scanning and coding paper documents? Or do lawyers so cling to paper and Bates numbers that they turn a blind eye to the staggering cost of their intransigence? Whatever the reason, you can be sure it comes down to someone making more money from lesser technology.
As you gather, I'd pretty much written off TIFF as yesterday's news when today's news mentioned that Microsoft just released a document detailing the purpose and structure of three custom tags which the Microsoft Office Document Imaging (MODI) tool can embed in TIFF images. It's long been possible to embed minimal metadata in a TIFF, e.g., identifying the scanner that made the image, but nothing of much value in e-discovery--certainly nothing to rival a PDF. I'd paid little attention to MODI since it was first introduced because it seemed just another Microsoft bell and whistle that no one rang or blew.
Then, on December 9, 2009, Microsoft revealed the purpose of the catchy-named "Private Tag 37679" and--yowza!--it's a way to embed the text of a document into a TIFF image of the document in UTF-8 format! Don't you just wish you had a little party horn to toot right about now? A little digging revealed that Tag 37679 has been around for years, but it was as undocumented as a Wal-Mart cleaning crew. People could pretty obviously guess what the tag was for, but they couldn't be certain how to implement it.
That's nothing new with Microsoft. They're notorious for embracing an open standard, adding a few undocumented tweaks and labeling it a "new" proprietary standard. Star Trek fans liken Microsoft to the Borg, a race of cybernetic organisms that destroy other races by compulsory assimilation, against which, "resistance is futile." Mentioning the Borg gives me an excuse to attach a picture of Borg Seven of Nine (played by actress Jeri Ryan). C'mon, would you really prefer a picture of a gavel or the Microsoft logo?
To its credit, by finally and formally documenting Tag 37679 and two other private tags, Microsoft makes it feasible for others to develop tools to read and write text-searchable TIFFs. Will anyone support the MODI TIFF format? Doubtful, as a TIFF with embedded text is a far cry from native production or even PDFs. TIFF will probably continue its inexorable slide into e-discovery oblivion. But, just in case TIFF limps along, it's nice to know that there's a way--blessed by none other than Microsoft--to make TIFFs text-searchable without a load file.
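For the technically curious, here's a rough sketch of what reading that embedded text might look like. It walks the tag directory of each page in an ordinary TIFF and pulls out whatever sits under Tag 37679, decoding it as UTF-8. Treat it as an illustration, not a reference implementation: the function name `read_embedded_text` is my own invention, and I'm assuming the tag is stored once per page as a simple run of bytes. Microsoft's document is the authority on the actual layout.

```python
import struct

MODI_TEXT_TAG = 37679  # the private tag number discussed in this post

# Byte widths of the standard TIFF 6.0 field types (1=BYTE, 2=ASCII, 3=SHORT, ...).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_embedded_text(data, tag=MODI_TEXT_TAG):
    """Return one entry per page (IFD): the text under `tag`, or None if absent.

    Assumption: the tag value is a plain byte run holding UTF-8 text.
    """
    # The first two bytes give the byte order; the next two must be 42.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")

    pages = []
    seen = set()  # guards against a corrupt file whose IFD chain loops
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    while ifd_offset and ifd_offset not in seen:
        seen.add(ifd_offset)
        (n_entries,) = struct.unpack_from(order + "H", data, ifd_offset)
        text = None
        for i in range(n_entries):
            # Each directory entry is 12 bytes: tag, type, count, value/offset.
            entry = ifd_offset + 2 + 12 * i
            entry_tag, ftype, count = struct.unpack_from(order + "HHI", data, entry)
            if entry_tag != tag:
                continue
            size = TYPE_SIZES.get(ftype, 1) * count
            if size <= 4:
                start = entry + 8  # small values live inside the entry itself
            else:
                (start,) = struct.unpack_from(order + "I", data, entry + 8)
            raw = data[start:start + size]
            text = raw.rstrip(b"\x00").decode("utf-8", errors="replace")
        pages.append(text)
        # The offset of the next IFD follows the last entry; 0 ends the chain.
        (ifd_offset,) = struct.unpack_from(
            order + "I", data, ifd_offset + 2 + 12 * n_entries)
    return pages
```

To try it, read a file's bytes with `open(path, "rb").read()` and pass them in. A TIFF that never went through MODI should simply come back as a list of `None` values, one per page, which is to say: still dumb.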