It was a happy day (for coffee drinkers) when a 1981 New England Journal of Medicine article, which published a Harvard study linking drinking coffee with pancreatic cancer, was finally double-checked and deemed bunk.
Bunk science is the term generally used when studies, even those meticulously conducted by prestigious academics, go awry: when they show positive but anomalous results.
Sometimes, as with the Harvard study, correlation is mistaken for causation, which, as we all know, confuses fiction with fact.
The existence of bunk science, however, is perhaps a positive thing. Medical studies don’t always have publishable or even particularly interesting results. But, the fact remains, they’re being conducted.
The same applies to the social sciences, where large datasets of findings often have no immediate, practical use. The key word here being “immediate.” It’s possible that one day, a new idea or new research direction could make use of such big data. So, why not hang on to it?
Well, unfortunately, big data is both costly to collect and costly to store. In 2007, the world watched Google take an unprecedented step in storage with its Palimpsest project, which offered to store and, better yet, share gigantic data sets.
For the first time, academics had a way to preserve and perhaps reuse old data. And Google was participating in the name of science.
Sadly, the very next year, Google abandoned the project, citing financial cut-backs.
“As you know, Google is a company that promotes experimentation with innovative new products and services. At the same time, we have to carefully balance that with ensuring that our resources are used in the most effective possible way to bring maximum value to our users,” wrote Robert Tansley of Google on behalf of the Google Research Datasets team to its internal testers, reports Open Access News.
“It has been a difficult decision, but we have decided not to continue work on Google Research Datasets, but to instead focus our efforts on other activities such as Google Scholar, our Research Programs, and publishing papers about research here at Google.”
Scientific research without corporate agenda is rare. And if the financial crisis of 2008 had another victim, it was the trend of setting dark data free.
Dark data is the term applied to all those unused datasets: the ones that are collecting digital dust on somebody’s virtual shelf.
According to Thomas Goetz for Wired Magazine, however, “Technology is actually the simple part.”
“The tougher problem,” Goetz writes, “lies in the culture of science. More and more, research is funded by commercial entities, which deem any results proprietary. And even among fair-minded academics, the pressures of time, tender, and tenure can make openness an afterthought. If their research is successful, many academics guard their data like Gollum, wringing all the publication opportunities they can out of it over years. If the research doesn’t pan out, there’s a strong incentive to move on, ASAP, and a disincentive to linger in eddies that may not advance one’s job prospects.”
Law firm professionals are especially sensitive to this culture—where all information is precious and proprietary.
Luckily, in science, there’s renewed hope.
“There are some islands of innovation. Since 2002, the Journal of Negative Results in Biomedicine has offered a peer-reviewed home to results that go negative or against the grain. Earlier this year, the journal Nature started Nature Precedings, a Web-based forum for prepublication research and unpublished manuscripts in biomedicine, chemistry, and the earth sciences,” reports Goetz for Wired.
“At Drexel University, chemist Jean-Claude Bradley practices ‘open notebook’ science—chronicling his lab’s work and sharing data via blog and wiki.”
Researchers and scientists are trying to keep data from disappearing into the dark. But, for law firm professionals, the need for dark data is dubious.
When should dark data be deleted? When should it be kept? What makes this material a potential liability, and when does that liability outweigh the potential benefits of keeping this material?
These are all questions that law firm managers should be asking, alongside legal IT departments.
Unmanaged, uncategorized content is lurking in your enterprise. This legacy data sits unnoticed in email repositories and file shares, and presents an added challenge for eDiscovery or investigations. A lack of control over ‘dark data’ can result in data spoliation and increased collection, processing, and eDiscovery review costs.
By shrinking the dark data abyss, counsel can dramatically reduce costs and risks during litigation and government investigation. So why isn’t every GC doing it? Because dark data management is confusing, and knowing what to delete and what to keep is no easy task.
You’re not alone. The Center for Competitive Management offers a course on Wednesday, September 24, 2014, from 2pm to 3:15pm EST on “Dark Data: GC’s Guide to Identifying, Managing and Defensibly Disposing of Unmanaged Data.”
During this interactive session, you will learn:
- What ‘dark data’ is, and how it raises eDiscovery costs
- How to create a compelling business case for data handling that is consistent with the business environment
- What makes an information governance strategy legally defensible
- And much more!
Sign up here.