Why Data Deletion Makes Sense (and Dollars)
Conventional wisdom says the cost of storing data is declining. Conventional wisdom is right ... and wrong.
The price of disks has been dropping for years. According to Gartner, the cost of disk storage per terabyte has been falling, too. Additionally, distributed computing, virtual machines and on-demand storage capacity that can be ramped up or down according to a business’ needs all have combined to lower the total cost of ownership ("TCO") for storage. This has led many business executives to believe that the TCO for data storage will continue to decline ad infinitum, allowing them to collect all the data they would like to use to improve performance and drive top-line revenues.
All this would be true if not for several inconvenient truths.
Market research firm IDC estimates that the amount of all digital data created and consumed in 2012 was 2,837 exabytes. (One exabyte equals a million terabytes.) And that number is forecast to double every two years, reaching 40,000 exabytes by 2020.
Meanwhile, ICT Analytics reports that the amount of data being stored is increasing, on average, 45 percent annually. In fact, storage is the fastest growing cost within the enterprise data center.
But, one asks, what about the cloud? Doesn’t cloud computing permit businesses to outsource storage to providers at a fraction of the cost of a proprietary data center?
Yes, it does for some types of data. But it gets complicated for critical data. Data privacy laws vary by industry, by country and sometimes even from state to state. The cloud storage providers’ business model typically assumes they can move data freely from jurisdiction to jurisdiction, optimizing server capacity and availability and thereby controlling their own costs. Adding jurisdiction-specific requirements to a hosting contract often can increase the cost significantly.
In practice, with the rapid acceleration of the volume of data generated (all those exabytes produced by the proliferation of sensors, tablets and smartphones) and the concomitant increase in the data that businesses are storing, the total cost of data storage is not (despite conventional wisdom) declining. How could it? Walmart, for example, handles more than a million customer transactions each hour and imports those transactions into a database estimated to contain more than 2.5 petabytes of data.
Do the math.
If a hypothetical company stores one petabyte of data this year, it will store 1.45 petabytes next year.
If the cost to store data drops 15 percent a year (or even 30 percent at the high end) while volume grows 45 percent, total spending still rises: by roughly 23 percent a year in the first case and by about 1.5 percent in the second. The conventional wisdom that the total cost of storage is declining is, therefore, wrong. And this simple calculation does not include ancillary storage costs such as staffing; data backup; and confirmation that the data collected are accurate, useful and clean.
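The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a cost model: it assumes the 45 percent annual volume growth cited above, treats both rates as constant from year to year, and ignores the ancillary costs just mentioned. The function name and parameters are hypothetical.

```python
def relative_storage_cost(volume_growth, unit_cost_decline, years=1):
    """Total storage spend after `years`, relative to today's spend (1.0).

    Assumes constant annual rates: volume compounds upward while the
    per-unit cost compounds downward.
    """
    yearly_factor = (1 + volume_growth) * (1 - unit_cost_decline)
    return yearly_factor ** years


if __name__ == "__main__":
    # 45 percent volume growth against a 15 or 30 percent unit-cost decline.
    for decline in (0.15, 0.30):
        one_year = relative_storage_cost(0.45, decline)
        five_years = relative_storage_cost(0.45, decline, years=5)
        print(f"unit cost down {decline:.0%}: "
              f"x{one_year:.4f} after 1 year, x{five_years:.2f} after 5 years")
```

Any yearly factor above 1.0 means total spending grows even though the price per terabyte falls. Under these assumptions, the unit cost would have to drop by more than about 31 percent a year just to keep the bill flat.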
This growth in storage and its management is placing a growing burden on all businesses — a hidden tax that is ever increasing. However, this is a tax that businesses can do something about. They can delete a significant percentage of their expensive-to-store data.
Unfortunately, while everybody is storing more data, very few are deleting any. Call it data hoarding.
Data Hoarding: Sense and Nonsense
Not all data that businesses collect are useful. Indeed, as the enterprise’s haystack of data climbs ever higher, businesses often do not know what data they possess. Much of the information may be, and frequently is, junk, and data analysts waste time working with this junk, finding spurious patterns within it, thus hindering the company’s decision-making capabilities while incurring needless costs.
Why do businesses collect and store more data than they are able to process and use? One reason is Big Data hype and the vague belief that more is better — that somewhere in that ever-growing haystack is a golden needle that will produce new insight and generate additional revenues. This, however, is not a business strategy; it is a business wish.
Another reason businesses store data is fear of the possible legal consequences that may arise from deleting information. U.S. Securities and Exchange Commission regulations, for instance, demand that brokers and dealers retain all client account information for six years and copies of all reports requested or required by regulators for three years. Regulations such as these encourage data hoarding, as many businesses believe that in the current rigorous regulatory environment, it is safer to keep everything and delete nothing. There is, in effect, no obvious incentive to delete, and underpreserving creates risk if data later are deemed critical or discoverable.
Recognizing this growing problem, and the potentially unreasonable persistence of data, some European states have proactive deletion policies, especially in cases such as employee performance reviews and disciplinary actions. According to the European Union Advisory Board on Data Protection and Privacy, "The annual assessment of a worker contains information regarding a concrete date and a given contact. After some years, there is no reason in principle to store the information regarding such evaluations. Therefore, the retention period should be limited to two or three years maximum after the evaluation."
In litigation, U.S. courts may instruct juries to draw a negative inference from the absence of relevant data such as emails, thereby encouraging businesses to store everything in the event there ever is a request to produce information in the discovery phase of a lawsuit or trial. However, that inference applies only if there was a duty to preserve the data in the first place. Unfortunately, that duty rarely is defined before a case is brought. Overpreserving, and failing to remediate backup materials, results in additional costs when there is a request to produce, as attorneys or e-discovery providers must spend time reviewing a greater quantity of material.
The hours add up.
A 2012 RAND study found the cost to review one gigabyte of data was $18,000. Of course, improvements in e-discovery and predictive coding technologies can reduce those costs, but, again, as volume increases, those savings can be devoured.
Volume is key and creates its own risks. For one thing, if more data are stored, there, obviously, is a greater amount of data to lose. Recent high-profile data breaches at various retail and entertainment companies have made public enormous troves of data.
Breaches are expensive. According to a recent Ponemon Institute study, the average total cost to an organization of a data breach in 2014 was $5.85 million.
That’s real money.
And today, even smaller companies are collecting and storing an ever higher volume of data as smartphones make data more available to businesses. Almost all retail sectors are seeing enormous growth in smartphone purchase conversion. According to Cisco’s Visual Networking Index forecast, global Internet Protocol ("IP") traffic will grow at a compound annual growth rate of 20+ percent from 2013 to 2018, with over half of that traffic coming from devices other than personal computers. All these collected data attract hackers and other criminals, as personal credit information (which either can be used or sold) becomes more available and accessible.
Businesses can attempt to secure their data — as they should — but recent history indicates there’s no guarantee they can do so successfully. The simplest solution to the risk and expense of collecting and storing too much data is deleting the data not needed.
Getting Rid of Junk Data Requires Information Governance
Storing data that businesses don’t have to keep ends up absorbing capital that otherwise could be deployed on operations or investments or return on capital. If a business chooses to reduce spending by cutting budget or laying off workers, in effect, it has (perhaps unknowingly) chosen data — much of which may be junk — over working capital and productive employees. It, therefore, is important to understand that junk data — and the attendant tax they levy on a company’s resources — are not an information technology ("IT") problem; they are a business problem.
To attack the junk data issue, businesses must take a holistic view of the challenge, working across functions. That includes the chief information officer and the chief financial officer, as well as the company’s Legal, Compliance and Security departments. Working together, the company can determine what data it needs to store and what data it can delete. The return on investment ("ROI") of deletion will become visible to the business as it begins to understand the extent of the resources needed to secure that data.
This is known as information governance. Good information governance requires creating a map of information assets across the business units, including cloud applications. This is the first step toward accurately classifying and categorizing data and allows a comprehensive assessment of which assets should be retained and which can be deleted.
Developing defensible statistical sampling protocols can help businesses reduce large amounts of stored media. Indexing and machine analysis of backup media can pinpoint what data should be preserved and what can be deleted.
Trying to delete large quantities of data manually is difficult and expensive; it is a process that begs to be automated. This means establishing machine rules that mandate the deletion of unnecessary and vulnerable duplicates. These are created when multiple copies of documents or files are downloaded to often-insecure devices or when individuals email files to themselves. It has been estimated that in a number of companies, duplicated files represent 20 percent to 40 percent of the data. Reducing duplication is a good thing. It improves operational efficiency, as duplicate data drive up data volume while slowing processing times and hampering business agility. Deleting duplicate data also decreases legal review costs as attorneys no longer have to examine repetitious documents. Good information governance is an investment with an immediate and long-term ROI.
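As a rough illustration of what such a machine rule might look like, the Python sketch below groups files by a SHA-256 hash of their contents and flags every copy after the first as a deletion candidate. It is a simplified example under stated assumptions, not a description of any particular governance product: the function names and the "keep the first copy" rule are hypothetical, and a real rule set would also check legal holds, retention schedules and access permissions before anything is removed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only the digests shared by two or more files.
    return [paths for paths in by_digest.values() if len(paths) > 1]


def deletion_candidates(groups):
    """Hypothetical rule: keep the first copy in each group, flag the rest."""
    return [path for group in groups for path in group[1:]]
```

The sketch only reports candidates and deletes nothing, so the resulting list can be reviewed by Legal and Compliance before any rule is allowed to act on it.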
For example, in 2014, multinational metals and mining company Rio Tinto, which was generating a rapidly growing volume of data, identified approximately 40 percent of its stored data as junk or, in the words of its head of global business services, "eligible for defensible destruction."
Acknowledging that Rio Tinto, like most large companies, is not good at "hitting the delete key," the executive said the company saw "a strong ongoing business case" for lowering storage costs "while strengthening our overall information governance across Rio Tinto."
It has been estimated that Rio Tinto immediately saved $8 million simply by eliminating 35 percent of the file shares in its network.
In another instance, a top-tier financial institution was able to get rid of useless log files (records of requests to servers saved to hard drives, including those created during system installations) that were stored in the depths of its IT system and provided no value whatsoever. Working with FTI Consulting, the bank was able to delete hundreds of useless terabytes of data. At a cost to store of $3.20 a gigabyte, the company saved over $600,000 in the first year and more than $3 million over five years.
Another financial institution was sending thousands of backup tapes every month to an information management services company. Although the cost of storing tapes isn’t large, the software that makes the tapes must be licensed from a software provider — a recurring and perpetual expense. Reducing the number of tapes and licenses translated to impressive savings for the firm.
Of Course, No One Said It Would Be Easy
In many businesses, data storage is considered an IT issue, and if IT tells a business unit leader that it wants to delete the unit’s data, there’s generally pushback. After all, the data belong to the business unit, not to IT, and maybe, just maybe, the information is valuable.
Even when an enterprise recognizes that it has a data retention problem, business-level views do not always align. The issue is that each business function considers data differently. Various functions have unique needs, requirements and targets, and these factors often discourage deletion. Overcoming them requires someone with the appropriate perspective and seniority to see across the business’ fiefdoms and work with Legal, Compliance, Security, IT and the business units to implement an information governance plan and begin deleting junk data. This is why, in the long run, information governance efforts have to be led from the top.
No End to the Data Deluge
As smartphone adoption and use increase, the digital universe will continue to grow. Right now, its size beggars the imagination. In a few years, it will defy it. Unless businesses begin deleting data they do not currently need, they will jeopardize the technological, financial and operational resources available to collect, process and analyze the torrent of incoming data they will need later on. This may place them at a future competitive disadvantage while increasing the financial and legal risks they already face.
Deleting data is not really about saving money; it is about not wasting money and spending it, instead, on initiatives and innovations that drive revenues.
Deleting data, and the information governance processes that enable enterprises to do so safely and securely, is just good, and logical, business.