Planned vs. Unplanned Downtime
Traditional data protection strategies focus on reactive response to unplanned events and the associated unplanned downtime. Yet storage, and consequently server and application, unavailability results from both unplanned and planned downtime. According to some estimates, over 80% of all downtime is planned. Typically, somewhat more than half of this planned downtime is attributable to database backup, while the maintenance, upgrading and replacement of application and system software, hardware and networks accounts for most of the rest (Enterprise Management Associates, Inc., 2002).
Although much of this planned downtime occurs during “non business” hours, changes to the business environment brought about by globalisation, online ordering, back office consolidation, more flexible work practices and many other factors mean that the number of “non business” hours is gradually, but inexorably, being reduced, while the amount of data that needs to be protected is growing exponentially.
There are also problems other than downtime itself that stem from data protection systems dependent on manual configuration and management. Highly skilled personnel with multiple skill sets are required to manage, configure, and optimize the performance of large, distributed data protection infrastructures. Unfortunately, at the same time as these individuals are becoming more difficult to hire and retain, traditional data protection regimes are forcing them to perform most of their implementation and troubleshooting at a time when most people would prefer to be in their beds or with their families. While the professionalism of data protection specialists is typically high, the pressures of late nights, increasing workloads and decreasing resources often lead to increased staff turnover, with a consequent rise in data loss caused by human error.
Why tape may be unsuitable for long term archiving
Other than the difficulty of expunging data which should no longer be kept, tape is a poor choice for long term archives for two other reasons. The first is that the Commonwealth and State Electronic Transactions Acts, which set out the legal requirements for electronic transactions and the archiving procedures for electronic records, state that data retention methods must allow for changes in technology. Tape has a poor track record in meeting this requirement: for example, data recorded on a “DLT III” tape cartridge, which was still widely used less than six years ago, cannot be read by any commercially available tape drive today.
Secondly, “For any information kept in electronic form, the method of generating the information must provide a reliable way of maintaining the integrity of the information, unless a specific storage device is provided for by the relevant legislation.” Tape is a relatively delicate contact medium, which degrades with use, can become physically damaged, and is adversely affected by swings in environmental conditions. Data stored on tape can also be lost through exposure to magnetic fields. Thus, in order to provide “a reliable way of maintaining the integrity of the information”, tapes must be periodically refreshed (read and rewritten). Managing refresh cycles for hundreds of tapes written over many years is a complex and extremely costly task with potentially serious consequences if not managed properly.
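To put that refresh burden in perspective, here is a back-of-envelope sketch. The tape count, refresh interval and per-cartridge effort are all assumptions chosen for illustration, not figures from any vendor or standard.

```python
# Rough sketch of the ongoing tape-refresh workload for a long-term archive.
# All three inputs are illustrative assumptions.
tapes = 500                 # archive cartridges accumulated over the years
refresh_interval_years = 5  # rewrite each cartridge every 5 years
hours_per_refresh = 3       # read, verify and rewrite one cartridge

refreshes_per_year = tapes / refresh_interval_years
hours_per_year = refreshes_per_year * hours_per_refresh
print(hours_per_year)  # 300.0 drive-hours per year just to keep the archive readable
```

Even with these modest assumptions, the archive consumes hundreds of drive-hours a year before a single byte of new data is protected, and every one of those refresh jobs is an opportunity for handling error.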
Is tape really capable of keeping data for long periods of time?
Some tape media, such as LTO-4, are often touted as having a 30-year archival life. For a technology that is less than three years old, this kind of claim can only be relied upon through a fair degree of faith in the vendors’ statistical analysis techniques. IT and business management are asked to take this leap of faith while accepting that, unlike disk, there is little or no hard data published for tape on “mean time to data loss” or annual failure rates under various conditions. Given the high rates of dissatisfaction with tape based backup, vendor claims of long-term reliability may need to be reviewed with greater vigour.
Tape is only as good as its handlers
One of the major failings of tape is not the technology itself, but the way in which it is treated. In many cases the staff entrusted with tape management and movement are in entry level IT positions, or are semi-skilled third party couriers. Even tape media manufacturers openly acknowledge that expected archival lifetimes apply only to tapes kept in “optimum” operating and storage conditions of 16°C to 25°C, 20% to 50% relative humidity, and no shock or vibration; none of which apply to courier vans. In addition, tape drives must themselves be subject to rigorous preventative maintenance. A tape used in a poorly maintained drive can pick up dirt and debris from dirty heads, roller guides and other transport assemblies. When such a dirty tape is subsequently used in a good drive, it may transfer some or all of those contaminants and degrade a previously clean drive. As the new drive becomes contaminated, a variety of problems can result, including premature head wear, debris accumulation on critical parts of the drive transport, and damage to the tape. This leads to an even larger media impact, as any new tapes that are used in the drive can also be damaged.
A further cautionary note when using a backup application for long-term archive is that the media is recorded in a proprietary logical format readable only by the originating application. Backup vendors have been known to discontinue backwards read compatibility for their own logical tape formats, and a change of backup vendor, or of products from the same vendor, may make recovery from archived tapes difficult, if not impossible. Thus, a true long-term archive would also require archiving the entire backup system, including the computer, recording hardware and software, as well as multiple copies of the media.
While this post is really more about archiving than it is about backup, almost every backup environment I’ve ever come across is still used to store long term archives on tape media. Most of the time this works reasonably well, but far too often it doesn’t. To put this in perspective, if you needed the data on the tape to defend yourself in a legal battle, how would you feel about only having a “pretty good chance” of getting the information you needed?
How backup systems fail to satisfy regulatory requirements
If we then examine the typical backup retention policies against Australian regulatory requirements, we find that Sections 9 and 286 of the Corporations Act state that the following information needs to be kept:
Financial Records (invoices, receipts, orders for the payment of money, bills of exchange, cheques, promissory notes, vouchers and other documents of prime entry; and such working papers and other documents as are necessary to explain the methods and calculations by which accounts are made up) that correctly record and explain the transactions (including any transactions as trustee) and would enable true and fair financial statements to be prepared.
It is this legislation that drives the vast majority of the “7 year retention” requirement. This is then applied as a blanket policy across all data types, regardless of whether the data falls under the definition of financial records above. This often results in large amounts of data being kept with little or no business justification.
Unfortunately, even for data which does fall under the Act, the “keep monthly backups for 7 years” policy does not completely satisfy the above requirement. Take for example a spreadsheet meeting the definition of a “working paper” above, that was created on the 4th day of the month, used as the basis for a transaction on the 6th day of the month, and then inadvertently deleted or changed on the 9th day. There is no guarantee that such a document will appear in the monthly archives, as it was created after the previous month’s backup and destroyed before the current month’s backup took place.
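The timing gap described above can be sketched in a few lines. The dates and the month-end backup schedule are hypothetical, but the logic is general: any file whose entire lifetime falls between two backup runs is simply never captured.

```python
from datetime import date

# Hypothetical schedule: a full backup on the 28th of each month.
backup_days = [date(2024, m, 28) for m in range(1, 13)]

# The "working paper" from the example: created on the 4th,
# used for a transaction on the 6th, deleted on the 9th.
file_created = date(2024, 3, 4)
file_deleted = date(2024, 3, 9)

# Was any backup taken while the file existed?
captured = any(file_created <= b < file_deleted for b in backup_days)
print(captured)  # False: the file lived and died between two backup runs
```

No matter how long the monthly tapes are retained afterwards, a record that was never on any of them cannot be recovered from them.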
If this wasn’t bad enough, there are a number of other regulations that require data to be kept for a certain period of time after a specific event has passed. One example of this is the Workplace Relations Act, which requires pay slips to be kept for seven years after employment is terminated. In the case of an employee who has been working for five years, the “keep monthly backups for 7 years” retention policy would begin to cause potential non-compliance two years after the termination of that employee. As a final complication, the Privacy Act 1988 states that an organization must take reasonable steps to destroy or permanently de-identify personal information if it is no longer needed; a legal requirement which may prove very difficult to comply with if tape backup is the primary method used for data archiving.
The reason traditional backup systems fail for regulatory compliance is that they were never designed for the task; nor was tape, the traditional backup media of choice.
Although there is considerable overlap in the functional requirements, backup is not the same as archive or disaster recovery. If people allowed a backup system to be just that, without overloading it with other non-core requirements, then it would have a good chance of meeting its data availability targets at a reasonable cost. However, while it tries to carry these additional burdens, it is beaten before it has even started the race.
Data protection is about data availability: backup is not a business requirement, data availability is.
IT professionals have at their disposal a variety of tools to help businesses achieve their data availability goals, including backup, restoration, replication and recovery, but it is critical to keep focussed on the actual goal, which is the availability of the data, and to balance that goal by using the right set of tools for the specific job. Held in balance are concepts like data importance or business criticality, budget, time to deploy, operational capability and costs of downtime.
Having stated that a data protection and retention solution should be designed from a balanced set of well-defined business requirements, it should be noted that this is rarely the case. Instead, data protection solutions are designed and propagated based on inherited requirements that:
- were not originally specified in consultation with the business,
- have not been adequately reviewed, and
- are inappropriate in the current regulatory, corporate, or technical environments.
The unfortunate fact is that backup, like insurance, is an aspect of risk management, and that every dollar spent on backup is a dollar that cannot be spent on IT projects that have the potential to improve the bottom line. Because of this, it is important for IT management to ensure that every dollar spent on data protection is not wasted on ineffective and outdated strategies.
The Origin of Typical Data Protection Requirements
Unfortunately, data protection processes and planning often revolve around the unsettling question of “how much can you afford to lose?”, a question that rarely meets with a well thought out or balanced response. While it’s understandable for the business to ask that “no data shall be lost under any circumstances, and all data created at any time shall be kept indefinitely”, the costs of doing so are prohibitive. The sad fact is that, in the absence of a detailed cost benefit analysis of the various options, these are the kinds of requirements that the backup designer has to satisfy, often without adequate resources.
Rather than asking the difficult questions, many IT organizations fail to engage in any meaningful dialogue with the business, and fall into the habit of continuing whatever had gone before, implementing a data protection regime similar to the following:
- Changed data is backed up to tape every night and kept for two to four weeks
- All data is backed up to tape once a week and kept for four to ten weeks
- All data is backed up to tape once a month and is kept for seven years
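The inherited schedule above can be expressed as a simple policy table; the job names and exact retention values here are illustrative, but writing the policy down this way makes it obvious which tier dominates long-term media costs.

```python
# Sketch of the typical inherited backup schedule as a policy table.
# Names and retention figures are illustrative, not from any product.
retention_policy = [
    {"job": "nightly incremental", "scope": "changed data", "retention_days": 28},
    {"job": "weekly full",         "scope": "all data",     "retention_days": 70},
    {"job": "monthly full",        "scope": "all data",     "retention_days": 7 * 365},
]

longest = max(p["retention_days"] for p in retention_policy)
print(longest // 365)  # 7 — the monthly fulls dwarf every other tier
```

Seen side by side, the monthly tier retains data roughly 35 times longer than the weekly tier, yet it is the tier least likely to have been justified against any actual business or regulatory requirement.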
In addition, to protect against site failures, they may also perform the following procedures:
- Backup tapes, or copies of backup tapes are sent offsite
- “Mission Critical” data is synchronously replicated via dedicated high speed networks to an alternate data centre.
The advantage of these data protection regimes is that they are well known, and serve as a general “catch all” solution. These are then signed off by management as “standard IT practice”. What surprises me is that this practice continues despite numerous examples of its inadequacy. As a case in point, over 25% of enterprise IT organizations report being either dissatisfied or very dissatisfied with tape based backup infrastructures, and synchronous replication is renowned for instantaneously replicating data corruption across two sites in a variety of data loss scenarios.
None of this is just theory: I talk to many of Australia’s largest organisations and best system integrators, and whenever I talk about this I see many heads nodding in agreement.
I’ll spend the next week or so detailing exactly what I believe is wrong with the way most people approach backup, and then, once my spleen is well and truly vented, I’ll write about how I think this should be fixed. (For those who can’t wait, the answer is replication based backup combined with image based dumps to removable media.)
 Paraphrased from SNIA Education “Disk and Tape Backup Mechanism”
 The Continued Shift to Disk based data protection solutions – Forrester Research 2007