
Some Thoughts on Bit Rot.


During some recent discussions on Twitter, the subject of rebuild times for very large disk drives (in excess of 10TB) raised the topic of unrecoverable read errors (UERs), which are sometimes blamed on something called "bit rot". However, two NetApp-sponsored studies show that bit rot is far less of a problem for storage array reliability than many other factors.

The best publicly available data I've found on bit rot and its impact compared to other causes is contained in "A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)" by Jon G. Elerath and Michael Pecht, IEEE Transactions on Computers, Vol. 58, No. 3, March 2009 (http://media.netapp.com/documents/rp-0046.pdf). The following summarizes and paraphrases the information found in that paper.

What bit rot is and why you should care

Bit rot is a concern for two main reasons. For the home user with no RAID protection, it results in the inconvenience of a lost or corrupted file, or possibly a machine that won't boot. For the enterprise user, bit rot raises the spectre not just of a lost or corrupted file, but of losing an entire RAID group after the failure of a single drive, due to the "media error on data reconstruct" problem. The less catastrophic issue matters far less on an enterprise-class array, because the additional error detection and correction provided by RAID and block-level checksums makes the chance of bit rot causing the loss or corruption of a file vanishingly remote.
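To put the "media error on data reconstruct" problem into rough numbers, here is a minimal back-of-the-envelope sketch. The drive sizes and the 1-in-10^14 / 1-in-10^15 unrecoverable-error rates are illustrative spec-sheet style assumptions, not figures from the paper cited above:

```python
# Back-of-the-envelope odds of hitting at least one unrecoverable
# read error (UER) while reading an entire surviving drive, as a
# RAID rebuild must. Assumes independent bit errors at a constant
# rate, which real drives only approximate.

def p_uer_during_full_read(capacity_tb: float, errors_per_bit: float) -> float:
    """Probability of at least one UER when reading capacity_tb
    (decimal) terabytes at the given error rate per bit read."""
    bits_read = capacity_tb * 1e12 * 8              # TB -> bits
    p_clean = (1.0 - errors_per_bit) ** bits_read   # every bit reads OK
    return 1.0 - p_clean

for tb in (1, 4, 10):
    for rate in (1e-14, 1e-15):                     # common spec-sheet rates
        p = p_uer_during_full_read(tb, rate)
        print(f"{tb:>3} TB, 1 error in {1/rate:.0e} bits: "
              f"P(UER during full read) = {p:.1%}")
```

On these assumptions, a full read of a 10TB drive at a 1-in-10^14 rate has roughly a 55 percent chance of hitting at least one unrecoverable error, which is exactly why a single-drive failure in a large RAID group is treated as so dangerous.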

What I believe most people mean by bit rot could be more accurately described as latent media errors; "bit rot" in the strict sense is degradation of the magnetic properties of the media itself.

The distinction matters because most early RAID reliability models assumed that data remain intact unless destroyed by "bit rot". Although it is true that the magnetic properties of the media can degrade, this failure mechanism is not a significant cause of data loss. Data can become corrupted any time the disks are spinning, even when data are not being written to or read from the disk. The failure mechanisms outlined below are not unknown, but neither is information about them readily available from HDD manufacturers.

Common causes of losing data

Four common causes of losing data after it has been correctly written are thermal asperities, scratches, smears, and corrosion.

  • Thermal asperities are instances of high heat for short durations caused by head-disk contact. This is usually the result of heads hitting small "bumps" created by particles embedded in the media surface during the manufacturing process. The heat generated by a single contact may not be sufficient to thermally erase data, but many contacts may be.
  • Although disk heads are designed to push particles away, contaminants can still become lodged between the head and the disk. Hard particles used in the manufacture of an HDD can cause surface scratches and data erasure any time the disk is rotating.
  • Other "soft" materials, such as stainless steel from assembly tooling, tend to smear across the surface of the media, rendering the data unreadable.
  • Corrosion, although carefully controlled, can also cause data erasure, and may be accelerated by the heat generated by thermal asperities.

Why data is sometimes not there in the first place

A latent defect can also be caused by data that was incorrectly or incompletely written to the disk in the first place. This can happen because of the inherent bit error rate (BER), writing to damaged media, excess lubrication, and "high-fly writes".

  • The bit error rate (BER) is a statistical measure of the effectiveness of all the electrical, mechanical, magnetic, and firmware control systems working together to write (or read) data. Most bit errors occur on a read command and are corrected, but since written data are rarely checked immediately after writing, bit errors can also occur during writes (see the read-back sketch after this list).
  • The BER accounts for a fraction of the defective data written to an HDD, but a greater source of errors is the magnetic recording media that coats the disks. Writing on scratched, smeared, or pitted media can result in corrupted data. The causes of scratches and smears were covered earlier; pits and voids are caused by particles that were originally embedded in the media during the manufacturing process and subsequently dislodged during polishing or field use.
  • The final common cause of poorly written data is the "high-fly write". The heads are aerodynamically designed to have a negative pressure and maintain a small, fixed distance above the disk surface at all times. If the aerodynamics are disturbed, the head can fly too high, resulting in magnetically weak data that cannot be read back. In addition to "wind gusts" inside the drive, all disks carry a very thin film of lubricant to help protect against head-disk contact. While this lubricant helps mitigate the effects of thermal asperities, lubricant build-up on the head can increase the flying height, resulting in weak or incomplete writes.
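Because written data are rarely checked immediately, one mitigation at the application or filesystem layer is read-back verification against a checksum. Below is a minimal sketch of the idea; note that without bypassing the OS page cache (for example via O_DIRECT on Linux) the read-back may be served from memory rather than the platter, and the file path here is made up for illustration:

```python
# Minimal read-after-write verification sketch. The read-back may be
# satisfied from the page cache rather than the disk surface, so this
# catches write-path problems (torn or short writes) but not
# necessarily media-level corruption.

import hashlib
import os

def write_and_verify(path: str, data: bytes) -> None:
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # push the data down to the device
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"read-back checksum mismatch on {path}")

write_and_verify("/tmp/block.dat", os.urandom(1 << 20))  # 1 MiB of test data
```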

Where's my data?

Finally, all the data may have been written correctly, but the disk may not be able to "find" it because of damage to the special "servo" tracks that keep the heads correctly aligned with the data on the disk. In other cases it is not damage to the servo tracks: wear and tear on the motor and disk head bearings, noise, vibration, and other electromechanical errors can cause head positioning to take too long to lock onto a track, which ultimately also causes latent block errors.

How to protect yourself

There are two main ways of dealing with these kinds of latent block errors. The first is to perform disk scrubs, which every reputable array vendor does; the problem, however, is that as disks get larger and larger, a full disk scrub can take too long for the protection to be as effective as it should be. The second is to use additional levels of RAID protection, such as RAID-6, which provides higher levels of resiliency and error correction in the event of hitting a latent block error while reconstructing a RAID set. NetApp uses both approaches, as studies have shown that the risk of losing data through these kinds of events is thousands of times higher than predicted by most simple MTBF-based failure models.
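To see why scrub windows struggle to keep up with capacity, here is a quick sketch. The 150MB/s sustained read rate is an illustrative assumption, and the result is an optimistic lower bound, since a real scrub runs throttled in the background behind foreground I/O:

```python
# Lower bound on full-disk scrub time at a sustained sequential read
# rate. Production scrubs are throttled to protect foreground I/O,
# so real elapsed times are typically far longer.

def scrub_hours(capacity_tb: float, mb_per_sec: float = 150.0) -> float:
    seconds = (capacity_tb * 1e6) / mb_per_sec   # decimal TB -> MB
    return seconds / 3600.0

for tb in (1, 4, 10, 20):
    print(f"{tb:>3} TB at 150 MB/s: at least ~{scrub_hours(tb):.1f} hours")
```

Even this idealized bound puts a 10TB drive at most of a day per pass; throttled to a fraction of that rate, a single scrub cycle can stretch to a week or more, during which latent errors sit undetected.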


Comments
  1. November 17, 2010 at 9:21 am

    Good post. There is very good research that is more recent, especially the paper by Oprea and Juels published in the FAST '10 proceedings. In that paper they show that staggered scrubbing is superior to standard scrubbing, and even more so on disks with a higher propensity for latent sector errors, or LSEs (in other words, the worse the disk, the better staggered scrubbing performs). Still, since scrubbing involves reading the disk, usage-related errors also rise proportionately as a result of scrubbing. Thus there is a balance between scrubbing too much and too little, which the Oprea/Juels paper also goes into: over-scrubbing can lead to an increase in LSEs, exactly the opposite of the desired effect.

    But by far the best way to protect disks from LSEs is to control the environment in which they are placed. Gibson et al. did a very good paper on that topic. This is why my company goes to the lengths it does in the ISE to protect the drives and control the environment, heat and vibration in particular. Also, the use of T10 DIF eliminates many write-time errors which would normally be caught only by accident (application reads) or on purpose via scrubbing. In other words, prevent the errors from occurring in the first place; prevention is always the best medicine. These techniques lead to nearly two orders of magnitude fewer LSEs observed in the field.

  2. November 18, 2010 at 8:44 am

    Thanks, good to know. I've been meaning to get up to speed on the T10 DIF stuff; looks like I'll have to hurry that up a little. Do you have a link to the LISA presentation material / paper?

    By the way, feel free to include appropriate context links to product information for your company's products that you mention, so long as it's factual and interesting. I think it does the readers of this blog a favour to make good information easy to find.

    Regards
    John

