Tuesday, December 01, 2009

Unexpected relationship between hard-drive life and temperature


Today, I was reading Failure Trends in a Large Disk Drive Population [PDF], a February 2007 paper by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso of Google, containing lots of great data on hard-drive failures and the difficulty of predicting same. The above graph depicts one of the more interesting findings, which is that the effect of operating temperature on disk reliability appears to vary with disk age, such that younger drives tend to be more susceptible to low-temperature failures, whereas older drives tend to be more susceptible to failure at elevated temperatures. Results are grouped by age of disk at failure, then broken out into subgroups (histogram bars) based on their operating temperature. So for example, among disks that failed at 3 years of age, the Annualized Failure Rate (AFR) was about 15% for those disks that had had operating temps of 45 deg. C or more, versus a fail rate of 5% for those that had seen temps of less than 30 deg. C.

Many people have assumed that high temps are bad for disks. And indeed maybe they are bad for 3-year-old disks, but disks that fail at younger ages tend to be much more traumatized by cold than by heat. Pinheiro et al. give additional data for this, and it's pretty convincing. For example, if you have a look at Fig. 4 of the paper, you'll see a bathtub curve, showing that extremes of temperature are deleterious to disk life expectancy. Ironically, the bathtub curve reaches its lowest point at around 38 deg. C -- very close to human body temperature.