The Little Drive That Could

In the couple of decades of my professional life, nothing was more painful, stressful, long or boring than hard drive recovery.

I've lost data that wasn't backed up, I've wasted days watching a very slow-moving progress report from ddrescue, and I've physically opened drives to move platters into a working enclosure in a freaking clean room.

I've had to run to places where I could get a big enough drive because my spider-sense was tingling on a Sunday night to launch into an emergency clone of a production-running machine.

Platter hard drives are a thing of the past. They are slow, and ridiculed on a daily basis. Who the hell waits for 2 goddamn minutes for an OS to boot? Everything is backed up in the cloud, it doesn't matter if your shiny flash storage dies, because a copy will be available by the time you finish installing the new one.

News flash: most cloud-based backup systems use platter drives. Why? Because they are reliable in the long run. Maybe SSDs will be too, in time. It's a fairly new technology that hasn't been tested by the heavy-duty storage industry just yet, and future drives may incorporate new techniques to work around the technology's inherent flaws.

A 2016 paper summarizes this:

An obvious question is how flash reliability compares to that of hard disk drives (HDDs), their main competitor.
We find that when it comes to replacement rates, flash drives win. The annual replacement rates of hard disk drives have previously been reported to be 2-9%, which is high compared to the 4-10% of flash drives we see being replaced in a 4 year period.
However, flash drives are less attractive when it comes to their error rates. More than 20% of flash drives develop uncorrectable errors in a four year period, 30-80% develop badblocks and 2-7% of them develop bad chips.
In comparison, previous work on HDDs reports that only 3.5% of disks in a large population developed bad sectors in a 32 months period – a low number when taking into account that the number of sectors on a hard disk is orders of magnitudes larger than the number of either blocks or chips on a solid state drive, and that sectors are smaller than blocks, so a failure is less severe.
In summary, we find that the flash drives in our study experience significantly lower replacement rates (within their rated lifetime) than hard disk drives. On the downside, they experience significantly higher rates of uncorrectable errors than hard disk drives.
(from this paper)
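
The two replacement figures in that quote are not on the same time scale: the HDD numbers are per year, the flash numbers cover four years. As a rough sketch (my own back-of-the-envelope arithmetic, not the paper's method), here is what the annual HDD range would look like over four years, assuming a constant rate and independent years, which real drives certainly don't guarantee. The function name `cumulative_rate` is just something I made up for illustration.

```python
def cumulative_rate(annual_rate: float, years: int) -> float:
    """Chance of at least one replacement over `years`.

    Assumption (mine, not the paper's): the annual rate is constant
    and each year is independent of the others.
    """
    return 1 - (1 - annual_rate) ** years


# Annual HDD replacement rates quoted above: 2% to 9%.
hdd_low = cumulative_rate(0.02, 4)
hdd_high = cumulative_rate(0.09, 4)

print(f"HDD over 4 years (estimated): {hdd_low:.1%} to {hdd_high:.1%}")
print("Flash over 4 years (as quoted): 4% to 10%")
```

Under those assumptions the HDD range comes out to roughly 8% to 31% over four years, against the quoted 4-10% for flash, which is the comparison the quote is making.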

I honestly don't have an opinion yet on SSD vs HDD, and I was quite happy when I switched my older computers to SSDs to extend their lifespan by speeding them up. It's just too early to tell.

Anyways, the main point was to offer a probable eulogy to my last Apple-branded HDD.

WCAR00008188 was manufactured in Thailand on April 7, 2007, as part of the WD2500AAJS (Caviar) family of 250GB drives, and shipped with my first Mac Pro. It came pre-installed with Mac OS X 10.4 "Tiger", which was replaced almost immediately by the next version of the OS, and it served me at the time as my main development machine for tools for the DVD/Cinema industry. It was running Xcode 2 and Final Cut Pro 7 daily, and even had a Boot Camp partition for Windows-specific software.

When my new (and last) Mac Pro came in to replace its ageing predecessor (which couldn't run 10.8 because "64 bits", even though it could run 64-bit Windows and Linux no problem), the Mac OS part was moved to a smaller disk I had lying around. The drive then got NetBSD on the full disk for a while, as a Xen host for the various guest Linux and Windows systems I needed at the time, and served both as my primary backup and my primary test machine for server stuff. It ran 24/7, rebooting only once in 3 years because of a major kernel update. Don't change anything, NetBSD, I love you so much.

For five straight years, it clicked and it clacked happily through life until I developed a thing for GPU shenanigans, which meant retooling that older Mac Pro into a multi-GPU CUDA/OpenCL machine, and NetBSD doesn't excel (yet) at those things, being less about the bleeding edge than Linux. For the past two years, the whole disk has been a Linux installation, with multiple workers for GitLab instances, dockerized server testing stuff, and the main ML/3D workhorse of the house, which naturally led to more clicking and clacking of its little heads.

On Thursday, April 4, 2019, a routine Python upgrade took 2 more hours than it should have (that is, it took 2 freaking hours), and the clickety-clack became erratic, with long stretches of silence. Naturally distressed, the sysop (me) finished the upgrade, powered down the computer for the first time in probably 3 years, extracted the disk from its enclosure and ran a diagnostic. The disk had 20 or so bad blocks, but what was more concerning was that the test ran for 12 hours. That probably meant that the rotors were in bad shape, but not the platters or the heads.

On Friday, April 5, 2019, a decision was made to retire the drive and clone its entire content to a newer, faster one that wouldn't - unfortunately - click and clack: a 500GB SSD I had lying around for test purposes. partclone has been at it for 26 hours at the time of this writing, and has failed to copy half a dozen blocks, probably some data from the latest batch of commands. But it's 90% done, and WCAR00008188 is doing its best to perform its duty by giving back 99.9999% of the data it was entrusted with to its successor, before retiring from a long, hard, and too often thankless life.

WCAR00008188, you have been working for almost exactly 12 years straight, by my side through all manner of data shenanigans, and I can genuinely say I wouldn't have been able to go that far without you and your brothers. I will keep you around for sentimentality's sake. You did not die an ignominious death, failing at the wrong time and causing the sysop (me) grief and anguish; instead, you performed your last act of duty with honor and pride, and for that, I thank you.

Also, take that, planned obsolescence.