Friday, May 1, 2009

Data Recovery from bad RAID 1 mirrored volume on Mac OS X

If you're reading this page, it's because you are trying to find good advice on how to recover data from a bad mirrored drive, also known as RAID 1 (one) on Mac OS X or Mac OS X Server 10.4 (Tiger) or 10.5 (Leopard).

First, as it says on the cover of the good book, don't panic. Take a deep breath, and take your time.

It's very likely that you can recover your data [NOTES 1, 2] in this situation (a probable bad disk in a RAID 1 mirror). The path to doing so is unfortunately not as obvious as it might seem, and not as obvious as it perhaps should be, but it's pretty simple.

Read a whole lot about the subject before you try anything, understand what each tool does before you use it, and then read and think some more before you take the first step.

Before You Begin

Now, a bit of background. Typically in such a situation (a bad drive in a RAID 1 mirror) one would expect to be able to "break" the RAID 1 mirror using the same software controls that were used to establish it, in this case Disk Utility.app. DO NOT TRY THIS.

If you value your data:

DO NOT attempt to use Disk Utility.app on Mac OS X to break the mirror, as you may lose data [NOTE 6].

DO NOT attempt to use any "disk recovery" tools until after you have attempted non-destructive recovery efforts [NOTES 4, 5]. Furthermore, if you are working with a "bad" drive which contains the only copy of your data, DO NOT use any "disk recovery" tools until you have cloned the bad drive, and made a copy of the clone. Use the special tools only on that 2nd, working copy [NOTE 7].

DO NOT take your system to the neighborhood PC Doctor style repair shop, and DO NOT call the "Geek Squad".

Steps for Data Recovery from a bad mirror (RAID 1) on Mac OS X

To recover a "bad" RAID 1 volume on Mac OS X or Mac OS X Server 10.4 (Tiger) and 10.5 (Leopard) simply:

  1. Shut down the system and unplug the broken drive [NOTE 3],
  2. then start the system in FireWire Target Disk Mode (hold down the T key during power up, until the FireWire symbol flashes on the display).
  3. You can then mount that system as an ordinary external drive on another system by connecting them with a FireWire cable, and then,
  4. simply copy the data any way you like: Carbon Copy Cloner, cp, scp, rsync, or Finder.
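
As a rough illustration of step 4, here is a minimal sketch of copying a mounted volume with cp and then checking the result. The function name copy_volume and the example paths are my own placeholders, not anything Mac OS X provides, and the sketch assumes the good drive is already mounted and readable.

```shell
#!/bin/sh
# Sketch only: copy the contents of a mounted source volume into a
# destination folder, then compare the two trees.
# copy_volume and the example paths below are hypothetical.
copy_volume() {
    src="$1"   # e.g. /Volumes/GoodMirror (hypothetical mount point)
    dst="$2"   # e.g. /Volumes/Rescue/copy (hypothetical destination)

    mkdir -p "$dst" || return 1

    # -R recurses into directories; -p preserves modes, ownership and
    # timestamps where permissions allow. "src/." copies the contents,
    # including dot files, rather than the directory itself.
    cp -Rp "$src"/. "$dst"/ || return 1

    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$src" "$dst"
}
```

Calling copy_volume with the mounted mirror as the first argument and an empty folder on a healthy disk as the second would do the copy and report any differences. rsync or Carbon Copy Cloner would do the same job; cp is used here only because it needs nothing installed.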

Once you identify the good drive you can replace the bad drive and rebuild the mirror. However, I strongly recommend cloning the good disk before you attempt this.

Special Tools

There are a variety of special tools which may help you recover your data, in the event that you are working with one bad drive (rather than one good and one bad, in a mirrored volume). Remember to use dd or Carbon Copy Cloner to attempt to clone your bad drive to a good drive, then clone that to a working copy. Only run data recovery tools on a copy of your bad drive, never on the bad drive itself.
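
For the dd route, a minimal sketch might look like the following. The function name clone_device and the device path in the comments are hypothetical; you must confirm which device is the bad drive yourself, and the drive should not be mounted while you read from it. The conv=noerror,sync options are standard dd options: noerror keeps going after a read error, and sync pads each failed or short block with zeros so that everything after it stays at the right offset.

```shell
#!/bin/sh
# Sketch only: image a failing drive to a file on a healthy disk.
# clone_device and the example paths are hypothetical placeholders.
clone_device() {
    src="$1"     # e.g. /dev/disk2 (hypothetical; verify before use)
    image="$2"   # e.g. /Volumes/Rescue/bad-drive.img (hypothetical)

    # Never overwrite an existing image by accident.
    if [ -e "$image" ]; then
        echo "refusing to overwrite $image" >&2
        return 1
    fi

    # A small block size is slow, but each unreadable block that gets
    # replaced with zeros is then only 512 bytes long.
    dd if="$src" of="$image" bs=512 conv=noerror,sync
}
```

The function only ever reads from the source and writes to a new file, which is the point: the bad drive itself is not modified by the copy.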

Here are some useful tools to help you clone a bad drive without further corrupting the data upon it.

Here are some tools which will help you recover your data, from the working copy of the bad drive which you made using dd, or CCC.

DISCLAIMER [1]:

These instructions ignore several things which can in some cases be useful to know. The typical RAID 1 problem presents a situation where one is unlikely to encounter these edge conditions, but for the record, I am not responsible for your data if you follow advice in this email, whether you follow it correctly or not.

NOTE Regarding the Complexity of Data Recovery [2]:

For example, (and this is not the only edge case I'm ignoring) in some of the rare cases mentioned elsewhere in this email, attempting to simply mount the drive can result in the operating system attempting a consistency check, and then, if it finds something wrong, a fix. In most cases of RAID 1 problems, this won't make anything worse on the "good" drive, but the "bad" drive could be made worse, as with running any other "disk recovery" tool. The general topic of data recovery is complicated enough that a book and several flowcharts are needed to guide one safely through the minefield with the best chance of data recovery. Unfortunately, that book doesn't yet exist, and the internet is littered with really truly dangerous advice. Even worse, many companies will charge you money to run "Norton Disk Doctor" (or whatever) on your bad drive as their first step, and in some cases utterly destroy a reasonable chance of data recovery.

NOTE On identifying the broken drive in the mirror [3]:

It's likely that the quickest way to identify the good drive is to just try them. Unplug one, try to boot the system to FireWire Target Disk Mode, then try to mount it on another system, and if that doesn't work, plug it back in and unplug the other. In the case of a bad RAID 1 volume, it's nearly always true that one drive (and its filesystem) is good, while the other is bad. You will easily recover your data from the good drive. (See DISCLAIMER).

NOTE On Recovery of RAID 1 volumes [4]:

DO NOT use disk recovery tools, except as a last resort. They are ordinarily not required in this situation. Your RAID 1 volume is nearly always "bad" because one of the hard drives failed. The other drive is almost always OK, with your data intact. In very rare cases one might see the volume itself corrupted, with corruption mirrored to both disks, but one shouldn't attempt to non-destructively diagnose that until after one tries a safer recovery strategy anyway (outlined above, with more advanced details in notes below). Running most disk recovery tools is not a non-destructive act. These tools "recover" a drive by changing the filesystem and data. Running them on a bad hard drive, or on a RAID volume with a bad disk member, can result in data loss.

NOTE On Strategy [5]:

If one's data is valuable, one should never try to use any "data recovery" tools like Boomerang or Norton Disk Doctor on the original "bad" disk or volume, and never until after one has exhausted the "safe" mechanisms for data recovery. These tools can sometimes turn a situation from recoverable to non-recoverable, and should only be used as a last resort, and only after one has (a) cloned the data from the "bad" disk to a good disk, and then (b) made a copy to yet another disk. Only operate such recovery tools against the second copy, so that if and when they blow up, you can attempt to use another tool or another technique from the starting point, rather than from the munged-by-random-poorly-documented-tools point. If you blow up the 2nd copy, you can copy from the good disk to the 2nd copy again, and try again. Copying from the original "bad" disk is often time consuming and problematic, so once you have a good dd style copy, clone that for the 2nd, "working" copy.
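
The two-copy strategy can be sketched as a small shell helper, assuming the first clone is an image file on a healthy disk. The name make_working_copy and the file names are hypothetical. Running it again after a recovery tool has mangled the working copy resets the working copy from the untouched first clone.

```shell
#!/bin/sh
# Sketch only: create (or reset) the working copy from the first clone,
# and confirm the two are byte-for-byte identical.
# make_working_copy and the example file names are hypothetical.
make_working_copy() {
    master="$1"    # first clone of the bad drive, e.g. bad-drive.img
    working="$2"   # the copy that recovery tools are allowed to modify

    cp "$master" "$working" || return 1

    # cmp -s is silent and exits non-zero if the files differ.
    if ! cmp -s "$master" "$working"; then
        echo "working copy does not match $master" >&2
        return 1
    fi

    # Drop write permission on the first clone to guard against
    # pointing a recovery tool at it by mistake.
    chmod a-w "$master"
}
```

Recovery tools are then pointed only at the working copy; the first clone stays as the clean starting point.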

NOTE on Mac OS X RAID 1 (mirroring) [6]:

On Mac OS X 10.4 and 10.5, as far as I can tell from my own experience and from the experiences of others (some relayed to me directly, others found by googling), one cannot reliably use the software controls in Mac OS X Disk Utility to "break" the RAID 1 mirror, as doing so can result in data loss. I personally find the interface for this to be ambiguous, at best. I would not try to break the mirror with Disk Utility, even though that *should* be the recommended approach. Several people I know have filed bugs on this, and it has not been improved in years. My system administrator friends now refuse to use Mac OS X software RAID as a result of extensive, and uniformly poor, experience.

ADVANCED NOTE [7]:

If you are performing forensic analysis, or if your data is really valuable, and you don't have a good backup, you can purchase special kits which include a special drive controller. If you need such a kit, you want to make sure that, in addition to addressing your type of drive (IDE, SCSI, SATA, whatever), it will (1) allow you to disable write on the drive with a hardware setting on the kit adaptor, and (2) will attempt unlimited retries of reads until a successful dd can be read from the bad drive. In many cases a very bad drive which cannot be recovered easily can be cloned with one of these kits. If these kits fail, then you'll need the drive to be opened in a clean room by a reputable data recovery specialist to have any hope of data recovery.

FINAL NOTE:

You wouldn't be reading this if you had a good backup of your data. Fix that problem immediately after you recover, or fail to recover, your data. For home users, Time Capsule is a great start to a backup solution.

2 comments:

  1. Hi, I have set up a raid and would like to break it into 2 separate hard drives again as it is not working properly. I do not care about the data, Could you please tell me how to do this so my computer will show the 2 drives instead of 1. Thank you in Advance .

  2. To my previous Q on the Raid problem , I am using a Mac Computer.
