Thanks for visiting Daily Cup of Tech!

Server R.I.P.

We all know that feeling. Time slows down. It almost feels like you are having an out-of-body experience. Your stomach gets queasy and you feel like you are going to throw up. And you think to yourself, “This can’t be happening!”

But it is happening. Your server is dead.

But is it really? While it may not boot the way it is, there are still a lot of valuable configuration settings and data on the server that you could potentially get to with a bit of perseverance.

It May Not Be Completely Dead

One of the very first things to remember when you have a dead computer is that it is extremely rare for the entire computer to actually be dead. It is usually one hardware component or piece of software that is causing the problem. If you are really unlucky, two things will be wrong (which is really difficult to diagnose, but not impossible). It is important to determine which component is causing the problem because if you can replace that component with a new one, you could be up and running again in short order.

Some examples of easy to replace components that could get you up and running quickly are:

  • power supplies
  • RAM
  • controller cards
  • video cards
  • hard drives in a RAID configuration
  • cooling fans

With many of these components, you can quickly replace them and experience little (or even no) downtime at all.

Hard Drive a Common Culprit

Unfortunately, the most common component to require replacement is a hard drive. There is a simple reason for this. It is almost always working and it is one of the few computer components that has moving parts. This also happens to be the component that has all of your important data on it. This is precisely why you keep your drives in a RAID array.

The idea is that if a hard drive dies, you will still have all of the data that is stored on that hard drive in one of the other drives. So, all you need to do is remove the defective drive and replace it with a new one.

But the problem with using live data duplication as a form of protection is that if bad data is written to one drive, it can be written to both drives. Also, you are relying on another “layer” to sit between your hard drives and your data. If this layer goes bad (for example, a RAID container becomes corrupted or a RAID controller loses its configuration), you can find yourself just as hooped.

I generally build my domain controllers with five drives: two in a mirrored configuration for the OS and three in a RAID 5 configuration for the data. The nice thing about this is that the OS and data are separated. I have experienced three systems now where the OS container on the RAID system corrupted and left all of the data completely intact. Had I not configured these systems this way, I believe that I would have lost some or all of my data.
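For readers who want to see why a RAID 5 set can survive the loss of one drive, here is a minimal sketch of the parity idea in Python. It is only an illustration: the block contents and the names `xor_blocks`, `data_a`, and `data_b` are made up for this example, and a real controller works on fixed-size stripes and rotates the parity block among the drives.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a hypothetical three-drive array: two data blocks plus parity.
data_a = b"payroll "
data_b = b"records!"
parity = xor_blocks(data_a, data_b)

# Pretend the drive holding data_b has died; rebuild it from the survivors.
rebuilt_b = xor_blocks(data_a, parity)
assert rebuilt_b == data_b
```

Notice that the parity is computed from whatever was written. If bad data goes in, the array will faithfully protect the bad data, which is the weakness described above.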

Booting the Unbootable

The problem that you are then faced with is getting data off a server that will not boot. There are a number of options that you have at this point:

  1. Remove the bad OS drives, replace them with new drives, and build a fresh OS. This will get you access to the data and, if the problem was the drives, you could keep the system running for a while longer.
  2. Try to determine the bad drive (if it is only a single drive), replace it, and rebuild it. This is probably the first thing that I would do because it can be the quickest and easiest solution. Unfortunately, many systems can take a very long time to rebuild the new drive, and this is a very nerve-racking time since you really do not know if it will work until the rebuild is done.
  3. Boot from a LiveCD. We talked a bit about LiveCDs in Lesson #2. There are a lot of different LiveCDs available that can do many amazing things including recover data from bad sectors, recover deleted data, etc. There are even some that are specifically designed to rescue dead computers.
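To give a feel for what the bad-sector recovery tools on a rescue disc are doing, here is a rough Python sketch of the core idea: read the source block by block, and when a block refuses to read, fill it with zeros and move on instead of giving up. The function name `image_with_gaps`, the block size, and the `FlakySource` stand-in for a failing drive are all assumptions made for this illustration. Dedicated tools such as GNU ddrescue are far more careful about retries and ordering, so treat this as an explanation, not a replacement.

```python
import io
from typing import BinaryIO, List


def image_with_gaps(src: BinaryIO, dst: BinaryIO, total_size: int,
                    block_size: int = 4096) -> List[int]:
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the byte offsets of the blocks that could not be read in full.
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            chunk = src.read(want)
        except OSError:
            chunk = b""
        if len(chunk) < want:
            # Record the gap and pad it so later offsets still line up.
            bad_offsets.append(offset)
            chunk = chunk.ljust(want, b"\0")
        dst.write(chunk)
    return bad_offsets


class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a failing drive: reads at offset 4 fail."""

    def read(self, size=-1):
        if self.tell() == 4:
            raise OSError("simulated bad sector")
        return super().read(size)


src = FlakySource(b"AAAABBBBCCCC")
dst = io.BytesIO()
bad = image_with_gaps(src, dst, total_size=12, block_size=4)
print(bad, dst.getvalue())
```

The list of bad offsets is worth keeping. It tells you which parts of the image are holes, so you can try those blocks again later or at least know which files to distrust.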

It’s Alive!!!…Kinda

Sometimes, you can get the server running for a little while. I’ve seen systems that will stay up for a half hour and then quit again. I’ve actually seen systems that behaved like a car turn signal. Data. No data. Data. No data. Data…

Now, there are generally two different camps of opinion as to what you should do in this situation. Some people will tell you to take the opportunity to try and figure out what the problem is. Others will tell you to get the data off the system while you can.

I, personally, follow the second camp. I figure that I will have all the time in the world to determine what the problem is later. Right now, my priority needs to be getting that data somewhere safe.
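If you are grabbing data from a server that keeps dropping out, it helps to use a copy routine that does not abort on the first error and that can pick up where it left off. The sketch below is one way to do that in Python. The function name `salvage_tree` and the "same size means already copied" shortcut are assumptions for this example, not a definitive tool.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def salvage_tree(src_root: Path, dst_root: Path) -> List[Tuple[Path, str]]:
    """Copy every file under src_root to dst_root, continuing past failures.

    Returns (path, error message) pairs for files that could not be copied.
    """
    failures = []
    for src in sorted(src_root.rglob("*")):
        try:
            if not src.is_file():
                continue
            dst = dst_root / src.relative_to(src_root)
            # Assumption: a file of the same size was rescued on an earlier pass.
            if dst.exists() and dst.stat().st_size == src.stat().st_size:
                continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        except OSError as exc:
            # Note the failure and keep going; the next file may still be readable.
            failures.append((src, str(exc)))
    return failures
```

Because files that already made it across are skipped, you can simply run it again each time the server comes back to life, and go through the returned list of failures once the data you could reach is safe.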

Old Hardware No Longer Needed

If you are lucky enough to get a new server running and you no longer need the old server, do not immediately toss it out. If there is only one bad component, it could make a really good test server once that component is replaced.

Or, you could do a really good forensic analysis on the system and learn what went wrong and how you can avoid it in the future.

Of course, if your time with the downed system was really frustrating, you may just want to do this:

Then I would say, yes, it is officially dead!
