Redundant Flowers

In my mind, redundancy is one of those “Well, duh!” ideas that you just do. How great is it to have two of something when one breaks?

But the problem with redundancy is that it can be very expensive, especially when you are talking about hardware. Redundant drives, redundant power supplies, redundant network connections… it all adds up.

I had a very interesting conversation with our company’s CEO immediately after we lost our systems. He asked me how much a new server costs. I know most of you are rolling your eyes right now, because this is the equivalent of asking how long a piece of string is. But to avoid the obvious “it all depends” conversation, I said, “About $16,000.” He took one look at me and said, “We’ve lost more than that in productivity today alone! We need to look at having a redundant system in place.”
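To put the CEO’s point in numbers, here is a rough break-even sketch in Python. Only the $16,000 server price comes from the conversation above. The head count and hourly cost are hypothetical placeholders, and the model assumes that affected users can do no useful work at all during the outage.

```python
# Rough break-even sketch: after how many hours of outage does lost
# productivity exceed the price of a new server?
# Only SERVER_COST comes from the post; every other figure is hypothetical.

SERVER_COST = 16_000  # "About $16,000" from the conversation above


def outage_cost(affected_users: int, loaded_hourly_cost: float, hours_down: float) -> float:
    """Lost productivity, assuming affected users can do no useful work while systems are down."""
    return affected_users * loaded_hourly_cost * hours_down


def break_even_hours(server_cost: float, affected_users: int, loaded_hourly_cost: float) -> float:
    """Hours of outage after which lost productivity equals the server price."""
    return server_cost / (affected_users * loaded_hourly_cost)


users = 60      # hypothetical number of employees who cannot work
hourly = 40.0   # hypothetical fully loaded cost per employee-hour, in dollars

print(f"One 8-hour day down: ${outage_cost(users, hourly, 8):,.0f}")
# One 8-hour day down: $19,200
print(f"Break-even after {break_even_hours(SERVER_COST, users, hourly):.1f} hours")
# Break-even after 6.7 hours
```

With these made-up figures, less than one working day of downtime costs more than the server. Plug in your own numbers to see where your break-even point falls.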

So, this is what I came up with.

Note: This is a preliminary design that is still in the works. I would love to hear your feedback and suggestions for making it better. (You know, that whole collaborative Web 2.0 thing!)

There are two things that I want to accomplish with this solution:

  1. Keep services available in the event of a software failure
  2. Keep services available in the event of a hardware failure

What I have come up with is two physical servers that work together to provide a shared pool of resources. Instead of each server working alone, the servers work as a team. This is not exactly server clustering, but it does provide some of the same benefits. The system will work something like this:

Design

Each server will run VMware ESX Server. If you are not familiar with ESX Server, it is a bare-metal hypervisor: a very thin layer installed directly on the hardware that lets the machine run multiple operating systems at the same time. In my mind, this was a crucial step for VMware, because my one big issue with its earlier products was the large overhead of the host operating system they had to run on top of. That always seemed too inefficient to me.

With ESX Server, this issue largely disappears. There is still some overhead for running the ESX services themselves, but it is significantly less than what an entire host OS requires.

I can then install two virtual servers on one physical server and configure their operating systems as a cluster for even more redundancy.

Now, let’s assume that a service crashes on Virtual Server 1. The cluster would automatically move the service over to Virtual Server 2, so the end user would see at most a brief interruption instead of an outage.
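The service-level failover described above can be sketched as a toy model in Python. The Node and Cluster classes and the service names are hypothetical illustrations, not the API of any real clustering product. Real cluster software also has to handle quorum, health checks, shared storage, and client reconnection.

```python
# Toy model of service failover in a two-node cluster (hypothetical names throughout).
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    services: set[str] = field(default_factory=set)


class Cluster:
    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes

    def owner_of(self, service: str) -> Node | None:
        """Return the node currently running the service, if any."""
        for node in self.nodes:
            if service in node.services:
                return node
        return None

    def fail_over(self, service: str) -> Node:
        """Move a crashed service from its current node to another healthy node."""
        current = self.owner_of(service)
        if current is None:
            raise KeyError(f"unknown service: {service}")
        targets = [n for n in self.nodes if n is not current and n.healthy]
        if not targets:
            raise RuntimeError(f"no healthy node can take over {service}")
        # Prefer the least-loaded healthy node so services stay spread out.
        target = min(targets, key=lambda n: len(n.services))
        current.services.discard(service)
        target.services.add(service)
        return target


vs1 = Node("Virtual Server 1", services={"file-share", "print"})
vs2 = Node("Virtual Server 2", services={"email"})
cluster = Cluster([vs1, vs2])

new_owner = cluster.fail_over("file-share")  # the service crashed on Virtual Server 1
print(new_owner.name)  # Virtual Server 2
```

Picking the least-loaded node only matters once there are more than two nodes. In the two-node setup described here, the other virtual server is always the target.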

Next, let’s assume that an entire virtual server crashes, say Virtual Server 2. In this case, Virtual Server 1 would automatically take over all of the services that were running on Virtual Server 2. We could then restart Virtual Server 2 either on the original hardware or on a second physical server.
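Detecting that a whole virtual server has died is commonly done with heartbeats. Each member reports in periodically, and any member that stays silent for longer than a timeout is declared dead. The Python sketch below assumes that approach. The Member class, the function names, and the 5-second timeout are hypothetical, and time is passed in explicitly so that the example is deterministic.

```python
# Heartbeat-based failure detection with takeover of a dead member's services.
# Member, heartbeat(), take_over_dead_members(), and the timeout are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    services: set[str] = field(default_factory=set)
    last_heartbeat: float = 0.0  # time of the last "I'm alive" message, in seconds


def heartbeat(member: Member, now: float) -> None:
    """Record that a member reported in at time `now`."""
    member.last_heartbeat = now


def take_over_dead_members(members: list[Member], now: float, timeout: float) -> list[str]:
    """Declare silent members dead and move their services to a surviving member.

    Returns the names of the services that were moved.
    """
    alive = [m for m in members if now - m.last_heartbeat <= timeout]
    dead = [m for m in members if now - m.last_heartbeat > timeout]
    if dead and not alive:
        raise RuntimeError("no surviving member can take over")
    moved: list[str] = []
    for member in dead:
        survivor = alive[0]  # with two virtual servers there is only one survivor
        survivor.services |= member.services
        moved.extend(sorted(member.services))
        member.services.clear()
    return moved


vs1 = Member("Virtual Server 1", {"file-share"})
vs2 = Member("Virtual Server 2", {"email", "print"})

heartbeat(vs1, now=10.0)  # Virtual Server 1 keeps reporting in; Virtual Server 2 has gone silent
moved = take_over_dead_members([vs1, vs2], now=10.0, timeout=5.0)
print(moved)                 # ['email', 'print']
print(sorted(vs1.services))  # ['email', 'file-share', 'print']
```

Choosing the timeout is a trade-off. If it is too short, a busy but healthy server gets declared dead. If it is too long, users wait longer before the takeover happens.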

Virtual Machine Failure

We could also take snapshots of each virtual server at a specific moment in time. That way, if a driver update or other software installation broke a server, we could roll it back to the snapshot and be up and running again within seconds.
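The snapshot-and-rollback idea can be illustrated with a minimal Python sketch. A real hypervisor snapshot captures the virtual disk state (and optionally memory). Here, a hypothetical VirtualMachine class only snapshots a small configuration dictionary, which is enough to show the save, break, and revert cycle.

```python
# Minimal snapshot/rollback sketch. VirtualMachine and its config keys are hypothetical;
# a real hypervisor snapshot saves disk (and optionally memory) state, not a dict.
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    config: dict[str, str] = field(default_factory=dict)
    snapshots: dict[str, dict[str, str]] = field(default_factory=dict)

    def take_snapshot(self, label: str) -> None:
        """Save a point-in-time copy of the VM state under `label`."""
        # Deep copy so later changes do not leak into the saved state.
        self.snapshots[label] = copy.deepcopy(self.config)

    def revert_to(self, label: str) -> None:
        """Roll the VM back to a saved snapshot."""
        self.config = copy.deepcopy(self.snapshots[label])


vm = VirtualMachine("Virtual Server 1", {"nic_driver": "1.0", "status": "running"})
vm.take_snapshot("before-driver-update")

vm.config.update(nic_driver="2.0-beta", status="crashed")  # the bad driver update
vm.revert_to("before-driver-update")
print(vm.config)  # {'nic_driver': '1.0', 'status': 'running'}
```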

So, let’s assume the worst has happened and we lose a complete physical server. VMware has a product called VMware High Availability (HA) that will automatically restart the virtual machines from the dead server on the server that is still running, provided their virtual disks live on shared storage that both servers can reach. Because those virtual machines have to boot again, users would see a short interruption and possibly some slowdown afterward, but not a prolonged outage.
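When a host dies, an HA product has to decide where each orphaned virtual machine can restart. The Python sketch below is a simplified stand-in, not VMware’s actual placement logic. The Host class, the memory sizes, and the heuristic (place the largest VM first, on the host with the most free memory) are all assumptions made for illustration.

```python
# Simplified stand-in for HA restart placement; not VMware's actual algorithm.
# Host, the memory sizes, and the placement heuristic are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    memory_gb: int
    alive: bool = True
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> memory it needs, in GB

    def free_gb(self) -> int:
        return self.memory_gb - sum(self.vms.values())


def restart_failed_vms(hosts: list[Host]) -> tuple[dict[str, str], list[str]]:
    """Restart VMs from dead hosts on live hosts that have enough free memory.

    Returns (placements, stranded): where each VM was restarted, and VMs that did not fit.
    """
    placements: dict[str, str] = {}
    stranded: list[str] = []
    live = [h for h in hosts if h.alive]
    for dead in [h for h in hosts if not h.alive]:
        # Place the largest VMs first, since they are the hardest to fit.
        for vm, mem in sorted(dead.vms.items(), key=lambda kv: kv[1], reverse=True):
            candidates = [h for h in live if h.free_gb() >= mem]
            if candidates:
                target = max(candidates, key=lambda h: h.free_gb())
                target.vms[vm] = mem
                placements[vm] = target.name
            else:
                stranded.append(vm)
        dead.vms.clear()  # the failed host is no longer running anything
    return placements, stranded


host_a = Host("ESX host A", memory_gb=32, vms={"Virtual Server 1": 8, "Virtual Server 2": 8})
host_b = Host("ESX host B", memory_gb=32, vms={"Other VM": 8})

host_a.alive = False  # simulate losing the whole physical server
placements, stranded = restart_failed_vms([host_a, host_b])
print(placements)  # {'Virtual Server 1': 'ESX host B', 'Virtual Server 2': 'ESX host B'}
print(stranded)    # []
```

The stranded list carries the practical lesson. Each physical server needs enough spare capacity to absorb the other server’s virtual machines. Otherwise, some of them will not come back until the failed hardware is repaired.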

Hardware Failure

As I stated earlier, this is a very preliminary concept, and I still have a lot of reading and research to do. Still, I think I have the start of a system that will help increase uptime and keep users working.
