When our domain controllers died, we needed a number of important Active Directory tools to get our systems back up and running. Unfortunately, we had to track down many of these tools on our own, on the fly.
Since we found them so useful, I thought I would post a list with a brief description of each, so that if you find yourself in a similar situation, you will not be scrambling.
Be aware that many of these are command-line tools without a pretty GUI. But if you are recovering a Windows 2003 domain controller, I am sure you already have a good grasp of the command line.
When you have a major system failure like ours, things tend to get crazy very quickly. There are several reasons for this, including:
You are under a lot of pressure to get things done as quickly as possible
People outside the IT department may have little or nothing to do until systems are back
For many tasks, only one person in the company has the skills to perform them
Everyone believes their own need is the most important one and should be handled first
Because this is such a stressful time, it is important to keep a clear head and take control of the situation quickly. I have put together a few key action items you can use to keep things on track when this happens to you.
Imagine my surprise when I went to promote my new Windows 2003 server to a domain controller, only to be told that the version of Active Directory running on the network was the wrong type and needed to be upgraded to support Windows 2003.
Those of you who are Active Directory savvy are probably thinking right about now, “I’ll bet he has a Windows 2000 version of Active Directory in his environment and needs to run ADprep to upgrade it.”
And you would be correct, except for one thing: I added two Windows 2003 servers to the network as domain controllers a couple of years ago, and I upgraded Active Directory at that time! There should have been no need to upgrade it again!
And believe it or not, things got even stranger after that!
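If you hit the same ADprep message, one thing worth checking is which schema version your forest actually reports. The schema partition’s objectVersion attribute (readable with a tool such as ADSI Edit) records the schema level, and the small Python sketch below maps the well-known values to Windows releases. The function names are hypothetical helpers, not part of any Microsoft tool, and the table only covers releases up to Windows Server 2008 R2.

```python
# Hypothetical helpers for interpreting the schema partition's objectVersion.
# Values are the well-known schema versions up to Windows Server 2008 R2.
SCHEMA_VERSIONS = {
    13: "Windows 2000",
    30: "Windows Server 2003",
    31: "Windows Server 2003 R2",
    44: "Windows Server 2008",
    47: "Windows Server 2008 R2",
}

# objectVersion a forest needs (after ADprep) before a Windows Server 2003 DC can join it.
WIN2003_SCHEMA = 30


def describe_schema(object_version: int) -> str:
    """Return the Windows release that introduced this schema version."""
    return SCHEMA_VERSIONS.get(object_version, f"unknown schema version {object_version}")


def needs_adprep(object_version: int, required: int = WIN2003_SCHEMA) -> bool:
    """True if the forest schema is older than the level the new DC expects."""
    return object_version < required


if __name__ == "__main__":
    for version in (13, 30, 31):
        print(version, describe_schema(version), "needs ADprep:", needs_adprep(version))
```

For example, a forest that reports 13 is still at the Windows 2000 schema and would need ADprep before a Windows 2003 domain controller can join it.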
So when our Windows 2000 domain controller at the main office suddenly died, we needed to move its FSMO roles to a new server. Normally you do this by demoting the server that holds the FSMO roles, and the roles move over to another domain controller. When that is not possible, as was the case here, the instructions tell you to seize the roles instead. What they do not tell you about are the disastrous effects of the remnants the dead domain controller leaves behind. So I went looking.
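Seizing is done from the ntdsutil prompt, and under pressure it helps to have the exact sequence written down. Here is a small Python sketch that builds the ntdsutil commands for seizing all five roles onto a surviving DC. It assumes the Windows Server 2003 role names at the fsmo maintenance prompt (later versions shorten “domain naming master” to “naming master”), NEWDC01 is a placeholder server name, and the script only prints the commands; it does not run anything. You can type the steps one at a time at the prompts or, as far as I know, pass them to ntdsutil as quoted arguments. Expect ntdsutil to ask for confirmation before each seizure.

```python
from typing import List, Sequence

# Role names as typed at the Windows Server 2003 ntdsutil "fsmo maintenance" prompt.
# Assumption: later ntdsutil versions call the second one "naming master".
FSMO_ROLES = (
    "schema master",
    "domain naming master",
    "RID master",
    "PDC",
    "infrastructure master",
)


def seize_commands(target_dc: str, roles: Sequence[str] = FSMO_ROLES) -> List[str]:
    """Build the ntdsutil steps that seize each role onto target_dc."""
    # Enter fsmo maintenance, connect to the surviving DC, return to fsmo maintenance.
    cmds = ["roles", "connections", f"connect to server {target_dc}", "quit"]
    cmds += [f"seize {role}" for role in roles]
    cmds += ["quit", "quit"]  # leave fsmo maintenance, then ntdsutil itself
    return cmds


def as_command_line(cmds: Sequence[str]) -> str:
    """Quote each step so the whole sequence can be passed to ntdsutil as arguments."""
    return "ntdsutil " + " ".join(f'"{c}"' for c in cmds)


if __name__ == "__main__":
    # NEWDC01 is a placeholder for the surviving DC that should take the roles.
    print(as_command_line(seize_commands("NEWDC01")))
```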
It turns out that you need to do several things to clear the DC ghosts out of your network. These include:
Removing the server metadata
Removing the server object from its site
Removing the server object from the Domain Controllers container
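Each of those cleanup steps targets a specific object. The Python sketch below builds the distinguished names of the dead server’s object under Sites (what you delete in Active Directory Sites and Services) and of its computer account in the Domain Controllers OU (what you delete in Active Directory Users and Computers), plus the classic ntdsutil metadata cleanup sequence. site_server_dn takes the forest root domain, because the Sites container lives in the configuration partition; dc_account_dn takes the dead DC’s own domain. All function names are hypothetical, and OLDDC1, NEWDC01 and the index numbers are placeholders: in ntdsutil you read the real numbers off the output of each list command as you go.

```python
from typing import List


def domain_dn(dns_domain: str) -> str:
    """Convert 'corp.example.com' to 'DC=corp,DC=example,DC=com'."""
    return ",".join(f"DC={part}" for part in dns_domain.split("."))


def site_server_dn(server: str, site: str, forest_root: str) -> str:
    """DN of the server object under Sites (seen in Sites and Services)."""
    return (f"CN={server},CN=Servers,CN={site},CN=Sites,"
            f"CN=Configuration,{domain_dn(forest_root)}")


def dc_account_dn(server: str, domain: str) -> str:
    """DN of the DC's computer account in the Domain Controllers OU."""
    return f"CN={server},OU=Domain Controllers,{domain_dn(domain)}"


def metadata_cleanup_commands(live_dc: str, domain_no: int,
                              site_no: int, server_no: int) -> List[str]:
    """Classic interactive ntdsutil steps; the numbers come from each 'list' output."""
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {live_dc}",
        "quit",
        "select operation target",
        "list domains",
        f"select domain {domain_no}",
        "list sites",
        f"select site {site_no}",
        "list servers in site",
        f"select server {server_no}",
        "quit",                    # back to the metadata cleanup prompt
        "remove selected server",  # ntdsutil asks for confirmation here
        "quit",
        "quit",
    ]


if __name__ == "__main__":
    print(site_server_dn("OLDDC1", "Default-First-Site-Name", "example.com"))
    print(dc_account_dn("OLDDC1", "example.com"))
    for step in metadata_cleanup_commands("NEWDC01", 0, 0, 1):
        print(step)
```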
It appears that Murphy and his laws were in full effect at work this week. First, I came down with a nasty case of the flu. Everything I ate was either return to sender or express exit. Just as I was settling in for a day of self-pity and TV reruns, the phone started ringing with news of my second problem.
The domain controller at our main office had crashed, and the IT team could not get it to come back up and stay up. So by 10 o’clock, I was dragging my flu-ridden butt into the office. I ended up working until 2 AM the next day.
The third problem hit at 11 AM on day two. The tech guy at our field office called to tell me that their server had not come back up after a reboot, which put them in exactly the same position we had been in at the main office.
Things are starting to get sorted out now. We have new servers running in both locations, and we are repointing everything from the old servers to the new ones. We still get the occasional report of something that is not working, and we are dealing with those as they come up.
In situations like this, I like to try to get something positive out of the experience, and some good things have definitely come out of this turn of events. For one, I have learned a lot about recovering an environment and getting it running again in short order.
Since I have gained about five years’ worth of experience in the past three days, I’m going to share a number of these lessons with you over the next week or so. I hope you can learn this stuff from me rather than the hard way, like I did.
So, the first lesson is: Poop Happens! We did everything right and by the book. We made proper backups. We planned for disasters. We were prepared to act in the event of a server loss. And yet, we did not count on me being sick, and we were not prepared to lose two servers in such a short period of time. There were a lot of details that we simply could not foresee, and the ones we did think of in advance seemed so unlikely that we never planned what we would do if they happened.
What got us through were two key things: experience and flexibility. The team’s combined experience allowed us to come up with solutions to our problems. When one team member had already tried a solution in a similar situation, that experience helped guide us to success.
The team was also flexible, able to think on their feet and come up with some truly unique solutions on the fly, and that was just as important to our success. Not only did the team think outside the box, they threw the box away! We did things that I thought we would never do.
A big thanks goes out to Kent, Jeff, Mark, John, and Mamood for all of the help and effort that you put in over the past few days. You guys rock!