Active Directory (AD) is typically one of the key network services in an organization. Without it, everything comes to a grinding halt. With this in mind, it’s important to be prepared for the various disasters that might strike a forest.
When it relates to AD, the scope of the disaster can vary quite a bit. It can be as simple as the failure of single domain controller (DC) or the accidental deletion of a single object. An even worse situation is when an entire organizational unit (OU) hierarchy is accidentally deleted. In the worst case scenario, an entire domain or forest might need to be restored.
The good news is that many of the techniques that apply to recovering from simple disasters also apply to recovering from catastrophic disasters. I’ll discuss how to recover from the two most common calamities: a failed DC and accidentally deleted objects.
Backup Strategy
You first need to make sure that you have something to use for a recovery. At a minimum, you should have valid system state backups of at least two DCs in each domain in your AD forest. Windows Server Backup (Windows Server 2008 and later), NTBackup (Windows Server 2003 and Windows 2000 Server), and most commercially available backup tools can perform valid system state backups. However, it’s always worth testing the backups to make sure everything is in order. One important point regarding backup tools is that you should use a Volume Shadow Copy Service (VSS)–aware backup tool. Backup tools that rely on disk imaging or virtual machine (VM) snapshot technologies are generally incompatible with AD. Restoring a backup made by one of these tools can cause serious replication failures known as update sequence number (USN) rollback.
In many organizations, the responsibility for server backups and restores falls to a different team than the team that runs AD. This leads to a couple of problems. First, you have no direct control over the backup process, which makes validating backups difficult. Second, many backup tools require an agent on each DC being backed up, which indirectly provides elevated access to the DC.
To mitigate these problems, I frequently employ a two-tiered approach to DC backups. I use a script to run Windows Server Backup each night on the DC and keep a week or two of backups locally on the DC. The folder containing the backups is then shared, with access restricted to the backup tool, as many backup tools can back up a file share without an agent. I also sometimes store the backup files on neighboring DCs within a site. So, for example, if you have DC1 and DC2 in a site, the backups of DC1 are stored on a file share on DC2 and vice versa.
The benefits of this two-tiered approach include:
- You mitigate some of the risk of being dependent on another team for backups.
- In the event you need to perform a restore, you can proceed right away with the native backup files you have on hand versus waiting for another team to perform the restore.
- You’re not waiting for a backup to copy over the WAN from another site in the event backups are performed remotely.
I posted the script I use to run Windows Server Backup as well as directions for setting it up in my article, "Managing Local Backups with Windows Server Backup".
DC Recovery
One of the great things about AD is the mostly stateless nature of the DC. Aside from potentially holding one or more Flexible Single-Master Operation (FSMO) roles, a DC should generally be a matching replica of other DCs in the domain, except for some potential delay in replication depending on your topology. If a failure renders a DC inoperable, this stateless nature is fantastic because it will often remove the need to go through a complicated restore from a backup. Instead, you can simply reinstall Windows and use Dcpromo to promote the server to a DC and replicate all of the data back in—assuming your domain has more than one DC. If you only have one DC in your domain, you can greatly reduce your exposure to failure by deploying a second one.