At a recent conference, I spoke to a senior engineer for Aelita Software, an independent software vendor (ISV) that produces Windows 2000 and Windows NT administration and migration tools, about a major bug related to Active Directory (AD) backups. During Aelita's development of ERDisk for Active Directory, the company's new AD backup product, engineers discovered that roughly half of all their AD backups resulted in corrupt backup copies. When restored, these backups prevent the directory services on the restored domain controller (DC) from starting. Upon further investigation, engineers discovered that the problem wasn't unique to Aelita's software but appeared to be a bug in Win2K's native API for performing AD backups and restores. Thus, this problem affects all software that uses the native API, including Win2K's built-in NT Backup utility and most third-party backup software (e.g., Computer Associates' ARCserve, VERITAS Software's VERITAS Backup Exec).
Aelita engineers believe that the bug hasn't received attention because System State restores that include AD are fairly uncommon. Aelita reports that the problem is simple but unpredictable, and thus is difficult to reproduce during testing.
You might not discover this bug until you attempt to use a backup to restore AD on a Win2K DC. At that point, it's too late: when you attempt to restore a backup, the bug prevents the DC from starting and causes the system to display a "Directory Service cannot start" error message. If you use Ntdsutil's semantic database analysis option to run the database semantic checker, you receive error 550: Database is inconsistent. With NT Backup, the restored data is corrupt even when the Verify option is turned on.
In a conference call with Aelita President and CEO Ratmir Timashev, I confirmed that Aelita believes this problem is a bug in Win2K's base release and Win2K Service Pack 1 (SP1). Aelita discovered this bug just before Microsoft released Win2K SP2. Timashev mentioned that during the month before SP2's release, Aelita worked with Microsoft to identify the bug, and Microsoft discussed its intention to include a fix in SP2, which it did.
At the time I heard about this problem, Microsoft hadn't documented it. Although empirical evidence indicates that SP2 resolves the problem (neither Aelita nor I have been able to reproduce the bug under SP2), Microsoft doesn't even mention the bug in the SP2 documentation.
A week after I reported the bug in WinInfo Daily UPDATE, "The AD Backup Bug: Monster in the Closet?" (http://www.wininformant.com/articles/index.cfm?articleid=21351), Microsoft officially acknowledged the problem in the article "Windows 2000 Domain Controllers Restored with System State Backups Made Prior to SP2 May Not Boot" (http://support.microsoft.com/support/kb/articles/q295/9/32.asp). The article describes the bug's symptoms and acknowledges that SP2 includes a fix. The article also sheds some light on why the problem occurs.
To protect customers from this bug, backup software vendors must enable their products to verify an AD backup before the backup is restored. Aelita has already implemented this functionality in ERDisk for Active Directory. You can protect yourself by updating your Win2K DCs with SP2. If you're not planning to upgrade to SP2 immediately, consider installing the bug's hotfix, which is available through Microsoft.
In the Microsoft article "Windows 2000 Domain Controllers Restored with System State Backups Made Prior to SP2 May Not Boot" (http://support.microsoft.com/support/kb/articles/q295/9/32.asp) that Sean Daily mentions, the author fails to explain what to do if you experience the bug. Is AD totally hosed, or is there some way to get the machine running again (e.g., demote it to a member server, then delete and reinstall AD)?
In my client's situation, Ntbackup made backups of AD to the System partition under C:\winnt\ntds each night. After 5 months, the backups had filled the 2GB partition. Attempts to reboot the server for other reasons failed (of course) with only 14MB of disk space left. I booted into AD Restore mode and deleted all the backups except the most recent log file and rebooted, only to be presented with an error. I don't have a clue about what to do.
Chris Kuebler
Your client's situation is different from the AD backup bug I reported. That problem occurs in specific circumstances under Win2K Service Pack 1 (SP1) when multiple backups occur simultaneously on the network. I think your answer will lie in performing AD-recovery steps. To that end, check out resources such as AD and Win2K disaster-recovery articles on the Windows & .NET Magazine Web site (http://www.winnetmag.com/magazine) or NetPro's "The Definitive Guide to Active Directory Troubleshooting" (http://www.netpro.com/ebook).
Sean Daily
Chris Kuebler January 31, 2002